8 min read

CITEXT in Postgres: Case-Insensitive Text Without the LOWER Trap

CITEXT is a good fit when a column is almost always compared case-insensitively. It is not a universal Unicode answer, and it is not free.

The bug usually arrives as a duplicate account. One user signs up as Ada@example.com. Another path normalizes to ada@example.com. A support engineer searches for one and sees the other. The application has "email lowercasing" in four places, but one import job missed it.

This is the problem citext can solve well: a column whose business meaning is case-insensitive most of the time.

It does not make text handling simple forever. It moves one specific rule into the database, which is exactly where uniqueness rules should live.

The framework: decide what equality means

Before choosing citext, ask the product question: should these two values be considered the same?

  • Email login identifiers: often yes.
  • Usernames: maybe, depending on product policy.
  • Display names: usually no.
  • Search text across multiple languages: probably needs a more deliberate collation/search design.

If the answer is "this column is case-insensitive by definition," citext is worth considering. If the answer changes by query, use text and explicit indexes.

The LOWER pattern works until it leaks

SELECT *
FROM users
WHERE lower(email) = lower($1);

This works, but it has two operational traps. First, every query must remember the rule. Second, it needs an expression index to be fast.

CREATE UNIQUE INDEX users_email_lower_key
ON users (lower(email));

That can be a perfectly good design. The problem is inconsistency. If one code path uses email = $1 instead of lower(email) = lower($1), the database will do exactly what you asked, not what the product intended.

CITEXT makes the column carry the rule

CREATE EXTENSION IF NOT EXISTS citext;

ALTER TABLE users
  ALTER COLUMN email TYPE citext;

ALTER TABLE users
  ADD CONSTRAINT users_email_key UNIQUE (email);

Now equality, uniqueness, and ordinary B-tree lookup use case-insensitive behavior for that column. The schema says what the product means.

Migration needs a duplicate check first

Do not change a production email column to citext before proving there are no case-folded duplicates.

SELECT lower(email) AS normalized_email, count(*), array_agg(id ORDER BY id)
FROM users
GROUP BY lower(email)
HAVING count(*) > 1;

If this returns rows, the migration is a product cleanup, not a DDL task. Decide which account wins, how merges work, and what support needs to know.

The limitations matter

The official PostgreSQL docs are explicit: citext internally behaves much like applying lower for comparisons. Its behavior depends on the database locale, and it does not handle every Unicode case-folding nuance.

For modern multilingual requirements, PostgreSQL docs suggest considering nondeterministic collations instead of citext. That is especially relevant if you need accent-insensitive comparison, language-specific behavior, or better Unicode semantics.

There is also a performance tradeoff. citext comparisons do extra work compared with plain text. Usually that is fine for identifiers. It is not something I would casually apply to wide text columns or hot analytical paths.

When I still prefer text plus an index

I keep text when the column needs both case-sensitive and case-insensitive behavior in hot paths. You can support both with different indexes, but then you should make that choice explicit.

CREATE INDEX users_email_exact_idx ON users (email);
CREATE INDEX users_email_lower_idx ON users (lower(email));

This is more verbose, but it avoids surprising future queries that expect exact string behavior.

What I check in production

  • Is the equality rule truly case-insensitive for this column?
  • Are there existing duplicates under lower(column)?
  • Does the database locale match the product's language assumptions?
  • Do any queries still need exact case-sensitive comparison?
  • Will the extension be allowed in every environment, including managed Postgres?

The pragmatic default

Use citext for identifiers where case-insensitive equality is the rule, not a query preference. Use text plus explicit indexes when the rule varies. Use nondeterministic collations when Unicode and accent behavior are central to the product.

The mistake is not choosing citext. The mistake is pretending case-insensitive equality is an application convention instead of a database invariant.