The bug looked like a search problem. Users could search for a normalized customer reference in one screen but not another. Both screens used the same table. They did not use the same normalization logic.
One path trimmed whitespace. Another lowercased. A background import stripped punctuation. The database stored the original value, and every caller tried to recreate the derived value on its own. The fix was not another helper function. The fix was making the derived value a schema rule.
The framework: generated column or application code?
I reach for a generated column when the derived value is:
- Deterministic from columns in the same row.
- Needed by more than one application path.
- Useful for filtering, joining, uniqueness, or indexing.
- Part of the data model, not just display formatting.
If the value depends on time, another table, a network lookup, user locale, or product configuration, it does not belong in a generated column. Generated columns are strongest when they remove repeated glue and make a data invariant visible to PostgreSQL.
A practical example
For account references, the application may display the original value but search by a normalized form:
CREATE TABLE customers (
id bigserial PRIMARY KEY,
account_id bigint NOT NULL,
external_ref text NOT NULL,
external_ref_key text GENERATED ALWAYS AS (
lower(regexp_replace(external_ref, '[^a-zA-Z0-9]+', '', 'g'))
) STORED
);
Now every insert and update computes the same key. The query no longer needs to remember the transformation:
CREATE UNIQUE INDEX CONCURRENTLY customers_account_ref_key
ON customers (account_id, external_ref_key);
That index expresses the product rule: within an account, two references that normalize to the same key are the same reference.
Stored versus virtual depends on your Postgres version
Current PostgreSQL documentation describes stored and virtual generated columns. A stored generated column is computed on write and occupies storage. A virtual generated column is computed on read. Many production fleets still run older major versions where generated columns are stored only, so check the target version before designing around virtual columns.
For performance-sensitive filters and uniqueness, stored columns are the common production choice because the value can be indexed and paid for during writes instead of recomputed in every read path.
The immutability rule matters
Generation expressions are intentionally constrained. They must be deterministic and cannot casually reach outside the current row. That limitation is a feature. It keeps the generated value stable enough for storage, indexing, replication, and query planning.
If you are fighting the immutability rule, the value probably belongs somewhere else: a trigger-maintained column, a summary table, or application code with explicit lifecycle rules.
Generated columns can make indexes more honest
An expression index can also solve many problems:
CREATE INDEX CONCURRENTLY customers_ref_expr_idx
ON customers (account_id, lower(external_ref));
That is fine when the expression is used in one place. A generated column becomes more attractive when the derived value has a name, appears in multiple queries, needs uniqueness, or helps analysts understand the schema. The column documents the invariant. The index enforces or accelerates it.
Where they bite
Generated columns move work to writes. That is usually acceptable for low-cost string normalization or simple arithmetic. It is not free on high-ingest tables. Before adding one to a hot table, estimate the write cost and test the backfill path.
Adding a stored generated column to a large table can require rewriting or at least touching a lot of data depending on version and operation details. Treat it like a production migration, not a harmless schema decoration.
The migration pattern
For a large table, I prefer a staged rollout:
- Add the generated column in a maintenance window if the table is large enough to make DDL risky.
- Build indexes concurrently where possible.
- Ship read paths that use the generated value.
- Compare old and new query results during a shadow period.
- Only then remove duplicated application normalization.
The dangerous migration is the one where the application and database compute "the same" value differently for a week.
What I monitor after rollout
SELECT
schemaname,
relname,
n_tup_ins,
n_tup_upd,
n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'customers';
If update volume is high, watch whether the generated column and its indexes increase write latency or reduce HOT updates. Also watch index size. A generated value that exists mostly to support one rare report may not be worth carrying on every row.
When I do not use one
- The value is presentation-only.
- The expression depends on mutable business settings.
- The table is write-heavy and the read benefit is marginal.
- A simple expression index solves the problem without adding schema surface.
Generated columns are not about saving a few lines of code. They are about moving a durable data rule to the place that can enforce it consistently.