The schemas that age worst are the ones with permissive columns. Everything nullable. No CHECK constraints. Foreign keys missing because "we'll enforce it in the app." By year three, the data has drifted. There are rows that violate every assumption the application makes. The team is now scared to fix them.
The schemas that age best are the opposite: aggressive constraints, declared at table creation, and trusted as the source of truth. The application checks fewer things because the database has already checked them.
Here is the set I add by default, and the reasoning.
NOT NULL is the default
A column should be nullable only if NULL has a specific meaning that the application uses. "User has not yet provided a phone number" is a meaningful NULL. "We forgot to set this when we wrote the migration" is not.
Leaving columns nullable when they should not be makes every read site write defensive code: "if x is not null, then..." That is application complexity that the database could prevent.
CREATE TABLE orders (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
status TEXT NOT NULL,
total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Every column is NOT NULL because every column has meaning at insert time. The schema is forcing correctness.
CHECK is for invariants the application keeps forgetting
CHECK constraints encode rules that should always be true. Common candidates:
- Numeric ranges: CHECK (price_cents >= 0), CHECK (rating BETWEEN 1 AND 5).
- Set membership when not using an ENUM or lookup table: CHECK (status IN ('pending', 'paid', 'shipped', 'delivered', 'cancelled')).
- Cross-column relationships: CHECK (start_date <= end_date), CHECK (NOT (is_archived AND is_published)).
- Format rules: CHECK (email LIKE '%@%'). Crude, but better than nothing.
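Pulled together in one hypothetical table (all names illustrative):
CREATE TABLE listings (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
    rating INTEGER CHECK (rating BETWEEN 1 AND 5),  -- NULL until first rated
    status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'archived')),
    contact_email TEXT NOT NULL CHECK (contact_email LIKE '%@%'),
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    CHECK (start_date <= end_date)  -- cross-column invariant
);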
The trade: a violated CHECK makes the INSERT or UPDATE fail with an error. The application has to handle that. It is a feature, not a bug. The alternative is corrupt data nobody notices.
FOREIGN KEY is non-negotiable for referential data
If orders.customer_id is supposed to point to customers.id, declare it. Always.
CREATE TABLE orders (
...
customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE RESTRICT
);
The ON DELETE clause is worth thinking about:
- RESTRICT — prevents deleting a customer who has orders. Almost always what you want.
- CASCADE — deletes the orders too. Use rarely; it silently hides data loss.
- SET NULL — sets customer_id to NULL. Only possible if the column is nullable, and only sensible if NULL is a meaningful state.
- NO ACTION — the actual default. Same as RESTRICT for most purposes; the difference is that a NO ACTION check can be deferred, while a RESTRICT check cannot.
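A quick sketch of RESTRICT doing its job on the tables above (error text abbreviated):
DELETE FROM customers WHERE id = 42;
-- ERROR: update or delete on table "customers" violates
--        foreign key constraint on table "orders"
-- Delete or reassign the customer's orders first; then the DELETE succeeds.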
"We will enforce this in the application" is a position I have heard many times and seen succeed exactly never. Every codebase has multiple paths that write data. Enforcing a relationship requires every one of them to be correct, forever, including in maintenance scripts and emergency hotfixes. The database does it once.
UNIQUE for, well, uniqueness
Covered in detail in the constraint-vs-index post. Short version: declare uniqueness in the schema, not in the application. The application's deduplication logic is wrong in subtle ways under concurrency. The database's UNIQUE constraint is not.
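To make the concurrency point concrete, a sketch assuming users(email) carries a plain UNIQUE constraint:
-- The application-level pattern is racy: two concurrent requests can both
-- run the SELECT, both see no row, and both attempt the INSERT.
--   SELECT 1 FROM users WHERE email = $1;
--   INSERT INTO users (email) VALUES ($1);
-- With UNIQUE declared, a single statement resolves the race atomically:
INSERT INTO users (email) VALUES ('a@example.com')
ON CONFLICT (email) DO NOTHING;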
Special case: partial uniqueness:
CREATE UNIQUE INDEX users_email_active_unique
ON users (email)
WHERE deleted_at IS NULL;
Useful for soft-delete patterns where you want to allow re-registration with the same email after the original account is deleted.
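A sketch of the lifecycle this permits, assuming users has email and deleted_at columns:
-- Live account exists
INSERT INTO users (email) VALUES ('a@example.com');
-- A second live registration is rejected by the partial index
INSERT INTO users (email) VALUES ('a@example.com');  -- fails: duplicate key
-- Soft-delete the original; the row stops matching the index's WHERE clause
UPDATE users SET deleted_at = now() WHERE email = 'a@example.com';
-- Re-registration now succeeds
INSERT INTO users (email) VALUES ('a@example.com');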
EXCLUDE for the rules CHECK cannot express
Less well known but powerful: an EXCLUDE constraint prevents any two rows from jointly satisfying a set of operator comparisons. The classic example is no-overlap on time ranges:
-- btree_gist provides GiST operator classes for plain scalar types,
-- so the resource_id WITH = clause below works
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE bookings (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
resource_id BIGINT NOT NULL,
during TSTZRANGE NOT NULL,
EXCLUDE USING gist (resource_id WITH =, during WITH &&)
);
No two bookings of the same resource can have overlapping time ranges. The database enforces it.
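To see it fire (timestamps illustrative, error text abbreviated):
-- First booking succeeds
INSERT INTO bookings (resource_id, during)
VALUES (1, tstzrange('2024-06-01 09:00+00', '2024-06-01 11:00+00'));
-- An overlapping booking on the same resource is rejected
INSERT INTO bookings (resource_id, during)
VALUES (1, tstzrange('2024-06-01 10:00+00', '2024-06-01 12:00+00'));
-- ERROR: conflicting key value violates exclusion constraint
-- The same window on a different resource is fine
INSERT INTO bookings (resource_id, during)
VALUES (2, tstzrange('2024-06-01 10:00+00', '2024-06-01 12:00+00'));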
I have replaced application-level overlap detection with this constraint multiple times. Every time, the application code that was preventing overlaps had a race condition the EXCLUDE constraint did not.
When to defer constraint checks
Most constraints check immediately on each statement. Sometimes you need a transaction-level check — "these rows must be consistent at COMMIT but can violate the rule mid-transaction." Postgres supports DEFERRABLE INITIALLY DEFERRED:
ALTER TABLE orders
ADD CONSTRAINT orders_customer_fk
FOREIGN KEY (customer_id) REFERENCES customers(id)
DEFERRABLE INITIALLY DEFERRED;
Now the FK check happens at COMMIT, not on every INSERT/UPDATE. Useful for bulk-loading data with circular references, or for migrations that temporarily violate constraints.
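A minimal sketch of the bulk-load case, with the customer columns elided as in earlier snippets:
BEGIN;
-- References customer 7, which does not exist yet; an immediate FK would
-- reject this, but the deferred one lets the transaction continue
INSERT INTO orders (customer_id, status, total_cents) VALUES (7, 'pending', 0);
INSERT INTO customers (id, ...) VALUES (7, ...);
COMMIT;  -- the FK is checked here; both rows now exist, so it passes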
Use rarely. The default (immediate check) is what you want most of the time.
How I add constraints to existing data
Adding a constraint to a populated table can fail if existing data violates it. The cleanest approach:
-- Add the constraint as NOT VALID (skips checking existing rows)
ALTER TABLE orders
ADD CONSTRAINT orders_total_positive
CHECK (total_cents >= 0) NOT VALID;
-- Find and fix existing violations
SELECT * FROM orders WHERE total_cents < 0;
-- ... clean up the data ...
-- Validate the constraint (full table scan, but no exclusive lock)
ALTER TABLE orders VALIDATE CONSTRAINT orders_total_positive;
NOT VALID lets the constraint apply to new writes immediately while you clean up old data. VALIDATE CONSTRAINT re-checks all rows; if it succeeds, the constraint is fully active.
This is the right shape for a multi-week constraint addition on a large hot table.
What constraints do not replace
Constraints are checks at write time. They do not protect you from:
- Bad data that arrived before the constraint existed (use the NOT VALID + VALIDATE pattern).
- Logic that requires reading other rows to validate. A CHECK sees only the row being written; cross-row rules are limited to what UNIQUE and EXCLUDE can express.
- Writes that bypass enforcement entirely, e.g. bulk loads or replication sessions that disable the internal triggers foreign keys are enforced with.
They are defensive, not exhaustive. But every constraint that catches a bug at INSERT time is a bug the application did not have to find at read time, when the cleanup is much harder.
A pragmatic baseline
For any new table, my default is:
- Every column is NOT NULL unless NULL has a specific meaning.
- Every relational column has a REFERENCES declaration with ON DELETE set deliberately.
- Numeric ranges have CHECK constraints.
- Uniqueness is declared, not enforced in the app.
- Time ranges that should not overlap use EXCLUDE.
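Concretely, a new table under those defaults might look like this (names illustrative; btree_gist from earlier is assumed for the EXCLUDE):
CREATE TABLE subscriptions (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE RESTRICT,
    plan TEXT NOT NULL CHECK (plan IN ('free', 'pro', 'enterprise')),
    seats INTEGER NOT NULL CHECK (seats >= 1),
    active_during TSTZRANGE NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    -- at most one subscription window per customer at a time
    EXCLUDE USING gist (customer_id WITH =, active_during WITH &&)
);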
These add maybe 5 minutes to writing a migration. They save weeks of cleanup later. The cost-benefit is not close.