The first time I used a Postgres ENUM, it felt like the right call. Type safety at the database level. Tab-completion in psql. Compact storage. What is not to like.
Three years later I have run the migration to remove it from at least four codebases. The pattern is so consistent that I now treat "start with an ENUM" as a smell unless I have specific reasons.
Here is the rule I use, and the reasoning behind it.
Use an ENUM only for stable internal values
ENUMs are good when:
- The values represent an internal state machine the product team does not negotiate.
'pending','shipped','cancelled'for an order, where the application code branches on each value with explicit logic. - The set of values changes maybe once a year, after a real engineering review.
- Nobody outside engineering needs to touch them.
If those three things are true, the ENUM is fine. Storage is one or two bytes per row. Comparisons are cheap. The grammar is pleasant.
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'delivered', 'cancelled');
CREATE TABLE orders (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
status order_status NOT NULL DEFAULT 'pending'
);
This is the form I will defend.
Use a lookup table for everything product-facing
The moment any of these things appear in a conversation about an ENUM-shaped column, the ENUM is the wrong choice:
- "Can we add a label that shows in the UI?" — ENUMs do not store labels.
- "Can we mark some of these as deprecated but not delete them?" — ENUMs do not have a soft-delete bit.
- "Can we sort them in a custom order?" — ENUMs sort by definition order, which is fine until product wants alphabetical.
- "Can we add translations?" — every ENUM value would need code-side translation tables, defeating the schema-level type safety.
- "Can business stakeholders add new values without an engineer?" — ENUM modifications are DDL, which is engineer-territory.
A lookup table answers all of the above by being a normal table with normal columns:
CREATE TABLE order_statuses (
id SMALLINT PRIMARY KEY,
code TEXT NOT NULL UNIQUE,
label TEXT NOT NULL,
display_order SMALLINT NOT NULL,
is_deprecated BOOLEAN NOT NULL DEFAULT false
);
CREATE TABLE orders (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
status_id SMALLINT NOT NULL REFERENCES order_statuses(id)
);
Now product can add a row, the UI reads label, sort uses display_order, deprecation uses is_deprecated. The application keeps using code for branching. Type safety is preserved by the foreign key.
The migration nobody enjoys
Going from ENUM to lookup table is doable but unpleasant, especially in a hot table. The shape I use:
-- 1. Create the lookup table and seed it from the ENUM
CREATE TABLE order_statuses (
id SMALLINT PRIMARY KEY,
code TEXT NOT NULL UNIQUE,
label TEXT NOT NULL,
display_order SMALLINT NOT NULL,
is_deprecated BOOLEAN NOT NULL DEFAULT false
);
INSERT INTO order_statuses (id, code, label, display_order)
VALUES
(1, 'pending', 'Pending', 10),
(2, 'paid', 'Paid', 20),
(3, 'shipped', 'Shipped', 30),
(4, 'delivered', 'Delivered', 40),
(5, 'cancelled', 'Cancelled', 50);
-- 2. Add the new column
ALTER TABLE orders ADD COLUMN status_id SMALLINT;
-- 3. Backfill in batches
UPDATE orders
SET status_id = order_statuses.id
FROM order_statuses
WHERE order_statuses.code = orders.status::text
AND orders.status_id IS NULL
AND orders.id BETWEEN $start AND $end;
Do the backfill in chunks of 50,000–500,000 rows depending on table size. Then deploy the application change to start writing both columns. Verify writes are consistent. Then make status_id NOT NULL, add the FK constraint, and finally drop the old column.
The pain is real. Plan for it being a multi-week project on a busy table.
A word on CHECK constraints
Some teams reach for a CHECK (status IN ('pending', 'paid', ...)) instead of an ENUM. This is slightly more flexible — adding a value is ALTER TABLE plus updating the CHECK. It still has all the lookup-table problems (no labels, no soft-delete, no display order). I treat it the same as ENUM: fine for stable internal values, wrong for product-facing values.
What I do today
For anything that might appear in a UI dropdown, in a customer-facing API response, or in a sales conversation, I start with a lookup table. For internal state machines I do not expect product to touch, I start with an ENUM. The boundary is whether the value lives in a customer's vocabulary or in an engineer's.
When I get this wrong, the fix is the migration above. It is bounded work, but it is real work. Picking right the first time is genuinely cheaper than picking ENUM and migrating later.