JSONB is the best of both worlds — schemaless flexibility with relational performance. That is the pitch. The reality, when teams reach for it, is more nuanced. Used correctly, JSONB solves real problems. Used as a default, it produces schemas that are slow to query and hard to evolve.
Here is the framing I use.
When JSONB is the right call
1. The shape genuinely varies per row. Different categories of products with different attributes. User-defined custom fields. Webhook payloads from third-party services that you do not control.
```sql
CREATE TABLE products (
    id           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    category_id  BIGINT NOT NULL REFERENCES categories(id),
    name         TEXT NOT NULL,
    attributes   JSONB NOT NULL DEFAULT '{}'  -- per-category flexibility
);
```
A book has isbn and pages. A shirt has size and color. The shared columns are typed; the variable parts are in JSONB. This is a legitimate use.
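A sketch of what those rows might look like (the category ids and attribute values are illustrative, not from a real catalog):

```sql
-- Shared columns are typed; the per-category attributes live in JSONB
INSERT INTO products (category_id, name, attributes) VALUES
    (1, 'Example Book',  '{"isbn": "0000000000", "pages": 320}'),
    (2, 'Example Shirt', '{"size": "M", "color": "navy"}');
```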
2. The shape is genuinely unknown. Logging an external system's webhook before parsing it. Auditing a user's input verbatim. JSONB lets you store now and decide later.
3. The data is read in bulk and rarely queried by individual fields. "Show me the full event payload" is a JSONB-friendly access pattern. "Filter events where payload.user_id = X" is borderline.
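The difference between those two access patterns, sketched against a hypothetical `events` table with a `payload` JSONB column:

```sql
-- JSONB-friendly: fetch the whole payload, looked up by primary key
SELECT payload FROM events WHERE id = 42;

-- Borderline: filtering by a field inside the payload; without an expression
-- index matching this exact expression and cast, this is a sequential scan
SELECT * FROM events WHERE (payload->>'user_id')::BIGINT = 7;
```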
When JSONB is wrong
1. The fields are stable across all rows. If every row has the same set of keys, those keys want to be columns. JSONB just makes the queries harder and the planner less informed.
```sql
-- WRONG: stable fields stuffed into JSONB
CREATE TABLE events (
    id    BIGINT,
    data  JSONB  -- {created_at, user_id, event_type, ...}
);

-- RIGHT: typed columns
CREATE TABLE events (
    id          BIGINT,
    created_at  TIMESTAMPTZ NOT NULL,
    user_id     BIGINT NOT NULL,
    event_type  TEXT NOT NULL,
    metadata    JSONB  -- only for the actually-variable parts
);
```
The second form is faster to query, easier to index, and produces better plans.
2. The fields are queried by exact value. Equality lookups on JSONB paths require expression indexes that match the query's expression exactly, cast included. A plain typed column such as user_id BIGINT is simpler and works with an ordinary b-tree index.
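To make the difference concrete, a sketch of what the equality lookup costs in each design (index names are illustrative):

```sql
-- JSONB path: the index expression must match the query's expression exactly,
-- cast included, or the planner will not use it
CREATE INDEX events_payload_user_id
    ON events (((payload->>'user_id')::BIGINT));

SELECT * FROM events WHERE (payload->>'user_id')::BIGINT = 7;

-- Typed column: an ordinary b-tree, and the planner has real statistics
CREATE INDEX events_user_id ON events (user_id);

SELECT * FROM events WHERE user_id = 7;
```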
3. The fields appear in foreign keys. JSONB cannot have foreign keys. If payload.user_id should reference users(id), that field needs to be a real column.
The hybrid pattern
For most events/logs/audit-like tables, the right structure is:
```sql
CREATE TABLE events (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    -- Stable, queried fields as typed columns
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    user_id     BIGINT NOT NULL REFERENCES users(id),
    event_type  TEXT NOT NULL,
    -- Variable, rarely-queried fields in JSONB
    payload     JSONB NOT NULL DEFAULT '{}'
);

CREATE INDEX events_user_created ON events (user_id, created_at DESC);
CREATE INDEX events_payload_gin ON events USING GIN (payload jsonb_path_ops);
```
The queries that hit hot paths use the typed columns and indexes. The queries that need flexibility hit the JSONB. The schema makes both fast.
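Against the hybrid table, both access patterns get an index. The payload key and value below are illustrative:

```sql
-- Hot path: typed columns, served by the (user_id, created_at DESC) b-tree
SELECT id, event_type, created_at
FROM events
WHERE user_id = 7
ORDER BY created_at DESC
LIMIT 50;

-- Flexible path: containment on the payload, served by the jsonb_path_ops GIN
SELECT id, payload
FROM events
WHERE payload @> '{"source": "stripe"}';
```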
Most "JSONB-everywhere" tables can be refactored to this shape. The result is usually a much faster query layer.
Migrating from JSONB to columns
If you have a JSONB column with fields that are now stable, the migration:
- Add a new typed column: `ALTER TABLE events ADD COLUMN user_id BIGINT`.
- Backfill from JSONB: `UPDATE events SET user_id = (payload->>'user_id')::BIGINT WHERE user_id IS NULL`.
- Make the application write both columns.
- Verify the columns are consistent.
- Make the typed column NOT NULL if appropriate.
- Drop the JSONB key (or leave it if it was just a copy).
For large tables, the backfill needs to be batched so each transaction stays short and autovacuum can keep up. A multi-week project, but doable.
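A minimal sketch of one backfill batch, assuming an id-ordered primary key (the batch size is illustrative; run it in a loop from a script until it updates zero rows):

```sql
-- Backfill up to 10,000 rows that still lack the typed column.
-- Keeping each batch in its own short transaction limits lock time
-- and lets autovacuum reclaim the dead tuples as you go.
WITH batch AS (
    SELECT id
    FROM events
    WHERE user_id IS NULL AND payload ? 'user_id'
    ORDER BY id
    LIMIT 10000
)
UPDATE events e
SET user_id = (e.payload->>'user_id')::BIGINT
FROM batch
WHERE e.id = batch.id;
```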
A slightly cleaner shape: use a generated column to extract from JSONB:
```sql
ALTER TABLE events
    ADD COLUMN user_id BIGINT
    GENERATED ALWAYS AS ((payload->>'user_id')::BIGINT) STORED;
```
This avoids the multi-step migration — Postgres maintains the column from the JSONB automatically — though adding a STORED generated column does rewrite the table once. The ongoing trade-off is a small write cost on every insert/update.
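The generated column behaves like any other column for indexing, so the usual b-tree applies (index name illustrative):

```sql
-- Queries filter on user_id, not on the JSONB expression, and the
-- planner gets ordinary per-column statistics
CREATE INDEX events_user_id_gen ON events (user_id);

SELECT * FROM events WHERE user_id = 7;
```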
Indexing JSONB efficiently
If you do keep JSONB and need to query it:
- For containment (`@>`) queries: use a `jsonb_path_ops` GIN index. Smaller and faster than the default GIN operator class.
- For specific path lookups: expression indexes on the exact path and cast.
- For queries that mix both: combine them, expression indexes for the hot paths and GIN for the rest.
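A sketch of the combination, assuming a `payload` column with one hot equality lookup on a tenant_id field (field and index names are illustrative):

```sql
-- Expression index for the one hot equality path
CREATE INDEX events_payload_tenant
    ON events (((payload->>'tenant_id')::BIGINT));

-- jsonb_path_ops GIN for the long tail of containment (@>) queries
CREATE INDEX events_payload_path_gin
    ON events USING GIN (payload jsonb_path_ops);
```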
See the dedicated JSONB indexing post for details.
What I see most often
The pattern that gets to me on code reviews:
- Table named something like `events` or `audit_log` or `webhooks`.
- One column: `data JSONB`.
- Every query fishes fields out of `data` with `data->>'key'`.
- Indexes are full GIN, expensive on writes.
- Queries are slower than they should be because every WHERE clause has to extract from JSONB.
The fix is usually to promote the 5-10 most-queried fields to typed columns. The team's velocity improves because queries are easier to write; the queries themselves get faster because the planner has real statistics for typed columns.
A pragmatic decision
For any new schema, the question I ask:
- What fields will every row have?
- What fields are queried by indexed predicate?
- What fields are stable across rows?
Fields that are present in every row, queried often, and stable → typed columns. Fields that vary per row, are rarely queried by content, or are unknown shape → JSONB. The intersection is rare; usually the answer is clear.
The schemas that age best are the ones where this decision was made deliberately. The schemas that age worst are the ones where everything went into a single data JSONB.