7 min read

Logical Replication in Postgres: When It Is the Right Tool and When It Isn't

Logical replication is more flexible than physical and more fragile. Use it when you need partial replication, cross-version, or selective sync. Don't use it for HA.

Logical replication is one of those Postgres features that gets pitched as the answer to many things. Sometimes it is. Often it is the wrong tool, and the team finds out three months in when the replication slot has fallen behind by 200GB.

Here is the framing I use.

What it actually is

Logical replication ships SQL-level changes (INSERT, UPDATE, DELETE) from a publisher to a subscriber. The subscriber applies them as if it had received the SQL itself. Compared to physical replication (streaming WAL), logical:

  • Replicates per-table, not per-cluster.
  • Works across Postgres major versions.
  • Allows DDL and writes on the subscriber (which is dangerous, but possible).
  • Does not preserve transaction boundaries from the publisher in all cases.
  • Does not replicate sequences, large objects, or some system tables.

The last point catches people. Sequences are not replicated; you have to handle them yourself during failover or reset.

Setting it up

-- On the publisher
CREATE PUBLICATION my_pub FOR TABLE orders, customers, order_items;

-- On the subscriber
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=publisher dbname=monpg user=replicator password=...'
  PUBLICATION my_pub;

The subscriber takes a snapshot of the publisher's data, replicates it, then streams changes ongoing.

The publisher's wal_level must be logical. The subscriber needs the schema to exist (with appropriate columns and types). Replication will fail if a column is missing on either side.

When logical replication is right

1. Cross-version migration. This is the most common legitimate use. Replicate from Postgres 14 to Postgres 16, then cut over with minimal downtime. Physical replication cannot do this; logical can.

2. Partial replication. "I only need orders and customers replicated to the analytics database." Physical replicates everything; logical lets you pick.

3. Per-table sync to a different schema. Reshape data on the way over, transform during apply, send to a different table name. Doable but operationally hairy.

4. Multi-publisher topologies. Logical can have multiple publishers feeding one subscriber. Physical cannot.

When logical replication is wrong

1. High availability. Logical replication has higher latency, more failure modes, and worse operational ergonomics for failover than physical streaming. For HA replicas, use streaming replication.

2. Large initial data sync. The initial copy on subscription start uses standard SQL COPY over the network. For TB-scale tables, this can take many hours, during which the publisher is generating WAL that the subscriber will need to catch up on. The catch-up sometimes never happens.

For large initial syncs, the right pattern is: take a base backup of the publisher, restore it on the subscriber to provide the initial state, then attach logical replication to catch up.

3. Heavy write workloads. Logical replication serializes through a single replication worker by default. On a hot table with very high write throughput, the worker becomes a bottleneck. Postgres 14+ supports streaming + parallel apply, but it is not a free lunch.

4. Anything that depends on transaction boundaries across multiple tables. Logical replication groups changes by transaction, but the apply order on the subscriber is not always the original order. For workloads where ordering matters, this breaks invariants.

The failure modes

Things I have seen go wrong:

Subscriber falls behind, slot grows huge. The publisher's pg_replication_slots shows the slot is active but the lag in bytes is enormous. WAL on the publisher cannot be recycled, disk fills.

SELECT slot_name, active,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots;

Fix: investigate why the subscriber cannot keep up. Usually it is a single hot table that the apply worker can not process in real time.

Schema drift. A column added on the publisher but not the subscriber → replication errors out. Logical replication does not replicate DDL; you have to apply schema changes manually on both sides.

The reliable pattern: apply DDL on the subscriber first, then on the publisher. That way, when the data change for a new column arrives, the subscriber already has the column.

Replication slot left behind. The subscriber goes offline. The publisher's slot retains WAL forever. WAL accumulates, disk fills. Same failure mode as the abandoned-streaming-replica scenario.

Monitor every slot. Drop slots whose subscribers are gone.

Conflicting writes on the subscriber. Logical replication does not prevent writes on the subscriber. If the application writes there too, conflicts (primary key violations, FK violations) cause replication to halt.

The right discipline is to make the subscriber read-only for the application except for the publisher's writes. ALTER USER app_user CONNECTION LIMIT 0 or use a separate role.

Migration cutover with logical replication

The playbook for using logical replication for cross-version migration:

  1. Set up logical replication from old (publisher) to new (subscriber).
  2. Wait for initial copy + catch-up. Monitor pg_replication_slots lag until near-zero.
  3. Apply any DDL schema changes on both sides as needed during catch-up.
  4. Cutover window:
    • Stop application writes to the old.
    • Wait for replication to fully drain (lag = 0).
    • Reset sequences on the new to where the old left off (very important).
    • Point the application at the new.
    • Drop the subscription.
  5. Decommission the old after a confidence window.

The step that surprises people: sequences. Logical replication does not move them. After cutover, your id sequence on the new database is at 1 unless you reset it:

SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));

Forget this and you will have primary key collisions on the next insert. I have seen this break production cutovers more than once.

What I recommend for HA

For failover-grade HA, use physical streaming replication. It is simpler, more robust, lower latency, and the failover tooling (Patroni, repmgr, pg_auto_failover) is built around it.

Logical replication is a specialized tool for specific use cases. Once you accept that, it is great at what it does. Trying to use it as a general-purpose replication strategy is the path to operational pain.