Read replicas sound like a clean scaling story. Send writes to the primary, send reads to replicas, reduce pressure on the main database. It works well until a user saves a change and the next page says it never happened.
Replica lag is not just an infrastructure metric. It is a product consistency boundary. If the application routes a read to a replica before that replica has replayed the write-ahead log (WAL) records for the user's change, the user sees the past.
The fix is not to avoid replicas. The fix is to decide which reads can be stale, which reads must be fresh, and how the application knows the difference.
Measure lag in bytes and time
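Byte lag tells you how much WAL the replica must still process. Time lag tells you how stale user-visible reads may be. Both matter: a small byte lag can still mean a long delay if replay is stalled, and a large byte lag can clear quickly on a healthy replica. The query below runs on the primary and returns one row per connected replica.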
SELECT
    application_name,
    state,
    sync_state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_bytes_lag,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;
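The query above only works on the primary. A replica can also report on itself, which is useful when the application needs to ask the node it is about to read from. The following is a minimal sketch, assuming PostgreSQL 10 or later function names and a streaming hot standby; the column aliases are illustrative.

```sql
-- Run on a replica (hot standby), not on the primary.
SELECT
    pg_is_in_recovery()       AS is_replica,
    pg_last_wal_receive_lsn() AS receive_lsn,
    pg_last_wal_replay_lsn()  AS replay_lsn,
    -- WAL already received by this replica but not yet replayed.
    pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                    pg_last_wal_replay_lsn()) AS unreplayed_bytes,
    -- Age of the last replayed commit. If everything received has been
    -- replayed, report zero so an idle primary does not look like lag.
    CASE
        WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
            THEN interval '0'
        ELSE now() - pg_last_xact_replay_timestamp()
    END AS replay_delay;
```

The zero-delay guard only covers WAL the replica has already received. If the network is the bottleneck, the replica cannot know what it is missing, so the primary-side view remains the more reliable signal for that case.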
Lag has different causes
- The primary generates WAL faster than the network can ship it.
- The replica writes WAL slowly because storage is saturated.
- The replica replays slowly because long-running queries on the standby conflict with WAL replay and hold it up.
- A bulk job creates a temporary WAL spike.
- A replication slot retains WAL on the primary because its consumer is down or stalled, so that consumer falls further behind while the primary's disk fills.
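To tell these causes apart, it can help to see where in the pipeline the bytes are piling up. The sketch below is one way to do that on the primary, assuming PostgreSQL 10 or later; the column aliases are made up for illustration, and the mapping from stage to cause is a heuristic rather than a diagnosis.

```sql
-- Run on the primary. Each column is the WAL backlog at one stage.
SELECT
    application_name,
    -- Generated but not yet sent: primary outpacing the sender or network.
    pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS not_yet_sent_bytes,
    -- Sent but not yet written by the replica: network or replica I/O.
    pg_wal_lsn_diff(sent_lsn, write_lsn)            AS sent_not_written_bytes,
    -- Written but not yet flushed: replica storage.
    pg_wal_lsn_diff(write_lsn, flush_lsn)           AS written_not_flushed_bytes,
    -- Flushed but not yet replayed: slow or blocked replay.
    pg_wal_lsn_diff(flush_lsn, replay_lsn)          AS flushed_not_replayed_bytes
FROM pg_stat_replication;

-- Run on the primary. WAL held back by each replication slot.
SELECT
    slot_name,
    slot_type,
    active,
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots
ORDER BY retained_wal_bytes DESC NULLS LAST;
```

An inactive slot with a steadily growing retained_wal_bytes value usually points at a consumer that is down rather than at a slow replica.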
Read routing needs consistency rules
Dashboards, analytics, and search pages can often tolerate replica reads. Immediately-after-write flows usually cannot. For those paths, either route to the primary for a short read-your-write window, or carry a consistency token, such as the primary's WAL position at the time of the write, that tells the app whether the replica has caught up.
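A consistency token can be as simple as a WAL position. The sketch below shows the two halves of that check, assuming PostgreSQL 10 or later; '0/3000060' is a placeholder for whatever value the first statement returned, and how the application stores and passes the token (session, cookie, request header) is left open.

```sql
-- Step 1: on the primary, right after the write commits, capture a token.
-- The insert position is at or past the commit record just written.
SELECT pg_current_wal_insert_lsn() AS token_lsn;

-- Step 2: on the replica, before serving a read that must see that write.
-- Replace '0/3000060' with the token captured in step 1.
-- COALESCE treats a NULL replay position as "not caught up".
SELECT COALESCE(
           pg_last_wal_replay_lsn() >= '0/3000060'::pg_lsn,
           false
       ) AS caught_up;
```

If caught_up is false, the application can fall back to the primary for that request or retry the replica after a short wait. The token approach avoids guessing a time window, at the cost of one extra round trip on each side.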
The runbook
- Check primary WAL generation rate.
- Check network, write, flush, and replay lag separately.
- Find long queries on replicas.
- Check hot standby conflicts.
- Pause or slow bulk jobs if lag is growing.
- Fail reads back to the primary only within a clear capacity limit, so the fallback does not overload the node you were trying to protect.
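Several of these checks map onto standard statistics views. The queries below are a starting point rather than a complete runbook, assuming PostgreSQL 10 or later; the aliases, the 80-character preview, and the LIMIT are arbitrary choices.

```sql
-- On the primary: sample twice, N seconds apart. The difference in
-- wal_bytes_total divided by N approximates the WAL generation rate.
SELECT
    now() AS sampled_at,
    pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') AS wal_bytes_total;

-- On the replica: oldest open transactions and their current statements.
SELECT
    pid,
    usename,
    state,
    now() - xact_start  AS xact_age,
    now() - query_start AS query_age,
    left(query, 80)     AS query_preview
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND state <> 'idle'
  AND pid <> pg_backend_pid()
ORDER BY xact_age DESC NULLS LAST
LIMIT 10;

-- On the replica: queries cancelled by recovery conflicts, by type.
SELECT
    datname,
    confl_tablespace,
    confl_lock,
    confl_snapshot,
    confl_bufferpin,
    confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = current_database();
```

The conflict counters are cumulative, so compare two samples rather than reading a single value. A rising confl_snapshot count alongside long-running replica queries suggests the queries and replay are competing.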
The practical standard
The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the plan or metric before the change, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.