The first user complaint I got about read replica staleness was "I just updated my profile and it still shows the old name." The investigation took a day. The root cause: the application sent the write to the primary and the immediate read to a replica. Replication lag was 80ms. The user clicked through faster than that.
This is the bug class. Eventually consistent reads are not the same as strongly consistent reads, and most application code assumes the latter. Here is how to think about the problem.
What "replica is stale" actually means
Physical streaming replication ships WAL from primary to replica. The replica applies it. The latency between "primary commits" and "replica has applied that commit" is the lag.
For healthy replication on the same network, lag is sub-second — typically 10-100ms. Under load, 1-5 seconds is normal. During a heavy operation (large index build, big VACUUM), lag can spike to minutes.
A query routed to the replica during this window sees data that is some time behind the primary. For most queries, this does not matter. For "my own write should be visible to my own read," it matters a lot.
Measuring lag
Two diagnostics, one per side. On the primary, pg_stat_replication shows one row per connected replica:

SELECT
  application_name,
  state,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_size,
  replay_lag
FROM pg_stat_replication;

On the replica itself, you can ask how far behind the last replayed transaction is (note this number also grows when the primary is simply idle):

SELECT extract(epoch FROM (now() - pg_last_xact_replay_timestamp())) AS replay_lag_seconds;
The replay_lag column gives you a duration. Anything above 1 second on a healthy network is concerning. Anything above 30 seconds is a problem.
For replicas you control, monitoring this in your dashboards is essential.
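If you want this on a dashboard, a small poller is enough. A minimal sketch, assuming asyncpg; the print stands in for whatever metrics client you use:

import asyncio
import asyncpg

async def poll_replica_lag(primary_dsn, interval=10.0):
    conn = await asyncpg.connect(primary_dsn)
    try:
        while True:
            rows = await conn.fetch(
                "SELECT application_name, "
                "extract(epoch FROM replay_lag) AS lag_s "
                "FROM pg_stat_replication")
            for row in rows:
                # replay_lag is NULL until the standby reports replay progress
                lag = float(row["lag_s"]) if row["lag_s"] is not None else 0.0
                print(f"replica={row['application_name']} lag={lag:.3f}s")
            await asyncio.sleep(interval)
    finally:
        await conn.close()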
What causes lag
The usual suspects:
Heavy WAL generation on the primary. Big DDL (CREATE INDEX, REINDEX, ALTER TABLE ... ALTER COLUMN ... TYPE, which rewrites the table), big DML (bulk inserts and updates), VACUUM FREEZE. The replica receives the WAL quickly but applies it serially.
Replica cannot apply fast enough. WAL replay on a physical replica runs in a single process, so it can fall behind highly concurrent write activity on the primary. Postgres 15 added recovery prefetching (recovery_prefetch), which helps when replay is I/O-bound, but apply itself is still serial.
Long-running query on the replica blocks WAL apply. Postgres pauses apply when a query on the replica still needs row versions that the incoming WAL would remove; it waits up to max_standby_streaming_delay (30 seconds by default) before canceling the query.
Network bandwidth or latency. Cross-region replicas typically have 100ms+ baseline lag from network alone.
For each cause, the fix is different. Identifying which one you have is the first step.
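For the third cause in particular, the replica keeps cumulative counters of the queries it canceled to let apply proceed. On the replica:

SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
FROM pg_stat_database_conflicts;

A climbing confl_snapshot is the usual signature of long replica queries colliding with WAL apply.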
The application-side problem
Most application code is written assuming "my write is visible to my read." This is true on the primary, false on the replica.
The pattern that breaks:
# 1. Write to the primary
db_primary.execute("UPDATE users SET name = $1 WHERE id = $2", name, uid)
# 2. Read from a replica (different connection, milliseconds later)
row = db_replica.execute("SELECT name FROM users WHERE id = $1", uid).fetchone()
# row holds the OLD name because replication hasn't caught up yet
The user sees stale data. The bug looks intermittent: it disappears whenever lag happens to be low.
Three patterns that fix it
1. Read-your-writes from primary.
For any user action that writes data and then reads back, route both to the primary. Replica is for unrelated reads only.
This is the simplest pattern and the one I recommend by default. Every read-after-write in the same session/request goes to the primary.
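A minimal sketch of the routing rule. The RequestContext name and the wrote flag are illustrative, not from any particular framework:

class RequestContext:
    def __init__(self, db_primary, db_replica):
        self.db_primary = db_primary
        self.db_replica = db_replica
        self.wrote = False  # flips to True on the first write in this request

    def write(self, query, *args):
        self.wrote = True  # taint the request: later reads go to the primary
        return self.db_primary.execute(query, *args)

    def read(self, query, *args):
        # Read-your-writes: after a write, this request only reads from the primary
        db = self.db_primary if self.wrote else self.db_replica
        return db.execute(query, *args)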
2. Sticky session for a window.
After a user writes, route their reads to the primary for some time window (e.g., 5 seconds). After that, replica is fine.
This lives at the application level: record a timestamp in the user's session on each write, and route reads based on it.
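A sketch of the window logic, assuming a dict-like session store; the names and the 5-second figure are illustrative:

import time

STICKY_WINDOW_S = 5.0  # should comfortably exceed your observed p99 replication lag

def record_write(session):
    session["last_write_at"] = time.time()

def db_for_read(session, db_primary, db_replica):
    # Pin the user to the primary for a window after any write they made
    if time.time() - session.get("last_write_at", 0.0) < STICKY_WINDOW_S:
        return db_primary
    return db_replica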
3. Wait for replication.
After the write, capture the WAL LSN. Before the next read, wait for the replica to apply up to that LSN.
import time

# After the write, on the primary: capture the current WAL position
lsn = primary.execute("SELECT pg_current_wal_lsn()").fetchone()[0]

# Before the read, on the replica: poll until replay has reached that LSN
while not replica.execute(
        "SELECT pg_last_wal_replay_lsn() >= $1::pg_lsn", lsn).fetchone()[0]:
    time.sleep(0.01)  # back off briefly; add a timeout so a lagged replica can't hang you
This is precise but adds latency. Useful when you must use the replica and freshness matters.
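There is also a server-side variant: with a synchronous standby and synchronous_commit = remote_apply (Postgres 9.6+), the commit itself does not return until the standby has applied it, so a subsequent replica read is guaranteed to see the write. Every write then pays the replication round trip:

# postgresql.conf on the primary; 'replica1' must match the standby's application_name
synchronous_standby_names = 'replica1'
synchronous_commit = remote_apply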
Most teams pick pattern 1 because it is the easiest to reason about. The other two are for specific high-scale or strict-consistency cases.
On managed services
RDS read replicas are asynchronous; there is no strict-consistency mode to switch on. Aurora has a reader endpoint that spreads connections across replicas; lag there is typically sub-second but never zero.
The same patterns apply: route read-after-write to the writer, accept lag for everything else.
What I usually find
When a team reports "replicas show stale data," the diagnosis is almost always:
- Lag is normal (<1s).
- The application is reading from the replica immediately after writing to the primary.
- The user notices the staleness because they are clicking through fast.
The fix is application-side: for read-after-write paths, use the primary.
When the diagnosis is "lag is high," the cause is one of:
- A specific heavy operation on the primary (the lag spike is temporary).
- A long-running query on the replica blocking apply.
- Replica is undersized for the WAL rate.
Each has its own fix.
What I tell teams
For any system with read replicas:
- Document which queries go to primary vs replica.
- Read-after-write consistently goes to primary.
- Monitor replica lag, alert on >5 seconds sustained.
- Test the application's behavior under simulated lag (pause the replica's WAL apply briefly and see what breaks; a sketch follows this list).
- Have a plan for what "the replica is too lagged" looks like — fail the read, or queue, or fall back to primary.
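One way to run that drill, assuming asyncpg and a role allowed to call the replay-control functions (superuser-only by default). Exercise the application inside the pause window:

import asyncio
import asyncpg

async def lag_drill(replica_dsn, hold_seconds=10):
    conn = await asyncpg.connect(replica_dsn)
    try:
        # Freeze WAL apply: the replica's lag now grows second by second
        await conn.execute("SELECT pg_wal_replay_pause()")
        await asyncio.sleep(hold_seconds)  # run your read-path checks against the replica here
    finally:
        # Always resume, even if a check fails
        await conn.execute("SELECT pg_wal_replay_resume()")
        await conn.close()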
The technology is mature. The issue is mostly that application code does not naturally distinguish "my write" from "any read." Once that distinction is explicit, the bugs go away.