By dbivioJune 6, 20269 min read

PostgreSQL LISTEN/NOTIFY at Scale: The Queue You Didn't Know Had Limits

LISTEN/NOTIFY is a lovely way to push events without extra infrastructure, until the global notify queue fills, a slow listener blocks commits, and you learn it was never built to be your message bus.

We used LISTEN/NOTIFY to invalidate caches. A row changes, a trigger fires NOTIFY cache_invalidate, app servers listening on that channel drop the relevant entry. Elegant, no Redis, no Kafka, ships in an afternoon.

It ran beautifully for a year. Then one morning commits started hanging. Not queries, commits. Sessions that did nothing but write a single row were stuck. The database was healthy by every CPU and IO metric we had.

The culprit was a listener that had wedged. It was still connected, still subscribed, but not consuming. And in PostgreSQL, a stuck listener can reach back and stall the writers. That is the part of LISTEN/NOTIFY nobody reads about until it bites.

How the mechanism actually works

NOTIFY is transactional. When you call it, the notification is not sent immediately. It is queued and delivered only when the transaction commits. If the transaction rolls back, no notification goes out. This is genuinely useful: you never get told about a change that did not happen.

Notifications go into a single, cluster-wide queue stored on disk under pg_notify. Every backend that has issued LISTEN reads from its own position in that queue. The queue is shared by all databases in the cluster, not per-database.

The 8GB wall

That shared queue has a hard size limit, historically 8GB. The queue can only be truncated up to the position of the slowest listener. So if one listener stops consuming, the queue cannot be cleaned up, and it grows.

As it fills toward the limit, PostgreSQL starts slowing down NOTIFY-issuing transactions to apply back-pressure, and eventually a NOTIFY (and therefore the COMMIT that carries it) will block or error. One slow consumer turns into stalled producers across the entire cluster.

-- How full is the notify queue? (fraction of the max)
SELECT pg_notification_queue_usage();
-- 0.0 = empty, approaching 1.0 = approaching the hard limit

-- Who is listening, and is anyone falling behind?
SELECT pid, application_name, state, wait_event,
       now() - query_start AS since_last
FROM pg_stat_activity
WHERE pid IN (SELECT pid FROM pg_listening_channels_pids());

Slow listeners are the real risk

The failure mode is almost never message volume on its own. It is a listener that connects, subscribes, and then blocks on something else, a deadlocked thread, an exhausted connection pool, a debugger paused on a breakpoint in staging that someone forgot about.

Because the queue cannot advance past the slowest reader, a single sleepy listener accumulates everyone's notifications. Your monitoring has to watch pg_notification_queue_usage() and alert well before it reaches dangerous levels, not after.

Keep payloads tiny and idempotent

A NOTIFY payload is capped at 8000 bytes, but you should treat the real budget as far smaller. Do not ship the changed row. Ship an identifier and let the listener fetch the current state. This keeps the queue small and avoids stale-data bugs where the payload disagrees with the table.

Also collapse duplicates. PostgreSQL deduplicates identical notifications within a single transaction, but across transactions you can still get a storm. Design listeners to be idempotent: receiving the same "invalidate key 42" twice should be harmless.

-- Trigger that notifies with a minimal payload
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('cache_invalidate', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_notify
AFTER INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION notify_change();

Mind the transactional timing

Because delivery is tied to commit, listeners can observe a notification before they can see the committed data on a read replica, or before a long sibling transaction has finished its own work. If a listener immediately queries the primary by id, it is fine. If it queries a lagging replica, it may not find the row yet.

This is a classic source of "the event fired but the data wasn't there" bugs. The fix is to read from the primary on notification, or to tolerate a not-found and retry.

When to stop using it as a bus

LISTEN/NOTIFY is excellent for low-to-moderate volume, fire-and-forget signaling where occasional loss on a restart is acceptable. It is not durable: a listener that is disconnected during a NOTIFY simply misses it, because there is no per-consumer offset that survives a reconnect.

Once you need durability, replay, consumer groups, or sustained high throughput, you have outgrown it. At that point a real broker, or a durable queue table polled with SELECT ... FOR UPDATE SKIP LOCKED, is the right tool. Knowing where that line is keeps a convenient feature from becoming an incident.

Use it for: cache invalidation, light real-time fan-out, waking a worker.
Avoid it for: anything that must not be lost across a reconnect.
Always monitor: pg_notification_queue_usage() with an early alert.
Always design: tiny payloads, idempotent handlers, read-from-primary on receipt.

The practical standard

The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.

← Back to all articles