Locks and Transactions10 min read

PostgreSQL Subtransactions: The SAVEPOINT Trap That Tanks Throughput

Subtransactions are cheap until you cross 64 of them in one transaction or pile up enough of them under load. Then SubtransSLRU contention shows up as global slowness nobody can explain.

This is the most expensive thing I have ever debugged that produced a perfectly normal-looking EXPLAIN. A reporting database would run fine for hours, then throughput would collapse, every backend stuck on the same wait event, and the only way out was killing long-running sessions.

The wait event was SubtransSLRU. The cause was a batch job that wrapped thousands of statements in a single transaction, each one inside a PL/pgSQL BEGIN ... EXCEPTION block. Every one of those blocks quietly opened a subtransaction.

Subtransactions are one of the sharpest hidden edges in PostgreSQL. They are invisible in your code, free at small scale, and brutal past a threshold most people have never heard of.

Where subtransactions come from

You rarely type SAVEPOINT directly. The common sources are implicit. A PL/pgSQL block with an EXCEPTION clause starts a subtransaction so it can roll back to that point on error. Many ORMs and drivers wrap individual statements in savepoints to emulate per-statement rollback inside a larger transaction.

So a function that loops over rows and does a BEGIN ... EXCEPTION WHEN others THEN ... END per row is opening one subtransaction per iteration. Ten thousand rows, ten thousand subtransactions, all under one top-level transaction.

-- This innocent-looking pattern opens a subtransaction per row
CREATE OR REPLACE FUNCTION import_rows() RETURNS void AS $$
DECLARE r record;
BEGIN
  FOR r IN SELECT * FROM staging LOOP
    BEGIN
      INSERT INTO target VALUES (r.id, r.payload);
    EXCEPTION WHEN unique_violation THEN
      -- swallow and continue
    END;          -- <-- each iteration = one subtransaction
  END LOOP;
END;
$$ LANGUAGE plpgsql;

The magic number is 64

Each backend caches up to 64 subtransaction IDs (subxids) in shared memory per top-level transaction. As long as you stay under that, visibility checks are cheap because everything is in the cache.

Cross 64, and the backend marks itself as "overflowed." Now, whenever any other transaction needs to determine whether a row written by a subtransaction is visible, it can no longer answer from the cache. It has to consult the pg_subtrans SLRU on disk.

Why one bad session hurts everyone

This is the part that surprises people. The session with thousands of subtransactions might be a background job nobody is watching. But once it overflows, every other backend doing visibility checks against its rows has to walk the subtrans SLRU. The SLRU is a small, shared, lock-protected cache. Under concurrency it becomes a single point of contention.

So a batch job in one corner of the system can cause OLTP queries everywhere else to pile up on SubtransSLRU waits. The slow queries are not the cause. They are victims.

-- Catch overflow in the act
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY 1, 2
ORDER BY count(*) DESC;
-- A pile-up on SubtransSLRU / SubtransBuffer is the signature

How to find the offender

You want the transaction holding the most subtransactions, which usually correlates with the longest-running write transaction. Look for old xact_start times in pg_stat_activity, and cross-reference with what that session is running.

On newer PostgreSQL you can inspect SLRU activity directly through pg_stat_slru, where a hot Subtrans entry with rising blks_read confirms you are paying disk for visibility checks.

SELECT name, blks_zeroed, blks_hit, blks_read, blks_written
FROM pg_stat_slru
WHERE name = 'Subtrans';

-- Long-running write transactions are the usual suspects
SELECT pid, now() - xact_start AS xact_age, state, left(query, 80)
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

The fixes, in order of preference

First, stop generating so many subtransactions. If a per-row EXCEPTION block exists only to skip duplicates, replace it with INSERT ... ON CONFLICT DO NOTHING, which needs no subtransaction at all. That single change has resolved more of these incidents than any tuning.

Second, restructure long transactions into smaller batches. A job that commits every few thousand rows resets the subxid count and bounds the blast radius. If you do not need one giant atomic transaction, do not open one.

Third, if you genuinely need per-statement error handling at volume, move the error handling into the application layer where each statement runs in its own transaction, rather than nesting savepoints inside one transaction.

-- No subtransaction, no exception block, same intent
INSERT INTO target (id, payload)
SELECT id, payload FROM staging
ON CONFLICT (id) DO NOTHING;

What I tell teams now

Treat BEGIN ... EXCEPTION inside a loop as a code smell that needs justification, not a default. Treat any single transaction that touches tens of thousands of rows as something to chunk. And put a SubtransSLRU alert on the wait-event dashboard, because by the time humans notice the slowness, the cache has already overflowed.

The practical standard

The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.