I had handled transaction ID wraparound before. I knew the scary 'database is not accepting commands' message and the emergency vacuum that follows. So when a replica refused to come up and the log said MultiXactId members limit exceeded, I genuinely did not know what I was looking at.
It turns out PostgreSQL has two counters that can wrap around, not one. Transaction IDs are the famous one. Multixact IDs are the quiet one, and they are usually consumed by something completely innocent: foreign keys.
A multixact is how PostgreSQL records that more than one transaction holds a shared lock on the same row at the same time. If your workload locks rows for sharing a lot, you can exhaust multixact space long before anyone thinks to look for it.
What a multixact actually is
When a single transaction locks a row, PostgreSQL stores that transaction's ID directly in the row header. But when multiple transactions need to hold a shared lock on the same row simultaneously, one slot is not enough. PostgreSQL allocates a multixact: a small structure that lists all the member transactions, and stores the multixact ID in the row instead.
The most common source is not SELECT ... FOR SHARE that you wrote by hand. It is foreign keys. When a row references a parent row, PostgreSQL takes a shared lock on that parent to stop it from disappearing. Many children referencing the same hot parent row means many shared lockers, which means multixacts.
Two limits, not one
Multixacts have their own 32-bit ID space that can wrap around exactly like transaction IDs, governed by autovacuum_multixact_freeze_max_age. That is the limit most people eventually hear about.
But there is a second, sneakier limit: the members space. Each multixact references a list of member transactions stored in the pg_multixact/members SLRU. That storage is finite. A workload that creates huge multixacts (one hot parent row locked by thousands of children) can exhaust the members space while the ID counter still looks healthy. That is the members limit exceeded error, and it triggers aggressive anti-wraparound autovacuum on its own.
How to see where you stand
You monitor multixact age the same way you monitor transaction age, but against the multixact horizon. The number you care about is how close the oldest multixact is to the freeze threshold across your databases.
-- Oldest multixact age per database (closer to the limit = worse)
SELECT datname,
mxid_age(datminmxid) AS multixact_age,
age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY mxid_age(datminmxid) DESC;
-- The freeze threshold autovacuum aims to stay under
SHOW autovacuum_multixact_freeze_max_age;
Find what is generating them
Before you treat the symptom, find the source. The usual culprits are a foreign key pointing at a small set of very hot parent rows, or application code that uses SELECT ... FOR SHARE / FOR KEY SHARE heavily under concurrency. Both create shared lockers that pile up into large multixacts.
If a single parent row (say a 'default tenant' or a 'system' account) is referenced by millions of children being inserted concurrently, that one row becomes a multixact factory. Spreading the references, or rethinking the foreign key, attacks the root cause.
Recovering and preventing
The cure is the same machinery as transaction wraparound: VACUUM freezes old multixacts and advances the horizon. If you are close to the limit, a targeted VACUUM (FREEZE) on the high-churn tables clears the backlog faster than waiting for autovacuum to get to them.
To prevent recurrence, make autovacuum more aggressive on the tables that generate multixacts, and lower autovacuum_multixact_freeze_max_age if your workload is multixact-heavy so freezing starts earlier. The goal is to never let the members space get close enough to matter.
- Monitor mxid_age(datminmxid), not just transaction age.
- Remember the members SLRU is a second, separate limit.
- Foreign keys on hot parent rows are the usual multixact factory.
- VACUUM (FREEZE) the high-churn tables to advance the horizon.
- Tune autovacuum_multixact_freeze_max_age down for multixact-heavy workloads.
The practical standard
The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.