Nobody gets excited about VACUUM. It's not glamorous. It doesn't show up in conference talks about exciting new features. But I've seen more production outages caused by neglected VACUUM configuration than by almost anything else. One team I worked with had a table that was 8x its logical size because autovacuum was essentially disabled via a badly copy-pasted config. Queries that should have taken 50ms were taking 45 seconds.
Let me explain why this happens.
MVCC: The Root Cause of Dead Tuples
PostgreSQL uses Multi-Version Concurrency Control (MVCC) to let readers and writers work simultaneously without blocking each other. The way it works is clever — when you UPDATE a row, PostgreSQL doesn't modify it in place. It writes a new version of the row and marks the old version as dead.
Same with DELETE. The row isn't removed. It's marked dead and stays in the page until VACUUM comes along to clean it up.
This is what makes PostgreSQL's concurrency so good. But it also means that a table under heavy write load accumulates dead tuples continuously. On a table processing 10,000 updates per second, you're generating 10,000 dead tuples per second. VACUUM has to keep up with that.
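You can watch this happen with the ctid and xmin system columns. A throwaway sketch (mvcc_demo is just a scratch table, not anything standard):
-- Throwaway demo: an UPDATE writes a new row version instead of modifying in place
CREATE TABLE mvcc_demo (id int PRIMARY KEY, val text);
INSERT INTO mvcc_demo VALUES (1, 'first');
SELECT ctid, xmin, val FROM mvcc_demo; -- likely (0,1)
UPDATE mvcc_demo SET val = 'second' WHERE id = 1;
SELECT ctid, xmin, val FROM mvcc_demo; -- a new location, e.g. (0,2), with a newer xmin
-- the old version at (0,1) is now a dead tuple waiting for VACUUM
DROP TABLE mvcc_demo;
The old version is still physically sitting in the page; VACUUM is what eventually reclaims it.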
Check your dead tuple accumulation:
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
If you see tables with 50%+ dead tuples that haven't been vacuumed in days, that's a problem. You're effectively reading twice the data you need to.
VACUUM vs VACUUM FULL: Know the Difference
This trips people up all the time.
VACUUM (regular): Marks dead tuple space as reusable within the same table. It does NOT return space to the OS (apart from empty pages at the very end of the table, which it can truncate). The table file stays roughly the same size, but future inserts can reuse that space. This runs concurrently with reads and writes — it doesn't lock the table.
VACUUM FULL: Rewrites the entire table into a new file, returning space to the OS. The table is locked for the entire duration. On a 100GB table, that could be 30+ minutes of downtime.
My take: VACUUM FULL is almost never the right answer in production. The only time I use it is on tables where bloat is truly pathological and the table will never shrink through normal operations. Even then, I prefer pg_repack because it rebuilds the table online, taking only brief locks at the start and end.
The normal VACUUM is what autovacuum runs, and it's what you should tune properly rather than reaching for VACUUM FULL.
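For reference, here's what both look like on an illustrative orders table:
-- Regular VACUUM: runs alongside normal traffic, marks dead space reusable
VACUUM (VERBOSE, ANALYZE) orders;
-- VACUUM FULL: rewrites the table under an ACCESS EXCLUSIVE lock; avoid in production
-- VACUUM FULL orders;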
Autovacuum: How It Decides When to Run
Autovacuum triggers on a table when:
dead tuples > autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor × n_live_tup)
With the defaults:
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
That means vacuum fires once dead tuples exceed 50 plus 20% of the table's live rows.
For a small table with 1,000 rows: triggers at 250 dead tuples. Fine. For a table with 100 million rows: triggers at 20,000,050 dead tuples. That's 20 million dead tuples before anything happens. On a busy table, that's a lot of bloat.
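You can compute the approximate trigger point for your own tables. This sketch assumes the default settings and ignores any per-table overrides:
-- Approximate autovacuum trigger point per table, assuming default settings
SELECT
relname,
n_live_tup,
n_dead_tup,
50 + (0.2 * n_live_tup)::bigint AS approx_trigger_point
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;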
The defaults are fine for small databases. They're wrong for anything with tables over a few million rows. I know the docs say the default is fine. The docs are optimistic.
Tuning autovacuum properly
For high-churn large tables, set per-table storage parameters:
-- Aggressive autovacuum for a high-churn table
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- trigger at 1% dead tuples
autovacuum_vacuum_threshold = 1000,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2 -- minimal throttling (2ms is the default on PG 12+; older versions default to 20ms)
);
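To see which tables already carry overrides like this, check pg_class.reloptions:
-- Tables with per-table storage parameter overrides
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL
AND relkind = 'r';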
And make sure you have enough autovacuum workers globally:
# postgresql.conf
autovacuum_max_workers = 6 # default is 3, increase for busy databases
autovacuum_vacuum_cost_delay = 2 # ms to sleep each time the cost limit is hit
autovacuum_vacuum_cost_limit = 400 # work allowed between naps (effective default 200)
The cost delay and cost limit control how aggressively autovacuum runs. The defaults are conservative — designed to not impact foreground queries. But on databases where bloat is a problem, you often need to be more aggressive.
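A quick way to confirm what your instance is actually running with:
-- Current autovacuum-related settings and their units
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;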
Monitoring Autovacuum Health
Don't just check that autovacuum is running. Check that it's keeping up.
-- Tables where autovacuum is falling behind
SELECT
schemaname,
relname,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS bloat_pct,
last_autovacuum,
now() - last_autovacuum AS time_since_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- Is autovacuum currently running?
SELECT pid, relid::regclass AS table, phase, heap_blks_total,
heap_blks_scanned, heap_blks_vacuumed
FROM pg_stat_progress_vacuum;
Also look at the PostgreSQL logs with log_autovacuum_min_duration:
log_autovacuum_min_duration = 1000 # log autovacuum runs taking > 1 second
This is invaluable. You'll see exactly which tables autovacuum is spending time on and how much it's doing.
XID Wraparound: The Thing That Will Actually Kill Your Database
Here's the scary one. PostgreSQL uses 32-bit transaction IDs (XIDs), and because XID comparison wraps around, only about 2 billion transactions can be "in the past" at any given moment. Let that horizon run out and PostgreSQL stops accepting new transactions to prevent data corruption. Not degrades. Shuts down.
This is called XID wraparound. It's happened to Mailchimp, Sentry, and a bunch of companies who learned about it the hard way.
PostgreSQL prevents this by running VACUUM on tables approaching the wraparound limit. But if autovacuum can't keep up — because it's too slow, or the table is locked too often, or autovacuum is disabled — you can get into trouble.
Check your wraparound risk:
SELECT
datname,
age(datfrozenxid) AS xid_age,
2147483648 - age(datfrozenxid) AS xids_remaining,
round((age(datfrozenxid)::numeric / 2147483648) * 100, 1) AS pct_toward_wraparound
FROM pg_database
ORDER BY xid_age DESC;
If pct_toward_wraparound is above 50%, take notice. Above 80%, drop what you're doing and figure out why VACUUM isn't keeping up. Above 95%... you have a very bad day ahead.
PostgreSQL starts emitting warnings when about 40 million transactions remain, and autovacuum_freeze_max_age determines when a forced anti-wraparound autovacuum kicks in, even on tables autovacuum would otherwise skip:
autovacuum_freeze_max_age = 200000000 # default; when to force freeze
vacuum_freeze_min_age = 50000000 # min age before freezing
Don't mess with these unless you know exactly what you're doing.
Per-table XID monitoring
SELECT
st.schemaname,
st.relname,
age(c.relfrozenxid) AS table_xid_age,
c.relfrozenxid
FROM pg_stat_user_tables st
JOIN pg_class c ON st.relid = c.oid
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
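If one table dominates that list, you don't have to wait for the anti-wraparound autovacuum. A manual freeze works; the table name here is illustrative:
-- Aggressively freeze old tuples and advance relfrozenxid
VACUUM (FREEZE, VERBOSE) orders;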
Table Bloat Estimation
The dead tuple count in pg_stat_user_tables is an approximation. For a more accurate bloat estimate:
-- Simple bloat check using pg_relation_size vs expected size
SELECT
relname,
pg_size_pretty(pg_relation_size(relid)) AS actual_size,
pg_size_pretty(n_live_tup * 200) AS estimated_live_size, -- assumes ~200 bytes/row
round(pg_relation_size(relid)::numeric /
NULLIF(n_live_tup * 200, 0), 1) AS bloat_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
ORDER BY pg_relation_size(relid) DESC
LIMIT 20;
For more accurate bloat, the pgstattuple extension is your friend:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('your_table_name');
This actually reads the table to count live and dead tuples properly. It's slow on large tables, but accurate.
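If your PostgreSQL is 9.5 or newer, the same extension also ships pgstattuple_approx(), which skips all-visible pages and estimates the rest, making it much cheaper on big tables at the cost of approximate numbers:
-- Sampling-based estimate from the same extension; cheap even on large tables
SELECT dead_tuple_percent, approx_free_percent
FROM pgstattuple_approx('your_table_name');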
Practical Approach
What I actually do when a client has VACUUM problems:
- Check pg_stat_user_tables for tables with high dead tuple counts
- Look at last_autovacuum — is it running at all? Is it running frequently enough?
- Check the logs for autovacuum messages
- Look at XID age across databases
- Tune per-table autovacuum parameters for the worst offenders
- Consider increasing autovacuum_max_workers globally
MonPG surfaces all of this automatically — dead tuple trending, vacuum lag, XID age — so you don't have to remember to query it manually. But even without a monitoring tool, the queries above will tell you everything you need to know.
The worst thing you can do is ignore VACUUM until you have a performance problem. By then, you might be looking at hours of VACUUM FULL downtime or a wraparound emergency. Prevention is genuinely easier than the cure.