Bloat is the gap between what a table actually contains (live rows) and what it occupies on disk. A table with 1GB of live data and 4GB on disk is 75% bloated. The disk that should be holding indexes or other tables is holding tombstones from rows deleted months ago.
Most teams I have worked with measure bloat with one of the two queries that come up first on Google. Both are estimates, and both are off by 10-30% under typical workloads. Here is what actually works.
Why estimation queries are unreliable
The popular bloat queries (the "check_postgres_bloat" SQL, the bloat queries on PostgreSQL Wiki) work by estimating row size from the average row width and comparing to the table's physical size. The math is approximately:
estimated_live_size = avg_row_width * live_tuple_count
bloat_pct = (physical_size - estimated_live_size) / physical_size
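A minimal sketch of that estimate, built on pg_stat_user_tables and pg_stats (a simplification of the wiki queries, not a replacement; it ignores tuple headers and TOAST entirely, which is exactly where the error comes from):
SELECT t.schemaname,
       t.relname,
       pg_relation_size(t.relid) AS physical_bytes,
       t.n_live_tup * w.avg_row_width AS estimated_live_bytes,
       round(100.0 * (pg_relation_size(t.relid) - t.n_live_tup * w.avg_row_width)
             / NULLIF(pg_relation_size(t.relid), 0), 1) AS est_bloat_pct
FROM pg_stat_user_tables t
CROSS JOIN LATERAL (
    -- per-row width from planner statistics; stale after schema or data changes
    SELECT sum(avg_width) AS avg_row_width
    FROM pg_stats
    WHERE schemaname = t.schemaname AND tablename = t.relname
) w
ORDER BY est_bloat_pct DESC NULLS LAST;
Small or freshly loaded tables can even come out negative here, which is a good reminder of how rough the math is.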
This is wrong for several reasons:
- Row width estimates are based on pg_stats, which can be stale.
- Tuples have header and per-attribute overhead that varies.
- Free space within pages from old deletions does not always match what the estimator assumes.
- TOAST out-of-line storage skews the estimates.
The error is usually within 30%, which is fine for finding the worst tables, but it is large enough that a call like "is this table worth pg_repack-ing" sometimes goes the wrong way.
The reliable answer: pgstattuple
pgstattuple is an extension that scans the table and reports actual statistics:
CREATE EXTENSION pgstattuple;
SELECT *
FROM pgstattuple('orders');
The relevant columns:
- table_len — total physical size of the table.
- tuple_count, tuple_len — count and total bytes of live tuples.
- tuple_percent — percentage of the table that is live data.
- dead_tuple_count, dead_tuple_len — count and total bytes of dead (deleted or updated-away) tuples.
- free_space — free space within pages.
Real bloat is (dead_tuple_len + free_space) / table_len.
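Pushed into a query, using the orders table from above (note that some free space is normal: fillfactor and ordinary page slack mean 0% is not the target):
SELECT pg_size_pretty(table_len) AS table_size,
       round(tuple_percent::numeric, 1) AS live_pct,
       round(100.0 * (dead_tuple_len + free_space) / table_len, 1) AS bloat_pct
FROM pgstattuple('orders');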
The numbers are exact. The cost is that pgstattuple() reads every page of the table; on a 100GB table that is significant I/O. For very large tables, the approximate variant is the better tool:
-- pgstattuple_approx skips pages the visibility map marks all-visible
SELECT *
FROM pgstattuple_approx('orders');
pgstattuple_approx ships in the same pgstattuple extension (PostgreSQL 9.5+), not a separate one. It reports exact dead-tuple numbers but estimates live tuples and free space from the visibility map, which is what makes it so much cheaper on large tables.
Index bloat is its own measurement
For indexes, pgstatindex gives you bloat information:
SELECT *
FROM pgstatindex('orders_customer_id_idx');
The key column is avg_leaf_density. A freshly built B-tree index sits around 90% (the default fillfactor), and a healthy one stays above 80%. Below 50% the index is half-empty and a candidate for REINDEX.
Index bloat develops differently from table bloat. Tables bloat from UPDATE and DELETE; indexes bloat from inserts in random key order (UUIDv4 primary keys, for example) and from heavy UPDATE activity, since non-HOT updates insert new index entries and drive leaf page splits that leave pages half-empty.
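To sweep a whole schema instead of one index at a time, something like this works. It assumes the public schema and that every index there is a valid B-tree, and it reads every page of every index, so it costs the same kind of I/O as the table-level scan:
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size,
       s.avg_leaf_density
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam AND am.amname = 'btree'
CROSS JOIN LATERAL pgstatindex(c.oid::regclass) s
WHERE c.relkind = 'i'
  AND c.relnamespace = 'public'::regnamespace
ORDER BY s.avg_leaf_density;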
A practical bloat audit
The sequence I run when I take over a database:
-- Extension installed?
SELECT * FROM pg_extension WHERE extname = 'pgstattuple';
-- Top 20 tables by total size
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS total_size,
pg_size_pretty(pg_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS table_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 20;
Then for the top candidates, run pgstattuple (or pgstattuple_approx if they are very large) and look at:
- tuple_percent below 60% — significant bloat.
- dead_tuple_percent above 20% — vacuum is not keeping up.
- free_percent very high — pages have lots of empty space.
These signals point at tables that benefit from a vacuum or repack.
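With pgstattuple_approx the column names shift slightly (approx_tuple_percent, approx_free_percent); a quick check of one candidate against those thresholds might look like:
SELECT approx_tuple_percent AS live_pct,  -- worry below 60
       dead_tuple_percent,                -- worry above 20
       approx_free_percent                -- worry when very high
FROM pgstattuple_approx('orders');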
What to do once you know
For table bloat:
- Light bloat (under 30%): not worth doing anything. Vacuum is keeping up.
- Moderate bloat (30-60%): tune autovacuum to be more aggressive on this specific table — ALTER TABLE foo SET (autovacuum_vacuum_scale_factor = 0.05), spelled out after this list.
- Heavy bloat (over 60%): pg_repack rewrites the table online, holding only brief locks at the start and end. Or accept the bloat if retention will rewrite the table soon.
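The per-table tuning spelled out, with illustrative numbers (foo is a placeholder; the right values depend on the table's churn):
ALTER TABLE foo SET (
    autovacuum_vacuum_scale_factor = 0.05, -- vacuum after ~5% of rows change (default 0.2)
    autovacuum_vacuum_threshold    = 1000  -- plus a fixed floor of changed rows (default 50)
);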
For index bloat:
- Light bloat: ignore.
- Heavy bloat (low leaf density): REINDEX INDEX CONCURRENTLY (PostgreSQL 12+), shown after this list.
- Bloat from random UUIDv4 inserts on a hot table: the structural fix is to switch to UUIDv7 (time-ordered) or to change the index strategy.
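The rebuild itself, on the index from earlier:
-- PostgreSQL 12+; holds only brief locks, writes continue during the rebuild
REINDEX INDEX CONCURRENTLY orders_customer_id_idx;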
Note that for very high-write tables, heavy bloat is sometimes unavoidable day to day but settles once retention deletes old rows and vacuum recycles the space for new ones. Measure the trend, not just the snapshot.
A surprise from real operations
One of the most counterintuitive bloat sources I have seen: a table with default_statistics_target set very high. The autovacuum process spent so much time analyzing that vacuum runs got starved, and bloat accumulated despite normal-looking workload. Lowering the statistics target to 100 (default) resolved both the analyze cost and the bloat.
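For reference, the knobs involved (ALTER SYSTEM needs superuser; 100 is the shipped default):
SHOW default_statistics_target;
ALTER SYSTEM SET default_statistics_target = 100;
SELECT pg_reload_conf(); -- new ANALYZE runs pick it up without a restart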
This is rare but worth knowing. Bloat is a vacuum-keeping-up problem; anything that prevents vacuum from running enough produces bloat as a downstream symptom.
What I keep monitoring
For any production database, the bloat metrics I watch:
- Top 10 tables by bloat percentage (from pgstattuple_approx if available).
- Top 10 indexes by leaf density (from pgstatindex).
- Trend of bloat over weeks — is it growing or stable?
- autovacuum_count per table (from pg_stat_user_tables) — is vacuum running often enough?
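The vacuum-activity half of that list comes straight from pg_stat_user_tables; a sketch:
SELECT relname,
       last_autovacuum,
       autovacuum_count,
       n_dead_tup,
       n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;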
When bloat is climbing on tables vacuum cannot keep up with, the usual answer is autovacuum tuning (lower scale factors, more workers, lower cost delay). When bloat is stable but high, the answer is pg_repack or accepting it.
The accurate measurement is the basis for any of these decisions. The estimation queries get you 70% of the way there; pgstattuple closes the gap.