Indexes9 min read

PostgreSQL pg_repack: Removing Table and Index Bloat Without the Downtime of VACUUM FULL

VACUUM FULL would have locked a 200GB table for over an hour in the middle of the day. pg_repack rebuilt it online with only a momentary lock, and nobody noticed. Here's how it works and when to trust it.

A 200GB table had bloated to nearly half dead space. VACUUM FULL would reclaim it — and hold an ACCESS EXCLUSIVE lock on the table for over an hour while it did, meaning no reads, no writes, nothing, in the middle of the business day. That was not an option.

pg_repack rebuilt the same table online. It held a real lock for only a fraction of a second at the very end, while the application kept reading and writing the whole time. When it finished, the table was tight again and nobody had noticed.

pg_repack is the standard answer to 'I have bloat I cannot remove without downtime.' It is worth understanding how it pulls that off, because the magic has requirements.

Why VACUUM FULL is so disruptive

Regular VACUUM reclaims dead tuples for reuse but does not return space to the operating system or compact the table. VACUUM FULL does — by rewriting the entire table into a fresh file. To do that safely it takes an ACCESS EXCLUSIVE lock, which blocks every reader and writer for the full duration. On a large table that is an outage.

So you are stuck: regular VACUUM cannot shrink a badly bloated table, and the tool that can requires downtime you cannot afford.

How pg_repack rebuilds online

pg_repack creates a new copy of the table and builds it up while the original stays live. It installs triggers that capture every change made during the rebuild into a log table, copies the existing data into the new table, replays the captured changes to catch up, and then swaps the new table in for the old one.

The only moment it needs a strong lock is that final swap, which is brief. Everything expensive — the copy and the index builds — happens while the application is reading and writing normally.

-- Repack a single bloated table (and its indexes) online
pg_repack -d mydb -t orders

-- Just rebuild the indexes, online, leaving the heap alone
pg_repack -d mydb -t orders --only-indexes

The requirements that bite people

pg_repack needs a primary key or a non-null unique index on the table, because that is how it matches rows between the original and the copy while replaying changes. A table without one cannot be repacked.

It also needs roughly enough free disk for a second copy of the table and its indexes during the operation. If you are repacking because the disk is nearly full from bloat, you may not have room — which is a frustrating chicken-and-egg you want to discover before you start, not halfway through.

Operational caveats

Because it works with triggers and a final swap, an interrupted pg_repack run can leave behind temporary objects and triggers that you must clean up. Run it in a window where you can babysit it, and know how to drop its leftovers if it dies.

It is also write-amplifying while it runs — you are effectively rewriting the whole table plus replaying concurrent changes — so expect extra I/O and WAL during the operation. Schedule heavy repacks for quieter periods even though they do not block traffic.

  • VACUUM FULL reclaims space but locks the table for its full duration.
  • pg_repack rebuilds online; it needs a strong lock only for a brief final swap.
  • Requires a primary key or non-null unique index, and room for a second copy.
  • An interrupted run can leave triggers/temp objects to clean up.
  • It is I/O- and WAL-heavy while running — prefer quieter windows.

The practical standard

The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.