The delete statement looked harmless: remove 80,000 old rows from a parent table. It ran for 47 minutes, blocked a deployment, filled the WAL disk faster than the replica could replay it, and left the table slower than before.
The delete was not slow because PostgreSQL forgot how to remove rows. It was slow because every row had to check child tables, fire triggers, maintain indexes, generate WAL, and leave dead tuples for vacuum to clean later.
The framework: find what each row pays for
When a delete is slow, I split the cost into five buckets:
- Foreign key checks and cascades.
- Trigger work.
- Index maintenance.
- WAL generation and replica replay.
- Vacuum work after the delete commits.
The fix depends on which bucket is dominant. Batching a delete that is slow because of a missing child-table index helps less than adding the missing index. Adding an index does not solve a trigger that calls an expensive function for every row.
Start with foreign keys
The most common slow-delete incident is a parent delete with child tables that reference it. PostgreSQL must prove the delete does not violate the foreign key or perform the cascade. If the child-side foreign key column is not indexed, each parent row can force expensive scans.
SELECT
conrelid::regclass AS child_table,
confrelid::regclass AS parent_table,
conname,
pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'f'
AND confrelid = 'accounts'::regclass;
Then check whether the child columns in those constraints have useful indexes. The index belongs on the referencing side, not just the parent primary key.
Use EXPLAIN carefully
You can inspect a delete plan, but EXPLAIN ANALYZE DELETE actually runs the delete. Do that only inside a transaction you can roll back, and not casually on production data.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM events
WHERE account_id = 42
AND created_at < now() - interval '180 days';
ROLLBACK;
Look for rows removed, trigger timing, buffer usage, and unexpected scans. Trigger timing in the plan is often the clue people miss.
Batching is a pressure valve, not a cure
PostgreSQL does not have a direct DELETE ... LIMIT clause, but you can batch through a CTE:
WITH doomed AS (
SELECT id
FROM events
WHERE created_at < now() - interval '180 days'
ORDER BY id
LIMIT 5000
)
DELETE FROM events e
USING doomed
WHERE e.id = doomed.id;
Batching reduces lock duration, WAL bursts, replica lag, and vacuum shock. It does not make each row cheaper. If the batch still runs slowly, go back to foreign keys, triggers, and indexes.
Large deletes leave cleanup behind
A committed delete does not physically remove tuples from table files immediately. PostgreSQL leaves dead tuples behind until vacuum can reclaim them. That means the incident can continue after the delete finishes: table scans touch more pages, indexes carry dead entries, and autovacuum has to catch up.
SELECT
relname,
n_live_tup,
n_dead_tup,
last_autovacuum,
vacuum_count,
autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'events';
If you delete a large percentage of a table, plan the cleanup. Sometimes the right answer is not a delete at all.
When TRUNCATE or partition drops are better
If you are removing everything from a table, TRUNCATE is usually the tool to evaluate. It is faster because it does not delete row by row, but it has different locking and transactional implications. Do not swap it in blindly for application deletes.
If you routinely delete old time-based data, partitioning can turn retention from "delete millions of rows" into "detach or drop an old partition." That is a data lifecycle design, not a query optimization trick.
Watch replicas and WAL
Deletes produce WAL. A delete that is fine on the primary can create unacceptable replica lag, especially when cascading work expands the number of touched rows.
SELECT
application_name,
state,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_bytes
FROM pg_stat_replication;
For large cleanup jobs, I care about the whole system: primary latency, WAL volume, replica replay, autovacuum progress, and how long locks are held.
The rollout I trust
- Find child foreign keys and verify child-side indexes.
- Inspect trigger cost.
- Estimate row count and WAL impact.
- Run a small batch and measure duration, buffers, and replica lag.
- Increase batch size only while the system stays boring.
- Schedule vacuum or partition cleanup intentionally.
The mistake is treating deletes as the opposite of inserts. Inserts add work to the table. Deletes add work to the table, the indexes, the replicas, and the cleanup system. Production deletes deserve a plan.