Write-heavy PostgreSQL incidents have a familiar shape. The app is fine, then latency jumps. Replication falls behind. Checkpoints get noisy. Disk write I/O climbs. Someone raises max_wal_size, someone else blames autovacuum, and the team still cannot explain which part of the write path saturated.
WAL and checkpoints are not background details. They are the durability pipeline. Every insert, update, delete, index change, and full-page write has to move through it. If you do not measure that pipeline, tuning becomes folklore.
The goal is not to eliminate checkpoints or minimize WAL at all costs. The goal is to make write pressure predictable enough that foreground queries do not pay surprise I/O bills.
The framework: separate WAL volume from checkpoint pain
WAL volume and checkpoint pain are related, but they are not the same problem.
- WAL volume: how much durability and replication data the workload generates.
- Checkpoint frequency: how often dirty data must be flushed to satisfy checkpoint policy.
- Checkpoint intensity: how much write I/O a checkpoint must complete.
- Replication pressure: whether standbys, archiving, or slots can keep up with WAL generation.
- Storage latency: whether fsync and writeback times leak into user-facing latency.
Raising one setting without knowing which of these pressures you are actually under can hide the symptom and preserve the cause.
Start with checkpoint visibility
The first question is whether checkpoints happen because of time or WAL volume. If checkpoints are usually requested before the timeout, the system is generating enough WAL to hit the size boundary early.
-- pg_stat_bgwriter, PostgreSQL 16 and earlier; in 17+ the checkpoint
-- counters moved to pg_stat_checkpointer.
SELECT
    checkpoints_timed,
    checkpoints_req,
    checkpoint_write_time,
    checkpoint_sync_time,
    buffers_checkpoint,
    buffers_backend,
    buffers_backend_fsync
FROM pg_stat_bgwriter;
A high requested-checkpoint count points toward max_wal_size being too small for the write rate. High buffers_backend or buffers_backend_fsync values suggest foreground sessions are doing flush work you wanted the background writer and checkpointer to absorb.
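The same counters read better as ratios when you are watching a trend. A minimal sketch against pg_stat_bgwriter as it exists through PostgreSQL 16:
-- Share of checkpoints triggered by WAL volume rather than the timer, and
-- share of buffer flushes done by foreground backends instead of background processes.
SELECT
    round(100.0 * checkpoints_req
          / nullif(checkpoints_timed + checkpoints_req, 0), 1) AS pct_checkpoints_requested,
    round(100.0 * buffers_backend
          / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) AS pct_writes_by_backends
FROM pg_stat_bgwriter;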
Read WAL generation as a rate
Total WAL bytes are less useful than WAL bytes over time and by workload event. Capture the current LSN before and after a known interval.
-- Capture the starting LSN.
SELECT pg_current_wal_lsn();
-- Run the workload window, then compare against the LSN captured above
-- ('0/7000000' stands in for that earlier value).
SELECT pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), '0/7000000')
) AS wal_generated;
For ongoing monitoring, store samples and calculate bytes per minute. Then correlate spikes with bulk imports, index builds, batch updates, vacuum activity, and application deploys.
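One way to do that from inside the database, a sketch that assumes a small helper table you create yourself (wal_samples is not built in):
CREATE TABLE wal_samples (
    sampled_at timestamptz NOT NULL DEFAULT now(),
    lsn        pg_lsn      NOT NULL DEFAULT pg_current_wal_lsn()
);
-- Insert a row from cron, a scheduler, or the monitoring agent.
INSERT INTO wal_samples DEFAULT VALUES;
-- WAL bytes per minute between consecutive samples.
SELECT
    sampled_at,
    pg_wal_lsn_diff(lsn, lag(lsn) OVER (ORDER BY sampled_at))
        / nullif(extract(epoch FROM sampled_at - lag(sampled_at) OVER (ORDER BY sampled_at)) / 60, 0)
        AS wal_bytes_per_minute
FROM wal_samples
ORDER BY sampled_at;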
WAL rate matters for primary storage, archive bandwidth, replication slots, standby replay, and recovery time. It is one number with many operational consequences.
max_wal_size is a pressure valve, not a speed button
Increasing max_wal_size lets PostgreSQL wait longer before forcing a checkpoint because of WAL volume. That can smooth I/O by reducing checkpoint frequency. It does not make the same amount of dirty data disappear.
checkpoint_timeout = '15min'
checkpoint_completion_target = 0.9
max_wal_size = '16GB'
Those values are not universal recommendations. The sizing should come from WAL generation rate, storage throughput, recovery objectives, and how much burst headroom the system needs.
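As a back-of-the-envelope check, with assumed numbers rather than a recommendation: suppose the measured rate is around 300 MB of WAL per minute.
-- ~300 MB/min over a 15-minute checkpoint_timeout is ~4.5 GB of WAL per timed
-- checkpoint cycle; max_wal_size needs to sit comfortably above that, or
-- size-triggered (requested) checkpoints will keep firing before the timer.
SELECT pg_size_pretty((300 * 15)::bigint * 1024 * 1024) AS wal_per_timed_cycle;
If requested checkpoints still dominate after sizing against the measured rate, the write rate itself is the thing to investigate, not the setting.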
The tradeoff is clear: larger WAL headroom can reduce checkpoint churn, but it can also increase crash recovery work and disk requirements. That is an SLO decision, not just a database tweak.
wal_compression helps specific workloads
Full-page writes can make WAL volume jump after checkpoints, especially for update-heavy workloads. wal_compression can reduce WAL size when changed pages compress well.
wal_compression = on   # 'on' means pglz; PostgreSQL 15 and later also accept lz4 or zstd
The cost moves to CPU. That is usually acceptable when storage or replication bandwidth is the bottleneck, but it should be measured. Compression that saves disk but saturates CPU is not a win.
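To see whether full-page images actually dominate before and after the change, pg_stat_wal (available since PostgreSQL 14) breaks the volume down:
SELECT
    wal_records,
    wal_fpi,
    wal_bytes,
    stats_reset
FROM pg_stat_wal;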
Replication lag is part of WAL tuning
A primary can look healthy while standbys fall behind because WAL generation outpaces network transfer, standby disk writes, or replay. Replication slots then retain that WAL and turn lag into a disk-space incident on the primary.
SELECT
    application_name,
    state,
    sync_state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_bytes_lag,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;
Split lag into write, flush, and replay. Network pressure, standby disk pressure, and replay conflicts have different fixes. Treating all lag as one number slows the incident response.
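Slot retention deserves its own check, because an inactive slot quietly pins WAL on the primary. A sketch against pg_replication_slots; wal_status and safe_wal_size exist from PostgreSQL 13 on:
SELECT
    slot_name,
    active,
    wal_status,
    safe_wal_size,
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
    ) AS retained_wal
FROM pg_replication_slots;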
Batch writes need shape, not hope
Large updates, deletes, backfills, and imports are where write-path tuning gets tested. The operational pattern I trust is explicit batching with pauses, progress metrics, and a rollback plan.
-- One batch; rerun until the UPDATE reports zero rows, pausing between batches.
WITH batch AS (
    SELECT id
    FROM events
    WHERE processed_at IS NULL
    ORDER BY id
    LIMIT 5000
)
UPDATE events e
SET processed_at = now()
FROM batch
WHERE e.id = batch.id;
Small batches reduce lock time, checkpoint bursts, replication lag, and rollback blast radius. They also make it easier to stop when lag crosses a threshold.
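A driver for that loop can live in plain SQL. A sketch, assuming PostgreSQL 11 or newer so the DO block can commit between batches, and an arbitrary replay-lag threshold of 512 MB:
DO $$
DECLARE
    rows_updated bigint;
    max_lag      numeric;
BEGIN
    LOOP
        WITH batch AS (
            SELECT id
            FROM events
            WHERE processed_at IS NULL
            ORDER BY id
            LIMIT 5000
        )
        UPDATE events e
        SET processed_at = now()
        FROM batch
        WHERE e.id = batch.id;

        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;

        COMMIT;  -- release locks and let WAL flush settle between batches

        -- Back off while any standby is more than ~512 MB behind on replay.
        LOOP
            SELECT coalesce(max(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)), 0)
              INTO max_lag
              FROM pg_stat_replication;
            EXIT WHEN max_lag < 512 * 1024 * 1024;
            PERFORM pg_sleep(5);
        END LOOP;
    END LOOP;
END $$;
In practice the same loop often lives in the application or a job runner, where it is easier to add progress metrics and a kill switch, but the shape is identical: bounded batch, commit, check lag, pause.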
The runbook I want
- Current WAL generation rate by minute and by major job.
- Timed versus requested checkpoint trend.
- Checkpoint write and sync time trend.
- Backend writes and backend fsync count.
- Replication replay bytes lag and slot retained bytes.
- Archive command failures and archive delay.
- Storage write latency from the host or cloud provider.
Without these signals, every setting change is a guess. With them, WAL and checkpoint tuning becomes a capacity-management exercise.
The pragmatic default
Increase WAL headroom only after you know checkpoints are too frequent for the write rate. Tune checkpoint completion to spread I/O. Use WAL compression when WAL volume is the bottleneck and CPU has room. Shape batch jobs so they respect replication and storage limits.
The mistake is treating WAL as a log directory that occasionally gets big. WAL is the write path. If it saturates, the product feels it.