CPU, Memory, and I/O6 min read

Postgres High I/O: Diagnosing Where Disk Goes Wrong

High I/O is not a storage verdict. It is a workload question: which query, table, index, vacuum, checkpoint, or spill is reading and writing the bytes?

When Postgres is I/O-bound, the first temptation is to blame the disk.

Sometimes the disk is the problem. More often, the disk is faithfully doing work the database should not have asked for: reading bloated tables, spilling sorts, scanning without useful indexes, checkpointing too aggressively, or replaying a flood of WAL.

Buying faster storage can hide the pain for a while. It does not explain who is spending the I/O budget.

The real pain is not knowing who is reading

High I/O debugging is attribution. Which query fingerprints are reading? Which relations are growing? Which plans switched from index access to scans? Are temp files involved? Is autovacuum doing overdue cleanup?

Once you can name the source, the fix is usually specific. Until then, every storage change is a guess.

Step 1: Confirm I/O is the bottleneck

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY 1, 2
ORDER BY count(*) DESC;

If many queries are waiting on IO events (DataFileRead, DataFileWrite, WALSync), I/O is the constraint. If they are waiting on Lock, LWLock, or other things, the issue is elsewhere.

Step 2: Check buffer cache hit ratio

SELECT
  sum(heap_blks_read) AS reads,
  sum(heap_blks_hit) AS hits,
  round(100.0 * sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS hit_ratio_pct
FROM pg_statio_user_tables;

A healthy production database has > 95% buffer cache hit ratio. Below 90% means a meaningful fraction of reads are coming from disk.

If the hit ratio is low, two possibilities:

  • The data set is much larger than shared_buffers. Increase memory or reduce the working set.
  • A specific query is doing a sequential scan over a huge table, polluting the buffer cache. Find it and fix it.

Step 3: Find the I/O-heavy queries

SELECT
  round(total_exec_time::numeric, 0) AS total_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  shared_blks_read,
  shared_blks_hit,
  query
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;

The top queries by shared_blks_read are the ones causing the most physical reads. Each block is 8KB.

For each of the top queries, the diagnostic question:

  • Are they doing sequential scans where indexes would be cheaper? (Check with EXPLAIN.)
  • Are they reading bloated tables/indexes that should be vacuumed?
  • Are they touching cold data that fell out of cache?

Step 4: EXPLAIN the worst offender

EXPLAIN (ANALYZE, BUFFERS)
<the worst query from step 3>;

Look for:

  • Seq Scan over large tables: missing index, almost always.
  • Buffers: shared read=N — N is the number of pages read from disk for this query. If it is large, the query is doing a lot of I/O.
  • Buffers: shared hit=N — pages served from cache. The ratio of hit to read tells you whether this query is being served from cache.

The output usually points directly at the problem. A query reading 100,000 pages to return 10 rows wants an index.

Step 5: Check vacuum status

Bloated tables read more pages than they should — most of those pages are dead tuples that vacuum would have cleaned up.

SELECT
  schemaname, relname,
  n_live_tup, n_dead_tup,
  round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
  last_autovacuum
FROM pg_stat_user_tables
WHERE n_live_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;

Any table with dead_pct > 30% is a vacuum candidate. The query reads on those tables include scanning over dead tuples. A VACUUM (or pg_repack for severe cases) reduces the I/O footprint.

Step 6: Common patterns

The specific patterns that account for most I/O incidents:

1. A new query path with no matching index.

A new feature shipped a query that hits a column without an index. The query does a full sequential scan of a large table, which dominates the I/O.

Fix: add the index. CREATE INDEX CONCURRENTLY to avoid locks.

2. A query that should use an index does not because of a function or expression.

WHERE LOWER(email) = ? does not use an index on email. The query seq-scans because the planner cannot match the expression.

Fix: expression index (LOWER(email)), or generated column with regular index.

3. Bloated index.

The index is many times its functional size. Reads bring in dead pages.

Fix: REINDEX INDEX CONCURRENTLY.

4. A heavy report running during peak.

A single analytical query that scans an entire table runs during peak hours. The buffer cache fills with that table; everything else needs to read from disk.

Fix: schedule the report off-peak, or run it on a read replica.

5. Hot rows being read with cold rows.

A query like SELECT * FROM orders WHERE customer_id = 1 ORDER BY created_at DESC reads through the orders table looking for rows of customer 1. If those rows are scattered (random insertion), the query reads many pages to find a few rows.

Fix: CLUSTER the table on the relevant index. Or partition by customer_id. Or accept the I/O.

Step 7: Check for sequential scans more broadly

SELECT
  schemaname, relname,
  seq_scan,
  seq_tup_read,
  idx_scan,
  pg_size_pretty(pg_relation_size(format('%I.%I', schemaname, relname)::regclass)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;

Tables with seq_tup_read >> idx_scan are taking a lot of sequential scans. For large tables, this is the smoking gun for missing indexes.

Step 8: Verify the disk itself

If the queries are well-indexed and vacuum is happy and the buffer cache hit ratio is good, the bottleneck is the disk's actual capacity.

Check iostat or the cloud provider's I/O metrics. Symptoms of disk saturation:

  • I/O latency above 10-20ms (modern SSD should be sub-millisecond).
  • Queue depth consistently high.
  • Throughput at the disk's stated maximum.

For cloud-managed instances, the fix is usually a higher IOPS tier or a larger instance. For self-managed, possibly upgrading to NVMe or adding parallelism.

A pragmatic checklist

When I diagnose high I/O on someone else's Postgres:

  1. Confirm I/O wait, not lock wait or CPU.
  2. Check buffer cache hit ratio.
  3. Find top queries by shared_blks_read in pg_stat_statements.
  4. EXPLAIN the worst few. Look for sequential scans on big tables.
  5. Check vacuum/bloat status.
  6. Add missing indexes, vacuum bloated tables, schedule heavy reports off-peak.
  7. Only after all of the above, consider scaling up the instance.

In 9 out of 10 cases, the answer is one of: missing index, bloated table, or a heavy query that should not run during peak. The disk is rarely the actual bottleneck; the workload is what generated the I/O.