8 min read

Reading EXPLAIN ANALYZE Output Without Drowning

EXPLAIN ANALYZE produces a wall of indented text and two sets of numbers per node. Here is the actual reading order I use, and what each number means in practice.

EXPLAIN ANALYZE is the most useful Postgres tool for fixing slow queries and the most intimidating to read at first. An indented tree, two sets of numbers per node, plan node types that only loosely match the documentation. The first time you stare at one for a real query, the temptation to give up and just try a few index ideas is strong.

Here is the reading order that actually works for me, and the patterns I look for.

The command itself

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 123;

Four options worth turning on:

  • ANALYZE — runs the query for real and shows actual times. Without this, you only see estimates.
  • BUFFERS — shows pages read from cache vs disk. Critical for diagnosing I/O problems.
  • VERBOSE — shows column-level information. Useful for complex queries.
  • FORMAT TEXT — the default; sometimes JSON is easier for tooling.

I keep EXPLAIN (ANALYZE, BUFFERS) as muscle memory. The other options I add when needed.

Note: EXPLAIN ANALYZE actually runs the query. For a SELECT this is fine. For an UPDATE/DELETE/INSERT, wrap it in a transaction you roll back:

BEGIN;
EXPLAIN (ANALYZE, BUFFERS) UPDATE orders SET ... WHERE ...;
ROLLBACK;

Read bottom up, but start with the top

Plans are trees. Each node feeds rows up to its parent. The leaves do scans (Seq Scan, Index Scan), the middle nodes do joins (Hash Join, Nested Loop, Merge Join), the top is what the query returns.

The execution order is roughly bottom-up: leaves run first, middle nodes consume their output, the top emits rows to the client. But for diagnosis, I read top-down to find which step costs the most, then dive into its children.
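For a two-table join, the tree shape might look like this (illustrative, not from a real run):

Hash Join  (actual time=1.2..45.0 rows=5000 loops=1)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders o             <- leaf: runs first, feeds rows up
  ->  Hash
        ->  Seq Scan on customers c    <- leaf: built into the hash table

The two scans are the leaves; the Hash Join consumes both and emits the joined rows to the client.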

The numbers that matter

Each node shows two sets of numbers:

Index Scan using orders_customer_id_idx on orders
  (cost=0.43..120.50 rows=100 width=64)
  (actual time=0.020..0.150 rows=87 loops=1)

The first set of parentheses is the planner's estimate. The second is reality.

cost=A..B — A is the startup cost (work done before the first row can be emitted), B is the total cost. Cost units are abstract; the planner uses them for relative comparison, not wall-clock time.

rows=N — estimated rows returned by this node.

actual time=A..B — measured time in milliseconds. A is when the first row appeared, B is when the last row finished. When loops is greater than 1, actual time and actual rows are per-loop averages; multiply by loops for the total.

actual rows=N — actual rows returned (per loop).

loops=N — how many times this node executed (matters for the inner side of nested loop joins).

The most important comparison: estimated rows vs actual rows. If they differ by 10x or more, the planner made a bad call based on bad statistics.

Pattern 1: a Seq Scan that should be an Index Scan

Seq Scan on orders (cost=0.00..15000.00 rows=10000 width=64)
                   (actual time=0.020..2500.150 rows=87 loops=1)
  Filter: (customer_id = 123)
  Rows Removed by Filter: 999913

The scan read a million rows and discarded all but 87. This is a missing index. The fix:

CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);

Verify by re-running EXPLAIN: should now be Index Scan using orders_customer_id_idx.
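On a table like the one above, the re-run should produce something in this shape (numbers illustrative):

Index Scan using orders_customer_id_idx on orders
  (cost=0.43..120.50 rows=100 width=64)
  (actual time=0.020..0.150 rows=87 loops=1)
  Index Cond: (customer_id = 123)

Note the Filter and Rows Removed by Filter lines are gone: the index condition does the selection, so no rows are read and discarded.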

Pattern 2: a wrong-direction nested loop

Nested Loop  (rows=100 actual rows=10000 loops=1)
  -> Index Scan on customers (rows=10 actual rows=100 loops=1)
  -> Index Scan on orders (rows=10 actual rows=100 loops=100)
         Index Cond: (orders.customer_id = customers.id)

The planner expected 100 total rows; it got 10,000. A nested loop is the wrong strategy for that volume; a hash or merge join would have been cheaper. The root cause is the 10x underestimate on customers, which made the planner think the inner index scan would run only 10 times instead of 100.

The fix:

  1. ANALYZE customers; — refresh statistics.
  2. If still wrong, increase statistics target: ALTER TABLE customers ALTER COLUMN status SET STATISTICS 1000; ANALYZE customers;
  3. If still wrong, look at extended statistics for correlated columns.
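Extended statistics are created with CREATE STATISTICS. For example, if city and zip_code are correlated columns (hypothetical names) that the planner keeps misestimating together:

CREATE STATISTICS customers_city_zip (dependencies)
  ON city, zip_code FROM customers;
ANALYZE customers;

The dependencies statistics kind tells the planner that knowing one column constrains the other, so it stops multiplying their selectivities as if they were independent.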

Pattern 3: BUFFERS reveals the real problem

Index Scan on orders
  (actual time=0.020..1500.150 rows=87 loops=1)
  Buffers: shared read=3000

3000 pages read from disk (about 24 MB at the default 8 kB page size) to return 87 rows. The query is doing heavy I/O even with the index. Likely cause: the index is bloated, or the matching rows are scattered across many heap pages.

Fix options:

  • Reindex if the index is bloated.
  • Cluster the table on the index if reads are dominant.
  • Add a covering index (INCLUDE clause) that contains all needed columns:
CREATE INDEX CONCURRENTLY orders_customer_with_total
  ON orders (customer_id) INCLUDE (total_cents, status);

Now the query can answer entirely from the index without touching the heap. Buffers: shared read should drop dramatically.
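If it works, the node type changes too. The verified plan should look roughly like this (illustrative):

Index Only Scan using orders_customer_with_total on orders
  (actual time=0.015..0.090 rows=87 loops=1)
  Heap Fetches: 0
  Buffers: shared hit=4

Heap Fetches: 0 is the confirmation; a nonzero value means the visibility map is stale and a VACUUM on the table may help.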

Pattern 4: a sort that should be an index scan

Sort  (actual time=2000.000..2500.000 rows=87 loops=1)
  Sort Key: created_at DESC
  Sort Method: external merge  Disk: 50MB
  -> Seq Scan on orders

The query needs results sorted by created_at DESC. The plan reads everything, then sorts. "external merge" means the sort spilled to disk because work_mem was insufficient.

Fix: add an index matching the sort order (assuming the query also filters by customer_id, as in the running example):

CREATE INDEX CONCURRENTLY orders_customer_created
  ON orders (customer_id, created_at DESC);

Now the query can scan the index in pre-sorted order, no sort step needed.
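If the sort genuinely cannot be replaced by an index (say, an ad-hoc ORDER BY), the other lever is memory. The value below is illustrative; size it to the spill reported next to Sort Method:

SET work_mem = '64MB';  -- per sort operation, per session

With enough work_mem, the Sort Method line changes from external merge to quicksort and the disk spill disappears.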

Pattern 5: a subquery executed many times

Seq Scan on big_table  (actual rows=10000 loops=1)
  SubPlan 1
    ->  Aggregate  (loops=10000)
          ->  Index Scan on lookup_table  (loops=10000)

The subquery ran 10,000 times, once per row of the outer scan. Even if each execution is fast, the cumulative cost is huge.

Fix: rewrite the correlated subquery as a join so the planner can choose a strategy. One option is a lateral join:

-- Bad: correlated subquery
SELECT *,
  (SELECT count(*) FROM orders WHERE customer_id = c.id) AS order_count
FROM customers c;

-- Better: lateral join
SELECT c.*, o.order_count
FROM customers c
LEFT JOIN LATERAL (
  SELECT count(*) AS order_count
  FROM orders
  WHERE customer_id = c.id
) o ON true;

The planner has more options when the structure is a join.
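When the aggregate is needed for every row anyway, another rewrite removes per-row execution entirely: aggregate once, then join the result (same hypothetical schema as above):

-- Also better: aggregate once, then join
SELECT c.*, coalesce(o.order_count, 0) AS order_count
FROM customers c
LEFT JOIN (
  SELECT customer_id, count(*) AS order_count
  FROM orders
  GROUP BY customer_id
) o ON o.customer_id = c.id;

Here orders is scanned once, not once per customer.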

What I do when nothing obvious is wrong

If row estimates match reality, indexes are used, and the query is still slow, the bottleneck is likely:

  • Sheer data volume — the query has to do this much work no matter what.
  • I/O capacity — the disks are saturated.
  • Lock contention — EXPLAIN ANALYZE shows what the query did, not what it waited for. Check pg_stat_activity and lock waits during the slow window.
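A quick look at who is waiting, run while the query is slow (pg_stat_activity is a standard system view; the column list below is a subset):

SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle';

wait_event_type = 'Lock' on your query's pid points at contention rather than a plan problem.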

For data volume, the answer is sometimes no longer answering the same query. Materialize the result, paginate differently, push work to a read replica.

For I/O, the answer is usually hardware or workload shaping; there is little to do on the query side.

A reading checklist

When EXPLAIN ANALYZE lands on my desk:

  1. Look at the top node's total actual time (or the Execution Time line at the end). Is it within an order of magnitude of expected?
  2. Find the most expensive node by actual time.
  3. Compare rows (estimate) to actual rows. Big gap = statistics problem.
  4. Look at Seq Scans — are they on small tables (fine) or big tables (probable index problem)?
  5. Check Buffers: shared read — is the query doing unexpected I/O?
  6. Check loops on inner sides of nested loops — is something running 1000s of times?

Ninety percent of the time, five minutes per query is enough to find the issue. The remaining 10% is where the interesting Postgres conversations happen.