
PostgreSQL Plan Autopsy

Turn a raw EXPLAIN plan into an incident narrative: what the planner believed, where execution diverged, which node created the blast radius, and what evidence to collect before changing production.

Review Question

Which part of this plan is actually responsible for the incident, and what evidence would change the next production decision?

At A Glance

Input: EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
Decision: Fix query shape, statistics, index path, or memory pressure
Output: Findings, proof SQL, and risk-ranked plan signals

What you get

  • Planner estimate drift scored against actual row flow
  • Loop, buffer, sort, hash, and temporary I/O signals grouped by risk
  • A production evidence pack for pg_stat_statements, pg_stats, and validation EXPLAIN
  • A ranked remediation path that separates proof from guesswork

Guardrails

  • Runs in the browser; it does not connect to your database.
  • Treats sequential scans and nested loops as evidence, not automatic bugs.
  • Separates production proof from one-off local timing.

Signals covered

The page is designed to answer these production questions and search intents without losing the operational context.

  • explain analyze incident review
  • query plan forensics
  • row estimate mismatch
  • nested loop explosion
  • disk sort buffers

How To Use It

Paste the plan first. The verdict comes after the evidence.

This page should feel like a structured point-in-time review, not a dashboard mashup. First give it the raw plan text. Then read the verdict, the key signals, and the starter checks that move the investigation forward.

01. Paste plan evidence

Use EXPLAIN ANALYZE output with buffers when possible. The better the evidence, the better the review.

02. Read the verdict

See whether this looks like an incident signature, a risky plan, or just something worth reviewing.

03. Run starter checks

Use the starter SQL to inspect the query family and planner stats before you change production.

Plan Evidence

Paste raw EXPLAIN output here first. This is the input the rest of the page reads.

Runs locally

Everything happens in the browser. The page does not connect to your database.

Best input

Prefer EXPLAIN (ANALYZE, BUFFERS, VERBOSE) or FORMAT JSON so loops, reads, and spills are visible.

Main goal

Separate planner shape from runtime evidence before changing indexes, stats, or memory settings.

Review Verdict

Production incident signature

This plan already shows a production-grade failure signature. Fixes should be guided by evidence, not guesswork or generic tuning.

Risk score: 95 out of 100

What This Tool Can Settle Now

  • Which node looks suspicious in this one pasted plan
  • Whether row estimates, loops, reads, or spills deserve the next check
  • What to inspect next before changing the query or schema

What Still Needs Workload History

  • Whether the same fingerprint regressed after a deploy
  • How often this plan shape hurts production traffic
  • Whether the fix actually reduced total workload pressure over time

Key Signals

These are the signals the page extracted from the plan text.

  • Execution time: 185.1 ms. Prefer EXPLAIN ANALYZE output, which reports Execution Time.
  • Worst estimate gap: 18,544x. Planner rows compared with actual rows across loops.
  • Largest read node: 12,811 blocks read. Largest hit signal: 842 blocks.
  • Rows filtered late: 812,456. Rows removed by filter, join filter, or index recheck.
  • Loop hotspots: 1 node executing 1,000+ loops.
  • Spill volume: ~128 MB of disk sort/hash spill reported by the plan.
  • Access paths: 1 sequential scan / 1 index-backed scan.
  • Hash batches: 0 reported. Batch counts above 1 indicate memory pressure.
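
For reference, these numbers map onto raw plan text like the hypothetical fragment below. Only the 18,544 loops, the 1-row estimate, and the 812,456 filtered rows come from the signals above; every other figure is illustrative.

Index Scan using customers_pkey on customers  (cost=0.43..8.45 rows=1 width=72) (actual time=0.004..0.006 rows=1 loops=18544)
Seq Scan on orders  (actual time=0.1..310.2 rows=41230 loops=1)
  Rows Removed by Filter: 812456

The estimate gap reads as (actual rows per loop × loops) / planner rows: (1 × 18,544) / 1 = 18,544x.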

Findings

Risk-ranked interpretations of what the plan is showing.

Loop amplification (Critical)

Index Scan using customers_pkey on customers executed 18,544 loops.

High loop counts turn small per-row costs into the dominant runtime. This is where nested-loop plans often become production incidents.

Next step: Find the outer row source that drives the loops, then validate whether a different join path, index, or row estimate fix removes the amplification.
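
One low-risk way to test the join-path question is a transaction-scoped planner toggle; a sketch only, assuming the statement can be re-run safely (the SELECT is a placeholder for the incident query):

BEGIN;
SET LOCAL enable_nestloop = off;  -- scoped to this transaction; resets at COMMIT/ROLLBACK
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;
ROLLBACK;

If the forced plan is faster, that is evidence for fixing the row estimate or join path, not a license to disable nested loops globally.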

Planner belief diverged from execution (Critical)

Index Scan using customers_pkey on customers expected 1 row and produced 18,544. Worst ratio: 18,544x.

When estimates are this wrong, PostgreSQL can choose the wrong join order, wrong access path, or a nested loop that only looks cheap on paper.

Next step: Inspect pg_stats for the filtered columns, run ANALYZE, and check whether extended statistics are needed for correlated predicates.
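
If pg_stats shows predicates that move together, extended statistics are one candidate fix. A sketch under that assumption, where your_table matches the starter checks and col_a/col_b stand in for the correlated columns:

ANALYZE your_table;  -- refresh per-column statistics first

CREATE STATISTICS your_table_dep_stats (dependencies)
  ON col_a, col_b FROM your_table;
ANALYZE your_table;  -- extended statistics are only populated by ANALYZE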

Memory spill (High)

Plan reports approximately 128 MB written by disk sort/hash operations.

Disk-backed sorts and hashes add latency variance and usually get worse under concurrency.

Next step: Reduce rows before sort/hash, check ORDER BY index support, and evaluate work_mem with active concurrency rather than one query in isolation.
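
To check whether the spill is memory pressure rather than raw row volume, a session-scoped experiment is enough; the value below is for illustration, not a recommended setting:

SET work_mem = '256MB';  -- session-only; size any global change against active concurrency
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;
-- Success signal: Sort Method moves from "external merge" (Disk) to "quicksort",
-- or hash Batches drop back to 1.
RESET work_mem;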

Two additional signals were detected. The page keeps the emphasis on the highest-impact findings first.

Starter Checks

Use these queries to continue the investigation. They are starter checks, not a replacement for historical monitoring.

-- 1. Find out whether this plan belongs to a recurring query family.
--    (Requires the pg_stat_statements extension; the module must be in shared_preload_libraries.)
SELECT queryid, calls, total_exec_time, mean_exec_time,
       rows, shared_blks_hit, shared_blks_read,
       temp_blks_read, temp_blks_written
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- 2. Re-check planner statistics before changing access paths.
SELECT schemaname, tablename, attname,
       n_distinct, correlation, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'your_table'
ORDER BY attname;

-- 3. Re-run the same query with enough evidence to compare nodes.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT ...;

-- 4. If your tooling prefers JSON, keep the same evidence in machine-readable form.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, FORMAT JSON)
SELECT ...;

What This Review Should Clarify

The page should move in the same order a real plan review works.

First collect the plan evidence. Then interpret the verdict. Only after that should you decide whether the fix belongs in statistics, query shape, indexes, or memory settings.

  • Do not add an index until the bad node is obvious.
  • Do not treat estimated cost as elapsed time.
  • Do not trust a fix until the same fingerprint improves after deploy (see the check below).
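
A minimal post-deploy check for that last point, assuming pg_stat_statements is available; :queryid is a psql-style placeholder for the fingerprint captured in the starter checks:

SELECT queryid, calls, mean_exec_time,
       shared_blks_read, temp_blks_written
FROM pg_stat_statements
WHERE queryid = :queryid;  -- substitute the real queryid value

Compare a snapshot from before the deploy with one taken after; the view accumulates since the last pg_stat_statements_reset().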

Related Tools

Useful here, still point-in-time.

This page should help with a real decision right now. It should not pretend to replace the historical evidence that production teams need after a deploy, during an incident, or across a longer rollout window.

Helpful On This Page

  • A pasted plan turned into a structured review
  • Immediate signals for row drift, loops, reads, and spills
  • Starter checks that move the next investigation step forward

Still Needs Live History

  • Whether the same fingerprint regressed after a deploy
  • How often the bad plan shape appears in real traffic
  • Whether the fix reduced total time, reads, and waits over time

See Live History in MonPG