The Postgres planner is usually right. When it is wrong, it is wrong in spectacular ways. The query that should run in 50ms takes 30 seconds because the planner picked a nested loop expecting 100 rows on the inner side and got 100,000.
I have lost more time to this than to any other category of slow query. The fix is almost always statistics, not the query.
The diagnostic: EXPLAIN ANALYZE, look at the rows
The single most useful thing in EXPLAIN ANALYZE output is the (rows=X width=Y) versus (actual rows=Z ...) comparison. The first number is the planner's estimate. The second is reality. When the ratio is 100x or more in either direction, the planner picked a plan based on bad data.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.country = 'US'
AND o.status = 'shipped'
AND o.created_at >= now() - interval '30 days';
Look for nodes where the estimate and the actual row count diverge. Those are where the planner made the wrong call.
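What a bad estimate looks like in the plan output (an illustrative fragment; the numbers are invented for the example):
Nested Loop  (cost=0.43..845.12 rows=112 width=96) (actual time=0.041..3821.334 rows=104329 loops=1)
The planner bet on 112 rows and chose a nested loop; reality delivered roughly a thousand times that.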
The four common reasons for bad estimates
1. Stale statistics. Autovacuum has not analyzed the table recently, or the workload changed faster than the autoanalyze threshold could keep up with.
The fix: ANALYZE table_name;. This usually closes 80% of the cases I see. If it works, the next question is why auto-analyze did not catch it. Often the threshold is too high for a small but high-churn table.
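A quick way to check whether stale stats are the culprit, assuming the orders table from the example (pg_stat_user_tables is a standard view):
-- When was the table last analyzed, and how much has churned since?
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
A large n_mod_since_analyze next to an old last_autoanalyze timestamp is the stale-statistics signature.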
2. Skewed data. The column has a very uneven distribution — say 95% of status is 'completed' and 5% is everything else. Default statistics keep the 100 most common values per column, which is usually enough; for high-cardinality columns with heavy skew, it is not.
The fix: raise the statistics target for that column (the global knob is default_statistics_target; the per-column override is below).
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;
100 is the default. 1000 captures more values and gives the planner sharper distributions.
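You can confirm the larger target took by looking at what ANALYZE actually stored. A quick check against the standard pg_stats view, using the orders.status column from the example:
-- The MCV list and frequencies the planner now works from
SELECT n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'status';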
3. Correlated columns. Postgres assumes columns are independent unless told otherwise. When two columns are correlated — country and state, customer_id and region — the planner multiplies the wrong probabilities and gets selectivity wildly wrong.
The fix: extended statistics.
CREATE STATISTICS orders_country_state
ON country, state
FROM orders;
ANALYZE orders;
The planner now knows the dependency. This single change has rescued more than one production query for me.
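On Postgres 12+, you can see the dependency strengths ANALYZE computed through the pg_stats_ext view:
-- Functional-dependency strengths for the extended statistics object
SELECT statistics_name, dependencies
FROM pg_stats_ext
WHERE statistics_name = 'orders_country_state';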
4. The query has expressions the planner cannot estimate well. Functions on indexed columns, complex CASE expressions, JSONB path lookups. The planner falls back on default selectivity (often 0.5%), which is wrong in interesting ways.
The fix: rewrite the query so the planner has something to work with, or precompute the expression in a generated column with proper statistics.
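A minimal sketch of the generated-column approach, assuming a JSONB payload column with a kind key (both names invented for the example; stored generated columns need Postgres 12+):
-- Precompute the JSONB lookup into a typed column the planner can model
ALTER TABLE orders
  ADD COLUMN payload_kind text
  GENERATED ALWAYS AS (payload ->> 'kind') STORED;

CREATE INDEX ON orders (payload_kind);
ANALYZE orders;  -- gathers real statistics for the new column
Queries then filter on payload_kind instead of the raw expression, and the planner has an MCV list and histogram to work with.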
When ANALYZE is not enough
If ANALYZE does not fix the estimate, the next step is to make the planner's life easier. A few moves I reach for:
- Add an index that matches the predicate. Expression indexes in particular get their own statistics, which give the planner something concrete to reason about for selectivity.
- Break a complex WHERE into a CTE that the planner can analyze separately. Modern Postgres (12+) inlines CTEs by default, but you can use WITH ... AS MATERIALIZED to force the planner to plan them as separate steps (see the sketch after this list).
- Use pg_hint_plan as a last resort. Plan hints are not native Postgres, but the extension lets you tell the planner what to do when statistics will not get there. I treat this as a smell — the right fix is usually statistics — but it is occasionally the right call for queries the team owns and tests.
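A minimal sketch of the materialization fence, reusing the orders query from earlier:
-- Force the CTE to be planned and executed as its own step
WITH recent_shipped AS MATERIALIZED (
    SELECT *
    FROM orders
    WHERE status = 'shipped'
      AND created_at >= now() - interval '30 days'
)
SELECT *
FROM recent_shipped r
JOIN customers c ON c.id = r.customer_id
WHERE c.country = 'US';
The fence means the planner estimates and executes recent_shipped on its own; the trade-off is that predicates from the outer query no longer push down into it.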
The auto-analyze conversation
Most stale-statistics problems come down to autovacuum settings being left at defaults. The autoanalyze trigger point is 50 + 0.1 * row_count changed rows. For a hundred-million-row table, that is ten million row changes before autoanalyze fires, which means statistics refresh far too rarely for a hot table.
For large or high-churn tables, drop the scale factor:
ALTER TABLE orders
SET (autovacuum_analyze_scale_factor = 0.02,
autovacuum_vacuum_scale_factor = 0.05);
Now autoanalyze runs after 2% changes instead of 10%. The cost is a bit more analyze work; the benefit is fresher stats and fewer planner surprises.
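If you want to confirm the override took, per-table settings land in pg_class.reloptions:
-- Per-table autovacuum overrides, if any
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';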
Verify the fix
Whenever I change statistics, I rerun EXPLAIN ANALYZE and confirm:
- The estimate now matches reality.
- The plan changed (usually to a hash or merge join instead of a nested loop, or vice versa).
- The actual runtime is what we expected.
If the estimate improved but the runtime did not change, the planner was already picking the right plan and the bad estimate was a red herring. That happens — the planner is robust to small estimate errors and only changes course when the difference would tip a join strategy.
What I do when nothing works
If I have analyzed, increased the statistics target, added extended statistics, rewritten the query, and the estimate is still wrong, the answer is usually that the data shape is fundamentally not what the planner can model. Time-series data with sharp distribution shifts. Tenant data where a single tenant is 99% of the rows. JSON columns where the structure varies wildly per row.
For those cases the fix is structural: partition the table, split the JSON into typed columns, materialize the hot path. The planner is not the problem; the schema is.
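As one example of the structural route, a sketch of range-partitioning the time-series case (declarative partitioning, Postgres 10+; the schema is illustrative):
-- Each partition gets its own statistics, so a distribution shift
-- in recent data no longer poisons estimates for the whole table.
CREATE TABLE orders (
    customer_id  bigint NOT NULL,
    status       text NOT NULL,
    created_at   timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
Partition pruning then keeps the planner looking only at the partitions a query touches, each with statistics that actually describe its rows.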