
Nested Loops in Postgres: The Plan Type People Fear For the Wrong Reasons

A nested loop join is fast or slow depending on the row counts. Here is when it is the right plan, when it is not, and how to read the EXPLAIN output to tell.

A nested loop join has a bad reputation. Most slow-query stories I have heard blame "that big nested loop" and then propose hash-joining everything. That would be fine, except hash joins have their own pathological cases, and a nested loop is sometimes exactly the right plan.

The correct framing is: a nested loop is the right plan when one side is small. It is the wrong plan when both sides are big. The planner gets this right most of the time, and gets it wrong when the row estimates are off.

How a nested loop actually executes

For each row of the outer side, look up the matching rows in the inner side.

If the outer side is 10 rows and the inner side has an index on the join column, the cost is roughly 10 index lookups. Very fast.

If the outer side is 1,000,000 rows and the inner side has no useful index, the cost is 1,000,000 sequential scans. Catastrophic.

The difference is the row count on the outer side and whether the inner side has a useful index.
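Here is the good case, sketched as a plan shape. The schema is hypothetical (an `orders` table with an index on `customer_id`, a small filtered set of `customers`); the point is the structure, not the numbers.

```sql
-- Hypothetical tables: small outer side, indexed inner side.
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.signup_date = CURRENT_DATE;   -- matches only a few rows

-- The shape you want to see:
-- Nested Loop
--   -> Index Scan on customers  (actual rows=8 loops=1)    -- small outer
--   -> Index Scan on orders     (actual rows=3 loops=8)    -- one indexed lookup per outer row
```

Eight index lookups on the inner side, done. No other join type beats this.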

When the planner picks nested loop

The planner uses cost estimates. For nested loop, the cost is approximately:

outer_rows * (cost_of_one_inner_lookup)

The planner calculates this and compares to hash join and merge join costs. It picks the cheapest. So nested loop wins when:

  • The outer side is small (low outer_rows).
  • The inner side has a fast lookup (typically an index scan).
  • Other join types would be more expensive (e.g., hash join's hash table doesn't fit in work_mem).

Most of the time this is correct.
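If you want to see what the planner's second choice would have been, you can discourage nested loops for a single session and compare plans. `enable_nestloop` is a real planner setting; the query below is a hypothetical stand-in for the one you are investigating. This is a diagnostic, not a production setting.

```sql
-- Compare against the planner's second-choice plan, in this
-- transaction only. SET LOCAL reverts at commit or rollback.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.signup_date = CURRENT_DATE;
ROLLBACK;
```

If the alternative plan is much slower, the planner was right to pick the nested loop.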

When nested loop is wrong

The failure mode is bad row estimates. If the planner thinks the outer side has 10 rows but it actually has 10,000, the cost estimate is off by 1000x. The planner picks nested loop based on the wrong assumption. The query takes forever.

The diagnostic in EXPLAIN ANALYZE:

Nested Loop  (cost=0.43..120.50 rows=10 width=64)
             (actual time=0.020..15000.000 rows=10000 loops=1)
  -> Index Scan on outer_table  (rows=10)  (actual rows=10000 loops=1)
  -> Index Scan on inner_table  (rows=1)   (actual rows=1 loops=10000)

The planner expected 10 outer rows. There were 10,000. The inner side ran 10,000 times. The query took 15 seconds.

This is the classic "bad nested loop" story. The fix is not to disable nested loops. The fix is to fix the row estimate.

Fixing the estimate

The estimate is usually wrong because the statistics are stale or too coarse. Work through these steps in order:

  1. ANALYZE outer_table; — refresh statistics.
  2. If still wrong, increase the statistics target on the relevant column: ALTER TABLE outer_table ALTER COLUMN status SET STATISTICS 1000; ANALYZE;
  3. If the issue is correlated columns, add extended statistics.
  4. If the predicate is an expression the planner cannot model, rewrite the query.

Most of the time, step 1 fixes it.
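Step 3 in SQL, for anyone who has not used extended statistics. `CREATE STATISTICS` is standard Postgres (10+); the table and column names here are hypothetical.

```sql
-- Tell the planner that country and state are correlated,
-- so it stops multiplying their selectivities as if independent.
CREATE STATISTICS addr_stats (dependencies)
  ON country, state FROM customers;

-- Extended statistics are only populated by ANALYZE.
ANALYZE customers;
```

After this, `WHERE country = 'US' AND state = 'CA'` gets a row estimate that reflects the dependency instead of the product of two independent probabilities.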

The cases where the planner is structurally wrong

A few cases where statistics will not save you:

  • Highly skewed data. A status column where 99.9% of rows are 'completed' and 0.1% are 'pending'. Default statistics may not capture the skew. Increase statistics target or use extended statistics.
  • JOIN on a JSONB path. Path expressions are opaque to the planner. The fix is usually a generated column with proper statistics.
  • Correlated WHERE clauses. WHERE country = 'US' AND state = 'CA'. The planner assumes independence and multiplies probabilities. With extended statistics, it knows the correlation.
  • Time-series with sharp distribution shifts. Yesterday's pattern was different from today's. Auto-analyze may not have caught up.

For these, the answer is sometimes statistics tuning and sometimes a different schema.
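For the JSONB case, the generated-column fix looks like this. Stored generated columns are a Postgres 12+ feature; `events` and `payload` are illustrative names.

```sql
-- Extract the JSONB path into a real column the planner
-- can collect statistics on and index.
ALTER TABLE events
  ADD COLUMN user_id bigint
  GENERATED ALWAYS AS ((payload ->> 'user_id')::bigint) STORED;

CREATE INDEX events_user_id_idx ON events (user_id);

-- Populate per-column statistics for the new column.
ANALYZE events;
```

Queries then join on `events.user_id` instead of the opaque `payload ->> 'user_id'` expression, and the planner estimates it like any other column.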

When to override the planner

Postgres does not have native query hints, but pg_hint_plan (an extension) provides them. I use it rarely:

/*+ HashJoin(orders customers) */
SELECT * FROM orders JOIN customers ON ...;

This forces a hash join regardless of cost estimates. It is a workaround, not a fix. The right answer is to fix the statistics. But for queries the team owns and tests, where the right plan is known and the statistics are unreliable, hints can be the practical answer.

More commonly: rewrite the query so the planner has fewer choices. Replace a complex join with a CTE that limits one side first. Add a more selective WHERE clause. The fewer plan options the planner has, the less it can mis-estimate.
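A sketch of the CTE rewrite, with hypothetical tables. `AS MATERIALIZED` (Postgres 12+) forces the CTE to be computed first as an optimization fence; before 12, all CTEs were materialized.

```sql
-- Reduce one side first, then join. The planner sees a small,
-- already-computed input and cannot mis-estimate its size as badly.
WITH recent_customers AS MATERIALIZED (
  SELECT id
  FROM customers
  WHERE signup_date >= CURRENT_DATE - 7
)
SELECT o.*
FROM recent_customers rc
JOIN orders o ON o.customer_id = rc.id;
```

The trade-off is that a fence also blocks good optimizations, so reach for this only when the unconstrained plan is reliably wrong.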

Reading the plan: a checklist

When I see a nested loop in EXPLAIN ANALYZE:

  1. What is the outer-side row count (estimate vs actual)?
  2. If estimate is way off, that is the bug. Fix statistics.
  3. If estimate is correct, look at the inner side. Does it use an index?
  4. If yes, the plan is fine. The query is doing the work it has to.
  5. If no, add an index on the join column.
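For step 5, the fix is one statement. Names are hypothetical; `CONCURRENTLY` avoids blocking writes but cannot run inside a transaction block.

```sql
-- Index the inner side's join column without locking out writers.
CREATE INDEX CONCURRENTLY orders_customer_id_idx
  ON orders (customer_id);
```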

The pattern most people miss: nested loop on a small outer with an indexed inner is the optimal plan. Replacing it with a hash join is slower because the hash join has to read all of the inner side to build the table, even if only 10 rows match.

The planner mostly knows this. When it picks nested loop, it usually picked it for a good reason. The interesting cases are when the row estimates are off, which is almost always a statistics fix, not a plan-shape fix.