By dbivioJune 6, 20269 min read

PostgreSQL Parallel Query: When More Workers Make It Slower

Parallel query can turn a 30-second scan into 6 seconds, or it can spawn six workers that fight over the same disk and finish slower than one. Knowing when it helps is mostly knowing when it doesn't.

Someone read that PostgreSQL supports parallel query, set max_parallel_workers_per_gather to 8 globally, and declared the analytics database "tuned." For a week it looked great. Then the monthly close ran twenty concurrent reports and the box fell over, every query slower than before the change.

Parallelism is not free speed. It is trading coordination overhead and shared resources for more CPUs on one query. That trade is fantastic for a big scan on an idle machine and terrible for many medium queries on a busy one.

So let's get specific about when each is true.

How a parallel plan is shaped

A parallel plan has a Gather (or Gather Merge) node. Below it, multiple worker processes each run a copy of the plan on a slice of the data. Above it, the leader collects and combines their results. The planner decides how many workers to request based on table size and your settings.

The key insight is that the leader has to launch those workers, hand out work, and merge results. That setup has a fixed cost, modeled by parallel_setup_cost, plus a per-row transfer cost. For a small query, that overhead dwarfs any benefit.

EXPLAIN (ANALYZE, BUFFERS)
SELECT category_id, count(*)
FROM events
WHERE created_at >= now() - interval '30 days'
GROUP BY category_id;
-- Look for: Gather / Gather Merge, "Workers Planned" vs "Workers Launched"

When parallelism genuinely wins

Parallel query pays off when the work per row is large, the row count is large, and there are spare CPU cores to use. Big sequential scans, large hash joins, and heavy aggregations over millions of rows are the sweet spot. A single analytics query on an otherwise idle warehouse is the textbook win.

It works because the dominant cost is CPU-bound row processing that genuinely divides across workers, and there are idle cores to absorb it.

When it quietly loses

It loses when the query is small, when it returns few rows, or when the system is already busy. Launching four workers to scan 5000 rows is pure overhead. Worse, on a busy OLTP box parallel query steals CPU from everything else, so you speed up a handful of reports and slow down everyone.

It also loses on IO-bound scans on slow storage. Six workers all reading from the same disk do not get six times the bandwidth; they contend, and you can end up slower than a single sequential reader the storage could service cleanly.

The settings that actually matter

max_parallel_workers_per_gather caps workers for one Gather. max_parallel_workers caps the pool across all queries, and it draws from max_worker_processes. If your per-gather setting is high but the pool is exhausted, you get "Workers Planned: 6, Workers Launched: 2" and a plan that was costed for help it never received.

parallel_setup_cost and parallel_tuple_cost tell the planner how expensive coordination is. Raising them makes the planner more conservative, which is often what a mixed OLTP/analytics box wants. min_parallel_table_scan_size sets the floor below which a scan is not worth parallelizing.

-- Per-gather cap (session or role level is safer than global)
SET max_parallel_workers_per_gather = 4;

-- Cluster-wide pool and its source
SHOW max_parallel_workers;       -- pool for parallel query
SHOW max_worker_processes;       -- total bgworker slots (pool draws from here)

-- Make the planner more reluctant on a busy box
SET parallel_setup_cost = 5000;  -- default 1000

Set it per workload, not globally

The mistake that started this article was a single global value. The right approach is to match the setting to the workload. Give the reporting role or the analytics connection a generous per-gather limit, and keep the OLTP role conservative or at zero.

Per-role settings make this clean: the app that serves user requests should rarely go parallel, while the nightly batch can use the whole box because nothing else is running.

-- OLTP app: stay single-threaded per query
ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;

-- Reporting: use the cores when the box is quiet
ALTER ROLE analytics SET max_parallel_workers_per_gather = 6;

Things that silently disable parallelism

Plenty of constructs force a serial plan, and people waste time wondering why their workers never launched. A query that writes (most INSERT/UPDATE/DELETE paths), cursors, functions marked anything other than PARALLEL SAFE, and certain locking clauses can all disable parallelism.

If you expect a parallel plan and do not get one, check for a PARALLEL UNSAFE function in the query, confirm the table is above min_parallel_table_scan_size, and verify the pool is not exhausted by other parallel queries at that moment.

Parallelism helps: huge scans/joins/aggregations on idle cores.
Parallelism hurts: small queries, high concurrency, contended slow disks.
Tune per role, not globally; OLTP conservative, analytics generous.
Watch Workers Planned vs Launched to catch pool exhaustion.
PARALLEL UNSAFE functions and writes quietly force serial plans.

The practical standard

The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.

← Back to all articles