Parallel query feels like it should be simple. One worker is slow, so use four. Four are slow, so use eight. I believed that for exactly one week, which is how long it took the monthly reporting load to fall over after I bumped the worker count across the board.
Real systems are less polite than the intuition. Workers need coordination, memory, CPU, and enough useful work to be worth launching. Get any of that wrong and you've added overhead, not speed.
Where parallelism actually pays
The wins are real when the work per row is heavy, the row count is large, and there are idle cores to use. A big sequential scan, a large hash join, a heavy aggregation over millions of rows on an otherwise quiet box — that's the sweet spot, and there parallel query is genuinely transformative.
I confirm it the same way every time: read the plan for a Gather node and compare workers planned against workers launched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT category_id, count(*)
FROM events
WHERE created_at >= now() - interval '30 days'
GROUP BY category_id;
-- Watch "Workers Planned" vs "Workers Launched" under the Gather
Where it quietly loses
My mistake was a global setting. On a busy box, every medium query started recruiting workers, stealing CPU from each other and from the OLTP traffic that actually mattered. Small queries paid coordination overhead for nothing, and concurrent reports fought over the same cores and the same disk.
Six workers reading one disk don't get six times the bandwidth. They contend, and you can end up slower than a single clean sequential reader.
Tune per workload, not globally
The fix that stuck was setting parallelism per role. The OLTP application stays conservative or single-threaded; the analytics role gets generous limits because it runs when the box is quiet. One global number can't serve both, and trying to make it do so is how I broke things in the first place.
ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;
ALTER ROLE analytics SET max_parallel_workers_per_gather = 6;
What I keep coming back to
None of this is clever. Name the failure mode, capture the before number, change one thing, and compare the same signal afterward. The boring version of database work is the version that actually holds up at 2 AM.