When a query needs more memory than work_mem allows, Postgres writes the excess to a temporary file. This is invisible to the application — the query still completes, just slower. For frequent queries, the disk traffic adds up.
Detecting when queries spill is straightforward; the mitigation is mostly about work_mem and indexing. Here is the framework.
How to detect spilling
The most direct signal is log_temp_files:
log_temp_files = 1024   # log temp files of 1MB (1024kB) or larger
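If editing postgresql.conf is inconvenient, the same change can be applied to a running instance and reloaded in one step (requires superuser):
ALTER SYSTEM SET log_temp_files = 1024;
SELECT pg_reload_conf();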
After setting this and reloading, the Postgres log gains a line for every temp file at or above that threshold:
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp1234.0", size 50331648
STATEMENT: SELECT ... ORDER BY ... LIMIT 100;
The size is in bytes (50MB in this example). The statement is the query that produced it.
For an aggregate view across queries:
SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_total
FROM pg_stat_database
WHERE datname = current_database();
This gives total temp files written and total bytes since the last stats reset. Rising fast = lots of spilling.
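If the pg_stat_statements extension is installed (an assumption about the setup), the same activity can be attributed to individual statements, which points straight at the offending query:
SELECT query, calls, temp_blks_written
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 10;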
What causes spilling
Several query operations can spill:
1. Sort that exceeds work_mem. A query with ORDER BY over a large dataset.
Sort (cost=...)
Sort Method: external merge  Disk: 204800kB
external merge means the sort spilled to disk; quicksort (or top-N heapsort for LIMIT queries) means it stayed in memory.
2. Hash join with too-large hash table.
Hash Join (cost=...)
-> Hash
Buckets: 65536 Batches: 32 Memory Usage: 1024kB
Batches > 1 means the hash table did not fit in memory and was split into disk-resident batches.
3. Hash aggregate that does not fit.
Aggregating GROUP BY with many groups. Same logic as hash join.
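On Postgres 13 and later, the spill is reported on the HashAggregate node itself; the numbers below are placeholders rather than output from a real query:
HashAggregate (cost=...)
  Group Key: customer_id
  Batches: 9 Memory Usage: 4096kB Disk Usage: 65536kB
Batches > 1 carries the same meaning as in the hash join case.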
4. CTEs and subqueries materialized.
A CTE that the planner materializes (because it is referenced multiple times or marked MATERIALIZED) is held in a tuplestore. Once that tuplestore outgrows work_mem, it spills to a temp file.
5. Large EXPLAIN ANALYZE outputs in pre-Postgres-13 instances.
The analyze itself materializes data for verbose output. Rare cause but possible.
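Whatever the node type, EXPLAIN (ANALYZE, BUFFERS) also reports temp block I/O per node, which catches every case above in one place. A sketch, with a hypothetical orders table and the counts elided:
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, count(*) FROM orders GROUP BY customer_id;
-- in the output, look for lines like:
--   Buffers: shared hit=..., temp read=... written=...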
Mitigation: raise work_mem
The most direct fix:
BEGIN;
SET LOCAL work_mem = '256MB';
SELECT ... heavy query ...;
COMMIT;
SET LOCAL lasts for the current transaction; plain SET lasts for the session. Either way, confirm with a re-run that the spill is gone (look in the EXPLAIN output for Sort Method: quicksort and Batches: 1).
For persistent spilling on a known heavy query, this can be set in the application's per-query configuration.
Do NOT raise work_mem globally to a high value. As discussed in the memory pressure post, work_mem is per-operation; a high global value can lead to OOM.
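A middle ground, when the heavy queries run under a dedicated role, is to scope the setting to that role rather than the whole cluster. The reporting role name here is an assumption about the setup:
ALTER ROLE reporting SET work_mem = '256MB';
New sessions for that role pick up the value; everything else keeps the default.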
Mitigation: better indexing
For sorts that spill, an index that already provides the order eliminates the sort step entirely:
-- Slow: requires a sort
SELECT * FROM events WHERE user_id = 47 ORDER BY created_at DESC LIMIT 10;
-- With this index, the sort is gone
CREATE INDEX events_user_created ON events (user_id, created_at DESC);
No sort = no spill, regardless of work_mem.
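Re-running EXPLAIN should confirm it: the plan reads the index in order and the Sort node is gone entirely. A sketch of what to expect, not captured output:
Limit
  -> Index Scan using events_user_created on events
       Index Cond: (user_id = 47)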
Mitigation: smaller intermediate results
For queries that aggregate over wide rows, projecting only the needed columns reduces the intermediate size:
-- Slow: hashes 50-column rows
SELECT customer_id, count(*) FROM (SELECT * FROM big_orders) o GROUP BY customer_id;
-- Faster: hashes 1-column rows
SELECT customer_id, count(*) FROM (SELECT customer_id FROM big_orders) o GROUP BY customer_id;
The planner should optimize this in modern Postgres, but explicit projection helps in cases where it does not.
When spilling is unavoidable
For genuinely large operations (analytical reports over years of data), some spilling is fine. The cost is real but bounded.
What matters: the spill is happening on a tier of disk that can absorb it. SSDs handle temp files well. Spinning disks struggle.
For analytics workloads, allocate temp file space to fast storage:
temp_tablespaces = 'fast_temp'   # a tablespace on SSD
On cloud-managed Postgres, this may or may not be configurable.
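On a self-managed instance, the tablespace has to exist before it can be named in temp_tablespaces; the directory path below is illustrative and must be owned by the postgres OS user:
CREATE TABLESPACE fast_temp LOCATION '/mnt/nvme/pgsql_temp';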
What I monitor
The metrics that catch spilling-related issues:
- pg_stat_database.temp_files rate. Rising fast = increasing spillage.
- Average temp_bytes / temp_files. A large average means each individual spill is big.
- log_temp_files log entries for queries above a threshold.
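A single query covers the first two, assuming stats have not been reset recently (GREATEST guards against dividing by zero on a quiet database):
SELECT temp_files,
       pg_size_pretty(temp_bytes) AS temp_total,
       pg_size_pretty(temp_bytes / GREATEST(temp_files, 1)) AS avg_per_spill
FROM pg_stat_database
WHERE datname = current_database();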
For production, alert when temp file rate doubles relative to baseline. That usually means a new query path was deployed without proper indexing.
A worked example
A real scenario: a dashboard query took 5 seconds when it was expected to take milliseconds. EXPLAIN ANALYZE showed:
Sort (actual time=4500.000..4900.000 ...)
Sort Key: ...
Sort Method: external merge  Disk: 389120kB
The query sorted roughly 5M rows to produce a 70MB result; row width times row count put the sort far above the default 4MB work_mem, so 380MB of it went through a temp file.
Fix path:
- Tried SET LOCAL work_mem = '512MB'. Sort method became quicksort. Query: 800ms.
- Added an index on the sort columns. Sort step disappeared entirely. Query: 50ms.
- Reset work_mem to default, since it was no longer needed for this query.
The fundamental fix was the index. The work_mem raise was a stopgap that became unnecessary.
What I do
For any persistently-slow query in production:
- EXPLAIN ANALYZE it. Look for Sort Method: external merge or Batches > 1.
- If spilling, check whether an index would eliminate the sort or hash entirely.
- If spilling is unavoidable for this query, raise work_mem per-session.
- Do not raise work_mem globally without checking memory implications.
Most spilling is fixable with the right index. The few cases that are not are usually genuine analytics workloads where the work has to be done somewhere.