
Postgres High CPU: Where the Cycles Actually Go

When Postgres CPU is high, the question is not whether the server is fast enough. It is which queries are using the cycles and whether they should be.

A Postgres instance pegged on CPU is one of the more frustrating performance issues to diagnose because the symptom is generic and the causes are specific. "The database is slow" is what users see; "CPU at 95%" is what monitoring shows. The question is which queries are doing it.

Here is the diagnostic sequence I run.

Step 1: Confirm CPU is the bottleneck

Your cloud provider's CPU metric, or top on the host, should show CPU near 100%. If CPU is normal but queries are still slow, the issue is elsewhere (locks, I/O, memory).

Within Postgres, look at pg_stat_activity:

SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state;

Many active queries means the workload is genuinely heavy. Few active but high CPU means a small number of queries are very expensive.
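If the active count is small, it helps to see exactly what those few backends are running and for how long:

```sql
SELECT pid,
       now() - query_start AS running_for,
       left(query, 60) AS query        -- truncate for readability
FROM pg_stat_activity
WHERE state = 'active'
  AND backend_type = 'client backend'
ORDER BY running_for DESC;
```

A handful of rows with large running_for values points straight at the expensive statements.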

Step 2: Find the CPU-heavy queries

The canonical query in pg_stat_statements:

SELECT
  round(total_exec_time::numeric, 0) AS total_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
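If that view does not exist or is empty, the extension is not enabled; it needs a config change and a restart:

```sql
-- postgresql.conf: shared_preload_libraries = 'pg_stat_statements', then restart
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```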

The top queries by total_exec_time are where most CPU goes. Two patterns:

  • High total_ms and very high calls: a fast query called millions of times. The aggregate is the issue.
  • High total_ms and modest calls: each execution is expensive. EXPLAIN one.

For the first pattern, the fix is usually to reduce calls: add caching, batch the requests, eliminate the N+1.

For the second, the fix is in the query itself.
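For the first pattern, collapsing an N+1 loop into one query is often enough. A sketch against a hypothetical users table (names are illustrative):

```sql
-- N+1: the application runs this once per id, millions of times total.
-- SELECT name FROM users WHERE id = $1;

-- Batched: one round trip and one scan for the whole id list.
SELECT id, name
FROM users
WHERE id = ANY($1);   -- bind $1 as an array, e.g. '{17,42,99}'::bigint[]
```

`= ANY(array)` keeps a single bind parameter, so the driver can reuse one prepared statement regardless of how many ids are in the list.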

Step 3: EXPLAIN the worst offender

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
<query from step 2>;

Look for:

  • Sequential scans on large tables: missing index. CPU goes to scanning rows.
  • Hash join with Batches > 1: the hash table spilled to disk. CPU (and I/O) goes to multi-pass hashing.
  • Nested loop with very high inner loop count: planning bug or missing index. CPU goes to repeated lookups.
  • Sort that exceeds work_mem: external sort. CPU plus disk I/O.

Most "high CPU" queries fall into one of these patterns.
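For orientation, the telltale nodes look roughly like this in EXPLAIN (ANALYZE, BUFFERS) output (table names and numbers are illustrative):

```text
Seq Scan on orders  (cost=0.00..185000.00 rows=120 width=64)
                    (actual time=0.310..412.800 rows=118 loops=1)
  Filter: (customer_id = 42)
  Rows Removed by Filter: 4999882   <- scanned ~5M rows to return 118

->  Hash  (cost=... rows=... width=...)
      Buckets: 65536  Batches: 8  Memory Usage: 4097kB   <- Batches > 1: spilled
```

The Rows Removed by Filter line is the clearest seq-scan smell: millions of rows read and discarded on every execution.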

Step 4: Check planning time vs execution time

A non-obvious cause of high CPU: the planner itself.

EXPLAIN ANALYZE <query>;
-- Output ends with:
-- Planning Time: 0.500 ms
-- Execution Time: 50.000 ms

For most queries, planning is a fraction of execution. If planning time is comparable to or larger than execution time, the issue is the planner.

Causes:

  • Many partitions, no partition pruning: the planner has to consider every partition. For 1000+ partitions with cheap queries, planning becomes the dominant cost. Check that the query has predicates that allow pruning.
  • Many indexes on the table: the planner considers each. Drop unused indexes.
  • Highly complex query with many JOINs: planning is intrinsically expensive.

For the first two causes, the fix is structural. For the third, prepared statements help (they skip the parse-and-plan step on repeated execution).
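For repeated execution of the same statement, prepared statements amortize the planning cost. A sketch (table and statement names are hypothetical):

```sql
-- Parse and plan once per session:
PREPARE order_lookup (bigint) AS
  SELECT * FROM orders WHERE id = $1;

-- Reuses the parse tree; after five executions Postgres may switch to a
-- cached generic plan and skip planning entirely:
EXECUTE order_lookup (42);
```

Most drivers do this for you via the extended query protocol; setting plan_cache_mode = force_generic_plan forces generic-plan reuse when planning is the dominant cost.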

Step 5: Connection storms

A sudden spike in active connections — application restart, network blip — produces a CPU spike from connection setup overhead.

SELECT date_trunc('minute', backend_start) AS started,
       count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY 1
ORDER BY 1 DESC
LIMIT 20;

Many very young connections means a storm. Connection establishment is CPU-intensive in Postgres (especially with SSL).

Fix: connection pooler (PgBouncer). Reduces connection churn.
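A minimal PgBouncer setup for this case might look like the following (an illustrative pgbouncer.ini fragment; names and sizes are placeholders to tune):

```ini
[databases]
; forward appdb to the real server
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
; transaction pooling: server connections are shared across clients
pool_mode = transaction
; client connections PgBouncer itself will accept
max_client_conn = 2000
; actual Postgres connections per database/user pair
default_pool_size = 20
```

With transaction pooling, a restart storm hits PgBouncer (cheap connections) instead of Postgres (expensive ones).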

Step 6: Look at the wait events

If the pg_wait_sampling extension is installed (with pg_wait_sampling.profile_queries enabled, so the profile records queryid), it gives you per-query wait event breakdowns:

SELECT
  s.query,
  sum(p.count) AS samples,
  array_agg(DISTINCT coalesce(p.event, 'CPU')) AS wait_events
FROM pg_wait_sampling_profile p
JOIN pg_stat_statements s USING (queryid)
GROUP BY s.query
ORDER BY samples DESC
LIMIT 10;

Samples with no wait event (Postgres records none while a backend is actually executing) mean the query is CPU-bound. LWLock events mean it is stuck on internal locks. IO events mean it is disk-bound.

This tells you whether the CPU usage is doing work or fighting locks.

Step 7: Common patterns

The specific patterns I see most:

1. Missing index leading to seq scan. Check pg_stat_user_tables for tables with very high seq_tup_read.

2. Same query called millions of times. Application N+1 bug. Look at the highest-calls queries and trace back to application code.

3. Heavy report running during peak. A weekly aggregation cron landed in peak hours and pushed the database over its CPU budget. Move it off-peak or to a replica.

4. Bloat causing extra rows to scan. Vacuum-starved tables read dead tuples. CPU goes to filtering them out.

5. JSONB-heavy queries. Evaluating path expressions on JSONB is CPU-expensive. Worth optimizing when those queries dominate the profile.
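Patterns 1 and 4 can both be spotted from the same view:

```sql
SELECT
  relname,
  seq_scan,
  seq_tup_read,      -- rows read by sequential scans
  n_dead_tup,        -- dead tuples awaiting vacuum
  last_autovacuum
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 10;
```

A table with billions of seq_tup_read is a missing-index candidate; a large n_dead_tup next to an old last_autovacuum points at bloat.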

What I check first

For any "Postgres CPU is high" investigation:

  1. Top queries by total_exec_time. Pick the worst.
  2. EXPLAIN the worst. Find the seq scan or expensive node.
  3. Check connection count for storm signs.
  4. Check vacuum status for bloat-induced extra work.
  5. Plan a fix (index, query rewrite, application change).

Most cases resolve at step 2 with an index addition.

A worked example

Real incident: CPU at 95% sustained for 3 hours. Application timeouts climbing.

SELECT round(total_exec_time::numeric), calls, query
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 5;

Top query: 30 million calls in 3 hours at a mean of 50ms, roughly 2,800 calls per second. At that rate, even 1.5ms of CPU per call saturates 4 cores, hence the 95% utilization; most of the 50ms mean was backends queueing for CPU.
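The saturation arithmetic is worth sanity-checking (figures taken from the pg_stat_statements output above):

```python
# Back-of-envelope check on the incident numbers.
calls = 30_000_000           # calls reported by pg_stat_statements
window_s = 3 * 3600          # 3-hour incident window, in seconds
cpu_cores = 4

qps = calls / window_s                    # ~2778 queries per second
# 4 cores supply 4 core-seconds of CPU per wall-clock second, so the
# per-call CPU cost that fully saturates the instance is:
saturating_ms = cpu_cores / qps * 1000    # ~1.44 ms of CPU per call

print(round(qps), round(saturating_ms, 2))
```

At ~2,800 calls per second, anything beyond about 1.4ms of CPU per call pegs 4 cores; an unindexed lookup has no chance of fitting in that budget.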

The query was a simple lookup that should have been 2ms but was 50ms because it was doing a seq scan (the index had been dropped during a migration).

Fix: CREATE INDEX CONCURRENTLY .... Within minutes, query time dropped to 2ms, CPU dropped to 30%.

Total time from page to fix: 18 minutes.

The lesson: high CPU is rarely "the server is undersized." It is almost always a specific query doing more work than it should.