The most useful sentence in a PostgreSQL incident is often: the database is not doing work, it is waiting. That sounds small, but it changes the whole response. You stop adding indexes to a lock problem. You stop tuning memory for a client that is not reading results.
pg_stat_activity is where I start when latency jumps right now. It tells me which sessions are active, what they are waiting on, when their query started, and whether a transaction has been open long enough to become part of the problem.
Wait events are not a full tracing system, but they are enough to keep an incident from drifting into guesswork.
Get the room view first
Before reading individual queries, group sessions by state and wait type. You want to know whether the cluster is mostly executing, blocked, idle in transaction, or waiting on clients.
SELECT
state,
wait_event_type,
wait_event,
count(*) AS sessions
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY 1, 2, 3
ORDER BY sessions DESC;
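Once one bucket dominates that view, I drill into it. A quick follow-up sketch, assuming you want the longest-running active sessions (swap the state filter for whatever bucket stood out):

```sql
-- Longest-running active sessions and what each one is waiting on.
SELECT
    pid,
    now() - query_start AS query_age,
    wait_event_type,
    wait_event,
    left(query, 120) AS query_sample
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY query_start
LIMIT 10;
```

A session whose wait_event_type is NULL in this view is actually on CPU, which is useful signal in itself.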
Lock waits need blockers, not theories
If wait_event_type is Lock, the next question is simple: who is blocking whom? Long blockers are often boring sessions: a migration, a forgotten transaction, an admin console, or a job that touched more rows than expected.
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocker.pid AS blocker_pid,
blocker.state AS blocker_state,
blocker.xact_start AS blocker_xact_start,
blocker.query AS blocker_query
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks
ON blocked_locks.pid = blocked.pid
AND NOT blocked_locks.granted
JOIN pg_locks blocker_locks
ON blocker_locks.locktype = blocked_locks.locktype
AND blocker_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocker_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocker_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocker_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocker_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocker_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocker_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocker_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocker_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocker_locks.pid <> blocked_locks.pid
JOIN pg_stat_activity blocker ON blocker.pid = blocker_locks.pid
WHERE blocker_locks.granted;
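On PostgreSQL 9.6 and later there is a shorter route: pg_blocking_pids() lets the server compute the blocker set for you, including parallel-worker and lock-queue subtleties the manual self-join can miss. A minimal sketch:

```sql
-- For every session stuck on a lock, list the PIDs blocking it.
-- pg_blocking_pids() returns an empty array when nothing blocks the PID.
SELECT
    blocked.pid AS blocked_pid,
    pg_blocking_pids(blocked.pid) AS blocker_pids,
    left(blocked.query, 120) AS blocked_query
FROM pg_stat_activity blocked
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;
```

I keep the long pg_locks join around for older clusters and for cases where I need the lock type and target relation, but the short form is usually the faster first answer.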
Client waits are application performance too
A session waiting on the client can mean PostgreSQL produced rows faster than the application consumed them. That shows up with huge result sets, slow network links, ORM iteration, or an API endpoint streaming too much data.
Do not tune the database blindly when the wait says the database is waiting for the client.
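To see how much of the fleet is in that situation, filter on the Client wait type. A sketch; state_change tells you how long each session has been sitting like this:

```sql
-- Sessions waiting on the application side.
-- ClientRead: waiting for the client to send something.
-- ClientWrite: waiting for the client to drain results.
SELECT
    pid,
    wait_event,
    state,
    now() - state_change AS in_state_for,
    left(query, 120) AS query_sample
FROM pg_stat_activity
WHERE wait_event_type = 'Client'
ORDER BY state_change;
```

Many old ClientRead sessions inside open transactions usually points at the application holding transactions across user think time, not at the database.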
I/O waits need storage context
If many active sessions wait on I/O, check whether the workload changed or the storage slowed down. A new report, a cold cache after restart, a bad plan, or cloud disk throttling can all look similar from inside PostgreSQL.
- Compare wait events with shared block reads in pg_stat_statements.
- Check checkpoint and background writer metrics.
- Check host or cloud storage latency.
- Look for a plan that started scanning a much larger part of a table.
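For the first item, a sketch of the pg_stat_statements query I reach for, assuming the extension is installed. Note the column name: total_exec_time on PostgreSQL 13 and later, total_time on 12 and earlier.

```sql
-- Statements reading the most blocks from outside shared buffers,
-- the usual suspects when many sessions wait on I/O.
SELECT
    calls,
    shared_blks_read,
    shared_blks_hit,
    round(total_exec_time::numeric, 1) AS total_exec_ms,
    left(query, 120) AS query_sample
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```

A statement whose shared_blks_read suddenly dwarfs its shared_blks_hit is a good candidate for the plan check in the last bullet.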
Open transactions are performance hazards
A transaction that sits open can hold locks, prevent vacuum cleanup, and keep old row versions alive. During incidents, I always sort by xact_start, not just query_start.
SELECT
pid,
state,
now() - xact_start AS xact_age,
wait_event_type,
wait_event,
left(query, 180) AS query_sample
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 20;
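When one of those old transactions turns out to be the blocker and its owner agrees it can die, the escalation path is two built-in functions. The PID below is a placeholder; cancel first, terminate only if the session stays stuck:

```sql
-- Cancel the current statement only; the session and transaction survive.
SELECT pg_cancel_backend(12345);

-- An idle-in-transaction session has no statement to cancel,
-- so ending it means ending the backend and rolling back.
SELECT pg_terminate_backend(12345);
```

Both return true if a signal was sent, not proof the session is gone, so re-check pg_stat_activity afterward.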
The production habit
The teams that get good at PostgreSQL performance do not chase every knob. They turn a vague complaint into a named failure mode, collect one clean measurement, make one change, and then compare the next measurement against the first. That rhythm is slower than guessing for the first hour and much faster by the end of the incident.