CPU, Memory, and I/O9 min read

PostgreSQL Server-Side Cursors: How to Stream Millions of Rows Without Running Out of Memory

A nightly export OOM-killed itself trying to load ten million rows into memory at once. The fix wasn't more RAM — it was telling PostgreSQL to hand the rows over a batch at a time with a server-side cursor.

The nightly export had run fine for a year, then it started getting OOM-killed. Nobody had changed the code. The data had simply grown past ten million rows, and the export tried to load all of them into memory in one go before writing the first line.

The instinct is to throw more memory at it. The right fix is to stop materializing the whole result set at once. A server-side cursor lets PostgreSQL keep the result on its side and hand it to you in batches, so memory stays flat no matter how many rows there are.

Why the default behavior eats memory

By default, most client libraries run a query and pull the entire result set into client memory before your code sees the first row. For a hundred rows that is invisible. For ten million wide rows it is gigabytes, and the process dies before it can do any work.

This is a client-side decision, not a PostgreSQL limit. PostgreSQL is perfectly happy to produce rows incrementally; the driver is the one buffering everything.

DECLARE a cursor and FETCH in batches

A cursor tells PostgreSQL to hold the query open and give you rows on demand. You declare it inside a transaction, then fetch a manageable batch at a time, process it, and fetch the next.

Memory now scales with your batch size, not the result size. Whether the query returns ten thousand rows or ten billion, the export touches the same small amount of memory at any moment.

BEGIN;
DECLARE export_cur CURSOR FOR
  SELECT id, payload, created_at FROM events ORDER BY id;

FETCH 10000 FROM export_cur;   -- process this batch
FETCH 10000 FROM export_cur;   -- then the next
-- ... repeat until it returns no rows ...

CLOSE export_cur;
COMMIT;

Most drivers expose this for you

You rarely write raw DECLARE/FETCH. Drivers wrap it: psycopg calls it a named (server-side) cursor, JDBC enables it when you set a fetch size and disable autocommit, and most ORMs have a streaming or server-side cursor mode. The mechanism underneath is the same — the key is making sure you actually opt into it for big queries.

The planner adjusts for cursors

PostgreSQL knows a cursor is likely to be read incrementally and may not be fully consumed. The cursor_tuple_fraction setting tells it what fraction of rows you probably want, and it biases the planner toward plans that return the first rows quickly rather than plans that minimize total cost.

Usually the default is fine. If you always read the entire cursor, raising it tells the planner to optimize for the full scan instead of fast first rows.

The caveat: cursors keep a transaction open

A server-side cursor lives inside a transaction, which means that transaction stays open for the whole export. If the export takes hours, you now have a long-running transaction — which pins the vacuum horizon and can cause bloat, exactly the idle-in-transaction problem in slow motion.

For very long exports, an alternative is keyset pagination: instead of one long cursor, run many short queries that each fetch the next page by primary key, committing between them. It is more round trips, but no transaction stays open, so vacuum is never blocked.

  • Drivers buffer the whole result set by default — that is the OOM cause.
  • DECLARE/FETCH (or your driver's server-side cursor) streams in batches.
  • Memory scales with batch size, not result size.
  • cursor_tuple_fraction biases the plan toward fast first rows.
  • Cursors hold a transaction open; for very long runs, prefer keyset pagination.

The practical standard

The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.