A data migration was projected to take 14 hours. The loader was an ORM doing one INSERT per row, dutifully, 30 million times, each one its own statement with its own round trip and its own WAL record and its own index updates.
We rewrote the load to use COPY into a staging table with no indexes, built the indexes once at the end, and merged into the target in batches. Total time: 22 minutes. Same data, same hardware, same final state.
Bulk loading is one of the places where doing it the obvious way is 30x slower than doing it the right way, and the right way is not complicated. It is just different from how you write normal application code.
COPY is not just a faster INSERT
COPY is a dedicated bulk-load path. It parses input in a tight loop, avoids the per-statement planning and round-trip overhead of individual INSERTs, and writes rows far more efficiently. For loading from a file or a stream, it is the fastest thing PostgreSQL offers, by a wide margin.
Most drivers expose it (COPY FROM STDIN), so you rarely need a file on the server. You stream rows into the connection and PostgreSQL ingests them in bulk. If you take one thing from this article: stop looping INSERTs and use COPY.
-- From a file the server can read
COPY events (id, user_id, payload, created_at)
FROM '/data/events.csv' WITH (FORMAT csv, HEADER true);
-- From a client stream (psql)
\copy events FROM 'events.csv' WITH (FORMAT csv, HEADER true)
If you must INSERT, batch hard
Sometimes COPY does not fit, for example when you need ON CONFLICT handling. In that case, never do one row per statement. Use multi-row INSERTs of hundreds to a few thousand rows per statement, inside a transaction. This collapses round trips and per-statement overhead, and it is the single biggest win available short of COPY.
There is a sweet spot: too few rows per batch leaves overhead on the table, too many makes statements unwieldy and memory-hungry. Hundreds to low thousands per statement is usually right.
-- One statement, many rows, with conflict handling
INSERT INTO events (id, user_id, payload)
VALUES
(1, 10, '...'),
(2, 11, '...'),
-- ... up to ~1000 rows ...
(1000, 9, '...')
ON CONFLICT (id) DO NOTHING;
Build indexes after, not during
Every index on the target table is maintained on every row you insert. Loading into a table with five indexes means five index updates per row, and index maintenance during a load is often the dominant cost. The fix is to load into an index-free table and build the indexes once at the end.
Building an index once over the finished data is far cheaper than incrementally maintaining it across millions of inserts. The same applies to foreign keys and CHECK constraints: add them after the load and let PostgreSQL validate the whole table in one pass.
-- Stage with no indexes/constraints, then add them once
CREATE TABLE events_staging (LIKE events INCLUDING DEFAULTS);
-- ... COPY 30M rows into events_staging ...
CREATE INDEX ON events_staging (user_id);
CREATE INDEX ON events_staging (created_at);
-- now swap/merge into the live table
maintenance_work_mem makes index builds fly
Index creation uses maintenance_work_mem, not work_mem. The default is conservative. For a big load, raising it for the session lets the index build sort in memory instead of spilling to disk, which can cut build time dramatically. Set it high just for the loading session, then leave the global default alone.
SET maintenance_work_mem = '2GB'; -- session-scoped for the load
CREATE INDEX events_user_idx ON events_staging (user_id);
Unlogged and staging tables
An unlogged table skips WAL, which makes writes much faster, at the cost that its contents are lost on a crash. That trade is perfect for a staging table whose data you can simply reload if something fails mid-migration. Load into unlogged staging, transform there, then insert into the real, logged table.
Be clear-eyed about the trade: unlogged tables are not replicated and do not survive a crash, so they are only for transient load-and-transform data, never for the destination. After staging, the move into the durable table goes through WAL as normal.
CREATE UNLOGGED TABLE events_staging (LIKE events INCLUDING DEFAULTS);
-- fast COPY here (no WAL), validate/transform, then:
INSERT INTO events SELECT * FROM events_staging;
Mind WAL and autovacuum during big loads
A huge load generates a flood of WAL and can trigger frequent checkpoints, which throttle the load. Giving the load room, larger max_wal_size and a longer checkpoint window for the duration, smooths it out. After the load, an explicit ANALYZE (and often a VACUUM) gets the table ready so the planner has fresh statistics and the visibility map is set.
Do not skip the post-load ANALYZE. A freshly loaded table with stale or empty statistics will get terrible plans until autovacuum catches up, and you do not want to find that out from the first user query.
- Use COPY; fall back to multi-row batched INSERTs only when you must.
- Load into an index-free, constraint-free staging table.
- Build indexes and add constraints once, after the data is in.
- Raise maintenance_work_mem for the index builds.
- Use unlogged staging for transient data; never for the destination.
- ANALYZE (and usually VACUUM) the table when the load finishes.
The practical standard
The best PostgreSQL performance work is boring in the right way. Name the failure mode, capture the before plan or metric, make one change, and compare the exact same signal afterward. Anything else is just a more confident guess.