5 min read

BRIN Indexes in Postgres: Tiny, Underrated, and Sometimes Exactly Right

BRIN indexes are 100x smaller than B-tree on the right data. They are also useless on the wrong data. The deciding factor is whether the table is naturally sorted.

BRIN (Block Range INdex) is one of the most situational index types in Postgres. Used correctly, it is the smallest, cheapest index that gets the job done. Used incorrectly, it is useless. The whole game is whether the table is naturally sorted.

For time-series data and append-only logs, BRIN can be 100x smaller than a B-tree on the same column with comparable read performance. For randomly-ordered data, it is essentially worthless.

How BRIN works

A standard B-tree index records the value-to-row mapping for every row. A BRIN index records, for each block range (default 128 pages), the minimum and maximum values within that range.

Querying "WHERE created_at > X":

  • B-tree: walks the tree, finds matching rows, fetches them.
  • BRIN: scans the per-range summaries, finds ranges whose [min, max] contains values matching the predicate, scans those ranges sequentially.

BRIN's win: the index itself is tiny because it only stores per-range summaries. For a 100GB table with default 128-page ranges, the BRIN index is on the order of megabytes — the entire index fits in cache.

BRIN's cost: for each range that might contain matching rows, you have to scan the whole range. If the data is naturally clustered (early rows have early values, later rows have later values), most ranges are skipped. If the data is randomly ordered, every range might contain a matching row, and BRIN devolves into a sequential scan.

The right data for BRIN

The canonical case: time-series data where rows are inserted in chronological order. An events table where the first row has the smallest created_at and the last row has the largest.

CREATE INDEX events_created_at_brin
  ON events USING BRIN (created_at);

For SELECT * FROM events WHERE created_at > now() - interval '1 hour', the BRIN index identifies the few ranges at the end of the table that contain recent rows. The query scans only those ranges. Fast.

The index size on a 50GB events table is typically under 1MB. Compare to a B-tree on the same column at potentially 5GB.

When BRIN is wrong

If the data is not naturally sorted, BRIN does not help. Examples:

  • A users table with a created_at column. Users are inserted in chronological order, so this works.
  • A users table with a last_login column. Updated frequently, no relationship to physical order. BRIN is useless here.
  • Any column where the data has no relationship to insertion order.

The diagnostic: if the table is naturally sorted by the column, BRIN works. Otherwise B-tree.

You can verify natural sorting by running:

SELECT correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';

Values near 1.0 (or -1.0) indicate strong sorting; values near 0 indicate no relationship between row order and column value. BRIN is effective when correlation is above ~0.9.

Tuning the range size

The default pages_per_range = 128 is fine for most cases. For tables where the natural ordering is very tight (every block has a narrow range of values), smaller ranges give finer-grained skipping:

CREATE INDEX events_created_at_brin
  ON events USING BRIN (created_at)
  WITH (pages_per_range = 32);

Smaller ranges → larger index → finer query precision. Tradeoff to test on actual data.

BRIN with multiple columns

BRIN can index multiple columns:

CREATE INDEX events_brin
  ON events USING BRIN (created_at, user_id)
  WITH (pages_per_range = 64);

For each range, BRIN stores [min, max] for each column independently. A query with predicates on both columns benefits from skipping ranges where either column rules them out.

This works when both columns have natural ordering. Often only one does, in which case the multi-column BRIN is no better than single-column.

Maintenance: BRIN sees stale ranges

BRIN's per-range summaries do not auto-update when rows are inserted, updated, or deleted. Running:

SELECT brin_summarize_new_values('events_created_at_brin');

Updates ranges that have new data. Postgres also calls this automatically during VACUUM.

For very high-write tables, BRIN can fall behind. Periodic brin_summarize_new_values (or aggressive autovacuum) keeps it current.

A real comparison

A team I worked with had a 200GB events table. They had a B-tree index on created_at at 18GB.

We replaced it with a BRIN:

CREATE INDEX events_created_at_brin
  ON events USING BRIN (created_at);

DROP INDEX events_created_at_btree;

New index size: 4MB. Read performance for date-range queries: same. Write performance: meaningfully better because the BRIN does not need to rebuild posting lists on every insert.

The disk space savings paid for two months of an additional database server.

When I reach for BRIN

The checklist:

  1. Is the table append-only or near-append-only?
  2. Does the indexed column correlate strongly with insertion order?
  3. Is the table at least a few GB? (Below this, B-tree's overhead is not meaningful.)
  4. Are queries doing range scans (>, <, BETWEEN) rather than equality lookups?

All four → BRIN. Any of them missing → B-tree.

The Postgres docs sometimes describe BRIN as "for very large tables." That is correct, but "with naturally-sorted data" is the more important qualifier. A 200GB table of randomly-ordered rows is not a BRIN candidate. A 50GB table of strictly chronological events is.

It is the index type I recommend most often after B-tree, and the one most teams under-use because they have not encountered it before.