The rule for composite index column order is famously misquoted. "Most selective column first" is the version most people remember. It is also wrong about half the time.
The actual rule depends on the queries you run, not on column statistics. Here is what I tell teams.
How composite indexes are organized
A B-tree index on (a, b) is sorted first by a, then by b. To find rows where a = 1, Postgres jumps to the section of the index where a = 1 and scans linearly. Within that section, rows are also sorted by b, so adding b = 'foo' is a refinement of the same scan.
The key consequence: an index on (a, b) can answer queries that filter on a alone, or on a and b together. It cannot answer queries that filter on b alone — Postgres would have to scan the entire index because rows with the same b are scattered across all a values.
This is the actual rule. Every other heuristic is a corollary.
The rule, properly stated
For an index on (col_X, col_Y):
- Queries with
WHERE col_X = ?use the index. - Queries with
WHERE col_X = ? AND col_Y = ?use the index even better. - Queries with
WHERE col_Y = ?(alone) cannot use this index.
The leading column is the one that appears in every query that should use the index. Everything else follows.
A practical example
Suppose your most common queries are:
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped';
SELECT * FROM orders WHERE customer_id = 123 AND created_at > now() - interval '30 days';
All three filter on customer_id. The right composite index has customer_id as the leading column:
CREATE INDEX orders_customer_created
ON orders (customer_id, created_at DESC);
This serves all three queries. The third one (with created_at filter) gets the maximum benefit because Postgres can do an index range scan.
If your most common queries had been the opposite:
SELECT * FROM orders WHERE created_at > now() - interval '30 days';
SELECT * FROM orders WHERE created_at > now() - interval '30 days' AND status = 'shipped';
Then created_at should be the leading column.
The selectivity heuristic, properly applied
The "most selective first" rule is sometimes useful, but only as a tiebreaker. If two columns are both equality-filtered and both equally common in queries, the more selective one as leading column is better — because the initial seek narrows down to a smaller subset.
But the dominant factor is which queries actually exist. If status = ? only appears in 5% of queries and customer_id = ? appears in 95%, the leading column is customer_id regardless of selectivity.
Selectivity matters only when the queries are tied on prevalence.
Inequalities and the trailing column
A detail that catches people: if your query has an equality on one column and a range on another, the equality should lead.
-- This query: equality on customer_id, range on created_at
SELECT * FROM orders
WHERE customer_id = 123
AND created_at >= now() - interval '30 days';
-- Right index:
CREATE INDEX orders_customer_created
ON orders (customer_id, created_at);
-- Wrong index (range column first):
CREATE INDEX orders_created_customer
ON orders (created_at, customer_id);
With the right index, Postgres seeks to customer_id = 123 and scans only the recent rows. With the wrong index, the range scan over all customers' recent orders is wider than necessary.
General rule: equality columns first, then range/inequality columns last.
Direction of trailing columns
For ORDER BY queries, the index direction matters:
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;
-- Index that supports the ORDER BY:
CREATE INDEX orders_customer_created_desc
ON orders (customer_id, created_at DESC);
With created_at DESC in the index, the query reads rows in pre-sorted order without an extra sort step. With created_at ASC, Postgres has to scan and reverse, which works but is slower.
For most tables, the most-recent direction is the most common, so DESC on time columns is the default.
How wide should the index be?
Including more columns makes the index larger. Diminishing returns kick in fast:
- 2 columns: usually a clear win for queries that match.
- 3 columns: sometimes useful, especially for narrow ORDER BY.
- 4+ columns: rarely worth it; the index gets large and only helps queries that include all four.
For each additional column, ask: does this column appear in the WHERE / ORDER BY of a hot query? If yes, add it. If no, do not.
INCLUDE for covered queries
A newer feature worth knowing: INCLUDE adds columns to the index leaf nodes without making them part of the sort key. The index can serve queries that need those columns without going to the heap.
CREATE INDEX orders_customer_covered
ON orders (customer_id, created_at DESC)
INCLUDE (total_cents, status);
A query like SELECT customer_id, created_at, total_cents, status FROM orders WHERE customer_id = 123 can answer entirely from the index. No heap fetch. Index-only scan, much faster.
The trade: the index is bigger. For hot read paths, often worth it.
Three diagnostic questions
When designing a composite index, ask:
- What queries should this index serve? List them.
- Which columns appear in every one of those queries? Those are leading-column candidates.
- Which columns are filtered by equality vs by range? Equalities lead.
The answer falls out from there. Most of the time, two or three queries share enough structure that a single composite index serves all of them. When the queries diverge significantly, you might need two separate indexes.
The failure mode I see most: adding three single-column indexes when a two-column composite would have handled all three queries faster and cheaper. Composite indexes are usually the right answer for related queries; single-column indexes are for unrelated ones.