The last team I worked with had a 200-line subquery to get "the most recent order per customer." Window functions, CTEs, joins — the whole kitchen. The same query in DISTINCT ON form was four lines. Faster too, when the index was right.
DISTINCT ON is a Postgres-specific feature, not standard SQL, which is probably why people forget it exists. Here is what it does and where it shines.
What it actually does
DISTINCT ON returns one row per unique combination of the columns in the parentheses, choosing which row using the ORDER BY clause:
SELECT DISTINCT ON (customer_id)
customer_id, id, total_cents, created_at
FROM orders
ORDER BY customer_id, created_at DESC;
For each customer_id, this returns the row with the highest created_at. One row per customer, sorted to the most recent.
The key constraint that most people miss: the columns in DISTINCT ON must appear at the start of ORDER BY. The DISTINCT ON columns determine which group; the trailing ORDER BY columns determine which row of the group wins.
The trap: incomplete ORDER BY
-- WRONG: which row wins is undefined
SELECT DISTINCT ON (customer_id)
customer_id, total_cents
FROM orders;
Without an ORDER BY, the row picked from each group is arbitrary. Postgres returns whatever it scanned first. This produces non-deterministic results that are technically valid SQL and almost always wrong for the use case.
Always include the full ORDER BY:
SELECT DISTINCT ON (customer_id)
customer_id, total_cents
FROM orders
ORDER BY customer_id, created_at DESC;
Now the row picked from each group is the most recent.
DISTINCT ON vs window functions
The equivalent window-function approach:
SELECT customer_id, id, total_cents, created_at
FROM (
SELECT *, row_number() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders
) t
WHERE rn = 1;
Functionally equivalent. DISTINCT ON wins on:
- Readability. Four lines vs seven, less indirection.
- Performance with the right index. DISTINCT ON can use an index scan that emits rows in pre-sorted order and skips ahead to the next group. Window functions sometimes cannot.
- No subquery overhead. The plan is simpler.
Window functions win when you need more than one row per group, or when you need rank-style logic (top 3 per customer). DISTINCT ON only does "top 1."
The index that makes it fast
For DISTINCT ON (customer_id) ... ORDER BY customer_id, created_at DESC, the matching index is:
CREATE INDEX orders_customer_created
ON orders (customer_id, created_at DESC);
With this index, Postgres can:
- Scan the index in order.
- For each new
customer_id, take the first row. - Skip ahead to the next
customer_id(using the index's group structure).
This is much faster than scanning the whole table and sorting. The plan node to look for in EXPLAIN is Unique over an Index Scan.
For a million-row orders table with 100,000 customers, this query runs in milliseconds. Without the index, it has to sort the entire table.
A few practical examples
Last login per user:
SELECT DISTINCT ON (user_id) user_id, ip, logged_in_at
FROM login_events
ORDER BY user_id, logged_in_at DESC;
Most expensive order per customer:
SELECT DISTINCT ON (customer_id) customer_id, id, total_cents
FROM orders
ORDER BY customer_id, total_cents DESC;
First version of each document:
SELECT DISTINCT ON (document_id) document_id, version_id, content
FROM document_versions
ORDER BY document_id, created_at ASC;
Latest event of each type:
SELECT DISTINCT ON (event_type) event_type, payload, created_at
FROM events
ORDER BY event_type, created_at DESC;
In each case, four lines for what would otherwise be a more complex query.
What it does not do well
- Top N per group. DISTINCT ON returns exactly one. For top 3, use window functions with
ROW_NUMBER(). - Aggregation. DISTINCT ON picks one row; it does not compute sums or counts. Use GROUP BY for those.
- Cross-row comparisons. "Each row plus the previous row's timestamp" is a window function with
LAG, not DISTINCT ON.
When the team encounters DISTINCT ON for the first time
It looks weird. The ORDER BY rule is non-obvious. The lack of ANSI SQL portability bothers some people. Worth doing a brief team explanation when it first appears in a PR — a five-minute walkthrough of the syntax saves hours of "why does this work" later.
My default recommendation: when the query shape is "one row per X, picked by Y," reach for DISTINCT ON. It is shorter, it is faster with the right index, and it expresses intent more directly than the alternatives.