
COUNT(*) at Scale in Postgres: When the Honest Answer is "It's Complicated"

A `SELECT count(*)` on a small table is instant. On a 500M-row table it can take a minute. Here is what to do when the count itself is the slow query.

The first time someone asks "why is COUNT(*) so slow," they have usually just run it on a 200M-row table and waited 90 seconds. The bug report follows: "the database is slow."

It is not slow. It is doing the only thing the query asked for. The answer is rarely to make COUNT(*) faster — it is to ask whether you actually need an exact count.

Here is the framework I use.

Why COUNT(*) cannot use an index alone

Postgres uses MVCC: a row can have several versions at once, and each version is visible to some transactions and not others. Indexes do not store that visibility information, so the database cannot answer COUNT(*) from an index alone; it has to consult the heap to determine which rows are currently visible.

The result: COUNT(*) on an unfiltered table is roughly an O(n) operation. On 500M rows, that is a lot of work no matter how fast your disk is.

The visibility map and index-only scans help: on a well-vacuumed table, an index-only scan can skip heap pages marked all-visible. But the map decays under ongoing writes, and even in the best case you are still walking an index with 500M entries. The query is doing what it has to do.
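
You can watch this in the plan. A sketch, using the running orders table; the exact node depends on your Postgres version and vacuum state:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders;
-- Expect a Seq Scan (or a Parallel Seq Scan feeding a Finalize Aggregate),
-- or an Index Only Scan whose "Heap Fetches" climbs as the visibility map
-- goes stale. Either way, the work is proportional to the row count.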

When exact COUNT is necessary

Genuine cases:

  • Pagination that needs a total page count.
  • Compliance reports.
  • Billing calculations.
  • Anywhere the user reads the number and trusts it to the digit.

For these, you have to pay the cost. Mitigations:

  • Run the COUNT on a read replica.
  • Cache the result for a few seconds at the application layer (or server-side; see the sketch after this list).
  • Run it asynchronously and report "loading..." while it works.
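
The server-side variant of the cache can be as small as a materialized view, refreshed by whatever scheduler you already run (pg_cron, a plain cron job):

CREATE MATERIALIZED VIEW orders_count AS
SELECT count(*) AS row_count FROM orders;

-- On a schedule: pay the full scan once per interval, not per page load.
REFRESH MATERIALIZED VIEW orders_count;

-- Page loads read the cached value:
SELECT row_count FROM orders_count;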

When approximate COUNT is fine

Most product UI cases. "This filter has about 1.2 million matching rows" is enough information for a search results page. The user does not care about 1,234,567 vs 1,234,568.

For unfiltered counts of a whole table, pg_class.reltuples is the cheap approximation:

SELECT reltuples::BIGINT AS estimate
FROM pg_class
WHERE oid = 'orders'::regclass;

The estimate is refreshed by VACUUM and ANALYZE, so autovacuum keeps it reasonably current. Accuracy is generally within a few percent on a healthy table, and the read costs microseconds. One caveat: on a table that has never been vacuumed or analyzed, recent Postgres versions store -1, meaning unknown.

For filtered counts, the EXPLAIN-based estimate works:

EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- Look at the rows= field in the output

You can extract this programmatically with EXPLAIN (FORMAT JSON) and parse the result. Some teams build a function that returns the estimated count from EXPLAIN; it is a 20-line plpgsql function that solves a real problem.
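
A sketch of that helper, close to the pattern on the Postgres wiki. count_estimate is not a built-in; the name and exact shape here are mine:

CREATE OR REPLACE FUNCTION count_estimate(query text)
RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
  plan json;
BEGIN
  -- Ask the planner, not the table; FORMAT JSON makes the output parseable.
  EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
  -- The top-level plan node carries the planner's row estimate.
  RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;
$$;

-- Only pass trusted SQL: the string is concatenated, not parameterized.
SELECT count_estimate('SELECT * FROM orders WHERE customer_id = 123');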

When pagination needs better than this

"Page 1 of 1234" pagination requires an exact count. "Page 1 of many" can use a relative count: "showing 25 of approximately 1.2M."

Keyset pagination removes the count problem entirely:

-- Instead of OFFSET, use the last row's id
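-- (requires an index on id; the scan seeks straight to $last_seen_id,
--  so cost does not grow with how deep the user pages)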
SELECT * FROM orders
WHERE id > $last_seen_id
ORDER BY id
LIMIT 25;

The UI shows "next page" without claiming to know the total. For most product views, this is the right tradeoff: faster, no count needed, scales to arbitrary table sizes.
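
One caveat worth a sketch: keyset needs a deterministic sort. If you order by a non-unique column, add the primary key as a tiebreaker; Postgres's row-value comparison keeps the predicate index-friendly, assuming an index on (created_at, id):

-- created_at alone can tie; (created_at, id) cannot
SELECT * FROM orders
WHERE (created_at, id) > ($last_seen_created_at, $last_seen_id)
ORDER BY created_at, id
LIMIT 25;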

The applications that genuinely need page-of-N pagination are rare. Search results, admin dashboards, sometimes audit views. For everything else, keyset is better.

Maintaining a counter table

For counts that change frequently and must be cheap to read, a counter table:

CREATE TABLE table_counts (
  table_name TEXT PRIMARY KEY,
  row_count BIGINT NOT NULL
);

-- Seed the row once; without it the trigger's UPDATE matches nothing
-- and the counter silently never exists.
INSERT INTO table_counts VALUES ('orders', (SELECT count(*) FROM orders));

-- Trigger to update on INSERT/DELETE
CREATE OR REPLACE FUNCTION increment_orders_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE table_counts SET row_count = row_count + 1 WHERE table_name = 'orders';
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE table_counts SET row_count = row_count - 1 WHERE table_name = 'orders';
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_count_trigger
AFTER INSERT OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION increment_orders_count();
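
Reading the count is then a single-row, primary-key lookup:

SELECT row_count FROM table_counts WHERE table_name = 'orders';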

The trade: every insert and delete now contends on a single row in table_counts, so under high concurrency that row becomes a serialization bottleneck. Note also that TRUNCATE does not fire row-level triggers; truncating orders silently desyncs the counter unless you add a statement-level ON TRUNCATE trigger to reset it.

For very high-throughput tables, the better shape is a table with multiple rows that are summed:

CREATE TABLE table_counts_sharded (
  table_name TEXT,
  shard SMALLINT,
  delta BIGINT NOT NULL,
  PRIMARY KEY (table_name, shard)
);

-- Upsert a delta into a random shard; sum the shards to read (sketch below)
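
Given that primary key, "insert a random shard" is really an upsert, and with a small fixed shard count you can sum at read time rather than compacting on a schedule. A sketch, with eight shards as an arbitrary choice:

-- In the trigger body: spread writes across shards instead of one hot row
-- (use a delta of -1 on DELETE).
INSERT INTO table_counts_sharded (table_name, shard, delta)
VALUES ('orders', floor(random() * 8)::smallint, 1)
ON CONFLICT (table_name, shard)
DO UPDATE SET delta = table_counts_sharded.delta + EXCLUDED.delta;

-- Reads sum a handful of rows, which stays cheap:
SELECT sum(delta) AS row_count
FROM table_counts_sharded
WHERE table_name = 'orders';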

This is more code, but it spreads the contention across the shards instead of funneling every write through one row.

What I do today

For any new feature that mentions a count:

  1. Ask: does the user really need an exact number, or an approximate one?
  2. If approximate: pg_class.reltuples or EXPLAIN-based estimate.
  3. If exact and rarely viewed: COUNT(*) on a read replica with a cache.
  4. If exact and frequently viewed: counter table, possibly sharded.
  5. If pagination: keyset, not OFFSET-based.

The rule that matters: do not let "the user wants a count" automatically mean "COUNT(*) on every page load." That is the path to slow pages, then to a slow database, then to an angry team trying to optimize the unoptimizable.