
COUNT(*) at Scale in Postgres: When the Honest Answer is "It's Complicated"

A `SELECT count(*)` on a small table is instant. On a 500M-row table it can take a minute. Here is what to do when the count itself is the slow query.

The first time someone asks "why is COUNT(*) so slow," they have usually just run it on a 200M-row table and waited 90 seconds. The bug report follows: "the database is slow."

It is not slow. It is doing the only thing the query asked for. The answer is rarely to make COUNT(*) faster — it is to ask whether you actually need an exact count.

Here is the framework I use.

Why COUNT(*) cannot use an index alone

Postgres uses MVCC: a row can have several versions at once, and each version is visible to some transactions and not others. Indexes do not store that visibility information, so the database cannot answer COUNT(*) from an index alone; it has to consult the heap to determine which rows are currently visible.

The result: COUNT(*) on an unfiltered table is roughly an O(n) operation. On 500M rows, that is a lot of work no matter how fast your disk is.

The visibility map and index-only scans help: on a well-vacuumed table, an index-only scan can skip heap pages marked all-visible. But the map decays under ongoing writes, and even in the best case you are still walking an index with 500M entries. The query is doing what it has to do.
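
You can watch this in the plan. A sketch, using the running orders table; the exact node depends on your Postgres version and vacuum state:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders;
-- Expect a Seq Scan (or a Parallel Seq Scan feeding a Finalize Aggregate),
-- or an Index Only Scan whose "Heap Fetches" climbs as the visibility map
-- goes stale. Either way, the work is proportional to the row count.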

When exact COUNT is necessary

Genuine cases:

  • Pagination that needs a total page count.
  • Compliance reports.
  • Billing calculations.
  • Anywhere the user reads the number and trusts it to the digit.

For these, you have to pay the cost. Mitigations:

  • Run the COUNT on a read replica.
  • Cache the result for a few seconds at the application layer (or server-side; see the sketch after this list).
  • Run it asynchronously and report "loading..." while it works.
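
The server-side variant of the cache can be as small as a materialized view, refreshed by whatever scheduler you already run (pg_cron, a plain cron job):

CREATE MATERIALIZED VIEW orders_count AS
SELECT count(*) AS row_count FROM orders;

-- On a schedule: pay the full scan once per interval, not per page load.
REFRESH MATERIALIZED VIEW orders_count;

-- Page loads read the cached value:
SELECT row_count FROM orders_count;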

When approximate COUNT is fine

Most product UI cases. "This filter has about 1.2 million matching rows" is enough information for a search results page. The user does not care about 1,234,567 vs 1,234,568.

For unfiltered counts of a whole table, pg_class.reltuples is the cheap approximation:

SELECT reltuples::BIGINT AS estimate
FROM pg_class
WHERE oid = 'orders'::regclass;

The estimate is refreshed by VACUUM and ANALYZE, so autovacuum keeps it reasonably current. Accuracy is generally within a few percent on a healthy table, and the read costs microseconds. One caveat: on a table that has never been vacuumed or analyzed, recent Postgres versions store -1, meaning unknown.

For filtered counts, the EXPLAIN-based estimate works:

EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- Look at the rows= field in the output

You can extract this programmatically with EXPLAIN (FORMAT JSON) and parse the result. Some teams build a function that returns the estimated count from EXPLAIN; it is a 20-line plpgsql function that solves a real problem.
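
A sketch of that helper, close to the pattern on the Postgres wiki. count_estimate is not a built-in; the name and exact shape here are mine:

CREATE OR REPLACE FUNCTION count_estimate(query text)
RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
  plan json;
BEGIN
  -- Ask the planner, not the table; FORMAT JSON makes the output parseable.
  EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
  -- The top-level plan node carries the planner's row estimate.
  RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;
$$;

-- Only pass trusted SQL: the string is concatenated, not parameterized.
SELECT count_estimate('SELECT * FROM orders WHERE customer_id = 123');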

When pagination needs better than this

"Page 1 of 1234" pagination requires an exact count. "Page 1 of many" can use a relative count: "showing 25 of approximately 1.2M."

Keyset pagination removes the count problem entirely:

-- Instead of OFFSET, use the last row's id
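-- (requires an index on id; the scan seeks straight to $last_seen_id,
--  so cost does not grow with how deep the user pages)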
SELECT * FROM orders
WHERE id > $last_seen_id
ORDER BY id
LIMIT 25;

The UI shows "next page" without claiming to know the total. For most product views, this is the right tradeoff: faster, no count needed, scales to arbitrary table sizes.
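
One caveat worth a sketch: keyset needs a deterministic sort. If you order by a non-unique column, add the primary key as a tiebreaker; Postgres's row-value comparison keeps the predicate index-friendly, assuming an index on (created_at, id):

-- created_at alone can tie; (created_at, id) cannot
SELECT * FROM orders
WHERE (created_at, id) > ($last_seen_created_at, $last_seen_id)
ORDER BY created_at, id
LIMIT 25;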

The applications that genuinely need page-of-N pagination are rare. Search results, admin dashboards, sometimes audit views. For everything else, keyset is better.

Maintaining a counter table

For counts that change frequently and must be cheap to read, a counter table:

CREATE TABLE table_counts (
  table_name TEXT PRIMARY KEY,
  row_count BIGINT NOT NULL
);

-- Seed the row once; without it the trigger's UPDATE matches nothing
-- and the counter silently never exists.
INSERT INTO table_counts VALUES ('orders', (SELECT count(*) FROM orders));

-- Trigger to update on INSERT/DELETE
CREATE OR REPLACE FUNCTION increment_orders_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE table_counts SET row_count = row_count + 1 WHERE table_name = 'orders';
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE table_counts SET row_count = row_count - 1 WHERE table_name = 'orders';
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_count_trigger
AFTER INSERT OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION increment_orders_count();
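
Reading the count is then a single-row, primary-key lookup:

SELECT row_count FROM table_counts WHERE table_name = 'orders';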

The trade: every insert and delete now contends on a single row in table_counts, so under high concurrency that row becomes a serialization bottleneck. Note also that TRUNCATE does not fire row-level triggers; truncating orders silently desyncs the counter unless you add a statement-level ON TRUNCATE trigger to reset it.

For very high-throughput tables, the better shape is a table with multiple rows that are summed:

CREATE TABLE table_counts_sharded (
  table_name TEXT,
  shard SMALLINT,
  delta BIGINT NOT NULL,
  PRIMARY KEY (table_name, shard)
);

-- Upsert a delta into a random shard; sum the shards to read (sketch below)
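
Given that primary key, "insert a random shard" is really an upsert, and with a small fixed shard count you can sum at read time rather than compacting on a schedule. A sketch, with eight shards as an arbitrary choice:

-- In the trigger body: spread writes across shards instead of one hot row
-- (use a delta of -1 on DELETE).
INSERT INTO table_counts_sharded (table_name, shard, delta)
VALUES ('orders', floor(random() * 8)::smallint, 1)
ON CONFLICT (table_name, shard)
DO UPDATE SET delta = table_counts_sharded.delta + EXCLUDED.delta;

-- Reads sum a handful of rows, which stays cheap:
SELECT sum(delta) AS row_count
FROM table_counts_sharded
WHERE table_name = 'orders';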

This is more code, but it spreads the contention across the shards instead of funneling every write through one row.

What I do today

For any new feature that mentions a count:

  1. Ask: does the user really need an exact number, or an approximate one?
  2. If approximate: pg_class.reltuples or EXPLAIN-based estimate.
  3. If exact and rarely viewed: COUNT(*) on a read replica with a cache.
  4. If exact and frequently viewed: counter table, possibly sharded.
  5. If pagination: keyset, not OFFSET-based.

The rule that matters: do not let "the user wants a count" automatically mean "COUNT(*) on every page load." That is the path to slow pages, then to a slow database, then to an angry team trying to optimize the unoptimizable.