One team I worked with had a products table with 40 indexes. Forty. The table had grown organically over 6 years, with every engineer adding "just one more index" to speed up their specific query. Nobody ever removed any.
Every INSERT into that table had to update 40 B-tree structures. Their write throughput was terrible and nobody could figure out why — because on paper, indexes only slow down writes a little, right? Sure, if you have 3-5 indexes. At 40, it was measurably killing their ingest pipeline.
We dropped 32 indexes. Writes got 4x faster overnight.
How B-tree Indexes Work (Briefly)
PostgreSQL's default index type is B-tree. It's a balanced tree structure where leaf nodes contain (key, heap_pointer) pairs pointing to actual table rows. Lookups traverse the tree in O(log n) — fast.
But every write to a table has to update every index on that table. An INSERT on a table with 10 indexes does 11 writes (one to the heap, one per index). And unless an UPDATE qualifies as a HOT update (no indexed columns changed and the new row version fits on the same heap page), it inserts a new entry into every index on the table — even indexes on columns the UPDATE didn't touch. The old entries linger as dead tuples until vacuum cleans them up.
This is why the number of indexes matters. Every index you add is a write tax on every INSERT and UPDATE. Make sure that tax is worth paying.
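One quick way to see how heavy that tax is on your own database is to compare each table's index footprint to its heap size. A sketch using the system catalogs (schema filter is an assumption — adjust for your setup):

```sql
-- How many indexes each user table carries, and how much they weigh
-- relative to the table itself
SELECT
    c.relname AS table_name,
    count(i.indexrelid) AS index_count,
    pg_size_pretty(pg_table_size(c.oid)) AS table_size,
    pg_size_pretty(pg_indexes_size(c.oid)) AS total_index_size
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.oid
WHERE c.relkind = 'r'
  AND c.relnamespace::regnamespace::text NOT IN ('pg_catalog', 'information_schema')
GROUP BY c.oid, c.relname
ORDER BY pg_indexes_size(c.oid) DESC
LIMIT 20;
```

A table whose indexes outweigh its heap several times over is a good candidate for the audit below.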
Other Index Types
GIN indexes are for full-text search and arrays. They're expensive to build and update, but great for the @> and @@ operators.
GiST indexes are for geometric types, range types, and custom types. Good for things like IP range lookups or overlapping date ranges.
BRIN indexes are for very large tables with naturally ordered data (like append-only time-series). They're tiny and approximate, but can dramatically reduce I/O.
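A few concrete examples of each (table and column names here are hypothetical — substitute your own):

```sql
-- GIN: array containment and full-text search
CREATE INDEX idx_posts_tags ON posts USING gin (tags);
CREATE INDEX idx_docs_fts ON docs USING gin (to_tsvector('english', body));

-- GiST: overlap queries on a range column, e.g. double-booking checks
CREATE INDEX idx_bookings_period ON bookings USING gist (period);

-- BRIN: append-only time-series; the index stays tiny
CREATE INDEX idx_metrics_recorded_at ON metrics USING brin (recorded_at);
```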
Finding Unused Indexes
SELECT
    s.schemaname,
    s.relname AS table_name,
    s.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    s.idx_scan AS scans_since_reset,
    s.idx_tup_read,
    s.idx_tup_fetch
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;
Important caveat: the counters in pg_stat_user_indexes are cumulative since the last statistics reset — a pg_stat_reset() call, a crash recovery, or a failover to a replica zeroes them. An index with idx_scan = 0 might be legitimately unused — or the counters might simply not yet cover the traffic patterns that would use it.
I typically wait 2-4 weeks of normal production load before dropping indexes based on this data. Also, some indexes are only used during rare events (end-of-month reporting, annual audits). Don't drop those.
Before dropping, check:
-- Is this a unique constraint enforcing data integrity?
SELECT indisunique, indisprimary, indisexclusion
FROM pg_index
WHERE indexrelid = 'your_index_name'::regclass;
Never drop unique indexes without understanding what data constraint they're enforcing.
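When you do decide to drop one, use DROP INDEX CONCURRENTLY so you don't take a lock that blocks writes while the drop waits for in-flight queries (the index name here is hypothetical):

```sql
-- Cannot run inside a transaction block
DROP INDEX CONCURRENTLY idx_orders_legacy_status;
```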
Finding Duplicate Indexes
SELECT
    i1.indexrelid::regclass AS idx1,
    i2.indexrelid::regclass AS idx2,
    t.relname AS table_name,
    pg_size_pretty(pg_relation_size(i1.indexrelid)) AS idx1_size,
    pg_size_pretty(pg_relation_size(i2.indexrelid)) AS idx2_size
FROM pg_index i1
JOIN pg_index i2 ON i1.indrelid = i2.indrelid
    AND i1.indexrelid < i2.indexrelid        -- each pair once, never self-joined
    AND i1.indkey::text = i2.indkey::text    -- same columns, same order
    -- don't flag a partial index as a duplicate of a full one
    AND pg_get_expr(i1.indpred, i1.indrelid) IS NOT DISTINCT FROM
        pg_get_expr(i2.indpred, i2.indrelid)
JOIN pg_class t ON t.oid = i1.indrelid;
This finds indexes on the exact same columns in the same order. But there are subtler duplicates:
-- (a, b) index makes a standalone (a) index redundant for most queries
-- Find leading-column overlaps manually
SELECT indexname, indexdef FROM pg_indexes
WHERE tablename = 'your_table'
ORDER BY indexname;
An index on (user_id, created_at) makes a standalone index on just user_id redundant for queries filtering on user_id. The composite index can handle those queries too. The standalone user_id index is just dead weight.
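A sketch of the pattern, with hypothetical names:

```sql
-- This composite index...
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);

-- ...serves both of these query shapes via its leading column:
--   SELECT * FROM orders WHERE user_id = $1;
--   SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC;

-- ...so a standalone index on just user_id can go:
DROP INDEX CONCURRENTLY idx_orders_user;
```

Note the reverse doesn't hold: a standalone (a) index does not help queries that filter on b alone, and (a, b) doesn't help queries filtering only on b either.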
Index Bloat and REINDEX CONCURRENTLY
Indexes, like tables, accumulate bloat. When rows are updated or deleted, the old index entries become dead but stay in the index until a VACUUM pass cleans them up. Over time, a heavily written index can end up with more than half of its pages mostly empty.
Check index bloat. The usage counters in pg_stat_user_indexes can't estimate bloat — they track reads, not dead space — so start by listing your largest indexes as candidates:
SELECT
    s.relname AS table_name,
    s.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    s.idx_scan
FROM pg_stat_user_indexes s
ORDER BY pg_relation_size(s.indexrelid) DESC
LIMIT 20;
For exact bloat, use pgstattuple:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstatindex('your_index_name');
-- Look at: avg_leaf_density and leaf_fragmentation
A freshly built B-tree index has an avg_leaf_density around 90%. If it has dropped well below that, or leaf_fragmentation climbs past roughly 30-40%, the index is worth rebuilding.
To rebuild without locking the table:
REINDEX INDEX CONCURRENTLY your_index_name;
-- Or rebuild all indexes on a table:
REINDEX TABLE CONCURRENTLY your_table;
CONCURRENTLY (available for REINDEX since PostgreSQL 12) builds a new index in the background and swaps it in without blocking reads or writes. It takes longer and needs enough free disk for a second copy of the index while it runs, but it's safe for production traffic.
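One failure mode to watch for: if a CONCURRENTLY build or rebuild is interrupted, it can leave an invalid index behind — one that still costs every write but serves no reads. You can check for leftovers like this:

```sql
-- Invalid indexes left by failed CONCURRENTLY operations
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
-- Drop and recreate any that show up
```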
Partial Indexes: The Underused Superpower
A partial index only indexes rows matching a WHERE clause. This is massively underused.
-- Instead of indexing all orders, only index active ones
CREATE INDEX CONCURRENTLY idx_orders_active_user
ON orders (user_id, created_at)
WHERE status != 'completed';
-- Only index unread notifications
CREATE INDEX CONCURRENTLY idx_notifications_unread
ON notifications (user_id)
WHERE read_at IS NULL;
If 95% of your orders are 'completed' and queries almost never touch completed orders, why index them? The partial index is 20x smaller, fits better in cache, and is faster to update.
I once reduced an index from 8GB to 400MB with a partial index. Same query performance, 20x less disk space, much faster writes.
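One catch worth knowing: the planner only uses a partial index when the query's WHERE clause provably implies the index predicate. Using the idx_notifications_unread example above:

```sql
-- Uses the partial index (predicate is implied):
SELECT id FROM notifications
WHERE user_id = 42 AND read_at IS NULL;

-- Does NOT use it — this query might need read notifications too:
SELECT id FROM notifications
WHERE user_id = 42;
```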
Expression Indexes
Index the result of an expression, not just a column value:
-- Case-insensitive email lookup
CREATE INDEX CONCURRENTLY idx_users_email_lower
ON users (lower(email));
-- Date-only index (ignoring time component)
CREATE INDEX CONCURRENTLY idx_events_date
ON events (date_trunc('day', created_at));
For the first one, queries must use WHERE lower(email) = '[email protected]' to hit the index. If your query uses WHERE email = '[email protected]', it won't. The query and the index expression must match.
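It's worth verifying with EXPLAIN rather than assuming. A quick sketch, using the hypothetical index above:

```sql
EXPLAIN (COSTS OFF)
SELECT id FROM users WHERE lower(email) = '[email protected]';
-- Expect something like:
--   Index Scan using idx_users_email_lower on users
-- The same query with a plain "WHERE email = ..." falls back
-- to whatever other plan is available
```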
Covering Indexes
A covering index includes extra columns beyond the search columns, so PostgreSQL can answer the query entirely from the index without touching the heap:
-- Query: SELECT name, email FROM users WHERE user_id = $1
-- Standard index on user_id requires heap lookup to get name and email
-- Covering index eliminates the heap lookup:
CREATE INDEX CONCURRENTLY idx_users_id_covering
ON users (user_id) INCLUDE (name, email);
The INCLUDE columns aren't used for searching — they're just stored in the index leaf nodes. When every column the query needs is in the index, EXPLAIN shows an Index Only Scan instead of an Index Scan, skipping the heap fetch entirely — provided the visibility map is reasonably up to date, which is VACUUM's job.
Covering indexes make sense when:
- The included columns are frequently queried alongside the search columns
- The table has high concurrency (heap fetches cause random I/O)
- You're trying to turn specific hot-path Index Scans into Index Only Scans
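To confirm a covering index is actually paying off, check the plan with ANALYZE — the Heap Fetches counter should be near zero on a recently vacuumed table:

```sql
EXPLAIN (ANALYZE, COSTS OFF)
SELECT name, email FROM users WHERE user_id = 42;
-- Look for "Index Only Scan using idx_users_id_covering"
-- and a low "Heap Fetches:" number; a high number means
-- the visibility map is stale and the table needs a VACUUM
```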
MonPG tracks index usage statistics over time — you can spot indexes going from frequently used to unused when query patterns change. That's often the sign that an application change made an index redundant.
The 40-index team? They now have 8 indexes on that table, a documented index review process, and significantly less grey hair.