Vector Databases: How I Decide Between pgvector and a Dedicated Vector Store
Two teams. One went straight to Pinecone for $4,200/month they didn't need; the other stayed on pgvector past the point it was wise. Here is the framework I use now.
Notes for the problems that show up after launch: bad plans, awkward migrations, index debt, vacuum pressure, replica lag, and the small decisions that make PostgreSQL easier to operate.
Two teams. One went straight to Pinecone for $4,200/month they didn't need; the other stayed on pgvector past the point it was wise. Here is the framework I use now.
EXPLAIN ANALYZE produces a wall of indented text and three numbers per node. Here is the actual reading order I use, and what each number means in practice.
A nested loop join is fast or slow depending on the row counts. Here is when it is the right plan, when it is not, and how to read the EXPLAIN output to tell.
Hash joins are the workhorse for large equi-joins. They are also the join type that quietly spills to disk when work_mem is too small, turning a fast query into a slow one.
A sort that fits in `work_mem` is fast. A sort that does not is 10x slower. Here is how to tell which case you are in and what to do about it.
What the planner does, the statistics that drive its decisions, and the diagnostic workflow for the moments when it picks the spectacularly wrong plan.
CTEs read beautifully and sometimes cost beautifully too. The behavior changed in Postgres 12, and most teams I work with have not updated their mental model.
Window functions are the right tool for ranked-per-group queries. They are also surprisingly fragile in performance. Here is how to write them so they actually scale.
GROUP BY queries either run instantly because Postgres can stream from a sorted index, or they sort 50GB on disk. Here is how to put yourself in the first camp.
DISTINCT ON is a Postgres-specific shortcut for "give me one row per group, sorted some way." It is fast, it is pleasant, and the version in your head is probably wrong.
OFFSET pagination gets slower the deeper you go. Keyset pagination is constant time. Here is how to switch over and the cases where you cannot.
A `SELECT count(*)` on a small table is instant. On a 500M-row table it can take a minute. Here is what to do when the count itself is the slow query.