pgvector and RAG15 min read

Vector Databases: How I Decide Between pgvector and a Dedicated Vector Store

The hard vector database decision is not pgvector versus Pinecone on a checklist. It is whether your filters, recall target, update rate, and incident budget still fit inside Postgres.

The first vector database decision I helped with was not really a database decision. It was a deadline decision. A team had six weeks to ship a RAG feature, one engineer had already used Pinecone on a side project, and nobody wanted to spend two days proving whether Postgres could handle the workload.

They signed a one-year contract before they knew their corpus size, filter selectivity, or recall target. Eight months later, the dedicated vector store was serving queries that a 30GB pgvector index could have handled inside the database they already operated.

A different team made the opposite mistake. They stayed on pgvector after the product had become a retrieval system in its own right: hundreds of millions of chunks, strict p99, noisy tenants, aggressive re-embedding, and filtered search where missing one document changed the answer. Their cloud bill looked cheap. Their engineering calendar did not.

Both teams had reasonable instincts. The failure was choosing the tool before measuring the shape of the pain.

The framework starts with the query, not the vendor

I do not start with "pgvector or dedicated?" I start with the exact serving query. The query tells you more than the architecture diagram.

SELECT id, document_id, body
FROM document_chunks
WHERE organization_id = $1
  AND acl_group_id = ANY($2)
  AND language = $3
  AND is_deleted = false
ORDER BY embedding <=> $4
LIMIT 12;

If that is close to your workload, you are not doing "vector search" in the abstract. You are doing filtered retrieval under product rules. Tenant boundaries, permissions, language, freshness, and deletion state are part of correctness.

This is why generic vector benchmarks can mislead teams. They usually measure nearest-neighbor search on a clean corpus. Production measures whether the right user gets the right chunks after the filters have removed most of the candidates.

What pgvector is good at

pgvector is a strong default when the vector rows are still part of the application data model. You get transactions, backups, joins, point-in-time recovery, SQL access control, and one operational surface. For early RAG systems, internal search, support assistants, and document products still learning what "good retrieval" means, that simplicity is worth a lot.

The best pgvector deployments I see are boring in the right way:

  • The corpus is small enough that exact search or a measured approximate index meets the latency target.
  • The filters are relational and understood by the Postgres planner.
  • The team has recall tests, not just latency graphs.
  • Re-embedding jobs are scheduled and observable.
  • The database still has headroom for the OLTP workload.

The dangerous version is treating pgvector as a way to avoid designing a retrieval service forever. It is a simplification, not an escape from measurement.

What changes when you add approximate indexes

By default, pgvector can do exact nearest-neighbor search. Exact search is simple to reason about because recall is perfect. The cost is that it can get too slow as the corpus grows.

Approximate indexes such as HNSW and IVFFlat trade some recall for speed. That trade is the whole point, but it is also where product risk enters. After adding an approximate index, the result set can change.

CREATE INDEX document_chunks_embedding_hnsw
ON document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

The operational question is not "did the query get faster?" It is "did it stay correct enough for this product?" A support assistant can tolerate a different top-12 than an incident runbook assistant. A legal search product may need a different bar again.

The first cliff is filtered recall

Filtered recall is where pgvector decisions get real. With approximate indexes, filters can be applied after the index scan. If a filter matches a small percentage of rows, many candidate rows may be discarded after the ANN search has already done its work.

That creates a failure mode that looks strange from the product side: latency is acceptable, but the answer is weak or incomplete. The query asked for 12 chunks. It got 4 useful chunks and the model filled the rest with confidence.

This is the part I measure before recommending either path:

-- Exact reference on a representative query set.
BEGIN;
SET LOCAL enable_indexscan = off;

SELECT id
FROM document_chunks
WHERE organization_id = $1
  AND acl_group_id = ANY($2)
  AND is_deleted = false
ORDER BY embedding <=> $3
LIMIT 20;
ROLLBACK;

Then I compare the production indexed path against that reference. Track recall@k, result count, p95, p99, and tenant. If the global average is fine but small tenants or narrow permission groups fail, the average is hiding the bug.

What I try before leaving Postgres

I usually try to make pgvector honest before moving the workload out.

  • B-tree indexes for selective filters: if a filter narrows to a small subset, exact filtered search can be good enough.
  • Partial vector indexes: useful when a small number of filter values dominate the workload.
  • Partitioning: better than one partial index per tenant when there are many filter values.
  • HNSW search tuning: raising search breadth can recover recall at the cost of latency and CPU.
  • Iterative scans: useful when filtered approximate queries need to scan more of the index to return enough rows.

Those tools only help if the team watches the right metrics. If every dashboard is built around average latency, pgvector will look healthy long after retrieval quality has started drifting.

When a dedicated vector store is the cleaner architecture

I move toward a dedicated vector store when vector retrieval becomes a serving system with its own scale curve. The signs are usually obvious in hindsight:

  • Hundreds of millions of vectors or more.
  • Strict p99 targets under concurrent query load.
  • Heavy metadata filtering where recall matters.
  • High update or re-embedding cadence.
  • Index rebuilds that compete with production Postgres work.
  • A team that needs retrieval to scale independently from OLTP.

The clean model is not "move truth into the vector database." Postgres should usually stay the source of truth for documents, permissions, tenant state, and lifecycle. The vector system becomes a serving index. That distinction prevents a common migration mess where product truth is split across two systems with no clear owner.

The migration path I trust

The safest migration is boring and reversible.

  1. Keep Postgres as source of truth.
  2. Add an outbox or change stream for chunk lifecycle events.
  3. Backfill the vector store from the same rows used by pgvector.
  4. Run shadow queries and compare recall, latency, and result count.
  5. Route a small percentage of retrieval traffic through the new store.
  6. Keep dual-read observability until the failure modes are understood.

The migration is not done when the new store returns results. It is done when the team can explain mismatches between old and new retrieval, including permission filters and deletion state.

The decision table I actually use

SignalStay with pgvectorConsider dedicated vector store
Corpus sizeFits comfortably in Postgres with measured headroomIndex size and rebuild time affect normal database work
FiltersSimple, selective, easy to index or partitionHeavy metadata filters with recall-sensitive results
Latencyp95/p99 meet product needs under concurrencyRetrieval needs independent scaling and strict p99
UpdatesEmbedding churn is predictableBulk re-embedding and deletes are constant operational pressure
Team maturityStill learning retrieval qualityRetrieval is a core product surface with owners and SLOs

A pragmatic default

Start with pgvector when the workload is still close to the application database and you can measure recall. Keep it there as long as it stays boring.

Move out when retrieval has become a separate production system. The mistake is not choosing pgvector or choosing a dedicated store. The mistake is letting the tool choice happen before the workload is measured.