The first vector database decision I helped a team make was rushed. They had a deadline to ship a RAG-based feature in six weeks. Someone in a Slack channel said "use Pinecone, it's the standard." They signed a contract for a year before anyone had measured whether their query patterns even needed a dedicated vector store. Eight months later, they were paying $4,200/month for queries they could have served from a 30GB pgvector index in the database they already had.
Six months after that, a different team I worked with did the opposite. They went all-in on pgvector for a workload that genuinely needed sub-50ms p99 across 200 million vectors with multi-tenancy and filtered search. Eighteen months in, the operational pain — index rebuilds during peak, recall drift after VACUUM, planner instability — had cost them more engineering time than a Pinecone subscription would have for three years.
Both decisions were defensible at the time they were made. Both teams ended up doing the migration the other way around. The mistake in both cases was the same: picking the tool before measuring the workload.
This is the framework I use now, after watching this play out a dozen times.
What a vector database actually is
A vector database stores vectors (typically 128-3072 dimensional floats) alongside an index that supports approximate nearest-neighbor (ANN) search. Given a query vector, return the K most-similar vectors from the index, ranked by distance (cosine, L2, or inner product).
The "approximate" matters. Exact nearest-neighbor on millions of vectors is too slow for interactive use. ANN trades exactness for speed: you get the top K with high probability, not certainty. The probability of getting the right top K is "recall@K" — typically reported as a percentage.
Most production embedding workloads — semantic search, RAG retrieval, recommendation, deduplication, image similarity — sit on top of an ANN index. The choice of vector database is mostly a choice of which ANN implementation, what wrapping infrastructure, and what operational characteristics.
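To make the distance part concrete, here is how those three metrics surface as query operators in pgvector, the Postgres extension discussed below (the items table and embedding column are placeholders):

SELECT id FROM items ORDER BY embedding <-> $1::vector LIMIT 10;  -- L2 distance
SELECT id FROM items ORDER BY embedding <=> $1::vector LIMIT 10;  -- cosine distance
SELECT id FROM items ORDER BY embedding <#> $1::vector LIMIT 10;  -- negative inner product

Each operator pairs with a matching index opclass (vector_l2_ops, vector_cosine_ops, vector_ip_ops); an index only accelerates the operator it was built for.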
The big choice: pgvector or dedicated
The serious options for most teams are:
- pgvector: a PostgreSQL extension that adds vector data types and ANN indexes (HNSW and IVFFlat). Uses your existing Postgres. Free.
- Dedicated vector databases: Pinecone, Weaviate, Qdrant, Milvus, ChromaDB, etc. Specialized infrastructure for vector workloads. Usually managed services with their own operational model.
- Built-in features in non-Postgres databases: MongoDB Atlas Vector Search, Elasticsearch dense_vector, Redis Vector Search. Reasonable if you already use the parent system.
For teams already running Postgres — which is most teams I work with — the choice is essentially pgvector vs Pinecone (or its equivalent). The right answer depends on the workload.
When pgvector is enough
pgvector earns its keep when:
1. Your data fits comfortably on one Postgres instance. Up to ~50 million vectors of moderate dimension (768-1536) on a sensibly-sized RDS instance. Above that, the index size and the maintenance overhead start mattering.
2. Your workload includes structured filters alongside vector search. "Find similar documents owned by tenant X published after date Y." pgvector lets you combine vector search with SQL WHERE clauses, indexes, JOINs. Dedicated vector DBs handle this less smoothly — most have metadata filter support but it is rarely as flexible as SQL.
3. Your team is more comfortable with Postgres than with a new infrastructure component. This sounds like a soft criterion but it dominates a lot of decisions in practice. The team that knows Postgres well will deploy pgvector with reasonable defaults and ship; the team learning a new vector DB will have a longer time-to-production.
4. You don't have extreme latency requirements. pgvector with HNSW can serve 10-50ms p99 on tens of millions of vectors when tuned properly. Getting below 10ms p99 reliably is harder.
5. Cost matters more than convenience. A 32GB Postgres instance with a few extra GB for the vector index is much cheaper than a Pinecone or equivalent subscription at the same data scale.
For most early-stage applications adding semantic search, pgvector is the right starting point. The amount of effort it saves — no new infrastructure, no new vendor, same backup and monitoring story — is significant.
When you need a dedicated vector DB
The patterns that push me toward dedicated:
1. Hundreds of millions of vectors or more, growing fast. pgvector indexes get unwieldy at this scale. HNSW build times on a 200M-vector dataset are measured in hours; rebuilds during operational events become real outages. Dedicated vector DBs are built for this scale.
2. Strict sub-10ms p99 requirements at high QPS. Pinecone, Qdrant, and similar are tuned for vector queries first; everything else is secondary. They will outperform pgvector on raw vector latency at high scale.
3. Multi-region distributed search. Replicating a Postgres cluster to multiple regions for vector workloads is doable but operationally heavy. Dedicated services often handle this transparently.
4. You need features pgvector does not have. Hybrid search (BM25 + vector), pre-built reranking, automatic re-embedding on document updates, namespace-level isolation. Dedicated services have spent a lot of engineering on these.
5. Vector workload is a primary product feature, not a side capability. If 80% of your application's read traffic is vector search, the operational tooling and observability of a dedicated vector DB is worth the cost. If vector search is one of fifty things your application does, pgvector wins on consolidation.
What pgvector actually looks like in production
Here is a typical pgvector setup that I would not be embarrassed to ship.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_id BIGINT NOT NULL,
title TEXT NOT NULL,
body TEXT NOT NULL,
embedding vector(1536),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Filter index
CREATE INDEX documents_tenant_created
ON documents (tenant_id, created_at DESC);
-- HNSW vector index
CREATE INDEX documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Search is a normal SQL query:
SELECT id, title, embedding <=> $1::vector AS distance
FROM documents
WHERE tenant_id = $2
AND created_at > now() - interval '1 year'
ORDER BY embedding <=> $1::vector
LIMIT 10;
In the good plan, the HNSW index drives the ordering and the tenant_id filter is applied to the rows it streams out; the planner can also choose the regular index and sort exactly (more on planner behavior in the pitfalls below). Either way, this combined query is one of pgvector's strongest cases — the kind of pattern where dedicated vector DBs are awkward.
For tuning recall, the runtime parameter is hnsw.ef_search:
SET LOCAL hnsw.ef_search = 80; -- higher = better recall, slower
The default is 40. For most workloads, 80-120 produces 0.95+ recall with minimal latency penalty. Recall here is measurable, not guesswork, as sketched below.
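A minimal sketch of that measurement, assuming the documents schema above: disable index scans to force an exact scan, then compare the exact top K against what HNSW returns for a sample of real query vectors.

-- Approximate top 10 via the HNSW index.
BEGIN;
SET LOCAL hnsw.ef_search = 80;
SELECT id FROM documents ORDER BY embedding <=> $1::vector LIMIT 10;
COMMIT;

-- Exact ground truth: with index scans disabled, Postgres computes
-- every distance and sorts, which is slow but correct.
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id FROM documents ORDER BY embedding <=> $1::vector LIMIT 10;
COMMIT;

-- recall@10 = |approximate ids ∩ exact ids| / 10, averaged over the sample.

Run this over a few hundred representative query vectors rather than one; recall varies across the embedding space.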
The pgvector pitfalls people hit
The operational realities I have learned the hard way:
1. Default HNSW parameters are not enough for serious workloads. m = 16, ef_construction = 64, ef_search = 40 produces ~0.65 recall@10 on most embedding distributions. That is dramatically worse than a fresh team would expect from "the search index." Tuning to m = 32, ef_construction = 128, ef_search = 80 typically produces 0.94+ recall at acceptable latency (see the rebuild sketch after this list).
The team I worked with who did not know this shipped recommendations with effective recall around 0.6 — meaning 40% of the "most similar" recommendations were wrong. They thought their model was bad. The model was fine; the index was misconfigured.
2. VACUUM behavior on HNSW indexes is rough. A heavy autovacuum on the underlying table can cause noticeable recall fluctuation while the index is being maintained. Tune autovacuum_vacuum_cost_delay and run VACUUM ANALYZE proactively after large insert batches.
3. Index build time scales with vector count and dimension. Building HNSW on 50M × 1536-dimensional vectors takes hours. Plan for it, and set maintenance_work_mem to 8GB or more to make the build faster.
4. Reads spike during index build. A CREATE INDEX CONCURRENTLY hnsw_idx ON ... reads the entire table twice. On RDS with provisioned IOPS, this can saturate the disk for hours. Schedule for off-peak.
5. The planner does not always understand vector search costs. For queries that combine filters and vector ordering, it sometimes picks a plan that walks the HNSW index and filters afterwards, and sometimes one that filters first and sorts by distance exactly. Test hot queries with EXPLAIN ANALYZE and confirm which plan you are actually getting.
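A minimal sketch tying items 1, 3, and 4 together: rebuilding a misconfigured index in place, assuming the documents schema from earlier.

-- More build memory makes the HNSW build meaningfully faster (item 3).
SET maintenance_work_mem = '8GB';

-- CONCURRENTLY keeps the table readable and writable during the
-- hours-long build (item 4), at the cost of two full table scans.
CREATE INDEX CONCURRENTLY documents_embedding_hnsw_v2
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

-- Swap: drop the misconfigured index once the new one is valid.
DROP INDEX CONCURRENTLY documents_embedding_hnsw;

For item 5, running EXPLAIN (ANALYZE, BUFFERS) on your hot filtered queries before and after the swap is the cheapest way to catch a plan flip.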
The migration paths
If you start with pgvector and need to migrate:
To a dedicated vector DB:
- Set up dual-write: every embedding insert/update goes to both pgvector and the new system.
- Verify consistency for a window (a few days minimum).
- Cut over reads to the new system, keeping pgvector as fallback.
- Once stable, stop writing to pgvector. Drop the index.
The dual-write phase is the long part. Plan for 4-8 weeks for a non-trivial dataset.
If you start with a dedicated vector DB and need to consolidate to pgvector:
- Bulk export embeddings from the dedicated DB.
- Bulk-insert into pgvector with INSERT ... ON CONFLICT, chunked into batches (a sketch follows this list).
- Build the HNSW index (can take hours).
- Verify recall against the original system using a sample of queries.
- Cut over reads.
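A hedged sketch of the bulk-load step, assuming the document rows already live in Postgres and only the embeddings are coming back, exported as CSV in pgvector's text format:

-- Staging table for the export; load it with psql's \copy.
CREATE TABLE embedding_staging (
    id BIGINT PRIMARY KEY,
    embedding vector(1536)
);
\copy embedding_staging FROM 'export.csv' WITH (FORMAT csv)

-- Copy embeddings across. For very large tables, run this in id
-- ranges to keep transactions short.
UPDATE documents d
SET embedding = s.embedding
FROM embedding_staging s
WHERE d.id = s.id;

If the rows do not already exist in Postgres, this becomes a full INSERT ... ON CONFLICT upsert instead, still chunked the same way.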
Both directions are doable. Both take time. The decision to migrate is rarely made lightly because the value is intangible (operational simplicity, cost) until the migration is done.
What I check before recommending either
The questions I ask:
- How many vectors today? How many in 24 months at current growth?
- What is the dimension? Higher dimensions mean larger indexes and slower distance computations, which pushes toward dedicated systems sooner.
- What is the QPS? Filtered or unfiltered?
- What is the acceptable latency? p50 and p99.
- What is the recall requirement? Is 0.95 enough, or does it need 0.99?
- Are you already running Postgres? Are you comfortable operating it?
- Is this a primary feature or a side capability?
- What is your budget for vector workload specifically?
For most teams, this conversation lands on pgvector as the right starting point. For teams with vector search as the primary product feature at scale, the dedicated VDB makes sense.
The wrong answer is to pick before having this conversation. That is the source of the rushed decisions I have seen go wrong in both directions.
A pragmatic default
For a new application adding vector search:
- Start with pgvector. Use HNSW with tuned parameters (m = 32, ef_construction = 128).
- Measure recall on your actual data. Aim for 0.94+ at acceptable latency.
- Set up monitoring on query latency, recall, and index size growth (a size check is sketched after this list).
- Re-evaluate at meaningful inflection points: 10M vectors, 100M vectors, 50ms p99 violation, 5ms p99 requirement.
- Migrate to dedicated only when one of those inflection points genuinely forces it.
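The size-growth check is one query, assuming the index name from the schema earlier; latency and recall need the measurement loop sketched in the tuning section.

-- Track these over time; a sustained jump in index size is an early
-- sign you are approaching an inflection point.
SELECT
    pg_size_pretty(pg_relation_size('documents')) AS table_size,
    pg_size_pretty(pg_relation_size('documents_embedding_hnsw')) AS index_size;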
This default has worked for every team I have advised who started fresh. It costs little, ships fast, and the migration path is real if it eventually becomes necessary.
The teams who skipped step 1 and went straight to dedicated mostly regret it. The teams who refused to consider step 5 even at a clear inflection point eventually paid more than they should have.
The right answer is rarely the trendy one. It is the one that fits the workload at the scale you actually have, with realistic projections of where it is going.