pgvector recall tuning

Your pgvector recall is silently 67%. Fix it in 5 minutes.

HNSW index parameters are immutable after creation, and defaults deliver 67% recall on 2026 RAG workloads. MonPG samples your real embeddings, runs 48 configurations on our benchmark infra, and hands you zero-downtime migration SQL.

Review Question

What HNSW parameters actually give you 95% recall on this exact table, and how do you roll that out without dropping queries?

At A Glance

Input
Read-only Postgres connection to your pgvector table
Decision
Keep defaults, tune, or rebuild on a different model
Output
Trade-off matrix, winner config, migration SQL, rollback plan

What you get

  • Recall@10 measured against brute-force ground truth on your real data
  • p50 / p95 / p99 latency per HNSW configuration
  • Trade-off matrix across recall, latency, index size, and build time
  • Zero-downtime CREATE INDEX CONCURRENTLY migration SQL with rollback plan

Guardrails

  • Reads one sample from your DB; all benchmarking runs on our worker.
  • Embeddings are never persisted on MonPG's side; sample is held in memory only.
  • Self-hosted runner available for air-gapped or strict compliance environments.

Signals covered

The page is designed to answer these production questions and search intents without losing the operational context.

  • pgvector hnsw tuning
  • pgvector recall optimization
  • pgvector ef_construction ef_search
  • pgvector production rag tuning
  • pgvector m parameter benchmark

Why This Matters

Your RAG works in the demo. In production it returns the wrong document one query in three.

Most teams ship pgvector with default HNSW parameters. The application looks healthy, queries are fast, and metrics stay green. The only signal something is wrong is that users say the chatbot or search feels off. By the time anyone traces it back to the index, trust has eroded.

Defaults are tuned for 2023, not 2026

pgvector's HNSW defaults (m=16, ef_construction=64, ef_search=40) were set with 384-dim sentence-transformers embeddings in mind. 1536-dim OpenAI embeddings at 500K+ rows need denser graphs and wider query-time exploration.
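You can check what an existing index was built with before assuming it runs on defaults. A sketch (table and index names are placeholders; a missing WITH clause in the definition means default build parameters):

```sql
-- Show build parameters for existing pgvector HNSW indexes.
-- No WITH (...) clause in indexdef means defaults: m = 16, ef_construction = 64.
SELECT indexname, indexdef
FROM pg_indexes
WHERE indexdef LIKE '%USING hnsw%';

-- ef_search is a query-time setting, not stored in the index:
SHOW hnsw.ef_search;  -- pgvector default: 40
```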

HNSW parameters are immutable

Once you run CREATE INDEX, m and ef_construction cannot be changed with ALTER INDEX; only ef_search can be adjusted at query time. Wrong defaults mean a full rebuild, not a tweak.
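Concretely, the build-time parameters go into the CREATE INDEX statement itself, while ef_search is a session setting. A minimal sketch, with table, column, and values as placeholders:

```sql
-- m and ef_construction are baked into the index at build time.
CREATE INDEX documents_embedding_hnsw
  ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 128);

-- ef_search is the only knob you can turn afterwards, per session or transaction:
SET hnsw.ef_search = 80;
```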

Recall is not surfaced by any existing PG monitoring tool

pganalyze, Datadog, pgwatch all show latency and throughput. None measure recall against ground truth. The silent-killer bug is invisible.

Sample Benchmark Output

  • m=16, ec=64, es=40 (default): recall 0.67 · p95 8 ms · 520 MB
  • m=16, ec=64, es=80: recall 0.79 · p95 12 ms · 520 MB
  • m=32, ec=128, es=40: recall 0.82 · p95 9 ms · 780 MB
  • m=32, ec=128, es=60: recall 0.91 · p95 14 ms · 780 MB
  • m=32, ec=128, es=80 (winner): recall 0.94 · p95 18 ms · 780 MB
  • m=48, ec=256, es=120: recall 0.97 · p95 32 ms · 1150 MB

500K OpenAI text-embedding-3-small (1536 dim) on a customer-support workload. Winner highlighted. Your numbers will differ; that is the point.

Workflow

How It Works

01

Connect your Postgres

Read-only credentials or agent token. Self-hosted, Supabase, Neon, RDS, Aurora, AlloyDB, Azure — all supported.

02

Pick the table and vector column

Auto-detected from your pgvector inventory. Pick distance operator (cosine, L2, inner product).

03

Sample real embeddings

Your DB sees a single SELECT ... LIMIT N. Sample (1K-100K vectors) is held in memory on our worker only.
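The sampling query MonPG issues is of roughly this shape (the exact statement may differ; table, column, and sample sizes here are placeholders):

```sql
-- Pull a bounded random sample of id + embedding; nothing else leaves the DB.
-- TABLESAMPLE is cheap on large tables but samples whole pages:
SELECT id, embedding
FROM documents
TABLESAMPLE SYSTEM (1)
LIMIT 10000;

-- For smaller tables, a uniform sample is affordable:
-- SELECT id, embedding FROM documents ORDER BY random() LIMIT 10000;
```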

04

Compute ground truth

Brute-force exact k-NN on the sample gives us the reference set for recall@10.
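Exact k-NN in pgvector is just a query that bypasses the approximate index: disable index scans and the planner falls back to a sequential scan with exact distances. Recall@10 is then |approx top-10 ∩ exact top-10| / 10. A sketch (table and column names are placeholders; the query vector is bound as a parameter):

```sql
-- Exact (brute-force) nearest neighbours for one query vector:
-- with index scans off, ORDER BY <=> computes true distances over all rows.
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id
FROM documents
ORDER BY embedding <=> $1   -- query vector, bound as a parameter
LIMIT 10;
COMMIT;
```

The same statement with index scans enabled gives the approximate top-10; comparing the two ID sets per query vector yields recall@10.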

05

Grid search 48 configurations

m × ef_construction × ef_search in parallel. Build, query, measure recall, latency, size, build time per config.

06

Get the winner + migration SQL

Trade-off matrix, winner config, and zero-downtime CREATE INDEX CONCURRENTLY script ready to copy.
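The migration script follows the standard swap pattern: build the tuned index next to the old one, verify, then drop the old one. A hedged sketch of its shape (index and table names, parameter values, and the memory setting are placeholders; your generated script will differ):

```sql
-- Build the tuned index alongside the old one without blocking writes.
SET maintenance_work_mem = '2GB';  -- HNSW builds are memory-hungry
CREATE INDEX CONCURRENTLY documents_embedding_hnsw_v2
  ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 128);

-- After verifying queries use the new index, retire the old one:
DROP INDEX CONCURRENTLY documents_embedding_hnsw_v1;

-- Rollback at any point before the drop: the old index was never touched.
-- DROP INDEX CONCURRENTLY documents_embedding_hnsw_v2;
```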

Integrations

Works With Your pgvector Deployment

Self-hosted Postgres

Any Postgres 12+ with pgvector 0.5+

Supabase

Read-only role across all projects

Neon

Branching-friendly, works with ephemeral DBs

AWS RDS / Aurora

pgvector 0.8+ on PostgreSQL 15/16/17

GCP AlloyDB / Cloud SQL

Manual params, no native tuner

Azure Database for PostgreSQL

Flexible Server tier

Stop shipping RAG with default HNSW parameters.

Measure your recall once this quarter. If it is below 0.85, you have the silent-killer bug and the fix is a zero-downtime index swap. MonPG gives you the measurement and the migration SQL in 10 minutes.

Frequently Asked

Does MonPG read all of my embeddings?
No. MonPG reads a sample (you choose: 1K to 100K vectors) with a single SELECT ... LIMIT N. The sample stays in the benchmark worker memory and is discarded when the job completes. Nothing is persisted or logged.
How often should I re-benchmark?
Once per quarter, after major data growth, or after changing your embedding model. Scale tier supports scheduled re-benchmarks with recall drop alerts.
Can I use this against pgvectorscale / StreamingDiskANN?
Not in v1. We start with HNSW because it covers around 90 percent of current pgvector deployments. pgvectorscale tuning is on the v2 roadmap.
What about IVFFlat indexes?
HNSW-first in v1. IVFFlat support is v2. If you need IVFFlat tuning now, the methodology in our blog posts is portable; message us and we can prioritize.

Related PostgreSQL Hubs

Related Tools

Useful here, still point-in-time.

This page should help with a real decision right now. It should not pretend to replace the historical evidence that production teams need after a deploy, during an incident, or across a longer rollout window.

Helpful On This Page

  • A candidate index shape based on the query pattern you entered
  • A rollout risk band from table size, writes, replicas, and timing
  • Starter checks to adapt before touching production

Still Needs Live History

  • Whether PostgreSQL actually uses the new index after deploy
  • How write latency, lag, and storage changed during the rollout
  • Whether the index kept earning its cost a week later
See Live History in MonPG