RAG Retrieval Quality Runbook

A production runbook for RAG retrieval quality: golden queries, recall checks, citation coverage, stale chunks, ACL filters, and no-hit behavior.

Why retrieval quality needs a runbook

RAG failures rarely announce themselves as database errors. The assistant answers confidently, but it cites the wrong document, misses a fresh document, ignores a permission boundary, or invents an answer because retrieval returned weak evidence.

This runbook turns those product failures into measurable checks.

Keep a golden query set

Create a small but realistic set of queries with expected source documents. Include common questions, exact identifiers, rare edge cases, permission-restricted documents, deleted documents, and newly uploaded content.

Query class           What to verify
Exact identifier      Error codes, invoice IDs, table names, product SKUs rank correctly.
Semantic question     Conceptually relevant docs appear even when wording differs.
Permission boundary   Restricted docs never appear for the wrong user.
Fresh document        Recently embedded docs are retrievable within the agreed freshness SLO.
Deleted document      Deleted or revoked docs disappear from retrieval.
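A golden query set can live as plain data in the test suite. Here is a minimal sketch; the field names, document IDs, and user IDs are all illustrative, not part of any specific system:

```python
from dataclasses import dataclass, field

@dataclass
class GoldenQuery:
    # One entry in the golden query set; every field name here is hypothetical.
    query: str
    query_class: str              # e.g. "exact_identifier", "semantic", "permission_boundary"
    expected_doc_ids: set[str]    # documents that should appear in the top-k results
    forbidden_doc_ids: set[str] = field(default_factory=set)  # docs this user must never see
    user_id: str = "default"      # run retrieval as this user to exercise ACL filters

GOLDEN_QUERIES = [
    GoldenQuery("error E1042 on invoice export", "exact_identifier", {"kb-311"}),
    GoldenQuery("how do refunds work for annual plans", "semantic", {"kb-207", "kb-208"}),
    GoldenQuery("salary bands for engineering", "permission_boundary",
                set(), forbidden_doc_ids={"hr-900"}, user_id="contractor-7"),
]
```

Keeping the set small and versioned alongside the embedding and chunking config makes regressions attributable to a specific change.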

Metrics to capture

  • Recall@k: how often expected documents appear in the top k retrieved chunks.
  • Result count: whether filters leave enough chunks for the answer step.
  • Citation coverage: whether answers cite retrieved evidence.
  • No-hit rate: how often retrieval correctly admits there is no evidence.
  • Freshness lag: time from document update to searchable chunk.
  • Forbidden-hit rate: any retrieval result the user should not be allowed to see.
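The first and last metrics above reduce to small set operations per query. A sketch, assuming retrieval returns an ordered list of document IDs:

```python
def recall_at_k(retrieved_ids, expected_ids, k):
    """Fraction of expected documents that appear in the top-k retrieved chunks."""
    if not expected_ids:
        return 1.0
    top = set(retrieved_ids[:k])
    return len(top & set(expected_ids)) / len(expected_ids)

def forbidden_hit(retrieved_ids, forbidden_ids):
    """True if any retrieved chunk belongs to a document the user must not see."""
    return bool(set(retrieved_ids) & set(forbidden_ids))

# One of two expected docs appears in the top 5:
print(recall_at_k(["a", "x", "y", "z", "w"], {"a", "b"}, k=5))  # 0.5
```

Averaging recall_at_k over the golden query set gives the headline Recall@k number; forbidden_hit should be zero on every run, not averaged.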

Debug sequence

  1. Re-run the query with the same tenant, user, language, and timestamp filters.
  2. Run exact search on a small sample and compare it with the approximate production path.
  3. Check whether filters removed most approximate candidates.
  4. Check whether the document was embedded with the current model and chunking version.
  5. Check whether deleted or stale chunks still exist in the serving table or vector store.
  6. Review the prompt only after retrieval evidence is proven healthy.
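Step 2 needs a ground truth to compare against. On a small sample, brute-force nearest-neighbour search is cheap enough to serve as that ground truth; the overlap between its top-k and the approximate index's top-k is the signal. A self-contained sketch (the corpus layout is illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def exact_top_k(query_vec, corpus, k):
    """Brute-force nearest neighbours: the ground truth for the debug comparison.
    corpus is a list of (doc_id, vector) pairs."""
    scored = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

def ann_overlap(exact_ids, approx_ids):
    """Share of the exact top-k that the approximate index also returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)
```

If ann_overlap is low only for filtered queries, suspect step 3 (filters starving the approximate candidate pool) before blaming the index itself.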

MonPG checks

Use MonPG to watch the PostgreSQL side of retrieval: slow vector queries in pg_stat_statements, vector index growth, dead tuples after re-embedding, queue tables for embedding jobs, and query plans for filtered retrieval. If retrieval quality drops after a data import or index rebuild, compare those operational signals with the golden query results.
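The slow-query check above can also be scripted directly against pg_stat_statements rows. A minimal sketch, assuming the rows were fetched as (query, mean_exec_time, calls) tuples, e.g. from `SELECT query, mean_exec_time, calls FROM pg_stat_statements` (mean_exec_time is in milliseconds; available since PostgreSQL 13). The threshold and the use of pgvector's `<=>` distance operator as the marker for vector queries are assumptions to adapt:

```python
SLOW_MS = 200.0  # illustrative threshold; tune per workload

def slow_vector_queries(rows, threshold_ms=SLOW_MS):
    """Return (query, mean_ms) for vector-search statements whose mean latency
    exceeds the threshold. "<=>" is pgvector's cosine-distance operator."""
    return [
        (q, ms) for q, ms, calls in rows
        if "<=>" in q and ms > threshold_ms
    ]

rows = [
    ("SELECT id FROM chunks ORDER BY embedding <=> $1 LIMIT 10", 350.0, 120),
    ("SELECT count(*) FROM chunks", 12.0, 40),
]
```

Running this alongside the golden query suite ties a retrieval-quality drop to an operational cause: if recall falls and vector queries slow down at the same time, start with the index, not the prompt.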