RAG Retrieval Quality Runbook
A production runbook for RAG retrieval quality: golden queries, recall checks, citation coverage, stale chunks, ACL filters, and no-hit behavior.
Why retrieval quality needs a runbook
RAG failures rarely announce themselves as database errors. The assistant answers confidently, but it cites the wrong document, misses a fresh document, ignores a permission boundary, or invents an answer because retrieval returned weak evidence.
This runbook turns those product failures into measurable checks.
Keep a golden query set
Create a small but realistic set of queries with expected source documents. Include common questions, exact identifiers, rare edge cases, permission-restricted documents, deleted documents, and newly uploaded content.
| Query class | What to verify |
|---|---|
| Exact identifier | Queries for error codes, invoice IDs, table names, and product SKUs return the exact-match document at or near the top. |
| Semantic question | Conceptually relevant docs appear even when wording differs. |
| Permission boundary | Restricted docs never appear for the wrong user. |
| Fresh document | Recently embedded docs are retrievable within the agreed freshness SLO. |
| Deleted document | Deleted or revoked docs disappear from retrieval. |
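As a minimal sketch, the golden set can live in version control as plain data, so every query class in the table has at least one entry. The `GoldenQuery` type, its field names, the class labels, and the sample queries and document IDs below are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GoldenQuery:
    query: str
    query_class: str                            # one of REQUIRED_CLASSES
    user_id: str                                # identity the query runs as
    expected_doc_ids: frozenset = frozenset()   # docs that should be retrieved
    forbidden_doc_ids: frozenset = frozenset()  # docs that must never be retrieved

# Hypothetical labels mirroring the query classes in the table above.
REQUIRED_CLASSES = {
    "exact_identifier",
    "semantic_question",
    "permission_boundary",
    "fresh_document",
    "deleted_document",
}

# Illustrative entries only; real queries and doc IDs come from your corpus.
GOLDEN_QUERIES = [
    GoldenQuery("What does error E1042 mean?", "exact_identifier", "support-agent",
                expected_doc_ids=frozenset({"error-catalog"})),
    GoldenQuery("How do I get my money back?", "semantic_question", "support-agent",
                expected_doc_ids=frozenset({"refund-policy"})),
    GoldenQuery("What is the executive bonus plan?", "permission_boundary", "contractor",
                forbidden_doc_ids=frozenset({"exec-compensation"})),
    GoldenQuery("What changed in this week's release?", "fresh_document", "support-agent",
                expected_doc_ids=frozenset({"release-notes-latest"})),
    GoldenQuery("What was the old pricing tier?", "deleted_document", "support-agent",
                forbidden_doc_ids=frozenset({"pricing-v1-deleted"})),
]

def missing_classes(queries):
    """Return the query classes that have no golden query yet."""
    return REQUIRED_CLASSES - {q.query_class for q in queries}
```

Running `missing_classes(GOLDEN_QUERIES)` in CI is one way to catch a golden set that has silently lost coverage of a query class.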
Metrics to capture
- Recall@k: how often expected documents appear in the top k retrieved chunks.
- Result count: whether filters leave enough chunks for the answer step.
- Citation coverage: whether answers cite retrieved evidence.
- No-hit rate: how often retrieval correctly admits there is no evidence.
- Freshness lag: time from document update to searchable chunk.
- Forbidden-hit rate: how often retrieval returns a result the user should not be allowed to see. The target is zero.
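The metrics above can be sketched as small pure functions over document or chunk IDs. The function names and the exact definitions are assumptions: citation coverage here means the share of an answer's citations that point at retrieved chunks, and the no-hit rate is measured only over queries known to have no evidence. Adjust both to match how your team defines them.

```python
from datetime import datetime

def recall_at_k(retrieved_ids, expected_ids, k):
    """Fraction of expected IDs that appear in the top-k retrieved results."""
    expected = set(expected_ids)
    if not expected:
        raise ValueError("recall@k is undefined without expected documents")
    return len(expected & set(retrieved_ids[:k])) / len(expected)

def forbidden_hit_count(retrieved_ids, forbidden_ids):
    """Number of retrieved IDs the user must not see; anything above 0 is a failure."""
    return len(set(retrieved_ids) & set(forbidden_ids))

def citation_coverage(cited_ids, retrieved_ids):
    """Share of the answer's citations that point at retrieved evidence."""
    cited = set(cited_ids)
    if not cited:
        return 0.0  # an answer with no citations covers nothing
    return len(cited & set(retrieved_ids)) / len(cited)

def no_hit_rate(results_for_unanswerable):
    """Share of no-evidence queries for which retrieval returned nothing."""
    if not results_for_unanswerable:
        return 0.0
    empty = sum(1 for retrieved in results_for_unanswerable if not retrieved)
    return empty / len(results_for_unanswerable)

def freshness_lag_seconds(updated_at, searchable_at):
    """Seconds between a document update and its chunk becoming searchable."""
    return (searchable_at - updated_at).total_seconds()

# Example usage with made-up IDs and timestamps.
print(recall_at_k(["a", "b", "c", "d"], ["b", "d"], k=3))
print(forbidden_hit_count(["a", "x"], ["x", "y"]))
print(freshness_lag_seconds(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 5)))
```

Keeping the functions free of database or model calls makes them easy to run against stored retrieval logs as well as live golden-query runs.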
Debug sequence
- Re-run the query with the same tenant, user, language, and timestamp filters.
- Run exact search on a small sample and compare it with the approximate production path.
- Check whether filters removed most approximate candidates.
- Check whether the document was embedded with the current model and chunking version.
- Check whether deleted or stale chunks still exist in the serving table or vector store.
- Review the prompt only after retrieval evidence is proven healthy.
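Two of the steps above lend themselves to a quick script: comparing exact search against the approximate production path, and finding stale or orphaned chunks. The sketch below assumes cosine similarity as the distance measure and a hypothetical metadata layout (`doc_id`, `model`, `chunking` per chunk). The production system may use a different metric and schema.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def exact_top_k(query_vec, chunk_vectors, k):
    """Brute-force top-k chunk IDs; only practical on a small sample."""
    ranked = sorted(
        chunk_vectors.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]

def overlap_with_exact(approx_ids, exact_ids):
    """Share of the exact top-k that the approximate path also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def find_stale_chunks(chunks_meta, live_doc_ids, current_model, current_chunking):
    """Chunk IDs embedded with an old model/chunking version or whose doc is gone."""
    stale = []
    for chunk_id, meta in chunks_meta.items():
        outdated = (meta["model"], meta["chunking"]) != (current_model, current_chunking)
        orphaned = meta["doc_id"] not in live_doc_ids
        if outdated or orphaned:
            stale.append(chunk_id)
    return sorted(stale)

# Example usage with toy vectors and a made-up approximate result.
sample = {"c1": [1.0, 0.0], "c2": [0.9, 0.1], "c3": [0.0, 1.0]}
exact = exact_top_k([1.0, 0.0], sample, k=2)
print(exact, overlap_with_exact(["c1", "c3"], exact))
```

A low overlap on the sample points at the index or its search parameters, while a high overlap with poor answers points at filters, stale data, or the prompt.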
MonPG checks
Use MonPG to watch the PostgreSQL side of retrieval: slow vector queries in pg_stat_statements, vector index growth, dead tuples after re-embedding, queue tables for embedding jobs, and query plans for filtered retrieval. If retrieval quality drops after a data import or index rebuild, compare those operational signals with the golden query results.
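For reference, two of these signals can also be read straight from PostgreSQL's statistics views. This is a rough sketch under stated assumptions, not a description of how MonPG collects them. The first query assumes pgvector-style distance operators (`<=>`, `<->`) appear in the query text and that the server is PostgreSQL 13 or later, where `pg_stat_statements` exposes `mean_exec_time`. The 0.2 dead-tuple threshold and the helper names are arbitrary choices for illustration.

```python
# Slowest statements that look like vector searches (assumes pgvector operators).
# Run without bound parameters so the driver does not treat '%' as a placeholder.
SLOW_VECTOR_QUERIES_SQL = """
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%<=>%' OR query LIKE '%<->%'
ORDER BY mean_exec_time DESC
LIMIT 10
"""

# Tables with the most dead tuples, e.g. after a bulk re-embedding.
DEAD_TUPLES_SQL = """
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10
"""

def dead_tuple_ratio(n_live_tup, n_dead_tup):
    """Share of a table's tuples that are dead; 0.0 for an empty table."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0

def tables_needing_attention(rows, threshold=0.2):
    """Table names whose dead-tuple ratio exceeds the (assumed) threshold.

    `rows` are (relname, n_live_tup, n_dead_tup) tuples as returned by
    DEAD_TUPLES_SQL.
    """
    return [
        relname
        for relname, live, dead in rows
        if dead_tuple_ratio(live, dead) > threshold
    ]

# Example usage with made-up statistics rows.
print(tables_needing_attention([("chunks", 70, 30), ("documents", 99, 1)]))
```

Recording these values alongside each golden-query run makes it easier to line up a recall drop with a specific import or rebuild.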