Re-Embedding Without Breaking Production Search

Embedding model upgrades are migrations. Versioned embeddings, dual indexes, shadow queries, backfills, and rollback decide whether search quality survives.

A team upgraded its embedding model on Friday because the benchmark looked better. On Monday, support search was worse for half the customers. The old and new embeddings were mixed in the same index, chunk boundaries had changed, and nobody had a rollback path.

Re-embedding is not a batch job. It is a data migration that changes search quality.

The framework: version everything

Every embedding row should record which model, dimension, chunking strategy, and source content hash produced it. Without that, you cannot explain results or roll back confidently.

CREATE TABLE document_embeddings (
  id bigserial PRIMARY KEY,
  document_id bigint NOT NULL,
  chunk_id text NOT NULL,
  embedding_model text NOT NULL,       -- provider model name
  embedding_version integer NOT NULL,  -- bumped on every re-embedding run
  chunking_strategy text NOT NULL,     -- chunker and parameters that produced this chunk
  content_hash text NOT NULL,          -- hash of the source text that was embedded
  embedding vector(1536) NOT NULL,     -- pgvector type; the dimension is pinned here
  embedded_at timestamptz NOT NULL DEFAULT now()
);
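
The content_hash column is what makes staleness detectable later. A minimal sketch of a drift check, assuming a hypothetical chunks table that stores the current hash of each chunk's source text:

SELECT e.document_id, e.chunk_id
FROM document_embeddings e
JOIN chunks c USING (document_id, chunk_id)  -- chunks is an assumed table
WHERE e.content_hash <> c.content_hash;      -- source changed since it was embedded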

Do not mix models casually

Distances are only meaningful within a single embedding space. Mixing vectors from two models in one query path produces rankings that cannot be compared. I prefer a new column, new table, namespace, or index per embedding version, then a deliberate cutover.
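
In Postgres with pgvector, pinning a query to one space is a WHERE clause. A sketch, where the model name and :query_embedding are placeholders:

SELECT document_id, chunk_id,
       embedding <=> :query_embedding AS distance  -- <=> is pgvector cosine distance
FROM document_embeddings
WHERE embedding_model = 'new-model'   -- placeholder model name
  AND embedding_version = 2
ORDER BY embedding <=> :query_embedding
LIMIT 10;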

Shadow before cutover

Run production queries against both old and new retrieval paths. Compare recall, result overlap, answer quality, latency, and cost before users see the new path.

SELECT old_results.query_id,
       old_results.top_ids AS old_top_ids,
       new_results.top_ids AS new_top_ids
FROM rag_eval_old old_results
JOIN rag_eval_new new_results USING (query_id);
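
From that join, an overlap ratio per query flags where the new path diverges most. A sketch, assuming top_ids is a bigint[] of ranked result IDs:

SELECT query_id,
       cardinality(ARRAY(
         SELECT id FROM unnest(o.top_ids) AS id
         WHERE id = ANY (n.top_ids)
       ))::numeric
       / GREATEST(cardinality(o.top_ids), 1) AS overlap_ratio
FROM rag_eval_old o
JOIN rag_eval_new n USING (query_id)
ORDER BY overlap_ratio;  -- lowest overlap first: review these queries by hand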

Backfill is an operational workload

Backfills compete for CPU, network, API quota, WAL, storage, and index build time. If embeddings live in Postgres, bulk insert and index maintenance are part of the migration budget. If they live in a vector database, API rate limits and namespace size are part of the budget.
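
In practice that means batching. A sketch of selecting one batch, assuming a documents table with id and content columns; the embedding call itself happens in application code:

SELECT d.id, d.content
FROM documents d                       -- documents(id, content) is an assumed source table
WHERE NOT EXISTS (
  SELECT 1
  FROM document_embeddings e
  WHERE e.document_id = d.id
    AND e.embedding_version = 2        -- skip anything already re-embedded
)
ORDER BY d.id
LIMIT 500;  -- small batches keep WAL, API quota, and index churn predictable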

The rollout I trust

The mistake is treating re-embedding as cleanup. It is one of the highest-risk migrations in a RAG system because it changes what the model is allowed to know.

  1. Freeze the old retrieval path as a rollback target.
  2. Write new embeddings with explicit version metadata.
  3. Build a separate serving index or namespace (see the index sketch after this list).
  4. Run shadow queries from real traffic.
  5. Compare golden-set quality by query class.
  6. Cut over gradually by tenant or traffic slice.
  7. Keep the old index until quality and cost are stable.
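
For step 3, a partial index per version keeps both generations in one table while serving them separately. A sketch, assuming pgvector's HNSW support (0.5.0 or later):

CREATE INDEX CONCURRENTLY document_embeddings_v2_hnsw  -- CONCURRENTLY avoids blocking writes
ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WHERE embedding_version = 2;           -- serves only the new generation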

The runbook I want before this reaches production

Before I trust this design, I want a small runbook that names the failure mode, the owner, and the rollback path. Vector systems fail in ways that look like product quality problems: missing evidence, stale evidence, wrong-tenant evidence, high p99, or answers that cite weak chunks. If the team cannot tell which one happened, the system is not observable enough.

  • Define a golden query set with real permissions and expected source documents.
  • Track recall, result count, p95, p99, and cost by query class.
  • Keep a rollback path for index, embedding model, chunking, and metadata changes.
  • Test deleted, restricted, fresh, and re-embedded documents as canaries (a check sketch follows this list).
  • Review the dashboard after every bulk import, re-embedding job, and index rebuild.
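
A sketch of the canary check from the fourth bullet, assuming a hypothetical canary_documents table that lists document IDs with an expected outcome:

SELECT c.document_id,
       c.expectation,                  -- e.g. 'present' or 'absent'
       EXISTS (
         SELECT 1
         FROM document_embeddings e
         WHERE e.document_id = c.document_id
           AND e.embedding_version = 2
       ) AS present_in_new_index
FROM canary_documents c;               -- assumed bookkeeping table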

The practical standard is simple: a retrieval change should be measurable before it ships, visible while it runs, and reversible when quality drops.