
Multi-Tenant Vector Search: Namespaces, Metadata Filters, or Partitions?

Multi-tenant vector search is a correctness and isolation problem. Namespaces, filters, and partitions each fail differently under tenant skew and ACL rules.

The demo used one corpus. Production had 3,000 tenants. One tenant had 70 percent of the data, most had almost none, and enterprise customers needed strict permission boundaries. The same vector query no longer meant the same thing.

Multi-tenant vector search is not just a scaling problem. It is a correctness, isolation, and operations problem.

The framework: isolate what must be correct

I choose tenancy strategy by the blast radius of a mistake. If a result from the wrong tenant is a security incident, the design should make that hard to express, not merely unlikely.

  • Metadata filters: flexible and simple, but recall can suffer when filters are selective.
  • Namespaces: stronger logical separation, but many small namespaces can create operational overhead.
  • Partitions or separate tables: good for Postgres-heavy designs, but index count and migrations need discipline (a partitioning sketch follows this list).
  • Separate clusters: expensive, but sometimes right for enterprise isolation.
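
To make the partition option concrete, here is a minimal sketch in Postgres terms, assuming list partitioning by tenant id; the table name and tenant ids are illustrative, not the production schema discussed later:

CREATE TABLE chunks_partitioned (
  organization_id bigint NOT NULL,
  document_id bigint NOT NULL,
  embedding vector(1536) NOT NULL
) PARTITION BY LIST (organization_id);

-- one partition for the dominant tenant, one default for the long tail
CREATE TABLE chunks_tenant_42 PARTITION OF chunks_partitioned
  FOR VALUES IN (42);
CREATE TABLE chunks_default PARTITION OF chunks_partitioned DEFAULT;

-- an index on the parent is created on each partition,
-- so the large tenant gets its own HNSW graph
CREATE INDEX chunks_partitioned_embedding_hnsw
ON chunks_partitioned
USING hnsw (embedding vector_cosine_ops);

A query that pins organization_id = 42 prunes to a single partition, so approximate candidates come only from that tenant's graph. That is the property tenant skew breaks in a global index.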

Tenant skew changes everything

A global index can work for the largest tenant and fail for smaller tenants: the approximate candidate set is dominated by other tenants' vectors, the tenant filter discards most of it, and fewer than top_k results survive. Average recall hides this. Always split recall and latency by tenant size bucket.

A pgvector shape

For pgvector, the tenant and permission rules should be first-class columns. Then you can decide whether a global index, partial indexes, or partitioning matches the workload.

CREATE TABLE document_chunks (
  organization_id bigint NOT NULL,           -- tenant boundary
  acl_group_id bigint NOT NULL,              -- permission boundary
  document_id bigint NOT NULL,
  embedding vector(1536) NOT NULL,
  is_deleted boolean NOT NULL DEFAULT false  -- soft delete; must be filtered at query time
);

-- one global HNSW index; tenant and ACL isolation live in the query predicate
CREATE INDEX document_chunks_embedding_hnsw
ON document_chunks
USING hnsw (embedding vector_cosine_ops);
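
The read path then expresses both boundaries in the WHERE clause. A minimal sketch, assuming positional parameters from the application ($1 the tenant id, $2 a bigint[] of permitted ACL groups, $3 the query embedding); the LIMIT of 20 is illustrative:

SELECT document_id,
       embedding <=> $3 AS distance
FROM document_chunks
WHERE organization_id = $1        -- tenant boundary
  AND acl_group_id = ANY($2)      -- permission boundary
  AND NOT is_deleted
ORDER BY embedding <=> $3
LIMIT 20;

One caveat worth knowing: the HNSW scan produces candidates before these predicates are applied, so a selective filter can leave fewer than top_k survivors. Raising hnsw.ef_search (for example, SET hnsw.ef_search = 200;) buys back recall at a latency cost, which is exactly the tradeoff the monitoring section below is meant to catch.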

What I monitor

  • Recall@k by tenant size bucket.
  • p99 latency by tenant size bucket.
  • Filtered result count below requested top_k (sketched below).
  • Cross-tenant leak tests.
  • Index size and build time by tenant strategy.
  • Noisy-neighbor impact during re-embedding or imports.
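
A sketch that covers the second and third items in one pass, assuming the application writes a search_log table (organization_id, requested_top_k, returned_count, latency_ms, ts); the log table and the bucket thresholds are assumptions, not part of the schema above:

WITH tenant_sizes AS (
  SELECT organization_id, count(*) AS chunk_count
  FROM document_chunks
  GROUP BY organization_id
)
SELECT
  CASE
    WHEN t.chunk_count >= 1000000 THEN 'large'
    WHEN t.chunk_count >= 10000   THEN 'medium'
    ELSE 'small'
  END AS tenant_bucket,
  avg((l.returned_count < l.requested_top_k)::int) AS short_result_rate,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY l.latency_ms) AS p99_ms
FROM search_log l
JOIN tenant_sizes t USING (organization_id)
WHERE l.ts > now() - interval '1 day'
GROUP BY 1
ORDER BY 1;

Recall@k needs labeled golden queries rather than a log scan, which is why the runbook below starts with a golden query set.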

The production default

Start with the simplest strategy that makes cross-tenant leakage hard and recall measurable. Move to namespaces, partitions, or dedicated clusters when tenant skew, enterprise isolation, or SLOs justify the operational cost.

The runbook I want before this reaches production

Before I trust this design, I want a small runbook that names the failure mode, the owner, and the rollback path. Vector systems fail in ways that look like product quality problems: missing evidence, stale evidence, wrong-tenant evidence, high p99, or answers that cite weak chunks. If the team cannot tell which one happened, the system is not observable enough.

  • Define a golden query set with real permissions and expected source documents.
  • Track recall, result count, p95, p99, and cost by query class.
  • Keep a rollback path for index, embedding model, chunking, and metadata changes.
  • Test deleted, restricted, fresh, and re-embedded documents as canaries (a leak canary is sketched below).
  • Review the dashboard after every bulk import, re-embedding job, and index rebuild.
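
For the cross-tenant canary, a sketch of the shape I want in CI, assuming a marker chunk with document_id 999 planted under tenant 1 and queried through the production predicate as tenant 2; every id here is illustrative:

-- run the production query path as tenant 2, probing with the planted
-- tenant-1 embedding so a leak would rank first if isolation ever breaks
SELECT count(*) AS leaks
FROM (
  SELECT document_id
  FROM document_chunks
  WHERE organization_id = 2
    AND acl_group_id = ANY(ARRAY[10, 11]::bigint[])
    AND NOT is_deleted
  ORDER BY embedding <=> (
    SELECT embedding FROM document_chunks
    WHERE document_id = 999 AND organization_id = 1
    LIMIT 1
  )
  LIMIT 50
) hits
WHERE hits.document_id = 999;
-- expected: 0; any other count is a cross-tenant leak and should page someone

The deleted-document canary is the same query with a soft-deleted marker chunk in place of the tenant-1 one.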

The practical standard is simple: a retrieval change should be measurable before it ships, visible while it runs, and reversible when quality drops.