
How Vector Search Works

Vector search is the foundation that recall sits on. This page describes only what the index does — distance, top-k, namespacing, and filter pushdown. The hybrid scoring that wraps it lives on the Semantic Ranking page.

The distance metric

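Every collection is created with the cosine metric. The index returns a cosine distance (1 - cosine similarity) in [0, 2]; the platform clamps it to [0, 1] and converts it with similarity = 1 - clamped_distance. As a result, any pair whose cosine similarity is zero or negative (a distance of 1 or more) scores a similarity of 0. Dot-product and Manhattan distance are not configured.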
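A minimal sketch of the conversion, assuming the index reports cosine distance as 1 - cosine similarity (which is what produces the [0, 2] range). The function names are hypothetical; only the clamp and the similarity = 1 - clamped_distance step come from this page.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance as 1 - cosine similarity; ranges over [0, 2]."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def distance_to_similarity(distance: float) -> float:
    """Clamp the distance to [0, 1], then convert: similarity = 1 - clamped_distance."""
    clamped = min(max(distance, 0.0), 1.0)
    return 1.0 - clamped


# Orthogonal vectors sit at distance 1 and opposite vectors at distance 2;
# after clamping, both map to similarity 0.
print(distance_to_similarity(cosine_distance([1.0, 0.0], [0.0, 1.0])))   # 0.0
print(distance_to_similarity(cosine_distance([1.0, 0.0], [-1.0, 0.0])))  # 0.0
print(distance_to_similarity(0.25))                                       # 0.75
```

The lower clamp matters too: floating-point error can make a near-identical pair report a tiny negative distance, and clamping at 0 keeps its similarity at exactly 1.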

Top-k and what the limit actually means

  • Default k = 5; clamped to [1, 50].
  • Internally, the platform may request more than k candidates from the index so that reranking has a larger pool to work with. The caller still gets at most k back.
  • If fewer than k rows match the filters, the response is shorter — the platform never pads with random rows.
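A sketch of the limit handling described above. The default of 5 and the [1, 50] clamp come from this page; OVERFETCH_FACTOR and the query_index and rerank callbacks are hypothetical stand-ins, since the page does not say how many extra candidates are actually requested.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

DEFAULT_K = 5
MIN_K, MAX_K = 1, 50
OVERFETCH_FACTOR = 3  # hypothetical; the real over-fetch amount is not documented here


def clamp_k(k: Optional[int]) -> int:
    """Apply the default of 5 and clamp the requested k to [1, 50]."""
    if k is None:
        return DEFAULT_K
    return min(max(k, MIN_K), MAX_K)


def search_top_k(
    query_index: Callable[[int], List[T]],
    rerank: Callable[[List[T]], List[T]],
    k: Optional[int] = None,
) -> List[T]:
    """Over-fetch candidates, rerank them, and return at most k results."""
    limit = clamp_k(k)
    candidates = query_index(limit * OVERFETCH_FACTOR)  # may return fewer rows than asked
    # Truncate after reranking; a short result is returned as-is, never padded.
    return rerank(candidates)[:limit]
```

If only three rows match, `search_top_k(lambda n: matching[:n], lambda rows: rows, k=10)` returns those three rows and nothing else.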

Per-user namespacing

Each user has exactly one collection named memorysync-user-{user_id}. A search against another user's collection is structurally impossible — the collection name is derived from the authenticated user_id, not from anything in the request body. There are no per-project sub-collections; project scoping happens through the metadata filter.
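A sketch of how a request handler might derive the collection name, assuming the authenticated user_id comes from the session and project_id arrives in the request body. The function names and request shape are hypothetical; the point is that nothing in the body can change which collection is searched.

```python
from typing import Any, Dict, Tuple


def collection_for(auth_user_id: str) -> str:
    """One collection per user, named from the authenticated identity only."""
    return f"memorysync-user-{auth_user_id}"


def build_search(auth_user_id: str, body: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (collection name, metadata filter) for a search request."""
    collection = collection_for(auth_user_id)  # body["user_id"], if present, is ignored
    where: Dict[str, Any] = {}
    if "project_id" in body:
        # Project scoping is a metadata filter, not a separate collection.
        where["project_id"] = body["project_id"]
    return collection, where


print(build_search("alice", {"user_id": "bob", "project_id": "p1"}))
# ('memorysync-user-alice', {'project_id': 'p1'})
```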

Metadata filter pushdown

Filters are applied inside the index, not after results come back. The index supports where-clauses on every field stored in the payload: project_id, environment, source, tags, tier, importance ranges, and created_at ranges. Because non-matching rows are excluded during the search itself, no results are thrown away after the fact, which is what keeps a filtered query roughly as fast as an unfiltered one, even on a large index.
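The difference between filtering inside the search and filtering afterwards can be shown with a brute-force, in-memory sketch. This is not how the index is implemented internally; the Row layout and the matches() helper (equality on plain values, inclusive (low, high) tuples for ranges) are assumptions for illustration.

```python
from typing import Any, Dict, List, NamedTuple


class Row(NamedTuple):
    id: str
    distance: float
    payload: Dict[str, Any]


def matches(payload: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Equality on plain values; inclusive range check on (low, high) tuples."""
    for field, cond in where.items():
        value = payload.get(field)
        if isinstance(cond, tuple):
            low, high = cond
            if value is None or not (low <= value <= high):
                return False
        elif value != cond:
            return False
    return True


def search_pushdown(rows: List[Row], k: int, where: Dict[str, Any]) -> List[Row]:
    """Filter first, then take the k nearest among the matching rows."""
    hits = [r for r in rows if matches(r.payload, where)]
    return sorted(hits, key=lambda r: r.distance)[:k]


def search_post_filter(rows: List[Row], k: int, where: Dict[str, Any]) -> List[Row]:
    """Take the k nearest overall, then drop non-matching rows (what pushdown avoids)."""
    nearest = sorted(rows, key=lambda r: r.distance)[:k]
    return [r for r in nearest if matches(r.payload, where)]


rows = [
    Row("a", 0.1, {"project_id": "p2", "importance": 0.9}),
    Row("b", 0.2, {"project_id": "p2", "importance": 0.4}),
    Row("c", 0.3, {"project_id": "p1", "importance": 0.8}),
    Row("d", 0.4, {"project_id": "p1", "importance": 0.2}),
]
where = {"project_id": "p1"}
print([r.id for r in search_pushdown(rows, 2, where)])     # ['c', 'd']
print([r.id for r in search_post_filter(rows, 2, where)])  # []
```

Post-filtering loses both p1 rows because the two nearest rows belong to p2; to compensate it would have to over-fetch and discard, and a restrictive enough filter can still leave it short. Pushdown finds the k best matching rows directly.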

What happens when there is nothing to find

  • Empty user collection → empty list. The platform never falls back to random results or to a different user.
  • Restrictive filters that match zero rows → empty list, with applied_filters.expansion_used set if the platform attempted a soft-relax pass.
  • Out-of-distribution query (the index has nothing semantically close) → still an empty list. The caller's UI is responsible for rendering the no-match state.
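On the caller side, an empty list is a normal outcome, not an error. A minimal sketch of how a UI layer might pick its no-match message; the results key, the message strings, and the function name are hypothetical, and only applied_filters.expansion_used comes from this page.

```python
from typing import Any, Dict


def no_match_message(response: Dict[str, Any]) -> str:
    """Pick what to render for a recall response (hypothetical response shape)."""
    results = response.get("results", [])
    if results:
        return f"{len(results)} memories found"
    applied = response.get("applied_filters", {})
    if applied.get("expansion_used"):
        # The platform already tried a soft-relax pass and still matched nothing.
        return "No memories match, even after relaxing the filters."
    return "No memories match this query."
```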

Reliability around the index call

  • Each call is wrapped in a retry: 50 ms → 100 ms → 200 ms exponential backoff.
  • After three consecutive failures, the call surfaces an HTTP 503 rather than an empty result, so downstream recall never mistakes an outage for an empty index.
  • Per-user collections are cached locally so steady-state queries do not hit the index server's collection-listing path.
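A sketch of the retry and caching behaviour. It reads "three consecutive failures" as three attempts in total, which uses the 50 ms and 100 ms waits; the page does not say whether the 200 ms step precedes a fourth attempt, so max_attempts is left as a parameter. IndexUnavailable, call_with_retry, CollectionCache, and the lookup callback are hypothetical names.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class IndexUnavailable(Exception):
    """Hypothetical error type; an API layer would map it to HTTP 503."""

    status_code = 503


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, doubling the wait after each failure; raise IndexUnavailable when out of attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:  # a real client would retry only transient errors
            if attempt == max_attempts:
                # Surface the failure instead of returning an empty result.
                raise IndexUnavailable("vector index unavailable") from exc
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")


class CollectionCache:
    """Remember collection handles so repeat queries skip the collection-listing call."""

    def __init__(self, lookup: Callable[[str], object]) -> None:
        self._lookup = lookup
        self._handles: Dict[str, object] = {}

    def get(self, user_id: str) -> object:
        name = f"memorysync-user-{user_id}"
        if name not in self._handles:
            self._handles[name] = self._lookup(name)
        return self._handles[name]
```

Injecting sleep keeps the backoff testable without real delays.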

Vector search alone is not the answer

The list the index returns is in distance order, not relevance order. The recall engine still has to fetch the rows from the durable store, compute six other ranking factors, normalise them per batch, and apply boosts for relationship and session continuity. Treat the index call as the candidate-generation step — never as the final answer.
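To make the candidate-generation point concrete, here is a toy reranker that reorders the index's distance-ordered list. The real factors, weights, normalisation, and boosts are described on the Semantic Ranking page; this sketch assumes min-max normalisation per batch and uses two made-up factors, similarity and importance, purely for illustration.

```python
from typing import Any, Dict, List


def min_max(values: List[float]) -> List[float]:
    """Normalise a batch to [0, 1]; a constant batch maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def rerank(candidates: List[Dict[str, Any]], weights: Dict[str, float]) -> List[Dict[str, Any]]:
    """Score each candidate as a weighted sum of per-batch-normalised factors."""
    if not candidates:
        return []
    normalised = {f: min_max([c[f] for c in candidates]) for f in weights}
    scored = []
    for i in range(len(candidates)):
        score = sum(w * normalised[f][i] for f, w in weights.items())
        scored.append((score, i))
    # Highest score first; ties keep the index's original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidates[i] for _, i in scored]


candidates = [  # distance order, as the index returns them
    {"id": "a", "similarity": 0.9, "importance": 0.1},
    {"id": "b", "similarity": 0.8, "importance": 0.9},
    {"id": "c", "similarity": 0.7, "importance": 0.5},
]
print([c["id"] for c in rerank(candidates, {"similarity": 0.4, "importance": 0.6})])
# ['b', 'a', 'c']
```

The nearest row by distance ("a") drops to second place once another factor is weighed in, which is why the index order should never be shown to the caller as-is.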