Vectors & Semantics

Every memory is searchable because it carries a vector — a numerical representation of its meaning. This page covers when the vector is computed, what text gets embedded, how the platform tracks the model behind each vector, and what changes if the model is updated.

What an embedding is, in one paragraph

An embedding is a fixed-length array of floating-point numbers that represents the meaning of a piece of text. Two pieces of text with similar meaning produce vectors that are close to each other under the platform's distance metric. Recall is built on this property — a query becomes a vector, and the platform finds memories whose vectors sit near it.
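
As a rough illustration of "close to each other", the sketch below ranks toy vectors by cosine similarity. Cosine similarity is an assumption here: this page does not say which distance metric the platform uses. The memory names and three-dimensional vectors are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def nearest(query_vec, memories, k=1):
    """Return the ids of the k memories most similar to the query vector."""
    ranked = sorted(
        memories.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [memory_id for memory_id, _ in ranked[:k]]

# Toy 3-dimensional vectors (invented); real embeddings are much longer.
memories = {
    "likes-espresso": [0.9, 0.1, 0.0],
    "owns-a-bike": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
closest = nearest(query, memories)
```

Here `closest` holds the id of the memory whose vector points in nearly the same direction as the query, which is the property recall relies on.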

When the vector is computed

Embedding is part of the write path. The platform calls the embedding model while it processes a /memory/add request. The call is asynchronous at the language level (it is awaited), but the response is not sent until it completes: when the API returns, the row already has a vector.

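Why the write blocks

Blocking on the embedding call is what lets the API guarantee that a memory is recallable as soon as the write completes. If the vector were computed in the background instead, a write could return before the vector existed, and a recall issued immediately afterwards could miss the new memory.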
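
A minimal sketch of this write path, assuming an in-memory store and a fake embedding function. `embed`, `add_memory`, and `STORE` are hypothetical names for illustration, not the platform's API.

```python
import asyncio

STORE = {}  # hypothetical in-memory stand-in for the memory table

async def embed(text):
    """Hypothetical embedding call that returns a fake fixed-length vector."""
    await asyncio.sleep(0)  # stands in for the round trip to the model
    return [float(len(text)), 0.0, 0.0]

async def add_memory(memory_id, text):
    """Await the embedding before responding, so the row always has a vector."""
    vector = await embed(text)  # the response is held until this completes
    STORE[memory_id] = {"text": text, "vector": vector}
    return {"id": memory_id, "status": "stored"}

response = asyncio.run(add_memory("m1", "prefers window seats"))
```

Because `add_memory` awaits `embed` before it returns, there is no window in which the caller holds a successful response for a row without a vector.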

What text actually gets embedded

The embedded text is not the raw input string. The platform first runs the input through the semantic preprocessing step, which normalises whitespace, strips low-signal padding, and may attach context. The output of that step is the text that goes to the embedding model. The original input is preserved, encrypted, in the text column for reads.
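
The sketch below shows what a preprocessing step of this shape could look like. The padding phrases, the `context` parameter, and the "context: text" format are all assumptions; the platform's actual rules are not described on this page.

```python
import re

# Hypothetical low-signal phrases; the real list is not documented here.
PADDING = ("please note that", "just so you know,")

def preprocess(raw, context=None):
    """Return the text that would be sent to the embedding model."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", raw).strip()
    # Drop one leading padding phrase, matched case-insensitively.
    lowered = text.lower()
    for phrase in PADDING:
        if lowered.startswith(phrase):
            text = text[len(phrase):].lstrip()
            break
    # Optionally prefix context so the vector reflects where the text came from.
    if context:
        text = f"{context}: {text}"
    return text

raw_input = "  Just so you know,   I prefer   tea \n"
embedded_text = preprocess(raw_input, context="user profile")
```

The raw input and the embedded text are kept separate on purpose: one is what the caller reads back, the other is what the vector represents.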

How each vector is tied to a model version

Every memory carries an embedding_version field that records which model produced its vector. This is what lets the platform reason about which vectors are comparable to which queries, and which rows would need re-embedding after a model upgrade. Callers do not set this value — it is stamped server-side.
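
One way to picture server-side stamping is sketched below, assuming a hypothetical `MemoryRow` type and a made-up version identifier. Only the `embedding_version` field name comes from this page.

```python
from dataclasses import dataclass

ACTIVE_EMBEDDING_VERSION = "model-v2"  # hypothetical identifier

@dataclass(frozen=True)
class MemoryRow:
    id: str
    vector: tuple
    embedding_version: str

def build_row(payload, vector):
    """Build a row from a caller payload, stamping the version server-side."""
    # Any embedding_version supplied by the caller is deliberately ignored.
    return MemoryRow(
        id=payload["id"],
        vector=tuple(vector),
        embedding_version=ACTIVE_EMBEDDING_VERSION,
    )

row = build_row({"id": "m1", "embedding_version": "spoofed"}, [0.1, 0.2])
```

The stamped value reflects the model that actually produced the vector, regardless of what the request body contained.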

Cost and token tracking

Every embedding call records token count and cost in cents (kept as a decimal for precision). These figures roll up into the per-tenant usage counters that drive billing and quota enforcement.
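
A small sketch of decimal cost accounting follows. The price per thousand tokens and the shape of the `usage` counter are invented for illustration; the point is only that `Decimal` keeps sub-cent amounts exact where binary floats would accumulate rounding error.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical price; real rates are not stated on this page.
CENTS_PER_1K_TOKENS = Decimal("0.002")

# Per-tenant running totals (hypothetical structure).
usage = defaultdict(lambda: {"tokens": 0, "cost_cents": Decimal("0")})

def record_embedding_call(tenant_id, token_count):
    """Add one embedding call's tokens and cost to the tenant's counters."""
    cost = (Decimal(token_count) / Decimal(1000)) * CENTS_PER_1K_TOKENS
    usage[tenant_id]["tokens"] += token_count
    usage[tenant_id]["cost_cents"] += cost
    return cost

record_embedding_call("tenant-a", 1500)
record_embedding_call("tenant-a", 500)
```

After the two calls, the tenant's counters hold 2000 tokens and exactly 0.004 cents, with no floating-point drift.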

What happens when the model is updated

Existing rows keep their original embedding_version and stay in their original index.
New writes use the new model.
Cross-version recall is not automatic — a query embedded with a new model only matches rows that share its embedding_version.
Plan a re-embedding pass before changing a tenant's effective model.
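
The version scoping described above can be sketched as a filter on `embedding_version`. The row dictionaries and version strings are invented, and a real system would apply the filter inside the index rather than over a Python list.

```python
def recall_candidates(rows, query_version):
    """Only rows embedded with the query's model version are candidates."""
    return [row for row in rows if row["embedding_version"] == query_version]

def reembedding_backlog(rows, target_version):
    """Ids of rows that need a re-embedding pass to be visible under target_version."""
    return [row["id"] for row in rows if row["embedding_version"] != target_version]

rows = [
    {"id": "m1", "embedding_version": "model-v1"},
    {"id": "m2", "embedding_version": "model-v1"},
    {"id": "m3", "embedding_version": "model-v2"},
]

visible = [row["id"] for row in recall_candidates(rows, "model-v2")]
backlog = reembedding_backlog(rows, "model-v2")
```

In this toy data set, switching the query path to the newer version without a re-embedding pass would leave two of the three memories unreachable.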

What to watch when debugging recall

embedding_version on the row — confirms which model produced the vector.
embedding_version of the query path — must align with the row's version for the row to be a candidate.
Whether the query's preprocessing matches the write-time preprocessing — divergent normalisation hurts recall.
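
The checks above can be turned into a small diagnostic helper. This is a sketch under assumptions: the function name, its parameters, and the idea of comparing both preprocessing paths on a probe string are illustrative, not part of the platform.

```python
def diagnose_recall_miss(row_version, query_version,
                         write_preprocess, query_preprocess, probe):
    """Return human-readable reasons why a row may not match a query."""
    findings = []
    if row_version != query_version:
        findings.append(
            f"embedding_version mismatch: row={row_version}, query={query_version}"
        )
    # Run the same probe text through both paths and compare the results.
    if write_preprocess(probe) != query_preprocess(probe):
        findings.append("preprocessing diverges between write and query paths")
    return findings

def collapse_whitespace(text):
    return " ".join(text.split())

def identity(text):
    return text

findings = diagnose_recall_miss(
    "model-v1", "model-v2", collapse_whitespace, identity, "a  b"
)
```

An empty result means neither check found a discrepancy, so the cause of the miss lies elsewhere.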
