How It Works
Memory Query Flow
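This page traces a single POST /memory/query from the inbound request to the ranked response. The query path never changes memory content; its only database writes are the recall bookkeeping described at the end of this page. It touches the embedding service, the vector index, the database (for hydration), and the cache, roughly in that order. The cache is also consulted before the embedding service is called.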
The six stages of one query
1. Validate & resolve scope. Auth, project, tenant resolved exactly as on the write path.
2. Embed the query string. The same embedding service used at write time generates the query vector. The cache is checked first; on miss, the service is called.
3. Vector recall. The vector index searches the per-user namespace and returns the top-K candidates ranked by cosine similarity, filtered by `retention_status="active"`, `environment`, `project_id`, and any caller-supplied filters.
4. Hydrate. One database query, `SELECT … WHERE vector_id IN (…)`, pulls the full `memories` rows for the candidates. Decryption happens here.
5. Rank. A weighted formula combines semantic similarity, recency, importance, usage, decay, consistency, quality, and confidence into a single score.
6. Bookkeep. Recall signals (`usage_count++`, `last_accessed_at=now()`) are written for the returned memories so future ranking and tier transitions can use them.
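Stages 2 and 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the service's implementation: `embed_with_cache`, `recall_top_k`, the plain-dict cache, and the list-of-dicts index are hypothetical stand-ins for the real cache, embedding service, and vector index.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed_with_cache(text: str, cache: Dict[str, Vector],
                     embed_fn: Callable[[str], Vector]) -> Vector:
    """Stage 2: check the cache first, call the embedding service on a miss."""
    if text in cache:
        return cache[text]
    vector = embed_fn(text)  # hypothetical embedding-service call
    cache[text] = vector
    return vector


def recall_top_k(query_vec: Vector, namespace_entries: List[dict],
                 k: int, **required_metadata: str) -> List[str]:
    """Stage 3: filter on metadata, then rank by cosine similarity."""
    eligible = [
        entry for entry in namespace_entries
        if all(entry["metadata"].get(key) == value
               for key, value in required_metadata.items())
    ]
    eligible.sort(key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return [entry["vector_id"] for entry in eligible[:k]]
```

The returned `vector_id` values are what stage 4 would pass to the `WHERE vector_id IN (…)` hydration query.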
What filters are honoured at recall, not after
| Filter | Where it applies | Allowed values |
|---|---|---|
| `tier` | Vector index metadata filter; never returns wrong-tier rows. | `"hot"`, `"warm"`, or `"cold"` (omitted = all) |
| `tags` | Vector index metadata filter; exact match on any tag. | Array of strings. |
| `session_id` | Vector index metadata filter. | String. |
| `metadata_filter` | Post-hydration filter on the database row. | Free-form JSON match. |
| `environment` | Always implicit: taken from the credential, cannot be set in the body. | `development`, `staging`, or `production` (set by the credential) |
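As an illustration of the filter table, the sketch below builds a request body on the client side. The helper name `build_query_body` and the `"query"` key for the query string are assumptions; only `tier`, `tags`, `session_id`, and `metadata_filter` come from the table. There is deliberately no `environment` parameter, since that value is taken from the credential.

```python
from typing import Any, Dict, List, Optional

ALLOWED_TIERS = {"hot", "warm", "cold"}


def build_query_body(query: str,
                     tier: Optional[str] = None,
                     tags: Optional[List[str]] = None,
                     session_id: Optional[str] = None,
                     metadata_filter: Optional[Dict[str, Any]] = None
                     ) -> Dict[str, Any]:
    """Assemble a body for POST /memory/query, validating known filters."""
    body: Dict[str, Any] = {"query": query}  # key name is an assumption
    if tier is not None:
        if tier not in ALLOWED_TIERS:
            raise ValueError(f"tier must be one of {sorted(ALLOWED_TIERS)}")
        body["tier"] = tier  # omitted means all tiers
    if tags is not None:
        if not all(isinstance(tag, str) for tag in tags):
            raise ValueError("tags must be an array of strings")
        body["tags"] = list(tags)
    if session_id is not None:
        body["session_id"] = session_id
    if metadata_filter is not None:
        body["metadata_filter"] = metadata_filter  # applied after hydration
    return body
```

Validating `tier` before sending avoids a round trip for a value the index would reject or silently match nothing on.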
The ranking formula, in plain English
The final score is a weighted sum of seven signals, each normalised to [0, 1]:
| Signal | Default weight | What it measures |
|---|---|---|
| Semantic similarity | ~0.50 | Cosine similarity between query vector and stored vector. |
| Recency | ~0.20 | Decays smoothly with age. Newer wins ties. |
| Importance | ~0.15 | importance from creation, updated by the intelligence pipeline. |
| Usage | ~0.05 | usage_count normalised by tenant percentile. |
| Quality | ~0.05 | quality_score from the semantic engine. |
| Consistency | ~0.025 | consistency_score — penalised when contradicted. |
| Decay | ~0.025 | decay_score — drops on long no-recall stretches. |
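The weighted sum can be written out directly. The sketch below uses the approximate default weights from the table; the function name `final_score`, the short signal keys, and the choice to treat a missing signal as 0.0 are assumptions, and the real service may normalise or combine signals differently.

```python
from typing import Dict, Mapping

# Approximate default weights, copied from the table above.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "semantic": 0.50,
    "recency": 0.20,
    "importance": 0.15,
    "usage": 0.05,
    "quality": 0.05,
    "consistency": 0.025,
    "decay": 0.025,
}


def final_score(signals: Mapping[str, float],
                weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of signals, each expected to be normalised to [0, 1]."""
    total = 0.0
    for name, weight in weights.items():
        value = signals.get(name, 0.0)  # assumption: missing signal counts as 0
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} must be in [0, 1], got {value}")
        total += weight * value
    return total
```

Because the listed weights sum to 1.0 and every signal lies in [0, 1], the result also stays in [0, 1].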
Why the vector namespace is per-user
Vector recall scopes to the namespace memorysync-user-{user_id}. This is not an optimisation — it is a safety boundary. Even if the metadata filter is misconfigured, a query physically cannot return another user's vectors because they live in a different namespace. Tenant isolation works the same way at the database layer; the vector layer reinforces it.
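A toy model makes the isolation argument concrete. The namespace format is taken from the text; `NamespacedIndex` and its methods are hypothetical and stand in for whatever vector store is actually used.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def namespace_for_user(user_id: str) -> str:
    """Namespace name as described above: memorysync-user-{user_id}."""
    return f"memorysync-user-{user_id}"


class NamespacedIndex:
    """Toy in-memory index with one isolated vector store per namespace."""

    def __init__(self) -> None:
        self._spaces: Dict[str, Dict[str, Sequence[float]]] = defaultdict(dict)

    def upsert(self, user_id: str, vector_id: str,
               vector: Sequence[float]) -> None:
        self._spaces[namespace_for_user(user_id)][vector_id] = vector

    def ids_visible_to(self, user_id: str) -> List[str]:
        # Only this user's namespace is consulted, so no metadata filter
        # (correct or not) can surface another user's vectors.
        return sorted(self._spaces.get(namespace_for_user(user_id), {}))
```

In this model a query never receives a handle to any namespace other than the caller's, which is the property the paragraph above relies on.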
What the response includes
```json
{
  "memories": [
    {
      "id": 18421,
      "content": "User Alice prefers concise replies and uses dark mode.",
      "tier": "hot",
      "score": 0.83,
      "score_breakdown": {
        "semantic": 0.91,
        "recency": 0.62,
        "importance": 0.70,
        "usage": 0.40
      },
      "embedding_version": "v3",
      "tags": ["preference", "ui"],
      "created_at": "2026-05-04T10:14:32Z"
    }
  ],
  "explanation": "Top match is a recent high-importance preference fact.",
  "request_id": "req_3f9c1ab2"
}
```
What makes a query return fewer rows than you expect
- Tier filter. By default, queries scan all tiers. If you set `tier="hot"`, cold and warm rows are excluded at the index, not just deprioritised.
- Embedding-version mismatch. If you upserted vectors under one model and queried under another, only same-version vectors are eligible.
- Importance gate. The default score floor drops anything below ~0.5 importance unless explicitly requested.
- Project scope. Queries are project-scoped. A memory added under one project is invisible to queries under another, even with the same user.
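When debugging a short result set, it can help to run the four checks above against a memory you expected to see. The sketch below is a hypothetical client-side diagnostic, not part of the API: the function name, the dict field names, and the fixed 0.5 floor (the text gives it only as approximate) are all assumptions.

```python
from typing import Optional


def exclusion_reason(memory: dict, *,
                     project_id: str,
                     embedding_version: str,
                     tier: Optional[str] = None,
                     min_importance: float = 0.5) -> Optional[str]:
    """Return why a memory would be dropped, or None if it is eligible."""
    if memory["project_id"] != project_id:
        return "project scope"
    if tier is not None and memory["tier"] != tier:
        return "tier filter"
    if memory["embedding_version"] != embedding_version:
        return "embedding-version mismatch"
    if memory["importance"] < min_importance:
        return "importance gate"
    return None
```

Passing `min_importance=0.0` models a caller that has explicitly asked for low-importance memories.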
What bookkeeping runs after the response
- `usage_count++` on each returned memory.
- `last_accessed_at = now()` on each returned memory.
- One `memory_events` row per recall, used by the re-evaluation flow to spot hot memories.
- Cache update: the top-K result is cached for identical follow-up queries (5-minute TTL).
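The bookkeeping steps can be sketched with in-memory structures. Everything here is illustrative: `TTLCache` and `record_recall` are hypothetical names, the event row shape is invented, and writing one event per returned memory is an assumption about what "per recall" means.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple


class TTLCache:
    """Minimal cache whose entries expire after ttl_seconds (default 5 min)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value


def record_recall(memories: List[dict], events: List[dict], now: str) -> None:
    """Bump recall signals and append one event row per returned memory."""
    for memory in memories:
        memory["usage_count"] = memory.get("usage_count", 0) + 1
        memory["last_accessed_at"] = now
        events.append({"memory_id": memory["id"], "type": "recall", "at": now})
```

Injecting the clock keeps the expiry logic testable without sleeping for five minutes.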