MemorySync
How It Works

Memory Query Flow

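This page traces a single POST /memory/query from the inbound request to the ranked response. The query path never modifies memory content; its only database writes are the recall signals recorded during bookkeeping. Along the way it touches the embedding service (behind a cache), the vector index, and the database (for hydration), in that order, and it refreshes the cache once the response has been sent.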

The six stages of one query

  1. Validate & resolve scope. Auth, project, and tenant are resolved exactly as on the write path.
  2. Embed the query string. The same embedding service used at write time generates the query vector. The cache is checked first; on a miss, the service is called.
  3. Vector recall. The vector index searches the per-user namespace and returns the top-K candidates ranked by cosine similarity, filtered by retention_status="active", environment, project_id, and any caller-supplied filters.
  4. Hydrate. One database query (SELECT … WHERE vector_id IN (…)) pulls the full memories rows for the candidates. Decryption happens here.
  5. Rank. A weighted formula combines semantic similarity, recency, importance, usage, decay, consistency, quality, and confidence into a single score.
  6. Bookkeep. Recall signals (usage_count++, last_accessed_at=now()) are written for the returned memories so future ranking and tier transitions can use them.
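
The hydration stage above can be sketched as follows. This is a minimal illustration, not the service's actual code: it uses SQLite as a stand-in database, the `hydrate` helper and the column set (`id`, `vector_id`, `content`) are assumptions, and decryption is omitted. It shows the single parameterised `IN (…)` query and how the rows can be put back into the order the vector index returned them.

```python
import sqlite3
from typing import Dict, List, Sequence


def hydrate(conn: sqlite3.Connection, vector_ids: Sequence[str]) -> List[Dict[str, object]]:
    """Fetch full rows for the recalled candidates with one query.

    Hypothetical helper: the real schema and decryption step are not shown.
    """
    if not vector_ids:
        return []
    # One placeholder per candidate keeps the query parameterised.
    placeholders = ", ".join("?" for _ in vector_ids)
    sql = (
        "SELECT id, vector_id, content FROM memories "
        f"WHERE vector_id IN ({placeholders})"
    )
    rows = conn.execute(sql, list(vector_ids)).fetchall()
    by_vector_id = {
        row[1]: {"id": row[0], "vector_id": row[1], "content": row[2]}
        for row in rows
    }
    # SQL does not guarantee result order, so restore the recall order here.
    # Candidates with no matching row are skipped.
    return [by_vector_id[v] for v in vector_ids if v in by_vector_id]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with three sample rows for experimentation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memories (id INTEGER, vector_id TEXT, content TEXT)")
    conn.executemany(
        "INSERT INTO memories VALUES (?, ?, ?)",
        [(1, "v1", "first"), (2, "v2", "second"), (3, "v3", "third")],
    )
    return conn
```

Calling `hydrate(demo_connection(), ["v3", "v1"])` returns the two rows in recall order, regardless of how the database happens to order them.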

What filters are honoured at recall, not after

| Filter | Where it applies | Allowed values |
| --- | --- | --- |
| tier | Vector index metadata filter; rows from other tiers are never returned. | "hot", "warm", or "cold" (omitted = all) |
| tags | Vector index metadata filter; exact match on any of the supplied tags. | Array of strings. |
| session_id | Vector index metadata filter. | String. |
| metadata_filter | Post-hydration filter on the database row. | Free-form JSON match. |
| environment | Always implicit; taken from the credential, cannot be set in the body. | development, staging, or production (set by the credential) |
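
One way to picture the split between index-time and post-hydration filters is the sketch below. Everything here is illustrative: `split_filters`, `matches_metadata`, and the `tags_any` key are hypothetical names, rejecting a body that sets `environment` is an assumption (the service might ignore it instead), and "free-form JSON match" is assumed to mean top-level key equality.

```python
from typing import Any, Dict, Optional, Tuple

ALLOWED_TIERS = {"hot", "warm", "cold"}


def split_filters(
    body: Dict[str, Any], credential_environment: str, project_id: int
) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Return (index-time filter, post-hydration filter) for one query body."""
    if "environment" in body:
        # Assumption: the request is rejected rather than silently ignored.
        raise ValueError("environment comes from the credential, not the body")

    # These three are always applied at the vector index.
    index_filter: Dict[str, Any] = {
        "retention_status": "active",
        "environment": credential_environment,
        "project_id": project_id,
    }
    tier = body.get("tier")
    if tier is not None:  # omitted tier means all tiers
        if tier not in ALLOWED_TIERS:
            raise ValueError(f"unknown tier: {tier!r}")
        index_filter["tier"] = tier
    if body.get("tags"):
        index_filter["tags_any"] = list(body["tags"])  # hypothetical key name
    if "session_id" in body:
        index_filter["session_id"] = body["session_id"]

    # metadata_filter is not pushed to the index; it runs on hydrated rows.
    return index_filter, body.get("metadata_filter")


def matches_metadata(
    row_metadata: Dict[str, Any], metadata_filter: Optional[Dict[str, Any]]
) -> bool:
    """Assumed semantics: every filter key must equal the row's value."""
    if not metadata_filter:
        return True
    return all(row_metadata.get(k) == v for k, v in metadata_filter.items())
```

Because `metadata_filter` runs after hydration, it can only shrink the top-K candidate set; it cannot pull in rows that the index did not return.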

The ranking formula, in plain English

The final score is a weighted sum of the seven signals below, each normalised to [0, 1]. The default weights are approximate and sum to 1.0:

| Signal | Default weight | What it measures |
| --- | --- | --- |
| Semantic similarity | ~0.50 | Cosine similarity between query vector and stored vector. |
| Recency | ~0.20 | Decays smoothly with age. Newer wins ties. |
| Importance | ~0.15 | importance value set at creation, then updated by the intelligence pipeline. |
| Usage | ~0.05 | usage_count normalised by tenant percentile. |
| Quality | ~0.05 | quality_score from the semantic engine. |
| Consistency | ~0.025 | consistency_score; penalised when contradicted. |
| Decay | ~0.025 | decay_score; drops on long no-recall stretches. |
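
The weighted sum can be written out in a few lines. The sketch below treats the approximate default weights from the table as exact, and it makes two assumptions of its own: the key names `quality`, `consistency`, and `decay` are hypothetical, and a missing signal is counted as 0.

```python
from typing import Dict, Mapping

# Approximate defaults from the table above, treated as exact for illustration.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "semantic": 0.50,
    "recency": 0.20,
    "importance": 0.15,
    "usage": 0.05,
    "quality": 0.05,
    "consistency": 0.025,
    "decay": 0.025,
}


def final_score(
    signals: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted sum of normalised signals; each signal must lie in [0, 1]."""
    total = 0.0
    for name, weight in weights.items():
        value = signals.get(name, 0.0)  # assumption: missing signal counts as 0
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} is not normalised: {value}")
        total += weight * value
    return total
```

With these weights, semantic similarity alone accounts for half of the score, so a memory with a perfect semantic match and zero on every other signal scores 0.50.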

Why the vector namespace is per-user

Vector recall scopes to the namespace memorysync-user-{user_id}. This is not an optimisation — it is a safety boundary. Even if the metadata filter is misconfigured, a query physically cannot return another user's vectors because they live in a different namespace. Tenant isolation works the same way at the database layer; the vector layer reinforces it.
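
A toy in-memory index makes the boundary concrete. This is a sketch under stated assumptions, not the real vector store: `NamespacedIndex` and its methods are hypothetical, and only the namespace format `memorysync-user-{user_id}` comes from this page. A search is only ever handed the vectors stored under the caller's namespace, so no filter mistake can surface another user's entries.

```python
import math
from typing import Dict, List, Tuple


def user_namespace(user_id: str) -> str:
    """Namespace format described on this page."""
    return f"memorysync-user-{user_id}"


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedIndex:
    """Hypothetical stand-in for a vector index with per-user namespaces."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[str, List[float]]] = {}

    def upsert(self, user_id: str, vector_id: str, vector: List[float]) -> None:
        self._namespaces.setdefault(user_namespace(user_id), {})[vector_id] = vector

    def search(
        self, user_id: str, query: List[float], top_k: int = 5
    ) -> List[Tuple[str, float]]:
        # Only this user's namespace is ever read; other users' vectors
        # are not candidates, whatever metadata filter is applied later.
        vectors = self._namespaces.get(user_namespace(user_id), {})
        scored = [(vid, cosine(query, vec)) for vid, vec in vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```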

What the response includes

```json
{
  "memories": [
    {
      "id": 18421,
      "content": "User Alice prefers concise replies and uses dark mode.",
      "tier": "hot",
      "score": 0.83,
      "score_breakdown": {
        "semantic": 0.91,
        "recency": 0.62,
        "importance": 0.70,
        "usage": 0.40
      },
      "embedding_version": "v3",
      "tags": ["preference", "ui"],
      "created_at": "2026-05-04T10:14:32Z"
    }
  ],
  "explanation": "Top match is a recent high-importance preference fact.",
  "request_id": "req_3f9c1ab2"
}
```

What makes a query return fewer rows than you expect

  • Tier filter. By default, queries scan all tiers. If you set tier="hot", cold and warm rows are excluded at the index — not just deprioritised.
  • Embedding-version mismatch. If you upsert vectors under one embedding model and query under another, only vectors whose embedding_version matches the query's are eligible.
  • Importance gate. The default floor drops any memory whose importance is below ~0.5 unless lower-importance memories are explicitly requested.
  • Project scope. Queries are project-scoped. A memory added under one project is invisible to queries under another, even with the same user.
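
When a memory you expected is missing, it can help to check it against each of the causes above. The helper below is a hypothetical debugging aid, not part of the API: the function name, the dictionary field names, and the exact 0.5 floor are assumptions made for illustration.

```python
from typing import Any, List, Mapping, Optional

IMPORTANCE_FLOOR = 0.5  # approximate default floor described above


def exclusion_reasons(
    memory: Mapping[str, Any],
    *,
    tier: Optional[str],
    embedding_version: str,
    project_id: int,
    importance_floor: float = IMPORTANCE_FLOOR,
) -> List[str]:
    """List which of the four causes would exclude this memory from a query."""
    reasons: List[str] = []
    if tier is not None and memory["tier"] != tier:
        reasons.append("tier")  # excluded at the index, not deprioritised
    if memory["embedding_version"] != embedding_version:
        reasons.append("embedding_version")
    if memory["importance"] < importance_floor:
        reasons.append("importance")
    if memory["project_id"] != project_id:
        reasons.append("project")
    return reasons
```

An empty list means none of these four causes applies, so the memory was most likely outranked and fell outside the top-K.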

What bookkeeping runs after the response

  • usage_count++ on each returned memory.
  • last_accessed_at = now() on each returned memory.
  • One memory_events row per recall, used by the re-evaluation flow to spot hot memories.
  • Cache update — top-K result cached for identical follow-up queries (5-minute TTL).
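
The bookkeeping steps above can be sketched with in-memory stand-ins. The `Memory`, `QueryCache`, and `bookkeep` names are hypothetical, the event shape is invented for illustration, and writing one event per returned memory is an assumption about what "per recall" means. Only the counters, the timestamp, and the 5-minute TTL come from this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

CACHE_TTL_SECONDS = 300  # 5-minute TTL from the list above


@dataclass
class Memory:
    id: int
    usage_count: int = 0
    last_accessed_at: Optional[float] = None


class QueryCache:
    """Tiny TTL cache keyed by query; stores the ids of the top-K result."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS) -> None:
        self._ttl = ttl
        self._entries: Dict[str, Tuple[float, List[int]]] = {}

    def put(self, key: str, ids: List[int], now: float) -> None:
        self._entries[key] = (now + self._ttl, list(ids))

    def get(self, key: str, now: float) -> Optional[List[int]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, ids = entry
        if now >= expires_at:  # expired entries are evicted on read
            del self._entries[key]
            return None
        return ids


def bookkeep(
    returned: List[Memory],
    events: List[dict],
    cache: QueryCache,
    cache_key: str,
    now: float,
) -> None:
    """Record recall signals for every memory that was returned."""
    for memory in returned:
        memory.usage_count += 1
        memory.last_accessed_at = now
        # Assumption: one event row per returned memory.
        events.append({"memory_id": memory.id, "event": "recall", "at": now})
    cache.put(cache_key, [m.id for m in returned], now)
```

Passing `now` in explicitly, rather than reading the clock inside the functions, keeps the sketch deterministic and easy to test.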