MemorySync
Retrieval

Query Execution Flow

The full path of a single POST /memory/query request, from the first request byte to the last response byte. There are twelve steps, each owned by a specific module.

Step 1 — request validation

The handler parses the body, asserts that query is non-empty, and clamps k to [1, 50] and traversal_depth to [1, 3]. If parsing or validation fails, it returns HTTP 400 with field-level errors.
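The validation step can be sketched as follows. This is a minimal illustration, not the handler itself: the names QueryParams, clamp and validate_query_body are hypothetical, the defaults for k and traversal_depth are assumptions, and the sketch assumes that out-of-range integers are clamped while malformed fields produce field-level errors.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class QueryParams:
    query: str
    k: int
    traversal_depth: int


def clamp(value: int, low: int, high: int) -> int:
    """Force value into the closed interval [low, high]."""
    return max(low, min(high, value))


def validate_query_body(body: dict) -> Tuple[Optional[QueryParams], Dict[str, str]]:
    """Return (params, errors); a non-empty errors dict would map to HTTP 400."""
    errors: Dict[str, str] = {}

    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        errors["query"] = "must be a non-empty string"

    # Defaults of k=10 and traversal_depth=1 are assumptions for this sketch.
    k = body.get("k", 10)
    depth = body.get("traversal_depth", 1)
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(k, int) or isinstance(k, bool):
        errors["k"] = "must be an integer"
    if not isinstance(depth, int) or isinstance(depth, bool):
        errors["traversal_depth"] = "must be an integer"

    if errors:
        return None, errors
    return QueryParams(query.strip(), clamp(k, 1, 50), clamp(depth, 1, 3)), {}
```

With this shape, a request carrying k=500 proceeds with k=50, while a request with a blank query yields an errors dict keyed by field name.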

Step 2 — authentication and scope resolution

The auth dependency validates the credential (Bearer JWT or X-API-Key) and attaches the principal. user_id, tenant_id, project_id and environment are resolved here. The X-End-User-ID B2B header, if present, is validated against the API key's allowed set.

Step 3 — embed the query

  1. Preprocess: strip, validate non-empty, truncate at 8000 characters.
  2. Cache lookup keyed by emb:{sha256_digest[:24]}. Hit returns within ~10 ms.
  3. On miss, call the embedding service. Failures retry with exponential backoff up to five attempts.
  4. On full failure, fall back to an offline transformer if its model is cached, otherwise to a deterministic hash-based vector.
  5. On success, write the vector to the cache with a 1-hour TTL.
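The five sub-steps above can be sketched in one function. Everything here is illustrative: embed_query, hash_vector, the in-memory dict cache, the 0.1 s base delay and the 8-dimensional fallback vector are assumptions, and the remote and offline embedders are passed in as plain callables rather than real clients.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple

MAX_CHARS = 8000
MAX_ATTEMPTS = 5
CACHE_TTL_SECONDS = 3600  # 1 hour


def cache_key(text: str) -> str:
    """Build the emb:{sha256_digest[:24]} key from the preprocessed text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"emb:{digest[:24]}"


def hash_vector(text: str, dim: int = 8) -> List[float]:
    """Deterministic last-resort vector derived from SHA-256 bytes (dim <= 32)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def embed_query(
    raw: str,
    cache: Dict[str, Tuple[float, List[float]]],
    remote: Callable[[str], List[float]],
    offline: Optional[Callable[[str], List[float]]] = None,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> List[float]:
    # 1. Preprocess: strip, validate non-empty, truncate.
    text = raw.strip()
    if not text:
        raise ValueError("query must be non-empty")
    text = text[:MAX_CHARS]

    # 2. Cache lookup; entries are stored as (expiry_timestamp, vector).
    key = cache_key(text)
    entry = cache.get(key)
    if entry is not None and entry[0] > now():
        return entry[1]

    # 3. Call the embedding service with exponential backoff.
    for attempt in range(MAX_ATTEMPTS):
        try:
            vector = remote(text)
        except Exception:
            if attempt < MAX_ATTEMPTS - 1:
                sleep(0.1 * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, 0.8 s
            continue
        # 5. On success, cache the vector with a 1-hour TTL.
        cache[key] = (now() + CACHE_TTL_SECONDS, vector)
        return vector

    # 4. Full failure: fallback vectors are not cached in this sketch.
    if offline is not None:
        return offline(text)
    return hash_vector(text)
```

Injecting sleep and now keeps the function testable without real delays; a production version would more likely use a shared cache with native TTL support.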

Step 4 — index lookup with filter pushdown

The query vector and the metadata where-clause built from the request are sent to the per-user collection. The index returns candidates ordered by cosine distance: nominally up to k, though it often fetches more than k internally.
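To make "filter pushdown" concrete, here is a toy in-memory stand-in for the index. It is not the real vector index: the search function, the tuple layout of the collection and the equality-only where-clause are assumptions chosen to show that the metadata filter is applied before candidates are ranked by cosine distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Entry = Tuple[str, Sequence[float], Dict[str, str]]  # (memory_id, vector, metadata)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero-length vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def search(collection: List[Entry], query_vec: Sequence[float],
           where: Dict[str, str], k: int) -> List[str]:
    """Apply the where-clause first, then return the k nearest memory ids."""
    matches = [
        (cosine_distance(query_vec, vec), memory_id)
        for memory_id, vec, meta in collection
        if all(meta.get(field) == value for field, value in where.items())
    ]
    matches.sort()  # ascending distance, ties broken by memory id
    return [memory_id for _, memory_id in matches[:k]]
```

A real index would evaluate the filter inside its approximate nearest-neighbour search rather than scanning every entry, but the ordering contract is the same.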

Step 5 — fetch rows from the durable store

Memory ids from the index are batched in a single read against the memories table. Rows that do not exist (deletion in flight) are silently dropped from the candidate list. Text and summary remain encrypted at this stage.
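A minimal sketch of the batched fetch, assuming a hypothetical read_rows callable that takes a list of ids and returns whatever rows exist. The point is the single read and the silent drop of ids whose rows are gone, with the index order preserved.

```python
from typing import Callable, Dict, Iterable, List


def fetch_candidates(
    candidate_ids: List[str],
    read_rows: Callable[[List[str]], Iterable[Dict]],
) -> List[Dict]:
    """One batched read; ids without a row (deletion in flight) are dropped."""
    rows_by_id = {row["id"]: row for row in read_rows(candidate_ids)}
    # Rebuild the list in index order, skipping ids that no longer exist.
    return [rows_by_id[i] for i in candidate_ids if i in rows_by_id]
```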

Step 6 — compute the seven ranking factors

For each candidate the engine computes seven factors: semantic, cluster, recency, importance, usage, quality and decay. Cluster centroids are bulk-loaded so this step is one round trip, not N. Each factor is min-max normalised across the batch, and the normalised factors are then combined as a weighted sum into hybrid_score.
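The normalise-then-combine step can be sketched as below. The function names and the weights argument are hypothetical (the actual weights are not given here), and mapping a constant factor to 0.0 is an assumption made to avoid dividing by zero.

```python
from typing import Dict, List

FACTORS = ("semantic", "cluster", "recency", "importance", "usage", "quality", "decay")


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1] across the batch."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a factor that is constant across the batch contributes nothing.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(candidates: List[Dict[str, float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the per-batch normalised factors, one score per candidate."""
    if not candidates:
        return []
    normalised = {f: min_max([c[f] for c in candidates]) for f in FACTORS}
    return [
        sum(weights[f] * normalised[f][i] for f in FACTORS)
        for i in range(len(candidates))
    ]
```

Because normalisation is per batch, the same memory can receive a different hybrid_score depending on which other candidates it is ranked against.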

Step 7 — variance-floor tiebreak

If the per-batch hybrid_score standard deviation drops below 0.02, the deterministic tiebreak chain runs (semantic desc, importance desc, recency desc, id desc) and tiebreak_applied=true is set in the response.
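A sketch of the variance-floor check, under stated assumptions: the population standard deviation is used (the text does not say population or sample), recency is a score where larger means more recent, and ids are mutually comparable. The rank function and the candidate dict layout are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

VARIANCE_FLOOR = 0.02


def rank(candidates: List[Dict]) -> Tuple[List[Dict], bool]:
    """Return (ranked candidates, tiebreak_applied)."""
    scores = [c["hybrid_score"] for c in candidates]
    tiebreak_applied = len(scores) > 1 and statistics.pstdev(scores) < VARIANCE_FLOOR

    if tiebreak_applied:
        # Deterministic chain: semantic, importance, recency, id, all descending.
        key = lambda c: (c["semantic"], c["importance"], c["recency"], c["id"])
    else:
        key = lambda c: c["hybrid_score"]
    return sorted(candidates, key=key, reverse=True), tiebreak_applied
```

Since every key in the chain sorts descending, a single tuple key with reverse=True is enough; mixed directions would need per-field negation instead.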

Step 8 — optional graph traversal

When traversal_depth > 1, depth-1 neighbours are fetched, then depth-2, up to depth 3. Edges contribute the additive boosts listed on the Semantic Ranking page. Neighbour fetches at the same depth run in parallel.
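The level-by-level traversal with parallel fetches at each depth could look like this. It is a sketch: traverse and fetch_neighbours are hypothetical names, a thread pool stands in for whatever concurrency the engine uses, and the code follows the text literally in skipping traversal when traversal_depth is 1.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

MAX_DEPTH = 3


def traverse(
    seed_ids: List[str],
    fetch_neighbours: Callable[[str], Iterable[str]],
    traversal_depth: int,
) -> Dict[str, int]:
    """Return {memory_id: depth} for neighbours reached from the seeds."""
    if traversal_depth <= 1:
        return {}  # traversal only runs when traversal_depth > 1

    depth_of: Dict[str, int] = {}
    seen = set(seed_ids)
    frontier = list(seed_ids)
    with ThreadPoolExecutor(max_workers=8) as pool:
        for depth in range(1, min(traversal_depth, MAX_DEPTH) + 1):
            if not frontier:
                break
            # All fetches for one depth run in parallel; depths run in sequence.
            results = list(pool.map(fetch_neighbours, frontier))
            next_frontier: List[str] = []
            for neighbours in results:
                for node in neighbours:
                    if node not in seen:
                        seen.add(node)
                        depth_of[node] = depth
                        next_frontier.append(node)
            frontier = next_frontier
    return depth_of
```

Recording the depth at which each neighbour was first reached lets a later step apply depth-dependent boosts without revisiting the graph.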

Step 9 — relationship, personalisation, and session boosts

On top of the hybrid score, the engine adds the relationship-edge boost (per edge type), the personalisation boost (capped at +0.10), and the session-continuity boost (+0.03 if the memory was returned earlier in the same session_id).
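As arithmetic, the three boosts stack additively on the hybrid score. In this sketch apply_boosts is a hypothetical helper, the per-edge boost values are supplied by the caller because they are defined elsewhere, and the personalisation input is assumed to be non-negative.

```python
from typing import Iterable

PERSONALISATION_CAP = 0.10
SESSION_BOOST = 0.03


def apply_boosts(
    hybrid_score: float,
    edge_boosts: Iterable[float],
    personalisation: float,
    seen_in_session: bool,
) -> float:
    """Add relationship, personalisation and session-continuity boosts."""
    score = hybrid_score + sum(edge_boosts)               # one boost per matching edge
    score += min(personalisation, PERSONALISATION_CAP)    # capped at +0.10
    if seen_in_session:
        score += SESSION_BOOST                            # +0.03 for same session_id
    return score
```

For example, a hybrid score of 0.50 with one edge boost of 0.05, an uncapped personalisation value of 0.25 and a session hit becomes 0.50 + 0.05 + 0.10 + 0.03 = 0.68.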

Step 10 — decrypt text and summary

Memory text and summary are decrypted using the tenant's data-encryption key. Decryption failures replace the text with [REDACTED — DECRYPTION FAILED] and lower the row's effective rank instead of crashing the response.
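The fail-soft behaviour can be sketched as follows. The decrypt callable is a stand-in for the real key-based decryption, and the size of the rank penalty and the choice to blank the summary on failure are assumptions; only the placeholder string comes from the text above.

```python
from typing import Callable, Dict, List

REDACTED_TEXT = "[REDACTED — DECRYPTION FAILED]"
DECRYPT_FAILURE_PENALTY = 0.5  # hypothetical value; the real penalty is not specified


def decrypt_rows(rows: List[Dict], decrypt: Callable[[bytes], str]) -> List[Dict]:
    """Decrypt text and summary per row; on failure, redact and demote the row."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        try:
            text = decrypt(row["text"])
            summary = decrypt(row["summary"])
        except Exception:
            row["text"] = REDACTED_TEXT
            row["summary"] = None  # assumption: ciphertext is never returned
            row["score"] -= DECRYPT_FAILURE_PENALTY
        else:
            row["text"], row["summary"] = text, summary
        out.append(row)
    # Re-sort so demoted rows fall to their new position.
    out.sort(key=lambda r: r["score"], reverse=True)
    return out
```

Catching the failure per row means one corrupt record degrades a single result instead of failing the whole response.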

Step 11 — compose the response

  • memories — the ranked list.
  • query_intent — the intent class the engine inferred.
  • explanations — per-memory factor breakdown.
  • confidence_scores — engine confidence per memory.
  • computation_tier — the tier the request actually ran in.
  • sla_target_ms / sla_met — the latency contract.
  • routing_info — diagnostics about which tier(s) were searched.

Step 12 — side effects

After the response is queued for return, the engine asynchronously increments usage_count, updates last_accessed_at, and may write the result to the recall result cache for popular queries. The HTTP response does not wait for these.
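One way to keep side effects off the response path is a fire-and-forget asyncio task, sketched below. The store dict, the handle_query and record_side_effects names and the background task set are hypothetical; the recall result cache write is omitted for brevity.

```python
import asyncio
from typing import Dict, List, Set


async def record_side_effects(store: Dict[str, Dict], memory_ids: List[str], now: float) -> None:
    """Increment usage_count and update last_accessed_at for each returned memory."""
    for memory_id in memory_ids:
        row = store[memory_id]
        row["usage_count"] += 1
        row["last_accessed_at"] = now


async def handle_query(
    store: Dict[str, Dict],
    memory_ids: List[str],
    now: float,
    background: Set[asyncio.Task],
) -> Dict:
    response = {"memories": list(memory_ids)}
    # Schedule the updates without awaiting them; the response returns first.
    task = asyncio.create_task(record_side_effects(store, memory_ids, now))
    # Hold a strong reference so the task is not garbage-collected mid-flight.
    background.add(task)
    task.add_done_callback(background.discard)
    return response
```

The trade-off is that a crash between the response and the task completing loses that update, which is usually acceptable for counters such as usage_count.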