Query Execution Flow
The full path of a single POST /memory/query from byte to byte. Twelve steps, each owned by a specific module.
Step 1 — request validation
The handler parses the body, asserts query is non-empty, clamps k to [1, 50], clamps traversal_depth to [1, 3], and returns HTTP 400 with field-level errors if any of those fail.
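A minimal sketch of the Step 1 checks. The field names query, k and traversal_depth come from this page. Two things are assumptions: the default values used when k or traversal_depth is omitted, and treating a non-integer k or traversal_depth as a field-level error. Clamping on its own never fails, so something else has to produce the 400.

```python
from typing import Any, Dict, Tuple

K_MIN, K_MAX = 1, 50
DEPTH_MIN, DEPTH_MAX = 1, 3
DEFAULT_K = 10       # hypothetical default
DEFAULT_DEPTH = 1    # hypothetical default


def _clamp(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_query_body(body: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Return (200, normalised fields) or (400, {"errors": {field: message}})."""
    errors: Dict[str, str] = {}

    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        errors["query"] = "must be a non-empty string"

    k = body.get("k", DEFAULT_K)
    if not _is_int(k):
        errors["k"] = "must be an integer"

    depth = body.get("traversal_depth", DEFAULT_DEPTH)
    if not _is_int(depth):
        errors["traversal_depth"] = "must be an integer"

    if errors:
        return 400, {"errors": errors}
    # Out-of-range integers are clamped rather than rejected.
    return 200, {
        "query": query,
        "k": _clamp(k, K_MIN, K_MAX),
        "traversal_depth": _clamp(depth, DEPTH_MIN, DEPTH_MAX),
    }
```

For example, a body of {"query": "q", "k": 500, "traversal_depth": 0} comes back with status 200, k=50 and traversal_depth=1.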
Step 2 — authentication and scope resolution
The auth dependency validates the credential (Bearer JWT or X-API-Key) and attaches the principal. user_id, tenant_id, project_id and environment are resolved here. The X-End-User-ID B2B header, if present, is validated against the API key's allowed set.
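The end-user check amounts to a set-membership test. ApiKeyScope, check_end_user and ScopeError are hypothetical names. The page only says that the header is validated against the API key's allowed set, so this sketch assumes the set is stored on the key record.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


class ScopeError(Exception):
    """The X-End-User-ID value is not permitted for this API key."""


@dataclass(frozen=True)
class ApiKeyScope:
    # Hypothetical key record; only the fields this check needs.
    tenant_id: str
    project_id: str
    environment: str
    allowed_end_user_ids: FrozenSet[str]


def check_end_user(scope: ApiKeyScope, header_value: Optional[str]) -> Optional[str]:
    """Return the validated end-user id, None if the header is absent, or raise."""
    if header_value is None:
        return None  # the header is optional
    end_user_id = header_value.strip()
    if end_user_id not in scope.allowed_end_user_ids:
        raise ScopeError(f"end user {end_user_id!r} is not allowed for this key")
    return end_user_id
```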
Step 3 — embed the query
- Preprocess: strip, validate non-empty, truncate at 8000 characters.
- Cache lookup keyed by emb:{sha256_digest[:24]}. A hit returns within ~10 ms.
- On a miss, call the embedding service. Failures are retried with exponential backoff, up to five attempts.
- If every attempt fails, fall back to an offline transformer model if one is cached locally; otherwise, fall back to a deterministic hash-based vector.
- On success, write the vector to the cache with a 1-hour TTL.
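The embedding steps above can be sketched as follows, under several assumptions. The cache key hashes the preprocessed text. "Up to five attempts" is read as five calls in total. The 0.1 s backoff base is illustrative. TTLCache is an in-process stand-in for whatever cache the engine actually uses. The offline transformer is modelled as an optional callable, and hash_vector returns 8 dimensions purely for illustration.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple

MAX_CHARS = 8000
MAX_ATTEMPTS = 5      # read here as five attempts in total
CACHE_TTL_S = 3600    # 1-hour TTL

Vector = List[float]


def preprocess(text: str) -> str:
    """Strip, reject empty input, truncate to 8000 characters."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("query must be non-empty")
    return cleaned[:MAX_CHARS]


def cache_key(text: str) -> str:
    """emb: followed by the first 24 hex chars of the SHA-256 digest."""
    return "emb:" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:24]


def hash_vector(text: str, dim: int = 8) -> Vector:
    """Deterministic last resort: identical text always maps to the same vector.

    dim=8 is illustrative only; a real fallback must match the index dimension.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 127.5 - 1.0 for i in range(dim)]


class TTLCache:
    """Hypothetical in-process stand-in for the embedding cache."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Vector]] = {}

    def get(self, key: str) -> Optional[Vector]:
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Vector, ttl_s: float) -> None:
        self._data[key] = (time.monotonic() + ttl_s, value)


def embed_query(
    text: str,
    cache: TTLCache,
    remote: Callable[[str], Vector],
    offline: Optional[Callable[[str], Vector]] = None,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Vector:
    cleaned = preprocess(text)
    key = cache_key(cleaned)
    cached = cache.get(key)
    if cached is not None:
        return cached
    for attempt in range(MAX_ATTEMPTS):
        try:
            vector = remote(cleaned)
        except Exception:
            # Exponential backoff between attempts (0.1, 0.2, 0.4, 0.8 s by default).
            if attempt < MAX_ATTEMPTS - 1:
                sleep(base_delay_s * 2 ** attempt)
            continue
        cache.set(key, vector, CACHE_TTL_S)  # only service results are cached
        return vector
    if offline is not None:
        return offline(cleaned)
    return hash_vector(cleaned)
```

Fallback vectors are deliberately not written to the cache, which matches the page: only a successful call populates it.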
Step 4 — index lookup with filter pushdown
The query vector and the metadata where-clause built from the request are sent together to the per-user collection, so filtering happens inside the index rather than after retrieval. The index returns candidates ordered by cosine distance; internally it often fetches more than k.
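The effect of filter pushdown can be shown with a brute-force scan over an in-memory list standing in for the per-user collection. The equality-only where-clause and the function names are hypothetical. The point is that the metadata filter is applied while candidates are scored, so every returned slot already satisfies it.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

Record = Tuple[str, Sequence[float], Dict[str, Any]]  # (memory id, vector, metadata)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat zero vectors as orthogonal
    return 1.0 - dot / (norm_a * norm_b)


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    # Equality-only where-clause; real filter languages are richer.
    return all(metadata.get(field) == value for field, value in where.items())


def query_collection(
    records: List[Record], query_vec: Sequence[float], where: Dict[str, Any], n_results: int
) -> List[Tuple[str, float]]:
    # Pushdown: filter during the scan, so no result slot is spent on a row
    # that a post-filter would later discard.
    hits = [
        (memory_id, cosine_distance(query_vec, vec))
        for memory_id, vec, meta in records
        if matches(meta, where)
    ]
    hits.sort(key=lambda hit: hit[1])  # smallest distance = most similar first
    return hits[:n_results]
```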
Step 5 — fetch rows from the durable store
Memory ids from the index are batched in a single read against the memories table. Rows that do not exist (deletion in flight) are silently dropped from the candidate list. Text and summary remain encrypted at this stage.
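A sketch of the batched read using Python's built-in sqlite3 module. The memories table columns (id, text_ct, summary_ct) are hypothetical. The behaviour it illustrates comes from the page: one read for all ids, missing rows dropped silently, and ciphertext left untouched.

```python
import sqlite3
from typing import List, Tuple


def fetch_rows(conn: sqlite3.Connection, ranked_ids: List[str]) -> List[Tuple[str, bytes, bytes]]:
    """One batched read for all candidate ids, returned in index order."""
    if not ranked_ids:
        return []
    # Only "?" placeholders are interpolated; the ids themselves are bound parameters.
    placeholders = ",".join("?" * len(ranked_ids))
    rows = conn.execute(
        f"SELECT id, text_ct, summary_ct FROM memories WHERE id IN ({placeholders})",
        ranked_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Ids with no row (deleted while the query was in flight) are skipped silently.
    return [by_id[memory_id] for memory_id in ranked_ids if memory_id in by_id]
```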
Step 6 — compute the seven ranking factors
For each candidate the engine computes semantic, cluster, recency, importance, usage, quality, decay. Cluster centroids are bulk-loaded so this step is one round trip, not N. Each factor is min-max normalised across the batch, then weighted-summed into hybrid_score.
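The normalise-then-sum step takes only a few lines. The weights are a parameter here because this page does not list them, and the sketch assumes each factor is already oriented so that larger means better. Mapping a constant factor to 0.0 is also an assumption. Any constant would work, since it shifts every candidate's score by the same amount.

```python
from typing import Dict, List, Mapping, Sequence

FACTORS = ("semantic", "cluster", "recency", "importance", "usage", "quality", "decay")


def min_max(values: Sequence[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant factor: it cannot separate candidates, so map it to 0.0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(
    candidates: Sequence[Mapping[str, float]], weights: Mapping[str, float]
) -> List[float]:
    """Min-max normalise each factor across the batch, then take the weighted sum."""
    if not candidates:
        return []
    normalised: Dict[str, List[float]] = {
        f: min_max([c[f] for c in candidates]) for f in FACTORS
    }
    return [
        sum(weights[f] * normalised[f][i] for f in FACTORS)
        for i in range(len(candidates))
    ]
```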
Step 7 — variance-floor tiebreak
If the per-batch hybrid_score standard deviation drops below 0.02, the deterministic tiebreak chain runs (semantic desc, importance desc, recency desc, id desc) and tiebreak_applied=true is set in the response.
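The variance-floor rule can be sketched using population standard deviation (statistics.pstdev). The page does not say whether the engine uses population or sample deviation. Candidate is a hypothetical record that holds only the fields the tiebreak chain needs.

```python
import statistics
from dataclasses import dataclass
from typing import List, Tuple

VARIANCE_FLOOR = 0.02


@dataclass
class Candidate:
    id: str
    hybrid_score: float
    semantic: float
    importance: float
    recency: float


def order_candidates(candidates: List[Candidate]) -> Tuple[List[Candidate], bool]:
    """Return (ordered candidates, tiebreak_applied)."""
    scores = [c.hybrid_score for c in candidates]
    # Fewer than two candidates cannot tie, so the chain is skipped (assumption).
    if len(scores) > 1 and statistics.pstdev(scores) < VARIANCE_FLOOR:
        # semantic desc, importance desc, recency desc, id desc
        ordered = sorted(
            candidates,
            key=lambda c: (c.semantic, c.importance, c.recency, c.id),
            reverse=True,
        )
        return ordered, True
    return sorted(candidates, key=lambda c: c.hybrid_score, reverse=True), False
```

Because every key in the chain sorts descending, a single sort with reverse=True over the tuple reproduces the whole chain.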
Step 8 — optional graph traversal
When traversal_depth > 1, the engine fetches depth-1 neighbours, then depth-2 neighbours, and so on, never going beyond depth 3. Each traversed edge contributes the additive boost listed on the Semantic Ranking page. Neighbour fetches at the same depth run in parallel.
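Level-by-level traversal with parallel fetches per level can be sketched using asyncio.gather. fetch_neighbours is a hypothetical async callable. The page does not spell out how traversal_depth maps to a hop count, so this sketch takes the number of hops directly and caps it at 3.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence, Set

MAX_DEPTH = 3

FetchNeighbours = Callable[[str], Awaitable[List[str]]]


async def traverse(
    seeds: Sequence[str], hops: int, fetch_neighbours: FetchNeighbours
) -> Dict[str, int]:
    """Return {memory_id: depth at which it was first reached}, excluding the seeds."""
    hops = max(0, min(hops, MAX_DEPTH))
    seen: Set[str] = set(seeds)
    found: Dict[str, int] = {}
    frontier: List[str] = list(seeds)
    for depth in range(1, hops + 1):
        if not frontier:
            break
        # Every fetch at this depth runs concurrently; the next depth waits for all of them.
        results = await asyncio.gather(*(fetch_neighbours(node) for node in frontier))
        next_frontier: List[str] = []
        for neighbours in results:
            for node in neighbours:
                if node not in seen:
                    seen.add(node)
                    found[node] = depth
                    next_frontier.append(node)
        frontier = next_frontier
    return found
```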
Step 9 — relationship, personalisation, and session boosts
On top of the hybrid score, the engine adds the relationship-edge boost (per edge type), the personalisation boost (capped at +0.10), and the session-continuity boost (+0.03 if the memory was returned earlier in the same session_id).
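Step 9 is additive arithmetic on top of hybrid_score. The +0.10 cap and the +0.03 session boost come from this page. The per-edge-type values live on the Semantic Ranking page and are passed in as a mapping here. Adding one term per edge, and the "supports" value used in the usage line, are assumptions.

```python
from typing import Collection, Iterable, Mapping

PERSONALISATION_CAP = 0.10
SESSION_CONTINUITY_BOOST = 0.03


def boosted_score(
    hybrid_score: float,
    memory_id: str,
    edge_types: Iterable[str],
    edge_boosts: Mapping[str, float],
    personalisation: float,
    session_returned_ids: Collection[str],
) -> float:
    score = hybrid_score
    # Relationship boost: one additive term per edge, looked up by edge type.
    score += sum(edge_boosts.get(edge_type, 0.0) for edge_type in edge_types)
    # Personalisation is capped at +0.10.
    score += min(personalisation, PERSONALISATION_CAP)
    # Session continuity: +0.03 if this memory was already returned in this session_id.
    if memory_id in session_returned_ids:
        score += SESSION_CONTINUITY_BOOST
    return score
```

With a made-up "supports" boost of 0.05, boosted_score(0.5, "m1", ["supports"], {"supports": 0.05}, 0.4, {"m1"}) gives 0.5 + 0.05 + 0.10 + 0.03 = 0.68, up to floating-point rounding.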
Step 10 — decrypt text and summary
Memory text and summary are decrypted using the tenant's data-encryption key. Decryption failures replace the text with [REDACTED — DECRYPTION FAILED] and lower the row's effective rank instead of crashing the response.
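Fail-soft decryption can be sketched as follows. decrypt stands in for a hypothetical function already bound to the tenant's data-encryption key. The page says a failed row's effective rank is lowered but not by how much. This sketch assumes failed rows move below every successfully decrypted row and keep their relative order.

```python
from typing import Any, Callable, Dict, List

REDACTED = "[REDACTED — DECRYPTION FAILED]"

Row = Dict[str, Any]


def decrypt_rows(ranked_rows: List[Row], decrypt: Callable[[bytes], str]) -> List[Row]:
    """Decrypt text and summary per row; failures are redacted and demoted, never raised."""
    decrypted: List[Row] = []
    failed: List[Row] = []
    for row in ranked_rows:
        try:
            text = decrypt(row["text"])
            summary = decrypt(row["summary"])
        except Exception:
            # Any decryption error stays local to this row.
            failed.append({**row, "text": REDACTED, "summary": REDACTED, "decryption_failed": True})
        else:
            decrypted.append({**row, "text": text, "summary": summary, "decryption_failed": False})
    # Assumed demotion rule: failed rows go after all readable ones, order preserved.
    return decrypted + failed
```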
Step 11 — compose the response
- memories: the ranked list.
- query_intent: the intent class the engine inferred.
- explanations: per-memory factor breakdown.
- confidence_scores: engine confidence per memory.
- computation_tier: the tier the request actually ran in.
- sla_target_ms / sla_met: the latency contract.
- routing_info: diagnostics about which tier(s) were searched.
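The response envelope can be modelled as a dataclass. Field names come from the list above, plus tiebreak_applied from Step 7. The value types, keying per-memory maps by memory id, and deriving sla_met as elapsed_ms <= sla_target_ms are all assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class QueryResponse:
    memories: List[Dict[str, Any]]              # the ranked list
    query_intent: str                           # inferred intent class
    explanations: Dict[str, Dict[str, float]]   # per-memory factor breakdown
    confidence_scores: Dict[str, float]         # engine confidence per memory
    computation_tier: str                       # tier the request actually ran in
    sla_target_ms: int
    sla_met: bool
    routing_info: Dict[str, Any]                # which tier(s) were searched
    tiebreak_applied: bool = False              # set by the variance-floor step


def compose_response(
    memories: List[Dict[str, Any]],
    query_intent: str,
    explanations: Dict[str, Dict[str, float]],
    confidence_scores: Dict[str, float],
    computation_tier: str,
    sla_target_ms: int,
    elapsed_ms: float,
    routing_info: Dict[str, Any],
    tiebreak_applied: bool = False,
) -> Dict[str, Any]:
    return asdict(
        QueryResponse(
            memories=memories,
            query_intent=query_intent,
            explanations=explanations,
            confidence_scores=confidence_scores,
            computation_tier=computation_tier,
            sla_target_ms=sla_target_ms,
            sla_met=elapsed_ms <= sla_target_ms,  # assumed definition of sla_met
            routing_info=routing_info,
            tiebreak_applied=tiebreak_applied,
        )
    )
```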
Step 12 — side effects
After the response is queued for return, the engine asynchronously increments usage_count, updates last_accessed_at, and may write the result to the recall result cache for popular queries. The HTTP response does not wait for these.