Retrieval Pipeline

Retrieval is how the platform turns a question — a string — into a ranked list of memories. There are several entry points; this page maps each one to what it does, who should reach for it, and what it returns.

The canonical recall endpoint

POST /memory/query is what 99% of traffic uses. It accepts a query string, optional filters, optional ranking weights, an optional session_id for multi-turn context, and a traversal depth. It returns a MemoryQueryResponse with the ranked memories, an inferred query intent, per-memory scoring breakdowns, the SLA target it tried to hit, and routing diagnostics. POST /memory/retrieve is an alias for the same handler.
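As a rough illustration of the request side, the sketch below assembles a body for POST /memory/query from the fields named above. The flat JSON layout, the helper name build_query_body, and the sample session identifier are assumptions made for this example, not a confirmed schema.

```python
import json
from typing import Any, Optional


def build_query_body(
    query: str,
    k: int = 5,
    traversal_depth: int = 2,
    session_id: Optional[str] = None,
    filters: Optional[dict[str, Any]] = None,
    weights: Optional[dict[str, float]] = None,
) -> dict[str, Any]:
    """Assemble a JSON-serialisable body for POST /memory/query (assumed layout)."""
    body: dict[str, Any] = {
        "query": query,
        "k": k,
        "traversal_depth": traversal_depth,
    }
    # Optional fields are left out entirely when the caller does not set them.
    if session_id is not None:
        body["session_id"] = session_id
    if filters:
        body["filters"] = filters
    if weights:
        body["weights"] = weights
    return body


body = build_query_body(
    "What did we decide about the billing migration?",
    session_id="sess-123",  # hypothetical session identifier
    filters={"tags": ["billing", "migration"], "include_summaries": False},
)
print(json.dumps(body, indent=2))
```

Leaving unset optional fields out of the body keeps the server-side defaults in charge rather than duplicating them in the client.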

What the handler validates before it runs

query must be a non-empty string.
k is clamped to [1, 50] (default 5).
traversal_depth is clamped to [1, 3] (default 2).
Authentication is enforced — every request resolves a user_id, a tenant_id, optionally a project_id from X-Project-ID, and an environment from the API key's settings. None of those can be overridden from the body.
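The clamping rules above can be sketched as a small normalisation step. This is a minimal illustration of the documented ranges and defaults, not the handler's actual code; the function names are hypothetical, and whether a whitespace-only query counts as empty is not stated in the source, so only the empty string is rejected here.

```python
from typing import Any


def clamp(value: int, low: int, high: int) -> int:
    """Force value into the closed range [low, high]."""
    return max(low, min(high, value))


def normalise_request(body: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented checks: non-empty query, clamped k and depth."""
    query = body.get("query")
    if not isinstance(query, str) or not query:
        raise ValueError("query must be a non-empty string")
    return {
        "query": query,
        # k: default 5, clamped to [1, 50]
        "k": clamp(int(body.get("k", 5)), 1, 50),
        # traversal_depth: default 2, clamped to [1, 3]
        "traversal_depth": clamp(int(body.get("traversal_depth", 2)), 1, 3),
    }


print(normalise_request({"query": "renewal terms", "k": 500, "traversal_depth": 0}))
```

Because out-of-range values are clamped rather than rejected, a caller asking for k=500 gets at most 50 results rather than an error.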

What the body can shape

Field	Effect
`filters.sources`	Restrict to specific origin labels.
`filters.tags`	OR-match on tags.
`filters.since` / `filters.until`	Time range on `created_at`.
`filters.include_summaries`	When `false`, drops `is_summary=true` rows.
`filters.tier`	Restricts to a single tier.
`weights`	Override the default ranking weights. Values are normalised to sum to 1.
`session_id`	Activates session-continuity boost.
`traversal_depth`	Controls multi-hop graph expansion.
`computation_tier`	Force-pick a latency tier (`low` / `medium` / `high`).
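To make two of the table's rows concrete, the sketch below normalises a weights override so it sums to 1 and applies the filter semantics to in-memory rows. The weight names (similarity, recency), the row fields other than created_at and is_summary, the inclusive time bounds, and the sample source and tier labels are all assumptions for illustration.

```python
from datetime import datetime
from typing import Any


def normalise_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so they sum to 1, preserving their ratios."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: value / total for name, value in weights.items()}


def matches(row: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True when a memory row passes every filter that is set."""
    if "sources" in filters and row["source"] not in filters["sources"]:
        return False
    # Tags are OR-matched: one shared tag is enough.
    if "tags" in filters and not set(filters["tags"]) & set(row["tags"]):
        return False
    # Time range on created_at (bounds assumed inclusive).
    if "since" in filters and row["created_at"] < filters["since"]:
        return False
    if "until" in filters and row["created_at"] > filters["until"]:
        return False
    # include_summaries=False drops rows flagged is_summary.
    if filters.get("include_summaries", True) is False and row["is_summary"]:
        return False
    if "tier" in filters and row["tier"] != filters["tier"]:
        return False
    return True


# Hypothetical rows; the source and tier labels are placeholders.
rows = [
    {"source": "slack", "tags": ["billing"], "created_at": datetime(2024, 3, 1),
     "is_summary": False, "tier": "hot"},
    {"source": "email", "tags": ["billing", "legal"], "created_at": datetime(2024, 5, 1),
     "is_summary": True, "tier": "hot"},
    {"source": "slack", "tags": ["hiring"], "created_at": datetime(2024, 6, 1),
     "is_summary": False, "tier": "cold"},
]

filters = {"tags": ["billing", "legal"], "include_summaries": False}
kept = [row for row in rows if matches(row, filters)]
print(len(kept), normalise_weights({"similarity": 3, "recency": 1}))
```

Since weights are normalised, only their ratios matter: passing 3 and 1 has the same effect as passing 0.75 and 0.25.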

Routed knowledge search

POST /memory/search/routed is the compliance-aware path. It accepts a query and an optional force_intent (factual / analytical / hybrid) and may refuse the query outright if it triggers a sensitivity rule. Use this for end-user-facing surfaces where you want refusals visible to the caller.
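A client for the routed path might look roughly like the sketch below. The request fields query and force_intent come from the description above; the response keys refused, reason, and results are hypothetical, since the refusal format is not specified here, and should be swapped for whatever the endpoint actually returns.

```python
from typing import Any, Optional

ALLOWED_INTENTS = {"factual", "analytical", "hybrid"}


def build_routed_body(query: str, force_intent: Optional[str] = None) -> dict[str, Any]:
    """Assemble a body for POST /memory/search/routed (assumed layout)."""
    if force_intent is not None and force_intent not in ALLOWED_INTENTS:
        raise ValueError(f"force_intent must be one of {sorted(ALLOWED_INTENTS)}")
    body: dict[str, Any] = {"query": query}
    if force_intent is not None:
        body["force_intent"] = force_intent
    return body


def describe_outcome(response: dict[str, Any]) -> str:
    """Turn a routed-search response into a message for the end user.

    The keys 'refused', 'reason', and 'results' are hypothetical.
    """
    if response.get("refused"):
        # Surface the refusal instead of showing an empty result list.
        return f"Query refused: {response.get('reason', 'no reason given')}"
    return f"{len(response.get('results', []))} result(s)"


print(build_routed_body("Who approved the Q3 budget?", force_intent="factual"))
print(describe_outcome({"refused": True, "reason": "sensitivity rule"}))
```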

Synthesise

POST /memory/synthesize chains a recall and a model pass: it pulls the top-k memories for a query and produces a single coherent narrative or answer instead of a ranked list. It is intended for executive summaries and FAQ-style replies. It is cost-gated and slower than raw recall.
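Conceptually, the recall-then-model chain can be sketched as below. This is not the platform's implementation: recall and generate are stand-in callables supplied by the caller, and the prompt layout is an assumption chosen to keep the example readable.

```python
from typing import Callable


def synthesize(
    query: str,
    recall: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 5,
) -> str:
    """Pull the top-k memories for a query, then ask a model for one answer."""
    memories = recall(query, k)
    context = "\n".join(f"- {memory}" for memory in memories)
    prompt = f"Question: {query}\nMemories:\n{context}\nAnswer:"
    return generate(prompt)


# Stand-ins so the sketch runs without a network call or a model.
def fake_recall(query: str, k: int) -> list[str]:
    store = ["Billing moves to the new provider in May.", "Legal signed off on the contract."]
    return store[:k]


def fake_generate(prompt: str) -> str:
    used = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"Answer drawn from {used} memories."


print(synthesize("What is the billing plan?", fake_recall, fake_generate))
```

The second step is where the extra latency and cost come from, which is consistent with this path being slower than raw recall.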

Graph and cluster endpoints

GET /memory/graph — returns nodes and typed edges for the relationship graph. Supports scope=user|project|tenant and a user_id filter. Used by the dashboard's graph visualiser.
GET /memory/clusters — returns semantic clusters and their member counts. Useful for "what themes are in my memories?" panels.
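A request URL for the graph endpoint could be assembled as in the sketch below, using only the scope and user_id parameters mentioned above. The base URL and the helper name graph_url are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_SCOPES = ("user", "project", "tenant")


def graph_url(base_url: str, scope: str, user_id: Optional[str] = None) -> str:
    """Build the GET /memory/graph URL with its query parameters."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}")
    params = {"scope": scope}
    if user_id is not None:
        params["user_id"] = user_id
    return f"{base_url.rstrip('/')}/memory/graph?{urlencode(params)}"


# api.example.com is a placeholder host.
print(graph_url("https://api.example.com/", "project", user_id="u-42"))
```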

When to use which

Conversational agent → POST /memory/query with session_id.
End-user search box → POST /memory/search/routed so refusals propagate.
Briefing or report generation → POST /memory/synthesize.
Knowledge-graph UI → GET /memory/graph + GET /memory/clusters.
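The guidance above can be captured as a small lookup table in client code, so that each product surface picks its endpoint in one place. The surface keys are hypothetical labels; the endpoints are the ones listed above.

```python
ENDPOINTS_BY_SURFACE: dict[str, list[str]] = {
    "conversational_agent": ["POST /memory/query"],  # send session_id with each turn
    "end_user_search": ["POST /memory/search/routed"],  # refusals reach the caller
    "briefing": ["POST /memory/synthesize"],
    "knowledge_graph_ui": ["GET /memory/graph", "GET /memory/clusters"],
}


def endpoints_for(surface: str) -> list[str]:
    """Look up which endpoint(s) a given surface should call."""
    try:
        return ENDPOINTS_BY_SURFACE[surface]
    except KeyError:
        raise ValueError(f"unknown surface: {surface!r}") from None


print(endpoints_for("knowledge_graph_ui"))
```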
