
Metadata & Semantic Fields

Metadata on a memory falls into two camps: fields the caller chooses and fields the platform owns. This page lists every field in both camps, what each one is used for downstream, and which ones are safe to leave unset.

Fields the caller chooses at write time

Field	Description
`metadata`	Free-form JSON object. Common keys seen in real traffic: `file_type`, `url`, `author`, `department`. No schema is enforced; the only limit is total payload size.
`tags`	Array of strings. Used as an OR filter at recall time: `tags=["budget", "urgent"]` matches a memory that carries either label.
`source`	String. Origin label such as `email`, `slack`, `web`, `api`, `upload`.
`event_type`	String. Trigger label such as `meeting`, `decision`, `status_update`.
`importance`	Float in [0, 1]. Defaults to `0.5` if omitted. Drives the importance factor in the ranker.
`ttl_minutes`	Integer. When provided, the platform sets `expires_at = now + ttl_minutes` and accelerates the retention timeline.
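
The rules in the table above can be sketched client-side. This is a minimal illustration, not the platform's implementation: the helper names `build_write_payload`, `expected_expiry`, and `matches_any_tag` are hypothetical, and the sketch assumes the memory body is sent under a `text` key alongside the caller-chosen fields.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def build_write_payload(
    text: str,
    importance: Optional[float] = None,
    ttl_minutes: Optional[int] = None,
    tags: Optional[list[str]] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a request body from caller-chosen fields (hypothetical helper)."""
    if importance is None:
        importance = 0.5  # documented default when the field is omitted
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be in [0, 1]")
    payload: dict[str, Any] = {"text": text, "importance": importance}
    if tags:
        payload["tags"] = list(tags)
    if metadata:
        payload["metadata"] = dict(metadata)  # free-form, no schema enforced
    if ttl_minutes is not None:
        payload["ttl_minutes"] = int(ttl_minutes)
    return payload


def expected_expiry(now: datetime, ttl_minutes: int) -> datetime:
    """Mirror of expires_at = now + ttl_minutes; the platform computes the real value."""
    return now + timedelta(minutes=ttl_minutes)


def matches_any_tag(memory_tags: list[str], filter_tags: list[str]) -> bool:
    """OR semantics: a single shared label is enough to match."""
    return bool(set(memory_tags) & set(filter_tags))


# Usage: a memory that should expire roughly 90 minutes after it is written.
payload = build_write_payload("Q3 budget approved", tags=["budget"], ttl_minutes=90)
print(payload)
print(expected_expiry(datetime.now(timezone.utc), 90))
print(matches_any_tag(payload["tags"], ["budget", "urgent"]))
```

Validating `importance` before sending keeps out-of-range values from reaching the API; how the server itself reacts to them is not covered on this page.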

Semantic fields the platform fills in

key_points — list of concrete takeaways extracted by the analysis pass.
insights — higher-order observations layered on top of key points.
entities — named entities surfaced from the text.
topics — semantic topic labels.
context — short paragraph of why this memory matters.
memory_type — one of fact | event | decision | insight | narrative | analysis | reference.

Extraction identity

The extraction pipeline classifies the candidate into a category and produces a canonical key for it. Both are stored on the row and become first-class metadata.

Field	What it carries	Used by
`extracted_type`	Semantic category — `fact`, `event`, `decision`, `preference`, `temporal`, …	Hard filtering at recall time and clustering for summarization.
`extracted_key`	Canonical dedup key, e.g. `person:alice/role:cto`.	Idempotency — re-asserting the same fact updates the row instead of creating duplicates.
`extraction_attempt_id`	Link to the extraction run.	Audit, debugging, and re-evaluation triggers.
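
To make the idempotency behaviour concrete, here is a toy in-memory sketch of an upsert keyed on `extracted_key`. The `Row` and `MemoryStore` classes are hypothetical stand-ins for illustration; the platform's actual storage and merge logic are not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Row:
    extracted_type: str
    extracted_key: str
    content: str
    extraction_attempt_id: str


class MemoryStore:
    """Toy store keyed by extracted_key (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._rows: dict[str, Row] = {}

    def upsert(self, row: Row) -> bool:
        """Insert or overwrite by canonical key; return True if a new row was created."""
        created = row.extracted_key not in self._rows
        self._rows[row.extracted_key] = row
        return created

    def by_type(self, extracted_type: str) -> list[Row]:
        """Hard filter on the semantic category."""
        return [r for r in self._rows.values() if r.extracted_type == extracted_type]

    def __len__(self) -> int:
        return len(self._rows)


store = MemoryStore()
store.upsert(Row("fact", "person:alice/role:cto", "Alice is CTO.", "attempt-1"))
# Re-asserting the same fact reuses the key, so the row is updated, not duplicated.
store.upsert(Row("fact", "person:alice/role:cto", "Alice is the CTO.", "attempt-2"))
print(len(store))
```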

Platform-owned scoring fields

importance_score — deterministic 0–100 score from the gating pass.
quality_score / confidence_score / freshness_score — all in [0, 1].
decay_score — defaults to 1.0; decays exponentially without recall hits.
consistency_score — defaults to 1.0; drops as contradictions accumulate.
evidence_count — number of supports/extends edges.
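
As a rough illustration of what exponential decay means for decay_score, the sketch below uses a half-life formulation. The 30-day half-life, the function name, and the idea that the clock is measured from the last recall hit are all assumptions made for this example; the page does not state the platform's actual decay rate or reset behaviour.

```python
import math


def decay_score(days_since_last_recall: float, half_life_days: float = 30.0) -> float:
    """Exponential decay starting at 1.0; half_life_days is a hypothetical parameter."""
    if days_since_last_recall <= 0:
        return 1.0  # documented default for a fresh or just-recalled memory
    return math.exp(-math.log(2) * days_since_last_recall / half_life_days)


for days in (0, 30, 60, 90):
    print(days, round(decay_score(days), 3))
```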

Tier and importance interact at recall time

tier selects which slice of memories the index searches at all: hot for active memories, warm for occasionally accessed ones, and cold for archived ones. importance only affects ranking within the chosen tier; it does not promote a cold memory back into the hot search. To do that, raise the row's tier explicitly.
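
The interaction can be pictured as a filter followed by a sort. This is a simplified sketch: the `Memory` class and `recall` function are hypothetical, and the real ranker combines importance with other factors that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    id: str
    tier: str  # "hot", "warm", or "cold"
    importance: float


def recall(memories: list[Memory], tier: str = "hot", limit: int = 10) -> list[Memory]:
    """Restrict the search to one tier first, then rank by importance within it."""
    candidates = [m for m in memories if m.tier == tier]
    return sorted(candidates, key=lambda m: m.importance, reverse=True)[:limit]


memories = [
    Memory("a", "hot", 0.4),
    Memory("b", "cold", 1.0),  # maximal importance, but outside the hot tier
    Memory("c", "hot", 0.8),
]
print([m.id for m in recall(memories, tier="hot")])
```

Memory `b` never appears in the hot results despite its importance; it would only show up after its tier is raised.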

What the API will and will not let you set

Allowed: metadata, tags, source, event_type, importance, ttl_minutes, content_type.
Ignored if you send them: extracted_type, extracted_key, quality_score, decay_score, retention_status, vector_id. The platform owns these.
Forbidden: user_id, tenant_id, project_id in the body. Project comes from the X-Project-ID header; the rest come from auth.
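
A client can apply the same three-way split before sending a request. The sketch below is an assumption-laden illustration: `sanitize_body` is a hypothetical helper, `text` is included as an accepted key because it is the one required field, forbidden keys are treated as a hard error, and keys that appear in none of the lists are silently dropped (the page does not say how the API treats them).

```python
from typing import Any

ALLOWED = {
    "text", "metadata", "tags", "source", "event_type",
    "importance", "ttl_minutes", "content_type",
}
PLATFORM_OWNED = {
    "extracted_type", "extracted_key", "quality_score",
    "decay_score", "retention_status", "vector_id",
}
FORBIDDEN = {"user_id", "tenant_id", "project_id"}


def sanitize_body(body: dict[str, Any]) -> dict[str, Any]:
    """Reject forbidden keys, drop platform-owned (and unknown) keys, keep the rest."""
    bad = FORBIDDEN.intersection(body)
    if bad:
        raise ValueError(f"identity fields must not be sent in the body: {sorted(bad)}")
    return {k: v for k, v in body.items() if k in ALLOWED}


def build_headers(project_id: str) -> dict[str, str]:
    """The project travels in the X-Project-ID header, never in the body."""
    return {"X-Project-ID": project_id}


body = sanitize_body({"text": "Launch moved to May.", "tags": ["launch"], "decay_score": 0.2})
print(body)
print(build_headers("proj-123"))  # "proj-123" is a placeholder value
```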

Which fields are safe to omit

All of them, except text. The system fills in defaults for importance, environment, tier, retention_status, and every scoring field. Setting metadata you do not need just bloats the payload — leave it blank and the platform will keep the row at its defaults.
