How It Works

Summarization Pipeline

Summaries condense groups of related memories — same person, same topic, same kind of fact — into a single derived row. They keep recall fast as the memory count grows and let the API return one paragraph instead of dozens of rows when that is what the caller needs. This page describes when summaries are produced, what counts as “related”, and how the link to the originals is preserved.

What counts as a summary in this system

A summary is a memory row with is_summary=true and parent_id set to a list of source memory ids. Its summary_ciphertext column holds the encrypted summary text (its text_ciphertext column is null). It participates in recall like any other memory — it has a vector, a tier, a retention state, and an importance.
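
To make the row shape concrete, here is a minimal sketch of a summary row and a check for it. The column names is_summary, parent_id, text_ciphertext and summary_ciphertext come from the description above; the MemoryRow class, the id field and the is_valid_summary helper are hypothetical names used only for illustration, not the system's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryRow:
    # Hypothetical in-memory view of a memory row; only the four
    # column names below are taken from the text.
    id: str
    is_summary: bool = False
    parent_id: List[str] = field(default_factory=list)  # source memory ids
    text_ciphertext: Optional[bytes] = None
    summary_ciphertext: Optional[bytes] = None


def is_valid_summary(row: MemoryRow) -> bool:
    """True when the row matches the summary shape described above."""
    return (
        row.is_summary
        and len(row.parent_id) > 0
        and row.summary_ciphertext is not None
        and row.text_ciphertext is None
    )
```

A raw memory would carry text_ciphertext and an empty parent_id, so the same check rejects it.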

The two ways a summary gets produced

Path	Trigger	Cluster shape
Incremental (inline)	The moment a cluster reaches 3 unsummarised memories the summary is generated during the write that crossed the threshold — no waiting for a periodic tick. Subsequent writes that grow the cluster refresh the existing summary in place.	Minimum 3 members. No upper cap — clusters keep growing as more matching facts arrive.
Periodic worker (defense in depth)	If the inline path failed for any reason, a background worker picks up clusters that have at least 3 unsummarised members and produces the summary on its next tick. This is a safety net, not the primary path.	Same threshold — minimum 3 members.

Why 3?

SUMMARY_MIN_CLUSTER_ITEMS=3. Below three facts about the same thing the cluster is treated as sparse — recall returns the raw memories directly. Three is the smallest count that produces a summary the model can generalise from without inventing detail. The threshold is configurable per deployment.
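
The threshold check itself is small. The sketch below assumes the setting is read from an environment-style mapping, which the text does not confirm; only the name SUMMARY_MIN_CLUSTER_ITEMS and its default of 3 come from this page. The function names are hypothetical.

```python
import os
from typing import Mapping


def min_cluster_items(env: Mapping[str, str] = os.environ) -> int:
    # Assumption: the per-deployment override arrives as an environment variable.
    return int(env.get("SUMMARY_MIN_CLUSTER_ITEMS", "3"))


def should_summarise(unsummarised_count: int, threshold: int) -> bool:
    """A cluster below the threshold is sparse and is served as raw memories."""
    return unsummarised_count >= threshold
```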

How clustering works — same fact, same person, same project

A cluster is keyed by (user_id, project_id, environment, extracted_type, extracted_key). In plain language: same user, same project, same environment, same kind of fact, same subject. So three preferences about Alice's UI theme form one cluster; three preferences about Alice's email cadence form a different cluster; preferences about Bob's UI theme are a third.
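
As an illustration of the key, the sketch below groups memory ids by the five-part tuple. The field names are the ones listed above; the sample values, the ClusterKey class and the group_by_key helper are hypothetical, and treating Alice and Bob as user_id values is an assumption made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ClusterKey(NamedTuple):
    user_id: str
    project_id: str
    environment: str
    extracted_type: str
    extracted_key: str


def group_by_key(
    memories: Iterable[Tuple[str, ClusterKey]],
) -> Dict[ClusterKey, List[str]]:
    """Group memory ids by cluster key, preserving arrival order."""
    clusters: Dict[ClusterKey, List[str]] = defaultdict(list)
    for memory_id, key in memories:
        clusters[key].append(memory_id)
    return dict(clusters)


alice_theme = ClusterKey("alice", "p1", "prod", "preference", "ui_theme")
alice_email = ClusterKey("alice", "p1", "prod", "preference", "email_cadence")
bob_theme = ClusterKey("bob", "p1", "prod", "preference", "ui_theme")

clusters = group_by_key([
    ("m1", alice_theme),
    ("m2", alice_theme),
    ("m3", alice_email),
    ("m4", bob_theme),
    ("m5", alice_theme),
])
```

Changing any one component of the tuple yields a different cluster, which is why the three examples in the paragraph above never merge.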

1. On every memory write, the ingestion path extracts the extracted_type (e.g. preference, fact, relationship) and the extracted_key (the subject: a person, a topic, an entity).
2. The new memory is attached to its cluster via nearest-centroid matching against the existing centroids in scope. If the cosine distance to even the nearest centroid is too large, a new cluster is opened.
3. When the cluster's count of unsummarised members reaches 3, the inline summariser is triggered for that cluster only. Other clusters are unaffected.
4. Subsequent writes that land in the same cluster grow the membership; the summary is refreshed in place to cover the new union of source memory ids.
5. When source memories are deleted, the refreshed summary drops them, so deleted facts never resurface in the summary.
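
The nearest-centroid step in the list above can be sketched as follows. This is an illustration under stated assumptions, not the production matcher: the distance cutoff MAX_COSINE_DISTANCE is a made-up value (the page does not state one), the function names are hypothetical, and vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence

MAX_COSINE_DISTANCE = 0.25  # hypothetical cutoff; the real value is not documented here


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def assign_cluster(
    vector: Sequence[float],
    centroids: List[List[float]],
    max_distance: float = MAX_COSINE_DISTANCE,
) -> int:
    """Return the index of the nearest centroid, opening a new cluster if none is close enough."""
    best_idx, best_dist = -1, float("inf")
    for i, centroid in enumerate(centroids):
        dist = cosine_distance(vector, centroid)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    if best_idx == -1 or best_dist > max_distance:
        centroids.append(list(vector))  # new cluster seeded with this vector
        return len(centroids) - 1
    return best_idx
```

The sketch leaves the centroid unchanged when a member joins; whether the real system recomputes centroids on each write is not stated on this page.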

What the summariser does with a cluster

Decrypts the source rows and assembles the prompt with strict token-budget enforcement — clusters that would exceed the budget are split.
Generates the summary text.
Embeds the summary just like any memory (same pipeline, same version stamp).
Encrypts and writes the summary row with is_summary=true, parent_id = the cluster's source ids.
Re-scores the summary's quality through the semantic engine; low-quality summaries are discarded and the cluster is left alone.
Per-tenant guards: cooldown of 2 minutes between scheduler runs for the same tenant, max 20 summary jobs per user+project per scheduler run, max 3 concurrent jobs per tenant, and a configurable monthly cost ceiling that fails over to the raw-memory fallback when exhausted.
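
The per-tenant guards can be read as a small admission check. The sketch below uses the limits quoted above (2-minute cooldown, 20 jobs per user+project per run, 3 concurrent jobs per tenant); the TenantState fields and function names are hypothetical, and how the per-run cap and the concurrency cap combine is an assumption.

```python
from dataclasses import dataclass

COOLDOWN_SECONDS = 120                   # 2 minutes between scheduler runs per tenant
MAX_JOBS_PER_USER_PROJECT_PER_RUN = 20
MAX_CONCURRENT_JOBS_PER_TENANT = 3


@dataclass
class TenantState:
    # Hypothetical bookkeeping; field names are illustrative.
    last_run_at: float      # epoch seconds of the previous scheduler run
    running_jobs: int       # summary jobs currently in flight for the tenant
    month_spend: float      # cost accrued this month
    month_ceiling: float    # configurable monthly cost ceiling


def can_run_scheduler(state: TenantState, now: float) -> bool:
    """Enforce the cooldown between scheduler runs for one tenant."""
    return now - state.last_run_at >= COOLDOWN_SECONDS


def jobs_allowed(state: TenantState, queued_for_user_project: int) -> int:
    """How many queued jobs for one user+project may start right now."""
    if state.month_spend >= state.month_ceiling:
        return 0  # ceiling exhausted: recall falls back to raw memories
    free_slots = max(0, MAX_CONCURRENT_JOBS_PER_TENANT - state.running_jobs)
    return min(queued_for_user_project, MAX_JOBS_PER_USER_PROJECT_PER_RUN, free_slots)
```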

Refresh, not rewrite — how summaries stay current

Once a cluster has a summary, the system never abandons it just because new members arrived. As the membership grows, the existing summary is refreshed in place: the union of currently-active source memory ids is recomputed, prior source ids that have been deleted are dropped, and a new summary is generated to cover the updated union.

If the union is identical to what the existing summary already covers (nothing changed), the refresh is a no-op — the same summary stays.
If the union shrinks below the minimum threshold (because too many sources were deleted), the summary itself is deleted and recall falls back to the surviving raw memories.
Source memories are not demoted or hidden when a summary is created — they stay alive and queryable. The summary is an additional row, not a replacement.
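
The three refresh outcomes described in this section (no-op, delete, regenerate) reduce to set arithmetic on source ids. The sketch below is one way to express that decision; the function name, the action labels and the argument names are hypothetical.

```python
from typing import Set, Tuple

MIN_CLUSTER_ITEMS = 3


def plan_refresh(
    covered: Set[str],       # source ids the existing summary already covers
    new_members: Set[str],   # ids that joined the cluster since the last refresh
    deleted: Set[str],       # ids that have been deleted
    min_items: int = MIN_CLUSTER_ITEMS,
) -> Tuple[str, Set[str]]:
    """Return (action, union) where action is 'delete', 'noop' or 'regenerate'."""
    union = (covered | new_members) - deleted
    if len(union) < min_items:
        return "delete", union      # too few sources left: drop the summary
    if union == covered:
        return "noop", union        # nothing changed: keep the same summary
    return "regenerate", union      # cover the updated union
```

For example, a summary covering {a, b, c} that gains d and loses a is regenerated over {b, c, d}, while the same summary losing a with no new members is deleted because only two sources remain.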

How recall uses summaries

Summaries are eligible for recall just like any memory.
Their importance is propagated from the cluster — a summary of three highly-important memories scores higher at recall than a summary of three low-importance ones.
When a summary returns, the response carries the source memory ids so the caller can re-expand the cluster with a follow-up read if needed.
When the cluster has fewer than 3 members (the “sparse” case), no summary exists and recall returns the raw memories directly — no synthesis, no risk of fabrication.
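
A rough sketch of the recall behaviour above: return the summary together with its source ids when one exists, otherwise return the raw memories. The response shape, the function names and the use of a plain mean for importance propagation are all assumptions; the page only states that importance is propagated from the cluster, not how.

```python
from typing import Dict, List, Optional

MIN_CLUSTER_ITEMS = 3


def propagated_importance(source_importances: List[float]) -> float:
    # Assumption: a simple mean. The actual aggregation is not documented here.
    return sum(source_importances) / len(source_importances)


def recall_cluster(raw_memories: List[Dict], summary: Optional[Dict]) -> Dict:
    """Sparse clusters return raw memories; summarised clusters return the summary plus source ids."""
    if summary is None or len(raw_memories) < MIN_CLUSTER_ITEMS:
        return {"kind": "raw", "memories": raw_memories}
    return {
        "kind": "summary",
        "text": summary["text"],
        "source_ids": list(summary["parent_id"]),  # lets the caller re-expand the cluster
    }
```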

What to watch when tuning the summariser

Sparse-cluster ratio — clusters that never reach 3 members produce no summary. If most clusters are sparse, your extracted_key extraction is too granular; loosen it.
Refresh churn — if the same cluster is being refreshed many times per hour, members are being added in tiny batches. Consider requiring more new members to accumulate before a refresh is triggered.
Token spend per summary — climbs when clusters grow large. Each tenant has a configurable monthly cost ceiling; once it is exhausted, clusters fall back to raw-memory recall instead of being summarised.
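
The sparse-cluster ratio from the list above is straightforward to compute from cluster sizes. The sketch below is illustrative only; the function name is hypothetical and no particular alerting threshold is implied.

```python
from typing import Sequence

MIN_CLUSTER_ITEMS = 3


def sparse_cluster_ratio(
    cluster_sizes: Sequence[int], min_items: int = MIN_CLUSTER_ITEMS
) -> float:
    """Fraction of clusters that have fewer members than the summary threshold."""
    if not cluster_sizes:
        return 0.0
    sparse = sum(1 for size in cluster_sizes if size < min_items)
    return sparse / len(cluster_sizes)
```

With cluster sizes of 1, 2, 3 and 5, two of the four clusters are sparse, giving a ratio of 0.5.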
