
Re-evaluation Engine

Memory importance is not static. As usage patterns change, recall frequency shifts, and new related memories are added, a memory’s value to the system evolves. The re-evaluation engine periodically re-scores memories to keep importance scores accurate and promote previously gated content that has become valuable.

Why Re-evaluation Exists

At ingestion time, the importance scorer makes its best estimate of a memory’s value. But this initial score can become stale for several reasons:

Recall feedback. A memory that is frequently recalled is clearly valuable, but the initial score was computed before any recall data existed.
New context. When related memories are added later, an isolated memory may become part of a valuable cluster.
Usage decay. A memory that was important six months ago but hasn’t been recalled since may need a lower score.
Threshold changes. When an admin adjusts the gating threshold, previously gated memories may now qualify for embedding.

The re-evaluation engine closes these feedback gaps by re-running the importance scorer on existing memories and acting on the results.

Feedback Loop Architecture

The re-evaluation engine forms a closed feedback loop between recall and ingestion:

1. Recall logs accumulate. Every time a memory is recalled via POST /memory/query, the recall engine increments the memory’s usage count and updates its last-accessed timestamp.
2. Re-evaluation triggers. The engine identifies memories whose last evaluation is older than the configured sweep interval (default: 7 days).
3. Re-scoring. Each candidate memory is passed through the importance scorer again, with current recall metrics as input. The new score replaces the old one.
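4. Promotion or demotion. If a gated memory’s new score rises above the gating threshold, the gatekeeper is notified to dispatch it for embedding. A memory that is already active and falls below the threshold is not re-gated; its lower score only reduces its recall ranking (see Score Delta Handling).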

Trigger Conditions

Re-evaluation fires under three conditions:

Trigger	Frequency	Description
Time-based sweep	Periodic	Background workers scan for memories whose last evaluation is older than the sweep interval. This is the primary trigger.
Recall frequency spike	Event-driven	When a memory’s usage count crosses a threshold since its last evaluation, it is flagged for immediate re-evaluation.
Manual trigger	On-demand	Admins can trigger re-evaluation for a specific project or memory set via the API.
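
The three triggers in the table can be sketched as a single decision function. This is an illustrative sketch, not the engine's actual code: `MemoryState`, `reevaluation_trigger`, and the spike threshold of 10 recalls are hypothetical. Only the 7-day default sweep interval comes from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SWEEP_INTERVAL = timedelta(days=7)  # default sweep interval stated on this page
RECALL_SPIKE_THRESHOLD = 10         # hypothetical value; the page gives no number


@dataclass
class MemoryState:
    """Hypothetical view of the fields the triggers need."""
    last_evaluated_at: datetime
    recalls_since_last_eval: int


def reevaluation_trigger(
    state: MemoryState, now: datetime, manual: bool = False
) -> Optional[str]:
    """Return the name of the trigger that fires, or None if none applies."""
    if manual:
        # On-demand request from an admin always wins.
        return "manual"
    if state.recalls_since_last_eval >= RECALL_SPIKE_THRESHOLD:
        # Event-driven: usage count crossed the threshold since last evaluation.
        return "recall_spike"
    if now - state.last_evaluated_at > SWEEP_INTERVAL:
        # Periodic: last evaluation is older than the sweep interval.
        return "time_sweep"
    return None
```

The order of the checks (manual, then spike, then sweep) is an assumption made so that a single memory reports one reason for re-evaluation.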

Batch Processing

The re-evaluation engine processes memories in configurable batches to control resource usage:

Batch size. Each processing cycle handles a fixed number of memories (configurable per project). This prevents a single re-evaluation sweep from monopolizing compute.
Priority ordering. Within a batch, memories are ordered by staleness: those evaluated longest ago are processed first.
Progress tracking. The engine tracks which memories have been re-evaluated in the current sweep so it can resume from where it left off if interrupted.
Rate limiting. Re-evaluation batches are spaced with configurable intervals to avoid overwhelming the scoring pipeline during peak ingestion.
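
A minimal sketch of how these four properties might fit together, assuming an in-memory map of memory IDs to last-evaluation times. The names `next_batch` and `run_sweep` are hypothetical, as is the `done` set used for progress tracking. A real engine would persist that progress.

```python
import time
from datetime import datetime
from typing import Callable, Dict, List, Set


def next_batch(
    last_evaluated: Dict[str, datetime], done: Set[str], batch_size: int
) -> List[str]:
    """Pick up to batch_size pending memory IDs, stalest first."""
    pending = [mid for mid in last_evaluated if mid not in done]
    pending.sort(key=lambda mid: last_evaluated[mid])  # oldest evaluation first
    return pending[:batch_size]


def run_sweep(
    last_evaluated: Dict[str, datetime],
    done: Set[str],
    batch_size: int,
    process: Callable[[List[str]], None],
    pause_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process all pending memories in batches; return the number of batches run."""
    batches = 0
    while True:
        batch = next_batch(last_evaluated, done, batch_size)
        if not batch:
            return batches
        process(batch)
        done.update(batch)    # progress tracking: a restart skips these IDs
        batches += 1
        sleep(pause_seconds)  # rate limiting between batches
```

Because `done` is only updated after `process` returns, an interruption in the middle of a batch leaves that batch pending, and calling `run_sweep` again with the same `done` set resumes from it.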

Score Delta Handling

When a memory’s re-evaluated score differs from its current score, the engine takes specific actions based on the delta:

Scenario	Action
Score increases above gating threshold	Memory is promoted: dispatched for embedding and added to the vector index. Status changes from `gated` to `active`.
Score decreases but stays above threshold	Score is updated. No embedding or indexing changes. Memory retains its position in recall rankings with the new score.
Score decreases below gating threshold	Score is updated but the memory is not retroactively gated. Once embedded, a memory stays in the index. The lower score naturally deprioritizes it in recall.
Score unchanged	Only the evaluation timestamp is updated. No other side effects.

Design decision

Active memories are never retroactively gated because removing a memory from the vector index could break recall chains that depend on it. Demotion happens naturally through lower ranking scores.
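
The table and the design decision above can be condensed into one small function. This is a sketch under assumptions: the `Action` enum and `resolve_action` are hypothetical names, and treating a score exactly equal to the threshold as still gated is a guess, since the page only says "above".

```python
from enum import Enum


class Action(Enum):
    PROMOTE = "promote"                  # embed, index, gated -> active
    UPDATE_SCORE = "update_score"        # store new score, no index changes
    TOUCH_TIMESTAMP = "touch_timestamp"  # only the evaluation timestamp changes


def resolve_action(
    status: str, old_score: float, new_score: float, threshold: float
) -> Action:
    """Map a re-evaluation result to the action described in the table."""
    if new_score == old_score:
        return Action.TOUCH_TIMESTAMP
    if status == "gated" and new_score > threshold:
        return Action.PROMOTE
    # Covers every other change, including an active memory falling below
    # the threshold: the score is updated but the memory is never re-gated.
    return Action.UPDATE_SCORE
```

Note that there is deliberately no "demote" action: an active memory whose score drops below the threshold falls through to `UPDATE_SCORE`.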

Idempotency & Safety

The re-evaluation engine is designed to be safe under concurrent execution and restarts:

Idempotent scoring. Running the importance scorer twice on the same memory with the same inputs produces the same result. There are no side effects from duplicate evaluations.
Atomic updates. Score updates and evaluation timestamps are written in a single database transaction. A crash mid-batch leaves the remaining unprocessed memories with their previous evaluation timestamp, so they are picked up in the next sweep.
Conflict-free concurrency. Multiple workers can process different batches simultaneously without interfering, because each batch selects distinct memory IDs.
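
To illustrate the atomic update, here is a sketch using Python's built-in `sqlite3` module. The `memories` table, its column names, and the `apply_score` helper are assumptions made for the example. The page does not describe the actual schema or database.

```python
import sqlite3


def apply_score(
    conn: sqlite3.Connection, memory_id: str, new_score: float, evaluated_at: str
) -> None:
    """Write the new score and evaluation timestamp together."""
    # Using the connection as a context manager commits on success and
    # rolls back if an exception is raised, so both columns change or neither.
    with conn:
        conn.execute(
            "UPDATE memories SET score = ?, last_evaluated_at = ? WHERE id = ?",
            (new_score, evaluated_at, memory_id),
        )
```

If a worker crashes before calling `apply_score` for a memory, that row keeps its old `last_evaluated_at` value, which is what makes it eligible again in the next sweep.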

Observability

Every re-evaluation batch emits structured metrics for monitoring:

Batch metrics. Total memories processed, average score delta, number of promotions, and processing duration per batch.
Promotion events. Each promotion (gated → active) generates an audit log entry with the old score, new score, and the reason for re-evaluation.
Sweep progress. Per-project dashboards show how many memories are pending re-evaluation and the estimated time to complete the current sweep.

These metrics help you understand whether your scoring thresholds are well-calibrated and whether the re-evaluation frequency is appropriate for your usage patterns.
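
As a rough illustration of the batch metrics listed above, the following sketch aggregates per-memory results into one record. `BatchMetrics` and `summarize_batch` are hypothetical names, and the average is computed as a signed delta (new minus old), which is an assumption. The engine may report an absolute delta instead.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BatchMetrics:
    processed: int
    avg_score_delta: float
    promotions: int
    duration_seconds: float


def summarize_batch(
    results: List[Tuple[float, float, bool]], duration_seconds: float
) -> BatchMetrics:
    """Summarize (old_score, new_score, promoted) tuples for one batch."""
    deltas = [new - old for old, new, _ in results]
    avg_delta = sum(deltas) / len(deltas) if deltas else 0.0
    promotions = sum(1 for _, _, promoted in results if promoted)
    return BatchMetrics(len(results), avg_delta, promotions, duration_seconds)
```

A consistently negative average delta with few promotions might suggest that initial scores run high, while frequent promotions might suggest the gating threshold is too strict for the project.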
