Re-evaluation Engine
Memory importance is not static. As usage patterns change, recall frequency shifts, and new related memories are added, a memory’s value to the system evolves. The re-evaluation engine periodically re-scores memories to keep importance scores accurate and promote previously gated content that has become valuable.
Why Re-evaluation Exists
At ingestion time, the importance scorer makes its best estimate of a memory’s value. But this initial score can become stale for several reasons:
- Recall feedback. A memory that is frequently recalled is clearly valuable, but its original score was computed before any recall data existed.
- New context. When related memories are added later, an isolated memory may become part of a valuable cluster.
- Usage decay. A memory that was important six months ago but hasn’t been recalled since may need a lower score.
- Threshold changes. When an admin adjusts the gating threshold, previously gated memories may now qualify for embedding.
The re-evaluation engine closes these feedback gaps by re-running the importance scorer on existing memories and acting on the results.
Feedback Loop Architecture
The re-evaluation engine forms a closed feedback loop between recall and ingestion:
1. Recall logs accumulate. Every time a memory is recalled via POST /memory/query, the recall engine increments the memory’s usage count and updates its last-accessed timestamp.
2. Re-evaluation triggers. The engine identifies memories whose last evaluation is older than the configured sweep interval (default: 7 days).
3. Re-scoring. Each candidate memory is passed through the importance scorer again, with current recall metrics as input. The new score replaces the old one.
4. Promotion or demotion. If a gated memory’s new score rises above the gating threshold, the gatekeeper is notified to dispatch it for embedding. A score that falls below the threshold lowers the memory’s recall ranking, but an already-embedded memory is not removed from the index.
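The four steps above can be sketched as a single pass over one memory. This is a minimal Python sketch, not the engine’s implementation: the `Memory` fields, the 0.5 gating threshold, the log-scaled recall boost, and the 90-day decay half-life are all hypothetical, and `dispatch` stands in for whatever call notifies the gatekeeper. Only the 7-day sweep interval comes from the text.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

SWEEP_INTERVAL = timedelta(days=7)  # default sweep interval from the text
GATING_THRESHOLD = 0.5              # hypothetical threshold value
HALF_LIFE_DAYS = 90.0               # hypothetical usage-decay half-life


@dataclass
class Memory:
    id: str
    base_score: float        # hypothetical: ingestion-time score, never overwritten
    score: float             # current importance score
    status: str              # "gated" or "active"
    usage_count: int
    last_accessed: datetime  # assumed to start at ingestion time
    last_evaluated: datetime


def record_recall(memory: Memory, now: datetime) -> None:
    """Step 1: bookkeeping the recall engine performs on each query hit."""
    memory.usage_count += 1
    memory.last_accessed = now


def is_due(memory: Memory, now: datetime) -> bool:
    """Step 2: due once the last evaluation is older than the sweep interval."""
    return now - memory.last_evaluated >= SWEEP_INTERVAL


def rescore(memory: Memory, now: datetime) -> float:
    """Step 3: hypothetical scorer, deterministic in its inputs."""
    recall_boost = 0.1 * math.log1p(memory.usage_count)
    idle_days = (now - memory.last_accessed).total_seconds() / 86400
    decay = 0.5 ** (idle_days / HALF_LIFE_DAYS)  # usage decay while not recalled
    return min(1.0, (memory.base_score + recall_boost) * decay)


def reevaluate(memory: Memory, now: datetime, dispatch: Callable[[Memory], None]) -> None:
    """Steps 3 and 4: replace the score, then promote a gated memory that now qualifies."""
    memory.score = rescore(memory, now)
    memory.last_evaluated = now
    if memory.status == "gated" and memory.score >= GATING_THRESHOLD:
        memory.status = "active"
        dispatch(memory)  # stand-in for notifying the gatekeeper
    # A drop below the threshold only changes `score`; status never returns to "gated".
```

The scorer reads the ingestion-time `base_score` rather than the current `score`, so running it twice with the same recall metrics yields the same value, which keeps repeated evaluations free of compounding drift.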
Trigger Conditions
Re-evaluation fires under three conditions:
| Trigger | Frequency | Description |
|---|---|---|
| Time-based sweep | Periodic | Background workers scan for memories whose last evaluation is older than the sweep interval. This is the primary trigger. |
| Recall frequency spike | Event-driven | When the number of recalls since a memory’s last evaluation crosses a configured threshold, the memory is flagged for immediate re-evaluation. |
| Manual trigger | On-demand | Admins can trigger re-evaluation for a specific project or memory set via the API. |
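A minimal sketch of how the three triggers might be checked for a single memory. The `EvalState` record, the usage-count snapshot taken at the last evaluation, and the order in which the triggers are checked are assumptions for illustration. In an event-driven design, the spike check would run on the recall path itself rather than inside a periodic scan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Trigger(Enum):
    TIME_SWEEP = "time_sweep"
    RECALL_SPIKE = "recall_spike"
    MANUAL = "manual"


@dataclass
class EvalState:
    last_evaluated: datetime
    usage_at_last_eval: int  # hypothetical snapshot of usage_count at last evaluation
    usage_count: int


def pending_trigger(
    state: EvalState,
    now: datetime,
    sweep_interval: timedelta,
    spike_threshold: int,
    manual: bool = False,
) -> Optional[Trigger]:
    """Return the trigger that makes this memory due, or None if it is not due."""
    if manual:  # in this sketch, an admin request is checked first
        return Trigger.MANUAL
    if state.usage_count - state.usage_at_last_eval >= spike_threshold:
        return Trigger.RECALL_SPIKE  # recalls since last evaluation crossed the threshold
    if now - state.last_evaluated >= sweep_interval:
        return Trigger.TIME_SWEEP  # last evaluation is older than the sweep interval
    return None
```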
Batch Processing
The re-evaluation engine processes memories in configurable batches to control resource usage:
- Batch size. Each processing cycle handles a fixed number of memories (configurable per project). This prevents a single re-evaluation sweep from monopolizing compute.
- Priority ordering. Within a batch, memories are ordered by staleness — those evaluated longest ago are processed first.
- Progress tracking. The engine tracks which memories have been re-evaluated in the current sweep so it can resume from where it left off if interrupted.
- Rate limiting. Consecutive re-evaluation batches are separated by a configurable pause so that sweeps do not overwhelm the scoring pipeline during peak ingestion.
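The batching rules above can be sketched as a resumable sweep loop. `run_sweep` and `SweepProgress` are hypothetical names, and the in-memory set of finished IDs is a simplification; a production engine would persist progress (for example, in the database) so that it survives a process restart. The `sleep` parameter is injectable only so the pacing can be tested.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Set, Tuple


@dataclass
class SweepProgress:
    """Hypothetical record of IDs already re-evaluated in the current sweep."""
    done: Set[str] = field(default_factory=set)


def run_sweep(
    candidates: List[Tuple[str, datetime]],  # (memory_id, last_evaluated)
    batch_size: int,
    pause_s: float,
    process: Callable[[List[str]], None],
    progress: SweepProgress,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process stalest-first batches, skipping IDs finished earlier in this sweep."""
    pending = sorted(
        (c for c in candidates if c[0] not in progress.done),
        key=lambda c: c[1],  # oldest last_evaluated first
    )
    batches = 0
    for start in range(0, len(pending), batch_size):
        batch = [memory_id for memory_id, _ in pending[start:start + batch_size]]
        process(batch)
        progress.done.update(batch)  # mark done only after the batch succeeds
        batches += 1
        if start + batch_size < len(pending):
            sleep(pause_s)  # rate limit: pause between consecutive batches
    return batches
```

Progress is recorded only after `process` returns, so a batch that fails partway is retried in full on resume; that is safe because scoring is idempotent.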
Score Delta Handling
When a memory’s re-evaluated score differs from its current score, the engine takes specific actions based on the delta:
| Scenario | Action |
|---|---|
| Score increases above gating threshold | Memory is promoted: dispatched for embedding and added to the vector index. Status changes from gated to active. |
| Score decreases but stays above threshold | Score is updated. No embedding or indexing changes. Memory retains its position in recall rankings with the new score. |
| Score decreases below gating threshold | Score is updated but the memory is not retroactively gated. Once embedded, a memory stays in the index. The lower score naturally deprioritizes it in recall. |
| Score unchanged | Only the evaluation timestamp is updated. No other side effects. |
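The table above reduces to a small decision function. This sketch assumes a two-state `status` field, a caller-supplied threshold, and that a score equal to the threshold counts as passing; `Action` and `classify_delta` are hypothetical names. The promotion check runs before the unchanged-score check so that a gated memory with an unchanged score is still promoted after an admin lowers the threshold, which is the threshold-change case listed under Why Re-evaluation Exists.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROMOTE = "promote"      # gated -> active: dispatch for embedding and indexing
    UPDATE_SCORE = "update"  # store the new score; index untouched
    TOUCH = "touch"          # score unchanged: only the evaluation timestamp moves


@dataclass
class ScoredMemory:
    score: float
    status: str  # "gated" or "active"


def classify_delta(memory: ScoredMemory, new_score: float, threshold: float) -> Action:
    """Map a re-evaluated score to the action in the score-delta table."""
    if memory.status == "gated" and new_score >= threshold:
        return Action.PROMOTE
    if math.isclose(new_score, memory.score, abs_tol=1e-9):
        return Action.TOUCH
    # Any other change, including a drop below the threshold, only updates the score:
    # an embedded memory is never retroactively gated.
    return Action.UPDATE_SCORE
```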
Idempotency & Safety
The re-evaluation engine is designed to be safe under concurrent execution and restarts:
- Idempotent scoring. Running the importance scorer twice on the same memory with the same inputs produces the same result. There are no side effects from duplicate evaluations.
- Atomic updates. Score updates and evaluation timestamps are written in a single database transaction. A crash mid-batch leaves the remaining unprocessed memories with their previous evaluation timestamp, so they are picked up in the next sweep.
- Conflict-free concurrency. Multiple workers can process different batches simultaneously without interfering, because each batch selects distinct memory IDs.
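One way to get the atomic updates and conflict-free claiming described above is sketched here with Python’s built-in `sqlite3` module. The `memories` schema and the `claimed_by` column are hypothetical, as is claiming rows with a single `UPDATE ... WHERE id IN (SELECT ...)` statement; the text does not say how workers partition memory IDs. Using the connection as a context manager commits on success and rolls back on an exception, which is what leaves a crashed write with its previous evaluation timestamp.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE memories (
    id TEXT PRIMARY KEY,
    score REAL NOT NULL,
    last_evaluated TEXT NOT NULL,
    claimed_by TEXT  -- hypothetical: worker currently holding this row
)
"""


def claim_batch(conn: sqlite3.Connection, worker: str, size: int) -> List[str]:
    """Atomically claim up to `size` unclaimed rows, stalest first."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            """UPDATE memories SET claimed_by = ?
               WHERE id IN (SELECT id FROM memories
                            WHERE claimed_by IS NULL
                            ORDER BY last_evaluated LIMIT ?)""",
            (worker, size),
        )
    rows = conn.execute(
        "SELECT id FROM memories WHERE claimed_by = ? ORDER BY last_evaluated",
        (worker,),
    )
    return [row[0] for row in rows]


def write_score(conn: sqlite3.Connection, memory_id: str, new_score: float, evaluated_at: str) -> None:
    """Score and timestamp change together or not at all; the claim is released too."""
    with conn:
        conn.execute(
            "UPDATE memories SET score = ?, last_evaluated = ?, claimed_by = NULL WHERE id = ?",
            (new_score, evaluated_at, memory_id),
        )
```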
Observability
Every re-evaluation batch emits structured metrics for monitoring:
- Batch metrics. Total memories processed, average score delta, number of promotions, and processing duration per batch.
- Promotion events. Each promotion (gated → active) generates an audit log entry with the old score, new score, and the reason for re-evaluation.
- Sweep progress. Per-project dashboards show how many memories are pending re-evaluation and the estimated time to complete the current sweep.
These metrics help you understand whether your scoring thresholds are well-calibrated and whether the re-evaluation frequency is appropriate for your usage patterns.
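A sketch of a per-batch recorder that emits the batch metrics and promotion audit entries as JSON lines. The event and field names are hypothetical, and "average score delta" is taken here to mean the mean signed delta. The injectable `clock` and `emit` exist only so the sketch can be tested without real time or a real log sink.

```python
import json
import time
from typing import Callable, Dict


class BatchRecorder:
    """Collects per-batch metrics and promotion audit events as JSON lines."""

    def __init__(self, emit: Callable[[str], None] = print,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.emit = emit
        self.clock = clock
        self.started = clock()  # batch start time
        self.processed = 0
        self.promotions = 0
        self.delta_sum = 0.0

    def record(self, memory_id: str, old: float, new: float, promoted: bool, reason: str) -> None:
        """Account for one re-evaluated memory; audit-log it if it was promoted."""
        self.processed += 1
        self.delta_sum += new - old
        if promoted:
            self.promotions += 1
            self.emit(json.dumps({"event": "promotion", "memory_id": memory_id,
                                  "old_score": old, "new_score": new, "reason": reason}))

    def finish(self) -> Dict[str, object]:
        """Emit and return the batch summary."""
        summary = {
            "event": "batch",
            "processed": self.processed,
            "promotions": self.promotions,
            "avg_score_delta": self.delta_sum / self.processed if self.processed else 0.0,
            "duration_s": self.clock() - self.started,
        }
        self.emit(json.dumps(summary))
        return summary
```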