Debugging

Wrong Results

If your queries return results but they're not the ones you expected, the issue is in how the retrieval pipeline ranks, deduplicates, and selects memories. This page explains the exact mechanics of result ranking so you can understand why a specific memory was surfaced (or suppressed) and how to tune the system for your use case.

How Results Are Ranked

Every candidate memory is scored on three factors that are combined into a single ranking score:

Semantic similarity — how close the memory’s vector is to your query’s vector. Higher means more semantically relevant.
Recency — more recently updated memories rank higher than older ones, with a smooth decay over time.
Importance — memories the platform considers high-value rank higher than low-value ones.

For most queries, semantic similarity dominates the score, so the most relevant memory typically wins — unless a slightly less relevant but much more recent or much more important memory outscores it on the combined ranking.
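As a rough illustration of how three factors can be folded into one score, here is a minimal sketch. The weights, the 30-day half-life, and the function names are assumptions made for this example; the platform's actual formula and values are not documented on this page.

```python
# Hypothetical weights: similarity dominates, as described above.
DEFAULT_WEIGHTS = {"similarity": 0.6, "recency": 0.25, "importance": 0.15}


def recency_score(age_days: float, half_life_days: float = 30.0) -> float:
    """Smooth exponential decay: 1.0 when just updated, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(similarity: float, age_days: float, importance: float,
                   weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the three factors, each assumed to lie in [0, 1]."""
    return (weights["similarity"] * similarity
            + weights["recency"] * recency_score(age_days)
            + weights["importance"] * importance)


# A highly relevant but year-old, low-importance memory...
relevant_but_old = combined_score(similarity=0.90, age_days=365, importance=0.2)
# ...versus a slightly less relevant memory that is fresh and important.
fresher_and_important = combined_score(similarity=0.80, age_days=1, importance=0.9)
```

With these assumed weights, the second memory outscores the first even though its similarity is lower, which is the situation described above.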

Adaptive Weight Selection

The platform reads your query text and adapts how the three ranking factors are combined:

Default — for most queries, semantic similarity dominates and recency has a moderate effect.
Preference-style queries (e.g., “what does the user prefer”) — importance is given more weight, so consequential preferences win over more recent but less important matches.
Time-sensitive queries (e.g., “what happened recently”) — recency dominates so the freshest information surfaces first.

💡 Debugging tip: If you are getting “wrong” results, check whether your query text contains words that imply a recency or preference focus you did not intend. Removing those words often returns more semantically relevant results.
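To reason about which queries might trigger a different weighting, it can help to picture the adaptation as a keyword check that selects a weight profile. The sketch below is an assumption for illustration only: the cue words, profile names, and weights are hypothetical, and the platform may classify queries differently.

```python
import re

# Hypothetical weight profiles mirroring the three cases listed above.
WEIGHT_PROFILES = {
    "default":    {"similarity": 0.6, "recency": 0.25, "importance": 0.15},
    "preference": {"similarity": 0.5, "recency": 0.10, "importance": 0.40},
    "temporal":   {"similarity": 0.3, "recency": 0.60, "importance": 0.10},
}

# Hypothetical cue words; the real trigger vocabulary is not documented here.
RECENCY_CUES = {"recent", "recently", "latest", "newest"}
PREFERENCE_CUES = {"prefer", "prefers", "preference", "favorite"}


def select_profile(query: str) -> str:
    """Return the name of the weight profile suggested by the query text."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & RECENCY_CUES:
        return "temporal"
    if words & PREFERENCE_CUES:
        return "preference"
    return "default"


profile = select_profile("what happened recently")
weights = WEIGHT_PROFILES[profile]
```

Running your own query through a check like this is a quick way to spot an unintended cue word before rewording the query.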

Staleness Penalty

Memories that have not been recalled in a long time receive a staleness penalty that reduces their final score. This prevents outdated information from dominating results just because it has high importance or happened to be semantically similar.

The platform tracks the most recent time each memory appeared in a query result.
Once a memory has gone unused for a long stretch, a staleness factor is applied to its ranking score.
Memories that have never been recalled fall back to their creation time for this signal.

Self-reinforcing effect: Each time a memory appears in a query result, its “last recalled” signal is refreshed. Frequently-recalled memories stay fresh while unused memories gradually decay. This is intentional — the platform surfaces the information that is actually being used.

If this is causing wrong results: A semantically perfect match may be penalized for staleness. The dashboard surfaces the most-recently-recalled timestamp for each memory so you can confirm whether this is the cause.
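The staleness rule can be sketched as a multiplier applied once a memory has gone unrecalled past some cutoff. The 90-day cutoff and the 0.8 factor below are assumed values for illustration; this page does not state the real ones.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(days=90)  # hypothetical cutoff
STALENESS_FACTOR = 0.8            # hypothetical penalty multiplier


def apply_staleness(score: float, created_at: datetime,
                    last_recalled_at: Optional[datetime],
                    now: datetime) -> float:
    """Scale the score down if the memory has gone unused for too long."""
    # Never-recalled memories fall back to their creation time.
    reference = last_recalled_at or created_at
    if now - reference > STALE_AFTER:
        return score * STALENESS_FACTOR
    return score


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
created = datetime(2023, 1, 1, tzinfo=timezone.utc)

recently_used = apply_staleness(0.9, created, now - timedelta(days=2), now)
never_recalled = apply_staleness(0.9, created, None, now)
```

Here `recently_used` keeps its full score while `never_recalled` is reduced, even though both memories were created on the same day.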

Conflict Resolution

Before ranking, candidates that represent the same underlying fact are reduced to the single most recently updated entry. The rule is simple: same fact identity, keep the newest.

Why this matters for “wrong” results:

Stale facts are suppressed: If a user changed their preference from “dark theme” to “light theme”, only the “light theme” memory survives — even if the “dark theme” memory was more semantically similar to the query.
No semantic comparison happens here: the rule is based purely on fact identity and recency, not on text similarity.

When this causes unexpected results: If two memories share a fact identity but actually represent different things, the older one is suppressed. In that case the underlying issue is the structured-fact labeling rather than the ranking itself.
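The "same fact identity, keep the newest" rule can be sketched in a few lines. The field names `fact_id` and `updated_at` are hypothetical stand-ins for however the platform labels fact identity and update time.

```python
from datetime import datetime


def resolve_conflicts(candidates: list) -> list:
    """Keep only the most recently updated memory for each fact identity."""
    newest = {}
    for memory in candidates:
        key = memory["fact_id"]
        # No text or vector comparison happens here, only identity and time.
        if key not in newest or memory["updated_at"] > newest[key]["updated_at"]:
            newest[key] = memory
    return list(newest.values())


candidates = [
    {"fact_id": "ui.theme", "content": "dark theme",
     "updated_at": datetime(2024, 1, 10)},
    {"fact_id": "ui.theme", "content": "light theme",
     "updated_at": datetime(2024, 3, 5)},
    {"fact_id": "work.hours", "content": "works at night",
     "updated_at": datetime(2023, 11, 2)},
]
survivors = resolve_conflicts(candidates)
```

If two unrelated memories were mistakenly given the same `fact_id`, this step would drop the older one regardless of how well it matched the query.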

Semantic Dedup Threshold

After ranking, the platform compares the remaining candidates and collapses near-duplicates. When two memories effectively mean the same thing, only the higher-scored one is kept.

Examples:

“The user prefers dark mode” and “User likes dark theme” — collapsed (paraphrases of the same fact).
“The user prefers dark mode” and “The user works at night” — both kept (different facts).

When this removes results you wanted: If two memories are semantically very similar but represent meaningfully different information, they may still be collapsed. If this is too aggressive for your use case, make your memories more distinct by including differentiating context in the content field.
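A common way to implement this kind of collapse is a greedy pass over the ranked list with a cosine-similarity cutoff. The sketch below assumes that approach; the 0.95 threshold, the toy two-dimensional vectors, and the field names are illustrative and not taken from the platform.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe(ranked: list, threshold: float = 0.95) -> list:
    """Walk results from best to worst, dropping near-duplicates of kept ones."""
    kept = []
    for memory in ranked:  # ranked must already be sorted by score, descending
        if all(cosine(memory["vector"], k["vector"]) < threshold for k in kept):
            kept.append(memory)
    return kept


ranked = [
    {"id": "m1", "content": "The user prefers dark mode", "vector": [1.0, 0.0]},
    {"id": "m2", "content": "User likes dark theme", "vector": [0.99, 0.05]},
    {"id": "m3", "content": "The user works at night", "vector": [0.0, 1.0]},
]
kept_ids = [m["id"] for m in dedupe(ranked)]
```

In this toy data the paraphrase `m2` is collapsed into the higher-ranked `m1`, while `m3` survives because its vector points in a different direction. Adding differentiating context to a memory's content moves its vector away from its neighbors, which is why that advice helps.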

Debugging Ranking Decisions

When you need to understand exactly why a specific result was ranked the way it was, follow this diagnostic workflow:

Check the query intent. Words like “recent”, “latest”, “prefer”, or “favorite” can shift ranking towards recency or importance. Try the same query without them.
Check the memory’s importance. If it is below the retrieval threshold, it is filtered out before ranking.
Check for supersession. If a newer memory represents the same fact, the older one is suppressed.
Check the staleness signal. A memory that has not been recalled in a long time may be penalized.
Check for semantic dedup. If a very similar memory scored higher, the one you expect may have been collapsed into it. Request the maximum allowed top_k to see whether it surfaces with more room.
Check the environment and project. Confirm you are querying the same environment and project where the memory was stored.

✅ Quick test: Use the direct GET /memory/{id} endpoint to confirm the memory exists and inspect its importance, status, environment, and last-recalled timestamp. The combination of those fields tells you whether the memory was filtered out before ranking or suppressed during ranking.
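Once you have the memory record in hand, the checks above can be run mechanically. The sketch below works on a plain dictionary and makes several assumptions: the field names, the `"active"` status value, the 0.3 importance threshold, and the 90-day staleness cutoff are all hypothetical, so adjust them to match the actual response.

```python
from datetime import datetime, timedelta, timezone


def diagnose(memory: dict, query_environment: str, now: datetime,
             importance_threshold: float = 0.3,
             stale_after: timedelta = timedelta(days=90)) -> list:
    """Return a list of likely reasons the memory did not surface."""
    findings = []
    if memory["environment"] != query_environment:
        findings.append("environment mismatch")
    if memory["status"] != "active":
        findings.append("not active, possibly superseded")
    if memory["importance"] < importance_threshold:
        findings.append("importance below retrieval threshold")
    if now - memory["last_recalled_at"] > stale_after:
        findings.append("staleness penalty likely")
    return findings


# Example record shaped like a hypothetical GET /memory/{id} response.
memory = {
    "importance": 0.2,
    "status": "active",
    "environment": "production",
    "last_recalled_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
findings = diagnose(memory, query_environment="staging",
                    now=datetime(2024, 6, 1, tzinfo=timezone.utc))
```

An empty list suggests the memory passed the pre-ranking filters, so the cause is more likely query intent or semantic dedup.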
