
Build a Chatbot with Memory

Wire MemorySync to your LLM in three places: before generation (recall context), after generation (write turn summary), and on user feedback (boost importance). Total added latency: ~30–80 ms p50.

Architecture
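The three hook points above can be sketched as one request handler. This is a minimal sketch, not MemorySync's API. `handleMessage` and the `recall`/`generate`/`store` hook signatures are hypothetical. In practice `recall` would wrap buildPrompt from Step 1 and `store` would wrap storeTurn from Step 2. The feedback hook (Step 3) runs later, when the user reacts, so it is not part of this path.

```typescript
// Hypothetical hook signatures. In a real app these wrap the MemorySync calls
// from Steps 1 and 2 and your own LLM client.
type Recall = (userId: string, message: string) => Promise<string>
type Generate = (prompt: string) => Promise<string>
type Store = (userId: string, userMsg: string, assistantMsg: string) => Promise<void>

async function handleMessage(
  userId: string,
  message: string,
  hooks: { recall: Recall; generate: Generate; store: Store },
): Promise<string> {
  // Hook 1, before generation: build a prompt that includes recalled memories.
  const prompt = await hooks.recall(userId, message)
  // Generate the reply with your LLM.
  const reply = await hooks.generate(prompt)
  // Hook 2, after generation: write the turn back. It is awaited here for
  // simplicity; you could also run it without awaiting to keep it off the response path.
  await hooks.store(userId, message, reply)
  return reply
}
```

Injecting the hooks keeps the ordering (recall, generate, store) in one place and makes it easy to test with fakes.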

Step 1 — Recall before generation

TYPESCRIPT
// Assumes the MemorySync client class has been imported from the SDK.
// Create the client once at module scope so every helper (including storeTurn in Step 2) can reuse it.
const ms = new MemorySync({ apiKey: process.env.MS_KEY!, projectId: "prj_chatbot" })

async function buildPrompt(userId: string, message: string) {
  const result = await ms.memory.query({
    query: message,
    k: 5,
    sessionId: userId, // pin session for cross-turn boosting
    filters: { tier: "hot" },
    weights: { semantic: 0.5, recency: 0.2, importance: 0.2, graph: 0.1 },
  })
  // Number each memory and show its tier so the model can tell entries apart.
  const memoryBlock = result.memories
    .map((m, i) => `[memory ${i + 1}] (${m.tier}) ${m.text}`)
    .join("\n")
  return `You have access to the user's prior context:\n${memoryBlock}\n\nUser: ${message}`
}

Step 2 — Store the turn after generation

TYPESCRIPT
async function storeTurn(userId: string, userMsg: string, assistantMsg: string) {
  // Assumes a MemorySync client `ms` is in scope, created once at module scope rather than per call.
  // Fold the whole turn into a single memory; the intelligence pipeline adds structure later.
  await ms.memory.add({
    text: `User asked: "${userMsg}". Assistant answered: "${assistantMsg}".`,
    source: "chat",
    tags: ["turn"],
    importance: 0.4,
    metadata: { user_id: userId },
  })
}
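Storing both messages verbatim lets one long answer dominate the memory text. A minimal sketch of trimming each side before calling add follows. `summarizeTurn`, `clip` and the 280-character cap are hypothetical choices for illustration, not MemorySync limits or functions.

```typescript
// Arbitrary placeholder budget per side of the turn; tune for your prompts.
const MAX_CHARS = 280

function clip(text: string, max: number): string {
  // Collapse whitespace runs so line breaks do not waste the budget.
  const flat = text.replace(/\s+/g, " ").trim()
  // Truncate and mark the cut with an ellipsis so the result is at most `max` chars.
  return flat.length <= max ? flat : flat.slice(0, max - 1) + "…"
}

// Builds the same "User asked / Assistant answered" text as storeTurn, with both sides clipped.
function summarizeTurn(userMsg: string, assistantMsg: string, max: number = MAX_CHARS): string {
  return `User asked: "${clip(userMsg, max)}". Assistant answered: "${clip(assistantMsg, max)}".`
}
```

The result would replace the template string passed as `text` in storeTurn.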

Step 3 — Use feedback to tune

On a 👍 from the user, bump the importance of the memories that fed the prompt. On 👎, lower importance and increment a feedback counter that the daily re-evaluator reads.

TYPESCRIPT
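// memoryId: ID of one of the memories that fed the prompt (from the Step 1 query result)
await ms.memory.update(memoryId, { importance: 0.7 }) // 👍 up-vote: raise to a fixed high value
await ms.memory.update(memoryId, { importance: 0.2 }) // 👎 down-vote: lower to a fixed low value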
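The snippet above sets fixed values. A relative nudge applied to every memory that fed the prompt is one alternative, sketched below under stated assumptions. The `id` and `importance` field names on recalled memories, the [0, 1] importance range, and the step sizes are assumptions. `MemoryUpdater` is a hypothetical interface matching the `update(id, patch)` call shown above, so you would pass `ms.memory`. The feedback counter for the re-evaluator is not shown because this guide does not document its API.

```typescript
type Vote = "up" | "down"

// Assumed shape of a recalled memory; check your SDK's query result.
interface RecalledMemory { id: string; importance: number }

// Hypothetical minimal client shape: just the update call used in Step 3.
interface MemoryUpdater {
  update(id: string, patch: { importance: number }): Promise<unknown>
}

// Placeholder step sizes: a down-vote moves further than an up-vote here.
const UP_STEP = 0.1
const DOWN_STEP = 0.2

function nudgeImportance(current: number, vote: Vote): number {
  const next = vote === "up" ? current + UP_STEP : current - DOWN_STEP
  // Clamp to the assumed [0, 1] importance range.
  return Math.min(1, Math.max(0, next))
}

// Apply one vote to every memory that fed the prompt, updating them in parallel.
async function applyFeedback(client: MemoryUpdater, memories: RecalledMemory[], vote: Vote): Promise<void> {
  await Promise.all(
    memories.map((m) => client.update(m.id, { importance: nudgeImportance(m.importance, vote) })),
  )
}
```

A relative nudge lets repeated 👍 on the same memory accumulate, whereas the fixed values above overwrite whatever importance the pipeline had assigned.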

Avoiding turn-spam

  • Store one memory per topic shift, not every turn. A simple heuristic: only write when the user introduces a new entity or asks a question.
  • Keep deduplicate: true (the default). The pipeline collapses near-duplicates written within 24 h of each other.
  • Set ttl_minutes for ephemeral chat data so old turns auto-archive.
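The write-gating heuristic from the first bullet can be sketched as a small pure function called before storeTurn. This is a rough sketch, not part of MemorySync: `shouldStoreTurn` and `extractEntities` are hypothetical, and an "entity" is crudely approximated as a capitalized ASCII word that is not the first word of the message. Swap in a real entity extractor if you have one.

```typescript
// Crude entity approximation: capitalized words after the first word,
// with punctuation stripped (ASCII only).
function extractEntities(message: string): string[] {
  return message
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .slice(1) // skip the sentence-initial word, which is capitalized anyway
    .map((w) => w.replace(/[^A-Za-z0-9-]/g, ""))
    .filter((w) => /^[A-Z]/.test(w))
}

// Store a turn only if the user asks a question or mentions an entity not seen
// earlier in this conversation. `seenEntities` is per-conversation state.
function shouldStoreTurn(message: string, seenEntities: Set<string>): boolean {
  const isQuestion = message.includes("?")
  const newEntities = extractEntities(message).filter((e) => !seenEntities.has(e))
  // Record the entities so later turns that repeat them do not count as new.
  for (const e of newEntities) seenEntities.add(e)
  return isQuestion || newEntities.length > 0
}
```

Keep one `seenEntities` set per conversation, for example alongside the pinned session ID.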

Latency tips

  1. Run recall in parallel with non-LLM context fetches (user profile, tools).
  2. Use computation_tier: "low" for trivial follow-ups (fewer than 6 tokens).
  3. Keep k ≤ 5. Only raise it if every retrieved memory actually goes into the prompt.
  4. Pin the session ID (sessionId in the Step 1 code) across the conversation; repeat queries hit warmer caches.
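Tips 1 and 2 can be sketched as two small helpers. This is a hedged sketch: `pickComputationTier` and `gatherContext` are hypothetical names, the tier values "low" and "medium" are the ones this guide mentions, and the token count is approximated by whitespace-separated words. Check the SDK for the exact name of the computation-tier option on the query call.

```typescript
type ComputationTier = "low" | "medium"

// Tip 2: use the low tier for trivial follow-ups. Word count stands in for
// token count here; use your tokenizer if you have one.
function pickComputationTier(message: string): ComputationTier {
  const approxTokens = message.trim().split(/\s+/).filter((w) => w.length > 0).length
  return approxTokens < 6 ? "low" : "medium"
}

// Tip 1: start recall and the non-LLM context fetches together and await them
// as a group, so the wait is roughly the slowest fetch rather than the sum.
async function gatherContext<R, P, T>(
  recall: () => Promise<R>,
  fetchProfile: () => Promise<P>,
  fetchTools: () => Promise<T>,
) {
  return Promise.all([recall(), fetchProfile(), fetchTools()])
}
```

Here `recall` would typically wrap buildPrompt from Step 1, with the chosen tier passed into its query.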

Cost

A typical 10-turn conversation (10 recalls at medium tier, 10 adds and 10 embeddings) costs roughly 150–250 µ¢, depending on the provider. Compared with the LLM completion cost for the same conversation, MemorySync usually accounts for 1–3% of the total.
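The estimate above is per-operation arithmetic, sketched below so you can rerun it with your own numbers. The unit prices are caller-supplied placeholders, not MemorySync or provider prices. Like the example above, the sketch assumes one recall, one add and one embedding per turn. `memoryCostPerConversation` and `memorySyncShare` are hypothetical helpers.

```typescript
// Per-operation unit prices in whatever unit you bill in; supply real values.
interface UnitPrices {
  recall: number
  add: number
  embedding: number
}

// One recall, one add and one embedding per turn.
function memoryCostPerConversation(turns: number, p: UnitPrices): number {
  return turns * (p.recall + p.add + p.embedding)
}

// MemorySync's fraction of the combined (MemorySync + LLM) cost, in [0, 1].
function memorySyncShare(memoryCost: number, llmCost: number): number {
  const total = memoryCost + llmCost
  return total === 0 ? 0 : memoryCost / total
}
```

Express both costs in the same unit before computing the share.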