Guides

Build a Chatbot with Memory

Wire MemorySync to your LLM in three places: before generation (recall context), after generation (write turn summary), and on user feedback (boost importance). Total added latency: ~30–80 ms p50.

Architecture

Step 1 — Recall before generation

TYPESCRIPT

async function buildPrompt(userId: string, message: string) {
  const ms = new MemorySync({ apiKey: process.env.MS_KEY!, projectId: "prj_chatbot" })

  const result = await ms.memory.query({
    query: message,
    k: 5,
    sessionId: userId,            // pin session for cross-turn boosting
    filters: { tier: "hot" },
    weights: { semantic: 0.5, recency: 0.2, importance: 0.2, graph: 0.1 },
  })

  const memoryBlock = result.memories
    .map((m, i) => `[memory ${i+1}] (${m.tier}) ${m.text}`)
    .join("\n")

  return `You have access to the user's prior context:\n${memoryBlock}\n\nUser: ${message}`
}

Step 2 — Store the turn after generation

TYPESCRIPT

async function storeTurn(userId: string, userMsg: string, assistantMsg: string) {
  // Compress the turn into one memory; the intelligence pipeline will add structure.
  await ms.memory.add({
    text: `User asked: "${userMsg}". Assistant answered: "${assistantMsg}".`,
    source: "chat",
    tags: ["turn"],
    importance: 0.4,
    metadata: { user_id: userId },
  })
}

Step 3 — Use feedback to tune

On a 👍 from the user, bump the importance of the memories that fed the prompt. On 👎, lower importance and increment a feedback counter that the daily re-evaluator reads.

TYPESCRIPT

await ms.memory.update(memoryId, { importance: 0.7 })           // up-vote
await ms.memory.update(memoryId, { importance: 0.2 })           // down-vote

Avoiding turn-spam

Store one memory per topic shift, not every turn. A simple heuristic: only write when the user introduces a new entity or asks a question.
Use deduplicate: true (default). The pipeline collapses near-duplicates within 24h.
Set ttl_minutes for ephemeral chat data so old turns auto-archive.

Latency tips

Run recall in parallel with non-LLM context fetches (user profile, tools).
Use computation_tier: "low" for trivial follow-ups (< 6 tokens).
Keep k ≤ 5 unless you actually pass them all into the prompt.
Pin session_id across the conversation — repeats hit warmer caches.

Cost

A typical 10-turn conversation: 10 recalls (medium tier) + 10 adds + 10 embeddings ≈ ~150–250 µ¢ depending on provider. Compare this against the LLM completion cost on the same conversation; MemorySync usually accounts for 1–3% of total cost.

← Previous

Next Steps

Personalization