Cost Optimization

On MemorySync, most of the bill comes from embeddings and the intelligence pipeline. You can cut it by 40–70% by deduplicating on add, lowering the computation tier on cheap calls, narrowing recall filters, and archiving stale data to the cold tier.

Where the money goes

| Driver | Typical share | Tunable? |
| --- | --- | --- |
| Embeddings on add | 35–55% | Yes: dedup, cache, batch |
| Intelligence (clustering, summaries) | 20–35% | Yes: schedule + tier filter |
| Vector search | 5–15% | Yes: k, partition, computation tier |
| Storage | <5% | Indirect: purge cold tier |
| Egress / SDK | negligible | No |
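
To see how savings on individual drivers combine into a change in the total bill, the sketch below weights an assumed reduction per driver by that driver's share. The shares are midpoints of the ranges in the table; the reduction fractions are illustrative assumptions, not measured MemorySync figures.

```python
from typing import Dict

# Midpoints of the "typical share" ranges from the table (storage taken as 4%).
SHARES: Dict[str, float] = {
    "embeddings": 0.45,
    "intelligence": 0.275,
    "vector_search": 0.10,
    "storage": 0.04,
}

# Hypothetical per-driver reductions, chosen only to illustrate the arithmetic.
ASSUMED_REDUCTIONS: Dict[str, float] = {
    "embeddings": 0.60,
    "intelligence": 0.50,
    "vector_search": 0.50,
}


def estimated_total_reduction(shares: Dict[str, float],
                              reductions: Dict[str, float]) -> float:
    """Return the fraction of the total bill saved.

    Each driver contributes share * reduction; drivers with no
    entry in `reductions` are treated as unchanged.
    """
    return sum(share * reductions.get(driver, 0.0)
               for driver, share in shares.items())


total = estimated_total_reduction(SHARES, ASSUMED_REDUCTIONS)
print(f"Estimated total reduction: {total:.0%}")
```

Because embeddings carry the largest share, an assumed cut there moves the total more than the same cut applied to vector search.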

Quick wins

  1. Enable dedup: keep deduplicate: true on add (the default). This sharply cuts embedding cost on chat workloads.
  2. Use computation tier — pass computation_tier: "low" for trivial queries. Skips reranker, halves cost.
  3. Tighten k — most LLM prompts use 3–5 memories. Don’t request 20.
  4. Archive aggressively — set ttl_minutes on chat memories. Cold tier costs 10× less to store.
  5. Cache top queries — the SDK caches identical query+filter combos for 60s by default; bump it for read-heavy dashboards.
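
The parameters named in the list can be collected into request payloads, as in the minimal sketch below. Only deduplicate, computation_tier, k, and ttl_minutes come from this guide; the "text" and "query" field names and the overall payload shape are assumptions, so check them against the API reference before use.

```python
from typing import Any, Dict, Optional


def build_add_payload(text: str,
                      ttl_minutes: Optional[int] = None) -> Dict[str, Any]:
    """Build an add payload with dedup left on (the documented default)."""
    # "text" is an assumed field name for the memory content.
    payload: Dict[str, Any] = {"text": text, "deduplicate": True}
    if ttl_minutes is not None:
        # A TTL lets short-lived chat memories be archived automatically.
        payload["ttl_minutes"] = ttl_minutes
    return payload


def build_recall_payload(query: str,
                         trivial: bool = False,
                         k: int = 5) -> Dict[str, Any]:
    """Build a recall payload with a small k and an optional low tier."""
    # "query" is an assumed field name for the search text.
    payload: Dict[str, Any] = {"query": query, "k": k}
    if trivial:
        # Low tier skips the reranker, per the quick wins above.
        payload["computation_tier"] = "low"
    return payload


chat_memory = build_add_payload("User prefers dark mode", ttl_minutes=1440)
cheap_lookup = build_recall_payload("ui preferences", trivial=True, k=3)
```

Keeping these defaults in one helper makes it harder for a call site to request 20 memories or the full computation tier by accident.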

Embedding strategy

  • Pick a primary provider that matches your latency / cost tradeoff. Local fallback is free but slower.
  • Bulk imports go through batched embeddings — never call add 10k times sequentially. Use POST /memory/bulk-import.
  • For low-value text (e.g. machine-generated logs), use importance: 0.0–0.2 and tier: "cold" from the start so they skip the warm path.
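
As a rough illustration of the batching advice, the sketch below groups texts into fixed-size batches and tags each record as low-importance cold-tier data. The endpoint path comes from this guide; the request body shape ("memories", "text"), the batch size of 500, and the importance value of 0.1 are assumptions. No network call is made here.

```python
from typing import Any, Dict, Iterable, Iterator, List

BULK_IMPORT_PATH = "/memory/bulk-import"  # endpoint named in this guide


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def to_cold_record(text: str) -> Dict[str, Any]:
    """Mark low-value text so it skips the warm path."""
    # 0.1 sits inside the 0.0-0.2 importance range suggested above.
    return {"text": text, "importance": 0.1, "tier": "cold"}


def bulk_import_bodies(texts: Iterable[str],
                       batch_size: int = 500) -> Iterator[Dict[str, Any]]:
    """Yield one request body per batch (body shape is hypothetical)."""
    for batch in batched(texts, batch_size):
        yield {"memories": [to_cold_record(t) for t in batch]}


log_lines = [f"log line {i}" for i in range(1200)]
bodies = list(bulk_import_bodies(log_lines))
print(len(bodies), "requests instead of", len(log_lines))
```

Each body would then be sent in a single POST to the bulk-import endpoint, replacing one request per record.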

Tame the intelligence pipeline

The intelligence scheduler reads org_intelligence_settings. Reduce cost by:

  • Setting cluster_interval_hours = 24 (default 6).
  • Setting summary_min_cluster_size = 8 so trivial clusters skip summarization.
  • Restricting clustering to tier IN ('hot', 'warm') — cold-tier facts rarely change.
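
The effect of these settings can be worked through with simple arithmetic. In the sketch below, cluster_interval_hours and summary_min_cluster_size are the setting names from this guide; the "cluster_tiers" key and the inclusive size threshold are assumptions about how the scheduler interprets them.

```python
from typing import Any, Dict

# Tuned values from the list above; "cluster_tiers" is a hypothetical key.
TUNED_SETTINGS: Dict[str, Any] = {
    "cluster_interval_hours": 24,
    "summary_min_cluster_size": 8,
    "cluster_tiers": ("hot", "warm"),
}

DEFAULT_INTERVAL_HOURS = 6  # default stated in this guide


def runs_per_day(cluster_interval_hours: float) -> float:
    """Number of clustering runs scheduled in 24 hours."""
    return 24 / cluster_interval_hours


def should_summarize(cluster_size: int, settings: Dict[str, Any]) -> bool:
    """Assume clusters at or above the minimum size get a summary."""
    return cluster_size >= settings["summary_min_cluster_size"]


def should_cluster(tier: str, settings: Dict[str, Any]) -> bool:
    """Only memories in the configured tiers are clustered."""
    return tier in settings["cluster_tiers"]


default_runs = runs_per_day(DEFAULT_INTERVAL_HOURS)
tuned_runs = runs_per_day(TUNED_SETTINGS["cluster_interval_hours"])
run_reduction = 1 - tuned_runs / default_runs
print(f"{default_runs:.0f} runs/day -> {tuned_runs:.0f} run/day "
      f"({run_reduction:.0%} fewer runs)")
```

The reduction is in the number of runs, not necessarily in cost: each daily run may process more new data than a six-hourly one.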

Silent mode at the edge

For freemium and trial flows, switch the project to BILLING_QUOTA_MODE=silent. When the budget is exhausted, the API returns 200 with { "status": "ok", "skipped": true } — the SDK degrades silently and the end user is not blocked. Track skips on the usage dashboard and upgrade plans only for real usage.
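
Because a skipped call still returns 200, client code that wants its own view of skips has to inspect the response body. The sketch below counts them from already-parsed JSON bodies; the response fields match the example above, while the SkipCounter class is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict


class SkipCounter:
    """Track how many API responses were skipped by silent mode."""

    def __init__(self) -> None:
        self.total = 0
        self.skipped = 0

    def record(self, body: Dict[str, Any]) -> bool:
        """Record one parsed response body; return True if it was skipped."""
        self.total += 1
        # A missing "skipped" field is treated as a normal response.
        was_skipped = bool(body.get("skipped", False))
        if was_skipped:
            self.skipped += 1
        return was_skipped

    @property
    def skip_rate(self) -> float:
        """Fraction of recorded responses that were skipped."""
        return self.skipped / self.total if self.total else 0.0


counter = SkipCounter()
counter.record({"status": "ok"})                   # normal response
counter.record({"status": "ok", "skipped": True})  # budget exhausted
print(f"skip rate: {counter.skip_rate:.0%}")
```

A rising skip rate for a trial project is a signal that its usage has outgrown the budget.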

Per-project budgets

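```bash
# Set a monthly request cap; Content-Type is explicit because curl -d defaults to form encoding.
curl -X PUT https://api.memorysync.io/org/billing/budgets \
  -H "Authorization: Bearer USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "project_id": "prj_chatbot", "monthly_request_cap": 100000 }'
```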

Caps are enforced at the request level. The dashboard shows real-time burn-down and alerts at 50%, 80%, and 100% of the cap.
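
The alert levels can be reproduced locally, for example to mirror dashboard alerts in your own monitoring. The sketch below assumes an alert fires once usage reaches or exceeds a threshold; the function name and that inclusive rule are assumptions, not documented behavior.

```python
from typing import List

ALERT_THRESHOLDS_PERCENT = (50, 80, 100)  # alert levels stated in this guide


def crossed_thresholds(requests_used: int,
                       monthly_request_cap: int) -> List[int]:
    """Return the alert thresholds (in percent) that usage has reached."""
    if monthly_request_cap <= 0:
        raise ValueError("monthly_request_cap must be positive")
    # Integer comparison avoids floating-point rounding at the boundaries.
    return [t for t in ALERT_THRESHOLDS_PERCENT
            if requests_used * 100 >= t * monthly_request_cap]


# With the 100000-request cap from the curl example:
print(crossed_thresholds(80_000, 100_000))
```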

Tip
Teams that adopt all five quick wins typically see total monthly spend drop 50–60% without a measurable drop in recall quality.