Cost Optimization
Most MemorySync spend goes to embeddings and intelligence. You can cut the bill by 40–70% by deduplicating on add, lowering the computation tier on cheap calls, narrowing recall filters, and archiving stale data to the cold tier.
Where the money goes
| Driver | Typical share | Tunable? |
|---|---|---|
| Embeddings on add | 35–55% | Yes — dedup, cache, batch |
| Intelligence (clustering, summaries) | 20–35% | Yes — schedule + tier filter |
| Vector search | 5–15% | Yes — k, partition, computation tier |
| Storage | <5% | Indirect — purge cold tier |
| Egress / SDK | negligible | No |
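To see how per-driver savings add up to an overall figure, the shares in the table can be combined with an assumed reduction for each driver. The sketch below is only illustrative arithmetic: the shares are mid-range picks from the table, and the per-driver reductions are hypothetical numbers, not measurements.

```python
# Hypothetical cost model: shares are mid-range picks from the table above;
# the per-driver reductions are illustrative assumptions, not measurements.

def projected_savings(shares: dict, reductions: dict) -> float:
    """Return the fraction of the total bill saved.

    shares: each driver's fraction of the current bill (should sum to ~1.0).
    reductions: fraction by which each driver's own cost is cut (0.0 to 1.0).
    Drivers missing from `reductions` are treated as unchanged.
    """
    return sum(share * reductions.get(driver, 0.0) for driver, share in shares.items())


shares = {
    "embeddings": 0.50,
    "intelligence": 0.30,
    "vector_search": 0.15,
    "storage": 0.04,
    "egress": 0.01,
}

# Assumed cuts, for illustration only.
reductions = {
    "embeddings": 0.60,
    "intelligence": 0.50,
    "vector_search": 0.40,
}

saved = projected_savings(shares, reductions)
print(f"Projected saving: {saved:.0%}")  # prints "Projected saving: 51%"
```

Because embeddings and intelligence carry most of the weight, a modest cut there moves the total far more than a large cut to storage or egress would.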
Quick wins
- Enable dedup: `deduplicate: true` on add (default). Cuts embedding cost on chat workloads dramatically.
- Use computation tier: pass `computation_tier: "low"` for trivial queries. Skips reranker, halves cost.
- Tighten `k`: most LLM prompts use 3–5 memories. Don't request 20.
- Archive aggressively: set `ttl_minutes` on chat memories. Cold tier costs 10× less to store.
- Cache top queries: the SDK caches identical query+filter combos for 60s by default; bump it for read-heavy dashboards.
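The first four quick wins come down to a few request fields. The following is a minimal sketch of helpers that build those request bodies, assuming a plain-dict payload. The `deduplicate`, `computation_tier`, `k`, and `ttl_minutes` names come from this guide; the `text` and `query` keys, the helper names, and the choice to clamp `k` at 5 are hypothetical.

```python
from typing import Optional

MAX_K = 5  # this guide suggests 3-5 memories per prompt


def build_add_payload(text: str, ttl_minutes: Optional[int] = None) -> dict:
    """Body for an add call. The "text" key is an assumed field name."""
    payload = {"text": text, "deduplicate": True}  # dedup stated explicitly
    if ttl_minutes is not None:
        payload["ttl_minutes"] = ttl_minutes  # lets chat memories expire
    return payload


def build_recall_params(query: str, trivial: bool = False, k: int = MAX_K) -> dict:
    """Params for a recall call. The "query" key is an assumed field name."""
    params = {"query": query, "k": min(k, MAX_K)}  # never ask for more than MAX_K
    if trivial:
        params["computation_tier"] = "low"  # cheaper path for simple lookups
    return params


print(build_add_payload("user prefers dark mode", ttl_minutes=1440))
print(build_recall_params("what theme does the user like?", trivial=True, k=20))
```

Centralizing these defaults in one place makes it harder for an individual call site to request `k=20` or forget the low tier on a trivial lookup.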
Embedding strategy
- Pick a primary provider that matches your latency / cost tradeoff. Local fallback is free but slower.
- Bulk imports go through batched embeddings: never call add 10k times sequentially. Use `POST /memory/bulk-import`.
- For low-value text (e.g. machine-generated logs), use `importance: 0.0–0.2` and `tier: "cold"` from the start so they skip the warm path.
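As a rough illustration of the batching advice, the sketch below groups records into fixed-size batches that could each become one `POST /memory/bulk-import` body. The item shape (`text` key), the batch size of 500, and the helper names are assumptions; the `importance` and `tier` fields follow this guide. No request is sent.

```python
from typing import Iterable, Iterator


def to_bulk_item(text: str, low_value: bool = False) -> dict:
    """Build one import item. The "text" key is an assumed field name."""
    item = {"text": text}
    if low_value:
        item["importance"] = 0.1  # inside the 0.0-0.2 range suggested above
        item["tier"] = "cold"     # start in cold tier, skipping the warm path
    return item


def batched(items: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


logs = [f"worker heartbeat {i}" for i in range(10_000)]
batches = list(batched((to_bulk_item(t, low_value=True) for t in logs), size=500))
print(len(batches))  # 20 request bodies instead of 10,000 add calls
```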
Tame the intelligence pipeline
The intelligence scheduler reads `org_intelligence_settings`. Reduce cost by:
- Setting `cluster_interval_hours = 24` (default 6).
- Setting `summary_min_cluster_size = 8` so trivial clusters skip summarization.
- Restricting clustering to `tier IN ('hot', 'warm')`: cold-tier facts rarely change.
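The effect of these settings can be estimated with simple arithmetic. The sketch below represents the settings as a plain dict (the `cluster_tiers` key is a hypothetical name for the tier filter) and assumes a cluster is summarized when its size is at least `summary_min_cluster_size`.

```python
# Hypothetical dict form of the tuned org_intelligence_settings values.
tuned_settings = {
    "cluster_interval_hours": 24,
    "summary_min_cluster_size": 8,
    "cluster_tiers": ["hot", "warm"],  # assumed key name for the tier filter
}


def clustering_runs_per_month(interval_hours: int, days: int = 30) -> float:
    """Number of scheduled clustering runs in a month of `days` days."""
    return days * 24 / interval_hours


def clusters_to_summarize(cluster_sizes: list, min_size: int) -> list:
    """Keep only clusters large enough to summarize (assumes size >= min_size)."""
    return [size for size in cluster_sizes if size >= min_size]


before = clustering_runs_per_month(6)  # default interval: 120.0 runs
after = clustering_runs_per_month(tuned_settings["cluster_interval_hours"])  # 30.0 runs
reduction = 1 - after / before
print(f"{before:.0f} -> {after:.0f} runs per month ({reduction:.0%} fewer)")

sizes = [2, 3, 8, 15, 4, 40]  # made-up cluster sizes
print(clusters_to_summarize(sizes, tuned_settings["summary_min_cluster_size"]))
```

Moving from a 6-hour to a 24-hour interval alone removes three out of every four clustering runs; the size threshold then trims summarization calls within each remaining run.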
Silent mode at the edge
For freemium and trial flows, switch the project to `BILLING_QUOTA_MODE=silent`. When the budget is exhausted, the API returns 200 with `{ "status": "ok", "skipped": true }`, so the SDK degrades silently and the end user is not blocked. Track skips on the usage dashboard and upgrade plans only for real usage.
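If you call the API directly rather than through the SDK, a skipped call looks like a success, so the client has to check the body. The following is a minimal sketch of that check, assuming the response body is the JSON shown above; the function name and the local skip counter are hypothetical and do not replace the usage dashboard.

```python
import json
from collections import Counter

skip_counts = Counter()  # local tally per operation, for illustration only


def was_skipped(status_code: int, body: str) -> bool:
    """Return True when a 200 response carries "skipped": true."""
    if status_code != 200:
        return False
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("skipped") is True


# Simulated responses; no network call is made here.
responses = [
    ("add", 200, '{ "status": "ok", "skipped": true }'),
    ("add", 200, '{ "status": "ok" }'),
    ("recall", 200, '{ "status": "ok", "skipped": true }'),
]

for operation, status, body in responses:
    if was_skipped(status, body):
        skip_counts[operation] += 1  # degrade quietly, but keep a record

print(dict(skip_counts))  # {'add': 1, 'recall': 1}
```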
Per-project budgets
```bash
curl -X PUT https://api.memorysync.io/org/billing/budgets \
  -H "Authorization: Bearer USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "project_id": "prj_chatbot", "monthly_request_cap": 100000 }'
```
Caps are enforced at the request level. The dashboard shows real-time burn-down and raises alerts at 50%, 80%, and 100% of the cap.
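To reason about where a project stands against its cap, the alert levels can be reproduced locally. This is a sketch under assumptions: it treats an alert as crossed when usage is at or above the threshold, and the function names are hypothetical rather than part of the MemorySync API.

```python
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)  # the 50/80/100% alert levels


def crossed_thresholds(used: int, cap: int) -> list:
    """Return the alert thresholds already reached for this usage."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    ratio = used / cap
    return [t for t in ALERT_THRESHOLDS if ratio >= t]


def remaining_requests(used: int, cap: int) -> int:
    """Requests left before the cap is hit (never negative)."""
    return max(cap - used, 0)


cap = 100_000  # monthly_request_cap from the budget example
print(crossed_thresholds(82_000, cap))  # [0.5, 0.8]
print(remaining_requests(82_000, cap))  # 18000
```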
Tip
Teams that adopt all five quick wins typically see total monthly spend drop 50–60% without a measurable drop in recall quality.