Cost Optimization

Most MemorySync spend goes to embeddings and the intelligence pipeline. You can cut the bill by 40–70% by deduplicating on add, lowering the computation tier on cheap calls, narrowing recall filters, and archiving stale data to the cold tier.

Where the money goes

Driver	Typical share	Tunable?
Embeddings on add	35–55%	Yes — dedup, cache, batch
Intelligence (clustering, summaries)	20–35%	Yes — schedule + tier filter
Vector search	5–15%	Yes — k, partition, computation tier
Storage	<5%	Indirect — purge cold tier
Egress / SDK	negligible	No
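
To see how per-driver savings combine into a total, the following is a rough sketch of the arithmetic. The shares are approximate midpoints of the ranges in the table, and the per-driver reductions are hypothetical inputs, not measured results; `blended_saving` is an illustrative helper, not part of any SDK.

```python
# Approximate midpoints of the "Typical share" ranges (illustrative only).
SHARES = {
    "embeddings": 0.45,
    "intelligence": 0.275,
    "vector_search": 0.10,
    "storage": 0.04,
}


def blended_saving(shares, reductions):
    """Return the fraction of the total bill saved.

    `reductions` maps a driver to the fraction of that driver's cost removed.
    Drivers missing from `reductions` are treated as unchanged.
    """
    return sum(share * reductions.get(driver, 0.0) for driver, share in shares.items())


# Hypothetical reductions per driver; substitute your own measurements.
example = blended_saving(
    SHARES,
    {"embeddings": 0.5, "intelligence": 0.6, "vector_search": 0.5},
)
print(f"Estimated saving: {example:.0%}")
```

Because embeddings and intelligence dominate the bill, a reduction there moves the total far more than the same reduction in storage.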

Quick wins

Enable dedup: keep deduplicate: true on add (the default). This cuts embedding cost dramatically on chat workloads, where messages often repeat.
Use the low computation tier: pass computation_tier: "low" for trivial queries. It skips the reranker and halves the cost.
Tighten k: most LLM prompts use 3–5 memories, so don't request 20.
Archive aggressively: set ttl_minutes on chat memories. Cold-tier storage costs one-tenth as much.
Cache top queries: the SDK caches identical query+filter combinations for 60s by default; raise that window for read-heavy dashboards.
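
As a minimal sketch of how the first four quick wins might look in request bodies: the parameter names deduplicate, computation_tier, k, and ttl_minutes come from the list above, while the `text` and `query` fields and both helper functions are assumptions made for illustration. Check the API reference for the real request shape.

```python
def build_add_payload(text, ttl_minutes=None):
    """Body for an add call. The "text" field name is an assumption."""
    payload = {"text": text, "deduplicate": True}
    if ttl_minutes is not None:
        # Short-lived chat memories get a TTL so they can be archived early.
        payload["ttl_minutes"] = ttl_minutes
    return payload


def build_recall_payload(query, k=5, trivial=False):
    """Body for a recall call. The "query" field name is an assumption."""
    payload = {"query": query, "k": k}
    if trivial:
        # Low tier skips the reranker, per the quick-win description.
        payload["computation_tier"] = "low"
    return payload


chat_add = build_add_payload("user prefers dark mode", ttl_minutes=60 * 24)
cheap_recall = build_recall_payload("ui preferences", k=3, trivial=True)
```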

Embedding strategy

Pick a primary provider that matches your latency / cost tradeoff. Local fallback is free but slower.
Bulk imports go through batched embeddings — never call add 10k times sequentially. Use POST /memory/bulk-import.
For low-value text (e.g. machine-generated logs), set importance: 0.0–0.2 and tier: "cold" from the start so it skips the warm path.
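
The batching advice can be sketched as follows. This only prepares batches for POST /memory/bulk-import and sends nothing; the record shape (`text` field), the batch size of 500, and the helper names are assumptions, since the endpoint's body format and limits are not given here.

```python
from itertools import islice


def batched(records, size):
    """Yield lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def as_low_value(text, importance=0.1):
    """Mark a record as low-value so it starts in the cold tier."""
    return {"text": text, "importance": importance, "tier": "cold"}


logs = [f"log line {i}" for i in range(10_000)]
# 10,000 records become 20 bulk requests instead of 10,000 add calls.
batches = list(batched((as_low_value(t) for t in logs), 500))
```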

Tame the intelligence pipeline

The intelligence scheduler reads org_intelligence_settings. Reduce cost by:

Setting cluster_interval_hours = 24 (default 6).
Setting summary_min_cluster_size = 8 so trivial clusters skip summarization.
Restricting clustering to tier IN ('hot', 'warm') — cold-tier facts rarely change.
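
A small sketch of what these three settings change. The first two keys mirror the setting names above; the `cluster_tiers` key is a hypothetical stand-in for the tier restriction, and the minimum cluster size is assumed to be inclusive.

```python
settings = {
    "cluster_interval_hours": 24,
    "summary_min_cluster_size": 8,
    "cluster_tiers": ("hot", "warm"),  # hypothetical key name
}


def runs_per_day(interval_hours):
    """Number of clustering runs per day for a given interval."""
    return 24 / interval_hours


def clusters_to_summarize(cluster_sizes, min_size):
    """Keep only clusters large enough to be worth summarizing."""
    return [size for size in cluster_sizes if size >= min_size]


default_runs = runs_per_day(6)
tuned_runs = runs_per_day(settings["cluster_interval_hours"])
kept = clusters_to_summarize([2, 8, 15, 3], settings["summary_min_cluster_size"])
```

Moving from the default 6-hour interval to 24 hours takes clustering from four runs a day to one.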

Silent mode at the edge

For freemium and trial flows, switch the project to BILLING_QUOTA_MODE=silent. When the budget is exhausted, the API returns 200 with { "status": "ok", "skipped": true } — the SDK degrades silently and the end user is not blocked. Track skips on the usage dashboard and upgrade plans only for real usage.
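
If you call the API directly rather than through the SDK, a skipped response looks like a success, so it is worth checking for explicitly. The sketch below assumes the parsed JSON body is a dict; the `memories` key and both function names are hypothetical.

```python
def was_skipped(response_body):
    """True when the API accepted the call but did no work (silent mode)."""
    return response_body.get("status") == "ok" and response_body.get("skipped") is True


def memories_or_empty(response_body):
    """Return recalled memories, or an empty list when the call was skipped."""
    if was_skipped(response_body):
        return []
    return response_body.get("memories", [])  # "memories" is an assumed key


skipped = {"status": "ok", "skipped": True}
normal = {"status": "ok", "memories": ["likes tea"]}
```

Returning an empty list keeps the prompt-building path identical in both cases, which matches the intent that the end user is never blocked.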

Per-project budgets

```bash
# Set a monthly request cap for one project.
curl -X PUT https://api.memorysync.io/org/billing/budgets \
  -H "Authorization: Bearer USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "project_id": "prj_chatbot", "monthly_request_cap": 100000 }'
```

Caps are enforced at the request level. The dashboard shows real-time burn-down and raises alerts at 50%, 80%, and 100% of the cap.
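
To mirror the alert thresholds in your own monitoring, a minimal sketch follows. It assumes you can read the request count used so far from your own metrics; `crossed_thresholds` is an illustrative helper, not a MemorySync API.

```python
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)


def crossed_thresholds(used, cap):
    """Return every alert threshold that current usage has reached."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    ratio = used / cap
    return [t for t in ALERT_THRESHOLDS if ratio >= t]


alerts = crossed_thresholds(85_000, 100_000)
```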

Tip

Teams that adopt all five quick wins typically see total monthly spend drop 50–60% without a measurable drop in recall quality.
