Silent Mode Explained

Silent mode is MemorySync’s quota-breach behavior. When your organization exceeds its plan limit, requests are not rejected. Instead, they return a normal HTTP 200 response — but the underlying operation is silently skipped. This page explains exactly what happens and how to detect it.

The Silent Contract

Silent mode exists because MemorySync is designed to sit in the hot path of LLM pipelines. If a billing quota breach threw a hard error, it could crash your AI agent mid-conversation. Silent mode prevents this by design:

  • Your application’s user never sees a billing error. The LLM continues working — it just doesn’t benefit from memory enrichment until quota is restored.
  • The API contract is preserved: you get a valid 200 OK response with the correct schema, so your integration code needs no quota-specific error handling.
  • Silent skips are not counted: silently degraded requests do not increment your usage counter, so you do not pay for skipped work.

How Silent Skip Works

When a request arrives after your monthly quota has been reached, the atomic quota check refuses to admit it. From there:

  • The counter does not advance. Your usage stays exactly at the plan limit — it never exceeds it.
  • The pipeline is skipped entirely. The request handler short-circuits before any memory work is done.
  • A canonical 200 response is returned. The body is a well-formed success response so your code does not need any error-handling branches.

For add requests this means no memory is stored, embedded, or indexed. For query requests this means no search is run — the response is an empty result set.
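
The admit-or-skip flow described above can be sketched as follows. This is a minimal in-memory illustration, not MemorySync's implementation: `QuotaGate`, `handle_add`, `handle_query`, and the `{"status": "ok"}` success body are hypothetical names chosen for the example. Only the `memories` array in the query response is taken from this page.

```python
import threading
from typing import Dict, List, Tuple


class QuotaGate:
    """Hypothetical stand-in for an atomic per-metric quota check."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        # The check and the increment happen under one lock, so concurrent
        # requests cannot push usage past the plan limit.
        with self._lock:
            if self.used >= self.limit:
                return False  # counter does not advance
            self.used += 1
            return True


def handle_add(gate: QuotaGate, store: List[str], text: str) -> Tuple[int, Dict]:
    # Short-circuit before any memory work when the request is not admitted.
    if gate.try_admit():
        store.append(text)
    # The caller sees the same success body either way (assumed shape).
    return 200, {"status": "ok"}


def handle_query(gate: QuotaGate, store: List[str], needle: str) -> Tuple[int, Dict]:
    if not gate.try_admit():
        return 200, {"memories": []}  # no search is run
    return 200, {"memories": [m for m in store if needle in m]}
```

With a limit of 2, a third `handle_add` call still returns 200, but the store keeps only the first two items and `gate.used` stays at 2.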

Response Shapes During Silent Degradation

During silent degradation, the response body is still a valid, well-formed response. The difference is in what the body contains:

| Operation | HTTP Status | Response behavior |
| --- | --- | --- |
| Add memory | 200 OK | Returns the canonical success body. No memory is stored, embedded, or indexed; the write pipeline is skipped entirely. |
| Query / search | 200 OK | Returns an empty memories array. The retrieval pipeline is skipped; no search is run. |

Watch for this
If your LLM starts producing responses that seem to lack context it used to have, you may be in silent degradation. The API will not surface an error. Check your usage dashboard — metrics sitting at exactly 100% of their plan limit are the signal.

How to Detect Silent Degradation

Silent degradation is invisible to your application code. To detect it, watch the surfaces that are visible:

  • Usage dashboard. Each metric shows a current usage value and a plan limit. When usage equals the limit, every additional request for that metric is silently degraded until the cycle resets or you upgrade.
  • Usage alerts. Configure alert thresholds in the dashboard (for example, at 80% and 100% of plan limit) to receive notifications before and after a metric is exhausted.
  • Trend deltas. The dashboard surfaces a percentage change vs. the previous cycle — a steep upward trend is a leading indicator that you may exhaust quota mid-cycle.

There is no need to build custom log scraping — everything you need to detect and react to silent degradation is exposed in the dashboard and the alerts you configure there.
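
The arithmetic behind the alert thresholds and the trend signal can be sketched as below. The sketch assumes you copy the usage and limit values from the dashboard by hand. This page does not describe a usage API, so none is called, and the function names and the linear projection are illustrative assumptions only.

```python
from typing import List, Sequence


def crossed_thresholds(
    used: int, limit: int, thresholds_pct: Sequence[int] = (80, 100)
) -> List[int]:
    """Return the alert thresholds (in percent) that current usage has reached."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Integer arithmetic avoids floating-point edge cases at exactly 80% or 100%.
    return [p for p in thresholds_pct if used * 100 >= p * limit]


def projected_usage(used: int, days_elapsed: int, days_in_cycle: int) -> float:
    """Linear projection of end-of-cycle usage from the usage so far."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return used / days_elapsed * days_in_cycle
```

For example, 6,000 requests used after 10 days of a 30-day cycle projects to 18,000 by the end of the cycle. On a 10,000-request plan that would mean quota runs out mid-cycle.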

No Automatic Retry

Add requests that are silently skipped are not retried automatically when quota restores:

  • If you send 500 add requests while over quota, none of them are stored or embedded — the write pipeline is skipped for each one.
  • When your cycle resets (or you upgrade), new add requests will be processed normally, but the previously skipped writes are not replayed.
  • To recover skipped writes, you must re-submit them as new add requests after quota is restored. There is no internal “pending queue” that holds skipped writes for later.

This is by design — an automatic replay of potentially millions of deferred writes would create unbounded memory pressure and unpredictable cost spikes the moment quota restores.
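
Because skipped writes are indistinguishable from successful ones in the response, recovering them means keeping your own record of what you sent. The following is one possible client-side sketch, not a MemorySync feature: `AddJournal` and the `send_add` callable are hypothetical, and the replay window is something you would estimate yourself from the dashboard (from when usage hit the limit until quota was restored).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List


@dataclass
class JournaledAdd:
    sent_at: datetime
    payload: dict


@dataclass
class AddJournal:
    # Hypothetical callable that performs the real add request.
    send_add: Callable[[dict], None]
    entries: List[JournaledAdd] = field(default_factory=list)

    def add(self, payload: dict, now: datetime) -> None:
        # Record locally first, then send; the 200 response cannot tell us
        # whether the write was actually processed.
        self.entries.append(JournaledAdd(now, payload))
        self.send_add(payload)

    def replay(self, start: datetime, end: datetime) -> int:
        """Re-submit every add originally sent in [start, end)."""
        window = [e for e in self.entries if start <= e.sent_at < end]
        for entry in window:
            self.send_add(entry.payload)
        return len(window)
```

Replayed writes are ordinary new add requests, so they count toward the restored quota. If the estimated window is too wide, some writes may be submitted twice, so choose the window conservatively.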

Silent Mode vs Rate Limiting

Silent mode and rate limiting are two separate systems. Don’t confuse them:

| Aspect | Rate Limiting | Billing Quota (Silent Mode) |
| --- | --- | --- |
| Scope | Per-IP or per-route | Per-organization, per-metric |
| HTTP response | Always 429 | 200 OK with silently skipped operation |
| Recovery | Wait seconds (has Retry-After) | Wait for cycle reset or upgrade |
| Window | Sliding seconds/minutes | Calendar-month billing cycle |
| Purpose | Abuse prevention | Plan enforcement |

A request can be rate-limited even if you have plenty of billing quota remaining. Conversely, you can exhaust your billing quota while never triggering rate limits (if your request rate is slow but sustained over the month).
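
In client code, the practical difference is that only rate limiting produces a response you can branch on. A minimal sketch of that branch is below. The function name and the one-second fallback are assumptions, and only the delta-seconds form of Retry-After is handled.

```python
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry applies."""
    if status != 429:
        # A 200 may still be a silently skipped operation; nothing in the
        # response distinguishes it, so there is nothing to retry on here.
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    value = normalized.get("retry-after", "").strip()
    if value.isdigit():
        return float(value)
    # Assumed fallback when the header is missing or uses the HTTP-date form.
    return 1.0
```

A 429 with `Retry-After: 3` yields a 3-second wait, while any 200 yields `None`, whether or not the operation was silently skipped.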