High Availability
MemorySync is engineered to keep serving requests even when individual components fail. The platform combines multi-region capacity, automatic failover, per-tenant isolation, fail-open reads, and atomic writes so that your application keeps running through dependency outages, deploys, and partial failures — with no operational action required from your team.
HA Principles
Every layer of the platform follows three guiding principles:
- No single point of failure. The API tier, recall pipeline, background workers, and storage tiers all run with redundant capacity. Loss of any single instance is invisible to your application.
- Fail-open for reads. When a non-critical component degrades (caching, reasoning, explainability), the recall pipeline returns a result anyway and tags the response as degraded. Memory queries always produce a response.
- Fail-closed for writes. Memory add, update, and delete operations either complete fully or roll back atomically. Partial writes are never committed and never become visible to downstream queries.
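The fail-closed write rule can be illustrated with a minimal Python sketch. It assumes a toy in-memory store; `MemoryStore`, `apply_batch`, and the `(op, key, value)` tuple format are hypothetical names chosen for illustration and are not part of the MemorySync API. Changes are staged on a copy and become visible only through one final swap, so a failure partway through leaves the committed state untouched.

```python
class MemoryStore:
    """Toy in-memory store showing all-or-nothing writes (illustrative only)."""

    def __init__(self):
        self._committed = {}  # the only state visible to queries

    def apply_batch(self, operations):
        """Apply (op, key, value) tuples atomically; any error leaves state unchanged."""
        staged = dict(self._committed)  # work on a copy, never on visible state
        for op, key, value in operations:
            if op == "add":
                if key in staged:
                    raise KeyError(f"memory {key!r} already exists")
                staged[key] = value
            elif op == "update":
                if key not in staged:
                    raise KeyError(f"memory {key!r} not found")
                staged[key] = value
            elif op == "delete":
                if key not in staged:
                    raise KeyError(f"memory {key!r} not found")
                del staged[key]
            else:
                raise ValueError(f"unknown operation {op!r}")
        self._committed = staged  # single swap: this is the commit point

    def query(self, key):
        """Read from committed state only; staged changes are never visible."""
        return self._committed.get(key)
```

If the second operation in a batch fails, the first one is discarded along with it: a query issued afterwards still sees the value from before the batch started.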
Redundant Capacity
Every customer-facing service runs with built-in redundancy. You don’t configure this — it’s the default for every plan:
| Layer | Redundancy guarantee |
|---|---|
| API tier | Multiple stateless instances behind a load balancer. Loss of any single instance does not interrupt traffic; new instances are added automatically as load grows. |
| Recall pipeline | Embedding, ranking, and synthesis run across redundant compute. Outage of one upstream model provider triggers automatic fallback to a secondary provider. |
| Storage | Memory content, vectors, audit log, and tier archives are persisted with synchronous replication. A single storage-node failure does not lose data and does not require recovery action. |
| Background workers | Tier transitions, retention sweeps, compaction, and re-evaluation each run from a redundant worker pool. A worker crash mid-job is auto-recovered — the job is reclaimed and resumed by a healthy worker on the next sweep. |
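One common way to get the worker-crash recovery described in the table is a lease with checkpoints. The sketch below shows that general technique under stated assumptions; it is not a description of the platform's actual scheduler. `Job`, `claim`, `run_steps`, `sweep`, and the 30-second lease are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

LEASE_SECONDS = 30.0  # assumed lease length, for illustration only


@dataclass
class Job:
    job_id: str
    total_steps: int
    checkpoint: int = 0              # number of steps already completed
    owner: Optional[str] = None      # worker currently holding the lease
    lease_expires_at: float = 0.0


def claim(job: Job, worker: str, now: float) -> None:
    """Give `worker` ownership of the job for one lease period."""
    job.owner = worker
    job.lease_expires_at = now + LEASE_SECONDS


def run_steps(job: Job, steps: int, now: float) -> None:
    """Advance the checkpoint by up to `steps` and renew the lease."""
    job.checkpoint = min(job.total_steps, job.checkpoint + steps)
    job.lease_expires_at = now + LEASE_SECONDS


def sweep(jobs: List[Job], worker: str, now: float) -> List[str]:
    """Reclaim unfinished jobs whose previous owner let the lease expire."""
    reclaimed = []
    for job in jobs:
        unfinished = job.checkpoint < job.total_steps
        abandoned = job.owner not in (None, worker) and now >= job.lease_expires_at
        if unfinished and abandoned:
            claim(job, worker, now)  # resume from job.checkpoint, not from zero
            reclaimed.append(job.job_id)
    return reclaimed
```

A crashed worker stops renewing its lease, so the next sweep after expiry hands the job to a healthy worker, which continues from the stored checkpoint.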
Health & Status
The platform continuously monitors itself. Two surfaces are visible to you:
- Public health endpoint. `GET /health` returns `"status": "ok"` when all platform dependencies are reachable, or `"status": "degraded"` when one or more non-critical features (e.g. caching) are temporarily affected. The endpoint is unauthenticated and rate-limited, which makes it suitable as a liveness probe in your own client-side health dashboards.
- Public status page. Major incidents, maintenance windows, and per-region availability are published to the public status page. Subscribe via email or RSS to be notified before your traffic is impacted.
When the platform reports degraded, your write operations and recall queries continue to work — the degraded flag means an auxiliary feature (such as the cache layer or explainability capture) is temporarily impaired, not that the memory API is offline.
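A client-side dashboard might interpret the health endpoint roughly as follows. This is a sketch: only the `status` field and its two values come from the description above. The assumption that the body is a JSON object, the function name `classify_health`, and the returned labels are illustrative.

```python
import json


def classify_health(body: str) -> str:
    """Map a /health response body to a coarse client-side state."""
    try:
        status = json.loads(body).get("status")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object: make no claim either way.
        return "unknown"
    if status == "ok":
        return "healthy"
    if status == "degraded":
        # Writes and recall still work; only auxiliary features are impaired.
        return "serving-with-reduced-features"
    return "unknown"
```

The point of the mapping is that `degraded` is not treated as an outage: a dashboard built on it would show a warning rather than mark the memory API as down.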
Per-Tenant Failure Isolation
When an upstream dependency starts failing, the failure must not cascade into other tenants’ traffic. The platform enforces this with per-tenant failure isolation:
- Per-organization circuit isolation. Each organization has independent circuit state for upstream model providers. If your account hits a string of provider errors, the circuit opens for your traffic only — every other tenant on the platform continues operating normally.
- Probing recovery. An open circuit stays open for a cooldown period. Once the cooldown passes, the platform sends a small number of probe calls. If they succeed, traffic is restored automatically; if they fail, the cooldown is extended.
- Component-level isolation. Embedding, ranking, reasoning, explainability, and the cache each have independent failure tracking. A failure in one component never disables the others — the pipeline routes around the impaired component using fallbacks.
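The isolation rules above can be sketched as a circuit breaker keyed by organization and component. This is an illustrative model, not the platform's implementation: the failure threshold, the base cooldown, the doubling on a failed probe, and every name in the code are assumptions. For brevity the sketch does not cap the number of probe calls.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3   # assumed: consecutive failures before the circuit opens
BASE_COOLDOWN = 10.0    # assumed: seconds an open circuit waits before probing


@dataclass
class Circuit:
    state: str = "closed"        # "closed", "open", or "probing"
    failures: int = 0
    cooldown: float = BASE_COOLDOWN
    open_until: float = 0.0


class CircuitRegistry:
    """Independent circuit state per (organization, component) pair."""

    def __init__(self):
        self._circuits = {}

    def _get(self, org_id: str, component: str) -> Circuit:
        return self._circuits.setdefault((org_id, component), Circuit())

    def allow(self, org_id: str, component: str, now: float) -> bool:
        """Return True if a call may go to the upstream component."""
        c = self._get(org_id, component)
        if c.state == "open" and now >= c.open_until:
            c.state = "probing"  # cooldown elapsed: let probe calls through
        return c.state != "open"

    def record(self, org_id: str, component: str, success: bool, now: float) -> None:
        """Record the outcome of a call and update that circuit only."""
        c = self._get(org_id, component)
        if success:
            c.state, c.failures, c.cooldown = "closed", 0, BASE_COOLDOWN
            return
        if c.state == "probing":
            c.cooldown *= 2      # failed probe: extend the cooldown
            c.state, c.open_until = "open", now + c.cooldown
            return
        c.failures += 1
        if c.failures >= FAILURE_THRESHOLD:
            c.state, c.open_until = "open", now + c.cooldown
```

Because the dictionary key includes both the organization and the component, three embedding failures for one organization affect neither another organization's embedding calls nor the same organization's ranking calls.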
Graceful Degradation
When a non-critical component is unavailable, the recall pipeline degrades gracefully rather than failing the request:
1. Primary path attempted. The pipeline first runs every stage in order (embedding → vector search → reasoning → ranking → explainability capture).
2. Component skip. If a non-critical component is unavailable (cache miss, reasoning impaired, explainability capture queue full), it is skipped instead of retried in the request path.
3. Fallback execution. A cheaper alternative runs in place of the impaired component (semantic ranking without reasoning, raw scoring without confidence calibration).
4. Degraded tagging. The response includes `degraded: true` and a `degraded_components` array listing which auxiliary features were unavailable, so your application can react accordingly.
A degraded response is still a usable response. Your callers receive a valid memory list with the same shape as a fully-healthy response — only auxiliary metadata fields may be reduced or omitted.
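The skip, fallback, and tagging steps can be modeled in a few lines. The sketch below is a simplified illustration, not the platform's pipeline: the stage callables, the `memories` field, the `score` key, and the component labels `"reasoning"` and `"explainability"` are assumptions. Only `degraded` and `degraded_components` come from the description above.

```python
def semantic_rank(memories):
    """Cheaper fallback: order by raw similarity score, no reasoning step."""
    return sorted(memories, key=lambda m: m["score"], reverse=True)


def recall(query, search, reason, explain):
    """Run the recall stages, skipping non-critical ones that fail."""
    memories = search(query)  # critical stage: errors propagate to the caller
    degraded_components = []

    try:
        memories = reason(query, memories)
    except Exception:
        degraded_components.append("reasoning")
        memories = semantic_rank(memories)  # fallback instead of a retry

    try:
        explain(query, memories)
    except Exception:
        degraded_components.append("explainability")  # skipped, no fallback

    response = {"memories": memories}  # same shape whether healthy or degraded
    if degraded_components:
        response["degraded"] = True
        response["degraded_components"] = degraded_components
    return response
```

Nothing in the request path retries a failed auxiliary stage, so the cost of a failure is bounded by the cost of its fallback.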
Zero-Downtime Deploys
Platform updates are rolled out continuously without scheduled maintenance windows. Customer traffic is never interrupted:
- Rolling rollout. New versions are introduced one instance at a time. Traffic is shifted to healthy instances throughout, so there is never a moment where the API is fully unavailable.
- In-flight request drain. Before an instance is replaced, it stops accepting new connections and waits for in-flight requests to complete. No request is killed mid-flight.
- Background-job hand-off. Background jobs in progress on a draining instance are checkpointed; another worker picks them up. Tier transitions, retention sweeps, and DSR jobs survive deploys without losing progress.
- Automatic rollback. If a new version begins increasing the platform-wide error rate or latency p95, the rollout is paused and reverted automatically before more traffic is shifted.
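The interaction between rolling rollout and automatic rollback can be sketched as a loop that upgrades one instance at a time and reverts when a health signal crosses a threshold. This is a deliberately small model: it checks only an error rate (the text above also mentions latency p95), and `rolling_rollout`, `observe_error_rate`, and the 1% threshold are hypothetical.

```python
from typing import Callable, Dict


def rolling_rollout(
    fleet: Dict[str, str],
    new_version: str,
    observe_error_rate: Callable[[Dict[str, str]], float],
    max_error_rate: float = 0.01,  # assumed threshold, for illustration
) -> str:
    """Upgrade instances one by one; revert everything if errors rise."""
    previous = dict(fleet)  # snapshot of instance -> version for rollback
    for instance in list(fleet):
        fleet[instance] = new_version  # replace a single instance
        if observe_error_rate(fleet) > max_error_rate:
            fleet.clear()
            fleet.update(previous)  # restore the snapshot
            return "rolled_back"
    return "completed"
```

Checking the signal after each instance means a bad version is caught while most of the fleet still runs the previous one.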
What Your Application Sees
From the perspective of your application, the HA architecture is invisible — which is the goal. The properties you can rely on:
- No retry handling for HA events. Instance restarts, deploys, and worker crashes are handled inside the platform. Your code never needs to retry due to a platform-internal event.
- Stable response shape. Whether the pipeline is fully healthy or degraded, the JSON shape of `POST /memory/query` and `POST /memory/add` responses is identical. The optional `degraded` flag is the only signal that something auxiliary was bypassed.
- Consistent latency budget. Failover and circuit-breaker activations happen below the request latency budget. You never see a several-second timeout because of an internal failure; the platform fails fast and routes around it.
- Atomic writes. A successful `POST /memory/add` response is a durable commitment. The memory is replicated to all storage replicas before the response is returned.
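On the client side, reacting to the optional `degraded` flag can be as small as the sketch below. The flag and the `degraded_components` array are described above; the `memories` field name, the `"explainability"` component label, and the function `handle_query_response` are assumptions made for illustration.

```python
def handle_query_response(response: dict) -> dict:
    """Decide what the application can show, given a recall response."""
    degraded = response.get("degraded", False) is True
    missing = response.get("degraded_components", []) if degraded else []
    return {
        # The memory list is usable in both the healthy and degraded case.
        "memories": response.get("memories", []),
        # Hide explanation UI only when that auxiliary feature was bypassed.
        "show_explanations": "explainability" not in missing,
        "degraded_components": list(missing),
    }
```

Because the flag is optional, the code treats its absence the same as `false` and never branches on the overall response shape.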