Monitoring & Observability

The MemorySync dashboard gives you per-tenant visibility into request volume, latency, error rate, recall quality, cost, and the health of every background job. The platform handles metric collection internally — you don’t scrape endpoints, install agents, or manage retention. This page describes the surfaces you actually use to operate against the platform.

Observability Surfaces

Three customer-facing surfaces give you everything you need to operate confidently in production:

Surface | What it answers
Dashboard – Overview | "Is my tenant healthy right now?" High-level cards for request rate, latency p95, error rate, recall quality, cost, and quota usage.
Dashboard – API Logs | "Why did that specific request fail / spike?" Per-request detail with status, latency, request ID, trace ID, and grouped failure clusters.
Dashboard – Alerts | "What changed?" Threshold-based alerts on latency, error rate, recall quality, cost, and dead-letter counts, with delivery to email, Slack, or webhook.

All three surfaces are scoped to your organization and respect your role-based access. There is no internal monitoring endpoint to expose, no agent to install, and no scrape target to maintain.

What Is Measured

Every customer-relevant signal is collected and aggregated automatically. The dashboard surfaces the following categories for each project and time window:

Category | Signals
Traffic | Requests per minute, split by endpoint, status code, and API key. Includes 4xx vs 5xx breakdown and 429 counts.
Latency | Per-endpoint p50, p95, p99, and max. Pipeline stage timing breakdown for recall queries (embedding, retrieval, ranking, reasoning, assembly).
Recall quality | Per-query recall score, rolling-window quality average, and a per-project quality trend line. Drops below threshold are auto-flagged.
Memory lifecycle | Active memory count by tier, ingestion rate, dedup rate, compaction merges, tier transitions, soft-deletes, and retention purges.
Cost | Per-query cost estimate, total spend by day/week/month, and cost trend per project. Outlier queries are flagged for inspection.
Quota usage | Current month’s plan-quota usage with projection to month-end. The Billing dashboard mirrors this with finer-grained breakdowns.
Background health | Per-category dead-letter counters, webhook delivery success rate, integration sync success rate, and last-completed timestamps for retention/tier sweeps.
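
If you want to sanity-check the dashboard's latency cards against timings measured in your own client, the p50/p95/p99/max summary can be reproduced roughly as follows. This is only a sketch: it uses the nearest-rank percentile definition, which is an assumption (the page does not state which method the platform uses), and `percentile` and `latency_summary` are hypothetical helper names.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it. (Assumed method, not
    necessarily the one the platform uses.)"""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize client-side latency samples (milliseconds) using the
    same statistics the Latency category lists."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }
```

Small differences from the dashboard are expected, since client-side timings include network time and the aggregation windows may not line up.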

Per-Query Tracing

For every recall query, the platform captures a per-stage timing breakdown so you can see exactly where latency goes. The breakdown is visible from the API Logs page and from the Explainability views:

  • Stage timings. Time spent in embedding, vector search, ranking, reasoning, dedup, and response assembly. Each stage has a published budget; outliers are highlighted.
  • Request and trace IDs. Every request returns X-Request-ID and X-Trace-ID headers so you can correlate the entry in your own logs with the platform-side entry.
  • Degraded-component list. When the recall pipeline runs in degraded mode (a non-critical component skipped), the trace shows which components were bypassed and why.
  • Rolling aggregates. Beyond individual traces, the dashboard rolls up p50/p95/p99 per stage across configurable windows so you can spot a stage that has slowly gotten heavier.
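
To make the request and trace IDs useful during an incident, it helps to record them next to every call in your own logs. The sketch below shows one way to do that. Only the header names X-Request-ID and X-Trace-ID come from this page; the logger name, the function names, and the log line format are illustrative assumptions.

```python
import logging

# Hypothetical logger name; use whatever your service already uses.
logger = logging.getLogger("memorysync.client")


def correlation_ids(headers):
    """Pull the two correlation headers out of a response-header mapping.
    HTTP header names are case-insensitive, so normalize before lookup."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "trace_id": lowered.get("x-trace-id"),
    }


def log_response(endpoint, status, headers):
    """Write one log line per API call that carries both IDs, so the
    line can be matched to the platform-side entry in API Logs."""
    ids = correlation_ids(headers)
    logger.info(
        "memorysync call endpoint=%s status=%s request_id=%s trace_id=%s",
        endpoint, status, ids["request_id"], ids["trace_id"],
    )
    return ids
```

Call `log_response` with the response headers from whichever HTTP client you use; most clients expose them as a mapping.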

Alerts

The alerting layer evaluates your tenant’s metrics continuously and notifies you when something crosses a threshold. Alerts ship with sensible defaults and are fully configurable from the dashboard:

Alert | Default | Severity | Trigger
Latency p95 | 2,000 ms | Warning | Per-endpoint p95 exceeds the threshold for the configured window.
Per-query cost | $0.10 | Warning | A single query exceeds the per-query cost budget.
Error rate | 10% | Critical | Rolling-window error rate exceeds the threshold (minimum sample size required).
Recall quality | 0.30 MRR | Warning | Rolling-window recall quality drops below the threshold.
Quota usage | 75% / 90% / 100% | Info → Critical | Plan-quota usage crosses one of the configured thresholds. The 100% alert fires the moment silent degradation begins.
Dead-letter rising | Any growth | Warning | Any background category’s dead-letter counter increases over the previous window.

Recommended baseline
At minimum, enable the latency p95, error rate, and 75% / 90% quota-usage alerts. These three catch the vast majority of production-impacting events before users feel them.
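
Before tuning the defaults, it can help to reason about how two of the rules behave on your own numbers. The sketch below models the error-rate rule and the quota-usage tiers. It is not the platform's evaluator: the minimum sample size of 50 is an assumed value (the page only says a minimum is required), and mapping 75% / 90% / 100% to info / warning / critical is one reading of "Info → Critical".

```python
def error_rate_alert(errors, total, threshold=0.10, min_samples=50):
    """Return True when the window's error rate exceeds the threshold.
    Windows with fewer than min_samples requests never fire, so a single
    failure in a quiet window does not page anyone.
    min_samples=50 is an assumption, not a documented default."""
    if total < min_samples:
        return False
    return errors / total > threshold


def quota_severity(used, limit):
    """Return the severity of the highest quota threshold crossed,
    or None if usage is still below 75%.
    The tier-to-severity mapping is an assumption."""
    ratio = used / limit
    for level, severity in ((1.00, "critical"), (0.90, "warning"), (0.75, "info")):
        if ratio >= level:
            return severity
    return None
```

For example, 3 errors in a 10-request window is a 30% error rate but stays silent under the assumed sample floor, while 11 errors in 100 requests fires.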

Alert Delivery Channels

Alerts can be routed to multiple destinations so they show up in the channels your team already uses:

  • Email. Alerts are delivered to the email addresses configured under Settings → Alerts. Each alert email includes the metric, current value, threshold, and a link back to the dashboard.
  • Slack. Connect a Slack workspace once; route different alert categories to different channels (e.g. #oncall-critical for criticals, #oncall-info for warnings).
  • Webhook. Outgoing webhook deliveries fire when alerts trigger, signed with your webhook secret. Use this to feed alerts into your own paging system or status page.
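
A receiver should check the signature before acting on an alert webhook. The sketch below assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body keyed with your webhook secret. That scheme is an assumption, since this page does not specify the algorithm, encoding, or header name, so confirm those details in the webhook reference before relying on it. `sign` and `verify_signature` are hypothetical helper names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 over the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare the received signature against the expected one using a
    constant-time comparison to avoid leaking timing information."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), received.encode())
```

Verify against the exact bytes received on the wire. Re-serializing parsed JSON can change whitespace or key order and make a valid signature fail.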

Forwarding to Your Stack

If you already operate a centralized observability stack, the platform forwards the per-tenant signals directly to it instead of asking you to scrape something:

  • Audit & API logs. Outbound SIEM forwarders push every audit and API log entry for your tenant to a configured HTTPS endpoint at a fixed cadence. Delivery health is visible in the dashboard.
  • Webhook events. Every notable platform event (memory created, DSR completed, alert triggered, background sweep finished) can fire a webhook. Receivers verify the signature with the shared secret.
  • Reusable correlation IDs. Every API response carries X-Request-ID and X-Trace-ID headers. Pass these through to your downstream logs and you can correlate any user-visible incident with the matching dashboard entry in seconds.

No agent, sidecar, or scrape endpoint is required — the platform pushes to you on cadence, and you pull dashboard views from the UI on demand.