Monitoring & Observability
The MemorySync dashboard gives you per-tenant visibility into request volume, latency, error rate, recall quality, cost, and the health of every background job. The platform handles metric collection internally, so you don't scrape endpoints, install agents, or manage retention. This page describes the surfaces you use to monitor and operate your tenant.
Observability Surfaces
Three customer-facing surfaces cover day-to-day production operations:
| Surface | What it answers |
|---|---|
| Dashboard – Overview | "Is my tenant healthy right now?" High-level cards for request rate, latency p95, error rate, recall quality, cost, and quota usage. |
| Dashboard – API Logs | "Why did that specific request fail / spike?" Per-request detail with status, latency, request ID, trace ID, and grouped failure clusters. |
| Dashboard – Alerts | "What changed?" Threshold-based alerts on latency, error rate, recall quality, cost, and dead-letter counts, with delivery to email, Slack, or webhook. |
All three surfaces are scoped to your organization and respect your role-based access. There is no internal monitoring endpoint to expose, no agent to install, and no scrape target to maintain.
What Is Measured
Every customer-relevant signal is collected and aggregated automatically. The dashboard surfaces the following categories for each project and time window:
| Category | Signals |
|---|---|
| Traffic | Requests per minute, split by endpoint, status code, and API key. Includes 4xx vs 5xx breakdown and 429 counts. |
| Latency | Per-endpoint p50, p95, p99, and max. Pipeline stage timing breakdown for recall queries (embedding, retrieval, ranking, reasoning, assembly). |
| Recall quality | Per-query recall score, rolling-window quality average, and a per-project quality trend line. Drops below the configured threshold are flagged automatically. |
| Memory lifecycle | Active memory count by tier, ingestion rate, dedup rate, compaction merges, tier transitions, soft-deletes, and retention purges. |
| Cost | Per-query cost estimate, total spend by day/week/month, and cost trend per project. Outlier queries are flagged for inspection. |
| Quota usage | Current month’s plan-quota usage with projection to month-end. The Billing dashboard mirrors this with finer-grained breakdowns. |
| Background health | Per-category dead-letter counters, webhook delivery success rate, integration sync success rate, and last-completed timestamps for retention/tier sweeps. |
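The quota row mentions a projection to month-end. The platform's actual projection method is not described here, so the following is only a minimal sketch that assumes a simple linear extrapolation of the average daily usage so far. The function names `project_month_end_usage` and `crossed_thresholds` are hypothetical; the 75% / 90% / 100% thresholds are the documented alert defaults.

```python
import calendar
from datetime import date


def project_month_end_usage(used: float, today: date) -> float:
    """Linear extrapolation (an assumption): average daily usage so far continues."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return used / today.day * days_in_month


def crossed_thresholds(used: float, quota: float, thresholds=(0.75, 0.90, 1.00)):
    """Return the quota-usage thresholds that current usage has reached."""
    return [t for t in thresholds if used / quota >= t]


# Example: 300 units used by June 10 of a 30-day month.
projected = project_month_end_usage(300, date(2024, 6, 10))
reached = crossed_thresholds(920, 1000)
```

A linear projection is easy to reason about but overreacts early in the month, when a single busy day dominates the average.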
Per-Query Tracing
For every recall query, the platform captures a per-stage timing breakdown so you can see exactly where latency goes. The breakdown is visible from the API Logs page and from the Explainability views:
- Stage timings. Time spent in embedding, vector search, ranking, reasoning, dedup, and response assembly. Each stage has a published budget; outliers are highlighted.
- Request and trace IDs. Every request returns `X-Request-ID` and `X-Trace-ID` headers so you can correlate the entry in your own logs with the platform-side entry.
- Degraded-component list. When the recall pipeline runs in degraded mode (a non-critical component skipped), the trace shows which components were bypassed and why.
- Rolling aggregates. Beyond individual traces, the dashboard rolls up p50/p95/p99 per stage across configurable windows so you can spot a stage that has slowly gotten heavier.
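To make the rolling aggregates concrete, here is a minimal sketch of how per-stage p50/p95/p99 values could be computed from a set of traces. It assumes each trace is a plain mapping of stage name to milliseconds and uses the nearest-rank percentile definition; the dashboard's actual aggregation method and the helper names below are not documented here and are illustrative only.

```python
from collections import defaultdict


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1-100."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # ceil(pct * n / 100)
    return sorted_values[rank - 1]


def rollup_stage_timings(traces, percentiles=(50, 95, 99)):
    """Aggregate per-stage timings (ms) across traces into percentile summaries."""
    by_stage = defaultdict(list)
    for trace in traces:
        for stage, ms in trace.items():
            by_stage[stage].append(ms)
    return {
        stage: {f"p{p}": nearest_rank(sorted(values), p) for p in percentiles}
        for stage, values in by_stage.items()
    }


# Ten hypothetical traces: embedding grows from 10 ms to 100 ms, ranking stays flat.
traces = [{"embedding": 10 * i, "ranking": 5} for i in range(1, 11)]
summary = rollup_stage_timings(traces)
```

Comparing the same stage's p95 across two windows is what reveals a stage that has slowly gotten heavier, even when no single trace looks like an outlier.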
Alerts
The alerting layer evaluates your tenant’s metrics continuously and notifies you when something crosses a threshold. Alerts ship with sensible defaults and are fully configurable from the dashboard:
| Alert | Default | Severity | Trigger |
|---|---|---|---|
| Latency p95 | 2,000 ms | Warning | Per-endpoint p95 exceeds the threshold for the configured window. |
| Per-query cost | $0.10 | Warning | A single query exceeds the per-query cost budget. |
| Error rate | 10% | Critical | Rolling-window error rate exceeds the threshold (minimum sample size required). |
| Recall quality | 0.30 MRR | Warning | Rolling-window recall quality drops below the threshold. |
| Quota usage | 75% / 90% / 100% | Info → Critical | Plan-quota usage crosses one of the configured thresholds. The 100% alert fires the moment silent degradation begins. |
| Dead-letter rising | Any growth | Warning | Any background category’s dead-letter counter increases over the previous window. |
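The error-rate rule combines a threshold with a minimum sample size, so a handful of failures in a quiet window does not page anyone. The sketch below illustrates that logic only. It assumes that 5xx responses are what count as errors and uses a hypothetical `min_samples` of 50; neither detail is specified on this page. The 10% threshold is the documented default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRateRule:
    threshold: float = 0.10  # documented default: 10%
    min_samples: int = 50    # hypothetical value; the real minimum is not documented here


def error_rate_alert(status_codes, rule=ErrorRateRule()):
    """Return True when the window's error rate exceeds the threshold.

    Assumption: only 5xx responses count as errors.
    """
    total = len(status_codes)
    if total < rule.min_samples:
        return False  # too few requests to judge the window
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / total > rule.threshold


quiet_window = [500] * 5                 # 100% errors, but only 5 requests
busy_window = [200] * 88 + [503] * 12    # 12% errors over 100 requests
```

Here `error_rate_alert(quiet_window)` stays silent because the sample is too small, while `error_rate_alert(busy_window)` fires.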
Alert Delivery Channels
Alerts can be routed to multiple destinations so they show up in the channels your team already uses:
- Email. Alerts are delivered to the email addresses configured under Settings → Alerts. Each alert email includes the metric, current value, threshold, and a link back to the dashboard.
- Slack. Connect a Slack workspace once; route different alert categories to different channels (e.g. `#oncall-critical` for criticals, `#oncall-info` for warnings).
- Webhook. Outgoing webhook deliveries fire when alerts trigger, signed with your webhook secret. Use this to feed alerts into your own paging system or status page.
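A receiver for alert webhooks should check the signature before acting on the payload. This page does not specify the signing scheme, so the sketch below assumes a hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret. The function name `verify_signature` is hypothetical, and the header that carries the signature is not named here; check the webhook reference for both.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Check a webhook signature, assuming hex-encoded HMAC-SHA256 over the raw body."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


# Hypothetical delivery: in practice the signature arrives in a request header.
secret = b"example-webhook-secret"
body = b'{"alert": "latency_p95", "value_ms": 2310}'
good_sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the signature.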
Forwarding to Your Stack
If you already operate a centralized observability stack, the platform forwards your per-tenant signals directly to it, so there is nothing for you to scrape:
- Audit & API logs. Outbound SIEM forwarders push every audit and API log entry for your tenant to a configured HTTPS endpoint at a fixed cadence. Delivery health is visible in the dashboard.
- Webhook events. Every notable platform event (memory created, DSR completed, alert triggered, background sweep finished) can fire a webhook. Receivers verify the signature with the shared secret.
- Reusable correlation IDs. Every API response carries `X-Request-ID` and `X-Trace-ID` headers. Pass these through to your downstream logs and you can correlate any user-visible incident with the matching dashboard entry in seconds.
No agent, sidecar, or scrape endpoint is required: the platform pushes logs and events to you on a fixed cadence, and you open dashboard views on demand.
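Passing the correlation headers through to your own logs can be as simple as the following sketch. It works on any mapping of response headers, so it is independent of the HTTP client you use. The helper names and the logger name are illustrative assumptions; only the `X-Request-ID` and `X-Trace-ID` header names come from this page.

```python
import logging

logger = logging.getLogger("memorysync.client")  # hypothetical logger name


def correlation_fields(headers):
    """Pull the platform's correlation headers out of a response-header mapping."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "trace_id": lowered.get("x-trace-id"),
    }


def log_failure(headers, status):
    """Log a failed call together with the IDs needed to find it in API Logs."""
    ids = correlation_fields(headers)
    logger.warning(
        "recall request failed status=%s request_id=%s trace_id=%s",
        status, ids["request_id"], ids["trace_id"],
    )
    return ids


ids = log_failure({"X-Request-ID": "req-1", "x-trace-id": "tr-9"}, 502)
```

With both IDs in your log line, searching the API Logs page for either value should lead to the matching platform-side entry.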