
Monitoring & Observability

The MemorySync dashboard gives you per-tenant visibility into request volume, latency, error rate, recall quality, cost, and the health of every background job. The platform handles metric collection internally, so you don’t scrape endpoints, install agents, or manage retention. This page describes the surfaces you use to monitor and operate your tenant.

Observability Surfaces

Three customer-facing surfaces cover day-to-day production monitoring:

Surface	What it answers
Dashboard – Overview	"Is my tenant healthy right now?" High-level cards for request rate, latency p95, error rate, recall quality, cost, and quota usage.
Dashboard – API Logs	"Why did that specific request fail / spike?" Per-request detail with status, latency, request ID, trace ID, and grouped failure clusters.
Dashboard – Alerts	"What changed?" Threshold-based alerts on latency, error rate, recall quality, cost, and dead-letter counts, with delivery to email, Slack, or webhook.

All three surfaces are scoped to your organization and respect your role-based access. There is no internal monitoring endpoint to expose, no agent to install, and no scrape target to maintain.

What Is Measured

Every customer-relevant signal is collected and aggregated automatically. The dashboard surfaces the following categories for each project and time window:

Category	Signals
Traffic	Requests per minute, split by endpoint, status code, and API key. Includes 4xx vs 5xx breakdown and 429 counts.
Latency	Per-endpoint p50, p95, p99, and max. Pipeline stage timing breakdown for recall queries (embedding, retrieval, ranking, reasoning, assembly).
Recall quality	Per-query recall score, rolling-window quality average, and a per-project quality trend line. Drops below threshold are auto-flagged.
Memory lifecycle	Active memory count by tier, ingestion rate, dedup rate, compaction merges, tier transitions, soft-deletes, and retention purges.
Cost	Per-query cost estimate, total spend by day/week/month, and cost trend per project. Outlier queries are flagged for inspection.
Quota usage	Current month’s plan-quota usage with projection to month-end. The Billing dashboard mirrors this with finer-grained breakdowns.
Background health	Per-category dead-letter counters, webhook delivery success rate, integration sync success rate, and last-completed timestamps for retention/tier sweeps.
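
The month-end projection in the Quota usage row can be approximated on your side if you want to sanity-check it. This page does not document how the platform computes its projection, so the sketch below assumes a simple linear extrapolation of month-to-date usage. The function names and the linear model are illustrative assumptions, not the platform's method.

```python
import calendar
from datetime import date


def project_month_end_usage(used_so_far: float, today: date) -> float:
    """Linearly extrapolate month-to-date usage to the end of the month.

    Assumption: usage accrues at a constant daily rate, and the current
    day counts as fully elapsed.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = used_so_far / today.day
    return daily_rate * days_in_month


def projected_quota_fraction(used_so_far: float, plan_quota: float, today: date) -> float:
    """Projected month-end usage as a fraction of the plan quota (1.0 == 100%)."""
    if plan_quota <= 0:
        raise ValueError("plan_quota must be positive")
    return project_month_end_usage(used_so_far, today) / plan_quota
```

A result at or above 0.75, 0.90, or 1.0 would correspond to the default quota alert thresholds listed under Alerts.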

Per-Query Tracing

For every recall query, the platform captures a per-stage timing breakdown so you can see exactly where latency goes. The breakdown is visible from the API Logs page and from the Explainability views:

Stage timings. Time spent in embedding, vector search, ranking, reasoning, dedup, and response assembly. Each stage has a published budget; outliers are highlighted.
Request and trace IDs. Every request returns X-Request-ID and X-Trace-ID headers so you can correlate the entry in your own logs with the platform-side entry.
Degraded-component list. When the recall pipeline runs in degraded mode (a non-critical component was skipped), the trace shows which components were bypassed and why.
Rolling aggregates. Beyond individual traces, the dashboard rolls up p50/p95/p99 per stage across configurable windows so you can spot a stage that has gradually become slower.
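
To make the request and trace IDs useful, record them next to your own log lines. The following is a minimal sketch that works on any mapping of response headers, whichever HTTP client you use. Only the header names X-Request-ID and X-Trace-ID come from this page; the function names, the logger name, and the rule for choosing a log level are illustrative assumptions.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("memorysync.client")  # hypothetical logger name


def correlation_ids(headers: Mapping[str, str]) -> dict:
    """Extract X-Request-ID and X-Trace-ID, matching header names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id: Optional[str] = lowered.get("x-request-id")
    trace_id: Optional[str] = lowered.get("x-trace-id")
    return {"request_id": request_id, "trace_id": trace_id}


def log_response(status: int, headers: Mapping[str, str]) -> dict:
    """Log one line per response with both IDs so it can be matched in API Logs."""
    ids = correlation_ids(headers)
    # Assumption: 5xx responses are logged as errors, everything else as info.
    level = logging.ERROR if status >= 500 else logging.INFO
    logger.log(
        level,
        "memorysync response status=%s request_id=%s trace_id=%s",
        status,
        ids["request_id"],
        ids["trace_id"],
    )
    return ids
```

With both IDs in your logs, you can paste either value into the API Logs page to find the platform-side entry for the same request.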

Alerts

The alerting layer evaluates your tenant’s metrics continuously and notifies you when something crosses a threshold. Alerts ship with sensible defaults and are fully configurable from the dashboard:

Alert	Default	Severity	Trigger
Latency p95	2,000 ms	Warning	Per-endpoint p95 exceeds the threshold for the configured window.
Per-query cost	$0.10	Warning	A single query exceeds the per-query cost budget.
Error rate	10%	Critical	Rolling-window error rate exceeds the threshold (minimum sample size required).
Recall quality	0.30 MRR	Warning	Rolling-window recall quality drops below the threshold.
Quota usage	75% / 90% / 100%	Info → Critical	Plan-quota usage crosses one of the configured thresholds. The 100% alert fires the moment silent degradation begins.
Dead-letter rising	Any growth	Warning	Any background category’s dead-letter counter increases over the previous window.
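
The error-rate trigger combines a rolling window, a threshold, and a minimum sample size. The sketch below shows how such a rule could be evaluated if you want to mirror it in your own tooling. It is not the platform's implementation: the 10% threshold is the default from the table, while the window length, the minimum sample count, and the class name are hypothetical values chosen for illustration.

```python
from collections import deque
from typing import Deque, Tuple


class ErrorRateWindow:
    """Rolling-window error-rate check with a minimum sample size.

    Assumes record() is called with non-decreasing timestamps (in seconds).
    """

    def __init__(self, window_seconds: float = 300.0,
                 threshold: float = 0.10, min_samples: int = 50) -> None:
        self.window_seconds = window_seconds  # hypothetical window length
        self.threshold = threshold            # 10% default from the table
        self.min_samples = min_samples        # hypothetical minimum sample size
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, timestamp: float, is_error: bool) -> None:
        """Add one request outcome and drop outcomes older than the window."""
        self._events.append((timestamp, is_error))
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def error_rate(self) -> float:
        """Fraction of requests in the window that were errors (0.0 if empty)."""
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)

    def should_alert(self) -> bool:
        """True only when there are enough samples and the rate exceeds the threshold."""
        return (len(self._events) >= self.min_samples
                and self.error_rate() > self.threshold)
```

The minimum sample size keeps a quiet tenant from alerting on a single failed request, which would otherwise register as a 100% error rate.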

Recommended baseline

At minimum, enable the latency p95, error rate, and 75% / 90% quota-usage alerts. Together, these three cover the most common production-impacting events (slow responses, failing requests, and quota exhaustion) before users feel them.

Alert Delivery Channels

Alerts can be routed to multiple destinations so they show up in the channels your team already uses:

Email. Alerts are delivered to the email addresses configured under Settings → Alerts. Each alert email includes the metric, current value, threshold, and a link back to the dashboard.
Slack. Connect a Slack workspace once; route different alert categories to different channels (e.g. #oncall-critical for criticals, #oncall-info for warnings).
Webhook. Outgoing webhook deliveries fire when alerts trigger, signed with your webhook secret. Use this to feed alerts into your own paging system or status page.
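
Webhook receivers should verify the signature before acting on an alert. This page does not specify the signing scheme, so the sketch below assumes a hex-encoded HMAC-SHA256 of the raw request body keyed with your webhook secret. Treat the algorithm, the encoding, and the function names as assumptions, and check your webhook settings for the actual scheme and the header that carries the signature.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 over the raw body (assumed signing scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received_signature: str) -> bool:
    """Return True if the received signature matches the one we compute.

    hmac.compare_digest runs in constant time, which avoids leaking how many
    leading characters of the signature were correct.
    """
    expected = sign_payload(secret, body)
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

Verify against the exact bytes you received, not a re-serialized copy of the parsed JSON, because re-serialization can change whitespace or key order and break the match.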

Forwarding to Your Stack

If you already operate a centralized observability stack, the platform forwards the per-tenant signals directly to it, so there is nothing for you to scrape:

Audit & API logs. Outbound SIEM forwarders push every audit and API log entry for your tenant to a configured HTTPS endpoint at a fixed cadence. Delivery health is visible in the dashboard.
Webhook events. Every notable platform event (memory created, DSR completed, alert triggered, background sweep finished) can fire a webhook. Receivers verify the signature with the shared secret.
Reusable correlation IDs. Every API response carries X-Request-ID and X-Trace-ID headers. Pass these through to your downstream logs and you can correlate any user-visible incident with the matching dashboard entry in seconds.

No agent, sidecar, or scrape endpoint is required. The platform pushes data to you, and you pull dashboard views from the UI on demand.
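
Once forwarded entries and your own logs both carry the request ID, correlating them is a keyed join. The sketch below illustrates the idea under stated assumptions: the shape of a forwarded log entry is not documented on this page, so the `request_id` field name and the dictionary layout are hypothetical and should be adapted to what your forwarder actually delivers.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def join_on_request_id(
    local_records: Iterable[Record],
    forwarded_entries: Iterable[Record],
    key: str = "request_id",  # hypothetical field name
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Pair each local log record with the forwarded entry sharing its request ID.

    Returns (matched pairs, local records with no forwarded counterpart).
    """
    # Index forwarded entries once so each lookup is a dictionary access.
    index = {entry[key]: entry for entry in forwarded_entries if key in entry}
    matched: List[Tuple[Record, Record]] = []
    unmatched: List[Record] = []
    for record in local_records:
        entry = index.get(record.get(key))
        if entry is None:
            unmatched.append(record)
        else:
            matched.append((record, entry))
    return matched, unmatched
```

Local records that end up unmatched may simply belong to a delivery that has not arrived yet, so retry the join later before treating them as missing.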
