How It Works

System

MemorySync runs as a single async API server, a pool of background workers, and a scheduler — all sharing one database, one vector index, and one cache. There is no microservice fan-out. Every request enters the same middleware stack, hits the same service layer, and writes through the same session factory. This page lays out the runtime topology and what each piece is responsible for.

The three processes that make up the system

Process	What it runs	Triggered by
API server	Every HTTP route under `/memory/`, `/auth/`, `/org/`, `/admin/`, `/api/v1/`, `/api/v2/`. Async handlers, one event loop per worker.	Inbound HTTP requests.
Worker pool	Every async task: embedding, summarization, intelligence re-evaluation, integration sync, webhook delivery, SIEM forwarding, retention sweeps.	Task enqueue from the API or the scheduler.
Scheduler	A single dedicated process that fires recurring jobs at fixed cadences. Configured with `coalesce=True` and `max_instances=1` so missed runs collapse and overlap is impossible.	Wall clock.
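
The `coalesce=True` and `max_instances=1` flags match options of the same name in APScheduler, although this page does not name the scheduling library. As a rough illustration of what the two settings mean, here is a plain-Python sketch. The names `due_fire_times` and `SingleInstanceJob` are hypothetical and exist only for this example; they are not part of MemorySync.

```python
from datetime import datetime, timedelta


def due_fire_times(last_fire: datetime, now: datetime,
                   interval: timedelta, coalesce: bool) -> list:
    """Return the fire times that are due at `now` for a fixed-interval job."""
    missed = []
    next_fire = last_fire + interval
    while next_fire <= now:
        missed.append(next_fire)
        next_fire += interval
    if coalesce and missed:
        # Collapse every missed run into one run at the latest due time.
        return [missed[-1]]
    return missed


class SingleInstanceJob:
    """Refuses to start a second run while one is active (max_instances=1)."""

    def __init__(self) -> None:
        self.running = 0

    def try_start(self) -> bool:
        if self.running >= 1:
            return False  # skip this firing; the previous run has not finished
        self.running += 1
        return True

    def finish(self) -> None:
        self.running -= 1
```

With a 10-minute cadence and a scheduler that was down for 35 minutes, the coalescing call yields one catch-up run instead of three, and `try_start()` returns `False` for any firing that arrives while a run is still in progress.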

What the API server actually does on every request

1. Inbound request enters the middleware stack (CORS → request id → logging → CSRF → tenant resolver → project resolver → audit → security headers).
2. Authentication dependency validates the credential (Bearer JWT or `X-API-Key` header) and attaches the principal.
3. The route handler resolves the tenant scope and (if required by the route) the project scope.
4. Service layer performs the business work: read or write to the database, read or write to the vector index, read or write the cache.
5. If the operation needs deferred work (embedding, webhook fan-out, summarization, audit forwarding), the service enqueues a task and returns. The HTTP response does not wait.
6. On the way out, the audit middleware records the request fingerprint and the security-headers middleware sets response headers.
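
The enqueue-and-return behaviour in steps 4 and 5 can be sketched with the standard library alone. This is a minimal illustration, not the service's actual code: the in-memory `db` dict, the `asyncio.Queue`, and the function names `add_memory`, `worker`, and `demo_request` are all assumptions made for the example, and webhook fan-out stands in for any deferred task.

```python
import asyncio


async def add_memory(db: dict, queue: asyncio.Queue, memory_id: str, text: str) -> dict:
    """Commit the row, enqueue the deferred work, and return immediately."""
    db[memory_id] = {"text": text, "webhook_sent": False}  # write on the request path
    queue.put_nowait(("webhook", memory_id))               # deferred work for a worker
    return {"id": memory_id, "status": "accepted"}         # response does not wait


async def worker(db: dict, queue: asyncio.Queue) -> None:
    """Background worker: processes queued tasks after the response is sent."""
    while True:
        task, memory_id = await queue.get()
        if task == "webhook":
            db[memory_id]["webhook_sent"] = True  # stand-in for the real delivery
        queue.task_done()


async def demo_request() -> tuple:
    db: dict = {}
    queue: asyncio.Queue = asyncio.Queue()
    response = await add_memory(db, queue, "m1", "hello")
    sent_at_response = db["m1"]["webhook_sent"]  # deferred work has not run yet
    worker_task = asyncio.create_task(worker(db, queue))
    await queue.join()                           # let the worker catch up
    worker_task.cancel()
    await asyncio.gather(worker_task, return_exceptions=True)
    return response, sent_at_response, db["m1"]["webhook_sent"]
```

Running `asyncio.run(demo_request())` shows the response being built while the deferred flag is still unset; the flag flips only once the worker has drained the queue.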

What state lives where

Storage	What it holds	Source of truth?
Primary database	`memories`, `users`, `tenants`, `projects`, `memberships`, `api_keys`, `sessions`, `audit_logs`, `memory_events`, `webhook_deliveries`, `integration_jobs`, billing rows. ~50 tables.	Yes — every other store is derived.
Vector index	One namespace per user (`memorysync-user-{user_id}`). Each entry is a fixed-length float vector plus the metadata needed to filter at query time (`memory_id`, `tier`, `retention_status`, `environment`, `project_id`, `tags`, `source`).	No — rebuildable from `memories.text_ciphertext`.
Cache	Hot lookups (project metadata, tenant principals, recent query results). 5-minute default TTL. Graceful fallback if the cache is unavailable — the API stays up, requests just take a slow path.	No — fully transient.
Object store	Encrypted backups and SIEM dead-letter payloads.	No.
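
The cache row describes a 5-minute TTL and a slow path when the cache is down. One way to picture that is a read-through lookup that treats a cache outage like a miss. The sketch below is an assumption-laden illustration: `TTLCache`, `get_project`, and the `load_from_db` callback are hypothetical names, and the cache is an in-memory stand-in for whatever cache product is actually deployed.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for the cache tier; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.available = True  # flip to False to simulate an outage
        self._data: dict = {}

    def get(self, key: str) -> Optional[dict]:
        if not self.available:
            raise ConnectionError("cache unavailable")
        hit = self._data.get(key)
        if hit is None or hit[1] <= self.clock():
            return None  # missing or expired
        return hit[0]

    def set(self, key: str, value: dict) -> None:
        if not self.available:
            raise ConnectionError("cache unavailable")
        self._data[key] = (value, self.clock() + self.ttl)


def get_project(cache: TTLCache, load_from_db: Callable[[str], dict],
                project_id: str) -> dict:
    """Try the cache first; fall back to the database on a miss or an outage."""
    try:
        cached = cache.get(project_id)
        if cached is not None:
            return cached
    except ConnectionError:
        return load_from_db(project_id)  # slow path; the request still succeeds
    value = load_from_db(project_id)
    try:
        cache.set(project_id, value)
    except ConnectionError:
        pass  # the cache went away mid-request; the database value is still valid
    return value
```

Because the database is the source of truth, the fallback never needs to reconcile anything: a lost cache only costs latency.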

How tenancy is enforced — four layers, not one

Authentication layer resolves the principal and attaches a tenant_id that the request body cannot override.
Project layer enforces X-Project-ID on routes that require it (~40 routers explicitly opt out; everything memory-related opts in).
Service layer calls the tenant-scope resolver on every read and write — the database query is built with the resolved scope, not with the user-supplied body.
Vector layer uses a per-user namespace, so a query physically cannot return another user's vectors even if the metadata filter is wrong.
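
Two of these layers are easy to show in miniature: the tenant scope comes from the authenticated principal rather than the request body, and the vector lookup is confined to the caller's namespace before any metadata filter runs. The sketch below is illustrative only. `Principal`, `resolve_tenant_scope`, `vector_namespace`, and `query_vectors` are hypothetical names, and the index is modelled as a plain dict keyed by namespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant_id: str


def resolve_tenant_scope(principal: Principal, body: dict) -> str:
    """The scope always comes from the principal; body['tenant_id'] is ignored."""
    return principal.tenant_id


def vector_namespace(user_id: str) -> str:
    """Per-user namespace, following the pattern given in the storage table."""
    return f"memorysync-user-{user_id}"


def query_vectors(index: dict, principal: Principal, metadata_filter: dict) -> list:
    """Search only the caller's namespace; the filter narrows within it."""
    entries = index.get(vector_namespace(principal.user_id), [])
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in metadata_filter.items())
    ]
```

Even with an empty filter, `query_vectors` can only see entries stored under the caller's own namespace, which is the property the vector layer relies on.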

What happens when the API server boots


1. API server initialisation begins
2. database schema initialised
3. embedding manager warmed (cache + client primed, 10s timeout per service)
4. vector store handle opened
5. webhook worker started as a background async task
6. scheduler started — recurring jobs registered
7. router stack mounted (~40 routers, project-resolution flags applied)
8. middleware stack composed (innermost → outermost)
9. server begins accepting requests

   on shutdown:
   1. webhook worker drains in-flight deliveries (30s grace)
   2. scheduler stops new firings
   3. database engine disposed
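
The pairing of boot step 5 with shutdown step 1 (start the webhook worker, then drain it within a grace period) can be sketched as an async context manager. This is a minimal sketch under stated assumptions: `webhook_worker`, `lifespan`, and `demo_shutdown` are hypothetical names, the queue is an in-process `asyncio.Queue`, and the remaining boot and shutdown steps are omitted.

```python
import asyncio
from contextlib import asynccontextmanager


async def webhook_worker(queue: asyncio.Queue, delivered: list) -> None:
    """Stand-in webhook worker: delivers queued payloads until cancelled."""
    while True:
        payload = await queue.get()
        delivered.append(payload)  # stand-in for the real HTTP delivery
        queue.task_done()


@asynccontextmanager
async def lifespan(queue: asyncio.Queue, delivered: list, grace_seconds: float = 30.0):
    """Start background work before serving and drain it on the way down."""
    worker = asyncio.create_task(webhook_worker(queue, delivered))  # boot step 5
    try:
        yield  # the server accepts requests while the context is open
    finally:
        try:
            # Shutdown step 1: let queued deliveries finish, up to the grace period.
            await asyncio.wait_for(queue.join(), timeout=grace_seconds)
        except asyncio.TimeoutError:
            pass  # grace period expired; stop anyway
        worker.cancel()
        await asyncio.gather(worker, return_exceptions=True)


async def demo_shutdown() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    delivered: list = []
    async with lifespan(queue, delivered, grace_seconds=1.0):
        queue.put_nowait("event-1")
        queue.put_nowait("event-2")
    return delivered
```

In `demo_shutdown`, both events are queued just before the context closes, and the drain step still delivers them before the worker is cancelled.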

What this design buys you

One deployment — the API server, workers, and scheduler share a single image. No coordinated multi-service releases.
Strong consistency on the write path — a memory either commits to the database or the request fails; embedding happens before the response returns, so the next query will find it.
Graceful degradation — cache failure, vector-index failure, and embedding-service failure each have a documented fallback. The database is the only hard dependency.
Auditability — every mutating request writes one audit row, and every audit row can be forwarded to a customer-side SIEM.

What this design does not give you (read this before you scale)

It is not a horizontally sharded database. Past the per-region capacity limit, you partition by `tenant_id`; that is operations work, not a config flag.
It is not multi-region active-active. The vector index and database are regional. Cross-region replication is async.
It is not a streaming system. Memory ingestion is request-scoped; if you need millions of writes per second, batch them through the integration sync flow, not POST /memory/add.
