
Caching Layers

The cache is an optional accelerant — the API works correctly without it, just slower. This page describes what gets cached, how long, what happens when the cache is unavailable, and the rules you must follow before caching anything new.

The rule, before anything else

Cache only what you can recompute

Nothing in the cache is a source of truth. Every cache key has a loader that reads from the primary database. The cache can be wiped at any moment without data loss — only latency changes.

What actually gets cached

Key shape	What it stores	Default TTL
`cache:project:{project_id}`	Project row + computed counters used on the dashboard.	5 minutes
`cache:principal:{credential_hash}`	Resolved auth principal — tenant, role, project scope. Lets the auth dependency skip the database on hot routes.	1 minute
`cache:tenant:{tenant_id}`	Tenant row + plan + quota window.	5 minutes
`cache:embedding:{sha256(text)}`	Vector for an exact-text input. Stops repeated `POST /memory/add` with the same string from re-spending tokens.	Long (effectively until evicted)
`cache:query:{tenant}:{user}:{hash(query+filters)}`	Top-K result for an identical query. Bypassed when `tier` or `session_id` filters change.	5 minutes
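
To make the key shapes in the table concrete, here is a minimal sketch of how the embedding and query keys could be built. The helper names `embedding_key` and `query_key` are hypothetical, and the choice of SHA-256 over a canonical JSON encoding for `hash(query+filters)` is an assumption; the page does not say which hash or serialization the service uses.

```python
import hashlib
import json


def embedding_key(text: str) -> str:
    """Key for an exact-text embedding: identical strings map to the same key."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"cache:embedding:{digest}"


def query_key(tenant_id: str, user_id: str, query: str, filters: dict) -> str:
    """Key for a Top-K query result, scoped to tenant and user.

    Assumption: the query and filters are hashed together as canonical JSON
    (sorted keys) so that filter ordering does not change the key.
    """
    payload = json.dumps(
        {"query": query, "filters": filters},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"cache:query:{tenant_id}:{user_id}:{digest}"
```

Under this scheme, changing any filter value (for example `tier` or `session_id`) produces a different key, so the earlier cached result is simply not found.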

The read-through pattern, in one paragraph

Every cached read goes through `cache.get_or_load(key, loader)`. The cache is checked first; on a miss, the loader runs against the primary database and the result is written back. Handler code never calls a manual set, which prevents two callers from writing different values for the same key.
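
The read-through flow can be sketched with an in-memory stand-in. `ReadThroughCache` is a hypothetical class written for illustration, not the service's actual client; the `ttl_seconds` parameter and the injectable `clock` are assumptions added so the example is self-contained.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """In-memory stand-in for the real cache client (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # key -> (expiry timestamp, cached value)
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get_or_load(
        self,
        key: str,
        loader: Callable[[], Any],
        ttl_seconds: float = 300.0,  # assumed default; the table lists 5 minutes for most keys
    ) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the loader is not called
        value = loader()  # miss or expired: read from the primary database
        self._store[key] = (now + ttl_seconds, value)
        return value
```

Because the only write path is inside `get_or_load`, whatever lands in the cache is always the loader's own output.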

How invalidation works

Explicit invalidation on writes: every mutating service (create / update / delete on Project, Tenant, Memory) calls `cache.delete(key)` for the rows it touched. The next read repopulates the key.
TTL backstop: every key carries a TTL, so if a writer forgets to invalidate, a stale read survives at most one TTL window.
Version bump on schema change: cache keys are prefixed with a build version, so a deploy that changes a row shape invalidates the entire key space without a manual flush.
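
A minimal sketch of the first and third mechanisms, using plain dicts in place of the database and the cache. `BUILD_VERSION`, `versioned_key`, and `ProjectService` are hypothetical names, and placing the version ahead of the `cache:` prefix is an assumption; the TTL backstop is left out here to keep the example short.

```python
from typing import Dict

BUILD_VERSION = "build-42"  # hypothetical; would come from the deploy pipeline


def versioned_key(key: str, version: str = BUILD_VERSION) -> str:
    """Prefix a key with the build version so a new build never reads old entries."""
    return f"{version}:{key}"


class ProjectService:
    """Toy service: `db` and `cache` are plain dicts standing in for real stores."""

    def __init__(self, db: Dict[str, dict], cache: Dict[str, dict]) -> None:
        self._db = db
        self._cache = cache

    def get(self, project_id: str) -> dict:
        key = versioned_key(f"cache:project:{project_id}")
        if key not in self._cache:
            # Miss: copy the row from the primary store into the cache.
            self._cache[key] = dict(self._db[project_id])
        return self._cache[key]

    def rename(self, project_id: str, name: str) -> None:
        # Write to the source of truth first, then drop the cached copy.
        self._db[project_id]["name"] = name
        self._cache.pop(versioned_key(f"cache:project:{project_id}"), None)
```

Deleting rather than overwriting the cached entry keeps the loader as the only writer, consistent with the read-through rule.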

The failure mode you should rely on

1. Cache is unreachable (network, restart, transient timeout).
2. The client's socket timeout fires within 0.5 s.
3. The handler logs a degraded-mode line (`cache.unavailable`) and falls through to the loader.
4. Subsequent requests bypass the cache entirely until a periodic reconnect succeeds. The API stays up; latency rises while the cache is out.
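
The sequence above can be sketched as a thin wrapper around a cache backend. Everything here is an assumption made for illustration: `DegradingCache`, `CacheUnavailable`, the backend's `get` returning a `(hit, value)` pair, and the `retry_interval` default. The sketch also swaps the periodic reconnect for a simpler retry-after deadline checked on the request path, and it leaves the 0.5 s socket timeout to the backend client.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("cache")


class CacheUnavailable(Exception):
    """Raised by the (hypothetical) backend on timeout or connection failure."""


class DegradingCache:
    def __init__(
        self,
        backend: Any,
        retry_interval: float = 5.0,  # assumed value, not from the source
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._backend = backend
        self._retry_interval = retry_interval
        self._clock = clock
        self._down_until = 0.0  # cache is bypassed while clock() < this value

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        if self._clock() < self._down_until:
            return loader()  # degraded mode: skip the cache entirely
        try:
            hit, value = self._backend.get(key)
        except CacheUnavailable:
            self._mark_down()
            return loader()
        if hit:
            return value
        value = loader()
        try:
            self._backend.set(key, value)
        except CacheUnavailable:
            self._mark_down()  # the loaded value is still returned
        return value

    def _mark_down(self) -> None:
        logger.warning("cache.unavailable")
        self._down_until = self._clock() + self._retry_interval
```

The key property is that a cache failure never surfaces to the caller: every path ends by returning either a cached value or the loader's result.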

What you must not cache

Anything you cannot rebuild from the primary database — counters, billing, or anything mutated mid-flight.
Anything end-user-specific that does not include the user id in the key — risk of cross-user leakage.
Anything larger than ~256 KB. The cache is sized for hot small lookups, not blobs.
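
The size limit is the one rule in this list that is easy to check mechanically. The sketch below assumes values are serialized as JSON before being cached and reads "~256 KB" as 256 × 1024 bytes; both `MAX_CACHED_BYTES` and `fits_in_cache` are hypothetical names.

```python
import json
from typing import Any

MAX_CACHED_BYTES = 256 * 1024  # one reading of the "~256 KB" guideline


def fits_in_cache(value: Any, limit: int = MAX_CACHED_BYTES) -> bool:
    """Return True if the JSON-encoded value is small enough to cache."""
    encoded = json.dumps(value).encode("utf-8")
    return len(encoded) <= limit
```

A loader wrapper could call this before writing back and skip the cache write for oversized values, so large payloads are always served from the primary database.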

How to tell if the cache is actually helping

Cache hit ratio per key prefix is exposed as a metric. A prefix with a hit ratio below ~70% usually means the TTL is too short or the keying is wrong.
Median request latency for cached routes drops by 30–60% when the cache is healthy. If latency does not drop after a cache deploy, the route is probably not cached at all; check that its read actually goes through `cache.get_or_load`.
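
As an illustration of the hit-ratio check, here is a small tracker that groups hits and misses by key prefix and reports prefixes under a threshold. `HitRatioTracker` is hypothetical and is not the service's metrics pipeline; it assumes keys follow the `cache:<kind>:...` shape from the table, without a version prefix.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class HitRatioTracker:
    def __init__(self) -> None:
        self._hits: Dict[str, int] = defaultdict(int)
        self._misses: Dict[str, int] = defaultdict(int)

    @staticmethod
    def prefix_of(key: str) -> str:
        # "cache:project:123" -> "cache:project"
        return ":".join(key.split(":")[:2])

    def record(self, key: str, hit: bool) -> None:
        prefix = self.prefix_of(key)
        if hit:
            self._hits[prefix] += 1
        else:
            self._misses[prefix] += 1

    def ratio(self, prefix: str) -> Optional[float]:
        hits = self._hits.get(prefix, 0)
        total = hits + self._misses.get(prefix, 0)
        return hits / total if total else None  # None: no traffic recorded

    def below(self, threshold: float = 0.70) -> List[str]:
        """Prefixes with recorded traffic whose hit ratio is under the threshold."""
        prefixes = set(self._hits) | set(self._misses)
        return sorted(p for p in prefixes if self.ratio(p) < threshold)
```

A prefix that shows up in `below()` is a candidate for a longer TTL or a review of how its keys are built.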
