Caching Layers
The cache is an optional accelerant — the API works correctly without it, just slower. This page describes what gets cached, how long, what happens when the cache is unavailable, and the rules you must follow before caching anything new.
The rule, before anything else
Cache only what you can recompute
Nothing in the cache is a source of truth. Every cache key has a loader that reads from the primary database. The cache can be wiped at any moment without data loss — only latency changes.
What actually gets cached
| Key shape | What it stores | Default TTL |
|---|---|---|
| `cache:project:{project_id}` | Project row + computed counters used on the dashboard. | 5 minutes |
| `cache:principal:{credential_hash}` | Resolved auth principal: tenant, role, project scope. Lets the auth dependency skip the database on hot routes. | 1 minute |
| `cache:tenant:{tenant_id}` | Tenant row + plan + quota window. | 5 minutes |
| `cache:embedding:{sha256(text)}` | Vector for an exact-text input. Stops repeated `POST /memory/add` with the same string from re-spending tokens. | Long (effectively until evicted) |
| `cache:query:{tenant}:{user}:{hash(query+filters)}` | Top-K result for an identical query. Bypassed when `tier` or `session_id` filters change. | 5 minutes |
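
The two hashed key shapes in the table can be sketched as below. This is a minimal illustration, not the service's actual code: the function names `embedding_key` and `query_key` are hypothetical, and the use of SHA-256 over canonical JSON for `hash(query+filters)` is an assumption (the table only names SHA-256 for the embedding key).

```python
import hashlib
import json


def embedding_key(text: str) -> str:
    """Build cache:embedding:{sha256(text)} for an exact-text input."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"cache:embedding:{digest}"


def query_key(tenant: str, user: str, query: str, filters: dict) -> str:
    """Build cache:query:{tenant}:{user}:{hash(query+filters)}."""
    # Assumption: filters are JSON-serializable. Sorting keys makes the hash
    # independent of the order in which the caller built the filter dict.
    payload = json.dumps({"query": query, "filters": filters}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"cache:query:{tenant}:{user}:{digest}"
```

Because the filters are part of the hashed payload, a changed `tier` or `session_id` produces a different key, which is one plausible way the bypass described in the table could come about.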
The read-through pattern, in one paragraph
Every cached read goes through cache.get_or_load(key, loader). The cache is checked first; on miss, the loader runs against the primary database and the result is written back. There is no manual set in handler code — that prevents drift between two callers writing different values for the same key.
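
As a rough sketch of the read-through pattern, the class below keeps entries in a process-local dict. Only `get_or_load(key, loader)` and `delete(key)` come from this page; the class name `ReadThroughCache`, the optional `ttl` argument, and the injectable `clock` are assumptions made so the example can run on its own.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ReadThroughCache:
    """In-memory stand-in for the real cache client (illustrative only)."""

    def __init__(self, default_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._default_ttl = default_ttl
        self._clock = clock

    def get_or_load(self, key: str, loader: Callable[[], Any],
                    ttl: Optional[float] = None) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the loader is never called
        value = loader()  # miss or expired: read from the primary database
        lifetime = self._default_ttl if ttl is None else ttl
        self._entries[key] = (now + lifetime, value)  # the only write path
        return value

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


# Usage: a dict stands in for the primary database.
db = {"p1": {"id": "p1", "name": "demo"}}
loads = []


def load_project() -> dict:
    loads.append("p1")  # track how often the database is read
    return dict(db["p1"])


cache = ReadThroughCache(default_ttl=300.0)
first = cache.get_or_load("cache:project:p1", load_project)
second = cache.get_or_load("cache:project:p1", load_project)
```

The second call is served from the cache, so `load_project` runs once. Handlers never write a value directly, which is the property the paragraph above relies on.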
How invalidation works
- Explicit invalidation on writes: every mutating service (create / update / delete on Project, Tenant, Memory) calls `cache.delete(key)` for the rows it touched. The next read repopulates.
- TTL backstop: every key carries a TTL even if the writer forgets to invalidate. Worst case, a stale read survives one TTL window, never longer.
- Version bump on schema change: cache keys are prefixed with a build version; a deploy that changes a row shape invalidates the entire space without a manual flush.
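
The write-side invalidation and the version prefix might look like the sketch below. Plain dicts stand in for both the primary database and the cache, and `BUILD_VERSION`, `versioned_key`, and `update_project` are hypothetical names; how the real build version is injected is not described on this page.

```python
BUILD_VERSION = "build-17"  # hypothetical; a real deploy would inject this


def versioned_key(key: str, version: str = BUILD_VERSION) -> str:
    """Prefix a key with the build version so a new build sees an empty space."""
    return f"{version}:{key}"


def update_project(db: dict, cache: dict, project_id: str, fields: dict) -> None:
    """Mutating service: write to the primary first, then drop the cached row."""
    db[project_id] = {**db[project_id], **fields}
    # Delete rather than overwrite; the next read repopulates via the loader.
    cache.pop(versioned_key(f"cache:project:{project_id}"), None)
```

Keys written by an older build are never read again under this scheme and simply age out through their TTL.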
The failure mode you should rely on
1. Cache is unreachable (network, restart, transient timeout).
2. The client's socket timeout fires within 0.5 s.
3. The handler logs a degraded-mode line (`cache.unavailable`) and falls through to the loader.
4. Subsequent requests bypass the cache entirely until a periodic reconnect succeeds.

The API stays up; latency rises while the cache is out.
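
The degraded-mode sequence above can be sketched as follows. This is an illustration under several assumptions: `FlakyBackend` and `DegradableCache` are hypothetical names, the 5-second reconnect interval is invented (the page does not state the period), and the reconnect is approximated by retrying on the first request after the interval rather than by a background task. In a real client the 0.5 s socket timeout would be configured on the connection and surface as a `TimeoutError`.

```python
import logging
import time
from typing import Any, Callable, Optional

logger = logging.getLogger("cache")


class FlakyBackend:
    """Stand-in for a cache client; raises ConnectionError while `up` is False."""

    def __init__(self) -> None:
        self.up = True
        self.data: dict = {}

    def get(self, key: str) -> Any:
        if not self.up:
            raise ConnectionError("cache unreachable")
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        if not self.up:
            raise ConnectionError("cache unreachable")
        self.data[key] = value


class DegradableCache:
    def __init__(self, backend: Any, reconnect_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._backend = backend
        self._reconnect_interval = reconnect_interval  # assumed value
        self._clock = clock
        self._retry_at: Optional[float] = None  # None means the cache is healthy

    def _mark_down(self, now: float) -> None:
        logger.warning("cache.unavailable")  # the degraded-mode log line
        self._retry_at = now + self._reconnect_interval

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        if self._retry_at is not None and now < self._retry_at:
            return loader()  # degraded: bypass the cache entirely
        try:
            cached = self._backend.get(key)
        except (ConnectionError, TimeoutError):
            self._mark_down(now)
            return loader()  # fall through to the primary database
        self._retry_at = None  # the cache answered, so it is healthy again
        if cached is not None:
            return cached
        value = loader()
        try:
            self._backend.set(key, value)
        except (ConnectionError, TimeoutError):
            self._mark_down(now)
        return value
```

The loader is called outside the `try` blocks on purpose, so a database error is never mistaken for a cache outage.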
What you must not cache
- Anything you cannot rebuild from the primary database, such as counters that live only in the cache, billing state, or anything mutated mid-flight.
- Anything end-user-specific that does not include the user id in the key — risk of cross-user leakage.
- Anything larger than ~256 KB. The cache is sized for hot small lookups, not blobs.
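
Two of these rules can be checked mechanically before a value is written. The guard below is a hedged sketch: `check_cacheable` is a hypothetical helper, reading "~256 KB" as 256 KiB is an assumption, and measuring size via JSON serialization assumes JSON-serializable values. The first rule (rebuildable from the primary database) cannot be verified in code and still needs review.

```python
import json
from typing import Any, Optional

MAX_CACHED_BYTES = 256 * 1024  # assumption: "~256 KB" read as 256 KiB


def check_cacheable(key: str, value: Any, user_id: Optional[str] = None) -> None:
    """Raise ValueError if the value breaks the size or user-scoping rule."""
    size = len(json.dumps(value).encode("utf-8"))
    if size > MAX_CACHED_BYTES:
        raise ValueError(f"{key}: {size} bytes exceeds the cache size limit")
    # Pass user_id for end-user-specific values; it must be a key segment.
    if user_id is not None and user_id not in key.split(":"):
        raise ValueError(f"{key}: user-specific value without the user id in the key")
```

A call such as `check_cacheable("cache:query:t1:u1:abc", result, user_id="u1")` passes silently, while the same value under a key without `u1` raises.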
How to tell if the cache is actually helping
- Cache hit ratio per key prefix is exposed as a metric. A prefix with a hit ratio below ~70% usually means the TTL is too short or the keying is wrong.
- Median request latency for cached routes drops by 30–60% when the cache is healthy. If latency does not drop after a cache deploy, the route is probably not cached; check that its read actually goes through `cache.get_or_load`.
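
To make the hit-ratio metric concrete, the sketch below counts hits and misses per key prefix and lists the prefixes under a threshold. `HitRatioTracker` is a hypothetical class, and treating the first two colon-separated segments as the prefix is an assumption about how the real metric is grouped.

```python
from collections import Counter
from typing import List, Optional


class HitRatioTracker:
    def __init__(self) -> None:
        self._hits: Counter = Counter()
        self._total: Counter = Counter()

    @staticmethod
    def prefix_of(key: str) -> str:
        # "cache:project:123" -> "cache:project"
        return ":".join(key.split(":")[:2])

    def record(self, key: str, hit: bool) -> None:
        prefix = self.prefix_of(key)
        self._total[prefix] += 1
        if hit:
            self._hits[prefix] += 1

    def ratio(self, prefix: str) -> Optional[float]:
        """Hit ratio for a prefix, or None if it has seen no lookups."""
        total = self._total[prefix]
        return self._hits[prefix] / total if total else None

    def below(self, threshold: float = 0.70) -> List[str]:
        """Prefixes whose hit ratio is under the threshold, sorted by name."""
        return sorted(p for p in self._total
                      if self._hits[p] / self._total[p] < threshold)
```

Calling `record` from inside the cache wrapper on every lookup would keep these counters current; prefixes returned by `below()` are the ones worth checking for a short TTL or unstable keying.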