Rate Limit Issues

MemorySync enforces rate limits at three time granularities. This page explains when the system throttles you, what the limits are for each tier, and how to read the response headers to understand exactly what happened.

How Rate Limiting Works

Rate limiting is enforced at three independent layers. A request must pass all three to proceed; if the limit at any single layer is exceeded, the request is rejected with HTTP 429:

Layer	Window	Purpose
`per_second`	1 second	Burst control — prevents rapid-fire requests from overwhelming the system
`per_minute`	60 seconds	Sustained throughput — controls average request rate
`per_hour`	3600 seconds	Budget guard — prevents runaway scripts from exhausting your hourly allocation

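Smoothing: Each layer uses a sliding-window counter rather than fixed windows, so a client cannot send a full window's worth of requests just before a boundary and another full window's worth just after it. Enforcement stays steady and predictable without the memory overhead of tracking every request individually.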

All three layers are checked atomically, so there is no race window in which you pass the per-second check but get rejected by the per-minute check a moment later.
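To make the three-layer check concrete, here is a minimal Python sketch. It is not MemorySync's implementation: the class names, the weighted-previous-window approximation, the in-process lock, and the choice not to count rejected requests are all assumptions made for illustration. It shows one sliding-window counter per layer and a single locked check that either admits a request on all three layers or rejects it and reports the blocking layer.

```python
import threading
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlidingWindowCounter:
    """Approximate sliding window: the previous fixed window's count is
    weighted by how much of it still overlaps the sliding window."""
    limit: int
    window: float  # window length in seconds
    current_start: float = 0.0
    current_count: int = 0
    previous_count: int = 0

    def _roll(self, now: float) -> None:
        elapsed_windows = int((now - self.current_start) // self.window)
        if elapsed_windows >= 1:
            # Only the immediately preceding window can still overlap.
            self.previous_count = self.current_count if elapsed_windows == 1 else 0
            self.current_count = 0
            self.current_start += elapsed_windows * self.window

    def estimate(self, now: float) -> float:
        """Estimated number of requests inside the window ending at `now`."""
        self._roll(now)
        overlap = 1.0 - (now - self.current_start) / self.window
        return self.previous_count * overlap + self.current_count


class LayeredLimiter:
    """Hypothetical limiter that checks all three layers under one lock."""

    def __init__(self, per_second: int, per_minute: int, per_hour: int) -> None:
        self._layers = {
            "per_second": SlidingWindowCounter(per_second, 1.0),
            "per_minute": SlidingWindowCounter(per_minute, 60.0),
            "per_hour": SlidingWindowCounter(per_hour, 3600.0),
        }
        self._lock = threading.Lock()

    def check(self, now: Optional[float] = None) -> Optional[str]:
        """Return None if the request is admitted, else the blocking layer."""
        now = time.monotonic() if now is None else now
        with self._lock:
            # First pass: reject if any layer would go over its limit.
            for name, layer in self._layers.items():
                if layer.estimate(now) + 1 > layer.limit:
                    return name
            # Second pass: every layer passed, so count the request everywhere.
            for layer in self._layers.values():
                layer.current_count += 1
            return None
```

With Free-tier limits (`LayeredLimiter(2, 30, 100)`), two calls within the same second are admitted and a third returns `"per_second"`. Capacity then frees up gradually as the previous window slides out, rather than all at once at the boundary.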

Rate Limit Tiers

API keys are assigned a rate-limit tier that determines their per-second, per-minute, and per-hour limits:

Tier	Per Second	Per Minute	Per Hour
Free	2	30	100
Pro	10	200	5,000
Enterprise	50	1,000	50,000
Unlimited	1,000	60,000	3,600,000

Custom limits: Enterprise keys may also have custom per-second / per-minute / per-hour limits independent of the tier presets, configured per agreement.
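Because all three layers apply at once, the rate a key can sustain over a long run is set by whichever layer allows the fewest requests per second on average. The following sketch works that out from the tier table; the `TIERS` dictionary and the `sustained_rate` function are illustrative names, not part of any MemorySync SDK.

```python
# Limits copied from the tier table: (per_second, per_minute, per_hour).
TIERS = {
    "Free": (2, 30, 100),
    "Pro": (10, 200, 5_000),
    "Enterprise": (50, 1_000, 50_000),
    "Unlimited": (1_000, 60_000, 3_600_000),
}


def sustained_rate(per_second: int, per_minute: int, per_hour: int):
    """Return (binding_layer, requests_per_second) for long-running traffic."""
    rates = {
        "per_second": per_second / 1,
        "per_minute": per_minute / 60,
        "per_hour": per_hour / 3600,
    }
    # The layer with the lowest average rate is the one that binds over time.
    binding = min(rates, key=rates.get)
    return binding, rates[binding]
```

For the Free tier this gives `per_hour` at 100 / 3600 ≈ 0.028 requests per second, or roughly one request every 36 seconds if traffic is spread over a full hour. For Pro and Enterprise the hourly layer is also the binding one; for Unlimited all three layers work out to the same 1,000 requests per second.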

Default Configs by Endpoint Type

When a request doesn't have an API key (or the key lookup fails), the system applies default rate limits based on the endpoint type:

Endpoint Type	Per Second	Per Minute	Per Hour	Identifier
Default (unauthenticated)	10	200	2,000	Per IP
Dashboard read-only	20	500	5,000	Per signed-in user
SCIM provisioning	30	1,000	50,000	Per provisioning credential
Documentation / Health	5	60	600	Per IP

Identifier priority: The rate-limit identifier is resolved in order — (1) API key, (2) authenticated user, (3) IP — so two different API keys from the same IP have independent rate-limit budgets.
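The priority order can be pictured as a simple fallback chain. This is a sketch only: `resolve_identifier` and the `key:` / `user:` / `ip:` prefixes are hypothetical and are not documented MemorySync behavior.

```python
from typing import Optional


def resolve_identifier(api_key: Optional[str], user_id: Optional[str], ip: str) -> str:
    """Pick the rate-limit bucket: API key first, then user, then IP."""
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    return f"ip:{ip}"
```

Two requests from the same IP with different API keys resolve to different buckets, while requests with neither a key nor a signed-in user fall back to the shared per-IP bucket.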

Response Headers

Every response (including successful ones) includes rate-limit headers so you can monitor your usage proactively:

Header	Meaning
`X-RateLimit-Limit`	Maximum requests allowed in the tightest (most-constrained) window
`X-RateLimit-Remaining`	Requests remaining in the tightest window before you get throttled
`X-RateLimit-Reset`	Unix timestamp when the tightest window resets
`Retry-After`	Seconds to wait before retrying (only present on 429 responses)

Additionally, per-layer detail headers are included for full observability:

X-RateLimit-Per-Second-Limit: 10
X-RateLimit-Per-Second-Remaining: 7
X-RateLimit-Per-Second-Reset: 1715265601
X-RateLimit-Per-Minute-Limit: 200
X-RateLimit-Per-Minute-Remaining: 142
X-RateLimit-Per-Minute-Reset: 1715265660
X-RateLimit-Per-Hour-Limit: 5000
X-RateLimit-Per-Hour-Remaining: 4891
X-RateLimit-Per-Hour-Reset: 1715269200

✅ Pro tip: Monitor X-RateLimit-Per-Hour-Remaining in your application logs. When it drops below 10% of the limit, either slow down your request rate or consider upgrading your tier.
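One way to act on this tip is to parse the per-layer headers on every response and flag a low hourly budget. The sketch below assumes the header names shown above and treats headers as a plain string mapping; `parse_layers` and `hourly_budget_low` are illustrative helper names.

```python
from typing import Dict, Mapping

LAYERS = ("Per-Second", "Per-Minute", "Per-Hour")


def parse_layers(headers: Mapping[str, str]) -> Dict[str, Dict[str, int]]:
    """Collect limit / remaining / reset for each layer that is present."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    parsed = {}
    for layer in LAYERS:
        prefix = f"x-ratelimit-{layer.lower()}-"
        try:
            parsed[layer] = {
                "limit": int(lowered[prefix + "limit"]),
                "remaining": int(lowered[prefix + "remaining"]),
                "reset": int(lowered[prefix + "reset"]),
            }
        except (KeyError, ValueError):
            continue  # skip layers with missing or malformed headers
    return parsed


def hourly_budget_low(headers: Mapping[str, str], threshold: float = 0.10) -> bool:
    """True when the hourly remaining count is below `threshold` of the limit."""
    hour = parse_layers(headers).get("Per-Hour")
    if hour is None or hour["limit"] <= 0:
        return False
    return hour["remaining"] / hour["limit"] < threshold
```

With the example headers above (4891 of 5000 remaining), `hourly_budget_low` returns `False`; it would return `True` once fewer than 500 requests remain.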

Rate Limit vs Billing Quota

Rate limits and billing quotas are two completely independent systems that protect different things. Confusing them is one of the most common debugging mistakes:

Aspect	Rate Limiting	Billing Quota
What it limits	HTTP request velocity	Monthly resource consumption (memories, queries, tokens)
Scope	Per API key, per user, or per IP	Per organization (billing cycle)
Time window	Second / minute / hour (rolling)	Monthly billing cycle
Error when exceeded	`429 RATE_LIMIT_EXCEEDED`	Silent mode — `200 OK` with no memory stored or embedded; the pipeline is skipped
Recovery	Wait for the window to reset (seconds to hours)	Upgrade plan or wait for next billing cycle

⚠️ Common mistake: "My queries return no results" is almost never a rate limit issue. Rate limits reject requests entirely with 429. If your request succeeds but returns empty or incomplete results, you're likely in billing silent mode or facing a retrieval pipeline issue — see Missing Memories or Silent Mode Explained.

Diagnosing Rate Limit Issues

Follow this checklist when you're getting 429s:

1. Check error.blocked_by: it tells you which layer is the bottleneck. "per_second" means bursting; "per_hour" means budget exhaustion.
2. Read error.limits: it shows your current per-second/per-minute/per-hour caps. Compare against the tier table above to confirm your tier.
3. Check the identifier: are you rate-limited by API key, user ID, or IP? If multiple services share one API key, they share one rate-limit budget.
4. Implement exponential backoff and honor the Retry-After header. Tight retry loops will continue to receive 429s until the window resets.
5. Consider key-per-service: if you have multiple services calling MemorySync, give each one its own API key so they have independent rate-limit budgets.
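The backoff step in the checklist can be sketched as follows. This is an assumption-laden example rather than an official client: `call_with_backoff` is a hypothetical helper, `send` stands in for whatever function performs your HTTP request and returns `(status, headers, body)`, and the jitter range and delay caps are arbitrary choices. The helper waits for the longer of its own exponential delay and the server's `Retry-After` value.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry `send` on 429, waiting at least as long as Retry-After asks."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the last 429 back to the caller
        # Exponential backoff with jitter so parallel clients do not retry in lockstep.
        backoff = min(max_delay, base_delay * 2 ** attempt)
        backoff = random.uniform(backoff / 2, backoff)
        try:
            server_wait = float(headers.get("Retry-After", 0))
        except (TypeError, ValueError):
            server_wait = 0.0  # ignore a malformed header and use our own delay
        sleep(max(backoff, server_wait))
    return status, headers, body
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. If the final response is still a 429, inspecting `error.blocked_by` in its body tells you whether further retries are worthwhile: a `per_hour` block will not clear within a few seconds.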
