Rate Limit Issues
MemorySync enforces rate limits at three time granularities. This page explains when the system throttles you, what the limits are for each tier, and how to read the response headers to understand exactly what happened.
How Rate Limiting Works
Rate limiting is enforced at three independent layers. A request must pass all three to proceed — if any single layer is exceeded, the request is rejected with 429:
| Layer | Window | Purpose |
|---|---|---|
| per_second | 1 second | Burst control: prevents rapid-fire requests from overwhelming the system |
| per_minute | 60 seconds | Sustained throughput: controls average request rate |
| per_hour | 3600 seconds | Budget guard: prevents runaway scripts from exhausting your hourly allocation |
Smoothing: Each layer uses a sliding-window counter, so the boundary between two windows does not allow a sudden burst. The result is steady, predictable enforcement without long-tail memory overhead.
All three layers are checked atomically: there is no race window in which a request passes the per-second check and is then rejected by the per-minute check a moment later.
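To make the layered check concrete, here is a minimal sketch of one common sliding-window counter design: two fixed-window counters per layer, with the previous window weighted by how much of it still overlaps the sliding window. It is not MemorySync's actual implementation. The class names, the in-process lock, and the returned layer name are assumptions for illustration.

```python
import threading
import time

# Layer windows in seconds, taken from the table above.
WINDOWS = {"per_second": 1, "per_minute": 60, "per_hour": 3600}


class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.index = 0      # index of the current fixed window
        self.current = 0    # requests counted in the current window
        self.previous = 0   # requests counted in the window before it

    def _roll(self, now):
        index = int(now // self.window)
        if index == self.index:
            return
        # Only the immediately preceding window contributes to the estimate.
        self.previous = self.current if index == self.index + 1 else 0
        self.current = 0
        self.index = index

    def estimate(self, now):
        self._roll(now)
        elapsed = (now % self.window) / self.window
        # Weight the previous window by the share of it still inside the slide.
        return self.previous * (1.0 - elapsed) + self.current

    def would_allow(self, now):
        return self.estimate(now) + 1 <= self.limit

    def record(self, now):
        self._roll(now)
        self.current += 1


class LayeredLimiter:
    """Checks all layers under one lock and counts only accepted requests."""

    def __init__(self, limits):
        self.layers = {
            name: SlidingWindowCounter(limits[name], WINDOWS[name])
            for name in WINDOWS
        }
        self.lock = threading.Lock()

    def check(self, now=None):
        now = time.time() if now is None else now
        with self.lock:
            # First pass: reject without mutating any counter.
            for name, layer in self.layers.items():
                if not layer.would_allow(now):
                    return False, name
            # Second pass: every layer passed, so count the request everywhere.
            for layer in self.layers.values():
                layer.record(now)
            return True, None
```

Because the check and the increment happen under the same lock, a rejected request consumes no budget on any layer. A distributed deployment would need an equivalent atomic step in its shared store, which this sketch does not cover.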
Rate Limit Tiers
API keys are assigned a rate-limit tier that determines their per-second, per-minute, and per-hour limits:
| Tier | Per Second | Per Minute | Per Hour |
|---|---|---|---|
| Free | 2 | 30 | 100 |
| Pro | 10 | 200 | 5,000 |
| Enterprise | 50 | 1,000 | 50,000 |
| Unlimited | 1,000 | 60,000 | 3,600,000 |
Custom limits: Enterprise keys may also have custom per-second / per-minute / per-hour limits independent of the tier presets, configured per agreement.
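The tier table can also be read as a lookup plus a small piece of arithmetic: the rate a client can sustain indefinitely is bounded by the strictest of the three layers once each is converted to requests per second. The sketch below uses the numbers from the table. The function names and the shape of the custom-override dictionary are assumptions, not part of the MemorySync API.

```python
# Tier presets copied from the table above.
TIERS = {
    "Free": {"per_second": 2, "per_minute": 30, "per_hour": 100},
    "Pro": {"per_second": 10, "per_minute": 200, "per_hour": 5_000},
    "Enterprise": {"per_second": 50, "per_minute": 1_000, "per_hour": 50_000},
    "Unlimited": {"per_second": 1_000, "per_minute": 60_000, "per_hour": 3_600_000},
}


def effective_limits(tier, custom=None):
    """Return the limits for a tier, applying hypothetical custom overrides."""
    limits = dict(TIERS[tier])
    if custom:
        # Assumption: overrides are only valid on Enterprise keys.
        if tier != "Enterprise":
            raise ValueError("custom limits apply to Enterprise keys only")
        limits.update(custom)
    return limits


def sustained_rate(limits):
    """Highest steady request rate (requests/second) that passes every layer."""
    return min(
        limits["per_second"],
        limits["per_minute"] / 60,
        limits["per_hour"] / 3600,
    )
```

For the Free tier this gives 100 / 3600, roughly 0.028 requests per second, so the hourly cap is the binding one for a long-running client: about one request every 36 seconds.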
Default Configs by Endpoint Type
When a request doesn't have an API key (or the key lookup fails), the system applies default rate limits based on the endpoint type:
| Endpoint Type | Per Second | Per Minute | Per Hour | Identifier |
|---|---|---|---|---|
| Default (unauthenticated) | 10 | 200 | 2,000 | Per IP |
| Dashboard read-only | 20 | 500 | 5,000 | Per signed-in user |
| SCIM provisioning | 30 | 1,000 | 50,000 | Per provisioning credential |
| Documentation / Health | 5 | 60 | 600 | Per IP |
Identifier priority: The rate-limit identifier is resolved in order — (1) API key, (2) authenticated user, (3) IP — so two different API keys from the same IP have independent rate-limit budgets.
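The priority order can be sketched as a simple fall-through. The function name, parameters, and the returned (kind, value) tuple are hypothetical; only the ordering comes from the description above.

```python
def resolve_identifier(api_key=None, user_id=None, ip=None):
    """Pick the rate-limit identifier: API key, then user, then IP."""
    if api_key:
        return ("api_key", api_key)
    if user_id:
        return ("user", user_id)
    if ip:
        return ("ip", ip)
    raise ValueError("request has no usable identifier")
```

Two requests from the same IP with different keys resolve to different identifiers, which is why their budgets do not interfere.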
Response Headers
Every response (including successful ones) includes rate-limit headers so you can monitor your usage proactively:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the tightest (most-constrained) window |
| X-RateLimit-Remaining | Requests remaining in the tightest window before you get throttled |
| X-RateLimit-Reset | Unix timestamp when the tightest window resets |
| Retry-After | Seconds to wait before retrying (only present on 429 responses) |
Additionally, per-layer detail headers are included for full observability:
X-RateLimit-Per-Second-Limit: 10
X-RateLimit-Per-Second-Remaining: 7
X-RateLimit-Per-Second-Reset: 1715265601
X-RateLimit-Per-Minute-Limit: 200
X-RateLimit-Per-Minute-Remaining: 142
X-RateLimit-Per-Minute-Reset: 1715265660
X-RateLimit-Per-Hour-Limit: 5000
X-RateLimit-Per-Hour-Remaining: 4891
X-RateLimit-Per-Hour-Reset: 1715269200
✅ Pro tip: Monitor X-RateLimit-Per-Hour-Remaining in your application logs. When it drops below 10% of the limit, either slow down your request rate or consider upgrading your tier.
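One way to act on the per-layer headers is to parse them into a small structure and flag a low hourly budget. The sketch below assumes the headers arrive as a plain mapping of name to string value, as most HTTP clients provide. The function names and the 10% default threshold parameter are illustrative.

```python
LAYERS = ("Per-Second", "Per-Minute", "Per-Hour")


def parse_layers(headers):
    """Return {layer: {"limit", "remaining", "reset"}} from per-layer headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    layers = {}
    for layer in LAYERS:
        prefix = f"x-ratelimit-{layer.lower()}-"
        layers[layer] = {
            field: int(lowered[prefix + field])
            for field in ("limit", "remaining", "reset")
        }
    return layers


def hourly_budget_low(headers, threshold=0.10):
    """True when the hourly remaining count is below threshold * limit."""
    hour = parse_layers(headers)["Per-Hour"]
    return hour["remaining"] < threshold * hour["limit"]
```

With the example headers above (4891 of 5000 remaining), `hourly_budget_low` returns False; it would flip to True once the remaining count drops under 500.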
Rate Limit vs Billing Quota
Rate limits and billing quotas are two completely independent systems that protect different things. Confusing them is one of the most common debugging mistakes:
| Aspect | Rate Limiting | Billing Quota |
|---|---|---|
| What it limits | HTTP request velocity | Monthly resource consumption (memories, queries, tokens) |
| Scope | Per API key, per user, or per IP | Per organization (billing cycle) |
| Time window | Second / minute / hour (rolling) | Monthly billing cycle |
| Error when exceeded | 429 RATE_LIMIT_EXCEEDED | Silent mode — 200 OK with no memory stored or embedded; the pipeline is skipped |
| Recovery | Wait for the window to reset (seconds to hours) | Upgrade plan or wait for next billing cycle |
⚠️ Common mistake: "My queries return no results" is almost never a rate limit issue. Rate limits reject requests entirely with 429. If your request succeeds but returns empty or incomplete results, you're likely in billing silent mode or facing a retrieval pipeline issue — see Missing Memories or Silent Mode Explained.
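The distinction can be expressed as a small triage helper. This is only a sketch: `result_count` is a hypothetical value your client would derive from the response body, and the returned labels are made up for illustration.

```python
def triage(status_code, result_count):
    """Classify a response as a rate-limit problem or something else."""
    if status_code == 429:
        # Rate limits reject the request outright.
        return "rate_limited"
    if status_code == 200 and result_count == 0:
        # Success with nothing returned: look at billing silent mode
        # or the retrieval pipeline, not at rate limits.
        return "check_billing_or_retrieval"
    if status_code == 200:
        return "ok"
    return "other"
```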
Diagnosing Rate Limit Issues
Follow this checklist when you're getting 429s:
- Check error.blocked_by: tells you which layer is the bottleneck. "per_second" means bursting; "per_hour" means budget exhaustion.
- Read error.limits: shows your current per-second/per-minute/per-hour caps. Compare against the tier table above to confirm your tier.
- Check the identifier: are you rate-limited by API key, user ID, or IP? If multiple services share one API key, they share one rate-limit budget.
- Implement exponential backoff: honor the Retry-After header. Tight retry loops will continue to receive 429s until the window resets.
- Consider key-per-service: if you have multiple services calling MemorySync, give each one its own API key so they have independent rate-limit budgets.
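The backoff step in the checklist can be sketched as a retry loop that waits for whichever is longer, the server's Retry-After value or an exponentially growing delay. To keep the example independent of any HTTP library, `send` is a hypothetical zero-argument callable that returns `(status_code, headers, body)`; the parameter names and defaults are assumptions.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Exponential schedule: base, 2x base, 4x base, ... capped at max_delay.
        backoff = min(max_delay, base_delay * 2 ** attempt)
        # Retry-After is documented above as a number of seconds.
        retry_after = float(headers.get("Retry-After", 0))
        sleep(max(backoff, retry_after))
    # Still throttled: hand the last 429 back so the caller can inspect it.
    return status, headers, body
```

When the loop gives up, the returned body is the last 429 payload, so the caller can still read error.blocked_by to see which layer is the bottleneck. In production you may also want to add random jitter to the delay so that many clients do not retry in lockstep.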