
Rate Limit Issues

MemorySync enforces rate limits at three time granularities. This page explains when the system throttles you, what the limits are for each tier, and how to read the response headers to understand exactly what happened.

How Rate Limiting Works

Rate limiting is enforced at three independent layers. A request must pass all three to proceed; if any single layer is exceeded, the request is rejected with HTTP 429 (Too Many Requests):

Layer | Window | Purpose
per_second | 1 second | Burst control: prevents rapid-fire requests from overwhelming the system
per_minute | 60 seconds | Sustained throughput: controls the average request rate
per_hour | 3,600 seconds | Budget guard: prevents runaway scripts from exhausting your hourly allocation

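Smoothing: Each layer uses a sliding-window counter rather than a fixed window, so a client cannot fit a double-sized burst into a short span by sending a full window's worth of requests just before a window boundary and another full window's worth just after it. Because a sliding-window counter keeps only a few counters per identifier instead of a timestamp for every request, enforcement stays steady and predictable without per-request memory overhead.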

All three layers are checked atomically — there is no race window where you pass the second-level check but get rejected by the minute-level check a moment later.
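To make the two properties above concrete, here is a minimal sketch of a three-layer sliding-window counter. It is not MemorySync's implementation: it assumes the common two-bucket approximation (the previous fixed window's count is weighted by how much of it still overlaps the sliding window) and uses one lock so that all three checks and increments happen as a single unit. In this sketch a rejected request consumes no budget in any layer; that is a design choice of the example. The class names are hypothetical.

```python
import threading
import time


class SlidingWindowCounter:
    """Two-bucket sliding-window approximation for a single layer (hypothetical)."""

    def __init__(self, window, limit):
        self.window = window      # window length in seconds
        self.limit = limit        # max requests per window
        self.current_index = 0    # which fixed window we are currently counting in
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now):
        index = int(now // self.window)
        if index != self.current_index:
            # Carry the old count forward only if it belonged to the immediately preceding window.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_index = index
            self.current_count = 0

    def estimate(self, now):
        """Weighted request count over the last `window` seconds."""
        self._roll(now)
        elapsed = (now - self.current_index * self.window) / self.window
        return self.previous_count * (1.0 - elapsed) + self.current_count


class ThreeLayerLimiter:
    """Admits a request only if the per-second, per-minute and per-hour layers all allow it."""

    def __init__(self, per_second, per_minute, per_hour, clock=time.monotonic):
        self.layers = {
            "per_second": SlidingWindowCounter(1, per_second),
            "per_minute": SlidingWindowCounter(60, per_minute),
            "per_hour": SlidingWindowCounter(3600, per_hour),
        }
        self.clock = clock
        self.lock = threading.Lock()

    def try_acquire(self):
        """Return None if the request is admitted, else the name of the blocking layer."""
        with self.lock:  # check all layers and increment all counters as one unit
            now = self.clock()
            for name, layer in self.layers.items():
                if layer.estimate(now) + 1 > layer.limit:
                    return name  # rejected: no layer's counter is incremented
            for layer in self.layers.values():
                layer.current_count += 1
            return None
```

With Pro-tier limits (10 / 200 / 5,000), ten requests at t = 0.5 s are admitted and the eleventh returns "per_second". At t = 1.5 s the previous one-second window still carries 50% weight, so only five more requests are admitted instead of a fresh ten; that carry-over is the smoothing described above.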

Rate Limit Tiers

API keys are assigned a rate-limit tier that determines their per-second, per-minute, and per-hour limits:

Tier | Per second | Per minute | Per hour
Free | 2 | 30 | 100
Pro | 10 | 200 | 5,000
Enterprise | 50 | 1,000 | 50,000
Unlimited | 1,000 | 60,000 | 3,600,000

Custom limits: Enterprise keys may also have custom per-second / per-minute / per-hour limits independent of the tier presets, configured per agreement.
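Because each layer caps a different window, the layer that actually limits steady traffic is the one with the lowest cap once it is divided by its window length. The worked sketch below does that arithmetic for the tier presets above; it works the same way for custom Enterprise limits. The function and variable names are illustrative only.

```python
from fractions import Fraction

# Presets from the tier table: (per second, per minute, per hour).
TIERS = {
    "Free": (2, 30, 100),
    "Pro": (10, 200, 5_000),
    "Enterprise": (50, 1_000, 50_000),
    "Unlimited": (1_000, 60_000, 3_600_000),
}

WINDOW_SECONDS = {"per_second": 1, "per_minute": 60, "per_hour": 3_600}


def sustained_rate(per_second, per_minute, per_hour):
    """Highest constant request rate (requests/second) that stays within every
    layer, plus the name of the layer that sets it."""
    caps = {"per_second": per_second, "per_minute": per_minute, "per_hour": per_hour}
    rates = {name: Fraction(caps[name], WINDOW_SECONDS[name]) for name in caps}
    binding = min(rates, key=rates.get)  # on a tie, the first (shortest) window wins
    return rates[binding], binding


for tier, limits in TIERS.items():
    rate, layer = sustained_rate(*limits)
    print(f"{tier}: {float(rate):.2f} req/s, bound by {layer}")
```

For Free, Pro, and Enterprise keys the per-hour cap binds for steady traffic (Pro works out to 5,000 / 3,600 ≈ 1.39 requests per second), so the per-second limit only matters for short bursts. For Unlimited, the three caps are equivalent at 1,000 requests per second.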

Default Configs by Endpoint Type

When a request doesn't have an API key (or the key lookup fails), the system applies default rate limits based on the endpoint type:

Endpoint type | Per second | Per minute | Per hour | Identifier
Default (unauthenticated) | 10 | 200 | 2,000 | Per IP
Dashboard read-only | 20 | 500 | 5,000 | Per signed-in user
SCIM provisioning | 30 | 1,000 | 50,000 | Per provisioning credential
Documentation / Health | 5 | 60 | 600 | Per IP

Identifier priority: The rate-limit identifier is resolved in order — (1) API key, (2) authenticated user, (3) IP — so two different API keys from the same IP have independent rate-limit budgets.
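The priority order and the fallback to endpoint defaults can be sketched as follows. This assumes `api_key` is passed only when the key lookup succeeded, and the function names, bucket-key format, and endpoint-type labels are all hypothetical; the limit values come from the table above.

```python
from typing import Optional, Tuple

Limits = Tuple[int, int, int]  # (per second, per minute, per hour)

# Endpoint defaults from the table above; the dictionary keys are hypothetical labels.
ENDPOINT_DEFAULTS = {
    "default": (10, 200, 2_000),
    "dashboard": (20, 500, 5_000),
    "scim": (30, 1_000, 50_000),
    "docs_health": (5, 60, 600),
}


def rate_limit_identifier(api_key: Optional[str], user_id: Optional[str], ip: str) -> str:
    """Pick the bucket a request is counted against: API key, then user, then IP."""
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    return f"ip:{ip}"


def effective_limits(key_limits: Optional[Limits], endpoint_type: str) -> Limits:
    """Use the key's tier (or custom) limits when the key resolved;
    otherwise fall back to the endpoint-type defaults."""
    if key_limits is not None:
        return key_limits
    return ENDPOINT_DEFAULTS[endpoint_type]
```

Two keys calling from the same IP resolve to different buckets ("key:…" values differ), which is why they get independent budgets.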

Response Headers

Every response (including successful ones) includes rate-limit headers so you can monitor your usage proactively:

Header | Meaning
X-RateLimit-Limit | Maximum requests allowed in the tightest (most-constrained) window
X-RateLimit-Remaining | Requests remaining in the tightest window before you are throttled
X-RateLimit-Reset | Unix timestamp (seconds) when the tightest window resets
Retry-After | Seconds to wait before retrying (present only on 429 responses)

Additionally, per-layer detail headers are included for full observability. The example values below correspond to a Pro-tier key:

X-RateLimit-Per-Second-Limit: 10
X-RateLimit-Per-Second-Remaining: 7
X-RateLimit-Per-Second-Reset: 1715265601
X-RateLimit-Per-Minute-Limit: 200
X-RateLimit-Per-Minute-Remaining: 142
X-RateLimit-Per-Minute-Reset: 1715265660
X-RateLimit-Per-Hour-Limit: 5000
X-RateLimit-Per-Hour-Remaining: 4891
X-RateLimit-Per-Hour-Reset: 1715269200
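To see which layer is closest to throttling you, you can group the per-layer headers yourself. The sketch below assumes the header names shown above and uses "smallest fraction of the limit remaining" as its measure of closeness; that is an illustrative choice, not necessarily how the server picks the "tightest" window for the aggregate X-RateLimit-* headers. Function names are hypothetical.

```python
LAYERS = ("second", "minute", "hour")


def parse_layer_headers(headers):
    """Group X-RateLimit-Per-<Layer>-{Limit,Remaining,Reset} values by layer."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    layers = {}
    for layer in LAYERS:
        prefix = f"x-ratelimit-per-{layer}-"
        try:
            layers[f"per_{layer}"] = {
                field: int(lowered[prefix + field])
                for field in ("limit", "remaining", "reset")
            }
        except KeyError:
            continue  # this layer's headers are missing or incomplete; skip it
    return layers


def closest_to_limit(layers):
    """Layer with the smallest fraction of its limit still remaining.
    Raises ValueError if `layers` is empty."""
    return min(layers, key=lambda name: layers[name]["remaining"] / max(layers[name]["limit"], 1))
```

With the example headers above, the per-second layer has 70% left (7 of 10), the per-minute layer 71% (142 of 200), and the per-hour layer about 98% (4,891 of 5,000), so `closest_to_limit` returns "per_second".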

✅ Pro tip: Monitor X-RateLimit-Per-Hour-Remaining in your application logs. When it drops below 10% of the limit, either slow down your request rate or consider upgrading your tier.
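A minimal sketch of that check, assuming you have the response headers as a dict; the logger name and the function name are hypothetical, and the 10% threshold is the one suggested above.

```python
import logging

logger = logging.getLogger("memorysync.ratelimit")  # hypothetical logger name


def hourly_budget_low(headers, threshold=0.10):
    """Log a warning and return True when X-RateLimit-Per-Hour-Remaining
    drops below `threshold` (default 10%) of X-RateLimit-Per-Hour-Limit."""
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = int(lowered["x-ratelimit-per-hour-limit"])
    remaining = int(lowered["x-ratelimit-per-hour-remaining"])
    low = remaining < threshold * limit
    if low:
        logger.warning("Hourly rate-limit budget low: %d of %d requests remaining", remaining, limit)
    return low
```

For a Pro key (5,000 per hour) the warning fires once fewer than 500 requests remain in the hourly window.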

Rate Limit vs Billing Quota

Rate limits and billing quotas are two completely independent systems that protect different things. Confusing them is one of the most common debugging mistakes:

Aspect | Rate limiting | Billing quota
What it limits | HTTP request velocity | Monthly resource consumption (memories, queries, tokens)
Scope | Per API key, per user, or per IP | Per organization (billing cycle)
Time window | Second / minute / hour (rolling) | Monthly billing cycle
Error when exceeded | 429 RATE_LIMIT_EXCEEDED | Silent mode: 200 OK, but no memory is stored or embedded (the pipeline is skipped)
Recovery | Wait for the window to reset (seconds to hours) | Upgrade your plan or wait for the next billing cycle

⚠️ Common mistake: "My queries return no results" is almost never a rate limit issue. Rate limits reject requests entirely with 429. If your request succeeds but returns empty or incomplete results, you're likely in billing silent mode or facing a retrieval pipeline issue — see Missing Memories or Silent Mode Explained.
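The distinction comes down to the status code. As a minimal sketch, assuming your client exposes the HTTP status and the list of returned results (the function name and return labels are hypothetical):

```python
def triage(status, results):
    """Classify a response according to the rate-limit vs. billing distinction."""
    if status == 429:
        return "rate_limit"            # rejected outright: check error.blocked_by and Retry-After
    if status == 200 and not results:
        return "billing_or_retrieval"  # accepted but empty: silent mode or a retrieval issue
    if status == 200:
        return "ok"
    return "other_error"
```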

Diagnosing Rate Limit Issues

Follow this checklist when you're getting 429s:

  1. Check error.blocked_by: it tells you which layer is the bottleneck. "per_second" means you are bursting, "per_minute" means your sustained request rate is too high, and "per_hour" means you have exhausted your hourly budget.
  2. Read error.limits — shows your current per-second/per-minute/per-hour caps. Compare against the tier table above to confirm your tier.
  3. Check the identifier — are you rate-limited by API key, user ID, or IP? If multiple services share one API key, they share one rate-limit budget.
  4. Implement exponential backoff — honor the Retry-After header. Tight retry loops will continue to receive 429s until the window resets.
  5. Consider key-per-service — if you have multiple services calling MemorySync, give each one its own API key so they have independent rate-limit budgets.
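Steps 1 and 4 can be combined in a small client-side retry helper. This is a sketch under stated assumptions, not an official client: it honors Retry-After when it is a number of seconds, otherwise falls back to exponential backoff with "full jitter" (a random delay between zero and an exponentially growing cap, a standard technique this page does not prescribe), and logs error.blocked_by. The JSON body shape, the `send` callable, the logger name, and the function names are all hypothetical.

```python
import json
import logging
import random
import time

logger = logging.getLogger("memorysync.client")  # hypothetical logger name


def retry_delay(attempt, retry_after=None, base=0.5, cap=60.0):
    """Seconds to wait before retry `attempt` (0-based).

    Uses the server's Retry-After value when it is a number of seconds;
    otherwise falls back to exponential backoff with full jitter."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. the HTTP-date form; this sketch falls back to backoff
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Retry `send()` on 429 responses.

    `send` is a hypothetical zero-argument callable returning
    (status, headers, body_text); wire it to your HTTP client."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        try:  # assumed body shape: {"error": {"blocked_by": ..., "limits": ...}}
            blocked_by = json.loads(body)["error"]["blocked_by"]
        except (ValueError, KeyError, TypeError):
            blocked_by = "unknown"
        retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
        delay = retry_delay(attempt, retry_after)
        logger.warning("429 (blocked_by=%s); retrying in %.1fs", blocked_by, delay)
        sleep(delay)
```

Injecting `sleep` keeps the helper testable and lets an async caller substitute its own wait. If blocked_by keeps reporting "per_hour", retrying will not help much; slow down or split traffic across keys as in step 5.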