Billing

Common Billing Mistakes

These are the most common misunderstandings developers encounter when integrating with MemorySync’s billing system. Each mistake is documented with what actually happens behind the scenes and how to avoid the issue.

Assuming Downgrade Is Immediate

The mistake

A developer downgrades from Pro to Starter and immediately expects Starter limits to be in effect.

What actually happens: Downgrades are scheduled, not immediate. When you downgrade, the new plan is queued for the next cycle, but your current plan, limits, and cycle remain unchanged until rollover.

Why: You have already paid for the current cycle at the higher tier. The system honors that payment by keeping your Pro limits until the period expires.
How to verify: After scheduling a downgrade, the dashboard shows your active plan as Pro with a pending Starter downgrade noted on the billing page. The transition happens at the end of the current cycle.
Contrast with upgrades: Upgrades are immediate because you are paying for more. The asymmetry is intentional.
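To make the asymmetry concrete, here is a minimal sketch of how a client-side model of plan state might behave, assuming a simple three-tier ordering. The `Subscription` class, its `change_plan` and `rollover` methods, and the `TIER_RANK` table are hypothetical illustrations, not part of MemorySync's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier ordering, used only to decide upgrade vs downgrade.
TIER_RANK = {"free": 0, "starter": 1, "pro": 2}


@dataclass
class Subscription:
    active_plan: str                    # plan whose limits are enforced right now
    pending_plan: Optional[str] = None  # downgrade queued for the next cycle

    def change_plan(self, new_plan: str) -> None:
        if TIER_RANK[new_plan] > TIER_RANK[self.active_plan]:
            # Upgrade: takes effect immediately.
            self.active_plan = new_plan
        elif TIER_RANK[new_plan] < TIER_RANK[self.active_plan]:
            # Downgrade: only queued; current plan, limits, and cycle are untouched.
            self.pending_plan = new_plan

    def rollover(self) -> None:
        # At the end of the cycle, a queued downgrade becomes the active plan.
        if self.pending_plan is not None:
            self.active_plan = self.pending_plan
            self.pending_plan = None
```

Right after `change_plan("starter")` on a Pro subscription, `active_plan` is still `"pro"`; only `rollover()` switches it.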

Confusing Rate Limits with Billing Limits

The mistake

A developer sees their requests being limited and assumes they’ve exhausted their plan quota, when they’ve actually hit a per-IP rate limit (or vice versa).

These are two completely separate systems:

Rate limits are per-IP or per-route, measured in requests per second or per minute, and always return 429 with a Retry-After header. They prevent abuse and protect infrastructure. They have nothing to do with your plan.
Billing limits are per-organization and per-metric, measured against monthly plan quotas. They silently degrade your requests — returning 200 OK with empty results and no stored writes. No error is surfaced to your application.
How to tell the difference: Rate-limit 429s include Retry-After and resolve within seconds. Billing-quota degradation returns 200 OK with silently skipped operations and persists until cycle reset or upgrade. Watch your usage dashboard — metrics at exactly 100% of their plan limit are the signal.
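A minimal sketch of that check, assuming you already have the HTTP status, the response headers, and the current usage ratio for the metric the request consumes (used divided by plan limit, read from the usage dashboard). `classify_limit` is a hypothetical helper, not something MemorySync provides.

```python
from typing import Mapping, Optional, Tuple


def classify_limit(
    status: int, headers: Mapping[str, str], usage_ratio: float
) -> Tuple[str, Optional[float]]:
    """Return (kind, retry_after_seconds); kind is "rate_limited", "quota_degraded", or "ok"."""
    if status == 429:
        # Rate limit: back off for Retry-After and retry. Header names are case-insensitive.
        lowered = {k.lower(): v for k, v in headers.items()}
        value = lowered.get("retry-after")
        # Retry-After may also be an HTTP date; only the delay-in-seconds form is parsed here.
        retry = float(value) if value is not None and value.strip().isdigit() else None
        return "rate_limited", retry
    if status == 200 and usage_ratio >= 1.0:
        # Billing quota: 200 OK, but the write or query was silently skipped.
        return "quota_degraded", None
    return "ok", None
```

The usage ratio is what disambiguates: an empty result set on its own is not proof of degradation, because a query can legitimately match nothing.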

Expecting Background Jobs to Count as Usage

The mistake

A developer monitors their usage counter and notices it’s lower than expected, not realizing that internal background operations don’t consume quota.

What actually happens: Only externally initiated, successful API calls increment the usage counter. Internal system operations are completely free:

Memory summarization (compaction of old memories) — free.
Tier transition sweeps (moving memories between hot/warm/cold tiers) — free.
Retention enforcement (purging expired memories) — free.
SIEM forwarding (exporting audit events) — free.
Scheduled integration syncs (pulling from connected data sources) — free.
Intelligence re-evaluation (recalculating relevance scores) — free.
Session cleanup — free.

The usage counter is only advanced from API request handlers. Internal background work runs on a separate code path entirely and never touches the counter.
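If you reconcile the dashboard counter against your own logs, count only the operations the counter actually tracks. A minimal sketch, assuming your logs tag each operation with its origin and outcome; the `Event` record, its `source` labels, and `expected_usage` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    source: str      # hypothetical label: "api" (your requests) or "internal" (background work)
    metric: str      # e.g. "adds" or "retrievals"
    succeeded: bool


def expected_usage(events: Iterable[Event]) -> Counter:
    # Only externally initiated, successful API calls advance the counter;
    # summarization, tier sweeps, retention, syncs, and similar jobs are ignored.
    return Counter(e.metric for e in events if e.source == "api" and e.succeeded)
```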

Not Handling Silent Degradation

The mistake

A developer’s application stops returning contextual memories, but they don’t notice because the API returns 200 OK with no error.

What actually happens: When you exceed your monthly quota, the API still returns 200 OK, but the underlying work is skipped. Add requests do not store, embed, or index the memory — the write pipeline is skipped entirely. Query requests return an empty results array.

Why this is dangerous: Your LLM continues working but loses its memory context. Responses become generic instead of personalized. Users may complain about quality without you realizing it’s a billing issue.
How to detect it: Watch the usage dashboard for any metric sitting at 100% of its plan limit, and configure usage alerts so you are notified the moment a metric crosses your alert threshold.
How to stay ahead: Set alert thresholds well below 100% (for example, 80%) so you can upgrade or shed load before silent degradation kicks in.
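Because degraded responses carry no error, detection has to come from usage numbers rather than response codes. A minimal sketch of a per-metric threshold check, assuming you can read each metric's used count and plan limit (for example, from the usage dashboard); `check_usage` and its 80% default are illustrative, not a MemorySync API.

```python
from typing import Dict, Tuple


def check_usage(usage: Dict[str, Tuple[int, int]], alert_threshold: float = 0.8) -> Dict[str, str]:
    """Map each metric (used, limit) to "ok", "alert", or "degraded"."""
    report = {}
    for metric, (used, limit) in usage.items():
        if limit <= 0:
            raise ValueError(f"{metric}: plan limit must be positive")
        ratio = used / limit
        if ratio >= 1.0:
            report[metric] = "degraded"  # 200 OK, but adds are skipped and queries return []
        elif ratio >= alert_threshold:
            report[metric] = "alert"     # time to upgrade or shed load
        else:
            report[metric] = "ok"
    return report
```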

Ignoring Cycle Boundaries

The mistake

A developer assumes usage accumulates indefinitely, or that a burst near month-end will carry over into the next cycle.

What actually happens: Usage counters reset to zero at every cycle rollover, with no exceptions. A burst of 9,000 add requests on the last day of your cycle does not carry into the next cycle; the counter starts from zero at rollover. Unused quota does not roll over either.

No carry-over. If you use 3,000 of 10,000 adds this month, you do not get 17,000 next month. Every cycle starts at zero.
No carry-debt. If you hit your limit on day 20, that does not reduce next month’s quota. You get the full plan allowance fresh.
Rollover timing. The new cycle window is anchored to the end of the previous cycle, not to the moment the rollover happens. If your cycle ends at 3:00 AM UTC and the first request after that arrives at 9:00 AM UTC, the new cycle still starts at 3:00 AM — the 6-hour gap does not shift the window forward.
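The anchoring rule and the reset can be sketched as follows. This is a simplified model, assuming fixed-length cycles (real monthly cycles vary with the calendar) and handling only a single elapsed cycle; `Cycle` and `roll_over_if_due` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Cycle:
    start: datetime
    end: datetime
    used: dict = field(default_factory=dict)  # metric -> count for this cycle


def roll_over_if_due(cycle: Cycle, now: datetime) -> Cycle:
    """Return the next cycle if `now` is past the current one's end (single step only)."""
    if now < cycle.end:
        return cycle
    length = cycle.end - cycle.start  # assumption: every cycle has the same length
    # Anchor to the previous end, not to `now`, so late traffic does not shift the window.
    # Counters start empty: no carry-over of unused quota and no carry-debt.
    return Cycle(start=cycle.end, end=cycle.end + length)


utc = timezone.utc
old = Cycle(datetime(2024, 1, 1, 3, tzinfo=utc), datetime(2024, 1, 31, 3, tzinfo=utc), {"adds": 9_000})
new = roll_over_if_due(old, datetime(2024, 1, 31, 9, tzinfo=utc))
print(new.start, new.used)  # 2024-01-31 03:00:00+00:00 {}
```

With the cycle ending at 03:00 UTC and the first request arriving at 09:00 UTC, the new cycle still starts at 03:00, and the 9,000 adds from the old cycle are gone.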

Assuming Payment Failure Preserves the Plan

The mistake

A developer assumes that if their card fails, they’ll get a grace period to fix it while keeping their current plan limits.

What actually happens: It depends on the failure type:

Autopay (subscription renewal) failure: Immediate downgrade to Free. No grace period. Counters reset. The customer is on Free limits from the moment the failure is processed. The paid subscription itself is preserved so a successful retry can restore the plan automatically.
Manual/one-off payment failure: Plan is completely untouched. No downgrade, no counter reset. The failed event is just recorded.
Why no grace period for autopay: The cycle has ended and the customer has not paid for the next one. Continuing to provide paid service without payment would create unbounded financial exposure. The immediate downgrade is strict but fair — the system preserves the subscription so automatic retries can restore service.
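The two failure paths differ in what they touch. A minimal sketch, assuming a simple account record; `Account`, `handle_payment_failure`, the `"autopay"`/`"manual"` labels, and the event names are hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Account:
    active_plan: str      # plan whose limits are enforced now
    subscribed_plan: str  # paid subscription, kept so a successful retry can restore it
    used: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def handle_payment_failure(account: Account, kind: str) -> Account:
    if kind == "autopay":
        # Renewal failed: Free limits immediately, counters reset, subscription preserved.
        return replace(account, active_plan="free", used={},
                       events=account.events + ["autopay_failed"])
    if kind == "manual":
        # One-off payment failed: plan and counters untouched; the event is only recorded.
        return replace(account, events=account.events + ["manual_payment_failed"])
    raise ValueError(f"unknown failure kind: {kind}")
```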

Not Monitoring Budget Utilization

The mistake

A developer sets a monthly budget but doesn’t monitor utilization, then is surprised when intelligence features start degrading as the budget ceiling approaches.

What actually happens: Budgets do not stop work at a hard 100% cliff — they progressively constrain expensive intelligence work as utilization rises. If you are not watching the dashboard, you may notice quality changes before you notice the budget number.

Progressive throttling. Below your alert threshold, work runs normally. As utilization climbs above the threshold, expensive work runs in increasingly conservative modes — cheaper requests still run, but the most expensive intelligence work backs off.
Spend visibility. The dashboard shows current spend for the cycle and a linear projection to the end of the month, so you can see where you are heading before you hit the throttle tiers.
Independent of plan limits. Hitting your spending budget does not change your add and retrieval quotas, and vice versa.
How to stay ahead: Set the alert threshold conservatively and configure dashboard alerts so you are notified the moment your spend crosses it.
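A linear projection assumes spend continues at the average rate seen so far in the cycle. A minimal sketch of that arithmetic, assuming you track spend, budget, and elapsed cycle time yourself; `budget_check` and its result fields are hypothetical, and it does not model MemorySync's actual throttle tiers, which this page does not specify numerically.

```python
def budget_check(spend: float, budget: float, elapsed_days: float,
                 cycle_days: float, alert_threshold: float = 0.8) -> dict:
    if budget <= 0 or elapsed_days <= 0 or cycle_days <= 0:
        raise ValueError("budget, elapsed_days and cycle_days must be positive")
    utilization = spend / budget
    # Linear projection: average daily spend so far, extended over the whole cycle.
    projected_spend = spend / elapsed_days * cycle_days
    return {
        "utilization": utilization,
        "over_threshold": utilization >= alert_threshold,
        "projected_spend": projected_spend,
        # Early warning: the projection crosses the threshold before actual spend does.
        "projected_over_threshold": projected_spend / budget >= alert_threshold,
    }
```

For example, 40 spent against a budget of 100 after 10 of 30 days is only 40% utilization, but projects to 120 by cycle end, well past an 80% threshold.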
