How It Works
Integration Sync Flow
Integration sync brings external data sources (issue trackers, document stores, identity providers) into MemorySync as memories. This page describes the lifecycle of one sync — discovery, execution, candidate extraction, promotion to memories, and failure handling.
The three sync types
| Type | When it runs | What it pulls |
|---|---|---|
| FULL | First sync after a connection is created, or operator-triggered. | Every object the connection's scope can see. |
| INCREMENTAL | Recurring sync, every cadence tick. | Only objects modified since the last successful job's cursor. |
| WEBHOOK | Inbound webhook from the external source. | Just the object referenced in the webhook payload. |
The cadence options
A connection picks one of:
- `REALTIME`: discovered every 5 minutes by the scheduler.
- `HOURLY`: every hour on the hour.
- `DAILY`: once per day at the tenant's configured local time.
- `WEEKLY`: once per week.
- `MANUAL`: never on a schedule; only operator-triggered.
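As a rough illustration, the cadence options could map to a next due time as in the sketch below. The function name, the `daily_hour` parameter, and the choice to add seven days for WEEKLY are assumptions made for the example; the page does not say which day or time a weekly sync uses, and `now` is assumed to already be in the tenant's local time.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Cadence(Enum):
    REALTIME = "REALTIME"
    HOURLY = "HOURLY"
    DAILY = "DAILY"
    WEEKLY = "WEEKLY"
    MANUAL = "MANUAL"


def next_scheduled_sync(
    cadence: Cadence, now: datetime, daily_hour: int = 2
) -> Optional[datetime]:
    """Return the next due time, or None when the cadence is MANUAL.

    daily_hour is a hypothetical stand-in for the tenant's configured local time.
    """
    if cadence is Cadence.MANUAL:
        return None  # never scheduled; only an operator can trigger it
    if cadence is Cadence.REALTIME:
        # Assumption: due again one discovery interval (5 minutes) from now.
        return now + timedelta(minutes=5)
    if cadence is Cadence.HOURLY:
        # The next top of the hour.
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if cadence is Cadence.DAILY:
        candidate = now.replace(hour=daily_hour, minute=0, second=0, microsecond=0)
        return candidate if candidate > now else candidate + timedelta(days=1)
    # WEEKLY: no day or time is specified in the text, so simply add seven days.
    return now + timedelta(days=7)
```

Returning `None` for `MANUAL` keeps such connections out of any "due now" query without a special case in the scheduler.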
The discovery loop, every 5 minutes
1. The scheduler ticks `process_scheduled_syncs`.
2. Query connections where `next_scheduled_sync <= now()` and there is no PENDING/RUNNING job already (idempotency).
3. For each due connection, create a new `SyncJob` with type `FULL` or `INCREMENTAL` and the cursor from the last successful job.
4. Cap the per-tenant in-flight count at 3: extra connections wait their turn rather than dogpile the provider.
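The discovery steps above can be sketched in memory as follows. This is a simplified model, not the actual scheduler: the `Connection` and `SyncJob` field names are assumptions, "in-flight" is read as "RUNNING", and the sketch flips a job to RUNNING at dispatch time to keep the accounting simple, whereas in the flow described here the worker marks it RUNNING.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

MAX_RUNNING_PER_TENANT = 3  # per-tenant cap from the steps above
ACTIVE = ("PENDING", "RUNNING")


@dataclass
class Connection:
    id: str
    tenant_id: str
    next_scheduled_sync: Optional[datetime]  # None models the MANUAL cadence
    last_cursor: Optional[str] = None        # cursor from the last successful job


@dataclass
class SyncJob:
    connection_id: str
    tenant_id: str
    type: str                 # "FULL" or "INCREMENTAL"
    cursor: Optional[str]
    status: str = "PENDING"


def process_scheduled_syncs(
    connections: List[Connection], jobs: List[SyncJob], now: datetime
) -> List[SyncJob]:
    """Create jobs for due connections, then start as many as the cap allows."""
    # Steps 2-3: one new job per due connection that has no active job yet.
    busy = {j.connection_id for j in jobs if j.status in ACTIVE}
    for conn in connections:
        due = conn.next_scheduled_sync is not None and conn.next_scheduled_sync <= now
        if due and conn.id not in busy:
            job_type = "FULL" if conn.last_cursor is None else "INCREMENTAL"
            jobs.append(SyncJob(conn.id, conn.tenant_id, job_type, conn.last_cursor))
            busy.add(conn.id)

    # Step 4: start PENDING jobs oldest first while the tenant is under its cap.
    running = defaultdict(int)
    for job in jobs:
        if job.status == "RUNNING":
            running[job.tenant_id] += 1
    started = []
    for job in jobs:
        if job.status == "PENDING" and running[job.tenant_id] < MAX_RUNNING_PER_TENANT:
            job.status = "RUNNING"
            running[job.tenant_id] += 1
            started.append(job)
    return started
```

Jobs that do not fit under the cap stay PENDING and are picked up by a later call, which matches the "no error" behaviour in the failure table.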
What one sync job actually does
1. Mark the job `RUNNING` and record start timestamp.
2. Call the provider with the cursor and a page size.
3. For each returned object: upsert an `ExternalObject` row keyed by `(connection_id, source_object_id)`.
4. Extract memory candidates from the object. One provider object can produce zero, one, or many memory candidates.
5. Validate each candidate (length, semantic checks); promote validated candidates to `memories` rows with metadata pointing back to the `ExternalObject` for provenance.
6. Page until the provider says no more, or the per-job ceiling hits.
7. On success: update the job status to `SUCCESS`, advance the cursor, and set `next_scheduled_sync` based on cadence.
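A minimal sketch of the job loop described above, using plain dictionaries in place of database rows. The callback signatures, the shape of a provider page, and the `max_objects` ceiling value are all assumptions for illustration. The sketch also treats hitting the ceiling as a successful stop, which the page does not confirm, and it leaves out dedup on re-sync and the `next_scheduled_sync` update.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical provider page: (objects, cursor after this page, more pages exist).
Page = Tuple[List[dict], str, bool]


def run_sync_job(
    connection_id: str,
    cursor: Optional[str],
    fetch_page: Callable[[Optional[str], int], Page],
    extract_candidates: Callable[[dict], List[str]],
    is_valid: Callable[[str], bool],
    external_objects: Dict[Tuple[str, str], dict],
    memories: List[dict],
    page_size: int = 100,
    max_objects: int = 1000,  # hypothetical per-job ceiling
) -> dict:
    # Step 1: mark the job RUNNING and record the start timestamp.
    job = {
        "status": "RUNNING",
        "started_at": datetime.now(timezone.utc),
        "cursor": cursor,
        "objects_seen": 0,
    }
    has_more = True
    # Step 6: keep paging until the provider is exhausted or the ceiling is hit.
    while has_more and job["objects_seen"] < max_objects:
        # Step 2: call the provider with the cursor and a page size.
        objects, next_cursor, has_more = fetch_page(job["cursor"], page_size)
        for obj in objects:
            # Step 3: upsert keyed by (connection_id, source_object_id).
            external_objects[(connection_id, obj["id"])] = obj
            # Steps 4-5: zero, one, or many candidates; keep only valid ones.
            for text in extract_candidates(obj):
                if is_valid(text):
                    memories.append(
                        {
                            "text": text,
                            "connection_id": connection_id,
                            "source_object_id": obj["id"],
                        }
                    )
        job["objects_seen"] += len(objects)
        job["cursor"] = next_cursor
    # Step 7: the final cursor is what the next INCREMENTAL job would start from.
    job["status"] = "SUCCESS"
    return job
```

Passing the provider call, extractor, and validator in as callables keeps the loop itself provider-agnostic, so the same skeleton could serve FULL and INCREMENTAL jobs with different starting cursors.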
How failures recover
| Failure | Behaviour |
|---|---|
| Provider rate-limited. | Backoff: 60 s → 300 s → 900 s, max 3 retries. Cursor preserved between retries. |
| Provider returned a permanent error. | Job marked FAILURE, cursor preserved, SyncFailure row written with the error. Operator-triggered retry resumes from the cursor. |
| Worker died mid-job. | Job stays RUNNING past the staleness window; the next discovery tick marks it FAILURE and queues a retry. |
| Per-tenant cap hit. | Job is left PENDING and picked up the next tick — no error. |
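The rate-limit row can be illustrated with a small retry helper. This is a sketch under assumptions: `RateLimited` and `call_with_backoff` are hypothetical names, and the `sleep` parameter exists only so the delays can be observed without waiting. The delays and the retry limit come from the table.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BACKOFF_SECONDS = (60, 300, 900)  # delays from the table; at most 3 retries


class RateLimited(Exception):
    """Hypothetical error raised when the provider rejects a call for rate limiting."""


def call_with_backoff(
    call: Callable[[Optional[str]], T],
    cursor: Optional[str],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try once, then retry up to three times, passing the same cursor each time."""
    # The trailing None marks the final attempt, after which the error propagates.
    for delay in (*BACKOFF_SECONDS, None):
        try:
            return call(cursor)
        except RateLimited:
            if delay is None:
                raise
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Because the cursor is passed unchanged on every attempt, a retry asks the provider for the same page rather than skipping ahead.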
Provenance — every memory points home
Every memory created by sync carries metadata fields external_object_id and source_object_id. A row in the dashboard's memory list can therefore be linked back to the exact provider object that produced it. When the provider object is updated and re-synced, the existing memory is updated in place rather than duplicated — the linkage is the dedup key.
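To make the dedup behaviour concrete, here is a minimal in-memory sketch of an upsert keyed on the two provenance fields. `MemoryStore` and `upsert_from_sync` are hypothetical names, and the sketch assumes one memory per provider object; since one object can yield several candidates, a real key would likely need a further discriminator that this page does not describe.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Tuple


@dataclass
class Memory:
    id: int
    text: str
    metadata: dict


class MemoryStore:
    """Toy store that uses the provenance linkage as its dedup key."""

    def __init__(self) -> None:
        self._by_link: Dict[Tuple[str, str], Memory] = {}
        self._ids = count(1)

    def __len__(self) -> int:
        return len(self._by_link)

    def upsert_from_sync(
        self, external_object_id: str, source_object_id: str, text: str
    ) -> Memory:
        link = (external_object_id, source_object_id)
        existing = self._by_link.get(link)
        if existing is not None:
            # Re-sync of an updated provider object: update in place, keep the id.
            existing.text = text
            return existing
        memory = Memory(
            id=next(self._ids),
            text=text,
            metadata={
                "external_object_id": external_object_id,
                "source_object_id": source_object_id,
            },
        )
        self._by_link[link] = memory
        return memory
```

Keeping the memory's own `id` stable across re-syncs means anything that already references the memory keeps pointing at the updated text.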
What sync deliberately does not do
- It does not delete memories when the provider object is deleted, by default. Tenants can opt into hard deletion via a connection setting; the default is soft archival.
- It does not bypass the validator. A provider object with low-signal text is skipped just like an SDK call would be.
- It does not embed inside the sync transaction. Embedding happens during the candidate-promotion step; if it fails there, the memory commits in `pending_embedding` and a background task fills the vector in.
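The embedding fallback in the last point might look roughly like the sketch below. The function names, the dictionary shape, and the `"embedded"` status value are assumptions; only `pending_embedding` comes from the text. Catching a broad `Exception` is a simplification for the example.

```python
from typing import Callable, List, Optional

Embedder = Callable[[str], List[float]]


def promote_candidate(text: str, embed: Embedder, memories: List[dict]) -> dict:
    """Commit the memory even when the embedding call fails."""
    vector: Optional[List[float]]
    try:
        vector = embed(text)
        status = "embedded"  # hypothetical name for the normal state
    except Exception:
        # The memory is still stored; only the vector is missing.
        vector, status = None, "pending_embedding"
    memory = {"text": text, "vector": vector, "status": status}
    memories.append(memory)
    return memory


def backfill_embeddings(memories: List[dict], embed: Embedder) -> int:
    """Background task: fill vectors for memories left in pending_embedding."""
    filled = 0
    for memory in memories:
        if memory["status"] != "pending_embedding":
            continue
        try:
            memory["vector"] = embed(memory["text"])
        except Exception:
            continue  # still failing; leave it for the next run
        memory["status"] = "embedded"
        filled += 1
    return filled
```

The point of the split is that an embedding outage delays vector search for the affected memories but does not lose them or fail the sync job.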