Integration Sync Flow

Integration sync brings external data sources (issue trackers, document stores, identity providers) into MemorySync as memories. This page describes the lifecycle of one sync — discovery, execution, candidate extraction, promotion to memories, and failure handling.

The three sync types

Type	When it runs	What it pulls
FULL	First sync after a connection is created, or operator-triggered.	Every object the connection's scope can see.
INCREMENTAL	Recurring sync, every cadence tick.	Only objects modified since the last successful job's cursor.
WEBHOOK	Inbound webhook from the external source.	Just the object referenced in the webhook payload.

The cadence options

A connection picks one of:

REALTIME — discovered every 5 minutes by the scheduler.
HOURLY — every hour on the hour.
DAILY — once per day at the tenant's configured local time.
WEEKLY — once per week.
MANUAL — never on a schedule; only operator-triggered.
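As a rough illustration of how a cadence could translate into next_scheduled_sync, here is a minimal sketch. The Cadence enum, the function signature, and the daily_hour parameter are hypothetical. It also assumes that REALTIME connections become due again immediately (so the next 5-minute tick picks them up), that WEEKLY means seven days after the previous run, and that timestamps are already in the tenant's local time.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Cadence(Enum):
    REALTIME = "REALTIME"
    HOURLY = "HOURLY"
    DAILY = "DAILY"
    WEEKLY = "WEEKLY"
    MANUAL = "MANUAL"


def next_scheduled_sync(cadence: Cadence, finished_at: datetime,
                        daily_hour: int = 2) -> Optional[datetime]:
    """Return when the connection is next due, or None for MANUAL."""
    if cadence is Cadence.MANUAL:
        return None  # never scheduled; only operator-triggered
    if cadence is Cadence.REALTIME:
        # Assumption: due right away, so the next discovery tick picks it up.
        return finished_at
    if cadence is Cadence.HOURLY:
        # "On the hour": truncate to the hour, then step forward one hour.
        top_of_hour = finished_at.replace(minute=0, second=0, microsecond=0)
        return top_of_hour + timedelta(hours=1)
    if cadence is Cadence.DAILY:
        # Assumption: finished_at is already in the tenant's local time.
        candidate = finished_at.replace(hour=daily_hour, minute=0,
                                        second=0, microsecond=0)
        if candidate <= finished_at:
            candidate += timedelta(days=1)
        return candidate
    # WEEKLY. Assumption: seven days after the previous run finished.
    return finished_at + timedelta(days=7)
```

Returning None for MANUAL keeps those connections out of any "due" query that compares next_scheduled_sync against the current time.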

The discovery loop, every 5 minutes

1. Scheduler ticks process_scheduled_syncs.
2. Query connections where next_scheduled_sync <= now() and there is no PENDING/RUNNING job already (idempotency).
3. For each due connection, create a new SyncJob with type FULL or INCREMENTAL and the cursor from the last successful job.
4. Cap the per-tenant in-flight count at 3; extra connections wait their turn rather than dogpile the provider.
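The discovery steps above can be sketched in memory as follows. This is an illustration under assumptions, not the actual implementation: the Connection and SyncJob field names are hypothetical, a connection with no cursor is treated as needing a FULL sync, "in-flight" is interpreted as RUNNING jobs, and marking a job RUNNING stands in for handing it to a worker.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

MAX_RUNNING_PER_TENANT = 3  # per-tenant cap from the text


@dataclass
class Connection:  # hypothetical shape
    id: str
    tenant_id: str
    next_scheduled_sync: Optional[datetime]  # None models MANUAL
    last_cursor: Optional[str] = None        # cursor of the last successful job


@dataclass
class SyncJob:  # hypothetical shape
    connection_id: str
    tenant_id: str
    type: str                # "FULL" or "INCREMENTAL"
    cursor: Optional[str]
    status: str = "PENDING"


def process_scheduled_syncs(connections: List[Connection],
                            jobs: List[SyncJob],
                            now: datetime) -> List[SyncJob]:
    """Run one discovery tick; appends to `jobs` and returns the jobs started."""
    # Idempotency: skip connections that already have a PENDING/RUNNING job.
    busy = {j.connection_id for j in jobs if j.status in ("PENDING", "RUNNING")}
    for conn in connections:
        due = (conn.next_scheduled_sync is not None
               and conn.next_scheduled_sync <= now)
        if due and conn.id not in busy:
            # Assumption: no cursor yet means this is the first (FULL) sync.
            job_type = "FULL" if conn.last_cursor is None else "INCREMENTAL"
            jobs.append(SyncJob(conn.id, conn.tenant_id, job_type,
                                conn.last_cursor))
            busy.add(conn.id)

    # Start PENDING jobs up to the per-tenant cap; the rest wait for a later tick.
    running = Counter(j.tenant_id for j in jobs if j.status == "RUNNING")
    started = []
    for job in jobs:
        if job.status == "PENDING" and running[job.tenant_id] < MAX_RUNNING_PER_TENANT:
            job.status = "RUNNING"
            running[job.tenant_id] += 1
            started.append(job)
    return started
```

Running the same tick twice creates no duplicate jobs, and a tenant with more than three due connections simply keeps the extras in PENDING until a slot frees up.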

What one sync job actually does

1. Mark the job RUNNING and record start timestamp.
2. Call the provider with the cursor and a page size.
3. For each returned object: upsert an ExternalObject row keyed by (connection_id, source_object_id).
4. Extract memory candidates from the object; one provider object can produce zero, one, or many memory candidates.
5. Validate each candidate (length, semantic checks); promote validated candidates to memories rows with metadata pointing back to the ExternalObject for provenance.
6. Page until the provider says no more, or the per-job ceiling hits.
7. On success: update the job status to SUCCESS, advance the cursor, and set next_scheduled_sync based on cadence.
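The job steps above map onto a paging loop roughly like the one below. Everything here is a sketch under assumptions: the Job and Store classes, the fetch_page callback returning (objects, next_cursor, has_more), and the extract_candidates and is_valid callbacks are all hypothetical stand-ins. The sketch also assumes that hitting the per-job ceiling still counts as success, and it leaves out setting next_scheduled_sync.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical provider page shape: (objects, next_cursor, has_more).
Page = Tuple[List[dict], str, bool]


@dataclass
class Job:
    connection_id: str
    cursor: Optional[str] = None
    status: str = "PENDING"
    started_at: Optional[float] = None


@dataclass
class Store:
    # ExternalObject rows keyed by (connection_id, source_object_id).
    external_objects: Dict[Tuple[str, str], dict] = field(default_factory=dict)
    # Assumption: memories keyed by object plus candidate index.
    memories: Dict[Tuple[str, str, int], dict] = field(default_factory=dict)


def run_sync_job(job: Job, store: Store,
                 fetch_page: Callable[[Optional[str], int], Page],
                 extract_candidates: Callable[[dict], List[str]],
                 is_valid: Callable[[str], bool],
                 page_size: int = 100, max_objects: int = 1000) -> Job:
    job.status = "RUNNING"
    job.started_at = time.time()
    cursor, seen = job.cursor, 0
    while True:
        objects, cursor, has_more = fetch_page(cursor, page_size)
        for obj in objects:
            source_id = obj["id"]
            # Upsert: writing the same key again overwrites the old row.
            store.external_objects[(job.connection_id, source_id)] = obj
            for index, text in enumerate(extract_candidates(obj)):
                if is_valid(text):
                    store.memories[(job.connection_id, source_id, index)] = {
                        "text": text,
                        "source_object_id": source_id,  # provenance link
                    }
        seen += len(objects)
        if not has_more or seen >= max_objects:
            break  # provider exhausted, or per-job ceiling reached
    job.status = "SUCCESS"
    job.cursor = cursor  # advance the cursor only once the loop completes
    return job
```

Because the cursor is written back only after the loop finishes, an exception raised mid-loop leaves job.cursor at its previous value, which matches the "cursor preserved" behaviour described for failures.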

How failures recover

Failure	Behaviour
Provider rate-limited.	Backoff: 60 s → 300 s → 900 s, max 3 retries. Cursor preserved between retries.
Provider returned a permanent error.	Job marked `FAILURE`, cursor preserved, `SyncFailure` row written with the error. Operator-triggered retry resumes from the cursor.
Worker died mid-job.	Job stays `RUNNING` past the staleness window; the next discovery tick marks it `FAILURE` and queues a retry.
Per-tenant cap hit.	Job is left `PENDING` and picked up the next tick — no error.
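The rate-limit row in the table above can be sketched as a fixed backoff schedule. The exception class and the call_with_backoff helper are hypothetical names, and re-raising once the three retries are used up is an assumption, since the table does not say what happens next.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BACKOFF_SECONDS = (60, 300, 900)  # schedule from the table: at most 3 retries


class RateLimited(Exception):
    """Hypothetical error raised when the provider rate-limits a call."""


def call_with_backoff(call: Callable[[Optional[str]], T],
                      cursor: Optional[str],
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call the provider; on rate limiting, wait and retry with the same cursor."""
    # One initial attempt plus one retry per backoff delay.
    for delay in (*BACKOFF_SECONDS, None):
        try:
            return call(cursor)  # the cursor is unchanged between retries
        except RateLimited:
            if delay is None:
                raise  # assumption: give up once all retries are exhausted
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing sleep as a parameter makes the schedule easy to check in a test without actually waiting.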

Provenance — every memory points home

Every memory created by sync carries metadata fields external_object_id and source_object_id. A row in the dashboard's memory list can therefore be linked back to the exact provider object that produced it. When the provider object is updated and re-synced, the existing memory is updated in place rather than duplicated — the linkage is the dedup key.
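A minimal sketch of the update-in-place behaviour, using a dictionary keyed by the two provenance fields. The upsert_memory helper and the record layout are hypothetical, and for simplicity the sketch assumes one memory per provider object.

```python
from typing import Dict, Tuple

# Key: (external_object_id, source_object_id), the two provenance fields.
MemoryKey = Tuple[str, str]


def upsert_memory(memories: Dict[MemoryKey, dict],
                  external_object_id: str,
                  source_object_id: str,
                  text: str) -> bool:
    """Insert or update a memory; return True if a new memory was created."""
    key = (external_object_id, source_object_id)
    created = key not in memories
    memories[key] = {
        "text": text,
        "metadata": {
            "external_object_id": external_object_id,
            "source_object_id": source_object_id,
        },
    }
    return created


memories: Dict[MemoryKey, dict] = {}
upsert_memory(memories, "ext-1", "ISSUE-42", "Login fails on Safari")
# Re-syncing the same provider object updates the memory instead of adding one.
upsert_memory(memories, "ext-1", "ISSUE-42", "Login fails on Safari 17")
```

After both calls the dictionary holds a single entry with the newer text, which is the dedup behaviour the paragraph describes.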

What sync deliberately does not do

It does not delete memories when the provider object is deleted, by default. Tenants can opt into hard deletion via a connection setting; the default is soft archival.
It does not bypass the validator. A provider object with low-signal text is skipped just like an SDK call would be.
It does not embed inside the sync transaction. Embedding is attempted during the candidate-promotion step; if it fails, the memory still commits with status pending_embedding and a background task fills in the vector later.
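The embedding fallback in the last item might look roughly like this. The promote and backfill_embeddings functions, the embed callback, and the "embedded" status name are all hypothetical; only pending_embedding comes from the text.

```python
from typing import Callable, List, Optional

Embedder = Callable[[str], List[float]]


def promote(text: str, embed: Embedder) -> dict:
    """Promote a validated candidate; an embedding failure does not block it."""
    try:
        vector: Optional[List[float]] = embed(text)
        status = "embedded"  # hypothetical name for the healthy state
    except Exception:
        # The memory is still stored; the vector is filled in later.
        vector, status = None, "pending_embedding"
    return {"text": text, "vector": vector, "status": status}


def backfill_embeddings(memories: List[dict], embed: Embedder) -> int:
    """Background pass: retry embedding for memories still pending."""
    filled = 0
    for memory in memories:
        if memory["status"] != "pending_embedding":
            continue
        try:
            memory["vector"] = embed(memory["text"])
        except Exception:
            continue  # still failing; leave it for the next pass
        memory["status"] = "embedded"
        filled += 1
    return filled
```

The point of the split is that a flaky embedding service delays vector search for a memory but never causes the memory itself to be lost.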
