GitHub
The GitHub integration syncs repositories, issues, pull requests, commits, and discussions into MemorySync as queryable memories. It uses the GitHub REST API with OAuth 2.0 authentication and supports incremental sync to avoid re-fetching unchanged data.
What gets synced
The GitHub provider fetches the following object types, each becoming a separate memory:
| Type | Content |
|---|---|
| repository | Repository description + README contents. Metadata includes full_name, language, stars, forks, topics, default_branch. |
| issue | Issue body text. Both open and closed issues are fetched. Metadata: number, state, labels, assignees, milestone. |
| pull_request | PR body text. Metadata: number, state, base_branch, head_branch, draft, merged, additions, deletions, changed_files. |
| commit | Commit message. Up to 50 recent commits per repository. Metadata: sha, author, verified, stats (additions, deletions, total). |
| pr_comment | Review comments on pull requests. |
| discussion | GitHub Discussions threads. |
Required OAuth scopes
| Scope | Why it's needed |
|---|---|
| repo | Read access to private repositories, issues, pull requests, and commits. Use public_repo if you only want public repos. |
| read:org | Read organization membership to list repos the user has access to. |
| read:discussion | Read access to GitHub Discussions. |
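If a sync fails with permission errors, it can help to check which scopes a token was actually granted. GitHub reports them in the `X-OAuth-Scopes` response header as a comma-separated list. The following is a minimal sketch of such a check; the helper name `missing_scopes` and the `REQUIRED_SCOPES` constant are illustrative and not part of MemorySync.

```python
# Scopes listed in the table above.
REQUIRED_SCOPES = {"repo", "read:org", "read:discussion"}


def missing_scopes(x_oauth_scopes: str, required=REQUIRED_SCOPES) -> set[str]:
    """Return the required scopes absent from an X-OAuth-Scopes header value.

    Hypothetical helper for illustration only.
    """
    granted = {s.strip() for s in x_oauth_scopes.split(",") if s.strip()}
    # `repo` is a superset of `public_repo`. Other parent scopes are not
    # expanded in this sketch.
    if "repo" in granted:
        granted.add("public_repo")
    return set(required) - granted
```

For example, a token whose header reads `repo, read:org` would be reported as missing `read:discussion`.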
Setup guide
1. Open the dashboard: navigate to Dashboard → Integrations → GitHub.
2. Click Connect: you will be redirected to GitHub's OAuth consent screen. MemorySync has already registered its GitHub app, so you don't need to configure anything on GitHub.
3. Authorize the requested scopes: review the permissions GitHub displays and approve. You will be redirected back to the dashboard.
4. Select repositories: choose which repositories to sync. The initial sync starts immediately and fetches all selected content.
Incremental sync
After the initial sync, the GitHub provider syncs incrementally to minimize API calls:
- Repository-level skip: each repo's `updated_at` timestamp is compared to the last sync time. Repos that haven't changed are skipped entirely.
- Issues: uses the `since` parameter to fetch only issues updated after the last sync.
- Pull requests: sorted by `updated` descending with a cutoff date. Once a PR older than the last sync is encountered, pagination stops.
- Commits: limited to the most recent 50 per repository per sync cycle.
- Pagination safety: a hard limit of 10 pages per API call prevents runaway pagination on very large repositories.
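The strategies above can be sketched in a few small functions. This is an illustration under stated assumptions, not MemorySync's actual implementation: `fetch_page` is a hypothetical callable that returns one page of pull requests already sorted by `updated` descending, and all function names are invented for the example.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator

MAX_PAGES = 10  # hard pagination cap described above
GITHUB_TS = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the REST API


def parse_ts(value: str) -> datetime:
    """Parse a GitHub timestamp such as 2024-05-01T12:00:00Z as UTC."""
    return datetime.strptime(value, GITHUB_TS).replace(tzinfo=timezone.utc)


def repo_needs_sync(repo: dict, last_sync: datetime) -> bool:
    """Repository-level skip: only sync repos updated since the last sync."""
    return parse_ts(repo["updated_at"]) > last_sync


def issue_query(last_sync: datetime) -> dict:
    """Query parameters for listing issues updated after the last sync."""
    since = last_sync.astimezone(timezone.utc).strftime(GITHUB_TS)
    return {"state": "all", "since": since, "per_page": 100}


def changed_pull_requests(
    fetch_page: Callable[[int], list], last_sync: datetime
) -> Iterator[dict]:
    """Yield PRs updated after last_sync, stopping at the first older one."""
    for page in range(1, MAX_PAGES + 1):
        prs = fetch_page(page)  # hypothetical: one page, newest update first
        if not prs:
            return
        for pr in prs:
            if parse_ts(pr["updated_at"]) <= last_sync:
                return  # everything after this is older; stop paginating
            yield pr
```

Because the pull request list is sorted newest-first, the early `return` is what keeps API usage proportional to the number of changed PRs rather than to the size of the repository.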
Memory structure
Each synced object produces a memory with type-specific metadata. Here's what gets stored for each type:
| Type | Content | Key metadata fields |
|---|---|---|
| Repository | Description + README | full_name, language, stars, forks, topics |
| Issue | Issue body | repo, number, state, labels, assignees |
| Pull request | PR body | base_branch, head_branch, draft, merged, additions/deletions |
| Commit | Commit message | sha, author, verified, stats |
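To make the table concrete, here is a rough sketch of how an issue object from the GitHub REST API could be mapped to a memory with the metadata fields listed above. The `Memory` class and `issue_to_memory` function are assumptions made for illustration; the actual stored shape in MemorySync may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical in-memory representation of a synced object."""
    type: str
    content: str
    metadata: dict = field(default_factory=dict)


def issue_to_memory(repo_full_name: str, issue: dict) -> Memory:
    """Map a GitHub REST API issue object to a memory (illustrative only)."""
    milestone = issue.get("milestone") or {}
    return Memory(
        type="issue",
        # The issue body can be null in the API response.
        content=issue.get("body") or "",
        metadata={
            "repo": repo_full_name,
            "number": issue["number"],
            "state": issue["state"],
            # Labels may arrive as objects or as plain strings.
            "labels": [
                label["name"] if isinstance(label, dict) else label
                for label in issue.get("labels", [])
            ],
            "assignees": [user["login"] for user in issue.get("assignees", [])],
            "milestone": milestone.get("title"),
        },
    )
```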
Webhook events
When a GitHub webhook is configured, the provider parses incoming events to create, update, or delete memories in real time:
- `issues.opened` / `issues.edited` / `issues.closed`: creates or updates the corresponding issue memory.
- `pull_request.opened` / `pull_request.edited` / `pull_request.closed` / `pull_request.merged`: creates or updates the PR memory.
- `discussion.*`: handles discussion creation and updates.
- `issues.deleted` / `pull_request.deleted`: marks the memory for deletion.
The provider validates the webhook signature to ensure the payload is authentic before processing.
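GitHub signs each webhook delivery with an HMAC-SHA256 digest of the raw request body, sent in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below shows one way such a check is typically written; the function name is illustrative, and it is not necessarily how the MemorySync provider implements it.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_signature(
    secret: str, payload: bytes, signature_header: Optional[str]
) -> bool:
    """Check a webhook body against its X-Hub-Signature-256 header value."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

The digest must be computed over the raw bytes of the request body, before any JSON parsing, since re-serializing the payload can change its byte representation.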
Rate limits & tips
- 5,000 requests per hour: GitHub enforces this limit for authenticated OAuth requests. MemorySync tracks `X-RateLimit-Remaining` and pauses sync when approaching the limit.
- Pagination capped at 10 pages: prevents runaway API consumption on very large repos with thousands of issues.
- Empty repos return HTTP 409: the provider handles this gracefully and skips the repository without failing the sync.
- Bot activity is skipped: commits and issues authored by bots are excluded from sync to reduce noise.
- Sync is one-way: MemorySync never writes back to GitHub. All data flows from GitHub into MemorySync.
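The rate-limit and bot-filtering behavior can be approximated as follows. This is a sketch under assumptions: the threshold `RATE_LIMIT_FLOOR` and all function names are invented for illustration, and the bot check relies on the `type` and `login` fields GitHub includes on user objects.

```python
import time
from typing import Mapping, Optional

RATE_LIMIT_FLOOR = 100  # assumed threshold; the real value is not documented


def seconds_to_pause(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    floor: int = RATE_LIMIT_FLOOR,
) -> float:
    """Return how long to wait before the next request, or 0.0 if none."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", floor + 1))
    if remaining > floor:
        return 0.0
    # X-RateLimit-Reset is the UTC epoch second at which the window resets.
    reset_at = float(lowered.get("x-ratelimit-reset", 0))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


def is_empty_repo_response(status_code: int) -> bool:
    """GitHub answers 409 when listing commits on an empty repository."""
    return status_code == 409


def is_bot(user: Optional[dict]) -> bool:
    """Heuristic bot check on a GitHub user object."""
    if not user:
        return False
    return user.get("type") == "Bot" or user.get("login", "").endswith("[bot]")
```

A sync loop would call `seconds_to_pause` after each response and sleep for the returned duration, skip a repository when `is_empty_repo_response` is true, and drop items whose author satisfies `is_bot`.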