# Recrawl (Force Fresh GitHub Fetch)
Recrawl tells the crawler to throw away its “this file is already indexed” tracking and fetch repos and docs from GitHub again. Use it when content is gone from Meilisearch and PostgreSQL doesn’t have it either.
The crawler ships doc content (markdown files in your repos) directly to Meilisearch. PostgreSQL only stores crawl-tracking metadata: which file was last seen, what its SHA was, whether it’s already indexed. So if Meilisearch loses data, Backfill can rebuild entities, repositories, K8s workloads, and molds from PostgreSQL — but docs are gone. Recrawl is the recovery path for docs.
## When to Use

- Meilisearch was wiped or corrupted, and you need docs back
- Doc content in search is stale or missing entire repos
- After a Meilisearch upgrade where you didn’t import a dump
If only metadata is missing (entities, repositories, etc.), use Backfill — it’s faster and doesn’t hit the GitHub API.
## What It Does

The endpoint runs three steps:
1. Sets `document_crawl_status.indexed = false` for the caller’s tenant. The crawler short-circuits on `indexed = true` rows, so this unblocks doc re-indexing.
2. Sets `repository_crawl_status.next_crawl_at = NOW()` and `status = 'pending'` so the scheduler picks the repos up immediately.
3. Calls the crawler over gRPC to start a doc crawl right away. Without this kick the next scheduler tick can be up to an hour out.
Both DB updates run in a single transaction under the caller’s tenant context (RLS-scoped). The crawler runs the actual fetch async.
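The reset half of the flow can be sketched as a small simulation. This is hypothetical Python with in-memory rows standing in for `document_crawl_status` and `repository_crawl_status`; the real service does this in SQL inside one RLS-scoped transaction, then fires the gRPC kick separately.

```python
from datetime import datetime, timezone

def recrawl_reset(doc_rows, repo_rows, repositories=None):
    """Simulate the two DB updates behind /admin/recrawl.

    doc_rows:     dicts with 'repo' and 'indexed' (document_crawl_status)
    repo_rows:    dicts with 'repo', 'status', 'next_crawl_at'
                  (repository_crawl_status)
    repositories: optional owner/repo filter; None means whole tenant.
    """
    wanted = lambda repo: repositories is None or repo in repositories

    reset_documents = 0
    for row in doc_rows:
        if wanted(row["repo"]) and row["indexed"]:
            row["indexed"] = False          # crawler stops short-circuiting
            reset_documents += 1

    reset_repositories = 0
    for row in repo_rows:
        if wanted(row["repo"]):
            row["status"] = "pending"       # scheduler picks it up
            row["next_crawl_at"] = datetime.now(timezone.utc)
            reset_repositories += 1

    return {"reset_documents": reset_documents,
            "reset_repositories": reset_repositories}
```

Note that a second call right after the first reports `reset_documents` of 0: every `indexed` flag is already `false`, which matches the response semantics described below.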
## Endpoint

### Trigger Recrawl (Whole Tenant)

```shell
curl -X POST https://your-domain/api/v1/admin/recrawl \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json"
```

Response on success:
```json
{
  "message": "Recrawl started",
  "scope": "tenant",
  "reset_documents": 7629,
  "reset_repositories": 57,
  "operation_id": "recrawl-f6b08153-..."
}
```

`reset_documents` counts rows where `indexed` was flipped from `true` to `false`. On a second call right after the first, this number is 0 because nothing else needs flipping.
### Trigger Recrawl (Specific Repos)

```shell
curl -X POST https://your-domain/api/v1/admin/recrawl \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"repositories": ["adaptive-labs/demo-1", "adaptive-labs/docs-site"]}'
```

Response:
```json
{
  "message": "Recrawl started",
  "scope": "filtered",
  "repositories": ["adaptive-labs/demo-1", "adaptive-labs/docs-site"],
  "reset_documents": 412,
  "reset_repositories": 2,
  "operation_id": "recrawl-..."
}
```

Repository names use `owner/repo` form. Validation enforces a permissive regex (letters, digits, dots, hyphens, underscores) and rejects anything else with `400 INVALID_REQUEST`.
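The exact server-side regex isn’t published, but a client-side pre-check matching the described rules (letters, digits, dots, hyphens, underscores on both sides of a single slash, with the 255-character cap from the limits table) might look like this. The pattern here is an assumption; the server’s regex may differ in detail.

```python
import re

# Permissive owner/repo check: letters, digits, dots, hyphens, underscores.
# This mirrors the documented validation rules, not the server's exact regex.
REPO_NAME = re.compile(r"^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$")

def valid_repo_name(name: str) -> bool:
    """Pre-validate a repository name before sending it to /admin/recrawl."""
    return len(name) <= 255 and bool(REPO_NAME.fullmatch(name))
```

Pre-validating locally lets you reject bad names before burning a request that would come back as `400 INVALID_REQUEST`.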
## Limits

| Limit | Value |
|---|---|
| Max request body size | 64 KiB |
| Max repositories per call | 500 |
| Max repository name length | 255 chars |
If you need to recrawl more than 500 repos, omit the repositories field — that resets every repo in the caller’s tenant in one call.
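A hedged helper for building the request body under these limits: it omits the `repositories` field (falling back to a whole-tenant recrawl) when the 500-repo cap would be exceeded, and guards the 64 KiB body size. The function name and fallback policy are illustrative, not part of the API.

```python
import json

MAX_REPOS = 500
MAX_BODY_BYTES = 64 * 1024  # 64 KiB request-body cap

def build_recrawl_body(repositories=None):
    """Build a JSON body for POST /api/v1/admin/recrawl.

    Falls back to a whole-tenant recrawl (empty body, no 'repositories'
    field) when the list would exceed the 500-repo cap.
    """
    if repositories and len(repositories) <= MAX_REPOS:
        body = json.dumps({"repositories": repositories})
    else:
        body = "{}"  # field omitted: resets every repo in the tenant
    if len(body.encode()) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds 64 KiB")
    return body
```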
## Response Codes

| Code | Meaning |
|---|---|
| 202 | Reset committed. Crawler triggered (or queued for the next tick if the trigger failed). |
| 400 | Invalid body, invalid repo name, or repo cap exceeded. |
| 401 | No tenant context. The token is missing or malformed. |
| 403 | Token isn’t authorized for `admin:write`. |
| 500 | Reset failed. Check API logs. |
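In a client, the table above translates into a small dispatch on the status code. A sketch (the action strings are illustrative, not API output):

```python
def interpret_recrawl_status(code: int) -> str:
    """Map /admin/recrawl response codes to a suggested next action."""
    if code == 202:
        return "accepted: reset committed; crawler triggered or queued"
    if code == 400:
        return "fix the request: body, repo names, or the 500-repo cap"
    if code == 401:
        return "supply a valid token: no tenant context"
    if code == 403:
        return "token lacks admin:write"
    if code == 500:
        return "reset failed: check API logs, then retry"
    return f"unexpected status {code}"
```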
## When the Crawler Trigger Fails

If the immediate gRPC trigger to the crawler fails (network blip, crawler down), the response still returns 202 but with:
```json
{
  "message": "Recrawl state reset; immediate trigger failed (will run on next scheduler tick)",
  "trigger_error": "could not reach crawler service"
}
```

The DB state is reset regardless, so the crawler picks the work up on its next periodic tick (within an hour). You don’t need to retry, but you can if you want immediate execution.
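The important client-side consequence: a 202 with `trigger_error` is still a success and needs no retry. A minimal classifier, assuming only the response shape shown above (the outcome labels are invented for illustration):

```python
def recrawl_outcome(status_code: int, payload: dict) -> str:
    """Classify a recrawl response. A 202 carrying trigger_error still
    means the DB reset committed; the crawl runs on the next tick."""
    if status_code != 202:
        return "failed"
    if payload.get("trigger_error"):
        return "reset-committed-deferred"   # no retry required
    return "reset-committed-running"
```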
## Tenant Scope

The DB reset only affects the caller’s tenant. RLS enforces this at the database layer, so even an attempt to specify another tenant’s repository would resolve to zero rows.
The crawler-side trigger currently runs against the crawler’s process tenant context. In a single-tenant-per-crawler deployment this is correct. Multi-tenant crawler deployments are not yet supported by this endpoint — track this on the platform repo if you need it.
## Cost Awareness

Recrawl issues fresh GitHub API calls. A whole-tenant recrawl with 1,000 repos can consume a few thousand requests against your GitHub rate limit. Watch the crawler’s `Rate limits refreshed` log lines if you’re close to the cap.
There’s no per-tenant cooldown today — be deliberate about when you call this.
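For planning, a back-of-envelope estimator can help you decide whether a whole-tenant recrawl fits in your remaining rate limit. The per-repo and per-doc call counts below are assumptions chosen to match the “few thousand requests per 1,000 repos” figure above, not measured values.

```python
def estimate_github_requests(repo_count: int,
                             avg_docs_per_repo: float = 2.0,
                             calls_per_repo: int = 2) -> int:
    """Rough estimate (assumed model): a couple of listing/metadata calls
    per repo, plus one content fetch per markdown doc."""
    return int(repo_count * (calls_per_repo + avg_docs_per_repo))
```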
## Combining with Backfill

After a Meilisearch wipe, the recovery sequence is:
```shell
# 1. Rebuild entity/repo/k8s/mold indexes from PostgreSQL (fast, no GitHub calls)
curl -X POST https://your-domain/api/v1/admin/backfill \
  -H "Authorization: Bearer <admin-token>"

# 2. Force docs to re-fetch from GitHub (slower, hits rate limit)
curl -X POST https://your-domain/api/v1/admin/recrawl \
  -H "Authorization: Bearer <admin-token>"
```

Run them in that order. Backfill is fast and doesn’t depend on the crawler. Recrawl can take minutes to hours depending on how many docs you have.
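If you script the recovery, the only invariant that matters is the ordering. A tiny orchestration sketch, with callables standing in for the two HTTP calls (the function itself is hypothetical):

```python
def recover_after_wipe(backfill, recrawl):
    """Run the recovery sequence in order: Backfill first (PostgreSQL only,
    fast), then Recrawl (hits the GitHub API, slow). Each argument is a
    zero-arg callable wrapping the corresponding HTTP request."""
    results = []
    results.append(("backfill", backfill()))   # metadata indexes back first
    results.append(("recrawl", recrawl()))     # then the doc re-fetch
    return results
```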