Search Backfill (Meilisearch Re-indexing)

Shoehorn uses Meilisearch for full-text search. It runs single-node, so disk loss equals index loss.

You don’t need to back Meilisearch up. PostgreSQL is the source of truth for entities, repositories, and molds. The backfill API rebuilds those indexes from PostgreSQL. Snapshots and dumps are an optional speed optimization.

Two collections need different recovery paths:

  • Docs. The crawler ships markdown straight to Meilisearch and only keeps tracking metadata in PostgreSQL. Use Recrawl — it tells the crawler to re-fetch from GitHub.
  • K8s workloads. The agent push handler intentionally doesn’t index workloads to Meilisearch (catalog entities cover them). Postgres has the data, but there’s no search index for backfill to rebuild. After a Meilisearch wipe, the next agent push restores entity-level search results — no manual action needed. The agent’s sync interval is typically 30s–1m.

See also: Backup & Restore for PostgreSQL backups.

Run a backfill:

  • After Meilisearch data loss or corruption (entities, repositories, molds)
  • After a Meilisearch version upgrade
  • After restoring from a database backup

For lost docs, use Recrawl. For K8s workloads, just wait for the next agent push.

Start a full backfill:

```sh
curl -X POST https://your-domain/api/v1/admin/backfill \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json"
```

Response: 202 Accepted

```json
{
  "message": "Backfill started",
  "collections": ["entities", "repositories", "molds", "users", "teams"],
  "started_at": "2026-05-06T14:00:00Z"
}
```

To re-index only specific collections:

```sh
curl -X POST https://your-domain/api/v1/admin/backfill \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"collections": ["entities", "repositories"]}'
```
Check backfill status:

```sh
curl https://your-domain/api/v1/admin/backfill/status \
  -H "Authorization: Bearer <admin-token>"
```

Running:

```json
{
  "status": "running",
  "collections": ["entities", "repositories"],
  "started_at": "2026-05-06T14:00:00Z",
  "duration_seconds": 4.2
}
```

Completed:

```json
{
  "status": "completed",
  "collections": ["entities", "repositories", "molds", "users", "teams"],
  "started_at": "2026-05-06T14:00:00Z",
  "completed_at": "2026-05-06T14:00:15Z",
  "duration_seconds": 15.0
}
```
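Because the trigger endpoint returns immediately, automation has to poll the status endpoint until it settles. A minimal sketch in Python; the `fetch_status` callable is a placeholder you supply (e.g. a `requests.get` wrapper), not part of the Shoehorn API:

```python
import time

def wait_for_backfill(fetch_status, timeout=600, interval=2.0):
    """Poll the backfill status endpoint until it settles.

    fetch_status: callable returning the parsed JSON from
    GET /api/v1/admin/backfill/status (injected so it can be stubbed).
    Returns the final status dict on completion; raises on failure
    or timeout. "idle" and "running" both mean "keep waiting".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"backfill failed: {status.get('error')}")
        time.sleep(interval)
    raise TimeoutError("backfill did not finish in time")
```

With `requests`, a suitable fetcher would be `lambda: requests.get(url, headers={"Authorization": f"Bearer {token}"}).json()`.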
| Collection | Source table | Notes |
| --- | --- | --- |
| entities | catalog_entities | Service catalog. Always populated. |
| repositories | repositories | Git repositories synced by the crawler. |
| molds | molds | Forge templates. Public + tenant-scoped. |
| users | (none) | No-op. Users are synced live from the IdP. |
| teams | (none) | No-op. Same as users. |

Three collections are intentionally absent:

  • docs, docs_sites — content lives only in Meilisearch. Use Recrawl.
  • k8s_workloads — never indexed to Meilisearch in the first place. Entity results cover the same searches; the agent re-pushes Postgres state on its next sync.

The backfill runs in the background; the endpoint returns 202 Accepted immediately. Only one backfill can run at a time; a second request returns 409 Conflict. Admin RBAC is required.
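Given that contract, a trigger helper only needs to distinguish "started" from "already running". A hedged sketch, assuming a `post` callable (injected, e.g. a `requests.post` wrapper) that returns the HTTP status code:

```python
def trigger_backfill(post, collections=None):
    """Start a backfill via POST /api/v1/admin/backfill.

    post: callable taking an optional JSON body and returning the
    HTTP status code (injected so it can be stubbed in tests).
    Returns True if a new backfill started, False if one was
    already running; raises for any other status code.
    """
    body = {"collections": collections} if collections else None
    code = post(body)
    if code == 202:   # accepted; backfill runs in the background
        return True
    if code == 409:   # another backfill is already in progress
        return False
    raise RuntimeError(f"unexpected status code {code}")
```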

Backfill is non-destructive. It writes documents into Meilisearch and never deletes. It reads across all tenants using the RLS system user.

| Status | Meaning |
| --- | --- |
| idle | No backfill has run since the API started. |
| running | Backfill in progress. |
| completed | Last backfill finished without errors. |
| failed | Last backfill failed. Check the `error` field and the API logs. |

The status is in-memory only. Restarting the API resets it to idle.

When upgrading Meilisearch:

  1. Create a dump (cross-version safe):

     ```sh
     curl -X POST http://meilisearch:7700/dumps \
       -H "Authorization: Bearer $MEILI_MASTER_KEY"
     ```
  2. Stop old Meilisearch and clear its data directory.

  3. Start the new version with `--import-dump /meili_data/dumps/TIMESTAMP.dump`.

Or skip the dump. Start the new Meilisearch fresh, run backfill, then run Recrawl to recover docs:

```sh
curl -X POST https://your-domain/api/v1/admin/backfill -H "Authorization: Bearer <token>"
curl -X POST https://your-domain/api/v1/admin/recrawl -H "Authorization: Bearer <token>"
```

Backfill plus recrawl is the primary recovery path. Snapshots and dumps are optional and only worth running if rebuilding is too slow:

  • Hourly snapshots (fast restore, same-version only): add `--schedule-snapshot --snapshot-interval-sec=3600` to the Meilisearch startup args.
  • Daily dumps (slower, cross-version safe): schedule a CronJob that calls `POST /dumps` on Meilisearch.
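One way to schedule the daily dump is a Kubernetes CronJob that curls the Meilisearch dump endpoint. A sketch only: the image tag, schedule, service name, and secret layout are assumptions to adapt to your cluster.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: meilisearch-daily-dump    # example name
spec:
  schedule: "0 3 * * *"           # daily at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: dump
              image: curlimages/curl:8.7.1
              # Kubernetes expands $(MEILI_MASTER_KEY) from env below.
              args:
                - -fsS
                - -X
                - POST
                - http://meilisearch:7700/dumps
                - -H
                - "Authorization: Bearer $(MEILI_MASTER_KEY)"
              env:
                - name: MEILI_MASTER_KEY
                  valueFrom:
                    secretKeyRef:
                      name: meilisearch   # secret name is an assumption
                      key: masterKey
```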

If both are lost, or you upgrade Meilisearch, run backfill plus recrawl. PostgreSQL plus GitHub has everything.