# Search Backfill (Meilisearch Re-indexing)
Shoehorn uses Meilisearch for full-text search. It runs single-node, so disk loss equals index loss.
You don’t need to back Meilisearch up. PostgreSQL is the source of truth for entities, repositories, and molds. The backfill API rebuilds those indexes from PostgreSQL. Snapshots and dumps are an optional speed optimization.
Two collections need different recovery paths:
- Docs. The crawler ships markdown straight to Meilisearch and only keeps tracking metadata in PostgreSQL. Use Recrawl — it tells the crawler to re-fetch from GitHub.
- K8s workloads. The agent push handler intentionally doesn’t index workloads to Meilisearch (catalog entities cover them). Postgres has the data, but there’s no search index for backfill to rebuild. After a Meilisearch wipe, the next agent push restores entity-level search results — no manual action needed. The agent’s sync interval is typically 30s–1m.
See also: Backup & Restore for PostgreSQL backups.
## When to Use

- After Meilisearch data loss or corruption (entities, repositories, molds)
- After a Meilisearch version upgrade
- After restoring from a database backup
For lost docs, use Recrawl. For K8s workloads, just wait for the next agent push.
## Endpoints

### Trigger Backfill

```bash
curl -X POST https://your-domain/api/v1/admin/backfill \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json"
```

Response: `202 Accepted`

```json
{
  "message": "Backfill started",
  "collections": ["entities", "repositories", "molds", "users", "teams"],
  "started_at": "2026-05-06T14:00:00Z"
}
```

### Selective Backfill

To re-index only specific collections:

```bash
curl -X POST https://your-domain/api/v1/admin/backfill \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"collections": ["entities", "repositories"]}'
```

### Check Status

```bash
curl https://your-domain/api/v1/admin/backfill/status \
  -H "Authorization: Bearer <admin-token>"
```

Running:

```json
{
  "status": "running",
  "collections": ["entities", "repositories"],
  "started_at": "2026-05-06T14:00:00Z",
  "duration_seconds": 4.2
}
```

Completed:

```json
{
  "status": "completed",
  "collections": ["entities", "repositories", "molds", "users", "teams"],
  "started_at": "2026-05-06T14:00:00Z",
  "completed_at": "2026-05-06T14:00:15Z",
  "duration_seconds": 15.0
}
```

## Collections
| Collection | Source Table | Notes |
|---|---|---|
| `entities` | `catalog_entities` | Service catalog. Always populated. |
| `repositories` | `repositories` | Git repositories synced by the crawler. |
| `molds` | `molds` | Forge templates. Public and tenant-scoped. |
| `users` | — | No-op. Users are synced live from the IdP. |
| `teams` | — | No-op. Same as users. |
Three collections are intentionally absent:

- `docs`, `docs_sites` — content lives only in Meilisearch. Use Recrawl.
- `k8s_workloads` — never indexed to Meilisearch in the first place. Entity results cover the same searches; the agent re-pushes Postgres state on its next sync.
## Behavior

The endpoint runs in the background and returns `202 Accepted` immediately. Only one backfill can run at a time; a second request returns `409 Conflict`. Admin RBAC is required.
Backfill is non-destructive. It writes documents into Meilisearch and never deletes. It reads across all tenants using the RLS system user.
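Illustrative only: a minimal Python sketch of a client that treats `409 Conflict` as "a backfill is already running" rather than as a failure. The `post` callable is a hypothetical injected HTTP helper (so the logic can be exercised without a live server); the endpoint path and status codes come from this page.

```python
def trigger_backfill(post, collections=None):
    """Start a backfill; treat 409 as 'already running', not an error.

    `post` is any callable (path, json=...) -> status_code, injected so
    this logic can be tested without a live Shoehorn API.
    """
    # Empty body backfills every collection; a list limits the scope.
    body = {"collections": collections} if collections else {}
    status = post("/api/v1/admin/backfill", json=body)
    if status == 202:
        return "started"
    if status == 409:
        return "already_running"  # another backfill is in progress
    raise RuntimeError(f"unexpected status: {status}")
```

In a real client, `post` would wrap your HTTP library of choice and attach the admin bearer token.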
## Status Values

| Status | Meaning |
|---|---|
| `idle` | No backfill has run since the API started. |
| `running` | Backfill in progress. |
| `completed` | Last backfill finished without errors. |
| `failed` | Last backfill failed. Check the `error` field and the API logs. |
The status is in-memory only. Restarting the API resets it to idle.
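A sketch of how a caller might wait on the status endpoint, assuming a hypothetical injected `get_status` callable that returns the parsed JSON from `GET /api/v1/admin/backfill/status` (injected so the loop can be tested offline). `completed` and `failed` are the terminal states; because the status is in-memory, a reset to `idle` mid-wait likely means the API restarted.

```python
import time

def wait_for_backfill(get_status, timeout_s=600, poll_s=2, sleep=time.sleep):
    """Poll the status endpoint until the backfill reaches a terminal state.

    `get_status` returns the parsed JSON body of the status endpoint.
    Returns the final status dict, or raises on timeout/reset.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status["status"] in ("completed", "failed"):
            return status
        # An "idle" status mid-wait means the API restarted and lost the
        # in-memory state; surface that instead of spinning forever.
        # (Call this only after triggering a backfill.)
        if status["status"] == "idle":
            raise RuntimeError("status reset to idle (API restart?)")
        sleep(poll_s)
    raise TimeoutError("backfill did not finish in time")
```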
## Meilisearch Upgrade Procedure

When upgrading Meilisearch:

1. Create a dump (cross-version safe):

   ```bash
   curl -X POST http://meilisearch:7700/dumps \
     -H "Authorization: Bearer $MEILI_MASTER_KEY"
   ```

2. Stop the old Meilisearch and clear its data directory.

3. Start the new version with `--import-dump /meili_data/dumps/TIMESTAMP.dump`.

Or skip the dump: start the new Meilisearch fresh, run backfill, then run Recrawl to recover docs:

```bash
curl -X POST https://your-domain/api/v1/admin/backfill \
  -H "Authorization: Bearer <token>"
curl -X POST https://your-domain/api/v1/admin/recrawl \
  -H "Authorization: Bearer <token>"
```

## Optional Meilisearch Snapshots
Backfill plus recrawl is the primary recovery path. Snapshots and dumps are optional and only worth running if rebuilding is too slow:

- Hourly snapshots (fast restore, same-version only): add `--schedule-snapshot --snapshot-interval-sec=3600` to the Meilisearch startup args.
- Daily dumps (slower, cross-version safe): schedule a CronJob that calls `POST /dumps` on Meilisearch.
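A daily-dump CronJob could be sketched roughly as follows. This is a hedged example, not a shipped manifest: the names, namespace, image, and the `MEILI_MASTER_KEY` secret reference are assumptions; only the `POST /dumps` call and the bearer-token header come from this page.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: meilisearch-daily-dump   # hypothetical name
spec:
  schedule: "0 3 * * *"          # daily at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: dump
              image: curlimages/curl:8.7.1   # assumed image
              env:
                - name: MEILI_MASTER_KEY
                  valueFrom:
                    secretKeyRef:
                      name: meilisearch      # assumed secret name/key
                      key: master-key
              command:
                - sh
                - -c
                - >
                  curl -fsS -X POST http://meilisearch:7700/dumps
                  -H "Authorization: Bearer $MEILI_MASTER_KEY"
```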
If both are lost, or you upgrade Meilisearch, run backfill plus recrawl. PostgreSQL plus GitHub has everything.