
Backup & Restore

PostgreSQL is the only thing you need to back up. Meilisearch rebuilds from PostgreSQL via the backfill API, Valkey is an ephemeral cache, and Redpanda events are replayable. Back up PostgreSQL regularly or you’ll lose tenant data on disk failure.

Component          Backup Required   Method
PostgreSQL         Yes               pg_dump
Meilisearch        No                Rebuilt from PostgreSQL via the backfill API
Valkey (Redis)     No                Ephemeral cache, repopulated on restart
Redpanda (Kafka)   Optional          Events are replayable; back up only if retention matters
Create a backup with pg_dump:
export DATABASE_URL="postgresql://shoehorn_user:password@host:5432/shoehorn"
pg_dump "$DATABASE_URL" \
--format=custom \
--no-owner \
--no-privileges \
--file="shoehorn_$(date +%Y%m%d_%H%M%S).dump"

Use --format=custom for compression and selective restore, --no-owner for portability across database users, and --no-privileges to skip GRANTs that won’t apply to the target cluster.
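
To confirm a dump is readable before you actually need it, pg_restore can print the archive's table of contents without touching any database; the filename below is just an example produced by the command above:

# List the archive's table of contents; fails fast if the file is truncated or corrupt
pg_restore --list shoehorn_20260324_120000.dump | head -n 20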

Restore a backup with pg_restore:
export DATABASE_URL="postgresql://shoehorn_user:password@host:5432/shoehorn"
pg_restore \
  --dbname="$DATABASE_URL" \
  --clean --if-exists \
  --no-owner --no-privileges \
  shoehorn_20260324_120000.dump

--clean --if-exists drops existing objects before recreating them, so the restore works against a database that already contains the schema. On a first restore into an empty database, expect "does not exist, skipping" notices from the DROP statements; they're safe to ignore.
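
If you'd rather not overwrite the live database, you can restore into a throwaway copy first and inspect it. A minimal sketch, assuming your role may create databases and that shoehorn_restored doesn't already exist:

# Create an empty database and restore into it instead of the live one
psql "$DATABASE_URL" -c "CREATE DATABASE shoehorn_restored;"
pg_restore --dbname="postgresql://shoehorn_user:password@host:5432/shoehorn_restored" \
  --no-owner --no-privileges shoehorn_20260324_120000.dump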

To automate daily backups on Kubernetes, run pg_dump from a CronJob with a PersistentVolumeClaim for storage and a 14-day retention sweep:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: shoehorn-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:18-alpine
              command:
                - /bin/sh
                - -c
                - |
                  pg_dump "$DATABASE_URL" \
                    --format=custom --no-owner --no-privileges \
                    --file="/backups/shoehorn_$(date +%Y%m%d_%H%M%S).dump"
                  find /backups -name "shoehorn_*.dump" -mtime +14 -delete
              env:
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: database-credentials
                      key: backup_url
              volumeMounts:
                - name: backups
                  mountPath: /backups
          restartPolicy: OnFailure
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: shoehorn-backups
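
Once the CronJob is applied, the easiest way to confirm it works end to end is to trigger a run by hand; the manual job name below is arbitrary:

kubectl get cronjob shoehorn-backup
kubectl create job --from=cronjob/shoehorn-backup shoehorn-backup-manual
kubectl logs job/shoehorn-backup-manual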
Or schedule the same dump with plain cron on the host. A crontab entry must fit on a single line:
# crontab -e
0 2 * * * DATABASE_URL="postgresql://shoehorn_user:pass@host:5432/shoehorn" pg_dump "$DATABASE_URL" --format=custom --no-owner --no-privileges --file=/var/backups/shoehorn/shoehorn_$(date +\%Y\%m\%d_\%H\%M\%S).dump >> /var/log/shoehorn-backup.log 2>&1
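
Because that entry gets unwieldy, many setups move the dump into a small wrapper script and call it from cron. A rough sketch; the script path, connection string, and backup directory are placeholders to adjust:

#!/bin/sh
# /usr/local/bin/shoehorn-backup.sh -- hypothetical wrapper; edit connection string and paths
set -eu
export DATABASE_URL="postgresql://shoehorn_user:pass@host:5432/shoehorn"
BACKUP_DIR=/var/backups/shoehorn
pg_dump "$DATABASE_URL" --format=custom --no-owner --no-privileges \
  --file="$BACKUP_DIR/shoehorn_$(date +%Y%m%d_%H%M%S).dump"
# Prune dumps older than 14 days, matching the CronJob retention above
find "$BACKUP_DIR" -name "shoehorn_*.dump" -mtime +14 -delete

The crontab entry then shrinks to:

0 2 * * * /usr/local/bin/shoehorn-backup.sh >> /var/log/shoehorn-backup.log 2>&1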

After restoring a backup:

  1. Re-run migrations (if upgrading): docker run shoehorn-api --migrate-only.
  2. Rebuild the search index: call the backfill API. Restarting the API does not re-index. You need this call.
    curl -X POST https://your-domain/api/v1/admin/backfill \
    -H "Authorization: Bearer <admin-token>"
    See Search Backfill for status checks and selective re-indexing.
  3. Clear cache: restart Valkey or wait for TTL expiry.
  4. Verify RLS policies are intact: psql "$DATABASE_URL" -c "SELECT schemaname, tablename, rowsecurity FROM pg_tables WHERE schemaname='public' AND rowsecurity=true;". Every tenant table should be listed.
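
For a closer look than step 4 gives you, the pg_policies catalog view shows whether the policies themselves came back with the schema; which policies you should expect to see is specific to your deployment:

psql "$DATABASE_URL" -c \
  "SELECT tablename, policyname, cmd FROM pg_policies WHERE schemaname = 'public' ORDER BY tablename;"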
Backup checklist:

  • Backups running on schedule (check logs)
  • Backups stored off-host (cloud storage, NAS, different server); see the copy sketch after this list
  • Test restore quarterly (restore to staging, verify data)
  • DATABASE_URL and credentials in a secret manager, never in the backup directory
  • RLS policies verified after restore
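
For the off-host item, any copy mechanism works. A minimal sketch using the AWS CLI; the bucket name is an assumption, and the directory matches the cron example above:

# Copy the newest dump to object storage after each pg_dump run
LATEST=$(ls -t /var/backups/shoehorn/shoehorn_*.dump | head -n 1)
aws s3 cp "$LATEST" "s3://shoehorn-backups/$(basename "$LATEST")"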

If pg_restore fails with ownership or permission errors, the backup was created with a different database user. Restore with --no-owner:

pg_restore --dbname="$DATABASE_URL" --no-owner --no-privileges backup.dump

For databases over 10 GB:

  • Parallel dump: pg_dump --jobs=4 (requires directory format); see the sketch after this list.
  • WAL archiving: continuous backup via PostgreSQL WAL for point-in-time recovery.
  • pg_basebackup: physical backup for very large databases.
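
As a starting point for the parallel option, a hedged sketch; directory-format output is a directory rather than a single file, and four workers is an arbitrary choice:

# Directory-format dump with 4 parallel workers (--jobs requires --format=directory)
pg_dump "$DATABASE_URL" --format=directory --jobs=4 \
  --no-owner --no-privileges --file="shoehorn_$(date +%Y%m%d)"
# pg_restore can also run in parallel from a custom- or directory-format archive
pg_restore --dbname="$DATABASE_URL" --jobs=4 \
  --clean --if-exists --no-owner --no-privileges shoehorn_20260324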