Deploying Shoehorn with FluxCD
Production-ready FluxCD manifests for deploying the Shoehorn Helm chart without common anti-patterns.
Prerequisites
- Flux v2.3+ bootstrapped in your cluster
- flux CLI installed
- Shoehorn secrets already created (see Helm deployment guide)
- OCI registry access configured
Architecture
The deployment is structured in three layers to handle dependency ordering correctly:

```text
Layer 1: Infrastructure (Kustomization)
  └── Namespace, Secrets, CRD controllers (cert-manager)
Layer 2: Dependencies (Kustomization, dependsOn: Layer 1)
  └── PostgreSQL, Meilisearch, Valkey, Redpanda, Cerbos
Layer 3: Shoehorn (HelmRelease, wrapped in Kustomization dependsOn: Layer 2)
  └── API, Web, Worker, Crawler, Forge, EventBus
```

This layered approach avoids the CRD chicken-and-egg problem (resources that use a CRD cannot be applied until the controller that installs that CRD is running) and ensures databases are ready before application services start.
Important: FluxCD Dependency Rules
- dependsOn only works between resources of the same kind (HelmRelease-to-HelmRelease, Kustomization-to-Kustomization).
- To order a HelmRelease after a Kustomization, wrap the HelmRelease in a Kustomization and use Kustomization-level dependsOn.
- dependsOn is reliable on first install but less so on updates. Design charts to tolerate out-of-order updates.
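The first two layers can be sketched as two Kustomizations, with Layer 2 declaring a same-kind dependsOn on Layer 1. This is a minimal sketch: the names infrastructure and dependencies and the paths under ./clusters/production are hypothetical. The Shoehorn Kustomization from Step 4 would then list the Layer 2 Kustomization (dependencies here) in its own dependsOn.

```yaml
# Layer 1 (hypothetical name and path)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./clusters/production/infrastructure
  prune: true
  # Report Ready only after the applied resources are healthy,
  # so dependents do not start early
  wait: true
---
# Layer 2 (hypothetical name and path)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: dependencies
  namespace: flux-system
spec:
  # Same-kind dependency: Kustomization -> Kustomization
  dependsOn:
    - name: infrastructure
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./clusters/production/dependencies
  prune: true
  wait: true
```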
Step 1: Create the OCI HelmRepository
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: shoehorn
  namespace: flux-system
spec:
  type: oci
  # The oci:// prefix is required when type: oci
  url: oci://ghcr.io/shoehorn-dev/helm-charts
  interval: 10m
```

Note: OCI HelmRepositories show no stored artifact in status. This is expected: an OCI registry has no repository index to download, and the chart is only fetched when a HelmRelease references it.
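If the chart registry requires authentication, one option is to point the HelmRepository at a docker-registry Secret (type kubernetes.io/dockerconfigjson) in flux-system, for example one created with kubectl create secret docker-registry. A sketch, assuming a hypothetical Secret named ghcr-auth; on EKS, AKS, or GKE you can instead set spec.provider to aws, azure, or gcp to authenticate with the cloud identity.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: shoehorn
  namespace: flux-system
spec:
  type: oci
  url: oci://ghcr.io/shoehorn-dev/helm-charts
  interval: 10m
  # Hypothetical docker-registry Secret in flux-system holding ghcr.io credentials
  secretRef:
    name: ghcr-auth
```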
Step 2: Create the Namespace
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: shoehorn
```

Create namespaces via Kustomization, not inside Helm charts. Helm releases are namespace-scoped and should not manage cluster-scoped resources.
Step 3: Create the HelmRelease
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: shoehorn
  namespace: shoehorn
spec:
  # releaseName must be unique per namespace -- set equal to metadata.name
  # to avoid duplicate releaseName race conditions
  releaseName: shoehorn
  interval: 30m
  # Faster retry on failure (don't wait the full 30m interval)
  retryInterval: 2m
  chart:
    spec:
      chart: shoehorn
      # Always pin to semver -- never use '*' or mutable tags
      version: "0.1.0"
      sourceRef:
        kind: HelmRepository
        name: shoehorn
        namespace: flux-system
      interval: 10m

  # --- Timeouts ---
  # Shoehorn includes PostgreSQL and Meilisearch, which need PV provisioning.
  # The default 5m is too short for stateful workloads.
  timeout: 15m
  install:
    timeout: 15m
    # Only matters with targetNamespace; the shoehorn namespace
    # already exists (Step 2)
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    timeout: 10m
    # Clean up orphaned resources from failed upgrades
    cleanupOnFail: true
    remediation:
      retries: 3
      # Roll back on failure after exhausting retries
      remediateLastFailure: true
    # Do NOT set force: true unless you need to change immutable fields.
    # force: true causes delete+recreate, which means downtime.
  uninstall:
    timeout: 5m

  # --- Values ---
  values:
    global:
      domain: shoehorn.example.com
      environment: production
      logLevel: info
      storageClass: "gp3"
      organization:
        name: "Acme Corp"
        slug: "acme-corp"

    image:
      # Pin to a release. Avoid 'latest' for reproducibility.
      tag: "v2026.3.0"
      pullPolicy: IfNotPresent

    replicaCount:
      api: 2
      web: 2
      worker: 3
      crawler: 2
      forge: 2
      eventbus: 1

    auth:
      provider: zitadel
      zitadel:
        externalUrl: https://auth.example.com
        projectId: "CHANGE_ME"
        clientId: "CHANGE_ME"

    postgresql:
      enabled: true
      persistence:
        size: 20Gi

    meilisearch:
      enabled: true
      persistence:
        size: 10Gi

    valkey:
      enabled: true

    redpanda:
      enabled: true

    cerbos:
      enabled: true

    ingressRoute:
      enabled: true
      tls:
        enabled: true
        certResolver: letsencrypt

  # --- Values from Secrets ---
  # Merge sensitive values from pre-created Kubernetes Secrets.
  # Never commit these Secrets to Git in plaintext (see Managing Secrets).
  valuesFrom:
    - kind: Secret
      name: database-credentials
      valuesKey: postgres_password
      targetPath: postgresql.password
    - kind: Secret
      name: database-credentials
      valuesKey: db_password
      targetPath: postgresql.appUserPassword
    - kind: Secret
      name: auth-credentials
      valuesKey: session-encryption-key
      targetPath: auth.sessionEncryptionKey
    - kind: Secret
      name: service-credentials
      valuesKey: meilisearchMasterKey
      targetPath: meilisearch.masterKey
```

Step 4: Create the Kustomization
Wrap the HelmRelease in a Kustomization for dependency ordering:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: shoehorn
  namespace: flux-system
spec:
  interval: 30m
  retryInterval: 2m
  # Run after the Layer 2 Kustomization (hypothetical name, adjust to yours):
  # dependsOn:
  #   - name: dependencies
  # Impersonated ServiceAccount with namespace-scoped permissions only.
  # It must exist in the same namespace as this Kustomization.
  serviceAccountName: shoehorn-deployer
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./clusters/production/shoehorn
  prune: true
  wait: true
  timeout: 20m
  # If using SOPS for encrypted secrets in this path:
  # decryption:
  #   provider: sops
  #   secretRef:
  #     name: sops-age-key
```

And the supporting Kustomize file:
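```yaml
# clusters/production/shoehorn/kustomization.yaml (Kustomize file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # The Namespace is cluster-scoped: a Kustomization running as the
  # namespace-scoped ServiceAccount from Step 5 cannot create it, so in
  # that setup move namespace.yaml to the Layer 1 Kustomization
  - namespace.yaml
  - release.yaml
```

Step 5: Create the ServiceAccount (Multi-Tenant)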
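Restrict what the Shoehorn Kustomization can deploy. The ServiceAccount must live in the same namespace as the Kustomization that sets serviceAccountName (flux-system here), while the RoleBinding limits its permissions to the shoehorn namespace. Apply these objects from the Layer 1 Kustomization: a Kustomization cannot create the ServiceAccount it impersonates.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: shoehorn-deployer
  # Same namespace as the Kustomization that references it
  namespace: flux-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: shoehorn-deployer
  namespace: shoehorn
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin # Scoped to the shoehorn namespace via RoleBinding
subjects:
  - kind: ServiceAccount
    name: shoehorn-deployer
    namespace: flux-system
```

Anti-Patterns Avoided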
| Anti-Pattern | Risk | What We Do Instead |
|---|---|---|
| Duplicate releaseName in namespace | Race condition: HelmReleases delete each other's resources | releaseName set to metadata.name |
| version: "*" or mutable OCI tags | Non-reproducible, silent upgrades | Pin to semver 0.1.0 |
| Default 5m timeout for stateful workloads | PostgreSQL/Meilisearch PV provisioning times out | 15m install, 10m upgrade |
| retries: 0 + remediateLastFailure: false | No rollback ever triggers on failure | retries: 3 + remediateLastFailure: true |
| Missing cleanupOnFail | Failed upgrades leave orphaned resources | Enabled on upgrade spec |
| force: true on upgrades | Resources are deleted and recreated, causing downtime | Only use when changing immutable fields |
| Plaintext secrets in Git | Credentials exposed in version history | valuesFrom referencing pre-created Secrets |
| Cross-kind dependsOn | HelmRelease can't depend on Kustomization | Wrap in Kustomization for layered ordering |
| Helm charts creating Namespaces | Namespace is cluster-scoped, chart is namespace-scoped | Namespace in Kustomization, not in chart |
| Missing serviceAccountName | Controller uses its own cluster-admin privileges | Explicit service account with namespace-scoped RBAC |
| retryInterval same as interval | No benefit from retry: waits the full reconciliation cycle | interval: 30m, retryInterval: 2m |
| valuesFrom ConfigMap changes ignored when Stalled | HelmRelease stays stuck after config fix | Use Kustomize configMapGenerator with hash suffixes for dynamic refs |
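For the last row, Flux's HelmRelease documentation describes generating the values ConfigMap with Kustomize: every content change produces a new hashed name, which changes the HelmRelease spec and triggers a fresh reconciliation. Kustomize does not know HelmRelease fields by default, so a nameReference configuration tells it to rewrite spec.valuesFrom.name. A sketch, assuming a hypothetical ConfigMap shoehorn-values built from a hypothetical shoehorn-values.yaml file:

```yaml
# clusters/production/shoehorn/kustomization.yaml (relevant fields)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - release.yaml
configMapGenerator:
  # Kustomize appends a content hash: shoehorn-values-<hash>
  - name: shoehorn-values
    namespace: shoehorn
    files:
      - values.yaml=shoehorn-values.yaml
configurations:
  - kustomizeconfig.yaml
---
# clusters/production/shoehorn/kustomizeconfig.yaml
# Rewrite HelmRelease valuesFrom names to the hashed ConfigMap name
nameReference:
  - kind: ConfigMap
    version: v1
    fieldSpecs:
      - path: spec/valuesFrom/name
        kind: HelmRelease
```

In release.yaml, the HelmRelease then references the unhashed name (kind: ConfigMap, name: shoehorn-values, valuesKey: values.yaml), and Kustomize substitutes the hashed one at build time.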
Managing Secrets
Option A: SOPS with Age (Built-in Flux Support)
```bash
# Generate an Age key
age-keygen -o age.key

# Create the decryption secret in the flux-system namespace
kubectl create secret generic sops-age-key \
  --from-file=age.agekey=age.key \
  -n flux-system

# Encrypt only data/stringData so apiVersion, kind, and metadata stay readable
sops --age=age1... --encrypt --encrypted-regex '^(data|stringData)$' --in-place secret.yaml
```

Enable decryption on the Kustomization:
```yaml
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age-key # Must be in the same namespace as the Kustomization
```

Important: The SOPS decryption secret must be in the same namespace as the Kustomization. Cross-namespace secret references are not allowed.
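To avoid repeating the recipient and regex on every sops invocation, SOPS reads creation rules from a .sops.yaml file at the repository root. A sketch, assuming encrypted Secret manifests live under clusters/production; the path pattern is hypothetical, and age1... stands for the public key printed by age-keygen.

```yaml
# .sops.yaml (repository root)
creation_rules:
  # Hypothetical pattern; match the files that hold Secret manifests
  - path_regex: clusters/production/.*\.yaml
    encrypted_regex: ^(data|stringData)$
    age: age1...
```

With this file in place, sops --encrypt --in-place on a matching path picks up the recipient and the regex automatically.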
Option B: External Secrets Operator
Deploy ESO independently, then create ExternalSecret resources:
```yaml
# Newer ESO releases serve external-secrets.io/v1; use the API version
# your installed CRDs serve
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: shoehorn
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: database-credentials
  data:
    - secretKey: postgres_password
      remoteRef:
        key: shoehorn/database
        property: postgres_password
    - secretKey: db_password
      remoteRef:
        key: shoehorn/database
        property: db_password
```

Option C: Pre-created Secrets
Create secrets before Flux reconciles (as shown in the Helm guide). Reference them in valuesFrom on the HelmRelease.
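For reference, these are the Secret names and keys that the valuesFrom entries in Step 3 read. A sketch with placeholder values: apply it out-of-band with kubectl apply and never commit it to Git in this form. It covers only the keys referenced in Step 3; the Helm deployment guide may define additional keys in the same Secrets.

```yaml
# Placeholder values only: apply with kubectl, never commit to Git
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: shoehorn
type: Opaque
stringData:
  postgres_password: "REPLACE_ME"
  db_password: "REPLACE_ME"
---
apiVersion: v1
kind: Secret
metadata:
  name: auth-credentials
  namespace: shoehorn
type: Opaque
stringData:
  session-encryption-key: "REPLACE_ME"
---
apiVersion: v1
kind: Secret
metadata:
  name: service-credentials
  namespace: shoehorn
type: Opaque
stringData:
  meilisearchMasterKey: "REPLACE_ME"
```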
Upgrading Shoehorn
- Review the changelog for breaking changes.
- Update spec.chart.spec.version in release.yaml to the new chart version.
- Commit and push. Flux applies the change once the GitRepository source fetches the new commit.
- To reconcile immediately, refresh the source and the wrapping Kustomization (reconciling only the HelmRelease does not pick up a commit the Kustomization has not applied yet):

  ```bash
  flux reconcile kustomization shoehorn -n flux-system --with-source
  ```

- Monitor:

  ```bash
  flux get helmrelease shoehorn -n shoehorn
  flux events --for HelmRelease/shoehorn -n shoehorn
  ```

Multi-Tenant Configuration
For production multi-tenant clusters, enable these Flux controller flags:
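```text
# Prevent tenants from referencing resources in other namespaces
--no-cross-namespace-refs=true

# Default service account for all reconciliations (safety net)
--default-service-account=default
```

Set --no-cross-namespace-refs on helm-controller, kustomize-controller, notification-controller, image-reflector-controller, and image-automation-controller. Set --default-service-account on kustomize-controller and helm-controller, the two controllers that impersonate service accounts.

These flags affect the manifests above. With --no-cross-namespace-refs=true, the Step 3 HelmRelease (in shoehorn) can no longer use the HelmRepository in flux-system, so create the HelmRepository in the shoehorn namespace instead. With --default-service-account=default, a HelmRelease without spec.serviceAccountName runs as the default ServiceAccount of its namespace, which has no permissions unless you grant them.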
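One way to apply these flags is with Kustomize patches in the flux-system kustomization.yaml that flux bootstrap generates, following the Flux multi-tenancy lockdown pattern. A sketch; the gotk-components.yaml and gotk-sync.yaml file names assume a default bootstrap layout.

```yaml
# flux-system/kustomization.yaml (generated by flux bootstrap, then extended)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # JSON6902 "add" appends to the existing args instead of replacing them
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --no-cross-namespace-refs=true
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller|notification-controller|image-reflector-controller|image-automation-controller)"
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --default-service-account=default
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller)"
```

A target regex that matches no Deployment (for example, when the image automation controllers are not installed) simply patches nothing.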
Scaling: Helm Controller Memory
At scale (60+ HelmReleases), the helm-controller’s default 1Gi memory limit may not be enough. Symptoms: OOMKilled restarts.
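```yaml
# flux-system/kustomization.yaml (patches section)
patches:
  # JSON6902 ops edit the Deployment in place; a strategic merge patch
  # on "args" would replace the controller's existing flags
  - patch: |
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 3Gi
      # Graceful shutdown before OOM kill
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --feature-gates=OOMWatch=true
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --oom-watch-memory-threshold=95
      # Process more releases in parallel
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=8
    target:
      kind: Deployment
      name: helm-controller
```

Debugging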
```bash
# Check HelmRelease status
flux get helmrelease shoehorn -n shoehorn

# View controller events
flux events --for HelmRelease/shoehorn -n shoehorn

# View Helm release state (query the storageNamespace, not the targetNamespace;
# here both default to shoehorn)
helm list -n shoehorn
helm history shoehorn -n shoehorn

# If stuck after exhausted retries, suspend and resume
flux suspend helmrelease shoehorn -n shoehorn
flux resume helmrelease shoehorn -n shoehorn

# Nuclear option: delete and let Flux re-install.
# Deleting the HelmRelease uninstalls the release, which may remove
# chart-managed PersistentVolumeClaims; back up data first.
flux delete helmrelease shoehorn -n shoehorn
# Then re-apply the HelmRelease manifest, e.g. by reconciling its Kustomization
flux reconcile kustomization shoehorn -n flux-system
```

Drift Detection
Flux can detect and correct cluster drift: with drift detection enabled, manual kubectl changes to resources managed by the release are reverted on the next reconciliation. It is disabled by default, so enable it explicitly:
```yaml
spec:
  driftDetection:
    mode: enabled
    # Exclude specific fields from drift detection if they cause loops,
    # e.g. replicas managed by a HorizontalPodAutoscaler
    ignore:
      - paths: ["/spec/replicas"]
        target:
          kind: Deployment
```

Known issue: custom resources do not support strategic merge patch. If drift detection causes endless upgrade loops on custom resources, exclude them via the ignore list.
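To evaluate drift detection before letting it change anything, mode: warn reports detected drift as events without reverting it. A minimal sketch:

```yaml
spec:
  driftDetection:
    # Emit events for detected drift only; switch to "enabled" to correct it
    mode: warn
```

Detected drift then shows up in flux events --for HelmRelease/shoehorn -n shoehorn.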