
Deploying Shoehorn with FluxCD

Production-ready FluxCD manifests for deploying the Shoehorn Helm chart without common anti-patterns.

  • Flux v2.3+ bootstrapped in your cluster
  • flux CLI installed
  • Shoehorn secrets already created (see Helm deployment guide)
  • OCI registry access configured

The deployment is structured in three layers to handle dependency ordering correctly:

Layer 1: Infrastructure (Kustomization)
└── Namespace, Secrets, CRD controllers (cert-manager)
Layer 2: Dependencies (Kustomization, dependsOn: Layer 1)
└── PostgreSQL, Meilisearch, Valkey, Redpanda, Cerbos
Layer 3: Shoehorn (HelmRelease, wrapped in Kustomization dependsOn: Layer 2)
└── API, Web, Worker, Crawler, Forge, EventBus

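This layered approach avoids the CRD chicken-and-egg problem, where custom resources such as cert-manager Certificates cannot be applied before their CRDs exist. It also ensures databases are ready before application services start.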

  • dependsOn only works between resources of the same kind (HelmRelease-to-HelmRelease, Kustomization-to-Kustomization)
  • To order a HelmRelease after a Kustomization, wrap the HelmRelease in a Kustomization and use Kustomization-level dependsOn
  • dependsOn is reliable on first install but less so on updates — design charts to tolerate out-of-order updates
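The Layer 1 and Layer 2 ordering above can be sketched with two Flux Kustomizations, where Layer 2 waits on Layer 1 through Kustomization-level dependsOn. This is an illustration under assumptions: the names infrastructure and dependencies, the paths ./infrastructure/production and ./dependencies/production, and the timeout values are hypothetical. The Shoehorn chart does not provide them.

```yaml
# Layer 1: Namespace, Secrets, CRD controllers (hypothetical name and path)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 30m
  retryInterval: 2m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/production
  prune: true
  # Report Ready only after every applied resource is healthy
  wait: true
  timeout: 10m
---
# Layer 2: PostgreSQL, Meilisearch, Valkey, Redpanda, Cerbos (hypothetical name and path)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: dependencies
  namespace: flux-system
spec:
  # Kustomization-to-Kustomization dependsOn: reconciles only after Layer 1 is Ready
  dependsOn:
    - name: infrastructure
  interval: 30m
  retryInterval: 2m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./dependencies/production
  prune: true
  wait: true
  timeout: 15m
```

With wait: true, a Kustomization reports Ready only once everything it applied is healthy. As a result, dependsOn waits for the controllers and databases to actually come up, not just for their manifests to be accepted. The Shoehorn Kustomization would then list dependencies under its own spec.dependsOn. Flux's example repositories keep layer directories like these outside ./clusters/production. This prevents the bootstrap flux-system Kustomization, which applies everything under that path, from applying them directly and bypassing the ordering.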
clusters/production/sources/shoehorn-repo.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: shoehorn
  namespace: flux-system
spec:
  type: oci
  # type: oci requires the oci:// scheme in the URL
  url: oci://ghcr.io/shoehorn-dev/helm-charts
  interval: 10m

Note: OCI HelmRepositories show no stored artifact in status. This is expected. The chart is only fetched when a HelmRelease references it.

clusters/production/shoehorn/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: shoehorn

Create namespaces via Kustomization, not inside Helm charts. Helm releases are namespace-scoped and should not manage cluster-scoped resources.

clusters/production/shoehorn/release.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: shoehorn
  namespace: shoehorn
spec:
  # releaseName must be unique per namespace -- set equal to metadata.name
  # to avoid duplicate releaseName race conditions
  releaseName: shoehorn
  interval: 30m
  # Faster retry on failure (don't wait the full 30m interval)
  retryInterval: 2m
  chart:
    spec:
      chart: shoehorn
      # Always pin to semver -- never use '*' or mutable tags
      version: "0.1.0"
      sourceRef:
        kind: HelmRepository
        name: shoehorn
        namespace: flux-system
      interval: 10m
  # --- Timeouts ---
  # Shoehorn includes PostgreSQL and Meilisearch which need PV provisioning
  # Default 5m is too short for stateful workloads
  timeout: 15m
  install:
    timeout: 15m
    # Create the target namespace if it doesn't exist
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    timeout: 10m
    # Clean up orphaned resources from failed upgrades
    cleanupOnFail: true
    remediation:
      retries: 3
      # Roll back on failure after exhausting retries
      remediateLastFailure: true
    # Do NOT set force: true unless you need to change immutable fields
    # force: true causes delete+recreate which means downtime
  uninstall:
    timeout: 5m
  # --- Values ---
  values:
    global:
      domain: shoehorn.example.com
      environment: production
      logLevel: info
      storageClass: "gp3"
    organization:
      name: "Acme Corp"
      slug: "acme-corp"
    image:
      # Pin to a release. Avoid 'latest' for reproducibility.
      tag: "v2026.3.0"
      pullPolicy: IfNotPresent
    replicaCount:
      api: 2
      web: 2
      worker: 3
      crawler: 2
      forge: 2
      eventbus: 1
    auth:
      provider: zitadel
      zitadel:
        externalUrl: https://auth.example.com
        projectId: "CHANGE_ME"
        clientId: "CHANGE_ME"
    postgresql:
      enabled: true
      persistence:
        size: 20Gi
    meilisearch:
      enabled: true
      persistence:
        size: 10Gi
    valkey:
      enabled: true
    redpanda:
      enabled: true
    cerbos:
      enabled: true
    ingressRoute:
      enabled: true
      tls:
        enabled: true
        certResolver: letsencrypt
  # --- Values from Secrets ---
  # Merge sensitive values from pre-created Kubernetes Secrets
  # These secrets are managed outside Flux (never in Git)
  valuesFrom:
    - kind: Secret
      name: database-credentials
      valuesKey: postgres_password
      targetPath: postgresql.password
    - kind: Secret
      name: database-credentials
      valuesKey: db_password
      targetPath: postgresql.appUserPassword
    - kind: Secret
      name: auth-credentials
      valuesKey: session-encryption-key
      targetPath: auth.sessionEncryptionKey
    - kind: Secret
      name: service-credentials
      valuesKey: meilisearchMasterKey
      targetPath: meilisearch.masterKey
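For reference, the valuesFrom entries above expect Secrets shaped like this sketch. It assumes the Secrets live in the shoehorn namespace, because valuesFrom only reads objects from the HelmRelease's own namespace. It lists only the keys this HelmRelease reads, and the Helm deployment guide may create more. CHANGE_ME values are placeholders. Create these Secrets with kubectl or encrypt them with SOPS, and never commit them in plaintext.

```yaml
# Shape only: keys must match the valuesKey entries in the HelmRelease
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: shoehorn   # same namespace as the HelmRelease
type: Opaque
stringData:
  postgres_password: CHANGE_ME   # -> postgresql.password
  db_password: CHANGE_ME         # -> postgresql.appUserPassword
---
apiVersion: v1
kind: Secret
metadata:
  name: auth-credentials
  namespace: shoehorn
type: Opaque
stringData:
  session-encryption-key: CHANGE_ME   # -> auth.sessionEncryptionKey
---
apiVersion: v1
kind: Secret
metadata:
  name: service-credentials
  namespace: shoehorn
type: Opaque
stringData:
  meilisearchMasterKey: CHANGE_ME   # -> meilisearch.masterKey
```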

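Wrap the HelmRelease in a Flux Kustomization for dependency ordering. Store this manifest outside ./clusters/production/shoehorn. That directory's kustomization.yaml is the Kustomize file shown next, and the directory is the path this Flux Kustomization reconciles.

clusters/production/shoehorn.yaml (example path)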
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: shoehorn
  namespace: flux-system
spec:
  interval: 30m
  retryInterval: 2m
  # ServiceAccount looked up in this Kustomization's namespace (flux-system),
  # bound to namespace-scoped permissions only
  serviceAccountName: shoehorn-deployer
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./clusters/production/shoehorn
  prune: true
  wait: true
  timeout: 20m
  # If using SOPS for encrypted secrets in this path:
  # decryption:
  #   provider: sops
  #   secretRef:
  #     name: sops-age-key

And the supporting Kustomize file:

# clusters/production/shoehorn/kustomization.yaml (Kustomize file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - release.yaml

Step 5: Create the ServiceAccount (Multi-Tenant)


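Restrict what the Shoehorn Kustomization can deploy. kustomize-controller looks up serviceAccountName in the Kustomization's own namespace, so the ServiceAccount must live in flux-system. The RoleBinding then grants it rights only inside shoehorn. Apply this file, and the Namespace, with an identity that has cluster-level rights, such as the Layer 1 infrastructure Kustomization. An account limited to the shoehorn namespace can neither create its own RoleBinding nor manage the cluster-scoped Namespace object.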

clusters/production/shoehorn/rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: shoehorn-deployer
  # Must match the namespace of the Kustomization that references it
  namespace: flux-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: shoehorn-deployer
  namespace: shoehorn
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin # Scoped to the shoehorn namespace via RoleBinding
subjects:
  - kind: ServiceAccount
    name: shoehorn-deployer
    namespace: flux-system

| Anti-Pattern | Risk | What We Do Instead |
|---|---|---|
| Duplicate releaseName in namespace | Race condition: HelmReleases delete each other's resources | releaseName set to metadata.name |
| version: "*" or mutable OCI tags | Non-reproducible, silent upgrades | Pin to semver 0.1.0 |
| Default 5m timeout for stateful workloads | PostgreSQL/Meilisearch PV provisioning times out | 15m install, 10m upgrade |
| retries: 0 + remediateLastFailure: false | No rollback ever triggers on failure | retries: 3 + remediateLastFailure: true |
| Missing cleanupOnFail | Failed upgrades leave orphaned resources | Enabled on upgrade spec |
| force: true on upgrades | Resources are deleted before recreation, causing downtime | Only use when changing immutable fields |
| Plaintext secrets in Git | Credentials exposed in version history | valuesFrom referencing pre-created Secrets |
| Cross-resource dependsOn | HelmRelease can't depend on Kustomization | Wrap in Kustomization for layered ordering |
| Helm charts creating Namespaces | Namespace is cluster-scoped, chart is namespace-scoped | Namespace in Kustomization, not in chart |
| Missing serviceAccountName | Controller uses its own cluster-admin privileges | Explicit service account with namespace-scoped RBAC |
| retryInterval same as interval | No benefit from retry: waits the full reconciliation cycle | interval: 30m, retryInterval: 2m |
| valuesFrom ConfigMap changes ignored when Stalled | HelmRelease stays stuck after config fix | Kustomize configMapGenerator with hash suffixes for dynamic refs |
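The last row can be sketched with Kustomize's configMapGenerator plus a nameReference configuration that teaches Kustomize about HelmRelease.spec.valuesFrom. Each content change produces a new hashed ConfigMap name, and Kustomize rewrites that name into release.yaml. The HelmRelease spec therefore changes too, which starts a fresh reconciliation even when the release is Stalled. The ConfigMap name shoehorn-values and the local file shoehorn-values.yaml are hypothetical.

```yaml
# clusters/production/shoehorn/kustomization.yaml (Kustomize file, extended)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - release.yaml
configMapGenerator:
  - name: shoehorn-values               # hypothetical; gets a hash suffix
    namespace: shoehorn
    files:
      - values.yaml=shoehorn-values.yaml  # hypothetical local values file
configurations:
  - kustomizeconfig.yaml
---
# clusters/production/shoehorn/kustomizeconfig.yaml
# Rewrite HelmRelease.spec.valuesFrom[].name to the hashed ConfigMap name
nameReference:
  - kind: ConfigMap
    version: v1
    fieldSpecs:
      - path: spec/valuesFrom/name
        kind: HelmRelease
```

In release.yaml, reference the unhashed name (kind: ConfigMap, name: shoehorn-values, valuesKey: values.yaml). Kustomize substitutes the hashed name at build time. Secret entries keep their names because no generated Secret matches them.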

Option A: SOPS with Age (Built-in Flux Support)

# Generate an Age key
age-keygen -o age.key
# Create the decryption secret in flux-system namespace
kubectl create secret generic sops-age-key \
--from-file=age.agekey=age.key \
-n flux-system
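# Encrypt a secret file (only data/stringData, so apiVersion, kind and metadata stay readable)
sops --age=age1... --encrypt --encrypted-regex '^(data|stringData)$' --in-place secret.yaml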

Enable decryption on the Kustomization:

spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age-key # Must be in the same namespace as the Kustomization

Important: The SOPS decryption secret must be in the same namespace as the Kustomization. Cross-namespace secret references are not allowed.
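To avoid repeating the key and the regex on every sops invocation, you can record them in a .sops.yaml creation rule. sops looks for this file in the current directory and its parents. The sketch below assumes the file sits at the repository root. The path_regex is a hypothetical catch-all you would narrow to your secrets directories, and age1... stands for your own public key.

```yaml
# .sops.yaml (repository root)
creation_rules:
  - path_regex: .*\.yaml$              # hypothetical: narrow to your secret files
    encrypted_regex: ^(data|stringData)$
    age: age1...                       # public key printed by age-keygen
```

With this rule in place, sops --encrypt --in-place secret.yaml picks up the key and regex automatically.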

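Option B: External Secrets Operator

Deploy the External Secrets Operator (ESO) independently, for example alongside the CRD controllers in Layer 1, then create ExternalSecret resources: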

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: shoehorn
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: database-credentials
  data:
    - secretKey: postgres_password
      remoteRef:
        key: shoehorn/database
        property: postgres_password
    - secretKey: db_password
      remoteRef:
        key: shoehorn/database
        property: db_password
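The ExternalSecret above points at a ClusterSecretStore named aws-secretsmanager, which this section assumes already exists. Below is a minimal sketch of one for AWS Secrets Manager. It assumes ESO authenticates through an IRSA-annotated ServiceAccount. The region and the ServiceAccount name and namespace are hypothetical. The sketch uses the same external-secrets.io/v1beta1 API version as the ExternalSecret, so check which versions your ESO release serves.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secretsmanager
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-1                  # hypothetical region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets       # hypothetical IRSA-annotated ServiceAccount
            namespace: external-secrets  # required for a cluster-scoped store
```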

Create secrets before Flux reconciles (as shown in the Helm guide). Reference them in valuesFrom on the HelmRelease.

  1. Review the changelog for breaking changes
  2. Update spec.chart.spec.version in release.yaml to the new chart version
  3. Commit and push; Flux reconciles the upgrade on the next interval
  4. Force immediate reconciliation (fetch the new Git revision and apply it, then reconcile the release):
flux reconcile kustomization shoehorn --with-source
flux reconcile helmrelease shoehorn -n shoehorn
  5. Monitor:
flux get helmrelease shoehorn -n shoehorn
flux events --for HelmRelease/shoehorn -n shoehorn

For production multi-tenant clusters, enable these Flux controller flags:

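# Prevent tenants from referencing resources in other namespaces
--no-cross-namespace-refs=true
# Account to impersonate when an object sets no serviceAccountName (safety net)
--default-service-account=default

Set --no-cross-namespace-refs on helm-controller, kustomize-controller, notification-controller, image-reflector-controller, and image-automation-controller. Set --default-service-account on kustomize-controller and helm-controller, which impersonate that account in each object's own namespace.

Both flags affect the manifests above. With cross-namespace references disabled, the HelmRelease in shoehorn can no longer use the HelmRepository in flux-system, so create the HelmRepository in the shoehorn namespace instead. With a default service account set, the HelmRelease (which sets no serviceAccountName) runs as default in shoehorn. Give it a serviceAccountName bound to enough RBAC to install the chart.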
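One way to apply these flags is to patch the controller Deployments in the kustomization.yaml that flux bootstrap generates under flux-system. The sketch below assumes the default bootstrap layout (gotk-components.yaml and gotk-sync.yaml). It appends each flag with a JSON patch op so the controllers' existing arguments are kept.

```yaml
# clusters/production/flux-system/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Block cross-namespace references on every controller that supports it
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --no-cross-namespace-refs=true
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller|notification-controller|image-reflector-controller|image-automation-controller)"
  # Impersonate "default" when an object sets no serviceAccountName
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --default-service-account=default
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller)"
```

The target name is a regular expression, so a controller that is not installed (for example the image automation controllers) simply isn't matched.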

At scale (60+ HelmReleases), the helm-controller’s default 1Gi memory limit may not be enough. Symptoms: OOMKilled restarts.

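# flux-system/helm-controller patch (in clusters/production/flux-system/kustomization.yaml)
patches:
  - target:
      kind: Deployment
      name: helm-controller
    patch: |
      # Raise the memory limit (default 1Gi)
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 3Gi
      # Append flags with JSON patch ops: a strategic merge patch on args
      # replaces the whole list and drops the controller's default flags
      # Graceful shutdown before OOM kill
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --feature-gates=OOMWatch=true
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --oom-watch-memory-threshold=95
      # Process more releases in parallel
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=8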
# Check HelmRelease status
flux get helmrelease shoehorn -n shoehorn
# View controller events
flux events --for HelmRelease/shoehorn -n shoehorn
# View Helm release state (use storageNamespace, not targetNamespace)
helm list -n shoehorn
helm history shoehorn -n shoehorn
# If stuck after exhausted retries, suspend and resume
flux suspend helmrelease shoehorn -n shoehorn
flux resume helmrelease shoehorn -n shoehorn
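# Nuclear option: deleting the HelmRelease makes helm-controller uninstall the release
flux delete helmrelease shoehorn -n shoehorn
# The shoehorn Kustomization re-applies release.yaml on its next run; to trigger it now:
flux reconcile kustomization shoehorn --with-source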

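helm-controller can detect and correct cluster drift: manual kubectl changes to resources in the release are reverted on the next reconciliation. Drift detection is disabled by default, so enable it explicitly on the HelmRelease: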

spec:
  driftDetection:
    mode: enabled
    # Exclude fields that other controllers legitimately change (for example,
    # replicas managed by an HPA) so they don't cause correction loops
    ignore:
      - paths: ["/spec/replicas"]
        target:
          kind: Deployment
Known issue: custom resources (objects defined by CRDs) do not support strategic merge patch. If drift detection causes endless upgrade loops on custom resources, exclude those resources, or the affected fields, via the ignore list.
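Before letting Flux revert changes, you can run drift detection in warn mode. helm-controller then emits a warning event when it detects drift but leaves the cluster untouched. This is a low-risk way to find fields that belong in the ignore list. A minimal sketch, reusing the ignore rule above:

```yaml
spec:
  driftDetection:
    # Detect drift and emit a warning event, but do not correct it
    mode: warn
    ignore:
      - paths: ["/spec/replicas"]
        target:
          kind: Deployment
```

Watch the output of flux events --for HelmRelease/shoehorn -n shoehorn. Switch mode to enabled once the reported drift is only what you want Flux to revert.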