Changelog
Notable changes across the Shoehorn platform. Format follows Keep a Changelog; versions follow SemVer.
Per-component release notes live in their own repos.
[Unreleased]
- Helm release detection. The Kubernetes agent can now spot the Helm releases running in your clusters and show which workloads each release manages, right in the Operations view. It’s off by default. Turn it on in the agent’s Helm chart or with the Terraform module. See Helm Release Detection.
- Attach documentation to a team. Add `related_teams: [your-team]` to a doc’s frontmatter and it shows up on that team’s Documentation tab, under “Bound to this team”, even when the doc isn’t tied to a single service. Good for team charters, on-call policies, and ways of working. The tab now splits docs bound to the team from docs that come in through the team’s services.
- Catalog list now has a Tier column. A small set of bars shows each service’s criticality (Critical, High, Medium, Low) at a glance, and you can sort by it to bring the most critical services to the top.
- Notification subscriptions. Route alerts to Slack and email destinations alongside the existing internal inbox.
- Notification channel secrets. Paste a Slack or webhook secret and Shoehorn stores it encrypted, or point a subscription at a secret you’ve already configured. Choose per field in the wizard.
- Team notifications tab. Team admins see their team’s subscriptions in one table, can fire a test send, and resume a paused subscription.
- Tenant feature flags `gitops` and `metrics`. They control whether the GitOps signal card and the resource-usage section show up in the Operations drawer. Turn them on once you’ve connected a GitOps tool or metrics-server.
- Knowledge Risk tile on the home dashboard. Shows how many repositories lean on a single maintainer, so you can spot bus-factor risk at a glance. Click through to the full breakdown.
- Admin → Directory → Users now lists every user synced from your IdP, not only people who have signed in. A new “Signed in” column shows whether each user has completed at least one login.
- In-place upgrades for the bundled Meilisearch search engine. When a release moves to a newer search version, you can migrate the existing data instead of rebuilding the index from scratch. See Upgrading Shoehorn for the steps.
Changed
- The Repositories page got an overview strip showing totals, public/private split, languages, and repos without a license. Click a repository to see its details in a side panel, including license, topics, and ownership. You can also filter by license now. The direct link to GitHub moved to its own column.
- Service tier now reads as Critical, High, Medium, or Low. If your manifests use the older Bronze, Silver, Gold, or Platinum labels, they still work and show up as Low, Medium, High, and Critical.
- Home dashboard KPI cards. Each card now shows a 14-day sparkline and trend, built from a daily snapshot. History starts when you upgrade, so cards show just the number until there are two days to chart.
- Operations resource table has a new Signals column. A row with nothing firing stays empty.
- Visual refresh across the Operations page, Entity page, Dashboard, and Insights.
- Operations is now one page. Workloads, Overview, and GitOps merged into a single Resources view: workloads aggregated across clusters, a detail drawer per resource, and filters that live in the URL so a filtered view is shareable. The old sub-page URLs redirect to it.
- The Governance page now groups actions by urgency, so overdue work sits at the top. You can resolve or dismiss several at once, filter to overdue, critical, or high priority, and reach resolved actions in their own group. The duplicate completeness card is gone (it still lives on the Insights overview).
- Free tier no longer caps node count. Licensing is now per active K8s agent. Free stays at 1 active agent (one cluster). If you were running more than 5 nodes on Free, your agent heartbeats stop returning a 402.
- Admin → Directory → Users provider column now shows the real IdP (okta, zitadel, entra-id) instead of always showing “local”. “local” is reserved for users with no IdP identity row.
- K8s agent Prometheus metrics renamed to the `shoehorn_` namespace prefix to match the rest of the platform. `k8s_watcher_events_total` becomes `shoehorn_k8s_watcher_events_total`, and the same applies to `events_dropped_total`, `events_dropped_current`, `events_filtered_total`, `noop_updates_total`, and `drop_rate`. Dashboards and alert rules that referenced the bare names need updating.
- Health checks on Eventbus and worker binaries no longer advertise unregistered gRPC services. Probes hitting the empty-key (overall) health still report SERVING; per-service probes for `eventbus.v1.EventBusService` and `worker-service` now return NOT_FOUND.
- K8s-imported catalog entities now show their real workload kind as the type. A Deployment shows as `deployment`, a StatefulSet as `statefulset`, a CronJob as `cronjob`, the same classification the Operations page already uses. Until now they all showed as `service` (DaemonSets as `resource`), so every k8s entity looked alike in the catalog. If you have scorecard rules or list filters matching `type = 'service'` or `type = 'resource'` on k8s entities, switch them to the kind value. To classify a workload differently, set the `shoehorn.dev/type` annotation (a kebab-case value up to 32 characters) and Shoehorn uses that instead.
- Dark theme: primary buttons and focus rings no longer glow. The blue was tuned down so Save, Review, and other primary actions read as confident, not neon.
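For the Prometheus metric rename above, dashboards and alert rules can be updated with a simple text rewrite. The following is a minimal Python sketch, not part of Shoehorn: the helper name `add_shoehorn_prefix` is hypothetical, and it assumes each listed bare name simply gains the `shoehorn_` prefix (the changelog only spells out the new name for `k8s_watcher_events_total`), so check the result against the agent's actual metrics output.

```python
import re

# Bare metric names listed in the changelog entry. Assumption: each one simply
# gains the "shoehorn_" prefix; verify against what the agent really exports.
OLD_NAMES = [
    "k8s_watcher_events_total",
    "events_dropped_total",
    "events_dropped_current",
    "events_filtered_total",
    "noop_updates_total",
    "drop_rate",
]

# Match a listed name only when it stands alone as a metric identifier, so
# names that are already prefixed (or merely contain one of these strings)
# are left untouched.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_:])("
    + "|".join(re.escape(name) for name in OLD_NAMES)
    + r")(?![A-Za-z0-9_:])"
)


def add_shoehorn_prefix(expr: str) -> str:
    """Return expr with every bare agent metric name prefixed with shoehorn_."""
    return _PATTERN.sub(lambda m: "shoehorn_" + m.group(1), expr)
```

Running it twice is harmless, because an already-prefixed name no longer matches. It works on plain text, so a label value that happens to equal one of the bare names would be rewritten too; review the diff before applying it to real rule files.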
Removed
- The standalone GitOps fleet page. Its ArgoCD/Flux sync-status view isn’t in the unified Operations page yet; for now it lives on the entity GitOps tab.
- The Pipelines page under Code, along with its home dashboard widget and KPI tile.
- The optional network observer add-on. Shoehorn now maps connections between services from your cluster’s configuration, so there’s no separate component to install.
- Org chart now shows team managers by name. Clicking a team opened a drawer showing the manager’s raw Okta ID instead of their name, and the team’s box on the chart had the same issue. Both now show the name, and fall back to “Unknown user” when the person isn’t in the directory.
- Stale cloud resources no longer linger in the catalog. When the crawler stops seeing a cloud resource (a deleted UpCloud Kubernetes cluster, database, or server), it marks the entry end-of-life after three missed syncs, hides it from the catalog list right away, and removes it from the database after a 7-day grace window. If the resource comes back during that window it reappears, and deep links to a hidden entry still resolve until it’s removed.
- Going back from a service’s detail page no longer resets the catalog. Your filters, search, and place in the list are kept when you return.
- Team dashboard widgets now link to the team they’re scoped to. “View all” on Team Members opens the team page; Repositories, Pipelines, and Governance open their respective list page filtered by `?owner=<teamId>`. Before, all four jumped to unfiltered global pages.
- K8s reconciliation now runs in production. Bugs in the agent’s resource count and push behavior were causing the startup sweep to be skipped, so out-of-scope catalog entries were never pruned. The count now reflects only kinds that land in the catalog as instances (Deployment, StatefulSet, DaemonSet, CronJob, Pod), matching what the platform actually stamps. The agent no longer POSTs empty batches that the platform rejects with 400. The completing sync marker still goes through unchanged. Both fixed in the K8s agent; no platform change.
- K8s agent no longer watches individual Jobs. CronJob is the catalog entity; an individual Job run is its execution and was creating catalog churn (each CronJob tick = one new instance row that immediately got deleted on completion) plus tripping reconciliation guards when a Job completed mid-sync. If you relied on seeing standalone Jobs in the catalog, the CronJob view is the supported replacement.
- Stale Kubernetes resources now get cleaned out of the catalog. On startup the agent sends a full cluster snapshot, and the platform prunes anything no longer there: workloads in namespaces you’ve added to `SHOEHORN_EXCLUDE_NAMESPACES`, and resources deleted while the agent was offline. Before, the agent only sent incremental add/update/delete events, so those resources stayed in the catalog indefinitely. Reconciliation kicks in once both the platform and the K8s agent are on this version; an older agent keeps working unchanged, just without the cleanup.
- Bootstrap admin on first login: `TENANT_ADMIN_USER` again grants the admin role. v0.5.3’s `email_verified` gate locked operators out when their IdP didn’t surface the claim (Okta API-created users, sandbox accounts). The operator already controls both the IdP and `TENANT_ADMIN_USER`, so the IdP itself is the gatekeeper; the extra gate was overcorrect for the enterprise IdPs we support.
- OIDC callback now reads `email_verified` from the verified ID token instead of the access token. Okta puts the claim only in the ID token per OIDC Core spec.
- Helm install on clusters without a default StorageClass now works. PVCs for postgres, meilisearch, and redpanda no longer emit `storageClassName: ""`; they omit the field when no class is configured, so the cluster default applies. Valkey already worked; the three other statefulsets now match its pattern.
- Valkey now enforces the password it was wired up to use. The server starts with `--requirepass $(VALKEY_PASSWORD)` when `valkey.passwordSecretRef.key` is set, and the liveness probe authenticates via `REDISCLI_AUTH` so it fails on a NOAUTH reply.
- Chart no longer renders `:latest` image tags when `image.tag` is unset. The `shoehorn.componentImage` helper fails with a clear error message instead.
- RBAC sync Job (`--set rbac.sync.enabled=true --set rbac.sync.mode=automatic`) now renders. It previously referenced undefined helpers, the wrong `postgresql.sslMode` path, a missing `rbac.defaultRole` value, used a hard-coded `valkey:9.0.0` image, and had an indentation bug that produced invalid YAML.
- Chart README TL;DR `helm install` command now includes the required values (`global.domain`, organization slug, Zitadel projectId/clientId/externalUrl) and no longer references a `custom-values.yaml` file that was never explained.
- `global.domain` placeholder now blocks render. The chart fails fast if you forget to set your hostname instead of silently deploying with `idp.example.com`.
- Forge service now honors `postgresql.tls.enabled`. It was hard-coded to `DB_SSLMODE=disable` regardless of TLS config, which silently disabled TLS for forge while the other services used it.
- PodDisruptionBudgets only render for components with 2+ replicas (or autoscaling enabled). With a single replica and `minAvailable: 1` the PDB blocked node drains indefinitely, which matched no real intent.
- RBAC sync Job name is now deterministic (uses `Release.Revision`). The previous `{{ now | date ... }}` suffix made `helm diff` think every render changed the job and broke idempotent apply.
- Install guide image-tag example updated to `v0.5.22` (current Chart appVersion). The docs were stuck on a future-pretending `v0.7.0`.
- API keys admin page now deep-clones the SSR data prop before assigning to `$state`. Without the clone, a child component mutating a row could crash the page with `state_unsafe_mutation`.
- The 30-second pod-status poller in K8sWorkloadsCard now stops when you navigate to a different entity, not only on component destroy. Old pollers no longer write back into a stale entity’s state.
- Profile dropdown in the top bar is no longer hidden behind page headers. The dropdown’s z-index was clamped by the top bar’s stacking context, so page-level sticky sub-headers on catalog, admin, forge, and other section pages painted over it. The top bar now sits above page content.
- Templates that referenced `shoehorn.io/` for labels and annotations now use `shoehorn.dev/`, the actual org domain.
- Helm install on a cluster without Traefik now fails fast with a clear message instead of producing IngressRoute manifests Kubernetes can’t accept. The check is skipped during `helm template` dry-runs.
- Meilisearch env var standardized to `MEILISEARCH_MASTER_KEY` across api, worker, eventbus, crawler, and forge, matching upstream Meilisearch naming. Existing deployments using `MEILISEARCH_API_KEY` still work via a fallback in the client code; new installs use the canonical name.
- Removed the unused `global.tracing.sampleRate` value. It was declared but no backend consumed it. If sampling becomes wired later it’ll surface as `OTEL_TRACES_SAMPLER_ARG`, with docs.
- Meilisearch pod now sets `seccompProfile: RuntimeDefault`, matching the other backend pods.
- “Bootstrap admin” renamed to “initial admin” in get-started docs. The `bootstrap` jargon stays internal-only.
- Get-started prerequisites table now says Helm 4.0+ (was incorrectly Helm 3.12+); matches the chart README and install guide.
- Drawer types in the operations UI (`DeploymentDetailDrawer`, `SLODetailDrawer`) now use named `LinkedIncident` and `SLOViolation` interfaces instead of `any[]`. Backend shape changes surface as type errors at build time instead of silently breaking the drawer.
- `getEntityDocs` now distinguishes “empty docs” from “API returned no data”. The latter logs a warning via the structured logger.
- `GITHUB_FORGE_PRIVATE_KEY_PATH` env var on api + forge pods is only set when `auth.github.forge.appId` is configured. Stops the pod from advertising a path to a file that’s not mounted on default installs.
- Startup fatal errors now flush the log entry before exiting. The five service binaries (api, forge, worker, eventbus, crawler) used `log.Fatal`, which can lose buffered stdout on k8s pipes; they now go through a `logger.Die` helper that calls `Sync()` first.
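The Meilisearch key fallback mentioned in the list above comes down to a lookup order: canonical name first, legacy name second. The changelog does not show the client code itself, so the following is only a Python sketch of that order; the function name `meilisearch_key` and the choice to treat an empty value as unset are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def meilisearch_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer MEILISEARCH_MASTER_KEY, fall back to the older MEILISEARCH_API_KEY."""
    # Assumption of this sketch: an empty value counts as unset, so a blank
    # canonical variable does not shadow a populated legacy one.
    return env.get("MEILISEARCH_MASTER_KEY") or env.get("MEILISEARCH_API_KEY") or None
```

Passing the environment as a mapping keeps the lookup easy to exercise without touching the real process environment.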
Security
- Agent push API now rejects `dashboardUrl` values whose scheme isn’t `http` or `https`. Closes a stored-XSS vector where a malicious or misconfigured `SHOEHORN_DASHBOARD_URL` (e.g. `javascript:...`, `data:text/html,...`) would render as a clickable link in the UI. Empty `dashboardUrl` stays valid.
- Chart README now points to a Traefik install command for clusters without an ingress controller, and includes a PowerShell-native equivalent of the `openssl rand` credential generator.
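The `dashboardUrl` scheme check described in the Security entry can be illustrated with a small allow-list validator. This is a sketch in Python using the standard library's `urllib.parse.urlsplit`, not the platform's actual implementation; the function name `is_allowed_dashboard_url` is hypothetical.

```python
from urllib.parse import urlsplit

# Only these schemes may be rendered as clickable links.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_dashboard_url(value: str) -> bool:
    """Accept an empty value or a URL whose scheme is http or https."""
    if value == "":
        return True  # the changelog says an empty dashboardUrl stays valid
    try:
        scheme = urlsplit(value.strip()).scheme
    except ValueError:
        return False  # malformed input is rejected, never passed through
    return scheme.lower() in ALLOWED_SCHEMES
```

An allow-list is used instead of blocking known-bad schemes such as `javascript:` or `data:`, so any scheme not explicitly permitted is rejected by default.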
[v0.5.3] - 2026-05-13
Changed
- Pricing model: clusters and nodes replace entities. Free covers 1 cluster with up to 5 nodes. Standard covers 3 clusters at €299/month with everything else unlimited. Extra clusters cost €99/month each.
- Existing licenses auto-convert. Beta customers keep going. When the Beta period ends, you get a 30-day read-only window to switch to Standard or drop back to Free. Catalog reads, dashboards, and agent traffic stay live during the window. Only catalog writes pause.
- After the 30-day window, tenants still over the Free limit get auto-trimmed to one cluster (smallest node count, oldest first). Data stays in place. The disconnected agents stop authenticating until reactivated under a paid license.
- Pending cluster registrations panel. When an agent registers past your licensed cluster count, it shows up here so an admin can request a license upgrade or dismiss the row.
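As a worked example of the Standard pricing stated in this release (3 clusters for €299/month, €99/month for each extra cluster), the monthly total can be computed as below. This is an illustrative Python sketch only; the function and constant names are hypothetical, and it does not account for taxes, discounts, or any later change to the licensing model.

```python
STANDARD_BASE_EUR = 299    # monthly price covering the included clusters
INCLUDED_CLUSTERS = 3      # clusters covered by the base price
EXTRA_CLUSTER_EUR = 99     # monthly price per cluster beyond the included ones


def standard_monthly_cost(clusters: int) -> int:
    """Return the Standard monthly cost in EUR for a given cluster count."""
    if clusters < 1:
        raise ValueError("clusters must be at least 1")
    extra = max(0, clusters - INCLUDED_CLUSTERS)
    return STANDARD_BASE_EUR + extra * EXTRA_CLUSTER_EUR
```

For instance, five clusters means two extras, so the total is 299 + 2 × 99 = €497 per month.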
[v0.5.22] - 2026-05-09
- Filter for profile entities by query, type, and team.
- Infinite login loop when a user without any assigned roles signed in. They’re now redirected to an error page explaining the missing access.
[v0.5.21] - 2026-05-08
- Public launch of the Shoehorn Platform.