K8s Agent

The Shoehorn K8s Agent runs inside each Kubernetes cluster you want to monitor. It watches workloads via Kubernetes informers, transforms them into catalog entities, and pushes the results to the Shoehorn API over HTTPS. No inbound connectivity is required — the agent is entirely push-based.

Install the agent with the official Helm chart from shoehorn-dev/helm-charts. The chart handles RBAC, service accounts, secrets, health probes, and leader election configuration.

The agent watches the Kubernetes API server through informers and reports changes as they happen:

| Resource | What is collected |
|---|---|
| Deployments, StatefulSets, DaemonSets | Replica counts, image versions, rollout status |
| CronJobs, Jobs | Schedule, last run, completion status |
| Services, Ingresses | Endpoints, ports, hostnames |
| Pods | Container states, restart counts, QoS class |
| Events | OOM kills, scheduling failures, probe failures (Warning-level only) |
| Namespaces | Labels for automatic team ownership inference |
| NetworkPolicies | Ingress/egress rules for security posture visibility |

When GitOps discovery is enabled, the agent also watches:

| Tool | CRDs watched |
|---|---|
| ArgoCD | Application (argoproj.io/v1alpha1) |
| FluxCD | Kustomization (kustomize.toolkit.fluxcd.io/v1), HelmRelease (helm.toolkit.fluxcd.io/v2) |

If metrics-server is deployed in the cluster, the agent samples CPU and memory usage every 5 minutes and maintains a 7-day rolling window. Summaries (p95, max, avg) are pushed alongside workload data, so the Shoehorn UI can show optimization recommendations without any extra tooling.
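To make the sampling arithmetic concrete, here is a minimal Go sketch of a rolling window and its summary. It is an illustration, not the agent's actual implementation: the `Window` and `Summary` types are hypothetical, and the nearest-rank method for p95 is an assumption, since the percentile method is not specified here. Only the 5-minute interval, the 7-day retention, and the three statistics come from the text above.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Values taken from the text: one sample every 5 minutes, kept for 7 days.
const (
	sampleInterval = 5 * time.Minute
	retention      = 7 * 24 * time.Hour
	windowSize     = int(retention / sampleInterval) // 2016 samples
)

// Summary holds the three statistics the text says are pushed.
type Summary struct {
	P95, Max, Avg float64
}

// Window is a hypothetical rolling window that keeps the newest windowSize samples.
type Window struct {
	samples []float64
}

// Add appends a sample, dropping the oldest one once the window is full.
func (w *Window) Add(v float64) {
	if len(w.samples) == windowSize {
		copy(w.samples, w.samples[1:])
		w.samples = w.samples[:windowSize-1]
	}
	w.samples = append(w.samples, v)
}

// Summarize returns p95 (nearest-rank, an assumption), max, and mean.
// ok is false when the window holds no samples.
func (w *Window) Summarize() (s Summary, ok bool) {
	n := len(w.samples)
	if n == 0 {
		return Summary{}, false
	}
	sorted := append([]float64(nil), w.samples...)
	sort.Float64s(sorted)
	var sum float64
	for _, v := range sorted {
		sum += v
	}
	rank := (95*n + 99) / 100 // integer ceil(0.95*n), 1-based nearest rank
	return Summary{P95: sorted[rank-1], Max: sorted[n-1], Avg: sum / float64(n)}, true
}

func main() {
	w := &Window{}
	for i := 1; i <= 100; i++ {
		w.Add(float64(i)) // for example, CPU usage in millicores
	}
	s, _ := w.Summarize()
	fmt.Printf("window=%d p95=%.0f max=%.0f avg=%.1f\n", windowSize, s.P95, s.Max, s.Avg)
}
```

A full window holds 7 × 24 × 12 = 2016 samples per metric, so the per-workload memory cost of the window stays small and bounded.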

Before deploying the agent you need an agent token. Register the cluster in the Shoehorn UI under Admin > K8s Agents, or via the API. The token is shown once at registration time. See Registering Clusters for the full workflow.

By default, the agent watches all namespaces. For large clusters, consider excluding system namespaces to reduce noise:

  • kube-system, kube-public, kube-node-lease
  • cert-manager, ingress-nginx, or other infrastructure namespaces

You can also use a namespace whitelist or a label selector to restrict discovery to specific workloads. Configure these in the Helm values.
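The filtering options above can be pictured as a single predicate applied to each discovered workload. The Go sketch below is a hypothetical model: the `NamespaceFilter` type, its field names, and the precedence (exclusions win, then the whitelist, then labels) are assumptions for illustration and are not the chart's value keys. Consult the Helm values reference for the real settings.

```go
package main

import "fmt"

// NamespaceFilter is a hypothetical model of the discovery filters.
type NamespaceFilter struct {
	Include     map[string]bool   // if non-empty, only these namespaces are watched
	Exclude     map[string]bool   // namespaces that are always skipped
	MatchLabels map[string]string // labels a workload must carry, if any are set
}

// Watches reports whether a workload in the given namespace, carrying the
// given labels, passes the filter. Assumed precedence: Exclude, Include, labels.
func (f NamespaceFilter) Watches(namespace string, labels map[string]string) bool {
	if f.Exclude[namespace] {
		return false
	}
	if len(f.Include) > 0 && !f.Include[namespace] {
		return false
	}
	for key, want := range f.MatchLabels {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// setOf builds a lookup set from a list of namespace names.
func setOf(names ...string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

func main() {
	// The zero value watches everything, matching the documented default.
	f := NamespaceFilter{Exclude: setOf("kube-system", "kube-public", "kube-node-lease")}
	for _, ns := range []string{"kube-system", "payments"} {
		fmt.Printf("%s watched: %v\n", ns, f.Watches(ns, nil))
	}
}
```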

The agent supports leader election for HA deployments. When running multiple replicas, exactly one pod is the active leader. The leader watches Kubernetes resources and pushes data. The remaining pods are followers.

Followers pass readiness probes and remain in the Service’s endpoint set, ready for immediate failover if the leader stops. Followers do not push data, so they are not subject to the leader’s 5-minute success-based degradation check. If the leader dies, a new leader is elected within seconds and begins pushing.
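The difference between leader and follower health can be sketched as a small decision function. This is an illustrative Go sketch under stated assumptions: the names `Role`, `Healthy`, and `degradeAfter` are hypothetical, and treating a stale leader simply as "not healthy" is a simplification. The authoritative model is described in Agent Health and Readiness.

```go
package main

import (
	"fmt"
	"time"
)

// degradeAfter mirrors the 5-minute success-based check mentioned in the text.
const degradeAfter = 5 * time.Minute

// Role distinguishes the active leader from standby followers.
type Role int

const (
	Follower Role = iota
	Leader
)

// Healthy reports whether a pod should be considered healthy.
// Followers never push, so the time since the last successful push is
// ignored for them. A leader degrades once it has gone longer than
// degradeAfter without a successful push (assumed semantics).
func Healthy(role Role, sinceLastSuccess time.Duration) bool {
	if role == Follower {
		return true
	}
	return sinceLastSuccess <= degradeAfter
}

func main() {
	fmt.Println("follower, never pushed:", Healthy(Follower, 24*time.Hour))
	fmt.Println("leader, pushed 1m ago: ", Healthy(Leader, time.Minute))
	fmt.Println("leader, pushed 6m ago: ", Healthy(Leader, 6*time.Minute))
}
```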

For production clusters, running 2-3 replicas with a PodDisruptionBudget is recommended. The Helm chart supports this out of the box.

The agent validates all configuration at startup. If an environment variable is set to an unparseable value (for example, a duration like 30secods instead of 30s, or a non-numeric batch size), the agent refuses to start and logs every invalid value. This prevents silent misconfiguration in unattended deployments.
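The "collect every error, then refuse to start" pattern can be sketched in Go as follows. The environment variable names `AGENT_PUSH_INTERVAL` and `AGENT_BATCH_SIZE`, the `Config` fields, and the defaults are hypothetical placeholders rather than the agent's real settings. The sketch relies on `errors.Join`, which requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds two illustrative settings; the field names are hypothetical.
type Config struct {
	PushInterval time.Duration
	BatchSize    int
}

// LoadConfig reads settings through getenv (os.Getenv in real use) and
// reports every invalid value at once instead of stopping at the first.
func LoadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{PushInterval: 30 * time.Second, BatchSize: 100} // assumed defaults
	var errs []error

	if v := getenv("AGENT_PUSH_INTERVAL"); v != "" {
		d, err := time.ParseDuration(v)
		if err != nil {
			errs = append(errs, fmt.Errorf("AGENT_PUSH_INTERVAL=%q: %w", v, err))
		} else {
			cfg.PushInterval = d
		}
	}
	if v := getenv("AGENT_BATCH_SIZE"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			errs = append(errs, fmt.Errorf("AGENT_BATCH_SIZE=%q: %w", v, err))
		} else {
			cfg.BatchSize = n
		}
	}

	// errors.Join returns nil when errs is empty.
	if err := errors.Join(errs...); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintf(os.Stderr, "invalid configuration:\n%v\n", err)
		os.Exit(1) // refuse to start rather than run with a partial config
	}
	fmt.Printf("push interval %s, batch size %d\n", cfg.PushInterval, cfg.BatchSize)
}
```

Passing `getenv` as a parameter keeps the loader testable without touching the process environment.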

The agent applies defense-in-depth:

  • Read-only RBAC — the Helm chart creates a ClusterRole limited to get/list/watch. No access to Secrets or ConfigMaps.
  • Non-root container — runs as UID 1000, drops all capabilities, read-only root filesystem, seccomp RuntimeDefault.
  • HTTPS recommended — the agent warns at startup if the API endpoint uses http://. TLS 1.2 is the minimum supported version.
  • Token redaction — the API token is wrapped in a type that prevents it from appearing in logs, stack traces, or fmt output.
  • No redirect following — the HTTP client does not follow redirects, preventing Bearer token leakage to redirect targets.
  • Annotation sanitization — kubectl.kubernetes.io/last-applied-configuration is stripped before pushing to prevent leaking secrets embedded in environment variables.
  • Error body sanitization — API error responses are sanitized before logging.
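Several of the measures above map onto small, well-known Go patterns. The sketch below shows one possible shape for three of them: a redacting token type, an HTTP client that refuses redirects and requires TLS 1.2 or later, and stripping the last-applied-configuration annotation. The names `Token`, `NewClient`, and `SanitizeAnnotations`, as well as the 30-second timeout, are hypothetical; this is not the agent's source code.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

const redacted = "[REDACTED]"

// Token wraps a secret so that formatting it never prints the raw value.
type Token struct{ value string }

func NewToken(v string) Token { return Token{value: v} }

// Format implements fmt.Formatter, which fmt consults for every verb.
func (Token) Format(s fmt.State, _ rune) { fmt.Fprint(s, redacted) }

// String covers callers that use the Stringer interface directly.
func (Token) String() string { return redacted }

// Reveal returns the raw secret; call it only when building the
// Authorization header.
func (t Token) Reveal() string { return t.value }

// NewClient returns an HTTP client that does not follow redirects and
// requires TLS 1.2 or later.
func NewClient() *http.Client {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.TLSClientConfig = &tls.Config{MinVersion: tls.VersionTLS12}
	return &http.Client{
		Timeout:   30 * time.Second, // assumed value
		Transport: tr,
		// Returning ErrUseLastResponse hands the 3xx response back to the
		// caller instead of re-sending the request to the redirect target.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
}

const lastApplied = "kubectl.kubernetes.io/last-applied-configuration"

// SanitizeAnnotations returns a copy of in without the last-applied annotation.
func SanitizeAnnotations(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if k != lastApplied {
			out[k] = v
		}
	}
	return out
}

func main() {
	tok := NewToken("example-token")
	fmt.Printf("token: %v %+v %#v\n", tok, tok, tok)
	fmt.Println("client timeout:", NewClient().Timeout)
	fmt.Println(SanitizeAnnotations(map[string]string{lastApplied: "{...}", "team": "core"}))
}
```

Implementing `fmt.Formatter` rather than only `String` matters because verbs such as `%d` or `%#v` would otherwise print the struct's fields.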

The agent is lightweight. Typical resource consumption for a cluster with a few hundred workloads:

| Resource | Request | Limit |
|---|---|---|
| CPU | 50m | 200m |
| Memory | 64Mi | 128Mi |

The agent reads Kubernetes annotations and namespace labels to enrich entities in the catalog. You can go from zero-config discovery to fully annotated entities without changing the agent’s configuration. See Entity Enrichment and Annotations Reference.

The network observer is a separate component that consumes network flow data from Cilium Hubble Relay. It maps connections between workloads and pushes topology data to the Shoehorn API. See Network Observer for details.

The agent exposes health endpoints for Kubernetes probes. See Agent Health and Readiness for the full model, including leader vs. follower behavior.

See the shoehorn-dev/helm-charts repository for the Helm chart, values reference, and installation instructions.