K8s Agent

The Shoehorn K8s Agent runs inside each Kubernetes cluster you want to monitor. It watches workloads via Kubernetes informers, transforms them into catalog entities, and pushes the results to the Shoehorn API over HTTPS. No inbound connectivity is required — the agent is entirely push-based.

Install the agent with the official Helm chart from shoehorn-dev/helm-charts. The chart handles RBAC, service accounts, secrets, health probes, and leader election configuration.

What the Agent Does

The agent watches the Kubernetes API server through informers and reports changes as they happen:

Resource	What is collected
Deployments, StatefulSets, DaemonSets	Replica counts, image versions, rollout status
CronJobs, Jobs	Schedule, last run, completion status
Services, Ingresses	Endpoints, ports, hostnames
Pods	Container states, restart counts, QoS class
Events	OOM kills, scheduling failures, probe failures (Warning-level only)
Namespaces	Labels for automatic team ownership inference
NetworkPolicies	Ingress/egress rules for security posture visibility

When GitOps discovery is enabled, the agent also watches:

Tool	CRDs watched
ArgoCD	`Application` (`argoproj.io/v1alpha1`)
FluxCD	`Kustomization` (`kustomize.toolkit.fluxcd.io/v1`), `HelmRelease` (`helm.toolkit.fluxcd.io/v2`)

If metrics-server is deployed in the cluster, the agent samples CPU and memory usage every 5 minutes and keeps a 7-day rolling window of samples. Summaries (p95, max, and average) are pushed alongside workload data and drive optimization recommendations in the Shoehorn UI, with no extra tooling required.
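The windowing and summary math can be illustrated with a small Go sketch. It assumes a fixed-size ring buffer holding one sample per 5-minute interval (2,016 samples for 7 days) and a nearest-rank p95. The type and function names are hypothetical, and this is not the agent's actual implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Parameters from the description: one sample every 5 minutes, 7 days retained.
const (
	sampleInterval = 5 * time.Minute
	retention      = 7 * 24 * time.Hour
	maxSamples     = int(retention / sampleInterval) // 2016 samples
)

// Summary is the per-workload statistic pushed with the workload data.
type Summary struct {
	P95, Max, Avg float64
}

// Window is a fixed-capacity ring buffer of usage samples (e.g. CPU millicores).
type Window struct {
	samples []float64
	next    int  // index the next sample will be written to
	full    bool // true once the buffer has wrapped at least once
}

func NewWindow() *Window {
	return &Window{samples: make([]float64, maxSamples)}
}

// Add records a sample, overwriting the oldest one once the window is full.
func (w *Window) Add(v float64) {
	w.samples[w.next] = v
	w.next = (w.next + 1) % len(w.samples)
	if w.next == 0 {
		w.full = true
	}
}

// Summarize computes nearest-rank p95, max, and mean over the retained samples.
func (w *Window) Summarize() Summary {
	n := w.next
	if w.full {
		n = len(w.samples)
	}
	if n == 0 {
		return Summary{}
	}
	sorted := append([]float64(nil), w.samples[:n]...)
	sort.Float64s(sorted)
	sum := 0.0
	for _, v := range sorted {
		sum += v
	}
	rank := int(math.Ceil(0.95*float64(n))) - 1 // 0-based index of the p95 sample
	return Summary{P95: sorted[rank], Max: sorted[n-1], Avg: sum / float64(n)}
}

func main() {
	w := NewWindow()
	for i := 1; i <= 100; i++ {
		w.Add(float64(i))
	}
	s := w.Summarize()
	fmt.Printf("p95=%.0f max=%.0f avg=%.1f\n", s.P95, s.Max, s.Avg) // p95=95 max=100 avg=50.5
}
```

A ring buffer keeps memory constant per workload no matter how long the agent runs; the cost is a sort per summary, which is cheap at 2,016 samples.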

Things to Consider

Cluster Registration

Before deploying the agent, you need an agent token. Register the cluster in the Shoehorn UI under Admin > K8s Agents, or via the API. The token is shown only once, at registration time, so store it somewhere safe. See Registering Clusters for the full workflow.

Namespace Filtering

By default, the agent watches all namespaces. On large clusters, consider excluding system and infrastructure namespaces to reduce noise:

kube-system, kube-public, kube-node-lease
cert-manager, ingress-nginx, or other infrastructure namespaces

You can also use a namespace whitelist or a label selector to restrict discovery to specific workloads. Configure these in the Helm values.
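A minimal Go sketch of how these three options might combine. The field names are hypothetical rather than the chart's values keys, and it assumes that exclusions win over the whitelist and that the label selector is an equality-based match on workload labels.

```go
package main

import "fmt"

// NamespaceFilter models the filtering options described above.
// Field names are hypothetical; they are not the Helm chart's values keys.
type NamespaceFilter struct {
	Exclude       []string          // namespaces that are never watched
	Allow         []string          // when non-empty, only these namespaces are watched
	LabelSelector map[string]string // equality match: every key=value must be present
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// InScope reports whether a workload should be discovered. Exclusions are
// checked first, so a namespace listed in both Exclude and Allow is skipped.
func (f NamespaceFilter) InScope(namespace string, workloadLabels map[string]string) bool {
	if contains(f.Exclude, namespace) {
		return false
	}
	if len(f.Allow) > 0 && !contains(f.Allow, namespace) {
		return false
	}
	for k, v := range f.LabelSelector {
		if got, ok := workloadLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	f := NamespaceFilter{
		Exclude:       []string{"kube-system", "kube-public", "kube-node-lease"},
		LabelSelector: map[string]string{"tier": "backend"},
	}
	fmt.Println(f.InScope("kube-system", map[string]string{"tier": "backend"})) // false
	fmt.Println(f.InScope("payments", map[string]string{"tier": "backend"}))    // true
	fmt.Println(f.InScope("payments", map[string]string{"tier": "frontend"}))   // false
}
```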

High Availability

The agent supports leader election for high-availability (HA) deployments. When running multiple replicas, only one pod at a time is the active leader: it watches Kubernetes resources and pushes data. The remaining pods are followers.

Followers pass readiness probes and remain in the Service’s endpoint set, so they are ready for immediate failover if the leader stops. Because followers do not push data, they are exempt from the leader’s degradation check, which is based on whether a push has succeeded within the last 5 minutes. If the leader dies, a new leader is elected within seconds and begins pushing.

For production clusters, running 2-3 replicas with a PodDisruptionBudget is recommended. The Helm chart supports this out of the box.

Configuration Validation

The agent validates all configuration at startup. If an environment variable is set to an unparseable value (for example, a duration like 30secods instead of 30s, or a non-numeric batch size), the agent refuses to start and logs every invalid value. This prevents silent misconfiguration in unattended deployments.

Security

The agent applies defense-in-depth:

Read-only RBAC: the Helm chart creates a ClusterRole limited to the get, list, and watch verbs, with no access to Secrets or ConfigMaps.
Non-root container: the agent runs as UID 1000, drops all Linux capabilities, uses a read-only root filesystem, and applies the RuntimeDefault seccomp profile.
HTTPS recommended: the agent warns at startup if the API endpoint uses http://. TLS 1.2 is the minimum supported version.
Token redaction: the API token is wrapped in a type that keeps it out of logs, stack traces, and fmt output.
No redirect following: the HTTP client does not follow redirects, so the Bearer token cannot leak to a redirect target.
Annotation sanitization: the kubectl.kubernetes.io/last-applied-configuration annotation is stripped before pushing. It holds a copy of the full applied manifest, so it can contain secrets embedded in environment variables.
Error body sanitization: API error responses are sanitized before logging.

Resource Usage

The agent is lightweight. Typical resource requests and limits for a cluster with a few hundred workloads:

Resource	Request	Limit
CPU	50m	200m
Memory	64Mi	128Mi

Entity Enrichment

The agent reads Kubernetes annotations and namespace labels to enrich entities in the catalog. You can go from zero-config discovery to fully annotated entities without changing the agent’s configuration. See Entity Enrichment and the Annotations Reference.
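To illustrate layered enrichment, here is a hypothetical Go sketch that resolves an owner from a workload annotation first and a namespace label second. The keys example.com/owner and example.com/team are placeholders, not the agent's real keys, and the precedence order is an assumption; the Annotations Reference is authoritative.

```go
package main

import "fmt"

// Placeholder keys for illustration only; the Annotations Reference lists the real ones.
const (
	ownerAnnotation = "example.com/owner" // hypothetical workload annotation
	teamLabel       = "example.com/team"  // hypothetical namespace label
)

// Entity is a cut-down, hypothetical catalog entity.
type Entity struct {
	Name, Namespace, Owner, OwnerSource string
}

// enrich resolves ownership, preferring an explicit workload annotation over
// the namespace label and leaving Owner empty when neither is set.
func enrich(e Entity, annotations, nsLabels map[string]string) Entity {
	switch {
	case annotations[ownerAnnotation] != "":
		e.Owner, e.OwnerSource = annotations[ownerAnnotation], "annotation"
	case nsLabels[teamLabel] != "":
		e.Owner, e.OwnerSource = nsLabels[teamLabel], "namespace label"
	}
	return e
}

func main() {
	base := Entity{Name: "checkout", Namespace: "payments"}
	ns := map[string]string{teamLabel: "payments-team"}

	fmt.Println(enrich(base, nil, ns).Owner)                                                  // payments-team
	fmt.Println(enrich(base, map[string]string{ownerAnnotation: "checkout-squad"}, ns).Owner) // checkout-squad
}
```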

Network Observer (Optional)

The network observer is a separate component that consumes network flow data from Cilium Hubble Relay. It maps connections between workloads and pushes topology data to the Shoehorn API. See Network Observer for details.

Health Endpoints

The agent exposes health endpoints for Kubernetes probes. See Agent Health and Readiness for the full model, including leader vs. follower behavior.

Installation

See the shoehorn-dev/helm-charts repository for the Helm chart, values reference, and installation instructions.