# K8s Agent
The Shoehorn K8s Agent runs inside each Kubernetes cluster you want to monitor. It watches workloads via Kubernetes informers, transforms them into catalog entities, and pushes the results to the Shoehorn API over HTTPS. No inbound connectivity is required — the agent is entirely push-based.
Install the agent with the official Helm chart from shoehorn-dev/helm-charts. The chart handles RBAC, service accounts, secrets, health probes, and leader election configuration.
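A typical install might look like the following; the repo URL, chart name, and value keys here are assumptions, so check shoehorn-dev/helm-charts for the authoritative commands:

```shell
# Hypothetical commands — repo URL, chart name, and value keys may differ;
# see shoehorn-dev/helm-charts for the real installation instructions.
helm repo add shoehorn https://shoehorn-dev.github.io/helm-charts
helm repo update
helm install shoehorn-agent shoehorn/k8s-agent \
  --namespace shoehorn --create-namespace \
  --set apiUrl=https://shoehorn.example.com \
  --set agentToken=<token-from-registration>
```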
## What the Agent Does

The agent watches the Kubernetes API server through informers and reports changes as they happen:
| Resource | What is collected |
|---|---|
| Deployments, StatefulSets, DaemonSets | Replica counts, image versions, rollout status |
| CronJobs, Jobs | Schedule, last run, completion status |
| Services, Ingresses | Endpoints, ports, hostnames |
| Pods | Container states, restart counts, QoS class |
| Events | OOM kills, scheduling failures, probe failures (Warning-level only) |
| Namespaces | Labels for automatic team ownership inference |
| NetworkPolicies | Ingress/egress rules for security posture visibility |
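The watch-and-transform flow can be sketched with plain Go structs; the field names and entity schema below are illustrative, not the agent's actual wire format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Workload holds the handful of fields the agent might read from an
// informer event. Field names here are illustrative, not the real schema.
type Workload struct {
	Kind      string
	Name      string
	Namespace string
	Image     string
	Replicas  int32
	Ready     int32
}

// Entity is a hypothetical catalog record pushed to the Shoehorn API.
type Entity struct {
	ID       string            `json:"id"`
	Type     string            `json:"type"`
	Metadata map[string]string `json:"metadata"`
}

// toEntity maps a watched workload onto a catalog entity.
func toEntity(w Workload) Entity {
	return Entity{
		ID:   fmt.Sprintf("%s/%s/%s", w.Kind, w.Namespace, w.Name),
		Type: "workload",
		Metadata: map[string]string{
			"image":    w.Image,
			"replicas": fmt.Sprintf("%d/%d ready", w.Ready, w.Replicas),
		},
	}
}

func main() {
	e := toEntity(Workload{Kind: "Deployment", Name: "api", Namespace: "prod",
		Image: "api:v1.2.3", Replicas: 3, Ready: 3})
	out, _ := json.Marshal(e)
	fmt.Println(string(out))
}
```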
When GitOps discovery is enabled, the agent also watches:
| Tool | CRDs watched |
|---|---|
| ArgoCD | Application (argoproj.io/v1alpha1) |
| FluxCD | Kustomization (kustomize.toolkit.fluxcd.io/v1), HelmRelease (helm.toolkit.fluxcd.io/v2) |
If metrics-server is deployed in the cluster, the agent samples CPU and memory usage every 5 minutes and maintains a 7-day rolling window. Summaries (avg, p95, max) are pushed alongside workload data, powering optimization recommendations in the Shoehorn UI without any extra tooling.
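A minimal sketch of that rollup, assuming a nearest-rank p95 (the agent's exact percentile method is not specified here):

```go
package main

import (
	"fmt"
	"sort"
)

// summarize computes avg, p95, and max over a window of usage samples,
// mirroring the kind of summary the agent pushes. Nearest-rank p95 is an
// assumption about the percentile method.
func summarize(samples []float64) (avg, p95, max float64) {
	if len(samples) == 0 {
		return 0, 0, 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	var sum float64
	for _, s := range sorted {
		sum += s
	}
	avg = sum / float64(len(sorted))
	// Nearest-rank: ceil(0.95 * n), converted to a 1-based index.
	idx := (len(sorted)*95 + 99) / 100
	p95 = sorted[idx-1]
	max = sorted[len(sorted)-1]
	return avg, p95, max
}

func main() {
	cpu := []float64{40, 55, 60, 45, 300, 50, 48, 52, 47, 49} // millicores
	avg, p95, max := summarize(cpu)
	fmt.Printf("avg=%.1fm p95=%.1fm max=%.1fm\n", avg, p95, max)
	// prints: avg=74.6m p95=300.0m max=300.0m
}
```

A single 300m spike dominates both p95 and max while barely moving the average, which is exactly why all three summaries are pushed together.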
## Things to Consider

### Cluster Registration

Before deploying the agent you need an agent token. Register the cluster in the Shoehorn UI under Admin > K8s Agents, or via the API. The token is shown once at registration time. See Registering Clusters for the full workflow.
### Namespace Filtering

By default, the agent watches all namespaces. For large clusters, consider excluding system namespaces to reduce noise:
- `kube-system`, `kube-public`, `kube-node-lease`
- `cert-manager`, `ingress-nginx`, or other infrastructure namespaces
You can also use a namespace whitelist or a label selector to restrict discovery to specific workloads. Configure these in the Helm values.
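In Helm values this might look like the following; the key names are assumptions, so consult the chart's values reference:

```yaml
# Illustrative Helm values — key names are assumptions; check the
# values reference in shoehorn-dev/helm-charts for the real schema.
namespaces:
  # Either exclude noisy system namespaces...
  exclude:
    - kube-system
    - kube-public
    - kube-node-lease
    - cert-manager
    - ingress-nginx
  # ...or restrict discovery to an explicit whitelist:
  # include:
  #   - payments
  #   - checkout
# Alternatively, discover only labeled workloads:
# labelSelector: "shoehorn.dev/discover=true"
```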
### High Availability

The agent supports leader election for HA deployments. When running multiple replicas, exactly one pod is the active leader. The leader watches Kubernetes resources and pushes data; the remaining pods are followers.
Followers pass readiness probes and remain in the Service’s endpoint set, ready for immediate failover if the leader stops. Followers do not push data, so they are not subject to the leader’s 5-minute success-based degradation check. If the leader dies, a new leader is elected within seconds and begins pushing.
For production clusters, running 2-3 replicas with a PodDisruptionBudget is recommended. The Helm chart supports this out of the box.
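A hypothetical values snippet for such a deployment (key names assumed):

```yaml
# Illustrative Helm values — key names are assumptions; see the chart docs.
replicaCount: 2
leaderElection:
  enabled: true
podDisruptionBudget:
  enabled: true
  minAvailable: 1
```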
### Configuration Validation

The agent validates all configuration at startup. If an environment variable is set to an unparseable value (for example, a duration like `30secods` instead of `30s`, or a non-numeric batch size), the agent refuses to start and logs every invalid value. This prevents silent misconfiguration in unattended deployments.
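The collect-all-errors pattern behind "logs every invalid value" can be sketched as follows; the environment variable names are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// validate parses every configured value and collects all errors rather
// than stopping at the first, so operators see the full list in one pass.
// The variable names are illustrative, not the agent's actual env vars.
func validate(env map[string]string) []error {
	var errs []error
	if v, ok := env["PUSH_INTERVAL"]; ok {
		if _, err := time.ParseDuration(v); err != nil {
			errs = append(errs, fmt.Errorf("PUSH_INTERVAL: %w", err))
		}
	}
	if v, ok := env["BATCH_SIZE"]; ok {
		if _, err := strconv.Atoi(v); err != nil {
			errs = append(errs, fmt.Errorf("BATCH_SIZE: %w", err))
		}
	}
	return errs
}

func main() {
	errs := validate(map[string]string{
		"PUSH_INTERVAL": "30secods", // unparseable: the agent must refuse to start
		"BATCH_SIZE":    "fifty",
	})
	for _, err := range errs {
		fmt.Fprintln(os.Stderr, "invalid config:", err)
	}
	if len(errs) > 0 {
		fmt.Println("refusing to start:", len(errs), "invalid value(s)")
	}
}
```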
### Security

The agent applies defense-in-depth:
- Read-only RBAC — the Helm chart creates a ClusterRole limited to get/list/watch. No access to Secrets or ConfigMaps.
- Non-root container — runs as UID 1000, drops all capabilities, read-only root filesystem, seccomp RuntimeDefault.
- HTTPS recommended — the agent warns at startup if the API endpoint uses `http://`. TLS 1.2 is the minimum supported version.
- Token redaction — the API token is wrapped in a type that prevents it from appearing in logs, stack traces, or fmt output.
- No redirect following — the HTTP client does not follow redirects, preventing Bearer token leakage to redirect targets.
- Annotation sanitization — `kubectl.kubernetes.io/last-applied-configuration` is stripped before pushing to prevent leaking secrets embedded in environment variables.
- Error body sanitization — API error responses are sanitized before logging.
### Resource Usage

The agent is lightweight. Typical resource consumption for a cluster with a few hundred workloads:

|  | Request | Limit |
|---|---|---|
| CPU | 50m | 200m |
| Memory | 64Mi | 128Mi |
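These figures correspond to standard Kubernetes resource settings in the Helm values; the `resources` key path is an assumption:

```yaml
# Matching Helm values for the figures above (illustrative key path):
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 128Mi
```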
### Entity Enrichment

The agent reads Kubernetes annotations and namespace labels to enrich entities in the catalog. You can go from zero-config discovery to fully annotated entities without changing the agent's configuration. See Entity Enrichment and Annotations Reference.
### Network Observer (Optional)

The network observer is a separate component that consumes network flow data from Cilium Hubble Relay. It maps connections between workloads and pushes topology data to the Shoehorn API. See Network Observer for details.
### Health Endpoints

The agent exposes health endpoints for Kubernetes probes. See Agent Health and Readiness for the full model, including leader vs. follower behavior.
## Installation

See the shoehorn-dev/helm-charts repository for the Helm chart, values reference, and installation instructions.