Monitoring Overview

Shoehorn provides built-in observability with Prometheus metrics, Grafana dashboards, and Jaeger distributed tracing.

| Component | Purpose | Default Port |
| --- | --- | --- |
| Prometheus | Metrics collection and alerting | 9090 |
| Grafana | Dashboards and visualization | 3000 |
| Jaeger | Distributed tracing | 16686 |

In your Helm values:

monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true
global:
  tracing:
    enabled: true
    otlpEndpoint: "jaeger:4317"
    sampleRate: 1.0

All Shoehorn platform services expose a /health endpoint:

curl http://localhost:8080/health
# {"status": "healthy", "version": "0.7.0"}

| Status | HTTP Code | Meaning |
| --- | --- | --- |
| healthy | 200 | All dependencies reachable |
| unhealthy | 503 | One or more dependencies failed |
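
The status table above can be turned into a small client-side check. This is a minimal sketch, assuming the JSON body always carries a `status` field as in the sample response; `interpret_health` and `check_health` are hypothetical helper names, not part of Shoehorn.

```python
import json
import urllib.error
import urllib.request


def interpret_health(http_code: int, body: str) -> bool:
    """Return True only for HTTP 200 with a body whose status is "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return http_code == 200 and payload.get("status") == "healthy"


def check_health(url: str = "http://localhost:8080/health", timeout: float = 2.0) -> bool:
    """Fetch the endpoint and interpret the result (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for a 503, so the body is read from the exception.
        return interpret_health(err.code, err.read().decode("utf-8"))
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as not healthy.
        return False
```

Keeping the interpretation separate from the network call makes the decision logic easy to test without a running service.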

The K8s agent exposes its own health endpoints on port 8080 (configurable):

| Endpoint | Purpose |
| --- | --- |
| /healthz | Liveness probe: is the process alive |
| /readyz | Readiness probe: is the agent functioning |
| /livez | Alias for /healthz |
| /metrics | Prometheus metrics |

Readiness behavior differs between leader and follower pods in HA deployments. Leaders report degraded if no successful push or heartbeat has occurred in 5 minutes. Followers remain ready indefinitely as standby replicas. See Agent Health and Readiness for the full model.
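
The leader and follower rule described above can be sketched as a pure function. This is an illustration only, not the agent's implementation: the name `readiness`, the string return values, the strict "older than 5 minutes" comparison, and the treatment of a leader with no recorded success as degraded are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold taken from the text: no successful push or heartbeat in 5 minutes.
STALE_AFTER = timedelta(minutes=5)


def readiness(is_leader: bool, last_success: Optional[datetime], now: datetime) -> str:
    """Return "ready" or "degraded" for one agent pod (hypothetical model)."""
    if not is_leader:
        # Followers are standby replicas and stay ready indefinitely.
        return "ready"
    # Assumption: a leader that has never succeeded is reported as degraded.
    if last_success is None or now - last_success > STALE_AFTER:
        return "degraded"
    return "ready"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(readiness(True, now - timedelta(minutes=6), now))
    print(readiness(False, None, now))
```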

All services expose Prometheus metrics at /metrics (port 9090 by default):

curl http://localhost:9090/metrics

Services ──> /metrics ──> Prometheus ──> Grafana
    |                                       |
    +──> OTLP traces ──> Jaeger ────────────+
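
The /metrics endpoint returns the Prometheus text exposition format, which Prometheus scrapes in the flow above. For quick inspection outside Prometheus, a minimal sketch of a parser follows. The function name `parse_metrics` is hypothetical, and the metric names in the usage example are generic illustrations, not a list of what Shoehorn actually exports.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value.

    Comment lines (# HELP, # TYPE) and blank lines are skipped,
    and an optional trailing timestamp is ignored.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        if rest:
            samples[series] = float(rest[0])
    return samples


if __name__ == "__main__":
    sample = (
        "# TYPE http_requests_total counter\n"
        'http_requests_total{method="GET",code="200"} 1027 1395066363000\n'
        "process_cpu_seconds_total 12.5\n"
    )
    for series, value in parse_metrics(sample).items():
        print(series, value)
```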