
Monitoring Overview

Shoehorn provides built-in observability with Prometheus metrics, Grafana dashboards, and Jaeger distributed tracing.

| Component | Purpose | Default Port |
|---|---|---|
| Prometheus | Metrics collection and alerting | 9090 |
| Grafana | Dashboards and visualization | 3000 |
| Jaeger | Distributed tracing | 16686 |

To enable monitoring and tracing, set the following in your Helm values:

```yaml
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true
global:
  tracing:
    enabled: true
    otlpEndpoint: "jaeger:4317"
    sampleRate: 1.0
```
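The sampleRate setting controls what fraction of traces are recorded, and 1.0 records every trace. This page does not say how Shoehorn applies the value. The sketch below assumes it behaves like the trace-ID ratio sampler in the OpenTelemetry Python SDK (TraceIdRatioBased). Under that sampler, the keep/drop decision is computed from the low 64 bits of the trace ID, so every service that handles the same trace reaches the same verdict. The helper names sampling_bound and should_sample are hypothetical.

```python
import random

# Assumption: sampling follows the OpenTelemetry TraceIdRatioBased approach,
# which compares the low 64 bits of the 128-bit trace ID against a threshold.
TRACE_ID_LIMIT = (1 << 64) - 1


def sampling_bound(rate: float) -> int:
    """Map a sampleRate in [0.0, 1.0] to an integer threshold over the 64-bit ID space."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("sampleRate must be between 0.0 and 1.0")
    return round(rate * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, rate: float) -> bool:
    """Deterministic decision: the same trace ID and rate always give the same answer."""
    return (trace_id & TRACE_ID_LIMIT) < sampling_bound(rate)


if __name__ == "__main__":
    # Simulate random 128-bit trace IDs and count how many each rate keeps.
    ids = [random.getrandbits(128) for _ in range(10_000)]
    for rate in (1.0, 0.25, 0.0):
        kept = sum(should_sample(t, rate) for t in ids)
        print(f"sampleRate={rate}: kept {kept} of {len(ids)} traces")
```

Because the decision depends only on the trace ID, a rate of 1.0 keeps everything and 0.0 keeps nothing, with no coordination needed between services.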

Shoehorn platform services (api, worker, eventbus, crawler) expose /healthz for liveness and /readiness for dependency checks. The auth-proxy service also keeps /health as a compatibility alias.

```bash
curl http://localhost:8080/healthz
# {"status": "healthy", "version": "0.7.0"}
```
| Endpoint | HTTP Status | Meaning |
|---|---|---|
| /healthz | 200 | Process is alive |
| /readiness | 200 | All dependencies reachable |
| /readiness | 503 | One or more dependencies failed |
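A client such as a deploy script or smoke test can read the two endpoints differently. A 200 from /healthz means the process is alive. A 503 from /readiness means only that a dependency is unreachable. The sketch below uses only the Python standard library. The helper names (probe, is_alive, is_ready) are hypothetical, and so is StubHandler, a local stand-in for a service that is alive but has a failed dependency. The 503 response body is an assumption, because this page does not document it.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def _parse(raw: bytes) -> dict:
    """Decode a JSON body, tolerating empty or non-JSON responses."""
    try:
        return json.loads(raw)
    except ValueError:
        return {}


def probe(base_url: str, path: str, timeout: float = 2.0) -> tuple[int, dict]:
    """GET base_url + path; return (HTTP status, JSON body). Status 0 means unreachable."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status, _parse(resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; a 503 from /readiness is a normal answer, so return it.
        return err.code, _parse(err.read())
    except OSError:
        # URLError (refused, DNS) and socket timeouts are OSError subclasses: no HTTP status.
        return 0, {}


def is_alive(base_url: str) -> bool:
    return probe(base_url, "/healthz")[0] == 200


def is_ready(base_url: str) -> bool:
    return probe(base_url, "/readiness")[0] == 200


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stub: alive, but reporting a failed dependency on /readiness."""

    ROUTES = {
        "/healthz": (200, {"status": "healthy", "version": "0.7.0"}),
        "/readiness": (503, {"status": "unhealthy"}),  # body shape is an assumption
    }

    def do_GET(self):
        code, body = self.ROUTES.get(self.path, (404, {}))
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep demo output quiet
        pass


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print("alive:", is_alive(base), "ready:", is_ready(base))
    server.shutdown()
    server.server_close()
```

Against a real deployment, you would call is_alive("http://localhost:8080") and is_ready("http://localhost:8080") instead of starting the stub.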

The K8s agent exposes its own health endpoints on port 8080 (configurable):

| Endpoint | Purpose |
|---|---|
| /healthz | Liveness probe: is the process alive? |
| /readyz | Readiness probe: is the agent functioning? |
| /livez | Alias for /healthz |
| /metrics | Prometheus metrics |

In HA deployments, readiness behaves differently for leader and follower pods. A leader reports degraded if it has not completed a successful push or heartbeat in the last 5 minutes. Followers remain ready indefinitely as standby replicas. See Agent Health and Readiness for the full model.
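The leader/follower rule can be written as a small decision function. This is a sketch under stated assumptions, not the agent's actual code. It assumes the 5-minute clock starts when the pod starts (or becomes leader), and that "degraded" is surfaced on /readyz as HTTP 503. The names AgentState, record_success, and readyz_status are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

STALE_AFTER_SECONDS = 5 * 60  # leader is degraded after 5 minutes without a push/heartbeat


@dataclass
class AgentState:
    is_leader: bool
    # Monotonic time of the last successful push or heartbeat.
    # Assumption: initialized when the pod starts or wins leadership.
    last_success: float = field(default_factory=time.monotonic)

    def record_success(self, now: Optional[float] = None) -> None:
        """Call after each successful push or heartbeat."""
        self.last_success = time.monotonic() if now is None else now


def readyz_status(state: AgentState, now: Optional[float] = None) -> tuple[int, str]:
    """Return (HTTP status, reason) for a hypothetical /readyz handler."""
    if not state.is_leader:
        # Followers are standby replicas: ready regardless of recent activity.
        return 200, "ready (follower)"
    now = time.monotonic() if now is None else now
    if now - state.last_success > STALE_AFTER_SECONDS:
        # Assumption: degraded maps to 503 so Kubernetes stops routing to the pod.
        return 503, "degraded: no successful push or heartbeat in 5 minutes"
    return 200, "ready (leader)"


if __name__ == "__main__":
    leader = AgentState(is_leader=True, last_success=0.0)
    print(readyz_status(leader, now=60.0))     # one minute after the last success
    print(readyz_status(leader, now=400.0))    # more than five minutes later
    follower = AgentState(is_leader=False, last_success=0.0)
    print(readyz_status(follower, now=400.0))  # followers ignore staleness
```

Keeping followers ready means a standby pod can take over leadership without first waiting to pass readiness.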

All services expose Prometheus metrics at /metrics (port 9090 by default):

```bash
curl http://localhost:9090/metrics
```
```text
Services ──> /metrics ──> Prometheus ──> Grafana
    |                                     |
    +──> OTLP traces ──> Jaeger ──────────+
```
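The /metrics endpoint returns the Prometheus text exposition format. Each sample is one line with a metric name, optional {label="value"} pairs, and a value, alongside # HELP and # TYPE comment lines. The sketch below parses that output into a dictionary for quick checks in scripts. It is deliberately simplified: it assumes label values contain no commas, escaped quotes, or closing braces, and it ignores optional timestamps. The prometheus_client package ships a complete parser (text_string_to_metric_families). The metric names in EXAMPLE are hypothetical, not Shoehorn's actual metrics.

```python
import re

# One sample line: name{labels} value [timestamp]
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+\S+)?\s*$"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Key = tuple[str, tuple[tuple[str, str], ...]]


def parse_metrics(text: str) -> dict[Key, float]:
    """Map (metric name, sorted label pairs) to value. Simplified; see caveats above."""
    samples: dict[Key, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if m is None:
            continue  # outside the simplified grammar handled here
        labels = tuple(sorted(_LABEL.findall(m.group("labels") or "")))
        samples[(m.group("name"), labels)] = float(m.group("value"))  # handles NaN, +Inf
    return samples


# Hypothetical /metrics output for illustration only.
EXAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="POST",code="503"} 3
process_open_fds 42
"""

if __name__ == "__main__":
    metrics = parse_metrics(EXAMPLE)
    key = ("http_requests_total", (("code", "503"), ("method", "POST")))
    print(metrics[key])  # 3.0
```

To inspect a live service, pass the decoded body of the /metrics response to parse_metrics, for example from urllib.request.urlopen("http://localhost:9090/metrics").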