Monitoring Overview
Shoehorn provides built-in observability with Prometheus metrics, Grafana dashboards, and Jaeger distributed tracing.
Monitoring Stack
| Component | Purpose | Default Port |
|---|---|---|
| Prometheus | Metrics collection and alerting | 9090 |
| Grafana | Dashboards and visualization | 3000 |
| Jaeger | Distributed tracing | 16686 |
Enabling Monitoring
In your Helm values:

    monitoring:
      enabled: true
      serviceMonitor:
        enabled: true
      prometheusRule:
        enabled: true

    global:
      tracing:
        enabled: true
        otlpEndpoint: "jaeger:4317"
        sampleRate: 1.0

Service Health
All Shoehorn platform services expose a `/health` endpoint:

    curl http://localhost:8080/health
    # {"status": "healthy", "version": "0.7.0"}

| Status | HTTP Code | Meaning |
|---|---|---|
| `healthy` | 200 | All dependencies reachable |
| `unhealthy` | 503 | One or more dependencies failed |
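
As a rough illustration of how a client might act on the status table, the following Python sketch maps a `/health` response to a verdict. The function names `interpret_health` and `check_health` are hypothetical, the response shape is assumed to match the sample JSON from the `curl` example, and treating any other status code as `unknown` is an assumption of this sketch rather than documented behavior.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status_code: int, body: str) -> str:
    """Map an HTTP status code and response body to a health verdict.

    Uses the documented mapping: 200 means healthy, 503 means unhealthy.
    Anything else is reported as "unknown" (an assumption of this sketch).
    """
    if status_code == 200:
        verdict = "healthy"
    elif status_code == 503:
        verdict = "unhealthy"
    else:
        return "unknown"

    # Cross-check the body when it parses as JSON with a "status" field.
    try:
        reported = json.loads(body).get("status")
    except (json.JSONDecodeError, AttributeError):
        reported = None
    if reported is not None and reported != verdict:
        return "unknown"
    return verdict


def check_health(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> str:
    """Call the endpoint and return a verdict; network failures map to "unreachable"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read().decode("utf-8", errors="replace"))
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx codes such as 503; the body is still readable.
        return interpret_health(err.code, err.read().decode("utf-8", errors="replace"))
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Keeping the interpretation separate from the network call means the status mapping can be checked without a running service.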
K8s Agent Health
The K8s agent exposes its own health endpoints on port 8080 (configurable):

| Endpoint | Purpose |
|---|---|
| `/healthz` | Liveness probe: is the process alive |
| `/readyz` | Readiness probe: is the agent functioning |
| `/livez` | Alias for `/healthz` |
| `/metrics` | Prometheus metrics |
Readiness behavior differs between leader and follower pods in HA deployments. Leaders report degraded if no successful push or heartbeat has occurred in 5 minutes. Followers remain ready indefinitely as standby replicas. See Agent Health and Readiness for the full model.
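
The leader and follower rule described above can be sketched in a few lines of Python. This is only a model of the stated behavior, not the agent's code. The names `readiness` and `STALE_AFTER` are hypothetical. Two details are assumptions of this sketch: a leader with no recorded success is treated as degraded, and the boundary at exactly 5 minutes counts as ready.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 5-minute window comes from the text above; the name is hypothetical.
STALE_AFTER = timedelta(minutes=5)


def readiness(is_leader: bool, last_success: Optional[datetime], now: datetime) -> str:
    """Return "ready" or "degraded" following the leader/follower rule."""
    if not is_leader:
        # Followers are standby replicas and stay ready indefinitely.
        return "ready"
    if last_success is None:
        # Assumption: a leader with no recorded push or heartbeat is degraded.
        return "degraded"
    # Leaders are degraded once the last success is older than the window.
    return "ready" if now - last_success <= STALE_AFTER else "degraded"
```

For example, a leader whose last successful push was six minutes ago yields `degraded`, while a follower with no recorded activity still yields `ready`.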
Metrics Endpoint
All services expose Prometheus metrics at `/metrics` (port 9090 by default):

    curl http://localhost:9090/metrics

Architecture

    Services ──> /metrics ──> Prometheus ──> Grafana
        |                                     |
        +──> OTLP traces ──> Jaeger ──────────+
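
To make the first hop of this flow more concrete, here is a simplified Python sketch of reading the Prometheus text exposition format that a `/metrics` endpoint returns. It is not how Prometheus itself scrapes. The function name `parse_metrics` is hypothetical, and the parser deliberately ignores `# HELP` and `# TYPE` comments, drops optional timestamps, and does not handle every escaping rule in label values.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse sample lines into {series: value}, skipping comments and blanks."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels present: the series ends at the last closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if not rest:
            continue  # malformed line with no value; skipped in this sketch
        try:
            # The first token after the series is the value; a timestamp may follow.
            samples[series] = float(rest[0])
        except ValueError:
            continue
    return samples
```

The output of the `curl` command against `/metrics` could be passed to `parse_metrics` as a string to inspect individual series by name.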