Monitoring Overview

Shoehorn provides built-in observability with Prometheus metrics, Grafana dashboards, and Jaeger distributed tracing.

Monitoring Stack

Component	Purpose	Default Port
Prometheus	Metrics collection and alerting	9090
Grafana	Dashboards and visualization	3000
Jaeger	Distributed tracing	16686

Enabling Monitoring

In your Helm values:

monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true

global:
  tracing:
    enabled: true
    otlpEndpoint: "jaeger:4317"
    sampleRate: 1.0

Service Health

Shoehorn platform services (api, worker, eventbus, crawler) expose /healthz for liveness checks and /readiness for dependency checks. The auth-proxy service keeps /health as a compatibility alias.

curl http://localhost:8080/healthz
# {"status": "healthy", "version": "0.7.0"}

Endpoint	HTTP Code	Meaning
`/healthz`	200	Process is alive
`/readiness`	200	All dependencies reachable
`/readiness`	503	One or more dependencies failed
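
The status table above can be turned into a small probe script. The following is a minimal sketch, not part of Shoehorn: the function names `interpret` and `probe` are hypothetical, and the base URL is assumed to match the `curl` example above.

```python
import urllib.error
import urllib.request

# Meanings copied from the status table; any other combination is reported as unexpected.
MEANINGS = {
    ("/healthz", 200): "process is alive",
    ("/readiness", 200): "all dependencies reachable",
    ("/readiness", 503): "one or more dependencies failed",
}


def interpret(endpoint: str, status: int) -> str:
    """Map an endpoint and HTTP status code to the meaning from the table."""
    return MEANINGS.get((endpoint, status), f"unexpected status {status}")


def probe(base_url: str, endpoint: str, timeout: float = 2.0) -> str:
    """Request one health endpoint and describe the result (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url + endpoint, timeout=timeout) as resp:
            return interpret(endpoint, resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses; a 503 still carries a meaningful code.
        return interpret(endpoint, err.code)
    except OSError as err:
        # Covers connection refused, DNS failures, and timeouts.
        return f"unreachable: {err}"
```

Calling `probe("http://localhost:8080", "/readiness")` against a locally running service would return one of the three meanings above, or an "unreachable" message if nothing is listening.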

K8s Agent Health

The K8s agent exposes its own health endpoints on port 8080 (configurable):

Endpoint	Purpose
`/healthz`	Liveness probe: is the process alive
`/readyz`	Readiness probe: is the agent functioning
`/livez`	Alias for `/healthz`
`/metrics`	Prometheus metrics

Readiness behavior differs between leader and follower pods in HA deployments. A leader reports degraded if no push or heartbeat has succeeded in the last 5 minutes. Followers are standby replicas and remain ready indefinitely. See Agent Health and Readiness for the full model.
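
The leader and follower rule can be modeled as a small decision function. This is an illustrative sketch only, not the agent's implementation: `AgentState` and `readiness` are hypothetical names, the status strings are placeholders, and two details are assumptions because the text above does not state them: a leader that has never succeeded is treated as degraded, and the 5-minute boundary is exclusive.

```python
from dataclasses import dataclass
from typing import Optional

STALE_AFTER_SECONDS = 5 * 60  # the 5-minute window described above


@dataclass
class AgentState:
    is_leader: bool
    # Timestamp (seconds) of the last successful push or heartbeat, or None if none yet.
    last_success: Optional[float]


def readiness(state: AgentState, now: float) -> str:
    """Return "ready" or "degraded" for one agent pod (hypothetical model)."""
    if not state.is_leader:
        # Followers are standby replicas and stay ready indefinitely.
        return "ready"
    if state.last_success is None:
        # Assumption: a leader with no success yet is reported as degraded.
        return "degraded"
    if now - state.last_success > STALE_AFTER_SECONDS:
        return "degraded"
    return "ready"
```

In practice `now` would come from a monotonic clock such as `time.monotonic()`, so that wall-clock adjustments do not trigger a false degraded state.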

Metrics Endpoint

All services expose Prometheus metrics at /metrics (port 9090 by default):

curl http://localhost:9090/metrics
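
The response uses the Prometheus text exposition format. As a rough sketch of how a script might read it, the hypothetical helper `parse_metrics` below extracts sample values from that text. It ignores comment lines and timestamps, and it is not a full parser; a maintained client library is the safer choice for real use.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus optional {labels}) to its sample value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        brace = line.rfind("}")
        if brace != -1:
            # Label values may contain spaces, so split after the closing brace.
            series, rest = line[: brace + 1], line[brace + 1 :]
        else:
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue  # no value on this line
            series, rest = parts
        fields = rest.split()
        if not fields:
            continue
        # The first field is the value; an optional timestamp may follow.
        samples[series] = float(fields[0])
    return samples
```

For example, given the made-up line `up 1`, the function returns `{"up": 1.0}`. The metric names Shoehorn actually exports are not listed in this section.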

Architecture

Services ──> /metrics ──> Prometheus ──> Grafana
    |                                      |
    +──> OTLP traces ──> Jaeger ──────────+