Distributed Tracing

Shoehorn supports distributed tracing via OpenTelemetry, with Jaeger as the default trace backend.

```yaml
# Helm values
global:
  tracing:
    enabled: true
    otlpEndpoint: "jaeger:4317"
    sampleRate: 1.0  # 1.0 = 100% of requests traced
```

Or via environment variables:

```sh
TRACING_ENABLED=true
OTLP_ENDPOINT=jaeger:4317
TRACING_SAMPLE_RATE=1.0
```

When deployed, the Jaeger UI is available at:

http://jaeger.shoehorn.example.com

Or via port-forward:

```sh
kubectl port-forward -n monitoring svc/jaeger-query 16686:16686
```

Traces follow requests across Shoehorn microservices:

```
Browser -> API Gateway -> gRPC call -> Microservice -> Database
               │                            │
               └── trace_id propagated ─────┘
```

Each span includes:

  • Service name
  • Operation (HTTP path or gRPC method)
  • Duration
  • Status code
  • Custom attributes (tenant_id, user_id)
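Putting those fields together, a single exported span can be pictured as a record like the following. This is an illustrative sketch only; the field names approximate what Jaeger displays and are not the exact OTLP schema.

```python
# Illustrative shape of one span as seen in Jaeger (approximate field
# names, not the exact OTLP schema).
span = {
    "service": "shoehorn-api",      # service name
    "operation": "/v1/orders",      # HTTP path or gRPC method
    "duration_ms": 42.7,            # duration
    "status_code": 200,             # status code
    "attributes": {                 # custom attributes
        "tenant_id": "acme",
        "user_id": "u-123",
    },
}
```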

Traces are propagated between services using the W3C Trace Context standard (traceparent header). All gRPC calls between Shoehorn services automatically propagate trace context.
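The `traceparent` header has a fixed four-part format defined by the W3C Trace Context specification: `version-trace_id-parent_id-trace_flags`, all lowercase hex. A minimal sketch of parsing one (the example header below is the one used in the W3C spec itself):

```python
# Minimal sketch: parsing a W3C Trace Context "traceparent" header.
# Format: version-trace_id-parent_id-trace_flags (lowercase hex).
def parse_traceparent(header: str) -> dict:
    version, trace_id, parent_id, flags = header.split("-")
    return {
        "version": version,       # "00" is the current version
        "trace_id": trace_id,     # 16-byte trace ID, 32 hex chars
        "parent_id": parent_id,   # 8-byte span ID, 16 hex chars
        "sampled": int(flags, 16) & 1 == 1,  # bit 0 = sampled flag
    }

tp = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

Each downstream service reads this header, records the incoming span as its parent, and forwards a new `traceparent` with its own span ID, so the `trace_id` stays constant across the whole request.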

For production deployments with high traffic, reduce the sample rate:

```sh
TRACING_SAMPLE_RATE=0.1  # 10% of requests
```

This reduces overhead while still providing representative trace data.
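The usual way such a rate is applied is trace-ID ratio sampling: the keep/drop decision is derived from the trace ID itself, so every service in the request path reaches the same decision without coordination. A sketch of the idea (not Shoehorn's actual sampler implementation):

```python
# Sketch of trace-ID ratio ("head") sampling. Because the decision is
# a pure function of the trace ID, all services sample consistently:
# a trace is either fully recorded or fully dropped.
def should_sample(trace_id: str, sample_rate: float) -> bool:
    # Map the 128-bit trace ID onto [0, 1) and compare to the rate.
    return int(trace_id, 16) / 2**128 < sample_rate

# With sample_rate=1.0 every trace is kept; with 0.1 roughly 10% are.
```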

To inspect a trace:

  1. Open the Jaeger UI
  2. Select the service (e.g., shoehorn-api)
  3. Search by trace ID, operation, or time range
  4. Click a trace to see the full request flow
  5. Identify slow spans or errors in the waterfall view
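The same search can be scripted against the HTTP API exposed by the Jaeger query service (the API the UI itself calls). A sketch, assuming the port-forward shown earlier is running and using the `shoehorn-api` service name from step 2:

```python
import urllib.parse

# Sketch: building a trace-search request against the Jaeger query
# HTTP API. Assumes the port-forward to svc/jaeger-query is active.
def trace_search_url(base: str, service: str, limit: int = 20) -> str:
    query = urllib.parse.urlencode({"service": service, "limit": limit})
    return f"{base}/api/traces?{query}"

url = trace_search_url("http://localhost:16686", "shoehorn-api")
```

Fetching `url` (e.g. with `urllib.request` or `curl`) returns the matching traces as JSON, which is handy for automated latency checks.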