Distributed Tracing
Shoehorn supports distributed tracing via OpenTelemetry, with Jaeger as the default trace backend.
Enabling Tracing
    # Helm values
    global:
      tracing:
        enabled: true
        otlpEndpoint: "jaeger:4317"
        sampleRate: 1.0  # 1.0 = 100% of requests traced

Or via environment variables:
    TRACING_ENABLED=true
    OTLP_ENDPOINT=jaeger:4317
    TRACING_SAMPLE_RATE=1.0

Accessing Jaeger
When deployed, the Jaeger UI is available at:
    http://jaeger.shoehorn.example.com

Or via port-forward:
    kubectl port-forward -n monitoring svc/jaeger-query 16686:16686

What’s Traced
Traces follow requests across Shoehorn microservices:
    Browser -> API Gateway -> gRPC call -> Microservice -> Database
       │                                       │
       └── trace_id propagated ────────────────┘

Each span includes:
- Service name
- Operation (HTTP path or gRPC method)
- Duration
- Status code
- Custom attributes (tenant_id, user_id)
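
As a rough illustration of the fields listed above, a span can be pictured as a small record. The class name `SpanRecord`, its field names, the `GET /example` operation, and the sample attribute values are all hypothetical; the real span data is produced by OpenTelemetry and may be shaped differently.

```python
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Hypothetical view of one span; not Shoehorn's actual schema."""

    service_name: str
    operation: str       # HTTP path or gRPC method
    duration_ms: float
    status_code: int
    # Custom attributes such as tenant_id and user_id.
    attributes: dict = field(default_factory=dict)

    def describe(self) -> str:
        # One-line summary, similar to a row in a trace waterfall view.
        return (
            f"{self.service_name} {self.operation} "
            f"{self.duration_ms:.1f}ms status={self.status_code}"
        )


span = SpanRecord(
    service_name="shoehorn-api",
    operation="GET /example",  # illustrative operation name
    duration_ms=12.5,
    status_code=200,
    attributes={"tenant_id": "tenant-a", "user_id": "user-1"},
)
print(span.describe())
```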
Trace Propagation
Traces are propagated between services using the W3C Trace Context standard (traceparent header). All gRPC calls between Shoehorn services automatically propagate trace context.
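
To make the header format concrete, here is a minimal sketch of parsing a `traceparent` value and deriving the header for an outgoing call. It follows the version `00` layout from the W3C Trace Context specification (`version-traceid-parentid-flags`). The function names are hypothetical, and Shoehorn's services rely on OpenTelemetry for this rather than hand-written code.

```python
import re
import secrets

# Version "00": 32 hex chars of trace ID, 16 hex chars of parent (span) ID,
# and 2 hex chars of flags, all lowercase and separated by dashes.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: str):
    """Return (trace_id, parent_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid per the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = bool(int(flags, 16) & 0x01)  # lowest flag bit means "sampled"
    return trace_id, parent_id, sampled


def child_traceparent(header: str) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    parsed = parse_traceparent(header)
    if parsed is None:
        # Assumption for this sketch: start a new, sampled trace
        # when no valid context arrives.
        trace_id, sampled = secrets.token_hex(16), True
    else:
        trace_id, _, sampled = parsed
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
```

The trace ID stays constant across every hop, which is what lets the backend stitch spans from different services into one trace.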
Sample Rate
For production deployments with high traffic, reduce the sample rate:
    TRACING_SAMPLE_RATE=0.1  # 10% of requests

This reduces overhead while still providing representative trace data.
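
The source does not say how Shoehorn picks which requests to keep. One common approach is ratio sampling keyed on the trace ID, so every service reaches the same decision for a given trace. The sketch below illustrates that idea under this assumption; `should_sample` is a hypothetical name, not a Shoehorn or OpenTelemetry API.

```python
import random

_LIMIT_64 = 1 << 64


def should_sample(trace_id_hex: str, sample_rate: float) -> bool:
    """Decide deterministically from the trace ID whether to keep a trace."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # Treat the low 64 bits of the trace ID as a uniformly random number
    # and keep the trace when it falls below the rate-scaled bound.
    low_bits = int(trace_id_hex, 16) & (_LIMIT_64 - 1)
    return low_bits < round(sample_rate * _LIMIT_64)


# Rough check: about 10% of random trace IDs pass at a rate of 0.1.
rng = random.Random(42)
ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.1) for trace_id in ids)
print(f"kept {kept} of {len(ids)} traces")
```

Because the decision depends only on the trace ID, a trace is either recorded in full or not at all, rather than losing individual spans at random.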
Troubleshooting with Traces
- Open Jaeger UI
- Select the service (e.g., shoehorn-api)
- Search by trace ID, operation, or time range
- Click a trace to see the full request flow
- Identify slow spans or errors in the waterfall view
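
The last step can also be done on trace data exported as JSON. The sketch below assumes the shape Jaeger's UI uses for a single trace (a `spans` list with `operationName`, `duration` in microseconds, `processID`, and `tags`, plus a `processes` map holding `serviceName`). The `slowest_spans` helper and the sample trace are hypothetical, so check the field names against your own export before relying on them.

```python
def slowest_spans(trace: dict, top_n: int = 3) -> list:
    """Return (service, operation, duration_ms, has_error) for the slowest spans."""
    processes = trace.get("processes", {})
    rows = []
    for span in trace.get("spans", []):
        process = processes.get(span.get("processID"), {})
        tags = {tag["key"]: tag["value"] for tag in span.get("tags", [])}
        rows.append((
            process.get("serviceName", "unknown"),
            span["operationName"],
            span["duration"] / 1000.0,        # microseconds to milliseconds
            bool(tags.get("error", False)),   # assumes an "error" tag marks failures
        ))
    # Longest spans first, mirroring what you scan for in the waterfall view.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_n]


# Illustrative trace with made-up services and operations.
sample_trace = {
    "processes": {
        "p1": {"serviceName": "shoehorn-api"},
        "p2": {"serviceName": "example-db-service"},
    },
    "spans": [
        {"operationName": "GET /example", "duration": 120000, "processID": "p1", "tags": []},
        {"operationName": "query", "duration": 95000, "processID": "p2",
         "tags": [{"key": "error", "type": "bool", "value": True}]},
        {"operationName": "auth", "duration": 4000, "processID": "p1", "tags": []},
    ],
}

for service, operation, duration_ms, has_error in slowest_spans(sample_trace, top_n=2):
    print(service, operation, duration_ms, "ERROR" if has_error else "ok")
```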