Distributed Tracing

Shoehorn supports distributed tracing via OpenTelemetry, with Jaeger as the default trace backend.

Enabling Tracing

# Helm values
global:
  tracing:
    enabled: true
    otlpEndpoint: "jaeger:4317"
    sampleRate: 1.0  # 1.0 = 100% of requests traced

Or via environment variables:

TRACING_ENABLED=true
OTLP_ENDPOINT=jaeger:4317
TRACING_SAMPLE_RATE=1.0
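
As a rough illustration of how a service might consume these variables, the sketch below reads them into a small config object. The variable names come from the settings above; the `TracingConfig` class, the `load_tracing_config` helper, and the fallback defaults are hypothetical and are not taken from Shoehorn's code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TracingConfig:
    enabled: bool
    otlp_endpoint: str
    sample_rate: float


def load_tracing_config(env=None) -> TracingConfig:
    """Build a TracingConfig from environment-style key/value pairs."""
    env = os.environ if env is None else env
    # The defaults below are assumptions for this sketch, not documented Shoehorn defaults.
    enabled = env.get("TRACING_ENABLED", "false").strip().lower() == "true"
    endpoint = env.get("OTLP_ENDPOINT", "jaeger:4317")
    rate = float(env.get("TRACING_SAMPLE_RATE", "1.0"))
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"TRACING_SAMPLE_RATE must be between 0.0 and 1.0, got {rate}")
    return TracingConfig(enabled, endpoint, rate)


if __name__ == "__main__":
    print(load_tracing_config({
        "TRACING_ENABLED": "true",
        "OTLP_ENDPOINT": "jaeger:4317",
        "TRACING_SAMPLE_RATE": "1.0",
    }))
```

Rejecting out-of-range sample rates at startup is a design choice of the sketch: a typo such as `10` instead of `0.1` fails loudly rather than silently tracing everything.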

Accessing Jaeger

Once deployed, the Jaeger UI is available at:

http://jaeger.shoehorn.example.com

Or forward the Jaeger query port locally and open http://localhost:16686 in a browser:

kubectl port-forward -n monitoring svc/jaeger-query 16686:16686

What’s Traced

Traces follow requests across Shoehorn microservices:

Browser -> API Gateway -> gRPC call -> Microservice -> Database
   │                                       │
   └── trace_id propagated ────────────────┘

Each span includes:

Service name
Operation (HTTP path or gRPC method)
Duration
Status code
Custom attributes (tenant_id, user_id)
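
To make the span fields concrete, here is a minimal sketch of a single span as a plain record. The field names mirror the list above; the `SpanRecord` class and the sample values (operation path, tenant and user IDs) are made up for illustration and do not describe Shoehorn's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    service_name: str
    operation: str          # HTTP path or gRPC method
    duration_ms: float
    status_code: int
    attributes: dict = field(default_factory=dict)  # custom attributes


# Hypothetical example values.
span = SpanRecord(
    service_name="shoehorn-api",
    operation="GET /api/v1/items",
    duration_ms=42.5,
    status_code=200,
    attributes={"tenant_id": "tenant-123", "user_id": "user-456"},
)
print(span.service_name, span.operation, span.attributes["tenant_id"])
```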

Trace Propagation

Trace context is propagated between services using the W3C Trace Context standard (the traceparent header). All gRPC calls between Shoehorn services propagate trace context automatically.
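
For readers who have not seen a traceparent value, the sketch below parses one and builds the value for an outgoing call. The field layout (version, trace ID, parent span ID, flags) follows the W3C Trace Context format; the helper names `parse_traceparent` and `child_traceparent` are hypothetical, and the parser is deliberately strict (exactly four fields), which is a simplification of the spec's rules for future versions. In practice the OpenTelemetry instrumentation does this work, typically carrying the value as gRPC call metadata.

```python
import re
import secrets

# traceparent: version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: str):
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff is forbidden, and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = bool(int(flags, 16) & 0x01)  # lowest flag bit marks a sampled trace
    return trace_id, parent_id, sampled


def child_traceparent(trace_id: str, sampled: bool) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    trace_id, parent_id, sampled = parse_traceparent(incoming)
    print(child_traceparent(trace_id, sampled))
```

The trace ID stays constant across every hop, which is what lets Jaeger stitch spans from different services into a single trace.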

Sample Rate

For production deployments with high traffic, reduce the sample rate:

TRACING_SAMPLE_RATE=0.1  # 10% of requests

A lower sample rate reduces tracing overhead while still providing representative trace data.
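
To show what a fractional sample rate means in practice, the sketch below makes the keep-or-drop decision from the trace ID itself, so every service that sees the same trace reaches the same answer. This is one common approach, similar in spirit to OpenTelemetry's trace-ID ratio sampling; whether Shoehorn samples exactly this way is an assumption, and the `should_sample` helper is hypothetical.

```python
import random

_MASK_64 = (1 << 64) - 1


def should_sample(trace_id: int, sample_rate: float) -> bool:
    """Decide deterministically from the trace ID whether to keep a trace."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # Keep the trace when its low 64 bits fall below sample_rate * 2**64.
    bound = round(sample_rate * (1 << 64))
    return (trace_id & _MASK_64) < bound


if __name__ == "__main__":
    rng = random.Random(0)
    trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample(t, 0.1) for t in trace_ids)
    print(f"kept {kept} of {len(trace_ids)} traces")
```

With random trace IDs, a rate of 0.1 keeps roughly one trace in ten, and a kept trace is kept in full rather than losing individual spans.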

Troubleshooting with Traces

Open the Jaeger UI
Select the service (e.g., shoehorn-api)
Search by trace ID, operation, or time range
Click a trace to see the full request flow
Identify slow spans or errors in the waterfall view
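
The last step can also be applied to span data outside the UI. The sketch below assumes spans are available as simple dictionaries; the keys `operation`, `duration_ms`, and `status_code`, the 500 ms threshold, and the `summarize_trace` helper are all hypothetical and do not reflect Jaeger's export format.

```python
def summarize_trace(spans, slow_ms=500.0):
    """Return (slow, errors): spans at or above slow_ms, slowest first, and error spans."""
    slow = sorted(
        (s for s in spans if s["duration_ms"] >= slow_ms),
        key=lambda s: s["duration_ms"],
        reverse=True,
    )
    # Treating HTTP 5xx as errors is an assumption made for this sketch.
    errors = [s for s in spans if s["status_code"] >= 500]
    return slow, errors


if __name__ == "__main__":
    # Made-up spans for illustration.
    example = [
        {"operation": "GET /items", "duration_ms": 20.0, "status_code": 200},
        {"operation": "ListItems", "duration_ms": 900.0, "status_code": 200},
        {"operation": "SELECT items", "duration_ms": 650.0, "status_code": 503},
    ]
    slow, errors = summarize_trace(example)
    print([s["operation"] for s in slow], [s["operation"] for s in errors])
```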