
OpenTelemetry & Distributed Tracing

See the full journey of a request across microservices with OpenTelemetry and Jaeger distributed tracing.

Time: ~20 minutes
Difficulty: Intermediate

What you will learn:

  • The three pillars of observability: metrics (Prometheus), logs (EFK), traces (this lab)
  • What a distributed trace is and why it matters for microservices
  • How spans connect to form a trace across service boundaries
  • Using the Jaeger UI to find slow services and debug request flows
  • How OpenTelemetry standardizes telemetry collection

Tracing completes the observability triad: metrics (Lab 22), logs (Lab 35), and traces (this lab). See docs/deep-dive.md for why tracing matters and how it differs from metrics and logs.

Navigate to the demo directory:

cd demos/opentelemetry-tracing

Deploy the resources:

kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/

Wait for pods to be ready:

kubectl wait --for=condition=ready pod -l app=jaeger -n otel-demo --timeout=120s
kubectl wait --for=condition=ready pod -l app=hotrod -n otel-demo --timeout=60s

Open two port-forwards in separate terminals:

# Terminal 1: Jaeger UI
kubectl port-forward svc/jaeger 16686:16686 -n otel-demo
# Terminal 2: HotROD app
kubectl port-forward svc/hotrod 8080:8080 -n otel-demo

Open http://localhost:8080. This is the HotROD ride-sharing demo app.

Click any of the four customer buttons (Rachel, Trom, Japanese, Antonio). Each click simulates a ride request that flows through four microservices (frontend, customer, driver, and route), generating a distributed trace.

Now open http://localhost:16686. This is Jaeger.

  1. Select frontend from the Service dropdown
  2. Click “Find Traces”
  3. Click on any trace to see the full span tree

You should see:

  • The frontend receiving the HTTP request (the root span)
  • The frontend calling the customer service, which runs a simulated MySQL query
  • The frontend calling the driver service, which looks up available drivers in a simulated Redis store one at a time
  • The frontend calling the route service for several drivers in parallel to compute ETAs
  • Each span's duration, so you can immediately see which service is slow
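Total duration alone can mislead, because a parent span includes the time it spends waiting on its children. A more useful number is a span's self time: its duration minus the time covered by its children. The minimal Go sketch below computes that over a hand-made trace. The Span type, IDs, and millisecond timings are hypothetical illustrations, not real HotROD output. Children are merged as intervals, so two route calls running in parallel are not subtracted twice.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, simplified span. Start and End are milliseconds
// from the beginning of the trace.
type Span struct {
	ID, ParentID, Service string
	Start, End            int64
}

// selfTime returns p's duration minus the union of its children's intervals.
// Merging overlapping children avoids double-counting parallel calls.
func selfTime(spans []Span, p Span) int64 {
	var kids [][2]int64
	for _, s := range spans {
		if s.ParentID != p.ID {
			continue
		}
		start, end := s.Start, s.End
		if start < p.Start { // clip the child to the parent's window
			start = p.Start
		}
		if end > p.End {
			end = p.End
		}
		if end > start {
			kids = append(kids, [2]int64{start, end})
		}
	}
	sort.Slice(kids, func(i, j int) bool { return kids[i][0] < kids[j][0] })
	var covered, curStart, curEnd int64
	for i, k := range kids {
		if i == 0 || k[0] > curEnd {
			covered += curEnd - curStart // close the previous merged interval
			curStart, curEnd = k[0], k[1]
		} else if k[1] > curEnd {
			curEnd = k[1] // an overlapping child extends the current interval
		}
	}
	covered += curEnd - curStart
	return (p.End - p.Start) - covered
}

// slowest returns the span with the largest self time (spans must be non-empty).
func slowest(spans []Span) Span {
	best := spans[0]
	for _, s := range spans[1:] {
		if selfTime(spans, s) > selfTime(spans, best) {
			best = s
		}
	}
	return best
}

// sampleTrace is a hypothetical trace: two route calls overlap in time.
func sampleTrace() []Span {
	return []Span{
		{"a", "", "frontend", 0, 700},
		{"b", "a", "customer", 10, 310},
		{"c", "a", "driver", 320, 520},
		{"d", "a", "route", 530, 630},
		{"e", "a", "route", 535, 680},
	}
}

func main() {
	spans := sampleTrace()
	for _, s := range spans {
		fmt.Printf("%-9s span %s: total %4dms, self %4dms\n",
			s.Service, s.ID, s.End-s.Start, selfTime(spans, s))
	}
	fmt.Println("most self time:", slowest(spans).Service)
}
```

Here the frontend span lasts 700 ms but spends only 50 ms in its own code; the customer span, with no children, is where most of the time goes.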

The lab's manifests:

manifests/
  namespace.yaml   # otel-demo namespace
  jaeger.yaml      # Jaeger all-in-one: UI + collector + in-memory storage
  hotrod.yaml      # HotROD: 4 microservices in one binary, pre-instrumented

Trace flow:

User clicks button
  -> frontend (creates the root span)
       -> customer service (child span; simulated MySQL query)
       -> driver service (child span; simulated Redis lookups)
       -> route service (child spans, one per candidate driver, run in parallel)
  -> All spans exported to Jaeger via OTLP over HTTP (port 4318)
  -> Jaeger stores and visualizes the trace
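Jaeger never receives a tree. It receives individual spans, possibly out of order and from different services, and rebuilds the tree from each span's parent span ID. The minimal Go sketch below illustrates that idea; it is not Jaeger's code. It groups spans by parent ID, orders siblings by start time, and walks down from the root. The Span type, IDs, names, and start times are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Span is a hypothetical, stripped-down span with just enough fields to rebuild the tree.
type Span struct {
	ID, ParentID, Name string
	Start              int64 // ms from the beginning of the trace
}

// renderTree groups spans by parent ID and prints them as an indented tree,
// ordering siblings by start time. Input order does not matter.
func renderTree(spans []Span) string {
	children := map[string][]Span{} // parent span ID -> child spans
	for _, s := range spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	for _, kids := range children {
		sort.Slice(kids, func(i, j int) bool { return kids[i].Start < kids[j].Start })
	}
	var b strings.Builder
	var walk func(parentID string, depth int)
	walk = func(parentID string, depth int) {
		for _, s := range children[parentID] {
			fmt.Fprintf(&b, "%s%s\n", strings.Repeat("  ", depth), s.Name)
			walk(s.ID, depth+1)
		}
	}
	walk("", 0) // root spans have an empty parent ID
	return b.String()
}

// sampleSpans is deliberately shuffled, as if exported by different services.
func sampleSpans() []Span {
	return []Span{
		{ID: "s3", ParentID: "s1", Name: "driver", Start: 320},
		{ID: "s1", ParentID: "", Name: "frontend", Start: 0},
		{ID: "s5", ParentID: "s1", Name: "route", Start: 535},
		{ID: "s6", ParentID: "s2", Name: "mysql query", Start: 12},
		{ID: "s2", ParentID: "s1", Name: "customer", Start: 10},
		{ID: "s4", ParentID: "s1", Name: "route", Start: 530},
	}
}

func main() {
	fmt.Print(renderTree(sampleSpans()))
}
```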

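Each span carries the trace ID shared by the whole request, its own span ID, and the span ID of its parent (the root span has none). Jaeger uses these IDs to reconstruct the full request tree. Between services, the trace context propagates in HTTP headers using the W3C traceparent format.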
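A traceparent header has four dash-separated fields: a version, a 32-hex-digit trace ID, the 16-hex-digit span ID of the caller (the parent ID), and trace flags. The Go sketch below parses a version-00 header and builds the header a service would send downstream: the same trace ID and flags with a fresh span ID. The sample value is the example from the W3C Trace Context specification. The helper names are hypothetical, and a real service would rely on its SDK's propagator rather than hand-rolled parsing.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the fields of a version-00 W3C traceparent header.
type TraceParent struct {
	TraceID  string // 32 lowercase hex chars, shared by every span in the trace
	ParentID string // 16 lowercase hex chars, the caller's span ID
	Flags    string // 2 hex chars; "01" means the caller sampled this trace
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceParent validates and splits a version-00 header (other versions rejected).
func parseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("unsupported traceparent format")
	}
	tp := TraceParent{TraceID: parts[1], ParentID: parts[2], Flags: parts[3]}
	if !isLowerHex(tp.TraceID, 32) || !isLowerHex(tp.ParentID, 16) || !isLowerHex(tp.Flags, 2) ||
		tp.TraceID == strings.Repeat("0", 32) || tp.ParentID == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("invalid traceparent field") // all-zero IDs are invalid
	}
	return tp, nil
}

// child returns the header to send downstream: same trace ID and flags,
// new random span ID for the span this service just started.
func (tp TraceParent) child() string {
	id := make([]byte, 8)
	_, _ = rand.Read(id) // crypto/rand; error ignored for brevity
	return fmt.Sprintf("00-%s-%s-%s", tp.TraceID, hex.EncodeToString(id), tp.Flags)
}

func main() {
	incoming := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	tp, err := parseTraceParent(incoming)
	if err != nil {
		panic(err)
	}
	fmt.Println("trace ID:", tp.TraceID)
	fmt.Println("outgoing:", tp.child())
}
```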

The HotROD app is already instrumented with the OpenTelemetry SDK. In production, you would add the OpenTelemetry SDK for your language to each service and configure it to export traces to a collector.

Things to try:

  1. Click several customer buttons in quick succession, then search for traces in Jaeger. Filter by:

    • Service name (frontend, customer, driver, route)
    • Min duration (find slow requests)
    • Tags (enter key=value pairs in the Tags field, for example error=true, to filter by HTTP method, status code, or error state)
  2. In the Jaeger UI, click “System Architecture” (or “Dependencies” depending on version) to see a service dependency graph built automatically from trace data.

  3. Look at the span details: click any span in a trace view. You will see tags (http.method, http.status_code) and logs (events that happened during the span).

  4. Compare two traces side-by-side: select two traces from the search results and click “Compare Traces” to see how request times differ between calls.

  5. Find errors: some HotROD requests fail at random (the simulated Redis lookups in the driver service occasionally time out). Generate more traces and look for failed ones in Jaeger. Error spans are highlighted in red and carry an error=true tag.

  6. Check the trace timeline: a trace shows not just the total duration but also which operations ran in parallel and which ran sequentially. That ordering is very hard to reconstruct from logs alone.
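The service dependency graph from step 2 needs no extra configuration because it can be derived from parent/child links: whenever a span's parent was emitted by a different service, that is a caller -> callee edge. The minimal Go sketch below illustrates the principle over hypothetical traces. It is not how Jaeger actually computes its view, and the Span type and IDs are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical minimal span: its ID, its parent's ID, and its service.
type Span struct {
	ID, ParentID, Service string
}

// dependencies counts caller -> callee service edges across traces. Spans whose
// parent is in the same service (internal operations) add no edge.
func dependencies(traces [][]Span) map[string]int {
	edges := map[string]int{}
	for _, trace := range traces {
		service := map[string]string{} // span ID -> service, per trace
		for _, s := range trace {
			service[s.ID] = s.Service
		}
		for _, s := range trace {
			if parent, ok := service[s.ParentID]; ok && parent != s.Service {
				edges[parent+" -> "+s.Service]++
			}
		}
	}
	return edges
}

// sampleTraces returns two hypothetical traces; the second has an internal driver span.
func sampleTraces() [][]Span {
	return [][]Span{
		{{"a", "", "frontend"}, {"b", "a", "customer"}, {"c", "a", "driver"}, {"d", "a", "route"}, {"e", "a", "route"}},
		{{"f", "", "frontend"}, {"g", "f", "customer"}, {"h", "f", "driver"}, {"i", "h", "driver"}},
	}
}

func main() {
	edges := dependencies(sampleTraces())
	keys := make([]string, 0, len(edges))
	for k := range edges {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s (%d calls)\n", k, edges[k])
	}
}
```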

Clean up when you are done:

kubectl delete namespace otel-demo

See docs/deep-dive.md for a detailed explanation of the three pillars of observability, how distributed tracing works under the hood, trace context propagation (W3C traceparent headers), the OpenTelemetry ecosystem (SDKs, Collector, protocols), Jaeger architecture, sampling strategies, and when to use tracing in production.

Congratulations! You have completed all 51 labs in the K8s Learn by Doing series. Return to the main README to explore other learning tracks or revisit concepts.