# Istio Service Mesh: Deep Dive
This document explains why service meshes exist, how Istio implements a mesh using control and data planes, and when to use Istio versus alternatives like Linkerd or no mesh at all. It connects the demo manifests to the underlying Envoy proxy configuration and traffic management patterns.
## Why Service Meshes Exist

Before service meshes, each application had to implement observability, security, and traffic management on its own.
You needed distributed tracing. Every service added instrumentation libraries. You wanted mutual TLS. Every service managed certificates. You needed retries and circuit breaking. Every service added a resilience library. You wanted traffic splitting for canary deployments. You modified your load balancer config or deployed a separate A/B testing framework.
This approach has problems. Every team implements these features differently. Libraries drift across languages (Go, Python, Java, Rust). Updating a policy requires redeploying every service. Security is inconsistent because not every team gets it right.
A service mesh moves these concerns out of application code and into infrastructure. Instead of each service implementing mTLS, the mesh does it transparently. Instead of application-level retries, the proxy handles it. Instead of custom code for traffic splitting, you configure it in a VirtualService.
The application just sends HTTP requests to a service name. The mesh handles the rest.
## How Istio Works: Control Plane and Data Plane

Istio has two parts: the control plane (istiod) and the data plane (Envoy sidecar proxies).
### Control Plane: istiod

The control plane is a single binary called istiod. It replaced three separate components (Pilot, Citadel, Galley) in Istio 1.5. It runs in the istio-system namespace and handles three jobs.
Configuration distribution. You create Istio CRDs like VirtualService, DestinationRule, and Gateway. istiod watches these resources and translates them into Envoy proxy configuration. It pushes this configuration to every Envoy sidecar via the xDS protocol (Cluster Discovery Service, Route Discovery Service, Listener Discovery Service, etc.).
Certificate authority. istiod is a certificate authority for the mesh. It generates X.509 certificates for every workload and rotates them automatically. These certificates are used for mutual TLS between services.
Service discovery. istiod watches Kubernetes Service and Endpoint resources. It tells Envoy proxies which pods back each service. When a pod restarts and gets a new IP, istiod updates the proxy configuration without the application knowing anything changed.
### Data Plane: Envoy Sidecars

Every pod in the mesh gets an Envoy proxy injected as a sidecar container. This happens automatically when you label the namespace:

```yaml
# From manifests/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: istio-demo
  labels:
    istio-injection: enabled
```

The `istio-injection: enabled` label tells the Istio mutating admission webhook to inject a sidecar into every pod created in this namespace.
The Envoy sidecar intercepts all inbound and outbound traffic. It does this by configuring iptables rules in the pod’s network namespace. Traffic to port 80 (or any application port) is redirected to the Envoy listener. Envoy applies routing, retries, load balancing, and mTLS, then forwards the traffic to the real application container.
From the application’s perspective, nothing changes. It listens on port 80 and makes HTTP calls to service names. The sidecar makes those calls resilient, observable, and secure.
## How a Request Flows Through the Mesh

```text
Frontend pod                           Backend pod
┌────────────────────┐               ┌────────────────────┐
│ nginx container    │               │ httpbin container  │
│ localhost:80       │               │ localhost:80       │
└──────┬─────────────┘               └─────────┬──────────┘
       │                                       ▲
       │ outbound to backend:80                │
       ▼                                       │
┌────────────────────┐               ┌────────┴───────────┐
│ Envoy sidecar      │──────────────>│ Envoy sidecar      │
│ - mTLS encrypt     │   mTLS conn   │ - mTLS decrypt     │
│ - load balance     │               │ - route to :80     │
│ - apply routing    │               │ - record metrics   │
└────────────────────┘               └────────────────────┘
```

1. The frontend container calls `http://backend:80`.
2. Iptables redirects this to the outbound Envoy listener.
3. Envoy looks up routing rules (VirtualService) and applies them.
4. Envoy establishes an mTLS connection to the backend's Envoy sidecar.
5. The backend's Envoy sidecar decrypts the traffic and forwards it to the backend container on localhost:80.
6. The response flows back through the same path.
The application sees a plain HTTP call. The mesh sees encrypted, routed, load-balanced, traced traffic.
## Key Istio Resources, Field by Field

### DestinationRule: Traffic Policies and Subsets

```yaml
# From manifests/destination-rule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend
  namespace: istio-demo
spec:
  host: backend
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```

#### spec.host

The Kubernetes service name this rule applies to. Here it is `backend`, which resolves to the ClusterIP service at `backend.istio-demo.svc.cluster.local`.
#### spec.trafficPolicy.tls.mode

Controls how Envoy handles TLS for connections to this service.

- `ISTIO_MUTUAL`: Use mutual TLS with Istio-generated certificates. Both client and server authenticate each other. This is the recommended mode when both sides are in the mesh.
- `SIMPLE`: Use TLS, but only the server presents a certificate (like HTTPS). The client does not authenticate.
- `DISABLE`: Do not use TLS. Plain HTTP. Only use this for testing or when calling services outside the mesh.
- `MUTUAL`: Use mutual TLS, but bring your own certificates. You specify the CA and client certs.
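With `MUTUAL`, the DestinationRule points Envoy at file-based certificates instead of istiod-issued ones. A minimal sketch (the certificate paths are hypothetical; mount your own certs into the sidecar):

```yaml
trafficPolicy:
  tls:
    mode: MUTUAL
    # Illustrative paths; these files must be mounted into the proxy.
    clientCertificate: /etc/certs/client-cert.pem
    privateKey: /etc/certs/client-key.pem
    caCertificates: /etc/certs/ca-cert.pem
```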
In this demo, ISTIO_MUTUAL means Envoy encrypts all traffic to the backend service using certificates issued by istiod.
#### spec.subsets

Subsets partition the service's endpoints by labels. This demo has two versions of the backend:

```yaml
metadata:
  labels:
    app: backend
    version: v1
```

```yaml
# manifests/deployment-backend-v2.yaml
metadata:
  labels:
    app: backend
    version: v2
```

Both deployments match the backend Service selector (`app: backend`). Without subsets, traffic is distributed evenly across all pods. Subsets let you target a specific version in routing rules.
You can also apply subset-specific traffic policies. For example:
```yaml
subsets:
- name: v1
  labels:
    version: v1
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10
```

This limits connections to v1 pods only.
### VirtualService: Routing Rules

```yaml
# From manifests/virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend
  namespace: istio-demo
spec:
  hosts:
  - backend
  http:
  - match:
    - headers:
        version:
          exact: v2
    route:
    - destination:
        host: backend
        subset: v2
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 80
    - destination:
        host: backend
        subset: v2
      weight: 20
```

#### spec.hosts

The service name this VirtualService applies to. When a client calls `http://backend`, Envoy checks if there is a VirtualService for backend and applies the routing rules.
#### spec.http

A list of HTTP routing rules. Envoy evaluates them in order. The first matching rule wins.
First rule: If the request has a header version: v2, route 100% of traffic to the v2 subset. This is a header-based routing rule, useful for testing a new version with specific requests.
Second rule: If no header matches, route 80% of traffic to v1 and 20% to v2. This is a weighted routing rule for canary deployments. You can gradually shift traffic by changing the weights.
#### Fault Injection

VirtualServices also support fault injection for chaos engineering:
```yaml
http:
- fault:
    delay:
      percentage:
        value: 50.0
      fixedDelay: 5s
  route:
  - destination:
      host: backend
      subset: v1
```

This injects a 5-second delay on 50% of requests to v1. It tests how the frontend handles slow backends.
You can also inject errors:
```yaml
fault:
  abort:
    percentage:
      value: 10.0
    httpStatus: 500
```

This returns HTTP 500 on 10% of requests, simulating backend failures.
#### Retries and Timeouts

```yaml
http:
- route:
  - destination:
      host: backend
  retries:
    attempts: 3
    perTryTimeout: 2s
  timeout: 10s
```

Envoy retries failed requests up to 3 times, with a 2-second timeout per attempt. The entire operation must complete within 10 seconds.
### Gateway: Ingress Traffic

```yaml
# From manifests/gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: istio-gateway
  namespace: istio-demo
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
```

#### spec.selector

This Gateway binds to Envoy instances with the label `istio: ingressgateway`. Istio installs a gateway deployment called istio-ingressgateway in the istio-system namespace. It is a standalone Envoy proxy (not a sidecar) that handles ingress traffic.
#### spec.servers

A list of ports and protocols the Gateway listens on. This Gateway listens on port 80 for HTTP traffic. It accepts requests for any hostname (`*`).
For HTTPS, you add TLS configuration:
```yaml
servers:
- port:
    number: 443
    name: https
    protocol: HTTPS
  tls:
    mode: SIMPLE
    credentialName: my-tls-cert
  hosts:
  - "example.com"
```

The `credentialName` points to a Kubernetes Secret containing the TLS certificate.
A Gateway alone does nothing. You pair it with a VirtualService that binds to the Gateway:
```yaml
# From manifests/gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: frontend-gateway
  namespace: istio-demo
spec:
  hosts:
  - "*"
  gateways:
  - istio-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend
        port:
          number: 80
```

This VirtualService routes all traffic arriving at the istio-gateway Gateway to the frontend service.
## Traffic Management Patterns

### Canary Deployments

Deploy a new version alongside the old one. Route a small percentage of traffic to the new version. Monitor error rates and latency. If the new version is healthy, shift more traffic. If it fails, roll back by changing the weights.
```yaml
route:
- destination:
    host: backend
    subset: v1
  weight: 90
- destination:
    host: backend
    subset: v2
  weight: 10
```

Start with 10% to v2. Increase to 50%, then 100% as confidence grows.
### Blue-Green Deployments

Deploy the new version (green) but send no traffic to it. Run smoke tests against green. When green is verified, switch 100% of traffic from blue to green in one atomic change.

```yaml
route:
- destination:
    host: backend
    subset: v2  # green
  weight: 100
```

If green fails, switch back to blue by changing the subset to v1.
### Dark Traffic (Mirroring)

Send a copy of production traffic to a new version without affecting the response. The client only sees the response from the live version. The mirrored version processes the request, and you observe logs and metrics.

```yaml
http:
- route:
  - destination:
      host: backend
      subset: v1
  mirror:
    host: backend
    subset: v2
  mirrorPercentage:
    value: 100
```

All traffic goes to v1 for the real response. A copy also goes to v2 for testing. This is safer than canary because v2 errors do not affect users.
### Circuit Breaking

Limit the number of concurrent connections and requests to prevent cascading failures.

```yaml
trafficPolicy:
  connectionPool:
    tcp:
      maxConnections: 10
    http:
      http1MaxPendingRequests: 1
      maxRequestsPerConnection: 1
  outlierDetection:
    consecutiveErrors: 5
    interval: 30s
    baseEjectionTime: 30s
    maxEjectionPercent: 50
```

`connectionPool` limits how many connections Envoy opens to the backend. Requests exceeding this limit get an immediate 503.
outlierDetection removes unhealthy pods from the load balancing pool. If a pod returns 5 consecutive errors, it is ejected for 30 seconds. Up to 50% of pods can be ejected at once (to prevent ejecting the entire backend).
## Mutual TLS (mTLS)

### How mTLS Works in Istio

When two services communicate, their Envoy sidecars establish a TLS connection. Both sides present X.509 certificates issued by istiod. Envoy validates the peer certificate against the Istio CA. If valid, the connection is encrypted and authenticated.
The application code sends plain HTTP. Envoy upgrades it to mTLS transparently.
### STRICT vs PERMISSIVE Modes

Istio supports two mTLS modes at the mesh level.
PERMISSIVE (default): Accept both plain HTTP and mTLS. This is for migrating services into the mesh. Services without sidecars send plain HTTP. Services with sidecars send mTLS. Both work.
STRICT: Only accept mTLS. Reject plain HTTP connections. This enforces that all traffic in the mesh is encrypted.
You configure this with a PeerAuthentication resource:
```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

This applies mesh-wide. You can also set it per namespace or per workload.
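A workload-scoped variant adds a `selector`; a sketch (the `app: backend` label matches this demo's deployments, the resource name is arbitrary):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: backend-strict
  namespace: istio-demo
spec:
  # Applies only to pods matching the selector, not the whole namespace.
  selector:
    matchLabels:
      app: backend
  mtls:
    mode: STRICT
```

The narrowest matching policy wins: a workload-level policy overrides a namespace-level one, which overrides the mesh-wide default.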
The DestinationRule in this demo sets mode: ISTIO_MUTUAL, which means the client side uses mTLS when connecting to the backend. Combined with a STRICT PeerAuthentication, the entire path is encrypted.
### Certificate Rotation

Istio certificates have a default lifetime of 24 hours. istiod rotates them automatically. The sidecar requests a new certificate before the old one expires. No downtime, no manual intervention.
## Observability

Istio generates three types of telemetry automatically.

### Metrics

Envoy exports Prometheus metrics for every request. Standard metrics include:
- `istio_requests_total`: Total requests, labeled by source, destination, and response code.
- `istio_request_duration_milliseconds`: Request latency distribution.
- `istio_request_bytes`: Request size.
- `istio_response_bytes`: Response size.
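These labels make standard SLO queries straightforward. For example, a Prometheus query for the backend's 5xx error rate over the last five minutes might look like this (label names follow Istio's standard metrics; adjust `destination_service_name` to your service):

```promql
sum(rate(istio_requests_total{destination_service_name="backend", response_code=~"5.."}[5m]))
/
sum(rate(istio_requests_total{destination_service_name="backend"}[5m]))
```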
You can visualize these in Grafana. Istio includes pre-built Grafana dashboards for service graphs, workload metrics, and performance.
### Distributed Tracing

Envoy generates trace spans for every request. It propagates trace headers (B3, W3C Trace Context) across service boundaries. The spans are exported to a tracing backend like Jaeger or Zipkin.
A single user request through multiple services produces a trace with multiple spans. You can see the entire call graph, latency at each hop, and where errors occurred.
The application must forward trace headers (like x-request-id, x-b3-traceid). Istio does the rest.
### Access Logs

Envoy can log every request. By default, access logs are disabled to reduce noise. You enable them with a Telemetry resource:

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: access-logging
  namespace: istio-demo
spec:
  accessLogging:
  - providers:
    - name: envoy
```

Logs go to the sidecar's stdout. You can also export them to a centralized logging system.
## Trade-offs: Istio vs Linkerd vs No Mesh

### Istio

Pros:
- Full-featured: traffic management, security, observability.
- Large ecosystem and community.
- Works with VMs and Kubernetes.
- Highly configurable.
Cons:
- Complex. Steep learning curve.
- Higher resource overhead (Envoy is heavier than simpler proxies).
- More moving parts to debug.
### Linkerd

Pros:
- Simpler and faster. Lower resource usage.
- Automatic mTLS with no configuration.
- Better performance (linkerd-proxy is optimized for service mesh use cases).
Cons:
- Less flexible. Fewer traffic management features.
- Smaller ecosystem.
- Kubernetes-only (no VM support).
### No Mesh

Pros:
- No additional infrastructure.
- No performance overhead.
- Simpler operational model.
Cons:
- Every service implements observability, retries, mTLS, etc.
- Inconsistent across services and languages.
- Harder to enforce policies.
## When to Use Istio

- You have many microservices and need consistent policies across all of them.
- You need advanced traffic management (canary, A/B testing, dark traffic).
- You want zero-trust security with mTLS between all services.
- You need observability without changing application code.
- You are willing to invest in learning and operating the mesh.
## When Not to Use Istio

- You have a monolith or a small number of services.
- Your services already have good observability and resilience libraries.
- You cannot afford the resource overhead (CPU, memory, network latency).
- You do not have the team expertise to debug Envoy and Istio internals.
## Production Considerations

### Resource Overhead

Each Envoy sidecar uses CPU and memory. Typical overhead:
- CPU: 0.1 to 0.5 vCPU per sidecar under load.
- Memory: 50 to 150 MB per sidecar.
For a 100-pod cluster, this adds 10-50 vCPUs and 5-15 GB of memory. Budget for this when sizing your cluster.
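You can tune per-pod proxy resources with sidecar annotations on the pod template; a sketch (the values shown are illustrative, not recommendations):

```yaml
# Pod template annotations on a Deployment.
metadata:
  annotations:
    sidecar.istio.io/proxyCPU: "100m"
    sidecar.istio.io/proxyMemory: "128Mi"
    sidecar.istio.io/proxyCPULimit: "500m"
    sidecar.istio.io/proxyMemoryLimit: "256Mi"
```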
### Latency Impact

The sidecar adds latency to every request. Typical p99 latency increase is 1-5 ms for in-cluster calls. For most services, this is acceptable. For ultra-low-latency services (high-frequency trading, real-time gaming), it may be too much.
### Control Plane High Availability

istiod is a single point of failure. If it is down, existing sidecars continue to work with their current configuration, but you cannot push new config changes or generate new certificates.

Run at least 2 replicas of istiod:

```yaml
values:
  pilot:
    replicaCount: 2
```

Spread them across availability zones.
### Multi-Cluster

Istio supports multi-cluster meshes. You can have services in multiple Kubernetes clusters communicate as if they were in the same cluster. Istio handles service discovery and routing across clusters.
Two deployment models:
Primary-Remote: One cluster runs istiod (primary). Other clusters connect to it (remote). Good for a hub-and-spoke topology.
Multi-Primary: Each cluster runs its own istiod. They share a common root CA and replicate service discovery. Good for high availability and geo-distributed clusters.
### Upgrades

Istio upgrades require upgrading both the control plane (istiod) and the data plane (sidecar proxies). The control plane supports sidecars one minor version behind it (N-1), so you can upgrade the control plane first, then roll the sidecars.
Use canary upgrades. Run two versions of istiod side by side, migrate workloads gradually, then remove the old version.
## Common Pitfalls

### Sidecar Injection Not Working

Symptom: Pods have 1 container instead of 2. No Envoy sidecar.
Cause: The namespace is not labeled with istio-injection: enabled. Or the pod has the annotation sidecar.istio.io/inject: "false".
Fix: Label the namespace:

```shell
kubectl label namespace istio-demo istio-injection=enabled
```

Then restart the pods:

```shell
kubectl rollout restart deployment -n istio-demo
```

### Traffic Not Using mTLS
Symptom: Traffic is plain HTTP even though you configured mTLS.
Cause: The PeerAuthentication is set to PERMISSIVE, and the client is sending plain HTTP. Or there is no DestinationRule setting mode: ISTIO_MUTUAL.
Fix: Set a STRICT PeerAuthentication and ensure the DestinationRule uses ISTIO_MUTUAL:

```shell
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-demo
spec:
  mtls:
    mode: STRICT
EOF
```

### VirtualService Not Taking Effect
Symptom: Traffic does not follow the routing rules.
Cause: The VirtualService does not match the request. Check the hosts field. It must match the service name the client is calling.
Debug: Use istioctl proxy-config to see the actual Envoy routes:

```shell
istioctl proxy-config routes <pod-name> -n istio-demo
```

### Circuit Breaker Not Triggering
Symptom: You set connection limits but requests are not failing fast.
Cause: Connection pooling is applied at the client side (the Envoy sidecar making the request). If the client pod has low traffic, it may never hit the limit.
Test: Generate load from multiple concurrent clients.
### High Memory Usage

Symptom: Envoy sidecars use more memory than expected.
Cause: Istio pushes the full mesh configuration to every sidecar. In large clusters (1000+ services), this config is huge.
Fix: Use Sidecar resources to limit the set of services each proxy knows about:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-demo
spec:
  egress:
  - hosts:
    - "istio-demo/*"
    - "istio-system/*"
```

This tells the sidecars in istio-demo to only care about services in istio-demo and istio-system, not the entire mesh.
## Debugging with istioctl

istioctl is the primary debugging tool.

Check proxy status:

```shell
istioctl proxy-status
```

Shows which sidecars are connected to istiod and whether their config is in sync.
View Envoy config:
```shell
istioctl proxy-config cluster <pod-name> -n istio-demo
istioctl proxy-config route <pod-name> -n istio-demo
istioctl proxy-config listener <pod-name> -n istio-demo
```

This shows the low-level Envoy configuration. Useful when VirtualServices do not behave as expected.
Analyze configuration issues:
```shell
istioctl analyze -n istio-demo
```

Detects common misconfigurations (missing DestinationRules, conflicting VirtualServices, etc.).
View Envoy logs:
```shell
kubectl logs <pod-name> -n istio-demo -c istio-proxy
```

Envoy logs connection errors, TLS failures, and rejected requests.
## Connection to the Demo

This demo deploys a frontend and two versions of a backend. It shows three Istio features:
- Automatic sidecar injection: The `istio-injection: enabled` label on the namespace causes every pod to get an Envoy sidecar.
- Weighted traffic splitting: The VirtualService routes 80% of traffic to v1 and 20% to v2.
- Mutual TLS: The DestinationRule enforces `ISTIO_MUTUAL` for backend traffic.
The Gateway exposes the frontend to external traffic. The VirtualService binds to the Gateway and routes requests to the frontend service. From there, the frontend’s sidecar applies the backend VirtualService rules.
You can test canary behavior by sending requests and observing the pod logs. You can verify mTLS with `istioctl experimental describe pod <pod-name>` (the older `istioctl authn tls-check` command was removed in Istio 1.5). You can inject faults by patching the VirtualService.
This is a minimal example. Real production meshes add AuthorizationPolicies for RBAC, RequestAuthentication for JWT validation, and Telemetry resources for custom metrics.
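As a taste of that, here is a hypothetical AuthorizationPolicy that only lets the frontend's service account call the backend (the `frontend` service account name is an assumption, not something defined in the demo manifests):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-allow-frontend
  namespace: istio-demo
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        # SPIFFE identity derived from the (assumed) frontend service account.
        principals: ["cluster.local/ns/istio-demo/sa/frontend"]
```

Because the identity comes from the mTLS certificate, this policy only works reliably when PeerAuthentication is STRICT.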
## Further Reading

- Istio Documentation
- Traffic Management
- Security
- Observability
- Envoy Proxy Documentation
- Istio Performance and Scalability
- Linkerd vs Istio Comparison