
Istio Service Mesh: Deep Dive

This document explains why service meshes exist, how Istio implements a mesh using control and data planes, and when to use Istio versus alternatives like Linkerd or no mesh at all. It connects the demo manifests to the underlying Envoy proxy configuration and traffic management patterns.


Why Service Meshes Exist

Before service meshes, each application had to implement observability, security, and traffic management on its own.

You needed distributed tracing. Every service added instrumentation libraries. You wanted mutual TLS. Every service managed certificates. You needed retries and circuit breaking. Every service added a resilience library. You wanted traffic splitting for canary deployments. You modified your load balancer config or deployed a separate A/B testing framework.

This approach has problems. Every team implements these features differently. Libraries drift across languages (Go, Python, Java, Rust). Updating a policy requires redeploying every service. Security is inconsistent because not every team gets it right.

A service mesh moves these concerns out of application code and into infrastructure. Instead of each service implementing mTLS, the mesh does it transparently. Instead of application-level retries, the proxy handles it. Instead of custom code for traffic splitting, you configure it in a VirtualService.

The application just sends HTTP requests to a service name. The mesh handles the rest.


How Istio Works: Control Plane and Data Plane


Istio has two parts: the control plane (istiod) and the data plane (Envoy sidecar proxies).

The control plane is a single binary called istiod. It replaced three separate components (Pilot, Citadel, Galley) in Istio 1.5. It runs in the istio-system namespace and handles three jobs.

Configuration distribution. You create Istio CRDs like VirtualService, DestinationRule, and Gateway. istiod watches these resources and translates them into Envoy proxy configuration. It pushes this configuration to every Envoy sidecar via the xDS protocol (Cluster Discovery Service, Route Discovery Service, Listener Discovery Service, etc.).

Certificate authority. istiod is a certificate authority for the mesh. It generates x.509 certificates for every workload and rotates them automatically. These certificates are used for mutual TLS between services.

Service discovery. istiod watches Kubernetes Service and Endpoint resources. It tells Envoy proxies which pods back each service. When a pod restarts and gets a new IP, istiod updates the proxy configuration without the application knowing anything changed.

Every pod in the mesh gets an Envoy proxy injected as a sidecar container. This happens automatically when you label the namespace:

# From manifests/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: istio-demo
  labels:
    istio-injection: enabled

The istio-injection: enabled label tells the Istio mutating admission webhook to inject a sidecar into every pod created in this namespace.
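Injection can also be controlled per pod. As a sketch (the `sidecar.istio.io/inject` annotation is standard Istio, but the workload shown is hypothetical), a pod template can opt out even in a labeled namespace:

```yaml
# Opt a single workload out of injection in an
# istio-injection: enabled namespace (hypothetical workload).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-batch          # hypothetical workload name
  namespace: istio-demo
spec:
  selector:
    matchLabels:
      app: legacy-batch
  template:
    metadata:
      labels:
        app: legacy-batch
      annotations:
        sidecar.istio.io/inject: "false"   # skip the Envoy sidecar
    spec:
      containers:
      - name: batch
        image: busybox
```

This is useful for workloads that cannot tolerate the proxy, such as short-lived jobs.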

The Envoy sidecar intercepts all inbound and outbound traffic. It does this by configuring iptables rules in the pod’s network namespace. Traffic to port 80 (or any application port) is redirected to the Envoy listener. Envoy applies routing, retries, load balancing, and mTLS, then forwards the traffic to the real application container.

From the application’s perspective, nothing changes. It listens on port 80 and makes HTTP calls to service names. The sidecar makes those calls resilient, observable, and secure.

 Frontend pod                          Backend pod
┌────────────────────┐             ┌────────────────────┐
│  nginx container   │             │  httpbin container │
│  localhost:80      │             │  localhost:80      │
└──────┬─────────────┘             └─────────┬──────────┘
       │                                     ▲
       │ outbound to backend:80              │
       ▼                                     │
┌────────────────────┐             ┌────────┴───────────┐
│  Envoy sidecar     │────────────>│  Envoy sidecar     │
│  - mTLS encrypt    │  mTLS conn  │  - mTLS decrypt    │
│  - load balance    │             │  - route to :80    │
│  - apply routing   │             │  - record metrics  │
└────────────────────┘             └────────────────────┘
  1. The frontend container calls http://backend:80.
  2. Iptables redirects this to the outbound Envoy listener.
  3. Envoy looks up routing rules (VirtualService) and applies them.
  4. Envoy establishes an mTLS connection to the backend’s Envoy sidecar.
  5. The backend’s Envoy sidecar decrypts the traffic and forwards it to the backend container on localhost:80.
  6. The response flows back through the same path.

The application sees a plain HTTP call. The mesh sees encrypted, routed, load-balanced, traced traffic.


DestinationRule: Traffic Policies and Subsets

# From manifests/destination-rule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend
  namespace: istio-demo
spec:
  host: backend
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

host is the Kubernetes service name this rule applies to. Here it is backend, which resolves to the ClusterIP service at backend.istio-demo.svc.cluster.local.

trafficPolicy.tls controls how Envoy handles TLS for connections to this service.

  • ISTIO_MUTUAL: Use mutual TLS with Istio-generated certificates. Both client and server authenticate each other. This is the recommended mode when both sides are in the mesh.
  • SIMPLE: Use TLS, but only the server presents a certificate (like HTTPS). The client does not authenticate.
  • DISABLE: Do not use TLS. Plain HTTP. Only use this for testing or when calling services outside the mesh.
  • MUTUAL: Use mutual TLS, but bring your own certificates. You specify the CA and client certs.

In this demo, ISTIO_MUTUAL means Envoy encrypts all traffic to the backend service using certificates issued by istiod.
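For comparison, the MUTUAL mode requires you to supply the certificates yourself. A hedged sketch (the file paths are placeholders and must actually be mounted into the sidecar container):

```yaml
# DestinationRule traffic policy using MUTUAL mode with
# operator-provided certificates. Paths are illustrative.
trafficPolicy:
  tls:
    mode: MUTUAL
    clientCertificate: /etc/certs/client-cert.pem
    privateKey: /etc/certs/client-key.pem
    caCertificates: /etc/certs/ca-cert.pem
```

With ISTIO_MUTUAL, none of this is needed: istiod provisions and rotates the certificates for you.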

Subsets partition the service’s endpoints by labels. This demo has two versions of the backend:

# manifests/deployment-backend-v1.yaml
metadata:
  labels:
    app: backend
    version: v1

# manifests/deployment-backend-v2.yaml
metadata:
  labels:
    app: backend
    version: v2

Both deployments match the backend Service selector (app: backend). Without subsets, traffic is distributed evenly across all pods. Subsets let you target a specific version in routing rules.

You can also apply subset-specific traffic policies. For example:

subsets:
- name: v1
  labels:
    version: v1
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10

This limits connections to v1 pods only.

VirtualService: Routing Rules

# From manifests/virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend
  namespace: istio-demo
spec:
  hosts:
  - backend
  http:
  - match:
    - headers:
        version:
          exact: v2
    route:
    - destination:
        host: backend
        subset: v2
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 80
    - destination:
        host: backend
        subset: v2
      weight: 20

hosts lists the service names this VirtualService applies to. When a client calls http://backend, Envoy checks whether there is a VirtualService for backend and applies its routing rules.

http is a list of HTTP routing rules. Envoy evaluates them in order. The first matching rule wins.

First rule: If the request has a header version: v2, route 100% of traffic to the v2 subset. This is a header-based routing rule, useful for testing a new version with specific requests.

Second rule: If no header matches, route 80% of traffic to v1 and 20% to v2. This is a weighted routing rule for canary deployments. You can gradually shift traffic by changing the weights.

VirtualServices also support fault injection for chaos engineering:

http:
- fault:
    delay:
      percentage:
        value: 50.0
      fixedDelay: 5s
  route:
  - destination:
      host: backend
      subset: v1

This injects a 5-second delay on 50% of requests to v1. It tests how the frontend handles slow backends.

You can also inject errors:

fault:
  abort:
    percentage:
      value: 10.0
    httpStatus: 500

This returns HTTP 500 on 10% of requests, simulating backend failures.

http:
- route:
  - destination:
      host: backend
  retries:
    attempts: 3
    perTryTimeout: 2s
  timeout: 10s

Envoy retries failed requests up to 3 times, with a 2-second timeout per attempt. The entire operation must complete within 10 seconds.
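The retry policy can also be scoped to specific failure modes with the retryOn field. A sketch using standard Envoy retry conditions (this fragment is not part of the demo manifests):

```yaml
retries:
  attempts: 3
  perTryTimeout: 2s
  # Only retry connection failures, stream resets, and 5xx responses;
  # client errors (4xx) are not retried.
  retryOn: connect-failure,reset,5xx
```

Restricting retryOn matters for non-idempotent requests, where blindly retrying can cause duplicate side effects.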

Gateway: Ingress Traffic

# From manifests/gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: istio-gateway
  namespace: istio-demo
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"

This Gateway binds to Envoy instances with the label istio: ingressgateway. Istio installs a gateway deployment called istio-ingressgateway in the istio-system namespace. It is a standalone Envoy proxy (not a sidecar) that handles ingress traffic.

servers is a list of ports and protocols the Gateway listens on. This Gateway listens on port 80 for HTTP traffic and accepts requests for any hostname (*).

For HTTPS, you add TLS configuration:

servers:
- port:
    number: 443
    name: https
    protocol: HTTPS
  tls:
    mode: SIMPLE
    credentialName: my-tls-cert
  hosts:
  - "example.com"

The credentialName points to a Kubernetes Secret containing the TLS certificate.
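The secret must live in the namespace of the gateway workload (istio-system for the default ingress gateway), not the application namespace. A sketch of the secret (the certificate data is a placeholder; in practice you would usually run kubectl create secret tls):

```yaml
# TLS secret referenced by credentialName: my-tls-cert.
# Typically created with:
#   kubectl create secret tls my-tls-cert --cert=tls.crt --key=tls.key -n istio-system
apiVersion: v1
kind: Secret
metadata:
  name: my-tls-cert
  namespace: istio-system   # must match the gateway workload's namespace
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>   # placeholder
  tls.key: <base64-encoded private key>   # placeholder
```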

A Gateway alone does nothing. You pair it with a VirtualService that binds to the Gateway:

# From manifests/gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: frontend-gateway
  namespace: istio-demo
spec:
  hosts:
  - "*"
  gateways:
  - istio-gateway
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend
        port:
          number: 80

This VirtualService routes all traffic arriving at the istio-gateway Gateway to the frontend service.


Canary Deployment

Deploy a new version alongside the old one. Route a small percentage of traffic to the new version. Monitor error rates and latency. If the new version is healthy, shift more traffic. If it fails, roll back by changing the weights.

route:
- destination:
    host: backend
    subset: v1
  weight: 90
- destination:
    host: backend
    subset: v2
  weight: 10

Start with 10% to v2. Increase to 50%, then 100% as confidence grows.

Blue-Green Deployment

Deploy the new version (green) but send no traffic to it. Run smoke tests against green. When green is verified, switch 100% of traffic from blue to green in one atomic change.

route:
- destination:
    host: backend
    subset: v2   # green
  weight: 100

If green fails, switch back to blue by changing the subset to v1.

Traffic Mirroring (Dark Traffic)

Send a copy of production traffic to a new version without affecting the response. The client only sees the response from the live version. The mirrored version processes the request, and you observe logs and metrics.

http:
- route:
  - destination:
      host: backend
      subset: v1
  mirror:
    host: backend
    subset: v2
  mirrorPercentage:
    value: 100

All traffic goes to v1 for the real response. A copy also goes to v2 for testing. This is safer than canary because v2 errors do not affect users.

Circuit Breaking

Limit the number of concurrent connections and requests to prevent cascading failures.

trafficPolicy:
  connectionPool:
    tcp:
      maxConnections: 10
    http:
      http1MaxPendingRequests: 1
      maxRequestsPerConnection: 1
  outlierDetection:
    consecutive5xxErrors: 5
    interval: 30s
    baseEjectionTime: 30s
    maxEjectionPercent: 50

connectionPool limits how many connections Envoy opens to the backend. Requests exceeding this limit get an immediate 503.

outlierDetection removes unhealthy pods from the load balancing pool. If a pod returns 5 consecutive errors, it is ejected for 30 seconds. Up to 50% of pods can be ejected at once (to prevent ejecting the entire backend).


Mutual TLS

When two services communicate, their Envoy sidecars establish a TLS connection. Both sides present x.509 certificates issued by istiod. Envoy validates the peer certificate against the Istio CA. If valid, the connection is encrypted and authenticated.

The application code sends plain HTTP. Envoy upgrades it to mTLS transparently.

Istio supports two mTLS modes at the mesh level.

PERMISSIVE (default): Accept both plain HTTP and mTLS. This is for migrating services into the mesh. Services without sidecars send plain HTTP. Services with sidecars send mTLS. Both work.

STRICT: Only accept mTLS. Reject plain HTTP connections. This enforces that all traffic in the mesh is encrypted.

You configure this with a PeerAuthentication resource:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT

This applies mesh-wide. You can also set it per namespace or per workload.
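A workload-scoped sketch: the selector labels match this demo's backend deployments, but the port-level exception is an illustrative assumption, not part of the demo manifests:

```yaml
# STRICT mTLS for the backend workloads only, with a plaintext
# exception on one port (port number is hypothetical).
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: backend-strict
  namespace: istio-demo
spec:
  selector:
    matchLabels:
      app: backend
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:                    # hypothetical health-check port
      mode: PERMISSIVE
```

The most specific policy wins: a workload-scoped policy overrides a namespace-wide one, which overrides the mesh-wide default.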

The DestinationRule in this demo sets mode: ISTIO_MUTUAL, which means the client side uses mTLS when connecting to the backend. Combined with a STRICT PeerAuthentication, the entire path is encrypted.

Istio certificates have a default lifetime of 24 hours. istiod rotates them automatically. The sidecar requests a new certificate before the old one expires. No downtime, no manual intervention.


Observability

Istio generates three types of telemetry automatically.

Metrics

Envoy exports Prometheus metrics for every request. Standard metrics include:

  • istio_requests_total: Total requests, labeled by source, destination, response code.
  • istio_request_duration_milliseconds: Request latency distribution.
  • istio_request_bytes: Request size.
  • istio_response_bytes: Response size.

You can visualize these in Grafana. Istio includes pre-built Grafana dashboards for service graphs, workload metrics, and performance.
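As an example of what these metrics support, here is a hedged PromQL sketch of a per-service error rate; the label names follow Istio's standard metrics, and the service name is taken from this demo:

```promql
# Fraction of requests to the backend service that returned 5xx
# over the last 5 minutes.
sum(rate(istio_requests_total{destination_service_name="backend", response_code=~"5.."}[5m]))
/
sum(rate(istio_requests_total{destination_service_name="backend"}[5m]))
```

A query like this is the typical health signal watched during a canary rollout.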

Distributed Tracing

Envoy generates trace spans for every request. It propagates trace headers (B3, W3C Trace Context) across service boundaries. The spans are exported to a tracing backend like Jaeger or Zipkin.

A single user request through multiple services produces a trace with multiple spans. You can see the entire call graph, latency at each hop, and where errors occurred.

The application must forward trace headers (like x-request-id, x-b3-traceid). Istio does the rest.

Access Logs

Envoy can log every request. By default, access logs are disabled to reduce noise. You enable them with a Telemetry resource:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: access-logging
  namespace: istio-demo
spec:
  accessLogging:
  - providers:
    - name: envoy

Logs go to the sidecar’s stdout. You can also export them to a centralized logging system.


Istio vs. Alternatives

Istio

Pros:

  • Full-featured: traffic management, security, observability.
  • Large ecosystem and community.
  • Works with VMs and Kubernetes.
  • Highly configurable.

Cons:

  • Complex. Steep learning curve.
  • Higher resource overhead (Envoy is heavier than simpler proxies).
  • More moving parts to debug.

Linkerd

Pros:

  • Simpler and faster. Lower resource usage.
  • Automatic mTLS with no configuration.
  • Better performance (linkerd-proxy is optimized for service mesh use cases).

Cons:

  • Less flexible. Fewer traffic management features.
  • Smaller ecosystem.
  • Kubernetes-only (no VM support).

No Mesh

Pros:

  • No additional infrastructure.
  • No performance overhead.
  • Simpler operational model.

Cons:

  • Every service implements observability, retries, mTLS, etc.
  • Inconsistent across services and languages.
  • Harder to enforce policies.

Choose Istio when:

  • You have many microservices and need consistent policies across all of them.
  • You need advanced traffic management (canary, A/B testing, dark traffic).
  • You want zero-trust security with mTLS between all services.
  • You need observability without changing application code.
  • You are willing to invest in learning and operating the mesh.

Skip the mesh when:

  • You have a monolith or a small number of services.
  • Your services already have good observability and resilience libraries.
  • You cannot afford the resource overhead (CPU, memory, network latency).
  • You do not have the team expertise to debug Envoy and Istio internals.

Performance and Resource Overhead

Each Envoy sidecar uses CPU and memory. Typical overhead:

  • CPU: 0.1 to 0.5 vCPU per sidecar under load.
  • Memory: 50 to 150 MB per sidecar.

For a 100-pod cluster, this adds 10-50 vCPUs and 5-15 GB of memory. Budget for this when sizing your cluster.

The sidecar adds latency to every request. Typical p99 latency increase is 1-5ms for in-cluster calls. For most services, this is acceptable. For ultra-low-latency services (high-frequency trading, real-time gaming), it may be too much.

Control Plane Availability

istiod is a single point of failure for configuration changes. If it is down, existing sidecars continue to work with their current configuration, but you cannot push new config changes or generate new certificates.

Run at least 2 replicas of istiod:

values:
  pilot:
    replicaCount: 2

Spread them across availability zones.

Multi-Cluster Meshes

Istio supports multi-cluster meshes. You can have services in multiple Kubernetes clusters communicate as if they were in the same cluster. Istio handles service discovery and routing across clusters.

Two deployment models:

Primary-Remote: One cluster runs istiod (primary). Other clusters connect to it (remote). Good for a hub-and-spoke topology.

Multi-Primary: Each cluster runs its own istiod. They share a common root CA and replicate service discovery. Good for high availability and geo-distributed clusters.

Upgrades

Istio upgrades require upgrading both the control plane (istiod) and the data plane (sidecar proxies). The control plane supports sidecars up to one minor version behind it (N-1), so you can upgrade the control plane first, then roll sidecars.

Use canary upgrades. Run two versions of istiod side by side, migrate workloads gradually, then remove the old version.
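Canary upgrades use revision labels instead of the plain istio-injection label. A sketch, assuming a revision named 1-20 (the revision name is hypothetical):

```yaml
# Namespace pinned to a specific istiod revision. Pods created here get
# sidecars from the "1-20" control plane (hypothetical revision name),
# installed with: istioctl install --set revision=1-20
apiVersion: v1
kind: Namespace
metadata:
  name: istio-demo
  labels:
    istio.io/rev: "1-20"   # replaces istio-injection: enabled
```

Restarting the namespace's workloads migrates them to the new control plane; switching the label back rolls them to the old one.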


Troubleshooting

Sidecar not injected

Symptom: Pods have 1 container instead of 2. No Envoy sidecar.

Cause: The namespace is not labeled with istio-injection: enabled. Or the pod has the annotation sidecar.istio.io/inject: "false".

Fix: Label the namespace:

kubectl label namespace istio-demo istio-injection=enabled

Then restart the pods:

kubectl rollout restart deployment -n istio-demo

mTLS not enforced

Symptom: Traffic is plain HTTP even though you configured mTLS.

Cause: The PeerAuthentication is set to PERMISSIVE, and the client is sending plain HTTP. Or there is no DestinationRule setting mode: ISTIO_MUTUAL.

Fix: Set a STRICT PeerAuthentication and ensure the DestinationRule uses ISTIO_MUTUAL:

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-demo
spec:
  mtls:
    mode: STRICT
EOF

VirtualService rules not applied

Symptom: Traffic does not follow the routing rules.

Cause: The VirtualService does not match the request. Check the hosts field. It must match the service name the client is calling.

Debug: Use istioctl proxy-config to see the actual Envoy routes:

istioctl proxy-config routes <pod-name> -n istio-demo

Circuit breaker not triggering

Symptom: You set connection limits but requests are not failing fast.

Cause: Connection pooling is applied at the client side (the Envoy sidecar making the request). If the client pod has low traffic, it may never hit the limit.

Test: Generate load from multiple concurrent clients.

High sidecar memory usage

Symptom: Envoy sidecars use more memory than expected.

Cause: Istio pushes the full mesh configuration to every sidecar. In large clusters (1000+ services), this config is huge.

Fix: Use Sidecar resources to limit the set of services each proxy knows about:

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-demo
spec:
  egress:
  - hosts:
    - "istio-demo/*"
    - "istio-system/*"

This tells the sidecars in istio-demo to only care about services in istio-demo and istio-system, not the entire mesh.

Debugging with istioctl

istioctl is the primary debugging tool.

Check proxy status:

istioctl proxy-status

Shows which sidecars are connected to istiod and whether their config is in sync.

View Envoy config:

istioctl proxy-config cluster <pod-name> -n istio-demo
istioctl proxy-config route <pod-name> -n istio-demo
istioctl proxy-config listener <pod-name> -n istio-demo

This shows the low-level Envoy configuration. Useful when VirtualServices do not behave as expected.

Analyze configuration issues:

istioctl analyze -n istio-demo

Detects common misconfigurations (missing DestinationRules, conflicting VirtualServices, etc.).

View Envoy logs:

kubectl logs <pod-name> -n istio-demo -c istio-proxy

Envoy logs connection errors, TLS failures, and rejected requests.


How the Demo Fits Together

This demo deploys a frontend and two versions of a backend. It shows three Istio features:

  1. Automatic sidecar injection: The istio-injection: enabled label on the namespace causes every pod to get an Envoy sidecar.
  2. Weighted traffic splitting: The VirtualService routes 80% of traffic to v1 and 20% to v2.
  3. Mutual TLS: The DestinationRule enforces ISTIO_MUTUAL for backend traffic.

The Gateway exposes the frontend to external traffic. The VirtualService binds to the Gateway and routes requests to the frontend service. From there, the frontend’s sidecar applies the backend VirtualService rules.

You can test canary behavior by sending requests and observing the pod logs. You can verify mTLS with istioctl x describe pod <pod-name> (the older istioctl authn tls-check command has been removed from recent releases). You can inject faults by patching the VirtualService.

This is a minimal example. Real production meshes add AuthorizationPolicies for RBAC, RequestAuthentication for JWT validation, and Telemetry resources for custom metrics.
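As a taste of what those look like, here is a hedged AuthorizationPolicy sketch for this demo's services; the rule itself (and the frontend service account name) is an assumption, not part of the demo manifests:

```yaml
# Allow only the frontend's service account to call the backend;
# all other in-mesh callers are denied by the ALLOW semantics.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-allow-frontend
  namespace: istio-demo
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/istio-demo/sa/frontend"   # assumed service account
```

Policies like this build on mTLS: the source principal comes from the client certificate istiod issued, so identity-based authorization only works once mutual TLS is in place.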