Deployment Strategies: Deep Dive
This document explains how the Deployment controller manages ReplicaSets, how rolling updates work at the math level, how blue/green and canary patterns are implemented in Kubernetes, and how tools like Argo Rollouts extend these patterns with traffic splitting and automated analysis.
Deployment Controller Internals
A Deployment does not manage pods directly. It manages ReplicaSets. Each ReplicaSet manages a set of pods. The Deployment controller is a control loop running in the kube-controller-manager.
The Control Loop
On each reconciliation pass (triggered by watch events and periodic resyncs), the Deployment controller:
- Reads the desired state from the Deployment spec.
- Lists all ReplicaSets owned by this Deployment.
- Compares current state to desired state.
- Creates, scales, or deletes ReplicaSets to converge.
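The loop above can be sketched as a toy reconcile function in Python. This is a heavily simplified model, not the real controller: `ReplicaSet` and `reconcile` are hypothetical names, and the sketch scales old ReplicaSets straight to zero, ignoring the rollout pacing described later.

```python
from dataclasses import dataclass

@dataclass
class ReplicaSet:
    template_hash: str  # identifies one version of the pod template
    replicas: int

def reconcile(desired_hash: str, desired_replicas: int,
              replica_sets: list[ReplicaSet]) -> list[ReplicaSet]:
    """One control-loop pass: converge owned ReplicaSets toward the spec."""
    # Find the ReplicaSet matching the current pod template, or create one.
    new_rs = next((rs for rs in replica_sets
                   if rs.template_hash == desired_hash), None)
    if new_rs is None:
        new_rs = ReplicaSet(desired_hash, 0)
        replica_sets = replica_sets + [new_rs]
    # Scale the new ReplicaSet up and every old one down. The real controller
    # paces this with maxSurge/maxUnavailable; here it happens in one step.
    for rs in replica_sets:
        rs.replicas = desired_replicas if rs is new_rs else 0
    return replica_sets
```

Running `reconcile` twice with the same spec makes no further changes, which is the point of a level-triggered control loop: it converges on state, not on events.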
When you update a Deployment’s pod template (change the image, add an env var, modify a volume), the controller creates a new ReplicaSet with the updated template. The old ReplicaSet is gradually scaled down while the new one scales up.
ReplicaSet Management
Each unique pod template version creates a new ReplicaSet. The Deployment tracks which ReplicaSet is “current” (new) and which are “old.”
The demo’s rolling update Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rolling-app
  namespace: deploy-strategy-demo
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: rolling-app
  template:
    metadata:
      labels:
        app: rolling-app
        version: v1
    spec:
      containers:
        - name: app
          image: nginx:1.24-alpine
```

When the image changes to nginx:1.25.3-alpine, the controller creates a new ReplicaSet. The old ReplicaSet (nginx:1.24) still exists. During the rollout, both ReplicaSets have pods running.
Revision History
Every ReplicaSet gets a revision number. The Deployment stores the revision in an annotation:
```yaml
deployment.kubernetes.io/revision: "3"
```

By default, Kubernetes keeps the last 10 ReplicaSets (controlled by revisionHistoryLimit). Old ReplicaSets are scaled to 0 pods but kept for rollback purposes.
```shell
kubectl rollout history deploy/rolling-app -n deploy-strategy-demo
```

This shows all revisions. You can roll back to any revision:
```shell
kubectl rollout undo deploy/rolling-app --to-revision=2 -n deploy-strategy-demo
```

Setting revisionHistoryLimit: 0 deletes old ReplicaSets immediately. This saves etcd space but removes rollback capability.
Rolling Update Math
The maxSurge and maxUnavailable parameters control the pace of a rolling update. Understanding the math is essential for tuning.
maxSurge
Maximum number of pods that can exist above the desired replica count during an update. Can be an absolute number or percentage.
maxUnavailable
Maximum number of pods that can be unavailable during an update. Can be an absolute number or percentage.
The Calculation
With 4 replicas, maxSurge: 1, maxUnavailable: 1:
```
Max total pods     = replicas + maxSurge       = 4 + 1 = 5
Min available pods = replicas - maxUnavailable = 4 - 1 = 3
```

The rollout proceeds:
| Step | Old RS | New RS | Total | Available | Action |
|---|---|---|---|---|---|
| 0 | 4 | 0 | 4 | 4 | Start |
| 1 | 4 | 1 | 5 | 4 | Scale up new RS (maxSurge allows 5) |
| 2 | 3 | 1 | 4 | 3* | Scale down old RS (maxUnavailable allows 3) |
| 3 | 3 | 2 | 5 | 4 | New pod ready, scale up again |
| 4 | 2 | 2 | 4 | 3* | Scale down old RS |
| 5 | 2 | 3 | 5 | 4 | New pod ready, scale up |
| 6 | 1 | 3 | 4 | 3* | Scale down old RS |
| 7 | 1 | 4 | 5 | 4 | New pod ready, scale up |
| 8 | 0 | 4 | 4 | 4 | Complete |
*During scale-down, availability temporarily drops to 3.
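The step table above can be reproduced with a small Python simulation. `rollout_steps` is a hypothetical helper, not controller code; it assumes each newly created pod becomes ready before the next controller action, and a valid configuration (maxSurge + maxUnavailable > 0, otherwise no progress is possible).

```python
def rollout_steps(replicas: int, max_surge: int,
                  max_unavailable: int) -> list[tuple[int, int]]:
    """Simulate a rolling update as a list of (old, new) pod counts.

    Assumes every new pod is ready before the next action, and that
    max_surge + max_unavailable > 0 (else the loop could not progress).
    """
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0:
        if old + new < replicas + max_surge and new < replicas:
            new += 1   # surge: total stays within replicas + maxSurge
        elif (old - 1) + new >= replicas - max_unavailable:
            old -= 1   # safe to remove an old pod: availability floor holds
        steps.append((old, new))
    while new < replicas:  # finish scaling up if old hit zero first
        new += 1
        steps.append((old, new))
    return steps
```

With the demo's parameters (4 replicas, surge 1, unavailable 1) the simulation walks through exactly the nine states in the table, and the total pod count oscillates between 4 and 5.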
Percentage Math
With 10 replicas, maxSurge: 25%, maxUnavailable: 25%:
```
maxSurge       = ceil(10 * 0.25)  = 3  (rounded up)
maxUnavailable = floor(10 * 0.25) = 2  (rounded down)

Max total pods     = 10 + 3 = 13
Min available pods = 10 - 2 = 8
```

Percentages round in the “safer” direction: maxSurge rounds up (more pods allowed), maxUnavailable rounds down (fewer pods can be missing).
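A minimal Python sketch of the rounding rules. Assumptions: the function name is illustrative, percentage values arrive as integer-percent strings like "25%", and the rejection of the all-zero case (which the real API server performs at validation time) is folded in here for demonstration.

```python
import math

def resolve_surge_unavailable(replicas: int, max_surge, max_unavailable):
    """Resolve maxSurge/maxUnavailable (int or '25%' string) into
    (max total pods, min available pods). Surge rounds up; unavailable
    rounds down -- each in the direction that keeps more pods serving."""
    def resolve(value, round_up: bool) -> int:
        if isinstance(value, str) and value.endswith("%"):
            fraction = int(value[:-1]) / 100
            rounder = math.ceil if round_up else math.floor
            return rounder(replicas * fraction)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas + surge, replicas - unavailable
```

Both worked examples from this section fall out directly: 10 replicas at 25%/25% gives (13, 8), and 4 replicas at 1/1 gives (5, 3).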
Special Cases
maxSurge: 0, maxUnavailable: 1: Scale down first, then scale up. Old pods are removed before new pods are created. Fewer total pods at any time, but brief capacity reduction.
maxSurge: 100%, maxUnavailable: 0: Create all new pods first, then remove all old pods once ready. This is effectively a blue/green deployment. Double the resource cost during transition.
maxSurge: 0, maxUnavailable: 0: Invalid. No progress can be made. The API server rejects this configuration.
Recreate Strategy
The Recreate strategy kills all old pods before creating new ones:
```yaml
spec:
  strategy:
    type: Recreate
```

This causes downtime. All old pods are terminated. Then all new pods are created. There is a gap where no pods are running.
Use Recreate when:
- The application cannot run two versions simultaneously (database schema conflicts)
- You have a single-instance resource that cannot be shared (exclusive file lock)
- Downtime is acceptable
Blue/Green Deployment
Blue/green runs two complete environments simultaneously. A Service selector switches traffic between them.
How the Demo Implements It
Two Deployments, one Service:
```yaml
# Blue environment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bluegreen-app
      version: blue
  template:
    metadata:
      labels:
        app: bluegreen-app
        version: blue
    spec:
      containers:
        - name: app
          image: nginx:1.24-alpine
---
# Green environment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bluegreen-app
      version: green
  template:
    metadata:
      labels:
        app: bluegreen-app
        version: green
    spec:
      containers:
        - name: app
          image: nginx:1.25.3-alpine
---
# Service points to blue
apiVersion: v1
kind: Service
metadata:
  name: bluegreen-app
spec:
  selector:
    app: bluegreen-app
    version: blue   # <-- This selects blue pods
  ports:
    - port: 80
```

The switch happens by patching the Service selector:
```shell
# Switch to green
kubectl patch svc bluegreen-app -n deploy-strategy-demo \
  -p '{"spec":{"selector":{"version":"green"}}}'

# Rollback to blue
kubectl patch svc bluegreen-app -n deploy-strategy-demo \
  -p '{"spec":{"selector":{"version":"blue"}}}'
```

Blue/Green Trade-offs
Advantages:
- Instant rollback (change selector back)
- Full testing of new version before switching
- Zero downtime
Disadvantages:
- Double resource consumption (both environments run simultaneously)
- Database schema changes require careful coordination
- No gradual traffic shifting
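The selector mechanics behind the switch can be modeled in a few lines of Python. `select_pods` is a toy stand-in for the endpoints controller's label matching, not real API machinery; a pod is selected when its labels contain every key/value pair in the Service selector.

```python
def select_pods(selector: dict, pods: list[dict]) -> list[str]:
    """Return names of pods whose labels satisfy every selector key."""
    return [p["name"] for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]

# Three blue pods and three green pods, all sharing app: bluegreen-app.
pods = (
    [{"name": f"blue-{i}",
      "labels": {"app": "bluegreen-app", "version": "blue"}} for i in range(3)]
    + [{"name": f"green-{i}",
       "labels": {"app": "bluegreen-app", "version": "green"}} for i in range(3)]
)

# Before the patch: the Service selector targets blue.
before = select_pods({"app": "bluegreen-app", "version": "blue"}, pods)
# After the patch: only the selector changed; the pods did not.
after = select_pods({"app": "bluegreen-app", "version": "green"}, pods)
```

Nothing about the pods changes during the switch, which is why rollback is equally cheap: it is the same one-key selector edit in reverse.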
The Selector Propagation Gap
When you change a Service selector, the endpoints are recalculated and kube-proxy updates the forwarding rules (iptables or IPVS) on each node. This is not instantaneous. There is a brief window (typically a few seconds) where some nodes still route to the old version while others route to the new one. DNS is not involved: the Service name and ClusterIP never change.
For truly instant switching, use an ingress controller with traffic splitting capabilities instead of Service selector changes.
Canary Deployment
Canary sends a small percentage of traffic to the new version. The percentage is controlled by the ratio of pods.
How the Demo Implements It
Two Deployments share a label that the Service selects on:
```yaml
# Stable: 4 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 4
  selector:
    matchLabels:
      app: canary-app
      track: stable
  template:
    metadata:
      labels:
        app: canary-app   # <-- Shared label
        track: stable
---
# Canary: 1 replica
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: canary-app
      track: canary
  template:
    metadata:
      labels:
        app: canary-app   # <-- Same shared label
        track: canary
---
# Service selects on shared label only
apiVersion: v1
kind: Service
metadata:
  name: canary-app
spec:
  selector:
    app: canary-app   # <-- Matches BOTH deployments
  ports:
    - port: 80
```

The Service selector app: canary-app matches pods from both Deployments. With 4 stable pods and 1 canary pod, roughly 20% of requests go to the canary.
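The replica-ratio arithmetic is simple enough to check in Python. `canary_fraction` is a hypothetical helper; it assumes the Service load-balances evenly across all ready pods behind the shared label.

```python
def canary_fraction(stable_replicas: int, canary_replicas: int) -> float:
    """Expected share of traffic reaching the canary when a Service
    balances evenly across all ready pods behind the shared label."""
    return canary_replicas / (stable_replicas + canary_replicas)

# The demo ratio: 4 stable + 1 canary -> 20% of traffic to the canary.
demo = canary_fraction(4, 1)

# With 5 total pods, only multiples of 20% are reachable -- the
# granularity limitation discussed below.
achievable = [round(canary_fraction(5 - c, c) * 100) for c in range(6)]
```

This is also why fine-grained percentages (1%, 5%) require either many replicas or network-level traffic splitting rather than pod ratios.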
Canary Promotion
To promote the canary to full rollout:
```shell
kubectl scale deploy app-canary --replicas=4 -n deploy-strategy-demo
kubectl scale deploy app-stable --replicas=0 -n deploy-strategy-demo
```

Or more gradually:
```shell
# Phase 1: 60/40
kubectl scale deploy app-stable --replicas=3 -n deploy-strategy-demo
kubectl scale deploy app-canary --replicas=2 -n deploy-strategy-demo

# Phase 2: 20/80
kubectl scale deploy app-stable --replicas=1 -n deploy-strategy-demo
kubectl scale deploy app-canary --replicas=4 -n deploy-strategy-demo

# Phase 3: Full promotion
kubectl scale deploy app-stable --replicas=0 -n deploy-strategy-demo
```

Canary Limitations with Plain Kubernetes
The replica-ratio approach has significant limitations:
- Granularity: With 5 total pods, you can do 0%, 20%, 40%, 60%, 80%, 100%. You cannot do 5% or 1%.
- No session affinity: A user might hit the canary on one request and stable on the next.
- Manual process: Scaling and monitoring are manual.
- No automated rollback: If the canary is bad, you must manually scale it down.
Argo Rollouts for Advanced Canary
Argo Rollouts is a Kubernetes controller that replaces the Deployment resource with a Rollout CRD. It provides fine-grained traffic splitting and automated analysis.
Traffic Splitting
Instead of replica ratios, Argo Rollouts integrates with ingress controllers and service meshes to split traffic at the network level:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app
spec:
  replicas: 5
  strategy:
    canary:
      canaryService: web-app-canary
      stableService: web-app-stable
      trafficRouting:
        nginx:
          stableIngress: web-app-ingress
      steps:
        - setWeight: 5   # 5% to canary
        - pause: {duration: 5m}
        - setWeight: 20
        - pause: {duration: 10m}
        - setWeight: 50
        - pause: {duration: 15m}
        - setWeight: 100
```

This sends exactly 5% of traffic to the canary, regardless of how many canary pods exist. The traffic split happens at the ingress/service mesh level.
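The per-request effect of setWeight can be approximated with a seeded weighted coin flip in Python. This is a statistical sketch of what weight-based splitting means, not how nginx or a mesh actually implements it; `route` is a hypothetical name.

```python
import random

def route(weight_percent: float, rng: random.Random) -> str:
    """Weighted per-request choice: roughly weight_percent% of requests
    go to the canary service, the rest to stable."""
    return "canary" if rng.random() * 100 < weight_percent else "stable"

rng = random.Random(42)  # seeded so the simulation is reproducible
canary_hits = sum(route(5, rng) == "canary" for _ in range(100_000))
# canary_hits / 100_000 lands close to 0.05
```

The key contrast with the replica-ratio approach: the weight is independent of pod counts, so 1% or 5% is as easy as 50%.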
Automated Analysis
Argo Rollouts can query Prometheus during a canary and automatically roll back if metrics look bad:
```yaml
steps:
  - setWeight: 20
  - analysis:
      templates:
        - templateName: success-rate
      args:
        - name: service-name
          value: web-app-canary
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}", status=~"2.*"}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

If the success rate drops below 95%, the rollout automatically aborts and reverts to the stable version.
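The arithmetic behind the successCondition can be made concrete in Python. `evaluate_success_rate` is a hypothetical stand-in for the Prometheus query (2xx requests over all requests) combined with the result[0] >= 0.95 check; it is not Argo Rollouts code.

```python
def evaluate_success_rate(status_counts: dict[str, int],
                          threshold: float = 0.95) -> bool:
    """Pass when 2xx responses make up at least `threshold` of all
    responses, mirroring the ratio the AnalysisTemplate queries."""
    total = sum(status_counts.values())
    if total == 0:
        return False  # no traffic observed: treat as failed here
    ok = sum(n for status, n in status_counts.items()
             if status.startswith("2"))
    return ok / total >= threshold
```

At 96 successes out of 100 requests the analysis passes; at 94 it fails and, in a real Rollout, the abort-and-revert path fires.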
Traffic Mirroring
Traffic mirroring (also called shadowing) sends a copy of production traffic to the new version without affecting users. The response from the mirror is discarded.
This is not built into Kubernetes Deployments. You need a service mesh (Istio, Linkerd) or ingress controller (Nginx, Envoy) that supports mirroring.
```yaml
# Istio VirtualService with mirroring
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
    - web-app
  http:
    - route:
        - destination:
            host: web-app-stable
      mirror:
        host: web-app-canary
      mirrorPercentage:
        value: 100.0
```

All requests go to stable. A copy goes to canary. The canary processes the request, but the response is discarded. This lets you test the canary with real production traffic without risk.
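The mirroring contract — only stable's response reaches the user, and the shadow copy may fail freely — can be sketched in Python. `make_mirror_handler` is an illustrative toy, not mesh code (real meshes also mirror asynchronously, which this synchronous sketch omits).

```python
def make_mirror_handler(stable, canary):
    """Return a handler that serves from `stable` and fires a shadow
    copy of each request at `canary`, discarding the canary's result."""
    def handle(request):
        try:
            canary(request)  # shadow call; response is thrown away
        except Exception:
            pass             # mirror failures must never reach the user
        return stable(request)
    return handle
```

Even a canary that crashes on every request is invisible to callers, which is why mirroring is listed as the lowest-risk way to exercise new code with production traffic.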
A/B Testing
A/B testing routes traffic based on user attributes (headers, cookies, geography) rather than random percentages.
With an Istio VirtualService:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
    - web-app
  http:
    - match:
        - headers:
            x-user-group:
              exact: beta
      route:
        - destination:
            host: web-app-canary
    - route:
        - destination:
            host: web-app-stable
```

Users with the x-user-group: beta header see the canary. Everyone else sees stable. This is true A/B testing with deterministic routing.
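Unlike weighted canarying, this routing is a pure function of the request. A Python sketch makes that explicit (`ab_route` is a hypothetical name mirroring the VirtualService's exact-match semantics):

```python
def ab_route(headers: dict[str, str]) -> str:
    """Deterministic A/B routing: exact match on the x-user-group
    header selects the canary; every other request goes to stable."""
    if headers.get("x-user-group") == "beta":
        return "web-app-canary"
    return "web-app-stable"
```

The same user always lands on the same version, which removes the session-affinity problem the replica-ratio canary suffers from.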
Strategy Comparison
| Strategy | Downtime | Rollback | Resource Cost | Risk | Complexity |
|---|---|---|---|---|---|
| Rolling Update | None | rollout undo (seconds) | 1x + surge | Low | Built-in |
| Recreate | Yes | Redeploy (minutes) | 1x | High | Built-in |
| Blue/Green | None | Selector patch (instant) | 2x | Low | Manual |
| Canary (replica) | None | Scale down (seconds) | 1x + canary | Lowest | Manual |
| Canary (Argo) | None | Automatic | 1x + canary | Lowest | Argo CRD |
| Traffic Mirror | None | N/A | 1x + mirror | None | Service mesh |
Choosing a Strategy
Use Rolling Update for zero-config deployments when multiple versions can coexist. Use Blue/Green when you need instant rollback and can afford double the resources. Use Canary to minimize blast radius with metric-driven confidence building. Use Argo Rollouts for fine-grained traffic control (1%, 5%) with automated rollback based on Prometheus queries.