Progressive Delivery: Deep Dive

Progressive delivery extends continuous delivery by adding gradual rollouts with automated verification. Instead of deploying a new version to all users at once, traffic shifts incrementally while health metrics are monitored. If something goes wrong, the rollout stops or rolls back automatically.

The term was coined by James Governor (RedMonk) and popularized by tools like Argo Rollouts, Flagger, and LaunchDarkly.

Why Not Just Use Deployment Rolling Updates

A standard Kubernetes Deployment rolling update replaces old pods with new ones at a controlled rate (maxSurge, maxUnavailable). But it has limitations:

  • No traffic splitting - as soon as a new pod is ready, it receives equal traffic. There is no way to send 10% of traffic to the new version.
  • No automated analysis - the Deployment controller checks readiness probes, but it cannot query Prometheus for error rates or latency.
  • No automatic rollback on metrics - if the new version passes readiness probes but has higher error rates, the Deployment does not roll back.
  • No pause between steps - the rollout proceeds as fast as pods become ready.
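
For reference, this is the full extent of rollout control a standard Deployment offers, shown as a minimal sketch (names and image tag are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the update
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2
```

Once a new pod passes its readiness probe, it joins the Service endpoints and receives a full share of traffic; there is no knob for percentages, pauses, or metric checks.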

Argo Rollouts solves all of these with a dedicated controller and a small set of CRDs:

  • Rollout Controller - watches Rollout CRDs and manages ReplicaSets, similar to the Deployment controller
  • AnalysisRun Controller - executes analysis templates and reports results
  • Rollout CRD - replaces Deployment with the same pod spec but adds strategy configuration
  • AnalysisTemplate CRD - defines metrics to check during rollout
  • AnalysisRun CRD - an instance of an AnalysisTemplate, created automatically during rollout

A canary rollout then proceeds step by step:

  1. You apply a Rollout with a new pod template (e.g., a new image tag)
  2. The controller creates a canary ReplicaSet with the new template
  3. The controller scales canary pods according to the step weights
  4. If the Rollout references an AnalysisTemplate, the controller creates an AnalysisRun
  5. The AnalysisRun periodically checks metrics (Prometheus, HTTP, Job-based)
  6. If analysis passes, the controller proceeds to the next step
  7. If analysis fails, the controller aborts and scales down the canary
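
Putting those steps together, a minimal Rollout sketch (names, weights, and image tag are illustrative) with an inline analysis step referencing an AnalysisTemplate:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-app
spec:
  replicas: 4
  strategy:
    canary:
      steps:
      - setWeight: 10            # step 3: scale canary to ~10%
      - pause: { duration: 2m }
      - analysis:                # step 4: spawn an AnalysisRun
          templates:
          - templateName: error-rate
          args:
          - name: service-name
            value: canary-app
      - setWeight: 50
      - pause: { duration: 2m }
  selector:
    matchLabels:
      app: canary-app
  template:
    metadata:
      labels:
        app: canary-app
    spec:
      containers:
      - name: app
        image: my-app:v2
```

If the AnalysisRun fails, the controller aborts, scales the canary ReplicaSet to zero, and keeps the stable ReplicaSet serving all traffic.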

For precise traffic splitting (not just replica-based weighting), Argo Rollouts integrates with:

  • Istio - uses VirtualService to split traffic by percentage
  • NGINX Ingress - uses canary annotations on Ingress
  • AWS ALB - uses target group weights
  • SMI (Service Mesh Interface) - generic traffic split API

Without a service mesh, traffic splitting is approximated by the ratio of canary to stable pods. With 4 stable replicas and 1 canary pod, roughly 20% of traffic reaches the canary, regardless of whether the step specifies setWeight: 10. A service mesh or supported ingress enables exact percentages.
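
With Istio, for example, the Rollout points at a VirtualService whose route weights the controller rewrites at each step. A sketch (Service and VirtualService names are illustrative):

```yaml
strategy:
  canary:
    canaryService: canary-app-canary    # Service selecting canary pods
    stableService: canary-app-stable    # Service selecting stable pods
    trafficRouting:
      istio:
        virtualService:
          name: canary-app-vsvc         # controller updates route weights here
          routes:
          - primary
    steps:
    - setWeight: 10                     # exactly 10%, independent of replica counts
    - pause: { duration: 5m }
```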

The most common production pattern queries Prometheus for error rates:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: error-rate
    interval: 30s
    count: 5
    successCondition: result[0] < 0.05
    failureLimit: 2
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

This checks that the 5xx error rate stays below 5% across 5 measurements taken 30 seconds apart. Once failed measurements exceed the failureLimit of 2, the rollout aborts.

For environments without Prometheus (like this demo), a Job runs a health check script:

```yaml
provider:
  job:
    spec:
      template:
        spec:
          containers:
          - name: check
            image: curlimages/curl:latest
            command: [curl, -sf, "http://canary-service/health"]
          restartPolicy: Never
```

The Job’s exit code determines success (0) or failure (non-zero).

For external monitoring services:

```yaml
provider:
  web:
    url: https://monitoring.example.com/api/v1/canary/{{args.service-name}}/health
    headers:
    - key: Authorization
      value: Bearer {{args.api-token}}
    jsonPath: "{$.healthy}"
    timeoutSeconds: 30
```

Blue-green:

  • Two complete environments (blue = current, green = new)
  • All traffic switches at once (0% -> 100%)
  • Simpler to implement and reason about
  • Requires 2x resources during deployment
  • Rollback is instant (switch back to blue)

Canary:

  • Gradual traffic shift (10% -> 50% -> 100%)
  • Less resource overhead (only canary pods are extra)
  • More complex traffic management
  • Can catch issues that only appear under real traffic
  • Rollback requires scaling down canary pods

When to use which:

  • Blue-green: database migrations, breaking API changes, regulated environments requiring full validation before exposure
  • Canary: stateless services, high-traffic services where gradual exposure catches edge cases, services with good observability
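
Argo Rollouts supports blue-green as a first-class strategy as well. A sketch (Service names are illustrative):

```yaml
strategy:
  blueGreen:
    activeService: my-app-active        # receives all production traffic
    previewService: my-app-preview      # receives test traffic before promotion
    autoPromotionEnabled: false         # require manual promotion after validation
    scaleDownDelaySeconds: 60           # keep the old ReplicaSet briefly for instant rollback
```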

Argo Rollouts integrates naturally with Argo CD for GitOps workflows:

  1. Developer merges a PR that updates the image tag in the Rollout manifest
  2. Argo CD syncs the change to the cluster
  3. Argo Rollouts detects the spec change and begins the canary rollout
  4. AnalysisRuns verify health at each step
  5. If the rollout succeeds, the new version is stable
  6. If the rollout fails, Argo Rollouts aborts and Argo CD shows the degraded status

The key insight: Argo CD manages the desired state, Argo Rollouts manages the transition to that state.
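
In practice that division of labor looks like a standard Argo CD Application whose source directory contains the Rollout manifest (repo URL and paths here are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: canary-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-config.git  # hypothetical repo
    targetRevision: main
    path: apps/canary-app       # directory containing the Rollout manifest
  destination:
    server: https://kubernetes.default.svc
    namespace: rollouts-demo
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Argo CD syncs the manifest change; everything after that (canary scaling, analysis, abort) happens inside the Rollout controller.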

Design your canary steps based on your traffic volume:

  • Low traffic (< 100 req/s): fewer steps, longer pauses (need time to collect meaningful metrics)
  • High traffic (> 10k req/s): more steps, shorter pauses (statistically significant data arrives quickly)

Example for a high-traffic service:

```yaml
steps:
- setWeight: 1
- pause: { duration: 5m }
- setWeight: 5
- pause: { duration: 5m }
- setWeight: 25
- pause: { duration: 5m }
- setWeight: 50
- pause: { duration: 5m }
- setWeight: 100
```
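
By contrast, a low-traffic service might use fewer, longer steps so that each pause accumulates enough samples for the analysis to be meaningful (durations are illustrative):

```yaml
steps:
- setWeight: 10
- pause: { duration: 30m }   # long pause: low traffic needs time to surface errors
- setWeight: 50
- pause: { duration: 30m }
```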

Canary is primarily designed for stateless services. For stateful services:

  • Ensure backward-compatible schema changes (expand-and-contract pattern)
  • Consider feature flags instead of canary deployments
  • Use blue-green if the service requires exclusive access to shared state

Argo Rollouts includes anti-rollback protection by default. If you try to roll back to a previous revision that already failed analysis, the controller blocks it. Override with:

```sh
kubectl argo rollouts undo canary-app -n rollouts-demo --to-revision=2
```

Argo Rollouts supports notifications via Argo Notifications:

  • Slack messages on rollout start, promotion, abort
  • Webhook triggers for external systems
  • Custom templates for notification content
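
Configuration lives in a ConfigMap read by the Rollouts controller. A sketch, assuming Slack delivery (the secret key, message template, and trigger condition are illustrative and may vary by version):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-notification-configmap  # name expected by the controller
data:
  service.slack: |
    token: $slack-token                       # key in the notification secret
  template.rollout-aborted: |
    message: Rollout {{.rollout.metadata.name}} was aborted.
  trigger.on-rollout-aborted: |
    - send: [rollout-aborted]
```

Individual Rollouts opt in via a subscription annotation such as notifications.argoproj.io/subscribe.on-rollout-aborted.slack: deploy-alerts (channel name illustrative).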

Comparison with Other Progressive Delivery Tools

Flagger:

  • Works with Istio, Linkerd, App Mesh, NGINX, and others
  • Tighter Prometheus integration out of the box
  • Uses Deployments (not a custom CRD), wrapping them with a Canary CRD
  • More opinionated about traffic management

DIY with Tekton:

  • Use Tekton pipelines to orchestrate canary steps
  • Manual traffic management via kubectl
  • More flexible but more work to set up
  • Good if you already use Tekton for CI/CD

Feature flags (e.g., LaunchDarkly):

  • Application-level, not infrastructure-level
  • Toggle features per user, not per pod
  • Can combine with canary: deploy canary pods with feature flag enabled
  • Better for A/B testing and gradual feature exposure