Pod Disruption Budgets: Deep Dive

This document explains how PodDisruptionBudgets protect application availability during voluntary disruptions. It covers the eviction API, the difference between voluntary and involuntary disruptions, PDB interaction with rolling updates, and strategies for stateful workloads.

Kubernetes distinguishes between two categories of disruption. This distinction determines whether PDBs have any effect.

Voluntary disruptions are planned, controllable events initiated by a human or an automated system through the Kubernetes API:

  • kubectl drain (node maintenance, OS upgrades)
  • Cluster autoscaler scaling down underutilized nodes
  • Deployment rolling updates
  • Manual pod eviction via the eviction API
  • Cluster upgrades
  • Spot instance reclamation (when using preemptible VMs with proper integration)

PDBs are respected during all voluntary disruptions. The system checks the PDB before evicting a pod and blocks the eviction if it would violate the budget.

Involuntary disruptions are unplanned, uncontrollable events:

  • Node hardware failure
  • Kernel panic
  • OOM killer terminating a container
  • Cloud provider VM deletion
  • Network partition isolating a node

PDBs have no effect on involuntary disruptions. The pod is already gone before anything can check a budget. You cannot prevent a hardware failure with a YAML file.

The key takeaway: PDBs protect against operational actions, not infrastructure failures. For infrastructure failures, you need replicas, anti-affinity rules, and multi-zone deployments.

The eviction API is the mechanism that enforces PDBs. It is the “polite” way to remove a pod.

When something wants to evict a pod (kubectl drain, cluster autoscaler, etc.), it sends a POST request to:

POST /api/v1/namespaces/{namespace}/pods/{pod}/eviction

The API server intercepts this request and:

  1. Finds all PDBs that select the target pod.
  2. Calculates whether evicting this pod would violate any PDB.
  3. If the eviction would violate a PDB, returns 429 Too Many Requests with a Retry-After header. (It returns 500 instead if the pod is selected by more than one PDB, since the API server cannot tell which budget applies.)
  4. If the eviction is allowed, the pod is terminated (SIGTERM, grace period, SIGKILL).
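The eviction request carries a small Eviction object as its body. A minimal sketch of constructing one, using only the standard library (the `build_eviction` helper name is my own, not part of any client library):

```python
import json

def build_eviction(namespace, pod, grace_period=None):
    """Build the policy/v1 Eviction body POSTed to the pod's eviction subresource."""
    body = {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod, "namespace": namespace},
    }
    if grace_period is not None:
        # Optional: override the pod's terminationGracePeriodSeconds.
        body["deleteOptions"] = {"gracePeriodSeconds": grace_period}
    return body

# This body would be POSTed to:
#   /api/v1/namespaces/pdb-demo/pods/web-app-xyz/eviction
print(json.dumps(build_eviction("pdb-demo", "web-app-xyz"), indent=2))
```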

kubectl delete pod bypasses PDBs entirely. It sends a DELETE request, not an eviction. The API server does not check PDBs for DELETE requests.

This is intentional. kubectl delete pod is for emergency situations. The eviction API is for planned maintenance.

kubectl drain uses the eviction API internally. If a PDB blocks an eviction, kubectl drain waits and retries. You see messages like:

evicting pod "web-app-xyz"
error when evicting pods/"web-app-xyz" -n "pdb-demo":
Cannot evict pod as it would violate the pod's disruption budget.

The drain command keeps retrying until the PDB allows the eviction or a timeout is reached.

A PDB selects pods using the same label selector mechanism as Deployments, Services, and other controllers.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: pdb-demo
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

At least 2 pods matching app: web-app must remain running. If 4 replicas exist, 2 can be evicted. If 2 replicas exist, none can be evicted.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb-alt
  namespace: pdb-demo
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app

At most 1 pod can be unavailable at any time. With 4 replicas, 1 can be evicted. With 100 replicas, still only 1 can be evicted.

You can set minAvailable or maxUnavailable, but not both. They express the same constraint from different angles.

minAvailable is better when:

  • You have a quorum requirement (etcd needs 2 of 3 nodes)
  • You have a fixed minimum for functionality (at least 1 worker must be running)

maxUnavailable is better when:

  • You want to control blast radius (drain one node at a time)
  • The total replica count changes frequently
  • You are using percentage-based PDBs

Both fields accept percentages:

spec:
  maxUnavailable: "25%"

With 8 replicas, this allows 2 unavailable (25% of 8 = 2). With 4 replicas, it allows 1 (25% of 4 = 1).

Percentages are better for autoscaled workloads where the replica count changes. A fixed maxUnavailable: 1 on a 100-pod deployment means draining a node takes a very long time because pods are evicted one at a time. maxUnavailable: "25%" allows 25 simultaneous evictions.
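Resolving a percentage budget into a concrete count is simple arithmetic. A sketch (the function name is my own; I use round-up for non-integer results as an assumption — both examples in the text divide evenly, so rounding does not come into play there):

```python
import math

def resolve_percentage(value, expected_pods):
    """Turn a "25%"-style budget into an absolute pod count."""
    percent = int(value.rstrip("%"))
    # Assumption: round up when the percentage does not divide evenly.
    return math.ceil(expected_pods * percent / 100)

print(resolve_percentage("25%", 8))    # 2
print(resolve_percentage("25%", 4))    # 1
print(resolve_percentage("25%", 100))  # 25
```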

The ALLOWED DISRUPTIONS field shown by kubectl get pdb is computed as:

For minAvailable:

allowed = currentHealthy - minAvailable

For maxUnavailable:

allowed = maxUnavailable - (expectedPods - currentHealthy)

Where currentHealthy is the number of ready pods and expectedPods is the desired replica count.

The demo shows this with 4 replicas and minAvailable: 2:

allowed = 4 (healthy) - 2 (minAvailable) = 2

Scale to 2 replicas:

allowed = 2 (healthy) - 2 (minAvailable) = 0

When ALLOWED DISRUPTIONS is 0, no voluntary evictions are possible. A kubectl drain on the node will hang until pods are rescheduled or the PDB is deleted.
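The arithmetic above can be sketched in a few lines (function names are my own; this mirrors the formulas in the text, not the actual controller code):

```python
def allowed_min_available(current_healthy, min_available):
    # minAvailable form: how many healthy pods sit above the floor.
    return max(0, current_healthy - min_available)

def allowed_max_unavailable(expected_pods, current_healthy, max_unavailable):
    # maxUnavailable form: the budget minus pods already unavailable.
    return max(0, max_unavailable - (expected_pods - current_healthy))

print(allowed_min_available(4, 2))       # 4 replicas, minAvailable: 2 -> 2
print(allowed_min_available(2, 2))       # scaled down to 2 replicas  -> 0
print(allowed_max_unavailable(4, 4, 1))  # maxUnavailable: 1, all healthy -> 1
```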

Rolling updates are voluntary disruptions, but they do not use the eviction API. The Deployment controller creates and deletes pods directly. So how do PDBs interact with rolling updates?

The short answer: they don't. PDBs are enforced only on the eviction subresource, and the Deployment controller never consults them. Availability during a rollout is governed entirely by the rolling update strategy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: pdb-demo
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1

The Deployment's maxUnavailable and the PDB's constraints are therefore independent:

  1. The Deployment controller deletes old pods and creates new ones directly; these deletions bypass PDBs.
  2. It honors only its own maxSurge and maxUnavailable settings.
  3. A rollout can therefore temporarily drop availability below what the PDB promises.

This means: if you have a PDB with minAvailable: 3 on a 4-replica Deployment, and the Deployment has maxUnavailable: 2, the rollout can still take 2 pods down at once. To keep the two consistent, set the strategy's maxUnavailable no higher than what the PDB would allow (here, 4 - 3 = 1).

A common problem: if minAvailable equals the replica count, ALLOWED DISRUPTIONS is permanently 0 and no voluntary eviction is ever allowed. Node drains and autoscaler scale-downs hang because nothing can evict a pod.

# This WILL block every drain and scale-down
spec:
  minAvailable: 4 # Same as replica count

Every eviction request is denied, and kubectl drain retries forever. (Rollouts still proceed, since pod deletions during a rollout bypass the PDB.)

Rule of thumb: minAvailable should always be less than the replica count, or use maxUnavailable instead.
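A quick pre-deploy sanity check for this class of misconfiguration (a sketch; `check_pdb` is my own helper operating on plain numbers, not live cluster state):

```python
def check_pdb(replicas, min_available=None, max_unavailable=None):
    """Return a list of warnings for obviously risky PDB settings."""
    warnings = []
    if min_available is not None and max_unavailable is not None:
        warnings.append("set minAvailable or maxUnavailable, not both")
    if min_available is not None and min_available >= replicas:
        warnings.append("minAvailable >= replicas: all voluntary evictions blocked")
    if max_unavailable is not None and max_unavailable == 0:
        warnings.append("maxUnavailable: 0 blocks all voluntary evictions")
    return warnings

print(check_pdb(4, min_available=4))
# ['minAvailable >= replicas: all voluntary evictions blocked']
```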

Kubernetes v1.26 introduced the unhealthyPodEvictionPolicy field (enabled by default since v1.27, GA in v1.31). It controls whether pods that are running but not ready can be evicted while the budget is unsatisfied.

apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: web-app

Unhealthy pods can always be evicted, even if it would violate the PDB. This prevents a scenario where unhealthy pods block node drain because the PDB counts them as “available.”

spec:
  unhealthyPodEvictionPolicy: IfHealthyBudget

Unhealthy pods are evictable only while the budget is already satisfied (currentHealthy at least desiredHealthy). This is the default and matches the legacy behavior.

Consider: 4 replicas, minAvailable: 3, but 2 pods are in CrashLoopBackOff. The healthy count is 2, below desiredHealthy, so ALLOWED DISRUPTIONS is 0, and under IfHealthyBudget even the crash-looping pods cannot be evicted while the budget is unsatisfied. A node drain wedges on pods that are not serving traffic anyway.

With AlwaysAllow, the unhealthy pods can be evicted regardless, so the drain proceeds. For most stateless workloads this is the behavior you want; think twice for quorum systems, where a not-ready member may still hold data.
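The decision logic can be sketched as follows (a simplification of the real controller, assuming a pod covered by exactly one PDB and only the fields discussed above):

```python
def can_evict(pod_healthy, disruptions_allowed, current_healthy,
              desired_healthy, policy="IfHealthyBudget"):
    """Simplified eviction check for a pod covered by one PDB."""
    if pod_healthy:
        # Healthy pods always consume the disruption budget.
        return disruptions_allowed > 0
    if policy == "AlwaysAllow":
        # Unhealthy pods never count against the budget.
        return True
    # IfHealthyBudget: unhealthy pods are evictable only when the
    # budget is already satisfied.
    return current_healthy >= desired_healthy

# 4 replicas, minAvailable: 3, two pods crash-looping:
print(can_evict(False, 0, 2, 3))                        # False: drain wedges
print(can_evict(False, 0, 2, 3, policy="AlwaysAllow"))  # True: drain proceeds
```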

StatefulSets have unique PDB considerations because pods have stable identities and ordered startup/shutdown.

For a StatefulSet with replicas: 1 (common for databases), any PDB with minAvailable: 1 blocks all voluntary disruptions. The single pod cannot be evicted without violating the budget.

Options:

  • Use maxUnavailable: 1 to allow the single pod to be evicted
  • Accept that minAvailable: 1 blocks drain (you manage the pod manually)
  • Use minAvailable: 0 to always allow disruption (but this makes the PDB useless)

For systems like etcd or ZooKeeper with quorum requirements:

# etcd: 3 nodes, quorum needs 2
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
maxUnavailable: 1 # Only 1 node can be down
selector:
matchLabels:
app: etcd

This ensures quorum is maintained during drain operations: two members cannot be evicted simultaneously.
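The safe budget for any majority-quorum system falls out of simple arithmetic (a sketch; `quorum_pdb` is my own name):

```python
def quorum_pdb(replicas):
    """For a majority-quorum system, how many members can be down safely."""
    quorum = replicas // 2 + 1   # majority of the members
    return replicas - quorum     # safe value for maxUnavailable

for n in (3, 5, 7):
    print(n, "replicas -> maxUnavailable:", quorum_pdb(n))
# 3 -> 1, 5 -> 2, 7 -> 3
```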

StatefulSets with podManagementPolicy: OrderedReady shut down pods in reverse order (pod N-1 before pod N-2). PDBs add an additional constraint on top of this ordering. The PDB controls how many pods can be down; the StatefulSet controls which order they go down.

kubectl drain cordons the node (marks it unschedulable) and then evicts all pods:

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

The drain process:

  1. Cordons the node (no new pods can be scheduled).
  2. Lists all evictable pods on the node.
  3. For each pod, sends an eviction request.
  4. If a PDB blocks the eviction, retries with exponential backoff.
  5. Waits until all pods are evicted or the timeout is reached.
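The retry loop in step 4 can be sketched like this (`try_evict` is a hypothetical callable standing in for the eviction request; real kubectl uses its own intervals):

```python
import time

def drain_pod(try_evict, timeout=300.0, base_delay=1.0, max_delay=30.0,
              sleep=time.sleep):
    """Retry an eviction with exponential backoff until it succeeds or times out."""
    delay, waited = base_delay, 0.0
    while waited < timeout:
        if try_evict():  # returns True once the PDB allows the eviction
            return True
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    return False

# Simulate a PDB that frees up after two denials:
answers = iter([False, False, True])
print(drain_pod(lambda: next(answers), sleep=lambda s: None))  # True
```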

If a PDB blocks eviction, drain logs:

evicting pod "web-app-abc"
error when evicting pods/"web-app-abc": Cannot evict pod as it would violate the pod's disruption budget.

The drain retries until:

  • The PDB allows the eviction (another replica comes up elsewhere)
  • The --timeout flag expires (default: no timeout, waits forever)
  • You press Ctrl+C

If drain is stuck, you can use --force to delete pods that are not managed by a controller. But --force does not bypass PDBs. To truly bypass PDBs, delete the PDB, delete the pods directly with kubectl delete pod, or pass --disable-eviction to kubectl drain, which deletes pods instead of evicting them.

In emergencies:

# Delete the PDB temporarily
kubectl delete pdb web-app-pdb -n pdb-demo
# Drain the node
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# Recreate the PDB after drain
kubectl apply -f manifests/pdb-min-available.yaml

The cluster autoscaler scales down nodes that are underutilized. Before removing a node, it checks:

  1. Can all pods on this node be rescheduled elsewhere?
  2. Would evicting these pods violate any PDB?

If a PDB blocks the eviction, the autoscaler skips that node. It moves on to other candidates.

This means PDBs can prevent cluster scale-down. A single PDB with ALLOWED DISRUPTIONS: 0 on any pod on a node prevents that node from being removed, even if the node is 95% idle.

The cluster autoscaler respects the cluster-autoscaler.kubernetes.io/safe-to-evict annotation:

metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

This tells the autoscaler the pod is safe to evict even when it would otherwise be skipped (for example, because it has no controller or uses local storage).

Watch for these misconfigurations. First, minAvailable equal to the replica count:

spec:
  minAvailable: 4 # With 4 replicas

No voluntary disruptions are ever allowed: drains hang and the autoscaler cannot remove nodes.

Second, minAvailable above the current replica count:

spec:
  minAvailable: 3 # With 2 replicas

ALLOWED DISRUPTIONS is negative (clamped to 0), so the PDB blocks permanently. This is usually a leftover misconfiguration after scaling down.

If the label selector does not match any pods, the PDB is inert. No warnings. Check kubectl get pdb and verify the counts make sense.

If two PDBs select the same pod, the eviction API refuses to evict it at all and returns a 500, because it cannot tell which budget to charge. Avoid overlapping selectors.

DaemonSet pods run on every node. During drain, they are handled separately (--ignore-daemonsets). PDBs on DaemonSet pods can block drain but DaemonSet pods cannot be rescheduled to another node (they are per-node by definition). Be careful with DaemonSet PDBs.

Check PDB status to understand the current state:

kubectl get pdb -n pdb-demo -o yaml

The status section shows:

status:
  currentHealthy: 4
  desiredHealthy: 2
  disruptionsAllowed: 2
  expectedPods: 4
  observedGeneration: 1

  • currentHealthy: Pods matching the selector that are ready
  • desiredHealthy: Minimum healthy pods required (from minAvailable, or calculated from maxUnavailable)
  • disruptionsAllowed: How many pods can be evicted right now
  • expectedPods: Total pods matching the selector
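A small helper for reading these fields at a glance (a sketch over the status block shown above; `summarize` is my own name):

```python
def summarize(status):
    """One-line human summary of a PDB status block."""
    healthy = status["currentHealthy"]
    expected = status["expectedPods"]
    allowed = status["disruptionsAllowed"]
    state = "OK" if allowed > 0 else "BLOCKED"
    return f"{state}: {healthy}/{expected} healthy, {allowed} eviction(s) allowed"

print(summarize({"currentHealthy": 4, "desiredHealthy": 2,
                 "disruptionsAllowed": 2, "expectedPods": 4,
                 "observedGeneration": 1}))
# OK: 4/4 healthy, 2 eviction(s) allowed
```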
Some sensible defaults:

spec:
  maxUnavailable: "25%"

Percentage-based, scales with replica count. Allows reasonable parallelism during drain.

spec:
  maxUnavailable: 1

One at a time. Preserves quorum and data replication.

spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow

Fixed minimum that broken pods cannot hold hostage. Prevents unhealthy pods from blocking operations.

Finally, consider whether a workload needs a PDB at all. Budgets add operational complexity, and a misconfigured one can quietly block node drains and CI/CD pipelines.