Pod Disruption Budgets: Deep Dive

This document explains how PodDisruptionBudgets protect application availability during voluntary disruptions. It covers the eviction API, the difference between voluntary and involuntary disruptions, PDB interaction with rolling updates, and strategies for stateful workloads.

Kubernetes distinguishes between two categories of disruption. This distinction determines whether PDBs have any effect.

Voluntary disruptions are planned, controllable events initiated by a human or an automated system through the Kubernetes API:

  • kubectl drain (node maintenance, OS upgrades)
  • Cluster autoscaler scaling down underutilized nodes
  • Deployment rolling updates
  • Manual pod eviction via the eviction API
  • Cluster upgrades
  • Spot instance reclamation (when using preemptible VMs with proper integration)

PDBs are respected during all voluntary disruptions. The system checks the PDB before evicting a pod and blocks the eviction if it would violate the budget.

Involuntary disruptions are unplanned, uncontrollable events:

  • Node hardware failure
  • Kernel panic
  • OOM killer terminating a container
  • Cloud provider VM deletion
  • Network partition isolating a node

PDBs have no effect on involuntary disruptions. The pod is already gone before anything can check a budget. You cannot prevent a hardware failure with a YAML file.

The key takeaway: PDBs protect against operational actions, not infrastructure failures. For infrastructure failures, you need replicas, anti-affinity rules, and multi-zone deployments.

The eviction API is the mechanism that enforces PDBs. It is the “polite” way to remove a pod.

When something wants to evict a pod (kubectl drain, cluster autoscaler, etc.), it sends a POST request to:

POST /api/v1/namespaces/{namespace}/pods/{pod}/eviction

The API server intercepts this request and:

  1. Finds all PDBs that select the target pod.
  2. Calculates whether evicting this pod would violate any PDB.
  3. If the eviction would violate a PDB, returns 429 Too Many Requests with a Retry-After header. (It returns 500 instead if the pod is selected by more than one PDB, since the API server cannot tell which budget applies.)
  4. If the eviction is allowed, the pod is terminated (SIGTERM, grace period, SIGKILL).
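The eviction request carries a small Eviction object as its body. A minimal sketch of constructing one, using only the standard library (the `build_eviction` helper name is my own, not part of any client library):

```python
import json

def build_eviction(namespace, pod, grace_period=None):
    """Build the policy/v1 Eviction body POSTed to the pod's eviction subresource."""
    body = {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod, "namespace": namespace},
    }
    if grace_period is not None:
        # Optional: override the pod's terminationGracePeriodSeconds.
        body["deleteOptions"] = {"gracePeriodSeconds": grace_period}
    return body

# This body would be POSTed to:
#   /api/v1/namespaces/pdb-demo/pods/web-app-xyz/eviction
print(json.dumps(build_eviction("pdb-demo", "web-app-xyz"), indent=2))
```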

kubectl delete pod bypasses PDBs entirely. It sends a DELETE request, not an eviction. The API server does not check PDBs for DELETE requests.

This is intentional. kubectl delete pod is for emergency situations. The eviction API is for planned maintenance.

kubectl drain uses the eviction API internally. If a PDB blocks an eviction, kubectl drain waits and retries. You see messages like:

evicting pod "web-app-xyz"
error when evicting pods/"web-app-xyz" -n "pdb-demo":
Cannot evict pod as it would violate the pod's disruption budget.

The drain command keeps retrying until the PDB allows the eviction or a timeout is reached.

A PDB selects pods using the same label selector mechanism as Deployments, Services, and other controllers.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: pdb-demo
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

At least 2 pods matching app: web-app must remain running. If 4 replicas exist, 2 can be evicted. If 2 replicas exist, none can be evicted.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb-alt
  namespace: pdb-demo
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app

At most 1 pod can be unavailable at any time. With 4 replicas, 1 can be evicted. With 100 replicas, still only 1 can be evicted.

You can set minAvailable or maxUnavailable, but not both. They express the same constraint from different angles.

minAvailable is better when:

  • You have a quorum requirement (etcd needs 2 of 3 nodes)
  • You have a fixed minimum for functionality (at least 1 worker must be running)

maxUnavailable is better when:

  • You want to control blast radius (drain one node at a time)
  • The total replica count changes frequently
  • You are using percentage-based PDBs

Both fields accept percentages:

spec:
  maxUnavailable: "25%"

With 8 replicas, this allows 2 unavailable (25% of 8 = 2). With 4 replicas, it allows 1 (25% of 4 = 1).

Percentages are better for autoscaled workloads where the replica count changes. A fixed maxUnavailable: 1 on a 100-pod deployment means draining a node takes a very long time because pods are evicted one at a time. maxUnavailable: "25%" allows 25 simultaneous evictions.
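Resolving a percentage budget into a concrete count is simple arithmetic. A sketch (the function name is my own; I use round-up for non-integer results as an assumption — both examples in the text divide evenly, so rounding does not come into play there):

```python
import math

def resolve_percentage(value, expected_pods):
    """Turn a "25%"-style budget into an absolute pod count."""
    percent = int(value.rstrip("%"))
    # Assumption: round up when the percentage does not divide evenly.
    return math.ceil(expected_pods * percent / 100)

print(resolve_percentage("25%", 8))    # 2
print(resolve_percentage("25%", 4))    # 1
print(resolve_percentage("25%", 100))  # 25
```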

The ALLOWED DISRUPTIONS field shown by kubectl get pdb is computed as:

For minAvailable:

allowed = currentHealthy - minAvailable

For maxUnavailable:

allowed = maxUnavailable - (expectedPods - currentHealthy)

Where currentHealthy is the number of ready pods and expectedPods is the desired replica count.

The demo shows this with 4 replicas and minAvailable: 2:

allowed = 4 (healthy) - 2 (minAvailable) = 2

Scale to 2 replicas:

allowed = 2 (healthy) - 2 (minAvailable) = 0

When ALLOWED DISRUPTIONS is 0, no voluntary evictions are possible. A kubectl drain on the node will hang until pods are rescheduled or the PDB is deleted.
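The arithmetic above can be sketched in a few lines (function names are my own; this mirrors the formulas in the text, not the actual controller code):

```python
def allowed_min_available(current_healthy, min_available):
    # minAvailable form: how many healthy pods sit above the floor.
    return max(0, current_healthy - min_available)

def allowed_max_unavailable(expected_pods, current_healthy, max_unavailable):
    # maxUnavailable form: the budget minus pods already unavailable.
    return max(0, max_unavailable - (expected_pods - current_healthy))

print(allowed_min_available(4, 2))       # 4 replicas, minAvailable: 2 -> 2
print(allowed_min_available(2, 2))       # scaled down to 2 replicas  -> 0
print(allowed_max_unavailable(4, 4, 1))  # maxUnavailable: 1, all healthy -> 1
```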

Rolling updates are voluntary disruptions, but they do not use the eviction API. The Deployment controller creates and deletes pods directly. So how do PDBs interact with rolling updates?

The short answer: they don't. PDBs are enforced only on the eviction subresource, and the Deployment controller never consults them. Availability during a rollout is governed entirely by the rolling update strategy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: pdb-demo
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1

The Deployment's maxUnavailable and the PDB's constraints are therefore independent:

  1. The Deployment controller deletes old pods and creates new ones directly; these deletions bypass PDBs.
  2. It honors only its own maxSurge and maxUnavailable settings.
  3. A rollout can therefore temporarily drop availability below what the PDB promises.

This means: if you have a PDB with minAvailable: 3 on a 4-replica Deployment, and the Deployment has maxUnavailable: 2, the rollout can still take 2 pods down at once. To keep the two consistent, set the strategy's maxUnavailable no higher than what the PDB would allow (here, 4 - 3 = 1).

A common problem: if minAvailable equals the replica count, ALLOWED DISRUPTIONS is permanently 0 and no voluntary eviction is ever allowed. Node drains and autoscaler scale-downs hang because nothing can evict a pod.

# This WILL block every drain and scale-down
spec:
  minAvailable: 4 # Same as replica count

Every eviction request is denied, and kubectl drain retries forever. (Rollouts still proceed, since pod deletions during a rollout bypass the PDB.)

Rule of thumb: minAvailable should always be less than the replica count, or use maxUnavailable instead.
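A quick pre-deploy sanity check for this class of misconfiguration (a sketch; `check_pdb` is my own helper operating on plain numbers, not live cluster state):

```python
def check_pdb(replicas, min_available=None, max_unavailable=None):
    """Return a list of warnings for obviously risky PDB settings."""
    warnings = []
    if min_available is not None and max_unavailable is not None:
        warnings.append("set minAvailable or maxUnavailable, not both")
    if min_available is not None and min_available >= replicas:
        warnings.append("minAvailable >= replicas: all voluntary evictions blocked")
    if max_unavailable is not None and max_unavailable == 0:
        warnings.append("maxUnavailable: 0 blocks all voluntary evictions")
    return warnings

print(check_pdb(4, min_available=4))
# ['minAvailable >= replicas: all voluntary evictions blocked']
```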

Kubernetes v1.26 introduced the unhealthyPodEvictionPolicy field (enabled by default since v1.27, GA in v1.31). It controls whether pods that are running but not ready can be evicted while the budget is unsatisfied.

apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: web-app

Unhealthy pods can always be evicted, even if it would violate the PDB. This prevents a scenario where unhealthy pods block node drain because the PDB counts them as “available.”

spec:
  unhealthyPodEvictionPolicy: IfHealthyBudget

Unhealthy pods are evictable only while the budget is already satisfied (currentHealthy at least desiredHealthy). This is the default and matches the legacy behavior.

Consider: 4 replicas, minAvailable: 3, but 2 pods are in CrashLoopBackOff. The healthy count is 2, below desiredHealthy, so ALLOWED DISRUPTIONS is 0, and under IfHealthyBudget even the crash-looping pods cannot be evicted while the budget is unsatisfied. A node drain wedges on pods that are not serving traffic anyway.

With AlwaysAllow, the unhealthy pods can be evicted regardless, so the drain proceeds. For most stateless workloads this is the behavior you want; think twice for quorum systems, where a not-ready member may still hold data.
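The decision logic can be sketched as follows (a simplification of the real controller, assuming a pod covered by exactly one PDB and only the fields discussed above):

```python
def can_evict(pod_healthy, disruptions_allowed, current_healthy,
              desired_healthy, policy="IfHealthyBudget"):
    """Simplified eviction check for a pod covered by one PDB."""
    if pod_healthy:
        # Healthy pods always consume the disruption budget.
        return disruptions_allowed > 0
    if policy == "AlwaysAllow":
        # Unhealthy pods never count against the budget.
        return True
    # IfHealthyBudget: unhealthy pods are evictable only when the
    # budget is already satisfied.
    return current_healthy >= desired_healthy

# 4 replicas, minAvailable: 3, two pods crash-looping:
print(can_evict(False, 0, 2, 3))                        # False: drain wedges
print(can_evict(False, 0, 2, 3, policy="AlwaysAllow"))  # True: drain proceeds
```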

StatefulSets have unique PDB considerations because pods have stable identities and ordered startup/shutdown.

For a StatefulSet with replicas: 1 (common for databases), any PDB with minAvailable: 1 blocks all voluntary disruptions. The single pod cannot be evicted without violating the budget.

Options:

  • Use maxUnavailable: 1 to allow the single pod to be evicted
  • Accept that minAvailable: 1 blocks drain (you manage the pod manually)
  • Use minAvailable: 0 to always allow disruption (but this makes the PDB useless)

For systems like etcd or ZooKeeper with quorum requirements:

# etcd: 3 nodes, quorum needs 2
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
maxUnavailable: 1 # Only 1 node can be down
selector:
matchLabels:
app: etcd

This ensures quorum is maintained during drain operations: two members cannot be evicted simultaneously.
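The safe budget for any majority-quorum system falls out of simple arithmetic (a sketch; `quorum_pdb` is my own name):

```python
def quorum_pdb(replicas):
    """For a majority-quorum system, how many members can be down safely."""
    quorum = replicas // 2 + 1   # majority of the members
    return replicas - quorum     # safe value for maxUnavailable

for n in (3, 5, 7):
    print(n, "replicas -> maxUnavailable:", quorum_pdb(n))
# 3 -> 1, 5 -> 2, 7 -> 3
```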

StatefulSets with podManagementPolicy: OrderedReady shut down pods in reverse order (pod N-1 before pod N-2). PDBs add an additional constraint on top of this ordering. The PDB controls how many pods can be down; the StatefulSet controls which order they go down.

kubectl drain cordons the node (marks it unschedulable) and then evicts all pods:

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

The drain process:

  1. Cordons the node (no new pods can be scheduled).
  2. Lists all evictable pods on the node.
  3. For each pod, sends an eviction request.
  4. If a PDB blocks the eviction, retries with exponential backoff.
  5. Waits until all pods are evicted or the timeout is reached.
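The retry loop in step 4 can be sketched like this (`try_evict` is a hypothetical callable standing in for the eviction request; real kubectl uses its own intervals):

```python
import time

def drain_pod(try_evict, timeout=300.0, base_delay=1.0, max_delay=30.0,
              sleep=time.sleep):
    """Retry an eviction with exponential backoff until it succeeds or times out."""
    delay, waited = base_delay, 0.0
    while waited < timeout:
        if try_evict():  # returns True once the PDB allows the eviction
            return True
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    return False

# Simulate a PDB that frees up after two denials:
answers = iter([False, False, True])
print(drain_pod(lambda: next(answers), sleep=lambda s: None))  # True
```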

If a PDB blocks eviction, drain logs:

evicting pod "web-app-abc"
error when evicting pods/"web-app-abc": Cannot evict pod as it would violate the pod's disruption budget.

The drain retries until:

  • The PDB allows the eviction (another replica comes up elsewhere)
  • The --timeout flag expires (default: no timeout, waits forever)
  • You press Ctrl+C

If drain is stuck, you can use --force to delete pods that are not managed by a controller. But --force does not bypass PDBs. To truly bypass PDBs, delete the PDB, delete the pods directly with kubectl delete pod, or pass --disable-eviction to kubectl drain, which deletes pods instead of evicting them.

In emergencies:

# Delete the PDB temporarily
kubectl delete pdb web-app-pdb -n pdb-demo
# Drain the node
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# Recreate the PDB after drain
kubectl apply -f manifests/pdb-min-available.yaml

The cluster autoscaler scales down nodes that are underutilized. Before removing a node, it checks:

  1. Can all pods on this node be rescheduled elsewhere?
  2. Would evicting these pods violate any PDB?

If a PDB blocks the eviction, the autoscaler skips that node. It moves on to other candidates.

This means PDBs can prevent cluster scale-down. A single PDB with ALLOWED DISRUPTIONS: 0 on any pod on a node prevents that node from being removed, even if the node is 95% idle.

The cluster autoscaler respects the cluster-autoscaler.kubernetes.io/safe-to-evict annotation:

metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

This tells the autoscaler the pod is safe to evict even when it would otherwise be skipped (for example, because it has no controller or uses local storage).

Watch for these misconfigurations. First, minAvailable equal to the replica count:

spec:
  minAvailable: 4 # With 4 replicas

No voluntary disruptions are ever allowed: drains hang and the autoscaler cannot remove nodes.

Second, minAvailable above the current replica count:

spec:
  minAvailable: 3 # With 2 replicas

ALLOWED DISRUPTIONS is negative (clamped to 0), so the PDB blocks permanently. This is usually a leftover misconfiguration after scaling down.

If the label selector does not match any pods, the PDB is inert. No warnings. Check kubectl get pdb and verify the counts make sense.

If two PDBs select the same pod, the eviction API refuses to evict it at all and returns a 500, because it cannot tell which budget to charge. Avoid overlapping selectors.

DaemonSet pods run on every node. During drain, they are handled separately (--ignore-daemonsets). PDBs on DaemonSet pods can block drain but DaemonSet pods cannot be rescheduled to another node (they are per-node by definition). Be careful with DaemonSet PDBs.

Check PDB status to understand the current state:

kubectl get pdb -n pdb-demo -o yaml

The status section shows:

status:
  currentHealthy: 4
  desiredHealthy: 2
  disruptionsAllowed: 2
  expectedPods: 4
  observedGeneration: 1

  • currentHealthy: Pods matching the selector that are ready
  • desiredHealthy: Minimum healthy pods required (from minAvailable, or calculated from maxUnavailable)
  • disruptionsAllowed: How many pods can be evicted right now
  • expectedPods: Total pods matching the selector
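A small helper for reading these fields at a glance (a sketch over the status block shown above; `summarize` is my own name):

```python
def summarize(status):
    """One-line human summary of a PDB status block."""
    healthy = status["currentHealthy"]
    expected = status["expectedPods"]
    allowed = status["disruptionsAllowed"]
    state = "OK" if allowed > 0 else "BLOCKED"
    return f"{state}: {healthy}/{expected} healthy, {allowed} eviction(s) allowed"

print(summarize({"currentHealthy": 4, "desiredHealthy": 2,
                 "disruptionsAllowed": 2, "expectedPods": 4,
                 "observedGeneration": 1}))
# OK: 4/4 healthy, 2 eviction(s) allowed
```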
Some sensible defaults:

spec:
  maxUnavailable: "25%"

Percentage-based, scales with replica count. Allows reasonable parallelism during drain.

spec:
  maxUnavailable: 1

One at a time. Preserves quorum and data replication.

spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow

Fixed minimum that broken pods cannot hold hostage. Prevents unhealthy pods from blocking operations.

Finally, consider whether a workload needs a PDB at all. Budgets add operational complexity, and a misconfigured one can quietly block node drains and CI/CD pipelines.