# Pod Disruption Budgets: Deep Dive
This document explains how PodDisruptionBudgets protect application availability during voluntary disruptions. It covers the eviction API, the difference between voluntary and involuntary disruptions, PDB interaction with rolling updates, and strategies for stateful workloads.
## Voluntary vs Involuntary Disruptions

Kubernetes distinguishes between two categories of disruption. This distinction determines whether PDBs have any effect.
### Voluntary Disruptions

These are planned, controllable events initiated by a human or an automated system through the Kubernetes API:

- `kubectl drain` (node maintenance, OS upgrades)
- Cluster autoscaler scaling down underutilized nodes
- Deployment rolling updates
- Manual pod eviction via the eviction API
- Cluster upgrades
- Spot instance reclamation (when using preemptible VMs with proper integration)
The eviction API checks the relevant PDBs before evicting a pod and blocks the eviction if it would violate the budget. Most voluntary disruptions (drain, autoscaler scale-down, manual evictions) go through this API; rolling updates are the exception and are governed by the workload's own update strategy instead, as covered later.
### Involuntary Disruptions

These are unplanned, uncontrollable events:
- Node hardware failure
- Kernel panic
- OOM killer terminating a container
- Cloud provider VM deletion
- Network partition isolating a node
PDBs have no effect on involuntary disruptions. The pod is already gone before anything can check a budget. You cannot prevent a hardware failure with a YAML file.
This is the key insight: PDBs protect against operational actions, not infrastructure failures. For infrastructure failures, you need replicas, anti-affinity rules, and multi-zone deployments.
## The Eviction API

The eviction API is the mechanism that enforces PDBs. It is the "polite" way to remove a pod.
### How It Works

When something wants to evict a pod (`kubectl drain`, the cluster autoscaler, etc.), it sends a POST request to:

```
POST /api/v1/namespaces/{namespace}/pods/{pod}/eviction
```

The API server intercepts this request and:
- Finds all PDBs that select the target pod.
- Calculates whether evicting this pod would violate any PDB.
- If the eviction would violate a PDB, returns `429 Too Many Requests` with a `Retry-After` hint. A `500` indicates a misconfiguration, such as multiple PDBs matching the same pod.
- If the eviction is allowed, the pod is terminated (SIGTERM, grace period, SIGKILL).
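Concretely, what gets POSTed to that endpoint is an `Eviction` object. A minimal sketch, reusing the pod and namespace names from this document's examples:

```yaml
apiVersion: policy/v1
kind: Eviction
metadata:
  name: web-app-xyz    # the pod to evict
  namespace: pdb-demo
deleteOptions:
  gracePeriodSeconds: 30   # optional; defaults to the pod's own grace period
```

`kubectl drain` submits one of these per pod; you can also POST it yourself to the eviction subresource shown above.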
### Eviction vs Deletion

`kubectl delete pod` bypasses PDBs entirely. It sends a DELETE request, not an eviction. The API server does not check PDBs for DELETE requests.
This is intentional. kubectl delete pod is for emergency situations. The eviction API is for planned maintenance.
`kubectl drain` uses the eviction API internally. If a PDB blocks an eviction, `kubectl drain` waits and retries. You see messages like:

```
evicting pod "web-app-xyz"
error when evicting pods/"web-app-xyz" -n "pdb-demo": Cannot evict pod as it would violate the pod's disruption budget.
```

The drain command keeps retrying until the PDB allows the eviction or a timeout is reached.
## PDB Configuration

A PDB selects pods using the same label selector mechanism as Deployments, Services, and other controllers.
### minAvailable

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: pdb-demo
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
```

At least 2 pods matching `app: web-app` must remain running. If 4 replicas exist, 2 can be evicted. If 2 replicas exist, none can be evicted.
### maxUnavailable

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb-alt
  namespace: pdb-demo
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app
```

At most 1 pod can be unavailable at any time. With 4 replicas, 1 can be evicted. With 100 replicas, still only 1 can be evicted.
### Choosing Between Them

You can set `minAvailable` or `maxUnavailable`, but not both. They express the same constraint from different angles.
minAvailable is better when:
- You have a quorum requirement (etcd needs 2 of 3 nodes)
- You have a fixed minimum for functionality (at least 1 worker must be running)
maxUnavailable is better when:
- You want to control blast radius (drain one node at a time)
- The total replica count changes frequently
- You are using percentage-based PDBs
### Percentage-Based PDBs

Both fields accept percentages:
```yaml
spec:
  maxUnavailable: "25%"
```

With 8 replicas, this allows 2 unavailable (25% of 8 = 2). With 4 replicas, it allows 1 (25% of 4 = 1).
Percentages are better for autoscaled workloads where the replica count changes. A fixed maxUnavailable: 1 on a 100-pod deployment means draining a node takes a very long time because pods are evicted one at a time. maxUnavailable: "25%" allows 25 simultaneous evictions.
## The ALLOWED DISRUPTIONS Calculation

The ALLOWED DISRUPTIONS field shown by `kubectl get pdb` is computed as follows.

For `minAvailable`:

```
allowed = currentHealthy - minAvailable
```

For `maxUnavailable`:

```
allowed = maxUnavailable - (expectedPods - currentHealthy)
```

where `currentHealthy` is the number of ready pods and `expectedPods` is the desired replica count.
The demo shows this with 4 replicas and `minAvailable: 2`:

```
allowed = 4 (healthy) - 2 (minAvailable) = 2
```

Scale to 2 replicas:

```
allowed = 2 (healthy) - 2 (minAvailable) = 0
```

When ALLOWED DISRUPTIONS is 0, no voluntary evictions are possible. A `kubectl drain` on the node will hang until pods are rescheduled or the PDB is deleted.
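The arithmetic can be checked with a short script. This is a sketch mirroring the controller's math, not the actual Kubernetes code:

```python
def allowed_disruptions(current_healthy, expected_pods,
                        min_available=None, max_unavailable=None):
    """Sketch of the PDB controller's ALLOWED DISRUPTIONS calculation."""
    if min_available is not None:
        allowed = current_healthy - min_available
    else:
        allowed = max_unavailable - (expected_pods - current_healthy)
    return max(allowed, 0)  # a negative budget is clamped to 0

# 4 healthy replicas, minAvailable: 2 -> 2 evictions allowed
print(allowed_disruptions(4, 4, min_available=2))   # 2
# Scaled down to 2 replicas -> drain would hang
print(allowed_disruptions(2, 2, min_available=2))   # 0
# maxUnavailable: 1 but one pod already unhealthy -> 0
print(allowed_disruptions(3, 4, max_unavailable=1)) # 0
```

The clamping in the last line matters: a PDB demanding more healthy pods than exist never reports a negative value, just a permanent 0.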
## PDB Interaction with Rolling Updates

Rolling updates are voluntary disruptions, but they do not use the eviction API. The Deployment controller creates and deletes pods directly. So how do PDBs interact with rolling updates?
In practice, they mostly do not. The Deployment controller deletes old pods directly (plain DELETE requests), so PDBs do not gate a rollout. Availability during the rollout is governed by the rolling update strategy instead:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: pdb-demo
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
```

The two mechanisms are independent but interact:

- The Deployment's `maxUnavailable` limits how many pods the rollout takes down at once.
- The PDB limits how many pods the eviction API (drain, autoscaler) can remove at once.
- Pods made unavailable by a rollout still count against the PDB, so a drain that overlaps a rollout may be blocked until the rollout finishes.

To keep the two consistent, set the rollout's `maxUnavailable` no looser than the PDB allows. With `minAvailable: 3` on a 4-replica Deployment, a rollout `maxUnavailable` of 1 (4 - 3 = 1) keeps availability within the budget.
### Eviction Deadlocks

A common problem: if `minAvailable` equals the replica count, ALLOWED DISRUPTIONS is permanently 0 and no voluntary eviction is ever allowed.

```yaml
# This PDB blocks every drain and scale-down
spec:
  minAvailable: 4  # Same as replica count
```

Node drains hang forever, and the cluster autoscaler can never remove a node running these pods. (Rolling updates still proceed, because the Deployment controller does not go through the eviction API.)

Rule of thumb: `minAvailable` should always be less than the replica count, or use `maxUnavailable` instead.
## Unhealthy Pod Eviction Policy

Kubernetes v1.27 promoted the `unhealthyPodEvictionPolicy` field to beta (it became stable in v1.31). It controls whether pods that are Running but not ready can be evicted when the budget would otherwise block them.
### AlwaysAllow

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: web-app
```

Unhealthy pods can always be evicted, even when ALLOWED DISRUPTIONS is 0. This prevents the scenario where broken pods block a node drain indefinitely.
### IfHealthyBudget (the default)

```yaml
spec:
  unhealthyPodEvictionPolicy: IfHealthyBudget
```

Unhealthy pods can be evicted only while the application is not disrupted (`currentHealthy` is at least `desiredHealthy`). This is the default and matches the behavior before the field existed.
### Why This Matters

Consider: 4 replicas, `minAvailable: 3`, but 2 pods are in CrashLoopBackOff. Only 2 pods are healthy, which is below the required 3, so the application counts as disrupted. With `IfHealthyBudget`, even the crashing pods cannot be evicted, and a node drain cannot proceed, even though those pods are not serving traffic anyway.

With `AlwaysAllow`, the unhealthy pods can be evicted regardless. The drain proceeds. This is almost always the behavior you want.
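The policy difference can be modeled in a few lines. A sketch (a hypothetical helper, not a Kubernetes API) of when a Running-but-not-ready pod is eligible for API-initiated eviction:

```python
def unhealthy_pod_evictable(current_healthy, desired_healthy,
                            policy="IfHealthyBudget"):
    """Sketch: eviction eligibility for a Running-but-not-ready pod."""
    if policy == "AlwaysAllow":
        return True  # unhealthy pods are always eviction candidates
    # IfHealthyBudget: only while the application is not disrupted
    return current_healthy >= desired_healthy

# 4 replicas, minAvailable: 3, only 2 healthy:
print(unhealthy_pod_evictable(2, 3, "IfHealthyBudget"))  # False: drain blocked
print(unhealthy_pod_evictable(2, 3, "AlwaysAllow"))      # True: drain proceeds
```

Evicting a healthy pod is a separate question: that always consumes the budget, regardless of policy.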
## PDB Strategies for StatefulSets

StatefulSets have unique PDB considerations because pods have stable identities and ordered startup/shutdown.
### Single-Instance StatefulSets

For a StatefulSet with `replicas: 1` (common for databases), any PDB with `minAvailable: 1` blocks all voluntary disruptions. The single pod cannot be evicted without violating the budget.

Options:

- Use `maxUnavailable: 1` to allow the single pod to be evicted
- Accept that `minAvailable: 1` blocks drain (you manage the pod manually)
- Use `minAvailable: 0` to always allow disruption (but this makes the PDB useless)
### Quorum-Based Systems

For systems like etcd or ZooKeeper with quorum requirements:

```yaml
# etcd: 3 nodes, quorum needs 2
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  maxUnavailable: 1  # Only 1 node can be down
  selector:
    matchLabels:
      app: etcd
```

This ensures quorum is maintained during drain operations. Two nodes cannot be evicted simultaneously.
### Ordered Shutdown

StatefulSets with `podManagementPolicy: OrderedReady` scale down in reverse ordinal order (pod N-1 before pod N-2). PDBs add a constraint on top of this: the PDB controls how many pods can be down; the StatefulSet controls the order in which they come down during scale-down.
## Drain Behavior

`kubectl drain` cordons the node (marks it unschedulable) and then evicts all pods:

```shell
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
```

The drain process:
- Cordons the node (no new pods can be scheduled).
- Lists all evictable pods on the node.
- For each pod, sends an eviction request.
- If a PDB blocks the eviction, retries with exponential backoff.
- Waits until all pods are evicted or the timeout is reached.
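Before draining, it can help to find budgets that will block evictions. A sketch (assumes your `kubectl` version supports jsonpath filter expressions):

```shell
# List PDBs that currently allow zero disruptions
kubectl get pdb --all-namespaces \
  -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
```

Any PDB listed here will stall a drain of a node running its pods.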
### Drain Stuck on PDB

If a PDB blocks eviction, drain logs:

```
evicting pod "web-app-abc"
error when evicting pods/"web-app-abc": Cannot evict pod as it would violate the pod's disruption budget.
```

The drain retries until:

- The PDB allows the eviction (another replica comes up elsewhere)
- The `--timeout` flag expires (default: no timeout, waits forever)
- You press Ctrl+C
### Force Drain

If drain is stuck, you can use `--force` to delete pods that are not managed by a controller. But `--force` does not bypass PDBs. To truly bypass PDBs, you must delete the PDB or use `kubectl delete pod` directly.

In emergencies:

```shell
# Delete the PDB temporarily
kubectl delete pdb web-app-pdb -n pdb-demo

# Drain the node
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# Recreate the PDB after drain
kubectl apply -f manifests/pdb-min-available.yaml
```

## Cluster Autoscaler Interaction
The cluster autoscaler scales down nodes that are underutilized. Before removing a node, it checks:
- Can all pods on this node be rescheduled elsewhere?
- Would evicting these pods violate any PDB?
If a PDB blocks the eviction, the autoscaler skips that node. It moves on to other candidates.
This means PDBs can prevent cluster scale-down. A single PDB with ALLOWED DISRUPTIONS: 0 on any pod on a node prevents that node from being removed, even if the node is 95% idle.
### Autoscaler Annotations

The cluster autoscaler respects the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation:

```yaml
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```

This marks the pod as safe to evict during scale-down, overriding checks that would normally keep the node (for example, pods without a controller or pods using local storage). It does not bypass PDBs.
## PDB Anti-Patterns

### 1. minAvailable = replicas

```yaml
spec:
  minAvailable: 4  # With 4 replicas
```

No voluntary disruptions are ever allowed. Blocks drain and blocks autoscaler scale-down.
### 2. PDB Without Enough Replicas

```yaml
spec:
  minAvailable: 3  # With 2 replicas
```

ALLOWED DISRUPTIONS is negative (clamped to 0). The PDB permanently blocks evictions. This is usually a misconfiguration left over after scaling down.
### 3. PDB Selector Matches No Pods

If the label selector does not match any pods, the PDB is inert. There are no warnings. Check `kubectl get pdb` and verify the counts make sense.
### 4. Multiple PDBs Selecting the Same Pods

If two PDBs select the same pod, the eviction API cannot decide which budget applies and returns a 500 error for that pod, blocking eviction entirely. Ensure each pod is matched by at most one PDB.
### 5. Forgetting DaemonSet Pods

DaemonSet pods run on every node. During drain, they are handled separately (`--ignore-daemonsets`). A PDB on DaemonSet pods can block drain, but DaemonSet pods cannot be rescheduled to another node (they are per-node by definition). Be careful with DaemonSet PDBs.
## PDB Status

Check PDB status to understand the current state:

```shell
kubectl get pdb -n pdb-demo -o yaml
```

The status section shows:

```yaml
status:
  currentHealthy: 4
  desiredHealthy: 2
  disruptionsAllowed: 2
  expectedPods: 4
  observedGeneration: 1
```

- `currentHealthy`: pods matching the selector that are ready
- `desiredHealthy`: minimum healthy pods required (from `minAvailable`, or calculated from `maxUnavailable`)
- `disruptionsAllowed`: how many pods can be evicted right now
- `expectedPods`: total pods matching the selector
## PDB Best Practices

### For Stateless Web Services

```yaml
spec:
  maxUnavailable: "25%"
```

Percentage-based, scales with the replica count. Allows reasonable parallelism during drain.
### For Databases and Stateful Services

```yaml
spec:
  maxUnavailable: 1
```

One at a time. Preserves quorum and data replication.
### For Critical Infrastructure

```yaml
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow
```

A fixed minimum, with unhealthy pods always evictable so broken pods never block operations.
### For Development/Staging

Consider not using PDBs at all. They add operational complexity and can block CI/CD pipelines.