Horizontal Pod Autoscaler (HPA)
Automatically scale pods up and down based on CPU utilization.
Time: ~15 minutes · Difficulty: Intermediate
What You Will Learn
- HPA: scaling replicas based on metrics
- CPU utilization targets and scaling thresholds
- Scale-up and scale-down behavior
- Stabilization windows to prevent flapping
- Why resource requests are required for HPA to work
Prerequisites
Metrics server must be enabled:
```shell
minikube addons enable metrics-server
```

Wait a minute for metrics to start flowing:
```shell
kubectl top nodes
```

Deploy
Navigate to the demo directory and apply the manifests:

```shell
cd demos/hpa
kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/app.yaml
kubectl apply -f manifests/hpa.yaml
```

Verify the HPA can read metrics (may show `<unknown>` for a minute):

```shell
kubectl get hpa -n hpa-demo -w
```

Wait until TARGETS shows an actual percentage (e.g., 0%/50%).
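For orientation, here is a minimal sketch of what an HPA targeting 50% CPU looks like. The HPA name `cpu-burner` matches the patch commands used later in this demo; the Deployment name and the min/max replica counts are assumptions, so the shipped `manifests/hpa.yaml` may differ:

```yaml
# Sketch only — the demo's manifests/hpa.yaml may differ in details.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-burner
  namespace: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-burner        # assumed Deployment name
  minReplicas: 1
  maxReplicas: 10           # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # scale when average CPU exceeds 50% of requests
```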
Generate Load
```shell
kubectl apply -f manifests/load-generator.yaml
```

Now watch the HPA react in a separate terminal:
```shell
kubectl get hpa -n hpa-demo -w
```

Within 1-2 minutes, you should see:
- CPU utilization climbing above 50%
- REPLICAS increasing from 1 to 2, then 3, then more
- CPU utilization stabilizing as load is spread across pods
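A load generator for this kind of demo is typically just a bare pod looping requests at the app's Service. This is a hypothetical sketch, not the shipped manifest — the pod name `load-generator` matches the delete command used later, but the Service name `cpu-burner` and the busybox image are assumptions:

```yaml
# Hypothetical sketch of manifests/load-generator.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
  namespace: hpa-demo
spec:
  restartPolicy: Never
  containers:
    - name: load
      image: busybox:1.36
      # Hammer the Service in a tight loop to drive CPU usage up.
      command: ["/bin/sh", "-c"]
      args:
        - "while true; do wget -q -O- http://cpu-burner.hpa-demo.svc; done"
```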
Watch pods scale up:
```shell
kubectl get pods -n hpa-demo -w
```

Stop Load and Watch Scale-Down
```shell
kubectl delete pod load-generator -n hpa-demo
```

Watch the HPA scale back down (takes about 60 seconds due to the stabilization window):

```shell
kubectl get hpa -n hpa-demo -w
```

What is Happening
```
manifests/
  namespace.yaml       # hpa-demo namespace
  app.yaml             # Deployment + Service (CPU-intensive app)
  hpa.yaml             # HPA targeting 50% CPU utilization
  load-generator.yaml  # Pod that hammers the app with requests
```

How the HPA decides to scale:
- Metrics server collects CPU usage from each pod every 15 seconds
- HPA checks metrics every 15 seconds (default)
- It calculates:

  `desiredReplicas = ceil(currentReplicas * (currentUtilization / targetUtilization))`

- If CPU is at 100% with target 50% and 1 replica: `ceil(1 * (100/50)) = 2` replicas
- Scale-down waits for the stabilization window (60s) to avoid flapping
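The worked example above can be reproduced with plain shell arithmetic (the numbers are the hypothetical ones from the example; ceiling division is done with the usual integer trick):

```shell
# HPA formula for the example: 1 replica at 100% average CPU, target 50%.
replicas=1
current=100   # current average utilization (%)
target=50     # target utilization (%)
# Integer ceiling division: ceil(a/b) == (a + b - 1) / b
desired=$(( (replicas * current + target - 1) / target ))
echo "desired replicas: $desired"   # → desired replicas: 2
```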
Why resource requests matter: The HPA calculates utilization as a percentage of the pod’s CPU requests. Without requests, the HPA cannot compute utilization and will not scale.
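For reference, this is the part of a container spec the HPA depends on — a sketch with assumed values, since the demo's `app.yaml` may use different numbers:

```yaml
# Sketch of the resources section the HPA measures against (values are assumptions).
# With requests.cpu: 100m, a pod burning 100m of CPU reads as 100% utilization.
containers:
  - name: app
    image: cpu-burner:latest   # hypothetical image name
    resources:
      requests:
        cpu: 100m       # utilization % is computed against this value
        memory: 64Mi
      limits:
        cpu: 500m
```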
Experiment
- Change the target to 30% and watch more aggressive scaling:

  ```shell
  kubectl patch hpa cpu-burner -n hpa-demo \
    --type=merge -p '{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":30}}}]}}'
  ```

- Check HPA events to see scaling decisions:

  ```shell
  kubectl describe hpa cpu-burner -n hpa-demo
  ```

- Set a longer stabilization window to see slower scale-down:

  ```shell
  kubectl patch hpa cpu-burner -n hpa-demo \
    --type=merge -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":300}}}}'
  ```
Cleanup
```shell
kubectl delete namespace hpa-demo
```

Further Reading
See docs/deep-dive.md for a detailed explanation of the scaling algorithm, custom metrics (memory, requests-per-second), scale-up/scale-down policies, VPA vs HPA, and production tuning.
Next Step
Move on to RBAC to learn about ServiceAccounts, Roles, and access control.