
Horizontal Pod Autoscaler (HPA)

Automatically scale pods up and down based on CPU utilization.

Time: ~15 minutes · Difficulty: Intermediate

What you'll learn:

  • HPA: scaling replicas based on metrics
  • CPU utilization targets and scaling thresholds
  • Scale-up and scale-down behavior
  • Stabilization windows to prevent flapping
  • Why resource requests are required for HPA to work

The metrics server must be enabled:

```sh
minikube addons enable metrics-server
```

Wait a minute for metrics to start flowing:

```sh
kubectl top nodes
```

Navigate to the demo directory:

```sh
cd demos/hpa
```

Apply the manifests:

```sh
kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/app.yaml
kubectl apply -f manifests/hpa.yaml
```

Verify the HPA can read metrics (TARGETS may show `<unknown>` for a minute):

```sh
kubectl get hpa -n hpa-demo -w
```

Wait until TARGETS shows an actual percentage (e.g., `0%/50%`).

Now generate load:

```sh
kubectl apply -f manifests/load-generator.yaml
```
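The load generator itself is just a pod that requests the app in a tight loop. A minimal sketch of what manifests/load-generator.yaml might contain — the image and the Service name `cpu-burner` are assumptions, so check the actual file:

```yaml
# Hypothetical sketch of manifests/load-generator.yaml.
# The busybox image and the Service name are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
  namespace: hpa-demo
spec:
  restartPolicy: Never
  containers:
    - name: load-generator
      image: busybox:1.36
      # Hammer the app's Service with requests in a tight loop,
      # pushing CPU usage above the HPA's 50% target.
      command: ["/bin/sh", "-c", "while true; do wget -q -O- http://cpu-burner; done"]
```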

Now watch the HPA react in a separate terminal:

```sh
kubectl get hpa -n hpa-demo -w
```

Within 1-2 minutes, you should see:

  1. CPU utilization climbing above 50%
  2. REPLICAS increasing from 1 to 2, then 3, then more
  3. CPU utilization stabilizing as load is spread across pods

Watch pods scale up:

```sh
kubectl get pods -n hpa-demo -w
```

When you're done, stop the load generator:

```sh
kubectl delete pod load-generator -n hpa-demo
```

Watch the HPA scale back down (takes about 60 seconds due to the stabilization window):

```sh
kubectl get hpa -n hpa-demo -w
```

The demo's files:

```
manifests/
  namespace.yaml       # hpa-demo namespace
  app.yaml             # Deployment + Service (CPU-intensive app)
  hpa.yaml             # HPA targeting 50% CPU utilization
  load-generator.yaml  # Pod that hammers the app with requests
```
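For reference, a hedged sketch of what manifests/hpa.yaml likely contains, based on the description above — the target Deployment name and the min/max replica counts are assumptions:

```yaml
# Hypothetical sketch of manifests/hpa.yaml (autoscaling/v2).
# scaleTargetRef name and minReplicas/maxReplicas are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-burner
  namespace: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-burner
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # scale when average CPU exceeds 50% of requests
```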

How the HPA decides to scale:

  1. Metrics server collects CPU usage from each pod every 15 seconds
  2. HPA checks metrics every 15 seconds (default)
  3. It calculates: `desiredReplicas = ceil(currentReplicas * (currentUtilization / targetUtilization))`
  4. If CPU is at 100% with target 50% and 1 replica: `ceil(1 * (100/50)) = 2` replicas
  5. Scale-down waits for the stabilization window (60s) to avoid flapping
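In manifest form, the scale-down stabilization window from step 5 lives under the HPA's `behavior` field. A minimal sketch, assuming the demo's hpa.yaml sets the 60-second window explicitly:

```yaml
# HPA spec fragment: the stabilization window delays scale-down,
# so brief dips in load don't cause replica flapping.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
```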

Why resource requests matter: The HPA calculates utilization as a percentage of the pod’s CPU requests. Without requests, the HPA cannot compute utilization and will not scale.
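For example, the app's Deployment must set a CPU request on its container. A sketch of the relevant fragment — the name, image, and values here are illustrative, not the demo's actual app.yaml:

```yaml
# Container spec fragment: the HPA reads CPU usage relative to requests.cpu.
# With requests.cpu: 200m, a pod burning 100m of CPU reports 50% utilization.
containers:
  - name: app
    image: example/cpu-burner:latest   # hypothetical image
    resources:
      requests:
        cpu: 200m    # required for CPU-utilization HPAs
      limits:
        cpu: 500m
```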

Things to try:

  1. Change the target to 30% and watch more aggressive scaling:

     ```sh
     kubectl patch hpa cpu-burner -n hpa-demo \
       --type=merge -p '{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":30}}}]}}'
     ```

  2. Check HPA events to see scaling decisions:

     ```sh
     kubectl describe hpa cpu-burner -n hpa-demo
     ```

  3. Set a longer stabilization window to see slower scale-down:

     ```sh
     kubectl patch hpa cpu-burner -n hpa-demo \
       --type=merge -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":300}}}}'
     ```
Clean up when you're done:

```sh
kubectl delete namespace hpa-demo
```

See docs/deep-dive.md for a detailed explanation of the scaling algorithm, custom metrics (memory, requests-per-second), scale-up/scale-down policies, VPA vs HPA, and production tuning.

Move on to RBAC to learn about ServiceAccounts, Roles, and access control.