Vertical Pod Autoscaler

Automatically right-size resource requests and limits for pods based on actual usage.

Time: ~15 minutes
Difficulty: Intermediate

What You Will Learn

VPA: automatically adjusting resource requests and limits
Recommendation mode vs Auto mode
VPA components: Recommender, Updater, Admission Controller
How VPA analyzes historical resource usage
When to use VPA vs HPA
Resource policy constraints (min/max allowed)

Prerequisites

VPA must be installed in your cluster. VPA is not enabled by default in minikube.

Install VPA from the official repository:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Verify VPA components are running:

kubectl get pods -n kube-system | grep vpa

You should see three pods: vpa-admission-controller, vpa-recommender, and vpa-updater.
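
If you prefer a scripted check, here is a minimal sketch of a helper that reads `kubectl get pods -n kube-system` output on stdin and reports whether each of the three VPA components has a pod in the `Running` state. The function name `check_vpa_pods` is hypothetical. It assumes the default column layout (NAME, READY, STATUS, ...) and the pod-name prefixes created by `vpa-up.sh`.

```shell
# check_vpa_pods (hypothetical helper): reads `kubectl get pods` output on stdin
# and exits 0 only if every VPA component has at least one Running pod.
check_vpa_pods() {
  awk '
    # Column 3 is STATUS in the default `kubectl get pods` layout.
    $3 == "Running" {
      if ($1 ~ /^vpa-admission-controller-/) ac = 1
      if ($1 ~ /^vpa-recommender-/)          rec = 1
      if ($1 ~ /^vpa-updater-/)              upd = 1
    }
    END {
      missing = 0
      if (!ac)  { print "missing: vpa-admission-controller"; missing = 1 }
      if (!rec) { print "missing: vpa-recommender"; missing = 1 }
      if (!upd) { print "missing: vpa-updater"; missing = 1 }
      if (!missing) print "all VPA components running"
      exit missing
    }
  '
}
```

Usage: `kubectl get pods -n kube-system | check_vpa_pods`. A non-zero exit status means at least one component is missing or not yet running.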

The VPA Recommender reads current pod usage from the Metrics API, so also enable the metrics server:

minikube addons enable metrics-server
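
The metrics server usually needs a minute or so before `kubectl top` returns data. The following is a hedged polling sketch. The function name `wait_for_metrics`, the default of 30 attempts and the 10-second interval are arbitrary choices, not part of the demo.

```shell
# wait_for_metrics (hypothetical helper): polls until `kubectl top` can read
# pod metrics, trying up to $1 times (default 30) with 10 seconds between tries.
wait_for_metrics() {
  _tries=0
  until kubectl top pods -n kube-system >/dev/null 2>&1; do
    _tries=$((_tries + 1))
    if [ "$_tries" -ge "${1:-30}" ]; then
      echo "metrics API still not ready after $_tries attempts" >&2
      return 1
    fi
    sleep 10
  done
  return 0
}
```

Usage: `wait_for_metrics && echo "metrics ready"`. The namespace `kube-system` is used because the demo namespace does not exist yet at this point.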

Deploy

Navigate to the demo directory:

cd demos/vertical-pod-autoscaler

Create the namespace and deploy a workload with intentionally low resource requests:

kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/deployment.yaml

Check the current resource requests:

kubectl get deployment resource-consumer -n vpa-demo -o jsonpath='{.spec.template.spec.containers[0].resources}'

You will see requests of 50m CPU and 64Mi memory, but the stress container actually uses much more.

Apply the VPA in recommendation-only mode (updateMode "Off"); it will not modify pods:

kubectl apply -f manifests/vpa.yaml
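
This guide does not show the contents of manifests/vpa.yaml. As a hedged sketch, a minimal VPA for this Deployment can be generated with a small shell function. The function name `vpa_manifest` is hypothetical, and the manifest omits any resource policy the real file may contain. The object name, namespace and target Deployment come from the commands in this guide. `Off` is quoted because YAML 1.1 parsers can read an unquoted `Off` as the boolean `false`.

```shell
# vpa_manifest (hypothetical helper): prints a VerticalPodAutoscaler manifest
# for the resource-consumer Deployment in the vpa-demo namespace.
#   $1 = VPA object name
#   $2 = update mode: Off, Initial, Recreate or Auto
vpa_manifest() {
  case "$2" in
    Off|Initial|Recreate|Auto) ;;
    *) echo "unknown update mode: $2" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: $1
  namespace: vpa-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  updatePolicy:
    # Quoted so YAML 1.1 parsers do not turn Off into a boolean.
    updateMode: "$2"
EOF
}
```

To validate the output without creating anything, pipe it to a client-side dry run: `vpa_manifest resource-consumer-vpa Off | kubectl apply --dry-run=client -f -`.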

Verify

Wait 2-3 minutes for the VPA Recommender to analyze usage patterns.

Check the VPA recommendations:

kubectl describe vpa resource-consumer-vpa -n vpa-demo

Look for the Recommendation section. You should see four types of recommendations:

Lower Bound: minimum resources needed for the app to function
Target: recommended request values based on actual usage
Uncapped Target: what VPA would recommend without policy constraints
Upper Bound: maximum resources the VPA considers reasonable
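
To read these values without scanning the `describe` output, you can query the VPA status directly. This sketch assumes the `autoscaling.k8s.io/v1` status layout (`status.recommendation.containerRecommendations`, with `lowerBound`, `target`, `uncappedTarget` and `upperBound` per container). The helper name `vpa_recommendations` is hypothetical.

```shell
# vpa_recommendations (hypothetical helper): prints one tab-separated line per
# container with the lower bound, target and upper bound as cpu/memory pairs.
#   $1 = VPA name, $2 = namespace
vpa_recommendations() {
  kubectl get vpa "$1" -n "$2" -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}{"\tlower="}{.lowerBound.cpu}{"/"}{.lowerBound.memory}{"\ttarget="}{.target.cpu}{"/"}{.target.memory}{"\tupper="}{.upperBound.cpu}{"/"}{.upperBound.memory}{"\n"}{end}'
}
```

Usage: `vpa_recommendations resource-consumer-vpa vpa-demo`. If the Recommender has not produced a recommendation yet, the output is empty.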

Compare the recommendations to actual pod usage:

kubectl top pods -n vpa-demo

The VPA recommendations should be higher than the current requests (50m CPU, 64Mi memory) because the stress workload is actually using more.

What is Happening

manifests/
  namespace.yaml        # vpa-demo namespace
  deployment.yaml       # Stress workload with low resource requests
  vpa.yaml              # VPA in "Off" mode (recommendation only)
  vpa-auto.yaml         # VPA in "Auto" mode (for experiments)

VPA has three components:

Recommender: watches pod metrics and calculates resource recommendations based on historical usage
Updater: evicts pods whose requests need to be updated with new recommendations (only in Recreate and Auto modes)
Admission Controller: mutates new pod specs with recommended resources when they are created

Update modes:

Off: only provides recommendations, does not change pods
Initial: sets resources when pods are first created, but does not update running pods
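Recreate: evicts running pods so they are recreated with the new recommendations
Auto: applies recommendations when pods are created and to running pods; at the time of writing it does this by evicting and recreating them, the same as Recreate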
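
The mode of an existing VPA can also be changed in place instead of deleting and re-creating the object. The following is a hedged sketch using a JSON merge patch. `set_vpa_mode` is a hypothetical name, and it only accepts the four modes listed above.

```shell
# set_vpa_mode (hypothetical helper): switches an existing VPA's update mode in
# place with a JSON merge patch.  $1 = VPA name, $2 = namespace, $3 = mode
set_vpa_mode() {
  case "$3" in
    Off|Initial|Recreate|Auto) ;;
    *) echo "unknown update mode: $3" >&2; return 1 ;;
  esac
  kubectl patch vpa "$1" -n "$2" --type merge \
    -p "{\"spec\":{\"updatePolicy\":{\"updateMode\":\"$3\"}}}"
}
```

For example, `set_vpa_mode resource-consumer-vpa vpa-demo Initial` applies recommendations only to newly created pods. Switching to Recreate or Auto lets the Updater start evicting pods.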

How VPA calculates recommendations:

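VPA builds decaying histograms of past resource usage (by default covering roughly 8 days, with recent samples weighted more heavily) and computes recommendations from percentiles of those histograms. With default settings, the Target is based on the 90th percentile of usage plus a 15% safety margin, and the Lower and Upper Bounds are derived from the 50th and 95th percentiles. This gives pods enough resources most of the time without over-provisioning for rare spikes.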
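
The arithmetic can be illustrated with a simplified, CPU-only sketch. It is not the Recommender's code. VPA keeps exponentially bucketed histograms, while this version computes a weighted percentile over raw samples. It assumes what we understand to be the Recommender's CPU defaults: a 90th-percentile target, a 15% safety margin and a 24-hour decay half-life. The input format (one `age_in_hours millicores` pair per line) and the name `vpa_cpu_target_sketch` are hypothetical.

```shell
# vpa_cpu_target_sketch (hypothetical, simplified model of the Recommender):
# reads "age_in_hours millicores" pairs on stdin and prints a CPU target equal
# to the decay-weighted 90th percentile of usage plus a 15% margin, rounded up.
vpa_cpu_target_sketch() {
  sort -k2,2n | awk '
    {
      val[NR] = $2
      w[NR] = 2 ^ (-$1 / 24)   # weight halves every 24 hours of sample age
      total += w[NR]
    }
    END {
      if (NR == 0) exit 1
      threshold = 0.9 * total
      for (i = 1; i <= NR; i++) {
        cum += w[i]
        if (cum >= threshold) {
          # +15% safety margin in integer math, rounded up to whole millicores
          printf "%dm\n", int((val[i] * 115 + 99) / 100)
          exit 0
        }
      }
    }
  '
}
```

Decay changes the result noticeably. Take eight fresh 200m samples and two 1000m spikes. If the spikes are also fresh, the target is 1150m. If the same spikes were recorded a week earlier, each carries 1/128 of the weight and the target drops to 230m.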

Experiment

Switch to Auto mode and watch VPA update the pods:
```
kubectl delete vpa resource-consumer-vpa -n vpa-demo
kubectl apply -f manifests/vpa-auto.yaml
```
Watch the pods being recreated:
```
kubectl get pods -n vpa-demo -w
```
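After a few minutes, the Updater will evict the pods one by one, and the replacement pods are created with updated resource requests set by the Admission Controller. By default the Updater only evicts pods of workloads running at least two replicas (its --min-replicas flag), so if nothing happens, check the Deployment's replica count.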
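Check the new resource requests. VPA sets them on the pods at admission time and leaves the Deployment template unchanged, so inspect the recreated pods rather than the Deployment:
```
kubectl get pods -n vpa-demo -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.requests}{"\n"}{end}'
```
The pod requests should now match the VPA Target, while the Deployment template still shows 50m CPU and 64Mi memory.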
Compare VPA behavior to HPA:
- HPA: scales the number of replicas (horizontal scaling)
- VPA: adjusts resource requests per pod (vertical scaling)
- Do not use both on the same metric: VPA and HPA on CPU/memory will conflict. You can use HPA on custom metrics (like requests-per-second) while VPA manages CPU/memory.
Test the resource policy constraints by checking if recommendations respect minAllowed and maxAllowed:
```
kubectl describe vpa resource-consumer-vpa-auto -n vpa-demo | grep -A 10 "Container Recommendations"
```
The Target, Lower Bound and Upper Bound should fall within the policy limits (50m-1000m CPU, 64Mi-512Mi memory); only the Uncapped Target may fall outside them.
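
To check a recommendation programmatically, convert the Kubernetes quantities to plain numbers and compare them with the bounds. The following is a hedged sketch. All helper names are hypothetical. The parsers only understand `m` and bare cores for CPU, and plain bytes plus the `k`/`M`/`G` and `Ki`/`Mi`/`Gi` suffixes for memory. Anything else is rejected.

```shell
# Hypothetical helpers for checking a VPA recommendation against the demo's
# resource policy (50m-1000m CPU, 64Mi-512Mi memory).

# to_millicores: "250m" -> 250, "1" -> 1000, "0.5" -> 500
to_millicores() {
  printf '%s\n' "$1" | awk '
    /^[0-9.]+m$/ { sub(/m$/, ""); printf "%d\n", $0 + 0; exit }
    /^[0-9.]+$/  { printf "%d\n", $0 * 1000 + 0.5; exit }
    { exit 1 }
  '
}

# to_bytes: "64Mi" -> 67108864, "262144k" -> 262144000
to_bytes() {
  printf '%s\n' "$1" | awk '
    match($0, /^[0-9.]+/) {
      n = substr($0, 1, RLENGTH); unit = substr($0, RLENGTH + 1)
      if (unit == "") f = 1
      else if (unit == "k") f = 1000
      else if (unit == "M") f = 1000 * 1000
      else if (unit == "G") f = 1000 * 1000 * 1000
      else if (unit == "Ki") f = 1024
      else if (unit == "Mi") f = 1024 * 1024
      else if (unit == "Gi") f = 1024 * 1024 * 1024
      else exit 1
      printf "%.0f\n", n * f
      exit
    }
    { exit 1 }
  '
}

# in_range VALUE MIN MAX: succeeds if MIN <= VALUE <= MAX (integers)
in_range() { [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]; }

# check_policy CPU MEMORY: succeeds if both values are inside the policy limits;
# returns 2 if either quantity cannot be parsed.
check_policy() {
  _cpu=$(to_millicores "$1") || return 2
  _mem=$(to_bytes "$2") || return 2
  in_range "$_cpu" 50 1000 &&
    in_range "$_mem" "$(to_bytes 64Mi)" "$(to_bytes 512Mi)"
}
```

For example, `check_policy 600m 300Mi && echo "within policy"` prints `within policy`. Pass it the Target values reported by the VPA to check a real recommendation.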

Cleanup

kubectl delete namespace vpa-demo

Next Step

Move on to Advanced Ingress & Routing to explore Gateway API and Traefik for next-generation traffic management.