Vertical Pod Autoscaler
Automatically right-size resource requests and limits for pods based on actual usage.
Time: ~15 minutes
Difficulty: Intermediate
What You Will Learn
- VPA: automatically adjusting resource requests and limits
- Recommendation mode vs Auto mode
- VPA components: Recommender, Updater, Admission Controller
- How VPA analyzes historical resource usage
- When to use VPA vs HPA
- Resource policy constraints (min/max allowed)
Prerequisites
VPA must be installed in your cluster. VPA is not enabled by default in minikube.
Install VPA from the official repository:

    git clone https://github.com/kubernetes/autoscaler.git
    cd autoscaler/vertical-pod-autoscaler
    ./hack/vpa-up.sh

Verify VPA components are running:

    kubectl get pods -n kube-system | grep vpa

You should see three pods: vpa-admission-controller, vpa-recommender, and vpa-updater.
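The visual check above can also be scripted. The following is a minimal sketch: the function name `check_vpa_components` is made up for this example, and it only matches on the three name prefixes listed above, so it says nothing about whether the pods are actually healthy.

```shell
# Hypothetical helper: reads a `kubectl get pods` listing on stdin and reports
# any of the three VPA components whose name prefix does not appear in it.
check_vpa_components() {
  listing=$(cat)
  missing=0
  for component in vpa-admission-controller vpa-recommender vpa-updater; do
    if ! printf '%s\n' "$listing" | grep -q "^${component}"; then
      echo "missing: ${component}"
      missing=1
    fi
  done
  # Exit status 0 when all three were found, 1 otherwise.
  return "$missing"
}

# Usage against a live cluster (assumes VPA was installed into kube-system):
#   kubectl get pods -n kube-system | check_vpa_components && echo "VPA is up"
```

The function reads from stdin rather than calling kubectl itself, so it can be tried on a saved listing without a cluster.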
Also enable metrics server:

    minikube addons enable metrics-server

Deploy
Navigate to the demo directory:

    cd demos/vertical-pod-autoscaler

Create the namespace and deploy a workload with intentionally low resource requests:

    kubectl apply -f manifests/namespace.yaml
    kubectl apply -f manifests/deployment.yaml

Check the current resource requests:

    kubectl get deployment resource-consumer -n vpa-demo -o jsonpath='{.spec.template.spec.containers[0].resources}'

You will see requests of 50m CPU and 64Mi memory, but the stress container actually uses much more.
Apply the VPA in recommendation mode (it will not modify pods automatically):

    kubectl apply -f manifests/vpa.yaml

Verify
Wait 2-3 minutes for the VPA Recommender to analyze usage patterns.
Check the VPA recommendations:

    kubectl describe vpa resource-consumer-vpa -n vpa-demo

Look for the Recommendation section. You should see four types of recommendations:
- Lower Bound: the minimum VPA estimates the container needs; requests below this are considered too low
- Target: recommended request values based on actual usage
- Uncapped Target: what VPA would recommend without policy constraints
- Upper Bound: maximum resources the VPA considers reasonable
Compare the recommendations to actual pod usage:

    kubectl top pods -n vpa-demo

The VPA recommendations should be higher than the current requests (50m CPU, 64Mi memory) because the stress workload actually uses more.
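To make the comparison less manual, the sketch below filters a `kubectl top pods` listing for pods whose CPU usage exceeds a given request. The helper name `pods_over_cpu_request` is hypothetical, and the parsing assumes the CPU column holds millicore values such as `250m`; lines in any other form, including the header, are skipped.

```shell
# Hypothetical helper: reads `kubectl top pods` output on stdin and prints
# every pod whose CPU usage (in millicores) is above the given request.
# usage: pods_over_cpu_request <request-in-millicores>
pods_over_cpu_request() {
  request_m=$1
  awk -v req="$request_m" '
    $2 ~ /^[0-9]+m$/ {
      usage = $2
      sub(/m$/, "", usage)            # strip the "m" suffix
      if (usage + 0 > req + 0) print $1, usage "m > " req "m"
    }'
}

# Usage against the demo namespace (50 is the CPU request from the Deployment):
#   kubectl top pods -n vpa-demo | pods_over_cpu_request 50
```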
What is Happening

    manifests/
      namespace.yaml    # vpa-demo namespace
      deployment.yaml   # Stress workload with low resource requests
      vpa.yaml          # VPA in "Off" mode (recommendation only)
      vpa-auto.yaml     # VPA in "Auto" mode (for experiments)

VPA has three components:
- Recommender: watches pod metrics and calculates resource recommendations based on historical usage
- Updater: evicts pods whose requests need to change so they can be recreated with new values (active in Recreate and Auto modes)
- Admission Controller: mutates new pod specs with recommended resources when they are created
Update modes:
- Off: only provides recommendations, does not change pods
- Initial: sets resources when pods are first created, but does not update running pods
- Recreate: evicts running pods to apply new recommendations
- Auto: automatically applies recommendations; this currently behaves the same as Recreate, evicting and recreating pods
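The update mode is set on the VPA object itself. The demo's `manifests/vpa.yaml` is not reproduced in this guide, so the following is only a sketch of what a recommendation-only VPA for this workload could look like. The object names and policy limits are taken from the commands and values quoted elsewhere in this guide; the `"*"` container wildcard and the scratch file name `vpa-sketch.yaml` are assumptions.

```shell
# Write an illustrative VPA manifest to a scratch file.
# This is NOT the demo's real vpa.yaml, only a sketch of the same idea.
manifest="${TMPDIR:-/tmp}/vpa-sketch.yaml"

cat > "$manifest" <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: resource-consumer-vpa
  namespace: vpa-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  updatePolicy:
    updateMode: "Off"        # recommendation only; "Auto" lets VPA evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"   # assumption: same limits for every container
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
EOF

echo "wrote $manifest"
```

Switching between the modes listed above only requires changing the `updateMode` value; the rest of the object stays the same.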
How VPA calculates recommendations:
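VPA keeps a decaying histogram of past resource usage (by default the Recommender can load about 8 days of history) and derives each recommendation from a percentile of it. With default Recommender settings, Target is based on the 90th percentile, Lower Bound on the 50th, and Upper Bound on the 95th, each with a safety margin added. This gives pods enough resources most of the time while avoiding over-provisioning.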
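The percentile idea can be illustrated with a few lines of shell. This is a deliberately simplified sketch: it computes a nearest-rank percentile over raw samples, whereas the Recommender works on weighted, decaying histogram buckets. The function name `percentile` and the sample values are invented for the example.

```shell
# Simplified illustration (not VPA's real algorithm): nearest-rank percentile.
# usage: percentile <p> < samples   (integer p; one integer sample per line)
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p/100 * NR) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Ten made-up CPU samples in millicores, including one spike of 400m.
samples="100 120 110 400 130 125 115 105 135 140"

printf '%s\n' $samples | percentile 50   # prints 120
printf '%s\n' $samples | percentile 90   # prints 140
printf '%s\n' $samples | percentile 95   # prints 400
```

The single 400m spike does not move the 90th percentile but dominates the 95th, which shows why the choice of percentile matters for how much headroom a recommendation includes.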
Experiment
- Switch to Auto mode and watch VPA update the pods:

      kubectl delete vpa resource-consumer-vpa -n vpa-demo
      kubectl apply -f manifests/vpa-auto.yaml

  Watch the pods being recreated:

      kubectl get pods -n vpa-demo -w

  After a few minutes, the VPA will evict pods one by one and recreate them with updated resource requests.
- Check the new resource requests after VPA updates. The Admission Controller mutates pods as they are created and leaves the Deployment spec unchanged, so inspect the pods rather than the Deployment:

      kubectl get pods -n vpa-demo -o jsonpath='{.items[*].spec.containers[0].resources}' | jq .

  You should see that the requests have increased to match the recommendations.
- Compare VPA behavior to HPA:
  - HPA: scales the number of replicas (horizontal scaling)
  - VPA: adjusts resource requests per pod (vertical scaling)
  - Do not use both on the same metric: VPA and HPA on CPU/memory will conflict. You can use HPA on custom metrics (like requests-per-second) while VPA manages CPU/memory.
- Test the resource policy constraints by checking whether recommendations respect minAllowed and maxAllowed:

      kubectl describe vpa resource-consumer-vpa-auto -n vpa-demo | grep -A 10 "Container Recommendations"

  Recommendations should fall within the policy limits (50m-1000m CPU, 64Mi-512Mi memory).
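Conceptually, the policy limits act as a clamp: the Uncapped Target is what VPA would recommend on its own, and the Target is that value held within minAllowed and maxAllowed. The sketch below shows the arithmetic for CPU in millicores; `clamp_millicores` is a hypothetical helper, not part of VPA or kubectl.

```shell
# Hypothetical helper: clamp a recommendation (millicores) into [min, max].
# usage: clamp_millicores <value> <min> <max>
clamp_millicores() {
  value=$1
  min=$2
  max=$3
  if [ "$value" -lt "$min" ]; then
    echo "$min"
  elif [ "$value" -gt "$max" ]; then
    echo "$max"
  else
    echo "$value"
  fi
}

# With the demo's CPU limits of 50m-1000m:
clamp_millicores 1500 50 1000   # prints 1000 (capped by maxAllowed)
clamp_millicores 20 50 1000     # prints 50   (raised to minAllowed)
clamp_millicores 300 50 1000    # prints 300  (already within limits)
```

If the Target and Uncapped Target in the `kubectl describe` output differ, the policy is capping the recommendation.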
Cleanup

    kubectl delete namespace vpa-demo

Further Reading
See docs/deep-dive.md for a detailed explanation of VPA algorithms, update modes, resource policies, and production best practices.
Next Step
Move on to Advanced Ingress & Routing to explore Gateway API and Traefik for next-generation traffic management.