Vertical Pod Autoscaler

Automatically right-size resource requests and limits for pods based on actual usage.

Time: ~15 minutes
Difficulty: Intermediate

  • VPA: automatically adjusting resource requests and limits
  • Recommendation mode vs Auto mode
  • VPA components: Recommender, Updater, Admission Controller
  • How VPA analyzes historical resource usage
  • When to use VPA vs HPA
  • Resource policy constraints (min/max allowed)

VPA is not part of a default Kubernetes installation, so it must be installed in your cluster before you start. minikube does not enable it by default.

Install VPA from the official repository:

Terminal window
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Verify VPA components are running:

Terminal window
kubectl get pods -n kube-system | grep vpa

You should see three pods in the Running state, with names beginning vpa-admission-controller, vpa-recommender, and vpa-updater.

The VPA Recommender reads pod usage from the Metrics API, so also enable the metrics-server addon:

Terminal window
minikube addons enable metrics-server

Navigate to the demo directory:

Terminal window
cd demos/vertical-pod-autoscaler

Create the namespace and deploy a workload with intentionally low resource requests:

Terminal window
kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/deployment.yaml

Check the current resource requests:

Terminal window
kubectl get deployment resource-consumer -n vpa-demo -o jsonpath='{.spec.template.spec.containers[0].resources}'

You will see requests of 50m CPU and 64Mi memory, but the stress container actually uses much more.

Apply the VPA in recommendation mode (it will not modify pods automatically):

Terminal window
kubectl apply -f manifests/vpa.yaml
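
The contents of manifests/vpa.yaml are not shown in this walkthrough. As a hedged sketch, a recommendation-only VPA for this demo could look roughly like the manifest below, written here to a scratch file named vpa-off-sketch.yaml (a hypothetical name). The object and target names come from the commands in this guide; the resourcePolicy bounds are an assumption borrowed from the limits quoted for the Auto-mode VPA and may differ from the real file.

```shell
# Write a hypothetical recommendation-only VPA manifest to a scratch file.
# Values mirror names used in this demo; the real manifests/vpa.yaml may differ.
cat > vpa-off-sketch.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: resource-consumer-vpa
  namespace: vpa-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  updatePolicy:
    updateMode: "Off"        # recommend only; never evict or mutate pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"   # apply to every container in the pod
        minAllowed:          # assumed bounds, see note above
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
EOF
echo "wrote vpa-off-sketch.yaml"
```

The only field that separates recommendation mode from the other modes is updateMode; everything else in the sketch stays the same when the mode changes.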

Wait 2-3 minutes for the VPA Recommender to analyze usage patterns.

Check the VPA recommendations:

Terminal window
kubectl describe vpa resource-consumer-vpa -n vpa-demo

Look for the Recommendation section. You should see four types of recommendations:

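  • Lower Bound: the smallest request VPA considers adequate; requests below it are treated as under-provisioned
  • Target: the request values VPA recommends based on actual usage; this is what gets applied in the updating modes
  • Uncapped Target: what VPA would recommend if the minAllowed/maxAllowed policy constraints were ignored
  • Upper Bound: the largest request VPA considers reasonable; requests above it are treated as over-provisioned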

Compare the recommendations to actual pod usage:

Terminal window
kubectl top pods -n vpa-demo

The VPA recommendations should be higher than the current requests (50m CPU, 64Mi memory) because the stress workload is actually using more.

manifests/
namespace.yaml # vpa-demo namespace
deployment.yaml # Stress workload with low resource requests
vpa.yaml # VPA in "Off" mode (recommendation only)
vpa-auto.yaml # VPA in "Auto" mode (for experiments)

VPA has three components:

  1. Recommender: watches pod metrics and calculates resource recommendations based on historical usage
  2. Updater: evicts pods whose requests have drifted from the recommendation so they can be recreated with new values (only in Auto and Recreate modes)
  3. Admission Controller: mutates new pod specs with recommended resources when they are created (in every mode except Off)

Update modes:

  • Off: only provides recommendations, does not change pods
  • Initial: sets resources when pods are first created, but does not update running pods
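  • Recreate: sets resources at pod creation and evicts running pods when the recommendation differs significantly from their current requests
  • Auto: applies recommendations using VPA's preferred update mechanism; currently this means evicting and recreating pods, the same as Recreate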

How VPA calculates recommendations:

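VPA keeps a decaying histogram of past resource usage (by default covering roughly 8 days, with recent samples weighted more heavily) and computes recommendations from it. With default settings the Target is based on the 90th percentile of observed usage plus a safety margin, and the Upper Bound on the 95th percentile, so pods have enough resources most of the time without being over-provisioned.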
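
As a rough illustration of the percentile idea, the sketch below computes nearest-rank percentiles over a handful of made-up CPU samples and adds a safety margin to one of them. It is a simplification, not the Recommender's algorithm: the real component works on a decaying, bucketed histogram rather than a sorted list. The sample values, the `percentile` helper, the choice of the 90th percentile, and the 15% margin are all assumptions chosen for illustration.

```shell
# Hypothetical CPU usage samples in millicores (illustrative, not measured).
samples="120 135 150 160 180 210 240 260 300 420"

# Nearest-rank percentile: sort ascending and take the value at ceil(p/100 * n).
# $1 is the percentile as an integer (e.g. 90); remaining args are the samples.
percentile() {
  p="$1"; shift
  printf '%s\n' "$@" | sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      idx = int((p * NR + 99) / 100)   # integer ceiling of p*n/100
      if (idx < 1) idx = 1
      print v[idx]
    }'
}

p50=$(percentile 50 $samples)
p90=$(percentile 90 $samples)
p95=$(percentile 95 $samples)

# Assumed 15% safety margin on top of the chosen percentile (integer math).
target=$(( p90 * 115 / 100 ))

echo "p50=${p50}m p90=${p90}m p95=${p95}m target=${target}m"
# prints: p50=180m p90=300m p95=420m target=345m
```

A single spike (the 420m sample) raises the highest percentile but leaves the 90th untouched, which is why a percentile gives a steadier request value than the observed maximum.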

  1. Switch to Auto mode and watch VPA update the pods:

    Terminal window
    kubectl delete vpa resource-consumer-vpa -n vpa-demo
    kubectl apply -f manifests/vpa-auto.yaml

    Watch the pods being recreated:

    Terminal window
    kubectl get pods -n vpa-demo -w

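    After a few minutes, the VPA Updater evicts pods whose requests differ significantly from the recommendation, and the Admission Controller sets the updated requests on their replacements. By default the Updater only evicts pods from workloads with at least two replicas, and it does not evict all of them at once.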

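  2. Check the new resource requests after VPA updates. The Admission Controller changes the pods themselves, not the Deployment template, so query the pods:

    Terminal window
    kubectl get pods -n vpa-demo -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'

    The recreated pods should show requests raised to match the recommendations, while the Deployment spec still lists the original 50m CPU and 64Mi memory.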

  3. Compare VPA behavior to HPA:

    • HPA: scales the number of replicas (horizontal scaling)
    • VPA: adjusts resource requests per pod (vertical scaling)
    • Do not use both on the same metric: VPA and HPA on CPU/memory will conflict. You can use HPA on custom metrics (like requests-per-second) while VPA manages CPU/memory.
  4. Test the resource policy constraints by checking if recommendations respect minAllowed and maxAllowed:

    Terminal window
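    kubectl describe vpa resource-consumer-vpa-auto -n vpa-demo | grep -A 14 "Container Recommendations"

    The Lower Bound, Target, and Upper Bound should fall within the policy limits (50m-1000m CPU, 64Mi-512Mi memory). Only the Uncapped Target may fall outside them, since it ignores the policy.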
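
The effect of minAllowed and maxAllowed can be pictured as a simple clamp on the uncapped value. The sketch below is only an illustration of that idea using the CPU bounds from this demo; the `clamp` helper and the three input values are hypothetical, and VPA's actual capping logic lives in the Recommender and Admission Controller.

```shell
# Clamp a millicore value into [min, max], the way a resource policy
# bounds a recommendation between minAllowed and maxAllowed.
clamp() {
  value="$1"; min="$2"; max="$3"
  if [ "$value" -lt "$min" ]; then
    echo "$min"
  elif [ "$value" -gt "$max" ]; then
    echo "$max"
  else
    echo "$value"
  fi
}

# CPU policy bounds from this demo: 50m to 1000m.
for uncapped in 20 345 1600; do
  echo "uncapped=${uncapped}m target=$(clamp "$uncapped" 50 1000)m"
done
# prints:
# uncapped=20m target=50m
# uncapped=345m target=345m
# uncapped=1600m target=1000m
```

When the Target and the Uncapped Target differ in the describe output, the policy is actively constraining the recommendation, which is a hint that the bounds may need revisiting.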

Terminal window
kubectl delete namespace vpa-demo

See docs/deep-dive.md for a detailed explanation of VPA algorithms, update modes, resource policies, and production best practices.

Move on to Advanced Ingress & Routing to explore Gateway API and Traefik for next-generation traffic management.