Resource Quotas & LimitRanges: Deep Dive

This document explains how Kubernetes enforces resource governance at the namespace level. It covers the ResourceQuota admission controller, LimitRange mechanics, quota scopes, count quotas for custom resources, and strategies for multi-team namespace management.

ResourceQuota enforcement happens through an admission controller. When a pod creation or update request arrives at the API server, the ResourceQuota admission controller intercepts it.

The admission controller:

  1. Looks up all ResourceQuota objects in the target namespace.
  2. Calculates the total resource consumption after the request would be applied.
  3. If the new total exceeds any quota, the request is rejected with a 403 Forbidden error.
  4. If the request fits, the quota’s status.used field is updated.

This is a synchronous, blocking check. The pod is never created if it would exceed the quota. This differs from resource limits on containers, which allow the container to start and then enforce via the kernel (OOM kill, CPU throttling).

The demo defines a ResourceQuota named compute-quota in the quota-demo namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: quota-demo
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    pods: "5"
    services: "3"
    persistentvolumeclaims: "2"
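Once pods are admitted, the quota object's status records the running totals. For the quota above, the status might look like this (the used values are illustrative, assuming two small pods exist):

```yaml
status:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    pods: "5"
  used:
    requests.cpu: 200m    # illustrative: 2 pods x 100m
    requests.memory: 128Mi
    pods: "2"
```

kubectl describe resourcequota renders the same hard/used pairs in a readable table.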

When a ResourceQuota exists in a namespace that tracks compute resources (requests.cpu, limits.memory, etc.), every pod in that namespace must specify resource requests and limits for those tracked resources. If a pod does not have resource specs, it is rejected.

This is why LimitRange exists: it provides default resource specs for pods that do not define their own.

Without a LimitRange, a bare kubectl run nginx --image=nginx in a quota-enabled namespace fails with:

Error: pods "nginx" is forbidden: failed quota: compute-quota:
must specify requests.cpu, requests.memory, limits.cpu, limits.memory

The quota counts resources from all non-terminal pods in the namespace. A pod in Pending state still counts against the quota because its resource requests are reserved. Only pods in the Succeeded or Failed phase (terminal states) do not count.

This means: if you have 5 pods stuck in Pending (image pull error, scheduling failure), they still consume quota. New pods cannot be created until the stuck pods are cleaned up.

Quotas can be scoped to apply only to certain types of pods. Scopes narrow which pods count against the quota.

The BestEffort scope applies only to pods with no resource requests or limits on any container:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: best-effort-quota
spec:
  hard:
    pods: "10"
  scopes:
  - BestEffort

This limits the number of BestEffort pods (pods without resource specs). Pods with resource specs are not affected.

The NotBestEffort scope is the inverse: it applies only to pods that have at least one container with resource requests or limits:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: not-best-effort-quota
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "20"
  scopes:
  - NotBestEffort

The Terminating and NotTerminating scopes select pods based on whether activeDeadlineSeconds is set:

  • Terminating: Pods with activeDeadlineSeconds set (Jobs, batch workloads)
  • NotTerminating: Pods without activeDeadlineSeconds (long-running services)

apiVersion: v1
kind: ResourceQuota
metadata:
  name: batch-quota
spec:
  hard:
    pods: "50"
    requests.cpu: "8"
  scopes:
  - Terminating

This allows up to 50 batch job pods consuming 8 CPU total, without affecting the quota for long-running services.
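The scope checks the pod's own spec.activeDeadlineSeconds field, so for Job pods the deadline should be set on the pod template. A minimal sketch (the Job name and values are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-report            # hypothetical name
spec:
  template:
    spec:
      activeDeadlineSeconds: 600   # on the pod spec: the pod matches the Terminating scope
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo done"]
        resources:
          requests:
            cpu: 100m
```

Pods created from this template count against batch-quota but not against a NotTerminating quota in the same namespace.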

A quota can target pods of a specific priority class using scopeSelector:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - high-priority

This is powerful for multi-team clusters. You can give each team a quota for high-priority workloads and a separate, larger quota for low-priority workloads. High-priority pods are scarce and controlled. Low-priority pods can use more resources but get preempted first.
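The high-priority value above refers to a PriorityClass, a cluster-scoped object that would be defined along these lines (the value and description are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000                  # higher values schedule first and preempt lower ones
globalDefault: false            # do not apply to pods that omit priorityClassName
description: "Reserved for latency-critical workloads"
```

Pods opt in with spec.priorityClassName: high-priority, and only those pods count against high-priority-quota.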

The CrossNamespacePodAffinity scope, stable since v1.24, limits the number of pods that use cross-namespace affinity terms:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cross-ns-quota
spec:
  hard:
    pods: "5"
  scopes:
  - CrossNamespacePodAffinity

Cross-namespace affinity is a potential security concern because a pod in namespace A can influence scheduling near pods in namespace B. This scope limits that behavior.

Beyond compute resources, quotas can count any API object type:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
spec:
  hard:
    pods: "5"
    services: "3"
    persistentvolumeclaims: "2"
    configmaps: "10"
    secrets: "10"
    services.loadbalancers: "1"
    services.nodeports: "2"

You can quota custom resources using the count/ prefix:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: crd-quota
spec:
  hard:
    count/certificates.cert-manager.io: "20"
    count/virtualservices.networking.istio.io: "10"

The format is count/<resource>.<api-group>. This prevents teams from creating unbounded numbers of custom resources.

LimitRange operates at the individual container or pod level, while ResourceQuota operates at the namespace level. They serve different purposes:

  • ResourceQuota: Total budget for the namespace
  • LimitRange: Per-container constraints and defaults

The demo’s LimitRange:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: quota-demo
spec:
  limits:
  - type: Container
    default:
      cpu: 200m
      memory: 128Mi
    defaultRequest:
      cpu: 50m
      memory: 64Mi
    min:
      cpu: 25m
      memory: 32Mi
    max:
      cpu: 500m
      memory: 512Mi

  • default: applied as the container's limit if none is specified
  • defaultRequest: applied as the container's request if none is specified
  • min: minimum allowed request/limit; the pod is rejected if below this
  • max: maximum allowed request/limit; the pod is rejected if above this
  • maxLimitRequestRatio: maximum allowed ratio of limit to request

The maxLimitRequestRatio is interesting. If set to 2, a container requesting 100m CPU can have at most 200m CPU limit. This prevents overcommit where a container requests 50m but has a limit of 4000m, potentially starving other containers.
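A LimitRange enforcing that ratio might look like this (a sketch, not part of the demo; the name is hypothetical):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: ratio-limits            # hypothetical name
spec:
  limits:
  - type: Container
    maxLimitRequestRatio:
      cpu: "2"                  # a container's CPU limit may be at most 2x its request
```

With this in place, a container with requests.cpu: 100m and limits.cpu: 300m is rejected at admission because the ratio is 3.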

LimitRange defaults are injected by the LimitRanger mutating admission plugin. When a pod spec arrives without resource specs:

  1. The controller checks if any LimitRange exists in the namespace.
  2. For each container missing resources.limits, it injects the default values.
  3. For each container missing resources.requests, it injects the defaultRequest values.
  4. If defaultRequest is not set but default is, the request is set equal to the limit.

The injection happens before the ResourceQuota check. So even if a user submits a pod without resources, the LimitRange injects defaults, and then the ResourceQuota validates the totals.
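For example, a container submitted with no resources block in the quota-demo namespace is persisted with the demo LimitRange's defaults injected:

```yaml
# What the user submits:
containers:
- name: nginx
  image: nginx

# What is stored after LimitRange mutation:
containers:
- name: nginx
  image: nginx
  resources:
    requests:
      cpu: 50m          # from defaultRequest
      memory: 64Mi
    limits:
      cpu: 200m         # from default
      memory: 128Mi
```

The ResourceQuota check then sees the injected values, so the bare kubectl run nginx example from earlier succeeds once the LimitRange exists.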

LimitRange supports three types:

Container (most common):

limits:
- type: Container
  default:
    cpu: 200m
  max:
    cpu: 500m

Applies to individual containers within a pod.

Pod:

limits:
- type: Pod
  max:
    cpu: "2"
    memory: 4Gi

Limits the total resources of all containers in a pod combined. This prevents a pod with 10 containers from consuming 10x the container max.

PersistentVolumeClaim:

limits:
- type: PersistentVolumeClaim
  min:
    storage: 1Gi
  max:
    storage: 100Gi

Controls PVC sizes. Prevents users from requesting 1 TiB PVCs when the storage class has limited capacity.
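With that LimitRange in place, a claim outside the range is rejected at admission. For instance (the claim name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: huge-claim              # hypothetical; rejected because 200Gi exceeds the 100Gi max
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
```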

These two resources work together in a specific order:

  1. Pod spec arrives at the API server.
  2. LimitRange mutating admission injects defaults (if needed).
  3. LimitRange validating admission checks min/max/ratio constraints.
  4. ResourceQuota admission checks namespace totals.

If the LimitRange defaults cause the pod to exceed the ResourceQuota, the pod is rejected. The error message comes from ResourceQuota, not LimitRange, which can be confusing.

The demo shows this interaction:

small-app (fits within quota):

spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: nginx
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi

Total for 2 replicas: 200m CPU request, 400m CPU limit, 128Mi memory request, 256Mi memory limit.

greedy-app (exceeds quota):

spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: nginx
        resources:
          requests:
            cpu: 400m
            memory: 256Mi
          limits:
            cpu: 800m
            memory: 512Mi

small-app already uses 200m CPU request. greedy-app wants 400m * 3 = 1200m CPU request. Total would be 1400m. Quota allows 1000m. Only 2 of the 3 greedy-app pods can be created (200m existing + 400m * 2 = 1000m exactly). The third pod is rejected.

The Deployment controller keeps retrying. If small-app is scaled down or deleted, the quota frees up and the third greedy-app pod can be created.

A namespace can have multiple ResourceQuota objects. Each one enforces independently. The pod must satisfy all of them.

# Compute quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "4"
    limits.memory: 8Gi
---
# Object count quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: count-quota
spec:
  hard:
    pods: "20"
    services: "5"

Use separate quotas for different concerns: compute resources, object counts, storage. This makes it easier to adjust one without touching the other.

In multi-team clusters, quotas are the primary mechanism for resource governance.

The simplest model. Each team gets a namespace with a ResourceQuota:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"

Teams get separate namespaces for dev, staging, and production with different quotas. Dev gets a small quota (2 CPU, 10 pods). Production gets a larger one (16 CPU, 100 pods). The Hierarchical Namespace Controller (HNC) can propagate quotas from parent to child namespaces for more automated governance.

Kubernetes assigns QoS classes based on resource specs: Guaranteed (requests equal limits), Burstable (requests set but not equal to limits), and BestEffort (no requests or limits). The BestEffort and NotBestEffort quota scopes correspond to these classes. BestEffort pods are evicted first under memory pressure.

Quotas can limit storage with requests.storage and per-StorageClass quotas using the <storageclass>.storageclass.storage.k8s.io/ prefix. Monitor quota usage with kubectl describe resourcequota, the kube_resourcequota Prometheus metric, or kubectl get events --field-selector reason=FailedCreate.
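A sketch combining both forms, assuming a StorageClass named fast exists in the cluster:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota           # hypothetical name
spec:
  hard:
    requests.storage: 500Gi                                       # total across all StorageClasses
    persistentvolumeclaims: "10"
    fast.storageclass.storage.k8s.io/requests.storage: 100Gi      # only PVCs using the fast class
    fast.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
```

This lets a team consume plenty of cheap storage while keeping the expensive class tightly capped.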

If ResourceQuota tracks compute resources but no LimitRange provides defaults, pods without resource specs are rejected. Always pair ResourceQuota with LimitRange.

Pending and CrashLoopBackOff pods still count. A namespace full of broken pods prevents new pods from being created.

When a Deployment cannot create pods due to quota, the ReplicaSet controller retries with exponential backoff. The Deployment appears to accept the request, but replicas stay at a lower count. Check events, not just kubectl get deploy.

A LimitRange max of 500m CPU with 3 pods could mean 1500m total. But if the quota limits requests to 1000m, only 2 pods at 500m can exist. The LimitRange max and the ResourceQuota hard limit interact in non-obvious ways.

If you need to limit ephemeral storage (container logs, emptyDir volumes), add:

spec:
  hard:
    requests.ephemeral-storage: 10Gi
    limits.ephemeral-storage: 20Gi

Without this, a single pod writing to an emptyDir can fill the node’s disk.