Resource Quotas & LimitRanges: Deep Dive

This document explains how Kubernetes enforces resource governance at the namespace level. It covers the ResourceQuota admission controller, LimitRange mechanics, quota scopes, count quotas for custom resources, and strategies for multi-team namespace management.

ResourceQuota enforcement happens through an admission controller. When a pod creation or update request arrives at the API server, the ResourceQuota admission controller intercepts it.

The admission controller:

  1. Looks up all ResourceQuota objects in the target namespace.
  2. Calculates the total resource consumption after the request would be applied.
  3. If the new total exceeds any quota, the request is rejected with a 403 Forbidden error.
  4. If the request fits, the quota’s status.used field is updated.

This is a synchronous, blocking check. The pod is never created if it would exceed the quota. This differs from resource limits on containers, which allow the container to start and then enforce via the kernel (OOM kill, CPU throttling).

The demo defines a ResourceQuota named compute-quota in the quota-demo namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: quota-demo
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    pods: "5"
    services: "3"
    persistentvolumeclaims: "2"
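Once pods are admitted, the quota object's status records the running totals. For the quota above, the status might look like this (the used values are illustrative, assuming two small pods exist):

```yaml
status:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    pods: "5"
  used:
    requests.cpu: 200m    # illustrative: 2 pods x 100m
    requests.memory: 128Mi
    pods: "2"
```

kubectl describe resourcequota renders the same hard/used pairs in a readable table.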

When a ResourceQuota exists in a namespace that tracks compute resources (requests.cpu, limits.memory, etc.), every pod in that namespace must specify resource requests and limits for those tracked resources. If a pod does not have resource specs, it is rejected.

This is why LimitRange exists: it provides default resource specs for pods that do not define their own.

Without a LimitRange, a bare kubectl run nginx --image=nginx in a quota-enabled namespace fails with:

Error: pods "nginx" is forbidden: failed quota: compute-quota:
must specify requests.cpu, requests.memory, limits.cpu, limits.memory

The quota counts resources from all non-terminal pods in the namespace. A pod in Pending state still counts against the quota because its resource requests are reserved. Only pods in the Succeeded or Failed phase (terminal states) do not count.

This means: if you have 5 pods stuck in Pending (image pull error, scheduling failure), they still consume quota. New pods cannot be created until the stuck pods are cleaned up.

Quotas can be scoped to apply only to certain types of pods. Scopes narrow which pods count against the quota.

The BestEffort scope applies only to pods with no resource requests or limits on any container:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: best-effort-quota
spec:
  hard:
    pods: "10"
  scopes:
  - BestEffort

This limits the number of BestEffort pods (pods without resource specs). Pods with resource specs are not affected.

The NotBestEffort scope is the inverse: it applies only to pods that have at least one container with resource requests or limits:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: not-best-effort-quota
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "20"
  scopes:
  - NotBestEffort

The Terminating and NotTerminating scopes select pods based on whether activeDeadlineSeconds is set:

  • Terminating: Pods with activeDeadlineSeconds set (Jobs, batch workloads)
  • NotTerminating: Pods without activeDeadlineSeconds (long-running services)

apiVersion: v1
kind: ResourceQuota
metadata:
  name: batch-quota
spec:
  hard:
    pods: "50"
    requests.cpu: "8"
  scopes:
  - Terminating

This allows up to 50 batch job pods consuming 8 CPU total, without affecting the quota for long-running services.
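The scope checks the pod's own spec.activeDeadlineSeconds field, so for Job pods the deadline should be set on the pod template. A minimal sketch (the Job name and values are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-report            # hypothetical name
spec:
  template:
    spec:
      activeDeadlineSeconds: 600   # on the pod spec: the pod matches the Terminating scope
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo done"]
        resources:
          requests:
            cpu: 100m
```

Pods created from this template count against batch-quota but not against a NotTerminating quota in the same namespace.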

A quota can target pods of a specific priority class using scopeSelector:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - high-priority

This is powerful for multi-team clusters. You can give each team a quota for high-priority workloads and a separate, larger quota for low-priority workloads. High-priority pods are scarce and controlled. Low-priority pods can use more resources but get preempted first.
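The high-priority value above refers to a PriorityClass, a cluster-scoped object that would be defined along these lines (the value and description are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000                  # higher values schedule first and preempt lower ones
globalDefault: false            # do not apply to pods that omit priorityClassName
description: "Reserved for latency-critical workloads"
```

Pods opt in with spec.priorityClassName: high-priority, and only those pods count against high-priority-quota.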

The CrossNamespacePodAffinity scope, stable since v1.24, limits the number of pods that use cross-namespace affinity terms:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cross-ns-quota
spec:
  hard:
    pods: "5"
  scopes:
  - CrossNamespacePodAffinity

Cross-namespace affinity is a potential security concern because a pod in namespace A can influence scheduling near pods in namespace B. This scope limits that behavior.

Beyond compute resources, quotas can count any API object type:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
spec:
  hard:
    pods: "5"
    services: "3"
    persistentvolumeclaims: "2"
    configmaps: "10"
    secrets: "10"
    services.loadbalancers: "1"
    services.nodeports: "2"

You can quota custom resources using the count/ prefix:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: crd-quota
spec:
  hard:
    count/certificates.cert-manager.io: "20"
    count/virtualservices.networking.istio.io: "10"

The format is count/<resource>.<api-group>. This prevents teams from creating unbounded numbers of custom resources.

LimitRange operates at the individual container or pod level, while ResourceQuota operates at the namespace level. They serve different purposes:

  • ResourceQuota: Total budget for the namespace
  • LimitRange: Per-container constraints and defaults

The demo’s LimitRange:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: quota-demo
spec:
  limits:
  - type: Container
    default:
      cpu: 200m
      memory: 128Mi
    defaultRequest:
      cpu: 50m
      memory: 64Mi
    min:
      cpu: 25m
      memory: 32Mi
    max:
      cpu: 500m
      memory: 512Mi

  • default: applied as the container's limit if none is specified
  • defaultRequest: applied as the container's request if none is specified
  • min: minimum allowed request/limit; the pod is rejected if below this
  • max: maximum allowed request/limit; the pod is rejected if above this
  • maxLimitRequestRatio: maximum allowed ratio of limit to request

The maxLimitRequestRatio is interesting. If set to 2, a container requesting 100m CPU can have at most 200m CPU limit. This prevents overcommit where a container requests 50m but has a limit of 4000m, potentially starving other containers.
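A LimitRange enforcing that ratio might look like this (a sketch, not part of the demo; the name is hypothetical):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: ratio-limits            # hypothetical name
spec:
  limits:
  - type: Container
    maxLimitRequestRatio:
      cpu: "2"                  # a container's CPU limit may be at most 2x its request
```

With this in place, a container with requests.cpu: 100m and limits.cpu: 300m is rejected at admission because the ratio is 3.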

LimitRange defaults are injected by the LimitRanger mutating admission plugin. When a pod spec arrives without resource specs:

  1. The controller checks if any LimitRange exists in the namespace.
  2. For each container missing resources.limits, it injects the default values.
  3. For each container missing resources.requests, it injects the defaultRequest values.
  4. If defaultRequest is not set but default is, the request is set equal to the limit.

The injection happens before the ResourceQuota check. So even if a user submits a pod without resources, the LimitRange injects defaults, and then the ResourceQuota validates the totals.
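For example, a container submitted with no resources block in the quota-demo namespace is persisted with the demo LimitRange's defaults injected:

```yaml
# What the user submits:
containers:
- name: nginx
  image: nginx

# What is stored after LimitRange mutation:
containers:
- name: nginx
  image: nginx
  resources:
    requests:
      cpu: 50m          # from defaultRequest
      memory: 64Mi
    limits:
      cpu: 200m         # from default
      memory: 128Mi
```

The ResourceQuota check then sees the injected values, so the bare kubectl run nginx example from earlier succeeds once the LimitRange exists.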

LimitRange supports three types:

Container (most common):

limits:
- type: Container
  default:
    cpu: 200m
  max:
    cpu: 500m

Applies to individual containers within a pod.

Pod:

limits:
- type: Pod
  max:
    cpu: "2"
    memory: 4Gi

Limits the total resources of all containers in a pod combined. This prevents a pod with 10 containers from consuming 10x the container max.

PersistentVolumeClaim:

limits:
- type: PersistentVolumeClaim
  min:
    storage: 1Gi
  max:
    storage: 100Gi

Controls PVC sizes. Prevents users from requesting 1 TiB PVCs when the storage class has limited capacity.
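With that LimitRange in place, a claim outside the range is rejected at admission. For instance (the claim name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: huge-claim              # hypothetical; rejected because 200Gi exceeds the 100Gi max
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
```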

These two resources work together in a specific order:

  1. Pod spec arrives at the API server.
  2. LimitRange mutating admission injects defaults (if needed).
  3. LimitRange validating admission checks min/max/ratio constraints.
  4. ResourceQuota admission checks namespace totals.

If the LimitRange defaults cause the pod to exceed the ResourceQuota, the pod is rejected. The error message comes from ResourceQuota, not LimitRange, which can be confusing.

The demo shows this interaction:

small-app (fits within quota):

spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: nginx
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi

Total for 2 replicas: 200m CPU request, 400m CPU limit, 128Mi memory request, 256Mi memory limit.

greedy-app (exceeds quota):

spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: nginx
        resources:
          requests:
            cpu: 400m
            memory: 256Mi
          limits:
            cpu: 800m
            memory: 512Mi

small-app already uses 200m CPU request. greedy-app wants 400m * 3 = 1200m CPU request. Total would be 1400m. Quota allows 1000m. Only 2 of the 3 greedy-app pods can be created (200m existing + 400m * 2 = 1000m exactly). The third pod is rejected.

The Deployment controller keeps retrying. If small-app is scaled down or deleted, the quota frees up and the third greedy-app pod can be created.

A namespace can have multiple ResourceQuota objects. Each one enforces independently. The pod must satisfy all of them.

# Compute quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "4"
    limits.memory: 8Gi
---
# Object count quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: count-quota
spec:
  hard:
    pods: "20"
    services: "5"

Use separate quotas for different concerns: compute resources, object counts, storage. This makes it easier to adjust one without touching the other.

In multi-team clusters, quotas are the primary mechanism for resource governance.

The simplest model. Each team gets a namespace with a ResourceQuota:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"

Teams get separate namespaces for dev, staging, and production with different quotas. Dev gets a small quota (2 CPU, 10 pods). Production gets a larger one (16 CPU, 100 pods). The Hierarchical Namespace Controller (HNC) can propagate quotas from parent to child namespaces for more automated governance.

Kubernetes assigns QoS classes based on resource specs: Guaranteed (requests equal limits), Burstable (requests set but not equal to limits), and BestEffort (no requests or limits). The BestEffort and NotBestEffort quota scopes correspond to these classes. BestEffort pods are evicted first under memory pressure.

Quotas can limit storage with requests.storage and per-StorageClass quotas using the <storageclass>.storageclass.storage.k8s.io/ prefix. Monitor quota usage with kubectl describe resourcequota, the kube_resourcequota Prometheus metric, or kubectl get events --field-selector reason=FailedCreate.
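A sketch combining both forms, assuming a StorageClass named fast exists in the cluster:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota           # hypothetical name
spec:
  hard:
    requests.storage: 500Gi                                       # total across all StorageClasses
    persistentvolumeclaims: "10"
    fast.storageclass.storage.k8s.io/requests.storage: 100Gi      # only PVCs using the fast class
    fast.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
```

This lets a team consume plenty of cheap storage while keeping the expensive class tightly capped.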

If ResourceQuota tracks compute resources but no LimitRange provides defaults, pods without resource specs are rejected. Always pair ResourceQuota with LimitRange.

Pending and CrashLoopBackOff pods still count. A namespace full of broken pods prevents new pods from being created.

When a Deployment cannot create pods due to quota, the ReplicaSet controller retries with exponential backoff. The Deployment appears to accept the request, but replicas stay at a lower count. Check events, not just kubectl get deploy.

A LimitRange max of 500m CPU with 3 pods could mean 1500m total. But if the quota limits requests to 1000m, only 2 pods at 500m can exist. The LimitRange max and the ResourceQuota hard limit interact in non-obvious ways.

If you need to limit ephemeral storage (container logs, emptyDir volumes), add:

spec:
  hard:
    requests.ephemeral-storage: 10Gi
    limits.ephemeral-storage: 20Gi

Without this, a single pod writing to an emptyDir can fill the node’s disk.