PersistentVolumes & StorageClasses: Deep Dive
This document explains how the Kubernetes storage layer works, from PV/PVC binding mechanics through CSI drivers, volume snapshots, and production storage patterns. It covers the “why” behind access modes, reclaim policies, topology-aware provisioning, and the trade-offs you face when choosing storage for real workloads.
The Storage Abstraction Stack
Kubernetes separates storage into three layers. Each has a distinct role.
StorageClass defines how storage is provisioned. It specifies the provisioner, parameters (IOPS, replication, filesystem type), and reclaim policy. Think of it as a storage template controlled by cluster administrators.
PersistentVolume (PV) is a piece of storage in the cluster. It exists independently of any pod. PVs can be created manually (static provisioning) or automatically by a StorageClass (dynamic provisioning).
PersistentVolumeClaim (PVC) is a request for storage by a pod. It specifies capacity, access mode, and optionally a StorageClass. Kubernetes binds the PVC to a matching PV.
```
StorageClass  (how to provision)
      |
      v
PersistentVolume  (a piece of storage)
      |
      v  (bound)
PersistentVolumeClaim  (a request for storage)
      |
      v  (mounted)
Pod
```

This separation exists so developers request storage without knowing the backend, and administrators configure backends without knowing the applications. The PVC is the contract between the two.
PV/PVC Binding Mechanics
When a PVC is created, the control plane looks for a PV that satisfies all requirements. Binding considers several factors.
Capacity
The PV’s capacity must be greater than or equal to the PVC’s request. A PVC requesting 256Mi can bind to a 256Mi or 1Gi PV, but not a 128Mi PV.
From this demo’s dynamic PVC:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
  namespace: storage-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 256Mi
  storageClassName: standard
```

Access Modes
The PV must support the access mode requested by the PVC. A PV offering only ReadWriteOnce cannot satisfy a PVC requesting ReadWriteMany.
Label Selectors
A PVC can use a label selector to target a specific PV. This is how static provisioning works in this demo:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
  labels:
    type: local
spec:
  capacity:
    storage: 128Mi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /tmp/manual-pv-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-pvc
  namespace: storage-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi
  selector:
    matchLabels:
      type: local
```

The PVC uses `selector.matchLabels` to match only PVs labeled `type: local`. Without the selector, Kubernetes could bind to any available PV matching capacity and access mode.
StorageClass
If the PVC specifies a storageClassName, it only binds to PVs with the same class. Omitting storageClassName uses the cluster’s default StorageClass (if one exists). Setting storageClassName: "" explicitly opts out of dynamic provisioning.
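The three cases can be sketched as PVC spec fragments (the class name `fast-ssd` is illustrative):

```yaml
# Explicit class: bind only to PVs of this class,
# or dynamically provision through it.
spec:
  storageClassName: fast-ssd
---
# Field omitted: the cluster's default StorageClass (if one is
# configured) is filled in automatically at admission time.
spec:
  accessModes: ["ReadWriteOnce"]
---
# Empty string: opt out of dynamic provisioning; bind only to
# statically created PVs that themselves have no class.
spec:
  storageClassName: ""
```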
Binding Is Exclusive
A PV binds to exactly one PVC. Once bound, the PV is reserved until the PVC is deleted. This is a 1:1 relationship.
Dynamic Provisioning with StorageClass
Static provisioning requires pre-creating PVs. Dynamic provisioning automates this. When a PVC references a StorageClass, the provisioner creates the PV automatically.
A provisioner watches for unbound PVCs. When it sees one requesting its StorageClass, it calls the storage API to create a volume, then creates a PV in Kubernetes. Minikube’s standard class uses the k8s.io/minikube-hostpath provisioner. Production provisioners include ebs.csi.aws.com, pd.csi.storage.gke.io, and disk.csi.azure.com.
A typical StorageClass:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

The `parameters` field is passed directly to the provisioner. Different provisioners accept different parameters. These are storage-backend-specific, not Kubernetes-level concepts.
CSI Drivers
CSI (Container Storage Interface) is the standard plugin mechanism for storage in Kubernetes.
Why CSI Exists
Before CSI, storage drivers were compiled into Kubernetes. Adding a new storage system meant modifying Kubernetes source code and upgrading the cluster. CSI defines a standard gRPC interface so storage vendors develop, release, and update drivers independently.
How They Work
A CSI driver has two components. The controller plugin (Deployment or StatefulSet) handles volume lifecycle: create, delete, snapshot, expand. It talks to the storage backend’s API. The node plugin (DaemonSet on every node) handles mount, unmount, and format operations.
When a PVC triggers provisioning:
- The controller plugin receives a `CreateVolume` RPC and calls the storage API.
- Kubernetes creates a PV representing the new volume.
- When a pod is scheduled, the node plugin receives `NodeStageVolume` (format and stage) then `NodePublishVolume` (bind-mount into the pod).
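A driver also registers its cluster-wide behavior through a CSIDriver object. A minimal sketch, with values that are typical for a block-storage driver (assumed here for illustration):

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com
spec:
  # Volumes must be attached to the node (by the external-attacher)
  # before the node plugin can mount them.
  attachRequired: true
  # The kubelet does not need to pass pod metadata on mount calls.
  podInfoOnMount: false
  volumeLifecycleModes:
    - Persistent
```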
| Driver | Backend | Use Case |
|---|---|---|
| `ebs.csi.aws.com` | AWS EBS | Block storage on AWS |
| `efs.csi.aws.com` | AWS EFS | Shared NFS on AWS |
| `pd.csi.storage.gke.io` | GCE PD | Block storage on GKE |
| `disk.csi.azure.com` | Azure Disk | Block storage on Azure |
| `rook-ceph.csi.ceph.com` | Ceph (via Rook) | Self-hosted distributed storage |
Access Modes: What They Actually Mean
Access modes describe how a volume can be mounted by nodes. They do not enforce filesystem-level permissions. This distinction causes frequent confusion.
ReadWriteOnce (RWO)
The volume can be mounted read-write by a single node. Multiple pods on the same node can all mount it. Pods on different nodes cannot. This is the most common mode, mapping to block storage devices (EBS, GCE PD, Azure Disk) that attach to one instance at a time.
From this demo, the writer pod mounts a PVC read-write:
```yaml
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: dynamic-pvc
```

The reader pod mounts a different PVC with `readOnly: true` on the volumeMount:

```yaml
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: manual-pvc
```

The access mode on the PVC controls node-level attachment. The `readOnly` flag on the volumeMount controls container-level visibility. They are independent.
ReadOnlyMany (ROX)
The volume can be mounted read-only by many nodes simultaneously. Useful for shared configuration, static assets, or pre-built datasets.
ReadWriteMany (RWX)
The volume can be mounted read-write by many nodes simultaneously. Requires NFS, CephFS, Amazon EFS, or similar. Block storage does not support RWX.
RWX with concurrent writes requires the application to handle file locking. The storage system provides concurrent access, not concurrent safety. Two pods writing to the same file simultaneously will corrupt it without proper locking.
ReadWriteOncePod (RWOP)
Introduced as alpha in Kubernetes 1.22 and stable since 1.29. Only one pod cluster-wide can mount the volume read-write. Stricter than RWO (which allows multiple pods on the same node). Useful for databases that assume exclusive write access.
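A PVC requesting pod-exclusive access looks like any other claim, just with the stricter mode (the name and class below are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOncePod   # at most one pod cluster-wide may mount read-write
  resources:
    requests:
      storage: 20Gi
  storageClassName: fast-ssd
```

If a second pod references this claim, it stays unschedulable until the first pod releases the volume.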
Reclaim Policies
The reclaim policy determines what happens to the PV when its PVC is deleted.
Retain
The PV keeps its data and moves to Released state. An administrator must manually clean up. This demo’s static PV uses Retain:
```yaml
spec:
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /tmp/manual-pv-data
```

Use Retain for production databases. Deleting a PVC accidentally will not destroy your data.
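A common cleanup pattern for a Released PV: after verifying or salvaging the data, clear the stale claim reference so the PV returns to the Available phase and can bind again. A sketch, assuming the PV name from this demo:

```shell
# A Released PV still points at the deleted PVC via spec.claimRef.
# Removing that reference makes the PV Available for a new claim.
kubectl patch pv manual-pv --type json \
  -p '[{"op": "remove", "path": "/spec/claimRef"}]'
```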
Delete
The PV and underlying storage are both deleted with the PVC. This is the default for dynamically provisioned volumes. Use it for caches, temporary pipelines, and reproducible data.
Recycle (Deprecated)
Ran rm -rf /volume/* and reused the PV. Deprecated because it was too simplistic and insecure. Use dynamic provisioning instead.
Volume Expansion
StorageClasses can allow resizing with allowVolumeExpansion: true:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true
```

To expand: edit the PVC and increase `spec.resources.requests.storage`. The CSI driver handles the rest. Some drivers support online expansion. Others require a pod restart.
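The edit can be a one-line patch (the PVC name comes from this demo; the target size is illustrative):

```shell
# Request growth from 256Mi to 1Gi. The CSI driver expands the
# backing volume and then resizes the filesystem.
kubectl patch pvc dynamic-pvc -n storage-demo \
  -p '{"spec": {"resources": {"requests": {"storage": "1Gi"}}}}'
```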
You can only expand, never shrink. This is a deliberate safety measure against data loss.
Volume Snapshots
Snapshots capture volume state at a point in time. They need a CSI driver with snapshot support and a VolumeSnapshotClass:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-ebs-snapclass
driver: ebs.csi.aws.com
deletionPolicy: Delete
```

Create a snapshot:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-data-snapshot
spec:
  volumeSnapshotClassName: csi-ebs-snapclass
  source:
    persistentVolumeClaimName: my-data-pvc
```

Restore by creating a PVC with `dataSource`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 256Mi
  dataSource:
    name: my-data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```

The provisioner creates a new volume pre-populated with the snapshot’s data. This is the foundation for database backup/restore, disaster recovery, and environment cloning.
Topology-Aware Provisioning
In multi-zone clusters, where a volume is created matters. An EBS volume in us-east-1a cannot be attached to a node in us-east-1b.
The volumeBindingMode on StorageClass controls this:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
```

Immediate (default): the PV is created as soon as the PVC is created. The provisioner picks a zone. If the pod lands in a different zone, it cannot mount the volume.
WaitForFirstConsumer: PV is not created until a pod using the PVC is scheduled. The provisioner creates the volume in the pod’s zone, guaranteeing accessibility.
For multi-zone clusters, WaitForFirstConsumer should be the default. Immediate only makes sense for single-zone clusters or zone-independent backends like NFS.
hostPath vs Local Volumes
Section titled “hostPath vs Local Volumes”hostPath
Mounts a host directory directly into the pod. Simple and works everywhere. This demo uses it:
```yaml
spec:
  hostPath:
    path: /tmp/manual-pv-data
```

But it is dangerous in production. If the pod moves to a different node, it gets a different directory. There is no capacity enforcement. Security is a concern since pods can access any file on the node. Use hostPath only for single-node development like minikube.
Local Volumes
Similar to hostPath, but managed as proper PVs. They are topology-aware (the scheduler knows which node has the storage), support capacity tracking, and work with the standard PVC lifecycle. A nodeAffinity is mandatory:
```yaml
spec:
  local:
    path: /mnt/ssd0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1
```

Use local volumes when you need local SSD performance (databases, caches) with proper lifecycle management. The trade-off: your pod is pinned to a specific node.
Production Storage Patterns
Section titled “Production Storage Patterns”Databases
Use RWO or RWOP access, Retain reclaim, provisioned IOPS, WaitForFirstConsumer, and allowVolumeExpansion. StatefulSets give each replica its own PVC:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  replicas: 3
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
```

Each replica gets its own PVC (`pgdata-postgres-0`, `pgdata-postgres-1`, etc.). Deleting a StatefulSet does not delete its PVCs, protecting against accidental data loss.
Shared Files
RWX access via NFS, CephFS, or EFS. Higher latency than block storage, but works well for media files, uploads, and static assets. All replicas mount the same PVC.
Ephemeral Cache
Often emptyDir is enough. For PVC-backed ephemeral storage, use ephemeral volumes:
```yaml
volumes:
  - name: scratch
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: fast-ssd
          resources:
            requests:
              storage: 10Gi
```

Created with the pod, deleted with the pod. Useful for build caches, CI/CD temp files, and ML training scratch space.
Data Persistence Across Pod Restarts
This is the core value of PersistentVolumes. This demo shows it: the writer pod appends data, gets deleted, and a new pod reads the same data from the same PVC. Data survives because it lives on the PV, not in the container filesystem.
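The sequence can be sketched with kubectl (the pod names and mount path here are illustrative; the namespace comes from this demo):

```shell
# Writer appends a line, then goes away.
kubectl exec writer -n storage-demo -- sh -c 'echo hello >> /data/log'
kubectl delete pod writer -n storage-demo

# A new pod bound to the same PVC still sees the data,
# because the file lives on the PV, not in the container layer.
kubectl exec reader -n storage-demo -- cat /data/log
```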
Choosing the Right Storage
| Workload | Access Mode | Reclaim | Binding Mode | Volume Type |
|---|---|---|---|---|
| Single-instance DB | RWO/RWOP | Retain | WaitForFirstConsumer | Block (EBS, GCE PD) |
| Replicated DB | RWO | Retain | WaitForFirstConsumer | Block |
| Shared files | RWX | Retain | N/A | NFS, EFS, CephFS |
| Build cache | RWO | Delete | Immediate | Block or ephemeral |
| ML training data | ROX | Retain | N/A | NFS, S3 via CSI |
| Temp scratch | N/A | N/A | N/A | emptyDir |
See Also
- README for step-by-step instructions to run this demo
- Multi-Container Patterns for shared volumes between containers
- Kubernetes Storage Docs for the official reference
- CSI Driver List for available drivers