Velero Backup and Restore: Deep Dive
This document explains why Velero exists, how backup and restore workflows operate, and what trade-offs you face when implementing disaster recovery for Kubernetes clusters. It covers storage backends, volume snapshots, resource filtering, and production considerations for multi-cluster recovery scenarios.
Why Velero Exists
Kubernetes clusters contain more than just container images and environment variables. ConfigMaps, Secrets, PersistentVolumeClaims, CRDs, and their instances represent cluster state. When disaster strikes (accidental deletion, cluster failure, regional outage), you need to recreate all of this.
What etcd Snapshots Cannot Do
Every Kubernetes cluster stores its state in etcd. Taking etcd snapshots gives you a low-level backup of the entire cluster. But etcd snapshots have serious limitations:
- They back up everything or nothing. You cannot restore a single namespace or application.
- Restoring from etcd replaces the entire cluster state. Any changes since the snapshot are lost.
- They do not capture PersistentVolume data. PVs live outside etcd as actual storage volumes. An etcd snapshot contains the PV object metadata but not the files on disk.
- Cross-cluster restore is complex. You cannot easily restore an etcd snapshot from cluster A into cluster B.
- They require etcd-level access, which application teams rarely have.
What Velero Provides
Velero operates at the Kubernetes API level, not the etcd level. It serializes Kubernetes resources to JSON and stores them in object storage (S3, GCS, Azure Blob). This gives you:
- Namespace-level granularity. Backup and restore individual namespaces or applications.
- Label-based filtering. Back up only resources with specific labels.
- Cross-cluster portability. Restore backups from production into a new disaster recovery cluster.
- Volume backup. Integrate with volume snapshots (CSI) or file-level backup (Restic/Kopia).
- Self-service restore. Developers can restore their own namespaces without cluster admin access.
Velero complements etcd snapshots. Use etcd snapshots for full cluster state recovery. Use Velero for application-level backup, namespace migration, and disaster recovery.
How Velero Works
Velero has two main components: the Velero server and the Velero CLI.
Server Architecture
The Velero server runs as a Deployment in the velero namespace. It consists of several controllers that watch for backup and restore custom resources.
```
Velero CLI                     Velero Server (Pod)            Object Storage
    |                                  |                              |
    | velero backup create             |                              |
    |--------------------------------->|                              |
    |        (creates Backup CR,       |                              |
    |         server watches it)       |                              |
    |                                  | Query Kubernetes API         |
    |                                  | (GET all resources)          |
    |                                  |                              |
    |                                  | Serialize to JSON            |
    |                                  |                              |
    |                                  | Upload backup tarball        |
    |                                  |----------------------------->|
    | Check backup status              |                              |
    |--------------------------------->|                              |
    | Backup Complete                  |                              |
    |<---------------------------------|                              |
```

The backup controller watches for Backup custom resources. When you run velero backup create, the CLI creates a Backup CR. The server sees it, queries the Kubernetes API for all resources matching the backup scope, serializes them to JSON, compresses them into a tarball, and uploads it to the configured BackupStorageLocation.
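The serialize-and-tarball step can be sketched in Python. This is a simplified illustration, not Velero's actual artifact format; the path scheme and helper function are hypothetical:

```python
import io
import json
import tarfile

def build_backup_tarball(resources):
    """Serialize resource dicts to JSON and pack them into a gzipped
    tarball, loosely mirroring what the backup controller uploads."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for res in resources:
            # Hypothetical path scheme for illustration only.
            path = "resources/{}/{}/{}.json".format(
                res["kind"].lower(),
                res["metadata"].get("namespace", "cluster"),
                res["metadata"]["name"],
            )
            data = json.dumps(res).encode()
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

tarball = build_backup_tarball([
    {"kind": "ConfigMap",
     "metadata": {"namespace": "velero-demo", "name": "nginx-config"}},
])
```

The resulting bytes are what gets uploaded to the BackupStorageLocation in the diagram above.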
Restore Workflow
Restore is the reverse process:
```
Velero CLI                     Velero Server                  Object Storage
    |                                  |                              |
    | velero restore create            |                              |
    |--------------------------------->|                              |
    |        (creates Restore CR,      |                              |
    |         server watches it)       |                              |
    |                                  | Download backup tarball      |
    |                                  |<-----------------------------|
    |                                  |                              |
    |                                  | Extract JSON manifests       |
    |                                  |                              |
    |                                  | Apply resources to cluster   |
    |                                  | (kubectl apply equivalent)   |
```

The restore controller downloads the backup tarball, extracts each resource definition, and applies it to the cluster via the Kubernetes API. Resources are restored in a specific order (namespaces first, then other resources) to avoid dependency issues.
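The ordering step can be sketched as a priority sort. The list below is a hypothetical abbreviation; Velero's real priority list is longer and configurable via the server's --restore-resource-priorities flag:

```python
# Abbreviated, illustrative priority list (not Velero's full list).
RESTORE_PRIORITY = [
    "CustomResourceDefinition", "Namespace", "StorageClass",
    "PersistentVolume", "PersistentVolumeClaim", "Secret", "ConfigMap",
    "ServiceAccount", "Service", "Deployment",
]

def restore_order(resources):
    """Sort resources so dependencies (CRDs, namespaces, volumes) are
    applied first; unknown kinds keep their relative order at the end."""
    def priority(res):
        kind = res["kind"]
        return (RESTORE_PRIORITY.index(kind)
                if kind in RESTORE_PRIORITY else len(RESTORE_PRIORITY))
    return sorted(resources, key=priority)  # sorted() is stable

ordered = restore_order([
    {"kind": "Deployment", "metadata": {"name": "sample-app"}},
    {"kind": "Namespace", "metadata": {"name": "velero-demo"}},
    {"kind": "ConfigMap", "metadata": {"name": "nginx-config"}},
])
```

With this ordering, the Namespace is applied before the ConfigMap and Deployment that live inside it.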
Plugin Architecture
Velero uses plugins to interface with different storage backends and volume snapshot providers. Plugins are separate binaries that run inside the Velero server pod. The server communicates with them over gRPC.
From the demo’s install command:
```shell
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0
```

The --plugins flag tells Velero to download and install the AWS plugin. This plugin handles both S3 storage (via the BackupStorageLocation interface) and EBS volume snapshots (via the VolumeSnapshotter interface).
Other available plugins:
- velero-plugin-for-gcp: Google Cloud Storage and GCE Persistent Disk snapshots
- velero-plugin-for-microsoft-azure: Azure Blob Storage and Azure Disk snapshots
- velero-plugin-for-csi: Generic CSI volume snapshots (works with any CSI driver)
You can write custom plugins to integrate Velero with proprietary storage systems or add custom backup logic.
Storage Backends
Velero stores backup data in object storage via BackupStorageLocations (BSLs). A BSL defines where backups go.
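A BSL is itself a custom resource. As a sketch, the one the demo's install command generates looks roughly like this (structure per the velero.io/v1 API; verify field names against your Velero version):

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero-demo.svc:9000
```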
BackupStorageLocation Configuration
From the demo’s install command:
```shell
--bucket velero \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero-demo.svc:9000
```

This creates a BackupStorageLocation pointing to MinIO running inside the cluster. In production, you would point to external S3:
```shell
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket my-backup-bucket \
  --backup-location-config region=us-east-1 \
  --secret-file ./credentials-velero
```

The credentials file contains AWS access keys in a specific format:
```ini
[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

This file becomes a Kubernetes Secret that Velero uses to authenticate to S3.
Why MinIO for Local Testing
MinIO provides an S3-compatible API. This lets you test Velero workflows without an AWS account. From the demo’s MinIO deployment:
```yaml
# From manifests/minio.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: velero-demo
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: minio
          image: minio/minio:latest
          args:
            - server
            - /data
            - --console-address
            - ":9001"
          env:
            - name: MINIO_ROOT_USER
              value: "minio"
            - name: MINIO_ROOT_PASSWORD
              value: "minio123"
          ports:
            - name: api
              containerPort: 9000
            - name: console
              containerPort: 9001
```

Port 9000 serves the S3-compatible API. Port 9001 serves the web console. Velero talks to port 9000.
MinIO stores data in a PersistentVolume:
```yaml
# From manifests/minio.yaml
volumeMounts:
  - name: data
    mountPath: /data
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: minio-pvc
```

In this demo setup, MinIO and its data live in the same cluster you are backing up. This is fine for learning but defeats the purpose in production. If the cluster fails, you lose both your workload and your backups. Always use external object storage for real disaster recovery.
Multi-Region Backup
Production setups often use multiple BackupStorageLocations for geographic redundancy:
```shell
velero backup-location create us-east \
  --provider aws \
  --bucket velero-us-east-1 \
  --config region=us-east-1
```
```shell
velero backup-location create us-west \
  --provider aws \
  --bucket velero-us-west-2 \
  --config region=us-west-2
```

You can specify which location to use for each backup:
```shell
velero backup create app-backup --storage-location us-west
```

Or configure scheduled backups to use different locations for different retention tiers (daily to us-east, weekly to us-west).
Key Concepts
Backups
A Backup is a custom resource that triggers the backup process:
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: demo-backup
  namespace: velero
spec:
  includedNamespaces:
    - velero-demo
  storageLocation: default
  ttl: 720h # 30 days
```

When you run velero backup create demo-backup --include-namespaces velero-demo, the CLI creates this CR for you.
The backup tarball contains JSON manifests for all resources in the namespace. From the demo’s sample app:
```yaml
# From manifests/sample-app.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: velero-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: velero-demo
spec:
  replicas: 2
---
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  namespace: velero-demo
```

All three resources are serialized and stored in the backup. When you restore, all three come back.
Restores
A Restore is also a custom resource:
```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: demo-backup-20250411123045
  namespace: velero
spec:
  backupName: demo-backup
  includedNamespaces:
    - velero-demo
```

You can restore to a different namespace using namespace mappings:
```shell
velero restore create --from-backup demo-backup \
  --namespace-mappings velero-demo:new-namespace
```

This takes the backup from velero-demo and recreates all resources in new-namespace instead.
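Under the hood, the mapping is expressed as a namespaceMapping field on the Restore CR. A sketch (the resource name here is illustrative):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: demo-backup-remapped
  namespace: velero
spec:
  backupName: demo-backup
  namespaceMapping:
    velero-demo: new-namespace
```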
Schedules
Schedules automate backup creation on a cron schedule:
```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *" # 2 AM daily
  template:
    includedNamespaces:
      - production
    ttl: 720h
```

The demo shows creating a schedule via the CLI:
```shell
velero schedule create daily-backup \
  --schedule="0 */6 * * *" \
  --include-namespaces velero-demo
```

This creates a backup every 6 hours. The TTL (time to live) controls retention: backups older than their TTL are automatically deleted from object storage.
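The retention arithmetic is simple: creation time plus TTL gives the expiry timestamp. A minimal sketch (single-unit durations only; Velero accepts full Go duration strings like "72h30m"):

```python
from datetime import datetime, timedelta, timezone

def ttl_to_timedelta(ttl: str) -> timedelta:
    """Parse a single-unit duration such as '720h', '90m', or '30s'."""
    units = {"h": "hours", "m": "minutes", "s": "seconds"}
    return timedelta(**{units[ttl[-1]]: float(ttl[:-1])})

def expires_at(created: datetime, ttl: str) -> datetime:
    """A backup created at `created` becomes eligible for garbage
    collection after the returned timestamp."""
    return created + ttl_to_timedelta(ttl)

created = datetime(2025, 4, 11, 2, 0, tzinfo=timezone.utc)
expiry = expires_at(created, "720h")  # 720h = 30 days
```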
Backup Hooks
Hooks let you run commands inside containers before or after a backup. This is critical for databases that need consistent snapshots.
Pre-backup hook (flush database to disk):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-0
  annotations:
    pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "PGPASSWORD=$POSTGRES_PASSWORD pg_dump -U postgres -d mydb > /tmp/backup.sql"]'
    pre.hook.backup.velero.io/timeout: "3m"
```

Velero runs this command before backing up the pod. The command dumps the database to a file inside the container. Then Velero backs up the PVC, which includes the dump file.
Post-backup hook (clean up):
```yaml
post.hook.backup.velero.io/command: '["/bin/rm", "/tmp/backup.sql"]'
```

Without hooks, backing up a running database might capture inconsistent state (half-written transactions, dirty buffers).
Volume Snapshots vs File-Level Backup
Velero supports two approaches for backing up persistent volumes.
Volume Snapshots (CSI)
Volume snapshots use the CSI driver’s snapshot capability. For AWS EBS, this means creating an EBS snapshot. For GCE Persistent Disks, a PD snapshot.
Enable with:
```shell
velero install \
  --use-volume-snapshots=true \
  --snapshot-location-config region=us-east-1
```

When you back up a PVC, Velero triggers a VolumeSnapshot via the CSI driver. The snapshot is stored in the cloud provider’s snapshot system (not in the S3 bucket). The backup tarball contains a reference to the snapshot ID.
On restore, Velero creates a new PVC with dataSource pointing to the snapshot. The CSI driver creates a volume pre-populated with the snapshot data.
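The restored PVC uses the standard Kubernetes dataSource mechanism. A sketch (the PVC and snapshot names here are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-restored
  namespace: velero-demo
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: velero-data-snapshot
```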
Pros:
- Fast. Snapshots are block-level copies.
- Native to the storage system.
- Incremental. Most cloud providers only store changed blocks.
Cons:
- Cloud provider-specific. EBS snapshots only work in AWS.
- Cannot easily move to a different cloud.
- Some CSI drivers do not support snapshots.
File-Level Backup (Restic/Kopia)
Restic and Kopia are backup tools that copy files from a volume into object storage. Velero integrates them as an alternative to CSI snapshots.
Enable with:
```shell
velero install \
  --use-volume-snapshots=false \
  --uploader-type=kopia
```

When you back up a PVC, Velero deploys a helper pod on the same node as the PVC. The helper mounts the PVC and uploads its contents to S3 via Kopia. The data goes into the same S3 bucket as the Kubernetes manifests.
On restore, Velero deploys another helper pod to download the data from S3 and write it into the new PVC.
Pros:
- Cloud-agnostic. Works anywhere.
- Backs up to the same S3 bucket as manifests (simpler management).
- Works with any storage backend (hostPath, NFS, Ceph).
Cons:
- Slower than snapshots. Files are copied byte-by-byte.
- Higher CPU and network usage during backup and restore.
- Requires Velero to schedule helper pods on the same nodes as the volumes.
The demo disables volume snapshots with --use-volume-snapshots=false because minikube’s hostPath provisioner does not support CSI snapshots.
Resource Filtering
Velero lets you control exactly what gets backed up.
Include and Exclude Namespaces
```shell
# Back up specific namespaces
velero backup create prod-backup --include-namespaces production,staging
```
```shell
# Back up everything except specific namespaces
velero backup create all-except-default --exclude-namespaces default,kube-system
```

From the demo:
```shell
velero backup create demo-backup --include-namespaces velero-demo
```

This backs up only resources in the velero-demo namespace.
Label Selectors
```shell
velero backup create app-only --selector app=sample-app
```

From the demo’s sample app:
```yaml
# From manifests/sample-app.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: velero-demo
  labels:
    app: sample-app
```

The ConfigMap, Deployment, and Service all have the app: sample-app label, so the backup includes all three.
This is useful for multi-tenant clusters where different teams share namespaces. Each team labels their resources with team: frontend or team: backend, and they can back up only their own workloads.
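Equality-based label selection is a simple subset match; a minimal sketch of the filtering logic:

```python
def matches(selector, labels):
    """True when every key/value pair in the selector also appears in
    the resource's labels (equality-based selection only)."""
    return all(labels.get(k) == v for k, v in selector.items())

resources = [
    {"name": "web", "labels": {"team": "frontend", "app": "web"}},
    {"name": "api", "labels": {"team": "backend", "app": "api"}},
]
# Only resources labeled team=frontend make it into the backup scope.
backed_up = [r["name"] for r in resources
             if matches({"team": "frontend"}, r["labels"])]
```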
Resource Type Filters
```shell
# Back up everything except ConfigMaps
velero backup create no-configmaps \
  --include-namespaces velero-demo \
  --exclude-resources configmaps
```
```shell
# Back up only Deployments and Services
velero backup create minimal \
  --include-namespaces velero-demo \
  --include-resources deployments,services
```

This is useful for compliance scenarios where Secrets must not leave the cluster. You can exclude Secrets from backups and restore them separately via a secure channel.
Cluster-Scoped Resources
By default, Velero backs up only namespaced resources. Cluster-scoped resources (ClusterRoles, PersistentVolumes, CRDs) require explicit inclusion:
```shell
velero backup create full-cluster \
  --include-cluster-resources=true
```

For namespace-level backups, you typically want --include-cluster-resources=false to avoid conflicts when restoring into a different cluster.
Trade-Offs and Alternatives
Velero vs etcd Snapshot
| Aspect | Velero | etcd Snapshot |
|---|---|---|
| Granularity | Namespace or label-based | Entire cluster |
| Cross-cluster restore | Easy | Complex |
| PV data | Optional (via snapshots or Restic) | Not included |
| Restore speed | Slow (API calls) | Fast (direct etcd restore) |
| Access required | Kubernetes API | etcd access (typically admin-only) |
Use etcd snapshots for full cluster disaster recovery. Use Velero for application-level backup and migration.
Velero vs Kasten K10
Kasten K10 is a commercial Kubernetes backup solution. It provides:
- Integrated UI for backup and restore
- Application-aware backup (automatic hook generation for databases)
- Multi-cluster disaster recovery orchestration
- Advanced policy management (compliance, SLA tracking)
Velero is open source, lightweight, and flexible. K10 is a comprehensive platform with enterprise features and support. If you need backup for dozens of clusters with compliance requirements, K10 may be worth the cost. For most use cases, Velero is sufficient.
Velero vs Application-Specific Backup
For critical databases, consider application-specific backup tools alongside Velero:
- PostgreSQL: pg_dump, pg_basebackup, WAL archiving
- MySQL: mysqldump, Percona XtraBackup
- MongoDB: mongodump, Ops Manager backups
Velero provides cluster-level recovery. Application-specific tools provide point-in-time recovery and transaction-level granularity. Use both.
Production Considerations
Scheduled Backups with Retention
Configure automated backups with appropriate TTLs:
```shell
# Daily full backup, 30-day retention
velero schedule create daily \
  --schedule="0 2 * * *" \
  --include-namespaces production \
  --ttl 720h
```
```shell
# Weekly backup, 90-day retention
velero schedule create weekly \
  --schedule="0 3 * * 0" \
  --include-namespaces production \
  --ttl 2160h
```

Monitor backup success via Prometheus metrics or the Velero CLI:
```shell
velero backup get
velero backup describe daily-20250411020000
```

Set up alerts when backups fail or take too long.
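A sketch of such an alert as a PrometheusRule, assuming the Prometheus Operator is installed. The metric velero_backup_failure_total is one of the counters the Velero server exposes, but verify metric names against your Velero version:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: velero-alerts
  namespace: velero
spec:
  groups:
    - name: velero
      rules:
        - alert: VeleroBackupFailed
          expr: increase(velero_backup_failure_total[1h]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Velero reported backup failures in the last hour"
```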
Cross-Cluster Restore for Disaster Recovery
Test your disaster recovery plan by restoring into a separate cluster:
- Stand up a new cluster in a different region or cloud.
- Install Velero with the same BackupStorageLocation configuration.
- Restore from the latest backup.
```shell
# In the DR cluster
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket my-backup-bucket \
  --backup-location-config region=us-east-1 \
  --secret-file ./credentials-velero \
  --no-default-backup-location=false
```
```shell
velero restore create dr-restore --from-backup daily-20250411020000
```

Common issues:
- StorageClass mismatch. The DR cluster may not have the same StorageClasses. Use restore mappings to translate.
- LoadBalancer IP conflicts. Services with type: LoadBalancer get new IPs in the DR cluster. Update DNS.
- PVC provisioning delays. Large PVs take time to restore from snapshots.
Backup Encryption
Object storage should be encrypted at rest (S3 server-side encryption, GCS encryption). For additional security, watch for Velero client-side encryption, which is still in development as of 2025.
RBAC and Multi-Tenancy
Grant namespace-scoped backup permissions:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: velero-namespace-backup
  namespace: team-a
rules:
  - apiGroups: ["velero.io"]
    resources: ["backups", "restores"]
    verbs: ["create", "get", "list"]
```

Users can back up and restore their own namespaces without cluster-admin access.
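To take effect, a Role needs a matching RoleBinding. A sketch, where the group name team-a-developers is illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: velero-namespace-backup
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: velero-namespace-backup
subjects:
  - kind: Group
    name: team-a-developers
    apiGroup: rbac.authorization.k8s.io
```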
Backup Size and Performance
Large clusters generate large backups. A cluster with 50,000 resources can produce multi-GB tarballs. This impacts:
- Upload time to S3 (may take 10+ minutes)
- Download time during restore
- S3 storage costs
Mitigate with:
- Exclude unnecessary resources (logs, temporary workloads)
- Use separate schedules for critical and non-critical namespaces
- Implement backup retention policies to delete old backups
Common Pitfalls
PV Backup Gotchas
Problem: You back up a namespace, delete everything, restore, and the PVCs are pending.
Cause: PersistentVolumes are cluster-scoped. Your backup included PVCs but not the underlying PVs.
Solution: Either include cluster resources (--include-cluster-resources=true) or use CSI/Restic volume backup to capture PV data, not just metadata.
Problem: Restored PVCs bind to the wrong PVs.
Cause: PV names are globally unique. If you restore into the same cluster, name collisions can occur.
Solution: Delete PVs before restoring, or use namespace mappings to isolate the restored resources.
CRD Ordering
Problem: Restore fails with “the server could not find the requested resource” errors.
Cause: Custom resources (CRs) are restored before their CustomResourceDefinitions (CRDs).
Solution: Velero restores CRDs first by default. If you excluded cluster resources, restore CRDs manually before restoring CRs:
```shell
velero restore create crds-only \
  --from-backup demo-backup \
  --include-cluster-resources=true \
  --include-resources customresourcedefinitions
```
```shell
velero restore create app-restore \
  --from-backup demo-backup \
  --include-namespaces velero-demo
```

Namespace Conflicts on Restore
Problem: Restore fails with “namespace already exists” errors.
Cause: You are restoring into a cluster that already has the namespace.
Solution: Use namespace mappings or delete the existing namespace first. Velero does not overwrite existing resources by default. You can force updates with --existing-resource-policy=update:

```shell
velero restore create --from-backup demo-backup \
  --existing-resource-policy=update
```

This updates existing resources instead of skipping them. Use it cautiously, as it can overwrite live configuration.
Velero Server Pod Crashes
Problem: The Velero pod is in CrashLoopBackOff.
Cause: Invalid BackupStorageLocation configuration (wrong S3 endpoint, bad credentials).
Solution: Check logs:
```shell
kubectl logs -n velero deployment/velero
```

Common errors:
- NoSuchBucket: The S3 bucket does not exist.
- InvalidAccessKeyId: Credentials are wrong.
- RequestTimeout: Network connectivity issue to S3.
Fix the configuration and restart the Velero pod.
Restic/Kopia Init Container Hangs
Problem: File-level volume backups never complete.
Cause: The Restic/Kopia helper pod cannot mount the PVC (wrong node, security context issues).
Solution: Check the helper pod logs:
```shell
kubectl logs -n velero <restic-pod-name>
```

Ensure the PVC’s access mode allows the helper pod to mount it. If using ReadWriteOnce, the helper must run on the same node as the original pod. If that node is down, the backup cannot proceed.
Further Reading
- Velero documentation
- Velero GitHub repository
- Backup and restore best practices
- CSI snapshot documentation
- Kopia project
- Disaster recovery patterns for Kubernetes
See Also
- How to run the demo
- Persistent Volumes for storage fundamentals
- StatefulSet for stateful workload backup considerations