CRDs & Operators: Deep Dive
This document explains how CustomResourceDefinitions extend the Kubernetes API, why the operator pattern uses a reconciliation loop, and when to use features like status subresources, finalizers, and owner references. It connects the demo’s shell-based operator to production patterns used by CloudNativePG, cert-manager, and similar projects.
What CRDs Do
A CustomResourceDefinition (CRD) teaches the Kubernetes API server about a new resource type. After applying the demo’s CRD, you can use kubectl get websites just like kubectl get pods.
The CRD is pure API registration. It stores your custom objects in etcd. It validates them against a schema. It supports CRUD operations. But it does nothing else. No pods are created. No side effects happen. CRDs are data, not behavior.
The demo’s CRD:

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: websites.demo.example.com
    spec:
      group: demo.example.com
      names:
        kind: Website
        listKind: WebsiteList
        plural: websites
        singular: website
        shortNames:
          - ws
      scope: Namespaced
      versions:
        - name: v1
          served: true
          storage: true

Key fields:
- group: The API group. Combined with the version, this gives demo.example.com/v1.
- names: How the resource appears in kubectl (websites, ws).
- scope: Namespaced or Cluster. Most CRDs are namespaced.
- versions: One or more API versions. Exactly one must be the storage version.
OpenAPI v3 Validation Schema
The CRD schema defines which fields are allowed and what constraints apply to them:

    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required:
              - title
              - replicas
            properties:
              title:
                type: string
                description: Title displayed on the website
              replicas:
                type: integer
                minimum: 1
                maximum: 5
                description: Number of pods to run
              color:
                type: string
                default: "#2196F3"
                description: Theme color for the website
          status:
            type: object
            properties:
              availableReplicas:
                type: integer
              url:
                type: string

The API server validates every create and update request against this schema. If replicas is set to 10, the request is rejected because it exceeds maximum: 5. If title is omitted, the request is rejected because the field is required.
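The checks described above can be sketched in plain Go. This is a standalone illustration of the rules, not the API server's implementation: WebsiteSpec and validateSpec are hypothetical names, a nil pointer models an omitted replicas field, and an empty string stands in for an omitted title.

```go
package main

import (
	"errors"
	"fmt"
)

// WebsiteSpec mirrors the spec fields from the demo schema (hypothetical Go type).
type WebsiteSpec struct {
	Title    string // "" stands in for an omitted field (a simplification)
	Replicas *int   // nil models an omitted field
	Color    string
}

// validateSpec applies the rules the schema declares: title and replicas
// are required, replicas must lie in [1, 5], and color gets its default.
func validateSpec(s *WebsiteSpec) error {
	if s.Title == "" {
		return errors.New("spec.title: Required value")
	}
	if s.Replicas == nil {
		return errors.New("spec.replicas: Required value")
	}
	if *s.Replicas < 1 || *s.Replicas > 5 {
		return fmt.Errorf("spec.replicas: %d is outside [1, 5]", *s.Replicas)
	}
	if s.Color == "" {
		s.Color = "#2196F3" // schema default
	}
	return nil
}

func main() {
	ten := 10
	// replicas: 10 violates maximum: 5, so an error is printed.
	fmt.Println(validateSpec(&WebsiteSpec{Title: "My Blog", Replicas: &ten}))
}
```

The error strings are invented for the sketch; the real API server reports its own messages.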
Validation Features
| Feature | Purpose | Example |
|---|---|---|
| type | Data type | string, integer, object, array |
| required | Mandatory fields | ["title", "replicas"] |
| minimum/maximum | Numeric bounds | minimum: 1, maximum: 5 |
| pattern | Regex validation | pattern: "^#[0-9a-fA-F]{6}$" |
| enum | Allowed values | enum: ["small", "medium", "large"] |
| default | Default value if not specified | default: "#2196F3" |
| format | Semantic format | format: date-time |
| x-kubernetes-validations | CEL expressions (Kubernetes 1.25+) | Custom logic |
CEL Validation (Kubernetes 1.25+)
Common Expression Language (CEL) allows writing validation rules that span multiple fields:

    x-kubernetes-validations:
      - rule: "self.minReplicas <= self.maxReplicas"
        message: "minReplicas must not exceed maxReplicas"

CEL validation runs server-side without needing a webhook.
Printer Columns
Printer columns control what kubectl get displays:

    additionalPrinterColumns:
      - name: Title
        type: string
        jsonPath: .spec.title
      - name: Replicas
        type: integer
        jsonPath: .spec.replicas
      - name: Color
        type: string
        jsonPath: .spec.color
      - name: Age
        type: date
        jsonPath: .metadata.creationTimestamp

Running kubectl get websites -n crd-demo produces:

    NAME        TITLE                  REPLICAS   COLOR     AGE
    my-blog     My Personal Blog       2          #4CAF50   5m
    docs-site   Documentation Portal   1          #FF9800   5m

Without printer columns, kubectl get only shows NAME and AGE. Printer columns make custom resources feel like first-class citizens.
Status Subresource
The demo enables the status subresource:

    subresources:
      status: {}

This creates a separate API endpoint for the status: /apis/demo.example.com/v1/namespaces/crd-demo/websites/my-blog/status.
Why Separate Status
Without the status subresource, updating the status requires updating the entire object. This means the controller must have update permission on the full resource, and status updates can conflict with spec updates from users.
With the status subresource:
- Users update .spec via the main endpoint.
- Controllers update .status via the /status endpoint.
- RBAC can grant different permissions for spec and status.
- Changes to .status do not increment .metadata.generation.
This separation is essential for the operator pattern. The user declares desired state in
.spec. The operator reports actual state in .status.
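The split between the two endpoints can be modeled in a few lines of Go. This is a sketch of the semantics only, with hypothetical in-memory types (Website, updateMain, updateStatus) standing in for the API server's storage: a write to the main endpoint applies only the spec, a write to /status applies only the status, and generation moves only when the spec changes.

```go
package main

import "fmt"

// Website is a hypothetical in-memory stand-in for the custom resource.
type Website struct {
	Generation int64
	Spec       Spec
	Status     Status
}

type Spec struct {
	Title    string
	Replicas int
}

type Status struct {
	AvailableReplicas int
}

// updateMain models a write to the main endpoint with the status
// subresource enabled: only spec is applied, status is ignored, and
// generation is bumped when spec actually changes.
func updateMain(stored *Website, incoming Website) {
	if stored.Spec != incoming.Spec {
		stored.Spec = incoming.Spec
		stored.Generation++
	}
}

// updateStatus models a write to the /status endpoint: only status is
// applied, and generation is left untouched.
func updateStatus(stored *Website, incoming Website) {
	stored.Status = incoming.Status
}

func main() {
	w := &Website{Generation: 1, Spec: Spec{Title: "My Blog", Replicas: 2}}

	// The controller reports status; a stray spec change in the same request is dropped.
	updateStatus(w, Website{Spec: Spec{Replicas: 5}, Status: Status{AvailableReplicas: 2}})
	fmt.Println(w.Generation, w.Spec.Replicas, w.Status.AvailableReplicas) // 1 2 2

	// A user scales the site; generation increments and status is preserved.
	updateMain(w, Website{Spec: Spec{Title: "My Blog", Replicas: 3}})
	fmt.Println(w.Generation, w.Spec.Replicas, w.Status.AvailableReplicas) // 2 3 2
}
```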
CRD Versioning and Storage Versions
CRDs can serve multiple API versions simultaneously:

    versions:
      - name: v1
        served: true
        storage: true
      - name: v1beta1
        served: true
        storage: false

- served: Clients can read and write this version via the API.
- storage: Objects are stored in etcd in this version. Exactly one version must be the storage version.
When a client writes a v1beta1 object, the API server converts it to v1 before storing
it. When a client reads a v1beta1 object, the API server converts the stored v1 object
back to v1beta1.
Conversion Webhooks
For non-trivial version differences, you need a conversion webhook. The API server sends the object to your webhook, which converts between versions:

    spec:
      conversion:
        strategy: Webhook
        webhook:
          conversionReviewVersions: ["v1"]
          clientConfig:
            service:
              name: website-converter
              namespace: crd-demo
              path: /convert

The webhook receives a ConversionReview request and returns the converted object. This allows you to rename fields, change types, add defaults, and handle any structural changes between versions.
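Without a conversion webhook, the API server uses the None strategy, which only rewrites the apiVersion field and leaves the rest of the object untouched. That is safe only when the schemas are effectively identical across versions (essentially just renaming the version).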
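The core of a conversion webhook is a pure function from one version's shape to another's. The sketch below shows only that function, not the HTTP handler or the ConversionReview envelope. The field rename (count in v1beta1, replicas in v1) is hypothetical and chosen purely for illustration; the source does not say how the demo's versions differ.

```go
package main

import "fmt"

// convert rewrites an object between the two served versions.
// The rename of "count" (v1beta1) to "replicas" (v1) is a made-up
// difference used only to show a non-trivial conversion.
func convert(obj map[string]any, to string) (map[string]any, error) {
	out := map[string]any{}
	for k, v := range obj {
		out[k] = v // copy so the caller's object is not mutated
	}
	from, _ := obj["apiVersion"].(string)
	switch {
	case from == to:
		return out, nil
	case from == "demo.example.com/v1beta1" && to == "demo.example.com/v1":
		out["replicas"] = out["count"]
		delete(out, "count")
	case from == "demo.example.com/v1" && to == "demo.example.com/v1beta1":
		out["count"] = out["replicas"]
		delete(out, "replicas")
	default:
		return nil, fmt.Errorf("unsupported conversion %q -> %q", from, to)
	}
	out["apiVersion"] = to
	return out, nil
}

func main() {
	old := map[string]any{"apiVersion": "demo.example.com/v1beta1", "title": "Docs", "count": 2}
	stored, _ := convert(old, "demo.example.com/v1")
	fmt.Println(stored["apiVersion"], stored["replicas"])
}
```

A conversion should round-trip: converting to the storage version and back should return an equivalent object, otherwise clients of the older version lose data.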
The Operator Pattern
An operator is a controller that watches custom resources and reconciles the cluster state to match them. The demo implements this as a shell script:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: website-operator
      namespace: crd-demo
    spec:
      replicas: 1
      template:
        spec:
          serviceAccountName: website-operator
          containers:
            - name: operator
              image: busybox:1.36
              command: ["/bin/sh", "/scripts/reconcile.sh"]

The operator runs as a regular Deployment with a dedicated ServiceAccount that has the necessary RBAC permissions:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: website-operator
      namespace: crd-demo
    rules:
      - apiGroups: ["demo.example.com"]
        resources: ["websites", "websites/status"]
        verbs: ["get", "list", "watch", "update", "patch"]
      - apiGroups: ["apps"]
        resources: ["deployments"]
        verbs: ["get", "list", "watch", "create", "update", "delete"]
      - apiGroups: [""]
        resources: ["services"]
        verbs: ["get", "list", "watch", "create", "update", "delete"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create"]

The Reconciliation Loop
The demo’s operator runs a simple loop:

    while true; do
      # List all Website CRs
      # For each Website, create or update Deployment + Service
      sleep 10
    done

It polls every 10 seconds, reads all Website resources, and ensures a matching Deployment and Service exist for each one.
Level-Triggered vs Edge-Triggered
The demo’s polling approach is level-triggered. It checks the current state on every cycle, regardless of what changed. If it missed an event, it catches up on the next poll.
Real operators use watches (event streams from the API server) for efficiency. A watch is edge-triggered: it fires when something changes. But production operators are still designed to be level-triggered in their logic. The watch triggers a reconciliation, but the reconciliation function always reads the full current state and computes the diff. It never relies on the event alone.
This distinction matters. A level-triggered controller is idempotent. You can restart it at any time. It reads the current state and converges. An edge-triggered controller that misses an event might leave the system in an inconsistent state.
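A toy comparison makes the failure mode concrete. In this sketch (hypothetical types, no Kubernetes client involved), one controller applies deltas carried by events and the other copies the full desired state whenever it is triggered. One notification is deliberately lost.

```go
package main

import "fmt"

// edgeController tracks replicas by applying deltas from events.
// If an event is lost, its view drifts from reality permanently.
type edgeController struct{ replicas int }

func (c *edgeController) onEvent(delta int) { c.replicas += delta }

// levelController ignores event payloads and reads the full desired state
// on every reconcile, so a missed trigger is repaired by the next one.
type levelController struct{ replicas int }

func (c *levelController) reconcile(desired int) { c.replicas = desired }

func main() {
	edge := &edgeController{replicas: 1}
	level := &levelController{replicas: 1}

	// Change 1: scale 1 -> 3. Both controllers are notified.
	desired := 3
	edge.onEvent(+2)
	level.reconcile(desired)

	// Change 2: scale 3 -> 2. The notification is lost for both.
	desired = 2

	// Change 3: scale 2 -> 4. Both controllers are notified again.
	desired = 4
	edge.onEvent(+2)
	level.reconcile(desired)

	fmt.Println("desired:", desired, "edge:", edge.replicas, "level:", level.replicas)
	// desired: 4 edge: 5 level: 4
}
```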
The Reconcile Function
In production operators (using controller-runtime in Go), the reconcile function receives a Request containing the namespace and name of the changed object:

    func (r *WebsiteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        // 1. Fetch the Website CR
        var website v1.Website
        if err := r.Get(ctx, req.NamespacedName, &website); err != nil {
            return ctrl.Result{}, client.IgnoreNotFound(err)
        }

        // 2. Create or update the Deployment
        // 3. Create or update the Service
        // 4. Update the Website status

        return ctrl.Result{}, nil
    }

The function runs whenever the Website or any owned resource changes. It computes the difference between desired state and actual state, then takes the minimum action to converge.
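The "compute the difference, take the minimum action" step can be isolated as a pure function. The sketch below uses a hypothetical, trimmed-down Deployment type rather than the real appsv1.Deployment, and returns the name of the action instead of calling the API.

```go
package main

import "fmt"

// Deployment is a hypothetical, trimmed-down view of the child object.
type Deployment struct {
	Name     string
	Replicas int
}

// plan compares the desired child with the observed one (nil if absent)
// and returns the single action needed to converge, or "none".
func plan(desired Deployment, actual *Deployment) string {
	switch {
	case actual == nil:
		return "create"
	case *actual != desired:
		return "update"
	default:
		return "none"
	}
}

func main() {
	desired := Deployment{Name: "website-my-blog", Replicas: 2}
	fmt.Println(plan(desired, nil))                                                 // create
	fmt.Println(plan(desired, &Deployment{Name: "website-my-blog", Replicas: 1})) // update
	fmt.Println(plan(desired, &Deployment{Name: "website-my-blog", Replicas: 2})) // none
}
```

Because plan depends only on current state, running it twice in a row yields "none" the second time, which is what makes reconciliation idempotent.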
Watches and Informers
Watches
The Kubernetes API supports watch requests. A watch opens a long-lived HTTP connection and streams events (ADDED, MODIFIED, DELETED) for a resource type.

    GET /apis/demo.example.com/v1/namespaces/crd-demo/websites?watch=true

The operator receives events in real time instead of polling.
Informers
An informer is a client-side cache backed by a watch. It:
- Lists all resources of a type (initial sync).
- Opens a watch to receive updates.
- Maintains an in-memory cache of all resources.
- Calls registered event handlers when resources change.
Informers are the backbone of controller-runtime. They make reads fast (served from the local cache) and reactions efficient (handlers fire when something changes, instead of polling).
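The four responsibilities listed above fit in a small struct. This is a single-threaded miniature with hypothetical names (Informer, Process, AddHandler); the real client-go informer adds resync, indexing, and concurrency control that are left out here.

```go
package main

import "fmt"

// Event mirrors the watch event types named above.
type Event struct {
	Type string // ADDED, MODIFIED, or DELETED
	Name string
	Obj  string // stand-in for the full object
}

// Informer is a hypothetical miniature of the pattern: a cache seeded by
// a list call and kept current by watch events.
type Informer struct {
	cache    map[string]string
	handlers []func(Event)
}

// NewInformer performs the initial sync from a list result.
func NewInformer(initial map[string]string) *Informer {
	inf := &Informer{cache: map[string]string{}}
	for name, obj := range initial {
		inf.cache[name] = obj
	}
	return inf
}

// AddHandler registers a callback invoked for every processed event.
func (i *Informer) AddHandler(h func(Event)) { i.handlers = append(i.handlers, h) }

// Process applies one watch event to the cache, then notifies handlers.
func (i *Informer) Process(ev Event) {
	if ev.Type == "DELETED" {
		delete(i.cache, ev.Name)
	} else {
		i.cache[ev.Name] = ev.Obj
	}
	for _, h := range i.handlers {
		h(ev)
	}
}

// Get reads from the local cache without contacting the API server.
func (i *Informer) Get(name string) (string, bool) {
	obj, ok := i.cache[name]
	return obj, ok
}

func main() {
	inf := NewInformer(map[string]string{"my-blog": "replicas=2"})
	inf.AddHandler(func(ev Event) { fmt.Println("enqueue", ev.Name) })
	inf.Process(Event{Type: "MODIFIED", Name: "my-blog", Obj: "replicas=3"})
	fmt.Println(inf.Get("my-blog"))
}
```

Note that the handler only enqueues the object's name. The reconcile function later reads the current object from the cache, which keeps the logic level-triggered.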
SharedInformerFactory
Multiple controllers can share informers for the same resource type. This avoids redundant API server connections and reduces memory usage.
Owner References and Garbage Collection
Owner references link child resources to their parent. When the parent is deleted, the garbage collector automatically deletes the children.
A production operator sets owner references on created resources:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: website-my-blog
      ownerReferences:
        - apiVersion: demo.example.com/v1
          kind: Website
          name: my-blog
          uid: abc-123-def
          controller: true
          blockOwnerDeletion: true

When the Website my-blog is deleted, Kubernetes automatically deletes the website-my-blog Deployment. No operator intervention needed.
The demo’s shell operator does not set owner references (the JSON is simplified for learning). In production, this is essential. Without owner references, deleting a Website CR leaves orphaned Deployments and Services behind.
Cascade Delete Policies
- Foreground: Children are deleted first. The parent is deleted after all children are gone.
- Background: The parent is deleted immediately. Children are garbage collected asynchronously.
- Orphan: Children are not deleted. They become standalone resources.
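The difference between background cascading and orphaning can be sketched against an in-memory store. The types and the deleteOwner function are hypothetical, and the sketch handles a single level of ownership only; the real garbage collector walks the whole dependency graph.

```go
package main

import "fmt"

// Object is a hypothetical minimal resource: a UID plus the UIDs of its owners.
type Object struct {
	Name   string
	UID    string
	Owners []string
}

// deleteOwner removes the owner from the store. With orphan=false it models
// background cascading: dependents left with no owner are collected.
// With orphan=true the dependents stay but lose the owner reference.
func deleteOwner(store map[string]*Object, ownerUID string, orphan bool) {
	delete(store, ownerUID)
	for uid, obj := range store {
		var remaining []string
		for _, o := range obj.Owners {
			if o != ownerUID {
				remaining = append(remaining, o)
			}
		}
		if len(remaining) == len(obj.Owners) {
			continue // not a dependent of the deleted owner
		}
		obj.Owners = remaining
		if !orphan && len(remaining) == 0 {
			delete(store, uid)
		}
	}
}

func main() {
	store := map[string]*Object{
		"abc-123-def": {Name: "my-blog", UID: "abc-123-def"},
		"dep-1":       {Name: "website-my-blog", UID: "dep-1", Owners: []string{"abc-123-def"}},
		"svc-1":       {Name: "website-my-blog", UID: "svc-1", Owners: []string{"abc-123-def"}},
	}
	deleteOwner(store, "abc-123-def", false)
	fmt.Println("objects left:", len(store)) // objects left: 0
}
```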
Finalizers
Finalizers prevent a resource from being deleted until cleanup is complete. A finalizer is a string in .metadata.finalizers. When a delete is requested while any finalizer is present, the API server sets .metadata.deletionTimestamp and the resource enters a Terminating state, but it is not removed from etcd.
The operator’s workflow:
- Add finalizer when creating/reconciling the resource:

      metadata:
        finalizers:
          - websites.demo.example.com/cleanup

- Detect deletion: Check if .metadata.deletionTimestamp is set.
- Run cleanup: Delete external resources, revoke certificates, clean up cloud resources.
- Remove finalizer: Patch the object to remove the finalizer string.
- Kubernetes deletes the object once all finalizers are removed.
Finalizers are essential when your operator manages resources outside Kubernetes (cloud load balancers, DNS records, external databases).
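The workflow above maps onto a small branch in the reconcile function. In this sketch the Website type is an in-memory stand-in, the Deleting flag models a non-nil deletionTimestamp, and cleanup is a hypothetical hook for releasing external resources.

```go
package main

import "fmt"

const cleanupFinalizer = "websites.demo.example.com/cleanup"

// Website is a hypothetical in-memory stand-in for the custom resource.
type Website struct {
	Name       string
	Deleting   bool // models a non-nil .metadata.deletionTimestamp
	Finalizers []string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func without(list []string, s string) []string {
	var out []string
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

// reconcile returns true once the object may be removed from storage
// (deletion requested and no finalizers left).
func reconcile(w *Website, cleanup func(name string) error) (bool, error) {
	if !w.Deleting {
		if !contains(w.Finalizers, cleanupFinalizer) {
			w.Finalizers = append(w.Finalizers, cleanupFinalizer)
		}
		return false, nil
	}
	if contains(w.Finalizers, cleanupFinalizer) {
		if err := cleanup(w.Name); err != nil {
			return false, err // finalizer stays, so cleanup is retried later
		}
		w.Finalizers = without(w.Finalizers, cleanupFinalizer)
	}
	return len(w.Finalizers) == 0, nil
}

func main() {
	w := &Website{Name: "my-blog"}
	reconcile(w, nil) // normal reconcile: the finalizer is added
	w.Deleting = true // the user runs kubectl delete
	done, _ := reconcile(w, func(name string) error {
		fmt.Println("cleaning up external resources for", name)
		return nil
	})
	fmt.Println("removable:", done)
}
```

Leaving the finalizer in place when cleanup fails is the important design choice: the object stays in Terminating and the next reconcile retries.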
The controller-runtime Framework (Go)
Most production operators are written in Go using the controller-runtime library. It provides:
Manager
Sets up shared dependencies: the API client, informer caches, leader election.

    mgr, err := ctrl.NewManager(cfg, ctrl.Options{
        Scheme:           scheme,
        LeaderElection:   true,
        LeaderElectionID: "website-operator",
    })

Controller
Registers a reconciler with watches:

    ctrl.NewControllerManagedBy(mgr).
        For(&v1.Website{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.Service{}).
        Complete(reconciler)

- For(&v1.Website{}): Watch Website resources. Trigger reconciliation on changes.
- Owns(&appsv1.Deployment{}): Watch Deployments owned by Websites. If a Deployment changes, reconcile the owning Website.
- Complete(reconciler): Wire up the reconciler function.
Leader Election
In production, operators run with multiple replicas for high availability. Only one replica (the leader) actively reconciles. If the leader crashes, another replica takes over.
Leader election uses a Lease object in the cluster. The leader holds the lease and renews it periodically. If the lease expires, another replica acquires it.
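The acquire, renew, and expire cycle can be sketched with an in-memory lease. The Lease struct and tryAcquire below are hypothetical and use plain Unix seconds. A real implementation updates the Lease object through the API server and relies on its optimistic concurrency to resolve races between candidates, which this sketch does not model.

```go
package main

import "fmt"

// Lease is a hypothetical in-memory stand-in for the Lease object.
type Lease struct {
	Holder          string
	RenewedAt       int64 // Unix seconds of the last acquire or renew
	DurationSeconds int64
}

// tryAcquire lets a candidate take or renew the lease. It succeeds when the
// lease is unheld, already held by the candidate, or expired.
func tryAcquire(l *Lease, candidate string, now int64) bool {
	expired := now-l.RenewedAt > l.DurationSeconds
	if l.Holder == "" || l.Holder == candidate || expired {
		l.Holder = candidate
		l.RenewedAt = now
		return true
	}
	return false
}

func main() {
	lease := &Lease{DurationSeconds: 15}

	fmt.Println(tryAcquire(lease, "replica-a", 0))  // true: lease was unheld
	fmt.Println(tryAcquire(lease, "replica-b", 5))  // false: replica-a holds a fresh lease
	fmt.Println(tryAcquire(lease, "replica-a", 10)) // true: the leader renews
	// replica-a crashes and stops renewing.
	fmt.Println(tryAcquire(lease, "replica-b", 30)) // true: lease expired, replica-b takes over
}
```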
How Production Operators Implement These Patterns
CloudNativePG
The CloudNativePG operator manages PostgreSQL clusters. It demonstrates:
- Multi-version CRD: Supports v1 with full validation.
- Status subresource: Reports cluster health, replication lag, backup status.
- Owner references: The Cluster CR owns Pods, PVCs, Services, ConfigMaps.
- Finalizers: Cleans up PVCs, certificates, and backup repositories on deletion.
- Watches: Watches Cluster CRs, Pods, Nodes, and Secrets.
- Pod ordinal management: Uses stable pod names for primary/replica identity (similar to StatefulSets but with custom logic).
cert-manager
The cert-manager operator manages TLS certificates. It demonstrates:
- Multiple CRDs: Certificate, Issuer, ClusterIssuer, CertificateRequest, Order, Challenge.
- Cross-resource reconciliation: A Certificate triggers a CertificateRequest, which triggers an Order, which triggers Challenges.
- Finalizers: Cleans up ACME challenges (DNS records, HTTP endpoints) on deletion.
- Status conditions: Reports certificate readiness, expiration, and renewal status.
- Aggregated ClusterRoles: Adds certificate permissions to the built-in admin and edit roles.
Building Your Own Operator: Decision Framework
Do You Need an Operator?
Not every CRD needs an operator. Sometimes a CRD is just structured configuration storage, and an external system (CI/CD pipeline, GitOps tool) reads it.
You need an operator when:
- Custom resources should create or manage other Kubernetes resources.
- The reconciliation logic requires continuous monitoring and correction.
- Resources have lifecycle hooks (creation, update, deletion) with side effects.
Shell vs Go vs Python
| Approach | Best For | Limitations |
|---|---|---|
| Shell script | Learning, prototyping | No watches, no proper error handling |
| Python (kopf) | Medium complexity, rapid development | Performance at scale |
| Go (controller-runtime) | Production operators | Steeper learning curve |
| Java (JOSDK) | Java shops | Higher resource usage |
The demo uses a shell script. It polls, does crude JSON parsing, and has no error recovery. This is fine for understanding the concept. For production, use controller-runtime.
Connection to the Demo
The demo builds up the operator pattern step by step:
1. CRD registration: website-crd.yaml teaches Kubernetes about Websites.
2. Custom resources without an operator: website-samples.yaml creates Websites. They exist in etcd but produce no pods.
3. Operator deployment: The shell-based operator polls for Websites and creates Deployments and Services.
4. Reconciliation: Changing a Website’s replicas or title triggers the operator to update the Deployment.
The gap between step 2 (data only) and step 3 (data plus behavior) is the core insight. CRDs provide the API. Operators provide the automation.
Common Pitfalls
Infinite Reconciliation Loops
If the operator updates its own CR’s status, and the status update triggers a reconciliation,
you get an infinite loop. Use generation-based checks: only reconcile when
.metadata.generation changes (spec changes), not when .metadata.resourceVersion changes
(any change including status).
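One common way to apply the generation check is to record the generation the operator last acted on and compare against it. The sketch below assumes a status field named ObservedGeneration, which is a widespread convention but is not part of the demo's schema. In controller-runtime the same idea is also available as an event filter, predicate.GenerationChangedPredicate.

```go
package main

import "fmt"

// Website is a hypothetical stand-in with only the fields the check needs.
type Website struct {
	Generation         int64 // bumped by the API server on spec changes
	ObservedGeneration int64 // recorded in status by the operator (assumed field)
}

// needsReconcile reports whether the spec changed since the last
// successful reconcile.
func needsReconcile(w Website) bool { return w.Generation != w.ObservedGeneration }

// reconcile does the work (elided) and records the generation it acted on.
// The status write does not change Generation, so the trigger it causes
// is skipped instead of looping.
func reconcile(w *Website) bool {
	if !needsReconcile(*w) {
		return false
	}
	w.ObservedGeneration = w.Generation
	return true
}

func main() {
	w := &Website{Generation: 1}
	fmt.Println(reconcile(w)) // true: first reconcile does the work
	fmt.Println(reconcile(w)) // false: triggered by its own status update, skipped
	w.Generation = 2          // the user edits the spec
	fmt.Println(reconcile(w)) // true: spec changed, reconcile again
}
```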
Missing RBAC
Operators need explicit permissions for every resource they touch. The demo’s operator needs access to websites, deployments, services, and events. Missing a permission causes authorization errors at runtime.
Not Setting Owner References
Without owner references, deleting a CR leaves child resources behind. Over time, these orphaned resources accumulate.
Polling Instead of Watching
The demo polls every 10 seconds. On a cluster with thousands of CRs, this creates unnecessary API server load. Production operators use watches for efficiency.
Further Reading
- Kubernetes CRD documentation
- Operator pattern
- controller-runtime
- Operator SDK
- kubebuilder
- CloudNativePG
- cert-manager