CRDs & Operators: Deep Dive

This document explains how CustomResourceDefinitions extend the Kubernetes API, why the operator pattern uses a reconciliation loop, and when to use features like status subresources, finalizers, and owner references. It connects the demo’s shell-based operator to production patterns used by CloudNativePG, cert-manager, and similar projects.


A CustomResourceDefinition teaches the Kubernetes API server about a new resource type. After applying the demo’s CRD, you can use kubectl get websites just like kubectl get pods.

The CRD is pure API registration. It stores your custom objects in etcd. It validates them against a schema. It supports CRUD operations. But it does nothing else. No pods are created. No side effects happen. CRDs are data, not behavior.

The demo’s CRD:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: websites.demo.example.com
spec:
  group: demo.example.com
  names:
    kind: Website
    listKind: WebsiteList
    plural: websites
    singular: website
    shortNames:
      - ws
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true

Key fields:

  • group: The API group. Combined with the version, this gives demo.example.com/v1.
  • names: How the resource appears in kubectl (websites, ws).
  • scope: Namespaced or Cluster. Most CRDs are namespaced.
  • versions: One or more API versions. Exactly one must be the storage version.

The CRD schema defines what fields are allowed and their constraints:

schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        required:
          - title
          - replicas
        properties:
          title:
            type: string
            description: Title displayed on the website
          replicas:
            type: integer
            minimum: 1
            maximum: 5
            description: Number of pods to run
          color:
            type: string
            default: "#2196F3"
            description: Theme color for the website
      status:
        type: object
        properties:
          availableReplicas:
            type: integer
          url:
            type: string
The API server validates every create and update request against this schema. If replicas is set to 10, the request is rejected because maximum: 5. If title is omitted, the request is rejected because it is required.

| Feature | Purpose | Example |
| --- | --- | --- |
| type | Data type | string, integer, object, array |
| required | Mandatory fields | ["title", "replicas"] |
| minimum/maximum | Numeric bounds | minimum: 1, maximum: 5 |
| pattern | Regex validation | pattern: "^#[0-9a-fA-F]{6}$" |
| enum | Allowed values | enum: ["small", "medium", "large"] |
| default | Default value if not specified | default: "#2196F3" |
| format | Semantic format | format: date-time |
| x-kubernetes-validations | CEL expressions (Kubernetes 1.25+) | Custom logic |

Common Expression Language allows writing validation rules that span multiple fields:

x-kubernetes-validations:
  - rule: "self.minReplicas <= self.maxReplicas"
    message: "minReplicas must not exceed maxReplicas"

CEL validation runs server-side without needing a webhook.


Printer columns control what kubectl get displays:

additionalPrinterColumns:
  - name: Title
    type: string
    jsonPath: .spec.title
  - name: Replicas
    type: integer
    jsonPath: .spec.replicas
  - name: Color
    type: string
    jsonPath: .spec.color
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp

Running kubectl get websites -n crd-demo produces:

NAME        TITLE                  REPLICAS   COLOR     AGE
my-blog     My Personal Blog       2          #4CAF50   5m
docs-site   Documentation Portal   1          #FF9800   5m

Without printer columns, kubectl get only shows NAME and AGE. Printer columns make custom resources feel like first-class citizens.


The demo enables the status subresource:

subresources:
  status: {}

This creates a separate API endpoint for the status: /apis/demo.example.com/v1/namespaces/crd-demo/websites/my-blog/status.

Without the status subresource, updating the status requires updating the entire object. This means the controller must have update permission on the full resource, and status updates can conflict with spec updates from users.

With the status subresource:

  • Users update .spec via the main endpoint.
  • Controllers update .status via the /status endpoint.
  • RBAC can grant different permissions for spec and status.
  • Changes to .status do not increment .metadata.generation.

This separation is essential for the operator pattern. The user declares desired state in .spec. The operator reports actual state in .status.


CRDs can serve multiple API versions simultaneously:

versions:
  - name: v1
    served: true
    storage: true
  - name: v1beta1
    served: true
    storage: false

  • served: Clients can read and write this version via the API.
  • storage: Objects are stored in etcd in this version. Exactly one version must be the storage version.

When a client writes a v1beta1 object, the API server converts it to v1 before storing it. When a client reads a v1beta1 object, the API server converts the stored v1 object back to v1beta1.

For non-trivial version differences, you need a conversion webhook. The API server sends the object to your webhook, which converts between versions:

spec:
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          name: website-converter
          namespace: crd-demo
          path: /convert

The webhook receives a ConversionReview request and returns the converted object. This allows you to rename fields, change types, add defaults, and handle any structural changes between versions.

Without a conversion webhook, the API server uses the None strategy, which performs no conversion beyond rewriting the apiVersion field — so it is only safe when the schemas are identical across versions.


An operator is a controller that watches custom resources and reconciles the cluster state to match them. The demo implements this as a shell script:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website-operator
  namespace: crd-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: website-operator
  template:
    metadata:
      labels:
        app: website-operator
    spec:
      serviceAccountName: website-operator
      containers:
        - name: operator
          image: busybox:1.36
          command: ["/bin/sh", "/scripts/reconcile.sh"]

The operator runs as a regular Deployment with a dedicated ServiceAccount that has the necessary RBAC permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: website-operator
  namespace: crd-demo
rules:
  - apiGroups: ["demo.example.com"]
    resources: ["websites", "websites/status"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create"]

The demo’s operator runs a simple loop:

while true; do
  # List all Website CRs
  # For each Website, create or update Deployment + Service
  sleep 10
done

It polls every 10 seconds, reads all Website resources, and ensures a matching Deployment and Service exist for each one.

The demo’s polling approach is level-triggered. It checks the current state on every cycle, regardless of what changed. If it missed an event, it catches up on the next poll.

Real operators use watches (event streams from the API server) for efficiency. A watch is edge-triggered: it fires when something changes. But production operators are still designed to be level-triggered in their logic. The watch triggers a reconciliation, but the reconciliation function always reads the full current state and computes the diff. It never relies on the event alone.

This distinction matters. A level-triggered controller is idempotent. You can restart it at any time. It reads the current state and converges. An edge-triggered controller that misses an event might leave the system in an inconsistent state.

In production operators (using controller-runtime in Go), the reconcile function receives a Request containing the namespace and name of the changed object:

func (r *WebsiteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the Website CR
	var website v1.Website
	if err := r.Get(ctx, req.NamespacedName, &website); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// 2. Create or update the Deployment
	// 3. Create or update the Service
	// 4. Update the Website status
	return ctrl.Result{}, nil
}

The function runs whenever the Website or any owned resource changes. It computes the difference between desired state and actual state, then takes the minimum action to converge.


The Kubernetes API supports watch requests. A watch opens a long-lived HTTP connection and streams events (ADDED, MODIFIED, DELETED) for a resource type.

GET /apis/demo.example.com/v1/namespaces/crd-demo/websites?watch=true

The operator receives events in real-time instead of polling.

An informer is a client-side cache backed by a watch. It:

  1. Lists all resources of a type (initial sync).
  2. Opens a watch to receive updates.
  3. Maintains an in-memory cache of all resources.
  4. Calls registered event handlers when resources change.

Informers are the backbone of controller-runtime. They make reads fast (cache hit) and writes efficient (react to changes).

Multiple controllers can share informers for the same resource type. This avoids redundant API server connections and reduces memory usage.


Owner references link child resources to their parent. When the parent is deleted, the garbage collector automatically deletes the children.

A production operator sets owner references on created resources:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website-my-blog
  ownerReferences:
    - apiVersion: demo.example.com/v1
      kind: Website
      name: my-blog
      uid: abc-123-def
      controller: true
      blockOwnerDeletion: true

When the Website my-blog is deleted, Kubernetes automatically deletes the website-my-blog Deployment. No operator intervention needed.

The demo’s shell operator does not set owner references (the JSON is simplified for learning). In production, this is essential. Without owner references, deleting a Website CR leaves orphaned Deployments and Services behind.

When a parent is deleted, the deletion propagation policy controls what happens to its children:

  • Foreground: Children are deleted first. The parent is deleted after all children are gone.
  • Background: The parent is deleted immediately. Children are garbage collected asynchronously.
  • Orphan: Children are not deleted. They become standalone resources.

Finalizers prevent a resource from being deleted until cleanup is complete. A finalizer is a string in .metadata.finalizers. As long as any finalizer is present, the resource enters a Terminating state but is not removed from etcd.

The operator’s workflow:

  1. Add finalizer when creating/reconciling the resource:

    metadata:
      finalizers:
        - websites.demo.example.com/cleanup
  2. Detect deletion: Check if .metadata.deletionTimestamp is set.

  3. Run cleanup: Delete external resources, revoke certificates, clean up cloud resources.

  4. Remove finalizer: Patch the object to remove the finalizer string.

  5. Kubernetes deletes the object once all finalizers are removed.

Finalizers are essential when your operator manages resources outside Kubernetes (cloud load balancers, DNS records, external databases).


Most production operators are written in Go using the controller-runtime library.

The Manager sets up shared dependencies: the API client, informer caches, and leader election.

mgr, err := ctrl.NewManager(cfg, ctrl.Options{
    Scheme:           scheme,
    LeaderElection:   true,
    LeaderElectionID: "website-operator",
})

The controller builder registers a reconciler with watches:

ctrl.NewControllerManagedBy(mgr).
    For(&v1.Website{}).
    Owns(&appsv1.Deployment{}).
    Owns(&corev1.Service{}).
    Complete(reconciler)

  • For(&v1.Website{}): Watch Website resources. Trigger reconciliation on changes.
  • Owns(&appsv1.Deployment{}): Watch Deployments owned by Websites. If a Deployment changes, reconcile the owning Website.
  • Complete(reconciler): Wire up the reconciler function.

In production, operators run with multiple replicas for high availability. Only one replica (the leader) actively reconciles. If the leader crashes, another replica takes over.

Leader election uses a Lease object in the cluster. The leader holds the lease and renews it periodically. If the lease expires, another replica acquires it.


How Production Operators Implement These Patterns


The CloudNativePG operator manages PostgreSQL clusters. It demonstrates:

  • Multi-version CRD: Supports v1 with full validation.
  • Status subresource: Reports cluster health, replication lag, backup status.
  • Owner references: The Cluster CR owns Pods, PVCs, Services, ConfigMaps.
  • Finalizers: Cleans up PVCs, certificates, and backup repositories on deletion.
  • Watches: Watches Cluster CRs, Pods, Nodes, and Secrets.
  • Pod ordinal management: Uses stable pod names for primary/replica identity (similar to StatefulSets but with custom logic).

The cert-manager operator manages TLS certificates. It demonstrates:

  • Multiple CRDs: Certificate, Issuer, ClusterIssuer, CertificateRequest, Order, Challenge.
  • Cross-resource reconciliation: A Certificate triggers a CertificateRequest, which triggers an Order, which triggers Challenges.
  • Finalizers: Cleans up ACME challenges (DNS records, HTTP endpoints) on deletion.
  • Status conditions: Reports certificate readiness, expiration, and renewal status.
  • Aggregated ClusterRoles: Adds certificate permissions to the built-in admin and edit roles.

Building Your Own Operator: Decision Framework


Not every CRD needs an operator. Sometimes a CRD is just structured configuration storage, and an external system (CI/CD pipeline, GitOps tool) reads it.

You need an operator when:

  • Custom resources should create or manage other Kubernetes resources.
  • The reconciliation logic requires continuous monitoring and correction.
  • Resources have lifecycle hooks (creation, update, deletion) with side effects.

| Approach | Best For | Limitations |
| --- | --- | --- |
| Shell script | Learning, prototyping | No watches, no proper error handling |
| Python (kopf) | Medium complexity, rapid development | Performance at scale |
| Go (controller-runtime) | Production operators | Steeper learning curve |
| Java (JOSDK) | Java shops | Higher resource usage |

The demo uses a shell script. It polls, does crude JSON parsing, and has no error recovery. This is fine for understanding the concept. For production, use controller-runtime.


The demo builds up the operator pattern step by step:

  1. CRD registration: website-crd.yaml teaches Kubernetes about Websites.
  2. Custom resources without an operator: website-samples.yaml creates Websites. They exist in etcd but produce no pods.
  3. Operator deployment: The shell-based operator polls for Websites and creates Deployments and Services.
  4. Reconciliation: Changing a Website’s replicas or title triggers the operator to update the Deployment.

The gap between step 2 (data only) and step 3 (data plus behavior) is the core insight. CRDs provide the API. Operators provide the automation.


If the operator updates its own CR’s status, and the status update triggers a reconciliation, you get an infinite loop. Use generation-based checks: only reconcile when .metadata.generation changes (spec changes), not when .metadata.resourceVersion changes (any change including status).

Operators need explicit permissions for every resource they touch. The demo’s operator needs access to websites, deployments, services, and events. Missing a permission causes authorization errors at runtime.

Without owner references, deleting a CR leaves child resources behind. Over time, these orphaned resources accumulate.

The demo polls every 10 seconds. On a cluster with thousands of CRs, this creates unnecessary API server load. Production operators use watches for efficiency.