CRDs & Operators: Deep Dive

This document explains how CustomResourceDefinitions extend the Kubernetes API, why the operator pattern uses a reconciliation loop, and when to use features like status subresources, finalizers, and owner references. It connects the demo’s shell-based operator to production patterns used by CloudNativePG, cert-manager, and similar projects.


A CustomResourceDefinition teaches the Kubernetes API server about a new resource type. After applying the demo’s CRD, you can use kubectl get websites just like kubectl get pods.

The CRD is pure API registration. It stores your custom objects in etcd. It validates them against a schema. It supports CRUD operations. But it does nothing else. No pods are created. No side effects happen. CRDs are data, not behavior.

The demo’s CRD:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: websites.demo.example.com
spec:
  group: demo.example.com
  names:
    kind: Website
    listKind: WebsiteList
    plural: websites
    singular: website
    shortNames:
      - ws
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true

Key fields:

  • group: The API group. Combined with the version, this gives demo.example.com/v1.
  • names: How the resource appears in kubectl (websites, ws).
  • scope: Namespaced or Cluster. Most CRDs are namespaced.
  • versions: One or more API versions. Exactly one must be the storage version.

The CRD schema defines what fields are allowed and their constraints:

schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        required:
          - title
          - replicas
        properties:
          title:
            type: string
            description: Title displayed on the website
          replicas:
            type: integer
            minimum: 1
            maximum: 5
            description: Number of pods to run
          color:
            type: string
            default: "#2196F3"
            description: Theme color for the website
      status:
        type: object
        properties:
          availableReplicas:
            type: integer
          url:
            type: string
The API server validates every create and update request against this schema. If replicas is set to 10, the request is rejected because maximum: 5. If title is omitted, the request is rejected because it is required.

| Feature | Purpose | Example |
| --- | --- | --- |
| type | Data type | string, integer, object, array |
| required | Mandatory fields | ["title", "replicas"] |
| minimum/maximum | Numeric bounds | minimum: 1, maximum: 5 |
| pattern | Regex validation | pattern: "^#[0-9a-fA-F]{6}$" |
| enum | Allowed values | enum: ["small", "medium", "large"] |
| default | Default value if not specified | default: "#2196F3" |
| format | Semantic format | format: date-time |
| x-kubernetes-validations | CEL expressions (Kubernetes 1.25+) | Custom logic |

Common Expression Language allows writing validation rules that span multiple fields:

x-kubernetes-validations:
  - rule: "self.minReplicas <= self.maxReplicas"
    message: "minReplicas must not exceed maxReplicas"

CEL validation runs server-side without needing a webhook.


Printer columns control what kubectl get displays:

additionalPrinterColumns:
  - name: Title
    type: string
    jsonPath: .spec.title
  - name: Replicas
    type: integer
    jsonPath: .spec.replicas
  - name: Color
    type: string
    jsonPath: .spec.color
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp

Running kubectl get websites -n crd-demo produces:

NAME        TITLE                  REPLICAS   COLOR     AGE
my-blog     My Personal Blog       2          #4CAF50   5m
docs-site   Documentation Portal   1          #FF9800   5m

Without printer columns, kubectl get only shows NAME and AGE. Printer columns make custom resources feel like first-class citizens.


The demo enables the status subresource:

subresources:
  status: {}

This creates a separate API endpoint for the status: /apis/demo.example.com/v1/namespaces/crd-demo/websites/my-blog/status.

Without the status subresource, updating the status requires updating the entire object. This means the controller must have update permission on the full resource, and status updates can conflict with spec updates from users.

With the status subresource:

  • Users update .spec via the main endpoint.
  • Controllers update .status via the /status endpoint.
  • RBAC can grant different permissions for spec and status.
  • Changes to .status do not increment .metadata.generation.

This separation is essential for the operator pattern. The user declares desired state in .spec. The operator reports actual state in .status.


CRDs can serve multiple API versions simultaneously:

versions:
  - name: v1
    served: true
    storage: true
  - name: v1beta1
    served: true
    storage: false

  • served: Clients can read and write this version via the API.
  • storage: Objects are stored in etcd in this version. Exactly one version must be the storage version.

When a client writes a v1beta1 object, the API server converts it to v1 before storing it. When a client reads a v1beta1 object, the API server converts the stored v1 object back to v1beta1.

For non-trivial version differences, you need a conversion webhook. The API server sends the object to your webhook, which converts between versions:

spec:
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          name: website-converter
          namespace: crd-demo
          path: /convert

The webhook receives a ConversionReview request and returns the converted object. This allows you to rename fields, change types, add defaults, and handle any structural changes between versions.

Without a conversion webhook, the API server uses the None strategy, which performs no conversion beyond rewriting the apiVersion field — so it is only safe when the schemas are identical across versions.


An operator is a controller that watches custom resources and reconciles the cluster state to match them. The demo implements this as a shell script:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website-operator
  namespace: crd-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: website-operator
  template:
    metadata:
      labels:
        app: website-operator
    spec:
      serviceAccountName: website-operator
      containers:
        - name: operator
          image: busybox:1.36
          command: ["/bin/sh", "/scripts/reconcile.sh"]

The operator runs as a regular Deployment with a dedicated ServiceAccount that has the necessary RBAC permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: website-operator
  namespace: crd-demo
rules:
  - apiGroups: ["demo.example.com"]
    resources: ["websites", "websites/status"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create"]

The demo’s operator runs a simple loop:

while true; do
  # List all Website CRs
  # For each Website, create or update Deployment + Service
  sleep 10
done

It polls every 10 seconds, reads all Website resources, and ensures a matching Deployment and Service exist for each one.

The demo’s polling approach is level-triggered. It checks the current state on every cycle, regardless of what changed. If it missed an event, it catches up on the next poll.

Real operators use watches (event streams from the API server) for efficiency. A watch is edge-triggered: it fires when something changes. But production operators are still designed to be level-triggered in their logic. The watch triggers a reconciliation, but the reconciliation function always reads the full current state and computes the diff. It never relies on the event alone.

This distinction matters. A level-triggered controller is idempotent. You can restart it at any time. It reads the current state and converges. An edge-triggered controller that misses an event might leave the system in an inconsistent state.

In production operators (using controller-runtime in Go), the reconcile function receives a Request containing the namespace and name of the changed object:

func (r *WebsiteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the Website CR
	var website v1.Website
	if err := r.Get(ctx, req.NamespacedName, &website); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// 2. Create or update the Deployment
	// 3. Create or update the Service
	// 4. Update the Website status
	return ctrl.Result{}, nil
}

The function runs whenever the Website or any owned resource changes. It computes the difference between desired state and actual state, then takes the minimum action to converge.


The Kubernetes API supports watch requests. A watch opens a long-lived HTTP connection and streams events (ADDED, MODIFIED, DELETED) for a resource type.

GET /apis/demo.example.com/v1/namespaces/crd-demo/websites?watch=true

The operator receives events in real-time instead of polling.

An informer is a client-side cache backed by a watch. It:

  1. Lists all resources of a type (initial sync).
  2. Opens a watch to receive updates.
  3. Maintains an in-memory cache of all resources.
  4. Calls registered event handlers when resources change.

Informers are the backbone of controller-runtime. They make reads fast (cache hit) and writes efficient (react to changes).

Multiple controllers can share informers for the same resource type. This avoids redundant API server connections and reduces memory usage.


Owner references link child resources to their parent. When the parent is deleted, the garbage collector automatically deletes the children.

A production operator sets owner references on created resources:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website-my-blog
  ownerReferences:
    - apiVersion: demo.example.com/v1
      kind: Website
      name: my-blog
      uid: abc-123-def
      controller: true
      blockOwnerDeletion: true

When the Website my-blog is deleted, Kubernetes automatically deletes the website-my-blog Deployment. No operator intervention needed.

The demo’s shell operator does not set owner references (the JSON is simplified for learning). In production, this is essential. Without owner references, deleting a Website CR leaves orphaned Deployments and Services behind.

When a parent is deleted, the deletion propagation policy controls what happens to its children:

  • Foreground: Children are deleted first. The parent is deleted after all children are gone.
  • Background: The parent is deleted immediately. Children are garbage collected asynchronously.
  • Orphan: Children are not deleted. They become standalone resources.

Finalizers prevent a resource from being deleted until cleanup is complete. A finalizer is a string in .metadata.finalizers. As long as any finalizer is present, the resource enters a Terminating state but is not removed from etcd.

The operator’s workflow:

  1. Add finalizer when creating/reconciling the resource:

    metadata:
      finalizers:
        - websites.demo.example.com/cleanup
  2. Detect deletion: Check if .metadata.deletionTimestamp is set.

  3. Run cleanup: Delete external resources, revoke certificates, clean up cloud resources.

  4. Remove finalizer: Patch the object to remove the finalizer string.

  5. Kubernetes deletes the object once all finalizers are removed.

Finalizers are essential when your operator manages resources outside Kubernetes (cloud load balancers, DNS records, external databases).


Most production operators are written in Go using the controller-runtime library.

The Manager sets up shared dependencies: the API client, informer caches, and leader election.

mgr, err := ctrl.NewManager(cfg, ctrl.Options{
    Scheme:           scheme,
    LeaderElection:   true,
    LeaderElectionID: "website-operator",
})

The controller builder registers a reconciler with watches:

ctrl.NewControllerManagedBy(mgr).
    For(&v1.Website{}).
    Owns(&appsv1.Deployment{}).
    Owns(&corev1.Service{}).
    Complete(reconciler)

  • For(&v1.Website{}): Watch Website resources. Trigger reconciliation on changes.
  • Owns(&appsv1.Deployment{}): Watch Deployments owned by Websites. If a Deployment changes, reconcile the owning Website.
  • Complete(reconciler): Wire up the reconciler function.

In production, operators run with multiple replicas for high availability. Only one replica (the leader) actively reconciles. If the leader crashes, another replica takes over.

Leader election uses a Lease object in the cluster. The leader holds the lease and renews it periodically. If the lease expires, another replica acquires it.


How Production Operators Implement These Patterns


The CloudNativePG operator manages PostgreSQL clusters. It demonstrates:

  • Multi-version CRD: Supports v1 with full validation.
  • Status subresource: Reports cluster health, replication lag, backup status.
  • Owner references: The Cluster CR owns Pods, PVCs, Services, ConfigMaps.
  • Finalizers: Cleans up PVCs, certificates, and backup repositories on deletion.
  • Watches: Watches Cluster CRs, Pods, Nodes, and Secrets.
  • Pod ordinal management: Uses stable pod names for primary/replica identity (similar to StatefulSets but with custom logic).

The cert-manager operator manages TLS certificates. It demonstrates:

  • Multiple CRDs: Certificate, Issuer, ClusterIssuer, CertificateRequest, Order, Challenge.
  • Cross-resource reconciliation: A Certificate triggers a CertificateRequest, which triggers an Order, which triggers Challenges.
  • Finalizers: Cleans up ACME challenges (DNS records, HTTP endpoints) on deletion.
  • Status conditions: Reports certificate readiness, expiration, and renewal status.
  • Aggregated ClusterRoles: Adds certificate permissions to the built-in admin and edit roles.

Building Your Own Operator: Decision Framework


Not every CRD needs an operator. Sometimes a CRD is just structured configuration storage, and an external system (CI/CD pipeline, GitOps tool) reads it.

You need an operator when:

  • Custom resources should create or manage other Kubernetes resources.
  • The reconciliation logic requires continuous monitoring and correction.
  • Resources have lifecycle hooks (creation, update, deletion) with side effects.

| Approach | Best For | Limitations |
| --- | --- | --- |
| Shell script | Learning, prototyping | No watches, no proper error handling |
| Python (kopf) | Medium complexity, rapid development | Performance at scale |
| Go (controller-runtime) | Production operators | Steeper learning curve |
| Java (JOSDK) | Java shops | Higher resource usage |

The demo uses a shell script. It polls, does crude JSON parsing, and has no error recovery. This is fine for understanding the concept. For production, use controller-runtime.


The demo builds up the operator pattern step by step:

  1. CRD registration: website-crd.yaml teaches Kubernetes about Websites.
  2. Custom resources without an operator: website-samples.yaml creates Websites. They exist in etcd but produce no pods.
  3. Operator deployment: The shell-based operator polls for Websites and creates Deployments and Services.
  4. Reconciliation: Changing a Website’s replicas or title triggers the operator to update the Deployment.

The gap between step 2 (data only) and step 3 (data plus behavior) is the core insight. CRDs provide the API. Operators provide the automation.


If the operator updates its own CR’s status, and the status update triggers a reconciliation, you get an infinite loop. Use generation-based checks: only reconcile when .metadata.generation changes (spec changes), not when .metadata.resourceVersion changes (any change including status).

Operators need explicit permissions for every resource they touch. The demo’s operator needs access to websites, deployments, services, and events. Missing a permission causes authorization errors at runtime.

Without owner references, deleting a CR leaves child resources behind. Over time, these orphaned resources accumulate.

The demo polls every 10 seconds. On a cluster with thousands of CRs, this creates unnecessary API server load. Production operators use watches for efficiency.