CloudNativePG: Deep Dive
A technical deep-dive into how CloudNativePG works, why it exists, and what happens under the hood when you deploy a PostgreSQL cluster on Kubernetes with CNPG.
This document complements the demo README. The README walks you through deploying and testing a cluster. This document explains the machinery behind it.
Table of Contents
- Why CloudNativePG Exists
- CNPG Architecture
- The Cluster CRD Field by Field
- Replication Topology
- Failover Mechanics
- Services and Connection Routing
- Secrets and Credential Management
- Backup and Recovery
- Connection Pooling with PgBouncer
- Comparison: Plain Deployment vs StatefulSet vs CloudNativePG
1. Why CloudNativePG Exists
The Problem
PostgreSQL is a stateful workload. Kubernetes was designed for stateless workloads. Running PostgreSQL on Kubernetes with standard primitives exposes several gaps.
A plain Deployment gives you:
- A single pod with a PVC. No replication. If the pod dies, Kubernetes restarts it, but the database is unavailable until the restart completes. There is no standby to promote.
- No automatic backups. You have to set up CronJobs, manage retention, handle WAL archiving yourself, and hope your restore process actually works.
- Manual credential management. You create Secrets by hand, wire them into environment variables, and rotate them manually.
- PVC lifecycle is your problem. If you delete the Deployment, the PVC might stick around. Or it might not. Depends on your reclaim policy.
A StatefulSet is slightly better:
- You get stable network identities (`pg-0`, `pg-1`, `pg-2`) and ordered pod creation.
- Each pod gets its own PVC automatically.
- But a StatefulSet knows nothing about PostgreSQL. It does not configure streaming replication. It does not detect a failed primary. It does not promote a standby. It does not update services to point to the new primary.
You end up writing custom scripts, sidecar containers, and init containers to glue PostgreSQL replication logic onto Kubernetes primitives. This is fragile, hard to test, and painful to maintain.
How Operators Solve This
The Kubernetes Operator pattern extends the API server with custom resources and controllers that encode domain-specific operational knowledge. An operator for PostgreSQL understands:
- How to bootstrap a new cluster with `initdb`
- How to configure streaming replication between a primary and standbys
- How to detect a failed primary and promote a standby
- How to update service endpoints so applications reconnect transparently
- How to manage credentials, certificates, backups, and restores
You declare what you want. The operator figures out how to get there and how to keep it there.
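The declare-and-converge idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not CNPG's actual code: `DesiredState`, `ActualState`, and `reconcile` are names invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class DesiredState:
    instances: int          # what the Cluster CRD declares

@dataclass
class ActualState:
    pods: list = field(default_factory=list)  # pods that currently exist

def reconcile(desired: DesiredState, actual: ActualState) -> list:
    """Compare desired vs actual and return the actions needed to converge."""
    actions = []
    while len(actual.pods) < desired.instances:
        name = f"demo-pg-{len(actual.pods) + 1}"
        actual.pods.append(name)
        actions.append(f"create pod {name}")
    while len(actual.pods) > desired.instances:
        # A real operator would never remove the primary here.
        actions.append(f"delete pod {actual.pods.pop()}")
    return actions

# One pod exists, three are declared: the loop creates the two missing ones.
# A second pass finds nothing left to do: the states have converged.
print(reconcile(DesiredState(instances=3), ActualState(pods=["demo-pg-1"])))
```

A controller-manager runs logic of this shape on every watch event, so a pod deletion, a scale-up, or replication drift all funnel into the same convergence path.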
CloudNativePG (CNPG) is a CNCF Sandbox project that implements this pattern for PostgreSQL. It was designed from scratch for Kubernetes, as opposed to being a port of a pre-Kubernetes HA solution.
2. CNPG Architecture
The Operator Pattern
CNPG follows the standard Kubernetes operator architecture:
- Custom Resource Definitions (CRDs) extend the Kubernetes API with new types like `Cluster`, `Backup`, `ScheduledBackup`, and `Pooler`.
- A controller-manager (`cnpg-controller-manager`) runs as a Deployment in the `cnpg-system` namespace. It watches for changes to these custom resources.
- Reconciliation loops continuously compare the desired state (what you declared in the CRD) with the actual state (what exists in the cluster) and take action to converge them.
When you apply a Cluster resource, the controller-manager:
- Creates pods (one per PostgreSQL instance)
- Creates PVCs for each pod
- Runs `initdb` on the first pod (the primary)
- Configures streaming replication on subsequent pods (the standbys)
- Creates services for routing (`-rw`, `-ro`, `-r`)
- Creates secrets with credentials
- Starts monitoring the health of all instances
If something drifts from the desired state (a pod dies, replication breaks, a new instance is needed), the reconciliation loop detects it and corrects it.
No StatefulSet Under the Hood
This is a key design decision. CNPG does not use StatefulSets. It manages pods directly.
Why? StatefulSets impose ordering constraints and identity semantics that conflict with how PostgreSQL failover works. When a primary fails, CNPG needs to promote a specific standby immediately. It cannot wait for StatefulSet ordering rules. It needs full control over which pod has which role, which PVC is attached where, and which services point to which endpoints.
By managing pods directly, CNPG can:
- Promote any standby to primary without renaming or restarting pods
- Reattach PVCs to different pods during recovery
- Update service endpoints within seconds of a failover
- Perform rolling updates with fine-grained control over the order
Instance Manager
Each PostgreSQL pod runs an instance manager process (not an external sidecar). This is a Go binary that:
- Starts and supervises the PostgreSQL process
- Handles liveness and readiness probes
- Communicates status back to the controller-manager via pod annotations and conditions
- Manages local WAL archiving and restoration
- Handles graceful shutdown and pg_rewind operations
The instance manager runs as PID 1 in the container. PostgreSQL runs as a child process. This gives the instance manager full lifecycle control.
How CNPG Differs from Other PostgreSQL Operators
There are three major PostgreSQL operators for Kubernetes:
| | CloudNativePG | Crunchy PGO | Zalando postgres-operator |
|---|---|---|---|
| HA mechanism | Built-in, no external dependency | Uses Patroni (etcd required) | Uses Patroni (etcd required) |
| Pod management | Direct pod management | StatefulSet-based | StatefulSet-based |
| Failover agent | Instance manager (in-process) | Patroni sidecar | Patroni sidecar |
| CNCF status | Sandbox project | Not CNCF | Not CNCF |
| WAL storage | Object storage (S3, GCS, Azure) | Object storage + PVC | Object storage (S3, GCS) |
| Connection pooling | Built-in Pooler CRD (PgBouncer) | Built-in PgBouncer | External |
| Declarative config | Single Cluster CRD | Multiple CRDs | Single postgresql CRD |
The biggest architectural difference is that CNPG does not depend on an external consensus store like etcd for leader election. It uses Kubernetes lease objects instead. This removes a significant operational dependency. Patroni-based solutions require a healthy etcd cluster for failover decisions. CNPG requires only a healthy Kubernetes API server, which you already have.
3. The Cluster CRD Field by Field
The Demo Cluster
Here is the Cluster resource from this demo:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-pg
  namespace: cnpg-demo
spec:
  instances: 3
  bootstrap:
    initdb:
      database: app
      owner: app
  storage:
    size: 1Gi
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 500m
  postgresql:
    parameters:
      shared_buffers: "128MB"
      log_statement: "all"
  monitoring:
    enablePodMonitor: false
```

Let’s walk through each section.
instances: 3
The total number of PostgreSQL instances. CNPG always creates exactly one primary. The rest are streaming replicas. With `instances: 3`, you get one primary and two standbys.
Changing this value and re-applying the manifest scales the cluster. Increasing it adds new standbys. Decreasing it removes standbys (never the primary). The operator handles replication setup for new instances automatically.
bootstrap.initdb
Controls how the cluster is initialized on first creation. This runs `initdb` on the primary pod to create the PostgreSQL data directory.
```yaml
bootstrap:
  initdb:
    database: app
    owner: app
```

- `database`: The application database to create (in addition to the default `postgres` database).
- `owner`: The PostgreSQL role that owns this database. CNPG auto-generates a password for this user and stores it in a Kubernetes Secret named `<cluster>-app`.
Other initdb options not used in this demo:
- `dataChecksums`: Enables data checksums for corruption detection (recommended for production).
- `encoding`: Character encoding (default `UTF8`).
- `localeCType` / `localeCollate`: Locale settings.
- `postInitSQL`: SQL statements to run after initialization.
- `postInitApplicationSQL`: SQL statements to run as the application user after initialization. Useful for creating tables, extensions, or seed data.
- `import`: Import data from an existing PostgreSQL database during bootstrap.
storage
```yaml
storage:
  size: 1Gi
```

Defines the PVC for each instance’s `PGDATA` directory. CNPG creates one PVC per pod. The default storage class is used unless you specify one.
In production, you would typically also set:
```yaml
storage:
  size: 50Gi
  storageClass: gp3-csi  # or your preferred storage class
```

CNPG also supports a separate `walStorage` section for placing WAL files on a different volume, which can improve I/O performance by separating WAL writes from data writes.
resources
Section titled “resources”resources: requests: memory: 256Mi cpu: 100m limits: memory: 512Mi cpu: 500mStandard Kubernetes resource requests and limits. These apply to the PostgreSQL container in each pod. A few production considerations:
- Memory limits matter a lot for PostgreSQL. If the PostgreSQL process exceeds the memory limit, the OOM killer terminates it. Set `shared_buffers` and `work_mem` so that peak memory usage stays well within the limit.
- CPU limits are debatable. Some teams remove CPU limits entirely to avoid throttling during query spikes. CPU requests are what matters for scheduling.
postgresql.parameters
```yaml
postgresql:
  parameters:
    shared_buffers: "128MB"
    log_statement: "all"
```

These map directly to `postgresql.conf` parameters. You can set any PostgreSQL configuration parameter here. CNPG applies them and handles `pg_ctl reload` or pod restart as needed (some parameters require a restart).
Common production parameters:
```yaml
postgresql:
  parameters:
    shared_buffers: "1GB"
    effective_cache_size: "3GB"
    work_mem: "16MB"
    maintenance_work_mem: "256MB"
    max_connections: "200"
    wal_level: "logical"               # if you need logical replication
    log_statement: "ddl"
    log_min_duration_statement: "1000" # log slow queries > 1s
```

monitoring
```yaml
monitoring:
  enablePodMonitor: false
```

When set to `true`, CNPG creates a PodMonitor resource that Prometheus can scrape. Each PostgreSQL pod exposes metrics on port 9187 via the built-in exporter. This is disabled in the demo because minikube does not typically have the Prometheus Operator installed.
In production with OpenShift, you would set this to true and the built-in monitoring stack picks up the metrics automatically.
Important Fields Not in This Demo
The demo manifest is intentionally minimal. Here are fields you will encounter in production clusters.
backup and scheduledBackup
Section titled “backup and scheduledBackup”Configures continuous backup to object storage. See Section 8 for details.
```yaml
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/cnpg-backups/
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"
```

replica
Configures a replica cluster, which is a full read-only copy of another CNPG cluster. Used for disaster recovery across regions or clusters.
```yaml
spec:
  replica:
    enabled: true
    source: primary-cluster
```

affinity
Controls pod scheduling to spread instances across nodes and availability zones.
```yaml
spec:
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
```

With `required` anti-affinity, Kubernetes will not schedule two instances on the same node. This is critical for HA, because a node failure should take out at most one instance.
certificates
CNPG auto-generates TLS certificates for client and replication connections by default. You can provide your own CA or certificates.
```yaml
spec:
  certificates:
    serverCASecret: my-ca-secret
    serverTLSSecret: my-tls-secret
    clientCASecret: my-client-ca-secret
```

superuserSecret
By default, CNPG disables the `postgres` superuser for security. If you need superuser access, you can enable it and either provide the secret or let CNPG generate it.
```yaml
spec:
  enableSuperuserAccess: true
  superuserSecret:
    name: my-superuser-secret
```

4. Replication Topology
Streaming Replication
CNPG uses PostgreSQL’s built-in streaming replication. This is the same replication mechanism used by PostgreSQL outside of Kubernetes. Nothing exotic.
The primary accepts writes and generates WAL (Write-Ahead Log) records. Standbys connect to the primary via a replication connection and continuously stream WAL records. Each standby replays the WAL records to keep its data directory in sync with the primary.
The replication connection uses a dedicated replication slot for each standby. Replication slots prevent the primary from discarding WAL segments that a standby has not yet received. This guarantees no data loss during temporary standby outages, at the cost of WAL accumulation on the primary if a standby is down for a long time.
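The WAL-retention cost of a lagging standby is easy to quantify. The sketch below is illustrative (the helper names are invented): it converts the LSNs you would read from `pg_current_wal_lsn()` and the slot's `restart_lsn` in `pg_replication_slots` into a byte count of pinned WAL.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' into a linear byte offset.

    The part before the slash is the high 32 bits, the part after is the low 32.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL the primary must keep between the slot's restart point and now."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)

# A standby whose slot is stuck at 0/3000000 while the primary is at
# 0/5000000 pins 0x2000000 bytes of WAL: two 16MB segments.
print(retained_wal_bytes("0/5000000", "0/3000000"))  # 33554432
```

This is why monitoring slot lag matters: a standby that is down for days keeps adding segments to the primary's `pg_wal` directory until the volume fills.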
WAL Shipping
In addition to streaming replication (which is a direct TCP connection), CNPG can archive WAL segments to object storage (S3, GCS, Azure Blob). This serves two purposes:
- Point-in-time recovery (PITR): You can restore the database to any point in time by replaying WAL from a base backup.
- Standby bootstrap: New standbys can be created from the archived WAL instead of needing a full `pg_basebackup` from the primary.
When object storage is configured, the primary continuously archives completed WAL segments. Standbys first restore from object storage, then switch to streaming for the most recent WAL.
Synchronous vs Asynchronous Replication
By default, CNPG uses asynchronous replication. The primary does not wait for standbys to confirm WAL receipt before committing a transaction. This gives you the best write performance, but in a failure scenario, the most recently committed transactions (typically sub-second) on the primary might not have reached any standby yet.
For workloads that cannot tolerate any data loss, CNPG supports synchronous replication:
```yaml
spec:
  minSyncReplicas: 1
  maxSyncReplicas: 2
  postgresql:
    parameters:
      synchronous_commit: "on"
```

Note that `minSyncReplicas` and `maxSyncReplicas` are CNPG spec fields, not `postgresql.conf` parameters. With synchronous replication, the primary waits for at least `minSyncReplicas` standbys to confirm WAL receipt before reporting a transaction as committed. This guarantees zero data loss (RPO=0) at the cost of write latency, because every commit requires a network round-trip to a standby.
Promotion Decision
When the primary fails, the operator must choose which standby to promote. CNPG selects the standby with the most up-to-date WAL position (the one with the least replication lag). This minimizes data loss. If multiple standbys are at the same position, topology preferences (node, zone) may influence the choice.
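That selection rule can be expressed directly. A hedged sketch, not CNPG's implementation (`pick_promotion_candidate` is an invented name): compare the LSNs each standby reports and take the highest.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/5000148' into a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def pick_promotion_candidate(standbys: dict) -> str:
    """standbys maps pod name -> last received WAL position.

    The standby with the highest LSN loses the least data if promoted.
    """
    return max(standbys, key=lambda pod: lsn_to_bytes(standbys[pod]))

# demo-pg-2 has received more WAL than demo-pg-3, so it is the candidate.
print(pick_promotion_candidate({
    "demo-pg-2": "0/5000148",
    "demo-pg-3": "0/5000060",
}))  # demo-pg-2
```

String comparison of LSNs would be wrong (`"0/A0"` sorts before `"0/90"` lexically), which is why the hex conversion step matters.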
5. Failover Mechanics
This is the core value proposition of CNPG. Here is what happens, step by step, when the primary pod dies.
Step 1: Failure Detection
The controller-manager continuously monitors all instances. It checks:
- Pod conditions (is the pod running?)
- Instance manager health endpoint (is the PostgreSQL process healthy?)
- Replication status (is the instance replicating?)
The Kubernetes kubelet also performs liveness probes. If the PostgreSQL process inside the pod is unresponsive, the kubelet restarts the container. If the node itself fails, the pod enters a Terminating or Unknown state.
CNPG’s detection is fast. In the demo, when you `kubectl delete pod` the primary, the controller-manager notices within seconds because Kubernetes immediately reports the pod deletion event through its watch mechanism.
Step 2: Leader Election
The controller-manager evaluates all surviving standbys. It queries each standby’s WAL receive position (the `pg_last_wal_receive_lsn()` function) and selects the one that is most up to date.
Unlike Patroni-based solutions, this decision does not require an external consensus store. The controller-manager is the single decision-maker. It uses Kubernetes Lease objects for its own leader election (in case there are multiple controller-manager replicas), but the PostgreSQL promotion decision is made directly by the controller.
Step 3: Standby Promotion
The controller-manager instructs the chosen standby’s instance manager to promote. The instance manager calls `pg_promote()`, which takes the standby out of recovery mode and makes it a full read-write primary.
This is fast. PostgreSQL promotion typically completes in under a second.
Step 4: Service Endpoint Update
The controller-manager updates the Endpoints (or EndpointSlices) for the `-rw` service to point to the new primary pod. It also updates the `-ro` service to remove the promoted pod (since it is no longer a standby) and keeps it in the `-r` (read-any) service.
Applications connected through the -rw service DNS name will have their next connection attempt routed to the new primary. Existing TCP connections to the old primary will be broken, so applications need connection retry logic. This is standard database client behavior.
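Client-side retry can be as simple as wrapping the connect call. A minimal sketch, assuming any database driver whose connect function raises on failure (the helper name `connect_with_retry` is invented for this example):

```python
import time

def connect_with_retry(connect, attempts=5, base_delay=0.5):
    """Call `connect` until it succeeds, with exponential backoff.

    `connect` is any zero-argument callable that returns a connection or
    raises on failure, e.g. wrapping a driver call that targets the
    demo-pg-rw.cnpg-demo.svc DNS name.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)

# Simulate a failover window: two refused attempts while promotion and the
# endpoint update happen, then the -rw service reaches the new primary.
outcomes = iter([None, None, "connection-to-new-primary"])
def fake_connect():
    result = next(outcomes)
    if result is None:
        raise ConnectionError("server closed the connection unexpectedly")
    return result

print(connect_with_retry(fake_connect, base_delay=0.01))  # connection-to-new-primary
```

Most production connection pools (HikariCP, pgbouncer clients, SQLAlchemy's pool) implement this pattern for you; the point is that something in the stack must.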
Step 5: Old Primary Recovery
When the old primary pod comes back (either because Kubernetes restarts it on the same node, or because a new pod is scheduled), it cannot simply rejoin as a standby. Its data directory may have been ahead of the promoted standby at the time of failure, and it may contain WAL that was never replicated.
The instance manager uses pg_rewind to rewind the old primary’s data directory to the point where it diverged from the new primary. It then starts PostgreSQL in standby mode, connecting to the new primary for streaming replication.
If pg_rewind fails (typically because the divergence is too large or WAL is missing), the instance manager falls back to a full pg_basebackup from the new primary.
Timeline
In the demo, the full failover sequence, from primary deletion to a new primary accepting writes, typically completes in 5 to 15 seconds. Most of that time is Kubernetes pod lifecycle overhead, not CNPG or PostgreSQL.
6. Services and Connection Routing
CNPG automatically creates three Kubernetes services for each cluster. The demo cluster `demo-pg` gets:
| Service | DNS Name | Targets | Purpose |
|---|---|---|---|
| `demo-pg-rw` | `demo-pg-rw.cnpg-demo.svc` | Primary only | Writes, DDL, transactions |
| `demo-pg-ro` | `demo-pg-ro.cnpg-demo.svc` | Standbys only | Read-heavy queries, reporting |
| `demo-pg-r` | `demo-pg-r.cnpg-demo.svc` | All instances | Reads that tolerate slight staleness |
How Endpoint Updates Work
These services do not use label selectors to find pods. Instead, CNPG manages the Endpoint objects directly. The controller-manager explicitly sets which pod IPs appear in each service’s Endpoints.
During failover:
- The old primary’s IP is removed from the `demo-pg-rw` Endpoints.
- The new primary’s IP is added to the `demo-pg-rw` Endpoints.
- The new primary’s IP is removed from the `demo-pg-ro` Endpoints (it is no longer a standby).
- The old primary’s IP is added to the `demo-pg-ro` Endpoints when it comes back as a standby.
This happens atomically from the application’s perspective. The service DNS name stays the same. The underlying IP changes.
Why Applications Should Use Service DNS Names
Never hard-code pod IPs or pod hostnames in application connection strings. Pods are ephemeral. Their IPs change across restarts. Their hostnames are only meaningful within the context of a StatefulSet (which CNPG doesn’t use).
Use the service DNS names:
```
# For writes
postgresql://app:password@demo-pg-rw.cnpg-demo.svc:5432/app

# For reads
postgresql://app:password@demo-pg-ro.cnpg-demo.svc:5432/app
```

This is exactly what the client pod in the demo does:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pg-client
  namespace: cnpg-demo
spec:
  containers:
    - name: psql
      image: postgres:16-alpine
      command: ["sleep", "infinity"]
```

The client pod connects to `demo-pg-rw:5432` by DNS name. When failover happens, the next connection attempt automatically goes to the new primary. No application changes needed.
Read/Write Splitting
The separate `-rw` and `-ro` services make read/write splitting straightforward at the application level. Send writes to `-rw`, send reads to `-ro`. Many frameworks (Django, Rails, Spring) have built-in support for multiple database connections.
The `-r` service (all instances) is useful when you want maximum read throughput and can tolerate reading from the primary or any standby. It load-balances across all instances.
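A deliberately naive illustration of the split (an assumption for this example, not a feature of any framework named above): route a statement to a service DNS name by inspecting its first keyword. Real routers work per transaction and must account for writing CTEs and functions with side effects.

```python
# Naive read/write router (illustration only; `route` is an invented helper).
READ_PREFIXES = ("select", "show", "explain")

def route(statement: str) -> str:
    """Pick the service DNS name for a single SQL statement."""
    first_word = statement.lstrip().split(None, 1)[0].lower()
    if first_word in READ_PREFIXES:
        return "demo-pg-ro.cnpg-demo.svc"   # standbys
    return "demo-pg-rw.cnpg-demo.svc"       # primary

print(route("SELECT count(*) FROM events"))    # demo-pg-ro.cnpg-demo.svc
print(route("INSERT INTO events VALUES (1)"))  # demo-pg-rw.cnpg-demo.svc
```

Also remember that `-ro` reads are asynchronously replicated by default, so a write followed immediately by a read on `-ro` may not see its own data.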
7. Secrets and Credential Management
Section titled “7. Secrets and Credential Management”Auto-Generated Secrets
When you create a Cluster, CNPG automatically generates Kubernetes Secrets for database credentials. For the demo cluster `demo-pg`, it creates:
- `demo-pg-app`: Credentials for the application user (`app`), which is the owner of the `app` database.
- `demo-pg-superuser`: Credentials for the `postgres` superuser (only if `enableSuperuserAccess: true`).
Secret Contents
Each secret contains multiple keys for convenience:
| Key | Example Value |
|---|---|
| `username` | `app` |
| `password` | `<auto-generated>` |
| `host` | `demo-pg-rw.cnpg-demo.svc` |
| `port` | `5432` |
| `dbname` | `app` |
| `uri` | `postgresql://app:pass@demo-pg-rw.cnpg-demo.svc:5432/app` |
| `jdbc-uri` | `jdbc:postgresql://demo-pg-rw.cnpg-demo.svc:5432/app?...` |
| `pgpass` | `demo-pg-rw.cnpg-demo.svc:5432:app:app:pass` |
The `host` field points to the `-rw` service by default. The `uri` and `jdbc-uri` fields are ready-to-use connection strings.
Using Secrets in Applications
You can mount these secrets as environment variables in your application pods:
```yaml
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: demo-pg-app
        key: uri
```

Or mount the entire secret as a volume and read the files.
Credential Rotation
To rotate credentials:
- Update the Secret (either manually or via an external secret manager).
- CNPG detects the change and updates the PostgreSQL role’s password to match.
- Applications reading the Secret will get the new credentials on their next Secret refresh.
For zero-downtime rotation, use two application users and rotate them alternately. CNPG does not do automatic periodic rotation out of the box, but it integrates with external secret managers that do.
TLS Certificates
CNPG generates TLS certificates for all connections by default. Both client-to-server and replication connections use TLS. The certificates are stored in Secrets and rotated automatically. You can bring your own CA if needed for integration with corporate PKI.
8. Backup and Recovery
Backup and recovery require object storage (S3, MinIO, GCS, Azure Blob Storage). This is beyond the scope of the minikube demo, but the concepts are important to understand.
How Backups Work
CNPG uses Barman Cloud under the hood: a set of Python tools from the Barman project, the de facto standard PostgreSQL backup solution.
There are two components to backup:
- Base backups: A full copy of the PostgreSQL data directory, compressed and uploaded to object storage. These are taken periodically (e.g., daily or weekly).
- Continuous WAL archiving: Every completed WAL segment (16MB by default) is immediately uploaded to object storage. This captures every change between base backups.
Together, a base backup plus all WAL segments since that backup allow you to restore to any point in time.
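The restore-planning rule stated above is mechanical. An illustrative sketch under stated assumptions (invented helper name, not Barman's code): choose the newest base backup completed at or before the target time; everything between that backup and the target comes from replayed WAL.

```python
from datetime import datetime

def pick_base_backup(backup_times, target):
    """Return the newest base backup taken at or before the PITR target."""
    candidates = [t for t in backup_times if t <= target]
    if not candidates:
        raise ValueError("no base backup precedes the target time")
    return max(candidates)

backups = [
    datetime(2026, 4, 1, 3, 0),
    datetime(2026, 4, 4, 3, 0),
    datetime(2026, 4, 5, 3, 0),
]
# Restoring to 14:30 on April 5 starts from that morning's 03:00 base
# backup, then replays roughly 11.5 hours of archived WAL.
print(pick_base_backup(backups, datetime(2026, 4, 5, 14, 30)))
```

This is also why backup frequency drives RTO: a daily base backup means up to a day of WAL to replay, while weekly backups can mean days of it.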
Configuring Backup
```yaml
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/cnpg/demo-pg/
      endpointURL: https://s3.amazonaws.com  # or MinIO URL
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
    retentionPolicy: "30d"
```

Scheduled Backups
You define a ScheduledBackup resource to take base backups on a cron schedule:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: demo-pg-daily
spec:
  schedule: "0 0 3 * * *"  # Daily at 3 AM (six fields: CNPG's cron format includes seconds)
  cluster:
    name: demo-pg
  backupOwnerReference: self
```

Point-in-Time Recovery (PITR)
To restore a cluster to a specific point in time, you create a new Cluster resource that bootstraps from a backup:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-pg-restored
spec:
  instances: 3
  bootstrap:
    recovery:
      source: demo-pg
      recoveryTarget:
        targetTime: "2026-04-05T14:30:00Z"
  externalClusters:
    - name: demo-pg
      barmanObjectStore:
        destinationPath: s3://my-bucket/cnpg/demo-pg/
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY
```

CNPG will:
- Find the most recent base backup before the target time.
- Restore it.
- Replay WAL segments up to exactly the target time.
- Open the database for read-write access.
- Create standbys from the new primary.
This is how you recover from accidental data deletion, schema mistakes, or application bugs.
Recovery Point and Recovery Time
- RPO (Recovery Point Objective): With continuous WAL archiving, the maximum data loss is one WAL segment (16MB of changes). In practice, WAL archiving happens within seconds of segment completion. With synchronous replication to a standby, RPO is zero.
- RTO (Recovery Time Objective): Depends on backup size and WAL volume. A small database restores in minutes. A multi-terabyte database with days of WAL could take hours.
9. Connection Pooling with PgBouncer
The Problem
PostgreSQL uses a process-per-connection model. Each client connection spawns a dedicated backend process on the server. These processes consume memory (typically 5-10MB each), and the cost of creating and destroying them is non-trivial.
In Kubernetes, where many microservices each maintain their own connection pools, the total connection count can grow quickly. A cluster with 20 microservices, each with a pool of 10 connections, means 200 PostgreSQL backends. Scale that with replicas and you hit max_connections limits fast.
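The arithmetic from that example, spelled out (illustrative helpers with invented names): transaction pooling caps server-side backends at the pooler's pool size rather than the sum of every client pool.

```python
def direct_backends(services: int, pool_per_service: int) -> int:
    """Backends when every service keeps its own pool open to PostgreSQL."""
    return services * pool_per_service

def pooled_backends(pooler_instances: int, default_pool_size: int) -> int:
    """Server connections when clients go through transaction-mode PgBouncer.

    (PgBouncer actually pools per user/database pair; a single pair is
    assumed here to keep the arithmetic simple.)
    """
    return pooler_instances * default_pool_size

print(direct_backends(20, 10))  # 200 backends hit PostgreSQL directly
print(pooled_backends(2, 25))   # 50 backends behind two PgBouncer instances
```

The client side can still open hundreds of connections (`max_client_conn`); PgBouncer multiplexes them onto the smaller server-side pool between transactions.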
The Pooler CRD
CNPG provides a built-in Pooler CRD that deploys PgBouncer in front of your PostgreSQL cluster:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: demo-pg-pooler-rw
  namespace: cnpg-demo
spec:
  cluster:
    name: demo-pg
  instances: 2
  type: rw  # rw or ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
```

This creates a PgBouncer deployment with 2 replicas that proxies connections to the `demo-pg` cluster’s primary (because `type: rw`).
Pool Modes
- Transaction pooling (`transaction`): Connections are returned to the pool after each transaction. This gives the best connection reuse. Most applications should use this mode.
- Session pooling (`session`): Connections are held for the entire client session. Less efficient but required for features like prepared statements, advisory locks, or `LISTEN`/`NOTIFY`.
- Statement pooling (`statement`): Connections are returned after each statement. Very aggressive. Only works for simple, stateless queries.
When to Use Pooling
Use the Pooler CRD when:
- You have many microservices connecting to the same database.
- Your total connection count approaches `max_connections`.
- You see connection creation overhead in your latency metrics.
- You want to decouple application connection limits from database connection limits.
You might skip pooling for:
- Small deployments with few connections.
- Applications that rely heavily on session-level features (prepared statements, temp tables, session variables).
10. Comparison: Plain Deployment vs StatefulSet vs CloudNativePG
| Capability | Plain Deployment | StatefulSet | CloudNativePG |
|---|---|---|---|
| High Availability | None. Single pod. | Stable identities, but no HA logic. You write your own. | Built-in. Automatic failover with configurable standbys. |
| Automatic Failover | No. Pod restarts, but no standby promotion. | No. You need Patroni or custom scripts. | Yes. Detects failure, promotes standby, updates services. 5-15 seconds. |
| Replication | None. | None built-in. You configure pg_hba.conf, recovery.conf manually. | Streaming replication configured automatically. Sync or async. |
| Backups | Manual. CronJobs + pg_dump or custom scripts. | Manual. Same as Deployment. | Built-in. Continuous WAL archiving + base backups to object storage. PITR. |
| Scaling | Manual. Add more Deployments, configure replication yourself. | Scale replicas, but no replication setup. | Change instances count. Apply. Done. |
| Rolling Upgrades | Delete and recreate. Downtime. | Ordered rolling update, but no PostgreSQL-aware upgrade logic. | PostgreSQL-aware rolling updates. Standbys first, then switchover. Minimal downtime. |
| Credential Management | Manual Secret creation. Manual rotation. | Manual Secret creation. Manual rotation. | Auto-generated Secrets with URI, JDBC, pgpass. Integrated rotation. |
| Storage Management | Manual PVC lifecycle. | Automatic PVC per pod. Stable. | Automatic PVC per pod. Operator manages lifecycle and reattachment. |
| TLS | Manual certificate management. | Manual certificate management. | Auto-generated TLS certificates. Automatic rotation. |
| Monitoring | Manual. Deploy your own exporter. | Manual. Deploy your own exporter. | Built-in metrics exporter. PodMonitor creation via single flag. |
| Connection Routing | Single Service. No read/write split. | Single Service. No read/write split. | Three services: -rw (primary), -ro (standbys), -r (all). |
| Connection Pooling | Deploy PgBouncer yourself. | Deploy PgBouncer yourself. | Built-in Pooler CRD. Managed PgBouncer. |
| Operational Knowledge | All on you. | Pod identity on Kubernetes, everything else on you. | Encoded in the operator. Replication, failover, backup, recovery, upgrades. |
The pattern is clear. A plain Deployment gives you a PostgreSQL process in a container. A StatefulSet gives you stable pod identities and persistent storage. CloudNativePG gives you a managed PostgreSQL cluster that handles the operational complexity that makes running databases in production hard.