CloudNativePG: Deep Dive
A technical deep-dive into how CloudNativePG works, why it exists, and what happens under the hood when you deploy a PostgreSQL cluster on Kubernetes with CNPG.
This document complements the demo README. The README walks you through deploying and testing a cluster. This document explains the machinery behind it.
Table of Contents
- Why CloudNativePG Exists
- CNPG Architecture
- The Cluster CRD Field by Field
- Replication Topology
- Failover Mechanics
- Services and Connection Routing
- Secrets and Credential Management
- Backup and Recovery
- Connection Pooling with PgBouncer
- Comparison: Plain Deployment vs StatefulSet vs CloudNativePG
1. Why CloudNativePG Exists
The Problem
PostgreSQL is a stateful workload. Kubernetes was designed for stateless workloads. Running PostgreSQL on Kubernetes with standard primitives exposes several gaps.
A plain Deployment gives you:
- A single pod with a PVC. No replication. If the pod dies, Kubernetes restarts it, but the database is unavailable until the restart completes. There is no standby to promote.
- No automatic backups. You have to set up CronJobs, manage retention, handle WAL archiving yourself, and hope your restore process actually works.
- Manual credential management. You create Secrets by hand, wire them into environment variables, and rotate them manually.
- PVC lifecycle is your problem. If you delete the Deployment, the PVC might stick around. Or it might not. Depends on your reclaim policy.
A StatefulSet is slightly better:
- You get stable network identities (`pg-0`, `pg-1`, `pg-2`) and ordered pod creation.
- Each pod gets its own PVC automatically.
- But a StatefulSet knows nothing about PostgreSQL. It does not configure streaming replication. It does not detect a failed primary. It does not promote a standby. It does not update services to point to the new primary.
You end up writing custom scripts, sidecar containers, and init containers to glue PostgreSQL replication logic onto Kubernetes primitives. This is fragile, hard to test, and painful to maintain.
How Operators Solve This
The Kubernetes Operator pattern extends the API server with custom resources and controllers that encode domain-specific operational knowledge. An operator for PostgreSQL understands:
- How to bootstrap a new cluster with `initdb`
- How to configure streaming replication between a primary and standbys
- How to detect a failed primary and promote a standby
- How to update service endpoints so applications reconnect transparently
- How to manage credentials, certificates, backups, and restores
You declare what you want. The operator figures out how to get there and how to keep it there.
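The declare-and-converge idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not CNPG's actual code: `DesiredState`, `ActualState`, and `reconcile` are names invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class DesiredState:
    instances: int          # what the Cluster CRD declares

@dataclass
class ActualState:
    pods: list = field(default_factory=list)  # pods that currently exist

def reconcile(desired: DesiredState, actual: ActualState) -> list:
    """Compare desired vs actual and return the actions needed to converge."""
    actions = []
    while len(actual.pods) < desired.instances:
        name = f"demo-pg-{len(actual.pods) + 1}"
        actual.pods.append(name)
        actions.append(f"create pod {name}")
    while len(actual.pods) > desired.instances:
        # A real operator would never remove the primary here.
        actions.append(f"delete pod {actual.pods.pop()}")
    return actions

# One pod exists, three are declared: the loop creates the two missing ones.
# A second pass finds nothing left to do: the states have converged.
print(reconcile(DesiredState(instances=3), ActualState(pods=["demo-pg-1"])))
```

A controller-manager runs logic of this shape on every watch event, so a pod deletion, a scale-up, or replication drift all funnel into the same convergence path.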
CloudNativePG (CNPG) is a CNCF Sandbox project that implements this pattern for PostgreSQL. It was designed from scratch for Kubernetes, as opposed to being a port of a pre-Kubernetes HA solution.
2. CNPG Architecture
The Operator Pattern
CNPG follows the standard Kubernetes operator architecture:
- Custom Resource Definitions (CRDs) extend the Kubernetes API with new types like `Cluster`, `Backup`, `ScheduledBackup`, and `Pooler`.
- A controller-manager (`cnpg-controller-manager`) runs as a Deployment in the `cnpg-system` namespace. It watches for changes to these custom resources.
- Reconciliation loops continuously compare the desired state (what you declared in the CRD) with the actual state (what exists in the cluster) and take action to converge them.
When you apply a Cluster resource, the controller-manager:
- Creates pods (one per PostgreSQL instance)
- Creates PVCs for each pod
- Runs `initdb` on the first pod (the primary)
- Configures streaming replication on subsequent pods (the standbys)
- Creates services for routing (`-rw`, `-ro`, `-r`)
- Creates secrets with credentials
- Starts monitoring the health of all instances
If something drifts from the desired state (a pod dies, replication breaks, a new instance is needed), the reconciliation loop detects it and corrects it.
No StatefulSet Under the Hood
This is a key design decision. CNPG does not use StatefulSets. It manages pods directly.
Why? StatefulSets impose ordering constraints and identity semantics that conflict with how PostgreSQL failover works. When a primary fails, CNPG needs to promote a specific standby immediately. It cannot wait for StatefulSet ordering rules. It needs full control over which pod has which role, which PVC is attached where, and which services point to which endpoints.
By managing pods directly, CNPG can:
- Promote any standby to primary without renaming or restarting pods
- Reattach PVCs to different pods during recovery
- Update service endpoints within seconds of a failover
- Perform rolling updates with fine-grained control over the order
Instance Manager
Each PostgreSQL pod runs an instance manager process (not an external sidecar). This is a Go binary that:
- Starts and supervises the PostgreSQL process
- Handles liveness and readiness probes
- Communicates status back to the controller-manager via pod annotations and conditions
- Manages local WAL archiving and restoration
- Handles graceful shutdown and pg_rewind operations
The instance manager runs as PID 1 in the container. PostgreSQL runs as a child process. This gives the instance manager full lifecycle control.
How CNPG Differs from Other PostgreSQL Operators
There are three major PostgreSQL operators for Kubernetes:
| | CloudNativePG | Crunchy PGO | Zalando postgres-operator |
|---|---|---|---|
| HA mechanism | Built-in, no external dependency | Uses Patroni (etcd required) | Uses Patroni (etcd required) |
| Pod management | Direct pod management | StatefulSet-based | StatefulSet-based |
| Failover agent | Instance manager (in-process) | Patroni sidecar | Patroni sidecar |
| CNCF status | Sandbox project | Not CNCF | Not CNCF |
| WAL storage | Object storage (S3, GCS, Azure) | Object storage + PVC | Object storage (S3, GCS) |
| Connection pooling | Built-in Pooler CRD (PgBouncer) | Built-in PgBouncer | External |
| Declarative config | Single Cluster CRD | Multiple CRDs | Single postgresql CRD |
The biggest architectural difference is that CNPG does not depend on an external consensus store like etcd for leader election. It uses Kubernetes lease objects instead. This removes a significant operational dependency. Patroni-based solutions require a healthy etcd cluster for failover decisions. CNPG requires only a healthy Kubernetes API server, which you already have.
3. The Cluster CRD Field by Field
The Demo Cluster
Here is the Cluster resource from this demo:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-pg
  namespace: cnpg-demo
spec:
  instances: 3
  bootstrap:
    initdb:
      database: app
      owner: app
  storage:
    size: 1Gi
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 500m
  postgresql:
    parameters:
      shared_buffers: "128MB"
      log_statement: "all"
  monitoring:
    enablePodMonitor: false
```

Let’s walk through each section.
instances: 3
The total number of PostgreSQL instances. CNPG always creates exactly one primary. The rest are streaming replicas. With `instances: 3`, you get one primary and two standbys.
Changing this value and re-applying the manifest scales the cluster. Increasing it adds new standbys. Decreasing it removes standbys (never the primary). The operator handles replication setup for new instances automatically.
bootstrap.initdb
Controls how the cluster is initialized on first creation. This runs `initdb` on the primary pod to create the PostgreSQL data directory.
```yaml
bootstrap:
  initdb:
    database: app
    owner: app
```

- `database`: The application database to create (in addition to the default `postgres` database).
- `owner`: The PostgreSQL role that owns this database. CNPG auto-generates a password for this user and stores it in a Kubernetes Secret named `<cluster>-app`.
Other initdb options not used in this demo:
- `dataChecksums`: Enables data checksums for corruption detection (recommended for production).
- `encoding`: Character encoding (default `UTF8`).
- `localeCType` / `localeCollate`: Locale settings.
- `postInitSQL`: SQL statements to run after initialization.
- `postInitApplicationSQL`: SQL statements to run as the application user after initialization. Useful for creating tables, extensions, or seed data.
- `import`: Import data from an existing PostgreSQL database during bootstrap.
storage
```yaml
storage:
  size: 1Gi
```

Defines the PVC for each instance’s `PGDATA` directory. CNPG creates one PVC per pod. The default storage class is used unless you specify one.
In production, you would typically also set:
```yaml
storage:
  size: 50Gi
  storageClass: gp3-csi  # or your preferred storage class
```

CNPG also supports a separate `walStorage` section for placing WAL files on a different volume, which can improve I/O performance by separating WAL writes from data writes.
resources
Section titled “resources”resources: requests: memory: 256Mi cpu: 100m limits: memory: 512Mi cpu: 500mStandard Kubernetes resource requests and limits. These apply to the PostgreSQL container in each pod. A few production considerations:
- Memory limits matter a lot for PostgreSQL. If the PostgreSQL process exceeds the memory limit, the OOM killer terminates it. Set `shared_buffers` and `work_mem` so that peak memory usage stays well within the limit.
- CPU limits are debatable. Some teams remove CPU limits entirely to avoid throttling during query spikes. CPU requests are what matters for scheduling.
postgresql.parameters
```yaml
postgresql:
  parameters:
    shared_buffers: "128MB"
    log_statement: "all"
```

These map directly to `postgresql.conf` parameters. You can set any PostgreSQL configuration parameter here. CNPG applies them and handles `pg_ctl reload` or pod restart as needed (some parameters require a restart).
Common production parameters:
```yaml
postgresql:
  parameters:
    shared_buffers: "1GB"
    effective_cache_size: "3GB"
    work_mem: "16MB"
    maintenance_work_mem: "256MB"
    max_connections: "200"
    wal_level: "logical"               # if you need logical replication
    log_statement: "ddl"
    log_min_duration_statement: "1000" # log slow queries > 1s
```

monitoring
```yaml
monitoring:
  enablePodMonitor: false
```

When set to `true`, CNPG creates a PodMonitor resource that Prometheus can scrape. Each PostgreSQL pod exposes metrics on port 9187 via the built-in exporter. This is disabled in the demo because minikube does not typically have the Prometheus Operator installed.
In production with OpenShift, you would set this to true and the built-in monitoring stack picks up the metrics automatically.
Important Fields Not in This Demo
The demo manifest is intentionally minimal. Here are fields you will encounter in production clusters.
backup and scheduledBackup
Section titled “backup and scheduledBackup”Configures continuous backup to object storage. See Section 8 for details.
```yaml
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/cnpg-backups/
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"
```

replica
Configures a replica cluster, which is a full read-only copy of another CNPG cluster. Used for disaster recovery across regions or clusters.
```yaml
spec:
  replica:
    enabled: true
    source: primary-cluster
```

affinity
Controls pod scheduling to spread instances across nodes and availability zones.
```yaml
spec:
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
```

With `required` anti-affinity, Kubernetes will not schedule two instances on the same node. This is critical for HA, because a node failure should take out at most one instance.
certificates
CNPG auto-generates TLS certificates for client and replication connections by default. You can provide your own CA or certificates.
```yaml
spec:
  certificates:
    serverCASecret: my-ca-secret
    serverTLSSecret: my-tls-secret
    clientCASecret: my-client-ca-secret
```

superuserSecret
By default, CNPG disables the `postgres` superuser for security. If you need superuser access, you can enable it and either provide the secret or let CNPG generate it.
```yaml
spec:
  enableSuperuserAccess: true
  superuserSecret:
    name: my-superuser-secret
```

4. Replication Topology
Streaming Replication
CNPG uses PostgreSQL’s built-in streaming replication. This is the same replication mechanism used by PostgreSQL outside of Kubernetes. Nothing exotic.
The primary accepts writes and generates WAL (Write-Ahead Log) records. Standbys connect to the primary via a replication connection and continuously stream WAL records. Each standby replays the WAL records to keep its data directory in sync with the primary.
The replication connection uses a dedicated replication slot for each standby. Replication slots prevent the primary from discarding WAL segments that a standby has not yet received. This guarantees no data loss during temporary standby outages, at the cost of WAL accumulation on the primary if a standby is down for a long time.
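The WAL-retention cost of a lagging standby is easy to quantify. The sketch below is illustrative (the helper names are invented): it converts the LSNs you would read from `pg_current_wal_lsn()` and the slot's `restart_lsn` in `pg_replication_slots` into a byte count of pinned WAL.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' into a linear byte offset.

    The part before the slash is the high 32 bits, the part after is the low 32.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL the primary must keep between the slot's restart point and now."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)

# A standby whose slot is stuck at 0/3000000 while the primary is at
# 0/5000000 pins 0x2000000 bytes of WAL: two 16MB segments.
print(retained_wal_bytes("0/5000000", "0/3000000"))  # 33554432
```

This is why monitoring slot lag matters: a standby that is down for days keeps adding segments to the primary's `pg_wal` directory until the volume fills.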
WAL Shipping
In addition to streaming replication (which is a direct TCP connection), CNPG can archive WAL segments to object storage (S3, GCS, Azure Blob). This serves two purposes:
- Point-in-time recovery (PITR): You can restore the database to any point in time by replaying WAL from a base backup.
- Standby bootstrap: New standbys can be created from the archived WAL instead of needing a full `pg_basebackup` from the primary.
When object storage is configured, the primary continuously archives completed WAL segments. Standbys first restore from object storage, then switch to streaming for the most recent WAL.
Synchronous vs Asynchronous Replication
By default, CNPG uses asynchronous replication. The primary does not wait for standbys to confirm WAL receipt before committing a transaction. This gives you the best write performance, but in a failure scenario, the most recently committed transactions (typically sub-second) on the primary might not have reached any standby yet.
For workloads that cannot tolerate any data loss, CNPG supports synchronous replication:
```yaml
spec:
  minSyncReplicas: 1
  maxSyncReplicas: 2
  postgresql:
    parameters:
      synchronous_commit: "on"
```

Note that `minSyncReplicas` and `maxSyncReplicas` are CNPG spec fields, not `postgresql.conf` parameters. With synchronous replication, the primary waits for at least `minSyncReplicas` standbys to confirm WAL receipt before reporting a transaction as committed. This guarantees zero data loss (RPO=0) at the cost of write latency, because every commit requires a network round-trip to a standby.
Promotion Decision
When the primary fails, the operator must choose which standby to promote. CNPG selects the standby with the most up-to-date WAL position (the one with the least replication lag). This minimizes data loss. If multiple standbys are at the same position, topology preferences (node, zone) may influence the choice.
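That selection rule can be expressed directly. A hedged sketch, not CNPG's implementation (`pick_promotion_candidate` is an invented name): compare the LSNs each standby reports and take the highest.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/5000148' into a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def pick_promotion_candidate(standbys: dict) -> str:
    """standbys maps pod name -> last received WAL position.

    The standby with the highest LSN loses the least data if promoted.
    """
    return max(standbys, key=lambda pod: lsn_to_bytes(standbys[pod]))

# demo-pg-2 has received more WAL than demo-pg-3, so it is the candidate.
print(pick_promotion_candidate({
    "demo-pg-2": "0/5000148",
    "demo-pg-3": "0/5000060",
}))  # demo-pg-2
```

String comparison of LSNs would be wrong (`"0/A0"` sorts before `"0/90"` lexically), which is why the hex conversion step matters.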
5. Failover Mechanics
This is the core value proposition of CNPG. Here is what happens, step by step, when the primary pod dies.
Step 1: Failure Detection
The controller-manager continuously monitors all instances. It checks:
- Pod conditions (is the pod running?)
- Instance manager health endpoint (is the PostgreSQL process healthy?)
- Replication status (is the instance replicating?)
The Kubernetes kubelet also performs liveness probes. If the PostgreSQL process inside the pod is unresponsive, the kubelet restarts the container. If the node itself fails, the pod enters a Terminating or Unknown state.
CNPG’s detection is fast. In the demo, when you `kubectl delete pod` the primary, the controller-manager notices within seconds because Kubernetes immediately reports the pod deletion event through its watch mechanism.
Step 2: Leader Election
The controller-manager evaluates all surviving standbys. It queries each standby’s WAL receive position (the `pg_last_wal_receive_lsn()` function) and selects the one that is most up to date.
Unlike Patroni-based solutions, this decision does not require an external consensus store. The controller-manager is the single decision-maker. It uses Kubernetes Lease objects for its own leader election (in case there are multiple controller-manager replicas), but the PostgreSQL promotion decision is made directly by the controller.
Step 3: Standby Promotion
The controller-manager instructs the chosen standby’s instance manager to promote. The instance manager calls `pg_promote()`, which takes the standby out of recovery mode and makes it a full read-write primary.
This is fast. PostgreSQL promotion typically completes in under a second.
Step 4: Service Endpoint Update
The controller-manager updates the Endpoints (or EndpointSlices) for the `-rw` service to point to the new primary pod. It also updates the `-ro` service to remove the promoted pod (since it is no longer a standby) and keeps it in the `-r` (read-any) service.
Applications connected through the -rw service DNS name will have their next connection attempt routed to the new primary. Existing TCP connections to the old primary will be broken, so applications need connection retry logic. This is standard database client behavior.
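Client-side retry can be as simple as wrapping the connect call. A minimal sketch, assuming any database driver whose connect function raises on failure (the helper name `connect_with_retry` is invented for this example):

```python
import time

def connect_with_retry(connect, attempts=5, base_delay=0.5):
    """Call `connect` until it succeeds, with exponential backoff.

    `connect` is any zero-argument callable that returns a connection or
    raises on failure, e.g. wrapping a driver call that targets the
    demo-pg-rw.cnpg-demo.svc DNS name.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)

# Simulate a failover window: two refused attempts while promotion and the
# endpoint update happen, then the -rw service reaches the new primary.
outcomes = iter([None, None, "connection-to-new-primary"])
def fake_connect():
    result = next(outcomes)
    if result is None:
        raise ConnectionError("server closed the connection unexpectedly")
    return result

print(connect_with_retry(fake_connect, base_delay=0.01))  # connection-to-new-primary
```

Most production connection pools (HikariCP, pgbouncer clients, SQLAlchemy's pool) implement this pattern for you; the point is that something in the stack must.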
Step 5: Old Primary Recovery
When the old primary pod comes back (either because Kubernetes restarts it on the same node, or because a new pod is scheduled), it cannot simply rejoin as a standby. Its data directory may have been ahead of the promoted standby at the time of failure, and it may contain WAL that was never replicated.
The instance manager uses pg_rewind to rewind the old primary’s data directory to the point where it diverged from the new primary. It then starts PostgreSQL in standby mode, connecting to the new primary for streaming replication.
If pg_rewind fails (typically because the divergence is too large or WAL is missing), the instance manager falls back to a full pg_basebackup from the new primary.
Timeline
In the demo, the full failover sequence, from primary deletion to a new primary accepting writes, typically completes in 5 to 15 seconds. Most of that time is Kubernetes pod lifecycle overhead, not CNPG or PostgreSQL.
6. Services and Connection Routing
CNPG automatically creates three Kubernetes services for each cluster. The demo cluster `demo-pg` gets:
| Service | DNS Name | Targets | Purpose |
|---|---|---|---|
| `demo-pg-rw` | `demo-pg-rw.cnpg-demo.svc` | Primary only | Writes, DDL, transactions |
| `demo-pg-ro` | `demo-pg-ro.cnpg-demo.svc` | Standbys only | Read-heavy queries, reporting |
| `demo-pg-r` | `demo-pg-r.cnpg-demo.svc` | All instances | Reads that tolerate slight staleness |
How Endpoint Updates Work
These services do not use label selectors to find pods. Instead, CNPG manages the Endpoint objects directly. The controller-manager explicitly sets which pod IPs appear in each service’s Endpoints.
During failover:
- The old primary’s IP is removed from the `demo-pg-rw` Endpoints.
- The new primary’s IP is added to the `demo-pg-rw` Endpoints.
- The new primary’s IP is removed from the `demo-pg-ro` Endpoints (it is no longer a standby).
- The old primary’s IP is added to the `demo-pg-ro` Endpoints when it comes back as a standby.
This happens atomically from the application’s perspective. The service DNS name stays the same. The underlying IP changes.
Why Applications Should Use Service DNS Names
Never hard-code pod IPs or pod hostnames in application connection strings. Pods are ephemeral. Their IPs change across restarts. Their hostnames are only meaningful within the context of a StatefulSet (which CNPG doesn’t use).
Use the service DNS names:
```
# For writes
postgresql://app:password@demo-pg-rw.cnpg-demo.svc:5432/app

# For reads
postgresql://app:password@demo-pg-ro.cnpg-demo.svc:5432/app
```

This is exactly what the client pod in the demo does:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pg-client
  namespace: cnpg-demo
spec:
  containers:
    - name: psql
      image: postgres:16-alpine
      command: ["sleep", "infinity"]
```

The client pod connects to `demo-pg-rw:5432` by DNS name. When failover happens, the next connection attempt automatically goes to the new primary. No application changes needed.
Read/Write Splitting
The separate `-rw` and `-ro` services make read/write splitting straightforward at the application level. Send writes to `-rw`, send reads to `-ro`. Many frameworks (Django, Rails, Spring) have built-in support for multiple database connections.
The `-r` service (all instances) is useful when you want maximum read throughput and can tolerate reading from the primary or any standby. It load-balances across all instances.
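A deliberately naive illustration of the split (an assumption for this example, not a feature of any framework named above): route a statement to a service DNS name by inspecting its first keyword. Real routers work per transaction and must account for writing CTEs and functions with side effects.

```python
# Naive read/write router (illustration only; `route` is an invented helper).
READ_PREFIXES = ("select", "show", "explain")

def route(statement: str) -> str:
    """Pick the service DNS name for a single SQL statement."""
    first_word = statement.lstrip().split(None, 1)[0].lower()
    if first_word in READ_PREFIXES:
        return "demo-pg-ro.cnpg-demo.svc"   # standbys
    return "demo-pg-rw.cnpg-demo.svc"       # primary

print(route("SELECT count(*) FROM events"))    # demo-pg-ro.cnpg-demo.svc
print(route("INSERT INTO events VALUES (1)"))  # demo-pg-rw.cnpg-demo.svc
```

Also remember that `-ro` reads are asynchronously replicated by default, so a write followed immediately by a read on `-ro` may not see its own data.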
7. Secrets and Credential Management
Section titled “7. Secrets and Credential Management”Auto-Generated Secrets
When you create a Cluster, CNPG automatically generates Kubernetes Secrets for database credentials. For the demo cluster `demo-pg`, it creates:
- `demo-pg-app`: Credentials for the application user (`app`), which is the owner of the `app` database.
- `demo-pg-superuser`: Credentials for the `postgres` superuser (only if `enableSuperuserAccess: true`).
Secret Contents
Each secret contains multiple keys for convenience:
| Key | Example Value |
|---|---|
| `username` | `app` |
| `password` | `<auto-generated>` |
| `host` | `demo-pg-rw.cnpg-demo.svc` |
| `port` | `5432` |
| `dbname` | `app` |
| `uri` | `postgresql://app:pass@demo-pg-rw.cnpg-demo.svc:5432/app` |
| `jdbc-uri` | `jdbc:postgresql://demo-pg-rw.cnpg-demo.svc:5432/app?...` |
| `pgpass` | `demo-pg-rw.cnpg-demo.svc:5432:app:app:pass` |
The `host` field points to the `-rw` service by default. The `uri` and `jdbc-uri` fields are ready-to-use connection strings.
Using Secrets in Applications
You can mount these secrets as environment variables in your application pods:
```yaml
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: demo-pg-app
        key: uri
```

Or mount the entire secret as a volume and read the files.
Credential Rotation
To rotate credentials:
- Update the Secret (either manually or via an external secret manager).
- CNPG detects the change and updates the PostgreSQL role’s password to match.
- Applications reading the Secret will get the new credentials on their next Secret refresh.
For zero-downtime rotation, use two application users and rotate them alternately. CNPG does not do automatic periodic rotation out of the box, but it integrates with external secret managers that do.
TLS Certificates
CNPG generates TLS certificates for all connections by default. Both client-to-server and replication connections use TLS. The certificates are stored in Secrets and rotated automatically. You can bring your own CA if needed for integration with corporate PKI.
8. Backup and Recovery
Backup and recovery require object storage (S3, MinIO, GCS, Azure Blob Storage). This is beyond the scope of the minikube demo, but the concepts are important to understand.
How Backups Work
CNPG uses Barman Cloud under the hood: a set of Python tools from the Barman project, the de facto standard PostgreSQL backup solution.
There are two components to backup:
- Base backups: A full copy of the PostgreSQL data directory, compressed and uploaded to object storage. These are taken periodically (e.g., daily or weekly).
- Continuous WAL archiving: Every completed WAL segment (16MB by default) is immediately uploaded to object storage. This captures every change between base backups.
Together, a base backup plus all WAL segments since that backup allow you to restore to any point in time.
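The restore-planning rule stated above is mechanical. An illustrative sketch under stated assumptions (invented helper name, not Barman's code): choose the newest base backup completed at or before the target time; everything between that backup and the target comes from replayed WAL.

```python
from datetime import datetime

def pick_base_backup(backup_times, target):
    """Return the newest base backup taken at or before the PITR target."""
    candidates = [t for t in backup_times if t <= target]
    if not candidates:
        raise ValueError("no base backup precedes the target time")
    return max(candidates)

backups = [
    datetime(2026, 4, 1, 3, 0),
    datetime(2026, 4, 4, 3, 0),
    datetime(2026, 4, 5, 3, 0),
]
# Restoring to 14:30 on April 5 starts from that morning's 03:00 base
# backup, then replays roughly 11.5 hours of archived WAL.
print(pick_base_backup(backups, datetime(2026, 4, 5, 14, 30)))
```

This is also why backup frequency drives RTO: a daily base backup means up to a day of WAL to replay, while weekly backups can mean days of it.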
Configuring Backup
```yaml
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/cnpg/demo-pg/
      endpointURL: https://s3.amazonaws.com  # or MinIO URL
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
    retentionPolicy: "30d"
```

Scheduled Backups
You define a ScheduledBackup resource to take base backups on a cron schedule:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: demo-pg-daily
spec:
  schedule: "0 0 3 * * *"  # Daily at 3 AM (six fields: CNPG's cron format includes seconds)
  cluster:
    name: demo-pg
  backupOwnerReference: self
```

Point-in-Time Recovery (PITR)
To restore a cluster to a specific point in time, you create a new Cluster resource that bootstraps from a backup:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-pg-restored
spec:
  instances: 3
  bootstrap:
    recovery:
      source: demo-pg
      recoveryTarget:
        targetTime: "2026-04-05T14:30:00Z"
  externalClusters:
    - name: demo-pg
      barmanObjectStore:
        destinationPath: s3://my-bucket/cnpg/demo-pg/
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY
```

CNPG will:
- Find the most recent base backup before the target time.
- Restore it.
- Replay WAL segments up to exactly the target time.
- Open the database for read-write access.
- Create standbys from the new primary.
This is how you recover from accidental data deletion, schema mistakes, or application bugs.
Recovery Point and Recovery Time
- RPO (Recovery Point Objective): With continuous WAL archiving, the maximum data loss is one WAL segment (16MB of changes). In practice, WAL archiving happens within seconds of segment completion. With synchronous replication to a standby, RPO is zero.
- RTO (Recovery Time Objective): Depends on backup size and WAL volume. A small database restores in minutes. A multi-terabyte database with days of WAL could take hours.
9. Connection Pooling with PgBouncer
The Problem
PostgreSQL uses a process-per-connection model. Each client connection spawns a dedicated backend process on the server. These processes consume memory (typically 5-10MB each), and the cost of creating and destroying them is non-trivial.
In Kubernetes, where many microservices each maintain their own connection pools, the total connection count can grow quickly. A cluster with 20 microservices, each with a pool of 10 connections, means 200 PostgreSQL backends. Scale that with replicas and you hit max_connections limits fast.
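The arithmetic from that example, spelled out (illustrative helpers with invented names): transaction pooling caps server-side backends at the pooler's pool size rather than the sum of every client pool.

```python
def direct_backends(services: int, pool_per_service: int) -> int:
    """Backends when every service keeps its own pool open to PostgreSQL."""
    return services * pool_per_service

def pooled_backends(pooler_instances: int, default_pool_size: int) -> int:
    """Server connections when clients go through transaction-mode PgBouncer.

    (PgBouncer actually pools per user/database pair; a single pair is
    assumed here to keep the arithmetic simple.)
    """
    return pooler_instances * default_pool_size

print(direct_backends(20, 10))  # 200 backends hit PostgreSQL directly
print(pooled_backends(2, 25))   # 50 backends behind two PgBouncer instances
```

The client side can still open hundreds of connections (`max_client_conn`); PgBouncer multiplexes them onto the smaller server-side pool between transactions.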
The Pooler CRD
CNPG provides a built-in Pooler CRD that deploys PgBouncer in front of your PostgreSQL cluster:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: demo-pg-pooler-rw
  namespace: cnpg-demo
spec:
  cluster:
    name: demo-pg
  instances: 2
  type: rw  # rw or ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
```

This creates a PgBouncer deployment with 2 replicas that proxies connections to the `demo-pg` cluster’s primary (because `type: rw`).
Pool Modes
- Transaction pooling (`transaction`): Connections are returned to the pool after each transaction. This gives the best connection reuse. Most applications should use this mode.
- Session pooling (`session`): Connections are held for the entire client session. Less efficient but required for features like prepared statements, advisory locks, or `LISTEN`/`NOTIFY`.
- Statement pooling (`statement`): Connections are returned after each statement. Very aggressive. Only works for simple, stateless queries.
When to Use Pooling
Use the Pooler CRD when:
- You have many microservices connecting to the same database.
- Your total connection count approaches `max_connections`.
- You see connection creation overhead in your latency metrics.
- You want to decouple application connection limits from database connection limits.
You might skip pooling for:
- Small deployments with few connections.
- Applications that rely heavily on session-level features (prepared statements, temp tables, session variables).
10. Comparison: Plain Deployment vs StatefulSet vs CloudNativePG
| Capability | Plain Deployment | StatefulSet | CloudNativePG |
|---|---|---|---|
| High Availability | None. Single pod. | Stable identities, but no HA logic. You write your own. | Built-in. Automatic failover with configurable standbys. |
| Automatic Failover | No. Pod restarts, but no standby promotion. | No. You need Patroni or custom scripts. | Yes. Detects failure, promotes standby, updates services. 5-15 seconds. |
| Replication | None. | None built-in. You configure pg_hba.conf, recovery.conf manually. | Streaming replication configured automatically. Sync or async. |
| Backups | Manual. CronJobs + pg_dump or custom scripts. | Manual. Same as Deployment. | Built-in. Continuous WAL archiving + base backups to object storage. PITR. |
| Scaling | Manual. Add more Deployments, configure replication yourself. | Scale replicas, but no replication setup. | Change instances count. Apply. Done. |
| Rolling Upgrades | Delete and recreate. Downtime. | Ordered rolling update, but no PostgreSQL-aware upgrade logic. | PostgreSQL-aware rolling updates. Standbys first, then switchover. Minimal downtime. |
| Credential Management | Manual Secret creation. Manual rotation. | Manual Secret creation. Manual rotation. | Auto-generated Secrets with URI, JDBC, pgpass. Integrated rotation. |
| Storage Management | Manual PVC lifecycle. | Automatic PVC per pod. Stable. | Automatic PVC per pod. Operator manages lifecycle and reattachment. |
| TLS | Manual certificate management. | Manual certificate management. | Auto-generated TLS certificates. Automatic rotation. |
| Monitoring | Manual. Deploy your own exporter. | Manual. Deploy your own exporter. | Built-in metrics exporter. PodMonitor creation via single flag. |
| Connection Routing | Single Service. No read/write split. | Single Service. No read/write split. | Three services: -rw (primary), -ro (standbys), -r (all). |
| Connection Pooling | Deploy PgBouncer yourself. | Deploy PgBouncer yourself. | Built-in Pooler CRD. Managed PgBouncer. |
| Operational Knowledge | All on you. | Pod identity on Kubernetes, everything else on you. | Encoded in the operator. Replication, failover, backup, recovery, upgrades. |
The pattern is clear. A plain Deployment gives you a PostgreSQL process in a container. A StatefulSet gives you stable pod identities and persistent storage. CloudNativePG gives you a managed PostgreSQL cluster that handles the operational complexity that makes running databases in production hard.