Event-Driven Architecture (Kafka): Deep Dive
This demo runs Apache Kafka and Zookeeper as StatefulSets in Kubernetes. A producer sends timestamped messages to a Kafka topic every 5 seconds. A consumer reads those messages in real time. The producer and consumer do not know about each other. Kafka sits between them, handling storage, ordering, and delivery.
This document explains why event-driven architecture exists, how Kafka works internally, how it differs from traditional message queues, and what it takes to run Kafka reliably on Kubernetes.
Why Event-Driven Architecture
In the previous demo, the backend serves API requests synchronously. A client sends a request. The backend responds. Done. This works until it does not.
Synchronous communication creates temporal coupling. Both parties must be online at the same time. If the backend is down, the request fails. If the backend is slow, the client waits.
Event-driven architecture removes that coupling. The producer publishes an event. The broker stores it. The consumer reads it when ready. If the consumer is offline, events accumulate in the broker. When the consumer comes back, it catches up.
This pattern is essential for:
- Order processing pipelines where multiple systems react to a single event
- Audit logging where events must be durable and ordered
- Real-time analytics where high-volume data streams from many sources
- Integration between systems owned by different teams
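The buffering behavior described above can be sketched with a toy in-memory log. This is plain Python with no Kafka client; the `Broker` class and event names are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Broker:
    """Minimal stand-in for a single-partition topic: an append-only log."""
    log: list = field(default_factory=list)

    def publish(self, event: str) -> None:
        self.log.append(event)  # the producer never waits on any consumer

    def read_from(self, offset: int) -> list:
        return self.log[offset:]  # the consumer reads at its own pace

broker = Broker()

# Producer keeps publishing while the consumer is offline.
for i in range(5):
    broker.publish(f"event-{i}")

# Consumer comes back online and catches up from its last position.
consumer_offset = 2  # pretend it had read events 0 and 1 before going down
missed = broker.read_from(consumer_offset)
print(missed)  # -> ['event-2', 'event-3', 'event-4']
```

The producer and consumer never interact directly; the log in the middle absorbs the time difference.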
Kafka Architecture
Kafka is a distributed commit log. It is not a traditional message queue. Understanding that distinction is key.
Brokers
A Kafka cluster consists of one or more brokers. Each broker is a server that stores data and serves clients. This demo runs a single broker:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: kafka-demo
spec:
  serviceName: kafka
  replicas: 1
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: docker.io/bitnami/kafka:3.6
          ports:
            - containerPort: 9092
              name: kafka
          env:
            - name: KAFKA_CFG_ZOOKEEPER_CONNECT
              value: zookeeper:2181
            - name: KAFKA_CFG_LISTENERS
              value: PLAINTEXT://:9092
            - name: KAFKA_CFG_ADVERTISED_LISTENERS
              value: PLAINTEXT://kafka-0.kafka.kafka-demo.svc.cluster.local:9092
            - name: KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE
              value: "true"
            - name: ALLOW_PLAINTEXT_LISTENER
              value: "yes"
```

In production, you run 3 or more brokers for fault tolerance. Data is replicated across brokers so that no single failure loses messages.
Topics
A topic is a named stream of events. This demo uses a topic called
events. The producer writes to it. The consumer reads from it.
Topics are created automatically because KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE
is set to true. In production, you usually create topics explicitly with
a defined number of partitions and a replication factor:
```shell
kafka-topics.sh --create --topic orders \
  --partitions 3 --replication-factor 1 \
  --bootstrap-server localhost:9092
```

Partitions
A topic is divided into partitions. Each partition is an ordered, immutable sequence of messages. Partitions are the unit of parallelism in Kafka.
With one partition, messages are strictly ordered. With three partitions, Kafka distributes messages across them. Ordering is guaranteed within a partition but not across partitions.
```
Topic: events
  Partition 0: [msg1, msg4, msg7, ...]
  Partition 1: [msg2, msg5, msg8, ...]
  Partition 2: [msg3, msg6, msg9, ...]
```

Each partition lives on one broker (with replicas on other brokers). Adding partitions allows more consumers to read in parallel, but you cannot reduce the number of partitions after creation.
Consumer Groups
Consumers belong to a group. Kafka assigns each partition to exactly one consumer in the group. If you have 3 partitions and 3 consumers in the same group, each consumer reads from one partition.
If a consumer crashes, Kafka reassigns its partition to another consumer in the group. This is called rebalancing.
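The assignment and rebalance behavior can be sketched with a simple round-robin assignor. Kafka's actual assignors (range, round-robin, sticky) are more sophisticated, so treat this as an illustration of the invariant only: each partition goes to exactly one consumer in the group.

```python
def assign(partitions, consumers):
    """Round-robin partition assignment within one consumer group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2]
print(assign(partitions, ["c1", "c2", "c3"]))
# -> {'c1': [0], 'c2': [1], 'c3': [2]}

# "c3" crashes: the group rebalances and its partition moves to a survivor.
print(assign(partitions, ["c1", "c2"]))
# -> {'c1': [0, 2], 'c2': [1]}
```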
The consumer in this demo uses kafka-console-consumer.sh with
--from-beginning, which reads all messages from the start of the topic.
In production, consumers track their position using offsets and resume from
where they left off.
Offsets
Each message in a partition has a numeric offset. Offset 0 is the first message. Offset 1 is the second. Offsets are never reused.
Consumers commit their current offset to Kafka. If a consumer restarts, it reads from the last committed offset. This is how Kafka provides at-least-once delivery: the consumer processes a message, then commits its offset. If it crashes before committing, it reprocesses the message after restart.
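The process-then-commit loop and its failure mode can be simulated without a broker. Everything here (the log list, the `crash_at` parameter) is illustrative, not a Kafka API:

```python
log = [f"msg-{i}" for i in range(5)]
committed_offset = 0
processed = []

def run_consumer(crash_at=None):
    """Process-then-commit. A crash between the two steps causes a replay."""
    global committed_offset
    offset = committed_offset          # resume from the last committed offset
    while offset < len(log):
        processed.append(log[offset])  # 1. process the message
        if offset == crash_at:
            return                     # crash before committing
        committed_offset = offset + 1  # 2. commit the offset
        offset += 1

run_consumer(crash_at=2)  # processes msg-0..msg-2, crashes before committing msg-2
run_consumer()            # restarts at offset 2 and reprocesses msg-2
print(processed)
# msg-2 appears twice: at-least-once delivery
```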
The Producer
The producer sends a message every 5 seconds:
```yaml
containers:
  - name: producer
    image: docker.io/bitnami/kafka:3.6
    command:
      - /bin/bash
      - -c
      - |
        echo "Waiting for Kafka to be ready..."
        sleep 30
        echo "Starting producer, sending to topic 'events'..."
        count=0
        while true; do
          count=$((count + 1))
          message="event-${count}: order-placed at $(date '+%Y-%m-%d %H:%M:%S')"
          echo "$message" | kafka-console-producer.sh \
            --broker-list kafka-0.kafka.kafka-demo.svc.cluster.local:9092 \
            --topic events
          echo "[producer] Sent: $message"
          sleep 5
        done
```

Two things to notice. First, the producer uses the StatefulSet DNS name
kafka-0.kafka.kafka-demo.svc.cluster.local to reach the broker. This is
stable across pod restarts. Second, the producer sleeps 30 seconds before
starting to give Kafka time to initialize.
Producer Semantics
In production, producers care deeply about delivery guarantees.
acks=0. The producer sends the message and does not wait for acknowledgment. Fastest but messages can be lost. Use for metrics and logs where occasional loss is acceptable.
acks=1. The producer waits for the partition leader to acknowledge. If the leader crashes before replication, the message is lost. Good balance of speed and reliability.
acks=all. The producer waits for all in-sync replicas to acknowledge. Slowest but no data loss as long as at least one replica survives. Required for financial transactions, audit logs, and anything you cannot afford to lose.
Idempotent producer. Kafka supports idempotent writes. The producer
assigns a sequence number to each message. If a network retry causes a
duplicate, Kafka deduplicates it. Enable with enable.idempotence=true.
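The broker-side deduplication idea can be sketched like this. This is a toy model of the concept, not Kafka's actual implementation; the `PartitionLog` class and producer IDs are illustrative:

```python
class PartitionLog:
    """Broker-side dedup keyed on (producer_id, sequence number)."""
    def __init__(self):
        self.messages = []
        self.last_seq = {}  # producer_id -> highest sequence appended

    def append(self, producer_id, seq, msg):
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # duplicate from a retry: drop it
        self.last_seq[producer_id] = seq
        self.messages.append(msg)
        return True

log = PartitionLog()
log.append("p1", 0, "a")
log.append("p1", 1, "b")
log.append("p1", 1, "b")  # network retry resends sequence 1
print(log.messages)  # -> ['a', 'b']  (no duplicate appended)
```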
Exactly-once semantics. Kafka’s transactional API lets a producer write to multiple partitions atomically. Combined with idempotent writes, this provides exactly-once within Kafka. Crossing system boundaries still requires application-level idempotency.
The Consumer
The consumer reads from the events topic:
```yaml
containers:
  - name: consumer
    image: docker.io/bitnami/kafka:3.6
    command:
      - /bin/bash
      - -c
      - |
        echo "Waiting for Kafka to be ready..."
        sleep 40
        echo "Starting consumer, reading from topic 'events'..."
        kafka-console-consumer.sh \
          --bootstrap-server kafka-0.kafka.kafka-demo.svc.cluster.local:9092 \
          --topic events \
          --from-beginning
```

The --from-beginning flag tells the consumer to start at offset 0. Without
it, the consumer only reads new messages. The 40-second sleep ensures Kafka
and the producer are running before the consumer starts.
Consumer Semantics
At-most-once. Commit the offset before processing. If processing fails, the message is skipped. No duplicates, but messages can be lost.
At-least-once. Process the message, then commit the offset. If the consumer crashes after processing but before committing, it reprocesses the message. No lost messages, but duplicates are possible. This is Kafka’s default behavior.
Exactly-once. Use Kafka’s transactional consumer API. More complex and slower but eliminates both loss and duplication.
Most systems use at-least-once and design idempotent consumers. Processing the same order twice should produce the same result.
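An idempotent consumer can be as simple as remembering which event IDs it has already applied. This is a sketch; a real system would persist the processed-ID set durably, ideally in the same transaction as the side effect:

```python
processed_ids = set()
balance = 0

def handle_payment(event):
    """Applying the same event twice has the same effect as applying it once."""
    global balance
    if event["id"] in processed_ids:
        return  # already applied: redelivery is harmless
    processed_ids.add(event["id"])
    balance += event["amount"]

event = {"id": "pay-42", "amount": 100}
handle_payment(event)
handle_payment(event)  # at-least-once redelivery
print(balance)  # -> 100, not 200
```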
Topic Partitioning and Ordering
Ordering guarantees depend on partitioning. A single partition gives total ordering but limits throughput to one consumer per group. Multiple partitions allow parallel consumption but only guarantee order within each partition.
To maintain order for related events, use a message key. Kafka hashes the key to select a partition. All messages with the same key go to the same partition:
```
Key: order-123 -> hash -> Partition 1
Key: order-456 -> hash -> Partition 0
Key: order-123 -> hash -> Partition 1 (same partition)
```

This demo does not use keys. Messages distribute round-robin.
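The key-to-partition mapping is a stable hash modulo the partition count. Kafka's default partitioner uses murmur2; crc32 stands in here to keep the sketch dependency-free, so the specific partition numbers will differ from Kafka's:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key selects the partition (crc32 as a stand-in)."""
    return zlib.crc32(key.encode()) % num_partitions

p1 = partition_for("order-123", 3)
p2 = partition_for("order-123", 3)
print(p1 == p2)  # -> True: the same key always lands on the same partition
```

This is why a key preserves ordering for related events: all of them land in one partition, and ordering within a partition is guaranteed.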
Zookeeper’s Role
Zookeeper manages Kafka’s cluster metadata:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
  namespace: kafka-demo
spec:
  serviceName: zookeeper
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - name: zookeeper
          image: docker.io/bitnami/zookeeper:3.9
          ports:
            - containerPort: 2181
              name: client
          env:
            - name: ALLOW_ANONYMOUS_LOGIN
              value: "yes"
```

Zookeeper tracks which brokers are alive, which broker leads each partition, topic configuration, and access control lists.
Kafka connects to Zookeeper via the KAFKA_CFG_ZOOKEEPER_CONNECT environment
variable:
```yaml
env:
  - name: KAFKA_CFG_ZOOKEEPER_CONNECT
    value: zookeeper:2181
```

KRaft: Removing Zookeeper
Starting with Kafka 3.3, Kafka can run in KRaft mode, where brokers manage their own metadata using an internal Raft consensus protocol. KRaft eliminates the operational burden of running Zookeeper separately: fewer moving parts, fewer failure modes, faster metadata operations.
This demo uses Zookeeper because it is simpler for learning. In production with Kafka 3.6+, KRaft is the recommended approach.
Kafka on Kubernetes
Running Kafka on Kubernetes requires careful attention to storage and networking. Both Kafka and Zookeeper need stable identities and persistent storage. That is why they run as StatefulSets.
Why StatefulSets
A Deployment creates pods with random names. If a pod restarts, it gets a new name. Kafka needs stable names because the advertised listener address must be predictable:
```yaml
- name: KAFKA_CFG_ADVERTISED_LISTENERS
  value: PLAINTEXT://kafka-0.kafka.kafka-demo.svc.cluster.local:9092
```

kafka-0 is the stable name that the StatefulSet guarantees. If the pod
restarts, it comes back as kafka-0. Clients reconnect to the same address.
With a Deployment, the pod might restart as kafka-7f8b6c4d5-xq2pn. The
advertised listener would be wrong, and clients could not reconnect.
Headless Services
Both StatefulSets use headless Services (clusterIP: None):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: kafka-demo
spec:
  selector:
    app: kafka
  ports:
    - port: 9092
      targetPort: 9092
      name: kafka
  clusterIP: None
```

A headless Service does not get a ClusterIP. DNS queries return the IP addresses of individual pods. This lets clients connect directly to specific brokers, which is essential because each broker holds different partitions.
The DNS name kafka-0.kafka.kafka-demo.svc.cluster.local resolves directly
to the kafka-0 pod’s IP. With a regular Service, the DNS name would
resolve to the Service’s ClusterIP, and kube-proxy would route to a random
pod. That breaks Kafka’s client-to-broker protocol.
Storage Considerations
Both StatefulSets use volumeClaimTemplates for persistent storage:
```yaml
volumeClaimTemplates:
  - metadata:
      name: kafka-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 2Gi
```

Each pod gets its own PersistentVolumeClaim. If kafka-0 restarts, it
reattaches to the same PVC and recovers its data. In production, size the
PVC based on retention and throughput. Kafka defaults to 7-day retention.
At 1 MB/sec, that is about 600 GB. Use SSDs for latency-sensitive
workloads.
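The 600 GB figure falls out of simple arithmetic:

```python
# Back-of-the-envelope PVC sizing from the numbers in the text.
throughput_mb_per_sec = 1
retention_days = 7
replication_factor = 1  # multiply by the replica count in a real cluster

needed_mb = throughput_mb_per_sec * 86_400 * retention_days * replication_factor
print(f"{needed_mb / 1_000:.0f} GB")  # -> 605 GB, roughly the 600 GB cited
```

In a replicated production cluster, each replica stores a full copy of its partitions, so multiply by the replication factor as well.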
Resource Requirements
Kafka uses the OS page cache heavily. More memory means more messages served from cache. This demo allocates 512Mi-1Gi. In production, give brokers 6-8 GB. Set the JVM heap to one-third of available memory and leave the rest for the page cache.
Kafka vs Message Queues
Kafka is often compared to RabbitMQ and Redis Streams. They solve related but different problems.
RabbitMQ
RabbitMQ is a traditional message broker using exchanges and queues. It deletes messages after consumer acknowledgment, pushes messages to consumers (Kafka consumers pull), and does not support replay. Choose RabbitMQ for complex routing patterns, strict process-once semantics, or moderate throughput (thousands of messages per second).
Redis Streams
Redis Streams provide a lightweight append-only log in memory. Lower throughput than Kafka (tens of thousands vs millions per second), simpler to operate, and good enough when you already run Redis and your volume is low. Redis Streams lack Kafka’s mature consumer group rebalancing and configurable retention.
When to Choose Kafka
Kafka excels at high throughput (100k+ messages per second), durable storage with configurable retention, multi-consumer fan-out, message replay, and ordering guarantees within partitions.
Strimzi Operator
Managing Kafka by hand works for learning. In production, the Strimzi operator automates lifecycle management with CRDs for clusters, topics, users, and connectors:
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
```

Strimzi handles rolling upgrades, certificate management, and rack-aware replication. It is the recommended approach for Kafka on OpenShift.
Event Sourcing and CQRS
Kafka’s durable, ordered log enables two advanced architectural patterns.
Event Sourcing
Traditional systems store current state. Event sourcing stores every state
change as an event: OrderCreated, PaymentReceived, OrderShipped. The
current state is derived by replaying all events. Kafka is a natural fit
because its log is append-only, ordered, and durable. The events topic
in this demo is a simple version of this pattern.
Benefits: complete audit trail, ability to rebuild state by replay, temporal queries, and decoupled consumers reacting independently.
Trade-offs: increased storage, complex schema evolution, and eventual consistency between the event log and derived state.
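Deriving state by replay can be sketched as a fold over the ordered event log. The event names come from the text; the state machine itself is illustrative:

```python
events = [
    {"type": "OrderCreated", "order": "o-1"},
    {"type": "PaymentReceived", "order": "o-1"},
    {"type": "OrderShipped", "order": "o-1"},
]

def replay(events):
    """Derive current state by folding over the ordered event log."""
    state = {}
    for e in events:
        if e["type"] == "OrderCreated":
            state[e["order"]] = "created"
        elif e["type"] == "PaymentReceived":
            state[e["order"]] = "paid"
        elif e["type"] == "OrderShipped":
            state[e["order"]] = "shipped"
    return state

print(replay(events))      # -> {'o-1': 'shipped'}
print(replay(events[:2]))  # temporal query: state as of the second event
```

Replaying a prefix of the log is what makes temporal queries possible: the state at any past moment is just a partial fold.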
CQRS (Command Query Responsibility Segregation)
CQRS separates the write model from the read model. The write side publishes events to Kafka. The read side consumes those events and builds optimized read models: denormalized tables, search indexes, materialized views.
Example: the order service writes OrderCreated events. A reporting
consumer builds a daily summary. A search consumer updates Elasticsearch.
A notification consumer sends emails. Each builds its own view.
CQRS works when read and write patterns differ significantly. It adds complexity, so it is not appropriate for simple CRUD applications.
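The fan-out can be sketched with two toy read models built from the same stream. The data structures are illustrative stand-ins for a reporting table and a search index:

```python
events = [
    {"type": "OrderCreated", "order": "o-1", "total": 50},
    {"type": "OrderCreated", "order": "o-2", "total": 30},
]

# Each "consumer" builds its own read model from the same event stream.
daily_summary = {"orders": 0, "revenue": 0}  # reporting consumer's view
search_index = {}                            # search consumer's view

for e in events:
    if e["type"] == "OrderCreated":
        daily_summary["orders"] += 1
        daily_summary["revenue"] += e["total"]
        search_index[e["order"]] = e

print(daily_summary)  # -> {'orders': 2, 'revenue': 80}
```

Because Kafka retains events, a new read model can be added later and bootstrapped by replaying the topic from the beginning.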
Decoupling in Practice
The most important thing this demo illustrates is decoupling. Stop the consumer. The producer keeps sending. Messages accumulate in Kafka. Restart the consumer. It catches up.
```shell
kubectl scale deployment consumer --replicas=0 -n kafka-demo
# Wait 30 seconds
kubectl scale deployment consumer --replicas=1 -n kafka-demo
kubectl logs -f deploy/consumer -n kafka-demo
```

The producer did not fail. It did not slow down. It did not even notice. Kafka absorbed the difference. This is the core value proposition of event-driven architecture.
In a synchronous system, the consumer going offline causes the producer to fail (or at least timeout). In an event-driven system, the broker provides temporal decoupling. Services can operate on different schedules, at different speeds, with independent uptime.
Where to Go Next
Section titled “Where to Go Next”- Microservices Platform covers the service decomposition patterns that produce the events Kafka carries.
- API Gateway shows how Kong handles synchronous north-south traffic, the complement to Kafka’s asynchronous east-west messaging.