
Kafka: The Architecture of a Distributed Log

Part 2 — Brokers, topics, replication, consumer groups, offsets, performance, and the full read/write path.

1. From First Principles to Architecture

In the previous article, we arrived at a system with the following properties:

  • Append-only log
  • Partitioned for scalability
  • Replicated for fault tolerance
  • Consumers track their own position
  • Consumers pull data
  • System stores data for replay

Now we turn these ideas into an actual system design.

At a high level, Kafka is a distributed system that stores and serves logs.

2. Core Components of Kafka

We now define the main building blocks.

| Component | What it does |
|---|---|
| Producer | Writes data to Kafka |
| Broker | Kafka server that stores data |
| Topic | Logical stream of data |
| Partition | Physical log |
| Consumer | Reads data |
| Consumer Group | Group of consumers sharing work |
| Offset | Position in the log |
| Leader | Partition replica handling reads/writes |
| Follower | Replica copying the leader |
| ISR | In-sync replicas |
| Controller | Manages leader election |
| Log Segment | Files on disk |

Each component exists to solve a specific problem. At a glance:

  • Producers & consumers — the clients that write and read events
  • Brokers — the servers that actually store data
  • Topics — named streams so teams can share contracts without sharing code
  • Partitions — the unit of parallelism and ordering
  • Leaders, followers & ISR — who serves traffic and which replicas are in sync
  • Controller — coordinates leader election when brokers fail
  • Log segments — how records are stored efficiently on disk

3. Topics and Partitions — The Unit of Scalability

A topic is a logical stream of events (e.g., orders, payments, logs). But the real system is built around partitions.

Important idea:

Partition = ordered, append-only log stored on disk.

Each partition:

  • Lives on one broker (leader)
  • Has replicas on other brokers
  • Is written sequentially
  • Is read sequentially
  • Guarantees ordering only within the partition
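
The abstraction is small enough to sketch. Below is a toy, in-memory stand-in for a partition (real Kafka stores segments on disk and replicates them; this only makes the append/read contract concrete):

```python
class Partition:
    """A toy partition: an ordered, append-only log. Offsets are list indices."""

    def __init__(self):
        self._log = []

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._log.append(record)
        return len(self._log) - 1

    def read(self, offset, max_records=10):
        """Read sequentially from a given offset; reading never mutates the log."""
        return self._log[offset:offset + max_records]


p = Partition()
assert p.append("order-created") == 0   # first record gets offset 0
assert p.append("order-paid") == 1      # strictly increasing offsets
assert p.read(0) == ["order-created", "order-paid"]
```

Note that there is no delete or update: the only write operation is append, which is exactly what makes sequential disk I/O possible later on.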

This leads to a very important trade-off:

| Property | Kafka behavior |
|---|---|
| Ordering | Only within a partition |
| Parallelism | Across partitions |
| Scalability | By adding partitions |

Partitions are one of Kafka’s most important design decisions. Here is the trade-off in plain terms.

What you gain:

  • Horizontal scaling — spread load across brokers
  • Parallel processing — many consumers at once
  • Fault isolation — a hot partition is contained

What you give up:

  • Global ordering across the whole topic — ordering is only guaranteed inside a single partition

4. Producers — How Data Is Written

When a producer sends a message to Kafka, it must decide: which partition should this message go to?

Common strategies:

  • Round robin → load balancing
  • Key-based partitioning → same key goes to same partition (preserves ordering per key)
  • Custom partitioner

Why key-based partitioning matters: all events for user_id=123 go to the same partition, which preserves ordering for that user.

This is how Kafka handles ordering at scale: it does not guarantee global order, but it does guarantee order per key, because every record with the same key lands in the same partition.
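
A key-based partitioner is just "hash the key, take it modulo the partition count." Kafka's default partitioner uses murmur2 on the key bytes; the sketch below substitutes Python's `hashlib` for simplicity, but the property that matters is the same:

```python
import hashlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key onto a partition. Same key -> same partition, which is what
    preserves per-key ordering. (Simplified: real Kafka hashes with murmur2.)"""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# All events for the same user land in the same partition:
assert choose_partition(b"user_id=123", 6) == choose_partition(b"user_id=123", 6)
assert 0 <= choose_partition(b"user_id=123", 6) < 6
```

One consequence worth knowing: because the mapping depends on `num_partitions`, adding partitions to a topic remaps keys, so per-key ordering only holds across records written while the partition count stays the same.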

5. Brokers — The Storage Layer

A broker is a server that stores partitions, handles reads and writes, replicates data, and serves consumers. Kafka clusters typically have multiple brokers.

| Broker | Partitions |
|---|---|
| Broker 1 | Topic A — P0, P1 |
| Broker 2 | Topic A — P2, P3 |
| Broker 3 | Topic A — P4, P5 |

This spreads storage, network load, and CPU across the cluster, which is how Kafka scales horizontally.

6. Replication — Handling Failures

Each partition has one leader and multiple followers.

Partition 0:
Leader → Broker 1
Follower → Broker 2
Follower → Broker 3

Write flow: producer writes to leader → leader writes to disk → followers replicate → leader waits for ISR acknowledgment → message is committed.

ISR (in-sync replicas) are replicas fully caught up with the leader. Kafka only acknowledges writes when replicas in ISR confirm. Producers control durability with acks:

  • acks=0 → fire and forget
  • acks=1 → leader only
  • acks=all → leader + ISR (safest)

| acks | Latency | Durability |
|---|---|---|
| 0 | Lowest | Data loss possible |
| 1 | Medium | Leader crash → possible loss |
| all | Highest | Safest |

This is a classic latency vs durability trade-off.
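
The acknowledgment rule can be stated as a tiny decision function. This is a simplified model (it ignores `min.insync.replicas`, retries, and timeouts), where `acks_received` counts ISR members, leader included, that have the record:

```python
def write_acknowledged(acks, isr_size: int, acks_received: int) -> bool:
    """Decide when the producer's write is acknowledged, per acks mode.
    Simplified model: ignores min.insync.replicas, retries, and timeouts."""
    if acks == 0:
        return True                       # fire and forget: acked immediately
    if acks == 1:
        return acks_received >= 1         # leader's log only; followers may lag
    if acks == "all":
        return acks_received >= isr_size  # every ISR member must confirm
    raise ValueError(f"unknown acks setting: {acks!r}")


assert write_acknowledged(0, isr_size=3, acks_received=0)         # nothing durable yet
assert write_acknowledged(1, isr_size=3, acks_received=1)         # leader only
assert not write_acknowledged("all", isr_size=3, acks_received=2) # still waiting
assert write_acknowledged("all", isr_size=3, acks_received=3)     # fully committed
```

Reading the three branches top to bottom is reading the table above: each step down buys durability by waiting longer before acknowledging.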

7. Consumers and Consumer Groups — The Unit of Work Scaling

Consumers read data from partitions. The real power comes from consumer groups.

Rule: one partition is consumed by only one consumer in a group.

| Partition | Consumer |
|---|---|
| P0 | Consumer 1 |
| P1 | Consumer 2 |
| P2 | Consumer 3 |
| P3 | Consumer 1 |

Parallelism is capped by partitions — not by how many consumers you run. Three situations:

  • Partitions = consumers — each consumer gets one partition; you use the cluster evenly
  • Consumers > partitions — extra consumers have nothing to do (they stay idle)
  • Partitions > consumers — each consumer may read several partitions until work is balanced

So max useful parallelism in a group is at most one consumer per partition.

Partitions determine max consumer parallelism — one of the most important Kafka scaling concepts.
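
Kafka ships several assignment strategies; the simplest to sketch is round robin, which is enough to see all three situations above, including idle consumers:

```python
def assign_round_robin(partitions, consumers):
    """Assign each partition to exactly one consumer in the group, round robin.
    (A sketch of the simplest of Kafka's several assignment strategies.)"""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Partitions > consumers: Consumer 1 picks up two partitions.
a = assign_round_robin(["P0", "P1", "P2", "P3"], ["C1", "C2", "C3"])
assert a == {"C1": ["P0", "P3"], "C2": ["P1"], "C3": ["P2"]}

# Consumers > partitions: the extra consumer sits idle.
b = assign_round_robin(["P0", "P1"], ["C1", "C2", "C3"])
assert b["C3"] == []
```

The second case is the cap in action: running a fourth consumer against four partitions, or a third against two, adds capacity that can never be used.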

8. Offsets — Why Kafka Is Different From Traditional Queues

Traditional queues track which messages were consumed and delete them. Kafka does something different: the broker does not track what was consumed; each consumer tracks its own offset.

That simplifies the broker, enables replay, allows multiple consumer groups, and improves scalability. An offset is just a number in the log.

Offset 0 → Order Created
Offset 1 → Payment Completed
Offset 2 → Order Shipped
Offset 3 → Order Delivered

If a consumer crashes at offset 2, it restarts from offset 2. That gives replay, fault tolerance, and at-least-once processing (with correct commit semantics).
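
A minimal consumer loop makes the commit semantics visible. Committing after processing (as below) gives at-least-once delivery: after a crash, the record that was in flight is processed again.

```python
def consume(log, committed_offset, process):
    """Read from the last committed offset; advance the offset only after a
    record is processed. Crash-and-restart resumes from the last commit, so a
    record may be processed twice: at-least-once semantics."""
    offset = committed_offset
    while offset < len(log):
        process(log[offset])
        offset += 1          # "commit" happens after processing
    return offset            # the new committed offset


log = ["Order Created", "Payment Completed", "Order Shipped", "Order Delivered"]

# Restart after a crash with offset 2 committed: replay from there.
seen = []
committed = consume(log, 2, seen.append)
assert seen == ["Order Shipped", "Order Delivered"]
assert committed == 4
```

Flipping the order (commit before processing) would give at-most-once instead: no duplicates, but a crash can silently skip a record.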

9. Why Kafka Is So Fast

Kafka is fast because it uses disk efficiently, not because it avoids disk.

Reason 1 — Sequential disk writes

Sequential writes are much faster than random writes. Kafka appends to the log — no random updates, no heavy indexes on the hot path — so disk stays fast.

Reason 2 — Page cache

Kafka relies on the OS page cache: writes hit memory first, the OS flushes to disk, and reads often come from memory. Kafka uses disk like memory with persistence.

Reason 3 — Zero copy (sendfile)

Data can go from disk to network without extra copies in application memory (disk → kernel → network), which improves throughput.

Reason 4 — Batching

Producers, brokers, and consumers batch work — fewer network calls, disk operations, and system calls. Batching is a major driver of high throughput.
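
On the producer side, batching is an accumulate-then-flush buffer. The sketch below models only the size trigger, analogous to the producer's `batch.size` setting (the time trigger, `linger.ms`, is ignored):

```python
class BatchingProducer:
    """Accumulate records and hand them off in one call per full batch.
    Toy model of producer batching: size trigger only, no linger timer."""

    def __init__(self, send, batch_size=3):
        self.send = send              # one "network call" per flushed batch
        self.batch_size = batch_size
        self.buffer = []

    def produce(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []


calls = []
producer = BatchingProducer(calls.append, batch_size=3)
for r in ["a", "b", "c", "d"]:
    producer.produce(r)
producer.flush()
assert calls == [["a", "b", "c"], ["d"]]   # 4 records, only 2 network calls
```

Two calls instead of four is a small win here, but at millions of records per second the same ratio is the difference between a saturated network and an idle one.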

Reason 5 — Partition parallelism

More partitions → more parallel reads and writes → more throughput (until you hit hardware limits).

10. Rebalancing — When Consumers Join or Leave

When a consumer in a group starts, stops, crashes, or scales up/down, Kafka reassigns partitions — rebalancing.

During rebalance, consumers stop processing, partitions are reassigned, then consumers resume. Frequent rebalances hurt stability — a major operational issue tied to ECS autoscaling, Kubernetes scaling, and rolling deployments.

11. The Complete Data Flow

Write path:

Producer → Partition Leader → Disk → Replication → Commit

Read path:

Consumer → Pull → Partition Leader → Read from log → Return → Commit Offset

That is Kafka’s core lifecycle.

12. Final Mental Model

| System | What it stores |
|---|---|
| Database | Current state |
| Queue | Tasks |
| Kafka | Event history |

The table is a shortcut. The fuller picture is simple:

  • Kafka is a distributed commit log — an ordered history, not just a pipe
  • Producers append events to that log
  • Consumers read from the log at their own pace
  • The log is split into partitions, copied for safety, and spread across brokers

If that mental model clicks, everything else in Kafka is mostly how those pieces are implemented and operated.

Conclusion — Logs as the Backbone of Modern Systems

Kafka is not just a messaging system. It is a distributed storage system optimized for sequential writes and parallel reads.

Its design rests on a few key ideas:

  • Logs instead of queues
  • Partitions for scalability
  • Replication for fault tolerance
  • Consumer groups for parallel processing
  • Offsets for replay
  • Pull model for backpressure
  • Batching and sequential I/O for performance

Once these ideas are understood, Kafka’s architecture becomes logical, not magical.