Kafka Vocab
What is Kafka?
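- Why can’t we just label it as a message queue? A traditional queue typically removes a message once it has been consumed. Kafka instead keeps records in a durable, partitioned log for a configurable retention period, so several consumers can read the same data independently and re-read it later.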
- “Apache Kafka is an open-source distributed event streaming platform.”
Vocab
- Producer: A producer is a client that publishes messages to Kafka. It sends data to Kafka brokers and, more specifically, to Kafka topics.
- Consumer: A consumer is a client that consumes or reads data from Kafka. Consumers subscribe to one or more topics and consume published messages by pulling data from the brokers.
- Broker: A broker is a server in the Kafka cluster. Each broker holds multiple partitions of various topics and is designed to operate in a distributed environment, meaning a Kafka cluster can span multiple servers for fault-tolerance.
- Topic: Topics are categories or feeds to which messages are published. In Kafka, data is stored in topics. Topics are split into partitions for speed and scalability, and each message within a partition gets an incremental id, called an offset.
- Partition: Partitions are a way of dividing up the data within a topic. They allow for data within a topic to be split across multiple brokers in a Kafka cluster, enabling parallelism and fault-tolerance. Each partition can be replicated across multiple brokers.
- Record: A record is the unit of data that Kafka producers and consumers deal with. Each record consists of a key, a value, and a timestamp, and can also carry optional headers. The key and value are both byte arrays. The key is optional and is typically used to choose the partition within the topic: with the default partitioner, records with the same key go to the same partition.
- Offset: An offset is a unique identifier of a record within a partition. It’s a sequential id number that is incremented for each record as it’s added to the partition. A consumer’s position in a partition is also expressed as an offset: the offset of the next record it will read. Importantly, Kafka retains records in the partition for a configurable amount of time (or up to a configurable size) whether or not they have been consumed, so the same record can be read multiple times by re-positioning the consumer to an earlier offset.
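To make the vocabulary concrete, here is a minimal in-memory sketch of a topic with partitions, records, and offsets. It is a toy model and not the Kafka client API: the names ToyTopic, append, and read are hypothetical, crc32 stands in for the real partitioner's hash, and round-robin placement of unkeyed records is a simplification.

```python
import time
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    key: Optional[bytes]
    value: bytes
    timestamp: float


class ToyTopic:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
        self._next_unkeyed = 0  # round-robin counter for records without a key

    def append(self, value: bytes, key: Optional[bytes] = None) -> Tuple[int, int]:
        """Append a record and return (partition, offset)."""
        if key is not None:
            # Assumption: crc32 stands in for the real partitioner's hash.
            partition = zlib.crc32(key) % len(self.partitions)
        else:
            partition = self._next_unkeyed % len(self.partitions)
            self._next_unkeyed += 1
        log = self.partitions[partition]
        log.append(Record(key, value, time.time()))
        return partition, len(log) - 1  # offset = index in the partition log

    def read(self, partition: int, offset: int, max_records: int = 10) -> List[Record]:
        """Return up to max_records starting at offset; nothing is removed."""
        return self.partitions[partition][offset:offset + max_records]


topic = ToyTopic("orders", num_partitions=3)
partition, first = topic.append(b"created", key=b"user-1")
_, second = topic.append(b"paid", key=b"user-1")  # same key, same partition

position = 0  # the consumer's position is just an offset it remembers
batch = topic.read(partition, position)
position += len(batch)

replayed = topic.read(partition, 0)  # rewinding returns the same records again
```

Reading never deletes anything in this model, which is the property that separates a log from a classic queue: the consumer only moves its own position forward or backward.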
Poison Pills
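A “poison pill” is a record that a consumer can’t process, for example because of an incorrect format, size, or content, or any other unexpected condition the consumer wasn’t designed to handle. If the consumer keeps retrying the same record, it can stall on that partition and never reach the records behind it. Common ways to handle a poison pill: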
- Skip the Message: Implement logic that recognizes the poison pill and skips it, logging the problem for further investigation.
- Dead Letter Queue: Send the poison pill to a separate Kafka topic (or another storage system) called a Dead Letter Queue (DLQ). Engineers can later analyze DLQs to identify and address the root causes of the issues.
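- Manual Intervention: An engineer skips the record manually using a CLI tool, for example by moving the consumer group’s committed offset past the bad record with the offset-reset option of Kafka’s kafka-consumer-groups tool.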
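The first two strategies can be sketched as a consumer loop. This is a simplified model under stated assumptions, not real client code: records arrive as (offset, raw bytes) pairs, the dead letter queue is a plain list standing in for a separate topic, and process and consume are hypothetical names.

```python
import json
import logging

logger = logging.getLogger("consumer")


def process(event):
    """Hypothetical business logic: every event must carry an 'order_id'."""
    return event["order_id"]


def consume(records, dead_letters, use_dlq=True):
    """Process (offset, raw_bytes) pairs and return (results, next_offset).

    A record that cannot be decoded or processed is treated as a poison pill:
    it is logged, optionally copied to dead_letters, and then skipped so the
    consumer keeps making progress instead of retrying it forever.
    """
    results = []
    next_offset = None
    for offset, raw in records:
        try:
            results.append(process(json.loads(raw)))
        except (ValueError, KeyError, TypeError) as exc:
            logger.warning("skipping poison pill at offset %d: %r", offset, exc)
            if use_dlq:
                # Keep the original bytes plus context for later analysis.
                dead_letters.append(
                    {"offset": offset, "raw": raw, "error": repr(exc)}
                )
        next_offset = offset + 1  # advance past the record either way
    return results, next_offset


records = [
    (0, b'{"order_id": 1}'),
    (1, b"not json"),          # malformed payload
    (2, b'{"other": 2}'),      # valid JSON, missing the expected field
    (3, b'{"order_id": 4}'),
]
dlq = []
results, next_offset = consume(records, dlq)
```

Catching only the expected failure types is a deliberate choice in this sketch: an unexpected exception, such as a failing downstream call, is probably not a poison pill and should not be silently skipped.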
UI for Visualization
Written on September 3, 2023