Kafka Vocab

What is Kafka?

  • Why can’t we just label it as a message queue?
  • “Apache Kafka is an open-source distributed event streaming platform.”

Vocab

  • Producer: This is a client that publishes messages to Kafka. A producer sends data to Kafka brokers and more specifically, into Kafka topics.
  • Consumer: A consumer is a client that consumes or reads data from Kafka. Consumers subscribe to one or more topics and consume published messages by pulling data from the brokers.
  • Broker: In Kafka, a broker refers to a server in the Kafka cluster. Each broker holds multiple partitions of various topics and is designed to operate in a distributed environment, meaning Kafka clusters can span across multiple servers for fault-tolerance.
  • Topic: Topics are categories or feeds to which messages are published. In Kafka, data is stored in topics. Topics are split into partitions for speed and scalability, and each message within a partition gets an incremental id, called an offset.
  • Partition: Partitions are a way of dividing up the data within a topic. They allow for data within a topic to be split across multiple brokers in a Kafka cluster, enabling parallelism and fault-tolerance. Each partition can be replicated across multiple brokers.
  • Record: A record is the name for data units that Kafka consumers and producers deal with. Each record consists of a key, a value, and a timestamp. The key and value are both byte arrays, and the optional key can be used for things like specifying the partition within the topic where the record should go.
  • Offset: In the context of Kafka, an offset is a unique identifier of a record within a partition. It denotes the position of the consumer in the partition. It’s a sequential id number that is incremented for each record as it’s added to the partition. Importantly, Kafka retains all records in the partition for a configurable amount of time, so the same record can be read multiple times by re-positioning the consumer to an earlier offset.

Poison Pills

A “poison pill” refers to a record that can’t be processed due to various reasons such as incorrect format, size, content, or any unexpected condition that a consumer wasn’t designed to handle.

  • Skip the Message: Implement logic that recognizes the poison pill and skips it, logging the problem for further investigation.
  • Dead Letter Queue: Send the poison pill to a separate Kafka topic (or another storage system) called a Dead Letter Queue (DLQ). Engineers can later analyze DLQs to identify and address the root causes of the issues.
  • Manual Intervention: Engineer skips the record manually using a CLI tool.

UI for Visualization

Written on September 3, 2023