Apache Kafka¶
Apache Kafka is an open source project for a distributed publish-subscribe messaging system rethought as a distributed commit log.
Messages¶
Messages (records, events) are byte arrays (String, JSON, and Avro are among the most common formats). If a message has a key, Kafka (uses Partitioner) to make sure that all messages of the same key are in the same partition.
Topics¶
Kafka stores messages in topics that are partitioned and replicated across multiple brokers in a cluster.
Kafka Clients¶
Producers send messages to topics from which consumers read.
Language Agnostic¶
Kafka clients use binary protocol to talk to a Kafka cluster.
Consumer Groups¶
Consumers may be grouped in a consumer group with multiple consumers. Each consumer in a consumer group will read messages from a unique subset of partitions in each topic they subscribe to. Each message is delivered to one consumer in the group, and all messages with the same key arrive to the same consumer.
Durability¶
Kafka does not track which messages were read by consumers. Kafka keeps all messages for a finite amount of time, and it is consumers' responsibility to track their location per topic (offsets).