Showing posts with label Zookeeper. Show all posts

December 01, 2019

Kafka Part 10: Implement Exactly Once Processing in Kafka

Let's say we are designing a system using Apache Kafka that sends messages from one system to another. While designing it, we need to consider the following questions:
  • How do we guarantee all messages are processed?
  • How do we avoid/handle duplicate messages?
A timeout could occur while publishing messages to Kafka. Our consumer process could run out of memory or crash while writing to a downstream database. Or maybe our broker could run out of disk space, or a network partition could form between ZooKeeper instances.
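
One common way to approach both questions is to accept that a message may be delivered more than once (at-least-once delivery) and make the handler idempotent, so a redelivered message has no additional effect. The sketch below is only an in-memory illustration of that idea and does not use a Kafka client. The names `Message`, `IdempotentProcessor`, and `message_id` are hypothetical, and it assumes every message carries a unique ID. In a real system the processed IDs and the downstream write would need to be stored durably and committed together.

```python
# Minimal sketch: at-least-once delivery plus an idempotent handler.
# All names here (Message, IdempotentProcessor) are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str   # assumed: a unique ID assigned by the producer
    payload: str


class IdempotentProcessor:
    def __init__(self):
        self.seen_ids = set()   # stand-in for a durable store of processed IDs
        self.results = []       # stand-in for the downstream database

    def handle(self, msg: Message) -> bool:
        """Apply the message once; return False if it was a duplicate."""
        if msg.message_id in self.seen_ids:
            return False
        self.results.append(msg.payload)
        self.seen_ids.add(msg.message_id)
        return True


processor = IdempotentProcessor()
# The second "m1" simulates a redelivery after a producer retry or a consumer restart.
delivered = [
    Message("m1", "debit 10"),
    Message("m2", "credit 5"),
    Message("m1", "debit 10"),
]
outcomes = [processor.handle(m) for m in delivered]
```

Here the third message is recognised as a duplicate and skipped, so the downstream store ends up with each payload exactly once even though delivery happened three times.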

Kafka Part 9: Compression

Compression In Kafka
Producers commonly send data to Kafka in a text format such as JSON. JSON has a drawback: the data is stored as strings, so much of the same content is stored again and again across the records in a Kafka topic, which takes up a lot of disk space. That's why we need compression.
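
To get a feel for how repetitive JSON is, the following sketch compresses a batch of made-up records with gzip from the Python standard library. The record fields (`user_id`, `event`, `country`) are invented for illustration, and this is not how the Kafka client is invoked. On a real producer, compression is enabled through the `compression.type` setting (for example `gzip`, `snappy`, `lz4`, or `zstd`).

```python
import gzip
import json

# Hypothetical records: the same keys appear in every record, with similar values.
records = [{"user_id": i, "event": "page_view", "country": "IN"} for i in range(1000)]

# Serialize the records as newline-separated JSON, roughly what a batch of text messages looks like.
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")
```

Because the field names repeat in every record, the compressed payload should come out much smaller than the raw one. The exact ratio depends on the data.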

Kafka Part 8: Batch Size and linger.ms



What is a Producer Batch and Kafka’s batch size?
  • A producer writes messages to Kafka one by one. Rather than sending each message immediately, it collects the messages being produced into a batch until the batch is full, and then sends the whole batch to Kafka. Such a batch is known as a Producer Batch.
  • In other words, Kafka producers buffer unsent records for each partition. The size of these buffers is set by batch.size in the producer configuration. Once a buffer is full, its messages are sent.
  • The default batch size is 16 KB (16384 bytes), and it can be configured much higher. The larger the batch size, the better the compression, throughput, and efficiency of producer requests. Larger messages seem to be disproportionately delayed by small batch sizes.
  • A message should not be larger than the batch size; otherwise, it will not be batched with other messages. Also, a batch is allocated per partition, so do not set the batch size to a very high number.
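
The batching behaviour described above can be pictured with a toy model. This is a simplified sketch, not the real Kafka producer: `PartitionBatcher` and its methods are hypothetical names, sizes are counted as plain byte lengths, and the final `flush` call merely stands in for the linger.ms timer expiring.

```python
class PartitionBatcher:
    """Toy model of per-partition batching; not the real Kafka client."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size   # plays the role of batch.size, in bytes
        self.open_batches = {}         # partition -> list of buffered records
        self.sent = []                 # (partition, records) tuples that were "sent"

    def append(self, partition: int, record: bytes) -> None:
        batch = self.open_batches.setdefault(partition, [])
        used = sum(len(r) for r in batch)
        # If the record does not fit into the open batch, send that batch first.
        if batch and used + len(record) > self.batch_size:
            self.flush(partition)
            batch = self.open_batches.setdefault(partition, [])
        batch.append(record)
        # A full batch, or a record at least as large as batch_size, goes out right away.
        if sum(len(r) for r in batch) >= self.batch_size:
            self.flush(partition)

    def flush(self, partition: int) -> None:
        batch = self.open_batches.pop(partition, [])
        if batch:
            self.sent.append((partition, batch))


batcher = PartitionBatcher(batch_size=10)
for rec in [b"aaaa", b"bbbb", b"cccc", b"x" * 25]:
    batcher.append(0, rec)
batcher.flush(0)  # stands in for linger.ms expiring on a partially filled batch
```

With a 10-byte batch size, the first two 4-byte records share a batch, the third starts a new one, and the 25-byte record is sent on its own because it is larger than the batch size.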

Kafka Part 7: Why is ZooKeeper always configured with an odd number of nodes?

Let's understand a few basics:

ZooKeeper is a highly available, highly reliable, and fault-tolerant coordination and consensus service for distributed applications such as Apache Storm or Kafka. High availability and reliability are achieved through replication.
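
A replicated ZooKeeper ensemble stays available only while a majority of its servers can talk to each other. The small calculation below works through that majority rule for a few ensemble sizes. The function names are hypothetical and the code is plain arithmetic, not a ZooKeeper API.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many servers can fail while a majority is still alive."""
    return nodes - quorum_size(nodes)


# ensemble size -> (quorum size, tolerated failures)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in range(3, 7)}
for n, (quorum, failures) in table.items():
    print(f"{n} nodes: quorum={quorum}, tolerated failures={failures}")
```

An ensemble of 4 tolerates one failure, the same as an ensemble of 3, and 6 tolerates two, the same as 5. The extra even-numbered node adds cost without adding fault tolerance.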