December 01, 2019

Kafka Part 8: Batch Size and linger.ms



What is a Producer Batch and Kafka’s batch size?
  • A producer does not send each message to Kafka on its own. Messages handed to the producer are first collected in a buffer and grouped into a batch. Messages keep being added to the batch until it is full, and then the whole batch is sent to Kafka. Such a group of messages is known as a producer batch.
  • In other words, Kafka producers buffer unsent records for each partition. The size of these buffers is set by batch.size in the producer configuration. Once a buffer is full, its messages are sent.
  • The default batch size is 16 KB (16,384 bytes), and it can be set much higher. The larger the batch size, the better the compression, throughput, and efficiency of producer requests. Larger messages seem to be disproportionately delayed by small batch sizes.
  • A single message should not exceed the batch size; otherwise it cannot be batched together with other messages. Also, a batch is allocated per partition, so do not set batch.size to a very high number, because memory use grows with the number of partitions being written to.
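
To make the effect of batch.size concrete, here is a rough sketch in plain Java (no Kafka client needed). The class BatchSizeSketch and the method batchesNeeded are made up for illustration, and the arithmetic assumes fixed-size records and ignores per-record and per-batch overhead, so treat the numbers as estimates only.

```java
import java.util.Properties;

class BatchSizeSketch {
    // Rough estimate of how many batches one partition needs.
    // Assumption: every record has the same size and headers cost nothing.
    static long batchesNeeded(long recordCount, int recordBytes, int batchSizeBytes) {
        if (recordBytes <= 0 || batchSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (recordBytes >= batchSizeBytes) {
            // A record that leaves no room for others travels alone.
            return recordCount;
        }
        long recordsPerBatch = batchSizeBytes / recordBytes;
        return (recordCount + recordsPerBatch - 1) / recordsPerBatch; // ceiling division
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("batch.size", "32768"); // bytes; the default is 16384

        int batchSize = Integer.parseInt(props.getProperty("batch.size"));
        // 10,000 records of 1 KB each, with the default and the doubled batch size.
        System.out.println(batchesNeeded(10_000, 1_024, 16_384));
        System.out.println(batchesNeeded(10_000, 1_024, batchSize));
    }
}
```

Doubling batch.size roughly halves the number of batches in this estimate, which is where the gain in throughput and request efficiency comes from.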

How can you make batch.size dynamic in Kafka producers?
It's not possible. A producer object needs all of its properties before it sends any message, and you cannot modify them in the middle of a job. To change batch.size, we need to stop the producer, update the property, and start the producer again.
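
A minimal sketch of that restart, assuming the producer settings are kept in a java.util.Properties object. The class ProducerConfigSketch, the helper withBatchSize, and the broker address are hypothetical. The actual Kafka calls appear only as comments so that the snippet runs without the Kafka client on the classpath.

```java
import java.util.Properties;

class ProducerConfigSketch {
    // Returns a copy of the base settings with a different batch.size,
    // leaving the original Properties object untouched.
    static Properties withBatchSize(Properties base, int batchSizeBytes) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty("batch.size", Integer.toString(batchSizeBytes));
        return copy;
    }

    public static void main(String[] args) {
        Properties current = new Properties();
        current.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        current.setProperty("batch.size", "16384");

        Properties next = withBatchSize(current, 65536);

        // With the Kafka client available, the restart would look roughly like:
        //   producer.close();                      // stop the old producer
        //   producer = new KafkaProducer<>(next);  // start a new one with the new size
        // (next would also need key.serializer and value.serializer.)
        System.out.println(current.getProperty("batch.size") + " -> " + next.getProperty("batch.size"));
    }
}
```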

What is linger.ms?
  • Each Kafka topic contains one or more partitions. When a Kafka producer sends a record to a topic, it needs to decide which partition to send it to.
  • When we want to send several records to the same partition at around the same time, they can be sent as a batch. Processing each batch requires a bit of overhead, with each of the records inside the batch contributing to that cost. Records in smaller batches have a higher effective cost per record. Generally, smaller batches lead to more requests and queuing, resulting in higher latency.
  • By default, Kafka tries to send records as soon as possible. This behavior can be changed with two properties: linger.ms and batch.size. linger.ms defines how long the producer waits for more records before sending a batch to Kafka, whereas batch.size defines the maximum size, in bytes, of a batch that can be sent at a time. So, instead of sending a record immediately, the producer waits up to linger.ms so that more records can join the same batch.
  • The default for batch.size is 16,384 bytes, and the default of linger.ms is 0 milliseconds. Once batch.size is reached or at least linger.ms time has passed, the system will send the batch as soon as it is able.
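
The "full or waited long enough" rule above can be sketched in a few lines of plain Java. LingerSketch and readyToSend are made-up names, and this is a simplified model of the rule as described here, not the producer's actual implementation.

```java
class LingerSketch {
    // A batch is ready when it is full OR it has waited at least lingerMs.
    static boolean readyToSend(int batchBytes, int batchSizeBytes, long waitedMs, long lingerMs) {
        boolean full = batchBytes >= batchSizeBytes;
        boolean lingerElapsed = waitedMs >= lingerMs;
        return full || lingerElapsed;
    }

    public static void main(String[] args) {
        int batchSize = 16_384; // default batch.size in bytes

        // linger.ms = 0 (the default): even a tiny batch is ready at once.
        System.out.println(readyToSend(100, batchSize, 0, 0));
        // linger.ms = 20: a small batch that has waited 5 ms keeps waiting.
        System.out.println(readyToSend(100, batchSize, 5, 20));
        // linger.ms = 20: a full batch does not wait for the timer.
        System.out.println(readyToSend(16_384, batchSize, 5, 20));
        // linger.ms = 20: a small batch goes out once 20 ms have passed.
        System.out.println(readyToSend(100, batchSize, 20, 20));
    }
}
```

With linger.ms at 0 the time condition is always met, which is why the default behaves like "send as soon as possible".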

-K Himaanshu Shuklaa..
