{"id":947,"date":"2020-01-24T07:25:40","date_gmt":"2020-01-24T06:25:40","guid":{"rendered":"http:\/\/www.klaushaller.net\/?page_id=947"},"modified":"2020-01-24T10:04:32","modified_gmt":"2020-01-24T09:04:32","slug":"part-3-kafka-throughput-optimization-using-partitions-and-consumer-groups","status":"publish","type":"page","link":"https:\/\/www.klaushaller.net\/?page_id=947","title":{"rendered":"Apache Kafka Tutorial, Part 3"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Part 3: Kafka Throughput Optimization using Partitions and Consumer Groups <\/h1>\n\n\n\n<p>Kafka is known for being able to process a heavy workload of messages and events. One way is to distribute the topics over various Kafka nodes. If you have 70 topics and 10 Kafka nodes, each node might be the leader for seven topics. Thus, Kafka can parallelize handling various topics while running a single Kafka cluster. In this section of the tutorial, we look on how to optimize the processing of the events of a single cluster using partitions and consumer groups.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1019\" src=\"http:\/\/www.klaushaller.net\/wp-content\/uploads\/2020\/01\/Kafka-Tutorial-Klaus-Haller-Part-3-1024x1019.png\" alt=\"\" class=\"wp-image-968\" srcset=\"https:\/\/www.klaushaller.net\/wp-content\/uploads\/2020\/01\/Kafka-Tutorial-Klaus-Haller-Part-3-1024x1019.png 1024w, https:\/\/www.klaushaller.net\/wp-content\/uploads\/2020\/01\/Kafka-Tutorial-Klaus-Haller-Part-3-300x300.png 300w, https:\/\/www.klaushaller.net\/wp-content\/uploads\/2020\/01\/Kafka-Tutorial-Klaus-Haller-Part-3-150x150.png 150w, https:\/\/www.klaushaller.net\/wp-content\/uploads\/2020\/01\/Kafka-Tutorial-Klaus-Haller-Part-3-768x764.png 768w, https:\/\/www.klaushaller.net\/wp-content\/uploads\/2020\/01\/Kafka-Tutorial-Klaus-Haller-Part-3-624x621.png 624w, https:\/\/www.klaushaller.net\/wp-content\/uploads\/2020\/01\/Kafka-Tutorial-Klaus-Haller-Part-3.png 1085w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 3: Kafka platform using partitions, consumer groups, and replica features <\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">3.1 Kafka Partitions<\/h2>\n\n\n\n<p>Beside\ndistributing topics over various Kafka nodes, Kafka can optimize the throughput\nalso by distributing the workload of single topics over various Kafka nodes. This\nconcept is called Kafka <strong>partitions<\/strong>. Each partition is a kind of event\nqueue into which producers can write events and from which consumers can fetch events.\nThus, events of a single topic can be handled, e.g., by 10 or 50 Kafka nodes. However,\nthere is one aspect to be consider. All events of a partition are ordered, but\nthere is no order between events of different partitions. <\/p>\n\n\n\n<p>The number\nof partitions is defined on a per-topic level by setting the parameter during\nthe topic creation. Before submitting the command, you might want to check\nwhether your Zookeeper process and the three Kafka nodes\/brokers are still\nrunning.<\/p>\n\n\n\n<p>The command\nfor the topic creation is the following: <\/p>\n\n\n\n<p><code>kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic myPartitionsChannel<\/code><\/p>\n\n\n\n<p>Now let\u2019s\nhave a look how our platform looks like using the topics describe command:<\/p>\n\n\n\n<p><code>kafka-topics.bat --describe --zookeeper localhost:2181 --topic myPartitionsChannel<\/code><\/p>\n\n\n\n<p>The result\nshould look similar to that that:<\/p>\n\n\n\n<p><code>Topic: myPartitionsChannel&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; PartitionCount: 2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ReplicationFactor: 3&nbsp;&nbsp;&nbsp; Configs:<\/code><\/p>\n\n\n\n<p><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Topic: myPartitionsChannel&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Partition: 0&nbsp;&nbsp;&nbsp; Leader: 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Replicas: 0,2,1 Isr: 0,2,1<\/code><\/p>\n\n\n\n<p><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;Topic: myPartitionsChannel&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Partition: 1&nbsp;&nbsp;&nbsp; Leader: 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Replicas: 1,0,2 Isr: 1,0,2<\/code><\/p>\n\n\n\n<p>We see that\nthe Kafka broker, that is the leader, differs for partition 0 and 1, allowing\nfor workload distribution if the brokers run on different virtual machines.<\/p>\n\n\n\n<p>In a new\ncommand shell, we start a consumer listening at partition 0 for the new topic:<\/p>\n\n\n\n<p><code>kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic myPartitionsChannel --partition 0 --from-beginning<\/code><\/p>\n\n\n\n<p>In another\nnew command shell, we start a consumer listening at partition 1:<\/p>\n\n\n\n<p><code>kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic myPartitionsChannel --partition 1 --from-beginning<\/code><\/p>\n\n\n\n<p>Finally, we\nstart \u2013 in yet another command shell \u2013 a new producer:<\/p>\n\n\n\n<p><code>kafka-console-producer.bat --broker-list localhost:9092 --topic myPartitionsChannel<\/code><\/p>\n\n\n\n<p>If we type\nin \u201cHello1!\u201d and press the return key, the message is displayed by only one of\nthe consumers. What happens if you send more messages such as \u201cHello2!\u201d,\n\u201cHello3!\u201d etc.?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3.2 Consumer Groups<\/h2>\n\n\n\n<p>Another\nconcept, closely related to partitions, are consumer groups. They are an\nefficient way to distribute the work processing the events delivered by the\nKafka infrastructure. Assume an image processing system. An \u201canalyze image\u201d\nservice might need 2 seconds per image. If there are around 1000 images per\nsecond coming from Kafka, one process cannot handle all the load. Here, the\nconcept of Kafka Consumer Groups helps.<\/p>\n\n\n\n<p>Up to now,\nthe rule was that each consumer reads all messages of the topic the consumer\nsubscribed to. If there are two consumers, each consumer reads all message.\nEach message is read twice. If two brokers form a consumer group, this is\ndifferent. A message is read only by one consumer of the group. This is perfect\nfor the image processing example. You start 500 image processing processes,\neach one is Kafka consumer, and all together they form one consumer group. So, every\ntime one of the image processing processes finishes the processing of one\nimage, it contacts the Kafka broker and fetches the next image.<\/p>\n\n\n\n<p>There is just one aspect to be considered: Each consumer of a consumer group can connect to one or more partitions. However, each partition delivers only events to one consumer of a consumer group. Thus, it does not make sense to have more consumers in a consumer group than there are partitions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3.3 Summary: Partitions,\nConsumption Groups, and Replica in Kafka<\/h2>\n\n\n\n<p>To prevent\nany confusion, I want to point out the differences between partitions and\nreplica:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Replica\nhelp if a Kafka broker crashes. They ensure that no data is lost. Also, one of\nthe replica nodes can take over the role as the leader interacting with\nconsumers and producers of topics. Thus, such crashes do not impact consumers\nand providers at all.<\/li><li>Partitions\nare measures for throughput of the Kafka platform. You distribute the events of\na topic over various Kafka so that there is less load on a single machine. This\nhelps if there are too many events to be handled by a single Kafka node.<\/li><li>Consumption\ngroups help when processes and applications outside of Kafka struggle to\nprocess all the events coming from the Kafka platform. Consumption groups ease\ndistributing and parallelizing such work. <\/li><\/ul>\n\n\n\n<p>In the next part of the tutorial (to be released in Februar 2020), we look at security configurations. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part 3: Kafka Throughput Optimization using Partitions and Consumer Groups Kafka is known for being able to process a heavy workload of messages and events. One way is to distribute the topics over various Kafka nodes. If you have 70 topics and 10 Kafka nodes, each node might be the leader for seven topics. Thus, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":940,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-947","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=\/wp\/v2\/pages\/947","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=947"}],"version-history":[{"count":3,"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=\/wp\/v2\/pages\/947\/revisions"}],"predecessor-version":[{"id":969,"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=\/wp\/v2\/pages\/947\/revisions\/969"}],"up":[{"embeddable":true,"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=\/wp\/v2\/pages\/940"}],"wp:attachment":[{"href":"https:\/\/www.klaushaller.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=947"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}