December 01, 2019

Kafka Part 7: Why is ZooKeeper always configured with an odd number of nodes?

Let's understand a few basics:

ZooKeeper is a highly available, highly reliable, and fault-tolerant coordination and consensus service for distributed applications like Apache Storm or Kafka. High availability and reliability are achieved through replication.

The ZooKeeper database is fully replicated across every node of an ensemble (the set of hosts/nodes that make up the cluster).

ZooKeeper always runs in either standalone or quorum (replicated) mode. The minimum number of nodes that must be up and running for a ZooKeeper cluster to keep working is known as the quorum.
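For reference, quorum mode is defined in ZooKeeper's zoo.cfg by listing every member of the ensemble. A minimal sketch, assuming three hypothetical hosts (the timing values are the conventional defaults from the ZooKeeper docs; 2888 and 3888 are the usual peer-connection and leader-election ports):

    # zoo.cfg (sketch; hostnames are hypothetical)
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    # the ensemble: server.<id>=<host>:<peer-port>:<leader-election-port>
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

With no server.* entries, ZooKeeper runs in standalone mode instead.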

Any update request from a client is considered safe (committed) once it has been written to a number of nodes equal to the quorum size. The ZooKeeper cluster will stay up and running even if some nodes fail, as long as the surviving nodes can still form a quorum (i.e., their count is equal to or greater than the quorum size).

How do we decide the safest and optimal size of a quorum? Quorum size = floor(N/2) + 1, where N is the total number of nodes in the ensemble and the division is integer (floor) division.
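As a quick sanity check, here is a minimal Python sketch of the formula (illustrative only, not ZooKeeper code); note the integer division:

    def quorum_size(n: int) -> int:
        # Majority quorum: floor(n/2) + 1, so 5 -> 3 and 6 -> 4.
        return n // 2 + 1

    print(quorum_size(5))  # 3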

Why floor(N/2)+1, and what is the 'split-brain' problem?

Let's say we have a 5-node cluster, with 2 nodes in data centre 'DC1' and 3 nodes in a different data centre 'DC2'. Now suppose the quorum size for this 5-node cluster were set to just 2.

Because of some network failure, the two data centres become unable to communicate with each other. We would then effectively have 2 clusters: one with 2 nodes in DC1 and one with 3 nodes in DC2. Both partitions would be able to form a quorum of 2 nodes, so both would start accepting write requests from clients.

Because of this, there will be data inconsistencies between the servers in the two data centres, as the servers in one data centre cannot propagate updates to the servers in the other. This is the classic 'split-brain' problem, where two or more subsets of a cluster function independently.

When the quorum size is floor(N/2) + 1, i.e., a strict majority, the split-brain problem cannot occur: two disjoint subsets of the cluster can never both contain a majority of the nodes, so at most one partition can achieve consensus and accept writes.

Hence, for the 5-node cluster to form a quorum, the minimum number of nodes required is floor(5/2) + 1 = 3. In case of the network failure described above, the 2 nodes in DC1 cannot form a quorum and therefore cannot accept any write requests, while the 3 nodes in DC2 can.
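The partition scenario can be sketched in a few lines of Python (purely illustrative; the node counts match the DC1/DC2 example above):

    def quorum_size(n: int) -> int:
        return n // 2 + 1

    TOTAL_NODES = 5
    partitions = {"DC1": 2, "DC2": 3}  # reachable nodes on each side

    q = quorum_size(TOTAL_NODES)
    for name, nodes in partitions.items():
        verdict = "can form a quorum" if nodes >= q else "cannot form a quorum"
        print(f"{name}: {nodes} of {TOTAL_NODES} nodes -> {verdict}")

    # DC1: 2 of 5 nodes -> cannot form a quorum
    # DC2: 3 of 5 nodes -> can form a quorum

At most one side of any partition can hold a majority, so at most one side keeps accepting writes.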

Why is an odd number of nodes configured?
Let's say our ZK ensemble has 5 nodes. In this case we need a minimum of 3 nodes for a quorum, i.e., for ZooKeeper to keep serving client requests. So for a 5-node cluster, we can tolerate the failure of up to 2 nodes (5 - 3 = 2).

Now let's say our ZK ensemble has 6 nodes. In this case we need a minimum of floor(6/2) + 1 = 4 nodes for a quorum. So for a 6-node cluster, we can still only tolerate the failure of up to 2 nodes (6 - 4 = 2).
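Running the same formula over a few ensemble sizes makes the odd-vs-even pattern obvious (again, just an illustrative Python sketch):

    def quorum_size(n: int) -> int:
        return n // 2 + 1

    print("nodes  quorum  tolerated failures")
    for n in (3, 4, 5, 6, 7):
        print(f"{n:5d}  {quorum_size(n):6d}  {n - quorum_size(n):18d}")

    # nodes  quorum  tolerated failures
    #     3       2                   1
    #     4       3                   1
    #     5       3                   2
    #     6       4                   2
    #     7       4                   3

Note how 4 nodes tolerate no more failures than 3, and 6 no more than 5.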

This means the extra (6th) node adds no tangible fault-tolerance benefit to the cluster; replicating to that one extra node is just additional overhead, since every write now has to reach a larger quorum (4 acknowledgements instead of 3).

-K Himaanshu Shuklaa..
