July 01, 2019

CAP Theorem

CAP theorem (also known as Brewer's theorem, after Eric Brewer) states that a distributed data store can provide at most two of the following three guarantees: Consistency, Availability and Partition Tolerance.

Consistency
Every read receives the most recent write (or an error), meaning that all nodes in the network see the same data at the same time.

This is different from consistency in the ACID sense, where a system is consistent if a transaction starts with the system in a consistent state and ends with the system in a consistent state.
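The definition of consistency above (every read sees the most recent write) can be checked mechanically. Below is a minimal Python sketch, assuming a single client whose operations are totally ordered; real consistency models such as linearizability must also handle concurrent requests. The function name `reads_see_latest_write` and the history format are hypothetical, for illustration only.

```python
from typing import List, Optional, Tuple

# Each operation is ("write", value) or ("read", observed_value).
# Assumption: operations are totally ordered (no concurrent requests),
# a simplification of the real consistency (linearizability) definition.
Op = Tuple[str, Optional[int]]


def reads_see_latest_write(history: List[Op]) -> bool:
    """Return True if every read observed the most recent preceding write."""
    latest: Optional[int] = None  # nothing has been written yet
    for kind, value in history:
        if kind == "write":
            latest = value
        elif kind == "read" and value != latest:
            return False  # a stale (or invented) value was observed
    return True


# Consistent history: each read returns the last value written.
print(reads_see_latest_write([("write", 1), ("read", 1), ("write", 2), ("read", 2)]))  # True
# Inconsistent history: the second read returns the old value 1.
print(reads_see_latest_write([("write", 1), ("read", 1), ("write", 2), ("read", 1)]))  # False
```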

Availability 
Every non-failing node returns a response to every read and write request in a reasonable amount of time. It is a guarantee that every request receives a response indicating whether it succeeded or failed. However, it does not guarantee that a read request returns the most recent write. The more users a system can serve, the better its availability.

Availability is achieved by replicating the data across different machines, whereas consistency is achieved by updating several nodes before allowing further reads.
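To make the contrast concrete, here is a minimal in-memory sketch of the two replication styles. The `Cluster` class and its methods are hypothetical: `write_sync` updates every replica before returning (prioritizing consistency), while `write_async` updates one replica and defers the rest to `replicate()` (prioritizing availability), which opens a window for stale reads.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """One copy of a key-value store (hypothetical, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, int] = {}


class Cluster:
    def __init__(self, n: int) -> None:
        self.replicas = [Replica(f"node{i}") for i in range(n)]
        # Writes accepted by replica 0 but not yet copied to the others.
        self.pending: List[Tuple[str, int]] = []

    def write_sync(self, key: str, value: int) -> None:
        # Flush earlier async writes first so they cannot overwrite this one later.
        self.replicate()
        # Consistency first: update every replica before acknowledging.
        for r in self.replicas:
            r.data[key] = value

    def write_async(self, key: str, value: int) -> None:
        # Availability first: acknowledge after updating a single replica.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        # Background propagation of pending writes to the remaining replicas.
        for key, value in self.pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self.pending.clear()

    def read(self, replica_index: int, key: str) -> Optional[int]:
        return self.replicas[replica_index].data.get(key)


c = Cluster(3)
c.write_sync("x", 1)
print(c.read(2, "x"))  # 1
c.write_async("x", 2)
print(c.read(0, "x"))  # 2
print(c.read(2, "x"))  # 1: stale until replication runs
c.replicate()
print(c.read(2, "x"))  # 2
```

The synchronous path pays for consistency with latency (and, under a partition, with the ability to complete the write at all), while the asynchronous path answers immediately at the cost of temporarily stale replicas.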

Achieving availability in a distributed system requires that the system remain operational 100% of the time: every client gets a response, regardless of the state of any individual node in the system.
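From the client's side, "every client gets a response" can be approximated by failing over between replicas. This sketch assumes each replica is modeled as a plain function that either returns a value or raises a hypothetical `NodeDown` exception; `read_with_failover` always hands back an explicit success or failure rather than hanging.

```python
from typing import Callable, List, Optional


class NodeDown(Exception):
    """Raised by a (hypothetical) replica that cannot serve the request."""


def read_with_failover(nodes: List[Callable[[str], Optional[int]]], key: str) -> dict:
    """Try each replica in turn and always return a response describing the outcome."""
    for i, node in enumerate(nodes):
        try:
            return {"ok": True, "node": i, "value": node(key)}
        except NodeDown:
            continue  # this replica is unreachable; try the next one
    # Even when every replica is down, the client still gets an explicit failure.
    return {"ok": False, "node": None, "value": None}


def down(key: str) -> Optional[int]:
    raise NodeDown()


def up(key: str) -> Optional[int]:
    return {"x": 42}.get(key)


print(read_with_failover([down, up], "x"))    # {'ok': True, 'node': 1, 'value': 42}
print(read_with_failover([down, down], "x"))  # {'ok': False, 'node': None, 'value': None}
```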

Partition Tolerance
The system continues to operate even if the connections between some of its nodes are down. It is a guarantee that the system keeps working despite arbitrary message loss or the failure of part of the system. In other words, even if there is a network outage in the data centre and some of the computers are unreachable, the system still continues to perform.


A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
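A partition can be simulated by splitting nodes into groups that cannot exchange messages. In this hypothetical `Network` sketch, a write reaches only the nodes in the writer's group, so the cut-off node keeps serving requests but misses the update. Reconciling the replicas after the partition heals is deliberately left out.

```python
from typing import Dict, List, Optional, Set


class Network:
    """Hypothetical network of replicas that can be split into partitions."""

    def __init__(self, names: List[str]) -> None:
        self.stores: Dict[str, Dict[str, int]] = {n: {} for n in names}
        # Initially all nodes form one group, i.e. the network is fully connected.
        self.groups: List[Set[str]] = [set(names)]

    def partition(self, *groups: Set[str]) -> None:
        # Split the network; messages only flow between nodes in the same group.
        self.groups = [set(g) for g in groups]

    def reachable(self, node: str) -> Set[str]:
        for g in self.groups:
            if node in g:
                return g
        return {node}  # a node in no group can only reach itself

    def write(self, node: str, key: str, value: int) -> None:
        # Replicate to every node reachable from the writer; the rest miss the update.
        for peer in self.reachable(node):
            self.stores[peer][key] = value

    def read(self, node: str, key: str) -> Optional[int]:
        return self.stores[node].get(key)


net = Network(["a", "b", "c"])
net.write("a", "x", 1)
net.partition({"a", "b"}, {"c"})
net.write("a", "x", 2)
print(net.read("b", "x"))  # 2
print(net.read("c", "x"))  # 1: node c is still running but missed the write
```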

The CAP theorem sorts systems into three categories:

  • CP (Consistent and Partition Tolerant):- A system that is consistent and partition tolerant but never available? That sounds confusing. In fact, CP refers to a category of systems where availability is sacrificed only in the case of a network partition.
  • CA (Consistent and Available):- CA systems are consistent and available in the absence of any network partition. Single-node DB servers are often categorized as CA systems because they do not need to deal with partition tolerance. The only hole in this reasoning is that single-node DB systems are not networks of shared-data systems and thus do not fall under the purview of CAP.
  • AP (Available and Partition Tolerant):- These are systems that are available and partition tolerant but cannot guarantee consistency.
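The practical difference between CP and AP shows up only while a partition is in effect. The sketch below models a single replica that can reach only some of its peers; the `PartitionedReplica` class, its `mode` field and the majority-quorum rule are assumptions chosen for illustration. A CA single-node server is not modeled, since it has no network to partition.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PartitionedReplica:
    mode: str          # "CP" or "AP" (hypothetical setting)
    cluster_size: int  # total number of replicas in the cluster
    reachable: int     # replicas this one can currently reach, including itself
    data: Dict[str, int] = field(default_factory=dict)

    def has_quorum(self) -> bool:
        # Assumption: a strict majority of replicas must be reachable.
        return self.reachable > self.cluster_size // 2

    def read(self, key: str) -> dict:
        if self.mode == "CP" and not self.has_quorum():
            # Consistency first: refuse rather than risk returning stale data.
            return {"ok": False, "error": "unavailable during partition"}
        # AP: answer from the local copy, which may be stale.
        # Simplification: with a quorum, a CP replica also answers locally here;
        # a real CP system would confirm the latest value with the majority.
        return {"ok": True, "value": self.data.get(key)}


cp = PartitionedReplica("CP", cluster_size=3, reachable=1, data={"x": 1})
ap = PartitionedReplica("AP", cluster_size=3, reachable=1, data={"x": 1})
print(cp.read("x"))  # {'ok': False, 'error': 'unavailable during partition'}
print(ap.read("x"))  # {'ok': True, 'value': 1}
```

Both replicas are cut off from the majority; the CP one gives up availability, while the AP one stays available and accepts that its answer may be out of date.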
