12 minutes read

Nowadays, both big tech companies and small startups create highly-loaded platforms that process a huge amount of simultaneous requests from millions of users around the globe 24/7. One of the ways to build such platforms is to leverage the advantages of distributed systems, that consist of multiple nodes communicating over the network, and provide fault tolerance and high availability. If you have ever worked with databases or microservices, you must have heard about NoSQL storages, Distributed Queues, or Search Engines, which are good examples of such systems.

However, distributed systems have their limitations. In this topic, you will learn a few fundamental limitations formulated in the CAP theorem by Eric Brewer in 2000. Although this theorem could be applied to any kind of distributed system, it is most often referred to in the context of NoSQL databases.

Essentials of the CAP theorem

The CAP theorem establishes a trade-off between consistency (C), availability (A), and partition tolerance (P) in a distributed system. Let's consider all these three characteristics in detail:

  • consistency means that every client request receives the most recent data or an error;
  • availability means that every client request receives a response, but the result may contain stale data;
  • partition tolerance means that the distributed system continues working even when network outages between nodes occure.

The CAP theorem states that a distributed system can have at most two of the three following characteristics— consistency, availability, and partition tolerance—at any given time.

The following picture illustrates the main idea of the CAP theorem with the help of Venn diagrams.

Cap theorem diagram

As you can see, a system can't have all three characteristics at once. Moreover, the CAP theorem separates distributed systems into three classes CA, CP, and AP based on their characteristics. We will review the classes a bit later.

The term "consistency" in the CAP theorem doesn't equal the term "consistency" in the ACID acronym used in relational databases.

The main choice in case of partitioning

To consolidate your understanding of the theorem, we will demonstrate a simple example of a distributed system consisting of two connected nodes. Once a network outage occurs, the system gets separated into two independent parts, and the nodes cannot notify each other about data changes. The main question is whether we should keep accepting requests from clients by both parts or not.

Client node connection

According to the CAP theorem, there are only two possible options for this system:

  • preserve the consistency of data: only one part keeps accepting client requests, while another part becomes unavailable to preserve consistency , so some clients will have errors;
  • preserve the availability of all nodes: both parts keep accepting client requests, but the data may not be consistent because the parts operate independently during the partition.

In the case of network partitioning, "keep consistency, but lose availability" and "keep availability, but lose consistency" is the crucial choice for a distributed system according to the CAP theorem. However, if there is no partitioning, the system can have both characteristics at once—at least in theory, not always in practice.

You need to choose between consistency and availability only when partitioning occurs. Moreover, as soon as the network outage is fixed, the distributed system can get into the consistent and available state again through various approaches. But this case is not a part of the CAP theorem.

Classes of distributed systems

One of the CAP theorem implications is classification of distributed systems according to the characteristics they can provide:

  • CP systems provide consistency and partition tolerance ("consistent design"). Such system always returns the most recent data in responses, but some nodes won't respond if there is network partitioning. An example of a CP system is MongoDB.
  • AP systems provide availability and partition tolerance ("available design"). When a partition occurs, all nodes of an AP system respond to client requests, but the data is not always up-to-date. Once the partition is resolved, the data will eventually become consistent. An example of an AP system is Apache Cassandra.
  • CA systems always return consistent data and all nodes are available when there are no partitions. However, if there is a partition, the system becomes completely unavailable because of the CAP theorem limitations ("pick only two"). Some examples of CA are traditional relational databases such as PostgreSQL or MySQL.

In practice, when developing a large-scale distributed system, you need to choose between CP and AP because using CA means either:

  • the system consists of only a single node (no fault tolerance if the node fails);
  • the system is not partition-tolerant and cannot work at all in the case of network partitioning.

Moreover, many real distributed systems are just P according to the CAP theorem, which means that they are not fully consistent and not 100% available during network partitioning.

Drawbacks of the CAP theorem

Despite being well-structured in theory, the CAP theorem doesn't always describe the real cases well. And this is the reason for frequent criticism of this theorem. Let's consider the most important practical drawbacks of the CAP theorem you should understand and consider in your everyday practice.

  • Degrees of consistency and availability: Most real distributed systems cannot provide even two of the three characteristics. Instead, we may say about some degrees of consistency and availability. For example, it is quite common to have partial availability when a node of a distributed storage cannot update data but can return a result. Moreover, availability is usually not something binary, it can vary from 0 to 100 percent.
  • System configurations: Many real distributed systems provide lots of configurations and additional tools that can change the type of the systems according to the CAP theorem classification. So, when we classify a distributed system, we must consider how a particular design with specific configurations and algorithms reacts to partitions.
  • No latency characteristic: A significant practical drawback of this theorem is that the availability doesn't guarantee that a client receives a response in a timely manner (but eventually it should happen).

The PACELC theorem

To deal with the absence of latency in the CAP theorem, in 2010, Daniel J. Abadi formulated a new theorem named PACELC. This theorem extends the CAP theorem by taking into account the latency (L) characteristic.

The PACELC theorem states that in the case of network partitioning (P), you should choose between availability (A) and consistency (C) (like in the CAP theorem), but else (E), you should choose between latency (L) and consistency (C).

This theorem can be shortly described as the following:

if P then (A or C) else (L or C)

Just like CAP, this theorem also separates distributed systems into different classes. We won't spend a lot of time discussing the PACELC here. To learn more, you may read an article about PACELC by JetBrains and find more information on the Internet.

Conclusion

So far, you have learned a lot about the CAP theorem, which can be applied to distributed systems in general, not only databases. Even if the CAP theorem doesn't cover all possible practical cases, it helps us understand the fundamental trade-off between availability and consistency in the case of network partitions.

One of the implications of the theorem is the classification of distributed systems according to the characteristics they can provide: CP, AP, CA. You don't always need to strictly follow this classification, but it doesn't hurt to understand the idea behind every class and know a few examples.

We also learned the significant practical drawbacks of the CAP theorem and then pointed out an extension of the theorem named PACELC. It adds the latency characteristic to the basic CAP theorem and makes it a bit closer to the practical needs.

9 learners liked this piece of theory. 0 didn't like it. What about you?
Report a typo