CAP Theorem

The CAP theorem states that a distributed system can provide at most two of three guarantees at the same time: Consistency, Availability, and Partition Tolerance.

  1. Consistency (C): Every read returns the most recent write or an error. All nodes see the same data at the same time.
  2. Availability (A): Every request to a working node receives a non-error response, even if other nodes are offline. The response is guaranteed, but it may not reflect the most recent write.
  3. Partition Tolerance (P): The system continues to function despite network partitions, that is, when messages between nodes are lost or delayed so that groups of nodes cannot communicate with each other.

In practice, we can generally achieve only CP (consistent and partition tolerant) or AP (available and partition tolerant).

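Note that consistency in the context of the CAP theorem is quite different from the consistency guaranteed by ACID databases. CAP consistency means every replica returns the latest write, while ACID consistency means a transaction leaves the database in a valid state that satisfies its constraints. Confusing, I know.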

Here's the key insight that makes CAP theorem much simpler to reason about in interviews: In any distributed system, partition tolerance is a must. Network failures will happen, and your system needs to handle them.

This means that in practice, CAP theorem really boils down to a single choice: Do you prioritize consistency or availability when a network partition occurs?

Let's explore what this means through a practical example.

Example

Imagine you're running a website with two servers - one in the USA and one in Europe. When a user updates their public profile (let's say their display name), here's what happens:

  1. User A connects to their closest server (USA) and updates their name
  2. This update is replicated to the server in Europe
  3. When User B in Europe views User A's profile, they see the updated name
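The three steps above can be sketched in a few lines of Python. This is a minimal in-memory model, not a real replication protocol: the `Server` class, its `peers` list, and the user ID and name are all hypothetical, and replication is shown as a direct copy for simplicity.

```python
class Server:
    """One replica holding user profiles in memory (hypothetical model)."""

    def __init__(self, region):
        self.region = region
        self.profiles = {}   # user_id -> display name
        self.peers = []      # other servers that receive our updates

    def update_name(self, user_id, name):
        # Step 1: the write lands on the server closest to the user.
        self.profiles[user_id] = name
        # Step 2: the update is replicated to every peer.
        for peer in self.peers:
            peer.profiles[user_id] = name

    def get_name(self, user_id):
        # Step 3: any server can answer the read from its local copy.
        return self.profiles.get(user_id)


usa = Server("USA")
europe = Server("Europe")
usa.peers.append(europe)
europe.peers.append(usa)

usa.update_name("user_a", "Alice Smith")
print(europe.get_name("user_a"))  # Alice Smith
```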

Everything works smoothly until we encounter a network partition - the connection between our USA and Europe servers goes down. Now we have a critical decision to make:

When User B tries to view User A's profile, should we:

  • Option A: Return an error because we can't guarantee the data is up-to-date (choosing consistency)
  • Option B: Show potentially stale data (choosing availability)

This is where CAP theorem becomes practical - we must choose between consistency and availability.

In this case, the answer is rather clear: we would rather show a user in Europe the old name of User A than show an error. Seeing a stale name is better than seeing no name at all.
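The two options can be made concrete with a small sketch. It assumes a hypothetical `Replica` class with a `mode` setting ("CP" or "AP") and a `partitioned` flag; real systems detect partitions through timeouts and failed heartbeats rather than a boolean.

```python
class PartitionError(Exception):
    """Raised by a consistency-first replica that cannot confirm freshness."""


class Replica:
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP" (hypothetical setting)
        self.profiles = {}
        self.partitioned = False    # True when the peer is unreachable

    def get_name(self, user_id):
        if self.partitioned and self.mode == "CP":
            # Option A: refuse to answer rather than risk stale data.
            raise PartitionError("cannot confirm latest value")
        # Option B (or no partition): answer from the local copy.
        return self.profiles.get(user_id)


def read_during_partition(mode):
    europe = Replica(mode)
    europe.profiles["user_a"] = "Old Name"   # replicated before the partition
    europe.partitioned = True                # link to the USA server goes down
    # User A renames themselves in the USA; Europe never hears about it.
    try:
        return europe.get_name("user_a")
    except PartitionError as exc:
        return f"error: {exc}"


print(read_during_partition("AP"))  # Old Name
print(read_during_partition("CP"))  # error: cannot confirm latest value
```

For a profile page, the "AP" behavior is the one argued for above: the reader gets the old name instead of an error.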

When to Choose Consistency

Some systems absolutely require consistency, even at the cost of availability:

  1. Ticket Booking Systems: Imagine if User A booked seat 6A on a flight, but due to a network partition, User B sees the seat as available and books it too. You'd have two people showing up for the same seat!
  2. E-commerce Inventory: If Amazon has one toothbrush left and the system shows it as available to multiple users during a network partition, they could oversell their inventory.
  3. Financial Systems: Stock trading platforms need to show accurate, up-to-date order books. Showing stale data could lead to trades at incorrect prices.
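As a rough illustration of the ticket booking case, the sketch below routes every booking through a single source of truth and rejects requests that cannot reach it. `SeatPrimary`, `BookingFrontend`, and the `can_reach_primary` flag are hypothetical names chosen for this example.

```python
class Unavailable(Exception):
    """The booking cannot be confirmed against the source of truth."""


class SeatPrimary:
    """Hypothetical single source of truth for seat assignments."""

    def __init__(self):
        self.seats = {}  # seat -> passenger

    def book(self, seat, passenger):
        if seat in self.seats:
            return False          # already taken
        self.seats[seat] = passenger
        return True


class BookingFrontend:
    """A regional server that forwards every booking to the primary."""

    def __init__(self, primary):
        self.primary = primary
        self.can_reach_primary = True

    def book(self, seat, passenger):
        if not self.can_reach_primary:
            # Consistency first: fail the request instead of guessing.
            raise Unavailable("try again later")
        return self.primary.book(seat, passenger)


primary = SeatPrimary()
usa = BookingFrontend(primary)
europe = BookingFrontend(primary)

print(usa.book("6A", "User A"))        # True
europe.can_reach_primary = False       # network partition
try:
    europe.book("6A", "User B")
except Unavailable as exc:
    print("rejected:", exc)            # rejected: try again later
print(primary.seats)                   # {'6A': 'User A'}
```

User B gets an error during the partition, which is the cost of availability, but seat 6A is never sold twice.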

When to Choose Availability

The majority of systems can tolerate some inconsistency and should prioritize availability. In these cases, eventual consistency is fine: the system will eventually become consistent, but it may take a few seconds or minutes.

  1. Social Media: If User A updates their profile picture, it's perfectly fine if User B sees the old picture for a few minutes.
  2. Content Platforms (like Netflix): If someone updates a movie description, showing the old description temporarily to some users isn't catastrophic.
  3. Review Sites (like Yelp): If a restaurant updates their hours, showing slightly outdated information briefly is better than showing no information at all.
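Eventual consistency can be sketched as replicas that accept reads while out of date and converge once they can exchange updates again. The example below assumes a last-write-wins rule based on a version number, which is only one of several possible conflict-resolution strategies; the `Replica` class and `sync` function are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep only the newest version (last-write-wins).
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries in both directions once the replicas can talk again."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, value, version)


usa, europe = Replica(), Replica()
usa.write("picture", "old.png", version=1)
sync(usa, europe)

# Partition: the new picture reaches only the USA replica.
usa.write("picture", "new.png", version=2)
print(europe.read("picture"))  # old.png

# Partition heals.
sync(usa, europe)
print(europe.read("picture"))  # new.png
```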

The key question to ask yourself is: "Would it be catastrophic if users briefly saw inconsistent data?" If the answer is yes, choose consistency. If not, choose availability.

CAP Theorem in System Design Interviews

Understanding CAP theorem matters because it should be one of the first things you discuss in a system design interview. The choice has a meaningful impact on how you design your system.

In a system design interview, you typically begin by:

  1. Aligning on functional requirements (features)
  2. Defining non-functional requirements (system qualities)

When discussing non-functional requirements, CAP theorem should be your starting point. You need to ask the all-important question: "Does this system need to prioritize consistency or availability?"

If you prioritize consistency, your design might include:

  • Distributed Transactions: Ensuring multiple data stores (like cache and database) remain in sync through two-phase commit protocols. This keeps the stores consistent with each other, but it adds complexity, and users will likely experience higher latency because each write waits for every participant to agree.
  • Single-Node Solutions: Using a single database instance to avoid propagation issues entirely. While this limits scalability, it eliminates consistency challenges by having a single source of truth.
  • Technology Choices:
      • Traditional RDBMSs (PostgreSQL, MySQL)
      • Google Spanner
      • DynamoDB (in strong consistency mode)
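To show the shape of a two-phase commit, here is a simplified in-memory sketch. The `Participant` class and its `prepare`, `commit`, and `abort` methods are hypothetical stand-ins for real data stores, and the sketch leaves out what makes production implementations hard, such as durable logging, timeouts, and coordinator recovery.

```python
class Participant:
    """A data store taking part in the transaction (hypothetical interface)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.committed = {}
        self.staged = None

    def prepare(self, key, value):
        # Phase 1: stage the write and vote yes only if we can commit it.
        if not self.healthy:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        key, value = self.staged
        self.committed[key] = value
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(participants, key, value):
    # Phase 1: ask every participant to prepare.
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        # Phase 2: everyone voted yes, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the whole transaction.
    for p in participants:
        p.abort()
    return False


database = Participant("database")
cache = Participant("cache")
print(two_phase_commit([database, cache], "seat:6A", "User A"))  # True

cache.healthy = False
print(two_phase_commit([database, cache], "seat:6B", "User B"))  # False
print(database.committed)  # {'seat:6A': 'User A'}
```

When the cache cannot prepare, the write is rejected everywhere. That is the availability cost of keeping both stores in sync.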

On the other hand, if you prioritize availability, your design can include:

  • Multiple Replicas: Scaling to additional read replicas with asynchronous replication, allowing reads to be served from any replica even if it's slightly behind. This improves read performance and availability at the cost of potential staleness.
  • Change Data Capture (CDC): Using CDC to track changes in the primary database and propagate them asynchronously to replicas, caches, and other systems. This allows the primary system to remain available while updates flow through the system eventually.
  • Technology Choices:
      • Cassandra
      • DynamoDB (with its default eventually consistent reads)
      • Redis clusters
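The asynchronous replication and CDC ideas above can be sketched as a primary that appends every change to an ordered log and a follower that applies the log at its own pace. This is an in-memory illustration only: real CDC pipelines read the database's own transaction log, and the `Primary`, `Follower`, and `catch_up` names are hypothetical.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.change_log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        # The write succeeds immediately; followers catch up later.
        self.data[key] = value
        self.change_log.append((key, value))


class Follower:
    """A replica or cache that consumes the change log at its own pace."""

    def __init__(self):
        self.data = {}
        self.offset = 0  # index of the next change to apply

    def catch_up(self, primary):
        for key, value in primary.change_log[self.offset:]:
            self.data[key] = value
        self.offset = len(primary.change_log)


primary, replica = Primary(), Follower()
primary.write("description", "v1")
replica.catch_up(primary)

primary.write("description", "v2")
print(replica.data["description"])  # v1

replica.catch_up(primary)
print(replica.data["description"])  # v2
```

Between the second write and the second `catch_up`, the replica serves stale data but stays available, which is the trade-off this design accepts.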

Conclusion

CAP theorem is important. It sets the stage for how you approach your design in an interview and should not be overlooked.

But it doesn't need to be complicated. Just ask yourself: "Does every read need to return the most recent write?" If the answer is yes, you need to prioritize consistency. If the answer is no, you can prioritize availability.

Reference

https://app.excalidraw.com/l/56zGeHiLyKZ/8ntWRaa0Q6K

https://www.youtube.com/watch?v=VdrEq0cODu4