Timilearning

MIT 6.824: Lecture 9 - CRAQ

Many distributed systems today sacrifice stronger consistency guarantees for the sake of greater availability and higher throughput. CRAQ, which stands for Chain Replication with Apportioned Queries, is a system designed to challenge this tradeoff. CRAQ's approach differs from existing replication techniques we have seen so far, like in Raft. It improves on the original form of Chain Replication.

CRAQ is a distributed object-storage system that maintains strong consistency while still providing a high read throughput. Object-storage systems are better suited for applications that need flat namespaces, such as a key-value store. These are unlike file-based systems, which store data in a hierarchical directory structure.

This post will start by describing Chain Replication, before presenting how CRAQ improves on it.

Table of Contents

Chain Replication

Chain Replication is an approach to replicating data across multiple nodes that involves arranging the nodes in a chain of a defined length C. All the write operations from clients go to the head of the chain, which passes them down to the next node in the chain. When a node receives a write operation, it applies the write and passes it down to the next node in the chain until the write reaches the tail node.

The tail node handles all read operations. This is because a write will not reach the tail until all the other replicas in the chain have applied it. The write is marked as committed when it reaches the tail. Therefore, a read will only return committed values.

Figure 1: All reads in Chain Replication must be handled by the tail node, while all writes propagate down the chain from the head.

Figure 1 illustrates a sample chain of length four. As shown by the dashed lines in the figure, the tail sends an acknowledgment back to the head when it commits a write.

Chain replication achieves strong consistency

Since all reads go to the tail and the tail stores all the committed write operations, the tail can apply a total ordering over all the operations. There is no possibility of a client seeing stale data. Concurrent reads to the tail will always return the most up-to-date value.

Write operations are cheaper

Another advantage of chain replication is that the cost of writes is spread equally over all nodes. Unlike in primary/backup replication where the primary node transmits data to all its backups, each node in chain replication transmits only to its successor in the chain. The simulation results by the paper's authors showed that chain replication achieved competitive or superior write throughput when compared with primary/backup replication.

The tail is a bottleneck

The major downside of chain replication is that since all the reads must go to the tail, read throughput cannot scale linearly with chain size. This is a trade-off that this approach makes to guarantee strong consistency. If clients can read from intermediate nodes, there is a possibility of concurrent reads for the same value to different nodes seeing different writes as they are being passed down the chain.

CRAQ, which we will discuss next, helps to address this downside.

CRAQ

CRAQ (Chain Replication with Apportioned Queries) is a modification of chain replication that increases the read throughput by allowing any node in the chain to handle read requests, while still guaranteeing strong consistency. CRAQ works as follows:

Figure 2: Reads to clean objects in CRAQ can be completely handled by any node in the system.

In Figure 2, we see a CRAQ chain in the starting clean state. All the nodes will return the same value for any read request since they store an identical copy of the object. The nodes will remain in a clean state until they receive a write operation.

Figure 3: Reads to dirty objects in CRAQ can be received by any node, but require small version requests (dotted blue line) to the chain tail to properly serialize operations.

Figure 3 illustrates a dirty read situation where the successor node to the head makes a version query to the tail for its latest version number. The write request received at the head is still in propagation when the dirty read for key K comes in, which is why the node has multiple versions of the object.
The node then makes a version query to the tail node which returns V1 since that it's latest committed value. As a result, the dirty node returns the object value associated with the version number it gets from the tail. If a clean replica had received the read request, it would have returned its value immediately with no version query.

CRAQ offers throughput improvements over the standard chain replication in two different scenarios:

The performance evaluation of CRAQ described in the paper showed that its read throughput is higher than in chain replication under these workloads.

Consistency models on CRAQ

CRAQ supports the varying needs of applications by providing three consistency models:

CRAQ needs a configuration manager

Unlike Raft, the CRAQ protocol cannot prevent a split-brain by itself. It is only concerned with data replication and does not handle things like leader (or head) election in the event of partitions. To address this, CRAQ is usually coupled with a configuration manager like ZooKeeper to deal with managing the nodes that make up a chain and handling leader election for the chain.

To recover from failure, each node in a chain keeps track of its predecessor and successor, as well as the chain head and tail. When a head fails, its immediate successor becomes the new head. Similarly, the tail's immediate predecessor takes over when the tail fails. Intermediate nodes can also be replaced by adding a new node between two nodes like in a doubly-linked list.

The configuration manager manages the nodes that make up a chain and choosing the chain's head and tail.

One slow node can weaken the chain

The major downside to CRAQ (and the standard chain replication) is that it requires all the nodes in the chain to take part before any write can commit. This is unlike quorum systems like Raft and ZooKeeper that only need a majority of the nodes to participate.

The upshot of this is that CRAQ by itself is less fault-tolerant than Raft and ZooKeeper as the system's throughput can be severely affected by at least one slow node.

Conclusion

CRAQ is a straightforward approach to replication with minimal chit-chat compared to a system like Raft, with its downside being that it's not very fault-tolerant as it needs all the nodes in a chain to respond to writes. It will be interesting to explore a quorum-based approach to chain replication.

Further Reading

mit-6.824 distributed-systems learning-diary

To get notified when I write something new, you can subscribe to the RSS feed.

A small favour

Did you find anything I wrote confusing, outdated, or incorrect? Or do you have an answer to a question I posed here? Please let me know! Just write a few words below and I'll be sure to amend this post with your suggestions.

← Home