-
MIT 6.824: Lecture 14 - Optimistic Concurrency Control
·
9 min read
This lecture on optimistic concurrency control is based on a system called FaRM. FaRM is a main-memory distributed computing platform that provides distributed transactions with strict serializability, high performance, durability, and high availability by taking advantage of two hardware trends: fast commodity networks with RDMA and inexpensive non-volatile DRAM. Here, I explain how FaRM uses them to run simple transactions faster, and with far greater throughput, than Spanner.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 13 - Spanner
·
10 min read
Spanner is a rare example of a distributed database that supports externally consistent distributed transactions. Many other databases either choose not to implement distributed transactions at all, or opt for weaker consistency models because of the performance cost involved. In this post, we'll learn how Google's TrueTime API enables Spanner to provide this guarantee with good performance.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 12 - Distributed Transactions
·
5 min read
Distributed databases typically divide their tables into partitions spread across different servers, which are accessed by many clients. In these databases, client transactions often span several servers, as a transaction may need to read from various partitions. A distributed transaction is a database transaction that spans multiple servers. This post will detail how databases guarantee some ACID properties when executing distributed transactions.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 11 - Cache Consistency, Frangipani
·
6 min read
The ideal distributed file system would guarantee that all its users have coherent access to a shared set of files and be easily scalable. It would also be fault-tolerant and require minimal human administration. This post will cover how Frangipani approximates this ideal, with a focus on how it provides a consistent view of shared files while maintaining a cache for each user.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 10 - Cloud Replicated DB, Aurora
·
6 min read
Amazon Aurora is a distributed database service provided by AWS. The paper describes the considerations in building a database for the cloud and details how Aurora's architecture differs from many traditional databases today. This post will explain how traditional databases work and then highlight how Aurora provides great performance through quorum writes and by building a database around the log.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 9 - CRAQ
·
6 min read
Many distributed systems today sacrifice stronger consistency guarantees for the sake of greater availability and higher throughput. CRAQ, which stands for Chain Replication with Apportioned Queries, is a system designed to challenge this trade-off. Its approach differs from the replication techniques we have seen so far, such as the one used in Raft, and it improves on the original form of Chain Replication. This post will start by presenting the Chain Replication approach before describing how CRAQ improves on it.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 8 - ZooKeeper
·
8 min read
Can the coordination of distributed systems be handled by a stand-alone general-purpose service? If so, what should the API of that service look like? In addition, can we improve the performance of a system N times by adding N times as many replica servers? This post will focus on answering these questions using the ZooKeeper system as a case study.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lectures 6 & 7 - Fault Tolerance (Raft)
·
14 min read
One common pattern in the systems we have discussed so far, such as MapReduce, GFS, and VMware FT, is that they all rely on a single entity to make the key decisions. While this makes it easier for the system to reach a decision, the downside is that the entity becomes a single point of failure. In this post, we'll learn how the Raft consensus algorithm solves this problem.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 5 - Go, Threads, and Raft
·
5 min read
This post contains examples of good and bad Go code, using them to show common mistakes made when starting to build concurrent programs and how to correct them. It covers goroutines, mutexes, condition variables, and channels.
mit-6.824 distributed-systems learning-diary -
MIT 6.824: Lecture 4 - Primary/Backup Replication
·
10 min read
Replication is one way to make applications more fault tolerant. Using the VMware FT system as a case study, we'll discuss the different ways in which replication can be implemented, the challenges associated with each approach, and some acceptable tradeoffs that can be made when implementing replication in a system.
mit-6.824 distributed-systems learning-diary