MIT 6.824 Lec3: GFS

1. Distributed Storage System

1.1 Fault-Tolerant Storage Systems Are Hard to Build

  1. high performance $\Rightarrow$ shard data across servers
  2. many servers $\Rightarrow$ constant failures
  3. fault tolerance $\Rightarrow$ by replication
  4. replication $\Rightarrow$ potential inconsistencies
  5. strong consistency $\Rightarrow$ more interactions between servers $\Rightarrow$ low performance

The result is a constant tension between consistency and performance.

1.2 Ideal Consistency

Ideal consistency: the system behaves as if it were a single server.
The two main challenges are concurrency and failures.
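
To illustrate why concurrency alone can break the single-server illusion, here is a minimal sketch. The `Replica` class and `deliver` helper are hypothetical names used only for illustration, not GFS code. It assumes two clients write the same key at about the same time, and that no protocol orders the writes, so the two replicas receive them in different orders.

```python
class Replica:
    """A toy key-value replica that applies writes in arrival order."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def deliver(replica, writes):
    """Apply a sequence of (key, value) writes in the given order."""
    for key, value in writes:
        replica.put(key, value)


# Client C1 writes x=1 and client C2 writes x=2 concurrently.
# Nothing orders the two writes, so each replica may see a different order.
s1, s2 = Replica(), Replica()
deliver(s1, [("x", 1), ("x", 2)])  # S1 receives C1's write first
deliver(s2, [("x", 2), ("x", 1)])  # S2 receives C2's write first

# A later reader gets a different answer depending on which replica it asks.
print(s1.get("x"), s2.get("x"))
```

A single server would have returned the same value to every reader, so the replicated pair no longer behaves like one system.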

A protocol is needed to strike a balance between consistency and fault tolerance.

2. GFS

GFS was a successful system and became an inspiration for later well-known distributed storage systems.

It had two features that were uncommon at the time:

  1. A single Master
  2. Tolerated inconsistencies (weak consistency)

2.1 Master

The master is the central coordinator. It holds the following state, most of it in memory so that it can respond quickly to many clients:

  1. file name $\rightarrow$ array of chunk handles
  2. chunk handle $\rightarrow$ version number, list of chunk servers (for replication)
    chunk servers: 3 replicas in total, 1 primary and the others secondaries.
    primary: holds a lease with an expiration time.
  3. log and checkpoints: both are kept in stable storage.
    GFS follows the rule that the master does not respond to a client until the operation has been recorded in the log. So when the master crashes, the correct state can be reconstructed by replaying the log (more precisely, by replaying all operations in the log after the last checkpoint).
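
The master state and the log-before-reply rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual GFS implementation: the `Master` and `ChunkInfo` classes, the JSON log format, and the operation names `add_chunk` and `bump_version` are all hypothetical. Checkpoints are omitted, so recovery replays the whole log, which stands in for "everything after the last checkpoint". Replica locations and lease fields are kept only in memory here and are not logged.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChunkInfo:
    version: int = 1
    servers: list = field(default_factory=list)  # replica locations (in memory only)
    primary: Optional[str] = None                # current lease holder, if any
    lease_expiry: float = 0.0                    # when the primary's lease ends


class Master:
    def __init__(self, log_path):
        self.log_path = log_path
        self.files = {}   # file name -> list of chunk handles
        self.chunks = {}  # chunk handle -> ChunkInfo
        self._recover()

    def _apply(self, op):
        """Apply one logged operation to the in-memory tables."""
        if op["type"] == "add_chunk":
            self.files.setdefault(op["file"], []).append(op["handle"])
            self.chunks[op["handle"]] = ChunkInfo()
        elif op["type"] == "bump_version":
            self.chunks[op["handle"]].version += 1

    def _recover(self):
        """Rebuild state by replaying the log (no checkpoint in this sketch)."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                self._apply(json.loads(line))

    def execute(self, op):
        # Rule from the notes: record the operation in stable storage
        # first, and only then apply it and reply to the client.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op)
        return "ok"


log_path = os.path.join(tempfile.mkdtemp(), "master.log")
m = Master(log_path)
m.execute({"type": "add_chunk", "file": "/a", "handle": "h1"})
m.execute({"type": "bump_version", "handle": "h1"})

# Simulate a crash: discard the old object and build a new master from the log.
restarted = Master(log_path)
print(restarted.files, restarted.chunks["h1"].version)
```

Because every acknowledged operation is already in the log, the restarted master ends up with the same file table and version numbers that clients were told about. A real system would periodically write a checkpoint so that replay starts from there instead of from the beginning of the log.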


Author: Zylll

Published: 2025-11-18

Updated: 2025-11-18
