
GFS Paper Notes

The Google File System

Key differences from traditional distributed file systems

  • component failures are the norm rather than the exception
  • files are huge (multi-GB files are common)
  • most files are mutated by appending new data rather than overwriting existing data
  • co-designing the API with applications increases flexibility (relaxed consistency, atomic record append)

Assumptions for the design overview

  • components often fail (inexpensive commodity hardware)
  • files are large (multi-GB)
  • workloads
    1. large streaming reads
    2. small random reads
    3. large, sequential appends (writes)
    4. small writes at arbitrary offsets are rare
  • well-defined semantics for multiple clients appending to the same file concurrently
  • sustained high bandwidth matters more than low latency

API

  • usual operations: create, delete, open, close, read, write
  • snapshot
    1. copy-on-write (COW); see the sketch after this list
  • record append (atomic append for concurrent writers)
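
The snapshot operation duplicates a file or directory tree almost instantly: the master revokes outstanding leases on the affected chunks, logs the operation, and duplicates only the metadata, so source and snapshot initially share the same chunks; a chunk is physically copied only when a client later writes to it and its reference count is greater than one. A minimal copy-on-write sketch of that reference-counting idea (the types and names here are illustrative, not from the paper):

```go
package main

import "fmt"

// ChunkHandle identifies a chunk; the master tracks how many files reference it.
type ChunkHandle uint64

type Master struct {
	files    map[string][]ChunkHandle // file path -> ordered chunk handles
	refCount map[ChunkHandle]int      // shared-chunk reference counts
	nextID   ChunkHandle
}

// Snapshot duplicates metadata only: both paths now point at the same chunks.
func (m *Master) Snapshot(src, dst string) {
	chunks := append([]ChunkHandle(nil), m.files[src]...)
	m.files[dst] = chunks
	for _, c := range chunks {
		m.refCount[c]++
	}
}

// WriteIntent runs before a mutation; a chunk shared with a snapshot is cloned first.
func (m *Master) WriteIntent(path string, idx int) ChunkHandle {
	c := m.files[path][idx]
	if m.refCount[c] > 1 { // copy-on-write trigger
		m.refCount[c]--
		m.nextID++
		clone := m.nextID // in GFS the chunkservers copy the bytes locally
		m.refCount[clone] = 1
		m.files[path][idx] = clone
		return clone
	}
	return c
}

func main() {
	m := &Master{
		files:    map[string][]ChunkHandle{"/a": {1, 2}},
		refCount: map[ChunkHandle]int{1: 1, 2: 1},
		nextID:   2,
	}
	m.Snapshot("/a", "/a.snap")
	fmt.Println(m.WriteIntent("/a", 0)) // clones chunk 1 into a new handle: prints 3
}
```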

Architecture

  • single master, multiple chunkservers accessed by multiple clients
  • files are divided into fixed-size chunks, each identified by an immutable, globally unique 64-bit chunk handle; chunks are replicated (three replicas by default)
    1. Chunk size: 64MB
  • master maintains all file system metadata
    1. namespace
    2. access control information
    3. mapping from files to chunks
    4. current location of chunks
    5. chunk lease management
    6. garbage collection of orphaned chunks
    7. chunk migration
    8. heartbeat messages to chunkservers (give instructions, collect state)
  • not a POSIX API
  • the master only serves metadata; all data-bearing communication goes directly to the chunkservers
  • neither clients nor chunkservers cache file data (clients do cache metadata)
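
To read, a client translates (file name, byte offset) into (file name, chunk index) using the fixed chunk size, asks the master for the chunk handle and replica locations, caches the reply, and then fetches the data directly from a chunkserver. A rough sketch of that client-side flow with the RPCs stubbed out (function and field names are my own):

```go
package main

import "fmt"

const chunkSize = 64 << 20 // fixed 64 MB chunk size

// chunkIndex converts a byte offset into a chunk index plus an offset within that chunk.
func chunkIndex(offset int64) (idx, within int64) {
	return offset / chunkSize, offset % chunkSize
}

// What the master returns (and the client caches) for one chunk of a file.
type chunkInfo struct {
	handle   uint64   // globally unique chunk handle
	replicas []string // chunkserver addresses holding the chunk
}

// askMaster stands in for the metadata RPC to the master.
func askMaster(file string, idx int64) chunkInfo {
	return chunkInfo{handle: 42, replicas: []string{"cs1:7070", "cs2:7070", "cs3:7070"}}
}

func main() {
	idx, within := chunkIndex(200 << 20) // a read starting at byte 200 MB
	info := askMaster("/logs/web.0", idx)
	// The bytes themselves are then read directly from one of the replicas,
	// typically the closest chunkserver; the master never touches file data.
	fmt.Printf("chunk %d (handle %d), offset %d within chunk, replicas %v\n",
		idx, info.handle, within, info.replicas)
}
```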

(Figure: GFS architecture)

Metadata in master

  • Three major types
    1. namespace (persisted, replicated)
    2. mapping from files to chunks (persisted, replicated)
    3. locations of each chunk’s replicas (not persisted; asked from the chunkservers)
  • All in memory
  • chunk location
    1. polled from chunkservers at master startup and kept current via heartbeats
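
A sketch of what the master's in-memory tables might look like, assuming the split above: namespace and file-to-chunk mapping are persisted via the operation log, while replica locations are rebuilt by asking chunkservers (field names are illustrative):

```go
package main

import "fmt"

type ChunkHandle uint64

type masterState struct {
	// Persisted via the operation log and replicated:
	namespace  map[string]bool          // file and directory names
	fileChunks map[string][]ChunkHandle // file -> ordered chunk handles
	// Not persisted; rebuilt by polling chunkservers at startup and via heartbeats:
	locations map[ChunkHandle][]string // chunk -> chunkserver addresses
}

// reportChunks models a chunkserver telling the master which chunks it holds.
func (m *masterState) reportChunks(server string, held []ChunkHandle) {
	for _, h := range held {
		m.locations[h] = append(m.locations[h], server)
	}
}

func main() {
	m := &masterState{
		namespace:  map[string]bool{"/logs": true, "/logs/web.0": true},
		fileChunks: map[string][]ChunkHandle{"/logs/web.0": {7, 8}},
		locations:  map[ChunkHandle][]string{},
	}
	m.reportChunks("cs1:7070", []ChunkHandle{7, 8})
	fmt.Println(m.locations[7]) // [cs1:7070]
}
```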

Operation log
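
The operation log is the persistent record of critical metadata changes and also defines the logical order of concurrent operations; the master flushes a log record to local disk and to remote replicas before replying to the client, recovers by replaying the log, and keeps replay short by periodically checkpointing its state. A minimal sketch of that append/checkpoint/replay cycle (the structure is illustrative, not the paper's format):

```go
package main

import "fmt"

// One logged metadata mutation, e.g. "create /logs/web.0".
type logRecord struct {
	seq int64 // position in the log defines the logical order of operations
	op  string
}

type opLog struct {
	records       []logRecord // stands in for the local and remote durable copies
	checkpointSeq int64       // last checkpointed state; replay starts after it
}

// append durably records a mutation before the master replies to the client
// (real GFS flushes to local disk and to remote log replicas at this point).
func (l *opLog) append(op string) {
	l.records = append(l.records, logRecord{seq: int64(len(l.records)) + 1, op: op})
}

// recover replays only the records newer than the latest checkpoint.
func (l *opLog) recover(apply func(logRecord)) {
	for _, r := range l.records {
		if r.seq > l.checkpointSeq {
			apply(r)
		}
	}
}

func main() {
	l := &opLog{}
	l.append("create /logs/web.0")
	l.checkpointSeq = 1 // a checkpoint captured the state after record 1
	l.append("create /logs/web.1")
	l.recover(func(r logRecord) { fmt.Println("replay:", r.op) }) // replays record 2 only
}
```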

Consistency Model

  • namespace mutations (e.g. file creation) are atomic: the master serializes them, and the operation log defines a global order
  • data mutations have relaxed consistency: a file region can be defined, consistent but undefined, or inconsistent, depending on the mutation type, success, and concurrency

(Figure: GFS consistency model, file region state after mutation)

Implications for the application level

  • write a file from beginning to end, then atomically rename it to its permanent name
  • periodically checkpoint how much has been successfully written
  • use per-record checksums and unique identifiers so readers can skip padding and drop rare duplicates (sketched below)
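
A small sketch of the last point: writers seal each record with a checksum and a unique ID, and readers keep only records that verify and that they have not seen before (the record layout here is my own, not specified by the paper):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// A self-validating, self-identifying application record.
type record struct {
	id       uint64 // unique ID lets readers drop duplicates from retried appends
	payload  []byte
	checksum uint32 // lets readers skip padding and partial garbage
}

func seal(id uint64, payload []byte) record {
	return record{id: id, payload: payload, checksum: crc32.ChecksumIEEE(payload)}
}

// filter keeps records that verify and that have not been seen before.
func filter(recs []record) []record {
	seen := map[uint64]bool{}
	var out []record
	for _, r := range recs {
		if crc32.ChecksumIEEE(r.payload) != r.checksum || seen[r.id] {
			continue // padding/garbage, or a duplicate from an append retry
		}
		seen[r.id] = true
		out = append(out, r)
	}
	return out
}

func main() {
	a := seal(1, []byte("event-a"))
	dup := a                                                   // duplicate from a retried record append
	junk := record{id: 2, payload: []byte("pad"), checksum: 0} // fails verification
	fmt.Println(len(filter([]record{a, dup, junk})))           // 1
}
```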

Leases

  • the master grants a chunk lease to one of the replicas, the primary; the primary picks a serial order for all mutations to that chunk, and the other replicas apply the same order
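
A sketch of the lease bookkeeping the master might keep, assuming the paper's 60-second initial timeout with extensions piggybacked on heartbeats (names and structure are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

const leaseDuration = 60 * time.Second // paper: initial chunk lease timeout of 60 s

type lease struct {
	primary string    // the replica chosen to order all mutations for this chunk
	expires time.Time // extended via heartbeat while mutations keep coming
}

type leaseTable struct {
	leases map[uint64]lease // chunk handle -> current lease
}

// grantOrRenew returns the current primary for a chunk, granting a new lease
// to the candidate replica if no lease is live.
func (t *leaseTable) grantOrRenew(chunk uint64, candidate string, now time.Time) string {
	if l, ok := t.leases[chunk]; ok && now.Before(l.expires) {
		l.expires = now.Add(leaseDuration) // extension piggybacks on heartbeats
		t.leases[chunk] = l
		return l.primary
	}
	t.leases[chunk] = lease{primary: candidate, expires: now.Add(leaseDuration)}
	return candidate
}

func main() {
	t := &leaseTable{leases: map[uint64]lease{}}
	now := time.Now()
	fmt.Println(t.grantOrRenew(42, "cs1:7070", now))                     // cs1 becomes primary
	fmt.Println(t.grantOrRenew(42, "cs2:7070", now.Add(10*time.Second))) // still cs1
}
```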

(Figure: GFS write control and data flow)

Atomic Record Appends

  • regions where a record append reported success are defined, while intervening regions (padding, failed attempts) are inconsistent
  • appends are at-least-once: GFS may insert padding and duplicates, so readers detect padding (e.g. via markers/checksums) and drop duplicates using unique record IDs
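
A sketch of the primary's per-append decision: if the record would cross the 64 MB chunk boundary, the chunk is padded and the client retries on the next chunk; otherwise the primary picks the offset itself (names are illustrative):

```go
package main

import "fmt"

const chunkSize = 64 << 20 // 64 MB

// appendResult is the primary's answer for one record-append attempt.
type appendResult struct {
	offset      int64 // offset the primary chose, if the record was written
	padAndRetry bool  // record would not fit: chunk gets padded, client retries on a new chunk
}

// recordAppend is the primary's decision; it returns the result and the new fill level.
func recordAppend(chunkUsed, recLen int64) (appendResult, int64) {
	if chunkUsed+recLen > chunkSize {
		// Pad to the chunk boundary so no record straddles two chunks.
		return appendResult{padAndRetry: true}, chunkSize
	}
	return appendResult{offset: chunkUsed}, chunkUsed + recLen
}

func main() {
	res, used := recordAppend(chunkSize-100, 4096) // nearly full chunk, record does not fit
	fmt.Println(res.padAndRetry, used)             // true 67108864 (padded to the boundary)
	res, _ = recordAppend(0, 4096)                 // fits on a fresh chunk
	fmt.Println(res.offset)                        // 0: GFS chooses the offset, not the client
}
```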

Master Operation

  • Namespace management and locking: an operation involving /d1/d2/…/dn/leaf acquires read locks on the directory names /d1, /d1/d2, …, /d1/d2/…/dn, and either a read lock or a write lock on the full pathname /d1/d2/…/dn/leaf
    1. this allows concurrent mutations in the same directory: multiple file creations can run concurrently in the same directory, since each acquires a read lock on the directory name and a write lock on its own file name (see the locking sketch after this list)

  • replica placement
  • Creation, re-replication, rebalancing
    1. limit the number of “recent” creations on each chunkserver
    2. place new replicas on chunkservers with below-average disk space utilization
    3. spread replicas of a chunk across racks
  • GC
    1. master logs the deletion
    2. file is just renamed to a hidden name that includes the deletion timestamp
    3. During the master’s regular scan of the file system namespace, it removes any such hidden files if they have existed for more than three days
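
A sketch of the locking rule described above: an operation computes read locks for every proper prefix of its path and a write (or read) lock on the full path, so two creations in the same directory conflict only on their own file names (helper names are mine):

```go
package main

import (
	"fmt"
	"strings"
)

// lockPlan lists which pathnames an operation read-locks and write-locks.
type lockPlan struct {
	readLocks []string
	writeLock string
}

// planLocks: read locks on /d1, /d1/d2, ..., /d1/.../dn and a write lock on the
// full pathname (a read-only operation would take a read lock on the leaf instead).
func planLocks(path string) lockPlan {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	var reads []string
	prefix := ""
	for _, p := range parts[:len(parts)-1] {
		prefix += "/" + p
		reads = append(reads, prefix)
	}
	return lockPlan{readLocks: reads, writeLock: path}
}

func main() {
	a := planLocks("/home/user/foo")
	b := planLocks("/home/user/bar")
	// Both creations read-lock /home and /home/user, so they can run concurrently;
	// they conflict only if they try to write-lock the same file name.
	fmt.Println(a.readLocks, "write:", a.writeLock)
	fmt.Println(b.readLocks, "write:", b.writeLock)
}
```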

Fault tolerance

  • fast recovery
  • master state and chunk replication
    1. shadow masters provide read-only access when the primary master is down
  • checksums to detect corrupted chunk replicas (sketched below)
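
A sketch of the checksum scheme: each chunkserver keeps a 32-bit checksum per 64 KB block of each chunk (as in the paper), verifies blocks before serving a read, and reports mismatches to the master so it can re-replicate from a good copy (function names are mine):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const blockSize = 64 << 10 // paper: a 32-bit checksum per 64 KB block

// checksumBlocks computes one checksum per 64 KB block of a chunk.
func checksumBlocks(chunk []byte) []uint32 {
	var sums []uint32
	for off := 0; off < len(chunk); off += blockSize {
		end := off + blockSize
		if end > len(chunk) {
			end = len(chunk)
		}
		sums = append(sums, crc32.ChecksumIEEE(chunk[off:end]))
	}
	return sums
}

// verify re-checksums the stored data before serving a read; a mismatch means
// the chunkserver returns an error and reports the corruption to the master.
func verify(chunk []byte, sums []uint32) bool {
	for i, s := range checksumBlocks(chunk) {
		if s != sums[i] {
			return false
		}
	}
	return true
}

func main() {
	data := make([]byte, 3*blockSize+17)
	sums := checksumBlocks(data)
	fmt.Println(verify(data, sums)) // true
	data[blockSize+5] ^= 0xFF       // silent corruption in the second block
	fmt.Println(verify(data, sums)) // false -> re-replicate from a good replica
}
```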