1
Wide-Area Cooperative Storage with CFS
  • Robert Morris
  • Frank Dabek, M. Frans Kaashoek,
  • David Karger, Ion Stoica
  • MIT and Berkeley

2
Target CFS Uses
[Diagram: CFS nodes connected across the Internet]
  • Serving data with inexpensive hosts
  • open-source distributions
  • off-site backups
  • tech report archive
  • efficient sharing of music

3
How to mirror open-source distributions?
  • Multiple independent distributions
  • Each has high peak load, low average
  • Individual servers are wasteful
  • Solution: aggregate
  • Option 1: single powerful server
  • Option 2: distributed service
  • But how do you find the data?

4
Design Challenges
  • Avoid hot spots
  • Spread storage burden evenly
  • Tolerate unreliable participants
  • Fetch speed comparable to whole-file TCP
  • Avoid O(participants) algorithms
  • Centralized mechanisms [Napster], broadcasts
    [Gnutella]
  • CFS solves these challenges

5
Why Blocks Instead of Files?
  • Cost: one lookup per block
  • Can tailor cost by choosing good block size
  • Benefit: load balance is simple
  • For large files:
  • Storage cost of large files is spread out
  • Popular files are served in parallel
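
As a rough illustration of the block-oriented design, here is a minimal Python sketch (not CFS's actual code; the function name is illustrative) of a publisher splitting a file into 8 KByte blocks, each named by the SHA-1 hash of its contents so any node can verify a fetched block by re-hashing it:

    import hashlib

    BLOCK_SIZE = 8 * 1024  # 8 KByte blocks, as in the CFS prototype

    def split_into_blocks(data, block_size=BLOCK_SIZE):
        """Split a file into blocks, each keyed by the SHA-1 of its contents."""
        blocks = []
        for off in range(0, len(data), block_size):
            chunk = data[off:off + block_size]
            blocks.append((hashlib.sha1(chunk).hexdigest(), chunk))
        return blocks

    print(len(split_into_blocks(b"x" * 20000)), "blocks")   # -> 3 blocks

Because each block is placed on a different node (next slides), the storage and serving load of a large or popular file is spread across many hosts.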

6
The Rest of the Talk
  • Software structure
  • Chord distributed hashing
  • DHash block management
  • Evaluation

7
CFS Architecture
[Diagram: clients and servers running on nodes connected across the Internet]
  • Each node is a client and a server (like xFS)
  • Clients can support different interfaces
  • File system interface
  • Music keyword search (like Napster and Gnutella)

8
Client-server interface
[Diagram: an FS client inserts or looks up a file f; the node's server turns these into insert and lookup operations on individual blocks]
  • Files have unique names
  • Files are read-only (single writer, many readers)
  • Publishers split files into blocks
  • Clients check files for authenticity

9
Server Structure
[Diagram: each node (Node 1, Node 2) runs a DHash layer on top of a Chord layer; the layers communicate with their peers on other nodes]
  • DHash stores, balances, replicates, caches
    blocks
  • DHash uses Chord [SIGCOMM 2001] to locate blocks

10
Chord Hashes a Block ID to its Successor
[Diagram: circular ID space with nodes N10, N32, N60, N80, N100; each block is stored at its successor, e.g. B11, B30 at N32; B33, B40, B52 at N60; B65, B70 at N80; B100 at N100; B112, B120, ..., B10 at N10]
  • Nodes and blocks have randomly distributed IDs
  • Successor: the node with the next-highest ID
    (sketched below)
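
A minimal sketch of the block-to-successor mapping, assuming Python and small integer IDs in place of Chord's 160-bit SHA-1 IDs:

    import bisect

    def successor(node_ids, key):
        """Return the node whose ID is the next one at or after `key` on the ring."""
        ring = sorted(node_ids)
        i = bisect.bisect_left(ring, key)
        return ring[i % len(ring)]      # wrap around past the highest ID

    nodes = [10, 32, 60, 80, 100]
    for block in (30, 65, 100, 112):
        print("block", block, "-> node", successor(nodes, block))
    # block 30 -> 32, block 65 -> 80, block 100 -> 100, block 112 -> 10 (wraps)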

11
Successor Lists Ensure Robust Lookup
[Diagram: ring of nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; each node stores its next three successors, e.g. N5 stores 10, 20, 32 and N110 stores 5, 10, 20]
  • Each node stores r successors, r = 2 log N
  • Lookup can skip over dead nodes to find blocks
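
A small sketch of why the successor list makes lookup robust (the helper and liveness test are hypothetical, assuming each node keeps r live candidates):

    def first_live_successor(successor_list, is_alive):
        """Return the first live node in a successor list, skipping dead ones."""
        for node in successor_list:
            if is_alive(node):
                return node
        return None                      # all r successors failed (unlikely for r = 2 log N)

    succ_of_n10 = [20, 32, 40]           # N10's successor list (r = 3)
    print(first_live_successor(succ_of_n10, lambda n: n != 20))   # N20 is dead -> 32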

12
Finger tables aid efficient lookup
  • For an m-bit ID space, each node n has a finger
    table with m entries
  • The target key of entry i is (n + 2^i) mod 2^m,
    so targets increase by powers of two
  • This reduces the number of message exchanges per
    lookup to O(log N) (sketched below)
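
A sketch of the finger-table construction just described, in Python with a toy 7-bit ID space (real Chord uses 160-bit IDs):

    import bisect

    def successor(ring, key):
        i = bisect.bisect_left(ring, key)
        return ring[i % len(ring)]

    def finger_table(n, ring, m):
        """Entry i points to the successor of (n + 2**i) mod 2**m."""
        return [successor(ring, (n + 2**i) % 2**m) for i in range(m)]

    ring = sorted([10, 32, 60, 80, 100])
    print(finger_table(10, ring, m=7))
    # successive entries cover exponentially larger arcs of the ring,
    # so a lookup can roughly halve the remaining distance each hop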

13
DHash/Chord Interface
[Diagram: DHash on a server calls Chord's Lookup(blockID) and receives a list of <node-ID, IP address> pairs; Chord maintains a finger table of <node ID, IP address> entries]
  • lookup() returns list with node IDs closer in ID
    space to block ID
  • Sorted, closest first
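
An illustrative sketch of that contract (names, types, and the toy 7-bit ring are assumptions, not the real RPC signatures): given candidate nodes, return them ordered by clockwise distance from the block ID, so the block's successor comes first.

    def order_by_closeness(block_id, nodes, ring_bits=7):
        """nodes: list of (node_id, ip); closest in ID space (clockwise) first."""
        space = 2 ** ring_bits
        clockwise = lambda nid: (nid - block_id) % space
        return sorted(nodes, key=lambda n: clockwise(n[0]))

    nodes = [(40, "10.0.0.4"), (80, "10.0.0.8"), (20, "10.0.0.2")]
    print(order_by_closeness(45, nodes))   # N80 first, then N20, then N40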

14
Replicate blocks at r successors
[Diagram: ring of nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Block 17 is replicated at its immediate successors on the ring (N20, N40, N50, ...)]
  • Node IDs are SHA-1 of IP Address
  • Ensures independent replica failure
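
A sketch combining the two points above (Python; IDs truncated to 32 bits only so the output is readable, and the replica-placement helper is illustrative):

    import bisect, hashlib

    def node_id(ip, bits=32):
        """A node's ID is the SHA-1 hash of its IP address."""
        return int.from_bytes(hashlib.sha1(ip.encode()).digest(), "big") >> (160 - bits)

    def replica_nodes(block_id, ring, r=3):
        """The r nodes that immediately succeed the block's ID, wrapping around."""
        ring = sorted(ring)
        i = bisect.bisect_left(ring, block_id)
        return [ring[(i + k) % len(ring)] for k in range(r)]

    ring = [node_id("10.0.0.%d" % i) for i in range(1, 9)]
    print(replica_nodes(block_id=17, ring=ring, r=3))

Because SHA-1 scatters the IDs of physically unrelated hosts around the ring, a block's r replicas are unlikely to fail together.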

15
DHash Copies to Caches Along Lookup Path
[Diagram: Lookup(BlockID=45) routed around the ring; RPCs: 1. Chord lookup, 2. Chord lookup, 3. block fetch, 4. send copy to cache at a node on the lookup path]
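
A sketch of the caching step (hypothetical helper names, not the real DHash RPCs): after fetching the block from its home node, the fetching node pushes a copy to the previous node on the lookup path, so later lookups that route through that node can hit its cache.

    def fetch_with_path_caching(block_id, lookup_path, fetch, send_to_cache):
        """lookup_path: nodes contacted in order; the last one stores the block."""
        home = lookup_path[-1]
        block = fetch(home, block_id)                         # RPC 3: block fetch
        if len(lookup_path) >= 2:
            send_to_cache(lookup_path[-2], block_id, block)   # RPC 4: send to cache
        return block

    # toy in-memory stores standing in for per-node caches
    stores = {"N40": {}, "N80": {45: b"block data"}}
    get_block = lambda node, bid: stores[node][bid]
    put_cache = lambda node, bid, blk: stores[node].setdefault(bid, blk)
    fetch_with_path_caching(45, ["N20", "N40", "N80"], get_block, put_cache)
    print(stores["N40"])   # N40 now caches block 45
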
16
Caching at Fingers Limits Load
[Diagram: finger pointers from O(log N) other nodes converging on N32]
  • Only O(log N) nodes have fingers pointing to N32
  • This limits the single-block load on N32

17
Load Balance with virtual nodes
[Diagram: physical hosts A and B, each running several virtual nodes (e.g. N5, N10, N60, N101)]
  • Hosts may differ in disk/net capacity
  • Hosts may advertise multiple IDs
  • Chosen as SHA-1(IP Address, index)
  • Each ID represents a virtual node
  • Host load is proportional to its number of virtual
    nodes
  • Manually controlled; could be made adaptive
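
A sketch of deriving virtual-node IDs in Python (the exact "IP address, index" encoding fed to SHA-1 is an assumption for illustration):

    import hashlib

    def virtual_node_ids(ip, count, bits=32):
        """One ID per virtual node: SHA-1 over the host's IP address and an index."""
        ids = []
        for index in range(count):
            digest = hashlib.sha1(("%s,%d" % (ip, index)).encode()).digest()
            ids.append(int.from_bytes(digest, "big") >> (160 - bits))
        return ids

    print(virtual_node_ids("18.26.4.9", count=4))   # a well-provisioned host runs 4 virtual nodes

Since each virtual node owns its own arc of the ID space, a host running v virtual nodes stores roughly v times the blocks of a host running one.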

18
Quotas
  • Malicious injection of large quantities of data
    can use up all disk space
  • To prevent this, we have quotas for each
    publisher
  • E.g. only 2% of storage space for requests from a
    particular IP address (see the sketch below)
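
A sketch of a per-publisher quota check (the bookkeeping structure and method names are assumptions; the 2% figure is the slide's example):

    class QuotaTable:
        """Tracks how much of this server's disk each publisher IP has used."""
        def __init__(self, disk_bytes, quota_fraction=0.02):
            self.limit = int(disk_bytes * quota_fraction)   # per-IP cap
            self.used = {}                                  # publisher IP -> bytes

        def admit(self, publisher_ip, block_size):
            used = self.used.get(publisher_ip, 0)
            if used + block_size > self.limit:
                return False                                # over quota: refuse insert
            self.used[publisher_ip] = used + block_size
            return True

    quotas = QuotaTable(disk_bytes=10 * 2**30)              # a 10 GB server
    print(quotas.admit("1.2.3.4", 8192))                    # True until 2% is used up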

19
Aging and Deletion
  • CFS ages out blocks: blocks that have not been
    refreshed recently are deleted
  • Publishers must refresh their blocks periodically
    if they don't want them deleted by CFS (sketched
    below)
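
A sketch of the aging rule (the lifetime value, field names, and sweep structure are assumptions for illustration):

    import time

    LIFETIME = 30 * 24 * 3600        # assume blocks must be refreshed at least monthly

    def sweep(blocks, now=None):
        """blocks: block_id -> last refresh time; delete entries that have aged out."""
        now = time.time() if now is None else now
        for block_id in [b for b, t in blocks.items() if now - t > LIFETIME]:
            del blocks[block_id]

    store = {"fresh": time.time(), "stale": time.time() - 40 * 24 * 3600}
    sweep(store)
    print(list(store))               # only the recently refreshed block survives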

20
How Things Work
  • Read operation
  • To get the 1st block of /foo:
  • get(public key) -> returns the root block
  • Read the content hash of /foo's inode from the
    root block
  • get(hash(/foo's inode)) -> returns the inode of /foo
  • Read the content hash of the 1st block from the inode
  • get(hash(1st block)) -> returns the 1st block
    (sketched below)
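
A sketch of that read path (a Python dict stands in for the distributed hash table; the block layouts are simplified stand-ins for the real CFS structures):

    import hashlib
    sha1 = lambda b: hashlib.sha1(b).digest()

    def read_first_block(dht, public_key):
        root = dht[sha1(public_key)]              # get(public key) -> root block
        inode = dht[root["inode_hash"]]           # get(hash(/foo's inode)) -> inode
        return dht[inode["block_hashes"][0]]      # get(hash(1st block)) -> 1st block

    # toy publication of /foo with a single data block
    data = b"first 8 KByte block of /foo"
    inode = {"block_hashes": [sha1(data)]}
    inode_key = sha1(repr(inode).encode())        # content hash of the inode block
    dht = {sha1(b"publisher public key"): {"inode_hash": inode_key},
           inode_key: inode,
           sha1(data): data}
    print(read_first_block(dht, b"publisher public key"))

In real CFS the root block is signed with the publisher's private key and verified against the public key, which is how clients check a file's authenticity.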

21
Experimental Setup (12 nodes)
[Map: 12 Internet hosts in the testbed, with wide-area links to vu.nl, lulea.se, ucl.uk, kaist.kr, and .ve]
  • One virtual node per host
  • 8Kbyte blocks
  • RPCs use UDP
  • Caching turned off
  • Proximity routing turned off

22
CFS Fetch Time for 1MB File
[Graph: fetch time (seconds) vs. prefetch window (KBytes)]
  • Average over the 12 hosts
  • No replication, no caching; 8 KByte blocks

23
Distribution of Fetch Times for 1MB
[Graph: fraction of fetches vs. time (seconds) for 8, 24, and 40 KByte prefetch windows]
24
CFS Fetch Time vs. Whole File TCP
[Graph: fraction of fetches vs. time (seconds), CFS with 40 KByte prefetch vs. whole-file TCP]
25
CFS Summary
  • CFS provides peer-to-peer read-only storage
  • Structure: DHash and Chord
  • It is efficient, robust, and load-balanced
  • It uses block-level distribution
  • The prototype is as fast as whole-file TCP
  • http://www.pdos.lcs.mit.edu/chord

26
Thank you!