1
Wide-area cooperative storage with CFS
2
Overview
  • CFS: Cooperative File System
  • Peer-to-peer read-only file system
  • Distributed hash table for block storage
  • Lookup performed by Chord

3
Design Overview
  • CFS clients contain 3 layers:
  • A file system client
  • DHash storage layer
  • Chord lookup layer
  • CFS servers contain 2 layers:
  • DHash storage layer
  • Chord lookup layer

4
Overview (cont.)
  • Analogy: DHash blocks are to block identifiers as
    disk blocks are to disk addresses
  • CFS file systems are read-only as far as clients
    are concerned
  • but can be modified by their publisher

5
File System Layout
  • The publisher inserts file system blocks into CFS,
    using a content hash of each block as its identifier
  • It then signs the root block with its private key
  • It inserts the root block into CFS using the
    corresponding public key as its identifier
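
A minimal sketch in Python of the two naming rules above (the helper names are mine, not CFS's). Note that the root block's CFS key is the SHA-1 hash of the public key, per the block-acceptance rules on a later slide:

import hashlib

def content_hash_key(block: bytes) -> bytes:
    # Ordinary file system blocks are named by the SHA-1 hash of their content
    return hashlib.sha1(block).digest()

def root_block_key(public_key: bytes) -> bytes:
    # The root block is named via the publisher's public key (its SHA-1 hash),
    # so the identifier stays stable as the file system is updated
    return hashlib.sha1(public_key).digest()

print(content_hash_key(b"a file system block").hex())
print(root_block_key(b"placeholder public key bytes").hex())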

6
Publisher Updates
  • An update changes the file system's root block to
    point to the new data
  • Servers authenticate updates by checking that the
    same key signed both the old and the new block
  • Timestamps prevent replays of old data
  • File systems are updated without changing the
    root block's identifier
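
A sketch of that update-acceptance logic, assuming a RootBlock layout invented here for illustration; verify() stands in for a real public-key signature check:

from typing import NamedTuple

class RootBlock(NamedTuple):
    public_key: bytes   # its hash is the block's (unchanging) CFS identifier
    timestamp: int      # freshness, to prevent replays of old root blocks
    payload: bytes      # points at the file system's current data
    signature: bytes

def verify(public_key: bytes, block: RootBlock) -> bool:
    # Placeholder: really verifies block.signature over (timestamp, payload)
    return True

def accept_update(old: RootBlock, new: RootBlock) -> bool:
    return (new.public_key == old.public_key    # same key signed old and new
            and verify(new.public_key, new)     # new signature is valid
            and new.timestamp > old.timestamp)  # reject replays of old data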

7
CFS properties
  • Decentralized control
  • Scalability
  • Availability
  • Load balance
  • Persistence
  • Quotas
  • Efficiency

8
Chord Layer
  • Same Chord protocol as described earlier, with two
    modifications:
  • Server Selection
  • Node ID authentication

9
Quick Chord Overview
  • Consistent hashing
  • Node joins/leaves
  • Successor lists: O(N) lookup hops
  • Finger tables: O(log N) lookup hops
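
To make the consistent-hashing step concrete, here is a toy successor lookup over a global sorted list of node IDs (a simulation shortcut; real Chord reaches the successor through finger tables in O(log N) hops):

import bisect
import hashlib

def sha1_int(data: bytes) -> int:
    # IDs and keys live on the same 160-bit ring
    return int(hashlib.sha1(data).hexdigest(), 16)

node_ids = sorted(sha1_int(f"server-{i}".encode()) for i in range(8))

def successor(key: int) -> int:
    # First node ID >= key, wrapping past the largest ID to the smallest
    i = bisect.bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]

block_key = sha1_int(b"some block")
print(f"block {block_key:x} is stored at node {successor(block_key):x}")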

10
Server Selection
  • Chord chooses the next node to contact from the
    finger table
  • Normally it picks the node closest to the
    destination ID
  • What about network latency?
  • CFS measures and stores latency in the finger
    table entries
  • Latency is calculated when acquiring finger table
    entries
  • Reasoning: RPCs to different nodes incur varying
    latency, so choose one that minimizes it
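
A sketch of the latency-aware hop choice. The real heuristic weighs both latency and progress through ID space, so picking the lowest-RTT candidate, as below, is a simplification:

def between(x: int, a: int, b: int) -> bool:
    # True if x is in the ring interval (a, b), handling wrap-around
    return a < x < b if a < b else x > a or x < b

def next_hop(self_id: int, key: int,
             fingers: list[tuple[int, float]]) -> int | None:
    # fingers: (node_id, measured_rtt_ms) pairs, with RTT recorded when
    # the finger table entry was acquired, as this slide describes
    candidates = [(nid, rtt) for nid, rtt in fingers
                  if between(nid, self_id, key)]  # must progress toward key
    if not candidates:
        return None  # we are the key's predecessor; lookup is done
    # Simplification: lowest-latency node among those that make progress
    return min(candidates, key=lambda c: c[1])[0]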

11
Node ID authentication
  • Idea: keep a malicious node from choosing its own ID
  • All Chord IDs must be of the form h(x)
  • h: the SHA-1 hash function
  • x: the node's IP address plus its virtual node index
  • When a new node joins the system, an existing node
    sends a message to the new node's claimed IP address
  • The ID must match the hash of the claimed IP address
    and virtual node index to be accepted
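
The check itself is easy to sketch; the exact encoding of the IP address and virtual node index into x is an assumption here:

import hashlib

def expected_id(ip: str, vnode_index: int) -> bytes:
    # h(x), where x combines the node's IP address and virtual node index
    return hashlib.sha1(f"{ip}:{vnode_index}".encode()).digest()

def authenticate(claimed_id: bytes, claimed_ip: str, vnode_index: int) -> bool:
    # The existing node also messages claimed_ip to confirm the new node
    # really receives packets there (not shown)
    return claimed_id == expected_id(claimed_ip, vnode_index)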

12
DHash Layer
  • Handles
  • Storing and retrieving blocks
  • Distribution
  • Replication
  • Caching of blocks
  • Uses Chord to locate blocks
  • Key CFS design decision: split each file system
    into blocks and distribute those blocks across
    many servers

13
Replication
  • DHash replicates each block on the k servers
    immediately after the block's successor
  • Why? Even if the block's successor fails, the
    block is still available
  • Server failure independence follows because
    location on the ring is determined by a hash of
    the IP address, not by physical location
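
A sketch of the placement rule, again using a global sorted ring as a simulation shortcut:

import bisect
import hashlib

def sha1_int(data: bytes) -> int:
    return int(hashlib.sha1(data).hexdigest(), 16)

def replica_servers(block_key: int, node_ids: list[int], k: int) -> list[int]:
    # The block lives at its successor; replicas go on the k servers
    # that immediately follow the successor around the ring
    node_ids = sorted(node_ids)
    i = bisect.bisect_left(node_ids, block_key)
    return [node_ids[(i + j) % len(node_ids)] for j in range(k + 1)]

nodes = [sha1_int(f"server-{i}".encode()) for i in range(16)]
print(replica_servers(sha1_int(b"some block"), nodes, k=3))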

14
Replication (cont.)
  • Could save space by storing coded pieces of
    blocks, but storage space is not expected to be a
    highly constrained resource
  • Placement of replicas allows a client to select
    the replica with the fastest download
  • The result of a Chord lookup is the immediate
    predecessor of the node holding block X
  • This node's successor table contains entries for
    the latencies of the nearest Y nodes

15
Caching
  • Caching blocks prevents overloading servers that
    hold popular data
  • Using Chord:
  • Clients contact servers closer and closer to the
    desired location
  • Once the source, or an intermediate cached copy,
    is found, all servers just contacted receive the
    block to cache
  • Cached blocks are replaced in least-recently-used
    order
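
Least-recently-used replacement can be sketched with an OrderedDict; capacity here counts blocks, whereas a real cache would budget bytes:

from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks: OrderedDict[bytes, bytes] = OrderedDict()

    def get(self, key: bytes) -> bytes | None:
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)         # mark as recently used
        return self.blocks[key]

    def put(self, key: bytes, block: bytes) -> None:
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used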

16
Load Balance
  • Virtual servers: 1 real server acts as several
    virtual servers
  • An administrator can configure the number of
    virtual servers based on the server's storage and
    network capabilities
  • Possible side effect: more hops in the Chord
    algorithm
  • More nodes mean more hops
  • Solution: allow virtual servers on the same host
    to look at each other's tables
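
A quick simulation of the virtual-server claim (all names hypothetical), showing how the spread of blocks across 64 physical hosts tightens when each runs 6 virtual servers instead of 1:

import bisect
import hashlib
from collections import Counter

def sha1_int(data: bytes) -> int:
    return int(hashlib.sha1(data).hexdigest(), 16)

def blocks_per_host(n_hosts: int, vnodes: int, n_blocks: int) -> Counter:
    # Each physical host joins the ring once per virtual server
    ring = sorted((sha1_int(f"host{h}:v{v}".encode()), h)
                  for h in range(n_hosts) for v in range(vnodes))
    ids = [ident for ident, _ in ring]
    counts: Counter = Counter()
    for b in range(n_blocks):
        i = bisect.bisect_left(ids, sha1_int(f"block{b}".encode()))
        counts[ring[i % len(ring)][1]] += 1  # charge the successor's host
    return counts

for v in (1, 6):
    c = blocks_per_host(64, v, 10_000)
    print(v, "virtual:", min(c.values()), "to", max(c.values()), "blocks/host")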

17
Quotas
  • Control the amount of data a publisher can inject
  • Quotas based on reliable publisher identities
    won't work because they require centralized
    administration
  • CFS instead uses quotas based on the IP address
    of the publisher
  • Each server imposes a 0.1% per-IP limit, so as
    total capacity grows, the total amount a publisher
    may store grows
  • The system is not easy to subvert because
    publishers must respond to initial confirmation
    requests
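
A sketch of a per-server, per-IP quota check; the 0.1% fraction follows this slide, while the class and byte accounting are illustrative:

from collections import defaultdict

class QuotaEnforcer:
    def __init__(self, capacity_bytes: int, per_ip_fraction: float = 0.001):
        self.limit = int(capacity_bytes * per_ip_fraction)  # 0.1% per IP
        self.used: defaultdict[str, int] = defaultdict(int)

    def try_insert(self, publisher_ip: str, block_size: int) -> bool:
        # A real server first confirms the publisher answers at this IP
        if self.used[publisher_ip] + block_size > self.limit:
            return False
        self.used[publisher_ip] += block_size
        return True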

18
Updates and Deletion
  • Only the publisher may modify data
  • 2 conditions for a server to accept a block:
  • Marked as a content-hash block → supplied key
    equals the SHA-1 hash of the block's content
  • Marked as a signed block → signed by the public
    key whose SHA-1 hash is the block's CFS key
  • No explicit delete
  • Publishers must periodically refresh blocks if
    they want them stored
  • CFS deletes blocks that have not been refreshed
    recently
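
The first condition is directly checkable with a hash; the second needs a real signature verifier, stubbed out below:

import hashlib

def verify_sig(public_key: bytes, data: bytes, signature: bytes) -> bool:
    return True  # placeholder for a real public-key signature verification

def accept_content_hash_block(key: bytes, data: bytes) -> bool:
    # Condition 1: supplied key equals SHA-1 hash of the block's content
    return key == hashlib.sha1(data).digest()

def accept_signed_block(key: bytes, public_key: bytes, data: bytes,
                        signature: bytes) -> bool:
    # Condition 2: signed by the public key whose SHA-1 hash is the CFS key
    return (key == hashlib.sha1(public_key).digest()
            and verify_sig(public_key, data, signature))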

19
Experimental Results: Real Life
  • 12 machines over the Internet in the US, the
    Netherlands, Sweden, and South Korea

20
Lookup
  • Range of server counts
  • 10,000 blocks
  • 10,000 lookups for random blocks
  • Distribution of lookup hops is roughly linear on
    a log plot, so O(log N)

21
Load Balance
  • Theoretical:
  • 64 physical servers
  • 1, 6, and 24 virtual servers each
  • Actual:
  • 10,000 blocks
  • 64 physical servers
  • 6 virtual servers each

22
Caching
  • Repeated look-ups of a single block
  • 1,000-server system
  • Average path length without caching: 5.7
  • Average with caching: 3.2
  • (over 10 look-ups)

23
Storage Space Control
  • Varying the number of virtual servers per
    physical server
  • 7 physical servers
  • 1, 2, 4, 8, 16, 32, 64, and 128 virtual servers
  • 10,000 blocks

24
Effect of Failure
  • 1,000 blocks
  • 1,000 server system
  • Each block has 6 replicas
  • Fraction of servers fail before the stabilization
    algorithm is run

25
Effect of Failure (cont.)
  • Same set-up as before
  • X% of the servers fail

26
Conclusion
  • Highly scalable, available, secure read-only file
    system
  • Uses the peer-to-peer Chord protocol for lookup
  • Uses replication and caching to achieve
    availability and load balance
  • Simple but effective protection against insertion
    of large amounts of malicious data