Transcript and Presenter's Notes

Title: Wide-area cooperative storage with CFS


1
Wide-area cooperative storage with CFS
  • Frank Dabek, M. Frans Kaashoek, David Karger,
    Robert Morris, Ion Stoica

2
Quick recap of some other p2p systems
  • File sharing systems (focus is on sharing files;
    persistence and reliable content location are not
    important)
  • Napster: centralized lookup, distributed storage of
    data, keyword search
  • Gnutella: distributed lookup and fetching based on
    scoped flooding, keyword search
  • Freenet: distributed lookup and storage; anonymity is
    a major design goal; queries are routed based on
    lexical closeness of keys
  • Storage utilities (focus is on persistence, efficient
    lookups, fault tolerance)
  • PAST: a self-organizing overlay network that
    cooperatively routes queries, caches and replicates
    data to provide a large-scale storage utility.

3
Introduction
  • Peer-to-peer read-only storage system
  • Decentralized architecture focusing mainly on
  • efficiency of data access
  • robustness
  • load balance
  • scalability
  • Provides a distributed hash table for block
    storage
  • Uses Chord to map keys to nodes.
  • Does not provide
  • anonymity
  • strong protection against malicious participants
  • Focus is on providing an efficient and robust
    lookup and storage layer with simple algorithms.

4
CFS Software Structure
(Layer diagram: a CFS client stacks FS, DHash, and Chord
layers; each CFS server stacks DHash and Chord layers.
Layers within a node communicate through local APIs, while
the DHash and Chord layers of different nodes communicate
through RPC APIs.)
5
Layer functionalities
  • Client file system layer interprets Dhash blocks in a
    format adapted from SFSRO (similar to Unix).
  • Client dhash layer uses the client chord layer to
    locate blocks.
  • Client chord layer uses the server chord layer as a
    proxy to map a key to the node that is responsible for
    that key.
  • Server dhash layer stores the blocks and maintains the
    proper level of replication and caching.
  • Server chord layer interacts with peer chord layers to
    implement key lookup, and interacts with the local
    dhash layer to integrate checking for cached copies of
    the block during lookup (a sketch of these interfaces
    follows below).
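
As a rough illustration of how these layers fit together, here is a minimal
Python sketch of the DHash interface on top of a Chord lookup primitive. The
method names (put_h, put_s, get, lookup) follow the block-storage API described
in the CFS paper; the classes themselves, the store/fetch node methods, and the
key derivations are assumptions made for the example.

```python
import hashlib

class Chord:
    """Stand-in for the Chord layer: maps a key to the responsible node."""
    def lookup(self, key: bytes) -> "Node":
        raise NotImplementedError   # the lookup itself is sketched on a later slide

class DHash:
    """Stand-in for the DHash block-storage layer built on Chord."""
    def __init__(self, chord: Chord):
        self.chord = chord

    def put_h(self, block: bytes) -> bytes:
        """Store a content-hash block; its key is the SHA-1 of its contents."""
        key = hashlib.sha1(block).digest()
        self.chord.lookup(key).store(key, block)   # store(): assumed node method
        return key

    def put_s(self, block: bytes, pubkey: bytes) -> bytes:
        """Store a signed root block under a key derived from the public key."""
        key = hashlib.sha1(pubkey).digest()
        self.chord.lookup(key).store(key, block)
        return key

    def get(self, key: bytes) -> bytes:
        """Fetch the block associated with key from the responsible node."""
        return self.chord.lookup(key).fetch(key)   # fetch(): assumed node method
```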

6
  • Client identifies the root block using a public key
    generated by the publisher.
  • It uses the public key as the root block identifier to
    fetch the root block, and checks the validity of the
    block using the signature.
  • The file's inode key is obtained by the usual search
    through directory blocks; these contain the keys of
    the file inode blocks, which are used to fetch the
    inode blocks.
  • The inode block contains the block numbers and their
    corresponding keys, which are used to fetch the data
    blocks (a sketch of this walk follows below).
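
A hedged sketch of that walk, assuming the get() primitive from the DHash
sketch above returns parsed block objects here; the field names
(directory_key, entries, blocks) are invented for illustration and are not
taken from the slides.

```python
def fetch_file(dhash, root_key: bytes, name: str) -> bytes:
    """Walk root block -> directory block -> inode block -> data blocks."""
    # The root block is identified by a key derived from the publisher's
    # public key; verifying its signature is omitted in this sketch.
    root = dhash.get(root_key)

    # Directory blocks map file names to the keys of inode blocks.
    directory = dhash.get(root.directory_key)      # directory_key: assumed field
    inode = dhash.get(directory.entries[name])     # entries: assumed name->key map

    # The inode block lists block numbers and their corresponding keys,
    # which are used to fetch the data blocks in order.
    return b"".join(dhash.get(key) for _, key in sorted(inode.blocks))
```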

7
Properties of CFS
  • decentralized control: no administrative relationship
    between servers and publishers.
  • scalability: lookup uses space and messages at most
    logarithmic in the number of servers.
  • availability: a client can retrieve data as long as at
    least one replica is reachable using the underlying
    network.
  • load balance: for large files, load is spread by
    distributing blocks over a number of servers; for
    small files, blocks are cached at the servers involved
    in the lookup.
  • persistence: once data is inserted, it is available
    for the agreed-upon interval.
  • quotas: implemented by limiting the amount of data
    inserted by any particular IP address.
  • efficiency: the delay of file fetches is comparable
    with FTP due to efficient lookup, pre-fetching,
    caching and server selection.

8
Chord
  • Consistent hashing
  • maps (node IP address, virtual host number) into an
    m-bit node identifier.
  • maps block keys into the same m-bit identifier space.
  • The node responsible for a key is the successor of the
    key's ID, with wrap-around in the m-bit identifier
    space (see the sketch after this list).
  • Consistent hashing balances the keys so that all nodes
    share equal load with high probability, and keys move
    minimally as nodes enter and leave the network.
  • For scalability, Chord uses a distributed version of
    consistent hashing in which nodes maintain only
    O(log N) state and use O(log N) messages per lookup
    with high probability.
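
A minimal sketch of the consistent-hashing step, assuming SHA-1 as the hash
function (so m = 160) and a sorted list of node IDs standing in for the ring;
the way the IP address and virtual host number are combined is an assumption
made for the example.

```python
import bisect
import hashlib

M = 160   # SHA-1 produces 160-bit identifiers

def node_id(ip: str, virtual: int) -> int:
    """Map (IP address, virtual host number) into the m-bit ID space."""
    return int.from_bytes(hashlib.sha1(f"{ip}/{virtual}".encode()).digest(), "big")

def block_id(block: bytes) -> int:
    """Map a block into the same m-bit ID space via its content hash."""
    return int.from_bytes(hashlib.sha1(block).digest(), "big")

def successor(node_ids: list, key: int) -> int:
    """The node responsible for a key is the first node ID >= key,
    wrapping around the identifier circle."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key % (1 << M))
    return ids[i % len(ids)]
```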

9
the details
  • Two data structures are used for performing lookups.
  • Successor list: maintains the next r successors of the
    node. The successor list alone can be used to traverse
    the nodes and find the node responsible for the data
    in O(N) time.
  • Finger table: the ith entry contains the identity of
    the first node that succeeds n by at least 2^(i-1) on
    the ID circle.
  • Lookup pseudo code (see the sketch after this list):
  • find the id's predecessor; its successor is the node
    responsible for the key.
  • to find the predecessor, check whether the key lies
    between the node's ID and its successor's ID; else,
    use the finger table to find the node that is the
    closest preceding node of the id and repeat this step.
  • since finger table entries point to nodes at
    power-of-two intervals around the ID ring, each
    iteration of the above step at least halves the
    distance between the current node and the predecessor.
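
A compact Python sketch of that lookup, following the standard Chord
pseudocode; the ChordNode class and the in_range helper are illustrative and
assume the successor, predecessor and finger entries have already been
populated.

```python
M = 160   # identifier bits, matching SHA-1

def in_range(x: int, a: int, b: int) -> bool:
    """True if x lies in the interval (a, b] on the ID circle."""
    x, a, b = x % (1 << M), a % (1 << M), b % (1 << M)
    if a < b:
        return a < x <= b
    return x > a or x <= b    # interval wraps around zero (or covers the full ring)

class ChordNode:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self        # next node on the ring
        self.predecessor = self      # previous node on the ring
        self.fingers = []            # fingers[i]: first node succeeding id + 2**i

    def find_successor(self, key: int) -> "ChordNode":
        """The node responsible for key is the successor of key's predecessor."""
        return self.find_predecessor(key).successor

    def find_predecessor(self, key: int) -> "ChordNode":
        node = self
        # Stop once key lies between node and its successor.
        while not in_range(key, node.id, node.successor.id):
            node = node.closest_preceding_finger(key)
        return node

    def closest_preceding_finger(self, key: int) -> "ChordNode":
        """Closest known predecessor of key; each such hop at least halves
        the remaining distance on the ID circle."""
        for finger in reversed(self.fingers):
            if in_range(finger.id, self.id, key):
                return finger
        return self.successor        # fall back to O(N) successor traversal
```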

10
Node join/failure
  • Chord tries to preserve two invariants
  • Each node's successor is correctly maintained.
  • For every key k, node successor(k) is responsible
    for k.
  • To preserve these invariants, when a node joins a
    network
  • Initialize the predecessors, successors and
    finger table of node n
  • Update the existing finger tables of other nodes
    to reflect the addition of n
  • Notify higher layer software so that state can be
    transferred.
  • For concurrent operations and failures, each
    Chord node runs a stabilization algorithm
    periodically to update the finger tables and
    successor lists to reflect addition/failure of
    nodes.
  • If lookups fail during the stabilization process, the
    higher layer can simply look up again. Chord
    guarantees that the stabilization algorithm results in
    a consistent ring (a sketch of stabilization follows
    below).
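
For concreteness, a minimal sketch of the periodic stabilization step (the
classic Chord stabilize/notify pair), reusing the ChordNode and in_range
sketch above; this illustrates the idea rather than the CFS implementation.

```python
def stabilize(node: "ChordNode") -> None:
    """Run periodically: verify node's successor and tell it about node."""
    candidate = node.successor.predecessor
    if candidate is not node and in_range(candidate.id, node.id, node.successor.id):
        node.successor = candidate    # a closer successor has joined
    notify(node.successor, node)

def notify(node: "ChordNode", candidate: "ChordNode") -> None:
    """candidate believes it may be node's predecessor; adopt it if closer."""
    if node.predecessor is node or in_range(candidate.id, node.predecessor.id, node.id):
        node.predecessor = candidate
```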

11
Server selection
  • added to Chord as part of the CFS implementation.
  • basic idea: from the list of candidate servers n_i
    with n_i in (n, id), select the one with minimum cost,
    where
  • C(n_i) = d_i + d_avg * H(n_i)   (see the sketch after
    this list)
  • H(n_i) = ones((n_i - id) >> (160 - log N))
  • where
  • H(n_i) is an estimate of the number of Chord hops that
    would remain after contacting n_i
  • d_avg is the average latency of all RPCs that node n
    ever issued
  • d_i is the latency to n_i as reported by node m
  • Latencies are measured during finger table creation,
    so no extra measurements are necessary.
  • This works well only if low latencies from a to b and
    from b to c imply that the latency between a and c is
    also low.
  • Measurements suggest this is true (A case study of
    server selection, Master's thesis).
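
A small sketch of that cost computation, written directly from the formula on
this slide; the (node_id, latency) tuple layout for candidates and the modular
subtraction are assumptions of the example.

```python
M = 160   # identifier bits

def ones(x: int) -> int:
    """Population count: number of 1 bits."""
    return bin(x).count("1")

def remaining_hops(n_i: int, target_id: int, log_n: int) -> int:
    """H(n_i) = ones((n_i - id) >> (160 - log N)), as written on the slide."""
    return ones(((n_i - target_id) % (1 << M)) >> (M - log_n))

def best_server(candidates, target_id: int, d_avg: float, log_n: int):
    """Pick the (node_id, latency) pair minimizing C = d_i + d_avg * H(n_i)."""
    return min(candidates,
               key=lambda c: c[1] + d_avg * remaining_hops(c[0], target_id, log_n))
```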

12
Node Id Authentication
  • An attacker can destroy chosen data by selecting a
    node ID that is the successor of the data key and then
    denying the existence of the data.
  • To prevent this, when a new node joins the system,
    existing nodes check
  • that the hash of (node IP address, virtual host
    number) matches the professed node ID, and
  • send a random nonce to the claimed IP address to check
    for IP spoofing (see the sketch after this list).
  • To succeed, the attacker would have to control a
    large number of machines so that he can target
    blocks of the same file (which are randomly
    distributed over multiple servers)
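
A tiny sketch of those two checks; the way the IP address and virtual host
number are combined mirrors the assumption made in the earlier
consistent-hashing sketch.

```python
import hashlib
import os

def id_matches_claim(ip: str, virtual: int, claimed_id: int) -> bool:
    """The professed node ID must equal the hash of (IP, virtual number)."""
    digest = hashlib.sha1(f"{ip}/{virtual}".encode()).digest()
    return int.from_bytes(digest, "big") == claimed_id

def spoofing_challenge() -> bytes:
    """Random nonce sent to the claimed IP address; only a host that really
    receives traffic at that address can echo it back."""
    return os.urandom(16)
```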

13
Dhash Layer
  • Provides a distributed hash table for block
    storage
  • reflects a key CFS design decision: split each file
    into blocks and randomly distribute the blocks over
    many servers (see the sketch below).
  • This provides good load distribution for large files.
  • The disadvantage is an increased lookup cost, since a
    lookup is executed for each block; the lookup cost is
    small, though, compared with the much higher cost of
    block fetches.
  • Also supports pre-fetching of blocks to reduce
    user-perceived latencies.
  • Supports replication, caching, quotas, and updates of
    blocks.
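
A hedged sketch of that design decision, reusing the put_h() primitive from
the DHash sketch above; the 8 KB block size matches the experiment described
later, and everything else is illustrative.

```python
BLOCK_SIZE = 8 * 1024   # 8K blocks, as in the experiment described later

def insert_file(dhash, data: bytes) -> list:
    """Split a file into fixed-size blocks and store each under its
    content hash, so the blocks spread over many servers."""
    keys = []
    for off in range(0, len(data), BLOCK_SIZE):
        keys.append(dhash.put_h(data[off:off + BLOCK_SIZE]))
    return keys   # these keys would be recorded in the file's inode block
```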

14
Replication
  • Replicates the blocks on k servers to increase
    availability.
  • Places the replicas at the k servers that are the
    immediate successors of the node responsible for the
    key.
  • The servers can easily be found from the successor
    list (r > k); see the sketch after this list.
  • Provides fault tolerance since when the successor
    fails, the next server can serve the block.
  • Since successor nodes are in general not likely to be
    physically close to each other (the node ID is a hash
    of the IP address and virtual number), this provides
    robustness against the failure of multiple servers
    located on the same network.
  • The client can fetch the block from any of the
    k servers. Latency can be used as a deciding
    factor. This also has the side-effect of
    spreading the load across multiple servers. This
    works under the assumption that the proximity in
    the underlying network is transitive.
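
A short sketch of replica placement and replica choice under these rules; the
server objects, their latency attribute, and the fetch() method are
assumptions of the example.

```python
def replica_servers(successor_list: list, k: int) -> list:
    """Replicas live on the k servers immediately succeeding the node
    responsible for the key (the successor list holds r > k entries)."""
    return successor_list[:k]

def fetch_from_replicas(replicas: list, key: bytes) -> bytes:
    """Try replicas in order of measured latency; this tolerates failed
    successors and also spreads read load across the k servers."""
    for server in sorted(replicas, key=lambda s: s.latency):   # latency: assumed
        block = server.fetch(key)                              # fetch(): assumed
        if block is not None:
            return block
    raise KeyError("no reachable replica holds the block")
```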

15
Caching
  • Dhash implements caching to avoid overloading
    servers for popular data.
  • Caching is based on the observation that as the lookup
    proceeds towards the desired key, the distance
    traveled across the key space with each hop decreases.
    This implies that, with high probability, the nodes
    just before the key are involved in a large number of
    lookups for the same block. So when the client fetches
    the block from the successor node, it also caches it
    at the servers that were involved in the lookup.
  • The cache replacement policy is LRU. Blocks cached on
    servers at large distances from the key are evicted
    faster, since not many lookups touch those servers;
    blocks cached on closer servers remain alive in the
    cache as long as they are referenced (see the sketch
    below).
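
A minimal sketch of a per-server LRU block cache and of caching along the
lookup path; the class and method names are illustrative, not the DHash
implementation.

```python
from collections import OrderedDict

class LRUBlockCache:
    """Fixed-size LRU cache a server keeps alongside its permanent blocks."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)       # referenced blocks stay alive
            return self.blocks[key]
        return None

    def put(self, key, block) -> None:
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used

def fetch_and_cache(lookup_path: list, successor, key):
    """Fetch the block from the successor, then cache it at the servers
    that were involved in the lookup."""
    block = successor.fetch(key)               # fetch(): assumed server method
    for server in lookup_path:
        server.cache.put(key, block)           # cache: assumed LRUBlockCache
    return block
```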

16
Implementation
  • Implemented in about 7,000 lines of C++ code,
    including 3,000 lines of Chord.
  • User level programs communicate over UDP with
    RPC primitives provided by the SFS toolkit.
  • Chord library maintains the successor lists and
    the finger tables. For multiple virtual servers
    on the same physical server, the routing tables
    are shared for efficiency.
  • Each Dhash instance is associated with a chord
    virtual server. Has its own implementation of the
    chord lookup protocol to increase efficiency.
  • Client FS implementation exports an ordinary Unix-like
    file system. The client runs on the same
    machine as the server, uses Unix domain sockets
    to communicate with the local server and uses the
    server as a proxy to send queries to non-local
    CFS servers.

17
Experimental results
  • Two sets of tests
  • To test real-world client-perceived performance, the
    first test explores performance on a subset of 12
    machines of the RON testbed.
  • A 1-megabyte file is split into 8K-size blocks.
  • All machines download the file one at a time.
  • The download speed is measured with and without server
    selection.
  • The second test is a controlled test in which a
    number of servers are run on the same physical
    machine and use the local loopback interface for
    communication. In this test, robustness,
    scalability, load balancing etc. of CFS are
    studied.

18-27
(No transcript available for these slides.)
28
Future Research
  • Support keyword search
  • by adopting an existing centralized search engine
    (like Napster), or
  • by using a distributed set of index files stored on
    CFS.
  • Improve security against malicious participants.
  • Malicious participants can form a consistent internal
    ring, route all lookups to nodes internal to that
    ring, and then deny the existence of the data.
  • Content hashes help guard against block
    substitution.
  • Future versions will add periodic routing-table
    consistency checks by randomly selected nodes to try
    to detect malicious participants.
  • Lazy replica copying to reduce the overhead for
    hosts which join the network for a short period
    of time.
  • Adapting the pre-fetch window size dynamically based
    on RTT and network bandwidth, much like TCP.