Scalable Content-Addressable Networks

1
Scalable Content-Addressable Networks
  • Prepared by
  • Kuhan Paramsothy
  • March 5, 2007

2
High-Level Overview
  • Hash tables (map keys to values) are heavily used
    in building software applications
  • The concept of a Content-Addressable Network
    (CAN) provides hash table-like functionality on
    Internet-like scales.
  • CAN is:
  • Scalable
  • Robust/Fault-tolerant
  • Self-organizing
  • Low-latency

3
Hash Tables and CAN
  • A hash table is a data structure that efficiently
    maps keys onto values
  • CANs are a form of distributed, Internet-scale
    hash tables.

4
What CAN would do for us
  • CAN would improve peer-to-peer systems
  • Napster: the process of locating a file is
    centralized
  • Expensive to scale the central repository; single
    point of failure
  • Gnutella: decentralized the file location process
    (network self-organizes into an application-layer
    mesh)
  • Requests for files are made by flooding, which is
    not scalable and may not find content
  • Conclusion: P2P systems need a scalable indexing
    mechanism
  • CAN would improve large data repositories
  • These systems need efficient insertion and
    retrieval
  • CAN would enable large-scale name resolution
    services that don't tie resolution to a naming
    scheme (i.e., unlike DNS)
  • No more location-dependent naming schemes

5
Basic Operations Performed On CANs
  • Basic Operations
  • Insertion of (key, value) pairs
  • Lookup of (key, value) pairs
  • Deletion of (key, value) pairs
  • Each CAN node stores:
  • A piece (called a zone) of the entire hash table
  • Information about a small number of adjacent
    zones in the table
  • Routing in a CAN:
  • Requests for a key are forwarded by intermediate
    CAN nodes toward the CAN node whose zone
    contains that key
  • CAN design is:
  • Distributed (requires no centralized control or
    coordination)
  • Scalable (nodes hold only a small amount of
    information that doesn't grow with the network)
  • Fault-tolerant (nodes can route around failures)
  • Free of any naming hierarchy
  • Entirely application-layer

6
CAN Design
  • Centers around a virtual d-dimensional Cartesian
    coordinate space on a d-torus
  • At any time, the entire coordinate space is
    dynamically partitioned among all the nodes in
    the system
  • Each node owns a distinct zone

7
CAN Design (2)
  • To store a pair (K1, V1), key K1 is mapped to a
    point P in the coordinate space via a uniform
    hash function
  • The pair is then stored at the node that owns the
    zone where P lies
  • To retrieve an entry corresponding to K1, any
    node can apply the same hash function to map K1
    to P and then fetch the corresponding value from
    the node that owns P
  • A node learns and maintains the IP addresses of
    those nodes that hold adjoining coordinate zones
  • Efficient routing is critical to a useful CAN
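
A minimal sketch of the store and retrieve steps above, assuming a 2-dimensional unit square as the coordinate space. The names key_to_point, Zone, owner, put and get are hypothetical, the use of SHA-256 is an assumption (the slides only ask for a uniform hash function), and a linear scan over zones stands in for real CAN routing.

```python
import hashlib

def key_to_point(key: str, d: int = 2) -> tuple:
    """Map a key to a point in [0, 1)^d (illustrative scheme, supports d <= 8)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # 32 bytes
    coords = []
    for i in range(d):
        # Use 4 bytes of the digest per coordinate.
        chunk = int.from_bytes(digest[4 * i: 4 * i + 4], "big")
        coords.append(chunk / 2**32)
    return tuple(coords)

class Zone:
    """Axis-aligned box [lo, hi) owned by one node, plus that node's local store."""
    def __init__(self, lo, hi):
        self.lo, self.hi = tuple(lo), tuple(hi)
        self.store = {}

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

def owner(zones, p):
    """Find the zone containing p; a linear scan replaces CAN routing here."""
    return next(z for z in zones if z.contains(p))

def put(zones, key, value):
    owner(zones, key_to_point(key)).store[key] = value

def get(zones, key):
    return owner(zones, key_to_point(key)).store.get(key)

# Three zones that together cover the unit square.
zones = [
    Zone((0.0, 0.0), (0.5, 1.0)),
    Zone((0.5, 0.0), (1.0, 0.5)),
    Zone((0.5, 0.5), (1.0, 1.0)),
]
put(zones, "song.mp3", "10.0.0.7")
```

Because every node applies the same hash function, any node can compute P for a key without consulting a central index.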

8
Routing in a CAN
  • Routing in a Content-Addressable Network works
    by following the straight-line path through the
    Cartesian space from source to destination
    coordinates.
  • A CAN node maintains a coordinate routing table
    that holds the IP address and virtual coordinate
    zone of each of its immediate neighbors in the
    coordinate space.
  • For a d-dimensional space partitioned into n
    equal zones, the average path length is
    (d/4)(n^{1/d}) hops
  • Individual nodes have 2d neighbors
  • Average path length grows as O(n^{1/d})
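
The path-length figure above can be checked with a small simulation. This is only a sketch under simplifying assumptions: the n nodes own equal zones laid out as a k x ... x k grid on the torus (so n = k^d), each node is identified by integer grid coordinates, and a hop moves one grid step along one dimension. The function names are hypothetical.

```python
from itertools import product

def torus_delta(a, b, k):
    """Signed shortest step count from a to b on a ring of size k."""
    diff = (b - a) % k
    return diff if diff <= k // 2 else diff - k

def greedy_route(src, dst, k):
    """Return the nodes visited from src to dst, one neighbor hop at a time."""
    path = [src]
    cur = list(src)
    while tuple(cur) != dst:
        deltas = [torus_delta(c, t, k) for c, t in zip(cur, dst)]
        # Step along the dimension with the largest remaining offset, which
        # roughly follows the straight line toward the destination.
        axis = max(range(len(cur)), key=lambda i: abs(deltas[i]))
        cur[axis] = (cur[axis] + (1 if deltas[axis] > 0 else -1)) % k
        path.append(tuple(cur))
    return path

def average_hops(k, d):
    """Average hop count from one node to every node (the torus is symmetric)."""
    nodes = list(product(range(k), repeat=d))
    src = nodes[0]
    total = sum(len(greedy_route(src, dst, k)) - 1 for dst in nodes)
    return total / len(nodes)

def formula(k, d):
    """(d/4)(n^{1/d}) with n = k^d, which simplifies to d * k / 4."""
    return d * k / 4
```

For even k the simulated average equals the formula exactly, for example average_hops(8, 2) and formula(8, 2) both give 4.0 for n = 64 nodes.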

9
Construction of a CAN Overlay
  • The entire CAN space is divided amongst the nodes
    currently in the system
  • Incremental construction process takes three
    steps
  • The new node finds a node already in the CAN
  • Using the CAN routing mechanisms, it finds a node
    whose zone will be split
  • The neighbors of the split zone must be notified
    so that routing can include the new node
  • Bootstrapping: CAN bootstrap nodes are associated
    with a DNS domain name
  • Node insertion affects only O(d) existing nodes,
    where d is the number of dimensions
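
The join steps above might look like the following sketch. Several details are assumptions not stated in the slides: the new node picks a random point to decide which zone gets split, the zone is halved along its longest side, and a linear scan replaces CAN routing. Neighbor notification is not modeled. Zone, split and join are hypothetical names.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Zone:
    lo: tuple
    hi: tuple
    store: dict = field(default_factory=dict)  # key -> (point, value)

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

def split(zone):
    """Halve `zone` along its longest side and return the new half.

    Pairs whose point falls in the new half are handed over to it.
    """
    sides = [h - l for l, h in zip(zone.lo, zone.hi)]
    axis = sides.index(max(sides))
    mid = (zone.lo[axis] + zone.hi[axis]) / 2
    new_lo = tuple(mid if i == axis else l for i, l in enumerate(zone.lo))
    new_zone = Zone(new_lo, zone.hi)
    zone.hi = tuple(mid if i == axis else h for i, h in enumerate(zone.hi))
    moved = [k for k, (p, _) in zone.store.items() if new_zone.contains(p)]
    for key in moved:
        new_zone.store[key] = zone.store.pop(key)
    return new_zone

def join(zones, rng):
    """Add one node: pick a random point, split the zone that contains it."""
    p = tuple(rng.random() for _ in zones[0].lo)
    occupant = next(z for z in zones if z.contains(p))  # stands in for routing
    zones.append(split(occupant))

zones = [Zone((0.0, 0.0), (1.0, 1.0))]  # the first node owns the whole space
rng = random.Random(0)
for _ in range(7):
    join(zones, rng)
```

Only the split zone and its new half change in this sketch, which matches the point that a join touches a small, network-size-independent set of nodes.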

10
Maintenance of a CAN Overlay
  • Graceful departure: the node explicitly hands
    over its zone and the associated (key, value)
    database to one of its neighbors
  • Abrupt disappearance: an immediate takeover
    algorithm ensures one of the failed node's
    neighbors takes over the zone
  • Under normal conditions, a node sends periodic
    update messages to each of its neighbors, giving
    its zone coordinates and a list of its neighbors
    and their zone coordinates.
  • Prolonged absence of an update message from a
    neighbor signals its failure
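
A sketch of the failure-detection idea above: each node records when it last heard from each neighbor and treats a long silence as failure. The class and function names are hypothetical, time is passed in explicitly to keep the example deterministic, and choosing the live neighbor with the smallest zone as the one that takes over is an assumption here (the slides only say that one neighbor takes over).

```python
class NeighborTable:
    """Tracks the last update heard from each neighbor."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence treated as failure
        self.last_seen = {}      # neighbor id -> time of last update
        self.zones = {}          # neighbor id -> (lo, hi) zone corners

    def on_update(self, neighbor, zone, now):
        self.last_seen[neighbor] = now
        self.zones[neighbor] = zone

    def failed(self, now):
        """Neighbors whose last update is older than the timeout."""
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]


def volume(zone):
    lo, hi = zone
    v = 1.0
    for l, h in zip(lo, hi):
        v *= h - l
    return v


def pick_takeover(candidates):
    """candidates: live neighbor id -> zone. Assumption: smallest zone wins."""
    return min(candidates, key=lambda n: volume(candidates[n]))


table = NeighborTable(timeout=3.0)
table.on_update("B", ((0.5, 0.0), (1.0, 0.5)), now=0.0)
table.on_update("C", ((0.0, 0.5), (0.5, 1.0)), now=0.0)
table.on_update("C", ((0.0, 0.5), (0.5, 1.0)), now=4.0)  # B has gone silent
```

In this run, asking the table at time 5.0 reports only B as failed, since C was heard from one second earlier.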

11
Design Improvements
  • Basic CAN algorithm provides:
  • Low per-node state (O(d) for a d-dimensional
    space)
  • Short path lengths (O(d n^{1/d}) hops for d
    dimensions and n nodes)
  • The problem is that these are application-layer
    hops, not IP-layer hops
  • Latency of each hop might be substantial

12
Design Improvements (2)
  • Improvement: Multi-dimensional Coordinate Spaces
  • Increasing the dimensions of the CAN coordinate
    space reduces the routing path length and path
    latency for a small increase in the size of the
    coordinate routing table
  • Path length scales as O(d n^{1/d})
  • Fault-tolerance improves
  • Improvement: Multiple Coordinate Spaces (a.k.a.
    Multiple Realities)
  • Maintain multiple independent coordinate spaces
    with each node in the system being assigned a
    different zone in each coordinate space (each
    coordinate space is a reality)
  • Fault-tolerance improves
  • Low per-node state (O(d) for a d-dimensional
    space)
  • Short path lengths (O(d n^{1/d}) hops for d
    dimensions and n nodes)
  • Which is better?
  • Increasing the dimensions
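
To see the trade-off for the first improvement in numbers, the sketch below tabulates the two formulas from the slides, 2d neighbors per node and an average path of (d/4)(n^{1/d}) hops, for a fixed network size. It covers only the effect of adding dimensions; multiple realities are not modeled, and the function names and the choice of n = 4096 are illustrative.

```python
def avg_path_length(n, d):
    """Average path length in hops for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)

def neighbors(d):
    """Per-node routing state: two neighbors per dimension."""
    return 2 * d

def tradeoff_table(n, dims):
    """Rows of (d, neighbors, average hops) for each d in dims."""
    return [(d, neighbors(d), avg_path_length(n, d)) for d in dims]

for d, state, hops in tradeoff_table(4096, (2, 3, 4, 6, 8)):
    print(f"d={d}  neighbors={state}  avg hops={hops:.2f}")
```

Going from d = 2 to d = 4 with n = 4096 doubles the neighbor count from 4 to 8 while cutting the average path from about 32 hops to about 8.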

13
Design Improvements (3)
  • Improvement: Better CAN Routing Metrics
  • Have each node measure the network-level
    round-trip time (RTT) to each of its neighbors,
    then route messages accordingly.
  • Favors lower-latency paths and avoids
    unnecessarily long hops
  • Improvement: Caching and Replication
  • A CAN node can maintain a cache of the data keys
    it recently accessed
  • A CAN node can replicate the data key at each of
    its neighboring nodes
  • Both schemes need an associated time-to-live
    field so that entries eventually expire
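
Both improvements above can be sketched briefly. For routing, one possible reading of "route messages accordingly" is to forward to the neighbor that gives the most progress toward the destination per unit of RTT; that scoring rule is an assumption in this sketch. The cache is a plain time-to-live map. All names are hypothetical, and time is passed in explicitly.

```python
import math

def torus_dist(a, b):
    """Euclidean distance between two points on the unit torus [0, 1)^d."""
    return math.sqrt(sum(min(abs(x - y), 1 - abs(x - y)) ** 2
                         for x, y in zip(a, b)))

def pick_next_hop(here, dest, neighbors):
    """neighbors: list of (name, point, rtt_seconds).

    Returns the neighbor with the best progress-to-RTT ratio, or None if
    no neighbor is closer to dest than the current node.
    """
    best, best_score = None, 0.0
    d_here = torus_dist(here, dest)
    for name, point, rtt in neighbors:
        progress = d_here - torus_dist(point, dest)
        if progress > 0 and progress / rtt > best_score:
            best, best_score = name, progress / rtt
    return best

class TTLCache:
    """Cache whose entries expire a fixed time after insertion."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.items = {}  # key -> (value, expiry time)

    def put(self, key, value, now):
        self.items[key] = (value, now + self.ttl)

    def get(self, key, now):
        entry = self.items.get(key)
        if entry is None or now >= entry[1]:
            self.items.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]

neighbors = [
    ("A", (0.3, 0.1), 0.20),   # more progress, but slow link
    ("B", (0.2, 0.1), 0.05),   # less progress, much faster link
    ("C", (0.0, 0.1), 0.01),   # moves away from the destination
]
choice = pick_next_hop((0.1, 0.1), (0.5, 0.1), neighbors)
```

Here B is chosen: it covers half the distance that A does but at a quarter of the RTT, so its progress per second is higher.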

14
Related Systems
  • Domain Name System
  • CANs are more general than the DNS because DNS
    closely ties the naming scheme to the manner in
    which a name is resolved to an IP address
  • Peer-to-Peer
  • A simple example: keys are analogous to URLs
  • CAN will improve robustness
  • Key difference is that content within the CAN can
    always be located by any other node because there
    is a clear home (point) in the CAN for that
    content and every other node knows what the home
    is and how to reach it

15
Discussion
  • Security?
  • Better or worse with CAN?
  • Any Other Design Improvement?
  • Is The Communication Overhead Significant?

16
References
  • A Scalable Content-Addressable Network,
    Ratnasamy, University of California Berkeley,
    http://www.sigcomm.org/sigcomm2001/p13-ratnasamy.pdf