A Scalable Content-Addressable Network

Slides: 34
Provided by: pirammanay
Learn more at: http://cecs.wright.edu

Transcript and Presenter's Notes

1
A Scalable Content-Addressable Network
  • Sylvia Ratnasamy, Paul Francis, Mark Handley,
    Richard Karp, Scott Shenker

Pirammanayagam Manickavasagam
2
Overview
  • Introduction
  • Design
  • Design Improvements
  • Design Review
  • Related works
  • Discussion

3
Introduction
  • Hash table functionality
  • Maps a key to a value.
  • Content-Addressable Network (CAN)
  • A distributed infrastructure that provides
    hash-table-like functionality at Internet-like
    scale.
  • Characteristics
  • Scalable, fault-tolerant and completely
    self-organizing.

4
Introduction (cont..)
  • Napster
  • File location is centralized.
  • Gnutella
  • Floods each file request across the network, so
    it is not scalable.
  • CAN provides a solution
  • Scalable - each node maintains only a small
    amount of control state.
  • Distributed - the hash table is spread across all
    peers, so there is no central index.

5
Design
  • Each node stores a chunk (zone) of the hash table
    and information about adjacent zones.
  • Requests for a key are forwarded towards the CAN
    node whose zone contains the key.
  • Indexing uses a virtual d-dimensional Cartesian
    coordinate space.
  • The coordinates are purely logical.

6
Coordinate Space
Each node randomly picks a coordinate. The coordinate
space is dynamically partitioned, and each node owns
its individual zone.
[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1), partitioned into zones A, B, C and D]
7
Design (cont..)
  • Inserting a pair (key K1, value V1)
  • Use a hash function to map K1 to a point P1 in
    the coordinate space.
  • The pair is then stored at the node that owns
    the zone containing P1.
  • Retrieving a value
  • Apply the same hash function to the key to find
    P1, then ask the node that owns P1 for the value.
  • Each node learns and maintains a table of its
    adjacent nodes.
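
The insert and retrieve steps above can be sketched as follows. This is only an illustrative sketch: the names `key_to_point`, `Zone`, `owner`, `put` and `get` are hypothetical, SHA-256 is an assumed choice of hash function, and the zones live in one process instead of on separate nodes.

```python
import hashlib


def key_to_point(key, d=2):
    """Deterministically map a key to a point in the unit d-cube [0, 1)^d."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Assumption: 4 digest bytes per dimension, so d must be at most 8.
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32
        for i in range(d)
    )


class Zone:
    """A rectangular zone [lo, hi) plus the (key, value) pairs it stores."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.store = {}

    def contains(self, point):
        return all(l <= x < h for l, x, h in zip(self.lo, point, self.hi))


def owner(zones, point):
    """Return the zone whose region contains the point."""
    return next(z for z in zones if z.contains(point))


def put(zones, key, value):
    owner(zones, key_to_point(key)).store[key] = value


def get(zones, key):
    return owner(zones, key_to_point(key)).store.get(key)


# Four equal zones covering the unit square.
quadrants = [
    Zone((0.0, 0.0), (0.5, 0.5)), Zone((0.5, 0.0), (1.0, 0.5)),
    Zone((0.0, 0.5), (0.5, 1.0)), Zone((0.5, 0.5), (1.0, 1.0)),
]
put(quadrants, "song.mp3", "10.0.0.7")
```

Because insertion and lookup hash the key the same way, both land in the same zone without any central directory.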

8
Routing
  • Information needed for routing
  • Each CAN node holds a routing table containing
    the IP address and virtual coordinate zone of
    each of its neighbors.
  • Two nodes are neighbors if their coordinate
    spans overlap along d-1 dimensions and abut
    along one dimension.
  • In a d-dimensional coordinate space, each node
    maintains 2d neighbors.
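
A minimal sketch of the neighbor test described above, assuming each zone is given as a `(lo, hi)` pair of coordinate tuples. The function name `are_neighbors` is hypothetical, and wrap-around at the edges of the coordinate space is ignored to keep the example short.

```python
def are_neighbors(a, b):
    """True if zones a and b abut along exactly one dimension and
    their spans overlap along all the other dimensions."""
    (alo, ahi), (blo, bhi) = a, b
    abutting = 0
    overlapping = 0
    for i in range(len(alo)):
        if ahi[i] == blo[i] or bhi[i] == alo[i]:
            abutting += 1       # the zones touch end to end here
        elif alo[i] < bhi[i] and blo[i] < ahi[i]:
            overlapping += 1    # the spans share an interval here
    return abutting == 1 and overlapping == len(alo) - 1


bottom_left = ((0.0, 0.0), (0.5, 0.5))
bottom_right = ((0.5, 0.0), (1.0, 0.5))
top_right = ((0.5, 0.5), (1.0, 1.0))
```

Here `bottom_left` and `bottom_right` are neighbors, while `bottom_left` and `top_right` only touch at a corner and are not.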

9
In the figure, nodes 5 and 1 are neighbors: 5 spans the
same Y coordinates as 1, and its X coordinates abut 1's.
10
Routing (Cont..)
  • Each CAN message carries its destination
    coordinates.
  • Routing proceeds by simple greedy forwarding to
    the neighbor whose coordinates are closest to
    the destination.
  • Average path length is (d/4) n^(1/d) hops
    (n = number of zones).
  • Because many paths exist between two points,
    routing still works if some nodes fail.
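
Greedy forwarding can be illustrated on a toy CAN. The sketch below assumes a hypothetical test layout of k^d equal zones on a unit torus, with each node named by its grid cell. `make_grid`, `torus_distance` and `greedy_route` are illustrative names, not part of any CAN implementation.

```python
import itertools
import math


def torus_distance(p, q):
    """Euclidean distance on the unit d-torus (coordinates wrap at 1)."""
    return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2
                         for a, b in zip(p, q)))


def make_grid(k, d):
    """Build zone centers and neighbor lists for k**d equal zones."""
    nodes = list(itertools.product(range(k), repeat=d))
    centers = {n: tuple((c + 0.5) / k for c in n) for n in nodes}
    neighbors = {}
    for n in nodes:
        adjacent = set()
        for axis in range(d):
            for step in (-1, 1):
                m = list(n)
                m[axis] = (m[axis] + step) % k
                adjacent.add(tuple(m))
        adjacent.discard(n)
        neighbors[n] = sorted(adjacent)
    return centers, neighbors


def greedy_route(start, dest, k, centers, neighbors):
    """Forward hop by hop to the neighbor closest to dest; return the path."""
    owner = tuple(min(int(x * k), k - 1) for x in dest)  # cell holding dest
    path = [start]
    while path[-1] != owner:
        here = path[-1]
        nxt = min(neighbors[here],
                  key=lambda n: torus_distance(centers[n], dest))
        if torus_distance(centers[nxt], dest) >= torus_distance(centers[here], dest):
            raise RuntimeError("greedy forwarding made no progress")
        path.append(nxt)
    return path


centers, neighbors = make_grid(k=4, d=2)
path = greedy_route((0, 0), (0.6, 0.6), 4, centers, neighbors)
```

Each node only consults its own neighbor list, which is why per-node routing state stays small no matter how many zones exist.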

11
Construction
  • 1. First the new node must find a node already in
    the CAN.
  • 2. Next, using the CAN routing mechanisms, it
    must find a node whose zone will be split.
  • 3. Finally, the neighbors of the split zone must
    be notified so that routing can include the new
    node.

12
Bootstrap
  • One or more bootstrap nodes are found from a DNS
    domain name.
  • A bootstrap node maintains a partial list of CAN
    nodes it believes are currently in the system.
  • To join a CAN, a new node looks up the CAN domain
    name in DNS to retrieve a bootstrap node's IP
    address.
  • The bootstrap node then supplies the IP addresses
    of several randomly chosen nodes currently in
    the system.

13
Finding a zone
  • The new node randomly chooses a point P in the
    space.
  • It sends a JOIN request destined for P.
  • The request enters the CAN via any existing CAN
    node.
  • The current occupant of P's zone then splits its
    zone in half and assigns one half to the new node.
  • Splits follow a fixed ordering of the dimensions.
  • E.g., in 2-d, a zone is split first along X and
    then along Y.
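
The split step can be sketched as below. This is a hedged illustration: the `Zone` dataclass, its `depth` field (the number of splits that produced the zone) and the `split` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    lo: tuple
    hi: tuple
    depth: int = 0  # how many splits produced this zone


def split(zone):
    """Halve a zone along the next dimension in cyclic order (X, Y, ..., X)."""
    d = len(zone.lo)
    axis = zone.depth % d
    mid = (zone.lo[axis] + zone.hi[axis]) / 2
    lower_hi = zone.hi[:axis] + (mid,) + zone.hi[axis + 1:]
    upper_lo = zone.lo[:axis] + (mid,) + zone.lo[axis + 1:]
    return (Zone(zone.lo, lower_hi, zone.depth + 1),
            Zone(upper_lo, zone.hi, zone.depth + 1))


whole = Zone((0.0, 0.0), (1.0, 1.0))
left, right = split(whole)      # first split is along X
bottom, top = split(left)       # the next split of a half is along Y
```

Keeping a fixed split order means zones can later be re-merged by undoing splits in the reverse order.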

14
Maintenance
  • Departure of a Node
  • Single Node Failure
  • Multiple Failure

15
Departure of a Node
  • A departing node hands over its zone and stored
    (key, value) data to one of its neighbors.
  • If the zone of one of the neighbors can be merged
    with the departing node's zone to produce a valid
    single zone, then this is done.
  • If not, the zone is handed to the neighbor
    whose current zone is smallest, and that node
    temporarily handles both zones.
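
The handover decision can be sketched as below. The names `volume`, `try_merge` and `hand_over` are hypothetical, and zones are `(lo, hi)` pairs of coordinate tuples. The merge test here is purely geometric (the union must be a single box), which is a simplifying assumption about what counts as a "valid single zone".

```python
import math


def volume(zone):
    lo, hi = zone
    return math.prod(h - l for l, h in zip(lo, hi))


def try_merge(a, b):
    """Return the merged zone if a and b form one box, else None."""
    (alo, ahi), (blo, bhi) = a, b
    differing = [i for i in range(len(alo))
                 if (alo[i], ahi[i]) != (blo[i], bhi[i])]
    if len(differing) != 1:
        return None
    i = differing[0]
    if ahi[i] != blo[i] and bhi[i] != alo[i]:
        return None  # spans differ but do not abut
    lo = tuple(min(x, y) for x, y in zip(alo, blo))
    hi = tuple(max(x, y) for x, y in zip(ahi, bhi))
    return (lo, hi)


def hand_over(departing, neighbors):
    """neighbors maps name -> zone. Returns (name, merged zone or None);
    None means the chosen neighbor temporarily handles both zones."""
    for name, zone in neighbors.items():
        merged = try_merge(departing, zone)
        if merged is not None:
            return name, merged
    smallest = min(neighbors, key=lambda name: volume(neighbors[name]))
    return smallest, None


leaving = ((0.5, 0.0), (1.0, 0.5))
mergeable = {"E": ((0.0, 0.0), (0.5, 0.5)), "C": ((0.5, 0.5), (1.0, 1.0))}
unmergeable = {"A": ((0.0, 0.0), (0.5, 1.0)), "C": ((0.5, 0.5), (0.75, 1.0))}
```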

16
Departure of a Node
When node F leaves, E's zone is merged with F's.
[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1), showing zones A, B, C, D, E and F before the merge]
17
Failures
  • A prolonged absence of update messages from a
    neighbor indicates that the node has failed.
  • Each neighbor of the failed node then starts a
    takeover timer.
  • When its timer expires, a node sends a TAKEOVER
    message conveying its own zone volume to all of
    the failed node's neighbors.
  • A node receiving a TAKEOVER message cancels its
    own timer if the zone volume in the message is
    smaller than its own zone volume.
  • Otherwise it replies with its own TAKEOVER
    message.
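
The TAKEOVER rule above amounts to electing the neighbor with the smallest zone volume. A minimal sketch of that comparison follows. The function names and the string return values are hypothetical. In the CAN paper each timer is set in proportion to the node's own zone volume, which is why the smallest zone usually fires first.

```python
def on_takeover_message(my_volume, msg_volume):
    """Reaction of a neighbor whose takeover timer is still running."""
    if msg_volume < my_volume:
        return "cancel_timer"       # the sender is the better candidate
    return "send_own_takeover"      # this node is the better candidate


def takeover_winner(volumes):
    """volumes maps neighbor name -> zone volume; smallest volume wins."""
    return min(volumes, key=volumes.get)


neighbor_volumes = {"A": 0.25, "B": 0.125, "C": 0.5}
```

With these volumes, B's timer would expire first, and A and C would cancel their timers on receiving B's message.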

18
Multiple Failure
  • A node first performs an expanding ring search
    for nodes residing beyond the failed region.
  • It then rebuilds enough neighbor state to carry
    out a takeover safely.

19
Design Improvements
  • Multi-dimensional coordinate spaces
  • Increasing the dimensions of the CAN coordinate
    space reduces the routing path length, and hence
    the path latency.
  • More dimensions -> more neighbors -> more
    possible next hops -> better routing fault
    tolerance
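
To make the trade-off concrete, the sketch below evaluates the average path length formula (d/4) n^(1/d) and the 2d neighbor count for a few dimensions. The function names are hypothetical, and the formula assumes equally sized zones.

```python
def average_path_length(n, d):
    """Average hops for n equal zones in d dimensions: (d/4) * n^(1/d)."""
    return (d / 4) * n ** (1 / d)


def neighbors_per_node(d):
    """Each node keeps two neighbors per dimension."""
    return 2 * d


n = 2 ** 18
for d in (2, 3, 6, 9):
    print(d, neighbors_per_node(d), round(average_path_length(n, d), 1))
```

For n = 2^18 zones, going from d = 2 to d = 6 cuts the average path from 256 hops to 12 while the per-node neighbor count only grows from 4 to 12.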

21
Design Improvements
  • Realities: multiple coordinate spaces
  • The system maintains multiple, independent
    coordinate spaces, with each node assigned a
    different zone in each one. Each such coordinate
    space is a reality.
  • A given point can be looked up in any reality,
    since the hash table contents are replicated in
    each.
  • This reduces the average path length.
  • Multiple dimensions vs. multiple realities
  • Multiple realities give greater fault tolerance
    and data availability than multiple dimensions.

22
Design Improvements
  • Overloading coordinate zones
  • Allow multiple nodes to share the same zone.
    Nodes that share the same zone are termed peers.
  • MAXPEERS is the maximum number of allowable
    peers per zone.
  • Reduced path length (number of hops), and hence
    reduced path latency
  • Improved fault tolerance
  • Multiple hash functions
  • Similar in effect to multiple realities.

23
Design Improvements
  • Topologically-sensitive construction of the CAN
    overlay network
  • Each CAN node measures its round-trip time to
    each of a set of landmarks and orders the
    landmarks by increasing RTT.
  • With m landmarks, m! such orderings are possible.
  • The coordinate space is divided into m! portions,
    each assigned to one landmark ordering.
  • A new node joins the CAN at a random point in
    that portion of the coordinate space associated
    with its landmark ordering.
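
A minimal sketch of the landmark ordering step, assuming RTT measurements are already available as a list of milliseconds per landmark. The names `landmark_ordering` and `portion_index` are hypothetical, and numbering the m! portions in lexicographic order is an assumption made only for illustration.

```python
import itertools
import math


def landmark_ordering(rtts_ms):
    """Return landmark indices sorted by increasing round-trip time."""
    return tuple(sorted(range(len(rtts_ms)), key=lambda i: rtts_ms[i]))


def portion_index(ordering):
    """Map an ordering of m landmarks to one of the m! portions."""
    m = len(ordering)
    all_orderings = list(itertools.permutations(range(m)))
    return all_orderings.index(tuple(ordering))


rtts = [30.0, 5.0, 12.0]             # hypothetical RTTs to 3 landmarks
ordering = landmark_ordering(rtts)   # landmark 1 is closest, then 2, then 0
portion = portion_index(ordering)
```

Nodes with the same ordering are likely to be close to each other in the underlying network, so placing them in the same portion makes CAN neighbors tend to be network neighbors as well.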

24
Design Improvements
  • More uniform partitioning
  • Before splitting, the occupant node compares the
    volume of its zone with those of its immediate
    neighbors in the coordinate space.
  • The zone with the largest volume is split.
  • Without the uniform partitioning feature, a
    little over 40% of the nodes are assigned to
    zones with volume V, as compared to almost 90%
    with this feature, and the largest zone volume
    drops from 8V to 2V.
  • Not surprisingly, the partitioning of the space
    further improves with increasing dimensions.
  • Caching and replication techniques
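
The volume comparison used for more uniform partitioning can be sketched as below. The names `volume` and `choose_zone_to_split` are hypothetical. Zones are `(lo, hi)` pairs of coordinate tuples, and ties are resolved in favor of the original occupant, which is an assumption of this sketch.

```python
import math


def volume(zone):
    lo, hi = zone
    return math.prod(h - l for l, h in zip(lo, hi))


def choose_zone_to_split(occupant, neighbor_names, zones):
    """Pick the largest zone among the occupant and its neighbors."""
    candidates = [occupant] + list(neighbor_names)
    # max() keeps the first maximum, so the occupant wins ties.
    return max(candidates, key=lambda name: volume(zones[name]))


zones = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 1.0)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
}
```

Here a JOIN that lands in A's zone would be redirected to B, whose zone is twice as large.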

26
Design Review
  • The following metrics were used to evaluate
    system performance:
  • Path length: the number of (application-level)
    hops required to route between two points in the
    coordinate space.
  • Neighbor state: the number of CAN nodes for which
    an individual node must retain state.
  • Latency: both the end-to-end latency of the total
    routing path between two points in the coordinate
    space and the per-hop latency, i.e., the latency
    of individual application-level hops, obtained by
    dividing the end-to-end latency by the path
    length.
  • Volume: the volume of the zone to which a node is
    assigned, which is indicative of the request and
    storage load a node must handle.
  • Routing fault tolerance: the availability of
    multiple paths between two points in the CAN.
  • Hash table availability: adequate replication of
    a (key, value) entry to withstand the loss of one
    or more replicas.

27
Design Review
  • The key design parameters affecting system
    performance are:
  • dimensionality of the virtual coordinate space, d
  • number of realities, r
  • number of peer nodes per zone, p
  • number of hash functions (i.e., number of points
    per reality at which a (key, value) pair is
    stored), k
  • use of the RTT-weighted routing metric
  • use of uniform partitioning
  • Test system specification
  • A system size of n = 2^18 nodes on a Transit-Stub
    topology with a delay of 100 ms on intra-transit
    links, 10 ms on stub-transit links and 1 ms on
    intra-stub links (i.e., 100 ms on links that
    connect two transit nodes, 10 ms on links that
    connect a transit node to a stub node, and so
    forth).
  • Transit-stub models explicitly group vertices
    into domains, and reflect that grouping in the
    connectivity between vertices.

28
100 node transit-stub topology
29
Bare bones: a CAN that does not utilize most of
our additional design features.
Knobs-on-full: a CAN making full use of our added
features (without the landmark ordering feature).
30
Related Work
  • Related algorithms
  • Distance vector and link state algorithms
  • These need widespread topological information.
  • CAN, on the other hand, requires each node to
    store only a small amount of state.
  • Plaxton algorithm
  • Each node has an n-bit label divided into l
    levels.
  • Each level has width w = n/l.
  • Each node forwards a packet to a neighbor whose
    label matches the destination label in more
    digits.

31
Related Work
  • Algorithms with geographic routing
  • "Space" in these algorithms refers to physical
    space.
  • No neighbor search problem.
  • Correctly mimicking the space is a trivial
    problem.
  • Not extensible to multiple dimensions.

32
Related System
  • Domain Name System
  • Stores (domain name, IP address) pairs.
  • OceanStore
  • Provides continuous access to persistent
    information.
  • Uses Plaxton's algorithm.
  • Peer-to-peer file sharing systems
  • Freenet
  • Stores keys (analogous to URLs), addresses of
    other nodes, and the data corresponding to each
    key.

33
Discussion
  • Addresses two key problems in the design of
    Content-Addressable Networks: scalable routing
    and indexing.
  • Simulation results validate the scalability of
    our overall design: for a CAN with over 260,000
    nodes, we can route with a latency that is less
    than twice the IP path latency.
  • Future work
  • Secure CAN
  • Keyword searching