Plethora: A Locality Enhancing Peer-to-Peer Network

Transcript and Presenter's Notes

1
Plethora: A Locality Enhancing Peer-to-Peer Network
  • Ronaldo Alves Ferreira
  • Advisor: Ananth Grama
  • Co-advisor: Suresh Jagannathan
  • Department of Computer Sciences, Purdue University
  • July 2003

2
Outline
  • Introduction
  • Motivation
  • IP Addresses as Virtual IDs
  • Autonomous Systems as Basis for Locality
  • Plethora: Organization and Algorithms
  • Simulation Results
  • Conclusions
  • Ongoing Work

3
Introduction
  • Peer-to-Peer (P2P) networks are self-organizing
    distributed systems in which participating nodes
    both provide services to and receive services from
    each other, cooperatively and without distinguished
    roles as pure clients or pure servers.
  • P2P Internet applications have recently been
    popularized by file sharing applications like
    Napster and Gnutella.
  • P2P systems have many interesting technical
    aspects such as decentralized control,
    self-organization, adaptation and scalability.
  • One of the key problems in large-scale P2P
    applications is to provide efficient algorithms
    for object location and routing within the
    network.

4
Location and Routing
  • Central server (Napster)
  • Controlled flooding (Gnutella)
  • Sequential version of flooding (Freenet)
  • Structured solution: DHT (Chord, Pastry,
    Tapestry, CAN)

5
Location and Routing - DHT
  • All known proposals take as input a key and, in
    response, route a message to the node responsible
    for that key.
  • The keys are strings of digits of some length
    (generally 128 bits).
  • Nodes have identifiers taken from the same space
    as the keys (same number of digits).
  • Each node maintains a routing table consisting of
    a small subset of nodes in the system.
  • Nodes route queries to neighbor nodes that make
    the most progress towards resolving the query (a
    sketch of the key-to-node mapping follows this
    list).
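As a minimal sketch, the contract above can be written in a few lines of Python. Hashing into the ID space and the "numerically closest" rule come from the slides; truncating SHA-1's output to 128 bits and the helper names are illustrative assumptions:

    import hashlib

    ID_BITS = 128  # keys and node IDs share the same 128-bit space

    def to_id(data: bytes) -> int:
        # Hash an object key or a node address into the ID space
        # (SHA-1 truncated to ID_BITS; the truncation is illustrative).
        return int.from_bytes(hashlib.sha1(data).digest(), "big") % (1 << ID_BITS)

    def responsible_node(key: int, live_node_ids: list) -> int:
        # The node responsible for a key: the live node whose ID is
        # numerically closest to the key.
        return min(live_node_ids, key=lambda n: abs(n - key))

    nodes = [to_id(b"10.0.0.1"), to_id(b"10.0.0.2"), to_id(b"10.0.0.3")]
    print(hex(responsible_node(to_id(b"some-object"), nodes)))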

6
Location and Routing - DHT
  • The notion of progress differs from algorithm to
    algorithm.
  • Plaxton developed the first ideas that could be
    applied in a scalable manner.
  • While intended for a static node population,
    Plaxton's algorithm provides efficient routing of
    queries.
  • The algorithm works by correcting a single digit
    at a time (a sketch follows this list).
  • Chord, Pastry, and Tapestry are variants of
    Plaxton's algorithm.
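A sketch of one digit-correcting hop, assuming the usual Plaxton/Pastry table layout (row = length of the shared prefix, column = next digit); the base-4, 4-digit IDs match the example on the next slide:

    BASE, NDIGITS = 4, 4  # base-4 digits, 4-digit IDs (e.g. 0112, 2001)

    def digit(x: int, i: int) -> int:
        # i-th most significant base-BASE digit of x
        return (x // BASE ** (NDIGITS - 1 - i)) % BASE

    def shared_prefix_len(a: int, b: int) -> int:
        i = 0
        while i < NDIGITS and digit(a, i) == digit(b, i):
            i += 1
        return i

    def next_hop(my_id: int, key: int, table):
        # table[i][d] is assumed to hold a node whose ID shares its
        # first i digits with my_id and has digit d in position i.
        # Forwarding there corrects one more digit of the key.
        i = shared_prefix_len(my_id, key)
        return table[i][digit(key, i)]

Since every hop extends the matched prefix by one digit, a query resolves in at most NDIGITS hops, which is O(log N) in the number of nodes.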

7
Location and Routing - DHT
(Figure: prefix routing in a 4-digit, base-4 ID space divided into
the regions 0XXX, 1XXX, 2XXX, and 3XXX. Node 0112 routes a message
to key 2000: the first hop, to node 2321, fixes the first digit (2);
the second hop, to node 2032, fixes the second digit (20); the
message ends at node 2001, the closest live node to 2000.)
8
Location and Routing - DHT
(Figure: a 16-node ID space, nodes 0 through 15 labeled with their
4-bit binary identifiers.)
9
Location and Routing - DHT
Node 0 Routing Table
(Figure: the same 16-node ID space, highlighting node 0's routing
table, with one entry per corrected bit (e.g. nodes 12, 5, 3), and
its leaf set: 13 14 15 1 2 3.)
10
Location and Routing - DHT
Node 0 Routing Table (base-4 view, 2-digit IDs)

        digit:  0    1    2    3
  row 0:        --   6    10   12
  row 1:        --   1    2    3

(Figure: the same 16-node ID space, showing how these entries cover
the four quadrants of the ID space around node 0.)
11
Location and Routing - Pastry
  • Computers (nodes) have unique IDs
  • Typically 128 bits long
  • Assignment should lead to a uniform distribution
    in the node ID space, for example the SHA-1 hash
    of the node's IP address
  • Primitive: route(msg, key)
  • Delivers msg to the currently alive node with the
    ID numerically closest to key
  • Node state:
  • Routing table
  • Neighborhood set
  • Leaf set
  • Scalable, efficient:
  • O(log(N)) routing table entries per node
  • Routes in O(log(N)) hops (a sketch follows this
    list)
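A sketch of that node state and the route primitive in Python; the leaf-set-only forwarding below is a simplification of Pastry's full algorithm, not the paper's exact code:

    from dataclasses import dataclass, field

    @dataclass
    class PastryNode:
        node_id: int                    # e.g. SHA-1 hash of the node's IP
        routing_table: list = field(default_factory=list)  # O(log N) rows
        leaf_set: list = field(default_factory=list)       # numerically closest IDs
        neighborhood_set: list = field(default_factory=list)  # physically close nodes

        def route(self, msg, key: int):
            # Deliver if we are the numerically closest node we know
            # of, otherwise forward one hop toward the key.
            closest = min([self.node_id] + self.leaf_set,
                          key=lambda n: abs(n - key))
            if closest == self.node_id:
                return ("deliver", msg)
            return ("forward", closest)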

12
DHT Performance Issues
  • Virtualization destroys locality.
  • Messages may have to travel around the world to
    reach a node in the same LAN.
  • Query responses do not contain locality
    information.
  • Heuristics to minimize the problem:
  • Proximity routing
  • Topology-based node ID assignment
  • Proximity neighbor selection

13
Motivation
  • Virtualization destroys locality.
  • Query responses do not contain locality
    information.
  • Recent studies show that queries for multiple
    keys in P2P networks follow a Zipf-like
    distribution.
  • For many wide-area distributed applications, nodes
    in the same region share common interests; music
    sharing applications are one example.

14
IP Addresses as Virtual IDs
  • A natural way of building locality into an overlay
    network is to exploit the addressing scheme of the
    underlying network.
  • In most cases, nodes with IP addresses that are
    numerically close are also physically close.
  • The Internet is organized into autonomous systems
    (ASs); if routing corrects only a few bits per
    hop, the last hops of a query would stay inside a
    single AS.

15
IP Addresses as Virtual IDs
  • IP space is not uniformly populated by peers.
  • Load imbalance at the peers.
  • The O(log n) upper bound on the number of hops can
    no longer be guaranteed.

16
IP Addresses as Virtual IDs
  • How severe would the load imbalance be if we used
    a node's IP address as its overlay identifier?
  • Is it possible to find a boundary in the IP
    address such that the distribution of peers is
    uniform and some form of locality is still
    captured?
  • Experimental basis:
  • Gnutella traces from June 2002 with 56M messages.
  • 62,000 distinct IP addresses, validated using a
    whois server and ping.

17
IP Addresses as Virtual IDs
18
IP Addresses as Virtual IDs
  • 2,420 nodes. 20 keys per node.

19
IP Addresses as Virtual IDs
20
IP Addresses as Virtual IDs
21
IP Addresses as Virtual IDs
  • The average CIDR prefix length for the addresses
    is over 19 bits.
  • A negative result.
  • It nevertheless provides the insight for a
    two-level overlay architecture:
  • One global overlay and several local overlays.
  • A local overlay is formed with nodes that share
    the first 8 bits of their IP addresses (a grouping
    sketch follows this list).
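A minimal sketch of that grouping rule, assuming IPv4 dotted-quad addresses; sharing the first 8 bits is exactly sharing the first octet:

    from collections import defaultdict

    def group_by_first_octet(ips):
        # Partition peers into candidate local overlays by the first
        # 8 bits (the first octet) of their IP addresses.
        groups = defaultdict(list)
        for ip in ips:
            groups[ip.split(".")[0]].append(ip)
        return dict(groups)

    print(group_by_first_octet(["128.10.2.1", "128.10.7.9", "18.26.4.1"]))
    # {'128': ['128.10.2.1', '128.10.7.9'], '18': ['18.26.4.1']}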

22
IP Addresses as Virtual IDs
23
Plethora
  • Two-level overlay:
  • One global overlay
  • Several local overlays
  • The global overlay is the main repository of data;
    any DHT protocol can be used (a data-access sketch
    follows this list).
  • The global overlay helps nodes organize themselves
    into local overlays.
  • Local overlays exploit the organization of the
    Internet into ASs.
  • Local overlays use a modified version of Pastry.
  • The size of a local overlay is controlled by a
    local overlay leader.
  • Plethora uses efficient distributed algorithms for
    merging and splitting local overlays.
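The slides imply the following data-access path: try the nearby local overlay first and fall back to the global repository on a miss. A sketch under that assumption (the method names are illustrative, not the paper's API):

    def plethora_get(key, local_overlay, global_overlay):
        # The local overlay acts as a locality-aware cache in front
        # of the global overlay (assumed behavior).
        value = local_overlay.lookup(key)       # cheap, nearby hops
        if value is None:
            value = global_overlay.lookup(key)  # authoritative copy
            if value is not None:
                local_overlay.cache(key, value) # pull the copy closer
        return value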

24
Plethora Data Access
25
Plethora LO Routing Information
  • Corrects a single bit at each hop.
  • Each node has a routing table and a leaf set as
    in Pastry.
  • Each routing table entry has pointers to a primary
    and to a secondary neighbor. Primary neighbors are
    used to implement proximity neighbor selection.
    Secondary neighbors are used to implement the
    local overlay split operation (a sketch of one
    entry follows this list).
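A sketch of one such entry; the field names are assumptions:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LocalOverlayEntry:
        # primary: used for normal routing, chosen by proximity
        #          neighbor selection (physically closest candidate).
        # secondary: a candidate whose AS hashes to our own value,
        #          kept so the table survives a split operation.
        primary: Optional[str] = None    # node address, e.g. "128.10.2.1"
        secondary: Optional[str] = None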

26
Node Arrivals
  • When joining the network, a node first joins the
    global overlay using the specific DHT protocol.
  • After joining the global overlay, the new node
    contacts the rendezvous point of its AS to
    determine which local overlay it will join.
  • A new node uses its AS neighborhood information to
    join another AS's local overlay when there is no
    node of its own AS in the network (a join sketch
    follows this list).
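The arrival procedure sketched in Python; every helper name here (join, rendezvous_point_of, and the node attributes) is an illustrative assumption:

    def node_arrival(new_node, global_overlay):
        # Step 1: join the global overlay using its own DHT protocol.
        global_overlay.join(new_node)
        # Step 2: ask our AS's rendezvous point which local overlay
        # to join; fall back to a neighboring AS if ours is empty.
        for asn in [new_node.as_number] + list(new_node.as_neighbors):
            rp = global_overlay.rendezvous_point_of(asn)
            if rp is not None:
                rp.local_overlay.join(new_node)
                return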

27
Splitting Local Overlays
  • AS Invariant:
  • Nodes of the same autonomous system must always
    stay in the same local overlay after a split
    operation.

28
Splitting Local Overlays
  • Nodes use a hash function on their AS numbers to
    determine other nodes that will stay together in
    the same local overlay after a split operation.
  • During network operation, nodes make secondary
    neighbor pointers in their routing tables point
    to nodes with the same AS hash value.
  • The local overlay leader periodically circulates a
    message to count the nodes in the LO. If the count
    exceeds the maximum threshold, the leader issues a
    split message to all nodes in the LO.
  • On receiving a split message, a node n discards
    the pointers to nodes whose hash values differ
    from its own (a sketch follows this list).
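A sketch of the split step at one node, reusing the primary/secondary entry layout from slide 25; the one-bit hash and the attribute names are assumptions:

    import hashlib

    def as_hash(as_number: int) -> int:
        # Hash an AS number to a single bit; nodes whose ASs hash to
        # the same bit stay together after the split.
        return hashlib.sha1(str(as_number).encode()).digest()[0] & 1

    def on_split(node):
        mine = as_hash(node.as_number)
        for row in node.routing_table:
            for entry in row:
                if entry.primary and as_hash(entry.primary.as_number) != mine:
                    # The secondary neighbor was maintained to share
                    # our hash value, so it becomes the new primary.
                    entry.primary, entry.secondary = entry.secondary, None
        node.leaf_set = [n for n in node.leaf_set
                         if as_hash(n.as_number) == mine]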

29
Splitting Local Overlays
30
Splitting Local Overlays
  • Lemma: After a split operation, the two new local
    overlays are connected with high probability.
  • Set the leaf set size to K log M, where M is the
    maximum number of nodes allowed in a local
    overlay, and K is a constant greater than 1.
  • Assuming that the hash values 0 and 1 are equally
    likely, a node n is disconnected only if all
    K log M members of its leaf set hash to the other
    value, which happens with probability
    (1/2)^(K log M) = M^(-K) (logs base 2).
  • The probability of n being in a connected overlay
    is therefore 1 - M^(-K); a union bound over at
    most M nodes leaves the whole overlay connected
    with probability at least 1 - M^(1-K), which is
    high for K > 1.

31
Node Departures
  • Node departures are handled lazily.
  • If a node detects that one of its neighbors has
    left the network, it routes around the failure
    using alternative mechanisms (for example, the
    leaf set) and tries to find a replacement for the
    missing node.
  • If the local overlay leader leaves, the first
    node that detects its departure triggers a new
    leader election protocol.

32
Merging Local Overlays
  • If the sizes of the two overlays differ by more
    than a constant factor a, the nodes of the smaller
    overlay are simply inserted one by one into the
    larger overlay.
  • If the sizes of the two overlays are within a
    constant factor a, a distributed algorithm based
    on hypercube merging is used.
  • This is analogous to merging two hypercubes of
    dimension d to produce a hypercube of dimension
    d+1.
  • On receiving a merge message, nodes add a new row
    to their routing tables (a sketch follows this
    list).
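One node's share of the merge, sketched under the hypercube analogy above (the attribute names are assumptions):

    def on_merge(node, cross_partner):
        # Two overlays with d-bit IDs merge into one with (d+1)-bit
        # IDs: each node adds a single new routing table row whose
        # entry points across to a partner in the other overlay,
        # i.e. the new link along the (d+1)-th hypercube dimension.
        node.routing_table.append([cross_partner])

With the cross link in place, one extra hop corrects the new most significant bit, so lookups in the merged overlay still take one hop per corrected bit.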

33
Merging Local Overlays
34
Merging Local Overlays
Node 0 Routing Table
(Figure: two 8-node local overlays, nodes 0 through 7 each, before
a merge. Node 0's routing table is shown; entries marked L are
covered by its leaf set: 5 6 7 1 2 3.)
35
Merging Local Overlays
Node 0 Routing Table
(Figure: the merged 16-node overlay. Node 0's routing table has
gained a new row, with an entry (e.g. node 12) pointing into the
other half, and its leaf set, 13 14 15 1 2 3, now spans nodes from
both original overlays.)
36
Simulation Setup
  • Internet topology generated using the GT-ITM
    topology generator.
  • 10 transit domains, 1,000 stub domains, 100,000
    hosts.
  • Each stub domain is one AS.
  • 10,000 overlay nodes selected randomly from the
    hosts.
  • NLANR web proxy trace with 500,254 objects.
  • Zipf distribution parameters: 0.70, 0.75, 0.80,
    0.85, 0.90.
  • Maximum overlay sizes: 200, 300, 400, 500, 1,000,
    2,000.
  • Local cache size: 5MB (LRU replacement policy).

37
Simulation Results
Response Delay
38
Simulation Results
Response Delay
39
Simulation Results
Response Delay
40
Simulation Results
Number of Messages
41
Simulation Results
Number of Messages
42
Simulation Results
Split Operation
43
Simulation Results
Merge Operation
44
Conclusions
  • Using IP addresses as virtual IDs would probably
    produce overlays with good locality properties,
    but the non-uniform population of the IP space
    leads to severe load imbalance and forfeits any
    guarantee on the number of hops.
  • Plethora is a two-level overlay architecture.
    Local overlays cluster nodes that are close in the
    underlying network.
  • Plethora uses efficient distributed algorithms for
    merging and splitting local overlays.
  • The performance gains of the two-level
    architecture are significant compared with a
    single global overlay.
  • The cost of maintaining the two-level architecture
    is very low.

45
Future Work
  • Short-term goal: develop a cache replacement
    policy that uses node availability as a parameter.
  • Long-term goal: implement a version-based
    wide-area read-write distributed file system using
    Plethora as its routing core.