Title: Peer-to-Peer Overlay Networks
1. Peer-to-Peer Overlay Networks
- Distributed systems without any hierarchical organization or centralized control
- Peers form self-organizing overlay networks on top of the IP network
- Each peer allows access to its resources
- Requires:
- Robust wide-area routing architecture
- Efficient search of data items
- Selection of nearby peers
- Redundant storage
- Self-organization
- Massive scalability and fault tolerance
2. Abstract P2P Overlay Architecture
3. Structured vs. Unstructured Overlays
- Structured: the overlay network topology is tightly controlled, and content is placed not at random peers but at specified locations that make subsequent queries more efficient.
- Uses a Distributed Hash Table (DHT) as a substrate: data object (or value) location information is placed deterministically at the peers whose identifiers correspond to the data object's unique key.
- Examples: CAN, Chord, Pastry, Tapestry
4. Structured DHT-based P2P Overlays
- Data objects are assigned unique identifiers called keys, chosen from the same identifier space.
- Keys are mapped by the overlay network protocol to a unique live peer in the overlay network.
- The P2P overlay supports scalable storage and retrieval of (key, value) pairs on the overlay network.
- Each peer maintains a small routing table consisting of its neighboring peers' Node IDs and IP addresses.
5. Unstructured P2P Overlay Networks
- The system is composed of peers joining the network under some loose rules, without any prior knowledge of the topology.
- The network uses flooding or random walks as the mechanism to send queries across the overlay, with a limited scope.
- When a peer receives a flooded query, it sends a list of all content matching the query to the originating peer.
- Examples: Freenet, Gnutella, KaZaA, BitTorrent
6. Structured vs. Unstructured
- Structured
- Gives an upper bound on hops for data lookup
- Massively scalable
- The underlying network path can differ significantly from the path on the DHT-based overlay network; lookup latency can be quite high and could adversely affect the performance of the applications running over it.
- Higher overheads for popular content
- Unstructured
- Effective in locating highly replicated items
- Resilient to peers joining and leaving the system
- Poor at locating rare items
- Load on peers increases linearly with the total number of queries and the system size: scalability problems
7. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan (MIT and Berkeley)
- presentation based on slides by Robert Morris
8. Outline
- Motivation and background
- Consistent hashing
- Chord
- Performance evaluation
- Conclusion and discussion
9. Motivation
How do we find data in a distributed file-sharing system?
(Diagram: a Publisher holds Key = "LetItBe", Value = MP3 data; a Client somewhere on the Internet issues Lookup("LetItBe"))
- Lookup is the key problem
10. Centralized Solution
(Diagram: same Publisher/Client setup as before)
- Requires O(M) state
- Single point of failure
11. Distributed Solution (1)
- Flooding (Gnutella, Morpheus, etc.)
(Diagram: same Publisher/Client setup as before)
- Worst case: O(N) messages per lookup
12. Distributed Solution (2)
- Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
(Diagram: same Publisher/Client setup as before)
13. Routing Challenges
- Define a useful key nearness metric
- Keep the hop count small
- Keep the routing tables a reasonable size
- Stay robust despite rapid changes in membership
- Authors' claim:
- Chord emphasizes efficiency and simplicity
14. Chord Overview
- Provides a peer-to-peer hash lookup service
- Lookup(key) → IP address
- Chord does not store the data
- How does Chord locate a node?
- How does Chord maintain routing tables?
- How does Chord cope with changes in membership?
15. Chord properties
- Efficient: O(log N) messages per lookup
- N is the total number of servers
- Scalable: O(log N) state per node
- Robust: survives massive changes in membership
- Proofs are in the paper / tech report
- Assuming no malicious participants
16. Chord IDs
- m-bit identifier space for both keys and nodes
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP address)
- Both are uniformly distributed
- How to map key IDs to node IDs?
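A minimal Python sketch of this mapping, assuming a small m-bit ring (the helper names chord_id and successor are illustrative, not from the paper): keys and node addresses are hashed with SHA-1 into the same identifier space, and a key is assigned to the first node ID clockwise from it.

```python
import hashlib
from bisect import bisect_left

M = 7  # identifier bits; the toy examples in these slides use a 7-bit ring

def chord_id(text: str) -> int:
    """SHA-1 hash of a key or IP address, truncated to an m-bit identifier."""
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(key_id: int, node_ids) -> int:
    """Consistent hashing: a key belongs to the first node ID clockwise from it."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id)
    return ring[idx % len(ring)]  # wrap around the ring

# Example: node IDs hashed from IP addresses, a key hashed from its name
nodes = [chord_id(ip) for ip in ("198.10.10.1", "10.0.0.2", "10.0.0.3")]
print(successor(chord_id("LetItBe"), nodes))
```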
17. Consistent Hashing [Karger '97]
(Diagram: a circular 7-bit ID space containing nodes N32, N90, and N123 (ID from IP 198.10.10.1) and keys K5, K20, K60 (Key = "LetItBe"), and K101)
- A key is stored at its successor: the node with the next-higher ID
18. Consistent Hashing
- Every node knows of every other node
- Requires global information
- Routing tables are large: O(N)
- Lookups are fast: O(1)
(Diagram: N10 asks "Where is LetItBe?"; Hash("LetItBe") = K60; on a ring with N10, N32, N55, N90, N123 the answer is "N90 has K60")
19. Chord Basic Lookup
- Every node knows its successor in the ring
(Diagram: the query "Where is LetItBe?" (Hash("LetItBe") = K60) is forwarded successor by successor from N10 through N32 and N55 to N90, which has K60)
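A sketch of this successor-only lookup in Python (the function names and the toy ring are illustrative): the query simply walks the ring until it reaches the node responsible for the key, so it needs O(N) hops.

```python
def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b], with wrap-around."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def basic_lookup(start, key_id, succ):
    """Walk the ring successor by successor until the node responsible for
    key_id is found. `succ` maps each node ID to its successor's ID."""
    node, hops = start, 0
    while not in_interval(key_id, node, succ[node]):
        node = succ[node]
        hops += 1
    return succ[node], hops

# Toy ring from the slide: N10 -> N32 -> N55 -> N90 -> N123 -> N10
succ = {10: 32, 32: 55, 55: 90, 90: 123, 123: 10}
print(basic_lookup(10, 60, succ))  # (90, 2): after two forwarding steps, N55 reports that its successor N90 holds K60
```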
20. Finger Tables
- Every node knows m other nodes in the ring
- Distances increase exponentially
(Diagram: node N80's fingers target the IDs 80 + 2^0, 80 + 2^1, ..., 80 + 2^6; nearby targets resolve to N96, farther ones to N112 and N16)
21. Finger Tables
- Finger i points to successor(n + 2^(i-1))
(Diagram: the same finger structure for N80; the finger for 80 + 2^6 wraps around the ring past N112 and N120 to N16)
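A small Python sketch of how such a finger table could be filled in, reusing the successor() helper from the consistent-hashing sketch above (the function name and the toy ring are illustrative):

```python
def build_finger_table(n, node_ids, m=7):
    """Finger i (1-based) of node n points to successor(n + 2^(i-1)) on the m-bit ring."""
    size = 2 ** m
    return [successor((n + 2 ** (i - 1)) % size, node_ids) for i in range(1, m + 1)]

# Toy ring from the slide
ring = [16, 80, 96, 112, 120]
print(build_finger_table(80, ring))  # [96, 96, 96, 96, 96, 112, 16]
```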
22. Lookups are Faster
- Lookups take O(log N) hops
(Diagram: Lookup(K20) issued at N80 jumps across the ring via fingers, roughly halving the remaining distance at each hop, until it reaches N20; the ring contains N5, N10, N20, N32, N60, N80, N99, N110)
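A sketch of the finger-based lookup in Python, building on in_interval(), successor(), and build_finger_table() from the earlier sketches. It is a simplified stand-in for Chord's closest-preceding-finger rule, not the paper's exact pseudocode.

```python
def finger_lookup(start, key_id, fingers, succ, m=7):
    """At each hop, jump to the finger that most closely precedes key_id,
    so the remaining distance roughly halves and lookups take O(log N) hops."""
    node, hops = start, 0
    while not in_interval(key_id, node, succ[node]):
        # fingers strictly between the current node and the key
        cands = [f for f in fingers[node] if f != key_id and in_interval(f, node, key_id)]
        node = max(cands, key=lambda f: (f - node) % (2 ** m)) if cands else succ[node]
        hops += 1
    return succ[node], hops

# Toy ring from the slide
ring = [5, 10, 20, 32, 60, 80, 99, 110]
fingers = {n: build_finger_table(n, ring) for n in ring}
succ = {n: successor((n + 1) % 128, ring) for n in ring}
print(finger_lookup(80, 20, fingers, succ))  # (20, 2): N80 -> N5 -> N10, whose successor is N20
```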
23. Joining the Ring
- Three-step process:
- Initialize all fingers of the new node
- Update fingers of existing nodes
- Transfer keys from the successor to the new node
- Less aggressive mechanism (lazy finger update):
- Initialize only the finger to the successor node
- Periodically verify the immediate successor and predecessor
- Periodically refresh finger table entries
24. Joining the Ring - Step 1
- Initialize the new node's finger table
- Locate any node p in the ring
- Ask node p to look up the fingers of new node N36
- Return the results to the new node
- Optimizations are possible
(Diagram: new node N36 issues Lookup(37, 38, 40, ..., 100, 164) through an existing node of the ring N5, N20, N40, N60, N80, N99)
25. Joining the Ring - Step 2
- Update the fingers of existing nodes
- The new node calls an update function on existing nodes
- Existing nodes can recursively update the fingers of other nodes
(Diagram: N36 triggers finger updates at the existing nodes N5, N20, N40, N60, N80, N99)
26. Joining the Ring - Step 3
- Transfer keys from the successor node to the new node
- Only keys in the new node's range are transferred
(Diagram: keys 21..36 are copied from N40 to N36, so K30 moves to N36 while K38 stays on N40)
27. Stabilization
- Maintain the invariants:
- Each node's successor is correctly maintained
- For every key k, successor(k) is responsible for k
- Keeping nodes' successor pointers up to date is sufficient to guarantee correctness of lookups
- Successor pointers are used to verify and correct finger table entries in the background
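A compact Python sketch of the periodic stabilization step, following the shape of the protocol described here and reusing in_interval() from the earlier sketch (the dictionaries are illustrative stand-ins for per-node pointers):

```python
def stabilize(n, succ, pred):
    """Node n asks its successor for that node's predecessor and adopts it as its
    new successor if it has slipped in between, then notifies the successor."""
    x = pred.get(succ[n])
    if x is not None and x != succ[n] and in_interval(x, n, succ[n]):
        succ[n] = x
    s = succ[n]
    # notify: n proposes itself as s's predecessor
    if pred.get(s) is None or in_interval(n, pred[s], s):
        pred[s] = n
```

Running stabilize() periodically at every node repairs the successor/predecessor pointers after joins such as the lazy_join() sketch above.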
28. Handling Failures
- Failure of nodes may cause incorrect lookups
(Diagram: N80 issues Lookup(90) on a ring with N10, N85, N102, N113, N120, but some of its immediate successors have failed)
- N80 doesn't know its correct successor, so the lookup fails
- Correct successor fingers are enough for correctness
29. Handling Failures
- Use successor list
- Each node knows r immediate successors
- After failure, will know first live successor
- Correct successors guarantee correct lookups
- Guarantee is with some probability
- Can choose r to make probability of lookup
failure arbitrarily small
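A tiny Python illustration of the successor-list idea (the node numbers are made up for the example):

```python
def first_live_successor(successor_list, alive):
    """Each node keeps its r immediate successors; after a failure it falls back
    to the first successor in the list that is still alive."""
    for s in successor_list:
        if alive(s):
            return s
    raise RuntimeError("all r successors failed")

# e.g. r = 3: N85 and N102 have failed, so N113 takes over as successor
print(first_live_successor([85, 102, 113], lambda n: n not in {85, 102}))
```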
30. Evaluation Overview
- Quick lookup in large systems
- Low variation in lookup costs
- Robust despite massive failure
- Experiments confirm theoretical results
31. Load Balance
- Load balance is not quite as predicted with the basic scheme; it is achieved with virtual nodes
32. Cost of Lookup
- Cost is O(log N), as predicted by theory
(Plot: average messages per lookup vs. number of nodes)
33. Robustness
- Simulation results: static scenario
- A failed lookup means the original node holding the key failed (no replicas of keys)
- The result implies a good balance of keys among nodes!
34. Robustness
- Simulation results: dynamic scenario
- A failed lookup means the finger path contains a failed node
- 500 nodes initially
- stabilize() is called every 30 s on average
- 1 lookup per second (Poisson)
- x joins/failures per second (Poisson)
35. Limitations
- Of DHT-based systems in general:
- Peers route a message to the next intermediate peer, which can be located very far away with regard to the physical topology of the underlying IP network. This can result in high network delay and unnecessary long-distance network traffic.
- DHT-based systems assume that all peers participate equally in hosting published data objects or their location information. This can lead to a bottleneck at low-capacity peers.
- Of Chord:
- Ring partitions might pose a problem
- Security: how to deal with malicious participants?
- The virtualized ID space lacks locality
- Scalability of the stabilization protocol: how often does the stabilization procedure need to run? How do we balance consistency and network overhead?
36. Strengths
- Sound theoretical work: almost the best tradeoffs in storage, lookups, and routing in the face of joins or exits
- Analyzes a large number of system properties: load balance, path length, recovery time, etc.
- Has been used widely; considered a seminal work
- General-purpose DHash layer for applications:
- Distributed DNS
- CFS (wide-area cooperative file system for distributed read-only storage)
- Ivy (P2P read/write file system)
- Internet Indirection Infrastructure
37. Issues
- Security considerations (many possible attacks beyond data integrity):
- Routing attacks: incorrect lookups, updates, partitions
- Storage and retrieval attacks: denial of service, data attacks
- Other misc. attacks: inconsistent behavior, overload, etc.
- Performance considerations:
- No consideration of the underlying routing topology (locality properties)
- No consideration of underlying network traffic/congestion conditions
- The bound on lookups is still not good enough for some applications
- Utilizing caching on search paths improves performance for popular DHT lookups, but raises cache coherence problems
- Application-specific considerations:
- Each application requires its own set of access functions in the DHT
- Lack of a sophisticated API for supporting such applications
- E.g. the DHash API is too basic to support sophisticated functionality
- Support only for DHT as a library vs. as a service
38. Related Work
- Hierarchical Chord (Canon in G Major)
- Rings vs. hypercubes
39. SkipNet: A Scalable Overlay Network with Practical Locality Properties
Nick Harvey, Mike Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman
Microsoft Research and University of Washington
- presentation from Stefan Saroiu
40. Overlay Networks
- Overlays have achieved several goals:
- Scalable and decentralized infrastructure
- Uniform and random load and data distribution
- But at the price of data controllability:
- Data may be stored far from its users
- Data may be stored outside its domain
- Local accesses leave the local organization
- Basic trade-off: data controllability vs. data uniformity
- SkipNet:
- Traditional overlay functionality
- Provides an abstraction to control this trade-off
- Constrained load balancing (CLB)
41. Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions
42. Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions
43. Key Locality Properties and Abstraction
- In practice, two properties are important:
- Content locality: the ability to explicitly place data
- Placement on a single node or on a set of nodes
- Path locality: the ability to guarantee that local traffic remains local
- One abstraction is important: CLB
- The SkipNet abstraction to control the trade-off
- Multiple DHT scopes within one single overlay
44. Practical Requirements
- Data controllability
- Organizations want control over their own data
- Even if local data is globally available
- Manageability
- Data control allows for data administration, provisioning, and manageability
- Data center/cluster: a constrained set of nodes
- CLB ensures load balance across the data center/cluster
45. Practical Requirements (cont'd)
- Security
- Content and path locality are key building blocks for dealing with certain external attacks
- Data availability
- Local data survives network partitions
- Performance
- Data can be stored near the clients that use it
46. Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions
47. SkipNet
- Key property: two address spaces
- Name ID space: nodes are sorted by their names (e.g. DNS names)
- Numeric ID space: nodes are randomly distributed
- Combining both spaces achieves:
- Content and path locality
- Other uses could emerge: range queries [AS '03]
- Scalable peer-to-peer overlay network
- O(log N) routing performance in both spaces
- O(log N) routing state per node
48-49. SkipNet Ring
- Pointers at level h skip over 2^h nodes
- Nodes are ordered by names
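An idealized Python sketch of this pointer structure (in the real SkipNet, ring membership comes from random numeric-ID bits, so level-h pointers skip 2^h nodes only in expectation; the helper name is illustrative):

```python
def skipnet_pointers(names, level):
    """In an idealized SkipNet, a node's pointers at level h reach the nodes
    2^h positions away (counter-clockwise, clockwise) on the name-sorted root ring."""
    ring = sorted(names)
    step = 2 ** level
    return {n: (ring[(i - step) % len(ring)], ring[(i + step) % len(ring)])
            for i, n in enumerate(ring)}

nodes = ["A", "D", "M", "O", "T", "V", "X", "Z"]
print(skipnet_pointers(nodes, 1))  # level-1 pointers skip over one node in each direction
```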
50-51. SkipNet Global View
(Diagram: nodes A, D, M, O, T, V, X, Z form the Root Ring at level L0 and are recursively split into smaller rings at levels L1, L2, and L3)
52. Two Address Spaces
- SkipNet can route efficiently in both address spaces
- Name ID space (e.g. DNS names)
- Numeric ID space
53. Routing by Name ID
(Diagram: the Root Ring at level L0 holds A, D, M, O, T, V, X, Z; it splits into Ring 0 and Ring 1 at L1, Rings 00-11 at L2, and Rings 000-111 at L3)
- Example: route from A to V
- Simple rule: forward the message to the node that is closest to the destination, without going too far
54-58. Routing by Name ID (continued)
- The same diagram, stepping through the route from A to V one hop at a time using the rule above
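A Python sketch of that forwarding rule on the idealized ring, reusing skipnet_pointers() from the earlier sketch (a simplified stand-in, not the paper's algorithm):

```python
def route_by_name(src, dest, names, max_level=3):
    """From each hop, follow the highest-level clockwise pointer that moves toward
    dest in name order without going past it ('closest without going too far')."""
    ring = sorted(names)
    dist = lambda a, b: (ring.index(b) - ring.index(a)) % len(ring)  # clockwise hops
    path, node = [src], src
    while node != dest:
        for level in range(max_level, -1, -1):
            nxt = skipnet_pointers(names, level)[node][1]      # clockwise pointer
            if nxt != node and dist(node, nxt) <= dist(node, dest):
                node = nxt
                break
        path.append(node)
    return path

nodes = ["A", "D", "M", "O", "T", "V", "X", "Z"]
print(route_by_name("A", "V", nodes))  # ['A', 'T', 'V'] on this toy ring
```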
59. Routing by Numeric ID
- Provides the basic DHT primitive
- To store file Foo.c:
- Hash(Foo.c) → a random numeric ID
- Find the highest ring matching that numeric ID
- Store the file on a node in that ring
- O(log N) routing efficiency
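A rough Python sketch of this primitive (the node names and their numeric-ID bits are made up; matching the longest numeric-ID prefix stands in for routing to the highest matching ring):

```python
import hashlib

def numeric_id(name: str, bits: int = 3) -> str:
    """Hash a name to a short random-looking bit string (illustrative only)."""
    return format(hashlib.sha1(name.encode()).digest()[0], "08b")[:bits]

def store_node(filename, node_bits):
    """Pick the node whose numeric ID shares the longest prefix with Hash(filename),
    i.e. a member of the highest-level ring matching that numeric ID."""
    target = numeric_id(filename)
    def prefix_len(a):
        return next((i for i in range(len(a)) if a[i] != target[i]), len(a))
    return max(node_bits, key=lambda n: prefix_len(node_bits[n]))

# Hypothetical numeric IDs for the example nodes
nodes = {"A": "010", "D": "110", "M": "001", "O": "011", "T": "000", "V": "101"}
print(numeric_id("Foo.c"), store_node("Foo.c", nodes))
```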
60. DHT Example
(Diagram: the same multi-level ring structure as before)
- Store file Foo.c from node A
- Hash(Foo.c) = 101
- Route from A to V in the numeric space
61. Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions
62. Constrained Load Balancing (CLB)
- Multiple DHTs with differing scopes within a single SkipNet structure
- A result of the ability to route in both address spaces
- Divide data object names into two parts using the '!' special character: CLB Domain ! CLB Suffix
- Example: microsoft.com!skipnet.html
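A small Python sketch of the CLB idea, reusing store_node() from the numeric-ID sketch above (the domain table and node names are made up):

```python
def clb_lookup(object_name, nodes_in_domain):
    """The CLB domain (before '!') picks the scope by name-ID routing; the CLB
    suffix (after '!') is then placed by numeric-ID routing, but only among the
    nodes inside that domain."""
    domain, suffix = object_name.split("!", 1)
    members = nodes_in_domain[domain]              # constrain to the domain's segment
    return domain, store_node(suffix, members)

domains = {"com.microsoft": {"msw-01": "101", "msw-02": "010", "msw-03": "111"}}
print(clb_lookup("com.microsoft!skipnet.html", domains))
```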
63. CLB Example
(Diagram: ring segments for com.microsoft, com.sun, edu.ucb, gov.irs)
- To read the file com.microsoft!skipnet.html:
- Route by name ID to com.microsoft
- Route by numeric ID to Hash(skipnet.html) within the com.microsoft constraint
64. SkipNet Path Locality
(Diagram: ring segments for com.microsoft, com.sun, edu.ucb, gov.irs)
- Organizations correspond to contiguous SkipNet segments
- Internal routing by name ID remains internal
- Nodes have left / right pointers
65. Fault Tolerance
- Many failures occur along organizational boundaries
- Gateway/firewall failure, BGP misconfiguration, physical network cut, ...
- SkipNet handles organizational disconnects gracefully
- Results in two well-connected, partitioned SkipNets
- Efficient re-merging algorithms
- Node-independent failures:
- Same resiliency as systems such as Chord and Pastry
- Similar approach to repair (leaf set)
66. Primary Security Benefit and Weakness
- Benefit: SkipNet names as an access control mechanism
- Content locality ensures that content stays within the organization
- Path locality prevents:
- malicious forwarders
- analysis of internal traffic
- external tampering
- Weakness: it is easier to target organizations
- Someone creates one million nodes with name prefixes microsofa.com and microsort.com
- Most traffic to/from Microsoft will then go through a microsofa / microsort intermediate node
67. Talk Outline
- Practical data locality requirements
- Basic SkipNet design
- SkipNet locality properties
- Performance evaluation
- Conclusions
68-69. Methodology
- Packet-level, event-driven simulator
- SkipNet implementation:
- Basic SkipNet
- Full SkipNet: Basic SkipNet + network proximity
- Pastry and Chord implementations
- Uses Mercator and GT-ITM network topologies
- Experimentally evaluated:
- Name ID routing performance
- Tolerance to organizational disconnect
- Numeric ID routing performance
- Effectiveness of network proximity optimizations
- Effectiveness of CLB routing optimizations
70. Routing by Name ID: Performance
- Benefits come at no extra cost
71. Surviving Organizational Disconnect
- Disconnected org size: 15% of all nodes
72. Conclusions
- SkipNet:
- Traditional overlay functionality
- Explicit control of data placement
- Constrained load balancing
- Content and path locality are basic ingredients for:
- Data controllability
- Manageability
- Security
- Data availability
- Performance
73. Thoughts
- The P-Table and C-Table approaches to speeding up routing seem hacky and unclear: how is network proximity captured?
- Hierarchical DHTs:
- Adapt to the physical network; efficient caching; efficient multicast; exhibit the same content locality properties as SkipNet (content and path locality, local administrative domains, fault isolation)
74. P2P Overlay Networking Research
- Quantitative evaluation of how well P2P overlay applications match the Internet topology, and of the scalability of P2P overlay applications through efficient use of the underlying physical network resources
- Proximity:
- Mapping the peers into a coordinate-based space
- Reducing the stretch (ratio of overlay path to underlying network path) routing metric, based on scalable and robust proximity calculations
- A mixed set of metrics including delay, throughput, available bandwidth, and packet loss would provide a more efficient global routing optimization
- Application of P2P overlay networking models in mobile and ad-hoc wireless networks:
- Similar features, such as self-organization, peer-to-peer routing, and resilient communications
- Would allow mobile peers to have optimized flow control, load balancing mechanisms, and proximity-aware and QoS routing