Search and Replication in Unstructured Peer-to-Peer Networks presentation

1
Search and Replication in Unstructured
Peer-to-Peer Networks

Pei Cao
Cisco Systems, Inc.
(Joint work with Christine Lv, Edith Cohen, Kai
Li and Scott Shenker)

2
Disclaimer

Results, statements, and opinions in this talk do not
represent Cisco in any way
This talk is about technical problems in
networking, and does not discuss moral, legal and
other issues related to P2P networks and their
applications

3
Outline

Brief survey of P2P architectures
Evaluation methodologies
Search methods
Replication strategies and analysis
Simulation results

4
Characteristics of Peer-to-Peer Networks

Unregulated overlay network
Current application: file swapping
Dynamic: nodes join or leave frequently
Example systems
Napster, Gnutella
Freenet, FreeHaven, MojoNation, Alpine, ...
JXTA, Ohaha,
Chord, CAN, Past, Tapestry, Oceanstore

5
Architecture Comparisons

Napster: centralized
A central website holds the file directory of all
participants
Very efficient
Scales
Problem: single point of failure
Gnutella: decentralized
No central directory; uses flooding with TTL
Very resilient against failure
Problem: doesn't scale

6
Architecture Comparisons

Various research projects such as CAN:
decentralized, but structured
CAN: distributed hash table
Structure: all nodes participate in a precise
scheme to maintain certain invariants
Extra work when nodes join and leave
Scales very well, but can be fragile

7
Architecture Comparisons

Freenet: decentralized, but semi-structured
Intended for file storage
Files are stored along a route biased by hints
Queries for files follow a route biased by the
same hints
Scales very well
Problem: would it really work?
Simulation says yes in most cases, but no proof
so far

8
Our Focus: Gnutella-Style Systems

Advantages of Gnutella
Support more flexible queries
Typically, precise name search is a small
portion of all queries
Simplicity, high resilience against node failures
Problems of Gnutella: scalability
Bottleneck: interrupt rates on individual nodes
Self-limiting network: nodes have to exit to get
real work done!

9
Evaluation Methodologies

Simulation based
Network topology
Distribution of object popularity
Distribution of replication density of objects

10
Evaluation Methods

Network topologies
Uniform Random Graph (Random)
Average and median node degree is 4
Power-Law Random Graph (PLRG)
max node degree 1746, median 1, average 4.46
Gnutella network snapshot (Gnutella)
Oct 2000 snapshot
max degree 136, median 2, average 5.5
Two-dimensional grid (Grid)

11
Modeling Methods

Object popularity distribution $p_i$
Uniform
Zipf-like
Object replication density distribution $r_i$
Uniform
Proportional: $r_i \propto p_i$
Square-root: $r_i \propto \sqrt{p_i}$
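
As a rough illustration of the query workload model above, the following Python sketch builds a Zipf-like popularity distribution and samples queries from it. The exponent `alpha`, the function names, and the seed are assumptions made for this example; the slides do not state the parameters that were used.

```python
import random

def zipf_like(m, alpha=1.0):
    """Return p_1..p_m with p_i proportional to 1 / i**alpha (assumed form)."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_queries(popularity, count, seed=0):
    """Draw `count` object indices (0-based) according to `popularity`."""
    rng = random.Random(seed)
    return rng.choices(range(len(popularity)), weights=popularity, k=count)

# Hypothetical workload: 100 objects, 1000 queries.
p = zipf_like(100, alpha=1.0)
queries = sample_queries(p, 1000)
```

Setting `alpha=0` in this sketch gives the uniform popularity case.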

12
Evaluation Metrics

Overhead: average number of messages per node per
query
Probability of search success: Pr(success)
Delay: number of hops until success

13
Load on Individual Nodes

Why is a node interrupted?
To process a query
To route the query to other nodes
To process duplicated queries sent to it

14
Duplication in Flooding-Based Searches
[Figure: example network with nodes labeled 1 through 8]

Duplication increases as TTL increases in
flooding
Worst case: a node A is interrupted by N * q *
degree(A) messages
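
To make the duplication effect concrete, here is a minimal Python sketch of TTL-limited flooding that counts duplicate deliveries. It assumes each node forwards a query to every neighbor except the one it came from and that a node drops a query it has already seen; the graph representation (a dict of adjacency lists) and all names are illustrative, not taken from the talk.

```python
from collections import deque

def flood(graph, source, ttl):
    """Flood a query from `source`; return (reached, messages, duplicates)."""
    seen = {source}
    frontier = deque([(source, None, ttl)])
    messages = 0
    duplicates = 0
    while frontier:
        node, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # never send the query straight back
            messages += 1
            if neighbor in seen:
                duplicates += 1  # receiver is interrupted but drops the query
            else:
                seen.add(neighbor)
                frontier.append((neighbor, node, remaining - 1))
    return seen, messages, duplicates

# Hypothetical 4-node ring: 0 - 1 - 2 - 3 - 0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
reached, messages, duplicates = flood(ring, 0, ttl=2)
```

On this small ring, node 2 is reached from both sides, so one of the two deliveries to it is a duplicate.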

15
Duplications in Various Network Topologies
16
Relationship between TTL and Search Successes
17
Problems with Simple TTL-Based Flooding

Hard to choose TTL
For objects that are widely present in the
network, small TTLs suffice
For objects that are rare in the network, large
TTLs are necessary
Number of query messages grows exponentially as
TTL grows

18
Idea 1: Adaptively Adjust TTL

Expanding Ring
Multiple floods: start with TTL = 1 and increment
TTL by 2 each time until the search succeeds
Success varies by network topology
For Random, 30- to 70-fold reduction in
message traffic
For Power-law and Gnutella graphs, only
3- to 9-fold reduction
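
The expanding ring idea can be sketched in a few lines of Python. This is a simplified illustration: the `max_ttl` cutoff, the adjacency-list graph format, and the function names are assumptions for the example, and each round is modeled as a fresh flood from the originator.

```python
def flood(graph, source, ttl):
    """TTL-limited flood; return (set of reached nodes, messages sent)."""
    seen = {source}
    frontier = [(source, None)]
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            for neighbor in graph[node]:
                if neighbor == sender:
                    continue
                messages += 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append((neighbor, node))
        frontier = next_frontier
    return seen, messages

def expanding_ring(graph, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Repeat floods with growing TTL until a node in `holders` is reached."""
    total_messages = 0
    ttl = start_ttl
    while ttl <= max_ttl:  # max_ttl is an assumed safety cutoff
        reached, messages = flood(graph, source, ttl)
        total_messages += messages
        if reached & holders:
            return True, ttl, total_messages
        ttl += step
    return False, None, total_messages

# Hypothetical 5-node path: 0 - 1 - 2 - 3 - 4, object stored at node 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
found, final_ttl, cost = expanding_ring(path, 0, holders={3})
```

The total cost includes the messages of every failed round, which is why the savings depend on how quickly the reached set grows with TTL in a given topology.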

19
Limitations of Expanding Ring
20
Idea 2: Random Walk

Simple random walk
takes too long to find anything!
Multiple-walker random walk
N agents, each walking T steps, visit as
many nodes as 1 agent walking N*T steps
When to terminate the search: check back with the
query originator once every C steps
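
A minimal Python sketch of the multiple-walker search with periodic check-back follows. Several details are assumptions made for the example rather than facts from the talk: each check-back is counted as one message per walker, every walker keeps moving until the next check-back after a hit, and the defaults for `walkers`, `check_every`, and `max_steps` are arbitrary.

```python
import random

def multi_walker_search(graph, source, holders, walkers=4,
                        check_every=4, max_steps=100, seed=0):
    """Return (step of first hit or None, total messages sent)."""
    rng = random.Random(seed)
    if source in holders:
        return 0, 0
    positions = [source] * walkers
    messages = 0
    hit_step = None
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            positions[i] = rng.choice(graph[node])  # one random hop
            messages += 1
            if hit_step is None and positions[i] in holders:
                hit_step = step
        if step % check_every == 0:
            messages += walkers  # assumed cost: one check-back per walker
            if hit_step is not None:
                return hit_step, messages  # originator tells walkers to stop
    return hit_step, messages

# Hypothetical two-node graph, so the walk is deterministic.
pair = {0: [1], 1: [0]}
result = multi_walker_search(pair, 0, holders={1}, walkers=2, check_every=4)
```

A larger `check_every` lowers check-back traffic but lets walkers run longer after the object has already been found.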

21
Search Traffic Comparison
22
Search Delay Comparison
23
Lessons Learnt about Search Methods

Adaptive termination
Minimize message duplication
Small expansion in each step

24
Flexible Replication

In unstructured systems, search success is
essentially about coverage: visiting enough nodes
to probabilistically find the object =>
replication density matters
Limited node storage => what's the optimal
replication density distribution?
In Gnutella, only nodes that query an object store
it => $r_i \propto p_i$
What if we have different replication strategies?

25
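Optimal $r_i$ Distribution

Goal: minimize $\sum_i p_i / r_i$, where $\sum_i r_i = R$
Calculation:
introduce Lagrange multiplier $\lambda$, find $r_i$ and $\lambda$
that minimize
$\sum_i p_i / r_i + \lambda \left( \sum_i r_i - R \right)$
=> $\lambda - p_i / r_i^2 = 0$ for all $i$
=> $r_i \propto \sqrt{p_i}$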
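
For readers who want the remaining algebra, the stationarity condition can be solved explicitly. The worked steps below use only the symbols defined on this slide (popularity $p_i$, replication density $r_i$, total budget $R$, multiplier $\lambda$); the closed forms are a standard consequence of the constraint and are not stated in the slides.

```latex
\begin{align*}
\lambda - \frac{p_i}{r_i^2} = 0 \;&\Rightarrow\; r_i = \sqrt{p_i/\lambda}, \\
\sum_j r_j = R \;&\Rightarrow\; \sqrt{\lambda} = \frac{1}{R}\sum_j \sqrt{p_j}, \\
r_i &= R\,\frac{\sqrt{p_i}}{\sum_j \sqrt{p_j}}, \qquad
\sum_i \frac{p_i}{r_i} = \frac{1}{R}\Bigl(\sum_i \sqrt{p_i}\Bigr)^2 .
\end{align*}
```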

26
Square-Root Distribution

General principle: to minimize $\sum_i p_i / r_i$ under
the constraint $\sum_i r_i = R$, make $r_i$ proportional to
the square root of $p_i$
Other application examples
Bandwidth allocation to minimize expected
download times
Server load balancing to minimize expected
request latency
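
As a small numerical check of the principle, the following Python sketch evaluates the objective (the sum of popularity divided by replication density) for the uniform, proportional, and square-root allocations. The three-object popularity vector, the budget of 30 copies, and the function names are made-up values for illustration.

```python
import math

def allocate(popularity, total_copies, strategy):
    """Split `total_copies` across objects according to `strategy`."""
    if strategy == "uniform":
        weights = [1.0] * len(popularity)
    elif strategy == "proportional":
        weights = list(popularity)
    elif strategy == "square_root":
        weights = [math.sqrt(p) for p in popularity]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    scale = total_copies / sum(weights)
    return [w * scale for w in weights]

def search_cost(popularity, replicas):
    """Objective from the slide: sum over objects of p_i / r_i."""
    return sum(p / r for p, r in zip(popularity, replicas))

popularity = [0.5, 0.3, 0.2]  # hypothetical query probabilities
budget = 30                   # hypothetical total number of copies, R
costs = {
    s: search_cost(popularity, allocate(popularity, budget, s))
    for s in ("uniform", "proportional", "square_root")
}
```

Under this objective the uniform and proportional allocations both evaluate to (number of objects) / R, while the square-root allocation gives a strictly smaller value whenever the popularities are not all equal.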

27
Achieving Square-Root Distribution

Suggestions from some heuristics
Store an object at a number of nodes that is
proportional to the number of nodes visited in
order to find the object
Each node uses random replacement
Two implementations
Path replication: store the object along the path
of a successful walk
Random replication: store the object randomly
among nodes visited by the agents
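
The two implementations can be sketched as follows in Python. This is an illustration under stated assumptions: each node's store is modeled as a fixed-capacity list with random replacement, random replication is given an explicit `count` of copies to place (for example, the length of the successful path), and all names are hypothetical.

```python
import random

def store(cache, obj, capacity, rng):
    """Insert `obj` into a node's cache, evicting a random entry when full."""
    if obj in cache:
        return
    if len(cache) >= capacity:
        cache.pop(rng.randrange(len(cache)))  # random replacement
    cache.append(obj)

def path_replicate(caches, obj, path, capacity, rng):
    """Store `obj` at every node on the successful walker's path."""
    for node in path:
        store(caches[node], obj, capacity, rng)

def random_replicate(caches, obj, visited, count, capacity, rng):
    """Store `obj` at `count` nodes drawn from all nodes the walkers visited."""
    candidates = sorted(visited)
    for node in rng.sample(candidates, min(count, len(candidates))):
        store(caches[node], obj, capacity, rng)

rng = random.Random(0)
caches_path = {n: [] for n in range(6)}
path_replicate(caches_path, "song", [0, 2, 5], capacity=2, rng=rng)

caches_rand = {n: [] for n in range(6)}
random_replicate(caches_rand, "song", visited={0, 1, 2, 3, 4, 5},
                 count=3, capacity=2, rng=rng)
```

Both calls create the same number of copies here; they differ only in which visited nodes receive them.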

28
Evaluation of Replication Methods

Metrics
Overall message traffic
Search delay
Dynamic simulation
Assume Zipf-like object query probability
5 queries/sec, Poisson arrivals
Results are measured from 5000 sec to 9000 sec

29
Distribution of $r_i$
30
Total Search Message Comparison

Observation: path replication is slightly
inferior to random replication

31
Search Delay Comparison
32
Summary

Multi-walker random walk scales much better than
flooding
It won't scale as perfectly as a structured
network, but current unstructured networks can be
improved significantly
Square-root replication distribution is desirable
and can be achieved via path replication
