Title: Search and Replication in Unstructured Peer-to-Peer Networks
1Search and Replication in Unstructured
Peer-to-Peer Networks
- Pei Cao
- Cisco Systems, Inc.
- (Joint work with Christine Lv, Edith Cohen, Kai
Li and Scott Shenker)
2Disclaimer
- Results, statements, and opinions in this talk do not represent Cisco in any way
- This talk is about technical problems in networking, and does not discuss moral, legal, and other issues related to P2P networks and their applications
3Outline
- Brief survey of P2P architectures
- Evaluation methodologies
- Search methods
- Replication strategies and analysis
- Simulation results
4Characteristics of Peer-to-Peer Networks
- Unregulated overlay network
- Current application: file swapping
- Dynamic: nodes join or leave frequently
- Example systems:
  - Napster, Gnutella
  - Freenet, FreeHaven, MojoNation, Alpine, ...
  - JXTA, Ohaha,
  - Chord, CAN, Past, Tapestry, Oceanstore
5Architecture Comparisons
- Napster: centralized
  - A central website holds the file directory of all participants
  - Very efficient
  - Scales
  - Problem: single point of failure
- Gnutella: decentralized
  - No central directory; uses flooding with TTL
  - Very resilient against failure
  - Problem: doesn't scale
6Architecture Comparisons
- Various research projects such as CAN: decentralized, but structured
  - CAN: distributed hash table
  - Structure: all nodes participate in a precise scheme to maintain certain invariants
  - Extra work when nodes join and leave
  - Scales very well, but can be fragile
7Architecture Comparisons
- FreeNet: decentralized, but semi-structured
  - Intended for file storage
  - Files are stored along a route biased by hints
  - Queries for files follow a route biased by the same hints
  - Scales very well
  - Problem: would it really work?
    - Simulation says yes in most cases, but no proof so far
8Our Focus: Gnutella-Style Systems
- Advantages of Gnutella
  - Supports more flexible queries
    - Typically, precise name search is a small portion of all queries
  - Simplicity, high resilience against node failures
- Problems of Gnutella: scalability
  - Bottleneck: interrupt rates on individual nodes
  - Self-limiting network: nodes have to exit to get real work done!
9Evaluation Methodologies
- Simulation based
  - Network topology
  - Distribution of object popularity
  - Distribution of replication density of objects
10Evaluation Methods
- Network topologies
  - Uniform Random Graph (Random)
    - Average and median node degree is 4
  - Power-Law Random Graph (PLRG)
    - Max node degree 1746, median 1, average 4.46
  - Gnutella network snapshot (Gnutella)
    - Oct 2000 snapshot
    - Max node degree 136, median 2, average 5.5
  - Two-dimensional grid (Grid)
11Modeling Methods
- Object popularity distribution $p_i$
  - Uniform
  - Zipf-like
- Object replication density distribution $r_i$
  - Uniform
  - Proportional: $r_i \propto p_i$
  - Square-Root: $r_i \propto \sqrt{p_i}$
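As a rough illustration of the distributions listed on this slide, here is a minimal Python sketch. The function names, the exponent `alpha`, the object count, and the copy budget are all assumptions chosen for the example; the slides only say "Zipf-like" and do not give parameters.

```python
import math

def zipf_like(m, alpha=1.0):
    """Query popularity p_i proportional to 1 / i**alpha, normalized to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def replication(p, total_copies, strategy):
    """Split total_copies across objects; returns fractional copy counts r_i."""
    if strategy == "uniform":
        weights = [1.0] * len(p)
    elif strategy == "proportional":
        weights = list(p)
    elif strategy == "square-root":
        weights = [math.sqrt(x) for x in p]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    scale = total_copies / sum(weights)
    return [w * scale for w in weights]

# Illustrative parameters only (not taken from the talk).
p = zipf_like(m=100, alpha=1.2)
for strategy in ("uniform", "proportional", "square-root"):
    r = replication(p, total_copies=1000, strategy=strategy)
    print(f"{strategy:>12}: most popular {r[0]:.1f} copies, least popular {r[-1]:.2f}")
```

Square-root allocation sits between the other two: popular objects get more copies than under uniform, but fewer than under proportional.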
12Evaluation Metrics
- Overhead: average number of messages per node per query
- Probability of search success: Pr(success)
- Delay: number of hops until success
13Load on Individual Nodes
- Why is a node interrupted?
  - To process a query
  - To route the query to other nodes
  - To process duplicated queries sent to it
14Duplication in Flooding-Based Searches
[Figure: example graph with nodes numbered 1 to 8, illustrating duplicate messages in flooding]
- Duplication increases as TTL increases in flooding
- Worst case: a node A is interrupted by $N \cdot q \cdot \mathrm{degree}(A)$ messages
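To make the duplication effect concrete, the following is a small Python sketch of TTL-limited flooding that counts duplicate deliveries. It is not the simulator behind the talk: the graph generator `random_graph`, the graph size, and the rule that a node never sends a query back to the neighbor it came from are assumptions made for this example.

```python
import random
from collections import deque

def random_graph(n, avg_degree, seed=0):
    """Undirected random graph as adjacency sets (hypothetical helper)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    edges_wanted = n * avg_degree // 2
    edges = 0
    while edges < edges_wanted:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            edges += 1
    return adj

def flood(adj, source, ttl):
    """Flood from source; return (nodes reached, messages sent, duplicates)."""
    seen = {source}
    frontier = deque([(source, None, ttl)])
    messages = duplicates = 0
    while frontier:
        node, parent, t = frontier.popleft()
        if t == 0:
            continue  # TTL exhausted: this node does not forward
        for nb in adj[node]:
            if nb == parent:
                continue  # assumed: never send back to the previous hop
            messages += 1
            if nb in seen:
                duplicates += 1  # receiver already saw this query and drops it
            else:
                seen.add(nb)
                frontier.append((nb, node, t - 1))
    return len(seen), messages, duplicates

adj = random_graph(n=1000, avg_degree=4)
for ttl in range(1, 8):
    reached, messages, duplicates = flood(adj, source=0, ttl=ttl)
    print(f"TTL={ttl}: reached {reached}, messages {messages}, duplicates {duplicates}")
```

Every message either reaches a new node or is a duplicate, so `messages == reached - 1 + duplicates`; once most nodes have been reached, nearly all additional traffic is duplicates.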
15Duplications in Various Network Topologies
16Relationship between TTL and Search Successes
17Problems with Simple TTL-Based Flooding
- Hard to choose TTL
  - For objects that are widely present in the network, small TTLs suffice
  - For objects that are rare in the network, large TTLs are necessary
- Number of query messages grows exponentially as TTL grows
18Idea 1: Adaptively Adjust TTL
- Expanding Ring
  - Multiple floods: start with TTL = 1; increment TTL by 2 each time until search succeeds
- Success varies by network topology
  - For Random, 30- to 70-fold reduction in message traffic
  - For Power-law and Gnutella graphs, only 3- to 9-fold reduction
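The expanding-ring idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the talk's simulator: the `torus` test topology, the `max_ttl` cutoff, and the simplification that a node forwards to all neighbors (including the one it heard from) are choices made for this example. Only the start TTL of 1 and the step of 2 come from the slide.

```python
def torus(n):
    """n x n grid with wrap-around edges (hypothetical test topology)."""
    return {(x, y): [((x + 1) % n, y), ((x - 1) % n, y),
                     (x, (y + 1) % n), (x, (y - 1) % n)]
            for x in range(n) for y in range(n)}

def flood_messages(adj, source, ttl, holders):
    """One TTL-limited flood. Returns (found, messages sent)."""
    seen = {source}
    frontier = [source]
    messages = 0
    found = source in holders
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nb in adj[node]:
                messages += 1  # counted even if nb has already seen the query
                if nb not in seen:
                    seen.add(nb)
                    found = found or nb in holders
                    next_frontier.append(nb)
        frontier = next_frontier
    return found, messages

def expanding_ring(adj, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Repeat floods with growing TTL; returns (successful TTL or None, total messages)."""
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        found, messages = flood_messages(adj, source, ttl, holders)
        total += messages  # earlier, failed floods still cost traffic
        if found:
            return ttl, total
        ttl += step
    return None, total

grid = torus(10)
ttl, total = expanding_ring(grid, source=(0, 0), holders={(1, 1)})
_, full = flood_messages(grid, (0, 0), 9, {(1, 1)})
print(f"expanding ring stopped at TTL={ttl} after {total} messages; one TTL=9 flood costs {full}")
```

The saving comes from stopping early for nearby objects; for objects that need a large TTL, the repeated floods add overhead instead.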
19Limitations of Expanding Ring
20Idea 2: Random Walk
- Simple random walk
  - Takes too long to find anything!
- Multiple-walker random walk
  - N agents, each walking T steps, visit as many nodes as 1 agent walking NT steps
  - When to terminate the search: check back with the query originator once every C steps
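A minimal Python sketch of a multiple-walker search with periodic check-back follows. Several details are assumptions made for the example rather than statements from the talk: the `torus` topology, the default walker count and check interval, counting each check-back as one message per walker, and the simplification that all walkers (including one that has already found the object) keep walking until the next check-back.

```python
import random

def torus(n):
    """n x n grid with wrap-around edges (hypothetical test topology)."""
    return {(x, y): [((x + 1) % n, y), ((x - 1) % n, y),
                     (x, (y + 1) % n), (x, (y - 1) % n)]
            for x in range(n) for y in range(n)}

def k_walker_search(adj, source, holders, walkers=16, check_every=4,
                    max_steps=1024, seed=0):
    """Returns (hops until first hit or None, total messages incl. check-backs)."""
    rng = random.Random(seed)
    positions = [source] * walkers
    messages = 0
    found_at = None
    for step in range(1, max_steps + 1):
        for w in range(walkers):
            positions[w] = rng.choice(adj[positions[w]])
            messages += 1  # one hop = one query message
            if found_at is None and positions[w] in holders:
                found_at = step
        if step % check_every == 0:
            messages += walkers  # each walker contacts the originator
            if found_at is not None:
                break  # originator reports success, walkers stop
    return found_at, messages

grid = torus(10)
hops, messages = k_walker_search(grid, source=(0, 0), holders={(5, 5), (2, 7)})
print(f"found after {hops} hops using {messages} messages")
```

The check interval trades traffic for responsiveness: a small value of `check_every` stops walkers sooner after a hit but adds more check-back messages.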
21Search Traffic Comparison
22Search Delay Comparison
23Lessons Learnt about Search Methods
- Adaptive termination
- Minimize message duplication
- Small expansion in each step
24Flexible Replication
- In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
- Limited node storage => what's the optimal replication density distribution?
  - In Gnutella, only nodes who query an object store it => $r_i \propto p_i$
  - What if we have different replication strategies?
25Optimal $r_i$ Distribution
- Goal: minimize $\sum_i p_i / r_i$, where $\sum_i r_i = R$
- Calculation:
  - Introduce Lagrange multiplier $\lambda$; find $r_i$ and $\lambda$ that minimize $\sum_i p_i / r_i + \lambda \left( \sum_i r_i - R \right)$
  - => $\lambda - p_i / r_i^2 = 0$ for all $i$
  - => $r_i \propto \sqrt{p_i}$
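The last step of the calculation can be spelled out. The following is a sketch of the algebra, assuming every $p_i > 0$ and treating each $r_i$ as a continuous positive quantity; the closed forms for $r_i$ and for the minimum are derived here and are not stated on the slide.

```latex
\begin{align*}
\lambda - \frac{p_i}{r_i^2} = 0
  &\;\Rightarrow\; r_i = \sqrt{\frac{p_i}{\lambda}}, \\
\sum_j r_j = R
  &\;\Rightarrow\; \sqrt{\lambda} = \frac{1}{R} \sum_j \sqrt{p_j}, \\
\text{hence}\quad r_i &= R \, \frac{\sqrt{p_i}}{\sum_j \sqrt{p_j}},
\qquad
\sum_i \frac{p_i}{r_i} = \frac{1}{R} \Bigl( \sum_i \sqrt{p_i} \Bigr)^2 .
\end{align*}
```

Each term $p_i / r_i$ is convex for $r_i > 0$, so this stationary point is a minimum rather than a maximum.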
26Square-Root Distribution
- General principle: to minimize $\sum_i p_i / r_i$ under constraint $\sum_i r_i = R$, make $r_i$ proportional to the square root of $p_i$
- Other application examples
  - Bandwidth allocation to minimize expected download times
  - Server load balancing to minimize expected request latency
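A quick numeric check of the principle, as a Python sketch. The probabilities `p` and the budget `R` are made-up values chosen so the arithmetic is easy to follow; the helper names are hypothetical.

```python
import math

def expected_search_cost(p, r):
    """Sum of p_i / r_i: the quantity being minimized."""
    return sum(pi / ri for pi, ri in zip(p, r))

def allocate(weights, total):
    """Scale weights so that the allocations sum to total."""
    s = sum(weights)
    return [total * w / s for w in weights]

p = [0.64, 0.16, 0.16, 0.04]  # illustrative query probabilities (assumed)
R = 10.0                      # illustrative total replica budget (assumed)

costs = {
    "uniform": expected_search_cost(p, allocate([1.0] * len(p), R)),
    "proportional": expected_search_cost(p, allocate(p, R)),
    "square-root": expected_search_cost(p, allocate([math.sqrt(x) for x in p], R)),
}
for name, cost in costs.items():
    print(f"{name:>12}: {cost:.3f}")
# prints 0.400, 0.400 and 0.324 respectively
```

Uniform and proportional allocation give the same value here (the number of objects divided by R), while the square-root allocation is strictly lower.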
27Achieving Square-Root Distribution
- Suggestions from some heuristics
  - Store an object at a number of nodes that is proportional to the number of nodes visited in order to find the object
  - Each node uses random replacement
- Two implementations
  - Path replication: store the object along the path of a successful walk
  - Random replication: store the object randomly among nodes visited by the agents
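The two implementations can be sketched in Python as below. All names and the per-node `capacity` are hypothetical. The `copies` argument of `random_replication` is an assumption: the slide does not say how many copies random replication places, so the caller chooses (for example, the length of the successful path).

```python
import random

def replicate(stores, capacity, nodes, obj, rng):
    """Copy obj onto each given node, evicting a random object when full."""
    for node in nodes:
        cache = stores[node]
        if obj in cache:
            continue
        if len(cache) >= capacity:
            cache.remove(rng.choice(sorted(cache)))  # random replacement
        cache.add(obj)

def path_replication(stores, capacity, walk_path, obj, rng):
    """Store obj on every node along the successful walk."""
    replicate(stores, capacity, walk_path, obj, rng)

def random_replication(stores, capacity, visited, copies, obj, rng):
    """Store obj on `copies` nodes drawn at random from all visited nodes."""
    chosen = rng.sample(sorted(visited), min(copies, len(visited)))
    replicate(stores, capacity, chosen, obj, rng)

rng = random.Random(0)
stores = {node: set() for node in range(10)}
path_replication(stores, capacity=2, walk_path=[3, 4, 5], obj="song", rng=rng)
random_replication(stores, capacity=2, visited={0, 1, 2, 6, 7, 8, 9},
                   copies=3, obj="video", rng=rng)
print({node: sorted(cache) for node, cache in stores.items() if cache})
```

In both variants the number of copies grows with the search effort, which is the property the heuristic on the slide asks for.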
28Evaluation of Replication Methods
- Metrics
  - Overall message traffic
  - Search delay
- Dynamic simulation
  - Assume Zipf-like object query probability
  - Poisson query arrivals at 5 queries/sec
  - Results are measured during the interval from 5000 sec to 9000 sec
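A workload like the one described on this slide could be generated as follows. This is a sketch under assumptions: the Zipf exponent `alpha`, the object count, and the function name are not given in the talk; only the rate of 5 queries/sec and the Zipf-like shape come from the slide.

```python
import random

def query_stream(num_objects, rate=5.0, alpha=1.2, duration=10.0, seed=0):
    """Return a list of (time, object id) pairs.

    Arrivals are Poisson with the given rate; object ids follow a
    Zipf-like popularity with exponent alpha (assumed value).
    """
    rng = random.Random(seed)
    objects = list(range(1, num_objects + 1))
    weights = [1.0 / i ** alpha for i in objects]
    t = 0.0
    events = []
    while True:
        t += rng.expovariate(rate)  # exponential gaps give Poisson arrivals
        if t >= duration:
            return events
        events.append((t, rng.choices(objects, weights=weights)[0]))

events = query_stream(num_objects=100, duration=1000.0)
print(f"{len(events)} queries; first three: {events[:3]}")
```

Measuring only after a warm-up period, as the slide does, lets the replica distribution settle before statistics are collected.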
29Distribution of $r_i$
30Total Search Message Comparison
- Observation: path replication is slightly inferior to random replication
31Search Delay Comparison
32Summary
- Multi-walker random walk scales much better than flooding
  - It won't scale as perfectly as structured networks, but current unstructured networks can be improved significantly
- Square-root replication distribution is desirable and can be achieved via path replication