Search and Replication in Unstructured Peer-to-Peer Networks - PowerPoint PPT Presentation

About This Presentation
Title:

Search and Replication in Unstructured Peer-to-Peer Networks

Description:

Multiple-walker random walk ... Path replication: store the object along the path of a successful 'walk' ... Multi-walker random walk scales much better than flooding ... – PowerPoint PPT presentation

Number of Views:28
Avg rating:3.0/5.0
Slides: 33
Provided by: Pei4
Category:

less

Transcript and Presenter's Notes

Title: Search and Replication in Unstructured Peer-to-Peer Networks


1
Search and Replication in Unstructured
Peer-to-Peer Networks
  • Pei Cao
  • Cisco Systems, Inc.
  • (Joint work with Christine Lv, Edith Cohen, Kai
    Li and Scott Shenker)

2
Disclaimer
  • Results, statements, opinions in this talk do not
    represent Cisco in anyway
  • This talk is about technical problems in
    networking, and does not discuss moral, legal and
    other issues related to P2P networks and their
    applications

3
Outline
  • Brief survey of P2P architectures
  • Evaluation methodologies
  • Search methods
  • Replication strategies and analysis
  • Simulation results

4
Characteristics of Peer-to-Peer Networks
  • Unregulated overlay network
  • Current application file swapping
  • Dynamic nodes join or leave frequently
  • Example systems
  • Napster, Gnutella
  • Freenet, FreeHaven, MajoNation, Alpine, ...
  • JXTA, Ohaha,
  • Chord, CAN, Past, Tapestry, Oceanstore

5
Architecture Comparisons
  • Napster centralized
  • A central website to hold file directory of all
    participants Very efficient
  • Scales
  • Problem Single point of failure
  • Gnutella decentralized
  • No central directory use flooding w/ TTL
  • Very resilient against failure
  • Problem Doesnt scale

6
Architecture Comparisons
  • Various research projects such as CAN
    decentralized, but structured
  • CAN distributed hash table
  • Structure all nodes participate in a precise
    scheme to maintain certain invariants
  • Extra work when nodes join and leave
  • Scales very well, but can be fragile

7
Architecture Comparisons
  • FreeNet decentralized, but semi-structured
  • Intended for file storage
  • Files are stored along a route biased by hints
  • Queries for files follow a route biased by the
    same hints
  • Scales very well
  • Problem would it really work?
  • Simulation says yes in most cases, but no proof
    so far

8
Our Focus Gnutella-Style Systems
  • Advantages of Gnutella
  • Support more flexible queries
  • Typically, precise name search is a small
    portion of all queries
  • Simplicity, high resilience against node failures
  • Problems of Gnutella Scalability
  • Bottleneck interrupt rates on individual nodes
  • Self-limiting network nodes have to exit to get
    real work done!

9
Evaluation Methodologies
  • Simulation based
  • Network topology
  • Distribution of object popularity
  • Distribution of replication density of objects

10
Evaluation Methods
  • Network topologies
  • Uniform Random Graph (Random)
  • Average and median node degree is 4
  • Power-Law Random Graph (PLRG)
  • max node degree 1746, median 1, average 4.46
  • Gnutella network snapshot (Gnutella)
  • Oct 2000 snapshot
  • max degree 136, median 2, average 5.5
  • Two-dimensional grid (Grid)

11
Modeling Methods
  • Object popularity distribution pi
  • Uniform
  • Zipf-like
  • Object replication density distribution ri
  • Uniform
  • Proportional ri ? pi
  • Square-Root ri ? ? pi

12
Evaluation Metrics
  • Overhead average of messages per node per
    query
  • Probability of search success Pr(success)
  • Delay of hops till success

13
Load on Individual Nodes
  • Why is a node interrupted
  • To process a query
  • To route the query to other nodes
  • To process duplicated queries sent to it

14
Duplication in Flooding-Based Searches
1
3
2
4
6
5
7
8
. . . . . . . . . . . .
  • Duplication increases as TTL increases in
    flooding
  • Worst case a node A is interrrupted by N q
    degree(A) messages

15
Duplications in Various Network Topologies
16
Relationship between TTL and Search Successes
17
Problems with Simple TTL-Based Flooding
  • Hard to choose TTL
  • For objects that are widely present in the
    network, small TTLs suffice
  • For objects that are rare in the network, large
    TTLs are necessary
  • Number of query messages grow exponentially as
    TTL grows

18
Idea 1 Adaptively Adjust TTL
  • Expanding Ring
  • Multiple floods start with TTL1 increment TTL
    by 2 each time until search succeeds
  • Success varies by network topology
  • For Random, 30- to 70- fold reduction in
    message traffic
  • For Power-law and Gnutella graphs, only
  • 3- to 9- fold reduction

19
Limitations of Expanding Ring
20
Idea 2 Random Walk
  • Simple random walk
  • takes too long to find anything!
  • Multiple-walker random walk
  • N agents after each walking T steps visits as
    many nodes as 1 agent walking NT steps
  • When to terminate the search check back with the
    query originator once every C steps

21
Search Traffic Comparison
22
Search Delay Comparison
23
Lessons Learnt about Search Methods
  • Adaptive termination
  • Minimize message duplication
  • Small expansion in each step

24
Flexible Replication
  • In unstructured systems, search success is
    essentially about coverage visiting enough nodes
    to probabilistically find the object gt
    replication density matters
  • Limited node storage gt whats the optimal
    replication density distribution?
  • In Gnutella, only nodes who query an object store
    it gt ri ? pi
  • What if we have different replication strategies?

25
Optimal ri Distribution
  • Goal minimize ?( pi/ ri ), where ? ri R
  • Calculation
  • introduce Lagrange multiplier ?, find ri and ?
    that minimize
  • ?( pi/ ri ) ? (? ri - R)
  • gt ? - pi/ ri2 0 for all i
  • gt ri ? ? pi

26
Square-Root Distribution
  • General principle to minimize ?( pi/ ri ) under
    constraint ? ri R, make ri propotional to
    square root of pi
  • Other application examples
  • Bandwidth allocation to minimize expected
    download times
  • Server load balancing to minimize expected
    request latency

27
Achieving Square-Root Distribution
  • Suggestions from some heuristics
  • Store an object at a number of nodes that is
    proportional to the number of node visited in
    order to find the object
  • Each node uses random replacement
  • Two implementations
  • Path replication store the object along the path
    of a successful walk
  • Random replication store the object randomly
    among nodes visited by the agents

28
Evaluation of Replication Methods
  • Metrics
  • Overall message traffic
  • Search delay
  • Dynamic simulation
  • Assume Zipf-like object query probability
  • 5 query/sec Poisson arrival
  • Results are during 5000sec-9000sec

29
Distribution of ri
30
Total Search Message Comparison
  • Observation path replication is slightly
    inferior to random replication

31
Search Delay Comparison
32
Summary
  • Multi-walker random walk scales much better than
    flooding
  • It wont scale as perfectly as structured
    network, but current unstructured network can be
    improved significantly
  • Square-root replication distribution is desirable
    and can be achieved via path replication
Write a Comment
User Comments (0)
About PowerShow.com