Search and Replication in Unstructured Peer-to-Peer Networks



1
Search and Replication in Unstructured
Peer-to-Peer Networks
  • Pei Cao, Christine Lv, Edith Cohen, Kai Li, and
    Scott Shenker
  • ICS 2002

2
Outline
  • Brief survey of P2P architectures
  • Evaluation Methodology
  • Search Methods
  • Replication
  • Conclusions

3
Peer-to-Peer Networks
  • Peers are connected by an overlay network.
  • Users cooperate to share files (e.g., music,
    videos, etc.)
  • Dynamic: nodes join or leave frequently

4
P2P Network Architectures I
  • Centralized
  • Use of a central directory server (CDS)
  • Peers query the CDS to find other peers that
    hold the desired object
  • Pros: very efficient
  • Cons: scales poorly, single point of failure

5
P2P Network Architectures II
  • Decentralized: no central directory server
  • Structured
  • P2P network topology is tightly controlled
  • Files are placed at specified locations
  • Unstructured
  • No control over network topology or file placement

6
P2P Network Architectures III
  • Decentralized but structured
  • Loosely structured: placement of files is based
    on hints
  • Tightly structured: the structure of the P2P
    network and the placement of files are precisely
    specified
  • Use of a distributed hash table
  • Pros: queries are satisfied efficiently, good
    scaling
  • Cons: no proof that it works

7
P2P Network Architectures IV
  • Decentralized and unstructured
  • Placement of files is not based on knowledge of
    the topology
  • Finding files: a node queries its neighbors
    (usually by flooding)
  • Pros: extremely resilient to network changes
  • Cons: scales extremely poorly, generates large
    loads

8
Evaluation Methodology I
  • Terminology
  • Network topology: the graph formed by the nodes
    in the network at a given instant
  • Query distribution: the frequency of lookups to
    each file
  • Replication distribution: the percentage of nodes
    that hold a particular file

9
Evaluation Methodology II
  • Network topologies
  • Power-Law Random Graph (PLRG): max node degree
    1746, median 1, average 4.46
  • Normal Random Graph (Random): average and median
    node degree 4
  • Gnutella graph (Gnutella): Oct 2000 snapshot, max
    degree 136, median 2, average 5.5
  • Two-dimensional grid: 100 x 100 = 10000 nodes
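
To make two of these topologies concrete, here is a minimal sketch that builds a 100 x 100 grid and a uniform random graph with average degree 4. The function names, the absence of wrap-around at the grid edges, and the edge-sampling procedure are assumptions made for illustration. The slides do not say how the graphs were generated.

```python
import random

def grid_graph(side):
    """Adjacency lists for a side x side grid.
    Assumption: no wrap-around, so border nodes have fewer than 4 neighbors."""
    adj = {(x, y): [] for x in range(side) for y in range(side)}
    for (x, y) in adj:
        # Link each node to its right and upper neighbor; this adds every edge once.
        for dx, dy in ((1, 0), (0, 1)):
            nb = (x + dx, y + dy)
            if nb in adj:
                adj[(x, y)].append(nb)
                adj[nb].append((x, y))
    return adj

def random_graph(n, avg_degree, seed=0):
    """Random graph with exactly n * avg_degree / 2 distinct edges
    (hypothetical generator, chosen only for simplicity)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    target_edges = n * avg_degree // 2
    edges = 0
    while edges < target_edges:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj

grid = grid_graph(100)
rand = random_graph(10000, 4)
```

Both graphs have 10000 nodes, which matches the grid size given on the slide.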

10
Evaluation Methodology III
  • Object query distribution $q_i$
  • Uniform
  • Zipf-like
  • Object replication density distribution $r_i$
  • Uniform
  • Proportional: $r_i \propto q_i$
  • Square-root: $r_i \propto \sqrt{q_i}$
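
The distributions above can be sketched in a few lines. This is only an illustration: the Zipf exponent `alpha`, the function names, and the normalization of every distribution to sum to 1 are assumptions, not values taken from the slides.

```python
def zipf_like(m, alpha=1.0):
    """Zipf-like query distribution: q_i proportional to 1 / i**alpha.
    alpha is a hypothetical parameter; the slides give no value."""
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def replication(q, strategy):
    """Normalized replication density r_i for a query distribution q."""
    if strategy == "uniform":
        raw = [1.0] * len(q)
    elif strategy == "proportional":
        raw = list(q)
    elif strategy == "square-root":
        raw = [qi ** 0.5 for qi in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(raw)
    return [x / total for x in raw]

q = zipf_like(4)
r_uniform = replication(q, "uniform")
r_prop = replication(q, "proportional")
r_sqrt = replication(q, "square-root")
```

With `alpha = 1`, the most popular of four objects is queried 4 times as often as the least popular one, but under square-root replication it receives only 2 times as many copies.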

11
Evaluation Methodology IV
  • Metrics
  • User aspects
  • Pr(success)
  • Number of hops
  • Load aspects
  • Average number of messages per node
  • Number of nodes visited
  • Peak number of messages

12
Limitation of Flooding I
  • Gnutella uses a TTL (time-to-live) to limit the
    number of hops a query travels
  • Problem: it is hard to choose the TTL
  • For objects that are widely present in the
    network, small TTLs suffice
  • For objects that are rare in the network, large
    TTLs are necessary
  • The number of query messages grows exponentially
    as the TTL grows

13
Limitation of Flooding II
  • A node may receive the same message more than
    once
  • Need for duplicate-detection mechanisms
  • Even so, duplication increases as the TTL
    increases in flooding
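
A small sketch of TTL-limited flooding with duplicate detection shows where the extra messages come from. The message-counting convention here (every transmission counts, a node drops a query it has already seen, and a node never sends the query back to the neighbor it came from) is an assumption made for illustration, not a description of the Gnutella implementation.

```python
from collections import deque

def flood(adj, source, ttl):
    """Flood a query from `source` for at most `ttl` hops.
    Returns (nodes_reached, messages_sent, duplicate_messages)."""
    seen = {source}
    messages = 0
    duplicates = 0
    # Each queue entry is one transmission: (sender, receiver, hops left on arrival).
    queue = deque((source, nb, ttl) for nb in adj[source]) if ttl > 0 else deque()
    while queue:
        sender, node, hops_left = queue.popleft()
        messages += 1
        if node in seen:
            duplicates += 1      # detected and dropped, but the message was still sent
            continue
        seen.add(node)
        if hops_left > 1:
            for nb in adj[node]:
                if nb != sender:
                    queue.append((node, nb, hops_left - 1))
    return len(seen), messages, duplicates

# A 4-node cycle: node 2 is reachable from node 0 along two different paths.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

On this cycle, `flood(cycle, 0, 1)` reaches 3 nodes with 2 messages and no duplicates, while `flood(cycle, 0, 2)` reaches all 4 nodes with 4 messages, one of which is a duplicate delivered to node 2.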

14
Limitation of Flooding Conclusion
  • Flooding increases per-node overhead
  • Need for more scalable search methods:
  • Expanding ring
  • Random walks

15
Expanding Ring
  • Adaptively adjust the TTL
  • Multiple floods: start with TTL = 1 and increment
    the TTL by 2 each time until the search succeeds
  • Duplicate messages still occur
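
The expanding-ring idea can be sketched as repeated floods with a growing TTL. In this illustration a flood is idealized as "reach every node within TTL hops", and the names `within`, `expanding_ring`, `holders`, and `max_ttl` are hypothetical. The cutoff `max_ttl` is an assumption, since the slide only says to continue until the search succeeds.

```python
def within(adj, source, ttl):
    """Set of nodes at most `ttl` hops from `source` (idealized flood)."""
    frontier, seen = {source}, {source}
    for _ in range(ttl):
        frontier = {nb for v in frontier for nb in adj[v]} - seen
        seen |= frontier
    return seen

def expanding_ring(adj, source, holders, max_ttl=9, step=2):
    """Flood with TTL = 1, 3, 5, ... until some node in `holders` is reached.
    Returns (successful_ttl or None, total node visits over all rounds)."""
    ttl, total_visits = 1, 0
    while ttl <= max_ttl:
        reached = within(adj, source, ttl)
        total_visits += len(reached) - 1   # nodes near the source are revisited every round
        if reached & holders:
            return ttl, total_visits
        ttl += step
    return None, total_visits

# A path 0 - 1 - 2 - 3 - 4 - 5, with the query starting at node 0.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
```

For an object held only at node 3, `expanding_ring(path, 0, {3})` succeeds at TTL 3 after 4 node visits: node 1 is visited in both rounds, which is the repeated work the slide refers to.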
16
Random Walk
  • Simple random walk
  • Takes too long to find anything
  • Multiple-walker random walk
  • K walkers, each taking T steps, visit about as
    many nodes as 1 walker taking KT steps
  • More messages, hence more overhead
  • When to terminate the search
  • TTL
  • Checking: walkers check back with the query
    originator once every C steps
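
A minimal sketch of a K-walker random walk that combines both termination rules, TTL and periodic checking. The function name, the default parameter values, and the choice to count only walk messages (not the check-back messages) are assumptions for illustration.

```python
import random

def k_walker_search(adj, source, holders, k=32, ttl=1024, check_every=4, seed=0):
    """K walkers start at `source` and each moves to a random neighbor per step.
    Walkers learn whether the query has been satisfied only when they check back
    with the originator, every `check_every` steps; they also stop after `ttl` steps.
    Returns (found, steps_taken, walk_messages)."""
    rng = random.Random(seed)
    walkers = [source] * k
    messages = 0
    found = False
    for step in range(1, ttl + 1):
        for i in range(k):
            walkers[i] = rng.choice(adj[walkers[i]])
            messages += 1                  # one message per walker per step
            if walkers[i] in holders:
                found = True
        if step % check_every == 0 and found:
            return True, step, messages    # walkers stop at the first check after a hit
    return found, ttl, messages

# Triangle graph: every neighbor of node 0 holds the object.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

In `k_walker_search(triangle, 0, {1, 2}, k=2, ttl=8)` the object is found on the first step, yet the walkers keep going until the check at step 4. This illustrates the trade-off behind C: checking less often saves check-back traffic but wastes walk messages after the query is already satisfied.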

17
Search Traffic Comparison

18
Search Delay Comparison

19
Lessons Learned about Search Methods
  • Key: cover the right number of nodes as quickly
    as possible and with as little overhead as
    possible
  • Pay attention to:
  • Adaptive termination
  • Minimizing message duplication
  • Small expansion in each step

20
Replication
  • In unstructured P2P systems, search success is
    essentially about coverage: visiting enough nodes
    to find the object, so replication density matters
  • Goal: minimize average search size (number of
    probes until the query is satisfied)
  • Theoretical optimum: copy everything everywhere
  • But node storage is limited

21
Replication Strategies
  • Uniform replication
  • $p_i = 1/m$
  • Simple: resources are divided equally
  • Proportional replication
  • $p_i = q_i$
  • Fair: resources per item are proportional to demand
  • Reflects current P2P practices

22
Square-Root Replication
  • $p_i$ is proportional to $\sqrt{q_i}$
  • Lies in between uniform and proportional
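
To see why the square-root allocation is attractive, the three strategies can be compared under a simplified model. The model is an assumption, not something stated on the slides: probes hit nodes at random, so finding object $i$ takes about $1/p_i$ probes up to a constant factor, and the average search size is proportional to $\sum_i q_i / p_i$. The function names and the example query distribution are hypothetical.

```python
def allocate(q, exponent):
    """Replica shares p_i proportional to q_i ** exponent, normalized to sum to 1.
    exponent 0 -> uniform, 1 -> proportional, 0.5 -> square-root."""
    raw = [qi ** exponent for qi in q]
    total = sum(raw)
    return [x / total for x in raw]

def relative_search_size(q, p):
    """Average search size up to a constant factor, assuming ~1/p_i probes for object i."""
    return sum(qi / pi for qi, pi in zip(q, p))

q = [0.64, 0.16, 0.16, 0.04]   # hypothetical query distribution over m = 4 objects

cost_uniform = relative_search_size(q, allocate(q, 0))
cost_proportional = relative_search_size(q, allocate(q, 1))
cost_square_root = relative_search_size(q, allocate(q, 0.5))
```

Under this model, uniform and proportional replication give the same average search size (4 for this example), while the square-root allocation gives about 3.24, which is $(\sum_i \sqrt{q_i})^2$.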

23
Achieving Square-Root Replication I
  • Assume that each query keeps track of the number
    of probes needed
  • Store an object at a number of nodes that is
    proportional to the number of probes
  • Two implementations
  • Path replication: store the object along the path
    of a successful walk
  • Random replication: store the object randomly
    among nodes visited by the agents
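
The two implementations differ only in which nodes receive the new copies. A minimal sketch, assuming the walk is recorded as a list of nodes from requester to provider and that the provider (the last node) already holds the object. The function names and the `count` and `seed` parameters are hypothetical.

```python
import random

def path_replication(path):
    """Nodes that receive a copy: every node on the successful walk
    except the provider at the end, which already has the object."""
    return list(path[:-1])

def random_replication(visited, count, seed=0):
    """Pick `count` distinct nodes at random among all nodes visited by the walkers."""
    rng = random.Random(seed)
    candidates = sorted(visited)
    return rng.sample(candidates, min(count, len(candidates)))

walk = [0, 5, 7, 9]                     # requester 0 found the object at node 9
visited = {0, 2, 3, 5, 7, 8, 9, 11}     # nodes touched by any walker

on_path = path_replication(walk)
at_random = random_replication(visited, count=len(walk) - 1)
```

Both calls create the same number of copies, tied to the length of the successful walk, so the number of replicas grows with the number of probes as the slide requires.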

24
Achieving Square-Root Replication II

25
Evaluation of Replication Methods I
  • Metrics
  • Overall message traffic
  • Search delay
  • Dynamic simulation
  • Assume a Zipf-like object query probability
  • Poisson query arrivals at 5 queries/sec
  • Results are collected between 5000 sec and
    9000 sec
  • Search method: 32-walker random walk with state
    keeping, checking every 4 steps
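
The arrival process in this setup can be sketched by drawing exponential inter-arrival times, which is the standard way to generate a Poisson process. The function name and the seed are assumptions, and the sketch covers only query arrival times, not the search or replication steps of the simulation.

```python
import random

def query_arrivals(rate, end_time, seed=0):
    """Arrival times of a Poisson process with `rate` queries/sec on [0, end_time)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)    # exponential gap with mean 1 / rate seconds
        if t >= end_time:
            return times
        times.append(t)

arrivals = query_arrivals(rate=5, end_time=9000)
# Keep only the queries that fall in the measurement window from the slide.
measured = [t for t in arrivals if 5000 <= t < 9000]
```

At 5 queries/sec the 4000-second measurement window contains roughly 20000 queries.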

26
Evaluation of Replication Methods II
  • Square-root replication reduces search traffic

27
Evaluation of Replication Methods III

28
Conclusions
  • Multi-walker random walk scales much better than
    flooding
  • Finds data more quickly
  • Reduces traffic overhead
  • Square-root replication distribution is desirable
  • Minimizes search delay
  • Minimizes the overall search traffic