Exploiting Content Localities for Efficient Search in P2P Systems

Provided by: LAS3
Learn more at: http://www.cse.msu.edu

1
Exploiting Content Localities for Efficient
Search in P2P Systems
  • Lei Guo (1), Song Jiang (2), Li Xiao (3), and
  • Xiaodong Zhang (1)
  • (1) College of William and Mary, USA
  • (2) Los Alamos National Laboratory, USA
  • (3) Michigan State University, USA

2
Peer-to-Peer Search
  • Two Performance Objectives
  • Individual peer: improve the search quality
  • Internet management: minimize the search cost

"Fast, fast, fast, and the more the better!" (P2P user)
3
Existing Solutions
  • Generally aim at one of the two objectives and
    have performance limits on the other
  • Flooding
  • Most effective for user experience
  • Least efficient in network resource utilization
  • Random walk
  • Traffic efficient, but
  • Long response time and a limited number of search
    results
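
The trade-off between the two approaches can be illustrated with a minimal sketch. The toy overlay, the holder set, and the function names below are hypothetical and do not come from the slides; the sketch only assumes that every forwarded copy of a query costs one message.

```python
import random
from collections import deque

def flood(graph, start, has_file, ttl):
    """Forward the query to every neighbor, up to ttl hops; return (hits, messages)."""
    visited = {start}
    frontier = deque([(start, 0)])
    hits, messages = [], 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == ttl:
            continue                      # TTL exhausted: do not forward further
        for nb in graph[node]:
            messages += 1                 # every forwarded copy costs one message
            if nb in visited:
                continue                  # duplicate copy: paid for, then dropped
            visited.add(nb)
            if nb in has_file:
                hits.append(nb)
            frontier.append((nb, depth + 1))
    return hits, messages

def random_walk(graph, start, has_file, max_steps, rng):
    """Forward the query to one random neighbor per step; stop at the first hit."""
    node = start
    for step in range(1, max_steps + 1):
        node = rng.choice(graph[node])
        if node in has_file:
            return [node], step           # one result, one message per step
    return [], max_steps

# Hypothetical 6-peer overlay; peers 4 and 5 hold a matching file.
graph = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
         3: [0, 4, 5], 4: [1, 3, 5], 5: [2, 3, 4]}
holders = {4, 5}

flood_hits, flood_msgs = flood(graph, 0, holders, ttl=2)
walk_hits, walk_msgs = random_walk(graph, 0, holders, 20, random.Random(1))
print("flooding:", flood_hits, flood_msgs)
print("random walk:", walk_hits, walk_msgs)
```

Flooding reaches every holder within the TTL but pays for duplicate copies on every link, while the walk sends one message per step and returns at most one result.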

4
Super-Node Architecture
  • Super-node
  • Index server for its leaf nodes
  • Problems
  • Index based search has limits
  • Hard for full-text search
  • Impossible for encrypted content search
  • Not responsible for the content quality of its
    leaf nodes
  • The structure becomes large and inefficient
  • A leaf node has to connect to multiple
    super-nodes to avoid a single point of failure
  • This generates an increasingly large number of
    super-nodes

5
Gnutella Population in One Day (2003)
[Figure: number of peers and number of super peers over one day]
On average, one super node connects to only 3-4 peers!
6
Outline
  • Our Measurement Study
  • CAC: Constructing Content Abundant Cluster
  • SPIRP: Selectively Prefetching Indices from
    Responding Peers
  • CAC-SPIRP: Combining CAC and SPIRP
  • Performance Evaluation
  • Conclusion

7
Our Measurement Study
  • Existing measurement studies
  • A small percentage of popular files account for
    most shared storage and transmissions in P2P
    systems
  • A small number of peers contribute the majority
    of files in P2P systems
  • These are only indirect evidence of content
    locality
  • Some files may never be accessed, or accessed
    only rarely
  • Our purpose
  • Fully understand the localities in the peer
    community and individual peers
  • Get first-hand traces for our simulation study

8
Trace Collection
  • Four-day crawling on the Gnutella network
  • Using the open source code of LimeWire Gnutella
  • Session-based collection (for the whole lifetime
    of peers)
  • Query sending traces by different peers
  • 25,764 peers
  • 409,129 queries
  • Content indices of different peers
  • Full indices of 18,255 peers
  • 37 free riders

9
Content Locality in the Peer Community
A small group of peers can reply to nearly all
queries and provide most of the results
10
The Localities of Search Interests of Individual
Peers
[Figure: two panels, "Top Query Responders" and "Top Result Providers"; y-axes: Query Contributions and Result Contributions]
  • A peer can get search results from a small number
    of its top query responders; they share the same
    search interests
  • Similar to the idea of the Locality of Interest
    scheme, but our conclusion is based on real P2P
    systems
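
One way to quantify this locality from a query trace is sketched below. The log format (one set of responder IDs per query) and the function name are assumptions made for illustration, not the format of the collected traces.

```python
from collections import Counter

def top_responder_coverage(query_log, k):
    """Fraction of a peer's answered queries that at least one of its
    top-k responders answered.

    query_log: list of sets; each set holds the IDs of the peers that
    responded to one query issued by this peer (hypothetical format).
    """
    # Rank responders by how many of this peer's queries they answered.
    counts = Counter(r for responders in query_log for r in responders)
    top = {peer for peer, _ in counts.most_common(k)}
    answered = [responders for responders in query_log if responders]
    if not answered:
        return 0.0
    covered = sum(1 for responders in answered if responders & top)
    return covered / len(answered)

# Hypothetical log for one peer: five queries, one of them unanswered.
log = [{"a", "b"}, {"a"}, {"a", "c"}, {"d"}, set()]
print(top_responder_coverage(log, k=1))
```

A value close to 1 for a small k means that a handful of responders could serve almost all of this peer's queries.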

11
Reorganizing the P2P Management Structure
  • Clustering the small number of content-abundant
    peers
  • Prefetching indices from the top query
    responders

12
CAC: Constructing Content Abundant Cluster
  • Objectives
  • Clustering the small number of content-abundant
    peers in the P2P overlay
  • Providing high-quality and fast service
  • Content Abundant Cluster
  • An overlay on top of the P2P network
  • Self-evaluate, self-identify, and self-organize
  • Persistent public service for all peers in the
    system
  • Strongly content-based (not index-based)

13
CAC System Structure
[Figure: CAC system structure, with the steps Clustering, Leveling, and Dynamic Update]
14
CAC Search Operations
  • Queries are sent to CAC first
  • Up-flowing operation
  • Flooding in CAC
  • Unsatisfied queries are propagated from CAC to
    the whole system
  • Down-flooding operation
  • Propagated from low levels to high levels

15
Up-flowing
[Figure: a query flowing up toward the CAC]
16
Down-flooding
[Figure: down-flooding from the CAC, with unused links marked]
17
SPIRP: Selectively Prefetching Indices from
Responding Peers
  • Basic operations
  • Peer I initiates a query q
  • If q hits in the locally stored indices, I
    displays the results
  • If q misses, I sends q to the network
  • Peer R responds to query q
  • Sends the query results and
  • piggybacks the indices of all its shared files
  • Peer I receives the response
  • Displays the search results and
  • stores the piggybacked indices
  • Index updating
  • Active updating of indices by responding peers
  • Updating of indices on demand by requesting peers
  • Replacement of file indices
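
The basic operations above can be sketched as a small simulation. Several details are assumptions made for illustration and are not stated on the slides: queries match by substring, the buffer capacity is counted in file entries, whole index sets are evicted in least-recently-used order, and a plain list of responders stands in for the network search. Class and method names are hypothetical.

```python
from collections import OrderedDict

class Responder:
    def __init__(self, name, files):
        self.name, self.files = name, set(files)

    def respond(self, q):
        """Return (results, piggybacked index of all shared files), or None."""
        results = sorted(f for f in self.files if q in f)
        return (results, set(self.files)) if results else None

class Initiator:
    def __init__(self, capacity):
        self.capacity = capacity     # max stored file entries (assumed unit)
        self.cache = OrderedDict()   # responder name -> index set, in LRU order

    def _local_hits(self, q):
        hits = []
        for name, index in list(self.cache.items()):
            matched = sorted(f for f in index if q in f)
            if matched:
                self.cache.move_to_end(name)        # mark as recently useful
                hits += [(name, f) for f in matched]
        return hits

    def _store(self, name, index):
        if len(index) > self.capacity:
            return                                  # too large to keep at all
        self.cache.pop(name, None)                  # refresh an older copy
        while self.cache and sum(map(len, self.cache.values())) + len(index) > self.capacity:
            self.cache.popitem(last=False)          # evict least recently used set
        self.cache[name] = index

    def query(self, q, network):
        hits = self._local_hits(q)
        if hits:
            return hits, "cache"                    # hit: no query is sent
        for peer in network:                        # miss: send q to the network
            reply = peer.respond(q)
            if reply:
                results, index = reply
                hits += [(peer.name, f) for f in results]
                self._store(peer.name, index)       # keep the piggybacked index
        return hits, "network"

r1 = Responder("R1", ["beethoven_5th.mp3", "mozart_40.mp3"])
r2 = Responder("R2", ["beatles_help.mp3", "abba_sos.mp3"])
peer_i = Initiator(capacity=3)
for q in ("beethoven", "mozart", "beatles"):
    print(q, peer_i.query(q, [r1, r2]), list(peer_i.cache))
```

The second query is answered from the stored index of R1 without any network traffic; the third misses, and storing the index of R2 forces the index of R1 out of the buffer.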

18
SPIRP Technique
[Figure: peers I, R1 (classic music), and R2 (pop music); query: "Beethoven mp3"]
19
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); label: NULL; query: "Beetle mp3"]
20
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); query: "Beetle mp3"]
21
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); label: not enough space to save indices; query: "Beetle mp3"]
22
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); label: replace complete; query: "Beetle mp3"]
23
CAC-SPIRP
  • CAC: application-level infrastructure
  • Significantly reduces bandwidth consumption
  • Good response time when queries succeed in CAC
  • Long response time when queries fail in CAC
  • SPIRP: client-oriented and overlay-independent
  • Significantly reduces response time
  • Little traffic when queries can be satisfied in
    the cache
  • Same traffic as flooding on a cache miss
  • CAC-SPIRP
  • The two techniques are easy to combine
  • Considers the trade-off between the two
    performance objectives
  • Has the merits of both: high search quality and
    low search cost

24
Simulation Environment
  • Content trace and query trace
  • Four-day Gnutella crawl from our measurement
    study
  • Overlay topology
  • Traces by Clip2 Distributed Search Solutions
  • Session duration
  • Pareto distribution fitted from measurement
    results
  • P(x) = 14.5311 x^{-1.8598}
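
Session durations for a simulation could be drawn from such a fit by inverse-transform sampling. The sketch below rests on assumptions the slide does not confirm: it reads the fitted curve as P(x) = 14.5311 x^(-1.8598), treats it as a probability density on [x_min, infinity), and derives x_min from the requirement that the density integrate to 1. The slide gives no time unit, so none is assumed here.

```python
import random

C = 14.5311         # coefficient of the fitted curve on the slide
EXPONENT = 1.8598   # magnitude of the exponent on the slide

# ASSUMPTION: P(x) = C * x**(-EXPONENT) is a Pareto density, i.e.
# P(x) = ALPHA * X_MIN**ALPHA / x**(ALPHA + 1) for x >= X_MIN.
ALPHA = EXPONENT - 1.0                  # tail index of the Pareto distribution
X_MIN = (C / ALPHA) ** (1.0 / ALPHA)    # makes the density integrate to 1

def sample_session_duration(rng):
    """Draw one session duration by inverting the Pareto CDF."""
    u = 1.0 - rng.random()              # uniform in (0, 1], avoids u == 0
    return X_MIN * u ** (-1.0 / ALPHA)

rng = random.Random(42)
durations = [sample_session_duration(rng) for _ in range(5)]
print(round(X_MIN, 2), [round(d, 1) for d in durations])
```

Because the tail index is below 1 under this reading, the distribution has no finite mean, so a simulation would likely need to cap very long sessions.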

25
Evaluation Metrics
  • Query success rate
  • CAC: success rate in CAC (normalized to flooding)
  • SPIRP: success rate in local cache (normalized to
    flooding)
  • Overall network traffic
  • Accumulated communication traffic for all
    queries, responses, and index transfers
    (normalized to flooding)
  • Average response time
  • Measured as the number of routing hops
    (normalized to flooding)
  • Evaluated for different query satisfaction levels
  • At least 1, 10, or 50 results, representing
    different user demands
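
A minimal sketch of how the three metrics could be normalized to a flooding baseline is given below. The per-query record format (keys "results", "messages", "hops") and the function names are hypothetical; the slides do not describe how the simulator stores its measurements.

```python
def ratio(value, baseline):
    """Normalize a value to the flooding baseline."""
    if baseline == 0:
        raise ValueError("flooding baseline is zero; ratio is undefined")
    return value / baseline

def normalized_metrics(scheme, flooding, min_results):
    """Compare per-query records of a scheme against flooding.

    scheme, flooding: lists of dicts with keys "results", "messages", "hops"
    min_results: results a query needs to count as satisfied (e.g. 1, 10, 50)
    """
    def success_rate(records):
        return sum(1 for r in records if r["results"] >= min_results) / len(records)

    def mean(records, key):
        return sum(r[key] for r in records) / len(records)

    def total(records, key):
        return sum(r[key] for r in records)

    return {
        "success_rate": ratio(success_rate(scheme), success_rate(flooding)),
        "traffic": ratio(total(scheme, "messages"), total(flooding, "messages")),
        "response_time": ratio(mean(scheme, "hops"), mean(flooding, "hops")),
    }

# Hypothetical measurements for two queries.
scheme = [{"results": 12, "messages": 40, "hops": 2},
          {"results": 3, "messages": 60, "hops": 4}]
flooding = [{"results": 20, "messages": 200, "hops": 3},
            {"results": 15, "messages": 200, "hops": 3}]
print(normalized_metrics(scheme, flooding, min_results=10))
```

Values below 1 for traffic and response time mean the scheme is cheaper or faster than flooding; a success rate below 1 means it satisfies fewer queries at that threshold.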

26
Performance Evaluation for CAC
[Figure: three panels, Overall Traffic (Normalized), Success Rate in CAC (normalized), and Avg Response Time (Normalized), each with curves for Minimum Results 1, 10, and 50]
5 top content abundant peers are good enough
for cluster construction
27
CAC Member Selection
[Figure: three panels, Avg Response Time (Normalized), Success Rate in CAC (normalized), and Overall Traffic (Normalized), plotted against the Success Response Rate of CAC Peers (0 to 0.04), each with curves for Minimum Results 1, 10, and 50]
  • Overall traffic is not sensitive to CAC member
    quality
  • Traffic can be significantly reduced even for
    randomly selected CAC members
  • CAC down-flooding is very efficient
28
CAC-SPIRP Overall Performance
[Figure: Success Rate in Local Cache, Average Response Time (Normalized), and Overall Traffic (Normalized), plotted against the Size of Incoming Index Set Buffer (in M Bytes), 0 to 10]
CAC-SPIRP reduces both the overall traffic and
response time significantly
29
Conclusion
  • CAC-SPIRP fundamentally addresses the P2P search
    problem through re-organization
  • Exploiting organizational content locality
  • CAC: a content abundant cluster provides
    high-quality and fast services
  • Exploiting user content locality
  • SPIRP: a client prefetching technique that speeds
    up search by avoiding unnecessary queries