Exploiting Content Localities for Efficient Search in P2P Systems

1
Exploiting Content Localities for Efficient
Search in P2P Systems

Lei Guo (1), Song Jiang (2), Li Xiao (3), and Xiaodong Zhang (1)
(1) College of William and Mary, USA
(2) Los Alamos National Laboratory, USA
(3) Michigan State University, USA

2
Peer-to-Peer Search

Two Performance Objectives
Individual peer: improve the search quality
Internet management: minimize the search cost

"Fast, fast, fast, and the more the better!" (P2P user)
3
Existing Solutions

Generally aim at one of the two objectives and
are limited in performance on the other
Flooding
Most effective for user experience
Least efficient in network resource utilization
Random walk
Traffic efficient, but
Long response time and a limited number of search
results
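
The trade-off between the two schemes can be illustrated with a minimal sketch. The toy overlay `graph`, the `has_file` predicate, and the message-counting convention (one message per forwarded copy) are assumptions made for illustration; they are not taken from the slides.

```python
import random
from collections import deque

def flood(graph, start, has_file, ttl):
    """TTL-limited flooding: each reached peer forwards the query to all neighbors."""
    visited = {start}
    frontier = deque([(start, 0)])
    hits, messages = [], 0
    while frontier:
        peer, depth = frontier.popleft()
        if peer != start and has_file(peer):
            hits.append(peer)
        if depth == ttl:
            continue  # TTL exhausted, do not forward further
        for neighbor in graph[peer]:
            messages += 1  # duplicate deliveries still cost a message
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return hits, messages

def random_walk(graph, start, has_file, max_steps, rng):
    """A single walker forwards the query to one random neighbor per step."""
    peer, hits, messages = start, [], 0
    for _ in range(max_steps):
        peer = rng.choice(graph[peer])
        messages += 1
        if has_file(peer) and peer not in hits:
            hits.append(peer)
    return hits, messages

# Hypothetical five-peer overlay; peers 3 and 4 hold the requested file.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
has_file = lambda peer: peer in {3, 4}

flood_hits, flood_msgs = flood(graph, 0, has_file, ttl=3)
walk_hits, walk_msgs = random_walk(graph, 0, has_file, max_steps=20, rng=random.Random(0))
```

Flooding reaches every holder within the TTL but pays for every redundant link, while the walker's cost is bounded by `max_steps` at the price of possibly missing holders.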

4
Super-Node Architecture

Super-node
Index server for its leaf nodes
Problems
Index-based search has limits
Hard for full-text search
Impossible for encrypted content search
Not responsible for the content quality of its
leaf nodes
The structure becomes large and inefficient
A leaf node has to connect to multiple
super-nodes to avoid a single point of failure
Generating an increasingly large number of
super-nodes

5
Gnutella Population in One Day (2003)
[Figure: number of peers and number of super peers over one day]
On average, one super node connects to only 3-4 peers!
6
Outline

Our Measurement Study
CAC: Constructing Content Abundant Cluster
SPIRP: Selectively Prefetching Indices from
Responding Peers
CAC-SPIRP: Combining CAC and SPIRP
Performance Evaluation
Conclusion

7
Our Measurement Study

Existing measurement studies
A small percentage of popular files accounts for
most shared storage and transmissions in P2P
systems
A small number of peers contribute the majority
of files in P2P systems
These are only indirect evidence of content
locality
Some files may never be accessed, or accessed
only rarely
Our purpose
Fully understand the localities in the peer
community and in individual peers
Get first-hand traces for our simulation study

8
Trace Collection

Four-day crawl of the Gnutella network
Based on the open-source code of LimeWire Gnutella
Session-based collection (for the whole lifetime
of peers)
Traces of queries sent by different peers
25,764 peers
409,129 queries
Content indices of different peers
Full indices of 18,255 peers
37 free riders

9
Content Locality in the Peer Community
A small group of peers can reply to nearly all
queries and provide most of the results
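
One way to quantify this kind of locality from a trace is to rank peers by how many results they returned and measure the share contributed by the top fraction. The sketch below assumes a hypothetical `counts` mapping (peer to number of results returned); the numbers are toy values, not measurements from the study.

```python
def top_share(counts, fraction):
    """Share of all results contributed by the top `fraction` of peers."""
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ranked) * fraction))  # at least one peer
    return sum(ranked[:k]) / sum(ranked)

# Toy data for ten hypothetical peers (not from the Gnutella trace).
counts = {"p1": 80, "p2": 10, "p3": 4, "p4": 2, "p5": 1,
          "p6": 1, "p7": 1, "p8": 1, "p9": 0, "p10": 0}

share_top_10_percent = top_share(counts, 0.10)
share_top_20_percent = top_share(counts, 0.20)
```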
10
The Localities of Search Interests of Individual
Peers
[Figure: Result Contributions and Query Contributions of the Top Query Responders and Top Result Providers]

A peer can get search results from a small number
of its top query responders, because they share the
same search interests
Similar to the idea in the Locality of Interest
scheme, but our conclusion is based on real P2P
systems

11
Reorganizing the P2P Management Structure

Clustering the small number of content-abundant
peers
Prefetching indices from the top query
responders

12
CAC: Constructing Content Abundant Cluster

Objectives
Clustering the small number of content-abundant
peers in the P2P overlay
Providing high-quality and fast service
Content Abundant Cluster
An overlay on top of the P2P network
Self-evaluating, self-identifying, and self-organizing
Persistent public service for all peers in the
system
Strongly content-based (not index-based)

13
CAC System Structure
Clustering
Leveling
Dynamic Update
[Figure: CAC structure diagram]
14
CAC Search Operations

Queries are sent to CAC first
Up-flowing operation
Flooding in CAC
Unsatisfied queries are propagated from CAC to
the whole system
Down-flooding operation
Propagated from low levels to high levels
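
The two operations can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that a peer's level is its hop distance to the nearest CAC member, that the CAC members can all reach each other, and that `graph`, `shares`, and the function names are hypothetical.

```python
from collections import deque

def assign_levels(graph, cac):
    """Level 0 for CAC members; other peers get their hop distance to the CAC."""
    level = {p: 0 for p in cac}
    queue = deque(sorted(cac))
    while queue:
        peer = queue.popleft()
        for neighbor in graph[peer]:
            if neighbor not in level:
                level[neighbor] = level[peer] + 1
                queue.append(neighbor)
    return level

def cac_search(graph, cac, shares, start, keyword, min_results):
    level = assign_levels(graph, cac)
    # Up-flowing: forward the query toward lower levels until it enters the CAC.
    peer, up_hops = start, 0
    while level[peer] > 0:
        peer = min(graph[peer], key=lambda n: level[n])
        up_hops += 1
    # Flooding in CAC: every CAC member checks its own content.
    results = sorted(p for p in cac if keyword in shares.get(p, ()))
    if len(results) >= min_results:
        return results, "cac", up_hops
    # Down-flooding: propagate only along links from level k to level k + 1.
    frontier, seen = sorted(cac), set(cac)
    while frontier:
        next_frontier = []
        for p in frontier:
            for n in graph[p]:
                if level[n] == level[p] + 1 and n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
                    if n != start and keyword in shares.get(n, ()):
                        results.append(n)
        frontier = next_frontier
    return results, "system", up_hops

# Hypothetical overlay: A and B form the CAC.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "D": ["B", "E"], "E": ["C", "D", "F"], "F": ["E"]}
cac = {"A", "B"}
shares = {"A": {"beethoven"}, "E": {"beatles"}, "F": {"beatles"}}

satisfied_in_cac = cac_search(graph, cac, shares, "F", "beethoven", 1)
needs_down_flooding = cac_search(graph, cac, shares, "D", "beatles", 1)
```

Links between peers of the same level are never used during down-flooding in this sketch, which is one plausible reading of the "unused links" shown on the down-flooding slide.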

15
Up-flowing
[Figure: a query flowing up toward the CAC]
16
Down-flooding
[Figure: down-flooding from the CAC, with unused links marked]
17
SPIRP: Selectively Prefetching Indices from
Responding Peers

Basic operations
Peer I initiates a query q
If the query hits in the locally stored indices, displays the results
If it misses, sends q
Peer R responds to query q
Sends the query results and
piggybacks the indices of all its shared files
Peer I receives the response
Displays the search results and
stores the piggybacked indices
Index updating
Active updating of indices by responding peers
Updating of indices on demand by requesting peers
Replacement of file indices
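
The basic operations above can be sketched as a small cache kept by the requesting peer. The slides do not name the replacement policy or the unit of the storage budget, so this sketch assumes least-recently-used eviction per responder and a budget counted in file indices; `SpirpPeer`, `network`, and the file names are hypothetical.

```python
from collections import OrderedDict

class SpirpPeer:
    """Requesting peer I with a bounded cache of prefetched indices."""

    def __init__(self, capacity):
        self.capacity = capacity    # max cached file indices (assumed budget unit)
        self.cache = OrderedDict()  # responder -> set of file names, oldest first

    def lookup(self, keyword):
        hits = sorted((r, f) for r, files in self.cache.items()
                      for f in files if keyword in f)
        for responder in {r for r, _ in hits}:
            self.cache.move_to_end(responder)  # mark as recently useful
        return hits

    def store(self, responder, files):
        files = set(files)
        if len(files) > self.capacity:
            return  # index set too large to cache at all
        self.cache.pop(responder, None)  # refresh an existing entry
        while sum(map(len, self.cache.values())) + len(files) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used responder
        self.cache[responder] = files

    def search(self, keyword, network):
        hits = self.lookup(keyword)
        if hits:
            return hits, "cache"  # query hit: no message leaves the peer
        results = []
        for responder, files in network.items():
            matches = [f for f in files if keyword in f]
            if matches:
                results += [(responder, f) for f in matches]
                self.store(responder, files)  # piggybacked indices of all shared files
        return sorted(results), "network"

# Hypothetical responders and their shared files.
network = {"R1": ["beethoven_symphony5.mp3", "mozart_requiem.mp3"],
           "R2": ["beatles_help.mp3"]}
peer = SpirpPeer(capacity=2)
first = peer.search("beethoven", network)   # miss, answered by R1
second = peer.search("mozart", network)     # hit in R1's prefetched indices
third = peer.search("beatles", network)     # miss, R2's indices replace R1's
```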

18
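SPIRP Technique
[Figure: peers I, R1 (classic music), and R2 (pop music); query "Beethoven mp3"]
19
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); query "Beetle mp3"; label "NULL"]
20
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); query "Beetle mp3"]
21
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); query "Beetle mp3"; label "Not enough space to save indices"]
22
SPIRP Technique
[Figure: peers I, R1 (classic), and R2 (pop); query "Beetle mp3"; label "Replace complete"]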
23
CAC-SPIRP

CAC: application-level infrastructure
Significantly reduces bandwidth consumption
Good response time when queries succeed in CAC
Long response time when queries fail in CAC
SPIRP: client-oriented and overlay-independent
Significantly reduces response time
Little traffic when queries can be satisfied in
the cache
Same traffic as flooding on cache misses
CAC-SPIRP
The two techniques are easy to combine
Considers the trade-off between the two
performance objectives
Has merits in both search quality and search cost

24
Simulation Environment

Content trace and query trace
From the 4-day Gnutella crawl in our measurement study
Overlay topology
Traces by Clip2 Distributed Search Solutions
Session duration
Pareto distribution fitted from measurement
results
$P(x) = 14.5311\,x^{-1.8598}$
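
A simulator could draw session durations from this fit by inverse-transform sampling. The sketch below rests on an assumption the slide does not confirm: that P(x) is a normalized Pareto density of the form alpha * x_min^alpha / x^(alpha + 1). Under that reading the shape is alpha = 0.8598 and the scale x_min follows from the coefficient 14.5311. The time unit is not given on the slide and is left unspecified here.

```python
import random

# Assumption: P(x) = ALPHA * X_MIN**ALPHA * x**-(ALPHA + 1), a normalized Pareto density.
COEFFICIENT = 14.5311
ALPHA = 1.8598 - 1.0                                # exponent is -(ALPHA + 1)
X_MIN = (COEFFICIENT / ALPHA) ** (1.0 / ALPHA)      # scale implied by the coefficient

def sample_session_duration(rng):
    """Inverse-transform sample: solve F(x) = 1 - (X_MIN / x)**ALPHA for x."""
    u = 1.0 - rng.random()                          # uniform in (0, 1]
    return X_MIN / u ** (1.0 / ALPHA)

rng = random.Random(42)
durations = [sample_session_duration(rng) for _ in range(1000)]
```

With a shape below 1 the distribution has no finite mean, so a few very long sessions dominate any sample average; a simulator may want to cap durations at the trace length.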

25
Evaluation Metrics

Query success rate
CAC: success rate in CAC (normalized to flooding)
SPIRP: success rate in local cache (normalized to
flooding)
Overall network traffic
Accumulated communication traffic for all
queries, responses, and index transfers
(normalized to flooding)
Average response time
Measured as the number of routing hops (normalized to
flooding)
Evaluated for different query satisfaction levels
1, 10, or 50 results, representing different user
demands
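
As a rough sketch of how such normalized metrics might be computed from per-query simulation records, consider the following. The record fields (`results`, `messages`, `hops`) and all numbers are made-up toy values for illustration, not results from the study.

```python
def summarize(records, min_results):
    """Raw metrics over a list of per-query records."""
    satisfied = [r for r in records if r["results"] >= min_results]
    return {
        "success_rate": len(satisfied) / len(records),
        "traffic": sum(r["messages"] for r in records),
        "response_time": sum(r["hops"] for r in records) / len(records),
    }

def normalized_to_flooding(scheme_records, flooding_records, min_results):
    """Each metric of the scheme divided by the same metric under flooding."""
    scheme = summarize(scheme_records, min_results)
    flooding = summarize(flooding_records, min_results)
    return {name: scheme[name] / flooding[name] for name in scheme}

# Toy records for the same two queries under flooding and under a hypothetical scheme.
flooding_records = [{"results": 12, "messages": 100, "hops": 4},
                    {"results": 3, "messages": 100, "hops": 4}]
scheme_records = [{"results": 12, "messages": 20, "hops": 2},
                  {"results": 0, "messages": 30, "hops": 4}]

metrics = normalized_to_flooding(scheme_records, flooding_records, min_results=1)
```

A normalized value below 1 means the scheme is lower than flooding on that metric, which is desirable for traffic and response time but not for success rate.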

26
Performance Evaluation for CAC
[Figure: Overall Traffic (Normalized), Success Rate in CAC (normalized), and Avg Response Time (Normalized), each for Minimum Results 1, 10, and 50]
5 top content abundant peers are good enough
for cluster construction
27
CAC Member Selection
[Figure: Avg Response Time (Normalized), Success Rate in CAC (normalized), and Overall Traffic (Normalized) versus Success Response Rate of CAC Peers (0 to 0.04), each for Minimum Results 1, 10, and 50]

Overall traffic is not sensitive to CAC member
quality
Traffic can be significantly reduced even for
randomly selected CAC members
CAC down flooding is very efficient

28
CAC-SPIRP Overall Performance
[Figure: Success Rate in Local Cache, Average Response Time (Normalized), and Overall Traffic (Normalized) versus Size of Incoming Index Set Buffer (in M Bytes), 0 to 10]
CAC-SPIRP reduces both the overall traffic and
response time significantly
29
Conclusion

CAC-SPIRP fundamentally addresses the P2P search
problem through reorganization.
Exploiting organizational content locality
CAC: a content-abundant cluster that provides
high-quality and fast service
Exploiting user content locality
SPIRP: a client prefetching technique that speeds up
search by avoiding unnecessary queries
