A Local Search Mechanism for Peer-to-Peer Networks

1
A Local Search Mechanism for Peer-to-Peer
Networks

Vana Kalogeraki, Dimitrios Gunopulos
Demetris Zeinalipour (University of California
Riverside)
csyiazti@cs.ucr.edu

CIKM 2002: Eleventh International Conference on
Information and Knowledge Management,
November 4-9, McLean, VA
http://www.cs.ucr.edu/csyiazti/publications.html
2
Presentation Outline

Introduction: Information Retrieval (I.R.) in
Peer-to-Peer networks.
Techniques for Distributed I.R.
Breadth-First Search.
Random Breadth-First Search.
Intelligent Search with profiling.
Experimental Evaluation.
Related Work.
Conclusions and Future Work.

3
Introduction to Peer-to-Peer

Peer-to-Peer Computing: definition
Sharing of computer resources and information
through direct exchange.

Clients (downloaders) are also servers.
Clients may join or leave the network at any time.
Highly fault-tolerant, but at a cost!
Searches are done within the virtual network,
while actual downloads are done offline (directly
between peers, over HTTP).

4
Introduction to Peer-to-Peer

Peer-to-Peer (P2P) systems are becoming
increasingly popular.
P2P file-sharing systems, such as Gnutella,
Napster and Freenet, realized a distributed
infrastructure for sharing files.
Traditionally, files were shared using the
Client-Server model (e.g., HTTP), which does not
scale well because it relies on centralized services.
P2P systems offer new advantages in simplicity of use,
robustness, self-organization and scalability.

5
Information Retrieval in P2P

Problem
How can information be retrieved efficiently in P2P
systems where each node shares a collection of
documents?

Documents consist of keywords.
The problem resembles Information Retrieval, but the
resources are now distributed.
Primary data structures, such as global inverted
indexes, cannot be maintained efficiently.

6
Solutions for P2P Information Retrieval

1) Centralized Approaches
Centralized Indexes
e.g., Napster, SETI@home
2) Purely Distributed Approaches
Each node has only local knowledge.
I.R. is done using brute-force mechanisms
e.g., Gnutella, FastTrack (Kazaa)
3) Hybrid Approaches
One or more peers have partial indexes of the
contents of others.
e.g., LimeWire's Ultrapeers

Figure: message flow in the three approaches.
Centralized Index: 1) Upload Index, 2) Query/QueryHit, 3) Download (offline).
Second diagram: 1) Connect, 2) Query/QueryHit, 3) Download (offline).
Third diagram: 1) Connect, 2) Intelligent Query/QueryHit, 3) Download (offline).
7
Motivation

On June 1st we crawled the Gnutella P2P network
for 5 hours with 17 workstations.
We analyzed 15,153,524 query messages.
Observation: high locality of specific queries.
Can we exploit this property for more
efficient searches?

9
Techniques for Distributed I.R.

1. Breadth-First Search (Gnutella)
Each Query Message is propagated along all
outgoing links of a peer using a TTL
(time-to-live).
The TTL is decremented on each forward until it
becomes 0.
This is the I.R. technique used in P2P systems
such as Gnutella.
Results?
The physical network comes to its knees.
Long delays for search results.

Figure: P2P network N with peers A, q and d; step 1: QUERY, step 2: QUERYHIT.
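A minimal sketch of the TTL-limited flooding described above. It assumes the overlay is a hypothetical adjacency dict, that a peer never sends the query straight back to the peer it came from, and that peers drop duplicate copies they have already seen (typical Gnutella behaviour, but not stated on the slide). Counting every link traversal as one message makes the cost of flooding visible.

```python
from collections import deque

def bfs_flood(graph, source, ttl):
    """Flood a query from `source` along all outgoing links.

    Returns (peers reached, number of QUERY messages sent).
    """
    seen = {source}
    messages = 0
    # Each entry: (peer holding the query, remaining TTL, peer it came from).
    frontier = deque([(source, ttl, None)])
    while frontier:
        peer, remaining, came_from = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this copy is not forwarded further
        for neighbor in graph[peer]:
            if neighbor == came_from:
                continue  # do not send the query straight back
            messages += 1  # one message per link traversal
            if neighbor not in seen:  # duplicates are dropped, not re-forwarded
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1, peer))
    return seen, messages

# Hypothetical four-peer overlay.
overlay = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
reached, msgs = bfs_flood(overlay, "A", ttl=2)
print(sorted(reached), msgs)  # ['A', 'B', 'C', 'D'] 5
```

Even here, 5 messages are needed to reach 3 other peers, because B and C send each other copies of a query they already hold; on a dense overlay such redundant copies dominate the traffic.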
10
Techniques for Distributed I.R.

2. Modified Random BFS
Each Query Message is forwarded to only a
fraction of the outgoing links (e.g., ½ of them).
The TTL is again decremented on each forward until it
becomes 0.
Results?
Fewer messages, but possibly fewer results.
This algorithm is probabilistic.
Some segments may become
unreachable.

Figure: P2P network N with peers A, B, C and d; step 1: QUERY, step 2: QUERYHIT; some segments unreachable.
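The random variant changes only which links are used. In this sketch (same hypothetical adjacency-dict overlay), each peer forwards to a random `fraction` of its links other than the one the query arrived on, rounded up so that a peer with at least one such link always forwards; the rounding rule is an assumption.

```python
import math
import random
from collections import deque

def random_bfs(graph, source, ttl, fraction=0.5, rng=None):
    """TTL-limited search that forwards each query to a random subset
    (`fraction` of the links, rounded up) of a peer's outgoing links.

    Returns (peers reached, number of QUERY messages sent).
    """
    rng = rng or random.Random()
    seen = {source}
    messages = 0
    frontier = deque([(source, ttl, None)])
    while frontier:
        peer, remaining, came_from = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted
        candidates = [n for n in graph[peer] if n != came_from]
        k = math.ceil(len(candidates) * fraction)
        for neighbor in rng.sample(candidates, k):  # random subset of links
            messages += 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1, peer))
    return seen, messages

# Hypothetical line-shaped overlay A - B - C - D.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(random_bfs(line, "A", ttl=3, rng=random.Random(42)))
```

Because the subset is random, runs with different seeds can reach different peers; this is the probabilistic behaviour, and the source of the unreachable segments, noted on the slide.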
11
Techniques for Distributed I.R.

3. Intelligent Search Mechanism (ISM)
Idea: each Query Message is forwarded
intelligently, based on which queries a peer
answered in the past.
Components of ISM (for each node u):
Profile Mechanism, maintaining a profile for each neighbor in N(u).
Peer Ranking Mechanism, for ranking peers locally
and sending a search query only to those that
are most likely to answer.
Similarity Function, for finding similar search
queries.
Search Mechanism, for propagating queries based
on local indexes.

Figure: peers A and d with neighbor profiles; step 1: QUERY, step 2: QUERYHIT.
12
Techniques for Distributed I.R.

3. Intelligent Search Mechanism (ISM)
a) Profile mechanism.
Maintains, for each neighbor, a list of past
queries routed through that host.
Every time a QueryHit is received, the table is
updated.
The Profile Manager uses a Least Recently Used (LRU)
policy to keep the most recent queries in the
repository.
Profiles are kept for neighbors only, so the cost
of maintaining them is O(Td), where T is the
maximum number of queries per profile and d is
the degree of a node.

Figure: profile repository of size Td.
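A minimal sketch of the profile repository, assuming (as one reading of the slide) that each QueryHit arriving from a neighbor records the answered query in that neighbor's profile. Each profile holds at most T queries and evicts the least recently used one, so with d neighbors the repository never exceeds T·d entries. `ProfileManager` and its methods are hypothetical names.

```python
from collections import OrderedDict

class ProfileManager:
    """Per-neighbor profiles of recently answered queries, LRU-bounded."""

    def __init__(self, T):
        self.T = T          # maximum number of queries kept per profile
        self.profiles = {}  # neighbor id -> OrderedDict(query -> None)

    def record_query_hit(self, neighbor, query):
        """Update `neighbor`'s profile when a QueryHit for `query` arrives from it."""
        profile = self.profiles.setdefault(neighbor, OrderedDict())
        if query in profile:
            profile.move_to_end(query)       # refresh: now most recently used
        else:
            profile[query] = None
            if len(profile) > self.T:
                profile.popitem(last=False)  # evict least recently used

    def queries_of(self, neighbor):
        return list(self.profiles.get(neighbor, ()))

    def size(self):
        """Total entries; at most T * d for d neighbors."""
        return sum(len(p) for p in self.profiles.values())

pm = ProfileManager(T=2)
pm.record_query_hit("B", "italy earthquake")
pm.record_query_hit("B", "super bowl")
pm.record_query_hit("B", "italy earthquake")  # refreshes this entry
pm.record_query_hit("B", "san diego")         # evicts "super bowl"
pm.record_query_hit("C", "bush elections")
print(pm.queries_of("B"), pm.size())  # ['italy earthquake', 'san diego'] 3
```

An `OrderedDict` gives O(1) refresh and eviction, so updating a profile on every QueryHit stays cheap.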

13
Techniques for Distributed I.R.

3. Intelligent Search Mechanism (ISM)
b) Peer Ranking Mechanism.
Before forwarding a Query Message, a peer performs
an on-the-fly ranking of its peers to determine
the best paths.
We use the Aggregate Similarity of peer Pi to a
query q, computed locally by peer Pk, to rank
Pk's neighbors.
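The slide's own formula for the Aggregate Similarity is not preserved in this transcript. Purely as a stand-in, the sketch below assumes it sums, over the past queries q_j stored in Pi's profile, the similarity between q_j and q; that form and the helper names are assumptions, not the authors' definition. Queries are treated as binary keyword vectors, for which cosine similarity reduces to |A ∩ B| / sqrt(|A|·|B|).

```python
import math

def keyword_cosine(q1, q2):
    """Cosine similarity of two queries seen as binary keyword vectors."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def aggregate_similarity(past_queries, q):
    """ASSUMED form: sum of similarities between q and the queries a
    neighbor answered in the past (not the slide's exact formula)."""
    return sum(keyword_cosine(qj, q) for qj in past_queries)

def rank_peers(profiles, q):
    """Order neighbor ids from most to least promising for query q."""
    return sorted(profiles, key=lambda p: aggregate_similarity(profiles[p], q),
                  reverse=True)

# Hypothetical profiles kept by peer Pk for its neighbors.
profiles = {
    "B": ["italy earthquake", "earthquake disaster"],
    "C": ["super bowl", "san diego"],
    "D": ["bush elections"],
}
print(rank_peers(profiles, "italy disaster"))  # ['B', 'C', 'D']
```

Ties (here C and D, both scoring 0) keep their original order, since Python's sort is stable even with `reverse=True`.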

14
Techniques for Distributed I.R.

3. Intelligent Search Mechanism (ISM)
c) Similarity Function: the cosine similarity.
Assume that L is the set of all words (in the
Profile Manager),
e.g., L = {elections, bush, clinton, super, bowl,
san, diego, …, italy, earthquake, disaster}.
We define an |L|-dimensional space where each
query is a vector.
If q = "italy disaster", then the vector of q is
⟨0, 0, 0, …, 1, 0, 1⟩.
Recall that we have a vector for each query qi
stored in the Profile Manager.
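The vector construction on this slide can be checked directly. The sketch uses only the ten words shown in L (the slide's "…" stands for further words that are not listed) and computes the standard cosine similarity cos(u, v) = u·v / (‖u‖ ‖v‖). The names `to_vector` and `cosine` are illustrative.

```python
import math

# Vocabulary L: only the words visible on the slide (its "..." is omitted).
L = ["elections", "bush", "clinton", "super", "bowl",
     "san", "diego", "italy", "earthquake", "disaster"]

def to_vector(query, vocabulary):
    """Map a query to a binary |L|-dimensional vector (1 if the word occurs)."""
    words = set(query.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

def cosine(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm else 0.0

q = to_vector("italy disaster", L)
print(q)  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(cosine(q, to_vector("italy earthquake", L)))  # 0.5
```

The vector ends in 1, 0, 1 exactly as on the slide (italy, earthquake, disaster), and the two queries share one of their two words each, giving a similarity of 1/2.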

15
Techniques for Distributed I.R.

3. Intelligent Search Mechanism (ISM)
d) Search Mechanism
Utilizes the Peer Ranking Mechanism to forward
queries to the nodes that are most likely to contain
the information we are looking for.

Figure: peer d with neighbor profiles; step 1: QUERY.
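A sketch of the forwarding step, assuming the neighbor scores come from the Peer Ranking Mechanism. It follows the configuration given on the experimental-setup slide: forward to the M = 3 highest-ranked neighbors plus 1 random neighbor. The extra random pick plausibly lets neighbors without a matching profile still receive some queries. `select_forward_targets` is a hypothetical name.

```python
import random

def select_forward_targets(scores, m=3, rng=None, exclude=None):
    """Choose the neighbors a query is forwarded to: the m highest-ranked
    ones plus one extra neighbor drawn at random from the remainder.

    scores:  dict neighbor -> rank score (e.g. aggregate similarity).
    exclude: neighbor the query came from; it is never chosen.
    """
    rng = rng or random.Random()
    candidates = [n for n in scores if n != exclude]
    ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
    targets = ranked[:m]          # the m best-ranked neighbors
    rest = ranked[m:]
    if rest:
        targets.append(rng.choice(rest))  # plus one random neighbor
    return targets

scores = {"B": 1.0, "C": 0.7, "D": 0.2, "E": 0.0, "F": 0.0}
targets = select_forward_targets(scores, m=3, rng=random.Random(7))
print(targets[:3])  # ['B', 'C', 'D']; the 4th target is E or F at random
```

Each forwarded copy would then carry a decremented TTL, exactly as in BFS; only the choice of links differs.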
17
Experimental Evaluation

We use a decentralized newspaper application
built on top of the REUTERS dataset (22,531
documents grouped by 84 countries).
Random network of 100 peers.
Each peer has documents from 3 countries.
The average degree of a node is 7 ≥ log2(100) ≈ 6.64
(connected graph).
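The slide does not say how the random network was generated. One simple way to obtain a comparable topology, assumed here, is to draw n·7/2 = 350 distinct random edges among 100 peers (average degree 2E/n = 7) and resample until the graph is connected; the last line checks that log2(100) ≈ 6.64 is indeed below 7. Function names are hypothetical.

```python
import math
import random

def random_connected_network(n=100, avg_degree=7, seed=0):
    """Random undirected overlay with n peers and n*avg_degree/2 edges,
    resampled until connected (an assumed generator, not the authors')."""
    rng = random.Random(seed)
    num_edges = n * avg_degree // 2  # average degree = 2E / n
    while True:
        edges = set()
        while len(edges) < num_edges:
            u, v = rng.sample(range(n), 2)      # two distinct peers
            edges.add((min(u, v), max(u, v)))   # undirected, no duplicates
        graph = {p: [] for p in range(n)}
        for u, v in edges:
            graph[u].append(v)
            graph[v].append(u)
        if is_connected(graph):
            return graph

def is_connected(graph):
    """Depth-first reachability check from an arbitrary peer."""
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(graph)

net = random_connected_network()
avg = sum(len(nbrs) for nbrs in net.values()) / len(net)
print(avg, round(math.log2(100), 2))  # 7.0 6.64
```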

18
Experimental Evaluation

We perform 400 sequential queries with a delay of
4 seconds between them.
We compare Doc. Ratio (recall rate) vs. number of
messages for:
BFS (Gnutella message flooding; forward to all
degree neighbors).
Modified BFS (randomly forward to degree/2
nodes).
Intelligent Search Mechanism
(forward to the M = 3 highest-ranked nodes + 1 random node).
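The slides do not define Doc. Ratio precisely. Reading it, as the parenthetical suggests, as recall (the share of the relevant documents that a search actually returns), a per-query value and its average over the query sequence could be computed as below; the function name and the toy data are hypothetical.

```python
from statistics import mean

def doc_ratio(found_docs, relevant_docs):
    """Recall of one query: |found ∩ relevant| / |relevant|."""
    relevant = set(relevant_docs)
    if not relevant:
        raise ValueError("query has no relevant documents")
    return len(relevant & set(found_docs)) / len(relevant)

# Hypothetical results for two queries: (documents found, documents relevant).
runs = [(["d1", "d3"], ["d1", "d2", "d3", "d4"]),
        (["d7"], ["d7"])]
print(mean(doc_ratio(found, rel) for found, rel in runs))  # 0.75
```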

19
Experimental Evaluation

We measure Doc. Ratio (recall rate) vs. number of
messages with Time-to-Live TTL = 4:
BFS (Gnutella) uses 763 messages w/ recall rate
100%.
Random BFS (degree/2) uses 120 msgs (16% of BFS) w/
recall rate 42%.
Intelligent Search uses 131 msgs (17% of BFS) w/ recall
rate 55%.
Recall rate improves over time with Intelligent
Search, since peer profiles accumulate more knowledge.
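The parenthesized percentages match the message counts taken as a share of the 763 BFS messages, as a quick check with the slide's numbers shows (`share_of_bfs` is an illustrative helper):

```python
BFS_MESSAGES = 763  # BFS (Gnutella) with TTL = 4, from the slide

def share_of_bfs(messages, bfs_messages=BFS_MESSAGES):
    """Fraction of the BFS message count used by another strategy."""
    return messages / bfs_messages

for name, msgs in [("Random BFS (degree/2)", 120), ("Intelligent Search", 131)]:
    print(f"{name}: {share_of_bfs(msgs):.0%} of BFS messages")
# Random BFS (degree/2): 16% of BFS messages
# Intelligent Search: 17% of BFS messages
```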

20
Experimental Evaluation

We again measure Doc. Ratio (recall rate) vs.
number of messages, now increasing the Time-to-Live
to TTL = 5:
BFS (TTL = 4) uses 763 messages w/ recall rate
100%.
Random BFS (degree/2) uses 28% (of BFS msgs) w/ recall
rate 72%.
Intelligent Search uses 35% (of BFS msgs) w/
recall rate 90%!
A large number of peers receive unnecessary
messages.
We get recall close to that of BFS (90%) with only
35% of the msgs.
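Converting the TTL = 5 percentages back into approximate message counts gives a feel for the savings. The percentages are rounded, so each count is only accurate to within about ±4 messages (0.5% of 763); `approx_messages` is an illustrative helper.

```python
BFS_MESSAGES = 763  # BFS reference run (TTL = 4)

def approx_messages(share, bfs_messages=BFS_MESSAGES):
    """Approximate message count from a rounded share of the BFS count."""
    return share * bfs_messages

for name, share, recall in [("Random BFS (degree/2)", 0.28, 0.72),
                            ("Intelligent Search", 0.35, 0.90)]:
    msgs = approx_messages(share)
    print(f"{name}: ~{msgs:.0f} msgs ({BFS_MESSAGES - msgs:.0f} fewer than BFS), "
          f"recall {recall:.0%}")
# Random BFS (degree/2): ~214 msgs (549 fewer than BFS), recall 72%
# Intelligent Search: ~267 msgs (496 fewer than BFS), recall 90%
```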

22
Related Work

Improving Search in P2P, B. Yang et al. (Stanford):
Iterative Deepening: repeat the search with
increasing depth until Z results are returned.
Directed BFS based on aggregate statistics (e.g.,
the number of results a peer returned, shortest queue,
forwarded the most data).
Local Indexes: each node maintains an index over
the data of peers within r hops.
Routing Indices for P2P, Crespo et al. (Stanford):
Compound Indices: each node sends a clustered
summary of its topics to its neighbors (e.g., 100
databases, 4 theory, 10 OS).
Might be too costly for highly dynamic P2P
systems.
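A simplified sketch of the iterative-deepening idea listed above: the query is re-issued with successively larger TTLs from a depth policy until at least Z results are found. This version naively re-runs the whole search at each depth, which a real implementation would try to avoid; `search` is a hypothetical callback (for example, a TTL-limited flood).

```python
def iterative_deepening(search, depths, z):
    """Re-issue a query with growing TTLs until it yields >= z results.

    search: callable taking a TTL and returning the list of results.
    depths: increasing TTL values to try, e.g. [1, 2, 4].
    """
    results = []
    for ttl in depths:
        results = search(ttl)
        if len(results) >= z:
            break  # enough results: do not search any deeper
    return results

calls = []
def fake_search(ttl):
    """Hypothetical search whose result count grows with the TTL."""
    calls.append(ttl)
    return [f"doc{i}" for i in range(2 * ttl)]

found = iterative_deepening(fake_search, depths=[1, 2, 4], z=4)
print(len(found), calls)  # 4 [1, 2]
```

The deepest TTL (4) is never tried because the TTL = 2 round already returned Z = 4 results, which is where the message savings come from.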

23
Related Work

Freenet (Clarke et al.): search by identifiers.
Uses SHA-1 hashes of resources; information is
retrieved based on key closeness, in a DFS
manner.
Others, such as Chord:
systems that focus on scalable object location,
which becomes feasible by hashing objects and
distributing them in the P2P system (searches
are by identifier).

24
Conclusions

P2P systems offer several advantages such as
scalability, robustness and simplicity of use.
Efficient P2P Information Retrieval is not
feasible with the current Search Algorithms.
We propose an Intelligent Search Mechanism that
uses local knowledge to improve Information
Retrieval in P2P.
Our mechanism achieves a 90% recall rate while
using only 35% of the messages used by BFS.

25
Future Work

We plan to deploy our middleware infrastructure
on a larger P2P network with more queries.
We want to study different network topologies,
such as AS maps with power laws.
We want to study different peer-profile
maintenance policies.
We want to compare the performance of our method
with other proposed algorithms (iterative
deepening, local indexes, etc.).


Write a Comment

User Comments (0)

About PowerShow.com

A Local Search Mechanism for PeertoPeer Networks - PowerPoint PPT Presentation

A Local Search Mechanism for PeertoPeer Networks

Vana Kalogeraki, Dimitrios Gunopulos & Demetris Zeinalipour (University ... e.g. Limewire's Ultrapeers. Centralized Index. 1) Upload Index. 2) Query/QueryHit ... – PowerPoint PPT presentation