A Local Search Mechanism for Peer-to-Peer Networks


1
A Local Search Mechanism for Peer-to-Peer
Networks
  • Vana Kalogeraki, Dimitrios Gunopulos
  • Demetris Zeinalipour (University of California,
    Riverside)
  • csyiazti_at_cs.ucr.edu

CIKM 2002: Eleventh International Conference on
Information and Knowledge Management
November 4-9, McLean, VA
http://www.cs.ucr.edu/csyiazti/publications.html
2
Presentation Outline
  • Introduction: Information Retrieval (I.R.) in
    Peer-to-Peer networks.
  • Techniques for Distributed I.R.
  • Breadth-First Search.
  • Random Breadth-First Search.
  • Intelligent Search with profiling.
  • Experimental Evaluation.
  • Related Work.
  • Conclusions and Future Work.

3
Introduction to Peer-to-Peer
  • Peer-to-Peer Computing definition:
  • Sharing of computer resources and information
    through direct exchange.
  • Clients (downloaders) are also servers.
  • Clients may join or leave the network at any time:
    highly fault-tolerant, but at a cost!
  • Searches are done within the virtual network,
    while actual downloads are done offline (with
    HTTP).

4
Introduction to Peer-to-Peer
  • Peer-to-Peer (P2P) systems are becoming
    increasingly popular.
  • P2P file-sharing systems such as Gnutella,
    Napster and Freenet realize a distributed
    infrastructure for sharing files.
  • Traditionally, files were shared using the
    Client-Server model (e.g. HTTP), which does not
    scale well because the service is centralized.
  • P2P systems offer new advantages in simplicity
    of use, robustness, self-organization and
    scalability.

5
Information Retrieval in P2P
  • Problem:
  • How can information be retrieved efficiently in
    P2P systems where each node shares a collection
    of documents?
  • Documents consist of keywords.
  • Resembles classical Information Retrieval, but
    the resources are now distributed.
  • Primary data structures such as global inverted
    indexes cannot be maintained efficiently.

6
Solutions for P2P Information Retrieval
  • 1) Centralized Approaches
  • Centralized indexes.
  • e.g. Napster, SETI@home
  • 2) Purely Distributed Approaches
  • Each node has only local knowledge.
  • I.R. is done using brute-force mechanisms.
  • e.g. Gnutella, FastTrack (Kazaa)
  • 3) Hybrid Approaches
  • One or more peers have partial indexes of the
    contents of others.
  • e.g. Limewire's Ultrapeers

[Figure: message flow in the three approaches]
Centralized Index: 1) Upload Index, 2) Query/QueryHit, 3) Download (offline)
Second panel: 1) Connect, 2) Query/QueryHit, 3) Download (offline)
Third panel: 1) Connect, 2) Intelligent Query/QueryHit, 3) Download (offline)
7
Motivation
  • On 1st June we crawled the Gnutella P2P network
    for 5 hours with 17 workstations.
  • We analyzed 15,153,524 query messages.
  • Observation: high locality of specific queries.
  • We try to exploit this property for more
    efficient searches.

9
Techniques for Distributed I.R.
  • 1. Breadth-First Search (Gnutella)
  • Each Query Message is propagated along all
    outgoing links of a peer, using a TTL
    (time-to-live).
  • The TTL is decremented on each forward until it
    becomes 0.
  • This is the technique used for I.R. in P2P
    systems such as Gnutella.
  • Results?
  • The physical network is brought to its knees.
  • Long delays for search results.
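
The flooding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Gnutella implementation: the adjacency-dict `graph`, the `has_answer` callback and the function name `bfs_flood` are all hypothetical, and a peer is assumed to drop a query it has already seen and not to send it back over the link it arrived on.

```python
from collections import deque

def bfs_flood(graph, source, ttl, has_answer):
    """Flood a query from `source`; return (peers_with_hits, messages_sent)."""
    seen = {source}                          # peers that already handled the query
    frontier = deque([(source, None, ttl)])  # (peer, sender, remaining TTL)
    hits, messages = [], 0
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue                         # TTL expired: do not forward further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue                     # assumption: never echo back to the sender
            messages += 1                    # one Query message per outgoing link
            if neighbor in seen:
                continue                     # duplicate query is dropped by the receiver
            seen.add(neighbor)
            if has_answer(neighbor):
                hits.append(neighbor)        # this peer would reply with a QueryHit
            frontier.append((neighbor, peer, remaining - 1))
    return hits, messages

# Hypothetical five-peer overlay; only peer "d" holds a matching document.
graph = {
    "q": ["a", "b"],
    "a": ["q", "c"],
    "b": ["q", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
hits, messages = bfs_flood(graph, "q", ttl=3, has_answer=lambda p: p == "d")
```

Even in this tiny overlay, several of the messages are duplicates that the receiver discards, which hints at why flooding is expensive on a real network.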

[Figure: BFS in P2P Network N. Labels: node A, Peer q, Peer d; 1) QUERY, 2) QUERYHIT]
10
Techniques for Distributed I.R.
  • 2. Modified Random BFS
  • Each Query Message is forwarded to only a
    fraction of the outgoing links (e.g. ½ of them).
  • The TTL is again decremented on each forward
    until it becomes 0.
  • Results?
  • Fewer messages, but possibly fewer results.
  • This algorithm is probabilistic.
  • Some segments of the network may become
    unreachable.
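
A rough sketch of the random variant, under the same kind of assumptions as any toy simulation: `graph`, `has_answer`, `fraction` and the function name `random_bfs` are illustrative names, and the number of links chosen is rounded up so that a peer with at least one candidate always forwards to someone.

```python
import math
import random
from collections import deque

def random_bfs(graph, source, ttl, has_answer, fraction=0.5, rng=None):
    """Forward each query to a random `fraction` of the outgoing links."""
    rng = rng or random.Random()
    seen = {source}
    frontier = deque([(source, None, ttl)])  # (peer, sender, remaining TTL)
    hits, messages = [], 0
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue                         # TTL expired
        candidates = [n for n in graph[peer] if n != sender]
        k = math.ceil(len(candidates) * fraction)  # rounding up is an assumption
        for neighbor in rng.sample(candidates, k):
            messages += 1
            if neighbor in seen:
                continue                     # duplicate query is dropped
            seen.add(neighbor)
            if has_answer(neighbor):
                hits.append(neighbor)
            frontier.append((neighbor, peer, remaining - 1))
    return hits, messages

graph = {
    "q": ["a", "b"],
    "a": ["q", "c"],
    "b": ["q", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
found, sent = random_bfs(graph, "q", ttl=3,
                         has_answer=lambda p: p == "d",
                         rng=random.Random(7))
```

On this toy overlay the query uses half as many messages as full flooding, but whether peer "d" is reached depends on the random choices, which is the trade-off the slide describes.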

[Figure: Random BFS in P2P Network N. Labels: nodes A, B, C, Peer d; 1) QUERY, 2) QUERYHIT; one segment marked unreachable]
11
Techniques for Distributed I.R.
  • 3. Intelligent Search Mechanism (ISM)
  • Idea: each Query Message is forwarded
    intelligently, based on what queries a peer
    answered in the past.
  • Components of ISM (for each node u):
  • Profile Mechanism, kept for each neighbor in
    N(u).
  • Peer Ranking Mechanism, for ranking peers locally
    and sending a search query only to the ones that
    are most likely to answer.
  • Similarity Function, for finding similar search
    queries.
  • Search Mechanism, for propagating queries based
    on local indexes.

[Figure: ISM. Labels: node A, Peer d, profiles; 1) QUERY, 2) QUERYHIT]
12
Techniques for Distributed I.R.
  • 3. Intelligent Search Mechanism (ISM)
  • a) Profile mechanism.
  • Maintains a list of past queries routed through
    that host.
  • Every time a QueryHit is received, the table is
    updated.
  • The profile manager uses a Least Recently Used
    policy to keep the most recent queries in the
    repository.
  • Profiles are kept for neighbors only, so the cost
    of maintaining them is O(Td), where T is the
    size limit per profile and d is the degree of
    the node.
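
One way to picture the profile mechanism is a bounded, least-recently-used table per neighbor. The sketch below is an assumption about the data layout, not the authors' code: the class names `NeighborProfile` and `ProfileManager`, and the choice to key entries by the query string, are hypothetical.

```python
from collections import OrderedDict

class NeighborProfile:
    """Bounded list of queries a neighbor answered, evicted in LRU order."""

    def __init__(self, capacity):
        self.capacity = capacity           # T: limit per profile
        self._queries = OrderedDict()      # query -> hit count, oldest first

    def record_hit(self, query):
        self._queries[query] = self._queries.get(query, 0) + 1
        self._queries.move_to_end(query)   # mark as most recently used
        if len(self._queries) > self.capacity:
            self._queries.popitem(last=False)  # evict least recently used

    def queries(self):
        return list(self._queries)

class ProfileManager:
    """One profile per neighbor, so total storage is at most T * d entries."""

    def __init__(self, neighbors, capacity):
        self.profiles = {n: NeighborProfile(capacity) for n in neighbors}

    def on_query_hit(self, neighbor, query):
        self.profiles[neighbor].record_hit(query)

    def size(self):
        return sum(len(p.queries()) for p in self.profiles.values())

manager = ProfileManager(neighbors=["P1", "P2"], capacity=2)
for q in ["italy earthquake", "super bowl", "italy earthquake", "elections"]:
    manager.on_query_hit("P1", q)
recent = manager.profiles["P1"].queries()
```

Here "super bowl" is evicted because "italy earthquake" was refreshed by a second hit before the table overflowed.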

[Figure: profile tables, size Td]

13
Techniques for Distributed I.R.
  • 3. Intelligent Search Mechanism (ISM)
  • b) Peer Ranking Mechanism.
  • Before forwarding a Query Message, a peer performs
    an on-the-fly ranking of its peers to determine
    the best paths.
  • We use the Aggregate Similarity of peer Pi to a
    query q, computed by a peer Pk (the formula is
    not preserved in this transcript).
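
The ranking step can be illustrated with a small sketch. It assumes, as a labeled guess rather than the slide's formula, that the aggregate similarity of a peer is the sum over the past queries it answered of the query similarity raised to a power `alpha`. The helper `qsim` (cosine similarity over word sets), the `profiles` layout and every function name are hypothetical.

```python
import math

def qsim(q1, q2):
    """Cosine similarity of two queries viewed as binary word vectors."""
    a, b = set(q1.split()), set(q2.split())
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def aggregate_similarity(past_queries, query, alpha=1.0):
    """Assumed form: sum of qsim(past, query) ** alpha over answered queries."""
    return sum(qsim(past, query) ** alpha for past in past_queries)

def rank_peers(profiles, query, alpha=1.0):
    """Return neighbor ids ordered from most to least promising."""
    scores = {peer: aggregate_similarity(queries, query, alpha)
              for peer, queries in profiles.items()}
    return sorted(scores, key=lambda peer: scores[peer], reverse=True)

# Hypothetical profiles: queries each neighbor answered in the past.
profiles = {
    "P1": ["italy earthquake", "italy disaster"],
    "P2": ["super bowl"],
    "P3": ["elections bush", "earthquake disaster"],
}
ranking = rank_peers(profiles, "italy disaster")
```

Under this assumption a larger `alpha` gives more weight to the past queries that are most similar to the new one.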

14
Techniques for Distributed I.R.
  • 3. Intelligent Search Mechanism (ISM)
  • c) Similarity Function: the cosine similarity.
  • Assume that L is the set of all words (in the
    Profile Manager),
  • e.g. L = {elections, bush, clinton, super, bowl,
    san, diego, ..., italy, earthquake, disaster}.
  • We define an |L|-dimensional space where each
    query is a vector.
  • If q = "italy disaster", then the vector of q is
    (0,0,0,...,1,0,1).
  • Recall that we have a vector for each query qi
    stored in the Profile Manager.
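
The vector construction and the standard cosine similarity, cos(u, v) = (u · v) / (|u| |v|), can be sketched as below. The vocabulary contains only the words actually listed on the slide (the elided ones are left out), and the function names are illustrative.

```python
import math

# Only the words spelled out on the slide; the elided part is omitted.
VOCABULARY = ["elections", "bush", "clinton", "super", "bowl",
              "san", "diego", "italy", "earthquake", "disaster"]

def to_vector(query, vocabulary=VOCABULARY):
    """Map a query to a binary vector with one dimension per vocabulary word."""
    words = set(query.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def cosine_similarity(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0                      # a query with no known words matches nothing
    return dot / (norm_u * norm_v)

q = to_vector("italy disaster")
similar = cosine_similarity(q, to_vector("italy earthquake"))
unrelated = cosine_similarity(q, to_vector("super bowl"))
```

Queries sharing one of two words score 0.5, while queries with no words in common score 0.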

15
Techniques for Distributed I.R.
  • 3. Intelligent Search Mechanism (ISM)
  • d) Search Mechanism.
  • Utilizes the Peer Ranking Mechanism to forward
    queries to the nodes that are most likely to
    hold the information we are looking for.
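
A possible shape for the forwarding decision is shown below. It assumes the neighbors are already ranked and follows the setting quoted in the evaluation slides, which reads as forwarding to the M = 3 highest-ranked nodes plus one random node. The function name `choose_next_hops` and its parameters are hypothetical.

```python
import random

def choose_next_hops(ranked_neighbors, m=3, extra_random=1, rng=None):
    """Pick the top-m ranked neighbors plus a few random others."""
    rng = rng or random.Random()
    top = ranked_neighbors[:m]              # best candidates by ranking
    rest = ranked_neighbors[m:]             # everyone else
    # The random pick lets the query reach peers whose profiles are still
    # empty, so that they can build up a history too (an interpretation).
    extra = rng.sample(rest, min(extra_random, len(rest)))
    return top + extra

ranked = ["P1", "P3", "P2", "P5", "P4"]     # output of a ranking step
hops = choose_next_hops(ranked, m=3, rng=random.Random(0))
```

When a peer has no more than `m` neighbors, the query simply goes to all of them.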

[Figure: ISM search. Labels: Peer d, profiles; 1) QUERY]
17
Experimental Evaluation
  • We use a decentralized newspaper application
    built on top of the REUTERS dataset (22,531
    documents grouped by 84 countries).
  • Random network of 100 peers.
  • Each peer has documents from 3 countries.
  • The average degree of a node is 7 (roughly
    log2 100), giving a connected graph.

18
Experimental Evaluation
  • We perform 400 sequential queries with a delay of
    4 sec.
  • We compare Doc. Ratio (recall rate) vs. number of
    messages for:
  • BFS (Gnutella message flooding) (forward to
    all neighbors, i.e. degree nodes).
  • Modified BFS (randomly forward to degree/2
    nodes).
  • Intelligent Search Mechanism (forward to the
    M = 3 highest-ranked nodes + 1 random).

19
Experimental Evaluation
  • We measure Doc. Ratio (recall rate) vs. number of
    messages with Time-to-Live (TTL) = 4.
  • BFS (Gnutella) uses 763 messages w/ recall rate
    100%.
  • Random BFS (degree/2) uses 120 (16%) msgs w/
    recall rate 42%.
  • Intelligent Search uses 131 (17%) msgs w/ recall
    rate 55%.
  • Recall rate improves over time with Intelligent
    Search, since the peer profiles gain more
    knowledge.
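
The figures in parentheses read as each method's message count expressed as a share of the BFS count. A quick arithmetic check under that reading (the helper name `share_of_bfs` is only for illustration):

```python
BFS_MESSAGES = 763  # messages used by flooding at TTL = 4, from the slide

def share_of_bfs(messages, baseline=BFS_MESSAGES):
    """Message count as a whole-number percentage of the BFS baseline."""
    return round(100 * messages / baseline)

random_bfs_share = share_of_bfs(120)   # Random BFS
ism_share = share_of_bfs(131)          # Intelligent Search
```

Both values agree with the numbers quoted on the slide: about 16% and 17% of the BFS traffic.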

20
Experimental Evaluation
  • We again measure Doc. Ratio (recall rate) vs.
    number of messages, now increasing Time-to-Live
    (TTL) to 5.
  • BFS (TTL = 4) uses 763 messages w/ recall rate
    100%.
  • Random BFS (degree/2) uses 28% (of BFS msgs) w/
    recall rate 72%.
  • Intelligent Search uses 35% (of BFS msgs) w/
    recall rate 90%!
  • With BFS, a large number of peers receive
    unnecessary messages.
  • We get recall close to that of BFS (90%) with
    only 35% of the messages.

22
Related Work
  • Improving Search in P2P: B. Yang et al. (Stanford)
  • Iterative Deepening, until Z results are returned.
  • Directed BFS based on aggregate statistics (e.g.
    number of results a peer returned, shortest
    queue, forwarded the most data).
  • Local Indexes: each node maintains an index over
    the data of peers r hops away.
  • Routing Indices for P2P: Crespo et al. (Stanford)
  • Compound Indices: each node sends a clustered
    summary of its topics to its neighbors (e.g. 100
    databases, 4 theory, 10 OS).
  • Might be too costly for highly dynamic P2P
    systems.
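
Iterative deepening, as summarized above, can be pictured as a sequence of depth-limited searches that stops once Z results are found. The sketch below is a simplification and not the cited authors' protocol: it restarts the search from scratch at every depth, and the names `depth_limited_hits`, `iterative_deepening`, `depths` and `z` are hypothetical.

```python
from collections import deque

def depth_limited_hits(graph, source, depth, has_answer):
    """Peers with a matching answer within `depth` hops of `source`."""
    seen = {source}
    frontier = deque([(source, 0)])
    hits = []
    while frontier:
        peer, dist = frontier.popleft()
        if dist == depth:
            continue                      # depth limit reached
        for neighbor in graph[peer]:
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if has_answer(neighbor):
                hits.append(neighbor)
            frontier.append((neighbor, dist + 1))
    return hits

def iterative_deepening(graph, source, depths, z, has_answer):
    """Try each depth in `depths` (non-empty, increasing) until z hits."""
    hits = []
    for depth in depths:
        hits = depth_limited_hits(graph, source, depth, has_answer)
        if len(hits) >= z:
            return depth, hits            # satisfied: stop early
    return depths[-1], hits               # best effort at the deepest level

# Hypothetical chain of peers; "b" and "d" hold matching documents.
chain = {"q": ["a"], "a": ["q", "b"], "b": ["a", "c"],
         "c": ["b", "d"], "d": ["c"]}
holds = lambda p: p in {"b", "d"}
shallow = iterative_deepening(chain, "q", depths=[1, 2, 4], z=1, has_answer=holds)
deep = iterative_deepening(chain, "q", depths=[1, 2, 4], z=2, has_answer=holds)
```

A query that is satisfied by nearby peers never pays for the deeper, more expensive floods.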

23
Related Work
  • Freenet (Clarke et al.): search by identifiers.
  • Uses SHA-1 hashes of resources; information is
    retrieved based on key closeness, in a DFS
    manner.
  • Others, such as Chord.
  • Systems that focus on scalable object location,
    which becomes feasible by hashing and
    distributing objects in the P2P system (searches
    are by identifier).

24
Conclusions
  • P2P systems offer several advantages such as
    scalability, robustness and simplicity of use.
  • Efficient P2P Information Retrieval is not
    feasible with the current search algorithms.
  • We propose an Intelligent Search Mechanism that
    uses local knowledge to improve Information
    Retrieval in P2P.
  • Our mechanism achieves a 90% recall rate while
    using only 35% of the messages used by BFS.

25
Future Work
  • We plan to deploy our middleware infrastructure
    on a larger P2P network with more queries.
  • We want to probe different network topologies
    such as ASMap with power laws.
  • We want to probe different peer-profile
    maintenance policies at peers.
  • We want to compare the performance of our method
    with other proposed algorithms (iterative
    deepening, local indexes, etc.).
