Information Retrieval Techniques For Peer-To-Peer Networks - PowerPoint PPT Presentation

Slides: 20
Provided by: mohanrathi
Learn more at: https://crystal.uta.edu

1
Information Retrieval Techniques For Peer-To-Peer
Networks
  • Demetrios Zeinalipour-Yazti, Vana Kalogeraki, and
    Dimitrios Gunopulos
  • Presented By Ranjan Dash

2
Layout
  • Introduction
  • P2P Network IR Techniques
  • PeerWare Infrastructure and experiments

3
Introduction
  • Major challenge
  • efficiently search the content of other peers
  • Definition
  • Large number of peers collaborate dynamically in
    an ad hoc manner and share information in
    large-scale distributed environments without
    centralized co-ordination
  • P2P environment characteristic
  • Each peer has a database or collection of docs
  • Query contains set of key words
  • Reply message contains pointers to matching
    documents
  • Different from static data environments
  • No central repository
  • Nodes join and leave dynamically, in an ad hoc manner

4
P2P Network IR Techniques
  • P2P Network IR Techniques
  • Breadth-First Search (BFS)
  • Random Breadth-First-Search (RBFS)
  • Intelligent Search Mechanism (ISM)
  • Directed BFS and >RES
  • Random Walker Searches
  • Randomized Gossiping
  • Local Routing Indices
  • Centralized Approaches
  • Searching Object Identifiers
  • Distributed IR

5
P2P Network IR Techniques
  • Breadth-First Search (BFS)
  • Widely used in file-sharing systems
  • Propagates to all neighbors except sender
  • QueryHit message (number of docs, bandwidth info)
    follows the same path back
  • Simple, guarantees high hit rate
  • Poor in performance and network utilization
  • Low bandwidth node - a bottleneck
  • Can be improved using TTL
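
The flooding behavior above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the Gnutella protocol itself: the `overlay` graph, the function name `bfs_flood`, and the choice to count every forwarded copy as one message are all illustrative.

```python
from collections import deque

def bfs_flood(graph, source, ttl):
    """Flood a query from `source`; return (peers reached, messages sent)."""
    reached = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # (peer, sender, remaining hops)
    while queue:
        peer, sender, hops = queue.popleft()
        if hops == 0:
            continue  # TTL exhausted: this peer does not forward further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # propagate to all neighbors except the sender
            messages += 1
            if neighbor not in reached:  # duplicate copies are dropped on arrival
                reached.add(neighbor)
                queue.append((neighbor, peer, hops - 1))
    return reached, messages

# Hypothetical 5-peer overlay (undirected adjacency lists).
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B"],
    "E": ["C"],
}

reached, messages = bfs_flood(overlay, "A", ttl=2)
```

Raising `ttl` widens the search but also multiplies the duplicate messages, which is the network-utilization cost the slide refers to.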

6
P2P Network IR Techniques
  • Random Breadth-First Search (RBFS)
  • Dramatic improvements over BFS
  • Forwards only to a fraction of its peers,
    selected at random
  • Does not need global knowledge, takes local
    decisions - faster
  • Probabilistic - might not reach some large
    network segments
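
A minimal sketch of the random variant, assuming each peer forwards to a fixed `fraction` of its neighbors (excluding the sender) chosen uniformly at random. The `overlay` graph, the function name `rbfs`, and the rounding rule (at least one neighbor) are assumptions made for illustration.

```python
import math
import random

def rbfs(graph, source, ttl, fraction=0.5, rng=None):
    """Random BFS: each peer forwards to a random subset of its neighbors."""
    rng = rng or random.Random()
    reached = {source}
    messages = 0
    frontier = [(source, None)]  # (peer, sender)
    for _ in range(ttl):
        next_frontier = []
        for peer, sender in frontier:
            candidates = [n for n in graph[peer] if n != sender]
            if not candidates:
                continue
            # Local decision only: no global knowledge of the network is used.
            count = max(1, math.ceil(fraction * len(candidates)))
            for neighbor in rng.sample(candidates, count):
                messages += 1
                if neighbor not in reached:
                    reached.add(neighbor)
                    next_frontier.append((neighbor, peer))
        frontier = next_frontier
    return reached, messages

# Hypothetical 5-peer overlay (undirected adjacency lists).
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B"],
    "E": ["C"],
}

reached_half, messages_half = rbfs(overlay, "A", ttl=2, fraction=0.5,
                                   rng=random.Random(0))
```

With `fraction=1.0` the function degenerates to plain flooding; with smaller fractions it sends fewer messages but may miss whole segments of the overlay.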

7
P2P Network IR Techniques
  • Intelligent Search Mechanism (ISM)
  • Aims for quick, efficient retrieval at the lowest
    communication cost
  • Propagates only to peers more likely to reply
  • Consists of 2 components that run in each peer
  • Profile mechanism
  • Relevance rank
  • Works well when queries exhibit locality
  • Always forwards to the same neighbors - starvation
    for new peers
  • Solution - add a small random subset of peers to
    the most relevant set

8
P2P Network IR Techniques
  • Profile mechanism
  • Builds a profile for each of its neighboring
    peers
  • Maintains the T most recent Queries and QueryHits,
    with the number of results
  • Uses a least-recently-used replacement policy to
    keep the most recent queries

9
P2P Network IR Techniques
  • Relevance rank
  • Ranking of neighbors to decide which ones to
    forward a query
  • Ranking of a peer Pi for a query q
  • Qsim is cosine similarity between 2 queries

  • If the exponent α applied to Qsim is 0, only the
    number of results returned in the past matters,
    as in >RES
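
The two ISM components can be sketched together. The ranking formula itself did not survive in this transcript, so the code below assumes one plausible form: the rank of a neighbor is the sum, over its profiled past queries, of Qsim raised to an exponent `alpha`, multiplied by the number of results for that past query. The class name `NeighborProfile`, the treatment of queries as binary keyword vectors, and `alpha` are all assumptions for illustration.

```python
import math
from collections import OrderedDict

def qsim(q1, q2):
    """Cosine similarity between two keyword queries (binary term vectors)."""
    a, b = set(q1.split()), set(q2.split())
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

class NeighborProfile:
    """Keeps the T most recent (query, number of results) pairs for one neighbor."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def record(self, query, num_results):
        self.entries.pop(query, None)
        self.entries[query] = num_results     # newest entry goes last
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    def relevance(self, query, alpha=1.0):
        # Assumed form: sum of Qsim(past, query)^alpha * results(past).
        return sum(qsim(past, query) ** alpha * hits
                   for past, hits in self.entries.items())

profile = NeighborProfile(capacity=2)
profile.record("p2p search", 4)
profile.record("jazz music", 10)
rank = profile.relevance("p2p networks")                # similar past query counts
rank_alpha0 = profile.relevance("p2p networks", alpha=0)  # only result counts matter
```

A peer would compute this rank for each neighbor's profile and forward the query to the highest-ranked ones, plus a small random subset to avoid starving new peers.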
10
P2P Network IR Techniques
  • Directed BFS and >RES
  • forwards a query to a subset of its peers based
    on some aggregated statistics
  • Send out to k peers which had returned the most
    results for the last m queries
  • BFS turned into a DFS for k = 1, m = 10
  • Similar to ISM, but simpler
  • Does not specifically explore nodes that contain
    content related to the query
  • Performs well because it routes queries to the
    larger network segments
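
The most-results-in-the-past heuristic described above can be sketched as follows. The class name `ResultHistory` and its methods are hypothetical; the only behavior taken from the slide is "send to the k peers that returned the most results for the last m queries".

```python
from collections import deque

class ResultHistory:
    """Tracks how many results each neighbor returned for the last m queries."""

    def __init__(self, m):
        self.m = m
        self.history = {}  # neighbor -> deque of result counts (at most m)

    def record(self, neighbor, num_results):
        window = self.history.setdefault(neighbor, deque(maxlen=self.m))
        window.append(num_results)  # the oldest count drops out automatically

    def top_k(self, neighbors, k):
        def total(neighbor):
            return sum(self.history.get(neighbor, ()))
        return sorted(neighbors, key=total, reverse=True)[:k]

history = ResultHistory(m=2)
for count in (5, 1, 0):
    history.record("B", count)   # only the last two counts (1, 0) are kept
for count in (2, 2):
    history.record("C", count)

targets = history.top_k(["B", "C", "D"], k=1)
```

Unlike the profile-based ranking, this statistic ignores what the query is about; it only rewards neighbors that have been productive recently.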

11
P2P Network IR Techniques
  • Random-Walker Searches
  • Each node randomly forwards a query message,
    called a walker, to one of its peers
  • Can be extended from 1-walker to k-walker
  • Resembles RBFS, but the number of messages
    increases linearly rather than exponentially
  • Like RBFS, does not use the most relevant content
    to guide the query
  • Adaptive Probabilistic Search (APS) is similar
  • Uses feedback from previous searches to
    probabilistically guide future walkers
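
A minimal sketch of a k-walker search, assuming walkers are simulated one after another, each walker stops at the first peer holding a match, and `max_steps` plays the role of a TTL. The `ring` topology, the `holders` set, and the function name are illustrative assumptions; the feedback-driven APS variant is not modeled.

```python
import random

def k_walker_search(graph, source, holders, k, max_steps, rng=None):
    """Launch k random walkers from `source`; return (holders found, messages)."""
    rng = rng or random.Random()
    messages = 0
    found = set()
    for _ in range(k):
        peer = source
        for _ in range(max_steps):
            peer = rng.choice(graph[peer])  # forward to one random neighbor
            messages += 1
            if peer in holders:
                found.add(peer)
                break  # this walker terminates on its first hit
    return found, messages

# Hypothetical ring of four peers.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}

found, messages = k_walker_search(ring, "A", holders={"B", "D"}, k=3,
                                  max_steps=5, rng=random.Random(1))
```

The message count is bounded by `k * max_steps`, which is the linear growth the slide contrasts with RBFS.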

12
P2P Network IR Techniques
  • Randomized Gossiping (PlanetP)
  • Each node summarizes its local index as a Bloom
    filter, its part of a global inverted index
  • Propagates it to the rest of the network through
    gossiping
  • Advantages of the Bloom filter
  • Smaller messages
  • Savings in network I/O
  • PlanetP has scalability problems
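
To make the Bloom filter idea concrete, here is a minimal sketch of a filter a peer could build over its local terms and gossip to others. The sizes, the SHA-256 based hashing, and the class name are assumptions for illustration, not PlanetP's actual parameters.

```python
import hashlib

class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        return all((self.bits >> pos) & 1 for pos in self._positions(term))

empty = BloomFilter()
summary = BloomFilter()
for term in ("peer", "gossip", "index"):
    summary.add(term)
```

A querying peer would test its keywords against each received filter and contact only the peers whose filters report a possible match. The filter is a fixed 256 bits here regardless of how many terms are added, which is where the smaller messages come from.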

13
P2P Network IR Techniques
  • Local Routing Indices
  • by Arturo Crespo and Hector Garcia-Molina
  • Hybrid technique uses local indices containing
    the direction toward the documents
  • 3 techniques
  • compound routing indices (CRI)
  • hop-count routing index (HRI)
  • exponentially aggregated index (ERI)
  • Good for topologies where only a few nodes have
    very large numbers of neighbors (tree, tree
    with cycles)
  • The routing indices are similar to the routing
    tables used by the Bellman-Ford algorithm
  • CRI - a node q maintains statistics for each
    neighbor that indicate how many documents are
    reachable through each neighbor.
  • HRI - a CRI kept for each hop count up to k hops;
    prohibitive storage cost for large k.
  • ERI - addresses the issue of HRI by aggregating
    HRI using a cost formula.
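
A simplified sketch of a compound routing index follows. It assumes documents are counted per topic, that a node ranks neighbors by the count for a single topic, and that the index a node advertises to one neighbor is its own counts plus what it can reach through its other neighbors. The topic names and function names are hypothetical, and the original authors' goodness estimate and the HRI/ERI cost formulas are not reproduced here.

```python
from collections import Counter

def rank_neighbors(cri, topic):
    """Order neighbors by how many documents on `topic` are reachable via each."""
    return sorted(cri, key=lambda n: cri[n].get(topic, 0), reverse=True)

def advertise(local_counts, cri, to_neighbor):
    """Counts this node reports to `to_neighbor`: own docs plus other branches."""
    total = Counter(local_counts)
    for neighbor, counts in cri.items():
        if neighbor != to_neighbor:
            total.update(counts)  # adds per-topic counts
    return dict(total)

# Hypothetical index held by one node with neighbors B and C.
cri = {
    "B": {"databases": 20, "networks": 5},
    "C": {"databases": 3, "networks": 40},
}

order = rank_neighbors(cri, "networks")
to_b = advertise({"databases": 2}, cri, to_neighbor="B")
```

The `advertise` step is where the resemblance to distance-vector routing shows: each node summarizes what lies behind it and passes that summary on.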

14
P2P Network IR Techniques
  • Centralized Approaches
  • Maintain an inverted index over all the documents
    in the participating hosts' collections - Google,
    Yahoo, Napster
  • Each joining peer A uploads an index of all its
    shared documents to the central repository R.
  • A querying node B searches A's documents through
    R.
  • B can communicate with A directly (using an
    out-of-band protocol such as HTTP).
  • Kazaa - a little different. Uses a set of
    more-powerful peers that act as central
    repositories
  • different kind of animal than the rest.
  • Simple, Robust, shorter search time, guaranteed
    to find all results
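
The upload-then-search flow through a central repository can be sketched as below. The class `CentralIndex`, whitespace tokenization, and AND semantics for multi-keyword queries are assumptions; real systems differ in all three.

```python
from collections import defaultdict

class CentralIndex:
    """Central repository R: keyword -> set of (peer, document title)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def upload(self, peer, docs):
        """A joining peer uploads an index of its shared documents."""
        for title, text in docs.items():
            for word in text.lower().split():
                self.postings[word].add((peer, title))

    def search(self, query):
        """Return pointers to documents containing every query keyword."""
        words = query.lower().split()
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result

repository = CentralIndex()
repository.upload("A", {"notes.txt": "peer to peer search",
                        "song.txt": "blue jazz"})
repository.upload("C", {"paper.txt": "distributed search engines"})

hits = repository.search("peer search")
```

The repository only returns pointers such as `("A", "notes.txt")`; fetching the document itself happens directly between the two peers.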

15
P2P Network IR Techniques
  • Searching Object Identifiers
  • Distributed file indexing systems - Chord,
    OceanStore, and Content Addressable Network
    (CAN), Freenet
  • efficient searches using object identifiers (a
    hashcode on the name of a file) rather than
    keywords.
  • Perform object lookup operations to get the
    address (an IP address) of the node that is
    storing the object.
  • Optimizes object retrieval by minimizing the
    numbers of messages and hops required.
  • Disadvantage - can only search for object
    identifiers and thus cannot capture the relevance
    of the doc.
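
Identifier-based lookup can be illustrated with a heavily simplified hash ring: the object name is hashed to an identifier, and the object is assigned to the first node whose identifier is at or after it. This sketch assumes every node is known locally; the multi-hop routing that the real systems use is left out, and the node addresses and 32-bit identifier space are made up.

```python
import bisect
import hashlib

def object_id(name, bits=32):
    """Hash a name into the identifier space (SHA-1, truncated by modulo)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class Ring:
    def __init__(self, node_addresses):
        self.nodes = sorted((object_id(addr), addr) for addr in node_addresses)
        self.ids = [node_id for node_id, _ in self.nodes]

    def lookup(self, name):
        """Return the address of the node responsible for `name`."""
        index = bisect.bisect_left(self.ids, object_id(name)) % len(self.nodes)
        return self.nodes[index][1]

addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical peers
ring = Ring(addresses)
owner = ring.lookup("song.mp3")
```

Because the lookup key is a hash of the exact name, a query for a related keyword hashes to an unrelated identifier, which is the relevance limitation noted above.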

16
P2P Network IR Techniques
  • Distributed IR
  • Having distributed databases, the main IR problem
    is deciding which databases are most likely to
    contain the most relevant documents.
  • It's possible to achieve good results for
    conceptually separated collections.
  • However, the assumption is that the querying
    party has some statistical knowledge about each
    database's contents (word frequencies in
    documents) and therefore must have a global view
    of the system.
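
A toy version of database selection from word-frequency statistics is sketched below. The scoring rule (sum of the fraction of documents containing each query word) is a deliberately simple stand-in chosen for illustration, not any specific published selection algorithm, and the statistics are invented.

```python
def rank_databases(stats, query):
    """Order databases by how likely they are to hold relevant documents.

    stats maps a database name to (word -> document frequency, total documents).
    """
    words = query.lower().split()

    def score(db):
        freqs, total = stats[db]
        return sum(freqs.get(word, 0) / total for word in words)

    return sorted(stats, key=score, reverse=True)

# Hypothetical per-database statistics known to the querying party.
stats = {
    "medicine": ({"virus": 80, "network": 5}, 100),
    "computing": ({"virus": 10, "network": 60}, 100),
}

first_choice = rank_databases(stats, "network routing")[0]
```

The function needs `stats` for every database up front, which is exactly the global view that a pure P2P setting does not provide.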

17
PeerWare Infrastructure and experiments
  • Evaluation metrics
  • Recall rate - the fraction of documents each of
    the search mechanisms retrieves
  • Efficiency - the number of messages needed to
    find the results
  • Implemented only algorithms that require local
    knowledge when searching for documents.
  • BFS (the baseline)
  • Implemented RBFS, >RES (k = 0.5 d and m = 100,
    where d is the degree of a node), and ISM
  • These 3 techniques forward query messages to half
    the neighbors that BFS contacts.
  • >RES and ISM use previous knowledge to decide on
    which peers to forward the query
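
The two metrics can be written down directly. This sketch assumes recall is measured against the set of documents the BFS baseline retrieves and that efficiency is reported as a ratio of message counts; the document ids and counts in the usage lines are made up for illustration.

```python
def recall_rate(retrieved, baseline):
    """Fraction of the baseline's documents that a mechanism also retrieved."""
    baseline = set(baseline)
    if not baseline:
        return 0.0
    return len(set(retrieved) & baseline) / len(baseline)

def message_ratio(messages, baseline_messages):
    """Messages used by a mechanism relative to the baseline."""
    return messages / baseline_messages

# Hypothetical single-query example.
bfs_docs = {"d1", "d2", "d3", "d4"}
other_docs = {"d1", "d2", "d3"}

recall = recall_rate(other_docs, bfs_docs)
ratio = message_ratio(40, 100)
```

Reporting both numbers together matters: a mechanism that halves the messages is only attractive if its recall stays close to the baseline.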

18
PeerWare Infrastructure and experiments
BFS requires almost 2.5 times as many messages as
its competitors.
19
PeerWare Infrastructure and experiments
ISM found the most documents. ISM achieved almost
a 90-percent recall rate while using only 38
percent of the messages BFS required. ISM
improves its knowledge over time. Both >RES and
ISM started out with a low recall rate (around 40
to 50 percent) because initially they randomly
choose their neighbors.