Information Retrieval Techniques For Peer-To-Peer Networks

Provided by: mohanrathi

Learn more at: https://crystal.uta.edu

1
Information Retrieval Techniques For Peer-To-Peer
Networks

Demetrios Zeinalipour-Yazti, Vana Kalogeraki and
Dimitrios Gunopulos
Presented By Ranjan Dash

2
Layout

Introduction
P2P Network IR Techniques
PeerWare Infrastructure and experiments

3
Introduction

Major challenge
efficiently search the content of other peers
Definition
Large number of peers collaborate dynamically in
an ad hoc manner and share information in
large-scale distributed environments without
centralized co-ordination
P2P environment characteristic
Each peer has a database or collection of docs
Query contains set of key words
Reply message contains pointers to matching
documents
Different from static data environments
No central repository
Nodes join and leave dynamically, in an ad hoc manner

4
P2P Network IR Techniques

P2P Network IR Techniques
Breadth-First Search (BFS)
Random Breadth-First-Search (RBFS)
Intelligent Search Mechanism (ISM)
Directed BFS and >RES
Random Walker Searches
Randomized Gossiping
Local Routing Indices
Centralized Approaches
Searching Object Identifiers
Distributed IR

5
P2P Network IR Techniques

Breadth-First Search (BFS)
Widely used in file-sharing systems
Propagates to all neighbors except sender
QueryHit message (number of docs, bandwidth info)
travels back along the same path
Simple, guarantees high hit rate
Poor in performance and network utilization
Low bandwidth node - a bottleneck
Can be improved using TTL
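
A minimal sketch of TTL-limited flooding, to make the message cost concrete. The overlay graph, the function name and the duplicate-dropping rule are assumptions for illustration, not part of the presentation.

```python
from collections import deque

def bfs_flood(graph, source, ttl):
    """Flood a query from `source`; return (peers reached, messages sent)."""
    seen = {source}
    messages = 0
    frontier = deque([(source, None, ttl)])  # (peer, sender, remaining hops)
    while frontier:
        peer, sender, hops = frontier.popleft()
        if hops == 0:
            continue  # TTL exhausted, the query is not forwarded further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # never send the query back to whoever sent it
            messages += 1
            if neighbor not in seen:  # assumed: duplicates are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, peer, hops - 1))
    return seen, messages

# Hypothetical 5-peer overlay (adjacency lists).
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
```

Calling bfs_flood(overlay, "A", ttl) with a growing ttl shows the trade-off: more peers reached, at the price of more messages.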

6
P2P Network IR Techniques

Random Breadth-First Search (RBFS)
Dramatic improvements over BFS
Forwards only to a fraction of its peers,
selected at random
Does not need global knowledge, takes local
decisions - faster
Probabilistic: might not reach some large
network segments
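
The forwarding decision can be sketched as below. This is an illustration only: the function name, the `fraction` parameter and the rounding rule are assumptions.

```python
import math
import random

def rbfs_targets(neighbors, sender, fraction, rng):
    """Pick a random fraction of the neighbors, never including the sender."""
    candidates = [n for n in neighbors if n != sender]
    if not candidates:
        return []
    # Assumed rounding: always forward to at least one peer.
    count = max(1, math.ceil(fraction * len(candidates)))
    return rng.sample(candidates, count)
```

Each peer needs only its own neighbor list, which is why the decision is purely local.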

7
P2P Network IR Techniques

Intelligent Search Mechanism (ISM)
Quick and efficient, with low communication cost
Propagates only to peers more likely to reply
Consists of 2 components that run in each peer
Profile mechanism
Relevance rank
Works well when queries exhibit locality

Always forwards to the same neighbors - starvation
for new peers
Solution: add a small random subset of peers to
the most relevant set

8
P2P Network IR Techniques

Profile mechanism
Builds a profile for each of its neighboring
peers
Maintains the T most recent Queries and QueryHits,
with the number of results
Least-recently-used replacement policy keeps the
most recent queries
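
A minimal sketch of such a profile for one neighbor, assuming it is keyed by the query text. The class and method names are hypothetical.

```python
from collections import OrderedDict

class NeighborProfile:
    """Keeps the T most recent (query -> number of results) entries for one neighbor."""

    def __init__(self, capacity):
        self.capacity = capacity      # T in the slide
        self.entries = OrderedDict()  # oldest entry first

    def record(self, query, num_results):
        if query in self.entries:
            self.entries.move_to_end(query)  # a repeated query becomes the most recent
        self.entries[query] = num_results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```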

9
P2P Network IR Techniques

Relevance rank
Ranking of neighbors to decide which ones to
forward a query
Ranking of a peer Pi for a query q
Qsim is cosine similarity between 2 queries

When the exponent α applied to Qsim is 0, only the
number of results in the past matters, as in >RES
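
The ranking formula itself is missing from this transcript. The sketch below assumes the rank of a neighbor is the sum, over the past queries in its profile, of Qsim(q_j, q) raised to α times the number of results for q_j. That form, the bag-of-words vectors and all names are assumptions.

```python
import math
from collections import Counter

def qsim(query_a, query_b):
    """Cosine similarity between two keyword queries (bag-of-words vectors)."""
    a, b = Counter(query_a.split()), Counter(query_b.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def relevance_rank(profile, query, alpha=1.0):
    """Assumed form: sum of Qsim(q_j, query) ** alpha * results over past queries q_j."""
    return sum(qsim(past, query) ** alpha * hits for past, hits in profile.items())
```

With alpha set to 0 every similarity term becomes 1, so the rank reduces to the total number of past results.
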
10
P2P Network IR Techniques

Directed BFS and >RES
Forwards a query to a subset of its peers based
on some aggregated statistics
Sends the query to the k peers that returned the
most results for the last m queries

With k = 1 and m = 10, the BFS turns into a DFS
Similar to ISM, but simpler
Does not explore nodes that contain content
related to the query
Performs well because it routes queries to larger
network segments
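
The selection rule can be sketched as follows, assuming each peer keeps a per-neighbor list of result counts, oldest first. The data layout and the alphabetical tie-break are assumptions.

```python
def most_results_peers(history, k, m):
    """Return the k neighbors that returned the most results over their last m queries."""
    totals = {peer: sum(counts[-m:]) for peer, counts in history.items()}
    # Ties are broken alphabetically so the choice is deterministic.
    return sorted(totals, key=lambda peer: (-totals[peer], peer))[:k]
```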

11
P2P Network IR Techniques

Random-Walker Searches

Each node randomly forwards a query message,
called a walker, to one of its peers
Can be extended from 1-walker to k-walker
Resembles RBFS, but the number of messages grows
linearly
Like RBFS, does not use the most relevant content
to guide the query
Adaptive Probabilistic Search (APS) is similar
Uses feedback from previous searches to
probabilistically guide future walkers
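
A minimal k-walker sketch. The hop limit, the stop-on-first-hit rule and every name are assumptions. The message count is bounded by walkers times max_hops, which is the linear growth mentioned above.

```python
import random

def random_walk_search(graph, source, holders, walkers, max_hops, rng):
    """Launch `walkers` independent walkers; return (hit peers, messages sent)."""
    hits, messages = set(), 0
    for _ in range(walkers):
        peer = source
        for _ in range(max_hops):
            peer = rng.choice(graph[peer])  # forward to one random neighbor
            messages += 1
            if peer in holders:
                hits.add(peer)
                break  # assumed: a walker stops once it finds a match
    return hits, messages
```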

12
P2P Network IR Techniques

Randomized Gossiping (PlanetP)
Global inverted index, partially constructed by
each node: a Bloom filter of its local index
Each node propagates its filter to the rest
through gossiping
Advantages of the Bloom filter
Smaller messages
Saving in network I/O
Scalability is a problem for PlanetP
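
A small Bloom filter sketch, to show why the summary stays compact: only a fixed-size bit array is gossiped, not the terms themselves. The sizes and the SHA-256 based hashing are illustrative assumptions, not PlanetP's actual parameters.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, term):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(term))
```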

13
P2P Network IR Techniques

Local Routing Indices
by Arturo Crespo and Hector Garcia-Molina
Hybrid technique uses local indices containing
the direction toward the documents
3 techniques
compound routing indices (CRI)
hop-count routing index (HRI)
exponentially aggregated index (ERI)
Good for topologies where only few nodes have
very large numbers of neighbors - (tree, tree
with cycles)
The routing indices are similar to the routing
tables deployed in the Bellman-Ford algorithm
CRI - a node q maintains statistics for each
neighbor that indicate how many documents are
reachable through each neighbor.
HRI - stores the CRI for k hops; prohibitive
storage cost for large k.
ERI - addresses the issue of HRI by aggregating
HRI using a cost formula.
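
A simplified CRI-style sketch: per neighbor, a count of documents reachable per topic, and the query goes to the neighbor with the highest total. Summing the counts is an assumed stand-in for the estimator in the original work, and the names are hypothetical.

```python
def best_neighbor(cri, topics):
    """Pick the neighbor whose index promises the most documents on the query topics."""
    def goodness(neighbor):
        return sum(cri[neighbor].get(topic, 0) for topic in topics)
    # sorted() makes the result deterministic when two neighbors tie.
    return max(sorted(cri), key=goodness)
```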

14
P2P Network IR Techniques

Centralized Approaches
Maintain an inverted index over all the documents
in the participating hosts' collections - Google,
Yahoo, Napster
Each joining peer A uploads an index of all its
shared documents to the central repository R.
A querying node B searches A's documents through
R.
B can communicate with A directly (using an
out-of-band protocol such as HTTP).
Kazaa - slightly different: uses a set of
more powerful peers that act as central
repositories
A different kind of animal than the rest.
Simple, Robust, shorter search time, guaranteed
to find all results
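
The upload-then-search flow can be sketched with an in-memory inverted index. The class, its methods and the whitespace tokenization are assumptions for illustration.

```python
from collections import defaultdict

class CentralRepository:
    def __init__(self):
        self.index = defaultdict(set)  # keyword -> {(peer, document)}

    def upload(self, peer, documents):
        """A joining peer uploads {document name: text} for everything it shares."""
        for name, text in documents.items():
            for word in text.lower().split():
                self.index[word].add((peer, name))

    def search(self, query):
        """Return (peer, document) pointers that match every keyword."""
        postings = [self.index.get(w, set()) for w in query.lower().split()]
        return set.intersection(*postings) if postings else set()
```

The result is only a set of pointers; fetching the document from peer A happens out of band.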

15
P2P Network IR Techniques

Searching Object Identifiers
Distributed file indexing systems - Chord,
OceanStore, and Content Addressable Network
(CAN), Freenet
efficient searches using object identifiers (a
hashcode on the name of a file) rather than
keywords.
Perform object lookup operations to get the
address (an IP address) of the node that is
storing the object.
Optimizes object retrieval by minimizing the
numbers of messages and hops required.
Disadvantage - can only search for object identifiers
and thus can't capture the relevance of the doc.
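
A much-simplified lookup sketch, assuming a Chord-like rule where the object lives on the first node whose ID is at or after the object's ID on a ring. It uses a global list of node IDs, which real systems avoid by routing over several hops. The ID space and names are hypothetical.

```python
import bisect
import hashlib

ID_SPACE = 2 ** 16  # hypothetical, small identifier space

def object_id(name):
    """Hash a file name to an identifier."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % ID_SPACE

def responsible_node(node_ids, key):
    """Return the first node ID at or after `key`, wrapping around the ring."""
    ring = sorted(node_ids)
    return ring[bisect.bisect_left(ring, key) % len(ring)]
```

Only an exact file name maps to the right identifier, which illustrates why keyword relevance is out of reach here.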

16
P2P Network IR Techniques

Distributed IR
Having distributed databases, the main IR problem
is deciding which databases are most likely to
contain the most relevant documents.
It's possible to achieve good results for
conceptually separated collections.
However, the assumption is that the querying
party has some statistical knowledge about each
database's contents (word frequencies in
documents) and therefore must have a global view
of the system.
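
A naive database-selection sketch, assuming the querying party holds per-database word frequencies. Scoring by raw frequency sums is an illustrative simplification, not a specific published ranking method.

```python
def rank_databases(word_stats, query):
    """Order databases by how often the query's words occur in their documents."""
    words = query.lower().split()

    def score(db):
        return sum(word_stats[db].get(w, 0) for w in words)

    # Highest score first; ties are broken by database name.
    return sorted(word_stats, key=lambda db: (-score(db), db))
```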

17
PeerWare Infrastructure and experiments

Evaluation metrics
Recall rate - the fraction of documents each of
the search mechanisms retrieves
Efficiency - the number of messages needed to
find the results
Implemented only algorithms that require local
knowledge when searching for documents.
BFS (the baseline)
Implemented RBFS, >RES (k = 0.5d and m = 100,
where d is the degree of a node), and ISM
These 3 techniques forward query messages to half
the neighbors that BFS contacts.
>RES and ISM use previous knowledge to decide
which peers to forward the query to
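
The two metrics can be computed as below. The sketch assumes recall is measured against a reference set of relevant documents and efficiency against the BFS baseline; the function names are hypothetical.

```python
def recall_rate(retrieved, relevant):
    """Fraction of the relevant documents that a search mechanism retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(set(relevant))

def message_ratio(messages, baseline_messages):
    """Messages used, as a fraction of what the baseline (here BFS) needed."""
    return messages / baseline_messages
```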

18
PeerWare Infrastructure and experiments

BFS requires almost 2.5 times as many messages as
its competitors.

19
PeerWare Infrastructure and experiments

ISM found the most documents. ISM achieved almost
a 90-percent recall rate while using only 38
percent of the messages BFS required. ISM
improves its knowledge over time. Both >RES and
ISM started out with a low recall rate (around 40
to 50 percent) because initially they randomly
choose their neighbors.
