1
Distributed Content-Based Visual Information
Retrieval System on Peer-to-Peer Networks

Irwin King, Cheuk Hang Ng, and Ka Cheung Sia
The Chinese University of Hong Kong
In ACM Transactions on Information Systems,
Vol. 22, No. 3, July 2004, pp. 477-501
http://www.cse.cuhk.edu.hk/miplab/discovir/
By Pruet_at_DSSG.CS.UMB

king04distributed
2
Agenda

Motivation
DISCOVIR architecture
Peer Clustering Based on Image Similarity
Firework Query Model
Experiments and Results
Conclusion and Comment

3
Motivation

Currently, most content-based image retrieval
(CBIR) systems are centralized in both
computation and storage.
P2P should give CBIR some advantages, e.g., a
larger and more diverse image collection, better
scalability, and better performance and
responsiveness.
Problems with using P2P in CBIR:
There is no centralized feature extraction, so
some standard has to be applied.
There is no centralized image vector storage, so
some distributed storage and search mechanism has
to be applied.
Image feature vectors are large, so some
bandwidth optimization has to be applied when
exchanging vector data.
Because of the bandwidth requirement, the query
flooding model of traditional P2P (e.g.,
Gnutella) is not suitable.

4
Motivation (cont.)

Some approaches to solving the query flooding
problem:
CAN and Chord use a DHT to distribute the index
among peers; the data is converted to a hash and
distributed in a circular space (Chord) or a
multi-dimensional Cartesian space (CAN).
Crespo uses a routing cache to keep previous
query results, so the cache entries are used to
assist in forwarding new queries to peers that
are expected to contain the target data.
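
To make the circular-space idea concrete, here is a minimal sketch of Chord-style key placement: a key is hashed onto a ring and assigned to the first node whose identifier equals or follows it. The helper names `ring_id` and `successor`, the peer and image names, and the 16-bit ring are illustrative assumptions (Chord itself uses the full SHA-1 output); the sketch omits finger tables and node joins.

```python
import hashlib
from bisect import bisect_left

def ring_id(key, bits=16):
    """Hash a string key onto a ring of size 2**bits (toy size, assumed)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def successor(sorted_node_ids, key_id):
    """Return the first node id >= key_id, wrapping around the ring."""
    i = bisect_left(sorted_node_ids, key_id)
    return sorted_node_ids[i % len(sorted_node_ids)]

# Hypothetical peers and key, for illustration only.
nodes = sorted(ring_id(f"peer-{n}") for n in range(8))
owner = successor(nodes, ring_id("image-0042.jpg"))
```

Exact-match hashing like this does not preserve similarity between feature vectors, so it does not answer nearest-neighbor image queries on its own.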

5
DISCOVIR architecture

The authors propose a new P2P architecture
targeting CBIR, called DIStributed Content-based
Visual Information Retrieval (DISCOVIR).
Built as a modification of the Gnutella network,
DISCOVIR is compatible with the Gnutella
protocol, with some additional message types.
Each peer has its own image collection. Image
features are extracted from the local collection
using a pluggable feature extraction module and
kept in a local database.
Image queries are based on an example image (the
QBE, Query By Example, approach). The query peer
has to extract the feature vector of the example
and send it to its neighbors (using the Gnutella
protocol).
The other peers use a distance measure to find a
set of similar images and return the result to
the query peer; likewise, these peers propagate
the query to their connected peers.
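
The local matching step on a receiving peer can be sketched as follows. This is a minimal illustration, assuming Euclidean distance as the distance measure and a plain dictionary as the local feature database. The names `euclidean` and `local_lookup`, the `max_distance` cutoff, and the sample vectors are hypothetical, not part of DISCOVIR.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_lookup(query_vec, local_index, max_distance, top_k=5):
    """Return up to top_k (distance, image_name) pairs within max_distance.

    local_index maps image name -> feature vector (assumed layout).
    """
    hits = [(euclidean(query_vec, vec), name) for name, vec in local_index.items()]
    hits = [h for h in hits if h[0] <= max_distance]
    return sorted(hits)[:top_k]

# Made-up 3-dimensional features for three local images.
index = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.8, 0.1],
    "beach.jpg": [0.8, 0.2, 0.1],
}
results = local_lookup([1.0, 0.0, 0.0], index, max_distance=0.5)
```

With these sample values, `results` lists `sunset.jpg` first and `beach.jpg` second, and `forest.jpg` is filtered out by the cutoff.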

6
DISCOVIR architecture (cont.)
7
DISCOVIR architecture (cont.)

Flow of Operations
Preprocessing
The Plug-in Manager can download a feature
extractor module from the DISCOVIR central
website and install it on the local system. The
feature extractor then extracts features and
passes the feature vectors to the Image Indexer,
which indexes them and keeps them in local index
storage.
Connection Establishment
The Connection Manager contacts the Bootstrap
Server the first time the peer joins the network.
The peer can then hook up to the DISCOVIR network
via available peers, using information from the
Bootstrap Server.

8
DISCOVIR architecture (cont.)

Flow of Operations (cont.)
Query Message Routing
When the user submits an image as a query, the
Feature Extractor extracts the feature from that
image on the fly, constructs a query message, and
sends it out through the Packet Router.
When other peers receive the query message, they
need to perform two operations:
Local Index Look Up - search for similar images
in the local index using the Image Indexer.
Query Message Propagation - the Packet Router
uses the Gnutella mechanisms for forwarding
messages: TTL and replicated message checking.
Query Result Display
When the query result returns to the query peer,
the user obtains a list of the locations and
sizes of matched images. The user can then
retrieve images via the HTTP Agent, and the
images are displayed on the User Interface.
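
The two Gnutella forwarding mechanisms named above, TTL and replicated message checking, can be sketched as a small simulation. This is a simplified model under stated assumptions: the network is an adjacency list, each hop costs one TTL unit, and `flood` is a hypothetical name, not a DISCOVIR or Gnutella API.

```python
from collections import deque

def flood(graph, source, ttl):
    """Simulate flooding; return the set of peers that handled the query.

    graph maps peer id -> list of neighbor ids (assumed layout).
    """
    seen = {source}  # replicated-message check: each peer handles a query once
    queue = deque([(source, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward further
        for neighbor in graph[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return seen

# A five-peer chain 0-1-2-3-4, for illustration.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
reached = flood(chain, source=0, ttl=2)
```

On the chain, a TTL of 2 reaches peers 0, 1, and 2 only. On a densely connected network the reached set grows quickly with the TTL, which is the flooding cost the presentation sets out to reduce.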

9
Peer Clustering Based on Image Similarity

To solve the query flooding (brute-force search)
problem, the peers in the P2P network have to be
clustered based on image similarity.
On top of the P2P network, an overlay network of
connections, called attractive links, groups
similar peers together.
Instead of the feature vector of every image, a
signature value of the image collection in each
peer is used to determine the similarity between
two peers.
Some definitions:

10
Peer Clustering Based on Img. Sim. (cont.)

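Definition 1. $I_p = \{i_1, i_2, \dots, i_n\}$ is
the set of n images shared by peer p. The image
feature of each image in the collection is
extracted and mapped into a d-dimensional vector
by a function f, $f: I_p \to \mathbb{R}^d$.
Therefore, each peer contains a set of vectors
$C_p = \{f(i) : i \in I_p\}$.
Definition 2. $sig_p$ is defined as
$sig_p = (\mu_p, \sigma_p^2)$, where $\mu_p$ and
$\sigma_p^2$ are the mean and variance of the
vector collection.
Definition 3. $D(sig_p, sig_q)$ is defined as the
Cartesian distance between two peers' signature
values $sig_p$ and $sig_q$ using the following
equation
[Equation not captured in the transcript.]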
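
A peer signature and a distance between signatures can be sketched as below. The distance equation is not available in this transcript, so the sketch makes two labeled assumptions: the mean and variance are taken per dimension, and the distance is the Euclidean distance over the concatenated mean and variance values. The function names and sample vectors are hypothetical, and the authors' actual formula may differ.

```python
import math
from statistics import fmean, pvariance

def signature(vectors):
    """Return (means, variances), one entry per feature dimension."""
    dims = list(zip(*vectors))  # regroup values by dimension
    return [fmean(d) for d in dims], [pvariance(d) for d in dims]

def signature_distance(sig_p, sig_q):
    """ASSUMPTION: Euclidean distance over concatenated mean and variance."""
    flat_p = sig_p[0] + sig_p[1]
    flat_q = sig_q[0] + sig_q[1]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(flat_p, flat_q)))

# Two tiny made-up collections of 2-dimensional feature vectors.
sig_a = signature([[0.0, 2.0], [2.0, 4.0]])
sig_b = signature([[4.0, 7.0], [4.0, 7.0]])
gap = signature_distance(sig_a, sig_b)
```

A signature is much smaller than the full set of feature vectors, which is what makes it cheap to exchange between peers.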

11
Peer Clustering Based on Image Similarity (cont.)

Based on these definitions, attractive links can
be assigned to groups of similar peers using
these steps:
Signature Value Calculation - every peer
calculates its signature value.
Neighborhood Discovery - after a new peer joins
the network, it broadcasts a signature query
message. This broadcast is also repeated at a
regular interval.
Similarity Calculation and Attractive Link
Establishment - after acquiring the signature
values of other peers, the peer finds the peer
whose signature value is closest to its own,
using the distance of Definition 3, and makes an
attractive connection to link them up.
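
The link-establishment step can be sketched as a nearest-neighbor choice over the collected signatures. This is an illustration only: signatures are assumed to be flat numeric vectors, plain Euclidean distance stands in for the distance of Definition 3, and `nearest_peer` and the peer ids are hypothetical.

```python
import math

def nearest_peer(own_id, signatures):
    """Return the id of the peer whose signature is closest to own_id's.

    signatures maps peer id -> flat signature vector (assumed layout).
    Euclidean distance is an illustrative stand-in for Definition 3.
    """
    own = signatures[own_id]
    others = (pid for pid in signatures if pid != own_id)
    return min(others, key=lambda pid: math.dist(own, signatures[pid]))

# Signatures gathered from a signature query broadcast (made-up values).
signatures = {"a": [0.10, 0.20], "b": [0.15, 0.25], "c": [0.90, 0.80]}
attractive = {pid: nearest_peer(pid, signatures) for pid in signatures}
```

With these values, peers `a` and `b` link to each other and `c` links to `b`, its closest available peer.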

12
Peer Clustering Based on Img. Sim. (cont.)
13
Firework Query Model

In this query routing model, a query message is
routed selectively according to its content.
When it reaches its designated cluster, chosen by
similarity, the query message is broadcast by
peers through the attractive connections inside
the cluster.
So, when a peer receives a query, it needs to
carry out two steps:
Shared File Look Up - the peer compares the query
feature vector with the feature vector of each
image in its local collection; if any image
matches, it replies to the query peer.
Route Selection - the peer calculates the
similarity between the query and its signature
value. If the similarity is above a threshold,
it sends the query to the peers connected by
attractive links (explosion); otherwise, it
forwards the query to its P2P-connected peers.
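
The route selection step can be sketched as a single decision function. The similarity function here is an assumption made for illustration, 1 / (1 + Euclidean distance) between the query vector and the mean part of the peer signature. The names `select_route`, `attractive_links`, and `ordinary_links` are hypothetical.

```python
import math

def select_route(query_vec, signature_mean, attractive_links, ordinary_links, threshold):
    """Return (mode, targets) for a query arriving at a peer."""
    # ASSUMPTION: similarity is 1 / (1 + Euclidean distance) to the signature mean.
    similarity = 1.0 / (1.0 + math.dist(query_vec, signature_mean))
    if similarity > threshold:
        # The query has reached a matching cluster: explode along attractive links.
        return "explode", list(attractive_links)
    # Otherwise keep routing over ordinary P2P connections.
    return "forward", list(ordinary_links)

inside = select_route([0.5, 0.5], [0.5, 0.5], ["p2", "p3"], ["p9"], threshold=0.8)
outside = select_route([0.0, 0.0], [3.0, 4.0], ["p2", "p3"], ["p9"], threshold=0.8)
```

The threshold controls how early the explosion starts: a lower value widens the set of peers that treat themselves as part of the target cluster.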

14
Firework Query Model (cont.)
15
Firework Query Model (cont.)

To prevent infinite query looping, a replicated
message checking rule and a TTL are used.
When a query arrives at a peer, it is checked
against a local cache for duplication; if a
duplicate is found, the query is dropped.
Each time the query passes through a peer, the
TTL is decreased by one. Once the TTL reaches
zero, the query is dropped. However, if the query
is passed along an attractive link, the TTL may
be left undecreased, according to a probability
called Chance-To-Survive (CTS).
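
One reading of the TTL rule is sketched below. It assumes that on an attractive link the TTL is kept unchanged with probability CTS and decreased otherwise. The function name `next_ttl` and the `rng` parameter are hypothetical.

```python
import random

def next_ttl(ttl, via_attractive_link, chance_to_survive, rng=random):
    """Return the TTL a query carries after one hop.

    The caller drops the query once the returned value reaches zero.
    """
    # ASSUMPTION: on an attractive link the TTL survives with probability CTS.
    if via_attractive_link and rng.random() < chance_to_survive:
        return ttl
    return ttl - 1

ordinary_hop = next_ttl(3, via_attractive_link=False, chance_to_survive=0.5)
protected_hop = next_ttl(3, via_attractive_link=True, chance_to_survive=1.0)
```

Under this reading, a CTS of 1.0 lets a query spread through a whole cluster without spending TTL, while a CTS of 0.0 reduces to ordinary Gnutella behavior.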

16
Experiments and Results

Performance Metrics
Recall
Recall = RA/RT, where RA = retrieved relevant
images and RT = total relevant images in the
network. Higher is better.
Query scope
Visited = Vpeer/Tpeer, where Vpeer = number of
peers that received and handled the query and
Tpeer = total number of peers in the network.
Lower is better.
Query efficiency
Efficiency = Recall/Visited
The experimental results are compared with the
query flooding algorithm (Breadth-First Search
(BFS)).
Platform: Sun Blade 1000, 2 GB RAM, Solaris v.8,
C. Simulating 20,000 peers with TTL 7 and 10
iterations (queries) took 3 hours.
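
The three metrics are simple ratios, as the short worked example below shows. The input numbers are made up for illustration and are not results from the paper.

```python
def recall(retrieved_relevant, total_relevant):
    """Recall = RA / RT."""
    return retrieved_relevant / total_relevant

def query_scope(visited_peers, total_peers):
    """Visited = Vpeer / Tpeer."""
    return visited_peers / total_peers

def efficiency(recall_value, scope_value):
    """Efficiency = Recall / Visited."""
    return recall_value / scope_value

# Hypothetical query: 40 of 100 relevant images found, 2,000 of 10,000 peers visited.
r = recall(40, 100)
s = query_scope(2000, 10000)
e = efficiency(r, s)
```

Here recall is 0.4, query scope is 0.2, and efficiency is 2. Efficiency rewards a method that finds the same share of relevant images while visiting fewer peers.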

17
Experiments and Results (cont.)

Data Set
Synthetic data - 100 sets with random mean and
variance. For each set, 100 points (images) are
generated according to a Gaussian distribution.
Real data - 10,000 images from 100 categories in
the CorelDraw Image Collection CD.
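
A synthetic data set of this shape can be generated along the following lines. The value ranges for the means and spreads, the use of one standard deviation per set, the fixed seed, and the name `make_synthetic` are all assumptions made for illustration. The 9 dimensions match the feature vector size used in the experiment setups.

```python
import random

def make_synthetic(num_sets=100, points_per_set=100, dims=9, seed=0):
    """Return num_sets clusters, each a list of Gaussian-distributed points."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_sets):
        mean = [rng.uniform(0.0, 1.0) for _ in range(dims)]  # assumed range
        std = rng.uniform(0.01, 0.1)                         # assumed range
        cluster = [[rng.gauss(m, std) for m in mean] for _ in range(points_per_set)]
        data.append(cluster)
    return data

clusters = make_synthetic()
```

Each cluster plays the role of one image category, so assigning one cluster to a peer gives that peer a coherent collection.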

18
Experiments and Results (cont.)

Experiment 1: Performance as affected by the
number of peers.
Experiment setup
Number of peers: 2,000 - 20,000
Network diameter: 9 - 11
Average distance: 5.36 - 6.58
Number of images assigned to each peer: 100
Feature vector dimensions: 9
TTL: FQM 5, BFS 7

19
Experiments and Results (cont.)
[Figure: Recall vs. number of peers. x-axis: Number of Peers; y-axis: Recall]
20
Experiments and Results (cont.)
[Figure: Query scope vs. number of peers. x-axis: Number of Peers; y-axis: Query Scope]
21
Experiments and Results (cont.)
[Figure: Efficiency vs. number of peers. x-axis: Number of Peers; y-axis: Recall/Query Scope]
22
Experiments and Results (cont.)

Experiment 2: Performance as affected by TTL.
Experiment setup
Number of peers: 10,000
Network diameter: 10
Average distance: 6.2
Number of images assigned to each peer: 100
Feature vector dimensions: 9
TTL: 4 - 9

23
Experiments and Results (cont.)
[Figure: Recall vs. TTL. x-axis: TTL value of query message; y-axis: Recall]
24
Experiments and Results (cont.)
[Figure: Query scope vs. TTL. x-axis: TTL value of query message; y-axis: Query Scope]
25
Experiments and Results (cont.)
[Figure: Efficiency vs. TTL. x-axis: TTL value of query message; y-axis: Recall/Query Scope]
26
Conclusion and Comment

FQM outperforms BFS in all tests.
FQM can reduce the network traffic cost (query
scope) while maintaining high query efficiency.
Comment
Query routing is random, a kind of BFS.
Broadcasting is needed on join, update, and
explosion, which is costly for a dynamic network.
The complete feature vector is sent, so traffic
grows with high-dimensional feature vectors.
Still log(n)