Title: Distributed Content-Based Visual Information Retrieval System on Peer-to-Peer Networks
1 Distributed Content-Based Visual Information
Retrieval System on Peer-to-Peer Networks
- Irwin King, Cheuk Hang Ng, and Ka Cheung Sia
- The Chinese University of Hong Kong
- In ACM Transactions on Information Systems, Vol. 22, No. 3, July 2004, pp. 477-501
- http://www.cse.cuhk.edu.hk/miplab/discovir/
- By Pruet_at_DSSG.CS.UMB
2 Agenda
- Motivation
- DISCOVIR architecture
- Peer Clustering Based on Image Similarity
- Firework Query Model
- Experiments and Results
- Conclusion and Comment
3 Motivation
- Currently, most content-based image retrieval (CBIR) systems are centralized in both computation and storage.
- P2P could give CBIR several advantages, e.g., a larger and more diverse image collection, better scalability, and better performance and responsiveness.
- Problems with using P2P for CBIR
  - There is no centralized feature extraction, so a common standard has to be applied.
  - There is no centralized storage of image feature vectors, so a distributed indexing and search mechanism has to be applied.
  - Image feature vectors are large, so bandwidth has to be optimized when exchanging vector data.
  - Because of this bandwidth requirement, the query-flooding model of traditional P2P networks (e.g., Gnutella) is not suitable.
4 Motivation (cont.)
- Some approaches to the query-flooding problem
  - CAN and Chord use a distributed hash table (DHT) to distribute the index among peers: each data key is hashed and mapped onto a circular identifier space (Chord) or a multi-dimensional Cartesian coordinate space (CAN).
  - Crespo uses a routing cache that keeps previous query results; the cache entries are then used to forward new queries to the peers that are expected to hold the target data.
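To make the Chord-style placement concrete, here is a minimal sketch: keys and peers are hashed (here with SHA-1) onto the same circular identifier space, and each key is stored on its successor, the first peer clockwise from the key's position. The identifier size `M_BITS` and the function names are illustrative assumptions, not code from Chord or DISCOVIR. Finger tables, joins, and failures are omitted.

```python
import bisect
import hashlib
from typing import List

M_BITS = 16  # hypothetical identifier size; the ring has 2**M_BITS positions
RING_SIZE = 2 ** M_BITS


def ring_id(name: str) -> int:
    """Hash a peer name or data key onto the circular identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def successor(peer_ids: List[int], key_id: int) -> int:
    """Return the first peer id at or clockwise after key_id, wrapping around the ring.

    peer_ids must be sorted in ascending order.
    """
    idx = bisect.bisect_left(peer_ids, key_id)
    return peer_ids[idx % len(peer_ids)]
```

For example, with peers at ids 10, 20, and 30, a key hashed to 15 lands on peer 20, and a key hashed to 31 wraps around to peer 10. A lookup therefore targets one peer rather than flooding the network.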
5 DISCOVIR architecture
- The authors propose a new P2P architecture for CBIR, called DIStributed Content-based Visual Information Retrieval (DISCOVIR).
- DISCOVIR is built by modifying the Gnutella network. It remains compatible with the Gnutella protocol and adds some new message types.
- Each peer has its own image collection. Image features are extracted from the local collection by a pluggable feature-extraction module and kept in a local database.
- Image queries follow the Query-By-Example (QBE) approach. The querying peer extracts the feature vector of the example image and sends it to its neighbors using the Gnutella protocol.
- Receiving peers use a distance measure to find a set of similar images and return the results to the querying peer. They also propagate the query to their own neighbors.
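A receiving peer's local search can be sketched as a k-nearest-neighbor scan over its stored feature vectors. This sketch assumes Euclidean distance and a hypothetical `local_index` dictionary that maps file names to vectors. The actual distance measure depends on the feature-extraction plug-in in use.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_lookup(query_vec: Sequence[float],
                 local_index: Dict[str, List[float]],
                 k: int = 3) -> List[Tuple[float, str]]:
    """Return the k (distance, image name) pairs closest to the query vector."""
    scored = ((euclidean(query_vec, vec), name) for name, vec in local_index.items())
    return heapq.nsmallest(k, scored)
```

With `heapq.nsmallest`, a peer can keep only the best `k` matches without sorting its whole collection. That is a reasonable choice when each peer shares many images but replies with only a few.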
6 DISCOVIR architecture (cont.)
7 DISCOVIR architecture (cont.)
- Flow of Operations
  - Preprocessing
    - The Plug-in Manager can download a Feature Extractor module from the DISCOVIR central website and install it on the local system. The Feature Extractor then extracts features and passes the feature vectors to the Image Indexer, which indexes them and keeps them in local index storage.
  - Connection Establishment
    - The first time a peer joins the network, its Connection Manager asks the Bootstrap Server for available peers. The peer then hooks up to the DISCOVIR network through those peers.
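The preprocessing step can be sketched as a plug-in interface plus an indexer. The names `FeatureExtractor`, `MeanColorExtractor`, and `ImageIndexer` are hypothetical. The toy extractor (average RGB color) stands in for whatever feature module a real peer would download, and images are assumed to arrive as lists of RGB pixels.

```python
from typing import Dict, List, Protocol, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


class FeatureExtractor(Protocol):
    """Interface a hypothetical feature-extraction plug-in would implement."""

    name: str

    def extract(self, pixels: Sequence[Pixel]) -> List[float]:
        ...


class MeanColorExtractor:
    """Toy plug-in: the average R, G, B values, scaled to [0, 1]."""

    name = "mean-color"

    def extract(self, pixels: Sequence[Pixel]) -> List[float]:
        n = len(pixels)
        return [sum(p[c] for p in pixels) / (255.0 * n) for c in range(3)]


class ImageIndexer:
    """Keeps the feature vectors of the local collection, keyed by file name."""

    def __init__(self, extractor: FeatureExtractor) -> None:
        self.extractor = extractor
        self.index: Dict[str, List[float]] = {}

    def add(self, name: str, pixels: Sequence[Pixel]) -> None:
        self.index[name] = self.extractor.extract(pixels)
```

Because the indexer depends only on the `extract` interface, a different plug-in can be swapped in without changing the indexer. Every peer must use the same plug-in, though, for their vectors to be comparable. That is the standardization issue raised in the Motivation slide.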
8 DISCOVIR architecture (cont.)
- Flow of Operations (cont.)
  - Query Message Routing
    - When a user submits an image query, the Feature Extractor immediately extracts features from that image, constructs a query message, and sends it out through the Packet Router.
    - Other peers that receive the query message perform two operations:
      - Local Index Look Up: search the local index for similar images using the Image Indexer.
      - Query Message Propagation: the Packet Router forwards the message using the Gnutella mechanism, with TTL and replicated-message checking.
  - Query Result Display
    - When the results return to the querying peer, the user obtains a list of the locations and sizes of matched images. The user can then retrieve the images through the HTTP Agent, and they are displayed in the User Interface.
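The Gnutella propagation rule can be simulated as a breadth-first flood with a duplicate cache and a hop budget. This minimal sketch represents the network as a hypothetical adjacency dictionary. A real peer keys its cache on the message GUID, whereas this sketch tracks a single query. It is also the BFS baseline used in the experiments later on.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def flood(graph: Dict[Hashable, List[Hashable]], source: Hashable, ttl: int) -> Set[Hashable]:
    """Simulate Gnutella-style query flooding from `source`.

    Each peer handles the query at most once (replicated-message check).
    The TTL drops by one per hop, and the query is not forwarded once it reaches zero.
    Returns the set of peers that handled the query.
    """
    seen = {source}                  # duplicate cache for this one query
    queue = deque([(source, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue                 # TTL exhausted: do not forward further
        for nbr in graph[peer]:
            if nbr in seen:
                continue             # replicated message: drop it
            seen.add(nbr)
            queue.append((nbr, remaining - 1))
    return seen
```

On a chain of peers 0-1-2-3, a query from peer 0 with TTL 2 reaches peers 0, 1, and 2. In a densely connected network, the same rule touches a number of peers that grows roughly exponentially with the TTL. That growth is the cost the Firework Query Model tries to avoid.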
9 Peer Clustering Based on Image Similarity
- To solve the query-flooding (brute-force search) problem, peers in the P2P network are clustered by image similarity.
- On top of the P2P network, an overlay network of connections, called attractive links, groups similar peers together.
- To measure the similarity between two peers, each peer's image collection is summarized by a signature value, rather than comparing the feature vector of every image.
- Some definitions:
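10 Peer Clustering Based on Img. Sim. (cont.)
- Definition 1. $I_p = \{i_1, i_2, \ldots, i_n\}$ is the set of $n$ images shared by peer $p$. The feature of each image in the collection is extracted and mapped into a $d$-dimensional vector by a function $f: I \rightarrow \mathbb{R}^d$, i.e., $\vec{v}_k = f(i_k)$. Therefore, each peer holds a set of vectors $V_p = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$.
- Definition 2. The signature value of peer $p$ is defined as $Sig_p = (\vec{\mu}_p, \vec{\sigma}^2_p)$, where $\vec{\mu}_p$ and $\vec{\sigma}^2_p$ are the component-wise mean and variance of the vector collection $V_p$.
- Definition 3. $D(Sig_p, Sig_q)$ is defined as the Cartesian distance between the signature values $Sig_p$ and $Sig_q$ of two peers, using the following equation:
  $$D(Sig_p, Sig_q) = \sqrt{\sum_{j=1}^{d} (\mu_{p,j} - \mu_{q,j})^2 + \sum_{j=1}^{d} (\sigma^2_{p,j} - \sigma^2_{q,j})^2}$$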
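Definitions 2 and 3 translate directly into code. This is a minimal sketch assuming population variance (dividing by $n$) and reading the "Cartesian distance" as the Euclidean distance over the concatenated mean and variance vectors. Both choices are assumptions, not details confirmed by the slides.

```python
import math
from typing import List, Sequence, Tuple

Signature = Tuple[List[float], List[float]]  # (mean vector, variance vector)


def signature(vectors: Sequence[Sequence[float]]) -> Signature:
    """Component-wise mean and population variance of a peer's feature vectors."""
    n = len(vectors)
    d = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    var = [sum((v[j] - mean[j]) ** 2 for v in vectors) / n for j in range(d)]
    return mean, var


def signature_distance(sig_p: Signature, sig_q: Signature) -> float:
    """Euclidean distance between two signatures over their mean and variance parts."""
    (mean_p, var_p), (mean_q, var_q) = sig_p, sig_q
    total = sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))
    total += sum((a - b) ** 2 for a, b in zip(var_p, var_q))
    return math.sqrt(total)
```

A signature has $2d$ numbers regardless of how many images a peer shares. This is why exchanging signatures is much cheaper than exchanging every feature vector.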
11 Peer Clustering Based on Image Similarity (cont.)
- Based on these definitions, attractive links are established among similar peers in the following steps:
  - Signature Value Calculation: every peer calculates its signature value (Definition 2).
  - Neighborhood Discovery: after a new peer joins the network, it broadcasts a signature query message. This broadcast is also repeated at regular intervals.
  - Similarity Calculation and Attractive Link Establishment: after acquiring the signature values of other peers, the peer uses the signature distance of Definition 3 to find the peer whose signature value is closest to its own, and makes an attractive connection to link them.
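The link-establishment step then reduces to a minimum search over the signatures collected from signature-query replies. This sketch assumes the replies arrive as a dictionary keyed by a hypothetical peer address. It also assumes a Euclidean reading of the Cartesian distance in Definition 3, and it picks a single attractive peer, whereas a real peer may keep several.

```python
import math
from typing import Dict, List, Optional, Tuple

Signature = Tuple[List[float], List[float]]  # (mean vector, variance vector)


def sig_distance(a: Signature, b: Signature) -> float:
    """Euclidean distance over the concatenated (mean, variance) vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[0] + a[1], b[0] + b[1])))


def choose_attractive_peer(own: Signature, replies: Dict[str, Signature]) -> Optional[str]:
    """Return the replying peer whose signature is closest to ours, or None if no replies."""
    if not replies:
        return None
    return min(replies, key=lambda peer: sig_distance(own, replies[peer]))
```

Because discovery is repeated at regular intervals, a peer can re-run this selection and replace its attractive link when a closer peer appears.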
12 Peer Clustering Based on Img. Sim. (cont.)
13 Firework Query Model
- In this routing model, a query message is routed selectively according to its content.
- When the query reaches its designated cluster (the peers whose collections are similar to the query), those peers broadcast it through the attractive connections inside the cluster.
- So, when a peer receives a query, it carries out two steps:
  - Shared File Look Up: the peer compares the query feature vector with the feature vector of each image in its local collection. If any image matches, it replies to the querying peer.
  - Route Selection: the peer calculates the similarity between the query and its own signature value. If the similarity exceeds a threshold, it sends the query to all peers connected by attractive links (explosion). Otherwise, it forwards the query to peers connected by ordinary P2P links.
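Route selection can be sketched as a threshold test. The slides do not give the similarity function or the threshold, so `similarity` here is a hypothetical choice (1 / (1 + distance) from the query to the signature mean), and 0.5 is an arbitrary threshold. This sketch also sends the query to all ordinary links in the non-matching case, which is one reading of the slide.

```python
import math
from typing import List, Sequence


def similarity(query: Sequence[float], sig_mean: Sequence[float]) -> float:
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance to the signature mean)."""
    dist = math.sqrt(sum((q - m) ** 2 for q, m in zip(query, sig_mean)))
    return 1.0 / (1.0 + dist)


def select_routes(query: Sequence[float], sig_mean: Sequence[float],
                  attractive_links: List[str], ordinary_links: List[str],
                  threshold: float = 0.5) -> List[str]:
    """Choose next hops: explode inside the cluster, or keep searching for it."""
    if similarity(query, sig_mean) > threshold:
        return list(attractive_links)   # explosion along attractive links
    return list(ordinary_links)         # not our cluster: forward on ordinary links
```

Duplicate deliveries that this fan-out can cause are left to the replicated-message check on the receiving side.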
14 Firework Query Model (cont.)
15 Firework Query Model (cont.)
- To prevent infinite query loops, a replicated-message checking rule and a TTL are used.
- When a query arrives at a peer, it is checked against a local cache for duplicates. If a duplicate is found, the query is dropped.
- Each time the query passes through a peer, its TTL is decreased by one. Once the TTL reaches zero, the query is dropped. However, when the query is passed along an attractive link, the TTL is left unchanged with a probability called the Chance-To-Survive (CTS).
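The two loop-prevention rules and the CTS exception can be sketched as two small functions. The names `next_ttl` and `accept` are hypothetical. The random generator is passed in explicitly so that simulations are reproducible.

```python
import random
from typing import Set


def next_ttl(ttl: int, via_attractive: bool, cts: float, rng: random.Random) -> int:
    """TTL carried to the next hop.

    Along an attractive link the TTL survives unchanged with probability `cts`
    (Chance-To-Survive); otherwise it is decremented by one.
    """
    if via_attractive and rng.random() < cts:
        return ttl
    return ttl - 1


def accept(msg_id: str, ttl: int, seen: Set[str]) -> bool:
    """Drop duplicate or expired queries; remember the ids of accepted ones."""
    if msg_id in seen or ttl <= 0:
        return False
    seen.add(msg_id)
    return True
```

A CTS of 0 gives plain Gnutella behavior. A CTS close to 1 lets a query keep exploding inside a matching cluster with little TTL cost, while the duplicate cache still bounds how often each peer handles it.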
16 Experiments and Results
- Performance Metrics
  - Recall = RA / RT, where RA is the number of relevant images retrieved and RT is the total number of relevant images in the network. Higher is better.
  - Query scope: Visited = Vpeer / Tpeer, where Vpeer is the number of peers that received and handled the query and Tpeer is the total number of peers in the network. Lower is better.
  - Query efficiency: Efficiency = Recall / Visited.
- The experimental results are compared with the query-flooding algorithm (Breadth-First Search, BFS).
- Platform: Sun Blade 1000 with 2 GB RAM, Solaris 8, simulator written in C. Simulating 20,000 peers with TTL 7 and 10 iterations (queries) took 3 hours.
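As a worked example of the three metrics, here are the definitions written as functions. The numbers in the usage note are made up for illustration and are not results from the paper.

```python
def recall(retrieved_relevant: int, total_relevant: int) -> float:
    """Recall = RA / RT (higher is better)."""
    return retrieved_relevant / total_relevant


def query_scope(visited_peers: int, total_peers: int) -> float:
    """Visited = Vpeer / Tpeer (lower is better)."""
    return visited_peers / total_peers


def efficiency(rec: float, scope: float) -> float:
    """Query efficiency = Recall / Visited."""
    return rec / scope
```

For instance, retrieving 50 of 100 relevant images (recall 0.5) while visiting 2,500 of 10,000 peers (query scope 0.25) gives an efficiency of 2.0. A flooding search that found the same images by visiting every peer would have an efficiency of only 0.5.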
17 Experiments and Results (cont.)
- Data Set
  - Synthetic data: 100 sets, each with a random mean and variance. For each set, 100 points (images) are generated according to a Gaussian distribution.
  - Real data: 10,000 images from 100 categories of the Corel Draw Image Collection CD.
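The synthetic data set can be approximated with the standard library alone. This sketch assumes independent dimensions (a diagonal covariance), 9 dimensions to match the experiment setup, and arbitrary ranges for the random means and standard deviations. The slides do not state any of these details.

```python
import random
from typing import List


def synthetic_dataset(n_sets: int = 100, points_per_set: int = 100,
                      dim: int = 9, seed: int = 0) -> List[List[List[float]]]:
    """Generate Gaussian clusters, each with its own random mean and spread per dimension."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_sets):
        mean = [rng.uniform(0.0, 1.0) for _ in range(dim)]   # assumed mean range
        std = [rng.uniform(0.01, 0.1) for _ in range(dim)]   # assumed spread range
        points = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                  for _ in range(points_per_set)]
        dataset.append(points)
    return dataset
```

Each generated set can then be assigned to a peer as its "image collection", which gives every peer a well-defined signature for clustering.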
18 Experiments and Results (cont.)
- Experiment 1: Performance affected by the number of peers
- Experiment setup
  - Number of peers: 2,000 - 20,000
  - Network diameter: 9 - 11
  - Average distance: 5.36 - 6.58
  - Number of images assigned to each peer: 100
  - Feature vector dimensions: 9
  - TTL: FQM 5, BFS 7
19 Experiments and Results (cont.)
[Figure: Recall vs. number of peers (y-axis: recall; x-axis: number of peers)]
20 Experiments and Results (cont.)
[Figure: Query scope vs. number of peers (y-axis: query scope; x-axis: number of peers)]
21 Experiments and Results (cont.)
[Figure: Efficiency vs. number of peers (y-axis: recall / query scope; x-axis: number of peers)]
22 Experiments and Results (cont.)
- Experiment 2: Performance affected by TTL
- Experiment setup
  - Number of peers: 10,000
  - Network diameter: 10
  - Average distance: 6.2
  - Number of images assigned to each peer: 100
  - Feature vector dimensions: 9
  - TTL: 4 - 9
23 Experiments and Results (cont.)
[Figure: Recall vs. TTL (y-axis: recall; x-axis: TTL value of query message)]
24 Experiments and Results (cont.)
[Figure: Query scope vs. TTL (y-axis: query scope; x-axis: TTL value of query message)]
25 Experiments and Results (cont.)
[Figure: Efficiency vs. TTL (y-axis: recall / query scope; x-axis: TTL value of query message)]
26 Conclusion and Comment
- FQM outperforms BFS in all tests.
- FQM reduces network traffic cost (query scope) while maintaining high query efficiency.
- Comment
  - Query routing outside the target cluster is random, a kind of BFS.
  - Broadcasting is still used on join, on update, and on explosion.
  - Costly for dynamic networks.
  - The complete feature vector is sent with each query, so traffic grows with high-dimensional feature vectors.
  - Still log(n).