Distributed Content-Based Visual Information Retrieval System on Peer-to-Peer Networks

1
Distributed Content-Based Visual Information
Retrieval System on Peer-to-Peer Networks
  • Irwin King, Cheuk Hang Ng, and Ka Cheung Sia
  • The Chinese University of Hong Kong
  • In ACM Transactions on Information Systems,
    Vol. 22, No. 3, July 2004, pp. 477-501
  • http://www.cse.cuhk.edu.hk/miplab/discovir/
  • By Pruet_at_DSSG.CS.UMB

king04distributed
2
Agenda
  • Motivation
  • DISCOVIR architecture
  • Peer Clustering Based on Image Similarity
  • Firework Query Model
  • Experiments and Results
  • Conclusion and Comment

3
Motivation
  • Currently, most content-based image retrieval
    (CBIR) systems are centralized in both
    computation and storage.
  • P2P should give CBIR some advantages, e.g., a
    larger and more diverse image collection, better
    scalability, and better performance and
    responsiveness.
  • Problems with using P2P in CBIR:
  • No centralized feature extraction, so some
    standard has to be applied.
  • No centralized image vector storage, so some
    distributed storage and search mechanism has to
    be applied.
  • Image feature vectors are large, so some
    bandwidth optimization has to be applied when
    exchanging vector data.
  • Because of the bandwidth requirement, the query
    flooding model of traditional P2P (e.g.,
    Gnutella) is not suitable.

4
Motivation (cont.)
  • Some approaches to solving the query flooding
    problem:
  • CAN and Chord use a DHT to distribute the index
    among peers: the data is converted to a hash and
    distributed in a circular space (Chord) or a
    multi-dimensional Cartesian space (CAN).
  • Crespo uses a routing cache to keep previous
    query results, so the cache entries can be used
    to forward new queries to peers that are
    expected to contain the target data.

5
DISCOVIR architecture
  • The authors propose a new P2P architecture that
    targets CBIR, called DIStributed Content-based
    Visual Information Retrieval (DISCOVIR).
  • Built as a modification of the Gnutella network,
    DISCOVIR is compatible with the Gnutella
    protocol, with some additional message types.
  • Each peer has its own image collection. Image
    features are extracted from the local collection
    by a pluggable feature extraction module and
    kept in a local database.
  • Image queries are based on an example image (the
    Query By Example, QBE, approach). The query peer
    has to extract the feature vector of the example
    image and send it to its neighbors (using the
    Gnutella protocol).
  • The other peers use a distance measure to find a
    set of similar images and return the result to
    the query peer. Likewise, these peers propagate
    the query to their connected peers.
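
The local similarity search performed by each receiving peer can be sketched in Python as follows. This is only an illustration, not DISCOVIR's code: the slides do not name the distance measure, so Euclidean distance is assumed, and the names `euclidean` and `local_lookup`, the `k` cut-off, and the dictionary layout of the index are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_lookup(query_vec, local_index, k=3):
    """Return the k image names whose feature vectors are closest to the query.

    local_index maps image name -> feature vector (hypothetical layout).
    """
    ranked = sorted(local_index,
                    key=lambda name: euclidean(query_vec, local_index[name]))
    return ranked[:k]

# Toy index with made-up three-dimensional feature vectors.
index = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "beach.jpg": [0.8, 0.2, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
}
hits = local_lookup([1.0, 0.0, 0.0], index, k=2)
print(hits)
```

A real peer would probably use an index structure instead of a linear scan, but the scan keeps the example short.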

6
DISCOVIR architecture (cont.)
7
DISCOVIR architecture (cont.)
  • Flow of Operations
  • Preprocessing
  • The Plug-in Manager can load a feature extractor
    module from the DISCOVIR central website and
    install it in the local system. The feature
    extractor then extracts features and passes the
    feature vectors to the Image Indexer, which
    indexes them and keeps them in local index
    storage.
  • Connection Establishment
  • The Connection Manager contacts the Bootstrap
    Server the first time this peer joins the
    network. The peer can then hook up to the
    DISCOVIR network via available peers, using
    information from the Bootstrap Server.

8
DISCOVIR architecture (cont.)
  • Flow of Operations (cont.)
  • Query Message Routing
  • When the user submits an image as a query, the
    Feature Extractor immediately extracts the
    feature from that image, constructs a query
    message, and sends it out through the Packet
    Router.
  • When other peers receive the query message, they
    need to perform two operations:
  • Local Index Look Up - search for similar images
    in the local index using the Image Indexer.
  • Query Message Propagation - the Packet Router
    forwards the message using the Gnutella
    mechanisms: TTL and replicated-message checking.
  • Query Result Display
  • When the query result returns to the query peer,
    the user obtains a list of the locations and
    sizes of the matched images. The user can then
    retrieve the images via the HTTP Agent, and they
    are displayed in the User Interface.
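
Gnutella-style propagation with a TTL and replicated-message checking can be sketched as a breadth-first flood. The code below is a simplified simulation under stated assumptions: a single shared `seen` set stands in for the per-peer duplicate caches, and the function name `flood` and the toy graph are hypothetical.

```python
from collections import deque

def flood(graph, source, ttl):
    """Return the set of peers that receive a query flooded from source.

    graph maps peer -> list of neighbours. A peer forwards a message it has
    not seen before to all neighbours while the remaining TTL is positive.
    """
    seen = {source}                      # stands in for per-peer duplicate caches
    queue = deque([(source, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue                     # TTL exhausted: do not forward further
        for neighbour in graph[peer]:
            if neighbour not in seen:    # replicated-message check
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return seen - {source}

# Toy five-peer network.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached = flood(graph, "A", ttl=2)
print(sorted(reached))
```

With `ttl=2` the query reaches B, C, and D but not E, which is three hops away.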

9
Peer Clustering Based on Image Similarity
  • To solve the query flooding (brute-force search)
    problem, the peers in the P2P network have to be
    clustered based on image similarity.
  • On top of the P2P network, an overlay network of
    connections, called attractive links, groups
    similar peers together.
  • Instead of using the feature vector of every
    image, a signature value of the image collection
    in each peer is used to determine the similarity
    between two peers.
  • Some definitions:

10
Peer Clustering Based on Img. Sim. (cont.)
  • Definition 1. Each peer p shares a set of n
    images. The image feature of each image in the
    collection is extracted and mapped into a
    d-dimensional vector (in R^d) by a function f.
    Therefore, each peer contains a set of n feature
    vectors.
  • Definition 2. The signature value of a peer is
    defined as the pair (mean, variance) of its
    vector collection.
  • Definition 3. The similarity measure between two
    peers is defined as the Cartesian distance
    between their signature values.
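
A peer signature and the distance between two signatures can be sketched in Python. The exact formulas did not survive in this transcript, so the following is an assumption rather than the authors' definition: the signature is taken as the per-dimension mean and population variance, and the distance as the Euclidean distance over the concatenated (mean, variance) values. The names `signature` and `signature_distance` are hypothetical.

```python
import math

def signature(vectors):
    """Per-dimension mean and (population) variance of a peer's feature vectors."""
    n = len(vectors)
    d = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    var = [sum((v[j] - mean[j]) ** 2 for v in vectors) / n for j in range(d)]
    return mean, var

def signature_distance(sig_p, sig_q):
    """Assumed form: Euclidean distance over concatenated (mean, variance)."""
    flat_p = sig_p[0] + sig_p[1]
    flat_q = sig_q[0] + sig_q[1]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(flat_p, flat_q)))

# Two toy peers, each sharing two images with two-dimensional features.
peer_p = [[0.0, 0.0], [2.0, 2.0]]
peer_q = [[3.0, 4.0], [5.0, 6.0]]
sig_p = signature(peer_p)
sig_q = signature(peer_q)
print(sig_p, sig_q, signature_distance(sig_p, sig_q))
```

Here both peers have variance 1 in every dimension, so the distance is driven only by the difference in means.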

11
Peer Clustering Based on Image Similarity (cont.)
  • Based on these definitions, attractive links can
    be assigned to groups of similar peers in the
    following steps:
  • Signature Value Calculation - every peer
    calculates its signature value.
  • Neighborhood Discovery - after a new peer joins
    the network, it broadcasts a signature query
    message. The broadcast is repeated at a regular
    interval.
  • Similarity Calculation and Attractive Link
    Establishment - after acquiring the signature
    values of other peers, the peer finds the peer
    whose signature value is closest to its own,
    using the distance of Definition 3, and makes an
    attractive connection to link them up.
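
The link-establishment step can be sketched as a nearest-signature search. This is a minimal illustration: signatures are assumed to be flat numeric vectors compared by Euclidean distance, and the names `distance`, `closest_peer`, and the peer ids are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two flat signature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_peer(own_id, signatures):
    """Return the id of the peer whose signature is nearest to own_id's.

    signatures maps peer id -> flat signature vector, as collected from the
    replies to the signature query broadcast (hypothetical layout).
    """
    own = signatures[own_id]
    others = [pid for pid in signatures if pid != own_id]
    return min(others, key=lambda pid: distance(own, signatures[pid]))

# Made-up signatures forming two obvious groups.
signatures = {
    "p1": [0.10, 0.20],
    "p2": [0.90, 0.80],
    "p3": [0.15, 0.25],
    "p4": [0.85, 0.75],
}
attractive_links = {pid: closest_peer(pid, signatures) for pid in signatures}
print(attractive_links)
```

In this toy data p1 links to p3 and p2 links to p4, so the attractive links form two clusters of similar peers.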

12
Peer Clustering Based on Img. Sim. (cont.)
13
Firework Query Model
  • In this query routing model, a query message is
    routed selectively according to its content.
  • When it reaches its designated cluster, chosen
    by similarity, the query message is broadcast by
    peers through the attractive connections inside
    the cluster.
  • So, when a peer receives a query, it needs to
    carry out two steps:
  • Shared File Look Up - the peer compares the query
    feature vector with the feature vector of each
    image in its local collection. If any image
    matches, it replies to the query peer.
  • Route Selection - the peer calculates the
    similarity between the query and its signature
    value. If the similarity is above a threshold,
    it sends the query to the peers connected by
    attractive links (explosion); otherwise, it
    forwards the query over its ordinary P2P
    connections.
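
The route-selection rule can be sketched as a threshold test. The slides do not say how the similarity between a query and a signature is computed, so the formula below (1 / (1 + distance to the signature mean)) is purely an assumption for illustration, as are the names `similarity` and `select_route` and the threshold value.

```python
import math

def similarity(query_vec, signature_mean):
    """Assumed similarity: 1 / (1 + Euclidean distance to the signature mean)."""
    dist = math.sqrt(sum((q - m) ** 2 for q, m in zip(query_vec, signature_mean)))
    return 1.0 / (1.0 + dist)

def select_route(query_vec, signature_mean, threshold, attractive, ordinary):
    """Return (action, next_hops) for a received query.

    Above the threshold the query 'explodes' over the attractive links;
    otherwise it is forwarded over the ordinary P2P connections.
    """
    if similarity(query_vec, signature_mean) > threshold:
        return "explode", list(attractive)
    return "forward", list(ordinary)

mean = [1.0, 1.0]                        # made-up signature mean of this peer
near = select_route([1.0, 1.0], mean, 0.5, ["p3", "p7"], ["p2"])
far = select_route([4.0, 5.0], mean, 0.5, ["p3", "p7"], ["p2"])
print(near, far)
```

A query identical to the signature mean has similarity 1 and triggers the explosion, while a query at distance 5 has similarity 1/6 and is simply forwarded.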

14
Firework Query Model (cont.)
15
Firework Query Model (cont.)
  • To prevent infinite query looping, a
    replicated-message checking rule and a TTL are
    used.
  • When a query arrives at a peer, it is checked
    against a local cache for duplication. If a
    duplicate is found, the query is dropped.
  • Each time the query passes through a peer, the
    TTL is decreased by one. Once the TTL reaches
    zero, the query is dropped. However, if the
    query is passed along an attractive link, the
    TTL is left unchanged with a probability called
    Chance-To-Survive (CTS).
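
The two loop-prevention rules can be sketched as small helper functions. This is a simplified reading of the bullets above, not the authors' implementation: the names `next_ttl` and `should_handle`, the message id, and the use of Python's `random` module as the source of randomness are assumptions.

```python
import random

def next_ttl(ttl, via_attractive_link, cts, rng=random):
    """TTL carried by the query after one hop.

    On an attractive link the TTL is kept with probability cts
    (Chance-To-Survive); in every other case it is decreased by one.
    """
    if via_attractive_link and rng.random() < cts:
        return ttl
    return ttl - 1

def should_handle(msg_id, ttl, seen_cache):
    """Drop the query if it is a duplicate or its TTL has reached zero."""
    if msg_id in seen_cache or ttl <= 0:
        return False
    seen_cache.add(msg_id)               # remember the query for later checks
    return True

cache = set()
first = should_handle("q-42", 5, cache)   # new query: handled
second = should_handle("q-42", 5, cache)  # duplicate: dropped
print(first, second, next_ttl(5, False, 0.5))
```

Setting `cts` to 0 makes attractive links behave like ordinary links, while a value close to 1 lets a query travel far inside a cluster.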

16
Experiments and Results
  • Performance Metrics
  • Recall
  • Recall = RA / RT, where RA is the number of
    retrieved relevant images and RT is the total
    number of relevant images in the network. Higher
    is better.
  • Query scope
  • Visited = Vpeer / Tpeer, where Vpeer is the number
    of peers that received and handled the query and
    Tpeer is the total number of peers in the
    network. Lower is better.
  • Query efficiency
  • Efficiency = Recall / Visited
  • The experimental results are compared with the
    query flooding algorithm (Breadth-First Search,
    BFS).
  • Platform: Sun Blade 1000, 2 GB RAM, Solaris v.8,
    C. Simulating 20,000 peers with TTL 7 and 10
    iterations (queries) took 3 hours.
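
The three metrics translate directly into code. The function names are hypothetical and the numbers in the example are made up for illustration; they are not results from the paper.

```python
def recall(retrieved_relevant, total_relevant):
    """Recall = RA / RT."""
    return retrieved_relevant / total_relevant

def query_scope(visited_peers, total_peers):
    """Visited = Vpeer / Tpeer."""
    return visited_peers / total_peers

def efficiency(recall_value, scope_value):
    """Efficiency = Recall / Visited."""
    return recall_value / scope_value

# Illustrative numbers only: 40 of 100 relevant images found
# while visiting 2,000 of 10,000 peers.
r = recall(40, 100)
s = query_scope(2000, 10000)
e = efficiency(r, s)
print(r, s, e)
```

An efficiency above 1 means the query recovered a larger share of the relevant images than the share of peers it had to visit.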

17
Experiments and Results (cont.)
  • Data Set
  • Synthetic data - 100 sets, each with a random
    mean and variance. For each set, 100 points
    (images) are generated according to a Gaussian
    distribution.
  • Real data - 10,000 images from 100 categories in
    Corel Draw's Image Collection CD.
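
Synthetic data of this kind could be generated as sketched below. The slides give only the counts (100 sets of 100 points), so the rest is assumed: the dimension of 9 is borrowed from the experiment setup, the ranges for the random mean and spread are arbitrary, a standard deviation is sampled directly in place of a variance, and the name `make_synthetic` is hypothetical.

```python
import random

def make_synthetic(num_sets=100, points_per_set=100, dim=9, seed=0):
    """Generate num_sets Gaussian clusters of feature vectors.

    Each set gets its own random per-dimension mean and standard deviation
    (assumed ranges), then points_per_set vectors are drawn from it.
    """
    rng = random.Random(seed)            # seeded for repeatable runs
    data = []
    for _ in range(num_sets):
        mean = [rng.uniform(0.0, 1.0) for _ in range(dim)]
        std = [rng.uniform(0.01, 0.1) for _ in range(dim)]
        points = [[rng.gauss(mean[j], std[j]) for j in range(dim)]
                  for _ in range(points_per_set)]
        data.append(points)
    return data

data = make_synthetic()
print(len(data), len(data[0]), len(data[0][0]))
```

Each inner list plays the role of one image category, and its points stand in for the feature vectors of the images in that category.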

18
Experiments and Results (cont.)
  • Experiment 1: Performance as affected by the
    number of peers.
  • Experiment setup
  • Number of peers: 2,000 - 20,000
  • Network diameter: 9 - 11
  • Average distance: 5.36 - 6.58
  • Number of images assigned to each peer: 100
  • Feature vector dimensions: 9
  • TTL: FQM 5, BFS 7

19
Experiments and Results (cont.)
Figure: Recall vs. number of peers (y-axis: Recall, x-axis: Number of Peers)
20
Experiments and Results (cont.)
Figure: Query scope vs. number of peers (y-axis: Query Scope, x-axis: Number of Peers)
21
Experiments and Results (cont.)
Figure: Efficiency vs. number of peers (y-axis: Recall/Query Scope, x-axis: Number of Peers)
22
Experiments and Results (cont.)
  • Experiment 2: Performance as affected by TTL
  • Experiment setup
  • Number of peers: 10,000
  • Network diameter: 10
  • Average distance: 6.2
  • Number of images assigned to each peer: 100
  • Feature vector dimensions: 9
  • TTL: 4 - 9

23
Experiments and Results (cont.)
Figure: Recall vs. TTL (y-axis: Recall, x-axis: TTL value of query message)
24
Experiments and Results (cont.)
Figure: Query scope vs. TTL (y-axis: Query Scope, x-axis: TTL value of query message)
25
Experiments and Results (cont.)
Figure: Efficiency vs. TTL (y-axis: Recall/Query Scope, x-axis: TTL value of query message)
26
Conclusion and Comment
  • FQM outperforms BFS in all tests.
  • FQM can reduce the network traffic cost (query
    scope) while maintaining high query efficiency.
  • Comment
  • Random query routing, a kind of BFS.
  • Broadcasting on join, update, and explosion.
  • Costly for dynamic networks.
  • The complete feature vector is sent, so there is
    more traffic with high-dimensional feature
    vectors.
  • Still log(n)