1
Inference Network Approach to Image Retrieval
  • Don Metzler
  • R. Manmatha
  • Center for Intelligent Information Retrieval
  • University of Massachusetts, Amherst

2
Motivation
  • Most image retrieval systems assume:
  • Implicit AND between query terms
  • Equal weight for all query terms
  • Query made up of a single representation (keywords
    or an image)
  • "tiger grass" → find images of tigers AND grass,
    where each is equally important
  • How can we search with queries made up of both
    keywords and images?
  • How do we perform the following queries?
  • swimmers OR jets
  • tiger AND grass, with more emphasis on tigers
    than grass
  • find me images of birds that are similar to this
    image

3
Related Work
  • Inference networks
  • Semantic image retrieval
  • Kernel methods

4
Inference Networks
  • Inference Network Framework [Turtle and Croft '89]
  • Formal information retrieval framework
  • INQUERY search engine
  • Allows structured queries:
  • phrases, term weighting, synonyms, etc.
  • wsum( 2.0 phrase( image retrieval ) 1.0 model )
  • Handles multiple document representations (full
    text, abstracts, etc.)
  • MIRROR [de Vries '98]
  • General multimedia retrieval framework based on
    the inference network framework
  • Probabilities based on clustering of metadata
    feature vectors
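The weighted-sum operator in the structured query above can be illustrated with a short sketch. This is not INQUERY code; the `wsum` function and the example belief values (0.9 for the phrase node, 0.6 for the "model" term node) are invented here to show how such a node would combine child beliefs.

```python
# Hypothetical sketch of a weighted-sum (wsum) belief node, as in the
# structured query: wsum( 2.0 phrase( image retrieval ) 1.0 model ).
# Not the INQUERY implementation; names and belief values are invented.

def wsum(weighted_children):
    """Combine (weight, belief) pairs as a weighted average."""
    total = sum(w for w, _ in weighted_children)
    return sum(w * p for w, p in weighted_children) / total

# Suppose the phrase node has belief 0.9 and the "model" node 0.6:
belief = wsum([(2.0, 0.9), (1.0, 0.6)])
print(round(belief, 3))
```

With weights 2.0 and 1.0, the phrase node contributes twice as much as the single term, giving a combined belief of 0.8.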

5
Image Retrieval / Annotation
  • Co-occurrence model [Mori et al.]
  • Translation model [Duygulu et al.]
  • Correspondence LDA [Blei and Jordan]
  • Relevance model-based approaches
  • Cross-Media Relevance Models (CMRM) [Jeon et al.]
  • Continuous Relevance Models (CRM) [Lavrenko et al.]

6
Goals
  • Input:
  • Set of annotated training images
  • User's information need:
  • Terms
  • Images
  • Soft Boolean operators (AND, OR, NOT)
  • Weights
  • Set of test images with no annotations
  • Output:
  • Ranked list of test images relevant to the user's
    information need

7
Data
  • Corel data set
  • 4500 training images (annotated)
  • 500 test images
  • 374 word vocabulary
  • Each image automatically segmented using
    normalized cuts
  • Each image represented as set of representation
    vectors
  • 36 geometric, color, and texture features
  • Same features used in similar past work

Available at http://vision.cs.arizona.edu/kobus/research/data/eccv_2002/
8
Features
  • Geometric (6)
  • area
  • position (2)
  • boundary/area
  • convexity
  • moment of inertia
  • Color (18)
  • avg. RGB x 2 (6)
  • std. dev. of RGB (3)
  • avg. Lab x 2 (6)
  • std. dev. of Lab (3)
  • Texture (12)
  • mean oriented energy, 30 deg. increments (12)

9
Image representation
[Figure: a segmented example image annotated "cat, grass, tiger, water" — each image has one annotation vector (binary, the same for every segment) and one representation vector (real-valued, one per image segment)]
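The per-image data described on this slide can be sketched as follows. This is a hypothetical layout with invented values, not the authors' code: one binary annotation vector over the vocabulary, shared by all segments, plus one 36-dimensional real-valued vector per segment.

```python
import numpy as np

# Toy sketch of one training image's data (hypothetical values).
# The annotation vector is binary over the vocabulary and is the same
# for every segment; each segment has its own 36-d representation vector.
vocab = ["cat", "grass", "tiger", "water"]           # toy 4-word vocabulary
annotation = np.array([1, 1, 1, 1], dtype=np.uint8)  # binary, shared by segments
rng = np.random.default_rng(0)
segments = rng.random((10, 36))                      # ~10 segments x 36 features

print(annotation.shape, segments.shape)
```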
10
Image Inference Network
  • J: the image's representation vectors
    (continuous, observed)
  • q_w: word w appears in the annotation (binary,
    hidden)
  • q_r: representation vector r describes the image
    (binary, hidden)
  • q_op: query operator is satisfied (binary, hidden)
  • I: user's information need is satisfied (binary,
    hidden)

[Figure: the inference network — image network (fixed, based on the image): node J with nodes q_r1 … q_rk and q_w1 … q_wk; query network (dynamic, based on the query): operator nodes q_op1, q_op2 feeding the information-need node I]
11
Example Instantiation
[Figure: the network instantiated for a query such as or( and( tiger grass ) ): term nodes "tiger" and "grass" feed an AND node, which feeds an OR node]
12
What needs to be estimated?
  • P(q_w | J)
  • P(q_r | J)
  • P(q_op | J)
  • P(I | J)

[Figure: the inference network, indicating where each probability is estimated]
13
P(q_w | J), e.g. P( tiger | J )
  • Probability that term w appears in the
    annotation, given image J
  • Apply Bayes' rule and use non-parametric density
    estimation
  • Assumes representation vectors are conditionally
    independent given that term w annotates the image

[Equation not captured in transcript]
14
How can we compute P(r_i | q_w)?

[Figure: training-set representation vectors in feature space; vectors from images annotated by w mark an area of high likelihood, vectors far from them an area of low likelihood]
15
P(q_w | J): final form
  • Σ assumed to be diagonal, estimated from training
    data

[Equation not captured in transcript]
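A minimal sketch of the estimate described on slides 13-15, under stated assumptions: a Gaussian kernel density over the training vectors of images annotated with w, a diagonal covariance estimated from training data, conditional independence across an image's representation vectors, and the prior from Bayes' rule omitted. All names are illustrative; this is not the authors' implementation.

```python
import numpy as np

def kernel_density(r, train_vectors, diag_var):
    """Average of diagonal-covariance Gaussian kernels centered on each
    training vector. The shared normalization constant is dropped, which
    preserves the ranking of terms for a given image."""
    d2 = ((r - train_vectors) ** 2 / diag_var).sum(axis=1)  # Mahalanobis^2
    return float(np.exp(-0.5 * d2).mean())

def p_qw_given_J(image_vectors, train_vectors_for_w, diag_var):
    """Assume the image's representation vectors are conditionally
    independent given that w annotates it: multiply per-vector densities.
    (The P(w) prior from Bayes' rule is omitted in this sketch.)"""
    dens = [kernel_density(r, train_vectors_for_w, diag_var)
            for r in image_vectors]
    return float(np.prod(dens))
```

An image whose segments sit close to the training vectors for w scores far higher than one whose segments are distant, which is all the ranking needs.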
16
Regularized estimates
  • P(q_w | J) estimates are good, but not comparable
    across images
  • Is the 2nd image really 2× more cat-like?
  • Probabilities are relative per image

17
Regularized estimates
  • Impact transformations
  • Used in information retrieval
  • Rank is more important than value [Anh and
    Moffat]
  • Idea:
  • rank each term according to P(q_w | J)
  • give higher probabilities to higher-ranked terms
  • P(q_w | J) ∝ 1 / rank(q_w)
  • Zipfian assumption on relevant words:
  • a few words are very relevant
  • a medium number of words are somewhat relevant
  • many words are not relevant
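The rank-based regularization above can be sketched in a few lines. This is illustrative only: ties are broken arbitrarily here, and a real system might renormalize the transformed values over the vocabulary.

```python
# Sketch of an impact transformation: replace each term's raw
# P(q_w | J) by 1/rank, so scores become comparable across images
# (Zipfian emphasis on the top-ranked terms). Illustrative only.

def impact_transform(scores):
    """Map {term: raw probability} to {term: 1/rank}, rank 1 = best."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {w: 1.0 / (i + 1) for i, w in enumerate(order)}

transformed = impact_transform({"cat": 0.6, "grass": 0.3, "sky": 0.1})
print(transformed)
```

Whatever the raw values were, the top term always gets 1, the second 1/2, the third 1/3, and so on, so two images' term lists are directly comparable.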

18
Regularized estimates
19
What needs to be estimated?
  • P(q_w | J)
  • P(q_r | J)
  • P(q_op | J)
  • P(I | J)

[Figure: the inference network, indicating where each probability is estimated]
20
P(q_r | J)
  • Probability that representation vector r is
    observed, given J
  • Use non-parametric density estimation again
  • Impose a density over J's representation vectors,
    just as in the previous case
  • Estimates may be poor
  • Based on a small sample (about 10 representation
    vectors)
  • Naïve and simple, yet somewhat effective

21
Model Comparison
  • Relevance model-based approaches: CMRM, CRM
  • General form: [equation not captured in transcript]
  • Fully non-parametric
  • Model used here
  • General form: [equation not captured in transcript]
22
What needs to be estimated?
  • P(q_w | J)
  • P(q_r | J)
  • P(q_op | J)
  • P(I | J)

[Figure: the inference network, indicating where each probability is estimated]
23
Query Operators
  • Soft Boolean operators
  • and / wand (weighted and)
  • or
  • not
  • One node added to query network for each operator
    present in query
  • Many others possible
  • max, sum, wsum
  • syn, odn, uwn, phrase, etc

24
or( and( tiger grass ) )

[Figure: the instantiated network — term nodes "tiger" and "grass" feed the AND node, which feeds the OR node]
25
Operator Nodes
  • Combine probabilities from term and image nodes
  • Closed forms derived from the corresponding link
    matrices
  • Allows efficient inference within the network

Par(q): the set of q's parent nodes
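The standard closed forms for soft Boolean nodes in the Turtle-Croft framework can be sketched as below, taking the parent beliefs as inputs. The weight-normalized exponent in `b_wand` is one common convention and is an assumption here, not taken from the slides.

```python
import math

# Closed-form beliefs for soft Boolean operator nodes, given the
# beliefs of Par(q), the node's parents. Sketch only; b_wand's
# weight normalization is an assumed convention.

def b_and(ps):
    return math.prod(ps)                       # all parents true

def b_or(ps):
    return 1.0 - math.prod(1.0 - p for p in ps)  # not all parents false

def b_not(p):
    return 1.0 - p

def b_wand(weighted_ps):
    """Weighted AND: geometric mean with weight-normalized exponents."""
    total = sum(w for w, _ in weighted_ps)
    return math.prod(p ** (w / total) for w, p in weighted_ps)

# "tiger AND grass, with more emphasis on tigers than grass":
print(b_wand([(2.0, 0.9), (1.0, 0.4)]))
```

Because each operator has a closed form, a query network can be evaluated bottom-up in one pass, which is what makes inference fast at retrieval time.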
26
but where do they come from?

[Figure: link matrix for a node Q with parents A and B]
27
Results - Annotation
28
[Figure: three test images with their top five automatic annotations]
  • foals (0.46), mare (0.33), horses (0.20), field (1.9E-5), grass (4.9E-6)
  • railroad (0.67), train (0.27), smoke (0.04), locomotive (0.01), ruins (1.7E-5)
  • sphinx (0.99), polar (5.0E-3), stone (1.0E-3), bear (9.7E-4), sculpture (6.0E-4)
29
Results - Retrieval

[Figures not captured in transcript]
32
Future Work
  • Use rectangular segmentation and improved
    features
  • Different probability estimates
  • Better methods for estimating P(q_r | J)
  • Use CRM to estimate P(q_w | J)
  • Apply to documents with both text and images
  • Develop a method/testbed for evaluating more
    interesting queries

33
Conclusions
  • General, robust model based on inference network
    framework
  • Departure from implied AND between query terms
  • Unique non-parametric method for estimating
    network probabilities
  • Pros
  • Retrieval (inference) is fast
  • Makes no assumptions about distribution of data
  • Cons
  • Estimation of term probabilities is slow
  • Requires sufficient data to get a good estimate