1. Inference Network Approach to Image Retrieval
- Don Metzler
- R. Manmatha
- Center for Intelligent Information Retrieval
- University of Massachusetts, Amherst
2. Motivation
- Most image retrieval systems assume:
  - an implicit AND between query terms
  - equal weight for all query terms
  - a query made up of a single representation (keywords or an image)
- Example: the query "tiger grass" finds images of tigers AND grass, where each term is equally important
- How can we search with queries made up of both keywords and images?
- How do we perform the following queries?
  - swimmers OR jets
  - tiger AND grass, with more emphasis on tigers than grass
  - find images of birds that are similar to this image
3. Related Work
- Inference networks
- Semantic image retrieval
- Kernel methods
4. Inference Networks
- Inference Network Framework [Turtle and Croft '89]
  - Formal information retrieval framework
  - Basis of the INQUERY search engine
  - Allows structured queries
    - phrases, term weighting, synonyms, etc.
    - e.g. wsum( 2.0 phrase( image retrieval ) 1.0 model )
  - Handles multiple document representations (full text, abstracts, etc.)
- MIRROR [de Vries '98]
  - General multimedia retrieval framework based on the inference network framework
  - Probabilities based on clustering of metadata feature vectors
5. Image Retrieval / Annotation
- Co-occurrence model [Mori et al.]
- Translation model [Duygulu et al.]
- Correspondence LDA [Blei and Jordan]
- Relevance model-based approaches
  - Cross-Media Relevance Models (CMRM) [Jeon et al.]
  - Continuous Relevance Models (CRM) [Lavrenko et al.]
6. Goals
- Input
  - Set of annotated training images
  - User's information need, expressed as:
    - terms
    - images
    - soft Boolean operators (AND, OR, NOT)
    - weights
  - Set of test images with no annotations
- Output
  - Ranked list of test images relevant to the user's information need
7. Data
- Corel data set
  - 4,500 training images (annotated)
  - 500 test images
  - 374-word vocabulary
- Each image automatically segmented using normalized cuts
- Each image represented as a set of representation vectors
  - 36 geometric, color, and texture features
  - Same features used in similar past work
- Available at http://vision.cs.arizona.edu/kobus/research/data/eccv_2002/
8. Features
- Geometric (6)
  - area
  - position (2)
  - boundary/area ratio
  - convexity
  - moment of inertia
- Color (18)
  - avg. RGB x 2 (6)
  - std. dev. of RGB (3)
  - avg. Lab x 2 (6)
  - std. dev. of Lab (3)
- Texture (12)
  - mean oriented energy, 30 deg. increments (12)
9. Image Representation
- Example image annotated with: cat, grass, tiger, water
- Annotation vector (binary, same for every segment of the image)
- Representation vector (real-valued, one per image segment)
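The per-image layout described on this slide can be sketched as a small data structure. This is an illustrative sketch, not the authors' code; the toy four-word vocabulary stands in for the real 374-word one, and the class name is invented for the example.

```python
import numpy as np

VOCAB = ["cat", "grass", "tiger", "water"]  # toy stand-in for the 374-word vocabulary

class AnnotatedImage:
    """One training image: a binary annotation vector shared by all of its
    segments, plus one real-valued feature vector per segment."""
    def __init__(self, words, segment_features):
        # Binary annotation vector (same for each segment of the image).
        self.annotation = np.array([1 if w in words else 0 for w in VOCAB])
        # One representation vector per segment (real-valued, e.g. 36 features).
        self.segments = np.asarray(segment_features)

img = AnnotatedImage({"tiger", "grass"}, np.random.rand(8, 36))
print(img.annotation)      # binary vector over the vocabulary: [0 1 1 0]
print(img.segments.shape)  # (8, 36): 8 segments, 36 features each
```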
10. Image Inference Network
- J: representation vectors for the image (continuous, observed)
- q_w: word w appears in the annotation (binary, hidden)
- q_r: representation vector r describes the image (binary, hidden)
- q_op: query operator satisfied (binary, hidden)
- I: user's information need is satisfied (binary, hidden)
[Diagram: the image network (J feeding nodes q_r1..q_rk and q_w1..q_wk) is fixed, based on the image; the query network (operator nodes q_op1, q_op2 feeding I) is dynamic, based on the query.]
11. Example Instantiation
[Diagram: term nodes "tiger" and "grass" feed an "and" node, which feeds an "or" node.]
12. What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
13. P(q_w | J), e.g. P("tiger" | J)
- Probability that term w appears in the annotation, given image J
- Apply Bayes' rule and use non-parametric density estimation
- Assumes the representation vectors are conditionally independent given that term w annotates the image
- Combining the two assumptions: P(q_w | J) ∝ P(q_w) · Π_i P(r_i | q_w)
14. How can we compute P(r_i | q_w)?
[Figure: training-set representation vectors drawn from images annotated by w, with areas of high and low likelihood under the estimated density.]
15. P(q_w | J): final form
- S (the kernel covariance matrix) is assumed to be diagonal and is estimated from the training data
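A minimal sketch of this kind of estimate, assuming diagonal-covariance Gaussian kernels centered on the training vectors. This is illustrative only: the slide's exact final form is not reproduced in the transcript, and the function names, toy dimensions, and sample data here are invented for the example.

```python
import numpy as np

def kernel_density(r, train_vectors, sigma2):
    """Average of diagonal-covariance Gaussian kernels, one centered on each
    training representation vector. sigma2 holds the per-dimension variances
    (the diagonal of S, estimated from training data)."""
    diffs = train_vectors - r                                   # (n, d)
    norm = np.sqrt((2 * np.pi) ** len(sigma2) * np.prod(sigma2))
    exps = np.exp(-0.5 * np.sum(diffs ** 2 / sigma2, axis=1))   # (n,)
    return np.mean(exps) / norm

def log_p_image_given_word(image_segments, train_vectors, sigma2):
    # Conditional independence of the representation vectors given that
    # w annotates the image: log-probabilities of the segments add up.
    return sum(np.log(kernel_density(r, train_vectors, sigma2))
               for r in image_segments)

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 4))   # toy training vectors for images annotated by w
sigma2 = train.var(axis=0)         # diagonal S estimated from the training data
segs = rng.normal(size=(3, 4))     # a test image's representation vectors
print(log_p_image_given_word(segs, train, sigma2))
```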
16. Regularized Estimates
- The P(q_w | J) estimates are good, but not comparable across images
  - Is the 2nd image really 2x more cat-like?
  - Probabilities are relative per image
17. Regularized Estimates
- Impact transformations
  - Used in information retrieval
  - Rank is more important than value [Anh and Moffat]
- Idea
  - Rank each term according to P(q_w | J)
  - Give higher probabilities to higher-ranked terms
  - P(q_w | J) = 1 / rank(q_w)
- Zipfian assumption on relevant words
  - a few words are very relevant
  - a medium number of words are somewhat relevant
  - many words are not relevant
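The rank-based regularization above can be sketched in a few lines. A toy example, not the authors' code; the function name and sample probabilities are invented:

```python
def impact_transform(word_probs):
    """word_probs: dict mapping word -> P(q_w | J).
    Returns dict mapping word -> 1/rank, with rank 1 for the largest
    probability, so only the ordering of the raw estimates survives."""
    ranked = sorted(word_probs, key=word_probs.get, reverse=True)
    return {w: 1.0 / (i + 1) for i, w in enumerate(ranked)}

probs = {"cat": 0.10, "tiger": 0.40, "grass": 0.25, "water": 0.02}
print(impact_transform(probs))
# tiger gets 1/1, grass 1/2, cat 1/3, water 1/4
```

Because the transformed values depend only on rank, they are directly comparable across images, which addresses the "2x more cat-like" problem from the previous slide.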
18. Regularized Estimates
[Figure: example of regularized estimates.]
19. What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
20. P(q_r | J)
- Probability that representation vector r is observed, given J
- Use non-parametric density estimation again
  - Impose a density over J's representation vectors, just as in the previous case
- Estimates may be poor
  - Based on a small sample (about 10 representation vectors per image)
- Naïve and simple, yet somewhat effective
21. Model Comparison
- Relevance modeling-based (CMRM, CRM)
  - General form: [equation not preserved in transcript]
- Fully non-parametric model used here
  - General form: [equation not preserved in transcript]
22. What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
23. Query Operators
- Soft Boolean operators
  - and / wand (weighted and)
  - or
  - not
- One node added to the query network for each operator present in the query
- Many others possible
  - max, sum, wsum
  - syn, odn, uwn, phrase, etc.
24. Example: or( and( tiger grass ) )
[Diagram: term nodes "tiger" and "grass" feed the "and" node, which feeds the "or" node.]
25. Operator Nodes
- Combine probabilities from the term and image nodes
- Closed forms derived from the corresponding link matrices
  - Allows efficient inference within the network
- Par(q): set of q's parent nodes
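The slides do not spell the closed forms out; the ones below are the standard inference-network link-matrix results (as popularized by INQUERY), shown here as an assumed sketch in which each p is the probability that a parent node is satisfied.

```python
import math

def p_and(ps):
    # AND: all parent nodes must be satisfied.
    return math.prod(ps)

def p_or(ps):
    # OR: at least one parent node satisfied.
    return 1.0 - math.prod(1.0 - p for p in ps)

def p_not(p):
    # NOT: the parent node is not satisfied.
    return 1.0 - p

def p_wand(ps, ws):
    # Weighted AND: geometric weighting by normalized weights.
    total = sum(ws)
    return math.prod(p ** (w / total) for p, w in zip(ps, ws))

# Example instantiation from the slides: or( and( tiger grass ) ),
# with made-up term-node probabilities.
p_tiger, p_grass = 0.6, 0.3
print(p_or([p_and([p_tiger, p_grass])]))  # ≈ 0.18
```

With these forms, inference is a single bottom-up pass over the query network, which is why retrieval is fast once the term probabilities are estimated.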
26. ...but where do the closed forms come from?
[Diagram: a link matrix over parent nodes A and B determining node Q.]
27. Results: Annotation
28. Annotation Examples
- Image 1: foals (0.46), mare (0.33), horses (0.20), field (1.9E-5), grass (4.9E-6)
- Image 2: railroad (0.67), train (0.27), smoke (0.04), locomotive (0.01), ruins (1.7E-5)
- Image 3: sphinx (0.99), polar (5.0E-3), stone (1.0E-3), bear (9.7E-4), sculpture (6.0E-4)
29. Results: Retrieval
32. Future Work
- Use rectangular segmentation and improved features
- Different probability estimates
  - Better methods for estimating P(q_r | J)
  - Use the CRM to estimate P(q_w | J)
- Apply to documents containing both text and images
- Develop a method/testbed for evaluating more interesting queries
33. Conclusions
- General, robust model based on the inference network framework
- Departure from the implied AND between query terms
- Unique non-parametric method for estimating network probabilities
- Pros
  - Retrieval (inference) is fast
  - Makes no assumptions about the distribution of the data
- Cons
  - Estimation of term probabilities is slow
  - Requires sufficient data to get good estimates