Title: Inference Network Approach to Image Retrieval
1. Inference Network Approach to Image Retrieval
- Don Metzler
- R. Manmatha
- Center for Intelligent Information Retrieval
- University of Massachusetts, Amherst
 
2. Motivation
- Most image retrieval systems assume:
 - Implicit AND between query terms
 - Equal weight on all query terms
 - Query made up of a single representation (keywords or an image)
 - "tiger grass" = find images of tigers AND grass, where each is equally important
- How can we search with queries made up of both keywords and images?
- How do we perform the following queries?
 - swimmers OR jets
 - tiger AND grass, with more emphasis on tigers than grass
 - find me images of birds that are similar to this image
3. Related Work
- Inference networks
- Semantic image retrieval
- Kernel methods
 
4. Inference Networks
- Inference Network Framework [Turtle and Croft '89]
 - Formal information retrieval framework
 - INQUERY search engine
 - Allows structured queries
  - phrases, term weighting, synonyms, etc.
  - wsum( 2.0 phrase( image retrieval ) 1.0 model )
 - Handles multiple document representations (full text, abstracts, etc.)
- MIRROR [deVries '98]
 - General multimedia retrieval framework based on the inference network framework
 - Probabilities based on clustering of metadata / feature vectors
5. Image Retrieval / Annotation
- Co-occurrence model [Mori et al.]
- Translation model [Duygulu et al.]
- Correspondence LDA [Blei and Jordan]
- Relevance model-based approaches
 - Cross-Media Relevance Models (CMRM) [Jeon et al.]
 - Continuous Relevance Models (CRM) [Lavrenko et al.]
6. Goals
- Input
 - Set of annotated training images
 - User's information need
  - Terms
  - Images
  - Soft Boolean operators (AND, OR, NOT)
  - Weights
 - Set of test images with no annotations
- Output
 - Ranked list of test images relevant to the user's information need
7. Data
- Corel data set
 - 4500 training images (annotated)
 - 500 test images
 - 374-word vocabulary
- Each image automatically segmented using normalized cuts
- Each image represented as a set of representation vectors
 - 36 geometric, color, and texture features
 - Same features used in similar past work
- Available at http://vision.cs.arizona.edu/kobus/research/data/eccv_2002/
8. Features
- Geometric (6)
 - area
 - position (2)
 - boundary/area
 - convexity
 - moment of inertia
- Color (18)
 - avg. RGB x 2 (6)
 - std. dev. of RGB (3)
 - avg. Lab x 2 (6)
 - std. dev. of Lab (3)
- Texture (12)
 - mean oriented energy, 30 deg. increments (12)
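As an illustration of how a few of these features might be computed, here is a small sketch of the RGB statistics for one segment. The helper name and toy pixel values are hypothetical; the real descriptor has 36 dimensions and also includes Lab, geometric, and texture features.

```python
import numpy as np

def color_stats(pixels):
    # `pixels`: (N, 3) array of RGB values for one segment.
    # Returns 6 of the 36 features: per-channel mean and std. dev.
    mean_rgb = pixels.mean(axis=0)  # 3 "avg. RGB" values
    std_rgb = pixels.std(axis=0)    # 3 "std. dev. of RGB" values
    return np.concatenate([mean_rgb, std_rgb])

segment = np.array([[10., 20., 30.],
                    [14., 22., 28.],
                    [12., 18., 32.]])
feats = color_stats(segment)  # 6-dim slice of the full descriptor
```

The remaining features (Lab statistics, convexity, oriented energy, etc.) would be concatenated in the same way to form the full 36-dim vector.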
 
9. Image Representation
- Example annotation: cat, grass, tiger, water
- Annotation vector (binary, same for each segment)
- Representation vector (real-valued, 1 per image segment)
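A minimal sketch of this per-image layout, with a toy vocabulary and toy values (all names hypothetical):

```python
# One training image: a binary annotation vector shared by all segments,
# and one real-valued representation vector per segment.
vocab = ["cat", "grass", "tiger", "water", "sky"]  # toy vocabulary

annotation = {"cat", "grass", "tiger", "water"}
annotation_vec = [1 if w in annotation else 0 for w in vocab]  # binary

# 36-dim per segment in the actual system; 3-dim toy vectors here
representation_vecs = [
    [0.10, 0.72, 0.21],
    [0.43, 0.30, 0.88],
]
```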
10. Image Inference Network
- J = representation vectors for image (continuous, observed)
- qw = word w appears in annotation (binary, hidden)
- qr = representation vector r describes image (binary, hidden)
- qop = query operator satisfied (binary, hidden)
- I = user's information need is satisfied (binary, hidden)
- [Diagram: image network (J -> qr1..qrk -> qw1..qwk) is fixed, based on the image; query network (qop1, qop2 -> I) is dynamic, based on the query]
11. Example Instantiation
- [Network with term nodes "tiger" and "grass" feeding an "and" operator node, which feeds an "or" node]
12. What needs to be estimated?
- P(qw | J)
- P(qr | J)
- P(qop | J)
- P(I | J)
13. P(qw | J), e.g. P( "tiger" | image )
- Probability that term w appears in the annotation, given image J
- Apply Bayes' rule and use non-parametric density estimation
- Assumes the representation vectors are conditionally independent given that term w annotates the image:
 - P(qw | J) ∝ P(qw) · Π_i P(ri | qw)
14. How can we compute P(ri | qw)?
- [Figure: training-set representation vectors; regions near the vectors associated with images annotated by w have high likelihood, regions far from them have low likelihood]
15. P(qw | J): final form
- Σ assumed to be diagonal, estimated from training data
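The estimate built up on slides 13-15 — Bayes' rule plus a Gaussian kernel density with diagonal Σ over the training vectors — can be sketched as follows. Function names and toy data are hypothetical; the real system uses 36-dim vectors and a 374-word vocabulary.

```python
import numpy as np

def kernel_density(x, centers, var):
    # Non-parametric estimate of P(r | qw): average of Gaussians centered
    # on the training representation vectors `centers` (those drawn from
    # images annotated by w), with diagonal covariance `var`.
    d = x.shape[0]
    diff = centers - x
    expo = -0.5 * np.sum(diff ** 2 / var, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.prod(var))
    return np.mean(np.exp(expo)) / norm

def p_word_given_image(image_vecs, centers_by_word, prior_by_word, var):
    # Bayes' rule + conditional independence of the r_i given qw:
    #   P(qw | J) ∝ P(qw) * prod_i P(r_i | qw)
    scores = {}
    for w, centers in centers_by_word.items():
        s = prior_by_word[w]
        for r in image_vecs:
            s *= kernel_density(r, centers, var)
        scores[w] = s
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}  # normalized over vocab
```

An image whose segment vectors fall near the training vectors of "tiger" then receives a much larger P(tiger | J) than, say, P(grass | J).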
16. Regularized estimates
- P(qw | J) estimates are good, but not comparable across images
 - Is the 2nd image really 2x more cat-like?
 - Probabilities are relative per image
17. Regularized estimates
- Impact transformations
 - Used in information retrieval
 - Rank is more important than value [Anh and Moffat]
- Idea
 - Rank each term according to P(qw | J)
 - Give higher probabilities to higher-ranked terms
 - P(qw | J) = 1 / rank(qw)
 - Zipfian assumption on relevant words
  - a few words are very relevant
  - a medium number of words are somewhat relevant
  - many words are not relevant
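The 1/rank idea above can be sketched in a few lines (function name hypothetical):

```python
def impact_transform(p_word_given_j):
    # Replace each P(qw | J) by 1/rank(qw), so scores follow the same
    # Zipf-like shape for every image and become comparable across images.
    ranked = sorted(p_word_given_j, key=p_word_given_j.get, reverse=True)
    return {w: 1.0 / (i + 1) for i, w in enumerate(ranked)}

probs = {"cat": 0.004, "tiger": 0.010, "sky": 0.001}
regularized = impact_transform(probs)  # tiger -> 1, cat -> 1/2, sky -> 1/3
```

Only the ordering of the raw estimates survives; the absolute values, which are relative per image, are discarded.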
 
18. Regularized estimates
19. What needs to be estimated?
- P(qw | J)
- P(qr | J)
- P(qop | J)
- P(I | J)
20. P(qr | J), e.g. P( vector | image )
- Probability a representation vector is observed given J
- Use non-parametric density estimation again
 - Impose a density over J's representation vectors, just as in the previous case
- Estimates may be poor
 - Based on a small sample (~10 representation vectors)
- Naïve and simple, yet somewhat effective
21. Model Comparison
- Relevance model-based approaches
 - CMRM, CRM
 - General form
 - Fully non-parametric
- Model used here
 - General form
22. What needs to be estimated?
- P(qw | J)
- P(qr | J)
- P(qop | J)
- P(I | J)
23. Query Operators
- Soft Boolean operators
 - and / wand (weighted and)
 - or
 - not
- One node added to the query network for each operator present in the query
- Many others possible
 - max, sum, wsum
 - syn, odn, uwn, phrase, etc.
24. or( and( tiger grass ) )
- [Instantiated network: "tiger" and "grass" term nodes -> "and" node -> "or" node]
25. Operator Nodes
- Combine probabilities from term and image nodes
- Closed forms derived from the corresponding link matrices
- Allows efficient inference within the network
- Par(q) = set of q's parent nodes
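A sketch of the closed forms such link matrices yield over the parent beliefs p_i. The and/or/not forms are the standard inference-network ones; the exponent form for wand is an assumption about the weighted variant:

```python
import math

def op_and(ps):
    return math.prod(ps)  # all parents true

def op_or(ps):
    return 1.0 - math.prod(1.0 - p for p in ps)  # at least one parent true

def op_not(p):
    return 1.0 - p

def op_wand(ps, ws):
    # weighted and: normalized weights become exponents (assumed form)
    total = sum(ws)
    return math.prod(p ** (w / total) for p, w in zip(ps, ws))

# the "or( and( tiger grass ) )" instantiation from slide 24:
belief = op_or([op_and([0.8, 0.5])])
```

Because each operator is a closed-form function of Par(q), beliefs propagate through the network without summing over the full joint distribution, which is what makes inference fast.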
26. ...but where do they come from?
- [Diagram: parent nodes A and B linked to child node Q]
27. Results - Annotation
28. Results - Annotation (example images with top-ranked words)
- foals (0.46), mare (0.33), horses (0.20), field (1.9E-5), grass (4.9E-6)
- railroad (0.67), train (0.27), smoke (0.04), locomotive (0.01), ruins (1.7E-5)
- sphinx (0.99), polar (5.0E-3), stone (1.0E-3), bear (9.7E-4), sculpture (6.0E-4)
29. Results - Retrieval
32. Future Work
- Use rectangular segmentation and improved features
- Different probability estimates
 - Better methods for estimating P(qr | J)
 - Use CRM to estimate P(qw | J)
- Apply to documents with both text and images
- Develop a method/testbed for evaluating more interesting queries
33. Conclusions
- General, robust model based on the inference network framework
- Departure from the implied AND between query terms
- Unique non-parametric method for estimating network probabilities
- Pros
 - Retrieval (inference) is fast
 - Makes no assumptions about the distribution of the data
- Cons
 - Estimation of term probabilities is slow
 - Requires sufficient data to get a good estimate