Title: What's New in Content-Based Image Retrieval?
1. What's New in Content-Based Image Retrieval?
Xiang Sean Zhou
IFP, Beckman Institute for Advanced Science and Technology,
University of Illinois at Urbana-Champaign
2. Background: MARS
- MARS: Multimedia Analysis and Retrieval System
- MARS Content Analyzer
- MARS User Interface with relevance feedback
- Metadata
3. Outline
- Small-sample learning algorithm for relevance feedback
- Structural features for image content representation
- Unification of keywords and visual contents
4. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
5. Relevance Feedback Scenario
- The machine provides initial retrieval results, through query-by-keyword, sketch, or example, etc.
- Then, iteratively:
  - The user judges the current results as to whether, and to what degree, they are relevant to her/his request.
  - The machine learns and tries again.
6. The current optimal schemes deal only with positive examples. However, without a doubt, negative examples can help.
7. Question
- With both positive and negative feedback, how different is relevance-feedback learning from the age-old two-class classification problem?
8. 2-class SVM under small samples
(Figure: target cluster)
9. Observation
- The small-sample problem:
  - The small number of negative examples cannot be representative of the negative-class distribution, while for the positive class of interest the situation is usually better.
- "Positive examples are all alike in a way; each negative example is negative in its own fashion." -- Sean Tolstoy
10. Intuition
- "Happy families are all alike; every unhappy family is unhappy in its own way." -- Leo Tolstoy, Anna Karenina
11. BiasMap: Linear Form
Let $x_i,\ i = 1, \dots, N_P$ be the positive examples, $y_i,\ i = 1, \dots, N_N$ the negative examples, and $m_x$ the mean vector of the $x_i$. We want a linear transformation matrix $W$ such that
$$W^{*} = \arg\max_{W} \frac{\lvert W^T S_y W \rvert}{\lvert W^T S_x W \rvert},$$
where
$$S_y = \sum_{i=1}^{N_N} (y_i - m_x)(y_i - m_x)^T$$
and
$$S_x = \sum_{i=1}^{N_P} (x_i - m_x)(x_i - m_x)^T.$$
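For concreteness, here is a minimal NumPy sketch of this biased-discriminant transform. The function and variable names (biasmap, pos, neg, reg) are illustrative, not from the slides, and the regularization term is just one simple way to keep the denominator scatter well conditioned.

```python
# A minimal sketch of the linear BiasMap / biased discriminant transform.
import numpy as np
from scipy.linalg import eigh

def biasmap(pos, neg, n_components=2, reg=1e-3):
    """pos: (N_P, d) positive examples; neg: (N_N, d) negative examples."""
    d = pos.shape[1]
    m_x = pos.mean(axis=0)                       # mean of the positives
    # Scatter of negatives around the positive mean (numerator).
    S_y = (neg - m_x).T @ (neg - m_x)
    # Scatter of positives around their own mean (denominator).
    S_x = (pos - m_x).T @ (pos - m_x)
    # Regularize S_x to counter the small-sample bias (cf. next slide).
    S_x = S_x + reg * np.trace(S_x) / d * np.eye(d)
    # Generalized eigenproblem: directions maximizing |W^T S_y W| / |W^T S_x W|.
    vals, vecs = eigh(S_y, S_x)
    order = np.argsort(vals)[::-1]               # largest ratios first
    return vecs[:, order[:n_components]]         # columns of W
```

Projecting the database onto the leading columns of W and ranking by distance to the transformed positive mean then gives the biased ranking.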
12. The Small-Sample Issue: Statistical Bias
- Sample-based plug-in estimates are biased under small samples (cf. RDA, Friedman, 1989)
- Regularized BDA
- Discounting negative examples
13. Boosting BiasMap using RankBoost (Freund et al., 1998)
Given a positive set X1 and a negative set X0:
- Initialize the training-example weights.
- For t = 1, ..., T:
  - Train a weak BDT using weights W_t, using the weighted covariance matrices as the scatter-matrix estimates (Equation (8)); the outputs are in r_t(x).
  - Get a weak hypothesis h_t: x -> (0, 1) from r_t.
  - Update the weights.
  - Normalize the weights to sum to 1 separately among positives and negatives.
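A hedged sketch of this boosting loop, assuming the bipartite form of RankBoost with real-valued weak hypotheses in [0, 1]. Here weak_ranker is a hypothetical placeholder for the weighted weak-BDT trainer, and the alpha update is the standard choice for this setting rather than a transcription of the slide's equations.

```python
# Bipartite RankBoost loop with per-class weights, as sketched above.
import numpy as np

def rankboost(pos, neg, weak_ranker, T=20):
    w_pos = np.full(len(pos), 1.0 / len(pos))    # weights over positives
    w_neg = np.full(len(neg), 1.0 / len(neg))    # weights over negatives
    ensemble = []
    for t in range(T):
        # weak_ranker returns a callable h with scores in [0, 1].
        h = weak_ranker(pos, neg, w_pos, w_neg)
        # Weighted margin between positives and negatives.
        r = np.clip(np.dot(w_pos, h(pos)) - np.dot(w_neg, h(neg)), -0.999, 0.999)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        ensemble.append((alpha, h))
        # Down-weight positives ranked high and negatives ranked low,
        # then renormalize separately within each class (as on the slide).
        w_pos *= np.exp(-alpha * h(pos))
        w_neg *= np.exp(alpha * h(neg))
        w_pos /= w_pos.sum()
        w_neg /= w_neg.sum()
    return ensemble

def final_rank(ensemble, x):
    return sum(alpha * h(x) for alpha, h in ensemble)
```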
14. A Faster, Ad Hoc Variant: RankBoost.H
For t = 1, ..., T:
- Train a weak BDT using weights W_t; the outputs are in r_t(x). (Here the notation [[pi]] is 1 when the predicate pi holds, and 0 otherwise.)
- Update the weights.
- Normalize the weights to sum to 1 separately among positives and negatives.
15. Kernel Machine
The original linear algorithm is applied in a feature space $F$, which is related to the original space $C$ by a non-linear mapping
$$\Phi : C \to F, \qquad x \mapsto \Phi(x).$$
However, this mapping is never carried out explicitly; it enters only through evaluations of a kernel function
$$k(x_i, x_j) = \Phi(x_i)^T \Phi(x_j).$$
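As a small illustration of this substitution, here is a Gaussian (RBF) kernel: every dot product Phi(x_i)^T Phi(x_j) the algorithm needs is replaced by an evaluation k(x_i, x_j), so Phi is never formed explicitly. The RBF choice is only an example; any positive-definite kernel would do.

```python
# Gram matrix of an RBF kernel in place of explicit feature-space dot products.
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))
```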
16. BiasMap in Feature Space
The task is to rewrite the BiasMap criterion in the feature space entirely in dot-product form.
17. Solutions in Kernel Form
Because the solution w lies in the span of the mapped training examples, the numerator and the denominator of the BiasMap criterion, as well as the projection of a new pattern z onto w, can each be expressed through kernel evaluations alone.
18. The Kernel Matrices
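The kernel-form solution is assembled from Gram blocks over the training examples. A minimal sketch of computing them (reusing rbf_kernel above) might look like this; the block names K_pp, K_pn, K_nn are chosen here for illustration and are not taken from the slide.

```python
# Hypothetical helper: the Gram blocks among positives (P) and negatives (N)
# that a kernelized BiasMap works with in place of explicit scatter matrices.
def kernel_blocks(P, N, kernel, **kw):
    K_pp = kernel(P, P, **kw)   # positive vs. positive
    K_pn = kernel(P, N, **kw)   # positive vs. negative
    K_nn = kernel(N, N, **kw)   # negative vs. negative
    return K_pp, K_pn, K_nn
```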
19. Does Kernel Help? Image database testing
20. BiasMap vs. KDA and SVM on Face / Non-Face Classification
(Figure: examples of non-faces (1000))
21. Precision in Top 1000 (SVM: larger margin first)
(Figure: precision in the top 1000 returns; each point is an average of 100 trials. SVM returns the points with larger margins first.)
22. Boosting vs. Kernel: Face vs. Non-Face
- Comparable improvement over BDA.
- RankBoost.H clearly outperforms RankBoost in terms of rank difference; in terms of hit rate in the top 1000, they are very close.
23. Boosting vs. Kernel: Image Database
Averaged hit rate in the top 100 over 500 rounds of testing.
24. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
25. The Quest for Structure Features
- Texture
  - Repetitive patterns
  - Effective only for uniform-texture images or regions, or requires reliable segmentation
- Shape
  - Object contour
  - Requires good segmentation and is only effective for simple, clean images
26. Defining Structural Features
- Non-repetitive illuminance patterns in the image
- Low-level (or generic)
- Features in between texture and shape
- Image/object structure (e.g., edge length), structural complexity, loops, etc., which may not be readily expressible by texture or shape
27. Edges contain structural information
28. Gathering Information from Edge Maps
- Edge length?
- Connectivity?
- Complexity?
- Line-likeness?
- Loopy structure?
- Edge directions?
- Etc.
29. The Water-Filling Algorithm
- Given an edge map, treat the edges as canals (water channels).
- For each set of connected canals (edges), fill in water until all the water fronts stop.
- Extract features during the water-filling process (see the sketch below).
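A minimal Python sketch of this flooding process, assuming a binary edge map with True marking edge pixels and 4-connectivity. The primitive definitions below (filling time as maximum flooding depth, fork count as junction pixels, water amount as pixel count) follow the intuition above and may differ in detail from the published features.

```python
# Flood each connected set of edge pixels and record simple water-filling
# primitives per connected component.
from collections import deque
import numpy as np

def water_fill(edge_map):
    visited = np.zeros_like(edge_map, dtype=bool)
    features = []                                    # one record per canal set
    nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    H, W = edge_map.shape
    for sy, sx in zip(*np.nonzero(edge_map)):
        if visited[sy, sx]:
            continue
        q = deque([(sy, sx, 0)])
        visited[sy, sx] = True
        water, fill_time, forks = 0, 0, 0
        while q:                                     # breadth-first flooding
            y, x, t = q.popleft()
            water += 1                               # WaterAmount
            fill_time = max(fill_time, t)            # FillingTime
            branches = 0
            for dy, dx in nbrs:
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and edge_map[ny, nx]:
                    branches += 1
                    if not visited[ny, nx]:
                        visited[ny, nx] = True
                        q.append((ny, nx, t + 1))
            if branches >= 3:
                forks += 1                           # ForkCount (junction pixel)
        features.append({"FillingTime": fill_time,
                         "WaterAmount": water,
                         "ForkCount": forks})
    return features
```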
30. Water-Filling Edge Features: Feature Primitives
- FillingTime
- ForkCount
- LoopCount
- WaterAmount
- Horizontal (vertical) Cover
- IF we assume that when two water-heads collide we see one splash, and when n water-heads collide at the same time we see n-1 splashes (think of it as n-1 of them colliding with one sequentially),
- THEN the number of splashes equals the number of non-overlapping loops,
- with no overhead in computation.
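One way to see why the splash count comes for free, under the collision assumption stated above (an added gloss): view a connected edge set as a graph with $V$ flooded points and $E$ canal segments. Flooding builds a spanning tree using $V - 1$ segments; every remaining segment joins two fronts that are already wet, i.e., produces a splash, so

$$\#\text{splashes} = E - (V - 1) = E - V + 1,$$

which is exactly the number of independent (non-overlapping) loops in that edge set.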
31. City/Building vs. Landscape
32. Retrieval of Buildings
(Figure: scatter plot of hits in the top 10 and top 20 returns)
17 out of the 92 images are labeled as buildings by a human subject.
33. Images with Clear Structure (Corel dataset, 17,000 images)
(Figure: water-filling retrieving images with clear structure)
34. Comparison to Texture Features
Water-filling (WF) versus wavelet variances (WV); 100 airplanes and 100 eagles used as query images. Average number of hits:
            Top 10   Top 20   Top 40   Top 80
Airplanes
  WF        3.56     6.29     10.92    18.03
  WV        3.32     5.75     9.94     17.07
Eagles
  WF        2.65     3.33     4.91     6.79
  WV        1.98     2.82     4.43     6.58
Note that although the averaged numbers are comparable between the two features, the underlying matching mechanisms are very different.
35. Feature Analysis under Relevance Feedback
20 randomly selected horse images are used as initial queries.
(Figure: feature performance with relevance feedback; C = color, T = texture, S = water-filling)
For the horse example, it is also observed that the water-filling features can pull the system out of a converged state in which all top-ranked horses share a certain color, by adding horses of a different color but the same edge structure into the top 20 returns.
36. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
37. Histogram-Based Structure Modeling
(Figure: pipeline. From the image, all possible local k-tuples of interest points are formed; each tuple gives a local feature vector (x1, x2, x3), which ICA maps to independent components (s1, s2, s3). Instead of a single 3n-dimensional histogram, the image is described by a product of m 3-D histograms, m < n. IDW/DW = inverse-distance-weighted and distance-weighted histogramming, i.e., the histogram increments depend on the compactness of the tuple.)
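A hedged sketch of this pipeline under simplifying assumptions: local feature vectors built from the image's k-tuples are decorrelated with FastICA (scikit-learn), and the image is then summarized by several small joint histograms over groups of components instead of one huge joint histogram. The component count, grouping, and bin count are illustrative, and the distance-weighted increments are omitted for brevity.

```python
# Build a set of low-dimensional histograms over ICA components of
# local tuple features.
import numpy as np
from sklearn.decomposition import FastICA

def tuple_histograms(local_vectors, n_components=4, bins=8, group=2):
    """local_vectors: (num_tuples, dim) rows built from image k-tuples."""
    ica = FastICA(n_components=n_components, random_state=0)
    S = ica.fit_transform(local_vectors)            # independent components
    hists = []
    for start in range(0, n_components, group):     # m small histograms
        block = S[:, start:start + group]
        h, _ = np.histogramdd(block, bins=bins)
        hists.append(h / max(h.sum(), 1))           # normalized histogram
    return hists
```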
38. Harris Interest Points and Differential Invariant Jets
39. Differential Invariant Gaussian Jets (Koenderink and van Doorn, 1987)
- Based on a Taylor expansion at a point
- Stack the partial derivatives up to the 3rd order
- $L$ = image intensity, $L_1 = \partial L/\partial x$, $L_2 = \partial L/\partial y$
- Einstein notation is used, i.e., summation over repeated indices is implied:
  $J_3 = L_{ij} L_i L_j = L_{11} L_1 L_1 + 2 L_{12} L_1 L_2 + L_{22} L_2 L_2$
- ... and $J_7$ has 8 terms, each a product of one 3rd-order and three 1st-order derivatives, using $\varepsilon_{12} = -\varepsilon_{21} = 1$, $\varepsilon_{11} = \varepsilon_{22} = 0$
- The derivatives are computed by convolution with derivatives of a Gaussian (see the sketch below)
- Rotation and translation invariant; scale invariance needs some work (Schmid and Mohr, 1997)
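A minimal sketch of computing the low-order part of such a jet with SciPy's Gaussian derivative filters; the invariant $J_3 = L_{ij} L_i L_j$ is then formed directly from these responses. Only derivatives up to 2nd order are shown, and sigma is an assumed scale parameter.

```python
# Gaussian derivative jet at every pixel, plus the J3 invariant from the slide.
from scipy.ndimage import gaussian_filter

def gaussian_jet(image, sigma=2.0):
    """Return L, L1, L2, L11, L12, L22 (Gaussian derivatives of the image)."""
    L   = gaussian_filter(image, sigma)
    L1  = gaussian_filter(image, sigma, order=(0, 1))   # dL/dx
    L2  = gaussian_filter(image, sigma, order=(1, 0))   # dL/dy
    L11 = gaussian_filter(image, sigma, order=(0, 2))   # d2L/dx2
    L12 = gaussian_filter(image, sigma, order=(1, 1))   # d2L/dxdy
    L22 = gaussian_filter(image, sigma, order=(2, 0))   # d2L/dy2
    return L, L1, L2, L11, L12, L22

def J3(image, sigma=2.0):
    """Rotation-invariant combination L_ij L_i L_j."""
    _, L1, L2, L11, L12, L22 = gaussian_jet(image, sigma)
    return L11 * L1 * L1 + 2 * L12 * L1 * L2 + L22 * L2 * L2
```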
40. 2-tuple histograms, 4 ICs
41. Image Retrieval/Classification
- Test sets: COIL and COREL subsets.
- Compared against traditional texture and structure feature representations with a Euclidean metric.
- Preliminary results: comparable or better.

                 COIL                        COREL-1
  Traditional    96                          91
  PASM           97 (HI), 99 (L-0.9 norm)    96 (HI)

- COIL: Columbia Object Image Library.
- COREL-1: Corel subset of 7 classes and 10 images per class.
- Distance metrics: HI, K-L, Chi-squared, Lp.
42. (Figure: retrieval results with no ICA (9 jets) vs. ICA with 9, 3, and 7 components)
43. (Figure: COIL retrieval examples: object 20, rank 3; object 16 (car), rank 2; object 19, rank 1; object 13 (phone), rank 1; object 7 (piggy), rank 1; object 6 (Marlboro), rank 1)
44. Where is the leopard?
45. Where is the tiger?
46. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
47. Desired Working Scenarios
"My name is Socks!"
- A user calls his cat "Socks"? The system needs to learn that Socks is a cat, or a pet!
- Keywords + example(s) + relevance feedback.
48. Soft Vector Representation of Annotations

              Beach   Cherokee   Flower   Indian   Tourism
  Beach         1        0         0        0        0.7
  Cherokee               1         0        0.6      0.3
  Flower                           1        0        0.1
  Indian                                    1        0.5
  Tourism                                             1

Example: an image annotated "Cherokee, Indian" is represented by the soft vector (0, 1, 0, 1, 0.5).
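A minimal sketch of this soft representation, assuming (consistently with the example above) that an image annotated with a set of keywords receives, for each vocabulary word, the maximum similarity to any of its annotation words. The matrix values are copied from the table above, symmetrized.

```python
# Soft annotation vector: element-wise max over the similarity rows of the
# image's keywords.
import numpy as np

VOCAB = ["Beach", "Cherokee", "Flower", "Indian", "Tourism"]
S = np.array([[1.0, 0.0, 0.0, 0.0, 0.7],
              [0.0, 1.0, 0.0, 0.6, 0.3],
              [0.0, 0.0, 1.0, 0.0, 0.1],
              [0.0, 0.6, 0.0, 1.0, 0.5],
              [0.7, 0.3, 0.1, 0.5, 1.0]])    # symmetric similarity matrix

def soft_vector(keywords):
    rows = [S[VOCAB.index(k)] for k in keywords]
    return np.max(rows, axis=0)

print(soft_vector(["Cherokee", "Indian"]))   # -> [0.  1.  0.  1.  0.5]
```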
49. Pseudo-Classification in the Image Domain (WARF)
- Relevant terms: Cherokee, Indian
- Relevant term frequencies:
  - f_Cherokee = 3
  - f_Indian = 2
- Co-occurrence frequency:
  - c_{Cherokee, Indian} = 1
- So:
  - S_{Cherokee, Indian} <- S_{Cherokee, Indian} + 3 x (2 - 1)
  - S_{Cherokee, Indian} = 3
(Figure: the relevant images, annotated "Cherokee, Indian", "Shop, Cherokee", "Cherokee, Ceremony", "Ceremony, Beach", "Indian, Artifacts", "Shop, Artifacts")
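A hedged sketch of this similarity-accumulation step: the increment f_i * (f_j - c_ij) is one reading of the worked example above (3 x (2 - 1) = 3), and the function and variable names are illustrative rather than the paper's.

```python
# Accumulate word-pair similarity from one round of relevance feedback.
from collections import Counter
from itertools import combinations

def warf_update(S, relevant_terms, relevant_annotations):
    """S: dict frozenset({w_i, w_j}) -> accumulated score.
    relevant_annotations: list of keyword sets, one per relevant image."""
    freq = Counter(w for ann in relevant_annotations for w in ann)
    cooc = Counter(frozenset(p)
                   for ann in relevant_annotations
                   for p in combinations(sorted(ann), 2))
    for wi, wj in combinations(sorted(relevant_terms), 2):
        pair = frozenset((wi, wj))
        S[pair] = S.get(pair, 0.0) + freq[wi] * (freq[wj] - cooc[pair])
    return S

# The example from this slide: six relevant images and their keywords.
images = [{"Cherokee", "Indian"}, {"Shop", "Cherokee"}, {"Cherokee", "Ceremony"},
          {"Ceremony", "Beach"}, {"Indian", "Artifacts"}, {"Shop", "Artifacts"}]
S = warf_update({}, {"Cherokee", "Indian"}, images)
print(S[frozenset({"Cherokee", "Indian"})])   # 3 * (2 - 1) = 3
```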
50. Estimated Concept Similarity Matrix
First simulation: 30 words, 5000 images, up to 3 keywords per image. The simulated user searches for "car", "truck", and "motorcycle".
(Figure: the estimated matrix after (a) 5 rounds of training, (b) 20 rounds, (c) 80 rounds)
51. Scalability?
- 1000 words, 5000 images.
- (a) Up to 5 keywords per image, 30 rounds of training
- (b) Up to 5 keywords per image, 100 rounds
- (c) Up to 80 keywords per image, 30 rounds
52. Multiple Classes
Second simulation: the simulated user searches for three keyword classes.
53. Keyword Classification
- Keyword classification: words -> classes, to facilitate automatic query expansion and other inference tasks.
- Hopfield network activation: an iterative clustering approach based on mutual distances.
54. Conclusion
- The relevance feedback process can be used to learn relationships among words.
- Unifying keywords and visual contents in image retrieval gives the user more accuracy and more flexibility.