Title: What's New in Content-Based Image Retrieval?
1. What's New in Content-Based Image Retrieval?
Xiang Sean Zhou
IFP, Beckman Institute for Advanced Science and Technology,
University of Illinois at Urbana-Champaign
2. Background: MARS
- MARS: Multimedia Analysis and Retrieval System
- MARS Content Analyzer
- MARS User Interface with relevance feedback
- Metadata
3. Outline
- Small-sample learning algorithm for relevance feedback
- Structural features for image content representation
- Unification of keywords and visual contents
4. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
5. Relevance Feedback Scenario
- The machine provides initial retrieval results, through query-by-keyword, sketch, or example, etc.
- Then, iteratively:
  - The user judges the current results as to whether, and to what degree, they are relevant to her/his request.
  - The machine learns and tries again.
6. The current optimal schemes deal only with positive examples. However, without a doubt, negative examples can help.
7. Question
- With both positive and negative feedback, how different is relevance-feedback learning from the age-old two-class classification problem?
8. 2-class SVM under small samples
(Figure: target cluster)
9. Observation
- The small-sample problem:
  - The small number of negative examples cannot be representative of the negative-class distribution, while for the positive class of interest the situation is usually better.
- "Positive examples are all alike in a way; each negative example is negative in its own fashion." -- Sean Tolstoy
10. Intuition
- "Happy families are all alike; every unhappy family is unhappy in its own way." -- Leo Tolstoy, Anna Karenina
11. BiasMap: Linear Form
Let $x_i,\ i = 1, \dots, N_P$ be the positive examples, $y_i,\ i = 1, \dots, N_N$ the negative examples, and $m_x$ the mean vector of the $x_i$. We want a linear transformation matrix $W$ such that
$$W^{*} = \arg\max_{W} \frac{\lvert W^T S_y W \rvert}{\lvert W^T S_x W \rvert},$$
where
$$S_y = \sum_{i=1}^{N_N} (y_i - m_x)(y_i - m_x)^T$$
and
$$S_x = \sum_{i=1}^{N_P} (x_i - m_x)(x_i - m_x)^T.$$
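For concreteness, here is a minimal NumPy sketch of this biased-discriminant transform. The function and variable names (biasmap, pos, neg, reg) are illustrative, not from the slides, and the regularization term is just one simple way to keep the denominator scatter well conditioned.

```python
# A minimal sketch of the linear BiasMap / biased discriminant transform.
import numpy as np
from scipy.linalg import eigh

def biasmap(pos, neg, n_components=2, reg=1e-3):
    """pos: (N_P, d) positive examples; neg: (N_N, d) negative examples."""
    d = pos.shape[1]
    m_x = pos.mean(axis=0)                       # mean of the positives
    # Scatter of negatives around the positive mean (numerator).
    S_y = (neg - m_x).T @ (neg - m_x)
    # Scatter of positives around their own mean (denominator).
    S_x = (pos - m_x).T @ (pos - m_x)
    # Regularize S_x to counter the small-sample bias (cf. next slide).
    S_x = S_x + reg * np.trace(S_x) / d * np.eye(d)
    # Generalized eigenproblem: directions maximizing |W^T S_y W| / |W^T S_x W|.
    vals, vecs = eigh(S_y, S_x)
    order = np.argsort(vals)[::-1]               # largest ratios first
    return vecs[:, order[:n_components]]         # columns of W
```

Projecting the database onto the leading columns of W and ranking by distance to the transformed positive mean then gives the biased ranking.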
12. The Small-Sample Issue: Statistical Bias
- Sample-based plug-in estimates are biased under small samples (cf. RDA, Friedman, 1989)
- Regularized BDA
- Discounting negative examples
13. Boosting BiasMap using RankBoost (Freund et al., 1998)
Given a positive set X1 and a negative set X0:
- Initialize the training-example weights.
- For t = 1, ..., T:
  - Train a weak BDT using weights W_t, using the weighted covariance matrices as the scatter-matrix estimates (Equation (8)); the outputs are in r_t(x).
  - Get a weak hypothesis h_t: x -> (0, 1) from r_t.
  - Update the weights.
  - Normalize the weights to sum to 1 separately among positives and negatives.
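A hedged sketch of this boosting loop, assuming the bipartite form of RankBoost with real-valued weak hypotheses in [0, 1]. Here weak_ranker is a hypothetical placeholder for the weighted weak-BDT trainer, and the alpha update is the standard choice for this setting rather than a transcription of the slide's equations.

```python
# Bipartite RankBoost loop with per-class weights, as sketched above.
import numpy as np

def rankboost(pos, neg, weak_ranker, T=20):
    w_pos = np.full(len(pos), 1.0 / len(pos))    # weights over positives
    w_neg = np.full(len(neg), 1.0 / len(neg))    # weights over negatives
    ensemble = []
    for t in range(T):
        # weak_ranker returns a callable h with scores in [0, 1].
        h = weak_ranker(pos, neg, w_pos, w_neg)
        # Weighted margin between positives and negatives.
        r = np.clip(np.dot(w_pos, h(pos)) - np.dot(w_neg, h(neg)), -0.999, 0.999)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        ensemble.append((alpha, h))
        # Down-weight positives ranked high and negatives ranked low,
        # then renormalize separately within each class (as on the slide).
        w_pos *= np.exp(-alpha * h(pos))
        w_neg *= np.exp(alpha * h(neg))
        w_pos /= w_pos.sum()
        w_neg /= w_neg.sum()
    return ensemble

def final_rank(ensemble, x):
    return sum(alpha * h(x) for alpha, h in ensemble)
```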
14. A Faster, Ad Hoc Variant: RankBoost.H
For t = 1, ..., T:
- Train a weak BDT using weights W_t; the outputs are in r_t(x). (Here the notation [[pi]] is 1 when the predicate pi holds, and 0 otherwise.)
- Update the weights.
- Normalize the weights to sum to 1 separately among positives and negatives.
15. Kernel Machine
The original linear algorithm is applied in a feature space $F$, which is related to the original space $C$ by a non-linear mapping
$$\Phi : C \to F, \qquad x \mapsto \Phi(x).$$
However, this mapping is never carried out explicitly; it enters only through evaluations of a kernel function
$$k(x_i, x_j) = \Phi(x_i)^T \Phi(x_j).$$
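As a small illustration of this substitution, here is a Gaussian (RBF) kernel: every dot product Phi(x_i)^T Phi(x_j) the algorithm needs is replaced by an evaluation k(x_i, x_j), so Phi is never formed explicitly. The RBF choice is only an example; any positive-definite kernel would do.

```python
# Gram matrix of an RBF kernel in place of explicit feature-space dot products.
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))
```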
16. BiasMap in Feature Space
The task is to rewrite the BiasMap criterion in the feature space entirely in dot-product form.
17. Solutions in Kernel Form
Because the solution w lies in the span of the mapped training examples, the numerator and the denominator of the BiasMap criterion, as well as the projection of a new pattern z onto w, can each be expressed through kernel evaluations alone.
18. The Kernel Matrices
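The kernel-form solution is assembled from Gram blocks over the training examples. A minimal sketch of computing them (reusing rbf_kernel above) might look like this; the block names K_pp, K_pn, K_nn are chosen here for illustration and are not taken from the slide.

```python
# Hypothetical helper: the Gram blocks among positives (P) and negatives (N)
# that a kernelized BiasMap works with in place of explicit scatter matrices.
def kernel_blocks(P, N, kernel, **kw):
    K_pp = kernel(P, P, **kw)   # positive vs. positive
    K_pn = kernel(P, N, **kw)   # positive vs. negative
    K_nn = kernel(N, N, **kw)   # negative vs. negative
    return K_pp, K_pn, K_nn
```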
19. Does Kernel Help? Image database testing
20. BiasMap vs. KDA and SVM on Face / Non-Face Classification
(Figure: examples of non-faces (1000))
21. Precision in Top 1000 (SVM: larger margin first)
(Figure: precision in the top 1000 returns; each point is an average of 100 trials. SVM returns the points with larger margins first.)
22. Boosting vs. Kernel: Face vs. Non-Face
- Comparable improvement over BDA.
- RankBoost.H clearly outperforms RankBoost in terms of rank difference; in terms of hit rate in the top 1000, they are very close.
23. Boosting vs. Kernel: Image Database
Averaged hit rate in the top 100 over 500 rounds of testing.
24. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
25. The Quest for Structure Features
- Texture
  - Repetitive patterns
  - Effective only for uniform-texture images or regions, or requires reliable segmentation
- Shape
  - Object contour
  - Requires good segmentation and is only effective for simple, clean images
26. Defining Structural Features
- Non-repetitive illuminance patterns in the image
- Low-level (or generic)
- Features in between texture and shape
- Image/object structure (e.g., edge length), structural complexity, loops, etc., which may not be readily expressible by texture or shape
27. Edges contain structural information
28. Gathering Information from Edge Maps
- Edge length?
- Connectivity?
- Complexity?
- Line-likeness?
- Loopy structure?
- Edge directions?
- Etc.
29. The Water-Filling Algorithm
- Given an edge map, treat the edges as canals (water channels).
- For each set of connected canals (edges), fill in water until all the water fronts stop.
- Extract features during the water-filling process (see the sketch below).
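A minimal Python sketch of this flooding process, assuming a binary edge map with True marking edge pixels and 4-connectivity. The primitive definitions below (filling time as maximum flooding depth, fork count as junction pixels, water amount as pixel count) follow the intuition above and may differ in detail from the published features.

```python
# Flood each connected set of edge pixels and record simple water-filling
# primitives per connected component.
from collections import deque
import numpy as np

def water_fill(edge_map):
    visited = np.zeros_like(edge_map, dtype=bool)
    features = []                                    # one record per canal set
    nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    H, W = edge_map.shape
    for sy, sx in zip(*np.nonzero(edge_map)):
        if visited[sy, sx]:
            continue
        q = deque([(sy, sx, 0)])
        visited[sy, sx] = True
        water, fill_time, forks = 0, 0, 0
        while q:                                     # breadth-first flooding
            y, x, t = q.popleft()
            water += 1                               # WaterAmount
            fill_time = max(fill_time, t)            # FillingTime
            branches = 0
            for dy, dx in nbrs:
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and edge_map[ny, nx]:
                    branches += 1
                    if not visited[ny, nx]:
                        visited[ny, nx] = True
                        q.append((ny, nx, t + 1))
            if branches >= 3:
                forks += 1                           # ForkCount (junction pixel)
        features.append({"FillingTime": fill_time,
                         "WaterAmount": water,
                         "ForkCount": forks})
    return features
```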
30. Water-Filling Edge Features: Feature Primitives
- FillingTime
- ForkCount
- LoopCount
- WaterAmount
- Horizontal (vertical) Cover
- IF we assume that when two water-heads collide we see one splash, and when n water-heads collide at the same time we see n-1 splashes (think of it as n-1 of them colliding with one sequentially),
- THEN the number of splashes equals the number of non-overlapping loops,
- with no overhead in computation.
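One way to see why the splash count comes for free, under the collision assumption stated above (an added gloss): view a connected edge set as a graph with $V$ flooded points and $E$ canal segments. Flooding builds a spanning tree using $V - 1$ segments; every remaining segment joins two fronts that are already wet, i.e., produces a splash, so

$$\#\text{splashes} = E - (V - 1) = E - V + 1,$$

which is exactly the number of independent (non-overlapping) loops in that edge set.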
31. City/Building vs. Landscape
32. Retrieval of Buildings
(Figure: scatter plot of hits in the top 10 and top 20 returns)
17 out of the 92 images are labeled as buildings by a human subject.
33. Images with Clear Structure (Corel dataset, 17,000 images)
(Figure: water-filling retrieving images with clear structure)
34. Comparison to Texture Features
Water-filling (WF) versus wavelet variances (WV); 100 airplanes and 100 eagles used as query images. Average number of hits:
            Top 10   Top 20   Top 40   Top 80
Airplanes
  WF        3.56     6.29     10.92    18.03
  WV        3.32     5.75     9.94     17.07
Eagles
  WF        2.65     3.33     4.91     6.79
  WV        1.98     2.82     4.43     6.58
Note that although the averaged numbers are comparable between the two features, the underlying matching mechanisms are very different.
35. Feature Analysis under Relevance Feedback
20 randomly selected horse images are used as initial queries.
(Figure: feature performance with relevance feedback; C = color, T = texture, S = water-filling)
For the horse example, it is also observed that the water-filling features can pull the system out of a converged state in which all top-ranked horses share a certain color, by adding horses of a different color but the same edge structure into the top 20 returns.
36. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
37. Histogram-Based Structure Modeling
(Figure: pipeline. From the image, all possible local k-tuples of interest points are formed; each tuple gives a local feature vector (x1, x2, x3), which ICA maps to independent components (s1, s2, s3). Instead of a single 3n-dimensional histogram, the image is described by a product of m 3-D histograms, m < n. IDW/DW = inverse-distance-weighted and distance-weighted histogramming, i.e., the histogram increments depend on the compactness of the tuple.)
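A hedged sketch of this pipeline under simplifying assumptions: local feature vectors built from the image's k-tuples are decorrelated with FastICA (scikit-learn), and the image is then summarized by several small joint histograms over groups of components instead of one huge joint histogram. The component count, grouping, and bin count are illustrative, and the distance-weighted increments are omitted for brevity.

```python
# Build a set of low-dimensional histograms over ICA components of
# local tuple features.
import numpy as np
from sklearn.decomposition import FastICA

def tuple_histograms(local_vectors, n_components=4, bins=8, group=2):
    """local_vectors: (num_tuples, dim) rows built from image k-tuples."""
    ica = FastICA(n_components=n_components, random_state=0)
    S = ica.fit_transform(local_vectors)            # independent components
    hists = []
    for start in range(0, n_components, group):     # m small histograms
        block = S[:, start:start + group]
        h, _ = np.histogramdd(block, bins=bins)
        hists.append(h / max(h.sum(), 1))           # normalized histogram
    return hists
```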
38. Harris Interest Points and Differential Invariant Jets
39. Differential Invariant Gaussian Jets (Koenderink and van Doorn, 1987)
- Based on a Taylor expansion at a point
- Stack the partial derivatives up to the 3rd order
- $L$ = image intensity, $L_1 = \partial L/\partial x$, $L_2 = \partial L/\partial y$
- Einstein notation is used, i.e., summation over repeated indices is implied:
  $J_3 = L_{ij} L_i L_j = L_{11} L_1 L_1 + 2 L_{12} L_1 L_2 + L_{22} L_2 L_2$
- ... and $J_7$ has 8 terms, each a product of one 3rd-order and three 1st-order derivatives, using $\varepsilon_{12} = -\varepsilon_{21} = 1$, $\varepsilon_{11} = \varepsilon_{22} = 0$
- The derivatives are computed by convolution with derivatives of a Gaussian (see the sketch below)
- Rotation and translation invariant; scale invariance needs some work (Schmid and Mohr, 1997)
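A minimal sketch of computing the low-order part of such a jet with SciPy's Gaussian derivative filters; the invariant $J_3 = L_{ij} L_i L_j$ is then formed directly from these responses. Only derivatives up to 2nd order are shown, and sigma is an assumed scale parameter.

```python
# Gaussian derivative jet at every pixel, plus the J3 invariant from the slide.
from scipy.ndimage import gaussian_filter

def gaussian_jet(image, sigma=2.0):
    """Return L, L1, L2, L11, L12, L22 (Gaussian derivatives of the image)."""
    L   = gaussian_filter(image, sigma)
    L1  = gaussian_filter(image, sigma, order=(0, 1))   # dL/dx
    L2  = gaussian_filter(image, sigma, order=(1, 0))   # dL/dy
    L11 = gaussian_filter(image, sigma, order=(0, 2))   # d2L/dx2
    L12 = gaussian_filter(image, sigma, order=(1, 1))   # d2L/dxdy
    L22 = gaussian_filter(image, sigma, order=(2, 0))   # d2L/dy2
    return L, L1, L2, L11, L12, L22

def J3(image, sigma=2.0):
    """Rotation-invariant combination L_ij L_i L_j."""
    _, L1, L2, L11, L12, L22 = gaussian_jet(image, sigma)
    return L11 * L1 * L1 + 2 * L12 * L1 * L2 + L22 * L2 * L2
```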
40. 2-tuple histograms, 4 ICs
41. Image Retrieval/Classification
- Test sets: COIL and COREL subsets.
- Compared against traditional texture and structure feature representations with a Euclidean metric.
- Preliminary results: comparable or better.

                 COIL                        COREL-1
  Traditional    96                          91
  PASM           97 (HI), 99 (L-0.9 norm)    96 (HI)

- COIL: Columbia Object Image Library.
- COREL-1: Corel subset of 7 classes and 10 images per class.
- Distance metrics: HI, K-L, Chi-squared, Lp.
42. (Figure: retrieval results with no ICA (9 jets) vs. ICA with 9, 3, and 7 components)
43. (Figure: COIL retrieval examples: object 20, rank 3; object 16 (car), rank 2; object 19, rank 1; object 13 (phone), rank 1; object 7 (piggy), rank 1; object 6 (Marlboro), rank 1)
44. Where is the leopard?
45. Where is the tiger?
46. Outline
- Small-sample learning algorithm for relevance feedback
- Image content representation: structure
  - Global structure representation
  - Local structure representation
- Unification of keywords and visual contents
47. Desired Working Scenarios
"My name is Socks!"
- A user calls his cat "Socks"? The system needs to learn that Socks is a cat, or a pet!
- Keywords + example(s) + relevance feedback.
48. Soft Vector Representation of Annotations

              Beach   Cherokee   Flower   Indian   Tourism
  Beach         1        0         0        0        0.7
  Cherokee               1         0        0.6      0.3
  Flower                           1        0        0.1
  Indian                                    1        0.5
  Tourism                                             1

Example: an image annotated "Cherokee, Indian" is represented by the soft vector (0, 1, 0, 1, 0.5).
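A minimal sketch of this soft representation, assuming (consistently with the example above) that an image annotated with a set of keywords receives, for each vocabulary word, the maximum similarity to any of its annotation words. The matrix values are copied from the table above, symmetrized.

```python
# Soft annotation vector: element-wise max over the similarity rows of the
# image's keywords.
import numpy as np

VOCAB = ["Beach", "Cherokee", "Flower", "Indian", "Tourism"]
S = np.array([[1.0, 0.0, 0.0, 0.0, 0.7],
              [0.0, 1.0, 0.0, 0.6, 0.3],
              [0.0, 0.0, 1.0, 0.0, 0.1],
              [0.0, 0.6, 0.0, 1.0, 0.5],
              [0.7, 0.3, 0.1, 0.5, 1.0]])    # symmetric similarity matrix

def soft_vector(keywords):
    rows = [S[VOCAB.index(k)] for k in keywords]
    return np.max(rows, axis=0)

print(soft_vector(["Cherokee", "Indian"]))   # -> [0.  1.  0.  1.  0.5]
```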
49. Pseudo-Classification in the Image Domain (WARF)
- Relevant terms: Cherokee, Indian
- Relevant term frequencies:
  - f_Cherokee = 3
  - f_Indian = 2
- Co-occurrence frequency:
  - c_{Cherokee, Indian} = 1
- So:
  - S_{Cherokee, Indian} <- S_{Cherokee, Indian} + 3 x (2 - 1)
  - S_{Cherokee, Indian} = 3
(Figure: the relevant images, annotated "Cherokee, Indian", "Shop, Cherokee", "Cherokee, Ceremony", "Ceremony, Beach", "Indian, Artifacts", "Shop, Artifacts")
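A hedged sketch of this similarity-accumulation step: the increment f_i * (f_j - c_ij) is one reading of the worked example above (3 x (2 - 1) = 3), and the function and variable names are illustrative rather than the paper's.

```python
# Accumulate word-pair similarity from one round of relevance feedback.
from collections import Counter
from itertools import combinations

def warf_update(S, relevant_terms, relevant_annotations):
    """S: dict frozenset({w_i, w_j}) -> accumulated score.
    relevant_annotations: list of keyword sets, one per relevant image."""
    freq = Counter(w for ann in relevant_annotations for w in ann)
    cooc = Counter(frozenset(p)
                   for ann in relevant_annotations
                   for p in combinations(sorted(ann), 2))
    for wi, wj in combinations(sorted(relevant_terms), 2):
        pair = frozenset((wi, wj))
        S[pair] = S.get(pair, 0.0) + freq[wi] * (freq[wj] - cooc[pair])
    return S

# The example from this slide: six relevant images and their keywords.
images = [{"Cherokee", "Indian"}, {"Shop", "Cherokee"}, {"Cherokee", "Ceremony"},
          {"Ceremony", "Beach"}, {"Indian", "Artifacts"}, {"Shop", "Artifacts"}]
S = warf_update({}, {"Cherokee", "Indian"}, images)
print(S[frozenset({"Cherokee", "Indian"})])   # 3 * (2 - 1) = 3
```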
50. Estimated Concept Similarity Matrix
First simulation: 30 words, 5000 images, up to 3 keywords per image. The simulated user searches for "car", "truck", and "motorcycle".
(Figure: the estimated matrix after (a) 5 rounds of training, (b) 20 rounds, (c) 80 rounds)
51. Scalability?
- 1000 words, 5000 images.
- (a) Up to 5 keywords per image, 30 rounds of training
- (b) Up to 5 keywords per image, 100 rounds
- (c) Up to 80 keywords per image, 30 rounds
52. Multiple Classes
Second simulation: the simulated user searches for three keyword classes.
53. Keyword Classification
- Keyword classification: words -> classes, to facilitate automatic query expansion and other inference tasks.
- Hopfield network activation: an iterative clustering approach based on mutual distances.
54. Conclusion
- The relevance feedback process can be used to learn relationships among words.
- Unifying keywords and visual contents in image retrieval gives the user more accuracy and more flexibility.