Title: Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework
1. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework
Fei-Fei Li (publishes under L. Fei-Fei), Computer Science Dept. and Psychology Dept., Princeton University
2. Li-Jia Li, PhD candidate, Computer Science Dept., Stanford University
3. City Travel
Pagoda
Sunrise, Sunshine, Sun
4. Total Scene Understanding
Classification: City Travel
Segmentation: Pagoda
Annotation: Sunrise, Sunshine, Sun
5. Application
6. Classification, Annotation, Segmentation
Mutually beneficial!
7. Classification: class Polo
Annotation: Athlete, Horse, Grass, Trees, Sky, Saddle
Segmentation: Horse
8. Classification: class Polo
Annotation: Athlete, Horse, Grass, Trees, Sky, Saddle
Segmentation: Sky, Tree, Athlete, Horse, Grass
9. Classification: class Polo
Annotation: Athlete, Horse, Grass, Trees, Sky, Saddle
Segmentation: Horse
10. Classification, Annotation, Segmentation
Classification: class Polo
Related Work: Oliva et al. 01; Lazebnik et al. 06; Weber et al. 00; Fergus et al. 03; Fei-Fei et al. 03; Felzenszwalb et al. 04; Fei-Fei et al. 05; Sivic et al. 05; Bosch et al. 06
11. Classification, Annotation, Segmentation
Annotation: Athlete, Horse, Grass, Trees, Sky, Saddle
Related Work: Blei et al. 03; Duygulu et al. 02; Alipr (Li et al. 03); Gupta et al. 08; Barnard et al. 03
12. Classification, Annotation, Segmentation
Segmentation: Horse
Related Work: Cao & Fei-Fei 07; Russell et al. 06; Wang et al. 07; Todorovic et al. 06; Sali et al. 99; Winn et al. 05; Kumar et al. 05; Shi & Malik 00; Felzenszwalb & Huttenlocher 04
13. Annotation and Classification; Classification and Segmentation; Annotation and Segmentation
Segmentation: Sky, Tree, Athlete, Horse, Grass
Classification: class Polo
Related Work: Tu et al. 03; Li & Fei-Fei 07; Heitz et al. 08
14. Outline
Model
Learning
Recognition Experiment
Classification, Annotation, Segmentation
15. [Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D; tags: Athlete, Horse, Grass, Trees, Sky, Saddle]
16. class Polo
Visual Component and Text Component
Joint distribution of the random variables
[Figure: graphical model with Text and Visual parts; labels C, D; tags: Athlete, Horse, Grass, Trees, Sky, Saddle]
17. class Polo
Text Component
[Figure: graphical model with Text and Visual parts; labels C, O, D]
18. class Polo
Text Component
R: Color, Location, Texture, Shape
[Figure: graphical model with Text and Visual parts; labels C, O, R, NF, D]
19. class Polo
Text Component
[Figure: graphical model with Text and Visual parts; labels C, O, X, R, Ar, NF, D]
20. class Polo
Text Component
Connector variable
[Figure: graphical model with Text and Visual parts; labels C, O, X, R, Z, Ar, NF, Nr, Nt, D; tags: Athlete, Horse, Grass, Trees, Sky, Saddle]
21. class Polo
Switch variable: Visible / Not visible
Connector variable
[Figure: graphical model with Text and Visual parts; labels C, S, O, X, R, Z, Ar, NF, Nr, Nt, D; tags: Athlete, Horse, Grass, Trees, Sky, Saddle]
22. class Polo
Switch variable: Visible / Not visible
Connector variable
[Figure: graphical model with Text and Visual parts; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D; tags: Athlete, Horse, Grass, Trees, Sky, Saddle; highlighted tag: Horse]
23. Our Model vs. Spatial-LTM (Cao & Fei-Fei, 2007) and Corr-LDA (Blei & Jordan, 2003)
1. Visual
Spatial-LTM: single global region descriptor.
Corr-LDA: describes the image by blobs (regions).
Our Model: local descriptors and multiple global region descriptors.
2. Text
Spatial-LTM: does not model text.
Corr-LDA: words are generated from pixel-level visual information.
Our Model: visual vs. non-visual switch.
3. Object distribution
Corr-LDA and Spatial-LTM: image-based multinomial.
Our Model: class-dependent multinomial (top-down strength of the class).
[Figure: graphical model of Our Model; labels C, S, O, T, X, R, Z, NF, Ar, Nr, Nt, D]
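The three points above can be read as a generative story. The following is a minimal sketch of that reading, not the authors' implementation. It assumes that O is a per-region object label drawn from a class-dependent multinomial, R is a single quantized region feature, S is the switch (visible vs. not visible), and Z is the connector that links a visible tag to one region. All parameter names (`obj_given_class`, `feat_given_obj`, `tag_given_obj`, `tag_nonvisual`, `p_visual`) are hypothetical, and the patch level (X) is left out.

```python
import numpy as np

def sample_image(c, obj_given_class, feat_given_obj, tag_given_obj,
                 tag_nonvisual, p_visual, n_regions, n_tags, rng):
    """Ancestral sampling of one toy image of class c (illustrative only)."""
    n_objects = obj_given_class.shape[1]
    n_feats = feat_given_obj.shape[1]
    n_words = tag_given_obj.shape[1]

    # O: one object label per region, from the class-dependent multinomial.
    objects = rng.choice(n_objects, size=n_regions, p=obj_given_class[c])
    # R: one quantized feature per region, conditioned on its object.
    features = np.array([rng.choice(n_feats, p=feat_given_obj[o])
                         for o in objects])

    tags, switches, connectors = [], [], []
    for _ in range(n_tags):
        visible = rng.random() < p_visual      # S: visible or not visible
        if visible:
            z = int(rng.integers(n_regions))   # Z: connect the tag to a region
            t = rng.choice(n_words, p=tag_given_obj[objects[z]])
        else:
            z = -1                             # no region for a non-visual tag
            t = rng.choice(n_words, p=tag_nonvisual)
        tags.append(int(t))
        switches.append(bool(visible))
        connectors.append(z)
    return objects, features, np.array(tags), np.array(switches), np.array(connectors)
```

In this sketch a visible tag can only name an object that actually occupies some region, while a non-visual tag is drawn from a separate word distribution; that is one way to realize the "visual vs. non-visual switch" listed above.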
24. Outline
Model
Learning
Recognition Experiment
25. Learning
Exact inference is intractable!
Relationship of the random variables
26. Collapsed Gibbs Sampling (R. Neal, 2000)
Top-down force
Bottom-up force from visual information
Bottom-up force from text information
Relationship of the random variables
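As an illustration of how a top-down force and a bottom-up force combine in a collapsed Gibbs update, here is a minimal sketch for a deliberately simplified model: each region has one object label and one quantized visual word, with symmetric Dirichlet priors `alpha` and `beta`. These simplifications and all names are assumptions; this is not the update used by the authors, and the bottom-up force from text is omitted.

```python
import numpy as np

def gibbs_sweep(objects, words, classes, n_co, n_ow, alpha, beta, rng):
    """One collapsed Gibbs sweep over region-object assignments.

    objects[i]: current object label of region i (updated in place)
    words[i]:   quantized visual word of region i
    classes[i]: class label of the image that contains region i
    n_co[c, o]: regions in class-c images currently assigned to object o
    n_ow[o, w]: regions assigned to object o that show visual word w
    """
    n_words = n_ow.shape[1]
    for i in range(len(objects)):
        c, w, o_old = classes[i], words[i], objects[i]
        # Remove region i from the counts before resampling it.
        n_co[c, o_old] -= 1
        n_ow[o_old, w] -= 1
        # Top-down force: how often class c uses each object.
        top_down = n_co[c] + alpha
        # Bottom-up visual force: how well each object explains word w.
        bottom_up = (n_ow[:, w] + beta) / (n_ow.sum(axis=1) + n_words * beta)
        p = top_down * bottom_up
        o_new = rng.choice(len(p), p=p / p.sum())
        # Add region i back under its new label.
        objects[i] = o_new
        n_co[c, o_new] += 1
        n_ow[o_new, w] += 1
    return objects
```

The multinomial parameters never appear explicitly: they are integrated out, so only the count tables are stored, which is what "collapsed" refers to.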
27. There is no object-text correspondence
Scene/Event images from the Internet
28. Our model builds the correspondence
Scene/Event images from the Internet
[Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
29. However, a big obstacle is that many objects always co-occur
Scene/Event images from the Internet
Athlete, Horse, Grass, Ball
Athlete, Horse, Grass, Trees, Sky, Saddle
?
30. One solution: a good initialization of O
Scene/Event images from the Internet
Athlete, Horse, Grass, Trees, Sky, Saddle
[Figure: graphical model; labels C, S, O, T, X, R, Z, Nr, NF, Ar, Nt]
31. Initializing O: obtain Internet images for each O
Scene/Event images from the Internet
Horse
32. Initializing O: obtain Internet images for each O
Scene/Event images from the Internet
Object images
33. Initializing O: train an object detector for each O
Object images
Scene/Event images
Any object detection/segmentation algorithm
[Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
34. Initializing O: train an object detector for each O
Object images
Scene/Event images
Any object detection/segmentation algorithm
[Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
35. Initialize O in the scene image with the trained object detectors
Object images
Scene/Event images
Any object detection/segmentation algorithm
Black-box object detection/segmentation
[Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
36. Initialize O in the scene image with the trained object detectors
Object images
Scene/Event images
Black-box object detection/segmentation
Cao & Fei-Fei, 2007 [Figure: graphical model; labels C, O, R, X, Ar, Nr]
?
Our Model [Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
37. Auto-semi-supervised learning: a small number of initialized images and a large number of uninitialized images
Scene/Event images
Our Model [Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
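A minimal sketch of this initialization scheme, under stated assumptions: object labels for the small initialized subset come from a black-box detector (represented here by the hypothetical dictionary `detector_labels`), and every other image starts from uniformly random labels. The random start for uninitialized images is an assumption of this sketch, not something the slides state.

```python
import numpy as np

def initialize_objects(n_regions_per_image, detector_labels, n_objects, rng):
    """Return one array of initial object labels (O) per image.

    n_regions_per_image: number of regions in each image
    detector_labels:     {image index: labels from a black-box detector},
                         present only for the small initialized subset
    """
    init = []
    for i, n in enumerate(n_regions_per_image):
        if i in detector_labels:
            # Small subset: keep the detector's labels as the starting point.
            labels = np.asarray(detector_labels[i], dtype=int)
        else:
            # Large remainder: no initialization available, start at random.
            labels = rng.integers(n_objects, size=n)
        init.append(labels)
    return init
```

Both groups would then be resampled together during learning, so the initialized images act as a soft starting point rather than as fixed labels.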
38. Auto-semi-supervised learning: Cao & Fei-Fei vs. Our Model
Scene/Event images
Cao & Fei-Fei, 2007 [Figure: graphical model; labels C, O, R, X, Ar, Nr]
sky, sailboat, water
Missing: ?
Our Model [Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
39. Learning: challenges and solutions
Challenge: intractable coupling. Solution: collapsed Gibbs sampling.
Challenge: large amount of data. Solution: Internet images.
Challenge: co-occurring objects and words. Solution: automatically initialize O.
40. Outline
Model
Learning
- Small number of automatically initialized images
- Large number of uninitialized images
Recognition Experiment
- Dataset
- Learned Model
- Results
41. 8 Event/Scene Classes
Badminton
Bocce
Croquet
Polo
Remark: tags are not used during testing.
42. 8 Event/Scene Classes
Rock climbing
Rowing
Sailing
Snowboarding
43. Learned model: O
[Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
44. Learned model: O
45. Learned model: O
46. Learned model: O
47. Learned model: O
48. Learned model: R
Athlete, Grass, Horse
[Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
49. Learned model: S
[Figure: graphical model; labels C, S, O, T, X, R, Z, Ar, NF, Nr, Nt, D]
50. Classification: class Polo
Annotation: Athlete, Horse, Grass, Trees, Sky, Saddle
Segmentation: Sky, Tree, Athlete, Horse, Grass
51. Classification, Annotation, Segmentation
8-way classification: 54%
52. Classification, Annotation, Segmentation
Influence of unlabeled images in learning
Effect of noise in tags
53. Effect of multiple features
Classification, Annotation, Segmentation
54. Classification, Annotation, Segmentation
Alipr (Li et al. 03)
Corr-LDA (Blei et al. 03)
55. Classification, Annotation, Segmentation
56. Effect of top-down class context
Model without top-down class vs. Full Model
Horse
57. Effect of top-down class context
Model without top-down class vs. Full Model
Horse
58. Effect of top-down class context
Model without top-down class vs. Full Model
59. Effect of top-down class context
60. Effect of top-down class context
61. Effect of top-down class context
62. Effect of top-down class context
63. Model
Learning
- Small number of automatically initialized images
- Large number of uninitialized images
Recognition Experiment
64. Future Work
Top-down context
Object vs. object
65. Thanks
Prof. Silvio Savarese, Juan Carlos Niebles, Chong Wang, Barry Chai, Min Sun, Bangpeng Yao, Hao Su, Jia Deng, the anonymous reviewers, and you