1. Seeing the World Behind the Image
Spatial Layout for 3D Scene Understanding
- Derek Hoiem
- July 10, 2007
- Robotics Institute
- Carnegie Mellon University
Thesis Committee: Alexei A. Efros, Martial Hebert, Rahul Sukthankar, Takeo Kanade, William Freeman
2. Scene Understanding
3. The World Behind the Image
4. 3D Spatial Layout
SKY
VERTICAL
VERTICAL
SUPPORT
- Description of 3D Surfaces
- Occlusion Relationships
- Camera Viewpoint and Objects
5. 3D Spatial Layout
- Description of 3D Surfaces
- Occlusion Relationships
- Camera Viewpoint and Objects
6. 3D Spatial Layout
Car
Person
Car
- Description of 3D Surfaces
- Occlusion Relationships
- Camera Viewpoint and Objects
7. Recent Work in 3D
Oliva & Torralba 2001
Saxena, Chung & Ng 2005
Torralba, Murphy & Freeman 2003
8. Our Main Challenge
- Recovering 3D geometry from a single 2D projection
- An infinite number of possible solutions!
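A small numerical illustration of why the single-image problem is ill-posed, assuming an ideal pinhole camera with an arbitrary focal length (the function names and numbers are illustrative, not from the talk): scaling every 3D point by the same factor k yields exactly the same image coordinates, so infinitely many scenes explain one image.

```python
def project(point, focal=1.0):
    """Ideal pinhole projection of a camera-frame point (X, Y, Z) with Z > 0."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def scale(point, k):
    """Scale a 3D point about the camera center by a factor k > 0."""
    return tuple(k * c for c in point)


point = (1.0, 2.0, 4.0)
for k in (0.5, 1.0, 3.0):
    # Every scaled copy of the scene lands on the same image point.
    print(k, project(scale(point, k)))
```

Each iteration prints the same image point, (0.25, 0.5); only prior knowledge about how the world is structured can pick one scene out of this family.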
9. Our World is Structured
Abstract World
Image credit (left): F. Cunin and M.J. Sailor, UCSD
10. Early Work in 3D Scene Understanding
Guzman 1968
Ohta & Kanade 1978
- Hansen & Riseman 1978 (VISIONS)
- Barrow & Tenenbaum 1978 (Intrinsic Images)
- Brooks 1979 (ACRONYM)
- Marr 1982 (2½D Sketch)
11. Learn the Structure of the World
12. Infer Most Likely Scene
Unlikely
Likely
13. Description of 3D Surfaces
- Goal: label the image into 7 geometric classes
- Support
- Vertical
- Planar: facing left (←), center (↑), right (→)
- Non-planar: solid (X), porous or wiry (O)
- Sky
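The seven classes form a two-level hierarchy: three main classes, with vertical split into five subclasses. A minimal sketch of that hierarchy and of the two accuracy numbers reported for the surface estimates (main class and subclass), assuming per-pixel string labels. The label names are hypothetical, and scoring subclass accuracy only on pixels that are truly vertical is an assumption about how the subclass figure is computed.

```python
# Hypothetical label names: 5 vertical subclasses plus support and sky = 7 classes.
SUBCLASS_TO_MAIN = {
    "support": "support",
    "sky": "sky",
    "left": "vertical",    # planar, facing left
    "center": "vertical",  # planar, facing the camera
    "right": "vertical",   # planar, facing right
    "solid": "vertical",   # non-planar solid (X)
    "porous": "vertical",  # non-planar porous or wiry (O)
}


def accuracies(pred, truth):
    """Return (main-class accuracy, subclass accuracy) for aligned pixel labels.

    Subclass accuracy is computed only over pixels whose true main class is
    vertical (an assumption made for this sketch).
    """
    if not truth or len(pred) != len(truth):
        raise ValueError("need two non-empty label lists of equal length")
    main_hits = sum(SUBCLASS_TO_MAIN[p] == SUBCLASS_TO_MAIN[t] for p, t in zip(pred, truth))
    vertical = [(p, t) for p, t in zip(pred, truth) if SUBCLASS_TO_MAIN[t] == "vertical"]
    sub_acc = sum(p == t for p, t in vertical) / len(vertical) if vertical else float("nan")
    return main_hits / len(truth), sub_acc


print(accuracies(["support", "sky", "left", "solid"], ["support", "sky", "center", "solid"]))
# (1.0, 0.5)
```

Predicting "left" on a "center" pixel counts as correct for the main class (both are vertical) but wrong for the subclass, which is why subclass accuracy is always the harder number.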
14. Use All Available Cues
Color, texture, image location
Vanishing points, lines
Texture gradient
15. Get Good Spatial Support
50x50 patch (two example patches)
16. Image Segmentation
- A single segmentation won't work
- Solution: multiple segmentations
17. Labeling Segments
For each segment:
- Get P(good segment | data)
- Get P(label | good segment, data)
18. Image Labeling
Labeled Segmentations
Labeled Pixels
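The step from labeled segmentations to labeled pixels can be sketched as follows: each pixel accumulates, over every segmentation, the label likelihood of the segment that contains it, weighted by how likely that segment is to be good, P(good segment | data) · P(label | good segment, data), and the totals are then normalized per pixel. The data layout (dict keys `pixels`, `p_good`, `p_label`) is a hypothetical choice for illustration, and this is a simplified reading of the slides rather than the thesis implementation.

```python
from collections import defaultdict


def pixel_confidences(segmentations, labels):
    """Combine several labeled segmentations into normalized per-pixel confidences.

    segmentations: a list of segmentations; each segmentation is a list of
        segments, and each segment is a dict with hypothetical keys
        "pixels" (pixel ids), "p_good" = P(good segment | data), and
        "p_label" = {label: P(label | good segment, data)}.
    Returns {pixel: {label: confidence}}, with confidences summing to 1.
    """
    scores = defaultdict(lambda: dict.fromkeys(labels, 0.0))
    for segmentation in segmentations:
        for seg in segmentation:
            for px in seg["pixels"]:
                for lab in labels:
                    # Segments likely to be "good" (covering one surface) count more.
                    scores[px][lab] += seg["p_good"] * seg["p_label"].get(lab, 0.0)
    result = {}
    for px, s in scores.items():
        total = sum(s.values())
        result[px] = {lab: (v / total if total > 0 else 1.0 / len(labels)) for lab, v in s.items()}
    return result


labels = ("support", "vertical", "sky")
segmentations = [
    [  # coarse segmentation: one segment covering pixels 0 and 1
        {"pixels": [0, 1], "p_good": 0.2, "p_label": {"support": 0.9, "vertical": 0.1}},
    ],
    [  # fine segmentation: one segment per pixel
        {"pixels": [0], "p_good": 0.8, "p_label": {"vertical": 1.0}},
        {"pixels": [1], "p_good": 0.5, "p_label": {"support": 1.0}},
    ],
]
conf = pixel_confidences(segmentations, labels)
print(max(conf[0], key=conf[0].get), max(conf[1], key=conf[1].get))  # vertical support
```

The coarse segment mixes two surfaces, so its low P(good segment | data) lets the finer, more reliable segments dominate each pixel's label.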
19. Confidences from Logistic AdaBoost with Decision Trees
[Figure: example boosted decision trees. Node tests include Gray?, High in Image?, Many Long Lines?, Smooth?, Green?, Blue?, and Very High Vanishing Point?; the trees output P(label | good segment, data) for Ground, Vertical, and Sky.]
Collins et al. 2002
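The confidences on this slide come from boosted decision trees trained with the logistic version of AdaBoost (Collins et al. 2002). The sketch below is a simplified, hypothetical illustration of that idea, not the thesis code: it uses depth-1 trees (stumps) rather than the deeper trees shown, reweights examples with the logistic weights 1 / (1 + exp(y·F(x))), takes the step size ½·ln((1 − err) / err), and reads out a probability with the logistic link σ(F). Numeric feature vectors and labels in {−1, +1} are assumed.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def stump_predict(stump, x):
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign


def fit_stump(X, y, w):
    """Return the (feature, threshold, sign) stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict((f, t, s), xi) != yi)
                if err < best_err:
                    best, best_err = (f, t, s), err
    return best, best_err


def logistic_adaboost(X, y, rounds=10):
    """Fit F(x) = sum_k alpha_k * h_k(x) with logistic example weights."""
    model, F = [], [0.0] * len(X)
    for _ in range(rounds):
        # Logistic weights: before normalization each is at most 1, unlike
        # AdaBoost's exponential weights, so outliers cannot dominate.
        w = [sigmoid(-yi * Fi) for yi, Fi in zip(y, F)]
        total = sum(w)
        w = [wi / total for wi in w]
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on separable data
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        F = [Fi + alpha * stump_predict(stump, xi) for Fi, xi in zip(F, X)]
    return model


def confidence(model, x):
    """P(y = +1 | x) from the logistic link on the ensemble score."""
    return sigmoid(sum(alpha * stump_predict(stump, x) for alpha, stump in model))


X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.1], [0.9, 0.2]]  # hypothetical 2-D features
y = [1, 1, -1, -1]
model = logistic_adaboost(X, y, rounds=5)
print(confidence(model, [0.15, 0.85]) > 0.5)  # True
```

A calibrated probability, rather than a hard vote, is what lets later stages weigh surface evidence against other cues.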
20. Surface Confidence Maps
Input Image
Most Likely Labels
Vertical
Sky
Support
21. Surface Estimates: Outdoor
Average accuracy: main class 88%, subclass 62%
Input Image
Ground Truth
Our Result
22. Surface Estimates: Indoor
Average accuracy: main class 93%, subclass 76%
Input Image
Ground Truth
Our Result
23. Automatic Photo Popup
Labeled Image
Fit Ground-Vertical Boundary with Line Segments
Form Segments into Polylines
Cut and Fold
Final Pop-up Model
Hoiem, Efros & Hebert 2005
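Once the ground-vertical boundary is known, the "cut and fold" step amounts to simple ground-plane geometry. A minimal sketch, assuming an ideal camera with a horizontal optical axis, focal length f in pixels, horizon row v0, camera height y_c, and image rows that grow downward (all illustrative assumptions, not necessarily the parameters used in Photo Popup): a ground pixel at row v lies at depth Z = f·y_c / (v − v0), and a vertical region is folded up so that every pixel in a column shares the depth of that column's ground-contact point.

```python
def ground_depth(v, v0, f, cam_height):
    """Depth of the ground-plane point imaged at row v (rows grow downward)."""
    if v <= v0:
        raise ValueError("ground pixels must lie below the horizon row v0")
    return f * cam_height / (v - v0)


def fold_up(rows, v_contact, v0, f, cam_height):
    """(depth, height above ground) for pixels in one column of a vertical region.

    The region stands on the ground at row v_contact, so all of its pixels
    share the contact depth; height follows from the same pinhole geometry.
    """
    z = ground_depth(v_contact, v0, f, cam_height)
    return [(z, cam_height - z * (v - v0) / f) for v in rows]


# Hypothetical camera: horizon at row 100, f = 500 px, camera 1.5 m above the ground.
print(fold_up([200, 100, 0], v_contact=200, v0=100, f=500, cam_height=1.5))
# [(7.5, 0.0), (7.5, 1.5), (7.5, 3.0)]
```

The contact pixel sits on the ground, and the pixel on the horizon row sits exactly at camera height, as expected for a camera looking parallel to the ground.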
24. Robot Navigation
Nabbe, Hoiem, Hebert & Efros 2006
25. Robot Navigation
26. Image
Ground Truth
27. Occlusion Reasoning is Necessary
Ground Truth
3D Model
28. Recover Major Occlusions
29. Prior Work: Finding Boundaries
NCuts Segmentation
Input Image
Pb Boundaries
NCuts: Cour et al. 2004
Pb: Martin et al. 2002
30. Segmentation into Physical Boundaries
31. Prior Work: Figure/Ground Assignment
- Line labeling approach
- Focus on junctions
Guzman 1968
Also: Clowes 1971, Huffman 1971, Waltz 1975, …, Saund 2006
32. Prior Work: Figure/Ground Assignment
Input Image
Figure/Ground Goal
Pb Boundaries
Human Boundaries
Figure/Ground Accuracy
Shapemes CRF
Pb Boundaries 68.9
Human Boundaries 78.3
Boundary Shape Cues
Continuity/Junction Cues
Ren et al. 2006
33. Recover Major Occlusions
Occlusion Boundaries
Inferred Depth
34. Start with Oversegmentation
Occlusion boundary?
Initial Segmentation
35. 2D Cues for Occlusions
Region: color and texture
Boundaries: strength and continuity
36. 2D Junctions
Panels: image; 2D boundary T-junction (regions numbered 1, 2, 3)
37. 3D Surface Clues for Occlusions
Surface legend: Support, Planar, Porous, Sky, Solid
Panels: surface labels; geometric T-junction (regions numbered 1, 2, 3)
38. 3D Depth Cues for Occlusion
Surfaces
Initial Boundaries
Depth Underestimate
Depth Overestimate
39. Illustration of Depth Range
SKY
SUPPORT
Depth (Min)
Image
Depth (Max)
40. Gradual Occlusion Inference
Initial Segmentation
Final Boundaries
Initial Depth (Min)
Initial Depth (Max)
41. Gradual Occlusion Inference
P(occlusion)
Soft Boundary Map
Stage 1 Result
42. Gradual Occlusion Inference
P(occlusion)
Soft Boundary Map
Stage 1 Result
43. Gradual Occlusion Inference
P(occlusion) CRF(continuity, closure)
Soft-Max Boundary Map
Stage 2 Result
44. Gradual Occlusion Inference
P(occlusion) CRF(continuity, closure, surfaces)
Stage 3 Result
Soft-Max Boundary Map
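The stages above can be read as gradually growing regions: boundaries judged unlikely to be occlusions are removed, the regions on either side merge, and cues are recomputed on the larger regions. The sketch below shows only the merge step, under the hypothetical assumption that P(occlusion) for each boundary between adjacent regions is already given; re-estimating those probabilities after each merge, and the CRF terms for continuity, closure, and surfaces, are omitted.

```python
def merge_regions(num_regions, boundary_probs, threshold=0.5):
    """Merge adjacent regions whose shared boundary is unlikely to be an occlusion.

    boundary_probs maps a pair of adjacent region ids (a, b) to P(occlusion).
    Returns, for each region, the id of the merged group it belongs to.
    """
    parent = list(range(num_regions))

    def find(r):
        # Union-find root lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Weakest boundaries first; with fixed probabilities the order does not
    # change the result, but it mirrors a gradual, weakest-first merge.
    for (a, b), p in sorted(boundary_probs.items(), key=lambda item: item[1]):
        if p >= threshold:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return [find(r) for r in range(num_regions)]


# Four regions in a row; only the boundary between regions 1 and 2 looks like an occlusion.
print(merge_regions(4, {(0, 1): 0.1, (1, 2): 0.9, (2, 3): 0.2}))  # [0, 0, 2, 2]
```

Delaying the decision this way keeps uncertain boundaries alive until larger regions provide stronger evidence about them.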
45. Final Estimate
Depth (Min)
Boundaries, Foreground/Background, Contact
Depth (Max)
46. Evaluation
- Training: 50 images
- Testing: 250 images (50 evaluated quantitatively)
47. Occlusion vs. Non-Occlusion
48. Foreground/Background Accuracy
Ours (accuracy, %):
- Stage 1: edge/region cues 58.7; 3D cues 71.7
- Stage 2: edge/region cues 65.4; 3D cues 75.6; with CRF 77.3
- Stage 3: edge/region cues 68.2; 3D cues 77.1; with CRF 79.9
Ren et al. 2006, Corel Images
Shapemes CRF
Pb Boundaries 68.9
Human Boundaries 78.3
49. Occlusion Result
Depth (Min)
Depth (Max)
Boundaries, Foreground/Background, Contact
50. Occlusion Result
Depth (Min)
Boundaries, Foreground/Background, Contact
Depth (Max)
51. 3D Model with Occlusions
3D Model without Occlusion Reasoning
3D Model with Occlusion Reasoning
52. Recovering Viewpoint and Objects
Objects
3D Surfaces
Viewpoint
53. Results of a 2D Pedestrian Detector
True Detection
False Detections
Missed
Missed
True Detections
Detector from Dalal & Triggs 2005
54. 2D Contextual Reasoning
Kumar & Hebert 2005
Torralba, Murphy & Freeman 2004
- Winn & Shotton 2006
- Fink & Perona 2003
- Carbonetto, de Freitas & Barnard 2004
- He, Zemel & Carreira-Perpiñán 2004
55. Reasoning within the 3D Scene
Close
Not Close
56. Camera Viewpoint
57. Object Size ↔ Camera Viewpoint
Input Image
Loose Viewpoint Prior
58. Object Size ↔ Camera Viewpoint
Input Image
Loose Viewpoint Prior
59. Object Size ↔ Camera Viewpoint
Object Position/Sizes
Viewpoint
60. Object Size ↔ Camera Viewpoint
Object Position/Sizes
Viewpoint
61. Object Size ↔ Camera Viewpoint
Object Position/Sizes
Viewpoint
62. Object Size ↔ Camera Viewpoint
Object Position/Sizes
Viewpoint
63. Camera Viewpoint ↔ Object Height
Input Image
2D Object Heights
3D Object Heights
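The coupling between object size and camera viewpoint can be written down directly. Assuming the object stands on the ground, a roughly horizontal camera, image rows that grow downward, horizon row v0, and camera height y_c, an object whose bottom is at row v_b and whose image height is h pixels has world height of about h·y_c / (v_b − v0); the focal length cancels. Conversely, objects of roughly known size constrain y_c. A minimal sketch; the 1.7 m pedestrian height and all numbers are illustrative assumptions.

```python
from statistics import median


def world_height(img_height, v_bottom, v0, cam_height):
    """Approximate 3D height of a ground-standing object (rows grow downward)."""
    if v_bottom <= v0:
        raise ValueError("the object's ground contact must lie below the horizon")
    return img_height * cam_height / (v_bottom - v0)


def camera_height(detections, v0, object_height=1.7):
    """Median camera height implied by (img_height, v_bottom) detections
    of objects that share an assumed 3D height (e.g. pedestrians)."""
    return median(object_height * (v_bottom - v0) / h for h, v_bottom in detections)


v0 = 100  # hypothetical horizon row
# Two pedestrians: the nearer one is lower in the image and proportionally taller.
yc = camera_height([(100, 200), (50, 150)], v0)
print(round(yc, 3))                             # 1.7
print(round(world_height(45, 190, v0, yc), 3))  # 0.85
```

With the camera height pinned down by the pedestrians, a 45-pixel object touching the ground at row 190 must be about 0.85 m tall, which is the kind of size check that can reject implausible detections.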
64. Viewpoint from Scene Matching
LabelMe with Viewpoint Annotations
Input Image
65. What do surfaces and viewpoint say about objects?
Image
P(object)
66. What do surfaces and viewpoint say about objects?
Image
P(surfaces)
P(viewpoint)
P(object | surfaces, viewpoint)
P(object)
67. Input to Our Algorithm
Surface Estimates
Viewpoint Initial
Object Detection
Local Car Detector
Local Ped Detector
Surfaces
68. Exact Inference over Tree with Belief Propagation
[Figure: tree-structured model. The viewpoint is the root; its children are the objects o1 … on, each with local object evidence; each object oi has a child local surface si with local surface evidence.]
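Because the model is a tree rooted at the viewpoint, exact marginals need only one upward and one downward pass of belief propagation. A minimal sketch, assuming a discretized viewpoint θ (for example, a grid over horizon position and camera height), a small discrete state space for each object candidate oi, and one local surface variable si per object; the dictionary keys and numbers are hypothetical, and the thesis's actual conditional distributions are not reproduced here.

```python
from math import prod


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def tree_marginals(prior_theta, objects):
    """Exact marginals on a viewpoint -> object -> surface tree (all discrete).

    prior_theta: P(theta) for each discrete viewpoint state.
    objects: one dict per object candidate i with hypothetical keys
      "p_o_given_theta": [t][o] = P(o_i = o | theta = t)
      "evidence_o":      [o]    = likelihood of the detector evidence given o_i = o
      "p_s_given_o":     [o][s] = P(s_i = s | o_i = o)
      "evidence_s":      [s]    = likelihood of the surface evidence given s_i = s
    Returns (P(theta | evidence), [P(o_i | evidence) for each object]).
    """
    n_theta = len(prior_theta)
    phis, up = [], []
    for ob in objects:
        states = range(len(ob["evidence_o"]))
        # Upward message from surface leaf s_i to o_i, times o_i's own evidence.
        phi = [ob["evidence_o"][o] * sum(p * e for p, e in zip(ob["p_s_given_o"][o], ob["evidence_s"]))
               for o in states]
        phis.append(phi)
        # Upward message from o_i to the root theta.
        up.append([sum(ob["p_o_given_theta"][t][o] * phi[o] for o in states) for t in range(n_theta)])

    post_theta = normalize([prior_theta[t] * prod(m[t] for m in up) for t in range(n_theta)])

    post_objects = []
    for i, phi in enumerate(phis):
        # Downward message to o_i: the root's belief without o_i's own upward message.
        down = [prior_theta[t] * prod(m[t] for j, m in enumerate(up) if j != i) for t in range(n_theta)]
        p_o = objects[i]["p_o_given_theta"]
        post_objects.append(normalize(
            [phi[o] * sum(down[t] * p_o[t][o] for t in range(n_theta)) for o in range(len(phi))]))
    return post_theta, post_objects


# Two hypothetical viewpoint states and one object candidate (o = 0: absent, 1: present).
car = {
    "p_o_given_theta": [[0.9, 0.1], [0.5, 0.5]],
    "evidence_o": [0.2, 0.8],
    "p_s_given_o": [[0.5, 0.5], [0.1, 0.9]],
    "evidence_s": [1.0, 1.0],  # uninformative surface evidence
}
post_theta, (post_car,) = tree_marginals([0.5, 0.5], [car])
print([round(p, 3) for p in post_theta], [round(p, 3) for p in post_car])
# [0.342, 0.658] [0.368, 0.632]
```

The cost grows linearly with the number of objects for each viewpoint state, which is what makes joint reasoning over many detections practical.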
69. Improved Viewpoint Estimate
Panels: initial and final viewpoint likelihoods over horizon position and camera height
70. Improved Object Estimate
Car TP / FP Ped TP / FP
Initial (Local)
Final (Global)
Car Detection
4 TP / 1 FP
4 TP / 2 FP
Ped Detection
4 TP / 0 FP
3 TP / 2 FP
71. Experiments on LabelMe Dataset
- Testing on the LabelMe dataset
- Cars as small as 14 pixels
- Peds as small as 36 pixels
72. More Tasks → Better Detection
Local Detector from Murphy et al. 2003
Plots (car detection, pedestrian detection): detection rate vs. false positives per image; curves for All Information, Objects + View, Objects + Geom, and Objects Only
Hoiem, Efros & Hebert 2006
73. Good Detectors Become Better
Local Detector from Dalal-Triggs 2005
Plots (car detection, pedestrian detection): curves for All Information and Objects Only
74. Better Detectors → Better Viewpoint
Median horizon error (chart also marks a 90% bound):
- Horizon prior: 8.5
- Using 2003 local detector: 3.8
- Using 2005 local detector: 3.0
75. More is Better
- More objects → better viewpoint estimates
- Detect cars only: median error 7.3
- Detect peds only: median error 5.0
- Detect both: median error 3.8
- Better viewpoint → better object detection: 10 fewer false positives at the same detection rate
76. Results
Car TP / FP Ped TP / FP
Initial 6 TP / 1 FP
Final 9 TP / 0 FP
77. Results
Car TP / FP Ped TP / FP
Initial 3 TP / 3 FP
Final 5 TP / 1 FP
78. Putting Objects in Perspective
Ped
Ped
Car
79. Geometrically Coherent Image Interpretation
Surface Maps
Support
Viewpoint/Size Reasoning
Viewpoint and Objects
80. Geometrically Coherent Image Interpretation
Surface Maps
Depth, Boundaries
Support
Boundaries
Horizon, Object Maps
Horizon, Object Maps
Viewpoint/Size Reasoning
Viewpoint and Objects
81. Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
82. Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
83. Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
84. Geometrically Coherent Image Interpretation
Input
Surfaces
Occlusion Boundaries
Viewpoint and Objects
85. Next Steps
- More robust and comprehensive high-level reasoning
- Learn perceptual similarity and general appearance models
86. Conclusions
- One image contains much 3D information
- Learn statistical models of the structure of our world from training images
- Important aspects of approach:
- Use all available cues
- Delay decisions
- Think of vision as one 3D scene understanding problem
87. Video
88. Thank you
- Acknowledgements
- Committee: Alyosha, Martial, Rahul, Takeo, and Bill
- Practice presentation: Srinivas, Tom, Alex
90. Vision as Scene Understanding
Ohta & Kanade 1978
- Guzman (SEE), 1968
- Hansen & Riseman (VISIONS), 1978
- Barrow & Tenenbaum, 1978
- Brooks (ACRONYM), 1979
- Marr (2½D sketch), 1982
- Ohta & Kanade, 1978
91. Vision as Scene Understanding
Guzman 1968
Ohta & Kanade 1978
92. Results
Car TP / FP Ped TP / FP
93. Failures
94. Failures: Reflections, Rare Viewpoint
Input Image
Ground Truth
Our Result
95. Results
Car TP / FP Ped TP / FP
Initial 1 TP / 23 FP
Final 0 TP / 10 FP
Local Detector from Murphy-Torralba-Freeman 2003
96. Results
Car TP / FP Ped TP / FP
Initial 1 TP / 5 FP
Final 5 TP / 2 FP
97. How do we get robust scene priors?
Hill
Standing on Step
99. How to find occluding contours?
100. Other slides
101. Overview of Our Algorithm
Input Image
Multiple Segmentations
Surface Estimates
Final Labels
Learned Models
102. Estimating surface properties
- We want to know:
- Is a segment good?
- If so, what is the surface label?
- Learn these likelihoods from training images
P(good segment | data)
P(label | good segment, data)
103. Results
Input Image
Ground Truth
Our Result
104. Results
Input Image
Ground Truth
Our Result
105. Average Accuracy
Main class: 88.1%; subclasses: 61.5%
106. Experiments: Input Image
107. Experiments: Ground Truth
108. Experiments: Our Result
109. Surface Estimates: Paintings
Input Image
Our Result
110. Object Pasting
Lalonde et al. 2007
111. Object Pasting
Before
After
112. Object Pasting
Before
After
113. Are Surfaces Enough?
115. 3D Surface Clues for Occlusions
Surface legend: Support, Planar, Porous, Sky, Solid
Panels: surface labels; 2D boundary T-junction (regions numbered 1, 2, 3)