Title: It
1. It's a 3D World! Toward a Qualitative Representation of a Scene
- Alyosha Efros
- Carnegie Mellon University
2. The Problem
- Recovering 3D structure from single 2D projection
- Infinite number of possible solutions!
from Sinha and Adelson 1993
3. Geometry school: disambiguate
J.J. Gibson's Ecological Optics: the actively exploring organism
4. 3D Data Capture
- Stereo
- Structure from Motion
- Laser Range Scanning
- Etc.
3D Point Cloud
5. The World Behind the Image
Hoiem, Efros, Hebert, "Automatic Photo Pop-up", SIGGRAPH 2005
6. Our World is Structured
Abstract World
Image Credit (left): F. Cunin and M.J. Sailor, UCSD
7. Pattern Recognition school: learn!
- recognize full scenes
- Scene gist (Torralba & Oliva, 2001)
- 32x32 images (Torralba et al., 2007)
- (256^3)^(32x32) possible images: a huge space!
- recognize all objects in scene
- scan (rectangular) templates
- recognize each object independently
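To get a feel for why the space of 32x32 images is called huge, the count can be worked out directly. This is a small sketch, assuming 8 bits per color channel (so 256^3 colors per pixel); the function name is only illustrative.

```python
import math

def num_possible_images(width=32, height=32, levels=256, channels=3):
    """Count distinct images: each pixel takes one of levels**channels colors."""
    colors_per_pixel = levels ** channels
    return colors_per_pixel ** (width * height)

n = num_possible_images()
# Count decimal digits with a logarithm rather than converting the huge int to text.
digits = math.floor(math.log10(n)) + 1
print(f"roughly 10^{digits - 1} possible 32x32 color images")
```

Even at thumbnail resolution, exhaustively covering the space is out of the question, which is why the learning approach relies on the structure of real scenes.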
8. Local Object Detection
[Figure: detector output with True Detections, False Detections, and Missed objects marked]
Local Detector: Dalal & Triggs, 2005
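The true / false / missed categories on this slide can be made concrete with a small scoring routine. The sketch below is an assumption-laden illustration, not the evaluation used in the talk: boxes are (x1, y1, x2, y2) tuples, matching is greedy, and the 0.5 overlap threshold is a common convention chosen here for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def score_detections(detections, ground_truth, min_iou=0.5):
    """Greedily match detections to ground-truth boxes.

    Returns (true_positives, false_positives, missed).
    """
    unmatched = list(ground_truth)
    tp = fp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= min_iou:
            unmatched.remove(best)  # each object can be detected only once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)

# Hypothetical boxes: three real objects, three detector outputs.
ground_truth = [(10, 10, 50, 90), (100, 20, 140, 100), (200, 30, 230, 90)]
detections = [(12, 12, 52, 88), (300, 300, 340, 380), (98, 25, 138, 105)]
result = score_detections(detections, ground_truth)
print(result)
```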
9. What the Detector Sees
10. 2D Context is not enough
Close
Not Close
11. Geometrically Coherent Image Interpretation
- Derek Hoiem, Alyosha Efros, Martial Hebert
12. Recognizing (qualitative) Geometry
- Goal: learn labeling of image into 7 Geometric Classes
- Support (ground)
- Vertical
- Planar: facing Left (←), Center (↑), Right (→)
- Non-planar: Solid (X), Porous or wiry (O)
- Sky
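The seven classes above form a two-level hierarchy: three main classes, with Vertical split into five subclasses. A minimal sketch of that structure follows; the identifier names are made up for illustration.

```python
from enum import Enum

class MainClass(Enum):
    SUPPORT = "support"
    VERTICAL = "vertical"
    SKY = "sky"

class GeometricClass(Enum):
    """The seven geometric labels, each tied to its main class."""
    SUPPORT = ("support (ground)", MainClass.SUPPORT)
    PLANAR_LEFT = ("planar, facing left", MainClass.VERTICAL)
    PLANAR_CENTER = ("planar, facing center", MainClass.VERTICAL)
    PLANAR_RIGHT = ("planar, facing right", MainClass.VERTICAL)
    NONPLANAR_SOLID = ("non-planar, solid", MainClass.VERTICAL)
    NONPLANAR_POROUS = ("non-planar, porous or wiry", MainClass.VERTICAL)
    SKY = ("sky", MainClass.SKY)

    def __init__(self, description, main_class):
        self.description = description
        self.main_class = main_class

vertical_subclasses = [c for c in GeometricClass
                       if c.main_class is MainClass.VERTICAL]
print(len(GeometricClass), "classes,", len(vertical_subclasses), "of them vertical")
```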
13. Learn from labeled data
- 300 outdoor images from Google Image Search
14. Weak Geometric Cues
15. Hypothesizing Regions
- Naïve Idea #1: segment the image
- Chicken & Egg problem
- Naïve Idea #2: multiple segmentations
- Decide later which segments are good
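The idea of keeping several segmentations instead of committing to one can be shown on a toy example. The sketch below is a deliberately simple stand-in, not the segmentation method used in this work: it splits a single row of intensities wherever neighboring pixels differ by more than a threshold, and repeats that for several thresholds.

```python
def segment_row(values, threshold):
    """Split a 1D row of intensities where neighbors differ by more than threshold.

    Returns one segment id per pixel.
    """
    if not values:
        return []
    labels = [0]
    for prev, cur in zip(values, values[1:]):
        is_boundary = abs(cur - prev) > threshold
        labels.append(labels[-1] + 1 if is_boundary else labels[-1])
    return labels

def multiple_segmentations(values, thresholds):
    """Produce one segmentation per threshold, from fine to coarse."""
    return {t: segment_row(values, t) for t in thresholds}

row = [10, 12, 11, 40, 42, 90, 91, 89]  # hypothetical pixel intensities
segs = multiple_segmentations(row, thresholds=[5, 35, 60])
for t, labels in segs.items():
    print(t, labels)
```

No single threshold is right for every region, so all three outputs are kept and judged later.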
16. Labeling Segments
Using a Boosted Decision Tree classifier trained on labeled data
17. Image Labeling
Labeled Segmentations
Learned from training images
Labeled Pixels
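One plausible way to go from labeled segmentations to labeled pixels is to average, for each pixel, the label probabilities of every segment that contains it, weighted by how much each segment is trusted. The sketch below assumes that scheme; the data layout, the weights, and all names are illustrative, and the per-segment probabilities would come from the trained classifier.

```python
LABELS = ("support", "vertical", "sky")

def pixel_label_confidences(segmentations, segment_probs, segment_weights):
    """Combine per-segment label probabilities into per-pixel confidences.

    segmentations[j][i]   -> id of the segment holding pixel i in segmentation j
    segment_probs[j][s]   -> dict mapping label to probability for segment s
    segment_weights[j][s] -> positive trust weight for segment s
    """
    num_pixels = len(segmentations[0])
    confidences = []
    for i in range(num_pixels):
        totals = {label: 0.0 for label in LABELS}
        weight_sum = 0.0
        for j, seg in enumerate(segmentations):
            s = seg[i]
            w = segment_weights[j][s]
            weight_sum += w
            for label in LABELS:
                totals[label] += w * segment_probs[j][s][label]
        confidences.append({label: totals[label] / weight_sum for label in LABELS})
    return confidences

# Hypothetical input: four pixels, two segmentations.
segmentations = [[0, 0, 1, 1], [0, 0, 0, 0]]
segment_probs = [
    {0: {"support": 0.8, "vertical": 0.1, "sky": 0.1},
     1: {"support": 0.1, "vertical": 0.2, "sky": 0.7}},
    {0: {"support": 0.4, "vertical": 0.3, "sky": 0.3}},
]
segment_weights = [{0: 0.75, 1: 0.75}, {0: 0.25}]

conf = pixel_label_confidences(segmentations, segment_probs, segment_weights)
best = [max(c, key=c.get) for c in conf]
print(best)
```

The output stays a distribution per pixel; taking the most likely label is only done here for display.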
18. No Hard Decisions
Support
Vertical
Sky
V-Center
V-Right
V-Porous
V-Solid
V-Left
19. Labeling Results
20. How robust is it?
21. Shadow/Reflection Failures
Input image
Ground Truth
Our Result
22. Catastrophic Failures
Input image
Ground Truth
Our Result
23. Average Accuracy
Main Class: 88.1%, Subclasses: 61.5%
24. Understanding a Scene
- Biederman's Relations among Objects in a Well-Formed Scene (1981):
- Support
- Size
- Position
- Interposition
- Likelihood of Appearance
Object Support
Object Size
25. Object Support
26. Object Size ↔ Camera Viewpoint
Object Position/Sizes
Viewpoint
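The link between object size and viewpoint can be sketched with the standard ground-plane approximation: for a roughly level camera, an object standing on the ground has world height close to camera height times its pixel height, divided by the distance in rows from its base to the horizon. The code below assumes a flat ground plane and a level camera; all parameter names and numbers are illustrative.

```python
def estimate_object_height(top_row, bottom_row, horizon_row, camera_height):
    """Approximate world height of an object standing on the ground plane.

    Rows are image coordinates with row 0 at the top, so rows grow downward.
    camera_height is in meters; the result is in the same unit.
    """
    pixel_height = bottom_row - top_row
    rows_below_horizon = bottom_row - horizon_row
    if rows_below_horizon <= 0:
        raise ValueError("the object's base must lie below the horizon")
    return camera_height * pixel_height / rows_below_horizon

# Same bounding box, two different viewpoint hypotheses.
box_top, box_bottom = 130, 300
h_low_horizon = estimate_object_height(box_top, box_bottom, horizon_row=140,
                                       camera_height=1.6)
h_high_horizon = estimate_object_height(box_top, box_bottom, horizon_row=200,
                                        camera_height=1.6)
print(round(h_low_horizon, 2), round(h_high_horizon, 2))
```

With the horizon at row 140 the box implies a person-sized object; with the horizon at row 200 the same box implies something implausibly tall, so each quantity constrains the other.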
27. More Chickens, More Eggs
28. Input to Our Algorithm
Surface Estimates
Viewpoint Prior
Object Detection
Local Car Detector
Local Ped Detector
Surfaces: Hoiem, Efros & Hebert, 2005
Local Detector: Dalal & Triggs, 2005
29. Scene Parts Are All Interconnected
Objects
3D Surfaces
Viewpoint
30. Helping Object Detection
Image
P(object)
31. Helping Object Detection
Image
P(surfaces)
P(viewpoint)
P(object | surfaces, viewpoint)
P(object)
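How surface and viewpoint estimates can help a local detector is illustrated below as a simple product of factors. This is a rough sketch and not the probabilistic model from the talk: the size-compatibility term, the pedestrian height prior (mean 1.7 m, standard deviation 0.1 m), and every name are assumptions made for the example, and the result is a context-adjusted score rather than a calibrated probability.

```python
import math

def rescore(local_score, top_row, bottom_row, horizon_row, camera_height,
            p_support_at_base, height_mean=1.7, height_std=0.1):
    """Adjust a local detector score using viewpoint and surface context.

    Rows grow downward. p_support_at_base is the estimated probability that
    the pixels just under the box belong to a supporting (ground) surface.
    """
    rows_below_horizon = bottom_row - horizon_row
    if rows_below_horizon <= 0:
        return 0.0  # base above the horizon: cannot be standing on the ground
    # Implied world height under a flat-ground, level-camera assumption.
    world_height = camera_height * (bottom_row - top_row) / rows_below_horizon
    # Compatibility in (0, 1]: equals 1 when the implied height matches the prior mean.
    z = (world_height - height_mean) / height_std
    size_compatibility = math.exp(-0.5 * z * z)
    return local_score * size_compatibility * p_support_at_base

# Hypothetical candidates: a modest but plausible detection, a confident but
# implausibly tall one, and a box floating above the horizon.
plausible = rescore(0.5, 130, 300, horizon_row=140, camera_height=1.6,
                    p_support_at_base=0.9)
too_tall = rescore(0.8, 100, 150, horizon_row=140, camera_height=1.6,
                   p_support_at_base=0.9)
floating = rescore(0.9, 60, 120, horizon_row=140, camera_height=1.6,
                   p_support_at_base=0.9)
print(plausible, too_tall, floating)
```

The weaker local detection ends up ranked first because it is the only one consistent with the scene geometry.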
32. Qualitative Results
Car: TP / FP; Ped: TP / FP
Initial: 2 TP / 3 FP
Final: 7 TP / 4 FP
Local Detector from Murphy, Torralba & Freeman, 2003
33. Qualitative Results
Car: TP / FP; Ped: TP / FP
Initial: 1 TP / 14 FP
Final: 3 TP / 5 FP
Local Detector from Murphy, Torralba & Freeman, 2003
34. Qualitative Results
Car: TP / FP; Ped: TP / FP
Initial: 1 TP / 23 FP
Final: 0 TP / 10 FP
Local Detector from Murphy, Torralba & Freeman, 2003
35. Qualitative Results
Car: TP / FP; Ped: TP / FP
Initial: 0 TP / 6 FP
Final: 4 TP / 3 FP
Local Detector from Murphy, Torralba & Freeman, 2003
36. Top View
Ped
Ped
Car
37. What next?
38. The Challenges
39. Challenges
40. Summary
- 3D is important
- Size, Support, Occlusion, etc. are all inherently 3D phenomena!
- But exact 3D is not needed
- Coarse surface layout
- Coarse depth layering
- Coarse viewpoint inference
- Humans seem to have only qualitative 3D
41. Automatic Photo Pop-up
Geometric Labels
Original Image
42. More Pop-ups
43. The Music Video
44. Thank you
Questions?