Title: Ecological Statistics and Visual Grouping
1. Ecological Statistics and Visual Grouping
- Jitendra Malik
- U.C. Berkeley
2. Collaborators
- David Martin
- Charless Fowlkes
- Xiaofeng Ren
3. From Images to Objects
"I stand at the window and see a house, trees,
sky. Theoretically I might say there were 327
brightnesses and nuances of colour. Do I have
"327"? No. I have sky, house, and trees." --Max
Wertheimer
4. Grouping factors
5. Critique
- Predictive power
- Factors for complex, natural stimuli?
- How do they interact?
- Functional significance
- Why should these be useful or confer some evolutionary advantage to a visual organism?
- Brain mechanisms
- How are these factors implemented, given what we know about V1 and higher visual areas?
7. Our approach
- Creating a dataset of human-segmented images
- Measuring ecological statistics of various Gestalt grouping factors
- Using these measurements to calibrate and validate approaches to grouping
8. Natural Images aren't generic signals
- Edges/Filters/Coding: Ruderman 1994/1997, Olshausen/Field 1996, Bell/Sejnowski 1997, van Hateren/van der Schaaf 1998, Buccigrossi/Simoncelli 1999, Alvarez/Gousseau/Morel 1999, Huang/Mumford 1999
- Range Data: Huang/Lee/Mumford 2000
9. Brunswik & Kamiya 1953
- Unification of two important theories of perception:
- Statistical/Bayesian formulation, due to Helmholtz's Likelihood Principle
- Gestalt Psychology
- Attempted an empirical proof of the Gestalt grouping rule of proximity: 892 separations
- Ahead of his time.
- Now we have the tools to do this.
Egon Brunswik (1903-1955)
10. Outline
- Collect Data
- Learn Local Boundary Model (Low-Level Cues)
- Learn Pixel Affinity Model (Mid-Level Cues)
- Discussion and Conclusion
14. Protocol
- You will be presented a photographic image. Divide the image into some number of segments, where the segments represent things or parts of things in the scene. The number of segments is up to you, as it depends on the image. Something between 2 and 30 is likely to be appropriate. It is important that all of the segments have approximately equal importance.
- Custom segmentation tool
- Subjects obtained from work-study program (UC Berkeley undergraduates)
17. Segmentations are Consistent
Perceptual organization forms a tree:
[Figure: segmentation tree. Image → {BG, L-bird, R-bird}; BG → {bush, far, grass}; each bird → {beak, body, head, eye}]
Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
- A, C are refinements of B
- A, C are mutual refinements
- A, B, C represent the same percept
- Attention accounts for differences
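The tree criterion can be made operational as a pairwise containment test: two label maps are consistent exactly when every overlapping pair of regions is nested. A minimal Python sketch (the function and this particular test are our illustration, not the authors' published tool):

```python
import numpy as np

def consistent(seg_a: np.ndarray, seg_b: np.ndarray) -> bool:
    """Check whether two label maps could come from one segmentation tree.

    Sketch: wherever a region of A and a region of B overlap, one must
    contain the other (local refinement in one direction or the other).
    """
    for a in np.unique(seg_a):
        mask_a = seg_a == a
        for b in np.unique(seg_b[mask_a]):   # regions of B overlapping region a
            mask_b = seg_b == b
            inter = np.logical_and(mask_a, mask_b).sum()
            # Consistent only if the overlap exhausts one of the two regions,
            # i.e. region a is inside b or region b is inside a.
            if inter != mask_a.sum() and inter != mask_b.sum():
                return False
    return True
```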
19. Dataset Summary
- 30 subjects, age 19-23
- 17 men, 13 women
- 9 with artistic training
- 8 months
- 1,458 person hours
- 1,020 Corel images
- 11,595 Segmentations
- 5,555 color, 5,554 gray, 486 inverted/negated
20. Gray, Color, InvNeg Datasets
- Explore how various high/low-level cues affect the task of image segmentation by subjects
- Color: full color image
- Gray: luminance image
- InvNeg: inverted negative luminance image
21. [Figure: the same image presented in the Color, Gray, and InvNeg conditions]
22. [Figure: InvNeg example]
23. [Figure: another image in the Color, Gray, and InvNeg conditions]
24. Outline
- Collect Data
- Learn Local Boundary Model (Low-Level Cues)
- The first step in human vision: finding edges
- Required for all segmentation algorithms
- Learn Pixel Affinity Model (Mid-Level Cues)
- Discussion and Conclusion
25. Dataflow
[Diagram: Image → boundary cues (Brightness, Color, Texture) → cue combination model → Pb]
Challenges: texture cue, cue combination.
Goal: learn the posterior probability of a boundary, Pb(x,y,θ), from local information only.
27. Brightness and Color Features
- 1976 CIE Lab colorspace
- Brightness Gradient BG(x,y,r,θ)
- χ² difference in L distribution
- Color Gradient CG(x,y,r,θ)
- χ² difference in a and b distributions
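All three gradient cues reduce to a χ² distance between histograms taken over the two halves of an oriented disc. A minimal sketch (the bin count and the toy data are illustrative assumptions):

```python
import numpy as np

def chi_squared(h, g, eps=1e-10):
    """0.5 * sum_i (h_i - g_i)^2 / (h_i + g_i) for two normalized histograms."""
    h, g = np.asarray(h, float), np.asarray(g, float)
    return 0.5 * np.sum((h - g) ** 2 / (h + g + eps))

# Toy usage: compare luminance histograms of the two halves of an oriented
# disc centered at (x, y); a large distance suggests a boundary there.
rng = np.random.default_rng(0)
left  = np.histogram(rng.random(200),      bins=16, range=(0, 1))[0] / 200.0
right = np.histogram(rng.random(200) ** 2, bins=16, range=(0, 1))[0] / 200.0
print(chi_squared(left, right))
```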
28. Texture Feature
- Texture Gradient TG(x,y,r,θ)
- χ² difference of texton histograms
- Textons are vector-quantized filter outputs
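A sketch of texton computation under stated assumptions: the filter bank below (Gaussian derivatives at a few scales) is a small stand-in for the paper's oriented bank, and k-means does the vector quantization:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.cluster.vq import kmeans2

def texton_map(image, n_textons=32, sigmas=(1.0, 2.0, 4.0)):
    """Assign each pixel a texton id by clustering its filter-response vector."""
    responses = [gaussian_filter(image, s, order=o)
                 for s in sigmas
                 for o in ((0, 1), (1, 0), (0, 2), (2, 0))]  # dx, dy, dxx, dyy
    vecs = np.stack(responses, axis=-1).reshape(-1, len(responses))
    _, labels = kmeans2(vecs.astype(float), n_textons, minit='points')
    return labels.reshape(image.shape)  # texton id per pixel
```

The texture gradient TG then compares texton histograms of the two half-discs with the χ² distance sketched above.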
29. Cue Combination Models
- Classification Trees
- Top-down splits to maximize entropy, error bounded
- Density Estimation
- Adaptive bins using k-means
- Logistic Regression, 3 variants
- Linear and quadratic terms
- Confidence-rated generalization of AdaBoost (Schapire & Singer)
- Hierarchical Mixtures of Experts (Jordan & Jacobs)
- Up to 8 experts, initialized top-down, fit with EM
- Support Vector Machines (libsvm, Chang & Lin)
- Gaussian kernel, ν-parameterization
- Range over bias, complexity, parametric/non-parametric
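Of these, the plain logistic model is the one the later findings favor. A minimal sketch of fitting Pb by logistic regression over the three cue values (scikit-learn and the random placeholder arrays are our assumptions; the original work used its own fitting code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder features: one row per (x, y, theta) location, with columns
# (BG, CG, TG); labels mark locations where a human drew a boundary nearby.
rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))
y_train = (rng.random(10_000) < 0.1).astype(int)

model = LogisticRegression().fit(X_train, y_train)
Pb = model.predict_proba(X_train[:5])[:, 1]  # posterior Pr(boundary | cues)
print(Pb)
```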
30. Computing Precision/Recall
- Recall = Pr(signal | truth): fraction of ground truth found by the signal
- Precision = Pr(truth | signal): fraction of signal that is correct
- Always a trade-off between the two
- Standard measures in information retrieval (van Rijsbergen XX)
- ROC from standard signal detection is the wrong approach
- Strategy (sketched below):
- Detector output (Pb) is a soft boundary map
- Compute precision/recall curve
- Threshold Pb at many points t in [0,1]
- Recall = Pr(Pb > t | seg = 1)
- Precision = Pr(seg = 1 | Pb > t)
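A sketch of that strategy (the pixelwise comparison is a simplifying assumption; the benchmark additionally matches detected boundary pixels to human boundaries within a small distance tolerance):

```python
import numpy as np

def pr_curve(pb, truth, n_thresh=30):
    """Precision/recall of a soft boundary map pb in [0,1] against a
    binary ground-truth boundary mask."""
    precision, recall = [], []
    for t in np.linspace(0.01, 0.99, n_thresh):
        detect = pb > t
        tp = np.logical_and(detect, truth).sum()
        precision.append(tp / max(detect.sum(), 1))  # Pr(seg=1 | Pb>t)
        recall.append(tp / max(truth.sum(), 1))      # Pr(Pb>t | seg=1)
    return np.array(precision), np.array(recall)
```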
31. Classifier Comparison
[Plot: precision-recall curves for the classifiers; annotations mark the goal (high precision and recall), more noise, and more signal]
32. ROC vs. Precision/Recall

                Truth
                P    N
    Signal  P   TP   FP
            N   FN   TN

ROC Curve: Hit Rate = TP / (TP+FN), False Alarm Rate = FP / (FP+TN)
PR Curve: Precision = TP / (TP+FP), Recall = TP / (TP+FN)
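A small worked example of why ROC misleads here: boundary pixels are rare, so true negatives dominate and the false-alarm rate is near zero for any detector, while precision still discriminates. The counts below are made up for illustration:

```python
def rates(tp, fp, fn, tn):
    hit_rate    = tp / (tp + fn)   # ROC y-axis (same as recall)
    false_alarm = fp / (fp + tn)   # ROC x-axis: depends on TN
    precision   = tp / (tp + fp)   # PR y-axis: no TN term
    return hit_rate, false_alarm, precision

# With boundaries rare, TN is huge: false alarm rate ~0.004 looks excellent
# on a ROC curve, yet precision 90/490 ~ 0.18 exposes the weak detector.
print(rates(tp=90, fp=400, fn=10, tn=99_500))
```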
33. Cue Calibration
- All free parameters optimized on training data
- All algorithmic alternatives evaluated by experiment
- Brightness Gradient
- Scale, bin/kernel sizes for KDE
- Color Gradient
- Scale, bin/kernel sizes for KDE, joint vs. marginals
- Texture Gradient
- Filter bank scale, multiscale?
- Histogram comparison: L1, L2, L∞, χ², EMD
- Number of textons, image-specific vs. universal textons
- Localization parameters for each cue
34. Calibration Example: Number of Textons for the Texture Gradient
35. Calibration Example 2: Image-Specific vs. Universal Textons
36. Boundary Localization
[Figure: raw TG signal near boundaries vs. non-boundaries]
(1) Fit cylindrical parabolas to the raw oriented signal to get local shape (Savitzky-Golay)
(2) Localize peaks
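Savitzky-Golay smoothing is equivalent to a sliding least-squares polynomial fit, so peak localization reduces to reading off the vertex of a fitted parabola. A 1-D sketch (the window size is an arbitrary choice; the paper fits cylindrical parabolas to the 2-D oriented signal):

```python
import numpy as np

def localize_peak(signal, x, half_window=2):
    """Fit a parabola to the cue signal in a small window around index x
    and return the sub-pixel peak location, or None if not a maximum.
    Assumes the window fits inside the signal."""
    xs = np.arange(x - half_window, x + half_window + 1)
    c2, c1, c0 = np.polyfit(xs, signal[xs], deg=2)
    if c2 >= 0:                       # concave-up: no local maximum here
        return None
    return -c1 / (2 * c2)             # vertex of the fitted parabola
```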
37. Dataflow
[Diagram: Image → optimized cues (Brightness, Color, Texture) → cue combination model → Pb]
38. Classifier Comparison
39. Cue Combinations
40. Alternate Approaches
- Canny Detector
- Canny 1986
- MATLAB implementation
- With and without hysteresis
- Second Moment Matrix
- Nitzberg/Mumford/Shiota 1993
- cf. Förstner and Harris corner detectors
- Used by Konishi et al. 1999 in a learning framework
- Logistic model trained on full eigenspectrum
41. Pb Images
[Figure: Image, Human, Canny, 2MM, and our Pb maps]
42. Pb Images II
[Figure: Image, Human, Canny, 2MM, and our Pb maps]
43. Pb Images III
[Figure: Image, Human, Canny, 2MM, and our Pb maps]
44. Two Decades of Boundary Detection
45. Findings
- A simple linear model is sufficient for cue combination
- All cues weighted approximately equally in the logistic
- A proper texture edge model is not optional for complex natural images
- Texture suppression is not sufficient!
- Significant improvement over state-of-the-art in boundary detection
- Pb(x,y,θ) useful for higher-level processing
- Empirical approach critical for both cue calibration and cue combination
46. Spatial priors on image regions and contours
47. Good Continuation
- Wertheimer '23
- Kanizsa '55
- von der Heydt, Peterhans & Baumgartner '84
- Kellman & Shipley '91
- Field, Hayes & Hess '93
- Kapadia, Westheimer & Gilbert '00
- ...
- Parent & Zucker '89
- Heitger & von der Heydt '93
- Mumford '94
- Williams & Jacobs '95
- ...
48. Outline of Experiments
- Prior model of contours in natural images
- First-order Markov model
- Test of Markov property
- Multi-scale Markov models
- Information-theoretic evaluation
- Contour synthesis
- Good continuation algorithm and results
49. Contour Geometry
- First-Order Markov Model (Mumford '94, Williams & Jacobs '95)
- Curvature: white noise (independent from position to position)
- Tangent t(s): random walk
- Markov property: the tangent at the next position, t(s+1), depends only on the previous tangent t(s)
[Figure: contour with tangent t(s) at arc position s and t(s+1) at s+1]
50. Test of Markov Property
Segment the contours at high-curvature positions
51. Prediction: Exponential Distribution
- If the first-order Markov property holds:
- At every step, there is a constant probability p that a high-curvature event will occur
- High-curvature events are independent from step to step
- Then the probability of finding a segment of length k with no high curvature is (1-p)^k
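A quick simulation makes the prediction concrete (p and the sample size below are arbitrary choices): under the Markov assumption, segment lengths decay geometrically, which is exactly what the next slides test against data:

```python
import numpy as np

# Under the first-order Markov model, a high-curvature event occurs
# independently with probability p at each step, so the length of an
# event-free segment is geometric: P(length >= k) = (1 - p)**k.
p, n = 0.1, 200_000
rng = np.random.default_rng(0)
events = rng.random(n) < p                   # high-curvature events
gaps = np.diff(np.flatnonzero(events)) - 1   # event-free run lengths
for k in (5, 10, 20):
    print(k, (gaps >= k).mean(), (1 - p) ** k)  # empirical vs. predicted
```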
52. Empirical Distribution
[Plot: empirical segment-length distribution. Exponential?]
53. Empirical Distribution: Power Law
[Plot: probability density vs. contour segment length]
54. Power Laws in Nature
- Power laws widely found in nature:
- Brightness of stars
- Magnitude of earthquakes
- Population of cities
- Word frequency in natural languages
- Revenue of commercial corporations
- Connectivity in Internet topology
- ...
- Usually characterized by self-similarity and multi-scale phenomena
55. Multi-scale Markov Models
- Assume knowledge of contour orientation at coarser scales
[Figure: contour with tangent t(s) at position s and t(s+1) at s+1]
2nd-Order Markov: P( t(s+1) | t(s), t^(1)(s+1) )
Higher-Order Models: P( t(s+1) | t(s), t^(1)(s+1), t^(2)(s+1), ... )
where t^(i)(s+1) denotes the tangent at the i-th coarser scale.
56. Information Gain in Multi-scale
H( t(s+1) | t(s), t^(1)(s+1), t^(2)(s+1), ... )
[Plot: conditional entropy vs. model order; the information gain reaches 14.6% of total entropy at order 5]
57. Contour Synthesis
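A minimal sketch of sampling from the first-order model, in the spirit of the synthesis shown here: curvature is i.i.d. noise, so the tangent angle is a random walk (the Gaussian noise model and the sigma value are our assumptions):

```python
import numpy as np

def synthesize_contour(n_steps=200, kappa_sigma=0.15, seed=0):
    """Sample a contour from the first-order Markov model: i.i.d. curvature
    noise makes the tangent angle a random walk along arc length."""
    rng = np.random.default_rng(seed)
    theta = np.cumsum(rng.normal(0.0, kappa_sigma, n_steps))  # tangent angles
    # Integrate the unit tangent to get the contour points.
    xy = np.cumsum(np.stack([np.cos(theta), np.sin(theta)], axis=1), axis=0)
    return xy  # (n_steps, 2) array of points
```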
58. Multi-scale in Natural Images
- Arbitrary viewing distance
- Multi-scale in object shape
59. Conditioned on Object Size
[Plot: probability density vs. contour segment length, conditioned on object size]
60. Distribution of Region Convexity
61. Multi-scale Contour Completion
- Coarse-to-fine
- Coarse scale completes large gaps
- Fine scale detects details
- Completed contours at coarser scales are used in the higher-order Markov models of the contour prior for finer scales: P( t(s+1) | t(s), t^(1)(s+1), ... )
62. Multi-scale Example
[Figure: input; coarse scale; fine scale without multi-scale; fine scale with multi-scale]
63. Comparison (same number of edge pixels)
[Figure: our result vs. Canny]
64. Comparison (same number of edge pixels)
[Figure: our result vs. Canny]
65. Outline
- Collect and Validate Data
- Learn Local Boundary Model (Low-Level Cues)
- Learn Pixel Affinity Model (Mid-Level Cues)
- Good representation for segmentation algorithms
- Keeps segmentation in a probabilistic framework
- Discussion and Conclusion
66. Dataflow
[Diagram: Image → region cues and edge cues → estimated affinity E → segmentation]
- Eij: affinity between pixels i and j
- Representation for graph-theoretic segmentation algorithms:
- Minimum Spanning Trees: Zahn 1971, Urquhart 1982
- Spectral Clustering: Scott/Longuet-Higgins 1990, Sarkar/Boyer 1996
- Graph Cuts: Wu/Leahy 1993, Shi/Malik 1997, Felzenszwalb/Huttenlocher 1998, Gdalyahu/Weinshall/Werman 1999
- Matrix Factorization: Perona/Freeman 1998
- Graph Cycles: Jermyn/Ishikawa 2001
67. Pixel Affinity Cues
- Patch similarity (×3)
- Edges: strength of intervening contour (×3)
- Image-plane distance
- All cues calibrated with respect to training data
- Goal: learn the affinity function from the dataset using these 7 cues (a sketch follows below)
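One way the edge and distance cues might fold into a single affinity Eij; the functional form, the sigma values, and the straight-line sampling are our assumptions for illustration (the paper learns the combination from the dataset):

```python
import numpy as np

def affinity(pb, i, j, sigma_ic=0.1, sigma_d=20.0):
    """Toy affinity between pixels i and j combining two of the cues:
    intervening contour (max Pb along the segment between them) and
    image-plane distance."""
    i, j = np.asarray(i), np.asarray(j)
    n = int(np.hypot(*(j - i))) + 2
    line = np.linspace(i, j, n).round().astype(int)   # pixels on the segment
    max_pb = pb[line[:, 0], line[:, 1]].max()         # strongest crossing edge
    dist2 = float(((i - j) ** 2).sum())
    # Strong intervening contour or large distance -> low affinity.
    return np.exp(-max_pb / sigma_ic) * np.exp(-dist2 / sigma_d**2)
```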
68. Dataflow
[Diagram: Image → region cues and edge cues → estimated affinity (E) → segmentation]
69. Two Evaluation Methods
[Figure: ground-truth same-segment indicator Gij and estimated affinity Eij]
- Precision-Recall of same-segment pairs
- Precision is Pr(Gij = 1 | Eij > t)
- Recall is Pr(Eij > t | Gij = 1)
- Mutual Information between E and G
70. Mutual information
I(X;Y) = Σ_{x,y} P(x,y) log [ P(x,y) / ( P(x) P(y) ) ]
where x is a cue and y is the indicator of being in the same segment
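A sketch of estimating this quantity by histogramming a continuous cue against the binary same-segment indicator (the bin count is an arbitrary choice; the estimate is in bits):

```python
import numpy as np

def mutual_information(x, y, n_bins=32):
    """Estimate I(X;Y) in bits between a continuous cue x and a binary
    same-segment indicator y via a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=(n_bins, 2))
    p = joint / joint.sum()                    # joint distribution P(x, y)
    px = p.sum(axis=1, keepdims=True)          # marginal P(x)
    py = p.sum(axis=0, keepdims=True)          # marginal P(y)
    nz = p > 0                                 # avoid log(0)
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))
```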
71. Individual Features
[Plot: performance of individual features, gradients vs. patches]
72. The Distance Cue
cf. Brunswik & Kamiya 1953
73. Feature Pruning
[Plot: top-down and bottom-up feature pruning]
4 good cues: texture edge/patch, color patch, brightness edge. 2 poor cues: color edge, brightness patch.
74. Affinity Model vs. Humans
75. Results
- Common wisdom: use patches only / use edges only. Finding: use both.
- Common wisdom: must use patches for texture. Finding: not true.
- Common wisdom: color is a powerful grouping cue. Finding: true, but texture is better.
- Common wisdom: brightness patches are a poor cue. Finding: true (shadows).
- Common wisdom: proximity is a (Gestalt) grouping cue. Finding: proximity is a result, not a cause, of grouping.
76. Outline
- Collect and Validate Data
- Learn Local Boundary Model (Low-Level Cues)
- Learn Pixel Affinity Model (Mid-Level Cues)
- Discussion and Conclusion
77. Contribution
- Provide a mathematical foundation for the grouping problem in terms of the ecological statistics of natural images.
- "When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of the meager and unsatisfactory kind." --Lord Kelvin