Title: Pattern Theory: the Mathematics of Perception
1. Pattern Theory: the Mathematics of Perception
- Prof. David Mumford
- Division of Applied Mathematics
- Brown University
- International Congress of Mathematicians, Beijing, 2002
2. Outline of talk
- I. Background: history, motivation, basic definitions
- II. A basic example: Hidden Markov Models and speech, and extensions
- III. The natural degree of generality: Markov Random Fields and vision applications
- IV. Continuous models: image processing via PDEs, self-similarity of images and random diffeomorphisms

URL: www.dam.brown.edu/people/mumford/Papers/ICM02powerpoint.pdf or /ICM02proceedings.pdf
3. Some History
- Is there a mathematical theory underlying intelligence?
- 40s: Control theory (Wiener-Pontrjagin), the output side: driving a motor with noisy feedback in a noisy world to achieve a given state
- 70s: ARPA speech recognition program
- 60s-80s: AI, esp. medical expert systems; modal, temporal, default and fuzzy logics; and finally statistics
- 80s-90s: Computer vision, autonomous land vehicle
4. Statistics vs. Logic
- Plato: "If Theodorus, or any other geometer, were prepared to rely on plausibility when he was doing geometry, he'd be worth absolutely nothing."
- Gauss: Gaussian distributions, least squares → relocating lost Ceres from noisy incomplete data
- Control theory: the Kalman-Wiener-Bucy filter
- AI: Enhanced logics < Bayesian belief networks
- Vision: Boolean combinations of features < Markov random fields
- Graunt: counting corpses in medieval London
5. What you perceive is not what you hear
- ACTUAL SOUND
- The ?eel is on the shoe
- The ?eel is on the car
- The ?eel is on the table
- The ?eel is on the orange
- PERCEIVED WORDS
- The heel is on the shoe
- The wheel is on the car
- The meal is on the table
- The peel is on the orange
(Warren & Warren, 1970)
Statistical inference is being used!
6. Why is this old man recognizable from a cursory glance?
His outline is lost in clutter, shadows and wrinkles; except for one ear, his face is invisible. No known algorithm will find him.
7. The Bayesian Setup, I

8. The Bayesian Setup, II
- This is called the posterior distribution on x_h
- Sampling Pr(x_o, x_h | θ): synthesis is the acid test of the model
- The central problem of statistical learning theory: the complexity of the model and the Bias-Variance dilemma
- Minimum Description Length (MDL)
- Vapnik's VC dimension
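The setup can be summarized in one formula (a standard statement of Bayes' rule, notation ours: x_o the observed variables, x_h the hidden variables, θ the model parameters):

```latex
\Pr(x_h \mid x_o, \theta) \;=\;
\frac{\Pr(x_o \mid x_h, \theta)\,\Pr(x_h \mid \theta)}{\Pr(x_o \mid \theta)}
```

Inference maximizes or samples this posterior; sampling the joint Pr(x_o, x_h | θ) is the "acid test" mentioned above.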
9. A basic example: HMMs and speech recognition, I. Setup

10. A basic example: HMMs and speech recognition, II. Inference by dynamic programming
(c) Optimizing the parameters is done by the EM algorithm, valid for any exponential model
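The dynamic-programming inference the slide refers to can be sketched as a toy Viterbi decoder (a minimal illustration, not Mumford's speech system; the transition and emission numbers below are made up):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path of an HMM by dynamic programming (log domain).

    pi: initial state probs (S,); A: transition matrix (S,S); B: emission matrix (S,O).
    """
    S, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])     # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)           # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)       # scores[s_prev, s_next]
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):                # trace the backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Two hidden "phoneme" states, three observation symbols (toy numbers).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.2, 0.8]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2], pi, A, B))   # → [0, 0, 1, 1]
```

The same table-filling pattern, run on probabilities of phoneme sequences, is what makes HMM speech decoding tractable.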
11. Continuous and discrete variables in perception
- Perception locks on to discrete labels, and the world is made up of discrete objects/events
- High kurtosis is nature's universal signal of discrete events/objects in space-time
- A stochastic process with i.i.d. increments has jumps iff the kurtosis κ of its increments is > 3
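For reference (standard definition, notation ours), the kurtosis of the increments ΔX is the normalized fourth moment:

```latex
\kappa \;=\; \frac{\mathbb{E}\big[(\Delta X - \mathbb{E}\,\Delta X)^4\big]}
                  {\big(\mathbb{E}\big[(\Delta X - \mathbb{E}\,\Delta X)^2\big]\big)^2},
\qquad \kappa = 3 \ \text{for Gaussian increments}.
```

Heavy tails push κ above 3, and for i.i.d.-increment processes this is exactly the criterion for jumps, as the bullet states.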
12. A typical stochastic process with jumps
X_t: a stochastic process with independent increments.

13. Ex.: daily log-price changes in a sample of stocks
Note the fat power-law tails. N.B.: the vertical axis is the log of probability.
14. Particle filtering
- Compiling full conditional probability tables is
usually impractical.
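Particle filtering replaces those tables with a weighted cloud of samples. A minimal bootstrap filter for a one-dimensional random-walk state (a sketch under assumed toy dynamics, not the optical-flow tracker of the next slides):

```python
import numpy as np

def particle_filter(ys, n=1000, q=0.3, r=0.5, seed=0):
    """Bootstrap filter for x_t = x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2)."""
    rng = np.random.default_rng(seed)
    parts = rng.normal(0.0, 1.0, n)                # particle cloud ~ prior on x_0
    means = []
    for y in ys:
        parts = parts + rng.normal(0.0, q, n)      # predict: push particles through dynamics
        w = np.exp(-0.5 * ((y - parts) / r) ** 2)  # weight by observation likelihood
        w /= w.sum()
        parts = rng.choice(parts, size=n, p=w)     # resample: keep likely particles
        means.append(float(parts.mean()))          # posterior mean estimate
    return means

# Toy observations drifting from 0 toward 2.
est = particle_filter([0.1, 0.5, 1.0, 1.4, 1.9, 2.1])
print([round(m, 2) for m in est])
```

The particle cloud is a sampled stand-in for the full conditional distribution, which is why it scales where explicit tables do not.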
15. Estimating the posterior distribution on optical flow in a movie (from M. Black)
Slides 15-20: horizontal-flow estimates frame by frame (follow the window in red).
21. No process is truly Markov
- Speech has longer-range patterns than phonemes: triphones, words, sentences, speech acts, ...
- PCFGs (probabilistic context-free grammars): almost surely finite, labeled, random branching processes
- Forest of random trees T_n; labels x_v on the vertices; leaves in 1-1 correspondence with the observations s_m; probabilities p_1(x_{v'} | x_v) on children, p_2(s_m | x_m) on observations
- Unfortunate fact: nature is not so obliging; longer-range constraints force context-sensitive grammars. But how to make these stochastic??
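A PCFG as a random branching process can be sketched in a few lines (the grammar below is a made-up toy, not Helen's grammar from the next slide):

```python
import random

# Toy PCFG: uppercase symbols are nonterminals with weighted rules;
# lowercase symbols are terminals (the "observations" at the leaves).
RULES = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["det", "n"]), (0.3, ["NP", "pp"])],
    "VP": [(0.6, ["v", "NP"]), (0.4, ["v"])],
}

def sample(symbol, rng):
    """Sample the leaf string of a random labeled branching process.
    Branching here is subcritical, so the tree is finite almost surely."""
    if symbol not in RULES:                          # terminal: emit it
        return [symbol]
    u, acc = rng.random(), 0.0
    for p, rhs in RULES[symbol]:                     # pick a rule by its probability
        acc += p
        if u <= acc:
            return [w for child in rhs for w in sample(child, rng)]
    return [w for child in RULES[symbol][-1][1] for w in sample(child, rng)]

print(sample("S", random.Random(1)))
```

Each sample is a parse tree read off at its leaves; inference inverts this, recovering the tree from the leaf string.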
22. Grammar in the parsed speech of Helen, a 2½-year-old

23. Grammar in images (G. Kanizsa): contour completion
24. Markov Random Fields: the natural degree of generality
- Time → linear structure of dependencies
- Space/space-time/abstract situations → general graphical structure of dependencies

The Markov property: x_v, x_w are conditionally independent, given x_S, if S separates v, w in G. Hammersley-Clifford is the converse.
25. A simple MRF: the Ising model
(The slide shows the 3×3 block of spins around s_{k,l}: s_{k±1,l±1}, s_{k±1,l}, s_{k,l±1}.)
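Sampling this MRF is a short exercise; here is a Gibbs sampler sketch (free boundary conditions; the coupling β and grid size are arbitrary illustrative choices):

```python
import numpy as np

def gibbs_ising(spins, beta, sweeps, seed=0):
    """Gibbs sampling of P(s) ∝ exp(beta * Σ_<neighbours> s_i s_j) on a grid."""
    rng = np.random.default_rng(seed)
    s = spins.copy()
    n, m = s.shape
    for _ in range(sweeps):
        for i in range(n):
            for j in range(m):
                # Field from the 4 nearest neighbours (free boundary).
                nb = sum(s[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < n and 0 <= b < m)
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))  # P(s_ij = +1 | rest)
                s[i, j] = 1 if rng.random() < p_up else -1
    return s

init = np.random.default_rng(1).choice([-1, 1], size=(16, 16))
out = gibbs_ising(init, beta=1.0, sweeps=30)
# Strong coupling: neighbouring spins end up overwhelmingly aligned.
print((out[:-1, :] * out[1:, :]).mean())
```

The conditional p_up depends only on the four neighbours, which is exactly the Markov property of the previous slide.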
26. The Ising model and image segmentation
27. A state-of-the-art image segmentation algorithm (S.-C. Zhu)
Input → Segmentation → Synthesis from the model p(I | W).
The hidden variables W describe segments and their texture, allowing both slow and abrupt intensity and texture changes. (See also Shi-Malik.)
28. Texture synthesis via MRFs
On the left, a cheetah hide; in the middle, a sample from the Gaussian model with identical second-order statistics; on the right, a sample from the exponential model reproducing 7 filter marginals.
29. Monte Carlo Markov Chains
Basic idea: use artificial thermal dynamics to find minimum-energy (maximum-probability) states.
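A minimal version of this idea is simulated annealing: Metropolis dynamics run while the temperature is slowly lowered (the energy and cooling schedule below are ours, purely illustrative):

```python
import math
import random

def anneal(energy, x0, steps=5000, t0=2.0, seed=0):
    """Metropolis dynamics with a slowly lowered temperature (simulated annealing)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-3            # linear cooling schedule
        y = x + rng.gauss(0.0, 0.5)                  # propose a local move
        de = energy(y) - e
        if de < 0 or rng.random() < math.exp(-de / t):   # Metropolis accept rule
            x, e = y, e + de
            if e < best_e:
                best_x, best_e = x, e
    return best_x

# Asymmetric double well: local minimum near x = -1, global minimum near x = +1.
E = lambda x: (x * x - 1) ** 2 + 0.3 * (1 - x)
print(round(anneal(E, x0=-2.0), 2))
```

High temperature lets the chain hop out of the shallow well; cooling then freezes it at the deep one, i.e., at the high-probability state.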
30. Bayesian belief propagation and the Bethe approximation
- Can find modes of MRFs on trees using dynamic programming
- Bayesian belief propagation: finding the modes of the Bethe approximation with dynamic programming
31. Continuous models I: deblurring and denoising
- Observe a noisy, blurred image I; seek to remove noise and enhance edges simultaneously!
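The flavour of such PDE filters can be sketched with a generic edge-preserving diffusion (a Perona-Malik-style illustration, not the Nitzberg-Shiota filter of the next slide; all parameters are arbitrary):

```python
import numpy as np

def denoise(img, steps=30, dt=0.2, k=0.5):
    """Heat flow whose conductance drops across large gradients:
    smooths flat areas while leaving sharp edges mostly intact."""
    u = img.astype(float)
    g = lambda d: np.exp(-(d / k) ** 2)      # conductance: ~1 in flat areas, ~0 at edges
    for _ in range(steps):
        p = np.pad(u, 1, mode="edge")        # replicate borders
        dn = p[:-2, 1:-1] - u                # differences to the 4 neighbours
        ds = p[2:, 1:-1] - u
        de = p[1:-1, 2:] - u
        dw = p[1:-1, :-2] - u
        u = u + dt * (g(dn)*dn + g(ds)*ds + g(de)*de + g(dw)*dw)
    return u

# A noisy step edge: smoothing removes the noise but keeps the edge.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32)); clean[:, 16:] = 1.0
noisy = clean + 0.1 * rng.normal(size=clean.shape)
out = denoise(noisy)
print(np.abs(noisy - clean).mean(), np.abs(out - clean).mean())
```

The gradient-dependent conductance is what reconciles the two competing goals in the bullet: diffusion removes noise, while edges (large gradients) diffuse hardly at all.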
32. An example: Bela Bartok enhanced via the Nitzberg-Shiota filter
33. Continuous models II: images and scaling
- The statistics of images of natural scenes appear to be a fixed point under block-averaging renormalization, i.e.:
- Assume N×N images of natural scenes have a certain probability distribution; form N/2 × N/2 images by a window or by 2×2 averages; you get the same marginal distribution!
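In symbols (a sketch of the fixed-point property just described; the notation is ours, not the slide's):

```latex
\bar I(i,j) \;=\; \tfrac14 \sum_{a,b \in \{0,1\}} I(2i+a,\; 2j+b),
\qquad
\bar I \;\overset{d}{=}\; I \quad \text{(same marginal statistics)}.
```

Block-averaging halves the resolution; scale invariance says the renormalized image is statistically indistinguishable from a natural image at the smaller size.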
34. Scale invariance has many implications
- Intuitively, this is what we call "clutter": the mathematical explanation of why vision is hard.
35. Three axioms for natural images
1. Scale invariance
3. Local image patches are dominated by preferred geometries: edges, bars, blobs, as well as "blue sky" blank patches (D. Marr, B. Julesz, A. Lee).
It is not known if these axioms can be exactly satisfied!
36. Empirical data on image filter responses
Probability distributions of one- and two-filter responses, estimated from natural image data. (a) The top plot is for values of the horizontal first difference of pixel values; the middle plot is for random zero-mean 8×8 filters. The vertical axis in the top two plots is log(probability density). (b) The bottom plot shows level curves of the joint probability density of vertical differences at two horizontally adjacent pixels. All are highly non-Gaussian!
37. Mathematical models for random images
38. Continuous models III: random diffeomorphisms
- The patterns of the world include shapes and structures which recur with distortions, e.g. alphanumeric characters, faces, anatomy
- Thus the hidden variables must include (i) clusters of similar shapes, (ii) warpings between shapes in a cluster
- Mathematically, we need a metric on (i) the space of diffeomorphisms G_k of ℝ^k, or (ii) the space of shapes S_k in ℝ^k (open subsets with smooth boundary)
- Can use diffusion to define a probability measure on G_k
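The construction can be sketched as follows (the standard Riemannian formulation from the Dupuis-Grenander-Miller line of work cited below; notation is ours): a path of diffeomorphisms φ_t is generated by a time-dependent velocity field u_t, and the distance is the minimal kinetic energy of such a path:

```latex
\frac{\partial \varphi_t}{\partial t} = u_t \circ \varphi_t,
\qquad
d(\mathrm{id}, \varphi)^2 \;=\; \inf_{\varphi_1 = \varphi} \int_0^1 \|u_t\|_V^2 \, dt,
```

where V is a Hilbert space of vector fields with a Sobolev-type norm. The metric is built from u, which is the role u plays in the "Metrics on G_k" slides that follow.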
39. Metrics on G_k, I

40. Metrics on G_k, II
Note: linear in u, so u can be a generalized function!
41. Geodesics in the quotient space S_2
- S_2 has remarkable structure:
- Weak Hilbert manifold
- Medial axis gives it a cell decomposition
- Geometric heat equation defines a deformation retraction
- Diffusion defines a probability measure
- (Dupuis-Grenander-Miller, Yip)
42. Geodesics in the quotient space of landmark points give a classical mechanical system (Younes)
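The classical system in question is, in the standard landmark-matching formulation (a reconstruction; the kernel K and the notation are our assumptions, not taken from the slide), Hamiltonian mechanics for landmark positions q_i with momenta p_i:

```latex
H(p, q) \;=\; \tfrac12 \sum_{i,j} K(q_i - q_j)\, p_i \cdot p_j,
\qquad
\dot q_i = \frac{\partial H}{\partial p_i}, \quad
\dot p_i = -\frac{\partial H}{\partial q_i}.
```

Geodesics of the diffeomorphism metric, restricted to configurations of N landmark points, solve these finite-dimensional equations.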
43. Outlook for Pattern Theory
- Finding a rich class of stochastic models adequate for duplicating human perception, yet tractable (vision remains a major challenge)
- Finding algorithms fast enough to make inferences with these models (Monte Carlo? BBP? competing-hypothesis particles?)
- Underpinnings for a better biological theory of neural functioning, e.g. incorporating particle filtering? grammar? warping? feedback?
45. A sample of Graunt's data