Title: Pattern Recognition
1Pattern Recognition
- Expectation-Maximization
- Case Studies
2Case Study 1: Object Tracking
- S. McKenna, Y. Raja, and S. Gong, "Tracking color
objects using adaptive mixture models", Image and
Vision Computing, vol. 17, pp. 225-231, 1999.
3Problem
- Tracking color objects in real time, assuming:
- Varying illumination
- Varying viewing geometry
- Varying camera parameters
- Current approaches:
- Use non-parametric models based on histograms
(requires lots of data).
- Use only one Gaussian whose parameters are
adapted over time.
4Example Track Faces
5Approach
- A statistical model is proposed for modeling
color distributions over time.
- The model is based on adaptive Gaussian mixtures
for real-time tracking of color objects.
- Initialization: use a predetermined generic
object color model to initialize (or
re-initialize) the tracker.
- Tracking: the model adapts and improves its
performance by becoming specific to the observed
conditions.
6Example Color Distributions
7Model Adaptation
[Figure: search window in frames r-1, r, and r+1, with the object color models P_{r-1}(x|O) and P_r(x|O)]
8Assumptions
- The number of components is kept fixed (i.e.,
determined using a fixed data set).
- Implication: the number of components needed to
accurately model an object's color does not
change significantly with changing viewing
conditions.
9Color Mixture Models
if P(O|xi) > T then xi belongs to O
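The decision rule above can be sketched as follows. This is only an illustration under stated assumptions: the object model is a two-component mixture over (H, S) with diagonal covariances, the non-object density is taken to be uniform over the unit square, the prior P(O) is 0.5, hue wrap-around is ignored, and every parameter value, function name, and the threshold T are hypothetical rather than taken from the paper.

```python
import math

def gaussian_2d(x, mean, var):
    """Density of a 2-D Gaussian with diagonal covariance (var holds per-axis variances)."""
    norm = 1.0 / (2.0 * math.pi * math.sqrt(var[0] * var[1]))
    expo = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return norm * math.exp(-0.5 * expo)

def mixture_density(x, components):
    """p(x|O) = sum_j weight_j * N(x; mean_j, var_j)."""
    return sum(w * gaussian_2d(x, mu, var) for w, mu, var in components)

def posterior_object(x, components, prior_obj=0.5, other_density=1.0):
    """P(O|x) by Bayes' rule; the non-object density is assumed uniform (hypothetical)."""
    num = mixture_density(x, components) * prior_obj
    den = num + other_density * (1.0 - prior_obj)
    return num / den

# Hypothetical object color model: (weight, mean (H, S), per-axis variances).
OBJECT_MODEL = [
    (0.6, (0.05, 0.45), (0.0004, 0.0100)),
    (0.4, (0.08, 0.30), (0.0009, 0.0064)),
]
T = 0.5  # hypothetical decision threshold

def belongs_to_object(x, components=OBJECT_MODEL, threshold=T):
    """Apply the rule: if P(O|x) > T then x belongs to O."""
    return posterior_object(x, components) > threshold
```

A pixel whose (H, S) value lies near one of the component means is accepted, while a pixel far from both components is rejected.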
10Use EM to estimate mixture parameters
11Use EM to estimate mixture parameters
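As an illustration of what EM does here, the sketch below fits a one-dimensional Gaussian mixture. It is a simplification: the slides work with two-dimensional (H, S) data, and the function names, iteration count, and variance floor are assumptions made for this example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    n, k = len(data), len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed component
    return means, variances, weights

# Two well-separated clusters centered at 0 and 10.
samples = [0.1 * i for i in range(-5, 6)] + [10.0 + 0.1 * i for i in range(-5, 6)]
fitted_means, fitted_vars, fitted_weights = em_gmm_1d(
    samples, means=[1.0, 9.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

The result depends on the starting parameters, which is why the initialization of the mixture matters.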
12Initialization of mixture's parameters
13Example Color Distributions
14Color Representation System
- RGB values are converted to the HSI
(Hue-Saturation-Intensity) system.
- Only the H and S components are used (I is
discarded to obtain invariance to the intensity
of ambient illumination).
- Pixels corresponding to low S values and very
high I values were discarded (i.e., not reliable
to measure H).
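The conversion and pixel filtering described above might look like the sketch below. It uses the common textbook RGB-to-HSI formulas; the cut-offs `s_min` and `i_max` and the function names are hypothetical, since the slides do not give the values used.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert RGB in [0, 1] to (H in degrees [0, 360), S in [0, 1], I in [0, 1])."""
    i = (r + g + b) / 3.0
    if i == 0.0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined, use 0
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:
        h = 0.0  # gray pixel: hue is undefined, use 0
    else:
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = (theta if b <= g else 360.0 - theta) % 360.0
    return h, s, i

def hs_feature(r, g, b, s_min=0.1, i_max=0.9):
    """Return (H, S) for a pixel, or None if hue is unreliable.

    s_min and i_max are hypothetical cut-offs for "low S" and "very high I".
    """
    h, s, i = rgb_to_hsi(r, g, b)
    if s < s_min or i > i_max:
        return None
    return h, s
```

Gray and near-white pixels are dropped because their hue is numerically unstable, which matches the reason given for discarding them.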
15Why Adapt Color Mixture Models?
- A non-adaptive model has given good results in
the past under large rotations in depth, changes
of scale, and partial occlusions.
- To deal with large changes in illumination
conditions, an adaptive model is required.
16Adaptive Color Mixtures (contd)
Changes due to various illumination and viewing
conditions!
17Model Adaptation
[Figure: search window in frames r-1, r, and r+1, with the object color models P_{r-1}(x|O) and P_r(x|O)]
18Adaptive Color Mixtures (contd)
19Adaptive Color Mixtures (contd)
Important
20Estimates from frame r
21Adaptive Estimates
22Adaptive Color Mixtures (contd)
(Final equations: see Appendix A)
23Selective Adaptation
24Experiment: no adaptation
(non-adaptive model, moving camera)
25Experiment: adaptation
search windows
(adaptive model, moving camera)
26Experiment: adapt at each frame
(adapt at each frame, moving camera)
27Experiment: Selective Adaptation
(selective adaptation, moving camera)
28Experiment: no adaptation
(no adaptation, moving camera)
29Experiment: Selective Adaptation
(adaptation, moving camera)
30Extensions
- Use several cues, especially when color becomes
unreliable.
- Adaptive modeling of background scene colors.
- P(O|xi), P(B|xi)
- Adaptive number of mixture components.
31Case Study 2: Background Modeling
- C. Stauffer and E. Grimson, "Adaptive background
mixture models for real-time tracking", IEEE
Computer Vision and Pattern Recognition
Conference, Vol.2, pp. 246-252, 1998
32Problem
- Real-time segmentation and tracking of moving
objects in image sequences.
33Requirements
- A robust moving object detection system should be
able to handle:
- Variations in lighting (i.e., gradual and sudden)
- Multiple moving objects.
- Moving scene clutter (e.g., tree branches, sea
waves)
- Other arbitrary changes in the scene (e.g.,
parked cars, camera oscillations)
34Traditional Approaches for Background Modeling
- The most common technique for moving object
detection is based on background subtraction:
- (1) Subtract a model of the background from the
current frame.
- (2) Threshold the difference image.
[Figure: background model, current frame, and result of subtraction]
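The two steps of background subtraction can be sketched as below for grayscale images stored as nested lists. The function name and the threshold value in the usage lines are illustrative assumptions.

```python
def subtract_background(frame, background, threshold):
    """Return a binary foreground mask: 1 where |frame - background| > threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# Tiny 2x2 example: the right-hand column differs strongly from the background.
background = [[10, 10], [10, 10]]
frame = [[12, 90], [10, 40]]
mask = subtract_background(frame, background, threshold=25)
```

Small differences such as sensor noise (12 versus 10) stay below the threshold, while the changed pixels are marked as foreground.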
35Traditional Approaches for Background Modeling
(contd)
- We can assume:
- a static or a time-varying background.
- a fixed or a moving camera.
- Here: fixed camera, varying background.
- Non-adaptive background models have serious
limitations...
- How to obtain a good background model?
36Traditional Approaches for Background Modeling
- Frame differencing
- The estimated background is just the previous
frame.
- Works only for certain object speeds and frame rates.
- Very sensitive to the threshold.
[Figure: absolute difference, thresholded with a low threshold and with a high threshold]
37Traditional Approaches for Background Modeling
(contd)
- A standard method of adaptive background modeling
is based on averaging the images (or taking the
median) over time:
- The background must be visible most of the time.
- Objects must move continuously.
- Not robust when the scene contains multiple,
slowly moving objects.
- Cannot distinguish shadows from moving objects.
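Both variants mentioned above, the temporal median and a running average, can be sketched for grayscale frames stored as nested lists. The function names and the blending factor `alpha` are assumptions for this illustration.

```python
from statistics import median

def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

def update_running_average(background, frame, alpha=0.05):
    """Blend the new frame into the background: B = (1 - alpha) * B + alpha * I."""
    return [[(1.0 - alpha) * b + alpha * p for b, p in zip(bg_row, fr_row)]
            for bg_row, fr_row in zip(background, frame)]

# An object (value 200) covers the middle pixel in only one of three frames.
frames = [[[10, 10, 10]], [[10, 200, 10]], [[10, 10, 10]]]
estimated = median_background(frames)
```

The median ignores the briefly visible object because the background is seen in most frames, which is the first requirement listed above.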
38Traditional Approaches for Background Modeling
(contd)
(median)
39Approach: Model pixel values using Mixtures of
Gaussians
[Figure: pixel values plotted as red vs. green; panels labeled "after 2 minutes", "specularities", and "flickering"]
40Approach
- (1) Model the values of each pixel as a mixture
of Gaussians.
- (2) Classify each pixel as background or
foreground.
- (3) Group foreground pixels using connected
components and track from frame to frame using a
multiple hypothesis tracker.
- (4) Adapt the model parameters over time to deal
with:
- Lighting changes
- Repetitive motions of scene elements (e.g.,
swaying trees)
- Slow-moving objects
- Introducing or removing objects from the scene
(e.g., parked cars)
41Modeling pixel values using Mixtures of Gaussians
42Modeling pixel values using Mixtures of
Gaussians (contd)
(i.e., R,G,B are independent with same variance!)
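For reference, the per-pixel mixture in the cited Stauffer and Grimson paper has the form below. The notation here (w for the prior weight of a component, n for the dimension of the pixel value X_t) is one common way to write it and may differ from the symbols on the original slide.

```latex
P(X_t) = \sum_{k=1}^{K} w_{k,t}\, \eta\!\left(X_t;\ \mu_{k,t}, \Sigma_{k,t}\right),
\qquad
\eta(X;\ \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
\exp\!\left(-\tfrac{1}{2}(X-\mu)^{T}\Sigma^{-1}(X-\mu)\right),
\qquad
\Sigma_{k,t} = \sigma_{k,t}^{2}\, I
```

The last equality is the simplifying assumption stated in the remark above: the color channels are treated as independent with a shared variance, so no matrix inversion is needed.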
43Pixel classification as background
- Each pixel is modeled as a mixture of Gaussians.
- Evaluate each Gaussian (i.e., using its
persistence and variance) to determine if it
represents the "background process".
- Pixel values that do not fit the background
Gaussians are considered foreground.
44Estimating/Updating the parameters of the model
- Each new observation is integrated into the model
using standard learning rules (using the EM
algorithm for every pixel would be costly).
- Every pixel value, Xt, is checked against the
existing K Gaussian distributions to find the one
that represents it best.
- A match is defined as a pixel value within 2.5
standard deviations (2.5σ) of a distribution (i.e.,
each pixel has essentially its own threshold).
45Estimating/Updating the parameters of the model
(contd)
- If a match is found, the prior probabilities
(weights) of the mixture components are updated
using exponential forgetting.
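For reference, the weight update in the cited Stauffer and Grimson paper can be written as below, where α is the learning rate and M_{k,t} is 1 for the matched component and 0 for all others. The symbol w for the prior probability is a notational assumption.

```latex
w_{k,t} = (1 - \alpha)\, w_{k,t-1} + \alpha\, M_{k,t},
\qquad
M_{k,t} =
\begin{cases}
1 & \text{if component } k \text{ matched } X_t \\
0 & \text{otherwise}
\end{cases}
```

After the update the weights are renormalized so that they sum to 1. Because old evidence decays by a factor of (1 - α) at every step, 1/α acts as the time constant of the forgetting.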
46Estimating/Updating the parameters of the model
(contd)
- The parameters of the matched Gaussian i are
updated as follows
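For reference, the update of the matched component in the cited Stauffer and Grimson paper can be written as below. The symbols follow the paper's notation as closely as possible (α is the learning rate, η the Gaussian density) and may differ from those on the original slide.

```latex
\mu_{t} = (1 - \rho)\, \mu_{t-1} + \rho\, X_t,
\qquad
\sigma_{t}^{2} = (1 - \rho)\, \sigma_{t-1}^{2} + \rho\, (X_t - \mu_t)^{T} (X_t - \mu_t),
\qquad
\rho = \alpha\, \eta\!\left(X_t;\ \mu_{t-1}, \sigma_{t-1}\right)
```

The parameters of the unmatched components are left unchanged.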
47Estimating/Updating the parameters of the model
(contd)
- If a match is not found, the least probable
distribution is replaced with a distribution with
the current pixel value as its mean value, an
initial high variance, and a low prior weight.
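The matching and update rules from the last few slides can be combined into one per-pixel step. The sketch below is a simplification for a scalar (grayscale) pixel value; the constants `ALPHA`, `INIT_VAR`, and `INIT_WEIGHT`, the dictionary layout, and the function names are assumptions for illustration, not values from the paper.

```python
import math

ALPHA = 0.01         # hypothetical learning rate
MATCH_SIGMAS = 2.5   # match = within 2.5 standard deviations
INIT_VAR = 900.0     # hypothetical "initial high variance"
INIT_WEIGHT = 0.05   # hypothetical "low prior weight"

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update_pixel_model(model, x, alpha=ALPHA):
    """Update one pixel's mixture in place; model is a list of dicts with keys w, mean, var.

    Returns the index of the matched component, or None if none matched.
    """
    # Check components from most to least probable (ordered by w / sigma).
    order = sorted(range(len(model)),
                   key=lambda k: model[k]["w"] / math.sqrt(model[k]["var"]),
                   reverse=True)
    matched = None
    for k in order:
        g = model[k]
        if abs(x - g["mean"]) <= MATCH_SIGMAS * math.sqrt(g["var"]):
            matched = k
            break
    if matched is None:
        # No match: replace the least probable component.
        worst = min(range(len(model)), key=lambda k: model[k]["w"])
        model[worst] = {"w": INIT_WEIGHT, "mean": float(x), "var": INIT_VAR}
    else:
        # Exponential forgetting on the weights.
        for k, g in enumerate(model):
            g["w"] = (1.0 - alpha) * g["w"] + alpha * (1.0 if k == matched else 0.0)
        # Move the matched component toward the observation.
        g = model[matched]
        rho = alpha * gaussian(x, g["mean"], g["var"])
        g["mean"] = (1.0 - rho) * g["mean"] + rho * x
        g["var"] = (1.0 - rho) * g["var"] + rho * (x - g["mean"]) ** 2
    # Renormalize so the weights sum to 1.
    total = sum(g["w"] for g in model)
    for g in model:
        g["w"] /= total
    return matched
```

Because each step touches only K components per pixel, the cost per frame is far lower than running EM over a window of past values.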
48Determining the background Gaussians
- Determine which Gaussians from the mixture
represent the background processes.
- Observations:
- Moving objects are expected to produce more
variance than a static (background) object
(VARIANCE).
- There should be more data supporting the
background distributions because they are
repeated, whereas pixel values from different
objects are often not the same color (PERSISTENCE).
49Determining the background Gaussians (contd)
- The following heuristic is used to determine the
"background" Gaussians:
- Choose the Gaussians which have the most
supporting evidence and the least variance.
50Determining the background Gaussians (contd)
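- To implement this idea, the Gaussians are ordered
by the value of π/σ (where π is the prior
probability and σ is the standard deviation).
- The first B distributions are chosen as the
background model, where B is the smallest number
of distributions whose prior probabilities sum to
more than a threshold T.
- T = (background pixels) / (total pixels), i.e., the
minimum fraction of the data expected to be background.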
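The ordering and selection step can be sketched as follows. The tuple layout, the function name, and the default value of T are assumptions for this illustration.

```python
import math

def background_components(model, T=0.7):
    """Return the indices of the B components treated as background.

    model is a list of (weight, mean, variance) tuples for one pixel.
    Components are ranked by weight / sigma, then taken in order until
    their cumulative weight exceeds T.
    """
    order = sorted(range(len(model)),
                   key=lambda k: model[k][0] / math.sqrt(model[k][2]),
                   reverse=True)
    chosen, cumulative = [], 0.0
    for k in order:
        chosen.append(k)
        cumulative += model[k][0]
        if cumulative > T:
            break
    return chosen

# Two stable, narrow components and one weak, broad one.
pixel_model = [(0.5, 100.0, 16.0), (0.3, 120.0, 25.0), (0.2, 30.0, 400.0)]
background = background_components(pixel_model, T=0.7)
```

A small T tends to select a single background component, while a larger T lets several components count as background, which helps with repetitive motion such as swaying branches.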
51Grouping and Tracking
- Foreground pixels are grouped into different
regions using connected components.
- Connected components are tracked from frame to
frame.
- A pool of Kalman filters is used to track the
connected components (see paper for more details).
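The grouping step can be sketched with a breadth-first search over a binary foreground mask. The choice of 4-connectivity and the function name are assumptions; the tracking stage with Kalman filters is not shown.

```python
from collections import deque

def connected_components(mask):
    """Group foreground pixels (value 1) into 4-connected regions.

    mask is a list of rows; returns a list of regions, each a list of (row, col).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Start a new region and flood outward from this pixel.
            region, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(region)
    return regions

foreground = [[1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 0, 1]]
regions = connected_components(foreground)
```

Each returned region is a candidate object that a tracker can follow from frame to frame.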
52Experiments and results
- The system was tested continuously for 16 months
(24 hrs/day through rain and snow).
- Processing power:
- 11-13 frames per second
- Each frame was 160 x 120 pixels.
- http://www.ai.mit.edu/projects/vsam
53Results: simple pedestrian/vehicle
classification using aspect ratio
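One way such an aspect-ratio rule could work is sketched below. The decision rule (taller than wide means pedestrian), the threshold, and the function name are assumptions for illustration; the slides do not state the rule that was actually used.

```python
def classify_by_aspect_ratio(pixels, ratio_threshold=1.0):
    """Label a region as 'pedestrian' or 'vehicle' from its bounding-box shape.

    pixels is a list of (row, col) coordinates of one connected component.
    ratio_threshold is a hypothetical cut-off on height / width.
    """
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return "pedestrian" if height / width > ratio_threshold else "vehicle"

# A tall, narrow region and a short, wide one.
tall_region = [(r, c) for r in range(10) for c in range(3)]
wide_region = [(r, c) for r in range(4) for c in range(12)]
```

With these inputs the tall region is labeled as a pedestrian and the wide region as a vehicle.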