Using Multi-Modality to Guide Visual Tracking

1
Using Multi-Modality to Guide Visual Tracking

Jaco Vermaak
Cambridge University Engineering Department
Patrick Pérez, Michel Gangnet, Andrew Blake
Microsoft Research Cambridge

Paris, December 2002
2
Introduction

Visual tracking is difficult: changes in pose and
illumination, occlusion, clutter, inaccurate
models, high-dimensional state spaces, etc.
Tracking can be aided by combining information from
multiple measurement modalities
Illustrated here on head tracking using:
Sound and contour measurements
Colour and motion measurements

3
General Tracking
4
Tracking Equations

Objective: recursive estimation of the filtering
distribution
General solution:
Prediction step
Filtering/update step
Problem: generally no analytic solutions
available
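
The equations for this slide did not survive in the transcript. As a reference, the standard Bayesian filtering recursions can be written as below. The notation is an assumption, not taken from the slides: x_t is the state, y_t the measurement at time t, p(x_t | x_{t-1}) the dynamical model and p(y_t | x_t) the likelihood.

```latex
\begin{align}
p(x_t \mid y_{1:t-1}) &= \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}
  && \text{(prediction)} \\
p(x_t \mid y_{1:t}) &= \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}
  {\int p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})\, dx_t}
  && \text{(filtering/update)}
\end{align}
```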

5
Particle Filter Tracking

Monte Carlo implementation of general recursions
Filtering distribution represented by
samples/particles with associated importance
weights
Proposal step: new particles proposed from a
suitable proposal distribution
Reweighting step: particles reweighted with
importance weights
Resampling step: multiply particles with high
importance weights and eliminate those with low
importance weights
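
The three steps can be sketched for a toy problem. This is a minimal bootstrap filter, assuming a 1-D random-walk state, a Gaussian likelihood, and the dynamics used as the proposal. None of these choices come from the slides, and all names and parameter values are illustrative.

```python
import math
import random

def gaussian_likelihood(y, x, sigma=1.0):
    """Unnormalised Gaussian likelihood of measurement y given state x."""
    return math.exp(-0.5 * ((y - x) / sigma) ** 2)

def particle_filter_step(particles, y, rng, process_std=0.5):
    """One proposal / reweighting / resampling cycle for a 1-D random walk."""
    # Proposal step: draw new particles from the dynamics (bootstrap choice).
    proposed = [x + rng.gauss(0.0, process_std) for x in particles]
    # Reweighting step: with the dynamics as proposal, the importance
    # weight of each particle is proportional to its likelihood.
    weights = [gaussian_likelihood(y, x) for x in proposed]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0] * len(proposed)
        total = float(len(proposed))
    weights = [w / total for w in weights]
    # Resampling step: multinomial resampling duplicates high-weight
    # particles and tends to drop low-weight ones.
    return rng.choices(proposed, weights=weights, k=len(proposed))

rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for y in [2.0, 2.1, 1.9, 2.0, 2.05]:
    particles = particle_filter_step(particles, y, rng)
estimate = sum(particles) / len(particles)
```

After a few measurements near 2, the particle mean `estimate` should settle close to that value.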

6
Particle Filter Building Blocks

Sampling from conditional density
Resampling
Reweighting with positive function
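
Of these building blocks, resampling is the easiest to get subtly wrong. The slides do not say which resampling scheme is used; the sketch below shows systematic resampling, one common low-variance choice, purely as an illustration.

```python
import random

def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling.

    weights: non-negative importance weights (need not be normalised).
    """
    n = len(weights)
    total = sum(weights)
    start = rng.random() / n          # single random offset in [0, 1/n)
    indices = []
    i = 0
    cumulative = weights[0] / total   # running normalised cumulative weight
    for k in range(n):
        u = start + k / n             # evenly spaced sampling positions
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices

rng = random.Random(1)
all_on_one = systematic_resample([0.0, 0.0, 1.0, 0.0], rng)
uniform = systematic_resample([1.0, 1.0, 1.0, 1.0], rng)
```

With all weight on one particle every index points at it; with equal weights each particle is kept exactly once.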

7
Particle Filter Implementation

Requires specification of
System configuration and state space
Likelihood model
Dynamical model for state evolution
State proposal distribution
Particle filter architecture

8
Head Tracking using Sound and Contour Measurements
9
Problem Formulation

Objective: track the head of a person in a video
sequence using audio and image cues
Audio: time delay of arrival (TDOA) measurements
at a microphone pair orthogonal to the optical axis
of the camera
Image: edge events along normal lines to a
hypothesised contour
Complementary modalities: audio is good for
(re)initialisation; image is good for fine
localisation

10
System Configuration
[Figure: system configuration showing the camera, microphone pair and image plane]
11
Model Ingredients

Low-dimensional state space: similarity transform
applied to a reference template
Dynamical prior: integrated Langevin equation,
i.e. second-order Markov kernel
Multi-modal data likelihoods:
Sound-based likelihood: TDOA at mic. pair
Contour-based likelihood: edge events
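
To illustrate why an integrated Langevin equation gives a second-order kernel, here is a minimal discretised sketch for one state component. The discretisation and the values of `damping` and `noise_std` are assumptions for illustration, not the parameters used in the talk.

```python
import random

def langevin_step(position, velocity, rng, damping=0.9, noise_std=1.0):
    """Advance one time step of a discretised Langevin model.

    The velocity follows a damped random walk and the position integrates
    it, so the new position depends on the two previous positions.
    """
    new_velocity = damping * velocity + noise_std * rng.gauss(0.0, 1.0)
    new_position = position + new_velocity
    return new_position, new_velocity

rng = random.Random(0)
track = [(0.0, 0.0)]
for _ in range(10):
    track.append(langevin_step(track[-1][0], track[-1][1], rng))
```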

12
Contour Likelihood

Input: maxima of projected luminance gradient
along normals
( such events on normal)

13
Contour Likelihood

Advantages
Low computational cost
Robust to illumination changes
Drawbacks
Fragile because of narrow support (especially
with only similarity transform on a fixed shape
space)
Sensitive to background clutter
Extension
Multiply gradient by inter-frame difference to
reduce influence of background clutter
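
The extension can be sketched along a single normal line. This is a hedged illustration: the central-difference gradient, the local-maximum rule and the function name `edge_events` are assumptions, and luminance is given as a plain list of samples along the normal.

```python
def edge_events(current, previous=None, threshold=0.0):
    """Indices of local maxima of the gradient magnitude along one normal.

    If `previous` (the same normal in the previous frame) is given, the
    gradient is multiplied by the absolute inter-frame difference, which
    suppresses edges that belong to a static background.
    """
    strength = []
    for i in range(1, len(current) - 1):
        gradient = abs(current[i + 1] - current[i - 1]) / 2.0
        if previous is not None:
            gradient *= abs(current[i] - previous[i])
        strength.append(gradient)
    events = []
    for j, s in enumerate(strength):
        left = strength[j - 1] if j > 0 else 0.0
        right = strength[j + 1] if j < len(strength) - 1 else 0.0
        if s > threshold and s >= left and s > right:
            events.append(j + 1)  # shift back to an index into `current`
    return events

previous = [10, 10, 10, 50, 50, 50, 50, 50]   # static background edge only
current = [10, 10, 10, 50, 50, 50, 90, 90]    # plus a new, moving edge
without_difference = edge_events(current)
with_difference = edge_events(current, previous)
```

In this toy example the static background edge produces an event only when the frame difference is not used.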

14
Inter-Frame Difference
[Figure panels: without frame difference; with frame difference]
15
Audio Likelihood

Input: positions of peaks in the generalised
cross-correlation function (GCCF)
Reverberation leads to multiple peaks

[Figure: GCCF; axis label: TDOA]
16
Audio Likelihood

Deterministic mapping from Time Delay of Arrival
(TDOA) to bearing angle (microphone calibration)
to X-coordinate in image plane (camera
calibration)
Audio likelihood follows in a similar manner to
the contour likelihood
Likelihood assumes a uniform clutter model
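
The chain from TDOA to image X-coordinate can be sketched as follows. The slides do not give the calibration formulas, so this assumes a far-field source, a microphone pair centred on the camera, and a pinhole camera. The function name and all parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def tdoa_to_image_x(tdoa, mic_separation, focal_length_px, principal_x_px):
    """Map a time delay of arrival (seconds) to an image X-coordinate (pixels).

    Assumes a far-field source, so sin(bearing) = c * tdoa / separation,
    and a pinhole camera, so x = principal point + focal length * tan(bearing).
    """
    ratio = SPEED_OF_SOUND * tdoa / mic_separation
    ratio = max(-1.0, min(1.0, ratio))   # clip physically impossible delays
    bearing = math.asin(ratio)           # angle from the optical axis
    # Bearings near +/-90 degrees map to very large |x|, far off the image.
    return principal_x_px + focal_length_px * math.tan(bearing)

centre = tdoa_to_image_x(0.0, 0.2, 500.0, 160.0)
off_axis = tdoa_to_image_x(0.5 * 0.2 / SPEED_OF_SOUND, 0.2, 500.0, 160.0)
```

A zero delay maps to the principal point; a delay corresponding to a 30 degree bearing maps to `160 + 500 * tan(30°)`.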

17
Particle Filter Architecture

Layered sampling: first X-position and sound
likelihood, then the rest
X-position proposal: mixture of diffusion
dynamics and sound proposal
To admit jumps from the proposal, the X-dynamics
have to be augmented with a uniform component
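
Why the uniform component is needed can be seen from the importance weight, which contains the ratio of dynamics density to proposal density. The sketch below assumes Gaussian diffusion, Gaussian sound proposals centred on the X-positions implied by the audio peaks, and illustrative mixture weights. None of these values come from the slides.

```python
import math
import random

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def propose_x(x_prev, sound_xs, rng, diffusion_std=5.0, sound_std=10.0, sound_weight=0.5):
    """Sample X from a mixture of diffusion dynamics and a sound proposal."""
    if sound_xs and rng.random() < sound_weight:
        return rng.gauss(rng.choice(sound_xs), sound_std)
    return rng.gauss(x_prev, diffusion_std)

def proposal_pdf(x, x_prev, sound_xs, diffusion_std=5.0, sound_std=10.0, sound_weight=0.5):
    diffusion = normal_pdf(x, x_prev, diffusion_std)
    if not sound_xs:
        return diffusion
    sound = sum(normal_pdf(x, s, sound_std) for s in sound_xs) / len(sound_xs)
    return (1.0 - sound_weight) * diffusion + sound_weight * sound

def dynamics_pdf(x, x_prev, width, diffusion_std=5.0, uniform_weight=0.1):
    """Diffusion dynamics augmented with a uniform component over the image width."""
    uniform = 1.0 / width if 0.0 <= x <= width else 0.0
    return (1.0 - uniform_weight) * normal_pdf(x, x_prev, diffusion_std) + uniform_weight * uniform

def weight_correction(x, x_prev, sound_xs, width, uniform_weight=0.1):
    """Dynamics-to-proposal density ratio that multiplies the likelihood."""
    return dynamics_pdf(x, x_prev, width, uniform_weight=uniform_weight) / proposal_pdf(x, x_prev, sound_xs)

rng = random.Random(0)
samples = [propose_x(20.0, [300.0], rng) for _ in range(200)]
jump_with_uniform = weight_correction(300.0, 20.0, [300.0], 320.0)
jump_without_uniform = weight_correction(300.0, 20.0, [300.0], 320.0, uniform_weight=0.0)
```

A particle that jumps to a distant sound peak keeps a nonzero weight only when the dynamics include the uniform component; with pure diffusion the ratio is numerically zero.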

18
Examples

Effect of inter-frame difference
Conversational ping-pong

19
Examples

Conversational ping-pong and sound based
reinitialisation

20
Head Tracking using Colour and Motion Measurements
21
Problem Formulation

Objective: detect and track the head of a single
person in a video sequence taken from a
stationary camera
Modality fusion:
Motion and colour measurements are complementary
Motion: when the object is moving, colour is
unreliable
Colour: when the object is stationary, motion
information disappears
Automatic object detection and tracker
initialisation using motion measurements
Individualisation of the colour model to the
object:
Initialised with a generic skin colour model
Adapted to object colour during periods of
motion; motion model acts as anchor

22
Object Description and Motion

Head modelled as an ellipse that is free to
translate and scale in the image
Binary indicator variable to signal whether
object is present in the image or not, so object
state becomes
State components assumed to have independent
motion models
Indicator: discrete Markov chain
Position and scale: Langevin motion with uniform
initialisation

23
Image Measurements

Measurements taken on a regular filter grid
Measurement vector

24
Observation Likelihood Model

Measurements at gridpoints assumed to be
independent
Unique background (object absent) likelihood
model for each gridpoint
All gridpoints covered by the object share the
same foreground likelihood model
At each gridpoint the measurements are also
assumed to be independent
Note that the background motion model is shared
by all the gridpoints

25
Colour Likelihood Model

Normalised histograms for both foreground and
background colour likelihood models
Background models trained on a sequence without
objects
Foreground models trained on a set of labelled
face images
Histogram models supplied with a small uniform
component to prevent numerical problems
associated with empty bins
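
The uniform component can be sketched as a simple mixture with the empirical histogram. The function name and the value of `uniform_weight` are illustrative assumptions.

```python
def smoothed_histogram(bin_indices, n_bins, uniform_weight=0.01):
    """Normalised histogram mixed with a small uniform component.

    bin_indices: the bin index of each training pixel.
    Every bin ends up with probability of at least uniform_weight / n_bins,
    so likelihood ratios and logarithms stay finite for empty bins.
    """
    counts = [0] * n_bins
    for b in bin_indices:
        counts[b] += 1
    total = max(len(bin_indices), 1)  # avoid division by zero on empty input
    return [(1.0 - uniform_weight) * c / total + uniform_weight / n_bins
            for c in counts]

hist = smoothed_histogram([0, 0, 1, 1], n_bins=4)
```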

26
Motion Likelihood Model

Background: frame-difference measurements
empirically found to be gamma distributed
Foreground: frame-difference depends on magnitude
of motion, number and orientation of foreground
edges, etc.
Modelling these effects accurately is difficult
In general, if the object is moving, foreground
frame-difference measurements are substantially
larger than those for background
Thus a two-component uniform distribution is
adopted for the foreground frame-difference
measurements (outlier model)
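
A rough sketch of how such a pair of models yields a foreground/background likelihood ratio is given below. The exact form of the two-component uniform model is not recoverable from the transcript; here it is assumed to be a mixture of a uniform over small differences and a uniform over the full range. The gamma parameters and ranges are illustrative, not fitted values.

```python
import math

def gamma_pdf(d, shape, scale):
    """Gamma density, used here for background frame differences."""
    if d <= 0.0:
        return 0.0  # treated as zero outside d > 0 in this sketch
    return d ** (shape - 1.0) * math.exp(-d / scale) / (math.gamma(shape) * scale ** shape)

def foreground_pdf(d, low_max=10.0, full_max=255.0, low_weight=0.5):
    """Assumed two-component uniform mixture for foreground frame differences."""
    low = 1.0 / low_max if 0.0 <= d <= low_max else 0.0
    full = 1.0 / full_max if 0.0 <= d <= full_max else 0.0
    return low_weight * low + (1.0 - low_weight) * full

def motion_likelihood_ratio(d, shape=2.0, scale=2.0):
    """Foreground-to-background likelihood ratio for one frame difference."""
    background = max(gamma_pdf(d, shape, scale), 1e-300)  # guard against zero
    return foreground_pdf(d) / background

large_difference = motion_likelihood_ratio(50.0)
small_difference = motion_likelihood_ratio(2.0)
```

Under these assumed parameters a large frame difference strongly favours the foreground, while a small one mildly favours the background.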

27
Particle Proposal

Three stages of operation
Birth: object first enters scene; proposal should
detect object and spawn particles in the object
region
Alive: object persists in scene; proposal should
allow object to be tracked, whether it is
stationary or moves around
Death: object leaves scene; proposal should kill
particles associated with the object
Form of particle proposal

empirical probability of object being alive
28
Particle Proposal

Indicator proposal
Birth only allowed if there is no object
currently in the scene
All particles alive are subjected to a fixed
death probability
State proposal
Langevin dynamics if object is alive
Gaussian birth proposal, with parameters from
detection module
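
The indicator and state proposals can be sketched together for one particle with a 1-D position. This is an assumption-laden illustration: the birth rule is applied per particle, the birth and death probabilities and Langevin parameters are made up, and `detection` stands in for the output of the detection module as a (mean, std) pair.

```python
import random

DEAD = (False, None, None)  # indicator off; position and velocity undefined

def propose_particle(particle, detection, rng,
                     birth_prob=0.1, death_prob=0.01,
                     damping=0.9, noise_std=2.0):
    """Propose a new (alive, position, velocity) tuple for one particle."""
    alive, x, v = particle
    if alive:
        # Alive particles are subjected to a fixed death probability.
        if rng.random() < death_prob:
            return DEAD
        # Otherwise they follow (discretised) Langevin dynamics.
        v_new = damping * v + rng.gauss(0.0, noise_std)
        return (True, x + v_new, v_new)
    # Birth is only possible for a particle with no object, and only
    # when the detection module reports a region.
    if detection is not None and rng.random() < birth_prob:
        mean, std = detection
        return (True, rng.gauss(mean, std), 0.0)
    return DEAD

rng = random.Random(0)
killed = propose_particle((True, 5.0, 0.0), None, rng, death_prob=1.0)
no_birth = propose_particle(DEAD, None, rng, birth_prob=1.0)
born = propose_particle(DEAD, (100.0, 1.0), rng, birth_prob=1.0)
survivor = propose_particle((True, 5.0, 0.0), None, rng, death_prob=0.0)
```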

29
Object Detection

Object region detected by probabilistic
segmentation of the horizontal and vertical
projections of the frame-difference measurements
Region location and size determine parameters for
birth proposal distribution
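
The idea of locating a region from the two projections can be sketched as follows. Note the substitution: the slides use a probabilistic segmentation of the projections, whereas this sketch uses a plain threshold at a fraction of the peak, chosen only to keep the example short. The function name and `threshold_fraction` are hypothetical.

```python
def detect_region(frame_diff, threshold_fraction=0.5):
    """Estimate an object region from a frame-difference image.

    frame_diff: list of rows of non-negative frame-difference values.
    Returns the region centre and size, or None if there is no motion.
    """
    n_cols = len(frame_diff[0])
    column_projection = [sum(row[c] for row in frame_diff) for c in range(n_cols)]
    row_projection = [sum(row) for row in frame_diff]

    def extent(projection):
        peak = max(projection)
        if peak <= 0:
            return None
        active = [i for i, v in enumerate(projection) if v >= threshold_fraction * peak]
        return active[0], active[-1]

    xs = extent(column_projection)
    ys = extent(row_projection)
    if xs is None or ys is None:
        return None
    return {
        "centre": ((xs[0] + xs[1]) / 2.0, (ys[0] + ys[1]) / 2.0),
        "size": (xs[1] - xs[0] + 1, ys[1] - ys[0] + 1),
    }

frame = [[0] * 6 for _ in range(5)]
for r in (1, 2):
    for c in (2, 3, 4):
        frame[r][c] = 10  # a moving block covering rows 1-2, columns 2-4
region = detect_region(frame)
```

The returned centre and size could then set the mean and spread of a Gaussian birth proposal.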

30
Colour Model Adaptation

Why:
Generic skin colour model may be too broad for
accurate localisation
Model sensitive to colour changes due to changes
in pose and illumination
When:
Object present and moving: largest variations in
colour expected
Motion likelihood anchors particles around
moving object
How:
Gradual: avoid fitting to the background;
enforced with prior
Stochastic EM: contribution of particles
proportional to likelihood

31
Colour Model Adaptation

Unknown parameters: normalised bin values for
object hue and saturation histograms
EM Q-function for MAP estimation
No analytic solution, but particle approximation
yields
Monte Carlo approximation only performed over
particles that are currently alive
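
The particle approximation itself is missing from the transcript. One plausible reading, sketched below as an assumption, is a likelihood-weighted average of the colour histograms measured under each alive particle. The tuple layout and the function name are hypothetical.

```python
def weighted_average_histogram(particles):
    """Likelihood-weighted average of per-particle histograms.

    particles: list of (alive, likelihood, histogram) tuples.
    Only alive particles contribute, each in proportion to its likelihood.
    Returns None if no alive particle has positive likelihood.
    """
    alive = [(w, h) for is_alive, w, h in particles if is_alive]
    total = sum(w for w, _ in alive)
    if not alive or total <= 0.0:
        return None
    n_bins = len(alive[0][1])
    return [sum(w * h[b] for w, h in alive) / total for b in range(n_bins)]

particles = [
    (True, 3.0, [1.0, 0.0]),
    (True, 1.0, [0.0, 1.0]),
    (False, 100.0, [0.5, 0.5]),  # dead particle: ignored despite its weight
]
average = weighted_average_histogram(particles)
```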

32
Colour Model Adaptation

Dirichlet prior used for parameter updates
Prior centred on old parameter values
Variance controlled by multiplicative constant
Update rule for normalised bin counts becomes
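
The update rule did not survive in the transcript. As a hedged illustration, the MAP estimate under a Dirichlet prior with parameters 1 + K * (old bin value) blends the new weighted counts with the old histogram, with the multiplicative constant K controlling how gradual the adaptation is. This parameterisation is an assumption and may differ from the one on the slide.

```python
def map_histogram_update(old_hist, weighted_counts, prior_strength):
    """MAP update of normalised bin values under a Dirichlet prior.

    Assumes prior parameters alpha_i = 1 + prior_strength * old_hist[i],
    for which the posterior mode is
        (count_i + prior_strength * old_i) / (sum(counts) + prior_strength * sum(old)).
    A larger prior_strength means a tighter prior and slower adaptation.
    """
    total = sum(weighted_counts) + prior_strength * sum(old_hist)
    return [(n + prior_strength * t) / total
            for n, t in zip(weighted_counts, old_hist)]

updated = map_histogram_update([0.5, 0.5], [1.0, 0.0], prior_strength=9.0)
```

With K = 9 and all new evidence in the first bin, the histogram moves only from [0.5, 0.5] to [0.55, 0.45].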

33
What Happens?
[Figure: particle histograms and weighted average histogram]
34
Implementation

Colour model adaptation iterations occur between
particle prediction and particle reweighting in
standard particle filter
Stochastic EM algorithm initialised with
parameters from previous time step
A single stochastic EM iteration is sufficient at
each time step
Number of particles is fixed to 100
Non-optimised algorithm runs at 15 fps on a
standard desktop PC

35
Examples

No adaptation: tracker gets stuck on
skin-coloured carpet in the background
Adaptation: tracker successfully adapts to
changes in pose and illumination and lock is
maintained
No motion likelihood: tracker fails, illustrating
need for anchor likelihood

36
Examples

Tracking is successful despite substantial
variations in pose and illumination and the
subject temporarily leaving the scene
Particles are killed when the subject leaves the
scene; upon re-entering, the individualised colour
model allows lock to be re-established within a
few frames

37
The End
