1
Using Multi-Modality to Guide Visual Tracking
  • Jaco Vermaak
  • Cambridge University Engineering Department
  • Patrick Pérez, Michel Gangnet, Andrew Blake
  • Microsoft Research Cambridge

Paris, December 2002
2
Introduction
  • Visual tracking is difficult: changes in pose and
    illumination, occlusion, clutter, inaccurate
    models, high-dimensional state spaces, etc.
  • Tracking can be aided by combining information
    from multiple measurement modalities
  • Illustrated here on head tracking using:
    • Sound and contour measurements
    • Colour and motion measurements

3
General Tracking
4
Tracking Equations
  • Objective: recursive estimation of the filtering
    distribution
  • General solution:
    • Prediction step
    • Filtering/update step
  • Problem: analytic solutions are generally not
    available
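For reference, the prediction and update steps named above are the standard Bayesian filtering recursions. The notation below (state x_t, measurements y_1..y_t) is assumed, since the slide's own equations are not in the transcript:

```latex
\begin{aligned}
\text{Prediction:}\quad & p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, \mathrm{d}x_{t-1} \\
\text{Update:}\quad & p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{\int p(y_t \mid x'_t)\, p(x'_t \mid y_{1:t-1})\, \mathrm{d}x'_t}
\end{aligned}
```

The integrals have closed forms only in special cases such as linear-Gaussian models, where the recursion reduces to the Kalman filter.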

5
Particle Filter Tracking
  • Monte Carlo implementation of the general
    recursions
  • Filtering distribution represented by
    samples/particles with associated importance
    weights
  • Proposal step: new particles are proposed from a
    suitable proposal distribution
  • Reweighting step: particles are reweighted with
    importance weights
  • Resampling step: particles with high importance
    weights are multiplied and those with low
    importance weights are eliminated
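A minimal sketch of one proposal/reweighting/resampling cycle in plain Python. The function name `particle_filter_step`, its callback arguments and the choice of multinomial resampling are illustrative assumptions, not the presenters' implementation; the importance weight is the usual likelihood × dynamics / proposal ratio.

```python
import math
import random


def particle_filter_step(particles, y, propose, log_q, log_dyn, log_lik, rng):
    """One proposal / reweighting / resampling cycle of a particle filter.

    particles: equally weighted samples from the previous filtering distribution.
    propose(x_prev, y, rng) draws a new state from the proposal distribution;
    log_q(x_new, x_prev, y), log_dyn(x_new, x_prev) and log_lik(y, x_new) are
    log-densities of the proposal, the dynamics and the likelihood.
    Returns the resampled particles plus the weighted particles before resampling.
    """
    # Proposal step: one new particle per previous particle.
    proposed = [propose(x, y, rng) for x in particles]
    # Reweighting step: importance weight = likelihood * dynamics / proposal.
    log_w = [log_lik(y, xn) + log_dyn(xn, x) - log_q(xn, x, y)
             for xn, x in zip(proposed, particles)]
    m = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # Resampling step (multinomial): high-weight particles tend to be duplicated,
    # low-weight ones tend to disappear.
    resampled = rng.choices(proposed, weights=w, k=len(proposed))
    return resampled, proposed, w


def gauss_logpdf(v, mean, sd):
    return -0.5 * ((v - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


# Usage: 1-D random walk observed in Gaussian noise, with the dynamics as proposal.
rng = random.Random(0)
particles = [rng.gauss(0.0, 5.0) for _ in range(500)]
for y in [1.0, 1.2, 0.9, 1.1]:
    particles, _, _ = particle_filter_step(
        particles, y,
        propose=lambda x, obs, r: x + r.gauss(0.0, 0.5),
        log_q=lambda xn, x, obs: gauss_logpdf(xn, x, 0.5),
        log_dyn=lambda xn, x: gauss_logpdf(xn, x, 0.5),
        log_lik=lambda obs, x: gauss_logpdf(obs, x, 0.3),
        rng=rng,
    )
estimate = sum(particles) / len(particles)
```

When the proposal equals the dynamics, as in the usage, the dynamics and proposal terms cancel and the weight reduces to the likelihood (the "bootstrap" filter).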

6
Particle Filter Building Blocks
  • Sampling from conditional density
  • Resampling
  • Reweighting with positive function

7
Particle Filter Implementation
  • Requires specification of:
    • System configuration and state space
    • Likelihood model
    • Dynamical model for state evolution
    • State proposal distribution
    • Particle filter architecture

8
Head Tracking using Sound and Contour Measurements
9
Problem Formulation
  • Objective: track the head of a person in a video
    sequence using audio and image cues
  • Audio: time delay of arrival (TDOA) measurements
    at a microphone pair orthogonal to the optical
    axis of the camera
  • Image: edge events along normal lines to a
    hypothesised contour
  • Complementary modalities: audio is good for
    (re)initialisation, image for fine localisation

10
System Configuration
[Figure: system configuration showing the camera, the microphone pair and the image plane]
11
Model Ingredients
  • Low-dimensional state space: similarity transform
    applied to a reference template
  • Dynamical prior: integrated Langevin equation,
    i.e. a second-order Markov kernel
  • Multi-modal data likelihoods:
    • Sound-based likelihood: TDOA at the microphone
      pair
    • Contour-based likelihood: edge events
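A minimal sketch of one common discretisation of the integrated Langevin equation for a single state component; applying it independently to each similarity-transform parameter is an assumption. The parameters `beta` (damping rate), `v_bar` (steady-state velocity scale) and `tau` (frame interval), and their values, are hypothetical.

```python
import math
import random


def langevin_step(pos, vel, beta, v_bar, tau, rng):
    """One step of a discretised integrated Langevin equation (one state component).

    beta: damping rate (1/s); v_bar: steady-state velocity standard deviation;
    tau: time step (s). The velocity is a first-order autoregression, so the
    position on its own is a second-order Markov process.
    """
    a = math.exp(-beta * tau)
    b = v_bar * math.sqrt(1.0 - a * a)
    vel = a * vel + b * rng.gauss(0.0, 1.0)
    pos = pos + tau * vel
    return pos, vel


# Usage: simulate the horizontal translation at 25 frames per second.
rng = random.Random(1)
pos, vel = 0.0, 0.0
velocities = []
for _ in range(20000):
    pos, vel = langevin_step(pos, vel, beta=10.0, v_bar=50.0, tau=1.0 / 25.0, rng=rng)
    velocities.append(vel)
vel_var = sum(v * v for v in velocities) / len(velocities)
```

The choice b = v_bar·sqrt(1 − a²) keeps the stationary velocity variance at v_bar², independent of the frame rate, which the usage checks empirically.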

12
Contour Likelihood
  • Input: maxima of the projected luminance gradient
    along the normals (possibly several such events
    on each normal)
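The slide's likelihood formula did not survive in the transcript. As a hedged illustration, the sketch below uses a form common in the contour-tracking literature (an assumption, not necessarily the presenters' exact model): on each normal, at most one edge event comes from the true contour, offset by Gaussian noise, and the remaining events are uniformly distributed clutter. The parameters `sigma`, `q` and `lam` are hypothetical.

```python
import math


def contour_loglik(events_per_normal, sigma, q, lam):
    """Log-likelihood (up to a constant) of edge events along contour normals.

    events_per_normal[i] lists the signed distances (pixels) of the edge events
    on normal i from the point where the hypothesised contour crosses it.
    sigma: std of the true edge offset; q: probability that the true edge is
    not detected on a normal; lam: spatial density of clutter events.
    """
    c = 1.0 / (math.sqrt(2.0 * math.pi) * sigma * q * lam)
    total = 0.0
    for events in events_per_normal:
        s = sum(math.exp(-0.5 * (nu / sigma) ** 2) for nu in events)
        total += math.log1p(c * s)  # a normal with no events contributes log(1) = 0
    return total


# Usage: the same events scored against the true contour and one shifted by 6 px.
events = [[0.5, 12.0], [-1.0], [], [0.2, -8.0, 15.0]]
near = contour_loglik(events, sigma=2.0, q=0.1, lam=0.05)
shifted = [[nu - 6.0 for nu in ev] for ev in events]
far = contour_loglik(shifted, sigma=2.0, q=0.1, lam=0.05)
```

The narrow Gaussian term is what gives this likelihood its narrow support: a hypothesis a few pixels off the true contour already scores close to the clutter-only baseline.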

13
Contour Likelihood
  • Advantages:
    • Low computational cost
    • Robust to illumination changes
  • Drawbacks:
    • Fragile because of narrow support (especially
      with only a similarity transform on a fixed
      shape space)
    • Sensitive to background clutter
  • Extension:
    • Multiply the gradient by the inter-frame
      difference to reduce the influence of
      background clutter
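A one-dimensional sketch of this extension on luminance profiles sampled along a single normal, assuming the hypothesised contour crosses at the centre sample. The gradient magnitude is multiplied by the absolute inter-frame difference (averaged over the two samples the gradient spans, an assumption), so a strong but static background edge is suppressed while the moving head edge survives. The function name and threshold are hypothetical.

```python
def edge_events(profile, prev_profile, threshold, use_frame_diff=True):
    """Detect edge events along one normal line.

    profile / prev_profile: luminance sampled at unit steps along the normal in
    the current and previous frame. Returns signed positions (in samples,
    relative to the centre) of local maxima of the edge strength.
    """
    n = len(profile)
    strength = []
    for k in range(n - 1):
        g = abs(profile[k + 1] - profile[k])  # gradient between samples k and k+1
        if use_frame_diff:
            d = 0.5 * (abs(profile[k] - prev_profile[k])
                       + abs(profile[k + 1] - prev_profile[k + 1]))
            g *= d
        strength.append(g)
    centre = (n - 1) / 2.0
    events = []
    for k, s in enumerate(strength):
        left = strength[k - 1] if k > 0 else float("-inf")
        right = strength[k + 1] if k + 1 < len(strength) else float("-inf")
        if s > threshold and s >= left and s > right:
            events.append(k + 0.5 - centre)  # strength k sits between samples k and k+1
    return events


# Usage: a static background edge (samples 1-2) and a head edge that moved
# from samples 7-8 in the previous frame to samples 5-6 in the current one.
profile = [10, 10, 80, 80, 80, 80, 20, 20, 20, 20, 20]
prev_profile = [10, 10, 80, 80, 80, 80, 80, 80, 20, 20, 20]
plain = edge_events(profile, prev_profile, 1.0, use_frame_diff=False)
with_diff = edge_events(profile, prev_profile, 1.0)
```

Without the frame difference both edges are reported; with it, only the moving edge near the hypothesised contour remains.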

14
Inter-Frame Difference
[Figure panels: without frame difference; with frame difference]
15
Audio Likelihood
  • Input: positions of peaks in the generalised
    cross-correlation function (GCCF)
  • Reverberation leads to multiple peaks

[Figure: GCCF plotted against TDOA]
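A sketch of extracting candidate TDOAs from correlation peaks, assuming two equal-length microphone signals. For brevity it uses the plain (unweighted) cross-correlation; a generalised cross-correlation additionally applies a frequency weighting, which is omitted here. The function name and the synthetic echo in the usage are illustrative.

```python
import random


def cross_correlation_peaks(left, right, max_lag, num_peaks=3):
    """Candidate TDOAs (in samples) from local maxima of the cross-correlation.

    r(lag) = sum_n left[n] * right[n + lag]; a positive lag means the sound
    reaches the right microphone later. Returns up to num_peaks lags, strongest first.
    """
    n = len(left)
    lags = list(range(-max_lag, max_lag + 1))
    r = []
    for lag in lags:
        r.append(sum(left[i] * right[i + lag]
                     for i in range(max(0, -lag), min(n, n - lag))))
    peaks = [i for i in range(1, len(r) - 1) if r[i] > r[i - 1] and r[i] >= r[i + 1]]
    peaks.sort(key=lambda i: r[i], reverse=True)
    return [lags[i] for i in peaks[:num_peaks]]


# Usage: direct path delayed by 4 samples plus a weaker reflection delayed by 11.
rng = random.Random(3)
src = [rng.gauss(0.0, 1.0) for _ in range(400)]
direct = [0.0] * 4 + src[:-4]
echo = [0.0] * 11 + src[:-11]
right = [d + 0.5 * e for d, e in zip(direct, echo)]
peaks = cross_correlation_peaks(src, right, max_lag=20)
```

The reflection produces a second, weaker peak, which is why the audio likelihood has to cope with several candidate TDOAs rather than a single one.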
16
Audio Likelihood
  • Deterministic mapping from time delay of arrival
    (TDOA) to bearing angle (microphone calibration),
    and from bearing angle to x-coordinate in the
    image plane (camera calibration)
  • Audio likelihood follows in a similar manner to
    the contour likelihood
  • Likelihood assumes a uniform clutter model
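The two calibrated mappings can be sketched as follows, assuming a far-field source (so that sin(bearing) = c · TDOA / d for speed of sound c and microphone spacing d), a microphone baseline parallel to the image x-axis and centred on the camera, and a pinhole camera. The speed-of-sound value, the sign convention and all parameter names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def tdoa_to_image_x(tdoa, mic_spacing, focal_px, cx):
    """Map a TDOA (s) to a horizontal image coordinate (pixels).

    mic_spacing: microphone separation (m); focal_px: focal length in pixels;
    cx: principal-point column. Positive TDOAs map to the right of cx here.
    """
    s = SPEED_OF_SOUND * tdoa / mic_spacing
    s = max(-1.0, min(1.0, s))   # clamp: noisy TDOAs can exceed the physical limit
    bearing = math.asin(s)       # angle from the optical axis (rad)
    return cx + focal_px * math.tan(bearing)


# Usage: a TDOA corresponding to a 30 degree bearing.
x_centre = tdoa_to_image_x(0.0, mic_spacing=0.2, focal_px=500.0, cx=320.0)
x30 = tdoa_to_image_x(0.5 * 0.2 / SPEED_OF_SOUND, mic_spacing=0.2, focal_px=500.0, cx=320.0)
```

Because the mapping is deterministic, each GCCF peak yields one candidate x-position, and the audio likelihood can then be built on the x-axis in the same way as the contour likelihood is built along a normal.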

17
Particle Filter Architecture
  • Layered sampling: first the x-position with the
    sound likelihood, then the rest of the state
  • X-position proposal: mixture of diffusion
    dynamics and a sound-based proposal
  • To admit jumps from the proposal, the x-dynamics
    have to be augmented with a uniform component
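The reason for the uniform component can be seen from the importance weight, which divides the dynamics density by the proposal density. In the sketch below (names and values are hypothetical), a particle drawn from the sound component far from its previous position gets exactly zero weight under pure diffusion, so the jump is lost; a small uniform component keeps the weight positive. Setting `alpha = 1.0` forces the sound component for the demonstration.

```python
import math
import random


def gauss_pdf(v, mean, sd):
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def x_dynamics_pdf(x, x_prev, sd, eps, width):
    """Diffusion dynamics for the x-position, mixed with a uniform component of
    weight eps over the image width (eps = 0 gives pure diffusion)."""
    return (1.0 - eps) * gauss_pdf(x, x_prev, sd) + eps / width


def propose_x(x_prev, sound_xs, alpha, sd, sound_sd, rng):
    """Draw x from a mixture of the diffusion dynamics and a sound-based proposal.

    sound_xs: image x-positions obtained from the GCCF peaks; alpha: probability
    of drawing from the sound component. Returns (x, proposal density at x).
    """
    a = alpha if sound_xs else 0.0  # no sound peaks: fall back to pure diffusion
    if rng.random() < a:
        x = rng.gauss(rng.choice(sound_xs), sound_sd)
    else:
        x = rng.gauss(x_prev, sd)
    q = (1.0 - a) * gauss_pdf(x, x_prev, sd)
    if sound_xs:
        q += a * sum(gauss_pdf(x, m, sound_sd) for m in sound_xs) / len(sound_xs)
    return x, q


# Usage: the active speaker jumps from x = 100 to x = 500.
rng = random.Random(0)
x, q = propose_x(100.0, [500.0], alpha=1.0, sd=5.0, sound_sd=3.0, rng=rng)
# Dynamics / proposal part of the importance weight (likelihood factor omitted).
w_plain = x_dynamics_pdf(x, 100.0, sd=5.0, eps=0.0, width=640.0) / q
w_uniform = x_dynamics_pdf(x, 100.0, sd=5.0, eps=0.05, width=640.0) / q
```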

18
Examples
  • Effect of inter-frame difference
  • Conversational ping-pong

19
Examples
  • Conversational ping-pong and sound-based
    reinitialisation

20
Head Tracking using Colour and Motion Measurements
21
Problem Formulation
  • Objective: detect and track the head of a single
    person in a video sequence taken from a
    stationary camera
  • Modality fusion:
    • Motion and colour measurements are complementary
    • Motion: when the object is moving, colour is
      unreliable
    • Colour: when the object is stationary, motion
      information disappears
  • Automatic object detection and tracker
    initialisation using motion measurements
  • Individualisation of the colour model to the
    object:
    • Initialised with a generic skin colour model
    • Adapted to the object colour during periods of
      motion; the motion model acts as an anchor

22
Object Description and Motion
  • Head modelled as an ellipse that is free to
    translate and scale in the image
  • A binary indicator variable signals whether the
    object is present in the image, so the object
    state comprises the indicator together with the
    ellipse position and scale
  • State components assumed to have independent
    motion models:
    • Indicator: discrete Markov chain
    • Position and scale: Langevin motion with
      uniform initialisation
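A minimal sketch of sampling from this prior, assuming the indicator chain is parameterised by a birth probability and a death probability, and that a newly born object's position and scale are drawn uniformly within given bounds. A random walk stands in for the Langevin motion in the usage; all names and numbers are hypothetical.

```python
import random


def prior_step(state, p_birth, p_death, bounds, move, rng):
    """Sample the next object state from the prior dynamics.

    state: (present, params), where params = (x, y, scale) or None when absent.
    bounds: ((x_min, x_max), (y_min, y_max), (s_min, s_max)) for uniform initialisation.
    move: callable implementing the position/scale dynamics (e.g. Langevin).
    """
    present, params = state
    if not present:
        if rng.random() < p_birth:
            # Newly born object: position and scale drawn uniformly.
            return True, tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        return False, None
    if rng.random() < p_death:
        return False, None
    return True, move(params, rng)


# Usage: run the chain and measure how often the object is present.
rng = random.Random(7)
bounds = ((0.0, 320.0), (0.0, 240.0), (20.0, 60.0))
walk = lambda p, r: tuple(v + r.gauss(0.0, 1.0) for v in p)  # stand-in for Langevin
state, hits = (False, None), 0
for _ in range(50000):
    state = prior_step(state, 0.1, 0.05, bounds, walk, rng)
    hits += state[0]
frac = hits / 50000
```

For a two-state chain, the long-run fraction of time present is p_birth / (p_birth + p_death), here 2/3.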

23
Image Measurements
  • Measurements taken on a regular filter grid
  • Measurement vector: collects the measurements at
    all gridpoints

24
Observation Likelihood Model
  • Measurements at different gridpoints are assumed
    to be independent
  • Unique background (object absent) likelihood
    model for each gridpoint
  • All gridpoints covered by the object share the
    same foreground likelihood model
  • At each gridpoint, the colour and motion
    measurements are also assumed to be independent
  • Note that the background motion model is shared
    by all the gridpoints
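These independence assumptions make the likelihood cheap to evaluate: relative to the object-absent hypothesis, the background terms of all gridpoints outside the object cancel, leaving log p(y | x) − log p(y | absent) = Σ over gridpoints g inside the ellipse of [log p_fg(y_g) − log p_bg,g(y_g)]. A sketch with a hypothetical ellipse aspect ratio and scalar toy measurements:

```python
def inside_ellipse(gx, gy, cx, cy, s, aspect=1.2):
    # Hypothetical head ellipse: half-width s, half-height aspect * s.
    return ((gx - cx) / s) ** 2 + ((gy - cy) / (aspect * s)) ** 2 <= 1.0


def log_likelihood_ratio(state, grid, log_fg, log_bg):
    """log p(y | state) - log p(y | object absent), under gridpoint independence.

    grid: list of (gx, gy, measurement). log_fg(m) is the shared foreground
    model; log_bg(k, m) is the background model of gridpoint k. Each should
    return the sum of the colour and motion log-likelihoods.
    """
    present, params = state
    if not present:
        return 0.0
    cx, cy, s = params
    return sum(log_fg(m) - log_bg(k, m)
               for k, (gx, gy, m) in enumerate(grid)
               if inside_ellipse(gx, gy, cx, cy, s))


# Usage with toy models: foreground log-likelihood = m, background = 0.
grid = [(0.0, 0.0, 2.0), (1.0, 0.0, -1.0), (10.0, 10.0, 5.0)]
ratio = log_likelihood_ratio((True, (0.5, 0.0, 1.0)), grid,
                             log_fg=lambda m: m, log_bg=lambda k, m: 0.0)
absent = log_likelihood_ratio((False, None), grid,
                              log_fg=lambda m: m, log_bg=lambda k, m: 0.0)
```

Since the object-absent likelihood is common to all particles, this ratio can be used directly as the unnormalised importance weight.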

25
Colour Likelihood Model
  • Normalised histograms for both foreground and
    background colour likelihood models
  • Background models trained on a sequence without
    objects
  • Foreground models trained on a set of labelled
    face images
  • Histogram models supplied with a small uniform
    component to prevent numerical problems
    associated with empty bins
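A sketch of such a histogram model for a single colour channel. The slides use hue and saturation; restricting to one 1-D channel, the bin layout and the value of `eps` are assumptions.

```python
def bin_index(v, num_bins, lo, hi):
    k = int((v - lo) / (hi - lo) * num_bins)
    return min(max(k, 0), num_bins - 1)  # clamp v == hi and out-of-range values


def normalised_histogram(values, num_bins, lo, hi, eps=1e-3):
    """Normalised histogram mixed with a uniform component of weight eps,
    so that empty bins keep a small non-zero probability."""
    counts = [0] * num_bins
    for v in values:
        counts[bin_index(v, num_bins, lo, hi)] += 1
    total = sum(counts)
    if total == 0:
        return [1.0 / num_bins] * num_bins
    return [(1.0 - eps) * c / total + eps / num_bins for c in counts]


# Usage: three training samples, four bins on [0, 1).
hist = normalised_histogram([0.1, 0.15, 0.8], 4, 0.0, 1.0, eps=0.01)
```

Without the uniform component, a single measurement falling in an empty bin would make the likelihood exactly zero (and its logarithm undefined).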

26
Motion Likelihood Model
  • Background frame-difference measurements
    empirically found to be gamma distributed
  • Foreground frame difference depends on the
    magnitude of motion, the number and orientation
    of foreground edges, etc.
  • Modelling these effects accurately is difficult
  • In general, if the object is moving, foreground
    frame-difference measurements are substantially
    larger than background ones
  • Thus a two-component uniform distribution is
    adopted for the foreground frame-difference
    measurements (outlier model)
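The background and foreground frame-difference models can be sketched as below. Reading "two-component uniform" as a piecewise-uniform density, with low mass on small differences and high mass on large ones, is an interpretation; the split point, maximum value and all parameter values are hypothetical.

```python
import math


def gamma_logpdf(d, shape, scale):
    """Gamma log-density for a background frame-difference value."""
    d = max(d, 1e-6)  # guard: the gamma density is defined for d > 0
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))


def two_uniform_logpdf(d, split, d_max, w_low):
    """Piecewise-uniform (outlier) model for foreground frame differences:
    mass w_low spread uniformly on [0, split), 1 - w_low on [split, d_max]."""
    if d < 0.0 or d > d_max:
        return float("-inf")
    if d < split:
        return math.log(w_low / split)
    return math.log((1.0 - w_low) / (d_max - split))


# Usage: which model explains a small and a large frame difference better?
for d in (2.0, 60.0):
    bg = gamma_logpdf(d, shape=2.0, scale=3.0)
    fg = two_uniform_logpdf(d, split=20.0, d_max=255.0, w_low=0.1)
    print(d, "foreground" if fg > bg else "background")
```

The foreground model is deliberately vague: it only encodes that a moving object tends to produce larger differences than the background, without modelling edges or motion magnitude.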

27
Particle Proposal
  • Three stages of operation:
    • Birth: object first enters the scene; the
      proposal should detect the object and spawn
      particles in the object region
    • Alive: object persists in the scene; the
      proposal should allow the object to be tracked,
      whether it is stationary or moving around
    • Death: object leaves the scene; the proposal
      should kill the particles associated with the
      object
  • Form of particle proposal: depends on the
    empirical probability of the object being alive
28
Particle Proposal
  • Indicator proposal:
    • Birth only allowed if there is no object
      currently in the scene
    • All alive particles are subject to a fixed
      death probability
  • State proposal:
    • Langevin dynamics if the object is alive
    • Gaussian birth proposal, with parameters from
      the detection module
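A sketch of this per-particle proposal, assuming "no object currently in the scene" is decided by thresholding the fraction of alive particles, and that the detection module supplies a mean and standard deviation per state component. Because this proposal differs from the prior, the importance weights must include the prior-to-proposal ratio; that correction is not shown. Names, the 0.5 threshold and the numbers are hypothetical.

```python
import random


def propose_particle(particle, object_in_scene, detection, p_birth, p_death, move, rng):
    """Propose the next (indicator, state) for one particle.

    object_in_scene: True if enough particles are alive; births are then disallowed.
    detection: ((means...), (sds...)) from the detection module, or None.
    move: dynamics for alive particles (e.g. Langevin).
    """
    alive, params = particle
    if alive:
        if rng.random() < p_death:  # fixed death probability
            return False, None
        return True, move(params, rng)
    if not object_in_scene and detection is not None and rng.random() < p_birth:
        mean, sd = detection
        return True, tuple(rng.gauss(m, s) for m, s in zip(mean, sd))
    return False, None


# Usage: all particles dead, a head detected at (160, 120) with scale 30.
rng = random.Random(2)
walk = lambda p, r: tuple(v + r.gauss(0.0, 1.0) for v in p)  # stand-in for Langevin
particles = [(False, None)] * 100
alive_frac = sum(a for a, _ in particles) / len(particles)
detection = ((160.0, 120.0, 30.0), (5.0, 5.0, 3.0))
particles = [propose_particle(p, alive_frac > 0.5, detection, 0.5, 0.05, walk, rng)
             for p in particles]
born = [p for p in particles if p[0]]
```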

29
Object Detection
  • Object region detected by probabilistic
    segmentation of the horizontal and vertical
    projections of the frame-difference measurements
  • Region location and size determine the parameters
    of the birth proposal distribution
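As a simplified stand-in for the probabilistic segmentation (whose details are not in the transcript), the sketch below thresholds each projection of the frame-difference image at a fraction of its maximum and reports the extent of the above-threshold region. It assumes a single moving region; the threshold and names are hypothetical.

```python
def detect_region(frame_diff, rel_threshold=0.5):
    """Locate a moving region from projections of a frame-difference image.

    frame_diff: 2-D list (rows x cols) of absolute frame differences.
    Returns (cx, cy, width, height) in grid units, or None if nothing moved.
    """
    col_proj = [sum(col) for col in zip(*frame_diff)]  # column sums: projection onto x
    row_proj = [sum(row) for row in frame_diff]        # row sums: projection onto y

    def extent(proj):
        peak = max(proj)
        if peak <= 0:
            return None
        idx = [i for i, v in enumerate(proj) if v >= rel_threshold * peak]
        return idx[0], idx[-1]

    ex, ey = extent(col_proj), extent(row_proj)
    if ex is None or ey is None:
        return None
    (x0, x1), (y0, y1) = ex, ey
    return (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0 + 1, y1 - y0 + 1


# Usage: a 10 x 12 difference image with motion in rows 3-5, columns 6-9.
img = [[1.0 if 3 <= r <= 5 and 6 <= c <= 9 else 0.0 for c in range(12)]
       for r in range(10)]
region = detect_region(img)
still = detect_region([[0.0] * 12 for _ in range(10)])
```

The region centre and size could then set the mean and spread of the Gaussian birth proposal. With two moving regions, this simple extent would span both, which is one reason a probabilistic segmentation is preferable.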

30
Colour Model Adaptation
  • Why?
    • Generic skin colour model may be too broad for
      accurate localisation
    • Model is sensitive to colour changes due to
      changes in pose and illumination
  • When?
    • Object present and moving: largest variations
      in colour expected
    • Motion likelihood anchors particles around the
      moving object
  • How?
    • Gradual adaptation, to avoid fitting to the
      background, enforced with a prior
    • Stochastic EM: contribution of each particle
      proportional to its likelihood

31
Colour Model Adaptation
  • Unknown parameters: normalised bin values of the
    object hue and saturation histograms
  • EM Q-function for MAP estimation
  • No analytic solution, but a particle
    approximation yields an approximate solution
  • Monte Carlo approximation is only performed over
    particles that are currently alive

32
Colour Model Adaptation
  • Dirichlet prior used for parameter updates:
    • Prior centred on the old parameter values
    • Variance controlled by a multiplicative constant
  • Update rule for the normalised bin counts then
    follows in closed form
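One plausible form of such an update, assuming the Dirichlet prior is parameterised as alpha_k = C · old_k + 1 so that its mode is the old histogram. The MAP estimate is then (n_k + C · old_k) / (N + C), where n_k are the expected bin counts averaged over alive particles with weights proportional to their likelihoods, and N = Σ n_k. This parameterisation and the names are assumptions, not the presenters' stated rule.

```python
def adapt_histogram(old_hist, particle_counts, particle_weights, C):
    """One MAP update of a normalised colour histogram under a Dirichlet prior.

    old_hist: previous normalised bin values (the prior's mode).
    particle_counts: per alive particle, bin counts of the measurements inside
    that particle's ellipse. particle_weights: non-negative weights proportional
    to each particle's likelihood. C: multiplicative constant; larger C means a
    tighter prior and a more gradual adaptation.
    """
    total_w = sum(particle_weights)
    K = len(old_hist)
    # Expected counts: likelihood-weighted average over particles (Monte Carlo E-step).
    n = [sum(w * counts[k] for w, counts in zip(particle_weights, particle_counts)) / total_w
         for k in range(K)]
    N = sum(n)
    # M-step: Dirichlet MAP estimate, mixing the old histogram with the new evidence.
    return [(n[k] + C * old_hist[k]) / (N + C) for k in range(K)]


# Usage: two alive particles, equal weights, two bins.
updated = adapt_histogram([0.5, 0.5], [[8, 2], [6, 4]], [1.0, 1.0], C=10.0)
frozen = adapt_histogram([0.5, 0.5], [[8, 2], [6, 4]], [1.0, 1.0], C=1e9)
```

Increasing C pulls the update towards the old histogram, which is how the prior keeps the adaptation gradual and limits drift onto the background.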

33
What Happens?
[Figure: individual particle histograms and the resulting weighted average histogram]
34
Implementation
  • Colour model adaptation iterations occur between
    particle prediction and particle reweighting in
    the standard particle filter
  • Stochastic EM algorithm initialised with
    parameters from the previous time step
  • A single stochastic EM iteration is sufficient at
    each time step
  • Number of particles fixed at 100
  • Non-optimised algorithm runs at 15 fps on a
    standard desktop PC

35
Examples
  • No adaptation: tracker gets stuck on the
    skin-coloured carpet in the background
  • Adaptation: tracker successfully adapts to
    changes in pose and illumination, and lock is
    maintained
  • No motion likelihood: tracker fails, illustrating
    the need for an anchor likelihood

36
Examples
  • Tracking is successful despite substantial
    variations in pose and illumination and the
    subject temporarily leaving the scene
  • Particles are killed when the subject leaves the
    scene; upon re-entry, the individualised colour
    model allows lock to be re-established within a
    few frames

37
The End