1
Multi-Level Particle Filter Fusion of Features
and Cues for Audio-Visual Person Tracking
  • Keni Bernardin, Tobias Gehrig, Rainer
    Stiefelhagen
  • Universität Karlsruhe
  • keni@ira.uka.de, tgehrig@ira.uka.de,
    stiefel@ira.uka.de
  • 8.5.2007

2
Particle Filter-based Fusion
  • Uses low-level features
    – foreground segmentation maps
    – image pixel colors
  • and high-level cues
    – upper body detections
    – person regions from blob tracking
    – SLOC estimates
  • The algorithm keeps one separate particle filter
    per person track.

  • A particle represents one point on a person.
    Only 75 particles are used per track. Particles
    are scored on observed features and penalized
    for proximity to other tracks (track repulsion)
  • Sampling Importance Resampling (SIR)
  • Propagation: 2 sets of particles with low/high
    dynamics (max 3738)
  • No specific room knowledge used (dimensions,
    objects, background)
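The per-track SIR loop above (propagate, score on features, penalize by track repulsion, resample) can be sketched as follows. This is a generic sketch, not the authors' implementation: the 2D particle layout, noise level, and score function are illustrative assumptions.

```python
import numpy as np

def sir_step(particles, weights, score_fn, noise_std=0.05, repulsion_fn=None):
    """One Sampling Importance Resampling step for a single person track.

    particles: (N, d) array of points on the person (layout assumed).
    score_fn:  maps particles -> non-negative feature scores.
    repulsion_fn: optional penalty for proximity to other tracks.
    """
    # Propagate: diffuse particles with Gaussian dynamics noise.
    particles = particles + np.random.normal(0.0, noise_std, particles.shape)
    # Score on observed features, penalize proximity to other tracks.
    w = score_fn(particles)
    if repulsion_fn is not None:
        w = w * repulsion_fn(particles)
    w = np.maximum(w, 1e-12)
    w = w / w.sum()
    # Systematic resampling proportional to weight.
    idx = np.searchsorted(np.cumsum(w),
                          (np.arange(len(w)) + np.random.rand()) / len(w))
    return particles[idx], np.full(len(w), 1.0 / len(w))
```

With 75 particles per track, as on the slide, repeated `sir_step` calls concentrate the particle cloud on high-scoring regions.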

3
Upper Body Detectors
  • Boosted classifier cascades based on
    Haar-features to detect upper body region (only
    corner cameras).
  • Only standard cascades implemented in OpenCV. No
    adaptation or tuning to CHIL rooms.
  • Entire image is scanned for all corner cameras.
    This is time consuming factor! (approx. 10-12fps,
    640x480). Without detections RT factor 1.48

  • The inside rectangle is used to build the upper
    body histogram, the outside rectangle for the
    initial background histogram (used by scout
    trackers)
  • Reprojection of detections into the 3D scene
  • 3D location estimated from window position and
    width
  • Location uncertainty also computed, expressed
    as a covariance matrix
  • A similar procedure for detected person regions
    (top camera) and SLOC estimates.
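Estimating a 3D location from window position and width can be sketched with a pinhole camera model: depth follows from an assumed physical body width and the detection window's pixel width. The focal length, principal point, body width, and pixel-noise figure below are all assumed values, not the authors' calibration.

```python
import numpy as np

def reproject_detection(u, v, w_px, f=800.0, cx=320.0, cy=240.0,
                        body_width_m=0.55):
    """Estimate a 3D location (camera frame) from the detection window's
    centre (u, v) and pixel width w_px, via a pinhole model."""
    z = f * body_width_m / w_px      # depth from apparent width
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.array([x, y, z])

def detection_covariance(z, sigma_px=5.0, f=800.0, body_width_m=0.55):
    """Rough location uncertainty as a covariance matrix: pixel noise
    scaled into metres, larger along depth (hypothetical diagonal model)."""
    lateral = (sigma_px * z / f) ** 2
    depth = (sigma_px * z**2 / (f * body_width_m)) ** 2
    return np.diag([lateral, lateral, depth])
```

The depth term grows quadratically with distance, which is why the covariance matters when fusing detections from several views.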

4
Person Regions / Foreground
  • Adaptive background modeling (10 learning
    frames, then run-on), foreground segmentation
    using a fixed threshold (all camera views)
  • Person region tracking (top camera only)
  • Extraction of FG blobs
  • Initialization/deletion of person models
    (x, y, radius) based on FG blob support
  • EM adaptation of model parameters based on
    spatial overlap
  • Reprojection into the 3D scene
  • RT factor 0.9 (could be much faster).
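A minimal sketch of the run-on adaptive background model and fixed-threshold segmentation; the learning rate and threshold below are assumptions, not the slide's values.

```python
import numpy as np

def update_background(bg, frame, alpha=0.02):
    """Run-on adaptive background: per-pixel exponential moving average."""
    return (1.0 - alpha) * bg + alpha * frame.astype(float)

def foreground_mask(bg, frame, thresh=25.0):
    """Fixed-threshold foreground segmentation on the absolute
    background difference."""
    return np.abs(frame.astype(float) - bg) > thresh
```

FG blobs would then be extracted as connected components of the mask and matched against the (x, y, radius) person models.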


5
Upper Body Color Histograms
  • Modified HSV cone for more compact color
    histograms
  • v = var (max 10 bins)
  • s = sat · var (max 10 bins)
  • h = hue · sat · var (max 16 bins)
  • Adaptation of upper body histograms upon a
    (matching) upper body detection or SLOC
    estimate (Mahalanobis distance using the 3D
    detection location and covariance matrix).
  • Views where no detection was made use the
    average histogram of the other views
  • Continuous adaptation of the top-view histogram
    and of all backgrounds
  • Upper body histograms are filtered
    independently for all views using the
    background
  • H_filt = minmax(H_body) · (1 − minmax(H_bg))

6
SLOC Estimates
  • JPDAF acoustic tracker output is used
  • The acoustic estimate's 3D position and
    localization uncertainty (covariance matrix)
    are used to score particles
  • As with detections: adaptation of the matching
    tracks' color histograms upon a SLOC estimate
  • Simulation of the upper body classifier cascade
    detection window for corner views
  • Simulation of the person region detection
    circle for the top view

  • SLOC estimates are used as high-level cues.
    Just as other features, they serve to score
    particles and therefore initialize or maintain
    tracks, update positions, etc. (feature-level
    fusion of modalities)
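Scoring particles against an estimate that carries a 3D position and a covariance matrix can be sketched as a Gaussian likelihood of the Mahalanobis distance; the exponential form is an assumption, not the authors' stated scoring function.

```python
import numpy as np

def mahalanobis_score(particles, est_xyz, est_cov):
    """Score (N, 3) particles against a 3D detection / SLOC estimate:
    Gaussian likelihood exp(-0.5 * d^T Cov^-1 d)."""
    inv = np.linalg.inv(est_cov)
    d = particles - est_xyz
    m2 = np.einsum('ni,ij,nj->n', d, inv, d)   # squared Mahalanobis distance
    return np.exp(-0.5 * m2)
```

The same distance can gate histogram adaptation: only tracks whose particles score highly against the estimate adapt their color models.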

7
Track Creation / Deletion
  • "Scout" tracks scan the room, scored on FG +
    detection proximity (+ detection color) +
    person region overlap
  • A scout track is validated when
  • Particle spread is small (std deviation < 60 cm)
    AND
  • Average particle score is above the activation
    threshold AND
  • Upper body histograms (sampled at projected
    particle positions) are balanced across all
    corner views (Bhattacharyya distance) AND
    dissimilar to the histogram of the neighboring
    background (sampled from a 60 cm circle around
    the track center) in the respective view
  • Tracks are invalidated when
  • Particle spread > 90 cm OR
  • Average (body color + detection + person
    region) score is below the deactivation
    threshold
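The spread and score conditions, plus the Bhattacharyya distance used for the histogram checks, can be sketched as follows. The spread definition and the activation threshold are assumptions; only the 60 cm spread limit comes from the slide.

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two histograms (normalized inside)."""
    h1 = np.asarray(h1, dtype=float) / np.sum(h1)
    h2 = np.asarray(h2, dtype=float) / np.sum(h2)
    bc = np.sum(np.sqrt(h1 * h2))            # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))

def validate_scout(particles, scores, activation=0.5, max_spread=0.60):
    """Spread + score conditions for validating a scout track."""
    spread = np.sqrt(np.mean(np.var(particles, axis=0)))  # ~std deviation, m
    return bool(spread < max_spread and np.mean(scores) > activation)
```

The histogram conditions would add: small Bhattacharyya distance between views (balanced) and large distance to the neighboring-background histogram (dissimilar).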


8
JPDAF-based Acoustic Tracker
  • GCC-PHAT based time delay estimation
  • One observation vector per microphone array,
    using only time delays above a threshold
  • Tracking of multiple targets using a JPDAF
  • Only observations inside the validation region
    of a target are associated with that target
  • Internally maintained IEKFs are updated
    according to the probability that the
    observation originated from the respective
    targets
  • The current active speaker is selected based on
    the volume of the error ellipsoid given by the
    state error covariance matrix
  • New targets are created when an observation
    cannot be associated with the existing targets,
    and deleted when they do not initialize within
    a given time or have not been active for some
    time
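A common form of GCC-PHAT time delay estimation between two microphone signals (a generic sketch, not the authors' implementation; the sample rate is an assumption):

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000, max_tau=None):
    """Estimate the time delay of `sig` relative to `ref` via the
    Generalized Cross-Correlation with PHAT weighting."""
    n = len(sig) + len(ref)                    # zero-pad to avoid wrap-around
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift  # peak lag in samples
    return shift / fs
```

The resulting delays (those above a threshold) form the per-array observation vectors fed to the JPDAF.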

9
UKA Top-View AV Tracker
  • Visual tracking
  • Adaptive foreground segmentation with a fixed
    threshold
  • Person models (x, y, radius)
  • Expectation-Maximization approach for the
    assignment of blobs to models and the model
    parameter update
  • Creation of a new model when an unmatched FG
    blob exists, deletion of models which are
    unsupported (a high time delay ensures greater
    stability)
  • Acoustic source localization: output of the
    Joint Probabilistic Data Association Filter
    system
  • Audio-visual fusion: fusion is done at decision
    level using a 3-state finite state machine,
    which selects or averages the video and audio
    tracks.


  • FSM states: audio matching video track /
    unmatched audio track / only video track
    available
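A minimal sketch of the 3-state decision-level fusion. The matching threshold and the exact behavior in each state (averaging on a match, falling back to the available modality otherwise) are assumptions beyond what the slide states.

```python
def fuse_av(video_track, audio_track, match_dist=0.5):
    """Decision-level fusion with three states: audio matching video track,
    unmatched audio track, only video track available.
    Tracks are (x, y) tuples in metres or None; match_dist is assumed."""
    if video_track is None:
        return ('unmatched_audio', audio_track)
    if audio_track is None:
        return ('video_only', video_track)
    dist = sum((v - a) ** 2 for v, a in zip(video_track, audio_track)) ** 0.5
    if dist < match_dist:
        # Audio matches the video track: average the two estimates.
        fused = tuple((v + a) / 2.0 for v, a in zip(video_track, audio_track))
        return ('audio_matching_video', fused)
    # Audio present but not matching any video track.
    return ('unmatched_audio', audio_track)
```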
10
Example Video 1

11
Example Video 2

12
Example Video 3

13
Thank you!