FaceTrack: Tracking and summarizing faces from compressed video (PowerPoint PPT presentation)



1
FaceTrack Tracking and summarizing faces from
compressed video
  • Hualu Wang, Harold S. Stone, Shih-Fu Chang
  • Dept. of Electrical Engineering, Columbia
    University
  • NEC Research Institute

Presentation by Andy Rova School of Computing
Science Simon Fraser University
2
Introduction
  • FaceTrack
  • System for both tracking and summarizing faces in
    compressed video data
  • Tracking
  • Detect faces and trace them through time in video
    shots
  • Summarizing
  • Cluster the faces across video shots and
    associate them with different people
  • Compressed video
  • Avoids the costly overhead of decoding prior to
    face detection

3
System Overview
  • The FaceTrack system's goals are related to ideas
    discussed in previous presentations
  • A face-based video summary can help users decide
    if they want to download the whole video
  • The summary provides good visual indexing
    information for a database search engine

4
Problem definition
  • The goal of the FaceTrack system is to take an
    input video sequence and generate a list of
    prominent faces that appear in the video, and
    determine the time periods where each of the
    faces appears

5
General Approach
  • Track faces within shots
  • Once tracking is done, group faces across video
    shots into faces of different people
  • Output a list of faces for each sequence
  • For each face, list shots where it appears, and
    when
  • Face recognition is not performed
  • Very difficult in unconstrained videos due to the
    broad range of face sizes, numbers, orientations
    and lighting conditions

6
General Approach
  • Try to work in the compressed domain as much as
    possible
  • MPEG-1 and MPEG-2 videos
  • Used in applications such as digital TV and DVD
  • Macroblocks and motion vectors can be used
    directly in tracking
  • Greater computational speed compared to decoding
  • Can always decode select frames down to the pixel
    level for further analysis
  • For example, grouping faces across shots

7
MPEG Review
  • 3 types of frame data
  • Intra-frames (I-frames)
  • Forward predictive frames (P-frames)
  • Bidirectional predictive frames (B-frames)
  • Macroblocks are coding units which combine pixel
    information via DCT
  • Luminance and chrominance are separated
  • P-frames and B-frames are subjected to motion
    compensation
  • Motion vectors are found and their differences
    are encoded

8
System Diagram
9
Face Tracking
  • Challenges
  • Locations of detected faces may not be accurate,
    since the face detection algorithm works on 16x16
    macroblocks
  • False alarms and misses
  • Multiple faces cause ambiguities when they move
    close to each other
  • The motion approximated by the MPEG motion
    vectors may not be accurate
  • A tracking framework which can handle these
    issues in the compressed domain is needed

10
The Kalman Filter
  • A linear, discrete-time dynamic system is defined
    by the following difference equations
  • We only have access to a sequence of measurements
  • Given this noisy observation data, the problem is
    to find the optimal estimate of the unknown
    system state variables
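The difference equations referenced above were shown as slide graphics and are not in the transcript. The standard linear, discrete-time state-space model used by the Kalman filter (presumably what the slide showed) is:

```latex
x_{k+1} = \Phi_k \, x_k + w_k \qquad \text{(state equation, process noise } w_k \sim N(0, Q_k)\text{)}
z_k = H_k \, x_k + v_k \qquad \text{(observation equation, measurement noise } v_k \sim N(0, R_k)\text{)}
```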

11
The Kalman Filter
  • The filter is actually an iterative algorithm
    which keeps taking in new observations
  • The new states are successively estimated
  • The error in the prediction of the observation is
    called the innovation
  • The innovation is amplified by a gain matrix and
    used as a correction for the state prediction
  • The corrected prediction is the new state estimate
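The predict/correct cycle described above can be sketched as follows. This is a minimal NumPy sketch of a generic linear Kalman filter iteration, not the paper's exact implementation:

```python
import numpy as np

def kalman_step(x, P, z, Phi, H, Q, R):
    """One predict/correct iteration of a linear Kalman filter."""
    # Predict: propagate state estimate and covariance through the model
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q
    # Innovation: error between the new observation and the predicted one
    nu = z - H @ x_pred
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    # Correct: the innovation, amplified by the gain, updates the state
    x_new = x_pred + K @ nu
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

Each new observation is fed to `kalman_step`, and the corrected prediction becomes the new state estimate, exactly as the slide describes.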

12
The Kalman Filter
  • In the FaceTrack system, the state vector of
    the Kalman filter is the kinematic information of
    the face
  • position, velocity (and sometimes acceleration)
  • The observation vector is the position of
    the detected face
  • May not be accurate
  • The Kalman filter lets the system predict and
    update the position and parameters of the faces

13
The Kalman Filter
  • The FaceTrack system uses a 0.1 second time
    interval for state updates
  • This corresponds to every I-frame and P-frame for
    typical MPEG GOP structure
  • GOP: Group Of Pictures frame structure
  • For example, IBBPBBP
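The 0.1-second claim can be checked against a typical GOP. Assuming ~30 fps video and the common 15-frame IBBPBB... pattern (illustrative values, not stated on the slide), I- and P-frames fall on every third frame:

```python
# Common 15-frame GOP at ~30 fps (assumed typical values)
gop = "IBBPBBPBBPBBPBB"
fps = 30.0

# State updates happen on I- and P-frames only
update_frames = [i for i, t in enumerate(gop) if t in "IP"]
intervals = [(b - a) / fps for a, b in zip(update_frames, update_frames[1:])]
print(update_frames)  # every third frame
print(intervals)      # each gap is 3 frames = 0.1 s
```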

14
The Kalman Filter
  • For I-frames, the face detector results are used
    directly
  • For P-frames, the face detector results are more
    prone to false alarms
  • Instead, P-frame face locations are predicted
    (approximately) from the MPEG motion vectors
  • These locations are then fed into the Kalman
    filter as observations
  • (in contrast with previous trackers, which
    assumed the motion-vector-derived locations
    alone were correct)
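One simple way to derive such an observation, sketched under assumptions (the paper does not give this exact procedure), is to shift the face's bounding box by the average motion vector of the macroblocks it covers:

```python
import numpy as np

def predict_face_location(face_box, motion_vectors):
    """Shift a face bounding box by the mean MPEG motion vector of the
    16x16 macroblocks it covers (hypothetical helper: box is pixel
    coords (x0, y0, x1, y1); motion_vectors is (rows, cols, 2),
    indexed by macroblock, holding (dx, dy) in pixels)."""
    x0, y0, x1, y1 = face_box
    mb = motion_vectors[y0 // 16:(y1 + 15) // 16, x0 // 16:(x1 + 15) // 16]
    dx, dy = mb.reshape(-1, 2).mean(axis=0)
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
```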

15
The Face Tracking Framework
  • How to discriminate new faces from previous ones
    during tracking?
  • The Mahalanobis distance is a quantitative
    indicator of how close the new observation is to
    the prediction
  • This can help separate new faces from existing
    tracks: if the Mahalanobis distance is greater
    than a certain threshold, the newly detected
    face is unlikely to belong to that particular
    existing track
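The gating test above can be sketched directly from the Kalman filter's innovation and its covariance. The threshold value here is an assumed chi-square 99% gate for a 2-D observation, not a value from the paper:

```python
import numpy as np

def mahalanobis_gate(z, z_pred, S, threshold=9.21):
    """Return True if observation z plausibly belongs to the track with
    predicted observation z_pred and innovation covariance S.
    threshold 9.21 ~ chi-square 0.99 quantile, 2 DOF (assumed value)."""
    nu = z - z_pred                  # innovation
    d2 = nu @ np.linalg.inv(S) @ nu  # squared Mahalanobis distance
    return bool(d2 <= threshold)
```

A detection close to the prediction passes the gate; one far away (relative to the innovation covariance) is treated as a new face.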

16
The Face Tracking Framework
  • In the case where two faces move close together,
    Mahalanobis distance alone cannot keep track of
    multiple faces
  • Case where a face is missed or occluded
  • Hypothesize the continuation of the face track
  • Case of false alarm or faces close together
  • Hypothesize creation of a new track
  • The idea is to wait for new observation data
    before making the final decision about a track

17
Intra-shot Tracking Challenges
  • Multiple hypothesis method

18
Kalman Motion Models
  • The Kalman filter is a framework which can model
    different types of motion, depending on the
    system matrices used
  • Several models were tested for the paper, with
    varying results
  • Intuition: who pays to research object tracking?
  • The military!
  • Hence many tracking models are based on
    trajectories that are unlike those that faces in
    video will likely exhibit
  • For example, in most commercial video, a human
    face will not maneuver like a jet or missile

19
Kalman Motion Models
  • Four motion models were tested for FaceTrack
  • Constant Velocity (CV)
  • Constant Acceleration (CA)
  • Correlated Acceleration (AA)
  • Variable Dimension (VDF)
  • The testing was done against ground truth
    consisting of manually identified face centers in
    each frame

20
Kalman Motion Models
  • Rather than go through the whole process in exact
    detail, the next several slides are an
    illustration of the differences between the CV
    and CA models
  • Also, the matrices are expanded to show how the
    states are updated

21
Constant Velocity (CV) Model
(state-update equations shown as slide graphics)
22
Constant Velocity (CV) Model
(the matrices are simplified step by step)
23
Constant Velocity (CV) Model
(simplification, continued)
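The CV model's matrices were shown as slide graphics. A standard 2-D constant-velocity setup (the textbook form, not necessarily the paper's exact matrices) looks like this:

```python
import numpy as np

dt = 0.1  # FaceTrack's 0.1 s update interval

# State: [x, y, vx, vy]; position advances by velocity * dt each step
Phi = np.array([[1, 0, dt, 0],
                [0, 1, 0, dt],
                [0, 0, 1,  0],
                [0, 0, 0,  1]], dtype=float)

# Only the face position is observed
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

x = np.array([100.0, 50.0, 20.0, -10.0])  # px, px, px/s, px/s
print(Phi @ x)  # -> [102.  49.  20. -10.]
```

Velocity stays constant across the update (it is "disturbed" only through the process noise, omitted here), which is what makes this the CV model.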
24
Constant Acceleration (CA) Model
Acceleration is now added to the state vector,
and is explicitly modeled as constants disturbed
by random noise
25
Constant Acceleration (CA) Model
26
The Correlated Acceleration Model
  • Replaces the constant accelerations with an AR(1)
    model
  • AR(1) First order autoregressive
  • A stochastic process where the immediately
    previous value has an effect on the current value
    (plus some random noise)
  • Why?
  • There is a strong negative autocorrelation
    between the accelerations of consecutive frames
  • Positive accelerations tend to be followed by
    negative accelerations
  • Implies that faces tend to stabilize
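The AR(1) behavior described above can be simulated. The coefficient and noise scale below are illustrative values chosen to show the negative lag-1 autocorrelation, not parameters from the paper:

```python
import numpy as np

# AR(1) acceleration: a_k = rho * a_{k-1} + noise, with rho < 0
# capturing the negative autocorrelation of consecutive accelerations
rho, sigma = -0.6, 1.0          # illustrative, not from the paper
rng = np.random.default_rng(0)

a = np.zeros(2000)
for k in range(1, len(a)):
    a[k] = rho * a[k - 1] + sigma * rng.standard_normal()

# Empirical lag-1 autocorrelation comes out near rho, i.e. negative:
# positive accelerations tend to be followed by negative ones
lag1 = np.corrcoef(a[:-1], a[1:])[0, 1]
print(round(lag1, 2))
```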

27
The Variable Dimension Filter
  • A system that switches between CV (constant
    velocity) and CA (constant acceleration) modes
  • The dimension of the state vector changes when a
    maneuver is detected, hence VDF
  • Developed for tracking highly maneuverable
    targets (probably military jets)

28
Comparison of Motion Models
(chart: average tracking error over the first 16
tracking runs)
29
Comparison of Motion Models
  • Why does CV perform best?
  • Small sampling interval justifies viewing face
    motion as piecewise linear movements
  • The face cannot achieve very high accelerations
    (as opposed to a jet fighter)
  • AA also performs well because it fits the nature
    of the face motion well
  • Commercial video faces exhibit few persistent
    accelerations (negative autocorrelation)

30
Summarization Across Shots
  • Select representative frames for tracked faces
  • Large, frontal-view faces are best
  • Decode representative frames into the pixel
    domain
  • Use clustering algorithms to group the faces into
    different persons
  • Make use of domain knowledge
  • For example, people do not usually change clothes
    within a news segment, but often do change
    outfits within a sitcom episode

31
Simulation Results
32
Conclusions and Future Research
  • FaceTrack is an effective face tracking (and
    summarization) architecture, within which
    different detection and tracking methods can be
    used
  • Could be updated to use new face detection
    algorithms or improved motion models
  • Based on the results, the CV and AA motion models
    are sufficient for commercial face motion
  • Summarization techniques need the most
    development, followed by optimizing tracking for
    adverse situations