Extracting features from spatio-temporal volumes (STVs) for activity recognition

1
Extracting features from spatio-temporal volumes
(STVs) for activity recognition
  • Dheeraj Singaraju
  • Reading group 06/29/06

2
Motivation for dealing with STVs
  • Optical-flow-based methods capture only
    first-order motion.
  • Methods that use HMMs deal with single-point
    trajectories that carry only motion information
    and no spatial information.
  • We aim at a direct scheme for event detection
    and classification that does not require feature
    tracking, segmentation or computation of optical
    flow
  • We want to detect points in the space-time
    volume which have significant local variation in
    both space and time.

3
Approaches that we shall discuss
  • On Space-Time Interest Points, Ivan Laptev
  • Local image features, e.g. corners, provide compact
    and abstract representations of images
  • Extend the concept of a spatial corner detector
    to a spatio-temporal corner detector
  • Actions as Objects: A Novel Action Representation,
    Alper Yilmaz and Mubarak Shah
  • Concepts of differential geometry: extract
    features from the STV based on local variations
    in curvatures of points on the volume
  • The curvatures show invariance to rotation and
    translation

4
Detecting interest points in space
  • An image can be modeled by its linear
    scale-space representation, i.e. the image
    smoothed by Gaussian kernels of increasing variance
  • To look for interest points one analyzes the
    matrix of 2nd moments

A more familiar form of the matrix
5
Detecting interest points in space (contd.)
  • We want to choose corners in the image since they
    have significant spatial variation.
  • We therefore detect positive maxima of a corner
    function built from the determinant and trace of
    the second-moment matrix
  • How do we detect interest points in space-time?
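
The spatial corner measure can be sketched as follows. This is a minimal illustration, assuming the usual Harris form det(μ) − k·trace²(μ); the function names, the uniform averaging window (in place of a Gaussian window) and k = 0.04 are assumptions, not taken from the slides.

```python
def second_moment_matrix(gradients):
    """Average the outer products of (Lx, Ly) gradient pairs over a window.

    Assumption: uniform window weights stand in for the Gaussian
    integration window used in scale-space formulations.
    """
    n = len(gradients)
    mxx = sum(lx * lx for lx, _ in gradients) / n
    mxy = sum(lx * ly for lx, ly in gradients) / n
    myy = sum(ly * ly for _, ly in gradients) / n
    return mxx, mxy, myy


def harris_response(mxx, mxy, myy, k=0.04):
    """Corner measure det(mu) - k * trace(mu)^2 (k is a tunable assumption)."""
    det = mxx * myy - mxy * mxy
    trace = mxx + myy
    return det - k * trace * trace


# Toy windows: gradients in both directions, in one direction, and none.
corner = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
edge = [(1.0, 0.0)] * 4
flat = [(0.0, 0.0)] * 4

corner_score = harris_response(*second_moment_matrix(corner))
edge_score = harris_response(*second_moment_matrix(edge))
flat_score = harris_response(*second_moment_matrix(flat))
```

Only the window with gradients in both directions gets a positive score, which is why positive maxima are selected.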

6
Results of detecting interest points in space
  • Detecting interest points in space alone also
    yields interest points in the stationary background
  • We want to find interest points that carry
    information in both the spatial and the temporal
    domain.

7
Detecting interest points in space-time
  • A spatio-temporal image sequence can be modeled
    by its linear scale-space representation
  • Note that the spatial and temporal dimensions have
    separate scale parameters, σ² and τ²
    respectively
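
A minimal sketch of the smoothing kernel behind such a representation, assuming the usual Gaussian scale-space form with independent spatial variance `sigma2` and temporal variance `tau2`. The function names and the grid parameters are illustrative assumptions.

```python
import math


def spacetime_gaussian(x, y, t, sigma2, tau2):
    """Anisotropic Gaussian: variance sigma2 in x and y, variance tau2 in t."""
    norm = math.sqrt((2.0 * math.pi) ** 3 * sigma2 ** 2 * tau2)
    return math.exp(-(x * x + y * y) / (2.0 * sigma2) - t * t / (2.0 * tau2)) / norm


def kernel_mass(sigma2, tau2, step=0.5, half_width=6.0):
    """Riemann sum of the kernel over +/- half_width standard deviations per axis."""
    nx = int(half_width * math.sqrt(sigma2) / step)
    nt = int(half_width * math.sqrt(tau2) / step)
    total = 0.0
    for i in range(-nx, nx + 1):
        for j in range(-nx, nx + 1):
            for k in range(-nt, nt + 1):
                total += spacetime_gaussian(i * step, j * step, k * step, sigma2, tau2)
    return total * step ** 3


# The kernel is normalized, so its mass should be close to 1
# even when the spatial and temporal scales differ.
mass = kernel_mass(sigma2=1.0, tau2=4.0)
```

Keeping the two variances separate is what lets spatial and temporal extents of an event be estimated independently.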

8
Detecting interest points in space-time (contd.)
  • To look for interest points one analyzes the
    matrix of 2nd moments
  • We therefore look for the maxima of the
    following spatio-temporal corner function
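
A sketch of a space-time corner measure, assuming the form det(μ) − k·trace³(μ) for the 3×3 second-moment matrix built from (Lx, Ly, Lt). The uniform window and k = 0.005 are assumptions made for illustration; the slide's own formula is not reproduced in this transcript.

```python
def second_moment_matrix_3d(gradients):
    """Average outer products of (Lx, Ly, Lt) over a space-time window.

    Assumption: uniform weights replace the Gaussian integration window.
    """
    n = len(gradients)
    mu = [[0.0] * 3 for _ in range(3)]
    for g in gradients:
        for i in range(3):
            for j in range(3):
                mu[i][j] += g[i] * g[j] / n
    return mu


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def spacetime_corner_response(mu, k=0.005):
    """det(mu) - k * trace(mu)^3; large only if all three eigenvalues are large."""
    trace = mu[0][0] + mu[1][1] + mu[2][2]
    return det3(mu) - k * trace ** 3


# Variation in x, y and t (an event) versus a static spatial corner.
event = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
static_corner = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

event_score = spacetime_corner_response(second_moment_matrix_3d(event))
static_score = spacetime_corner_response(second_moment_matrix_3d(static_corner))
```

A static corner has no temporal gradient, so its determinant vanishes and the response is negative. This is how stationary background corners are suppressed.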

9
Results of detecting interest points in the STV
  • Consider a synthetic sequence of a ball moving
    towards a wall and colliding with it
  • An interest point is detected at the collision
    point

10
Results of detecting interest points in the STV
  • Consider a synthetic sequence of 2 balls moving
    towards each other
  • Different interest points are calculated at
    different spatial and temporal scales

11
Effects of scales on interest point detection
Long temporal events are detected for large
values of τ², while short events are detected
for small values of τ².
Spatially large events are detected for large values
of σ², while small events are detected for
small values of σ².
12
Scale selection in space-time
  • We consider a prototype event modeled by a
    spatio-temporal Gaussian blob
  • The scale space representation of f is hence
    given by

13
Scale selection in space-time (contd.)
  • We want to find a differential operator that
    assumes simultaneous extrema over spatial and
    temporal scales that are characteristic of this
    Gaussian prototype event
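  • To recover the spatio-temporal extent of f, we
    consider second-order derivatives of L normalized
    by the scales as
    L_xx,norm = σ^(2a) τ^(2b) L_xx,
    L_yy,norm = σ^(2a) τ^(2b) L_yy,
    L_tt,norm = σ^(2c) τ^(2d) L_tt
  • Requiring that these normalized second-order
    derivatives assume extrema at the blob's own
    scales, σ² = σ0² and τ² = τ0², gives a = 1, b = 1/4,
    c = 1/2 and d = 3/4.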

14
Scale selection in space-time (contd.)
  • We therefore define a normalized spatio-temporal
    Laplace operator as follows
  • The following plots show that the zero crossings
    correspond to the maxima that are detected at
    σ² = σ0² and τ² = τ0²
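
The scale-selection property can be checked numerically. The sketch below assumes a Gaussian blob prototype with variances `sigma0_2` and `tau0_2` (hypothetical values) and a normalized Laplacian of the form σ²τ^(1/2)(L_xx + L_yy) + στ^(3/2)L_tt, which corresponds to a = 1, b = 1/4, c = 1/2, d = 3/4. It uses the fact that smoothing a Gaussian with a Gaussian adds the variances.

```python
import math


def blob_center_derivatives(sigma2, tau2, sigma0_2, tau0_2):
    """Second derivatives (Lxx, Lyy, Ltt) of the smoothed blob at its centre."""
    s = sigma0_2 + sigma2   # spatial variances add under Gaussian smoothing
    t = tau0_2 + tau2       # temporal variances add as well
    peak = 1.0 / math.sqrt((2.0 * math.pi) ** 3 * s * s * t)
    lxx = -peak / s         # Lyy is identical by symmetry
    ltt = -peak / t
    return lxx, lxx, ltt


def normalized_laplacian(lxx, lyy, ltt, sigma2, tau2):
    """sigma^2 tau^(1/2) (Lxx + Lyy) + sigma tau^(3/2) Ltt."""
    sigma, tau = math.sqrt(sigma2), math.sqrt(tau2)
    return sigma2 * math.sqrt(tau) * (lxx + lyy) + sigma * tau ** 1.5 * ltt


sigma0_2, tau0_2 = 4.0, 9.0  # hypothetical blob variances
grid = [(float(s), float(t)) for s in range(1, 9) for t in range(1, 17)]

# Pick the scale pair with the strongest normalized response at the centre.
best = max(
    grid,
    key=lambda p: abs(normalized_laplacian(
        *blob_center_derivatives(p[0], p[1], sigma0_2, tau0_2), p[0], p[1])),
)
```

On this grid the strongest response is found where the detection scales equal the blob's own variances, which is the behaviour the normalization exponents are chosen to produce.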

15
Scale adapted space time interest points
  • So far we have found events that are local
    extrema in the space time volume at a particular
    choice of space and time scales
  • We would like to detect interest points that are
    extrema over the space time volume as well as
    over the scale of the scale-normalized Laplace
    operator
  • The reason for doing so is that different events
    would in general have different spatial and
    temporal extents

16
Algorithm for detecting interest points
17
Results on a previously used synthetic example
Note that all the extrema are detected
irrespective of their spatial and temporal extents
DOUBT: Why are these points not detected as
interest points?
18
Results of the algorithm on real seq.
Note that events of all spatial and temporal
extents are captured. The size of the circle
shows the spatial extent of the event
19
Results of interest pt. detection
Note that the regularity and extent of the
spatio-temporal interest points are
representative of the true events in time
20
Classification of events
  • Every interest point is described by its local
    spatio-temporal neighborhood, and we compare
    these neighborhoods to classify events
  • The neighborhood of an interest point is defined
    by evaluating the following event descriptors

This normalization guarantees the invariance of
the derivative response to image scaling
21
Classification of events (contd.)
  • To compare two events, we compute the Mahalanobis
    distance between their descriptors as
  • To detect similar events in the given data, we
    apply k-means clustering to the event descriptors
    and thus detect groups of interest points with
    similar spatio-temporal neighborhoods
  • Once the cluster centers are evaluated from the
    training data, given a new event, we evaluate its
    distance from the cluster centers. If the
    distance from all the centers is above a
    threshold, we declare it a background event.
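
The classification step can be sketched as below. For brevity this assumes a diagonal covariance, so the Mahalanobis distance reduces to a variance-weighted squared difference; the cluster centres, variances and threshold are hypothetical values, and the k-means training step is taken as already done.

```python
def mahalanobis_sq(a, b, variances):
    """Squared Mahalanobis distance under a diagonal covariance (assumption)."""
    return sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances))


def classify_event(descriptor, centers, variances, threshold):
    """Return the index of the nearest cluster centre, or None for background."""
    distances = [mahalanobis_sq(descriptor, c, variances) for c in centers]
    best = min(range(len(centers)), key=distances.__getitem__)
    return best if distances[best] <= threshold else None


centers = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical centres from k-means
variances = [1.0, 4.0]                # hypothetical per-dimension variances

near_first = classify_event([0.5, 1.0], centers, variances, threshold=9.0)
near_second = classify_event([9.0, 12.0], centers, variances, threshold=9.0)
background = classify_event([5.0, 5.0], centers, variances, threshold=9.0)
```

An event far from every centre returns `None`, matching the rule that such events are treated as background.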

22
Results of classification
23
Recognizing gaits
  • We extract the following features from the
    spatio-temporal volume
  • Positions of the interest points
  • The corresponding scales
  • The class of interest points
  • We introduce a state for the model determined by
    the vector X, where the variables are
  • Position of the person in the image
  • The person's size
  • Frequency of the gait
  • Phase of the gait at current moment
  • Temporal variations of

24
Recognizing gaits (contd.)
  • We then have the following model for walking
  • Such a model helps handle translations as well as
    uniform rescaling in the image and the temporal
    domain

25
Recognizing gaits (contd.)
  • Given a model state X, the current time, the
    length of a time window, and the set of data
    features detected in the most recent time window,
    the match between the model and the data is
    defined by a weighted sum of distances h between
    the model features and the data features.
  • For each model feature, the data feature used is
    the one minimizing the distance h, and the weights
    come from an exponential function of given variance.
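
A rough sketch of such a matching score. The exact distance h and weighting used in the original work are not recoverable from this transcript, so the squared Euclidean distance, the weight exp(−(t − t0)²/ξ), the feature layout (x, y, t) and all names here are assumptions made for illustration.

```python
import math


def match_score(model_features, data_features, t0, xi, h):
    """Weighted sum of distances from each model feature to its closest data feature.

    Assumption: the last entry of every feature tuple is its time stamp, and
    model features closer in time to t0 get larger exponential weights.
    """
    total = 0.0
    for fm in model_features:
        best = min(h(fm, fd) for fd in data_features)  # closest data feature
        weight = math.exp(-((fm[-1] - t0) ** 2) / xi)
        total += weight * best
    return total


def squared_distance(f1, f2):
    """Hypothetical distance h: squared Euclidean distance over (x, y, t)."""
    return sum((a - b) ** 2 for a, b in zip(f1, f2))


model = [(0.0, 0.0, 10.0), (1.0, 0.0, 8.0)]
exact = [(0.0, 0.0, 10.0), (1.0, 0.0, 8.0), (5.0, 5.0, 9.0)]
shifted = [(0.0, 1.0, 10.0), (1.0, 1.0, 8.0)]

exact_score = match_score(model, exact, t0=10.0, xi=4.0, h=squared_distance)
shifted_score = match_score(model, shifted, t0=10.0, xi=4.0, h=squared_distance)
```

A lower score means a better fit, so a search over model states would keep the state with the smallest score.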

26
Recognizing gaits (contd.)
  • To find the best match between the model and the
    data, we search for the model state that
    minimizes this weighted sum of distances

27
Summary of the approach
  • An interest point detector is developed that
    finds local image features that show high
    variation of the image values in space and in
    time
  • The spatio-temporal extents of detected events
    can be estimated by using a normalized Laplacian
    operator
  • The neighborhoods of the events are described
    using scale invariant spatio-temporal descriptors
  • Different actions are then compared by checking
    for the matches between the event descriptors

28
Actions as objects: Action sketches
  • This method analyzes the spatio-temporal volume
    by using differential geometric surface
    properties such as peaks, pits, valleys and
    ridges
  • The authors claim that these are important action
    descriptors as they capture both spatial and
    temporal properties
  • These descriptors are related to the convex and
    concave parts of the object contours and/or to
    the maxima in the spatio-temporal curvature of a
    trajectory, and are hence view invariant.

29
STV: a collection of contours
  • In this approach the spatio-temporal volume is
    really a hollow solid object whose boundaries are
    defined by the contours of the boundaries of a
    person in every image frame.
  • It is assumed that the STV can be considered as a
    manifold, which helps us to consider small
    neighborhoods around a point to be nearly flat.
  • Since the STV is really the time evolution of a
    contour, we can define a 2D parametric
    representation by considering arc length s of the
    contour and time t.

30
STV: a collection of contours (contd.)
t varying, s fixed
s varying, t fixed
The STV is a continuous representation in the
normalized time scale, and it
does not require any time warping for matching two
sequences of different lengths.
31
Action descriptors
  • We want to compute action descriptors that
    correspond to changes in direction, speed and
    shape of parts of contour
  • Changes in these quantities are reflected on the
    surface of the STV and can be computed using
    differential geometry by identifying different
    landmarks.
  • These landmarks can be classified on the basis of
    the local curvatures at points on the STV

32
Action descriptors (contd.)
  • Differential geometry gives us the concept of
    Gaussian Curvature K and Mean Curvature H that
    can be evaluated at points on the manifold of the
    STV. These curvatures exhibit invariance to
    algebraic transformations such as translation and
    rotation.
  • Local extrema of these curvatures can therefore
    be used to identify interest points for
    describing actions
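
As an illustration of how K and H are evaluated, the sketch below uses the standard formulas for a surface written locally as a graph z = f(x, y). Treating a patch of the STV this way is an assumption made here for simplicity; the function names are hypothetical. A sphere is used as a check because its curvatures are the same at every point.

```python
import math


def gaussian_and_mean_curvature(fx, fy, fxx, fxy, fyy):
    """K and H of the graph z = f(x, y) from its first and second derivatives."""
    w = 1.0 + fx * fx + fy * fy
    K = (fxx * fyy - fxy * fxy) / (w * w)
    H = ((1.0 + fy * fy) * fxx - 2.0 * fx * fy * fxy
         + (1.0 + fx * fx) * fyy) / (2.0 * w ** 1.5)
    return K, H


def sphere_cap_derivatives(x, y, radius):
    """Derivatives of z = sqrt(radius^2 - x^2 - y^2), the upper half of a sphere."""
    z = math.sqrt(radius ** 2 - x * x - y * y)
    fx, fy = -x / z, -y / z
    fxx = -(z * z + x * x) / z ** 3
    fyy = -(z * z + y * y) / z ** 3
    fxy = -x * y / z ** 3
    return fx, fy, fxx, fxy, fyy


# Curvatures at the top of the cap and at an off-centre point should agree.
K_top, H_top = gaussian_and_mean_curvature(*sphere_cap_derivatives(0.0, 0.0, 2.0))
K_side, H_side = gaussian_and_mean_curvature(*sphere_cap_derivatives(1.0, 0.0, 2.0))
```

For a sphere of radius 2 both points give K = 1/4 and H = −1/2 (with the upward normal), even though the local slopes differ.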

33
Action descriptors (contd.)
  • The following table shows the different surface
    types and their associated curvatures
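
The table itself is not reproduced in this transcript. The sketch below uses the common sign-based classification from the range-image literature as a stand-in; the sign convention for H depends on the chosen normal direction, so the mapping and the tolerance `eps` should be read as assumptions.

```python
def surface_type(K, H, eps=1e-9):
    """Classify a surface point from the signs of Gaussian (K) and mean (H) curvature."""
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    table = {
        (1, -1): "peak",
        (1, 1): "pit",
        (0, -1): "ridge",
        (0, 0): "flat",
        (0, 1): "valley",
        (-1, -1): "saddle ridge",
        (-1, 0): "minimal surface",
        (-1, 1): "saddle valley",
    }
    # K > 0 with H = 0 cannot occur, because H^2 >= K for any surface.
    return table.get((k, h), "not possible")


labels = [surface_type(K, H) for K, H in [(1.0, -1.0), (0.0, -0.5), (-1.0, -0.2), (0.0, 0.0)]]
```

With this convention the four sample points are labelled peak, ridge, saddle ridge and flat.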

34
Analysis of action descriptors
  • We consider three types of contours: concave
    contours, convex contours and straight contours
  • The following contours generate typical landmarks
    in the spatio-temporal volume
  • Straight contour: ridge, valley or flat surface
  • Convex contour: peak, ridge or saddle ridge
  • Concave contour: pit, valley or saddle valley

Shapes generated from straight contours
35
STVs corresponding to hand motion
The STV generated by a hand staying stable. Such
a motion (or lack of it) creates a ridge
36
STVs corresponding to hand motion
The STV created by a hand that first moves
downwards and then upwards. Note that a saddle
ridge is created at the point of change of motion
37
Properties of the event descriptors
  • The landmarks discussed so far are essentially
    produced due to stable motion or change in stable
    motion.
  • The stability of motion enforces that the STV
    is smooth enough so that one can consider valid
    local planar neighborhoods at points
  • Some of the landmarks are related to the
    curvature of the point trajectories and body
    contours as follows

38
View invariance of event descriptors
  • Since the landmarks are associated with extrema
    of local curvatures, even when the view changes
    the transformed landmarks are extrema in the new
    STV
  • DOUBT: Not very confident about the
    derivation of the above
  • Due to this view invariance, comparing two STVs
    is equivalent to checking if there is a
    valid Fundamental Matrix relating the set of
    event descriptors in the 2 given action volumes.

Derived formula relating curvatures of
corresponding points in 2 different views
39
Comparing two actions
  • We check if a linear system of the following kind
    is satisfied by the event descriptors in both the
    actions
  • This boils down to checking if the last (smallest)
    singular value of A is 0. From a set of possible
    matches between the input action sketch and the
    known action sketches, we select the action with
    the minimum matching score
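
A sketch of this check, assuming the standard construction in which each correspondence (x, y) ↔ (x', y') contributes one row of A for the constraint x'ᵀ F x = 0. The smallest singular value is obtained here with a plain Jacobi eigenvalue iteration on AᵀA to stay within the standard library; a linear algebra library's SVD would normally be used instead. The synthetic correspondences and all names are hypothetical.

```python
import math
import random


def constraint_row(p, q):
    """Row of A for a correspondence p = (x, y) <-> q = (x', y')."""
    x, y = p
    xp, yp = q
    return [xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, 1.0]


def smallest_singular_value(rows, sweeps=60):
    """Smallest singular value of A via Jacobi rotations applied to A^T A."""
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                # Rotation angle in [-pi/4, pi/4] that zeroes a[p][q].
                theta = 0.5 * math.atan(2.0 * a[p][q] / d) if d != 0.0 else math.pi / 4.0
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return math.sqrt(max(min(a[i][i] for i in range(n)), 0.0))


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]
# Horizontal shift only: these pairs satisfy an epipolar constraint (y' = y).
consistent = [(x + rng.uniform(0.1, 0.5), y) for x, y in points]
# Random vertical offsets break the constraint.
inconsistent = [(x + rng.uniform(0.1, 0.5), y + rng.uniform(-0.5, 0.5)) for x, y in points]

good_score = smallest_singular_value([constraint_row(p, q) for p, q in zip(points, consistent)])
bad_score = smallest_singular_value([constraint_row(p, q) for p, q in zip(points, inconsistent)])
```

The consistent set gives a score that is zero up to rounding error, while the inconsistent set does not, so the score can serve as a matching cost between two sets of landmarks.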

40
Summary of the approach
  • Using concepts of differential geometry, extract
    interest points (action sketches) that carry local
    spatio-temporal information by virtue of being
    local extrema of curvatures in space-time
  • These event descriptors are associated with
    uniform motion or stable changes in uniform
    motion
  • Since the action sketches are view invariant,
    comparing 2 actions is equivalent to checking if
    there is a valid Fundamental Matrix relating the
    positions of the action sketches for the
    individual actions.