Extracting features from spatio-temporal volumes (STVs) for activity recognition

Provided by: dheerajs

Learn more at: http://www.vision.jhu.edu

Transcript and Presenter's Notes

1
Extracting features from spatio-temporal volumes
(STVs) for activity recognition

Dheeraj Singaraju
Reading group 06/29/06

2
Motivation for dealing with STVs

Optical flow based methods would be able to
capture only first order motion.
Methods that use HMMs deal with single point
trajectories that carry only motion information
and no spatial information
We aim at a direct scheme for event detection
and classification that does not require feature
tracking, segmentation or computation of optical
flow
We want to detect points in the space-time
volume which have significant local variation in
both space and time.

3
Approaches that we shall discuss

"On Space-Time Interest Points" (Ivan Laptev)
Local image features, e.g. corners, provide compact
and abstract representations of images
Extend the concept of a spatial corner detector
to a spatio-temporal corner detector
"Actions as Objects: A Novel Action Representation"
(Alper Yilmaz and Mubarak Shah)
Concepts of differential geometry: extract
features from the STV based on local variations
in curvatures of points on the volume
The curvatures show invariance to rotation and
translation

4
Detecting interest points in space

An image can be modeled by
its linear scale representation
as follows
To look for interest points one analyzes the
matrix of 2nd moments

A more familiar form of the matrix
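The equations on this slide are not preserved in the transcript. A reconstruction, assuming the slide follows the notation of Laptev's paper ($\sigma_l^2$ for the local scale and $\sigma_i^2$ for the integration scale are assumed symbols), would read:

```latex
% Linear scale-space representation of the image f^{sp}
L^{sp}(x, y; \sigma_l^2) = g^{sp}(x, y; \sigma_l^2) * f^{sp}(x, y),
\qquad
g^{sp}(x, y; \sigma_l^2) = \frac{1}{2\pi\sigma_l^2}
    \exp\!\left(-\frac{x^2 + y^2}{2\sigma_l^2}\right)

% Second-moment matrix, averaged with a Gaussian of variance \sigma_i^2
\mu^{sp} = g^{sp}(\cdot; \sigma_i^2) * \left(\nabla L^{sp} (\nabla L^{sp})^T\right)
         = g^{sp}(\cdot; \sigma_i^2) *
           \begin{pmatrix}
             (L^{sp}_x)^2      & L^{sp}_x L^{sp}_y \\
             L^{sp}_x L^{sp}_y & (L^{sp}_y)^2
           \end{pmatrix}
```

The second line of the matrix expression is the "more familiar" form: a Gaussian-weighted average of products of first-order image derivatives.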
5
Detecting interest points in space (contd.)

We want to choose corners in the image since they
have significant spatial variation.
We therefore detect positive maxima of the
following function
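The function itself is missing from the transcript. Assuming it is the usual Harris corner measure, det(mu) - k * trace(mu)^2, computed on the 2x2 second-moment matrix, a minimal sketch looks like this. The name `harris_response` and the default `k = 0.04` are illustrative assumptions, not values taken from the slides.

```python
def harris_response(mu, k=0.04):
    """Harris-style corner measure det(mu) - k * trace(mu)**2.

    `mu` is a 2x2 second-moment matrix given as nested lists.
    The default k is an assumed, commonly quoted value.
    """
    (a, b), (c, d) = mu
    det = a * d - b * c
    trace = a + d
    return det - k * trace ** 2


# Corner-like point: strong gradient energy in both directions.
corner = [[10.0, 0.0], [0.0, 10.0]]
# Edge-like point: gradient energy in one direction only.
edge = [[10.0, 0.0], [0.0, 0.0]]

print(harris_response(corner))  # positive: candidate interest point
print(harris_response(edge))    # negative: rejected
```

Only points where both eigenvalues of the matrix are large give a positive response, which is why positive maxima are selected.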
How do we detect interest points in space-time ?

6
Results of detecting interest points in space

Detecting interest points in space gives interest
points in the stationary background also
We want to find interest points that have
information in the space as well as the temporal
domain.

7
Detecting interest points in space-time

A spatio-temporal image sequence can be modeled
by its linear scale representation
as follows
Note that the spatial and the temporal domains have
separate scale parameters, $\sigma_l^2$ and $\tau_l^2$
respectively
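The scale-space equation for this slide is not preserved in the transcript. Assuming Laptev's notation, with $\sigma_l^2$ as the spatial variance and $\tau_l^2$ as the temporal variance, it would take the following form:

```latex
L(x, y, t; \sigma_l^2, \tau_l^2) = g(x, y, t; \sigma_l^2, \tau_l^2) * f(x, y, t)

g(x, y, t; \sigma_l^2, \tau_l^2) =
    \frac{1}{\sqrt{(2\pi)^3 \sigma_l^4 \tau_l^2}}
    \exp\!\left(-\frac{x^2 + y^2}{2\sigma_l^2} - \frac{t^2}{2\tau_l^2}\right)
```

The kernel is separable: a 2D spatial Gaussian multiplied by a 1D temporal Gaussian, each with its own variance.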

8
Detecting interest points in space-time (contd.)

To look for interest points one analyzes the
matrix of 2nd moments
We therefore look for the maxima of the
following spatio-temporal corner function
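The corner function is missing from the transcript. Assuming it is the extension used in Laptev's paper, H = det(mu) - k * trace(mu)^3, where mu is the 3x3 matrix of Gaussian-averaged products of the derivatives L_x, L_y and L_t, a minimal sketch is given here. The helper names and the default `k = 0.005` are assumptions made for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def st_corner_response(mu, k=0.005):
    """Spatio-temporal corner measure det(mu) - k * trace(mu)**3.

    `mu` is a 3x3 second-moment matrix over (x, y, t).
    The default k is an assumed value.
    """
    trace = mu[0][0] + mu[1][1] + mu[2][2]
    return det3(mu) - k * trace ** 3


# Variation along x, y and t: a space-time corner.
moving = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
# Spatial corner in a static background: no variation along t.
static = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]

print(st_corner_response(moving))  # positive
print(st_corner_response(static))  # negative
```

A stationary spatial corner has a zero eigenvalue along time, so its determinant vanishes and the response is negative. This matches the motivation for rejecting interest points in the static background.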

9
Results of detecting interest points in the STV

Consider a synthetic sequence of a ball moving
towards a wall and colliding with it
An interest point is detected at the collision
point

10
Results of detecting interest points in the STV

Consider a synthetic sequence of 2 balls moving
towards each other
Different interest points are calculated at
different spatial and temporal scales

coarser scale
11
Effects of scales on interest point detection
Long temporal events are detected for large
values of $\tau^2$ while short events are detected
for small values of $\tau^2$
Spatially large events are detected for large values
of $\sigma^2$ while small events are detected for
small values of $\sigma^2$
12
Scale selection in space-time

We consider a prototype event modeled by a
spatio-temporal Gaussian blob
The scale space representation of f is hence
given by
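The two equations on this slide are missing from the transcript. Assuming the blob has spatial variance $\sigma_0^2$ and temporal variance $\tau_0^2$ (symbols assumed from Laptev's paper), they follow from the semigroup property of Gaussian kernels:

```latex
% Prototype event: a spatio-temporal Gaussian blob
f(x, y, t) = g(x, y, t; \sigma_0^2, \tau_0^2)

% Smoothing a Gaussian with a Gaussian adds the variances
L(x, y, t; \sigma^2, \tau^2) = g(\cdot; \sigma^2, \tau^2) * f
                             = g(x, y, t; \sigma_0^2 + \sigma^2, \tau_0^2 + \tau^2)
```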

13
Scale selection in space-time (contd.)

We want to find a differential operator that
assumes simultaneous extrema over spatial and
temporal scales that are characteristic of this
Gaussian prototype event
To recover the spatio-temporal extent of f, we
consider second order derivatives of L normalized
by the scales as
Requiring that the normalized 2nd order derivatives
above assume their extrema at the scales
$\sigma^2 = \sigma_0^2$ and $\tau^2 = \tau_0^2$ of the
blob, we get a = 1, b = 1/4, c = 1/2 and d = 3/4.
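The definition of the normalized derivatives is not preserved in the transcript. Assuming the parameterization used in Laptev's paper, the exponents a, b, c and d enter as powers of the spatial scale $\sigma$ and the temporal scale $\tau$:

```latex
L_{xx,norm} = \sigma^{2a} \tau^{2b} L_{xx}
L_{yy,norm} = \sigma^{2a} \tau^{2b} L_{yy}
L_{tt,norm} = \sigma^{2c} \tau^{2d} L_{tt}
```

The spatial derivatives share one pair of exponents (a, b) and the temporal derivative has its own pair (c, d), which is why four values are solved for.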

14
Scale selection in space-time (contd.)

We therefore define a normalized spatio-temporal
Laplace operator as follows
The following plots show that the zero crossings
correspond to the maxima that are detected at
$\sigma^2 = \sigma_0^2$ and $\tau^2 = \tau_0^2$
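The operator itself is missing from the transcript. Assuming the normalized derivatives have the form $\sigma^{2a}\tau^{2b}L_{xx}$, $\sigma^{2a}\tau^{2b}L_{yy}$ and $\sigma^{2c}\tau^{2d}L_{tt}$, substituting a = 1, b = 1/4, c = 1/2 and d = 3/4 gives:

```latex
\nabla^2_{norm} L = L_{xx,norm} + L_{yy,norm} + L_{tt,norm}
                  = \sigma^{2} \tau^{1/2} (L_{xx} + L_{yy}) + \sigma \tau^{3/2} L_{tt}
```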

15
Scale adapted space time interest points

So far we have found events that are local
extrema in the space time volume at a particular
choice of space and time scales
We would like to detect interest points that are
extrema over the space time volume as well as
over the scale of the scale-normalized Laplace
operator
The reason for doing so is that different events
would in general have different spatial and
temporal extents
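The fixed-scale step described on this slide, finding local extrema of a response in the space-time volume, can be sketched as follows. This is only an illustration under stated assumptions: the response volume is a nested list indexed as `volume[t][y][x]`, border voxels are skipped, and the function name and `threshold` parameter are hypothetical. The additional search over scales is not shown.

```python
from itertools import product


def local_maxima_3d(volume, threshold=0.0):
    """Return (t, y, x) indices that exceed `threshold` and all 26 neighbours.

    Border voxels are ignored because they lack a full neighbourhood.
    """
    T, Y, X = len(volume), len(volume[0]), len(volume[0][0])
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    peaks = []
    for t, y, x in product(range(1, T - 1), range(1, Y - 1), range(1, X - 1)):
        v = volume[t][y][x]
        if v > threshold and all(
            v > volume[t + dt][y + dy][x + dx] for dt, dy, dx in offsets
        ):
            peaks.append((t, y, x))
    return peaks


# Tiny 3x3x3 response volume with a single peak at the centre.
vol = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
vol[1][1][1] = 5.0
print(local_maxima_3d(vol))  # [(1, 1, 1)]
```

In a scale-adapted detector the same kind of test would also be applied across neighbouring spatial and temporal scales, using the scale-normalized Laplace operator as the selection criterion.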

16
Algorithm for detecting interest points
17
Results on a previously used synthetic example
Note that all the extrema are detected
irrespective of their spatial and temporal extents
DOUBT: Why are these points not detected as
interest points?
18
Results of the algorithm on real seq.
Note that events of all spatial and temporal
extents are captured. The size of the circle
shows the spatial extent of the event
19
Results of interest pt. detection
Note that the regularity and extent of the
spatio-temporal interest points are
representative of the true events in time
20
Classification of events

Every interest point is described by its local
spatio-temporal neighborhood, and we compare
these neighborhoods to classify events
The neighborhood of an interest point is defined
by evaluating the following event descriptors

This normalization guarantees the invariance of
the derivative response to image scaling
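The descriptor definition is not preserved in the transcript. A plausible reconstruction, assuming the slide follows Laptev's paper, is a vector (jet) of scale-normalized Gaussian derivatives. The order of the jet shown here is an assumption.

```latex
% Event descriptor: normalized spatio-temporal derivatives at the interest point
j = (L_x, L_y, L_t, L_{xx}, \ldots, L_{tttt})

% Normalization by the detected scales \sigma (space) and \tau (time)
L_{x^m y^n t^k} = \sigma^{m+n} \tau^{k} \left(\partial_{x^m y^n t^k} g\right) * f
```

Multiplying each derivative by the matching power of the scale cancels the factor that a rescaling of the image would otherwise introduce.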
21
Classification of events (contd.)

To compare two events, we compute the Mahalanobis
distance between their descriptors as
To detect similar events in the given data, we
apply k-means clustering to the event descriptors
and thus detect groups of interest points with
similar spatio-temporal neighbourhoods
Once the cluster centers are evaluated from the
training data, given a new event, we evaluate its
distance from the cluster centers. If the
distance from all the centers is above a
threshold we declare it as a background event.
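The classification rule described on this slide can be sketched as follows. This is a minimal illustration, assuming the cluster centers and the inverse covariance matrix have already been estimated from training data. The function names, the toy 2D descriptors and the threshold value are hypothetical, and the threshold is applied to the squared distance.

```python
def mahalanobis_sq(a, b, inv_cov):
    """Squared Mahalanobis distance (a - b) * inv_cov * (a - b)^T."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(n) for j in range(n))


def classify_event(descriptor, centers, inv_cov, threshold):
    """Index of the nearest cluster centre, or None for a background event."""
    dists = [mahalanobis_sq(descriptor, c, inv_cov) for c in centers]
    best = min(range(len(centers)), key=dists.__getitem__)
    return best if dists[best] <= threshold else None


centers = [[0.0, 0.0], [10.0, 10.0]]   # toy cluster centres
inv_cov = [[1.0, 0.0], [0.0, 0.25]]    # inverse of diag(1, 4)

print(classify_event([1.0, 1.0], centers, inv_cov, threshold=4.0))  # 0
print(classify_event([5.0, 5.0], centers, inv_cov, threshold=4.0))  # None
```

The second descriptor is far from every center, so it is treated as a background event rather than being forced into the nearest class.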

22
Results of classification
23
Recognizing gaits

We extract the following features from the
spatio-temporal volume
Positions of the interest points
The corresponding scales
The class of interest points
We introduce a state for the model determined by
the vector X, where the variables are
Position of person in the image
His/her size
Frequency of the gait
Phase of the gait at current moment
Temporal variations of

24
Recognizing gaits (contd.)

We then have the following model for walking
Such a model helps handle translations as well as
uniform rescaling in the image and the temporal
domain

25
Recognizing gaits (contd.)

Given a model state X, a current time, the
length of a time window, and the set of data
features detected within that recent time window,
the match between the model and the data is
defined by a weighted sum of distances h between
the model features and the data features.
Each model feature is paired with the data feature
that minimizes the distance h to it, and the
weights come from an exponential function with
its own variance parameter.

26
Recognizing gaits (contd.)

To find the best match between the model and the
data, we search for the model state that
minimizes this weighted sum of distances
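The matching formula is not preserved in the transcript, so the following is only a sketch of the idea under loudly stated assumptions: features are `(x, y, t)` tuples, the distance `h` is a plain squared difference, the weight decays exponentially with the feature's temporal distance from the current time, and the best state is found by exhaustive search over a small candidate list. None of these specifics come from the slides.

```python
import math


def feature_distance(m, d):
    """Hypothetical distance h: squared difference over (x, y, t)."""
    return sum((a - b) ** 2 for a, b in zip(m, d))


def match_score(model_feats, data_feats, t0, variance):
    """Weighted sum of each model feature's distance to its closest data feature."""
    total = 0.0
    for m in model_feats:
        h = min(feature_distance(m, d) for d in data_feats)
        # Assumed weighting: features far from the current time t0 count less.
        weight = math.exp(-((m[2] - t0) ** 2) / variance)
        total += weight * h
    return total


def best_state(candidates, data_feats, t0, variance):
    """Return the label of the candidate (label, model_feats) with the lowest score."""
    return min(candidates,
               key=lambda c: match_score(c[1], data_feats, t0, variance))[0]


data = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
candidates = [
    ("aligned", [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]),
    ("shifted", [(5.0, 5.0, 0.0), (6.0, 5.0, 1.0)]),
]
print(best_state(candidates, data, t0=1.0, variance=4.0))  # aligned
```

A real implementation would generate the model features from the state vector and would likely use a numerical optimizer instead of enumerating candidates.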

27
Summary of the approach

An interest point detector is developed that
finds local image features that show high
variation of the image values in space and in
time
The spatio-temporal extents of detected events
can be estimated by using a normalized Laplacian
operator
The neighborhoods of the events are described
using scale invariant spatio-temporal descriptors
Different actions are then compared by checking
for the matches between the event descriptors

28
Actions as objects: Action sketches

This method analyzes the spatio-temporal volume
by using the differential geometric surface
properties such as peaks, pits, valleys and
ridges
The authors claim that these are important action
descriptors as they capture both spatial and
temporal properties
These descriptors are related to the convex and
concave parts of the object contours and/or to
the maxima in the spatio-temporal curvature of a
trajectory, and are hence view invariant.

29
STV: a collection of contours

In this approach the spatio-temporal volume is
really a hollow solid object whose boundaries are
defined by the contours of the boundaries of a
person in every image frame.
It is assumed that the STV can be considered as a
manifold, which helps us to consider small
neighborhoods around a point to be nearly flat.
Since the STV is really the time evolution of a
contour, we can define a 2D parametric
representation by considering arc length s of the
contour and time t.

30
STV: a collection of contours (contd.)
t varying, s fixed
s varying, t fixed
The STV is a continuous representation in the
normalized time scale, and it
does not require any time warping for matching two
sequences of different lengths.
31
Action descriptors

We want to compute action descriptors that
correspond to changes in direction, speed and
shape of parts of contour
Changes in these quantities are reflected on the
surface of the STV and can be computed using
differential geometry by identifying different
landmarks.
These landmarks can be classified on the basis of
the local curvatures at points on the STV

32
Action descriptors (contd.)

Differential geometry gives us the concept of
Gaussian Curvature K and Mean Curvature H that
can be evaluated at points on the manifold of the
STV. These curvatures exhibit invariance to
algebraic transformations such as translation and
rotation.
Local extrema of these curvatures can therefore
be used to identify interest points for
describing actions
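For reference, the standard differential-geometry definitions of the two curvatures are given here for a surface $\mathbf{x}(s, t)$ with unit normal $\mathbf{n}$. The slides do not show these formulas; the symbols E, F, G (first fundamental form) and L, M, N (second fundamental form) are the textbook ones. L here is unrelated to the scale-space representation used earlier.

```latex
E = \mathbf{x}_s \cdot \mathbf{x}_s, \quad
F = \mathbf{x}_s \cdot \mathbf{x}_t, \quad
G = \mathbf{x}_t \cdot \mathbf{x}_t

L = \mathbf{x}_{ss} \cdot \mathbf{n}, \quad
M = \mathbf{x}_{st} \cdot \mathbf{n}, \quad
N = \mathbf{x}_{tt} \cdot \mathbf{n}

K = \frac{LN - M^2}{EG - F^2}, \qquad
H = \frac{EN + GL - 2FM}{2\,(EG - F^2)}
```

K is the product of the two principal curvatures and H is their mean, so both are unchanged when the surface is rotated or translated.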

33
Action descriptors (contd.)

The following table shows the different surface
types and their associated curvatures
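The table is not preserved in the transcript. A common way to classify surface types from the signs of K and H is sketched here. The sign convention (negative H for peaks and ridges) depends on the orientation chosen for the surface normal and is an assumption, as are the function name and the tolerance `eps`.

```python
def surface_type(K, H, eps=1e-6):
    """Classify a surface point from the signs of Gaussian (K) and mean (H) curvature.

    Values with magnitude below `eps` are treated as zero.
    Assumed convention: H < 0 for peaks and ridges.
    """
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    table = {
        (1, -1): "peak",
        (1, 1): "pit",
        (0, -1): "ridge",
        (0, 1): "valley",
        (0, 0): "flat",
        (-1, -1): "saddle ridge",
        (-1, 1): "saddle valley",
        (-1, 0): "minimal",
    }
    # K > 0 with H = 0 cannot occur on a real surface, since H**2 >= K.
    return table.get((k, h), "undefined")


print(surface_type(0.5, -1.0))   # peak
print(surface_type(0.0, -1.0))   # ridge
print(surface_type(-0.5, 1.0))   # saddle valley
```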

34
Analysis of action descriptors

We consider three types of contours: concave
contours, convex contours and straight contours
The following contours generate typical landmarks
in the spatio-temporal volume
Straight contour: ridge, valley or flat surface
Convex contour: peak, ridge or saddle ridge
Concave contour: pit, valley or saddle valley

Shapes generated from straight contours
35
STVs corresponding to hand motion
The STV generated by a hand staying stable. Such
a motion (or lack of it) creates a ridge
36
STVs corresponding to hand motion
The STV created by a hand that first moves
downwards and then upwards. Note that a saddle
ridge is created at the point of change of motion
37
Properties of the event descriptors

The landmarks discussed so far are essentially
produced due to stable motion or change in stable
motion.
The stability of motion ensures that the STV
is smooth enough that one can consider valid
local planar neighborhoods at its points
Some of the landmarks are related to the
curvature of the point trajectories and body
contours as follows

38
View invariance of event descriptors

Since the landmarks are associated with extrema
of local curvatures, even when the view changes
the transformed landmarks are extrema in the new
STV
DOUBT: Not very confident about the
derivation of the above
Due to this view invariance, comparing two STVs
is equivalent to checking if there is a
valid Fundamental Matrix relating the set of
event descriptors in 2 given action volumes.

Derived formula relating curvatures of
corresponding points in 2 different views
39
Comparing two actions

We check if a linear system of the following kind
is satisfied by the event descriptors in both the
actions
This boils down to checking whether the smallest
(last) singular value of A is 0. From a set of
possible matches between the input action sketch
and the known action sketches, we select the
action with the minimum matching score
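The linear system is not preserved in the transcript. Assuming it is the standard epipolar constraint $\mathbf{x}'^T F \mathbf{x} = 0$ written out for each pair of matched landmarks $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$, it would take this form:

```latex
A \mathbf{f} = 0, \qquad
\mathbf{f} = (F_{11}, F_{12}, F_{13}, F_{21}, F_{22}, F_{23}, F_{31}, F_{32}, F_{33})^T

% i-th row of A, one row per matched pair of landmarks
A_i = (x'_i x_i,\; x'_i y_i,\; x'_i,\; y'_i x_i,\; y'_i y_i,\; y'_i,\; x_i,\; y_i,\; 1)
```

A non-trivial solution f exists exactly when A is rank deficient, which is what a zero smallest singular value indicates. With noisy landmarks that singular value is small but non-zero, so it can serve as a matching score.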

40
Summary of the approach

Using concepts of differential geometry, extract
interest points (action sketches) that carry local
spatio-temporal information by virtue of being
local extrema of curvatures in space-time
These event descriptors are associated with
uniform motion or stable changes in uniform
motion
Since the action sketches are view invariant,
comparing 2 actions is equivalent to checking if
there is a valid Fundamental Matrix relating the
positions of the action sketches for the
individual actions.
