Recognizing Action at a Distance - PowerPoint PPT Presentation

Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Recognizing Action at a Distance


1
Recognizing Action at a Distance
  • A.A. Efros, A.C. Berg, G. Mori, J. Malik
  • UC Berkeley

2
Looking at People
Far field
  • 3-pixel man
  • Blob tracking
  • vast surveillance literature
Near field
  • 300-pixel man
  • Limb tracking
  • e.g. Yacoob & Black, Rao & Shah, etc.

3
Medium-field Recognition
4
Appearance vs. Motion
5
Goals
  • Recognize human actions at a distance
  • Low resolution, noisy data
  • Moving camera, occlusions
  • Wide range of actions (including non-periodic)

6
Our Approach
  • Motion-based approach
  • Non-parametric: use large amount of data
  • Classify a novel motion by finding the most
    similar motion from the training set
  • Related Work
  • Periodicity analysis
  • Polana & Nelson; Seitz & Dyer; Bobick et al.;
    Cutler & Davis; Collins et al.
  • Model-free
  • Temporal Templates (Bobick & Davis)
  • Orientation histograms (Freeman et al.; Zelnik &
    Irani)
  • Using MoCap data (Zhao & Nevatia; Ramanan &
    Forsyth)
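
The non-parametric idea on this slide (label a novel motion with the label of its most similar training motion) can be sketched as a nearest-neighbor lookup. This is only an illustration: the function names, the toy descriptors, and the dot-product similarity are assumptions, not the authors' code.

```python
def nearest_neighbor_label(query, training_set, similarity):
    """Return the label of the training example most similar to `query`.

    `training_set` is a list of (descriptor, label) pairs; `similarity`
    is any function where a larger value means "more alike".
    """
    _, best_label = max(training_set, key=lambda pair: similarity(query, pair[0]))
    return best_label


def dot(a, b):
    # Placeholder similarity (assumption): dot product of two flat descriptors.
    return sum(x * y for x, y in zip(a, b))


# Toy training set with made-up 3-element descriptors.
training = [
    ([1.0, 0.0, 0.0], "run"),
    ([0.0, 1.0, 0.0], "walk"),
    ([0.0, 0.0, 1.0], "jog"),
]
label = nearest_neighbor_label([0.1, 0.9, 0.2], training, dot)
```

No model is fitted here; accuracy depends entirely on how much labeled data is stored and on how good the similarity measure is.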

7
Gathering action data
  • Tracking
  • Simple correlation-based tracker
  • User-initialized
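
A minimal sketch of what a simple, user-initialized correlation tracker might look like. The slide does not say which correlation measure or search strategy was used, so the normalized cross-correlation score, the small search window, and all names below are assumptions.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equal-size 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def crop(frame, top, left, h, w):
    return [row[left:left + w] for row in frame[top:top + h]]


def track(frame, template, prev, radius=2):
    """Search within `radius` pixels of `prev` (top, left) for the best match."""
    h, w = len(template), len(template[0])
    best, best_pos = -2.0, prev
    for top in range(max(0, prev[0] - radius), min(len(frame) - h, prev[0] + radius) + 1):
        for left in range(max(0, prev[1] - radius), min(len(frame[0]) - w, prev[1] + radius) + 1):
            score = ncc(crop(frame, top, left, h, w), template)
            if score > best:
                best, best_pos = score, (top, left)
    return best_pos


# Toy frame: a 2x2 pattern placed at row 3, column 2 of a 6x6 image.
frame = [[0] * 6 for _ in range(6)]
frame[3][2:4] = [1, 2]
frame[4][2:4] = [3, 4]
template = [[1, 2], [3, 4]]
position = track(frame, template, prev=(2, 1))  # `prev` plays the role of the user's click
```

In a video, `prev` would be the user's initial box in the first frame and the tracker's own output in every later frame.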

8
Figure-centric Representation
  • Stabilized spatio-temporal volume
  • No translation information
  • All motion caused by person's limbs
  • Good news: indifferent to camera motion
  • Bad news: hard!
  • Good test to see if actions, not just
    translation, are being captured
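
The stabilization step can be illustrated by cropping a fixed-size window around the tracked position in every frame, so the figure stays centered and translation drops out. This is a sketch under stated assumptions: the window size, the zero padding, and the function name are all hypothetical.

```python
def stabilize(frames, centers, half):
    """Crop a (2*half+1)-square window around each tracked centre (row, col).

    Pixels that fall outside the frame are filled with 0 so that every
    crop has the same size.
    """
    volume = []
    for frame, (cy, cx) in zip(frames, centers):
        rows, cols = len(frame), len(frame[0])
        window = [
            [frame[y][x] if 0 <= y < rows and 0 <= x < cols else 0
             for x in range(cx - half, cx + half + 1)]
            for y in range(cy - half, cy + half + 1)
        ]
        volume.append(window)
    return volume


def make_frame(y, x, size=5):
    # Toy frame with a single bright pixel standing in for the person.
    frame = [[0] * size for _ in range(size)]
    frame[y][x] = 9
    return frame


centers = [(2, 1), (2, 2), (2, 3)]            # the "person" moves right
frames = [make_frame(y, x) for y, x in centers]
volume = stabilize(frames, centers, half=1)
```

After stabilization every window in `volume` is identical, which is the point of the slide: once translation is removed, only limb motion is left to distinguish actions.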

9
Remembrance of Things Past
  • Explain novel motion sequence by matching to
    previously seen video clips
  • For each frame, match based on some temporal
    extent
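
Matching each frame "based on some temporal extent" can be sketched as follows, assuming a frame-to-frame similarity matrix `S` is already available. The uniform weighting over the temporal window and the names used are assumptions made for illustration.

```python
def aggregate(S, half_extent):
    """Sum frame similarities along the diagonal within +/- half_extent frames.

    S[i][j] is the similarity of query frame i and database frame j.
    """
    n, m = len(S), len(S[0])
    M = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(-half_extent, half_extent + 1):
                if 0 <= i + k < n and 0 <= j + k < m:
                    M[i][j] += S[i + k][j + k]
    return M


def best_matches(M):
    """Index of the best database frame for every query frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in M]


# Toy similarities: query frame 0 looks most like database frame 3 on its own,
# but its neighbours in time support database frame 1.
S = [
    [0.2, 0.8, 0.1, 0.9, 0.0],
    [0.1, 0.2, 0.9, 0.1, 0.3],
    [0.0, 0.1, 0.2, 0.9, 0.1],
]
frame_only = best_matches(S)
with_context = best_matches(aggregate(S, half_extent=1))
```

In this toy case the single-frame match for query frame 0 is a spurious one, and the temporally aggregated score recovers the consistent alignment.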

input sequence
Challenge: how to compare motions?
10
How to describe motion?
  • Appearance
  • Not preserved across different clothing
  • Gradients (spatial, temporal)
  • Same problem (e.g. contrast reversal)
  • Edges/Silhouettes
  • Too unreliable
  • Optical flow
  • Explicitly encodes motion
  • Least affected by appearance
  • but too noisy

11
Spatial Motion Descriptor
Image frame
Optical flow
12
Spatio-temporal Motion Descriptor


(Figure: Sequence A vs. Sequence B; labels: S, t)
13
Football Actions: matching
Input Sequence
Matched Frames
14
Football Actions: classification
10 actions; 4500 total frames; 13-frame motion
descriptor
15
Classifying Ballet Actions
16 actions; 24800 total frames; 51-frame motion
descriptor. Men used to classify women and vice
versa.
16
Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion
descriptor. Woman player used for training, man for
testing.
17
Classifying Tennis
  • Red bars show classification results

18
Querying the Database
input sequence
database
19
2D Skeleton Transfer
  • We annotate database with 2D joint positions
  • After matching, transfer data to novel sequence
  • Adjust the match for best fit

Input sequence
Transferred 2D skeletons
20
3D Skeleton Transfer
  • We populate database with rendered stick figures
    from 3D Motion Capture data
  • Matching as before, we get 3D joint positions
    (kind of)!

Input sequence
Transferred 3D skeletons
21
"Do as I Do" Motion Synthesis
input sequence
synthetic sequence
  • Matching two things:
  • Motion similarity across sequences
  • Appearance similarity within sequence (like
    VideoTextures)
  • Dynamic Programming

22
Do as I Do
Source Motion
Source Appearance
3400 Frames
Result
23
"Do as I Say" Synthesis
run, walk left, swing, walk right, jog
run, jog, swing, walk right, walk left
synthetic sequence
  • Synthesize given action labels
  • e.g. video game control

24
Do as I Say
  • Red box shows when constraint is applied

25
Actor Replacement
SHOW VIDEO
26
Conclusions
  • In medium field, action is about motion
  • What we propose
  • A way of matching motions at coarse scale
  • What we get out
  • Action recognition
  • Skeleton transfer
  • Synthesis: "Do as I Do", "Do as I Say"
  • What we learned?
  • A lot to be said for the little guy!

27
Thank You
28
Smoothness for Synthesis
  • One term is the action similarity between source and
    target
  • The other is the appearance similarity within target
    frames
  • For every source frame i, find the best target frame
    by maximizing a cost function that combines the two
  • Optimize using dynamic programming
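
The dynamic-programming step can be sketched as a Viterbi-style search. The exact cost function did not survive in this transcript, so the additive form below, the weight `w_app`, and the names `act`, `app`, and `synthesize` are assumptions chosen only to illustrate the idea.

```python
def synthesize(act, app, w_app=1.0):
    """Pick one target frame per source frame by dynamic programming.

    act[i][j]: action similarity between source frame i and target frame j.
    app[j][k]: appearance similarity when target frame k follows target frame j.
    Maximizes sum_i act[i][p_i] + w_app * sum_i app[p_i][p_(i+1)]  (assumed form).
    """
    n, m = len(act), len(act[0])
    score = list(act[0])          # best score of any path ending in each target frame
    back = []                     # back-pointers, one list per source frame after the first
    for i in range(1, n):
        prev_best = [max(range(m), key=lambda j: score[j] + w_app * app[j][k])
                     for k in range(m)]
        score = [score[prev_best[k]] + w_app * app[prev_best[k]][k] + act[i][k]
                 for k in range(m)]
        back.append(prev_best)
    # Trace the best path backwards from the best final target frame.
    k = max(range(m), key=score.__getitem__)
    path = [k]
    for prev_best in reversed(back):
        k = prev_best[k]
        path.append(k)
    return path[::-1]


# Toy data: consecutive target frames (j -> j+1) look smooth together.
act = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.6], [0.0, 0.0, 1.0]]
app = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
smooth_path = synthesize(act, app)
greedy_path = synthesize(act, app, w_app=0.0)
```

With the appearance term switched off, the second source frame jumps to the target frame with the slightly higher action score. With it switched on, the path stays on consecutive target frames, which is the smoothness this slide is about.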

29
The Database Analogy
30
Conclusions
  • Action is about motion
  • Purely motion-based descriptor for actions
  • We treat optical flow
  • Not as measurement of pixel displacement
  • But as a set of noisy features that are carefully
    smoothed and aggregated
  • Can handle very poor, noisy data
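
One way to treat flow as smoothed, aggregated features is to split each flow component into non-negative channels before blurring, so that opposite motions do not cancel. The sketch below assumes four half-wave rectified channels and a 3x3 box blur standing in for a Gaussian. The slides do not spell out these details, so treat every name and parameter here as hypothetical.

```python
def half_wave(field):
    """Split a signed 2D field into non-negative positive and negative parts."""
    pos = [[max(v, 0.0) for v in row] for row in field]
    neg = [[max(-v, 0.0) for v in row] for row in field]
    return pos, neg


def box_blur(field):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    rows, cols = len(field), len(field[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            vals = [field[j][i]
                    for j in range(max(0, y - 1), min(rows, y + 2))
                    for i in range(max(0, x - 1), min(cols, x + 2))]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out


def motion_channels(fx, fy):
    """Return blurred channels in the order [fx+, fx-, fy+, fy-]."""
    channels = [*half_wave(fx), *half_wave(fy)]
    return [box_blur(c) for c in channels]


# Toy flow: left column moves right, right column moves left, no vertical motion.
fx = [[1.0, -1.0], [1.0, -1.0]]
fy = [[0.0, 0.0], [0.0, 0.0]]
channels = motion_channels(fx, fy)
naive = box_blur(fx)  # blurring the signed field directly
```

In this toy field the directly blurred flow averages to zero, while the rectified channels still record that there was both rightward and leftward motion.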

31
Cool Video, Attempt II
32
(No Transcript)
33
Comparing motion descriptors



