Title: Toward Learning Mixture-of-Parts Pictorial Structures
1 Toward Learning Mixture-of-Parts Pictorial Structures
School of Electrical Engineering and Computer Science, Oregon State University
2 Talk Objectives
- Overview of the OSU Digital Scout Project
- Describe the problem of initial formation labelling
  - Representational and inference challenges
- Mixture-of-Parts Pictorial Structures
  - Model definition
  - Inference
- Opportunities for learning
  - Parameters and structure
  - Speedup learning
  - Active learning
  - Transfer learning
3 The OSU Digital Scout Project
Objective: compute semantic interpretations of football video
High-level interpretation of play
Raw video
- Professional/college teams spend many hours attaching semantic tags to video for DB access
- We want to make this process much more automatic
- Support computer-assisted strategic analysis of opponents
Previous Work: S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999.
4 Raw Video Data
- Obtained several games' worth of home field video from the OSU football team
- One video file per play
- Exact same video used by coaches
- Video shot from a single fixed location at the top of Reser Stadium
- Camera is constantly panning and zooming
5 Registered Video Data
- Semantic interpretation requires registration of video data to football field coordinates
- Developed robust registration approach (Hess & Fern, CVPR07)
planar homography
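A planar homography maps image pixels to field coordinates through a 3x3 matrix acting on homogeneous coordinates. The sketch below shows only that mapping step. The function name `apply_homography` and the matrix `H_EXAMPLE` are illustrative assumptions, not code or values from the registration approach cited above.

```python
def apply_homography(H, point):
    """Map an image pixel (x, y) to field coordinates with a 3x3 homography H."""
    x, y = point
    # Multiply H by the homogeneous point (x, y, 1).
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by w to return to ordinary 2D coordinates.
    return (u / w, v / w)


# Hypothetical matrix, chosen only to exercise the function.
H_EXAMPLE = [
    [0.1, 0.0, 5.0],
    [0.0, 0.2, -2.0],
    [0.0, 0.001, 1.0],
]

field_x, field_y = apply_homography(H_EXAMPLE, (100.0, 500.0))
```

Because the camera pans and zooms, a different matrix would be needed for each frame; estimating that matrix is the registration problem itself and is not shown here.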
6 Problem: Formation Labelling
- We consider a subproblem of full play interpretation
- Given: initial registered video frame of a play
- Output: offensive formation
  - types and locations of 11 offensive players
player locations and types
Thousands of possible formations
7 Challenges in Formation Labelling
- Player appearances nearly identical
  - Appearance not useful for inferring player type
- Difficult to robustly segment individual players
  - Part-detector-style approaches are difficult to apply
8 Challenges in Formation Labelling
Different formations can differ in subtle ways
9 Problem Constraints
- A number of hard constraints imposed by rule book
  - Exactly 11 players
  - Exactly 7 players on line and 4 players behind line
  - Exactly 1 quarterback and 1 center
  - Location of center is at midfield or hash line
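The counting constraints above can be checked directly on a candidate set of player types. The following is a minimal sketch: the type labels beyond QBS, QB, C, LG, RG and LTE, and the assignment of labels to the "on line", "quarterback" and "center" groups, are assumptions made for illustration. The constraint on the center's location is not checked here.

```python
# Hypothetical grouping of player-type labels (an assumption, not the
# actual 34-type inventory used in the project).
ON_LINE = {"C", "LG", "RG", "LT", "RT", "LTE", "RTE", "LSE", "RSE"}
QUARTERBACKS = {"QB", "QBS"}
CENTERS = {"C"}


def is_legal_part_set(types):
    """Return True if a list of player types meets the counting constraints."""
    if len(types) != 11:
        return False
    on_line = sum(1 for t in types if t in ON_LINE)
    # With 11 players in total, 7 on the line implies 4 behind it.
    if on_line != 7:
        return False
    if sum(1 for t in types if t in QUARTERBACKS) != 1:
        return False
    if sum(1 for t in types if t in CENTERS) != 1:
        return False
    return True


example = ["C", "LG", "RG", "LT", "RT", "LTE", "RSE", "QB", "HB", "FB", "FL"]
example_is_legal = is_legal_part_set(example)
```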
10 Problem Constraints
- Soft constraints on relative spatial locations of players
- Constraints strongly depend on the set of player types
11 Previous Attempt
S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999.
- Intille used a KB of hard constraints to cast the task as a SAT-like problem
  - Constraints: near, to the left of, bit of vertical space between, etc.
- Simplified problem by hand-labelling the field locations of the 11 players
  - Only tried to infer player types
- The approach could not be made to work well and was abandoned in that previous work
12 Structured Output Representations
- Infer type and location for all of 11 players
  - t_i ∈ {QBS, QB, C, LG, RG, LTE, ...}, 34 types
  - l_i ∈ {(0,0), (0,1), ..., (n,m)}, pixel location
- Our representation must capture
  - Hard joint constraints among types
  - Soft joint constraints among locations conditioned on types and image data
22 output variables
- Possible to encode constraints via standard discrete factor-graph models (e.g. CRFs, weighted CSPs, ILP, etc.)
- Such encodings appear problematic w.r.t. off-the-shelf inference techniques (?)
  - Domains of variables are huge: many values
  - Large factors (e.g. exactly 7 line-type players)
  - Location constraints are inherently numeric
13 Pictorial Structures
- Offensive formations can be viewed as multi-part articulated objects (parts correspond to players)
- Pictorial structure models have been successful for multi-part objects in computer vision
  - Local part appearance models
  - Deformable connections
  - Joint estimation of part locations
node values are part locations
simply pairwise graphical models
Courtesy Fischler & Elschlager
15
- When the edge structure forms a tree, can use DP to compute the MAP estimate in O(nh^2) time
  - n = # of parts, h = # of pixels
  - h^2 is often impractical
- If in addition d_ij(. , .) is a Mahalanobis distance, then can do the computation in O(nh) time!
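The O(nh^2) dynamic program mentioned above can be sketched as a leaves-to-root min-sum pass over the tree. This is an illustrative version under several assumptions: locations are indexed 0..h-1 (one-dimensional here for brevity), parts are numbered so that every parent index is smaller than its child's, and the names `match_tree`, `UNARY`, `IDEAL_OFFSET` and `pair_cost` are hypothetical. The O(nh) variant based on distance transforms is not shown.

```python
def match_tree(parent, unary, pair_cost):
    """Min-cost placement of tree-structured parts in O(n * h^2) time.

    parent[i]  : index of the parent of part i (None for the root, part 0);
                 assumes parent[i] < i for every non-root part.
    unary[i][l]: appearance cost of placing part i at location index l.
    pair_cost(i, lp, lc): deformation cost of part i at lc, parent at lp.
    """
    n, h = len(parent), len(unary[0])
    cost = [list(row) for row in unary]  # accumulates subtree costs
    best_child_loc = [[0] * h for _ in range(n)]
    # Leaves-to-root pass: fold each part's best cost into its parent.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        for lp in range(h):
            best, arg = min(
                (cost[i][lc] + pair_cost(i, lp, lc), lc) for lc in range(h)
            )
            cost[p][lp] += best
            best_child_loc[i][lp] = arg
    # Pick the best root location, then backtrack root-to-leaves.
    root_cost, root_loc = min((cost[0][l], l) for l in range(h))
    locations = [root_loc] + [0] * (n - 1)
    for i in range(1, n):
        locations[i] = best_child_loc[i][locations[parent[i]]]
    return root_cost, locations


# Toy example: a root with two children on a line of 5 locations.
PARENT = [None, 0, 0]
UNARY = [[5, 1, 5, 5, 5], [5, 5, 5, 0, 5], [0, 5, 5, 5, 5]]
IDEAL_OFFSET = {1: 2, 2: -1}  # hypothetical ideal child-minus-parent offsets


def pair_cost(i, lp, lc):
    return (lc - lp - IDEAL_OFFSET[i]) ** 2


min_cost, placement = match_tree(PARENT, UNARY, pair_cost)
```

The inner `min` over child locations for every parent location is the source of the h^2 factor.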
16 Pictorial Structures for Football
- For a fixed set of player types, locations can be well approximated by a pictorial structure
- But part sets (i.e. player types) vary across plays
  - Can't use standard pictorial structures for our problem
- Can we still leverage the benefits of pictorial structures?
17 Mixture-of-Parts Pictorial Structures (MoPPS)
- Captures constraints on legal part sets via pv
- Captures spatial constraints among parts via f
18 MoPPS Inference
- Find MAP estimate of most likely set of parts and their locations
- Worst case: evaluate pictorial structure of each legal part set
  - Requires over an hour of processing for our problem
- Need a structured MoPPS representation that can be exploited for fast inference
  - We use a MoPPS Tree
19 MoPPS Tree Representation
- Pictorial structure for a legal part set is the projection of the global tree onto the part set
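One plausible reading of "projection" is that each part kept in the part set is attached to its nearest ancestor that is also kept. The sketch below implements that reading only; it is an assumption, not a definition taken from the talk, and the tree `GLOBAL_PARENT` and its labels are hypothetical.

```python
def project_tree(parent, part_set):
    """Project a global tree onto a subset of its parts.

    parent  : dict mapping each part to its parent (root maps to None).
    part_set: set of parts to keep.
    Returns a dict mapping each kept part to its nearest kept ancestor,
    or None if no ancestor is kept.
    """
    projected = {}
    for part in part_set:
        ancestor = parent.get(part)
        # Skip over ancestors that are absent from the part set.
        while ancestor is not None and ancestor not in part_set:
            ancestor = parent.get(ancestor)
        projected[part] = ancestor
    return projected


# Hypothetical global tree rooted at the center.
GLOBAL_PARENT = {
    "C": None,
    "QB": "C",
    "LG": "C",
    "LT": "LG",
    "LTE": "LT",
    "HB": "QB",
}

projected = project_tree(GLOBAL_PARENT, {"C", "QB", "LG", "LTE", "HB"})
```

Under this reading the result is a single tree whenever the root is in the part set, which holds if the root is a part required in every legal formation.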
20 MoPPS Tree for Football
- 34 parts in model (one for each possible player type)
  - Includes local observation models
  - Includes pairwise spatial constraints
- Also provide constraints for evaluating legal part sets
21 MoPPS Tree Inference
- Becomes combinatorial optimization over legal part sets
- We use Branch-and-Bound Search (BBS)
22 Branch-and-Bound Search
- Search nodes are part sets
  - Internal nodes represent sets of legal part sets
  - Leaves are legal part sets
- While solution not found
  - Expand least node according to ordering relation
  - Compute upper and lower bounds
  - Prune any dominated node
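The loop above can be sketched as a generic best-first branch-and-bound for cost minimization. This is a toy illustration, not the search used in the project: the ordering relation is assumed to be "smallest lower bound first", and the problem solved (choose exactly `K` parts with the smallest total cost from hypothetical per-part costs `COSTS`) stands in for pictorial-structure matching.

```python
import heapq


def branch_and_bound(root, expand, is_leaf, lower_bound, upper_bound):
    """Best-first branch-and-bound for a minimization problem.

    lower_bound(node) must never exceed the cost of any leaf below node and
    must equal the true cost when node is a leaf. upper_bound(node) returns
    (cost, leaf) for some legal completion of node.
    """
    best_cost, best_leaf = float("inf"), None
    pushes = 0  # tie-breaker so the heap never compares nodes directly
    frontier = [(lower_bound(root), pushes, root)]
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        if bound >= best_cost:
            break  # every remaining node is dominated by the incumbent
        if is_leaf(node):
            best_cost, best_leaf = bound, node
            continue
        for child in expand(node):
            child_bound = lower_bound(child)
            if child_bound >= best_cost:
                continue  # prune dominated child
            completion_cost, completion = upper_bound(child)
            if completion_cost < best_cost:
                best_cost, best_leaf = completion_cost, completion
            pushes += 1
            heapq.heappush(frontier, (child_bound, pushes, child))
    return best_cost, best_leaf


# Toy stand-in problem. A node is (next index to decide, chosen indices).
COSTS = [4.0, 1.0, 3.0, 2.0, 5.0]  # hypothetical non-negative part costs
K = 2


def part_set_cost(chosen):
    return sum(COSTS[j] for j in chosen)


def expand(node):
    i, chosen = node
    children = [(i + 1, chosen + (i,))]  # include part i
    if len(COSTS) - (i + 1) >= K - len(chosen):
        children.append((i + 1, chosen))  # exclude part i if still feasible
    return children


def upper_bound(node):
    i, chosen = node
    need = K - len(chosen)
    completed = chosen + tuple(range(i, i + need))  # quick legal completion
    return part_set_cost(completed), (i + need, completed)


best_cost, best_leaf = branch_and_bound(
    (0, ()), expand, lambda n: len(n[1]) == K,
    lambda n: part_set_cost(n[1]), upper_bound,
)
```

Because costs are non-negative, the cost of a partial set never decreases as parts are added, so it is a valid lower bound; any quick legal completion gives an upper bound.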
23 Lower Bound Computations
- Monotonicity: adding to a set of parts will never result in reduced cost
- Simply compute pictorial structure match of tree projected on parts in search node
- Can improve on this by adding cost for missing parts
24 Upper Bound Computations
- Match entire MoPPS tree to image data
- Use as a heuristic for quickly finding a legal completion of current part set
- Cost of completion is upper bound
25 MoPPS Tree Parameters for Football
- 34 parts, 3200 legal formations
  - 16 basic player types plus subtypes
- Connections modeled as Gaussian over ideal location relative to parent player
  - Parameters manually set using training images
- Observation model uses two independent components
  - based on background model
  - based on color histogramming
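A Gaussian over the ideal location relative to the parent corresponds, up to an additive constant, to a squared Mahalanobis distance when written as a cost (negative log-density). The sketch below assumes a diagonal covariance; the function name and the example offset and standard deviations are hypothetical, not the hand-set parameters.

```python
def gaussian_deformation_cost(child_loc, parent_loc, ideal_offset, sigma):
    """Negative log of a diagonal Gaussian, dropping the constant term.

    child_loc, parent_loc: (x, y) field locations.
    ideal_offset: ideal (dx, dy) of the child relative to its parent.
    sigma: (sigma_x, sigma_y) standard deviations (assumed diagonal).
    """
    dx = child_loc[0] - parent_loc[0] - ideal_offset[0]
    dy = child_loc[1] - parent_loc[1] - ideal_offset[1]
    return 0.5 * ((dx / sigma[0]) ** 2 + (dy / sigma[1]) ** 2)


# Hypothetical connection: child ideally 2 yards to the right of its parent.
cost_at_ideal = gaussian_deformation_cost((12.0, 5.0), (10.0, 5.0), (2.0, 0.0), (1.5, 1.0))
cost_one_sigma_off = gaussian_deformation_cost((13.5, 5.0), (10.0, 5.0), (2.0, 0.0), (1.5, 1.0))
```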
26 Background Model
- Register lots of video to field model
- Learn kernel density estimate of color at each pixel
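For a single field pixel, a kernel density estimate of color can be sketched as follows. The isotropic Gaussian kernel, the bandwidth, and the sample colors are assumptions for illustration; the talk does not specify the kernel used.

```python
import math


def kde_density(samples, color, bandwidth):
    """Gaussian kernel density estimate of a color at one field pixel.

    samples  : colors observed at this pixel across registered frames.
    color    : the color to score, same dimensionality as the samples.
    bandwidth: kernel standard deviation (an isotropic kernel is assumed).
    """
    dims = len(color)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dims / 2.0)
    total = 0.0
    for sample in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(sample, color))
        total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    return total / (len(samples) * norm)


# Hypothetical RGB samples for a grass pixel.
GRASS_SAMPLES = [(30, 120, 40), (35, 125, 45), (28, 118, 38)]
density_grass_like = kde_density(GRASS_SAMPLES, (32, 121, 41), 10.0)
density_white = kde_density(GRASS_SAMPLES, (250, 250, 250), 10.0)
```

A color with low density under the background model at its pixel is less likely to be field and more likely to belong to a player or other foreground object.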
29 Results
30 Anytime Behavior Correct
- Exhaustive search requires close to an hour
- Greedy search is fast but achieves only 80% accuracy
- Mean-squared location error less than a yard
31 Directions: Learning MoPPS Models
- Successfully hand-coded a MoPPS model
  - Was quite time-consuming to get parameters right
  - Motivates supervised structure and parameter learning
- MoPPS model takes an average of 4 minutes per play
  - Still too slow for weekly volume of game video
  - Motivates speedup learning
- MoPPS model will sometimes need to be relearned/adapted to different sets of video
  - Want to reduce labelling effort
  - Motivates active and transfer learning
32 Structure and Parameter Learning
- Goal: learn structure and parameters of MoPPS tree from labelled data
  - Assume hard constraints on legal part sets are provided
- There are algorithms for learning the structure of pictorial structures
  - Can easily modify to learn MoPPS tree
  - Easy to combine with generative parameter learning
33 Structure and Parameter Learning
- Issue: pure generative parameter learning will likely not be sufficient
  - Hand-coded model incorporates reward terms to make up for deficiencies in the generative observation model
  - Suggests augmenting generative model with discriminatively trained components
- Issue: inference time of 4 minutes makes most generative training methods quite expensive
  - Suggests using approaches that do not perform full joint inference for each parameter update
34 Speedup Learning
- How can we speed up branch-and-bound search?
- There are a number of interesting settings
- Setting 1
  - Given: a MoPPS model and upper/lower bound functions
  - Learn: effective search space operators
- Setting 2
  - Given: a MoPPS model and search space
  - Learn: more accurate upper/lower bound functions
- Setting 3
  - Given: a MoPPS model, search space, and possibly bounds
  - Learn: an effective priority queue ranking function
35 Active Model Calibration
- Want to minimize labelling effort for new video set
  - Active learning and/or semi-supervised learning
- Want to leverage experience with previous videos
  - Transfer learning
- How can we combine these two paradigms for label-efficient active model calibration?
  - User interface is also critical
- Very rough idea
  - Assume fixed model structure
  - Learn prior on parameters from previous data sets
  - Use prior for regularization and example selection
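As one simple instance of using a prior for regularization, consider a single scalar parameter (for example, one coordinate of an ideal offset) with a Gaussian prior learned from earlier video sets. The MAP estimate is then a precision-weighted average of the prior mean and the new labelled samples. This is an assumed textbook formulation, not the method proposed in the talk; all names and numbers are hypothetical.

```python
def map_mean(samples, prior_mean, prior_var, noise_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior.

    samples   : labelled measurements from the new video set.
    prior_mean: mean of the prior learned from previous data sets.
    prior_var : variance of that prior (smaller = stronger regularization).
    noise_var : assumed measurement variance of each sample.
    """
    prior_precision = 1.0 / prior_var
    data_precision = len(samples) / noise_var
    weighted_sum = prior_precision * prior_mean + sum(samples) / noise_var
    return weighted_sum / (prior_precision + data_precision)


# With no new labels the estimate stays at the prior mean; each new label
# pulls it toward the data.
estimate_no_labels = map_mean([], 5.0, 1.0, 1.0)
estimate_one_label = map_mean([7.0], 5.0, 1.0, 1.0)
```

With few labels the estimate stays close to what was learned from previous videos, which is the intended effect when labelling effort is limited.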
36 Summary and Future Work
- New structured output challenge problem
  - We will provide labelled data set
  - Can off-the-shelf structured learning approaches work?
- Suggests investigating lesser-studied directions
  - Speedup learning
  - Active calibration
- On the horizon
  - Applying to defensive formations
  - Full temporal play interpretation
  - Mining strategic knowledge
  - Strategic planning
37 The Digital Scout Project
http://eecs.oregonstate.edu/football