Title: Learning Layered Motion Segmentations of Video
1Learning Layered Motion Segmentations of Video
UNIVERSITY OF OXFORD
- M. Pawan Kumar
- Philip Torr
- Andrew Zisserman
2Aim
- Given a video, to learn a model for the object
Input Video
Output Model
- Model should (ideally)
- describe the object completely and accurately
- handle self-occlusion
- be learnt in an unsupervised manner
3Motivation
- Object Recognition and Segmentation
- Current object recognition methods often learn a model manually:
- Hand-labelling position of parts, OR
- Manually segmenting training images
Leibe and Schiele, DAGM 04
Borenstein and Ullman, ECCV 02
4Motivation
- Problem: Such supervised methods are manually intensive and practically infeasible
- Solution:
- Use readily available data such as videos
- Automatically learn models which can be used to perform object recognition.
5Challenges
Self Occlusion
Articulation
Lighting: c(y) = diag(a) c(x) + b
Motion Blur: c(y) = ∫ c(y - m(t)) dt
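The lighting and motion blur models on this slide can be illustrated with a minimal sketch. It assumes a per-channel gain a and offset b for lighting, and approximates the blur integral by averaging a 1D row over integer shifts 0..shift (a uniform linear motion). The function names, the 1D setting and the border clamping are illustrative assumptions, not the authors' implementation.

```python
def apply_lighting(c, a, b):
    """Lighting model c(y) = diag(a) c(x) + b for a single RGB value."""
    return tuple(ai * ci + bi for ai, ci, bi in zip(a, c, b))


def motion_blur_1d(row, shift):
    """Approximate c(y) = integral of c(y - m(t)) dt on a 1D row.

    Assumption: uniform motion, so the integral becomes the mean of the row
    sampled at y, y-1, ..., y-shift (indices clamped at the left border).
    """
    blurred = []
    for y in range(len(row)):
        samples = [row[max(y - k, 0)] for k in range(shift + 1)]
        blurred.append(sum(samples) / len(samples))
    return blurred


lit = apply_lighting((100, 50, 10), a=(1.0, 2.0, 0.5), b=(5, 0, 1))
blurred = motion_blur_1d([0, 0, 9, 0, 0], shift=2)
```

A single bright point is smeared over the positions it passes through, which is the effect the motion parameters m are meant to explain.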
6Using a Generative Model
- Parameters Θ
- Segments (mattes + appearance)
- Layering
- Transformations Θ_T^t
- Lighting parameters a and b
- Motion parameters m obtained using Θ_T^(t-1) and Θ_T^t
Latent Image
per segment per frame
7Learning the Model
- Given a video D we need to learn all model parameters Θ
- Segments (mattes + appearance)
- Layering
- Transformations
- Lighting and motion blur parameters
- We define the posterior Pr(Θ | D)
- This measures how well the generated frames match the observed data
- We learn the best model by maximizing Pr(Θ | D)
8Previous Work
- Sprite-based approach
- Jojic and Frey ICCV 01
- Williams and Titsias Neural Computation 04
- Restricted to translation, rotation
- Greedy optimisation
- Spatial continuity not considered
- Motion blur, lighting not handled
9Outline
- Model Description
- Learning the Model
- Initial Estimate
- Refining Mattes
- Updating appearance
- Refining Transformation
- Results
10Model Description
- Mattes of segments represented as binary masks.
- Appearance of a part: RGB value per point
- Θ_T: translation, rotation and anisotropic scale factors
11Layering
- Layer number li for segment pi
- For non-overlapping segments li = lj
13Energy of the model
- Pr(Θ | D) ∝ Pr(D | Θ) Pr(Θ)
- Energy Ψ = -log(Pr(D | Θ))
- Maximizing Pr(Θ | D) implies minimizing Ψ
- Ψ = Appearance + Boundary
14Appearance
Appearance measures consistency of observed and generated RGB values over the entire video sequence.
[Figure: generated frames compared with observed frames to give the Appearance term]
15Boundary
Boundary gives preference to parts that are separated by edges in most frames.
For neighbouring points x and y:
- If the intensities of x and y are similar, the penalty is larger.
- If the intensities of x and y are different, the penalty is smaller.
[Figure: penalty on energy Ψ]
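A contrast-dependent penalty of this kind can be sketched as follows. The exponential form and the parameters lam and sigma are assumptions chosen for illustration (a common choice in graph-cut segmentation); the slide only states that the penalty decreases as the intensity difference grows.

```python
import math


def boundary_penalty(ix, iy, same_part, lam=1.0, sigma=10.0):
    """Penalty for neighbouring points with intensities ix and iy.

    Hypothetical form: no penalty inside a part; across a part boundary the
    penalty is largest for equal intensities and decays with their difference.
    """
    if same_part:
        return 0.0
    return lam * math.exp(-((ix - iy) ** 2) / (2.0 * sigma ** 2))


similar = boundary_penalty(100, 105, same_part=False)
different = boundary_penalty(100, 150, same_part=False)
```

With such a term, a boundary placed along a strong image edge costs less than one cutting through a uniform region.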
16Our Approach
- 1) An initial estimate of Θ is obtained by dividing the scene into rigidly moving components.
- 2) Mattes are optimised using graph cuts.
- 3) Appearance parameters are updated.
- 4) Transformation, lighting and motion blur parameters are re-estimated.
17Outline
- Model Description
- Learning the Model
- Initial Estimate
- Refining Mattes
- Updating appearance
- Refining Transformation
- Results
181. Initial Estimate
- Divide frame n into rectangular patches fi (e.g. 3x3)
- Track the patches to obtain the reconstructed frame n+1
19Tracking Patches
[Figure sequence: an MRF over the patches of frame n, with sites n1, n2, n3, ..., nj, nk. Site nk holds patch fk and a candidate transformation tk that maps it into frame n+1.]
- Unary potential ψ(tk) for each candidate transformation of patch fk, e.g. ψ(tk) = 0.6, 0.9 and 0.7 for three different candidates
- Pairwise potential for neighbouring sites nj and nk: ψjk(tj,tk) = d1 if rigid motion, d2 otherwise
24Tracking Patches
- Pr(t) ∝ Π ψ(ti) Π ψ(ti,tj)
- Inference using belief propagation
- Time complexity
- Speed-up using Distance Transforms
- Felzenszwalb and Huttenlocher, NIPS 2004
- Memory requirements
- Coarse-to-fine strategy
- Vogiatzis et al., BMVC 2004
- Multiple coarse labels chosen instead of best one
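As a minimal sketch of the inference step, the code below runs max-product message passing on a chain of sites, where it is exact. The slides use an MRF over a grid of patches, where belief propagation is only approximate, and the speed-ups listed above are not included. The potentials, the values d1 = 1.0 and d2 = 0.2, and the reduction of "rigid motion" to "same label" are illustrative assumptions.

```python
def max_product_chain(unary, pairwise):
    """Most probable labelling of a chain MRF by max-product message passing.

    unary[i][a] is the potential of label a at site i (higher is better);
    pairwise(a, b) is the potential between labels of neighbouring sites.
    """
    n, k = len(unary), len(unary[0])
    msg = [[1.0] * k for _ in range(n)]   # msg[i][b]: best score of sites 0..i-1 given label b at i
    back = [[0] * k for _ in range(n)]    # back[i][b]: best label at site i-1 given label b at i
    for i in range(1, n):
        for b in range(k):
            scores = [msg[i - 1][a] * unary[i - 1][a] * pairwise(a, b) for a in range(k)]
            best = max(range(k), key=scores.__getitem__)
            msg[i][b], back[i][b] = scores[best], best
    # Pick the best label at the last site, then follow the back-pointers.
    label = max(range(k), key=lambda b: msg[n - 1][b] * unary[n - 1][b])
    labels = [label]
    for i in range(n - 1, 0, -1):
        label = back[i][label]
        labels.append(label)
    return labels[::-1]


d1, d2 = 1.0, 0.2  # hypothetical values: same motion vs. different motion
unary = [[0.6, 0.4], [0.5, 0.5], [0.1, 0.9]]
labels = max_product_chain(unary, lambda a, b: d1 if a == b else d2)
```

The first site prefers label 0 on its own, but the pairwise term pulls it to the label its neighbours strongly support.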
25Coarse-to-fine Strategy
[Figure sequence: an MRF with sites n1, n2, n3, ..., nj, nk]
- Start from the original MRF and identify similar labels
- Group similar labels into one representative label, with ψ(Ti) = maxj ψ(tj)
- Set ψ(Ti,Tj) = maxk,l ψ(tk,tl) and solve the coarser MRF using belief propagation
- Choose the m best representative labels per site
- Expand the labels to obtain a smaller MRF
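The label grouping can be sketched for a single site as below. The grouping of labels, the potentials and the helper names are hypothetical; in particular, this sketch ranks representative labels by their coarse unary potential, whereas the slides select them after solving the coarser MRF with belief propagation.

```python
def coarsen_unary(unary, groups):
    """Representative potential psi(Ti) = max over the fine labels tj in group i."""
    return [max(unary[t] for t in group) for group in groups]


def coarsen_pairwise(pairwise, groups):
    """psi(Ti, Tj) = max over fine labels tk in group i and tl in group j."""
    return [[max(pairwise(tk, tl) for tk in gi for tl in gj) for gj in groups]
            for gi in groups]


def expand_best(coarse_scores, groups, m):
    """Keep the fine labels of the m best-scoring representative labels."""
    order = sorted(range(len(groups)), key=lambda i: coarse_scores[i], reverse=True)
    return [t for i in order[:m] for t in groups[i]]


# Six fine labels (e.g. translations) grouped into three representative labels.
groups = [[0, 1], [2, 3], [4, 5]]
unary = {0: 0.1, 1: 0.3, 2: 0.9, 3: 0.7, 4: 0.2, 5: 0.4}
coarse_unary = coarsen_unary(unary, groups)
coarse_pair = coarsen_pairwise(lambda a, b: 1.0 if a == b else 0.5, groups)
kept = expand_best(coarse_unary, groups, m=2)
```

The smaller MRF then only needs to consider the labels in kept at this site, which is where the memory saving comes from.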
30Tracking Patches
31Initial Estimate
Cluster rigidly moving points to obtain components
[Figure: Frame n, Frame n+1 and the resulting components]
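One simple way to picture this clustering step is a flood fill over the patch grid, sketched below. It assumes that "rigidly moving" can be approximated by neighbouring patches sharing an identical transformation label, which is a simplification; the function name and the 4-connectivity are also assumptions.

```python
from collections import deque


def rigid_components(motion):
    """Group 4-connected patches with equal motion into components.

    motion[r][c] is the (hypothetical) transformation label of patch (r, c).
    Returns a grid of component ids of the same shape.
    """
    rows, cols = len(motion), len(motion[0])
    comp = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if comp[r][c] != -1:
                continue
            comp[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed patch
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and comp[ny][nx] == -1
                            and motion[ny][nx] == motion[y][x]):
                        comp[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return comp


motion = [[(0, 0), (0, 0), (1, 0)],
          [(0, 0), (1, 0), (1, 0)]]
components = rigid_components(motion)
```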
32Initial Estimate
- Cluster components based on appearance (cross-correlation)
- Smallest member of a cluster is a segment
[Figure: components and the resulting segments]
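A normalised cross-correlation score of the kind that could be used to compare component appearances is sketched below. It assumes the two components have already been resampled to the same number of intensity values; how the slides align components before comparing them is not stated, so that part is left out.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equal-length intensity lists.

    Returns a value in [-1, 1]; 0.0 is returned if either input is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


same_pattern = ncc([1, 2, 3], [2, 4, 6])      # brighter copy of the same pattern
opposite_pattern = ncc([1, 2, 3], [3, 2, 1])  # reversed pattern
```

Because the score is invariant to gain and offset, two components showing the same texture under different lighting still correlate strongly.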
33- Object is not described completely
- Layering is not determined
We need to refine this estimate by minimizing Ψ
- Re-label surrounding points using
- consistency of motion
- consistency of texture
Form of Ψ suggests using Graph Cuts
34Graph Cuts
Consider the case of two segments.
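[Figure: a graph with terminal nodes ph and pt and one node per point x1, x2, x3, ..., xj, xk, ..., xn. Terminal edges carry weights such as W(x1,ph) and W(xn,pt), neighbouring points are joined by edges such as W(xj,xk), and the cut separates ph from pt.]
- W(xi,pj): appearance component
- W(xj,xk): boundary component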
36Graph Cuts
- The energy Ψ is of the form Σ D(fX) + Σ V(fX,fY)
- V is called regular if V(0,0) + V(1,1) <= V(0,1) + V(1,0)
- For LPS, V is regular.
- Theorem: If V is regular, then the minimum cut minimizes energy Ψ
- Kolmogorov and Zabih, PAMI 04.
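The theorem can be checked on a toy problem. The sketch below assumes a Potts-style pairwise term (penalty w when two neighbouring labels differ, which satisfies the regularity condition), builds the corresponding two-terminal graph, and finds the minimum cut with a plain Edmonds-Karp max-flow. All names and the numeric values are illustrative; this is not the construction or solver used by the authors.

```python
from collections import deque
from itertools import product


def is_regular(V):
    """Regularity condition for a binary pairwise term V(a, b)."""
    return V(0, 0) + V(1, 1) <= V(0, 1) + V(1, 0)


def energy(labels, unary, edges):
    """Data costs plus a Potts penalty w for every edge whose labels differ."""
    data = sum(unary[i][l] for i, l in enumerate(labels))
    return data + sum(w for (i, j), w in edges.items() if labels[i] != labels[j])


def min_cut_labels(unary, edges):
    """unary[i] = (D_i(0), D_i(1)); edges = {(i, j): w}. Returns 0/1 labels."""
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (d0, d1) in enumerate(unary):
        cap[s][i] = d1  # cut if i ends on the sink side (label 1)
        cap[i][t] = d0  # cut if i stays on the source side (label 0)
    for (i, j), w in edges.items():
        cap[i][j] += w
        cap[j][i] += w
    while True:
        parent = {s: None}          # breadth-first search in the residual graph
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                   # no augmenting path: flow is maximal
        flow, v = float("inf"), t
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            cap[parent[v]][v] -= flow
            cap[v][parent[v]] += flow
            v = parent[v]
    # Nodes still reachable from the source take label 0, the rest label 1.
    return [0 if i in parent else 1 for i in range(n)]


unary = [(1, 5), (4, 3), (6, 1)]
edges = {(0, 1): 2, (1, 2): 2}
cut_labels = min_cut_labels(unary, edges)
brute_force = min(energy(l, unary, edges) for l in product((0, 1), repeat=3))
```

On this example the labelling from the cut has the same energy as the exhaustive minimum over all eight labellings.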
37Multi-way Graph Cuts
- Each cut assigns labels pi and not-pi to points in the binary matte of segment pi
- Number of cuts = Number of parts
- Ideally, all cuts must be found simultaneously
- NP-hard problem
- αβ-swap / α-expansion algorithm
38 αβ-swap
- One pair of parts is considered at a time.
- All other parts are kept fixed.
- Points belonging to one part can be re-labelled as the other part.
[Figure labels: Relabel, Fixed]
39 α-expansion
- Iteratively find graph cuts
- A cut corresponding to one part is considered at a time
- All other parts are kept fixed
- Theorem: α-expansion finds a strong local minimum.
[Figure labels: Refine, Fixed]
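The expansion loop can be sketched on a 1D chain of points with a Potts boundary term. To keep the example short, the binary "keep current label or switch to α" sub-problem is solved exactly by dynamic programming along the chain rather than by a graph cut; on images a graph cut plays that role. The costs, the chain structure and the function names are assumptions made for illustration.

```python
def energy(labels, unary, w):
    """Data costs plus penalty w for every change of label along the chain."""
    data = sum(unary[i][l] for i, l in enumerate(labels))
    return data + sum(w for a, b in zip(labels, labels[1:]) if a != b)


def expansion_move(labels, alpha, unary, w):
    """Best move in which every site either keeps its label or takes alpha."""
    n = len(labels)
    options = [(labels[i], alpha) for i in range(n)]  # choice 0 = keep, 1 = alpha
    cost = [unary[0][options[0][c]] for c in (0, 1)]
    back = []
    for i in range(1, n):
        new_cost, ptr = [], []
        for c in (0, 1):
            cand = [cost[p] + (w if options[i - 1][p] != options[i][c] else 0)
                    for p in (0, 1)]
            best = 0 if cand[0] <= cand[1] else 1
            new_cost.append(cand[best] + unary[i][options[i][c]])
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    c = 0 if cost[0] <= cost[1] else 1
    choice = [c]
    for ptr in reversed(back):  # trace the best choices back to the first site
        c = ptr[c]
        choice.append(c)
    choice.reverse()
    return [options[i][choice[i]] for i in range(n)]


def alpha_expansion(labels, unary, w, num_labels):
    """Cycle over labels, accepting a move only if it lowers the energy."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            proposal = expansion_move(labels, alpha, unary, w)
            if energy(proposal, unary, w) < energy(labels, unary, w):
                labels, improved = proposal, True
    return labels


unary = [[0, 5, 5], [4, 1, 5], [5, 1, 4], [5, 5, 0]]
result = alpha_expansion([0, 0, 0, 0], unary, w=2, num_labels=3)
```

Starting from a single part, the expansion for label 1 claims the last three points and the expansion for label 2 then takes the final point, after which no further move lowers the energy.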
40Outline
- Model Description
- Learning the Model
- Initial Estimate
- Refining Mattes
- Updating appearance
- Refining Transformation
- Results
412. Refining Mattes
Consider one segment at a time (along with its neighbouring segments).
- Apply αβ-swap
- Apply α-expansion
- Iterate over segments till energy Ψ cannot be minimized further.
[Figure sequence: a neighbouring segment and the segment to be refined, labelled α and β during the swap; the final frame shows the refined segment.]
47iterations
[Figure sequence: mattes for Frame 1 and Frame 30 after iterations 0 to 9]
57Outline
- Model Description
- Learning the Model
- Initial Estimate
- Refining Mattes
- Updating appearance
- Refining Transformation
- Results
583. Updating Appearance
- Appearance of a point is the mean of RGB values of all visible points it projects onto.
4. Refining Transformations
- Transformations around initial estimate are explored.
- The transformation resulting in least SSD is chosen.
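Both update steps can be sketched in a few lines. The example assumes the visible observations of a point have already been collected, and restricts the transformation search to integer 1D translations within a small radius of the initial estimate; the slides also consider rotation and scale, which are omitted here. The function names and the search radius are illustrative.

```python
def mean_appearance(samples):
    """Mean RGB over the visible observations of one point; samples are (r, g, b)."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(3))


def ssd(a, b):
    """Sum of squared differences between two equal-length intensity lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_translation(template, frame, x0, radius):
    """Try offsets x0-radius..x0+radius and keep the one with the least SSD."""
    best_x, best_cost = None, float("inf")
    for x in range(x0 - radius, x0 + radius + 1):
        if x < 0 or x + len(template) > len(frame):
            continue  # the template would fall outside the frame
        cost = ssd(template, frame[x:x + len(template)])
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost


appearance = mean_appearance([(10, 20, 30), (20, 40, 60)])
offset, cost = refine_translation([5, 9, 5], [0, 0, 1, 5, 9, 5, 0], x0=2, radius=2)
```

Searching only around the initial estimate keeps the number of candidate transformations small at each refinement pass.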
60Outline
- Model Description
- Learning the Model
- Initial Estimate
- Refining Mattes
- Updating appearance
- Refining Transformation
- Results
61Results
62Results: Complex Motion
63Results: Poor Quality Video
64Applications
- The learnt model is used for several applications
- Motion Segmentation
- Object Recognition
- Object Category Specific Segmentation
65Object Recognition
- Matching the model to still images
- Multiple shape exemplars and texture examples
- Extending Pictorial Structures for Object Recognition, BMVC 04
66Class-Specific Segmentation
- Global shape prior for graph cut based segmentation
- OBJ CUT, CVPR 05
67Conclusions and Future Work
- We have presented a method for unsupervised learning of a generative model from videos.
- Applications for object recognition and segmentation are demonstrated.
- Method needs to be extended to handle various visual aspects.