Learning Layered Motion Segmentations of Video

1
Learning Layered Motion Segmentations of Video
UNIVERSITY OF OXFORD

M. Pawan Kumar
Philip Torr
Andrew Zisserman

2
Aim

Given a video, to learn a model for the object

Input Video
Output Model

Model should (ideally)
describe the object completely and accurately
handle self-occlusion
be learnt in an unsupervised manner

3
Motivation

Object Recognition and Segmentation
Current object recognition methods often rely on manual supervision to learn a model:
Hand-labelling the positions of parts, OR
Manually segmenting training images

Leibe and Schiele, DAGM 04
Borenstein and Ullman, ECCV 02
4
Motivation

Problem: Such supervised methods require intensive manual effort and are practically infeasible
Solution: Use readily available data such as videos
Automatically learn models which can be used to perform object recognition.

5
Challenges
Self Occlusion
Articulation
Lighting: $c(y) = \mathrm{diag}(a)\, c(x) + b$
Motion Blur: $c(y) = \int c(y - m(t))\, dt$
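The lighting and motion blur models on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the integration range [0, 1), the linear motion m(t) = m * t, the 1D signal and the border clamping are all assumptions made for illustration, and the function names are hypothetical.

```python
def apply_lighting(c_x, a, b):
    """Lighting model c(y) = diag(a) c(x) + b, applied per RGB channel."""
    return tuple(a_ch * c_ch + b_ch for a_ch, c_ch, b_ch in zip(a, c_x, b))


def motion_blur_1d(signal, m, steps=4):
    """Approximate c(y) = integral of c(y - m(t)) dt by averaging samples.

    Assumptions (not from the slides): t runs over [0, 1), the motion is
    linear, m(t) = m * t, and samples outside the signal are clamped.
    """
    n = len(signal)
    blurred = []
    for y in range(n):
        total = 0.0
        for s in range(steps):
            t = s / steps
            idx = min(max(int(round(y - m * t)), 0), n - 1)
            total += signal[idx]
        blurred.append(total / steps)
    return blurred


# A grey point under a per-channel gain a and offset b.
lit = apply_lighting((0.5, 0.5, 0.5), a=(2.0, 1.0, 0.5), b=(0.1, 0.0, 0.0))

# A single bright sample smeared along a motion of 2 samples.
blurred = motion_blur_1d([0.0, 0.0, 1.0, 0.0, 0.0], m=2, steps=2)
```

In this toy case the bright sample is spread over two neighbouring positions while the total intensity is preserved.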
6
Using a Generative Model

Parameters $\Theta$
Segments (mattes and appearance)
Layering
Transformations $\Theta_T^t$
Lighting parameters $a$ and $b$
Motion parameters $m$ obtained using $\Theta_T^{t-1}$ and $\Theta_T^t$

Latent Image
per segment per frame
7
Learning the Model

Given a video $D$, we need to learn all model parameters $\Theta$:
Segments (mattes and appearance)
Layering
Transformations
Lighting and motion blur parameters
We define the posterior $\Pr(\Theta \mid D)$
This measures how well the generated frames match the observed data
We learn the best model by maximizing $\Pr(\Theta \mid D)$

8
Previous Work

Sprite-based approach
Jojic and Frey, ICCV 01
Williams and Titsias, Neural Computation 04

Restricted to translation, rotation
Greedy optimisation
Spatial continuity not considered
Motion blur, lighting not handled

9
Outline

Model Description
Learning the Model
Initial Estimate
Refining Mattes
Updating appearance
Refining Transformation
Results

10
Model Description

Layered Representation

Mattes of segments represented as binary masks.

Appearance of part: RGB value per point

$\Theta_T$: translation, rotation and anisotropic scale factors

11
Layering

Layer number $l_i$ for segment $p_i$
For non-overlapping segments, $l_i = l_j$

[Figure: example with $l_i > l_j$]

12
Layering

Layer number $l_i$ for segment $p_i$
For non-overlapping segments, $l_i = l_j$

[Figure: example with $l_i < l_j$]
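A rough sketch of how layer numbers could drive the rendering of a frame is given below. It assumes that a segment with a larger layer number occludes one with a smaller number, and it uses a translation in place of the full transformation; the dictionary keys and the `composite` function are hypothetical names introduced only for this example.

```python
def composite(height, width, segments, background=(0, 0, 0)):
    """Render one frame from layered segments (painter's algorithm).

    Each segment is a dict with hypothetical keys:
      "layer":      layer number l_i
      "appearance": {(row, col): (r, g, b)} for points where the matte is 1
      "shift":      (d_row, d_col), a translation-only stand-in for the
                    transformation
    Assumption: a larger layer number occludes a smaller one.
    """
    frame = [[background for _ in range(width)] for _ in range(height)]
    # Paint the lowest layer first so that higher layers overwrite it.
    for seg in sorted(segments, key=lambda s: s["layer"]):
        d_row, d_col = seg["shift"]
        for (row, col), rgb in seg["appearance"].items():
            r, c = row + d_row, col + d_col
            if 0 <= r < height and 0 <= c < width:
                frame[r][c] = rgb
    return frame


RED, BLUE = (255, 0, 0), (0, 0, 255)
torso = {"layer": 1, "shift": (0, 0), "appearance": {(0, 0): RED, (0, 1): RED}}
arm = {"layer": 2, "shift": (0, 1), "appearance": {(0, 0): BLUE}}

# The arm overlaps the torso at column 1 and, having the larger layer
# number, hides it there regardless of the order of the input list.
frame = composite(1, 3, [arm, torso])
```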

13
Energy of the model

$\Pr(\Theta \mid D) \propto \Pr(D \mid \Theta)\, \Pr(\Theta)$
Energy $\Psi = -\log(\Pr(D \mid \Theta))$
Maximizing $\Pr(\Theta \mid D)$ implies minimizing $\Psi$
$\Psi$ = Appearance + Boundary

14
Appearance
Appearance measures the consistency of observed and generated RGB values over the entire video sequence
[Figure: generated frames compared frame by frame with observed frames, giving the appearance term]
15
Boundary
Boundary gives preference to parts that are separated by edges in most frames
For neighbouring points x and y on either side of a part boundary:
If the intensities of x and y are similar, the penalty is higher.
If the intensities of x and y are different, the penalty is lower.

Penalty on energy $\Psi$
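The two energy terms can be sketched as follows. The slides do not give their functional forms, so both are assumptions: the appearance term is taken to be a sum of squared RGB differences, and the boundary penalty a Gaussian of the intensity difference with a hypothetical width `sigma`. Only the qualitative behaviour (a higher penalty for similar intensities) comes from the slide.

```python
import math


def appearance_term(generated, observed):
    """Sum of squared RGB differences over all frames and points (assumed form)."""
    return sum(
        (g - o) ** 2
        for gen_frame, obs_frame in zip(generated, observed)
        for gen_px, obs_px in zip(gen_frame, obs_frame)
        for g, o in zip(gen_px, obs_px)
    )


def boundary_penalty(intensity_x, intensity_y, sigma=10.0):
    """Penalty for cutting between x and y: large when intensities are similar.

    The Gaussian form and sigma are illustrative assumptions.
    """
    diff = intensity_x - intensity_y
    return math.exp(-(diff ** 2) / (2 * sigma ** 2))


def energy(generated, observed, boundary_pairs):
    """Energy = appearance + boundary, with boundary_pairs a list of
    (intensity_x, intensity_y) for neighbouring points in different parts."""
    boundary = sum(boundary_penalty(ix, iy) for ix, iy in boundary_pairs)
    return appearance_term(generated, observed) + boundary


# One frame with two points; the generated frame is off by 2 in one channel.
observed = [[(10, 10, 10), (200, 200, 200)]]
generated = [[(12, 10, 10), (200, 200, 200)]]
total = energy(generated, observed, boundary_pairs=[(10, 200)])
```

Because the two boundary points differ strongly in intensity, the boundary contributes almost nothing here and the energy is dominated by the appearance mismatch.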
16
Our Approach

1) An initial estimate of $\Theta$ is obtained by dividing the scene into rigidly moving components.
2) Mattes are optimised using graph cuts.
3) Appearance parameters are updated.
4) Transformation, lighting and motion blur are re-estimated.

18
1. Initial Estimate
Divide frame n into rectangular patches $f_i$ (e.g. 3x3)
Track each patch into frame n+1
[Figure: frame n and reconstructed frame n+1]
19
Tracking Patches
Frame n: patch $f_k$ with transformation $t_k$ into frame n+1
MRF over patches: sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$, one per patch
Unary potential $\phi(t_k)$, e.g. 0.6, 0.9 or 0.7 for three candidate transformations $t_k$
Pairwise potential $\psi(t_j, t_k) = d_1$ if rigid motion, $d_2$ otherwise
24
Tracking Patches

$\Pr(t) \propto \prod_i \phi(t_i) \prod_{(i,j)} \psi(t_i, t_j)$
Inference using belief propagation
Time complexity: speed-up using Distance Transforms (Felzenszwalb and Huttenlocher, NIPS 2004)
Memory requirements: coarse-to-fine strategy (Vogiatzis et al., BMVC 2004)
Multiple coarse labels chosen instead of best one
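To make the inference step concrete, here is a small sketch of max-product message passing on a chain of three patches, where it is exact. The MRF in the talk is defined over a grid of patches, where belief propagation is only approximate, so this is a simplified stand-in. The potential values, the choice to treat "rigid motion" as "same transformation", and the function name are assumptions for illustration.

```python
def map_on_chain(unary, pairwise):
    """MAP labelling of a chain MRF by max-product message passing.

    unary:    list of {label: phi} dicts, one per site
    pairwise: function (label_a, label_b) -> psi for neighbouring sites
    """
    best = [dict(unary[0])]   # best[i][b]: best score of a prefix ending in b
    back = []                 # back[i][b]: best predecessor label of b
    for i in range(1, len(unary)):
        cur, ptr = {}, {}
        for b, phi in unary[i].items():
            a_best = max(best[-1], key=lambda a: best[-1][a] * pairwise(a, b))
            cur[b] = phi * best[-1][a_best] * pairwise(a_best, b)
            ptr[b] = a_best
        best.append(cur)
        back.append(ptr)
    # Backtrack from the best final label.
    labels = [max(best[-1], key=best[-1].get)]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    return labels[::-1]


MOVE, STAY = (1, 0), (0, 0)   # candidate translations (hypothetical labels)
unary = [
    {MOVE: 0.6, STAY: 0.4},
    {MOVE: 0.45, STAY: 0.55},  # on its own, this patch prefers STAY
    {MOVE: 0.9, STAY: 0.1},
]


def rigid(a, b):
    """Assumed pairwise potential: d1 = 0.9 if both patches move together."""
    return 0.9 if a == b else 0.1


labels = map_on_chain(unary, rigid)
```

The middle patch has an ambiguous unary potential, but the pairwise term pulls it towards the motion of its neighbours.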

25
Coarse-to-fine Strategy
Original MRF over sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$, each with many similar labels
1) Group similar labels into one representative label: $\phi(T_i) = \max_j \phi(t_j)$, $\psi(T_i, T_j) = \max_{k,l} \psi(t_k, t_l)$
2) Solve the coarser MRF using Belief Propagation
3) Choose m best representative labels per site
4) Expand the labels to obtain a smaller MRF
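The grouping and expansion steps can be sketched for a single site as below. The group names, potential values and helper functions are hypothetical, and the scores passed to `expand_best` stand in for the beliefs that belief propagation would return on the coarser MRF.

```python
def coarsen_unary(unary, groups):
    """phi(T) = max over the labels t grouped under representative T."""
    return {rep: max(unary[t] for t in members) for rep, members in groups.items()}


def coarsen_pairwise(psi, groups_i, groups_j):
    """psi(T_i, T_j) = max over member labels of psi(t_k, t_l)."""
    return {
        (rep_i, rep_j): max(psi(tk, tl) for tk in mem_i for tl in mem_j)
        for rep_i, mem_i in groups_i.items()
        for rep_j, mem_j in groups_j.items()
    }


def expand_best(coarse_scores, groups, m):
    """Keep the m best representative labels and expand them to fine labels."""
    best = sorted(coarse_scores, key=coarse_scores.get, reverse=True)[:m]
    return [t for rep in best for t in groups[rep]]


# Six fine labels at one site, grouped in pairs (illustrative values).
unary = {0: 0.1, 1: 0.3, 2: 0.2, 3: 0.9, 4: 0.8, 5: 0.4}
groups = {"A": [0, 1], "B": [2, 3], "C": [4, 5]}

coarse_unary = coarsen_unary(unary, groups)
coarse_pair = coarsen_pairwise(lambda a, b: 0.9 if a == b else 0.1, groups, groups)

# Here the coarse unary potentials stand in for the coarse-level beliefs.
kept_labels = expand_best(coarse_unary, groups, m=2)
```

Keeping several representative labels rather than only the best one leaves room for the fine-level solve to recover from a poor coarse decision, at the cost of a somewhat larger label set.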
30
Tracking Patches
31
Initial Estimate
Cluster rigidly moving points to obtain components
[Figure: frame n, frame n+1 and the resulting components]
32
Initial Estimate

Cluster components based on appearance (cross-correlation)
Smallest member of a cluster is a segment

[Figure: components and the resulting segments]
33

Object is not described completely
Layering is not determined

We need to refine this estimate by minimizing $\Psi$

Re-label surrounding points using
consistency of motion
consistency of texture

Form of $\Psi$ suggests using Graph Cuts
34
Graph Cuts
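Consider the case of two segments, $p_h$ and $p_t$.
Graph: terminals $p_h$ and $p_t$, plus one node per point $x_1, x_2, x_3, \ldots, x_j, \ldots, x_k, \ldots, x_n$
Edges: $W(x_1, p_h)$, ..., $W(x_j, x_k)$, ..., $W(x_n, p_t)$
A cut separates $p_h$ from $p_t$

$W(x_i, p_j)$: appearance component
$W(x_j, x_k)$: boundary component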
36
Graph Cuts

The energy $\Psi$ is of the form $\Psi = \sum D(f_X) + \sum V(f_X, f_Y)$
V is called regular if $V(0,0) + V(1,1) \le V(0,1) + V(1,0)$
For LPS, V is regular.
Theorem: If V is regular, then the minimum cut minimizes the energy $\Psi$

(Kolmogorov and Zabih, PAMI 04)
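The two-segment case can be sketched as a minimum cut on a tiny graph. This is an illustrative construction under stated assumptions, not the authors' code: the pairwise term is a Potts-style penalty w for differing labels (regular whenever w >= 0), capacities are integers, and the max-flow routine is a plain breadth-first augmenting-path search written out so that the example needs no external library. All names are hypothetical.

```python
from collections import deque


def is_regular(v00, v01, v10, v11):
    """Regularity condition for a binary pairwise term V."""
    return v00 + v11 <= v01 + v10


def energy(labels, D, edges):
    """Sum of unary costs D[i][label] plus w for each edge with differing labels."""
    unary = sum(D[i][lab] for i, lab in enumerate(labels))
    pair = sum(w for (i, j), w in edges.items() if labels[i] != labels[j])
    return unary + pair


def min_cut_labels(D, edges):
    """Binary labelling minimising the energy via max-flow / min-cut."""
    n = len(D)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)          # residual edge

    for i, (d0, d1) in enumerate(D):
        add(s, i, d1)                    # cut when point i takes label 1
        add(i, t, d0)                    # cut when point i takes label 0
    for (i, j), w in edges.items():
        add(i, j, w)
        add(j, i, w)

    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                        # no augmenting path is left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Points still reachable from the source take label 0.
    return [0 if i in parent else 1 for i in range(n)]


D = [(0, 5), (4, 3), (5, 0)]             # (cost of label 0, cost of label 1)
edges = {(0, 1): 2, (1, 2): 2}           # boundary weights between neighbours
labels = min_cut_labels(D, edges)
```

For this three-point example the labelling found by the cut can be checked against exhaustive enumeration of all eight labellings.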
37
Multi-way Graph Cuts

Each cut assigns the label $p_i$ or not $p_i$ to each point in the binary matte of segment $p_i$
Number of cuts = number of parts
Ideally, all cuts must be found simultaneously
NP-hard problem
$\alpha\beta$-swap / $\alpha$-expansion algorithm

38
$\alpha\beta$-swap

One pair of parts is considered at a time.
All other parts are kept fixed.
Points belonging to one part can be re-labelled as the other part.

[Figure: the pair of parts being relabelled, with all other parts fixed]
39
$\alpha$-expansion

Iteratively find graph cuts
A cut corresponding to one part is considered at a time
All other parts are kept fixed
Theorem: $\alpha$-expansion finds a strong local minimum.

[Figure: the part being refined, with all other parts fixed]
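The expansion loop can be sketched on a toy problem as follows. In practice each binary move is solved with a graph cut; here the best move is found by exhaustive search, which is only feasible for a handful of points and is a stand-in chosen to keep the example short. The Potts pairwise weight, the unary costs and the function names are assumptions.

```python
from itertools import product


def total_energy(labels, unary, edges, weight):
    """Unary costs plus a Potts penalty for each edge with differing labels."""
    e = sum(unary[i][lab] for i, lab in enumerate(labels))
    e += sum(weight for i, j in edges if labels[i] != labels[j])
    return e


def expansion_move(labels, alpha, unary, edges, weight):
    """Best move in which every point keeps its label or switches to alpha.

    Exhaustive search stands in for the graph cut used in practice.
    """
    best, best_e = list(labels), total_energy(labels, unary, edges, weight)
    for switch in product((False, True), repeat=len(labels)):
        cand = [alpha if s else lab for s, lab in zip(switch, labels)]
        e = total_energy(cand, unary, edges, weight)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def alpha_expansion(labels, num_labels, unary, edges, weight):
    """Cycle over labels until no expansion move lowers the energy."""
    labels = list(labels)
    energy = total_energy(labels, unary, edges, weight)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            new_labels, new_e = expansion_move(labels, alpha, unary, edges, weight)
            if new_e < energy:
                labels, energy, improved = new_labels, new_e, True
    return labels, energy


# Four points on a chain, three parts; unary[i][l] is the cost of part l at point i.
unary = [[0, 3, 3], [2, 1, 3], [3, 1, 2], [3, 3, 0]]
edges = [(0, 1), (1, 2), (2, 3)]
result = alpha_expansion([0, 0, 0, 0], 3, unary, edges, weight=1)
```

Starting from a labelling in which every point belongs to part 0, the loop needs two sweeps over the labels before no expansion move lowers the energy any further.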
41
2. Refining Mattes
Consider one segment at a time (along with its neighbouring segments)
Apply $\alpha\beta$-swap to the segment to be refined and a neighbouring segment
Apply $\alpha$-expansion to the segment to be refined
Iterate over segments until the energy $\Psi$ cannot be minimized further.
47
[Figure: mattes for frame 1 and frame 30 after iterations 0 to 9]
58
3. Updating Appearance

Appearance of a point is the mean of RGB values of all visible points it projects onto.

4. Refining Transformations

Transformations around initial estimate are explored.
The transformation resulting in the least SSD (sum of squared differences) is chosen.
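Both update steps can be sketched in a few lines. The sketch below is deliberately simplified: the search covers only integer translations of a 1D signal within a hypothetical `radius` of the initial estimate, whereas the model's transformations also include rotation and scale. Function names are illustrative.

```python
def mean_appearance(observations):
    """Mean RGB over the frames in which the point is visible."""
    n = len(observations)
    return tuple(sum(px[ch] for px in observations) / n for ch in range(3))


def ssd(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_translation(template, signal, initial, radius=2):
    """Try shifts within `radius` of the initial estimate; keep the least SSD.

    1D, translation-only stand-in for the full transformation search.
    """
    best_shift, best_cost = None, float("inf")
    for shift in range(initial - radius, initial + radius + 1):
        if shift < 0 or shift + len(template) > len(signal):
            continue                     # the template would leave the signal
        cost = ssd(template, signal[shift:shift + len(template)])
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost


# Appearance of one point seen in two frames.
appearance = mean_appearance([(10, 20, 30), (20, 30, 40)])

# The initial estimate (shift 1) is off by one; the search corrects it.
refined = refine_translation([1, 5, 1], [0, 0, 1, 5, 1, 0, 0], initial=1)
```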

61
Results
62
Results: Complex Motion
63
Results: Poor Quality Video
64
Applications

The learnt model is used for several applications
Motion Segmentation
Object Recognition
Object Category Specific Segmentation

65
Object Recognition

Matching the model to still images
Multiple shape exemplars and texture examples
Extending Pictorial Structures for Object Recognition, BMVC 04

66
Class-Specific Segmentation

Global shape prior for graph cut based segmentation
OBJ CUT, CVPR 05

67
Conclusions and Future Work

We have presented a method for unsupervised learning of a generative model from videos.
Applications for object recognition and segmentation are demonstrated.
Method needs to be extended to handle various visual aspects.
