Learning Layered Motion Segmentations of Video - PowerPoint PPT Presentation


Title: Learning Layered Motion Segmentations of Video


1
Learning Layered Motion Segmentations of Video
UNIVERSITY OF OXFORD
  • M. Pawan Kumar
  • Philip Torr
  • Andrew Zisserman

2
Aim
  • Given a video, to learn a model for the object

Input Video
Output Model
  • Model should (ideally)
  • describe the object completely and accurately
  • handle self-occlusion
  • be learnt in an unsupervised manner

3
Motivation
  • Object Recognition and Segmentation
  • Current object recognition methods often learn a model manually
  • Hand-labelling the position of parts, OR
  • Manually segmenting training images

Leibe and Schiele, DAGM 04
Borenstein and Ullman, ECCV 02
4
Motivation
  • Problem: Such supervised methods are labour-intensive and practically infeasible
  • Solution
  • Use readily available data such as videos
  • Automatically learn models which can be used to perform object recognition

5
Challenges
Self Occlusion
Articulation
Lighting: $c(y) = \mathrm{diag}(a)\, c(x) + b$
Motion Blur: $c(y) = \int c(y - m(t))\, dt$
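A minimal sketch of the lighting and motion blur effects listed on this slide, in plain Python. It assumes the lighting model scales each RGB channel by a and offsets it by b, and it approximates the motion blur integral by averaging a few samples along a linear motion path. The function names, the 1-D row of intensities and the border clamping are illustrative assumptions, not taken from the slides.

```python
def apply_lighting(rgb, a, b):
    """Return diag(a) * rgb + b for one RGB triple."""
    return tuple(ai * ci + bi for ai, ci, bi in zip(a, rgb, b))


def motion_blur(row, shift, steps=4):
    """Average a 1-D row over `steps` samples of a linear motion of `shift` pixels.

    Discretises c(y) = integral of c(y - m(t)) dt with m(t) = t * shift.
    Reads outside the row are clamped to the border (an assumption).
    """
    n = len(row)
    blurred = []
    for y in range(n):
        total = 0.0
        for k in range(steps):
            offset = round(shift * k / steps)
            src = min(max(y - offset, 0), n - 1)
            total += row[src]
        blurred.append(total / steps)
    return blurred


lit = apply_lighting((100, 50, 10), a=(1.0, 2.0, 0.5), b=(5, 0, 1))
blurred = motion_blur([0, 0, 8, 0, 0], shift=2, steps=2)
```

With two samples the single bright pixel is smeared over two positions, while the total intensity of the row is unchanged.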
6
Using a Generative Model
  • Parameters $\Theta$
  • Segments (mattes + appearance)
  • Layering
  • Transformations $\Theta_T^t$
  • Lighting parameters a and b
  • Motion parameters m obtained using $\Theta_T^{t-1}$ and $\Theta_T^t$

Latent Image
per segment per frame
7
Learning the Model
  • Given a video D, we need to learn all model parameters $\Theta$
  • Segments (mattes + appearance)
  • Layering
  • Transformations
  • Lighting and motion blur parameters
  • We define the posterior $\Pr(\Theta \mid D)$
  • This measures how well the generated frames match the observed data
  • We learn the best model by maximizing $\Pr(\Theta \mid D)$

8
Previous Work
  • Sprite-based approach
  • Jojic and Frey, ICCV 01
  • Williams and Titsias, Neural Computation 04
  • Restricted to translation, rotation
  • Greedy optimisation
  • Spatial continuity not considered
  • Motion blur, lighting not handled

9
Outline
  • Model Description
  • Learning the Model
  • Initial Estimate
  • Refining Mattes
  • Updating appearance
  • Refining Transformation
  • Results

10
Model Description
  • Layered Representation
  • Mattes of segments represented as binary masks.
  • Appearance of part: RGB value per point
  • $\Theta_T$: translation, rotation and anisotropic scale factors

11
Layering
  • Layer number $l_i$ for segment $p_i$
  • For non-overlapping segments, $l_i = l_j$
  • $l_i > l_j$

12
Layering
  • Layer number $l_i$ for segment $p_i$
  • For non-overlapping segments, $l_i = l_j$
  • $l_i < l_j$
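One way to picture the role of layer numbers is a painter's-style composite: segments are drawn in increasing layer order, so a segment with a larger layer number hides a lower one wherever their binary mattes overlap. The sketch below is an illustration under that assumption, using a 1-D frame and made-up segment names.

```python
def composite(segments, width):
    """Paint 1-D segments in increasing layer order so higher layers occlude.

    Each segment is (layer, matte, colour), where matte is a list of 0/1 flags.
    """
    frame = [None] * width
    for layer, matte, colour in sorted(segments, key=lambda s: s[0]):
        for x in range(width):
            if matte[x]:
                frame[x] = colour
    return frame


torso = (1, [0, 1, 1, 1, 0], "torso")
arm = (2, [0, 0, 0, 1, 1], "arm")   # higher layer number, so it occludes the torso
frame = composite([arm, torso], width=5)
```

At position 3 both mattes are set, and the arm wins because its layer number is larger.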

13
Energy of the model
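  • $\Pr(\Theta \mid D) \propto \Pr(D \mid \Theta)\, \Pr(\Theta)$
  • Energy $\Psi = -\log\left(\Pr(D \mid \Theta)\, \Pr(\Theta)\right)$
  • Maximizing $\Pr(\Theta \mid D)$ implies minimizing $\Psi$
  • $\Psi$ = Appearance + Boundary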

14
Appearance
Appearance measures consistency of observed and generated RGB values over the entire video sequence
[Figure: Generated Frames minus Observed Frames, giving Appearance]
15
Boundary
Boundary gives preference to parts that are separated by edges in most frames
[Figure: neighbouring points x and y]
  • If the intensities of x and y are similar, the penalty is higher.
  • If the intensities of x and y are different, the penalty is lower.

Penalty on energy $\Psi$
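The boundary rule can be sketched as a contrast-dependent penalty that is charged only where neighbouring points receive different labels. The Gaussian form and the value of sigma below are assumptions chosen for illustration; the slide only states that similar intensities cost more than different ones.

```python
import math


def boundary_penalty(ix, iy, sigma=10.0):
    """Penalty for giving neighbouring points x and y different labels.

    Close to 1 when the intensities match, close to 0 across a strong edge.
    The Gaussian form and sigma are assumptions of this sketch.
    """
    return math.exp(-((ix - iy) ** 2) / (2.0 * sigma ** 2))


def boundary_energy(intensities, labels, sigma=10.0):
    """Sum the penalty over neighbouring pairs of a 1-D row whose labels differ."""
    total = 0.0
    for k in range(len(labels) - 1):
        if labels[k] != labels[k + 1]:
            total += boundary_penalty(intensities[k], intensities[k + 1], sigma)
    return total


row = [100, 102, 101, 200, 198]
cut_on_edge = boundary_energy(row, [0, 0, 0, 1, 1])   # label change at the edge
cut_in_flat = boundary_energy(row, [0, 0, 1, 1, 1])   # label change in a flat area
```

Placing the part boundary on the intensity edge is far cheaper than placing it inside the flat region, which is the preference the slide describes.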
16
Our Approach
  • 1) An initial estimate of $\Theta$ is obtained by dividing the scene into rigidly moving components.
  • 2) Mattes are optimised using graph cuts.
  • 3) Appearance parameters are updated.
  • 4) Transformation, lighting and motion blur parameters are re-estimated.

17
Outline
  • Model Description
  • Learning the Model
  • Initial Estimate
  • Refining Mattes
  • Updating appearance
  • Refining Transformation
  • Results

18
1. Initial Estimate
[Figure: Divide Frame n into rectangular patches $f_i$ (e.g. 3x3); Track; Reconstructed Frame n+1]
19
Tracking Patches
[Figure: MRF over patches with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$; Frame n, patch $f_k$, transformation $t_k$, Frame n+1]
$\psi(t_k) = 0.6$
20
Tracking Patches
[Figure: MRF over patches with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$; Frame n, patch $f_k$, transformation $t_k$, Frame n+1]
$\psi(t_k) = 0.9$
21
Tracking Patches
[Figure: MRF over patches with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$; Frame n, patch $f_k$, transformation $t_k$, Frame n+1]
$\psi(t_k) = 0.7$
22
Tracking Patches
[Figure: MRF with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$; Frame n, Frame n+1]
Pairwise potential on edge jk: $\psi(t_j, t_k) = d_1$ if rigid motion
23
Tracking Patches
[Figure: MRF with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$; Frame n, Frame n+1]
Pairwise potential on edge jk: $\psi(t_j, t_k) = d_2$ otherwise
24
Tracking Patches
  • $\Pr(t) \propto \prod_i \psi(t_i) \prod_{i,j} \psi(t_i, t_j)$
  • Inference using belief propagation
  • Time complexity
  • Speed-up using Distance Transforms
  • Felzenszwalb and Huttenlocher, NIPS 2004
  • Memory requirements
  • Coarse-to-fine strategy
  • Vogiatzis et al., BMVC 2004
  • Multiple coarse labels chosen instead of best one
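To make the product of unary and pairwise potentials concrete, here is a small max-product sketch. It assumes a chain of three patches rather than the grid MRF of the slides, because on a chain max-product message passing is exact and fits in a few lines. The unary values, the potentials d1 and d2, and the rule that "same transformation" stands for rigid motion are all illustrative assumptions.

```python
def map_on_chain(unary, d1=1.0, d2=0.2):
    """Max-product message passing on a chain of patches.

    unary[i][t] is the unary potential of transformation t for patch i.
    Neighbouring patches get pairwise potential d1 when they pick the same
    transformation (treated here as rigid motion) and d2 otherwise.
    """
    n, labels = len(unary), range(len(unary[0]))
    pair = lambda s, t: d1 if s == t else d2
    best = [list(unary[0])]      # best[i][t]: max score of a prefix ending in t
    back = []                    # back-pointers for recovering the labelling
    for i in range(1, n):
        scores, pointers = [], []
        for t in labels:
            s_best = max(labels, key=lambda s: best[-1][s] * pair(s, t))
            pointers.append(s_best)
            scores.append(best[-1][s_best] * pair(s_best, t) * unary[i][t])
        best.append(scores)
        back.append(pointers)
    t = max(labels, key=lambda s: best[-1][s])
    path = [t]
    for pointers in reversed(back):
        t = pointers[t]
        path.append(t)
    return path[::-1]


unary = [[0.6, 0.9, 0.7], [0.5, 0.6, 0.4], [0.9, 0.3, 0.2]]
tracks = map_on_chain(unary)
```

The first patch on its own prefers transformation 1 (potential 0.9), but the joint optimum assigns transformation 0 to all three patches because agreeing with the neighbours outweighs that preference.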

25
Coarse-to-fine Strategy
[Figure: original MRF with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$; similar labels highlighted]
26
Coarse-to-fine Strategy
Group similar labels into one representative label
$\psi(T_i) = \max_j \psi(t_j)$
[Figure: MRF with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$]
27
Coarse-to-fine Strategy
Solve the coarser MRF using Belief Propagation
$\psi(T_i, T_j) = \max_{k,l} \psi(t_k, t_l)$
[Figure: MRF with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$]
28
Coarse-to-fine Strategy
Choose m best representative labels per site
[Figure: MRF with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$; best labels highlighted]
29
Coarse-to-fine Strategy
Expand the labels to obtain a smaller MRF
[Figure: MRF with sites $n_1, n_2, n_3, \ldots, n_j, \ldots, n_k$]
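The grouping, selection and expansion steps can be sketched for a single site as follows. Two simplifications are assumed: consecutive labels are treated as "similar", and groups are ranked by their coarse unary potential alone, standing in for the beliefs that belief propagation on the coarser MRF would provide. The function name and numbers are hypothetical.

```python
def coarse_to_fine(unary, group_size, m):
    """Keep the fine labels that fall inside the m best groups of one site.

    unary: potentials of the fine labels of a single site.
    Consecutive labels are grouped `group_size` at a time (an assumed grouping
    rule); each group is scored by the max potential of its members.
    """
    groups = [list(range(s, min(s + group_size, len(unary))))
              for s in range(0, len(unary), group_size)]
    coarse = [max(unary[t] for t in g) for g in groups]
    keep = sorted(range(len(groups)), key=lambda g: coarse[g], reverse=True)[:m]
    return coarse, sorted(t for g in keep for t in groups[g])


coarse, kept = coarse_to_fine([0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.6, 0.7],
                              group_size=2, m=2)
```

Eight fine labels shrink to four after expansion, so the second pass runs on a smaller label set while still keeping more than one coarse candidate per site.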
30
Tracking Patches
31
Initial Estimate
Cluster rigidly moving points to obtain components
[Figure: Frame n, Frame n+1, Components]
32
Initial Estimate
  • Cluster components based on appearance (cross-correlation)
  • Smallest member of a cluster is a segment

[Figure: Components, Segments]
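A rough sketch of appearance-based clustering with normalised cross-correlation. It assumes every component has been resampled to the same number of pixels and uses a greedy pass with a fixed threshold; both choices, and the component names, are illustrative rather than taken from the slides.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equally sized intensity lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def cluster(components, threshold=0.9):
    """Greedy clustering: join a component to the first cluster it correlates with."""
    clusters = []
    for name, pixels in components:
        for members in clusters:
            if ncc(pixels, members[0][1]) >= threshold:
                members.append((name, pixels))
                break
        else:
            clusters.append([(name, pixels)])
    return [[name for name, _ in members] for members in clusters]


components = [("arm_f1", [10, 20, 30, 40]),
              ("leg_f1", [40, 10, 40, 10]),
              ("arm_f2", [12, 22, 31, 43])]
groups = cluster(components)
```

The two arm components correlate strongly and end up in one cluster, while the leg component starts its own.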
33
  • Object is not described completely
  • Layering is not determined

We need to refine this estimate by minimizing $\Psi$
  • Re-label surrounding points using
  • consistency of motion
  • consistency of texture

Form of $\Psi$ suggests using Graph Cuts
34
Graph Cuts
Consider the case of two segments.
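[Figure: graph over points $x_1, x_2, x_3, \ldots, x_j, \ldots, x_k, \ldots, x_n$ with segment nodes $p_h$ and $p_t$, edge weights $W(x_1, p_h)$, $W(x_j, x_k)$ and $W(x_n, p_t)$, and a cut]
  • $W(x_i, p_j)$: appearance component
  • $W(x_j, x_k)$: boundary component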
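For the two-segment case, the cut can be computed with any max-flow routine. The sketch below uses a plain Edmonds-Karp implementation as a stand-in, with terminal links carrying the appearance costs and neighbour links carrying the boundary costs. The cost values are made up for illustration, and the choice of Edmonds-Karp is an assumption, not the solver used by the authors.

```python
from collections import deque


def min_cut_labels(unary, pairwise):
    """Binary labelling by s-t min cut (Edmonds-Karp max flow).

    unary[i] = (cost of label 0, cost of label 1) for point i.
    pairwise[(i, j)] = cost paid when i and j get different labels.
    Points left on the source side get label 0, the rest get label 1.
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (c0, c1) in enumerate(unary):
        cap[s][i] = c1   # cut when i lands on the sink side (label 1)
        cap[i][t] = c0   # cut when i stays on the source side (label 0)
    for (i, j), w in pairwise.items():
        cap[i][j] += w
        cap[j][i] += w
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:       # breadth-first augmenting path
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                              # no path left: flow is maximal
        flow, v = float("inf"), t
        while parent[v] is not None:           # bottleneck along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:           # update residual capacities
            cap[parent[v]][v] -= flow
            cap[v][parent[v]] += flow
            v = parent[v]
    return [0 if i in parent else 1 for i in range(n)]


unary = [(0, 4), (2, 2), (2, 2), (4, 0)]
pairwise = {(0, 1): 3, (1, 2): 0.5, (2, 3): 3}
labels = min_cut_labels(unary, pairwise)
```

The two middle points have no appearance preference, so the cut passes through the weakest boundary link between them.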

35
Graph Cuts
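[Figure: graph over points $x_1, x_2, x_3, \ldots, x_j, \ldots, x_k, \ldots, x_n$ with segment nodes $p_h$ and $p_t$ and edge weights $W(x_1, p_h)$, $W(x_j, x_k)$ and $W(x_n, p_t)$]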
36
Graph Cuts
  • The energy $\Psi$ is of the form $\sum_X D(f_X) + \sum_{X,Y} V(f_X, f_Y)$
  • V is called regular if $V(0,0) + V(1,1) \le V(0,1) + V(1,0)$
  • For LPS, V is regular.
  • Theorem: If V is regular, then the minimum cut minimizes the energy $\Psi$

Kolmogorov and Zabih, PAMI 04.
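The regularity condition is easy to check for a given binary pairwise term. The snippet below is a small illustration; the two example tables are assumptions chosen to show one regular and one non-regular case.

```python
def is_regular(V):
    """Regularity test for a binary pairwise term stored as V[a][b]."""
    return V[0][0] + V[1][1] <= V[0][1] + V[1][0]


potts = [[0.0, 1.0], [1.0, 0.0]]        # pay only when the labels differ
anti_potts = [[1.0, 0.0], [0.0, 1.0]]   # pay when the labels agree

potts_ok = is_regular(potts)
anti_potts_ok = is_regular(anti_potts)
```

A term that only charges for label changes passes the test, whereas one that rewards disagreement does not and falls outside the theorem.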
37
Multi-way Graph Cuts
  • Each cut assigns label $p_i$ or $\bar{p}_i$ (not $p_i$) to points in the binary matte of segment $p_i$
  • Number of cuts = Number of parts
  • Ideally, all cuts must be found simultaneously
  • NP-hard problem
  • αβ-swap / α-expansion algorithm

38
αβ-swap
Relabel
  • One pair of parts is considered at a time.
  • All other parts are kept fixed.
  • Points belonging to one part can be re-labelled as the other part.

Fixed
39
α-expansion
Refine
  • Iteratively find graph cuts
  • A cut corresponding to one part is considered at a time
  • All other parts are kept fixed
  • Theorem: α-expansion finds a strong local minimum.
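The outer loop of α-expansion can be illustrated on a tiny 1-D problem. In this sketch each expansion move is solved by brute force over all subsets of points, which is only feasible for toy sizes; the slides solve the same binary sub-problem with a graph cut. The Potts smoothness term and the cost table are assumptions for illustration.

```python
from itertools import product


def energy(labels, unary, smooth):
    """Unary costs plus a Potts penalty `smooth` per neighbouring label change."""
    data = sum(unary[i][l] for i, l in enumerate(labels))
    jumps = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth * jumps


def expand(labels, alpha, unary, smooth):
    """Best move in which each point keeps its label or switches to alpha.

    Solved by brute force here; a graph cut would replace this loop.
    """
    best = list(labels)
    for switch in product([False, True], repeat=len(labels)):
        cand = [alpha if s else l for s, l in zip(switch, labels)]
        if energy(cand, unary, smooth) < energy(best, unary, smooth):
            best = cand
    return best


def alpha_expansion(labels, unary, smooth, n_labels):
    """Cycle over labels until no expansion move lowers the energy."""
    improved = True
    while improved:
        improved = False
        for alpha in range(n_labels):
            new = expand(labels, alpha, unary, smooth)
            if energy(new, unary, smooth) < energy(labels, unary, smooth):
                labels, improved = new, True
    return labels


unary = [[0, 3, 3], [2, 1, 3], [3, 0, 3], [3, 3, 0], [3, 2, 1]]
start = [0, 0, 0, 0, 0]
result = alpha_expansion(start, unary, smooth=1.5, n_labels=3)
```

Starting from a single label, one expansion of label 1 followed by one expansion of label 2 reaches a labelling that no further expansion move can improve.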

Fixed
40
Outline
  • Model Description
  • Learning the Model
  • Initial Estimate
  • Refining Mattes
  • Updating appearance
  • Refining Transformation
  • Results

41
2. Refining Mattes
Consider one segment at a time (along with its neighbouring segments)
Neighbouring Segment
Segment to be refined
42
2. Refining Mattes
Apply αβ-swap
α
β
Neighbouring Segment
Segment to be refined
43
2. Refining Mattes
Apply αβ-swap
α
β
Neighbouring Segment
Segment to be refined
44
2. Refining Mattes
Apply α-expansion
α
Neighbouring Segment
Segment to be refined
45
2. Refining Mattes
Apply α-expansion
α
Neighbouring Segment
Segment to be refined
46
2. Refining Mattes
Apply α-expansion
α
Neighbouring Segment
Refined Segment
Iterate over segments until the energy $\Psi$ cannot be minimized further.
47
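[Figure: Mattes after 0 iterations, Frame 1 and Frame 30]
48
[Figure: Mattes after 1 iteration, Frame 1 and Frame 30]
49
[Figure: Mattes after 2 iterations, Frame 1 and Frame 30]
50
[Figure: Mattes after 3 iterations, Frame 1 and Frame 30]
51
[Figure: Mattes after 4 iterations, Frame 1 and Frame 30]
52
[Figure: Mattes after 5 iterations, Frame 1 and Frame 30]
53
[Figure: Mattes after 6 iterations, Frame 1 and Frame 30]
54
[Figure: Mattes after 7 iterations, Frame 1 and Frame 30]
55
[Figure: Mattes after 8 iterations, Frame 1 and Frame 30]
56
[Figure: Mattes after 9 iterations, Frame 1 and Frame 30]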
57
Outline
  • Model Description
  • Learning the Model
  • Initial Estimate
  • Refining Mattes
  • Updating appearance
  • Refining Transformation
  • Results

58
3. Updating Appearance
  • Appearance of a point is the mean of RGB values of all visible points it projects onto.
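The appearance update amounts to a per-channel mean over the frames in which the point is visible. A minimal sketch, assuming each observation comes with a visibility flag and that a point which is never visible yields None (a choice made here, not in the slides):

```python
def update_appearance(observations):
    """Mean RGB over the frames in which the point is visible.

    observations: list of (rgb, visible) pairs, one per frame.
    Returns None when the point is never visible (a choice of this sketch).
    """
    seen = [rgb for rgb, visible in observations if visible]
    if not seen:
        return None
    return tuple(sum(channel) / len(seen) for channel in zip(*seen))


mean_rgb = update_appearance([((100, 50, 0), True),
                              ((110, 60, 10), True),
                              ((255, 255, 255), False)])  # occluded frame is ignored
```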

4. Refining Transformations
  • Transformations around the initial estimate are explored.
  • The transformation resulting in the least SSD is chosen.
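A local search of this kind can be sketched for the simplest case, an integer translation along a 1-D row. Rotation and scale, which the model also includes, are left out, and the search radius is an assumed parameter.

```python
def ssd(a, b):
    """Sum of squared differences between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_translation(template, row, initial, radius=2):
    """Try integer shifts within `radius` of `initial`; return the least-SSD one."""
    best_shift, best_cost = None, float("inf")
    for shift in range(initial - radius, initial + radius + 1):
        if shift < 0 or shift + len(template) > len(row):
            continue                      # template would leave the row
        cost = ssd(template, row[shift:shift + len(template)])
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost


row = [0, 0, 0, 9, 7, 5, 0, 0]
shift, cost = refine_translation([9, 7, 5], row, initial=2)
```

The initial estimate of 2 is off by one pixel, and the search moves it to the shift where the template matches exactly.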

60
Outline
  • Model Description
  • Learning the Model
  • Initial Estimate
  • Refining Mattes
  • Updating appearance
  • Refining Transformation
  • Results

61
Results
62
Results: Complex Motion
63
Results: Poor Quality Video
64
Applications
  • The learnt model is used for several applications
  • Motion Segmentation
  • Object Recognition
  • Object Category Specific Segmentation

65
Object Recognition
  • Matching the model to still images
  • Multiple shape exemplars and texture examples
  • Extending Pictorial Structures for Object Recognition, BMVC 04

66
Class-Specific Segmentation
  • Global shape prior for graph cut based segmentation
  • OBJ CUT, CVPR 05

67
Conclusions and Future Work
  • We have presented a method for unsupervised learning of a generative model from videos.
  • Applications for object recognition and segmentation are demonstrated.
  • Method needs to be extended to handle various visual aspects.