Estimating 3D Body Pose using Uncalibrated Cameras - PowerPoint PPT Presentation

1
Estimating 3D Body Pose using Uncalibrated Cameras
Rómer Rosales, Matheen Siddiqui, Joni Alon, Stan Sclaroff
Image and Video Computing Group
Computer Science Department
Boston University
Now at University of Toronto
2
Problem Definition
[Figure: uncalibrated cameras 1, 2, and 3; 3D body pose]
3
Motivation
  • Calibrated setups
  • Rarely remain calibrated for extended periods
  • Tend to be expensive
  • Cameras may be mobile
  • Previous approaches
  • Need feature correspondence
  • Very hard problem for large baselines

4
Background: Pose Estimation via Tracking
  • Tracking: match image to articulated model at
    each frame
  • Limitations
  • Manual Initialization
  • Heavy dependency on previous estimate
  • Non-linear optimization at each step
  • Projective ambiguities
  • Probabilistic, learned motion models create too
    strong a prior.

Example tracking systems: [Rehg, Morris 98] [Bregler 98]
5
Step 1: From Visual Features to 2D Pose Hypotheses
Hypotheses are generated independently for each camera view using the
Specialized Mappings Architecture (SMA)
6
What Do the Hypotheses Look Like?
7
Specialized Mappings Architecture (SMA)
One-to-many relationship
Forward problem is easy
[Figure: poses a, b, c (output space) and observations x (input space); forward problem (a.k.a. forward kinematics) and inverse problem]
Single views are used to infer 2D poses using SMA [NIPS 2001]
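
A toy illustration of the one-to-many relationship, not the SMA itself. The forward function here is a hypothetical stand-in (observation = pose squared), so every observation has two valid poses, and two "specialized" inverse maps each cover one branch. All names are illustrative.

```python
import math

def forward(pose):
    """Toy forward function (pose -> observation); a stand-in for rendering."""
    return pose * pose

# Two specialized inverse maps, each valid on one branch of the inverse.
SPECIALIZED_MAPS = [
    lambda obs: math.sqrt(obs),    # branch of non-negative poses
    lambda obs: -math.sqrt(obs),   # branch of negative poses
]

def pose_hypotheses(obs):
    """Return one pose hypothesis per specialized map."""
    return [phi(obs) for phi in SPECIALIZED_MAPS]

hyps = pose_hypotheses(4.0)
print(hyps)                        # [2.0, -2.0]
print([forward(h) for h in hyps])  # [4.0, 4.0]
```

Both hypotheses reproduce the observation exactly, which is why a single inverse function cannot be fit to this kind of data.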
8
Why Virtual Cameras?
[Figure panels: scale, translation, rotation]
  • SMA generates pose hypotheses using Hu moments,
    which are invariant to image-plane translation,
    scale and rotation.
  • Thus, the cameras recovered are not the real cameras
    but rather virtual ones
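
A minimal sketch of the invariance, assuming a binary silhouette mask and using only the first of the seven Hu invariants (eta20 + eta02). The test shape is hypothetical. Translation and 90-degree rotation leave the value unchanged up to floating point; on a pixel grid, scale invariance holds only approximately.

```python
import numpy as np

def hu_phi1(mask):
    """First Hu moment invariant, eta20 + eta02, of a binary mask."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))            # area (zeroth moment)
    dx = xs - xs.mean()             # centering removes translation
    dy = ys - ys.mean()
    mu20 = np.sum(dx ** 2)
    mu02 = np.sum(dy ** 2)
    # Dividing by m00**2 normalizes second-order moments for scale.
    return (mu20 + mu02) / m00 ** 2

shape = np.zeros((40, 40), dtype=bool)
shape[5:11, 8:18] = True            # 6 x 10 block (hypothetical silhouette)
shape[11:15, 8:11] = True           # small extension so the shape is asymmetric

translated = np.roll(shape, (12, 9), axis=(0, 1))   # stays inside the frame
rotated = np.rot90(shape)
scaled = np.repeat(np.repeat(shape, 2, axis=0), 2, axis=1)

for name, m in [("original", shape), ("translated", translated),
                ("rotated", rotated), ("scaled x2", scaled)]:
    print(name, hu_phi1(m))
```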

9
Step 2: From 2D Pose Hypotheses to 3D Pose and Cameras
[Diagram: EM loop over all hypotheses H from all cameras]
  • E-step: assign importance to hypotheses from cameras, P(y | H, X, W)
  • M-step: generalized SFM (update X, W)
  • X: est. pose
  • W = (w1, w2, w3): est. camera parameters
  • Sk: hypothesis covariances
EM finds the most consistent representation from the hypotheses
10
Proposed ML Approach
  • Camera pose hypotheses are independent given
    the pose X and cameras W
  • Introduce map label (latent) variables y
  • Intractable, but can solve approximately via EM.
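
One way to write the two modeling assumptions above. This is a sketch, not the slide's own equation, which is not in the transcript. Here H_c denotes the hypotheses from camera c, w_c that camera's parameters, and y_c the latent label; the prior p(y_c) and the sum over labels are assumptions of this sketch.

```latex
p(H \mid X, W)
  = \prod_{c=1}^{C} p(H_c \mid X, w_c)
  = \prod_{c=1}^{C} \sum_{y_c} p(y_c)\, p(H_c \mid y_c, X, w_c)
```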

11
EM Algorithm for Estimating 3D Body Pose from Virtual Cameras: E-Step
  • E-Step
  • We use
  • where R is a computer graphics rendering function.

12
EM Algorithm for Estimating 3D Body Pose from Virtual Cameras: M-Step
  • M-Step
  • Leads to a generalized structure-from-motion
    problem
  • Alternate minimization between 3D pose and
    cameras.
  • Initialize 3D pose using factorization.
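
A minimal sketch of a factorization-based initialization, assuming the classic rank-3 factorization of the stacked 2D measurements via SVD. The slides do not say which factorization is used. The result is defined only up to an invertible 3x3 ambiguity, and the random cameras and joints below are hypothetical test data.

```python
import numpy as np

def factorize(points_2d):
    """Rank-3 factorization of stacked 2D joint positions.

    points_2d: array of shape (C, N, 2), C views of N joints.
    Returns (cameras, structure) with shapes (2C, 3) and (3, N).
    """
    C, N, _ = points_2d.shape
    # Stack the x and y rows of every view into a 2C x N measurement matrix.
    M = points_2d.transpose(0, 2, 1).reshape(2 * C, N)
    M = M - M.mean(axis=1, keepdims=True)    # remove per-view translation
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(s[:3])                    # split singular values evenly
    cameras = U[:, :3] * root                # (2C, 3)
    structure = root[:, None] * Vt[:3]       # (3, N)
    return cameras, structure

rng = np.random.default_rng(0)
joints = rng.normal(size=(3, 20))            # hypothetical 3D pose, 20 joints
points_2d = np.stack([(rng.normal(size=(2, 3)) @ joints).T for _ in range(3)])

cameras, structure = factorize(points_2d)
centered = points_2d.transpose(0, 2, 1).reshape(6, 20)
centered = centered - centered.mean(axis=1, keepdims=True)
print(np.allclose(cameras @ structure, centered))
```

The recovered structure can then seed the alternating minimization, which refines pose and cameras in turn.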

13
Multiple-View SMA Experiments (3 cameras)
Inputs (First camera)
Estimates (From the first camera viewpoint)
Example 1
Example 2
14
Example 3D Reconstruction (synthetic inputs, not in training set)
Left: estimate. Right: ground truth.
15
Quantitative Evaluation
(a) Optimal angular displacement between 3 cameras is 45 deg.
(b) Significant increase in accuracy between 2 and 3 views.
16
Summary
  • Multiple-view geometry and probabilistic learning
    in a single framework.
  • Probabilistic estimation of 3D articulated body
    pose and virtual camera parameters.
  • No need for feature tracking (edges, corners,
    etc.).
  • Camera calibration is not required.
  • Wider camera baseline allows better
    reconstruction (usually a problem for structure
    from motion).
  • Estimation of only a single angular displacement
    parameter per virtual camera (instead of the
    usual 11) improves quality of 3D body pose
    estimates.

17
Future Directions
  • Probabilistic dependence of time (dynamical
    systems)
  • Feature selection via sampling
  • Adapting model to shape variations (transfer)
  • Specific morphology of subject
  • Generative model for detection and segmentation
  • Self-calibrating environments with mobile cameras

19
Abstract
20
Key Practical Issues
  • Sufficient data to account for all configurations
    given the model
  • Number of specialized functions
  • Differences in cue data (training vs testing)
  • Discriminative power of visual features
  • What mapping function form?

21
Improving EM by knowledge of Inverse Map
  • Redefine posterior, with softmax
  • Why is this good? More accurate estimate of
  • No longer have to assume form of
  • Comparison is done in input space
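
A minimal sketch of a softmax posterior over candidate poses where the comparison is done in input space. Each candidate is pushed through a forward function and compared with the observed features. The `render` function, the squared-error score, and the `temperature` parameter are assumptions of this sketch; the slide's own formula is not in the transcript.

```python
import numpy as np

def softmax_posterior(observed, candidates, render, temperature=1.0):
    """Posterior weights over candidate poses, scored in feature space."""
    errors = np.array([np.sum((render(h) - observed) ** 2) for h in candidates])
    logits = -errors / temperature
    logits -= logits.max()              # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical forward function: pose (2 numbers) -> features (2 numbers).
render = lambda h: np.array([h[0] + h[1], h[0] * h[1]])

candidates = [np.array([1.0, 2.0]), np.array([0.5, 0.5]), np.array([3.0, -1.0])]
observed = render(candidates[0]) + 0.01     # slightly noisy observation

posterior = softmax_posterior(observed, candidates, render)
print(posterior, int(np.argmax(posterior)))
```

Because candidates are scored through the forward function, no explicit noise model is needed in pose space.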

22
The Camera Matrix P
Homogeneous
In our special case
23
What's so special about our special case?
  • Recall feature invariances: rotation,
    translation, and scaling.
  • SMA 2D reconstruction inherits the same invariances
    (+/-)
  • Camera model (implicit) for reconstruction is
    different from virtual camera model that captured
    sequences
  • Due to invariances, camera parameters are known,
    up to a single angular displacement.
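
A minimal sketch of a camera with a single free angular parameter. It assumes an orthographic projection and a rotation about the vertical (y) axis. Both are assumptions of this sketch, since the transcript does not include the slide's camera matrix.

```python
import numpy as np

def virtual_camera(theta):
    """2x3 orthographic projection for a camera rotated by theta (radians)
    about the vertical axis (axis choice is an assumption)."""
    return np.array([[np.cos(theta), 0.0, np.sin(theta)],
                     [0.0,           1.0, 0.0]])

def project(theta, joints_3d):
    """Project (N, 3) joints to (N, 2) image points."""
    return joints_3d @ virtual_camera(theta).T

joints = np.array([[1.0, 2.0, 0.0],
                   [0.0, 1.0, 1.0]])
print(project(0.0, joints))          # front view: keeps x and y
print(project(np.pi / 2, joints))    # side view: depth z becomes image x
```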

24
Computer Vision Related Work Taxonomy: Input and Output Space
[Chart: related work arranged by output space (Y) against input space (X)]
Y axis labels: 3D joints, 2D joints, body parts, low-level motion model (e.g. actions), contours, silhouette
X axis labels: silhouette features, 2D joints, flow field, region based, contours
Works placed on the chart:
Perona et al. 00
Sidenbladh et al. 00
Bregler, Malik 98
Barron, Kakadiaris 00
Gavrila, Davis 96
Brand 99
Kakadiaris, Metaxas 96
Pentland, Horowitz 91
Rohr 93
Howe et al. 99
Goncalves et al. 95
Rehg, Morris 95, 97, 98
Yamamoto et al. 91
Cham, Rehg 99
Ju et al. 96
Davis et al. 99
Wren et al. 97
Fujiyoshi, Lipton 99
Polana, Nelson 96, 97
Black, Yacoob 97
Rosales, Sclaroff 98, 99
Black 99
Blake et al. 95
Hogg 83
Baumberg, Hogg 93, 94
Davis, Bobick 97
Isard, Black 98
25
Geometric Constraints
N = number of joints (20); C = number of cameras (4)
Minimize the difference between the 2D location estimate and the 2D location from SMA inference.
Know yij and Pi. Solve for Xj.
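
A minimal sketch of solving for one joint Xj when yij and Pi are known. It assumes a squared-error objective and affine 2x4 camera matrices acting on homogeneous points. Both are assumptions of this sketch, since the slide's formula is not in the transcript. Under those assumptions the problem is linear least squares.

```python
import numpy as np

def solve_joint(cameras, observations):
    """Least-squares 3D position of one joint.

    cameras: list of C affine camera matrices P_i, each 2x4 (assumed form).
    observations: list of C 2D locations y_ij for this joint.
    """
    A = np.vstack([P[:, :3] for P in cameras])                    # (2C, 3)
    b = np.concatenate([y - P[:, 3] for P, y in zip(cameras, observations)])
    X, *_ = np.linalg.lstsq(A, b, rcond=None)
    return X

rng = np.random.default_rng(1)
cameras = [rng.normal(size=(2, 4)) for _ in range(4)]    # C = 4, hypothetical
X_true = np.array([0.3, -1.2, 2.0])
observations = [P[:, :3] @ X_true + P[:, 3] for P in cameras]

X_est = solve_joint(cameras, observations)
print(np.allclose(X_est, X_true))
```

Repeating this for each of the N joints gives the full pose for fixed cameras.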
26
Summary
  • Specialized Mappings Architecture: non-linear
    supervised learning.
  • One-to-many relationships
  • Use of inverse map
  • Learning and Inference algorithms for SMA
  • Application to pose estimation (body-hands) from
    single images
  • Tracking not needed, known disadvantages are
    avoided (initialization, iterative methods,
    sensitivity to quality of all frames, occlusion
    handling, multiple viewpoints)
  • Certain visual ambiguities can be solved by
    statistical modeling
  • Extension of SMA to multiple sources of
    hypotheses
  • Non-iterative! Depends almost exclusively on
    speed of function evaluation, O(MN).

27
Summary
  • Specialized Mappings Architecture for non-linear
    supervised learning. Alternative for the more
    general problem of multiple function
    approximation from data
  • Certain visual ambiguities can be solved, even on
    single images, by statistical modeling
  • Application to pose estimation (body-hands) from
    single images
  • Tracking not needed, known disadvantages are
    avoided (initialization, iterative methods,
    sensitivity to quality of all frames, occlusion
    handling, multiple viewpoints)
  • Structure from point correspondence
  • Given segmentation, linear inference time O(MN).
    Non-iterative! Depends almost exclusively on
    speed of function evaluation

(http://www.cs.bu.edu/groups/ivc)
28
Extension: Multiple-View SMA
  • Automatic camera positioning (mobile cameras)
  • Place cameras
  • Watch articulated object
  • Estimate relative positions
  • Cameras can work jointly to determine 3D
    structure
  • Previous approaches
  • Need feature correspondence
  • Too difficult for large baselines

29
Specialized Mappings Architecture (Simultaneous Estimation of Partitions and Maps)
30
3D Body Pose through Virtual Cameras (Overview)
[Diagram: numbered frames and cameras; 3D joints]
31
SMA Model (vs. usual tracking approach)
[Diagram labels: Pose, Visual Features, Tracking for Pose Estimation]
32
Problem Definition
  • Given
  • Determine: camera locations and 3D structure

33
Problem Formulation (Probabilistic Model Dependencies)
[Graphical model: X (3D pose), W (camera params), hypotheses labels, and 2D pose hypotheses H1, H2, ..., HC (collectively H)]
34
Problem Formulation II: Estimation of 3D Pose X and Camera Relative Locations W
  • Proposed ML solution
  • Set of camera pose hypotheses are independent
    given the pose X and cameras W
  • Let us introduce latent variable y
  • Intractable, but can solve approximately using
    alternating minimizations

35
Proposed ML Solution
[Equation annotations: 3D pose, camera params, 2D pose hypotheses, hypotheses independence]
37
Quantitative Evaluation I
38
Multiple-View Approach Step 2: From 2D Pose Hypotheses to 3D Body Pose and Cameras
[Diagram: all hypotheses H from all cameras, with hypothesis covariances Sk, go through ML estimation (EM) to give the est. pose X and est. camera parameters W = (w1, w2, w3)]
39
Motivation
  • Automatic camera positioning (mobile cameras)
  • Place cameras
  • Watch articulated object
  • Estimate relative positions
  • Cameras can work jointly to determine 3D
    structure
  • Previous approaches
  • Need feature correspondence
  • Very hard problem for large baselines

40
Multiple-View SMA Experiments (3 cameras)
Inputs (First camera)
Estimates (From the first camera viewpoint)
41
Multiple-View SMA Experiments
Inputs (First camera)
Estimates (From the first camera viewpoint)