Title: A Non-obtrusive Head Mounted Face Capture System
1. A Non-obtrusive Head Mounted Face Capture System
Chandan K. Reddy, Master's Thesis Defense
Dr. George C. Stockman (Main Advisor), Dr. Frank Biocca (Co-Advisor), Dr. Charles Owen, Dr. Jannick Rolland (External Faculty)
2. Modes of Communication
- Text only, e.g. Mail, Electronic Mail
- Voice only, e.g. Telephone
- PC camera based conferencing, e.g. Web cam
- Multi-user Teleconferencing
- Teleconferencing through Virtual Environments
- Augmented Reality Based Teleconferencing
3. Face-to-Face Communication
"There is no landscape that we know as well as the human face. The twenty-five-odd square inches containing the features is the most intimately scrutinized piece of territory in existence, examined constantly, and carefully, with far more than an intellectual interest." - Gary Faigin
A well-developed Face-to-Face Communication System will advance the state of the art in teleconferencing systems.
The Face-to-Face Communication System is part of the Teleportal project that is being developed in the MIND Lab at Michigan State University and the ODA Lab at the University of Central Florida.
4. Problem Definition
- Face Capture System (FCS)
- Virtual View Synthesis
- Depth Extraction and 3D Face Modeling
- Head Mounted Projection Displays
- 3D Tele-immersive Environments
- High Bandwidth Network Connections
5. Thesis Contributions
- Complete hardware setup for the FCS.
- Camera-mirror parameter estimation for the optimal configuration of the FCS.
- Generation of quality frontal videos from two side videos.
- Reconstruction of a texture-mapped 3D face model from two side views.
- Evaluation mechanisms for the generated frontal views.
6. Existing Face Capture Systems
- FaceCap3D, a product from Standard Deviation
- Optical Face Tracker, a product from Adaptive Optics
- Advantages: freedom for head movements
- Drawbacks: obstruction of the user's field of view
- Main applications: character animation and mobile environments
7. Existing Face Capture Systems (Contd.)
- Sea of Cameras (UNC Chapel Hill)
- National Tele-immersion Initiative
- Advantages: no burden for the user
- Drawbacks: highly equipped environments and restricted head motion
- Main applications: teleconferencing and collaborative work
8. Virtual View Synthesis
- View Interpolation for Image Synthesis, by Chen and Williams '93
- View Morphing, by Seitz and Dyer '96
- The Lumigraph, by Gortler et al. '96
- Light Field Rendering, by Levoy and Hanrahan '96
- Stereo-based View Synthesis, by Kanade et al. '99
- Dynamic View Morphing, by Manning and Dyer '99
- Spatio-Temporal View Interpolation, by Vedula and Kanade '02
9. Depth Extraction and Face Modeling
- Depth Extraction
- Structured Light
- Shape from Shading
- Structure from Stereo
- Structure from Motion
- Face Modeling
- A parametric model of human faces, Parke '74
- 3D individualized head model from orthogonal views, Ip and Yin '96
- Realistic facial expressions synthesized from photographs, Pighin et al. '98
- Face model from a video sequence of face images, Lai and Cheng '01
10. Head Mounted Displays and Tele-immersive Environments
- Head Mounted Displays, Ivan Sutherland '68
- VIDEOPLACE, Krueger '85
- CAVEs, Cruz-Neira '93
- Teleconferencing using a Sea of Cameras, Fuchs et al. '94
- Head Mounted Projective Displays, Fischer '96
- Degenerate CAVEs (ImmersaDesk, Immersive WorkBench), Czernuszenko et al. '97
- Office of the Future, Raskar et al. '98
- MAGIC BOOK, Billinghurst et al. '01
- Mobile Displays, Feiner '02
11. Proposed Face Capture System
(F. Biocca and J. P. Rolland, Teleportal face-to-face system, patent filed, 2000.)
A novel Face Capture System that is being developed: two cameras capture the corresponding side views through the mirrors.
12. Advantages
- User's field of view is unobstructed
- Portable and easy to use
- Gives very accurate, high-quality face images
- Can process in real time
- Simple and user-friendly system
- Static with respect to the human head
- By flipping the mirrors, the cameras view the scene from the user's viewpoint
13. Applications
- Mobile Environments
- Collaborative Work
- Multi-user Teleconferencing
- Medical Areas
- Distance Learning
- Gaming and Entertainment industry
- Others
14. System Design
15. Equipment Required
16. Transmission using Internet2
- Over 190 universities are working in partnership with industry to develop Internet2.
- Internet2 connections are capable of transmitting full broadcast-quality video streams between remote collaborative sites using MPEG-2 video encoding and decoding technology.
- Suitable for high-bandwidth applications such as medical visualization, teleconferencing, and other applications that use enormous amounts of data.
- An Internet2 test bed has been established between the MIND Lab at Michigan State University and the ODA Lab at the University of Central Florida, implemented using MPEG-2 video streams.
17. Optical Layout
- Three Components to be considered
- Camera
- Mirror
- Human Face
18. Specification Parameters
- Camera
- Sensing area: 3.2 mm x 2.4 mm (1/4-inch format).
- Pixel dimensions: the image sensed is 768 x 494 pixels; the digitized image size is 320 x 240 due to RAM size restrictions.
- Focal length (Fc): 12 mm (VCL-12UVM).
- Field of view (FOV): 15.2 degrees x 11.4 degrees.
- Diameter (Dc): 12 mm.
- F-number (Nc): 1, to achieve maximum light gathering.
- Minimum working distance (MWD): 200 mm.
- Depth of field (DOF): to be estimated.
19. Specification Parameters (Contd.)
- Mirror
- Diameter (Dm) / F-number (Nm)
- Focal length (fm)
- Magnification factor (Mm)
- Radius of curvature (Rm)
- Human Face
- Height of the face to be captured (H = 250 mm)
- Width of the face to be captured (W = 175 mm)
- Distances
- Distance between the camera and the mirror (Dcm = 150 mm)
- Distance between the mirror and the face (Dmf = 200 mm)
20. Estimation of the Variable Parameters
The Imaging Equation
The diameter of the mirror: Dm = 26.32 / (10.16 N)
21. Optimal Design Calculations
22. Customization of Cameras and Mirrors
- Off-the-shelf cameras
- Customizing a camera lens is a tedious task.
- A trade-off has to be made between the field of view and the depth of field.
- The Sony DXC-LS1 with a 12 mm lens is suitable for our application.
- Custom-designed mirrors
- A plano-convex lens with 40 mm diameter is coated black on the planar side.
- The radius of curvature of the convex surface is 155.04 mm.
- The thickness at the center of the lens is 5 mm.
- The thickness at the edge is 3.7 mm.
23. Block Diagram of the System
24. Experimental Setup
25. Virtual Video Synthesis
26. Problem Statement
Generating a virtual frontal view from two side views
27. Data Processing
- Two synchronized videos are captured in real time (30 frames/sec) simultaneously.
- For effective capturing and processing, the data is stored in uncompressed format.
- Machine specifications (Lorelei at metlab.cse.msu.edu):
- Pentium III processor
- Processor speed: 746 MHz
- RAM size: 384 MB
- Hard disk write speed (practical): 9 MB/s
- MIL-LITE is configured to use 150 MB of RAM
28. Data Processing (Contd.)
- Size of 1 second of video: 30 x 320 x 240 x 3 bytes, about 6.59 MB.
- Using 150 MB of RAM, only 10 seconds of video from the two cameras can be captured.
- Why does the processing have to be offline?
- The calibration procedure is not automatic.
- The disk write speed must be at least 14 MB/s.
- To capture two videos at 640 x 480 resolution, the disk write speed must be at least 54 MB/s.
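The arithmetic above can be checked with a short sketch (pure Python; "MB" here means 2^20 bytes, and the function name is illustrative):

```python
# Back-of-envelope data rates for the capture setup described above.
# Frame geometry and rate are taken from the slides.

FPS = 30                 # frames per second
BYTES_PER_PIXEL = 3      # uncompressed RGB

def video_rate_mb(width, height, cameras=1):
    """Uncompressed video data rate in MB/s (1 MB = 2**20 bytes)."""
    return cameras * FPS * width * height * BYTES_PER_PIXEL / 2**20

print(f"{video_rate_mb(320, 240):.2f} MB/s per camera")       # ~6.59
print(f"{video_rate_mb(320, 240, 2):.2f} MB/s, two cameras")  # ~13.2, so >= 14 MB/s disk
print(f"{video_rate_mb(640, 480, 2):.2f} MB/s at 640x480")    # ~52.7, so >= 54 MB/s disk
```

This makes the offline constraint concrete: the measured 9 MB/s disk cannot keep up with even two 320 x 240 streams.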
29. Structured Light Technique
Projecting a grid on the frontal view of the face: a square grid in the frontal view appears as a quadrilateral (with curved edges) in the real side view.
30. Color Balancing
- Hardware-based approach: white balancing of the cameras.
- Why is this more robust than a software-based approach?
- There is no change to the input camera.
- Better handling of varying lighting conditions.
- No prior knowledge of the skin color is required.
- No additional overhead.
- It is enough if the two cameras are color balanced relative to each other.
31. Off-line Calibration Stage
Left Calibration Face Image
Right Calibration Face Image
Projector
Transformation Tables
32. Calibration Procedure
- Capture the two side views, with a grid projected on the face, from the two cameras placed near the two ears, and store them in the corresponding images (IL(s,t) and IR(u,v)).
- Take some grid intersection points and define transform functions for determining the (s,t) coordinates in the left image (IL) and the (u,v) coordinates in the right image (IR).
- Apply a bilinear interpolation technique to obtain any point inside the grid coordinates.
- Based on the transformation functions, construct two transformation tables (one for the left image and one for the right) which are indexed by (x,y) and give the corresponding (s,t) of IL and (u,v) of IR.
33. Operational Stage
Right Face Image
Left Face Image
Transformation Tables
Right Warped Face Image
Left Warped Face Image
Mosaiced Face Image
34. Generation of Virtual Frontal Views
- Get the two side views, without a grid projected on the face, from the two cameras placed near the two ears (IL and IR).
- Generate the (x,y) coordinate in the virtual view, move to the corresponding location in the transformation table, and store the mapping (Mp(x,y)) at that pixel value.
- Reconstruct the (x,y) coordinates of the frontal view (image V) with the help of Mp(x,y) and the values of IL(s,t) and IR(u,v).
- Smooth the geometric and lighting variations across the vertical midline in V by applying a linear (one-dimensional) filter.
- Continue this reconstruction of V(x,y) for every frame of the videos to produce the final virtual frontal video.
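The lookup-and-mosaic step above can be sketched as follows. This is a minimal illustration, assuming grayscale images stored as 2D lists and transformation tables stored as dicts from frontal-view pixels to source pixels; the actual system works on color video frames.

```python
# Hypothetical sketch of the operational stage: each frontal-view pixel (x, y)
# is filled from the side image that the precomputed table points it to.

def synthesize_frontal(left_img, right_img, left_table, right_table, width, height):
    """left_table maps (x, y) -> (s, t) in the left image;
    right_table maps (x, y) -> (u, v) in the right image."""
    frontal = [[0] * width for _ in range(height)]
    mid = width // 2
    for y in range(height):
        for x in range(width):
            if x < mid:                       # left half from the left camera
                s, t = left_table[(x, y)]
                frontal[y][x] = left_img[t][s]
            else:                             # right half from the right camera
                u, v = right_table[(x, y)]
                frontal[y][x] = right_img[v][u]
    return frontal
```

A one-dimensional smoothing filter would then be run over the columns around `mid`, as the steps above describe.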
35. Bilinear Mapping
- To get the corresponding (u,v) point inside the quadrilateral:
Computed by linearly interpolating by the fraction u along the top and bottom edges of the quadrilateral, and then linearly interpolating by the fraction v between the two interpolated points to yield the destination point.
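A minimal sketch of that bilinear mapping (the helper name and the corner ordering are assumptions for the example):

```python
def bilinear_map(corners, u, v):
    """Map fractions (u, v) in [0, 1] x [0, 1] to a point inside a
    quadrilateral given by its four corners (top-left, top-right,
    bottom-left, bottom-right), each an (x, y) pair."""
    (tlx, tly), (trx, trY), (blx, bly), (brx, brY) = corners
    # interpolate by fraction u along the top and bottom edges
    top = (tlx + u * (trx - tlx), tly + u * (trY - tly))
    bot = (blx + u * (brx - blx), bly + u * (brY - bly))
    # then interpolate by fraction v between the two edge points
    return (top[0] + v * (bot[0] - top[0]), top[1] + v * (bot[1] - top[1]))
```

On an axis-aligned unit square this reduces to ordinary bilinear interpolation; on a warped grid cell it bends the sampling accordingly.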
36. Virtual Video Synthesis (Calibration Phase)
37. Virtual Video Synthesis (Contd.)
38. Virtual Frontal Video
39. Comparison of the Frontal Views
First row: virtual frontal views. Second row: original frontal views.
40. Video Synchronization (Eye Blinking)
First row: virtual frontal views. Second row: original frontal views.
41. Face Data through the Head Mounted System
42. 3D Face Model
43. Coordinate Systems
- There are five coordinate systems in our application:
- World Coordinate System (WCS)
- Face Coordinate System (FCS)
- Left Camera Coordinate System (LCCS)
- Right Camera Coordinate System (RCCS)
- Projector Coordinate System (PCS)
44. Camera Calibration
- Conversion from 3D world coordinates to 2D camera coordinates
- Perspective Transformation Model
Eliminating the scale factor (with c34 normalized to 1) gives, for each calibration point j:
(c11 - c31 uj) xj + (c12 - c32 uj) yj + (c13 - c33 uj) zj + c14 = uj
(c21 - c31 vj) xj + (c22 - c32 vj) yj + (c23 - c33 vj) zj + c24 = vj
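Stacking two such linear equations per calibration point gives an overdetermined system in the 11 unknowns c11..c33, solvable by least squares. The sketch below is a pure-Python illustration; the camera matrix in the demo is synthetic, chosen only to exercise the method, and is not the thesis camera.

```python
# Least-squares (DLT-style) solution of the calibration equations above.

def project(C, X):
    """Project 3D point X through a 3x4 matrix C with c34 normalized to 1.
    C = [[c11..c14], [c21..c24], [c31, c32, c33]]."""
    x, y, z = X
    w = C[2][0] * x + C[2][1] * y + C[2][2] * z + 1.0
    return ((C[0][0] * x + C[0][1] * y + C[0][2] * z + C[0][3]) / w,
            (C[1][0] * x + C[1][1] * y + C[1][2] * z + C[1][3]) / w)

def calibrate(pts3d, pts2d):
    """Recover the 11 unknowns c11..c33 from point correspondences by
    forming the normal equations and solving with Gaussian elimination."""
    A, b = [], []
    for (x, y, z), (u, v) in zip(pts3d, pts2d):
        A.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z]); b.append(u)
        A.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z]); b.append(v)
    n, m = 11, len(A)
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    r = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    for i in range(n):                      # elimination with partial pivoting
        p = max(range(i, n), key=lambda k: abs(M[k][i]))
        M[i], M[p], r[i], r[p] = M[p], M[i], r[p], r[i]
        for k in range(i + 1, n):
            f = M[k][i] / M[i][i]
            for j in range(i, n):
                M[k][j] -= f * M[i][j]
            r[k] -= f * r[i]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        c[i] = (r[i] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return [c[0:4], c[4:8], c[8:11]]
```

With real data the calibration points come from the sphere described on the following slides; at least six non-coplanar points are needed for the system to be solvable.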
45. Calibration Sphere
- A sphere can be used for calibration.
- Calibration points on the sphere are chosen such that:
- The azimuthal angle is varied in steps of 45 degrees.
- The polar angle is varied in steps of 30 degrees.
- The location of these calibration points is known in the 3D coordinate system with respect to the origin of the sphere.
- The origin of the sphere defines the origin of the World Coordinate System.
46. Spherical to Cartesian Coordinates
- The 3D coordinates are known in the spherical coordinate system.
- A point (R, theta, phi) in the spherical coordinate system corresponds to the 3D location (Px, Py, Pz) in the Cartesian coordinate system:
- R: radius of the sphere
- theta: azimuthal angle in the xy-plane from the x-axis
- phi: polar angle from the z-axis (also known as the "colatitude" of P)
- Ranges: 0 <= theta <= 2*pi and 0 <= phi <= pi
- Px = R sin(phi) cos(theta)
- Py = R sin(phi) sin(theta)
- Pz = R cos(phi)
- Given (R, theta, phi) we can compute (Px, Py, Pz).
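These formulas translate directly into code. A small sketch generating the calibration grid at the stated angular steps (the unit radius and the exact polar range are illustrative choices):

```python
import math

def spherical_to_cartesian(R, theta, phi):
    """theta: azimuthal angle from the x-axis; phi: polar angle from the z-axis."""
    return (R * math.sin(phi) * math.cos(theta),
            R * math.sin(phi) * math.sin(theta),
            R * math.cos(phi))

# Calibration points: azimuth in 45-degree steps, polar angle in 30-degree steps.
points = [spherical_to_cartesian(1.0, math.radians(az), math.radians(pol))
          for az in range(0, 360, 45) for pol in range(30, 180, 30)]
```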
47. Projector Calibration
- Similar to camera calibration.
- 2D image coordinates cannot be obtained directly from a 2D image.
- A blank image is projected onto the sphere.
- The 2D coordinates of the calibration points in the projected image are noted.
- More points can be seen from the projector's point of view; some points are common to both camera views.
- Results appear to have slightly more error when compared to the camera calibration.
48. 3D Face Model Construction
- Why?
- To obtain different views of the face
- To generate the stereo pair to view it in the HMPD
- Steps required:
- Computation of 3D locations
- Customization of the 3D model
- Texture mapping
49. Computation of 3D Points
- 3D point estimation using stereo.
- Stereo between the two cameras is not possible because of occlusion by the facial features.
- Hence, two stereo pair computations:
- Left camera and projector
- Right camera and projector
- Using stereo, compute the 3D points of prominent facial feature points in the FCS.
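Each stereo pair reduces to the same triangulation problem: given two calibrated devices (a camera and the projector) and a matched image point in each, intersect the two viewing rays in a least-squares sense. A pure-Python sketch using the same c34 = 1 matrix convention as the calibration slide; the matrices in the test are synthetic examples, not the thesis calibration results.

```python
def triangulate(C1, uv1, C2, uv2):
    """Least-squares 3D point from two 3x4 device matrices (c34 = 1),
    C = [[c11..c14], [c21..c24], [c31, c32, c33]], and one pixel in each."""
    rows, rhs = [], []
    for C, (u, v) in ((C1, uv1), (C2, uv2)):
        # (c11 - u*c31) x + (c12 - u*c32) y + (c13 - u*c33) z = u - c14, etc.
        rows.append([C[0][i] - u * C[2][i] for i in range(3)]); rhs.append(u - C[0][3])
        rows.append([C[1][i] - v * C[2][i] for i in range(3)]); rhs.append(v - C[1][3])
    m = len(rows)
    # 3x3 normal equations, solved by Cramer's rule
    M = [[sum(rows[k][i] * rows[k][j] for k in range(m)) for j in range(3)]
         for i in range(3)]
    r = [sum(rows[k][i] * rhs[k] for k in range(m)) for i in range(3)]
    def det3(A):
        return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
              - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
              + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    d = det3(M)
    point = []
    for i in range(3):
        Mi = [row[:] for row in M]
        for k in range(3):
            Mi[k][i] = r[k]
        point.append(det3(Mi) / d)
    return tuple(point)
```

The four linear equations (two per device) overdetermine the three unknowns, so noisy correspondences still yield the best-fitting 3D point.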
50. 3D Generic Face Model
A generic face model with 395 vertices and 818 triangles. Left: front view. Right: side view.
51. Texture Mapped 3D Face
52. Evaluation
53. Evaluation Schemes
- Evaluation of facial expressions is not studied extensively in the literature.
- Evaluation can be done for facial alignment and face recognition for static images.
- Lip and eye movements in a dynamic event.
- Perceptual quality: how are the moods conveyed?
- Two types of evaluation:
- Objective evaluation
- Subjective evaluation
54. Objective Evaluation
- Theoretical evaluation
- No human feedback required
- This evaluation can give us a measure of:
- Face recognition
- Face alignment
- Facial movements
- Methods applied:
- Normalized cross-correlation
- Euclidean distance measures
55. Evaluation Images
Five frames were considered for objective evaluation. First row: virtual frontal views. Second row: original frontal views.
56. Normalized Cross-Correlation
- Regions considered for normalized cross-correlation
- (Left: real image. Right: virtual image.)
57. Normalized Cross-Correlation (Contd.)
- Let V be the virtual image and R be the real image.
- Let w be the width and h be the height of the images.
- The normalized cross-correlation between the two images V and R is given by
NCC(V, R) = sum over (x, y) of (V(x,y) - mean(V)) * (R(x,y) - mean(R)), divided by sqrt( sum of (V(x,y) - mean(V))^2 * sum of (R(x,y) - mean(R))^2 )
- where mean(V) and mean(R) are the mean pixel values of V and R over all w x h pixels.
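Normalized cross-correlation can be computed in a few lines; a sketch assuming equal-size grayscale images stored as 2D lists:

```python
def ncc(V, R):
    """Normalized cross-correlation of two equal-size grayscale images
    (2D lists). Returns 1.0 for a perfect linear match."""
    v = [p for row in V for p in row]
    r = [p for row in R for p in row]
    n = len(v)
    mv, mr = sum(v) / n, sum(r) / n
    num = sum((a - mv) * (b - mr) for a, b in zip(v, r))
    den = (sum((a - mv) ** 2 for a in v) * sum((b - mr) ** 2 for b in r)) ** 0.5
    return num / den
```

Because the mean is removed and the result is normalized, uniform brightness or contrast differences between the virtual and real views do not affect the score.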
58. Normalized Cross-Correlation (Contd.)
59. Euclidean Distance Measures
- The Euclidean distance between two points i and j is given by d(i,j) = sqrt((xi - xj)^2 + (yi - yj)^2).
- Let Rij be the Euclidean distance between two points i and j in the real image.
- Let Vij be the Euclidean distance between two points i and j in the virtual image.
- Dij = Rij - Vij
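A sketch of this measure over matched feature points (the helper names are illustrative, and taking the absolute value of Dij is an assumption; the slide's Dij could also be left signed):

```python
def dist(p, q):
    """Euclidean distance between two 2D points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def distance_discrepancy(real_pts, virtual_pts, i, j):
    """|Rij - Vij|: how much the i-j feature distance differs
    between the real and the virtual frontal image."""
    return abs(dist(real_pts[i], real_pts[j]) - dist(virtual_pts[i], virtual_pts[j]))
```

A small Dij over pairs of facial landmarks indicates that the virtual view preserves the face geometry of the real view.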
60. Euclidean Distance Measures (Contd.)
61. Subjective Evaluation
- Evaluates human perception.
- Measurement of the quality of a talking face.
- Factors that might affect it:
- Quality of the video
- Facial movements and expressions
- Synchronization of the two halves of the face
- Color and texture of the face
- Quality of the audio
- Synchronization of the audio
- A preliminary study has been made to assess the quality of the generated videos.
62. Conclusion and Future Work
Results organized by time domain (static vs. dynamic) and dimension:
- 2D, static: Virtual Frontal Image (conclusion)
- 2D, dynamic: Virtual Frontal Video (conclusion)
- 3D, static: Texture Mapped 3D Face Model (conclusion)
- 3D, dynamic: 3D Facial Animation (future work)
63. Summary
- Design and implementation of a novel Face Capture System
- Generation of a virtual frontal view from two side views in a video sequence
- Extraction of depth information using the stereo method
- Texture-mapped 3D face model generation
- Evaluation of virtual frontal videos
64. Future Work
- Online processing in real time
- Automatic calibration
- 3D facial animation
- Subjective evaluation of the virtual frontal videos
- Data compression during processing and transmission
- Customization of camera lenses
- Integration with a Head Mounted Projection Display
65. Thank You
- Doubts, queries, and suggestions