Title: Stereo Vision
1. Stereo Vision
- CS485/685 Computer Vision
- Prof. Bebis
2. What is the goal of stereo vision?
- The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in space.
- The term binocular vision is used when two cameras are employed.
3. Stereo setup
Cameras in arbitrary position and orientation
More convenient setup
4. Stereo terminology
- Fixation point: the point of intersection of the optical axes.
- Baseline: the distance between the centers of projection.
- Conjugate pair: any point in the scene that is visible in both cameras will be projected to a pair of image points in the two images (corresponding points).
5. Stereo terminology (cont'd)
- Disparity: the distance between corresponding points when the two images are superimposed.
- Disparity map: the disparities of all points form the disparity map.
6. Triangulation
- Determines the position of a point in space by finding the intersection of the two lines, each passing through a center of projection and the projection of the point in the corresponding image.
7. The two problems of stereo
- (1) The correspondence problem.
- (2) The reconstruction problem.
8. The correspondence problem
- Finding pairs of matched points such that each
point in the pair is the projection of the same
3D point.
9. The correspondence problem (cont'd)
- Triangulation depends crucially on the solution of the correspondence problem!
- Ambiguous correspondences may lead to several different consistent interpretations of the scene.
10. The reconstruction problem
- Given the corresponding points, we can compute the disparity map.
- The disparity map can be converted to a 3D map of the scene assuming that the stereo geometry is known.
11. Recovering depth (i.e., reconstruction)
- Consider recovering the position of P from its projections pl and pr.
- Pl: P in left camera coordinates, Pl = (Xl, Yl, Zl)
- Pr: P in right camera coordinates, Pr = (Xr, Yr, Zr)
12. Recovering depth (cont'd)
- Suppose that the two cameras are related by the following transformation: Pr = R(Pl - T) (i.e., (R, -T) aligns the right camera with the left camera).
- Using Zr = Zl = Z and Xr = Xl - T we have

  xl = f Xl / Z,   xr = f Xr / Z   =>   Z = f T / d

- where d = xl - xr is the disparity (i.e., the difference in the position between the corresponding points in the two images).
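The depth-disparity relation on this slide (Z = fT/d) can be sketched numerically; the focal length, baseline, and image coordinates below are made-up illustrative values:

```python
# A minimal sketch of depth-from-disparity, Z = f*T/d, for the simple
# setup above; f, T, xl, xr are made-up illustrative values.
def depth_from_disparity(f, T, xl, xr):
    """f: focal length, T: baseline, (xl, xr): x-coordinates of a conjugate pair."""
    d = xl - xr          # disparity
    return f * T / d

# f = 700 (pixels), baseline T = 0.1 (m), disparity d = 14 pixels:
Z = depth_from_disparity(700.0, 0.1, 50.0, 36.0)   # Z is about 5.0 (m)
```

Note that depth is inversely proportional to disparity: nearby points have large disparities, distant points have small ones.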
13. Stereo camera parameters
- Extrinsic parameters: (R, T) describe the relative position and orientation of the two cameras.
- Intrinsic parameters: characterize the transformation from image plane coordinates to pixel coordinates, in each camera.
14. Stereo camera parameters (cont'd)
- R and T can be determined from the extrinsic parameters of each camera:
- (Tl, Rl) aligns the left camera with the world; (Tr, Rr) aligns the right camera with the world.
(see homework)
15. Correspondence problem
- Some points in each image will have no corresponding points in the other image (e.g., the cameras might have different fields of view).
- A stereo system must be able to determine the image parts that should not be matched.
16. Correspondence problem (cont'd)
- Two main approaches:
- Intensity-based: attempt to establish a correspondence by matching image intensities.
- Feature-based: attempt to establish a correspondence by matching sparse sets of image features.
17. Intensity-based Methods
- Match image sub-windows between the two images
(e.g., using correlation).
18. Correlation-based Methods
19. Correlation-based Methods (cont'd)
- Normalized cross-correlation: normalize c(d) by subtracting the mean and dividing by the standard deviation.
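Window matching with normalized cross-correlation can be sketched as follows; the images, window size, and disparity range below are synthetic, made-up values (a minimal illustration, not a production matcher):

```python
import numpy as np

def ncc(w1, w2):
    """Normalized cross-correlation of two equal-size windows."""
    a = w1 - w1.mean()
    b = w2 - w2.mean()
    return float((a * b).sum() / (a.std() * b.std() * a.size))

def match_along_row(left, right, y, x, half, max_disp):
    """Find the disparity d maximizing NCC between the window at (y, x)
    in the left image and the window at (y, x - d) in the right image."""
    wl = left[y-half:y+half+1, x-half:x+half+1]
    best_d, best_c = 0, -np.inf
    for d in range(0, max_disp + 1):
        xr = x - d
        if xr - half < 0:           # window would fall off the image
            break
        wr = right[y-half:y+half+1, xr-half:xr+half+1]
        c = ncc(wl, wr)
        if c > best_c:
            best_c, best_d = c, d
    return best_d

# Synthetic check: the right image is the left image shifted 3 pixels left
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -3, axis=1)
d = match_along_row(left, right, y=10, x=20, half=3, max_disp=8)   # -> 3
```

The search here already assumes the epipolar constraint discussed later: candidate matches lie on the same row.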
20. Correlation-based Methods (cont'd)
- The success of correlation-based methods depends
on whether the image window in one image exhibits
a distinctive structure that occurs infrequently
in the search region of the other image.
21. Correlation-based Methods (cont'd)
- How to choose the window size W?
- Too small a window may not capture enough image structure and may be too sensitive to noise (i.e., less distinctive, many false matches).
- Too large a window makes matching less sensitive to noise (desired) but also harder to match.
22. Correlation-based Methods (cont'd)
- An adaptive search window has been proposed in the literature (i.e., adapt both shape and size):
T. Kanade and M. Okutomi, "A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 9, 1994.
23. Correlation-based Methods (cont'd)
- How to choose the size and location of R(pl)?
- If the distance of the fixation point from the cameras is much larger than the baseline, the location of R(pl) can be chosen to be the same as the location of pl.
- The size of R(pl) can be estimated from the maximum range of disparities (or depths) we expect to find in the scene.
24. Wide Baseline Stereo Matching
- More powerful methods are needed when dealing with wide baseline stereo images (i.e., affine invariant region matching):
T. Tuytelaars and L. Van Gool, "Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions", British Machine Vision Conference, 2000.
25. Feature-based Methods
- Look for a feature in one image that matches a feature in the other, e.g.:
- Edge points
- Line segments
- Corners
- A set of geometric features is used for matching.
26. Feature-based Methods (cont'd)
- For example, a line feature descriptor could contain:
- the length, l
- the orientation, θ
- the coordinates of the midpoint, m
- the average intensity along the line, i
- Similarity measures are based on matching feature descriptors; e.g., line matching can be done by maximizing

  S = 1 / (w0(ll - lr)^2 + w1(θl - θr)^2 + w2(ml - mr)^2 + w3(il - ir)^2)

- where w0, ..., w3 are weights.
27. Intensity-based vs feature-based approaches
- Intensity-based methods
- Provide a dense disparity map.
- Need textured images to work well.
- Sensitive to illumination changes.
- Feature-based methods
- Faster than correlation-based methods.
- Provide sparse disparity maps.
- Relatively insensitive to illumination changes.
28. Structured lighting
- Feature-based methods cannot be used when objects have smooth surfaces or surfaces of uniform intensity.
- Patterns of light can be projected onto the surface of objects, creating interesting points even in regions which would otherwise be smooth.
29. Search Region R(pl)
- Finding corresponding points is time consuming!
- Can we reduce the size of the search region? YES
30. Epipolar Plane and Lines
- Epipolar plane: the plane passing through the centers of projection and the point in the scene.
- Epipolar line: the intersection of the epipolar plane with the image plane.
31. Epipoles
- Left epipole: the projection of Or on the left image plane.
- Right epipole: the projection of Ol on the right image plane.
32. Epipoles (cont'd)
All epipolar lines go through the camera's epipole.
33. Epipoles (cont'd)
If the line through the centers of projection is parallel to one of the image planes, the corresponding epipole is at infinity.
two epipoles at infinity
one epipole at infinity
34. Epipolar constraint
- Given pl, P can lie anywhere on the ray from Ol through pl.
- The image of this ray in the right image is the epipolar line through the corresponding point pr.
Mapping between points in the left image and lines in the right image, and vice versa.
35. Importance of epipolar constraint
- The search for correspondences is reduced to 1D.
- Very effective for rejecting false matches due to
occlusion.
(figure: the search region R(pl) for a point pl in the left image Il reduces to an epipolar line in the right image Ir)
36. Epipolar Lines Example
37. Ordering of corresponding points
- Conjugate points along corresponding epipolar lines have the same order in each image.
- Exception: when corresponding points lie on the same epipolar plane and are imaged from different sides.
38. Estimating the epipolar geometry
- Given a point in one image, how do we find its epipolar line in the other image?
- We need to estimate the epipolar geometry, which is encoded by two matrices:
- the essential matrix
- the fundamental matrix
39. Quick Review
- Cross product
- Homogeneous representation of lines
40. Vector Cross Product
- a × b = (a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1)
- The cross product can be written as a matrix multiplication, a × b = S b, where S is the skew-symmetric matrix

  S = [  0  -a3   a2
        a3    0  -a1
       -a2   a1    0 ]
41. Homogeneous (projective) representation of lines
- A line ax + by + c = 0 is represented by the homogeneous vector l = (a, b, c)^T (projective line).
- Any nonzero scalar multiple k(a, b, c)^T represents the same line.
42. Homogeneous (projective) representation of lines
- Duality: in homogeneous (projective) coordinates, points and lines are dual (we can interchange their roles).
- Some properties involving points and lines:
- (1) The point x lies on the line l iff x^T l = 0
- (2) Two lines l, m define a point p: p = l × m
- (3) Two points p, q define a line l: l = p × q
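The duality properties above can be illustrated numerically; the points and lines below are made-up values:

```python
import numpy as np

# A small numeric illustration (values made up) of the duality properties:
# two points define a line, and two lines define a point, both via the
# cross product of their homogeneous vectors.
p = np.array([1.0, 2.0, 1.0])             # the point (1, 2)
q = np.array([3.0, 4.0, 1.0])             # the point (3, 4)
l = np.cross(p, q)                        # line through p and q
on_line = (p @ l, q @ l)                  # -> (0.0, 0.0): both points lie on l

m = np.cross(np.array([0.0, 1.0, -2.0]),  # the line y = 2
             np.array([1.0, 0.0, -1.0]))  # the line x = 1
x = m / m[2]                              # their intersection -> (1, 2, 1)
```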
43. Homogeneous (projective) representation of lines (Quick Review)
44. Homogeneous (projective) representation of lines
45. Homogeneous (projective) representation of lines
- For parallel lines, the intersection point has w = 0; therefore, they intersect at infinity!
46. The essential matrix E
- The equation of the epipolar plane is given by the following co-planarity condition (assume that the world coordinate system is aligned with the left camera):

  (Pl - T)^T (T × Pl) = 0

- (T × Pl is the normal to the epipolar plane)
47. The essential matrix E (cont'd)
- Using Pr = R(Pl - T) we have

  (R^T Pr)^T (T × Pl) = 0

- Expressing the cross product as matrix multiplication, T × Pl = S Pl, we have

  Pr^T R S Pl = 0,  i.e.,  Pr^T E Pl = 0

- where E = RS is called the essential matrix.
(the matrix S is always rank deficient, i.e., rank(S) = 2)
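The derivation above can be checked numerically: with S the skew-symmetric matrix of T and E = RS, any conjugate pair satisfies Pr^T E Pl = 0. The rotation, baseline, and 3D point below are made-up values:

```python
import numpy as np

# Numerical check (made-up geometry) of the co-planarity condition above:
# S @ P equals np.cross(T, P), E = R @ S, and Pr^T E Pl = 0.
def skew(t):
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

a = 0.3                                  # arbitrary rotation about z
R = np.array([[np.cos(a), -np.sin(a), 0],
              [np.sin(a),  np.cos(a), 0],
              [0, 0, 1]])
T = np.array([0.5, 0.1, 0.0])            # arbitrary baseline

E = R @ skew(T)                          # essential matrix E = R S
Pl = np.array([0.2, -0.4, 3.0])          # a 3D point, left camera frame
Pr = R @ (Pl - T)                        # the same point, right camera frame

residual = Pr @ E @ Pl                   # -> 0 (up to round-off)
```

The identity holds for any choice of R, T, and Pl, since (Pl - T), T, and Pl are coplanar by construction.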
48. The essential matrix E (cont'd)
- Using image plane coordinates pl = f Pl / Zl and pr = f Pr / Zr (Zl, Zr ≠ 0), the condition becomes

  pr^T E pl = 0

- The equation defines a mapping between points and epipolar lines (in homogeneous coordinates).
We will revisit this later.
49. The essential matrix E (cont'd)
- Properties of the essential matrix E:
- (1) Encodes the extrinsic parameters only
- (2) Has rank 2
- (3) Its two nonzero singular values are equal
50. The fundamental matrix F
- Suppose that Ml and Mr are the matrices of the intrinsic parameters of the left and right camera; then the pixel coordinates p̄l and p̄r of pl and pr are

  p̄l = Ml pl,   p̄r = Mr pr

- Using the above equations and pr^T E pl = 0 we have

  p̄r^T F p̄l = 0

- where F = Mr^-T E Ml^-1 is called the fundamental matrix.
51. The fundamental matrix F (cont'd)
- Properties of the fundamental matrix:
- (1) Encodes both the extrinsic and intrinsic parameters
- (2) Has rank 2
52. Estimating F (or E): the eight-point algorithm
- We can estimate the fundamental matrix from point correspondences only (i.e., without any information on the extrinsic or intrinsic camera parameters).
- Each correspondence leads to a homogeneous equation of the form

  p̄r^T F p̄l = 0

which is linear in the nine entries of F.
53. Estimating F (or E): the eight-point algorithm (cont'd)
- We can determine the entries of the matrix F (up to an unknown scale factor) by establishing at least 8 correspondences, stacking one equation per correspondence into a homogeneous system A f = 0.
- A is rank deficient (i.e., rank(A) = 8).
- The solution is unique up to a scale factor (i.e., proportional to the last column of V, where A = U D V^T).
54. Estimating F (or E): the eight-point algorithm (cont'd)
- The estimated F will not, in general, satisfy the rank-2 constraint; enforce it by computing the SVD of F and zeroing its smallest singular value (i.e., dropping the term corresponding to the smallest singular value).
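The procedure above can be sketched compactly; the correspondences below are synthesized from a made-up (R, T), and Hartley's normalization step is omitted for brevity:

```python
import numpy as np

# Sketch of the eight-point algorithm: one linear equation per
# correspondence, null vector of the stacked system A via SVD, then the
# rank-2 constraint enforced by zeroing the smallest singular value.
def eight_point(pts_l, pts_r):
    A = np.array([np.outer(pr, pl).ravel() for pl, pr in zip(pts_l, pts_r)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)               # null vector of A, up to scale
    U, D, Vt2 = np.linalg.svd(F)
    D[2] = 0.0                             # enforce rank(F) = 2
    return U @ np.diag(D) @ Vt2

# synthetic noise-free correspondences from a made-up (R, T)
rng = np.random.default_rng(1)
a = 0.2
R = np.array([[np.cos(a), 0, np.sin(a)],
              [0, 1, 0],
              [-np.sin(a), 0, np.cos(a)]])
T = np.array([1.0, 0.0, 0.0])
P = rng.uniform([-1, -1, 4], [1, 1, 8], size=(10, 3))  # points, left frame
pts_l = P / P[:, 2:3]                                  # homogeneous image points
Q = (P - T) @ R.T                                      # Pr = R(Pl - T)
pts_r = Q / Q[:, 2:3]

F_hat = eight_point(pts_l, pts_r)
residuals = [abs(pr @ F_hat @ pl) for pl, pr in zip(pts_l, pts_r)]  # all ~0
```

With noisy correspondences, the normalization of slide 56 becomes important for numerical stability.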
55. Estimating F (or E): the eight-point algorithm (cont'd)
(figure: epipolar lines from the uncorrected F vs. the corrected F)
Epipolar lines must intersect at the epipole!
56. Estimating F (or E): the eight-point algorithm (cont'd)
- Normalization: the numerical stability of the eight-point algorithm improves if the image points are first translated and scaled so that they are centered at the origin with a small average norm:
R. Hartley, "In Defense of the Eight-Point Algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6):580-593, 1997.
57. Finding the epipolar lines
- The equation p̄r^T F p̄l = 0 defines a mapping between points and epipolar lines: F p̄l is the epipolar line in the right image corresponding to p̄l, and F^T p̄r is the epipolar line in the left image corresponding to p̄r.
- Reminder: a point x lies on a line l iff x^T l = 0.
58. Finding the epipolar lines (cont'd)
59. Locating the epipoles from F (or E)
- Every epipolar line passes through the epipole, so the epipoles are the null vectors of F: F ēl = 0 and ēr^T F = 0.
60. Locating the epipoles from F (or E) (cont'd)
The solution is proportional to the last column of U in the SVD decomposition F = U D V^T.
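Both epipoles can be read off one SVD; the F below corresponds to a made-up pure translation T = (0.5, 0, 1) with R = I:

```python
import numpy as np

# Sketch of locating the epipoles from F via the SVD F = U D V^T: the left
# epipole is the right null vector (last column of V, F @ el ~ 0) and the
# right epipole is the left null vector (last column of U, er @ F ~ 0).
def epipoles(F):
    U, _, Vt = np.linalg.svd(F)
    el = Vt[-1]                   # F @ el ~ 0
    er = U[:, -1]                 # er @ F ~ 0
    return el / el[2], er / er[2]

# made-up example: E = skew(T) for T = (0.5, 0, 1), R = I
F = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, -0.5],
              [0.0, 0.5, 0.0]])
el, er = epipoles(F)              # both -> (0.5, 0, 1)
```

Normalizing by the third coordinate fails when the epipole is at infinity (w = 0), the case discussed on slide 33.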
61. Rectification
(figure: before rectification, the epipolar lines are at arbitrary locations and orientations; after rectification, they are collinear and parallel to the baseline)
62. Rectification (cont'd)
- Rectification is a transformation which makes pairs of conjugate epipolar lines become collinear and parallel to the horizontal axis (i.e., the baseline).
- Searching for corresponding points becomes much simpler for the case of rectified images.
63. Rectification (cont'd)
- Disparities between the images are in the x-direction only (i.e., no y disparity).
64. Rectification Example
before rectification
after rectification
65. Rectification (cont'd)
- Main steps (assuming knowledge of the extrinsic/intrinsic stereo parameters):
- (1) Rotate the left camera so that the epipolar lines become parallel to the horizontal axis (i.e., the epipole is mapped to infinity).
66. Rectification (cont'd)
- (2) Apply the same rotation to the right camera to recover the original geometry.
- (3) Align the right camera with the left camera using the rotation R.
- (4) Adjust the scale in both camera frames.
67. Rectification (cont'd)
- Consider step (1) only (i.e., the other steps are easy).
- Construct a coordinate system (e1, e2, e3) centered at Ol, aligned with the image plane coordinate system.
68. Rectification (cont'd)
- (1.1) e1 is a unit vector along the vector T (the baseline): e1 = T / ||T||
69. Rectification (cont'd)
- (1.2) e2 must be perpendicular to e1 (i.e., take the cross product of the optical axis z = (0, 0, 1)^T with e1 and normalize): e2 = (z × e1) / ||z × e1||
70. Rectification (cont'd)
- (1.3) Choose e3 as the cross product of e1 and e2: e3 = e1 × e2
71. Rectification (cont'd)
The rotation matrix that maps the left epipole to infinity is the transformation that aligns (e1, e2, e3) with (i, j, k), i.e., the matrix whose rows are e1^T, e2^T, and e3^T.
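Steps (1.1)-(1.3) can be sketched as follows; the baseline T below is a made-up value (assumed not parallel to the optical axis, or the cross product in (1.2) vanishes):

```python
import numpy as np

# Sketch of steps (1.1)-(1.3): build (e1, e2, e3) from the baseline T and
# stack them as rows, which aligns (e1, e2, e3) with (i, j, k).
def rectifying_rotation(T):
    e1 = T / np.linalg.norm(T)             # (1.1) unit vector along T
    z = np.array([0.0, 0.0, 1.0])          # optical axis
    e2 = np.cross(z, e1)                   # (1.2) perpendicular to e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)                  # (1.3) completes the frame
    return np.vstack([e1, e2, e3])

T = np.array([0.2, 0.05, 0.02])            # made-up baseline
R_rect = rectifying_rotation(T)
mapped = R_rect @ T   # -> (||T||, 0, 0): the epipole direction becomes horizontal
```

Since e2 and e3 are both orthogonal to T, the rotated baseline has only an x-component, which is exactly the "epipole at infinity" condition of step (1).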
72. The reconstruction problem
- (1) Both intrinsic and extrinsic parameters are known: we can solve the reconstruction problem unambiguously by triangulation.
- (2) Only the intrinsic parameters are known: we can solve the reconstruction problem only up to an unknown scaling factor.
- (3) Neither the extrinsic nor the intrinsic parameters are available: we can solve the reconstruction problem only up to an unknown, global projective transformation.
73. (1) Reconstruction by triangulation
- Assumptions and problem statement:
- (1) Both the extrinsic and intrinsic camera parameters are known.
- (2) Compute the location of the 3D points from their projections pl and pr.
74. What is the solution?
- The point P lies at the intersection of the two
rays from Ol through pl and from Or through pr
75. Practical issues
- The two rays will not intersect exactly in space because of errors in the location of the corresponding features.
- Find the point that is closest to both rays (i.e., the midpoint P' of the line segment that is perpendicular to both rays).
76. Parametric line representation
- The parametric representation of the line passing through P1 and P2 is given by

  P(t) = P1 + t (P2 - P1),   t ∈ [0, 1]

- The direction of the line is given by the vector P2 - P1.
77. Computing P'
- Parametric representation of the ray l through Ol and pl: a pl, a ∈ ℝ
- Parametric representation of the ray r through Or and pr: T + b R^T pr, b ∈ ℝ (since Pr = R(Pl - T) implies Pl = R^T Pr + T)
78. Computing P' (cont'd)
- Suppose that P1 and P2 are given by P1 = a0 pl and P2 = T + b0 R^T pr (the closest points on the two rays).
- The parametric equation of the line passing through P1 and P2 is given by P(c) = P1 + c (P2 - P1).
- The desired point P' (midpoint) is computed for c = 1/2.
79. Computing a0 and b0
- Consider the vector w orthogonal to both l and r: w = pl × R^T pr
- The line s going through P1 with direction w is given by s: P1 + c w, i.e., s: a0 pl + c (pl × R^T pr).
80. Computing a0 and b0 (cont'd)
- The lines s and r intersect at P2 (assume s passes through P2 for c = c0).
- We can obtain a0, b0, and c0 by solving the following system of equations:

  P1 + c0 w = P2,  i.e.,  a0 pl + c0 (pl × R^T pr) = T + b0 R^T pr
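The midpoint method of slides 77-80 can be sketched end to end; the geometry (R, T, and the 3D point) below is made up for the check:

```python
import numpy as np

# Sketch of the midpoint method: solve the 3x3 system
# a0*pl + c0*(pl x R^T pr) = T + b0*(R^T pr) for (a0, b0, c0), then take
# the midpoint of the segment joining the closest points on the two rays.
def midpoint_triangulation(pl, pr, R, T):
    d_l = pl                        # direction of ray l (through Ol)
    d_r = R.T @ pr                  # direction of ray r, in left coordinates
    w = np.cross(d_l, d_r)          # orthogonal to both rays
    a0, b0, c0 = np.linalg.solve(np.column_stack([d_l, -d_r, w]), T)
    P1 = a0 * d_l                   # closest point on l
    P2 = T + b0 * d_r               # closest point on r
    return 0.5 * (P1 + P2)          # midpoint P'

a = 0.1
R = np.array([[np.cos(a), 0, np.sin(a)],
              [0, 1, 0],
              [-np.sin(a), 0, np.cos(a)]])
T = np.array([1.0, 0.0, 0.0])
P = np.array([0.3, -0.2, 5.0])      # true 3D point (left frame)
pl = P / P[2]                       # left projection (unit focal length)
Pr = R @ (P - T)
pr = Pr / Pr[2]                     # right projection
P_hat = midpoint_triangulation(pl, pr, R, T)   # -> approximately P
```

In this noise-free check the rays intersect exactly (c0 = 0), so the midpoint coincides with the true point; with noisy projections, c0 becomes nonzero and P' is the compromise between the two rays.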
81. (2) Reconstruction up to a scale factor
- Only the intrinsic camera parameters are known.
- We cannot recover the true scale of the viewed scene since we do not know the baseline T (recall that Z = fT/d).
- Reconstruction is unique only up to an unknown scaling factor.
- This factor can be determined if we know the distance between two points in the scene.
82. (2) Reconstruction up to a scale factor (cont'd)
- Estimate E:
- Estimate E using the 8-point algorithm.
- The solution is unique up to an unknown scale factor.
- Recover T (from E).
83. (2) Reconstruction up to a scale factor (cont'd)
- To simplify the recovery of T, consider the matrix E^T E, where E has been normalized to account for the unknown scale.
- Recover R (from E):
- It can be shown that R can be computed in closed form from E and the recovered T.
84. (2) Reconstruction up to a scale factor (cont'd)
- The equations for recovering T are quadratic.
- The sign of T (and of E) is not fixed (from SVD).
- The ambiguity will be resolved during 3D reconstruction:
- Only one combination of signs gives consistent reconstructions (positive Zl, Zr).
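An equivalent route to the same four-fold ambiguity (a common alternative, not the slides' closed-form derivation) decomposes E by SVD into four (R, T) candidates; the geometry below is made up:

```python
import numpy as np

# Four-fold ambiguity of (R, T) from E, via the standard SVD decomposition
# (an alternative to the slides' quadratic equations). Since E = R S
# equals [RT]_x R, the translation recovered here is the direction of
# R @ T, up to the unknown scale.
def decompose_E(E):
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U = -U      # keep proper rotations
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]                          # translation direction, sign free
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]

def skew(t):
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])

a = 0.2                                  # made-up geometry for the check
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
T = np.array([1.0, 0.2, 0.1])
E = R @ skew(T)                          # essential matrix as in slide 47

t_true = R @ T
t_true = t_true / np.linalg.norm(t_true)
candidates = decompose_E(E)
# exactly one of the four candidates matches the true (R, direction of RT);
# as on this slide, it is identified by requiring positive depths Zl, Zr.
```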
85. (2) Reconstruction up to a scale factor (cont'd)
86. (2) Reconstruction up to a scale factor (cont'd)
87. (2) Reconstruction up to a scale factor (cont'd)
- Algorithm
- Note: the algorithm should not go through more than 4 iterations (one per sign combination).
88. (3) Reconstruction up to a projective transformation
- More complicated: see the book chapter.