Stereo Vision

About This Presentation

Title:

Stereo Vision

Description:

University of Texas at Arlington. Image Projection Review ... Also called stereopsis. Key idea: Each point in an image corresponds to a line in the 3D world. ... – PowerPoint PPT presentation

Slides: 37

Provided by: vassilis

Transcript and Presenter's Notes

Title: Stereo Vision

1

Lecture 25
Stereo Vision

CSE 4392/6367 Computer Vision, Spring 2009
Vassilis Athitsos
University of Texas at Arlington
2
Image Projection Review

Let A be a 3D point, and let R, T, and H be the rotation, translation, and intrinsic-parameter matrices.
$P(A) = R T A$ gives us the projection (in world coordinates) of A on an image plane of what focal length?

3
Image Projection Review

Let A be a 3D point, and let R, T, and H be the rotation, translation, and intrinsic-parameter matrices.
$P(A) = R T A$ gives us the projection (in world coordinates) of A on an image plane of focal length 1.
$H(P(A))$ gives us the pixel coordinates corresponding to $P(A)$. For simplicity, the focal length is encoded in H.

4
Image-to-World Projection

Let A be a 3D point, and let R, T, and H be the rotation, translation, and intrinsic-parameter matrices.
Given pixel location $W = (u, v)$, how can we get the world coordinates of the corresponding position on the image plane?

5
Image-to-World Projection

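Let A be a 3D point, and let R, T, and H be the rotation, translation, and intrinsic-parameter matrices.
Define G to be the matrix that maps $(x_0, y_0, 1)^T$ to $(u, v, 1)^T$.
$(x_0, y_0)$ are the normalized image coordinates corresponding to $(u, v)$.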

6
Image-to-World Projection

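Let A be a 3D point, and let R, T, and H be the rotation, translation, and intrinsic-parameter matrices.
Define G to be the matrix that maps $(x_0, y_0, 1)^T$ to $(u, v, 1)^T$.
$(x_0, y_0)$ are the normalized image coordinates corresponding to $(u, v)$.
$G^{-1}$ maps $(u, v)$ to normalized image coordinates.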

7
Image-to-World Projection

Define G to be the matrix that maps $(x_0, y_0, 1)^T$ to $(u, v, 1)^T$.
$(x_0, y_0)$ are the normalized image coordinates corresponding to $(u, v)$.
$G^{-1}$ maps $(u, v)$ to normalized image coordinates $(x_0, y_0)$.
In camera coordinates, what is the z coordinate of $G^{-1}(u, v)$?

8
Image-to-World Projection

Define G to be the matrix that maps $(x_0, y_0, 1)^T$ to $(u, v, 1)^T$.
$(x_0, y_0)$ are the normalized image coordinates corresponding to $(u, v)$.
$G^{-1}$ maps $(u, v)$ to normalized image coordinates $(x_0, y_0)$.
In camera coordinates, what is the z coordinate of $G^{-1}(u, v)$?
Remember, $G^{-1}$ maps pixels into an image plane corresponding to what focal length?

9
Image-to-World Projection

Define G to be the matrix that maps $(x_0, y_0, 1)^T$ to $(u, v, 1)^T$.
$(x_0, y_0)$ are the normalized image coordinates corresponding to $(u, v)$.
$G^{-1}$ maps $(u, v)$ to normalized image coordinates $(x_0, y_0)$.
In camera coordinates, what is the z coordinate of $G^{-1}(u, v)$?
Remember, $G^{-1}$ maps pixels into an image plane corresponding to focal length 1.

10
Image-to-World Projection

Define G to be the matrix that maps $(x_0, y_0, 1)^T$ to $(u, v, 1)^T$.
$(x_0, y_0)$ are the normalized image coordinates corresponding to $(u, v)$.
$G^{-1}$ maps $(u, v)$ to normalized image coordinates $(x_0, y_0)$.
In camera coordinates, what is the z coordinate of $G^{-1}(u, v)$? $z = -1$.
Remember, $G^{-1}$ maps pixels into an image plane corresponding to focal length $f = 1$.
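The pixel-to-image-plane step can be illustrated with a minimal Python sketch. The numeric entries of G below (scale factors and image center) are illustrative assumptions, not values from the lecture, and the helper name `pixel_to_normalized` is hypothetical.

```python
import numpy as np

# Hypothetical intrinsic matrix G (assumed values): scale factors on the
# diagonal, image center (u0, v0) in the last column.
G = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_normalized(G, u, v):
    """Map pixel (u, v) to normalized image coordinates (x0, y0) via G^-1."""
    w = np.linalg.solve(G, np.array([u, v, 1.0]))  # same as inv(G) @ (u, v, 1)
    return w[0] / w[2], w[1] / w[2]

x0, y0 = pixel_to_normalized(G, 420.0, 140.0)

# Position on the image plane in camera coordinates (focal length 1, z = -1).
image_plane_point = np.array([x0, y0, -1.0])
```

Applying G to `(x0, y0, 1)` should return the original pixel, which is a quick way to check that the inverse mapping was set up correctly.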

11
Image-to-World Projection

Now we have mapped pixel (u, v) to image plane
position (x0, y0, -1).
Next step: map image plane position to position in the world.
First, in camera coordinates.
What world position does image plane position
(x0, y0, -1) map to?

12
Image-to-World Projection

Now we have mapped pixel (u, v) to image plane
position (x0, y0, -1).
Next step: map image plane position to position in the world.
First, in camera coordinates.
What world position does image plane position
(x0, y0, -1) map to?
(x0, y0, -1) maps to a line. In camera
coordinates, the line goes through the origin.
How can we write that line in camera coordinates?

13
Image-to-World Projection

(x0, y0, -1) maps to a line. In camera
coordinates, the line goes through the origin.
How can we write that line in camera coordinates?
Suppose that the line goes through point (x, y,
z). What equations does that point have to
satisfy?
$x / x_0 = z / (-1) \Rightarrow z = x \cdot (-1) / x_0$.
$y / y_0 = x / x_0 \Rightarrow y = x \cdot y_0 / x_0$.
These equations define a line $(y, z) = f(x)$.
Borderline cases: $x_0 = 0$, $y_0 = 0$.

14
Image-to-World Projection

(x0, y0, -1) maps to a line. In camera
coordinates, the line goes through the origin.
Suppose that the line goes through point (x, y,
z). What equations does that point have to
satisfy?
$x / x_0 = z / (-1) \Rightarrow z = x \cdot (-1) / x_0$.
$y / y_0 = x / x_0 \Rightarrow y = x \cdot y_0 / x_0$.
These equations define a line $(y, z) = f(x)$.
Borderline cases: $x_0 = 0$, $y_0 = 0$.
Given a point on this line, how do we map it to
world coordinates?

15
Image-to-World Projection

(x0, y0, -1) maps to a line. Suppose that the
line goes through point (x, y, z). What equations
does that point have to satisfy?
$x / x_0 = z / (-1) \Rightarrow z = x \cdot (-1) / x_0$.
$y / y_0 = x / x_0 \Rightarrow y = x \cdot y_0 / x_0$.
These equations define a line $(y, z) = f(x)$.
Borderline cases: $x_0 = 0$, $y_0 = 0$.
Given a point on this line, how do we map it to
world coordinates?
World-to-camera mapping of A is done by $\mathrm{camera}(A) = R T A$.
Camera-to-world mapping is done by $T^{-1} R^{-1} \, \mathrm{camera}(A)$.
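A small Python sketch of this step follows. It writes the line through the origin in parametric form, s * (x0, y0, -1), which satisfies the equations above and also covers the borderline cases x0 = 0 and y0 = 0. The particular R and T (a rotation about the z axis and an arbitrary translation, both as 4x4 homogeneous matrices) are assumed example values, and the helper names are hypothetical.

```python
import numpy as np

def rotation_z(theta):
    """4x4 homogeneous rotation about the z axis (illustrative choice of R)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1.0]])

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

R = rotation_z(np.pi / 6)          # assumed example values
T = translation(-1.0, 2.0, 0.5)    # assumed example values

def camera_to_world(p_cam):
    """Apply T^-1 R^-1 to a 3D point given in camera coordinates."""
    p = np.append(p_cam, 1.0)
    return (np.linalg.inv(T) @ np.linalg.inv(R) @ p)[:3]

# The line through the origin and image plane position (x0, y0, -1):
# every point on it is s * (x0, y0, -1) for some scalar s.
x0, y0 = 0.2, -0.2
ray_points_cam = [s * np.array([x0, y0, -1.0]) for s in (0.0, 1.0, 5.0)]
ray_points_world = [camera_to_world(p) for p in ray_points_cam]
```

The point with s = 0 is the pinhole itself, so its world position depends only on T; the other points trace the same line in world coordinates.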

16
Stereo Vision

Also called stereopsis.
Key idea:
Each point in an image corresponds to a line in the 3D world.
To compute that line, we need to know the camera matrix.
If the same point is visible from two images, the two corresponding lines intersect in a single 3D point.
Challenges:
Identify correspondences between images from the two cameras.
Compute the camera matrix.

17
A Simple Stereo Setup

Simple arrangement:
Both cameras have the same intrinsic parameters.
Image planes belong to the same world plane.
Then, correspondences appear on the same horizontal line.
The displacement from one image to the other is called disparity.
Disparity is inversely proportional to depth.
External calibration parameters are not needed.

18
A Simple Stereo Setup

Assume that:
Both cameras have pinholes at $z = 0$, $y = 0$.
Both image planes correspond to $f = 1$, $z = -1$.
Both cameras have the same intrinsic parameters $f, S_x, S_y, u_0, v_0$.
Both camera coordinate systems have the same x, y, z axes.
Cameras only differ at the x coordinate of the pinhole.
Camera 1 is at $(x_1, 0, 0)$, camera 2 is at $(x_2, 0, 0)$.
Then:
Suppose a point A is at $(x_A, y_A, z_A)$.
On camera 1, A maps to normalized image coordinates:

19
A Simple Stereo Setup

Assume that:
Both cameras have pinholes at $z = 0$, $y = 0$.
Both image planes correspond to $f = 1$, $z = -1$.
Both cameras have the same intrinsic parameters $f, S_x, S_y, u_0, v_0$.
Both camera coordinate systems have the same x, y, z axes.
Cameras only differ at the x coordinate of the pinhole.
Camera 1 is at $(x_1, 0, 0)$, camera 2 is at $(x_2, 0, 0)$.
Then:
Suppose a point A is at $(x_A, y_A, z_A)$.
On camera 1, A maps to normalized image coordinates:
$(x_{1A}, y_{1A}) = ((x_A - x_1) / z_A, \; y_A / z_A)$
On camera 2, A maps to normalized image coordinates:

20
A Simple Stereo Setup

Assume that:
Both cameras have pinholes at $z = 0$, $y = 0$.
Both image planes correspond to $f = 1$.
Both cameras have the same intrinsic parameters $f, S_x, S_y, u_0, v_0$.
Both camera coordinate systems have the same x, y, z axes.
Cameras only differ at the x coordinate of the pinhole.
Camera 1 is at $(x_1, 0, 0)$, camera 2 is at $(x_2, 0, 0)$.
Then:
Suppose a point A is at $(x_A, y_A, z_A)$.
On camera 1, A maps to normalized image coordinates:
$(x_{1A}, y_{1A}) = ((x_A - x_1) / z_A, \; y_A / z_A)$
On camera 2, A maps to normalized image coordinates:
$(x_{2A}, y_{2A}) = ((x_A - x_2) / z_A, \; y_A / z_A)$
$x_{1A} - x_{2A} = ((x_A - x_1) - (x_A - x_2)) / z_A = (x_2 - x_1) / z_A = c / z_A$.
$(x_{1A} - x_{2A})$ is called disparity. Disparity is inversely proportional to $z_A$.

21
A Simple Stereo Setup

Suppose a point A is at $(x_A, y_A, z_A)$.
On camera 1, A maps to normalized image coordinates:
$(x_{1A}, y_{1A}) = ((x_A - x_1) / z_A, \; y_A / z_A)$
On camera 2, A maps to normalized image coordinates:
$(x_{2A}, y_{2A}) = ((x_A - x_2) / z_A, \; y_A / z_A)$
$x_{1A} - x_{2A} = ((x_A - x_1) - (x_A - x_2)) / z_A = (x_2 - x_1) / z_A = c / z_A$.
$(x_{1A} - x_{2A})$ is called disparity. Disparity is inversely proportional to $z_A$.
If we know (x1A, y1A) and (x2A, y2A) (i.e., we
know the locations of A in each image), what else
do we need to know in order to figure out zA?

22
A Simple Stereo Setup

Suppose a point A is at $(x_A, y_A, z_A)$.
On camera 1, A maps to normalized image coordinates:
$(x_{1A}, y_{1A}) = ((x_A - x_1) / z_A, \; y_A / z_A)$
On camera 2, A maps to normalized image coordinates:
$(x_{2A}, y_{2A}) = ((x_A - x_2) / z_A, \; y_A / z_A)$
$x_{1A} - x_{2A} = ((x_A - x_1) - (x_A - x_2)) / z_A = (x_2 - x_1) / z_A = c / z_A$.
$(x_{1A} - x_{2A})$ is called disparity. Disparity is inversely proportional to $z_A$.
If we know $(x_{1A}, y_{1A})$ and $(x_{2A}, y_{2A})$ (i.e., we know the locations of A in each image), what else do we need to know in order to figure out $z_A$?
We need to know $c = x_2 - x_1$.
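Rearranging the disparity relation gives zA = c / (x1A - x2A). The following Python sketch illustrates this with assumed example values for the camera positions and the point A; the function name `depth_from_disparity` is hypothetical.

```python
def depth_from_disparity(x1A, x2A, c):
    """Return zA given normalized x coordinates in both images and c = x2 - x1."""
    disparity = x1A - x2A
    if disparity == 0:
        raise ValueError("zero disparity: depth cannot be recovered")
    return c / disparity

# Assumed example: cameras at x1 = 0 and x2 = 0.5, point A = (1, 2, 4).
x1, x2 = 0.0, 0.5
xA, yA, zA = 1.0, 2.0, 4.0

# Normalized image coordinates, following the formulas on the slide.
x1A = (xA - x1) / zA
x2A = (xA - x2) / zA

z_est = depth_from_disparity(x1A, x2A, x2 - x1)
```

With these values the disparity is 0.125 and the recovered depth equals the true zA.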

23
A More General Case

Suppose that we start with the simple system:
Both cameras have pinholes at $z = 0$, $y = 0$.
Both image planes correspond to $f = 1$, $z = -1$.
Both cameras have the same intrinsic parameters $f, S_x, S_y, u_0, v_0$.
Both camera coordinate systems have the same x, y, z axes.
Cameras only differ at the x coordinate of the pinhole.
Camera 1 is at $(x_1, 0, 0)$, camera 2 is at $(x_2, 0, 0)$.
Then we rotate by R and translate by T the whole system.
To find point A, we just need to:
Go back to simple coordinates, by translating back and rotating back. This is done via which matrix?

24
A More General Case

Suppose that we start with the simple system:
Both cameras have pinholes at $z = 0$, $y = 0$.
Both image planes correspond to $f = 1$, $z = -1$.
Both cameras have the same intrinsic parameters $f, S_x, S_y, u_0, v_0$.
Both camera coordinate systems have the same x, y, z axes.
Cameras only differ at the x coordinate of the pinhole.
Camera 1 is at $(x_1, 0, 0)$, camera 2 is at $(x_2, 0, 0)$.
Then we rotate by R and translate by T the whole system.
To find point A, we just need to:
Go back to simple coordinates, by translating back and rotating back. This is done via matrix $R^{-1} T^{-1}$.
Find simple(A) in the simplified coordinate system.
Map A to original world coordinates. $A = {?}$

25
A More General Case

Suppose that we start with the simple system:
Both cameras have pinholes at $z = 0$, $y = 0$.
Both image planes correspond to $f = 1$, $z = -1$.
Both cameras have the same intrinsic parameters $f, S_x, S_y, u_0, v_0$.
Both camera coordinate systems have the same x, y, z axes.
Cameras only differ at the x coordinate of the pinhole.
Camera 1 is at $(x_1, 0, 0)$, camera 2 is at $(x_2, 0, 0)$.
Then we rotate by R and translate by T the whole system.
To find point A, we just need to:
Go back to simple coordinates, by translating back and rotating back. This is done via matrix $R^{-1} T^{-1}$.
Find simple(A) in the simplified coordinate system.
simple(A) is just shorthand for the position of A in the simplified system.
Map A to original world coordinates. $A = T R \, \mathrm{simple}(A)$.
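The steps above can be sketched in Python as follows. The rotation R (about the z axis) and translation T are assumed example values written as 4x4 homogeneous matrices, and `simple_point` is a hypothetical helper that inverts the normalized-coordinate formulas of the simple setup.

```python
import numpy as np

def simple_point(x1A, y1A, x2A, x1, x2):
    """Recover simple(A), in homogeneous form, from normalized coordinates."""
    zA = (x2 - x1) / (x1A - x2A)   # depth from disparity, c = x2 - x1
    xA = x1 + x1A * zA             # invert x1A = (xA - x1) / zA
    yA = y1A * zA                  # invert y1A = yA / zA
    return np.array([xA, yA, zA, 1.0])

# Assumed rigid motion applied to the whole simple system.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0, 0],
              [np.sin(theta),  np.cos(theta), 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1.0]])
T = np.eye(4)
T[:3, 3] = [3.0, -1.0, 2.0]

# Assumed observations: x1A = 0.25, y1A = 0.5, x2A = 0.125, x1 = 0, x2 = 0.5.
simple_A = simple_point(0.25, 0.5, 0.125, 0.0, 0.5)

world_A = T @ R @ simple_A                             # A = T R simple(A)
back = np.linalg.inv(R) @ np.linalg.inv(T) @ world_A   # R^-1 T^-1 A
```

Here `back` should equal `simple_A`, confirming that R^-1 T^-1 undoes the motion T R.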

26
The General Case

Given two calibrated cameras, and a corresponding
pair of locations, we compute two lines.
In the mathematically ideal case, the lines
intersect.
By finding the intersection, we compute where the
3D location is.

27
The General Case

Given two calibrated cameras, and a corresponding
pair of locations, we compute two lines.
In the mathematically ideal case, the lines
intersect.
In practice, they don't intersect because of rounding/measurement errors (pixels are discretized).
The best estimate for the 3D point is obtained by:
Finding the shortest line segment that connects the two lines.
Returning the midpoint of that segment.

28
Finding Connecting Segment

$((P_1 + a_1 u_1) - (Q_1 + a_2 u_2)) \cdot u_1 = 0$
$((P_1 + a_1 u_1) - (Q_1 + a_2 u_2)) \cdot u_2 = 0$
Here $\cdot$ stands for the dot product.
$P_1$: point on first line.
$Q_1$: point on second line.
$u_1$: unit vector parallel to first line.
$u_2$: unit vector parallel to second line.
$P_1 + a_1 u_1$: intersection of segment with first line.
$Q_1 + a_2 u_2$: intersection of segment with second line.
The only unknowns are $a_1$ and $a_2$.
We have two equations and two unknowns, so we can solve.
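As a minimal sketch, the two equations can be expanded into a 2x2 linear system in a1 and a2 and solved directly. The function name `closest_midpoint` and the example lines are assumptions made for illustration; the code does not handle parallel lines, for which the system is singular.

```python
import numpy as np

def closest_midpoint(P1, u1, Q1, u2):
    """Midpoint of the shortest segment connecting two 3D lines.

    Line 1: P1 + a1*u1, line 2: Q1 + a2*u2.
    """
    P1, u1, Q1, u2 = (np.asarray(v, dtype=float) for v in (P1, u1, Q1, u2))
    d = P1 - Q1
    # Expanding the two dot-product equations gives a 2x2 linear system:
    #   a1 (u1.u1) - a2 (u2.u1) = -d.u1
    #   a1 (u1.u2) - a2 (u2.u2) = -d.u2
    M = np.array([[u1 @ u1, -(u2 @ u1)],
                  [u1 @ u2, -(u2 @ u2)]])
    rhs = np.array([-(d @ u1), -(d @ u2)])
    a1, a2 = np.linalg.solve(M, rhs)   # raises LinAlgError for parallel lines
    end1 = P1 + a1 * u1                # segment endpoint on the first line
    end2 = Q1 + a2 * u2                # segment endpoint on the second line
    return (end1 + end2) / 2.0

# Assumed example: the x axis and a line parallel to the y axis at height z = 2.
mid = closest_midpoint([0, 0, 0], [1, 0, 0], [3, -5, 2], [0, 1, 0])
```

In this example the closest points are (3, 0, 0) and (3, 0, 2), so the returned midpoint is (3, 0, 1).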

29
Essential Matrix

We define a stereo pair given two cameras (in an arbitrary configuration).
The essential matrix E of this stereo pair is a matrix that has the following property:
If $W$ and $W'$ are homogeneous normalized image coordinates in image 1 and image 2, respectively, and these locations correspond to the same 3D point, then $W^T E W' = 0$.

30
Estimating the Essential Matrix

The essential matrix E of this stereo pair is a matrix that has the following property:
If $W$ and $W'$ are homogeneous normalized image coordinates in image 1 and image 2, respectively, and these locations correspond to the same 3D point, then $W^T E W' = 0$.
E has size 3×3. To estimate E, we need to estimate 9 unknowns.
Observations:
A trivial and not useful exact solution is $E = 0$.
If E is a solution, then cE is also a solution, for any real number c. So, strictly speaking, we can only solve up to scale, and we only need to estimate 8 unknowns.
To avoid the $E = 0$ solution, we impose an additional constraint:
sum(sum(E .* E)) = 1.

31
Using a Single Correspondence

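Suppose $(u_1, v_1, w_1)$ in image plane 1 matches $(u_2, v_2, w_2)$ in image plane 2.
Remember, $(u_1, v_1, w_1)$ and $(u_2, v_2, w_2)$ are given in homogeneous normalized image coordinates.
We know that $(u_1, v_1, w_1) \, E \, (u_2, v_2, w_2)^T = 0$.
Let $E = \begin{bmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \end{bmatrix}$.
We obtain:
$[u_1, v_1, w_1] \, E \, [u_2, v_2, w_2]^T = 0 \Rightarrow$
$[u_1 e_{11} + v_1 e_{21} + w_1 e_{31}, \; u_1 e_{12} + v_1 e_{22} + w_1 e_{32}, \; u_1 e_{13} + v_1 e_{23} + w_1 e_{33}] \, [u_2, v_2, w_2]^T = 0 \Rightarrow$
$u_1 u_2 e_{11} + v_1 u_2 e_{21} + w_1 u_2 e_{31} + u_1 v_2 e_{12} + v_1 v_2 e_{22} + w_1 v_2 e_{32} + u_1 w_2 e_{13} + v_1 w_2 e_{23} + w_1 w_2 e_{33} = 0 \Rightarrow$
$[u_1 u_2, v_1 u_2, w_1 u_2, u_1 v_2, v_1 v_2, w_1 v_2, u_1 w_2, v_1 w_2, w_1 w_2] \, [e_{11}, e_{21}, e_{31}, e_{12}, e_{22}, e_{32}, e_{13}, e_{23}, e_{33}]^T = 0$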

32
Using Multiple Correspondences

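From the previous slide: if $(u_1, v_1, w_1)$ in image plane 1 matches $(u_2, v_2, w_2)$ in image plane 2, then
$[u_1 u_2, v_1 u_2, w_1 u_2, u_1 v_2, v_1 v_2, w_1 v_2, u_1 w_2, v_1 w_2, w_1 w_2] \, [e_{11}, e_{21}, e_{31}, e_{12}, e_{22}, e_{32}, e_{13}, e_{23}, e_{33}]^T = 0$
If we have J correspondences:
$(u_{1,j}, v_{1,j}, w_{1,j})$ in image plane 1 matches $(u_{2,j}, v_{2,j}, w_{2,j})$ in image plane 2.
Define:
$$A = \begin{bmatrix}
u_{1,1}u_{2,1} & v_{1,1}u_{2,1} & w_{1,1}u_{2,1} & u_{1,1}v_{2,1} & v_{1,1}v_{2,1} & w_{1,1}v_{2,1} & u_{1,1}w_{2,1} & v_{1,1}w_{2,1} & w_{1,1}w_{2,1} \\
u_{1,2}u_{2,2} & v_{1,2}u_{2,2} & w_{1,2}u_{2,2} & u_{1,2}v_{2,2} & v_{1,2}v_{2,2} & w_{1,2}v_{2,2} & u_{1,2}w_{2,2} & v_{1,2}w_{2,2} & w_{1,2}w_{2,2} \\
u_{1,3}u_{2,3} & v_{1,3}u_{2,3} & w_{1,3}u_{2,3} & u_{1,3}v_{2,3} & v_{1,3}v_{2,3} & w_{1,3}v_{2,3} & u_{1,3}w_{2,3} & v_{1,3}w_{2,3} & w_{1,3}w_{2,3} \\
\vdots & & & & & & & & \vdots \\
u_{1,J}u_{2,J} & v_{1,J}u_{2,J} & w_{1,J}u_{2,J} & u_{1,J}v_{2,J} & v_{1,J}v_{2,J} & w_{1,J}v_{2,J} & u_{1,J}w_{2,J} & v_{1,J}w_{2,J} & w_{1,J}w_{2,J}
\end{bmatrix}$$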

33
Using Multiple Correspondences

Using A from the previous slide, the following holds:
$$A \, [e_{11}, e_{21}, e_{31}, e_{12}, e_{22}, e_{32}, e_{13}, e_{23}, e_{33}]^T = 0$$
A: J×9 matrix.
Matrix of unknowns $e_{ij}$: size 9×1.
Result: a zero matrix of size J×1.

This is a system of linear homogeneous equations, which can be solved using SVD.
In Matlab:
[u, d, v] = svd(A, 0);
x = v(:, end);
After the above two lines, x is the 9×1 matrix of unknowns.
This way, using multiple correspondences, we have computed the essential matrix.
Strictly speaking, we have computed one out of many essential matrices.
Solution up to scale.
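For readers working in Python, a rough NumPy equivalent of the Matlab lines is sketched below, together with a synthetic check. The camera motion (`Rot`, `t`), the random 3D points, and the function name `estimate_essential` are assumptions made for this illustration. The sketch follows the convention used above: image 1 coordinates multiply E on the left, and the unknowns are ordered column by column.

```python
import numpy as np

def estimate_essential(W1, W2):
    """Estimate E (up to scale) from J >= 8 correspondences.

    W1, W2: arrays of shape (J, 3) holding homogeneous normalized image
    coordinates in image 1 and image 2, with W1[j] @ E @ W2[j] = 0.
    """
    # Row j is [u1u2, v1u2, w1u2, u1v2, v1v2, w1v2, u1w2, v1w2, w1w2].
    A = np.array([np.kron(w2, w1) for w1, w2 in zip(W1, W2)])
    _, _, Vt = np.linalg.svd(A)
    x = Vt[-1]                       # unit vector minimizing ||A x||
    # Unknowns are ordered e11, e21, e31, e12, ... (column by column).
    return x.reshape(3, 3, order="F")

# Synthetic check under an assumed camera motion (not from the lecture).
rng = np.random.default_rng(0)
X1 = rng.uniform(-1, 1, size=(20, 3)) + [0, 0, 5]   # points in camera-1 frame
theta = 0.1
Rot = np.array([[np.cos(theta), 0, np.sin(theta)],
                [0, 1, 0],
                [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.5, 0.1, 0.0])
X2 = X1 @ Rot.T + t                                  # same points, camera-2 frame

W1 = X1 / X1[:, 2:3]                                 # normalized coords, image 1
W2 = X2 / X2[:, 2:3]                                 # normalized coords, image 2
E = estimate_essential(W1, W2)

# Residual W1[j]^T E W2[j] for every correspondence; should be near zero.
residuals = np.abs(np.einsum("ji,ik,jk->j", W1, E, W2))
```

Because the returned singular vector has unit length, the constraint sum(sum(E .* E)) = 1 is satisfied automatically.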

34
Epipoles and Epipolar Lines

In each image of a stereo pair, the epipole is
the pixel location where the pinhole of the other
camera is mapped.
Given a pixel in an image, where can the
corresponding pixel be in the other image?
The essential matrix defines a line.
All such lines are called epipolar lines, because
they always go through the epipole.
Why?

35
Epipoles and Epipolar Lines

In each image of a stereo pair, the epipole is
the pixel location where the pinhole of the other
camera is mapped.
Given a pixel in an image, where can the
corresponding pixel be in the other image?
The essential matrix defines a line.
All such lines are called epipolar lines, because
they always go through the epipole.
Why?
Because for any pixel in image 1, the pinhole of
camera 1 is a possible 3D location.

36
Epipoles and Epipolar Lines

In each image of a stereo pair, the epipole is
the pixel location where the pinhole of the other
camera is mapped.
Given a pixel in an image, where can the
corresponding pixel be in the other image?
The essential matrix defines a line.
All such lines are called epipolar lines, because
they always go through the epipole.
Given a pixel in one image, the epipolar line in
the other image can be computed using the
essential matrix.
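A short Python sketch of that computation follows. It assumes the convention in which image 1 coordinates multiply E on the left, so that for a fixed location W1 in image 1 the matching locations W2 in image 2 satisfy (E^T W1) . W2 = 0. The example E is one valid essential matrix (up to scale) for the simple stereo setup; both it and the function name `epipolar_line_in_image2` are assumptions made for illustration.

```python
import numpy as np

def epipolar_line_in_image2(E, W1):
    """Coefficients (a, b, c) of the line a*x + b*y + c = 0 in image 2.

    Follows from W1^T E W2 = 0, i.e. (E^T W1) . W2 = 0.
    """
    return E.T @ np.asarray(W1, dtype=float)

# Assumed example: the simple stereo setup, where cameras differ only in x.
# One valid essential matrix for that setup (up to scale) is:
E = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

# Location in image 1, in homogeneous normalized image coordinates.
line = epipolar_line_in_image2(E, [0.25, 0.5, 1.0])
```

The resulting line is y = 0.5, a horizontal line at the same height as the location in image 1, which agrees with the simple setup where correspondences appear on the same horizontal line.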