Title: Correspondence and Pose Consistency
1Chapter 20
- Correspondence and Pose Consistency
- (Unknown author)
- Report by B.J. Guillot
- April 16, 2001
- Part I
- April 18, 2001
- Part II
2The Correspondence Problem
- Which image feature corresponds to which feature
on which object?
http://www.ai.mit.edu/projects/medical-vision/surgery/Images/girl_overlay_large.gif
3Modelbase
- A collection of geometric models of the objects
that should be recognized
Pose
- Position and orientation for the object
Backprojection
- Using a hypothesized pose to generate a
rendering of the object
4Extrinsic camera parameters
- Depend on the position and orientation of the camera
- 6 parameters (3 for rotation, 3 for translation)
Intrinsic camera parameters
- 5 parameters
- X-coord center of projection (in pixels), u0
- Y-coord center of projection (in pixels), v0
- Focal length (in pixels), f
- Aspect ratio, a
- Angle between the image axes, c
5Camera calibration (notes)
- Center of projection (COP) is usually at or near
the coordinate center of the image
- Aspect ratio is generally close to 1.0
- Angle between the image axes is generally 90 degrees
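The parameter lists above can be tied together with a small sketch that builds an intrinsic matrix from (u0, v0, f, a, c) and projects one 3D point. The matrix layout is an assumed convention, not something the slides specify: horizontal scale f, vertical scale a * f, and c taken as the angle between the two image axes. The 640x480 image size, the focal length, and the function names are hypothetical.

```python
import numpy as np


def intrinsic_matrix(u0, v0, f, a=1.0, c_deg=90.0):
    """3x3 intrinsic matrix from the five parameters listed above.

    Assumed convention (not from the slides): horizontal scale f,
    vertical scale a * f, and c_deg is the angle between the image axes.
    """
    c = np.radians(c_deg)
    return np.array([
        [f, -f * np.cos(c) / np.sin(c), u0],
        [0.0, a * f / np.sin(c), v0],
        [0.0, 0.0, 1.0],
    ])


def project(K, R, t, P):
    """Map a 3D point P to pixels: extrinsics (R, t), then intrinsics K."""
    x = K @ (R @ np.asarray(P, dtype=float) + t)
    return x[:2] / x[2]


# Typical values from the notes above: center of projection at the center
# of a hypothetical 640x480 image, aspect ratio 1.0, axes at 90 degrees.
K = intrinsic_matrix(u0=320.0, v0=240.0, f=500.0)
R = np.eye(3)                    # rotation (3 parameters); identity here
t = np.array([0.0, 0.0, 10.0])   # translation (3 parameters)
pixel = project(K, R, t, [1.0, 2.0, 0.0])
print(pixel)
```

With the default aspect ratio and a 90 degree angle, the matrix reduces to the familiar form with f on the diagonal and (u0, v0) in the last column.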
6Base set
- Correspondence between a small number of object
features and a small number of image features
- Once established, the program can determine camera
constraints
- Camera constraints can then be used to predict
other image features
- Alignment algorithms generate the base set, and
are also known as pose consistency methods
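The hypothesize-and-verify loop described above can be sketched as follows. This is a minimal illustration, not the method from the chapter: it assumes an affine camera (covered later in these slides) because fitting it is linear, it tries every assignment of image points to a fixed four-point model base set, and it scores each camera hypothesis by how many backprojected model points land near some image point. The function names, the tolerance, and the camera `M_true` used to make test data are all hypothetical.

```python
import itertools

import numpy as np


def fit_affine_camera(P, p):
    """Least-squares 2x4 affine camera M such that p_i = M @ [P_i, 1]."""
    Ph = np.hstack([P, np.ones((len(P), 1))])
    X, *_ = np.linalg.lstsq(Ph, p, rcond=None)
    return X.T


def backproject(M, P):
    """Render model points P (n x 3) into the image with camera M."""
    Ph = np.hstack([P, np.ones((len(P), 1))])
    return Ph @ M.T


def align(model, image, base=(0, 1, 3, 4), tol=1.0):
    """Try each base-set correspondence; keep the best-supported camera."""
    best_score, best_M = -1, None
    for combo in itertools.permutations(range(len(image)), len(base)):
        M = fit_affine_camera(model[list(base)], image[list(combo)])
        pred = backproject(M, model)
        # Distance from each predicted point to its nearest image point.
        d = np.linalg.norm(pred[:, None, :] - image[None, :, :], axis=2)
        score = int(np.sum(d.min(axis=1) < tol))
        if score > best_score:
            best_score, best_M = score, M
    return best_score, best_M


# Model: corners of a 4 x 9 x 1 box. Image: the same corners seen through
# a made-up affine camera, then shuffled so the correspondence is unknown.
model = np.array([[0, 0, 0], [4, 0, 0], [4, 9, 0], [0, 9, 0],
                  [0, 0, -1], [4, 0, -1], [4, 9, -1], [0, 9, -1]], float)
M_true = np.array([[11.0, 0.5, 30.0, 320.0],
                   [5.0, 29.0, -2.0, 95.0]])
image = np.random.default_rng(0).permutation(backproject(M_true, model))

score, M_best = align(model, image)
print(score, "of", len(model), "model points explained")
```

Because the box is symmetric, several different assignments explain all eight points equally well, so the sketch reports the support count rather than a unique labeling.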
7Frame group
- A group of features that can be used to yield a
camera hypothesis
- There can be both object and image frame groups
- Most objects have many frame groups
- Popular frame groups include:
- Three points
- Three directions (trihedral vertex) and a point
- Dihedral vertex (two directions emanating from a
shared origin) and a point
- Directions obtained by using (portions of) line
segments
- Clutter: image frame groups that come from noise
or from objects that are not of interest and are
not in the modelbase
8Frame group using directions
- Thresholded image using ARToolkit library
9- "One curious, and perhaps quite unimportant,
feature of the block had led to endless argument.
The monolith was 11 1/4 feet high, and 1 1/4 by 5
feet in cross-section. When its dimensions were
checked with great care, they were found to be in
the exact ratio 1 to 4 to 9 - the squares of the
first three integers." --- 2001: A Space Odyssey
10Synthetic example
- Obj. frame group    Image frame group
- P0 = (0, 0, 0)      p0 = (320.0, 95.2)
- P1 = (4, 0, 0)      p1 = (364.5, 118.1)
- P2 = (4, 9, 0)      p2 = (364.5, 337.5)
- P3 = (0, 9, 0)      p3 = (320.0, 355.9)
- P4 = (0, 0, -1)     p4 = (284.3, 97.6)
- P5 = (4, 0, -1)     p5 = (333.7, 119.8)
- P6 = (4, 9, -1)     p6 = (333.7, 336.2)
- P7 = (0, 9, -1)     p7 = (284.3, 353.9)
11- Example image frame group using real data
- p0 ≈ (551, 284)     p4 ≈ (532, 282)
- p1 ≈ (572, 265)     p5 not visible
- p2 ≈ (565, 44)      p6 not visible
- p3 ≈ (543, 15)      p7 ≈ (522, 18)
12Camera models
- Calibrating perspective cameras is complicated
- Two simplifications (affine, projective cameras)
- Affine camera: model a perspective view as an
affine transformation followed by an orthographic
projection. (Recall: affine transformations can
effect rotation, scaling, shear, and translation.)
- Projective camera: model a perspective view as a
projective transformation followed by a perspective
projection. (Recall: projective transformations
map lines to lines but do not necessarily
preserve parallelism.)
13Perspective transform is a subset of projective
transform
- A perspective transformation with center O,
mapping the plane P to the plane Q. The
transformation is not defined on the line L,
where P intersects the plane parallel to Q and
going through O.
http://www.geom.umn.edu/docs/reference/CRC-formulas/node16.html
14Affine cameras
- A is a general affine transformation
- Π is an orthographic camera transformation
- Pi are the 3D model points (x,y,z)
- pi are the 2D image points (x,y)
- First two rows of matrix A can be determined
using 4 corresponding points (8 equations, 8
unknowns, linear)
15Affine cameras, continued
- Pro: the mathematics is extremely simple (especially
if you choose your 3D model points to have lots
of zeros)
- Pro: needs only 4 corresponding points
- Con: unacceptable results, even on the
perfect synthetic data
- Details: used i = 0, 1, 3, 4 on the synthetic data
- i    pi predicted      pi actual
- 2    (364.5, 378.8)    (364.5, 337.5)
- 5    (328.8, 120.5)    (333.7, 119.8)
- 7    (284.3, 358.3)    (284.3, 353.9)
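The predictions in the table above can be reproduced with a few lines of NumPy. This is a sketch under one assumption: the affine camera is written as a single 2x4 matrix M acting on homogeneous model points, p_i = M [P_i, 1], which gives 8 unknowns and 8 linear equations from the four correspondences i = 0, 1, 3, 4. The function name is hypothetical; the point coordinates are the ones from the synthetic example.

```python
import numpy as np

# Object and image frame groups from the synthetic example.
P = np.array([[0, 0, 0], [4, 0, 0], [4, 9, 0], [0, 9, 0],
              [0, 0, -1], [4, 0, -1], [4, 9, -1], [0, 9, -1]], float)
p = np.array([[320.0, 95.2], [364.5, 118.1], [364.5, 337.5], [320.0, 355.9],
              [284.3, 97.6], [333.7, 119.8], [333.7, 336.2], [284.3, 353.9]])


def fit_affine_camera(P4, p4):
    """Solve the 8 linear equations p_i = M @ [P_i, 1] for the 2x4 matrix M."""
    Ph = np.hstack([P4, np.ones((4, 1))])   # 4x4, invertible if not coplanar
    return np.linalg.solve(Ph, p4).T


base = [0, 1, 3, 4]
M = fit_affine_camera(P[base], p[base])

for i in (2, 5, 7):
    predicted = M @ np.append(P[i], 1.0)
    error = np.linalg.norm(predicted - p[i])
    print(i, np.round(predicted, 1), p[i], f"error {error:.1f} px")
```

The largest error is for point 2, roughly 41 pixels in the vertical direction, which is the kind of result the "Con" bullet refers to.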
16Projective cameras
- A is a general projective transformation
- Π is a perspective camera transformation
- Pi are the 3D model points (x,y,z)
- pi are the 2D image points (x,y)
- First three rows of matrix A can be determined
using 5 corresponding points (10 eqns, 10
unknowns, non-linear)
17Projective cameras, continued
- Pro: one would hope the added complexity yields
better results (B.J. did not verify)
- Con: non-linear equations
- Con: needs 5 corresponding points rather than the
4 of the affine camera, or the 4 of the POSIT
algorithm (discussed in the first few weeks of
our class)
18Part 2
19Invariant
- Definition: constant, unchanging. Unchanged by
specified mathematical or physical operations or
transformations.
- Merriam-Webster dictionary, http://www.m-w.com/home.htm
20Affine invariants for coplanar points
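- Pick three non-collinear coplanar points (p0, p1, p2
and P0, P1, P2) to specify a coordinate frame
- Every other coplanar point can be written in that frame:
pi = p0 + μi1 (p1 - p0) + μi2 (p2 - p0)
Pi = P0 + μi1 (P1 - P0) + μi2 (P2 - P0)
- Where pi are image points, Pi are model points
- μi1, μi2 describe the geometry of the object and
are independent of the view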
21Affine invariant example
- Using our earlier synthetic example, we will
choose the coplanar points i = 0, 3, 4 to represent
a coordinate frame
- Additionally, consider a 4th coplanar point (i = 7)
and a 5th point (i = 1)
- P0 = (0, 0, 0)      p0 = (320.0, 95.2)
- P3 = (0, 9, 0)      p3 = (320.0, 355.9)
- P4 = (0, 0, -1)     p4 = (284.3, 97.6)
- P7 = (0, 9, -1)     p7 = (284.3, 353.9)
- p1 = (364.5, 118.1)
22Affine invariant example, continued
- For p7, calculate μ1, μ2
- (Remember: we don't know yet that p7 is p7)
- Same technique for p1 yields μ1 = -10.2, μ2 = 1117.5
23Affine invariant example, continued
- Check whether p7 or p1 might be the real p7
- Use the P0, P3, P4 model coordinate frame to
compute μ1, μ2 for P7:
- x:  0 = 0 + μ1(0 - 0) + μ2(0 - 0)
- y:  9 = 0 + μ1(9 - 0) + μ2(0 - 0), so μ1 = 1
- z: -1 = 0 + μ1(0 - 0) + μ2(-1 - 0), so μ2 = 1
- It is clear the first choice (μ1 = 0.98) better
matches the real P7 (μ1 = 1), so we can say
p7 = (284.3, 353.9)
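The comparison in this example can be sketched in NumPy. The helper below solves the frame equation q = origin + μ1 (e1 - origin) + μ2 (e2 - origin) for (μ1, μ2), once in the model frame (P0, P3, P4) and once per candidate in the image frame (p0, p3, p4), then picks the candidate whose coordinates are closest to the model's. The helper name and the nearest-match rule are assumptions made for illustration; the coordinates come from the synthetic example.

```python
import numpy as np


def affine_coords(origin, e1, e2, q):
    """Solve q = origin + mu1*(e1 - origin) + mu2*(e2 - origin) for (mu1, mu2).

    Least squares is used so the same helper works for 2D image points
    (2 equations) and 3D model points (3 equations).
    """
    B = np.column_stack([np.subtract(e1, origin), np.subtract(e2, origin)])
    mu, *_ = np.linalg.lstsq(B, np.subtract(q, origin), rcond=None)
    return mu


# Model frame and the model point whose image we are looking for.
P0, P3, P4, P7 = [0, 0, 0], [0, 9, 0], [0, 0, -1], [0, 9, -1]
mu_model = affine_coords(P0, P3, P4, P7)

# Image frame and the two candidate image points.
p0, p3, p4 = (320.0, 95.2), (320.0, 355.9), (284.3, 97.6)
candidates = {"p7": (284.3, 353.9), "p1": (364.5, 118.1)}

mu_image = {name: affine_coords(p0, p3, p4, q) for name, q in candidates.items()}
best = min(mu_image, key=lambda name: np.linalg.norm(mu_image[name] - mu_model))

print("model:", np.round(mu_model, 2))
for name, mu in mu_image.items():
    print(name, np.round(mu, 2))
print("best match:", best)
```

The match is not exact because the synthetic image was generated with a perspective camera, while the invariants hold exactly only under an affine camera.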
24Geometric Hashing
- Geometric hashing uses invariants to vote for
object hypotheses
- As before, with 3 points used as a coordinate
frame, (μ1, μ2) can be computed for every other
point on the model
- A 2D accumulator array is set up that indexes
geometric space with (μ1, μ2) coordinates
(simplified version of the Hough transform)
- Each element in the array corresponds to a
bucket in (μ1, μ2) invariant geometric space
25Geometric Hashing, continued
- Diagram showing recognition step of algorithm
- Diagram shows trihedral vertex coordinate frames
rather than 3-point frames
- Modified from http://mitpress.mit.edu/e-journals/Videre/001/articles/Pennec/PennecVidereDemo/GeomHash.Recognition.html
26Geometric Hashing, continued
- Do not need to search over models at recognition
time (the hash table can be preloaded, i.e.,
interesting buckets can be indexed with an object
label)
- Invariant-bearing groups: groups of features that
carry information that is independent of object
pose and changes from object to object
27Geometric Hashing, continued
- Cons
- Difficult to choose the size of the buckets
- Hard to know what "enough votes" means
- Some danger that the table will get clogged
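The table-building and voting steps can be sketched for planar point models with three-point frames. This is a simplified illustration under stated assumptions, not the algorithm from the cited diagram (which uses trihedral vertex frames): the models are random 2D point sets, the bucket size is an arbitrary choice, only one image basis casts votes, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

import numpy as np


def affine_coords(b0, b1, b2, q):
    """(mu1, mu2) of q in the frame with origin b0 and axes b1-b0, b2-b0."""
    B = np.column_stack([b1 - b0, b2 - b0])
    return np.linalg.solve(B, q - b0)


def bucket(mu, size):
    """Quantize (mu1, mu2) to an integer bucket index."""
    return tuple(int(v) for v in np.floor(mu / size))


def build_table(models, size):
    """Offline step: label each bucket with the (model, basis) pairs in it."""
    table = defaultdict(set)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            b0, b1, b2 = pts[list(basis)]
            if abs(np.linalg.det(np.column_stack([b1 - b0, b2 - b0]))) < 1e-9:
                continue  # collinear points do not define a frame
            for m, q in enumerate(pts):
                if m not in basis:
                    key = bucket(affine_coords(b0, b1, b2, q), size)
                    table[key].add((name, basis))
    return table


def recognize(table, image_pts, basis, size):
    """Online step: one image basis votes; return the top (hypothesis, votes)."""
    votes = Counter()
    b0, b1, b2 = image_pts[list(basis)]
    for m, q in enumerate(image_pts):
        if m not in basis:
            key = bucket(affine_coords(b0, b1, b2, q), size)
            votes.update(table.get(key, ()))
    return votes.most_common(1)[0]


# Two made-up planar models of 8 points each.
rng = np.random.default_rng(0)
models = {"a": rng.uniform(0, 10, (8, 2)), "b": rng.uniform(0, 10, (8, 2))}
size = 0.05
table = build_table(models, size)

# Image: model "a" under an arbitrary affine map (rotation, shear, shift).
A = np.array([[1.2, 0.4], [-0.3, 0.9]])
image_pts = models["a"] @ A.T + np.array([5.0, -2.0])

winner, count = recognize(table, image_pts, basis=(0, 1, 2), size=size)
print(winner, count)
```

The models are never searched at recognition time; only the preloaded table is consulted. The bucket size is the fragile part: too small and noisy image points miss their buckets, too large and unrelated hypotheses pile up in the same bucket, which is the trade-off listed in the cons above.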