Title: Automatic Matching of MultiView Images
1. Automatic Matching of Multi-View Images
- Ed Bremer
- University of Rochester
- [1] Mikolajczyk, K., Schmid, C., 2004. A performance evaluation of local descriptors. Submitted to PAMI, October 2004. http://lear.inrialpes.fr/pubs/2004/MS04a
- [2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004. A comparison of affine region detectors. Submitted to International Journal of Computer Vision, August 2004. http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
- [3] Lowe, D., 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), pages 91-110.
- [4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002. Robust Wide Baseline Stereo From Maximally Stable Extremal Regions. Proc. British Machine Vision Conference (BMVC 2002), pages 384-393.
- [5] Zisserman, A., Schaffalitzky, F., 2002. Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?". Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 1, pages 414-431.
- [6] Baumberg, A., 2000. Reliable Feature Matching Across Widely Separated Views. In Proc. CVPR, pages 774-781.
- [7] Mikolajczyk, K., Schmid, C., 2001. Indexing based on scale invariant interest points. In Proc. 8th ICCV, pages 525-531.
- Motivation
- Applications
- Process Components
- Region Detectors
- Descriptors
- Matching Criteria
- Performance Evaluation
- Conclusion and Next Steps
- Multi-view/Multi-image Matching
  - Multiple images of a scene taken by a single camera or multiple cameras with different rotation, scale, viewpoint and illumination
- [Figure: 3D scene]
- Applications
  - Detecting matching regions is used in all of the following:
    - Image registration
    - Super-resolution
    - Stereo vision
    - Object detection and recognition
    - Object and motion tracking
    - Indexing and retrieval of objects
    - 3D scene reconstruction
    - Scene recognition
6. Examples of Multi-view Images [2]
7. Process Components [1]
- Covariant region detection
  - Detect image regions covariant to the class of transformation between the reference image and the transformed image
- Invariant descriptor
  - Compute invariant descriptors from the covariant regions
- Descriptor matching
  - Compute the distance between descriptors in the reference image and the transformed image
8. Region Detectors [1]
- Support regions for computation of descriptors
- Determined independently in each image
- Scale invariant or affine invariant
- Can be points (feature points) or regions (covariant)
- Provide dense (local) coverage robust to occlusion
- Need to be stable and repeatable
- Five region detectors:
  - Harris points -> invariant to rotation
  - Harris-Laplace -> invariant to rotation and scale
  - Hessian-Laplace -> invariant to rotation and scale
  - Harris-Affine -> invariant to affine image transformations
  - Hessian-Affine -> invariant to affine image transformations
9. Region Detectors [1]
- Harris points
  - Maxima of the Harris function are used to locate interest points
  - Support region is fixed in size: a 41x41 neighborhood centered at the interest point
- Harris-Laplace regions
  - Scale-adapted Harris function
  - Interest point is selected at a local minimum or maximum of the Laplacian-of-Gaussian across scale-space
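To make the "Harris function" on this slide concrete, here is a minimal sketch of the usual response R = det(M) - k * trace(M)^2, where M is the windowed second moment matrix of the image gradients. The box window, the window radius and k = 0.04 are assumptions chosen for brevity (a Gaussian window is more common), and the names `box_sum` and `harris_response` are hypothetical, not taken from [1].

```python
import numpy as np

def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 window around each pixel (zero padded)."""
    padded = np.pad(a, radius)
    out = np.zeros(a.shape, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(img, radius=2, k=0.04):
    """Harris response map; large positive values indicate corner-like points."""
    iy, ix = np.gradient(np.asarray(img, dtype=float))
    sxx = box_sum(ix * ix, radius)   # entries of the second moment matrix M
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Synthetic image: one bright quadrant, so the only corner is near (8, 8).
img = np.zeros((16, 16))
img[8:, 8:] = 1.0
response = harris_response(img)
corner = np.unravel_index(np.argmax(response), response.shape)
print("strongest response at", corner)
```

Interest points would then be taken at local maxima of the response map; along a straight edge the response is negative, and in flat areas it is zero.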
10. Region Detectors [7]
- Harris-Laplace performance
  - Approximately 10% better than the Laplacian, Lowe (DoG) or gradient methods
  - The standard Harris detector performs very poorly under scale changes
11. Region Detectors [1]
- Hessian-Laplace regions
  - Interest point is at a local maximum of the Hessian determinant
  - Location in scale-space is found using maxima of the Laplacian-of-Gaussian (Difference-of-Gaussians can also be used)
12. Region Detectors [2]
- Harris-Affine regions
  - Find regions using the Harris-Laplace detector
  - Region is affine adapted based on the second moment matrix
- Hessian-Affine regions
  - Find regions using the Hessian-Laplace detector
  - Region is affine adapted based on the second moment matrix
13. Region Detectors
- Regions produced by the Harris-Affine and Hessian-Affine detectors
14. Region Detectors
- Affine normalization using the second moment matrix for regions L and R
15. Region Detectors [1]
- Region normalization
  - Detectors produce circular or elliptical regions
  - Size depends on the detection scale
  - Map regions to a circular region of constant radius
  - Rotate regions in the direction of the dominant gradient orientation
- Illumination normalization
  - Model the illumination change as an affine transformation -> aI(x) + b
  - Compensate using the mean and standard deviation of the pixel intensities
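A minimal sketch of the illumination normalization above: subtracting the mean and dividing by the standard deviation removes an affine intensity change aI(x) + b, assuming a > 0. The function name `normalize_illumination` and the random test patch are illustrative assumptions, not part of [1].

```python
import numpy as np

def normalize_illumination(patch):
    """Shift the patch to zero mean and scale it to unit standard deviation."""
    patch = np.asarray(patch, dtype=float)
    std = patch.std()
    if std == 0:  # flat patch: nothing to scale, return zeros
        return np.zeros_like(patch)
    return (patch - patch.mean()) / std

rng = np.random.default_rng(0)
patch = rng.random((41, 41))          # stand-in for a normalized support region
brighter = 1.7 * patch + 12.0         # affine illumination change a*I(x) + b
same = np.allclose(normalize_illumination(patch),
                   normalize_illumination(brighter))
print("normalized patches agree:", same)
```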
- Descriptors -> feature vector
- Invariant to changes in scale, rotation, affine transformation and affine illumination
- Need to be distinct, stable and repeatable
- Distribution (histogram) type or covariance type
- Ten descriptor types [1]:
  - Scale-Invariant Feature Transform (SIFT)
  - Gradient Location and Orientation Histogram (GLOH)
  - Shape Context
  - Principal Component Analysis (PCA)-SIFT
  - Steerable Filters
  - Differential Invariants
  - Complex Filters
  - Moment Invariants
  - Cross-Correlation
  - Spin Image
- SIFT and GLOH 3D descriptors
  - SIFT -> 4 x 4 x 8 = 128 dimension descriptor
  - GLOH -> log-polar (2 x 8 + 1) x 16 = 272 dimension descriptor
18. Matching Criteria
- Distance measure
  - Find putative matches between images
  - Mahalanobis distance is used for covariance-type descriptors
  - Euclidean distance is used for distribution (histogram) descriptors
  - Direct distance comparison is not suitable for indexing or database searching
- Simple threshold
  - Descriptors match if the distance between them is below a threshold t
  - A descriptor in the reference image can have many matches to descriptors in the transformed image
- Nearest Neighbor (NN)
  - Find the closest match between descriptors in the reference and transformed images
  - A descriptor in the reference image can have only 1 match to a descriptor in the transformed image
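The two matching criteria can be sketched as follows for Euclidean distance. The function names are hypothetical, and applying the threshold t to the nearest neighbor as well is an assumption made here so that distant nearest neighbors are not accepted.

```python
import numpy as np

def pairwise_distances(ref, other):
    """Euclidean distance between every reference and every transformed-image descriptor."""
    diff = ref[:, None, :] - other[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def threshold_matches(ref, other, t):
    """All (i, j) pairs with distance below t; one descriptor may match many."""
    d = pairwise_distances(ref, other)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(d < t))]

def nearest_neighbor_matches(ref, other, t):
    """At most one match per reference descriptor: its nearest neighbor, if below t."""
    d = pairwise_distances(ref, other)
    nearest = d.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if d[i, j] < t]

ref = np.array([[0.0, 0.0], [1.0, 0.0]])
other = np.array([[0.0, 0.1], [0.0, 0.2], [5.0, 5.0]])
print(threshold_matches(ref, other, t=0.5))         # descriptor 0 matches two candidates
print(nearest_neighbor_matches(ref, other, t=0.5))  # descriptor 0 keeps only the closest
```

With real descriptors the rows would be 128-dimensional SIFT vectors rather than the 2D toy points used here.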
19. Performance Evaluation
- Criterion basis
  - Recall = #correct matches / #correspondences
  - 1 - precision = #false matches / (#correct matches + #false matches)
  - Ideal descriptor -> recall = 1 for any precision, given no overlap error
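As a small worked example of the two criteria, the sketch below computes recall and 1 - precision from match counts. The function names and the counts (200 correspondences, 150 correct matches, 50 false matches) are made up for illustration.

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences that were matched correctly."""
    if correspondences == 0:
        return 0.0
    return correct_matches / correspondences

def one_minus_precision(correct_matches, false_matches):
    """Fraction of all returned matches that are false."""
    total = correct_matches + false_matches
    if total == 0:
        return 0.0
    return false_matches / total

# Hypothetical counts for one threshold setting.
print(recall(150, 200))               # 0.75
print(one_minus_precision(150, 50))   # 0.25
```

Sweeping the distance threshold and plotting recall against 1 - precision gives one curve per descriptor.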
20. SIFT - Scale Invariant Feature Transform
- Scale Invariant Feature Transform (SIFT), Lowe [3]
- Features
  - Invariant to image scale and rotation
  - Invariant to small changes in illumination and 3D camera viewpoint
  - Extracts a large number of highly distinctive features
  - Enables detection of small objects
  - Improved performance in cluttered scenes
  - Algorithms are efficient: complex operations are applied to local regions or features rather than the whole image
- Procedure
  - Scale-space extrema detection
  - Keypoint localization
  - Orientation assignment
  - Keypoint vector (descriptor)
21. SIFT - Scale Invariant Feature Transform [3]
- Scale-space blob detector
  - Search for stable features over all scales and image locations
  - Scale-space kernel -> Gaussian function
  - Difference of Gaussian
22. SIFT - Scale Invariant Feature Transform [3]
- Difference of Gaussian (DoG)
  - Simple subtraction of blurred images L
  - Approximation to the scale-normalized Laplacian of Gaussian
  - Maxima or minima of the scale-normalized Laplacian produce the most stable image features compared to the gradient, Hessian, or Harris corner function (Mikolajczyk 2002)
23. SIFT - Scale Invariant Feature Transform [3]
- Scale-space image set
  - Divide each octave into s intervals
  - Compute s + 3 filtered (increasingly blurry) images, k = 2^(1/s)
  - For s = 3, k = 1.26:
    - 6th -> 3.18σ
    - 5th -> 2.52σ
    - 4th -> 2.00σ
    - 3rd -> 1.59σ
    - 2nd -> 1.26σ
    - 1st -> 1.00σ
  - Subtract adjacent images to produce DoG images
  - Repeat for the next octave using the 2nd image from the top, decimated by 2
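The blur levels listed on this slide follow directly from k = 2^(1/s). A minimal sketch, with the hypothetical helper name `blur_multipliers`, reproduces the ladder for s = 3. With the exact k the sixth multiplier is about 3.17; the 3.18 on the slide corresponds to rounding k to 1.26 first.

```python
def blur_multipliers(s):
    """Multipliers of the base sigma for the s + 3 Gaussian images of one octave."""
    k = 2.0 ** (1.0 / s)
    return [k ** i for i in range(s + 3)]

for level, multiplier in enumerate(blur_multipliers(3), start=1):
    print(f"image {level}: {multiplier:.2f} * sigma")
```

The image whose multiplier is exactly 2 (the 4th for s = 3) has twice the base blur, which is why it can seed the next octave after decimation.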
24. SIFT - Scale Invariant Feature Transform [3]
- Scale-space pyramid (figure from Lowe)
25. SIFT - Scale Invariant Feature Transform [3]
- Locating scale-space extrema
  - Detect local maxima or minima of D(x, y, σ)
  - Compare each sample point to its 8 neighbors in the same scale image and to the 9 neighbors in each of the scale images above and below
  - Mark the sample if it is greater than or less than all of its neighbors
  - Extrema are searched in s DoG images per octave
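The 26-neighbor comparison can be sketched as below for a DoG stack stored as a (scales, rows, cols) array. The storage layout and the function names are assumptions for illustration; the brute-force scan favors clarity over speed.

```python
import numpy as np

def is_extremum(dog, i, y, x):
    """True if dog[i, y, x] is strictly greater or strictly less than all 26 neighbors."""
    cube = dog[i - 1:i + 2, y - 1:y + 2, x - 1:x + 2]
    center = cube[1, 1, 1]
    neighbors = np.delete(cube.ravel(), 13)  # index 13 is the center of the 3x3x3 cube
    return bool(np.all(center > neighbors) or np.all(center < neighbors))

def find_extrema(dog):
    """Scan all interior samples; the first and last DoG images only serve as neighbors."""
    n, h, w = dog.shape
    return [(i, y, x)
            for i in range(1, n - 1)
            for y in range(1, h - 1)
            for x in range(1, w - 1)
            if is_extremum(dog, i, y, x)]

dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0           # a single synthetic peak
print(find_extrema(dog))
```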
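26. SIFT - Scale Invariant Feature Transform [3]
- Improving localization
  - Reject points that have low contrast using $|D(\hat{x})| < \text{threshold}$
  - Where $D(\hat{x}) = D + \frac{1}{2} \frac{\partial D^T}{\partial x} \hat{x}$
  - Setting the derivative of the Taylor expansion of D to zero gives the offset of the extremum -> $\hat{x} = -\left(\frac{\partial^2 D}{\partial x^2}\right)^{-1} \frac{\partial D}{\partial x}$
  - The Hessian and derivative of D(x, y, σ) are computed using differences of neighboring sample points; $x = (x, y, \sigma)^T$ is the offset from the sample point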
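A minimal sketch of this refinement step, assuming the 3x3x3 DoG neighborhood of a candidate is available as an array indexed [scale, y, x]. The derivatives are taken by finite differences and the offset is obtained by solving the 3x3 linear system from the Taylor expansion. The function name `refine_extremum` and the synthetic quadratic test surface are assumptions for illustration.

```python
import numpy as np

def refine_extremum(cube):
    """Return (offset, value): the sub-sample offset (dx, dy, ds) of the extremum
    and the interpolated DoG value there, from a 3x3x3 neighborhood [scale, y, x]."""
    c = cube[1, 1, 1]
    # First derivatives by central differences.
    g = 0.5 * np.array([cube[1, 1, 2] - cube[1, 1, 0],
                        cube[1, 2, 1] - cube[1, 0, 1],
                        cube[2, 1, 1] - cube[0, 1, 1]])
    # Second derivatives (Hessian entries) by finite differences.
    dxx = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    dyy = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    dss = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    dxy = 0.25 * (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0])
    dxs = 0.25 * (cube[2, 1, 2] - cube[2, 1, 0] - cube[0, 1, 2] + cube[0, 1, 0])
    dys = 0.25 * (cube[2, 2, 1] - cube[2, 0, 1] - cube[0, 2, 1] + cube[0, 0, 1])
    hessian = np.array([[dxx, dxy, dxs],
                        [dxy, dyy, dys],
                        [dxs, dys, dss]])
    offset = -np.linalg.solve(hessian, g)   # x_hat = -H^{-1} * dD/dx
    value = c + 0.5 * g.dot(offset)         # D(x_hat) = D + 1/2 * dD/dx^T * x_hat
    return offset, value

# Synthetic quadratic with its true peak at (x, y, s) = (0.2, -0.1, 0.3).
s, y, x = np.meshgrid([-1, 0, 1], [-1, 0, 1], [-1, 0, 1], indexing="ij")
cube = 1.0 - (x - 0.2) ** 2 - 2.0 * (y + 0.1) ** 2 - 0.5 * (s - 0.3) ** 2
offset, value = refine_extremum(cube)
print(offset, value)
```

A candidate would then be rejected as low contrast when `abs(value)` falls below the chosen threshold.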
27. SIFT - Scale Invariant Feature Transform [3]
- Edge rejection
  - Eliminate poorly defined peaks (edges) using the Hessian matrix
  - Verify that the ratio of principal curvatures is less than a threshold, r < 10
  - Efficient to compute -> fewer than 20 floating point operations
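The curvature-ratio check does not need the eigenvalues themselves. Following Lowe [3], it can be done with the trace and determinant of the 2x2 spatial Hessian of D: a keypoint is kept when Tr(H)^2 / Det(H) < (r + 1)^2 / r. The sketch below assumes the second derivatives `dxx`, `dyy`, `dxy` have already been estimated from pixel differences; the function name is hypothetical.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """True if the ratio of principal curvatures is below r (keypoint is kept)."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures have opposite signs: not a well-defined peak
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r

print(passes_edge_test(-1.0, -1.0, 0.0))    # blob-like peak, equal curvatures
print(passes_edge_test(-20.0, -1.0, 0.0))   # edge-like, curvature ratio of 20
```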
28. SIFT - Scale Invariant Feature Transform [3]
- Results from Lowe [3]: 832 keypoints reduced to 536 (233x189 image)
29. SIFT - Scale Invariant Feature Transform
- Results from Lowe [3]: performance measures
30. SIFT - Scale Invariant Feature Transform
- Results from Lowe [3]: performance measures
31. SIFT - Scale Invariant Feature Transform [3]
- Orientation: rotational invariance
  - Use the scale of the keypoint to select the image L(x, y, σ)
  - Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences
  - Orientation histogram of sample points: entries are weighted by gradient magnitude and by a Gaussian window around the keypoint; bins cover the 360° range
  - Peaks in the histogram correspond to dominant directions of local gradients
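The orientation step can be sketched as follows. The gradients come from pixel differences on the smoothed image L, and each sample votes into an orientation histogram with weight m(x, y) times a Gaussian window. The choice of 36 bins and of a window centered on the patch are assumptions here, as are the function names.

```python
import numpy as np

def gradient_magnitude_orientation(L):
    """Pixel-difference gradient magnitude and orientation (degrees) for interior samples."""
    dx = L[1:-1, 2:] - L[1:-1, :-2]
    dy = L[2:, 1:-1] - L[:-2, 1:-1]
    m = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0
    return m, theta

def orientation_histogram(L, sigma, bins=36):
    """Orientation histogram weighted by magnitude and a centered Gaussian window."""
    m, theta = gradient_magnitude_orientation(L)
    h, w = m.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    window = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    hist, _ = np.histogram(theta, bins=bins, range=(0.0, 360.0), weights=m * window)
    return hist

# Intensity increasing with the row index: gradients point along +y (90 degrees).
L = np.tile(np.arange(16.0)[:, None], (1, 16))
hist = orientation_histogram(L, sigma=4.0)
print("dominant orientation bin starts at", hist.argmax() * 360 // 36, "degrees")
```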
32. SIFT - Scale Invariant Feature Transform [3]
- Descriptor: the feature vector
  - 8x8 sub-region histograms allow shifts in gradient positions
  - 128 element feature vector -> 4x4 array of histograms with 8 orientations each
  - (Lowe's figure shows a 2x2x8 example)
  - Feature vectors are matched by nearest neighbor (Euclidean distance)
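To show where the 128 elements come from, here is a deliberately simplified sketch: a 16x16 patch of gradient magnitudes and orientations is split into a 4x4 grid of cells, and each cell contributes an 8-bin orientation histogram. It omits the Gaussian weighting, the interpolation between bins and the value clipping used by Lowe [3], so it should be read as an illustration of the layout only; the function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(m, theta, cells=4, bins=8):
    """Concatenate per-cell orientation histograms into a unit-length vector.

    m, theta: gradient magnitude and orientation (degrees) on a square patch
    whose side is divisible by `cells`.
    """
    side = m.shape[0] // cells
    parts = []
    for cy in range(cells):
        for cx in range(cells):
            rows = slice(cy * side, (cy + 1) * side)
            cols = slice(cx * side, (cx + 1) * side)
            hist, _ = np.histogram(theta[rows, cols], bins=bins,
                                   range=(0.0, 360.0), weights=m[rows, cols])
            parts.append(hist)
    desc = np.concatenate(parts).astype(float)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

m = np.ones((16, 16))
theta = np.full((16, 16), 45.0)
desc = sift_like_descriptor(m, theta)
print(desc.shape)   # 4 * 4 * 8 elements
```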
33. SIFT - Scale Invariant Feature Transform [3]
- Results from Lowe [3]
  - Two training objects recognized in a cluttered image
  - Small squares show point matches
  - Large rectangles show the border of the training image after affine transformation
- Conclusions
  - The Harris-Laplace region detector performs better than the Laplacian, DoG and gradient scale-space operators
  - Scale-space detectors provide invariance to rotation, scale and small changes in illumination and viewpoint
  - Affine adaptation provides invariance to affine transformations
  - GLOH and SIFT descriptors provide the best performance
  - Dense, localized descriptors perform well under occlusions
- Next steps
  - Coding and testing of region detectors, descriptors and matching