Title: Automatic Matching of MultiView Images

Automatic Matching of Multi-View Images
  • Ed Bremer
  • University of Rochester

  • Motivation
  • Applications
  • Process Components
  • Region Detectors
  • Descriptors
  • Matching Criteria
  • Performance Evaluation
  • Conclusion and Next Steps

  • Multi-view/Multi-image Matching
  • Multiple images of scene taken by single or
    multiple cameras with different rotation, scale,
    viewpoint and illumination

  • Applications
  • Detecting matching regions is used in all of the
    following:
  • Image registration
  • Super-resolution
  • Stereo vision
  • Object detection and recognition
  • Object and motion tracking
  • Indexing and retrieval of objects
  • 3D scene reconstruction
  • Scene recognition

Examples of Multi-view Images [2]
  • [2] Mikolajczyk, K., Tuytelaars, T., Schmid, C.,
    Zisserman, A., Matas, J., Schaffalitzky, F.,
    Kadir, T., Van Gool, L., 2004, A comparison of
    affine region detectors, submitted to
    International Journal of Computer Vision, August
    2004.

Process Components
  • Covariant region detection
  • Detect image regions covariant to class of
    transformation between reference image and
    transformed image
  • Invariant descriptor
  • Compute invariant descriptors from the covariant
    regions
  • Descriptor matching
  • Compute distance between descriptors in reference
    image and transformed image
  • [1] Mikolajczyk, K., Schmid, C., 2004, A
    performance evaluation of local descriptors,
    submitted to PAMI.

Region Detectors
  • Support regions for computation of descriptors
  • Determined independently in each image
  • Scale invariant or Affine invariant
  • Can be points (feature points) or regions
  • Provide dense (local) coverage robust to
  • Need to be stable and repeatable
  • Five region detectors -
  • Harris points -> invariant to rotation
  • Harris-Laplace -> invariant to rotation and scale
  • Hessian-Laplace -> invariant to rotation and scale
  • Harris-Affine -> invariant to affine image
    transformations
  • Hessian-Affine -> invariant to affine image
    transformations

Region Detectors
  • Harris points -
  • Maxima of Harris function used to locate interest
    points
  • Support region fixed in size, 41x41 neighborhood
    centered at interest point
  • Harris-Laplace regions -
  • Scale adapted Harris function
  • Interest point is a local minimum or maximum across
    scale-space of the Laplacian-of-Gaussian
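As a rough illustration of the Harris function mentioned above, the sketch below computes the corner measure R = det(M) - k * trace(M)^2 from the second moment matrix M of the image gradients. The function name `harris_response`, the 3x3 box window (standing in for the usual Gaussian weighting) and the value k = 0.04 are assumptions made for this example, not values taken from the slides.

```python
def harris_response(img, k=0.04):
    """Harris corner measure R = det(M) - k * trace(M)^2 for a 2D list image.

    M is the second moment matrix of the gradients, summed over a 3x3
    window (assumption: a box window replaces Gaussian weighting).
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = b = c = 0.0  # entries of M = [[a, b], [b, c]]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            resp[y][x] = (a * c - b * b) - k * (a + c) ** 2
    return resp


# Synthetic test image: a bright square occupying the lower-right quadrant.
image = [[1.0 if (x >= 5 and y >= 5) else 0.0 for x in range(10)]
         for y in range(10)]
response = harris_response(image)
```

On this synthetic image the response is positive at the corner of the square, negative along its straight edges and zero in flat areas, which is why interest points are taken at maxima of the function.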

Region Detectors
  • Harris-Laplace Performance -
  • Approximately 10% better than Laplacian, Lowe or
    gradient methods.
  • The standard Harris detector performs very poorly
    under scale changes
  • [7] Mikolajczyk, K., Schmid, C., 2001, Indexing
    based on scale invariant interest points, In
    Proc. 8th ICCV, pages 525-531.

Region Detectors
  • Hessian-Laplace regions -
  • Interest point is at a local maximum of the Hessian
    determinant
  • Location in scale-space using maxima of
    Laplacian-of-Gaussian (can also use

Region Detectors
  • Harris-Affine regions -
  • Find regions using Harris-Laplace detector
  • Affine-adapted region based on the 2nd moment
    matrix
  • Hessian-Affine regions -
  • Find regions using Hessian-Laplace detector
  • Affine-adapted region based on the 2nd moment
    matrix

Region Detectors
  • Regions produced by Harris-Affine and
    Hessian-Affine detectors

Region Detectors
  • Affine normalization using 2nd moment matrix for
    region L and R
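The affine normalization above can be sketched as follows, assuming each elliptical region is described by a symmetric positive-definite 2nd moment matrix M so that its boundary satisfies x^T M x = 1. Mapping points with the matrix square root M^(1/2) then turns the ellipse into a unit circle. The helper names `sqrt_spd_2x2` and `normalize_point` are hypothetical, and the closed-form 2x2 square root is a convenience chosen for this example.

```python
import math


def sqrt_spd_2x2(m):
    """Principal square root of a symmetric positive-definite 2x2 matrix.

    Closed form: sqrt(M) = (M + s*I) / t, with s = sqrt(det M) and
    t = sqrt(trace M + 2*s).
    """
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def normalize_point(m, p):
    """Map point p of the elliptical region x^T M x = 1 into the normalized frame."""
    r = sqrt_spd_2x2(m)
    return (r[0][0] * p[0] + r[0][1] * p[1],
            r[1][0] * p[0] + r[1][1] * p[1])


# Assumed example matrix; (0.5, 0.0) lies on its ellipse since 4 * 0.25 = 1.
M = [[4.0, 1.0], [1.0, 2.0]]
q = normalize_point(M, (0.5, 0.0))  # q lies on the unit circle
```

After both regions L and R are normalized this way, they differ only by a rotation, which is handled separately by the dominant gradient orientation.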

Region Detectors
  • Region normalization
  • Detectors produce circular or elliptical regions
  • Size dependent on detection scale
  • Map regions to a circular region with constant
    radius
  • Rotate regions in direction of dominant gradient
  • Illumination normalization
  • Use affine transformation -> aI(x) + b
  • Normalize using mean and standard deviation of
    pixel intensities
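A minimal sketch of the illumination normalization step, assuming the region is given as a flat list of pixel intensities: subtracting the mean and dividing by the standard deviation cancels any affine change aI(x) + b with a > 0. The function name `normalize_intensity` is hypothetical.

```python
import statistics


def normalize_intensity(patch):
    """Shift pixel intensities to zero mean and scale them to unit standard deviation."""
    mean = statistics.fmean(patch)
    std = statistics.pstdev(patch)
    if std == 0.0:
        return [0.0 for _ in patch]  # constant patch: nothing to normalize
    return [(v - mean) / std for v in patch]


# The same region under an assumed illumination change a = 2.5, b = 7.
patch = [10.0, 20.0, 30.0, 50.0, 90.0]
brighter = [2.5 * v + 7.0 for v in patch]
# normalize_intensity(patch) and normalize_intensity(brighter) agree
# up to floating-point rounding.
```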

  • Descriptors -> Feature vector
  • Invariant to changes in scale, rotation, affine
    transformation and affine illumination
  • Need to be distinct, stable and repeatable
  • Distribution (histogram) type or Covariance type
  • Ten Descriptor types
  • Scale-Invariant Feature Transform (SIFT)
  • Gradient Location and Orientation Histogram (GLOH)
  • Shape Context
  • Principal Component Analysis (PCA)-SIFT
  • Steerable Filters
  • Differential Invariants
  • Complex Filters
  • Moment Invariants
  • Cross-Correlation
  • Spin Image

  • SIFT and GLOH 3D Descriptors
  • SIFT -> 4 x 4 x 8 = 128 dimension descriptor
  • GLOH -> Log-polar (2 x 8 + 1) x 16 = 272
    dimension descriptor

Matching Criteria
  • Distance measure
  • Find putative matches between images
  • Mahalanobis distance used for covariance-type
    descriptors
  • Euclidean distance used for distribution
    (histogram) descriptors
  • Direct distance comparison not suitable for
    indexing or database searching
  • Simple threshold
  • Descriptors match if the distance between them is
    below threshold t
  • Descriptor in reference image can have many
    matches to descriptors in transformed image
  • Nearest Neighbor (NN)
  • Find closest match between descriptors in
    reference and transformed image
  • Descriptor in reference image can have only 1
    match to descriptor in transformed image
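The two matching strategies above can be contrasted in a short sketch using Euclidean distance on plain lists. The function names are hypothetical, and applying the threshold t inside the nearest-neighbor variant as well is an assumption made for this example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def threshold_matches(ref, trans, t):
    """All (i, j) pairs with distance below t; one reference descriptor may match several."""
    return [(i, j) for i, u in enumerate(ref) for j, v in enumerate(trans)
            if euclidean(u, v) < t]


def nearest_neighbor_matches(ref, trans, t):
    """For each reference descriptor keep only its closest candidate, if below t."""
    matches = []
    for i, u in enumerate(ref):
        j = min(range(len(trans)), key=lambda n: euclidean(u, trans[n]))
        if euclidean(u, trans[j]) < t:
            matches.append((i, j))
    return matches


ref = [[0.0, 0.0], [5.0, 5.0]]
trans = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.0], [5.0, 6.0]]
print(threshold_matches(ref, trans, 1.5))         # [(0, 0), (0, 1), (0, 2), (1, 3)]
print(nearest_neighbor_matches(ref, trans, 1.5))  # [(0, 2), (1, 3)]
```

The first reference descriptor gets three matches under the simple threshold but only one under nearest-neighbor matching.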

Performance Evaluation
  • Criterion basis
  • Recall rate = correct matches / correspondences
  • 1 - precision = false matches / (correct matches +
    false matches)
  • Ideal descriptor -> recall rate = 1 for all
    precision, given no overlap error
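The two evaluation measures can be written down directly. The sketch below also sweeps the matching threshold to produce points of a recall versus 1 - precision curve. All function names and the toy data are made up for illustration.

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences that were matched correctly."""
    return correct_matches / correspondences


def one_minus_precision(correct_matches, false_matches):
    """Fraction of all returned matches that are false."""
    return false_matches / (correct_matches + false_matches)


def recall_precision_curve(distances, is_correct, correspondences, thresholds):
    """(1 - precision, recall) for each threshold t; a pair is accepted when distance < t."""
    points = []
    for t in thresholds:
        accepted = [ok for d, ok in zip(distances, is_correct) if d < t]
        correct = sum(accepted)
        false = len(accepted) - correct
        if accepted:  # skip thresholds that accept nothing
            points.append((one_minus_precision(correct, false),
                           recall(correct, correspondences)))
    return points


# Toy data: four candidate matches, three of them correct, four correspondences.
curve = recall_precision_curve([0.1, 0.2, 0.3, 0.4],
                               [True, True, False, True],
                               correspondences=4,
                               thresholds=[0.25, 0.5])
print(curve)  # [(0.0, 0.5), (0.25, 0.75)]
```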

SIFT - Scale Invariant Feature Transform
  • Scale Invariant Feature Transform (SIFT), Lowe [3]
  • Features
  • Invariant to image scale, rotation
  • Invariant to small changes in illumination and
    3D camera viewpoint
  • Extracts large number of highly distinctive
    features
  • Enables detection of small objects
  • Improved performance in cluttered scenes
  • Algorithms are efficient: complex operations are
    applied to local regions or features rather than
    the whole image
  • Procedure
  • Scale-space extrema detection
  • Keypoint localization
  • Orientation assignment
  • Keypoint vector (descriptor)

SIFT - Scale Invariant Feature Transform [3]
  • Scale-Space Blob Detector -
  • Search for stable features over all scales and
    image locations
  • Scale-space kernel -> Gaussian function
  • Difference of Gaussian

SIFT - Scale Invariant Feature Transform [3]
  • Difference of Gaussian (DoG)
  • Simple subtraction of adjacent blurred images L
  • Approximation to scale-normalized Laplacian of
    Gaussian
  • Maxima or minima of scale-normalized Laplacian
    produce the most stable image features compared
    to gradient, Hessian, or Harris corner function
    (Mikolajczyk 2002)

SIFT - Scale Invariant Feature Transform [3]
  • Scale-Space Image Set -
  • Divide each octave into s intervals
  • Compute s + 3 filtered (increasingly blurry)
    images, k = 2^(1/s)
  • s = 3, k = 1.26 -> 6th -> 3.17σ
  • 5th -> 2.52σ
  • 4th -> 2.00σ
  • 3rd -> 1.59σ
  • 2nd -> 1.26σ
  • 1st -> 1.00σ
  • Subtract adjacent images to produce DoG images
  • Repeat for next octave using the image with twice
    the initial σ (2 images from the top of the
    stack), decimated by 2
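The blur levels within one octave follow directly from k = 2^(1/s). A small sketch, with the hypothetical helper `octave_sigmas` and a base blur of 1.0 assumed for illustration:

```python
def octave_sigmas(s, sigma0=1.0):
    """Blur levels of the s + 3 Gaussian images in one octave, with k = 2**(1/s)."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


sigmas = octave_sigmas(3)
print([round(v, 2) for v in sigmas])  # [1.0, 1.26, 1.59, 2.0, 2.52, 3.17]
print(len(sigmas) - 1)                # 5 DoG images from adjacent pairs
```

With s = 3 the fourth image has twice the initial blur, which is why it seeds the next octave. The last multiplier is 2^(5/3), about 3.17; using the rounded value k = 1.26 instead gives 3.18.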

SIFT - Scale Invariant Feature Transform [3]
  • Scale-Space Pyramid -
  • (from Lowe)

SIFT - Scale Invariant Feature Transform [3]
  • Locating Scale-Space Extrema -
  • Detection of local maxima or minima of D(x, y, σ)
  • Compare each sample point to its 8 neighbors in
    the same scale image and the 9 neighbors in each
    of the scale images above and below (26 in total)
  • Mark if sample is greater than or less than all
    of the neighbors
  • Extrema are detected in s of the DoG images per
    octave
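The 26-neighbor comparison can be sketched as below, assuming the DoG images of one octave are stored as a list of 2D lists ordered by scale. The function name `is_scale_space_extremum` is hypothetical, and the caller is expected to pass interior indices only.

```python
def is_scale_space_extremum(dog, i, y, x):
    """True if dog[i][y][x] is strictly greater or strictly smaller than all 26 neighbors.

    dog: list of DoG images (2D lists) ordered by scale; i, y, x must be
    interior indices so that every neighbor exists.
    """
    center = dog[i][y][x]
    neighbors = [dog[i + di][y + dy][x + dx]
                 for di in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (di, dy, dx) != (0, 0, 0)]
    return (all(center > n for n in neighbors)
            or all(center < n for n in neighbors))


# Three 3x3 DoG layers with a single peak in the middle layer.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
print(is_scale_space_extremum(stack, 1, 1, 1))  # True
```

Because the top and bottom DoG images of an octave have no neighbor on one side, only the inner layers can be tested this way.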

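SIFT - Scale Invariant Feature Transform [3]
  • Improving Localization -
  • Reject points that have low contrast using
    $|D(\hat{\mathbf{x}})| < \text{threshold}$
  • Where $D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$
    -> $D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^T \hat{\mathbf{x}}$
  • Gives offset of extremum ->
    $\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$
  • Hessian and derivative of $D(x, y, \sigma)$ use
    differences of neighboring sample points.
    $\mathbf{x} = (x, y, \sigma)^T$ is the offset from the sample point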

SIFT - Scale Invariant Feature Transform [3]
  • Edge Rejection -
  • Eliminate poorly defined peaks (edges) using
    Hessian matrix
  • Verify ratio of principal curvatures is less than
    threshold r < 10
  • Efficient to compute -> less than 20 floating
    point operations
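A sketch of the edge test, assuming the 2x2 spatial Hessian H is estimated from pixel differences in a DoG image and the check is the trace and determinant form Tr(H)^2 / Det(H) < (r + 1)^2 / r described in Lowe's paper. The slides give only the threshold, so the exact formula and the function name `passes_edge_test` are assumptions here.

```python
def passes_edge_test(d, y, x, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r.

    d is a DoG image as a 2D list; H is its 2x2 Hessian at (y, x),
    estimated from differences of neighboring samples.
    """
    dxx = d[y][x + 1] - 2.0 * d[y][x] + d[y][x - 1]
    dyy = d[y + 1][x] - 2.0 * d[y][x] + d[y - 1][x]
    dxy = (d[y + 1][x + 1] - d[y + 1][x - 1]
           - d[y - 1][x + 1] + d[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        return False  # curvatures of opposite sign or flat: not a well-defined peak
    return trace * trace / det < (r + 1.0) ** 2 / r


blob = [[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]]
ridge = [[0.0, 0.0, 0.0], [3.9, 4.0, 3.9], [0.0, 0.0, 0.0]]
print(passes_edge_test(blob, 1, 1))   # True: similar curvature in both directions
print(passes_edge_test(ridge, 1, 1))  # False: one curvature dominates
```

Working with the trace and determinant avoids computing the eigenvalues explicitly, which keeps the test cheap.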

SIFT - Scale Invariant Feature Transform [3]
  • Results from Lowe [3]: 832 keypoints reduced to
    536 (233x189 image)

SIFT - Scale Invariant Feature Transform [3]
  • Results from Lowe [3]: performance measures

SIFT - Scale Invariant Feature Transform [3]
  • Orientation - rotational invariance
  • Use scale of point to select image L(x, y, σ)
  • Compute the gradient magnitude m(x, y) and
    orientation θ(x, y) at each image sample using
    pixel differences.
  • Orientation histogram of sample points: entries
    weighted by gradient magnitude and a Gaussian
    window around the keypoint, bins cover the 360°
    range
  • Peaks in histogram correspond to dominant
    directions of local gradients
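The orientation step can be sketched as follows: gradient magnitude and orientation come from pixel differences, and each sample votes into an orientation histogram with a weight of magnitude times a Gaussian window. The 36-bin layout follows Lowe's paper rather than the slides, and the function names, `radius` and `sigma` parameters are assumptions for illustration. The caller must keep the window at least one pixel away from the image border.

```python
import math


def gradient(img, y, x):
    """Gradient magnitude and orientation (degrees in [0, 360)) from pixel differences."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


def orientation_histogram(img, ky, kx, radius, sigma, bins=36):
    """Orientation histogram around keypoint (ky, kx).

    Each sample adds its gradient magnitude, weighted by a Gaussian
    window centered on the keypoint, to the bin of its orientation.
    """
    hist = [0.0] * bins
    for y in range(ky - radius, ky + radius + 1):
        for x in range(kx - radius, kx + radius + 1):
            m, theta = gradient(img, y, x)
            weight = math.exp(-((y - ky) ** 2 + (x - kx) ** 2)
                              / (2.0 * sigma ** 2))
            hist[int(theta * bins / 360.0) % bins] += m * weight
    return hist


# Diagonal intensity ramp: every gradient points at 45 degrees.
ramp = [[float(x + y) for x in range(7)] for y in range(7)]
hist = orientation_histogram(ramp, 3, 3, radius=2, sigma=1.5)
print(hist.index(max(hist)))  # 4, the bin covering 40 to 50 degrees
```

The index of the histogram peak gives the dominant direction, which is then used to rotate the region before the descriptor is computed.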

SIFT - Scale Invariant Feature Transform [3]
  • Descriptor - the feature vector
  • 8x8 sub-region histograms allow shift in gradient
    positions
  • 128 element feature vector -> 4x4 array of 8-bin
    orientation histograms
  • (2x2x8 from Lowe is shown below)
  • Feature vectors matched by nearest neighbor
    (Euclidean distance)

SIFT - Scale Invariant Feature Transform [3]
  • Results from Lowe [3]
  • Two training objects recognized in cluttered
    scene
  • Small squares show point matches
  • Large rectangles show border of training image
    after affine transformation

  • Conclusions
  • Harris-Laplace region detector performs better
    than Laplacian, DoG and gradient scale-space
    detectors
  • Scale-space detectors provide invariance to
    rotation, scale and small changes to illumination
    and viewpoint.
  • Affine adaptation provides invariance to affine
    transformations
  • GLOH and SIFT descriptors provide the best
    performance
  • Dense, localized descriptors perform well under
  • Next steps
  • Coding and testing of region detectors,
    descriptors and matching
