Title: 14 Dec 2005
1CENG 710 Fundamentals of Autonomous Robotics
- Scale Invariant Feature Transform
- Maya Çakmak
2VISION
- the most powerful sense
- Area: Computer/Robot Vision
- Sub-area: Object Recognition
- Problem: obtain a representation that allows us to find a particular object we've encountered before
3Key properties of a good feature
- Highly distinctive
- Easy to extract
- Invariant, tolerant to changes
- Easy to match against a large database
4SIFT
- SIFT is an approach for detecting and
extracting local feature descriptors in which
image content is transformed into local feature
coordinates.
5The Paper
- Distinctive Image Features from Scale-Invariant Keypoints
- International Journal of Computer Vision, 2004
- International Conference on Computer Vision, 1999
- David Lowe
- CS Department, Univ. of British Columbia
6The method is
- Invariant to image scaling, translation, and rotation
- Partially invariant to illumination changes and viewpoint changes
7Stages in SIFT
- Scale-space extrema detection
- Keypoint localization
- Orientation assignment
- Keypoint descriptor
8Scale Space Extrema
Stage 1
- Extrema of difference-of-Gaussian (DoG) of
image
- Gaussian-blurred image
- DoG for image
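The two quantities named above can be written out as follows. This is a sketch that assumes the notation of Lowe's IJCV 2004 paper: I is the input image, G a Gaussian kernel with standard deviation σ, * denotes convolution in x and y, and k is a constant factor between adjacent scales.

```latex
\begin{align}
G(x, y, \sigma) &= \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)} \\
L(x, y, \sigma) &= G(x, y, \sigma) * I(x, y) \\
D(x, y, \sigma) &= L(x, y, k\sigma) - L(x, y, \sigma)
\end{align}
```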
9Method to obtain DoG
Stage 1
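A minimal sketch of how one octave of DoG images might be built: blur the image at a sequence of scales and subtract adjacent levels. The function names are hypothetical, and the defaults (sigma0 = 1.6, 3 scales per octave, s + 3 blurred images per octave) are assumptions taken from Lowe's paper, not from this slide. For simplicity each level is blurred directly from the input rather than incrementally.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for j, w in enumerate(kernel):
                xi = min(max(x + j - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    blurred = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(blurred), kernel))

def dog_octave(image, sigma0=1.6, scales_per_octave=3):
    """Return (blurred images, DoG images) for one octave."""
    k = 2 ** (1.0 / scales_per_octave)
    blurred = [gaussian_blur(image, sigma0 * k ** i)
               for i in range(scales_per_octave + 3)]
    dogs = []
    for lower, upper in zip(blurred, blurred[1:]):
        # D = L(k * sigma) - L(sigma), computed pixel by pixel
        dogs.append([[u - l for u, l in zip(row_u, row_l)]
                     for row_u, row_l in zip(upper, lower)])
    return blurred, dogs

def downsample(image):
    """Keep every second pixel in each direction (base of the next octave)."""
    return [row[::2] for row in image[::2]]
```

Under these assumptions, the next octave would start from `downsample(blurred[scales_per_octave])`, the blurred image whose sigma is twice sigma0.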
10Key Point Localization
Stage 2
- Find local minimum and maximum of DoG
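As a sketch of this step, a sample can be taken as a candidate when it is larger or smaller than all 26 neighbours: 8 in its own DoG image and 9 in each of the two adjacent scales. This neighbourhood follows Lowe's paper; the function names and the list-of-lists layout `dogs[scale][y][x]` are assumptions made for illustration.

```python
def is_extremum(dogs, s, y, x):
    """True if dogs[s][y][x] is strictly above or strictly below all 26 neighbours."""
    value = dogs[s][y][x]
    neighbours = [
        dogs[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return value > max(neighbours) or value < min(neighbours)

def find_extrema(dogs):
    """Scan all interior samples of the interior scales; return (s, y, x) tuples."""
    found = []
    height, width = len(dogs[0]), len(dogs[0][0])
    for s in range(1, len(dogs) - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if is_extremum(dogs, s, y, x):
                    found.append((s, y, x))
    return found
```

The top and bottom DoG images of an octave only serve as neighbours here, which is why more DoG levels are computed than scales are searched.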
11Stage 2
- For each candidate:
- Remove keypoints with low contrast (by thresholding the DoG value)
- Remove responses along edges (using the ratio of principal curvatures)
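The two rejection tests can be sketched as below. The contrast threshold of 0.03 (for pixel values in [0, 1]) and the curvature ratio r = 10 are assumptions taken from Lowe's paper, not from this slide. The function names are hypothetical, and the contrast test is applied to the sampled DoG value rather than to an interpolated extremum, which is a simplification.

```python
def passes_contrast(value, threshold=0.03):
    """Keep a candidate only if its DoG magnitude reaches the threshold."""
    return abs(value) >= threshold

def passes_edge_test(dog, y, x, r=10.0):
    """Keep a candidate only if its principal curvatures are similar.

    The 2x2 Hessian of the DoG image is estimated with finite differences.
    For curvature ratio rho, trace^2 / det = (rho + 1)^2 / rho, so bounding
    this quantity by (r + 1)^2 / r bounds the ratio by r without computing
    the eigenvalues.
    """
    dxx = dog[y][x + 1] - 2 * dog[y][x] + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2 * dog[y][x] + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign, or a flat direction: reject
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

A round blob has two similar curvatures and passes, while a ridge has one large and one small curvature and is rejected.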
12Orientation Assignment
Stage 3
- For the selected keypoint, at the closest scale:
- compute a gradient orientation histogram
- determine the dominant orientation
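A minimal sketch of the histogram step, assuming 36 bins of 10 degrees as in Lowe's paper. The function name is hypothetical. The Gaussian weighting of samples, the extra keypoints created for secondary peaks, and the parabola fit around the peak described in the paper are all left out here.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Return (centre of the strongest bin in degrees, histogram).

    patch is a small blurred image region around the keypoint, given as a
    list of rows. Gradients use central differences on interior pixels.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Each sample votes for its orientation bin, weighted by magnitude
            hist[int(angle // bin_width) % num_bins] += magnitude
    peak = max(range(num_bins), key=lambda b: hist[b])
    return peak * bin_width + bin_width / 2, hist
```

Describing the keypoint relative to this angle is what makes the later descriptor invariant to image rotation.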
13Scale Space Images
Example
[Figure: Gaussian-blurred images of the 2nd octave; panels show the 1st, 3rd, 4th, and 5th images]
14DoG Images
Example
[Figure: DoG images at the 1st, 2nd, 3rd, and 4th levels of the 2nd octave]
15Keypoint Images
Example
16Effect of eliminations
Stage 2
- (a) 233x189 pixel original image
- (b) 832 DoG extrema
- (c) 729 left after eliminating low-contrast keypoints
- (d) 536 left after eliminating edge responses
17Keypoint Descriptor
Stage 4
- A keypoint is localized by (x, y, scale, orientation). How do we describe the image content at the keypoint?
- SIFT descriptors are a set of orientation histograms computed on 4x4 pixel neighborhoods around the keypoint, arranged in a 4x4 grid with 8 orientation bins each
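A simplified sketch of how such a descriptor could be assembled from a 16x16 patch of gradient magnitudes and orientations, split into a 4x4 grid of cells with 8 bins per cell. The function names are hypothetical. It assumes the orientations are already measured relative to the keypoint orientation, and it omits the Gaussian weighting and the interpolation between bins used in Lowe's paper. The normalize, clip at 0.2, renormalize step follows that paper.

```python
import math

def normalize(vec, clip=0.2):
    """Scale to unit length, clip large entries, then scale to unit length again."""
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v] if norm > 0 else list(v)
    clipped = [min(c, clip) for c in unit(vec)]
    return unit(clipped)

def sift_like_descriptor(magnitudes, angles_deg, grid=4, cell=4, num_bins=8):
    """Build a grid * grid * num_bins vector from a (grid * cell)-square patch.

    magnitudes and angles_deg are lists of rows holding the gradient
    magnitude and orientation (in degrees) of every pixel in the patch.
    """
    descriptor = [0.0] * (grid * grid * num_bins)
    bin_width = 360.0 / num_bins
    for y in range(grid * cell):
        for x in range(grid * cell):
            cell_index = (y // cell) * grid + (x // cell)
            b = int((angles_deg[y][x] % 360.0) // bin_width) % num_bins
            descriptor[cell_index * num_bins + b] += magnitudes[y][x]
    return normalize(descriptor)
```

The normalization reduces the effect of contrast changes, and the clipping limits the influence of a few very large gradients.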
18Matching SIFT features
- Feature vector dimension: 4x4x8 = 128
- Find the nearest neighbor in a database of SIFT features from training images.
- For robustness, use the ratio of the distance to the nearest neighbor to the distance to the second-nearest neighbor.
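The ratio test can be sketched with a brute-force search. The 0.8 ratio is an assumption taken from Lowe's paper, and the function names are hypothetical. A real system would likely replace the exhaustive search with an approximate nearest-neighbor index.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def match_features(query, database, max_ratio=0.8):
    """Return (query index, database index) pairs that pass the ratio test.

    A match is kept only if the nearest database feature is clearly closer
    than the second nearest, which filters out ambiguous matches.
    """
    matches = []
    if len(database) < 2:
        return matches
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        (best, best_index), (second, _) = ranked[0], ranked[1]
        if best < max_ratio * second:
            matches.append((qi, best_index))
    return matches
```

A query feature that sits halfway between two database features is dropped, because its two nearest distances are nearly equal.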
19Matching in different scales
20Matching in different scales
21Matching different view points
22Matching in different illumination
23Multiple object instances
24Closing Comments
- SIFT features are reasonably invariant to rotation, scaling, and illumination changes
- We can use them for matching and object recognition, among other things
- Robust to occlusion: as long as we can see at least 3 features from the object, we can compute its location and pose
- Efficient online matching: recognition can be performed in close to real time (at least for small object databases)
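One way to see why 3 features suffice: if the pose is approximated by a 2-D affine transform (an assumption made here, following Lowe's paper), there are 6 unknowns, and each matched point contributes 2 equations. The sketch below solves for the transform from exactly 3 correspondences. The function names are hypothetical, and a practical system would use a least-squares fit over more matches.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three model points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:col] + [rhs[r]] + row[col + 1:]
                    for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution

def affine_from_three(model_pts, image_pts):
    """Return ((a, b, tx), (c, d, ty)) such that
    u = a*x + b*y + tx and v = c*x + d*y + ty
    maps each model point (x, y) to its matched image point (u, v)."""
    m = [[x, y, 1.0] for x, y in model_pts]
    row_u = solve3(m, [u for u, _ in image_pts])
    row_v = solve3(m, [v for _, v in image_pts])
    return tuple(row_u), tuple(row_v)
```

For example, `affine_from_three([(0, 0), (1, 0), (0, 1)], [(2, 3), (2, 4), (1, 3)])` recovers a 90 degree rotation followed by a translation of (2, 3).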
25References
- [1] Lowe, David. Object Recognition from Local Scale-Invariant Features, ICCV, 1999 and IJCV, 2004
- [2] Lowe, David. CVPR 2003 Tutorial
- [3] Matlab SIFT toolbox tutorial
- [4] Computer Vision Lecture Notes, by Pinar Duygulu, Bilkent University, CS department.