Title: SIFT (Lowe 99)
1. SIFT (Lowe 99)
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006)
(various slides stolen from the web)
2. Scale-Invariant Feature Transform
- Generates image features, keypoints
- invariant to image scaling and rotation
- partially invariant to change in illumination and 3D camera viewpoint
- many can be extracted from typical images
- highly distinctive
3. Algorithm Stages
- Scale-space Extrema Detection
  - Uses difference-of-Gaussian function
- Keypoint Localization
  - Sub-pixel location and scale fit to a model
- Orientation assignment
  - 1 or more for each keypoint
- Keypoint descriptor
  - Created from local image gradients
4. Scale Space
5. Difference Of Gaussian Pyramid
[Figure: blur and resample steps; images labeled A and B]
6. Difference Of Gaussian Pyramid
[Figure: difference image A - B]
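The A - B step can be sketched in plain Python. This is a minimal illustration, not Lowe's implementation: it assumes an image stored as a list of rows of floats, a Gaussian truncated at about 3 sigma, clamped borders, and a scale ratio k = sqrt(2) between the two blur levels. All function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
             for k in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def difference_of_gaussian(image, sigma, k=math.sqrt(2)):
    """One DoG layer: blur at k * sigma minus blur at sigma (assumed ordering)."""
    wide = gaussian_blur(image, k * sigma)
    narrow = gaussian_blur(image, sigma)
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(wide, narrow)]
```

A flat image gives a DoG response of zero everywhere, while an isolated bright pixel gives a negative response at its center, because the wider blur spreads the peak more.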
7. Extrema Detection
- A keypoint must be a minimum or maximum among its 8 neighbors at its own scale and the 9 neighbors in each of the scales above and below.
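The 26-neighbor test can be sketched as follows. This assumes the difference-of-Gaussian layers are stacked as `dog[scale][row][column]` (nested lists of equal size) and that an extremum must be strictly larger or smaller than every neighbor; both choices and the function names are illustrative assumptions.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or below all 26 neighbors."""
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)

def find_extrema(dog):
    """Scan all interior positions and return (scale, row, column) triples."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_scale_space_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found
```

Only interior positions are scanned, so the first and last scale layers serve purely as neighbors.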
8. Extrema Detection
9. Keypoint Localization and Refinement
- Refine the keypoint (extremum) position by fitting a 3D quadratic model, giving sub-pixel accuracy in x, y position and scale.
- Throw out points that have low contrast.
- Remove points that are too edge-like.
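The three refinement steps can be illustrated with a simplified sketch. The quadratic fit is shown in one dimension only (the slides describe a 3D fit over x, y and scale). The contrast threshold of 0.03 and the curvature ratio r = 10 are values from Lowe's 2004 journal paper, used here as assumptions since the slides give no numbers. All function names are hypothetical.

```python
def refine_extremum_1d(f_minus, f_zero, f_plus):
    """Fit a parabola through samples at -1, 0, +1.

    Returns (offset, value) of the parabola's vertex relative to the
    middle sample. This is a 1D stand-in for the 3D quadratic fit.
    """
    first = 0.5 * (f_plus - f_minus)            # central first derivative
    second = f_plus - 2.0 * f_zero + f_minus    # second derivative
    if second == 0.0:
        return 0.0, f_zero                      # flat: nothing to refine
    offset = -first / second
    return offset, f_zero + 0.5 * first * offset

def has_enough_contrast(value, threshold=0.03):
    """Low-contrast rejection; threshold assumes pixel values in [0, 1]."""
    return abs(value) >= threshold

def is_edge_like(dxx, dyy, dxy, r=10.0):
    """Edge test on the 2x2 Hessian of the DoG image at the keypoint.

    A large ratio of principal curvatures means the point is well
    localized in only one direction, as on an edge.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        return True                             # curvatures differ in sign
    return trace * trace / det >= (r + 1.0) ** 2 / r
```

For example, samples 0.4375, 1.9375, 1.4375 come from the parabola -(x - 0.25)^2 + 2, so the fit recovers an offset of 0.25 and a peak value of 2.0.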
10. Keypoint Localization and Refinement
11. Keypoint Localization and Refinement
12. Orientation Assignment
- Create a histogram of local gradient directions computed at the selected scale.
- Assign the canonical orientation at the peak of the smoothed histogram.
- Each keypoint specifies stable 2D coordinates (x, y, scale, orientation).
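A minimal sketch of the orientation histogram follows. It assumes the local gradients are already available as (dx, dy) pairs, uses 36 bins of 10 degrees weighted by gradient magnitude, and smooths with a circular [1/4, 1/2, 1/4] filter. The bin count and the smoothing filter are assumptions for illustration, and the Gaussian window and peak interpolation of a full implementation are omitted.

```python
import math

def dominant_orientation(gradients, num_bins=36):
    """Return the center angle (degrees) of the peak of the smoothed histogram.

    gradients: iterable of (dx, dy) pairs sampled around the keypoint.
    """
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for dx, dy in gradients:
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        hist[int(angle // bin_width) % num_bins] += magnitude
    # Circular smoothing so that a peak split across two bins is not missed.
    smoothed = [
        0.25 * hist[i - 1] + 0.5 * hist[i] + 0.25 * hist[(i + 1) % num_bins]
        for i in range(num_bins)
    ]
    peak = max(range(num_bins), key=smoothed.__getitem__)
    return (peak + 0.5) * bin_width
```

Because the slides allow one or more orientations per keypoint, a fuller version would also return any secondary peaks close to the maximum.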
13. Example from paper
14. SIFT Descriptor
- Try to mimic complex cells in the visual cortex
  - Selective to spatial frequency and orientation, but allow for shifts in position
- Be robust to small affine transformations
  - Local affine transformations affect positions more than orientation and spatial frequency.
15. SIFT Descriptor
- Thresholded image gradients are sampled over a 16x16 array of locations at the keypoint scale.
- Create an array of orientation histograms, rotated relative to the orientation of the keypoint.
- 8 orientations x 4x4 histogram array = 128 dimensions
- Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects).
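The 4x4x8 layout can be made concrete with a simplified sketch. It assumes the 16x16 gradient samples are given as two nested lists `dx` and `dy`. It deliberately simplifies in three ways: each sample goes to a single bin (no trilinear interpolation), only the gradient angles are rotated by the keypoint orientation (the sampling grid itself is not rotated), and there is no Gaussian weighting. The clamp value of 0.2 is taken from Lowe's 2004 paper and is an assumption here. Function names are hypothetical.

```python
import math

def normalize(vector):
    """Scale to unit Euclidean length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

def sift_like_descriptor(dx, dy, keypoint_angle_deg=0.0, clamp=0.2):
    """Build a 128-dimensional descriptor from 16x16 gradient samples."""
    descriptor = [0.0] * 128
    for y in range(16):
        for x in range(16):
            magnitude = math.hypot(dx[y][x], dy[y][x])
            angle = math.degrees(math.atan2(dy[y][x], dx[y][x]))
            relative = (angle - keypoint_angle_deg) % 360.0
            orientation_bin = int(relative // 45.0) % 8   # 8 orientations
            cell = (y // 4) * 4 + (x // 4)                # 4x4 spatial cells
            descriptor[cell * 8 + orientation_bin] += magnitude
    descriptor = normalize(descriptor)
    # Clamp large entries, then renormalize, to reduce the influence of
    # a few very strong gradients.
    return normalize([min(v, clamp) for v in descriptor])
```

With a uniform gradient field, each of the 16 cells contributes one non-zero bin, so the result has 16 equal entries of 0.25.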
16. 3D object recognition example from paper
17. SIFT Review
- Generates image features, keypoints
- invariant to image scaling and rotation
- partially invariant to change in illumination and 3D camera viewpoint
- many can be extracted from typical images
- Each keypoint has an associated descriptor that is
  - relative to keypoint orientation and scale
  - robust to small affine transformations.
18. SIFT Review
- Note
  - We can skip the keypoint detection.
  - Pick a grid over the image and make a descriptor for each grid point.
  - Fei-Fei and Perona (CVPR 2005) showed this works better for scene classification.
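Replacing keypoint detection with a regular grid is simple to sketch. The defaults below (16x16 patches, 8-pixel spacing) follow the settings reported in the Lazebnik et al. paper and are assumptions as far as these slides are concerned. The function name is hypothetical.

```python
def dense_grid(width, height, step=8, patch=16):
    """Return (x, y) patch centers such that every patch fits in the image."""
    half = patch // 2
    return [
        (x, y)
        for y in range(half, height - half + 1, step)
        for x in range(half, width - half + 1, step)
    ]
```

A descriptor would then be computed at each returned center, at a fixed scale and orientation.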
19. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006)
Many slides borrowed from http://www.ima.umn.edu/2005-2006/W5.22-26.06/activities/Lazebnik-Svetlana/ima_poster.pdf and http://people.csail.mit.edu/kgrauman/slides/pyr_match_iccv2005.ppt
20. Overview
- Adds approximate global geometric correspondence to bag-of-features techniques for scene recognition.
- Spatial pyramid matching partitions the image into multiscale subregions and computes feature histograms in each.
- Uses weak features (oriented edges at multiple scales) and strong features (a vocabulary formed from gridded SIFT descriptors).
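How a vocabulary turns descriptors into a bag-of-features histogram can be sketched as below. The sketch assumes the vocabulary (a list of cluster centers, for example from k-means) has already been built, and assigns each descriptor to its nearest center by squared Euclidean distance. Function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor."""
    def squared_distance(word):
        return sum((a - b) ** 2 for a, b in zip(descriptor, word))
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(vocabulary[i]))

def bag_of_features(descriptors, vocabulary):
    """Histogram of visual-word counts; spatial positions are discarded."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram
```

This orderless histogram is the representation that spatial pyramid matching extends with coarse location information.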
21. Motivation
- A pre-attentive approach: recognize the scene as a whole without examining its constituent objects.
22. Images as collections of features
- Image as an unordered set of d-dimensional feature vectors
- Varying number of vectors per instance
23. Classifiers (hand wavy)
- Training data: multiple images for each class
- Each image is represented by an unordered set of features
- We need some way to compare feature set X to feature set Y.
  - Some similarity function K(X,Y).
24. Classifiers (hand wavy)
- Nearest neighbor: given input X,
  - find the Y that maximizes K(X,Y) over all Y in the training set.
  - Label X with the class label of Y.
- SVM: use K(X,Y) as the kernel function
  - Inner product
  - Mercer kernel
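The nearest-neighbor rule can be written for any similarity function K. In this sketch, histogram intersection over fixed-length histograms stands in for K purely as an example, the training set is assumed to be a list of (features, label) pairs, and the function names are hypothetical.

```python
def histogram_intersection(x, y):
    """Example similarity K(X, Y): sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))

def nearest_neighbor_label(query, training_set, similarity):
    """Label of the training example Y that maximizes similarity(query, Y)."""
    best_label, best_score = None, float("-inf")
    for features, label in training_set:
        score = similarity(query, features)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, with training histograms `[3, 0, 1]` (forest) and `[0, 4, 1]` (street), the query `[2, 1, 1]` scores 3 against forest and 2 against street, so it is labeled forest.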
25. Partial matching
- Compare sets by computing a partial matching between their features.
26. Computing the partial matching
- Earth Mover's Distance
  - Rubner, Tomasi, Guibas 1998
- Hungarian method
  - Kuhn, 1955
- Greedy matching
- Pyramid match
  - Grauman and Darrell, ICCV 2005
for sets with features of dimension
27. Pyramid match overview
Pyramid match measures the similarity of a partial matching between two sets:
- Place a multi-dimensional, multi-resolution grid over the point sets.
- Consider points matched at the finest resolution where they fall into the same grid cell.
- Approximate the optimal similarity with the worst-case similarity within a pyramid cell.
No explicit search for matches!
28. Pyramid match overview
29. Pyramid Match
- d-dimensional feature vectors
- A sequence of grids at resolutions 0, ..., L
- At level l, the grid has $2^l$ cells along each dimension
[Figure: example grids for d = 2, L = 2]
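30. Pyramid match Kernel
- Matches at level l include matches at level l + 1
- New matches at level l: $I^l - I^{l+1}$ (for l = 0, ..., L-1), where $I^l$ is the number of matches (histogram intersection) at level l
- Penalize easy matches at larger scales with weight $\frac{1}{2^{L-l}}$
- Match kernel: $\kappa^L(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} (I^l - I^{l+1})$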
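The kernel can be sketched directly from its definition in the Lazebnik et al. paper: the matches at the finest level L count fully, and the new matches found at each coarser level l are weighted by 1/2^(L-l). The sketch assumes feature coordinates scaled to [0, 1) in every dimension and counts matches by histogram intersection. Function names are hypothetical.

```python
from collections import Counter

def grid_histogram(points, level):
    """Count points per cell of a grid with 2**level cells per dimension."""
    cells = 2 ** level
    return Counter(
        tuple(min(int(c * cells), cells - 1) for c in point)
        for point in points
    )

def intersection(hist_x, hist_y):
    """Number of matches: sum over cells of the smaller of the two counts."""
    return sum(min(count, hist_y[cell]) for cell, count in hist_x.items())

def pyramid_match(points_x, points_y, num_levels):
    """Pyramid match kernel with levels 0 (coarsest) to num_levels (finest)."""
    matches = [
        intersection(grid_histogram(points_x, l), grid_histogram(points_y, l))
        for l in range(num_levels + 1)
    ]
    kernel = float(matches[num_levels])
    for l in range(num_levels):
        new_matches = matches[l] - matches[l + 1]
        kernel += new_matches / 2 ** (num_levels - l)
    return kernel
```

For X = {(0.1, 0.1), (0.9, 0.9)}, Y = {(0.1, 0.1), (0.6, 0.6)} and L = 2, one pair matches at the finest level and the other pair first matches at level 1, giving 1 + 1/2 = 1.5.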
31. Vocabulary of M features
- Only features of the same type can be matched.
- Each channel m is treated separately.
32. Vocabulary of M features
33. Spatial pyramid representation
d = 2 (x, y)
M classes of features
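The spatial pyramid representation can be sketched as one long weighted histogram per image, so that the sum of per-channel pyramid matches becomes a single histogram intersection. The sketch assumes each feature is an (x, y, m) triple with image coordinates scaled to [0, 1) and channel index m in 0..M-1. The level weights (1/2^L at level 0, 1/2^(L-l+1) at level l >= 1) follow the Lazebnik et al. paper. Function names are hypothetical.

```python
def spatial_pyramid_vector(features, num_channels, num_levels):
    """Concatenate weighted per-cell, per-channel histograms over all levels."""
    vector = []
    for level in range(num_levels + 1):
        cells = 2 ** level
        if level == 0:
            weight = 1.0 / 2 ** num_levels
        else:
            weight = 1.0 / 2 ** (num_levels - level + 1)
        hist = [0.0] * (num_channels * cells * cells)
        for x, y, m in features:
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[(m * cells + cy) * cells + cx] += weight
        vector.extend(hist)
    return vector

def histogram_intersection(u, v):
    """Kernel value between two spatial pyramid vectors."""
    return sum(min(a, b) for a, b in zip(u, v))
```

The vector length is M * (4^(L+1) - 1) / 3, for example 4200 for M = 200 and L = 2. Features in different channels never share a bin, so they never match.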
34. Feature Extraction
35. Experimental Results
36. Scene Category Dataset
37. Scene Category Retrieval
38. Scene Category Confusion
39. Caltech 101
40. Caltech 101 Comparison
41. Caltech 101 Challenges
42. Graz