Title: SIFT
1SIFT
- Guest Lecture by Jiwon Kim
- http://www.cs.washington.edu/homes/jwkim/
2SIFT Features and Its Applications
3Autostitch Demo
4Autostitch
- Fully automatic panorama generation
- Input: set of images
- Output: panorama(s)
- Uses SIFT (Scale-Invariant Feature Transform) to find/align images
51. Solve for homography
61. Solve for homography
71. Solve for homography
82. Find connected sets of images
92. Find connected sets of images
102. Find connected sets of images
113. Solve for camera parameters
- New images initialised with rotation, focal length of best matching image
123. Solve for camera parameters
- New images initialised with rotation, focal length of best matching image
134. Blending the panorama
- Burt & Adelson 1983
- Blend frequency bands over range ∝ λ
142-band Blending
Low frequency (λ > 2 pixels)
High frequency (λ < 2 pixels)
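A minimal 1-D sketch of the 2-band idea, under stated assumptions: a box blur stands in for the low-pass filter, the blend weight is a plain linear ramp, and the names `box_blur`, `two_band_blend` and `radius` are hypothetical. Low frequencies are mixed smoothly across the overlap, while high-frequency detail is taken from whichever image has the larger weight.

```python
def box_blur(signal, radius):
    """Low-pass filter: average over a window of up to 2*radius+1 samples."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def two_band_blend(left, right, radius=2):
    """Blend two equally long, aligned 1-D signals (length >= 2) in two bands."""
    n = len(left)
    # Weight of the left signal: 1 at the left edge, 0 at the right edge.
    w_left = [1.0 - i / (n - 1) for i in range(n)]
    low_l, low_r = box_blur(left, radius), box_blur(right, radius)
    blended = []
    for i in range(n):
        # Low band: smooth linear blend.
        low = w_left[i] * low_l[i] + (1.0 - w_left[i]) * low_r[i]
        # High band: hard switch to the signal with the larger weight.
        if w_left[i] >= 0.5:
            high = left[i] - low_l[i]
        else:
            high = right[i] - low_r[i]
        blended.append(low + high)
    return blended
```

Blending a signal with itself returns the signal unchanged, which is a quick sanity check that the two bands sum back to the input.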
15Linear Blending
162-band Blending
17So, what is SIFT?
- Scale-Invariant Feature Transform
- David Lowe at UBC
- Scale/rotation invariant
- Currently best known feature descriptor
- Many real-world applications
- Object recognition
- Panorama stitching
- Robot localization
- Video indexing
18Example: object recognition
19SIFT properties
- Locality: features are local, so robust to occlusion and clutter
- Distinctiveness: individual features can be matched to a large database of objects
- Quantity: many features can be generated for even small objects
- Efficiency: close to real-time performance
20SIFT algorithm overview
- Feature detection
- Detect points that can be repeatably selected under location/scale change
- Feature description
- Assign orientation to detected feature points
- Construct a descriptor for image patch around each feature point
- Feature matching
211. Feature detection
- Detect points stable under location/scale change
- Build continuous space (x, y, scale)
- Approximated by multi-scale Difference-of-Gaussian pyramid
- Select maxima/minima in (x, y, scale)
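A minimal sketch of the maxima/minima selection, assuming the Difference-of-Gaussian stack for one octave has already been computed and is stored as nested lists indexed `dog[scale][row][col]` (a hypothetical layout). Each interior sample is compared with its 26 neighbours in the surrounding 3x3x3 cube, as in Lowe's paper; the function names are illustrative only.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller than
    all 26 neighbours in the 3x3x3 cube around it."""
    centre = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return centre > max(neighbours) or centre < min(neighbours)


def find_extrema(dog):
    """Scan all interior samples and return (x, y, scale_index) candidates."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_scale_space_extremum(dog, s, y, x):
                    found.append((x, y, s))
    return found
```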
221. Feature detection
231. Feature detection
- Localize extrema by fitting a quadratic
- Sub-pixel/sub-scale interpolation using Taylor expansion
- Take derivative and set to zero
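The same idea can be sketched in one dimension, as a simplification: the full method fits a quadratic jointly in (x, y, scale), whereas this hypothetical helper `refine_extremum_1d` refines along a single axis from three neighbouring samples. It uses finite differences for the first and second derivative, then sets the derivative of the fitted quadratic to zero.

```python
def refine_extremum_1d(d_prev, d_curr, d_next):
    """Fit a quadratic through samples at offsets -1, 0, +1 and return
    (offset of the extremum from the centre sample, interpolated value)."""
    first = 0.5 * (d_next - d_prev)            # finite-difference D'
    second = d_next - 2.0 * d_curr + d_prev    # finite-difference D''
    if second == 0.0:
        return 0.0, d_curr                     # flat: no refinement possible
    offset = -first / second                   # solve D' + D'' * offset = 0
    value = d_curr + 0.5 * first * offset      # quadratic evaluated at offset
    return offset, value
```

For samples of 5 - (t - 0.25)^2 taken at t = -1, 0, 1, the fit recovers the true peak at offset 0.25 with value 5.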
241. Feature detection
- Discard low-contrast/edge points
- Low contrast: discard keypoints with peak value < threshold
- Edge points: high contrast in one direction, low in the other → compute principal curvatures from eigenvalues of 2x2 Hessian matrix, and limit ratio
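A small sketch of both rejection tests, with assumptions spelled out: the default thresholds (0.03 for contrast with pixel values in [0, 1], and a curvature ratio r = 10) are the values suggested in Lowe's 2004 paper rather than anything stated on the slide, and the function names are hypothetical. The edge test avoids computing eigenvalues explicitly by comparing trace squared over determinant with (r + 1)^2 / r.

```python
def passes_contrast_test(peak_value, threshold=0.03):
    """Keep a keypoint only if its interpolated DoG value is large enough."""
    return abs(peak_value) >= threshold


def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if the ratio of principal curvatures is below r.

    dxx, dyy, dxy are the entries of the 2x2 Hessian of the DoG image.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures have opposite signs (or one is zero): not a stable extremum.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r
```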
251. Feature detection
- (a) 233x189 image
- (b) 832 DOG extrema
- (c) 729 left after peak value threshold
- (d) 536 left after testing ratio of principal curvatures
262. Feature description
- Assign orientation to keypoints
- Create histogram of local gradient directions computed at selected scale
- Assign canonical orientation at peak of smoothed histogram
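A minimal sketch of the orientation assignment, with several simplifications labelled as assumptions: gradients are passed in as plain (dx, dy) pairs, the Gaussian weighting by distance from the keypoint is omitted, the 36-bin histogram follows Lowe's paper, and the [0.25, 0.5, 0.25] circular smoothing kernel is an illustrative choice. The function name `dominant_orientation` is hypothetical.

```python
import math


def dominant_orientation(gradients, num_bins=36):
    """Return the centre (in degrees) of the peak bin of a smoothed,
    magnitude-weighted histogram of gradient directions."""
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for dx, dy in gradients:
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        hist[int(angle // bin_width) % num_bins] += magnitude
    # Circular smoothing; hist[i - 1] wraps around for i == 0.
    smoothed = [
        0.25 * hist[i - 1] + 0.5 * hist[i] + 0.25 * hist[(i + 1) % num_bins]
        for i in range(num_bins)
    ]
    peak = max(range(num_bins), key=smoothed.__getitem__)
    return (peak + 0.5) * bin_width
```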
272. Feature description
- Construct SIFT descriptor
- Create array of orientation histograms
- 8 orientations x 4x4 histogram array = 128 dimensions
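To make the 8 x 4x4 = 128 layout concrete, here is a simplified sketch. It assumes a 16x16 patch of precomputed gradients (already rotated to the canonical orientation) given as nested lists, and it leaves out the Gaussian weighting, interpolation between bins, and value clamping of the full method. The name `sift_like_descriptor` is hypothetical.

```python
import math


def sift_like_descriptor(grad_x, grad_y):
    """Build a 128-D vector from 16x16 gradient components:
    a 4x4 grid of cells, each holding an 8-bin orientation histogram."""
    cells, bins, cell_size = 4, 8, 4
    desc = [0.0] * (cells * cells * bins)
    for row in range(cells * cell_size):
        for col in range(cells * cell_size):
            dx, dy = grad_x[row][col], grad_y[row][col]
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            b = int(angle / (2.0 * math.pi) * bins) % bins
            cell = (row // cell_size) * cells + (col // cell_size)
            desc[cell * bins + b] += math.hypot(dx, dy)
    # Normalise to unit length to reduce sensitivity to contrast changes.
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0.0 else desc
```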
282. Feature description
- Advantage over simple correlation
- Gradients less sensitive to illumination change
- Gradients may shift: robust to deformation, viewpoint change
29Performance: stability to noise
- Match features after random change in image scale & orientation, with differing levels of image noise
- Find nearest neighbor in database of 30,000 features
30Performance: stability to affine change
- Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
- Find nearest neighbor in database of 30,000 features
31Performance: distinctiveness
- Vary size of database of features, with 30 degree affine change, 2% image noise
- Measure % correct for single nearest neighbor match
323. Feature matching
- For each feature in A, find nearest neighbor in B
A
B
333. Feature matching
- Nearest neighbor search too slow for large database of 128-dimensional data
- Approximate nearest neighbor search
- Best-bin-first [Beis et al. 97]: modification to k-d tree algorithm
- Use heap data structure to identify bins in order by their distance from query point
- Result: Can give speedup by factor of 1000 while finding nearest neighbor (of interest) 95% of the time
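The heap-ordered search can be sketched as follows. This is a simplified rendition for illustration, not the implementation of Beis et al.: the tree is a plain dictionary-based k-d tree, and `max_checks` (the cap on examined points that makes the search approximate) is a hypothetical parameter name. Unexplored branches go on a heap keyed by a lower bound on their distance from the query, so the closest bins are visited first.

```python
import heapq
import itertools


def build_kd_tree(points, depth=0):
    """Build a k-d tree from a list of equal-length tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kd_tree(points[:mid], depth + 1),
        "right": build_kd_tree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_bin_first(tree, query, max_checks=200):
    """Return (nearest point found, squared distance) after examining
    roughly max_checks points, visiting bins in order of distance."""
    best_point, best_d2 = None, float("inf")
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(0.0, next(counter), tree)]
    checks = 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d2:
            break  # no remaining bin can hold a closer point
        while node is not None:
            d2 = sq_dist(node["point"], query)
            checks += 1
            if d2 < best_d2:
                best_point, best_d2 = node["point"], d2
            diff = query[node["axis"]] - node["point"][node["axis"]]
            near, far = ("left", "right") if diff < 0 else ("right", "left")
            if node[far] is not None:
                # The far side is at least |diff| away along the split axis.
                far_bound = max(bound, diff * diff)
                heapq.heappush(heap, (far_bound, next(counter), node[far]))
            node = node[near]
    return best_point, best_d2
```

With a generous `max_checks` the search is exact; lowering it trades accuracy for speed.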
343. Feature matching
- Reject false matches
- Compare distance of nearest neighbor to second nearest neighbor
- Common features aren't distinctive, therefore bad
- Threshold of 0.8 provides excellent separation
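A brute-force sketch of the distance-ratio test, assuming descriptors are plain tuples of floats and using Euclidean distance. The function name `match_with_ratio_test` and the parameter name `max_ratio` are hypothetical; the 0.8 default is the threshold quoted above.

```python
import math


def match_with_ratio_test(features_a, features_b, max_ratio=0.8):
    """Return (index_in_a, index_in_b) pairs whose nearest neighbour is
    clearly closer than the second nearest neighbour."""
    matches = []
    for i, fa in enumerate(features_a):
        dists = sorted((math.dist(fa, fb), j) for j, fb in enumerate(features_b))
        if len(dists) < 2:
            continue  # the ratio is undefined without a second neighbour
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < max_ratio * d2:
            matches.append((i, j1))
    return matches
```

A feature that sits equally close to two candidates is rejected, since its nearest-to-second-nearest ratio is close to 1.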
353. Feature matching
- Now, given feature matches
- Find an object in the scene
- Solve for homography (panorama)
363. Feature matching
- Example: 3D object recognition
373. Feature matching
- 3D object recognition
- Assume affine transform: clusters of size >= 3
- Looking for 3 matches out of 3000 that agree on same object and pose: too many outliers for RANSAC or LMS
- Use Hough Transform
- Each match votes for a hypothesis for object ID/pose
- Voting for multiple bins & large bin size allow for error due to similarity approximation
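A coarse sketch of the voting step, under assumptions: each match has already been converted to a pose hypothesis (object ID, x, y, scale, angle), each hypothesis votes into a single bin only (voting into neighbouring bins is omitted for brevity), and the location bin size of 64 pixels is an arbitrary illustrative value. The 30 degree orientation bins and factor-of-2 scale bins follow Lowe's paper; the function name `hough_clusters` is hypothetical.

```python
import math
from collections import defaultdict


def hough_clusters(matches, loc_bin=64.0, angle_bin=30.0, min_votes=3):
    """matches: list of (object_id, x, y, scale, angle_deg) hypotheses.
    Returns lists of match indices that fall in the same pose bin."""
    bins = defaultdict(list)
    for idx, (obj, x, y, scale, angle) in enumerate(matches):
        key = (
            obj,
            math.floor(x / loc_bin),
            math.floor(y / loc_bin),
            math.floor(math.log2(scale)),             # factor-of-2 scale bins
            math.floor((angle % 360.0) / angle_bin),
        )
        bins[key].append(idx)
    return [members for members in bins.values() if len(members) >= min_votes]
```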
383. Feature matching
- 3D object recognition: solve for pose
- Affine transform of (x, y) to (u, v)
- Rewrite to solve for transform parameters
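The equations on this slide did not survive extraction. As a hedged reconstruction following the formulation in Lowe's 2004 paper (the symbols m_1..m_4, t_x, t_y are taken from that paper, not from the slide), the affine model and its rewritten least-squares form can be written as:

```latex
% Affine transform of a model point (x, y) to an image point (u, v)
\begin{bmatrix} u \\ v \end{bmatrix}
=
\begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \end{bmatrix}

% Rewritten so the unknown parameters form a vector; each match adds two rows
\begin{bmatrix}
x & y & 0 & 0 & 1 & 0 \\
0 & 0 & x & y & 0 & 1 \\
  &   & \vdots &  &  &
\end{bmatrix}
\begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix}
=
\begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}

% With the system written as A p = b, the least-squares solution is
p = (A^{\top} A)^{-1} A^{\top} b
```

Each match contributes two equations and there are six unknowns, so at least 3 matches are needed to solve for the pose.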
393. Feature matching
- 3D object recognition: verify model
- Discard outliers for pose solution in previous step
- Perform top-down check for additional features
- Evaluate probability that match is correct
- Use Bayesian model, with probability that features would arise by chance if object was not present
- Takes account of object size in image, textured regions, model feature count in database, accuracy of fit [Lowe 01]
40Planar recognition
41Planar recognition
- Reliably recognized at a rotation of 60° away from the camera
- Affine fit approximates perspective projection
- Only 3 points are needed for recognition
423D object recognition
433D object recognition
- Only 3 keys are needed for recognition, so extra keys provide robustness
- Affine model is no longer as accurate
44Recognition under occlusion
45Illumination invariance
46Applications of SIFT
- Object recognition
- Panoramic image stitching
- Robot localization
- Video indexing
- The Office of the Past
- Document tracking and recognition
47Location recognition
48Robot Localization
49Map continuously built over time
50Locations of map features in 3D
51- Sony Aibo
- SIFT usage
- Recognize charging station
- Communicate with visual cards
- Teach object recognition
52The Office of the Past
53Unify physical and electronic desktops
Video camera
- Recognize video of paper on physical desktop
- Tracking
- Recognition
- Linking
Desktop
54Unify physical and electronic desktops
Video camera
- Applications
- Find lost documents
- Browse remote desktop
- Find electronic version
- History-based queries
Desktop
55Example input video
56Demo: Remote desktop
57System overview
Video camera
Computer
User
Desk
58System overview
Video of desk
59System overview
Images from PDF
Video of desk
60System overview
Images from PDF
Video of desk
Track & recognize
61System overview
Internal representation
Images from PDF
Video of desk
Track & recognize
T
T1
62System overview
Internal representation
Images from PDF
Video of desk
Track & recognize
T
T1
Scene Graph
63System overview
Where is my W-2?
Internal representation
Images from PDF
Video of desk
Track & recognize
T
T1
64System overview
Where is my W-2?
Answer
Internal representation
Images from PDF
Video of desk
Track & recognize
Desk
Desk
T
T1
65Assumptions
- Document
- Corresponding electronic copy exists
- No duplicates of same document
66Assumptions
- Document
- Corresponding electronic copy exists
- No duplicates of same document
- Motion
- 3 event types: move/entry/exit
- One document at a time
- Only topmost document can move
67Non-assumptions
- Desk need not be initially empty
68Non-assumptions
- Desk need not be initially empty
- Stacks may overlap
69Algorithm overview
Input Frames
70Algorithm overview
Input Frames
Event Detection
before
after
71Algorithm overview
Input Frames
Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
72Algorithm overview
Input Frames
Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
File1.pdf
Document Recognition
File2.pdf
File3.pdf
73Algorithm overview
Input Frames
Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
File1.pdf
Document Recognition
File2.pdf
File3.pdf
Scene Graph Update
Desk
Desk
74Algorithm overview
Input Frames
Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
SIFT
File1.pdf
Document Recognition
File2.pdf
File3.pdf
Scene Graph Update
Desk
Desk
75Document tracking example
before
after
76Document tracking example
before
after
77Document tracking example
before
after
78Document tracking example
before
after
79Document tracking example
before
after
80Document tracking example
before
after
81Document tracking example
before
after
82Document tracking example
before
after
83Document tracking example
before
after
84Document tracking example
Motion (x, y, θ)
before
after
85Document Recognition
- Match against PDF image database
File2.pdf
File3.pdf
File4.pdf
File5.pdf
File6.pdf
File1.pdf
86Document Recognition
- Performance analysis
- Tested 20 pages against database of 162 pages
87Document Recognition
- Performance analysis
- Tested 20 pages against database of 162 pages
- 200x300 pixels per document for reliable match
Plot: recognition rate vs. document resolution
88Document Recognition
- Performance analysis
- Tested 20 pages against database of 162 pages
- 200x300 pixels per document for reliable match
Plot: recognition rate vs. document resolution (marked values: 0.9 on the recognition rate axis, 300 on the document resolution axis)
89Results
- Input video
- 40 minutes
- 1024x768 @ 15 fps
- 22 documents, 49 events
- Running time
- Video processed offline
- No optimization
- A few hours for entire video
90Demo: Paper tracking
91Photo sorting example
92Photo sorting example
93Demo: Photo sorting
94Future work
- Enhance realism
- Handle more realistic desktops
- Real-time performance
- More applications
- Support other document tasks
- E.g., attach reminder, cluster documents
- Beyond documents
- Other 3D desktop objects, books/CDs
95Summary
- SIFT is
- Scale/rotation invariant local feature
- Highly distinctive
- Robust to occlusion, illumination change, 3D viewpoint change
- Efficient (close to real-time performance)
- Suitable for many useful applications
96References
- Distinctive image features from scale-invariant keypoints
- David G. Lowe, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
- Recognising panoramas
- Matthew Brown and David G. Lowe, International Conference on Computer Vision (ICCV 2003), Nice, France (October 2003), pp. 1218-25.
- Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops
- Jiwon Kim, Steven M. Seitz and Maneesh Agrawala, ACM Symposium on User Interface Software and Technology (UIST 2004), pp. 99-107.