SIFT - PowerPoint PPT Presentation

About This Presentation

Title:

SIFT

Slides: 97

Provided by: jwk72

Learn more at: https://courses.cs.washington.edu

Transcript and Presenter's Notes

Title: SIFT

1
SIFT

Guest Lecture by Jiwon Kim
http://www.cs.washington.edu/homes/jwkim/

2
SIFT Features and Its Applications
3
Autostitch Demo
4
Autostitch

Fully automatic panorama generation
Input: set of images
Output: panorama(s)
Uses SIFT (Scale-Invariant Feature Transform) to
find/align images

5
1. Solve for homography
6
1. Solve for homography
7
1. Solve for homography
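
Note: the homography step can be illustrated with a minimal sketch. It solves for the 3x3 matrix from exactly four point correspondences by fixing the last entry to 1 and solving the resulting 8x8 linear system. The function names are hypothetical, and a real stitcher would estimate the homography robustly from many SIFT matches rather than from four hand-picked points.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4(src, dst):
    """Return 3x3 H (with h33 fixed to 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(H, p):
    """Map point p through H, including the perspective divide."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Usage: `H = homography_from_4(src, dst)` followed by `apply_h(H, src[i])` should reproduce `dst[i]` up to floating-point error.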
8
2. Find connected sets of images
9
2. Find connected sets of images
10
2. Find connected sets of images
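
Note: one way to picture this step is as connected components of a graph whose nodes are images and whose edges are image pairs with a verified match. The sketch below uses union-find. The input format (image indices and a list of matched pairs) is an assumption made for illustration, not something stated on the slides.

```python
def connected_sets(num_images, matched_pairs):
    """Group image indices into sets linked by pairwise matches (union-find)."""
    parent = list(range(num_images))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(num_images):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group would then become one output panorama.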
11
3. Solve for camera parameters

New images initialised with rotation, focal
length of best matching image

12
3. Solve for camera parameters

New images initialised with rotation, focal
length of best matching image

13
4. Blending the panorama

Burt & Adelson 1983
Blend frequency bands over range ∝ λ

14
2-band Blending
Low frequency (λ > 2 pixels)
High frequency (λ < 2 pixels)
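
Note: a rough 1-D sketch of the two-band idea, under stated assumptions. A moving average stands in for the low-pass filter, the low band is mixed with smooth weights, and the high band is taken from whichever signal has the larger weight. The function names, the `radius` parameter and the hard switch at weight 0.5 are illustrative choices, not the exact filter used in Autostitch.

```python
def box_blur(signal, radius):
    """Low-pass filter: moving average with edge clamping."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def two_band_blend(a, b, weight_a, radius=2):
    """Blend two aligned 1-D signals; weight_a[i] in [0, 1] is the weight of `a`."""
    low_a, low_b = box_blur(a, radius), box_blur(b, radius)
    out = []
    for i in range(len(a)):
        high_a, high_b = a[i] - low_a[i], b[i] - low_b[i]
        w = weight_a[i]
        low = w * low_a[i] + (1 - w) * low_b[i]   # smooth transition for low band
        high = high_a if w >= 0.5 else high_b      # hard switch keeps detail sharp
        out.append(low + high)
    return out
```

Blending the low band over a wide range hides exposure differences, while switching the high band abruptly avoids ghosting of fine detail.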
15
Linear Blending
16
2-band Blending
17
So, what is SIFT?

Scale-Invariant Feature Transform
David Lowe at UBC
Scale/rotation invariant
Currently best known feature descriptor
Many real-world applications
Object recognition
Panorama stitching
Robot localization
Video indexing

18
Example: object recognition
19
SIFT properties

Locality: features are local, so robust to
occlusion and clutter
Distinctiveness: individual features can be
matched to a large database of objects
Quantity: many features can be generated for even
small objects
Efficiency: close to real-time performance

20
SIFT algorithm overview

Feature detection
Detect points that can be repeatably selected
under location/scale change
Feature description
Assign orientation to detected feature points
Construct a descriptor for image patch around
each feature point
Feature matching

21
1. Feature detection

Detect points stable under location/scale change
Build continuous space (x, y, scale)
Approximated by multi-scale Difference-of-Gaussian
pyramid
Select maxima/minima in (x, y, scale)
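
Note: the extremum selection can be sketched as a comparison of each sample against its 26 neighbours in (x, y, scale). The sketch assumes the Difference-of-Gaussian stack has already been computed and is passed as nested lists `dog[s][y][x]`. That layout and the function name are assumptions made for illustration.

```python
def local_extrema(dog):
    """Return (x, y, s) of samples that are strict maxima or minima
    among their 26 neighbours in a DoG stack dog[s][y][x]."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                v = dog[s][y][x]
                neighbours = [dog[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1)
                              for dy in (-1, 0, 1)
                              for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if v > max(neighbours) or v < min(neighbours):
                    found.append((x, y, s))
    return found
```

Border samples and the first and last scale are skipped because they lack a full neighbourhood.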

22
1. Feature detection
23
1. Feature detection

Localize extrema by fitting a quadratic
Sub-pixel/sub-scale interpolation using Taylor
expansion
Take derivative and set to zero
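
Note: a one-dimensional sketch of the quadratic fit. The Taylor expansion D(x) ≈ D + D'x + ½D''x² is differentiated and set to zero, giving the offset x̂ = -D'/D''. The derivatives come from finite differences of three neighbouring samples. The full method solves the same problem jointly in (x, y, scale); the 1-D reduction and the function name are simplifications for illustration.

```python
def refine_1d(f_prev, f_curr, f_next):
    """Fit a parabola through three equally spaced samples and return
    (offset of the extremum from the centre sample, interpolated value)."""
    d1 = (f_next - f_prev) / 2.0            # first derivative (central difference)
    d2 = f_next - 2.0 * f_curr + f_prev     # second derivative
    offset = -d1 / d2                       # derivative of the Taylor expansion = 0
    value = f_curr + 0.5 * d1 * offset      # function value at the refined extremum
    return offset, value
```

For samples taken from an exact parabola the recovered offset and peak value are exact, which makes the routine easy to check.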

24
1. Feature detection

Discard low-contrast/edge points
Low contrast: discard keypoints with contrast <
threshold
Edge points: high contrast in one direction, low
in the other → compute principal curvatures from
eigenvalues of 2x2 Hessian matrix, and limit ratio
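
Note: the curvature-ratio check does not need the eigenvalues explicitly. With trace Tr and determinant Det of the 2x2 Hessian, the ratio of the eigenvalues is below r exactly when Tr²/Det < (r+1)²/r. The sketch below applies that test. The default r = 10 is the value used in Lowe's 2004 paper, not a number given on the slide.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if the ratio of principal curvatures is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a flat direction): not a stable peak.
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

A blob-like point has two similar curvatures and passes, while a point on an edge has one curvature much larger than the other and is rejected.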

25
1. Feature detection

Example

(a) 233x189 image
(b) 832 DOG extrema
(c) 729 left after peak
value threshold
(d) 536 left after testing
ratio of principal
curvatures

26
2. Feature description

Assign orientation to keypoints

Create histogram of local gradient directions
computed at selected scale
Assign canonical orientation at peak of smoothed
histogram
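
Note: a minimal sketch of orientation assignment. It assumes the gradients around the keypoint have already been sampled at the selected scale and are passed as (gx, gy) pairs. The 36-bin histogram follows Lowe's paper; the [1, 2, 1] smoothing kernel and the omission of Gaussian weighting are simplifications chosen here.

```python
import math


def dominant_orientation(gradients, num_bins=36):
    """Return the centre (in degrees) of the peak bin of a smoothed,
    magnitude-weighted histogram of gradient directions."""
    hist = [0.0] * num_bins
    width = 360.0 / num_bins
    for gx, gy in gradients:
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        hist[int(angle // width) % num_bins] += math.hypot(gx, gy)
    # Circular [1, 2, 1] / 4 smoothing; index -1 wraps around in Python.
    smooth = [(hist[i - 1] + 2.0 * hist[i] + hist[(i + 1) % num_bins]) / 4.0
              for i in range(num_bins)]
    peak = max(range(num_bins), key=smooth.__getitem__)
    return (peak + 0.5) * width
```

Rotating the descriptor's coordinate frame by this angle is what makes the later descriptor invariant to image rotation.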

27
2. Feature description

Construct SIFT descriptor
Create array of orientation histograms
8 orientations x 4x4 histogram array = 128
dimensions
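
Note: the 4x4x8 layout can be sketched as follows. The sketch assumes a 16x16 patch of gradients `grad[y][x] = (gx, gy)` that has already been rotated to the keypoint orientation. It omits the Gaussian weighting, trilinear interpolation and value clamping of the full method, so it is a simplified illustration rather than a complete SIFT descriptor.

```python
import math


def sift_like_descriptor(grad):
    """Build a unit-length 128-vector: 4x4 spatial cells x 8 orientation bins."""
    desc = [0.0] * (4 * 4 * 8)
    for y in range(16):
        for x in range(16):
            gx, gy = grad[y][x]
            angle = math.atan2(gy, gx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * 8) % 8   # orientation bin, 45 degrees wide
            cell = (y // 4) * 4 + (x // 4)            # which 4x4-pixel cell
            desc[cell * 8 + o] += math.hypot(gx, gy)  # magnitude-weighted vote
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]
```

Normalising to unit length removes the effect of a uniform change in image contrast.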

28
2. Feature description

Advantage over simple correlation
Gradients less sensitive to illumination change
Gradients may shift: robust to deformation,
viewpoint change

29
Performance: stability to noise

Match features after random change in image scale
& orientation, with differing levels of image
noise
Find nearest neighbor in database of 30,000
features

30
Performance: stability to affine change

Match features after random change in image scale
& orientation, with 2% image noise, and affine
distortion
Find nearest neighbor in database of 30,000
features

31
Performance: distinctiveness

Vary size of database of features, with 30 degree
affine change, 2% image noise
Measure % correct for single nearest neighbor
match

32
3. Feature matching

For each feature in A, find nearest neighbor in B

A
B
33
3. Feature matching

Nearest neighbor search too slow for large
database of 128-dimensional data
Approximate nearest neighbor search
Best-bin-first [Beis et al. 97]: modification to
k-d tree algorithm
Use heap data structure to identify bins in order
by their distance from query point
Result: can give speedup by factor of 1000 while
finding nearest neighbor (of interest) 95% of the
time
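
Note: a compact sketch of the best-bin-first idea. A k-d tree is searched while unexplored branches wait in a heap keyed by a lower bound on their distance from the query, and the search stops after a fixed number of node checks. The tuple-based tree, the function names and the `max_checks` budget are assumptions made for illustration, not the implementation of Beis et al.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Return a nested tuple (point, axis, left, right), or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def bbf_nearest(tree, query, max_checks=200):
    """Approximate nearest neighbour; exact when max_checks is large enough."""
    best, best_dist = None, math.inf
    heap = [(0.0, 0, tree)]  # (distance lower bound, tie-breaker, subtree)
    counter, checks = 1, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_dist:
            break  # no remaining branch can contain a closer point
        while node is not None:
            point, axis, left, right = node
            checks += 1
            d = math.dist(point, query)
            if d < best_dist:
                best, best_dist = point, d
            diff = query[axis] - point[axis]
            near, far = (left, right) if diff < 0 else (right, left)
            if far is not None:
                # The far side is at least |diff| away from the query.
                heapq.heappush(heap, (max(bound, abs(diff)), counter, far))
                counter += 1
            node = near
    return best, best_dist
```

Lowering `max_checks` trades accuracy for speed, which is the trade-off the slide's speedup figure refers to.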

34
3. Feature matching

Reject false matches
Compare distance of nearest neighbor to second
nearest neighbor
Common features aren't distinctive, therefore bad
Threshold of 0.8 provides excellent separation
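
Note: the rejection rule can be sketched with a brute-force search. For each feature in A, the two closest features in B are found and the match is kept only if the nearest distance is below 0.8 times the second-nearest distance. Descriptors are plain tuples of floats here and the function name is hypothetical.

```python
import math


def ratio_test_matches(features_a, features_b, max_ratio=0.8):
    """Return (index_in_a, index_in_b) pairs that pass the distance-ratio test."""
    matches = []
    for i, fa in enumerate(features_a):
        ranked = sorted((math.dist(fa, fb), j) for j, fb in enumerate(features_b))
        if len(ranked) < 2:
            continue  # the ratio needs a second neighbour
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < max_ratio * d2:
            matches.append((i, j1))
    return matches
```

A feature that is almost equally close to two candidates is ambiguous, so it is dropped instead of being matched to either one.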

35
3. Feature matching

Now, given feature matches
Find an object in the scene
Solve for homography (panorama)

36
3. Feature matching

Example: 3D object recognition

37
3. Feature matching

3D object recognition
Assume affine transform: clusters of size >= 3
Looking for 3 matches out of 3000 that agree on
same object and pose: too many outliers for
RANSAC or LMS
Use Hough Transform
Each match votes for a hypothesis for object
ID/pose
Voting for multiple bins, large bin size allow
for error due to similarity approximation
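
Note: a sketch of the voting step under stated assumptions. Each match is reduced to an object ID plus a 4-parameter pose (x, y, log2 scale, angle), and it votes for the two closest bins in every dimension, so 16 bins in total. The pose encoding, the bin sizes passed by the caller and the function names are illustrative, and angle wrap-around at 360 degrees is ignored for brevity.

```python
import itertools
import math
from collections import defaultdict


def two_closest_bins(value, bin_size):
    """Indices of the bin containing `value` and of its nearer neighbour."""
    pos = value / bin_size
    base = math.floor(pos)
    neighbour = base + 1 if pos - base >= 0.5 else base - 1
    return (base, neighbour)


def hough_clusters(matches, bin_sizes, min_votes=3):
    """matches: list of (object_id, pose), pose = (x, y, log2_scale, angle_deg).
    Returns {bin_key: [match indices]} for bins with at least min_votes votes."""
    votes = defaultdict(list)
    for idx, (obj, pose) in enumerate(matches):
        options = [two_closest_bins(v, s) for v, s in zip(pose, bin_sizes)]
        for combo in itertools.product(*options):  # 2^4 = 16 bins per match
            votes[(obj, combo)].append(idx)
    return {k: v for k, v in votes.items() if len(v) >= min_votes}
```

Voting into neighbouring bins keeps matches that fall near a bin boundary from being split across bins and lost.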

38
3. Feature matching

3D object recognition: solve for pose
Affine transform of x,y to u,v
Rewrite to solve for transform parameters
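
Note: a least-squares sketch of the pose solve, assuming the affine model u = m1·x + m2·y + tx and v = m3·x + m4·y + ty. Because the u and v equations share the same design matrix, the six unknowns split into two 3x3 normal-equation systems, solved here with Cramer's rule. The parameter names and this decoupled formulation are choices made for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b with Cramer's rule."""
    d = det3(a)
    out = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        out.append(det3(m) / d)
    return out


def fit_affine(model_pts, image_pts):
    """Least-squares fit; needs at least 3 non-collinear correspondences.
    Returns (m1, m2, m3, m4, tx, ty)."""
    rows = [(x, y, 1.0) for x, y in model_pts]
    # Normal equations (A^T A) p = A^T b, shared by the u and v systems.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atu = [sum(r[i] * u for r, (u, _) in zip(rows, image_pts)) for i in range(3)]
    atv = [sum(r[i] * v for r, (_, v) in zip(rows, image_pts)) for i in range(3)]
    m1, m2, tx = solve3(ata, atu)
    m3, m4, ty = solve3(ata, atv)
    return (m1, m2, m3, m4, tx, ty)
```

Three correspondences determine the six parameters exactly, and additional matches over-determine the system so the fit averages out localisation error.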

39
3. Feature matching

3D object recognition: verify model
Discard outliers for pose solution in previous step
Perform top-down check for additional features
Evaluate probability that match is correct
Use Bayesian model, with probability that
features would arise by chance if object was not
present
Takes account of object size in image, textured
regions, model feature count in database,
accuracy of fit [Lowe 01]

40
Planar recognition

Training images

41
Planar recognition

Reliably recognized at a rotation of 60° away
from the camera
Affine fit approximates perspective projection
Only 3 points are needed for recognition

42
3D object recognition

Training images

43
3D object recognition

Only 3 keys are needed for recognition, so extra
keys provide robustness
Affine model is no longer as accurate

44
Recognition under occlusion
45
Illumination invariance
46
Applications of SIFT

Object recognition
Panoramic image stitching
Robot localization
Video indexing
The Office of the Past
Document tracking and recognition

47
Location recognition
48
Robot Localization
49
Map continuously built over time
50
Locations of map features in 3D
51

Sony Aibo
SIFT usage
Recognize
charging
station
Communicate
with visual
cards
Teach object
recognition

52
The Office of the Past

Paper everywhere

53
Unify physical and electronic desktops
Video camera

Recognize video of paper on physical desktop
Tracking
Recognition
Linking

Desktop
54
Unify physical and electronic desktops
Video camera

Applications
Find lost documents
Browse remote desktop
Find electronic version
History-based queries

Desktop
55
Example input video
56
Demo: Remote desktop
57
System overview
Video camera
Computer
User
Desk
58
System overview
Video of desk
59
System overview
Images from PDF
Video of desk
60
System overview
Images from PDF
Video of desk
Track & recognize
61
System overview
Internal representation
Images from PDF
Video of desk
Track & recognize
T
T1
62
System overview
Internal representation
Images from PDF
Video of desk
Track & recognize
T
T1
Scene Graph
63
System overview
Where is my W-2?
Internal representation
Images from PDF
Video of desk
Track & recognize
T
T1
64
System overview
Where is my W-2?
Answer
Internal representation
Images from PDF
Video of desk
Track & recognize
Desk
Desk
T
T1
65
Assumptions

Document
Corresponding electronic copy exists
No duplicates of same document

66
Assumptions

Document
Corresponding electronic copy exists
No duplicates of same document
Motion
3 event types: move/entry/exit
One document at a time
Only topmost document can move

67
Non-assumptions

Desk need not be initially empty

68
Non-assumptions

Desk need not be initially empty
Stacks may overlap

69
Algorithm overview
Input Frames

70
Algorithm overview
Input Frames

Event Detection
before
after
71
Algorithm overview
Input Frames

Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
72
Algorithm overview
Input Frames

Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
File1.pdf
Document Recognition
File2.pdf
File3.pdf
73
Algorithm overview
Input Frames

Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
File1.pdf
Document Recognition
File2.pdf
File3.pdf
Scene Graph Update
Desk
Desk
74
Algorithm overview
Input Frames

Event Detection
before
after
Event Interpretation
A document moved from (x1,y1) to (x2,y2)
SIFT
File1.pdf
Document Recognition
File2.pdf
File3.pdf
Scene Graph Update
Desk
Desk
75
Document tracking example
before
after
76
Document tracking example
before
after
77
Document tracking example
before
after
78
Document tracking example
before
after
79
Document tracking example
before
after
80
Document tracking example
before
after
81
Document tracking example
before
after
82
Document tracking example
before
after
83
Document tracking example
before
after
84
Document tracking example
Motion (x, y, θ)
before
after
85
Document Recognition

Match against PDF image database

File2.pdf
File3.pdf
File4.pdf
File5.pdf
File6.pdf
File1.pdf
86
Document Recognition

Performance analysis
Tested 20 pages against database of 162 pages

87
Document Recognition

Performance analysis
Tested 20 pages against database of 162 pages
200x300 pixels per document for reliable match

Recognition Rate
Document Resolution
88
Document Recognition

Performance analysis
Tested 20 pages against database of 162 pages
200x300 pixels per document for reliable match

0.9
Recognition Rate
300
Document Resolution
89
Results

Input video
40 minutes
1024x768 @ 15 fps
22 documents, 49 events
Running time
Video processed offline
No optimization
A few hours for entire video

90
Demo: Paper tracking
91
Photo sorting example
92
Photo sorting example
93
Demo: Photo sorting
94
Future work

Enhance realism
Handle more realistic desktops
Real-time performance
More applications
Support other document tasks
E.g., attach reminder, cluster documents
Beyond documents
Other 3D desktop objects, books/CDs

95
Summary

SIFT is
Scale/rotation invariant local feature
Highly distinctive
Robust to occlusion, illumination change, 3D
viewpoint change
Efficient (real-time performance)
Suitable for many useful applications

96
References

Distinctive image features from scale-invariant
keypoints
David G. Lowe, International Journal of Computer
Vision, 60, 2 (2004), pp. 91-110
Recognising panoramas
Matthew Brown and David G. Lowe, International
Conference on Computer Vision (ICCV 2003), Nice,
France (October 2003), pp. 1218-25.
Video-Based Document Tracking: Unifying Your
Physical and Electronic Desktops
Jiwon Kim, Steven M. Seitz and Maneesh Agrawala,
ACM Symposium on User Interface Software and
Technology (UIST 2004), pp. 99-107.