Nearest Neighbor in High Dimensions


1
Near(est) Neighbor in High Dimensions
  • Alexandr Andoni
  • (slides by Piotr Indyk)

2
Nearest Neighbor
  • Given:
  • A set P of n points in R^d
  • Goal: build a data structure which, for any query
    q, returns a point p ∈ P minimizing ||p - q||

3
Solution for d=2 (sketch)
  • Compute Voronoi diagram
  • Given q, perform point location
  • Performance:
  • Space: O(n)
  • Query time: O(log n)
  • (see 6.838 for details)

4
NN in R^d
  • Exact algorithms use:
  • Either n^O(d) space,
  • Or O(dn) time
  • Approximate algorithms:
  • Space/time exponential in d [Arya-Mount-et al],
    [Kleinberg'97], [Har-Peled'02]
  • Space/time polynomial in d [Kushilevitz-Ostrovsky-Rabani'98],
    [Indyk-Motwani'98], [Indyk'98], ...
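
The O(dn)-time exact option is a plain linear scan over P. A minimal Python sketch, assuming points are tuples of floats and the distance is Euclidean; the name `nearest_neighbor` is illustrative and does not come from the slides:

```python
import math

def nearest_neighbor(points, q):
    """Exact nearest neighbor by linear scan: O(d*n) time, no preprocessing."""
    best, best_dist = None, math.inf
    for p in points:
        dist = math.dist(p, q)  # Euclidean distance, O(d) work per point
        if dist < best_dist:
            best, best_dist = p, dist
    return best, best_dist

P = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(nearest_neighbor(P, (0.9, 1.2)))
```

This is the baseline that the data structures on the following slides try to beat at query time.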

5
Eigenfaces
(Figure: face images represented as feature vectors, e.g. 0.01, -0.2, 0.03, ... and 0.02, 0.03, 0.01, ...)
6
(Approximate) Near Neighbor
  • Near neighbor:
  • Given:
  • A set P of points in R^d, r > 0
  • Goal: build a data structure which, for any query
    q, returns a point p ∈ P with ||p - q|| ≤ r (if it
    exists)
  • c-Approximate Near Neighbor:
  • Goal: build a data structure which, for any query
    q:
  • If there is a point p ∈ P with ||p - q|| ≤ r,
  • it returns some p' ∈ P with ||p' - q|| ≤ cr
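
To make the c-approximate guarantee concrete, the sketch below checks whether an answer satisfies the contract for a single query. It assumes Euclidean distance, and `is_valid_answer` is a hypothetical helper rather than anything defined in the slides.

```python
import math

def is_valid_answer(points, q, r, c, answer):
    """Check the c-approximate near neighbor contract for one query."""
    has_near = any(math.dist(p, q) <= r for p in points)
    if not has_near:
        return True  # no point within r, so any output (even None) is allowed
    # Some point is within r: the answer must be a point of P within c*r.
    return answer in points and math.dist(answer, q) <= c * r

P = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
q = (1.0, 0.0)
print(is_valid_answer(P, q, r=1.0, c=2.0, answer=(3.0, 0.0)))   # distance 2 = cr
print(is_valid_answer(P, q, r=1.0, c=2.0, answer=(10.0, 0.0)))  # distance 9 > cr
```

Note that the point at distance 2 is acceptable even though a closer point exists; that slack is what the approximate data structures exploit.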

7
Locality-Sensitive Hashing
[Indyk-Motwani'98]
  • Idea: construct hash functions g: R^d → U such
    that for any points p, q:
  • If ||p - q|| ≤ r, then Pr[g(p) = g(q)] is high
    (not-so-small)
  • If ||p - q|| > cr, then Pr[g(p) = g(q)] is small
  • Then we can solve the problem by hashing

8
LSH
  • A family H of functions h: R^d → U is called
    (P1, P2, r, cr)-sensitive if, for any p, q:
  • if ||p - q|| < r then Pr[h(p) = h(q)] > P1
  • if ||p - q|| > cr then Pr[h(p) = h(q)] < P2
  • We hash using g(p) = h1(p).h2(p)...hk(p)
  • Intuition: amplify the probability gap
  • We can solve c-approximate NN with:
  • Number of hash functions: n^ρ, where ρ = log_{1/P2}(1/P1)
  • Query time: O(d n^ρ log n)
  • Space: O(n^(ρ+1) + dn)
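
As a worked example of these formulas, the sketch below computes the exponent and the resulting number of hash functions for given P1, P2 and n. The choice k = log_{1/P2} n for the concatenation length is the usual one in the LSH literature and is an assumption here, since the slide does not state k; `lsh_parameters` is an illustrative name.

```python
import math

def lsh_parameters(n, p1, p2):
    """Return (rho, k, L) for an LSH family with collision probabilities p1 > p2."""
    rho = math.log(1 / p1) / math.log(1 / p2)      # rho = log_{1/P2}(1/P1)
    k = math.ceil(math.log(n) / math.log(1 / p2))  # assumed: k = log_{1/P2} n, so p2**k <= 1/n
    L = math.ceil(n ** rho)                        # number of hash functions g (one table each)
    return rho, k, L

rho, k, L = lsh_parameters(n=1_000_000, p1=0.9, p2=0.5)
print(f"rho = {rho:.3f}, k = {k}, L = {L}")
# Amplification: one g collides with probability at least p1**k for near points
# and at most p2**k for far points, so the ratio between the two grows with k.
print(f"near: {0.9 ** k:.4f}, far: {0.5 ** k:.2e}")
```

With k chosen this way, a far point collides with the query in a given table with probability at most 1/n, so each table contributes about one false candidate in expectation.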

9
LSH for Hamming metric [IM'98]
  • Functions: h(p) = p_i, i.e., the i-th bit of p
  • We have:
  • Pr[h(p) = h(q)] = 1 - D(p,q)/d
  • This gives exponent ρ ≤ 1/c
  • Used for genetic sequence retrieval, motif
    finding, video sequence retrieval, compression,
    web clustering, pose estimation, etc.
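
A minimal sketch of bit sampling used as an index, assuming points are equal-length tuples of 0/1 bits. The class name `HammingLSH`, the parameters k and L, and the fixed seeds are illustrative choices, not taken from the slides.

```python
import random
from collections import defaultdict

def hamming(p, q):
    """Hamming distance D(p, q) between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(p, q))

class HammingLSH:
    def __init__(self, d, k, L, seed=0):
        rng = random.Random(seed)
        # Each g concatenates k sampled coordinates: g(p) = (p[i1], ..., p[ik]).
        self.projections = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, proj, p):
        return tuple(p[i] for i in proj)

    def insert(self, p):
        for proj, table in zip(self.projections, self.tables):
            table[self._key(proj, p)].append(p)

    def query(self, q, cr):
        """Return some stored point within Hamming distance cr of q, or None."""
        for proj, table in zip(self.projections, self.tables):
            for p in table.get(self._key(proj, q), []):
                if hamming(p, q) <= cr:
                    return p
        return None

d = 32
rng = random.Random(1)
data = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(200)]
index = HammingLSH(d=d, k=8, L=10)
for p in data:
    index.insert(p)

q = list(data[0])
q[3] ^= 1  # flip one bit, so D(q, data[0]) = 1
print(index.query(tuple(q), cr=4))
```

A query can return None even when a close point exists, because the method is randomized; using more tables L lowers that failure probability at the cost of space.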

10
Other norms
  • Can embed l_1^d with coordinates in {1...M} into
    dM-dimensional Hamming space
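
One standard way to realize such an embedding is a unary code per coordinate. The slide does not spell out the construction, so treat the sketch below as an assumption; the function names are illustrative. A coordinate x in {1...M} becomes x ones followed by M - x zeros, so two coordinates differ in exactly |x - y| bit positions.

```python
def unary_embed(x, M):
    """Map a point with coordinates in {1..M} to a d*M-bit vector (unary code per coordinate)."""
    bits = []
    for coord in x:
        bits.extend([1] * coord + [0] * (M - coord))
    return bits

def l1(x, y):
    """l1 distance between two integer points."""
    return sum(abs(a - b) for a, b in zip(x, y))

def hamming(u, v):
    """Hamming distance between two equal-length bit vectors."""
    return sum(a != b for a, b in zip(u, v))

x, y, M = (3, 1, 5), (1, 4, 5), 5
# The l1 distance of the original points equals the Hamming distance of their images.
print(l1(x, y), hamming(unary_embed(x, M), unary_embed(y, M)))
```

Because distances are preserved exactly, the bit-sampling scheme for the Hamming metric can then be applied to the embedded points.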