Title: Nearest Neighbor Search in High Dimensions
Nearest Neighbor Search in High Dimensions
- Seminar in Algorithms and Geometry 
 - Mica Arie-Nachimson and Daniel Glasner 
 - April 2009 
 
Talk Outline
- Nearest neighbor problem
  - Motivation
- Classical nearest neighbor methods
  - KD-trees
- Efficient search in high dimensions
  - Bucketing method
  - Locality Sensitive Hashing
- Conclusion
 
Main Results
Indyk and Motwani, 1998
Gionis, Indyk and Motwani, 1999 
Nearest Neighbor Problem
- Input: a set P of points in R^d (or any metric space).
- Output: given a query point q, find the point p in P which is closest to q.
[Figure: a query q and its nearest neighbor p]
What is it good for?
- Many things!
- Examples
  - Optical Character Recognition
  - Spell Checking
  - Computer Vision
  - DNA sequencing
  - Data compression
 
What is it good for? (OCR example)
[Figure: a query digit "2" and labeled digit examples (1, 2, 3, 4, 7, 8) embedded in a feature space]
What is it good for? (spell-checking example)
[Figure: a query word "abaut" and dictionary words (shout, bat, abate, scout, about, boat, able) embedded in a feature space]
And many more 
Approximate Nearest Neighbor (ε-NN)
- Input: a set P of points in R^d (or any metric space).
- Given a query point q, let
  - p be the point in P closest to q
  - r be the distance ||p - q||
- Output: some point p' with distance at most r(1+ε) from q.
[Figure: q, its nearest neighbor p at distance r, and the allowed radius r(1+ε)]
Approximate vs. Exact Nearest Neighbor
- Many applications give similar results with approximate NN
- Example from Computer Vision
 
Retiling
Slide from Lihi Zelnik-Manor
Exact NNS: 27 sec
Approximate NNS: 0.6 sec
Slide from Lihi Zelnik-Manor
Solution Method
- Input: a set P of n points in R^d.
- Method: construct a data structure to answer nearest neighbor queries
- Complexity
  - Preprocessing: space and time to construct the data structure
  - Query: time to return an answer
 
Solution Method
- Naïve approach (see the sketch below)
  - Preprocessing: O(nd)
  - Query time: O(nd)
- Reasonable requirements
  - Preprocessing time and space: poly(n, d).
  - Query time: sublinear in n.
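As a baseline, a minimal sketch of the naïve linear scan (illustrative code, not from the slides):

```python
def linear_scan_nn(P, q):
    """Naive nearest neighbor: compare q against every point, O(n*d) per query."""
    return min(P, key=lambda p: sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(linear_scan_nn([(1, 2), (5, 5), (0, -1)], (4, 4)))  # -> (5, 5)
```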
 
Talk Outline
- Nearest neighbor problem
  - Motivation
- Classical nearest neighbor methods
  - KD-trees
- Efficient search in high dimensions
  - Bucketing method
  - Locality Sensitive Hashing
- Conclusion
 
Classical nearest neighbor methods
- Tree structures: kd-trees
- Voronoi diagrams
- Preprocessing: poly(n), exp(d)
- Query: log(n), exp(d)
- Difficult problem in high dimensions
  - The solutions still work, but are exp(d)
 
KD-tree
[Figure: 1-D example. The points 7, 8, 10, 12, 13, 15, 18 on the segment [5, 20] are split recursively at the median; the leaves of the resulting tree are {7, 8}, {10, 12}, {13, 15}, {18}]
KD-tree
[Figure: searching the same tree with query 17 reaches the leaf containing 18; min dist = 1]
KD-tree
[Figure: searching with query 16, the leaf first reached gives min dist = 2; backtracking into the neighboring cell improves it to min dist = 1]
KD-tree
- d > 1: alternate between dimensions
- Example: d = 2
[Figure: the points (12,5), (6,8), (17,4), (23,2), (20,10), (9,9), (1,6) are split first on x into {(12,5), (6,8), (1,6), (9,9)} and {(17,4), (23,2), (20,10)}, then each side is split on y, then on x again]
KD-tree
[Figure: the corresponding partition of the plane by the alternating x and y splits]
KD-tree
- d > 1: alternate between dimensions
- Example: d = 2
- NN search
Animated gif from http://en.wikipedia.org/wiki/File:KDTree-animation.gif
KD-tree complexity
- Preprocessing: O(nd)
- Query
  - O(log n) if points are randomly distributed
  - Worst case O(k·n^(1-1/k)) (k = dimension): almost linear in n when k is large
  - Need to search the whole tree
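A minimal sketch of a kd-tree with the descend-then-backtrack search described above; the function names (build, nearest) are illustrative, not from the slides.

```python
import math

def build(points, depth=0):
    """Recursively split the points at the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])          # alternate between dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, q, best=None):
    """Descend towards q, then backtrack into the other branch
    only if it could still contain a closer point."""
    if node is None:
        return best
    dist = math.dist(q, node["point"])
    if best is None or dist < best[0]:
        best = (dist, node["point"])
    axis = node["axis"]
    delta = q[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if delta < 0 else (node["right"], node["left"])
    best = nearest(near, q, best)
    if abs(delta) < best[0]:               # splitting plane closer than current best
        best = nearest(far, q, best)
    return best

pts = [(12, 5), (6, 8), (17, 4), (23, 2), (20, 10), (9, 9), (1, 6)]
tree = build(pts)
print(nearest(tree, (16, 6)))              # (distance, nearest point)
```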
 
Talk Outline
- Nearest neighbor problem
  - Motivation
- Classical nearest neighbor methods
  - KD-trees
- Efficient search in high dimensions
  - Bucketing method
  - Locality Sensitive Hashing
- Conclusion
 
Sublinear solutions

  Method      Preprocessing            Query time
  Bucketing   n^O(1/ε^2)               O(log n)
  LSH         O(n^(1+1/(1+ε)))         O(n^(1/(1+ε)))
              (n^(3/2) when ε = 1)     (sqrt(n) when ε = 1)

- Not counting log n factors
- Linear in d
- Solve ε-NN by reduction
r-PLEB: Point Location in Equal Balls
- Given n balls of radius r, for every query q, find a ball that q resides in, if one exists.
- If q doesn't reside in any ball, return NO.
[Figure: q inside the ball around p1; return p1]
[Figure: q outside all balls; return NO]
Reduction from ε-NN to r-PLEB
- The two problems are connected
- r-PLEB is like a decision problem for ε-NN
Reduction from ε-NN to r-PLEB: Naïve Approach
- Set R = the ratio between the largest and smallest distance between 2 points
- Define r ∈ {(1+ε)^0, (1+ε)^1, ..., R}
- For each r_i construct an r_i-PLEB
- Given q, find the smallest r_i which gives a YES
- Use binary search to find it
 
Reduction from ε-NN to r-PLEB: Naïve Approach
- Correctness
  - Stopped at r_i = (1+ε)^k
  - r_{i+1} = (1+ε)^{k+1}
  - (1+ε)^k ≤ r ≤ (1+ε)^{k+1}, so the returned point is within a (1+ε) factor of r
[Figure: nested balls for the r1-PLEB, r2-PLEB and r3-PLEB structures]
Reduction from ε-NN to r-PLEB: Naïve Approach
- Reduction overhead
  - Space: O(log_{1+ε} R) r-PLEB constructions
    - the size of {(1+ε)^0, (1+ε)^1, ..., R} is log_{1+ε} R
  - Query: O(log log_{1+ε} R) calls to r-PLEB
- Dependency on R
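A minimal sketch of this naïve reduction, assuming a black-box r-PLEB oracle; the names (make_rpleb, approx_nn) and the brute-force oracle are illustrative only, since in the real construction each radius gets its own precomputed data structure.

```python
import math

def make_rpleb(P, r):
    """Stand-in r-PLEB oracle: return some p in P with ||p - q|| <= r, else None."""
    def query(q):
        for p in P:
            if math.dist(p, q) <= r:
                return p
        return None
    return query

def approx_nn(P, q, eps):
    """Binary search over radii d_min*(1+eps)^0, d_min*(1+eps)^1, ... up to ~d_max
    for the smallest radius whose r-PLEB answers YES."""
    dists = [math.dist(p1, p2) for i, p1 in enumerate(P) for p2 in P[i + 1:]]
    dmin = min(d for d in dists if d > 0)
    dmax = max(dists)
    radii = []
    r = dmin
    while r <= dmax * (1 + eps):
        radii.append(r)
        r *= (1 + eps)
    oracles = [make_rpleb(P, ri) for ri in radii]      # one r_i-PLEB per radius
    lo, hi, answer = 0, len(radii) - 1, None
    while lo <= hi:                                     # O(log log_{1+eps} R) oracle calls
        mid = (lo + hi) // 2
        p = oracles[mid](q)
        if p is not None:
            answer, hi = p, mid - 1                     # YES: try a smaller radius
        else:
            lo = mid + 1                                # NO: try a larger radius
    return answer
```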
Reduction from ε-NN to r-PLEB: Better Approach (Har-Peled 2001)
- Set r_med as the radius which gives n/2 connected components (C.C.)
- Set r_top = 4·n·r_med·log(n)/ε
[Figure: the r_med balls around the points, and the much larger r_top radius]
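A sketch of one way r_med could be computed, counting connected components of the "distance ≤ r" graph with union-find and binary searching over the pairwise distances; this is illustrative only, not the more efficient approximate procedure of Har-Peled's paper.

```python
import math

def num_components(P, r):
    """Count connected components of the graph linking points at distance <= r (union-find)."""
    parent = list(range(len(P)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            if math.dist(P[i], P[j]) <= r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(P))})

def r_med(P):
    """Smallest pairwise distance at which the number of components drops to <= n/2."""
    dists = sorted({math.dist(P[i], P[j]) for i in range(len(P)) for j in range(i + 1, len(P))})
    lo, hi = 0, len(dists) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if num_components(P, dists[mid]) <= len(P) // 2:
            hi = mid
        else:
            lo = mid + 1
    return dists[lo]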
Reduction from ε-NN to r-PLEB: Better Approach
- If q ∉ B(p_i, r_med) for all i but q ∈ B(p_i, r_top) for some i: set R = r_top/r_med and perform binary search on r ∈ {(1+ε)^0, (1+ε)^1, ..., R} (scaled by r_med)
  - R is independent of the input points
- If q ∉ B(p_i, r_med) and q ∉ B(p_i, r_top) ∀ i: q is far away
  - Enough to choose one point from each C.C. and continue recursively with these points (accumulating error ≤ 1+ε/3)
- If q ∈ B(p_i, r_med) for some i: continue recursively on that C.C.
[Figure: the three cases of q relative to the r_med and r_top balls]
Complexity overhead: how many r-PLEB queries?
- Each binary search: O(log log R) = O(log(n/ε))
- Each recursive step continues with at most half of the points
- Total: O(log n)
(r,ε)-PLEB: Point Location in Equal Balls
- Given n balls of radius r, for query q:
  - If q resides in a ball of radius r, return the ball.
  - If q doesn't reside in any ball (even of radius (1+ε)r), return NO.
  - If q resides only in the "border" of a ball (between radius r and (1+ε)r), return either the ball or NO.
[Figure: q inside the ball around p1; return p1]
[Figure: q outside all balls; return NO]
[Figure: q only in the border region of a ball; return either the ball (YES) or NO]
Talk Outline
- Nearest neighbor problem
  - Motivation
- Classical nearest neighbor methods
  - KD-trees
- Efficient search in high dimensions
  - Bucketing method
  - Locality Sensitive Hashing
- Conclusion
 
Bucketing Method (solves r-PLEB)
- Apply a grid of cell size rε/sqrt(d)
- Every ball is covered by at most k cubes
  - Can show that k ≤ C^d/ε^d for some constant C < 5
- kn cubes cover all balls
- Finite number of cubes, so a hash table can be used
  - Key: a cube, Value: a ball it covers
- Space req.: O(nk)
Indyk and Motwani, 1998
Bucketing Method (solves r-PLEB)
- Given a query q:
  - Compute the cube it resides in: O(d)
  - Find a ball this cube intersects (one hash lookup): O(1)
  - The returned point is a valid (r,ε)-PLEB answer for q
[Figure: a ball of radius r covered by grid cubes of side rε/sqrt(d)]
[Figure: query outcomes; a query in a cube inside a ball returns YES, a query in a cube outside all balls returns NO, and a query in a cube cut by a ball boundary may return YES or NO]
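A minimal sketch of the bucketing idea, assuming Euclidean distance; the names (build_buckets, query) and the brute-force enumeration of cubes inside each ball's bounding box are illustrative, not the exact construction from the paper.

```python
import itertools, math

def build_buckets(P, r, eps):
    """Grid of cell side s = r*eps/sqrt(d). Hash every cube that intersects some ball B(p, r).
    Key: the cube's integer coordinates, Value: a point whose ball covers it."""
    d = len(P[0])
    s = r * eps / math.sqrt(d)
    table = {}
    for p in P:
        lo = [int(math.floor((x - r) / s)) for x in p]
        hi = [int(math.floor((x + r) / s)) for x in p]
        for cell in itertools.product(*[range(l, h + 1) for l, h in zip(lo, hi)]):
            # closest point of the cube to p (clamp each coordinate into the cube)
            closest = [min(max(x, c * s), (c + 1) * s) for x, c in zip(p, cell)]
            if math.dist(p, closest) <= r:        # keep only cubes the ball actually covers
                table.setdefault(cell, p)
    return table, s

def query(table, s, q):
    """O(d) to locate q's cube, O(1) hash lookup; returns a stored point or None (NO)."""
    cell = tuple(int(math.floor(x / s)) for x in q)
    return table.get(cell)
```

With cell side rε/sqrt(d), any point stored for the query's cube is within r plus the cube's diagonal of q, i.e. within (1+ε)r, which is exactly the (r,ε)-PLEB guarantee.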
Bucketing Method: Complexity
- Space required: O(nk) = O(n·(1/ε)^d)
- Query time: O(d)
- If d = O(log n) (i.e. n = 2^Ω(d)):
  - Space req.: n^O(log(1/ε))
- Else, first use dimensionality reduction in l2 from d to ε^(-2)·log(n) dimensions (Johnson-Lindenstrauss lemma; see the sketch below)
  - Space: n^O(1/ε^2)
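A sketch of the Johnson-Lindenstrauss step, assuming a plain Gaussian random projection down to roughly ε^(-2)·log(n) dimensions; constants are omitted and the function name is illustrative.

```python
import math
import numpy as np

def jl_project(X, eps, seed=0):
    """Project n points in R^d down to k = O(eps^-2 * log n) dimensions.
    Pairwise l2 distances are preserved up to a (1 +/- eps) factor w.h.p."""
    n, d = X.shape
    k = max(1, int(math.ceil(math.log(n) / eps ** 2)))   # constants omitted
    A = np.random.default_rng(seed).normal(size=(d, k)) / math.sqrt(k)
    return X @ A

X = np.random.default_rng(1).normal(size=(100, 1000))     # 100 points in R^1000
Y = jl_project(X, eps=0.5)
print(X.shape, "->", Y.shape)
```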
Break
Talk Outline
- Nearest neighbor problem
  - Motivation
- Classical nearest neighbor methods
  - KD-trees
- Efficient search in high dimensions
  - Bucketing method
  - Locality Sensitive Hashing
- Conclusion
 
Locality Sensitive Hashing
- Indyk & Motwani 98; Gionis, Indyk & Motwani 99
- A solution for (r,ε)-PLEB.
- Probabilistic construction; a query succeeds with high probability.
- Uses random hash functions g: X → U (some finite range).
- Preserves separation of near and far points with high probability.
Locality Sensitive Hashing
- If ||p - q|| ≤ r, then Pr[g(p) = g(q)] is high
- If ||p - q|| > (1+ε)r, then Pr[g(p) = g(q)] is low
 
A locality sensitive family
- A family H of functions h: X → U is called (P1, P2, r, (1+ε)r)-sensitive for a metric d_X if, for any p, q:
  - if ||p - q|| < r then Pr[h(p) = h(q)] > P1
  - if ||p - q|| > (1+ε)r then Pr[h(p) = h(q)] < P2
- For this notion to be useful we require P1 > P2
 
Intuition
- if ||p - q|| < r then Pr[h(p) = h(q)] > P1
- if ||p - q|| > (1+ε)r then Pr[h(p) = h(q)] < P2
[Figure: two hash functions h1, h2 partitioning the space; illustration from Lihi Zelnik-Manor]
Claim
- If there is a (P1, P2, r, (1+ε)r)-sensitive family for d_X, then there exists an algorithm for (r,ε)-PLEB in d_X with
  - Space: O(dn + n^(1+ρ))
  - Query: O(d·n^ρ)
  where ρ is determined by P1 and P2 (see below)
- When ε = 1: space O(dn + n^(3/2)), query O(d·sqrt(n))
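For reference, the exponent used in the Indyk and Motwani analysis is:

```latex
\rho = \frac{\ln(1/P_1)}{\ln(1/P_2)}, \qquad 0 < \rho < 1 \ \text{ since } P_1 > P_2 .
```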
Algorithm - preprocessing
- For i = 1, ..., L
  - Uniformly select k functions h_1, ..., h_k from H
  - Set g_i(p) = (h_1(p), h_2(p), ..., h_k(p))
  - Compute g_i(p) for all p ∈ P
  - Store the resulting values in a hash table
[Figure: example with binary hash functions h_i: R^d → {0, 1}]
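A minimal sketch of the preprocessing stage, assuming the family H is given as a list of Python callables; sample_g, preprocess and the dictionary-based tables are illustrative names, not from the papers.

```python
import random
from collections import defaultdict

def sample_g(H, k, rng):
    """g(p) = (h_1(p), ..., h_k(p)) with the h_j drawn uniformly from the family H."""
    hs = [rng.choice(H) for _ in range(k)]
    return lambda p: tuple(h(p) for h in hs)

def preprocess(P, H, k, L, seed=0):
    """Build L hash tables; table i stores every p in P under the key g_i(p)."""
    rng = random.Random(seed)
    gs = [sample_g(H, k, rng) for _ in range(L)]
    tables = []
    for g in gs:
        table = defaultdict(list)
        for p in P:
            table[g(p)].append(p)
        tables.append(table)
    return gs, tables
```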
 
Algorithm - query
- S ← ∅, i ← 1
- While |S| < 2L and i ≤ L:
  - S ← S ∪ {points in bucket g_i(q) of table i}
  - If ∃ p ∈ S s.t. ||p - q|| ≤ (1+ε)r, return p and exit.
  - i ← i + 1
- Return NO.
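Continuing the sketch above, a hedged version of the query loop (dist is whatever metric the space uses, e.g. Hamming or l1):

```python
def lsh_query(q, gs, tables, r, eps, dist):
    """Scan buckets g_1(q), ..., g_L(q), inspecting at most ~2L candidate points;
    return the first one within distance (1+eps)*r of q, otherwise None (NO)."""
    L = len(gs)
    inspected = 0
    for g, table in zip(gs, tables):
        for p in table.get(g(q), []):
            if dist(p, q) <= (1 + eps) * r:
                return p
            inspected += 1
            if inspected >= 2 * L:          # inspected enough candidates, give up
                return None
    return None
```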
 
Correctness
- Property I: if ||q - p|| ≤ r then g_i(p) = g_i(q) for some i ∈ {1, ..., L}
- Property II: the number of points p ∈ P s.t. ||q - p|| > (1+ε)r and g_i(p) = g_i(q) for some i is less than 2L
- We show that Pr[I and II hold] ≥ 1/2 - 1/e
 
Correctness (cont.)
- Choose
  - k = log_{1/P2}(n)
  - L = n^ρ, with ρ as defined above
 
Complexity
- k = log_{1/P2}(n), L = n^ρ
- Space
  - L·n (hash tables) + d·n (data points) = O(n^(1+ρ) + dn)
- Query
  - L hash function evaluations + O(L) distance calculations = O(d·n^ρ)
Significance of k and L
[Figure: Pr[g(p) = g(q)] as a function of ||p - q||]
[Figure: Pr[g_i(p) = g_i(q) for some i ∈ {1, ..., L}] as a function of ||p - q||]
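A small numerical illustration of those two curves, assuming a family where a single h collides with probability 1 - u/d at distance u (this is the bit-sampling family for the Hamming cube introduced a few slides later); the values of d, k and L are made up.

```python
def collision_probs(u, d, k, L):
    """Pr[g(p)=g(q)] and Pr[some g_i collides] at Hamming distance u,
    assuming a single h collides with probability 1 - u/d."""
    single_g = (1 - u / d) ** k              # all k sampled functions must agree
    any_of_L = 1 - (1 - single_g) ** L       # at least one of the L tables collides
    return single_g, any_of_L

d, k, L = 100, 10, 20
for u in (5, 10, 20, 40, 80):
    print(u, collision_probs(u, d, k, L))
```

Raising k sharpens the drop of the first curve, while raising L lifts the second curve back up for nearby points.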
Application
- Perform NNS in R^d with the l1 distance.
- Reduce the problem to NNS in H_d, the Hamming cube of dimension d.
  - H_d = binary strings of length d.
  - d_Ham(s1, s2) = the number of coordinates where s1 and s2 disagree.
Embedding l1^d in H_d
- W.l.o.g. all coordinates of all points in P are positive integers < C.
- Map integer i ∈ {1, ..., C} to (1,1,...,1,0,0,...,0): i ones followed by C - i zeros.
- Map a vector by mapping each coordinate.
- Example (C = 5): (5,3,2), (2,4,1) → (11111,11100,11000), (11000,11110,10000)
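A sketch of this unary embedding (C is assumed known, and coordinates are positive integers as the slide requires):

```python
def embed_l1_to_hamming(v, C):
    """Map each coordinate x (1 <= x <= C) to x ones followed by C - x zeros;
    the l1 distance between vectors equals the Hamming distance between the bit strings."""
    return "".join("1" * x + "0" * (C - x) for x in v)

print(embed_l1_to_hamming((5, 3, 2), 5))   # 111111110011000
print(embed_l1_to_hamming((2, 4, 1), 5))   # 110001111010000
```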
Embedding l1^d in H_d
- Distances are preserved.
- Actual computations are performed in the original space: O(log C) overhead.
A sensitive family for the Hamming cube
- H_d = {h_i : h_i(b_1, ..., b_d) = b_i, for i = 1, ..., d}
- If d_Ham(s1, s2) < r, what is Pr[h(p) = h(q)]?
  - at least 1 - r/d
- If d_Ham(s1, s2) > (1+ε)r, what is Pr[h(p) = h(q)]?
  - at most 1 - (1+ε)r/d
- So H_d is (1 - r/d, 1 - (1+ε)r/d, r, (1+ε)r)-sensitive.
- Question: what are these projections in the original space?
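The family itself is tiny in code; this sketch plugs directly into the generic preprocessing/query sketches above (bit_sampling_family is an illustrative name).

```python
def bit_sampling_family(d):
    """H_d = {h_i : h_i(s) = s[i]}, one projection per coordinate of the Hamming cube."""
    return [lambda s, i=i: s[i] for i in range(d)]

# Example: H = bit_sampling_family(len(some_bit_string)); pass H to preprocess(...) above.
```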
Corollary
- We can bound ρ ≤ 1/(1+ε)
- Space: O(dn + n^(1+1/(1+ε)))
- Query: O(d·n^(1/(1+ε)))
- When ε = 1: space O(dn + n^(3/2)), query O(d·sqrt(n))
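A one-line check of the bound, using the standard argument via Bernoulli's inequality (1-x)^(1+ε) ≥ 1-(1+ε)x for x ∈ [0,1]:

```latex
\rho = \frac{\ln(1/P_1)}{\ln(1/P_2)} \le \frac{1}{1+\varepsilon}
\iff P_2 \le P_1^{\,1+\varepsilon}
\iff 1 - \frac{(1+\varepsilon)r}{d} \le \Bigl(1 - \frac{r}{d}\Bigr)^{1+\varepsilon},
```

which is exactly Bernoulli's inequality with x = r/d.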
Recent results
- In Euclidean space:
  - ρ = 1/(1+ε)^2 + O(log log n / log^(1/3) n)   [Andoni & Indyk 2008]
  - ρ ≥ 0.462/(1+ε)^2   [Motwani, Naor & Panigrahy 2006]
- LSH family for l_s, s ∈ (0, 2]   [Datar, Immorlica, Indyk & Mirrokni 2004]
- And many more.
 
Conclusion
- NNS is an important problem with many applications.
- The problem can be efficiently solved in low dimensions.
- We saw some efficient approximate solutions in high dimensions, which are applicable to many metrics.