Title: Nearest Neighbor Queries using R-trees
1. Nearest Neighbor Queries using R-trees
- Based on notes by Yufei Tao
2. Nearest Neighbor Search
- Find the object nearest to a query point q.
- E.g., find the gas station nearest to the red point.
- k nearest neighbors: find the k objects nearest to q.
- E.g., 1NN: {h}; 2NN: {h, a}; 3NN: {h, a, i}.
3. Nearest Neighbor Processing
- The R-tree can accelerate NN search, too.
- Concept: mindist(q, E)
- The minimum distance between a point q and a rectangle E.
- It is a lower bound on the distance from q to any object inside E.
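The mindist definition above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and an axis-aligned rectangle stored as a pair of corner tuples (lo, hi); that layout and the function signature are assumptions made for the example, not something the slides specify.

```python
import math

def mindist(q, rect):
    """Minimum Euclidean distance from point q to an axis-aligned rectangle.

    q    -- coordinates of the query point, e.g. (x, y)
    rect -- (lo, hi): lower-left and upper-right corners (assumed layout)
    """
    lo, hi = rect
    total = 0.0
    for x, l, h in zip(q, lo, hi):
        # Per-axis gap: 0 if x lies in [l, h], else the distance to the nearer side.
        gap = max(l - x, 0.0, x - h)
        total += gap * gap
    return math.sqrt(total)

# Hypothetical rectangle with corners (1, 2) and (4, 5).
print(mindist((0, 0), ((1, 2), (4, 5))))  # q outside: distance to corner (1, 2)
print(mindist((2, 3), ((1, 2), (4, 5))))  # q inside: 0.0
```

When q lies inside E, every per-axis gap is zero, so mindist is 0 and the entry can never be pruned.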
4. Depth-first NN Algorithm
- First load the root and compute the mindist from each entry to the query.
- Visit the child of the entry with the smallest mindist.
- In this case, E6.
5. Depth-first NN Algorithm (cont.)
- Do this recursively at the next level. In the child node of E6, compute the mindist from every entry to the query.
- Visit the child node of the entry having the smallest mindist.
- In this case, E1 and E2 have the same mindist.
- So the choice is arbitrary; say, E1 first.
- Among all the points in the child node of E1, find the closest point a (our current result).
6. Depth-first NN Algorithm (cont.)
- Then backtrack to the child node of E6, where the entry with the next mindist value is E2.
- Its mindist √5, however, is the same as the distance from q to a.
- So we know that no point in E2 can possibly be closer to q than a.
- No result in E3 either, by the same reasoning.
7. Depth-first NN Algorithm (cont.)
- We now backtrack to the root, where the entry with the next mindist is E7.
- Its mindist √2 is smaller than the distance √5 from q to a.
- Thus, its subtree may contain some point whose distance to q is smaller than the distance between q and a, so we have to visit it.
- At the child node of E7, compute the mindist of all entries to q.
- E4 will be descended next.
8. Depth-first NN Algorithm (cont.)
- In the child node of E4, we find a point h that is closer to q than a.
- So h becomes our new nearest neighbor.
- We backtrack to the child node of E7, where the entry with the next mindist is E5.
- E5's mindist √13 is larger than the distance √2 from q to h, so we prune its subtree.
- The algorithm backtracks to the root and terminates.
- Visited (in this order): the root, and the child nodes of E6, E1, E7, E4.
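The walk-through above can be sketched as a short recursive routine. This is a minimal sketch, not the slides' own code: the Node class, its field names, and the sample coordinates are hypothetical and do not reproduce the figure's data. Entries are visited in ascending mindist order, and the loop stops as soon as an entry's mindist is no smaller than the distance to the current result.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:  # hypothetical in-memory R-tree node
    name: str
    is_leaf: bool
    entries: list  # leaf: points; internal: (rect, child) pairs, rect = (lo, hi)

def mindist(q, rect):
    lo, hi = rect
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def df_nn(node, q, best=(math.inf, None), visited=None):
    """Depth-first NN search; returns (distance, point) of the best point found."""
    if visited is not None:
        visited.append(node.name)
    if node.is_leaf:
        for p in node.entries:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Rank children by mindist; the index breaks ties so Nodes are never compared.
    ranked = sorted((mindist(q, rect), i, child)
                    for i, (rect, child) in enumerate(node.entries))
    for md, _, child in ranked:
        if md >= best[0]:
            break  # this entry and all later ones cannot hold a closer point
        best = df_nn(child, q, best, visited)
    return best

# Made-up data: three leaves under one root.
leaf_a = Node("A", True, [(1, 4), (5, 0)])
leaf_b = Node("B", True, [(2, 2), (3, 3)])
leaf_c = Node("C", True, [(6, 6), (7, 7)])
root = Node("root", False, [
    (((1, 0), (5, 4)), leaf_a),
    (((2, 2), (3, 3)), leaf_b),
    (((6, 6), (7, 7)), leaf_c),
])

visited = []
dist, point = df_nn(root, (0, 0), visited=visited)
print(point, visited)  # (2, 2) ['root', 'A', 'B']
```

In this toy tree, leaf A has the smallest mindist but does not contain the nearest point, so the search has to backtrack into B, while C is pruned without being read.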
9. Another Depth-first Example: 2NN
- Difference: entries must be pruned based on their distances to our current 2nd NN.
- Root -> child node of E6 -> child node of E1 -> find {a, b} here.
- Backtrack to child node of E6 -> child node of E2 (its mindist < dist(q, b)) -> update our result to {a, f}.
- Backtrack to child node of E6 -> child node of E3 -> backtrack to the root -> child node of E7 -> child node of E4 -> update our result to {a, h}.
- Backtrack to child node of E7 -> prune E5 -> backtrack to the root -> end.
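For k > 1, the same traversal can keep the k best points seen so far and prune against the distance to the current k-th one. The following is a sketch under assumed names: the Node class and the sample coordinates are hypothetical, and the bounded max-heap (built on Python's heapq with negated distances) is one possible way to track the k-th distance, not necessarily the one the notes have in mind.

```python
import heapq
import math
from dataclasses import dataclass

@dataclass
class Node:  # hypothetical in-memory R-tree node
    name: str
    is_leaf: bool
    entries: list  # leaf: points; internal: (rect, child) pairs, rect = (lo, hi)

def mindist(q, rect):
    lo, hi = rect
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def df_knn(node, q, k, best=None):
    """Depth-first kNN; best is a max-heap of (-distance, point) with at most k items."""
    if best is None:
        best = []
    if node.is_leaf:
        for p in node.entries:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:          # closer than the current k-th NN
                heapq.heapreplace(best, (-d, p))
        return best
    ranked = sorted((mindist(q, rect), i, child)
                    for i, (rect, child) in enumerate(node.entries))
    for md, _, child in ranked:
        # Until k points are known, nothing can be pruned.
        kth = -best[0][0] if len(best) == k else math.inf
        if md >= kth:
            break
        df_knn(child, q, k, best)
    return best

# Made-up data: three leaves under one root.
leaf_a = Node("A", True, [(1, 4), (5, 0)])
leaf_b = Node("B", True, [(2, 2), (3, 3)])
leaf_c = Node("C", True, [(6, 6), (7, 7)])
root = Node("root", False, [
    (((1, 0), (5, 4)), leaf_a),
    (((2, 2), (3, 3)), leaf_b),
    (((6, 6), (7, 7)), leaf_c),
])

result = sorted((-neg_d, p) for neg_d, p in df_knn(root, (0, 0), 2))
print([p for _, p in result])  # [(2, 2), (1, 4)]
```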
10. Optimal Performance of kNN Search
- What's the best performance that can ever be achieved for a kNN query?
- Vicinity circle: centered at the query q, with radius equal to the distance from q to its k-th NN.
- All nodes that intersect the vicinity circle must be visited.
- The child node of E6 must be accessed by any algorithm.
- Although there's no result in its subtree, this cannot be verified unless we visit it!
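The vicinity-circle argument can be checked on a toy example. The sketch below computes the k-th NN distance by brute force and lists the node rectangles that intersect the circle, i.e. those with mindist no larger than the radius. The function name must_visit, the dictionary of named rectangles, and the coordinates are all hypothetical, chosen only to illustrate the idea.

```python
import math

def mindist(q, rect):
    lo, hi = rect
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def must_visit(q, points, node_mbrs, k):
    """Return the vicinity-circle radius and the names of nodes whose MBR intersects it."""
    radius = sorted(math.dist(q, p) for p in points)[k - 1]  # distance to the k-th NN
    names = [name for name, rect in node_mbrs.items() if mindist(q, rect) <= radius]
    return radius, names

# Made-up leaves: A holds (1, 4), (5, 0); B holds (2, 3), (3, 2); C holds (6, 6), (7, 7).
points = [(1, 4), (5, 0), (2, 3), (3, 2), (6, 6), (7, 7)]
node_mbrs = {
    "A": ((1, 0), (5, 4)),
    "B": ((2, 2), (3, 3)),
    "C": ((6, 6), (7, 7)),
}

radius, names = must_visit((0, 0), points, node_mbrs, k=1)
print(names)  # ['A', 'B']
```

Here leaf A contains no result for k = 1, yet its rectangle intersects the circle, so no algorithm can rule it out without reading it.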
11. Best-first Algorithm (optimal algorithm)
- BF keeps all the (leaf and non-leaf) entries seen so far in memory, sorted in ascending order of their mindist.
- Each step processes the entry in memory with the smallest mindist.
12. Best-first Algorithm (cont.)
- Insert all the entries in the child node of E6 into the sorted list.
- E7 is the next one to be processed.
13. Best-first Algorithm (cont.)
- Insert all the entries in the child node of E7 into the sorted list.
- The next entry to be processed is E4.
14. Best-first Algorithm (cont.)
- Insert all the entries in the child node of E4 into the sorted list.
- The next entry to be processed is h, which is a leaf entry.
- This is the first NN of q.
15. Best-first Algorithm: 2NN
- Assume we want 2 NNs; then the algorithm continues.
- Report h as the 1st NN, and remove it from the sorted list (heap).
- The next entry to be processed is E1.
16. Best-first Algorithm: 2NN (cont.)
- Visit the child node of E1; enter all its entries into the sorted list.
- The next entry is a, which is a leaf entry.
- This is the 2nd NN, and the algorithm terminates.
- Whenever the entry we process from memory is a leaf entry, it is the next NN for sure.
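The best-first steps above can be sketched with a priority queue in place of the sorted list. This is an illustrative sketch only: the Node class, the generator interface, and the sample coordinates are hypothetical. Nodes are keyed by their mindist and points by their actual distance, so a point that reaches the head of the queue is reported as the next NN.

```python
import heapq
import itertools
import math
from dataclasses import dataclass

@dataclass
class Node:  # hypothetical in-memory R-tree node
    name: str
    is_leaf: bool
    entries: list  # leaf: points; internal: (rect, child) pairs, rect = (lo, hi)

def mindist(q, rect):
    lo, hi = rect
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def best_first(root, q, visited=None):
    """Yield (distance, point) pairs in ascending order of distance from q."""
    tie = itertools.count()           # tiebreaker so the heap never compares items
    heap = [(0.0, next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)
        if not isinstance(item, Node):
            yield d, item             # a point at the head of the heap is the next NN
            continue
        if visited is not None:
            visited.append(item.name)
        if item.is_leaf:
            for p in item.entries:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            for rect, child in item.entries:
                heapq.heappush(heap, (mindist(q, rect), next(tie), child))

# Made-up data: three leaves under one root.
leaf_a = Node("A", True, [(1, 4), (5, 0)])
leaf_b = Node("B", True, [(2, 2), (3, 3)])
leaf_c = Node("C", True, [(6, 6), (7, 7)])
root = Node("root", False, [
    (((1, 0), (5, 4)), leaf_a),
    (((2, 2), (3, 3)), leaf_b),
    (((6, 6), (7, 7)), leaf_c),
])

visited = []
first_two = list(itertools.islice(best_first(root, (0, 0), visited), 2))
print([p for _, p in first_two], visited)  # [(2, 2), (1, 4)] ['root', 'A', 'B']
```

Because the search is written as a generator, the caller does not need to fix k in advance: asking for one more neighbor simply resumes the loop where it stopped.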
17. Best-first: Best Performance
- To find the 1st NN, we visited the root and the child nodes of E6, E7, E4.
- To find the 2nd, in addition to the nodes above, we also visited the child node of E1.
- Both cases are optimal.
- It can be proved that BF visits the nodes in the tree in ascending order of their mindist to the query point.
18. Retrospect: The Rationale Behind the Algorithms
- What is the main reasoning behind the depth-first and best-first algorithms?
- Use mindist as a lower bound on the distance from q to the best point in a subtree.
- If a node's mindist is already greater than the distance to our current result (the k-th NN found so far), prune it.