Title: Ch. 16: Sweep-Zones
1 Ch. 16 Sweep-Zones
Basic Question: Is it possible to compute nearest neighbors in expected time O(n log n)?
Basic Idea: Generalize sweep-lines to sweep-zones!
Def.: The sweep-zone SZ of an area is the set of regions touching the upper boundary of the area from below.
2 UB-Tree Insertion 18/19
[Figure: data space partitioned into regions labeled 1 to 18]
3 Sweep-Zone Algorithm 1
At step i: Z-regions have been read in increasing Z-order up to region R_{i-1}, i.e. area(R_{i-1}) with upper boundary B(R_{i-1}). The set of cached regions C(R_i) is the set of regions in SZ_{i-1} = SZ(area(R_{i-1})) plus region R_i.
1. For every point p ∈ R_i, let l(p) and h(p) be the lower and higher neighbor of p on the Z-curve; compute l(p) and h(p).
2. Let q = l(p) if dist(p, l(p)) < dist(p, h(p)), and q = h(p) otherwise.
3. Let Q(p) be the query box with center p and side length 2·dist(p, q).
[Figure: query box Q(p) with points p and q]
4 Sweep-Zone Algorithm 1 (continued)
4. Retrieve Q(p) from cache or disk and compute the nearest neighbor ν(p). Note: retrieval of Q(p) should take time O(log n); finding ν(p) should take nearly constant time.
5. Cache the regions intersecting Q(p) to enforce linear I/O time.
6. If R_i was the last region in Z-order, then exit.
7. Release all regions from C(R_i) which are not in SZ_i.
8. i := i+1; read the next region R_i in Z-order.
9. Go to step 1.
On exit: all nearest neighbors are known; now cluster.
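Steps 1 to 4 of Sweep-Zone Algorithm 1 can be sketched in memory as follows. This is only an illustration under assumptions not stated on the slides: points are distinct 2D tuples with non-negative integer coordinates, dist is Euclidean, and a linear scan stands in for the cached retrieval of Q(p), so the sketch shows why the query box is sufficient but not the intended I/O behaviour. The names z_address and nearest_neighbors are hypothetical.

```python
from math import dist

def z_address(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y (Z-order address)."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)        # bits of x go to even positions
        z |= ((y >> b) & 1) << (2 * b + 1)    # bits of y go to odd positions
    return z

def nearest_neighbors(points):
    """Map every point to its nearest neighbor using the query box Q(p)."""
    order = sorted(points, key=lambda p: z_address(*p))   # points in Z-order
    result = {}
    for i, p in enumerate(order):
        # Step 1: lower and higher neighbor on the Z-curve (where they exist).
        candidates = order[max(i - 1, 0):i] + order[i + 1:i + 2]
        # Step 2: q is the closer of the two.
        q = min(candidates, key=lambda c: dist(p, c))
        r = dist(p, q)
        # Step 3: query box with center p and side length 2 * dist(p, q).
        lo = (p[0] - r, p[1] - r)
        hi = (p[0] + r, p[1] + r)
        # Step 4: the true nearest neighbor must lie inside Q(p), since q does.
        # Assumption: a linear scan replaces the cache/disk range query.
        in_box = [c for c in points if c != p
                  and lo[0] <= c[0] <= hi[0] and lo[1] <= c[1] <= hi[1]]
        result[p] = min(in_box, key=lambda c: dist(p, c))
    return result

pts = [(0, 0), (3, 3), (4, 3), (7, 7), (1, 5), (6, 1)]
nn = nearest_neighbors(pts)
```

For the point (1, 5) the Z-curve neighbors are (4, 3) and (7, 7), yet the box built from the closer one also contains (3, 3), which is the actual nearest neighbor.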
5 Sweep-Zone Algorithm 2
Basic Idea: run the algorithm forward to compute the lower (w.r.t. Z-order) nearest neighbor ν_l(p) of p, and backward to compute the upper (w.r.t. Z-order) nearest neighbor ν_h(p) of p; then ν(p) = closest of ν_l(p), ν_h(p).
I.e. modify step 4 in Sweep-Zone Algorithm 1 to compute Q(p) ∩ area(R_i).
Advantages: all pages are read in increasing or decreasing Z-order only (sequential reads), and cache requirements are smaller.
Disadvantage: data must be read twice. Tradeoff?
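The forward/backward combination can be illustrated with a small sketch. It is deliberately simplified and makes assumptions beyond the slides: the input is a list of distinct points already sorted in Z-order, each point is treated as its own region, and each pass uses a brute-force search instead of retrieving Q(p) ∩ area(R_i) from a cache. The function names are hypothetical.

```python
from math import dist

def closest(p, candidates):
    """Closest candidate to p, or None if there are no candidates."""
    return min(candidates, key=lambda c: dist(p, c), default=None)

def two_pass_nearest(order):
    """order: distinct points sorted in Z-order. Returns (nearest, lower, upper)."""
    lower, upper = {}, {}
    # Forward pass: only points earlier in Z-order have been read.
    for i, p in enumerate(order):
        lower[p] = closest(p, order[:i])
    # Backward pass: only points later in Z-order have been read.
    for i in range(len(order) - 1, -1, -1):
        p = order[i]
        upper[p] = closest(p, order[i + 1:])
    # Combine: the nearest neighbor is the closer of the lower and upper one.
    nearest = {
        p: closest(p, [c for c in (lower[p], upper[p]) if c is not None])
        for p in order
    }
    return nearest, lower, upper

order = [(0, 0), (1, 0), (0, 2), (3, 3)]   # already in Z-order
nearest, lower, upper = two_pass_nearest(order)
```

The brute-force passes are quadratic; the point of the sketch is only that each pass sees one side of the Z-order and that the two partial answers are merged per point.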
6 Cache Contents for Algorithm 2
1            | 10 9 8 7 6 5
2 1          | 11 10 6 5
3 2 1        | 12 11 10 6
4 3 2        | 13
5 4 3 2      | 14
6 5 4 3      | 15
7 6 5 4 3    | 16 15 14 12 10
8 7 6 5 4    | 17 16 15 14 12
9 8 7 6 5 4  | 18 17 16
7 Cache Modification
1. Determine the extension of the next region to be read, using the upper part of the UB-index.
2. Determine the regions that can be released, i.e. the regions of SZ_{i-1} that are no longer in SZ_i (SZ_{i-1} − SZ_i).
3. Release these regions from the cache.
4. Read the next region, i.e. transfer it from disk to cache.
8 Observations
expected cache size: 1.5 · sqrt(18) ≈ 6.4
maximal occurring cache size: 6
average cache size: 4.28
Cache Organization: keep the cache organized as a set of regions sorted in Z-order, e.g. an AVL-tree with the elementary operations "append single element" and "delete set of elements".
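A minimal sketch of such a cache, with the two elementary operations, might look as follows. Note the substitution: a plain Python list kept in increasing Z-order replaces the AVL-tree mentioned above, which is sufficient for illustration because regions arrive in increasing Z-order. The class name, method names, and the choice to pass the current sweep-zone as a set are assumptions. The usage replays the first four cache states from the cache contents slide.

```python
from math import sqrt

class SweepZoneCache:
    """Cached region numbers, kept sorted in Z-order (list instead of AVL-tree)."""

    def __init__(self):
        self.regions = []   # region numbers in increasing Z-order
        self.sizes = []     # cache size recorded after each read

    def append(self, region):
        # Regions are read in increasing Z-order, so appending keeps the order.
        assert not self.regions or region > self.regions[-1]
        self.regions.append(region)
        self.sizes.append(len(self.regions))

    def release(self, sweep_zone):
        # Delete the set of cached regions that are not in the sweep-zone.
        self.regions = [r for r in self.regions if r in sweep_zone]

cache = SweepZoneCache()
for region in (1, 2, 3):
    cache.append(region)
cache.release({2, 3})        # region 1 no longer touches the upper boundary
cache.append(4)              # cache now holds regions 2, 3, 4

expected_size = 1.5 * sqrt(18)   # expected cache size for 18 regions
```

With a balanced tree instead of a list, the release step could delete a contiguous Z-range without scanning all cached regions.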
9 Open Questions
- which algorithm is faster?
- which algorithm requires fewer resources?
- what are the tradeoffs between I/O, cache size, CPU-time, total time, etc.?
- analytic comparison of both algorithms?
10 Algorithm 3
This is a local optimization of Algorithm 2: if Q(p) ⊆ area(R_i), then ν(p) = ν_l(p), the lower nearest neighbor found in the forward phase, and we can skip the computation of the upper nearest neighbor ν_h(p) in the backward phase.
Algorithm 4
If ν(p) = ν_l(p), then discard p entirely from the backward phase, i.e. reduce the amount of data and computations for the second phase; but then we have to write out the non-discarded points.
Open Question: under what conditions is Algorithm 4 better than Algorithm 3?
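The containment test Q(p) ⊆ area(R_i) can be sketched under two assumptions that are not spelled out on the slides: coordinates are non-negative integers on a 2^bits grid, and area(R_i) is the set of all grid cells whose Z-address is at most the upper boundary B(R_i). Because the Z-address is monotone in each coordinate, the largest Z-address inside an axis-parallel box is that of its upper corner, so a single comparison suffices. All names are hypothetical.

```python
def z_address(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y (Z-order address)."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z

def box_inside_read_area(p, r, upper_boundary, bits=16):
    """True if the box with center p and side 2*r has no cell beyond upper_boundary.

    p: (x, y) integer point; r: integer half side length (dist rounded up);
    upper_boundary: Z-address of the upper boundary of the area read so far.
    """
    limit = (1 << bits) - 1
    # The upper corner of Q(p), clamped to the grid, has the largest Z-address.
    corner = (min(p[0] + r, limit), min(p[1] + r, limit))
    return z_address(corner[0], corner[1], bits) <= upper_boundary

# 8 x 8 grid; the area read so far is the lower-left quadrant (Z-addresses 0..15).
skip_backward = box_inside_read_area((1, 1), 1, 15, bits=3)    # box fits
need_backward = box_inside_read_area((3, 3), 1, 15, bits=3)    # box sticks out
```

When the test succeeds, the lower nearest neighbor is already final for p; when it fails, p still has to be considered in the backward phase.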