Title: Progressive Computation of The Min-Dist Optimal-Location Query
Slide 1: Progressive Computation of the Min-Dist Optimal-Location Query
- Donghui Zhang, Yang Du, Tian Xia, Yufei Tao
- Northeastern University
- Chinese University of Hong Kong
VLDB06, Seoul, Korea
Slide 2: Motivation
- What is the optimal location in the Boston area to build a new McDonald's store?
- Suppose each customer drives to the closest McDonald's.
- Optimality: minimize the AVG driving distance.
Slide 3: Who will be interested?
- Corporations
  - Chain restaurants (e.g. McDonald's, Burger King, Starbucks)
  - Supermarkets (e.g. Wal-Mart, Costco, Stop & Shop)
  - Location-based service providers (e.g. Verizon, AT&T)
- Computer scientists, especially in
  - Databases
  - Computational Geometry
  - Algorithms
Slide 4: min-dist OL
[Figure: four objects at distances 200, 200, 600, and 600 from their nearest sites.]
- Without any new site:
  - AD = (200 + 200 + 600 + 600)/4 = 400.
Slide 5: min-dist OL
[Figure: new site l1 added; the object distances become 30, 30, 600, and 600.]
- With new site l1:
  - AD(l1) = (30 + 30 + 600 + 600)/4 = 315.
Slide 6: min-dist OL
[Figure: new site l2 added instead; the object distances become 200, 200, 30, and 30.]
- With new site l2:
  - AD(l2) = (200 + 200 + 30 + 30)/4 = 115.
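Slide 7: Formal Definition
- Given a set S of sites, a set O of objects, and a query range Q,
- min-dist OL is a location l ∈ Q which minimizes
  - AD(l) = (1/|O|) Σ_{o ∈ O} dNN(o, S ∪ {l}), the average distance from each object to its nearest site once l is added.
- Solution: compute all AD(l). But ...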
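The definition can be checked by brute force on a tiny instance. The sketch below is only an illustration: it assumes the L1 distance used later in the talk, and the coordinates are a hypothetical layout chosen so that the averages match the 400 / 315 / 115 example from the earlier slides. The function names are not from the talk.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def d_nn(o, sites):
    """Distance from object o to its nearest site."""
    return min(l1(o, s) for s in sites)


def average_distance(objects, sites, new_site=None):
    """AD when new_site is None, otherwise AD(new_site)."""
    all_sites = list(sites) + ([new_site] if new_site is not None else [])
    return sum(d_nn(o, all_sites) for o in objects) / len(objects)


# Hypothetical layout (not taken from the slides' figure).
sites = [(0, 0)]
objects = [(-170, -30), (-170, 30), (570, -30), (570, 30)]
l1_site, l2_site = (-170, 0), (570, 0)

print(average_distance(objects, sites))           # 400.0
print(average_distance(objects, sites, l1_site))  # 315.0
print(average_distance(objects, sites, l2_site))  # 115.0
```

Every call rescans all objects and all sites, which is exactly the cost the rest of the talk tries to avoid.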
Slide 8: Challenges
- There are an infinite number of locations in Q! How to produce a finite set of candidates (while keeping optimality)?
- How to avoid computing AD(l) for all candidates?
Slide 9: Solution Highlights
- Algorithm to compute AD(l).
- Theorems to limit candidates.
- Lower bound of AD(l) for all locations l in a cell C.
- Progressive algorithm.
Slide 10: L1 Distance
Slide 11: 1. Compute AD(l)
- Let RNN(l) be the set of objects attracted by l.
- AD(l) = AD if RNN(l) = ∅.
[Figure label: RNN(l) = ∅, AD = AD(l).]
Slide 14: 1. Compute AD(l)
- S and O are static, whereas l varies.
- AD can be pre-computed.
- So can dNN(o, S) for every object o.
- To compute AD(l):
  - Find RNN(l).
  - ∀o ∈ RNN(l), compute d(o, l).
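The steps above can be sketched as follows. Only the objects in RNN(l) change their nearest site, so AD(l) = AD - (1/|O|) Σ_{o ∈ RNN(l)} (dNN(o, S) - d(o, l)); this identity follows from the definitions and is written out here as an assumption about what the slide intends. RNN(l) is found with the naive scan, and the data and function names are hypothetical.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def precompute(objects, sites):
    """Done once, because S and O do not depend on l."""
    d_nn = [min(l1(o, s) for s in sites) for o in objects]
    return d_nn, sum(d_nn) / len(objects)


def rnn(l, objects, d_nn):
    """Naive RNN(l): indices of objects strictly closer to l than to any site."""
    return [i for i, o in enumerate(objects) if l1(o, l) < d_nn[i]]


def ad_of(l, objects, d_nn, ad):
    """AD(l) = AD minus the average saving of the objects attracted by l."""
    saving = sum(d_nn[i] - l1(objects[i], l) for i in rnn(l, objects, d_nn))
    return ad - saving / len(objects)


# Hypothetical data.
sites = [(0, 0)]
objects = [(-170, -30), (-170, 30), (570, -30), (570, 30)]
d_nn, ad = precompute(objects, sites)

print(ad)                                  # 400.0
print(ad_of((570, 0), objects, d_nn, ad))  # 115.0
print(ad_of((0, 0), objects, d_nn, ad))    # 400.0, because RNN(l) is empty
```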
Slide 15: How to compute RNN(l)?
- This is an implementation detail, dealing with computational geometry and spatial databases.
- Naïve solution: ∀o ∈ O, compare with all sites and l.
- More efficient:
  - Compute the Voronoi cell of l.
  - Retrieve the objects inside the Voronoi cell using a range search on an R-tree.
Slide 16: How to compute RNN(l)? (1) Compute Voronoi cell
- Remember: RNN(l) is the set of objects closer to l than to any existing site in S.
- Consider all sites. Draw the spatial region that is closer to l than to any site.
Slide 17: How to compute RNN(l)? (2) Retrieve objects
- Standard range search.
- Any spatial access method, e.g. an R-tree.
Slide 18
[Figure: objects a to m plotted on x and y axes, each running from 0 to 10.]
- Range query: find the objects in a given range, e.g. find all hotels in Boston.
- No index: scan through all objects. NOT EFFICIENT!
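To make the contrast concrete, here is a minimal sketch of a range query answered by a linear scan and by a toy two-level tree of bounding rectangles. The tree is only a stand-in for an R-tree: a real R-tree has several levels plus insertion and node-splitting rules, none of which are modelled here. The point set, `leaf_size`, and all names are assumptions.

```python
from dataclasses import dataclass, field


def contains(box, p):
    """True if point p lies inside box = (x_lo, y_lo, x_hi, y_hi)."""
    x_lo, y_lo, x_hi, y_hi = box
    return x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi


def intersects(a, b):
    """True if rectangles a and b overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(points):
    """Minimum bounding rectangle of a non-empty list of points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


@dataclass
class Node:
    box: tuple
    children: list = field(default_factory=list)  # child nodes (internal node)
    points: list = field(default_factory=list)    # data points (leaf node)


def build(points, leaf_size=3):
    """Toy bulk load: sort by x, cut into leaves, put one root above them."""
    pts = sorted(points)
    leaves = [Node(mbr(pts[i:i + leaf_size]), points=pts[i:i + leaf_size])
              for i in range(0, len(pts), leaf_size)]
    return Node(mbr(pts), children=leaves)


def range_search(node, query):
    """Visit only the subtrees whose bounding rectangle overlaps the query."""
    if not intersects(node.box, query):
        return []  # the whole subtree is skipped
    hits = [p for p in node.points if contains(query, p)]
    for child in node.children:
        hits.extend(range_search(child, query))
    return hits


def linear_scan(points, query):
    """No index: test every object."""
    return [p for p in points if contains(query, p)]


points = [(1, 2), (2, 7), (3, 4), (4, 9), (5, 1), (6, 6), (7, 3), (8, 8), (9, 5)]
query = (2, 3, 6, 8)
tree = build(points)
print(range_search(tree, query))  # [(2, 7), (3, 4), (6, 6)]
```

Both functions return the same objects; the tree version never opens the leaf whose rectangle lies entirely to the right of the query.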
Slides 22-24
[Figure, shown three times: the objects a to m on the same x and y axes, grouped into rectangles E1 to E7, next to the corresponding R-tree with a Root node, entries E1 to E7, and the objects a to m in the leaves.]
Slides 25-26: 2. Limit candidates
- Theorem: within the X/Y range of Q, draw grid lines crossing the objects. Only the intersections need to be considered!
[Figure: query range Q with the grid lines; 5 × 6 = 30 candidates.]
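A minimal sketch of deriving the candidates, under two assumptions that the transcript does not state: the L1 distance is used, and the edges of Q are added as extra grid lines so that every location in Q lies between two lines. The data and function names are hypothetical.

```python
from itertools import product


def l1(a, b):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_distance(l, objects, sites):
    """Brute-force AD(l): every object goes to the nearer of l and its nearest site."""
    total = sum(min(min(l1(o, s) for s in sites), l1(o, l)) for o in objects)
    return total / len(objects)


def candidate_locations(objects, q):
    """Intersections of the grid lines through objects inside Q's X/Y range."""
    x_lo, y_lo, x_hi, y_hi = q
    # Vertical lines: x-coordinates of objects within Q's x-range, plus Q's edges.
    xs = sorted({x_lo, x_hi} | {x for x, _ in objects if x_lo <= x <= x_hi})
    # Horizontal lines: y-coordinates of objects within Q's y-range, plus Q's edges.
    ys = sorted({y_lo, y_hi} | {y for _, y in objects if y_lo <= y <= y_hi})
    return list(product(xs, ys))


# Hypothetical data.
sites = [(0, 0), (10, 6)]
objects = [(1, 5), (3, 2), (4, 4), (6, 1), (9, 3)]
q = (2, 0, 7, 6)

candidates = candidate_locations(objects, q)
best = min(candidates, key=lambda l: average_distance(l, objects, sites))
print(len(candidates))  # 35 (5 vertical lines x 7 horizontal lines)
```

The infinite set of locations in Q is reduced to a finite list, but every candidate still needs its own AD(l) computation.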
Slide 27: 2. Limit candidates
- Proof idea: suppose the OL is not at an intersection; then moving it to a grid line produces a better (or equal) result.
- Move to the right → saves total dist.
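One way to spell out the proof idea, assuming the L1 distance and treating the edges of Q as grid lines. The shorthand d_o for dNN(o, S) is introduced here only for brevity. Fix the y-coordinate of l and vary its x-coordinate between two neighbouring vertical lines x_a < x_b:

```latex
% Total distance as a function of the x-coordinate of l, for a fixed y:
f(x) = \sum_{o \in O} \min\bigl( d_o,\ |x - o_x| + |y - o_y| \bigr)

% No object has x_a < o_x < x_b, so every |x - o_x| is linear on [x_a, x_b].
% Each term is the minimum of a constant and a linear function, hence concave,
% and a sum of concave functions is concave. Therefore
\min_{x \in [x_a, x_b]} f(x) = \min\{ f(x_a),\ f(x_b) \}
```

So sliding l horizontally to one of the two lines never increases the total distance, and the same argument applied to y moves it onto an intersection.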
Slide 28: 2. VCU(Q)
- A spatial region enclosing the objects that are closer to Q than to the sites in S.
- It's the Voronoi cell of Q versus the sites in S.
[Figure: query range Q.]
Slide 29: 2. Further Limit candidates
- Only consider objects in VCU(Q).
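A minimal sketch of this filter, assuming the L1 distance and an axis-parallel rectangle Q. An object is kept when its distance to Q is smaller than its distance to its nearest site; any other object cannot be attracted by a location inside Q. The data and function names are hypothetical, and the membership test is a per-object check rather than an explicit construction of the region.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def dist_to_rect(o, q):
    """L1 distance from point o to rectangle q = (x_lo, y_lo, x_hi, y_hi)."""
    x_lo, y_lo, x_hi, y_hi = q
    dx = max(x_lo - o[0], 0, o[0] - x_hi)
    dy = max(y_lo - o[1], 0, o[1] - y_hi)
    return dx + dy


def objects_in_vcu(objects, sites, q):
    """Objects strictly closer to Q than to every site in S."""
    return [o for o in objects
            if dist_to_rect(o, q) < min(l1(o, s) for s in sites)]


# Hypothetical data.
sites = [(0, 0), (10, 6)]
objects = [(1, 5), (3, 2), (10, 5), (0, 1)]
q = (2, 0, 7, 6)

print(objects_in_vcu(objects, sites, q))  # [(1, 5), (3, 2)]
```

Here (10, 5) and (0, 1) are dropped because each is already at distance 1 from a site, closer than any point of Q can be.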
Slide 32: Naïve Algorithm
- Derive candidates.
- Compute AD(l) for each.
- Pick the smallest.
- Not efficient! Too many candidates! To compute AD(l) for each one, we need to
  - compute RNN(l)
  - retrieve all these objects
Slides 33-37: Progressive Idea
- Treat Q as a cell and consider its corners.
- Recursively divide a sub-cell.
- Able to check all candidates.
Slide 38: Progressive Idea
- Cell pruning: a cell can be pruned if its lower bound ≥ AD(l0) of some candidate l0.
[Figure: suppose 60 is a lower bound for AD(l) for every location l in a cell.]
Slides 39-40: 3. LB(C): lower bound for AD(l), ∀l ∈ C
[Figure: a cell with corners c1 to c4, where AD(c1) = 1000, AD(c2) = 3000, AD(c3) = 4000, AD(c4) = 2500.]
- [Formula not transcribed] is a lower bound, where p is the perimeter.
Slide 41: 3. LB(C): lower bound for AD(l), ∀l ∈ C
- A better lower bound: Theorem [formula not transcribed].
- Compared with the previous lower bound:
  - Higher quality, since the lower bound is larger.
  - More computation.
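The formulas themselves are missing from this transcript, so the sketch below does not reproduce the authors' bounds. It only illustrates the kind of bound that can be built from corner values and the perimeter p, using two facts that hold under the L1 distance: AD(l) changes by at most d(l, l') when l moves to l', and any location in a cell is at most p/2 from a corner, with its distances to two opposite corners summing to exactly p/2. All names and data are assumptions.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_distance(l, objects, sites):
    """Brute-force AD(l)."""
    total = sum(min(min(l1(o, s) for s in sites), l1(o, l)) for o in objects)
    return total / len(objects)


def lower_bounds(cell, objects, sites):
    """Two illustrative lower bounds on AD(l) for every l in the cell."""
    x_lo, y_lo, x_hi, y_hi = cell
    p = 2 * ((x_hi - x_lo) + (y_hi - y_lo))  # perimeter of the cell
    # c1/c3 and c2/c4 are the two pairs of diagonally opposite corners.
    corners = [(x_lo, y_lo), (x_hi, y_lo), (x_hi, y_hi), (x_lo, y_hi)]
    a1, a2, a3, a4 = (average_distance(c, objects, sites) for c in corners)
    # Any l is within p/2 of each corner, so AD(l) >= AD(c_i) - p/2.
    simple = max(a1, a2, a3, a4) - p / 2
    # d(l, c1) + d(l, c3) = p/2, so AD(l) >= (AD(c1) + AD(c3))/2 - p/4.
    diagonal = max((a1 + a3) / 2, (a2 + a4) / 2) - p / 4
    return simple, diagonal


# Hypothetical data.
sites = [(0, 0)]
objects = [(-170, -30), (-170, 30), (570, -30), (570, 30)]
cell = (500, -100, 700, 100)

print(lower_bounds(cell, objects, sites))  # (-185.0, 0.0)
```

Both bounds are valid here (the best AD inside this cell is 115), and the second is never smaller than the first, but both are loose for a large cell. They tighten as the perimeter shrinks, which is why repeated partitioning helps.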
Slide 42: 4. The Progressive Algorithm
1. Maintain a heap of cells ordered by LB(). Initially one cell: Q.
2. Maintain the best candidate lopt.
3. Pick the cell with minimum LB() and partition it.
4. Compute AD() for the corners of the sub-cells.
5. Compute LB() for the sub-cells.
6. Insert sub-cell ci into the heap if LB(ci) < AD(lopt).
7. Goto 3.
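The loop above can be sketched as follows. Several choices are assumptions made for illustration, not details taken from the talk: cells are index ranges over the candidate grid lines (with the edges of Q added as lines), each partition is a 2 × 2 split, AD is computed by brute force, and LB() is a simple diagonal-corner bound valid under the L1 distance.

```python
import heapq
from itertools import product


def l1(a, b):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def make_ad(objects, sites):
    """Return a function l -> AD(l), with dNN(o, S) precomputed once."""
    d_nn = [min(l1(o, s) for s in sites) for o in objects]
    return lambda l: sum(min(d, l1(o, l)) for o, d in zip(objects, d_nn)) / len(objects)


def progressive_ol(objects, sites, q):
    x_lo, y_lo, x_hi, y_hi = q
    xs = sorted({x_lo, x_hi} | {o[0] for o in objects if x_lo <= o[0] <= x_hi})
    ys = sorted({y_lo, y_hi} | {o[1] for o in objects if y_lo <= o[1] <= y_hi})
    ad, cache = make_ad(objects, sites), {}
    best = [float("inf"), None]  # [AD(lopt), lopt]

    def ad_at(i, j):
        if (i, j) not in cache:
            cache[(i, j)] = ad((xs[i], ys[j]))
        return cache[(i, j)]

    def visit(cell):
        """Evaluate the corners, update lopt, and return LB(cell)."""
        i0, i1, j0, j1 = cell
        for i, j in product((i0, i1), (j0, j1)):
            if ad_at(i, j) < best[0]:
                best[0], best[1] = ad_at(i, j), (xs[i], ys[j])
        quarter_p = ((xs[i1] - xs[i0]) + (ys[j1] - ys[j0])) / 2  # p/4
        diag_a = (ad_at(i0, j0) + ad_at(i1, j1)) / 2
        diag_b = (ad_at(i0, j1) + ad_at(i1, j0)) / 2
        return max(diag_a, diag_b) - quarter_p

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    heap = [(visit(root), root)]  # step 1: one cell, Q
    while heap:
        lb, (i0, i1, j0, j1) = heapq.heappop(heap)  # step 3: minimum LB()
        if lb >= best[0]:
            break  # every remaining cell has an LB at least this large
        im, jm = (i0 + i1) // 2, (j0 + j1) // 2
        i_parts = [(i0, im), (im, i1)] if i1 - i0 > 1 else [(i0, i1)]
        j_parts = [(j0, jm), (jm, j1)] if j1 - j0 > 1 else [(j0, j1)]
        if len(i_parts) == 1 and len(j_parts) == 1:
            continue  # no candidates besides the corners, already evaluated
        for (a, b), (c, d) in product(i_parts, j_parts):
            sub_lb = visit((a, b, c, d))      # steps 4 and 5
            if sub_lb < best[0]:              # step 6
                heapq.heappush(heap, (sub_lb, (a, b, c, d)))
    return best[1], best[0]


# Hypothetical data.
sites = [(0, 0)]
objects = [(-170, -30), (-170, 30), (570, -30), (570, 30)]
location, value = progressive_ol(objects, sites, (-300, -100, 700, 100))
print(value)  # 115.0
```

At any point during the loop, the smallest LB() in the heap and AD(lopt) bracket the optimal value, so stopping early still yields a candidate with a known error range.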
Slides 43-45: Progressiveness
- The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining.
[Figure: plot over Time of AD(best candidate) and of the lower bound, labelled LB(Q) on the first version of the plot and min LB(C), C in heap, on the second.]
- The user may choose to terminate at any time.
Slide 46: Batch Partitioning
- To partition a cell, we should partition it into multiple sub-cells.
- Reason: to compute AD(l), we need to access the R-tree of objects. When accessing the R-tree, we want to compute multiple AD(l) values.
- Tradeoff: partitioning too much is wasteful, since some candidates could have been pruned.
Slide 47: Performance Setup
- O: 123,593 postal addresses in the Northeastern part of the US, stored using an R-tree.
- S: 100 sites randomly selected from O.
- Buffer: 128 pages.
- Dell Pentium IV 3.2GHz.
- Query size: 1 in each dimension.
Slide 49: Effect of VCU Computation
Slide 52: Comparison of Lower Bounds
Slide 53: Effect of Batch Partitioning
Slide 55: Progressiveness
- Each step partitions a cell into 40 sub-cells.
- After 200 steps: accurate answer.
- After 20 steps: the answer is 1 away from optimal.
Slide 56: Conclusions
- Introduced the min-dist optimal-location query.
- Proved theorems to limit the number of candidates.
- Presented lower-bound estimators.
- Proposed a progressive algorithm.