Progressive Computation of The Min-Dist Optimal-Location Query - PowerPoint PPT Presentation

Title: Progressive Computation of The Min-Dist Optimal-Location Query


1
Progressive Computation of The Min-Dist
Optimal-Location Query
  • Donghui Zhang,
  • Yang Du, Tian Xia, Yufei Tao
  • Northeastern University
  • Chinese University of Hong Kong

VLDB06, Seoul, Korea
2
Motivation
  • What is the optimal location in the Boston area to
    build a new McDonald's store?
  • Suppose each customer drives to the closest
    McDonald's.
  • Optimality: minimize the average (AVG) driving distance.

3
Who will be interested?
  • Corporations
    • Chain restaurants (e.g. McDonald's, Burger King, Starbucks)
    • Supermarkets (e.g. Wal-Mart, Costco, Stop & Shop)
    • Location-based service providers (e.g. Verizon, AT&T)
  • Computer scientists, especially in
    • Databases
    • Computational Geometry
    • Algorithms

4
min-dist OL
(Figure: four customers whose distances to the nearest existing site are 600, 200, 200 and 600.)
  • Without any new site:
  • AD = (200 + 200 + 600 + 600)/4 = 400.

5
min-dist OL
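(Figure: new site l1 placed so that two customers are at distance 30; the other two remain at distance 600.)
  • Without any new site:
  • AD = (200 + 200 + 600 + 600)/4 = 400.
  • With new site l1:
  • AD(l1) = (30 + 30 + 600 + 600)/4 = 315.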

6
min-dist OL
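(Figure: new site l2 placed so that two customers are at distance 30; the other two remain at distance 200.)
  • Without any new site:
  • AD = (200 + 200 + 600 + 600)/4 = 400.
  • With new site l1:
  • AD(l1) = (30 + 30 + 600 + 600)/4 = 315.
  • With new site l2:
  • AD(l2) = (200 + 200 + 30 + 30)/4 = 115.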
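
The arithmetic on these three slides can be checked with a few lines of Python. This is only a sketch: the distance lists are read off the slide figures, and the helper name `average_distance` is made up for illustration.

```python
def average_distance(distances):
    """Average distance from each customer to its nearest store."""
    return sum(distances) / len(distances)

# Customer-to-nearest-store distances as shown on the slides.
no_new_site = [200, 200, 600, 600]
with_l1 = [30, 30, 600, 600]
with_l2 = [200, 200, 30, 30]

print(average_distance(no_new_site))  # 400.0
print(average_distance(with_l1))      # 315.0
print(average_distance(with_l2))      # 115.0
```

Of the two options, l2 is the better location because it attracts the customers who were previously farthest from any store.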

7
Formal Definition
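  • Given a set S of sites, a set O of objects, and a
    query range Q,
  • the min-dist OL is a location l ∈ Q which minimizes
    AD(l) = (1/|O|) Σ_{o ∈ O} d(o, NN(o, S ∪ {l})),
    the average distance from each object to its
    nearest site once l is added.
  • Solution: compute AD(l) for all l. But...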

8
Challenges
  1. There are infinitely many locations in Q! How
    to produce a finite set of candidates (while
    keeping optimality)?
  2. How to avoid computing AD(l) for all candidates?

9
Solution Highlights
  1. Algorithm to compute AD(l).
  2. Theorems to limit candidates.
  3. Lower-bound of AD(l) for all locations l in a
    cell C.
  4. Progressive algorithm.

10
L1 Distance
  • d(o, s) = |o.x − s.x| + |o.y − s.y|
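
A minimal sketch of the L1 distance and a brute-force AD(l), assuming AD(l) is the average over all objects of the distance to the nearest site in S ∪ {l}. The function names and the toy coordinates are illustrative and do not come from the slides.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def average_distance(objects, sites, new_site=None):
    """Brute force: mean L1 distance from each object to its nearest site.

    If new_site is given, it competes with the existing sites.
    """
    all_sites = list(sites) + ([new_site] if new_site is not None else [])
    return sum(min(l1(o, s) for s in all_sites) for o in objects) / len(objects)

# Toy data, made up for illustration.
objects = [(0, 0), (4, 0), (10, 0), (10, 6)]
sites = [(2, 0)]

print(average_distance(objects, sites))           # 6.5
print(average_distance(objects, sites, (10, 3)))  # 2.5
```

This version compares every object with every site, so it is only useful as a reference for checking faster methods.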

11
1. Compute AD(l)
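  • Remember: AD(l) is the average distance from each
    object to its nearest site in S ∪ {l}.
  • Let RNN(l) be the objects attracted by l.
  • AD(l) = AD if RNN(l) = ∅.

(Figure: RNN(l) = ∅, so AD = AD(l).)
12
1. Compute AD(l)
  • Remember: AD(l) is the average distance from each
    object to its nearest site in S ∪ {l}.
  • Define: AD is the average distance without any new
    site, i.e. the average of dNN(o, S) over all o ∈ O.
  • Let RNN(l) be the objects attracted by l.
  • AD(l) = AD if RNN(l) = ∅.

(Figure: location l.)
13
1. Compute AD(l)
  • Remember: AD(l) is the average distance from each
    object to its nearest site in S ∪ {l}.
  • Define: AD is the average distance without any new
    site, i.e. the average of dNN(o, S) over all o ∈ O.
  • Let RNN(l) be the objects attracted by l.
  • AD(l) = AD if RNN(l) = ∅.
  • AD(l) = AD − ? otherwise.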

14
1. Compute AD(l)
  • Theorem: AD(l) = AD − (1/|O|) Σ_{o ∈ RNN(l)} (dNN(o, S) − d(o, l)).
  • S and O do not depend on l.
  • AD can be pre-computed.
  • So can dNN(o, S).
  • To compute AD(l):
    • Find RNN(l).
    • ∀o ∈ RNN(l), compute d(o, l).
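
The steps above can be sketched as follows, assuming the theorem has the form AD(l) = AD − (1/|O|) Σ over o ∈ RNN(l) of (dNN(o, S) − d(o, l)). RNN(l) is found here by a naive scan, and all names and data are illustrative.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def precompute(objects, sites):
    """Return (d_nn, ad): per-object nearest-site distance and its average."""
    d_nn = [min(l1(o, s) for s in sites) for o in objects]
    return d_nn, sum(d_nn) / len(objects)

def rnn(objects, d_nn, l):
    """Naive RNN(l): indices of objects strictly closer to l than to any site."""
    return [i for i, o in enumerate(objects) if l1(o, l) < d_nn[i]]

def ad_with_new_site(objects, d_nn, ad, l):
    """AD(l) = AD minus the average saving of the objects attracted by l."""
    saving = sum(d_nn[i] - l1(objects[i], l) for i in rnn(objects, d_nn, l))
    return ad - saving / len(objects)

# Toy data, made up for illustration.
objects = [(0, 0), (4, 0), (10, 0), (10, 6)]
sites = [(2, 0)]
d_nn, ad = precompute(objects, sites)

print(ad)                                            # 6.5
print(rnn(objects, d_nn, (10, 3)))                   # [2, 3]
print(ad_with_new_site(objects, d_nn, ad, (10, 3)))  # 2.5
```

Only the objects in RNN(l) contribute to the update, which is why the cost of evaluating a candidate depends on how quickly RNN(l) can be retrieved.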

15
How to compute RNN(l)?
  • This is an implementation detail, dealing with
    computational geometry and spatial databases.
  • Naïve solution: ∀o ∈ O, compare with all sites
    and l.
  • More efficient:
    • Compute the Voronoi cell of l.
    • Retrieve the objects inside the Voronoi cell using a
      range search on an R-tree.

16
How to compute RNN(l)?(1) Compute Voronoi cell
  • Remember: RNN(l) is the set of objects closer to l
    than to any existing site in S.
  • Consider all sites. Draw the spatial region closer
    to l than to any site.

17
How to compute RNN(l)?(2) Retrieve objects
  • Standard range search.
  • Any spatial access method works, e.g. an R-tree.

18
(Figure: objects a through m plotted on x and y axes running from 0 to 10.)
Range query: find the objects in a given
range. E.g. find all hotels in Boston. With no index,
scan through all objects. NOT EFFICIENT!
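
For reference, the index-free range query described on this slide amounts to a linear scan. The sketch below uses hypothetical coordinates, since the exact positions in the figure are not recoverable, and the function name is illustrative.

```python
def range_query_scan(objects, x_lo, y_lo, x_hi, y_hi):
    """Return the names of all objects inside the rectangle, checking each one."""
    return [name for name, (x, y) in objects.items()
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]

# Hypothetical coordinates for a few of the objects.
objects = {"a": (1, 3), "b": (2, 4), "c": (3, 1), "d": (4, 5), "e": (6, 6)}

print(range_query_scan(objects, 0, 2, 4, 5))  # ['a', 'b', 'd']
```

Every query touches every object, which is the cost a spatial index such as an R-tree is meant to avoid.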
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(Figure: objects a through m on x and y axes from 0 to 10, enclosed by bounding rectangles E1 and E2, next to an R-tree diagram with Root, entries E1 and E2, nodes E3 to E7, and the objects a through m.)
23
(Figure: objects a through m on x and y axes from 0 to 10, enclosed by bounding rectangles E1 and E2, next to an R-tree diagram with Root, entries E1 and E2, nodes E3 to E7, and the objects a through m.)
24
(Figure: objects a through m on x and y axes from 0 to 10, enclosed by bounding rectangles E1 and E2, next to an R-tree diagram with Root, entries E1 and E2, nodes E3 to E7, and the objects a through m.)
25
2. Limit candidates
  • Theorem: within the X/Y range of Q, draw grid
    lines through the objects. Only the intersections
    need to be considered!

(Figure: query range Q.)
26
2. Limit candidates
  • Theorem: within the X/Y range of Q, draw grid
    lines through the objects. Only the intersections
    need to be considered!

(Figure: query range Q with grid lines; 5 × 6 = 30 candidates.)
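
One way to enumerate the candidates is sketched below, assuming each object whose x-coordinate falls in Q's x-range contributes a vertical line and each object whose y-coordinate falls in Q's y-range contributes a horizontal line. The `include_borders` flag, which also adds Q's own boundary lines, is an assumption not stated on the slide. The toy data is chosen to give the same 5 × 6 = 30 count.

```python
from itertools import product

def candidate_locations(objects, q, include_borders=False):
    """Grid intersections inside the query range q = (x_lo, y_lo, x_hi, y_hi)."""
    x_lo, y_lo, x_hi, y_hi = q
    xs = {x for x, _ in objects if x_lo <= x <= x_hi}  # vertical grid lines
    ys = {y for _, y in objects if y_lo <= y <= y_hi}  # horizontal grid lines
    if include_borders:  # assumption: optionally add the border of Q itself
        xs |= {x_lo, x_hi}
        ys |= {y_lo, y_hi}
    return sorted(product(xs, ys))

# Toy data, made up for illustration.
objects = [(1, 7), (2, 1), (3, 9), (4, 3), (5, 5), (9, 2), (0, 4), (8, 6)]
q = (1, 1, 5, 6)

print(len(candidate_locations(objects, q)))  # 30
```

The number of candidates grows with the product of the two line counts, which motivates both the VCU(Q) filter and the pruning that follow.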
27
2. Limit candidates
  • Proof idea: suppose the OL is not at an intersection;
    moving it to one produces a better (or equal) result.
  • Consider RNN(l).
  • Move to the right → saves total distance.

28
2. VCU(Q)
  • A spatial region, enclosing the objects closer to
    Q than to sites in S.
  • It's the Voronoi cell of Q with respect to the sites in S.

Q
29
2. Further Limit candidates
  • Only consider objects in VCU(Q).

30
2. Further Limit candidates
  • Only consider objects in VCU(Q).

31
2. Further Limit candidates
  • Only consider objects in VCU(Q).

32
Naïve Algorithm
  • Derive candidates.
  • Compute AD(l) for each.
  • Pick smallest.
  • Not efficient! Too many candidates! To compute
    AD(l) for each one, we need to:
    • compute RNN(l)
    • retrieve all these objects

33
Progressive Idea
  • Treat Q as a cell and consider its corners.

34
Progressive Idea
  • Divide the cell.

35
Progressive Idea
  • Divide the cell.

36
Progressive Idea
  • Recursively divide a sub-cell.

37
Progressive Idea
  • Recursively divide a sub-cell.
  • Able to check all candidates.

38
Progressive Idea
  • Q: What do you save?
  • A: Cell pruning, if its lower bound ≥ AD(l0) of
    some candidate l0.

(Figure: suppose 60 is a lower bound for AD(l) for all l in the cell.)
39
3. LB(C): lower bound for AD(l), ∀l ∈ C
(Figure: cell C with corners c1 to c4.)
AD(c1) = 1000
AD(c2) = 3000
AD(c3) = 4000
AD(c4) = 2500
40
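3. LB(C): lower bound for AD(l), ∀l ∈ C
(Figure: cell C with corners c1 to c4.)
AD(c1) = 1000
AD(c2) = 3000
AD(c3) = 4000
AD(c4) = 2500
  • Theorem: LB(C) = max{(AD(c1) + AD(c4))/2, (AD(c2) + AD(c3))/2} − p/4
    is a lower bound, where p is the perimeter of C.
  • e.g. LB(C) = 3500 − p/4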
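
The example can be reproduced with a short function, assuming the theorem takes the larger of the two diagonal-corner averages and subtracts p/4, with (c1, c4) and (c2, c3) as the diagonal pairs. That reading is consistent with the 3500 − p/4 shown on the slide. The perimeter value below is hypothetical.

```python
def lower_bound(ad_c1, ad_c2, ad_c3, ad_c4, perimeter):
    """LB(C): larger diagonal average of corner AD values, minus perimeter / 4.

    Assumes (c1, c4) and (c2, c3) are the two pairs of opposite corners.
    """
    return max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2) - perimeter / 4

# Corner values from the slide; perimeter = 400 is a made-up value.
print(lower_bound(1000, 3000, 4000, 2500, perimeter=400))  # 3400.0
```

The bound follows from the L1 metric: for any l inside C, the distances from l to two opposite corners add up to p/2, and moving a location by distance d changes AD by at most d.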

41
3. LB(C): lower bound for AD(l), ∀l ∈ C
  • A better lower bound: Theorem
  • Compared with the previous lower bound:
    • Higher quality, since the lower bound is larger.
    • More computation.

42
4. The Progressive Algorithm
  1. Maintain a heap of cells ordered by LB().
    Initially one cell: Q.
  2. Maintain the best candidate lopt.
  3. Pick the cell with minimum LB() and partition it.
  4. Compute AD() for the corners of the sub-cells.
  5. Compute LB() for the sub-cells.
  6. Insert sub-cell ci into the heap if LB(ci) < AD(lopt).
  7. Go to 3.
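
The loop above can be sketched on a discrete candidate grid. This is a simplified illustration under several assumptions, not the authors' implementation: AD is computed by brute force rather than through an R-tree, each cell is split into at most four sub-cells, the lower bound is the corner-based one (larger diagonal average minus p/4), and cells with no unchecked candidates are never pushed. All names are made up.

```python
import heapq

def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def progressive_ol(objects, sites, xs, ys):
    """Best candidate on the grid xs x ys (sorted grid-line coordinates)."""
    d_nn = [min(l1(o, s) for s in sites) for o in objects]
    cache = {}

    def ad(i, j):  # AD at grid point (xs[i], ys[j]), memoized
        if (i, j) not in cache:
            l = (xs[i], ys[j])
            total = sum(min(d, l1(o, l)) for o, d in zip(objects, d_nn))
            cache[i, j] = total / len(objects)
        return cache[i, j]

    def corners(cell):
        i0, i1, j0, j1 = cell
        return [(i0, j0), (i0, j1), (i1, j0), (i1, j1)]

    def lb(cell):  # corner-based lower bound for every location in the cell
        i0, i1, j0, j1 = cell
        p = 2 * ((xs[i1] - xs[i0]) + (ys[j1] - ys[j0]))
        return max(ad(i0, j0) + ad(i1, j1), ad(i0, j1) + ad(i1, j0)) / 2 - p / 4

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    best = min(corners(root), key=lambda c: ad(*c))
    heap = [(lb(root), root)]
    while heap:
        bound, (i0, i1, j0, j1) = heapq.heappop(heap)
        if bound >= ad(*best):
            break  # no remaining cell can contain a better candidate
        im, jm = (i0 + i1) // 2, (j0 + j1) // 2
        i_parts = [(i0, im), (im, i1)] if i1 - i0 > 1 else [(i0, i1)]
        j_parts = [(j0, jm), (jm, j1)] if j1 - j0 > 1 else [(j0, j1)]
        for il, ih in i_parts:
            for jl, jh in j_parts:
                sub = (il, ih, jl, jh)
                best = min([best] + corners(sub), key=lambda c: ad(*c))
                # Push only cells that still hold unchecked candidates.
                if (ih - il > 1 or jh - jl > 1) and lb(sub) < ad(*best):
                    heapq.heappush(heap, (lb(sub), sub))
    return (xs[best[0]], ys[best[1]]), ad(*best)

# Toy data, made up for illustration.
objects = [(1, 1), (2, 5), (4, 2), (6, 6), (7, 3), (9, 8)]
sites = [(0, 0)]
xs = sorted({x for x, _ in objects})
ys = sorted({y for _, y in objects})
print(progressive_ol(objects, sites, xs, ys))
```

At any point during the loop, `ad(*best)` and the smallest bound in the heap bracket the optimal value, so the search can be stopped early with a known gap.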

43
Progressiveness
  • The algorithm quickly reports a candidate OL with
    a confidence interval, and keeps refining.

44
Progressiveness
  • The algorithm quickly reports a candidate OL with
    a confidence interval, and keeps refining.

(Figure: AD(best candidate) and LB(Q) plotted against time.)
45
Progressiveness
  • The algorithm quickly reports a candidate OL with
    a confidence interval, and keeps refining.

(Figure: AD(best candidate) and min LB(C) over cells C in the heap, plotted against time.)
  • The user may choose to terminate at any time.

46
Batch Partitioning
  • When partitioning a cell, partition it into
    multiple sub-cells at once.
  • Reason: computing AD(l) requires accessing the
    R-tree of objects. Each R-tree access should
    serve multiple AD(l) computations.
  • Tradeoff: partitioning too much is wasteful, since
    some candidates could have been pruned.

47
Performance Setup
  • O: 123,593 postal addresses in the northeastern part
    of the US, stored in an R-tree.
  • S: 100 sites randomly selected from O.
  • Buffer: 128 pages.
  • Dell Pentium IV 3.2GHz.
  • Query size: 1 in each dimension.

48
2. Further Limit candidates
  • review slide
  • Only consider objects in VCU(Q).

49
Effect of VCU Computation
50
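3. LB(C): lower bound for AD(l), ∀l ∈ C
  • review slide

(Figure: cell C with corners c1 to c4.)
AD(c1) = 1000
AD(c2) = 3000
AD(c3) = 4000
AD(c4) = 2500
  • Theorem: LB(C) = max{(AD(c1) + AD(c4))/2, (AD(c2) + AD(c3))/2} − p/4
    is a lower bound, where p is the perimeter of C.
  • e.g. LB(C) = 3500 − p/4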

51
3. LB(C): lower bound for AD(l), ∀l ∈ C
  • review slide
  • A better lower bound: Theorem
  • Compared with the previous lower bound:
    • Higher quality, since the lower bound is larger.
    • More computation.

52
Comparison of Lower Bounds
53
Effect of Batch Partitioning
54
Progressiveness
  • review slide
  • The algorithm quickly reports a candidate OL with
    a confidence interval, and keeps refining.
  • The user may choose to terminate at any time.

55
Progressiveness
  • Each step partitions a cell into 40 sub-cells.
  • After 200 steps, the answer is accurate.
  • After 20 steps, the answer is 1 away from optimal.

56
Conclusions
  • Introduced the min-dist optimal-location query.
  • Proved theorems to limit the number of
    candidates.
  • Presented lower-bound estimators.
  • Proposed a progressive algorithm.
  • Q & A...