Title: On Computing Top-t Influential Spatial Sites
- Authors: T. Xia, D. Zhang, E. Kanoulas, Y. Du
- Northeastern University, USA
- Appeared in VLDB 2005
- Presenter: Xiangyuan Dai
Outline
- Problem Definition
- Related Work
- The New Metric minExistDNN
- Data Structures and Algorithm
- Experimental Results
- Conclusions
Motivation
- Which candidate position for a new McDonald's in Central and Western is the most influential among residential buildings?
- Sites: candidate positions of the new McDonald's
- Objects: residential buildings
- Weight: the number of people in a building
- Query region: Central and Western District
- Which wireless station in Hong Kong is the most
influential among mobile users?
Problem Definition
- Given
- a set of sites S
- a set of weighted objects O
- a spatial region Q
- an integer t.
- Top-t most influential sites query: find the t sites in Q with the largest influences.
- Influence of a site s: the total weight of the objects that consider s as their nearest site.
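As a reference for the definition above, here is a brute-force Python sketch (not the paper's algorithm) that assigns every object to its nearest site, sums the weights, and keeps the t heaviest sites inside Q. The tuple layouts and the names `inside` and `top_t_influential` are illustrative assumptions; ties for the nearest site are broken arbitrarily.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Illustrative sketch; names and tuple layouts are assumptions, not from the paper.
Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)


def inside(p: Point, q: Rect) -> bool:
    """True if point p lies in the axis-aligned query rectangle q."""
    return q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]


def top_t_influential(sites: Dict[str, Point],
                      objects: List[Tuple[Point, float]],
                      q: Rect, t: int) -> List[Tuple[str, float]]:
    # influence(s) = total weight of the objects whose nearest site is s
    influence = {name: 0.0 for name in sites}
    for loc, weight in objects:
        nearest = min(sites, key=lambda name: math.dist(loc, sites[name]))
        influence[nearest] += weight
    # Q restricts which sites may be returned, not which objects count.
    in_q = [(name, w) for name, w in influence.items() if inside(sites[name], q)]
    return heapq.nlargest(t, in_q, key=lambda item: item[1])


sites = {"s1": (0.0, 0.0), "s2": (10.0, 0.0)}
objects = [((1.0, 0.0), 1.0), ((2.0, 1.0), 1.0), ((9.0, 0.0), 2.5)]
print(top_t_influential(sites, objects, (-1.0, -1.0, 11.0, 1.0), 1))  # [('s2', 2.5)]
```

Shrinking Q to (-1, -1, 5, 1) excludes s2, so the same call returns s1 with influence 2.0, even though the object at (9, 0) still counts toward s2's influence.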
Example
- Suppose all objects have weight 1, Q is the whole space, and t = 1.
- The most influential site is s1, with influence 3.
Example
[Figure: sites s1-s4 and objects o1-o6]
- Now let Q be the shaded rectangle and t = 2.
- Top-2 most influential sites: s4 and s2.
Related Work
- The bichromatic RNN query considers two datasets: sites and objects.
- The RNNs of a site $s \in S$ are the objects that consider s as their nearest site.
Related Work
- Solutions to the RNN query based on pre-computation [KM00, YL01].
Related Work
- Solution to the RNN query based on the Voronoi diagram [SRAE01].
- Compute the Voronoi cell of s: the region enclosing the locations closer to s than to any other site.
- Query the object R-tree using the Voronoi cell.
Related Work [SRAE01]
[Figure: sites s1-s4 and objects o1-o6, illustrating the approach of SRAE01]
Our Problem vs. RNN Query
- RNN query
- A single site as an input.
- Interested in the actual set of the RNNs.
- Top-t most influential sites query
- A spatial region as an input.
- Interested in the aggregate weight of RNNs.
Straightforward Solution 1
- For each site, pre-compute its influence.
- At query time, find the sites in Q and return the t sites with the largest influences.
- Drawback 1: costly maintenance upon updates.
- Drawback 2: it binds a set of sites tightly to a set of objects.
Straightforward Solution 2
- An extension of the Voronoi-diagram-based solution to the RNN query.
- Find all sites in Q.
- For each such site, find its RNNs using its Voronoi cell, and compute its influence.
- Return the t sites with the largest influences.
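The steps of Straightforward Solution 2 can be sketched in Python by writing each Voronoi cell as an intersection of half-planes (one per competing site) and testing every object against it. This is a hedged sketch: plain lists stand in for the site and object R-trees, the names `voronoi_cell`, `in_cell`, and `solution2` are hypothetical, and an object lying exactly on a cell boundary is counted for every tied site.

```python
import heapq
from typing import Dict, List, Tuple

# Illustrative sketch; plain lists stand in for the site and object R-trees.
Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)
HalfPlane = Tuple[float, float, float]    # a*x + b*y <= c


def voronoi_cell(s: Point, others: List[Point]) -> List[HalfPlane]:
    """Voronoi cell of s as one half-plane per competing site."""
    planes = []
    for o in others:
        # |l - s|^2 <= |l - o|^2  <=>  2 l.(o - s) <= |o|^2 - |s|^2
        a, b = 2 * (o[0] - s[0]), 2 * (o[1] - s[1])
        c = (o[0] ** 2 + o[1] ** 2) - (s[0] ** 2 + s[1] ** 2)
        planes.append((a, b, c))
    return planes


def in_cell(p: Point, planes: List[HalfPlane]) -> bool:
    return all(a * p[0] + b * p[1] <= c for a, b, c in planes)


def solution2(sites: Dict[str, Point], objects: List[Tuple[Point, float]],
              q: Rect, t: int) -> List[Tuple[str, float]]:
    results = []
    for name, s in sites.items():
        if not (q[0] <= s[0] <= q[2] and q[1] <= s[1] <= q[3]):
            continue  # only sites inside Q are considered
        others = [p for other, p in sites.items() if other != name]
        planes = voronoi_cell(s, others)
        # The RNNs of s are the objects in its cell; add up their weights.
        influence = sum(w for loc, w in objects if in_cell(loc, planes))
        results.append((name, influence))
    return heapq.nlargest(t, results, key=lambda item: item[1])


sites = {"s1": (0.0, 0.0), "s2": (10.0, 0.0)}
objects = [((1.0, 0.0), 1.0), ((2.0, 1.0), 1.0), ((9.0, 0.0), 2.5)]
print(solution2(sites, objects, (-1.0, -1.0, 11.0, 1.0), 2))  # [('s2', 2.5), ('s1', 2.0)]
```

Even in this toy form, each site in Q rebuilds its cell from all other sites and then rescans every object, which is the repeated browsing that Drawback 2 describes.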
Straightforward Solution 2
- Drawback 1: all sites in Q need to be retrieved from the leaf nodes.
- Drawback 2: the object R-tree and the site R-tree are browsed multiple times.
- For each site in Q, browse the site R-tree to compute its Voronoi cell.
- For each such Voronoi cell, browse the object R-tree to compute the influence.
Features of Our Solution
- Systematically browse both trees once.
- Pruning techniques based on a new metric, minExistDNN.
- No need to compute the influences of all sites in Q, or even to locate all sites in Q.
Motivation
- Intuitively, if some object in Oi may consider some site in Sj as its NN, then Oi affects Sj.
- To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi affects Sj.
maxDist: A Loose Estimation
- If $\mathrm{maxDist}(O_1, S_1) < \mathrm{minDist}(O_1, S_2)$, O1 does not affect S2.
- Why is this not good enough?
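The loose test can be sketched with the two standard MBR distances, assuming rectangles stored as (xlo, ylo, xhi, yhi) tuples (a hypothetical layout; the function names are also illustrative). One reason the test is loose: maxDist pairs each location in O1 with the worst-case location in S1, although the MBRs of an R-tree are minimal, so every face of S1 touches at least one site.

```python
import math
from typing import Tuple

# Illustrative sketch; the rectangle layout and names are assumptions.
Rect = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)


def min_dist(r1: Rect, r2: Rect) -> float:
    """Smallest distance between any point of r1 and any point of r2."""
    dx = max(r2[0] - r1[2], r1[0] - r2[2], 0.0)
    dy = max(r2[1] - r1[3], r1[1] - r2[3], 0.0)
    return math.hypot(dx, dy)


def max_dist(r1: Rect, r2: Rect) -> float:
    """Largest distance between any point of r1 and any point of r2."""
    dx = max(r1[2] - r2[0], r2[2] - r1[0])
    dy = max(r1[3] - r2[1], r2[3] - r1[1])
    return math.hypot(dx, dy)


def loosely_not_affects(o1: Rect, s1: Rect, s2: Rect) -> bool:
    # Every location in O1 has a site of S1 within maxDist(O1, S1); if that is
    # below minDist(O1, S2), no object in O1 can have its NN in S2.
    return max_dist(o1, s1) < min_dist(o1, s2)


print(loosely_not_affects((0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 2.0, 1.0),
                          (10.0, 0.0, 11.0, 1.0)))  # True
```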
minMaxDist: A Tight Estimation?
- An object o does not affect S2 if there exists an MBR S1 such that $\mathrm{minMaxDist}(o, S_1) < \mathrm{minDist}(o, S_2)$.
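For a single object location o, minMaxDist is the MINMAXDIST metric of Roussopoulos et al. for nearest-neighbor search on R-trees: because an MBR is minimal, each of its faces touches at least one site, so some site in S1 lies within minMaxDist(o, S1) of o. A 2-D Python sketch of that formula and the resulting point-level pruning test follows; the function names and the (xlo, ylo, xhi, yhi) layout are assumptions.

```python
import math
from typing import Tuple

# Illustrative sketch; names and the rectangle layout are assumptions.
Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)


def min_dist_point(p: Point, r: Rect) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def min_max_dist(p: Point, r: Rect) -> float:
    """MINMAXDIST (Roussopoulos et al.) for a 2-D point and MBR."""
    lo, hi = (r[0], r[1]), (r[2], r[3])
    near, far = [], []
    for k in range(2):
        mid = (lo[k] + hi[k]) / 2
        near.append(lo[k] if p[k] <= mid else hi[k])  # nearer face along axis k
        far.append(lo[k] if p[k] >= mid else hi[k])   # farther face along axis k
    best = math.inf
    for k in range(2):
        j = 1 - k
        # Nearer face along k, farthest point of that face along the other axis.
        best = min(best, math.hypot(p[k] - near[k], p[j] - far[j]))
    return best


def point_not_affects(o: Point, s1: Rect, s2: Rect) -> bool:
    # Some site of S1 is within minMaxDist(o, S1); every site of S2 is at
    # least minDist(o, S2) away.
    return min_max_dist(o, s1) < min_dist_point(o, s2)


print(min_max_dist((0.0, 0.0), (1.0, 1.0, 3.0, 2.0)))  # about 2.236, i.e. sqrt(5)
```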
minMaxDist: A Tight Estimation?
- This test does not carry over to an object MBR O1.
A Tight Estimation?
- A metric m(O1, S1) should:
- guarantee that each location in O1 is within m(O1, S1) of some site in S1,
- and be the smallest distance with this property.
New Metric: $\mathrm{minExistDNN}_{S_1}(O_1)$
- Definition: $\mathrm{minExistDNN}_{S_1}(O_1) = \max_{l \in O_1} \mathrm{minMaxDist}(l, S_1)$
- O1 does not affect S2 if there exists S1 such that $\mathrm{minExistDNN}_{S_1}(O_1) < \mathrm{minDist}(O_1, S_2)$.
Examples of $\mathrm{minExistDNN}_{S_1}(O_1)$
Calculating $\mathrm{minExistDNN}_{S_1}(O_1)$
- Step 1: space partitioning.
- All locations l in the same partition share the same second-closest corner of S1, and the distance from l to that corner is minMaxDist(l, S1).
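The second-closest-corner claim can be checked in 2-D. Let a and A be the distances from l to the near and far x-faces of S1 (a ≤ A), and b ≤ B likewise for y. The corner distances then satisfy $\sqrt{a^2+b^2} \le \sqrt{a^2+B^2}, \sqrt{A^2+b^2} \le \sqrt{A^2+B^2}$, and minMaxDist(l, S1) is the smaller of the two middle values, i.e. the distance to the second-closest corner. A small Python sketch of that corner lookup (the function name and rectangle layout are assumptions):

```python
import math
from typing import Tuple

# Illustrative sketch; the function name and rectangle layout are assumptions.
Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)


def second_closest_corner(p: Point, r: Rect) -> Tuple[Point, float]:
    """Return the second-closest corner of r to p and its distance,
    which in 2-D equals minMaxDist(p, r)."""
    corners = [(x, y) for x in (r[0], r[2]) for y in (r[1], r[3])]
    ranked = sorted(corners, key=lambda c: math.dist(p, c))
    corner = ranked[1]
    return corner, math.dist(p, corner)


s1 = (1.0, 1.0, 3.0, 2.0)
# Both locations lie in the same partition, so they share the corner (1.0, 2.0).
print(second_closest_corner((0.0, 0.0), s1))
print(second_closest_corner((1.5, 1.2), s1))
```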
Space Partitioning
- O1 is divided into multiple sub-regions, one in
each partition.
Calculating $\mathrm{minExistDNN}_{S_1}(O_1)$
- Step 2: choose up to 8 locations on the border of O1 and compute their minMaxDists to S1.
- minExistDNN is the largest of these values.
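Both steps can be sketched in Python under stated assumptions. Inside one partition, minMaxDist(l, S1) is the distance from l to a fixed corner, a convex function of l, so its maximum over the convex part of O1 in that partition is reached at a vertex of that part. The sketch takes the partition lines to be the two midlines of S1 plus the two lines through its center on which a pair of opposite corners is equidistant (the diagonals when S1 is a square). It evaluates all corners of O1, every crossing of those lines with O1's border, and S1's center when it lies inside O1, which may be more points than the paper's selection of up to 8; all names are hypothetical.

```python
import math
from typing import List, Tuple

# Illustrative sketch; names, layout, and candidate selection are assumptions.
Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xlo, ylo, xhi, yhi)


def min_max_dist(p: Point, r: Rect) -> float:
    """2-D minMaxDist(p, r): distance from p to the second-closest corner of r."""
    corners = [(x, y) for x in (r[0], r[2]) for y in (r[1], r[3])]
    return sorted(math.dist(p, c) for c in corners)[1]


def partition_lines(s: Rect) -> List[Tuple[Point, Point]]:
    """(point, direction) pairs for the lines bounding the partitions of s."""
    c = ((s[0] + s[2]) / 2, (s[1] + s[3]) / 2)
    w, h = s[2] - s[0], s[3] - s[1]
    # Vertical and horizontal midlines, then the perpendicular bisectors of
    # the two opposite-corner pairs (the diagonals when s is a square).
    return [(c, (0.0, 1.0)), (c, (1.0, 0.0)), (c, (h, w)), (c, (h, -w))]


def candidate_locations(o: Rect, s: Rect) -> List[Point]:
    """Points of o that include every vertex of its sub-regions."""
    pts = [(x, y) for x in (o[0], o[2]) for y in (o[1], o[3])]
    cx, cy = (s[0] + s[2]) / 2, (s[1] + s[3]) / 2
    if o[0] <= cx <= o[2] and o[1] <= cy <= o[3]:
        pts.append((cx, cy))  # all partition lines meet at the center of s
    for (px, py), (dx, dy) in partition_lines(s):
        if dx != 0:  # crossings with the vertical edges of o
            for x in (o[0], o[2]):
                y = py + (x - px) / dx * dy
                if o[1] <= y <= o[3]:
                    pts.append((x, y))
        if dy != 0:  # crossings with the horizontal edges of o
            for y in (o[1], o[3]):
                x = px + (y - py) / dy * dx
                if o[0] <= x <= o[2]:
                    pts.append((x, y))
    return pts


def min_exist_dnn(o: Rect, s: Rect) -> float:
    """max over locations in o of minMaxDist(., s), taken over the candidates."""
    return max(min_max_dist(loc, s) for loc in candidate_locations(o, s))


print(min_exist_dnn((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 4.0, 2.0)))  # about 2.828, i.e. sqrt(8)
```

With this value, the pruning rule on the previous slide reduces to comparing `min_exist_dnn(O1, S1)` against minDist(O1, S2).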
Data Structure
- Two R-trees: S of sites, O of objects.
- Three queues:
- queueSIN: entries of S inside Q.
- queueSOUT: entries of S outside Q.
- queueO: entries of O.
Data Structure (cont)
- queueSIN = {Sj | Sj is a visited but not yet expanded entry in S whose MBR is inside Q and whose maxInfluence > 0}
- queueO = {Oi | Oi is a visited but not yet expanded entry in O that affects some entry in queueSIN}
- queueSOUT = {Sj | Sj is a visited but not yet expanded entry in S whose MBR is outside Q and that is affected by some entry in queueO}
Data Structure (cont)
- queueO consists only of entries from O that affect at least one entry in queueSIN.
- queueSOUT consists only of entries from S (outside Q) that are affected by at least one entry in queueO.
Data Structure
[Figure: site entries S1, S2, S3 and object entry O1, with the three queues queueSIN, queueO, and queueSOUT]
maxInfluence and minInfluence
- For each entry Sj in queueSIN:
- maxInfluence: the total weight of the entries in queueO that affect Sj.
- minInfluence: the total weight of the entries in queueO that affect ONLY Sj, divided by the number of sites in Sj.
- queueSIN is sorted in decreasing order of maxInfluence.
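The two bounds can be sketched as a single pass over a hypothetical `affects` map from each queueO entry to the site entries it may affect (for example, as decided with the minExistDNN test). Entry names, weights, and site counts are illustrative inputs, not the paper's data layout; the map is assumed to cover all visited site entries so that "affects only Sj" is meaningful.

```python
from typing import Dict, List, Set, Tuple

# Illustrative sketch; the input dictionaries are hypothetical.


def influence_bounds(num_sites: Dict[str, int],
                     weight: Dict[str, float],
                     affects: Dict[str, Set[str]]) -> List[Tuple[str, float, float]]:
    """Return (site entry, minInfluence, maxInfluence), by decreasing maxInfluence.

    num_sites[sj]: number of sites under site entry sj (1 for a single site)
    weight[oi]:    total weight of the objects under object entry oi
    affects[oi]:   site entries that oi may affect (all must be keys of num_sites)
    """
    max_inf = {sj: 0.0 for sj in num_sites}
    only = {sj: 0.0 for sj in num_sites}
    for oi, targets in affects.items():
        for sj in targets:
            max_inf[sj] += weight[oi]  # oi could send all of its weight to sj
        if len(targets) == 1:
            (sj,) = targets
            only[sj] += weight[oi]     # oi's weight can only go to sites in sj
    # Some site in sj receives at least the average share of that weight.
    rows = [(sj, only[sj] / num_sites[sj], max_inf[sj]) for sj in num_sites]
    return sorted(rows, key=lambda row: row[2], reverse=True)


print(influence_bounds({"S1": 3, "S2": 1}, {"O1": 6.0, "O2": 4.0},
                       {"O1": {"S1"}, "O2": {"S1", "S2"}}))
# [('S1', 2.0, 10.0), ('S2', 0.0, 4.0)]
```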
Algorithm Overview
- Expand an entry from one of the three queues:
- Remove the entry from the queue.
- Retrieve the referenced node, and insert its (unpruned) entries into the same queue.
- Update maxInfluence and minInfluence if necessary.
- If the top-t entries in queueSIN are all sites whose minInfluences are no smaller than the maxInfluences of all remaining entries, return them.
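The stopping rule in the last step can be written as a small Python check, assuming queueSIN is a list of hypothetical (name, is_site, min_influence, max_influence) tuples already sorted by decreasing maxInfluence.

```python
from typing import List, Tuple

# Illustrative sketch; the tuple layout of queue entries is an assumption.
# (name, is_site, min_influence, max_influence), sorted by max_influence desc
Entry = Tuple[str, bool, float, float]


def can_stop(queue_sin: List[Entry], t: int) -> bool:
    top, rest = queue_sin[:t], queue_sin[t:]
    if len(top) < t or not all(is_site for _, is_site, _, _ in top):
        return False  # fewer than t candidates, or an MBR still needs expanding
    # Largest influence any remaining entry (or any site under it) could reach.
    threshold = max((max_inf for _, _, _, max_inf in rest), default=0.0)
    return all(min_inf >= threshold for _, _, min_inf, _ in top)


queue = [("s4", True, 5.0, 6.0), ("s2", True, 4.0, 4.5), ("S7", False, 0.0, 3.5)]
print(can_stop(queue, 2))  # True
```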
Example
- Initially: queueSIN = {S1}, queueO = {O1}, queueSOUT = {S3}.
- After expanding S1, O1, and S3: queueSIN = {S5, S7}, queueO = {O6}, queueSOUT = {S9}.
- S6 is not affected by O1, so S6 is pruned.
- O5 does not affect S5 or S7, so O5 is pruned.
- S8 is not affected by O6, so S8 is pruned.
A Pruning Case
- Expand S1.
- S2 is pruned because $\mathrm{minExistDNN}_{S_3}(O_1) < \mathrm{minDist}(O_1, S_2)$.
Experimental Setup
- Data sets:
- 24,493 populated places in North America
- 9,203 cultural landmarks in North America
- R-tree page size: 1 KB
- LRU buffer: 128 disk pages
- t = 4
- Compared against the Voronoi-diagram-based solution.
Selected Experimental Results
Conclusions
- We addressed a new problem: the top-t most influential sites query.
- We proposed a new metric, minExistDNN, which can be used to prune the search space in NN/RNN-related problems.
- We designed an algorithm that systematically browses both R-trees once.
- Experiments showed more than an order of magnitude improvement over the Voronoi-diagram-based solution.