Title: Location-based Spatial Queries
1Location-based Spatial Queries
- ACM SIGMOD 2003
- Jun Zhang, Manli Zhu, Dimitris Papadias, Yufei Tao, Dik Lun Lee
- Department of Computer Science, Hong Kong University of Science and Technology, and Carnegie Mellon University
- Presented by Sreepraveen Veeramachaneni
2Outline
- Introduction
- General techniques in spatial query Processing
- Motivation
- Background
- Location-based nearest neighbor search
- Location-based window queries
- Experiments
- Summary
- References
3 Introduction
- The paper proposes an approach that enables mobile clients to determine the validity of previous queries based on their current location.
- Two types of spatial queries are discussed:
- 1. Window queries
- 2. Nearest neighbor queries
4Techniques In use
- Spatial databases have been extensively studied during the last two decades, and several spatial access methods have been proposed.
- The most popular one is the R-tree and its variations, such as the R*-tree.
- R-trees can be viewed as multi-dimensional extensions of B-trees.
5R-tree
- The example assumes a capacity of 3 entries per node.
- Points that are close in space are clustered in the same leaf node, which is represented as a minimum bounding rectangle (MBR).
- Nodes are then recursively grouped together following the same principle until the top level, which consists of a single root.
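The grouping of points into MBRs, and the way a window query uses those MBRs to skip whole subtrees, can be sketched as follows. This is a minimal illustration under stated assumptions, not the R-tree insertion algorithm: the `Node` class, the helper names, and the hand-built two-level tree with three leaves of three points each are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                    # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # child nodes (internal node)
    points: list = field(default_factory=list)    # data points (leaf node)

def leaf(points):
    """Build a leaf whose MBR tightly encloses its points."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return Node((min(xs), min(ys), max(xs), max(ys)), points=list(points))

def internal(children):
    """Build an internal node whose MBR encloses the MBRs of its children."""
    return Node((min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
                 max(c.mbr[2] for c in children), max(c.mbr[3] for c in children)),
                children=list(children))

def intersects(a, b):
    """True if rectangles a and b overlap (boundaries included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def window_query(node, w):
    """Return all points inside window w, descending only into overlapping MBRs."""
    if not intersects(node.mbr, w):
        return []
    if node.points:  # leaf: test each point against the window
        return [p for p in node.points
                if w[0] <= p[0] <= w[2] and w[1] <= p[1] <= w[3]]
    found = []
    for child in node.children:
        found += window_query(child, w)
    return found

# Hypothetical data: three leaves, capacity 3, grouped under a single root.
root = internal([leaf([(1, 1), (2, 2), (1, 3)]),
                 leaf([(8, 8), (9, 7), (7, 9)]),
                 leaf([(5, 1), (6, 2), (6, 0)])])
result = window_query(root, (0, 0, 5, 2.5))
```

In this example the second leaf is never opened, because its MBR does not overlap the query window.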
6R-tree Contd.
- The R-tree is used to answer window queries.
- Another important type of spatial information processing is the nearest neighbor query, which retrieves the data point that is closest to a query point.
7Branch and Bound Algorithm
- Roussopoulos et al. proposed a branch and bound algorithm that searches the R-tree in a depth-first manner.
- Starting from the root, all entries are sorted according to their minimum distance (mindist) from the query point, and the entry with the smallest value is visited first.
- The process is repeated recursively until the leaf level, where the first potential nearest neighbor is found.
- During backtracking to upper levels, the algorithm only visits entries whose mindist is smaller than the distance of the nearest neighbor already found.
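The depth-first search described above can be sketched in a few lines. This is an illustrative sketch only: the dictionary-based tree, the node names, and the sample points are assumptions made for the example, and only the mindist ordering and pruning rule come from the slide.

```python
import math

def mindist(q, mbr):
    """Smallest possible distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    dx = max(mbr[0] - q[0], 0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0, q[1] - mbr[3])
    return math.hypot(dx, dy)

def nn_search(node, q, best=(math.inf, None), visited=None):
    """Depth-first branch and bound; best is (distance, point) found so far."""
    if visited is not None:
        visited.append(node["name"])
    if "points" in node:  # leaf: compare against every point
        for p in node["points"]:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Visit children in ascending mindist order, pruning hopeless ones.
    for child in sorted(node["children"], key=lambda c: mindist(q, c["mbr"])):
        if mindist(q, child["mbr"]) < best[0]:
            best = nn_search(child, q, best, visited)
    return best

# Hypothetical two-level tree with three leaves.
root = {"name": "root", "mbr": (1, 0, 9, 9), "children": [
    {"name": "A", "mbr": (1, 1, 2, 3), "points": [(1, 1), (2, 2), (1, 3)]},
    {"name": "B", "mbr": (7, 7, 9, 9), "points": [(8, 8), (9, 7), (7, 9)]},
    {"name": "C", "mbr": (5, 0, 6, 2), "points": [(5, 1), (6, 2), (6, 0)]},
]}
visited = []
best = nn_search(root, (4, 2), visited=visited)
```

For the query point (4, 2), leaf C has the smallest mindist and is visited first. The neighbor found there is closer than the mindist of A and B, so both are pruned during backtracking.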
10Traditional Scenario
- The traditional scenario in spatial databases assumes that
- Queries are static, and
- Each query returns a single output and terminates.
11Where Is Your Nearest Restaurant?
- Traditional nearest neighbor search in spatial
databases considers static query points.
12 What if You Move?
- Getting only the nearest neighbor is inadequate.
- When will it expire?
13
- The conventional approach to obtaining up-to-date information is to pose a new query to the server after every position update, which can lead to high network overhead and extra processing effort.
- Moreover, because the user is highly mobile, the result may be invalidated immediately as the user's position changes.
15First Spatial Query Processing Technique for Mobile Computing [Zheng, Lee, SSTD 2001]
- The first technique pre-computes the Voronoi diagram of the dataset and stores it in an R-tree.
- Voronoi diagram: the Voronoi diagram of a collection of geometric objects is a partition of space into cells, each of which consists of the points closer to one particular object than to any other.
16
- When a nearest neighbor query arrives at the server, the Voronoi diagram is used to compute the nearest neighbor efficiently.
17
- In addition to the result, the server sends back to the client the validity time T of the result, which is a conservative approximation that assumes the query's speed stays below a maximum value.
- Problem: it is difficult to estimate the maximum query speed. A high value results in a very short T, and a low value results in false misses.
- The method only deals with single nearest neighbor queries; retrieving k neighbors would require order-k Voronoi diagrams, which are complicated and incur a large space overhead.
18k-Nearest Neighbor Query [Song, Roussopoulos, SSTD 2001]
- Song and Roussopoulos proposed a technique that does not assume Voronoi diagrams and can be used for any number of neighbors.
- When a k nearest neighbor query q arrives, the server computes and returns to the client a number m > k of neighbors.
19Implementation
- Let dist(k) and dist(m) be the distances of the kth and mth nearest neighbors from the query point q.
- If the client re-issues the query at a new location q', the new k nearest neighbors will be among the m objects of the first query, provided that
- 2·dist(q, q') ≤ dist(m) − dist(k)
20Example
- The figure shows an example of a 2-nearest neighbor query at location q, where the server returns four results o, a, b and c (the nearest neighbors are o and a).
- When the client moves to location q', the two nearest neighbors are o and b.
- If 2·dist(q, q') ≤ dist(4) − dist(2), the client can determine this by computing the new distances (with respect to q') of the four objects, without having to issue a new query to the server.
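The client-side test can be sketched as below. The coordinates are made up to reproduce the situation on the slide (2NN is o, a at q and o, b at the new location); the function names and the variable `q_new` (standing for the new location) are assumptions for illustration.

```python
import math

def knn(points, q, k):
    """The k points closest to q, by brute force."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def cache_still_valid(q, q_new, cached, k):
    """cached holds the m > k neighbors returned for q, sorted by distance from q."""
    dist_k = math.dist(q, cached[k - 1])
    dist_m = math.dist(q, cached[-1])
    return 2 * math.dist(q, q_new) <= dist_m - dist_k

# Hypothetical layout: o, a, b, c are the m = 4 cached neighbors of q.
o, a, b, c = (1, 0), (0, 1.5), (2, 0), (0, -5)
dataset = [o, a, b, c, (10, 10), (-9, 8)]   # the server's full dataset
q, q_new, k = (0, 0), (1, 0), 2

cached = knn(dataset, q, 4)                 # what the server returned for q
if cache_still_valid(q, q_new, cached, k):
    local_answer = knn(cached, q_new, k)    # answered without contacting the server
else:
    local_answer = None                     # a new query must be issued
```

Here dist(4) − dist(2) = 5 − 1.5 = 3.5 and 2·dist(q, q_new) = 2, so the cached objects suffice and the new 2NN (o and b) is computed locally.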
21Problems
- An obvious problem of this approach lies in obtaining a proper value of m.
- A high value increases the network overhead and the storage requirements at the client, while a low value may be useless (if it does not reduce the number of queries).
- The right value of m depends on factors such as data distribution and query frequency, which are difficult to estimate.
22Time Parameterized Nearest Neighbor (TP NN) [Tao and Papadias, SIGMOD 2002]
- Given a query moving with steady velocity, return all nearest neighbor results (up to a future timestamp), i.e., the output is a set of tuples <Ri, Ti>, where Ri is the set of nearest neighbors during the future interval Ti.
- In this setting, the concept of time parameterized queries can be applied to both window queries and nearest neighbor queries.
- When a server receives a request from a client, it executes a TP query and returns <R, T, C>, where R is the set of objects satisfying the corresponding spatial query (the current result), T is the validity time of R, and C is the change to the result at T.
- From the set of objects in R and the set of objects in C that will cause the change, the client can incrementally compute the next result.
23TP Window Query
- Consider a client moving east with speed 1 that issues a window query.
- The server returns <b, 1, -b>, meaning that object b currently intersects the query window, but after 1 time unit it will stop doing so, and therefore b should be removed from the result.
24Influence of an Object
- The result of a spatial query changes in the future because some objects influence its correctness.
- If an object (e.g., b) satisfies the query at the current time, it may influence the result when it no longer satisfies it in the future (at time 1).
- An object not currently in the result (e.g., d) may influence the query when it becomes part of the result (at time 2).
- Some objects, such as a and c, may never change the result, so their influence time is set to ∞.
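For a rectangular window moving with constant velocity over static points, the influence time of each point can be computed in closed form. The sketch below assumes a hypothetical layout chosen to match the times on the slide (b leaves at time 1, d enters at time 2, a and c never matter); the function names and coordinates are not from the paper.

```python
import math

def axis_interval(lo, hi, v, p):
    """Times t at which lo + v*t <= p <= hi + v*t, as a (start, end) pair."""
    if v == 0:
        return (-math.inf, math.inf) if lo <= p <= hi else (math.inf, -math.inf)
    t1, t2 = (p - lo) / v, (p - hi) / v
    return (min(t1, t2), max(t1, t2))

def influence_time(window, velocity, p):
    """Time at which point p changes the result of the moving window query."""
    x0, y0, x1, y1 = window
    ax = axis_interval(x0, x1, velocity[0], p[0])
    ay = axis_interval(y0, y1, velocity[1], p[1])
    t_in, t_out = max(ax[0], ay[0]), min(ax[1], ay[1])
    if t_in > t_out or t_out < 0:
        return math.inf      # never inside the window from now on
    if t_in <= 0:
        return t_out         # inside now: influences the result when it leaves
    return t_in              # outside now: influences the result when it enters

# Hypothetical scene: window (0,0)-(4,4) moving east with speed 1.
window, velocity = (0, 0, 4, 4), (1, 0)
objects = {"a": (2, 6), "b": (1, 2), "c": (-3, 3), "d": (6, 1)}
times = {name: influence_time(window, velocity, p) for name, p in objects.items()}

T = min(times.values())                                # validity time of the result
C = sorted(n for n, t in times.items() if t == T)      # objects causing the change
```

The smallest influence time gives the validity time T, and the objects attaining it form the change set C, which is how the tuple <R, T, C> on the slides would be assembled in this simplified setting.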
25Time Parameterized Nearest Neighbor (TP NN) [Tao and Papadias, SIGMOD 2002]
- Returns
- The nearest neighbor R of the current query location
- The expiry time T of R (given the query's movement)
- The change C of the result at T
Result Ri, T2, Cj
26Problem with the techniques discussed so far
- All the techniques discussed so far for mobile computing presuppose that the future locations of clients can be calculated from their current movement (i.e., the velocity of the client is known and constant during the lifespan of the query).
- In many applications, however, query velocities are continuously updated as users change their speed or direction of movement.
- Motivated by this, the authors introduce a technique where the validity of the result is determined by the user's location in space instead of by time.
28Location-Based Nearest Neighbors
- Assumptions
- We assume that there exists a spatial index
(e.g., R-trees) for data objects, but no
specialized structures (e.g., Voronoi diagrams)
for nearest neighbor search.
29Getting You Covered by the Nearest Restaurant
- Some users (say, a tourist walking casually) cannot specify their heading directions clearly.
30Validity Region of the Result
- In addition to the nearest restaurant, we also return the validity region of this restaurant.
- Another query is issued to retrieve the new nearest restaurant only if the user moves out of this region.
31Influence Points
- Points that determine the validity region.
32Influence Points
- Keeping the influence points avoids the in-polygon check.
- The user only needs to check whether her/his location is closer to any influence point (the yellow points in the figure) than to the current nearest neighbor a.
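The check that replaces the in-polygon test is a handful of distance comparisons. A minimal sketch, assuming the client has stored the nearest neighbor and its influence points as coordinate pairs (the function name and the sample coordinates are hypothetical):

```python
import math

def nn_still_valid(location, nn, influence_points):
    """True while no influence point is closer to the user than the cached NN."""
    d = math.dist(location, nn)
    return all(math.dist(location, p) >= d for p in influence_points)

# Hypothetical cached answer: nearest neighbor and its four influence points.
nn = (0, 0)
influence_points = [(2, 0), (-2, 0), (0, 2), (0, -2)]

inside = nn_still_valid((0.5, 0.5), nn, influence_points)   # still in the region
outside = nn_still_valid((1.5, 0.0), nn, influence_points)  # (2, 0) is now closer
```

The client never needs the polygon itself: the influence points encode the same boundary, because each edge of the validity region is a bisector between the nearest neighbor and one influence point.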
33Validity Region: A Closer Look
- The validity region of q is the Voronoi Cell (VC)
of o.
34Pre-Compute the Voronoi Diagram?
- Bad idea!
- To answer kNN for a specific value of k, an order-k Voronoi diagram is necessary.
- If we want to answer NN, 2NN, ..., 20NN, then 20 sets of Voronoi diagrams are necessary.
- Huge space!
- Poor support for data updates.
- Our solution: compute the cell on the fly.
- Use a single R-tree
- Support all values of k
35Relationship with Time Parameterized NN
- If q moves towards l, then its nearest restaurant will change to point a at position q'.
- The corresponding TP query for q returns (i) o, (ii) a, and (iii) q'.
36Algorithm
- Step 1: Find the current NN.
- Step 2: Use TP NN queries to tighten the validity region progressively.
37Algorithm
- Step 2: Use TP NN queries to tighten the validity region progressively.
- The algorithm issues a total of 2·Sinf TP NN queries, where Sinf is the number of influence points.
- This algorithm generalizes to computing order-k Voronoi cells for arbitrary values of k (see the paper for details).
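One way to read the two steps for k = 1 is sketched below. It is a simplified illustration, not the authors' implementation: the TP NN query is replaced by a brute-force scan over all points instead of an R-tree traversal, the region starts as the whole data space, and each TP NN query is aimed from q at one unconfirmed vertex. All names, the tolerance `EPS`, and the sample data are assumptions.

```python
EPS = 1e-9  # tolerance for points that lie on a bisector

def g(p, o, a):
    """Positive where p is closer to o than to a, zero on their bisector."""
    return ((p[0] - a[0]) ** 2 + (p[1] - a[1]) ** 2) - \
           ((p[0] - o[0]) ** 2 + (p[1] - o[1]) ** 2)

def tp_nn(q, v, o, points):
    """Brute-force stand-in for a TP NN query: the first object that
    replaces o as nearest neighbor on the way from q to v, or None."""
    best_t, best_a = None, None
    for a in points:
        if a == o:
            continue
        g0, g1 = g(q, o, a), g(v, o, a)
        if g1 < -EPS:                      # v is closer to a than to o
            t = g0 / (g0 - g1)             # g is linear along the segment
            if best_t is None or t < best_t:
                best_t, best_a = t, a
    return best_a

def clip(poly, o, a):
    """Keep the part of convex polygon poly that is closer to o than to a."""
    out, n = [], len(poly)
    for i in range(n):
        p, r = poly[i], poly[(i + 1) % n]
        gp, gr = g(p, o, a), g(r, o, a)
        if gp >= -EPS:
            out.append(p)
        if (gp < -EPS and gr > EPS) or (gp > EPS and gr < -EPS):
            t = gp / (gp - gr)             # edge crosses the bisector here
            out.append((p[0] + t * (r[0] - p[0]), p[1] + t * (r[1] - p[1])))
    return out

def validity_region(q, points, space):
    """Return (nn, region vertices, influence points, number of TP NN queries)."""
    o = min(points, key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)  # Step 1
    poly, confirmed, influence, queries = list(space), set(), [], 0
    while True:                                                            # Step 2
        pending = [v for v in poly if v not in confirmed]
        if not pending:
            return o, poly, influence, queries
        v = pending[0]
        queries += 1
        a = tp_nn(q, v, o, points)
        if a is None:
            confirmed.add(v)               # o is still the NN at this vertex
        else:
            influence.append(a)            # a contributes an edge: shrink the region
            poly = clip(poly, o, a)

# Hypothetical data: NN at the origin, four close neighbors, one far point.
points = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2), (5, 5)]
space = [(-10, -10), (10, -10), (10, 10), (-10, 10)]
o, poly, influence, queries = validity_region((0.3, 0.2), points, space)
```

In this run the region shrinks to the square with corners (±1, ±1), four influence points are found, and eight TP NN queries are issued: four that discover an influence point and four that confirm a vertex, which matches the 2·Sinf count on the slide for this configuration.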
38Extensions to k NN queries
- The method above extends easily to k nearest neighbor queries, where the validity region is the maximal area around the query in which every point has the same set of k nearest neighbors.
40Location-based Window Queries: Find All Close Restaurants
- Some users would consider more restaurants in their vicinity.
- The validity region here is such that, as long as the user stays in this region, the query result does not change.
41Location-based Window Queries
- The focus f of a window query q is the centroid of the query window.
- The validity region V(q) of a query q is the maximal area around the query focus (i.e., f ∈ V(q)) where the query result R(q) does not change.
- The points that satisfy q are called inner objects, and those outside the query window are called outer objects.
42Location-based Window Queries
- The Minkowski region of each point (e.g., a) is a rectangle (ra) identical to the query window whose centroid lies on the corresponding point (a).
- As long as the query focus stays inside ra, the query result contains object a.
- The intersection of the Minkowski regions of the inner objects is the inner validity region.
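The inner validity region is just an intersection of axis-aligned rectangles, so it can be computed with a few min and max operations. A minimal sketch, assuming a 4 × 4 query window and three made-up inner objects (the function names and coordinates are hypothetical):

```python
def minkowski_rect(p, width, height):
    """Rectangle the size of the query window, centered on point p."""
    return (p[0] - width / 2, p[1] - height / 2,
            p[0] + width / 2, p[1] + height / 2)

def inner_validity_region(inner_objects, width, height):
    """Intersection of the Minkowski rectangles of all inner objects."""
    rects = [minkowski_rect(p, width, height) for p in inner_objects]
    return (max(r[0] for r in rects), max(r[1] for r in rects),
            min(r[2] for r in rects), min(r[3] for r in rects))

# Hypothetical query: 4 x 4 window with focus (5, 5) and three inner objects.
width, height, focus = 4, 4, (5, 5)
inner = [(4, 4), (6, 5.5), (5, 6.5)]
region = inner_validity_region(inner, width, height)   # (xmin, ymin, xmax, ymax)
```

While the focus stays inside `region`, the window still covers all three inner objects. The full validity region is smaller whenever the Minkowski rectangle of an outer object cuts into this rectangle, which this sketch does not handle.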
43The Validity Region of Window Queries
- If the user's location is on the boundary of the validity region, the boundary of the corresponding query window crosses some data point.
47The Influence Points
- In addition to the query result (a, b, c), the user is also aware of 2 inner influence points (a, b) and 2 outer influence points (d, e).
- The original result is invalidated if the query window
- Fails to cover any one of the inner influence points, or
- Covers any outer influence point.
- The user does not need to store the actual boundary of the validity region.
48Retrieving the Influence Points
- First, get the query result (a, b, c) with a traditional window query.
- Then, retrieve the influence points using time parameterized window queries (see the paper).
50Experiments
- Datasets
- GR (23K, data space 800km × 800km)
- NA (569K, data space 7000km × 7000km)
- Disk page size: 4K bytes
- Index: R-tree
- Queries
- LB kNN parameter: k
- LB WQ parameter: query length
- Each workload consists of 200 queries with the same parameters, distributed uniformly in the data space.
51Experiments
- The area of the validity region drops linearly with cardinality, since the number of Voronoi cells increases (while the area of the data space remains constant).
- Under all settings, the average number of edges in a Voronoi cell is 6 for uniform datasets, which equals the number of influence objects.
52Experiment 1: Number of Influence Points for LB kNN
- The number of influence objects decreases to 4 for k > 10. This is because, for k > 1, an influence object may contribute more than one edge (since it can form a perpendicular bisector with any of the k nearest neighbors of the query), while the total number of edges remains around 6.
53Experiment 2: Query Cost for LB kNN
- The figure shows the number of node accesses as a function of cardinality for k = 1.
- The number of node accesses for TP NN queries is about 12 times that of the regular nearest neighbor query because, on average, 6 TP NN queries are needed to retrieve the influence objects and another 6 to confirm the vertices of the validity region.
54Experiment 2: Query Cost for LB kNN with a Buffer
- With an LRU buffer equal to 10% of the R-tree size, the actual cost of TP NN queries is reduced significantly, since all the queries access similar parts of the data space.
- Thus, given a relatively small buffer, the overhead imposed by location-based NN queries is not significant.
55Experiment 3: Number of Influence Points for LB WQ
56Experiment 4: Query Cost for LB WQ
57Conclusion
- Location-based queries retrieve the validity regions for the query results.
- We considered kNN and window queries.
- Future work
- Apply the concept of validity region to other types of queries (e.g., range queries).
- Study the incremental computation of the query result (i.e., what happens after the user exits the validity region?).
58References
- Song, Z., Roussopoulos, N. K-Nearest Neighbor Search for Moving Query Point. SSTD 2001.
- Tao, Y., Papadias, D. Time Parameterized Queries in Spatio-Temporal Databases. SIGMOD 2002.
- Zheng, B., Lee, D. Semantic Caching in Location-Dependent Query Processing. SSTD 2001.
- Roussopoulos, N., Kelley, S., Vincent, F. Nearest Neighbor Queries. SIGMOD 1995.