Reverse Nearest Neighbor Queries for Dynamic Databases - PowerPoint PPT Presentation

1
Reverse Nearest Neighbor Queries for Dynamic
Databases
SIGMOD 2000
  • SHOU Yu Tao
  • Jan. 10th, 2003

2
Outline of the Presentation
  • Background
  • Nearest neighbor (NN) search algorithm [RKV95]
  • Reverse nearest neighbor (RNN) search algorithm
    [SAA00]
  • Other NN-related problems: CNN, RNNA, etc.
  • Conclusions
  • References
  • Q & A

3
Background
  • RNN(q) returns the set of data points that have
    the query point q as their nearest neighbor.
  • RNN queries have received much interest in recent
    years due to their increasing importance in
    advanced database applications.
  • Example: fixed wireless telephone access. The
    load detection problem asks how many users are
    currently using a specific base station q. If
    q's load is too heavy, an inactive base station
    can be activated to lighten the load of the
    overloaded one.

4
Nonsymmetrical property of RNN queries
  • NN(q) = p does not imply NN(p) = q.
  • If p is the nearest neighbor of q, then q need
    not be the nearest neighbor of p (in this case
    the nearest neighbor of p is r).
  • Hence efficient NN algorithms cannot be directly
    applied to solve RNN problems; dedicated RNN
    algorithms are needed.
  • A straightforward solution -- check for each
    point whether it has q as its nearest neighbor
    -- is not suitable for large datasets!
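The straightforward solution above can be sketched in a few lines of Python (a minimal sketch, assuming 2-D points represented as (x, y) tuples). Its O(n^2) cost is exactly why the slides call it unsuitable for large datasets:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def naive_rnn(points, q):
    """Brute-force RNN(q): keep every data point whose nearest
    neighbor is q. Each point is compared against all others,
    giving the quadratic cost the slides warn about."""
    result = []
    for p in points:
        # nearest neighbor of p among the other data points and q
        nn = min([r for r in points if r != p] + [q],
                 key=lambda r: dist(p, r))
        if nn == q:
            result.append(p)
    return result
```

For example, with points (0,0), (2,0), (5,0) and q = (1,0), the first two points have q as their nearest neighbor while (5,0) is closer to (2,0).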

5
Two versions of RNN problem
  • bichromatic version
  • -- the data points are of two categories, say red
    and blue. The RNN query point q is in one of the
    categories, say blue, so RNN(q) must determine
    the red points that have the query point q as
    their closest blue point.
  • -- e.g. fixed wireless telephone access:
    clients/red (e.g. call initiation or
    termination)
  • servers/blue (e.g. fixed wireless base stations)
  • monochromatic version
  • -- all points are of the same color.
  • Static vs. dynamic
  • -- whether insertions and deletions of data
    points are allowed.

6
RNN problem this paper concerns
  • Monochromatic case
  • Dynamic case
  • The whole algorithm is based on (1) geometric
    observations that enable a reduction of the RNN
    problem to the NN problem, and
  • (2) the NN search algorithm of [RKV95].
  • Both RNN(q) and NN(q) are sets of points in the
    database, while the query point q may or may not
    correspond to an actual data point in the
    database.

7
Geometric Observations
Let the space around a query point q be divided
into six equal regions Si (1 ≤ i ≤ 6) by straight
lines intersecting at q; Si is therefore the space
between two space-dividing lines.
Proposition 1: For a given 2-dimensional dataset,
RNN(q) will return at most six data points, and
they must all lie on the same circle centered at q.
[Figure: three lines L1, L2, L3 through q partition
the plane into the six regions S1-S6.]
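The partition can be illustrated with a small hypothetical helper that maps a point p to the index of the 60-degree region around q containing it. The orientation of the dividing lines (first line at angle 0) is an assumed choice; the propositions hold for any fixed orientation:

```python
import math

def region_index(q, p):
    """Return which of the six 60-degree regions S1..S6 around q
    contains p. Points exactly on a dividing line are assigned to
    the region counter-clockwise of the line."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return int(angle // (math.pi / 3)) + 1  # 1..6
```

For instance, with q at the origin, a point on the positive x-axis falls in S1 and a point on the positive y-axis in S2.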
8
Geometric Observations
  • Proposition 2
  • In each region Si:
  • (1) There exist at most two RNN points.
  • (2) If there exist exactly two RNN points in a
    region Si, then each point must lie on one of the
    space-dividing lines through q delimiting Si.
  • Proposition 3
  • In each region Si, let p = NN(q) in Si. If p is
    not on a space-dividing line, then either
    NN(p) = q (and then p ∈ RNN(q)), or there is no
    RNN point of q in Si.
9
Important result from Observations
  • Implications: In a region Si, consider the number
    of results of NN(q). (1) One point only: if NN(q)
    is not on the space-dividing lines, then either
    the nearest neighbor is also a reverse nearest
    neighbor, or there is no RNN(q) point in Si.
    (2) More than one point (NN(q) returns at most
    two points per region): these two points must lie
    on the two dividing lines and on the same circle
    centered at q.
  • This gives a criterion for limiting the choice of
    RNN(q) candidates to one or two points in each of
    the six regions Si.
  • The RNN query has thus been reduced to the NN
    query.

10
Basic NN Search Algorithm
  • This is based on the MINDIST metric only
  • It returns a single NN(q) result only

11
Algorithms in [RKV95]
  • Two metrics are introduced for effectively
    directing and pruning the NN search:
  • MINDIST (optimistic)
  • MINMAXDIST (pessimistic)
  • DFS Search

12
MINDIST(Optimistic)
  • MINDIST(RECT, q): the shortest distance from RECT
    to the query point q
  • This provides a lower bound on the distance from
    q to the objects in RECT
  • MINDIST guarantees that all points in RECT are at
    least MINDIST away from the query point q.
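A minimal sketch of MINDIST in 2-D, assuming an MBR represented as a tuple (xmin, ymin, xmax, ymax): the nearest point of the rectangle is obtained by clamping q's coordinates to the rectangle's extent.

```python
import math

def mindist(q, rect):
    """Lower bound: shortest distance from q to any point of the
    MBR. Zero when q lies inside the rectangle."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0, q[0] - xmax)
    dy = max(ymin - q[1], 0, q[1] - ymax)
    return math.hypot(dx, dy)
```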

13
MINMAXDIST(Pessimistic)
  • MBR property: every face (edge in 2D, rectangle
    in 3D, hyper-face in higher dimensions) of any
    MBR contains at least one point of some spatial
    object in the DB.
  • MINMAXDIST: calculate the maximum distance to
    each face, and choose the minimum.
  • An upper bound on the minimal distance.
  • MINMAXDIST guarantees that the MBR contains at
    least one object within distance MINMAXDIST.
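The 2-D case of the [RKV95] MINMAXDIST formula can be sketched as follows, using the same assumed MBR representation (xmin, ymin, xmax, ymax) as for MINDIST: for each dimension, pair the nearer face in that dimension with the farther edge coordinates in the other dimension, and take the minimum over the resulting face distances.

```python
import math

def minmaxdist(q, rect):
    """Upper bound on the distance from q to the nearest object
    inside the MBR (2-D version of the [RKV95] formula)."""
    lo = (rect[0], rect[1])
    hi = (rect[2], rect[3])
    # rm[k]: coordinate of the nearer face in dimension k
    rm = [lo[k] if q[k] <= (lo[k] + hi[k]) / 2 else hi[k] for k in range(2)]
    # rM[k]: coordinate of the farther edge in dimension k
    rM = [lo[k] if q[k] >= (lo[k] + hi[k]) / 2 else hi[k] for k in range(2)]
    best = float("inf")
    for k in range(2):
        s = (q[k] - rm[k]) ** 2       # nearer face in dimension k
        for i in range(2):
            if i != k:
                s += (q[i] - rM[i]) ** 2  # farther corner along the face
        best = min(best, s)
    return math.sqrt(best)
```

For q = (0, 0) and the MBR (1, 1, 2, 2), MINDIST is √2 while MINMAXDIST is √5, so the bounds bracket the distance to the nearest object as the slides describe.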

14
Illustration of MINMAXDIST

[Figure: an MBR with corners (s1,s2), (t1,s2) and
(t1,t2) in the x-y plane; from q = (q1,q2), MINDIST
reaches the nearest edge of the MBR, while
MINMAXDIST reaches the farthest vertex (t1,p2) of
the nearest face.]
15
Pruning
  • Downward pruning (during the descending phase):
  • MINDIST(q, M) > MINMAXDIST(q, M')
  • ⇒ M can be pruned
  • Distance(q, O) > MINMAXDIST(q, M)
  • ⇒ O can be discarded
  • Upward pruning (on return from the recursion):
  • MINDIST(q, M) > Distance(q, O)
  • ⇒ M can be pruned

16
DFS Search on R-Tree
  • Traversal: DFS
  • Expanding a non-leaf node during the descending
    phase: order all its children by the metric
    (MINDIST or MINMAXDIST) and sort them into an
    Active Branch List (ABL); apply the downward
    pruning techniques to the ABL to remove
    unnecessary branches.
  • Expanding a leaf node: compare objects to the
    nearest neighbor found so far; replace it if the
    new object is closer.
  • On return from the recursion: apply the upward
    pruning technique.
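The traversal above can be sketched as a small recursive function. This is a minimal sketch under assumptions: a toy R-tree of ('leaf', [points]) and ('inner', [(mbr, child), ...]) nodes is a hypothetical representation, and only MINDIST ordering with upward pruning is shown.

```python
import math

def mindist(q, rect):
    """MBR lower bound (as on the MINDIST slide)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0, q[0] - xmax)
    dy = max(ymin - q[1], 0, q[1] - ymax)
    return math.hypot(dx, dy)

def dfs_nn(node, q, best=None):
    """Branch-and-bound DFS NN search; best is (distance, point)."""
    kind, entries = node
    if kind == 'leaf':
        for p in entries:
            d = math.dist(p, q)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    # Active Branch List: children ordered by the optimistic metric
    abl = sorted(entries, key=lambda e: mindist(q, e[0]))
    for mbr, child in abl:
        # upward pruning: skip branches that cannot beat the current NN
        if best is not None and mindist(q, mbr) > best[0]:
            continue
        best = dfs_nn(child, q, best)
    return best
```

Because the ABL is visited in MINDIST order, once one child yields a close neighbor, later siblings are usually pruned without being read.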

17
RNN Algorithm
  • Algorithm outline for an RNN(q) query:
  • 1. Construct the space-dividing lines so that the
    space is divided into 6 regions based on the
    query point q.
  • 2. (a) Traverse the R-tree and find the one or
    two points in each region Si that satisfy the
    nearest-neighbor condition NN(q) -- this part is
    also called a conditional NN query.
  • (b) Each candidate point is tested for whether
    its nearest neighbor is q, and is added to the
    answer list if the condition is fulfilled.
  • 3. Eliminate duplicates in RNN(q).
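The outline can be sketched end-to-end in Python. This is a simplified sketch, not the paper's algorithm: a linear scan stands in for the R-tree traversal of step 2(a), and only one NN candidate per region is kept (the case of two equidistant points on the dividing lines is ignored).

```python
import math

def rnn(points, q):
    """Reduction of RNN(q) to per-region NN queries plus a
    verification step, over 2-D points given as (x, y) tuples."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def region(p):
        # index 0..5 of the 60-degree region around q containing p
        return int((math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi))
                   // (math.pi / 3))

    # step 2(a): nearest neighbor of q in each region (candidate set)
    candidates = {}
    for p in points:
        r = region(p)
        if r not in candidates or dist(p, q) < dist(candidates[r], q):
            candidates[r] = p
    # step 2(b): keep candidates whose own NN is q; step 3: the set
    # removes duplicates
    answer = set()
    for p in candidates.values():
        nn = min([r for r in points if r != p] + [q],
                 key=lambda r: dist(p, r))
        if nn == q:
            answer.add(p)
    return answer
```

The candidate set never holds more than six points, so the verification step costs only six NN checks regardless of the dataset size.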

18
How to find NN(q) in Si
[Figure: points p1-p7 around q; the queried region
Si contains p1 and p2, while the remaining points
lie in other regions.]
  • Brute-force algorithm:
  • find the nearest neighbors of q in increasing
    distance order until one falls in the queried
    region Si.
  • → Inefficient! (as shown in the figure)

19
How to find NN(q) in Si
  • The only difference between the NN algorithm
    proposed by [RKV95] and the conditional NN
    algorithm lies in the metric used to sort and
    prune the list of candidate nodes.

20
New MINMAXDIST definition
[Figure: an MBR M and a queried region S shown in
two configurations, with Mindist(q, M) and
Minmaxdist(q, M) marked in each.]
MINMAXDIST(q, M, Si): the distance to the furthest
vertex on the closest face of M that lies in Si.
MINDIST(q, M, Si) ≥ MINDIST(q, M)
21
New metric definition
MINDIST(q, M, Si) is approximated by MINDIST(q, M),
because MINDIST(q, M) is valid in all cases: it
provides a definite lower bound on the distance to
data points inside the MBR, although a slightly
looser one.
22
CNN/NN algorithm difference
  • When expanding a non-leaf node during the
    descending phase:
  • NN search:
  • orders all children by the metric (MINDIST or
    MINMAXDIST) and sorts them into an Active Branch
    List (ABL), then applies downward pruning to the
    ABL to remove unnecessary branches.
  • CNN search -- builds a set of lists
    branchList[i], 0 ≤ i < num_section:
    branchList[i] points to the children of the node
    that overlap with region (i+1).
  • For each child, a counter records the total
    number of sections overlapping that child; the
    child with the highest counter is visited first,
    for I/O optimization.

23
Other NN related researches
  • NN and RNN for moving objects [BJKS02]
  • CNN [PTS02]
  • RNNA over data streams [KMS02]

24
Conclusions
  • The proposed RNN algorithm is based on the
    underlying indexing structure (R-tree), which is
    also needed to answer NN queries.
  • By integrating RNN queries into the framework of
    already existing access structures, the approach
    developed in this paper is algorithmic and
    independent of data structures constructed
    especially for this kind of query.
  • No additional data structures are necessary, so
    the space requirement does not increase.

25
References
  • [RKV95] Roussopoulos, N., Kelley, S. and
    Vincent, F., Nearest Neighbor Queries. SIGMOD,
    1995.
  • [SAA00] Stanoi, I., Agrawal, D. and El Abbadi,
    A., Reverse Nearest Neighbor Queries for Dynamic
    Databases. ACM SIGMOD Workshop on Data Mining
    and Knowledge Discovery (DMKD), 2000.
  • [KM00] Korn, F. and Muthukrishnan, S., Influence
    Sets Based on Reverse Nearest Neighbor Queries.
    SIGMOD, 2000.
  • [BJKS02] Benetis, R., Jensen, C., Karciauskas,
    G. and Saltenis, S., Nearest Neighbor and
    Reverse Nearest Neighbor Queries for Moving
    Objects. IDEAS, 2002.
  • [PTS02] Tao, Y., Papadias, D. and Shen, Q.,
    Continuous Nearest Neighbor Search. VLDB, 2002.
  • [KMS02] Korn, F., Muthukrishnan, S. and
    Srivastava, D., Reverse Nearest Neighbor
    Aggregates over Data Streams. VLDB, 2002.

26
Questions and Answers
  • Any Questions?

27
Thank you for attending this presentation!