Efficient Searching with Linear Constraints

1
Efficient Searching with Linear Constraints
  • Agarwal, P. K.; Arge, L.; Erickson, J.;
    Franciosa, P. G.; Vitter, J. S.
  • PODS 1998
  • Presented by WEN, Qi
  • Apr. 25th 2008

2
What is the problem
  • How to preprocess a set S of points in $\mathbb{R}^d$ into
    an external memory data structure that
    efficiently supports linear-constraint queries
  • Each query is a linear constraint (a halfspace), e.g.
    $x_d \le a_0 + \sum_{i=1}^{d-1} a_i x_i$
  • Goal:
  • Minimize the number of disk blocks required to
    store the data structure
  • Minimize the number of disk accesses (I/Os)
    required to answer a query

3
What is the problem (cont.)
  • Result
  • For $d = 2$:
  • Present the first near-linear size data structure
    that can answer linear-constraint queries using
    an optimal number of I/Os
  • Present a linear-size data structure that can
    answer queries efficiently in the worst case
  • Combine these two approaches to obtain tradeoffs
    between space and query time
  • Extend some of the techniques to higher dimensions

4
Agenda
  • Introduction
  • Geometric Preliminaries
  • Query-Optimal Data Structure in 2D
  • A Linear-Size Data Structure in 2D
  • Trading Space for Query Time in 2D
  • Conclusion

5
Introduction
6
Problem Statement
Preprocess a set S of N points in $\mathbb{R}^d$ into a data
structure so that all points satisfying a query
linear constraint can be reported efficiently.
Main interests: minimizing the number of disk
blocks used to store the points and the number of
disk accesses needed to answer a halfspace range
query.
Standard external memory model: each disk access
transmits, in a single input/output operation
(I/O), a contiguous block of B units of data.
Notation: $n = N/B$ is the input size in blocks; for a
query that reports $T$ points, $t = T/B$ is the output
size in blocks.
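As a baseline for this cost model, here is a minimal Python sketch (not from the slides) of the naive approach: scan every block and filter. It costs about $N/B$ I/Os regardless of the output size, which is what the data structures in this talk try to beat. The function name `scan_query`, the 2D constraint form `y <= a*x + b`, and the simulated I/O counter are assumptions made for illustration.

```python
def scan_query(points, a, b, block_size):
    """Report points (x, y) with y <= a*x + b by scanning every block.

    Returns (reported_points, io_count). io_count models one I/O per
    block of `block_size` points read, as in the external memory model.
    """
    reported = []
    io_count = 0
    for start in range(0, len(points), block_size):
        block = points[start:start + block_size]  # one simulated disk access
        io_count += 1
        reported.extend(p for p in block if p[1] <= a * p[0] + b)
    return reported, io_count


points = [(0, 0), (1, 3), (2, 1), (3, 5), (4, 2)]
result, ios = scan_query(points, a=1.0, b=0.0, block_size=2)
```

With five points and a block size of 2, the scan reads three blocks even though only three points satisfy the constraint.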
7
Previous results
8
Our Result
9
Geometric Preliminaries
10
Duality
The dual of a point $p$ is a hyperplane $p^*$, and the
dual of a hyperplane $h$ is a point $h^*$.
Lemma 1: A point $p$ is above (resp., below, on) a
hyperplane $h$ if and only if the dual hyperplane $p^*$
is above (resp., below, on) the dual point $h^*$.
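The exact duality formulas did not survive in this transcript. The Python sketch below assumes one common 2D transform that is consistent with Lemma 1: the point $(a, b)$ maps to the line $y = -a x + b$, and the line $y = m x + c$ maps to the point $(m, c)$. The function names are hypothetical; the check at the end is only a spot test of the lemma, not a proof.

```python
def dual_of_point(p):
    """Point (a, b) -> dual line y = -a*x + b, returned as (slope, intercept)."""
    a, b = p
    return (-a, b)


def dual_of_line(line):
    """Line y = m*x + c, given as (m, c) -> dual point (m, c)."""
    m, c = line
    return (m, c)


def point_vs_line(point, line):
    """+1 if the point is above the line, -1 if below, 0 if on it."""
    x, y = point
    m, c = line
    diff = y - (m * x + c)
    return (diff > 0) - (diff < 0)


def line_vs_point(line, point):
    """+1 if the line is above the point, -1 if below, 0 if it passes through."""
    return -point_vs_line(point, line)


def lemma1_holds(p, h):
    """Check that p vs. h has the same relation as p* vs. h*."""
    return point_vs_line(p, h) == line_vs_point(dual_of_point(p), dual_of_line(h))


h = (1, 1)  # the line y = x + 1
checks = [lemma1_holds(p, h) for p in [(2, 5), (2, 3), (2, 0)]]
```

The three sample points lie above, on, and below the line `h`, so all three cases of the lemma are exercised.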
11
Arrangements
Let H be a set of N hyperplanes in $\mathbb{R}^d$. The
arrangement of H, denoted as $A(H)$, is the
decomposition of $\mathbb{R}^d$ into cells of dimension
$k$, for $0 \le k \le d$, each cell being a
maximal connected set of points contained in the
intersection of a fixed subset of H and not
intersecting any other hyperplane of H.
12
Levels
The level of a point $p \in \mathbb{R}^d$ with respect to
H is the number of hyperplanes of H that lie
(strictly) below $p$. All the points in a single
cell of the arrangement $A(H)$ lie above the same
subset of hyperplanes of H, so we can define the
level of a cell of $A(H)$ to be the level of any
point in that cell. For any $0 \le k < N$,
the $k$-level of $A(H)$, denoted as $A_k(H)$, is the
closure of all the $(d-1)$-dimensional cells whose
level is $k$.
Lemma 3: A level can be computed using
$O(\nu (\log_2 n) \log_B n)$ I/Os, where $\nu$ is
the complexity of the level.
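For the planar case, the definition of the level of a point can be sketched directly in Python. Lines are represented as `(m, c)` pairs meaning $y = m x + c$; this representation and the name `level` are assumptions for illustration.

```python
def level(p, lines):
    """Number of lines y = m*x + c (given as (m, c)) strictly below point p."""
    x, y = p
    return sum(1 for m, c in lines if m * x + c < y)


lines = [(0, 0), (1, 0), (-1, 4), (0, 5)]
lvl = level((1, 2), lines)
```

At $x = 1$ the four lines have heights 0, 1, 3 and 5, so the point $(1, 2)$ has two lines strictly below it.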
13
Query-Optimal Data Structure in 2D
14
Additional Geometric Concept
$V = (v_0, v_1, v_2, \ldots, v_u)$: a subsequence of
vertices of $A_k(L)$, sorted from left to right,
where $x(v_0) = -\infty$ and $x(v_u) = +\infty$.
Clustering $\Gamma = (C_1, C_2, \ldots, C_u)$ defined by $V$:
$C_i$ is the cluster induced by $v_{i-1}$ and $v_i$;
$v_0, v_1, \ldots, v_u$ are the boundary points of $\Gamma$.
$b$-clustering $\Gamma$: every cluster contains $b$ or
fewer lines.
Size of a clustering $\Gamma$: the number of clusters $u$.
15
Proof of Lemma
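Lemma 4: Let L be a set of N lines in the plane.
For any $1 \le k < N$, there exists a $3k$-clustering $\Gamma$ of
$A_k(L)$ of size at most $N/k$. It can be computed
using $O(V_k (\log_2 n) \log_B n)$ I/Os, where $V_k$ is the
complexity of $A_k(L)$.
  • Construction: greedy algorithm for a $3k$-clustering.
  • Scan the vertices of $A_k(L)$ from left to right; let $C_i$
    be the current cluster and $v_c$ the current vertex.
  • - If $v_c$ is the last vertex, done.
  • - Otherwise, check the line $l$ that appears on the level
    immediately after $v_c$:
  • - $l$ in $C_i$: nothing to do.
  • - $l$ not in $C_i$ and $C_i$ already has $3k$ lines: set $v_c$
    as a boundary point, start $C_{i+1}$, and add $l$ to
    $C_{i+1}$.
  • - $l$ not in $C_i$ and $C_i$ has fewer than $3k$ lines: add $l$
    to $C_i$. Go to the next vertex.
  • Proof of size $N/k$:
  • There are at least $k$ lines in each cluster $C_i$ that do not
    belong to any other cluster $C_j$ with $j > i$, so there
    are at most $N/k$ clusters.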
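The greedy steps listed above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the level is assumed to be precomputed and given as `line_after_vertex`, where entry `c` is the id of the line that appears on the level right after vertex $v_c$. The slide does not say how a new cluster is seeded beyond adding $l$, so the sketch follows the bullets literally. It therefore demonstrates only the $3k$ cap on cluster size, not the $N/k$ bound on the number of clusters.

```python
def greedy_clustering(line_after_vertex, k):
    """Greedy 3k-clustering over the vertices of a level, left to right.

    line_after_vertex[c] is the id of the line that appears on the level
    immediately after vertex v_c (assumed to be precomputed).
    Returns (clusters, boundaries): clusters is a list of sets of line ids,
    boundaries lists the vertex indices chosen as boundary points.
    """
    cap = 3 * k
    clusters = [set()]
    boundaries = []
    for c, line in enumerate(line_after_vertex):
        current = clusters[-1]
        if line in current:
            continue                  # l already in C_i: nothing to do
        if len(current) >= cap:
            boundaries.append(c)      # v_c becomes a boundary point
            clusters.append({line})   # start C_{i+1} and add l to it
        else:
            current.add(line)         # room left: add l to C_i
    return clusters, boundaries


clusters, boundaries = greedy_clustering([1, 2, 3, 1, 4, 5, 4, 6, 7], k=1)
```

With `k=1` each cluster holds at most three lines, so the sample sequence is cut at the vertices where a fourth new line would enter.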

16
The layered structure
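Set $\beta = B \log_B n$ and $m = \log_2(N/\beta)$. For all
$0 \le i \le m-1$, choose a random $\lambda_i$ with
$\beta 2^i \le \lambda_i \le \beta 2^{i+1} - 1$; set $\lambda_m = N$; compute
$A_{\lambda_i}(L)$. The plane is partitioned by
$A_{\lambda_0}(L), \ldots, A_{\lambda_m}(L)$ into a series of layers of
exponentially increasing size.
Construction: Compute a $3\lambda_i$-clustering $\Gamma_i$ of $A_{\lambda_i}(L)$
of size at most $N/\lambda_i$. (Note: the complexity of $A_{\lambda_i}(L)$
can be $O(N)$.) According to Lemmas 2-4 (slides 12-15), computing
$\Gamma_i$ uses $O(N (\log_2 n) \log_B n)$ I/Os. Each cluster in $\Gamma_i$
uses $3\lambda_i/B$ blocks, and the x-coordinates of the boundary
points, stored in a B-tree, use $O(N/(\lambda_i B))$ blocks,
so $\Gamma_i$ uses $O(n)$ blocks. Thus the total space used
on all layers is $O(n \log_2 n)$ blocks, and the number of I/Os to
construct them is $O(N (\log_2 n)^2 \log_B n)$.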
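The choice of the layer levels can be sketched in Python. Rounding is not specified on the slide, so the sketch makes assumptions: $n$ and $\beta$ are rounded up, $m$ is rounded down (so that every $\lambda_i$ with $i < m$ stays below $N$), and the input satisfies $N \ge \beta$. The function name `choose_layer_levels` and the `seed` parameter are hypothetical.

```python
import math
import random


def choose_layer_levels(N, B, seed=None):
    """Pick lambda_0 .. lambda_m for the layered structure.

    Assumes B > 1, n = ceil(N / B) > 1 and N >= beta.
    """
    rng = random.Random(seed)
    n = math.ceil(N / B)
    beta = math.ceil(B * math.log(n, B))      # beta = B log_B n
    m = math.floor(math.log2(N / beta))       # m = log_2(N / beta), rounded down
    lambdas = [rng.randint(beta * 2**i, beta * 2**(i + 1) - 1) for i in range(m)]
    lambdas.append(N)                         # lambda_m = N
    return beta, m, lambdas


beta, m, lambdas = choose_layer_levels(N=10**6, B=100, seed=0)
```

Each $\lambda_i$ lies in a range twice as wide as the previous one, which is what makes the layers grow exponentially.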
17
The layered structure (cont.)
  • Query
  • Let $p$ be a query point. To report all the lines
    below $p$, we visit the layers in order $i = 0, 1, 2, \ldots$
    until we find a cluster that contains all the lines of $L$
    that lie below $p$.
  • (Note: if the number of lines below $p$ in the relevant
    cluster of $\Gamma_i$ is less than $\lambda_i$, halt and report them.)
  • For each $\Gamma_i$, finding the cluster $C_i$ relevant to $p$ takes
    $O(\log_B n)$ I/Os.
  • Thus the $i$th layer uses
    $O(\log_B n + \lambda_i/B) = O(2^i \log_B n)$ I/Os.
  • Suppose we visit layers $0$ through $u$; the total
    number of I/Os is $O(2^u \log_B n)$.
  • - $u = 0$: $O(\log_B n)$.
  • - $u > 0$: the query point must lie above at least
    $\lambda_{u-1}$ lines. The output size is
    $T \ge \lambda_{u-1} \ge 2^{u-1} B \log_B n$,
    so the number of I/Os in this case is $O(T/B) = O(t)$.
  • Thus the query takes $O(\log_B n + t)$ I/Os.
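The step from the per-layer cost to the total is a geometric sum. Written out under the definitions used on these slides ($\beta = B \log_B n$ and $\beta 2^i \le \lambda_i < \beta 2^{i+1}$), a sketch of the calculation is:

```latex
\sum_{i=0}^{u} O\!\left(\log_B n + \frac{\lambda_i}{B}\right)
  = \sum_{i=0}^{u} O\!\left(2^{i}\log_B n\right)
  = O\!\left(2^{u}\log_B n\right),
\qquad
t = \frac{T}{B} \;\ge\; \frac{\lambda_{u-1}}{B} \;\ge\; 2^{u-1}\log_B n
  \;\Longrightarrow\;
  O\!\left(2^{u}\log_B n\right) = O(t) \quad (u > 0).
```

The first equality uses $\lambda_i / B < 2^{i+1} \log_B n$; the implication uses $2^u \log_B n = 2 \cdot 2^{u-1} \log_B n \le 2t$.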

Theorem 1: Let S be a set of N points in the
plane. We can store S in a data structure that
uses $O(n \log_2 n)$ blocks so that a halfspace range
query can be answered in $O(\log_B n + t)$ I/Os. The
expected number of I/Os used to construct the
data structure is $O(N (\log_2 n)^2 \log_B n)$.
18
The binary tree structure
Construction: If the number of lines in L is less
than $\beta = B \log_B n$, simply store them. Otherwise,
choose $\lambda$ with $\beta \le \lambda \le 2\beta$, then
- construct a $3\lambda$-clustering $\Gamma$ of $A_\lambda(L)$ and
  build a B-tree on the boundary points of $\Gamma$;
- partition L into 2 equal-size subsets and
  construct the data structure recursively on each.
The overall structure is a balanced binary tree of
depth $\log_2 n$. Any node at depth $i$ is associated
with a subset of $N/2^i$ lines.
$O(n/2^i)$ blocks store the clustering in a node at depth $i$
-> $O(n)$ blocks store the clusterings of all nodes at depth $i$
-> the overall disk space used is $O(n \log_2 n)$ blocks.
19
The binary tree structure (cont.)
  • Query
  • Note: the partition $L = L_1 \cup L_2$ is balanced
    (this is ensured during construction).
  • That is, for any point $p$ with at least $\lambda$ lines of $L$
    below it, the sets $L_p \cap L_1$ and $L_p \cap L_2$ each contain
    at least $\lambda/4$ lines, where $L_p$ is the set of lines
    of $L$ below $p$.
  • This condition is necessary to guarantee optimal
    query time.
  • Perform a B-tree search to find the cluster $C$
    that is relevant for $p$, using $O(\log_B n)$ I/Os.
  • Count the lines in $C$ that lie below $p$, using
    $O(\lambda/B) = O(\log_B n)$ I/Os.
  • If the number of lines below $p$ is less than $\lambda$,
    report them and halt; otherwise recursively query both
    $L_1$ and $L_2$.
  • Since the partition is balanced, $p$ must lie above
    at least $\lambda/4 \ge \beta/4$ lines in both $L_1$ and $L_2$. So
    whenever the query reaches a node $v$ other than the
    root, at least $\beta/4$ lines of $v$ lie below $p$.
    Hence the query visits at most
    $O(T/\beta)$ nodes besides the root.
  • Thus the total number of I/Os is
    $O((T/\beta + 1) \log_B n) = O(\log_B n + t)$.

20
A Linear-Size Data Structure in 2D
21
Balanced simplicial partition
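$S$: a set of N points in $\mathbb{R}^2$.
Simplicial partition of $S$: a set of pairs
$\Pi = \{(S_1, \Delta_1), (S_2, \Delta_2), \ldots, (S_r, \Delta_r)\}$,
where the $S_i$ are disjoint subsets that partition $S$ and
each $\Delta_i$ is a triangle containing $S_i$; $r$ is the size of $\Pi$.
Note: a point of $S$ may lie in many triangles, but in only one
$S_i$.
A simplicial partition is balanced if each
subset $S_i$ contains between $N/r$ and $2N/r$ points.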
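The balance condition is easy to check mechanically. The Python sketch below tests only the subsets (not the triangles) and takes $r$ as an explicit parameter rather than deriving it from the number of subsets; both choices, and the name `is_balanced`, are assumptions for illustration.

```python
def is_balanced(subsets, N, r):
    """True if the subsets partition N points and N/r <= |S_i| <= 2N/r for each."""
    sizes_ok = all(N / r <= len(s) <= 2 * N / r for s in subsets)
    total = sum(len(s) for s in subsets)
    distinct = len(set().union(*subsets))
    covers_once = total == N == distinct   # every point lies in exactly one S_i
    return sizes_ok and covers_once


balanced = is_balanced([[0, 1, 2], [3, 4], [5, 6, 7]], N=8, r=4)
skewed = is_balanced([[0], [1, 2, 3, 4, 5, 6, 7]], N=8, r=4)
```

The first partition has subset sizes 3, 2, 3, all within $[2, 4]$; the second fails because one subset has a single point.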
22
Linear-size data structure
Partition tree $T$: each node $v$ in $T$ is associated
with a subset $S_v \subseteq S$ and a triangle $\Delta_v$. (For the root $u$,
$S_u = S$ and $\Delta_u = \mathbb{R}^2$.) $N_v = |S_v|$ and $n_v = N_v/B$.
Construction: Given a node $v$, we construct the
subtree rooted at $v$. If $n_v < 1$, $v$ is a leaf and
we store all points of $S_v$ in one block. Otherwise, $v$
is an internal node of degree $r_v = \min\{cB, 2n_v\}$,
where $c > 1$ is a constant. Compute a balanced simplicial
partition $\Pi_v = \{(S_1, \Delta_1), \ldots, (S_{r_v}, \Delta_{r_v})\}$ of $S_v$ and
recursively construct a subtree $T_i$ for each $S_i$. For each
$i$, store the vertices of $\Delta_i$ and a pointer to
$T_i$. Thus we need $O(c) = O(1)$ blocks for each node
$v$. The total number of nodes in the tree is
$O(n)$, so the total size of the partition tree is
$O(n)$ blocks.
23
Linear-size data structure (cont.)
  • Query
  • To find all points below a query line $l$, visit $T$
    from the top. Suppose we are at node $v$.
  • If $v$ is a leaf, report all points of $S_v$ that lie
    below $l$.
  • Otherwise, test each $\Delta_i$ of $\Pi_v$:
  • If $\Delta_i$ lies above $l$, ignore it.
  • If $\Delta_i$ lies below $l$, report all the points in $S_i$
    by traversing the subtree $T_i$.
  • (Note: each point is reported only once.)
  • Over all subtrees reported this way, $\sum n_v \le t$, so this
    case needs $O(t)$ I/Os in total.
  • If $l$ crosses $\Delta_i$, recursively visit the $i$th child
    of $v$.
  • Let $u$ be the number of nodes $v$
    visited by the query procedure for which $\Delta_v$
    crosses $l$, that is, the number of recursive calls. These
    visits cost $O(u)$ I/Os.
  • When $c = c(\varepsilon)$ is sufficiently large, we can prove
    $u = O(n^{1/2+\varepsilon})$.
  • Thus the data structure uses $O(n)$ blocks and answers a
    query in $O(n^{1/2+\varepsilon} + t)$ I/Os.
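The three-way case analysis of the query (cell above, cell below, cell crossed) can be sketched in memory in Python. Computing a real balanced simplicial partition is beyond a short example, so the construction here swaps in a stand-in: median splits with bounding boxes as cells. That stand-in carries no $O(n^{1/2+\varepsilon})$ guarantee; only the query logic mirrors the slide. All names (`Node`, `build`, `query`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cell: list                                   # vertices of a convex cell containing the points
    points: list = field(default_factory=list)   # filled only at leaves
    children: list = field(default_factory=list)


def bounding_cell(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]


def build(points, B, depth=0):
    """Stand-in construction: median splits, bounding boxes as cells."""
    node = Node(cell=bounding_cell(points))
    if len(points) <= B:
        node.points = list(points)               # leaf: S_v fits in one block
        return node
    pts = sorted(points, key=lambda p: p[depth % 2])
    mid = len(pts) // 2
    node.children = [build(pts[:mid], B, depth + 1),
                     build(pts[mid:], B, depth + 1)]
    return node


def below(p, a, b):
    return p[1] < a * p[0] + b


def report_all(node, out):
    out.extend(node.points)
    for child in node.children:
        report_all(child, out)


def query(node, a, b, out):
    """Append to `out` every stored point strictly below the line y = a*x + b."""
    if not node.children:
        out.extend(p for p in node.points if below(p, a, b))
        return
    for child in node.children:
        sides = [below(v, a, b) for v in child.cell]
        if all(sides):
            report_all(child, out)               # cell entirely below l: report the subtree
        elif any(sides):
            query(child, a, b, out)              # l crosses the cell: recurse
        # otherwise no part of the cell is below l: ignore it


points = [(x, (7 * x) % 11) for x in range(20)]
root = build(points, B=3)
found = []
query(root, 0.5, 3, found)
```

Because the cells are convex, testing only their vertices against the line is enough to decide which of the three cases applies.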

24
Trading Space for Query Time in 2D
25
Trading space for query time in 2D
  • Idea
  • Use the same recursive procedure to construct a
    partition tree, but stop the recursion when $N_v \le B^a$
    for some constant $a > 1$. In that case, preprocess $S_v$
    into the data structure of Theorem 1.
  • A query is answered as in the partition tree,
    except at the leaves, where we
    use the query procedure of Theorem 1.
  • Result
  • Given a set $S$ of $N$ points in the plane and
    constants $\varepsilon > 0$ and $a > 1$, we can preprocess $S$ into a
    data structure of size $O(n \log_2 B)$ blocks so that a
    halfspace range query can be answered using
    $O((n/B^a)^{1/2+\varepsilon} + t)$ I/Os.
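A sketch of where the $O(n \log_2 B)$ space bound comes from, assuming the leaf subsets $S_v$ partition $S$ (so $\sum_v n_v = n$) and each leaf satisfies $N_v \le B^a$, i.e. $n_v \le B^{a-1}$. Each leaf stores a Theorem 1 structure of $O(n_v \log_2 n_v)$ blocks, and the partition tree above the leaves adds only $O(n)$ blocks:

```latex
\sum_{\text{leaves } v} O\!\left(n_v \log_2 n_v\right)
  \;\le\; \sum_{\text{leaves } v} O\!\left(n_v \log_2 B^{a-1}\right)
  \;=\; O\!\left((a-1)\,\log_2 B \sum_{\text{leaves } v} n_v\right)
  \;=\; O\!\left(n \log_2 B\right).
```

The last step uses that $a$ is a constant.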

26
Conclusion
27
Conclusion
Some of the techniques can be extended to higher
dimensions.