Title: Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data


1
Probabilistic Threshold Range Aggregate Query
Processing over Uncertain Data
Wenjie Zhang, University of New South Wales and
NICTA, Australia
Joint work with Shuxiang Yang, Ying Zhang, Xuemin Lin
(UNSW and NICTA)
2
Outline
  • Background and Preliminaries
  • Probabilistic Threshold Range Aggregate Query
  • Exact query processing
  • Approximate query processing: Simple Sampling,
    Double Sampling
  • Experiments
  • Conclusion

3
Applications
  • Many applications involve data that is imperfect
    due to
    • data randomness and incompleteness
    • limitations of equipment
    • delays or losses in data transfer
  • Applications
    • Sensor networks
    • Environmental surveillance
    • Moving objects
    • Data cleaning and integration

4
Applications
  • Sensor Networks
  • Sensor readings are often imprecise due to
    equipment limitations and periodic reporting
    mechanisms.
  • (Figures are borrowed from Jian et al.,
    SIGMOD08.)

5
Applications
  • Mobile Equipment / Moving Objects
  • A mobile object reports its location only
    periodically, so its exact location is often
    uncertain.

6
Applications
  • Satellite data

7
Applications
  • Data Quality
  • Social data collection: errors and estimation
    inherent in customer surveys and sampling

8
Outline
  • Background and Preliminaries
  • Modeling Uncertainty and Related Work
  • Probabilistic Threshold Range Query
  • Conclusion

9
Modeling Uncertainty ( cont. )
  • Uncertain Objects Model
  • Continuous case: described by a probability
    density function (PDF) $f_U$ such that
    $\int_{U} f_U(x)\,dx = 1$ (integrating over the uncertain
    region of U), e.g., a uniform or normal distribution.

10
Modeling Uncertainty ( cont. )
  • Uncertain Objects Model
  • Discrete case: described by a set of
    instances; each instance u has an occurrence
    probability $p_u$
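The discrete model can be made concrete with a small Python class. This is a sketch under our own assumptions: the names `UncertainObject` and `from_samples` are hypothetical, and it assumes an object's occurrence probabilities sum to 1 (exactly one instance occurs, as the possible-world semantics requires). `from_samples` shows one common way to approximate a continuous PDF by equally likely sampled instances.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class UncertainObject:
    """Discrete uncertain object: instances u with occurrence probabilities p_u."""
    instances: List[Point]
    probs: List[float]

    def __post_init__(self) -> None:
        if len(self.instances) != len(self.probs):
            raise ValueError("need exactly one probability per instance")
        if any(p < 0 for p in self.probs):
            raise ValueError("occurrence probabilities must be non-negative")
        # Assumption: exactly one instance occurs, so the p_u sum to 1.
        if abs(sum(self.probs) - 1.0) > 1e-9:
            raise ValueError("occurrence probabilities must sum to 1")

def from_samples(samples: List[Point]) -> UncertainObject:
    """Approximate a continuous PDF by equally likely sampled instances."""
    n = len(samples)
    return UncertainObject(list(samples), [1.0 / n] * n)
```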

11
Possible World Semantics
  • Given a set of uncertain objects $U_1, U_2, \ldots, U_n$,
    a possible world $W = \{u_1, u_2, \ldots, u_n\}$ is a
    set of n instances, one instance per uncertain
    object ($u_i \in U_i$)
  • The probability of a possible world is
    $P(W) = \prod_{i=1}^{n} p_{u_i}$
  • Let $\Omega$ be the set of all possible worlds; clearly,
    $\sum_{W \in \Omega} P(W) = 1$
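A brute-force enumeration illustrates the semantics on a toy input. It assumes the objects are independent (so $P(W)$ is a product of instance probabilities), and `possible_worlds` is a hypothetical helper; the number of worlds is the product of the objects' instance counts, so this is for illustration only.

```python
from itertools import product
from math import prod
from typing import Dict, List, Tuple

# Each uncertain object is a list of (instance id, occurrence probability).
Obj = List[Tuple[str, float]]

def possible_worlds(objects: List[Obj]) -> Dict[Tuple[str, ...], float]:
    """Map every possible world (one instance per object) to P(W).

    Assumes objects are independent, so P(W) is the product of the chosen
    instances' probabilities. The number of worlds is exponential in n.
    """
    worlds: Dict[Tuple[str, ...], float] = {}
    for choice in product(*objects):
        world = tuple(inst_id for inst_id, _ in choice)
        worlds[world] = prod(p for _, p in choice)
    return worlds

U1 = [("u1a", 0.4), ("u1b", 0.6)]
U2 = [("u2a", 0.5), ("u2b", 0.5)]
W = possible_worlds([U1, U2])
print(len(W))                      # 4
print(round(sum(W.values()), 10))  # 1.0
```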

12
Probabilistic Queries
  • Query Evaluation [CKP03, CXPSV04, DS04, DS05,
    DS07, SD07]
  • Aggregate Queries [BDJR05, MJ07, CG07]
  • Join Queries [CSP06, AW07]
  • Top-k queries [SIC07, YLSK08, RDS07, HJZL08]
  • Nearest Neighbor Queries [KKR07, CCMC08]
  • Skyline Queries [PJLY07]

13
Range query
  • Uncertain objects, exact (certain) query range
  • A probability threshold is often specified

14
Related Work
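  • Range Queries [TCXNKP05, BPS06, AY08]
  • Given a rectangle r and a probability
    threshold t, find all objects that appear in r
    with probability at least t.
  • Appearance probability of object U in r:
    $P_{app}(U, r) = \int_{U \cap r} f_U(x)\,dx$ (continuous) or
    $P_{app}(U, r) = \sum_{u \in U,\, u \in r} p_u$ (discrete)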
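For a continuous object whose PDF is uniform over a rectangular region, the integral reduces to an area ratio. The sketch below assumes axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)`; the helper names are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def appearance_prob_uniform(region: Rect, r: Rect) -> float:
    """P_app(U, r) when f_U = 1 / area(region) on `region` and 0 elsewhere:
    integrating f_U over region ∩ r gives area(region ∩ r) / area(region)."""
    area = (region[2] - region[0]) * (region[3] - region[1])
    return overlap_area(region, r) / area

# An object uniform on [0,2]x[0,2], queried with [1,3]x[0,2]: half its mass is in r.
print(appearance_prob_uniform((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0)))  # 0.5
```

An object then qualifies for the range query when this value is at least t.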

15
U-tree
  • Probabilistically Constrained Region (PCR) [TCXNKP05]
  • (Figures: PCR(0.2); multiple PCRs)
16
Outline
  • Introduction
  • Modeling Uncertainty and Related Work
  • Probabilistic Threshold Range Aggregate Query
    (PTRA)
  • Conclusion

17
Contribution
  • Formally define the PTRA query
  • aU-Tree structure for exact PTRA query processing
  • singleSample and doubleSample techniques for
    approximate answers

18
Problem Statement
  • Given a set of uncertain objects and a range query q
    with probability threshold pq, return the number of
    uncertain objects whose appearance probability in q is at least pq

19
Problem Definition
  • Assume threshold 0.5. If the appearance
    probability computed for b is > 0.5 and for c is
    < 0.5, then the aggregate returned is 2 (a and b)
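A brute-force evaluation of the definition on discrete objects reproduces the example above. The coordinates and probabilities for a, b and c are made up for illustration (the figure values are not in the transcript), `ptra_count` is a hypothetical name, and closed rectangle boundaries are assumed.

```python
from typing import List, Tuple

Obj = Tuple[List[Tuple[float, float]], List[float]]  # (instances, probabilities)
Rect = Tuple[float, float, float, float]             # (xmin, ymin, xmax, ymax)

def ptra_count(objects: List[Obj], q: Rect, pq: float) -> int:
    """Brute-force PTRA: count objects whose appearance probability in q is >= pq."""
    count = 0
    for instances, probs in objects:
        p_app = sum(p for (x, y), p in zip(instances, probs)
                    if q[0] <= x <= q[2] and q[1] <= y <= q[3])
        if p_app >= pq:
            count += 1
    return count

q = (0.0, 0.0, 3.0, 3.0)
a = ([(1.0, 1.0), (2.0, 2.0)], [0.5, 0.5])  # fully inside q: P_app = 1.0
b = ([(1.0, 1.0), (5.0, 5.0)], [0.6, 0.4])  # P_app = 0.6 > 0.5
c = ([(2.0, 2.0), (6.0, 6.0)], [0.3, 0.7])  # P_app = 0.3 < 0.5
print(ptra_count([a, b, c], q, 0.5))        # 2
```

This costs time linear in the total number of instances of all objects, which is the cost the index and sampling techniques try to avoid.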

20
Exact Query Processing ( aU-Tree)
  • Main idea: add aggregate information to the U-tree
  • Advantage: the search can stop at an intermediate level
    if a node is pruned or fully covered by the query
  • Disadvantage: otherwise, the search must still drill down
    to the leaf nodes
  • Appearance probabilities must still be computed for a
    large portion of the uncertain objects
  • Expensive for a massive number of instances per
    object!
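The traversal idea can be sketched as follows. This is a simplified stand-in, not the authors' aU-tree: each node stores only an MBR over all instances in its subtree plus an object count (the aggregate), whereas a U-tree also keeps PCR information. It assumes 0 < pq ≤ 1, so a node disjoint from q contributes 0 and a node inside q contributes its full count; `Node` and `ptra` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]             # (xmin, ymin, xmax, ymax)
Obj = Tuple[List[Tuple[float, float]], List[float]]  # (instances, probabilities)

@dataclass
class Node:
    mbr: Rect   # bounds every instance of every object in the subtree
    count: int  # aggregate: number of objects in the subtree
    children: List["Node"] = field(default_factory=list)
    objects: List[Obj] = field(default_factory=list)  # filled at leaves only

def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]

def covers(q: Rect, a: Rect) -> bool:
    return q[0] <= a[0] and q[1] <= a[1] and a[2] <= q[2] and a[3] <= q[3]

def ptra(node: Node, q: Rect, pq: float) -> int:
    if disjoint(node.mbr, q):   # all objects have P_app = 0 < pq: prune
        return 0
    if covers(q, node.mbr):     # all objects have P_app = 1 >= pq: use aggregate
        return node.count
    if node.children:           # partial overlap: drill down
        return sum(ptra(c, q, pq) for c in node.children)
    total = 0                   # leaf: compute each appearance probability
    for instances, probs in node.objects:
        p_app = sum(p for (x, y), p in zip(instances, probs)
                    if q[0] <= x <= q[2] and q[1] <= y <= q[3])
        if p_app >= pq:
            total += 1
    return total
```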

21
Exact Query Processing ( aU-Tree)
22
singleSample
  • Sample the instances of each uncertain object.
  • If m' of the m sampled instances fall inside the query
    region, the approximate appearance
    probability is m'/m
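A minimal sketch of singleSample for one discrete object. It assumes instances are drawn with replacement, weighted by their occurrence probabilities (with equally likely instances this is plain uniform sampling); `single_sample` is a hypothetical name.

```python
import random
from typing import List, Tuple

def single_sample(instances: List[Tuple[float, float]], probs: List[float],
                  q: Tuple[float, float, float, float], m: int,
                  rng: random.Random) -> float:
    """Estimate P_app(U, q) as m'/m, where m' of m instances drawn
    (with replacement, weighted by occurrence probability) lie in q."""
    draws = rng.choices(instances, weights=probs, k=m)
    m_prime = sum(1 for x, y in draws
                  if q[0] <= x <= q[2] and q[1] <= y <= q[3])
    return m_prime / m
```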

23
singleSample ( cont. )
An immediate application of the Chernoff-Hoeffding
bound
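The bound itself did not survive in the transcript. For reference, the standard Hoeffding inequality for the mean of m independent 0/1 draws gives $\Pr[|m'/m - P_{app}| \ge \epsilon] \le 2e^{-2m\epsilon^2}$, so $m \ge \ln(2/\delta)/(2\epsilon^2)$ samples suffice for error at most $\epsilon$ with probability at least $1-\delta$. The helper below computes that m; it is a sketch of the textbook bound, not necessarily the exact form used in the paper, and the function name is hypothetical.

```python
import math

def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Smallest m such that 2 * exp(-2 * m * eps**2) <= delta."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

print(hoeffding_sample_size(0.05, 0.05))  # 738
```

Note that m does not depend on how many instances an object has, which is why sampling pays off when objects carry many instances.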
24
doubleSample
  • singleSample is still expensive when there is a
    massive number of objects!
  • Sample the uncertain objects as well.
  • Naive approach: sample objects uniformly from all
    uncertain objects.
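A sketch of the naive variant: sample n of the N objects uniformly without replacement, estimate each sampled object's appearance probability from m instance draws, and scale the qualifying count by N/n. The function name and parameters are hypothetical.

```python
import random
from typing import List, Tuple

Obj = Tuple[List[Tuple[float, float]], List[float]]  # (instances, probabilities)

def naive_double_sample(objects: List[Obj], q: Tuple[float, float, float, float],
                        pq: float, n: int, m: int, rng: random.Random) -> float:
    """Estimate the PTRA count from n sampled objects and m instances each."""
    N = len(objects)
    qualifying = 0
    for instances, probs in rng.sample(objects, n):          # level 1: objects
        draws = rng.choices(instances, weights=probs, k=m)  # level 2: instances
        inside = sum(1 for x, y in draws
                     if q[0] <= x <= q[2] and q[1] <= y <= q[3])
        if inside / m >= pq:
            qualifying += 1
    return qualifying * N / n   # scale the sample count up to all N objects
```

Under spatial skew, a uniform object sample can miss or over-represent dense regions, which motivates the grouping approach described later in the talk.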

25
doubleSample: Accuracy
  • Note: assuming the appearance probabilities of the objects
    follow a uniform distribution, i.e., the objects' spatial
    locations are uniformly distributed.
  • Bounded using the Chernoff-Hoeffding bound.

26
doubleSample: Our Approach
  • Skew!
  • Aim: select K disjoint groups covering all
    objects with minimum skew, i.e., objects in
    each group are uniformly distributed. (Then
    sample objects uniformly within each group.)
  • The optimization problem is NP-hard.
  • Observations
    • Min-skew is a good heuristic for forming such
      groups.
    • The aU-tree groups objects on a principle similar
      to min-skew.
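To make the skew objective concrete, here is a simplified one-dimensional greedy min-skew partitioner. It assumes the objects have been summarised as densities (object counts) per grid cell; a bucket's skew is the sum of squared deviations of its cells' densities from the bucket mean, and the greedy step repeatedly applies the split with the largest skew reduction until K buckets exist. The function names are hypothetical, and the talk applies the idea to aU-tree subtrees rather than to grid cells.

```python
from typing import List, Optional, Tuple

def skew(freqs: List[float]) -> float:
    """Skew of a bucket: sum of squared deviations of its cell densities
    from the bucket mean (i.e. cell count times their variance)."""
    if not freqs:
        return 0.0
    mean = sum(freqs) / len(freqs)
    return sum((f - mean) ** 2 for f in freqs)

def min_skew(freqs: List[float], k: int) -> List[Tuple[int, int]]:
    """Greedy 1-D min-skew: start with one bucket and repeatedly apply the
    split that most reduces total skew until there are k buckets.
    Buckets are half-open index ranges [lo, hi)."""
    buckets = [(0, len(freqs))]
    while len(buckets) < k:
        best: Optional[Tuple[float, int, int]] = None  # (reduction, bucket, split)
        for i, (lo, hi) in enumerate(buckets):
            whole = skew(freqs[lo:hi])
            for s in range(lo + 1, hi):
                red = whole - skew(freqs[lo:s]) - skew(freqs[s:hi])
                if best is None or red > best[0]:
                    best = (red, i, s)
        if best is None:  # every bucket is a single cell; cannot split further
            break
        _, i, s = best
        lo, hi = buckets.pop(i)
        buckets[i:i] = [(lo, s), (s, hi)]  # keep buckets in left-to-right order
    return buckets
```

For example, `min_skew([1, 1, 1, 9, 9, 9], 2)` separates the sparse and dense halves into `[(0, 3), (3, 6)]`, after which each bucket can be sampled uniformly.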

27
doubleSample: Our Approach
  • Step 1: choose K subtrees covering all objects
    with minimum total skew. NP-hard!
    • Find a level L such that the number of nodes at
      level L is smaller than K but the number of nodes
      at level L-1 is larger than K.
    • Feed the min-skew algorithm with the subtrees at
      level L.
    • (Note: if the number of nodes at some level L
      equals K, these K subtrees are chosen.)
  • Step 2: sample objects in each subtree.
  • Step 3: sample instances in each sampled object.
28
Experiments
  • Algorithms
  • exact, singleSample, doubleSample
  • Data set
  • LB: 53k objects in Long Beach County
  • CA: 62k objects in California
  • Synthetic aircraft dataset in 3D
  • 10k instances per object, following a uniform
    or constrained-Gaussian distribution
  • Setting: C, P4 2.8 GHz, 2 GB memory, Debian
    Linux, page size 8 KB
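One way to generate such per-object instance sets in 2D is sketched below. The transcript does not define "constrained-Gaussian"; here it is assumed to mean a normal distribution centred in the object's uncertain region and restricted to that region by rejection sampling. The region layout, `sigma_frac`, and the function name are hypothetical.

```python
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def gen_instances(region: Rect, n: int, dist: str, rng: random.Random,
                  sigma_frac: float = 0.25) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Generate n equally likely instances inside `region`.
    'uniform': uniform over the region.
    'gaussian': normal centred on the region, rejection-sampled to stay inside."""
    xmin, ymin, xmax, ymax = region
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    sx, sy = sigma_frac * (xmax - xmin), sigma_frac * (ymax - ymin)
    pts: List[Tuple[float, float]] = []
    while len(pts) < n:
        if dist == "uniform":
            p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        elif dist == "gaussian":
            p = (rng.gauss(cx, sx), rng.gauss(cy, sy))
            if not (xmin <= p[0] <= xmax and ymin <= p[1] <= ymax):
                continue  # reject samples outside the uncertain region
        else:
            raise ValueError(f"unknown distribution: {dist}")
        pts.append(p)
    return pts, [1.0 / n] * n
```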

29
Efficiency
30
Accuracy
31
Accuracy ( cont. )
32
Conclusion
  • Definition of PTRA
  • aU-Tree technique
  • Sampling technique
  • Future work: an approach with a theoretical
    guarantee?

33
  • Thanks

34
Min-Skew technique