Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions

1
Indexing Multi-Dimensional Uncertain Data with
Arbitrary Probability Density Functions
  • Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kai
    Ngai, Ben Kao, Sunil Prabhakar
  • City University of Hong Kong
  • Hong Kong Polytechnic University
  • University of Hong Kong
  • Purdue University

2
Multi-dimensional Uncertain Data
  • Moving objects
  • An object sends its location to a server whenever
    its distance from the previously reported
    location is larger than a certain threshold.
  • Sensor readings
  • Each sensor periodically reports the temperature,
    humidity, UV index, etc., in its neighborhood.
  • Querying the (uncertain) data stored in the
    server directly is meaningless.

3
Uncertainty Modeling
An object's location is described by a
probability density function (pdf).
4
Probabilistic Range Search
Find the clients that are currently in CityU with
at least 50% probability.
  • This is a probabilistic range query; 50% is its
    probability threshold.
5
Appearance Probability
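Appearance probability: the probability that an
object lies in the search region rq.
E.g., for a uniform pdf, it is the fraction of the
object's uncertainty region that falls inside rq.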
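For the uniform case the appearance probability reduces to a ratio of areas. The sketch below assumes, purely for illustration, that the uncertainty region is an axis-aligned rectangle (the experiments in this deck use circles); the names `Rect`, `area`, `intersection`, and `appearance_probability_uniform` are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def area(r: Rect) -> float:
    """Area of an axis-aligned rectangle; 0 if it is empty or inverted."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection(a: Rect, b: Rect) -> Rect:
    """Overlap of two rectangles (may be inverted when they are disjoint)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def appearance_probability_uniform(ur: Rect, rq: Rect) -> float:
    """Probability that an object uniformly distributed over `ur` lies in `rq`."""
    return area(intersection(ur, rq)) / area(ur)


# Assumed example: rq covers the right half of a 4 x 4 uncertainty region.
print(appearance_probability_uniform((0, 0, 4, 4), (2, 0, 6, 4)))  # 0.5
```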
6
Appearance Probability
For an arbitrary pdf, the appearance probability
must be calculated numerically.
Calculation time of an appearance probability in
2D space: 1.3 ms
Time for a random access: 10 ms
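To make the cost of a numerical calculation concrete, here is a minimal sketch that integrates a pdf over the part of a circular uncertainty region inside a rectangular query. The Gaussian shape, the midpoint-rule grid, and every name and parameter (`appearance_probability_numeric`, `sigma`, `n`) are assumptions for illustration, not the method used by the authors.

```python
import math


def appearance_probability_numeric(center, radius, sigma, rq, n=200):
    """Midpoint-rule estimate of the probability mass that falls inside rq.

    The pdf is an unnormalized Gaussian around `center`, cut off at the
    circular uncertainty region; dividing by the total mass normalizes it.
    """
    cx, cy = center
    x_lo, y_lo, x_hi, y_hi = rq
    step = 2.0 * radius / n
    inside = total = 0.0
    for i in range(n):
        x = cx - radius + (i + 0.5) * step
        for j in range(n):
            y = cy - radius + (j + 0.5) * step
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 > radius * radius:
                continue  # grid cell lies outside the uncertainty region
            w = math.exp(-d2 / (2.0 * sigma * sigma))
            total += w
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                inside += w
    return inside / total
```

Each call evaluates the pdf at tens of thousands of grid points, which is why avoiding such calls for as many objects as possible matters.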
7
A good solution should
  • Support any pdf.
  • Minimize the number of page accesses.
  • Minimize the number of appearance probability
    calculations.
  • Minimize the total cost (I/O + CPU).

8
Main Idea
  • Pre-compute some auxiliary information that can
    be used to efficiently decide whether an object
    appears in a region with at least a certain
    probability, without calculating its actual
    appearance probability.

9
Quick Examples
pq = 20%
10
Probabilistically Constrained Regions (PCR)
11
Probabilistically Constrained Regions (PCR)
For a query q with search region rq and
probability pq = 0.2
  • Observation 1.1 (pruning)

an object o cannot satisfy q if rq does not
intersect o.pcr(0.2)
12
Probabilistically Constrained Regions (PCR)
For a query q with search region rq and
probability pq = 0.8 (= 1 - 0.2)
  • Observation 1.2 (pruning)

an object o cannot satisfy q if rq does not
fully contain o.pcr(0.2)
13
Probabilistically Constrained Regions (PCR)
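A query q with search region rq and probability
pq = 0.2
  • Observation 1.3 (validating)

an object o definitely satisfies q if rq fully
contains the part of o.MBR on the left of l1- (or
on the right of l1+, or below l2-, or above l2+),
where l1- and l1+ (l2- and l2+) are the lines
through the left and right (bottom and top) sides
of o.pcr(0.2)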
14
Probabilistically Constrained Regions (PCR)
A query q with search region rq and probability
pq = 0.8
  • Observation 1.4 (validating)

an object o definitely satisfies q if rq fully
contains the part of o.MBR on the left of l1+
(or on the right of l1-, or below l2+, or above
l2-)
15
Probabilistically Constrained Regions (PCR)
A query q with search region rq and probability
pq = 0.6 (= 1 - 2 × 0.2)
  • Observation 1.5 (validating)

an object o must satisfy q if rq fully contains
the part of o.MBR between l1- and l1+ (or between
l2- and l2+)
16
Probabilistically Constrained Regions (PCR)
  • o.pcr(0.2) provides 5 heuristics to reduce CPU
    cost.
  • In general, for a prob-range query with
    probability threshold pq:
  • If pq < 0.5:
      • o may be pruned using o.pcr(pq)
        (Observation 1.1)
      • o may be validated using o.pcr(pq)
        (Observation 1.3)
      • o may be validated using o.pcr((1 - pq)/2)
        (Observation 1.5)
  • If pq > 0.5:
      • o may be pruned using o.pcr(1 - pq)
        (Observation 1.2)
      • o may be validated using o.pcr(1 - pq)
        (Observation 1.4)
      • o may be validated using o.pcr((1 - pq)/2)
        (Observation 1.5)
  • pq ∈ [0, 1] ⇒ infinite number of possible pq
    ⇒ infinite number of PCRs.
  • Impractical!
  • It is possible to use a finite number of PCRs to
    achieve pruning and validating.
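The five heuristics can be sketched in a few lines. This illustration assumes a 2D object with a uniform pdf over a rectangular MBR, so that o.pcr(p) has a closed form (each side moves inward by a fraction p of the extent). All names (`pcr_uniform`, `covers_slab`, `classify`) are hypothetical, and treating pq = 0.5 like the pq < 0.5 case is an assumption, since the rules above leave that value open.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)
INF = float("inf")


def pcr_uniform(mbr: Rect, p: float) -> Rect:
    """PCR for a uniform pdf over `mbr`: probability p lies outside each side."""
    x_lo, y_lo, x_hi, y_hi = mbr
    dx, dy = (x_hi - x_lo) * p, (y_hi - y_lo) * p
    return (x_lo + dx, y_lo + dy, x_hi - dx, y_hi - dy)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def covers_slab(rq: Rect, mbr: Rect, axis: int, lo: float, hi: float) -> bool:
    """True if rq fully contains the part of mbr with lo <= coordinate <= hi."""
    part = list(mbr)
    part[axis] = max(mbr[axis], lo)
    part[axis + 2] = min(mbr[axis + 2], hi)
    return contains(rq, (part[0], part[1], part[2], part[3]))


def classify(pq: float, rq: Rect, mbr: Rect) -> str:
    """Return 'pruned', 'validated', or 'unknown' without integrating the pdf."""
    small = pq <= 0.5  # assumption: pq = 0.5 is handled like pq < 0.5
    side = pcr_uniform(mbr, pq if small else 1.0 - pq)
    if small and not intersects(rq, side):      # Observation 1.1
        return "pruned"
    if not small and not contains(rq, side):    # Observation 1.2
        return "pruned"
    mid = pcr_uniform(mbr, (1.0 - pq) / 2.0)
    for axis in (0, 1):
        minus, plus = side[axis], side[axis + 2]  # the lines l- and l+
        if small:   # Observation 1.3: left of l-, or right of l+
            slabs = [(-INF, minus), (plus, INF)]
        else:       # Observation 1.4: left of l+, or right of l-
            slabs = [(-INF, plus), (minus, INF)]
        slabs.append((mid[axis], mid[axis + 2]))  # Observation 1.5
        if any(covers_slab(rq, mbr, axis, lo, hi) for lo, hi in slabs):
            return "validated"
    return "unknown"
```

Only objects classified as "unknown" would still need a numerical appearance probability calculation.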

17
Using PCRs in a Conservative Way
E.g., U-catalog = {0, 0.1, 0.2, 0.3, 0.4, 0.5}
For a query q with search region rq and
probability pq = 0.25
  • Observation 1.1

an object o cannot satisfy q if rq does not
intersect o.pcr(0.25)
(0.25 is not in the U-catalog, so o.pcr(0.25) is
not available)
  • Observation 2.1

an object o cannot satisfy q if rq does not
intersect o.pcr(0.2)
(o.pcr(0.2) is available and encloses o.pcr(0.25))
18
Using PCRs in a Conservative Way
U-catalog = {0, 0.1, 0.2, 0.3, 0.4, 0.5}
For a query q with search region rq and
probability pq = 0.75
  • Observation 1.2

an object o cannot satisfy q if rq does not fully
contain o.pcr(0.25)
(0.25 is not in the U-catalog, so o.pcr(0.25) is
not available)
  • Observation 2.2

an object o cannot satisfy q if rq does not fully
contain o.pcr(0.3)
(o.pcr(0.3) is available and is enclosed by
o.pcr(0.25))
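The conservative choice in Observations 2.1 and 2.2 follows from the fact that PCRs shrink as p grows: for the intersection test pick the largest catalog value not above pq, and for the containment test pick the smallest catalog value not below 1 - pq. A minimal sketch, with the hypothetical names `U_CATALOG` and `catalog_value_for_pruning`:

```python
from bisect import bisect_left, bisect_right

U_CATALOG = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # sorted; must include 0 and 0.5


def catalog_value_for_pruning(pq, catalog=U_CATALOG):
    """Return (test, p): which stored PCR o.pcr(p) to use and how to test rq."""
    if pq <= 0.5:
        # Largest catalog value <= pq: its PCR encloses o.pcr(pq), so
        # "rq misses o.pcr(p)" implies "rq misses o.pcr(pq)".
        return ("intersect", catalog[bisect_right(catalog, pq) - 1])
    # Smallest catalog value >= 1 - pq: its PCR lies inside o.pcr(1 - pq), so
    # "rq does not contain o.pcr(p)" implies the same for o.pcr(1 - pq).
    return ("contain", catalog[bisect_left(catalog, 1.0 - pq)])


print(catalog_value_for_pruning(0.25))  # ('intersect', 0.2)
print(catalog_value_for_pruning(0.75))  # ('contain', 0.3)
```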
19
U-catalog Size m
  • U-catalog = {0, 0.5}: m = 2
  • U-catalog = {0, 0.25, 0.5}: m = 3
  • U-catalog = {0, 0.1, 0.2, 0.3, 0.4, 0.5}: m = 6
  • Larger m ⇒ more PCRs ⇒ greater
    pruning/validating power ⇒ less CPU cost
  • Larger m ⇒ higher space consumption ⇒ larger
    I/O cost
  • m = 9
20
Conservative Functional Boxes (CFB)
U-catalog = {0, 0.1, 0.2, 0.3, 0.4, 0.5}
  • o.pcr: 2m values for each dimension
  • o.cfbout: 4 values for each dimension
  • o.cfbin: 4 values for each dimension
  • Total for the CFBs: 8 values for each dimension
    (for m = 9: 8 values versus 18)
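The space comparison is simple arithmetic on the per-dimension counts given above. A small sketch (the function names are hypothetical):

```python
def pcr_values(m: int, d: int) -> int:
    """Values needed to store all m PCRs of one object: 2m per dimension."""
    return 2 * m * d


def cfb_values(d: int) -> int:
    """Values needed for o.cfbout plus o.cfbin: 4 + 4 per dimension."""
    return 8 * d


# For m = 9 in 2D: 36 values for the PCRs versus 16 for the two CFBs.
print(pcr_values(9, 2), cfb_values(2))  # 36 16
```

Unlike the PCR count, the CFB count does not grow with m.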
21
Conservative Functional Boxes (CFB)
For a query q with search region rq and
probability pq = 0.25
U-catalog = {0, 0.1, 0.2, 0.3, 0.4, 0.5}
  • Observation 1.1

an object o cannot satisfy q if rq does not
intersect o.pcr(0.25)
  • Observation 2.1

an object o cannot satisfy q if rq does not
intersect o.pcr(0.2)
  • Observation 3.1

an object o cannot satisfy q if rq does not
intersect o.cfbout(0.2)
22
Conservative Functional Boxes (CFB)
For a query q with search region rq and
probability pq = 0.75
U-catalog = {0, 0.1, 0.2, 0.3, 0.4, 0.5}
  • Observation 1.2

an object o cannot satisfy q if rq does not fully
contain o.pcr(0.25)
  • Observation 2.2

an object o cannot satisfy q if rq does not fully
contain o.pcr(0.3)
  • Observation 3.2

an object o cannot satisfy q if rq does not fully
contain o.cfbin(0.3)
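A sketch of how Observations 3.1 and 3.2 might look in code, restricted to one dimension. It assumes each face of a CFB is a linear function of p, which is consistent with the 4 values per dimension mentioned earlier; the parametrization `(a_lo, b_lo, a_hi, b_hi)` and all function names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]           # (lo, hi) along one dimension
Cfb = Tuple[float, float, float, float]  # (a_lo, b_lo, a_hi, b_hi)


def cfb_at(cfb: Cfb, p: float) -> Interval:
    """Faces of the box at probability p; both faces are linear in p."""
    a_lo, b_lo, a_hi, b_hi = cfb
    return (a_lo + b_lo * p, a_hi - b_hi * p)


def encloses(outer: Interval, inner: Interval, eps: float = 1e-9) -> bool:
    return outer[0] <= inner[0] + eps and outer[1] >= inner[1] - eps


def is_conservative(cfb_out: Cfb, cfb_in: Cfb, pcr_of, catalog) -> bool:
    """cfbout(p) must enclose pcr(p), and pcr(p) must enclose cfbin(p)."""
    return all(encloses(cfb_at(cfb_out, p), pcr_of(p))
               and encloses(pcr_of(p), cfb_at(cfb_in, p))
               for p in catalog)


def pruned_1d(rq: Interval, cfb_out: Cfb, cfb_in: Cfb, test: str, p: float) -> bool:
    """Observation 3.1 (test='intersect') or 3.2 (test='contain') in 1D."""
    if test == "intersect":
        lo, hi = cfb_at(cfb_out, p)
        return rq[1] < lo or rq[0] > hi          # rq misses cfbout(p)
    lo, hi = cfb_at(cfb_in, p)
    return not (rq[0] <= lo and rq[1] >= hi)     # rq does not contain cfbin(p)
```

Because cfbout(p) encloses o.pcr(p) and cfbin(p) lies inside it, any object pruned this way would also be pruned by the corresponding PCR test, but not the other way round.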
23
Comparing CFBs with PCRs
  • CFBs have weaker pruning/validating power than
    PCRs
  • But CFBs require less space than PCRs

24
Finding Conservative Functional Boxes
  • Goal: minimize
  • For the i-th dimension, minimize
  • subject to the following constraints
  • Linear programming: Simplex method
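The exact objective and constraints are not preserved in this transcript, so the following is only an illustrative stand-in. It assumes the lower face of cfbout along one dimension is a line a + b·p that must stay on or below the lower face of every catalog PCR while being as tight as possible, here measured by the sum of the line's values at the catalog points. That is a two-variable linear program whose optimum lies at a vertex, so this sketch enumerates lines through pairs of points instead of running Simplex. The function name and the objective are hypothetical.

```python
from itertools import combinations


def fit_lower_face(ps, ys, eps=1e-9):
    """Tightest line a + b*p lying on or below every point (p_j, y_j).

    Maximizes sum_j (a + b*p_j) subject to a + b*p_j <= y_j for all j.
    The optimum of this two-variable LP is a line through two of the points.
    """
    best = None
    for (p1, y1), (p2, y2) in combinations(zip(ps, ys), 2):
        if p1 == p2:
            continue
        b = (y2 - y1) / (p2 - p1)
        a = y1 - b * p1
        # Keep the candidate only if it violates no constraint.
        if all(a + b * p <= y + eps for p, y in zip(ps, ys)):
            score = sum(a + b * p for p in ps)
            if best is None or score > best[0]:
                best = (score, a, b)
    if best is None:
        raise ValueError("need at least two distinct probability values")
    return best[1], best[2]
```

For example, with catalog values 0, 0.1, 0.2, 0.3 and lower faces 0, 1, 4, 9, the selected line passes through the second and third points. A general LP solver would be the natural choice once the upper faces and cfbin are fitted as well.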

25
More in Our Paper
  • The U-tree
  • a dynamic index designed to accelerate
    prob-range queries.

26
Experimental Results
  • Data space: [0, 10000]^d
  • Uncertainty region shape: circle (sphere)
  • Uncertainty region radius: 250
  • Data sets:
      • Long Beach County (LB): 53k 2D objects,
        uniform pdf
      • California (CA): 62k 2D objects, Gaussian pdf
      • Aircraft: 100k 3D objects, uniform pdf
  • Query sets: 100 queries for each data set, with
    various sizes of rq and different pq

27
Experimental Results
28
Experimental Results
Query performance vs. search region size (LB,
pq = 0.6)
29
Experimental Results
Query performance vs. search region size (CA,
pq = 0.6)
30
Experimental Results
Query performance vs. search region size
(Aircraft, pq = 0.6)
31
Experimental Results
Query performance vs. probability threshold
(LB, qs = 1500)
32
Experimental Results
Query performance vs. probability threshold
(CA, qs = 1500)
33
Experimental Results
Query performance vs. probability threshold
(Aircraft, qs = 1500)
34
Summary
  • A fast method for answering probabilistic range
    search queries.