Managing Uncertainty in a Database System - PowerPoint PPT Presentation

1
Managing Uncertainty in a Database System
The University of Hong Kong Seminar
  • Dr. Reynold Cheng
  • Department of Computing
  • Hong Kong Polytechnic University
  • Email: csckcheng_at_comp.polyu.edu.hk
  • URL: http://www.comp.polyu.edu.hk/csckcheng/
  • 31st March, 2008

2
Location and Sensor Applications
What is the region that gives max temperature?
Find a cab within 2 miles of my location.
Service Provider
RF-ID
3
System Goal
  • Provides services with the 3 objectives
  • Correctness
  • Efficiency
  • Scalability

4
Data Uncertainty
  • Due to limited network bandwidth and battery
    power, readings are just sampled
  • The value of the entity being monitored (e.g.,
    temperature, location) is changing
  • The database stores old values only
  • Query results can be incorrect!

5
Answering Minimum Query with Database Readings
  • Database answer: x
  • Correct answer: y

[Figure: recorded and current temperatures of sensors x and y, on a 0 to 15 °C axis]
6
Bounding Uncertainty with Dead-Reckoning
  • Data values cannot change drastically
  • The system negotiates a bound d with the sensor

[Figure: the sensor sends its value v; the system stores (v, d) and assumes the current value lies in [v - d, v + d]]
  • Trade-off between data uncertainty and update
    frequency
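
The trade-off above can be illustrated with a minimal sketch. It assumes a simple rule that is not spelled out on the slide: the sensor sends a new value only when its reading leaves [v - d, v + d] around the last value it sent. The function name and the sample readings are hypothetical.

```python
def dead_reckoning_updates(readings, d):
    """Return the (index, value) updates a sensor would send under bound d."""
    updates = []
    last_sent = None
    for i, value in enumerate(readings):
        # Assumed rule: send only when the reading leaves
        # [last_sent - d, last_sent + d].
        if last_sent is None or abs(value - last_sent) > d:
            last_sent = value
            updates.append((i, value))
    return updates


readings = [10.0, 10.4, 11.2, 12.1, 11.9, 9.0]   # hypothetical samples
tight = dead_reckoning_updates(readings, d=0.5)  # small bound, more updates
loose = dead_reckoning_updates(readings, d=2.0)  # large bound, fewer updates
```

With these readings the tighter bound triggers more updates than the looser one, which is the uncertainty versus update-frequency trade-off in miniature.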

7
Answering MIN Query with Error-Bounded Readings
  • Answer: x

[Figure: recorded temperatures and bounds for the current temperatures of sensors x and y, on a 0 to 15 °C axis]
8
Answering MIN Query with Error-Bounded Readings
  • Answer: (x, 0.7), (y, 0.3)
  • Answers are augmented with probabilistic guarantees
  • Measurement error is another source of uncertainty

[Figure: recorded temperatures with a probability distribution inside each bound for the current temperature of sensors x and y, on a 0 to 15 °C axis]
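
As a rough illustration of how such probabilities could be obtained, the sketch below assumes each reading is independent and uniformly distributed inside its bound, and integrates numerically with the midpoint rule. The function name and the intervals are hypothetical; the slide's values 0.7 and 0.3 are not reproduced because the underlying bounds are not given.

```python
def min_probabilities(intervals, steps=20000):
    """P(object i holds the minimum) for independent uniform intervals."""
    lo = min(l for l, _ in intervals)
    hi = max(u for _, u in intervals)
    dv = (hi - lo) / steps
    probs = [0.0] * len(intervals)
    for k in range(steps):
        v = lo + (k + 0.5) * dv              # midpoint of the k-th slice
        for i, (l, u) in enumerate(intervals):
            if not (l <= v <= u):
                continue                      # density of object i is 0 here
            density = 1.0 / (u - l)
            others = 1.0
            for j, (lj, uj) in enumerate(intervals):
                if j != i:
                    # P(X_j > v) for a uniform X_j, clipped to [0, 1]
                    others *= min(1.0, max(0.0, (uj - v) / (uj - lj)))
            probs[i] += density * others * dv
    return probs


same = min_probabilities([(0.0, 10.0), (0.0, 10.0)])   # identical bounds
apart = min_probabilities([(0.0, 1.0), (2.0, 3.0)])    # disjoint bounds
```

Identical bounds give each object about one half; disjoint bounds give the lower object essentially all of the probability.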
9
Uncertain databases
  • Treat data uncertainty as a first-class citizen
  • Model uncertainty of data attributes, e.g.,
  • closed region + probability distribution function
    (pdf)
  • Probabilistic query
  • answers with probabilities
  • imprecise but correct

10
Related Work
  • Barbara, Garcia-Molina and Porter proposed a
    relational data model that incorporates discrete
    pdfs in attribute values (attribute uncertainty)
    [TKDE92].
  • Wolfson [DPD99] and Pfoser [ISSD99] studied
    range queries for imprecise locations of moving
    objects.
  • Probabilistic queries: [ICDE03, SIGMOD03, TKDE04,
    VLDB04a]
  • Deshpande [VLDB04c] presented probabilistic
    prediction for sensor values.
  • Uncertainty in biometric databases was studied in
    [ICDE06, ICDE07b].
  • In [VLDB07], evaluation algorithms for skyline
    queries were presented.

11
Our Contributions
  • Database and Query Semantics
  • Query classification [SIGMOD03]
  • Quality metrics [IS07a, SSDBM08]
  • Formal query semantics [ICDE08b]
  • Query Evaluation and Indexing
  • Range query [VLDB04a, VLDB05a, TODS07]
  • Nearest-neighbor query [ICDE03, TKDE04, ICDE08a]
  • Join [CIKM06]
  • System Implementation
  • The ORION database [VLDB05c]
  • Uncertainty and location privacy
  • Privacy-aware location services [PET06]
  • Location-dependent query [ICDE07a]
  • Uncertain data mining
  • Clustering uncertain data [PAKDD06, ICDM06,
    DUNE07]

12
Other Uncertainty Models
  • Probabilistic database: each tuple is augmented
    with a probability value (tuple uncertainty)
  • The semi-structured model was studied by Hung,
    Getoor and Subrahmanian in [ICDE03].
  • Dalvi and Suciu [VLDB04b] studied efficient query
    operator evaluation with ranked results.
  • Dai and Mamoulis [SSTD05] studied spatial queries
    over data points with existential uncertainty.
  • [VLDB06, ICDE08b] combined the studies of
    attribute and tuple uncertainty.
  • A large branch of work deals with fuzzy modeling
    [IDG06].

13
Outline
  • Query Classification and Quality
  • Probabilistic Range Queries
  • Location-Dependent Queries
  • Ongoing Projects

14
Data Uncertainty
Attribute (temperature, locations) of object
Ti (GPS, sensor)
  • fi(x) can be arbitrary, e.g., continuous,
    uniform, Gaussian, discrete, histogram
  • Used in various domains, e.g.,
  • location uncertainty [DPD99, ISSD99]
  • biometric databases [ICDE06, ICDE07b]

15
Classification of Probabilistic Queries
  • Nature of answer
  • Value-based: returns a single value
  • e.g., Average query: ([l, u], pdf)
  • Entity-based: returns a set of objects
  • e.g., Range query: {(Ti, pi) : pi > 0}
  • Dependence
  • Dependent: interplay between objects decides the
    result, e.g., Nearest-Neighbor query
  • Independent: whether an object satisfies a query
    is independent of others
  • e.g., Range query

16
Classification of Probabilistic Queries
  • Only the probabilistic range query (entity-based,
    independent class) was briefly studied
    [WS99, ISSD99] before our work.

17
Equality Join
  • In a continuous domain, two real values are equal
    at a point with zero probability.
  • Resolution c: a is equal to b if they are within
    c of each other.

[Figure: pdf of a and cdf of b]
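
One way to read the "pdf of a, cdf of b" labels is as the integral P(|a - b| ≤ c) = ∫ f_a(x) [F_b(x + c) - F_b(x - c)] dx. The sketch below evaluates it numerically under the assumption that a and b are independent and uniform; the function names and intervals are hypothetical, not taken from the talk.

```python
def uniform_cdf(x, lo, hi):
    """cdf of a uniform distribution on [lo, hi]."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def prob_equal(a, b, c, steps=10000):
    """P(|A - B| <= c) for independent uniform A on a=(lo, hi), B on b."""
    a_lo, a_hi = a
    dx = (a_hi - a_lo) / steps
    density = 1.0 / (a_hi - a_lo)            # pdf of a inside its interval
    total = 0.0
    for k in range(steps):
        x = a_lo + (k + 0.5) * dx            # midpoint rule
        # probability mass of b within c of x, using the cdf of b
        window = uniform_cdf(x + c, *b) - uniform_cdf(x - c, *b)
        total += density * window * dx
    return total


p_half = prob_equal((0.0, 1.0), (0.0, 1.0), c=0.5)
p_zero = prob_equal((0.0, 1.0), (0.0, 1.0), c=0.0)
```

With resolution c = 0 the probability collapses to zero, which is the problem the resolution parameter is meant to avoid.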
18
Quality of Probabilistic Result
  • Probabilistic queries: notion of result "quality"
  • Consider a range query (Is vi in [l, u]?)
  • regular range query
  • "yes" or "no"
  • probabilistic range query
  • Recently used in [SSDBM08] as a data cleaning
    metric.

19
Quality for Value- Dependent Queries
  • Query result: ([l, u], {p(x) | x ∈ [l, u]})
  • U[3, 4] is less ambiguous than U[1, 100]
  • Differential entropy
  • Measures uncertainty associated with a random
    variable X with pdf p
  • H(X) attains a maximum value of log2(u - l)
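
A small sketch of the metric, assuming the standard definition H(X) = -∫ p(x) log2 p(x) dx and a pdf given as an equal-width histogram on [l, u]. The histogram representation and the function name are illustrative assumptions.

```python
import math


def differential_entropy(bin_probs, l, u):
    """Entropy in bits of a histogram pdf with equal-width bins on [l, u]."""
    width = (u - l) / len(bin_probs)
    h = 0.0
    for mass in bin_probs:
        if mass > 0:
            density = mass / width           # pdf value inside this bin
            h -= mass * math.log2(density)   # -(p * width) * log2(p)
    return h


narrow = differential_entropy([0.25] * 4, 3, 4)      # U[3, 4]
wide = differential_entropy([0.25] * 4, 1, 100)      # U[1, 100]
skewed = differential_entropy([0.7, 0.1, 0.1, 0.1], 1, 100)
```

The uniform pdf on [1, 100] reaches the maximum log2(99), the uniform pdf on [3, 4] has entropy 0, and a skewed pdf on [1, 100] falls below the maximum.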

20
The ORION Database
  • Based on an open-source database (PostgreSQL 8.0)
  • Enhances SQL by providing uncertainty management
    functionalities
  • Recently extended to support 2D data and tuple
    uncertainty
  • The ORION project won the Pan-Pearl IT Project
    Competition, China in 2007

21
(No Transcript)
22
Queries in ORION
23
Outline
  • Query Classification and Quality
  • Probabilistic Range Queries
  • Location-Dependent Queries
  • Ongoing Projects

24
ORION Query Evaluation
  • (T1, 0.2), (T2, 0.8)

[Figure: recorded temperatures and uncertainty for the current temperatures of T1 and T2, on a 0 to 30 °F axis]
25
Probabilistic Threshold Range Query (PTRQ)
  • Users are likely to be concerned with results
    with a high probability
  • Retrieve sensor ids with readings between 10°F and
    25°F with probability ≥ 0.7
  • PTRQ: Given an interval [a, b] and a threshold T,
    return every Ti where Prob(vi ∈ [a, b]) ≥ T
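
A minimal, index-free sketch of a PTRQ, assuming uniform pdfs over closed intervals and a "probability at least T" comparison. The sensor ids and bounds are hypothetical.

```python
def prob_in_range(lo, hi, a, b):
    """P(v in [a, b]) for v uniform on [lo, hi]."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)


def ptrq(objects, a, b, T):
    """Return ids whose probability of lying in [a, b] is at least T."""
    return [oid for oid, (lo, hi) in objects.items()
            if prob_in_range(lo, hi, a, b) >= T]


sensors = {"S1": (0.0, 20.0), "S2": (15.0, 30.0), "S3": (12.0, 24.0)}
result = ptrq(sensors, a=10.0, b=25.0, T=0.7)
```

Here S1 has probability 0.5 and S2 about 0.67, so only S3 (probability 1.0) passes the 0.7 threshold. Every object is examined, which is the cost that the indexes on the following slides aim to reduce.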

26
Pruning in a 1D R-Tree [SIGMOD84]
Minimum Bounding Rectangle (MBR)
  • Many irrelevant objects in the MBR (probability <
    T) may be processed.
  • Similar problems occur with interval indexes
    (e.g., [FOCS96, ADI00]).

27
Indexing Uncertain Data
  • Probability Threshold Indexing (PTI)
  • 1D R-tree with uncertainty rectangles
  • Variance-based Clustering (VBC)
  • Cluster uncertain data based on their means and
    variances

28
p-bounds in a PTI Node
[Figure: a PTI node with its left-0.2-bound and right-0.2-bound; labels 0.2 and 0.8]
29
p-bounds in a PTI Node
[Figure: the left-0-bound and right-0-bound of a PTI node coincide with the MBR]
30
Pruning with p-bounds
[Figure: a PTI node with its left-0.2-bound and right-0.2-bound]
  • An MBR is not retrieved if there exists a value p
    such that T > p and a is on the right of the
    right-p-bound
  • An MBR is not retrieved if there exists a value p
    such that T > p and b is on the left of the
    left-p-bound
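
The pruning rule can be sketched as follows. The sketch assumes uniform pdfs and defines the left-p-bound of a node as the leftmost position such that every object has at most probability p to its left (and symmetrically for the right-p-bound). That definition and all names are assumptions made for illustration.

```python
def p_bounds(intervals, p):
    """(left-p-bound, right-p-bound) of a node holding uniform intervals."""
    left = min(lo + p * (hi - lo) for lo, hi in intervals)
    right = max(hi - p * (hi - lo) for lo, hi in intervals)
    return left, right


def can_prune(node_bounds, a, b, T):
    """node_bounds maps p -> (left_p_bound, right_p_bound) for one node."""
    for p, (left, right) in node_bounds.items():
        # With p < T, a query lying entirely beyond a p-bound cannot
        # give any object in the node a probability of T or more.
        if p < T and (a > right or b < left):
            return True
    return False


node = [(0.0, 10.0), (2.0, 12.0)]
bounds = {p: p_bounds(node, p) for p in (0.0, 0.2)}
pruned = can_prune(bounds, a=10.5, b=20.0, T=0.3)
kept = can_prune(bounds, a=10.5, b=20.0, T=0.1)
```

For the query [10.5, 20] the node is skipped when T = 0.3 (the best object only reaches probability 0.15), but it must be visited when T = 0.1. The 0-bounds alone, which equal the MBR, would not prune it in either case.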

31
Implementation of PTI
32
Advantages of PTI
  • Ability to index any form of uncertainty pdf
  • Simple implementation
  • Support different queries, e.g.,
  • Joins [CIKM06]
  • Location-dependent range queries [ICDE07a]
  • Facilitate query evaluation over
    multi-dimensional data [VLDB05a, TODS07]

33
Drawback of PTI
  • Extra overhead in storing p-bounds
  • Small intervals near edges limit gains

right-0.2-bound
left-0.2-bound
34
Variance-based Clustering (VBC)
  • Obtain the mean and variance of each object
  • Construct the PTI by clustering the (mean,
    variance) pairs of uncertain objects
  • For uniform pdf
  • Index (Li, Ri) with a 2D R-tree
  • Convert [a, b] to a trapezoidal query

35
VBC for uniform pdf
  • When 2D points are indexed (e.g., by an R-tree),
    intervals of different variances are separated

[Figure: 2D view with axes x = Li and y = Ri and the line x = y; the points (Li, Ri) form a cluster of large intervals and a cluster of smaller intervals]
36
VBC for Uniform pdf
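  • y(1 - T) + xT ≥ a: intervals containing a
  • x(1 - T) + yT ≤ b: intervals containing b
  • b - a ≥ T(y - x): intervals containing [a, b]
  • a < x < y < b: intervals in [a, b]

[Figure: 1D view (uniform pdf) of the query [a, b], and 2D view with axes x = Li and y = Ri, the line x = y, and the query region Q(T)]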
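
The four cases can be checked with a short sketch, assuming uniform pdfs, x < y, and T > 0. Each condition is the threshold test Prob(v ∈ [a, b]) ≥ T rewritten as a linear constraint on the point (x, y) = (Li, Ri). The function names are hypothetical.

```python
def qualifies_2d(x, y, a, b, T):
    """Threshold test on the point (x, y) = (Li, Ri), case by case."""
    if y < a or x > b:
        return False                          # no overlap with [a, b]
    if a <= x and y <= b:
        return True                           # interval lies in [a, b]
    if x <= a and b <= y:
        return b - a >= T * (y - x)           # interval contains [a, b]
    if x <= a:
        return y * (1 - T) + x * T >= a       # interval contains a only
    return x * (1 - T) + y * T <= b           # interval contains b only


def prob_in_range(x, y, a, b):
    """Direct probability for a uniform pdf on [x, y], used as a check."""
    return max(0.0, min(y, b) - max(x, a)) / (y - x)


cases = [
    (0.0, 10.0, 4.0, 20.0, 0.5),    # contains a, probability 0.6
    (0.0, 10.0, 4.0, 20.0, 0.7),    # contains a, probability 0.6
    (5.0, 15.0, 0.0, 8.0, 0.25),    # contains b, probability 0.3
    (0.0, 10.0, 2.0, 5.0, 0.5),     # contains [a, b], probability 0.3
    (3.0, 4.0, 0.0, 10.0, 0.9),     # inside [a, b], probability 1.0
]
agree = all(qualifies_2d(*c) == (prob_in_range(*c[:4]) >= c[4]) for c in cases)
```

Because every case is linear in (x, y), the qualifying points form a polygonal region in the 2D view, which is why the slide speaks of a trapezoidal query.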
37
Experimental Setup
38
Scalability of Indexes
  • Both PTI and VBC outperform R-tree
  • Answering PTRQ with R-tree requires more
    computation
  • VBC needs about 50 less I/Os than PTI

39
Query Probability Threshold
  • R-tree does not benefit from the increasing value
    of T
  • When T is 0.5, VBC is 4 times better than PTI

40
Outline
  • Query Classification and Quality
  • Probabilistic Range Queries
  • Location-Dependent Queries
  • Ongoing Projects

41
Location-Dependent Queries
  • Find all vehicles within 2 miles of my current
    location
  • We consider location uncertainty of a user who
    issues the query (called query issuer)
  • measurement error of a GPS device
  • privacy concern [PET06]
  • Imprecise Location-Dependent Query (ILDQ)

42
What is an ILDQ?
[Figure: possible actual positions of the query issuer around object A, compared with a traditional location-dependent query]
Evaluate the probability that A satisfies the
query.
43
Basic Evaluation of ILDQ
A
Uncertainty of Query Issuer U with pdf fU(x,y)
R
44
Pruning by the Minkowski Sum
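[Figure: objects A, B and C around the query range R, the issuer's uncertainty region U, and R ⊕ U; some objects may be pruned by exploiting the probability threshold]
  • The Minkowski sum (R ⊕ U) is evaluated by
    computational geometry techniques [BK00]
  • Prune objects with spatial structures (e.g.,
    R-tree)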
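
A simplified sketch of this pruning step. It assumes that U is an axis-aligned rectangle with a uniform pdf and that the range R is a rectangle of half-widths (w, h) centered on the issuer, so that R ⊕ U is simply U grown by (w, h). General shapes would need the computational geometry techniques cited above; all names here are hypothetical.

```python
def minkowski_expand(U, w, h):
    """R ⊕ U for an axis-aligned U = (x1, y1, x2, y2) and half-widths (w, h)."""
    x1, y1, x2, y2 = U
    return (x1 - w, y1 - h, x2 + w, y2 + h)


def contains(rect, point):
    x1, y1, x2, y2 = rect
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def qualification_prob(A, U, w, h):
    """P(point A is in range) when the issuer is uniform in U.

    A is in range of an issuer at q exactly when q falls in the range
    rectangle centered at A, so the answer is an area ratio.
    """
    ax, ay = A
    x1, y1, x2, y2 = U
    ox = max(0.0, min(x2, ax + w) - max(x1, ax - w))
    oy = max(0.0, min(y2, ay + h) - max(y1, ay - h))
    return (ox * oy) / ((x2 - x1) * (y2 - y1))


U = (0.0, 0.0, 4.0, 4.0)
expanded = minkowski_expand(U, w=1.0, h=1.0)
far = (6.0, 2.0)      # outside R ⊕ U
near = (4.5, 2.0)     # inside R ⊕ U, outside U
```

A point outside the expanded rectangle, such as `far`, has probability zero and can be discarded without any integration; `near` lies inside it but still has a small probability, which is where a probability threshold offers further pruning.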

45
The p-expanded-query
A
p-expanded-query
U
Includes only point objects with probability ? p
R
R ⊕ U
0-expanded-query
46
Pruning uncertain objects
T-expanded-query
U
pA < T ⇒ A is pruned
R ⊕ U
can be found by p-bounds!
47
A 2D p-bound
p
Uncertainty region
0 ≤ p ≤ 0.5
48
Deriving p-expanded-query with p-bounds
top-p-bound
p-expanded-query
U
R ⊕ U
left-p-bound
49
Pruning Strategy 1: Use p-expanded query
T-expanded-query
U
R ⊕ U
50
Pruning Strategy 2: Use an object's p-bound
Uncertain object
right-T-bound
  • The object's p-bounds can be indexed by PTI

51
Pruning Strategy 3: Use both p-bound and
p-expanded query
If x · y < T, then A can be pruned.
A
U
R ⊕ U
right-x-bound (x > T)
52
Experimental Setup
53
1. Effect of Probability Threshold
60
54
2. Effect of Gaussian pdf
55
Outline
  • Query Classification and Quality
  • Probabilistic Range Queries
  • Location-Dependent Queries
  • Ongoing Projects

56
Project 1 Location Cloaking
  • Purpose
  • Study the use of location cloaking for privacy
    protection
  • Investigate the trade-off between location
    cloaking and service quality
  • Grants
  • Privacy Protection in Location-based Services
    with Location Cloaking (RGC CERG. Ref PolyU
    5138/06E). Co-I E. Bertino and S. Prabhakar
    (Purdue), HKD 386,000.
  • Query Processing on Historical Uncertain
    Spatiotemporal Data (Co-I, RGC CERG. Ref
    120206). PI Y. Tao (CUHK), HKD 961,920.
  • Efficient Evaluation of Probabilistic
    Nearest-Neighbor Queries over Uncertain Data.
    Internal Research Grant (ICRG), 2008-09, PolyU.
    Ref G-YG27. HKD 120,000.
  • Efficiency of Privacy Preservation Mechanisms in
    Routing over the Internet. Internal Research
    Grant (ICRG), 2006-07, PolyU. Ref A-PH09. Co-I
    D. Yau (Purdue), HKD 120,000.
  • Protecting Network Privacy with Spatial and
    Temporal Cloaking. Internal Research Grant
    (ICRG), 2007-08, PolyU. Ref A-PH39. Co-I D. Yau
    (Purdue), HKD 120,000.

57
Project 2 Data Stream Management
  • Purpose
  • Study continuous queries, data uncertainty, and
    resource consumption issues in data stream
    systems.
  • Grants
  • Adaptive Filters for Continuous Queries over
    Constantly-Evolving Data Streams (RGC CERG, Ref
    513307, 2008-09). Co-I K. Rothermel (Stuttgart),
    HKD 421,512.
  • Efficient Protocols for Quality-Aware Querying of
    Sensor Data in Pervasive Environments, RGC
    Germany/Hong Kong Joint Research Scheme
    2006/2007. Ref G_HK013/06, Co-I K. Rothermel
    (Stuttgart), HKD 59,600.
  • Affiliated Member of A Research Center for
    Ubiquitous Computing (Central Allocation Group
    Research Projects, RGC, 2006-09, HKBU 1/05C). PI
    Prof. J. Ng (HKBU).
  • Member of the Infrastructure for Information
    Fusion Project (StrucFus), with University of
    Skovde, IIT Bombay, HKBU, University of Wuhan,
    2008-10. PI J. Mellin (U. Skovde).

58
Our Publications (1)
  • Uncertain Database and Query Models
  • ICDE08b S. Singh, C. Mayfield, R. Shah, S.
    Prabhakar, S. Hambrusch, J. Neville and R. Cheng.
    Database Support for pdf Attributes.
  • IS07a R. Cheng, D. Kalashnikov and S.
    Prabhakar. Evaluation of Probabilistic Queries
    over Imprecise Data in Constantly-Evolving
    Environments. In Information Systems (IS), Vol.
    32, No. 1, pp. 104-130, Mar 2007.
  • SIGMOD03 R. Cheng, D. Kalashnikov and S.
    Prabhakar. Evaluating Probabilistic Queries over
    Uncertain Data. (Cited 149 times)
  • Probabilistic Range Queries
  • ICDE07a J. Chen and R. Cheng. Efficient
    Evaluation of Imprecise Location-Dependent
    Queries. In Proc. ICDE 2007.
  • TODS07 Y. Tao, X. Xiao and R. Cheng. Range
    Queries for Multidimensional Data. In ACM TODS,
    2007, 32(3):15.
  • VLDB05a Y. Tao, R. Cheng, X. Xiao, W. K. Ngai,
    B. Kao, and S. Prabhakar. Indexing
    multi-dimensional uncertain data with arbitrary
    probability density functions. In VLDB 2005.
  • VLDB04a R. Cheng, Y. Xia, S. Prabhakar, R.
    Shah, and J. S. Vitter. Efficient indexing
    methods for probabilistic threshold queries over
    uncertain data. In VLDB 2004. (Cited 46 times)

59
Our Publications (2)
  • Probabilistic Nearest-Neighbor Queries
  • ICDE08a R. Cheng, J. Chen, M. Mokbel and C.
    Chow. Probabilistic Verifiers Evaluating
    Constrained Nearest-Neighbor Queries over
    Uncertain Data.
  • TKDE04 R. Cheng, D. V. Kalashnikov, and S.
    Prabhakar. Querying imprecise data in moving
    object environments. IEEE TKDE, 16(9),2004.
    (Cited 69 times)
  • ICDE03 R. Cheng, D. Kalashnikov and S.
    Prabhakar. Querying imprecise data in moving
    object environments.
  • Probabilistic Joins
  • CIKM06 R. Cheng, S. Singh, S. Prabhakar, R.
    Shah, J. Vitter and Y. Xia. Efficient Join
    Processing over Uncertain Data. In ACM 15th Conf.
    on Information and Knowledge Management (CIKM
    2006), Arlington, USA 2006.
  • Uncertain Data Mining
  • DUNE07 S. Lee, B. Kao and R. Cheng. Reducing
    UK-means to K-means. In the 1st Workshop on Data
    Mining of Uncertain Data (DUNE), co-located with
    IEEE ICDM, Omaha, US, Oct 2007.
  • ICDM06 J. Ngai, B. Kao, C. Chui, R. Cheng, M.
    Chau and K. Yip. Efficient Clustering of
    Uncertain Data. In IEEE Intl. Conf. on Data
    Mining (IEEE ICDM 2006), Hong Kong, Dec, 2006.
  • PAKDD06 M. Chau, R. Cheng, B. Kao and J. Ng.
    Uncertain Data Mining An Example in Clustering
    Location Data. In the Methodologies for Knowledge
    Discovery and Data Mining, Pacific-Asia
    Conference (PAKDD 2006), Singapore, April 2006.
  • WSA05 M. Chau, R. Cheng and B. Kao. Uncertain
    Data Mining A New Research Direction. Invited
    Paper, in the Workshop on the Sciences of The
    Artificial (WSA) 2005, National Dong Hwa
    University, Taiwan, Dec 2005.

60
Our Publications (3)
  • Data Stream Management
  • SSDBM08 J. Chen and R. Cheng. Quality-Aware
    Probing of Uncertain Data with Resource
    Constraints. Accepted in SSDBM 2008, July, 2008.
  • IS07b R. Cheng, K.Y. Lam, S. Prabhakar and B.
    Liang. An Efficient Location Update Mechanism for
    Continuous Queries over Moving Objects. In
    Information Systems (IS), Vol. 32, No. 4, pp.
    593-620, Jun 2007.
  • IDEAS07 T. Farrell, R. Cheng and K. Rothermel.
    Energy-Efficient Monitoring of Mobile Objects
    with Uncertainty-Aware Tolerances. Accepted in
    Intl. Database Engineering and Applications
    Symposium (IDEAS 2007), Banff, 2007.
  • RTS07 S. Han, E. Chan, R. Cheng and K. Y. Lam.
    A Statistics-Based Sensor Selection Scheme for
    Continuous Probabilistic Queries in Sensor
    Network. In Real Time Systems Journal (RTS), Vol.
    35, No. 1, pp. 33-58, Jan 2007.
  • VLDB05b R. Cheng, B. Kao, S. Prabhakar, A. Kwan
    and Y. Tu. Adaptive Stream Filters for
    Entity-based Queries with Non-Value Tolerance. In
    Very Large Databases Conf. (VLDB 2005),
    Trondheim, Norway, Aug 2005. Acceptance rate
    16.5%, 53/322.
  • ICDE05 R. Cheng, Y. Xia, S. Prabhakar and R.
    Shah.  Change Tolerant Indexing over Constantly
    Evolving Data. In Intl. Conf. on Data Engineering
    (IEEE ICDE 2005), Tokyo, Japan, Apr 2005.
  • Privacy-Aware System Support
  • ICNP07 R. Cheng, D. Yau and J. Fu. Packet
    Cloaking Protecting Receiver Privacy Against
    Traffic Analysis. In the 3rd Workshop on Secure
    Network Protocols (NPSec), co-located with IEEE
    ICNP, Beijing, China, Oct 2007.
  • PET06 R. Cheng, Y. Zhang, E. Bertino, and S.
    Prabhakar. Preserving user location privacy in
    mobile data management infrastructures. In Proc.
    6th Workshop on Privacy Enhancing Technologies,
    2006.

61
References (1)
  • [FOCS96] L. Arge and J. S. Vitter. On dynamic
    interval management in external memory (extended
    abstract). In FOCS, pp. 560-569, 1996.
  • [TKDE92] D. Barbara, H. Garcia-Molina and D.
    Porter. The management of probabilistic data.
    IEEE TKDE, 4(5):487-502, 1992.
  • [BK00] M. de Berg, M. van Kreveld, M. Overmars and
    O. Schwarzkopf. Computational Geometry: Algorithms
    and Applications. 2nd ed., Springer Verlag
    (2000).
  • [ICDE06] C. Böhm, A. Pryakhin, and M. Schubert.
    The Gauss-tree: Efficient object identification
    in databases of probabilistic feature vectors. In
    Proc. ICDE, 2006.
  • [SSTD05] X. Dai, M. L. Yiu, N. Mamoulis, Y. Tao,
    and M. Vaitis. Probabilistic Spatial Queries on
    Existentially Uncertain Data. Proc. SSTD, pp.
    400-417, August 2005.
  • [VLDB04b] N. Dalvi and D. Suciu. Efficient Query
    Evaluation on Probabilistic Databases. VLDB 2004.
  • [VLDB04c] A. Deshpande, C. Guestrin, S. Madden,
    J. Hellerstein and W. Hong. Model-Driven Data
    Acquisition in Sensor Networks. In VLDB, 2004.
  • [IDG06] J. Galindo, A. Urrutia and M. Piattini.
    Fuzzy Databases: Modeling, Design, and
    Implementation. Idea Group Publishing, 2006.
  • [SIGMOD84] A. Guttman. R-trees: A dynamic index
    structure for spatial searching. Proc. of the ACM
    SIGMOD Intl. Conf., 1984.

62
References (2)
  • [ICDE03] E. Hung, L. Getoor and V. S.
    Subrahmanian. PXML: A Probabilistic
    Semistructured Data Model and Algebra. In ICDE
    2003.
  • [VLDB06] O. Benjelloun, A. Das Sarma, A. Halevy,
    and J. Widom. ULDBs: Databases with uncertainty
    and lineage. In VLDB, 2006.
  • [ICDE07b] V. Ljosa and A. K. Singh. APLA:
    Indexing arbitrary probability distributions. In
    Proc. ICDE, 2007.
  • [ADI00] Y. Manolopoulos, Y. Theodoridis, and V.
    J. Tsotras. Chapter 4: Access methods for
    intervals. In Advanced Database Indexing, Kluwer,
    2000.
  • [VLDB07] J. Pei, B. Jiang, X. Lin, and Y. Yuan.
    Probabilistic skylines on uncertain data. In
    Proc. VLDB, 2007.
  • [DPD99] O. Wolfson, P. Sistla, S. Chamberlain,
    and Y. Yesha. Updating and querying databases
    that track mobile units. Distributed and Parallel
    Databases, 7(3), 1999.
  • [ISSD99] D. Pfoser and C. S. Jensen. Capturing
    the Uncertainty of Moving-Object Representations.
    In Proc. of the Sixth International Symposium on
    Spatial Databases, Hong Kong, July 20-23, 1999,
    pp. 111-132.

63
Conclusions
  • We study the provision of correct, efficient, and
    scalable data access
  • We consider how uncertainty can be treated as a
    first-class citizen in DBMS
  • Other challenges include
  • Handling other uncertainty models
  • Probabilistic data streams
  • Location cloaking
  • Mining uncertain data

64
Thank You!
  • Reynold Cheng
  • Email: csckcheng_at_comp.polyu.edu.hk
  • URL: http://www.comp.polyu.edu.hk
  • ORION homepage
  • http://orion.cs.purdue.edu

65
How to define uncertainty pdf?
  • The form of the uncertainty pdf depends on the
    application, e.g., a Gaussian distribution models
    measurement error.
  • If no information about the pdf is known, a simple
    way is to assume a uniform pdf (a pessimistic
    estimation)
  • Can also use more sophisticated techniques, based
    on time-series analysis of past data, for pdf
    derivation [CH89]
  • [CH89] C. Chatfield. The Analysis of Time Series:
    An Introduction. Chapman and Hall, 1989.

66
Classical Decomposition
  • For a discrete time series, let Xt be a random
    variable at time t
  • Xt = mt + st + Yt
  • mt: trend, a slowly-moving function
  • moving-average filter, exponential smoothing,
    curve fitting/regression
  • st: seasonal component, a periodic function
  • Yt: noise component
  • Example: mt = 2t + 1, st = sin(t), Yt ~ N(0, 1)
  • pdf(100) = N(201 + sin(100), 1)
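
The example can be reproduced with a short sketch that assumes the additive form Xt = mt + st + Yt with Gaussian noise. The function name and its default arguments are illustrative.

```python
import math


def decomposed_pdf(t, trend=lambda t: 2 * t + 1, seasonal=math.sin,
                   noise_var=1.0):
    """Return (mean, variance) of the Gaussian pdf of Xt = mt + st + Yt."""
    # Trend and seasonal parts are deterministic, so they only shift the
    # mean; the variance comes entirely from the noise component Yt.
    return trend(t) + seasonal(t), noise_var


mean_100, var_100 = decomposed_pdf(100)
```

At t = 100 the mean is 201 + sin(100) and the variance is 1, matching the pdf given in the example.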

67
Sensor Databases
Goal data retrieval in a correct, efficient and
scalable manner
68
Other Works on Attribute Uncertainty
  • Deshpande [VLDB04c] presented probabilistic
    prediction for sensor values.
  • Uncertainty in biometric databases was studied in
    [ICDE06, ICDE07b].
  • In [VLDB07], evaluation algorithms for skyline
    queries are presented.

69
Join over Uncertainty
  • How do we define comparison operators for
    uncertain data?

70
Join Operators
  • Comparison (=, ≠, >, <) between two uncertain items
    is probabilistic.

Equality (=)
Table 1
Table 2
71
PTI Pruning for Joins [CIKM06]
  • Goal: Prune pages R and S without examining
    individual items
  • Solution: Place p-bounds on R and S, and perform
    4 tests with p-bounds

72
Solving PTRQ with Interval Indexes
  • Use an R-tree or interval index [FOCS96, JCSS96,
    ADI00] to find intervals intersecting [a, b]
  • For each object retrieved, evaluate its
    probability of being within [a, b]
  • Return objects with probability ≥ T

73
Drawback of PTI
  • Extra overhead in storing x-bounds
  • Small intervals near edges limit gains

right-0.2-bound
left-0.2-bound
74
Clustering 2D points
  • When 2D points are clustered, intervals of
    different variances are separated
  • Points clustered based on means and variances
    (variance-based clustering)

[Figure: 2D view with axes x = Li and y = Ri and the line x = y; the points (Li, Ri) form a cluster of large intervals and a cluster of smaller intervals]
75
Answering PTRQ with 2D R-Tree
  • Construct a 2D R-tree over uncertain data by
    indexing (meani,variancei)
  • Query the 2D R-Tree
  • For uniform pdf, a PTRQ can be converted to a
    2D-range query

76
Querying Uniform pdf
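  • y(1 - p) + xp ≥ a: intervals containing a
  • a < x < y < b: intervals in [a, b]
  • x(1 - p) + yp ≤ b: intervals containing b
  • b - a ≥ p(y - x): intervals containing [a, b]

[Figure: 1D view (uniform pdf) of the query [a, b] over intervals [Li, Ri], and 2D view with axes x = Li and y = Ri, the line x = y, and the query region Q (p = 0.75)]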
77
Experimental Setup
  • 100K uncertain items, with interval size
    uniformly distributed in [0, 10000]
  • Assume uniform uncertainty pdf
  • 10K PTRQs with query lengths normally distributed
    and T ∈ [0.1, 1]
  • Each PTI node contains 5 p-bounds, where p ∈
    {0.1, 0.3, 0.5, 0.7, 0.9}
  • No. of entries per disk page is 20

78
Indexing p-bounds with Probability Threshold
Index [VLDB04a, VLDB05a]
79
The p-bound [VLDB04a, VLDB05a]
[Figure: an uncertainty region with its p-bounds, where 0 ≤ p ≤ 0.5]
80
Query-Data Duality and IUQ
Given a query issuer's uncertainty U and
uncertain object A
(Equations not in transcript)
81
Exploiting the Probability Threshold
  • T ∈ (0, 1]: probability threshold
  • Returns objects whose probabilities for
    satisfying an ILDQ are ≥ T

82
Future Work
  • Uncertainty Management
  • Efficient Evaluation of probabilistic
    nearest-neighbor queries
  • Uncertainty management in sensor networks
  • Multi-dimensional extension of the ORION database
  • Data Stream Management
  • Energy-efficient tolerant queries
  • Uncertain data cleaning
  • Problems related to cleaning of uncertain
    databases to achieve better quality under limited
    budgets

83
Project 1 Privacy-Aware Location-based Services
  • Purpose
  • Study the trade-off between uncertainty and
    privacy, and design services with large-scale
    database indexing support
  • Grant
  • Privacy Protection in Location-based Services
    with Location Cloaking (RGC CERG. Ref PolyU
    5138/06E). Co-I E. Bertino and S. Prabhakar, HKD
    386,000.
  • Efficient Evaluation of Probabilistic
    Nearest-Neighbor Queries over Uncertain Data.
    Internal Research Grant (ICRG), 2008-09, PolyU.
    Ref G-YG27. HKD 120,000.
  • Query Processing on Historical Uncertain
    Spatiotemporal Data (Co-I, RGC CERG. Ref
    120206). PI Y. Tao
  • Affiliated Member of A Research Center for
    Ubiquitous Computing (Central Allocation Group
    Research Projects, RGC, 2006-09, HKBU 1/05C). PI
    Prof. J. Ng.
  • Publications
  • ICDE08a R. Cheng, J. Chen, M. Mokbel and C.
    Chow. Probabilistic Verifiers Evaluating
    Constrained Nearest-Neighbor Queries over
    Uncertain Data.
  • ICDE07a J. Chen and R. Cheng. Efficient
    Evaluation of Imprecise Location-Dependent
    Queries. In Proc. ICDE 2007.
  • TODS07 Y. Tao, X. Xiao and R. Cheng. Range
    Queries for Multidimensional Data. In ACM TODS,
    2007, 32(3):15.
  • PET06 R. Cheng, Y. Zhang, E. Bertino, and S.
    Prabhakar. Preserving user location privacy in
    mobile data management infrastructures. In Proc.
    6th Workshop on Privacy Enhancing Technologies,
    2006.

84
Project 2 Quality and Resource Consumption of
Data Streams
  • Purpose
  • Study the trade-off between query result quality
    and resource consumption (e.g., battery power and
    network bandwidth) in sensor environments.
  • Grant
  • Adaptive Filters for Continuous Queries over
    Constantly-Evolving Data Streams (RGC CERG, Ref
    513307, 2008-09). Co-I K. Rothermel, HKD
    421,512.
  • Efficient Protocols for Quality-Aware Querying of
    Sensor Data in Pervasive Environments, RGC
    Germany/Hong Kong Joint Research Scheme
    2006/2007. Ref G_HK013/06, Co-I K. Rothermel,
    HKD 59,600.
  • Publications
  • SSDBM08 J. Chen and R. Cheng. Quality-Aware
    Probing of Uncertain Data with Resource
    Constraints. Accepted in SSDBM 2008, July, 2008.
  • IS07b R. Cheng, K.Y. Lam, S. Prabhakar and B.
    Liang. An Efficient Location Update Mechanism for
    Continuous Queries over Moving Objects. In
    Information Systems (IS), Vol. 32, No. 4, pp.
    593-620, Jun 2007.
  • IDEAS07 T. Farrell, R. Cheng and K. Rothermel.
    Energy-Efficient Monitoring of Mobile Objects
    with Uncertainty-Aware Tolerances. Accepted in
    Intl. Database Engineering and Applications
    Symposium (IDEAS 2007), Banff, 2007.
  • RTS07 S. Han, E. Chan, R. Cheng and K. Y. Lam.
    A Statistics-Based Sensor Selection Scheme for
    Continuous Probabilistic Queries in Sensor
    Network. In Real Time Systems Journal (RTS), Vol.
    35, No. 1, pp. 33-58, Jan 2007.
  • VLDB05b R. Cheng, B. Kao, S. Prabhakar, A. Kwan
    and Y. Tu. Adaptive Stream Filters for
    Entity-based Queries with Non-Value Tolerance. In
    Very Large Databases Conf. (VLDB 2005),
    Trondheim, Norway, Aug 2005. Acceptance rate
    16.5%, 53/322.
  • ICDE05 R. Cheng, Y. Xia, S. Prabhakar and R.
    Shah. Change Tolerant Indexing over Constantly
    Evolving Data. In Intl. Conf. on Data Engineering
    (IEEE ICDE 2005), Tokyo, Japan, Apr 2005.
    Acceptance rate 12.9%, 67/521.

85
Project 3 Routing Privacy
  • Purpose
  • Investigate and develop privacy-aware algorithms
    that optimize routing of packets across the
    Internet.
  • Grant
  • Efficiency of Privacy Preservation Mechanisms in
    Routing over the Internet. Internal Research
    Grant (ICRG), 2006-07, PolyU. Ref A-PH09. Co-I
    D. Yau, HKD 120,000.
  • Protecting Network Privacy with Spatial and
    Temporal Cloaking. Internal Research Grant
    (ICRG), 2007-08, PolyU. Ref A-PH39. Co-I D.
    Yau, HKD 120,000.
  • Publication
  • R. Cheng, D. Yau and J. Fu. Packet Cloaking
    Protecting Receiver Privacy Against Traffic
    Analysis. In the 3rd Workshop on Secure Network
    Protocols (NPSec), co-located with IEEE ICNP,
    Beijing, China, Oct 2007.

86
Architecture of ORION
87
Effect of Query Users Uncertainty Region Size
95
88
Research Outcome Impact (Google Scholar)
  • 1. Evaluating probabilistic queries over
    imprecise data. R Cheng, DV Kalashnikov, S
    Prabhakar - Proceedings of the 2003 ACM SIGMOD
    international conference, 2003. Cited by 149.
  • 2. Querying imprecise data in moving object
    environments. R Cheng, DV Kalashnikov, S
    Prabhakar - Knowledge and Data Engineering, IEEE
    Transactions on, 2004. Cited by 69.
  • 3. Efficient indexing methods for probabilistic
    threshold queries over uncertain data. R Cheng, Y
    Xia, S Prabhakar, R Shah, JS Vitter - Proc. VLDB,
    2004. Cited by 46.
  • 4. Indexing multi-dimensional uncertain data with
    arbitrary probability density functions. Y Tao, R
    Cheng, X Xiao, WK Ngai, B Kao, S. Prabhakar -
    Proc. VLDB 2005. Cited by 29.
  • 5. Managing uncertainty in sensor database. R
    Cheng, S Prabhakar - ACM SIGMOD Record, 2003.
    Cited by 21.
  • 7. Preserving user location privacy in mobile
    data management infrastructures. R Cheng, Y
    Zhang, E Bertino, S Prabhakar - 6th workshop on
    privacy enhancing technologies, 2006 Springer.
    Cited by 21.
  • 6. Maintaining temporal consistency of discrete
    objects in soft real-time database systems. B
    Kao, KY Lam, B Adelberg, R Cheng, T Lee. IEEE
    Transactions on Computers, 2003. Cited by 17.
  • 8. Adaptive stream filters for entity-based
    queries with non-value tolerance. R Cheng, B Kao,
    S Prabhakar, A Kwan, Y Tu Proc. VLDB, 2005.
    Cited by 16.
  • 9. Evaluation of concurrency control strategies
    for mixed soft real-time database systems. KY
    Lam, TW Kuo, B Kao, TSH Lee, R Cheng. Information
    Systems, 2002. Cited by 15.
  • 10. U-DBMS a database system for managing
    constantly-evolving data. R Cheng, S Singh, S
    Prabhakar Proc. VLDB, 2005. Cited by 14.

89
Research Outcome Impact (SCI Index)
  • 1. Title Querying imprecise data in moving
    object environments
  • Author(s) Cheng, R Kalashnikov, DV Prabhakar,
    S
  • Source IEEE TRANSACTIONS ON KNOWLEDGE AND DATA
    ENGINEERING Volume 16 Issue 9 Pages 1112-1127
    Published 2004
  • Times Cited 5
  • 2. Title Maintaining temporal consistency of
    discrete objects in soft real-time database
    systems
  • Author(s) Kao, B Lam, KY Adelberg, B, et al.
  • Source IEEE TRANSACTIONS ON COMPUTERS Volume
    52 Issue 3 Pages 373-389 Published MAR
    2003
  • Times Cited 2
  • 3. Title Managing uncertainty in sensor
    databases
  • Author(s) Cheng, R Prabhakar, S
  • Source SIGMOD RECORD Volume 32 Issue 4
    Pages 41-46 Published DEC 2003
  • Times Cited 1
  • 4. Title Evaluation of concurrency control
    strategies for mixed soft real-time database
    systems
  • Author(s) Lam, KY Kuo, TW Kao, B, et al.
  • Source INFORMATION SYSTEMS Volume 27
    Issue 2 Pages 123-149 Published APR 2002
  • Times Cited 1