Title: Managing Uncertainty in a Database System
1Managing Uncertainty in a Database System
The University of Hong Kong Seminar
- Dr. Reynold Cheng
- Department of Computing
- Hong Kong Polytechnic University
- Email csckcheng_at_comp.polyu.edu.hk
- URL http://www.comp.polyu.edu.hk/csckcheng/
- 31st March, 2008
2Location and Sensor Applications
What is the region that gives max temperature?
Find a cab within 2 miles of my location.
Service Provider
RF-ID
3System Goal
- Provides services with the 3 objectives
- Correctness
- Efficiency
- Scalability
4Data Uncertainty
- Due to limited network bandwidth and battery power, readings are just sampled
- The value of the entity being monitored (e.g., temperature, location) is changing
- The database stores old values only
- Query results can be incorrect!
5Answering Minimum Query with Database Readings
[Figure: recorded and current temperatures of sensors x and y on a 0-15 °C axis]
- Database answer: x
- Correct answer: y
6Bounding Uncertainty with Dead-Reckoning
- Data values cannot change drastically
- The system negotiates a bound d with the sensor
[Figure: the sensor reports v; the system stores (v, d) and bounds the current value within [v-d, v+d]]
- Trade-off between data uncertainty and update frequency
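The trade-off above can be sketched in a few lines. This is a minimal illustration, assuming the sensor sends an update only when its reading drifts more than d from the last reported value; the class and function names are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class DeadReckoningSensor:
    """Sensor that reports only when its value leaves [v-d, v+d]."""
    bound: float      # negotiated bound d (hypothetical field name)
    reported: float   # last value v sent to the system

    def observe(self, current):
        """Return True if this observation triggers an update."""
        if abs(current - self.reported) > self.bound:
            self.reported = current  # the system now stores (current, d)
            return True
        return False

    def interval(self):
        """Interval the system may assume for the current value."""
        return (self.reported - self.bound, self.reported + self.bound)


def count_updates(readings, bound):
    """Count the updates sent for a sequence of readings."""
    sensor = DeadReckoningSensor(bound=bound, reported=readings[0])
    return sum(sensor.observe(r) for r in readings[1:])


readings = [10.0, 10.4, 11.2, 12.5, 12.1, 6.0]
print(count_updates(readings, bound=1.0))  # tight bound: more updates
print(count_updates(readings, bound=3.0))  # loose bound: fewer updates
```

A larger d saves messages but widens the interval the database must reason about, which is the uncertainty the rest of the talk deals with.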
7Answering MIN Query with Error-Bounded Readings
[Figure: recorded temperatures and bounds for the current temperatures of sensors x and y on a 0-15 °C axis]
8Answering MIN Query with Error-Bounded Readings
[Figure: recorded temperatures and bounds for the current temperatures of sensors x and y on a 0-15 °C axis, each bound carrying a probability distribution]
- (x, 0.7), (y, 0.3)
- Answers augmented with probabilistic guarantees
- Measurement error is another source of uncertainty
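A probabilistic MIN answer such as (x, 0.7), (y, 0.3) can be computed by integrating over the uncertainty intervals. The sketch below assumes independent sensors with uniform pdfs and uses simple midpoint integration; the slide does not give the intervals behind 0.7 and 0.3, so the numbers here are illustrative only and the function names are hypothetical.

```python
def uniform_cdf(t, lo, hi):
    """CDF of a uniform distribution on [lo, hi]."""
    return min(1.0, max(0.0, (t - lo) / (hi - lo)))


def prob_is_min(target, others, steps=10_000):
    """P(target holds the minimum), assuming independent uniform values.

    Integrates f_target(t) * prod(1 - F_other(t)) over the target interval.
    """
    lo, hi = target
    dt = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * dt
        survive = 1.0
        for o_lo, o_hi in others:
            survive *= 1.0 - uniform_cdf(t, o_lo, o_hi)
        total += survive * dt / (hi - lo)
    return total


x = (4.0, 10.0)   # hypothetical bound for sensor x
y = (6.0, 12.0)   # hypothetical bound for sensor y
p_x = prob_is_min(x, [y])
p_y = prob_is_min(y, [x])
print(round(p_x, 3), round(p_y, 3))
```

With continuous pdfs ties have zero probability, so the two probabilities sum to 1 up to integration error.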
9Uncertain databases
- Treat data uncertainty as a first-class citizen
- Model uncertainty of data attributes, e.g.,
- closed region with a probability distribution function (pdf)
- Probabilistic query
- answers with probabilities
- imprecise but correct
10Related Work
- Barbara, Garcia-Molina and Porter proposed a relational data model that incorporates discrete pdf in attribute values (Attribute Uncertainty) [TKDE92]
- Wolfson [DPD99] and Pfoser [ISSD99] studied range queries for imprecise locations of moving objects.
- Probabilistic Queries [ICDE03, SIGMOD03, TKDE04, VLDB04a]
- Deshpande [VLDB04c] presented probabilistic prediction for sensor values.
- Uncertainty in biometric databases was studied in [ICDE06, ICDE07b].
- In [VLDB07], evaluation algorithms for skyline queries were presented.
11Our Contributions
- Database and Query Semantics
- Query classification [SIGMOD03]
- Quality metrics [IS07a, SSDBM08]
- Formal query semantics [ICDE08b]
- Query Evaluation and Indexing
- Range query [VLDB04a, VLDB05a, TODS07]
- Nearest-neighbor query [ICDE03, TKDE04, ICDE08a]
- Join [CIKM06]
- System Implementation
- The ORION database [VLDB05c]
- Uncertainty and location privacy
- Privacy-aware location services [PET06]
- Location-dependent query [ICDE07a]
- Uncertain data mining
- Clustering uncertain data [PAKDD06, ICDM06, DUNE07]
12Other Uncertainty Models
- Probabilistic database: each tuple is augmented with a probability value (tuple uncertainty)
- The semi-structured model was studied by Hung, Getoor and Subrahmanian in [ICDE03].
- Dalvi and Suciu [VLDB04b] studied efficient query operator evaluation with ranked results.
- Dai and Mamoulis [SSTD05] studied spatial queries over data points with existential uncertainty.
- [VLDB06, ICDE08b] combined the studies of attribute and tuple uncertainty.
- A large branch of work deals with fuzzy modeling [IDG06].
13Outline
- Query Classification and Quality
- Probabilistic Range Queries
- Location-Dependent Queries
- Ongoing Projects
14Data Uncertainty
[Figure: uncertain attribute (temperature, location) of object Ti (GPS, sensor)]
- fi(x) can be arbitrary, e.g., continuous, uniform, Gaussian, discrete, histogram
- Used in various domains, e.g.,
- location uncertainty [DPD99, ISSD99]
- biometric databases [ICDE06, ICDE07b]
15Classification of Probabilistic Queries
- Nature of answer
- Value-based: returns a single value
- e.g., Average query ([l,u], pdf)
- Entity-based: returns a set of objects
- e.g., Range query {(Ti, pi), pi > 0}
- Dependence
- Dependent: interplay between objects decides the result, e.g., Nearest-Neighbor query
- Independent: whether an object satisfies a query is independent of others, e.g., Range query
16Classification of Probabilistic Queries
- Only the probabilistic range query (entity-based, independent class) was briefly studied [WS99, ISSD99] before our work.
17Equality Join
[Figure: pdf of a and cdf of b]
- In a continuous domain, two real values are equal at a point with zero probability.
- Resolution c: a is equal to b if they are within c of each other.
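One way to read the "pdf of a" and "cdf of b" labels is that the probability of a and b being within c of each other is the integral of f_a(x) [F_b(x + c) - F_b(x - c)] over the domain of a. The sketch below evaluates that integral numerically, assuming independent uniform pdfs; the function names are illustrative and this is not necessarily how the join operators in the cited work are implemented.

```python
def uniform_cdf(x, lo, hi):
    """CDF of a uniform distribution on [lo, hi]."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def prob_equal_within(a_range, b_range, c, steps=10_000):
    """P(|a - b| <= c) for independent uniform a and b (midpoint rule)."""
    a_lo, a_hi = a_range
    dx = (a_hi - a_lo) / steps
    density = 1.0 / (a_hi - a_lo)          # pdf of a
    total = 0.0
    for i in range(steps):
        x = a_lo + (i + 0.5) * dx
        # probability that b falls in [x - c, x + c], from the cdf of b
        window = uniform_cdf(x + c, *b_range) - uniform_cdf(x - c, *b_range)
        total += density * window * dx
    return total


print(prob_equal_within((0.0, 1.0), (0.0, 1.0), c=0.5))
print(prob_equal_within((0.0, 1.0), (0.0, 1.0), c=0.0))
```

With c = 0 the result is zero, which matches the first bullet: exact equality has zero probability in a continuous domain.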
18Quality of Probabilistic Result
- Probabilistic queries: notion of result "quality"
- Consider a range query (Is vi in [l, u]?)
- regular range query
- "yes" or "no"
- probabilistic range query
- Recently used in [SSDBM08] as a data cleaning metric.
19Quality for Value- Dependent Queries
- Query result: [l,u], {p(x) | x ∈ [l,u]}
- U[3,4] is less ambiguous than U[1,100]
- Differential entropy
- Measures the uncertainty associated with a random variable X with pdf p
- H(X) attains a maximum value of log2(u-l)
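The entropy-based quality score can be checked numerically. The sketch below uses the standard definition H(X) = -∫ p(x) log2 p(x) dx and assumes the result pdf is given as an equal-width histogram on [l, u]; the function name and histogram representation are assumptions made for illustration.

```python
import math


def differential_entropy(lower, upper, bin_probs):
    """Differential entropy (in bits) of an equal-width histogram pdf."""
    width = (upper - lower) / len(bin_probs)
    h = 0.0
    for q in bin_probs:
        if q > 0:
            # density inside the bin is q / width
            h -= q * math.log2(q / width)
    return h


uniform = [0.25, 0.25, 0.25, 0.25]
print(differential_entropy(3, 4, uniform))      # U[3,4]
print(differential_entropy(1, 100, uniform))    # U[1,100]
print(differential_entropy(1, 100, [0.7, 0.1, 0.1, 0.1]))
```

The uniform pdf on [1,100] reaches the bound log2(99), U[3,4] gives log2(1) = 0, and a skewed pdf on the same interval scores lower than the uniform one, i.e. it is less ambiguous.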
20The ORION Database
- Based on an open-source database (PostgreSQL 8.0)
- Enhances SQL by providing uncertainty management functionalities
- Recently extended to support 2D data and tuple uncertainty
- The ORION project won the Pan-Pearl IT Project Competition, China in 2007
22Queries in ORION
23Outline
- Query Classification and Quality
- Probabilistic Range Queries
- Location-Dependent Queries
- Ongoing Projects
24ORION Query Evaluation
[Figure: recorded temperature and uncertainty for the current temperature of sensors T1 and T2 on a 0-30 °F axis]
25Probabilistic Threshold Range Query (PTRQ)
- Users are likely to be concerned with results with a high probability
- Retrieve sensor ids with readings between 10°F and 25°F with probability at least 0.7
- PTRQ: given an interval [a,b] and a threshold T, return every Ti where Prob(vi ∈ [a,b]) ≥ T
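Without an index, a PTRQ can be answered by computing each object's probability and comparing it with T. The sketch below assumes uniform pdfs stored as (lower, upper) intervals; the sensor data and function names are made up for illustration.

```python
def prob_in_range(lo, hi, a, b):
    """P(v in [a, b]) for v uniform on [lo, hi]."""
    overlap = min(hi, b) - max(lo, a)
    return max(0.0, overlap) / (hi - lo)


def ptrq(objects, a, b, threshold):
    """Return ids whose probability of lying in [a, b] is at least threshold."""
    return [
        oid
        for oid, (lo, hi) in objects.items()
        if prob_in_range(lo, hi, a, b) >= threshold
    ]


# hypothetical uncertainty intervals in °F
sensors = {"T1": (8.0, 28.0), "T2": (12.0, 22.0), "T3": (20.0, 30.0)}
print(ptrq(sensors, 10.0, 25.0, 0.7))  # ['T1', 'T2']
```

T1 qualifies with probability 0.75 and T2 with 1.0, while T3 reaches only 0.5 and is dropped.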
26Pruning in a 1D R-Tree SIGMOD84
Minimum Bounding Rectangle (MBR)
- Many irrelevant objects in the MBR (probability < T) may be processed.
- Similar problems occur with interval indexes (e.g., [FOCS96, ADI00]).
27Indexing Uncertain Data
- Probability Threshold Indexing (PTI)
- 1D R-tree with uncertainty rectangles
- Variance-based Clustering (VBC)
- Cluster uncertain data based on their means and
variances
28p-bounds in a PTI Node
[Figure: left-0.2-bound and right-0.2-bound of a PTI node; labels: ≤ 0.2, 0.8]
29p-bounds in a PTI Node
[Figure: left-0-bound and right-0-bound of a PTI node, which coincide with the MBR]
30Pruning with p-bounds
[Figure: a PTI node with its left-0.2-bound and right-0.2-bound]
- An MBR is not retrieved if there exists a value p such that
- T > p
- a is on the right of the right-p-bound
- An MBR is not retrieved if there exists a value p such that
- T > p
- b is on the left of the left-p-bound
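The two pruning rules can be sketched as follows. The code assumes that a node's right-p-bound is a position with at most p of every object's probability mass to its right (and symmetrically for the left-p-bound), and that all pdfs are uniform so the bounds follow directly from the interval endpoints. The helper names are hypothetical.

```python
def node_p_bounds(intervals, ps):
    """Compute (left-p-bound, right-p-bound) of a node for each p in ps."""
    bounds = {}
    for p in ps:
        # leftmost p-quantile: no object has more than p mass to its left
        left = min(lo + p * (hi - lo) for lo, hi in intervals)
        # rightmost (1-p)-quantile: no object has more than p mass to its right
        right = max(hi - p * (hi - lo) for lo, hi in intervals)
        bounds[p] = (left, right)
    return bounds


def can_prune(bounds, a, b, threshold):
    """True if the node cannot contain any object with probability >= threshold."""
    for p, (left, right) in bounds.items():
        if threshold > p and (a > right or b < left):
            return True
    return False


node = [(0.0, 10.0), (2.0, 12.0)]
bounds = node_p_bounds(node, ps=[0.0, 0.2])
print(bounds)
print(can_prune(bounds, a=10.5, b=20.0, threshold=0.3))
print(can_prune(bounds, a=10.5, b=20.0, threshold=0.1))
```

The query [10.5, 20] overlaps the node's MBR, so a plain R-tree would descend into it, yet with T = 0.3 the 0.2-bounds show that no object can reach the threshold. With T = 0.1 the node must still be examined.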
31Implementation of PTI
32Advantages of PTI
- Ability to index any form of uncertainty pdf
- Simple implementation
- Support different queries, e.g.,
- Joins [CIKM06]
- Location-dependent range queries [ICDE07a]
- Facilitate query evaluation over multi-dimensional data [VLDB05a, TODS07]
33Drawback of PTI
- Extra overhead in storing p-bounds
- Small intervals near edges limit gains
[Figure: right-0.2-bound and left-0.2-bound of a PTI node]
34Variance-based Clustering (VBC)
- Obtain the mean and variance of each object
- Construct the PTI by clustering the (mean, variance) pairs of uncertain objects
- For uniform pdf
- Index (Li,Ri) with a 2D R-tree
- Convert [a,b] to a trapezoidal query
35VBC for uniform pdf
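[Figure: intervals plotted as 2D points (Li,Ri) with axes x = Li and y = Ri, above the line x = y; a cluster of large intervals lies far from the line and a cluster of smaller intervals lies close to it]
- When 2D points are indexed (e.g., by an R-tree), intervals of different variances are separated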
36VBC for Uniform pdf
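[Figure: 1D view (uniform pdf) of the query [a,b] and 2D view of the trapezoidal query region Q(T), with axes x = Li and y = Ri and the line x = y]
- y(1-T) + xT ≥ a: intervals containing a
- x(1-T) + yT ≤ b: intervals containing b
- b - a ≥ T(y - x): intervals containing [a,b]
- a < x < y < b: intervals in [a,b]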
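The four conditions of the 2D view follow from the uniform-pdf probability: for an interval that contains only a, (y - a)/(y - x) ≥ T rearranges to y(1-T) + xT ≥ a, and the other cases are derived the same way. The sketch below tests a point (x, y) = (Li, Ri) against these linear conditions and compares the outcome with the probability computed directly; the function names are illustrative.

```python
def qualifies_2d(x, y, a, b, t):
    """Check the linear conditions of the trapezoidal query for point (x, y)."""
    if y < a or x > b:
        return False                      # no overlap with [a, b]
    if a <= x and y <= b:
        return True                       # interval inside [a, b]
    if x <= a and b <= y:
        return (b - a) >= t * (y - x)     # interval contains [a, b]
    if x <= a:
        return y * (1 - t) + x * t >= a   # interval contains a only
    return x * (1 - t) + y * t <= b       # interval contains b only


def qualifies_direct(x, y, a, b, t):
    """Reference check: overlap length divided by interval length."""
    overlap = max(0.0, min(y, b) - max(x, a))
    return overlap / (y - x) >= t


points = [(8.0, 28.0), (12.0, 22.0), (20.0, 30.0), (5.0, 15.0), (9.0, 14.0)]
for x, y in points:
    print((x, y), qualifies_2d(x, y, 10.0, 25.0, 0.7))
```

Because every condition is linear in x and y, the qualifying points form a polygon, which is why a spatial index over (Li, Ri) can answer the query.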
37Experimental Setup
38Scalability of Indexes
- Both PTI and VBC outperform R-tree
- Answering PTRQ with R-tree requires more computation
- VBC needs about 50% fewer I/Os than PTI
39Query Probability Threshold
- R-tree does not benefit from increasing values of T
- When T is 0.5, VBC is 4 times better than PTI
40Outline
- Query Classification and Quality
- Probabilistic Range Queries
- Location-Dependent Queries
- Ongoing Projects
41Location-Dependent Queries
- Find all vehicles within 2 miles of my current location
- We consider the location uncertainty of the user who issues the query (called the query issuer)
- measurement error of a GPS device
- privacy concern [PET06]
- Imprecise Location-Dependent Query (ILDQ)
42What is an ILDQ?
[Figure: a traditional location-dependent query compared with an ILDQ, showing several possible actual positions of the query issuer and an object A]
- Evaluate the probability that A satisfies the query.
43Basic Evaluation of ILDQ
[Figure: object A, range R, and the uncertainty region U of the query issuer with pdf fU(x,y)]
44Pruning by the Minkowski Sum
[Figure: range R, issuer uncertainty region U, the Minkowski sum R ⊕ U, and objects A, B, C; some objects may be pruned by exploiting the probability threshold]
- The Minkowski sum (R ⊕ U) is evaluated by computational geometry techniques [BK00]
- Prune objects with spatial structures (e.g., R-tree)
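For a concrete picture of the Minkowski-sum filter, the sketch below makes strong simplifying assumptions that are not stated on the slide: U is an axis-aligned rectangle with a uniform pdf, the range R is an axis-aligned rectangle centred on the issuer, and objects are points. Under these assumptions the Minkowski sum is just U expanded by the half-extents of R, and the qualification probability of a point is the fraction of U from which the range would cover it. All names are hypothetical.

```python
def minkowski_sum(u, half_w, half_h):
    """Expand rectangle u = (x1, y1, x2, y2) by the half-extents of the range."""
    x1, y1, x2, y2 = u
    return (x1 - half_w, y1 - half_h, x2 + half_w, y2 + half_h)


def inside(rect, pt):
    """True if point pt lies in the closed rectangle rect."""
    x1, y1, x2, y2 = rect
    return x1 <= pt[0] <= x2 and y1 <= pt[1] <= y2


def qualification_probability(u, half_w, half_h, pt):
    """P(point is within the range) for an issuer uniformly located in u."""
    x1, y1, x2, y2 = u
    px, py = pt
    # issuer positions whose range covers pt form a rectangle around pt
    ox = min(x2, px + half_w) - max(x1, px - half_w)
    oy = min(y2, py + half_h) - max(y1, py - half_h)
    if ox <= 0 or oy <= 0:
        return 0.0
    return (ox * oy) / ((x2 - x1) * (y2 - y1))


U = (0.0, 0.0, 4.0, 4.0)
expanded = minkowski_sum(U, 1.0, 1.0)
for pt in [(2.0, 2.0), (4.5, 2.0), (6.0, 2.0)]:
    print(pt, inside(expanded, pt), qualification_probability(U, 1.0, 1.0, pt))
```

Points outside the expanded rectangle have zero probability and can be discarded with a spatial index before any probability is computed.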
45The p-expanded-query
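[Figure: object A, range R, issuer uncertainty region U, the 0-expanded-query (R ⊕ U), and a smaller p-expanded-query inside it]
- The p-expanded-query includes only point objects with probability ≥ p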
46Pruning uncertain objects
[Figure: issuer uncertainty region U, R ⊕ U, and the T-expanded-query, which can be found by p-bounds]
- pA < T ⇒ A is pruned
47A 2D p-bound
[Figure: a 2D p-bound of an uncertainty region, 0 ≤ p ≤ 0.5]
48Deriving p-expanded-query with p-bounds
[Figure: the p-expanded-query derived from the top-p-bound and left-p-bound of U, shown inside R ⊕ U]
49Pruning Strategy 1 Use p-expanded query
[Figure: the T-expanded-query inside R ⊕ U, around the issuer uncertainty region U]
50Pruning Strategy 2 Use an object's p-bound
[Figure: an uncertain object and its right-T-bound]
- The object's p-bounds can be indexed by PTI
51Pruning Strategy 3 Use both p-bound and p-expanded query
[Figure: object A with its right-x-bound (x > T), the issuer uncertainty region U, and R ⊕ U]
- If x · y < T, then A can be pruned.
52Experimental Setup
531. Effect of Probability Threshold
542. Effect of Gaussian pdf
55Outline
- Query Classification and Quality
- Probabilistic Range Queries
- Location-Dependent Queries
- Ongoing Projects
56Project 1 Location Cloaking
- Purpose
- Study the use of location cloaking for privacy protection
- Investigate the trade-off between location cloaking and service quality
- Grants
- Privacy Protection in Location-based Services with Location Cloaking (RGC CERG, Ref: PolyU 5138/06E). Co-I: E. Bertino and S. Prabhakar (Purdue), HKD 386,000.
- Query Processing on Historical Uncertain Spatiotemporal Data (Co-I, RGC CERG, Ref: 120206). PI: Y. Tao (CUHK), HKD 961,920.
- Efficient Evaluation of Probabilistic Nearest-Neighbor Queries over Uncertain Data. Internal Research Grant (ICRG), 2008-09, PolyU. Ref: G-YG27. HKD 120,000.
- Efficiency of Privacy Preservation Mechanisms in Routing over the Internet. Internal Research Grant (ICRG), 2006-07, PolyU. Ref: A-PH09. Co-I: D. Yau (Purdue), HKD 120,000.
- Protecting Network Privacy with Spatial and Temporal Cloaking. Internal Research Grant (ICRG), 2007-08, PolyU. Ref: A-PH39. Co-I: D. Yau (Purdue), HKD 120,000.
57Project 2 Data Stream Management
- Purpose
- Study continuous queries, data uncertainty, and resource consumption issues in data stream systems.
- Grants
- Adaptive Filters for Continuous Queries over Constantly-Evolving Data Streams (RGC CERG, Ref: 513307, 2008-09). Co-I: K. Rothermel (Stuttgart), HKD 421,512.
- Efficient Protocols for Quality-Aware Querying of Sensor Data in Pervasive Environments, RGC Germany/Hong Kong Joint Research Scheme 2006/2007. Ref: G_HK013/06. Co-I: K. Rothermel (Stuttgart), HKD 59,600.
- Affiliated Member of A Research Center for Ubiquitous Computing (Central Allocation Group Research Projects, RGC, 2006-09, HKBU 1/05C). PI: Prof. J. Ng (HKBU).
- Member of the Infrastructure for Information Fusion Project (StrucFus), with University of Skovde, IIT Bombay, HKBU, University of Wuhan, 2008-10. PI: J. Mellin (U. Skovde).
58Our Publications (1)
- Uncertain Database and Query Models
- [ICDE08b] S. Singh, C. Mayfield, R. Shah, S. Prabhakar, S. Hambrusch, J. Neville and R. Cheng. Database Support for pdf Attributes.
- [IS07a] R. Cheng, D. Kalashnikov and S. Prabhakar. Evaluation of Probabilistic Queries over Imprecise Data in Constantly-Evolving Environments. In Information Systems (IS), Vol. 32, No. 1, pp. 104-130, Mar 2007.
- [SIGMOD03] R. Cheng, D. Kalashnikov and S. Prabhakar. Evaluating Probabilistic Queries over Imprecise Data. (Cited 149 times)
- Probabilistic Range Queries
- [ICDE07a] J. Chen and R. Cheng. Efficient Evaluation of Imprecise Location-Dependent Queries. In Proc. ICDE 2007.
- [TODS07] Y. Tao, X. Xiao and R. Cheng. Range Queries for Multidimensional Data. In ACM TODS, 32(3):15, 2007.
- [VLDB05a] Y. Tao, R. Cheng, X. Xiao, W. K. Ngai, B. Kao, and S. Prabhakar. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In VLDB 2005.
- [VLDB04a] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter. Efficient indexing methods for probabilistic threshold queries over uncertain data. In VLDB 2004. (Cited 46 times)
59Our Publications (2)
- Probabilistic Nearest-Neighbor Queries
- [ICDE08a] R. Cheng, J. Chen, M. Mokbel and C. Chow. Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data.
- [TKDE04] R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, 16(9), 2004. (Cited 69 times)
- [ICDE03] R. Cheng, D. Kalashnikov and S. Prabhakar. Querying imprecise data in moving object environments.
- Probabilistic Joins
- [CIKM06] R. Cheng, S. Singh, S. Prabhakar, R. Shah, J. Vitter and Y. Xia. Efficient Join Processing over Uncertain Data. In ACM 15th Conf. on Information and Knowledge Management (CIKM 2006), Arlington, USA, 2006.
- Uncertain Data Mining
- [DUNE07] S. Lee, B. Kao and R. Cheng. Reducing UK-means to K-means. In the 1st Workshop on Data Mining of Uncertain Data (DUNE), co-located with IEEE ICDM, Omaha, US, Oct 2007.
- [ICDM06] J. Ngai, B. Kao, C. Chui, R. Cheng, M. Chau and K. Yip. Efficient Clustering of Uncertain Data. In IEEE Intl. Conf. on Data Mining (IEEE ICDM 2006), Hong Kong, Dec 2006.
- [PAKDD06] M. Chau, R. Cheng, B. Kao and J. Ng. Uncertain Data Mining: An Example in Clustering Location Data. In the Methodologies for Knowledge Discovery and Data Mining, Pacific-Asia Conference (PAKDD 2006), Singapore, April 2006.
- [WSA05] M. Chau, R. Cheng and B. Kao. Uncertain Data Mining: A New Research Direction. Invited Paper, in the Workshop on the Sciences of The Artificial (WSA) 2005, National Dong Hwa University, Taiwan, Dec 2005.
60Our Publications (3)
- Data Stream Management
- [SSDBM08] J. Chen and R. Cheng. Quality-Aware Probing of Uncertain Data with Resource Constraints. Accepted in SSDBM 2008, July 2008.
- [IS07b] R. Cheng, K.Y. Lam, S. Prabhakar and B. Liang. An Efficient Location Update Mechanism for Continuous Queries over Moving Objects. In Information Systems (IS), Vol. 32, No. 4, pp. 593-620, Jun 2007.
- [IDEAS07] T. Farrell, R. Cheng and K. Rothermel. Energy-Efficient Monitoring of Mobile Objects with Uncertainty-Aware Tolerances. Accepted in Intl. Database Engineering & Applications Symposium (IDEAS 2007), Banff, 2007.
- [RTS07] S. Han, E. Chan, R. Cheng and K. Y. Lam. A Statistics-Based Sensor Selection Scheme for Continuous Probabilistic Queries in Sensor Network. In Real Time Systems Journal (RTS), Vol. 35, No. 1, pp. 33-58, Jan 2007.
- [VLDB05b] R. Cheng, B. Kao, S. Prabhakar, A. Kwan and Y. Tu. Adaptive Stream Filters for Entity-based Queries with Non-Value Tolerance. In Very Large Databases Conf. (VLDB 2005), Trondheim, Norway, Aug 2005. Acceptance rate: 16.5% (53/322).
- [ICDE05] R. Cheng, Y. Xia, S. Prabhakar and R. Shah. Change Tolerant Indexing over Constantly Evolving Data. In Intl. Conf. on Data Engineering (IEEE ICDE 2005), Tokyo, Japan, Apr 2005.
- Privacy-Aware System Support
- [ICNP07] R. Cheng, D. Yau and J. Fu. Packet Cloaking: Protecting Receiver Privacy Against Traffic Analysis. In the 3rd Workshop on Secure Network Protocols (NPSec), co-located with IEEE ICNP, Beijing, China, Oct 2007.
- [PET06] R. Cheng, Y. Zhang, E. Bertino, and S. Prabhakar. Preserving user location privacy in mobile data management infrastructures. In Proc. 6th Workshop on Privacy Enhancing Technologies, 2006.
61References (1)
- [FOCS96] L. Arge and J. S. Vitter. On dynamic interval management in external memory (extended abstract). In FOCS, pp. 560-569, 1996.
- [TKDE92] D. Barbara, H. Garcia-Molina and D. Porter. The management of probabilistic data. IEEE TKDE, 4(5):487-502, 1992.
- [BK00] M. de Berg, M. van Kreveld, M. Overmars and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. 2nd ed., Springer Verlag, 2000.
- [ICDE06] C. Böhm, A. Pryakhin, and M. Schubert. The Gauss-tree: Efficient object identification in databases of probabilistic feature vectors. In Proc. ICDE, 2006.
- [SSTD05] X. Dai, M. L. Yiu, N. Mamoulis, Y. Tao, and M. Vaitis. Probabilistic Spatial Queries on Existentially Uncertain Data. Proc. SSTD, pp. 400-417, August 2005.
- [VLDB04b] N. Dalvi and D. Suciu. Efficient Query Evaluation on Probabilistic Databases. VLDB 2004.
- [VLDB04c] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein and W. Hong. Model-Driven Data Acquisition in Sensor Networks. In VLDB, 2004.
- [IDG06] J. Galindo, A. Urrutia and M. Piattini. Fuzzy Databases: Modeling, Design, and Implementation. Idea Group Publishing, 2006.
- [SIGMOD84] A. Guttman. R-trees: A dynamic index structure for spatial searching. Proc. of the ACM SIGMOD Intl. Conf., 1984.
62References (2)
- [ICDE03] E. Hung, L. Getoor and V. S. Subrahmanian. PXML: A Probabilistic Semistructured Data Model and Algebra. In ICDE 2003.
- [VLDB06] O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom. ULDBs: databases with uncertainty and lineage. In VLDB, 2006.
- [ICDE07b] V. Ljosa and A. K. Singh. APLA: Indexing arbitrary probability distributions. In Proc. ICDE, 2007.
- [ADI00] Y. Manolopoulos, Y. Theodoridis, and V. J. Tsotras. Chapter 4: Access methods for intervals. In Advanced Database Indexing, Kluwer, 2000.
- [VLDB07] J. Pei, B. Jiang, X. Lin, and Y. Yuan. Probabilistic skylines on uncertain data. In Proc. VLDB, 2007.
- [DPD99] O. Wolfson, P. Sistla, S. Chamberlain, and Y. Yesha. Updating and querying databases that track mobile units. Distributed and Parallel Databases, 7(3), 1999.
- [ISSD99] D. Pfoser and C. S. Jensen. Capturing the Uncertainty of Moving-Object Representations. In Proc. of the Sixth International Symposium on Spatial Databases, Hong Kong, July 20-23, 1999, pp. 111-132.
63Conclusions
- We study the provision of correct, efficient, and scalable data access
- We consider how uncertainty can be treated as a first-class citizen in DBMS
- Other challenges include
- Handling other uncertainty models
- Probabilistic data streams
- Location cloaking
- Mining uncertain data
64Thank You!
- Reynold Cheng
- Email csckcheng_at_comp.polyu.edu.hk
- URL http://www.comp.polyu.edu.hk
- ORION homepage
- http://orion.cs.purdue.edu
65How to define uncertainty pdf?
- The form of the uncertainty pdf depends on the application, e.g., a Gaussian distribution models measurement error.
- If no information about the pdf is known, a simple way is to assume a uniform pdf: a pessimistic estimation
- Can also use more sophisticated techniques, based on time-series analysis of past data for pdf derivation [CH89]
- [CH89] C. Chatfield. The Analysis of Time Series: An Introduction. Chapman and Hall, 1989.
66Classical Decomposition
- For a discrete time series, let Xt be a random variable at time t
- Xt = mt + st + Yt
- mt: trend, a slowly-moving function
- moving-average filter, exponential smoothing, curve fitting/regression
- st: seasonal component, a periodic function
- Yt: noise component
- Example: mt = 2t + 1, st = sin(t), Yt ~ N(0,1)
- pdf(100) = N(201 + sin(100), 1)
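The example on this slide can be reproduced directly: the predicted pdf at time t is a Gaussian centred on the trend plus the seasonal component, with the noise variance. The sketch below hard-codes the slide's example functions; the function names are illustrative, and in practice the trend and seasonal terms would be estimated from past readings.

```python
import math


def trend(t):
    """Trend component m_t of the slide's example."""
    return 2 * t + 1


def seasonal(t):
    """Seasonal component s_t of the slide's example."""
    return math.sin(t)


def predicted_pdf(t, noise_variance=1.0):
    """Return (mean, variance) of the Gaussian pdf of X_t = m_t + s_t + Y_t."""
    return (trend(t) + seasonal(t), noise_variance)


def gaussian_density(x, mean, variance):
    """Density of N(mean, variance) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


mean, variance = predicted_pdf(100)
print(mean, variance)                       # mean is 201 + sin(100)
print(gaussian_density(mean, mean, variance))
```

The resulting (mean, variance) pair is what would be stored as the uncertainty pdf of the attribute at t = 100.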
67Sensor Databases
Goal: data retrieval in a correct, efficient and scalable manner
69Join over Uncertainty
- How do we define comparison operators for
uncertain data?
70Join Operators
- Comparison (=, ≠, >, <) between two uncertain items is probabilistic.
[Figure: equality (=) join between Table 1 and Table 2]
71PTI Pruning for Joins CIKM06
- Goal: prune pages R and S without examining individual items
- Solution: place p-bounds on R and S, and perform 4 tests with p-bounds
72Solving PTRQ with Interval Indexes
- Use an R-tree or interval index [FOCS96, JCSS96, ADI00] to find intervals intersecting [a,b]
- For each object retrieved, evaluate its probability of being within [a,b]
- Return objects with probability ≥ T
74Clustering 2D points
[Figure: intervals plotted as 2D points (Li,Ri) with axes x = Li and y = Ri, above the line x = y; a cluster of large intervals and a cluster of smaller intervals]
- When 2D points are clustered, intervals of different variances are separated
- Points are clustered based on means and variances (variance-based clustering)
75Answering PTRQ with 2D R-Tree
- Construct a 2D R-tree over uncertain data by indexing (mean_i, variance_i)
- Query the 2D R-tree
- For uniform pdf, a PTRQ can be converted to a 2D range query
76Querying Uniform pdf
[Figure: 1D view (uniform pdf) of the query [a,b] and 2D view of the query region Q (p = 0.75), with axes x = Li and y = Ri and the line x = y]
- y(1-p) + xp ≥ a: intervals containing a
- a < x < y < b: intervals in [a,b]
- x(1-p) + yp ≤ b: intervals containing b
- b - a ≥ p(y - x): intervals containing [a,b]
77Experimental Setup
- 100K uncertain items, with interval size uniformly distributed in [0, 10000]
- Assume uniform uncertainty pdf
- 10K PTRQs with query lengths normally distributed and T ∈ [0.1, 1]
- Each PTI node contains 5 p-bounds, where p ∈ {0.1, 0.3, 0.5, 0.7, 0.9}
- No. of entries per disk page is 20
78Indexing p-bounds with Probability Threshold Index [VLDB04a, VLDB05a]
79The p-bound [VLDB04a, VLDB05a]
[Figure: the four p-bounds of a 2D uncertainty region, 0 ≤ p ≤ 0.5]
80Query-Data Duality and IUQ
Given a query issuer's uncertainty U and an uncertain object A
[Equations: two expressions in terms of g]
81Exploiting the Probability Threshold
- T ∈ (0,1]: probability threshold
- Returns objects whose probabilities for satisfying an ILDQ are ≥ T
82Future Work
- Uncertainty Management
- Efficient evaluation of probabilistic nearest-neighbor queries
- Uncertainty management in sensor networks
- Multi-dimensional extension of the ORION database
- Data Stream Management
- Energy-efficient tolerant queries
- Uncertain data cleaning
- Problems related to cleaning of uncertain databases to achieve better quality under limited budgets
83Project 1 Privacy-Aware Location-based Services
- Purpose
- Study the trade-off between uncertainty and privacy, and design services with large-scale database indexing support
- Grant
- Privacy Protection in Location-based Services with Location Cloaking (RGC CERG, Ref: PolyU 5138/06E). Co-I: E. Bertino and S. Prabhakar, HKD 386,000.
- Efficient Evaluation of Probabilistic Nearest-Neighbor Queries over Uncertain Data. Internal Research Grant (ICRG), 2008-09, PolyU. Ref: G-YG27. HKD 120,000.
- Query Processing on Historical Uncertain Spatiotemporal Data (Co-I, RGC CERG, Ref: 120206). PI: Y. Tao
- Affiliated Member of A Research Center for Ubiquitous Computing (Central Allocation Group Research Projects, RGC, 2006-09, HKBU 1/05C). PI: Prof. J. Ng.
- Publications
- [ICDE08a] R. Cheng, J. Chen, M. Mokbel and C. Chow. Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data.
- [ICDE07a] J. Chen and R. Cheng. Efficient Evaluation of Imprecise Location-Dependent Queries. In Proc. ICDE 2007.
- [TODS07] Y. Tao, X. Xiao and R. Cheng. Range Queries for Multidimensional Data. In ACM TODS, 32(3):15, 2007.
- [PET06] R. Cheng, Y. Zhang, E. Bertino, and S. Prabhakar. Preserving user location privacy in mobile data management infrastructures. In Proc. 6th Workshop on Privacy Enhancing Technologies, 2006.
84Project 2 Quality and Resource Consumption of Data Streams
- Purpose
- Study the trade-off between query result quality and resource consumption (e.g., battery power and network bandwidth) in sensor environments.
- Grant
- Adaptive Filters for Continuous Queries over Constantly-Evolving Data Streams (RGC CERG, Ref: 513307, 2008-09). Co-I: K. Rothermel, HKD 421,512.
- Efficient Protocols for Quality-Aware Querying of Sensor Data in Pervasive Environments, RGC Germany/Hong Kong Joint Research Scheme 2006/2007. Ref: G_HK013/06. Co-I: K. Rothermel, HKD 59,600.
- Publications
- [SSDBM08] J. Chen and R. Cheng. Quality-Aware Probing of Uncertain Data with Resource Constraints. Accepted in SSDBM 2008, July 2008.
- [IS07b] R. Cheng, K.Y. Lam, S. Prabhakar and B. Liang. An Efficient Location Update Mechanism for Continuous Queries over Moving Objects. In Information Systems (IS), Vol. 32, No. 4, pp. 593-620, Jun 2007.
- [IDEAS07] T. Farrell, R. Cheng and K. Rothermel. Energy-Efficient Monitoring of Mobile Objects with Uncertainty-Aware Tolerances. Accepted in Intl. Database Engineering & Applications Symposium (IDEAS 2007), Banff, 2007.
- [RTS07] S. Han, E. Chan, R. Cheng and K. Y. Lam. A Statistics-Based Sensor Selection Scheme for Continuous Probabilistic Queries in Sensor Network. In Real Time Systems Journal (RTS), Vol. 35, No. 1, pp. 33-58, Jan 2007.
- [VLDB05b] R. Cheng, B. Kao, S. Prabhakar, A. Kwan and Y. Tu. Adaptive Stream Filters for Entity-based Queries with Non-Value Tolerance. In Very Large Databases Conf. (VLDB 2005), Trondheim, Norway, Aug 2005. Acceptance rate: 16.5% (53/322).
- [ICDE05] R. Cheng, Y. Xia, S. Prabhakar and R. Shah. Change Tolerant Indexing over Constantly Evolving Data. In Intl. Conf. on Data Engineering (IEEE ICDE 2005), Tokyo, Japan, Apr 2005. Acceptance rate: 12.9% (67/521).
85Project 3 Routing Privacy
- Purpose
- Investigate and develop privacy-aware algorithms that optimize routing of packets across the Internet.
- Grant
- Efficiency of Privacy Preservation Mechanisms in Routing over the Internet. Internal Research Grant (ICRG), 2006-07, PolyU. Ref: A-PH09. Co-I: D. Yau, HKD 120,000.
- Protecting Network Privacy with Spatial and Temporal Cloaking. Internal Research Grant (ICRG), 2007-08, PolyU. Ref: A-PH39. Co-I: D. Yau, HKD 120,000.
- Publication
- R. Cheng, D. Yau and J. Fu. Packet Cloaking: Protecting Receiver Privacy Against Traffic Analysis. In the 3rd Workshop on Secure Network Protocols (NPSec), co-located with IEEE ICNP, Beijing, China, Oct 2007.
86Architecture of ORION
87Effect of Query User's Uncertainty Region Size
88Research Outcome Impact (Google Scholar)
- 1. Evaluating probabilistic queries over imprecise data. R Cheng, DV Kalashnikov, S Prabhakar. Proceedings of the 2003 ACM SIGMOD international conference, 2003. Cited by 149.
- 2. Querying imprecise data in moving object environments. R Cheng, DV Kalashnikov, S Prabhakar. Knowledge and Data Engineering, IEEE Transactions on, 2004. Cited by 69.
- 3. Efficient indexing methods for probabilistic threshold queries over uncertain data. R Cheng, Y Xia, S Prabhakar, R Shah, JS Vitter. Proc. VLDB, 2004. Cited by 46.
- 4. Indexing multi-dimensional uncertain data with arbitrary probability density functions. Y Tao, R Cheng, X Xiao, WK Ngai, B Kao, S Prabhakar. Proc. VLDB, 2005. Cited by 29.
- 5. Managing uncertainty in sensor databases. R Cheng, S Prabhakar. ACM SIGMOD Record, 2003. Cited by 21.
- 6. Preserving user location privacy in mobile data management infrastructures. R Cheng, Y Zhang, E Bertino, S Prabhakar. 6th Workshop on Privacy Enhancing Technologies, Springer, 2006. Cited by 21.
- 7. Maintaining temporal consistency of discrete objects in soft real-time database systems. B Kao, KY Lam, B Adelberg, R Cheng, T Lee. IEEE Transactions on Computers, 2003. Cited by 17.
- 8. Adaptive stream filters for entity-based queries with non-value tolerance. R Cheng, B Kao, S Prabhakar, A Kwan, Y Tu. Proc. VLDB, 2005. Cited by 16.
- 9. Evaluation of concurrency control strategies for mixed soft real-time database systems. KY Lam, TW Kuo, B Kao, TSH Lee, R Cheng. Information Systems, 2002. Cited by 15.
- 10. U-DBMS: a database system for managing constantly-evolving data. R Cheng, S Singh, S Prabhakar. Proc. VLDB, 2005. Cited by 14.
89Research Outcome Impact (SCI Index)
- 1. Title: Querying imprecise data in moving object environments
- Author(s): Cheng, R; Kalashnikov, DV; Prabhakar, S
- Source: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, Volume 16, Issue 9, Pages 1112-1127, Published 2004
- Times Cited: 5
- 2. Title: Maintaining temporal consistency of discrete objects in soft real-time database systems
- Author(s): Kao, B; Lam, KY; Adelberg, B, et al.
- Source: IEEE TRANSACTIONS ON COMPUTERS, Volume 52, Issue 3, Pages 373-389, Published MAR 2003
- Times Cited: 2
- 3. Title: Managing uncertainty in sensor databases
- Author(s): Cheng, R; Prabhakar, S
- Source: SIGMOD RECORD, Volume 32, Issue 4, Pages 41-46, Published DEC 2003
- Times Cited: 1
- 4. Title: Evaluation of concurrency control strategies for mixed soft real-time database systems
- Author(s): Lam, KY; Kuo, TW; Kao, B, et al.
- Source: INFORMATION SYSTEMS, Volume 27, Issue 2, Pages 123-149, Published APR 2002
- Times Cited: 1