Title: Managing Uncertainty in Moving-Object and Sensor Databases
- Reynold Cheng
- Department of Computer Sciences
- Purdue University
2 Sensor-based Applications
[Figure: sensors monitor the external environment (e.g., temperature, moving objects, hazardous materials) and report over a network channel to a database system; the user sends queries to the database system and receives results.]
3 Data Uncertainty
- The value of the entity being monitored (e.g., temperature, location) keeps changing
- Readings are sent periodically, or on-demand
- Limited network bandwidth
- Scarce battery power
- Most of the time, the database stores stale sensor values (sampling uncertainty)
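4 Data Uncertainty and Query Incorrectness
[Figure: temperature axis from 0 to 30 °F for sensors x and y, showing the recorded temperatures x0, y0 and the current temperatures x1, y1.]
- x0 < y0: x is the minimum according to the recorded values
- y1 < x1: y is the minimum according to the current values
- Wrong query result!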
5 Minimum Query over Bounded Uncertain Values
[Figure: temperature axis from 0 to 30 °F for sensors x and y, showing the recorded temperatures x0, y0 and a bound for the current temperature of each sensor.]
- Both x and y have a chance of yielding the minimum value
- Which one has a better chance?
- Measurement error is another source of data uncertainty
6 Imprecise Answers
- In general, sensor uncertainty does not allow us to get an exact answer
- The answer is imprecise rather than exact
- It is possible to attach confidence to answers, e.g., probability values
- A probabilistic query returns answers with probabilistic guarantees
7 Probabilistic Minimum Queries
[Figure: temperature axis from 0 to 30 °F for sensors x and y, showing the recorded temperatures x0, y0 and a bound for the current temperature of each sensor.]
- Answer: (x, 70%), (y, 30%)
- Different from approximate approaches, which examine part of the exact data and provide statistical guarantees on answers
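As a rough illustration of how such probabilities could be computed (a sketch, not the implementation used in the talk): assume each sensor's current value is independent and uniformly distributed over its bound. Sensor i yields the minimum when its value is v and every other sensor's value is larger than v. The function names, the uniform assumption, and the midpoint-rule integration are all choices made for this example.

```python
def uniform_pdf(lo, hi, v):
    """Density of a uniform distribution on [lo, hi] at v."""
    return 1.0 / (hi - lo) if lo <= v <= hi else 0.0


def uniform_cdf(lo, hi, v):
    """P(value <= v) for a uniform distribution on [lo, hi]."""
    if v <= lo:
        return 0.0
    if v >= hi:
        return 1.0
    return (v - lo) / (hi - lo)


def min_probabilities(intervals, steps=20000):
    """Return, for each (lo, hi) bound, the probability that it holds the minimum.

    Assumes independent, uniformly distributed values (an assumption of this sketch).
    """
    start = min(lo for lo, _ in intervals)
    # The minimum can never exceed the smallest upper bound.
    end = min(hi for _, hi in intervals)
    h = (end - start) / steps
    probs = []
    for i, (lo_i, hi_i) in enumerate(intervals):
        total = 0.0
        for s in range(steps):
            v = start + (s + 0.5) * h  # midpoint of the s-th cell
            density = uniform_pdf(lo_i, hi_i, v)
            if density == 0.0:
                continue
            others_larger = 1.0
            for k, (lo_k, hi_k) in enumerate(intervals):
                if k != i:
                    others_larger *= 1.0 - uniform_cdf(lo_k, hi_k, v)
            total += density * others_larger * h
        probs.append(total)
    return probs


if __name__ == "__main__":
    # Hypothetical bounds for sensors x and y.
    print(min_probabilities([(0.0, 10.0), (5.0, 15.0)]))
```

For the hypothetical bounds x = [0, 10] and y = [5, 15], the result is about 0.875 for x and 0.125 for y; the probabilities over all sensors sum to 1.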
8 Talk Outline
- Modeling Sensor Uncertainty
- Classification of Probabilistic Queries
- Probabilistic Queries on Moving Objects
- Quality of Probabilistic Results
- Probability Threshold Queries
10 Sampling Uncertainty
- The value of the external entity is sampled at discrete times
- Can produce incorrect results (previous example)
- Can be bounded by dead-reckoning update [WSCY99]
- Each sensor keeps track of the difference (d) between its current value and the value last sent
- Send an update to the database when d > deviation threshold
[Figure: d is the gap between the last value sent and the current value.]
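The dead-reckoning rule above can be sketched as follows. This is a minimal illustration; the class name `DeadReckoningSensor` and its method are hypothetical and do not come from [WSCY99].

```python
class DeadReckoningSensor:
    """Sensor-side filter: report a reading only when it drifts past a threshold."""

    def __init__(self, initial_value, threshold):
        self.last_sent = initial_value  # value the database currently stores
        self.threshold = threshold      # deviation threshold

    def observe(self, current_value):
        """Return the value to send to the database, or None if no update is needed."""
        d = abs(current_value - self.last_sent)
        if d > self.threshold:
            self.last_sent = current_value
            return current_value
        return None


if __name__ == "__main__":
    sensor = DeadReckoningSensor(initial_value=20.0, threshold=2.0)
    readings = [20.5, 21.9, 22.1, 23.0, 24.2]
    sent = [v for v in readings if sensor.observe(v) is not None]
    print(sent)
```

Between updates, the database knows that the true value lies within the threshold of the stored value, which is what bounds the sampling uncertainty.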
11 Measurement Uncertainty
- Measurement error [PJ99]
- Due to inherent imprecision in hardware, e.g., GPS
- Less serious than sampling error [TWZC02]
12 Database Model
13 Interval Uncertainty
[Figure: uncertain interval Ui(t) = [li(t), ui(t)] containing the value Ti.a(t).]
- Each database value is represented by an uncertain interval
- Example: Ui(t) is the interval bounding all values within distance (t - tupdate) × r of Ti.a
- tupdate is the time at which Ti.a was last updated
- r is the current rate of change of Ti.a
- Example: dead-reckoning update [WSCY99]
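A minimal sketch of the rate-based example, assuming r is a bound on the magnitude of the rate of change. The function and parameter names are illustrative only.

```python
def uncertain_interval(last_value, rate, t_update, t):
    """Return (l, u): all values within (t - t_update) * |rate| of last_value."""
    if t < t_update:
        raise ValueError("t must not precede the last update time")
    spread = (t - t_update) * abs(rate)
    return (last_value - spread, last_value + spread)


if __name__ == "__main__":
    # Value 20.0 recorded at time 100, changing by at most 0.5 units per time unit.
    print(uncertain_interval(20.0, 0.5, t_update=100.0, t=104.0))
```

The interval is a single point at the moment of the update and widens linearly until the next update arrives.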
14 Probabilistic Uncertainty
[Figure: uncertainty pdf fi(x,t) defined over the uncertainty interval Ui(t) = [li(t), ui(t)] around Ti.a(t).]
- Example: moving-object uncertainty [WSCY99], where fi(x,t) is a normal distribution on a route
- Enables evaluation of probabilistic queries
- Can be extended to n dimensions
- Can handle common uncertainty models (line/circle)
16 Classification of Probabilistic Queries
- Nature of answer
- Value-based: returns a single value, e.g., average query ([l,u], pdf)
- Entity-based: returns a set of objects, e.g., range query ((Ti, pi), pi > 0)
- Aggregation
- Non-aggregate: whether an object satisfies a query is independent of others, e.g., range query
- Aggregate: interplay between objects decides the result, e.g., minimum query
17 Classification of Probabilistic Queries
- In [CKP03], we developed query evaluation algorithms and quality metrics for each class
- ERQ (entity-based range query) is the only query that has been studied previously [WSCY99]
- Evaluating aggregate queries is much harder
19 Location-Based Services
[Figure: moving-object database.]
20 A Generic Uncertainty Model for Moving Objects
- Ui(t): uncertainty region of object Oi at time t
- A closed region such that Oi can be found only inside this region
- fi(x,y,t): uncertainty pdf of object Oi
- pdf of Oi's location (x,y) at time t
[Figure: uncertainty region Ui(t) with pdf fi(x,y,t).]
21 Sensor Uncertainty and Moving-Object Uncertainty
22 Line-Segment Uncertainty
- Assume objects move along straight-line paths
- Ui(t) is a line segment [WSCY99]
- Length of segment is 2 × Smax × (t - tupdate)
- fi(x,y,t) is uniform
23 Free-Moving Uncertainty
- Assume objects are free to move in any direction
- Ui(t) is a circle [WSCY99]
- Center of the circle is the last reported location
- Radius = Smax × (t - tupdate)
- fi(x,y,t) is uniform
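The two models above reduce to simple geometry. The sketch below computes the segment length and the circular region; the `Circle` type and function names are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    cx: float
    cy: float
    radius: float


def line_segment_length(s_max, t_update, t):
    """Length of the line-segment uncertainty region: 2 * Smax * (t - tupdate)."""
    return 2.0 * s_max * (t - t_update)


def free_moving_region(last_x, last_y, s_max, t_update, t):
    """Circular region centred at the last reported location,
    with radius Smax * (t - tupdate)."""
    return Circle(last_x, last_y, s_max * (t - t_update))


if __name__ == "__main__":
    # Object last seen at (3, 4), maximum speed 60 mph, half an hour ago.
    print(free_moving_region(3.0, 4.0, s_max=60.0, t_update=0.0, t=0.5))
    print(line_segment_length(s_max=60.0, t_update=0.0, t=0.5))
```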
24 Probabilistic Nearest Neighbor Query (PNNQ)
- An entity-based aggregate query executed over moving-object uncertainty
- INPUT
- A query point q
- A set of n objects O1, O2, ..., On with uncertainty regions and pdfs known at time t0
- OUTPUT
- A set of (Oi, pi) tuples
- pi is the non-zero probability that Oi is the nearest neighbor of q at time t0
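25 PNNQ Solution
- Grow circle Cq(r) from n1 to f
[Figure: query point q with circle Cq(r) growing from radius n1 to radius f among the uncertainty regions of objects O1 to O6.]
- pri(r): pdf of the event that Oi is on the boundary of Cq(r)
- Pi(r): cdf of pri(r), i.e., the probability that Oi is inside Cq(r)
- Only need to know ni, f, pri(r), Pi(r)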
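One way to combine these quantities, assuming the objects' locations are independent, ni is the shortest distance from q to Ui(t0), and f is the smallest of the objects' farthest distances from q (these readings of ni and f are assumptions, since the slide does not spell them out): Oi is the nearest neighbor when it lies at distance r from q and every other object lies outside Cq(r).

```latex
p_i \;=\; \int_{n_i}^{f} pr_i(r)\,\prod_{\substack{k=1 \\ k \neq i}}^{n} \bigl(1 - P_k(r)\bigr)\,dr
```

The upper limit can stop at f because beyond that radius at least one object is certainly inside Cq(r), so the product is zero.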
26 PNNQ for Free-Moving Uncertainty: ni and f
- Case 1: q is outside Ui(t0)
- Case 2: q is inside Ui(t0)
27 PNNQ for Free-Moving Uncertainty: Pi(r) and pri(r) (for Case 1)
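A hedged sketch of the circular case. It assumes ni is the shortest distance from q to the circle (zero when q is inside, Case 2), that an object's far distance is the distance to its center plus the radius, and that f is the smallest far distance. In place of the closed-form Pi(r) and pri(r) from the slides, it swaps in a plain Monte Carlo estimate with uniform pdfs. All function names are illustrative.

```python
import math
import random


def near_far(q, center, radius):
    """Nearest and farthest distance from q to a circular uncertainty region."""
    d = math.dist(q, center)
    near = max(0.0, d - radius)  # Case 2 (q inside the circle) gives 0
    far = d + radius
    return near, far


def sample_in_circle(center, radius, rng):
    """Draw a point uniformly at random from a disk."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def pnnq_monte_carlo(q, objects, trials=20000, seed=0):
    """Estimate {name: probability of being the nearest neighbor of q}.

    objects maps a name to (center, radius). Objects whose near distance
    exceeds f can never be the nearest neighbor and are pruned first.
    """
    rng = random.Random(seed)
    bounds = {name: near_far(q, c, r) for name, (c, r) in objects.items()}
    f = min(far for _, far in bounds.values())
    candidates = [name for name, (near, _) in bounds.items() if near <= f]
    wins = dict.fromkeys(candidates, 0)
    for _ in range(trials):
        best = min(
            candidates,
            key=lambda n: math.dist(q, sample_in_circle(*objects[n], rng)),
        )
        wins[best] += 1
    return {name: w / trials for name, w in wins.items() if w > 0}


if __name__ == "__main__":
    objs = {
        "O1": ((1.0, 0.0), 1.0),
        "O2": ((0.0, 2.0), 1.5),
        "O3": ((10.0, 10.0), 1.0),  # too far away; pruned by f
    }
    print(pnnq_monte_carlo((0.0, 0.0), objs))
```

Sampling trades accuracy for simplicity; the integration-based evaluation described in the talk avoids the sampling noise.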
28 Simulation Model
- 100 to 500 PNNQs over 100,000 free-moving uncertainty objects
- Moving in a 1000 x 1000 mile² space
- Max velocity ∈ [0, 250] mph
- Each object can change speed and direction of movement
29 Effectiveness of PNNQ
- NNQ evaluates stored location values
- Let the true nearest neighbor be m
[Figure: plot with an axis running from 0 to 100%.]
- When NNQ finds m, PNNQ obtains pm at 80-100%
- When NNQ misses m, PNNQ obtains pm at 30-40%
31 Result Imprecision and Data Uncertainty
- A probabilistic query returns imprecise answers (i.e., augmented with probability values) instead of exact answers
- Imprecision of the query answer is caused by data uncertainty
- Large uncertainty can generate ambiguous answers
32 Result Imprecision and Data Uncertainty
- Which of the four sensors gives the minimum reading?
- Which result gives us a better answer?
33 Factors Affecting Quality of Results
- We have identified four factors that affect the quality of probabilistic query results
- Size of uncertainty intervals
- Probability distribution
- Type of query
- Number of answers to a query
34 Entity-Based Non-Aggregate Queries
Is the reading of sensor i in range [l,u]?
- Regular range query
- "yes" or "no" with 100% certainty
- Probabilistic query (ERQ)
- yes with pi = 95%: OK
- yes with pi = 5%: OK (95% sure it is not in [l,u])
- yes with pi = 50%: not OK (not certain)
35 Entity-based Aggregate Queries (1)
- Result set R = {(Ti, pi)}
- e.g., {(T1, 30%), (T2, 40%), (T3, 30%)}
- Objects cannot be treated independently as in ERQ
- Intuitively, one answer with probability close to 100% is much better than many answers with similar probabilities (4-sensor example)
- Captured by entropy
36 Entity-based Aggregate Queries (2)
- H(X): entropy of r.v. X (taking values X1, ..., Xn with probabilities p(X1), ..., p(Xn)) [S49]
- Entropy is the smallest (i.e., 0) iff ∃ i such that p(Xi) = 1
- Entropy is the largest (i.e., log2(n)) iff all Xi's are equally likely
- Answer has high quality if H(R) is low
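A small sketch of the entropy score, using the standard Shannon entropy in bits, H = Σ p · log2(1/p), over the probabilities in a result set. The function name `result_entropy` and the (name, probability) tuple layout are assumptions for illustration.

```python
import math


def result_entropy(result):
    """Shannon entropy (in bits) of a result set given as (name, probability) pairs.

    Lower entropy means a less ambiguous, higher-quality answer.
    """
    return sum(p * math.log2(1.0 / p) for _, p in result if p > 0)


if __name__ == "__main__":
    certain = [("T1", 1.0)]
    ambiguous = [("T1", 0.25), ("T2", 0.25), ("T3", 0.25), ("T4", 0.25)]
    mixed = [("T1", 0.3), ("T2", 0.4), ("T3", 0.3)]
    for r in (certain, ambiguous, mixed):
        print(result_entropy(r))
```

The certain answer scores 0 bits and the four equally likely answers score log2(4) = 2 bits, matching the two extremes listed above.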
37 Value-based Queries
What is the minimum value among n sensors?
- Result: [l, u], p(x) for x ∈ [l,u]
- e.g., minimum is in [10,20], p(x) ~ U[10,20]
- Differential entropy
- Measures uncertainty associated with r.v. X with pdf p
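For reference, the standard definition of differential entropy (in bits) for a pdf p supported on [l, u], followed by its value for the uniform example above. The worked number is a direct consequence of the definition and is not a result reported in the talk.

```latex
h(X) \;=\; -\int_{l}^{u} p(x)\,\log_2 p(x)\,dx ,
\qquad
p(x) = \tfrac{1}{10} \text{ on } [10,20]
\;\Rightarrow\;
h(X) = \log_2 10 \approx 3.32 \text{ bits}
```

For a uniform pdf the value is log2(u - l), so a narrower interval gives a lower score. Unlike discrete entropy, it can be negative when u - l < 1.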
38 Improving Result Quality
- Given limited resources, how can we best improve the quality of probabilistic query results?
- Server requests updates from specific sensors
- Reduces data uncertainty and improves result quality
- Global update policy
- Pick from all sensors
- Database is fresh globally
- Query-centric update policy
- Choose only sensors that affect result quality
- Database is fresh only with respect to the query
- Update policies evaluated for EMinQ show that query-centric policies perform better than global policies
40 Problems of Probabilistic Queries
- Evaluating a probabilistic query can be expensive
- Need to manage intervals and pdfs
- Probability values are obtained through costly integration operations
- Preliminary work on indexing of uncertain data and quantization of probabilistic answers
- For many queries, it is only interesting to know: does the probability of the answer exceed a given threshold?
41 Probability Threshold Queries (PTQ)
- Return Ti when the probability that Ti.a is inside [a,b] is greater than p
- Example: return the ids of the sensors that have values over 30 °F with a probability of over 0.7
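A minimal sketch of a PTQ evaluated by a linear scan, assuming each sensor value is uniformly distributed over a non-degenerate interval (lo < hi). The sensor ids, intervals, and function names are made up for this example.

```python
import math


def prob_in_range(lo, hi, a, b):
    """P(value in [a, b]) for a value uniform on [lo, hi], with lo < hi."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)


def ptq(sensors, a, b, p):
    """Return ids whose probability of lying in [a, b] is greater than p."""
    return [sid for sid, (lo, hi) in sensors.items()
            if prob_in_range(lo, hi, a, b) > p]


if __name__ == "__main__":
    # Hypothetical uncertain intervals, in degrees Fahrenheit.
    sensors = {"s1": (28.0, 38.0), "s2": (25.0, 35.0), "s3": (31.0, 33.0)}
    # "Values over 30 with probability over 0.7" becomes the range [30, +inf).
    print(ptq(sensors, a=30.0, b=math.inf, p=0.7))
```

Only a comparison against p is needed, so the exact probability never has to be reported; an index can exploit this to skip objects whose probability cannot reach p.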
42 Efficient Execution of PTQ
- In [CXPSV04], we studied two indexing methods for executing PTQ efficiently
- Basic idea: incorporate uncertainty pdf into interval index
- Pruning is based on intervals and pdfs
43 Probability Threshold Indexing (PTI)
- Augment internal nodes of an interval R-tree with probability information
- Pairs of x-bounds are stored in each node
- MBR is not retrieved if a left x-bound or right x-bound exists where p > x
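One possible reading of the x-bound test, sketched for a single node. It assumes that a node's left x-bound is a position such that every interval in the node has at most probability x to its left (and symmetrically for the right x-bound), that the pdfs are uniform, and that pruning applies when the query range lies entirely beyond such a bound. These definitions are assumptions made for this sketch; [CXPSV04] is the authoritative source.

```python
def left_x_bound(intervals, x):
    """Leftmost x-quantile over the node's uniform intervals.

    Every interval has probability <= x to the left of this position.
    """
    return min(lo + x * (hi - lo) for lo, hi in intervals)


def right_x_bound(intervals, x):
    """Rightmost (1 - x)-quantile; every interval has probability <= x
    to the right of this position."""
    return max(hi - x * (hi - lo) for lo, hi in intervals)


def can_prune(intervals, x, a, b, p):
    """True if no interval in the node can satisfy the PTQ ([a, b], p).

    If the query lies entirely beyond an x-bound, each interval's probability
    of falling in [a, b] is at most x, so p > x rules the whole node out.
    """
    if p <= x:
        return False  # this x-bound is too weak to decide anything
    return b < left_x_bound(intervals, x) or a > right_x_bound(intervals, x)


if __name__ == "__main__":
    node = [(10.0, 20.0), (12.0, 30.0)]  # hypothetical intervals under one node
    print(can_prune(node, x=0.2, a=5.0, b=11.0, p=0.5))   # query left of the bound
    print(can_prune(node, x=0.2, a=15.0, b=25.0, p=0.5))  # query overlaps the node
```

The test is conservative: returning False only means the node must be visited, not that it contains a qualifying interval.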
44 Two-Dimensional View of PTQ
45 Related Work (Sensors)
- Vrbsky and Liu [VL94] studied approximate answers for set-valued and single-valued queries over exact data values.
- Olston and Widom [OW02] discussed the concept of error-tolerant applications, which trade off precision and performance of querying replicated data, in order to maintain global database freshness.
- Khanna and Tan [KT01] proposed an online algorithm to identify elements with minimum update cost so that a query can be answered within an error bound.
- Manolopoulos et al. [MTT00] discuss efficient interval trees for executing intersection queries over intervals.
- Sistla et al. [SWCD98] defined "may" and "must" operators to specify a coarse degree of imprecision. The uncertainty operators we define are more quantitative.
46 Related Work (Moving Objects)
- Wolfson et al. [WSCY99] described uncertainty models for moving objects
- When the distance moved is larger than a threshold, send an update to the server
- Uncertainty is either a line or a circle.
- The same authors also discussed probabilistic range queries for moving objects, assuming straight routes and normal distributions; to the best of our knowledge, this is the only study of probabilistic queries.
- Pfoser and Jensen [PJ99] deduced that the uncertainty between two successive location updates is an ellipse.
- Trajcevski et al. [TWZC02] discussed the querying of trajectories of locations.
- Prabhakar et al. [PXKAH02] use maximum velocity information to expand bounding boxes and reduce updates in an R-tree.
- Mokbel et al. [MAHS] discuss research issues related to scalable location-aware services.
47 Uncertainty, Privacy and Quality of Service
- Location privacy: location data cannot be used to identify the user
- Protected by injecting uncertainty into location information
- Can result in poor service quality
- [CP04] proposed a framework relating privacy, uncertainty and service quality
48 Other Queries over Uncertainty
- Probabilistic threshold join queries over uncertain sensor data
- Semantics of a join operation over uncertain data?
- Incorporating uncertainty pdf into traditional join algorithms, e.g., index-loop joins and hash joins
- Probabilistic threshold queries for nearest-neighbor queries over moving objects
- nearest-neighbor queries
- k-nearest-neighbor queries
- reverse-nearest-neighbor queries
49 General Queries over Uncertainty
- General probabilistic query
- Evaluate probabilistic queries composed of relational algebra operators
- How to specify data uncertainty in SQL?
- How to present probabilistic answers?
- Indexing over time-varying uncertain intervals
- An uncertainty interval expands with time until the next value is known
50 Conclusions
- Inherent uncertainty in data items that model continuously-changing entities in the outside world leads to incorrect query answers
- However, if the uncertainty can be bounded and its pdfs are known, answers can be provided with more confidence (e.g., probability)
51 Conclusions
- General uncertainty models for sensor data
- A classification scheme of probabilistic queries
- Application of sensor model to moving objects, and PNNQ evaluation algorithms
- Quality of probabilistic answers
- Query-centric update policies
- Uncertainty indexing for probabilistic queries
- Probabilistic threshold queries
52 Recent Papers Citing Our Work
- Goce Trajcevski, Ouri Wolfson and Sam Chamberlain. Managing Uncertainty in Moving Objects Databases, to appear in September issue of ACM TODS.
- A. Deligiannakis, Y. Kotidis and N. Roussopoulos. Processing Approximate Aggregate Queries in Wireless Sensor Networks, submitted to Information Systems Journal.
- Iosif Lazaridis and Sharad Mehrotra. Approximate Selection Queries over Imprecise Data. In International Conference on Data Engineering (ICDE), March 2004.
- H. Mokhtar and J. Su. Universal Trajectory Queries for Moving Object Databases. In Proceedings of IEEE International Conference on Mobile Data Management, Berkeley, CA, January 19-22.
- Bin Lin, Hoda Mokhtar, Rafael Palaez-Aguilera, Jianwen Su. Querying Moving Objects with Uncertainty. In Proceedings of IEEE Semiannual Vehicular Technology Conference, Orlando, Florida, USA, October 2003.
- Eiman Elnahrawy and Badri Nath. Cleaning and Querying Noisy Sensors. In WSNA 2003, San Diego.
- Goce Trajcevski. Probabilistic Range Queries in Moving Objects Databases with Uncertainty. In MobiDE 2003.
53 References (1)
- [C89] C. Chatfield. The analysis of time series: an introduction. Chapman and Hall, 1989.
- [CP04] R. Cheng and S. Prabhakar. Using Uncertainty to Provide Privacy-Preserving and High-Quality Location-Based Services. Workshop on Location Systems Privacy and Control, Mobile HCI04.
- [CXPSV04] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter. Efficient indexing methods for probabilistic threshold queries over uncertain data. Submitted to Proc. of the 26th Intl. Conf. on VLDB, 2004.
- [CKP04] R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE Transactions on Knowledge and Data Engineering, 2004.
- [CP03] R. Cheng and S. Prabhakar. Managing uncertainty in sensor databases. In SIGMOD Record issue on Sensor Technology, December 2003.
- [CKP03] R. Cheng, D. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, 2003.
- [CPK03] R. Cheng, S. Prabhakar and D. V. Kalashnikov. Querying imprecise data in moving object environments. In Proc. of the 19th IEEE ICDE, India, 2003.
- [KT01] S. Khanna and W. C. Tan. On computing functions with uncertainty. In 20th ACM Symposium on Principles of Database Systems, 2001.
- [M94] J. Matousek. Geometric range searching. ACM Computing Surveys, 26(4):421-461, 1994.
- [MTT00] Y. Manolopoulos, Y. Theodoridis, and V. J. Tsotras. Chapter 4: Access methods for intervals. In Advanced Database Indexing. Kluwer, 2000.
54 References (2)
- [MAHS] M. Mokbel, W. Aref, S. E. Hambrusch, S. Prabhakar. Towards Scalable Location-Aware Services: Requirements and Research Issues. Proc. of the ACM Symposium on Geographic Information Systems (ACM GIS), 110-117.
- [PJ99] D. Pfoser and C. S. Jensen. Capturing the Uncertainty of Moving-Object Representations. In Proc. of the Sixth International Symposium on Spatial Databases, Hong Kong, July 20-23, 1999, pp. 111-132.
- [OW02] C. Olston and J. Widom. Best-effort cache synchronization with source cooperation. In Proc. of the ACM SIGMOD 2002.
- [PXKAH02] S. Prabhakar, Y. Xia, D. Kalashnikov, W. Aref, and S. Hambrusch. Query indexing and velocity constrained indexing: Scalable techniques for continuous queries on moving objects. IEEE Transactions on Computers, 51(10):1124-1140, October 2002.
- [S49] C. Shannon. The Mathematical Theory of Communication. University of Illinois Press, 1949.
- [SWCD98] P. A. Sistla, O. Wolfson, S. Chamberlain, and S. Dao. Querying the uncertain position of moving objects. In Temporal Databases: Research and Practice. 1998.
- [TWZC02] G. Trajcevski, O. Wolfson, F. Zhang and S. Chamberlain. The Geometry of Uncertainty in Moving Objects Databases. In Proc. Intl. Conf. on Extending Database Technology (EDBT02), Prague, Czech Republic, March 2002. Springer LNCS 2287, pp. 233-250.
- [VL94] S. Vrbsky and J. W. S. Liu. Producing approximate answers to set- and single-valued queries. The Journal of Systems and Software, 27(3), 1994.
- [WSCY99] O. Wolfson, P. Sistla, S. Chamberlain, and Y. Yesha. Updating and querying databases that track mobile units. Distributed and Parallel Databases, 7(3), 1999.
55 Related Publications
- R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter. Efficient indexing methods for probabilistic threshold queries over uncertain data. In Proc. of the 26th Intl. Conf. on VLDB, 2004.
- R. Cheng and S. Prabhakar. Managing uncertainty in sensor databases. In SIGMOD Record issue on Sensor Technology, December 2003.
- R. Cheng, D. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, 2003.
- R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE Transactions on Knowledge and Data Engineering (to appear), 2004.
- R. Cheng, S. Prabhakar, and D. V. Kalashnikov. Querying imprecise data in moving object environments. In Proc. of the 19th IEEE Intl. Conf. on Data Engineering, India, 2003.
- R. Cheng and S. Prabhakar. Sensors, uncertainty models and probabilistic queries. In Encyclopedia of Database Technologies and Applications, 2005.
- R. Cheng and S. Prabhakar. Using Uncertainty to Provide Privacy-Preserving and High-Quality Location-Based Services. Workshop on Location Systems Privacy and Control, Mobile HCI04.
56 Other Publications
- R. Cheng. Updates and view maintenance in soft real-time database systems. In the 2nd ACM Hong Kong Postgraduate Research Day (2nd ACM HK PG Day, Best Paper Award), University of Hong Kong, 1999.
- B. Kao and R. Cheng. Disk I/O Scheduling. In K. Y. Lam and T. W. Kuo, editors, Real-Time Database Systems: Architecture and Issues. Kluwer Academic Publishers, Boston, 2000.
- B. Kao, K. Y. Lam, B. Adelberg, R. Cheng, and T. Lee. Maintaining temporal consistency of discrete objects in soft real-time database systems. IEEE Transactions on Computers, 52(3), March 2003.
- B. Kao, K. Y. Lam, B. Adelberg, R. Cheng, and Tony Lee. Updates and view maintenance in soft real-time database systems. In the Eighth International Conference on Information and Knowledge Management, 1999.
- K. Y. Lam, T. W. Kuo, B. Kao, Tony S. H. Lee, and R. Cheng. Evaluation of concurrency control strategies for mixed soft real-time database systems. Information Systems, 2001.
- C. L. Yip, K. K. Loo, B. Kao, D. Cheung, and C. K. Cheng. LGen - a lattice-based candidate set generation algorithm for I/O efficient association rule mining. In the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 99), 1999.
57 Submitted Papers
- R. Cheng, Y. Xia, S. Prabhakar and R. Shah. Change Tolerant Indexing for Constantly Evolving Data. IEEE Intl. Conf. on Data Engineering.
- K. Y. Lam, R. Cheng, B. Liang and J. Chau. Sensor Node Selection for Execution of Continuous Probabilistic Queries in Wireless Sensor Networks. 2nd ACM Intl. Workshop on Video Surveillance and Sensor Networks.
- K. Y. Lam, R. Cheng, S. Prabhakar, and B. Liang. Efficient maintenance of temporal consistency of continuous queries on moving objects. IEEE Transactions on Mobile Computing.