Managing Uncertainty in Moving-Object and Sensor Databases

1
Managing Uncertainty in Moving-Object and Sensor
Databases
  • Reynold Cheng
  • Department of Computer Sciences
  • Purdue University

2
Sensor-based Applications
[Figure: sensors in the external environment (e.g., temperature, moving objects, hazardous materials) connected through a network channel to a database system; a user sends queries and receives results]
3
Data Uncertainty
  • The value of the entity being monitored (e.g.,
    temperature, location) keeps changing
  • Readings are sent periodically, or on-demand
  • Limited network bandwidth
  • Scarce battery power
  • At most points in time, the database stores
    stale sensor values (sampling uncertainty)

4
Data Uncertainty and Query Incorrectness
  • Recorded values: x0 < y0, so x is the minimum
  • Current values: y1 < x1, so y is the minimum
  • Wrong query result!

[Figure: recorded temperatures (x0, y0) and current temperatures (x1, y1) of sensors x and y, in °F]
5
Minimum Query over Bounded Uncertain Values
  • Both x and y have a chance of yielding the
    minimum value
  • Which one has a better chance?
  • Measurement error is another source of data
    uncertainty

[Figure: recorded temperatures x0 and y0 with bounds for the current temperature of sensors x and y, in °F]
6
Imprecise Answers
  • In general, sensor uncertainty does not allow us
    to get an exact answer.
  • The answer is imprecise rather than exact.
  • It is possible to attach confidence to answers,
    e.g., probability values
  • A probabilistic query returns answers with
    probabilistic guarantees

7
Probabilistic Minimum Queries
  • Answer: (x, 70%), (y, 30%)
  • Different from approximate approaches, where part
    of the exact data is examined to provide
    statistical guarantees on answers

[Figure: recorded temperatures x0 and y0 with bounds for the current temperature of sensors x and y, in °F]
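A probabilistic minimum answer of this kind can be sketched numerically. The block below is a minimal illustration, assuming each sensor's current value is uniformly distributed over its bound (the slide does not state the pdf) and using made-up interval endpoints; `prob_first_is_min` is a hypothetical helper, not something from the talk.

```python
def uniform_cdf(v, lo, hi):
    """CDF of a uniform distribution on [lo, hi]."""
    if v <= lo:
        return 0.0
    if v >= hi:
        return 1.0
    return (v - lo) / (hi - lo)


def prob_first_is_min(x_bound, y_bound, steps=10_000):
    """P(X < Y) for independent X ~ U(x_bound) and Y ~ U(y_bound).

    Midpoint-rule integration of f_X(v) * (1 - F_Y(v)) over x_bound.
    """
    x_lo, x_hi = x_bound
    width = (x_hi - x_lo) / steps
    density = 1.0 / (x_hi - x_lo)
    total = 0.0
    for k in range(steps):
        v = x_lo + (k + 0.5) * width
        total += density * (1.0 - uniform_cdf(v, *y_bound)) * width
    return total


# Hypothetical bounds, not the values from the slide's figure.
p_x = prob_first_is_min((0.0, 20.0), (10.0, 30.0))
p_y = 1.0 - p_x
print(f"(x, {p_x:.1%}), (y, {p_y:.1%})")
```

With only two sensors the two probabilities sum to one; with more sensors each probability would need its own integral over the product of the other sensors' survival functions.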
8
Talk Outline
  • Modeling Sensor Uncertainty
  • Classification of Probabilistic Queries
  • Probabilistic Queries on Moving Objects
  • Quality of Probabilistic Results
  • Probability Threshold Queries

10
Sampling Uncertainty
  • The value of the external entity is sampled at
    discrete times
  • Can produce incorrect results (previous example)
  • Can be bounded by dead-reckoning update [WSCY99]
  • Each sensor keeps track of the difference (d)
    between its current value and the value last sent
  • Send update to database when d > deviation
    threshold

[Figure: difference d between the last value sent and the current value]
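The dead-reckoning rule above can be sketched in a few lines. This is an illustrative sketch only: the class name `DeadReckoningSensor`, the threshold, and the readings are assumptions made for the example.

```python
class DeadReckoningSensor:
    """Sends a reading only when it drifts too far from the last one sent."""

    def __init__(self, initial_value, threshold):
        self.last_sent = initial_value
        self.threshold = threshold

    def observe(self, current_value):
        """Return the value to transmit, or None if no update is needed."""
        d = abs(current_value - self.last_sent)
        if d > self.threshold:
            self.last_sent = current_value
            return current_value
        return None


sensor = DeadReckoningSensor(initial_value=20.0, threshold=2.0)
readings = [20.5, 21.9, 22.1, 22.3, 25.0]
updates = [sensor.observe(v) for v in readings]
print(updates)
```

Between updates, the database knows the true value is within the threshold of the last value sent, which is what bounds the sampling uncertainty.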
11
Measurement Uncertainty
  • Measurement error [PJ99]
  • Due to inherent imprecision in hardware, e.g., GPS
  • Less serious than sampling error [TWZC02]
12
Database Model
13
Interval Uncertainty
[Figure: value Ti.a(t) inside the uncertain interval Ui(t) = [li(t), ui(t)]]
  • Each database value is represented by an
    uncertain interval
  • Example: Ui(t) is the interval bounding all values
    within distance (t - tupdate) × r of Ti.a
  • tupdate is the time at which Ti.a was last updated
  • r is the current rate of change of Ti.a
  • Example: dead-reckoning update [WSCY99]
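As a small worked example of the interval model, the sketch below computes Ui(t) from the last stored value, the update time, and the rate of change. The function name and the numbers are illustrative assumptions.

```python
def uncertain_interval(last_value, t_update, rate, t):
    """Return (l, u) bounding the value at time t >= t_update."""
    if t < t_update:
        raise ValueError("t must not precede the last update")
    spread = (t - t_update) * rate
    return (last_value - spread, last_value + spread)


# A value of 20.0 recorded at t = 100, changing by at most 0.5 per time unit.
l, u = uncertain_interval(last_value=20.0, t_update=100.0, rate=0.5, t=104.0)
print(l, u)
```

The interval has zero width at the moment of the update and widens linearly until the next update arrives.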

14
Probabilistic Uncertainty
[Figure: uncertainty pdf fi(x,t) over the uncertainty interval Ui(t) = [li(t), ui(t)] containing Ti.a(t)]
  • Example: moving-object uncertainty [WSCY99],
    where fi(x,t) is a normal distribution on a route
  • Enables evaluation of probabilistic queries
  • Can be extended to n dimensions
  • Can handle common uncertainty models (line/circle)

15
Talk Outline
  • Modeling Sensor Uncertainty
  • Classification of Probabilistic Queries
  • Probabilistic Queries on Moving Objects
  • Quality of Probabilistic Results
  • Probability Threshold Queries

16
Classification of Probabilistic Queries
  • Nature of answer
  • Value-based: returns a single value, e.g.,
    average query ([l, u], pdf)
  • Entity-based: returns a set of objects, e.g.,
    range query ({(Ti, pi) : pi > 0})
  • Aggregation
  • Non-aggregate: whether an object satisfies a
    query is independent of the others, e.g., range query
  • Aggregate: interplay between objects decides the
    result, e.g., minimum query
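To make the entity-based, non-aggregate case concrete, here is a minimal sketch of a probabilistic range query. It assumes uniform pdfs over each uncertain interval and uses hypothetical sensor ids and values; each sensor is evaluated independently of the others, which is what makes the query non-aggregate.

```python
def range_probability(interval, query):
    """P(value in query) for a value uniform on `interval`."""
    lo, hi = interval
    a, b = query
    overlap = min(hi, b) - max(lo, a)
    if overlap <= 0:
        return 0.0
    return overlap / (hi - lo)


def probabilistic_range_query(sensors, query):
    """Return [(sensor_id, p)] for every sensor with p > 0."""
    result = []
    for sensor_id, interval in sensors.items():
        p = range_probability(interval, query)
        if p > 0:
            result.append((sensor_id, p))
    return result


sensors = {"T1": (10.0, 20.0), "T2": (18.0, 22.0), "T3": (30.0, 40.0)}
answer = probabilistic_range_query(sensors, query=(15.0, 25.0))
print(answer)
```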

17
Classification of Probabilistic Queries
  • In [CKP03], we developed query evaluation
    algorithms and quality metrics for each class
  • ERQ (entity-based range query) is the only query
    that has been studied previously [WSCY99]
  • Evaluating aggregate queries is much harder

18
Talk Outline
  • Modeling Sensor Uncertainty
  • Classification of Probabilistic Queries
  • Probabilistic Queries on Moving Objects
  • Quality of Probabilistic Results
  • Probability Threshold Queries

19
Location-Based Services
Moving Object Database
20
A Generic Uncertainty Model for Moving Objects
  • Ui(t): uncertainty region of object Oi at time t
  • A closed region such that Oi can be found only
    inside it
  • fi(x,y,t): uncertainty pdf of object Oi
  • pdf of Oi's location (x,y) at time t

[Figure: uncertainty region Ui(t) with pdf fi(x,y,t)]
21
Sensor Uncertainty and Moving-Object Uncertainty
22
Line-Segment Uncertainty
  • Assume objects move along straight line paths
  • Ui(t) is a line segment [WSCY99]
  • Length of segment is
    2 × Smax × (t - tupdate)
  • fi(x,y,t) is uniform

23
Free-Moving Uncertainty
  • Assume objects are free to move in any direction
  • Ui(t) is a circle [WSCY99]
  • Center of circle is the last reported location.
  • Radius: Smax × (t - tupdate)
  • fi(x,y,t) is uniform
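The two models can be written down directly. The sketch below reads Smax as the object's maximum speed (an assumption; the slides do not define it), and the function names and numbers are illustrative.

```python
import math


def line_segment_length(s_max, t_update, t):
    """Length of the line-segment uncertainty region at time t."""
    return 2.0 * s_max * (t - t_update)


def free_moving_region(last_location, s_max, t_update, t):
    """Circle (center, radius) bounding a free-moving object at time t."""
    return last_location, s_max * (t - t_update)


def uniform_pdf(point, center, radius):
    """Uniform density inside the circle, zero outside."""
    if math.dist(point, center) > radius:
        return 0.0
    return 1.0 / (math.pi * radius ** 2)


# Object last seen at the origin, at most 60 mph, half an hour ago.
center, radius = free_moving_region((0.0, 0.0), s_max=60.0, t_update=0.0, t=0.5)
print(radius, uniform_pdf((10.0, 0.0), center, radius))
```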

24
Probabilistic Nearest Neighbor Query (PNNQ)
  • An entity-based aggregate query executed over
    moving-object uncertainty
  • INPUT
  • A query point q
  • A set of n objects O1, O2, ..., On with uncertainty
    regions and pdfs known at time t0
  • OUTPUT
  • A set of (Oi, pi) tuples
  • pi is the non-zero probability that Oi is the
    nearest neighbor of q at time t0
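The input/output contract of PNNQ can be illustrated with a Monte Carlo estimate. Note that this is a swapped-in technique for illustration, not the evaluation algorithm presented in the talk; it assumes free-moving (circular) uncertainty with uniform pdfs, and the object ids, coordinates, and `pnnq_monte_carlo` name are hypothetical.

```python
import math
import random


def sample_in_circle(center, radius, rng):
    """Draw a point uniformly from a disc."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def pnnq_monte_carlo(q, objects, trials=20_000, seed=0):
    """Estimate {object_id: p} with p > 0 by repeated sampling."""
    rng = random.Random(seed)
    wins = {oid: 0 for oid in objects}
    for _ in range(trials):
        # Sample one possible location per object and find the nearest.
        nearest = min(
            objects,
            key=lambda oid: math.dist(q, sample_in_circle(*objects[oid], rng)),
        )
        wins[nearest] += 1
    return {oid: n / trials for oid, n in wins.items() if n > 0}


# Each object is (center, radius) of its uncertainty region at time t0.
objects = {
    "O1": ((3.0, 0.0), 2.0),
    "O2": ((0.0, 4.0), 2.0),
    "O3": ((20.0, 20.0), 1.0),
}
answer = pnnq_monte_carlo(q=(0.0, 0.0), objects=objects)
print(answer)
```

O3 is too far away to ever be the nearest neighbor, so it does not appear in the answer, matching the requirement that only non-zero probabilities are returned.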

25
PNNQ Solution
  • Grow circle Cq(r) from n1 to f
  • pri(r): pdf for Oi being at the boundary of Cq(r)
  • Pi(r): cdf of pri(r), i.e., the probability that Oi
    is in Cq(r)
  • Only need to know ni, f, pri(r), Pi(r)

[Figure: circle Cq(r) centered at query point q, growing from n1 to f among objects O1 to O6]
26
PNNQ for Free-Moving Uncertainty: ni and f
  • Case 1: q is outside Ui(t0)
  • Case 2: q is inside Ui(t0)
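A hedged sketch of the two cases follows. It reads ni as the smallest possible distance from q to Oi and f as the smallest of the largest possible distances over all objects; these readings are assumptions, since the slide's figures are not part of the transcript. The helper names and the example objects are hypothetical.

```python
import math


def near_far(q, center, radius):
    """Nearest and farthest possible distance from q to a circular region."""
    d = math.dist(q, center)
    near = max(0.0, d - radius)  # Case 2 (q inside the region) gives 0
    far = d + radius
    return near, far


def prune(q, objects):
    """Return (f, candidates): objects whose near distance is below f."""
    bounds = {oid: near_far(q, c, r) for oid, (c, r) in objects.items()}
    f = min(far for _, far in bounds.values())
    candidates = {oid: b for oid, b in bounds.items() if b[0] < f}
    return f, candidates


objects = {
    "O1": ((3.0, 0.0), 2.0),    # Case 1: q outside the region
    "O2": ((0.0, 4.0), 2.0),    # Case 1
    "O3": ((20.0, 20.0), 1.0),  # Case 1, far away
    "O4": ((1.0, 0.0), 3.0),    # Case 2: q inside the region
}
f, candidates = prune((0.0, 0.0), objects)
print(f, sorted(candidates))
```

Any object whose nearest possible distance is at least f can never be the nearest neighbor, so it can be dropped before any integration is done.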
27
PNNQ for Free-Moving Uncertainty: Pi(r) and pri(r) (for Case 1)
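The formulas on this slide did not survive in the transcript. As a stand-in, the sketch below computes Pi(r) for Case 1 under a uniform pdf as the fraction of Ui(t0) covered by Cq(r), using the standard circle-circle intersection area, and obtains pri(r) by numerical differentiation. This is an assumption-based reconstruction for illustration, not the slide's own derivation; d is the distance from q to the center of Ui(t0) and R is its radius.

```python
import math


def circle_overlap_area(d, r, R):
    """Area shared by two circles of radii r and R with centers d apart."""
    if d >= r + R:
        return 0.0
    if d <= abs(R - r):
        return math.pi * min(r, R) ** 2
    a1 = r * r * math.acos((d * d + r * r - R * R) / (2 * d * r))
    a2 = R * R * math.acos((d * d + R * R - r * r) / (2 * d * R))
    a3 = 0.5 * math.sqrt((-d + r + R) * (d + r - R) * (d - r + R) * (d + r + R))
    return a1 + a2 - a3


def P_i(r, d, R):
    """Probability that Oi lies inside Cq(r), for a uniform pdf on Ui(t0)."""
    return circle_overlap_area(d, r, R) / (math.pi * R * R)


def pr_i(r, d, R, h=1e-6):
    """Central-difference derivative of P_i with respect to r."""
    return (P_i(r + h, d, R) - P_i(r - h, d, R)) / (2 * h)


# Case 1 example: q is 5 units from the center of a region of radius 2.
d, R = 5.0, 2.0
print(P_i(3.0, d, R), P_i(5.0, d, R), P_i(7.0, d, R))
```

Pi(r) rises from 0 at r = d - R to 1 at r = d + R, which are the near and far distances of the object in Case 1.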
28
Simulation Model
  • 100 - 500 PNNQs over 100,000 free-moving
    uncertainty objects
  • Moving in a 1000 × 1000 mile² space
  • Max velocity ∈ [0, 250] mph
  • Each object can change speed and direction of
    movement

29
Effectiveness of PNNQ
  • NNQ evaluates stored location values
  • Let the true nearest neighbor be m

  • When NNQ finds m, PNNQ obtains pm at 80-100%
  • When NNQ misses m, PNNQ obtains pm at 30-40%

30
Talk Outline
  • Modeling Sensor Uncertainty
  • Classification of Probabilistic Queries
  • Probabilistic Queries on Moving Objects
  • Quality of Probabilistic Results
  • Probability Threshold Queries

31
Result Imprecision and Data Uncertainty
  • A probabilistic query returns imprecise answers
    (i.e., augmented with probability values),
    instead of exact answers
  • Imprecision of query answer is caused by data
    uncertainty
  • Large uncertainty can generate ambiguous answers

32
Result Imprecision and Data Uncertainty
  • Which of the four sensors gives the minimum
    reading?
  • Which result gives us a better answer?

33
Factors Affecting Quality of Results
  • We have identified four factors that affect
    the quality of probabilistic query results
  • Size of uncertainty intervals
  • Probability distribution
  • Type of query
  • Number of answers to a query

34
Entity-Based Non-Aggregate Queries
Is the reading of sensor i in the range [l, u]?
  • Regular range query
  • "yes" or "no" with 100% certainty
  • Probabilistic query ERQ
  • "yes" with pi = 95%: OK
  • "yes" with pi = 5%: OK (95% sure it is not in [l, u])
  • "yes" with pi = 50%: not OK (not certain)
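One simple way to turn this intuition into a number is to measure how far each probability is from 50%. The scoring formula below is an illustrative assumption, not something stated on the slide: it gives 0 for pi = 50% and approaches 1 as pi approaches 0% or 100%.

```python
def erq_answer_score(p):
    """Score in [0, 1]: 0 when p = 0.5, 1 when p is 0 or 1."""
    return abs(p - 0.5) / 0.5


def erq_result_score(probabilities):
    """Average score over all answers of a range query."""
    scores = [erq_answer_score(p) for p in probabilities]
    return sum(scores) / len(scores)


for p in (0.95, 0.05, 0.50):
    print(p, erq_answer_score(p))
```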
35
Entity-based Aggregate Queries (1)
  • Result set R = {(Ti, pi)}
  • e.g., {(T1, 30%), (T2, 40%), (T3, 30%)}
  • Objects cannot be treated independently as in ERQ
  • Intuitively, an answer with probability close to
    100% is much better than many answers with
    similar probabilities (4-sensor example)
  • Captured by entropy

36
Entity-based Aggregate Queries (2)
  • H(X): entropy of r.v. X (values X1, ..., Xn with
    probabilities p(X1), ..., p(Xn)) [S49]
  • Entropy is the smallest (i.e., 0) iff ∃ i such
    that p(Xi) = 1
  • Entropy is the largest (i.e., log2(n)) iff all
    Xi's are equally likely
  • Answer has high quality if H(R) is low
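The entropy-based quality measure can be checked with a short sketch using the standard Shannon entropy, H = -Σ p log2 p. The function name and the example result sets are illustrative assumptions.

```python
import math


def result_entropy(probabilities):
    """Shannon entropy (in bits) of the probabilities in a result set."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


certain = result_entropy([1.0])
ambiguous = result_entropy([0.25, 0.25, 0.25, 0.25])
example = result_entropy([0.3, 0.4, 0.3])
print(certain, ambiguous, example)
```

A single answer with probability 1 has entropy 0, while four equally likely answers reach the maximum of log2(4) = 2 bits, so the lower-entropy result is the higher-quality one.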

37
Value-based Queries
What is the minimum value among n sensors?
  • Result: [l, u], {p(x) | x ∈ [l, u]}
  • e.g., minimum is in [10, 20], p(x) ~ U[10, 20]
  • Differential entropy
  • Measures uncertainty associated with r.v. X with
    pdf p
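For a value-based result, the differential entropy h(X) = -∫ p(x) log2 p(x) dx can be approximated numerically. The sketch below uses midpoint integration and the uniform example from the slide; the function names are illustrative assumptions.

```python
import math


def differential_entropy(pdf, lo, hi, steps=10_000):
    """Midpoint-rule estimate of -integral of p(x) * log2(p(x)) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = pdf(lo + (k + 0.5) * width)
        if p > 0:
            total -= p * math.log2(p) * width
    return total


def uniform_pdf_on(lo, hi):
    """Return the pdf of U[lo, hi] as a function of x."""
    return lambda x: 1.0 / (hi - lo) if lo <= x <= hi else 0.0


h_wide = differential_entropy(uniform_pdf_on(10.0, 20.0), 10.0, 20.0)
h_narrow = differential_entropy(uniform_pdf_on(14.0, 16.0), 14.0, 16.0)
print(h_wide, h_narrow)
```

For a uniform pdf the result is log2(u - l), so shrinking the interval from width 10 to width 2 lowers the differential entropy and indicates a more precise answer.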
38
Improving Result Quality
  • Given limited resources, how can we best improve
    the quality of probabilistic query results?
  • server requests updates from specific sensors
  • reduce data uncertainty, improve result quality
  • Global update policy
  • Pick from all sensors
  • Database is fresh globally
  • Query-centric update policy
  • Choose only sensors that affect result quality
  • Database is only fresh with respect to the query
  • Update policies for EMinQ show that query-centric
    policies perform better than global policies

39
Talk Outline
  • Modeling Sensor Uncertainty
  • Classification of Probabilistic Queries
  • Probabilistic Queries on Moving Objects
  • Quality of Probabilistic Results
  • Probability Threshold Queries

40
Problems of Probabilistic Queries
  • Evaluating a probabilistic query can be expensive
  • Need to manage intervals and pdfs
  • Probability values are obtained through costly
    integration operations
  • Preliminary work on indexing of uncertain data
    and quantization of probabilistic answers
  • For many queries, it is only interesting to know:
    does the probability of the answer exceed a given
    threshold?

41
Probability Threshold Queries (PTQ)
  • Return Ti when the probability that Ti.a is inside
    [a, b] is greater than p
  • Example: return the ids of the sensors that have
    values over 30°F with a probability of over 0.7
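The example query can be sketched as a brute-force scan, with no index involved. The block assumes uniform pdfs over each sensor's uncertain interval and uses hypothetical sensor ids and bounds.

```python
def probability_threshold_query(sensors, a, b, p):
    """Return ids whose P(value in [a, b]) exceeds p (uniform pdfs assumed)."""
    ids = []
    for sensor_id, (lo, hi) in sensors.items():
        overlap = min(hi, b) - max(lo, a)
        prob = max(0.0, overlap) / (hi - lo)
        if prob > p:
            ids.append(sensor_id)
    return ids


# Uncertain intervals in °F; "over 30°F" is the range [30, +inf).
sensors = {"s1": (28.0, 38.0), "s2": (25.0, 35.0), "s3": (31.0, 33.0)}
hot = probability_threshold_query(sensors, a=30.0, b=float("inf"), p=0.7)
print(hot)
```

Only the ids are returned; the exact probabilities are never reported, which is what leaves room for pruning without computing them.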

42
Efficient Execution of PTQ
  • In [CXPSV04], we studied two indexing methods for
    executing PTQ efficiently.
  • Basic idea: incorporate uncertainty pdf into
    interval index
  • Pruning is based on intervals and pdfs

43
Probability Threshold Indexing (PTI)
  • Augment internal nodes of an interval R-tree
    with probability information
  • Pairs of x-bounds are stored in each node
  • MBR not retrieved if a left x-bound or right
    x-bound exists where p > x
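The pruning rule is stated very tersely on the slide, so the following is only a sketch under explicit assumptions: a left x-bound is taken to be a position with at most probability x to its left for every interval in the node, a right x-bound is the mirror image, pdfs are uniform, and a node can be skipped when the query lies entirely beyond one of the bounds and p > x. All function names are hypothetical.

```python
def x_bounds(interval, x):
    """Left and right x-bounds of one interval with a uniform pdf."""
    lo, hi = interval
    return lo + x * (hi - lo), hi - x * (hi - lo)


def node_x_bounds(intervals, x):
    """x-bounds of a node: valid for every interval stored under it."""
    lefts, rights = zip(*(x_bounds(iv, x) for iv in intervals))
    return min(lefts), max(rights)


def can_prune(bounds, x, query, p):
    """True if no interval under the node can satisfy the PTQ (query, p)."""
    left, right = bounds
    a, b = query
    outside = b < left or a > right
    return outside and p > x


node = [(10.0, 20.0), (12.0, 22.0)]
bounds = node_x_bounds(node, x=0.3)
print(bounds)
print(can_prune(bounds, 0.3, query=(5.0, 12.5), p=0.5))
```

If the query ends before the left x-bound, every interval in the node has less than x of its probability inside the query, so a threshold p > x cannot be met and the node need not be visited.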

44
Two-Dimensional View of PTQ
45
Related Work (Sensors)
  • Vrbsky and Liu [VL94] studied approximate answers
    for set-valued and single-valued queries over
    exact data values.
  • Olston and Widom [OW02] discussed the concept of
    error-tolerant applications, which trade off
    precision against performance when querying
    replicated data, in order to maintain global
    database freshness.
  • Khanna and Tan [KT01] proposed an online algorithm
    to identify elements with minimum update cost so
    that a query can be answered within an error
    bound.
  • Manolopoulos et al. [MTT00] discuss efficient
    interval trees for executing intersection queries
    over intervals.
  • Sistla et al. [SWCD98] defined "may" and "must"
    operators to specify a coarse degree of
    imprecision. The uncertainty operators defined by
    us are more quantitative.

46
Related Work (Moving Objects)
  • Wolfson and Sistla [WSCY99] described uncertainty
    models for moving objects
  • When the distance moved is larger than a
    threshold, send an update to the server
  • Uncertainty is either a line or a circle.
  • The same authors also discussed probabilistic
    range queries for moving objects, assuming
    straight routes and normal distributions; to the
    best of our knowledge, this is the only study of
    probabilistic queries.
  • Pfoser and Jensen [PJ99] deduced that uncertainty
    between two successive location updates is an
    ellipse.
  • Trajcevski et al. [TWZC02] discussed the querying
    of trajectories of locations.
  • Prabhakar et al. [PXKAH02] use maximum velocity
    information to expand bounding boxes and reduce
    updates in an R-tree.
  • Mokbel et al. [MAHS] discuss research issues
    related to scalable location-aware services.

47
Uncertainty, Privacy and Quality of Service
  • Location privacy: location data cannot be used
    to identify the user
  • Protected by injecting uncertainty into location
    information
  • Can result in poor service quality
  • [CP04] proposed a framework relating privacy,
    uncertainty and service quality

48
Other Queries over Uncertainty
  • Probabilistic threshold join queries over
    uncertain sensor data
  • Semantics of a join operation over uncertain
    data?
  • Incorporating uncertainty pdf into traditional
    join algorithms e.g., index-loop joins and hash
    joins
  • Probabilistic threshold queries for
    nearest-neighbor queries over moving objects
  • nearest-neighbor queries
  • k-nearest-neighbor queries
  • reverse-nearest-neighbor queries

49
General Queries over Uncertainty
  • General probabilistic query
  • Evaluate probabilistic queries composed of
    relational algebra operators
  • How to specify data uncertainty in SQL?
  • How to present probabilistic answers?
  • Indexing over time-varying uncertain intervals
  • an uncertainty interval expands with time until
    the next value is known

50
Conclusions
  • Inherent uncertainty in data items that model
    continuously-changing entities in the outside
    world leads to incorrect query answers
  • However, if uncertainty can be bounded with pdfs
    known, answers can be provided with more
    confidence (e.g., probability)

51
Conclusions
  • General uncertainty models for sensor data
  • A classification scheme of probabilistic queries
  • Application of the sensor model to moving objects,
    and PNNQ evaluation algorithms
  • Quality of probabilistic answers
  • Query-centric update policies
  • Uncertainty indexing for probabilistic queries
  • Probabilistic threshold queries

52
Recent Papers Citing Our Work
  • Goce Trajcevski, Ouri Wolfson and Sam
    Chamberlain. Managing Uncertainty in Moving
    Objects Databases, to appear in September issue
    of ACM TODS.
  • A. Deligiannakis, Y. Kotidis and N. Roussopoulos.
    Processing Approximate Aggregate Queries in
    Wireless Sensor Networks, Submitted to
    Information Systems Journal.
  • Iosif Lazaridis and Sharad Mehrotra. Approximate
    Selection Queries over Imprecise Data. In
    International Conference on Data Engineering
    (ICDE), March, 2004.
  • H. Mokhtar and J. Su. Universal Trajectory
    Queries for Moving Object Databases. In
    Proceedings of IEEE International Conference on
    Mobile Data Management, Berkeley, CA, January
    19-22.
  • Bin Lin, Hoda Mokhtar, Rafael Palaez-Aguilera,
    Jianwen Su. Querying Moving Objects with
    Uncertainty. In Proceedings of IEEE Semiannual
    Vehicular Technology Conference, Orlando,
    Florida, USA, October 2003.
  • Eiman Elnahrawy and Badri Nath. Cleaning and
    Querying Noisy Sensors. In WSNA 2003, San Diego.
  • Goce Trajcevski. Probabilistic Range Queries in
    Moving Objects Databases with Uncertainty. In
    MobiDE 2003.

53
References (1)
  • [C89] C. Chatfield. The Analysis of Time Series:
    An Introduction. Chapman and Hall, 1989.
  • [CP04] R. Cheng and S. Prabhakar. Using
    Uncertainty to Provide Privacy-Preserving and
    High-Quality Location-Based Services. Workshop
    on Location Systems Privacy and Control, Mobile
    HCI04.
  • [CXPSV04] R. Cheng, Y. Xia, S. Prabhakar, R.
    Shah, and J. S. Vitter. Efficient indexing
    methods for probabilistic threshold queries over
    uncertain data. Submitted to Proc. of the 26th
    Intl. Conf. on VLDB, 2004.
  • [CKP04] R. Cheng, D. V. Kalashnikov, and S.
    Prabhakar. Querying imprecise data in moving
    object environments. IEEE Transactions on
    Knowledge and Data Engineering, 2004.
  • [CP03] R. Cheng and S. Prabhakar. Managing
    uncertainty in sensor databases. In SIGMOD Record
    issue on Sensor Technology, December 2003.
  • [CKP03] R. Cheng, D. Kalashnikov, and S.
    Prabhakar. Evaluating probabilistic queries over
    imprecise data. In Proc. of the ACM SIGMOD Intl.
    Conf. on Management of Data, 2003.
  • [CPK03] R. Cheng, S. Prabhakar and D. V.
    Kalashnikov. Querying imprecise data in moving
    object environments. In Proc. of the 19th IEEE
    ICDE, India, 2003.
  • [KT01] S. Khanna and W.C. Tan. On computing
    functions with uncertainty. In 20th ACM Symposium
    on Principles of Database Systems, 2001.
  • [M94] J. Matousek. Geometric range searching. In
    ACM Computing Surveys, 26(4):421-461, 1994.
  • [MTT00] Y. Manolopoulos, Y. Theodoridis, and V.J.
    Tsotras. Chapter 4: Access methods for intervals.
    In Advanced Database Indexing. Kluwer, 2000.

54
References (2)
  • [MAHS] M. Mokbel, W. Aref, S. E. Hambrusch, S.
    Prabhakar. Towards Scalable Location-Aware
    Services: Requirements and Research Issues. Proc.
    of the ACM Symposium on Geographic Information
    Systems (ACM GIS), 110-117.
  • [PJ99] D. Pfoser and C. S. Jensen. Capturing the
    Uncertainty of Moving-Object Representations. In
    Proc. of the Sixth International Symposium on
    Spatial Databases, Hong Kong, July 20-23, 1999,
    pp. 111-132.
  • [OW02] C. Olston and J. Widom. Best-effort cache
    synchronization with source cooperation. In
    Proc. of the ACM SIGMOD 2002.
  • [PXKAH02] S. Prabhakar, Y. Xia, D. Kalashnikov,
    W. Aref, and S. Hambrusch. Query indexing and
    velocity constrained indexing: Scalable
    techniques for continuous queries on moving
    objects. IEEE Transactions on Computers,
    51(10):1124-1140, October 2002.
  • [S49] C. Shannon. The Mathematical Theory of
    Communication. University of Illinois Press,
    1949.
  • [SWCD98] P. A. Sistla, O. Wolfson, S.
    Chamberlain, and S. Dao. Querying the uncertain
    position of moving objects. In Temporal
    Databases: Research and Practice. 1998.
  • [TWZC02] G. Trajcevski, O. Wolfson, F. Zhang and
    S. Chamberlain. The Geometry of Uncertainty in
    Moving Objects Databases. In Proc. Intl. Conf. on
    Extending Database Technology (EDBT02), Prague,
    Czech Republic, March 2002. Springer LNCS 2287,
    pp. 233-250.
  • [VL94] S. Vrbsky and J.W.S. Liu. Producing
    approximate answers to set- and single-valued
    queries. The Journal of Systems and Software,
    27(3), 1994.
  • [WSCY99] O. Wolfson, P. Sistla, S. Chamberlain,
    and Y. Yesha. Updating and querying databases
    that track mobile units. Distributed and Parallel
    Databases, 7(3), 1999.

55
Related Publications
  • R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J.
    S. Vitter. Efficient indexing methods for
    probabilistic threshold queries over uncertain
    data. In Proc. of the 26th Intl. Conf. on
    VLDB, 2004.
  • R. Cheng and S. Prabhakar. Managing uncertainty
    in sensor databases. In SIGMOD Record issue on
    Sensor Technology, December 2003.
  • R. Cheng, D. Kalashnikov, and S. Prabhakar.
    Evaluating probabilistic queries over imprecise
    data. In Proc. of the ACM SIGMOD Intl. Conf. on
    Management of Data, 2003.
  • R. Cheng, D. V. Kalashnikov, and S. Prabhakar.
    Querying imprecise data in moving object
    environments. IEEE Transactions on Knowledge and
    Data Engineering (To appear), 2004.
  • R. Cheng, S. Prabhakar, and D. V. Kalashnikov.
    Querying imprecise data in moving object
    environments. In Proc. of the 19th IEEE Intl.
    Conf. on Data Engineering, India, 2003.
  • R. Cheng and S. Prabhakar. Sensors, uncertainty
    models and probabilistic queries. In Encyclopedia
    of Database Technologies and Applications, 2005.
  • R. Cheng and S. Prabhakar. Using Uncertainty to
    Provide Privacy-Preserving and High-Quality
    Location-Based Services. Workshop on Location
    Systems Privacy and Control, Mobile HCI04.

56
Other Publications
  • R. Cheng. Updates and view maintenance in soft
    real-time database systems. In the 2nd ACM Hong
    Kong Postgraduate Research Day (2nd ACM HK PG
    Day, Best Paper Award), University of Hong Kong,
    1999.
  • B. Kao and R. Cheng. Disk I/O Scheduling. In K.
    Y. Lam and T.W. Kuo, editors, Real-Time Database
    Systems Architecture and Issues. Kluwer Academic
    Publishers, Boston, 2000.
  • B. Kao, K. Y. Lam, B. Adelberg, R. Cheng, and T.
    Lee. Maintaining temporal consistency of discrete
    objects in soft real-time database systems. IEEE
    Transactions on Computers, 52(3), March 2003.
  • B. Kao, K.Y. Lam, B. Adelberg, R. Cheng, and Tony
    Lee. Updates and view maintenance in soft
    real-time database systems. In the Eighth
    International Conference on Information and
    Knowledge Management, 1999.
  • K. Y. Lam, T. W. Kuo, B. Kao, Tony S.H. Lee, and
    R. Cheng. Evaluation of concurrency control
    strategies for mixed soft real-time database
    systems. Information Systems, 2001.
  • C.L. Yip, K.K. Loo, B. Kao, D. Cheung, and C.K.
    Cheng. LGen - a lattice-based candidate set
    generation algorithm for i/o efficient
    association rule mining. In the Third
    Pacific-Asia Conference on Knowledge Discovery
    and Data Mining (PAKDD 99), 1999.

57
Submitted Papers
  • R. Cheng, Y. Xia, S. Prabhakar and R. Shah.
    Change Tolerant Indexing for Constantly Evolving
    Data. IEEE Intl. Conf. on Data Engineering.
  • K. Y. Lam, R. Cheng, B. Liang and J. Chau. Sensor
    Node Selection for Execution of Continuous
    Probabilistic Queries in Wireless Sensor
    Networks. 2nd ACM Intl. Workshop on Video
    Surveillance and Sensor Networks.
  • K. Y. Lam, R. Cheng, S. Prabhakar, and B. Liang.
    Efficient maintenance of temporal consistency of
    continuous queries on moving objects. IEEE
    Transactions on Mobile Computing.