Title: Uncertain Spatiotemporal Databases
1Uncertain Spatiotemporal Databases
- Yufei Tao
- Chinese University of Hong Kong
2Spatial databases
- Query processing: retrieval of stationary locations.
- Range search
- Nearest neighbor retrieval
- Object location changes are captured by explicit
updates to the database.
3Retrospect
- First (?) SIGMOD/VLDB/ICDE/PODS paper about query processing on spatiotemporal data
- G. Kollios, D. Gunopulos, and V. Tsotras. On Indexing Mobile Objects. PODS, 1999.
- 2006 - 1999 = 7 years.
4The current literature
- Historical retrieval
- A direct (but non-trivial) extension of the traditional temporal databases
- The database stores the historical location of each object at every single past timestamp.
5The current literature (cont.)
- Find all vehicles that were in the 1km vicinity of the crime scene at 7pm on Nov 1, 2006.
- Find the 10 vehicles that were closest to the crime scene at 7pm on Nov 1, 2006.
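The two historical queries above can be sketched as follows. This is a minimal illustration that assumes a hypothetical in-memory table mapping each vehicle to its recorded location per timestamp; the names `range_at` and `knn_at` and the toy data are not from the talk, and a real system would use a spatiotemporal index rather than a full scan.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def range_at(history, center, radius, t):
    """Ids of objects within `radius` of `center` at timestamp `t`."""
    return sorted(oid for oid, track in history.items()
                  if t in track and dist(track[t], center) <= radius)

def knn_at(history, center, k, t):
    """Ids of the k objects closest to `center` at timestamp `t`."""
    present = [(dist(track[t], center), oid)
               for oid, track in history.items() if t in track]
    return [oid for _, oid in sorted(present)[:k]]

# Toy data (assumed): coordinates in km, timestamps as hours of one day.
history = {
    "v1": {19: (0.3, 0.4)},   # about 0.5 km from the scene
    "v2": {19: (2.0, 0.0)},   # 2.0 km
    "v3": {19: (0.0, 0.9)},   # 0.9 km
    "v4": {18: (0.0, 0.0)},   # no record at 19:00
}
scene = (0.0, 0.0)
near = range_at(history, scene, 1.0, 19)    # vehicles within 1 km at 7pm
closest = knn_at(history, scene, 2, 19)     # the 2 closest vehicles at 7pm
```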
6The current literature (cont.)
- Future prediction
- A novel area specific to spatiotemporal databases
- The database stores the current location and velocity of each object.
- The objective is to retrieve objects expected to satisfy certain predicates in the future.
7The current literature (cont.)
Current movements
Locations 1 timestamp later
- Find the aircraft expected to appear in the airspace of Hong Kong in 10 minutes.
- Find the aircraft nearest to UA801 in 10 minutes.
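A minimal sketch of future prediction, assuming each object keeps moving with its current velocity (linear motion). The function names, the rectangular airspace, and the toy flights are hypothetical; only the idea of storing a current location plus a velocity comes from the slides.

```python
import math

def predict(obj, dt):
    """Expected location dt time units from now, assuming constant velocity."""
    (x, y), (vx, vy) = obj["pos"], obj["vel"]
    return (x + vx * dt, y + vy * dt)

def future_range(objects, rect, dt):
    """Ids of objects expected inside rect = (x1, y1, x2, y2) after dt."""
    x1, y1, x2, y2 = rect
    hits = []
    for oid, obj in objects.items():
        px, py = predict(obj, dt)
        if x1 <= px <= x2 and y1 <= py <= y2:
            hits.append(oid)
    return sorted(hits)

def future_nn(objects, target_id, dt):
    """Id of the object expected to be nearest to target_id after dt."""
    tx, ty = predict(objects[target_id], dt)
    best = None
    for oid, obj in objects.items():
        if oid == target_id:
            continue
        px, py = predict(obj, dt)
        d = math.hypot(px - tx, py - ty)
        if best is None or d < best[0]:
            best = (d, oid)
    return best[1]

# Toy data (assumed): positions in km, velocities in km per minute.
aircraft = {
    "UA801": {"pos": (0.0, 0.0), "vel": (1.0, 0.0)},
    "A": {"pos": (10.0, 0.0), "vel": (-1.0, 0.0)},
    "B": {"pos": (12.0, 5.0), "vel": (0.0, -0.5)},
}
airspace = (8.0, -1.0, 13.0, 1.0)
expected_in_airspace = future_range(aircraft, airspace, 10)
nearest_now = future_nn(aircraft, "UA801", 0)     # dt = 0: current locations only
nearest_in_10 = future_nn(aircraft, "UA801", 10)
```

With dt = 0 the velocities drop out and the query reduces to an ordinary spatial query on current locations.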
8The current literature (cont.)
Current movements
- Future prediction captures the conventional spatial databases as a special case.
- Queries concern objects' current locations only.
- Hence, object velocities do not matter.
9The current literature (cont.)
- Indexing
- HR-, MVR-, 3D R-trees, ...
- TPR-, TPR*-, STRIPES, Bx-, Bdual-trees, ...
- Range search
- Nearest neighbor
- Time parameterized NN
- Continuous NN
- Location based NN
- Selectivity estimation
- Histograms
- Sampling
- Stream processing
- ...
- Fact: harder and harder to impress the database community.
10A new twist?
- All the above work assumes that the database has the exact location of each object.
- This is rarely possible.
11Uncertainty
An object's location is described by a probability density function.
12Probabilistic range search
Find the clients that are currently in the town center with at least 50% probability.
13Qualification probability
Example: uniform pdf
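For the uniform case the qualification probability has a closed form: it is the integral of the pdf over the part of the uncertainty region that falls inside the query region, which for a uniform pdf is just an area ratio. The sketch below assumes, purely for illustration, that both the uncertainty region `ur` and the query region `rq` are axis-parallel rectangles given as (x1, y1, x2, y2).

```python
def area(r):
    """Area of rectangle r = (x1, y1, x2, y2)."""
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap_area(a, b):
    """Area of the intersection of two axis-parallel rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def qualification_prob_uniform(ur, rq):
    """Probability that an object uniform over ur lies inside rq."""
    return overlap_area(ur, rq) / area(ur)

ur = (0.0, 0.0, 4.0, 2.0)      # uncertainty region (assumed rectangle)
rq = (3.0, 0.0, 10.0, 10.0)    # query region
p = qualification_prob_uniform(ur, rq)   # overlap 1 x 2 over area 4 x 2
```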
14Qualification probability (cont.)
- Probabilistic modeling in practice
15Qualification probability (cont.)
The qualification probability must be calculated numerically.
Calculation time of an appearance probability in 2D space: 1.3 ms
Time for a random I/O access: 10 ms
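One common way to evaluate such a probability numerically is Monte Carlo sampling. The sketch below is an assumption, not necessarily the method behind the timing above: it draws locations from the object's pdf and counts how many fall in the query region. The truncated Gaussian pdf and all names are hypothetical.

```python
import random

def qualification_prob_mc(sample_location, in_query, n=20000, seed=0):
    """Estimate P(object in query region) from n sampled locations."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if in_query(sample_location(rng)))
    return hits / n

def truncated_gaussian(rng):
    """Assumed pdf: Gaussian at the origin (sigma = 1), truncated to a
    circular uncertainty region of radius 2, drawn by rejection sampling."""
    while True:
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if x * x + y * y <= 4.0:
            return (x, y)

def in_rect(rect):
    """Build a membership test for rect = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return lambda p: x1 <= p[0] <= x2 and y1 <= p[1] <= y2

# The query rectangle covers exactly the right half of the uncertainty
# region, so by symmetry the true probability is 0.5.
p = qualification_prob_mc(truncated_gaussian, in_rect((0.0, -3.0, 3.0, 3.0)))
```

Each estimate costs thousands of pdf samples, which is why avoiding these calculations matters alongside avoiding I/O.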
16Exciting opportunities
- Uncertain range search
- Reynold et al. VLDB 04, Tao et al. VLDB 05
- Uncertain nearest neighbor search
- Reynold et al. TKDE 04
- Uncertain selectivity estimation
- No work
- Uncertain stream processing
- No work
- ...
- All existing work considers only uncertain stationary objects.
- Location updates need to be reported to the database through explicit updates.
- Combined with future prediction or historical retrieval.
- No existing work.
17In this talk
- Goal: motivate query processing on uncertain spatiotemporal objects
- Uncertain range search on stationary objects
- If time permits
- Indexing of historical uncertain data
18Nonfuzzy queries
- Find all objects that appear in a search region rq with a probability ≥ tq.
19Fuzzy queries
- Find all objects whose distances to an uncertain object q are ≤ εq with a probability ≥ tq.
Example: q = the cab with suspects; objects = police cars
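A minimal sketch of the fuzzy qualification probability, assuming the locations of q and of the data object are independent and estimating the probability by sampling both pdfs. The uniform square regions and all names are hypothetical choices for illustration.

```python
import math
import random

def fuzzy_qualification_prob(sample_o, sample_q, eps, n=20000, seed=0):
    """Estimate P(dist(o, q) <= eps) when both o and q are uncertain.
    Assumes the two locations are independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        ox, oy = sample_o(rng)
        qx, qy = sample_q(rng)
        if math.hypot(ox - qx, oy - qy) <= eps:
            hits += 1
    return hits / n

def uniform_in_square(cx, cy, half):
    """Sampler for a uniform pdf over a square centred at (cx, cy)."""
    def draw(rng):
        return (rng.uniform(cx - half, cx + half),
                rng.uniform(cy - half, cy + half))
    return draw

cab = uniform_in_square(0.0, 0.0, 1.0)         # query object q
police_car = uniform_in_square(3.0, 0.0, 1.0)  # one data object

p_far = fuzzy_qualification_prob(police_car, cab, eps=0.5)  # regions are >= 1 apart
p_mid = fuzzy_qualification_prob(police_car, cab, eps=3.0)
p_all = fuzzy_qualification_prob(police_car, cab, eps=6.0)  # regions are < 6 apart
```

The police car qualifies for a given tq if the estimated probability is at least tq.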
20Goal
- Support any pdf.
- Convex/concave uncertainty regions
- Minimize the number of page accesses
- Minimize the number of qualification probability calculations.
- Minimize the total cost (I/O + CPU)
21Review MBR and why
- Minimum bounding rectangle (MBR)
22Main Idea
- For each object, pre-compute some auxiliary information that can be used to
- efficiently decide whether an object appears in a region with at least a certain probability
- without calculating its actual appearance probability.
- Auxiliary information: probabilistically constrained rectangle
23Probabilistically Constrained Rectangle (PCR)
24PCR in pruning
Probability threshold tq = 0.8.
25PCR in pruning
Probability threshold tq = 0.2.
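One way to read the pruning idea is sketched below. The figures are not reproduced here, so the sketch assumes that o.pcr(p) is the rectangle whose every face has probability mass p on its outer side. It uses a uniform pdf over a rectangle so that the PCR has a closed form. Function names are hypothetical.

```python
def pcr_uniform(ur, p):
    """PCR of a uniform pdf over rectangle ur: each face is moved inward
    until probability mass p lies outside it (assumed definition)."""
    x1, y1, x2, y2 = ur
    dx, dy = p * (x2 - x1), p * (y2 - y1)
    return (x1 + dx, y1 + dy, x2 - dx, y2 - dy)

def contains(a, b):
    """True if rectangle a fully contains rectangle b."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]

def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def can_prune(pcr_p, p, rq, tq):
    """Decide, without computing the probability, that o cannot qualify."""
    # rq misses part of pcr(p): mass >= p lies outside rq, so the
    # qualification probability is at most 1 - p.
    if tq > 1 - p and not contains(rq, pcr_p):
        return True
    # rq is disjoint from pcr(p): rq lies wholly beyond one face, so the
    # qualification probability is at most p.
    if tq > p and not intersects(rq, pcr_p):
        return True
    return False

ur = (0.0, 0.0, 10.0, 10.0)
pcr = pcr_uniform(ur, 0.2)

pruned_high = can_prune(pcr, 0.2, (3.0, 0.0, 10.0, 10.0), 0.9)  # true prob 0.7
pruned_low = can_prune(pcr, 0.2, (0.0, 0.0, 1.5, 10.0), 0.3)    # true prob 0.15
undecided = can_prune(pcr, 0.2, (0.0, 0.0, 5.0, 10.0), 0.3)     # true prob 0.5
```

When `can_prune` returns False the object is not necessarily a result; it only means this PCR alone cannot rule it out.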
26PCR in validating
Figure labels: tq = 0.9, tq = 0.8, tq = 0.1; o.pcr(0.1)
27PCR in validating (cont.)
Figure labels: tq = 0.6, tq = 0.8, tq = 0.7; o.pcr(0.1)
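Validation can be sketched in the same spirit. The sketch assumes that each face of o.pcr(p) cuts off probability mass p on its outer side. Under that assumption the part of the object's MBR between two opposite faces holds mass at least 1 - 2p, the part on the inner side of one face holds 1 - p, and the part on the outer side holds p. If rq covers such a part and tq does not exceed its mass, the object is a result without any probability calculation. Names and the toy rectangles are hypothetical.

```python
def covers(rq, part):
    """True if rectangle rq fully contains rectangle part."""
    return (rq[0] <= part[0] and rq[1] <= part[1]
            and rq[2] >= part[2] and rq[3] >= part[3])

def can_validate(mbr, pcr_p, p, rq, tq):
    """Decide, without computing the probability, that o must qualify."""
    for i in (0, 1):                       # 0 = x dimension, 1 = y dimension
        lo, hi = pcr_p[i], pcr_p[i + 2]

        def clip(a, b):
            """Part of the MBR with a <= coordinate i <= b."""
            part = list(mbr)
            part[i], part[i + 2] = a, b
            return part

        candidates = [
            (clip(lo, hi), 1 - 2 * p),         # between the two faces
            (clip(lo, mbr[i + 2]), 1 - p),     # inner side of the lower face
            (clip(mbr[i], hi), 1 - p),         # inner side of the upper face
            (clip(mbr[i], lo), p),             # outer side of the lower face
            (clip(hi, mbr[i + 2]), p),         # outer side of the upper face
        ]
        for part, mass in candidates:
            if tq <= mass and covers(rq, part):
                return True
    return False

mbr = (0.0, 0.0, 10.0, 10.0)
pcr = (1.0, 1.0, 9.0, 9.0)    # o.pcr(0.1) of a uniform pdf over mbr

v_slab = can_validate(mbr, pcr, 0.1, (1.0, -1.0, 9.0, 11.0), 0.7)
v_side = can_validate(mbr, pcr, 0.1, (1.0, -1.0, 11.0, 11.0), 0.85)
v_strip = can_validate(mbr, pcr, 0.1, (-1.0, -1.0, 1.0, 11.0), 0.05)
v_none = can_validate(mbr, pcr, 0.1, (2.0, -1.0, 9.0, 11.0), 0.7)
```

In the last case the true probability (0.7) does meet the threshold, but this PCR cannot prove it, so the probability would still have to be computed.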
28Still down the road
- Only a finite number of PCRs can be stored for each object. Which ones to store?
- How to index these PCRs to reduce query cost?
- Cost model analysis?
- Fuzzy range search (repeat all the above issues)
29In this talk
- Goal: motivate query processing on uncertain spatiotemporal objects
- Uncertain range search on stationary objects
- If time permits
- Indexing of historical uncertain data
30Historical uncertain spatiotemporal databases
- Application
- A database stores the location of each vehicle every 10 minutes over the whole past year.
- Possible queries
- Find all vehicles that appeared in the town center sometime during 6pm-7pm with a probability of at least 50%.
- Find all vehicles that appeared in the town center during the entire period of 6pm-7pm with a probability of at least 50%.
- Nearest neighbor search counterparts
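The difference between the "sometime" and "entire period" queries can be illustrated with a small sketch. It assumes, as a simplification not stated in the talk, that a vehicle's uncertainty is represented by a set of equally likely sampled trajectories (possible worlds), each listing one location per stored timestamp. All names and data are hypothetical.

```python
def inside(p, rect):
    """True if point p lies in rect = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2

def prob_sometime(worlds, rect):
    """Fraction of sampled trajectories inside rect at some timestamp."""
    return sum(1 for w in worlds if any(inside(p, rect) for p in w)) / len(worlds)

def prob_always(worlds, rect):
    """Fraction of sampled trajectories inside rect at every timestamp."""
    return sum(1 for w in worlds if all(inside(p, rect) for p in w)) / len(worlds)

town_center = (0.0, 0.0, 2.0, 2.0)

# Four equally likely trajectories of one vehicle, three timestamps each
# (kept short for brevity; 6pm-7pm at 10-minute steps would have seven).
worlds = [
    [(1.0, 1.0), (1.5, 1.0), (1.0, 0.5)],   # inside the whole time
    [(1.0, 1.0), (3.0, 1.0), (4.0, 1.0)],   # inside only at the start
    [(5.0, 5.0), (4.0, 4.0), (3.0, 3.0)],   # never inside
    [(3.0, 1.0), (1.9, 1.0), (1.0, 1.0)],   # enters later
]

p_some = prob_sometime(worlds, town_center)
p_always = prob_always(worlds, town_center)
```

With a 50% threshold this vehicle qualifies for the "sometime" query but not for the "entire period" query.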
31Existing Uncertainty Models
32Existing Uncertainty Models (cont.)
- The ellipse model (Pfoser et al. SSTD 99)
33Existing Uncertainty Models (cont.)
- The cylinder model (Trajcevski et al. TODS 04)
34A general modeling
- Each object o is associated with an o.pdf(x, t),
which captures the probability that o is at
location x at any timestamp t in history.
Conceptually, one pdf like this at every past
timestamp.
35Query semantics Trajcevski et al. TODS 04
- Based on the cylinder model
possibly sometime
possibly always
36Query semantics (cont.)
- Based on the cylinder model
always possibly
definitely always
37Query semantics (cont.)
- Based on the cylinder model
definitely sometime
sometime definitely
38Query processing
Conceptually, one pdf like this at every past
timestamp.
- Difficulties
- There are infinitely many timestamps in the past. How to compute the pdfs at all of them?
- How to index them?
- Can PCRs help?
- Algorithms for supporting all the previous
semantics based on the index structure?
39Conclusions
- Uncertain spatiotemporal query processing
- Data
- Stationary, historical, and moving
- Queries
- Range search, NN, selectivity estimation, ...
- New query types: fuzzy range search, ...
- Combinations of the above two.