Title: Sunil Prabhakar
Velocity Constrained Indexing (VCI) and Query Indexing (QI)
- Sunil Prabhakar
- Purdue University
- (Y. Xia, D. Kalashnikov, W. Aref, S. Hambrusch)
Moving Object Databases
- Pervasive Location-Aware Computing Environments (PLACE): http://www.cs.purdue.edu/place.html
- Proliferation of mobile, wireless, and GPS technologies.
- What if objects can determine their location and send it to a server along with other data and queries?
- Locator services, fleet management, groups, personalized navigation, personalized (localized) information, targeted advertising, tracking children, traffic routing, services for the blind, OZ
[Figure: system architecture with a satellite, repository servers, regional servers, data broadcast, satellite uplink, and mobile objects]
Issues
- Continuous queries over spatio-temporal data.
- How to scale to millions of objects and queries?
- Data Imprecision
- Infrastructure
- How to communicate? Similar to Cell Phones?
- How to determine location?
- Security and Privacy
Scalable Execution (Indexing)
- Query indexing (IEEE Trans. on Computers)
- Index queries instead of data: 100-fold improvement for continuous queries
- Velocity-Constrained Indexing
- Exploit a (pessimistic) limit on object speed
- Topographical Tree index (submitted)
- Index the space (i.e., buildings, highways, etc.)
- Reduces the need for updates to the index
- Main-memory execution (DEXA 02)
- Grid-based solutions for continuous range queries
- Spatial join
Imprecise Data
- Data is inherently imprecise
- How does this affect queries?
- Limit uncertainty
- Nearest-neighbor queries (ICDE 03)
- Probabilistic answer
- Efficient computation: VCI, quantization.
- Extend to more general sensor setting (submitted)
- Classification of queries
- Quantification of query quality
- Which sensors to update if resources are limited?
- Based upon impact on quality of current queries.
Data Broadcast
- Broadcasting spatial data with an index to minimize:
- tuning time
- latency
- Single Channel (SSTD 01)
- Optimal query latency
- Multiple Channels (in submission)
PLACE Prototype
- Location-based services
- Currently using MySQL and SQL Server
- GPS-enabled iPAQs, laptops, etc.
- Update location and run queries (continuous)
- Studying:
- various execution policies
- Scalable location updating
- Call back of mobile devices
- Move to Predator
Goals for this talk
- Efficient evaluation of continuous range queries over moving objects.
- Monitoring queries
- Set of region queries
- This set rarely changes
- Evaluation over a period of time (not once)
- Example: tracking aircraft
- Zones where an aircraft can be tracked by enemy radars are specified as continuous region queries
- An alert is given when a friendly aircraft is in such a zone
Model
- Objects are points; queries are rectangles
- Number of queries < number of objects
- Objects report their locations periodically or when they move significantly
- Locations are stored in a file on disk
- Queries are evaluated periodically, with a fixed time step
Organization of talk
- Introduction and motivation
- Related work
- Query indexing
- Velocity Constrained Indexing (VCI)
- Conclusions
Related Work (1)
- Map trajectories to points in a higher-dimensional space.
- Index the higher-dimensional space.
- Queries are transformed to counter the data transformation.
- G. Kollios, D. Gunopulos, and V. J. Tsotras. On indexing mobile objects. PODS 1999.
- J. Tayeb, O. Ulusoy, and O. Wolfson. A quadtree-based dynamic attribute indexing method. Computer Journal, 1998.
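To make the transformation concrete, here is a minimal Python sketch of the simplest case, assuming 1-D linear motion x(t) = x0 + v·t: each trajectory becomes the point (v, x0), and a range query [x_lo, x_hi] at time t becomes the strip x_lo - v·t ≤ x0 ≤ x_hi - v·t in that dual space. The names are illustrative; the cited papers use more elaborate transforms and index the dual points rather than scanning them as done here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory1D:
    """1-D linear motion x(t) = x0 + v * t (assumed motion model)."""
    x0: float
    v: float


def to_dual(traj: Trajectory1D) -> tuple[float, float]:
    """Map the trajectory to the point (v, x0) in a 2-D dual space."""
    return (traj.v, traj.x0)


def in_transformed_query(point, x_lo, x_hi, t):
    """The range query [x_lo, x_hi] at time t, rewritten for the dual space:
    accept (v, x0) iff x_lo - v * t <= x0 <= x_hi - v * t."""
    v, x0 = point
    return x_lo - v * t <= x0 <= x_hi - v * t


objects = [Trajectory1D(0.0, 1.0), Trajectory1D(10.0, -2.0), Trajectory1D(5.0, 0.0)]
dual_points = [to_dual(o) for o in objects]   # these points would be indexed
hits = [i for i, p in enumerate(dual_points) if in_transformed_query(p, 4.0, 6.0, 5.0)]
print(hits)  # [0, 2]
```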
Related Work (2)
- Index the past trajectories of moving objects as line segments
- STR-tree (spatio-temporal R-tree)
- TB-tree (trajectory-bundle tree)
- D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel approaches to the indexing of moving object trajectories. VLDB 2000.
Related Work (3)
- Indexing the current and anticipated future positions of moving objects
- TPR-tree (time-parameterized R-tree): parameterizes the index structure with velocity vectors
- S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. Indexing the positions of continuously moving objects. SIGMOD 2000.
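A sketch of the time-parameterized idea, assuming one dimension and a conservative bound: a node stores its bounds at a reference time plus the minimum and maximum velocities of the objects below it, and derives its bounds at any later time from those. The class and function names are hypothetical; node splitting, insertion, and tightening on updates are omitted.

```python
from dataclasses import dataclass


@dataclass
class TPBound1D:
    """Time-parameterized bounding interval in one dimension (hypothetical name)."""
    lo: float      # lower bound at the reference time t_ref
    hi: float      # upper bound at the reference time t_ref
    v_lo: float    # minimum (signed) velocity among the enclosed objects
    v_hi: float    # maximum (signed) velocity among the enclosed objects
    t_ref: float

    def at(self, t: float) -> tuple[float, float]:
        """Conservative bounds at time t >= t_ref: the lower edge moves at the
        minimum velocity and the upper edge at the maximum velocity."""
        dt = t - self.t_ref
        return (self.lo + self.v_lo * dt, self.hi + self.v_hi * dt)


def bound(positions, velocities, t_ref):
    """Bound a group of objects given their positions at t_ref and velocities."""
    return TPBound1D(min(positions), max(positions), min(velocities), max(velocities), t_ref)


box = bound([2.0, 5.0, 3.0], [-1.0, 0.5, 2.0], t_ref=0.0)
print(box.at(4.0))  # (-2.0, 13.0)
```

At t = 4 the three objects are at -2, 7, and 11, all inside the computed interval, which is what makes the bound safe to search without updating the node.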
Related Work (4)
- Precision/uncertainty control
- Spatio-temporal data models/data types
None of these addresses the timely execution of
multiple concurrent queries on a collection of
moving objects.
Traditional Approaches
- An index is built on the data to improve query performance.
- Drawback: constant updates to the index
- Insert/delete
- Reconstruct
- Modify
- Brute force
Traditional Approaches: I/O Cost
Number of I/O operations, where m is the number of objects that move per time step and q is the number of queries:

| m | q | Reconstruct | Ins/Del | Modify | Brute Force |
|---|---|---|---|---|---|
| 1,000 | 1,000 | 211,817 | 5,865 | 3,806 | 1,010 |
| 1,000 | 10,000 | 228,308 | 22,356 | 20,298 | 5,100 |
| 10,000 | 1,000 | 211,317 | 13,413 | 22,581 | 1,010 |
| 10,000 | 10,000 | 228,508 | 59,904 | 39,072 | 5,100 |

Brute force has the lowest I/O cost, but a high CPU cost (shown later).
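For reference, brute force can be sketched as one pass over the objects with a containment test against every query; this is why it reads each page once (low I/O) while doing N × q comparisons (high CPU). In-memory dictionaries stand in for the on-disk object and query files, an assumption of this sketch.

```python
def brute_force(objects, queries):
    """Answer every range query by testing every object against it:
    one sequential pass over the data (cheap in I/O) but
    len(objects) * len(queries) containment tests (expensive in CPU)."""
    answers = {qid: set() for qid in queries}
    for oid, (x, y) in objects.items():
        for qid, (x1, y1, x2, y2) in queries.items():
            if x1 <= x <= x2 and y1 <= y <= y2:
                answers[qid].add(oid)
    return answers


objects = {"a": (0.2, 0.2), "b": (0.8, 0.8), "c": (0.25, 0.3)}    # oid -> (x, y)
queries = {"Q1": (0.1, 0.1, 0.3, 0.3), "Q2": (0.7, 0.7, 0.9, 0.9)}  # qid -> rectangle
answers = brute_force(objects, queries)
print(answers)
```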
Query Indexing
- Observation: continuous queries are stable, but the data (moving objects) is constantly evolving.
- Proposed solution: Query Indexing
- Build an index on the queries instead of the data
- Incremental evaluation of continuous queries
- Optimization: exploit safe regions
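The two ideas, an index on the queries and incremental evaluation, can be sketched under stated assumptions: a uniform grid over the unit square stands in for the query index (the talk does not say which structure is used here), and `GridQueryIndex` and `incremental_step` are hypothetical names. Only the objects that moved in a time step are probed, and each query's standing answer is patched rather than recomputed.

```python
from collections import defaultdict


class GridQueryIndex:
    """Hypothetical query index: a uniform grid over the unit square in which
    each query rectangle is registered in every cell it overlaps."""

    def __init__(self, queries, cells=10):
        self.queries = queries            # {qid: (x1, y1, x2, y2)}
        self.cells = cells
        self.grid = defaultdict(list)     # (cx, cy) -> [qid, ...]
        for qid, (x1, y1, x2, y2) in queries.items():
            for cx in range(self._cell(x1), self._cell(x2) + 1):
                for cy in range(self._cell(y1), self._cell(y2) + 1):
                    self.grid[(cx, cy)].append(qid)

    def _cell(self, coord):
        # Clamp into [0, cells - 1] so boundary coordinates stay in the grid.
        return max(0, min(int(coord * self.cells), self.cells - 1))

    def probe(self, x, y):
        """Return the ids of the queries that contain point (x, y)."""
        hits = set()
        for qid in self.grid.get((self._cell(x), self._cell(y)), []):
            x1, y1, x2, y2 = self.queries[qid]
            if x1 <= x <= x2 and y1 <= y <= y2:
                hits.add(qid)
        return hits


def incremental_step(index, answers, membership, moved):
    """Re-evaluate only the objects reported as moved in this time step,
    updating the standing answer sets of the continuous queries."""
    for oid, (x, y) in moved.items():
        old = membership.get(oid, set())
        new = index.probe(x, y)
        for qid in old - new:
            answers[qid].discard(oid)
        for qid in new - old:
            answers[qid].add(oid)
        membership[oid] = new


queries = {"Q1": (0.12, 0.12, 0.35, 0.35), "Q2": (0.25, 0.25, 0.65, 0.65)}
index = GridQueryIndex(queries)
answers = {qid: set() for qid in queries}
membership = {}
incremental_step(index, answers, membership, {"a": (0.21, 0.21), "b": (0.55, 0.55)})
incremental_step(index, answers, membership, {"a": (0.31, 0.31)})  # only "a" moved
```

After the second step only object "a" was re-probed; it has entered Q2 while staying in Q1, and "b" was never touched.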
Safe Regions
- Safe region: the range over which an object can move without affecting its relevance to any query
- Reduces the number of objects that need to be processed
- Reduces communication cost
- Three types considered:
- Max Dist
- Max Sphere
- Max Rect
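One plausible reading of the MaxDist safe region, assumed here rather than taken from the talk, is a circle whose radius is the distance from the object to the nearest query boundary: as long as the object moves less than that distance, it cannot enter or leave any query. A minimal sketch:

```python
import math


def dist_to_boundary(x, y, rect):
    """Distance from point (x, y) to the boundary of an axis-aligned rectangle."""
    x1, y1, x2, y2 = rect
    if x1 <= x <= x2 and y1 <= y <= y2:
        # Inside: distance to the nearest edge.
        return min(x - x1, x2 - x, y - y1, y2 - y)
    # Outside: Euclidean distance to the closest point of the rectangle.
    dx = max(x1 - x, 0.0, x - x2)
    dy = max(y1 - y, 0.0, y - y2)
    return math.hypot(dx, dy)


def max_dist(x, y, queries):
    """Radius of a circular safe region around (x, y): a move shorter than this
    can neither enter nor leave any query rectangle."""
    return min(dist_to_boundary(x, y, q) for q in queries)


queries = [(0.0, 0.0, 4.0, 4.0), (6.0, 0.0, 8.0, 4.0)]
print(max_dist(3.0, 2.0, queries))  # 1.0: the right edge of the first query
```

MaxSphere and MaxRect are not sketched, since the slides do not define them.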
Examples of Safe Regions
[Figure: example MaxDist, MaxSphere, and MaxRect safe regions]
Experimental Evaluation
- 100,000 objects as a collection of 5 normal distributions, each with 20,000 objects
- Cluster centers are uniformly distributed; deviation 0.05
- Queries follow the same distribution, but with deviations of 0.1 and 1.0
- 1 < number of queries < 10,000
- Query size is 0.01 x 0.01
- Max velocities follow a Zipf distribution with overall Vmax = 250 mph (the space is a square 1,000 miles on a side)
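A small generator for a workload of this shape, offered as a hedged sketch: the round-robin cluster assignment, clamping to the unit square, using the generated point as a query's lower-left corner, and the seed are all assumptions, and the Zipf-distributed speeds are omitted because the slides do not give the Zipf parameters.

```python
import random


def clustered_points(n, centers, sigma, rng):
    """Draw n points round-robin from normal clusters around `centers`,
    clamped to the unit square (the clamping is an assumption)."""
    points = []
    for i in range(n):
        cx, cy = centers[i % len(centers)]
        x = min(max(rng.gauss(cx, sigma), 0.0), 1.0)
        y = min(max(rng.gauss(cy, sigma), 0.0), 1.0)
        points.append((x, y))
    return points


rng = random.Random(7)
centers = [(rng.random(), rng.random()) for _ in range(5)]   # uniform centers
objects = clustered_points(100_000, centers, 0.05, rng)       # 5 x 20,000 objects

SIDE = 0.01                                                   # 0.01 x 0.01 queries
corners = clustered_points(1_000, centers, 0.1, rng)          # deviation 0.1 case
queries = []
for x, y in corners:
    x, y = min(x, 1 - SIDE), min(y, 1 - SIDE)                 # keep query inside the space
    queries.append((x, y, x + SIDE, y + SIDE))
```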
Reduction Rate vs. Query Density
[Figure: curves for q = 10,000 and q = 1,000; N = 100,000, m = 10,000]
Safe regions are not recomputed!
Performance of Q-Index
[Figure: N = 100,000, m = 10,000, q = 1,000]
Performance is almost two orders of magnitude better than traditional approaches.
Performance vs. number of objects moved
[Figure: curves for m = 1,000 and m = 10,000; N = 100,000, q = 1,000]
As the number of objects that move at each time step increases, the I/O cost increases, degrading gracefully to that of a sequential scan.
Q-index not main-memory resident
[Figure: N = 100,000, q = 10,000, m = 1,000]
Brute force outperforms Q-index in terms of I/O cost, but it pays a HIGH computation cost that offsets the reduced I/O.
Impact of CPU on Brute Force

| m | q | Incremental Brute Force | Q-Index | Max Sphere | Max Rect |
|---|---|---|---|---|---|
| 1,000 | 10,000 | 3.6 s | 1.7 s | 0.9 s | 0.5 s |
| 10,000 | 10,000 | 37 s | 3.1 s | 1.3 s | 1.1 s |
Impact of Velocity
[Figure: curves for v = 125 mph and v = 250 mph; N = 100,000, m = 10,000, q = 1,000]
The effectiveness of the optimization changes only slightly when velocity increases.
Impact of Density
- N = 100,000, m = 1,000, q = 10,000
- Reduce the region to 10 x 10 miles and the speed to 50 mph
- The safe region optimizations are less effective when density is increased.
- The Q-Index approach is still an order of magnitude better than traditional approaches.
Q-Index performance
- Advantages
- Scales to large numbers of continuous queries
- Scales linearly with number of moving objects
- Relatively insensitive to rate of movement
- Safe Region optimizations are very effective
- Disadvantage
- Sensitive to arrival of new queries!
Velocity Constrained Indexing (VCI)
- Maintain an index on moving objects
- Problem: an ordinary (stationary) index built on the objects must be updated excessively as the objects move.
- Main idea of VCI: reduce the number of updates to the index by exploiting limits on object speed
VCI and Query Expansion
- Each object has a known maximum speed
- VCI is an R-tree-based index with maximum-velocity information
- Each node stores the maximum velocity over all objects that it covers.
MBR Expansion
- VCI is built at time t0
- At time t > t0 it cannot be used unless updated
- The Vmax fields allow us to use it!
- No point moves farther than R = vmax(t - t0)
- Solution: expand all MBRs by R
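The VCI search can be sketched as follows, assuming a toy in-memory R-tree whose `Node` class is hypothetical: a bottom-up pass fills in each node's vmax, and a range search at time t grows every MBR (and each leaf point) by that node's R = vmax(t - t0) before testing overlap. Growing a box by R on every side contains every position reachable within distance R, so no answer is missed; the extra candidates are false positives left for post-processing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                    # (x1, y1, x2, y2) at build time t0
    children: list = field(default_factory=list)  # internal node: child nodes
    points: list = field(default_factory=list)    # leaf node: (oid, x, y, vmax)
    vmax: float = 0.0                             # max speed of everything below


def fill_vmax(node):
    """Bottom-up pass: each node stores the max velocity of the objects it covers."""
    if node.points:
        node.vmax = max(v for _, _, _, v in node.points)
    else:
        node.vmax = max(fill_vmax(child) for child in node.children)
    return node.vmax


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def expand(mbr, r):
    x1, y1, x2, y2 = mbr
    return (x1 - r, y1 - r, x2 + r, y2 + r)


def vci_candidates(node, query, elapsed):
    """Candidate answers at time t = t0 + elapsed. Each MBR (and each leaf
    point) is grown by R = vmax * (t - t0) of its node, so no qualifying
    object is missed; false positives are left for post-processing."""
    r = node.vmax * elapsed
    if not overlaps(expand(node.mbr, r), query):
        return []
    if node.points:
        return [oid for oid, x, y, _ in node.points
                if overlaps(expand((x, y, x, y), r), query)]
    result = []
    for child in node.children:
        result.extend(vci_candidates(child, query, elapsed))
    return result


leaf1 = Node((0, 0, 1, 1), points=[("a", 0.0, 0.0, 1.0), ("b", 1.0, 1.0, 3.0)])
leaf2 = Node((5, 5, 6, 6), points=[("c", 5.0, 5.0, 0.5)])
root = Node((0, 0, 6, 6), children=[leaf1, leaf2])
fill_vmax(root)
print(vci_candidates(root, (3, 3, 4, 4), elapsed=1.0))  # ['a', 'b']
print(vci_candidates(root, (3, 3, 4, 4), elapsed=0.5))  # []
```

Object "a" is returned at elapsed time 1.0 only because its leaf shares the larger vmax of "b"; post-processing would discard it.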
Example expansion
[Figure: an MBR at time t0 and at time t, expanded by R; panel labeled "Opt. QE vs. MBRE" (query expansion vs. MBR expansion)]
Post-processing
- False positives are possible
- A post-processing step eliminates them
- Post-processing obtains the current positions of the objects retrieved by the expanded query (Qexp)
- This can be expensive!
- We propose several optimizations.
1. Post-processing optimization
- PP is needed for all objects that fall within the expanded queries.
- Note: no object moves farther than R = vmax(t - t0)
- If the circle C of radius R around the indexed position of an object X lies completely inside query Q, then it is not necessary to post-process X (for Q).
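The first optimization amounts to a containment test on the circle of reachable positions. A minimal sketch, assuming each candidate carries its indexed position and a speed bound (its own maximum speed or its node's vmax; either is a valid upper bound); the function names are hypothetical:

```python
def circle_inside(x, y, r, rect):
    """True if the circle of radius r centred at (x, y) lies completely in rect."""
    x1, y1, x2, y2 = rect
    return x1 <= x - r and x + r <= x2 and y1 <= y - r and y + r <= y2


def split_candidates(candidates, query, elapsed):
    """candidates: (oid, x_t0, y_t0, vmax) tuples returned for `query`.
    Objects whose reachable circle (radius R = vmax * elapsed) is inside the
    query are certain answers; only the rest need post-processing."""
    certain, needs_pp = [], []
    for oid, x, y, vmax in candidates:
        r = vmax * elapsed
        (certain if circle_inside(x, y, r, query) else needs_pp).append(oid)
    return certain, needs_pp


print(split_candidates([("a", 5.0, 5.0, 1.0), ("b", 9.0, 5.0, 1.0)],
                       (0.0, 0.0, 10.0, 10.0), elapsed=2.0))
# (['a'], ['b'])
```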
2. Post-processing optimization
- If not careful, PP can incur one I/O per matching object
- Identify all objects that need PP, then retrieve each object once and check it against all queries
- Sort objects on page number to avoid multiple retrievals of the same page
- Cluster the index to reduce the total number of pages to be retrieved
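The batching idea can be sketched as below; `read_page` is a hypothetical callback standing for one page read from the object file, and the toy data is illustrative. Objects needing post-processing are grouped by page, the pages are visited in sorted order, and each page is fetched exactly once no matter how many candidates or queries it serves.

```python
from collections import defaultdict


def post_process(needs_pp, page_of, read_page, queries):
    """needs_pp: {oid: set of qids the object must be checked against}.
    Objects are grouped by page and pages are read in sorted order, so each
    page is fetched once and each object is tested against all its queries."""
    by_page = defaultdict(list)
    for oid in needs_pp:
        by_page[page_of[oid]].append(oid)
    answers = defaultdict(set)
    for page in sorted(by_page):
        current = read_page(page)            # {oid: (x, y)}: one page read
        for oid in by_page[page]:
            x, y = current[oid]
            for qid in needs_pp[oid]:
                x1, y1, x2, y2 = queries[qid]
                if x1 <= x <= x2 and y1 <= y <= y2:
                    answers[qid].add(oid)
    return dict(answers)


# Toy data file: page number -> current positions of the objects on that page.
pages = {0: {"a": (1, 1), "b": (5, 5)}, 1: {"c": (2, 2)}}
reads = []


def read_page(page):
    reads.append(page)                       # record each simulated I/O
    return pages[page]


result = post_process({"a": {"Q1"}, "b": {"Q1", "Q2"}, "c": {"Q1"}},
                      {"a": 0, "b": 0, "c": 1}, read_page,
                      {"Q1": (0, 0, 3, 3), "Q2": (4, 4, 6, 6)})
print(reads)   # [0, 1]: each page read once
```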
3. Clustered VCI
- Clustering is done efficiently after VCI creation
- Depth-first traversal
- Each object is copied to its corresponding place in a new file
- Pointers in leaf nodes are adjusted
- Improves performance by a factor of 3
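A sketch of the clustering pass, assuming a toy tree of dictionaries in which leaves hold pointers (record positions) into an object file modeled as a Python list; the names are hypothetical. Records are copied in depth-first leaf order, so objects that share a leaf become contiguous in the new file.

```python
def cluster(root, old_file):
    """Depth-first traversal that copies each object record into a new file
    in leaf order and rewrites the leaf pointers to the new positions, so
    objects sharing a leaf end up on the same or adjacent pages."""
    new_file = []

    def visit(node):
        if node["leaf"]:
            new_ptrs = []
            for ptr in node["ptrs"]:
                new_file.append(old_file[ptr])     # copy record to new file
                new_ptrs.append(len(new_file) - 1)
            node["ptrs"] = new_ptrs                # adjust leaf pointers
        else:
            for child in node["children"]:
                visit(child)

    visit(root)
    return new_file


old_file = ["rec0", "rec1", "rec2", "rec3"]
leaf_a = {"leaf": True, "ptrs": [3, 0]}
leaf_b = {"leaf": True, "ptrs": [2, 1]}
root = {"leaf": False, "children": [leaf_a, leaf_b]}
new_file = cluster(root, old_file)
print(new_file)                       # ['rec3', 'rec0', 'rec2', 'rec1']
print(leaf_a["ptrs"], leaf_b["ptrs"])  # [0, 1] [2, 3]
```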
Problem: large expansions
- Quality of the index degrades with time
- Rebuild after some time (expensive)
- Refresh: update leaf nodes with current locations
- Depth-first traversal, updating MBRs
- Retains the old index structure
- Experimentally works very well
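Refresh can be sketched in the same kind of toy representation (dictionary nodes, hypothetical names): leaf entries are overwritten with current locations and MBRs are recomputed bottom-up during a depth-first traversal, while the tree's shape is kept as is.

```python
def refresh(node, current_pos):
    """Depth-first refresh: leaf entries get current locations and every MBR
    is recomputed bottom-up; the tree shape itself is left unchanged."""
    if node["leaf"]:
        node["points"] = {oid: current_pos[oid] for oid in node["points"]}
        xs = [x for x, _ in node["points"].values()]
        ys = [y for _, y in node["points"].values()]
    else:
        boxes = [refresh(child, current_pos) for child in node["children"]]
        xs = [b[0] for b in boxes] + [b[2] for b in boxes]
        ys = [b[1] for b in boxes] + [b[3] for b in boxes]
    node["mbr"] = (min(xs), min(ys), max(xs), max(ys))
    return node["mbr"]


leaf_a = {"leaf": True, "points": {"a": (0, 0), "b": (1, 1)}}
leaf_b = {"leaf": True, "points": {"c": (5, 5)}}
root = {"leaf": False, "children": [leaf_a, leaf_b]}
now = {"a": (2, 2), "b": (3, 1), "c": (5, 6)}
print(refresh(root, now))   # (2, 1, 5, 6)
# The caller then resets t0 to the refresh time, so R = vmax * (t - t0) restarts at 0.
```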
Performance of VCI, No Clustering
Performance of VCI, With Clustering
Analysis
- Pre-processing cost increases since the (expanded) queries grow larger
- Post-processing cost increases since more objects need to be processed
- Total cost approaches that of a sequential scan after 150 time steps
- After that, it is better to do a sequential scan
- For clustered VCI this number is 400!
Impact of Refresh on VCI
Refresh analysis
- Pre-processing is reduced since the MBRs better fit the underlying data and the clock is reset
- Post-processing is reduced since the index is tighter and fewer objects need to be processed
Sensitivity to Query Density
VCI: Sensitivity to parameters
- Insensitive to:
- changes in the set of queries (e.g., 100 queries replaced by 100 other queries)
- actual movement of objects (QE only; refresh is sensitive)
- portion of objects that move (QE only, not refresh)
- Sensitive to:
- number of queries (good for up to roughly 100 queries)
- coverage of objects by queries
- The time scale scales linearly as a function of Vmax (expansions are proportional: R = vmax(t - t0))
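The proportionality can be written out. With expansion R(t) = vmax(t - t0), the elapsed time at which the expansion reaches any fixed radius R* is inversely proportional to vmax, so changing the maximum speed rescales the time axis by the same factor. This assumes, as the slide does, that cost is driven by the size of the expansions.

```latex
R(t) = v_{\max}\,(t - t_0)
\quad\Longrightarrow\quad
t^{*} - t_0 = \frac{R^{*}}{v_{\max}},
\qquad
\frac{t^{*}_{2} - t_0}{t^{*}_{1} - t_0} = \frac{v_{\max,1}}{v_{\max,2}} .
```

For example, halving vmax doubles the number of time steps before the expansion reaches a given size.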
Combined Scheme
- Both indexes are created and maintained
- Q-Index is used to process existing queries very efficiently
- New queries are processed with VCI
- When enough new queries arrive, they are bulk-loaded into the Q-Index to amortize the cost.
Combined Index Schemes
- Initially, N = 100,000, m = 1,000, q = 10,000.
- Queries arrive at a rate of 10 queries per 3 minutes
- New queries are handled by VCI; when the number of new queries reaches a threshold (100), they are ingested into the Q-Index.
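The routing policy of the combined scheme can be sketched as a small class; both indexes are reduced to dictionaries of query rectangles, so `CombinedScheme` and its methods are hypothetical stand-ins rather than the actual implementation. With the slide's numbers, 10 queries per 3 minutes and a threshold of 100, a bulk load happens roughly every 30 minutes.

```python
class CombinedScheme:
    """Hypothetical sketch: existing queries are answered through the Q-Index,
    new ones through the VCI until `threshold` of them have accumulated, and
    then they are bulk-loaded into the Q-Index in one step."""

    def __init__(self, threshold=100):
        self.threshold = threshold
        self.qindex = {}     # stand-in for the Q-Index: {qid: rect}
        self.pending = {}    # new queries, answered via the VCI for now

    def add_query(self, qid, rect):
        self.pending[qid] = rect
        if len(self.pending) >= self.threshold:
            self.bulk_load()

    def bulk_load(self):
        # A single bulk insertion amortizes the cost of updating the Q-Index.
        self.qindex.update(self.pending)
        self.pending.clear()

    def route(self, qid):
        """Which index currently answers query `qid`."""
        return "VCI" if qid in self.pending else "Q-Index"


scheme = CombinedScheme(threshold=100)
for qid in range(250):                  # 75 minutes of arrivals at 10 per 3 minutes
    scheme.add_query(qid, (0.0, 0.0, 0.01, 0.01))
print(len(scheme.qindex), len(scheme.pending))   # 200 50
```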
Conclusion
- Q-Index and VCI together offer a robust and scalable solution for efficient continuous query evaluation
- The performance is significantly better than traditional approaches
Research Interests
- Efficient I/O Management
- Application-specific (Multidimensional, Multimedia)
- Large-scale storage (Tertiary Storage)
- Moving Object/Sensor DB
- Scalability, Indexing, Querying
- Multimedia Databases
- Quality-of-Service issues
- Security
- Intrusion Detection, Watermarking
- Smart Searching for Tooling
Reduction Rate vs. Number of Moving Objects
[Figure: curves for m = 1,000 and m = 10,000; N = 100,000, q = 1,000]