Title: Spatial DBMS Research
1Spatial DBMS Research
GISt lunch meeting
Wilko Quak
2Overview
- Introduction to DBMS Query Processing
- Benchmarking a spatial DBMS
- The GeoInfoNed project
- MonetDB
- Discussion
3Introduction to DBMS query processing
- Slides borrowed from Dr. Yang He
4Query processing overview
- Review relational algebra
- Query processing
- introduction
- stages of query processing
- query optimisation
- relational algebra tree
5Relational algebra (1)
- a relational languages proposed by Codd
- implementable
- basis of high-level (SQL) query execution
- a collection of simple, 'low-level operations
used to manipulate relations - input is one or more relations
- output is one relation
6Relational algebra (2)
- Relational operations
- unary operators
- Restrict (Select) ?
- Project ?
- binary operators
- Cartesian product X
- Union ?
- Intersection ?
- Difference -
- Join
- Divide ?
P
7Example relations
- e.g. two relations Student and Registration
Student ( SID, Name, Gender ) Registration ( SID,
CID, Mark )
Registration
Student
8Queries examples (1)
- e.g. Identify all male students
- in SQL
- in relational algebra
-
- ? (GenderM) ( Student )
SELECT SID, Name, Gender FROM Student WHERE
Gender'M'
Select
9Queries examples (2)
- e.g. List students name and gender.
- in SQL
- In relational algebra
- ? Name, Gender ( Student )
SELECT Name, Gender FROM Student
Project
10Queries examples (3)
- e.g. Show student ID, name, their course ID and
marks - in SQL
- in relational algebra
SELECT s.SID, Name, CID, Mark FROM Student s,
Registration r WHERE s.SID r.SID
or
Project
Natural Join
11Queries in relational algebra
- A user query may require several operations to be
performed - relational algebra is a procedural language so
query operations are evaluated in the order
specified - a complex query can be executed in different
ways, so an efficient one should be used as
efficiency is an important DBMS requirement
query optimisation
12Query processing
- Four stages involved in query processing
- query decomposition or parsing
- query optimization
- code generation
- runtime query execution
13Query optimization (1)
- refers to the activity of choosing an efficient
execution strategy or plan for processing a query - rule-based and cost-based strategies
- database statistics in system catalog used for
cost estimation - is a prime objective of the query processing
14Query optimization (3)
- In a query processing, disk access takes most
time - The main objective of the query optimisation is
to minimize the number of disk accesses - Many DBMSs use heuristic rules for query
optimization - e.g. Perform selection and projection
operations as early as possible to reduce the
cardinality of the relation and the subsequent
process of that relation
15Query processing an example
- e.g. Show student ID, name, their course ID and
marks - in SQL
- it can be transformed into relational algebra
query
SELECT s.SID, Name, CID, Mark FROM Student s,
Registration r WHERE s.SID r.SID
or
The first one is better much less disk access
than the second
16Relational algebra query tree (2)
( ?SID, Name(Student) ) ( ?
(Registration) )
SID,CID,Mark
17Spatial Query processing
- In spatial query processing the operator is a
spatial operator, for the rest it is the same as
non-spatial query processing - Spatial Select
- Find all objects within given rectangle 99.99
- Spatial Join (overlay in GIS terms)
- Find all restaurants within national parks
18DBMS Benchmarking
- Categorization of DBMS usage
- Implications for benchmarking
- benchmark choices
19Categories of DBMS users
- Static usage
- Predefined queries with changing parameters
- Queries can be hand optimized
- Dynamic usage (browsing)
- Many different queries
- Query optimizer is important
- Access via object-relational mapping (e.g.
Hibernate) - Not discussed here
All categories need different benchmarking
20Benchmarking static DBMS usage
- Notes
- Critical factor is testing the query processor.
- Query optimizer is not important
- Benchmark
- Make small set of simple queries that test one
operation
21Benchmarking dynamic DBMS usage
- Notes
- Critical factor is testing the query optimizer.
- Very hard to get quality reproducible results.
- It is very hard to assess the quality of the
query optimizer but a small testset might give
some insight
select city.name,river.name from city,river where
city.inhabitants gt X and distance(city.geometr
y,river.geometry) lt Y
22Other benchmarking considerations
- Functionality
- Usability
- update behaviour
23GeoInfoNed RGI-232
24GeoInfoNed -- What and Why
- Build a spatially enabled DBMS because
- A DBMS is at the core of many system. If you
improve the core the whole system improves. - There is a need for an (open source)
experimentation platform for Geo DBMS research.
25Who
- CWI Leading DBMS experts with MonetDB
- TUDelft/OTB Knowledge of spatial processes
- CycloMedia Huge dataset and interesting
problems - RWS/AGI Large and diverse datasets and
interesting problems
26How
- At CWI there is the MonetDB DBMS. First we will
extend it with basic spatial types (According to
OpenGIS). - Together with our Problem Holder partners we
will find directions for more extensions. - MonetDB already has support for Image Data, XML
storage and querying etc.
27Example
- Is there a relationship between traffic accidents
and objects near the road?
?
!!
GeoInfoNed
28MonetDB Introduction
- Hardware trends
- MonetDB design considerations
- MonetDB architecture
Slides borrowed from CWI
29Hardware Trends
50 p/year - cpu speed - mem size - mem
bandwidth - disk bandwidth
1 p/year - mem latency
10 p/year - disk latency
30Latency is the enemy!
- Commercial DBMS products (oracle, DB2, SQLserver)
stem from OLTP roots - focus on minimizing random I/Os gt depend on
latency! - MonetDB built for bulk access
- optimize CPU and memory performance
Latency is one of the killing factors in Frisos
simplicial homology implementation
31MonetDB design considerations
- Multi-model database kernel support
- Extensible data types, operators, accelerators
- Database hot-set is memory resident
- Simple data structures are better
- Index management should be automatic
- Do not replicate the operating system
- Optimize when you know the situation
- Cooperative transaction management
32MonetDB product family
End-user application
XQuery
SQL
PHP
JDBC
Perl
ODBC
Python
C-mapi lib
Here a MATLAB interface and Franks life would be
easier
MAPI protocol
Monet kernels
33MonetDB - Physical data organization
- Binary Association Tables
34Discussion