Title: Spatial Access Methods
1 Spatial Access Methods and Query Processing
- Matei Lunca Geographic Information Analysis 2004
- Richardson Van Oosterom
- Advances In Spatial Data Handling
2 Contents
- Extend RDBMS for GIS/GIA
- Trees
- Query types
- The curse of dimensionality
- Approximate matches
3 Geographic Information Retrieval
- Spatial Access Methods
- Algorithms for storing and retrieving 3-D spatial data whose parts are strongly related and therefore cannot be stored in ordinary structures such as B-Trees
- Query Processing
- Data structures and database search operations in this context
- GIS questions such as a buffer around a river
4 Extending RDBMS for GIS/GIA
- In GIS, objects are organized by location and extension in space
- Because spatial objects can be arbitrarily complex, access methods for 2D objects such as minimum bounding rectangles are needed
- Curse of dimensionality!
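As an illustration of the minimum bounding rectangle (MBR) idea, here is a minimal Python sketch. The function names `mbr` and `intersects` and the sample `river` coordinates are made up for this example and do not come from the slides.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr(points: Iterable[Point]) -> Rect:
    """Smallest axis-aligned rectangle enclosing all vertices of an object."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a: Rect, b: Rect) -> bool:
    """True if two MBRs overlap or touch; a cheap test on four numbers each."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical polyline standing in for an arbitrarily complex spatial object.
river = [(0.0, 0.0), (2.0, 1.0), (3.0, 4.0), (5.0, 4.5)]
river_mbr = mbr(river)
print(river_mbr)
print(intersects(river_mbr, (4.0, 4.0, 6.0, 6.0)))
```

An index only needs to store and compare the four MBR numbers, however complex the underlying geometry is.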
5 Requirements of spatial access methods
- Dynamic
- Random access and queries must be supported
- Space efficient
- Complex spatial data can in many cases not be partitioned because of relations between objects, thus data blocks may be large and not fit in memory
- Efficiency independent of operators/distribution
- For multiple DBs storing different types of data to be joined
- Compatible with concurrency
6 Practical requirements
- Costs of computing and communicating data
- Minimize external access costs (I/O)
- Indexing: trees
- Pointers at leaves/nodes
- Searching by going down the tree
- Fast for range queries
- Hashing: address buckets
- No ordering needed
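The contrast between ordered (tree) indexing and hashing can be sketched in a few lines of Python. This is only a stand-in: a sorted list searched with `bisect` plays the role of a tree, a `dict` plays the role of hash buckets, and the keys and record names are invented for the example.

```python
import bisect

# Ordered index: a sorted key list stands in for a search tree.
keys = sorted([12, 3, 27, 8, 19, 41])

# Hash index: key -> record bucket, no ordering kept.
buckets = {k: f"record-{k}" for k in keys}

def range_query(sorted_keys, lo, hi):
    """All keys in [lo, hi]; relies on the ordering, like descending a tree."""
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[i:j]

in_range = range_query(keys, 8, 27)   # ordered structure answers ranges directly
exact = buckets.get(19)               # hashing answers exact-match lookups
print(in_range, exact)
```

A range query against the hash structure would have to probe every possible key or scan all buckets, which is why ordering matters for range queries.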
7 Challenges in Indexing
- Most DBs support
- B-Trees
- Hash tables
- Few DBs support
- R-Trees
- Region quadtree
- Why is implementation so difficult?
- Integration with query optimizer
- Providing query operators that utilize the index
- Cost model (efficiency known before implementation)
- Concurrency control and recovery techniques
8 Space Driven VS Data Driven
- Space Driven Trees
- Decomposition independent from data insertion order
- Region quadtree
- Data Driven Trees
- Space decomposed based on input data
- Point quadtree
- K-D Tree
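The difference between the two families can be made concrete with a small Python sketch. It assumes a unit-square data space; `region_path`, `PointQuadtree` and the quadrant numbering are illustrative choices, not taken from the slides.

```python
def region_path(x, y, depth):
    """Space driven: quadrant path of (x, y) under fixed midpoint splits.

    The result depends only on the coordinates, never on other data.
    Quadrant codes: 0 = SW, 1 = SE, 2 = NW, 3 = NE.
    """
    x0, y0, size = 0.0, 0.0, 1.0
    path = []
    for _ in range(depth):
        size /= 2
        qx, qy = int(x >= x0 + size), int(y >= y0 + size)
        path.append(qx + 2 * qy)
        x0, y0 = x0 + qx * size, y0 + qy * size
    return path

class PointQuadtree:
    """Data driven: every inserted point becomes a split point."""

    def __init__(self):
        self.root = None

    def insert(self, p):
        new = {"p": p, "kids": [None] * 4}
        if self.root is None:
            self.root = new
            return
        node = self.root
        while True:
            q = int(p[0] >= node["p"][0]) + 2 * int(p[1] >= node["p"][1])
            if node["kids"][q] is None:
                node["kids"][q] = new
                return
            node = node["kids"][q]

a, b = (0.7, 0.2), (0.3, 0.6)
t1, t2 = PointQuadtree(), PointQuadtree()
for p in (a, b):
    t1.insert(p)
for p in (b, a):
    t2.insert(p)

print(region_path(*a, depth=2))          # same cell whatever else is stored
print(t1.root["p"], t2.root["p"])        # root differs with insertion order
```

The region quadtree cell of a point is fixed by the space, while the shape of the point quadtree changes with the data and its insertion order.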
9 Space/Data Driven Structures
- Space driven structures: Grids
- Twin grid file
- Shuffles points between the primary and secondary file to minimize the total size
- Multilayer grid file
- Uses two or more grid files, storing objects in the first grid file where no splitting across hyperplanes is needed
- Data driven structures: R-Tree
10 Trees
- X-Tree
- TR-Tree
- IQ-, PX-, MDX-Trees
- PX-Tree
- TV-Tree
- VAM-Split Trees
11 Trees: X-Tree
- Adapts R-Trees to high dimensional data
- Overlap-free split based on split history
- R-/R*-Trees lead to high overlap
- This diminishes the advantages of hierarchical partitions
- When the split algorithm would lead to an unbalanced directory, the X-Tree omits the split and the node becomes a supernode
- Supernodes are nodes enlarged to a multiple of the block size; they are scanned linearly, which avoids splits that would result in an inefficient structure
12 Trees: X-Tree (2)
- Dynamically use overlap-minimizing splits
- Supernodes accessed sequentially if no good split
decision found for a directory node
13 Trees: TR-Tree
- Improved R-Tree
- Represent exact geometry spatial attributes
- Reduce memory operations
- Store components of 1 decomposed object
- Internal node
- Pointer child node
- Minimum bounding rectangle of trapezoids in child
- Leaf node
- Trapezoids
14 Trees: TR-Tree (2)
- Representation of Bavaria
15 Trees: IQ-, PX-, MDX-Trees
- IQ-Tree
- Index structure for query processing in high-dimensional data spaces
- Compresses data to improve query processing
- PX-Tree / Multi-Disc X-Tree
- Parallel access method
- Short response time, high query throughput
16 Trees: TV-Tree
- R-Tree-like, with varying-length feature vectors
- Telescope vector
- Divide attributes into
- Those common to all subtree items
- Those used for branching
- Those ignored
- Knowledge about the behaviour of single
attributes (their selectivity) is necessary
17 Trees: VAM-Split Trees
- VAM-Split R-Tree
- VAM-Split KD-Tree
- Static index structures
- All objects must be available when index is created
- Splits are performed at maximum variance value
- Built in memory before permanently stored on disk
- Size limited to the amount of (virtual) memory available
18 Other Trees
- The Cell Tree
- Levels of data split by arbitrary hyperplanes
- Concave objects decomposed into convex pieces, which are indexed in every cell that they overlap
- The K-D Tree
- Levels of data are split along different dimensions into non-overlapping cells
- Objects indexed in all cells they intersect
19 Other Trees (2)
- Generalized BD Tree
- Stores objects as hierarchy of minimum bounding boxes
- The P-Tree
- Hyperplanes split space hierarchically by polytopes (multidimensional boxes with nonrectangular sides)
- R-Tree is the special case in which all polytopes are boxes
- R-files
- Divide space into hierarchy of nested boxes in which objects are indexed in the lowest cell which contains them
20 Cost Models
- Curse of dimensionality: performance deteriorations
- Cost model for query processing in high-dimensional data spaces, for careful optimization of the parameters of an index
- Data space quantization
- Data compression: VA File, IQ Tree
- Reduce I/O by representing attributes in fewer bits
- Page size
- Dimension assignment
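Representing attributes in fewer bits can be sketched as a uniform grid quantization. The Python below assumes attributes normalized to [0, 1] and a fixed number of bits per dimension; it is only loosely in the spirit of the VA File, and `quantize` and `cell_bounds` are names invented for this example.

```python
def quantize(point, bits=2):
    """Replace each attribute in [0, 1] by a small integer cell number."""
    cells = 1 << bits                     # 2**bits cells per dimension
    return tuple(min(int(v * cells), cells - 1) for v in point)

def cell_bounds(code, bits=2):
    """Interval that each original attribute is known to lie in."""
    cells = 1 << bits
    return tuple((c / cells, (c + 1) / cells) for c in code)

code = quantize((0.1, 0.55, 1.0))
print(code)                # three 2-bit values instead of three floats
print(cell_bounds(code))   # bounds usable to filter before reading exact data
```

The approximation is lossy, so a query would use the cell bounds to discard candidates and read the exact vectors only for the remainder.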
21 High-dimensional data spaces: massive data sets
- Exotic data, cardinality/dimensionality
- Terabyte, petabyte
- Common problem: overfitting the data
- Common challenge: fitting a model/pattern robustly
- Compression, statistics, stochastic analysis, discrete mathematics, harmonic analysis
- Complexity and noisiness lead to constructing statistical/fuzzy models
22 The Pyramid-Technique
- Maps data from D-dimensional space to 1D so B-Trees can be used to manage data
- Data space is divided into 2·D pyramids (two per dimension)
- Pyramids partitioned into data pages of B-Tree
- No inverse transformation needed because the data and the D-dimensional key are stored
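The mapping to one dimension can be sketched as follows, assuming points normalized to the unit hypercube with the pyramid tips meeting at the center (0.5, ..., 0.5). The pyramid number is chosen by the dimension that deviates most from the center, and the height within the pyramid is that deviation. The function name `pyramid_value` and the sorted list standing in for a B-Tree are illustrative assumptions.

```python
import bisect

def pyramid_value(v):
    """Map a point in [0, 1]^D to a single number: pyramid index + height."""
    d = len(v)
    # Dimension with the largest distance from the center picks the pyramid.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    height = abs(0.5 - v[j_max])          # in [0, 0.5]
    i = j_max if v[j_max] < 0.5 else j_max + d
    return i + height

# A sorted list of (pyramid value, original point) stands in for the B-Tree;
# the full point is stored next to the key, so no inverse mapping is needed.
index = []
for p in [(0.2, 0.6), (0.4, 0.9), (0.55, 0.5)]:
    bisect.insort(index, (pyramid_value(p), p))

for key, point in index:
    print(round(key, 3), point)
```

With D = 2 the keys fall into 4 pyramids numbered 0 to 3, and the integer part of each key names its pyramid.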
23 The Pyramid-Technique (2)
- Complex queries
- Pyramid value calculated from query input
- Querying the tree with this value
- Result: D-dimensional points sharing the pyramid value, which must be scanned for the search item
- Efficient query processing only for fewer than 8 dimensions (< 8 D)
24 Query processing
- Direct VS indirect spatial search
- Direct: locating objects in a geographical area
- Indirect: queries based on non-spatial attributes
- Show geography that complies with non-spatial requirements
25 Query processing steps
- Query input
- Filter step
- Spatial index
- Candidate set
- Refinement step
- Load spatial extent
- Test spatial extent
- Hits/false drops
- Query result output
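The filter and refinement steps can be sketched in Python for a point query over polygons. Here the "spatial index" is simply a scan over minimum bounding rectangles, and the refinement is a standard ray-casting test. The object names and coordinates are invented for illustration.

```python
def mbr(poly):
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def point_in_polygon(pt, poly):
    """Exact test by ray casting: count edge crossings to the right of pt."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def point_query(pt, objects):
    x, y = pt
    # Filter step: cheap MBR test produces the candidate set.
    candidates = []
    for name, poly in objects.items():
        xmin, ymin, xmax, ymax = mbr(poly)
        if xmin <= x <= xmax and ymin <= y <= ymax:
            candidates.append(name)
    # Refinement step: load and test the exact spatial extent.
    hits = [n for n in candidates if point_in_polygon(pt, objects[n])]
    false_drops = [n for n in candidates if n not in hits]
    return hits, false_drops

objects = {
    "triangle": [(0, 0), (4, 0), (0, 4)],
    "square": [(5, 5), (7, 5), (7, 7), (5, 7)],
}
print(point_query((1, 1), objects))
print(point_query((3, 3), objects))   # inside the MBR, outside the triangle
```

The second query shows a false drop: the triangle passes the filter step because the point lies in its MBR, but fails the exact test.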
26-27 Graphical Query Example
28 Query types
- Point query/point-in-polygon query
- Parameter: coordinates
- What objects exist at these coordinates?
- Window/range query
- Parameter: region defined by coordinates
- What objects are located in this region?
- Distance and Buffer Zone queries
- Parameters: buffer object and distance
- What objects are there within the given distance from the buffer object?
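The three query types can be sketched over a small set of point objects. This is a brute-force illustration without any index, the buffer object is simplified to a single point, and the `sites` data is made up.

```python
from math import hypot

# Hypothetical point objects: name -> (x, y).
sites = {"well": (2.0, 3.0), "mill": (5.0, 1.0),
         "ford": (2.0, 3.5), "barn": (9.0, 9.0)}

def point_query(x, y):
    """What objects exist at these coordinates?"""
    return sorted(n for n, p in sites.items() if p == (x, y))

def window_query(xmin, ymin, xmax, ymax):
    """What objects are located in this region?"""
    return sorted(n for n, (x, y) in sites.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)

def distance_query(center, dist):
    """What objects lie within dist of the buffer object (here a point)?"""
    cx, cy = center
    return sorted(n for n, (x, y) in sites.items()
                  if hypot(x - cx, y - cy) <= dist)

print(point_query(2.0, 3.0))
print(window_query(1.0, 0.0, 6.0, 3.2))
print(distance_query((2.0, 3.0), 1.0))
```

A buffer around a line or polygon, such as a river, would need a point-to-geometry distance function in place of `hypot`.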
29 Query types (2)
- Path queries (network structure required)
- Parameters: network locations
- What is the shortest route from A to B?
- Join and Range queries
- Spatial objects and relationships
- Spatial predicates: points, windows, buffers, paths
- Overlaying roads and waterworks GIS layers and displaying the result according to relative height (river, bridge, aqueduct) is a spatial join
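A spatial join of two layers can be sketched as a nested loop over bounding rectangles. The layer contents below are invented, and the sketch covers only the filter step; deciding the relative height of a crossing (bridge or aqueduct) would require the exact geometries and attributes.

```python
def intersects(a, b):
    """Overlap test for rectangles given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical layers: object name -> minimum bounding rectangle.
roads = {"A1": (0, 0, 10, 1), "B7": (3, 5, 4, 9)}
waterworks = {"river": (2, -3, 3.5, 12), "canal": (20, 20, 25, 21)}

def spatial_join(layer_a, layer_b):
    """All pairs (a, b) whose rectangles intersect: candidate crossings."""
    return [(na, nb)
            for na, ra in layer_a.items()
            for nb, rb in layer_b.items()
            if intersects(ra, rb)]

crossings = spatial_join(roads, waterworks)
print(crossings)
```

The nested loop compares every pair; a spatial index on one or both layers is what lets a real system avoid that quadratic cost.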
30 Query types (3)
- Feature approach: feature vectors
- Neighborhood search
- Spatial-Query-by-Sketch
- Multimedia (2D) search instead of alphanumeric
31-34 Spatial-Query-by-Sketch: Sketcho 1.1b
35 Similarity search
- Approximate the surface by parametric functions
- Assigning the appropriate class to the query object
- Section coding: each polygon's circumcircle is decomposed into normalized sectors
- Similarity: distance between feature vectors
36 Similarity search (2)
- Shape Histograms (feature vectors!)
- Bins: complete, disjoint cells of space
- Shell Model
- Concentric uniform shells around the center
- Independent of rotation around the center
- Sector Model
- Distribute uniformly on surface (Voronoi)
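The shell model can be sketched as a histogram of distances from the center. The sketch assumes the center is the centroid of the sample points and that shells have equal width up to the farthest point; `shell_histogram` and the sample points are hypothetical.

```python
from math import dist

def shell_histogram(points, n_shells=3):
    """Count points per concentric shell around the centroid."""
    center = tuple(sum(c) / len(points) for c in zip(*points))
    radii = [dist(p, center) for p in points]
    r_max = max(radii) or 1.0             # guard against a single point
    hist = [0] * n_shells
    for r in radii:
        shell = min(int(r / r_max * n_shells), n_shells - 1)
        hist[shell] += 1
    return hist

shape = [(0, 0), (1, 0), (0, 2), (-3, 0), (2, -2)]
rotated = [(-y, x) for x, y in shape]     # 90 degree rotation about the center

print(shell_histogram(shape))
print(shell_histogram(rotated))
```

Only distances to the center enter the histogram, so rotating the shape around the center leaves the feature vector unchanged.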
37 Shape Histograms
38 Special Query Types
- Spatial continuous queries
- In dynamic environments continuous polling is necessary, because otherwise query results become meaningless
- Result, expiry time given the current motion vector, and the change that can cause expiration
- Spatio-temporal queries
- Spatiotemporal Database Systems (STDBS) track and present data about moving objects, such as GPS
- Probabilistic models are also available that attempt to plot future values in order to give faster response
39 Query pre-processing
- Pre-optimize index structure
- With specific knowledge: if we use a TIN for river network studies, valleys are more important and could be stored at high nodes in the tree
- Avoid characteristic areas: don't store the exact geometry of a chasm, but a no-go designation
40 Query processing strategies
- Parallel searches (nice split)
- In varying data structures
- Shape-based strategy
- Models the direction region
- Converts processing of direction predicates into processing of topological operations between open shapes and closed geometry objects
- Eliminates computation related to world boundary
41 Approximate Search/Match
42 Screenshots - LTRMP
43 Key points
- Defining/representing spatial context
- Space Driven VS Data Driven
- Each application has its own technique
- Approximate/Fuzzy approach
- Tree
- Hashing
- 3D histogram