Title: Overview of SP-GiST
1Overview of SP-GiST
- Walid G. Aref
- Department of Computer Science
- Purdue University
2Indexing
- With the emergence of non-traditional database
applications, the need for non-traditional types
of indexes is inevitable
- Example applications
3Challenges in Indexing
- Current database systems support
- B-trees, hash tables
- Very few systems support R-trees
- Fewer systems support a variant of the region
quadtree
4What is Wrong?
- Building and integrating an index type into the
database system is an overwhelming task
- Integration with the query optimizer: when to use
the index and when not, cost model, selectivity
estimation
- Providing query operators that utilize the index
- Concurrency control and recovery techniques
- Very few index structures/research address all
these issues
5SP-GiST Space-partitioning Generalized Search
Trees
- Software engineering solution to support a wide
class of indexes inside a database management
system
- GiST: supports B-tree-like indexes, e.g., R-trees
- SP-GiST: supports the class of space-partitioning
trees, e.g., variants of the quadtree, variants
of the trie, k-d tree
6SP-GiST Space-partitioning Generalized Search
Trees
- The framework provides the basic services inside
a database system, e.g.:
- Concurrency control and recovery
- Query operators
- Bulk-loading and insertion
- Integration with the query optimizer, costing,
selectivity estimation
- Node clustering
- SP-GiST supports the class of space-partitioning
(SP) trees
- Suitable for emerging database applications
- An extensible index structure that can be
instantiated to realize any member in the class
of SP trees
- Example index structures realizable by SP-GiST
- Disk-based versions of variants of the trie, all
variants of quadtrees and octrees, the k-d tree,
the bin-tree,
7SP-GiST Extensible Interfaces
- Internal Methods
- Supported by the DBMS
- Reflect similarities among the various SP-trees,
e.g., insert, delete, search, bulk-load
algorithms
- Interface Parameters and External Methods
- Extensible index interfaces
- Reflect structural and behavioral differences
among various SP-trees
- Need to be supplied by the user to instantiate a
new type of SP-tree index
8Examples on SP Trees
- Data-driven SP trees: space is decomposed based
on the input data
- Space-driven SP trees: space decomposition is
independent of the order of data insertion
- Figure panels: k-d Tree, Trie
9Examples on Index Realization inside SP-GiST
10Main Characteristics of Space-Partitioning Trees
- Decompose the space recursively into a fixed
number of disjoint partitions
- There are two types of space-partitioning trees
- Space-driven space-partitioning trees
- e.g., the trie and region quadtree
- Data-driven space-partitioning trees
- e.g., the point quadtree and the k-d tree
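The difference between the two types can be sketched in one dimension. This is a minimal illustration, not code from SP-GiST: `space_driven_depth` and `data_driven_depth` are hypothetical helper names, the space is assumed to be the interval [0, 1), and the points are assumed to be distinct. The space-driven version always splits a cell at its midpoint, so the result depends only on which points are present. The data-driven version splits a cell at the inserted point itself (a 1-D k-d tree), so the tree shape depends on insertion order.

```python
from itertools import permutations


def space_driven_depth(points, lo=0.0, hi=1.0, bucket_size=1):
    """Depth of a midpoint-split decomposition of [lo, hi).

    A cell is split in half while it holds more than bucket_size points.
    Assumes distinct points; duplicates would never separate.
    """
    inside = [p for p in points if lo <= p < hi]
    if len(inside) <= bucket_size:
        return 0
    mid = (lo + hi) / 2
    return 1 + max(
        space_driven_depth(inside, lo, mid, bucket_size),
        space_driven_depth(inside, mid, hi, bucket_size),
    )


def data_driven_depth(points):
    """Depth (in nodes) of a 1-D k-d tree built by inserting points in order.

    Each inserted point becomes a node that splits its cell at the point.
    """
    root = None  # a node is a list: [value, left_child, right_child]
    depth = 0
    for p in points:
        if root is None:
            root = [p, None, None]
            depth = 1
            continue
        node, level = root, 1
        while True:
            side = 1 if p < node[0] else 2
            if node[side] is None:
                node[side] = [p, None, None]
                depth = max(depth, level + 1)
                break
            node, level = node[side], level + 1
    return depth


points = [0.1, 0.3, 0.6, 0.8]
# Space-driven: one depth value, whatever the insertion order.
space_depths = {space_driven_depth(list(order)) for order in permutations(points)}
# Data-driven: sorted input degenerates into a chain, another order does not.
sorted_depth = data_driven_depth(sorted(points))
mixed_depth = data_driven_depth([0.3, 0.1, 0.6, 0.8])
```

Comparing `space_depths` with `sorted_depth` and `mixed_depth` shows the order dependence of the data-driven tree.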
11Index Realization using SP-GiST
- Interface Parameters
- Key type: type of the data in the leaf level of
the tree, e.g., word, point
- Number of space partitions, e.g., 4 for a
quadtree, 26 for a trie, etc.
- Node predicate: the predicate to use when
navigating the SP-tree, e.g., letter a, (x,y)
inside node.rectangle
- Bucket size: the maximum number of
items that a leaf node can hold
12Index Realization using SP-GiST
- Resolution: the maximum number of space
decompositions, set based on the space and the
granularity required
13Interface Parameters for SP-GiST
- PathShrink: NeverShrink, LeafShrink, or TreeShrink
14Interface Parameters for SP-GiST
- NodeShrink: determines whether empty partitions
should be kept in the tree or not
- E.g., tree vs. forest
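The interface parameters listed on the preceding slides can be gathered into one record per index type. The sketch below is illustrative only: the class `SPGiSTParams`, its field names, and the enum `PathShrink` are hypothetical and are not the SP-GiST API. The partition counts (4 for a quadtree, 26 for a trie) come from the slides. The bucket sizes, resolutions, and shrink settings are made-up values chosen just to fill in the record.

```python
from dataclasses import dataclass
from enum import Enum


class PathShrink(Enum):
    # The three policy names listed on the PathShrink slide.
    NEVER_SHRINK = "NeverShrink"
    LEAF_SHRINK = "LeafShrink"
    TREE_SHRINK = "TreeShrink"


@dataclass(frozen=True)
class SPGiSTParams:
    """Hypothetical container for the interface parameters of one SP-tree."""
    key_type: type              # type of the data at the leaf level
    num_space_partitions: int   # fan-out of each decomposition step
    node_predicate: str         # human-readable description of the predicate
    bucket_size: int            # max items a leaf node can hold
    resolution: int             # max number of space decompositions
    path_shrink: PathShrink
    node_shrink: bool           # True: drop empty partitions

    def __post_init__(self):
        if self.num_space_partitions < 2:
            raise ValueError("a space partition needs at least 2 parts")
        if self.bucket_size < 1 or self.resolution < 1:
            raise ValueError("bucket_size and resolution must be positive")


# Illustrative instantiations; only the partition counts come from the slides.
QUADTREE = SPGiSTParams(
    key_type=tuple,
    num_space_partitions=4,
    node_predicate="(x, y) inside node.rectangle",
    bucket_size=8,
    resolution=16,
    path_shrink=PathShrink.LEAF_SHRINK,
    node_shrink=False,
)
TRIE = SPGiSTParams(
    key_type=str,
    num_space_partitions=26,
    node_predicate="letter",
    bucket_size=8,
    resolution=32,
    path_shrink=PathShrink.TREE_SHRINK,
    node_shrink=True,
)
```

Keeping the parameters in an immutable record makes it clear that they describe the index type, not the state of any particular tree.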
15Index Realization using SP-GiST
- External Methods
- Consistent: a boolean function to guide the
search in the tree
- PickSplit: defines the way for splitting nodes in
the tree
- Cluster: defines how tree nodes are clustered
into disk pages (SP-GiST provides a default
clustering algorithm)
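As a rough illustration of what a user might supply, here is a sketch of Consistent and PickSplit for a quadtree-style index over 2-D points. The function signatures, the `Rect` type, and the half-open rectangle convention are assumptions made for this example; the slides do not give the actual SP-GiST signatures. Cluster is omitted because the framework is said to provide a default.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Half-open rectangle [x0, x1) x [y0, y1), used as the node predicate."""
    x0: float
    y0: float
    x1: float
    y1: float


def consistent(rect, point):
    """Consistent: does the search need to descend into this node?"""
    x, y = point
    return rect.x0 <= x < rect.x1 and rect.y0 <= y < rect.y1


def pick_split(rect, points):
    """PickSplit: cut an overfull node into 4 equal quadrants.

    Returns a list of (child_rect, child_points) pairs. Because the
    quadrants are disjoint and cover rect, each point lands in exactly one.
    """
    mx = (rect.x0 + rect.x1) / 2
    my = (rect.y0 + rect.y1) / 2
    quadrants = [
        Rect(rect.x0, rect.y0, mx, my),   # lower-left
        Rect(mx, rect.y0, rect.x1, my),   # lower-right
        Rect(rect.x0, my, mx, rect.y1),   # upper-left
        Rect(mx, my, rect.x1, rect.y1),   # upper-right
    ]
    return [(q, [p for p in points if consistent(q, p)]) for q in quadrants]


space = Rect(0, 0, 8, 8)
parts = pick_split(space, [(1, 1), (5, 1), (1, 5), (7, 7)])
```

Here the split point is the cell midpoint, which makes this a space-driven instantiation; a data-driven one would choose the split from the points themselves.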
16SP-GiST Internal Methods
- Internal methods are provided by the SP-GiST
framework and do not need recoding
- Search: traverses the tree using the Consistent
external method
- Insert / Delete: build the tree using the
PickSplit and Cluster external methods
- Bulk Load: uses a general bulk-loading algorithm,
Direct Buffering Bulk Loading (DBDL) [ICDE 2004],
that can bulk load any SP-tree
- Bulk Insert: uses a general bulk-insertion
algorithm, Buffer Tree Bulk Insertion (BTBI)
[ICDE 2004], that can bulk insert a group of
objects in any SP-tree
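The division of labor between internal and external methods can be sketched as a generic tree whose insert and search are written once and parameterized by user callbacks. Everything here is a simplified, in-memory assumption: the class names `Node` and `SPTree`, the callback signatures, and the omission of deletion, clustering, and bulk operations are choices made for the example, not a description of the real framework.

```python
class Node:
    def __init__(self, predicate):
        self.predicate = predicate
        self.items = []        # keys stored here while the node is a leaf
        self.children = None   # list of child nodes once the node is split


class SPTree:
    """Generic insert/search driven by user-supplied external methods.

    consistent(predicate, key) -> bool
    pick_split(predicate, items) -> list of child predicates that are
        disjoint and together cover the parent (assumed, not checked).
    """

    def __init__(self, root_predicate, consistent, pick_split,
                 bucket_size, resolution):
        self.root = Node(root_predicate)
        self.consistent = consistent
        self.pick_split = pick_split
        self.bucket_size = bucket_size
        self.resolution = resolution

    def _child_for(self, node, key):
        for child in node.children:
            if self.consistent(child.predicate, key):
                return child
        return None

    def insert(self, key):
        if not self.consistent(self.root.predicate, key):
            raise ValueError("key lies outside the indexed space")
        node, depth = self.root, 0
        while node.children is not None:
            node = self._child_for(node, key)
            depth += 1
        node.items.append(key)
        self._split(node, depth)

    def _split(self, node, depth):
        # Stop at bucket_size, or at the maximum resolution.
        if len(node.items) <= self.bucket_size or depth >= self.resolution:
            return
        preds = self.pick_split(node.predicate, node.items)
        node.children = [Node(p) for p in preds]
        items, node.items = node.items, []
        for key in items:
            self._child_for(node, key).items.append(key)
        for child in node.children:
            self._split(child, depth + 1)

    def search(self, key):
        node = self.root
        while node is not None and node.children is not None:
            node = self._child_for(node, key)
        return node is not None and key in node.items


# Instantiation: a 1-D midpoint-split tree over the interval [0, 1).
def interval_consistent(pred, key):
    lo, hi = pred
    return lo <= key < hi


def interval_pick_split(pred, items):
    lo, hi = pred
    mid = (lo + hi) / 2
    return [(lo, mid), (mid, hi)]


tree = SPTree((0.0, 1.0), interval_consistent, interval_pick_split,
              bucket_size=2, resolution=8)
for k in (0.1, 0.2, 0.3, 0.9):
    tree.insert(k)
```

Swapping in different `consistent` and `pick_split` callbacks (and a different root predicate) yields a different member of the SP-tree class without touching `insert` or `search`.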
17SP-GiST Details
- More detail
- Ilyas and Aref, JIIS 2001, SSDBM 2001
- www.cs.purdue.edu/faculty/aref.html