Title: Foundations of Database Systems
1Foundations of Database Systems
- Indexing
- Instructor Zhijun Wang
DBMS and Applications
1
2Announcements
- Quiz2 will be given next week
DBMS and Applications
2
3Physical Database Design
DBMS and Applications
3
4File structures
- Selecting among alternative file structures is
one of the most important choices in physical
database design. - In order to choose intelligently, you must
understand characteristics of available file
structures.
DBMS and Applications
4
5Sequential Files
- Simplest kind of file structure
- Unordered insertion order
- Ordered key order
- Simple to maintain
- Provide good performance for processing large
numbers of records
DBMS and Applications
5
6Unordered Sequential File
DBMS and Applications
6
7Ordered Sequential File
DBMS and Applications
7
8Indexes
- Indexes are special data structures used to
improve database performance - SQL Server automatically creates an index on all
primary and foreign keys - Additional indexes may be assigned on other
columns that are - Frequently used in WHERE clauses
- Used for sorting data
DBMS and Applications
8
9Indexes
- SQL Server supports two kinds of indexes
- Clustered index the data are stored in the
bottom level of the index and in the same order
as that index - Nonclustered index the bottom level of an index
contains pointers to the data - Clustered indexes are faster than nonclustered
indexes for updating and retrieval
DBMS and Applications
9
10Creating an Index By GUI in Enterprise Manager
DBMS and Applications
10
11Hash Files
- Support fast access unique key value
- Converts a key value into a physical record
address - Mod function typical hash function
- Divisor large prime number close to the file
capacity - Physical record number hash function plus the
starting physical record number
DBMS and Applications
11
12Example Hash Function Calculations for StdSSN Key
Assume the file capacity is 100, 97 is the
biggest prime number less than to 100.
Physical Record
DBMS and Applications
12
13Hash File after Insertions
DBMS and Applications
13
14Linear Probe Collision Handling During an Insert
Operation
DBMS and Applications
14
15Multi-Way Tree (Btrees) Files
- A popular file structure supported by most DBMSs.
- Btree provides good performance on both
sequential search and key search.
DBMS and Applications
15
16Properties of Btrees
- Balanced all leaf nodes reside on the same level
- Bushy the number of branches is large, 50-200
- Block-oriented each node in a tree is a block
- Dynamic the shape of a Btree is changed as logic
records are inserted and deleted - Ubiquitous the Btree is a widely implemented and
used file strcture. -
DBMS and Applications
16
17Structure of a Btree of Height 3
DBMS and Applications
17
18Btree Node Containing Keys and Pointers
DBMS and Applications
18
19Btree Insertion Examples
DBMS and Applications
19
20Btree Deletion Examples
DBMS and Applications
20
21B Tree The Most Widely-used Index
- Supports equality and range-searches efficiently.
- Minimum 50 occupancy (except for root).
- Each node contains d lt m lt 2d entries.
- The parameter d is called the order of the tree.
DBMS and Applications
21
22An Example of a B Tree
- Search begins at root, and key comparisons direct
it to a leaf. - Search for 5, 15, all data entries gt 24 ...
- Based on the search for 15 in the appropriate
leaf, we know it is not in the tree!
DBMS and Applications
22
23Inserting a Data Entry into a B Tree
- Find correct leaf L.
- Put data entry onto L.
- If L has enough space, done!
- Else, must split L (into L and a new node L2)
- Redistribute entries evenly, copy up middle key.
- Insert index entry (into parent of L) pointing to
L2. - This can happen recursively
- To split index node, redistribute entries evenly,
but push up middle key. (Contrast with leaf
splits.) - Splits grow tree root split increases height.
- Tree growth gets wider or one level taller at
top.
DBMS and Applications
23
24Inserting 8 into Example B Tree
- Observe how minimum occupancy is guaranteed in
both leaf and index pg splits. - Note difference between copy-up and push-up be
sure you understand the reasons for this.
DBMS and Applications
24
25Example B Tree After Inserting 8
- Notice that root was split, leading to increase
in height. - In this example, we can avoid split by
re-distributing entries - however, this is usually not done in practice.
DBMS and Applications
25
26Deleting a Data Entry from a B Tree
- Start at root, find leaf L where entry belongs.
- Remove the entry.
- If L is at least half-full, done!
- If L has only d-1 entries,
- Try to re-distribute, borrowing from sibling
(adjacent node with same parent as L). - If re-distribution fails, merge L and sibling.
- If merge occurred, must delete entry (pointing to
L or sibling) from parent of L. - Merge could propagate to root, decreasing height.
DBMS and Applications
26
27Example Tree After (Inserting 8, Then) Deleting
20 and 22 ...
- Deleting 20 is easy.
- Deleting 22 is done with re-distribution.
- Notice how middle key (27) is copied up.
DBMS and Applications
27
28... And Then Deleting 24
- Must merge leaf nodes.
- Observe tossof index entry 27 (on right), and
pull down of index entry 19 (below).
DBMS and Applications
28
29Example of Non-leaf Re-distribution
- Tree is shown below during deletion of 24.
- In contrast to previous example, can
re-distribute entry from left child of root to
right child.
DBMS and Applications
29
30After Re-distribution
- Intuitively, entries are re-distributed by
pushing through the splitting entry in the
parent node. - It suffices to re-distribute index entry with key
20 weve re-distributed 17 as well for
illustration.
DBMS and Applications
30
31Cost of Operations
- The height of Btree dominates the number of
physical record accesses operation. - Logarithmic search cost
- Upper bound of height log function
- Log base minimum number of keys in a node
- The cost to insert a key the cost to locate
the nearest key the cost to change nodes.
DBMS and Applications
31
32BTree
- Provides improved performance on sequential and
range searches. - In a Btree, all keys are redundantly stored in
the leaf nodes. - To ensure that physical records are not replaced,
the Btree variation is usually implemented.
DBMS and Applications
32
33Index Matching
- Determining usage of an index for a query
- Complexity of condition determines match.
- Single column indexes , lt, gt, lt, gt, IN ltlist
of valuesgt, BETWEEN, IS NULL, LIKE Pattern
(meta character not the first symbol) - Composite indexes more complex and restrictive
rules
DBMS and Applications
33
34Bitmap Index
- Can be useful for stable columns with few values
- Bitmap
- String of bits 0 (no match) or 1 (match)
- One bit for each row
- Bitmap index record
- Column value
- Bitmap
- DBMS converts bit position into row identifier.
DBMS and Applications
34
35Bitmap Index Example
Bitmap Index on FacRank
Faculty Table
DBMS and Applications
35
36Bitmap Join Index
- Bitmap identifies rows of a related table.
- Represents a precomputed join
- Can define for a join column or a non-join column
- Typically used in query dominated environments
such as data warehouses
DBMS and Applications
36
37Summary of File Structures
DBMS and Applications
37
38Index Selection
- Most important decision
- Difficult decision
- Choice of clustered and nonclustered indexes
DBMS and Applications
38
39Clustering Index Example
DBMS and Applications
39
40Nonclustering Index Example
DBMS and Applications
40
41Inputs and Outputs of Index Selection
DBMS and Applications
41
42Trade-offs in Index Selection
- Balance retrieval against update performance
- Nonclustering index usage
- Few rows satisfy the condition in the query
- Join column usage if a small number of rows
result in child table - Clustering index usage
- Larger number of rows satisfy a condition than
for nonclustering index - Use in sort merge join algorithm to avoid sorting
- More expensive to maintain
DBMS and Applications
42
43Difficulties of Index Selection
- Application weights are difficult to specify.
- Distribution of parameter values needed
- Behavior of the query optimization component must
be known. - The number of choices is large.
- Index choices can be interrelated.
DBMS and Applications
43
44Selection Rules
- Rule 1 A primary key is a good candidate for a
clustering index. - Rule 2 To support joins, consider indexes on
foreign keys. - Rule 3 A column with many values may be a good
choice for a non-clustering index if it is used
in equality conditions. - Rule 4 A column used in highly selective range
conditions is a good candidate for a
non-clustering index.
DBMS and Applications
44
45Selection Rules
- Rule 5 A frequently updated column is not a good
index candidate. - Rule 6 Volatile tables (lots of insertions and
deletions) should not have many indexes. - Rule 7 Stable columns with few values are good
candidates for bitmap indexes if the columns
appear in WHERE conditions. - Rule 8 Avoid indexes on combinations of columns.
Most optimization components can use multiple
indexes on the same table.
DBMS and Applications
45
46Index Creation
- To create the indexes, the CREATE INDEX statement
can be used. - The word following the INDEX keyword is the name
of the index. - CREATE INDEX is not part of SQL1999.
- Example
-
DBMS and Applications
46