Foundations of Database Systems - PowerPoint PPT Presentation

1 / 46
About This Presentation
Title:

Foundations of Database Systems

Description:

Foundations of Database Systems Indexing Instructor: Zhijun Wang DBMS and Applications * * This depicts a bitmap column index for a sample Faculty table. – PowerPoint PPT presentation

Number of Views:97
Avg rating:3.0/5.0
Slides: 47
Provided by: hkpu
Category:

less

Transcript and Presenter's Notes

Title: Foundations of Database Systems


1
Foundations of Database Systems
  • Indexing
  • Instructor Zhijun Wang

DBMS and Applications
1
2
Announcements
  • Quiz2 will be given next week

DBMS and Applications
2
3
Physical Database Design
DBMS and Applications
3
4
File structures
  • Selecting among alternative file structures is
    one of the most important choices in physical
    database design.
  • In order to choose intelligently, you must
    understand characteristics of available file
    structures.

DBMS and Applications
4
5
Sequential Files
  • Simplest kind of file structure
  • Unordered insertion order
  • Ordered key order
  • Simple to maintain
  • Provide good performance for processing large
    numbers of records

DBMS and Applications
5
6
Unordered Sequential File
DBMS and Applications
6
7
Ordered Sequential File
DBMS and Applications
7
8
Indexes
  • Indexes are special data structures used to
    improve database performance
  • SQL Server automatically creates an index on all
    primary and foreign keys
  • Additional indexes may be assigned on other
    columns that are
  • Frequently used in WHERE clauses
  • Used for sorting data

DBMS and Applications
8
9
Indexes
  • SQL Server supports two kinds of indexes
  • Clustered index the data are stored in the
    bottom level of the index and in the same order
    as that index
  • Nonclustered index the bottom level of an index
    contains pointers to the data
  • Clustered indexes are faster than nonclustered
    indexes for updating and retrieval

DBMS and Applications
9
10
Creating an Index By GUI in Enterprise Manager
DBMS and Applications
10
11
Hash Files
  • Support fast access unique key value
  • Converts a key value into a physical record
    address
  • Mod function typical hash function
  • Divisor large prime number close to the file
    capacity
  • Physical record number hash function plus the
    starting physical record number

DBMS and Applications
11
12
Example Hash Function Calculations for StdSSN Key
Assume the file capacity is 100, 97 is the
biggest prime number less than to 100.
Physical Record
DBMS and Applications
12
13
Hash File after Insertions
DBMS and Applications
13
14
Linear Probe Collision Handling During an Insert
Operation
DBMS and Applications
14
15
Multi-Way Tree (Btrees) Files
  • A popular file structure supported by most DBMSs.
  • Btree provides good performance on both
    sequential search and key search.

DBMS and Applications
15
16
Properties of Btrees
  • Balanced all leaf nodes reside on the same level
  • Bushy the number of branches is large, 50-200
  • Block-oriented each node in a tree is a block
  • Dynamic the shape of a Btree is changed as logic
    records are inserted and deleted
  • Ubiquitous the Btree is a widely implemented and
    used file strcture.

DBMS and Applications
16
17
Structure of a Btree of Height 3
DBMS and Applications
17
18
Btree Node Containing Keys and Pointers
DBMS and Applications
18
19
Btree Insertion Examples
DBMS and Applications
19
20
Btree Deletion Examples
DBMS and Applications
20
21
B Tree The Most Widely-used Index
  • Supports equality and range-searches efficiently.
  • Minimum 50 occupancy (except for root).
  • Each node contains d lt m lt 2d entries.
  • The parameter d is called the order of the tree.

DBMS and Applications
21
22
An Example of a B Tree
  • Search begins at root, and key comparisons direct
    it to a leaf.
  • Search for 5, 15, all data entries gt 24 ...
  • Based on the search for 15 in the appropriate
    leaf, we know it is not in the tree!

DBMS and Applications
22
23
Inserting a Data Entry into a B Tree
  • Find correct leaf L.
  • Put data entry onto L.
  • If L has enough space, done!
  • Else, must split L (into L and a new node L2)
  • Redistribute entries evenly, copy up middle key.
  • Insert index entry (into parent of L) pointing to
    L2.
  • This can happen recursively
  • To split index node, redistribute entries evenly,
    but push up middle key. (Contrast with leaf
    splits.)
  • Splits grow tree root split increases height.
  • Tree growth gets wider or one level taller at
    top.

DBMS and Applications
23
24
Inserting 8 into Example B Tree
  • Observe how minimum occupancy is guaranteed in
    both leaf and index pg splits.
  • Note difference between copy-up and push-up be
    sure you understand the reasons for this.

DBMS and Applications
24
25
Example B Tree After Inserting 8
  • Notice that root was split, leading to increase
    in height.
  • In this example, we can avoid split by
    re-distributing entries
  • however, this is usually not done in practice.

DBMS and Applications
25
26
Deleting a Data Entry from a B Tree
  • Start at root, find leaf L where entry belongs.
  • Remove the entry.
  • If L is at least half-full, done!
  • If L has only d-1 entries,
  • Try to re-distribute, borrowing from sibling
    (adjacent node with same parent as L).
  • If re-distribution fails, merge L and sibling.
  • If merge occurred, must delete entry (pointing to
    L or sibling) from parent of L.
  • Merge could propagate to root, decreasing height.

DBMS and Applications
26
27
Example Tree After (Inserting 8, Then) Deleting
20 and 22 ...
  • Deleting 20 is easy.
  • Deleting 22 is done with re-distribution.
  • Notice how middle key (27) is copied up.

DBMS and Applications
27
28
... And Then Deleting 24
  • Must merge leaf nodes.
  • Observe tossof index entry 27 (on right), and
    pull down of index entry 19 (below).

DBMS and Applications
28
29
Example of Non-leaf Re-distribution
  • Tree is shown below during deletion of 24.
  • In contrast to previous example, can
    re-distribute entry from left child of root to
    right child.

DBMS and Applications
29
30
After Re-distribution
  • Intuitively, entries are re-distributed by
    pushing through the splitting entry in the
    parent node.
  • It suffices to re-distribute index entry with key
    20 weve re-distributed 17 as well for
    illustration.

DBMS and Applications
30
31
Cost of Operations
  • The height of Btree dominates the number of
    physical record accesses operation.
  • Logarithmic search cost
  • Upper bound of height log function
  • Log base minimum number of keys in a node
  • The cost to insert a key the cost to locate
    the nearest key the cost to change nodes.

DBMS and Applications
31
32
BTree
  • Provides improved performance on sequential and
    range searches.
  • In a Btree, all keys are redundantly stored in
    the leaf nodes.
  • To ensure that physical records are not replaced,
    the Btree variation is usually implemented.

DBMS and Applications
32
33
Index Matching
  • Determining usage of an index for a query
  • Complexity of condition determines match.
  • Single column indexes , lt, gt, lt, gt, IN ltlist
    of valuesgt, BETWEEN, IS NULL, LIKE Pattern
    (meta character not the first symbol)
  • Composite indexes more complex and restrictive
    rules

DBMS and Applications
33
34
Bitmap Index
  • Can be useful for stable columns with few values
  • Bitmap
  • String of bits 0 (no match) or 1 (match)
  • One bit for each row
  • Bitmap index record
  • Column value
  • Bitmap
  • DBMS converts bit position into row identifier.

DBMS and Applications
34
35
Bitmap Index Example
Bitmap Index on FacRank
Faculty Table
DBMS and Applications
35
36
Bitmap Join Index
  • Bitmap identifies rows of a related table.
  • Represents a precomputed join
  • Can define for a join column or a non-join column
  • Typically used in query dominated environments
    such as data warehouses

DBMS and Applications
36
37
Summary of File Structures
DBMS and Applications
37
38
Index Selection
  • Most important decision
  • Difficult decision
  • Choice of clustered and nonclustered indexes

DBMS and Applications
38
39
Clustering Index Example
DBMS and Applications
39
40
Nonclustering Index Example
DBMS and Applications
40
41
Inputs and Outputs of Index Selection
DBMS and Applications
41
42
Trade-offs in Index Selection
  • Balance retrieval against update performance
  • Nonclustering index usage
  • Few rows satisfy the condition in the query
  • Join column usage if a small number of rows
    result in child table
  • Clustering index usage
  • Larger number of rows satisfy a condition than
    for nonclustering index
  • Use in sort merge join algorithm to avoid sorting
  • More expensive to maintain

DBMS and Applications
42
43
Difficulties of Index Selection
  • Application weights are difficult to specify.
  • Distribution of parameter values needed
  • Behavior of the query optimization component must
    be known.
  • The number of choices is large.
  • Index choices can be interrelated.

DBMS and Applications
43
44
Selection Rules
  • Rule 1 A primary key is a good candidate for a
    clustering index.
  • Rule 2 To support joins, consider indexes on
    foreign keys.
  • Rule 3 A column with many values may be a good
    choice for a non-clustering index if it is used
    in equality conditions.
  • Rule 4 A column used in highly selective range
    conditions is a good candidate for a
    non-clustering index.

DBMS and Applications
44
45
Selection Rules
  • Rule 5 A frequently updated column is not a good
    index candidate.
  • Rule 6 Volatile tables (lots of insertions and
    deletions) should not have many indexes.
  • Rule 7 Stable columns with few values are good
    candidates for bitmap indexes if the columns
    appear in WHERE conditions.
  • Rule 8 Avoid indexes on combinations of columns.
    Most optimization components can use multiple
    indexes on the same table.

DBMS and Applications
45
46
Index Creation
  • To create the indexes, the CREATE INDEX statement
    can be used.
  • The word following the INDEX keyword is the name
    of the index.
  • CREATE INDEX is not part of SQL1999.
  • Example

DBMS and Applications
46
Write a Comment
User Comments (0)
About PowerShow.com