CPSC 310 Database Systems - PowerPoint PPT Presentation

1 / 41

About This Presentation

Title:

CPSC 310 Database Systems

Description:

(b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf ... Often, coalescing is not implemented. Too hard and not worth it! ... – PowerPoint PPT presentation

Number of Views:24

Avg rating:3.0/5.0

Slides: 42

Provided by: JeffU4

Category:

more less

Transcript and Presenter's Notes

Title: CPSC 310 Database Systems

1
CPSC 310 Database Systems

Lecturer Anxiao (Andrew) Jiang
Lecture Fifteen More on Indexes

2
More on Indexes

Secondary Indexes
B-Trees

3
Secondary Indexes

Sometimes we want multiple indexes on a relation.
Ex search Candies(name,manf) both by name and by
manufacturer
Typically the file would be sorted using the key
(ex name) and the primary index would be on that
field.
The secondary index is on any other attribute
(ex manf).
Secondary index also facilitates finding records,
but cannot rely on them being sorted

4
Sparse Secondary Index?

No!
Since records are not sorted on that key, cannot
predict the location of a record from the
location of any other record.
Thus secondary indexes are always dense.

5
Sequence field

Sparse index

6
Design of Secondary Indexes

Always dense, usually with duplicates
Consists of key-pointer pairs ("key" means search
key, not relation key)
Entries in index file are sorted by key
Therefore second-level index is sparse

7
Secondary indexes
Sequence field
dense first- level
8
Secondary Index and Duplicate Keys

Scheme in previous diagram wastes space in the
present of duplicate keys
If a search key value appears n times in the data
file, then there are n entries for it in the
index.

9
Duplicate values secondary indexes
one option...

Problem
excess overhead!
disk space
search time

10
Buckets

To avoid repeating values, use a level of
indirection
Put buckets between the secondary index file and
the data file
One entry in index for each search key K its
pointer goes to a location in a "bucket file",
called the bucket for K
Bucket holds pointers to all records with search
key K

11
Duplicate values secondary indexes
buckets
12
Why bucket idea is useful

Indexes Records
name primary Emp (name,dept,floor,...)
dept secondary
floor secondary

13
Query SELECT name FROM Emp WHERE dept
'Toy' AND floor 2

Intersect Toy dept bucket and floor 2
bucket to get set of matching EmpsSaves disk
I/O's

14
Summary of Indexes So Far

Advantages
simple
index is sequential file, good for scans
Disadvantages
either inserts are expensive
or lose sequentiality (cf. next slide)
Instead use B-tree data structure to implement
index

Example Index (sequential)
continuous
free space

10
20
30
40
50
60
70
80
90
16
B-Trees

Several related data structures
Key features are
automatically adjust number of levels of indexes
as size of data file changes
storage on blocks is managed to keep every block
between half full and full gt no overflow blocks
needed
We'll actually study B trees

17
B-Tree Structure

an example of a balanced search tree every
root-to-leaf path has same length
each node (vertex) in the tree is a block, which
contains search keys and pointers
parameter n, which is largest value so that n1
pointers and n keys fit in one block
Ex If block size is 4096 bytes, keys are 4
bytes, and pointers are 8 bytes, then n 340.

18
Constraints on B-Tree Nodes

Keys in leaf nodes are copies of keys from data
file, in sorted order
Root contains between 2 and n1 index node
pointers
Each internal node contains between
?(n1)/2? and n1 index node pointers
Each non-leaf node consists of ptr1,key1,ptr2,key2
,,keym-1,ptrm
where ptri points to index node with keys
between keyi-1 and keyi

19
Constraints (cont'd)

Each leaf contains between ?(n1)/2? and n data
record pointers, plus a "next leaf" pointer
Associated with each data record pointer is a
key, and the pointer points to the data record
with that key

20
In textbooks notation n3

Leaf
Non-leaf

30 35
30
35
30
30
21
Sample non-leaf

to keys to keys to keys to keys
lt 57 57? klt81 81?klt95 ?95

57 81 95
22
Sample leaf node

From non-leaf node
to next leaf
in sequence

57 81 95
To record with key 57 To record with key
81 To record with key 95
23
n3

Full node min. node
Non-leaf
Leaf

120 150 180
30
3 5 11
30 35
counts even if null
24
B-Tree Example n3

Root

100
120 150 180
30
3 5 11
120 130
180 200
100 101 110
150 156 179
30 35
25
Insert into Btree

(a) simple case
space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key 32

n3
100
30
3 5 11
30 31
27

(a) Insert key 7

n3
100
30
3 5 11
30 31
28

n3
100
120 150 180
180 200
150 156 179
29

(d) New root, insert 45

n3
10 20 30
1 2 3
10 12
20 25
30 32 40
30
Deletion from B-tree

(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling
Delete 50

n4
10 40 100
10 20 30
40 50
32

(c) Redistribute keys
Delete 50

n4
10 40 100
10 20 30 35
40 50
33

(d) Non-leaf coalese
Delete 37

n4
25
10 20
30 40
25 26
30 37
40 45
20 22
10 14
1 3
34
B-tree deletions in practice

Often, coalescing is not implemented
Too hard and not worth it!

35
Applications of B-Trees

B-tree is used to implement indexes
The data record pointers in the leaves correspond
to the data record pointers in sequential indexes
Some example uses
B-tree search key is primary key for data file,
leaf pointers form a dense index on the file
B-tree search key is primary key for data file,
leaf pointers form a sparse index on the file
B-tree search key is not primary key, leaf
pointers form a dense index on the file

36
B-Trees with Duplicate Keys

Change definition of B-tree
If key K appears in an internal node, then K is
the smallest "new" key in the subtree S rooted at
the pointer that follows K in the node
"New" means K does not appear in the part of the
B-tree to the left of S but it does appear in S
Allow null key in certain situations

37
Example B-Tree with Duplicates
17
-- 37 43
7
2 3 5
23 23
43 47
13 17 23
23 37 41
7 13
38
Lookup in B-Trees

Assume no duplicate keys.
Assume B-tree is a dense index.
To find the record with key K, search starting
at the root and ending at a leaf
if current node is not a leaf and has keys K1,
K2, , Kn, find the largest key, Ki, in the
sequence that is K.
follow the (i1)-st pointer to a node at the next
level and repeat
when a leaf node is reached, find the key with
value K and follow the associated pointer to the
data record

39
Range Queries with B-Trees

Range query a query in which a range of values
is sought. Examples
SELECT FROM R WHERE R.k gt 40
SELECT FROM R WHERE R.k gt 10 AND R.k lt 25
To find all keys in the range a,b
Do a lookup on a leads to leaf where a could be
Search the leaf for all keys a
If we find a key gt b, we are done
Else follow next-leaf pointer and continue
searching in the next leaf
Continue until finding a key gt b or no more leaves

40
Efficiency of B-Trees

B-trees allow lookup, insertion and deletion of
records with very few disk I/Os
Number of disk I/Os is number of levels in the
B-tree plus cost of any reorganization
If n is at least 10, then splitting/merging
blocks will be rare and usually limited to the
leaves
For typical sizes of keys, pointers, blocks and
files, 3 levels suffice (see next slide)
Also can keep root block of B-tree in memory

41
Size of B-Tree