CPSC 310 Database Systems - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

CPSC 310 Database Systems

Description:

(b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf ... Often, coalescing is not implemented. Too hard and not worth it! ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 42
Provided by: JeffU4
Category:

less

Transcript and Presenter's Notes

Title: CPSC 310 Database Systems


1
CPSC 310 Database Systems
  • Lecturer Anxiao (Andrew) Jiang
  • Lecture Fifteen More on Indexes

2
More on Indexes
  • Secondary Indexes
  • B-Trees

3
Secondary Indexes
  • Sometimes we want multiple indexes on a relation.
  • Ex search Candies(name,manf) both by name and by
    manufacturer
  • Typically the file would be sorted using the key
    (ex name) and the primary index would be on that
    field.
  • The secondary index is on any other attribute
    (ex manf).
  • Secondary index also facilitates finding records,
    but cannot rely on them being sorted

4
Sparse Secondary Index?
  • No!
  • Since records are not sorted on that key, cannot
    predict the location of a record from the
    location of any other record.
  • Thus secondary indexes are always dense.

5
Sequence field
  • Sparse index

6
Design of Secondary Indexes
  • Always dense, usually with duplicates
  • Consists of key-pointer pairs ("key" means search
    key, not relation key)
  • Entries in index file are sorted by key
  • Therefore second-level index is sparse

7
Secondary indexes
Sequence field
dense first- level
8
Secondary Index and Duplicate Keys
  • Scheme in previous diagram wastes space in the
    present of duplicate keys
  • If a search key value appears n times in the data
    file, then there are n entries for it in the
    index.

9
Duplicate values secondary indexes
one option...
  • Problem
  • excess overhead!
  • disk space
  • search time

10
Buckets
  • To avoid repeating values, use a level of
    indirection
  • Put buckets between the secondary index file and
    the data file
  • One entry in index for each search key K its
    pointer goes to a location in a "bucket file",
    called the bucket for K
  • Bucket holds pointers to all records with search
    key K

11
Duplicate values secondary indexes
buckets
12
Why bucket idea is useful
  • Indexes Records
  • name primary Emp (name,dept,floor,...)
  • dept secondary
  • floor secondary

13
Query SELECT name FROM Emp WHERE dept
'Toy' AND floor 2
  • Intersect Toy dept bucket and floor 2
    bucket to get set of matching EmpsSaves disk
    I/O's

14
Summary of Indexes So Far
  • Advantages
  • simple
  • index is sequential file, good for scans
  • Disadvantages
  • either inserts are expensive
  • or lose sequentiality (cf. next slide)
  • Instead use B-tree data structure to implement
    index

15
  • Example Index (sequential)
  • continuous
  • free space

10
20
30
40
50
60
70
80
90
16
B-Trees
  • Several related data structures
  • Key features are
  • automatically adjust number of levels of indexes
    as size of data file changes
  • storage on blocks is managed to keep every block
    between half full and full gt no overflow blocks
    needed
  • We'll actually study B trees

17
B-Tree Structure
  • an example of a balanced search tree every
    root-to-leaf path has same length
  • each node (vertex) in the tree is a block, which
    contains search keys and pointers
  • parameter n, which is largest value so that n1
    pointers and n keys fit in one block
  • Ex If block size is 4096 bytes, keys are 4
    bytes, and pointers are 8 bytes, then n 340.

18
Constraints on B-Tree Nodes
  • Keys in leaf nodes are copies of keys from data
    file, in sorted order
  • Root contains between 2 and n1 index node
    pointers
  • Each internal node contains between
  • ?(n1)/2? and n1 index node pointers
  • Each non-leaf node consists of ptr1,key1,ptr2,key2
    ,,keym-1,ptrm
  • where ptri points to index node with keys
    between keyi-1 and keyi

19
Constraints (cont'd)
  • Each leaf contains between ?(n1)/2? and n data
    record pointers, plus a "next leaf" pointer
  • Associated with each data record pointer is a
    key, and the pointer points to the data record
    with that key

20
In textbooks notation n3
  • Leaf
  • Non-leaf

30 35
30
35
30
30
21
Sample non-leaf
  • to keys to keys to keys to keys
  • lt 57 57? klt81 81?klt95 ?95

57 81 95
22
Sample leaf node
  • From non-leaf node
  • to next leaf
  • in sequence

57 81 95
To record with key 57 To record with key
81 To record with key 95
23
n3
  • Full node min. node
  • Non-leaf
  • Leaf

120 150 180
30
3 5 11
30 35
counts even if null
24
B-Tree Example n3
  • Root

100
120 150 180
30
3 5 11
120 130
180 200
100 101 110
150 156 179
30 35
25
Insert into Btree
  • (a) simple case
  • space available in leaf
  • (b) leaf overflow
  • (c) non-leaf overflow
  • (d) new root

26
  • (a) Insert key 32

n3
100
30
3 5 11
30 31
27
  • (a) Insert key 7

n3
100
30
3 5 11
30 31
28
  • (c) Insert key 160

n3
100
120 150 180
180 200
150 156 179
29
  • (d) New root, insert 45

n3
10 20 30
1 2 3
10 12
20 25
30 32 40
30
Deletion from B-tree
  • (a) Simple case - no example
  • (b) Coalesce with neighbor (sibling)
  • (c) Re-distribute keys
  • (d) Cases (b) or (c) at non-leaf

31
  • (b) Coalesce with sibling
  • Delete 50

n4
10 40 100
10 20 30
40 50
32
  • (c) Redistribute keys
  • Delete 50

n4
10 40 100
10 20 30 35
40 50
33
  • (d) Non-leaf coalese
  • Delete 37

n4
25
10 20
30 40
25 26
30 37
40 45
20 22
10 14
1 3
34
B-tree deletions in practice
  • Often, coalescing is not implemented
  • Too hard and not worth it!

35
Applications of B-Trees
  • B-tree is used to implement indexes
  • The data record pointers in the leaves correspond
    to the data record pointers in sequential indexes
  • Some example uses
  • B-tree search key is primary key for data file,
    leaf pointers form a dense index on the file
  • B-tree search key is primary key for data file,
    leaf pointers form a sparse index on the file
  • B-tree search key is not primary key, leaf
    pointers form a dense index on the file

36
B-Trees with Duplicate Keys
  • Change definition of B-tree
  • If key K appears in an internal node, then K is
    the smallest "new" key in the subtree S rooted at
    the pointer that follows K in the node
  • "New" means K does not appear in the part of the
    B-tree to the left of S but it does appear in S
  • Allow null key in certain situations

37
Example B-Tree with Duplicates
17
-- 37 43
7
2 3 5
23 23
43 47
13 17 23
23 37 41
7 13
38
Lookup in B-Trees
  • Assume no duplicate keys.
  • Assume B-tree is a dense index.
  • To find the record with key K, search starting
    at the root and ending at a leaf
  • if current node is not a leaf and has keys K1,
    K2, , Kn, find the largest key, Ki, in the
    sequence that is K.
  • follow the (i1)-st pointer to a node at the next
    level and repeat
  • when a leaf node is reached, find the key with
    value K and follow the associated pointer to the
    data record

39
Range Queries with B-Trees
  • Range query a query in which a range of values
    is sought. Examples
  • SELECT FROM R WHERE R.k gt 40
  • SELECT FROM R WHERE R.k gt 10 AND R.k lt 25
  • To find all keys in the range a,b
  • Do a lookup on a leads to leaf where a could be
  • Search the leaf for all keys a
  • If we find a key gt b, we are done
  • Else follow next-leaf pointer and continue
    searching in the next leaf
  • Continue until finding a key gt b or no more leaves

40
Efficiency of B-Trees
  • B-trees allow lookup, insertion and deletion of
    records with very few disk I/Os
  • Number of disk I/Os is number of levels in the
    B-tree plus cost of any reorganization
  • If n is at least 10, then splitting/merging
    blocks will be rare and usually limited to the
    leaves
  • For typical sizes of keys, pointers, blocks and
    files, 3 levels suffice (see next slide)
  • Also can keep root block of B-tree in memory

41
Size of B-Tree
  • Assume
  • 4096 bytes per block
  • 4 bytes per key (e.g., integer)
  • 8 bytes per pointer
  • no header info in the block
  • Then n 340 (can keep n keys and n1 pointers in
    a block)
  • Assume on average a block has 255 pointers
  • Count
  • one node at level 1 (the root)
  • 255 nodes at level 2
  • 255255 65,025 nodes at level 3 (leaves)
  • each leaf has 255 pointers, so total number of
    records is more than 16 million
Write a Comment
User Comments (0)
About PowerShow.com