SQL: Queries, Programming, Triggers - PowerPoint PPT Presentation

About This Presentation
Title:

SQL: Queries, Programming, Triggers

Description:

See the notes for information on how the s are organized. – PowerPoint PPT presentation

Number of Views:103
Avg rating:3.0/5.0
Slides: 73
Provided by: RaghuRa78
Category:

less

Transcript and Presenter's Notes

Title: SQL: Queries, Programming, Triggers


1
Database Systems II Index Structures
2
Introduction
  • We have discussed the organization of records in
    secondary storage blocks.
  • Records have an address, either logical or
    physical.
  • But SQL queries reference attribute values, not
    record addresses.
  • SELECT FROM R WHERE a10
  • How to find the records that have certain
    specified attribute values?

3
Introduction
recordID1 recordID2. . .
?
value
blocks holding records
matchingrecords
value
index
4
Index-Structure Basics
  • Storage structures consist of files.
  • Data files store, e.g., the records of a
    relation.
  • Search key one or more attributes for which we
    want to be able to search efficiently.
  • Index file over a data file for some search key
    associates search key values with pointers to
    (recordID rid) data file records that have this
    value.
  • Sequential file records sorted according to
    their primary key.

5
Index-Structure Basics
Sequential File
6
Index-Structure Basics
  • Three alternatives for data entries k- record
    with key value k- ltk, rid of record with search
    key value kgt- ltk, list of rids of records with
    search key kgt
  • Choice is orthogonal to the indexing technique
    used to locate entries k
  • Two major indexing techniques-
    tree-structures- hash tables.

7
Index-Structure Basics
  • Dense index one index entry for every record in
    the data file.
  • Sparse index index entries only for some of the
    record in the data file. Typically, one entry per
    block of the data file.
  • Primary index determines the location of data
    file records, i.e. order of index entries same as
    order of data records.
  • Secondary index does not determine data location.
  • Can only have one primary index, but multiple
    secondary indexes.

8
Index-Structure Basics
Sequential File
Dense Index
9
Index-Structure Basics
Sequential File
Sparse Index
10
Index-Structure Basics
  • Duplicate key values
  • sparse index
  • data entry for first new key from block

10
20
30
40
11
Index-Structure Basics
  • Sparse index - requires less index space per
    record, - can keep more of index in memory,-
    needed for secondary indexes.
  • Dense index - can tell if any record exists
    without accessing data file,- better for
    insertions.

12
Index-Structure Basics
  • Index file can become very large, e.g. at least
    one tenth of data file size for records with ten
    attributes of same length.
  • To speed-up index access, add a second index
    level on top of the first index level, a third
    level on top of the second one, . . .
  • First level can be dense, other levels are sparse.

13
Index-Structure Basics
Sequential File
Sparse 2nd level
14
Index-Structure Basics
  • Index structure needs to support Equality Queries
    and Range Queries.
  • Equality query one attribute value specified,
    e.g. docID 100, or age 18.
  • Range query attribute range specified, e.g.
    30 lt age lt 40.
  • Index structures must also support DB
    modifications, i.e. insertions, deletions and
    updates.

15
ISAM
  • ISAM Index Sequential Access Method
  • Hierarchy of index files (tree structure)

Non-leaf
blocks
Leaf
blocks
Primary blocks
16
ISAM
  • Leaf blocks contain data entries.
  • Non-leaf blocks contain pairs (ki,pi), where ki
    is a search key value and pi a pointer to the
    (first of the) records with that search key
    value.

P
K
P
K
P
P
K
m
0
1
2
1
m
2
17
ISAM
  • File Creation
  • Leaf (data) blocks allocated sequentially, sorted
    by search key.
  • Then non-leaf blocks allocated, then space for
    overflow blocks.
  • Index entries ltsearch key value, block idgt
    they direct search for data entries, which are
    in leaf blocks.

18
ISAM
Example
19
ISAM
  • Index Operations
  • SearchStart at root use key comparisons to go
    to leaf.
  • Insert Find leaf data entry belongs to, and put
    it there.
  • DeleteFind and remove from leaf if empty
    overflow block, de-allocate.

20
ISAM
  • Example
  • After inserting 23, 48, 41, 42

Root
40
Index
blocks
20
33
51
63
Primary
Leaf
46
55
10
15
20
27
33
37
40
51
97
63
blocks
41
48
23
Overflow
blocks
42
21
ISAM
  • Example
  • After deleting 42, 51, 97.
  • 51 appears in index level, but not in leaf
    level.

Root
40
20
33
51
63
46
55
10
15
20
27
33
37
40
63
41
48
23
22
ISAM
  • Discussion
  • Inserts / deletes affect only leaf pages. ?
    static tree structure
  • Tree can degenerate into a linear list of
    overflow blocks.
  • In this case, ISAM looses all advantages compared
    to a simple, non-hierarchical index file.
  • Can we maintain a balanced tree structure
    dynamically under insertions / deletions?

23
B-Trees
  • Introduction
  • Tree node corresponds to block.
  • B-trees are balanced, i.e. all leaves at same
    level. This guarantees efficient access.
  • B-trees guarantee minimum space utilization.
  • n (order) maximum number of keys per node,
    minimum number of keys is roughly n/2.
  • Exception root may have one key only.
  • m 1 pointers in node, m actual number of keys.

24
B-Trees
Introduction
? leaf nodes are linked in sequential order
? this B-tree variant is normally referred
to as B-tree
25
B-Trees
  • Introduction
  • Node format (p1,k1, . . ., pn,kn,pn1)pi
    pointer, ki search key
  • Node with m pointers has m children and
    corresponding sub-trees.
  • n1-th index entry has only pointer. At leaf
    level, this pointer references the next leaf
    node.
  • Search key property i-th subtree contains data
    entries with search key k ltki, i1-th subtree
    contains data entries with search key k gt ki.

26
B-Trees
Example
Root n 3
100
120 150 180
30
3 5 11
120 130
180 200
100 101 110
150 156 179
30 35
27
B-Trees
Example
57 81 95
to keys to keys to keys to keys lt
57 57? klt81 81?klt95 ?95
Non-leaf (inner) node
28
B-Trees
Example
From non-leaf node to next
leaf in sequence
57 81 95
Leaf node
To record with key 57 To record with key
81 To record with key 85
29
B-Trees
Space utilization
n 3
full node min. node Non-leaf Leaf
120 150 180
30
3 5 11
30 35
counts even if null
30
B-Trees
Space utilization
Number of pointers/keys for B-tree
Max Max Min Min ptrs keys
ptrs?data keys
Non-leaf (non-root)
n1
n
?(n1)/2?
?(n1)/2?- 1
Leaf (non-root)
n1
n
?(n1)/2?
?(n1)/2?
Root
n1
n
1
1
31
B-Trees
Equality Queries
  • To search for key k, start from root.
  • At a given node, find nearest key ki and follow
    left (pi) or right (pi1) pointer depending on
    comparison of k and ki.
  • Continue, until leaf node reached.
  • Explores one path from root to leaf node.
  • Height of B-tree is where N number of records
    indexed
  • ? runtime complexity

32
B-Trees
Insertions
  • Always insert in corresponding leaf.
  • Tree grows bottom-up.
  • Four different cases
  • space available in leaf,
  • leaf overflow,
  • non-leaf overflow,
  • new root.

33
B-Trees
Insertions
  • Space available in leaf

100
n 3
30
Insert key 32
3 5 11
30 31
34
B-Trees
Insertions
  • Leaf overflowsplit overflowing node intotwo of
    (almost) same sizeand copy middle (separating)
    key to father node

100
n 3
30
Insert key 7
3 5 11
30 31
35
B-Trees
Insertions
  • Non-leaf overflow split overflowing
    node and push middle key up to
    father node

100
120 150 180
Insert key 160
180 200
150 156 179
36
B-Trees
Insertions
  • New root split can propagate up
    to the root and result in new root

Insert key 45
10 20 30
1 2 3
10 12
20 25
30 32 40
37
B-Trees
Deletions
  • Locate corresponding leaf node.
  • Delete specified entry.
  • Four different cases
  • Leaf node has still enough entries,
  • Coalesce with neighbor (sibling),
  • Re-distribute keys,
  • Coalesce or re-distribute at non-leaf.

38
B-Trees
Deletions
  • Coalesce with neighbor (sibling)if node
    underflowsand sibling has enoughspace,
    coalescethe two nodes

Delete key 50 n4
10 40 100
10 20 30
40 50
39
B-Trees
Deletions
  • Redistribute keys if node underflowsand sibling
    has extraentry, re-distributeentries of the
    two nodes

Delete key 50 n4
10 40 100
10 20 30 35
40 50
40
B-Trees
Deletions
  • Non-leaf coalesce

Delete key 37 n4
25
10 20
30 40
25 26
30 37
40 45
20 22
10 14
1 3
41
B-Trees
B-Trees in Practice
  • Often, coalescing is not implemented.
  • It is too hard and typically does not gain a lot
    of performance.

42
B-Trees
B-Trees in Practice
  • Typical order 200, typical space utilization
    67. I.e., average fanout 133.
  • Typical capacities
  • Height 4 1334 312,900,700 records,
  • Height 3 1333 2,352,637 records.
  • Can often hold top levels in buffer pool
  • Level 1 1 blocks 8 Kbytes,
  • Level 2 133 blocks 1 Mbyte,
  • Level 3 17,689 blocks 133 Mbytes.
  • ? O(1) processing for equality queries

43
B-Trees
B-Trees in Practice
  • Order (n) concept replaced by physical space
    criterion in practice (at least half-full).
  • Inner nodes can typically hold many more entries
    than leaf nodes.
  • Variable sized records and search keys mean
    different nodes will contain different numbers of
    entries.
  • Even with fixed length fields, multiple records
    with the same search key value (duplicates) can
    lead to variable-sized data entries.

44
Hash Tables
Introduction
  • Tree-based index structures map search key values
    to record addresses via a tree structure.
  • Hash tables perform the same mapping via a hash
    function, which computes the record address.
  • Search key K
  • Hash function h
  • B number of buckets.

45
Hash Tables
Introduction
  • Good hash function should have the following
    property expected number of keys the same
    (similar) for all buckets.
  • This is difficult to accomplish for search keys
    that have a highly skewed distribution, e.g.
    names.
  • Common hash function K x1 x2 xn n byte
    character string
  • ?B often chosen as prime number

46
Hash Tables
Secondary-Storage Hash Tables
  • Bucket collection of blocks.
  • Initially, bucket consists of one block.
  • Records hashed to b are stored in bucket b.
  • If bucket capacity exceeded, link chain of
    overflow buckets.
  • Assume that address of first block of bucket i
    can be computed given i.
  • E.g., main memory array of pointers to blocks.

47
Hash Tables
Secondary-Storage Hash Tables
  • Hash tables can perform their mapping directly or
    indirectly.

48
Hash Tables
Insertions
  • To insert record with search key K.
  • Compute h(K) i.
  • Insert record into first block of bucket i that
    has enough space.
  • If none of the current blocks has space, add a
    new block to the overflow chain, and store new
    record there.

49
Hash Tables
Insertions
bucket capacity 2 records
50
Hash Tables
  • Deletions
  • To delete record with search key K.
  • Compute h(K) i.
  • Locate record(s) with search key K in bucket i.
  • If possible, move up remaining records within
    block.
  • If possible, move remaining records from overflow
    chain to the previous block and de-allocate block.

51
Hash Tables
Deletions
0 1 2 3
a
Deleteef
b
d
c
c
e
f
g
52
Hash Tables
  • Queries
  • To find record(s) with search key K.
  • Compute h(K) i.
  • Locate record(s) with search key K in bucket i,
    following the overflow chain.
  • In the absence of overflow blocks, only one block
    I/O necessary, i.e. O(1) runtime.
  • This is (much) better than B-trees.
  • But hash tables do not support range queries!

53
Hash Tables
  • Queries
  • In order to keep overflow chains short, keep
    space utilization between 50 and 80.
  • If space utilization lt 50 waste space.
  • If space utilization gt 80 overflow chains
    become significant.
  • Depends on hash function and on bucket capacity.

54
Hash Tables
  • So far, only static hash tables, i.e. the number
    B of buckets never changes.
  • With growing number of records, space utilization
    cannot be kept in the desired range.
  • Dynamic hash tables adapt B to the actual number
    of records stored.
  • Goal approximately one block per bucket.
  • Two dynamic methods
  • Extensible Hashing, and
  • Linear Hashing.

55
Extensible Hash Tables
  • Introduction
  • Add a level of indirection for the buckets, a
    directory containing pointers to blocks, one for
    each value of the hash function.
  • Size of directory doubles in each growth step.

. . .
h(K)
to bucket
. . .
56
Extensible Hash Tables
  • Introduction
  • Several buckets can share a data block, if they
    do not contain too many records.
  • Hash function computes sequences of k bits, but
    bucket numbers use only the i first of these
    bits. i is the level of the hash table.

k
00110101
i, grows over time
57
Extensible Hash Tables
  • Insertions
  • To insert record with search key K.
  • Compute h(K) and take its first i bits. Global
    level i is part of the data structure.
  • Retrieve the corresponding directory entry.
  • Follow that pointer leading to block b. b has a
    local level j lt i.
  • If b has enough space, insert record there.
  • Otherwise, split b into two blocks.

58
Extensible Hash Tables
  • Insertions
  • If j lt i, distribute records in b based on
    (j1)st bit of h(K) if 0, old block b, if 1 new
    block b.
  • Increment the local level of b and b by one.
  • Adjust the pointer in the directory that pointed
    to b but must now point to b.
  • If j i, first increment i by one. Double the
    directory size and duplicate all entries. Proceed
    as in case j lt i.

59
Extensible Hash Tables
Example
1
i
0001
1
1001
1100
Insert 1010
60
Extensible Hash Tables
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
61
Extensible Hash Tables
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
62
Extensible Hash Tables
  • Overflow Chains
  • May still needoverflow chainsin the presence
    of too manyduplicates ofhash values.
  • Split does not help if allentries belong to
    sameof two resulting blocks!

if we split
insert 1100
1101
1100
1100
1100
63
Extensible Hash Tables
Overflow Chains
add overflow block
insert 1100
1100
1101
1101
1101
1100
64
Extensible Hash Tables
  • Deletions
  • To delete record with search key K.
  • Using the directory, locate corresponding block b
    and delete record from there.
  • If possible, merge block b with buddy block b
    and adjust the directory pointers to b and b.
  • If possible, halve the directory.? reverse
    insertion procedure

65
Extensible Hash Tables
  • Discussion
  • Can manage growing number of buckets without
    wasting too much space.
  • Assume that directory fits into main memory.
  • Never need to access more than one data block (as
    long as there are no overflow chains) for a
    query.
  • Doubling the directory is a very expensive
    operation. Interrupts other operations and may
    require secondary storage.

66
Linear Hash Tables
  • Introduction
  • No directory.
  • Hash function computes sequences of k bits. Take
    only the i last of these bits and interpret them
    as bucket number m.
  • n number of last bucket, first number is 0.

k
00110101
i, grows over time
67
Linear Hash Tables
  • Insertions
  • If m lt n, store record in bucket m. Otherwise,
    store it in bucket number
  • If bucket overflows, add overflow block.
  • If space utilization becomes too high, add one
    bucket at the end and increment n by 1.
  • ? file grows linearly

68
Linear Hash Tables
  • Insertions
  • Bucket we add is usually not in the range of hash
    keys where an overflow occurred.
  • When n gt 2i, increment i by 1.
  • i is the number of rounds of doubling the size of
    the Linear Hash table.
  • No need to move entries.

69
Linear Hash Tables
Example
k 4, i 2
  • insert 0101

Future growth buckets
0101
1111
1010
00 01 10 11
If h(k)i ? n, then look at bucket h(k)i
else, look at bucket h(k)i 2i -1
n 01
70
Linear Hash Tables
Example
k 4, i 2
Future growth buckets
0101
1111
1010
00 01 10 11
n 01
71
Linear Hash Tables
Example
k 4
i 2
1111
1010
0101
0101
00 01
10 11
. . .
n 11
72
Linear Hash Tables
  • Discussion
  • Can manage growing number of buckets without
    wasting too much space.
  • No directory, i.e. no indirection in access and
    no expensive doubling operation.
  • Significant need for overflow chains, even if no
    duplicates among last i bits of hash values.
Write a Comment
User Comments (0)
About PowerShow.com