SQL: Queries, Programming, Triggers - PowerPoint PPT Presentation

1 / 72

About This Presentation

Title:

SQL: Queries, Programming, Triggers

Description:

See the notes for information on how the s are organized. – PowerPoint PPT presentation

Number of Views:94

Avg rating:3.0/5.0

Slides: 73

Provided by: RaghuRa78

Category:

more less

Transcript and Presenter's Notes

Title: SQL: Queries, Programming, Triggers

1
Database Systems II Index Structures
2
Introduction

We have discussed the organization of records in
secondary storage blocks.
Records have an address, either logical or
physical.
But SQL queries reference attribute values, not
record addresses.
SELECT FROM R WHERE a10
How to find the records that have certain
specified attribute values?

3
Introduction
recordID1 recordID2. . .
?
value
blocks holding records
matchingrecords
value
index
4
Index-Structure Basics

Storage structures consist of files.
Data files store, e.g., the records of a
relation.
Search key one or more attributes for which we
want to be able to search efficiently.
Index file over a data file for some search key
associates search key values with pointers to
(recordID rid) data file records that have this
value.
Sequential file records sorted according to
their primary key.

5
Index-Structure Basics
Sequential File
6
Index-Structure Basics

Three alternatives for data entries k- record
with key value k- ltk, rid of record with search
key value kgt- ltk, list of rids of records with
search key kgt
Choice is orthogonal to the indexing technique
used to locate entries k
Two major indexing techniques-
tree-structures- hash tables.

7
Index-Structure Basics

Dense index one index entry for every record in
the data file.
Sparse index index entries only for some of the
record in the data file. Typically, one entry per
block of the data file.
Primary index determines the location of data
file records, i.e. order of index entries same as
order of data records.
Secondary index does not determine data location.
Can only have one primary index, but multiple
secondary indexes.

8
Index-Structure Basics
Sequential File
Dense Index
9
Index-Structure Basics
Sequential File
Sparse Index
10
Index-Structure Basics

Duplicate key values
sparse index
data entry for first new key from block

10
20
30
40
11
Index-Structure Basics

Sparse index - requires less index space per
record, - can keep more of index in memory,-
needed for secondary indexes.
Dense index - can tell if any record exists
without accessing data file,- better for
insertions.

12
Index-Structure Basics

Index file can become very large, e.g. at least
one tenth of data file size for records with ten
attributes of same length.
To speed-up index access, add a second index
level on top of the first index level, a third
level on top of the second one, . . .
First level can be dense, other levels are sparse.

13
Index-Structure Basics
Sequential File
Sparse 2nd level
14
Index-Structure Basics

Index structure needs to support Equality Queries
and Range Queries.
Equality query one attribute value specified,
e.g. docID 100, or age 18.
Range query attribute range specified, e.g.
30 lt age lt 40.
Index structures must also support DB
modifications, i.e. insertions, deletions and
updates.

15
ISAM

ISAM Index Sequential Access Method
Hierarchy of index files (tree structure)

Non-leaf
blocks
Leaf
blocks
Primary blocks
16
ISAM

Leaf blocks contain data entries.
Non-leaf blocks contain pairs (ki,pi), where ki
is a search key value and pi a pointer to the
(first of the) records with that search key
value.

P
K
P
K
P
P
K
m
0
1
2
1
m
2
17
ISAM

File Creation
Leaf (data) blocks allocated sequentially, sorted
by search key.
Then non-leaf blocks allocated, then space for
overflow blocks.
Index entries ltsearch key value, block idgt
they direct search for data entries, which are
in leaf blocks.

18
ISAM
Example
19
ISAM

Index Operations
SearchStart at root use key comparisons to go
to leaf.
Insert Find leaf data entry belongs to, and put
it there.
DeleteFind and remove from leaf if empty
overflow block, de-allocate.

20
ISAM

Example
After inserting 23, 48, 41, 42

Root
40
Index
blocks
20
33
51
63
Primary
Leaf
46
55
10
15
20
27
33
37
40
51
97
63
blocks
41
48
23
Overflow
blocks
42
21
ISAM

Example
After deleting 42, 51, 97.
51 appears in index level, but not in leaf
level.

Root
40
20
33
51
63
46
55
10
15
20
27
33
37
40
63
41
48
23
22
ISAM

Discussion
Inserts / deletes affect only leaf pages. ?
static tree structure
Tree can degenerate into a linear list of
overflow blocks.
In this case, ISAM looses all advantages compared
to a simple, non-hierarchical index file.
Can we maintain a balanced tree structure
dynamically under insertions / deletions?

23
B-Trees

Introduction
Tree node corresponds to block.
B-trees are balanced, i.e. all leaves at same
level. This guarantees efficient access.
B-trees guarantee minimum space utilization.
n (order) maximum number of keys per node,
minimum number of keys is roughly n/2.
Exception root may have one key only.
m 1 pointers in node, m actual number of keys.

24
B-Trees
Introduction
? leaf nodes are linked in sequential order
? this B-tree variant is normally referred
to as B-tree
25
B-Trees

Introduction
Node format (p1,k1, . . ., pn,kn,pn1)pi
pointer, ki search key
Node with m pointers has m children and
corresponding sub-trees.
n1-th index entry has only pointer. At leaf
level, this pointer references the next leaf
node.
Search key property i-th subtree contains data
entries with search key k ltki, i1-th subtree
contains data entries with search key k gt ki.

26
B-Trees
Example
Root n 3
100
120 150 180
30
3 5 11
120 130
180 200
100 101 110
150 156 179
30 35
27
B-Trees
Example
57 81 95
to keys to keys to keys to keys lt
57 57? klt81 81?klt95 ?95
Non-leaf (inner) node
28
B-Trees
Example
From non-leaf node to next
leaf in sequence
57 81 95
Leaf node
To record with key 57 To record with key
81 To record with key 85
29
B-Trees
Space utilization
n 3
full node min. node Non-leaf Leaf
120 150 180
30
3 5 11
30 35
counts even if null
30
B-Trees
Space utilization
Number of pointers/keys for B-tree
Max Max Min Min ptrs keys
ptrs?data keys
Non-leaf (non-root)
n1
n
?(n1)/2?
?(n1)/2?- 1
Leaf (non-root)
n1
n
?(n1)/2?
?(n1)/2?
Root
n1
n
1
1
31
B-Trees
Equality Queries

To search for key k, start from root.
At a given node, find nearest key ki and follow
left (pi) or right (pi1) pointer depending on
comparison of k and ki.
Continue, until leaf node reached.
Explores one path from root to leaf node.
Height of B-tree is where N number of records
indexed
? runtime complexity

32
B-Trees
Insertions

Always insert in corresponding leaf.
Tree grows bottom-up.
Four different cases
space available in leaf,
leaf overflow,
non-leaf overflow,
new root.

33
B-Trees
Insertions

Space available in leaf

100
n 3
30
Insert key 32
3 5 11
30 31
34
B-Trees
Insertions

Leaf overflowsplit overflowing node intotwo of
(almost) same sizeand copy middle (separating)
key to father node

100
n 3
30
Insert key 7
3 5 11
30 31
35
B-Trees
Insertions

Non-leaf overflow split overflowing
node and push middle key up to
father node

100
120 150 180
Insert key 160
180 200
150 156 179
36
B-Trees
Insertions

New root split can propagate up
to the root and result in new root

Insert key 45
10 20 30
1 2 3
10 12
20 25
30 32 40
37
B-Trees
Deletions

Locate corresponding leaf node.
Delete specified entry.
Four different cases
Leaf node has still enough entries,
Coalesce with neighbor (sibling),
Re-distribute keys,
Coalesce or re-distribute at non-leaf.

38
B-Trees
Deletions

Coalesce with neighbor (sibling)if node
underflowsand sibling has enoughspace,
coalescethe two nodes

Delete key 50 n4
10 40 100
10 20 30
40 50
39
B-Trees
Deletions

Redistribute keys if node underflowsand sibling
has extraentry, re-distributeentries of the
two nodes

Delete key 50 n4
10 40 100
10 20 30 35
40 50
40
B-Trees
Deletions

Non-leaf coalesce

Delete key 37 n4
25
10 20
30 40
25 26
30 37
40 45
20 22
10 14
1 3
41
B-Trees
B-Trees in Practice

Often, coalescing is not implemented.
It is too hard and typically does not gain a lot
of performance.

42
B-Trees
B-Trees in Practice

Typical order 200, typical space utilization
67. I.e., average fanout 133.
Typical capacities
Height 4 1334 312,900,700 records,
Height 3 1333 2,352,637 records.
Can often hold top levels in buffer pool
Level 1 1 blocks 8 Kbytes,
Level 2 133 blocks 1 Mbyte,
Level 3 17,689 blocks 133 Mbytes.
? O(1) processing for equality queries

43
B-Trees
B-Trees in Practice

Order (n) concept replaced by physical space
criterion in practice (at least half-full).
Inner nodes can typically hold many more entries
than leaf nodes.
Variable sized records and search keys mean
different nodes will contain different numbers of
entries.
Even with fixed length fields, multiple records
with the same search key value (duplicates) can
lead to variable-sized data entries.

44
Hash Tables
Introduction

Tree-based index structures map search key values
to record addresses via a tree structure.
Hash tables perform the same mapping via a hash
function, which computes the record address.
Search key K
Hash function h
B number of buckets.

45
Hash Tables
Introduction

Good hash function should have the following
property expected number of keys the same
(similar) for all buckets.
This is difficult to accomplish for search keys
that have a highly skewed distribution, e.g.
names.
Common hash function K x1 x2 xn n byte
character string
?B often chosen as prime number

46
Hash Tables
Secondary-Storage Hash Tables

Bucket collection of blocks.
Initially, bucket consists of one block.
Records hashed to b are stored in bucket b.
If bucket capacity exceeded, link chain of
overflow buckets.
Assume that address of first block of bucket i
can be computed given i.
E.g., main memory array of pointers to blocks.

47
Hash Tables
Secondary-Storage Hash Tables

Hash tables can perform their mapping directly or
indirectly.

48
Hash Tables
Insertions

To insert record with search key K.
Compute h(K) i.
Insert record into first block of bucket i that
has enough space.
If none of the current blocks has space, add a
new block to the overflow chain, and store new
record there.

49
Hash Tables
Insertions
bucket capacity 2 records
50
Hash Tables

Deletions
To delete record with search key K.
Compute h(K) i.
Locate record(s) with search key K in bucket i.
If possible, move up remaining records within
block.
If possible, move remaining records from overflow
chain to the previous block and de-allocate block.

51
Hash Tables
Deletions
0 1 2 3
a
Deleteef
b
d
c
c
e
f
g
52
Hash Tables

Queries
To find record(s) with search key K.
Compute h(K) i.
Locate record(s) with search key K in bucket i,
following the overflow chain.
In the absence of overflow blocks, only one block
I/O necessary, i.e. O(1) runtime.
This is (much) better than B-trees.
But hash tables do not support range queries!

53
Hash Tables

Queries
In order to keep overflow chains short, keep
space utilization between 50 and 80.
If space utilization lt 50 waste space.
If space utilization gt 80 overflow chains
become significant.
Depends on hash function and on bucket capacity.

54
Hash Tables

So far, only static hash tables, i.e. the number
B of buckets never changes.
With growing number of records, space utilization
cannot be kept in the desired range.
Dynamic hash tables adapt B to the actual number
of records stored.
Goal approximately one block per bucket.
Two dynamic methods
Extensible Hashing, and
Linear Hashing.

55
Extensible Hash Tables

Introduction
Add a level of indirection for the buckets, a
directory containing pointers to blocks, one for
each value of the hash function.
Size of directory doubles in each growth step.

. . .
h(K)
to bucket
. . .
56
Extensible Hash Tables

Introduction
Several buckets can share a data block, if they
do not contain too many records.
Hash function computes sequences of k bits, but
bucket numbers use only the i first of these
bits. i is the level of the hash table.

k
00110101
i, grows over time
57
Extensible Hash Tables

Insertions
To insert record with search key K.
Compute h(K) and take its first i bits. Global
level i is part of the data structure.
Retrieve the corresponding directory entry.
Follow that pointer leading to block b. b has a
local level j lt i.
If b has enough space, insert record there.
Otherwise, split b into two blocks.

58
Extensible Hash Tables

Insertions
If j lt i, distribute records in b based on
(j1)st bit of h(K) if 0, old block b, if 1 new
block b.
Increment the local level of b and b by one.
Adjust the pointer in the directory that pointed
to b but must now point to b.
If j i, first increment i by one. Double the
directory size and duplicate all entries. Proceed
as in case j lt i.

59
Extensible Hash Tables
Example
1
i
0001
1
1001
1100
Insert 1010
60
Extensible Hash Tables
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
61
Extensible Hash Tables
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
62
Extensible Hash Tables

Overflow Chains
May still needoverflow chainsin the presence
of too manyduplicates ofhash values.
Split does not help if allentries belong to
sameof two resulting blocks!

if we split
insert 1100
1101
1100
1100
1100
63
Extensible Hash Tables
Overflow Chains
add overflow block
insert 1100
1100
1101
1101
1101
1100
64
Extensible Hash Tables

Deletions
To delete record with search key K.
Using the directory, locate corresponding block b
and delete record from there.
If possible, merge block b with buddy block b
and adjust the directory pointers to b and b.
If possible, halve the directory.? reverse
insertion procedure

65
Extensible Hash Tables

Discussion
Can manage growing number of buckets without
wasting too much space.
Assume that directory fits into main memory.
Never need to access more than one data block (as
long as there are no overflow chains) for a
query.
Doubling the directory is a very expensive
operation. Interrupts other operations and may
require secondary storage.

66
Linear Hash Tables

Introduction
No directory.
Hash function computes sequences of k bits. Take
only the i last of these bits and interpret them
as bucket number m.
n number of last bucket, first number is 0.

k
00110101
i, grows over time
67
Linear Hash Tables

Insertions
If m lt n, store record in bucket m. Otherwise,
store it in bucket number
If bucket overflows, add overflow block.
If space utilization becomes too high, add one
bucket at the end and increment n by 1.
? file grows linearly

68
Linear Hash Tables

Insertions
Bucket we add is usually not in the range of hash
keys where an overflow occurred.
When n gt 2i, increment i by 1.
i is the number of rounds of doubling the size of
the Linear Hash table.
No need to move entries.

69
Linear Hash Tables
Example
k 4, i 2

insert 0101

Future growth buckets
0101
1111
1010
00 01 10 11
If h(k)i ? n, then look at bucket h(k)i
else, look at bucket h(k)i 2i -1
n 01
70
Linear Hash Tables
Example
k 4, i 2
Future growth buckets
0101
1111
1010
00 01 10 11
n 01
71
Linear Hash Tables
Example
k 4
i 2
1111
1010
0101
0101
00 01
10 11
. . .
n 11
72
Linear Hash Tables

Discussion
Can manage growing number of buckets without
wasting too much space.
No directory, i.e. no indirection in access and
no expensive doubling operation.
Significant need for overflow chains, even if no
duplicates among last i bits of hash values.

Write a Comment

User Comments (0)