INDEX CHAPTER 12 presentation

About This Presentation

Title:

INDEX CHAPTER 12

Description:

Set the local depth of b' and b to equal (local depth of b 1) ... Distinguish between b and b' using their new d and set the appropriate entry(ies) ... –

Number of Views:111

Avg rating:3.0/5.0

Slides: 41

Provided by: RaminSh9

Category:

more less

Transcript and Presenter's Notes

Title: INDEX CHAPTER 12

1
INDEX (CHAPTER 12)
2
TOPICS

Basic concepts
Hashing
B-tree

3
INTRODUCTION

Review

E-R data model
Conceptual
Logical
Relational data model SQL
relation a file Org. of records on a disk
page Organization of attributes within a
record Index Files
Physical
4
INTRODUCTION (Cont)

Motivation Speed-up those queries that reference
only a small portion of the records in a file.
Analogy Catalog cards in the library (more than
one index).
Evaluation
1. Access time (find)
2. Insertion time (find add)
3. Deletion time (find delete)
4. Space overhead
Search-key The attribute (or set of attributes)
used to lookup records in a file
Primary index The index whose search key
specifies the sequential order of the records
within a file.
Secondary index The index whose search key does
not specify the sequential order of the records
within a file.

5
INTRODUCTION (Cont)

Example
Assume, size of disk page 2 data records 5
index records.
Indexing or not indexing?
SELECT age SELECT age
FROM personnel FROM personnel
WHERE name Alice WHERE name Don

6
INTRODUCTION (Cont)

Example
Assume, size of disk page 2 data records 5
index records.
Primary vs. Secondary
SELECT name SELECT age
FROM personnel FROM personnel
WHERE state Ohio WHERE name David

7
INTRODUCTION (Cont)

Example (page 2 data 5 index)
Exact match vs. Range
SELECT name SELECT name
FROM personnel FROM personnel
WHERE state California WHERE state gt
Alaska and
state lt
Florida
Speedup by employing binary search (is it
possible?)

8
Dense Index Files

Dense index Index record appears for every
search-key value in the file.

9
Example of Sparse Index Files
10
Multilevel Index
11
HASHING

Hash function
K the set of all search key values
V the set of all bucket address
h(K) K V
K is large (perhaps infinite) but set of
search-key values actually stored in the database
is much smaller than K.
Fast lookup To find Ki, search the bucket with
h(Ki) address.

12
HASHING (Cont)

Example
K salary (set of all 6 digit integers)
V 1000 buckets addressed from 0 to 999
h(k) k mod 1000.
SELECT name
FROM personnel
WHERE salary 120,100
To find a 120,100 salary, we should search bucket
number 100.
Hash is only appropriate for Exact match queries.
A bad hash function maps the value to a subset of
(or a few) buckets (e.g., h(k) k mod 10.

13
HASHING (Cont)

Clustered Hash Index
The index structure and its buckets are
represented as a file (say file.hash)
The relation is stored in file.hash (I.e., each
entry in file.hash corresponds to a record in
relation)
Assuming no duplicates the record can be
accessed in 1 IO.
Non-clustered Hash Index
The index structure and its buckets are
represented as a file (say file.hash)
The relation remains intact
Each entry in file.hash has the following format
(search-key value, RID)
Assuming no duplicates the record can be
accessed in 2 IO.

14
Block address on disk
Bucket Number
0
1
2
M-2
M-1
15
Example of Hash Index
16
Overflow buckets
Main buckets
340
981
0
Record pointer
460
Record pointer
1
182
Record pointer
2
Record pointer
321
552
Record pointer
761
91
Record pointer
Record pointer
Record pointer
22
72
9
522
Record pointer
17
HASHING (Cont)

Loading factor
B of buckets, S of records per bucket, R
of records in the relation
loading - factor R / (BS)
The loading factor should not exceed 80, if that
happens, double B and re-hash.
Why a bucket might overflow?
Heavy loading of the file
Poor hash functions
Statistical peculiarities
If a bucket overflows?
Chaining chain an empty bucket to the bucket
that overflows.
Open addressing If bucket h(k) is full, store
the record in h(k) 1, if that is also full, try
h(k) 2, and so on.
Two hash functions If bucket h(k) is full, store
the record in h(k).

18
HASHING (Cont)

Problem The file grows and shrinks over time.
Hence, how one should choose the hash function
1. Based on current file size
performance degradation as DB grows
2. Based on anticipated file size waste
space initially (and reduced buffer hits)
3. Periodical reorganization time
consuming
3.1. Choose new hash function
3.2. Recompute hash value on every record
3.3. Generate new bucket assignments
Solution
Dynamic hash functions dynamic modification of h
to accommodate growth and shrinkage of the DB.
(e.g., extendible hashing)

19
HASHING (Cont)

Extendible hashing
Choose a hash function (h) such that it results
in a b (b 32) bit binary number.
The directory has a header that contains its
depth, d.
Each directory entry points to a hash bucket.
Buckets are created on demand, as records are
inserted.
Each bucket contains a local depth used to find
data.

Directory depth
bucket
directory
00
01
10
siblings
11
20
HASHING (Cont)

Extendible hashing (continued)
Every time a bucket overflows, its local depth is
increased. If the local depth is greater
than the depth of the directory, the directorys
depth is increased, causing the directory to
double in size.
Each directory entry has one sibling or buddy.
Two entries are buddies if they have identical
bit patterns except for the dth bit.
Every time a bucket overflows, its local depth is
increased.
If the local depth is greater than the depth of
the directory, then the directorys depth is
increased, causing the directory to double in
size.
A bucket can overflow at any desired loading
factor. That is, a split might happen every time
a bucket is 80 full.

21
HASHING (Cont)

Retrieval with Extendible hashing
Retrieve (K0)
Calculate h h(K0)
Read depth d of the directory
Interpret the d initial bits of h as an integer
base 2, term this r.
Retrieve the bucket pointed to by the rth entry
Find the record in this bucket
5.1. If a hashing technique is used to organize
the records in a bucket, use the d bits defined
on that bucket
5.2. If necessary, follow the collision
resolution scheme within this bucket.

22
HASHING (Cont)

Insertion with Extendible hashing
Insert (K0)
Apply the first four steps of Retrieve (K0) to
find bucket b.
If the insertion of K0 into b result in no
overflow then Insert K0 into b and return
Otherwise, obtain a new bucket b
Set the local depth of b and b to equal (local
depth of b 1)
If the new depth is NOT greater than the depth of
the directory
5.1. Distinguish between b and b using their
new d and set the appropriate entry(ies) of the
directory to point to each
5.2. Rehash the entries in bucket b and
assign each individual entry to the appropriate
bucket b or b
5.3. Insert (K0)
If the new depth is greater than the depth of the
directory
6.1 Increase the depth of the directory,
doubling its size
6.2. Set each entry and its buddy to point to
the old bucket that it was pointing to
6.3. Insert (K0)

23
HASHING (Cont)

Deletion with Extendible hashing
Delete (K0)
Apply the first four steps of Retrieve (K0) to
find bucket b.
If K0 is not b then return with value no found
Otherwise, delete the entry corresponding to K0
If the sum of the number of entries on this page
and its sibling page are below the size of a
bucket then
4.1. Copy the entries in the two buckets
into one bucket b
4.2. Depth of b (depth of b - 1)
4.3. Free bucket b and its sibling
4.4. Locate the two hash directory
entries pointing to b and its buddy. Set these
two pointers to b
4.5. If every pointer in the directory
equals its sibling pointer then decrease the
depth of the directory by one and set each
entry in an obvious manner.

24
Use of Extendable Hash Structure Example
Initial Hash structure, bucket size 2
25
Example (Cont.)

Hash structure after insertion of one Brighton
and two Downtown records

26
Example (Cont.)
Hash structure after insertion of Mianus record
27
Example (Cont.)
Hash structure after insertion of three
Perryridge records
28
Example (Cont.)

Hash structure after insertion of Redwood and
Round Hill records

29
HASHING (Cont)

Extendible hashing
The insertion algorithm of extendible hashing
might crash when

30
HASHING (Cont)

Hashing vs. Indexing
Hashing is appropriate for exact match queries
(cannot support range queries)
SELECT A1, A2,
FROM r
WHERE (Ai c)
Indexing is appropriate for both range and exact
match queries
SELECT A1, A2,
FROM r
WHERE (Ai lt c1) and (Ai gt c2)

31
Example

Suppose that we are using extendable hashing on a
file that contains records with the following
search key values
2, 3, 5, 7, 11, 17, 19, 23, 29, 31
Show the extendable hash structure for this file
if hash function is
h(x) x mod 8 and buckets can hold three records

32
B-TREE

B-tree is a multi-level tree structured
directory
Clustered Leaf nodes contain the records,
themselves.

33
B-TREE (Cont)

Non-clustered Leaf nodes contain the pairs (P,
K), where P is a pointer to the record in the
file and K is a search-key.

34
B-TREE (Cont)

Leaf nodes
Maintain between to n-1 values per
leaf.
If i lt j then Ki lt Kj
Every search-key value in the file appears in
some leaf node.
Suppose Li and Lj are two leaves and i lt j, then
every search value in Li is less than every
search value in Lj.

P1
K1
P2
. . .
Pn-1
Kn-1
Pn
5
7
10
15
17
18
35
B-TREE (Cont)

Internal nodes
Maintain between to n pointers per
internal node
root is an exception It must have more than one
pointer.
Suppose a node with m pointers and 2lt i lt m
Pi points to subtree containing search-key values
lt Ki and gt Ki-1.
Pm points to subtree containing search-key values
gt Km-1.
P1 points to subtree containing search-key values
lt K1.

36
B-TREE (Cont)

Lookup
Find 7 4 Ios
Find 4-20 4 IOs (assuming primary index), 8 IOs
(assuming secondary index)
More than 10 selection it is more efficient to
do sequential scan (do not use the secondary
index).
Example 10,000 records, select 1000 of them,
1000 records per disk page (Sequential search
10 IOs, Secondary index potentially 1000 IOs)

37
B-TREE (Cont)

Analysis
B in B-tree stands for Balanced. i.e., the
length of every path from the root to a leaf node
is the same.
Hence, good performance for lookup, insertion,
and deletion
K number of search key values in a file, then
the path is lt log (K).
K 1,000,000, and 10 lt n lt 100 then at most 3
to 9 nodes be accessed.
Insertion and Deletion should not destroy the
balance of the tree.
References
J. Jannink, Inplementing Deletion in B-Trees,
SIGMOD RECORD, Volume 24, Number 1 (March 1995),
pages 33-38.
D. Comer, The Ubiquitous B-tree, ACM Computing
Surveys, Volume 11, Number 2 (June 1979), pages
121-137

38
B-TREE (Cont)
n 4 Internal nodes 2 to 4 pointers Leaf
nodes 2 to 3 values
8
25
Insert 41
10
20
4
7
30
40
41
Insert 47
30
40
41
47
8
25
41
10
20
4
7
30
40
41
47
39
B-TREE (Cont)
Insert 50
Insert 52
41
47
50
52
8
25
41
50
41
25
8
50
10
20
4
7
30
40
41
47
50
52
40
B-TREE (Cont)
Delete 20
30
underflow
8
41
50
4
7
30
40
41
47
50
52
10
30
41
50
10
4
7
41
47
50
30
40
52

Write a Comment

User Comments (0)

About PowerShow.com