INDEX CHAPTER 12 - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

INDEX CHAPTER 12

Description:

Set the local depth of b' and b to equal (local depth of b 1) ... Distinguish between b and b' using their new d and set the appropriate entry(ies) ... – PowerPoint PPT presentation

Number of Views:111
Avg rating:3.0/5.0
Slides: 41
Provided by: RaminSh9
Category:
Tags: chapter | index

less

Transcript and Presenter's Notes

Title: INDEX CHAPTER 12


1
INDEX (CHAPTER 12)
2
TOPICS
  • Basic concepts
  • Hashing
  • B-tree

3
INTRODUCTION
  • Review

E-R data model
Conceptual
Logical
Relational data model SQL
relation a file Org. of records on a disk
page Organization of attributes within a
record Index Files
Physical
4
INTRODUCTION (Cont)
  • Motivation Speed-up those queries that reference
    only a small portion of the records in a file.
  • Analogy Catalog cards in the library (more than
    one index).
  • Evaluation
  • 1. Access time (find)
  • 2. Insertion time (find add)
  • 3. Deletion time (find delete)
  • 4. Space overhead
  • Search-key The attribute (or set of attributes)
    used to lookup records in a file
  • Primary index The index whose search key
    specifies the sequential order of the records
    within a file.
  • Secondary index The index whose search key does
    not specify the sequential order of the records
    within a file.

5
INTRODUCTION (Cont)
  • Example
  • Assume, size of disk page 2 data records 5
    index records.
  • Indexing or not indexing?
  • SELECT age SELECT age
  • FROM personnel FROM personnel
  • WHERE name Alice WHERE name Don

6
INTRODUCTION (Cont)
  • Example
  • Assume, size of disk page 2 data records 5
    index records.
  • Primary vs. Secondary
  • SELECT name SELECT age
  • FROM personnel FROM personnel
  • WHERE state Ohio WHERE name David

7
INTRODUCTION (Cont)
  • Example (page 2 data 5 index)
  • Exact match vs. Range
  • SELECT name SELECT name
  • FROM personnel FROM personnel
  • WHERE state California WHERE state gt
    Alaska and
  • state lt
    Florida
  • Speedup by employing binary search (is it
    possible?)

8
Dense Index Files
  • Dense index Index record appears for every
    search-key value in the file.

9
Example of Sparse Index Files
10
Multilevel Index
11
HASHING
  • Hash function
  • K the set of all search key values
  • V the set of all bucket address
  • h(K) K V
  • K is large (perhaps infinite) but set of
    search-key values actually stored in the database
    is much smaller than K.
  • Fast lookup To find Ki, search the bucket with
    h(Ki) address.

12
HASHING (Cont)
  • Example
  • K salary (set of all 6 digit integers)
  • V 1000 buckets addressed from 0 to 999
  • h(k) k mod 1000.
  • SELECT name
  • FROM personnel
  • WHERE salary 120,100
  • To find a 120,100 salary, we should search bucket
    number 100.
  • Hash is only appropriate for Exact match queries.
  • A bad hash function maps the value to a subset of
    (or a few) buckets (e.g., h(k) k mod 10.

13
HASHING (Cont)
  • Clustered Hash Index
  • The index structure and its buckets are
    represented as a file (say file.hash)
  • The relation is stored in file.hash (I.e., each
    entry in file.hash corresponds to a record in
    relation)
  • Assuming no duplicates the record can be
    accessed in 1 IO.
  • Non-clustered Hash Index
  • The index structure and its buckets are
    represented as a file (say file.hash)
  • The relation remains intact
  • Each entry in file.hash has the following format
    (search-key value, RID)
  • Assuming no duplicates the record can be
    accessed in 2 IO.

14
Block address on disk
Bucket Number
0
1
2
M-2
M-1
15
Example of Hash Index
16
Overflow buckets
Main buckets
340
981
0
Record pointer
460
Record pointer
1
182
Record pointer
2
Record pointer
321
552
Record pointer
761
91
Record pointer
Record pointer
Record pointer
22
72
9
522
Record pointer
17
HASHING (Cont)
  • Loading factor
  • B of buckets, S of records per bucket, R
    of records in the relation
  • loading - factor R / (BS)
  • The loading factor should not exceed 80, if that
    happens, double B and re-hash.
  • Why a bucket might overflow?
  • Heavy loading of the file
  • Poor hash functions
  • Statistical peculiarities
  • If a bucket overflows?
  • Chaining chain an empty bucket to the bucket
    that overflows.
  • Open addressing If bucket h(k) is full, store
    the record in h(k) 1, if that is also full, try
    h(k) 2, and so on.
  • Two hash functions If bucket h(k) is full, store
    the record in h(k).

18
HASHING (Cont)
  • Problem The file grows and shrinks over time.
    Hence, how one should choose the hash function
  • 1. Based on current file size
    performance degradation as DB grows
  • 2. Based on anticipated file size waste
    space initially (and reduced buffer hits)
  • 3. Periodical reorganization time
    consuming
  • 3.1. Choose new hash function
  • 3.2. Recompute hash value on every record
  • 3.3. Generate new bucket assignments
  • Solution
  • Dynamic hash functions dynamic modification of h
    to accommodate growth and shrinkage of the DB.
    (e.g., extendible hashing)

19
HASHING (Cont)
  • Extendible hashing
  • Choose a hash function (h) such that it results
    in a b (b 32) bit binary number.
  • The directory has a header that contains its
    depth, d.
  • Each directory entry points to a hash bucket.
  • Buckets are created on demand, as records are
    inserted.
  • Each bucket contains a local depth used to find
    data.

Directory depth
bucket
directory
00
01
10
siblings
11
20
HASHING (Cont)
  • Extendible hashing (continued)
  • Every time a bucket overflows, its local depth is
    increased. If the local depth is greater
    than the depth of the directory, the directorys
    depth is increased, causing the directory to
    double in size.
  • Each directory entry has one sibling or buddy.
    Two entries are buddies if they have identical
    bit patterns except for the dth bit.
  • Every time a bucket overflows, its local depth is
    increased.
  • If the local depth is greater than the depth of
    the directory, then the directorys depth is
    increased, causing the directory to double in
    size.
  • A bucket can overflow at any desired loading
    factor. That is, a split might happen every time
    a bucket is 80 full.

21
HASHING (Cont)
  • Retrieval with Extendible hashing
  • Retrieve (K0)
  • Calculate h h(K0)
  • Read depth d of the directory
  • Interpret the d initial bits of h as an integer
    base 2, term this r.
  • Retrieve the bucket pointed to by the rth entry
  • Find the record in this bucket
  • 5.1. If a hashing technique is used to organize
    the records in a bucket, use the d bits defined
    on that bucket
  • 5.2. If necessary, follow the collision
    resolution scheme within this bucket.

22
HASHING (Cont)
  • Insertion with Extendible hashing
  • Insert (K0)
  • Apply the first four steps of Retrieve (K0) to
    find bucket b.
  • If the insertion of K0 into b result in no
    overflow then Insert K0 into b and return
  • Otherwise, obtain a new bucket b
  • Set the local depth of b and b to equal (local
    depth of b 1)
  • If the new depth is NOT greater than the depth of
    the directory
  • 5.1. Distinguish between b and b using their
    new d and set the appropriate entry(ies) of the
    directory to point to each
  • 5.2. Rehash the entries in bucket b and
    assign each individual entry to the appropriate
    bucket b or b
  • 5.3. Insert (K0)
  • If the new depth is greater than the depth of the
    directory
  • 6.1 Increase the depth of the directory,
    doubling its size
  • 6.2. Set each entry and its buddy to point to
    the old bucket that it was pointing to
  • 6.3. Insert (K0)

23
HASHING (Cont)
  • Deletion with Extendible hashing
  • Delete (K0)
  • Apply the first four steps of Retrieve (K0) to
    find bucket b.
  • If K0 is not b then return with value no found
  • Otherwise, delete the entry corresponding to K0
  • If the sum of the number of entries on this page
    and its sibling page are below the size of a
    bucket then
  • 4.1. Copy the entries in the two buckets
    into one bucket b
  • 4.2. Depth of b (depth of b - 1)
  • 4.3. Free bucket b and its sibling
  • 4.4. Locate the two hash directory
    entries pointing to b and its buddy. Set these
    two pointers to b
  • 4.5. If every pointer in the directory
    equals its sibling pointer then decrease the
    depth of the directory by one and set each
    entry in an obvious manner.

24
Use of Extendable Hash Structure Example
Initial Hash structure, bucket size 2
25
Example (Cont.)
  • Hash structure after insertion of one Brighton
    and two Downtown records

26
Example (Cont.)
Hash structure after insertion of Mianus record
27
Example (Cont.)
Hash structure after insertion of three
Perryridge records
28
Example (Cont.)
  • Hash structure after insertion of Redwood and
    Round Hill records

29
HASHING (Cont)
  • Extendible hashing
  • The insertion algorithm of extendible hashing
    might crash when

30
HASHING (Cont)
  • Hashing vs. Indexing
  • Hashing is appropriate for exact match queries
    (cannot support range queries)
  • SELECT A1, A2,
  • FROM r
  • WHERE (Ai c)
  • Indexing is appropriate for both range and exact
    match queries
  • SELECT A1, A2,
  • FROM r
  • WHERE (Ai lt c1) and (Ai gt c2)

31
Example
  • Suppose that we are using extendable hashing on a
    file that contains records with the following
    search key values
  • 2, 3, 5, 7, 11, 17, 19, 23, 29, 31
  • Show the extendable hash structure for this file
    if hash function is
  • h(x) x mod 8 and buckets can hold three records

32
B-TREE
  • B-tree is a multi-level tree structured
    directory
  • Clustered Leaf nodes contain the records,
    themselves.

33
B-TREE (Cont)
  • Non-clustered Leaf nodes contain the pairs (P,
    K), where P is a pointer to the record in the
    file and K is a search-key.

34
B-TREE (Cont)
  • Leaf nodes
  • Maintain between to n-1 values per
    leaf.
  • If i lt j then Ki lt Kj
  • Every search-key value in the file appears in
    some leaf node.
  • Suppose Li and Lj are two leaves and i lt j, then
    every search value in Li is less than every
    search value in Lj.

P1
K1
P2
. . .
Pn-1
Kn-1
Pn
5
7
10
15
17
18
35
B-TREE (Cont)
  • Internal nodes
  • Maintain between to n pointers per
    internal node
  • root is an exception It must have more than one
    pointer.
  • Suppose a node with m pointers and 2lt i lt m
  • Pi points to subtree containing search-key values
    lt Ki and gt Ki-1.
  • Pm points to subtree containing search-key values
    gt Km-1.
  • P1 points to subtree containing search-key values
    lt K1.

36
B-TREE (Cont)
  • Lookup
  • Find 7 4 Ios
  • Find 4-20 4 IOs (assuming primary index), 8 IOs
    (assuming secondary index)
  • More than 10 selection it is more efficient to
    do sequential scan (do not use the secondary
    index).
  • Example 10,000 records, select 1000 of them,
    1000 records per disk page (Sequential search
    10 IOs, Secondary index potentially 1000 IOs)

37
B-TREE (Cont)
  • Analysis
  • B in B-tree stands for Balanced. i.e., the
    length of every path from the root to a leaf node
    is the same.
  • Hence, good performance for lookup, insertion,
    and deletion
  • K number of search key values in a file, then
    the path is lt log (K).
  • K 1,000,000, and 10 lt n lt 100 then at most 3
    to 9 nodes be accessed.
  • Insertion and Deletion should not destroy the
    balance of the tree.
  • References
  • J. Jannink, Inplementing Deletion in B-Trees,
    SIGMOD RECORD, Volume 24, Number 1 (March 1995),
    pages 33-38.
  • D. Comer, The Ubiquitous B-tree, ACM Computing
    Surveys, Volume 11, Number 2 (June 1979), pages
    121-137

38
B-TREE (Cont)
n 4 Internal nodes 2 to 4 pointers Leaf
nodes 2 to 3 values
8
25
Insert 41
10
20
4
7
30
40
41
Insert 47
30
40
41
47
8
25
41
10
20
4
7
30
40
41
47
39
B-TREE (Cont)
Insert 50
Insert 52
41
47
50
52
8
25
41
50
41
25
8
50
10
20
4
7
30
40
41
47
50
52
40
B-TREE (Cont)
Delete 20
30
underflow
8
41
50
4
7
30
40
41
47
50
52
10
30
41
50
10
4
7
41
47
50
30
40
52
Write a Comment
User Comments (0)
About PowerShow.com