1d index structures

Slides: 19

Provided by: amb79

Transcript and Presenter's Notes

1
1-d index structures

B-tree
Logarithmic complexity for equality and range
searches
Hashing
Best for equality searches
Static hashing
Long overflow chains are possible
Dynamic hashing
Linear hashing
Extendible hashing

2
Static hashing

Hash function and buckets
Primary and overflow pages
Single disk I/O for search
More for data resident in overflow pages
Heuristic: keep pages about 80% full initially

[Figure: key -> h -> h(key) mod N selects one of the N buckets (0 to N-1); each bucket has a primary bucket page, possibly followed by overflow pages.]
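
As an illustration of why a search costs one I/O on the primary page and more when the entry sits on an overflow page, here is a minimal in-memory sketch. The class name StaticHashFile, the page capacity and the choice of h as the identity on integer keys are assumptions made for the example, not part of the slides.

```python
class StaticHashFile:
    """Toy static hash file: N fixed buckets, each a chain of pages."""

    def __init__(self, n_buckets, page_capacity=4):
        self.n = n_buckets
        self.page_capacity = page_capacity
        # buckets[b][0] is the primary page; later pages are overflow pages.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def _bucket(self, key):
        # h(key) mod N, with h assumed to be the identity on integers.
        return key % self.n

    def insert(self, key):
        pages = self.buckets[self._bucket(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])  # last page is full: chain a new overflow page
        pages[-1].append(key)

    def search(self, key):
        """Return (found, pages_read); each page read models one disk I/O."""
        pages = self.buckets[self._bucket(key)]
        for reads, page in enumerate(pages, start=1):
            if key in page:
                return True, reads
        return False, len(pages)


f = StaticHashFile(n_buckets=4, page_capacity=2)
for k in (0, 4, 8, 12, 1):
    f.insert(k)
print(f.search(4))   # (True, 1): found on the primary page
print(f.search(12))  # (True, 2): found on an overflow page
```

Because N never changes, a bucket that keeps receiving keys can only grow its chain, which is the long-overflow-chain problem that the dynamic schemes address.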
3
Dynamic hashing: Extendible hashing

Directory
Address of buckets
Global depth
Local depth at buckets to decide if doubling of directory is needed
Mapping from directory to buckets
Many-to-one for buckets with local depth < global depth
One-to-one for buckets with local depth = global depth

4
Example

Directory is array of size 4.
The bucket for record r is reached through the directory entry whose index is the (global depth) least significant bits of h(r).
If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
If h(r) = 7 = binary 111, it is in the bucket pointed to by 11.
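
The indexing rule can be checked with a one-line bit mask. This is a small sketch; the function name directory_index is hypothetical.

```python
def directory_index(hash_value, global_depth):
    """Keep only the `global_depth` least significant bits of the hash."""
    return hash_value & ((1 << global_depth) - 1)


for h in (5, 7):
    idx = directory_index(h, global_depth=2)
    print(h, format(h, "b"), "->", format(idx, "02b"))
# 5 101 -> 01
# 7 111 -> 11
```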

[Figure: extendible hash DIRECTORY and buckets]
GLOBAL DEPTH 2; DIRECTORY entries 00, 01, 10, 11
Bucket A (LOCAL DEPTH 2): 16, 4, 12, 32
Bucket B (LOCAL DEPTH 1): 13, 1, 7, 5
Bucket C (LOCAL DEPTH 2): 10
5
Handling Inserts

Find bucket where record belongs.
If there's room, put it there.
Else, if bucket is full, split it:
increment local depth of original page
allocate new page with new local depth
re-distribute records from original page
add entry for the new page to the directory (doubling the directory first if the new local depth exceeds the global depth)
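
The insert steps above can be sketched as follows. This is a minimal in-memory illustration, not a definitive implementation: the class names, the bucket capacity of 4 and the use of the integer key itself as h(r) are assumptions. The usage lines rebuild the directory from the earlier example (Bucket A: 16, 4, 12, 32; Bucket B: 13, 1, 7, 5; Bucket C: 10) by hand.

```python
class Bucket:
    def __init__(self, local_depth, keys=()):
        self.local_depth = local_depth
        self.keys = list(keys)


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity          # entries per bucket (assumed)
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _index(self, key):
        # h(key) is assumed to be the key itself; keep global_depth low bits.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        # Note: many duplicate keys would make this loop split forever;
        # a real system would fall back to overflow pages.
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)           # full: split, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry i + 2^d initially mirrors entry i.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)   # the newly significant bit
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:                    # re-distribute the records
            (image if k & bit else bucket).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:       # repoint half of the entries
                self.directory[i] = image


t = ExtendibleHash(capacity=4)
a, b, c = Bucket(2, [16, 4, 12, 32]), Bucket(1, [13, 1, 7, 5]), Bucket(2, [10])
t.global_depth, t.directory = 2, [a, b, c, b]
for k in (21, 19, 15):
    t.insert(k)
print(t.global_depth, sorted(t.directory[3].keys))  # 2 [7, 15, 19]
t.insert(20)
print(t.global_depth, len(t.directory))             # 3 8
```

Inserting 21 splits Bucket B without touching the directory size, because its local depth (1) was below the global depth (2). Inserting 20 hits the full Bucket A, whose local depth already equals the global depth, so the directory doubles.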

6
Example: Insert 21, then 19, 15

21 = 10101
19 = 10011
15 = 01111

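[Figure: DIRECTORY (GLOBAL DEPTH 2; entries 00, 01, 10, 11) and DATA PAGES during the inserts]
Bucket A (LOCAL DEPTH 2)
Bucket B (LOCAL DEPTH 1 before the split): 13, 1, 7, 5, with 21 being inserted
Bucket C (LOCAL DEPTH 2): 10
New data page created by splitting Bucket B: 15, 19, 7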
7
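Insert h(r) = 20 (Causes Doubling)
[Figure: DIRECTORY (GLOBAL DEPTH 2; entries 00, 01, 10, 11) and buckets]
Bucket A (LOCAL DEPTH 2)
Bucket B (LOCAL DEPTH 2): 1, 5, 21, 13
Bucket C (LOCAL DEPTH 2): 10
Bucket D (LOCAL DEPTH 2): 15, 7, 19
Surviving label fragment for the new bucket: "... of Bucket A)"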
8
Dynamic hashing: Linear hashing

Directory not needed
Buckets during a round
Set A: buckets split during current round
Set B: buckets not split during current round
Set C: new buckets created during current round
Next: pointer to the beginning of set B, i.e. the next bucket to be split
Not necessarily the overflowing bucket
Heuristics for splitting and advancement of Next.

9
Main idea

Use a family of hash functions h_0, h_1, h_2, ...
h_i(key) = h(key) mod (2^i * N)
N = initial number of buckets
h is some hash function
h_{i+1} doubles the range of h_i (similar to directory doubling)
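
A short sketch of the hash family, assuming h is the identity on integer keys and N = 4 (both are choices made for the example). It shows the property that linear hashing relies on: moving from one function to the next either leaves a key's bucket unchanged or shifts it up by exactly the old range.

```python
def h_i(i, key, n=4):
    """i-th function of the family: h(key) mod (2^i * N), with h = identity."""
    return key % (2 ** i * n)


for key in (44, 9, 43):
    print(key, h_i(0, key), h_i(1, key))
# 44 0 4
# 9 1 1
# 43 3 3

# The next function maps a key either to the same bucket or to the bucket
# exactly one "old range" higher.
assert all(h_i(1, k) in (h_i(0, k), h_i(0, k) + 4) for k in range(1000))
```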

Algorithm proceeds in rounds. Current round number is Level.
There are N_Level (= N * 2^Level) buckets at the beginning of a round.
Buckets 0 to Next-1 have been split; buckets Next to N_Level - 1 have not been split yet this round.
Round ends when all initial buckets have been split (i.e. Next = N_Level).
To start next round, increment Level and reset Next to 0.

11
LH Search Algorithm

To find bucket for data entry r, compute h_Level(r).
If h_Level(r) >= Next (i.e., h_Level(r) is a bucket that hasn't been involved in a split this round) then r belongs in that bucket.
Else, r could belong to bucket h_Level(r) or bucket h_Level(r) + N_Level. Apply h_{Level+1}(r) to find out.
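
The lookup rule translates almost directly into code. This is a sketch under the assumption that h is the identity on integer keys; the function name lh_bucket and the parameter names are hypothetical.

```python
def lh_bucket(key, level, next_ptr, n):
    """Return the bucket number that holds `key` in a linear hash file."""
    b = key % (n * 2 ** level)            # h_Level(key)
    if b >= next_ptr:
        return b                          # bucket not yet split this round
    return key % (n * 2 ** (level + 1))   # already split: use h_{Level+1}


print(lh_bucket(44, level=0, next_ptr=0, n=4))  # 0
print(lh_bucket(9, level=0, next_ptr=0, n=4))   # 1
print(lh_bucket(44, level=0, next_ptr=1, n=4))  # 4 (bucket 0 has been split)
```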

12
Example: Search 44 (101100), 9 (01001)
Level = 0, Next = 0, N = 4
[Figure: PRIMARY PAGES for four buckets, labelled 00, 01, 10, 11 under h_0 and 000, 001, 010, 011 under h_1. The h_0 / h_1 labels are for illustration only.]
13
Insert 43
Level = 0, Next = 1, N = 4
14
Insert operation

Find appropriate bucket.
If bucket is full:
Add overflow page and insert data entry.
Split Next bucket and increment Next.
Note: this is likely NOT the bucket being inserted into!
To split a bucket, create a new bucket and use h_{Level+1} to re-distribute entries.
Since buckets are split round-robin, long overflow chains don't develop.
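
The insert procedure can be sketched as below. It is an in-memory illustration under several assumptions: h is the identity on integer keys, a primary page holds 4 entries, a split is triggered whenever an insert lands in a full bucket, and each bucket is one Python list whose entries beyond the capacity stand in for overflow pages. The class and method names are hypothetical.

```python
class LinearHash:
    def __init__(self, n_initial=4, capacity=4):
        self.n = n_initial          # N: number of buckets in round 0
        self.capacity = capacity    # entries per primary page (assumed)
        self.level = 0
        self.next = 0
        self.buckets = [[] for _ in range(n_initial)]

    def _h(self, i, key):
        return key % (self.n * 2 ** i)

    def _address(self, key):
        b = self._h(self.level, key)
        if b < self.next:                     # bucket already split this round
            b = self._h(self.level + 1, key)
        return b

    def search(self, key):
        return key in self.buckets[self._address(key)]

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        overflowed = len(bucket) >= self.capacity
        bucket.append(key)                    # may land on an "overflow page"
        if overflowed:
            self._split_next()                # splits Next, not this bucket

    def _split_next(self):
        n_level = self.n * 2 ** self.level
        old = self.buckets[self.next]
        self.buckets.append([])               # new bucket Next + N_Level
        self.buckets[self.next] = []
        for k in old:                         # re-distribute with h_{Level+1}
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == n_level:              # every initial bucket was split
            self.level += 1
            self.next = 0


lh = LinearHash(n_initial=4, capacity=4)
for k in (32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11):
    lh.insert(k)                              # fills the file, no split yet
for k in (43, 37, 29, 22, 66, 34):
    lh.insert(k)
print(lh.level, lh.next)   # 0 3
lh.insert(50)
print(lh.level, lh.next)   # 1 0: the round has ended
```

Note that inserting 43 overflows the last bucket but splits bucket 0, which is the round-robin behaviour described above.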

15
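Example: End of a Round
Insert 37, 29, 22, 66, 34, 50
Before inserting 50: Level = 0, Next = 3. After inserting 50: Level = 1, Next = 0.

[Figure: PRIMARY PAGES and OVERFLOW PAGES at Level = 0, Next = 3; bucket labels are h_1 / h_0]
000 / 00: 32
001 / 01: 9, 25
010 / 10: 66, 18, 10, 34
011 / 11 (Next = 3): 43, 35, 31, 7, 11 (one entry on an overflow page)
100 / 00: 44, 36
101 / 01: 5, 37, 29
110 / 10: 22, 14, 30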
16
Summary

Hash-based indexes best for equality searches,
cannot support range searches.
Static Hashing can lead to long overflow chains.
Extendible Hashing avoids overflow pages by
splitting a full bucket when a new data entry is
to be added to it.
Directory to keep track of buckets, doubles
periodically.
Directory can get large with skewed data; additional I/O if it does not fit in main memory.

17
Summary (Contd.)

Linear Hashing avoids directory by splitting
buckets round-robin, and using overflow pages.
Overflow chains not likely to be long.
Space utilization could be lower than Extendible
Hashing, since splits not concentrated on dense
data areas.
Can tune the criterion for triggering splits to trade off slightly longer chains for better space utilization.
