1-d index structures

Provided by: amb79

1
1-d index structures
  • B-tree
      • Logarithmic complexity for equality and range searches
  • Hashing
      • Best for equality searches
      • Static hashing
          • Long overflow chains are possible
      • Dynamic hashing
          • Linear hashing
          • Extendible hashing

2
Static hashing
  • Hash function and buckets
      • Primary and overflow pages
  • Single disk I/O for search
      • More I/Os for data resident in overflow pages
  • Heuristic: pages are kept about 80% full initially

[Figure: key → h → h(key) mod N selects one of the primary bucket pages 0, 1, …, N-1; overflow pages are chained to the primary pages]
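To make the cost model concrete, here is a minimal Python sketch of a static hash file. It assumes integer keys, uses Python's built-in hash() as h, and picks a page capacity for illustration; the class and method names (StaticHashFile, insert, search) are hypothetical. search() counts the pages it reads, so a lookup on a primary page costs one read while an entry on an overflow page costs more.

```python
from typing import List, Tuple


class StaticHashFile:
    """Static hashing: N fixed buckets, each a primary page plus an overflow chain."""

    def __init__(self, n_buckets: int, page_capacity: int = 4) -> None:
        self.n = n_buckets
        self.cap = page_capacity
        # buckets[i] is a list of pages; buckets[i][0] is the primary page
        self.buckets: List[List[List[int]]] = [[[]] for _ in range(n_buckets)]

    def _bucket(self, key: int) -> int:
        return hash(key) % self.n          # h(key) mod N, with h = built-in hash

    def insert(self, key: int) -> None:
        pages = self.buckets[self._bucket(key)]
        if len(pages[-1]) >= self.cap:     # last page full: chain an overflow page
            pages.append([])
        pages[-1].append(key)

    def search(self, key: int) -> Tuple[bool, int]:
        """Return (found, number of pages read)."""
        reads = 0
        for page in self.buckets[self._bucket(key)]:
            reads += 1                     # one I/O per page visited
            if key in page:
                return True, reads
        return False, reads


f = StaticHashFile(n_buckets=4, page_capacity=2)
for k in (0, 4, 8, 12, 1):
    f.insert(k)
print(f.search(1))   # (True, 1): found on the primary page
print(f.search(12))  # (True, 2): 12 lives on an overflow page
```

Because N never changes, every key that maps to bucket 0 keeps lengthening the same chain, which is exactly the overflow-chain problem noted on the first slide.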
3
Dynamic hashing: Extendible hashing
  • Directory
      • Addresses of buckets
      • Global depth
  • Local depth at buckets, used to decide whether the directory must be doubled
  • Mapping from directory to buckets
      • Many-to-one for buckets with local depth < global depth
      • One-to-one for buckets with local depth = global depth

4
Example
  • The directory is an array of size 4.
  • The bucket for record r is found through the directory entry whose index
    is the (global depth) least significant bits of h(r).
      • If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
      • If h(r) = 7 = binary 111, it is in the bucket pointed to by 11.

[Figure: directory with global depth 2; entry 00 → Bucket A (local depth 2): 4, 12, 32, 16; entries 01 and 11 → Bucket B (local depth 1): 1, 5, 7, 13; entry 10 → Bucket C (local depth 2): 10]
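A small Python sketch of the directory lookup, assuming h is the identity on integers. The directory below mirrors this slide's example (global depth 2; entries 01 and 11 share Bucket B, whose local depth is 1), and the helper name dir_index is hypothetical. The final loop checks the many-to-one rule: a bucket with local depth d' is shared by 2^(global depth - d') directory entries.

```python
def dir_index(hval: int, global_depth: int) -> int:
    """Directory index = the `global_depth` least significant bits of h(r)."""
    return hval & ((1 << global_depth) - 1)


global_depth = 2
directory = ["A", "B", "C", "B"]          # entries 00, 01, 10, 11
local_depth = {"A": 2, "B": 1, "C": 2}

for hval in (5, 7):
    i = dir_index(hval, global_depth)
    print(f"h(r) = {hval} ({hval:b}) -> entry {i:0{global_depth}b} -> bucket {directory[i]}")
# h(r) = 5 (101) -> entry 01 -> bucket B
# h(r) = 7 (111) -> entry 11 -> bucket B

# A bucket with local depth d' is shared by 2**(global_depth - d') entries.
for name, d in local_depth.items():
    assert directory.count(name) == 2 ** (global_depth - d)
```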
5
Handling Inserts
  • Find the bucket where the record belongs.
  • If there's room, put it there.
  • Else, if the bucket is full, split it:
      • increment the local depth of the original page
      • if the new local depth exceeds the global depth, double the directory
      • allocate a new page with the new local depth
      • re-distribute records from the original page
      • add an entry for the new page to the directory
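The insert steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: h is the identity, the directory is indexed by low-order bits, buckets hold 4 entries (the size the example buckets appear to have), and the helper example_file() (hypothetical) rebuilds the state of the earlier example figure. The usage replays the inserts from the next two slides: 21, 19, 15, then 20.

```python
from typing import List


class Bucket:
    def __init__(self, local_depth: int) -> None:
        self.local_depth = local_depth
        self.keys: List[int] = []


class ExtendibleHash:
    """Sketch: h is the identity, the directory is indexed by low-order bits, and
    at most `capacity` keys may share a hash value (else splitting never ends)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.global_depth = 0
        self.directory: List[Bucket] = [Bucket(0)]

    def _index(self, key: int) -> int:
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key: int) -> None:
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)                   # room: put it there
            return
        if bucket.local_depth == self.global_depth:   # split needs one more bit
            self.directory = self.directory + self.directory  # double directory
            self.global_depth += 1
        bucket.local_depth += 1                       # increment local depth
        image = Bucket(bucket.local_depth)            # new page, same new depth
        bit = 1 << (bucket.local_depth - 1)           # the newly significant bit
        old, bucket.keys = bucket.keys, []
        for k in old:                                 # re-distribute records
            (image if k & bit else bucket).keys.append(k)
        for i, b in enumerate(self.directory):        # entries with the bit set
            if b is bucket and i & bit:               # now point to the new page
                self.directory[i] = image
        self.insert(key)                              # retry; may split again


def example_file() -> ExtendibleHash:
    """Hypothetical helper: the state shown in the example figure."""
    a, b, c = Bucket(2), Bucket(1), Bucket(2)
    a.keys, b.keys, c.keys = [4, 12, 32, 16], [1, 5, 7, 13], [10]
    eh = ExtendibleHash(capacity=4)
    eh.global_depth, eh.directory = 2, [a, b, c, b]   # entries 00, 01, 10, 11
    return eh


eh = example_file()
for k in (21, 19, 15):
    eh.insert(k)
print(eh.global_depth, sorted(eh.directory[0b01].keys), sorted(eh.directory[0b11].keys))
# 2 [1, 5, 13, 21] [7, 15, 19]
eh.insert(20)                                         # A is full at depth 2
print(eh.global_depth, sorted(eh.directory[0b000].keys), sorted(eh.directory[0b100].keys))
# 3 [16, 32] [4, 12, 20]
```

Doubling is done by concatenating the directory with itself: with low-order-bit indexing, entry i + 2^d must initially point where entry i did, and concatenation gives exactly that.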

6
Example: Insert 21, then 19, 15
  • 21 = 10101
  • 19 = 10011
  • 15 = 01111

[Figure: directory (global depth 2) and data pages after the inserts; Bucket A; Bucket B: 1, 5, 13, 21; Bucket C: 10; a new data page: 7, 15, 19]
7
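Insert h(r) = 20 (Causes Doubling)
  • 20 = 10100 maps to entry 00, i.e. Bucket A, which is full; its local depth
    already equals the global depth (2), so splitting it requires doubling the directory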
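[Figure: directory with global depth 2 (entries 00, 01, 10, 11) and local depths; Bucket A; Bucket B (local depth 2): 1, 5, 13, 21; Bucket C (local depth 2): 10; Bucket D (local depth 2): 7, 15, 19; a partly lost label reads "… of Bucket A)"]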
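A small Python check of why doubling is cheap under low-order-bit indexing, assuming the directory state after the previous slide's inserts (00 → A, 01 → B, 10 → C, 11 → D). Doubling copies the pointer array, so every lookup that now uses one more bit lands where it did before; only the entry for A's split image then changes. The function name double and the bucket name "A2" are hypothetical.

```python
def double(directory: list) -> list:
    """Double an LSB-indexed directory: entry i + 2**d copies entry i."""
    return directory + directory


directory = ["A", "B", "C", "D"]    # entries 00, 01, 10, 11 (global depth 2)
doubled = double(directory)         # entries 000 .. 111 (global depth 3)

# Every hash value still reaches the same bucket after doubling.
for h in range(256):
    assert doubled[h & 0b111] == directory[h & 0b11]

# Only the overflowing bucket is then split: entry 100 gets the split image.
doubled[0b100] = "A2"               # hypothetical name for A's split image
print(doubled)                      # ['A', 'B', 'C', 'D', 'A2', 'B', 'C', 'D']
print(doubled[20 & 0b111])          # A2, since 20 = 10100 ends in 100
```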
8
Dynamic hashing: Linear hashing
  • Directory not needed
  • Buckets during a round:
      • Set A: buckets split during the current round
      • Set B: buckets not yet split during the current round
      • Set C: new buckets created during the current round
  • Next: pointer to the beginning of set B, i.e. the next bucket to be split
      • Not necessarily the overflowing bucket
  • Heuristics for splitting and advancement of Next

9
Main idea
  • Use a family of hash functions $h_0, h_1, h_2, \ldots$
      • $h_i(key) = h(key) \bmod (2^i N)$
      • N = number of initial buckets
      • h is some hash function
  • $h_{i+1}$ doubles the range of $h_i$ (similar to directory doubling)
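A Python sketch of the hash-function family, taking h to be the identity on integers and N = 4 (both assumptions; the name h_i is only illustrative). The assertion loop checks the doubling property: $h_{i+1}(key)$ is always either $h_i(key)$ or $h_i(key) + 2^i N$, so each bucket of $h_i$ splits into exactly two buckets of $h_{i+1}$.

```python
def h_i(i: int, key: int, n: int = 4) -> int:
    """h_i(key) = h(key) mod (2**i * N), taking h to be the identity."""
    return key % (2 ** i * n)


# h_{i+1} refines h_i: bucket b of h_i splits into b and b + 2**i * N.
for key in range(1000):
    for i in range(4):
        assert h_i(i + 1, key) in (h_i(i, key), h_i(i, key) + 2 ** i * 4)

for key in (44, 9, 43):
    print(key, h_i(0, key), h_i(1, key))
# 44 0 4
# 9 1 1
# 43 3 3
```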

10
  • The algorithm proceeds in rounds. The current round number is Level.
  • There are $N_{Level}$ ($= N \cdot 2^{Level}$) buckets at the beginning of a round.
  • Buckets 0 to Next - 1 have been split; buckets Next to $N_{Level} - 1$
    have not been split yet this round.
  • The round ends when all initial buckets have been split (i.e. Next = $N_{Level}$).
  • To start the next round, increment Level and reset Next to 0.
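The round bookkeeping above can be sketched as two small Python functions; the names n_level and after_split are hypothetical, and next_ carries a trailing underscore only because next is a Python built-in. At any moment the file has $N_{Level}$ + Next primary buckets.

```python
from typing import Tuple


def n_level(n: int, level: int) -> int:
    """Number of buckets at the start of round `level`: N * 2**Level."""
    return n * 2 ** level


def after_split(level: int, next_: int, n: int) -> Tuple[int, int]:
    """Advance (Level, Next) after bucket Next has been split."""
    next_ += 1
    if next_ == n_level(n, level):     # all initial buckets of this round split
        return level + 1, 0            # start the next round
    return level, next_


state = (0, 0)
history = []
for _ in range(4 + 8):                 # a full round at N_0 = 4, then at N_1 = 8
    state = after_split(*state, n=4)
    history.append(state)
print(history[3], history[-1])         # (1, 0) (2, 0)
```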

11
LH Search Algorithm
  • To find the bucket for data entry r, compute $h_{Level}(r)$.
  • If $h_{Level}(r) \ge Next$ (i.e. $h_{Level}(r)$ is a bucket that hasn't been
    involved in a split this round), then r belongs in that bucket.
  • Else, r could belong to bucket $h_{Level}(r)$ or bucket $h_{Level}(r) + N_{Level}$.
    Apply $h_{Level+1}(r)$ to find out.
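The search rule can be written as a short Python function, assuming h is the identity so that $h_{Level}(r) = r \bmod N_{Level}$; the name find_bucket is hypothetical. The calls reproduce the search example on the next slide (Level = 0, Next = 0, N = 4) and then show how the answer for 44 changes once bucket 0 has been split (Next = 1).

```python
def find_bucket(key: int, level: int, next_: int, n: int) -> int:
    """Linear-hashing lookup: h_Level, falling back to h_{Level+1} for split buckets."""
    n_lvl = n * 2 ** level
    b = key % n_lvl                  # h_Level(key)
    if b >= next_:
        return b                     # not split yet this round
    return key % (2 * n_lvl)         # h_{Level+1}(key): either b or b + N_Level


print(find_bucket(44, level=0, next_=0, n=4))  # 0
print(find_bucket(9, level=0, next_=0, n=4))   # 1
print(find_bucket(44, level=0, next_=1, n=4))  # 4: bucket 0 has been split
```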

12
Example: Search 44 (101100), 9 (01001)
Level = 0, Next = 0, N = 4
[Figure: primary pages, with bucket numbers under $h_0$ (00, 01, 10, 11) and under $h_1$ (000, 001, 010, 011); the $h_1$ labels are marked "This info is for illustration only!"]
13
Insert 43
Level = 0, Next = 1, N = 4
14
Insert operation
  • Find the appropriate bucket.
  • If the bucket is full:
      • Add an overflow page and insert the data entry.
      • Split bucket Next and increment Next.
          • Note: this is likely NOT the bucket being inserted to!
      • To split a bucket, create a new bucket and use $h_{Level+1}$ to
        re-distribute entries.
  • Since buckets are split round-robin, long overflow chains don't develop.
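A minimal Python sketch of linear hashing with overflow pages and round-robin splitting. Assumptions: h is the identity, a page holds 4 entries, and a split is triggered whenever an insert has to chain a new overflow page (the slides leave the split trigger as a tunable heuristic). The class name LinearHash and its methods are hypothetical. The usage illustrates the note above: the overflowing bucket is 3, but the bucket that gets split is bucket Next = 0.

```python
from typing import List


class LinearHash:
    """Sketch of linear hashing; each bucket is a list of pages (lists of keys)."""

    def __init__(self, n: int = 4, capacity: int = 4) -> None:
        self.n, self.cap = n, capacity
        self.level, self.next = 0, 0
        self.buckets: List[List[List[int]]] = [[[]] for _ in range(n)]

    def _n_level(self) -> int:
        return self.n * 2 ** self.level               # N_Level = N * 2**Level

    def bucket_of(self, key: int) -> int:
        b = key % self._n_level()                     # h_Level(key)
        return b if b >= self.next else key % (2 * self._n_level())

    def _add(self, b: int, key: int) -> bool:
        """Append key to bucket b; return True if a new overflow page was chained."""
        pages = self.buckets[b]
        if len(pages[-1]) < self.cap:
            pages[-1].append(key)
            return False
        pages.append([key])
        return True

    def insert(self, key: int) -> None:
        if self._add(self.bucket_of(key), key):
            self._split()                             # split bucket Next (often not this one)

    def _split(self) -> None:
        m = 2 * self._n_level()                       # range of h_{Level+1}
        old = [k for page in self.buckets[self.next] for k in page]
        self.buckets[self.next] = [[]]
        self.buckets.append([[]])                     # new bucket Next + N_Level
        for k in old:
            self._add(k % m, k)                       # re-distribute with h_{Level+1}
        self.next += 1
        if self.next == self._n_level():              # round over
            self.level, self.next = self.level + 1, 0

    def search(self, key: int) -> bool:
        return any(key in page for page in self.buckets[self.bucket_of(key)])


lh = LinearHash(n=4, capacity=4)
for k in (3, 7, 11, 15, 19):        # all map to bucket 3 under h_0
    lh.insert(k)
print(lh.next, len(lh.buckets[3]), len(lh.buckets))  # 1 2 5
```

Bucket 3 keeps its overflow page for now; it will be split when Next reaches it, which is why chains stay short on average rather than being eliminated immediately.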

15
Example: End of a Round
Insert 37, 29, 22, 66, 34, 50
[Figure: primary and overflow pages, with bucket numbers under $h_0$ and $h_1$, at Level = 0, Next = 3 and, once the round ends, at Level = 1, Next = 0]
16
Summary
  • Hash-based indexes are best for equality searches and cannot support
    range searches.
  • Static hashing can lead to long overflow chains.
  • Extendible hashing avoids overflow pages by splitting a full bucket when a
    new data entry is to be added to it.
      • A directory keeps track of buckets and doubles periodically.
      • The directory can get large with skewed data; additional I/O is needed
        if it does not fit in main memory.

17
Summary (Contd.)
  • Linear hashing avoids a directory by splitting buckets round-robin and
    using overflow pages.
      • Overflow chains are not likely to be long.
      • Space utilization could be lower than with extendible hashing, since
        splits are not concentrated on dense data areas.
      • The criterion for triggering splits can be tuned to trade off slightly
        longer chains for better space utilization.
