Title: CS 245: Database System Principles
1CS 245 Database System Principles
Notes 5 Hashing and More
2Hashing
ltkeygt
Buckets (typically 1 disk block)
. . .
3. . .
records
(1) key ? h(key)
. . .
4Two alternatives
record
(2) key ? h(key)
key 1
Index
5Two alternatives
record
(2) key ? h(key)
key 1
Index
- Alt (2) for secondary search key
6Example hash function
- Key x1 x2 xn n byte character string
- Have b buckets
- h add x1 x2 .. xn
- compute sum modulo b
7- ? This may not be best function
- ? Read Knuth Vol. 3 if you really need to
select a good function.
8- ? This may not be best function
- ? Read Knuth Vol. 3 if you really need to
select a good function.
Good hash ? Expected number of
function keys/bucket is the same for all
buckets
9Within a bucket
- Yes, if CPU time critical
- Inserts/Deletes not too frequent
10Next example to illustrate inserts,
overflows, deletes
11EXAMPLE 2 records/bucket
0 1 2 3
- INSERT
- h(a) 1
- h(b) 2
- h(c) 1
- h(d) 0
12EXAMPLE 2 records/bucket
0 1 2 3
- INSERT
- h(a) 1
- h(b) 2
- h(c) 1
- h(d) 0
h(e) 1
13EXAMPLE 2 records/bucket
0 1 2 3
- INSERT
- h(a) 1
- h(b) 2
- h(c) 1
- h(d) 0
h(e) 1
14EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
e
f
g
15EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
c
e
f
g
16EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
c
e
f
g
17Rule of thumb
- Try to keep space utilization
- between 50 and 80
- Utilization keys used
- total keys that fit
18Rule of thumb
- Try to keep space utilization
- between 50 and 80
- Utilization keys used
- total keys that fit
19How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
20How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
21Extensible hashing two ideas
- (a) Use i of b bits output by hash function
- b
- h(K) ?
- use i ? grows over time.
00110101
22- (b) Use directory
- h(K)i to bucket
. . .
. . .
23Example h(k) is 4 bits 2 keys/bucket
1
0001
1
1001
1100
Insert 1010
24Example h(k) is 4 bits 2 keys/bucket
1
0001
1
1001
1100
Insert 1010
25Example h(k) is 4 bits 2 keys/bucket
1
0001
1
1001
1100
Insert 1010
26Example continued
i
2
00 01 10 11
1
0001
1001
1010
Insert 0111 0000
1100
27Example continued
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
28Example continued
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
29Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
30Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
31Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
32Extensible hashing deletion
- No merging of blocks
- Merge blocks and cut directory if possible
- (Reverse insert procedure)
33Deletion example
- Run thru insert example in reverse!
34Note Still need overflow chains
- Example many records with duplicate keys
if we split
insert 1100
1101
1100
1100
1100
35Solution overflow chains
add overflow block
insert 1100
1100
1101
1101
1101
1100
36 Extensible hashing
Summary
- Can handle growing files
- - with less wasted space
- - with no full reorganizations
37 Extensible hashing
Summary
- Can handle growing files
- - with less wasted space
- - with no full reorganizations
38Linear hashing
- Another dynamic hashing scheme
39Linear hashing
- Another dynamic hashing scheme
40Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
41Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
42Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
43Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
44Note
- In textbook, n is used instead of m
- nm1
n10
Future growth buckets
0101
0000
1111
1010
00 01 10 11
m 01 (max used block)
45Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
46Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
47Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
48Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
49Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010
m 01 (max used block)
50Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
. . .
m 11 (max used block)
51Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
. . .
m 11 (max used block)
52Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
. . .
m 11 (max used block)
53Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101
. . .
m 11 (max used block)
54? When do we expand file?
- Keep track of used slots
- total of slots
U
55? When do we expand file?
- Keep track of used slots
- total of slots
U
- If U gt threshold then increase m
- (and maybe i )
56 Linear Hashing
Summary
- Can handle growing files
- - with less wasted space
- - with no full reorganizations
-
- No indirection like extensible hashing
57Example BAD CASE
- Very full
- Very empty Need to move
- m here
- Would waste
- space...
-
58Summary
- Hashing
- - How it works
- - Dynamic hashing
- - Extensible
- - Linear
59Next
- Indexing vs Hashing
- Index definition in SQL
- Multiple key access
60Indexing vs Hashing
- Hashing good for probes given key
- e.g., SELECT
- FROM R
- WHERE R.A 5
61Indexing vs Hashing
- INDEXING (Including B Trees) good for
- Range Searches
- e.g., SELECT
- FROM R
- WHERE R.A gt 5
62Index definition in SQL
- Create index name on rel (attr)
- Create unique index name on rel (attr)
defines candidate key
63- CANNOT SPECIFY TYPE OF INDEX
- (e.g. B-tree, Hashing, )
- OR PARAMETERS
- (e.g. Load Factor, Size of Hash,...)
- ... at least in SQL...
Note
64- ATTRIBUTE LIST ? MULTIKEY INDEX
- (next)
- e.g., CREATE INDEX foo ON R(A,B,C)
Note
65 Multi-key Index
- Motivation Find records where
- DEPT Toy AND SAL gt 50k
66Strategy I
- Use one index, say Dept.
- Get all Dept Toy records and
check their salary
I1
67Strategy II
- Use 2 Indexes Manipulate Pointers
- Toy Sal
- gt 50k
68Strategy III
- Multiple Key Index
- One idea
I2
I3
I1
69Example
10k
15k
- Example
- Record
- Dept
- Index
- Salary
- Index
17k
21k
NameJoe DEPTSales SAL15k
12k
15k
15k
19k
70For which queries is this index good?
- Find RECs Dept Sales SAL20k
- Find RECs Dept Sales SAL gt 20k
- Find RECs Dept Sales
- Find RECs SAL 20k
71The BIG picture.
- Chapters 11 12 13 Storage, records,
blocks... - Chapters 13 14 14 Access Mechanisms -
Indexes - - B trees
- - Hashing
- - Multi key
- Chapters 15 16 15, 16 Query Processing
NEXT