Title: Chapter 11: Indexing and Hashing
1Chapter 11 Indexing and Hashing
- Indexing
- Basic Concepts
- Ordered Indices
- B-Tree Index Files
- Hashing
- Static
- Dynamic Hashing
2Hashing
- Static hashing
- Dynamic hashing
3Hashing
ltkeygt
Buckets (typically 1 disk block)
. . .
4Example hash function
- Key x1 x2 xn n byte character string
- Have b buckets
- h add x1 x2 .. xn
- compute sum modulo b
5- ? This may not be best function
Good hash ? Expected number of
function keys/bucket is the same for all
buckets
6Within a bucket
- Yes, if CPU time critical
- Inserts/Deletes not too frequent
7Next example to illustrate inserts,
overflows, deletes
8EXAMPLE 2 records/bucket
- INSERT
- h(a) 1
- h(b) 2
- h(c) 1
- h(d) 0
0 1 2 3
h(e) 1
9EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
c
e
f
g
10Rule of thumb
- Try to keep space utilization
- between 50 and 80
- Utilization keys used
- total keys that fit
- If lt 50, wasting space
- If gt 80, overflows significant depends on how
good hash function is on keys/bucket
11How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
12Extensible hashing two ideas
- (a) Use i of b bits output by hash function
- b
- h(K) ?
- use i ? grows over time.
00110101
13- (b) Use directory
- h(K)i to bucket
. . .
. . .
14Example h(k) is 4 bits 2 keys/bucket
1
0001
1
1001
1100
Insert 1010
15Example continued
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
16Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
17Extensible hashing deletion
- No merging of blocks
- Merge blocks and cut directory if possible
- (Reverse insert procedure)
18Deletion example
- Run thru insert example in reverse!
19 Extensible hashing
Summary
- Can handle growing files
- - with less wasted space
- - with no full reorganizations
20Advanced indexing
- Multiple attributes
- Bitmap indexing
21Multiple-Key Access
- Use multiple indices for certain types of
queries. - Example
- select account-number
- from account
- where branch-name Perryridge and balance
1000 - Possible strategies?
22Indices on Multiple Attributes
- where branch-name PP and balance 1000
Suppose we have an index on combined
search-key (branch-name, balance).
BB,1000
CC,200 PP,800 PP,1500
AB,200
AA,2000 AA,2300 AA,2500
CC,200 DD,200 DD,300
PP,800 PP,1000 PP,1300
PP,1500 PP,1560
AB,200 AC,200
CC,200 PP,300
23Suppose we have an index on combined
search-key (branch-name, balance).
- where branch-name PP and balance lt 1000
search pp,0
BB,1000
CC,200 PP,800 PP,1500
AB,200
AA,2000 AA,2300 AA,2500
CC,200 DD,200 DD,300
PP,800 PP,1000 PP,1300
PP,1500 PP,1560
AB,200 AC,200
CC,200 PP,300
search pp,1000
24Suppose we have an index on combined
search-key (branch-name, balance).
NO!
- where branch-name lt PP and balance 1000?
BB,1000
CC,200 PP,800 PP,1500
AB,200
AA,2000 AA,2300 AA,2500
CC,200 DD,200 DD,300
PP,800 PP,1000 PP,1300
PP,1500 PP,1560
AB,200 AC,200
CC,200 PP,300
25Bitmap Indices
- An index designed for multiple valued search keys
26Bitmap Indices (Cont.)
The income-level value of record 3 is L1
Bitmap(size table size)
Unique values of gender
Unique values of income-level
27Bitmap Indices (Cont.)
- Some properties of bitmap indices
- Number of bitmaps for each attribute?
- Size of each bitmap?
- When is the bitmap matrix sparse and what
attributes are good for bitmap indices?
28Bitmap Indices (Cont.)
- Bitmap indices generally very small compared with
relation size - E.g. if record is 100 bytes, space for a single
bitmap is 1/800 of space used by relation. - If number of distinct attribute values is 8,
bitmap is only 1 of relation size - What about insertion?
- Deletion?
29Bitmap Indices Queries
Sample query Males with income level L1
even faster!
10010 AND 10100 10000
What about the number of males with income level
L1?
30Bitmap Indices Queries
- Queries are answered using bitmap operations
- Intersection (and)
- Union (or)
- Complementation (not)