Chapter 11: Indexing and Hashing - PowerPoint PPT Presentation

1 / 30
About This Presentation
Title:

Chapter 11: Indexing and Hashing

Description:

... no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys ... Good hash Expected number of. function: keys/bucket is the. same for all buckets. 39 ... – PowerPoint PPT presentation

Number of Views:49
Avg rating:3.0/5.0
Slides: 31
Provided by: marily230
Category:

less

Transcript and Presenter's Notes

Title: Chapter 11: Indexing and Hashing


1
Chapter 11 Indexing and Hashing
  • Indexing
  • Basic Concepts
  • Ordered Indices
  • B-Tree Index Files
  • Hashing
  • Static
  • Dynamic Hashing

2
Hashing
  • Static hashing
  • Dynamic hashing

3
Hashing
  • key ? h(key)

ltkeygt
Buckets (typically 1 disk block)
. . .
4
Example hash function
  • Key x1 x2 xn n byte character string
  • Have b buckets
  • h add x1 x2 .. xn
  • compute sum modulo b

5
  • ? This may not be best function

Good hash ? Expected number of
function keys/bucket is the same for all
buckets
6
Within a bucket
  • Do we keep keys sorted?
  • Yes, if CPU time critical
  • Inserts/Deletes not too frequent

7
Next example to illustrate inserts,
overflows, deletes
  • h(K)

8
EXAMPLE 2 records/bucket
  • INSERT
  • h(a) 1
  • h(b) 2
  • h(c) 1
  • h(d) 0

0 1 2 3
h(e) 1
9
EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
c
e
f
g
10
Rule of thumb
  • Try to keep space utilization
  • between 50 and 80
  • Utilization keys used
  • total keys that fit
  • If lt 50, wasting space
  • If gt 80, overflows significant depends on how
    good hash function is on keys/bucket

11
How do we cope with growth?
  • Overflows and reorganizations
  • Dynamic hashing
  • Extensible
  • Linear

12
Extensible hashing two ideas
  • (a) Use i of b bits output by hash function
  • b
  • h(K) ?
  • use i ? grows over time.

00110101
13
  • (b) Use directory
  • h(K)i to bucket

. . .
. . .
14
Example h(k) is 4 bits 2 keys/bucket
1
  • i

0001
1
1001
1100
Insert 1010
15
Example continued
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
16
Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
17
Extensible hashing deletion
  • No merging of blocks
  • Merge blocks and cut directory if possible
  • (Reverse insert procedure)

18
Deletion example
  • Run thru insert example in reverse!

19
Extensible hashing
Summary
  • Can handle growing files
  • - with less wasted space
  • - with no full reorganizations


20
Advanced indexing
  • Multiple attributes
  • Bitmap indexing

21
Multiple-Key Access
  • Use multiple indices for certain types of
    queries.
  • Example
  • select account-number
  • from account
  • where branch-name Perryridge and balance
    1000
  • Possible strategies?

22
Indices on Multiple Attributes
  • where branch-name PP and balance 1000

Suppose we have an index on combined
search-key (branch-name, balance).
BB,1000
CC,200 PP,800 PP,1500
AB,200
AA,2000 AA,2300 AA,2500
CC,200 DD,200 DD,300
PP,800 PP,1000 PP,1300
PP,1500 PP,1560
AB,200 AC,200
CC,200 PP,300
23
Suppose we have an index on combined
search-key (branch-name, balance).
  • where branch-name PP and balance lt 1000

search pp,0
BB,1000
CC,200 PP,800 PP,1500
AB,200
AA,2000 AA,2300 AA,2500
CC,200 DD,200 DD,300
PP,800 PP,1000 PP,1300
PP,1500 PP,1560
AB,200 AC,200
CC,200 PP,300
search pp,1000
24
Suppose we have an index on combined
search-key (branch-name, balance).
NO!
  • where branch-name lt PP and balance 1000?

BB,1000
CC,200 PP,800 PP,1500
AB,200
AA,2000 AA,2300 AA,2500
CC,200 DD,200 DD,300
PP,800 PP,1000 PP,1300
PP,1500 PP,1560
AB,200 AC,200
CC,200 PP,300
25
Bitmap Indices
  • An index designed for multiple valued search keys

26
Bitmap Indices (Cont.)
The income-level value of record 3 is L1
Bitmap(size table size)
Unique values of gender
Unique values of income-level
27
Bitmap Indices (Cont.)
  • Some properties of bitmap indices
  • Number of bitmaps for each attribute?
  • Size of each bitmap?
  • When is the bitmap matrix sparse and what
    attributes are good for bitmap indices?

28
Bitmap Indices (Cont.)
  • Bitmap indices generally very small compared with
    relation size
  • E.g. if record is 100 bytes, space for a single
    bitmap is 1/800 of space used by relation.
  • If number of distinct attribute values is 8,
    bitmap is only 1 of relation size
  • What about insertion?
  • Deletion?

29
Bitmap Indices Queries
Sample query Males with income level L1
even faster!
10010 AND 10100 10000
What about the number of males with income level
L1?
30
Bitmap Indices Queries
  • Queries are answered using bitmap operations
  • Intersection (and)
  • Union (or)
  • Complementation (not)
Write a Comment
User Comments (0)
About PowerShow.com