File Organization - PowerPoint PPT Presentation

About This Presentation
Title:

File Organization

Slides: 29
Provided by: hak4

Transcript and Presenter's Notes

Title: File Organization


1
File Organization
  • File: a sequence of records
  • Fixed-length vs. Variable-length records
  • Records are stored in disk blocks
  • Spanned vs. Unspanned Organization
  • How to allocate disk blocks to a file?
  • Contiguous vs. Linked Allocation
  • File Organization
  • Unordered vs. Ordered
  • Hashing

2
Hashing
  • Map field values to addresses
  • H(field value) → address of record
  • H(k) = k mod M
  • Use ASCII codes for character (non-numeric) keys
  • Pick random digits of hash field value
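
The mod-M rule above can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function names are hypothetical, and summing character codes is only one simple way (assumed here) to turn a string key into an integer before taking the remainder.

```python
def hash_int(k: int, M: int) -> int:
    """H(k) = k mod M for a non-negative integer key."""
    return k % M


def hash_str(s: str, M: int) -> int:
    """Fold a string key into an integer by summing its character codes
    (an assumed, deliberately simple folding rule), then take mod M."""
    return sum(ord(c) for c in s) % M


print(hash_int(41, 10))    # 41 mod 10 = 1
print(hash_str("AB", 10))  # (65 + 66) mod 10 = 1
```

Summing codes maps anagrams to the same address, so a real file would likely weight character positions; the sketch only shows where the ASCII codes enter.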

3
External Hashing for Disk Files
  • Bucket: one disk block or a cluster of contiguous
    blocks (like fixed-size chaining)
  • Hashing maps a key to a bucket number (which is
    then mapped to a physical block)

[Figure: buckets 0, 1, ..., 9. Bucket 0 holds 10, 40, 50, 20, 80, 30; bucket 1 holds 41; bucket 9 holds 39, 19, 69.]
Collision? Less of a problem (many records fit in a bucket).
4
External Hashing
  • M: # of buckets
  • m: max # of records in a bucket
  • What is the best M, m?
  • If # of records is small relative to m·M → wasted space
  • If # of records is large relative to m·M → collisions
  • Solution 1: Reorganize with a larger/smaller M,
    and a new hash function
  • Solution 2: Dynamic Hashing
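
To make the trade-off between M, m, and the record count concrete, here is a small sketch of a static hash file with M buckets of capacity m and one overflow chain per bucket. The class name, the overflow-chain policy, and the choice M = 10, m = 3 are assumptions for illustration; the keys are the ones that appear in the bucket figure on slide 3.

```python
class StaticHashFile:
    """Static external hashing: M fixed buckets, at most m records each."""

    def __init__(self, M: int, m: int):
        self.M, self.m = M, m
        self.buckets = [[] for _ in range(M)]   # primary buckets
        self.overflow = [[] for _ in range(M)]  # overflow chain per bucket (assumed policy)

    def insert(self, key: int) -> None:
        b = key % self.M                        # H(k) = k mod M
        if len(self.buckets[b]) < self.m:
            self.buckets[b].append(key)
        else:
            self.overflow[b].append(key)        # bucket full: collision spills over

    def overflow_count(self) -> int:
        return sum(len(chain) for chain in self.overflow)


f = StaticHashFile(M=10, m=3)
for k in [10, 40, 50, 20, 80, 30, 41, 39, 19, 69]:
    f.insert(k)
print(f.buckets[0], f.overflow[0])  # [10, 40, 50] [20, 80, 30]
print(f.overflow_count())           # 3
```

Ten records fit easily in m·M = 30 slots, yet bucket 0 still overflows because the keys are skewed; that is the situation Solution 1 and Solution 2 try to address.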

5
Dynamic Hashing
  • Static hashing: the hash address space is fixed.
  • Dynamic hashing: expand and shrink the file
    dynamically.
  • Extendible Hashing
  • Linear Hashing

6
Extendible Hashing
  • Hash values are binary numbers (the result of hashing is
    a non-negative integer)
  • Change the number of bits used in hashing
  • Start with only the leading (or final) bit
  • Double the directory if a bucket overflows → use two leading bits
  • Split only one bucket
  • Multiple directory cells may point to a single bucket
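
The bullets above can be turned into a compact sketch. It is one possible reading, not the presenter's implementation: the class and method names are hypothetical, hash values are passed in directly as integers of a fixed bit width, and the directory is indexed by the leading bits. The run at the end uses the 4-bit, 2-keys-per-bucket setting of the example slides that follow.

```python
class Bucket:
    def __init__(self, depth: int):
        self.depth = depth  # local depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, bits: int = 4, capacity: int = 2):
        self.bits = bits          # width of h(k) in bits
        self.capacity = capacity  # max keys per bucket
        self.gd = 1               # global depth
        self.dir = [Bucket(1), Bucket(1)]

    def _index(self, h: int) -> int:
        return h >> (self.bits - self.gd)  # leading gd bits of h

    def insert(self, h: int) -> None:
        while True:
            b = self.dir[self._index(h)]
            if len(b.keys) < self.capacity:
                b.keys.append(h)
                return
            if b.depth == self.bits:
                raise OverflowError("bucket full and all hash bits already used")
            if b.depth == self.gd:
                # Double the directory: cell i becomes cells 2i and 2i+1.
                self.dir = [c for c in self.dir for _ in (0, 1)]
                self.gd += 1
            self._split(b)  # split only the overflowing bucket, then retry

    def _split(self, b: Bucket) -> None:
        b0, b1 = Bucket(b.depth + 1), Bucket(b.depth + 1)
        for k in b.keys:
            bit = (k >> (self.bits - b.depth - 1)) & 1  # next leading bit of the key
            (b1 if bit else b0).keys.append(k)
        for i, cell in enumerate(self.dir):
            if cell is b:
                bit = (i >> (self.gd - b.depth - 1)) & 1  # same bit of the cell index
                self.dir[i] = b1 if bit else b0


t = ExtendibleHash(bits=4, capacity=2)
for h in [0b0001, 0b1001, 0b1100, 0b1010]:
    t.insert(h)
print(t.gd, [[format(k, "04b") for k in b.keys] for b in t.dir])
# 2 [['0001'], ['0001'], ['1001', '1010'], ['1100']]
```

After inserting 1010 the directory has doubled once, and cells 00 and 01 still share the bucket holding 0001, which is the "multiple cells may point to a single bucket" case.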

7
Extendible Hashing Example: h(k) is 4 bits, 2 keys/bucket
  • Global depth (gd): i = 1
  • Apply the first gd bits of h(k) as the hash function
  • Cell 0 → bucket (local depth 1): 0001
  • Cell 1 → bucket (local depth 1): 1001, 1100
  • Insert 1010
8
Example: h(k) is 4 bits, 2 keys/bucket
  • i = 1
  • Cell 0 → bucket (local depth 1): 0001
  • Cell 1 → bucket (local depth 1): 1001, 1100
  • Insert 1010
9
Example: h(k) is 4 bits, 2 keys/bucket
  • i = 1
  • Cell 0 → bucket (local depth 1): 0001
  • Cell 1 → bucket (local depth 1): 1001, 1100
  • Insert 1010
10
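Example continued
  • i = 2, directory cells 00 01 10 11
  • Cells 00, 01 → bucket (local depth 1): 0001
  • Cell 10 → bucket (local depth 2): 1001, 1010
  • Cell 11 → bucket (local depth 2): 1100
  • Insert 0111, 0000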
11
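Example continued
  • i = 2, directory cells 00 01 10 11
  • Cells 00, 01 → bucket (local depth 1): 0001, 0111
  • Cell 10 → bucket (local depth 2): 1001, 1010
  • Cell 11 → bucket (local depth 2): 1100
  • Insert 0111, 0000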
12
Example continued
  • i = 2, directory cells 00 01 10 11
  • Cell 00 → bucket (local depth 2): 0000, 0001
  • Cell 01 → bucket (local depth 2): 0111
  • Insert 1001
13
Example continued
  • i = 2, directory cells 00 01 10 11
  • Cell 00 → bucket (local depth 2): 0000, 0001
  • Cell 01 → bucket (local depth 2): 0111
  • Insert 1001
14
Example continued
  • i = 2, directory cells 00 01 10 11
  • Cell 00 → bucket (local depth 2): 0000, 0001
  • Cell 01 → bucket (local depth 2): 0111
  • Insert 1001
15
Extendible hashing deletion
  • Reverse the insert procedure
  • Option 1: merge blocks and cut the directory if possible
  • Option 2: no merging of blocks

16
Deletion example
  • Run thru insert example in reverse!

17
Example 2: Least Significant Bits
  • Insert 20
18
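Insert 20: h(20) = 20 = 10100 (binary) → bucket pointed to by 00
(A) is full! Split A, double the directory.
  • Directory (global depth 2): cells 00, 01, 10, 11
  • Cell 00 → Bucket A (local depth 2): 4, 12, 32, 16
  • Cell 01 → Bucket B (local depth 2): 1, 5, 21, 13
  • Cell 10 → Bucket C (local depth 2): 10
  • Cell 11 → Bucket D (local depth 2): 15, 7, 19
  • The directory is doubled in size by simply copying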
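
The bit arithmetic behind this slide can be checked with a short sketch. It assumes, as the slide does, that h(k) = k and that the directory index is the lowest bits of the key; the helper name `low_bits` is hypothetical. The redistribution on three bits is what follows from that arithmetic once the global depth becomes 3.

```python
def low_bits(key: int, depth: int) -> int:
    """Directory index: the `depth` least significant bits of the key."""
    return key & ((1 << depth) - 1)


bucket_a = [4, 12, 32, 16]

# 20 is 10100 in binary; its last two bits are 00, so it targets bucket A.
print(format(20, "b"), format(low_bits(20, 2), "02b"))  # 10100 00

# After doubling, redistribute A's keys plus 20 on three bits.
groups = {}
for key in bucket_a + [20]:
    groups.setdefault(format(low_bits(key, 3), "03b"), []).append(key)
print(groups)  # {'100': [4, 12, 20], '000': [32, 16]}
```

With least significant bits, doubling the directory is a plain copy: cell xy of the old directory becomes cells 0xy and 1xy, and only the pair belonging to the split bucket (000 and 100 here) ends up pointing at different buckets.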
19
Extendible Hashing - Question
  • File size: 100 records
  • Bucket size: 5 records
  • The best-case (smallest possible) size of the
    directory?
  • The worst-case?
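
One way to reason about the question, offered as a sketch rather than the presenter's answer: the best case assumes every bucket is completely full, and the directory size must be a power of two that is at least the number of buckets. The worst case depends on the hash width, which the slide does not give, so `hash_bits` below is a hypothetical value.

```python
records, bucket_size = 100, 5

# Best case: all buckets full, so at least ceil(100 / 5) buckets are needed.
min_buckets = -(-records // bucket_size)     # ceiling division: 20
best_depth = (min_buckets - 1).bit_length()  # smallest d with 2**d >= 20
best_directory = 2 ** best_depth

# Worst case (assumption: a b-bit hash): if more than 5 records agree on
# their first b-1 bits, the directory keeps doubling up to 2**b cells.
hash_bits = 32                               # hypothetical hash width
worst_directory = 2 ** hash_bits

print(min_buckets, best_directory)  # 20 32
print(worst_directory)              # 4294967296
```

A directory of 16 cells can address at most 16 buckets, or 80 records, which is why 32 is the smallest size that can work under these assumptions.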

20
Points to Note
  • Does the order of insertion matter?
  • Bucket split → directory doubling
  • If local depth becomes > global depth
  • Worst case
  • Skewed hashing: multiple entries with the same hash
    values.
  • Directory grows very fast (exponentially).
  • What about a smoother (linear) growth?
  • → Linear Hashing

21
Linear Hashing
  • Another dynamic hashing scheme, an alternative to
    Extendible Hashing.
  • Motivation: Ext. Hashing uses a directory that
    grows by doubling. Can we do better? (smoother
    growth)
  • LH: split buckets from left to right, regardless
    of which one overflowed (simple, but it works!)
  • LH uses overflow pages (chaining approach)
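
A minimal sketch of the scheme described above, under stated assumptions: the class and attribute names are hypothetical, each bucket is modelled as a single Python list standing for its primary page plus overflow chain, and a split is triggered whenever an insert pushes a bucket past its capacity. The run uses N = 4 and 3 records per bucket, as in the example slides that follow.

```python
class LinearHash:
    def __init__(self, n: int = 4, capacity: int = 3):
        self.n0 = n               # initial number of buckets N
        self.level = 0            # round: h_level(x) = x mod (N * 2**level)
        self.next = 0             # next bucket to split, moving left to right
        self.capacity = capacity  # records per primary page
        self.buckets = [[] for _ in range(n)]  # list = page + overflow chain

    def _addr(self, x: int) -> int:
        m = self.n0 << self.level
        a = x % m
        if a < self.next:         # that bucket was already split this round
            a = x % (2 * m)
        return a

    def insert(self, x: int) -> None:
        b = self.buckets[self._addr(x)]
        b.append(x)
        if len(b) > self.capacity:  # any overflow triggers a split
            self._split()

    def _split(self) -> None:
        m = self.n0 << self.level
        old = self.buckets[self.next]
        # Rehash the bucket at `next` with the next-round function x mod 2m.
        self.buckets[self.next] = [k for k in old if k % (2 * m) == self.next]
        self.buckets.append([k for k in old if k % (2 * m) != self.next])
        self.next += 1
        if self.next == m:          # every bucket of this round has been split
            self.level += 1
            self.next = 0


lh = LinearHash(n=4, capacity=3)
for k in [4, 8, 5, 9, 13, 6, 7, 11]:
    lh.insert(k)
lh.insert(17)      # overflows bucket 1, but bucket 0 is the one that splits
print(lh.buckets)  # [[8], [5, 9, 13, 17], [6], [7, 11], [4]]
```

The overflowing bucket 1 keeps its chain until the split pointer reaches it; the file grows by one bucket per split instead of doubling.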

22
Linear Hashing Example
  • Initially h(x) = x mod N (N = 4 here)
  • Assume 3 records/bucket
  • Insert 17: 17 mod 4 = 1
  • Bucket 0: 4, 8
  • Bucket 1: 5, 9, 13
  • Bucket 2: 6
  • Bucket 3: 7, 11
23
Linear Hashing Example
  • Initially h(x) = x mod N (N = 4 here)
  • Assume 3 records/bucket
  • Insert 17: 17 mod 4 = 1
  • Bucket 0: 4, 8
  • Bucket 1: 5, 9, 13 (overflow for bucket 1)
  • Bucket 2: 6
  • Bucket 3: 7, 11
  • Split bucket 0, anyway!!
24
Linear Hashing Example
  • To split bucket 0, use another function h1(x)
  • h0(x) = x mod N, h1(x) = x mod (2N)
  • Bucket 0: 4, 8 (split)
  • Bucket 1: 5, 9, 13; overflow: 17
  • Bucket 2: 6
  • Bucket 3: 7, 11
25
Linear Hashing Example
  • To split bucket 0, use another function h1(x)
  • h0(x) = x mod N, h1(x) = x mod (2N)
  • Bucket 0: 8
  • Bucket 1: 5, 9, 13; overflow: 17
  • Bucket 2: 6
  • Bucket 3: 7, 11
  • Bucket 4: 4
26
Linear Hashing Example
  • h0(x) = x mod N, h1(x) = x mod (2N)
  • Insert 15 and 3 (both go to cell 3, which overflows)
  • Bucket 0: 8
  • Bucket 1: 5, 9, 13; overflow: 17
  • Bucket 2: 6
  • Bucket 3: 7, 11
  • Bucket 4: 4
27
Linear Hashing Example
  • h0(x) = x mod N, h1(x) = x mod (2N)
  • Bucket 0: 8
  • Bucket 1: 9, 17
  • Bucket 2: 6
  • Bucket 3: 7, 11, 15; overflow: 3
  • Bucket 4: 4
  • Bucket 5: 13, 5
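
As a check on this final layout, the lookup rule can be sketched on its own: use h0, and switch to h1 only for buckets that have already been split. The names `next_split` and `bucket_of` are hypothetical; the values N = 4 and two completed splits (buckets 0 and 1) are read off the slides.

```python
N = 4
next_split = 2  # buckets 0 and 1 have already been split


def bucket_of(x: int) -> int:
    a = x % N            # h0(x) = x mod N
    if a < next_split:   # bucket a was split, so its keys were rehashed
        a = x % (2 * N)  # h1(x) = x mod (2N)
    return a


layout = {}
for key in [8, 9, 17, 6, 7, 11, 15, 3, 4, 13, 5]:
    layout.setdefault(bucket_of(key), []).append(key)
print(sorted(layout.items()))
# [(0, [8]), (1, [9, 17]), (2, [6]), (3, [7, 11, 15, 3]), (4, [4]), (5, [13, 5])]
```

Bucket 3 holds four keys even though a page takes three; the fourth key (3) sits on an overflow page until the split pointer reaches bucket 3.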
28
Linear Hashing Deletion
  • Run thru insert example in reverse!