Quick Review of material covered Apr 8

Provided by: david227

Learn more at: http://www.cs.umd.edu

1
Quick Review of material covered Apr 8

2
B+-tree File Organization

B+-tree indices solve the problem of index file
degradation. The original data file will still
degrade under a stream of insert/delete
operations.
Solve data-file degradation by using a B+-tree
file organization
Leaf nodes in a B+-tree file organization store
records, not pointers into a separate original
data file
since records are larger than pointers, the
maximum number of records that can be stored in a
leaf node is less than the number of pointers in
a non-leaf node
leaf nodes must still be maintained at least half
full
insert and delete are handled in the same way as
insert and delete for entries in a B+-tree index
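
The size argument above can be checked with a little arithmetic. This is a minimal sketch; the block, key, pointer, and record sizes are assumed values chosen for illustration, not figures from the slides.

```python
# Assumed sizes in bytes (illustrative only, not from the slides).
BLOCK_SIZE = 4096
KEY_SIZE = 8
POINTER_SIZE = 8
RECORD_SIZE = 200


def nonleaf_pointer_count(block_size, key_size, pointer_size):
    """Largest n such that n pointers and n-1 keys fit in one block."""
    return (block_size + key_size) // (key_size + pointer_size)


def leaf_record_capacity(block_size, record_size, pointer_size):
    """Records per leaf, reserving one pointer for the next-leaf link."""
    return (block_size - pointer_size) // record_size


pointers_per_nonleaf = nonleaf_pointer_count(BLOCK_SIZE, KEY_SIZE, POINTER_SIZE)
records_per_leaf = leaf_record_capacity(BLOCK_SIZE, RECORD_SIZE, POINTER_SIZE)
# "At least half full" for a leaf, rounding up.
min_records_per_leaf = (records_per_leaf + 1) // 2

print(pointers_per_nonleaf, records_per_leaf, min_records_per_leaf)
```

With these assumed sizes a leaf holds far fewer records than a non-leaf node holds pointers, which is the asymmetry the slide describes.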

3
B+-tree File Organization Example

Records are much bigger than pointers, so good
space usage is important
To improve space usage, involve more sibling
nodes in redistribution during splits and merges
(to avoid split/merge when possible)
involving one sibling guarantees 50% space use
involving two guarantees at least 2/3 space use,
etc.
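
One way to read the "etc." is that spreading the entries of k full nodes over k + 1 nodes leaves each node at least k/(k + 1) full. That generalization is an assumption that matches the two cases on the slide; it also ignores rounding to whole entries. A minimal sketch:

```python
from fractions import Fraction


def guaranteed_fill(siblings_involved):
    """Minimum fill fraction when k full nodes are spread over k + 1 nodes.

    Assumed generalization of the slide's 1/2 and 2/3 cases.
    """
    if siblings_involved < 1:
        raise ValueError("at least one sibling must be involved")
    return Fraction(siblings_involved, siblings_involved + 1)


for k in (1, 2, 3):
    print(k, guaranteed_fill(k))
```

The guarantee improves with each extra sibling, but every redistribution then touches more nodes, so the gain has to be weighed against the extra I/O.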

4
B-tree Index Files

B-trees are similar to B+-trees, but search-key
values appear only once in the index (eliminates
redundant storage of key values)
search keys in non-leaf nodes don't appear in the
leaf nodes, so an additional pointer field for
each search key in a non-leaf node must be stored
to point to the bucket or record for that key
value
leaf nodes look like B+-tree leaf nodes
(P1, K1, P2, K2, ..., Pn)
non-leaf nodes look like so
(P1, B1, K1, P2, B2, K2, ..., Pn)
where the Bi are pointers to buckets or file
records.
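
The node layout above can be sketched as a small search routine. This is an illustration under assumptions, not the slides' implementation: the class and field names are hypothetical, keys are integers, and record pointers are modeled as strings. For simplicity a leaf uses the same record-pointer list as a non-leaf node and just has no children.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    keys: List[int]                 # K1, K2, ... in sorted order
    record_ptrs: List[str]          # Bi: one record/bucket pointer per key
    children: List["BTreeNode"] = field(default_factory=list)  # Pi; empty in a leaf


def btree_search(root: BTreeNode, key: int) -> Tuple[Optional[str], int]:
    """Return (record pointer or None, depth at which the search stopped)."""
    node, depth = root, 0
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.record_ptrs[i], depth   # may stop in a non-leaf node
        if not node.children:
            return None, depth                  # reached a leaf without a match
        node = node.children[i]                 # follow the child pointer Pi
        depth += 1


leaves = [
    BTreeNode([5, 10], ["r5", "r10"]),
    BTreeNode([25, 30], ["r25", "r30"]),
    BTreeNode([45, 50], ["r45", "r50"]),
]
root = BTreeNode([20, 40], ["r20", "r40"], leaves)

print(btree_search(root, 40))
print(btree_search(root, 25))
print(btree_search(root, 99))
```

A key stored in the root is found at depth 0, while keys stored only in leaves still require a full descent.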

5
B-tree Index File Example

6
B-tree Index Files (cont.)

Advantages of B-tree Indices (vs. B+-trees)
May use fewer tree nodes than a B+-tree on the
same data
Sometimes possible to find a specific key value
before reaching a leaf node
Disadvantages of B-tree Indices
Only a small fraction of key values are found
early
Non-leaf nodes are larger, so fanout is reduced,
and B-trees may be slightly taller than B+-trees
on the same data
Insertion and deletion are more complicated than
on B+-trees
Implementation is more difficult than B+-trees
In general, advantages don't outweigh
disadvantages
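
The fanout claim can be made concrete with a rough calculation. The sizes below are assumed for illustration, and the level count is a crude estimate that treats every node as full and ignores the keys a B-tree stores in its non-leaf nodes.

```python
# Assumed sizes in bytes (illustrative only).
BLOCK_SIZE = 4096
KEY_SIZE = 8
POINTER_SIZE = 8


def bplus_fanout(block, key, ptr):
    """B+-tree non-leaf node: n child pointers and n-1 keys."""
    return (block + key) // (key + ptr)


def btree_fanout(block, key, ptr):
    """B-tree non-leaf node: n child pointers, n-1 keys, n-1 record pointers."""
    return (block + key + ptr) // (key + 2 * ptr)


def levels_needed(targets, fanout):
    """Smallest number of levels L with fanout**L >= targets."""
    levels, reach = 0, 1
    while reach < targets:
        reach *= fanout
        levels += 1
    return levels


f_bplus = bplus_fanout(BLOCK_SIZE, KEY_SIZE, POINTER_SIZE)
f_btree = btree_fanout(BLOCK_SIZE, KEY_SIZE, POINTER_SIZE)
print(f_bplus, levels_needed(10_000_000, f_bplus))
print(f_btree, levels_needed(10_000_000, f_btree))
```

Under these assumptions the extra record pointer per key cuts the fanout by about a third, which is enough to add a level for ten million targets.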

7
Hashing

We've examined Ordered Indices (design based upon
sorting or ordering search key values); the other
major type of indexing technique is Hashing
Underlying concept is very simple
observation: small files don't require indices or
complicated search methods
use some clever method, based upon the search
key, to split a large file into a lot of little
buckets
each bucket is sufficiently small
use the same method to find the bucket for a
given search key

8
Hashing Basics

A bucket is a unit of storage containing one or
more records (typically a bucket is one disk
block in size)
In a hash file organization we find the bucket
for a record directly from its search-key value
using a hash function
A hash function is a function that maps from the
set of all search-key values K to the set of all
bucket addresses B
The hash function is used to locate records for
access, insertion, and deletion
Records with different search-key values may be
mapped to the same bucket
the entire bucket must be searched to find a
record
buckets are designed to be small, so this task is
usually not onerous
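
The points above can be sketched as a toy in-memory hash file. This is a minimal illustration under assumptions: the class name `HashFile` is hypothetical, keys are integers, the hash function is simply `key % num_buckets`, and each bucket is a Python list standing in for a disk block.

```python
class HashFile:
    """Toy hash file organization: a fixed array of buckets."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # Hash function h: K -> B (assumed here to be key mod bucket count).
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, record):
        self._bucket_for(key).append((key, record))

    def lookup(self, key):
        # Different keys can share a bucket, so the whole bucket is scanned.
        return [rec for k, rec in self._bucket_for(key) if k == key]

    def delete(self, key):
        bucket = self._bucket_for(key)
        kept = [(k, r) for k, r in bucket if k != key]
        removed = len(bucket) - len(kept)
        bucket[:] = kept
        return removed


f = HashFile(7)
f.insert(10, "Ann")
f.insert(17, "Bob")   # 17 % 7 == 10 % 7, so it shares a bucket with key 10
f.insert(4, "Cy")
print(f.lookup(17))
print(f.delete(10), f.lookup(10))
```

The same function locates the bucket for insertion, lookup, and deletion, so no separate index structure is consulted.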

9
Hashed File Example

10
Hash Functions

To search/insert/delete/modify a key k:
compute H(k) to get the bucket number
search sequentially in the bucket (heap
organization within each bucket)
Choosing H: almost any function that generates
random numbers in the range [0, B-1]
try to distribute the keys evenly into the B
buckets
one rule of thumb when using MOD: use a prime
number as the modulus
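
The prime-number rule of thumb can be seen with keys that share a common factor. In this sketch the key set (multiples of 10) and the bucket counts 10 and 11 are assumptions chosen to make the effect obvious.

```python
from collections import Counter


def bucket_counts(keys, num_buckets):
    """Number of keys landing in each bucket under H(k) = k mod num_buckets."""
    counts = Counter(k % num_buckets for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


keys = range(0, 1000, 10)   # 100 keys, all multiples of 10

counts_mod_10 = bucket_counts(keys, 10)   # modulus shares the factor 10
counts_mod_11 = bucket_counts(keys, 11)   # prime modulus

print(counts_mod_10)
print(counts_mod_11)
```

With 10 buckets every key collides in bucket 0, while with 11 buckets the same keys spread almost evenly, because a prime modulus shares no factor with the pattern in the keys.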

11
Hash Functions (2)

12
Hash Functions (3)

Ideal hash functions are uniform
each bucket is assigned the same number of
search-key values from the set of all possible
values
Ideal hash functions are random
each bucket has approximately the same number of
records assigned to it irrespective of the actual
distribution of search-key values in the file
Finding a good hash function is not always easy

13
Examples of Hash Functions

Given 26 buckets and a string-valued search key,
consider the following possible hash functions
Hash based upon the first letter of the string
Hash based upon the last letter of the string
Hash based upon the middle letter of the string
Hash based upon the most common letter in the
string
Hash based upon the average letter in the
string: the sum of the letters (using A=0, B=1,
etc.) divided by the number of letters
Hash based upon the length of the string (modulo
26)
Typical hash functions perform computation on the
internal binary representation of the search key
example: searching on a string value, hash based
upon the binary sum of the characters in the
string, modulo the number of buckets
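
A few of the candidates above can be compared on a small sample. This is a sketch under assumptions: the function names and the list of surnames are hypothetical, strings are taken to start with an ASCII letter, and "binary sum" is interpreted as the sum of the UTF-8 byte values.

```python
NUM_BUCKETS = 26


def hash_first_letter(s):
    """Bucket by first letter, using A=0, B=1, and so on."""
    return (ord(s[0].upper()) - ord("A")) % NUM_BUCKETS


def hash_length(s):
    """Bucket by string length, modulo the number of buckets."""
    return len(s) % NUM_BUCKETS


def hash_byte_sum(s, num_buckets=NUM_BUCKETS):
    """Bucket by the sum of the string's bytes, modulo the number of buckets."""
    return sum(s.encode("utf-8")) % num_buckets


# Hypothetical sample: surnames that all start with the same letter.
names = ["Smith", "Stone", "Sanders", "Shaw", "Sims", "Scott"]

for h in (hash_first_letter, hash_length, hash_byte_sum):
    used = {h(name) for name in names}
    print(h.__name__, "uses", len(used), "distinct bucket(s)")
```

On this skewed sample the first-letter hash puts every name in one bucket and the length hash uses only a few, while the byte-sum hash depends on every character and separates the names.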