Quick Review of material covered Apr 8 - PowerPoint PPT Presentation

Provided by: david227
Learn more at: http://www.cs.umd.edu

1
Quick Review of material covered Apr 8
  • B+-tree overview and some definitions
  • balanced tree
  • multi-level
  • reorganizes itself on insertion and deletion
  • built so each node fits on a single disk page
  • Examined mechanics of B+-tree insertion and
    deletion
  • looked at several examples
  • We'll finish up B+-trees with two more concepts:
  • B+-tree file organization
  • B-tree index files

2
B+-tree File Organization
  • B+-tree indices solve the problem of index file
    degradation. The original data file will still
    degrade under a stream of insert/delete
    operations.
  • Solve data-file degradation by using a B+-tree
    file organization
  • Leaf nodes in a B+-tree file organization store
    the records themselves, not pointers into a
    separate data file
  • since records are larger than pointers, the
    maximum number of records that can be stored in a
    leaf node is less than the number of pointers in
    a non-leaf node
  • leaf nodes must still be kept at least half
    full
  • insert and delete are handled in the same way as
    insert and delete for entries in a B+-tree index

3
B+-tree File Organization Example
  • Records are much bigger than pointers, so good
    space usage is important
  • To improve space usage, involve more sibling
    nodes in redistribution during splits and merges
    (to avoid a split/merge when possible)
  • involving one sibling guarantees 50% space use
  • involving two guarantees at least 2/3 space use,
    etc.

4
B-tree Index Files
  • B-trees are similar to B+-trees, but search-key
    values appear only once in the index (eliminates
    redundant storage of key values)
  • search keys in non-leaf nodes don't appear in the
    leaf nodes, so an additional pointer field for
    each search key in a non-leaf node must be stored
    to point to the bucket or record for that key
    value
  • leaf nodes look like B+-tree leaf nodes:
  • (P1, K1, P2, K2, ..., Pn)
  • non-leaf nodes look like so:
  • (P1, B1, K1, P2, B2, K2, ..., Pn)
  • where the Bi are pointers to buckets or file
    records.
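
The node layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names BTreeNode and search are hypothetical, one class is used for both leaf and non-leaf nodes as a simplification, and the Bi record pointers are modeled as plain Python values. It shows how a lookup can stop at a non-leaf node when the key is stored there.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class BTreeNode:
    """Hypothetical B-tree node: keys K_i, record pointers B_i, children P_i."""
    keys: List[int] = field(default_factory=list)
    records: List[Any] = field(default_factory=list)            # one B_i per key
    children: List["BTreeNode"] = field(default_factory=list)   # empty for a leaf


def search(node: Optional[BTreeNode], key: int) -> Optional[Any]:
    """Return the record stored for key, or None if the key is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i]      # found, possibly before reaching a leaf
        if not node.children:
            return None                 # reached a leaf without finding the key
        node = node.children[i]         # descend into the subtree that may hold key
    return None


# A tiny two-level tree: keys 20 and 40 live only in the root.
root = BTreeNode(
    keys=[20, 40],
    records=["r20", "r40"],
    children=[
        BTreeNode(keys=[5, 10], records=["r5", "r10"]),
        BTreeNode(keys=[25, 30], records=["r25", "r30"]),
        BTreeNode(keys=[50, 60], records=["r50", "r60"]),
    ],
)
```

Here search(root, 40) is answered at the root, while search(root, 25) has to descend to a leaf.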

5
B-tree Index File Example
  • [Figures: a B+-tree and a B-tree; images not
    reproduced in this transcript]

6
B-tree Index Files (cont.)
  • Advantages of B-tree indices (vs. B+-trees)
  • May use fewer tree nodes than a B+-tree on the
    same data
  • Sometimes possible to find a specific key value
    before reaching a leaf node
  • Disadvantages of B-tree indices
  • Only a small fraction of key values are found
    early
  • Non-leaf nodes are larger, so fanout is reduced,
    and B-trees may be slightly taller than B+-trees
    on the same data
  • Insertion and deletion are more complicated than
    on B+-trees
  • Implementation is more difficult than B+-trees
  • In general, advantages don't outweigh
    disadvantages

7
Hashing
  • We've examined ordered indices (design based upon
    sorting or ordering search-key values); the other
    major type of indexing technique is hashing
  • Underlying concept is very simple
  • observation: small files don't require indices or
    complicated search methods
  • use some clever method, based upon the search
    key, to split a large file into a lot of little
    buckets
  • each bucket is sufficiently small
  • use the same method to find the bucket for a
    given search key

8
Hashing Basics
  • A bucket is a unit of storage containing one or
    more records (typically a bucket is one disk
    block in size)
  • In a hash file organization we find the bucket
    for a record directly from its search-key value
    using a hash function
  • A hash function is a function that maps from the
    set of all search-key values K to the set of all
    bucket addresses B
  • The hash function is used to locate records for
    access, insertion, and deletion
  • Records with different search-key values may be
    mapped to the same bucket
  • the entire bucket must be searched to find a
    record
  • buckets are designed to be small, so this task is
    usually not onerous

9
Hashed File Example
  • So we:
  • divide the set of disk blocks that make up the
    file into buckets
  • devise a hash function that maps each key value
    into a bucket
  • V = set of key values
  • B = number of buckets
  • H = hashing function, H: V -> {0, 1, 2, 3, ..., B-1}
  • Example: V = 9-digit SS#, B = 1000, H = key modulo
    1000
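
As a small illustration of the example on this slide, the following sketch maps a 9-digit Social Security number to one of 1000 buckets. The function name bucket_for, the constant NUM_BUCKETS, and the acceptance of dashes in the input are assumptions made for the example, not part of the slides.

```python
NUM_BUCKETS = 1000  # B in the slide's notation


def bucket_for(ssn: str) -> int:
    """Hypothetical H: map a 9-digit SSN to a bucket number in 0..B-1."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected a 9-digit SSN")
    # Key modulo 1000 keeps only the last three digits of the key.
    return int(digits) % NUM_BUCKETS
```

With this choice, bucket_for("123-45-6789") is 789: the bucket is simply the last three digits of the key.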

10
Hash Functions
  • To search/insert/delete/modify a key k, do:
  • compute H(k) to get the bucket number
  • search sequentially in the bucket (heap
    organization within each bucket)
  • Choosing H: almost any function that generates
    random numbers in the range [0, B-1]
  • try to distribute the keys evenly into the B
    buckets
  • one rule of thumb when using MOD: use a prime
    number as the modulus
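
The procedure above can be sketched as a toy in-memory hash file. This is a minimal sketch, assuming integer keys, Python lists standing in for disk-block buckets, and a prime default of 101 buckets; the class name HashFile and its method names are hypothetical.

```python
class HashFile:
    """Toy hash file: each bucket is an unordered (heap) list of (key, record)."""

    def __init__(self, num_buckets: int = 101):  # 101 is prime (rule of thumb)
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def _h(self, key: int) -> int:
        """H(k): bucket number in the range 0..B-1."""
        return key % self.num_buckets

    def insert(self, key: int, record) -> None:
        # Heap organization: append wherever there is room in the bucket.
        self.buckets[self._h(key)].append((key, record))

    def search(self, key: int) -> list:
        # Sequential scan of the one bucket that H(k) selects.
        return [rec for k, rec in self.buckets[self._h(key)] if k == key]

    def delete(self, key: int) -> int:
        """Remove all records with this key; return how many were removed."""
        b = self._h(key)
        before = len(self.buckets[b])
        self.buckets[b] = [(k, r) for k, r in self.buckets[b] if k != key]
        return before - len(self.buckets[b])
```

Every operation touches exactly one bucket, which is why keeping buckets small matters more than anything else in this design.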

11
Hash Functions (2)
  • A collision occurs when two or more key values go
    to the same bucket
  • too many collisions increase search time and
    degrade performance
  • no or few collisions means that each bucket has
    only one (or very few) key(s)
  • Worst-case hash functions map all search keys to
    the same bucket

12
Hash Functions (3)
  • Ideal hash functions are uniform
  • each bucket is assigned the same number of
    search-key values from the set of all possible
    values
  • Ideal hash functions are random
  • each bucket has approximately the same number of
    records assigned to it irrespective of the actual
    distribution of search-key values in the file
  • Finding a good hash function is not always easy
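
To make the point about key distributions concrete, here is a small experiment, offered as a sketch and not as a benchmark. It assumes a deliberately patterned key set (multiples of 10) and compares key modulo 100 with key modulo 101; the helper name bucket_loads is hypothetical.

```python
from collections import Counter


def bucket_loads(keys, num_buckets: int) -> Counter:
    """Count how many keys land in each bucket under H(k) = k mod num_buckets."""
    return Counter(k % num_buckets for k in keys)


# 1000 patterned keys: 0, 10, 20, ..., 9990.
keys = range(0, 10000, 10)

loads_100 = bucket_loads(keys, 100)  # non-prime modulus sharing a factor with the keys
loads_101 = bucket_loads(keys, 101)  # prime modulus

print(len(loads_100), max(loads_100.values()))
print(len(loads_101), max(loads_101.values()))
```

With modulus 100 the keys pile into only 10 buckets of 100 keys each, whereas with modulus 101 all 101 buckets are used and none holds more than 10 keys. This is the kind of skew the prime-number rule of thumb is meant to avoid.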

13
Examples of Hash Functions
  • Given 26 buckets and a string-valued search key,
    consider the following possible hash functions
  • Hash based upon the first letter of the string
  • Hash based upon the last letter of the string
  • Hash based upon the middle letter of the string
  • Hash based upon the most common letter in the
    string
  • Hash based upon the average letter in the
    string: the sum of the letters (using A=0, B=1,
    etc.) divided by the number of letters
  • Hash based upon the length of the string (modulo
    26)
  • Typical hash functions perform computation on the
    internal binary representation of the search key
  • example: when searching on a string value, hash
    based upon the binary sum of the characters in
    the string, modulo the number of buckets
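
A few of the candidates above can be written out directly. This is a minimal sketch, assuming non-empty strings and 26 buckets; the function names are hypothetical, and "binary sum of the characters" is interpreted here as the sum of the UTF-8 byte values, which is an assumption.

```python
NUM_BUCKETS = 26


def h_first_letter(s: str) -> int:
    """Hash on the first letter (A=0, B=1, ...); assumes s is non-empty."""
    return (ord(s[0].upper()) - ord("A")) % NUM_BUCKETS


def h_length(s: str) -> int:
    """Hash on the length of the string, modulo the number of buckets."""
    return len(s) % NUM_BUCKETS


def h_char_sum(s: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash on the sum of the bytes of the string, modulo the bucket count."""
    return sum(s.encode("utf-8")) % num_buckets
```

None of these is ideal: first-letter hashing inherits the skew of real names, length hashing leaves most buckets unused for short strings, and the character sum sends all anagrams of a string to the same bucket.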