Hash Tables - PowerPoint PPT Presentation

About This Presentation
Title:

Hash Tables

Slides: 19
Provided by: ellenw4
Learn more at: http://cs.hiram.edu

Transcript and Presenter's Notes


1
Hash Tables
  • Ellen Walker
  • CPSC 201 Data Structures
  • Hiram College

2
Breaking the Rules
  • If you can only compare two items at a time,
    the fastest possible search algorithm is
    O(log n), where n is the number of items in the
    table.
  • But, if we can figure out a way to compare
    multiple items at once, we can beat that!

3
Magic Address Calculator
  • Represent your table as an array
  • Add a new function, the magic address
    calculator
  • The input to this function is the key
  • The output of this function is the address to
    look in
  • No comparisons, so we're not limited to O(log n).
  • In fact, if the calculator takes the same time
    for every input, it's a constant-time search!

4
Hash Function
  • The magic calculator function is called a hash
    function
  • It treats the key as a sequence of bits or an
    integer, regardless of its original type
  • Example hash functions (not very good ones)
  • Last two digits of the integer (table size 100)
  • Divide the bit string into sequences of 8 bits
    and XOR all sequences together (table size 256)
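A minimal Java sketch of the two example hash functions above. The class and method names are hypothetical; keys are assumed to be Java ints (or raw bytes), and the 8-bit XOR fold treats an int as its four bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class holding the two (weak) example hash functions.
class SimpleHashes {
    // "Last two digits" hash: the result is always in 0..99, so the table size is 100.
    static int lastTwoDigits(int key) {
        return Math.floorMod(key, 100);
    }

    // XOR fold of a 32-bit int: split it into four 8-bit chunks and XOR them together.
    // The result is always in 0..255, so the table size is 256.
    static int xorFold8(int key) {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            result ^= (key >>> shift) & 0xFF;
        }
        return result;
    }

    // The same fold applied to any key viewed as raw bytes, regardless of its original type.
    static int xorFold8(byte[] bits) {
        int result = 0;
        for (byte b : bits) {
            result ^= b & 0xFF;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lastTwoDigits(1234));                              // 34
        System.out.println(xorFold8(0x01020304));                             // 1^2^3^4 = 4
        System.out.println(xorFold8("AB".getBytes(StandardCharsets.UTF_8)));  // 0x41^0x42 = 3
    }
}
```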

5
Hash Table
  • Table is an array of size TableSize; hash(key) is
    a function that returns a value from 0 to
    TableSize - 1.
  • To insert:
  • Table[hash(key)] = key
  • To retrieve:
  • Result = Table[hash(key)]
  • To delete:
  • Table[hash(key)] = empty marker
  • Can it really be that simple?
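Taken literally, the three operations fit in a few lines of Java. This is a sketch under assumptions not stated on the slide: int keys, the "last two digits" hash (TableSize = 100), and null as the empty marker; the class name NaiveTable is hypothetical.

```java
// Hypothetical sketch of the naive table: one slot per address, no collision handling.
class NaiveTable {
    static final int TABLE_SIZE = 100;
    private final Integer[] table = new Integer[TABLE_SIZE];

    // The "magic address calculator": always returns 0..TABLE_SIZE - 1.
    static int hash(int key) {
        return Math.floorMod(key, TABLE_SIZE);
    }

    void insert(int key) {
        table[hash(key)] = key;          // Table[hash(key)] = key
    }

    Integer retrieve(int key) {
        return table[hash(key)];         // Result = Table[hash(key)]; not checked against key
    }

    void delete(int key) {
        table[hash(key)] = null;         // Table[hash(key)] = empty marker (null here)
    }

    public static void main(String[] args) {
        NaiveTable t = new NaiveTable();
        t.insert(42);
        System.out.println(t.retrieve(42));   // 42
        t.delete(42);
        System.out.println(t.retrieve(42));   // null
    }
}
```

Note that retrieve returns whatever sits at the address without checking that it matches the key, which is exactly where the simple version breaks down.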

6
Hash Table Collisions
  • If the size of the table is smaller than the
    number of possible keys, then there must be at
    least two keys with the same hash value.
  • E.g., 202 and 102 both hash to 2 if the hash
    value is the last 2 digits of the key
  • If we want to insert both values, we will get a
    collision
  • The item we retrieve might not really have a
    matching key
  • The location to insert into might already be full
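A small Java sketch of the 102/202 collision, assuming the naive table with the "last two digits" hash and no collision handling. The class and method names are hypothetical.

```java
// Hypothetical demo: the naive table with the "last two digits" hash (table size 100).
class CollisionDemo {
    static final int TABLE_SIZE = 100;

    static int hash(int key) {
        return Math.floorMod(key, TABLE_SIZE);   // last two digits
    }

    // Stores each key at Table[hash(key)] with no collision handling at all.
    static Integer[] insertAll(int... keys) {
        Integer[] table = new Integer[TABLE_SIZE];
        for (int key : keys) {
            table[hash(key)] = key;              // silently overwrites whatever was there
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(hash(102) + " " + hash(202));  // 2 2: same address
        Integer[] table = insertAll(102, 202);
        System.out.println(table[hash(102)]);             // 202: the lookup for 102 finds the wrong key
    }
}
```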

7
Avoiding Collisions
  • Make the table big (if you can afford it)
  • Pick the right hash function
  • If you know all possible keys, create a perfect
    hash function (unique value for each possible
    key)
  • Try to distribute all possible keys evenly among
    the addresses
  • Try to distribute the most likely keys evenly
    among the addresses
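One simple way to get a perfect hash when every key is known in advance is a brute-force search for the smallest table size m at which key % m is unique for all keys. This is a sketch of that one technique only (not the only way to build a perfect hash function); it assumes distinct int keys, and the names are hypothetical. The keys used are the running example from later in this deck.

```java
import java.util.Arrays;

// Hypothetical sketch: brute-force search for a perfect "key mod m" hash over a known key set.
class PerfectModHash {
    // Smallest table size m >= number of keys such that key % m differs for every key.
    static int smallestPerfectSize(int[] keys) {
        if (Arrays.stream(keys).distinct().count() != keys.length) {
            throw new IllegalArgumentException("keys must be distinct");
        }
        for (int m = Math.max(1, keys.length); ; m++) {
            boolean[] used = new boolean[m];
            boolean unique = true;
            for (int key : keys) {
                int slot = Math.floorMod(key, m);
                if (used[slot]) {
                    unique = false;      // two known keys collide at this size
                    break;
                }
                used[slot] = true;
            }
            if (unique) {
                return m;                // every key gets its own address: perfect for this set
            }
        }
    }

    public static void main(String[] args) {
        int[] keys = {1, 14, 12, 2, 3, 41, 27, 15};
        System.out.println(smallestPerfectSize(keys));   // 16
    }
}
```

For these eight keys every size from 8 to 15 has at least one collision, so the search stops at 16.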

8
Choosing a Hash Function
  • Should return integers in a fixed range
  • Should be quick to compute
  • Should avoid obvious patterns of results
  • Should involve the entire search key
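To illustrate "involve the entire search key", here is a sketch comparing a string hash that looks only at the first character with one that folds in every character using Horner's rule and multiplier 31 (the same multiplier Java's String.hashCode uses). The table size 101 and all names are assumptions for illustration.

```java
// Hypothetical comparison: a hash that ignores most of the key vs. one that uses all of it.
class StringHashes {
    // Poor: only the first character matters, so "cat", "car", and "cab" all collide.
    static int firstCharHash(String key, int tableSize) {
        return key.isEmpty() ? 0 : key.charAt(0) % tableSize;
    }

    // Better: every character contributes (Horner's rule, multiplier 31), reduced
    // mod tableSize at each step. Assumes 31 * tableSize + 65535 fits in an int.
    static int wholeKeyHash(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (31 * h + key.charAt(i)) % tableSize;
        }
        return h;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"cat", "car", "cab"}) {
            System.out.println(word + ": first=" + firstCharHash(word, 101)
                    + " whole=" + wholeKeyHash(word, 101));
        }
    }
}
```

With table size 101, all three words get address 99 from firstCharHash, but 90, 88, and 72 from wholeKeyHash.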

9
Typical Hash Functions
  • Taking an integer modulo a prime number
  • Prime number has only 1 and itself as factors
  • This avoids patterns of addresses
  • Easiest to analyze and most common
  • Folding (integer or bits)
  • Divide value into subgroups (k bits or digits)
  • Add or XOR together subgroups
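A sketch of both typical hash functions in Java. The example keys, the table sizes 10, 11, and 101, and the group size of 3 digits are illustrative assumptions; the names are hypothetical.

```java
// Hypothetical sketch of the two typical hash functions: modulo a prime, and digit folding.
class ModAndFold {
    // Division method: key mod tableSize (ideally a prime), always in 0..tableSize-1.
    static int modHash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Folding: split a non-negative key into groups of k decimal digits (from the right)
    // and add the groups together.
    static long foldDigits(long key, int k) {
        long base = 1;
        for (int i = 0; i < k; i++) {
            base *= 10;                  // base = 10^k
        }
        long sum = 0;
        for (long rest = key; rest > 0; rest /= base) {
            sum += rest % base;          // next group of k digits
        }
        return sum;
    }

    public static void main(String[] args) {
        // Patterned keys (multiples of 10): the non-prime size 10 sends them all to slot 0,
        // while the prime size 11 spreads them out (10, 9, 8, 7, 6).
        for (int key = 10; key <= 50; key += 10) {
            System.out.println(key + " -> mod 10: " + modHash(key, 10) + ", mod 11: " + modHash(key, 11));
        }
        // 123 + 456 + 789 = 1368, then reduce mod a prime table size.
        System.out.println(foldDigits(123456789L, 3) % 101);   // 55
    }
}
```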

10
Resolving Collisions by Open Addressing
  • Find another place within the table for the item
  • Linear probing: the new item goes in the first
    empty space after the result of the hash function
    (offsets are the sequence 1, 2, 3, ...)
  • Quadratic probing: first look in the next space,
    then skip to the 4th space, then the 9th, then
    the 16th, etc. (offsets are the sequence of
    squares 1, 4, 9, 16, ...)
  • Double hashing: use a second hash function on the
    key to find the offset (offsets are multiples of
    the second hash value)
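The three schemes differ only in the offset added to the home address on each probe. This sketch lists the first few addresses for each, assuming table size 11, home address 3, and a second hash value (step) of 3; these numbers and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: the addresses each open-addressing scheme probes, given its offset rule.
class ProbeSequences {
    // Address of probe i (i = 0 is the home slot) is (home + offset(i)) % size.
    static int[] probes(int home, int size, int count, IntUnaryOperator offset) {
        int[] addresses = new int[count];
        for (int i = 0; i < count; i++) {
            addresses[i] = (home + offset.applyAsInt(i)) % size;
        }
        return addresses;
    }

    public static void main(String[] args) {
        int size = 11, home = 3, step = 3;   // step stands in for the second hash value
        System.out.println("linear:    " + Arrays.toString(probes(home, size, 5, i -> i)));
        System.out.println("quadratic: " + Arrays.toString(probes(home, size, 5, i -> i * i)));
        System.out.println("double:    " + Arrays.toString(probes(home, size, 5, i -> i * step)));
    }
}
```

This prints [3, 4, 5, 6, 7] for linear probing, [3, 4, 7, 1, 8] for quadratic probing, and [3, 6, 9, 1, 4] for double hashing; the modulo makes each sequence wrap around the end of the table.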

11
Insertion with Open Addressing
  • void insert(E item)
  •   int address = hash(item);
  •   while (table[address] != null)
  •     compute next offset
  •     address = (hash(item) + offset) % TableSize;
  •   table[address] = item;
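A runnable Java version of the insertion loop, choosing linear probing (offsets 1, 2, 3, ...) for "compute next offset". The hash (hashCode() reduced mod the table size), the "table is full" exception, and the class name are assumptions, not from the slide.

```java
// Hypothetical sketch of the insert loop, using linear probing.
class OpenAddressingInsert<E> {
    private final Object[] table;

    OpenAddressingInsert(int tableSize) {
        table = new Object[tableSize];
    }

    // Assumed hash: the item's hashCode() reduced to 0..tableSize-1.
    int hash(E item) {
        return Math.floorMod(item.hashCode(), table.length);
    }

    void insert(E item) {
        int home = hash(item);
        int address = home;
        for (int offset = 1; table[address] != null; offset++) {  // slot taken: try the next offset
            if (offset == table.length) {
                throw new IllegalStateException("table is full");  // every slot probed
            }
            address = (home + offset) % table.length;              // wrap around the end
        }
        table[address] = item;
    }

    Object slot(int index) {
        return table[index];
    }

    public static void main(String[] args) {
        OpenAddressingInsert<Integer> t = new OpenAddressingInsert<>(11);
        t.insert(1);
        t.insert(12);   // 12 % 11 = 1 is taken, so it goes to slot 2
        t.insert(23);   // 23 % 11 = 1; slots 1 and 2 are taken, so slot 3
        System.out.println(t.slot(1) + " " + t.slot(2) + " " + t.slot(3));   // 1 12 23
    }
}
```

The bound on offset turns a full table into an error instead of an endless loop.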

12
Retrieval with Open Addressing
  • E retrieve(E item)
  •   int address = hash(item);
  •   while (table[address] != null &&
  •          !table[address].equals(item))
  •     compute next offset
  •     address = (hash(item) + offset) % TableSize;
  •   return table[address]; // returns null if not found
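A runnable Java version of the retrieval loop, again assuming linear probing and a hashCode()-based hash; the names are hypothetical. The empty-slot test comes first so that equals() is never called on null, which would throw a NullPointerException.

```java
// Hypothetical sketch of the retrieval loop, using linear probing to match insertion.
class OpenAddressingRetrieve {
    static <E> E retrieve(E[] table, E item) {
        int home = Math.floorMod(item.hashCode(), table.length);   // assumed hash
        int address = home;
        // Check for an empty slot first, so equals() is never called on null.
        for (int offset = 1; table[address] != null && !table[address].equals(item); offset++) {
            if (offset == table.length) {
                return null;                                         // probed every slot
            }
            address = (home + offset) % table.length;
        }
        return table[address];                                       // null if not found
    }

    public static void main(String[] args) {
        // Table as left by inserting 1, 12, 23 with linear probing (all hash to 1 mod 11).
        Integer[] table = new Integer[11];
        table[1] = 1;
        table[2] = 12;
        table[3] = 23;
        System.out.println(retrieve(table, 23));   // 23, found after probing slots 1, 2, 3
        System.out.println(retrieve(table, 34));   // null: 34 % 11 = 1, stops at empty slot 4
    }
}
```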

13
Issues with Open Addressing
  • Retrieval must follow same sequence of probes as
    insertion
  • If an item placed by collision resolution fills a
    cell, then any later item that hashes directly to
    that cell will also collide.
  • Consider
  • Hash(key) = key % 11
  • Sequence of items 1, 14, 12, 2, 3, 41, 27, 15
  • Try linear probing, quadratic probing, and double
    hashing with key % 7
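One way to check answers to this exercise is a short Java program. It assumes a table of size 11 and offsets measured from the home address (i for linear, i*i for quadratic, i*(key % 7) for double hashing); the probe limit and names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch of the exercise: hash(key) = key % 11, table size 11.
class ProbeDemo {
    static final int SIZE = 11;

    // offset.applyAsInt(key, i) is the distance from the home slot on probe i (i >= 1).
    static Integer[] build(int[] keys, IntBinaryOperator offset) {
        Integer[] table = new Integer[SIZE];
        for (int key : keys) {
            int home = key % SIZE;
            int address = home;
            for (int i = 1; table[address] != null; i++) {
                if (i == SIZE) {
                    // Also catches a zero step (key % 7 == 0), which would otherwise loop forever.
                    throw new IllegalStateException("no free slot found for " + key);
                }
                address = (home + offset.applyAsInt(key, i)) % SIZE;
            }
            table[address] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {1, 14, 12, 2, 3, 41, 27, 15};
        System.out.println("linear:    " + Arrays.toString(build(keys, (k, i) -> i)));
        System.out.println("quadratic: " + Arrays.toString(build(keys, (k, i) -> i * i)));
        System.out.println("double:    " + Arrays.toString(build(keys, (k, i) -> i * (k % 7))));
    }
}
```

Expected final tables (index 0 to 10):

    linear:    [null, 1, 12, 14, 2, 3, 27, 15, 41, null, null]
    quadratic: [null, 1, 12, 14, 3, 27, 2, null, 41, 15, null]
    double:    [null, 1, 2, 14, 15, 27, 12, null, 41, 3, null]

Under linear probing, 15 needs four probes (slots 4, 5, 6, 7) because it runs into the cluster built by earlier collisions. The only key with key % 7 == 0 (14) lands in its home slot, so the zero step never comes up here.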

14
Comparing Open Addressing Schemes
  • Linear probing is most prone to clustering
  • Large clumps of cells fill, causing long
    sequences of probes for each insertion
  • Quadratic probing is less prone to clustering
  • Each successive probe lands farther from the cluster
  • No guarantee every slot will be searched, though!
  • Double hashing depends on the second hash function
  • Its base should be relatively prime to the
    original base (the table size), so the probe
    sequence has no pattern
  • In that case, it is as good as or better than
    quadratic probing
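A quick Java check of which slots a probe sequence can ever reach. It illustrates the warning that quadratic probing may not search every slot, and a closely related condition for double hashing: the step should be relatively prime to the table size. The sizes 11 and 12 and the step 3 are assumed examples.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntUnaryOperator;

// Hypothetical check of which slots a probe sequence can reach from home slot 0.
class ProbeCoverage {
    static Set<Integer> reachable(int size, IntUnaryOperator offset) {
        Set<Integer> slots = new TreeSet<>();
        // Offsets i, i*i, and i*step all repeat (mod size) after size probes,
        // so i = 0..size-1 already covers every reachable slot.
        for (int i = 0; i < size; i++) {
            slots.add(Math.floorMod(offset.applyAsInt(i), size));
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println("linear, size 11:          " + reachable(11, i -> i));
        System.out.println("quadratic, size 11:       " + reachable(11, i -> i * i));
        System.out.println("double (step 3), size 11: " + reachable(11, i -> 3 * i));
        System.out.println("double (step 3), size 12: " + reachable(12, i -> 3 * i));
    }
}
```

Linear probing and a step of 3 in a table of 11 both reach all 11 slots. Quadratic probing in a table of 11 reaches only [0, 1, 3, 4, 5, 9], and a step of 3 in a table of 12 (3 and 12 share a factor) reaches only [0, 3, 6, 9].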

15
Restructuring the Hash Table
  • Each address can contain multiple items
  • Bucket (set max items per hash key)
  • Separate chaining (array of linked lists)
  • Our example again
  • Hash(key) = key % 11
  • Sequence of items 1,14,12,2,3,41,27,15

16
Bucket: Multiple Cells per Hash Value
  • (Figure: each address holds a row of cells)
  • Address 0: data with hash value 0 | another data
    with hash value 0 | third data with hash value 0
  • Address 1: first data with hash value 1 | (etc.)
  • Address 2: (etc.)
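A sketch of a bucket table in Java, run on the deck's example (hash(key) = key % 11). The capacity of 3 cells per address mirrors the three cells in the figure, and treating a full bucket as an error is an assumption; the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical bucket table: hash(key) = key % 11, and each address holds at most 3 items.
class BucketTable {
    static final int SIZE = 11;
    static final int BUCKET_CAPACITY = 3;
    final Integer[][] buckets = new Integer[SIZE][BUCKET_CAPACITY];

    void insert(int key) {
        Integer[] bucket = buckets[Math.floorMod(key, SIZE)];
        for (int i = 0; i < BUCKET_CAPACITY; i++) {
            if (bucket[i] == null) {             // first free cell in this bucket
                bucket[i] = key;
                return;
            }
        }
        throw new IllegalStateException("bucket " + Math.floorMod(key, SIZE) + " is full");
    }

    boolean contains(int key) {
        for (Integer item : buckets[Math.floorMod(key, SIZE)]) {
            if (item != null && item == key) {   // only this one bucket is searched
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BucketTable t = new BucketTable();
        for (int key : new int[] {1, 14, 12, 2, 3, 41, 27, 15}) {
            t.insert(key);
        }
        System.out.println(Arrays.toString(t.buckets[1]));          // [1, 12, null]
        System.out.println(Arrays.toString(t.buckets[3]));          // [14, 3, null]
        System.out.println(t.contains(12) + " " + t.contains(23));  // true false
    }
}
```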


17
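Separate Chaining
  • Hash table as array of linked lists
  • (Figure: an array with addresses 0 to 4; slots 1
    and 3 hold null, while slots 0, 2, and 4 each
    point to a linked list of items)
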
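A separate-chaining sketch in Java on the same example (hash(key) = key % 11). As in the figure, a null slot stands for an empty list; the lists are java.util.LinkedList and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedList;

// Hypothetical separate-chaining table: hash(key) = key % 11; a null slot means an empty chain.
class ChainedTable {
    static final int SIZE = 11;

    @SuppressWarnings("unchecked")
    final LinkedList<Integer>[] table = (LinkedList<Integer>[]) new LinkedList[SIZE];

    void insert(int key) {
        int address = Math.floorMod(key, SIZE);
        if (table[address] == null) {
            table[address] = new LinkedList<>();   // first item with this hash value
        }
        table[address].addLast(key);               // colliding keys share the chain
    }

    boolean contains(int key) {
        LinkedList<Integer> chain = table[Math.floorMod(key, SIZE)];
        return chain != null && chain.contains(key);
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable();
        for (int key : new int[] {1, 14, 12, 2, 3, 41, 27, 15}) {
            t.insert(key);
        }
        System.out.println(Arrays.toString(t.table));
    }
}
```

This prints [null, [1, 12], [2], [14, 3], [15], [27], null, null, [41], null, null]: the two collisions (12 with 1, 3 with 14) just lengthen a list instead of taking another slot.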
18
Growing a Hash Table
  • Open addressing
  • When the hash table is full, allocate a bigger
    one.
  • Rehashing: add each element from the original
    table to the new one, using the hash function
    for the new table size.
  • Chaining
  • When the lists are getting too long, allocate a
    bigger table
  • Rehash as above.
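A sketch of growing an open-addressing table in Java. It assumes int keys, hash = key mod table size, linear probing, and a new size equal to the first prime at least twice the old size (in keeping with the "modulo a prime" advice); the names are hypothetical. It grows only when full, as the slide says; many real implementations grow earlier, for example java.util.HashMap resizes once its size exceeds capacity times its load factor (0.75 by default).

```java
// Hypothetical open-addressing table that grows when full.
class GrowingTable {
    private Integer[] table = new Integer[5];
    private int count = 0;

    void insert(int key) {
        if (count == table.length) {
            rehash(nextPrime(2 * table.length));   // full: allocate a bigger table
        }
        place(table, key);
        count++;
    }

    // Linear probing into target; the caller guarantees a free slot exists.
    private static void place(Integer[] target, int key) {
        int address = Math.floorMod(key, target.length);
        while (target[address] != null) {
            address = (address + 1) % target.length;
        }
        target[address] = key;
    }

    // Rehashing: each key is re-inserted with the hash for the NEW size, not copied to its old index.
    private void rehash(int newSize) {
        Integer[] old = table;
        table = new Integer[newSize];
        for (Integer key : old) {
            if (key != null) {
                place(table, key);
            }
        }
    }

    static int nextPrime(int n) {
        for (int p = Math.max(2, n); ; p++) {
            boolean prime = true;
            for (int d = 2; d * d <= p; d++) {
                if (p % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                return p;
            }
        }
    }

    int capacity() { return table.length; }

    Integer slot(int index) { return table[index]; }

    public static void main(String[] args) {
        GrowingTable t = new GrowingTable();
        for (int key = 1; key <= 5; key++) t.insert(key);
        System.out.println(t.capacity() + " " + t.slot(0));   // 5 5  (5 % 5 == 0)
        t.insert(6);                                           // table was full: grow to 11 and rehash
        System.out.println(t.capacity() + " " + t.slot(5));   // 11 5 (5 % 11 == 5)
    }
}
```

Key 5 moves from slot 0 to slot 5 during the rehash, which is why elements cannot simply be copied to the same index. For chaining, the same rehash loop applies; only the trigger ("lists getting too long") differs.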