Hash Tables - PowerPoint PPT Presentation

About This Presentation
Title:

Hash Tables

Description:

IKI 10100: Data Structures & Algorithms. Ruli ...

Slides: 34
Provided by: dn5

Transcript and Presenter's Notes

Title: Hash Tables


1
Hash Tables
2
Review
  • Linked List
  • insert, find, delete operations take O(n)
  • Stack, Queue
  • insert, find, delete operations take O(1)
  • but the access is restricted
  • Binary Search Tree
  • insert, find, delete operations take O(log n) on
    average, but O(n) in the worst case
  • AVL Tree, Red-Black Tree
  • insert, find, delete operations take O(log n)

3
Review
  • Array
  • all operations take O(1) time
  • data accessed using index (integer)
  • size should be determined first
  • not growable

4
Outline
  • Hashing
  • Definition
  • Hash function
  • Collision resolution
  • Open hashing
  • Separate chaining
  • Closed hashing (Open addressing)
  • Linear probing
  • Quadratic probing
  • Double hashing
  • Primary Clustering, Secondary Clustering
  • Access: insert, find, delete

5
Hash Tables
  • Hashing is used for storing relatively large
    amounts of data in a table ADT called a hash
    table.
  • The hash table usually has a fixed size, H-size,
    which is larger than the amount of data that we
    want to store.
  • We define the load factor (λ) to be the ratio of
    the number of stored items to the size of the
    hash table.
  • A hash function maps an item to an index in the
    range 0 to H-size-1.

6
Hash Tables (2)
  • Hashing is a technique used to perform
    insertions, deletions, and finds in constant
    average time.
  • To insert or find an element, we assign it a key
    and use a function, called the hash function, to
    determine the element's location within the
    table.
  • Hash tables are fixed-size arrays of cells
    containing data or keys corresponding to data.
  • The hash function maps each key to some number
    in the range 0 to H-size-1.

7
Hash Function
  • A hash function should have the following
    features
  • Easy to compute.
  • Two distinct keys map to two different cells in
    the array (not true in general - why?).
  • This can be achieved with a direct-address
    table when the universal set of keys is
    reasonably small.
  • Distributes the keys evenly among the cells.
  • One simple hash function is the mod function
    with a prime number.
  • Any manipulation of digits with low complexity
    and good distribution can be used.
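
A minimal sketch of the "mod with a prime number" idea, assuming integer keys. The class name and the table size of 11 are illustrative choices, not taken from the slides.

```java
// Sketch only: ModHash and TABLE_SIZE are hypothetical names for illustration.
class ModHash {
    static final int TABLE_SIZE = 11; // a prime, chosen arbitrarily

    // Maps any int key to an index in 0 .. TABLE_SIZE-1.
    // Math.floorMod keeps the result non-negative even for negative keys.
    static int hash(int key) {
        return Math.floorMod(key, TABLE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(hash(25)); // 3
        System.out.println(hash(36)); // 3, a collision with 25
        System.out.println(hash(-1)); // 10
    }
}
```

Keys 25 and 36 land in the same cell, which is why distinct keys cannot in general be guaranteed distinct cells.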

8
Hash Function Truncation
  • Part of the key is simply ignored, with the
    remainder truncated or concatenated to form the
    index.
  • Phone no.    Index
  • 731-3018     338
  • 539-2309     329
  • 428-1397     217
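
The three examples above are consistent with keeping the 2nd, 4th and 7th digits and ignoring the rest. The sketch below assumes that rule (it is inferred from the numbers, not stated in the slides); the class name is hypothetical.

```java
// Sketch only: digit positions are inferred from the three examples.
class TruncationHash {
    // Keeps the 2nd, 4th and 7th digits of a 7-digit phone number.
    static int hash(String phone) {
        String d = phone.replace("-", "");
        return Integer.parseInt("" + d.charAt(1) + d.charAt(3) + d.charAt(6));
    }

    public static void main(String[] args) {
        System.out.println(hash("731-3018")); // 338
        System.out.println(hash("539-2309")); // 329
        System.out.println(hash("428-1397")); // 217
    }
}
```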

9
Hash Function Folding
  • The data can be split up into smaller chunks
    which are then folded together in some form.
  • Phone no.    3 groups         Index
  • 7313018      73 + 13 + 018    104
  • 5392309      53 + 92 + 309    454
  • 4281397      42 + 81 + 397    520
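
The index values above match splitting each number into groups of 2, 2 and 3 digits and adding the groups. A small sketch under that assumption (the grouping is inferred from the results; the class name is hypothetical):

```java
// Sketch only: the 2-2-3 grouping is inferred from the example indexes.
class FoldingHash {
    // Splits a 7-digit string into groups of 2, 2 and 3 digits and adds them.
    static int hash(String digits) {
        int a = Integer.parseInt(digits.substring(0, 2));
        int b = Integer.parseInt(digits.substring(2, 4));
        int c = Integer.parseInt(digits.substring(4, 7));
        return a + b + c;
    }

    public static void main(String[] args) {
        System.out.println(hash("7313018")); // 73 + 13 + 18  = 104
        System.out.println(hash("5392309")); // 53 + 92 + 309 = 454
        System.out.println(hash("4281397")); // 42 + 81 + 397 = 520
    }
}
```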

10
Hash Function Modular arithmetic
  • Convert the data into an integer, divide by the
    size of the hash table, and take the remainder as
    the index.
  • Phone no.    Folded value          Index
  • 7313018      731 + 3018 = 3749     3749 mod 100 = 49
  • 5392309      539 + 2309 = 2848     2848 mod 100 = 48
  • 4281397      428 + 1397 = 1825     1825 mod 100 = 25
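
The numbers above are consistent with adding the first three digits to the last four and then taking the remainder modulo a table size of 100. A sketch under that reading (the split is inferred from the values; the class name is hypothetical):

```java
// Sketch only: the 3 + 4 digit split is inferred from the example values.
class ModularHash {
    // Converts the digits to an integer by folding, then reduces it
    // modulo the table size to obtain the index.
    static int hash(String digits, int tableSize) {
        int value = Integer.parseInt(digits.substring(0, 3))
                  + Integer.parseInt(digits.substring(3));
        return value % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("7313018", 100)); // 3749 % 100 = 49
        System.out.println(hash("5392309", 100)); // 2848 % 100 = 48
        System.out.println(hash("4281397", 100)); // 1825 % 100 = 25
    }
}
```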

11
Choosing a hash function
  • A good hash function should satisfy two criteria
  • 1. It should be quick to compute
  • 2. It should minimize the number of collisions

12
Example of hash function
  • Hash function for strings
  • X = 128
  • A3*X^3 + A2*X^2 + A1*X^1 + A0*X^0
  • = (((A3*X) + A2)*X + A1)*X + A0
  • The result of this function is usually much
    larger than the size of the table, so we take it
    modulo the size of the hash table.

13
Example of hash function
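    int hash(String key, int tableSize) {
        int hashVal = 0;
        // reduce modulo tableSize at every step to keep hashVal small
        for (int i = 0; i < key.length(); i++)
            hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
        return hashVal % tableSize;
    }
  • Modulo
  • (A + B) % C = ((A % C) + (B % C)) % C
  • (A * B) % C = ((A % C) * (B % C)) % C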

14
Example of hash function
    int hash(String key, int tableSize) {
        int hashVal = 0;
        // hashVal may overflow and become negative; this is corrected below
        for (int i = 0; i < key.length(); i++)
            hashVal = hashVal * 37 + key.charAt(i);
        hashVal %= tableSize;
        if (hashVal < 0)
            hashVal += tableSize;
        return hashVal;
    }

15
Example of hash function
    int hash(String key, int tableSize) {
        int hashVal = 0;
        // simply add up the character codes
        for (int i = 0; i < key.length(); i++)
            hashVal += key.charAt(i);
        return hashVal % tableSize;
    }
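
One way to compare these versions: reading the last one as a plain sum of character codes, it ignores character order, so anagrams always collide. The sketch below illustrates this; the class name, the sample keys and the table size of 101 are arbitrary choices for illustration.

```java
// Sketch only: demonstrates that a sum-of-characters hash ignores order.
class AdditiveHashDemo {
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal += key.charAt(i);
        return hashVal % tableSize;
    }

    public static void main(String[] args) {
        // "stop" and "pots" contain the same letters, so their sums are equal.
        System.out.println(hash("stop", 101) == hash("pots", 101)); // true
    }
}
```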

16
Collision resolution
  • When two keys map into the same cell, we get a
    collision.
  • Collisions can occur during insertion, so we
    need a procedure (collision resolution) to
    resolve them.

17
Closed Hashing
  • On a collision, try to find an alternative cell
    within the table.
  • Closed hashing is also known as open addressing.
  • For insertion, we try cells in sequence using an
    incremented function such as
  • h_i(x) = (hash(x) + f(i)) mod H-size, with f(0) = 0
  • Function f is the collision resolution
    strategy.
  • The table must be bigger than the number of data
    items.
  • Different methods for choosing function f
  • Linear probing
  • Quadratic probing
  • Double hashing
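
The probe formula can be sketched with the strategy f passed in as a parameter. The class name, the method name and the sample values (hash value 8, table size 10) are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Sketch only: lists the cells visited by h_i = (hash + f(i)) mod tableSize.
class ProbeSequence {
    static int[] probes(int hash, IntUnaryOperator f, int tableSize, int count) {
        int[] cells = new int[count];
        for (int i = 0; i < count; i++)
            cells[i] = Math.floorMod(hash + f.applyAsInt(i), tableSize);
        return cells;
    }

    public static void main(String[] args) {
        // f(i) = i (linear probing)
        System.out.println(Arrays.toString(probes(8, i -> i, 10, 4)));     // [8, 9, 0, 1]
        // f(i) = i^2 (quadratic probing)
        System.out.println(Arrays.toString(probes(8, i -> i * i, 10, 4))); // [8, 9, 2, 7]
    }
}
```

Because f(0) = 0, the first cell tried is always hash(x) itself.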

18
Linear probing
  • Use a linear function f(i) = i
  • Find the first free position in the table for
    the key that is close to its actual position.
  • Least complex function.
  • May result in primary clustering.
  • Elements that hash to different locations
    probe the same alternative cells
  • The complexity of this probing depends on the
    value of λ (load factor).
  • We do not use this probing if λ > 0.5.
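
A minimal sketch of insert and find with linear probing. The hash used here (first letter, 'a' -> 0, 'b' -> 1, ...) is an assumption chosen to keep the example readable, and the class name is hypothetical; keys are assumed to be non-empty lowercase strings.

```java
// Sketch only: fixed-size table of strings with f(i) = i.
class LinearProbingTable {
    private final String[] cells;

    LinearProbingTable(int size) { cells = new String[size]; }

    // Illustrative hash (assumption): position of the first letter.
    private int hash(String key) {
        return Math.floorMod(key.charAt(0) - 'a', cells.length);
    }

    // Returns the cell used, or -1 if the table is full.
    int insert(String key) {
        int h = hash(key);
        for (int i = 0; i < cells.length; i++) {
            int pos = (h + i) % cells.length;
            if (cells[pos] == null || cells[pos].equals(key)) {
                cells[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Returns the cell holding key, or -1 if it is absent.
    int find(String key) {
        int h = hash(key);
        for (int i = 0; i < cells.length; i++) {
            int pos = (h + i) % cells.length;
            if (cells[pos] == null) return -1; // an empty cell ends the search
            if (cells[pos].equals(key)) return pos;
        }
        return -1;
    }
}
```

With a table of 16 cells, inserting crystal, cobalt and dawn in that order places them in cells 2, 3 and 4: dawn hashes to 3, a different location from the two "c" keys, yet it has to probe past their cluster. That is primary clustering.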

19
Hashing - insert
[Figure: hash table with cells 0 to 15; keys shown: alpha, crystal, dawn, emerald, flamingo, hallmark, marigold, moon]

20
Hashing - lookup
[Figure: the same table during lookups of cobalt, marigold and private; keys shown: alpha, crystal, dawn, emerald, flamingo, hallmark, moon, marigold, private]

21
Hashing - delete
  • lazy deletion - why?
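
One answer to the question above: with open addressing, emptying a cell would cut the probe path to any key that was stored beyond it, so the cell is only marked as deleted. The sketch below assumes linear probing and a first-letter hash; both choices, and all names, are illustrative and not taken from the slides.

```java
// Sketch only: lazy deletion with a tombstone marker.
class LazyDeleteTable {
    // Tombstone; a distinct object that is compared by identity (==).
    private static final String DELETED = new String("<deleted>");
    private final String[] cells;

    LazyDeleteTable(int size) { cells = new String[size]; }

    // Illustrative hash (assumption): position of the first letter.
    private int hash(String key) {
        return Math.floorMod(key.charAt(0) - 'a', cells.length);
    }

    // Returns the cell used, or -1 if no cell is available.
    int insert(String key) {
        int h = hash(key);
        int free = -1; // first reusable cell seen on the probe path
        for (int i = 0; i < cells.length; i++) {
            int pos = (h + i) % cells.length;
            if (cells[pos] == null) {
                if (free < 0) free = pos;
                break;
            }
            if (cells[pos] == DELETED) {
                if (free < 0) free = pos;
            } else if (cells[pos].equals(key)) {
                return pos; // already present
            }
        }
        if (free >= 0) cells[free] = key;
        return free;
    }

    // Tombstones are skipped; only a truly empty cell ends the search.
    int find(String key) {
        int h = hash(key);
        for (int i = 0; i < cells.length; i++) {
            int pos = (h + i) % cells.length;
            if (cells[pos] == null) return -1;
            if (cells[pos] != DELETED && cells[pos].equals(key)) return pos;
        }
        return -1;
    }

    boolean remove(String key) {
        int pos = find(key);
        if (pos < 0) return false;
        cells[pos] = DELETED;
        return true;
    }
}
```

In a 16-cell table, moon goes to cell 12 and marigold to cell 13. After remove("moon"), find("marigold") still returns 13 because the search steps over the tombstone in cell 12, and a later insert of another "m" key can reuse cell 12.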

22
Hashing - operation after delete
[Figure: the table after deletions, with custom being inserted and marigold looked up; keys shown: alpha, crystal, dawn, flamingo, hallmark, marigold, private]

23
Primary Clustering
  • Elements that hash to different locations
    probe the same alternative cells

[Figure: two table panels; keys shown: alpha, canary, cobalt, crystal, dark, dawn, custom, flamingo, hallmark, marigold, private]

24
Quadratic probing
  • Eliminates primary clustering by selecting
    f(i) = i^2
  • Problems arise once the hash table is more than
    half full.
  • You have to select an appropriate table size
    that is not the square of a number.
  • We can prove that quadratic probing with a prime
    table size and a table that is at least half
    empty will always find a location for an element.
  • Each probe can be computed incrementally by
    noting that f(i) = i^2 = f(i-1) + 2i - 1.
  • Elements that hash to the same location will
    probe the same alternative cells (secondary
    clustering).
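
The incremental form of f(i) = i^2 can be sketched as follows. The class name and the sample values (hash value 3, prime table size 7) are assumptions for illustration; the hash value is assumed to be non-negative.

```java
import java.util.Arrays;

// Sketch only: quadratic probe cells computed without any multiplication by i*i.
class QuadraticProbe {
    // Uses f(i) = f(i-1) + 2i - 1, so each step just adds the next odd number.
    static int[] probes(int hash, int tableSize, int count) {
        int[] cells = new int[count];
        int pos = hash % tableSize;
        for (int i = 0; i < count; i++) {
            if (i > 0) pos = (pos + 2 * i - 1) % tableSize;
            cells[i] = pos;
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probes(3, 7, 4))); // [3, 4, 0, 5]
    }
}
```

With the prime size 7, the first four probes reach four different cells, in line with the guarantee for a table that is at least half empty.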

25
Double hashing
  • The collision resolution function uses another
    hash function, such as f(i) = i * hash2(x)
  • Each probe adds another multiple of hash2(x).
  • The second hash function must be chosen
    carefully so that it never evaluates to zero
    and so that it probes all the cells.
  • It is essential to have a prime-size hash table.
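
A sketch of double hashing for integer keys. The second hash function used here, hash2(x) = R - (x mod R) with R a prime smaller than the table size, is a common textbook choice and an assumption on our part; the slides do not specify one. The constants and names are illustrative.

```java
// Sketch only: TABLE_SIZE, R and hash2 are assumptions, not from the slides.
class DoubleHashing {
    static final int TABLE_SIZE = 11; // prime table size
    static final int R = 7;           // prime smaller than TABLE_SIZE

    static int hash1(int key) { return Math.floorMod(key, TABLE_SIZE); }

    // Always in 1 .. R, so it never evaluates to zero.
    static int hash2(int key) { return R - Math.floorMod(key, R); }

    // Cell visited on the i-th probe: (hash1 + i * hash2) mod TABLE_SIZE.
    static int probe(int key, int i) {
        return (hash1(key) + i * hash2(key)) % TABLE_SIZE;
    }

    public static void main(String[] args) {
        // Keys 25 and 14 both start at cell 3 but then follow different paths.
        for (int i = 0; i < 4; i++)
            System.out.println(probe(25, i) + " " + probe(14, i));
        // prints: 3 3, 6 10, 9 6, 1 2
    }
}
```

Two keys that collide on the first probe usually have different step sizes, which is how double hashing avoids secondary clustering.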

26
Double Hashing
[Figure: two table panels; keys shown: alpha, canary, cobalt, crystal, done, dark, dawn, custom, flamingo, hallmark, marigold, private]

27
Open Hashing
  • Collisions are resolved by inserting all
    elements that hash to the same bucket into a
    single collection of values.
  • Open Hashing
  • Keep a linked list of all the elements that
    hash to the same cell (separate chaining).
  • Each cell in the hash table contains a pointer to
    a linked list containing the data.
  • Functions and Analysis of Open Hashing
  • Inserting a new element into the table: we add
    the element at the beginning or the end of the
    appropriate linked list.
  • The choice depends on whether you want to check
    for duplicates.
  • It also depends on how frequently you expect to
    access the most recently added elements.

28
Open Hashing
[Figure: hash table with cells 0 to 5]

29
Open Hashing
  • For search, we use the hash function to determine
    which linked list holds the element, and then
    traverse the linked list to find the element.
  • Deletion is done to the element in the
    appropriate linked list after we find the element
    to be deleted.
  • We could use other structures, such as a tree or
    another hash table, in each cell of the hash
    table to resolve collisions.
  • The main advantage of this method is the fact
    that it can handle any amount of data (dynamic
    expansion).
  • The main disadvantage of this method is the
    memory usage for each cell.
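
The insert, search and delete operations described above can be sketched as follows. This is a minimal illustration that assumes String keys, Java's built-in hashCode() as the hash function, and insertion at the front of the list with a duplicate check; the class and method names are hypothetical.

```java
import java.util.LinkedList;

// Sketch only: separate chaining with one linked list per cell.
class ChainedHashTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    // Picks the list that the key hashes to.
    private LinkedList<String> bucketFor(String key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    // Adds at the front of the list; returns false for a duplicate.
    boolean insert(String key) {
        LinkedList<String> chain = bucketFor(key);
        if (chain.contains(key)) return false;
        chain.addFirst(key);
        return true;
    }

    // Traverses only the one list selected by the hash function.
    boolean find(String key) { return bucketFor(key).contains(key); }

    boolean remove(String key) { return bucketFor(key).remove(key); }

    // Load factor: number of stored elements divided by number of cells.
    double loadFactor() {
        int n = 0;
        for (LinkedList<String> chain : buckets) n += chain.size();
        return (double) n / buckets.length;
    }
}
```

A table created with 2 cells can still hold 3 keys, giving a load factor of 1.5, which illustrates the dynamic expansion mentioned above.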

30
Analysis of Open Hash
  • In general the average length of a list is the
    load factor λ.
  • The complexity of insertion depends on the hash
    function and on where the insertion is done, but
    in general it is the complexity of insertion
    into the linked list + the time to evaluate the
    hash function.
  • For search, the time complexity is the constant
    time to evaluate the hash function + the time to
    traverse the list.
  • Worst case O(n) for search.
  • Average case depends on λ.
  • General rule for open hashing is to make λ ≈ 1.
  • Used for dynamic size data.

31
Issues
  • Other issues common to all closed hashing
    resolutions
  • Operations become confusing after deletion.
  • Simpler than open hashing.
  • Good if we do not expect too many collisions.
  • If a search is unsuccessful, we may have to
    search the whole table.
  • Requires a table that is large compared to the
    expected number of data items.

32
Summary
  • Hash table: array
  • Hash function: function that maps a key into a
    number in the range [0, size of hash table)
  • Collision resolution
  • Open hashing
  • Separate chaining
  • Closed hashing (Open addressing)
  • Linear probing
  • Quadratic probing
  • Double hashing
  • Primary Clustering, Secondary Clustering

33
Summary
  • Advantage
  • Running time
  • O(1) + O(collision resolution)
  • Disadvantage
  • Difficult (not efficient) to print all elements
    in hash table
  • Inefficient to find minimum element or maximum
    element
  • Not growable (for closed hash/open addressing)
  • Waste some space (load factor)