Hashing - PowerPoint PPT Presentation

Provided by: iseB8

Title: Hashing


1
Hashing
  • Hashing is another method for storing and
    searching data.
  • Hashing makes it easier to add and remove
    elements from a data structure.
  • The worst-case behavior for locating a key is
    linear, Θ(n).
  • Java's standard hash table class is
    java.util.Hashtable.

2
Hashing
  • Hashing usually implements a data structure
    called a hash table.
  • A hash table is an effective data structure for
    implementing dictionaries (insert, search, delete).
  • A hash table is a generalization of an array.
  • A hash table requires a key to access data.

3
Hashing
  • A hash table uses an array whose length is
    proportional to the number of keys actually
    stored.
  • The array index is computed from the key, rather
    than using the key itself as the array index.
  • The key is a unique identifying value.

4
Hashing Functions
  • Hashing requires the use of a hashing function.
  • The purpose of the hashing function is to compute
    the storage slot from the key.
  • Maps key values to array indices.
  • This calculation reduces the range of array
    indices that need to be handled.

5
Hashing Functions
  • If a hashing function groups key values together,
    this is called clustering of the keys.
  • A good hashing function distributes the key
    values uniformly through the array's index range.
  • Any hashing function that results in clustering
    should be changed.
  • A good hashing function has an equal likelihood
    of hashing a key into any of the slots.
  • java.util.Hashtable obtains a key's hash value by
    calling the key's hashCode method.

6
Hashing Functions
  • The division hash function depends upon the
    remainder of division.
  • Math.abs(H(k)) % table.length
  • When using the division hash function, it is best
    to have a table size that is a prime number of
    the form 4n + 3.
  • Using the division hash function can result in
    many collisions.
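
The division formula above can be sketched in Java as follows. This is a minimal illustration, not the presenter's code: the class name DivisionHash and both method names are hypothetical, and H(k) is assumed to be the key's hashCode(). The second method is an added variant that uses Math.floorMod, because Math.abs(Integer.MIN_VALUE) is still negative.

```java
// Hypothetical helper class, for illustration only.
class DivisionHash {
    // The slide's formula: Math.abs(H(k)) % table.length,
    // assuming H(k) is key.hashCode().
    static int slot(Object key, int tableLength) {
        return Math.abs(key.hashCode()) % tableLength;
    }

    // Variant that stays non-negative even when hashCode() is
    // Integer.MIN_VALUE, where Math.abs overflows and stays negative.
    // For negative hash codes it may pick a different slot than slot().
    static int safeSlot(Object key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        int tableLength = 11; // a prime of the form 4n + 3, with n = 2
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> " + slot(key, tableLength));
        }
    }
}
```

With a table length of 11, an Integer key of 25 lands in slot 3, since 25 % 11 is 3.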

7
Hashing Functions
  • The mid-square hash function converts the key to
    an integer, then squares the key. The function
    returns the middle digits of the result.
  • The multiplicative hash function converts the key
    to an integer and multiplies it by a constant
    less than one. The function returns the first
    few digits of the fractional part of the result.
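
Both functions can be sketched as below, under stated assumptions. The class and method names are hypothetical. The mid-square version takes the number of middle digits as a parameter (assumed to be between 1 and 9). The multiplicative version scales the fractional part to the table length instead of returning its first few decimal digits, and it assumes the constant (√5 - 1) / 2, which is one commonly suggested choice and not something the slides specify.

```java
// Hypothetical helper class; names and parameters are illustrative.
class OtherHashFunctions {
    // Mid-square: square the key, then return the middle `digits` digits
    // of the decimal representation (digits assumed to be 1..9).
    static int midSquare(int key, int digits) {
        String squared = Long.toString((long) key * key);
        if (squared.length() <= digits) {
            return Integer.parseInt(squared);
        }
        int start = (squared.length() - digits) / 2;
        return Integer.parseInt(squared.substring(start, start + digits));
    }

    // Multiplicative: multiply by a constant below one, keep the
    // fractional part, and scale it to an index in [0, tableLength).
    static int multiplicative(int key, int tableLength) {
        double a = (Math.sqrt(5) - 1) / 2; // assumed constant, about 0.618
        double product = Math.abs((long) key) * a;
        double fraction = product - Math.floor(product);
        return (int) (fraction * tableLength);
    }

    public static void main(String[] args) {
        System.out.println(midSquare(1234, 3));
        System.out.println(multiplicative(123456, 16384));
    }
}
```

For example, 1234 squared is 1522756, whose middle three digits are 227.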

8
Example
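[Figure: the hash function H maps the actual keys K (K1 to K5), drawn from the universe of keys U, to table slots 0 through m - 1; H(k1), H(k2), H(k3), and H(k4) point to separate slots.]
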
9
Collisions
  • A collision occurs when the hashing function
    calculates the same array index for two different
    objects and one of them is already stored at that
    index.
  • That is, two keys hash to the same slot.

10
Collision Example
[Figure: the actual keys K (K1 to K5), drawn from the universe of keys U, are mapped to table slots 0 through m - 1; K2 and K5 collide because H(k2) and H(k5) point to the same slot.]

11
Open Addressing
  • Open addressing ensures that all elements are
    stored directly into the hash table.
  • Every table slot contains either data or null.
  • The problem is that the table can fill up.
  • The good thing is that there are no external
    storage locations for the table elements.
  • Open addressing attempts to resolve collisions
    using various methods.

12
Linear Probing
  • Linear Probing resolves collisions by placing the
    data into the next open slot in the table.
  • If this slot is open, the data is stored in the
    slot.
  • If this slot is not open, the algorithm looks at
    the next slot (index) until an open slot is
    found.

13
Linear Probing
  • It is difficult to delete items from a hash table
    that uses open addressing.
  • A slot cannot simply be set to null, because a
    later search would stop there and miss keys stored
    further along. Instead, place a Deleted marker
    into the emptied slot.
  • If H(k) is the ordinary hash function, the
    linear probing hash function is
  • H(k, i) = (H(k) + i) % m, where i = 0, 1, 2, ...,
    m - 1 and m is the number of elements that can be
    stored into the table.
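
The probing and deletion rules above can be sketched as a small Java class. This is an illustrative sketch, not the presenter's implementation: the class name LinearProbingTable and its methods are hypothetical, and H(k) is assumed to be Math.floorMod(key.hashCode(), m) so the index is never negative.

```java
// Hypothetical open-addressing table with linear probing (illustration only).
class LinearProbingTable {
    // Marker placed in a slot whose key was removed.
    private static final Object DELETED = new Object();
    private final Object[] slots;
    private int size;

    LinearProbingTable(int capacity) {
        slots = new Object[capacity];
    }

    // H(k, i) = (H(k) + i) % m
    private int probe(Object key, int i) {
        int m = slots.length;
        return (Math.floorMod(key.hashCode(), m) + i) % m;
    }

    // Returns the slot holding key, or -1 if it is absent.
    private int indexOf(Object key) {
        for (int i = 0; i < slots.length; i++) {
            int index = probe(key, i);
            if (slots[index] == null) {
                return -1; // a never-used slot ends the search
            }
            if (slots[index] != DELETED && slots[index].equals(key)) {
                return index;
            }
        }
        return -1;
    }

    boolean contains(Object key) {
        return indexOf(key) >= 0;
    }

    // Stores key in the first null or DELETED slot of its probe sequence.
    boolean add(Object key) {
        if (contains(key)) {
            return false;
        }
        for (int i = 0; i < slots.length; i++) {
            int index = probe(key, i);
            if (slots[index] == null || slots[index] == DELETED) {
                slots[index] = key;
                size++;
                return true;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Replaces the key with DELETED so later searches keep probing past it.
    boolean remove(Object key) {
        int index = indexOf(key);
        if (index < 0) {
            return false;
        }
        slots[index] = DELETED;
        size--;
        return true;
    }

    int size() {
        return size;
    }
}
```

In a table of 7 slots, the Integer keys 1, 8, and 15 all hash to slot 1 and occupy slots 1, 2, and 3. After removing 8, a search for 15 still succeeds because slot 2 holds the Deleted marker and not null.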

14
Linear Probing
  • A problem associated with linear probing is
    called primary clustering.
  • Primary clustering occurs when many items hash
    into the same or nearby slots, so that long runs
    of occupied slots build up.
  • This results in increased search times.

15
Linear Probing
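[Figure: the actual keys K (K1 to K5), drawn from the universe of keys U, are mapped to table slots 0 through m - 1; K5 collides with K2 at slot H(k2) and linear probing stores it in the next open slot.]
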
16
Double Hashing
  • Double hashing is one of the best methods for
    dealing with collisions.
  • The slot location is calculated based upon the
    hash function (H1(k)). If the slot is full, then
    a second hash function is calculated and combined
    with the first hash function (H(k, i)) to
    determine a new slot.

17
Double Hashing
  • Assume that
  • H1(k) = Math.abs(H(k)) % table.length
  • H2(k) = 1 + Math.abs(H(k)) % (table.length - x),
    where x is a small value: 1, 2, or 3.
  • Then
  • H(k, i) = (H1(k) + i * H2(k)) % m
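
A minimal sketch of these formulas, assuming m is table.length and taking the raw hash value H(k) as an int parameter. The class name DoubleHashing and its method names are hypothetical, and the sketch assumes the hash value is not Integer.MIN_VALUE, for which Math.abs stays negative.

```java
// Hypothetical helper class, for illustration only.
class DoubleHashing {
    // H1(k) = Math.abs(H(k)) % m
    static int h1(int hash, int m) {
        return Math.abs(hash) % m;
    }

    // H2(k) = 1 + Math.abs(H(k)) % (m - x); never zero, so probing always moves.
    static int h2(int hash, int m, int x) {
        return 1 + Math.abs(hash) % (m - x);
    }

    // H(k, i) = (H1(k) + i * H2(k)) % m, computed in long to avoid overflow.
    static int probe(int hash, int i, int m, int x) {
        return (int) ((h1(hash, m) + (long) i * h2(hash, m, x)) % m);
    }

    public static void main(String[] args) {
        int m = 13, x = 2, hash = 14;
        for (int i = 0; i < m; i++) {
            System.out.print(probe(hash, i, m, x) + " ");
        }
        System.out.println();
    }
}
```

With m = 13, x = 2, and H(k) = 14, H1(k) is 1 and H2(k) is 4, so the probe sequence starts 1, 5, 9, 0 and visits all 13 slots, because the step 4 shares no factor with the prime table size.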

18
Double Hashing
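[Figure: the actual keys K (K1 to K5), drawn from the universe of keys U, are mapped to table slots 0 through m - 1; K5 collides with K2 at slot H(k2) and double hashing moves it to a different slot, labeled H(k5), near the top of the table.]
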
19
External Chaining
  • In external chaining the hash table contains an
    array in which each component can hold more than
    one element of the hash table.
  • Essentially, a multidimensional array or a
    linked list of elements can exist for each table
    slot.
  • The typical implementation is that each slot
    contains a linked list.
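
The typical linked-list implementation might look like the following sketch. The class name ChainedHashTable, the nested Entry class, and the method names are all hypothetical, and the slot is assumed to be chosen with Math.floorMod(key.hashCode(), capacity).

```java
import java.util.LinkedList;

// Hypothetical chained hash table, for illustration only.
class ChainedHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Each slot holds a linked list of the entries that hash to it.
    private final LinkedList<Entry<K, V>>[] slots;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        slots = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            slots[i] = new LinkedList<>();
        }
    }

    private LinkedList<Entry<K, V>> chainFor(K key) {
        return slots[Math.floorMod(key.hashCode(), slots.length)];
    }

    // Updates the value if the key is present, otherwise appends a new entry.
    void put(K key, V value) {
        for (Entry<K, V> e : chainFor(key)) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        chainFor(key).add(new Entry<>(key, value));
        size++;
    }

    // Returns the stored value, or null if the key is absent.
    V get(K key) {
        for (Entry<K, V> e : chainFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    boolean remove(K key) {
        boolean removed = chainFor(key).removeIf(e -> e.key.equals(key));
        if (removed) {
            size--;
        }
        return removed;
    }

    int size() {
        return size;
    }
}
```

Because colliding keys simply share a list, a table created with 2 slots can still hold 3 or more entries, and deletion needs no special marker.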

20
External Chaining
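[Figure: the actual keys K (K1 to K5), drawn from the universe of keys U, are mapped to table slots 0 through m - 1; K2 and K5 hash to the same slot and are both stored there, chained together.]
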
21
Load Factor
  • The load factor is a fraction that represents the
    number of elements stored in the table divided by
    the size of the table's array.
  • α = (the number of elements stored in the table) /
    (the size of the table's array)

22
Load Factor
  • If open addressing is used, then each table slot
    holds at most one element; therefore, the load
    factor can never be greater than 1.
  • If external chaining is used, then each table
    slot can hold many elements; therefore, the load
    factor may be greater than 1.

23
Hashing Analysis
  • The worst case analysis for hashing is the case
    where every key is hashed into the same slot.
  • This takes Θ(n), linear time.
  • The average time can be much faster.

24
Average Search Analysis
  • Searching with linear probing (α is the load
    factor):
  • For a table that is not near full:
  • ½ (1 + 1 / (1 - α))
  • For a table that is full or near full:
  • Math.sqrt(n * (π / 8))
  • Searching with double hashing:
  • (-ln(1 - α)) / α, where ln is the natural logarithm
  • Searching with chained hashing:
  • 1 + (α / 2)
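
To get a feel for these formulas, the sketch below evaluates them for a few load factors. The class name SearchCost and its method names are hypothetical; the formulas themselves are the ones listed above, with α passed in as alpha.

```java
// Hypothetical helper class that evaluates the average-search formulas.
class SearchCost {
    // Linear probing, table not near full: 1/2 * (1 + 1 / (1 - alpha)).
    static double linearProbing(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Linear probing, table full or near full: sqrt(n * pi / 8).
    static double linearProbingFull(int n) {
        return Math.sqrt(n * (Math.PI / 8));
    }

    // Double hashing: -ln(1 - alpha) / alpha.
    static double doubleHashing(double alpha) {
        return -Math.log(1 - alpha) / alpha;
    }

    // Chained hashing: 1 + alpha / 2.
    static double chaining(double alpha) {
        return 1 + alpha / 2;
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.5, 0.75, 0.9}) {
            System.out.printf("alpha=%.2f linear=%.2f double=%.2f chained=%.2f%n",
                    alpha, linearProbing(alpha), doubleHashing(alpha), chaining(alpha));
        }
    }
}
```

At a load factor of 0.5, linear probing averages 1.5 probes and chaining 1.25. At 0.9, linear probing rises to 5.5 while chaining stays at 1.45.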

25
Hashing
  • Java provides the Hashtable class, but it also
    provides two other classes.
  • The HashMap class implements the Map interface
    using a hash table.
  • The HashSet class implements the Set interface
    using a hash table.
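
A short usage sketch of the two classes follows. The class name HashCollectionsDemo and its methods are hypothetical; the calls used (HashMap.merge, HashSet.add) are part of the standard java.util API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class, for illustration only.
class HashCollectionsDemo {
    // Counts how often each word occurs, using a HashMap from word to count.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Collects the distinct words, using a HashSet.
    static Set<String> distinctWords(String[] words) {
        Set<String> distinct = new HashSet<>();
        for (String word : words) {
            distinct.add(word);
        }
        return distinct;
    }

    public static void main(String[] args) {
        String[] words = {"hash", "table", "hash", "map"};
        System.out.println(countWords(words));
        System.out.println(distinctWords(words));
    }
}
```

Neither class guarantees any iteration order, so the printed order of the entries may vary.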

26
Applications
  • Compilers: keep track of variables and scope
  • Graph theory: associate id with name (general)
  • Game playing: e.g., in chess, keep track of
    positions already considered and evaluated (which
    may be expensive)
  • Spelling checker: at least to check that a word is
    right.
  • But how to suggest the correct word?
  • Lexicon/book indices

27
Lexicon Example
  • Inputs: text file (N) and content word file (the
    keys) (M)
  • Output: content words in order, with page numbers
  • Algorithm:
  • Define entry = (content word, linked list of
    integers)
  • Initially, the list is empty for each word.
  • Step 1: Read the content word file and make a
    HashMap of (content word, empty list)
  • Step 2: Read the text file and check if each word
    is in the HashMap
  • If it is, add the page number to its list; else
    continue.
  • Step 3: Use the iterator method to walk
    through the HashMap and put its entries into a
    sortable container.
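
The three steps can be sketched in Java as below. Several details are assumptions made for illustration: the class name LexiconIndex and the method build are hypothetical, the text is assumed to arrive as one String per page, words are lowercased and split on non-letters, and a TreeMap stands in for the sortable container of step 3.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical index builder, for illustration only.
class LexiconIndex {
    static SortedMap<String, LinkedList<Integer>> build(
            List<String> contentWords, List<String> pages) {
        // Step 1: one entry per content word, each with an empty list.
        Map<String, LinkedList<Integer>> index = new HashMap<>();
        for (String word : contentWords) {
            index.put(word.toLowerCase(Locale.ROOT), new LinkedList<>());
        }

        // Step 2: scan the text; record the page for every content word seen.
        for (int p = 0; p < pages.size(); p++) {
            int pageNumber = p + 1;
            String[] tokens = pages.get(p).toLowerCase(Locale.ROOT).split("[^a-z]+");
            for (String token : tokens) {
                LinkedList<Integer> list = index.get(token);
                // Skip words that are not keys, and avoid repeating a page number.
                if (list != null && (list.isEmpty() || list.getLast() != pageNumber)) {
                    list.add(pageNumber);
                }
            }
        }

        // Step 3: copy the entries into a sorted container (here, a TreeMap).
        return new TreeMap<>(index);
    }
}
```

Steps 1 and 2 rely only on expected constant-time HashMap operations; the sorting cost is confined to step 3, which touches the M content words and not the N words of text.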

28
Lexicon Example
  • Complexity:
  • Step 1: O(M), M = number of content words
  • Step 2: O(N), N = text file size in words
  • Step 3: O(M log M) max.
  • So the total is O(max(N, M log M)).
  • Dumb algorithm:
  • Sort content words: O(M log M) (balanced tree)
  • Look up each word in the content word tree and
    update: O(N log M)
  • Total complexity: O(N log M)
  • Example: N = 500 * 2000 = 1,000,000 and M = 1000,
    so log M is about 10.
  • Smart algo: 1,000,000; dumb algo: 1,000,000 * 10.

29
Memoization
  • Recursive Fibonacci:
  • fib(n): if (n < 2) return 1
  • else return fib(n-1) + fib(n-2)
  • Use hashing to store intermediate results:
  • Hashtable ht
  • fib(n): Entry e = (Entry) ht.get(n)
  • if (e != null) return e.answer
  • else if (n < 2) return 1
  • else ans = fib(n-1) + fib(n-2)
  • ht.put(n, ans)
  • return ans
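
The pseudocode above can be made runnable as follows. This is a sketch under a few assumptions: the class name MemoFib is hypothetical, the table stores the answer directly as a Long in place of the slide's Entry wrapper, and results are long so that larger inputs do not overflow int.

```java
import java.util.Hashtable;

// Hypothetical class name; follows the slide's convention fib(0) = fib(1) = 1.
class MemoFib {
    // Maps n to the already computed value of fib(n).
    private static final Hashtable<Integer, Long> ht = new Hashtable<>();

    // Plain recursion: recomputes the same subproblems exponentially often.
    static long slowFib(int n) {
        if (n < 2) {
            return 1;
        }
        return slowFib(n - 1) + slowFib(n - 2);
    }

    // Memoized version: each fib(n) is computed once and then looked up.
    static long fib(int n) {
        Long cached = ht.get(n);
        if (cached != null) {
            return cached;
        }
        if (n < 2) {
            return 1;
        }
        long ans = fib(n - 1) + fib(n - 2);
        ht.put(n, ans);
        return ans;
    }

    public static void main(String[] args) {
        System.out.println(fib(50));
    }
}
```

With the table in place, fib(50) needs only about 50 distinct computations, whereas the plain recursion makes billions of calls.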