Hashing - PowerPoint PPT Presentation

About This Presentation
Title:

Hashing

Description:

Hashing Chapter 5 – PowerPoint PPT presentation

Slides: 23
Provided by: Emin154
Category:
Tags: function | hashing


Transcript and Presenter's Notes

Title: Hashing


1
Hashing
  • Chapter 5

2
  • Hashing is used for
  • Insertion, Deletion, Search in constant time
  • Hashing is not used for
  • Finding minimum, maximum
  • Sorting the elements
  • Appropriate for databases

3
What we will learn
  • Hash Table
  • Hash Function
  • Key
  • Linear probing
  • Collision
  • Double Hashing

4
Now Starts
  • hash table
  • Tables which can be searched for an item
    in O(1) time using a hash function to form an
    address from the key.
  • hash function
  • Function which, when applied to the key, produces
    an integer which can be used as an address in a
    hash table.
  • collision
  • When a hash function maps two different keys to
    the same table address, a collision is said to
    occur.
  • linear probing
  • A simple re-hashing scheme in which the next slot
    in the table is checked on a collision.
  • quadratic probing
  • A re-hashing scheme in which a higher (usually
    2nd) order function of the hash index is used to
    calculate the address.
  • clustering.
  • Tendency for clusters of adjacent slots to be
    filled when linear probing is used.
  • secondary clustering.
  • Collision sequences generated by addresses
    calculated with quadratic probing.
  • perfect hash function
  • Function which, when applied to all the members
    of the set of items to be stored in a hash table,
    produces a unique set of integers within some
    suitable range.

5
An Example
  • In this example, John maps to 3
  • Phil maps to 4
  • Problem
  • How will the mapping be done?
  • What happens if two items map to the same place?

6
A Plan For Hashing
  • Save items in a key-indexed table. Index is a
    function of the key.
  • Hash function.
  • Method for computing table index from key.
  • Collision resolution strategy.
  • Algorithm and data structure to handle two keys
    that hash to the same index.
  • If there is no space limitation
  • Trivial hash function with key as address.
  • If there is no time limitation
  • Trivial collision resolution: sequential search.
  • Limitations on both time and space: hashing (the
    real world)

7
Finding A Hash Function
  • Goal: scramble the keys.
  • Each table position equally likely for each key.
  • Ex: Turkish national ID number (Vatandaslik
    Numarasi) for 10000 people
  • Bad: the whole number, since only 10000 of the
    possible values will ever be used.
  • Better: the last three digits. But every number is
    even, so half of the table positions stay unused.
  • Best: use digits 2, 3, 4, 5.
  • Ex: date of birth.
  • Bad: first three digits of birth year.
  • Better: birthday.
  • Ex: phone numbers.
  • Bad: first three digits.
  • Better: last three digits.
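The phone number case above can be sketched in a few lines. This is only an illustration: the phone numbers are made up, and the function names first_three and last_three are hypothetical helpers, not part of the slides.

```python
# Hypothetical phone numbers that share one area code (made-up data).
phones = [5551234567, 5559876543, 5552468013, 5551357924]

def first_three(n: int) -> int:
    """Bad choice: the leading digits are identical for many keys."""
    return int(str(n)[:3])

def last_three(n: int) -> int:
    """Better choice: the trailing digits vary from key to key."""
    return n % 1000

# Count how many distinct table positions each choice produces.
bad_slots = {first_three(p) for p in phones}
good_slots = {last_three(p) for p in phones}
print(len(bad_slots), len(good_slots))
```

With this sample data, every key lands on the same position under first_three, while last_three spreads the keys over different positions.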

8
Example for Hash Function
Question: What should the tableSize be?
9
Example for Hash Function
Question: Is it suitable for a table size of 10007?
Answer: No. English is not random. Although
there are 26^3 = 17576 possible combinations of
three characters, only 2851 different combinations
occur. Since 2851 / 10007 is about 28%,
100% - 28% = 72% of the table is wasted.
10
New Hash Example
11
Problem.
  • Collision: two keys hashing to the same value.
  • Essentially unavoidable.
  • In probability theory, the birthday problem or
    birthday paradox pertains to the probability that
    in a set of randomly chosen people some pair of
    them will have the same birthday. In a group of
    23 (or more) randomly chosen people, there is
    more than a 50% probability that some pair of them
    will have been born on the same day. For 57
    or more people, the probability is more than 99%,
    reaching 100% as the number of people reaches
    366 (assuming 365 possible birthdays). The
    mathematics behind this problem leads to
    a well-known cryptographic attack called the
    birthday attack.
  • With M hash values, expect a collision after
    about sqrt(π M / 2) insertions.
  • Conclusion: can't avoid collisions unless you have
    a ridiculous amount of memory.
  • Challenge: efficiently cope with collisions.
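The birthday figures above can be checked numerically. The sketch below assumes keys are uniformly distributed over m values (m = 365 for birthdays); the function names are illustrative only, and the square-root expression is the usual large-M approximation rather than an exact value.

```python
import math

def collision_probability(n: int, m: int = 365) -> float:
    """Probability that at least two of n uniform random keys share one of m values."""
    if n > m:
        return 1.0  # pigeonhole: more keys than values forces a collision
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (m - i) / m
    return 1.0 - p_all_distinct

def approx_insertions_until_collision(m: int) -> float:
    """Large-m approximation sqrt(pi * m / 2) for the expected first collision."""
    return math.sqrt(math.pi * m / 2)

print(round(collision_probability(23), 3))
print(round(collision_probability(57), 3))
print(round(approx_insertions_until_collision(365), 1))
```

The same functions work for a hash table by passing the table size as m.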

12
Collision
13
When a collision occurs
  • Separate Chaining
  • Keep a list of elements that hash to the same
    value.
  • Separate chaining: array of M linked lists.
  • Hash: map key to integer i between 0 and M-1.
  • Insert: put at front of ith chain.
  • constant time
  • Search: only need to search ith chain.
  • proportional to length of chain
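A minimal sketch of separate chaining as described above, assuming Python's built-in hash() as the hash function and a small default M. The class and method names are illustrative. Insert does not check for duplicate keys, which is what keeps it constant time.

```python
class Node:
    """One link in a chain."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node

class ChainedHashTable:
    def __init__(self, m: int = 11):
        self.m = m                  # number of chains (M in the slides)
        self.heads = [None] * m     # heads[i] is the front of the ith chain

    def _index(self, key) -> int:
        # Map key to an integer i between 0 and M-1 (built-in hash is an assumption).
        return hash(key) % self.m

    def insert(self, key, value) -> None:
        # Put the new node at the front of the ith chain: constant time.
        i = self._index(key)
        self.heads[i] = Node(key, value, self.heads[i])

    def search(self, key):
        # Only the ith chain is scanned: cost proportional to its length.
        node = self.heads[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

table = ChainedHashTable(m=3)
for k in (1, 4, 7):          # all three keys hash to chain 1 when M = 3
    table.insert(k, str(k))
print(table.search(4), table.search(10))
```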

14
Separate Chaining Performance
  • Search cost is proportional to length of chain.
  • Trivial: average length = N / M.
  • Worst case: all keys hash to same chain.
  • Theorem. Let α = N / M > 1 be the average length of
    a list, which is called the load factor.
  • Average search cost: 1 + α/2
  • What is the choice of M?
  • M too large: too many empty chains.
  • M too small: chains too long.
  • Typical choice: N / M ≈ 10, giving constant-time
    search/insert.

15
Hash Table without Linked Lists
  • Linear probing: array of size M.
  • Hash: map key to integer i between 0 and M-1.
  • Insert: put in slot i if free, if not try i+1,
    i+2, etc.
  • Search: search slot i, if occupied but no match,
    try i+1, i+2, etc.
  • Cluster.
  • Contiguous block of items.
  • Search through cluster using elementary
    algorithm for arrays.
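The insert and search rules for linear probing can be sketched as follows. This assumes Python's built-in hash(), wrap-around at the end of the array, no deletion, and that None is never used as a key. The class name is illustrative.

```python
class LinearProbingTable:
    def __init__(self, m: int = 11):
        self.m = m                    # array size (M in the slides)
        self.keys = [None] * m        # None marks a free slot
        self.values = [None] * m
        self.n = 0                    # number of stored keys

    def _index(self, key) -> int:
        return hash(key) % self.m

    def insert(self, key, value) -> None:
        i = self._index(key)
        # Try slot i, then i+1, i+2, ... (wrapping around) until free or matching.
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.m
        if self.keys[i] is None:
            # Always keep one slot free so that probing loops terminate.
            if self.n == self.m - 1:
                raise RuntimeError("table full; a real table would grow here")
            self.n += 1
        self.keys[i] = key
        self.values[i] = value

    def search(self, key):
        i = self._index(key)
        # Occupied but no match: keep probing. A free slot means the key is absent.
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % self.m
        return None

table = LinearProbingTable(m=7)
for k, v in ((3, "a"), (10, "b"), (17, "c")):   # all hash to slot 3 when M = 7
    table.insert(k, v)
print(table.keys)
```

The three colliding keys end up in adjacent slots, which is a small instance of the cluster the slide describes.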

16
Linear Probing
17
Linear Probing Performance
  • Insert and search cost depend on length of
    cluster.
  • Trivial: average length of cluster α = N / M.
  • Worst case: all keys hash to same cluster.
  • Theorem (Knuth, 1962). Let α = N / M < 1 be the
    load factor.
  • Average probes for insert: (1/2)(1 + 1/(1 - α)^2);
    for search (hit): (1/2)(1 + 1/(1 - α))
  • Parameters.
  • M too large: too many empty array entries.
  • M too small: clusters coalesce.
  • Typical choice: M ≈ 2N, giving constant-time
    search/insert.
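To see why M of about 2N is a comfortable choice, the classical estimates for linear probing can be evaluated at a few load factors. The formulas used here are the standard Knuth results, stated as an assumption: about (1/2)(1 + 1/(1 - α)) probes for a successful search and (1/2)(1 + 1/(1 - α)^2) for an insert or unsuccessful search. The function names are illustrative.

```python
def probes_search_hit(alpha: float) -> float:
    """Estimated probes for a successful search at load factor alpha < 1."""
    return 0.5 * (1 + 1 / (1 - alpha))

def probes_insert(alpha: float) -> float:
    """Estimated probes for an insert or unsuccessful search at load factor alpha < 1."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)

for alpha in (0.5, 0.75, 0.9):
    print(alpha, probes_search_hit(alpha), probes_insert(alpha))
```

At α = 0.5 (M = 2N) both estimates stay small, while at α = 0.9 the insert estimate grows to roughly fifty probes.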

18
Linear Probing Analysis
19
Avoid Clustering
  • Use Double Hash
  • When a collision occurs, use a second hash function
  • Make sure the second hash function never returns 0
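A small sketch of double hashing for integer keys. The particular second hash, R - (key mod R) with R a prime smaller than the table size, is a common textbook choice and an assumption here, not something given in the slides. The values M = 11 and R = 7 are arbitrary.

```python
M = 11  # table size, assumed prime so every slot is eventually probed
R = 7   # assumed prime smaller than M, used by the second hash

def h1(key: int) -> int:
    """First hash: the starting slot."""
    return key % M

def h2(key: int) -> int:
    """Second hash: the step size, always between 1 and R, never 0."""
    return R - (key % R)

def probe_sequence(key: int) -> list:
    """Slots visited for key: h1, h1 + h2, h1 + 2*h2, ... (mod M)."""
    return [(h1(key) + j * h2(key)) % M for j in range(M)]

# Keys 22 and 33 collide on the first probe but then follow different paths.
print(probe_sequence(22))
print(probe_sequence(33))
```

A step of 0 would probe the same slot forever, which is why the second hash must never return 0.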

20
(No Transcript)
21
Rehashing
  • After 70% of the table is full, double the size of
    the hash table.
  • Don't forget to keep the table size a prime number
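The rehashing rule can be sketched for a linear probing table of integer keys. The 0.7 threshold comes from the slide; picking the next prime at or above twice the old size is one way to honour the prime-size advice and is an assumption, as are all function names. Duplicate keys are not handled.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n

def insert(table: list, key: int) -> None:
    """Linear probing insert; assumes at least one free (None) slot."""
    i = key % len(table)
    while table[i] is not None:
        i = (i + 1) % len(table)
    table[i] = key

def insert_with_rehash(table: list, key: int, max_load: float = 0.7) -> list:
    """Insert key, then move everything to a bigger table if the load exceeds max_load."""
    insert(table, key)
    used = sum(slot is not None for slot in table)
    if used / len(table) > max_load:
        bigger = [None] * next_prime(2 * len(table))
        for k in table:
            if k is not None:
                insert(bigger, k)   # every key is hashed again for the new size
        return bigger
    return table

table = [None] * 7
for key in (1, 2, 3, 4, 5):
    table = insert_with_rehash(table, key)
print(len(table))
```

Every key has to be re-inserted because its slot depends on the table size.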

22
Rehashing