Hashing - PowerPoint PPT Presentation

Provided by: wemi
Transcript and Presenter's Notes

1
Hashing
2
Hashing
  • Hashing is the transformation of a string of
    characters into a usually shorter fixed-length
    value or key that represents the original string.
    Hashing is used to index and retrieve items in a
    database because it is faster to find the item
    using the shorter hashed key than to find it
    using the original value. It is also used in many
    encryption algorithms.

3
Hash Table
  • Is a data structure that associates keys with
    values

A small phone book as a hash table.
4
Hash Table (1)
  • The primary operation it supports efficiently is
    a lookup: given a key (a person's name), find the
    corresponding value (that person's telephone
    number). It works by transforming the key with a
    hash function into a hash, a number that is used
    as an index into an array to locate the slot
    where the value should be stored.
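
The lookup described above can be sketched in a few lines of Python. This is only an illustration: the function toy_hash, the table size of 16, and the phone numbers are all made up for the example, and the sketch deliberately refuses to handle collisions.

```python
TABLE_SIZE = 16  # assumed array size, chosen only for this illustration


def toy_hash(name: str) -> int:
    """Hypothetical hash function: sum of character codes reduced to an index."""
    return sum(ord(ch) for ch in name) % TABLE_SIZE


slots = [None] * TABLE_SIZE  # the array behind the hash table


def put(name: str, number: str) -> None:
    """Store a (name, number) pair in the slot chosen by the hash."""
    index = toy_hash(name)
    if slots[index] is not None and slots[index][0] != name:
        raise ValueError("collision: this sketch does not resolve collisions")
    slots[index] = (name, number)


def get(name: str) -> str:
    """Look up a number by hashing the name and reading that one slot."""
    entry = slots[toy_hash(name)]
    if entry is None or entry[0] != name:
        raise KeyError(name)
    return entry[1]


# Invented phone numbers, for illustration only.
put("Elmer", "555-0101")
put("Hubert", "555-0102")
put("Rollyn", "555-0103")
```

A call such as get("Hubert") hashes the name once and reads a single array slot, which is why the lookup is fast regardless of how many entries the table holds.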

5
Hash Function
  • The hashing algorithm is any well-defined
    procedure or mathematical
    function which converts a large, possibly
    variable-sized amount of data into a small datum,
    usually a single integer that may serve as an
    index into an array. The values returned by a
    hash function are called hash values, hash codes,
    hash sums, or simply hashes.

6
Hash Function
7
1. Direct Hashing: The key is the address
without any algorithmic manipulation. The data
structure must therefore contain an element for
every possible key. Although the situations
in which direct hashing can be used are limited,
it is very powerful where it applies because
it guarantees that there are no synonyms (no two
keys hash to the same address).
8
[Figure: direct hashing. The keys 5, 100 and 2 pass
through the hash function unchanged and give the
addresses 005, 100 and 002.]
Address  Name
001      Elmer
002      Markh
005      Reymund
007      Hubert
100      Rollyn
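
A minimal sketch of direct hashing, assuming keys in the range 1 to 100 as the figure suggests. The names MAX_KEY and direct_hash are illustrative and do not come from the slides.

```python
MAX_KEY = 100  # assumed key range 1..100, matching the largest address shown

# One element per possible key; index 0 is left unused so that key == index.
table = [None] * (MAX_KEY + 1)


def direct_hash(key: int) -> int:
    """The key is the address: no arithmetic is applied."""
    if not 1 <= key <= MAX_KEY:
        raise ValueError(f"key {key} is outside 1..{MAX_KEY}")
    return key


# Records as listed in the figure.
for key, name in [(1, "Elmer"), (2, "Markh"), (5, "Reymund"),
                  (7, "Hubert"), (100, "Rollyn")]:
    table[direct_hash(key)] = name
```

Because every key owns its own element, two different keys can never land on the same address, at the cost of allocating space for keys that are never used.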
9
2. Subtraction Method: Sometimes we have keys
that are consecutive but do not start from one.
Example: A company may have only 100
employees, but the employee numbers start from
1001 and go to 1100. In this case, we use a very
simple hashing function that subtracts 1000 from
the key to determine the address.
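
As a sketch, the subtraction method is a single arithmetic step. The code assumes employee numbers 1001 through 1100 mapping to addresses 1 through 100; the names OFFSET, LIST_SIZE and subtraction_hash are illustrative.

```python
OFFSET = 1000    # value subtracted from every key, as in the example
LIST_SIZE = 100  # assumed: one address per employee, numbered 1..100


def subtraction_hash(key: int) -> int:
    """Map a consecutive key range onto addresses 1..LIST_SIZE."""
    address = key - OFFSET
    if not 1 <= address <= LIST_SIZE:
        raise ValueError(f"key {key} is outside the expected range")
    return address
```

Like direct hashing, this produces no synonyms, because distinct keys always differ by the same amount as their addresses.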
10
3. Digit Extraction: Selected digits are
extracted from the key and used as the address.
Example: Using a six-digit employee
number to hash to a three-digit address
(000-999), we could select the first, third, and
fourth digits.
379452 → 394
121267 → 112
378845 → 388
160252 → 102
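
The example can be reproduced with a short sketch. Keys are kept as strings here so that leading zeros survive; the function name digit_extraction and its positions parameter are assumptions made for the illustration.

```python
def digit_extraction(key: str, positions=(1, 3, 4)) -> str:
    """Build the address from selected digits (positions count from 1)."""
    return "".join(key[p - 1] for p in positions)


# The four keys from the example, each mapped to its three-digit address.
addresses = {key: digit_extraction(key)
             for key in ("379452", "121267", "378845", "160252")}
```

Choosing which digits to extract matters in practice: digits that vary little across the keys would send many keys to the same address.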
11
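4. Modulo Division: Divides the key by the array
size and uses the remainder plus 1 as the address.
address = (key modulo listsize) + 1
[Figure: modulo-division hashing into a list with
addresses 001 to 307. The list holds the records
379452 Elmer, 121267 Markh, 378845 Hubert,
160252 Arno and 045128 Rollyn. The hash function
sends key 121267 to address 003, key 045128 to
address 307 and key 379452 to address 001.]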
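
A small sketch of modulo-division hashing for the 307-address list in the figure. It assumes the address is the remainder plus 1, which is what the three mappings shown in the figure imply; the name modulo_hash is illustrative.

```python
LIST_SIZE = 307  # number of addresses in the figure's list (001..307)


def modulo_hash(key: int, list_size: int = LIST_SIZE) -> int:
    """Remainder of key / list_size, shifted so addresses start at 1."""
    return key % list_size + 1


# Keys from the figure. 045128 is written as 45128 because Python
# integer literals may not have a leading zero.
figure_keys = [379452, 121267, 378845, 160252, 45128]
addresses = {key: modulo_hash(key) for key in figure_keys}
# addresses == {379452: 1, 121267: 3, 378845: 8, 160252: 306, 45128: 307}
```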
12
5. Midsquare Hashing: The key is squared and the
address is selected from the middle of the squared
number.
Example: 9452 × 9452 = 89340304, address is 3403
As a variation, we can select a portion of the
key, such as its first three digits, and square
that rather than the whole key.
379452: 379 × 379 = 143641, address is 364
378845: 378 × 378 = 142884, address is 288
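
The midsquare examples can be checked with a short sketch. Which digits count as "the middle" is a design choice, so this version takes the starting position explicitly; the function name and its start and length parameters are assumptions chosen to match the examples.

```python
def midsquare(key: int, start: int, length: int) -> int:
    """Square the key and take `length` digits beginning at index `start`.

    `start` counts from 0 at the leftmost digit of the square. Returning
    an int drops any leading zero in the selected digits.
    """
    squared = str(key * key)
    return int(squared[start:start + length])


whole_key = midsquare(9452, start=2, length=4)  # 89340304 -> 3403
partial_a = midsquare(379, start=2, length=3)   # 143641   -> 364
partial_b = midsquare(378, start=2, length=3)   # 142884   -> 288
```

Squaring only a portion of the key, as in the last two lines, keeps the product small enough to avoid overflow in languages with fixed-size integers.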
13
6. Folding Methods: There are two folding
methods in use.
Fold Shift: the key value is divided into parts
whose size matches the size of the required
address. Then the left and right parts are shifted
and added to the middle part.
Fold Boundary: the left and right numbers are
folded on a fixed boundary between them and the
center number. This results in the two outside
values being reversed.
14
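[Figure: folding the key 123456789 into a
three-digit address. In fold shift the outer parts
123 and 789 are added to the middle part as they
are; in fold boundary their digits are reversed
first. In both cases the leading digit that
overflows the address size is discarded.]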
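
Both folding methods can be sketched as follows, using the key 123456789 and a three-digit address. The helper names are illustrative, and the sketch assumes the key splits into at least two parts.

```python
def split_parts(key: str, size: int):
    """Cut the key into consecutive pieces of `size` digits."""
    return [key[i:i + size] for i in range(0, len(key), size)]


def fold_shift(key: str, size: int = 3) -> int:
    """Add the parts as they are, then discard any overflow digit."""
    total = sum(int(part) for part in split_parts(key, size))
    return total % 10 ** size


def fold_boundary(key: str, size: int = 3) -> int:
    """Reverse the digits of the two outer parts before adding."""
    parts = split_parts(key, size)
    parts[0] = parts[0][::-1]
    parts[-1] = parts[-1][::-1]
    total = sum(int(part) for part in parts)
    return total % 10 ** size


shift_address = fold_shift("123456789")        # 123 + 456 + 789 = 1368 -> 368
boundary_address = fold_boundary("123456789")  # 321 + 456 + 987 = 1764 -> 764
```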
15
Load Factor: The number of elements in the
list divided by the number of physical elements
allocated for the list, expressed as a
percentage.
α = (k / n) × 100
where k is the number of elements in the list and
n is the number of physical elements allocated.
Clustering: The tendency of data to build up
unevenly across a hashed list. It is usually
created by collisions.
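
As a quick worked example of the load factor, assume a list with 307 allocated elements that holds 5 records. The function name load_factor is illustrative.

```python
def load_factor(filled: int, allocated: int) -> float:
    """Elements in the list / elements allocated, as a percentage."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return filled / allocated * 100


sparse = load_factor(5, 307)   # roughly 1.63 percent
fuller = load_factor(75, 100)  # 75.0 percent
```

The fuller the list, the more likely a new key is to collide, so the load factor is a rough indicator of how much clustering to expect.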
16
Collision
17
Collision: The event that occurs when a
hashing algorithm produces an address for an
insertion key and that address is already
occupied.
Home Address: The address produced by the hashing
algorithm.
Prime Area: The memory that contains all of the
home addresses.
Probe: The calculation of an address and the test
for success.
18
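[Figure: collision example. Keys A, B and C are
hashed in turn (1. hash(A), 2. hash(B),
3. hash(C)) into a list with addresses 1, 5, 9
and 17. B collides with A, and C collides with B.]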
19
Collision Resolution
  • The process of finding an alternate location
  • Collision strategy techniques
  • Separate chaining
  • Open addressing
  • Coalesced hashing
  • Perfect hashing
  • Dynamic perfect hashing
  • Probabilistic hashing
  • Robin hood hashing
  • Cache-conscious collision resolution

20
Separate Chaining
  • Sometimes called simply chaining or direct
    chaining. In its simplest form, each slot in the
    array is a linked list, or the head cell of a
    linked list, where the list contains the elements
    that hashed to the same location. Insertion
    requires finding the correct slot, then appending
    to either end of the list in that slot.
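
A minimal sketch of separate chaining follows. It makes two assumptions that are not in the slide: Python lists stand in for the linked lists, and the home address is computed by modulo division plus 1 on a 307-address list. The function names are illustrative.

```python
LIST_SIZE = 307


def home_address(key: int) -> int:
    """Assumed hash function: modulo division, addresses 1..LIST_SIZE."""
    return key % LIST_SIZE + 1


# One chain per address; index 0 is unused so addresses start at 1.
chains = [[] for _ in range(LIST_SIZE + 1)]


def insert(key: int, value: str) -> None:
    """Find the slot, then append to its chain (or update an existing key)."""
    chain = chains[home_address(key)]
    for i, (existing_key, _) in enumerate(chain):
        if existing_key == key:
            chain[i] = (key, value)
            return
    chain.append((key, value))


def search(key: int):
    """Walk only the chain stored at the key's home address."""
    for existing_key, value in chains[home_address(key)]:
        if existing_key == key:
            return value
    return None


# 070918 and 166702 both hash to address 2, so they share one chain.
insert(70918, "Redjie")
insert(166702, "Reymund")
```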

21
Open Addressing
  • Open addressing hash tables store the records
    directly within the array. This approach is also
    called closed hashing. A hash collision is
    resolved by probing, or searching through
    alternate locations in the array (following a
    probe sequence) until either the target record is
    found, or an unused array slot is found, which
    indicates that there is no such key in the table.
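
The search rule described above can be sketched as follows. The sketch assumes a 0-based Python list as the array and takes the probe sequence as a parameter; the names search and linear_probe are illustrative, and deletions are not handled.

```python
EMPTY = None  # marker for an unused array slot


def linear_probe(home: int, attempt: int, size: int) -> int:
    """One possible probe sequence: home, home + 1, home + 2, ..."""
    return (home + attempt) % size


def search(table, key, home, probe=linear_probe):
    """Follow the probe sequence until the key or an unused slot is found."""
    size = len(table)
    for attempt in range(size):
        entry = table[probe(home, attempt, size)]
        if entry is EMPTY:
            return None      # unused slot: the key is not in the table
        if entry[0] == key:
            return entry[1]  # target record found
    return None              # every slot was probed without success


# Records live directly in the array. 9 and 16 both have home 2 (mod 7),
# so 16 was placed one slot further along.
table = [EMPTY] * 7
table[2] = (9, "nine")
table[3] = (16, "sixteen")
```

Searching for 16 probes slots 2 and 3 and succeeds; searching for 23, which also has home 2, probes slots 2, 3 and 4 and stops at the unused slot.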

22
Well Known Probe Sequences
23
Linear Probing: A collision is resolved by adding
one (1) to the current address.
[Figure: linear probing in the list with addresses
001 to 307 that holds 379452 Elmer, 121267 Markh,
378845 Hubert, 160252 Arno and 045128 Rollyn. The
keys 070918 (Redjie) and 166702 (Reymund) are
passed through the hash function and inserted.]
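
The insertions in the figure can be traced with a short sketch. It assumes the home address is computed by modulo division plus 1 on the 307-address list, and that probing wraps from address 307 back to 001; both are assumptions of this example, as are the function names.

```python
LIST_SIZE = 307
table = [None] * (LIST_SIZE + 1)  # index 0 unused; addresses 1..307


def home_address(key: int) -> int:
    """Assumed hash function: modulo division, addresses 1..LIST_SIZE."""
    return key % LIST_SIZE + 1


def insert(key: int, name: str) -> int:
    """Add one to the address until a free slot turns up; return it."""
    address = home_address(key)
    for _ in range(LIST_SIZE):
        if table[address] is None:
            table[address] = (key, name)
            return address
        address = address % LIST_SIZE + 1  # next address, wrapping 307 -> 1
    raise RuntimeError("list is full")


# Existing records first, then the two new keys from the figure.
records = [(379452, "Elmer"), (121267, "Markh"), (378845, "Hubert"),
           (160252, "Arno"), (45128, "Rollyn"),
           (70918, "Redjie"), (166702, "Reymund")]
placed = {name: insert(key, name) for key, name in records}
```

Under these assumptions 070918 takes its home address 002, while 166702 also hashes to 002, finds 002 and 003 occupied, and settles at 004.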
24
Quadratic Probing: The increment is the
collision probe number squared. Each new address
is the collision location plus the increment.
Probe   Collision   Probe² and   New
Num     Location    Increment    Address
1       1           1² = 1       2
2       2           2² = 4       6
3       6           3² = 9       15
4       15          4² = 16      31
5       31          5² = 25      56
6       56          6² = 36      92
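
A sketch of the quadratic probe path, assuming each new address is the previous collision location plus the probe number squared, wrapped into a list whose addresses run from 1 to list_size. The function name and parameters are illustrative.

```python
def quadratic_probe_path(start: int, probes: int, list_size: int):
    """Return the addresses tried after repeated collisions from `start`."""
    address = start
    path = []
    for probe in range(1, probes + 1):
        address += probe * probe                 # increment is probe squared
        address = (address - 1) % list_size + 1  # wrap into 1..list_size
        path.append(address)
    return path


# Six probes starting from a collision at address 1 in a 307-address list.
path = quadratic_probe_path(1, 6, 307)  # [2, 6, 15, 31, 56, 92]
```

Because the step grows with every probe, colliding keys spread out faster than with linear probing, although not every address is guaranteed to be visited.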
25
Key Offset: A double hashing method that
produces a different collision path for different
keys.
Formula:
offset = key / listsize (integer quotient)
address = ((offset + old address) modulo listsize) + 1
For example, if the key is 166702 and the listsize
is 307, modulo division gives the home address 002:
offset = 166702 / 307 = 543
address = ((543 + 002) modulo 307) + 1 = 239
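
The key-offset calculation can be sketched directly from the formula, assuming integer division for the offset. The function name key_offset_address is illustrative.

```python
LIST_SIZE = 307


def key_offset_address(key: int, old_address: int,
                       list_size: int = LIST_SIZE) -> int:
    """Next address to try after a collision at `old_address`."""
    offset = key // list_size  # integer quotient of key / listsize
    return (offset + old_address) % list_size + 1


# Three keys that all collide at address 002 take different paths.
next_for_166702 = key_offset_address(166702, 2)  # offset 543  -> 239
next_for_070918 = key_offset_address(70918, 2)   # offset 231  -> 234
next_for_572556 = key_offset_address(572556, 2)  # offset 1865 -> 26
```

Since the offset depends on the key itself, keys with the same home address are sent to different places, which reduces the clustering seen with linear probing.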
26
[Figure: the list with addresses 001 to 307 after
the colliding keys have been added. It shows the
records 379452 Elmer, 070918 Redjie, 121267 Markh,
378845 Hubert, 160252 Arno and 045128 Rollyn,
together with 166702 Reymund and 572556 Angelus.]
27
Hash collision resolved by linear probing (interval = 1).