Dictionaries, Tables Hashing - PowerPoint PPT Presentation

Slides: 59

Provided by: christin90

Learn more at: http://faculty.washington.edu

Transcript and Presenter's Notes

Title: Dictionaries, Tables Hashing

1
Dictionaries, Tables Hashing

TCSS 342

2
The Dictionary ADT

a dictionary (table) is an abstract model of a database
like a priority queue, a dictionary stores key-element pairs
the main operation supported by a dictionary is searching by key

3
Examples

Telephone directory
Library catalogue
Books in print (key: ISBN)
FAT (File Allocation Table)

4
Main Issues

Size
Operations: search, insert, delete, ...? Create reports? List?
What will be stored in the dictionary?
How will items be identified?

5
The Dictionary ADT

simple container methods
size()
isEmpty()
elements()
query methods
findElement(k)
findAllElements(k)

6
The Dictionary ADT

update methods
insertItem(k, e)
removeElement(k)
removeAllElements(k)
special element
NO_SUCH_KEY, returned by an unsuccessful search

7
Implementing a Dictionary with a Sequence

unordered sequence
searching and removing takes O(n) time
inserting takes O(1) time
applications to log files (frequent insertions, rare searches and removals)
Example sequence: 34 14 12 22 18
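
The unordered-sequence dictionary described above can be sketched in Java as follows. This is a minimal illustration, not the course's implementation: the class name LogFileDictionary is hypothetical, the method names are taken from the Dictionary ADT slides, and null stands in for NO_SUCH_KEY.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical name: a "log file" dictionary backed by an unordered list.
class LogFileDictionary<K, E> {
    // One key-element pair.
    private static final class Item<K, E> {
        final K key;
        final E element;
        Item(K key, E element) { this.key = key; this.element = element; }
    }

    private final List<Item<K, E>> items = new ArrayList<>();

    int size() { return items.size(); }

    boolean isEmpty() { return items.isEmpty(); }

    // O(1) amortized: append at the end, no search needed.
    void insertItem(K k, E e) { items.add(new Item<>(k, e)); }

    // O(n): scan until the key is found; null plays the role of NO_SUCH_KEY.
    E findElement(K k) {
        for (Item<K, E> item : items) {
            if (item.key.equals(k)) return item.element;
        }
        return null;
    }

    // O(n): scan for the key, then remove the first match.
    E removeElement(K k) {
        for (int i = 0; i < items.size(); i++) {
            if (items.get(i).key.equals(k)) return items.remove(i).element;
        }
        return null;
    }
}
```

Insertion never looks at existing items, which is why this layout suits log files; every search or removal pays for that with a full scan.
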
8
Implementing a Dictionary with a Sequence

array-based ordered sequence (assumes keys can be ordered)
- searching takes O(log n) time (binary search)
- inserting and removing takes O(n) time
- application to look-up tables (frequent searches, rare insertions and removals)
Example sequence: 12 14 18 22 34
9
Binary Search

narrow down the search range in stages
high-low game
findElement(22)

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37   (mid = 14)
10
Binary Search

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37   (mid = 25)
2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37   (mid = 19)
2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37   (low = mid = high = 22)
11
Pseudocode for Binary Search

Algorithm BinarySearch(S, k, low, high)
  if low > high then
    return NO_SUCH_KEY
  else
    mid = (low + high) / 2
    if k = key(mid) then
      return key(mid)
    else if k < key(mid) then
      return BinarySearch(S, k, low, mid-1)
    else
      return BinarySearch(S, k, mid+1, high)
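
A runnable Java version of the recursive search is sketched below, using the sixteen keys from the findElement(22) example. The class name BinarySearchDemo is hypothetical, and two details are assumptions that differ from the pseudocode: the method returns the index (rank) of the key rather than the key itself, and -1 stands in for NO_SUCH_KEY.

```java
// Hypothetical demo class for the recursive binary search on a sorted int array.
class BinarySearchDemo {
    static final int NO_SUCH_KEY = -1;

    // Returns the index of k in the sorted array s within [low, high], or NO_SUCH_KEY.
    static int binarySearch(int[] s, int k, int low, int high) {
        if (low > high) return NO_SUCH_KEY;
        int mid = low + (high - low) / 2; // same value as (low + high) / 2, without int overflow
        if (k == s[mid]) return mid;
        if (k < s[mid]) return binarySearch(s, k, low, mid - 1);
        return binarySearch(s, k, mid + 1, high);
    }

    public static void main(String[] args) {
        int[] keys = {2, 4, 5, 7, 8, 9, 12, 14, 17, 19, 22, 25, 27, 28, 33, 37};
        System.out.println(binarySearch(keys, 22, 0, keys.length - 1)); // prints 10
        System.out.println(binarySearch(keys, 3, 0, keys.length - 1));  // prints -1
    }
}
```

For key 22 the midpoints visited are 14, 25, 19 and then 22.
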

12
Running Time of Binary Search

The range of candidate items to be searched is
halved after each comparison
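
The halving argument can be written out as a short calculation. As a sketch, assume the search starts with n candidate items and each comparison keeps at most half of them; then after i comparisons at most n / 2^i candidates remain:

```latex
\[
n \;\to\; \frac{n}{2} \;\to\; \frac{n}{4} \;\to\; \cdots \;\to\; \frac{n}{2^{i}},
\qquad
\frac{n}{2^{i}} \le 1 \iff i \ge \log_2 n .
\]
```

So the range shrinks to a single candidate after roughly log_2 n comparisons.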

13
Running Time of Binary Search

In the array-based implementation, access by rank
takes O(1) time, thus binary search runs in O(log
n) time
Binary Search is applicable only to Random Access
structures (Arrays, Vectors)

14
Implementations

Sorted? Non-sorted?
Elementary: arrays, vectors, linked lists
Organization: none (log file), sorted, hashed
Advanced: balanced trees

15
Skip Lists

Simulate Binary Search on a linked list.
Linked list allows easy insertion and deletion.
http://www.epaperpress.com/s_man.html

16
A FAT Example

Directory: key = file name. Data = (time, date, size) and the location of the first block in the FAT.
If the first block is in physical location 23 (disk block number), look up position 23 in the FAT.
Each FAT entry either shows end of file or has the number of the file's next block on disk.
Example: the directory entry points to block 4.
Position: 0  1  2  3  4  5  6  7  8  9  10
FAT:      x  x  x  F  5  6  10 x  23 25 3
The file occupies blocks 4, 5, 6, 10, 3.
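
Following the chain in the example can be sketched in a few lines of Java. Everything named here is an assumption made for illustration: the class FatChainDemo, the constant END_OF_FILE for the "F" marker, and the constant OTHER for the "x" cells, which are read as entries that do not belong to this file. The table is taken to be indexed from position 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: walk a FAT chain starting from the directory entry's first block.
class FatChainDemo {
    static final int END_OF_FILE = -1; // the "F" marker in the example
    static final int OTHER = -2;       // an "x" cell, not part of this file

    // The example table, positions 0..10.
    static final int[] EXAMPLE_FAT = {OTHER, OTHER, OTHER, END_OF_FILE, 5, 6, 10, OTHER, 23, 25, 3};

    // Collects the blocks of a file by following next-block entries until end of file.
    // Cycles in a corrupted table are not detected in this sketch.
    static List<Integer> blocksOf(int[] fat, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        int block = firstBlock;
        while (block != END_OF_FILE) {
            if (block < 0 || block >= fat.length) {
                throw new IllegalStateException("broken chain at " + block);
            }
            blocks.add(block);
            block = fat[block]; // the entry at this position names the next block
        }
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(blocksOf(EXAMPLE_FAT, 4)); // prints [4, 5, 6, 10, 3]
    }
}
```
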

17
Hashing

Place item with key k in position h(k).
Hope h(k) is 1-1.
Requires unique key (unless multiple items allowed).
Key must be protected from change (use abstract class that provides only a constructor).
Keys must be comparable.

18
Key class

public abstract class KeyID {
    private Comparable searchKey;

    // Only one constructor
    public KeyID(Comparable m) {
        searchKey = m;
    }

    public Comparable getSearchKey() {
        return searchKey;
    }
}

19
Hashing Problem

RTT is a large phone company, and they want to provide enhanced caller ID capability
given a phone number, return the caller's name
phone numbers are in the range 0 to R = 10^10 - 1
n is the number of phone numbers used
want to do this as efficiently as possible

20
Hashing Problem

We know two ways to design this dictionary:
a balanced search tree (AVL, red-black) or a skip-list with the phone number as the key has O(log n) query time and O(n) space. That is good space usage and search time, but can we reduce the search time to constant?
a bucket array indexed by the phone number has optimal O(1) query time, but there is a huge amount of wasted space: O(n + R)

21
Bucket Array

Each cell is thought of as a bucket or a container
Holds key-element pairs
In array A of size N, an element e with key k is inserted in A[k].

Index:    000-000-0000  000-000-0001  ...  401-863-7639  ...  999-999-9999
Element:  (null)        (null)        ...  Roberto       ...  (null)
22
Generalized indexing

Hash table
Data storage associated with a key
The key need not be an integer

23
Hash Tables

A data structure
The location of an item is determined directly as a function of the item itself
Not by a sequence of trial and error comparisons
Commonly used to provide faster searching
O(n) for linear search
O(log n) for binary search
O(1) for hash table

24
Example

A symbol table constructed by a compiler
Stores identifiers and information about them

25
Another Solution

A Hash Table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table
Like an array, but with a function to map the large range of keys into a smaller one
e.g., take the original key, mod the size of the table, and use that as an index

26
Example

Insert item (401-863-7639, Roberto) into a table of size 5
4018637639 mod 5 = 4, so item (401-863-7639, Roberto) is stored in slot 4 of the table
A lookup uses the same process: map the key to an index, then check the array cell at that index

Slot:  0  1  2  3  4
Item:  -  -  -  -  (401-863-7639, Roberto)
27
Collision

Insert (401-863-9350, Andy)
And insert (401-863-2234, Devin). We have a
collision!
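
The arithmetic behind the collision can be checked with a small Java sketch. The class and method names are hypothetical; the only assumptions are that the phone number is read as a ten-digit non-negative integer (which needs a long, since it exceeds the int range) and that the table has size 5 as in the earlier example.

```java
// Hypothetical demo: map a phone number to a slot with "key mod table size".
class ModHashDemo {
    // Assumes phoneNumber is non-negative, so % never yields a negative slot.
    static int slot(long phoneNumber, int tableSize) {
        return (int) (phoneNumber % tableSize);
    }

    public static void main(String[] args) {
        System.out.println(slot(4018637639L, 5)); // Roberto: prints 4
        System.out.println(slot(4018639350L, 5)); // Andy: prints 0
        System.out.println(slot(4018632234L, 5)); // Devin: prints 4, same slot as Roberto
    }
}
```

Andy lands in the empty slot 0, while Devin maps to slot 4, which Roberto already occupies.
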

28
Collision Resolution

How to deal with two keys which map to the same
cell of the array?
Use chaining
Set up lists of items with the same index

29
Chaining

[Figure: table with slots 0 1 2 3 4, each slot heading a list of the items that map to it]
30
Chaining

The expected search/insertion/removal time is O(n/N), provided the indices are uniformly distributed
The performance of the data structure can be fine-tuned by changing the table size N
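
A chained table along these lines might look like the sketch below. It is an illustration under stated assumptions, not the deck's implementation: the class name ChainedHashTable and the helper chainLength are hypothetical, keys are phone numbers stored as long, the compression is plain "key mod N", null stands in for NO_SUCH_KEY, and inserting an existing key replaces its element (the ADT in the slides also allows several items per key).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of a hash table with chaining: one linked list per slot.
class ChainedHashTable {
    private static final class Entry {
        final long key;
        String element;
        Entry(long key, String element) { this.key = key; this.element = element; }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();
    private int size = 0;

    ChainedHashTable(int tableSize) {
        for (int i = 0; i < tableSize; i++) buckets.add(new LinkedList<>());
    }

    // Chain for the slot "key mod N"; floorMod keeps the index non-negative.
    private LinkedList<Entry> bucketFor(long key) {
        return buckets.get((int) Math.floorMod(key, (long) buckets.size()));
    }

    // Inserts a new item, or replaces the element if the key is already present.
    void insertItem(long key, String element) {
        LinkedList<Entry> chain = bucketFor(key);
        for (Entry e : chain) {
            if (e.key == key) { e.element = element; return; }
        }
        chain.add(new Entry(key, element));
        size++;
    }

    // Walks only the chain of the key's slot, so the cost depends on chain length.
    String findElement(long key) {
        for (Entry e : bucketFor(key)) {
            if (e.key == key) return e.element;
        }
        return null;
    }

    // Removes and returns the element stored under key, or null if absent.
    String removeElement(long key) {
        Iterator<Entry> it = bucketFor(key).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.key == key) { it.remove(); size--; return e.element; }
        }
        return null;
    }

    int size() { return size; }

    // Number of items currently chained at the given slot.
    int chainLength(int index) { return buckets.get(index).size(); }
}
```

With a table of size 5, inserting Roberto (4018637639), Andy (4018639350) and Devin (4018632234) leaves a chain of length 2 at slot 4 and a chain of length 1 at slot 0. A larger N shortens the chains on average, which is the tuning knob the slide refers to.
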

31
Hash Function

Function h defined by h(i) = i
Determines the location of an item i in the hash table
Called a hash function.
To reduce the large size of a hash table, use h(i) = i mod 25

32
From Keys to Indices

The mapping of keys to indices of a hash table is called a hash function
A hash function is usually the composition of two maps:
hash code map: key → integer
compression map: integer → [0, N - 1]
An essential requirement of the hash function is to map equal keys to equal indices
A good hash function minimizes the probability of collisions

33
Java Hash

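Java provides a hashCode() method for the Object class, which typically returns a 32-bit integer derived from the object's identity (historically its memory address).
This default hash code would work poorly for Integer and String objects, because two distinct objects holding equal values would usually get different hash codes.
The hashCode() method should be suitably redefined by classes.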

34
Popular Hash-Code Maps

Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int
Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components.

35
Popular Hash-Code Maps

Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) a_0 a_1 ... a_{n-1} by viewing them as the coefficients of a polynomial
a_0 + a_1 x + ... + x^{n-1} a_{n-1}

36
Popular Hash-Code Maps

The polynomial is computed with Horner's rule, ignoring overflows, at a fixed value x:
a_0 + x (a_1 + x (a_2 + ... + x (a_{n-2} + x a_{n-1}) ... ))
The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words
Why is the component-sum hash code bad for strings?
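
The two string hash codes can be compared with a short Java sketch. The class and method names are hypothetical. One commonly cited answer to the question above is that a plain sum ignores character order, so anagrams such as "stop" and "pots" always collide, while the polynomial code weights each position by a different power of x.

```java
// Hypothetical demo comparing polynomial accumulation with a plain component sum.
class StringHashDemo {
    // Horner's rule: a_0 + x (a_1 + x (a_2 + ...)); int overflow is ignored (wraps around).
    static int polynomialHash(String s, int x) {
        int h = 0;
        // Start from the last character so that a_0 ends up with the lowest power of x.
        for (int i = s.length() - 1; i >= 0; i--) {
            h = x * h + s.charAt(i);
        }
        return h;
    }

    // Component sum: adds the character values, so the order of characters is lost.
    static int componentSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(componentSum("stop") == componentSum("pots"));             // prints true
        System.out.println(polynomialHash("stop", 33) == polynomialHash("pots", 33)); // prints false
    }
}
```
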

37
Random Hashing

Random hashing
Uses a simple random number generation technique
Scatters the items randomly throughout the hash
table

38
Popular Compression Maps

Division: h(k) = k mod N
the choice N = 2^k is bad because not all the bits are taken into account
the table size N is usually chosen as a prime number
certain patterns in the hash codes are propagated
Multiply, Add, and Divide (MAD)
h(k) = (ak + b) mod N
eliminates patterns provided a mod N ≠ 0
same formula used in linear congruential (pseudo)random number generators
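
Both compression maps are short enough to sketch in Java. The class name CompressionMaps and the parameter names are hypothetical. Two choices here are assumptions rather than part of the slide: Math.floorMod is used so that negative hash codes still land in [0, N - 1], and the MAD product is computed in long arithmetic so that it does not overflow for small a and b.

```java
// Hypothetical sketch of the division and MAD compression maps.
class CompressionMaps {
    // Division: h(k) = k mod N, mapped into [0, N-1] even for negative hash codes.
    static int division(int hashCode, int n) {
        return Math.floorMod(hashCode, n);
    }

    // MAD: h(k) = (a*k + b) mod N, with the slide's requirement that a mod N is nonzero.
    static int mad(int hashCode, long a, long b, int n) {
        if (Math.floorMod(a, (long) n) == 0) {
            throw new IllegalArgumentException("a mod N must be nonzero");
        }
        return (int) Math.floorMod(a * hashCode + b, (long) n);
    }

    public static void main(String[] args) {
        System.out.println(division(12, 5));  // prints 2
        System.out.println(division(-7, 5));  // prints 3
        System.out.println(mad(10, 3, 4, 7)); // (3*10 + 4) mod 7, prints 6
    }
}
```
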

39
More on Collisions

A key is mapped to an already occupied table
location
what to do?!?
Use a collision handling technique
We've seen Chaining
Can also use Open Addressing:
Double Hashing
Linear Probing

40
Linear Probing

If the current location is used, try the next table location
linear_probing_insert(K)
  if (table is full) error
  probe = h(K)
  while (table[probe] occupied)
    probe = (probe + 1) mod M
  table[probe] = K

41
Linear Probing

Lookups walk along table until the key or an empty slot is found
Uses less memory than chaining
don't have to store all those links
Slower than chaining
may have to walk along table for a long way
Deletion is more complex
either mark the deleted slot
or fill in the slot by shifting some elements down

42
Linear Probing Example

h(k) = k mod 13
Insert keys:
18 41 22 44 59 32 31 73

Index: 0  1  2  3  4  5  6  7  8  9  10 11 12
Key:   -  -  41 -  -  18 44 59 32 22 31 73 -
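
The example can be replayed with a small Java sketch of linear probing. The class name LinearProbingDemo and its helpers are hypothetical, keys are assumed to be non-negative ints, and -1 marks an empty slot. Deletion is left out, since it needs the marking or shifting described earlier.

```java
import java.util.Arrays;

// Hypothetical sketch of linear probing with h(k) = k mod M.
class LinearProbingDemo {
    static final int EMPTY = -1; // assumes keys are non-negative

    static int[] newTable(int m) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        return table;
    }

    // Inserts key at the first free slot at or after h(key); returns that slot.
    static int insert(int[] table, int key) {
        int m = table.length;
        int probe = key % m;
        for (int tries = 0; tries < m; tries++) {
            if (table[probe] == EMPTY) {
                table[probe] = key;
                return probe;
            }
            probe = (probe + 1) % m; // try the next table location
        }
        throw new IllegalStateException("table is full");
    }

    // Walks along the table until the key or an empty slot is found; -1 if absent.
    static int find(int[] table, int key) {
        int m = table.length;
        int probe = key % m;
        for (int tries = 0; tries < m; tries++) {
            if (table[probe] == EMPTY) return -1;
            if (table[probe] == key) return probe;
            probe = (probe + 1) % m;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] table = newTable(13);
        for (int key : new int[] {18, 41, 22, 44, 59, 32, 31, 73}) insert(table, key);
        System.out.println(Arrays.toString(table));
        // prints [-1, -1, 41, -1, -1, 18, 44, 59, 32, 22, 31, 73, -1]
    }
}
```

Key 31 hashes to slot 5 but ends up in slot 10 after five occupied slots, which illustrates the long walks mentioned on the previous slide.
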
43
Double Hashing

Use two hash functions
If M is prime, eventually will examine every position in the table
double_hash_insert(K)
  if (table is full) error
  probe = h1(K)
  offset = h2(K)
  while (table[probe] occupied)
    probe = (probe + offset) mod M
  table[probe] = K

44
Double Hashing

Many of same (dis)advantages as linear probing
Distributes keys more uniformly than linear
probing does

45
Double Hashing Example

h1(K) = K mod 13
h2(K) = 8 - K mod 8
we want h2 to be an offset to add
Insert keys:
18 41 22 44 59 32 31 73

Index: 0  1  2  3  4  5  6  7  8  9  10 11 12
Key:   44 -  41 73 -  18 32 59 31 22 -  -  -
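
The same keys can be replayed with double hashing in a short Java sketch. The class name DoubleHashingDemo and its helpers are hypothetical; h1 and h2 are the two functions given for this example, keys are assumed to be non-negative ints, and -1 marks an empty slot.

```java
import java.util.Arrays;

// Hypothetical sketch of double hashing with h1(K) = K mod M and h2(K) = 8 - K mod 8.
class DoubleHashingDemo {
    static final int EMPTY = -1; // assumes keys are non-negative

    static int[] newTable(int m) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        return table;
    }

    static int h1(int key, int m) { return key % m; }

    // Offset in the range 1..8, so it is never zero.
    static int h2(int key) { return 8 - key % 8; }

    // Inserts key, stepping by h2(key) after each occupied slot; returns the slot used.
    static int insert(int[] table, int key) {
        int m = table.length;
        int probe = h1(key, m);
        int offset = h2(key);
        for (int tries = 0; tries < m; tries++) {
            if (table[probe] == EMPTY) {
                table[probe] = key;
                return probe;
            }
            probe = (probe + offset) % m;
        }
        throw new IllegalStateException("no free slot found");
    }

    public static void main(String[] args) {
        int[] table = newTable(13);
        for (int key : new int[] {18, 41, 22, 44, 59, 32, 31, 73}) insert(table, key);
        System.out.println(Arrays.toString(table));
        // prints [44, -1, 41, 73, -1, 18, 32, 59, 31, 22, -1, -1, -1]
    }
}
```

Because 13 is prime and every offset lies between 1 and 8, each probe sequence visits all 13 slots before repeating. Key 44, for instance, collides at slot 5, steps by 4 to the occupied slot 9, and then wraps around to slot 0.
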
46
Hash code

static int hashCode(long i) {
    // component sum: add the high and low 32-bit halves
    return (int) ((i >> 32) + (int) i);
}

47
Hash code

static int hashCode(String s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = (h << 5) | (h >>> 27); // 5-bit cyclic shift of the running sum
        h += (int) s.charAt(i);    // add in next character
    }
    return h;
}

48
Linear Probing Hash Table