Title: 151

1
Dictionaries, Tables, and Hashing

TCSS 342

2
The Dictionary ADT

a dictionary (table) is an abstract model of a
database
like a priority queue, a dictionary stores
key-element pairs
the main operation supported by a dictionary is
searching by key

3
Examples

Telephone directory
Library catalogue
Books in print (key: ISBN)
FAT (File Allocation Table)

4
Main Issues

Size
Operations: search, insert, delete, and what else?
Create reports? List?
What will be stored in the dictionary?
How will items be identified?

5
The Dictionary ADT

simple container methods
size()
isEmpty()
elements()
query methods
findElement(k)
findAllElements(k)

6
The Dictionary ADT

update methods
insertItem(k, e)
removeElement(k)
removeAllElements(k)
special element
NO_SUCH_KEY, returned by an unsuccessful search

7
Implementing a Dictionary with a Sequence

unordered sequence
searching and removing takes O(n) time
inserting takes O(1) time
applications to log files (frequent insertions,
rare searches and removals)

[Figure: unordered sequence 34, 14, 12, 22, 18]

8
Implementing a Dictionary with a Sequence

array-based ordered sequence (assumes keys can
be ordered)
- searching takes O(log n) time (binary search)
- inserting and removing take O(n) time
- application to look-up tables (frequent searches,
  rare insertions and removals)

[Figure: ordered sequence 12, 14, 18, 22, 34]

9
Binary Search

narrow down the search range in stages
high-low game
findElement(22)

[Figure: sorted array 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37 with low, mid, and high markers; the first midpoint is 14]

10
Binary Search

[Figure: the search for 22 continues on the sorted array 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37; the successive midpoints are 25, 19, and 22, at which point low = mid = high]

11
Pseudocode for Binary Search

Algorithm BinarySearch(S, k, low, high)
  if low > high then
    return NO_SUCH_KEY
  else
    mid = (low + high) / 2
    if k = key(mid) then
      return key(mid)
    else if k < key(mid) then
      return BinarySearch(S, k, low, mid-1)
    else
      return BinarySearch(S, k, mid+1, high)
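
The recursive search on this slide can be sketched in Java as follows. This is a minimal sketch under assumptions: the sequence S is a sorted int array, the method returns the rank (index) of the key rather than the key itself, and NO_SUCH_KEY is represented by -1, a value the slides do not specify. The class and method names are illustrative only.

```java
final class BinarySearchSketch {
    // Assumed sentinel; the slides name NO_SUCH_KEY but give no value.
    static final int NO_SUCH_KEY = -1;

    // Recursive binary search over a sorted array; returns the index of k.
    static int binarySearch(int[] s, int k, int low, int high) {
        if (low > high) {
            return NO_SUCH_KEY;               // empty range: k is not present
        }
        // Same midpoint as (low + high) / 2, written to avoid int overflow.
        int mid = low + (high - low) / 2;
        if (k == s[mid]) {
            return mid;
        } else if (k < s[mid]) {
            return binarySearch(s, k, low, mid - 1);   // search the left half
        } else {
            return binarySearch(s, k, mid + 1, high);  // search the right half
        }
    }

    public static void main(String[] args) {
        int[] s = {2, 4, 5, 7, 8, 9, 12, 14, 17, 19, 22, 25, 27, 28, 33, 37};
        System.out.println(binarySearch(s, 22, 0, s.length - 1)); // prints 10
    }
}
```

On the 16-element array from the findElement(22) example, the call inspects the values 14, 25, 19, and 22 in turn, so the range is halved at every step.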

12
Running Time of Binary Search

The range of candidate items to be searched is
halved after each comparison

13
Running Time of Binary Search

In the array-based implementation, access by rank
takes O(1) time, thus binary search runs in O(log
n) time
Binary Search is applicable only to Random Access
structures (Arrays, Vectors)

14
Implementations

Sorted? Non Sorted?
Elementary: arrays, vectors, linked lists
Organization: none (log file), sorted, hashed
Advanced: balanced trees

15
Skip Lists

Simulate Binary Search on a linked list.
Linked list allows easy insertion and deletion.
http://www.epaperpress.com/s_man.html

16
Hashing

Place item with key k in position h(k).
Hope h(k) is 1-1.
Requires unique key (unless multiple items
allowed). Key must be protected from change (use
abstract class that provides only a constructor).
Keys must be comparable.

17
Key class

public abstract class KeyID {
    private Comparable searchKey;

    // Only one constructor
    public KeyID(Comparable m) {
        searchKey = m;
    }

    public Comparable getSearchKey() {
        return searchKey;
    }
}

18
Hash Tables

RTT is a large phone company, and they want to
provide enhanced caller ID capability
given a phone number, return the caller's name
phone numbers are in the range 0 to R = 10^10 - 1
n is the number of phone numbers used
want to do this as efficiently as possible

19
Alternatives

There are a few ways to design this dictionary:
A balanced search tree (AVL, red-black, 2-4 trees,
B-trees) or a skip list with the phone number as
the key has O(log n) query time and O(n) space.
This gives good space usage and search time, but
can we reduce the search time to constant?
A bucket array indexed by the phone number has
optimal O(1) query time, but there is a huge
amount of wasted space: O(n + R)

20
Bucket Array

Each cell is thought of as a bucket or a
container
Holds key-element pairs
In array A of size N, an element e with key k is
inserted in A[k].
Table operations without searches!

[Figure: bucket array indexed 000-000-0000, 000-000-0001, ..., 401-863-7639, ..., 999-999-9999; the cell for 401-863-7639 holds Roberto, the other cells shown hold (null)]
Note: we need 10,000,000,000 buckets!

21
Generalized indexing

Hash table
Data storage location associated with a key
The key need not be an integer, but keys must be
comparable.

22
Hash Tables

A data structure
The location of an item is determined
Directly as a function of the item itself
Not by a sequence of trial and error comparisons
Commonly used to provide faster searching.
Comparison of searching times:
O(n) for linear search
O(log n) for binary search
O(1) for hash table

23
Examples

A symbol table constructed by a compiler.
Stores identifiers and information about them in
an array.
File systems
I-node location of a file in a file system.
Personal records
Personal information retrieval based on key

24
Hashing Engine

[Figure: hashing engine; an item's key is passed to the position calculator]

25
Example

Insert item (401-863-7639, Roberto) into a table
of size 5:
calculate 4018637639 mod 5 = 4, then insert item
(401-863-7639, Roberto) in position 4 of the
table (array, vector).
A lookup uses the same process: use the hash
engine to map the key to a position, then check
the array cell at that position.

[Figure: table with positions 0 to 4; position 4 holds (401-863-7639, Roberto)]
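
The insert and lookup in this example can be sketched in Java as below. It is only an illustration: the phone number is assumed to be stored as a single non-negative long, and the class and method names are invented for the sketch.

```java
final class PhoneHashSketch {
    // Hash engine from the example: position = key mod table size.
    // Assumes a non-negative key, so the remainder is a valid index.
    static int position(long phoneNumber, int tableSize) {
        return (int) (phoneNumber % tableSize);
    }

    public static void main(String[] args) {
        String[] table = new String[5];
        long key = 4018637639L;                 // 401-863-7639 as one number

        // Insert: compute the position, then store the element there.
        table[position(key, table.length)] = "Roberto";

        // Lookup: repeat the same calculation and inspect that one cell.
        int p = position(key, table.length);
        System.out.println(p + " -> " + table[p]);  // prints 4 -> Roberto
    }
}
```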
26
Chaining

The expected search/insertion/removal time is
O(n/N), provided the indices are uniformly
distributed
The performance of the data structure can be
fine-tuned by changing the table size N

27
From Keys to Indices

The mapping of keys to indices of a hash table is
called a hash function
A hash function is usually the composition of two
maps
hash code map: key → integer
compression map: integer → [0, N - 1]
An essential requirement of the hash function is
to map equal keys to equal indices.
A good hash function is fast and minimizes the
probability of collisions
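
The two-stage structure described on this slide might look like this in Java. This is a sketch under assumptions: the hash code map simply delegates to the key's own hashCode(), and the compression map is plain division with Math.floorMod so that negative hash codes still land in [0, N - 1]. The method names are illustrative.

```java
final class HashFunctionSketch {
    // Hash code map: key -> integer (here delegated to the key's hashCode()).
    static int hashCodeMap(Object key) {
        return key.hashCode();
    }

    // Compression map: integer -> [0, n - 1].
    // Math.floorMod never returns a negative result for a positive n.
    static int compressionMap(int code, int n) {
        return Math.floorMod(code, n);
    }

    // The hash function is the composition of the two maps.
    static int hash(Object key, int n) {
        return compressionMap(hashCodeMap(key), n);
    }
}
```

Because String.hashCode() depends only on the characters, two equal strings get the same index, which is the essential requirement stated above.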

28
Perfect hash functions

A perfect hash function maps each key to a unique
position.
A perfect hash function can be constructed if we
know in advance all the keys to be stored in the
table (almost never)

29
A good hash function

Be easy and fast to compute
Distribute items evenly throughout the hash table
Efficient collision resolution.

30
Popular Hash-Code Maps

Integer cast: for numeric types with 32 bits or
fewer, we can reinterpret the bits of the number
as an int
Component sum: for numeric types with more than
32 bits (e.g., long and double), we can add the
32-bit components.
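
Both hash-code maps can be sketched in Java with standard library calls (Float.floatToIntBits and Double.doubleToLongBits). The class and method names are made up for illustration, and the sum is allowed to overflow silently.

```java
final class HashCodeMapsSketch {
    // Integer cast: reinterpret the 32 bits of a float as an int.
    static int integerCast(float x) {
        return Float.floatToIntBits(x);
    }

    // Component sum: add the high and low 32-bit halves of a long.
    // int addition wraps around on overflow, which is acceptable for a hash code.
    static int componentSum(long x) {
        return (int) (x >>> 32) + (int) x;
    }

    // A double is first reinterpreted as 64 bits, then summed the same way.
    static int componentSum(double x) {
        return componentSum(Double.doubleToLongBits(x));
    }
}
```

Java's own Long.hashCode() combines the two halves with XOR rather than addition; the sum is used here only because it is the variant named on the slide.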

31
Sample of hash functions

Digit selection:
h(2536924520) = 590
(select the 2nd, 5th, and last digits).
This is usually not a good hash function. It will
not distribute keys evenly.
A hash function should use every part of the key.

32
Sample (continued)

Folding: add all digits
Modulo arithmetic:
h(key) = h(x) = x mod table_size.
Modulo arithmetic is a very popular basis for
hash functions. To improve the chance of even
distribution, table_size should be a prime number.
If n > 1 is the number of items, there is always a
prime p with n < p < 2n.

33
Popular Hash-Code Maps

Polynomial accumulation: for strings of a natural
language, combine the character values (ASCII or
Unicode) a_0 a_1 ... a_{n-1} by viewing them as the
coefficients of a polynomial
a_0 + a_1 x + ... + a_{n-1} x^{n-1}
For instance, choosing x = 33, 37, 39, or 41
gives at most 6 collisions on a vocabulary of
50,000 English words.
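
Polynomial accumulation can be sketched in Java with Horner's rule, as below. The assumptions are that a_i is the character at index i of the string and that int overflow is simply allowed to wrap around; the class and method names are illustrative.

```java
final class PolynomialHashSketch {
    // Evaluates a_0 + a_1 x + ... + a_{n-1} x^{n-1} with Horner's rule,
    // where a_i is the character value at index i of the string.
    static int polynomialHash(String s, int x) {
        int h = 0;
        // Start from the highest coefficient a_{n-1} and work down to a_0.
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * x + s.charAt(i);   // overflow wraps around silently
        }
        return h;
    }
}
```

With x = 33, "ab" hashes to 97 + 98 * 33 = 3331 and "ba" to 98 + 97 * 33 = 3299, so the order of the characters affects the result.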

34
Popular Hash-Code Maps

Why is the component-sum hash code bad for
strings?
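
One commonly given answer is that a sum ignores the order of the characters, so anagrams collide. The Java sketch below illustrates that point; the class and method names are invented for the example.

```java
final class StringComponentSumSketch {
    // Component-sum hash code for a string: add up the character values.
    static int componentSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // All three words contain the same letters, so the sums are equal.
        System.out.println(componentSum("stop"));  // prints 454
        System.out.println(componentSum("pots"));  // prints 454
        System.out.println(componentSum("tops"));  // prints 454
    }
}
```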

35
Popular Compression Maps

Division: h(k) = k mod N
the choice N = 2^k is bad because not all the bits
are taken into account
the table size N is usually chosen as a
prime number
certain patterns in the hash codes are propagated
Multiply, Add, and Divide (MAD):
h(k) = (ak + b) mod N
eliminates patterns provided a mod N ≠ 0
same formula used in linear congruential
(pseudo)random number generators
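
The two compression maps can be sketched in Java as follows. The constants a and b are left as parameters because the slides do not fix them, long arithmetic is an added precaution against overflow in ak + b, and the names are illustrative.

```java
final class MadCompressionSketch {
    // Division: h(k) = k mod n, kept non-negative with Math.floorMod.
    static int division(int k, int n) {
        return Math.floorMod(k, n);
    }

    // Multiply, Add, and Divide: h(k) = (a*k + b) mod n.
    // The product is computed as a long so that a*k + b cannot overflow.
    static int mad(int k, int a, int b, int n) {
        if (Math.floorMod(a, n) == 0) {
            // With a mod n == 0 every key would map to b mod n.
            throw new IllegalArgumentException("a mod n must be nonzero");
        }
        return (int) Math.floorMod((long) a * k + b, (long) n);
    }
}
```

For example, with a = 3, b = 5, and n = 13, the key 10 maps to (30 + 5) mod 13 = 9.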

36
Java Hash

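Java provides a hashCode() method for the Object
class, which typically returns a 32-bit value
derived from the object's identity (often described
as its memory address).
This default hash code would work poorly for
Integer and String objects, since two distinct
objects holding equal values would usually get
different hash codes.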
The hashCode() method should be suitably
redefined by classes.

37
Collision

A collision occurs when two distinct items are
mapped to the same position.
Insert (401-863-9350, Andy): 4018639350 → 0.
And insert (401-863-2234, Devin): 4018632234 → 4.
We have a collision!

[Figure: table with positions 0 to 4; position 0 holds (401-863-9350, Andy), position 4 holds (401-863-7639, Roberto)]

38
Collision Resolution

How to deal with two keys which map to the same
cell of the array?
Need policies, design good Hashing engines that
will minimize collisions.

39
Chaining I

Use chaining
Each position is viewed as a container of a list
of items, not a single item. All items in this
list share the same hash value.
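
A chained table along these lines can be sketched in Java as below. Several details are assumptions made for the sketch: keys are phone numbers stored as long, the hash function is key mod N, null stands in for NO_SUCH_KEY, and only insertItem and findElement from the Dictionary ADT are shown.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ChainedTableSketch {
    // One key-element pair stored in a chain.
    private static final class Item {
        final long key;
        final String element;

        Item(long key, String element) {
            this.key = key;
            this.element = element;
        }
    }

    // Each table position holds a list of items that share a hash value.
    private final List<LinkedList<Item>> buckets = new ArrayList<>();

    ChainedTableSketch(int n) {
        for (int i = 0; i < n; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Assumed hash function: key mod N, kept non-negative.
    private LinkedList<Item> chainFor(long key) {
        return buckets.get((int) Math.floorMod(key, (long) buckets.size()));
    }

    void insertItem(long key, String element) {
        chainFor(key).add(new Item(key, element));
    }

    // Returns the first element stored under key, or null if the key is absent.
    String findElement(long key) {
        for (Item item : chainFor(key)) {
            if (item.key == key) {
                return item.element;
            }
        }
        return null;
    }
}
```

With a table of size 5, the items for 401-863-7639 and 401-863-2234 both land in the chain at position 4, and each lookup walks only that chain.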

40
Chaining II

[Figure: chaining over table positions 0 to 4]

41
Collisions resolution policies

A key is mapped to an already occupied table
location
what to do?!?
Use a collision handling technique
Chaining (may have fewer buckets than items)
Open Addressing (load factor at most 1)
Linear Probing
Quadratic Probing
Double Hashing

42
Linear Probing

If the current location is used, try the next
table location
linear_probing_insert(K)
  if (table is full) error
  probe = h(K)
  while (table[probe] occupied)
    probe = (probe + 1) mod M
  table[probe] = K

43
Linear Probing

Lookups walk along table until the key or an
empty slot is found
Uses less memory than chaining
don't have to store all those links
Slower than chaining
may have to walk along table for a long way
Deletion is more complex
either mark the deleted slot
or fill in the slot by shifting some elements down

44
Linear Probing Example

h(k) = k mod 13
Insert keys:
18 41 22 44 59 32 31 73

Resulting table:
index:  0  1  2  3  4  5  6  7  8  9 10 11 12
key:    -  - 41  -  - 18 44 59 32 22 31 73  -
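
The linear probing insert and this example can be reproduced with a short Java sketch. It assumes non-negative int keys, uses -1 as a marker for an empty slot (a choice made for the sketch, not taken from the slides), and omits lookup and deletion.

```java
import java.util.Arrays;

final class LinearProbingSketch {
    // Assumed marker for a free slot; keys are taken to be non-negative.
    static final int EMPTY = -1;

    final int[] table;
    int size = 0;

    LinearProbingSketch(int m) {
        table = new int[m];
        Arrays.fill(table, EMPTY);
    }

    void insert(int key) {
        if (size == table.length) {
            throw new IllegalStateException("table is full");
        }
        int probe = key % table.length;            // h(K) = K mod M
        while (table[probe] != EMPTY) {
            probe = (probe + 1) % table.length;    // try the next location, wrapping around
        }
        table[probe] = key;
        size++;
    }

    public static void main(String[] args) {
        LinearProbingSketch t = new LinearProbingSketch(13);
        for (int k : new int[] {18, 41, 22, 44, 59, 32, 31, 73}) {
            t.insert(k);
        }
        // prints [-1, -1, 41, -1, -1, 18, 44, 59, 32, 22, 31, 73, -1]
        System.out.println(Arrays.toString(t.table));
    }
}
```

Keys 44, 32, 31, and 73 all collide with an occupied slot and walk forward to positions 6, 8, 10, and 11 respectively.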
45
Double Hashing

Use two hash functions
If M is prime, eventually will examine every
position in the table
double_hash_insert(K)
  if (table is full) error
  probe = h1(K)
  offset = h2(K)
  while (table[probe] occupied)
    probe = (probe + offset) mod M
  table[probe] = K

46
Double Hashing

Many of same (dis)advantages as linear probing
Distributes keys more uniformly than linear
probing does

47
Double Hashing Example

h1(K) = K mod 13
h2(K) = 8 - K mod 8
we want h2 to be an offset to add
Insert keys: 18 41 22 44 59 32 31 73
h1(44) = 5 (occupied), h2(44) = 8 - 4 = 4: probe
(5 + 4) mod 13 = 9 (occupied), then (9 + 4) mod 13 = 0

Resulting table:
index:  0  1  2  3  4  5  6  7  8  9 10 11 12
key:   44  - 41 73  - 18 32 59 31 22  -  -  -
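
The double hashing insert with these two functions can be sketched in Java as below. The sketch assumes non-negative int keys, hard-codes h2(K) = 8 - K mod 8 from this example, and uses -1 as an invented marker for an empty slot.

```java
import java.util.Arrays;

final class DoubleHashingSketch {
    // Assumed marker for a free slot; keys are taken to be non-negative.
    static final int EMPTY = -1;

    final int[] table;
    int size = 0;

    DoubleHashingSketch(int m) {
        table = new int[m];
        Arrays.fill(table, EMPTY);
    }

    void insert(int key) {
        if (size == table.length) {
            throw new IllegalStateException("table is full");
        }
        int probe = key % table.length;   // h1(K) = K mod M
        int offset = 8 - key % 8;         // h2(K) = 8 - K mod 8, always between 1 and 8
        while (table[probe] != EMPTY) {
            probe = (probe + offset) % table.length;  // jump by the key's own offset
        }
        table[probe] = key;
        size++;
    }

    public static void main(String[] args) {
        DoubleHashingSketch t = new DoubleHashingSketch(13);
        for (int k : new int[] {18, 41, 22, 44, 59, 32, 31, 73}) {
            t.insert(k);
        }
        // prints [44, -1, 41, 73, -1, 18, 32, 59, 31, 22, -1, -1, -1]
        System.out.println(Arrays.toString(t.table));
    }
}
```

Because the offset is never 0 and the table size 13 is prime, repeated jumps eventually visit every position, so the loop ends whenever a free slot exists.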
48
Why so many Hash functions?

It's different strokes for different folks.
We seldom know the nature of the object that will
be stored in our dictionary.

49
A FAT Example

Directory: Key: file name. Data: (time, date,
size), location of first block in the FAT table.
If the first block is in physical location 23 (disk
block number), look up position 23 in the FAT.
The entry either shows end of file or has the number
of the next block on disk.
Example: Directory entry: block 4
FAT position:  0  1  2  3  4  5  6  7  8  9 10
FAT entry:     x  x  x  F  5  6 10  x 23 25  3
The file occupies blocks 4, 5, 6, 10, 3.
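
Following a FAT chain can be sketched in Java as below. The sketch assumes the FAT entries listed in the example start at position 0, encodes the end-of-file entry F as -1 and the x entries as -2 (both encodings are invented here), and uses illustrative names throughout.

```java
import java.util.ArrayList;
import java.util.List;

final class FatChainSketch {
    static final int END_OF_FILE = -1; // stands in for the "F" entry
    static final int OTHER = -2;       // stands in for the "x" entries

    // Walks the chain from the first block until the end-of-file entry.
    static List<Integer> blocksOf(int firstBlock, int[] fat) {
        List<Integer> blocks = new ArrayList<>();
        int block = firstBlock;
        while (block != END_OF_FILE) {
            if (block < 0 || block >= fat.length) {
                throw new IllegalStateException("broken chain at " + block);
            }
            blocks.add(block);
            block = fat[block];        // the entry names the next block of the file
        }
        return blocks;
    }

    public static void main(String[] args) {
        int x = OTHER;
        int[] fat = {x, x, x, END_OF_FILE, 5, 6, 10, x, 23, 25, 3};
        System.out.println(blocksOf(4, fat));  // prints [4, 5, 6, 10, 3]
    }
}
```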
