Dictionaries, Tables Hashing - PowerPoint PPT Presentation

1 / 58
About This Presentation
Title:

Dictionaries, Tables Hashing

Description:

Telephone directory. Library catalogue. Books in print: key ISBN. FAT (File Allocation Table) ... abucket array indexed by the phone number has optimal O(1) ... – PowerPoint PPT presentation

Number of Views:109
Avg rating:3.0/5.0
Slides: 59
Provided by: christin90
Category:

less

Transcript and Presenter's Notes

Title: Dictionaries, Tables Hashing


1
Dictionaries, Tables Hashing
  • TCSS 342

2
The Dictionary ADT
  • a dictionary (table) is an abstract model of a
    database
  • like a priority queue, a dictionary stores
    key-element pairs
  • the main operation supported by a dictionary is
    searching by key

3
Examples
  • Telephone directory
  • Library catalogue
  • Books in print key ISBN
  • FAT (File Allocation Table)

4
Main Issues
  • Size
  • Operations search, insert, delete, ??? Create
    reports??? List?
  • What will be stored in the dictionary?
  • How will be items identified?

5
The Dictionary ADT
  • simple container methods
  • size()
  • isEmpty()
  • elements()
  • query methods
  • findElement(k)
  • findAllElements(k)

6
The Dictionary ADT
  • update methods
  • insertItem(k, e)
  • removeElement(k)
  • removeAllElements(k)
  • special element
  • NO_SUCH_KEY, returned by an unsuccessful search

7
Implementing a Dictionary with a Sequence
  • unordered sequence
  • searching and removing takes O(n) time
  • inserting takes O(1) time
  • applications to log files (frequent insertions,
    rare searches and removals) 34 14 12 22 18

34
14
12
22
18
8
Implementing a Dictionary with a Sequence
  • array-based ordered sequence (assumes keys can
    be ordered)- searching takes O(log n) time
    (binary search)- inserting and removing takes
    O(n) time- application to look-up tables
    (frequent searches, rare insertions and removals)

12
14
18
22
34
9
Binary Search
  • narrow down the search range in stages
  • high-low game
  • findElement(22)

2
4
5
7
8
9
12
14
17
19
22
25
27
28
33
37
14
low
mid
high
10
Binary Search
2
4
5
7
8
9
12
14
17
19
22
25
27
28
33
37
25
low
mid
high
2
4
5
7
8
9
12
14
17
19
22
25
27
28
33
37
19
low
mid
high
2
4
5
7
8
9
12
14
17
19
22
25
27
28
33
37
22
low mid high
11
Pseudocode for Binary SearchAlgorithm
  • BinarySearch(S, k, low, high)if low high then
    return NO_SUCH_KEYelse mid (lowhigh) /
    2if k key(mid) then return key(mid)else
    if k k, low, mid-1)else return BinarySearch(S,
    k, mid1, high)

12
Running Time of Binary Search
  • The range of candidate items to be searched is
    halved after each comparison

13
Running Time of Binary Search
  • In the array-based implementation, access by rank
    takes O(1) time, thus binary search runs in O(log
    n) time
  • Binary Search is applicable only to Random Access
    structures (Arrays, Vectors)

14
Implementations
  • Sorted? Non Sorted?
  • Elementary Arrays, vectors linked lists
  • Orgainization None (log file), Sorted, Hashed
  • Advanced balanced trees

15
Skip Lists
  • Simulate Binary Search on a linked list.
  • Linked list allows easy insertion and deletion.
  • http//www.epaperpress.com/s_man.html

16
A FAT Example
  • Directory Key file name. Data (time, date,
    size ) location of first block in the FAT table.
  • If first block is in physical location 23 (Disk
    block number) look up position 23 in the FAT.
    Either shows end of file or has the block number
    on disk.
  • Example Directory entry block 4
  • FAT x x x F 5 6 10 x 23 25
  • 3
  • The file occupies blocks 4,5,6,10, 3.

17
Hashing
  • Place item with key k in position h(k).
  • Hope h(k) is 1-1.
  • Requires unique key (unless multiple items
    allowed). Key must be protected from change (use
    abstract class that provides only a constructor).
  • Keys must be comparable.

18
Key class
  • public abstract class KeyID
  • Private Comparable searchKey
  • Public KeyID(Comparable m)
  • searchKey m
  • //Only one constructor
  • public Comparable getSearchKey()
  • return searchKey

19
Hashing Problem
  • RTT is a large phone company, and they want to
    provide enhanced caller ID capability
  • given a phone number, return the callers name
  • phone numbers are in the range 0 to R 10101
  • n is the number of phone numbers used
  • want to do this as efficiently as possible

20
Hashing Problem
  • We know two ways to design this dictionary
  • abalanced search tree (AVL, red-black) or a
    skip-list with the phone number as the key has
    O(log n) query time and O(n) space --- good space
    usage and search time, but can we reduce the
    search time to constant?
  • abucket array indexed by the phone number has
    optimal O(1) query time, but there is a huge
    amount of wasted space O(n R)

21
Bucket Array
  • Each cell is thought of as a bucket or a
    container
  • Holds key element pairs
  • In array A of size N, an element e with key k is
    inserted in Ak.

(null)
(null)
Roberto
(null)


000-000-0000 000-000-0001
401-863-7639 ... 999-999-9999
22
Generalized indexing
  • Hash table
  • Data storage associated with a key
  • The key need not be an integer

23
Hash Tables
  • A data structure
  • The location of an item is determined
  • directly as a function of the item itself
  • Not by a sequence of trial and error comparisons
  • Commonly used to provide faster searching
  • O(n) for linear searches
  • O (logn) for binary search
  • O(1) for hash table

24
Example
  • A symbol table constructed by a compiler
  • Stores identifiers and information about them

25
Another Solution
  • A Hash Table is an alternative solution with O(1)
    expected query time and O(n N) space, where N
    is the size of the table
  • Like an array, but with a function to map the
    large range of keys into a smaller one
  • e.g., take the original key, mod the size of the
    table, and use that as an index

26
Example
  • Insert item (401-863-7639, Roberto) into a table
    ofsize 5
  • 4018637639 mod 5 4, so item (401-863-7639,
    Roberto) is stored in slot 4 of the table
  • A lookup uses the same process map the key to an
    index, then check the array cell at that index

401- 863-7639 Roberto
0 1 2 3
4
27
Collision
  • Insert (401-863-9350, Andy)
  • And insert (401-863-2234, Devin). We have a
    collision!

28
Collision Resolution
  • How to deal with two keys which map to the same
    cell of the array?
  • Use chaining
  • Set up lists of items with the same index

29
Chaining
0 1 2 3 4
30
Chaining
  • The expected, search/insertion/removal time is
    O(n/N), provided the indices are uniformly
    distributed
  • The performance of the data structure can be
    fine-tuned by changing the table size N

31
Hash Function
  • Function h defined by h(i) i
  • Determines the location of an item i in the hash
    table
  • Called a hash function.
  • To reduce the large size of a hash table use
  • h(i) i mod 25

32
From Keys to Indices
  • The mapping of keys to indices of a hash table is
    called a hash function
  • A hash function is usually the composition of two
    maps
  • hash code map key ? integer
  • compression map integer ? 0, N - 1
  • An essential requirement of the hash function is
    tomap equal keys to equal indices
  • A good hash function minimizes the probability
    of collisions

33
Java Hash
  • Java provides a hashCode() method for the Object
    class, which typically returns the 32-bit memory
    address of the object.
  • This default hash code would work poorly for
    Integer and String objects
  • The hashCode() method should be suitably
    redefined by classes.

34
Popular Hash-Code Maps
  • Integer cast for numeric types with 32 bits or
    less, we can reinterpret the bits of the number
    as an int
  • Component sum for numeric types with more than
    32 bits (e.g., long and double), we can add the
    32-bit components.

35
Popular Hash-Code Maps
  • Polynomial accumulation for strings of a natural
    language, combine the character values (ASCII or
    Unicode) a 0 a 1 ... a n-1 by viewing them as the
    coefficients of a polynomial a 0 a 1 x ...
    x n-1 a n-1

36
Popular Hash-Code Maps
  • The polynomial is computed with Horners rule,
    ignoring overflows, at a fixed value xa0 x
    (a1 x (a2 ... x (an-2 x an-1 ) ... ))
  • The choice x 33, 37, 39, or 41 gives at most 6
    collisions on a vocabulary of 50,000 English
    words
  • Why is the component-sum hash code bad for
    strings?

37
Random Hashing
  • Random hashing
  • Uses a simple random number generation technique
  • Scatters the items randomly throughout the hash
    table

38
Popular Compression Maps
  • Division h(k) k mod N
  • the choice N 2 k is bad because not all the bits
    aretaken into account
  • the table size N is usually chosen as a
    primenumber
  • certain patterns in the hash codes are propagated
  • Multiply, Add, and Divide (MAD)
  • h(k) ak b mod N
  • eliminates patterns provided a mod N ¹ 0
  • same formula used in linear congruential
    (pseudo)random number generators

39
More on Collisions
  • A key is mapped to an already occupied table
    location
  • what to do?!?
  • Use a collision handling technique
  • Weve seen Chaining
  • Can also use Open Addressing
  • Double Hashing
  • Linear Probing

40
Linear Probing
  • If the current location is used, try the next
    table location
  • linear_probing_insert(K)if (table is full)
    errorprobe h(K)while (tableprobe
    occupied)probe (probe 1) mod Mtableprobe
    K

41
Linear Probing
  • Lookups walk along table until the key or an
    empty slot is found
  • Uses less memory than chaining
  • dont have to store all those links
  • Slower than chaining
  • may have to walk along table for a long way
  • Deletion is more complex
  • either mark the deleted slot
  • or fill in the slot by shifting some elements down

42
Linear Probing Example
  • h(k) k mod 13
  • Insert keys
  • 18 41 22 44 59 32 31 73

0 1 2 3 4 5 6 7
8 9 10 11 12
41
18
44
59
32
22
31
72
0 1 2 3 4 5 6 7
8 9 10 11 12
43
Double Hashing
  • Use two hash functions
  • If M is prime, eventually will examine every
    position in the table
  • double_hash_insert(K)if(table is full)
    errorprobe h1(K)offset h2(K)while
    (tableprobe occupied) probe (probe
    offset) mod Mtableprobe K

44
Double Hashing
  • Many of same (dis)advantages as linear probing
  • Distributes keys more uniformly than linear
    probing does

45
Double Hashing Example
  • h1(K) K mod 13
  • h2(K) 8 - K mod 8
  • we want h2 to be an offset to add
  • 18 41 22 44 59 32 31 73

0 1 2 3 4 5 6 7
8 9 10 11 12
44
41
73
18
32
53
31
22
0 1 2 3 4 5 6 7
8 9 10 11 12
46
Hash code
  • static int hashCode(long i)
  • return (int)((i 32) (int) i)

47
Hash code
  • static int hashCode(String s) int h0 for
    (int i0 i(h 27) // 5-bit cyclic shift of the running
    sum h (int) s.charAt(i) // add in next
    character return h

48
Linear Probing Hash Table
  • public class LinearProbingHashTable implements
    Dictionary
  • / Marker for deactivated buckets /
  • private static Item AVAILABLE new Item(null,
    null)
  • / number of items in the dictionary /
  • private int n 0
  • / capacity of the bucket array /
  • private int N
  • / bucket array /
  • private Item A

49
Linear Probing Hash Table
  • / hash comparator /
  • private HashComparator h
  • / constructor providing the hash comparator /
  • public LinearProbingHashTable(HashComparator hc)
  • h hc
  • N 1023 // default capacity
  • A new ItemN

50
Linear Probing Hash Table
  • / constructor providing the hash comparator
    and the capacity
  • of the bucket array /
  • public LinearProbingHashTable(HashComparator hc,
    int bN)
  • h hc
  • N bN
  • A new ItemN

51
Linear Probing Hash Table
  • // auxiliary methods
  • private boolean available(int i)
  • return (Ai AVAILABLE)
  • private boolean empty(int i)
  • return (Ai null)

52
Linear Probing Hash Table
  • private Object key(int i)
  • return Ai.key()
  • private Object element(int i)
  • return Ai.element()
  • private void check(Object k)
  • if (!h.isComparable(k)) throw new
    InvalidKeyException("Invalid key.")

53
Helper search method
  • / helper search method /
  • private int findItem(Object key) throws
    InvalidKeyException
  • check(key)
  • int i h.hashValue(key) N // division
    method compression map
  • int j i

54
Helper search method
  • do
  • if (empty(i))
  • return -1 // item is not found
  • if (available(i))
  • i (i 1) N // bucket is deactivated
  • else if (h.isEqualTo(key(i), key)) // we have
    found our item
  • return i
  • else // we must keep looking
  • i (i 1) N
  • while (i ! j)
  • return -1 // item is not found

55
Dictionary
  • // methods of the dictionary ADT
  • public Object findElement (Object key) throws
    InvalidKeyException
  • int i findItem(key) // helper method for
    finding a key
  • if (i
  • return Dictionary.NO_SUCH_KEY
  • return element(i)

56
Dictionary
  • public void insertItem (Object key, Object
    element) throws InvalidKeyException
  • check(key)
  • int i h.
  • hashValue(key) N // division method
    compression map
  • int j i

57
Dictionary
  • // remember where we are starting
  • do
  • if (empty(i) available(i)) // this slot
    is available
  • Ai new Item(key, element)
  • n
  • return
  • i (i 1) N // check next slot
  • while (i ! j) // repeat until we return to
    start
  • throw new HashTableFullException("Hash table is
    full.")

58
Dictionary
  • public Object removeElement (Object key) throws
    InvalidKeyException
  • int i findItem(key) // find this key first
  • if (i
  • return Dictionary.NO_SUCH_KEY
  • // nothing to remove
  • Object toReturn element(i)
  • Ai AVAILABLE // mark this slot as
    deactivated
  • n--
  • return toReturn
Write a Comment
User Comments (0)
About PowerShow.com