Title: Hash Tables
1IAT 355
- Hash Tables
- Binary Search
- Sorting
2Data Structures
- With a collection of data, we often want to do many things
- Organize
- Iterate
- Add new
- Delete old
- Search
3Data Structures
- It is built to enable fast searching
What      Linked list   Tree         Hash table
Store     Light         Less light   Medium
Iterate   Simple        Complex      Extra work
Add       O(1)          O(log N)     O(1)
Delete    O(1)          O(log N)     O(1)
Search    O(N)          O(log N)     O(1)
4Hash Table
- An array in which items are not stored consecutively: their place of storage is calculated using the key and a hash function
- Hashed key: the result of applying a hash function to a key
- Keys and entries are scattered throughout the array
[Figure: key -> hash function -> array index; table with key and entry columns, labels 4, 10, 123]
5Hashing
- insert: compute location, insert TableNode; O(1)
- find: compute location, retrieve entry; O(1)
- remove: compute location, set it to null; O(1)
[Figure: table with key and entry columns, labels 4, 10, 123]
6Hashing example
- 10 stock details, 10 table positions
- Stock numbers between 0 and 999
- Use hash function: stock no. / 100 (integer division)
- What if we now insert stock no. 350?
- Position 3 is occupied: there is a collision
- Collision resolution strategy: insert in the next free position (linear probing)
- Given a stock number, we find stock by using the hash function again, and use the collision resolution strategy if necessary
position  key  entry
0         85   85, apples
3         323  323, guava
4         462  462, pears
5         350  350, oranges
9         912  912, papaya
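The stock example can be sketched in Java roughly as follows. This is a minimal illustration, not a reference implementation: the class name `StockTable` and its fields are hypothetical, stock numbers are assumed to lie in 0 to 999, and duplicate stock numbers are not handled.

```java
/** Hypothetical 10-position stock table with linear probing (all names are illustrative). */
final class StockTable {
    private static final int SIZE = 10;
    private final int[] keys = new int[SIZE];           // stock numbers
    private final String[] entries = new String[SIZE];  // stock details
    private final boolean[] used = new boolean[SIZE];   // which positions are occupied

    /** Hash function from the example: stock number / 100, assuming 0 <= stockNo <= 999. */
    static int hash(int stockNo) {
        return stockNo / 100;
    }

    /** Stores the entry and returns the position it ended up in. */
    int insert(int stockNo, String entry) {
        int pos = hash(stockNo);
        for (int probes = 0; probes < SIZE; probes++) {
            if (!used[pos]) {
                keys[pos] = stockNo;
                entries[pos] = entry;
                used[pos] = true;
                return pos;
            }
            pos = (pos + 1) % SIZE;  // collision: try the next position, wrapping around
        }
        throw new IllegalStateException("table is full");
    }

    /** Returns the entry for stockNo, or null if it is not in the table. */
    String find(int stockNo) {
        int pos = hash(stockNo);
        for (int probes = 0; probes < SIZE && used[pos]; probes++) {
            if (keys[pos] == stockNo) {
                return entries[pos];
            }
            pos = (pos + 1) % SIZE;  // same probing sequence as insert
        }
        return null;
    }

    public static void main(String[] args) {
        StockTable table = new StockTable();
        table.insert(85, "apples");
        table.insert(323, "guava");
        table.insert(462, "pears");
        // 350 hashes to 3, which is taken, and so is 4, so it lands in 5.
        System.out.println("350 stored at position " + table.insert(350, "oranges"));
        System.out.println("find(350) = " + table.find(350));
    }
}
```

Note that `find` stops at the first empty position, which is why simply nulling a slot on removal needs care in a probing table.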
7Hashing performance
- The hash function
- Ideally, it should distribute keys and entries evenly throughout the table
- It should minimize collisions, where the position given by the hash function is already occupied
- The collision resolution strategy
- Separate chaining: chain together several keys/entries in each position
- Open addressing: store the key/entry in a different position
- The size of the table
- Too big will waste memory; too small will increase collisions and may eventually force rehashing (copying into a larger table)
- Should be appropriate for the hash function used, and a prime number is best
8Hash function
- Truncation
- Ignore part of the key and use the rest as the array index (converting non-numeric parts)
- A fast technique, but check for an even distribution throughout the table
- Folding
- Partition the key into several parts and then combine them in any convenient way
- Unlike truncation, uses information from the whole key
- Modular arithmetic (used by truncation, folding, and on its own)
- To keep the calculated table position within the table, divide the position by the size of the table, and take the remainder as the new position
9Hash Function Examples
- Truncation: if students have a 9-digit identification number, take the last 3 digits as the table position
- e.g. 925371622 becomes 622
- Folding: split a 9-digit number into three 3-digit numbers, and add them
- e.g. 925371622 becomes 925 + 371 + 622 = 1918
- Modular arithmetic: if the table size is 1000, the first example always keeps within the table range, but the second example does not (it should be taken mod 1000)
- e.g. 1918 mod 1000 = 918 (in Java: 1918 % 1000)
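As a rough sketch, the three techniques could look like this in Java. The class and method names are made up for illustration, and the code assumes non-negative 9-digit identification numbers that fit in an `int`.

```java
/** Illustrative hash functions for 9-digit IDs (hypothetical names, non-negative ids assumed). */
final class HashExamples {
    /** Truncation: keep only the last three digits of the id. */
    static int truncate(int id) {
        return id % 1000;
    }

    /** Folding: split the id into 3-digit groups and add them up. */
    static int fold(int id) {
        int sum = 0;
        while (id > 0) {
            sum += id % 1000;  // take the lowest three digits
            id /= 1000;        // drop them and continue with the rest
        }
        return sum;
    }

    /** Folding followed by modular arithmetic to stay inside the table. */
    static int foldMod(int id, int tableSize) {
        return fold(id) % tableSize;
    }

    public static void main(String[] args) {
        int id = 925371622;
        System.out.println(truncate(id));       // 622
        System.out.println(fold(id));           // 925 + 371 + 622 = 1918
        System.out.println(foldMod(id, 1000));  // 918
    }
}
```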
10Choosing the table size to minimize collisions
- As the number of elements in the table increases, the likelihood of a collision increases, so make the table as large as practical
- If the table size is 100, and all the hashed keys are divisible by 10, there will be many collisions!
- Particularly bad if table size is a power of a small integer such as 2 or 10
- More generally, collisions may be more frequent if
- greatest common divisor (hashed keys, table size) > 1
- Therefore, make the table size a prime number (gcd = 1)
Collisions may still happen, so we need a collision resolution strategy
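A small experiment can make the gcd argument concrete. The sketch below (hypothetical class `TableSizeDemo`) hashes 100 keys that are all multiples of 10 with `key % tableSize`, and counts how many distinct table positions get used for a table of size 100 versus the prime size 101.

```java
import java.util.HashSet;
import java.util.Set;

/** Counts how many distinct positions a set of evenly spaced keys occupies (illustrative). */
final class TableSizeDemo {
    /** Keys are 0, step, 2*step, ... (count of them); the position is key % tableSize. */
    static int distinctPositions(int step, int count, int tableSize) {
        Set<Integer> positions = new HashSet<>();
        for (int i = 0; i < count; i++) {
            positions.add((i * step) % tableSize);
        }
        return positions.size();
    }

    public static void main(String[] args) {
        // gcd(10, 100) = 10: only every tenth position can ever be hit.
        System.out.println(distinctPositions(10, 100, 100));  // 10
        // gcd(10, 101) = 1: all 100 keys land in different positions.
        System.out.println(distinctPositions(10, 100, 101));  // 100
    }
}
```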
11Collision resolution: chaining
- Each table position is a linked list
- Add the keys and entries anywhere in the list (front easiest)
- Advantages over open addressing
- Simpler insertion and removal
- Array size is not a limitation (but should still minimize collisions: make table size roughly equal to expected number of keys and entries)
- Disadvantage
- Memory overhead is large if entries are small
No need to change position!
[Figure: table with key and entry columns, labels 4, 10, 123]
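Separate chaining might be sketched in Java along these lines. The class `ChainedTable`, its `Node` type, and the choice of `key % size` as the hash function are assumptions made for the example. Inserting the same key twice is not checked, so a later entry would simply shadow an earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical hash table using separate chaining; one linked list per table position. */
final class ChainedTable {
    private static final class Node {
        final int key;
        final String entry;

        Node(int key, String entry) {
            this.key = key;
            this.entry = entry;
        }
    }

    private final List<LinkedList<Node>> table = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            table.add(new LinkedList<Node>());
        }
    }

    /** Assumed hash function: key mod table size (floorMod keeps negative keys in range). */
    private LinkedList<Node> chainFor(int key) {
        return table.get(Math.floorMod(key, table.size()));
    }

    /** Adds at the front of the chain, the easiest place. */
    void insert(int key, String entry) {
        chainFor(key).addFirst(new Node(key, entry));
    }

    /** Walks the chain for this key's position; returns null if the key is absent. */
    String find(int key) {
        for (Node n : chainFor(key)) {
            if (n.key == key) {
                return n.entry;
            }
        }
        return null;
    }

    /** Unlinks the key from its chain; no other entry has to move. */
    boolean remove(int key) {
        return chainFor(key).removeIf(n -> n.key == key);
    }
}
```

With a table of size 7, the keys 4, 11 and 123 all hash to position 4 and end up in the same chain, yet each can still be found or removed independently.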
12Applications of Hashing
- Compilers use hash tables to keep track of declared variables
- A hash table can be used for on-line spelling checkers: if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time
- Hash functions can be used to quickly check for inequality: if two elements hash to different values they must be different
- Storing sparse data
13When to use hashing?
- Good if
- Need many searches in a reasonably stable table
- Not So Good if
- Many insertions and deletions,
- If table traversals are needed
- Need things in sorted order
- More data than available memory
- Use a tree and store leaves on disk
14Java
- class HashMap
- Provides hash table functionality in Java
- More overhead, but free implementation
- Parameterize it carefully (e.g. key and value types, initial capacity, load factor)
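For illustration, here is one way `java.util.HashMap` might be used for the earlier stock data. The class `HashMapDemo` and the sample data are assumptions for the example. The constructor arguments shown (initial capacity 16, load factor 0.75) match the documented defaults and are spelled out only to show where such parameters go.

```java
import java.util.HashMap;
import java.util.Map;

/** Small HashMap usage sketch (class name and data are illustrative). */
final class HashMapDemo {
    static Map<Integer, String> buildStock() {
        // Type parameters: Integer keys (stock numbers), String entries.
        // Constructor parameters: initial capacity and load factor.
        Map<Integer, String> stock = new HashMap<>(16, 0.75f);
        stock.put(85, "apples");
        stock.put(323, "guava");
        stock.put(350, "oranges");
        return stock;
    }

    public static void main(String[] args) {
        Map<Integer, String> stock = buildStock();
        System.out.println(stock.get(350));          // oranges
        System.out.println(stock.containsKey(500));  // false
        stock.remove(323);                           // delete by key
        System.out.println(stock.size());            // 2
    }
}
```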
15Bucket Sort
- For each item to be sorted, compute
- entryIndex = key / bucketWidth, where bucketWidth = (range of key values) / tableSize
- Chain entries on collision
- Result: each table entry has all the entries in a range of key values
- For some problems, this is enough
- Collision detection
[Figure: table with key and entry columns, labels 4, 10, 123]
16Bucket Sort
- Frequently used in graphics and interactive apps
- E.g. one bucket per pixel row
- E.g. one bucket per 64x64 pixel region
- Put all data into buckets so that selection
(search) can rapidly locate good candidates
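The bucketing step can be sketched in Java as below. This is an illustration under stated assumptions: keys are integers in a known range `[0, keyRange)`, each bucket covers an equal slice of that range, and the names `BucketSortDemo`, `toBuckets` and `sort` are hypothetical. Sorting inside each bucket with `Collections.sort` is one simple way to finish the job when a fully sorted result is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Bucket sort sketch for integer keys in [0, keyRange) (illustrative names). */
final class BucketSortDemo {
    /** Chains each key into the bucket that covers its slice of the key range. */
    static List<List<Integer>> toBuckets(int[] keys, int keyRange, int tableSize) {
        // Round the width up so the largest key still maps to a valid bucket.
        int bucketWidth = (keyRange + tableSize - 1) / tableSize;
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new ArrayList<Integer>());
        }
        for (int key : keys) {
            buckets.get(key / bucketWidth).add(key);
        }
        return buckets;
    }

    /** Sorts within each bucket, then reads the buckets out in order. */
    static int[] sort(int[] keys, int keyRange, int tableSize) {
        int[] out = new int[keys.length];
        int n = 0;
        for (List<Integer> bucket : toBuckets(keys, keyRange, tableSize)) {
            Collections.sort(bucket);
            for (int key : bucket) {
                out[n++] = key;
            }
        }
        return out;
    }
}
```

For selection tasks such as finding candidates near a point, `toBuckets` alone may be enough, since only the one or two relevant buckets need to be examined.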
17Search
- Frequently wish to organize data to support search
- E.g. search for a single item
- E.g. search for all items between 3 and 7
18Search
- Often want to search for an item in a list
- In an unsorted list, must search linearly
- In a sorted list, can use binary search
19Binary Search
- Start with index pointers at the start and end of the list
- Compute the index midway between the two pointers
20Binary Search
- Compare middle item to search item
- If search < mid, move end to mid - 1; if search > mid, move start to mid + 1
21Binary Search
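int[] arr = new int[8];
// <populate array in ascending order>
int search = 4;
int start = 0, end = arr.length - 1, mid;
int found = -1;                        // index of search, or -1 if absent
while (start <= end) {
    mid = start + (end - start) / 2;   // recompute the midpoint on every pass
    if (search == arr[mid]) {
        found = mid;                   // SUCCESS
        break;
    }
    if (search < arr[mid])
        end = mid - 1;
    else
        start = mid + 1;
}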
22Binary Search
- Run Time
- O( log(N) )
- Every iteration chops list in half
23Sorting
- Need a sorted list to do binary search
- Numerous sort algorithms
24The family of sorting methods
Main sorting themes
- Address-based sorting: Proxmap Sort, RadixSort
- Comparison-based sorting
  - Transposition sorting: BubbleSort
  - Diminishing increment sorting: ShellSort
  - Insert and keep sorted: Insertion sort, Tree sort
  - Divide and conquer: QuickSort, MergeSort
  - Priority queue sorting: Selection sort, Heap sort
25Bubble sort: transposition sorting
- Not a fast sort!
- Code is small
for (int i = arr.length; i > 0; i--)
    for (int j = 1; j < i; j++)
        if (arr[j-1] > arr[j]) {
            int temp = arr[j-1];   // swap the out-of-order neighbours
            arr[j-1] = arr[j];
            arr[j] = temp;
        }
26Divide and conquer sorting
MergeSort
QuickSort
27QuickSort: divide and conquer sorting
- As its name implies, QuickSort is among the fastest known sorting algorithms in practice
- Its average running time is O(n log n); its worst case is O(n^2)
- The idea is as follows
- 1. If the number of elements to be sorted is 0 or 1, then return
- 2. Pick any element, v (this is called the pivot)
- 3. Partition the other elements into two disjoint sets, S1 of elements ≤ v, and S2 of elements > v
- 4. Return QuickSort (S1) followed by v followed by QuickSort (S2)
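The four steps translate almost directly into a list-based Java sketch. It is written for clarity rather than speed, since it copies elements into new lists instead of sorting in place. The class name `SimpleQuickSort` is hypothetical, the pivot is taken from the middle (any element would do), and S1 is assumed to hold the elements less than or equal to the pivot.

```java
import java.util.ArrayList;
import java.util.List;

/** Direct, non-in-place rendering of the four QuickSort steps (illustrative). */
final class SimpleQuickSort {
    static List<Integer> quickSort(List<Integer> items) {
        // Step 1: zero or one element is already sorted.
        if (items.size() <= 1) {
            return new ArrayList<>(items);
        }
        // Step 2: pick a pivot v (here, the middle element).
        int mid = items.size() / 2;
        int v = items.get(mid);
        // Step 3: partition the other elements into S1 (<= v) and S2 (> v).
        List<Integer> s1 = new ArrayList<>();
        List<Integer> s2 = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (i == mid) {
                continue;  // the pivot itself goes in neither set
            }
            int x = items.get(i);
            if (x <= v) {
                s1.add(x);
            } else {
                s2.add(x);
            }
        }
        // Step 4: QuickSort(S1), then v, then QuickSort(S2).
        List<Integer> result = quickSort(s1);
        result.add(v);
        result.addAll(quickSort(s2));
        return result;
    }
}
```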
28QuickSort example
- Array: 5 1 4 2 10 3 9 15 12
- Pick the middle element as the pivot, i.e., 10
29Partitioning example
- Array: 5 11 4 25 10 3 9 15 12
- Pick the middle element as the pivot, i.e., 10
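30Partitioning example (continued)
- Partway through partitioning, with the pivot 10 moved to the front: 10 4 5 25 11 3 9 15 12
- After partitioning, with the pivot 10 in its final position: 9 4 5 3 10 25 11 15 12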
31Pseudocode for Quicksort
procedure quicksort(array, left, right)
    if right > left
        select a pivot index (e.g. pivotIdx = left)
        pivotIdxNew = partition(array, left, right, pivotIdx)
        quicksort(array, left, pivotIdxNew - 1)
        quicksort(array, pivotIdxNew + 1, right)
32Pseudo code for partitioning
pivotIdx = middle of array a
swap a[pivotIdx] with a[first]   // Move the pivot out of the way
swapPos = first + 1
for (i = swapPos; i <= last; i++)
    if (a[i] < a[first]) {
        swap a[swapPos] with a[i]
        swapPos++
    }
// Now move the pivot back to its rightful place
swap a[first] with a[swapPos-1]
return swapPos-1   // Pivot position
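Put together, the partitioning scheme and the recursive procedure could look like this in Java. This is a sketch under a few assumptions: the class name `InPlaceQuickSort` is made up, the pivot is always the middle element of the current range, and the scan starts at `swapPos` (that is, `first + 1`) so that every non-pivot element is examined.

```java
/** In-place QuickSort sketch using a middle-element pivot (illustrative names). */
final class InPlaceQuickSort {
    static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    /** Partitions a[first..last] around its middle element; returns the pivot's final index. */
    static int partition(int[] a, int first, int last) {
        int pivotIdx = first + (last - first) / 2;
        swap(a, pivotIdx, first);           // move the pivot out of the way
        int swapPos = first + 1;            // next slot for an element smaller than the pivot
        for (int i = swapPos; i <= last; i++) {
            if (a[i] < a[first]) {
                swap(a, swapPos, i);
                swapPos++;
            }
        }
        swap(a, first, swapPos - 1);        // move the pivot back to its rightful place
        return swapPos - 1;
    }

    static void quicksort(int[] a, int left, int right) {
        if (right > left) {
            int pivotIdxNew = partition(a, left, right);
            quicksort(a, left, pivotIdxNew - 1);
            quicksort(a, pivotIdxNew + 1, right);
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 11, 4, 25, 10, 3, 9, 15, 12};
        quicksort(a, 0, a.length - 1);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

Calling `partition` once on 5 11 4 25 10 3 9 15 12 leaves the array as 9 4 5 3 10 25 11 15 12 and returns index 4, the pivot's final position.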
33Java
- Sort and binary search are provided by the Arrays class (java.util.Arrays)
- sort(): arrays of ints, floats, etc.
- sort( Object[] a, Comparator c )
- you supply the Comparator object, which contains a function to compare 2 objects
- binarySearch()
- arrays of ints, floats, etc.
- Search Objects with a Comparator object
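A brief sketch of both facilities from `java.util.Arrays` follows. The class `ArraysDemo`, its method names, and the choice of ordering strings by length are assumptions for the example. Note that `binarySearch` expects the array to be sorted already, using the same ordering (the same `Comparator`) that the search uses.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Usage sketch for Arrays.sort and Arrays.binarySearch (illustrative names and data). */
final class ArraysDemo {
    /** Sorts a copy of the data, then binary-searches it; a negative result means "not found". */
    static int findInt(int[] data, int target) {
        int[] copy = data.clone();
        Arrays.sort(copy);
        return Arrays.binarySearch(copy, target);
    }

    /** Same idea for objects, with a Comparator that orders strings by length. */
    static int findByLength(String[] words, String probe) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        String[] copy = words.clone();
        Arrays.sort(copy, byLength);
        return Arrays.binarySearch(copy, probe, byLength);
    }

    public static void main(String[] args) {
        System.out.println(findInt(new int[] {9, 3, 7, 1}, 7));  // index 2 in the sorted copy
        System.out.println(findByLength(new String[] {"pears", "fig", "papaya"}, "guava"));
    }
}
```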