Title: Dynamic Set ADT; Dynamic Set Dictionary
1. Dynamic Set ADT; Dynamic Set Dictionary
- Definitions of Dynamic Set and Dictionary
- Implementation of Dictionary with lists and arrays
- Implementation with a direct-access table (DAT)
- Implementation with a hash table using collision resolution by chaining
- Complexity of the operations under simple uniform hashing
- Selecting a hash function
- Applications
2. Dynamic Sets ADT
- Goal: investigate data structures and algorithms that support efficient implementation of various operations on sets.
- Dynamic sets may change size over time.
- Key: the identifier of an element.
- Operations
  - Search
  - Insert
  - Delete
  - Min
  - Max
  - Predecessor
  - Successor
3. Dictionary: Go look it up!
- Primary use: store data so that they can be located quickly using keys.
- Examples of dictionaries
  - the set of bank accounts
  - the set of windows opened by a GUI
  - a student database
  - the symbol table used by compilers
- Dictionary ADT: a dynamic set that supports Search, Insert, Delete, and possibly Update.
- The hash table data structure implements the Dictionary ADT
  - worst-case time to perform the operations is O(n)
  - expected time is O(1)
4. Dictionary ADT
- T is a dictionary of records.
- Each record is a (key, data) pair.
- Keys are distinct.
- x is a reference to a dictionary record.
- Operations
  - insert(T, x): inserts the record pointed to by x into T
  - delete(T, x): removes the record pointed to by x from T
  - search(T, key): returns a pointer to the record with the given key, or null if no record has that key
  - update(T, oldrec, newrec): replaces the record pointed to by oldrec with the record pointed to by newrec
- Example
  - x ← search(T, key)
  - delete(T, x)
[Figure: x points to a record with the fields key and data.]
5. Implementation of Dictionary
- Input size: n, the number of records in the dictionary
- Worst-case complexity using lists and arrays

Structure              | Insert T(n) | Search T(n) | Delete T(n) | Update T(n) | Space complexity
Unsorted doubly linked |             |             |             |             |
Sorted doubly linked   |             |             |             |             |
Sorted array           |             |             |             |             |
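
To make the sorted-array option concrete, here is a minimal Python sketch. The class name `SortedArrayDict` and its method names are hypothetical, not taken from the slides. It relies on the standard facts that binary search over a sorted array takes O(log n) comparisons, while inserting or deleting in the middle shifts up to n elements.

```python
import bisect


class SortedArrayDict:
    """Dictionary kept as two parallel lists sorted by key (illustrative name)."""

    def __init__(self):
        self._keys = []  # keys in ascending order
        self._data = []  # _data[i] belongs to _keys[i]

    def search(self, key):
        # Binary search: O(log n) comparisons.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._data[i]
        return None

    def insert(self, key, data):
        # Locating the position is O(log n); shifting elements is O(n).
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            raise KeyError(f"duplicate key: {key!r}")  # keys must be distinct
        self._keys.insert(i, key)
        self._data.insert(i, data)

    def delete(self, key):
        # Removing from the middle also shifts elements: O(n).
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        del self._keys[i]
        del self._data[i]
```

An unsorted doubly linked list makes the opposite trade: insertion at the head is O(1), but search must scan up to n records.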
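6. Implementation of Dictionary: DAT
- Direct-access table T, with one slot for each of the r possible keys 0, 1, ..., r-1
- The datum, or a reference to it, corresponding to key k is stored in slot k.
- If T[k] = NULL, there is no record with key k.
- Example: r = 5
[Figure: keys from the universe U stored in a direct-access table T with slots 0 through 4.]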
7. DAT: Complexity of the operations
- DirectAddress_Search(T, key k)
  - return T[k]
- DirectAddress_Insert(T, ptr x to element)
  - T[key(x)] ← x
- DirectAddress_Delete(T, ptr x to element)
  - T[key(x)] ← NULL
- Each operation takes O(1) time.
- Practical only for moderate r (r < 1000).
- What if the number of keys, n, stored at any particular time is much smaller than r?
- Example: student dictionary, r = 10^9, n = 4000.
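
The three direct-address operations can be sketched in Python as follows. This is an illustration under stated assumptions: records are modeled as (key, data) tuples, keys are integers in 0..r-1, and the class name `DirectAccessTable` is hypothetical.

```python
class DirectAccessTable:
    """Direct-access table with one slot per possible key (illustrative name)."""

    def __init__(self, r):
        # Space is proportional to r, regardless of how many records are stored.
        self.slots = [None] * r

    def search(self, k):
        # O(1): index straight into slot k; None means "no record with key k".
        return self.slots[k]

    def insert(self, record):
        # O(1): record[0] is the key, so the record goes into that slot.
        self.slots[record[0]] = record

    def delete(self, record):
        # O(1): clear the slot that the record's key selects.
        self.slots[record[0]] = None
```

With r = 10^9 and n = 4000, almost all of the allocated slots would stay empty, which is the waste a hash table is meant to avoid.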
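8. Hash Table
- A version of the DAT in which the item with key k is stored at slot h(k).
- The keys do not have to be integers.
- h is a hash function that maps keys to integers: $h : U \to \{0, 1, \ldots, m-1\}$
- Key k hashes to slot h(k).
[Figure: the hash function h maps keys from the universe U into a table T with slots 0 through m-1.]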
9. Hash Tables
- Hash tables are typically one of the most efficient ways of implementing a Dictionary ADT, particularly if we know something about the distribution of the key values.
- Hash tables do not efficiently support operations that rely on the relative order of data elements, for example finding the min or max, or sorting.
- Since $|U| > m$, h hashes multiple keys to the same slot.
- A collision occurs when two keys hash to the same slot.
- Collisions cannot be avoided; they must be resolved.
10. Collision resolution by chaining
- Put all elements that hash to the same slot in an unsorted doubly linked list, where the hash table entry T[h(k)] is a pointer to the first item in the list.
[Figure: the hash function h maps keys from U to table slots; keys that collide are linked in one chain.]
11. Implementation of a Dictionary using hashing with collision resolution by chaining
- T is the hash table; x is a pointer to a record.
- key(x) returns the key of the record pointed to by x.
- ChainedHash_Search(T, k)
  - search for key k in the unsorted list T[h(k)]
- ChainedHash_Insert(T, x)
  - insert the record pointed to by x at the head of the list T[h(key(x))]
- ChainedHash_Delete(T, x)
  - delete x from the doubly linked list T[h(key(x))]
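
A minimal Python sketch of these three operations follows. The names `Node` and `ChainedHashTable` are hypothetical, and Python's built-in `hash()` reduced modulo m stands in for the hash function h; the slides do not prescribe either choice.

```python
class Node:
    """One record in a chain; prev/next links make deletion O(1) given the node."""

    def __init__(self, key, data):
        self.key, self.data = key, data
        self.prev = self.next = None


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # slots[j] is the head of chain j, or None

    def _h(self, key):
        # Assumption: built-in hash() reduced mod m plays the role of h.
        return hash(key) % self.m

    def search(self, key):
        # Walk the chain at slot h(key); cost is proportional to chain length.
        node = self.slots[self._h(key)]
        while node is not None and node.key != key:
            node = node.next
        return node  # None if no record has this key

    def insert(self, node):
        # Insert at the head: O(1). The caller ensures the key is not present.
        j = self._h(node.key)
        node.prev = None
        node.next = self.slots[j]
        if node.next is not None:
            node.next.prev = node
        self.slots[j] = node

    def delete(self, node):
        # Unlink the node from its doubly linked chain: O(1), no search needed.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.slots[self._h(node.key)] = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None
```

The doubly linked chain is what lets delete run in constant time when it is handed a pointer to the record; with a singly linked chain it would first have to find the predecessor.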
12. Time complexity analysis of hashing with collision resolution by chaining
- Computing h(k): O(1)
- Insert: O(1) worst case
- Search: O(n) worst case, when all keys hash to the same slot
- Delete: O(1)
- Update: O(1) (including delete and reinsert if the key changes)
- Load factor: $\alpha = n/m$, the average number of keys per slot
  - m: size of the hash table
  - n: number of records currently in the dictionary
13. Average-case time complexity using hashing with collision resolution by chaining
- Assume simple uniform hashing: any key k is equally likely to hash to any of the m slots of the table T.
- Search: the expected time to search is the time to hash the key plus the expected length of the list at the hashed slot.
- Let T(j) be the linked list at slot j.
- Let $n_j$ be the random variable denoting the length of T(j), j = 1, ..., m.
- What is the distribution of $n_j$? For i = 0, 1, 2, ..., n,
  $P(n_j = i) = \binom{n}{i} \left(\frac{1}{m}\right)^{i} \left(1 - \frac{1}{m}\right)^{n-i}$
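14. Average-case time complexity using hashing with collision resolution by chaining
- Search (cont.): T(j) is the linked list at slot j, and $n_j$ is the random variable denoting the length of T(j), j = 1, ..., m. The distribution of $n_j$ is binomial with parameters n and 1/m:
  $P(n_j = i) = \binom{n}{i} \left(\frac{1}{m}\right)^{i} \left(1 - \frac{1}{m}\right)^{n-i}, \quad i = 0, 1, \ldots, n$
- The expected length is
  $E[n_j] = n \cdot \frac{1}{m} = \frac{n}{m} = \alpha$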
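
As a numeric check, the sketch below assumes simple uniform hashing with independently hashed keys, so that the length of one chain is binomial with parameters n and 1/m. It evaluates that distribution and compares its mean with the load factor n/m. The function name `chain_length_pmf` and the values n = 20, m = 8 are arbitrary illustrations.

```python
from math import comb


def chain_length_pmf(n, m):
    """Return [P(length = 0), ..., P(length = n)] for one slot of an m-slot table."""
    p = 1 / m  # probability that a given key lands in this slot
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]


n, m = 20, 8
pmf = chain_length_pmf(n, m)

# Mean of the distribution: sum over i of i * P(length = i).
expected_length = sum(i * q for i, q in enumerate(pmf))
load_factor = n / m
```

Up to floating-point rounding, `expected_length` agrees with `load_factor`, and the probabilities in `pmf` sum to 1.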
15. Average-case time complexity using hashing with collision resolution by chaining
- Assume simple uniform hashing.
- Search: the expected time to search is the time to hash, O(1), plus the expected length of the list T(j), which is $\alpha = n/m$.
- The average-case complexity for search is $O(1 + \alpha)$, i.e. one plus the average number of keys per slot.
- If $n = O(m)$, then $\alpha = O(1)$ and search takes O(1) time on average.
- Insert, Delete, Update: O(1)
16. Selecting a hash function (HF)
- A good HF should not give preference to some slots over others.
- An HF should distribute keys uniformly, i.e. a key is equally likely to hash to any of the slots: simple uniform hashing (SUH).
- If the keys are drawn from U according to distribution P, and K is the random variable representing the key drawn, then for SUH
  $P(h(K) = j) = \sum_{k \,:\, h(k) = j} P(k) = \frac{1}{m}$ for every slot j    (1)
- Example
  - U = [0, 1)
  - P is the uniform distribution over [0, 1); an h that achieves SUH is given by $h(k) = \lfloor k m \rfloor$
  - For m = 100, h(0.5) = 50, h(0.25) = 25, ...
- The problem is that we do not know the key distribution P, so we cannot check (1). In practice, heuristics are used to derive an HF.
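
The example for keys uniform over [0, 1) can be tried directly. The sketch below assumes the hash function has the usual textbook form floor(k * m). The seeded pseudo-random keys and the function name `h_unit_interval` are illustrative choices, not part of the slides.

```python
import math
import random


def h_unit_interval(k, m):
    """Hash a real key k in [0, 1) to a slot in 0..m-1 via floor(k * m)."""
    if not 0 <= k < 1:
        raise ValueError("key must lie in [0, 1)")
    return math.floor(k * m)


# Draw keys uniformly from [0, 1) and count how many land in each slot.
rng = random.Random(0)  # fixed seed so the run is repeatable
m = 10
counts = [0] * m
for _ in range(100_000):
    counts[h_unit_interval(rng.random(), m)] += 1
```

Each entry of `counts` comes out near 100000 / m, which is what simple uniform hashing predicts when the keys really are uniform. For keys with any other distribution, the same h can be badly unbalanced.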
17. Selecting a hash function
- Regularity condition: the hash function should be independent of any patterns in the data. Similar keys should not hash to similar or nearby slots.
- Assume keys are natural numbers; if they are not, re-map them to the naturals.
- Division method for selecting an HF: $h(k) = k \bmod m$
- An HF should depend on the complete data (all bits), so m should not be a power of 2 or 10. Why?
  - Example: with m = 10000, the keys 376218705 and 593598705 both hash to 8705, because only the last four digits are used.
- Best: select m to be a prime that is not close to a power of 2 or 10.
- Given n,
  - decide what load factor (average search time) the application can tolerate
  - select m to be a prime close to n / (load factor) and not close to a power of 2
  - run experiments and test that h achieves SUH
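
The division method, h(k) = k mod m in its usual form, and the table-size recipe can be sketched as below. The helper names are hypothetical. The size chooser is a simplification: it returns the first prime at or above n / (load factor) and does not check the distance to a power of 2.

```python
import math


def h_div(k, m):
    """Division-method hash: the remainder of k divided by m."""
    return k % m


def is_prime(x):
    """Trial division; adequate for table sizes of modest magnitude."""
    if x < 2:
        return False
    i = 2
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True


def choose_table_size(n, load_factor):
    """Smallest prime m with m >= n / load_factor (simplified rule)."""
    m = math.ceil(n / load_factor)
    while not is_prime(m):
        m += 1
    return m


# With m = 10000 only the last four decimal digits of the key matter.
same_slot = h_div(376218705, 10000) == h_div(593598705, 10000)

# With a prime modulus such as 9973, every digit of the key contributes.
split_by_prime = h_div(376218705, 9973) != h_div(593598705, 9973)
```

Both flags evaluate to True: the two keys collide at slot 8705 under m = 10000 and are separated under the prime 9973.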
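18. Selecting a hash function
- Multiplication method
  - Define $h(k) = \lfloor m \, (kA \bmod 1) \rfloor$, where $0 < A < 1$ and $kA \bmod 1$ is the fractional part of $kA$
  - Advantage: the value of m is not critical
  - Good choice of A: $A \approx (\sqrt{5} - 1)/2 \approx 0.618$
  - Usually, m is a power of 2 (easy to implement, since multiplication by 2 is a shift)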
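
A floating-point sketch of the multiplication method is given below. It assumes the usual textbook form h(k) = floor(m * frac(k * A)) and the commonly cited constant A = (sqrt(5) - 1) / 2. Neither the function name `h_mul` nor the sample key comes from the slides.

```python
import math

# Assumption: the widely used constant (sqrt(5) - 1) / 2, about 0.618.
A = (math.sqrt(5) - 1) / 2


def h_mul(k, m, a=A):
    """Multiplication-method hash for a non-negative integer key k."""
    frac = (k * a) % 1.0  # fractional part of k * a, in [0, 1)
    return math.floor(m * frac)


slot = h_mul(123456, 2**14)  # m = 16384, a power of 2
```

For very large keys the product k * A loses precision in floating point. Production implementations typically use fixed-width integer arithmetic with shifts instead, which is where choosing m as a power of 2 pays off.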
19. ADT Dictionary
- Hashing is the preferred way to implement a Dictionary.
- Achieves O(1) average time to search when the load factor is close to 1 (rule of thumb: > 0.8)
- O(1) time to insert, delete, update
- Collision resolution: how to deal with keys that hash to the same slot.
  - Collision resolution by chaining maintains unsorted doubly linked lists at the slots (overhead for maintaining the lists)
- Selecting a hash function
  - Division method
  - Multiplication method
20. Applications of hashing
- Compilers use hashing in symbol table implementation (to keep track of defined variables).
- Graph problems where nodes are identified by names instead of numbers.
- Game-playing software, to keep a transposition table (of already encountered lines of play).
- Online spell checkers (without error correction). The whole dictionary is pre-hashed and words can be checked in constant time.
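
The spell-checker application can be sketched with Python's built-in `set`, which is itself a hash table (CPython resolves collisions by open addressing rather than chaining). The function names and the tiny word list are illustrative only.

```python
def build_word_set(words):
    """Pre-hash the whole dictionary once; lower-case for case-insensitive checks."""
    return {w.lower() for w in words}


def is_spelled_correctly(word, word_set):
    """Membership test: expected O(1) time per word."""
    return word.lower() in word_set


# A tiny stand-in for a real word list.
word_set = build_word_set(["hash", "table", "Dictionary", "chain"])
checks = [is_spelled_correctly(w, word_set) for w in ["Hash", "tabel", "dictionary"]]
```

Here `checks` is `[True, False, True]`: the misspelled "tabel" is rejected without scanning the word list.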