Title: Implementation of Unordered Collections
Implementations of Unordered Collections
- Fast searching is critical
- Sorted list
  - Logarithmic searches
  - Linear insertions and removals
- Binary search tree
  - Logarithmic searches
  - Logarithmic insertions and removals
  - But guaranteed only if tree balancing is maintained
- Hashing
  - Constant-time searches, insertions, and removals, for the most part
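To make the sorted-list trade-off concrete, here is a minimal sketch that keeps Integers in a sorted ArrayList (the class name SortedListSet is hypothetical). Searching uses Collections.binarySearch, which is logarithmic; insertion and removal locate their position just as quickly but then shift every later element, which is linear.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sorted-list set of Integers illustrating the costs on the slide.
class SortedListSet {
    private final List<Integer> items = new ArrayList<>();

    // Logarithmic search: binary search over the sorted list
    boolean contains(int x) {
        return Collections.binarySearch(items, x) >= 0;
    }

    // Finding the slot is logarithmic, but add(index, x) shifts later elements
    boolean add(int x) {
        int pos = Collections.binarySearch(items, x);
        if (pos >= 0) return false;      // already present
        items.add(-pos - 1, x);          // binarySearch returned -(insertion point) - 1
        return true;
    }

    // remove(index) shifts every later element one place to the left
    boolean remove(int x) {
        int pos = Collections.binarySearch(items, x);
        if (pos < 0) return false;
        items.remove(pos);               // int argument: removal by index
        return true;
    }

    List<Integer> view() {
        return Collections.unmodifiableList(items);
    }
}
```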
Hashing
- Ideally, each element has a unique hash value
- This value is computed in constant time by a hash function
- This computation is performed on each insertion, access, and removal
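As an illustration of a constant-time hash function, the hypothetical Point class below overrides hashCode with a fixed amount of arithmetic, independent of how many elements a collection holds. It also shows why a unique hash value is only an ideal: the distinct points (1, 2) and (0, 33) both hash to 33.

```java
// Hypothetical value class: equal points must produce equal hash values.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Constant time: a fixed number of arithmetic steps per call
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```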
How Are the Elements Stored?
- The hash value is used to locate the element's index in an array, thus preserving constant-time access
- How to compute this index:
      index = hashValue % capacity of array
- Once the result is made nonnegative (for example, with Math.abs), the position will be ≥ 0 and < capacity
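One subtlety when making the index nonnegative: Math.abs(Integer.MIN_VALUE) is still negative in Java, so Math.abs(hashValue) % capacity can yield a negative index. The sketch below (the class and method names are hypothetical) shows two safe variants: take the remainder first and then the absolute value, or use Math.floorMod. They give different indexes for negative hash values, which is fine as long as a table uses one of them consistently.

```java
// Hypothetical helpers mapping a hash value to an index in [0, capacity).
final class IndexDemo {
    // The remainder lies in (-capacity, capacity), so Math.abs is safe here
    static int indexFor(int hashValue, int capacity) {
        return Math.abs(hashValue % capacity);
    }

    // Math.floorMod always returns a value in [0, capacity) for capacity > 0
    static int indexForFloorMod(int hashValue, int capacity) {
        return Math.floorMod(hashValue, capacity);
    }
}
```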
A Sample Access Method
    boolean contains(Object obj){
        // Remainder first, so the index is never negative
        int index = Math.abs(obj.hashCode() % table.length);
        // Assumes no collisions: if obj is present, it sits at index
        return obj.equals(table[index]);
    }
- table is an array of objects
- table.length is the array's current physical size
- hashCode() is a method that returns an object's hash value
- Other access methods have a similar structure
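A Sample Mutator Method
    boolean add(Object obj){
        int index = Math.abs(obj.hashCode() % table.length);
        if (obj.equals(table[index]))
            return false;            // already present
        table[index] = obj;          // assumes no collisions
        return true;
    }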
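To run the two sample methods, here is a minimal wrapper class, assuming a fixed capacity of 16 (the class name NaiveHashSet and the capacity are hypothetical). Because it does no collision handling, adding 17 after 1 silently overwrites 1, since both map to index 1.

```java
// Hypothetical wrapper around the sample access and mutator methods.
// No collision handling: a later element can overwrite an earlier one.
class NaiveHashSet {
    private final Object[] table = new Object[16];   // assumed fixed capacity

    boolean contains(Object obj) {
        int index = Math.abs(obj.hashCode() % table.length);
        return obj.equals(table[index]);
    }

    boolean add(Object obj) {
        int index = Math.abs(obj.hashCode() % table.length);
        if (obj.equals(table[index])) return false;   // already present
        table[index] = obj;   // overwrites whatever occupied the slot
        return true;
    }
}
```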
Adding Items
- mySet.add("A") stores "A" at index 10
- mySet.add("B") stores "B" at index 5
- mySet.add("C") stores "C" at index 0
- mySet.add("D") stores "D" at index 14
- Add 12 more items
- The array is now full: resize the array and rehash all elements, because each element's index depends on the array's length
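A minimal sketch of resizing and rehashing, assuming an open-addressed Object[] table that doubles in size (the class and method names are hypothetical). Each element's index is recomputed with the new length rather than copied from its old position. If two elements land on the same slot in the new table, the sketch moves the later one to the next free slot; that is linear collision processing, described later in this deck.

```java
// Hypothetical sketch: grow an open-addressed table and rehash every element.
final class Resize {
    static int indexFor(Object obj, int capacity) {
        return Math.abs(obj.hashCode() % capacity);
    }

    // Copies every element into a table twice as large, recomputing its index
    static Object[] resize(Object[] oldTable) {
        Object[] newTable = new Object[oldTable.length * 2];
        for (Object obj : oldTable) {
            if (obj == null) continue;
            int index = indexFor(obj, newTable.length);
            // Assumption: on a clash, step to the next free slot (linear probing)
            while (newTable[index] != null) {
                index = (index + 1) % newTable.length;
            }
            newTable[index] = obj;
        }
        return newTable;
    }
}
```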
Performance
- O(1) lookups, insertions, and removals: wow!
- The cost of resizing the array is amortized over many insertions and removals
- Works as long as no two items map to the same index, for example because they have the same hash value
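The amortization claim can be checked with a small simulation, assuming the array doubles whenever it is full (the doubling policy and the names below are assumptions). It counts how many element copies all resizes perform: for 1,000 insertions starting at capacity 1, the resizes copy 1 + 2 + 4 + ... + 512 = 1,023 elements, fewer than two copies per insertion.

```java
// Hypothetical simulation: total element copies made by resizes when the
// capacity doubles each time the array fills up.
final class AmortizedDemo {
    static long copiesForInsertions(int n, int initialCapacity) {
        int capacity = initialCapacity;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {   // full: every element is rehashed once
                copies += size;
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }
}
```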
Problem: Collisions
- As more elements fill the array, the likelihood that two of them map to the same index increases
- What happens when two elements have the same hash value?
- A collision: they compete for the same position in the array
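Collisions arise in two ways, and the small hypothetical helper below shows both using the same index formula as the sample methods. The Java strings "Aa" and "BB" have the same String.hashCode() value, 2112, so they always collide; the Integers 1 and 17 have different hash values but still collide in a 16-slot array because 1 % 16 == 17 % 16.

```java
// Hypothetical helper: do two elements map to the same index in a table
// of the given capacity?
final class CollisionDemo {
    static int indexFor(Object obj, int capacity) {
        return Math.abs(obj.hashCode() % capacity);
    }

    static boolean collide(Object a, Object b, int capacity) {
        return indexFor(a, capacity) == indexFor(b, capacity);
    }
}
```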
Load Factor
- An array's load factor is the ratio of the number of elements to its capacity
- Example: elements (10) / length (30) = .3333
- Arrays are more efficient than linked structures when the load factor is < .5
- Try to keep the load factor low to minimize collisions
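A minimal sketch of computing the load factor and using it to decide when to grow the array. The threshold of 0.5 and the method names are assumptions chosen to match the guideline above, not a fixed rule.

```java
// Hypothetical load-factor helpers.
final class LoadFactor {
    static double loadFactor(int elements, int capacity) {
        return (double) elements / capacity;
    }

    // Grow before an insertion would push the load factor above maxLoad
    static boolean shouldResize(int elements, int capacity, double maxLoad) {
        return loadFactor(elements + 1, capacity) > maxLoad;
    }
}
```

With 10 elements in an array of length 30, the load factor is 1/3, so another insertion is fine; with 15 elements, one more would give 16/30 ≈ 0.53 and trigger a resize.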
Collision Processing Strategies
- Linear collision processing: search for the next available empty slot in the array, wrapping around if the end is reached
- Can lead to clustering, where several elements that have collided now occupy consecutive positions
- Several small clusters may coalesce into a large cluster and thus degrade performance
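Linear collision processing can be sketched as follows, assuming an Object[] table (the class LinearProbingSet and its slotOf inspection method are hypothetical). Removal is deliberately omitted, since deleting from a probed table needs extra care so that later searches do not stop early at the gap. Adding 1, 9, and 17 to an 8-slot table builds a cluster in slots 1 to 3, and 2 then lands in slot 4 because its home slot is inside that cluster.

```java
// Hypothetical open-addressed set using linear collision processing.
class LinearProbingSet {
    private final Object[] table;

    LinearProbingSet(int capacity) {
        table = new Object[capacity];
    }

    // Walks from the home index, wrapping around at the end, until it finds
    // obj or an empty slot. Returns -1 if every slot holds another element.
    private int probe(Object obj) {
        int index = Math.abs(obj.hashCode() % table.length);
        for (int i = 0; i < table.length; i++) {
            if (table[index] == null || table[index].equals(obj)) {
                return index;
            }
            index = (index + 1) % table.length;
        }
        return -1;
    }

    boolean contains(Object obj) {
        int index = probe(obj);
        return index >= 0 && table[index] != null;
    }

    boolean add(Object obj) {
        int index = probe(obj);
        if (index < 0) throw new IllegalStateException("table is full");
        if (table[index] != null) return false;   // already present
        table[index] = obj;
        return true;
    }

    // Slot holding obj, or -1 if absent (for inspecting clusters)
    int slotOf(Object obj) {
        return contains(obj) ? probe(obj) : -1;
    }
}
```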
Collision Processing Strategies
- Rehashing: run one or more additional hash functions until a collision does not occur
- Works well when the load factor is small
- Multiple hash functions may contribute a large constant of proportionality to the running time
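One concrete way to supply "additional hash functions" is double hashing, swapped in here as an illustration rather than as the method these slides assume: the r-th attempt uses the index (h1 + r · h2) % capacity, so each attempt behaves like a new hash function. The functions h1 and h2 below are hypothetical choices; h2 is kept in [1, capacity - 1] so probing always moves, and a prime capacity (assumed) lets the sequence visit every slot.

```java
// Hypothetical rehashing sketch using double hashing.
final class DoubleHashing {
    // First hash: the home index
    static int h1(Object obj, int capacity) {
        return Math.abs(obj.hashCode() % capacity);
    }

    // Second hash: a step size in [1, capacity - 1], never 0
    static int h2(Object obj, int capacity) {
        return 1 + Math.abs(obj.hashCode() % (capacity - 1));
    }

    // Index tried on the given attempt (attempt 0 is the home index)
    static int probeIndex(Object obj, int attempt, int capacity) {
        return (int) ((h1(obj, capacity) + (long) attempt * h2(obj, capacity)) % capacity);
    }

    // Stores obj in the first empty slot along its probe sequence
    static int insert(Object[] table, Object obj) {
        for (int attempt = 0; attempt < table.length; attempt++) {
            int index = probeIndex(obj, attempt, table.length);
            if (table[index] == null) {
                table[index] = obj;
                return index;
            }
        }
        throw new IllegalStateException("no empty slot found");
    }
}
```

With capacity 11, the Integers 3, 14, and 25 all have home index 3, but their step sizes (4, 5, and 6) send 14 to slot 8 and 25 to slot 9 instead of forming a cluster.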
Collision Processing Strategies
- Quadratic collision processing: move a considerable distance from the initial collision position
- Does not require other rehashing functions
- When k is the collision position, we enter a loop that repeatedly attempts to locate an empty position (each index taken modulo the array's capacity):
      k + 1^2   // the first attempt to locate a position
      k + 2^2   // the second attempt to locate a position
      ...
      k + r^2   // the rth attempt to locate a position
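The quadratic loop can be sketched as follows, assuming an Object[] table (the class and method names are hypothetical). A well-known property worth noting: if the capacity is prime and the table is at least half empty, this probe sequence is guaranteed to find an empty slot. With slots 3 and 4 occupied in an 11-slot table, an element whose home index is 3 tries 3, then 3 + 1^2 = 4, then 3 + 2^2 = 7.

```java
// Hypothetical quadratic collision processing over an Object[] table.
final class QuadraticProbing {
    // Tries k, k + 1^2, k + 2^2, ..., k + r^2 (all modulo the capacity) and
    // returns the first empty slot, or -1 if none is found.
    static int findEmpty(Object[] table, Object obj) {
        int capacity = table.length;
        int k = Math.abs(obj.hashCode() % capacity);
        for (long r = 0; r < capacity; r++) {
            int index = (int) ((k + r * r) % capacity);   // long avoids overflow
            if (table[index] == null) return index;
        }
        return -1;
    }
}
```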
Collision Processing Strategies
- Chaining
  - Each hash value specifies an index, or bucket, in the array
  - This bucket holds the head of a linked list, or chain, of items that map to the same index
Some Buckets and Chains
[Figure: an array of buckets at indexes 0 through 4, with their chains of items]
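Buckets and chains can be sketched with java.util.LinkedList as each chain, a shortcut assumed here for brevity (the class ChainedBuckets and its chainAt method are hypothetical). With five buckets, the Integers 2, 7, and 12 all land in bucket 2, and because new items go at the head, that chain reads 12, 7, 2.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical chaining sketch: one LinkedList per bucket.
class ChainedBuckets {
    private final List<LinkedList<Object>> buckets = new ArrayList<>();

    ChainedBuckets(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private LinkedList<Object> bucketFor(Object obj) {
        return buckets.get(Math.abs(obj.hashCode() % buckets.size()));
    }

    boolean contains(Object obj) {
        return bucketFor(obj).contains(obj);
    }

    boolean add(Object obj) {
        LinkedList<Object> chain = bucketFor(obj);
        if (chain.contains(obj)) return false;
        chain.addFirst(obj);   // new items go at the head of the chain
        return true;
    }

    List<Object> chainAt(int index) {
        return buckets.get(index);
    }
}
```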
HashSetPT Data
    // Temporary variables
    private Entry foundEntry;    // entry just located
                                 // undefined if not found
    private Entry priorEntry;    // entry prior to one just located
                                 // undefined if not found
    private int index;           // index of chain in which entry located
                                 // undefined if not found

    // Instance variables
    private int capacity;        // size of table
    private Entry[] table;       // the table of collision lists
    private int size;            // number of entries in the set
- The temporary variables are instance fields set by contains; they support pointer manipulations during insertions and removals
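The HashSetPT code relies on an Entry class that the slides never show. Here is a minimal sketch of one, assuming only what that code uses: the fields item and next and a constructor Entry(item, next). The exact definition in the original source may differ.

```java
// Hypothetical Entry node matching how the HashSetPT code uses it.
class Entry {
    Object item;   // the element stored in this node
    Entry next;    // next node in the same chain, or null at the end

    Entry(Object item, Entry next) {
        this.item = item;
        this.next = next;
    }
}
```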
HashSetPT Initialization
    public HashSetPT(){
        capacity = DEFAULT_CAPACITY;
        clear();
    }

    public void clear(){
        size = 0;
        table = new Entry[capacity];
    }
HashSetPT Searching
    public boolean contains(Object item){
        // Remainder first, so the index is never negative
        index = Math.abs(item.hashCode() % capacity);
        priorEntry = null;
        foundEntry = table[index];
        while (foundEntry != null){
            if (foundEntry.item.equals(item))
                return true;
            else{
                priorEntry = foundEntry;
                foundEntry = foundEntry.next;
            }
        }
        return false;
    }
HashSetPT Insertion
    public boolean add(Object item){
        if (!contains(item)){
            // contains() has set index to this item's chain
            Entry newEntry = new Entry(item, table[index]);
            table[index] = newEntry;     // link to head of chain
            size++;
            return true;
        }
        else
            return false;
    }
HashSetPT Removal
    public boolean remove(Object item){
        if (!contains(item))
            return false;
        if (priorEntry == null)
            table[index] = foundEntry.next;     // remove head of chain
        else
            priorEntry.next = foundEntry.next;  // bypass the found entry
        size--;
        return true;
    }
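For experimentation, the HashSetPT fragments can be assembled into one runnable class. This is a sketch under stated assumptions: DEFAULT_CAPACITY = 10, the nested Entry class, and the size() accessor are not shown on the slides and are hypothetical. With capacity 10, the Integers 3, 13, and 23 share chain 3, which exercises both removal cases: unlinking an interior entry through priorEntry and unlinking the head of a chain.

```java
// Hypothetical assembly of the HashSetPT fragments into one class.
class HashSetPT {
    private static final int DEFAULT_CAPACITY = 10;   // assumed value

    // Chain node: an item plus a link to the next node in the same chain
    private static class Entry {
        Object item;
        Entry next;

        Entry(Object item, Entry next) {
            this.item = item;
            this.next = next;
        }
    }

    // Temporary variables, set by contains()
    private Entry foundEntry;
    private Entry priorEntry;
    private int index;

    // Instance variables
    private int capacity;
    private Entry[] table;
    private int size;

    public HashSetPT() {
        capacity = DEFAULT_CAPACITY;
        clear();
    }

    public void clear() {
        size = 0;
        table = new Entry[capacity];
    }

    public int size() {
        return size;
    }

    public boolean contains(Object item) {
        index = Math.abs(item.hashCode() % capacity);
        priorEntry = null;
        foundEntry = table[index];
        while (foundEntry != null) {
            if (foundEntry.item.equals(item)) return true;
            priorEntry = foundEntry;
            foundEntry = foundEntry.next;
        }
        return false;
    }

    public boolean add(Object item) {
        if (contains(item)) return false;
        table[index] = new Entry(item, table[index]);   // link at head of chain
        size++;
        return true;
    }

    public boolean remove(Object item) {
        if (!contains(item)) return false;
        if (priorEntry == null) table[index] = foundEntry.next;   // unlink head
        else priorEntry.next = foundEntry.next;                   // unlink interior
        size--;
        return true;
    }
}
```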
Performance of Chaining
- If chains are evenly distributed across the array, close to O(1)
- If one or two chains get very long, processing tends toward linear
- Can use a large array, but this wastes memory
- On average and for the most part, close to O(1)
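The gap between the even and the skewed case can be measured with a small experiment (the names below are hypothetical). It hashes the keys 0 to 99 into 10 chains and reports the longest chain: the identity hash spreads them evenly (10 per chain), while the deliberately poor hash key * 10 sends all 100 keys into chain 0, so a search there becomes linear.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical experiment: longest chain produced by a given hash function.
final class ChainStats {
    // Hashes keys 0..n-1 into `capacity` chains and returns the longest chain
    static int longestChain(int n, int capacity, IntUnaryOperator hash) {
        int[] chainLength = new int[capacity];
        int longest = 0;
        for (int key = 0; key < n; key++) {
            int index = Math.abs(hash.applyAsInt(key) % capacity);
            chainLength[index]++;
            longest = Math.max(longest, chainLength[index]);
        }
        return longest;
    }
}
```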