Implementation of Unordered Collections - PowerPoint PPT Presentation

Provided by: kennetha47
Transcript and Presenter's Notes
1
Implementation of Unordered Collections
2
Implementations of Unordered Collections
  • Fast searching is critical
  • Sorted List
  • Logarithmic searches
  • Linear insertions and removals

3
Implementations of Unordered Collections
  • Fast searching is critical
  • Binary search tree
  • Logarithmic searches
  • Logarithmic insertions and removals
  • But guaranteed only if tree balancing is
    maintained

4
Implementations of Unordered Collections
  • Fast searching is critical
  • Hashing
  • Constant-time searches, insertions, and removals,
    for the most part

5
Hashing
  • Ideally, each element has a unique hash value
  • This value is computed in constant time by a hash
    function
  • This computation is performed on each insertion,
    access, and removal

6
How Are the Elements Stored?
  • The hash value is used to locate the element's
    index in an array, thus preserving constant-time
    access
  • How to compute this:

hashValue % capacity of array
Position will be >= 0 and < capacity
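The index computation can be sketched as follows. This is a minimal illustration, not code from the slides: the class name IndexDemo and both method names are hypothetical. The first method mirrors the Math.abs approach used later in the presentation; the second swaps in Math.floorMod, because Math.abs(Integer.MIN_VALUE) is still negative in Java and can therefore yield a negative position.

```java
// Sketch: map an arbitrary hash value to a valid array position.
final class IndexDemo {
    // Same idea as the slides: absolute value, then remainder.
    static int indexFor(Object item, int capacity) {
        return Math.abs(item.hashCode()) % capacity;
    }

    // Assumed alternative (not from the slides): floorMod never returns a
    // negative result when capacity is positive.
    static int safeIndexFor(Object item, int capacity) {
        return Math.floorMod(item.hashCode(), capacity);
    }

    public static void main(String[] args) {
        int capacity = 16; // illustrative capacity
        for (String s : new String[] {"A", "B", "C", "D"}) {
            System.out.println(s + " -> " + indexFor(s, capacity));
        }
    }
}
```

The positions printed here depend on String.hashCode() and need not match the indexes drawn on the slides.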
7
A Sample Access Method
boolean contains(Object obj) {
    int index = Math.abs(obj.hashCode()) % table.length;
    return table[index] != null;
}
  • table is an array of objects
  • table.length is the array's current physical
    size
  • hashCode() is a method that returns an object's
    hash value
  • Other access methods have a similar structure

8
A Sample Mutator Method
boolean add(Object obj) {
    int index = Math.abs(obj.hashCode()) % table.length;
    table[index] = obj;
    return true;
}
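To see what these two methods do together, here is a small runnable sketch. The class name NaiveHashSet and the capacity of 16 are assumptions for illustration. It deliberately keeps the simple logic shown above, so it only behaves correctly while no two items map to the same position.

```java
// Sketch of the simple access and mutator methods, with no collision handling.
final class NaiveHashSet {
    private final Object[] table = new Object[16]; // capacity is an assumption

    boolean add(Object obj) {
        int index = Math.abs(obj.hashCode()) % table.length;
        table[index] = obj;           // overwrites whatever was stored here
        return true;
    }

    boolean contains(Object obj) {
        int index = Math.abs(obj.hashCode()) % table.length;
        return table[index] != null;  // true for any item that maps to a used slot
    }
}
```

With this capacity, "A" and "Q" map to the same position, so after add("A") a call to contains("Q") also answers true. That is the limitation the rest of the presentation addresses.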
9
Adding Items
mySet.add("A")
index 10
10
Adding Items
mySet.add("B")
index 5
11
Adding Items
mySet.add("C")
index 0
12
Adding Items
mySet.add("D")
index 14
13
Adding Items
Add 12 more items
14
Adding Items
Array is full: resize the array and rehash all
elements
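Resizing and rehashing might look like the sketch below. The class name Rehash, the method name grow, and the choice to double the length are assumptions; the slides only say that the array is resized and every element rehashed. Like the slides at this point, the sketch assumes no two elements collide.

```java
// Sketch: build a larger table and re-place every element in it.
final class Rehash {
    // Doubling the length is an assumed growth policy.
    static Object[] grow(Object[] old) {
        Object[] bigger = new Object[old.length * 2];
        for (Object obj : old) {
            if (obj != null) {
                // The index must be recomputed because the capacity changed.
                int index = Math.abs(obj.hashCode()) % bigger.length;
                bigger[index] = obj;
            }
        }
        return bigger;
    }
}
```

An element's position generally changes after growth, which is why elements cannot simply be copied across at their old indexes.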
15
Performance
  • O(1) lookups, insertions, removals - wow!
  • Cost of resizing the array is amortized over many
    insertions and removals
  • Works as long as items don't have the same hash
    values

16
Problem: Collisions
  • As more elements fill the array, the likelihood
    of their having the same hash value increases
  • What happens when two elements have the same hash
    value?
  • A collision, that is, they compete for the same
    position in the array

17
Load Factor
  • An array's load factor expresses the ratio of the
    number of elements to its capacity
  • Example: elements (10) / length (30) = .3333
  • Arrays are more efficient than linked structures
    when load factor < .5
  • Try to keep load factor low to minimize collisions
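The load factor calculation is a single division. The sketch below reproduces the example from the slide; the class name, the method names, and the idea of comparing against a caller-supplied threshold are assumptions for illustration.

```java
// Sketch: load factor = number of elements / capacity.
final class LoadFactor {
    static double loadFactor(int size, int capacity) {
        return (double) size / capacity; // cast avoids integer division
    }

    // Hypothetical helper: signals that the table should grow once the
    // load factor exceeds a threshold chosen by the caller.
    static boolean shouldResize(int size, int capacity, double threshold) {
        return loadFactor(size, capacity) > threshold;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(10, 30)); // the slide's example, about .3333
    }
}
```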

18
Collision Processing Strategies
  • Linear collision processing - search for the next
    available empty slot in the array, wrapping
    around if the end is reached
  • Can lead to clustering, where several elements
    that have collided now occupy consecutive
    positions
  • Several small clusters may coalesce into a large
    cluster and thus degrade performance
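A minimal sketch of linear collision processing, under a few assumptions: the class and method names are hypothetical, the method only finds a slot (it does not store anything), and Math.floorMod is used in place of Math.abs so the home position is never negative.

```java
// Sketch: linear probing for the next empty slot, with wraparound.
final class LinearProbe {
    // Returns the first empty index at or after the item's home position,
    // or -1 if the table is full.
    static int findSlot(Object[] table, Object item) {
        int home = Math.floorMod(item.hashCode(), table.length);
        for (int i = 0; i < table.length; i++) {
            int index = (home + i) % table.length; // wrap around at the end
            if (table[index] == null) {
                return index;
            }
        }
        return -1;
    }
}
```

Because colliding items land in adjacent slots, a run of occupied positions lengthens every later probe that starts inside it, which is the clustering effect described above.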

19
Collision Processing Strategies
  • Rehashing - run one or more additional hash
    functions until a collision does not occur
  • Works well when the load factor is small
  • Multiple hash functions may contribute a large
    constant of proportionality to the running time
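One way to picture rehashing is to try a list of hash functions in order until one lands on an empty slot. This is only a sketch: the class name, the method name, and the representation of hash functions as ToIntFunction values are assumptions, not something the slides specify.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Sketch: apply additional hash functions until no collision occurs.
final class RehashProbe {
    // Returns the first empty index produced by the supplied functions,
    // or -1 if every one of them collides.
    static int findSlot(Object[] table, Object item,
                        List<ToIntFunction<Object>> hashes) {
        for (ToIntFunction<Object> h : hashes) {
            int index = Math.floorMod(h.applyAsInt(item), table.length);
            if (table[index] == null) {
                return index;
            }
        }
        return -1;
    }
}
```

Each extra function adds a full hash computation per attempt, which is the constant-factor cost noted above.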

20
Collision Processing Strategies
  • Quadratic collision processing - Move a
    considerable distance from the initial collision
  • Does not require other rehashing functions
  • When k is the collision position, we enter a loop
    that repeatedly attempts to locate an empty
    position

k + 1^2   // The first attempt to locate a position
k + 2^2   // The second attempt to locate a position
k + r^2   // The rth attempt to locate a position
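The probing loop can be sketched as follows. The class and method names are hypothetical, and wrapping each candidate position with the remainder operator is an assumption, since the slide shows only the offsets. Here k is the collision position and the rth attempt examines k plus r squared.

```java
// Sketch: quadratic probing from collision position k.
final class QuadraticProbe {
    // Tries k + 1*1, k + 2*2, ..., k + r*r (each wrapped to the table length)
    // and returns the first empty index, or -1 if none is found.
    static int findSlot(Object[] table, int k) {
        for (int r = 1; r <= table.length; r++) {
            // long arithmetic avoids overflow of r * r for large tables
            int index = (int) ((k + (long) r * r) % table.length);
            if (table[index] == null) {
                return index;
            }
        }
        return -1;
    }
}
```

Unlike linear probing, this sequence does not necessarily visit every position, so the search can fail even when the table still has an empty slot.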
21
Collision Processing Strategies
  • Chaining
  • Each hash value specifies an index or bucket in
    the array
  • This bucket is at the head of a linked list or
    chain of items with the same hash value

22
Some Buckets and Chains
[Figure: array indexes 0 through 4 with their buckets and chains]
23
HashSetPT Data
// Temporary variables
private Entry foundEntry;   // entry just located
                            // undefined if not found
private Entry priorEntry;   // entry prior to one just located
                            // undefined if not found
private int index;          // index of chain in which entry located
                            // undefined if not found

// Instance variables
private int capacity;       // size of table
private Entry[] table;      // the table of collision lists
private int size;           // number of entries in the map

Temporary global variables support pointer
manipulations during insertions and removals
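The slides use an Entry type without showing its definition. A minimal sketch, assuming the shape implied by how the other methods use it (an item field, a next field, and a two-argument constructor), could be:

```java
// Hypothetical node class; its fields and constructor are inferred from usage.
final class Entry {
    Object item;   // the element stored in this node
    Entry next;    // the next node in the same chain, or null at the end

    Entry(Object item, Entry next) {
        this.item = item;
        this.next = next;
    }
}
```

Passing the current head of a chain as the second argument makes the new node the head, which is how an insertion at the front of a chain works.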
24
HashSetPT Initialization
public HashSetPT() {
    capacity = DEFAULT_CAPACITY;
    clear();
}

public void clear() {
    size = 0;
    table = new Entry[capacity];
}
25
HashSetPT Searching
public boolean contains(Object item) {
    index = Math.abs(item.hashCode()) % capacity;
    priorEntry = null;
    foundEntry = table[index];
    while (foundEntry != null) {
        if (foundEntry.item.equals(item))
            return true;
        else {
            priorEntry = foundEntry;
            foundEntry = foundEntry.next;
        }
    }
    return false;
}
26
HashSetPT Insertion
public boolean add(Object item) {
    if (!contains(item)) {
        Entry newEntry = new Entry(item, table[index]);   // Link to head of chain
        table[index] = newEntry;
        size++;
        return true;
    } else
        return false;
}
27
HashSetPT Removal
public boolean remove(Object item) {
    if (!contains(item))
        return false;
    else {
        if (priorEntry == null)
            table[index] = foundEntry.next;
        else
            priorEntry.next = foundEntry.next;
        size--;
        return true;
    }
}
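For readers who want to run the chained set end to end, the pieces from the preceding slides can be assembled roughly as below. Several details are assumptions rather than part of the slides: the nested Entry class, the DEFAULT_CAPACITY value of 16, and the size() accessor added for the demonstration.

```java
// Assembled sketch of the slides' HashSetPT.
class HashSetPT {
    private static final int DEFAULT_CAPACITY = 16; // assumed value

    private static class Entry {                    // assumed shape
        Object item;
        Entry next;
        Entry(Object item, Entry next) { this.item = item; this.next = next; }
    }

    private Entry foundEntry, priorEntry;  // set as a side effect of contains()
    private int index;                     // chain index set by contains()
    private int capacity;
    private Entry[] table;
    private int size;

    public HashSetPT() { capacity = DEFAULT_CAPACITY; clear(); }

    public void clear() { size = 0; table = new Entry[capacity]; }

    public int size() { return size; }     // accessor added for the demo

    public boolean contains(Object item) {
        index = Math.abs(item.hashCode()) % capacity;
        priorEntry = null;
        foundEntry = table[index];
        while (foundEntry != null) {
            if (foundEntry.item.equals(item)) return true;
            priorEntry = foundEntry;
            foundEntry = foundEntry.next;
        }
        return false;
    }

    public boolean add(Object item) {
        if (contains(item)) return false;
        table[index] = new Entry(item, table[index]); // link to head of chain
        size++;
        return true;
    }

    public boolean remove(Object item) {
        if (!contains(item)) return false;
        if (priorEntry == null) table[index] = foundEntry.next; // removing the head
        else priorEntry.next = foundEntry.next;                 // unlinking mid-chain
        size--;
        return true;
    }

    public static void main(String[] args) {
        HashSetPT set = new HashSetPT();
        set.add("A");
        set.add("Q");   // "A" and "Q" share a chain when capacity is 16
        System.out.println(set.contains("Q"));
        set.remove("A");
        System.out.println(set.contains("A") + " " + set.size());
    }
}
```

Note that add and remove rely on contains having just run, since it leaves index, priorEntry, and foundEntry pointing at the right place in the chain.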
28
Performance of Chaining
  • If chains are evenly distributed across the
    array, close to O(1)
  • If one or two chains get very long, processing
    tends toward linear
  • Can use a larger array, but that wastes memory
  • On the average and for the most part, close to
    O(1)