Chapter 11 Searching presentation

1
Chapter 11 Searching

CS 260 Data Structures
Indiana University Purdue University Fort Wayne
Mark Temte

2
Chapter outline

Serial search
Binary search
Search by hashing
Open-address hashing
Hash functions
Double hashing
Chained hashing
Analysis of hashing
All search methods considered are array searches

3
Serial search

This is also known as a . . .
Linear search
Sequential search
Goal
Look for a target value in a[ first .. ( first + n - 1 ) ]
A search method typically might return
( first + i ) for success
-1 to indicate failure

    int i;
    for ( i = 0; ( i < n ) && ( a[ first + i ] != target ); i++ )
        ;
    // loop ended
    if ( ( i < n ) && ( a[ first + i ] == target ) )
        < success at ( first + i ) >
    else
        < failure >
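
The serial search loop above can be written as a small runnable method. This is a minimal sketch; the class name SerialSearch and the method signature are assumptions made for illustration and do not come from the slides.

```java
// Hypothetical helper class; the name SerialSearch is not from the slides.
final class SerialSearch {
    /**
     * Searches a[first .. first + n - 1] for target.
     * Returns the index of the first match, or -1 to indicate failure.
     */
    static int search(int[] a, int first, int n, int target) {
        int i = 0;
        // Stop at the first match or after n elements have been examined.
        while (i < n && a[first + i] != target) {
            i++;
        }
        return (i < n) ? first + i : -1;
    }
}
```

For example, with a = { 4, 8, 15, 16, 23, 42 }, the call SerialSearch.search( a, 2, 3, 23 ) examines a[ 2..4 ] and returns 4, while a search for 7 returns -1.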
4
Serial search

Analysis
Best case
Success on the first access
O( 1 ) constant performance
Worst case
Failure
O( n ) linear performance
Average case
Assume success equally likely at each position
O( n ) linear performance

Total accesses over all positions = 1 + 2 + . . . + n = n(n+1)/2
Average accesses = ( total accesses ) / ( number of positions ) = ( n(n+1)/2 ) / n = (n+1)/2
5
Binary search

Binary search is often written as a recursive
method
The following version is easier to remember and
code correctly than the version in the text

    public static int binarySearch( int[ ] a, int first, int last, int target )
    {
        int mid = ( first + last ) / 2;
        if ( first > last )
            return -1;
        if ( target < a[ mid ] )
            return binarySearch( a, first, mid - 1, target );
        else if ( target == a[ mid ] )
            return mid;
        else
            return binarySearch( a, mid + 1, last, target );
    }
6
Binary search

Recall the precondition
The array must be sorted before the binary search
may be used
Analysis
O( log(n) ) logarithmic performance
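
To make the logarithmic bound concrete, here is an iterative variant of binary search together with a helper that counts the worst-case number of probes. This is a sketch, not the version from the slides or the text: the class name BinarySearchProbes, the helper maxProbes, and the overflow-safe midpoint are assumptions added for illustration.

```java
// Hypothetical class; an iterative variant, not the recursive slide version.
final class BinarySearchProbes {
    /** Searches sorted a[first..last] for target; returns its index or -1. */
    static int search(int[] a, int first, int last, int target) {
        while (first <= last) {
            // Same midpoint as (first + last) / 2, but cannot overflow int.
            int mid = first + (last - first) / 2;
            if (target < a[mid]) {
                last = mid - 1;
            } else if (target > a[mid]) {
                first = mid + 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    /** Worst-case probes for n elements: floor(log2(n)) + 1, for n >= 1. */
    static int maxProbes(int n) {
        int probes = 0;
        while (n > 0) {
            n /= 2;      // each probe halves the remaining range
            probes++;
        }
        return probes;
    }
}
```

Here maxProbes( 1000 ) evaluates to 10 and maxProbes( 1000000 ) to 20, compared with up to n accesses for a serial search.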

7
Search by hashing

Hashing is a search technique with average O(1)
performance used to search a key-value table
A key-value table is also known as a . . .
Dictionary
Map
Associative array
A hash function associates every possible key
with a position in the array
The hash function must be easy to compute
To search for a key-value pair, the hash function
is applied to the key and the resulting position
in the array is accessed

8
Search by hashing

Not only is the average hashing performance
constant, but it is also efficient to add and
remove key-value pairs
The hash function has the form
    private int hash( <key type> key )
The integer returned by the hash function must be
a valid array index

9
Example of a hash function

Let class Pair represent a key-value pair object
The table is the array table defined by
    Pair[ ] table = new Pair[ 1000 ];
Each key is an employee social security number
This is a String of characters of the form
999-99-9999
The hash function maps the social security number
to the array index defined by the last three
digits of the social security number
This integer is a valid array index in the range
0..999
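
The hash function described on this slide might look like the following. It is a minimal sketch that assumes the key is always a well-formed String of the form 999-99-9999; the class name SsnHash is hypothetical.

```java
// Hypothetical class illustrating the social security number example.
final class SsnHash {
    /** Maps "999-99-9999" to the int formed by its last three digits (0..999). */
    static int hash(String ssn) {
        // Assumes a well-formed key; no validation is attempted here.
        return Integer.parseInt(ssn.substring(ssn.length() - 3));
    }
}
```

SsnHash.hash( "123-45-6789" ) returns 789. Any other number ending in 789 maps to the same index, which is the collision problem taken up on the following slides.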

10
Search by hashing

The ideal situation is to store the key-value
pair in
    table[ hash( key ) ]
Problem: the possibility of a collision
Also called a hash clash
A collision is when
    key1 != key2 but hash( key1 ) == hash( key2 )
It is not usually possible to obtain a perfect
hash function
How we resolve this problem leads to various
special hashing techniques
Open-address hashing
Double hashing
Chained hashing

11
Collision example

Hash your birthday to the range 0..364
Ignore leap year
Question
In a classroom with 23 students, what is the
probability of the students having at least one
collision?

Answer
Greater than 50%
So, with an array loading factor of only about 6%
(23/365), there is more than a 50-50 chance of a collision
Collisions are almost guaranteed to happen
They must be handled in an efficient manner
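
The birthday answer can be checked numerically. The sketch below assumes every slot is equally likely and computes 1 minus the probability that all keys land in distinct slots; the class and method names are hypothetical.

```java
// Hypothetical class; assumes a uniform distribution over the slots.
final class BirthdayCollision {
    /** Probability that at least two of `people` keys share one of `slots` slots. */
    static double probability(int people, int slots) {
        double allDistinct = 1.0;
        for (int k = 0; k < people; k++) {
            // The (k+1)-th key must avoid the k slots already taken.
            allDistinct *= (double) (slots - k) / slots;
        }
        return 1.0 - allDistinct;
    }
}
```

BirthdayCollision.probability( 23, 365 ) is about 0.507, while 22 students give a value just under 0.5.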

12
Open-address hashing

The open-address hashing technique resolves
collisions using linear probing
For linear probing, establish a sequence of
predetermined alternate locations to use in the
event of a collision
    Let L0 = hash( key )
    If L0 is occupied, use a series of alternate locations L1, L2, L3, . . .
    Alternate L(p+1) is defined by L(p+1) = ( L(p) + 1 ) % table.length
Note that this wraps around the array if
necessary
Linear probing uses the first available open
location
Alternate locations are tried in order
The sequence of alternates is needed in the event
there are collisions at some of the alternate
locations

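
The linear probing sequence, including the wrap-around, can be generated as follows. This is an illustrative sketch; the class name LinearProbe and the choice to return the whole sequence as an array are assumptions.

```java
// Hypothetical class that lists the linear probing sequence for one key.
final class LinearProbe {
    /** Returns L0, L1, ..., visiting every location of the table exactly once. */
    static int[] sequence(int start, int tableLength) {
        int[] seq = new int[tableLength];
        int location = start;                 // L0 = hash( key )
        for (int p = 0; p < tableLength; p++) {
            seq[p] = location;
            location = (location + 1) % tableLength; // wraps around to 0
        }
        return seq;
    }
}
```

For a table of length 9 and L0 = 7, the sequence is 7, 8, 0, 1, 2, 3, 4, 5, 6.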
13
Example

Consider a hypothetical hash function at left
Build the table in the order
A, B, C, D, E, F, G
Search for D
Success at location 1
Search for E
Success at location 7
Search for H with hash( H ) = 3
Failure at location 5
Delete C and search for G
The search ends in failure at location 3 unless we
know to skip over location 3

[Figure: the array table, indices 0..8, with columns keys and values]
14
Open-address hashing

To handle deletions . . .
Need to mark each location as one of . . .
has been used
has not been used
For this purpose, add a new boolean instance
variable hasBeenUsed to the Pair class
Now the search for G has the information to skip
over deleted location 3 and succeed at location 4

[Figure: the array table, indices 0..8, with columns keys, values, and hasBeenUsed]
15
Open-address hashing

The open-address hashing algorithm for searching
is to use linear probing until . . .
The key is found
Success
Or until
    table[ Lp ].hasBeenUsed == false
Failure
To reduce the number of collisions, the maximum
number of items to be placed in the table needs
to be known in advance
The capacity of the array must be set to a size
somewhat larger

16
The hashCode( ) method

Every Java class inherits method hashCode( )
This method maps any key object to an int
The resulting int must subsequently be mapped to
the range 0 . . (table.length-1) by a method
hash( ) supplied by the programmer

    table[ hash( key.hashCode( ) ) ]
    key is anObject
    key.hashCode( ) maps the object to an int
    hash( ) maps the int to an array index (your choice)
17
Not using Java?

If the given language does not have a method such
as hashCode( ), a replacement method must be
implemented
No problem if the key is already an integer
Otherwise, use the data in a non-integer key to
obtain an integer in some other way
Perhaps use the integer ASCII codes of a
character string to build an integer reflecting
the differences in Strings
Any data can be viewed as a bit string in
assembly language if necessary

18
Constructing hash( ) methods

Assume that the key has already been converted to
an int using hashCode( ) or some other method
The hash( ) method used to map the int to a valid
array index should . . .
Be efficient to compute with O(1)
Distribute the keys evenly throughout the array
Use all key information
Break up natural clusters of keys

19
Constructing hash( ) methods

A very good hash method is known as division
    hash( key ) = Math.abs( key ) % table.length
This method satisfies the first three criteria
for a good hash function
However, it does not break up natural clusters of
keys
Nearby keys keep their relative positions except
when one key wraps around and the other does not

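
A short sketch of the division method follows. The first method mirrors the slide formula. The second is an alternative added here as an assumption, not something the slides propose: it uses Math.floorMod, because Math.abs( Integer.MIN_VALUE ) is itself negative in Java and so the slide formula can yield a negative index for that one key. The class and method names are hypothetical.

```java
// Hypothetical class comparing two ways to write the division method.
final class DivisionHash {
    /** Division method as on the slide: Math.abs( key ) % tableLength. */
    static int hashAbs(int key, int tableLength) {
        return Math.abs(key) % tableLength;
    }

    /** Variant (an assumption of this sketch): never negative for tableLength > 0. */
    static int hashFloorMod(int key, int tableLength) {
        return Math.floorMod(key, tableLength);
    }
}
```

With a table length of 1000, the nearby keys 1001, 1002, and 1003 map to 1, 2, and 3, so the cluster survives intact. DivisionHash.hashAbs( Integer.MIN_VALUE, 1000 ) evaluates to -648, whereas hashFloorMod gives 352. Note that the two methods map other negative keys to different indices, so a table should use one of them consistently.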
20
Constructing hash( ) methods

Another hash method is multiplication
    Let M = ( √5 - 1 ) / 2 = 0.6180339887 . . .
    hash( key ) = ( int ) ( arrayCapacity * < fractional part of M * key > )
Still another is called mid-square
    hash( key ) = < extract some middle digits or bits from ( key )^2 >
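
Both methods can be sketched in a few lines. The slides leave the details open, so several choices below are assumptions: the key is made non-negative before multiplying, and the mid-square method extracts a block of bits starting at bit 16 of the 64-bit square, which suits a table whose capacity is a power of two. The class and method names are hypothetical.

```java
// Hypothetical class; the bit positions used by midSquare are one possible choice.
final class OtherHashes {
    static final double M = (Math.sqrt(5.0) - 1.0) / 2.0; // about 0.6180339887

    /** Multiplication method: scale the fractional part of M * key by the capacity. */
    static int multiplication(int key, int arrayCapacity) {
        double product = M * Math.abs((double) key);
        double fraction = product - Math.floor(product); // in [0, 1)
        return (int) (arrayCapacity * fraction);
    }

    /** Mid-square method: take `bits` bits from the middle of key squared. */
    static int midSquare(int key, int bits) {
        long square = (long) key * key;           // cannot overflow a long
        return (int) ((square >>> 16) & ((1L << bits) - 1));
    }
}
```

With arrayCapacity = 1000, the consecutive keys 1, 2, and 3 map to 618, 236, and 854, which shows how multiplication breaks up a natural cluster. OtherHashes.midSquare( 1234, 10 ) returns 23, an index in the range 0..1023.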
21
The Table class

This is a class for a key-value table ADT
Instead of defining a Pair class and having an
array of Pair objects, we will use parallel
arrays for keys, data, and hasBeenUsed
State

    private int manyItems;
    private Object[ ] keys;
    private Object[ ] data;
    private boolean[ ] hasBeenUsed;
22
The Table class

Behavior
Table( capacity )
Inefficient to change the capacity dynamically
size( )
capacity( )
put( key, value )
containsKey( key )
get( key )
remove( key )

23
The ADT invariant of the Table class

The number of elements in the table is in the
instance variable manyItems.
The preferred location for an element with a
given key is at index hash( key ). If a collision
occurs, then a circular array search is performed
in the forward direction to find the next open
position. When an open position is found at index i,
then the element itself is placed in data[ i ]
and the element's key is placed in keys[ i ].
An index i that is not currently used has data[ i ]
and keys[ i ] set to null.
If an index i has been used at some point (now or
in the past), then hasBeenUsed[ i ] is true;
otherwise it is false.

24
The Table class

Private helper methods
hash( key )
nextIndex( index )
findIndex( key )

    private int hash( Object key )
    {
        return Math.abs( key.hashCode( ) ) % data.length;
    }

    private int nextIndex( int index )
    {
        if ( index + 1 == data.length )
            return 0;
        else
            return index + 1;
    }
25
The Table class
    private int findIndex( Object key )
    {
        int count = 0;
        int i = hash( key );
        while ( ( count < data.length ) && hasBeenUsed[ i ] )
        {
            if ( key.equals( keys[ i ] ) )
                return i;
            count++;
            i = nextIndex( i );
        }
        return -1;
    }

Note the variable count is needed when the key
is not in the table and every position has been
used
The search will terminate after every cell has
been examined

26
The Table class
    public Object get( Object key )
    {
        int index = findIndex( key );
        if ( index == -1 )
            return null;
        else
            return data[ index ];
    }

If the search for key fails, the method returns
null
Otherwise, it returns the data associated with
the key

27
    public Object put( Object key, Object element )
    {
        int index = findIndex( key );
        Object answer;
        if ( index != -1 )
        {   // The key is already in the table.
            answer = data[ index ];
            data[ index ] = element;
            return answer;
        }
        else if ( manyItems < data.length )
        {   // The key is not yet in this Table.
            index = hash( key );
            while ( keys[ index ] != null )
                index = nextIndex( index );
            keys[ index ] = key;
            data[ index ] = element;
            hasBeenUsed[ index ] = true;
            manyItems++;
            return null;
        }
        else
        {   // The table is full.
            throw new IllegalStateException( "Table is full." );
        }
    }
28
The Table class
    public Object remove( Object key )
    {
        int index = findIndex( key );
        Object answer = null;
        if ( index != -1 )
        {
            answer = data[ index ];
            keys[ index ] = null;
            data[ index ] = null;
            manyItems--;
        }
        return answer;
    }
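
To try the methods together, the pieces of the Table class can be assembled into one compact class. This is a sketch under stated assumptions rather than the class from the text: the name MiniTable is hypothetical, the methods are package-private, and hash( ) uses Math.floorMod in place of Math.abs( . . . ) % data.length so that the index is never negative.

```java
// Compact assembly of the slide methods; MiniTable is a hypothetical name.
final class MiniTable {
    private int manyItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;

    MiniTable(int capacity) {
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    int size() { return manyItems; }

    private int hash(Object key) {
        // Assumption of this sketch: floorMod keeps the index non-negative.
        return Math.floorMod(key.hashCode(), data.length);
    }

    private int nextIndex(int index) {
        return (index + 1 == data.length) ? 0 : index + 1;
    }

    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        // Keep probing past deleted slots: they still have hasBeenUsed == true.
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i])) return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }

    Object get(Object key) {
        int index = findIndex(key);
        return (index == -1) ? null : data[index];
    }

    Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {                 // key already present: replace value
            Object answer = data[index];
            data[index] = element;
            return answer;
        }
        if (manyItems == data.length) throw new IllegalStateException("Table is full.");
        index = hash(key);
        while (keys[index] != null) index = nextIndex(index);
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    Object remove(Object key) {
        int index = findIndex(key);
        if (index == -1) return null;
        Object answer = data[index];
        keys[index] = null;                // slot is free, but hasBeenUsed stays true
        data[index] = null;
        manyItems--;
        return answer;
    }
}
```

With a capacity of 5, the Integer keys 3, 8, and 13 all hash to index 3 and end up at indices 3, 4, and 0. After remove( 8 ), a call to get( 13 ) still succeeds because the search skips over the deleted slot at index 4.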
29
Double hashing

Linear probing used with open-address hashing
makes clustering worse
The double hashing technique is similar to
open-address hashing but reduces clustering
The double hashing technique chooses a second
hashing function hash2( key )
Example

    Suppose hash( key ) = 711 and hash2( key ) = 111
    Linear probing sequence: 711, 712, 713, . . .
    Double hashing sequence: 711, 822, 933, . . .
30
Double hashing

For double hashing, the sequence of predetermined
alternate locations to use in the event of a
collision is defined as follows
    Let L0 = hash( key )
    If L0 is occupied, use a series of alternate locations L1, L2, L3, . . .
    Alternate L(p+1) is defined by L(p+1) = ( L(p) + hash2( key ) ) % table.length
Note that the increment hash2( key ) is usually
different for different keys
For linear probing it was the same (i.e., 1) for
all keys

31
Double hashing

There is a problem with double hashing
If hash2( key ) evenly divides the table size,
many locations are never probed
Example
    Suppose the array size is 1000 and hash2( key ) = 100
    Suppose L0 = 327
    Then the sequence of probes examines only the locations
    327, 427, 527, 627, 727, 827, 927, 27, 127, and 227
The solution to this dilemma is to choose an
array size that is a prime number

32
Double hashing

Example of choosing the array size to be a prime
    Suppose the array size is 11 (prime) and hash2( key ) = 4
    Suppose L0 = 6
    Then the sequence of probes examines the locations
    6, 10, 3, 7, 0, 4, 8, 1, 5, 9, 2
    This covers the entire array
Try this at home with your favorite prime number
and any values for hash( key ) and hash2( key )

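
The two double hashing examples can be checked with a short helper that counts how many distinct locations a probe sequence reaches. This is an illustrative sketch; the class name ProbeCoverage and its method are hypothetical.

```java
// Hypothetical class that measures how much of the table a probe sequence covers.
final class ProbeCoverage {
    /** Counts the distinct locations visited by L(p+1) = ( L(p) + step ) % tableLength. */
    static int distinctLocations(int start, int step, int tableLength) {
        boolean[] seen = new boolean[tableLength];
        int count = 0;
        int location = start;
        while (!seen[location]) {          // stop when the sequence repeats
            seen[location] = true;
            count++;
            location = (location + step) % tableLength;
        }
        return count;
    }
}
```

ProbeCoverage.distinctLocations( 327, 100, 1000 ) returns 10, while distinctLocations( 6, 4, 11 ) returns 11, the whole array. In general the count is the table length divided by the greatest common divisor of the step and the table length, which is why a prime table length guarantees full coverage for any step between 1 and the length minus 1.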
33
Double hashing

The following are good choices for hash( key )
and hash2( key )
    hash( key ) = Math.abs( key.hashCode( ) ) % data.length
    hash2( key ) = 1 + ( Math.abs( key.hashCode( ) ) % ( data.length - 2 ) )
Both use Java's hashCode( ) and the division
method
Remember, the value of data.length must be prime
Note that the value of hash2( key ) is such that
    1 <= hash2( key ) < data.length - 1
The value of hash2( key ) cannot be 0 or
data.length

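
Written out as Java methods, the pair of functions looks like this. The sketch passes the table length as a parameter instead of reading data.length, and the class name DoubleHashFunctions is hypothetical.

```java
// Hypothetical class; `length` stands in for data.length and is assumed prime.
final class DoubleHashFunctions {
    /** Starting location L0, in the range 0 .. length - 1. */
    static int hash(Object key, int length) {
        return Math.abs(key.hashCode()) % length;
    }

    /** Probe increment, in the range 1 .. length - 2, so it is never 0. */
    static int hash2(Object key, int length) {
        return 1 + (Math.abs(key.hashCode()) % (length - 2));
    }
}
```

For the Integer key 25 and a table length of 11, hash gives 3 and hash2 gives 8, so the probe sequence begins 3, 0, 8, 5, . . .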
34
Chained hashing

Chained hashing uses linked lists
Define a Node class with instance variables for
the key
the value
a Node pointer
Start with a Node array of any size
    Node[ ] table = new Node[ size ];
Each array component is interpreted to be the
head of a linked list of all key-value pairs that
collide at that position

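
A minimal chained table along these lines might look as follows. The slides give only the Node fields and the array declaration, so everything else is an assumption of this sketch: the class name ChainedTable, the method names, inserting new nodes at the head of the list, and the use of Math.floorMod for the index.

```java
// Hypothetical sketch of chained hashing; not the implementation from the text.
final class ChainedTable {
    private static final class Node {
        final Object key;
        Object value;
        Node next;                         // the Node "pointer"
        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table;

    ChainedTable(int size) { table = new Node[size]; }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    /** Stores the pair; returns the previous value for key, or null. */
    Object put(Object key, Object value) {
        int i = hash(key);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                Object old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node(key, value, table[i]); // new head of the chain
        return null;
    }

    /** Returns the value for key, or null if the key is absent. */
    Object get(Object key) {
        for (Node n = table[hash(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    /** Number of pairs that collide at the given position. */
    int chainLength(int index) {
        int length = 0;
        for (Node n = table[index]; n != null; n = n.next) length++;
        return length;
    }
}
```

With a table of size 7, the Integer keys 2, 9, and 16 all hash to position 2, so chainLength( 2 ) is 3 after the three insertions. Unlike the open-address table, this table never fills up; the chains simply grow longer.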
35
Chained hashing

[Figure: the array table, indices 0..6, with the linked lists attached to its positions]

Three keys collide at position 2

36
Predefined Java class

Java has two predefined classes for hashing
java.util.Hashtable
java.util.HashMap
Both use chained hashing
See the text for details
Appendix D, pages 764-765
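
A brief usage sketch of java.util.HashMap follows. The class name HashMapDemo and the sample keys and values are made up for illustration; only the Map methods put, get, remove, containsKey, and size are taken from the Java library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the employee data is invented sample data.
final class HashMapDemo {
    static Map<String, String> sample() {
        Map<String, String> employees = new HashMap<>();
        employees.put("123-45-6789", "Ada");
        employees.put("987-65-4321", "Grace");
        employees.put("123-45-6789", "Alan"); // same key: replaces the old value
        employees.remove("987-65-4321");
        return employees;
    }
}
```

After these calls the map holds one pair, and get( "123-45-6789" ) returns "Alan". Collision handling and resizing happen inside the library class, so client code never deals with array indices.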

37
Analysis of hashing

We consider the result of an analysis of the
three hash methods in the case of a successful
search
A statistically uniform hash function is assumed
It is also assumed that no removals have taken
place
The analysis gives the average number of probes
needed in a successful search as a function of
the loading factor
    Definition: the hashing loading factor α = ( keys stored ) / ( array size )
38
Analysis of hashing

The following table gives
The average number of probes needed for each hash
technique as a function of the loading factor
Some representative values for various loading
factors
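
The table itself is not reproduced here. As a stand-in, the sketch below evaluates the standard textbook formulas for the average number of probes in a successful search under the assumptions stated earlier (uniform hash function, no removals). These formulas are supplied as commonly cited results and are an assumption of this sketch, not values copied from the slides: ( 1 + 1/(1 - α) ) / 2 for open-address hashing with linear probing, -ln(1 - α) / α for double hashing, and 1 + α/2 for chained hashing. The class name AverageProbes is hypothetical.

```java
// Hypothetical class; formulas are the commonly cited textbook results.
final class AverageProbes {
    /** Open-address hashing with linear probing; requires 0 <= alpha < 1. */
    static double linearProbing(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }

    /** Double hashing; requires 0 < alpha < 1. */
    static double doubleHashing(double alpha) {
        return -Math.log(1.0 - alpha) / alpha;
    }

    /** Chained hashing; alpha may exceed 1 because chains can grow. */
    static double chained(double alpha) {
        return 1.0 + alpha / 2.0;
    }
}
```

At a loading factor of 0.5 the three formulas give about 1.50, 1.39, and 1.25 probes. At 0.9 they give about 5.50, 2.56, and 1.45, which shows how quickly linear probing degrades as the table fills.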
