Chapter 11 Searching

1
Chapter 11 Searching
  • CS 260 Data Structures
  • Indiana University Purdue University Fort Wayne
  • Mark Temte

2
Chapter outline
  • Serial search
  • Binary search
  • Search by hashing
  • Open-address hashing
  • Hash functions
  • Double hashing
  • Chained hashing
  • Analysis of hashing
  • All search methods considered are array searches

3
Serial search
  • This is also known as a . . .
  • Linear search
  • Sequential search
  • Goal
  • Look for a target value in a[first] .. a[first + n - 1]
  • A search method typically might return
  • (first + i) for success
  • -1 to indicate failure

int i;
for ( i = 0; ( i < n ) && ( a[first + i] != target ); i++ )
   ;
// loop ended
if ( ( i < n ) && ( a[first + i] == target ) )
   < success at ( first + i ) >
else
   < failure >
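
The loop above can be wrapped in a method that returns an index. This is a minimal sketch, not the text's version; the class name SerialSearchDemo and the method name serialSearch are made up for illustration.

```java
final class SerialSearchDemo {
    /**
     * Searches a[first] .. a[first + n - 1] for target.
     * Returns the index of the first match, or -1 on failure.
     */
    static int serialSearch(int[] a, int first, int n, int target) {
        int i;
        for (i = 0; i < n && a[first + i] != target; i++) {
            // empty body: the loop condition does all the work
        }
        // The loop stops either at a match (i < n) or past the range (i == n).
        return (i < n) ? first + i : -1;
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 4, 8};
        System.out.println(serialSearch(a, 1, 3, 4)); // searches only 3, 9, 4
        System.out.println(serialSearch(a, 1, 3, 8)); // 8 lies outside the searched range
    }
}
```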
4
Serial search
  • Analysis
  • Best case
  • Success on the first access
  • O( 1 ) constant performance
  • Worst case
  • Failure
  • O( n ) linear performance
  • Average case
  • Assume success equally likely at each position
  • O( n ) linear performance

Ave accesses = (total accesses over all positions) / (number of positions)
             = [ n(n+1)/2 ] / n = (n+1)/2
5
Binary search
  • Binary search is often written as a recursive
    method
  • The following version is easier to remember and
    code correctly than the version in the text

public static int binarySearch( int[] a, int first, int last, int target )
{
   int mid = ( first + last )/2;
   if ( first > last )
      return -1;
   if ( target < a[mid] )
      return binarySearch( a, first, mid-1, target );
   else if ( target == a[mid] )
      return mid;
   else
      return binarySearch( a, mid+1, last, target );
}
6
Binary search
  • Recall the precondition
  • The array must be sorted before the binary search
    may be used
  • Analysis
  • O( log(n) ) logarithmic performance
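
For comparison, binary search can also be written as a loop. The sketch below is an iterative variant, not the recursive version from the slides; the class name BinarySearchDemo is hypothetical. It computes the midpoint as first + (last - first)/2, which avoids int overflow when first + last would exceed the int range.

```java
final class BinarySearchDemo {
    /**
     * Searches the sorted range a[first] .. a[last] for target.
     * Returns an index holding target, or -1 if it is absent.
     * Precondition: the range is sorted in ascending order.
     */
    static int binarySearch(int[] a, int first, int last, int target) {
        while (first <= last) {
            int mid = first + (last - first) / 2; // overflow-safe midpoint
            if (target < a[mid]) {
                last = mid - 1;       // continue in the left half
            } else if (target > a[mid]) {
                first = mid + 1;      // continue in the right half
            } else {
                return mid;           // found
            }
        }
        return -1; // the range became empty: failure
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 12, 16, 23};
        System.out.println(binarySearch(sorted, 0, sorted.length - 1, 12));
        System.out.println(binarySearch(sorted, 0, sorted.length - 1, 7));
    }
}
```

Each pass halves the remaining range, which is where the O( log(n) ) bound comes from.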

7
Search by hashing
  • Hashing is a search technique with average O(1)
    performance used to search a key-value table
  • A key-value table is also known as a . . .
  • Dictionary
  • Map
  • Associative array
  • A hash function associates every possible key
    with a position in the array
  • The hash function must be easy to compute
  • To search for a key-value pair, the hash function
    is applied to the key and the resulting position
    in the array is accessed

8
Search by hashing
  • Not only is the average hashing performance
    constant, but it is also efficient to add and
    remove key-value pairs
  • The hash function has the form
    private int hash( <key type> key )
  • The integer returned by the hash function must be
    a valid array index

9
Example of a hash function
  • Let class Pair represent a key-value pair object
  • The table is the array table defined by
  • Pair[] table = new Pair[1000];
  • Each key is an employee social security number
  • This is a String of characters of the form
    999-99-9999
  • The hash function maps the social security number
    to the array index defined by the last three
    digits of the social security number
  • This integer is a valid array index in the range
    0..999
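
A minimal sketch of the hash function described on this slide, assuming every key is a well-formed String of the form 999-99-9999. The class name SsnHashDemo is made up for illustration.

```java
final class SsnHashDemo {
    /**
     * Maps a social security number such as "123-45-6789" to the
     * array index given by its last three digits (0..999).
     * Assumes the key is well formed; no validation is done here.
     */
    static int hash(String ssn) {
        String lastThree = ssn.substring(ssn.length() - 3);
        return Integer.parseInt(lastThree);
    }

    public static void main(String[] args) {
        System.out.println(hash("123-45-6789"));
        System.out.println(hash("987-65-4789")); // same index: a collision
    }
}
```

Two different keys that share their last three digits land on the same index, which is the situation the following slides deal with.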

10
Search by hashing
  • The ideal situation is to store the key-value
    pair in table[ hash( key ) ]
  • Problem: the possibility of a collision
  • Also called a hash clash
  • A collision is when
    key1 != key2 but hash( key1 ) == hash( key2 )
  • It is not usually possible to obtain a perfect
    hash function
  • How we resolve this problem leads to various
    special hashing techniques
  • Open-address hashing
  • Double hashing
  • Chained hashing

11
Collision example
  • Hash your birthday to the range 0..364
  • Ignore leap year
  • Question
  • In a classroom with 23 students, what is the
    probability of the students having at least one
    collision?
  • Answer
  • Greater than 50%
  • So, with an array loading factor of only about 6%
    (23/365), there is more than a 50-50 chance of a
    collision
  • Collisions are almost guaranteed to happen
  • They must be handled in an efficient manner
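
The answer above can be checked numerically. The sketch below assumes all 365 birthdays are equally likely and multiplies out the probability that no two students collide; the class and method names are hypothetical.

```java
final class BirthdayDemo {
    /**
     * Probability that at least two of `people` keys collide when each
     * key is hashed uniformly at random into `slots` positions.
     */
    static double collisionProbability(int people, int slots) {
        double noCollision = 1.0;
        for (int i = 0; i < people; i++) {
            // The (i+1)-th key must avoid the i positions already taken.
            noCollision *= (double) (slots - i) / slots;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        System.out.printf("22 students: %.4f%n", collisionProbability(22, 365));
        System.out.printf("23 students: %.4f%n", collisionProbability(23, 365));
    }
}
```

With 22 students the probability is still below one half; the 23rd student pushes it just past 50%.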

12
Open-address hashing
  • The open-address hashing technique resolves
    collisions using linear probing
  • For linear probing, establish a sequence of
    predetermined alternate locations to use in the
    event of a collision
  • Note that this wraps around the array if
    necessary
  • Linear probing uses the first available open
    location
  • Alternate locations are tried in order
  • The sequence of alternates is needed in the event
    there are collisions at some of the alternate
    locations

Let L0 = hash( key )
If L0 is occupied, use a series of alternate locations L1, L2, L3, ...
Alternate L(p+1) is defined by
L(p+1) = ( Lp + 1 ) % table.length
13
Example
  • Consider a hypothetical hash function at left
  • Build the table in the order
  • A, B, C, D, E, F, G
  • Search for D
  • Success at location 1
  • Search for E
  • Success at location 7
  • Search for H with hash( H ) = 3
  • Failure at location 5
  • Delete C and search for G
  • The search ends in failure at location 3 unless
    we know to skip over location 3

[Figure: table with indices 0..8 and columns for keys and values]
14
Open-address hashing
  • To handle deletions . . .
  • Need to mark each location as one of . . .
  • hasBeenUsed
  • has not been used
  • For this purpose, add a new boolean instance
    variable hasBeenUsed to the Pair class
  • Now the search for G has the information to skip
    over deleted location 3 and succeed at location 4

[Figure: table with indices 0..8 and columns for keys, values, and hasBeenUsed]
15
Open-address hashing
  • The open-address hashing algorithm for searching
    is to use linear probing until . . .
  • The key is found
  • Success
  • Or until table[ Lp ].hasBeenUsed == false
  • Failure
  • To reduce the number of collisions, the maximum
    number of items to be placed in the table needs
    to be known in advance
  • The capacity of the array must be set to a size
    somewhat larger

16
The hashCode( ) method
  • Every Java class inherits method hashCode( )
  • This method maps any key object to an int
  • The resulting int must subsequently be mapped to
    the range 0 . . (table.length-1) by a method
    hash( ) supplied by the programmer

table[ hash( key.hashCode( ) ) ]
anObject --( hashCode( ) )--> int --( hash( ), your choice )--> array index
17
Not using Java?
  • If the given language does not have a method such
    as hashCode( ), a replacement method must be
    implemented
  • No problem if the key is already an integer
  • Otherwise, use the data in a non-integer key to
    obtain an integer in some other way
  • Perhaps use the integer ASCII codes of a
    character string to build an integer reflecting
    the differences in Strings
  • Any data can be viewed as a bit string in
    assembly language if necessary
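
One possible way to build an integer from character codes is sketched below (written in Java only to stay consistent with the rest of the slides). Simply summing the codes gives the same integer for "ab" and "ba", so the second method weights each position; the multiplier 31 is the one Java's own String.hashCode( ) uses. The class and method names are made up for illustration.

```java
final class StringKeyDemo {
    /** Plain sum of character codes: ignores the order of characters. */
    static int sumOfCodes(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    /** Position-weighted combination: h = 31*h + code, for each character. */
    static int weightedCodes(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(sumOfCodes("ab") + " " + sumOfCodes("ba"));
        System.out.println(weightedCodes("ab") + " " + weightedCodes("ba"));
    }
}
```

The result may be negative for long strings, so it still has to be mapped to a valid array index by hash( ).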

18
Constructing hash( ) methods
  • Assume that the key has already been converted to
    an int using hashCode( ) or some other method
  • The hash( ) method used to map the int to a valid
    array index should . . .
  • Be efficient to compute, with O(1) performance
  • Distribute the keys evenly throughout the array
  • Use all key information
  • Break up natural clusters of keys

19
Constructing hash( ) methods
  • A very good hash method is known as division
  • This method satisfies the first three criteria
    for a good hash function
  • However, it does not break up natural clusters of
    keys
  • Nearby keys keep their relative positions except
    when one key wraps around and the other does not

hash( key ) = Math.abs( key ) % table.length
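
A sketch of the division method in Java, with one caveat that is not from the slides: Math.abs( Integer.MIN_VALUE ) is still negative, so the formula can yield a negative index for that single key. The second method uses Math.floorMod as one possible alternative; note that it maps negative keys to different (but still valid) indices than the Math.abs form. The class and method names are hypothetical.

```java
final class DivisionHashDemo {
    /** The division method as written on the slide. */
    static int hashAbs(int key, int tableLength) {
        return Math.abs(key) % tableLength;
    }

    /** Alternative (an assumption, not the slide's formula): always in 0..tableLength-1. */
    static int hashFloorMod(int key, int tableLength) {
        return Math.floorMod(key, tableLength);
    }

    public static void main(String[] args) {
        // Nearby keys keep their relative positions.
        System.out.println(hashAbs(2001, 1000) + " " + hashAbs(2002, 1000));
        // The one problematic key for the Math.abs form.
        System.out.println(hashAbs(Integer.MIN_VALUE, 1000));
        System.out.println(hashFloorMod(Integer.MIN_VALUE, 1000));
    }
}
```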
20
Constructing hash( ) methods
  • Another hash method is multiplication
  • Still another is called mid-square

Multiplication:
Let M = ( √5 - 1 ) / 2 = 0.6180339887...
hash( key ) = ( int ) ( arrayCapacity * < fractional part of M*key > )

Mid-square:
hash( key ) = < extract some middle digits or bits from ( key )^2 >
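
The multiplication method can be sketched as follows. This assumes the key is taken in absolute value first (the slide does not say how negative keys are handled) and uses double arithmetic for the fractional part; the class name is made up for illustration.

```java
final class MultiplicationHashDemo {
    /** M = (sqrt(5) - 1) / 2, approximately 0.6180339887. */
    static final double M = (Math.sqrt(5.0) - 1.0) / 2.0;

    /** Returns an index in 0 .. arrayCapacity-1. */
    static int hash(int key, int arrayCapacity) {
        // Converting to double first avoids the Math.abs(Integer.MIN_VALUE) problem.
        double product = M * Math.abs((double) key);
        double fraction = product - Math.floor(product); // in [0, 1)
        return (int) (arrayCapacity * fraction);
    }

    public static void main(String[] args) {
        // Consecutive keys are scattered rather than kept adjacent.
        for (int key = 1; key <= 5; key++) {
            System.out.println(key + " -> " + hash(key, 1000));
        }
    }
}
```

Unlike the division method, consecutive keys land far apart, which helps break up natural clusters.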
21
The Table class
  • This is a class for a key-value table ADT
  • Instead of defining a Pair class and having an
    array of Pair objects, we will use parallel
    arrays for keys, data, and hasBeenUsed
  • State

private int manyItems;
private Object[] keys;
private Object[] data;
private boolean[] hasBeenUsed;
22
The Table class
  • Behavior
  • Table( capacity )
  • Inefficient to change the capacity dynamically
  • size( )
  • capacity( )
  • put( key, value )
  • containsKey( key )
  • get( key )
  • remove( key )

23
The ADT invariant of the Table class
  • The number of elements in the table is in the
    instance variable manyItems.
  • The preferred location for an element with a
    given key is at index hash( key ). If a collision
    occurs, then a circular array search is performed
    in the forward direction to find the next open
    position. When an open position is found at index
    i, then the element itself is placed in data[i]
    and the element's key is placed in keys[i].
  • An index i that is not currently used has data[i]
    and keys[i] set to null.
  • If an index i has been used at some point (now or
    in the past), then hasBeenUsed[i] is true;
    otherwise it is false.

24
The Table class
  • Private helper methods
  • hash( key )
  • nextIndex( index )
  • findIndex( key )

private int hash( Object key )
{
   return Math.abs( key.hashCode( ) ) % data.length;
}

private int nextIndex( int index )
{
   if ( index + 1 == data.length )
      return 0;
   else
      return index + 1;
}
25
The Table class
private int findIndex( Object key )
{
   int count = 0;
   int i = hash( key );
   while ( ( count < data.length ) && hasBeenUsed[i] )
   {
      if ( key.equals( keys[i] ) )
         return i;
      count++;
      i = nextIndex( i );
   }
   return -1;
}
  • Note the variable count is needed when the key
    is not in the table and every position has been
    used
  • The search will terminate after every cell has
    been examined

26
The Table class
public Object get( Object key )
{
   int index = findIndex( key );
   if ( index == -1 )
      return null;
   else
      return data[index];
}
  • If the search for key fails, the method returns
    null
  • Otherwise, it returns the data associated with
    the key

27
public Object put( Object key, Object element )
{
   int index = findIndex( key );
   Object answer;
   if ( index != -1 )
   {  // The key is already in the table.
      answer = data[index];
      data[index] = element;
      return answer;
   }
   else if ( manyItems < data.length )
   {  // The key is not yet in this Table.
      index = hash( key );
      while ( keys[index] != null )
         index = nextIndex( index );
      keys[index] = key;
      data[index] = element;
      hasBeenUsed[index] = true;
      manyItems++;
      return null;
   }
   else
   {  // The table is full.
      throw new IllegalStateException( "Table is full." );
   }
}
28
The Table class
public Object remove( Object key )
{
   int index = findIndex( key );
   Object answer = null;
   if ( index != -1 )
   {
      answer = data[index];
      keys[index] = null;
      data[index] = null;
      manyItems--;
   }
   return answer;
}
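
The state and methods of the Table class can be assembled into one compilable class, sketched below. Two things are assumptions rather than the slides' code: the class name OpenAddressTable, and the use of Math.floorMod in hash( ) in place of Math.abs(...) % data.length (to sidestep the Integer.MIN_VALUE case).

```java
final class OpenAddressTable {
    private int manyItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;

    OpenAddressTable(int capacity) {
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    int size() { return manyItems; }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), data.length); // assumption, see lead-in
    }

    private int nextIndex(int index) {
        return (index + 1 == data.length) ? 0 : index + 1; // wrap around
    }

    /** Linear probing: stop at the key, at a never-used slot, or after a full cycle. */
    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i])) return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }

    Object get(Object key) {
        int index = findIndex(key);
        return (index == -1) ? null : data[index];
    }

    Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {                 // key already present: replace the value
            Object answer = data[index];
            data[index] = element;
            return answer;
        }
        if (manyItems == data.length) throw new IllegalStateException("Table is full.");
        index = hash(key);
        while (keys[index] != null) index = nextIndex(index); // first open slot
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    Object remove(Object key) {
        int index = findIndex(key);
        if (index == -1) return null;
        Object answer = data[index];
        keys[index] = null;                // hasBeenUsed[index] stays true
        data[index] = null;
        manyItems--;
        return answer;
    }

    public static void main(String[] args) {
        OpenAddressTable t = new OpenAddressTable(7);
        t.put(3, "a"); t.put(10, "b"); t.put(17, "c"); // all three hash to index 3
        t.remove(10);                                   // leaves a used-but-empty slot
        System.out.println(t.get(17));                  // still reachable past the gap
    }
}
```

Because remove( ) leaves hasBeenUsed set, a later search probes past the vacated slot instead of stopping there.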
29
Double hashing
  • Linear probing used with open-address hashing
    makes clustering worse
  • The double hashing technique is similar to
    open-address hashing but reduces clustering
  • The double hashing technique chooses a second
    hashing function hash2( key )
  • Example

Suppose hash( key ) = 711 and hash2( key ) = 111
Linear probing sequence: 711, 712, 713, . . .
Double hashing sequence: 711, 822, 933, . . .
30
Double hashing
  • For double hashing, the sequence of predetermined
    alternate locations to use in the event of a
    collision is defined as follows
  • Note that the increment hash2( key ) is usually
    different for different keys
  • For linear probing it was the same (i.e.,1) for
    all keys

Let L0 = hash( key )
If L0 is occupied, use a series of alternate locations L1, L2, L3, ...
Alternate L(p+1) is defined by
L(p+1) = ( Lp + hash2( key ) ) % table.length
31
Double hashing
  • There is a problem with double hashing
  • If hash2( key ) evenly divides the table size,
    many locations are never probed
  • Example
  • The solution to this dilemma is to choose an
    array size that is a prime number

Suppose the array size is 1000 and hash2( key ) = 100
Suppose L0 = 327
Then the sequence of probes examines only the locations
327, 427, 527, 627, 727, 827, 927, 27, 127, and 227
32
Double hashing
  • Example of choosing the array size to be a prime
  • Try this at home with your favorite prime number
    and any values for hash( key ) and hash2( key )

Suppose the array size is 11 (prime) and hash2( key ) = 4
Suppose L0 = 6
Then the sequence of probes examines the locations
6, 10, 3, 7, 0, 4, 8, 1, 5, 9, 2
This covers the entire array
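
Both examples (array size 1000 with increment 100, and array size 11 with increment 4) can be reproduced with a short program. The sketch below follows the rule L(p+1) = ( Lp + hash2( key ) ) % table.length until a location repeats; the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class ProbeCoverageDemo {
    /**
     * Returns the distinct locations visited, in probe order, when starting
     * at `start` and repeatedly adding `step` modulo `tableLength`.
     */
    static Set<Integer> probeSequence(int start, int step, int tableLength) {
        Set<Integer> visited = new LinkedHashSet<>();
        int location = start;
        while (visited.add(location)) {   // add() returns false on the first repeat
            location = (location + step) % tableLength;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(probeSequence(327, 100, 1000)); // 10 of 1000 locations
        System.out.println(probeSequence(6, 4, 11));       // all 11 locations
    }
}
```

With a prime array size, any increment from 1 to size-1 shares no common factor with the size, so the sequence visits every location before repeating.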
33
Double hashing
  • The following are good choices for hash( key )
    and hash2( key )
  • Both use Java's hashCode( ) and the division
    method
  • Remember, the value of data.length must be prime
  • Note that the value of hash2( key ) is such that
    1 <= hash2( key ) < data.length - 1
  • The value of hash2( key ) cannot be 0 or
    data.length

hash( key ) = Math.abs( key.hashCode( ) ) % data.length
hash2( key ) = 1 + ( Math.abs( key.hashCode( ) ) % ( data.length - 2 ) )
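
A sketch of these two functions and the resulting probe step, assuming the table length is passed in as a parameter and is prime. Math.floorMod is used here in place of Math.abs(...) % ..., which is an assumption of this sketch (it keeps the results in range even for the hash code Integer.MIN_VALUE, and gives different but still valid values for other negative hash codes). All names are made up for illustration.

```java
final class DoubleHashDemo {
    /** Preferred location: 0 .. length-1. */
    static int hash(Object key, int length) {
        return Math.floorMod(key.hashCode(), length);
    }

    /** Probe increment: 1 .. length-2, so never 0 and never length. */
    static int hash2(Object key, int length) {
        return 1 + Math.floorMod(key.hashCode(), length - 2);
    }

    /** Next alternate location for this key, wrapping around the array. */
    static int nextIndex(int index, Object key, int length) {
        return (index + hash2(key, length)) % length;
    }

    public static void main(String[] args) {
        int length = 11;          // prime
        Object key = 120;         // an Integer's hashCode( ) is its own value
        int location = hash(key, length);
        for (int p = 0; p < 4; p++) {
            System.out.println("L" + p + " = " + location);
            location = nextIndex(location, key, length);
        }
    }
}
```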
34
Chained hashing
  • Chained hashing uses linked lists
  • Define a Node class with instance variables for
  • the key
  • the value
  • a Node pointer
  • Start with a Node array of any size
  • Each array component is interpreted to be the
    head of a linked list of all key-value pairs that
    collide at that position

Node[] table = new Node[size];
35
Chained hashing

[Figure: Node array with indices 0..6 and its linked lists]

  • Three keys collide at position 2
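
A minimal sketch of chained hashing along the lines described above. The slides give only the Node fields and the array declaration, so the methods, the class name ChainedTable, and the use of Math.floorMod for the index are all assumptions made for illustration.

```java
final class ChainedTable {
    /** One key-value pair plus a link to the next pair at the same index. */
    private static final class Node {
        final Object key;
        Object value;
        Node next;
        Node(Object key, Object value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final Node[] table;

    ChainedTable(int size) { table = new Node[size]; }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    /** Adds or replaces a pair; returns the old value or null. */
    Object put(Object key, Object value) {
        int i = hash(key);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                Object old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node(key, value, table[i]); // new head of the list
        return null;
    }

    Object get(Object key) {
        for (Node n = table[hash(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    /** Number of pairs that collided at the given array index. */
    int chainLength(int index) {
        int length = 0;
        for (Node n = table[index]; n != null; n = n.next) length++;
        return length;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(7);
        t.put(2, "a"); t.put(9, "b"); t.put(16, "c"); // all hash to index 2
        System.out.println(t.chainLength(2));
        System.out.println(t.get(9));
    }
}
```

Since each position holds a list, the table never becomes "full"; the number of pairs can exceed the array size.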

36
Predefined Java class
  • Java has two predefined classes for hashing
  • java.util.Hashtable
  • java.util.HashMap
  • Both use chained hashing
  • See the text for details
  • Appendix D, pages 764-765

37
Analysis of hashing
  • We consider the result of an analysis of the
    three hash methods in the case of a successful
    search
  • A statistically uniform hash function is assumed
  • It is also assumed that no removals have taken
    place
  • The analysis gives the average number of probes
    needed in a successful search as a function of
    the loading factor

Definition: the hashing loading factor
α = (number of keys stored) / (array size)
38
Analysis of hashing
  • The following table gives
  • The average number of probes needed for each hash
    technique as a function of the loading factor
  • Some representative values for various loading
    factors
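
As a rough illustration, the sketch below evaluates the standard textbook formulas for the average number of probes in a successful search: (1/2)(1 + 1/(1-α)) for open-address hashing with linear probing, -ln(1-α)/α for double hashing, and 1 + α/2 for chained hashing. These formulas are not stated in the slides themselves, so treating them as the source of the slide's table is an assumption. The class and method names are hypothetical.

```java
final class ProbeAnalysisDemo {
    /** Open-address hashing with linear probing; requires alpha < 1. */
    static double linearProbing(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }

    /** Double hashing; requires 0 < alpha < 1. */
    static double doubleHashing(double alpha) {
        return -Math.log(1.0 - alpha) / alpha;
    }

    /** Chained hashing; alpha may exceed 1. */
    static double chained(double alpha) {
        return 1.0 + alpha / 2.0;
    }

    public static void main(String[] args) {
        System.out.println("alpha  linear  double  chained");
        for (int pct = 50; pct <= 90; pct += 10) {
            double alpha = pct / 100.0;
            System.out.printf("%.1f    %6.2f  %6.2f  %6.2f%n",
                    alpha, linearProbing(alpha), doubleHashing(alpha), chained(alpha));
        }
    }
}
```

Under these formulas, linear probing degrades quickly as the table fills, double hashing degrades more slowly, and chained hashing grows only linearly in α.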