Title: Ceng-212 Data Structures-1
1 Searching
Chapter 2
2 Outline
- Linear List Searches
- Sequential Search
- The sentinel search
- The probability search
- The ordered list search
- Binary Search
- Hashed List Searches
- Collision Resolution
3 Linear List Searches
- We study searches that work with arrays.
Figure 2-1
4 Linear List Searches
- There are two basic searches for arrays:
- The sequential search, which can be used to locate an item in any array.
- The binary search, which requires an ordered list.
5 Linear List Searches: Sequential Search
- The sequential search is used when the list is not ordered.
- We use this technique only for small arrays.
- We start searching at the beginning of the list and continue until the search ends in one of two ways:
- Either we find the target,
- or we reach the end of the list.
6 Locating data in an unordered list.
Figure 2-2
7 Linear List Searches: Sequential Search Algorithm
- RETURN: the algorithm must tell two things to the calling algorithm:
- Did it find the data?
- If it did, what is the index (address)?
8 Linear List Searches: Sequential Search Algorithm
- The searching algorithm requires four parameters and returns a boolean:
- The list.
- An index to the last element in the list.
- The target.
- The address where the found element's index location is to be stored.
- The found or not found boolean, which is returned to the calling algorithm.
9 Sequential Search Algorithm
algorithm SeqSearch (val list <array>, val last <index>,
                     val target <keyType>, ref locn <index>)
Locate the target in an unordered list of size elements.
   PRE    list must contain at least one element.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn and found is TRUE.
          if not found: last stored in locn and found is FALSE.
   RETURN found <boolean>
10 Sequential Search Algorithm
looker = 1
loop (looker < last AND target not equal list[looker])
   looker = looker + 1
locn = looker
if (target equal list[looker])
   found = true
else
   found = false
return found
end SeqSearch
Big-O(n)
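The SeqSearch pseudocode can be sketched in Python as follows. This is only an illustration: it uses 0-based indexes instead of the 1-based indexes of the slides, and it returns the pair (found, locn) instead of storing locn through an address. The name seq_search is an assumption made for this sketch.

```python
def seq_search(items, target):
    """Return (found, locn), where locn is a 0-based index.

    If the target is absent, locn is the index of the last element,
    as in the POST condition. Assumes items is not empty (PRE condition).
    """
    last = len(items) - 1
    looker = 0
    # Advance until the target is seen or the last element is reached.
    while looker < last and items[looker] != target:
        looker += 1
    return items[looker] == target, looker


found, locn = seq_search([4, 21, 36, 14, 62, 91, 8, 22], 62)
print(found, locn)
```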
11 Variations On Sequential Search
- There are three variations of the sequential search algorithm:
- The sentinel search,
- The probability search,
- The ordered list search.
12 Sequential Search Algorithm: The Sentinel Search
- If we know that the target will be found in the list, we can eliminate the test for the end of the list.
- To guarantee this, a copy of the target (the sentinel) is placed just after the last element.
algorithm SentinelSearch (val list <array>, val last <index>,
                          val target <keyType>, ref locn <index>)
Locate the target in an unordered list of size elements.
   PRE    list must contain an extra element at the end for the sentinel.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn and found is TRUE.
          if not found: last stored in locn and found is FALSE.
   RETURN found <boolean>
13 Sequential Search Algorithm: The Sentinel Search
list[last + 1] = target
looker = 1
loop (target not equal list[looker])
   looker = looker + 1
if (looker <= last)
   found = true
   locn = looker
else
   found = false
   locn = last
return found
end SentinelSearch
Big-O(n)
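A minimal Python sketch of the sentinel search, assuming 0-based indexes and a Python list that can grow by one slot for the sentinel. The function name and the choice to remove the sentinel before returning are assumptions of this sketch, not part of the slides.

```python
def sentinel_search(items, target):
    """Return (found, locn) with a 0-based locn, using a sentinel."""
    last = len(items) - 1
    items.append(target)            # sentinel stored at position last + 1
    looker = 0
    # No end-of-list test is needed: the sentinel always stops the loop.
    while items[looker] != target:
        looker += 1
    items.pop()                     # restore the list to its original size
    if looker <= last:
        return True, looker
    return False, last


print(sentinel_search([5, 3, 9], 9))
print(sentinel_search([5, 3, 9], 4))
```

The loop tests only one condition per element, which is the saving the sentinel buys; the order of growth is still Big-O(n).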
14 Sequential Search Algorithm: The Probability Search
algorithm ProbabilitySearch (val list <array>, val last <index>,
                             val target <keyType>, ref locn <index>)
Locate the target in a list ordered by the probability of each element
being the target: most probable first, least probable last.
   PRE    list must contain at least one element.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn, found is TRUE,
          and the element is moved up in priority.
          if not found: last stored in locn and found is FALSE.
   RETURN found <boolean>
15 Sequential Search Algorithm: The Probability Search
looker = 1
loop (looker < last AND target not equal list[looker])
   looker = looker + 1
if (target equal list[looker])
   found = true
   if (looker > 1)
      temp = list[looker - 1]
      list[looker - 1] = list[looker]
      list[looker] = temp
      looker = looker - 1
else
   found = false
locn = looker
return found
end ProbabilitySearch
Big-O(n)
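The probability search can be sketched in Python as below, again with 0-based indexes and a returned (found, locn) pair. The function name is an assumption of this sketch. Each successful search swaps the found element with its predecessor, so frequently requested elements drift toward the front of the list.

```python
def probability_search(items, target):
    """Return (found, locn); a found element moves one position forward."""
    last = len(items) - 1
    looker = 0
    while looker < last and items[looker] != target:
        looker += 1
    if items[looker] == target:
        if looker > 0:
            # Exchange the found element with the one before it.
            items[looker - 1], items[looker] = items[looker], items[looker - 1]
            looker -= 1
        return True, looker
    return False, looker


data = [7, 3, 9]
print(probability_search(data, 9), data)
print(probability_search(data, 9), data)
```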
16 Sequential Search Algorithm: The Ordered List Search
- If the list is small, it can be more efficient to use a sequential search even though the list is ordered.
- We can stop the search loop as soon as the target becomes less than or equal to the element being tested.
algorithm OrderedListSearch (val list <array>, val last <index>,
                             val target <keyType>, ref locn <index>)
Locate the target in a list ordered on target.
   PRE    list must contain at least one element.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn and found is TRUE.
          if not found: last stored in locn and found is FALSE.
   RETURN found <boolean>
17 Sequential Search Algorithm: The Ordered List Search
if (target < list[last])
   looker = 1
   loop (target > list[looker])
      looker = looker + 1
else
   looker = last
if (target equal list[looker])
   found = true
else
   found = false
locn = looker
return found
end OrderedListSearch
Big-O(n)
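A Python sketch of the ordered list search, assuming a non-empty list sorted in ascending order and 0-based indexes (the function name is illustrative). Because the first test guarantees that some element is greater than or equal to the target, the inner loop needs no end-of-list test.

```python
def ordered_list_search(items, target):
    """Return (found, locn) for a list sorted in ascending order."""
    last = len(items) - 1
    if target < items[last]:
        looker = 0
        # Stops at the first element that is >= target.
        while target > items[looker]:
            looker += 1
    else:
        looker = last
    return items[looker] == target, looker


print(ordered_list_search([2, 5, 8, 11], 8))
print(ordered_list_search([2, 5, 8, 11], 6))
```

When the target is missing, the search stops as soon as a larger element is seen instead of scanning the whole list.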
18 Sequential Search
- The sequential search algorithm is very slow for big lists: Big-O(n).
- If the list is ordered, we can use a more efficient algorithm called the binary search.
19 Binary Search
- Test the data in the element at the middle of the array.
- If the target is in the first half, continue with the first half; if it is in the second half, continue with the second half.
- Test the data in the element at the middle of that half, and repeat in the same way until the search ends.
20 mid = (first + last) / 2
target > list[mid]: first = mid + 1
target < list[mid]: last = mid - 1
Figure 2-4
21 If the target is not in the list, first becomes larger than last!
Figure 2-5
22 Binary Search Algorithm
algorithm BinarySearch (val list <array>, val end <index>,
                        val target <keyType>, ref locn <index>)
Search an ordered list using binary search.
   PRE    list is ordered; it must contain at least one element.
          end is index to the largest element in the list.
          target is the value of element being sought.
          locn is address of index in calling algorithm.
   POST   Found: locn assigned index to target element; found set true.
          Not found: locn = element below or above target; found set false.
   RETURN found <boolean>
23 Binary Search Algorithm
first = 1
last = end
loop (first <= last)
   mid = (first + last) / 2
   if (target > list[mid])
      first = mid + 1      (Look in upper half)
   else if (target < list[mid])
      last = mid - 1       (Look in lower half)
   else
      first = last + 1     (Found equal: force exit)
locn = mid
if (target equal list[mid])
   found = true
else
   found = false
return found
end BinarySearch
Big-O(log2 n)
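The binary search can be sketched in Python as below. The sketch assumes a non-empty list sorted in ascending order and 0-based indexes, and it uses break where the pseudocode forces the exit by setting first past last. The function name is an assumption.

```python
def binary_search(items, target):
    """Return (found, locn) for a non-empty list sorted in ascending order."""
    first, last = 0, len(items) - 1
    mid = 0
    while first <= last:
        mid = (first + last) // 2
        if target > items[mid]:
            first = mid + 1         # look in upper half
        elif target < items[mid]:
            last = mid - 1          # look in lower half
        else:
            break                   # found equal: leave the loop
    return items[mid] == target, mid


data = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(binary_search(data, 22))
print(binary_search(data, 11))
```

When the target is missing, locn is the index of the element just below or just above where the target would be, as the POST condition states.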
24 Comparison of binary and sequential searches
Size        Binary   Sequential (average)   Sequential (worst case)
16          4        8                      16
50          6        25                     50
256         8        128                    256
1,000       10       500                    1,000
10,000      14       5,000                  10,000
100,000     17       50,000                 100,000
1,000,000   20       500,000                1,000,000
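The values in the table can be reproduced with a few lines of Python. The sketch assumes that the Binary column is log2 of the size rounded up, and that the sequential columns are size / 2 and size; these formulas are inferred from the table values, and the function names are illustrative.

```python
def binary_tests(n):
    """log2(n) rounded up, computed with integers only (n >= 1)."""
    return (n - 1).bit_length()


def sequential_tests(n):
    """Return (average, worst case) number of tests for a list of n elements."""
    return n // 2, n


for size in (16, 50, 256, 1000, 10000, 100000, 1000000):
    average, worst = sequential_tests(size)
    print(size, binary_tests(size), average, worst)
```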
25 Hashed List Searches
- In an ideal search, we would know exactly where the data are and go directly there.
- We use a hashing algorithm to transform the key into the index of the array element that contains the data we need to locate.
26 Hashed List Searches
- Hashing is a key-to-address transformation.
Figure 2-6
27 Hashed List Searches
- We call the set of keys that hash to the same location in our list synonyms.
- A collision is the event that occurs when a hashing algorithm produces an address for an insertion key and that address is already occupied.
- Each calculation of an address and test for success is known as a probe.
Figure 2-7
28 Hashing Methods
Figure 2-8
29 Direct Hashing Method
- The key is the address without any algorithmic manipulation.
- The data structure must contain an element for every possible key.
- It guarantees that there are no synonyms.
- Its use is very limited.
30 Direct Hashing Method
Direct hashing of employee numbers.
Figure 2-9
31 Subtraction Hashing Method
- The keys are consecutive but do not start from one.
- Example:
- A company has 100 employees.
- Employee numbers run from 1001 to 1100.
- address = x - 1000, where x is the employee number.
- x = 1001 gives address 1 (Ali Esin).
- x = 1002 gives address 2 (Sema Metin).
- x = 1100 gives address 100 (Filiz Yilmaz).
32 Modulo Division Hashing Method
- The modulo-division method divides the key by the array size and uses the remainder plus one for the address.
- address = key mod listSize + 1
- A list size that is a prime number produces fewer collisions than other list sizes.
33 Modulo Division Hashing Method
We have 300 employees, and the first prime greater than 300 is 307.
121267 / 307 = 395 with remainder 2
hash(121267) = 2 + 1 = 3
Figure 2-10
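The modulo-division calculation above can be checked with a short Python sketch. The function name is an assumption; the key and list size are the ones used on the slide.

```python
def modulo_division_hash(key, list_size):
    """Return an address in the range 1..list_size."""
    return key % list_size + 1


print(divmod(121267, 307))                 # quotient and remainder
print(modulo_division_hash(121267, 307))   # address
```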
34 Digit Extraction Method
- Selected digits are extracted from the key and used as the address.
- Example (using the first, third and fourth digits):
- 379452 -> 394
- 121267 -> 112
- 378845 -> 388
- 526842 -> 568
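A small Python sketch of digit extraction, assuming six-digit keys and the first, third and fourth digits (0-based positions 0, 2 and 3), which is the selection that reproduces the four examples. The function and parameter names are illustrative.

```python
def digit_extraction_hash(key, positions=(0, 2, 3)):
    """Build the address from the digits of key at the given 0-based positions."""
    digits = str(key)
    return int("".join(digits[p] for p in positions))


for key in (379452, 121267, 378845, 526842):
    print(key, digit_extraction_hash(key))
```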
35 Midsquare Hashing Method
- The key is squared and the address is selected from the middle of the squared number.
- The most obvious limitation of this method is the size of the key.
- Example:
- 9452 * 9452 = 89340304 -> 3403 is the address.
- Or, squaring only the first three digits of the key:
- 379452 -> 379 * 379 = 143641 -> 364
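Both midsquare examples can be reproduced with the sketch below. The start position and number of digits taken from the square are parameters chosen here to match the examples (an assumption; the slides do not name them).

```python
def midsquare_hash(key, start, length):
    """Square key and return `length` digits beginning at 0-based `start`."""
    squared = str(key * key)
    return int(squared[start:start + length])


print(midsquare_hash(9452, 2, 4))          # digits 3 to 6 of 89340304
first_three = int(str(379452)[:3])         # 379
print(midsquare_hash(first_three, 2, 3))   # digits 3 to 5 of 143641
```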
36 Folding Hashing Method
Figure 2-11
37 Pseudorandom Hashing Method
- The key is used as the seed in a pseudorandom number generator, and the resulting random number is then scaled into a possible address range using modulo division.
- Use a function such as y = ((ax + b) mod m) + 1, where
- x is the key value,
- a is a coefficient,
- b is a constant,
- m is the number of elements in the list,
- y is the address.
38 Pseudorandom Hashing Method
- y = ((ax + b) mod m) + 1 becomes y = ((17x + 7) mod 307) + 1
- x = 121267 is the key value,
- a = 17,
- b = 7,
- m = 307.
- y = (((17 * 121267) + 7) mod 307) + 1
- y = ((2061539 + 7) mod 307) + 1
- y = (2061546 mod 307) + 1
- y = 41 + 1
- y = 42
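The worked example can be verified with a short Python sketch. The function name and the default arguments are assumptions that simply restate the values a = 17, b = 7 and m = 307 from the slide.

```python
def pseudorandom_hash(key, a=17, b=7, m=307):
    """y = ((a * key + b) mod m) + 1, an address in the range 1..m."""
    return (a * key + b) % m + 1


print(17 * 121267 + 7)             # intermediate sum
print(pseudorandom_hash(121267))   # address
```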
39 Rotation Hashing Method
Rotation is often used in combination with folding and pseudorandom hashing.
Figure 2-12
40 Collision Resolution Methods
All of the methods of handling collisions shown in Figure 2-13 are independent of the hashing algorithm.
Figure 2-13
41 Collision Resolution Concepts: Load Factor
- We define a full list as a list in which all elements except one contain data.
- Rule: a hashed list should not be allowed to become more than 75% full.
- Load factor = (number of filled elements in the list / total number of elements in the list) x 100
- α = (k / n) x 100, where k is the number of filled elements and n is the total number of elements.
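As a quick illustration of the load factor rule, the sketch below computes the percentage and checks it against the 75% limit. The function names and the sample counts are hypothetical.

```python
def load_factor(filled, total):
    """Percentage of the list that contains data."""
    return filled / total * 100


def exceeds_limit(filled, total, limit=75):
    """True when the list is fuller than the recommended limit."""
    return load_factor(filled, total) > limit


print(load_factor(150, 200))      # 75.0
print(exceeds_limit(150, 200))    # False: exactly at the limit
print(exceeds_limit(151, 200))    # True
```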
42 Collision Resolution Concepts: Clustering
- Some hashing algorithms tend to cause data to group within the list. This is known as clustering.
- Clustering is created by collisions.
- If the list contains a high degree of clustering, then the number of probes needed to locate an element grows and the processing efficiency of the list is reduced.
43 Collision Resolution Concepts: Clustering
- There are two types of clustering:
- Primary clustering: the data cluster around a home address in our list.
- Secondary clustering: the data are widely distributed across the whole list, so that the list appears to be well distributed; however, the time to locate a requested element can become large.
44 Collision Resolution Methods: Open Addressing
- When a collision occurs, the home area addresses are searched for an open or unoccupied element where the new data can be placed.
- We have four different methods:
- Linear probe,
- Quadratic probe,
- Double hashing,
- Key offset.
45 Open Addressing: Linear Probe
- When data cannot be stored in the home address, we resolve the collision by adding one to the current address.
- Advantages:
- Simple implementation.
- Data tend to remain near their home address.
- Disadvantages:
- It tends to produce primary clustering.
- The search algorithm may become more complex, especially after data have been deleted.
46 Open Addressing: Linear Probe
15352 / 307 = 50 with remainder 2
hash(15352) = 2 + 1 = 3
New address = 3 + 1 = 4
47 Open Addressing: Linear Probe
Figure 2-14
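A minimal sketch of insertion with a linear probe, assuming modulo-division hashing with a list size of 307 and addresses 1 to 307 as in the earlier slides. The table is modelled as a Python dict from address to key; the function names and the second key, 121574 (chosen as 121267 + 307 so that it collides with 121267), are hypothetical.

```python
LIST_SIZE = 307  # list size used in the slides


def home_address(key):
    """Modulo-division hash giving an address in 1..LIST_SIZE."""
    return key % LIST_SIZE + 1


def linear_probe_insert(table, key):
    """Store key in table (a dict: address -> key) and return its address."""
    address = home_address(key)
    for _ in range(LIST_SIZE):
        if address not in table:
            table[address] = key
            return address
        address = address % LIST_SIZE + 1  # add one, wrapping from 307 to 1
    raise OverflowError("no free element left in the list")


table = {}
print(linear_probe_insert(table, 121267))  # home address 3 is free
print(linear_probe_insert(table, 121574))  # collides at 3, stored at 4
```

Because every colliding key moves to the next address, synonyms pile up next to their home address, which is the primary clustering mentioned above.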
48 Open Addressing: Quadratic Probe
- Primary clustering can be eliminated by adding a value other than one to the current address.
- The increment is the collision probe number squared:
- For the first probe, 1^2.
- For the second probe, 2^2.
- For the third probe, 3^2, and so on,
- until we either find an empty element or exhaust the possible elements.
- We use the modulo of the quadratic sum for the new address.
49 Open Addressing: Quadratic Probe
The increment factor increases by two for each probe; adding it to the previous increment gives the next increment.
Probe number   Collision location   Probe x probe = increment   New address   Increment factor   Next increment
1              1                    1 x 1 = 1                   2             1                  1
2              2                    2 x 2 = 4                   6             3                  4
3              6                    3 x 3 = 9                   15            5                  9
4              15                   4 x 4 = 16                  31            7                  16
5              31                   5 x 5 = 25                  56            9                  25
6              56                   6 x 6 = 36                  92            11                 36
7              92                   7 x 7 = 49                  41            13                 49
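The address sequence in the table can be generated with the sketch below. It assumes a list size of 100, which is inferred from the last row (92 + 49 = 141 becomes 41) and is not stated on the slide. It also builds each increment by adding the increment factor instead of squaring the probe number. The function name is illustrative.

```python
def quadratic_probe_addresses(start, list_size, probes):
    """Return the addresses visited by successive quadratic probes."""
    addresses = []
    address = start
    increment = 0
    factor = 1
    for _ in range(probes):
        increment += factor      # 1, 4, 9, ... without multiplying
        factor += 2              # increment factor grows by two each probe
        address = (address + increment) % list_size
        addresses.append(address)
    return addresses


print(quadratic_probe_addresses(1, 100, 7))
```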
50 Open Addressing: Double Hashing, Pseudorandom Collision Resolution
In this method, rather than using an arithmetic probe function, the address is rehashed.
y = ((ax + c) mod listSize) + 1
y = ((3 * 2 + (-1)) mod 307) + 1
y = 6
Figure 2-15
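The rehash above can be written as a one-line Python function. Here x is the address at which the collision occurred (2 in the example); the function name and default arguments are assumptions that restate a = 3, c = -1 and listSize = 307.

```python
def rehash(address, a=3, c=-1, list_size=307):
    """New address after a collision: ((a * address + c) mod list_size) + 1."""
    return (a * address + c) % list_size + 1


print(rehash(2))   # collision at address 2
```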
51 Open Addressing: Double Hashing, Key Offset Collision Resolution
- Key offset is another double hashing method; it produces different collision paths for different keys.
- Key offset calculates the new address as a function of the old address and the key.
52 Open Addressing: Double Hashing, Key Offset Collision Resolution
- offSet = key / listSize (integer division)
- address = ((offSet + old address) mod listSize) + 1
- offSet = 166702 / 307 = 543
- Probe 1: address = ((543 + 2) mod 307) + 1 = 239
- Probe 2: address = ((543 + 239) mod 307) + 1 = 169
Key      Home address   Key offset   Probe 1   Probe 2
166702   2              543          239       169
572556   2              1865         26        50
67234    2              219          222       135
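The rows of the table can be recomputed with the sketch below, which assumes the home address comes from modulo-division hashing with a list size of 307 (consistent with the home address 2 shown for all three keys). The function name and its return shape are illustrative.

```python
def key_offset_probes(key, list_size=307, probes=2):
    """Return (home address, offset, list of probe addresses) for key."""
    offset = key // list_size            # integer quotient
    home = key % list_size + 1           # modulo-division home address
    path = []
    address = home
    for _ in range(probes):
        address = (offset + address) % list_size + 1
        path.append(address)
    return home, offset, path


for key in (166702, 572556, 67234):
    print(key, key_offset_probes(key))
```

All three keys share the home address 2, yet each follows a different collision path because the offset depends on the key.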
53 Collision Resolution: Open Addressing Resolution
- A major disadvantage of open addressing is that each collision resolution increases the probability of future collisions!
54 Collision Resolution: Linked List Resolution
- A linked list is an ordered collection of data in which each element contains the location of the next element.
- Each home address holds a link head pointer to the list of its synonyms.
Figure 2-16
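Linked list resolution can be sketched in Python as below. This is a simplified illustration: every home address holds only a head pointer, new keys are pushed onto the front of the chain, and addresses are 0-based. The class names, method names and the key 121574 (121267 + 307, chosen to collide) are hypothetical.

```python
class Node:
    """One element of a chain: a key and the location of the next element."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, list_size=307):
        self.list_size = list_size
        self.heads = [None] * list_size   # one link head pointer per address

    def _home(self, key):
        return key % self.list_size       # 0-based home address

    def insert(self, key):
        home = self._home(key)
        # Synonyms never take another key's home address; they join the chain.
        self.heads[home] = Node(key, self.heads[home])

    def contains(self, key):
        node = self.heads[self._home(key)]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False


table = ChainedHashTable()
table.insert(121267)
table.insert(121574)
print(table.contains(121574), table.contains(5))
```

Unlike open addressing, a collision here does not occupy another key's home address, so it does not raise the probability of future collisions elsewhere in the list.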
55 Collision Resolution: Bucket Hashing Resolution
Figure 2-17
56 Hw 2
- Create an array that contains 100 random integers between 0 and 150. This should be an unordered list.
- Use the sentinel search algorithm to find the target value in the array.
- Use the probability search algorithm to find the target value in the array.
- Create an ordered list that contains 100 numbers between 0 and 150.
- Use the ordered list search algorithm to find the target value in the array.
- Use the binary search algorithm to find the target value in the array.
Upload your HW-2 to the FTP site by 15 Mar. 06 at 17:00.
57 Hw 2
- Run each search algorithm 10 times and report these performance values for each of them.
- Write your comments about the result table.
                                     Sentinel Search   Probability Search   Ordered Search   Binary Search
Number of Completed Searches
Number of Successful Searches
Average number of tests per search