Title: Algorithm Analysis
1Algorithms and Data Structures
- Algorithm Analysis
- Dr. Zumao Weng
2Introduction
- Algorithm Analysis
- Permits estimation of the resource consumption of an algorithm.
- Permits the relative costs of two or more algorithms to be compared.
- which search to use
- which sort to use, etc.
- Permits system behaviour to be estimated prior to software development.
- saving the cost of development if behaviour is likely to be poor.
3Choosing Algorithms
- How do we choose?
- What does efficient mean?
- How do we measure efficiency?
[Figure: a system with an algorithm, taking input and producing output]
4Example - is a value present?
boolean is_present(int[] my_vals, int target) {
    for (int i = 0; i < my_vals.length; i++) {
        if (my_vals[i] == target) {
            return true;
        }
    }
    return false;
}
- An array stores a collection of values.
- A for-loop is used to inspect each value stored.
- A return statement specifies the results of the
method.
5Sequential Search: is 26 present?
- 11 13 21 26 29 36 40 41 45 51 54 56 65 72 77 83
- i = 0: inspect 11
- i = 1: inspect 13
- i = 2: inspect 21
- i = 3: inspect 26
- return true as result
6Analysis
- An array provides a simple mechanism for storing values.
- What if one does not know how many elements might be stored?
- What if the number of elements changes frequently?
- What if the method is_present is used frequently?
- What does it mean to say that the solution is efficient?
- Is there a better solution?
- The question must be asked in the light of other application code.
- What determines the speed of execution of the method is_present?
7is_present Version 2
- Suppose it is feasible, perhaps even desirable, for the array of integers to be in sorted order.
- the_vals[i] <= the_vals[i+1] for 0 <= i < the_vals.length - 1
- It is possible to use the sorted order to aid searching the array.
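The sorted-order condition can be checked directly before relying on it. The following is a minimal sketch, not code from the slides: the class name `SortedCheck` and the method `isSorted` are assumptions made for illustration, and the check assumes ascending order with duplicates allowed.

```java
// Illustrative helper (hypothetical, not from the slides): verifies the
// sorted-order precondition that binary search relies on.
final class SortedCheck {
    // Returns true if vals[i] <= vals[i + 1] for every adjacent pair.
    static boolean isSorted(int[] vals) {
        for (int i = 0; i + 1 < vals.length; i++) {
            if (vals[i] > vals[i + 1]) {
                return false; // found a pair that is out of order
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] sorted = {11, 13, 21, 26, 29};
        int[] unsorted = {11, 26, 13};
        System.out.println(isSorted(sorted));
        System.out.println(isSorted(unsorted));
    }
}
```

The check itself inspects every adjacent pair, so it costs time proportional to the array length; it is worth running once, not before every search.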
8is_present Version 2, Binary Search
boolean is_present(int[] the_vals, int target) {
    int left = 0, right = the_vals.length - 1;
    while (left <= right) {
        int pivot = (left + right) / 2;
        if (the_vals[pivot] == target) {
            return true;              // found
        } else if (the_vals[pivot] < target) {
            left = pivot + 1;         // search right
        } else {
            right = pivot - 1;        // search left
        }
    }
    return false;
}
9Binary Search: is 54 present?
- 11 13 21 26 29 36 40 41 45 51 54 56 65 72 77 83
- left = 0, right = 15: pivot = 7 (41), 41 < 54, search right
- left = 8, right = 15: pivot = 11 (56), 56 > 54, search left
- left = 8, right = 10: pivot = 9 (51), 51 < 54, search right
- left = 10, right = 10: pivot = 10 (54), found
10Binary Search: is 11 present?
- 11 13 21 26 29 36 40 41 45 51 54 56 65 72 77 83
- left = 0, right = 15: pivot = 7 (41), 41 > 11, search left
- left = 0, right = 6: pivot = 3 (26), 26 > 11, search left
- left = 0, right = 2: pivot = 1 (13), 13 > 11, search left
- left = 0, right = 0: pivot = 0 (11), found
11Binary Search Analysis
- Operates by halving the search space at each comparison.
- if the array is large this is significant, but if the array is small then it is less so
- Binary search requires that the array is sorted.
- sorting is a time-consuming activity
- The measure of complexity is how much the search space can be reduced to achieve a result.
- by implication, the number of comparisons
- If the search finds elements at the start of the array then binary search will not be better than sequential search.
- the behaviour of the application and the data type are important in understanding performance.
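The remark about elements near the start of the array can be checked by counting how many elements each search inspects. This is a sketch under stated assumptions: the class `SearchCost` and both method names are hypothetical, and the array is assumed to be sorted in ascending order.

```java
// Illustrative comparison (hypothetical names): counts how many array
// elements each search inspects before it stops.
final class SearchCost {
    // Sequential search: inspect elements from the front until a match.
    static int sequentialProbes(int[] vals, int target) {
        int probes = 0;
        for (int v : vals) {
            probes++;
            if (v == target) {
                break;
            }
        }
        return probes;
    }

    // Binary search on an ascending sorted array, counting pivots inspected.
    static int binaryProbes(int[] vals, int target) {
        int probes = 0;
        int left = 0, right = vals.length - 1;
        while (left <= right) {
            int pivot = (left + right) / 2;
            probes++;
            if (vals[pivot] == target) {
                break;
            } else if (vals[pivot] < target) {
                left = pivot + 1;
            } else {
                right = pivot - 1;
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] vals = {11, 13, 21, 26, 29, 36, 40, 41, 45, 51, 54, 56, 65, 72, 77, 83};
        for (int target : new int[]{11, 83}) {
            System.out.println(target + ": sequential " + sequentialProbes(vals, target)
                    + ", binary " + binaryProbes(vals, target));
        }
    }
}
```

For the 16-element array used on the slides, finding 11 takes 1 inspection sequentially and 4 with binary search, while finding 83 takes 16 and 5 respectively.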
12Algorithm Cost Estimation
- Will the system meet memory constraints?
- e.g. mobile phone software, telephone switches
- Space complexity
- Will the system meet target response times?
- e.g. online banking system interacting with users
- e.g. a robot which must react to real-time events.
- Time complexity
- Memory usage and time can be traded.
- if one holds a value for future use it requires memory.
- if one re-computes something then additional processor time is required.
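One way to picture the trade is a range-sum query answered either by re-computation or from stored values. The example is an assumption chosen for illustration (the slides do not give one); the class `RangeSum` and its methods are hypothetical.

```java
// Illustrative time/memory trade (hypothetical example): the sum of
// vals[from..to-1] is either recomputed on each query or read from
// stored prefix sums.
final class RangeSum {
    private final int[] vals;
    private final long[] prefix; // prefix[i] = vals[0] + ... + vals[i - 1]

    RangeSum(int[] vals) {
        this.vals = vals.clone();
        // Extra memory: one long per element, plus one.
        this.prefix = new long[vals.length + 1];
        for (int i = 0; i < vals.length; i++) {
            prefix[i + 1] = prefix[i] + vals[i];
        }
    }

    // No stored values are used: time grows with the range length.
    long sumRecomputed(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += vals[i];
        }
        return sum;
    }

    // Uses the stored prefix sums: one subtraction per query.
    long sumStored(int from, int to) {
        return prefix[to] - prefix[from];
    }

    public static void main(String[] args) {
        RangeSum rs = new RangeSum(new int[]{11, 13, 21, 26});
        System.out.println(rs.sumRecomputed(1, 3));
        System.out.println(rs.sumStored(1, 3));
    }
}
```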
13Formal Algorithm Analysis?
- Is appropriate for applications with real constraints.
- can be time consuming and technically challenging
- makes assumptions about an algorithm and about its everyday operation.
- assumptions can be wrong!
- Is an important part of your CS education!
- Analysis of standard algorithms
- understand algorithm constraints.
- understand the trade-off between execution time and memory usage.
- guide your choice of algorithm for a particular purpose.
- build your repertoire of standard algorithms.
14Goal of Algorithm Analysis
- introduce the concept of growth rate in understanding performance.
- introduce the concepts of upper and lower bound on performance.
- introduce the cost of an algorithm and the cost of solving a problem.
- many algorithmic solutions may exist for the same problem
15Comparing Algorithms: An Example
- A system has an array of integers and this array must be placed in sorted order.
- there are many sort algorithms
- which sort algorithm should be used?
- how can sort algorithms be compared?
16Approach 1: Empirical
- Write programs for the candidate algorithms.
- Run the programs using suitable data and time the execution of the programs.
- The best algorithm is the one that sorts the array fastest.
- Problems
- Writing programs is hard work.
- Only one implementation is required!
- One program might be implemented more efficiently than the other(s).
- Test cases might distort performance.
- Not a realistic test of the algorithms.
- It may be that none of the algorithms meets the constraints.
- must investigate more algorithms!
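The empirical approach can be sketched as a small timing harness. Everything here is an assumption made for illustration: the two candidates (a hand-written insertion sort and the library `Arrays.sort`), the input size, the random seed and the helper `timeNanos` are not from the slides.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative timing harness (hypothetical): times two candidate sorts
// on identical copies of the same random data.
final class EmpiricalTiming {
    // Candidate 1: a straightforward insertion sort.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j]; // shift larger elements one place right
                j--;
            }
            a[j + 1] = key;
        }
    }

    // Runs the task once and returns the elapsed wall-clock time in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(20_000).toArray();
        int[] first = data.clone();
        int[] second = data.clone();
        long insertionTime = timeNanos(() -> insertionSort(first));
        long libraryTime = timeNanos(() -> Arrays.sort(second)); // candidate 2
        System.out.println("insertion sort: " + insertionTime + " ns");
        System.out.println("Arrays.sort:    " + libraryTime + " ns");
    }
}
```

A single run like this is subject to the problems listed above: the figures change from run to run and from machine to machine, and they depend on how carefully each candidate was coded.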
17Approach 2: Asymptotic Analysis
- Measure the efficiency of a program as the input size becomes very large.
- a program (as an algorithm solution) takes an input set and delivers an output result.
- can give an upper bound, a lower bound and an average case for an algorithm.
- Problems
- does not say anything about small input sets.
- Sorting small data sets will not take much time.
- cannot say that one algorithm is slightly faster than another.
- naïve analysis often ignores disc access and RAM access
18Performance Factors
- Time Complexity, or execution time, is usually the most important factor.
- Influenced by
- The speed of the CPU
- Bus speed between memory and CPU
- Disc speed
- cf CD ROM versus Hard Disc versus Floppy
- Competition for the CPU
- Quality of code produced by compiler
- The coding efficiency
- A more complete picture is given by including the Space Complexity.
- RAM required, disc space used
19Execution time
- The previous factors affect execution time but tell us little about the relative merits of algorithms.
- most factors will be the same for all algorithms and thus cancel out in a comparison.
- Absolute running time is a noisy measure.
- A proxy for absolute running time and absolute space usage is required to permit algorithm comparison.
- we do not want to execute algorithm implementations.
- we wish to compare in the absence of a machine with real CPU speed, disc speed etc.
- yet have a measure that will accurately reflect observed running times.
20A First Analysis: Preliminaries
- Asymptotic time complexity counts the number of basic operations.
- e.g. the number of swaps in a sort
- e.g. the number of comparisons in a search.
- Estimates the total number of basic operations required for a certain input size.
- e.g. the number of items to be sorted or searched
- The definition of a basic operation is chosen for the algorithm being analysed.
- the time of execution for a basic operation must be independent of the data it involves
- swapping two array elements does not depend on which elements are involved.
- a comparison has the same cost whichever data values are compared.
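Counting a basic operation can be done by instrumenting the code. The sketch below counts swaps, as in the first example above; it assumes bubble sort as the sort (the slides do not name one), and the class `SwapCount` and its method are hypothetical.

```java
// Illustrative instrumentation (hypothetical): bubble sort that returns
// the number of swaps, taking the swap as the basic operation.
final class SwapCount {
    // Sorts the array in place (ascending) and returns how many swaps were made.
    static int bubbleSortSwaps(int[] a) {
        int swaps = 0;
        for (int pass = 0; pass < a.length - 1; pass++) {
            for (int i = 0; i < a.length - 1 - pass; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];      // swap two adjacent elements:
                    a[i] = a[i + 1];     // the cost is the same whichever
                    a[i + 1] = tmp;      // values are involved
                    swaps++;
                }
            }
        }
        return swaps;
    }

    public static void main(String[] args) {
        System.out.println(bubbleSortSwaps(new int[]{1, 2, 3, 4}));
        System.out.println(bubbleSortSwaps(new int[]{4, 3, 2, 1}));
    }
}
```

An already sorted array of four elements needs no swaps, while the same four elements in reverse order need six, so the count depends on the input as well as on its size.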
21A First Analysis: Largest Value, Sequential Search
- largest value in an array of n positive integers
int largest(int[] vals) {
    int current_largest = 0;
    for (int i = 0; i < vals.length; i++) {
        if (vals[i] > current_largest) {
            current_largest = vals[i];
        }
    }
    return current_largest;
}
22Analysis
- The input size is the size of the array to be searched.
- independent of the algorithm
- let's say there are N elements
- The basic operation is comparing two array elements.
- does not depend on the values of the elements
23Analysis
int largest(int[] vals) {
    int current_largest = 0;
    for (int i = 0; i < vals.length; i++) {
        if (vals[i] > current_largest) {
            current_largest = vals[i];
        }
    }
    return current_largest;
}
- If c denotes the time for a basic operation then the algorithm's time complexity is c times N.
- There are N comparisons to find the largest value in the array.
- $T(N) = cN$
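The count of N comparisons can be confirmed by adding a counter to the method. This is an illustrative sketch: the class `LargestCount` and the static field `comparisons` are assumptions added for the demonstration, not part of the slides' code.

```java
// Illustrative instrumentation (hypothetical): the slides' largest()
// with a counter for the basic operation, the comparison.
final class LargestCount {
    static int comparisons; // number of comparisons made by the last call

    static int largest(int[] vals) {
        comparisons = 0;
        int currentLargest = 0; // assumes positive integers, as on the slide
        for (int i = 0; i < vals.length; i++) {
            comparisons++;      // one comparison per element
            if (vals[i] > currentLargest) {
                currentLargest = vals[i];
            }
        }
        return currentLargest;
    }

    public static void main(String[] args) {
        int[] vals = {29, 83, 11, 54, 40};
        System.out.println(largest(vals) + " found with " + comparisons + " comparisons");
    }
}
```

Whatever order the values are in, the counter ends at the array length, which is why the cost is written as c times N.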
24Analysis
- $T(N) = cN$
- This is the growth rate for the algorithm
- a linear growth rate
- We need not know what the actual cost c is or how many elements N there are in the array.
- The measure can be used to compare this algorithm with other algorithms for the task.
- measure does not depend on CPU speed.
- measure does not depend on RAM/BUS speed
25Observations
int largest(int[] vals) {
    int current_largest = 0;
    for (int i = 0; i < vals.length; i++) {
        if (vals[i] > current_largest) {
            current_largest = vals[i];
        }
    }
    return current_largest;
}
- The basic operation cost includes
- loop cost
- assignment cost
- The measure is thus an approximation.
26Assignment
- Assume that the copy takes time $c_1$
- $T(n) = c_1$
- This statement has constant running time
27Loops
int sum = 0;
for (int i = 0; i < n; i++) {
    sum++;
}
- The basic operation is increment.
- Loop executes n times
- $T(n) = n c_1$
- a linear growth rate
28Nested Loops
int sum = 0;
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        sum++;
    }
}
- The basic operation is increment.
- Inner loop n times, outer n times
- $T(n) = n^2 c_1$
- growth rate of $n^2$
29Statement Sequences
int sum = 0;
for (int i = 0; i < n; i++) {
    sum++;
}
for (int j = 0; j < 2 * n; j++) {
    sum++;
}
- The basic operation is increment.
- $T_{loop1}(n) = n c$
- $T_{loop2}(n) = 2n c$
- $T(n) = T_{loop1}(n) + T_{loop2}(n) = 3n c$
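The operation counts for a single loop, nested loops and a sequence of loops can be checked by returning the final value of the counter. A minimal sketch follows; the class `LoopCounts` and its method names are hypothetical.

```java
// Illustrative check (hypothetical names): each method returns how many
// times the basic operation, the increment, is executed.
final class LoopCounts {
    // Single loop: n increments.
    static long singleLoop(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum++;
        }
        return sum;
    }

    // Nested loops: n * n increments.
    static long nestedLoops(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sum++;
            }
        }
        return sum;
    }

    // Two loops in sequence: n + 2n = 3n increments.
    static long sequence(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum++;
        }
        for (int j = 0; j < 2 * n; j++) {
            sum++;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println(singleLoop(n) + " " + nestedLoops(n) + " " + sequence(n));
    }
}
```

With n = 10 the three counts are 10, 100 and 30: costs of statements in sequence add, while costs of nested loops multiply.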
30Selection
if ( .... )
    ----- then part
else
    ----- else part
- The cost is the higher of the two costs (then part or else part).
31Growth Rates
32Growth Rates
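Growth rates can be compared by tabulating a few functions for doubling input sizes. The choice of functions here (log n, n log n, n squared and 2 to the n) is an assumption, since the original figure is not available; the class `GrowthRates` is hypothetical.

```java
// Illustrative table (hypothetical): prints several common growth
// functions side by side for n = 2, 4, 8, 16, 32.
final class GrowthRates {
    // Base-2 logarithm via the change-of-base rule.
    static double log2(double n) {
        return Math.log(n) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.printf("%6s %8s %10s %8s %12s%n", "n", "log2 n", "n log2 n", "n^2", "2^n");
        for (int n = 2; n <= 32; n *= 2) {
            System.out.printf("%6d %8.1f %10.1f %8d %12d%n",
                    n, log2(n), n * log2(n), (long) n * n, 1L << n);
        }
    }
}
```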
33Complexity League
34Best, Worst, Average Cases
int largest(int[] vals) {
    int current_largest = 0;
    for (int i = 0; i < vals.length; i++) {
        if (vals[i] > current_largest) {
            current_largest = vals[i];
        }
    }
    return current_largest;
}
35Best, Worst, Average Cases
boolean is_present(int[] my_vals, int target) {
    for (int i = 0; i < my_vals.length; i++) {
        if (my_vals[i] == target) {
            return true;
        }
    }
    return false;
}
- Best case is when the element is first
- Worst case is when the element is last, or not present at all
- Average is probably midway
- depends on distribution of values.
36Best, Worst, Average Cases
boolean is_present(int[] my_vals, int target) {
    for (int i = 0; i < my_vals.length; i++) {
        if (my_vals[i] == target) {
            return true;
        }
    }
    return false;
}
- $T_{best}(N) = c = O(1)$
- $T_{worst}(N) = Nc = O(N)$
- $T_{average}(N) = \frac{N}{2} c = O(N)$
- nature of data usage is important here!
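The three cases can be measured by counting inspections for different targets. The sketch below is illustrative: the class `CaseCosts` and its methods are hypothetical, and the average assumes that every stored value is equally likely to be the target and that all values are distinct.

```java
// Illustrative measurement (hypothetical names): inspections made by
// sequential search in the best, worst and average case.
final class CaseCosts {
    // Number of elements inspected before the search stops.
    static int probes(int[] vals, int target) {
        int count = 0;
        for (int v : vals) {
            count++;
            if (v == target) {
                break;
            }
        }
        return count;
    }

    // Average over all stored values, each assumed equally likely as target.
    static double averageProbes(int[] vals) {
        long total = 0;
        for (int v : vals) {
            total += probes(vals, v);
        }
        return (double) total / vals.length;
    }

    public static void main(String[] args) {
        int[] vals = {11, 13, 21, 26, 29, 36, 40, 41, 45, 51, 54, 56, 65, 72, 77, 83};
        System.out.println("best:    " + probes(vals, vals[0]));
        System.out.println("worst:   " + probes(vals, vals[vals.length - 1]));
        System.out.println("average: " + averageProbes(vals));
    }
}
```

For N = 16 this gives 1, 16 and 8.5. The exact average under these assumptions is (N + 1)/2, which is close to N/2 and still O(N); a different distribution of targets would give a different average.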
37Big Oh Notation
- Simplify analysis of complexity expressions
- Big Oh defines an upper bound on the asymptotic growth of a complexity measure.
- $T(N)$ is $O(F(N))$ if there are positive constants $c$ and $N_0$ such that
- $T(N) \le c\,F(N)$ when $N \ge N_0$
- $T(N) = O(F(N))$ is read as "T(N) is of order F(N)"
- i.e. for large enough N, the complexity measure $T(N)$ is at most a constant factor times the function $F(N)$.
- Big Oh is similar to less than or equal ($\le$)
- Big Omega ($\ge$), Big Theta ($=$), little o ($<$)
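As a worked instance of the definition, take a hypothetical cost function $T(N) = 3N + 2$ (chosen for illustration, not taken from the slides). Suitable constants can be found by bounding the lower-order term:

```latex
\begin{align*}
T(N) &= 3N + 2 \\
     &\le 3N + N = 4N && \text{when } N \ge 2 \\
\Rightarrow\quad T(N) &= O(N) && \text{with } c = 4,\ N_0 = 2
\end{align*}
```

Other choices of constants work equally well, for example $c = 5$ and $N_0 = 1$; the definition only requires that some pair exists.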
38Big Oh
- E.g. $T(N) = O(N^2)$
- i.e. $T(N) \le c N^2$ once $N$ exceeds some value $N_0$
- The process is one of identifying the largest term in an expression and giving it a unit coefficient.
39Big O Simplifications
- $T(N) = O(2N) \Rightarrow O(N)$
- we can drop the cost of the basic operation
- $T(N) = O(cN) \Rightarrow O(N)$ if $c$ is constant
- $T(N) = O(N^2)$ also holds, but the convention is to give the smallest function
- $T(N) = O(N^2 + N) \Rightarrow O(N^2)$
- $T(N) = O(N^3 + N^2 + N) \Rightarrow O(N^3)$
- $T(N) = O(2^N + N^4) \Rightarrow O(2^N)$
- The simplifications apply to best, worst and average cost expressions.
40Useful Identities
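- $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$
- number of times loops performed.
- $\sum_{i=0}^{n} a^i = \frac{a^{n+1} - 1}{a - 1}$ for $a \ne 1$, e.g. $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$
- binary search approximations
- $\sum_{i=1}^{n} i^2 \approx \frac{n^3}{3}$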
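The identities can be checked numerically by computing each sum directly. In this sketch the class `SumIdentities` and its methods are hypothetical, and the geometric series is taken to start at i = 0 with a base a other than 1.

```java
// Illustrative numerical check (hypothetical names): computes each sum
// term by term so it can be compared with its closed form.
final class SumIdentities {
    // 1 + 2 + ... + n
    static long sumLinear(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // a^0 + a^1 + ... + a^n
    static long sumPowers(long a, int n) {
        long sum = 0;
        long term = 1; // a^0
        for (int i = 0; i <= n; i++) {
            sum += term;
            term *= a;
        }
        return sum;
    }

    // 1^2 + 2^2 + ... + n^2
    static long sumSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println(sumLinear(n) + " vs " + (long) n * (n + 1) / 2);
        System.out.println(sumPowers(2, 10) + " vs " + ((1L << 11) - 1));
        System.out.println(sumSquares(n) + " vs approximately " + (double) n * n * n / 3);
    }
}
```

The first two comparisons agree exactly. The third is only an approximation: for n = 100 the exact sum is 338350 against an estimate of about 333333, and the relative gap shrinks as n grows.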