Title: Algorithms

1. Algorithms & Data Structures
- COM328
- Algorithm Analysis
2. Learning Outcomes
- At the end of this lecture you should:
  - Understand the need to measure algorithm efficiency.
  - Understand how we can compare the efficiency of algorithms.
  - Be familiar with the concepts of growth rate, and upper and lower performance bounds.
  - Be able to perform some simple algorithm analysis.
  - Understand what is meant by asymptotic complexity and Big-Oh.
  - Be able to determine the Big-Oh value for a given algorithm complexity.
  - Understand the limitations of Big-Oh.
3. Reading Material
- Recommended chapters to review include:
  - Chapter 2, R. Lafore
  - Chapter 5, M. A. Weiss (DS & PS)
  - Chapter 2, M. A. Weiss (DS & AA)
  - Chapter 1, D. S. Malik & P. S. Nair
  - Chapter 3, Goodrich & Tamassia
4. Introduction
- "Languages come and go, but algorithms stand the test of time."
- "An algorithm must be seen to be believed."
  - Donald Knuth
5. Introduction
- We are interested in the design of good algorithms and data structures.
- An algorithm is a step-by-step procedure for performing a task in a finite amount of time.
- A data structure is a systematic way of organizing and accessing data.
- Analysis of standard algorithms aims to provide you with:
  - An understanding of algorithm constraints
  - An understanding of the trade-off between execution time and memory usage
  - A guide for your choice of algorithm for a particular purpose
  - A repertoire of standard algorithms
6. Measuring the Efficiency of Algorithms
- Consider two algorithms which carry out the same task.
  - What does it mean to compare the algorithms?
  - How can we conclude that one is better or more efficient than the other?
  - What are the factors that affect the efficiency of an algorithm?
7. Performance Factors
- Our primary interest is in the running time (time complexity) of algorithms and data structure operations.
- We are interested in determining how the running time depends on the size of the input.
- Our secondary interest is in space usage (space complexity), i.e. how much memory is required to solve the problem.
- We will concentrate on analysing time complexity and use it to compare the relative costs of two or more algorithms, e.g. which search or sort to use.
8. Time Complexity
- The time an algorithm takes to solve a problem clearly depends on two sets of parameters:
  - The number of computational steps.
  - The time taken to complete each of these steps.
- The number of computational steps can be calculated by examining the code.
- The time taken for each step depends on many factors:
  - The performance of the computer
  - The efficiency of the compiler, etc.
- We will therefore analyse the number of computational steps.
9. Problem Analysis: File Download
- Suppose when downloading a file over the internet there is an initial two-second delay (to set up the connection), after which the download proceeds at 1.6 KB/sec. If a file is N kilobytes, the time to download is described by the formula:
  - T(N) = N/1.6 + 2
- This is a linear function: downloading an 80 KB file takes approx. 52 seconds, while a file twice as large (160 KB) takes approx. 102 seconds (roughly twice as long).
- Time taken is proportional to the amount of input (N).
- This property, where time is directly proportional to the amount of input, is the signature of a linear algorithm.
- A linear algorithm is one of the most efficient algorithms.
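The formula above can be checked with a short sketch (the class and method names here are illustrative, not from the slides):

```java
// Sketch of the download-time formula T(N) = N/1.6 + 2.
public class DownloadTime {

    // Time in seconds to download a file of n kilobytes:
    // 2 seconds of connection setup plus n / 1.6 seconds of transfer.
    static double downloadTime(double n) {
        return n / 1.6 + 2;
    }

    public static void main(String[] args) {
        System.out.println(downloadTime(80));   // approx. 52 seconds
        System.out.println(downloadTime(160));  // approx. 102 seconds
    }
}
```

Doubling the file size roughly doubles the time, which is exactly the linear behaviour described above.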
10. Problem Analysis: Delivering Packages
- Delivering packages to 50 houses, each one mile apart.
- Solution 1: Collect all 50 packages from the shop. Drive to the house closest to the shop, deliver a package, then drive to the next closest house (1 mile away), and so on, finally driving back to the shop when all packages are delivered.
- Solution 2: Collect the first package from the shop, drive to the first house, deliver the package and drive back to the shop to collect the second package. Repeat this process for each package.
11. Problem Analysis: Delivering Packages
- In Solution 1, the distance the driver travels to deliver all 50 packages is 1 + 1 + ... + 1 (50 times) = 50 miles. Therefore the total distance travelled to deliver the packages and return to the shop is 50 + 50 = 100 miles.
- In Solution 2, the distance the driver travels to deliver all 50 packages is 2 x (1 + 2 + 3 + 4 + ... + 50) = 2550 miles.
- So suppose there are n packages to deliver:
  - Solution 1: 1 + 1 + ... + 1 (n times) + n = 2n
  - Solution 2: 2 x (1 + 2 + 3 + ... + n) = 2(n(n+1)/2) = n² + n
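The two strategies can be sketched and compared directly (class and method names are illustrative):

```java
// Sketch comparing the two delivery strategies for n packages.
public class Delivery {

    // Solution 1: carry all n packages, drive 1 mile between houses,
    // then n miles back to the shop: n + n = 2n miles.
    static long solution1(int n) {
        return 2L * n;
    }

    // Solution 2: one round trip per package:
    // 2 * (1 + 2 + ... + n) = n^2 + n miles.
    static long solution2(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            total += 2L * k;   // round trip to house k and back
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(solution1(50)); // 100 miles
        System.out.println(solution2(50)); // 2550 miles
    }
}
```

For 50 houses the second strategy drives over 25 times as far, and the gap widens quadratically as n grows.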
12. Values of n
- In Solution 1, the distance travelled is a linear function of n.
- In Solution 2, for large values of n, the dominant term is n² and the term containing n is negligible.
- So when analysing an algorithm we usually count the number of operations performed (the number of steps taken), as this count does not depend on which computer or programming language is used to implement the algorithm.
13. A Simple Analysis
- Sum of an array of integers (a) of size N:

    int sum(int[] a, int N) {
        int s = 0;            // statement 1
        for (int i = 0;       // statement 2
             i < N;           // statement 3
             i++) {           // statement 7
            s = s + a[i];     // statements 4, 5, 6
        }
        return s;             // statement 8
    }

- Statements:
  - 1, 2, 8: executed once
  - 3, 4, 5, 6, 7: executed once per iteration of the for loop, i.e. N times
- Thus giving a total of 5n + 3
- The complexity function of the algorithm is f(n) = 5n + 3
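The count can be made concrete by instrumenting the function, using the same simplified counting as the analysis above (5 steps per iteration plus 3 executed once; the class name and the `steps` counter are illustrative):

```java
// Instrumented version of the array-sum function, counting the
// simplified statement executions from the analysis: f(n) = 5n + 3.
public class SumSteps {
    static long steps;

    static int sum(int[] a, int n) {
        steps = 0;
        int s = 0;             // executed once
        steps++;
        for (int i = 0; i < n; i++) {
            s = s + a[i];
            steps += 5;        // 5 simplified steps per iteration
        }
        steps += 2;            // loop initialisation and return, once each
        return s;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sum(a, a.length)); // 10
        System.out.println(steps);            // 5*4 + 3 = 23
    }
}
```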
14. How 5n + 3 Grows
- Estimated running time for different values of n:
  - n = 10        => 53 steps
  - n = 100       => 503 steps
  - n = 1,000     => 5,003 steps
  - n = 1,000,000 => 5,000,003 steps
- As n grows, the number of steps grows in linear proportion to n for this sum function.
- This makes sense, since f(n) = 5n + 3 is a linear function of n.
15. Asymptotic Complexity
- Which term in the previous complexity function dominates?
- What about the 5 in 5n + 3?
- What about the 3?
- As n gets large, the 3 becomes insignificant.
- The 5 is inaccurate, as different operations require varying amounts of time.
- What is fundamental is that the time is linear in n.
- Asymptotic complexity: as n gets large, ignore all lower-order terms and concentrate on the highest-order term only, i.e.:
  - Drop lower-order terms such as the +3
  - Drop the constant coefficient of the highest-order term
16. Asymptotic Complexity (2)
- The 5n + 3 time bound is said to "grow asymptotically" like n.
- This gives us an approximation of the complexity of the algorithm (i.e. f(n) grows like n).
- It ignores lots of (machine-dependent) details and concentrates on the bigger picture. Why is this useful?
- As inputs get larger, any algorithm of a smaller order will be more efficient than an algorithm of a larger order.
- [Figure: time (steps) against input size (N) for 0.01n² and 10n]
17. Asymptotic Complexity (3)
- Note how, for small values of n, the value of 10n overwhelms the value of 0.01n².
- But as n increases, the difference between n² and n grows so quickly that it eventually more than compensates for the difference between the constants 10 and 0.01.
- Thus when n = 1000 the time taken by both terms is roughly equal; beyond that, 0.01n² overwhelms 10n so much that 10n becomes insignificant to the overall performance of the algorithm.
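The crossover point can be verified with a short sketch (class and method names are illustrative): setting 0.01n² = 10n gives n = 1000.

```java
// Sketch locating the crossover between 0.01*n^2 and 10*n.
public class Crossover {

    static double quadratic(double n) { return 0.01 * n * n; }
    static double linear(double n)    { return 10.0 * n; }

    public static void main(String[] args) {
        // Before the crossover the linear term is larger...
        System.out.println(quadratic(100) < linear(100));     // true
        // ...at n = 1000 the two terms are equal...
        System.out.println(quadratic(1000) == linear(1000));  // true
        // ...and beyond it the quadratic term overwhelms 10n.
        System.out.println(quadratic(10000) > linear(10000)); // true
    }
}
```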
18Another Analysis
int sum 0 for (int i0 i lt n i)
sum for (int j0 j lt 2n j) if
((j2)0) sum else sum--
- f(n)loop1 3n 2
- f(n)loop2 5n 1
- f(n) f(n)loop1 f(n)loop2 8n 3
19. A More Complex Analysis

    int sum = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            sum++;
        }
    }

- Analysis shows:
  - sum++, j < n and j++ are executed n x n times (n²)
  - i < n, i++ and j = 0 are executed n times
  - sum = 0 and i = 0 are executed once
- Therefore:
  - f(n) = 3n² + 3n + 2
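The same counting can be reproduced by instrumenting the nested loops, using the simplified counts from the analysis above (the class name and counter are illustrative):

```java
// Instrumented nested loops, accumulating the simplified statement
// counts from the analysis: f(n) = 3n^2 + 3n + 2.
public class NestedSteps {

    static long steps(int n) {
        long count = 2;               // sum = 0 and i = 0, once each
        for (int i = 0; i < n; i++) {
            count += 3;               // i < n, i++, j = 0: n times each
            for (int j = 0; j < n; j++) {
                count += 3;           // sum++, j < n, j++: n*n times each
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(steps(10)); // 3*100 + 3*10 + 2 = 332
    }
}
```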
20. An Asymptotic Notation: Big-O
- We use a convention called Big-O notation to represent different complexity classes.
- Big-O makes no attempt to provide exact running times for an algorithm, but estimates how fast the execution time grows as the size of the input grows.
- Big-O simplifies the analysis of complexity expressions.
- Big-O defines an upper bound on the asymptotic growth of a complexity class.
21. Big-O Notation - Simplification
- Only consider the term which grows fastest as N increases
  - e.g. if f(N) is N² + N, then the relationship is O(N²)
- Drop any constants before the largest term
  - e.g. if f(N) is 5N² + 4N, then the relationship is O(N²)
- The base of any logs can be ignored
  - e.g. if f(N) is log_b N, then the relationship is O(log N)
- What is the Big-O notation for the earlier code example?
  - f(N) is 3N² + 3N + 2, then the relationship is ________?
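The log-base rule follows from the change-of-base identity log_b N = ln N / ln b: two bases differ only by a constant factor, which Big-O discards. A small sketch (names are illustrative):

```java
// Sketch of why the base of a logarithm is ignored in Big-O:
// logs in different bases differ only by a constant factor.
public class LogBases {

    // Logarithm of n in base b, via change of base.
    static double logBase(double b, double n) {
        return Math.log(n) / Math.log(b);
    }

    public static void main(String[] args) {
        // The ratio log2(N) / log10(N) is the same constant for
        // every N, so O(log2 N) and O(log10 N) are the same class.
        for (double n : new double[]{10, 1000, 1e6}) {
            System.out.println(logBase(2, n) / logBase(10, n));
        }
    }
}
```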
22. Best, Worst, Average Cases
- Consider searching for an element in an array of elements:

    23 45 21 19 56 33 89 67 40 11

- Best case is when the element is first:
  - f_best(n) = 1 => O(1)
- Worst case is when the element is last:
  - f_worst(n) = n => O(N)
- Average is probably midway (but depends on the distribution):
  - f_average(n) = n/2 => O(N)
- Usually we analyse an algorithm's worst-case behaviour (because it's easier). The worst case is sometimes uncommon and can be ignored; at other times it is very common and cannot be ignored.
- In some cases, average-case behaviour is more useful (but it's usually much more difficult to evaluate).
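The cases above can be seen directly by counting comparisons in a sequential search over the example array (class and method names are illustrative):

```java
// Sequential search over the example array, counting comparisons
// to illustrate best and worst cases.
public class SequentialSearch {

    // Returns the number of comparisons needed to find key,
    // or -1 if key is absent.
    static int comparisons(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) {
                return i + 1;   // found after i+1 comparisons
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {23, 45, 21, 19, 56, 33, 89, 67, 40, 11};
        System.out.println(comparisons(a, 23)); // best case: 1 => O(1)
        System.out.println(comparisons(a, 11)); // worst case: n = 10 => O(N)
    }
}
```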
23. Classes of Complexity
- Many algorithms can be written in multiple ways. Usually the simplest, most obvious algorithm will also be the slowest; sleeker, cleverer, less intuitive algorithms developed over the years may be faster. Some common running times for algorithms are:

    Notation    Complexity    Example
    O(1)        constant      array access
    O(log N)    logarithmic   binary search
    O(N)        linear        sequential search
    O(N log N)  N log N       quicksort, heap sort, merge sort
    O(N²)       quadratic     selection sort, insertion sort
    O(N³)       cubic         multiplying two NxN matrices
    O(2^N)      exponential   travelling salesman problem (TSP)

- We will examine some of these algorithms in future lectures.
24. Graph of Big-O Times
- [Figure: number of steps against number of items (N) for O(N²), O(N), O(log N) and O(1) - Lafore p. 72]
25. Limitations of Big-Oh
- Big-Oh is very effective in establishing an algorithm's complexity class, but it has some limitations:
  - It is not appropriate for small amounts of input; here, just use the simplest algorithm.
  - Large constants (ignored by Big-Oh) may come into play when an algorithm is excessively complex.
    - e.g. in a complex algorithm, 2N log N is probably better than 1000N even though its growth rate is larger.
  - Large constants also come into play because our analysis disregards constants and cannot differentiate between memory access (cheap) and disk access (expensive).
  - The analysis assumes infinite memory, but with large data sets this can be a problem.
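The 2N log N versus 1000N example can be made concrete: 2N log₂N is smaller than 1000N whenever log₂N < 500, i.e. for every N below 2^500, which is far beyond any realistic input. A sketch (names are illustrative):

```java
// Sketch of how constants hidden by Big-Oh matter in practice:
// 2*N*log2(N) grows faster than 1000*N, yet stays smaller until
// log2(N) = 500, i.e. until N = 2^500.
public class HiddenConstants {

    static double nLogN(double n) {
        return 2 * n * (Math.log(n) / Math.log(2));
    }

    static double bigLinear(double n) {
        return 1000 * n;
    }

    public static void main(String[] args) {
        // For every practically reachable input size, the algorithm
        // with the "worse" growth rate still wins.
        for (double n : new double[]{1e3, 1e6, 1e9, 1e12}) {
            System.out.println(nLogN(n) < bigLinear(n)); // true each time
        }
    }
}
```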
26. Summary
- The efficiency of an algorithm is determined in terms of its growth rate.
- The growth rate of an algorithm tells us how quickly its running time grows as a function of the size of the problem.
- The computational complexity of an algorithm is represented using Big-O notation.
- Using Big-O notation, we can classify algorithms into different complexity classes.
- We note that it's easier to analyse an algorithm's worst case than its average case.
- We would like to avoid algorithms that are of exponential complexity.
27. Questions
- Q1. Analyse the following segments of code, estimate their complexity and provide an answer in Big-O notation.
28. Questions
- Q2. Calculate complexity orders for algorithms to:
  - a) Calculate the minimum element in an array (given an array of N items, find the smallest item):
    - Initialise min to the first element of the array
    - Scan the array and update min as appropriate
  - b) Compute the closest points in a plane (given N points in a plane, i.e. an x-y coordinate system, find the pair of points that are closest together):
    - Calculate the distance between each pair of points
    - Retain the minimum distance
- Q3. An algorithm takes 0.5 ms for input size 100. How long will it take for input size 500 if the running time is: a) linear, b) quadratic, c) cubic?