Title: CSC 332 Algorithms and Data Structures
1. CSC 332 Algorithms and Data Structures
Dr. Paige H. Meeker, Computer Science, Presbyterian College, Clinton, SC
2. Why do we care?
- Every 18-24 months, manufacturers introduce faster machines with larger memories. So why do we still need to write efficient code?
3. Well, just suppose
- Imagine we are defining a Java class Huge to represent long integers.
- We want a method for our class that
  - adds two large integers (two instances of our class)
  - multiplies two large integers.
- Suppose we have successfully implemented the add() method and are moving on to multiply().
4. Well, just suppose
- Suppose we have successfully implemented the add() method and are moving on to multiply().
- Object-oriented coding is all about reusing what you already have, right? Since multiplication is equivalent to repeated addition, to compute the product 7562 * 463 we could initialize a variable to 0 and then add 7562 to it 463 times. Why not? add() works, right?
5. Efficiency Example

    public class BigOEx
    {
        public static void main(String[] args)
        {
            long firstOp = 7562;
            long secondOp = 463;
            long product = 0;
            for (long i = secondOp; i > 0; i--)
                product += firstOp;
            System.out.println("Product of " + firstOp + " and " + secondOp + " = " + product);
        }
    }
6. Efficiency Example
- If we run the previous code, we should get a result in a reasonable amount of time. So, let's run it, but replace the 463 with 100000000 (eight 0s). Will we still get a result in a reasonable time? How about with 1000000000 (nine 0s)? Is something wrong?
7. Efficiency Example
- Why does the code take so long?
- Can we do better?
8. Efficiency Example
- How might we rethink our code to produce a more efficient result?
9. Efficiency Example
- Consider the product of the original numbers (7562 * 463). The secondOp has 3 digits: a 100s digit, a 10s digit, and a 1s digit (463 = 400 + 60 + 3). So:

    7562 * 463 = 7562 * (400 + 60 + 3)
               = 7562 * 400 + 7562 * 60 + 7562 * 3
               = 756200 * 4 + 75620 * 6 + 7562 * 3
               = 756200 + 756200 + 756200 + 756200
                 + 75620 + 75620 + 75620 + 75620 + 75620 + 75620
                 + 7562 + 7562 + 7562
10. Efficiency Example

    public class BetterBigOEx
    {
        public static void main(String[] args)
        {
            long firstOrig, secondOrig;
            long firstOp = firstOrig = 7562;
            long secondOp = secondOrig = 1000000000;
            int secOpLength = 10;
            long product = 0;
            for (int digitPosition = 0; digitPosition < secOpLength; digitPosition++)
            {
                int digit = (int)(secondOp - (secondOp / 10) * 10);
                for (int counter = digit; counter > 0; counter--)
                    product += firstOp;
                secondOp = secondOp / 10;  // discard last digit
                firstOp = 10 * firstOp;    // tack a 0 to the right
            }
            System.out.println("Product of " + firstOrig + " and " + secondOrig + " = " + product);
        }
    }
11. Efficiency Example
- Does efficiency matter?
- How do we measure it?
  - Create 2 programs and measure the difference (not always possible)
  - Measure the algorithm before implementation
12. Efficiency Calculation
- Let's say you need to get downtown. You can walk, drive, ask a friend to take you, or take a bus. What's the best way?
13. Efficiency Measurement
- Algorithms have measurable time and space requirements, called complexity.
- We are not considering how difficult the algorithm is to code, but rather the time it takes to execute and the memory it will need.
14. Analysis of Algorithms
- Usually measure time complexity
- Compute an approximation, not actual time
- Typically estimate the WORST (maximum) time the algorithm could take. Why?
- Measurements also exist for best and average cases, but generally you look for the worst-case analysis.
15. Analysis of Algorithms
- How do we compute worst-case time?
- PROBLEM: Compute the sum 1 + 2 + ... + n for some positive integer n.
- Think about possible ways to solve this problem (then look at the next 3 slides for suggestions!)
16. Analysis of Algorithms
- Algorithm A computes 0 + 1 + 2 + ... + n from left to right:

    sum = 0
    for i = 1 to n
        sum = sum + i
17. Analysis of Algorithms
- Algorithm B computes 0 + 1 + (1+1) + (1+1+1) + ... + (1+1+...+1):

    sum = 0
    for i = 1 to n
        for j = 1 to i
            sum = sum + 1
18. Analysis of Algorithms
- Algorithm C uses an algebraic identity to compute the sum (a Java rendering of all three algorithms follows):

    sum = n * (n + 1) / 2
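The slides give the three algorithms only as pseudocode; here is a minimal Java sketch of all three (the method names sumA, sumB, and sumC are illustrative, not from the slides):

    // Algorithm A: left-to-right accumulation
    public static long sumA( int n )
    {
        long sum = 0;
        for( int i = 1; i <= n; i++ )
            sum += i;
        return sum;
    }

    // Algorithm B: rebuilds each term by repeated increments
    public static long sumB( int n )
    {
        long sum = 0;
        for( int i = 1; i <= n; i++ )
            for( int j = 1; j <= i; j++ )
                sum++;
        return sum;
    }

    // Algorithm C: the algebraic identity n(n+1)/2
    public static long sumC( int n )
    {
        return (long) n * ( n + 1 ) / 2;
    }

Timing these for a large n (say n = 100000) makes the gap obvious: sumB performs roughly n^2/2 increments, while sumC does a constant amount of work.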
19. Analysis of Algorithms
- How do we determine which algorithm (A, B, or C) is fastest?
  - Consider the size of the problem and the effort involved. (Measure problem size using n)
  - Find an appropriate growth-rate function.
  - Count the number of operations required by the algorithm.
20. Analysis of Algorithms
- Algorithm A: n+1 assignments, n additions, 0 multiplications, and 0 divisions, for TOTAL OPS = 2n + 1
- Algorithm B: 1 + n(n+1)/2 assignments, n(n+1)/2 additions, 0 multiplications, and 0 divisions, for TOTAL OPS = n^2 + n + 1
- Algorithm C: 1 assignment, 1 addition, 1 multiplication, and 1 division, for TOTAL OPS = 4
21. Analysis of Algorithms
- So,
  - Algorithm A's growth-rate function is 2n + 1 time units
  - Algorithm B's is n^2 + n + 1 time units
  - Algorithm C's is constant.
- Speed-wise, C is fastest, followed by A and then by B.
22. Analysis of Algorithms
- The running time of an algorithm is a function of the size of the input. More data means that the program takes more time.
23. Analysis of Algorithms
- How do we express this, then, in proper notation?
- First rule: focus on the large instances of the problem; that is, consider only the dominant term in each growth-rate function. (Here, n^2)
- The difference between n^2 + n + 1 and n^2 is relatively small for large n, so we can use the term with the largest exponent to describe the growth rate.
24. Big-Oh Notation
- Computer scientists use different notations to represent best, average, and worst-case analysis. Big-Oh represents the worst case. So:
  - Algorithm A is O(n)
  - Algorithm B is O(n^2)
  - Algorithm C is O(1) (aka constant time)
25. Real-Life Examples
- You are seated at a wedding reception with a table of n people. In preparation for a toast, the waiter pours champagne into n glasses. What is the time complexity?
- Someone makes a toast. What is the time complexity?
- Everyone clinks glasses with everyone else. What is the time complexity?
26. Designing Efficient Algorithms
- Generally, we want to process a large amount of data.
- We want to design an algorithm (step-by-step instructions) that will use the resources (memory, space, speed) of the computer well.
27.
- As we previously mentioned, the amount of time taken is our usual tool for analysis, and this is determined by the amount of input; so the running time of the algorithm is given as a function of its input size.
30. Questions to Ask
- Is it always important to be on the most efficient curve?
- How much better is one curve than another?
- How do you decide which curve a particular algorithm lies on?
- How do you design algorithms that avoid being on less efficient curves?
31. Functions in Order of Increasing Growth Rate
- Function / Name
  - c: Constant
  - log N: Logarithmic
  - log^2 N: Log-squared
  - N: Linear
  - N log N: N log N
  - N^2: Quadratic
  - N^3: Cubic
  - 2^N: Exponential
32.
- The growth rate of a function is most important when N is sufficiently large.
- When input sizes are small, it is best to use the simplest algorithm.
- Quadratic algorithms are impractical if the input size is more than a few thousand.
- Cubic algorithms are impractical if the input size is more than a few hundred.
33. 3 Problems to Analyze
- Minimum Element in an Array
- Closest Points in the Plane
- Collinear Points in the Plane
34. Minimum Element in an Array
- Given an array of N items, find the smallest item.
- Obvious Solution (sketched below):
  - Maintain a variable min that stores the minimum element
  - Initialize min to the first element
  - Make a sequential scan through the array and update min as appropriate
- Running Time?
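A minimal Java sketch of the obvious solution just described (the method name findMin is illustrative; assumes a non-empty array):

    public static int findMin( int [ ] a )
    {
        int min = a[ 0 ];                    // initialize min to the first element
        for( int i = 1; i < a.length; i++ )  // one sequential scan
            if( a[ i ] < min )
                min = a[ i ];                // update min as appropriate
        return min;
    }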
35. Closest Points in the Plane
- Given N points in a plane, find the pair of points that are closest together.
- Obvious Solution (sketched below):
  - Calculate the distance between each pair of points
  - Retain the minimum distance
- Running Time?
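A minimal brute-force sketch of the obvious solution (representing the points as parallel coordinate arrays, and the name closestPairDistance, are my choices, not from the slides):

    public static double closestPairDistance( double [ ] x, double [ ] y )
    {
        double best = Double.MAX_VALUE;
        // compare all N(N-1)/2 pairs, retaining the minimum distance
        for( int i = 0; i < x.length; i++ )
            for( int j = i + 1; j < x.length; j++ )
            {
                double dx = x[ i ] - x[ j ];
                double dy = y[ i ] - y[ j ];
                double dist = Math.sqrt( dx * dx + dy * dy );
                if( dist < best )
                    best = dist;
            }
        return best;
    }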
36. Collinear Points in the Plane
- Given N points in a plane, determine if any three form a straight line.
- Obvious Solution (sketched below):
  - Enumerate all groups of 3 points
- Running Time?
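A minimal brute-force sketch; the slides only say to enumerate all triples, so the cross-product collinearity test used here is my assumption (it avoids floating-point slope comparisons):

    public static boolean hasCollinearTriple( long [ ] x, long [ ] y )
    {
        // enumerate all N(N-1)(N-2)/6 groups of 3 points
        for( int i = 0; i < x.length; i++ )
            for( int j = i + 1; j < x.length; j++ )
                for( int k = j + 1; k < x.length; k++ )
                    // three points are collinear exactly when the cross
                    // product of (pj - pi) and (pk - pi) is zero
                    if( ( x[ j ] - x[ i ] ) * ( y[ k ] - y[ i ] )
                            == ( x[ k ] - x[ i ] ) * ( y[ j ] - y[ i ] ) )
                        return true;
        return false;
    }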
37. Maximum Contiguous Subsequence Sum Problem
- Given (possibly negative) integers A1, A2, ..., AN, find (and identify the sequence corresponding to) the maximum value of the sum of elements i through j of the list, i.e. the maximum of Ai + ... + Aj. The maximum contiguous subsequence sum is zero if all the integers are negative.
38. Maximum Contiguous Subsequence Sum Problem
- Example:
- Given -2, 11, -4, 13, -5, 2, the answer is 20: the contiguous subsequence from items 2 through 4 (11 - 4 + 13).
39. Maximum Contiguous Subsequence Sum Problem
- Designing a Solution:
  - Consider emptiness
  - Obvious Solution (aka Brute Force)
  - Can we improve it? (must be a little clever)
  - Can we further improve it? (must be really clever and/or experienced!)
40. Maximum Contiguous Subsequence Sum Problem
- Obvious Solution: O(N^3)
- A direct and exhaustive search (Brute Force Approach)
  - Pro: Extreme simplicity; easy to program
  - Con: Least efficient method
41.

    /**
     * Cubic maximum contiguous subsequence sum algorithm.
     * seqStart and seqEnd represent the actual best sequence.
     */
    public static int maxSubSum1( int [ ] a )
    {
        int maxSum = 0;
        for( int i = 0; i < a.length; i++ )
            for( int j = i; j < a.length; j++ )
            {
                int thisSum = 0;
                for( int k = i; k <= j; k++ )
                    thisSum += a[ k ];
                if( thisSum > maxSum )
                {
                    maxSum = thisSum;
                    seqStart = i;   // seqStart and seqEnd are class-level fields
                    seqEnd = j;
                }
            }
        return maxSum;
    }
42. Maximum Contiguous Subsequence Sum Problem
- To analyze the algorithm, you basically count the number of times each statement is executed and then pick the dominant one. In our case, the statement inside the 3rd for loop is executed a little less than N^3 times, making it the dominant term.
43. Maximum Contiguous Subsequence Sum Problem
- SHORTCUT:
- We see a loop of potentially size N inside a loop of potentially size N inside a loop of potentially size N: N * N * N potential iterations!
- Generally, this cost calculation is off by a constant factor (which gets removed by Big-Oh notation anyway), so we can get away with it.
44. Maximum Contiguous Subsequence Sum Problem
- Since our cubic algorithm seems to be the result of statements inside of loops, can we lower the running time by removing a loop? Are they all necessary?
- In some cases, we can't remove a loop. In this one, we can (see the next slide).
45. Maximum Contiguous Subsequence Sum Problem
- Let's observe that we calculate the contiguous subsequence sum as we go; we don't need to reinvent the wheel each time (as our algorithm does), we only need to add one additional number to what we just calculated. Programming that perspective on the algorithm removes one of the loops and gives us an O(N^2) algorithm.
46.

    /**
     * Quadratic maximum contiguous subsequence sum algorithm.
     * seqStart and seqEnd represent the actual best sequence.
     */
    public static int maxSubSum2( int [ ] a )
    {
        int maxSum = 0;
        for( int i = 0; i < a.length; i++ )
        {
            int thisSum = 0;
            for( int j = i; j < a.length; j++ )
            {
                thisSum += a[ j ];
                if( thisSum > maxSum )
                {
                    maxSum = thisSum;
                    seqStart = i;   // seqStart and seqEnd are class-level fields
                    seqEnd = j;
                }
            }
        }
        return maxSum;
    }
47. Maximum Contiguous Subsequence Sum Problem
- Can we do better?
- Can we remove yet another loop?
- We need a clever observation that allows us to eliminate some subsequences from consideration without calculating their sums. Can we do that?
48. Maximum Contiguous Subsequence Sum Problem
- Intuitively, if a subsequence's sum is negative, it can't be the beginning of the maximum contiguous subsequence.
- All contiguous subsequences that border the maximum contiguous subsequence must have negative or 0 sums, or they would be included.
- When a negative subsequence is found, we can not only break the inner loop, we can advance i to j+1 (Proof: Theorem 5.3, p. 175)
49. Maximum Contiguous Subsequence Sum Problem
- With these observations, we can find the solution to this problem in linear time: O(N).
50.

    /**
     * Linear-time maximum contiguous subsequence sum algorithm.
     * seqStart and seqEnd represent the actual best sequence.
     */
    public static int maxSubSum3( int [ ] a )
    {
        int maxSum = 0;
        int thisSum = 0;
        for( int i = 0, j = 0; j < a.length; j++ )
        {
            thisSum += a[ j ];
            if( thisSum > maxSum )
            {
                maxSum = thisSum;
                seqStart = i;   // seqStart and seqEnd are class-level fields
                seqEnd = j;
            }
            else if( thisSum < 0 )
            {
                // negative subsequence detected: advance i past j and restart
                i = j + 1;
                thisSum = 0;
            }
        }
        return maxSum;
    }
51. Big-Oh Rules
- Big-Oh: T(N) is O(F(N)) if there are positive constants c and N0 such that T(N) <= c*F(N) when N >= N0.
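- For example (a concrete check, not from the slide): T(N) = 2N + 1 is O(N), since taking c = 3 and N0 = 1 gives 2N + 1 <= 3N whenever N >= 1.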
52. Big-Oh Rules
- Big-Omega: T(N) is Omega(F(N)) if there are positive constants c and N0 such that T(N) >= c*F(N) when N >= N0.
53. Big-Oh Rules
- Big-Theta: T(N) is Theta(F(N)) if and only if T(N) is O(F(N)) and T(N) is Omega(F(N)).
54. Big-Oh Rules
- Little-oh: T(N) is o(F(N)) if and only if T(N) is O(F(N)) and T(N) is NOT Theta(F(N)).
56. Big-Oh Rules
- Including constants or low-order terms inside a Big-Oh is bad style (like O(2N) or O(N + 1)).
- Any analysis with a Big-Oh answer allows lots of shortcuts:
  - Throw away low-order terms
  - Throw away leading constants
  - Throw away relational symbols
57. Big-Oh Rules
- The running time of a loop is at most the running time of the statements inside the loop (including tests) times the number of iterations.
- The running time of statements inside of nested loops is the running time of the statements (including tests in the innermost loop) multiplied by the sizes of all the loops.
58. Big-Oh Rules
- Big-Oh guarantees ONLY an upper bound, not an exact asymptotic answer.
- So, using actual figures, how does the running time grow for each type of curve seen earlier?
- If an algorithm takes T(N) time to solve a problem of size N, how long does it take to solve a larger problem?
60. Big-Oh Extras
- So, are all cubic and quadratic algorithms useless? NO!
  - Sometimes, that's the best you can do.
  - When the amount of input is low, any algorithm will do.
  - Usually easier to program.
  - Good for testing.
61. Logarithms
- The exponent that indicates the power to which a number (the base) is raised to produce a given number.
- For any B, N > 0: logB N = K if B^K = N
- In computer science, when the base (B) is omitted, it defaults to 2. (Also written lg instead of log.)
- The base is unimportant to Big-Oh (it can be anything!)
62. Logarithms
- Important fact: logs grow slowly.
63. Logarithms
- How do we use logs in CSC?
  - How many bits are required to represent numbers?
  - Starting with X = 1, how many times should X be doubled before it is at least as large as N?
  - Starting with X = N, how many times should X be halved before it is smaller than or equal to 1?
64. Logarithms
- The number of bits necessary to represent numbers is logarithmic.
- The repeated-doubling principle holds that, starting at 1, we can repeatedly double only logarithmically many times until we reach N (as the sketch below illustrates).
- The repeated-halving principle holds that, starting at N, we can halve only logarithmically many times. This process is used to obtain logarithmic routines for searching.
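A tiny sketch of the repeated-doubling principle (the method name is mine; it simply counts doublings, and that count is ceil(log2 N)):

    public static int doublingsToReach( long n )
    {
        int count = 0;
        for( long x = 1; x < n; x *= 2 )  // double until we reach at least n
            count++;
        return count;                     // e.g. doublingsToReach( 1000000 ) is 20
    }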
65. Static Searching Problem
- Given an integer X and an array A, return the position of X in A or an indication that it is not present. If X occurs more than once, return any occurrence. The array A is never altered.
- EXAMPLE: Phone Book
  - Look up a name (easy)
  - Look up a phone number (difficult)
66. Static Searching Problem
- Sequential Search
- Binary Search
- Interpolation Search
67. Sequential Search
- A linear searching technique that steps through an array sequentially until a match is found (a minimal sketch follows this list).
- Analysis:
  - Provide the cost of an unsuccessful search
  - Provide the cost of a worst-case successful search
  - Find the average cost of a successful search.
68. Sequential Search Analysis
- Provide the cost of an unsuccessful search:
  - Typically more time consuming than successful searches.
  - Requires the examination of every item in the array, so the time would be O(N).
69. Sequential Search Analysis
- Provide the cost of a worst-case successful search:
  - We may not find the item until we check the last element of the array.
  - Requires the examination of every item in the array, so the time would be O(N).
70. Sequential Search Analysis
- Find the average cost of a successful search:
  - On average, we only search half of the array. This would be N/2.
  - What is the Big-Oh?
71. Binary Search
- Requires that the input array be sorted.
- Works from the middle of the array instead of either end (a minimal sketch follows this list).
- Analysis:
  - Provide the cost of an unsuccessful search
  - Provide the cost of a worst-case successful search
  - Find the average cost of a successful search.
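A minimal Java sketch of binary search over a sorted array (returning -1 for "not present" is my convention):

    public static int binarySearch( int [ ] a, int x )
    {
        int low = 0, high = a.length - 1;
        while( low <= high )
        {
            int mid = ( low + high ) / 2;  // probe the middle of the range
            if( a[ mid ] < x )
                low = mid + 1;             // discard the lower half
            else if( a[ mid ] > x )
                high = mid - 1;            // discard the upper half
            else
                return mid;                // found
        }
        return -1;                         // not present
    }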
72. Binary Search Analysis
- Provide the cost of an unsuccessful search:
  - We halve the range in each iteration, rounding down if the range has an odd number of elements.
  - The number of iterations is floor(log N) + 1.
73. Binary Search Analysis
- Provide the cost of a worst-case successful search:
  - We halve the range in each iteration, rounding down if the range has an odd number of elements.
  - The number of iterations is floor(log N).
74. Binary Search Analysis
- Find the average cost of a successful search:
  - Only one iteration better than the worst-case analysis.
- BOTTOM LINE: O(log N)
75. Interpolation Search
- Requires that:
  - Each access must be very expensive compared to a typical instruction. (Ex: the array is on a disk instead of in memory, so that each comparison requires a disk access)
  - The data must be sorted AND fairly uniformly distributed.
- Requires that we spend more time making an accurate guess as to where the item might be. Binary search uses the midpoint; however, interpolation search recognizes that we may want to start more toward one end or the other of the array (a sketch of the probe computation follows).
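A minimal sketch of how such a probe might be computed; the slides don't give the exact expression, so the proportional-guess formula here is the standard one and an assumption on my part:

    public static int interpolationSearch( int [ ] a, int x )
    {
        int low = 0, high = a.length - 1;
        while( low <= high && x >= a[ low ] && x <= a[ high ] )
        {
            if( a[ high ] == a[ low ] )    // flat range: avoid dividing by zero
                return a[ low ] == x ? low : -1;
            // guess a position proportional to where x lies between
            // a[low] and a[high], rather than always taking the midpoint
            int probe = low + (int)( (long)( x - a[ low ] ) * ( high - low )
                    / ( a[ high ] - a[ low ] ) );
            if( a[ probe ] == x )
                return probe;
            else if( a[ probe ] < x )
                low = probe + 1;
            else
                high = probe - 1;
        }
        return -1;
    }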
76. Interpolation Search Analysis
- Provide the cost of a worst-case successful search:
  - If the data is not uniformly distributed, it could be linear, and every item might need to be examined.
- Find the average cost of a successful search:
  - Has been shown to be O(log log N).
77. Limitations of Big-Oh
- Not appropriate for small inputs
- Constants may be too large to be practical.
- Can be an overestimate
- While the average case may be a better judge of algorithm effectiveness, it is more difficult to find than Big-Oh, so we usually use Big-Oh despite the limitations.