Title: CSC 332 Algorithms and Data Structures
1. CSC 332 Algorithms and Data Structures
Dr. Paige H. Meeker, Computer Science, Presbyterian College, Clinton, SC
2. Why do we care?
- Every 18-24 months, manufacturers introduce faster machines with larger memories. So why do we still need to write efficient code?
3. Well, just suppose
- Imagine we are defining a Java class Huge to represent long integers.
- We want a method for our class that
  - adds two large integers (two instances of our class)
  - multiplies two large integers.
- Suppose we have successfully implemented the add() method and are moving on to multiply().
4. Well, just suppose
- Suppose we have successfully implemented the add() method and are moving on to multiply().
- Object-oriented coding is all about reusing what you already have, right? Since multiplication is equivalent to repeated addition, to compute the product 7562 * 463 we could initialize a variable to 0 and then add 7562 to it 463 times. Why not? add() works, right?
5. Efficiency Example

    public class BigOEx
    {
        public static void main(String[] args)
        {
            long firstOp = 7562;
            long secondOp = 463;
            long product = 0;
            for (long i = secondOp; i > 0; i--)
                product += firstOp;
            System.out.println("Product of " + firstOp + " and " + secondOp + " = " + product);
        }
    }
6. Efficiency Example
- If we run the previous code, we should get a result in a reasonable amount of time. So, let's run it, but replace the 463 with 100000000 (eight 0s). Will we still get a result in a reasonable time? How about with 1000000000 (nine 0s)? Is something wrong?
7. Efficiency Example
- Why does the code take so long?
- Can we do better?
8. Efficiency Example
- How might we rethink our code to produce a more efficient result?
9. Efficiency Example
- Consider the product of the original numbers (7562 * 463). The secondOp has 3 digits: a 100s digit, a 10s digit, and a 1s digit (463 = 400 + 60 + 3). So:

    7562 * 463 = 7562 * (400 + 60 + 3)
               = 7562 * 400 + 7562 * 60 + 7562 * 3
               = 756200 * 4 + 75620 * 6 + 7562 * 3
               = 756200 + 756200 + 756200 + 756200
                 + 75620 + 75620 + 75620 + 75620 + 75620 + 75620
                 + 7562 + 7562 + 7562
10. Efficiency Example

    public class BetterBigOEx
    {
        public static void main(String[] args)
        {
            long firstOrig, secondOrig;
            long firstOp = firstOrig = 7562;
            long secondOp = secondOrig = 1000000000;
            int secOpLength = 10;
            long product = 0;
            for (int digitPosition = 0; digitPosition < secOpLength; digitPosition++)
            {
                int digit = (int)(secondOp - (secondOp / 10) * 10);
                for (int counter = digit; counter > 0; counter--)
                    product += firstOp;
                secondOp = secondOp / 10;  // discard last digit
                firstOp = 10 * firstOp;    // tack a 0 to the right
            }
            System.out.println("Product of " + firstOrig + " and " + secondOrig + " = " + product);
        }
    }
11. Efficiency Example
- Does efficiency matter?
- How do we measure it?
  - Create 2 programs and measure the difference (not always possible)
  - Measure the algorithm before implementation
12. Efficiency Calculation
- Let's say you need to get downtown. You can walk, drive, ask a friend to take you, or take a bus. What's the best way?
13. Efficiency Measurement
- Algorithms have measurable time and space requirements, called complexity.
- We are not considering how difficult the algorithm is to code, but rather the time it takes to execute and the memory it will need.
14. Analysis of Algorithms
- Usually measure time complexity
- Compute an approximation, not actual time
- Typically estimate the WORST (maximum) time the algorithm could take. Why?
- Measurements also exist for best and average cases, but generally you look for the worst-case analysis.
15. Analysis of Algorithms
- How do we compute worst-case time?
- PROBLEM: Compute the sum 1 + 2 + ... + n for some positive integer n.
- Think about possible ways to solve this problem (then look at the next 3 slides for suggestions!)
16. Analysis of Algorithms
- Algorithm A computes 0 + 1 + 2 + ... + n from left to right:

    sum = 0
    for i = 1 to n
        sum = sum + i
17. Analysis of Algorithms
- Algorithm B computes 0 + 1 + (1+1) + (1+1+1) + ... + (1+1+...+1):

    sum = 0
    for i = 1 to n
        for j = 1 to i
            sum = sum + 1
18. Analysis of Algorithms
- Algorithm C uses an algebraic identity to compute the sum (a Java rendering of all three algorithms follows):

    sum = n * (n + 1) / 2
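The slides give the three algorithms only as pseudocode; here is a minimal Java sketch of all three (the method names sumA, sumB, and sumC are illustrative, not from the slides):

    // Algorithm A: left-to-right accumulation
    public static long sumA( int n )
    {
        long sum = 0;
        for( int i = 1; i <= n; i++ )
            sum += i;
        return sum;
    }

    // Algorithm B: rebuilds each term by repeated increments
    public static long sumB( int n )
    {
        long sum = 0;
        for( int i = 1; i <= n; i++ )
            for( int j = 1; j <= i; j++ )
                sum++;
        return sum;
    }

    // Algorithm C: the algebraic identity n(n+1)/2
    public static long sumC( int n )
    {
        return (long) n * ( n + 1 ) / 2;
    }

Timing these for a large n (say n = 100000) makes the gap obvious: sumB performs roughly n^2/2 increments, while sumC does a constant amount of work.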
19. Analysis of Algorithms
- How do we determine which algorithm (A, B, or C) is fastest?
  - Consider the size of the problem and the effort involved. (Measure problem size using n)
  - Find an appropriate growth-rate function.
  - Count the number of operations required by the algorithm.
20. Analysis of Algorithms
- Algorithm A: n+1 assignments, n additions, 0 multiplications, and 0 divisions, for TOTAL OPS = 2n + 1
- Algorithm B: 1 + n(n+1)/2 assignments, n(n+1)/2 additions, 0 multiplications, and 0 divisions, for TOTAL OPS = n^2 + n + 1
- Algorithm C: 1 assignment, 1 addition, 1 multiplication, and 1 division, for TOTAL OPS = 4
21. Analysis of Algorithms
- So,
  - Algorithm A's growth-rate function is 2n + 1 time units
  - Algorithm B's is n^2 + n + 1 time units
  - Algorithm C's is constant.
- Speed-wise, C is fastest, followed by A and then by B.
22. Analysis of Algorithms
- The running time of an algorithm is a function of the size of the input. More data means that the program takes more time.
23. Analysis of Algorithms
- How do we express this, then, in proper notation?
- First rule: focus on the large instances of the problem; that is, consider only the dominant term in each growth-rate function. (Here, n^2)
- The difference between n^2 + n + 1 and n^2 is relatively small for large n, so we can use the term with the largest exponent to describe the growth rate.
24. Big-Oh Notation
- Computer scientists use different notations to represent best, average, and worst-case analysis. Big-Oh represents the worst case. So:
  - Algorithm A is O(n)
  - Algorithm B is O(n^2)
  - Algorithm C is O(1) (aka constant time)
25. Real-Life Examples
- You are seated at a wedding reception with a table of n people. In preparation for a toast, the waiter pours champagne into n glasses. What is the time complexity?
- Someone makes a toast. What is the time complexity?
- Everyone clinks glasses with everyone else. What is the time complexity?
26. Designing Efficient Algorithms
- Generally, we want to process a large amount of data.
- We want to design an algorithm (step-by-step instructions) that will use the resources (memory, space, speed) of the computer well.
27.
- As we previously mentioned, the amount of time taken is our usual tool for analysis, and this is determined by the amount of input; so the running time of the algorithm is given as a function of its input size.
30. Questions to Ask
- Is it always important to be on the most efficient curve?
- How much better is one curve than another?
- How do you decide which curve a particular algorithm lies on?
- How do you design algorithms that avoid being on less efficient curves?
31. Functions in Order of Increasing Growth Rate
- Function / Name
  - c: Constant
  - log N: Logarithmic
  - log^2 N: Log-squared
  - N: Linear
  - N log N: N log N
  - N^2: Quadratic
  - N^3: Cubic
  - 2^N: Exponential
32.
- The growth rate of a function is most important when N is sufficiently large.
- When input sizes are small, it is best to use the simplest algorithm.
- Quadratic algorithms are impractical if the input size is more than a few thousand.
- Cubic algorithms are impractical if the input size is more than a few hundred.
33. 3 Problems to Analyze
- Minimum Element in an Array
- Closest Points in the Plane
- Collinear Points in the Plane
34. Minimum Element in an Array
- Given an array of N items, find the smallest item.
- Obvious Solution (sketched below):
  - Maintain a variable min that stores the minimum element
  - Initialize min to the first element
  - Make a sequential scan through the array and update min as appropriate
- Running Time?
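A minimal Java sketch of the obvious solution just described (the method name findMin is illustrative; assumes a non-empty array):

    public static int findMin( int [ ] a )
    {
        int min = a[ 0 ];                    // initialize min to the first element
        for( int i = 1; i < a.length; i++ )  // one sequential scan
            if( a[ i ] < min )
                min = a[ i ];                // update min as appropriate
        return min;
    }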
35. Closest Points in the Plane
- Given N points in a plane, find the pair of points that are closest together.
- Obvious Solution (sketched below):
  - Calculate the distance between each pair of points
  - Retain the minimum distance
- Running Time?
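A minimal brute-force sketch of the obvious solution (representing the points as parallel coordinate arrays, and the name closestPairDistance, are my choices, not from the slides):

    public static double closestPairDistance( double [ ] x, double [ ] y )
    {
        double best = Double.MAX_VALUE;
        // compare all N(N-1)/2 pairs, retaining the minimum distance
        for( int i = 0; i < x.length; i++ )
            for( int j = i + 1; j < x.length; j++ )
            {
                double dx = x[ i ] - x[ j ];
                double dy = y[ i ] - y[ j ];
                double dist = Math.sqrt( dx * dx + dy * dy );
                if( dist < best )
                    best = dist;
            }
        return best;
    }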
36. Collinear Points in the Plane
- Given N points in a plane, determine if any three form a straight line.
- Obvious Solution (sketched below):
  - Enumerate all groups of 3 points
- Running Time?
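A minimal brute-force sketch; the slides only say to enumerate all triples, so the cross-product collinearity test used here is my assumption (it avoids floating-point slope comparisons):

    public static boolean hasCollinearTriple( long [ ] x, long [ ] y )
    {
        // enumerate all N(N-1)(N-2)/6 groups of 3 points
        for( int i = 0; i < x.length; i++ )
            for( int j = i + 1; j < x.length; j++ )
                for( int k = j + 1; k < x.length; k++ )
                    // three points are collinear exactly when the cross
                    // product of (pj - pi) and (pk - pi) is zero
                    if( ( x[ j ] - x[ i ] ) * ( y[ k ] - y[ i ] )
                            == ( x[ k ] - x[ i ] ) * ( y[ j ] - y[ i ] ) )
                        return true;
        return false;
    }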
37. Maximum Contiguous Subsequence Sum Problem
- Given (possibly negative) integers A1, A2, ..., AN, find (and identify the sequence corresponding to) the maximum value of the sum of elements i through j of the list, i.e. the maximum of Ai + ... + Aj. The maximum contiguous subsequence sum is zero if all the integers are negative.
38. Maximum Contiguous Subsequence Sum Problem
- Example:
- Given -2, 11, -4, 13, -5, 2, the answer is 20: the contiguous subsequence from items 2 through 4 (11 - 4 + 13).
39. Maximum Contiguous Subsequence Sum Problem
- Designing a Solution:
  - Consider emptiness
  - Obvious Solution (aka Brute Force)
  - Can we improve it? (must be a little clever)
  - Can we further improve it? (must be really clever and/or experienced!)
40. Maximum Contiguous Subsequence Sum Problem
- Obvious Solution: O(N^3)
- A direct and exhaustive search (Brute Force Approach)
  - Pro: Extreme simplicity; easy to program
  - Con: Least efficient method
41.

    /**
     * Cubic maximum contiguous subsequence sum algorithm.
     * seqStart and seqEnd represent the actual best sequence.
     */
    public static int maxSubSum1( int [ ] a )
    {
        int maxSum = 0;
        for( int i = 0; i < a.length; i++ )
            for( int j = i; j < a.length; j++ )
            {
                int thisSum = 0;
                for( int k = i; k <= j; k++ )
                    thisSum += a[ k ];
                if( thisSum > maxSum )
                {
                    maxSum = thisSum;
                    seqStart = i;   // seqStart and seqEnd are class-level fields
                    seqEnd = j;
                }
            }
        return maxSum;
    }
42. Maximum Contiguous Subsequence Sum Problem
- To analyze the algorithm, you basically count the number of times each statement is executed and then pick the dominant one. In our case, the statement inside the 3rd for loop is executed a little less than N^3 times, making it the dominant term.
43. Maximum Contiguous Subsequence Sum Problem
- SHORTCUT:
- We see a loop of potentially size N inside a loop of potentially size N inside a loop of potentially size N: N * N * N potential iterations!
- Generally, this cost calculation is off by a constant factor (which gets removed by Big-Oh notation anyway), so we can get away with it.
44. Maximum Contiguous Subsequence Sum Problem
- Since our cubic algorithm seems to be the result of statements inside of loops, can we lower the running time by removing a loop? Are they all necessary?
- In some cases, we can't remove a loop. In this one, we can (see the next slide).
45. Maximum Contiguous Subsequence Sum Problem
- Let's observe that we calculate the contiguous subsequence sum as we go; we don't need to reinvent the wheel each time (as our algorithm does), we only need to add one additional number to what we just calculated. Programming that perspective on the algorithm removes one of the loops and gives us an O(N^2) algorithm.
46.

    /**
     * Quadratic maximum contiguous subsequence sum algorithm.
     * seqStart and seqEnd represent the actual best sequence.
     */
    public static int maxSubSum2( int [ ] a )
    {
        int maxSum = 0;
        for( int i = 0; i < a.length; i++ )
        {
            int thisSum = 0;
            for( int j = i; j < a.length; j++ )
            {
                thisSum += a[ j ];
                if( thisSum > maxSum )
                {
                    maxSum = thisSum;
                    seqStart = i;   // seqStart and seqEnd are class-level fields
                    seqEnd = j;
                }
            }
        }
        return maxSum;
    }
47. Maximum Contiguous Subsequence Sum Problem
- Can we do better?
- Can we remove yet another loop?
- We need a clever observation that allows us to eliminate some subsequences from consideration without calculating their sums. Can we do that?
48. Maximum Contiguous Subsequence Sum Problem
- Intuitively, if a subsequence's sum is negative, it can't be the beginning of the maximum contiguous subsequence.
- All contiguous subsequences that border the maximum contiguous subsequence must have negative or 0 sums, or they would be included.
- When a negative subsequence is found, we can not only break the inner loop, we can advance i to j+1 (Proof: Theorem 5.3, p. 175)
49. Maximum Contiguous Subsequence Sum Problem
- With these observations, we can find the solution to this problem in linear time: O(N).
50.

    /**
     * Linear-time maximum contiguous subsequence sum algorithm.
     * seqStart and seqEnd represent the actual best sequence.
     */
    public static int maxSubSum3( int [ ] a )
    {
        int maxSum = 0;
        int thisSum = 0;
        for( int i = 0, j = 0; j < a.length; j++ )
        {
            thisSum += a[ j ];
            if( thisSum > maxSum )
            {
                maxSum = thisSum;
                seqStart = i;   // seqStart and seqEnd are class-level fields
                seqEnd = j;
            }
            else if( thisSum < 0 )
            {
                // negative subsequence detected: advance i past j and restart
                i = j + 1;
                thisSum = 0;
            }
        }
        return maxSum;
    }
51. Big-Oh Rules
- Big-Oh: T(N) is O(F(N)) if there are positive constants c and N0 such that T(N) <= c*F(N) when N >= N0.
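- For example (a concrete check, not from the slide): T(N) = 2N + 1 is O(N), since taking c = 3 and N0 = 1 gives 2N + 1 <= 3N whenever N >= 1.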
52. Big-Oh Rules
- Big-Omega: T(N) is Omega(F(N)) if there are positive constants c and N0 such that T(N) >= c*F(N) when N >= N0.
53. Big-Oh Rules
- Big-Theta: T(N) is Theta(F(N)) if and only if T(N) is O(F(N)) and T(N) is Omega(F(N)).
54. Big-Oh Rules
- Little-oh: T(N) is o(F(N)) if and only if T(N) is O(F(N)) and T(N) is NOT Theta(F(N)).
56. Big-Oh Rules
- Including constants or low-order terms inside a Big-Oh is bad style (like O(2N) or O(N + 1)).
- Any analysis with a Big-Oh answer allows lots of shortcuts:
  - Throw away low-order terms
  - Throw away leading constants
  - Throw away relational symbols
57. Big-Oh Rules
- The running time of a loop is at most the running time of the statements inside the loop (including tests) times the number of iterations.
- The running time of statements inside of nested loops is the running time of the statements (including tests in the innermost loop) multiplied by the sizes of all the loops.
58. Big-Oh Rules
- Big-Oh guarantees ONLY an upper bound, not an exact asymptotic answer.
- So, using actual figures, how does the running time grow for each type of curve seen earlier?
- If an algorithm takes T(N) time to solve a problem of size N, how long does it take to solve a larger problem?
60. Big-Oh Extras
- So, are all cubic and quadratic algorithms useless? NO!
  - Sometimes, that's the best you can do.
  - When the amount of input is low, any algorithm will do.
  - Usually easier to program.
  - Good for testing.
61. Logarithms
- The exponent that indicates the power to which a number (the base) is raised to produce a given number.
- For any B, N > 0: logB N = K if B^K = N
- In computer science, when the base (B) is omitted, it defaults to 2. (Also written lg instead of log.)
- The base is unimportant to Big-Oh (it can be anything!)
62. Logarithms
- Important fact: logs grow slowly.
63. Logarithms
- How do we use logs in CSC?
  - How many bits are required to represent numbers?
  - Starting with X = 1, how many times should X be doubled before it is at least as large as N?
  - Starting with X = N, how many times should X be halved before it is smaller than or equal to 1?
64. Logarithms
- The number of bits necessary to represent numbers is logarithmic.
- The repeated-doubling principle holds that, starting at 1, we can repeatedly double only logarithmically many times until we reach N (as the sketch below illustrates).
- The repeated-halving principle holds that, starting at N, we can halve only logarithmically many times. This process is used to obtain logarithmic routines for searching.
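A tiny sketch of the repeated-doubling principle (the method name is mine; it simply counts doublings, and that count is ceil(log2 N)):

    public static int doublingsToReach( long n )
    {
        int count = 0;
        for( long x = 1; x < n; x *= 2 )  // double until we reach at least n
            count++;
        return count;                     // e.g. doublingsToReach( 1000000 ) is 20
    }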
65. Static Searching Problem
- Given an integer X and an array A, return the position of X in A or an indication that it is not present. If X occurs more than once, return any occurrence. The array A is never altered.
- EXAMPLE: Phone Book
  - Look up a name (easy)
  - Look up a phone number (difficult)
66. Static Searching Problem
- Sequential Search
- Binary Search
- Interpolation Search
67. Sequential Search
- A linear searching technique that steps through an array sequentially until a match is found (a minimal sketch follows this list).
- Analysis:
  - Provide the cost of an unsuccessful search
  - Provide the cost of a worst-case successful search
  - Find the average cost of a successful search.
68. Sequential Search Analysis
- Provide the cost of an unsuccessful search:
  - Typically more time consuming than successful searches.
  - Requires the examination of every item in the array, so the time would be O(N).
69. Sequential Search Analysis
- Provide the cost of a worst-case successful search:
  - We may not find the item until we check the last element of the array.
  - Requires the examination of every item in the array, so the time would be O(N).
70. Sequential Search Analysis
- Find the average cost of a successful search:
  - On average, we only search half of the array. This would be N/2.
  - What is the Big-Oh?
71. Binary Search
- Requires that the input array be sorted.
- Works from the middle of the array instead of either end (a minimal sketch follows this list).
- Analysis:
  - Provide the cost of an unsuccessful search
  - Provide the cost of a worst-case successful search
  - Find the average cost of a successful search.
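A minimal Java sketch of binary search over a sorted array (returning -1 for "not present" is my convention):

    public static int binarySearch( int [ ] a, int x )
    {
        int low = 0, high = a.length - 1;
        while( low <= high )
        {
            int mid = ( low + high ) / 2;  // probe the middle of the range
            if( a[ mid ] < x )
                low = mid + 1;             // discard the lower half
            else if( a[ mid ] > x )
                high = mid - 1;            // discard the upper half
            else
                return mid;                // found
        }
        return -1;                         // not present
    }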
72. Binary Search Analysis
- Provide the cost of an unsuccessful search:
  - We halve the range in each iteration, rounding down if the range has an odd number of elements.
  - The number of iterations is floor(log N) + 1.
73. Binary Search Analysis
- Provide the cost of a worst-case successful search:
  - We halve the range in each iteration, rounding down if the range has an odd number of elements.
  - The number of iterations is floor(log N).
74. Binary Search Analysis
- Find the average cost of a successful search:
  - Only one iteration better than the worst-case analysis.
- BOTTOM LINE: O(log N)
75. Interpolation Search
- Requires that:
  - Each access must be very expensive compared to a typical instruction. (Ex: the array is on a disk instead of in memory, so that each comparison requires a disk access)
  - The data must be sorted AND fairly uniformly distributed.
- Requires that we spend more time making an accurate guess as to where the item might be. Binary search uses the midpoint; however, interpolation search recognizes that we may want to start more toward one end or the other of the array (a sketch of the probe computation follows).
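A minimal sketch of how such a probe might be computed; the slides don't give the exact expression, so the proportional-guess formula here is the standard one and an assumption on my part:

    public static int interpolationSearch( int [ ] a, int x )
    {
        int low = 0, high = a.length - 1;
        while( low <= high && x >= a[ low ] && x <= a[ high ] )
        {
            if( a[ high ] == a[ low ] )    // flat range: avoid dividing by zero
                return a[ low ] == x ? low : -1;
            // guess a position proportional to where x lies between
            // a[low] and a[high], rather than always taking the midpoint
            int probe = low + (int)( (long)( x - a[ low ] ) * ( high - low )
                    / ( a[ high ] - a[ low ] ) );
            if( a[ probe ] == x )
                return probe;
            else if( a[ probe ] < x )
                low = probe + 1;
            else
                high = probe - 1;
        }
        return -1;
    }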
76. Interpolation Search Analysis
- Provide the cost of a worst-case successful search:
  - If the data is not uniformly distributed, it could be linear, and every item might need to be examined.
- Find the average cost of a successful search:
  - Has been shown to be O(log log N).
77. Limitations of Big-Oh
- Not appropriate for small inputs
- Constants may be too large to be practical.
- Can be an overestimate
- While the average case may be a better judge of algorithm effectiveness, it is more difficult to find than Big-Oh, so we usually use Big-Oh despite the limitations.