1
CSC 332 Algorithms and Data Structures
  • Analysis

Dr. Paige H. Meeker, Computer Science, Presbyterian
College, Clinton, SC
2
Why do we care?
  • Every 18-24 months, manufacturers introduce
    faster machines with larger memories. Why, then,
    do we need to write efficient code?

3
Well, just suppose
  • Imagine we are defining a Java class Huge to
    represent long integers.
  • We want methods for our class that
  • add two large integers (two instances of our
    class)
  • multiply two large integers.
  • Suppose we have successfully implemented the
    add() method and are moving on to multiply().

4
Well, just suppose
  • Suppose we have successfully implemented the
    add() method and are moving on to multiply().
    Object-oriented coding is all about reusing what
    you already have, right? So, since multiplication
    is equivalent to repeated addition, to compute
    the product 7562 × 463 we could initialize a
    variable to 0 and then add 7562 to it 463 times.
    Why not? add() works, right?

5
Efficiency Example
  public class BigOEx {
      public static void main (String[] args) {
          long firstOp = 7562;
          long secondOp = 463;
          long product = 0;
          for (long i = secondOp; i > 0; i--)
              product += firstOp;
          System.out.println("Product of " + firstOp +
              " and " + secondOp + " = " + product);
      }
  }

6
Efficiency Example
  • If we run the previous code, we should get a
    result in a reasonable amount of time. So, let's
    run it, but replace the 463 with 100000000 (8
    0s). Will we still get a result in a reasonable
    time? How about with 1000000000 (9 0s)? Is
    something wrong?

7
Efficiency Example
  • Why does the code take so long?
  • Can we do better?

8
Efficiency Example
  • How might we rethink our code to produce a more
    efficient result?

9
Efficiency Example
  • Consider the product of the original numbers
    (463 × 7562). The secondOp has 3 digits: a 100s
    digit, a 10s digit, and a 1s digit. (463 = 400 +
    60 + 3) So:
  • 7562 × 463 = 7562 × (400 + 60 + 3)
  • = 7562 × 400 + 7562 × 60 + 7562 × 3
  • = 756200 × 4 + 75620 × 6 + 7562 × 3
  • = 756200 + 756200 + 756200 + 756200 + 75620 +
    75620 + 75620 + 75620 + 75620 + 75620 + 7562 +
    7562 + 7562

10
Efficiency Example
  public class BetterBigOEx {
      public static void main (String[] args) {
          long firstOrig, secondOrig;
          long firstOp = firstOrig = 7562;
          long secondOp = secondOrig = 1000000000;
          int secOpLength = 10;
          long product = 0;
          for (int digitPosition = 0;
               digitPosition < secOpLength; digitPosition++) {
              int digit = (int)(secondOp - (secondOp / 10) * 10);
              for (int counter = digit; counter > 0; counter--)
                  product += firstOp;
              secondOp = secondOp / 10; // discard last digit
              firstOp = 10 * firstOp;   // tack a 0 to the right
          }
          System.out.println("Product of " + firstOrig +
              " and " + secondOrig + " = " + product);
      }
  }

11
Efficiency Example
  • Does efficiency matter?
  • How do we measure it?
  • Create 2 programs and measure the difference (not
    always possible)
  • Measure algorithm before implementation

12
Efficiency Calculation
  • Let's say you need to get downtown. You can
    walk, drive, ask a friend to take you, or take a
    bus. What's the best way?

13
Efficiency Measurement
  • Algorithms have measurable time and space
    requirements called complexity.
  • We are not considering how difficult it is to
    code, but rather the time it takes to execute and
    the memory it will need.

14
Analysis of Algorithms
  • Usually measure time complexity
  • Compute approximation, not actual time
  • Typically estimate the WORST (maximum) time the
    algorithm could take. Why?
  • Measurements also exist for best and average
    cases, but generally you look for the worst-case
    analysis.

15
Analysis of Algorithms
  • How do we compute worst case time?
  • PROBLEM: Compute the sum 1 + 2 + ... + n for
    some positive integer n.
  • Think about possible ways to solve this problem
    (then look at the next 3 slides for suggestions!)

16
Analysis of Algorithms
  • AlgorithmA computes 0 + 1 + 2 + ... + n from
    left to right:
  • sum = 0
  • for (i = 1 to n)
  •     sum += i

17
Analysis of Algorithms
  • AlgorithmB computes 0 + 1 + (1+1) +
    (1+1+1) + ... + (1+1+...+1):
  • sum = 0
  • for (i = 1 to n)
  •     for (j = 1 to i)
  •         sum++

18
Analysis of Algorithms
  • AlgorithmC uses an algebraic identity to
    compute the sum:
  • sum = n * (n + 1) / 2
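  • To make the comparison concrete, here is a
    minimal runnable Java sketch of all three
    algorithms (class and method names are
    illustrative, not from the original slides):

    public class SumAlgorithms {
        // AlgorithmA: adds 1..n from left to right
        static long algorithmA(int n) {
            long sum = 0;
            for (int i = 1; i <= n; i++)
                sum += i;
            return sum;
        }

        // AlgorithmB: builds each term by repeated increments
        static long algorithmB(int n) {
            long sum = 0;
            for (int i = 1; i <= n; i++)
                for (int j = 1; j <= i; j++)
                    sum++;
            return sum;
        }

        // AlgorithmC: the closed-form identity
        static long algorithmC(int n) {
            return (long) n * (n + 1) / 2;
        }

        public static void main(String[] args) {
            int n = 10000;
            System.out.println(algorithmA(n)); // 50005000
            System.out.println(algorithmB(n)); // 50005000
            System.out.println(algorithmC(n)); // 50005000
        }
    }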

19
Analysis of Algorithms
  • How do we determine which algorithm (A, B, or C)
    is fastest?
  • Consider the size of the problem and the effort
    involved. (Measure problem size using n)
  • Find an appropriate growth-rate function.
  • Count the number of operations required by the
    algorithm

20
Analysis of Algorithms
  • AlgorithmA: n+1 assignments, n additions, 0
    multiplications, and 0 divisions, for TOTAL OPS =
    2n+1
  • AlgorithmB: 1 + n(n+1)/2 assignments, n(n+1)/2
    additions, 0 multiplications, and 0 divisions,
    for TOTAL OPS = n² + n + 1
  • AlgorithmC: 1 assignment, 1 addition, 1
    multiplication, and 1 division, for TOTAL OPS = 4
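  • The n(n+1)/2 count for AlgorithmB follows from
    the standard identity for the sum of the first n
    integers (the inner statement runs i times on
    the i-th outer pass):

    \sum_{i=1}^{n} i = \frac{n(n+1)}{2}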

21
Analysis of Algorithms
  • So,
  • AlgorithmA's growth-rate function is 2n+1 time
    units
  • AlgorithmB's is n² + n + 1 time units
  • AlgorithmC's is constant.
  • Speed-wise, C is fastest, followed by A and then
    by B.

22
Analysis of Algorithms
  • The running time of an algorithm is a function of
    the size of the input. More data means that the
    program takes more time.

23
Analysis of Algorithms
  • How do we express this, then, in proper notation?
  • First rule: focus on the large instances of the
    problem; that is, consider only the dominant term
    in each growth-rate function. (Here, n²)
  • The difference between n² + n + 1 and n² is
    relatively small for large n, and so we can use
    the term with the largest exponent to describe
    the growth rate.

24
Big Oh Notation
  • Computer Scientists use different notations to
    represent best, average, and worst case analysis.
    Big-Oh represents worst case. So
  • AlgorithmA is O(n)
  • AlgorithmB is O(n²)
  • AlgorithmC is O(1) (aka constant time)

25
Real Life Examples
  • You are seated at a wedding reception with a
    table of n people. In preparation for a toast,
    the waiter pours champagne into n glasses. What
    is the time complexity?
  • Someone makes a toast. What is the time
    complexity?
  • Everyone clinks glasses with everyone else?

26
Designing Efficient Algorithms
  • Generally, we want to process a large amount of
    data
  • We want to design an algorithm (step by step
    instructions) that will use the resources
    (memory, space, speed) of the computer well.

27
  • As we previously mentioned, the amount of time
    taken is our usual measure, and it is determined
    by the amount of input; so the running time of
    the algorithm is given as a function of its
    input size.

28
(No Transcript)
29
(No Transcript)
30
Questions to Ask
  • Is it always important to be on the most
    efficient curve?
  • How much better is one curve than another?
  • How do you decide which curve a particular
    algorithm lies on?
  • How do you design algorithms that avoid being on
    less efficient curves?

31
Functions in Order of Increasing Growth Rate
  • Function    Name
  • c           Constant
  • log N       Logarithmic
  • log² N      Log-squared
  • N           Linear
  • N log N     N log N
  • N²          Quadratic
  • N³          Cubic
  • 2^N         Exponential

32
  • The growth rate of a function is most important
    when N is sufficiently large.
  • When input sizes are small, it is best to use
    the simplest algorithm.
  • Quadratic algorithms are impractical if the
    input size exceeds a few thousand.
  • Cubic algorithms are impractical if the input
    size exceeds a few hundred.

33
3 Problems to Analyze
  • Minimum Element in an Array
  • Closest Points in the Plane
  • Collinear Points in the Plane

34
Minimum Element in an Array
  • Given an array of N items, find the smallest
    item.
  • Obvious Solution
  • Maintain a variable min that stores the minimum
    element
  • Initialize min to the first element
  • Make a sequential scan through the array and
    update min as appropriate
  • Running Time?
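  • A minimal Java sketch of this scan (the method
    name is illustrative):

    // Sequential scan for the smallest item.
    public static int findMin(int[] a) {
        int min = a[0];              // initialize min to the first element
        for (int i = 1; i < a.length; i++)
            if (a[i] < min)          // update min as appropriate
                min = a[i];
        return min;
    }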

35
Closest Points in the Plane
  • Given N points in a plane, find the pair of
    points that are closest together
  • Obvious Solution
  • Calculate the distance between each pair of
    points
  • Retain the minimum distance
  • Running Time?
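  • A minimal Java sketch of the brute-force pair
    scan (representing the points as parallel arrays
    x[] and y[] is an assumption of this sketch):

    // Examine every pair of points; retain the minimum distance.
    public static double closestPairDistance(double[] x, double[] y) {
        double best = Double.MAX_VALUE;
        for (int i = 0; i < x.length; i++)
            for (int j = i + 1; j < x.length; j++) {
                double dx = x[i] - x[j], dy = y[i] - y[j];
                double dist = Math.sqrt(dx * dx + dy * dy);
                if (dist < best)
                    best = dist;
            }
        return best;
    }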

36
Collinear Points in the Plane
  • Given N points in a plane, determine if any three
    form a straight line.
  • Obvious Solution
  • Enumerate all groups of 3 points
  • Running Time?
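  • A minimal Java sketch of the triple enumeration
    (the cross-product test is a standard way to
    check collinearity, chosen for this sketch):

    // Enumerate all groups of 3 points. Points (x1,y1), (x2,y2),
    // (x3,y3) are collinear iff (x2-x1)(y3-y1) == (x3-x1)(y2-y1).
    public static boolean hasCollinearTriple(long[] x, long[] y) {
        int n = x.length;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if ((x[j] - x[i]) * (y[k] - y[i]) ==
                        (x[k] - x[i]) * (y[j] - y[i]))
                        return true;
        return false;
    }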

37
Maximum Contiguous Subsequence Sum Problem
  • Given (possibly negative) integers A1, A2, ...,
    AN, find (and identify the sequence corresponding
    to) the maximum value of the sum of elements i
    through j of the list. The maximum contiguous
    subsequence sum is zero if all the integers are
    negative.

38
Maximum Contiguous Subsequence Sum Problem
  • Example
  • Given -2, 11, -4, 13, -5, 2, the answer is 20,
    the contiguous subsequence from items 2 through 4.

39
Maximum Contiguous Subsequence Sum Problem
  • Designing a Solution
  • Consider emptiness
  • Obvious Solution (aka Brute Force)
  • Can we improve it? (must be a little clever)
  • Can we further improve it? (must be really clever
    and/or experienced!)

40
Maximum Contiguous Subsequence Sum Problem
  • Obvious Solution: O(N³)
  • A direct and exhaustive search (Brute-Force
    Approach)
  • Pro: extreme simplicity; easy to program
  • Con: least efficient method

41
  /**
   * Cubic maximum contiguous subsequence sum algorithm.
   * seqStart and seqEnd (class-level fields) represent
   * the actual best sequence.
   */
  public static int maxSubSum1( int [ ] a )
  {
      int maxSum = 0;

      for( int i = 0; i < a.length; i++ )
          for( int j = i; j < a.length; j++ )
          {
              int thisSum = 0;
              for( int k = i; k <= j; k++ )
                  thisSum += a[ k ];

              if( thisSum > maxSum )
              {
                  maxSum = thisSum;
                  seqStart = i;
                  seqEnd = j;
              }
          }

      return maxSum;
  }

42
Maximum Contiguous Subsequence Sum Problem
  • To analyze the algorithm, you basically count
    the number of times each statement is executed
    and then pick the dominant one. In our case, the
    statement inside the 3rd for loop is executed a
    little less than N³ times, making it the
    dominant term.

43
Maximum Contiguous Subsequence Sum Problem
  • SHORTCUT
  • We see a loop of potentially size N inside a
    loop of potentially size N inside a loop of
    potentially size N: N × N × N = N³ potential
    iterations!
  • Generally, this cost calculation is off by a
    constant factor (which gets removed by Big-Oh
    notation anyway), so we can get away with it.

44
Maximum Contiguous Subsequence Sum Problem
  • Since our cubic algorithm seems to be the result
    of statements inside of loops, can we lower the
    running time by removing a loop? Are they all
    necessary?
  • In some cases, we can't remove the loop. In
    this one, we can.

45
Maximum Contiguous Subsequence Sum Problem
  • Let's observe that we calculate the contiguous
    subsequence sum as we go; we don't need to
    reinvent the wheel each time (as our algorithm
    does); we only need to add one additional number
    to what we just calculated. Programming that
    perspective on the algorithm removes one of the
    loops and gives us an O(N²) algorithm.

46
  /**
   * Quadratic maximum contiguous subsequence sum algorithm.
   * seqStart and seqEnd (class-level fields) represent
   * the actual best sequence.
   */
  public static int maxSubSum2( int [ ] a )
  {
      int maxSum = 0;

      for( int i = 0; i < a.length; i++ )
      {
          int thisSum = 0;
          for( int j = i; j < a.length; j++ )
          {
              thisSum += a[ j ];

              if( thisSum > maxSum )
              {
                  maxSum = thisSum;
                  seqStart = i;
                  seqEnd = j;
              }
          }
      }

      return maxSum;
  }

47
Maximum Contiguous Subsequence Sum Problem
  • Can we do better?
  • Can we remove yet another loop?
  • We need a clever observation that allows us to
    eliminate some subsequences from consideration
    without calculating their sums. Can we do that?

48
Maximum Contiguous Subsequence Sum Problem
  • Intuitively, if a subsequence's sum is negative,
    it can't be part of the maximum contiguous
    subsequence.
  • All contiguous subsequences that border the
    maximum contiguous subsequence must have negative
    or 0 sums, or they would be included.
  • When a negative subsequence is found, we can not
    only break the inner loop, we can advance i to
    j+1. (Proof: Th 5.3, p. 175)

49
Maximum Contiguous Subsequence Sum Problem
  • With these observations, we can find the solution
    to this problem in linear time - O(N).

50
  /**
   * Linear-time maximum contiguous subsequence sum algorithm.
   * seqStart and seqEnd (class-level fields) represent
   * the actual best sequence.
   */
  public static int maxSubSum3( int [ ] a )
  {
      int maxSum = 0;
      int thisSum = 0;

      for( int i = 0, j = 0; j < a.length; j++ )
      {
          thisSum += a[ j ];

          if( thisSum > maxSum )
          {
              maxSum = thisSum;
              seqStart = i;
              seqEnd = j;
          }
          else if( thisSum < 0 )
          {
              // negative subsequence: advance i to j+1 (Th 5.3)
              i = j + 1;
              thisSum = 0;
          }
      }

      return maxSum;
  }

51
Big-Oh Rules
  • Big-Oh: T(N) is O(F(N)) if there are positive
    constants c and N0 such that T(N) ≤ cF(N) when
    N ≥ N0.

52
Big-Oh Rules
  • Big-Omega: T(N) is Ω(F(N)) if there are
    positive constants c and N0 such that T(N) ≥
    cF(N) when N ≥ N0.

53
Big-Oh Rules
  • Big-Theta: T(N) is Θ(F(N)) if and only if
    T(N) is O(F(N)) and T(N) is Ω(F(N)).

54
Big-Oh Rules
  • Little-oh: T(N) is o(F(N)) if and only if T(N)
    is O(F(N)) and T(N) is NOT Θ(F(N)).
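  • A quick worked instance of the Big-Oh definition
    (an illustration, not from the original slides):
    T(N) = 3N² + 2N is O(N²), taking c = 5 and
    N0 = 1, since

    3N^2 + 2N \le 3N^2 + 2N^2 = 5N^2 \quad \text{for all } N \ge 1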

55
(No Transcript)
56
Big-Oh Rules
  • Including constants or low-order terms inside a
    Big-Oh is bad style (like O(2N) or O(N+1)).
  • Any analysis with a Big-Oh answer allows lots of
    shortcuts:
  • Throw away low-order terms
  • Throw away leading constants
  • Throw away relational symbols

57
Big-Oh Rules
  • Running time of a loop is at most the running
    time of the statements inside the loop (including
    tests) times the number of iterations.
  • Running time of statements inside of nested loops
    is the running time of the statements (including
    tests in the innermost loop) multiplied by the
    sizes of all the loops.

58
Big-Oh Rules
  • Guarantees ONLY an upper bound, not an exact
    asymptotic answer.
  • So, using actual figures, how does the running
    time grow for each type of curve seen earlier?
  • If an algorithm takes T(N) time to solve a
    problem of size N, how long does it take to solve
    a larger problem?
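  • As a quick worked example (an illustration, not
    on the original slides): for a quadratic
    algorithm with T(N) = cN², doubling the input
    size quadruples the running time:

    T(2N) = c(2N)^2 = 4cN^2 = 4\,T(N)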

59
(No Transcript)
60
Big-Oh Extras
  • So, are all cubic and quadratic algorithms
    useless? NO!
  • Sometimes, that's the best you can do.
  • When the amount of input is low, any algorithm
    will do.
  • Usually easier to program.
  • Good for testing

61
Logarithms
  • The exponent that indicates the power to which a
    number (the base) is raised to produce a given
    number.
  • For any B, N > 0:
  • logB N = K if B^K = N
  • In computer science, when the base (B) is
    omitted, it defaults to 2. (Also written lg
    instead of log.)
  • The base is unimportant to Big-Oh (it can be
    anything!)

62
Logarithms
  • Important fact: logs grow slowly.

63
Logarithms
  • How do we use logs in CSC?
  • How many bits are required to represent numbers?
  • Starting with X = 1, how many times should X be
    doubled before it is at least as large as N?
  • Starting with X = N, how many times should X be
    halved before it is smaller than or equal to 1?

64
Logarithms
  • The number of bits necessary to represent numbers
    is logarithmic
  • The repeated doubling principle holds that,
    starting at 1, we can repeatedly double only
    logarithmically many times until we reach N.
  • The repeated halving principle holds that,
    starting at N, we can halve only logarithmically
    many times. This process is used to obtain
    logarithmic routines for searching.
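  • A tiny Java sketch of the repeated-halving
    principle (the method name is illustrative):

    // Count how many times N can be halved before reaching 1:
    // about log2(N) times, the same count binary search performs.
    public static int halvings(int n) {
        int count = 0;
        while (n > 1) {
            n /= 2;       // halve, rounding down
            count++;
        }
        return count;     // halvings(1000000) == 19; log2(1000000) ≈ 19.9
    }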

65
Static Searching Problem
  • Given an integer X and an array A, return the
    position of X in A or an indication that it is
    not present. If X occurs more than once, return
    any occurrence. The array A is never altered.
  • EXAMPLE: Phone Book
  • Look up a name (easy)
  • Look up a phone number (difficult)

66
Static Searching Problem
  • Sequential Search
  • Binary Search
  • Interpolation Search

67
Sequential Search
  • A linear searching technique that steps through
    an array sequentially until a match is found.
  • Analysis
  • Provide the cost of an unsuccessful search
  • Provide the cost of a worst-case successful
    search
  • Find the average cost of a successful search.
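  • A minimal Java sketch of sequential search (the
    -1 "not found" convention is an assumption of
    this sketch):

    // Step through the array sequentially until a match is found.
    public static int sequentialSearch(int[] a, int x) {
        for (int i = 0; i < a.length; i++)
            if (a[i] == x)
                return i;   // successful search: position of x
        return -1;          // unsuccessful search: examined all N items
    }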

68
Sequential Search Analysis
  • Provide the cost of an unsuccessful search
  • Typically more time consuming than successful
    searches.
  • Requires the examination of every item in the
    array, so the time would be O(N).

69
Sequential Search Analysis
  • Provide the cost of a worst-case successful
    search
  • We may not find the item until we check the last
    element of the array.
  • Requires the examination of every item in the
    array, so the time would be O(N).

70
Sequential Search Analysis
  • Find the average cost of a successful search.
  • On average, we only search half of the array.
    This would be N/2.
  • What is Big-Oh?

71
Binary Search
  • Requires that the input array be sorted.
  • Works from the middle of the array instead of
    either end.
  • Analysis
  • Provide the cost of an unsuccessful search
  • Provide the cost of a worst-case successful
    search
  • Find the average cost of a successful search.
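  • A minimal iterative Java sketch of binary search
    (names are illustrative; the array must be
    sorted):

    // Halve the search range each iteration, working from the middle.
    public static int binarySearch(int[] a, int x) {
        int low = 0, high = a.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;  // midpoint (overflow-safe)
            if (a[mid] < x)
                low = mid + 1;                 // discard the lower half
            else if (a[mid] > x)
                high = mid - 1;                // discard the upper half
            else
                return mid;                    // successful search
        }
        return -1;                             // unsuccessful search
    }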

72
Binary Search Analysis
  • Provide the cost of an unsuccessful search
  • We halve the range in each iteration, rounding
    down if the range has an odd number of elements.
  • The number of iterations is floor(log N) + 1

73
Binary Search Analysis
  • Provide the cost of a worst-case successful
    search
  • We halve the range in each iteration, rounding
    down if the range has an odd number of elements.
  • The number of iterations is floor(log N)

74
Binary Search Analysis
  • Find the average cost of a successful search.
  • Only one iteration better than the worst-case
    analysis.
  • BOTTOM LINE O(log N)

75
Interpolation Search
  • Requires that
  • Each access must be very expensive compared to a
    typical instruction. (Ex: the array is on a disk
    instead of in memory, such that each comparison
    requires a disk access.)
  • The data must be sorted AND fairly uniformly
    distributed.
  • Requires that we spend more time making an
    accurate guess as to where the item might be.
    Binary search uses the midpoint; however,
    interpolation search recognizes that we may want
    to start more toward one end or the other of the
    array.
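  • A minimal Java sketch of the proportional guess
    (a standard textbook formulation, offered as an
    illustration; it does not appear on the original
    slides):

    // Probe where x would fall if values were evenly spread,
    // instead of always probing the midpoint.
    public static int interpolationSearch(int[] a, int x) {
        int low = 0, high = a.length - 1;
        while (low <= high && x >= a[low] && x <= a[high]) {
            if (a[high] == a[low])              // flat range: avoid divide-by-zero
                return (a[low] == x) ? low : -1;
            int next = low + (int) ((long) (x - a[low]) * (high - low)
                                    / (a[high] - a[low]));
            if (a[next] < x)       low = next + 1;
            else if (a[next] > x)  high = next - 1;
            else                   return next;  // successful search
        }
        return -1;                               // not present
    }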

76
Interpolation Search Analysis
  • Provide the cost of a worst-case successful
    search
  • If the data is not uniformly distributed, the
    search could be linear, and every item might
    need to be examined.
  • Find the average cost of a successful search.
  • It has been shown to be O(log log N).

77
Limitations of Big-Oh
  • Not appropriate for small inputs
  • Constants may be too large to be practical.
  • Can be an overestimate
  • While average case may be a better judge of
    algorithm effectiveness, it is more difficult to
    find than Big-Oh, so we usually use Big-Oh
    despite the limitations.