CSE 326: Data Structures Introduction


1
CSE 326: Data Structures Introduction
Part One: Complexity
  • Henry Kautz
  • Autumn Quarter 2002

2
Overview of the Quarter
  • Part One: Complexity
  • inductive proofs of program correctness
  • empirical and asymptotic complexity
  • order of magnitude notation, logs, and series
  • analyzing recursive programs
  • Part Two: List-like data structures
  • Part Three: Sorting
  • Part Four: Search Trees
  • Part Five: Hash Tables
  • Part Six: Heaps and Union/Find
  • Part Seven: Graph Algorithms
  • Part Eight: Advanced Topics

3
Material for Part One
  • Weiss Chapters 1 and 2
  • Additional material
  • Graphical analysis
  • Amortized analysis
  • Stretchy arrays
  • Any questions on course organization?

4
Program Analysis
  • Correctness
  • Testing
  • Proofs of correctness
  • Efficiency
  • How to define?
  • Asymptotic complexity - how running time scales
    as a function of the size of the input

5
Proving Programs Correct
  • Often takes the form of an inductive proof
  • Example: summing an array

int sum(int v[], int n)
{
    if (n == 0) return 0;
    else return v[n-1] + sum(v, n-1);
}

What are the parts of an inductive proof?
6
Inductive Proof of Correctness
int sum(int v[], int n)
{
    if (n == 0) return 0;
    else return v[n-1] + sum(v, n-1);
}

Theorem: sum(v,n) correctly returns the sum of the first n elements of array v for any n.
Basis Step: Program is correct for n = 0: returns 0. ✓
Inductive Hypothesis (n = k): Assume sum(v,k) returns the sum of the first k elements of v.
Inductive Step (n = k+1): sum(v,k+1) returns v[k] + sum(v,k), which is the sum of the first k+1 elements of v. ✓
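The proof above can be mirrored in code. The following is a minimal sketch, assuming a `const`-qualified array parameter and an added precondition check; `sum_iterative` is a hypothetical helper used only to cross-check the recursive version and is not from the slides.

```c
#include <assert.h>

/* Recursive sum of the first n elements of v, following the slide.
   Basis step: n == 0 returns 0.
   Inductive step: v[n-1] plus the sum of the first n-1 elements. */
int sum(const int v[], int n)
{
    assert(n >= 0);                 /* precondition assumed by the proof */
    if (n == 0)
        return 0;
    return v[n - 1] + sum(v, n - 1);
}

/* Iterative reference (hypothetical helper) for cross-checking. */
int sum_iterative(const int v[], int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Calling `sum(v, 0)` exercises the basis step alone, while any larger `n` exercises the inductive step once per element.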
7
Proof by Contradiction
  • Assume negation of goal, show this leads to a
    contradiction
  • Example: there is no program that solves the
    halting problem
  • Determines if any other program runs forever or
    not

Alan Turing, 1937
8
Program NonConformist (Program P)
    If ( HALT(P) = "never halts" ) Then
        Halt
    Else
        Do While (1 > 0)
            Print "Hello!"
        End While
    End If
End Program
  • Does NonConformist(NonConformist) halt?
  • Yes? That means HALT(NonConformist) = "never
    halts"
  • No? That means HALT(NonConformist) = "halts"

Contradiction!
9
Defining Efficiency
  • Asymptotic Complexity - how running time scales
    as function of size of input
  • Why is this a reasonable definition?

10
Defining Efficiency
  • Asymptotic Complexity - how running time scales
    as function of size of input
  • Why is this a reasonable definition?
  • Many kinds of small problems can be solved in
    practice by almost any approach
  • E.g., exhaustive enumeration of possible
    solutions
  • Want to focus efficiency concerns on larger
    problems
  • Definition is independent of any possible
    advances in computer technology

11
Technology-Dependent Efficiency
  • Drum computers: popular technology from the early
    1960s
  • Transistors were too costly to use for RAM, so memory
    was kept on a revolving magnetic drum
  • An efficient program scattered instructions on
    the drum so that the next instruction to execute was
    under the read head just when it was needed
  • Minimized the number of full revolutions of the drum
    during execution

12
The Apocalyptic Laptop
Speed ∝ energy consumption
E = mc^2 = 25 million megawatt-hours
Quantum mechanics: switching speed limited by h / (2 × Energy), where h is Planck's constant
5.4 × 10^50 operations per second
  • Seth Lloyd, SCIENCE, 31 Aug 2000

13
Big Bang
(Chart labels: 1000 MIPS, 1 day; 1000 MIPS, since Big Bang; Ultimate Laptop, 1 second; Ultimate Laptop, 1 year)
14
Defining Efficiency
  • Asymptotic Complexity - how running time scales
    as function of size of input
  • What is size?
  • Often length (in characters) of input
  • Sometimes value of input (if input is a number)
  • Which inputs?
  • Worst case
  • Advantages / disadvantages ?
  • Best case
  • Why?

15
Average Case Analysis
  • More realistic analysis, first attempt
  • Assume inputs are randomly distributed according
    to some realistic distribution ?
  • Compute expected running time
  • Drawbacks
  • Often hard to define realistic random
    distributions
  • Usually hard to perform math

16
Amortized Analysis
  • Instead of a single input, consider a sequence of
    inputs
  • Choose worst possible sequence
  • Determine average running time on this sequence
  • Advantages
  • Often less pessimistic than simple worst-case
    analysis
  • Guaranteed results - no assumed distribution
  • Usually mathematically easier than average case
    analysis

17
Comparing Runtimes
  • Program A is asymptotically less efficient than
    program B iff
  • the runtime of A dominates the runtime of B, as
    the size of the input goes to infinity
  • Note: RunTime can be worst case, best case,
    average case, or amortized case

18
Which Function Dominates?
n^3 + 2n^2            100n^2 + 1000
n^0.1                 log n
n + 100n^0.1          2n + 10 log n
5n^5                  n!
n^-15 2^n / 100       1000n^15
8^(2 log n)           3n^7 + 7n
19
Race I
n^3 + 2n^2   vs.   100n^2 + 1000
20
Race II
n^0.1   vs.   log n
21
Race III
n + 100n^0.1   vs.   2n + 10 log n
22
Race IV
5n^5   vs.   n!
23
Race V
n^-15 2^n / 100   vs.   1000n^15
24
Race VI
8^(2 log n)   vs.   3n^7 + 7n
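One way to run such a race numerically is to scan n upward and record where one function pulls ahead for good. The sketch below does this for Race I only; the names `growth_fn`, `race1_left`, `race1_right`, and `last_crossover` are hypothetical, and a finite scan can only suggest dominance, not prove it.

```c
#include <assert.h>

typedef double (*growth_fn)(double);

/* The two contestants from Race I. */
double race1_left(double n)  { return n * n * n + 2.0 * n * n; }
double race1_right(double n) { return 100.0 * n * n + 1000.0; }

/* Returns the smallest n in [1, limit] such that f(m) > g(m) for every
   m in [n, limit], or 0 if f is not ahead of g at limit. */
long last_crossover(growth_fn f, growth_fn g, long limit)
{
    long start = 0;
    for (long n = 1; n <= limit; n++) {
        if (f((double)n) > g((double)n)) {
            if (start == 0)
                start = n;      /* f has just taken the lead */
        } else {
            start = 0;          /* g caught up again; forget the lead */
        }
    }
    return start;
}
```

In this scan the cubic stays behind for small n because of the constant factor 100 on the quadratic, then leads for the rest of the range, which is the behavior the race is meant to illustrate.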
25
Order of Magnitude Notation (big O)
  • Asymptotic Complexity - how running time scales
    as function of size of input
  • We usually only care about order of magnitude of
    scaling
  • Why?

26
Order of Magnitude Notation (big O)
  • Asymptotic Complexity - how running time scales
    as function of size of input
  • We usually only care about order of magnitude of
    scaling
  • Why?
  • As we saw, some functions overwhelm other
    functions
  • So if running time is a sum of terms, can drop
    dominated terms
  • True constant factors depend on details of
    compiler and hardware
  • Might as well make constant factor 1

27
  • Eliminate low order terms
  • Eliminate constant coefficients

28
Common Names
  • Slowest Growth
  • constant: O(1)
  • logarithmic: O(log n)
  • linear: O(n)
  • log-linear: O(n log n)
  • quadratic: O(n^2)
  • exponential: O(c^n) (c is a constant > 1)
  • Fastest Growth
  • superlinear: O(n^c) (c is a constant > 1)
  • polynomial: O(n^c) (c is a constant > 0)

29
Summary
  • Proofs by induction and contradiction
  • Asymptotic complexity
  • Worst case, best case, average case, and
    amortized asymptotic complexity
  • Dominance of functions
  • Order of magnitude notation
  • Next
  • Part One: Complexity, continued
  • Read Chapters 1 and 2

30
Part One: Complexity, continued
  • Friday, October 4th, 2002

31
Determining the Complexity of an Algorithm
  • Empirical measurement
  • Formal analysis (i.e. proofs)
  • Question: what are the likely advantages and
    drawbacks of each approach?

32
Determining the Complexity of an Algorithm
  • Empirical measurement
  • Formal analysis (i.e. proofs)
  • Question: what are the likely advantages and
    drawbacks of each approach?
  • Empirical
  • pro: discover whether constant factors are significant
  • con: may be running on the wrong inputs
  • Formal
  • pro: no interference from implementation/hardware
    details
  • con: can make a mistake in a proof!

In theory, theory is the same as practice, but
not in practice.
33
Measuring Empirical Complexity: Linear vs. Binary Search
  • Find an item in a sorted array of length N
  • Binary search algorithm

                         Linear Search    Binary Search
Time to find one item
Time to find N items
34
My C Code
void bfind(int x, int a[], int n)
{
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

void lfind(int x, int a[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == x)
            return;
}

for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++) lfind(i, a, n);    /* or bfind */
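Wall-clock timings depend on the machine, so a machine-independent way to reproduce the experiment is to count element comparisons instead. The sketch below is an illustration under assumptions: the counter `comparisons` and the functions `lfind_counted`, `bfind_counted`, and `count_all` are hypothetical names, and `bfind_counted` adds an `n < 1` base case that the code above does not have.

```c
#include <assert.h>
#include <stdlib.h>

long comparisons;   /* incremented once per probe of the array */

/* Linear search as on the slide, instrumented. */
void lfind_counted(int x, const int a[], int n)
{
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (a[i] == x)
            return;
    }
}

/* Binary search as on the slide, instrumented, with an added base case
   (an assumption) so an absent key cannot recurse forever. */
void bfind_counted(int x, const int a[], int n)
{
    if (n < 1)
        return;
    int m = n / 2;
    comparisons++;
    if (x == a[m])
        return;
    if (x < a[m])
        bfind_counted(x, a, m);
    else
        bfind_counted(x, &a[m + 1], n - m - 1);
}

/* Fills a[i] = i, searches for every i, and returns the total number
   of probes; returns -1 if the array cannot be allocated. */
long count_all(int n, int use_binary)
{
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        a[i] = i;
    comparisons = 0;
    for (int i = 0; i < n; i++) {
        if (use_binary)
            bfind_counted(i, a, n);
        else
            lfind_counted(i, a, n);
    }
    free(a);
    return comparisons;
}
```

Plotting `count_all(n, 0)` and `count_all(n, 1)` against n for a range of sizes gives the same kind of curves as timing the two searches, without hardware noise.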
35
Graphical Analysis
36
Graphical Analysis
37
(No Transcript)
38
(No Transcript)
39
slope ≈ 2
slope ≈ 1
40
Property of Log/Log Plots
  • On a linear plot, a linear function is a straight
    line
  • On a log/log plot, any polynomial function is a
    straight line!
  • The slope Δy/Δx is the same as the exponent

(Figure labels: horizontal axis, vertical axis, slope)
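The straight-line property follows from taking logarithms of a single-term polynomial. A short derivation, assuming a running time of the form y = c x^k with c > 0 (the symbols c and k are chosen here for illustration):

```latex
\begin{align*}
y &= c\,x^{k} \\
\log y &= \log c + k \log x
\end{align*}
```

With log x on the horizontal axis and log y on the vertical axis, this is a line with slope k and intercept log c. For a polynomial with several terms, the highest-order term dominates for large x, so the plot approaches a line whose slope is the largest exponent.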
41
Why does O(n log n) look like a straight line?
slope ≈ 1
42
Summary
  • Empirical and formal analyses of runtime scaling
    are both important techniques in algorithm
    development
  • Large data sets may be required to gain an
    accurate empirical picture
  • Log/log plots provide a fast and simple visual
    tool for estimating the exponent of a polynomial
    function

43
Formal Asymptotic Analysis
  • In order to prove complexity results, we must
    make the notion of order of magnitude more
    precise
  • Asymptotic bounds on runtime
  • Upper bound
  • Lower bound

44
Definition of Order Notation
  • Upper bound: T(n) = O(f(n)) (Big-O)
  • There exist constants c and n0 such that
  • T(n) ≤ c f(n) for all n ≥ n0
  • Lower bound: T(n) = Ω(g(n)) (Omega)
  • There exist constants c and n0 such that
  • T(n) ≥ c g(n) for all n ≥ n0
  • Tight bound: T(n) = Θ(f(n)) (Theta)
  • When both hold:
  • T(n) = O(f(n))
  • T(n) = Ω(f(n))
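A candidate pair of constants for the upper-bound definition can be spot-checked numerically before attempting a proof. The sketch below uses a made-up running time T(n) = 3n^2 + 10n and bound f(n) = n^2; both functions, the threshold name `n0`, and `upper_bound_holds` are assumptions for illustration. A finite check like this is evidence, not a proof.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical running time and candidate bounding function. */
long long T(long long n) { return 3 * n * n + 10 * n; }
long long f(long long n) { return n * n; }

/* Checks T(n) <= c * f(n) for every n in [n0, limit]. */
bool upper_bound_holds(long long c, long long n0, long long limit)
{
    for (long long n = n0; n <= limit; n++)
        if (T(n) > c * f(n))
            return false;
    return true;
}
```

Here c = 4 works only from n0 = 10 onward, while the larger constant c = 13 already works from n0 = 1, which shows that more than one pair of constants can witness the same bound.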

45
Example Upper Bound
46
Using a Different Pair of Constants
47
Example Lower Bound
48
Conventions of Order Notation
49
Upper/Lower vs. Worst/Best
  • Worst case upper bound is f(n)
  • Guarantee that run time is no more than c f(n)
  • Best case upper bound is f(n)
  • If you are lucky, run time is no more than c f(n)
  • Worst case lower bound is g(n)
  • If you are unlucky, run time is at least c g(n)
  • Best case lower bound is g(n)
  • Guarantee that run time is at least c g(n)

50
Analyzing Code
  • primitive operations
  • consecutive statements
  • function calls
  • conditionals
  • loops
  • recursive functions

51
Conditionals
  • Conditional
  • if C then S1 else S2
  • Suppose you are doing an O( ) analysis?
  • Suppose you are doing an Ω( ) analysis?

52
Conditionals
  • Conditional
  • if C then S1 else S2
  • Suppose you are doing an O( ) analysis?
  • Time(C) + Max(Time(S1), Time(S2))
  • or Time(C) + Time(S1) + Time(S2)
  • or ...
  • Suppose you are doing an Ω( ) analysis?
  • Time(C) + Min(Time(S1), Time(S2))
  • or Time(C)
  • or ...

53
Nested Loops
for i = 1 to n do
    for j = 1 to n do
        sum = sum + 1

54
Nested Loops
for i = 1 to n do
    for j = 1 to n do
        sum = sum + 1

55
Nested Dependent Loops
for i = 1 to n do
    for j = i to n do
        sum = sum + 1

56
Nested Dependent Loops
for i = 1 to n do
    for j = i to n do
        sum = sum + 1

57
Summary
  • Formal definition of order of magnitude notation
  • Proving upper and lower asymptotic bounds on a
    function
  • Formal analysis of conditionals and simple loops
  • Next
  • Analyzing complex loops
  • Mathematical series
  • Analyzing recursive functions

58
Part One: Complexity, Continued
  • Monday October 7, 2002

59
Today's Material
  • Running time of nested dependent loops
  • Mathematical series
  • Formal analysis of linear search
  • Formal analysis of binary search
  • Solving recursive equations
  • Stretchy arrays and the Stack ADT
  • Amortized analysis

60
Nested Dependent Loops
for i = 1 to n do
    for j = i to n do
        sum = sum + 1

61
Nested Dependent Loops
for i = 1 to n do
    for j = i to n do
        sum = sum + 1

62
Arithmetic Series
  • Note that S(1) = 1, S(2) = 3, S(3) = 6, S(4) =
    10, ...
  • Hypothesis: S(N) = N(N+1)/2
  • Prove by induction
  • Base case: for N = 1, S(N) = 1(2)/2 = 1 ✓
  • Assume true for N = k
  • Suppose N = k+1.
  • S(k+1) = S(k) + (k+1) = k(k+1)/2 + (k+1)
    = (k+1)(k/2 + 1) = (k+1)(k+2)/2. ✓

63
Other Important Series
  • Sum of squares
  • Sum of exponents
  • Geometric series
  • Novel series
  • Reduce to known series, or prove inductively
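The formulas for these series did not survive in the transcript. The standard closed forms, which are assumed here to be the ones the slide intended, are:

```latex
\begin{align*}
\sum_{i=1}^{N} i^{2} &= \frac{N(N+1)(2N+1)}{6} \\
\sum_{i=1}^{N} i^{k} &\approx \frac{N^{k+1}}{|k+1|}
    \quad (k \neq -1,\ N \text{ large}) \\
\sum_{i=0}^{N} A^{i} &= \frac{A^{N+1}-1}{A-1} \quad (A \neq 1)
\end{align*}
```

Each exact form can be validated by the same style of induction used for the arithmetic series.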

64
Nested Dependent Loops
for i = 1 to n do
    for j = i to n do
        sum = sum + 1
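The two loop nests can be compared directly by counting how many times the innermost statement runs. A minimal sketch, with hypothetical function names `count_independent` and `count_dependent`:

```c
#include <assert.h>

/* Independent nested loops: the inner loop always runs n times. */
long count_independent(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        for (long j = 1; j <= n; j++)
            sum = sum + 1;
    return sum;
}

/* Dependent nested loops: the inner loop starts at i, so it runs
   n, n-1, ..., 1 times as i increases. */
long count_dependent(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        for (long j = i; j <= n; j++)
            sum = sum + 1;
    return sum;
}
```

The dependent version performs n + (n-1) + ... + 1 = n(n+1)/2 increments, about half as many as the independent version's n^2, so both are Θ(n^2).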

65
Linear Search Analysis
void lfind(int x, int a[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == x)
            return;
}
  • Best case, tight analysis
  • Worst case, tight analysis

66
Iterated Linear Search Analysis
for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++) lfind(i, a, n);
  • Easy worst-case upper-bound
  • Worst-case tight analysis

67
Iterated Linear Search Analysis
for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++) lfind(i, a, n);
  • Easy worst-case upper-bound
  • Worst-case tight analysis
  • Just multiplying worst case by n does not justify
    answer, since each time lfind is called i is
    specified
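The tight count can be worked out directly. Because a[i] = i, the call lfind(i, a, n) examines positions 0 through i and stops, which is i + 1 comparisons. Summing over all calls (a sketch of the calculation, using the arithmetic series):

```latex
\begin{align*}
\sum_{i=0}^{n-1} (i+1) \;=\; \sum_{j=1}^{n} j \;=\; \frac{n(n+1)}{2} \;=\; \Theta(n^{2})
\end{align*}
```

So the easy upper bound of n calls times n comparisons is off only by a constant factor of about 2.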

68
Analyzing Recursive Programs
  • Express the running time T(n) as a recursive
    equation
  • Solve the recursive equation
  • For an upper-bound analysis, you can optionally
    simplify the equation to something larger
  • For a lower-bound analysis, you can optionally
    simplify the equation to something smaller

69
Binary Search
void bfind(int x, int a[], int n)
{
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}
What is the worst-case upper bound?
70
Binary Search
void bfind(int x, int a[], int n)
{
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}
What is the worst-case upper bound? Trick
question ?
71
Binary Search
void bfind(int x, int a[], int n)
{
    int m = n / 2;
    if (n < 1) return;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}
Okay, let's prove it is Θ(log n)
72
Binary Search
void bfind(int x, int a[], int n)
{
    int m = n / 2;
    if (n < 1) return;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}
  • Introduce some constants
  • b = time needed for base case
  • c = time needed to get ready to do a recursive
    call
  • Running time is thus

73
Binary Search Analysis
  • One sub-problem, half as large
  • Equation: T(1) ≤ b
  • T(n) ≤ T(n/2) + c for n > 1
  • Solution

T(n) ≤ T(n/2) + c               write equation
     ≤ T(n/4) + c + c           expand
     ≤ T(n/8) + c + c + c
     ≤ T(n/2^k) + kc            inductive leap
     ≤ T(1) + c log n           select value for k, where k = log n
     ≤ b + c log n = O(log n)   simplify
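The repeated substitution can be checked by evaluating the recurrence directly. The sketch below assumes integer halving (n/2 rounds down) and treats b and c as caller-supplied unit costs; the function names `halving_steps`, `floor_log2`, and `recurrence_cost` are hypothetical.

```c
#include <assert.h>

/* Number of times n can be halved before reaching 1, i.e. the number of
   expansion steps k in T(n) <= T(n/2^k) + kc. */
int halving_steps(long n)
{
    assert(n >= 1);
    if (n == 1)
        return 0;
    return 1 + halving_steps(n / 2);
}

/* floor(log2(n)) computed by shifting, for comparison. */
int floor_log2(long n)
{
    int k = 0;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* Value of the bound b + c * k for a given n. */
long recurrence_cost(long n, long b, long c)
{
    return b + c * halving_steps(n);
}
```

For every n ≥ 1 the number of halving steps equals floor(log2 n), which is why the closed form b + c log n grows logarithmically.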
74
Solving Recursive Equations by Repeated
Substitution
  • Somewhat informal, but intuitively clear and
    straightforward

75
Solving Recursive Equations by Telescoping
  • Create a set of equations, take their sum

76
Solving Recursive Equations by Induction
  • Repeated substitution and telescoping construct
    the solution
  • If you know the closed form solution, you can
    validate it by ordinary induction
  • For the induction, may want to increase n by a
    multiple (2n) rather than by n+1

77
Inductive Proof
78
Example Sum of Integer Queue
sum_queue(Q)
{
    if (Q.length() == 0) return 0;
    else return Q.dequeue() + sum_queue(Q);
}
  • One subproblem
  • Linear reduction in size (decrease by 1)
  • Equation: T(0) = b
  • T(n) = c + T(n - 1) for n > 0
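Solving this equation by repeated substitution, as a sketch using the same constants b and c:

```latex
\begin{align*}
T(n) &= c + T(n-1) \\
     &= c + c + T(n-2) \\
     &= kc + T(n-k) \\
     &= nc + T(0) && \text{select } k = n \\
     &= cn + b \;=\; \Theta(n)
\end{align*}
```

A linear reduction in problem size with constant work per call therefore gives linear running time, in contrast to the logarithmic result for halving.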

79
Lower Bound Analysis Recursive Fibonacci
int Fib(n)
{
    if (n == 0 or n == 1) return 1;
    else return Fib(n - 1) + Fib(n - 2);
}
  • Lower bound analysis Ω(n)
  • Instead of =, equations will use ≥
  • T(n) ≥ Some expression
  • Will simplify math by throwing out terms on the
    right-hand side

80
Analysis by Repeated Substitution

81
Learning from Analysis
  • To avoid recursive calls
  • store all basis values in a table
  • each time you calculate an answer, store it in
    the table
  • before performing any calculation for a value n
  • check if a valid answer for n is in the table
  • if so, return it
  • Memoization
  • a form of dynamic programming
  • How much time does memoized version take?
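The table-based steps listed above can be sketched as follows. This is an illustration under assumptions: the table size `MAX_N`, the use of 0 as the "not yet computed" marker, and the names `memo`, `memo_calls`, and `fib_memo` are all hypothetical choices.

```c
#include <assert.h>

#define MAX_N 90                        /* illustrative table size */

unsigned long long memo[MAX_N + 1];     /* 0 means "not computed yet" */
unsigned long long memo_calls;          /* number of invocations of fib_memo() */

/* Memoized Fibonacci with Fib(0) = Fib(1) = 1. */
unsigned long long fib_memo(int n)
{
    assert(n >= 0 && n <= MAX_N);
    memo_calls++;
    if (n == 0 || n == 1)
        return 1;                       /* basis values */
    if (memo[n] != 0)
        return memo[n];                 /* valid answer already in the table */
    memo[n] = fib_memo(n - 1) + fib_memo(n - 2);   /* compute once, store */
    return memo[n];
}
```

Each value is computed at most once, so the number of calls grows linearly in n rather than exponentially. Using 0 as the empty marker works here because every Fibonacci value in this convention is positive.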

82
Amortized Analysis
  • Consider any sequence of operations applied to a
    data structure
  • Some operations may be fast, others slow
  • Goal show that the average time per operation is
    still good

83
Stack Abstract Data Type
  • Stack operations
  • push
  • pop
  • is_empty
  • Stack property: if x is on the stack before y is
    pushed, then x will be popped after y is popped
  • What is biggest problem with an array
    implementation?

84
Stretchy Stack Implementation
int *data;
int maxsize;
int top;

Push(e)
{
    if (top == maxsize) {
        temp = new int[2*maxsize];
        for (i = 0; i < maxsize; i++) temp[i] = data[i];
        data = temp;
        maxsize = 2*maxsize;
    }
    data[top++] = e;    /* store e whether or not a stretch happened */
}

Best case Push = O( )        Worst case Push = O( )
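A self-contained C version of the stretchy stack might look like the sketch below. The `Stack` struct, the function names, and the `copies` and `stretches` counters are assumptions added for illustration; the counters exist only so the cost of stretching can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int *data;
    int maxsize;     /* allocated capacity */
    int top;         /* number of elements currently stored */
    long copies;     /* element copies caused by stretches (instrumentation) */
    int stretches;   /* number of stretches performed (instrumentation) */
} Stack;

bool stack_init(Stack *s, int initial_size)
{
    assert(initial_size > 0);           /* doubling needs a nonzero start */
    s->data = malloc((size_t)initial_size * sizeof *s->data);
    s->maxsize = initial_size;
    s->top = 0;
    s->copies = 0;
    s->stretches = 0;
    return s->data != NULL;
}

bool stack_push(Stack *s, int e)
{
    if (s->top == s->maxsize) {         /* full: double the array */
        int *temp = malloc((size_t)(2 * s->maxsize) * sizeof *temp);
        if (temp == NULL)
            return false;
        for (int i = 0; i < s->maxsize; i++)
            temp[i] = s->data[i];
        s->copies += s->maxsize;
        s->stretches++;
        free(s->data);
        s->data = temp;
        s->maxsize = 2 * s->maxsize;
    }
    s->data[s->top++] = e;
    return true;
}

bool stack_is_empty(const Stack *s) { return s->top == 0; }

int stack_pop(Stack *s)
{
    assert(!stack_is_empty(s));
    return s->data[--s->top];
}

void stack_free(Stack *s)
{
    free(s->data);
    s->data = NULL;
}
```

A push that does not stretch does a constant amount of work, while a push that stretches copies every element currently stored, which is the gap between the best and worst case asked about above.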
85
Stretchy Stack Amortized Analysis
  • Consider sequence of n operations
  • push(3); push(19); push(2); ...
  • What is the max number of stretches?
  • What is the total time?
  • Let's say a regular push takes time a, and
    stretching an array containing k elements takes time
    bk.
  • Amortized time

86
Stretchy Stack Amortized Analysis
  • Consider sequence of n operations
  • push(3); push(19); push(2); ...
  • What is the max number of stretches?
  • What is the total time?
  • Let's say a regular push takes time a, and
    stretching an array containing k elements takes time
    bk.
  • Amortized time

log n
87
Geometric Series
88
Stretchy Stack Amortized Analysis
  • Consider sequence of n operations
  • push(3); push(19); push(2); ...
  • What is the max number of stretches?
  • What is the total time?
  • Let's say a regular push takes time a, and
    stretching an array containing k elements takes time
    bk.
  • Amortized time

log n
89
Surprise
  • In an asymptotic sense, there is no overhead in
    using stretchy arrays rather than regular arrays!
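The calculation behind this conclusion can be sketched with the geometric series. Assume the array starts at size 1 and doubles on each stretch, so over n pushes the stretches copy 1, 2, 4, ... elements, with at most log n stretches; a and b are the per-push and per-element-copy costs introduced earlier.

```latex
\begin{align*}
\text{Total time} &\le an + b \sum_{i=0}^{\lfloor \log_2 n \rfloor} 2^{i}
   \;=\; an + b\left(2^{\lfloor \log_2 n \rfloor + 1} - 1\right)
   \;\le\; an + b(2n - 1) \\
\text{Amortized time per push} &\le \frac{an + b(2n-1)}{n} \;\le\; a + 2b \;=\; O(1)
\end{align*}
```

The occasional expensive stretch is paid for by the many cheap pushes between stretches, so the average cost per push stays constant.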