Title: CSE 326: Data Structures Introduction
1 CSE 326 Data Structures Introduction, Part One: Complexity
- Henry Kautz
- Autumn Quarter 2002
2 Overview of the Quarter
- Part One: Complexity
  - Inductive proofs of program correctness
  - Empirical and asymptotic complexity
  - Order of magnitude notation, logs, series
  - Analyzing recursive programs
- Part Two: List-like data structures
- Part Three: Sorting
- Part Four: Search Trees
- Part Five: Hash Tables
- Part Six: Heaps and Union/Find
- Part Seven: Graph Algorithms
- Part Eight: Advanced Topics
3 Material for Part One
- Weiss, Chapters 1 and 2
- Additional material:
  - Graphical analysis
  - Amortized analysis
  - Stretchy arrays
- Any questions on course organization?
4 Program Analysis
- Correctness
  - Testing
  - Proofs of correctness
- Efficiency
  - How to define?
  - Asymptotic complexity: how running time scales as a function of the size of the input
5 Proving Programs Correct
- Often takes the form of an inductive proof
- Example: summing an array

int sum(int v[], int n) {
    if (n == 0) return 0;
    else return v[n-1] + sum(v, n-1);
}

What are the parts of an inductive proof?
6 Inductive Proof of Correctness

int sum(int v[], int n) {
    if (n == 0) return 0;
    else return v[n-1] + sum(v, n-1);
}

Theorem: sum(v,n) correctly returns the sum of the first n elements of array v, for any n.
Basis Step: The program is correct for n = 0: it returns 0. ✓
Inductive Hypothesis (n = k): Assume sum(v,k) returns the sum of the first k elements of v.
Inductive Step (n = k+1): sum(v,k+1) returns v[k] + sum(v,k), which is the sum of the first k+1 elements of v. ✓
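As a sanity check, the slide's sum function compiles and runs as ordinary C once the array brackets and the + operator are restored; the test array below is my own example, not from the slides.

```c
#include <stddef.h>

/* Sum of the first n elements of v, exactly as on the slide. */
int sum(int v[], int n) {
    if (n == 0) return 0;            /* basis: an empty prefix sums to 0 */
    return v[n - 1] + sum(v, n - 1); /* inductive step: last element plus the rest */
}
```

Running it on a small array matches the theorem: the result is the sum of the first n elements for every n.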
7 Proof by Contradiction
- Assume the negation of the goal, and show this leads to a contradiction
- Example: there is no program that solves the halting problem
  - i.e., one that determines whether any other program runs forever or not

Alan Turing, 1937
8 Program NonConformist

Program NonConformist (Program P)
    If ( HALT(P) = never halts ) Then
        Halt
    Else
        Do While (1 > 0)
            Print "Hello!"
        End While
    End If
End Program

- Does NonConformist(NonConformist) halt?
- Yes? That means HALT(NonConformist) = never halts
- No? That means HALT(NonConformist) = halts
- Contradiction!
9 Defining Efficiency
- Asymptotic complexity: how running time scales as a function of the size of the input
- Why is this a reasonable definition?
10 Defining Efficiency
- Asymptotic complexity: how running time scales as a function of the size of the input
- Why is this a reasonable definition?
  - Many kinds of small problems can be solved in practice by almost any approach
    - E.g., exhaustive enumeration of possible solutions
  - We want to focus efficiency concerns on larger problems
  - The definition is independent of any possible advances in computer technology
11 Technology-Dependent Efficiency
- Drum computers: a popular technology of the early 1960s
- Transistors were too costly to use for RAM, so memory was kept on a revolving magnetic drum
- An efficient program scattered instructions on the drum so that the next instruction to execute was under the read head just when it was needed
  - This minimized the number of full revolutions of the drum during execution
12 The Apocalyptic Laptop
- Speed is limited by energy: E = mc^2 ≈ 25 million megawatt-hours
- Quantum mechanics: switching time ≥ h / (2 × Energy), where h is Planck's constant
- Result: about 5.4 × 10^50 operations per second
- Seth Lloyd, SCIENCE, 31 Aug 2000
13 Big Bang
(figure: comparing the Ultimate Laptop running for 1 second or 1 year against a 1000 MIPS machine running for 1 day or since the Big Bang)
14 Defining Efficiency
- Asymptotic complexity: how running time scales as a function of the size of the input
- What is "size"?
  - Often the length (in characters) of the input
  - Sometimes the value of the input (if the input is a number)
- Which inputs?
  - Worst case
    - Advantages / disadvantages?
  - Best case
    - Why?
15 Average Case Analysis
- A more realistic analysis, first attempt:
  - Assume inputs are randomly distributed according to some realistic distribution
  - Compute the expected running time
- Drawbacks
  - Often hard to define a realistic random distribution
  - Usually hard to perform the math
16 Amortized Analysis
- Instead of a single input, consider a sequence of inputs
  - Choose the worst possible sequence
  - Determine the average running time on this sequence
- Advantages
  - Often less pessimistic than simple worst-case analysis
  - Guaranteed results: no assumed distribution
  - Usually mathematically easier than average case analysis
17 Comparing Runtimes
- Program A is asymptotically less efficient than program B iff the runtime of A dominates the runtime of B as the size of the input goes to infinity
- Note: RunTime can be worst case, best case, average case, or amortized case
18 Which Function Dominates?

n^3 + 2n^2          vs.  100n^2 + 1000
n^0.1               vs.  log n
n + 100n^0.1        vs.  2n + 10 log n
5n^5                vs.  n!
n^-15 * 2^n / 100   vs.  1000n^15
8^(2 log n)         vs.  3n^7 + 7n
19 Race I
n^3 + 2n^2  vs.  100n^2 + 1000

20 Race II
n^0.1  vs.  log n

21 Race III
n + 100n^0.1  vs.  2n + 10 log n

22 Race IV
5n^5  vs.  n!

23 Race V
n^-15 * 2^n / 100  vs.  1000n^15

24 Race VI
8^(2 log n)  vs.  3n^7 + 7n
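One way to settle each race is simply to evaluate both contestants at increasing n and watch for the crossover. A minimal sketch for Race I (the function names fA and fB are my own labels):

```c
#include <math.h>

/* Race I contestants, evaluated as doubles so large values do not overflow. */
double fA(double n) { return n*n*n + 2*n*n; }   /* n^3 + 2n^2   */
double fB(double n) { return 100*n*n + 1000; }  /* 100n^2 + 1000 */
```

At n = 10 the big constant on the n^2 term keeps fB ahead, but by n = 1000 the n^3 term has overwhelmed it, which is exactly the "dominance" the races illustrate.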
25 Order of Magnitude Notation (big O)
- Asymptotic complexity: how running time scales as a function of the size of the input
- We usually only care about the order of magnitude of the scaling
- Why?
26 Order of Magnitude Notation (big O)
- Asymptotic complexity: how running time scales as a function of the size of the input
- We usually only care about the order of magnitude of the scaling
- Why?
  - As we saw, some functions overwhelm other functions, so if the running time is a sum of terms, we can drop the dominated terms
  - True constant factors depend on details of the compiler and hardware, so we might as well make the constant factor 1
27
- Eliminate low-order terms
- Eliminate constant coefficients
28 Common Names
(from slowest to fastest growth)
- constant: O(1)
- logarithmic: O(log n)
- linear: O(n)
- log-linear: O(n log n)
- quadratic: O(n^2)
- exponential: O(c^n), where c is a constant > 1
Also:
- superlinear: O(n^c), where c is a constant > 1
- polynomial: O(n^c), where c is a constant > 0
29 Summary
- Proofs by induction and contradiction
- Asymptotic complexity
- Worst case, best case, average case, and amortized asymptotic complexity
- Dominance of functions
- Order of magnitude notation
- Next:
  - Part One: Complexity, continued
  - Read Chapters 1 and 2
30 Part One: Complexity, continued
- Friday, October 4th, 2002
31 Determining the Complexity of an Algorithm
- Empirical measurement
- Formal analysis (i.e., proofs)
- Question: what are the likely advantages and drawbacks of each approach?
32 Determining the Complexity of an Algorithm
- Empirical measurement
- Formal analysis (i.e., proofs)
- Question: what are the likely advantages and drawbacks of each approach?
- Empirical
  - pro: discover whether constant factors are significant
  - con: may be running on the wrong inputs
- Formal
  - pro: no interference from implementation/hardware details
  - con: can make a mistake in a proof!

"In theory, theory is the same as practice, but not in practice."
33 Measuring Empirical Complexity: Linear vs. Binary Search
- Find an item in a sorted array of length N
- Binary search algorithm

                        Linear Search    Binary Search
Time to find one item
Time to find N items
34 My C Code

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

void lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)
            return;
}

for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++)
    lfind(i, a, n);   /* or bfind */
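The experiment the slides describe is easy to reproduce. Instead of wall-clock time, the sketch below counts comparisons, which gives the same scaling but is deterministic and portable; the counter and the lfind_all wrapper are my additions, not part of the original code.

```c
/* Count comparisons instead of wall time: same growth, no timer noise. */
static long steps;

void lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++) {
        steps++;                 /* one comparison per loop iteration */
        if (a[i] == x) return;
    }
}

/* Total comparisons to look up every element of a[0..n-1] once. */
long lfind_all(int a[], int n) {
    steps = 0;
    for (int i = 0; i < n; i++) lfind(i, a, n);
    return steps;
}
```

With a[i] = i, finding element i scans i+1 slots, so the total is 1 + 2 + ... + n = n(n+1)/2: the quadratic growth that the log/log plots on the next slides make visible.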
35 Graphical Analysis

36 Graphical Analysis

37-38 (figures only, no transcript)

39 (figure: log/log plot annotated with slope ≈ 2 and slope ≈ 1)
40 Property of Log/Log Plots
- On a linear plot, a linear function is a straight line
- On a log/log plot, any polynomial function is a straight line!
  - The slope Δy/Δx is the same as the exponent
(figure: log/log plot with horizontal axis, vertical axis, and slope labeled)
41 Why does O(n log n) look like a straight line?
(figure: log/log plot, slope ≈ 1)
42 Summary
- Empirical and formal analyses of runtime scaling are both important techniques in algorithm development
- Large data sets may be required to gain an accurate empirical picture
- Log/log plots provide a fast and simple visual tool for estimating the exponent of a polynomial function
43 Formal Asymptotic Analysis
- In order to prove complexity results, we must make the notion of "order of magnitude" more precise
- Asymptotic bounds on runtime:
  - Upper bound
  - Lower bound
44 Definition of Order Notation
- Upper bound: T(n) = O(f(n))   (Big-O)
  - There exist constants c and n0 such that
    T(n) ≤ c f(n) for all n ≥ n0
- Lower bound: T(n) = Ω(g(n))   (Omega)
  - There exist constants c and n0 such that
    T(n) ≥ c g(n) for all n ≥ n0
- Tight bound: T(n) = Θ(f(n))   (Theta)
  - When both hold:
    - T(n) = O(f(n))
    - T(n) = Ω(f(n))
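A Big-O claim is really a pair of witnesses (c, n0), and a claimed pair can be spot-checked by machine over a finite range. The sketch below does this for T(n) = 100n^2 + 1000 = O(n^2); the function names are mine, and a finite check is evidence, not a proof (the proof is the algebra: 101 n^2 ≥ 100n^2 + 1000 whenever n^2 ≥ 1000, i.e., n ≥ 32).

```c
/* Check a Big-O witness: does T(n) <= c * f(n) hold for all n0 <= n <= limit? */
int witness_holds(long long (*T)(long long), long long (*f)(long long),
                  long long c, long long n0, long long limit) {
    for (long long n = n0; n <= limit; n++)
        if (T(n) > c * f(n)) return 0;
    return 1;
}

long long T1(long long n) { return 100 * n * n + 1000; }  /* 100n^2 + 1000 */
long long f1(long long n) { return n * n; }               /* n^2           */
```

Note that c = 100 never works, since 100n^2 + 1000 always exceeds 100n^2: picking the constants is part of the proof obligation.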
45 Example: Upper Bound

46 Using a Different Pair of Constants

47 Example: Lower Bound

48 Conventions of Order Notation
49 Upper/Lower vs. Worst/Best
- Worst case upper bound is f(n):
  - Guarantee that the run time is no more than c f(n)
- Best case upper bound is f(n):
  - If you are lucky, the run time is no more than c f(n)
- Worst case lower bound is g(n):
  - If you are unlucky, the run time is at least c g(n)
- Best case lower bound is g(n):
  - Guarantee that the run time is at least c g(n)
50 Analyzing Code
- Primitive operations
- Consecutive statements
- Function calls
- Conditionals
- Loops
- Recursive functions
51 Conditionals
- Conditional: if C then S1 else S2
- Suppose you are doing an O( ) analysis?
- Suppose you are doing an Ω( ) analysis?
52 Conditionals
- Conditional: if C then S1 else S2
- Suppose you are doing an O( ) analysis?
  - Time(C) + Max(Time(S1), Time(S2))
  - or Time(C) + Time(S1) + Time(S2)
  - or ...
- Suppose you are doing an Ω( ) analysis?
  - Time(C) + Min(Time(S1), Time(S2))
  - or Time(C)
  - or ...
53 Nested Loops

for i = 1 to n do
    for j = 1 to n do
        sum = sum + 1

54 Nested Loops

for i = 1 to n do
    for j = 1 to n do
        sum = sum + 1

55 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1

56 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1
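Transcribing both pseudocode loops into C and returning the final value of sum makes the iteration counts concrete; the function names below are my own.

```c
/* The independent nested loop: the inner loop runs n times for each i,
   so the body executes n * n times. */
long count_independent(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            sum = sum + 1;
    return sum;
}

/* The dependent nested loop: for each i the inner loop runs n - i + 1 times,
   so the body executes n + (n-1) + ... + 1 = n(n+1)/2 times. */
long count_dependent(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++)
        for (int j = i; j <= n; j++)
            sum = sum + 1;
    return sum;
}
```

Both counts are Θ(n^2); the dependence on i changes only the constant factor, from 1 to 1/2.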
57 Summary
- Formal definition of order of magnitude notation
- Proving upper and lower asymptotic bounds on a function
- Formal analysis of conditionals and simple loops
- Next:
  - Analyzing complex loops
  - Mathematical series
  - Analyzing recursive functions
58 Part One: Complexity, Continued

59 Today's Material
- Running time of nested dependent loops
- Mathematical series
- Formal analysis of linear search
- Formal analysis of binary search
- Solving recursive equations
- Stretchy arrays and the Stack ADT
- Amortized analysis
60 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1

61 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1
62 Arithmetic Series
- S(N) = 1 + 2 + ... + N
- Note that S(1) = 1, S(2) = 3, S(3) = 6, S(4) = 10, ...
- Hypothesis: S(N) = N(N+1)/2
- Prove by induction:
  - Base case: for N = 1, S(N) = 1(2)/2 = 1 ✓
  - Assume true for N = k
  - Suppose N = k+1:
    S(k+1) = S(k) + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k/2 + 1) = (k+1)(k+2)/2 ✓
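The closed form can also be checked by brute force against the definition of the series, which is a useful habit before investing in an inductive proof:

```c
/* Brute-force S(N) = 1 + 2 + ... + N, to compare with N(N+1)/2. */
long S(int N) {
    long s = 0;
    for (int k = 1; k <= N; k++) s += k;
    return s;
}
```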
63 Other Important Series
- Sum of squares
- Sum of exponents
- Geometric series
- Novel series:
  - Reduce to known series, or prove inductively
64 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1
65 Linear Search Analysis

void lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)
            return;
}

- Best case, tight analysis:
- Worst case, tight analysis:
66 Iterated Linear Search Analysis

for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++)
    lfind(i, a, n);

- Easy worst-case upper bound:
- Worst-case tight analysis:

67 Iterated Linear Search Analysis

for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++)
    lfind(i, a, n);

- Easy worst-case upper bound:
- Worst-case tight analysis:
  - Just multiplying the worst case by n does not justify the answer, since each time lfind is called, a specific value of i is passed in
68 Analyzing Recursive Programs
- Express the running time T(n) as a recursive equation
- Solve the recursive equation
  - For an upper-bound analysis, you can optionally simplify the equation to something larger
  - For a lower-bound analysis, you can optionally simplify the equation to something smaller
69 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

What is the worst-case upper bound?

70 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

What is the worst-case upper bound? Trick question: as written, there is no base case, so if x is not in the array the recursion never terminates!
71 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (n < 1) return;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

Okay, let's prove it is O(log n).
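To test the corrected version, it helps to have it return the index of x (or -1 when absent) instead of void; that return value is my adaptation for testability, not part of the slides' code.

```c
/* Binary search with the base case added; returns the index of x in
   a[0..n-1], or -1 if x is absent. */
int bfind(int x, int a[], int n) {
    if (n < 1) return -1;               /* base case: empty range */
    int m = n / 2;
    if (x == a[m]) return m;
    if (x < a[m])
        return bfind(x, a, m);          /* search left half */
    int r = bfind(x, a + m + 1, n - m - 1);  /* search right half */
    return r < 0 ? -1 : r + m + 1;      /* shift index back to caller's frame */
}
```

The missing-element case now terminates because every recursive call shrinks n, and the base case catches n = 0.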
72 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (n < 1) return;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

- Introduce some constants:
  - b = time needed for the base case
  - c = time needed to get ready to do a recursive call
- The running time is thus:
73 Binary Search Analysis
- One sub-problem, half as large
- Equation: T(1) ≤ b
            T(n) ≤ T(n/2) + c   for n > 1
- Solution:
  T(n) ≤ T(n/2) + c               write equation
       ≤ T(n/4) + c + c           expand
       ≤ T(n/8) + c + c + c
       ≤ T(n/2^k) + kc            inductive leap
       ≤ T(1) + c log n           select k = log n
       ≤ b + c log n = O(log n)   simplify
74 Solving Recursive Equations by Repeated Substitution
- Somewhat informal, but intuitively clear and straightforward

75 Solving Recursive Equations by Telescoping
- Create a set of equations, take their sum

76 Solving Recursive Equations by Induction
- Repeated substitution and telescoping construct the solution
- If you know the closed-form solution, you can validate it by ordinary induction
- For the induction, you may want to increase n by a multiple (2n) rather than by n+1

77 Inductive Proof
78 Example: Sum of Integer Queue

sum_queue(Q) {
    if (Q.length() == 0) return 0;
    else return Q.dequeue() + sum_queue(Q);
}

- One subproblem
- Linear reduction in size (decrease by 1)
- Equation: T(0) = b
            T(n) = c + T(n-1)   for n > 0
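The slide's pseudocode uses method-call syntax; in plain C the same recursion looks like the sketch below, where the minimal array-backed Queue struct and the q_length/q_dequeue helpers are my own scaffolding, not an API from the course.

```c
/* Minimal array-backed queue: just enough to run the slide's recursion. */
typedef struct { int *items; int head; int len; } Queue;

int q_length(Queue *q)  { return q->len - q->head; }
int q_dequeue(Queue *q) { return q->items[q->head++]; }

/* T(0) = b, T(n) = c + T(n-1): one dequeue per element, hence O(n). */
int sum_queue(Queue *q) {
    if (q_length(q) == 0) return 0;
    return q_dequeue(q) + sum_queue(q);
}
```

Each call does constant work plus one recursive call on a queue one element shorter, which is exactly the T(n) = c + T(n-1) equation.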
79 Lower Bound Analysis: Recursive Fibonacci

int Fib(n) {
    if (n == 0 || n == 1) return 1;
    else return Fib(n-1) + Fib(n-2);
}

- Lower bound analysis: T(n)
- Instead of =, the equations will use ≥
  - T(n) ≥ (some expression)
- We will simplify the math by throwing out terms on the right-hand side
80 Analysis by Repeated Substitution
81 Learning from Analysis
- To avoid recursive calls:
  - store all basis values in a table
  - each time you calculate an answer, store it in the table
  - before performing any calculation for a value n:
    - check if a valid answer for n is in the table
    - if so, return it
- Memoization
  - a form of dynamic programming
- How much time does the memoized version take?
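The recipe above can be sketched directly for Fib; using 0 as the "not yet computed" marker works here because every Fibonacci value in this formulation is at least 1. The table size MAXN is an arbitrary bound of mine.

```c
/* Memoized Fibonacci: the table turns the exponential recursion linear,
   since each value n is computed at most once. */
#define MAXN 64
static long long memo[MAXN];   /* 0 marks "not yet computed" */

long long fib(int n) {
    if (n == 0 || n == 1) return 1;      /* basis values */
    if (memo[n] != 0) return memo[n];    /* answer already in the table */
    memo[n] = fib(n - 1) + fib(n - 2);   /* compute once, store */
    return memo[n];
}
```

Answering the slide's question: each of the n table entries is filled once with constant work, so the memoized version takes O(n) time instead of exponential.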
82 Amortized Analysis
- Consider any sequence of operations applied to a data structure
- Some operations may be fast, others slow
- Goal: show that the average time per operation is still good
83 Stack Abstract Data Type
- Stack operations:
  - push
  - pop
  - is_empty
- Stack property: if x is on the stack before y is pushed, then x will be popped after y is popped
- What is the biggest problem with an array implementation?
84 Stretchy Stack Implementation

int *data;
int maxsize;
int top;

Push(e) {
    if (top == maxsize) {
        temp = new int[2*maxsize];
        for (i = 0; i < maxsize; i++) temp[i] = data[i];
        data = temp;
        maxsize = 2*maxsize;
    }
    data[top] = e;
}

Best case Push: O( )    Worst case Push: O( )
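The slide's sketch uses C++ new and leaves the bookkeeping implicit; a self-contained C version with malloc/free (my substitution) looks like this. The struct wrapper, init, and pop are my additions for completeness.

```c
#include <stdlib.h>

/* Stretchy stack: double the array whenever it fills up. */
typedef struct { int *data; int maxsize; int top; } Stack;

void init(Stack *s) {
    s->maxsize = 1;
    s->top = 0;
    s->data = malloc(sizeof(int) * s->maxsize);
}

void push(Stack *s, int e) {
    if (s->top == s->maxsize) {                       /* full: stretch */
        int *temp = malloc(sizeof(int) * 2 * s->maxsize);
        for (int i = 0; i < s->maxsize; i++) temp[i] = s->data[i];
        free(s->data);
        s->data = temp;
        s->maxsize = 2 * s->maxsize;
    }
    s->data[s->top++] = e;                            /* always store e */
}

int pop(Stack *s) { return s->data[--s->top]; }
```

Note that e is stored even on a push that triggers a stretch; the stretch only makes room first.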
85 Stretchy Stack Amortized Analysis
- Consider a sequence of n operations: push(3); push(19); push(2); ...
- What is the max number of stretches?
- What is the total time?
  - Let's say a regular push takes time a, and stretching an array containing k elements takes time bk.
- Amortized time:
86 Stretchy Stack Amortized Analysis
- Consider a sequence of n operations: push(3); push(19); push(2); ...
- What is the max number of stretches? log n
- What is the total time?
  - Let's say a regular push takes time a, and stretching an array containing k elements takes time bk.
- Amortized time:
87 Geometric Series
88 Stretchy Stack Amortized Analysis
- Consider a sequence of n operations: push(3); push(19); push(2); ...
- What is the max number of stretches? log n
- What is the total time?
  - Let's say a regular push takes time a, and stretching an array containing k elements takes time bk.
- Amortized time:
89 Surprise
- In an asymptotic sense, there is no overhead in using stretchy arrays rather than regular arrays!
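The surprise can be confirmed by counting the copy work directly: simulate n pushes into a doubling array, starting (in this sketch of mine) from capacity 1, and tally how many elements get copied across all stretches. The copies form the geometric series 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1, which never exceeds 2n, so the amortized copy cost per push is O(1).

```c
/* Total element copies over n pushes into a doubling array. */
long copy_cost(int n) {
    long copies = 0;
    int maxsize = 1, top = 0;
    for (int i = 0; i < n; i++) {
        if (top == maxsize) {     /* stretch: copy all maxsize elements */
            copies += maxsize;
            maxsize *= 2;
        }
        top++;
    }
    return copies;
}
```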