Title: CSE 326: Data Structures Introduction
1 CSE 326 Data Structures Introduction, Part One: Complexity
- Henry Kautz
- Autumn Quarter 2002
2 Overview of the Quarter
- Part One: Complexity
  - Inductive proofs of program correctness
  - Empirical and asymptotic complexity
  - Order of magnitude notation, logs, series
  - Analyzing recursive programs
- Part Two: List-like data structures
- Part Three: Sorting
- Part Four: Search Trees
- Part Five: Hash Tables
- Part Six: Heaps and Union/Find
- Part Seven: Graph Algorithms
- Part Eight: Advanced Topics
3 Material for Part One
- Weiss, Chapters 1 and 2
- Additional material:
  - Graphical analysis
  - Amortized analysis
  - Stretchy arrays
- Any questions on course organization?
4 Program Analysis
- Correctness
  - Testing
  - Proofs of correctness
- Efficiency
  - How to define?
  - Asymptotic complexity: how running time scales as a function of the size of the input
5 Proving Programs Correct
- Often takes the form of an inductive proof
- Example: summing an array

int sum(int v[], int n) {
    if (n == 0) return 0;
    else return v[n-1] + sum(v, n-1);
}

What are the parts of an inductive proof?
6 Inductive Proof of Correctness

int sum(int v[], int n) {
    if (n == 0) return 0;
    else return v[n-1] + sum(v, n-1);
}

Theorem: sum(v,n) correctly returns the sum of the first n elements of array v, for any n.
Basis Step: The program is correct for n = 0: it returns 0. ✓
Inductive Hypothesis (n = k): Assume sum(v,k) returns the sum of the first k elements of v.
Inductive Step (n = k+1): sum(v,k+1) returns v[k] + sum(v,k), which is the sum of the first k+1 elements of v. ✓
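As a sanity check, the slide's sum function compiles and runs as ordinary C once the array brackets and the + operator are restored; the test array below is my own example, not from the slides.

```c
#include <stddef.h>

/* Sum of the first n elements of v, exactly as on the slide. */
int sum(int v[], int n) {
    if (n == 0) return 0;            /* basis: an empty prefix sums to 0 */
    return v[n - 1] + sum(v, n - 1); /* inductive step: last element plus the rest */
}
```

Running it on a small array matches the theorem: the result is the sum of the first n elements for every n.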
7 Proof by Contradiction
- Assume the negation of the goal, and show this leads to a contradiction
- Example: there is no program that solves the halting problem
  - i.e., one that determines whether any other program runs forever or not

Alan Turing, 1937
8 Program NonConformist

Program NonConformist (Program P)
    If ( HALT(P) = never halts ) Then
        Halt
    Else
        Do While (1 > 0)
            Print "Hello!"
        End While
    End If
End Program

- Does NonConformist(NonConformist) halt?
- Yes? That means HALT(NonConformist) = never halts
- No? That means HALT(NonConformist) = halts
- Contradiction!
9 Defining Efficiency
- Asymptotic complexity: how running time scales as a function of the size of the input
- Why is this a reasonable definition?
10 Defining Efficiency
- Asymptotic complexity: how running time scales as a function of the size of the input
- Why is this a reasonable definition?
  - Many kinds of small problems can be solved in practice by almost any approach
    - E.g., exhaustive enumeration of possible solutions
  - We want to focus efficiency concerns on larger problems
  - The definition is independent of any possible advances in computer technology
11 Technology-Dependent Efficiency
- Drum computers: a popular technology of the early 1960s
- Transistors were too costly to use for RAM, so memory was kept on a revolving magnetic drum
- An efficient program scattered instructions on the drum so that the next instruction to execute was under the read head just when it was needed
  - This minimized the number of full revolutions of the drum during execution
12 The Apocalyptic Laptop
- Speed is limited by energy: E = mc^2 ≈ 25 million megawatt-hours
- Quantum mechanics: switching time ≥ h / (2 × Energy), where h is Planck's constant
- Result: about 5.4 × 10^50 operations per second
- Seth Lloyd, SCIENCE, 31 Aug 2000
13 Big Bang
(figure: comparing the Ultimate Laptop running for 1 second or 1 year against a 1000 MIPS machine running for 1 day or since the Big Bang)
14 Defining Efficiency
- Asymptotic complexity: how running time scales as a function of the size of the input
- What is "size"?
  - Often the length (in characters) of the input
  - Sometimes the value of the input (if the input is a number)
- Which inputs?
  - Worst case
    - Advantages / disadvantages?
  - Best case
    - Why?
15 Average Case Analysis
- A more realistic analysis, first attempt:
  - Assume inputs are randomly distributed according to some realistic distribution
  - Compute the expected running time
- Drawbacks
  - Often hard to define a realistic random distribution
  - Usually hard to perform the math
16 Amortized Analysis
- Instead of a single input, consider a sequence of inputs
  - Choose the worst possible sequence
  - Determine the average running time on this sequence
- Advantages
  - Often less pessimistic than simple worst-case analysis
  - Guaranteed results: no assumed distribution
  - Usually mathematically easier than average case analysis
17 Comparing Runtimes
- Program A is asymptotically less efficient than program B iff the runtime of A dominates the runtime of B as the size of the input goes to infinity
- Note: RunTime can be worst case, best case, average case, or amortized case
18 Which Function Dominates?

n^3 + 2n^2          vs.  100n^2 + 1000
n^0.1               vs.  log n
n + 100n^0.1        vs.  2n + 10 log n
5n^5                vs.  n!
n^-15 * 2^n / 100   vs.  1000n^15
8^(2 log n)         vs.  3n^7 + 7n
19 Race I
n^3 + 2n^2  vs.  100n^2 + 1000

20 Race II
n^0.1  vs.  log n

21 Race III
n + 100n^0.1  vs.  2n + 10 log n

22 Race IV
5n^5  vs.  n!

23 Race V
n^-15 * 2^n / 100  vs.  1000n^15

24 Race VI
8^(2 log n)  vs.  3n^7 + 7n
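One way to settle each race is simply to evaluate both contestants at increasing n and watch for the crossover. A minimal sketch for Race I (the function names fA and fB are my own labels):

```c
#include <math.h>

/* Race I contestants, evaluated as doubles so large values do not overflow. */
double fA(double n) { return n*n*n + 2*n*n; }   /* n^3 + 2n^2   */
double fB(double n) { return 100*n*n + 1000; }  /* 100n^2 + 1000 */
```

At n = 10 the big constant on the n^2 term keeps fB ahead, but by n = 1000 the n^3 term has overwhelmed it, which is exactly the "dominance" the races illustrate.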
25 Order of Magnitude Notation (big O)
- Asymptotic complexity: how running time scales as a function of the size of the input
- We usually only care about the order of magnitude of the scaling
- Why?
26 Order of Magnitude Notation (big O)
- Asymptotic complexity: how running time scales as a function of the size of the input
- We usually only care about the order of magnitude of the scaling
- Why?
  - As we saw, some functions overwhelm other functions, so if the running time is a sum of terms, we can drop the dominated terms
  - True constant factors depend on details of the compiler and hardware, so we might as well make the constant factor 1
27
- Eliminate low-order terms
- Eliminate constant coefficients
28 Common Names
(from slowest to fastest growth)
- constant: O(1)
- logarithmic: O(log n)
- linear: O(n)
- log-linear: O(n log n)
- quadratic: O(n^2)
- exponential: O(c^n), where c is a constant > 1
Also:
- superlinear: O(n^c), where c is a constant > 1
- polynomial: O(n^c), where c is a constant > 0
29 Summary
- Proofs by induction and contradiction
- Asymptotic complexity
- Worst case, best case, average case, and amortized asymptotic complexity
- Dominance of functions
- Order of magnitude notation
- Next:
  - Part One: Complexity, continued
  - Read Chapters 1 and 2
30 Part One: Complexity, continued
- Friday, October 4th, 2002
31 Determining the Complexity of an Algorithm
- Empirical measurement
- Formal analysis (i.e., proofs)
- Question: what are the likely advantages and drawbacks of each approach?
32 Determining the Complexity of an Algorithm
- Empirical measurement
- Formal analysis (i.e., proofs)
- Question: what are the likely advantages and drawbacks of each approach?
- Empirical
  - pro: discover whether constant factors are significant
  - con: may be running on the wrong inputs
- Formal
  - pro: no interference from implementation/hardware details
  - con: can make a mistake in a proof!

"In theory, theory is the same as practice, but not in practice."
33 Measuring Empirical Complexity: Linear vs. Binary Search
- Find an item in a sorted array of length N
- Binary search algorithm

                        Linear Search    Binary Search
Time to find one item
Time to find N items
34 My C Code

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

void lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)
            return;
}

for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++)
    lfind(i, a, n);   /* or bfind */
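The experiment the slides describe is easy to reproduce. Instead of wall-clock time, the sketch below counts comparisons, which gives the same scaling but is deterministic and portable; the counter and the lfind_all wrapper are my additions, not part of the original code.

```c
/* Count comparisons instead of wall time: same growth, no timer noise. */
static long steps;

void lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++) {
        steps++;                 /* one comparison per loop iteration */
        if (a[i] == x) return;
    }
}

/* Total comparisons to look up every element of a[0..n-1] once. */
long lfind_all(int a[], int n) {
    steps = 0;
    for (int i = 0; i < n; i++) lfind(i, a, n);
    return steps;
}
```

With a[i] = i, finding element i scans i+1 slots, so the total is 1 + 2 + ... + n = n(n+1)/2: the quadratic growth that the log/log plots on the next slides make visible.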
35 Graphical Analysis

36 Graphical Analysis

37-38 (figures only, no transcript)

39 (figure: log/log plot annotated with slope ≈ 2 and slope ≈ 1)
40 Property of Log/Log Plots
- On a linear plot, a linear function is a straight line
- On a log/log plot, any polynomial function is a straight line!
  - The slope Δy/Δx is the same as the exponent
(figure: log/log plot with horizontal axis, vertical axis, and slope labeled)
41 Why does O(n log n) look like a straight line?
(figure: log/log plot, slope ≈ 1)
42 Summary
- Empirical and formal analyses of runtime scaling are both important techniques in algorithm development
- Large data sets may be required to gain an accurate empirical picture
- Log/log plots provide a fast and simple visual tool for estimating the exponent of a polynomial function
43 Formal Asymptotic Analysis
- In order to prove complexity results, we must make the notion of "order of magnitude" more precise
- Asymptotic bounds on runtime:
  - Upper bound
  - Lower bound
44 Definition of Order Notation
- Upper bound: T(n) = O(f(n))   (Big-O)
  - There exist constants c and n0 such that
    T(n) ≤ c f(n) for all n ≥ n0
- Lower bound: T(n) = Ω(g(n))   (Omega)
  - There exist constants c and n0 such that
    T(n) ≥ c g(n) for all n ≥ n0
- Tight bound: T(n) = Θ(f(n))   (Theta)
  - When both hold:
    - T(n) = O(f(n))
    - T(n) = Ω(f(n))
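A Big-O claim is really a pair of witnesses (c, n0), and a claimed pair can be spot-checked by machine over a finite range. The sketch below does this for T(n) = 100n^2 + 1000 = O(n^2); the function names are mine, and a finite check is evidence, not a proof (the proof is the algebra: 101 n^2 ≥ 100n^2 + 1000 whenever n^2 ≥ 1000, i.e., n ≥ 32).

```c
/* Check a Big-O witness: does T(n) <= c * f(n) hold for all n0 <= n <= limit? */
int witness_holds(long long (*T)(long long), long long (*f)(long long),
                  long long c, long long n0, long long limit) {
    for (long long n = n0; n <= limit; n++)
        if (T(n) > c * f(n)) return 0;
    return 1;
}

long long T1(long long n) { return 100 * n * n + 1000; }  /* 100n^2 + 1000 */
long long f1(long long n) { return n * n; }               /* n^2           */
```

Note that c = 100 never works, since 100n^2 + 1000 always exceeds 100n^2: picking the constants is part of the proof obligation.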
45 Example: Upper Bound

46 Using a Different Pair of Constants

47 Example: Lower Bound

48 Conventions of Order Notation
49 Upper/Lower vs. Worst/Best
- Worst case upper bound is f(n):
  - Guarantee that the run time is no more than c f(n)
- Best case upper bound is f(n):
  - If you are lucky, the run time is no more than c f(n)
- Worst case lower bound is g(n):
  - If you are unlucky, the run time is at least c g(n)
- Best case lower bound is g(n):
  - Guarantee that the run time is at least c g(n)
50 Analyzing Code
- Primitive operations
- Consecutive statements
- Function calls
- Conditionals
- Loops
- Recursive functions
51 Conditionals
- Conditional: if C then S1 else S2
- Suppose you are doing an O( ) analysis?
- Suppose you are doing an Ω( ) analysis?
52 Conditionals
- Conditional: if C then S1 else S2
- Suppose you are doing an O( ) analysis?
  - Time(C) + Max(Time(S1), Time(S2))
  - or Time(C) + Time(S1) + Time(S2)
  - or ...
- Suppose you are doing an Ω( ) analysis?
  - Time(C) + Min(Time(S1), Time(S2))
  - or Time(C)
  - or ...
53 Nested Loops

for i = 1 to n do
    for j = 1 to n do
        sum = sum + 1

54 Nested Loops

for i = 1 to n do
    for j = 1 to n do
        sum = sum + 1

55 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1

56 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1
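Transcribing both pseudocode loops into C and returning the final value of sum makes the iteration counts concrete; the function names below are my own.

```c
/* The independent nested loop: the inner loop runs n times for each i,
   so the body executes n * n times. */
long count_independent(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            sum = sum + 1;
    return sum;
}

/* The dependent nested loop: for each i the inner loop runs n - i + 1 times,
   so the body executes n + (n-1) + ... + 1 = n(n+1)/2 times. */
long count_dependent(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++)
        for (int j = i; j <= n; j++)
            sum = sum + 1;
    return sum;
}
```

Both counts are Θ(n^2); the dependence on i changes only the constant factor, from 1 to 1/2.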
57 Summary
- Formal definition of order of magnitude notation
- Proving upper and lower asymptotic bounds on a function
- Formal analysis of conditionals and simple loops
- Next:
  - Analyzing complex loops
  - Mathematical series
  - Analyzing recursive functions
58 Part One: Complexity, Continued

59 Today's Material
- Running time of nested dependent loops
- Mathematical series
- Formal analysis of linear search
- Formal analysis of binary search
- Solving recursive equations
- Stretchy arrays and the Stack ADT
- Amortized analysis
60 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1

61 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1
62 Arithmetic Series
- S(N) = 1 + 2 + ... + N
- Note that S(1) = 1, S(2) = 3, S(3) = 6, S(4) = 10, ...
- Hypothesis: S(N) = N(N+1)/2
- Prove by induction:
  - Base case: for N = 1, S(N) = 1(2)/2 = 1 ✓
  - Assume true for N = k
  - Suppose N = k+1:
    S(k+1) = S(k) + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k/2 + 1) = (k+1)(k+2)/2 ✓
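The closed form can also be checked by brute force against the definition of the series, which is a useful habit before investing in an inductive proof:

```c
/* Brute-force S(N) = 1 + 2 + ... + N, to compare with N(N+1)/2. */
long S(int N) {
    long s = 0;
    for (int k = 1; k <= N; k++) s += k;
    return s;
}
```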
63 Other Important Series
- Sum of squares
- Sum of exponents
- Geometric series
- Novel series:
  - Reduce to known series, or prove inductively
64 Nested Dependent Loops

for i = 1 to n do
    for j = i to n do
        sum = sum + 1
65 Linear Search Analysis

void lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)
            return;
}

- Best case, tight analysis:
- Worst case, tight analysis:
66 Iterated Linear Search Analysis

for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++)
    lfind(i, a, n);

- Easy worst-case upper bound:
- Worst-case tight analysis:

67 Iterated Linear Search Analysis

for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++)
    lfind(i, a, n);

- Easy worst-case upper bound:
- Worst-case tight analysis:
  - Just multiplying the worst case by n does not justify the answer, since each time lfind is called, a specific value of i is passed in
68 Analyzing Recursive Programs
- Express the running time T(n) as a recursive equation
- Solve the recursive equation
  - For an upper-bound analysis, you can optionally simplify the equation to something larger
  - For a lower-bound analysis, you can optionally simplify the equation to something smaller
69 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

What is the worst-case upper bound?

70 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

What is the worst-case upper bound? Trick question: as written, there is no base case, so if x is not in the array the recursion never terminates!
71 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (n < 1) return;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

Okay, let's prove it is O(log n).
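To test the corrected version, it helps to have it return the index of x (or -1 when absent) instead of void; that return value is my adaptation for testability, not part of the slides' code.

```c
/* Binary search with the base case added; returns the index of x in
   a[0..n-1], or -1 if x is absent. */
int bfind(int x, int a[], int n) {
    if (n < 1) return -1;               /* base case: empty range */
    int m = n / 2;
    if (x == a[m]) return m;
    if (x < a[m])
        return bfind(x, a, m);          /* search left half */
    int r = bfind(x, a + m + 1, n - m - 1);  /* search right half */
    return r < 0 ? -1 : r + m + 1;      /* shift index back to caller's frame */
}
```

The missing-element case now terminates because every recursive call shrinks n, and the base case catches n = 0.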
72 Binary Search

void bfind(int x, int a[], int n) {
    int m = n / 2;
    if (n < 1) return;
    if (x == a[m]) return;
    if (x < a[m])
        bfind(x, a, m);
    else
        bfind(x, &a[m+1], n-m-1);
}

- Introduce some constants:
  - b = time needed for the base case
  - c = time needed to get ready to do a recursive call
- The running time is thus:
73 Binary Search Analysis
- One sub-problem, half as large
- Equation: T(1) ≤ b
            T(n) ≤ T(n/2) + c   for n > 1
- Solution:
  T(n) ≤ T(n/2) + c               write equation
       ≤ T(n/4) + c + c           expand
       ≤ T(n/8) + c + c + c
       ≤ T(n/2^k) + kc            inductive leap
       ≤ T(1) + c log n           select k = log n
       ≤ b + c log n = O(log n)   simplify
74 Solving Recursive Equations by Repeated Substitution
- Somewhat informal, but intuitively clear and straightforward

75 Solving Recursive Equations by Telescoping
- Create a set of equations, take their sum

76 Solving Recursive Equations by Induction
- Repeated substitution and telescoping construct the solution
- If you know the closed-form solution, you can validate it by ordinary induction
- For the induction, you may want to increase n by a multiple (2n) rather than by n+1

77 Inductive Proof
78 Example: Sum of Integer Queue

sum_queue(Q) {
    if (Q.length() == 0) return 0;
    else return Q.dequeue() + sum_queue(Q);
}

- One subproblem
- Linear reduction in size (decrease by 1)
- Equation: T(0) = b
            T(n) = c + T(n-1)   for n > 0
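The slide's pseudocode uses method-call syntax; in plain C the same recursion looks like the sketch below, where the minimal array-backed Queue struct and the q_length/q_dequeue helpers are my own scaffolding, not an API from the course.

```c
/* Minimal array-backed queue: just enough to run the slide's recursion. */
typedef struct { int *items; int head; int len; } Queue;

int q_length(Queue *q)  { return q->len - q->head; }
int q_dequeue(Queue *q) { return q->items[q->head++]; }

/* T(0) = b, T(n) = c + T(n-1): one dequeue per element, hence O(n). */
int sum_queue(Queue *q) {
    if (q_length(q) == 0) return 0;
    return q_dequeue(q) + sum_queue(q);
}
```

Each call does constant work plus one recursive call on a queue one element shorter, which is exactly the T(n) = c + T(n-1) equation.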
79 Lower Bound Analysis: Recursive Fibonacci

int Fib(n) {
    if (n == 0 || n == 1) return 1;
    else return Fib(n-1) + Fib(n-2);
}

- Lower bound analysis: T(n)
- Instead of =, the equations will use ≥
  - T(n) ≥ (some expression)
- We will simplify the math by throwing out terms on the right-hand side
80 Analysis by Repeated Substitution
81 Learning from Analysis
- To avoid recursive calls:
  - store all basis values in a table
  - each time you calculate an answer, store it in the table
  - before performing any calculation for a value n:
    - check if a valid answer for n is in the table
    - if so, return it
- Memoization
  - a form of dynamic programming
- How much time does the memoized version take?
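The recipe above can be sketched directly for Fib; using 0 as the "not yet computed" marker works here because every Fibonacci value in this formulation is at least 1. The table size MAXN is an arbitrary bound of mine.

```c
/* Memoized Fibonacci: the table turns the exponential recursion linear,
   since each value n is computed at most once. */
#define MAXN 64
static long long memo[MAXN];   /* 0 marks "not yet computed" */

long long fib(int n) {
    if (n == 0 || n == 1) return 1;      /* basis values */
    if (memo[n] != 0) return memo[n];    /* answer already in the table */
    memo[n] = fib(n - 1) + fib(n - 2);   /* compute once, store */
    return memo[n];
}
```

Answering the slide's question: each of the n table entries is filled once with constant work, so the memoized version takes O(n) time instead of exponential.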
82 Amortized Analysis
- Consider any sequence of operations applied to a data structure
- Some operations may be fast, others slow
- Goal: show that the average time per operation is still good
83 Stack Abstract Data Type
- Stack operations:
  - push
  - pop
  - is_empty
- Stack property: if x is on the stack before y is pushed, then x will be popped after y is popped
- What is the biggest problem with an array implementation?
84 Stretchy Stack Implementation

int *data;
int maxsize;
int top;

Push(e) {
    if (top == maxsize) {
        temp = new int[2*maxsize];
        for (i = 0; i < maxsize; i++) temp[i] = data[i];
        data = temp;
        maxsize = 2*maxsize;
    }
    data[top] = e;
}

Best case Push: O( )    Worst case Push: O( )
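The slide's sketch uses C++ new and leaves the bookkeeping implicit; a self-contained C version with malloc/free (my substitution) looks like this. The struct wrapper, init, and pop are my additions for completeness.

```c
#include <stdlib.h>

/* Stretchy stack: double the array whenever it fills up. */
typedef struct { int *data; int maxsize; int top; } Stack;

void init(Stack *s) {
    s->maxsize = 1;
    s->top = 0;
    s->data = malloc(sizeof(int) * s->maxsize);
}

void push(Stack *s, int e) {
    if (s->top == s->maxsize) {                       /* full: stretch */
        int *temp = malloc(sizeof(int) * 2 * s->maxsize);
        for (int i = 0; i < s->maxsize; i++) temp[i] = s->data[i];
        free(s->data);
        s->data = temp;
        s->maxsize = 2 * s->maxsize;
    }
    s->data[s->top++] = e;                            /* always store e */
}

int pop(Stack *s) { return s->data[--s->top]; }
```

Note that e is stored even on a push that triggers a stretch; the stretch only makes room first.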
85 Stretchy Stack Amortized Analysis
- Consider a sequence of n operations: push(3); push(19); push(2); ...
- What is the max number of stretches?
- What is the total time?
  - Let's say a regular push takes time a, and stretching an array containing k elements takes time bk.
- Amortized time:
86 Stretchy Stack Amortized Analysis
- Consider a sequence of n operations: push(3); push(19); push(2); ...
- What is the max number of stretches? log n
- What is the total time?
  - Let's say a regular push takes time a, and stretching an array containing k elements takes time bk.
- Amortized time:
87 Geometric Series
88 Stretchy Stack Amortized Analysis
- Consider a sequence of n operations: push(3); push(19); push(2); ...
- What is the max number of stretches? log n
- What is the total time?
  - Let's say a regular push takes time a, and stretching an array containing k elements takes time bk.
- Amortized time:
89 Surprise
- In an asymptotic sense, there is no overhead in using stretchy arrays rather than regular arrays!
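The surprise can be confirmed by counting the copy work directly: simulate n pushes into a doubling array, starting (in this sketch of mine) from capacity 1, and tally how many elements get copied across all stretches. The copies form the geometric series 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1, which never exceeds 2n, so the amortized copy cost per push is O(1).

```c
/* Total element copies over n pushes into a doubling array. */
long copy_cost(int n) {
    long copies = 0;
    int maxsize = 1, top = 0;
    for (int i = 0; i < n; i++) {
        if (top == maxsize) {     /* stretch: copy all maxsize elements */
            copies += maxsize;
            maxsize *= 2;
        }
        top++;
    }
    return copies;
}
```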