Title: CS201: PART 1
1CS201 PART 1
- Data Structures & Algorithms
- S. Kondakci
2Analysis of Algorithms
[Figure: Input -> Algorithm -> Output]
- An algorithm is a step-by-step procedure for solving a problem in a finite amount of time.
- Theoretical Analysis of Algorithms
  - Uses a high-level description of the algorithm instead of an implementation
  - Characterizes running time as a function of the input size n.
  - Takes into account all possible inputs
  - Allows us to evaluate the speed of any design independent of its implementation.
3Program Efficiency
Program efficiency is a measure of the amount
of resources required to produce desired results.
Efficiency Aspects
1) What are the important resources we should try to optimize?
2) Where are the important efficiency gains to be made?
3) How important is efficiency in the first place?
4Efficiency Today
- User Efficiency. The amount of time and effort users will spend to learn how to use the program, how to prepare the data, how to configure and customize the program, and how to interpret and use the output.
- Maintenance Efficiency. The amount of time and effort the maintenance group will spend reading a program and its technical documentation in order to understand it well enough to make any necessary modifications.
- Algorithmic Complexity. The inherent efficiency of the method itself, regardless of which machine we run it on or how we code it.
- Coding Efficiency. This is the traditional efficiency measure. Here we are concerned with how much processor time and memory space a computer program requires to produce the desired results. Coding efficiency is the key step towards optimal usage of machine resources.
5Programmer's Duty
- Programmers should keep these goals in mind for every program:
- Correct, robust, and reliable.
- Easy to use for its intended end-user group.
- Easy to understand and easy to modify.
- Portable.
- Consistency in Input/Output behavior.
- User documentation.
6Optimization
- Optimization on CPU-Time: Consider a network security assessment tool as a real-time application. The application works like a security scanner protocol designed to audit, monitor, and correct all aspects of network security. Real-time processing of the intercepted network packets containing inspection information requires fast data processing. Besides, such a process should generate some auditing information.
- Optimization on Memory: Developing programs that do not fit into the memory space available on your systems is often quite demanding. Kernel-level processing of the network packets requires kernel memory optimization and a powerful and failsafe memory management capability.
- Providing Run-time Continuity: Extensive machine-level optimization is a major requirement for continuously running programs, such as the security scanner daemons.
- Reliability and Correctness: One of the inevitable efficiency requirements is absolute reliability. The second important efficiency factor is correctness. That is, your program should do exactly what it is supposed to do. Choosing and implementing a reliable inspection methodology should be done with precision.
- Optimization on Programmer's Time: How efficiently a programmer works depends on the choice of team policy and development tool selection.
7Coding Efficiency Unstructured Code
/Efficient Programming/S. Kondakci-1999
8Coding Efficiency Structured Code
9Protecting Against Run-time Errors
- Illegal pointer operations.
- Array subscript out of bounds.
- Endless loops may cause the stack to grow into the heap area.
- Presentational errors, such as network byte order, number conversions, division by zero, and undefined results, e.g., tan(90°) is undefined.
- Trying to write over the kernel's text area, or the data area.
- Referencing objects declared as prototype but not defined.
- Performing operations on a pointer pointing at NULL.
- Operating system weaknesses.
10Assertions
A general pitfall is making assumptions that turn out not to be justified. Most mistakes arise from simply misunderstanding the interaction between various pieces of code. The assertion rule states that you should always state, boldly and forcefully, that there are things you have not yet covered clearly enough. Any assumptions you make in writing your programs should be documented somewhere in the code itself, particularly if you know or expect the assumption to be false in other environments.
11Does the Machine Understand Your Assumptions?
Remember, those assumptions are yours. They should be presented to the machine by whatever means you can provide in your code. The machine will not be able to check your assumptions on its own. This is simply a matter of including explicit checks in your code, even for things that "cannot happen".

    if (p == NULL)
        panic("Driver routine: p is NULL\n");
    if (p->p_flags & BUSY)
        /* Safe to continue */
        <etcetera>

    ASSERT(p != NULL);
    if (p->p_flags & BUSY)
        /* Safe to continue */
        <etcetera>
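The explicit-check idea above can be sketched with the standard C assert macro. This is only an illustration: the struct proc type, its p_flags field, the BUSY bit value, and the function name is_busy are hypothetical stand-ins for whatever driver structures the real code uses.

```c
#include <assert.h>
#include <stddef.h>

#define BUSY 0x01u            /* hypothetical flag bit, chosen for illustration */

struct proc {                 /* hypothetical stand-in for a driver structure */
    unsigned p_flags;
};

/* Returns 1 if the BUSY bit is set, 0 otherwise.
   The assumption "p is never NULL" is stated to the machine with assert;
   compiling with -DNDEBUG removes the check from a production build. */
int is_busy(const struct proc *p)
{
    assert(p != NULL);
    return (p->p_flags & BUSY) != 0;
}
```

A caller that violates the assumption aborts at the assert with the file and line number, instead of failing later at an unrelated place.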
12Guidelines for the implementation
- Protect input parameters using call-by-value.
- Avoid global variables and functions with side effects.
- Make all temporary variables local to functions where they are used.
- Never halt or sleep in a function. Spawn a dedicated function if necessary.
- Avoid producing output within a function unless the sole purpose of the function is output.
- Where appropriate, use return values to return the status of function calls.
- Avoid confusing programming tricks.
- Always strive for simplicity and clarity. Never sacrifice clarity of expression for cleverness of expression.
- Keep your assertions, if any, local to your code.
- Never sacrifice clarity of expression for minor reductions in execution time.
13Debugging and Tracing
Making use of the preprocessor can allow you to
incorporate many debugging aids in your module,
for instance, the driver module. Later, in the
production version these debugging aids can be
removed.
    #ifdef DEBUG
    #define TRACE_OPEN  (debugging & 0x01)
    #define TRACE_CLOSE (debugging & 0x02)
    #define TRACE_READ  (debugging & 0x04)
    #define TRACE_WRITE (debugging & 0x08)
    int debugging = -1;   /* enable all trace output */
    #else
    #define TRACE_OPEN  0
    #define TRACE_CLOSE 0
    #define TRACE_READ  0
    #define TRACE_WRITE 0
    #endif
    ...
14Tracing Later in the Program
Later in the code, the trace information can be output in a manner similar to this:

    if (TRACE_READ)
        printf("Device driver read, Packet number (%d)\n", pack_no);
    <etcetera>
15Checking Programs With lint (Unix)
The lint utility is intended to verify some
facets of a C program, such as its potential
portability. lint derives from the idea of
picking the fluff out of a C program. It does
this by advising on C constructs (including
functions) and usage which might turn out to be
bugs, portability problems, inconsistent
declarations, bad function and argument types, or
dead code. See the manual section lint(1) for
further explanations.
16Now, Linting
    lint -hxa mytest.c
    (8) warning: loop not entered at top
    (8) warning: constant in conditional context
    variable unused in function
        (3) z in main
    implicitly declared to return int
        (10) printf
    declaration unused in block
        (5) duble
    function returns value, which is always ignored
        printf
17Test Coverage Analysis
Yet another tool for execution tracing and analysis of programs is tcov. It can be used to trace and analyze source code and to report test coverage. tcov does this by analysing the source code step by step. The extra code needed for this is generated by giving the -xa option to the compiler command, i.e.,

    cc -xa -o src src.c

The -xa option invokes a runtime recording mechanism that creates a .d file for every .c file. The .d file accumulates execution data for the corresponding source file. The tcov utility can then be run on the source file to generate statistics about the program. The following example source file, getmygid.c, is analysed as

    cc -xa -o getmygid getmygid.c
    tcov -a getmygid.c
    ls -l getmy???
    -rwxr-xr-x 1 staff 25120 Feb 11 12:07 getmygid
    -rw------- 1 staff   519 Sep  9  1994 getmygid.c
    -rw-r--r-- 1 staff     9 Feb 11 12:07 getmygid.d
    -rw-r--r-- 1 staff  1025 Feb 11 12:08 getmygid.tcov
18Example getmygid.c
    cat getmygid.c
    #include <stdio.h>
    char *msg = "I am sorry I cannot tell you everything";

    int gid,egid;
    int uid,euid, pid ,ppid, i;
    int main()
    {
        gid = getgid();
        if (gid > 0) printf("1- My GID is %d\n", gid);
        egid = getegid();
        if (egid > 0) printf("2- My EGID is %d\n", egid);
        uid = getuid();
        if (uid > 0) printf("3- My uid is %d\n", uid);
        euid = geteuid();
        if (euid > 0) printf("4- My Euid is %d\n", euid);
        pid = getpid();
        if (pid > 0) printf("5- My pid is %d\n", pid);
        ppid = getppid();
        if (ppid > 0) printf("6- My ppid is %d\n", ppid);
        prt_msg("We came to end!!!");
        return 0;
        prt_msg(msg);
    }
    prt_msg(char *mesg)
    {
        printf("%s \n", mesg);
    }
19Tcoving getmygid.c
    cat getmygid.tcov
           -> #include <stdio.h>
           -> char *msg = "I am sorry I cannot tell you everything";
           ->
           -> int gid,egid;
           -> int uid,euid, pid ,ppid, i;
           -> int main()
           -> {
         2 ->   gid = getgid();
         2 ->   if (gid > 0) printf("1- My GID is %d\n", gid);
         2 ->   egid = getegid();
         2 ->   if (egid > 0) printf("2- My EGID is %d\n", egid);
         2 ->   uid = getuid();
         2 ->   if (uid > 0) printf("3- My uid is %d\n", uid);
         2 ->   euid = geteuid();
         2 ->   if (euid > 0) printf("4- My Euid is %d\n", euid);
         2 ->   pid = getpid();
         2 ->   if (pid > 0) printf("5- My pid is %d\n", pid);
         2 ->   ppid = getppid();
         2 ->   if (ppid > 0) printf("6- My ppid is %d\n", ppid);
         2 ->   prt_msg("We came to end!!!");
         2 ->   return 0;
         2 ->   prt_msg(msg);
         2 -> }
         2 -> prt_msg(mesg)
         2 -> char *mesg;
         2 -> {
         2 ->   printf("%s \n", mesg);
         2 -> }
20Tcoving getmygid.c
As shown, tcov(1) generates an annotated listing
of the source file (getmygid.tcov), where each
line is prefixed with a number indicating the
count of execution of each statement on the line.
Finally, per-line and per-block statistics are shown.

    Top 10 Blocks
    Line   Count
       9       2
      11       2
      13       2
      15       2
      17       2
      19       2
      21       2
      29       2

         8  Basic blocks in this file
         8  Basic blocks executed
    100.00  Percent of the file executed
        16  Total basic block executions
      2.00  Average executions per basic block
21Have nice break!
22Analysis of Algorithms
[Figure: Input -> Algorithm -> Output]
An algorithm is a step-by-step procedure
for solving a problem in a finite amount of time.
23Running Time
- Most algorithms transform input objects into output objects.
- The running time of an algorithm typically grows with the input size.
- Average case time is often difficult to determine.
- We focus on the worst case running time.
  - Easier to analyze
  - Crucial to applications such as games, finance and robotics
24Experimental Studies
- Write a program implementing the algorithm
- Run the program with inputs of varying size and composition
- Use a function, like the built-in clock() function, to get an accurate measure of the actual running time
- Plot the results
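The measurement step above can be sketched in C with the standard clock() function from <time.h>. The function names workload and time_workload are hypothetical, and the summation loop is only a placeholder for the algorithm under study.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: sums the integers 0..n-1 so there is something to time. */
long workload(size_t n)
{
    volatile long sum = 0;    /* volatile keeps the loop from being optimized away */
    for (size_t i = 0; i < n; i++)
        sum += (long)i;
    return sum;
}

/* Returns the CPU time, in seconds, consumed by one call to workload(n). */
double time_workload(size_t n)
{
    clock_t start = clock();
    assert(start != (clock_t)-1);   /* clock() reports failure as (clock_t)-1 */
    (void)workload(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_workload for a range of n values and plotting the results gives the experimental curve. Note that clock() measures processor time, and its resolution may be too coarse for very small inputs.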
25Limitations of Experiments
- It is necessary to implement the algorithm, which may be difficult
- Results may not be indicative of the running time on other inputs not included in the experiment.
- In order to compare two algorithms, the same hardware and software environments must be used
26Theoretical Analysis
- Uses a high-level description of the algorithm instead of an implementation
- Characterizes running time as a function of the input size, n.
- Takes into account all possible inputs
- Allows us to evaluate the speed of an algorithm independent of the hardware/software environment
27Pseudocode
- High-level description of an algorithm
- More structured than English prose
- Less detailed than a program
- Preferred notation for describing algorithms
- Hides program design issues
28Pseudocode Details
- Control flow
  - if … then … [else …]
  - while … do …
  - repeat … until …
  - for … do …
  - Indentation replaces braces
- Method declaration
  - Algorithm method (arg [, arg…])
  - Input …
  - Output …
- Method/Function call
  - method (arg [, arg…])
- Return value
  - return expression
- Expressions
  - ← Assignment (like = in C)
  - = Equality testing (like == in C)
  - $n^2$ Superscripts and other mathematical formatting allowed
29The Random Access Machine (RAM) Model
- A CPU
- A potentially unbounded bank of memory cells,
each of which can hold an arbitrary number or
character
- Memory cells are numbered and accessing any cell
in memory takes unit time.
30Primitive Operations
- Basic computations performed by an algorithm
- Identifiable in pseudocode
- Largely independent from the programming language
- Exact definition not important
- Assumed to take a constant amount of time in the
RAM model
- Examples
- Evaluating an expression
- Assigning a value to a variable
- Indexing into an array
- Calling a method
- Returning from a method
31Counting Primitive Operations
- By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size

    Algorithm arrayMax(A, n)                  # operations
      currentMax ← A[0]                       2
      for i ← 1 to n - 1 do                   2 + n
        if A[i] > currentMax then             2(n - 1)
          currentMax ← A[i]                   2(n - 1)
        { increment counter i }               2(n - 1)
      return currentMax                       1
                                      Total   7n - 1
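For comparison with the pseudocode above, here is a minimal C sketch of arrayMax. The name array_max and the precondition check are illustrative choices, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest element of A[0..n-1]; mirrors the arrayMax pseudocode. */
int array_max(const int A[], size_t n)
{
    assert(A != NULL && n >= 1);        /* the pseudocode reads A[0], so n >= 1 */
    int current_max = A[0];
    for (size_t i = 1; i < n; i++) {    /* i runs from 1 to n - 1 */
        if (A[i] > current_max)
            current_max = A[i];
    }
    return current_max;
}
```

The loop body runs n - 1 times regardless of the input, which is where the linear operation count comes from.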
32Estimating Running Time
- Algorithm arrayMax executes $7n - 1$ primitive operations in the worst case. Define
  - a = Time taken by the fastest primitive operation
  - b = Time taken by the slowest primitive operation
- Let T(n) be the worst-case time of arrayMax. Then $a(7n - 1) \le T(n) \le b(7n - 1)$
- Hence, the running time T(n) is bounded by two linear functions
33Growth Rate of Running Time
- Changing the hardware/software environment
- Affects T(n) by a constant factor, but
- Does not alter the growth rate of T(n)
- The linear growth rate of the running time T(n)
is an intrinsic property of algorithm arrayMax
34Growth Rates
- Growth rates of functions
  - Linear $\approx n$
  - Quadratic $\approx n^2$
  - Cubic $\approx n^3$
- In a log-log chart, the slope of the line corresponds to the growth rate of the function
35Constant Factors
- The growth rate is not affected by
- constant factors or
- lower-order terms
- Examples
  - $10^2 n + 10^5$ is a linear function
  - $10^5 n^2 + 10^8 n$ is a quadratic function
36Big-Oh Notation
- Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants $c$ and $n_0$ such that
  - $f(n) \le c\,g(n)$ for $n \ge n_0$
- Example: $2n + 10$ is $O(n)$
  - $2n + 10 \le cn$
  - $(c - 2)n \ge 10$
  - $n \ge 10/(c - 2)$
  - Pick $c = 3$ and $n_0 = 10$
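The choice of constants in the example above can be sanity-checked numerically. The sketch below tests the inequality 2n + 10 <= c*n over a finite range; the function name bound_holds and the limit parameter are hypothetical, and a finite check illustrates the definition without proving it.

```c
#include <assert.h>

/* Returns 1 if 2n + 10 <= c*n holds for every n in [n0, limit], else 0. */
int bound_holds(long c, long n0, long limit)
{
    assert(n0 >= 1 && limit >= n0);
    for (long n = n0; n <= limit; n++) {
        if (2 * n + 10 > c * n)
            return 0;         /* found a counterexample */
    }
    return 1;
}
```

With c = 3 the bound holds from n0 = 10 onwards but fails at n = 9, and with c = 2 it never holds, matching the algebra.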
37Big-Oh Example
- Example: the function $n^2$ is not $O(n)$
  - $n^2 \le cn$
  - $n \le c$
  - The above inequality cannot be satisfied since c must be a constant
38More Big-Oh Examples
- $7n - 2$ is $O(n)$
  - need $c > 0$ and $n_0 \ge 1$ such that $7n - 2 \le cn$ for $n \ge n_0$
  - this is true for $c = 7$ and $n_0 = 1$
- $3n^3 + 20n^2 + 5$ is $O(n^3)$
  - need $c > 0$ and $n_0 \ge 1$ such that $3n^3 + 20n^2 + 5 \le cn^3$ for $n \ge n_0$
  - this is true for $c = 4$ and $n_0 = 21$
- $3 \log n + \log \log n$ is $O(\log n)$
  - need $c > 0$ and $n_0 \ge 1$ such that $3 \log n + \log \log n \le c \log n$ for $n \ge n_0$
  - this is true for $c = 4$ and $n_0 = 2$
39Big-Oh and Growth Rate
- The big-Oh notation gives an upper bound on the growth rate of a function
- The statement "f(n) is O(g(n))" means that the growth rate of f(n) is no more than the growth rate of g(n)
- We can use the big-Oh notation to rank functions according to their growth rate
40Big-Oh Rules
- If f(n) is a polynomial of degree d, then f(n) is $O(n^d)$, i.e.,
  - Drop lower-order terms
  - Drop constant factors
- Use the smallest possible class of functions
  - Say "$2n$ is $O(n)$" instead of "$2n$ is $O(n^2)$"
- Use the simplest expression of the class
  - Say "$3n + 5$ is $O(n)$" instead of "$3n + 5$ is $O(3n)$"
41Asymptotic Algorithm Analysis
- The asymptotic analysis of an algorithm determines the running time in big-Oh notation
- To perform the asymptotic analysis
  - We find the worst-case number of primitive operations executed as a function of the input size
  - We express this function with big-Oh notation
- Example
  - We determine that algorithm arrayMax executes at most $7n - 1$ primitive operations
  - We say that algorithm arrayMax "runs in O(n) time"
- Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations
42Computing Prefix Averages
- We further illustrate asymptotic analysis with two algorithms for prefix averages
- The i-th prefix average of an array X is the average of the first (i + 1) elements of X
  - $A[i] = (X[0] + X[1] + \dots + X[i])/(i + 1)$
43Prefix Averages (Quadratic)
- The following algorithm computes prefix averages in quadratic time by applying the definition

    Algorithm prefixAverages1(X, n)
      Input array X of n integers
      Output array A of prefix averages of X     # operations
      A ← new array of n integers                n
      for i ← 0 to n - 1 do                      n
        s ← X[0]                                 n
        for j ← 1 to i do                        1 + 2 + … + (n - 1)
          s ← s + X[j]                           1 + 2 + … + (n - 1)
        A[i] ← s / (i + 1)                       n
      return A                                   1
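A possible C rendering of prefixAverages1 is sketched below. It departs from the pseudocode in two illustrative ways: the caller supplies the output array, and the averages are stored as double so fractional results are not truncated. The name prefix_averages1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Quadratic-time prefix averages: A[i] = (X[0] + ... + X[i]) / (i + 1). */
void prefix_averages1(const int X[], double A[], size_t n)
{
    assert(X != NULL && A != NULL);
    for (size_t i = 0; i < n; i++) {
        double s = X[0];
        for (size_t j = 1; j <= i; j++)   /* re-adds X[1..i] for every i */
            s += X[j];
        A[i] = s / (double)(i + 1);
    }
}
```

The inner loop runs i times for each i, so the total work is 1 + 2 + … + (n - 1) additions.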
44Arithmetic Progression
- The running time of prefixAverages1 is $O(1 + 2 + \dots + n)$
- The sum of the first n integers is $n(n + 1)/2$
  - There is a simple visual proof of this fact
- Thus, algorithm prefixAverages1 runs in $O(n^2)$ time
45Prefix Averages (Linear)
- The following algorithm computes prefix averages in linear time by keeping a running sum

    Algorithm prefixAverages2(X, n)
      Input array X of n integers
      Output array A of prefix averages of X     # operations
      A ← new array of n integers                n
      s ← 0                                      1
      for i ← 0 to n - 1 do                      n
        s ← s + X[i]                             n
        A[i] ← s / (i + 1)                       n
      return A                                   1

- Algorithm prefixAverages2 runs in O(n) time
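The running-sum version can be sketched in C in the same style. As before, the caller-supplied output array, the double element type, and the name prefix_averages2 are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Linear-time prefix averages: keeps a running sum instead of re-adding. */
void prefix_averages2(const int X[], double A[], size_t n)
{
    assert(X != NULL && A != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += X[i];                      /* s now holds X[0] + ... + X[i] */
        A[i] = s / (double)(i + 1);
    }
}
```

Each element is touched exactly once, which is the source of the O(n) bound.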
46Computing Spans
- We show how to use a stack as an auxiliary data structure in an algorithm
- Given an array X, the span S[i] of X[i] is the maximum number of consecutive elements X[j] immediately preceding X[i] and such that $X[j] \le X[i]$
- Spans have applications to financial analysis
  - E.g., stock at 52-week high
[Figure: example array X and its spans S]
47Quadratic Algorithm
    Algorithm spans1(X, n)
      Input array X of n integers
      Output array S of spans of X               # operations
      S ← new array of n integers                n
      for i ← 0 to n - 1 do                      n
        s ← 1                                    n
        while s ≤ i and X[i - s] ≤ X[i] do       1 + 2 + … + (n - 1)
          s ← s + 1                              1 + 2 + … + (n - 1)
        S[i] ← s                                 n
      return S                                   1

- Algorithm spans1 runs in $O(n^2)$ time
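A direct C translation of the quadratic spans algorithm might look like the sketch below. The function name spans1 follows the pseudocode. The caller-supplied output array and the size_t element type are illustrative choices.

```c
#include <assert.h>
#include <stddef.h>

/* Quadratic-time spans: S[i] counts X[i] itself plus the consecutive
   elements immediately before it that are <= X[i]. */
void spans1(const int X[], size_t S[], size_t n)
{
    assert(X != NULL && S != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t s = 1;
        while (s <= i && X[i - s] <= X[i])   /* s <= i keeps i - s in range */
            s++;
        S[i] = s;
    }
}
```

For X = {6, 3, 4, 5, 2} this yields S = {1, 1, 2, 3, 1}.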
48Have nice break!
49Recursion
Recursion: a function calls itself an unknown number of times. We call this a recursive call.

    for (i = 1; i <= n-1; i++)
        sum = sum + 1;

    int sum(int n)
    {
        if (n <= 1)
            return 1;
        else
            return (n + sum(n-1));
    }
50Recursive function
    int f( int x )
    {
        if( x == 0 )
            return 0;
        else
            return 2 * f( x - 1 ) + x * x;
    }
51Recursion
Calculate the factorial (n!) of a positive integer: $n! = n(n-1)(n-2)\cdots(n-(n-1))$, $0! = 1! = 1$

    int factorial(int n)
    {
        if (n <= 1)
            return 1;
        else
            return (n * factorial(n-1));
    }
52Fibonacci numbers, Bad algorithm for n > 40!
    long fib(int n)
    {
        if (n <= 1)
            return 1;
        else
            return fib(n-1) + fib(n-2);
    }
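The recursive version recomputes the same values over and over, so the number of calls grows exponentially with n. One common alternative, sketched here under the assumption that the base cases are fib(0) = fib(1) = 1 as in the recursive code above, is a simple loop. The name fib_iter is hypothetical.

```c
#include <assert.h>

/* Iterative Fibonacci with base cases fib(0) = fib(1) = 1.
   Uses n - 1 additions instead of an exponential number of calls. */
long fib_iter(int n)
{
    assert(n >= 0);
    long prev = 1, curr = 1;            /* fib(0) and fib(1) */
    for (int i = 2; i <= n; i++) {
        long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

The loop keeps only the last two values, so both time and extra space stay small even for n > 40.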
53Algorithm IterativeLinearSum(A,n)
    Algorithm IterativeLinearSum(A, n)
      Input An integer array A and an integer n (size)
      Output The sum of the first n integers
      if n = 1 then
        return A[0]
      else
        sum ← 0
        while n > 0 do
          sum ← sum + A[n - 1]
          n ← n - 1
        return sum
54Algorithm LinearSum(A,n)
    Algorithm LinearSum(A, n)
      Input An integer array A and an integer n (size)
      Output The sum of the first n integers
      if n = 1 then
        return A[0]
      else
        return LinearSum(A, n - 1) + A[n - 1]
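The iterative and recursive sums can be compared side by side in C. This is a sketch with assumed names (iterative_linear_sum, linear_sum). The iterative version initializes its accumulator and indexes A[n - 1] so that it stays inside the array.

```c
#include <assert.h>
#include <stddef.h>

/* Iterative sum of A[0..n-1]; returns 0 for an empty array. */
int iterative_linear_sum(const int A[], size_t n)
{
    int sum = 0;
    while (n > 0) {
        sum += A[n - 1];
        n--;
    }
    return sum;
}

/* Linear recursion: one recursive call per element, n >= 1 required. */
int linear_sum(const int A[], size_t n)
{
    assert(A != NULL && n >= 1);
    if (n == 1)
        return A[0];
    return linear_sum(A, n - 1) + A[n - 1];
}
```

Both perform n - 1 or n additions. The recursive version also uses stack space proportional to n.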
55Iterative Approach Algorithm IterativeReverseArray(A,i,n)
    Algorithm IterativeReverseArray(A, i, n)
      Input An integer array A and integers i and n
      Output The reversal of n integers in A starting at index i
      while n > 1 do
        swap A[i] and A[i + n - 1]
        i ← i + 1
        n ← n - 2
      return
56Algorithm ReverseArray(A,i,n)
    Algorithm ReverseArray(A, i, n)
      Input An integer array A and integers i and n
      Output The reversal of n integers in A starting at index i
      if n > 1 then
        swap A[i] and A[i + n - 1]
        call ReverseArray(A, i + 1, n - 2)
      return
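A minimal C sketch of the recursive reversal follows. The name reverse_array and the use of size_t for indices are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Reverses the n elements of A starting at index i, in place. */
void reverse_array(int A[], size_t i, size_t n)
{
    assert(A != NULL);
    if (n > 1) {
        int tmp = A[i];                 /* swap the two outermost elements */
        A[i] = A[i + n - 1];
        A[i + n - 1] = tmp;
        reverse_array(A, i + 1, n - 2); /* then reverse what lies between */
    }
}
```

Because the recursive call is the last action, the same logic converts directly into the while loop of the iterative version.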
57Higher-Order Recursion
Making more than one recursive call at a time.

    Algorithm BinarySum(A, i, n)
      Input An integer array A and integers i and n
      Output The sum of n integers in A starting at index i
      if n = 1 then
        return A[i]
      return BinarySum(A, i, ⌈n/2⌉) + BinarySum(A, i + ⌈n/2⌉, ⌊n/2⌋)
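Binary recursion can be sketched in C as below. To handle sizes that are not powers of two, this version gives the first half the rounded-up share and the second half the remainder. That split and the name binary_sum are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the n elements of A starting at index i using two recursive calls. */
int binary_sum(const int A[], size_t i, size_t n)
{
    assert(A != NULL && n >= 1);
    if (n == 1)
        return A[i];
    size_t half = (n + 1) / 2;          /* ceil(n / 2), so n - half >= 1 */
    return binary_sum(A, i, half) + binary_sum(A, i + half, n - half);
}
```

The recursion depth is about log2(n), although the total number of additions is still n - 1.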
58Kth Fibonacci Numbers
59kth Fibonacci Numbers
Linear recursion

    Algorithm LinearFibonacci(k)
      Input An integer k
      Output A pair $(F_k, F_{k-1})$ such that $F_k$ is the kth Fibonacci number and $F_{k-1}$ is the (k-1)st Fibonacci number
      if k ≤ 1 then
        return (k, 0)
      else
        (i, j) ← LinearFibonacci(k - 1)
        return (i + j, i)
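Since C functions return a single value, the pair can be carried in a small struct. In the sketch below the struct fib_pair type and the name linear_fibonacci are hypothetical. The base case (k, 0) follows the pseudocode above.

```c
#include <assert.h>

struct fib_pair {
    long fk;    /* the kth Fibonacci number */
    long fk1;   /* the (k-1)st Fibonacci number */
};

/* Linear recursion: one recursive call per level, k levels in total. */
struct fib_pair linear_fibonacci(int k)
{
    assert(k >= 0);
    if (k <= 1) {
        struct fib_pair base = { k, 0 };
        return base;
    }
    struct fib_pair p = linear_fibonacci(k - 1);
    struct fib_pair r = { p.fk + p.fk1, p.fk };
    return r;
}
```

Returning both values lets each level reuse its predecessor's work, which is what brings the cost down from exponential to linear in k.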
60kth Fibonacci Numbers
Binary recursion

    Algorithm BinaryFib(k)
      Input An integer k
      Output The kth Fibonacci number
      if k ≤ 1 then
        return k
      else
        return BinaryFib(k - 1) + BinaryFib(k - 2)
61Math you need to Review
- Summations
- Logarithms and Exponents
- Proof techniques
- Basic probability
- properties of logarithms
  - $\log_b(xy) = \log_b x + \log_b y$
  - $\log_b(x/y) = \log_b x - \log_b y$
  - $\log_b x^a = a \log_b x$
  - $\log_b a = \log_x a / \log_x b$
- properties of exponentials
  - $a^{(b+c)} = a^b a^c$
  - $a^{bc} = (a^b)^c$
  - $a^b / a^c = a^{(b-c)}$
  - $b = a^{\log_a b}$
  - $b^c = a^{c \log_a b}$
62Relatives of Big-Oh
- big-Omega
  - f(n) is $\Omega(g(n))$ if there is a constant $c > 0$ and an integer constant $n_0 \ge 1$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$
- big-Theta
  - f(n) is $\Theta(g(n))$ if there are constants $c' > 0$ and $c'' > 0$ and an integer constant $n_0 \ge 1$ such that $c'\,g(n) \le f(n) \le c''\,g(n)$ for $n \ge n_0$
- little-oh
  - f(n) is $o(g(n))$ if, for any constant $c > 0$, there is an integer constant $n_0 \ge 0$ such that $f(n) \le c\,g(n)$ for $n \ge n_0$
- little-omega
  - f(n) is $\omega(g(n))$ if, for any constant $c > 0$, there is an integer constant $n_0 \ge 0$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$
63Intuition for Asymptotic Notation
- Big-Oh
  - f(n) is $O(g(n))$ if f(n) is asymptotically less than or equal to g(n)
- big-Omega
  - f(n) is $\Omega(g(n))$ if f(n) is asymptotically greater than or equal to g(n)
- big-Theta
  - f(n) is $\Theta(g(n))$ if f(n) is asymptotically equal to g(n)
- little-oh
  - f(n) is $o(g(n))$ if f(n) is asymptotically strictly less than g(n)
- little-omega
  - f(n) is $\omega(g(n))$ if f(n) is asymptotically strictly greater than g(n)
64Example Uses of the Relatives of Big-Oh
- $5n^2$ is $\Omega(n^2)$
  - f(n) is $\Omega(g(n))$ if there is a constant $c > 0$ and an integer constant $n_0 \ge 1$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$
  - let $c = 5$ and $n_0 = 1$
- $5n^2$ is $\Omega(n)$
  - f(n) is $\Omega(g(n))$ if there is a constant $c > 0$ and an integer constant $n_0 \ge 1$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$
  - let $c = 1$ and $n_0 = 1$
- $5n^2$ is $\omega(n)$
  - f(n) is $\omega(g(n))$ if, for any constant $c > 0$, there is an integer constant $n_0 \ge 0$ such that $f(n) \ge c\,g(n)$ for $n \ge n_0$
  - need $5n_0^2 \ge c\,n_0$; given c, the $n_0$ that satisfies this is $n_0 \ge c/5 \ge 0$