Title: WRITING EFFICIENT PROGRAMS
1. WRITING EFFICIENT PROGRAMS
- Kemal Oflazer
- Modified and expanded by B. Yanikoglu
2. Overview
- Part I (covered now)
- Algorithmic Efficiency versus Program Efficiency
- Common Patterns/Tricks to Speed up Your Code
- Part II (covered in later parts of the course)
- How to Measure Time Taken in a Particular Code Piece: Profiling
- Advanced Techniques to Speed up Your Code
3. Efficient Programs
- Algorithmic versus Programming Efficiency
- Your programs use the best algorithm you know for the task (e.g., what you learn in Data Structures)
- You can now think about improving the efficiency of your code
4. Obvious Methods
- Compilers are VERY good at optimizing your code
- They analyze your code to death and do nearly everything that is mechanically possible
- e.g., the GNU g++ compiler on Linux/Cygwin:
- g++ -O5 -o myprog myprog.c
- This can improve your performance by 10% to 300%
5. However...
- There are improvements that you can make that a compiler cannot (vice versa too, but that's not the point)
- You should make your program as efficient as possible
- Especially if it will be used often or if time is an issue
- Start observing where your program is slow
- To see where you should concentrate to get the best return
6. Writing Efficient Code
- Identify sources of inefficiency
- redundant computation
- overhead
- procedure calls
- loops
- Improve efficiency
- General idea is to trade space for time
7. 1) Remove Common Subexpressions
- Do not compute the same thing over and over again!
- Most compilers are good at recognizing most of these and handling them.
8. 1) Remove Common Subexpressions

int start = 10 + f.length();
int endPos = 15 + f.length();
9. 1) Remove Common Subexpressions

int length = f.length();
int start = 10 + length;
int endPos = 15 + length;
10. 2) Move Invariant Code out of the Loop
- Do not compute something unnecessarily in a loop over and over again!
- Compilers can detect most of these!
11. 2) Move Invariant Code out of the Loop

void f(double d)
{
    for (int i = 0; i < 100; i++)
        plot(i, i * sin(d));
}
12. 2) Move Invariant Code out of the Loop

void f(double d)
{
    double dsin = sin(d);
    for (int i = 0; i < 100; i++)
        plot(i, i * dsin);
}
13. 3) Precompute Values
- An example of the space-time trade-off
- If you compute something over and over again
- You should compute it once (and preferably at compile time!) and store the results
- Just access the results at run time.
14. 3) Precompute Values

double func(int quad)
{
    static double pi = 3.1416;
    if (quad == 1)       return sin(pi/180);
    else if (quad == 2)  return cos(3*pi/4);
    // ... cases 3 through 9 ...
    else if (quad == 10) return cos(pi/4);
    else                 return 0;
}
153) Precompute Values
constant double pi 3.1416 static double
trigo sin(pi/180), cos(3/4pi/180),
cos(1/4pi) //return some precomputed
trigon. function of quad double func(int quad)
if (quad lt 10) return
trigo(quad-1) else return 0
16. 4) Reorder Logical Tests

if (sqrt(x) < 10 && y < 0)
if (y < 0 && sqrt(x) < 10)
17. 5) Use Sentinels: Avoid Unnecessary Tests

// Linear Search
const int SIZE = 200;   // max size of the array
int a[SIZE];
... // array a is filled ...
int pos = 0;
while (pos < SIZE && a[pos] != searchValue)
    pos++;
return pos;
18. Use Sentinels: Avoid Unnecessary Tests
- General Idea
- Put the value you are searching for at the end of the list
- If you find it before the end, fine!
- If you find it at the end, then it was not in the list!
19. Use Sentinels: Avoid Unnecessary Tests

// Linear Search
const int SIZE = 201;
int a[SIZE];
// array a is filled (first 200 elements)
a[SIZE-1] = searchValue;   // sentinel at the end
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
20. Use Sentinels: Avoid Unnecessary Tests

const int SIZE = 200;
int *a = new int[SIZE + 1];
a[SIZE] = searchValue;   // sentinel
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;

You may ignore the version with dynamic memory allocation for now.
21. Use Sentinels: Avoid Unnecessary Tests
- After:
- You save about 200 comparisons

const int SIZE = 200;
int *a = new int[SIZE + 1];
a[SIZE] = searchValue;   // sentinel
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
22. 6) Use Algebra!
- Reorganize your tests and expressions so that you exploit algebraic identities
- This is done less often than other tricks, but if you have a mathematical problem at hand, it may be applicable.
- This is much harder for compilers!
23. 6) Use Algebra!

if (a > Math.sqrt(b)) return a*a + 3*a + 2;
24. 6) Use Algebra!

if (a > Math.sqrt(b)) return a*a + 3*a + 2;
if (a*a > b) return (a+1)*(a+2);
25. 7) Unroll Short Loops
- Avoid loop-control overhead by repeating the body of the loop.
- This is not very common; I don't think it's worth the effort or the loss in readability, but it can be done if the loop is executed very frequently.
26. 7) Unroll Short Loops

for (int i = j; i < j+3; i++)
    sum += q[i] - i*7;
27. 7) Unroll Short Loops

int i = j;
sum += q[i] - i*7; i++;
sum += q[i] - i*7; i++;
sum += q[i] - i*7;
28. 8) Initialize Once, Use Many Times

float f()
{
    double value = sin(0.25);
    ...
}
29. Initialize Once, Use Many Times

double defaultValue = sin(0.25);

float f()
{
    double value = defaultValue;
    ...
}
30. Initialize Once, Use Many Times

float f()
{
    static double defaultValue = sin(0.25);
    ...
}
31. Static Variables, Inline Functions, Macros
32. Static Variables
- Variables defined local to a function disappear at the end of the function scope. When you call the function again, storage for the variables is created anew and the values are re-initialized.
- If you want a value to persist throughout the life of a program, you can define a function's local variable to be static and give it an initial value (see the sketch below).
- Initialization is performed only the first time the function is called, and the data retains its value between function calls.
- This way, a function can remember some piece of information between function calls.
- Static versus global variables
- The beauty of a static variable is that it is unavailable outside the scope of the function, so it can't be inadvertently changed.
- Note: The static keyword has several distinct meanings in programming; static variables are one of them.
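
A minimal sketch of a static local variable in action (the function and its use are illustrative, not from the slides):

#include <iostream>

// Remembers how many times it has been called.
int countCalls()
{
    static int count = 0;   // initialized only on the first call
    count++;                // value is retained between calls
    return count;
}

int main()
{
    countCalls();
    countCalls();
    std::cout << countCalls() << std::endl;   // prints 3
    return 0;
}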
33. Inline Functions
- If they contain just a few simple lines of code and no loops or the like, C++ functions can be declared inline.
- The compiler inserts the inline code everywhere the function is used
- The program will be a bit bigger
- This avoids function-call overhead (stack handling)
34Inline functions
- include ltiostreamgt
- include ltcmathgt
- using namespace std
- inline double hypothenuse (double a, double b)
-
- return sqrt (a a b b)
-
- int main ()
- double k 6, m 9
- cout ltlt hypothenuse (k, m) ltlt endl
- //cout ltlt sqrt (k k m m) ltlt endl //The
compiled code looks like this - return 0
-
35. Macros

#define max(a,b) (a > b ? a : b)

- Inline functions are similar to macros because both are expanded at compile time
- but macros are expanded by the preprocessor, while inline functions are parsed by the compiler.
- There are several important differences:
- Inline functions follow all the protocols of type safety enforced on normal functions.
- Inline functions are specified using the same syntax as any other function, except that they include the inline keyword in the function declaration.
- Expressions passed as arguments to inline functions are evaluated once. In some cases, expressions passed as arguments to macros can be evaluated more than once (see the sketch below).
- You cannot debug macros, but you can debug inline functions.
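
A small sketch of the double-evaluation pitfall (illustrative, not from the slides): with the max macro above, an argument with a side effect is expanded textually and evaluated twice.

#include <iostream>
#define max(a,b) (a > b ? a : b)

inline int maxInline(int a, int b) { return a > b ? a : b; }

int main()
{
    int i = 3, j = 1;
    int m = max(i++, j);       // expands to (i++ > j ? i++ : j); i++ runs twice
    std::cout << m << " " << i << std::endl;   // prints 4 5
    i = 3;
    m = maxInline(i++, j);     // the argument is evaluated exactly once
    std::cout << m << " " << i << std::endl;   // prints 3 4
    return 0;
}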
36. What to Speed Up
- Try speeding up the code that is responsible for most of the action
- Hot spots!
37. Hints
- Try speeding up the code that is responsible for most of the action
- Hot spots!
- Speed-up = Time-Before / Time-After
- Wasting speed-up efforts on infrequently executed code has no return on investment
38. From Jon Louis Bentley's "Writing Efficient Programs"
- http://www.crowl.org/Lawrence/programming/Bentley82.html
39. Fundamental Rules
- These rules underlie the basic principles detailed in the next slides.
- Code Simplification
- Most fast programs are simple. Therefore, keep code simple to make it faster.
- Problem Simplification
- To increase the efficiency of a program, simplify the problem it solves.
- Relentless Suspicion
- Question the necessity of each instruction in a time-critical piece of code and each field in a space-critical data structure.
- Early Binding
- Do work now just once, in the hope of avoiding doing it many times later.
40. Space for Time Rules
- Introducing redundant information can decrease run time at the cost of increasing the space used.
- Data Structure Augmentation
- The time required for common operations on data can often be reduced by augmenting the structure with extra information, or by changing the information within the structure so that it can be accessed more easily.
- Store Precomputed Results
- The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are then handled by table lookup rather than by computing the function.
- ...
41. Space for Time Rules (cont.)
- Introducing redundant information can decrease run time at the cost of increasing the space used...
- Caching
- Data that is accessed most often should be the cheapest to access. (But note: caching can "backfire" and increase the run time of a program if locality is not present in the underlying data.)
- Lazy Evaluation
- The strategy of never evaluating an item until it is needed (see the sketch below).
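
A minimal caching sketch (illustrative, not from the slides; the function expensive is hypothetical): the result is computed lazily on the first request for a given argument and served from a lookup table afterwards.

#include <cmath>
#include <map>

// Hypothetical expensive function with a cache: space traded for time.
double expensive(double x)
{
    static std::map<double, double> cache;
    auto it = cache.find(x);
    if (it != cache.end())
        return it->second;                // hit: cheap table lookup
    double result = exp(-x) * sin(x);     // miss: computed only once per x
    cache[x] = result;
    return result;
}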
42. Loop Rules
- Hot spots in most programs involve loops (roughly in order of importance):
- Move Code Out of Loops
- Instead of performing a certain computation in each iteration of a loop, it is better to perform it only once, outside the loop.
- Loop Fusion
- If two nearby loops operate on the same set of elements, then combine their operational parts and use only one set of loop control operations (see the sketch below).
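
A minimal loop-fusion sketch (illustrative, not from the slides; assume a is a filled array of n doubles):

double sum = 0, sumSq = 0;

// Before: two loops over the same elements, two sets of loop control
for (int i = 0; i < n; i++) sum += a[i];
for (int i = 0; i < n; i++) sumSq += a[i] * a[i];

// After: the bodies are combined under one set of loop-control operations
for (int i = 0; i < n; i++) {
    sum   += a[i];
    sumSq += a[i] * a[i];
}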
43. Loop Rules
- Combining Tests
- An efficient inner loop should contain as few tests as possible, and preferably only one. The programmer should therefore try to simulate some of the exit conditions of the loop by other exit conditions. Sentinels are a common application of this rule.
- Reordering Tests
- Logical tests should be arranged such that inexpensive and often-successful tests precede expensive and rarely-successful tests.
- Loop Unrolling
- A large cost of some short loops is in modifying the loop indices. That cost can often be reduced by unrolling the loop.
44. Procedure Rules
- Define very small (usually one-line) functions as inline
- This replicates the function body everywhere the function is called, and thus avoids procedure-call overhead
- ...
- More at http://www.crowl.org/Lawrence/programming/Bentley82.html
45. Efficiency: Part II
- We will cover the rest towards the end of the course
46. Static Variables, Inline Functions, Macros
47. Static Variables
- Static data refers to global or static variables whose storage space is determined at compile time.

int int_array[100];

int main()
{
    static float float_array[100];
    double double_array[100];
    char *pchar;
    pchar = (char *)malloc(100);
    ...
    return (0);
}
48. Static Variables
- Variables defined local to a function disappear at the end of the function scope.
- When you call the function again, storage for the variables is created anew and the values are re-initialized.
- If you want a value to persist throughout the life of a program, you can define a function's local variable to be static and give it an initial value.
- Initialization is performed only the first time the function is called, and the data retains its value between function calls.
- This way, a function can remember some piece of information between function calls.
- Static versus global variables
- The beauty of a static variable is that it is unavailable outside the scope of the function, so it can't be inadvertently changed.
- Note: The static keyword has several distinct meanings in programming; static variables are one of them.
49. Stack, Heap
- At runtime, the data segment for a program is broken down into three constituent parts:
- static, stack, and heap data.
- Static: global or static variables
- Stack data
- variables that exist within the scope of a function (memory allocated at runtime for local (automatic) variables)
- e.g., double_array in the above example.
- Heap data
- data that is dynamically allocated at runtime (e.g., pchar above).
- remains in memory until it is either freed explicitly or the program terminates.
50. Program Instrumentation
- Understand how your program behaves
- Time
- Actual Time
- Number of operations
51. Measuring program behaviour

record starting time
// your code
record end time
compute elapsed time
52.
- We are going to skip the details of how you can actually measure the time taken, as we are emphasizing the efficiency concept.
- You can read the second part of this lecture now, or wait a few months until the end of CS204, when we will cover these in more detail.
53. Measuring program behaviour
- Measure Time
- Some issues
- Timer resolution is too low
- CPUs are fast
- Your programs seem to take 0 time (?)
54. Measuring program behaviour

record starting time
iterate N times
    // your code
record end time
compute elapsed time
divide elapsed time by N
55. Measuring program behaviour

record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
56. Measuring program behaviour
- Measure Overhead
- Subtract overhead from actual time

record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
57. Actual Code

#include <sys/types.h>
#include <sys/timeb.h>

int main(int argc, char *argv[])
{
    struct _timeb tbeg, tend;
    int iterations, maxiter = 1000;   // declared here; the maxiter value is illustrative
    double ElapsedTime;

    _ftime(&tbeg);
    for (iterations = 0; iterations < maxiter; iterations++)
    {
        // Code to be measured goes here
    }
    _ftime(&tend);
    ElapsedTime = ((double)tend.time * 1000.0 + (double)tend.millitm)
                - ((double)tbeg.time * 1000.0 + (double)tbeg.millitm);
    ElapsedTime = ElapsedTime / maxiter;
}

_ftime returns the time in seconds since midnight (00:00:00), January 1, 1970, coordinated universal time (UTC).
58Actual Code
struct timeb time_t time // time in seconds
unsigned short millitm // fraction in
milliseconds short timezone short dstflag
Resolution is usually 1/60 of a second.
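
As a portable alternative, the same measurement can be written with C++11's std::chrono (a minimal sketch, not from the original slides; the maxiter value is illustrative):

#include <chrono>
#include <iostream>

int main()
{
    const int maxiter = 1000000;
    auto tbeg = std::chrono::steady_clock::now();
    for (int i = 0; i < maxiter; i++)
    {
        // Code to be measured goes here
    }
    auto tend = std::chrono::steady_clock::now();
    double elapsedMs =
        std::chrono::duration<double, std::milli>(tend - tbeg).count();
    std::cout << elapsedMs / maxiter << " ms per iteration" << std::endl;
    return 0;
}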
59. Other Tools under Linux/Solaris
- Profiling
- compile with the -pg option
- analyze with gprof/prof
- g++ -pg -o mycode mycode.cpp
- ./mycode
- gprof mycode > mycode_profile.txt
60Sample Profile Output (gprof)
cumulative self self
total time seconds seconds
calls ms/call ms/call name 51.7 0.15
0.15 592607 0.00 0.00
dyneditdistance 5 34.5 0.25 0.10
internal_mcount 6 10.3
0.28 0.03 52 0.58 3.65
depthfirst 1 3.4 0.29 0.01 93252
0.00 0.00 cuted 4 0.0 0.29
0.00 4607 0.00 0.00 editdistance
7 0.0 0.29 0.00 1 0.00
190.00 main 2
61Sample Profile Output (gprof)
called/total parents index time self
descendents calledself name index
called/total
children 0.03 0.16
52/52 main 2 1 65.5 0.03
0.16 52 depthfirst 1
0.01 0.15 93252/93252 cuted
4 0.00 0.00 4607/4607
editdistance 7 -------------------------
---------------------- 0.00
0.19 1/1 _start 3 2
65.5 0.00 0.19 1 main
2 0.03 0.16 52/52
depthfirst 1 ---------------------------
--------------------
....
62. Other Tools on Linux/Solaris
- You can also use gcov as a profiling tool to help discover where your optimization efforts will best affect your code (typical commands are sketched below):
- how often each line of code executes
- which lines of code are actually executed
- how much computing time each section of code uses
- http://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro
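
A typical gcov workflow might look like this (a sketch; the file name is illustrative, and flags may vary by GCC version):

g++ --coverage -o mycode mycode.cpp   # instrument for coverage
./mycode                              # run; writes coverage data files
gcov mycode.cpp                       # produces mycode.cpp.gcov with per-line counts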
63. Trading Space for Time
- Neural Network simulations very often use a
function called the sigmoid
64. Trading Space for Time
- Neural network simulations very often use a function called the sigmoid:
- sigmoid(x) = 1 / (1 + exp(-x))
- For large positive x, sigmoid(x) → 1
- For large negative x, sigmoid(x) → 0
65. Computing the Sigmoid

double sigmoid(double x)
{
    // k = 1
    return (1.0 / (1.0 + exp(-x)));
}
66. Computing the Sigmoid
- exp(-x) takes too long to compute!
- Such functions use series expansion
- Taylor/Maclaurin series
- Sum of terms of the form ((-x)^n / n!)
- Each term involves floating-point calculations
- Typical neural network simulations call this function hundreds of millions of times in a run.
- Hence, sigmoid(x) accounts for most of the time spent (possibly over 70-80%)
67. Computing the Sigmoid
- Instead of computing the function all the time:
- Compute the function at N points once and build a table.
- During an actual call to sigmoid:
- Find the nearest entries to x and look up the values for those
- Perform linear interpolation
68. Computing the Sigmoid
69. Computing the Sigmoid

if (x < x0) return (0.0);
. . .
if (x > x99) return (1.0);
70. Computing the Sigmoid
[figure: the actual sigmoid and its local linear interpolation between table points Xi and Xi+1, for a given X]
71. Computing the Sigmoid
[figure: a table of (slope(i), inter(i)) pairs for i = 0..98, starting at x0, giving the local linear interpolation of the actual sigmoid between Xi and Xi+1]

sigmoid(x) ≈ slope(i) * x + inter(i)
72Computing the Sigmoid
slope(0)
x0
inter(0)
slope(1)
inter(1)
slope(2)
inter(2)
slope(3)
inter(3)
slope(4)
inter(4)
slope(5)
inter(5)
slope(6)
inter(6)
. . .
. . .
sigmoid(x) ? slope(i) x inter(i)
slope(98)
inter(98)
73Computing the Sigmoid
- Select the number of points (N 1000, 10000,
etc.) depending on accuracy you require - Space needed is 2 float/double for each point for
slope and intercept (8 16 bytes per point) - Initialize the table when you start execution
74. Computing the Sigmoid
- You also know X0
- Compute Delta = X1 - X0
- Compute Xmax = X0 + N * Delta
- Given X:
- Compute i = (X - X0) / Delta
- 1 float add, 1 float divide
- Compute sigmoid(x) ≈ slope(i) * x + inter(i)
- 1 float multiplication, 1 float add
- (A complete sketch follows below.)
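
Putting the pieces together, a minimal table-lookup sigmoid might look like this (a sketch under the slides' assumptions; names like NPOINTS, X0, and XMAX, and their values, are illustrative):

#include <cmath>

const int NPOINTS = 1000;             // table resolution
const double X0 = -10.0, XMAX = 10.0;
const double DELTA = (XMAX - X0) / NPOINTS;
static double slope[NPOINTS], inter[NPOINTS];

// Build the table once at startup: one line segment per interval.
void initSigmoidTable()
{
    for (int i = 0; i < NPOINTS; i++)
    {
        double xa = X0 + i * DELTA, xb = xa + DELTA;
        double ya = 1.0 / (1.0 + exp(-xa));
        double yb = 1.0 / (1.0 + exp(-xb));
        slope[i] = (yb - ya) / (xb - xa);
        inter[i] = ya - slope[i] * xa;   // intercept of the segment
    }
}

double sigmoid(double x)
{
    if (x < X0)    return 0.0;           // saturate below the table
    if (x >= XMAX) return 1.0;           // saturate above the table
    int i = (int)((x - X0) / DELTA);     // 1 subtract, 1 divide
    return slope[i] * x + inter[i];      // 1 multiply, 1 add
}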
75. Results
- Using exp(x):
- Each call takes about 300 nanoseconds (2 GHz Pentium 4)
76. Results
- Using exp(x):
- Each call takes about 300 nanoseconds on a 2 GHz Pentium 4.
- Using table lookup and linear interpolation:
- Each call takes about 30 nanoseconds
- A ten-fold speed-up,
- in exchange for 64 KB to 640 KB of extra memory usage.