Title: WRITING EFFICIENT PROGRAMS
1. WRITING EFFICIENT PROGRAMS
- Kemal Oflazer
- Modified and expanded by B. Yanikoglu
2. Overview
- Part I (covered now)
- Algorithmic Efficiency versus Program Efficiency
- Common Patterns/Tricks to Speed up Your Code
- Part II (covered in later parts of the course)
- How to Measure Time Taken in a Particular Code Piece (Profiling)
- Advanced Techniques to Speed up Your Code
3. Efficient Programs
- Algorithmic versus Programming Efficiency
- Your programs use the best algorithm you know for the task (e.g., what you learn in Data Structures)
- You can now think about improving the efficiency of your code
4. Obvious Methods
- Compilers are VERY good at optimizing your code
- They analyze your code to death and do almost everything machinely possible
- e.g., the GNU g++ compiler on Linux/Cygwin
- g++ -O5 -o myprog myprog.c
- can improve your performance by 10% to 300%
5. However...
- There are improvements that you can do that a compiler cannot (vice versa too, but that's not the point)
- You should make your program as efficient as possible
- Especially if it will be used often or if time is an issue
- Start observing where your program is slow
- To see where you should concentrate to get the best return
6. Writing Efficient Code
- Identify sources of inefficiency
- redundant computation
- overhead
- procedure calls
- loops
- Systematically improve efficiency
- The general idea is to trade space for time
7. Initialize Once, Use Many Times
float f() {
    double value = sin(0.25);
    ...
}
8. Initialize Once, Use Many Times
double defaultValue = sin(0.25);

float f() {
    double value = defaultValue;
    ...
}
9. Initialize Once, Use Many Times
float f() {
    static double defaultValue = sin(0.25);
    ...
}
10. Static Variables, Inline Functions, Macros
11. Static Variables
- Variables defined local to a function disappear at the end of the function scope.
- When you call the function again, storage for the variables is created anew and the values are re-initialized.
- If you want a value to persist throughout the life of a program, you can define a function's local variable to be static and give it an initial value.
- Initialization is only performed the first time the function is called, and the data retains its value between function calls.
- This way, a function can remember some piece of information between function calls.
- Static versus global variables
- The beauty of a static variable is that it is unavailable outside the scope of the function, so it can't be inadvertently changed.
- Note: the static keyword has several distinct meanings in programming; static variables is one of them.
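- As a small illustration (a hypothetical counter function, not from the original slides), a static local lets a function count its own calls:
#include <iostream>
using namespace std;

int countCalls()
{
    static int count = 0;  // initialized only on the first call
    count++;               // the value persists between calls
    return count;
}

int main()
{
    countCalls();
    countCalls();
    cout << countCalls() << endl;  // prints 3
    return 0;
}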
12. Inline Functions
- If they contain just a few simple lines of code, with no for loops or the like, C++ functions can be declared inline.
- The compiler inserts the inline code everywhere the function is used
- The program will be a bit bigger
- Avoids function call overhead (stack handling)
13. Inline Functions
#include <iostream>
#include <cmath>
using namespace std;

inline double hypothenuse(double a, double b)
{
    return sqrt(a * a + b * b);
}

int main()
{
    double k = 6, m = 9;
    cout << hypothenuse(k, m) << endl;
    // cout << sqrt(k * k + m * m) << endl;  // The compiled code looks like this
    return 0;
}
14. Macros
#define max(a,b) (a>b?a:b)
- Inline functions are similar to macros because they are both expanded at compile time
- macros are expanded by the preprocessor, while inline functions are parsed by the compiler.
- There are several important differences:
- Inline functions follow all the protocols of type safety enforced on normal functions.
- Inline functions are specified using the same syntax as any other function, except that they include the inline keyword in the function declaration.
- Expressions passed as arguments to inline functions are evaluated once. In some cases, expressions passed as arguments to macros can be evaluated more than once.
- You cannot debug macros, but you can debug inline functions.
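- A minimal sketch of the double-evaluation pitfall (the names MAX_MACRO and maxInline are illustrative): with the macro, an argument that has a side effect is evaluated twice.
#include <iostream>
using namespace std;

#define MAX_MACRO(a,b) (a>b?a:b)

inline int maxInline(int a, int b) { return a > b ? a : b; }

int main()
{
    int i = 5, j = 3;
    // Expands to (i++ > j ? i++ : j), so i++ executes twice
    cout << MAX_MACRO(i++, j) << " " << i << endl;  // prints 6 7
    i = 5;
    // The argument expression is evaluated exactly once
    cout << maxInline(i++, j) << " " << i << endl;  // prints 5 6
    return 0;
}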
15. Back to Efficiency
16. Initialize Once, Use Many Times
float f() {
    static double defaultValue = sin(0.25);
    ...
}
17. Precompute Values
- An example of the space-time trade-off
- If you compute something over and over again
- You should compute it once (and preferably at compile time!) and store the results
- Just access the results at run time.
18. Precompute Values
int f(int i) {
    if (i < 10 && i > 0)
        return i * i - i;
    return 0;
}
19. Precompute Values
static int values[] = { 0*0-0, 1*1-1, 2*2-2, ..., 9*9-9 };

int f(int i) {
    if (i < 10 && i > 0)
        return values[i];
    return 0;
}
20. Remove Common Subexpressions
- Do not compute the same thing over and over again!
- Most compilers are good at recognizing most of these and handling them.
21. Remove Common Subexpressions
int start = 10 + f.length();
int endPos = 15 + f.length();
22. Remove Common Subexpressions
int length = f.length();
int start = 10 + length;
int endPos = 15 + length;
23. Use Algebra!
- Reorganize your tests and expressions so that you exploit algebraic identities
24. Use Algebra!
- Reorganize your tests and expressions so that you exploit algebraic identities
- This is much harder for compilers!
25. Use Algebra!
if (a > Math.sqrt(b))
    return a*a + 3*a + 2;
26. Use Algebra!
if (a > Math.sqrt(b))
    return a*a + 3*a + 2;

if (a * a > b)
    return (a+1)*(a+2);
- Note: a > Math.sqrt(b) is equivalent to a*a > b for non-negative a, and a*a + 3*a + 2 factors as (a+1)*(a+2).
27. Use Sentinels: Avoid Unnecessary Tests
// Linear Search
static int SIZE = 200;
int[] a = new int[SIZE];
// array a is filled
....
int pos = 0;
while (pos < SIZE && a[pos] != searchValue)
    pos++;
return pos;
28. Use Sentinels: Avoid Unnecessary Tests
// Linear Search
static int SIZE = 200;
int[] a = new int[SIZE];
// array a is filled
int pos = 0;
while (pos < SIZE && a[pos] != searchValue)
    pos++;
return pos;
29. Use Sentinels: Avoid Unnecessary Tests
- General idea
- Put the value you are searching for at the end of the list
- If you find it before, fine!
- If you find it at the end, then it was not in the list!
30. Use Sentinels: Avoid Unnecessary Tests
static int SIZE = 200;
int[] a = new int[SIZE + 1];
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
31. Use Sentinels: Avoid Unnecessary Tests
- After
- You save about 200 comparisons
static int SIZE = 200;
int[] a = new int[SIZE + 1];
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
32. Move Invariant Code out of the Loop
- Do not compute something unnecessarily in a loop, over and over again!
- Compilers can detect most of these!
33. Move Invariant Code out of the Loop
void f(double d) {
    for (int i = 0; i < 100; i++)
        plot(i, i * sin(d));
}
34. Move Invariant Code out of the Loop
void f(double d) {
    double dsin = sin(d);
    for (int i = 0; i < 100; i++)
        plot(i, i * dsin);
}
35. Unroll Short Loops
- Avoid loop control overhead by repeating the body of the loop
36. Unroll Short Loops
for (int i = j; i < j + 3; i++)
    sum += q[i] - i % 7;
37. Unroll Short Loops
int i = j;
sum += q[i] - i % 7; i++;
sum += q[i] - i % 7; i++;
sum += q[i] - i % 7;
38. What to Speed Up
- Try speeding up the code that is responsible for most of the action
- Hot spots!
39. Hints
- Try speeding up the code that is responsible for most of the action
- Hot spots!
- Speed-up = Time-Before / Time-After
40. Hints
- Try speeding up the code that is responsible for most of the action
- Hot spots!
- Wasting speed-up efforts on infrequently executed code has no return on investment
41. From Jon Louis Bentley's "Writing Efficient Programs"
- http://www.crowl.org/Lawrence/programming/Bentley82.html
42. Fundamental Rules
- These rules underlie the basic principles detailed in the next slides.
- Code Simplification
- Most fast programs are simple. Therefore, keep code simple to make it faster.
- Problem Simplification
- To increase the efficiency of a program, simplify the problem it solves.
- Relentless Suspicion
- Question the necessity of each instruction in a time-critical piece of code and each field in a space-critical data structure.
- Early Binding
- Move work forward in time. Specifically, do work now just once in the hope of avoiding doing it many times later.
43. Space for Time Rules
- Introducing redundant information can decrease run time at the cost of increasing the space used.
- Data Structure Augmentation
- The time required for common operations on data can often be reduced by augmenting the structure with extra information, or by changing the information within the structure so that it can be accessed more easily.
- Store Precomputed Results
- The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are then handled by table lookup rather than by computing the function.
- ...
44. Space for Time Rules (cont.)
- Introducing redundant information can decrease run time at the cost of increasing the space used. ...
- Caching
- Data that is accessed most often should be the cheapest to access. (But note: caching can "backfire" and increase the run time of a program if locality is not present in the underlying data.) A small result-cache sketch follows this list.
- Lazy Evaluation
- The strategy of never evaluating an item until it is needed avoids evaluations of unnecessary items.
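- A minimal sketch of such a result cache (the function and use of std::map are illustrative assumptions, not from Bentley's text): compute each value once, then serve repeats by table lookup.
#include <cmath>
#include <map>

double expensive(double x)                  // stands in for any costly computation
{
    return std::exp(-x) * std::sin(x);
}

double cachedExpensive(double x)
{
    static std::map<double, double> cache;  // persists between calls
    auto it = cache.find(x);
    if (it != cache.end())
        return it->second;                  // hit: no recomputation
    double result = expensive(x);
    cache[x] = result;                      // miss: compute once, store
    return result;
}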
45. Loop Rules
- Hot spots in most programs involve loops (roughly in order of importance)
- Move Code Out of Loops
- Instead of performing a certain computation in each iteration of a loop, it is better to perform it only once, outside the loop.
- Loop Fusion
- If two nearby loops operate on the same set of elements, then combine their operational parts and use only one set of loop control operations (see the sketch after this list).
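- A minimal sketch of loop fusion (the arrays and operations are illustrative):
const int N = 1000;

// Before: two loops over the same elements pay the loop control twice
void unfused(double a[], double b[])
{
    for (int i = 0; i < N; i++) a[i] *= 2.0;
    for (int i = 0; i < N; i++) b[i] += a[i];
}

// After: one fused loop, one set of loop-control operations
void fused(double a[], double b[])
{
    for (int i = 0; i < N; i++) {
        a[i] *= 2.0;
        b[i] += a[i];
    }
}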
46. Loop Rules
- Combining Tests
- An efficient inner loop should contain as few tests as possible, and preferably only one. The programmer should therefore try to simulate some of the exit conditions of the loop by other exit conditions. Sentinels are a common application of this rule.
- Loop Unrolling
- A large cost of some short loops is in modifying the loop indices. That cost can often be reduced by unrolling the loop.
47. Logic Rules
- Reordering Tests
- Logical tests should be arranged such that inexpensive and often successful tests precede expensive and rarely successful tests (see the sketch after this list).
- ...
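- A minimal sketch of test reordering (the record type and fields are illustrative): the cheap integer comparison runs first, and && short-circuits so the expensive substring search is only attempted for the few records that pass it.
#include <string>

struct Record {
    int key;
    std::string text;
};

bool matches(const Record& r, int key, const std::string& pattern)
{
    // Cheap comparison first; the costly find() runs only
    // when the key already matches.
    return r.key == key && r.text.find(pattern) != std::string::npos;
}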
48. Procedure Rules
- Define very small functions (usually one-liners) as inline
- This replicates the function body everywhere the function is called, and thus avoids procedure call overhead
- ...
- More at http://www.crowl.org/Lawrence/programming/Bentley82.html
49. Efficiency
50. Static Variables, Inline Functions, Macros
51. Static Variables
- Static data refers to global or static variables whose storage is determined at compile time.
int int_array[100];

int main()
{
    static float float_array[100];
    double double_array[100];
    char *pchar;
    pchar = (char *)malloc(100);
    ...
    return (0);
}
52. Static Variables
- Variables defined local to a function disappear at the end of the function scope.
- When you call the function again, storage for the variables is created anew and the values are re-initialized.
- If you want a value to persist throughout the life of a program, you can define a function's local variable to be static and give it an initial value.
- Initialization is only performed the first time the function is called, and the data retains its value between function calls.
- This way, a function can remember some piece of information between function calls.
- Static versus global variables
- The beauty of a static variable is that it is unavailable outside the scope of the function, so it can't be inadvertently changed.
- Note: the static keyword has several distinct meanings in programming; static variables is one of them.
53. Stack, Heap
- At runtime, the data segment of a program is broken down into three constituent parts:
- static, stack, and heap data.
- Static: global or static variables
- Stack data
- variables that exist within the scope of a function (memory allocated at runtime for local (automatic) variables)
- e.g., double_array in the above example.
- Heap data
- data that is dynamically allocated at runtime (e.g., pchar above).
- remains in memory until it is either freed explicitly or the program terminates.
54. Program Instrumentation
- Understand how your program behaves
- Time
- Actual Time
- Number of operations
55. Measuring program behaviour
record starting time
// your code
record end time
compute elapsed time
56.
- We are going to skip the details of how you can actually measure the time taken, as we are emphasizing the concept of efficiency.
- You can read the second part of this lecture now, or wait a few months; towards the end of CS204 we are going to cover these in more detail.
57. Measuring program behaviour
- Measure Time
- Some issues
- Timer resolution is too low
- CPUs are fast
- Your programs seem to take 0 time (?)
58. Measuring program behaviour
record starting time
iterate N times
    // your code
record end time
compute elapsed time
divide elapsed time by N
59. Measuring program behaviour
record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
60. Measuring program behaviour
- Measure the overhead
- Subtract the overhead from the actual time
record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
61. Actual Code
#include <sys/types.h>
#include <sys/timeb.h>

int main(int argc, char *argv[])
{
    struct _timeb tbeg, tend;
    _ftime(&tbeg);
    for (iterations = 0; iterations < maxiter; iterations++)
    {
        // Code to be measured goes here
    }
    _ftime(&tend);
    ElapsedTime = ((double)tend.time * 1000.0 + (double)tend.millitm)
                - ((double)tbeg.time * 1000.0 + (double)tbeg.millitm);
    ElapsedTime = ElapsedTime / maxiter;
}
- _ftime returns the time in seconds since midnight (00:00:00), January 1, 1970, coordinated universal time (UTC).
62. Actual Code
struct timeb {
    time_t time;             // time in seconds
    unsigned short millitm;  // fraction in milliseconds
    short timezone;
    short dstflag;
};
The resolution is usually 1/60 of a second.
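- _ftime is a Windows/CRT-era interface with coarse resolution; on a modern C++ compiler the same measurement loop can be written portably with std::chrono (a sketch, not from the original slides):
#include <chrono>
#include <iostream>

int main()
{
    const int maxiter = 1000000;
    auto tbeg = std::chrono::steady_clock::now();
    for (int iterations = 0; iterations < maxiter; iterations++) {
        // Code to be measured goes here
    }
    auto tend = std::chrono::steady_clock::now();
    // Elapsed time in milliseconds, averaged over the iterations
    double elapsedMs =
        std::chrono::duration<double, std::milli>(tend - tbeg).count();
    std::cout << elapsedMs / maxiter << " ms per iteration" << std::endl;
    return 0;
}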
63. Other Tools under Linux/Solaris
- Profiling
- compile with the -pg option
- analyze with gprof/prof
- g++ -pg -o mycode mycode.cpp
- ./mycode
- gprof mycode > mycode_profile.txt
64. Sample Profile Output (gprof)
  %   cumulative   self              self     total
time    seconds   seconds    calls  ms/call  ms/call  name
51.7       0.15      0.15   592607     0.00     0.00  dyneditdistance [5]
34.5       0.25      0.10                             internal_mcount [6]
10.3       0.28      0.03       52     0.58     3.65  depthfirst [1]
 3.4       0.29      0.01    93252     0.00     0.00  cuted [4]
 0.0       0.29      0.00     4607     0.00     0.00  editdistance [7]
 0.0       0.29      0.00        1     0.00   190.00  main [2]
65. Sample Profile Output (gprof)
                                     called/total     parents
index  %time    self  descendents    called+self      name           index
                                     called/total     children
                0.03        0.16         52/52        main [2]
[1]     65.5    0.03        0.16         52           depthfirst [1]
                0.01        0.15      93252/93252     cuted [4]
                0.00        0.00       4607/4607      editdistance [7]
-----------------------------------------------
                0.00        0.19          1/1         _start [3]
[2]     65.5    0.00        0.19          1           main [2]
                0.03        0.16         52/52        depthfirst [1]
-----------------------------------------------
....
66. Other Tools on Linux/Solaris
- You can also use gcov as a profiling tool to help discover where your optimization efforts will best affect your code:
- how often each line of code executes
- what lines of code are actually executed
- how much computing time each section of code uses
- http://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro
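- A typical gcov session might look like this (file names are illustrative):
- g++ --coverage -o mycode mycode.cpp
- ./mycode
- gcov mycode.cpp (writes mycode.cpp.gcov with per-line execution counts)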
67. Trading Space for Time
- Neural network simulations very often use a function called the sigmoid
68. Trading Space for Time
- Neural network simulations very often use a function called the sigmoid: sigmoid(x) = 1 / (1 + exp(-x))
- For large positive x, sigmoid(x) → 1
- For large negative x, sigmoid(x) → 0
69. Computing the Sigmoid
double sigmoid(double x)
{
    // k = 1
    return (1.0 / (1.0 + exp(-x)));
}
70. Computing the Sigmoid
- exp(-x) takes too long to compute!
- Such functions are computed using series expansion
- Taylor/Maclaurin series
- Sum of terms of the form ((-x)^n / n!)
- Each term involves floating point calculations
- Typical neural network simulations call this function hundreds of millions of times in a run.
- Hence, sigmoid(x) accounts for most of the time spent (possibly over 70-80%)
71. Computing the Sigmoid
- Instead of computing the function all the time:
- Compute the function at N points once and build a table.
- During an actual call to sigmoid:
- Find the nearest entries to x and look up the values for those
- Perform linear interpolation
72. Computing the Sigmoid
[Figure: the sigmoid curve sampled at a series of points]
73. Computing the Sigmoid
if (x < x0) return (0.0);
. . .
if (x > x99) return (1.0);
74. Computing the Sigmoid
[Figure: actual sigmoid versus local linear interpolation between neighboring sample points Xi and Xi+1]
75. Computing the Sigmoid
[Figure: actual sigmoid versus local linear interpolation; each interval i, starting at x0, stores slope(i) and inter(i) for i = 0 .. 98]
sigmoid(x) ≈ slope(i) * x + inter(i)
76. Computing the Sigmoid
[Figure: the table of slope(i) and inter(i) values, i = 0 .. 98]
sigmoid(x) ≈ slope(i) * x + inter(i)
77. Computing the Sigmoid
- Select the number of points (N = 1000, 10000, etc.) depending on the accuracy you require
- The space needed is 2 floats/doubles per point, for the slope and intercept (8-16 bytes per point)
- Initialize the table when you start execution
78. Computing the Sigmoid
- You also know X0
- Compute Delta = X1 - X0
- Compute Xmax = X0 + N * Delta
- Given X:
- Compute i = (X - X0) / Delta
- 1 float add, 1 float divide
- Compute sigmoid(x) ≈ slope(i) * x + inter(i)
- 1 float multiplication, 1 float add
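- Putting the pieces together, a minimal sketch of the table-lookup sigmoid (the range [-10, 10] and N = 1000 are assumptions for illustration):
#include <cmath>

const int    N     = 1000;
const double X0    = -10.0, XMAX = 10.0;
const double DELTA = (XMAX - X0) / N;

static double slope[N], inter[N];

// Build the table once at startup: for each interval store the line
// through (x1, sigmoid(x1)) and (x2, sigmoid(x2)).
void initSigmoidTable()
{
    for (int i = 0; i < N; i++) {
        double x1 = X0 + i * DELTA, x2 = x1 + DELTA;
        double y1 = 1.0 / (1.0 + std::exp(-x1));
        double y2 = 1.0 / (1.0 + std::exp(-x2));
        slope[i] = (y2 - y1) / DELTA;
        inter[i] = y1 - slope[i] * x1;
    }
}

double fastSigmoid(double x)
{
    if (x < X0)    return 0.0;          // saturated low
    if (x >= XMAX) return 1.0;          // saturated high
    int i = (int)((x - X0) / DELTA);    // 1 subtract, 1 divide
    return slope[i] * x + inter[i];     // 1 multiply, 1 add
}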
79. Results
- Using exp(x):
- Each call takes about 300 nanoseconds (2 GHz Pentium 4)
80. Results
- Using exp(x):
- Each call takes about 300 nanoseconds on a 2 GHz Pentium 4.
- Using table lookup and linear interpolation:
- Each call takes about 30 nanoseconds
- Tenfold speed-up
- in exchange for 64 KB to 640 KB of extra memory usage.