Title: WRITING EFFICIENT PROGRAMS
1. WRITING EFFICIENT PROGRAMS
- Kemal Oflazer
- Modified and expanded by B. Yanikoglu
2. Overview
- Part I (covered now)
- Algorithmic Efficiency versus Program Efficiency
- Common Patterns/Tricks to Speed up Your Code
- Part II (covered in later parts of the course)
- How to Measure Time Taken in a Particular Code Piece (Profiling)
- Advanced Techniques to Speed up Your Code
3. Efficient Programs
- Algorithmic versus Programming Efficiency
- Your programs use the best algorithm you know for the task (e.g., what you learn in Data Structures)
- You can now think about improving the efficiency of your code
4. Obvious Methods
- Compilers are VERY good at optimizing your code
- They analyze your code to death and do almost everything machinely possible
- e.g., the GNU g++ compiler on Linux/Cygwin
- g++ -O5 -o myprog myprog.c
- can improve your performance by 10% to 300%
5. However...
- There are improvements that you can do that a compiler cannot (vice versa too, but that's not the point)
- You should make your program as efficient as possible
- Especially if it will be used often or if time is an issue
- Start observing where your program is slow
- To see where you should concentrate to get the best return
6. Writing Efficient Code
- Identify sources of inefficiency
- redundant computation
- overhead
- procedure calls
- loops
- Systematically improve efficiency
- The general idea is to trade space for time
7. Initialize Once, Use Many Times
float f() {
    double value = sin(0.25);
    ...
}
8. Initialize Once, Use Many Times
double defaultValue = sin(0.25);

float f() {
    double value = defaultValue;
    ...
}
9. Initialize Once, Use Many Times
float f() {
    static double defaultValue = sin(0.25);
    ...
}
10. Static Variables, Inline Functions, Macros
11. Static Variables
- Variables defined local to a function disappear at the end of the function scope.
- When you call the function again, storage for the variables is created anew and the values are re-initialized.
- If you want a value to persist throughout the life of a program, you can define a function's local variable to be static and give it an initial value.
- Initialization is only performed the first time the function is called, and the data retains its value between function calls.
- This way, a function can remember some piece of information between function calls.
- Static versus global variables
- The beauty of a static variable is that it is unavailable outside the scope of the function, so it can't be inadvertently changed.
- Note: the static keyword has several distinct meanings in programming; static variables is one of them.
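- As a small illustration (a hypothetical counter function, not from the original slides), a static local lets a function count its own calls:
#include <iostream>
using namespace std;

int countCalls()
{
    static int count = 0;  // initialized only on the first call
    count++;               // the value persists between calls
    return count;
}

int main()
{
    countCalls();
    countCalls();
    cout << countCalls() << endl;  // prints 3
    return 0;
}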
12. Inline Functions
- If they contain just a few simple lines of code, with no for loops or the like, C++ functions can be declared inline.
- The compiler inserts the inline code everywhere the function is used
- The program will be a bit bigger
- Avoids function call overhead (stack handling)
13. Inline Functions
#include <iostream>
#include <cmath>
using namespace std;

inline double hypothenuse(double a, double b)
{
    return sqrt(a * a + b * b);
}

int main()
{
    double k = 6, m = 9;
    cout << hypothenuse(k, m) << endl;
    // cout << sqrt(k * k + m * m) << endl;  // The compiled code looks like this
    return 0;
}
14. Macros
#define max(a,b) (a>b?a:b)
- Inline functions are similar to macros because they are both expanded at compile time
- macros are expanded by the preprocessor, while inline functions are parsed by the compiler.
- There are several important differences:
- Inline functions follow all the protocols of type safety enforced on normal functions.
- Inline functions are specified using the same syntax as any other function, except that they include the inline keyword in the function declaration.
- Expressions passed as arguments to inline functions are evaluated once. In some cases, expressions passed as arguments to macros can be evaluated more than once.
- You cannot debug macros, but you can debug inline functions.
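- A minimal sketch of the double-evaluation pitfall (the names MAX_MACRO and maxInline are illustrative): with the macro, an argument that has a side effect is evaluated twice.
#include <iostream>
using namespace std;

#define MAX_MACRO(a,b) (a>b?a:b)

inline int maxInline(int a, int b) { return a > b ? a : b; }

int main()
{
    int i = 5, j = 3;
    // Expands to (i++ > j ? i++ : j), so i++ executes twice
    cout << MAX_MACRO(i++, j) << " " << i << endl;  // prints 6 7
    i = 5;
    // The argument expression is evaluated exactly once
    cout << maxInline(i++, j) << " " << i << endl;  // prints 5 6
    return 0;
}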
15. Back to Efficiency
16. Initialize Once, Use Many Times
float f() {
    static double defaultValue = sin(0.25);
    ...
}
17. Precompute Values
- An example of the space-time trade-off
- If you compute something over and over again
- You should compute it once (and preferably at compile time!) and store the results
- Just access the results at run time.
18. Precompute Values
int f(int i) {
    if (i < 10 && i > 0)
        return i * i - i;
    return 0;
}
19. Precompute Values
static int values[] = { 0*0-0, 1*1-1, 2*2-2, ..., 9*9-9 };

int f(int i) {
    if (i < 10 && i > 0)
        return values[i];
    return 0;
}
20. Remove Common Subexpressions
- Do not compute the same thing over and over again!
- Most compilers are good at recognizing most of these and handling them.
21. Remove Common Subexpressions
int start = 10 + f.length();
int endPos = 15 + f.length();
22. Remove Common Subexpressions
int length = f.length();
int start = 10 + length;
int endPos = 15 + length;
23. Use Algebra!
- Reorganize your tests and expressions so that you exploit algebraic identities
24. Use Algebra!
- Reorganize your tests and expressions so that you exploit algebraic identities
- This is much harder for compilers!
25. Use Algebra!
if (a > Math.sqrt(b))
    return a*a + 3*a + 2;
26. Use Algebra!
if (a > Math.sqrt(b))
    return a*a + 3*a + 2;

if (a * a > b)
    return (a+1)*(a+2);
- Note: a > Math.sqrt(b) is equivalent to a*a > b for non-negative a, and a*a + 3*a + 2 factors as (a+1)*(a+2).
27. Use Sentinels: Avoid Unnecessary Tests
// Linear Search
static int SIZE = 200;
int[] a = new int[SIZE];
// array a is filled
....
int pos = 0;
while (pos < SIZE && a[pos] != searchValue)
    pos++;
return pos;
28. Use Sentinels: Avoid Unnecessary Tests
// Linear Search
static int SIZE = 200;
int[] a = new int[SIZE];
// array a is filled
int pos = 0;
while (pos < SIZE && a[pos] != searchValue)
    pos++;
return pos;
29. Use Sentinels: Avoid Unnecessary Tests
- General idea
- Put the value you are searching for at the end of the list
- If you find it before, fine!
- If you find it at the end, then it was not in the list!
30. Use Sentinels: Avoid Unnecessary Tests
static int SIZE = 200;
int[] a = new int[SIZE + 1];
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
31. Use Sentinels: Avoid Unnecessary Tests
- After
- You save about 200 comparisons
static int SIZE = 200;
int[] a = new int[SIZE + 1];
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
32. Move Invariant Code out of the Loop
- Do not compute something unnecessarily in a loop, over and over again!
- Compilers can detect most of these!
33. Move Invariant Code out of the Loop
void f(double d) {
    for (int i = 0; i < 100; i++)
        plot(i, i * sin(d));
}
34. Move Invariant Code out of the Loop
void f(double d) {
    double dsin = sin(d);
    for (int i = 0; i < 100; i++)
        plot(i, i * dsin);
}
35. Unroll Short Loops
- Avoid loop control overhead by repeating the body of the loop
36. Unroll Short Loops
for (int i = j; i < j + 3; i++)
    sum += q[i] - i % 7;
37. Unroll Short Loops
int i = j;
sum += q[i] - i % 7; i++;
sum += q[i] - i % 7; i++;
sum += q[i] - i % 7;
38. What to Speed Up
- Try speeding up the code that is responsible for most of the action
- Hot spots!
39. Hints
- Try speeding up the code that is responsible for most of the action
- Hot spots!
- Speed-up = Time-Before / Time-After
40. Hints
- Try speeding up the code that is responsible for most of the action
- Hot spots!
- Wasting speed-up efforts on infrequently executed code has no return on investment
41. From Jon Louis Bentley's "Writing Efficient Programs"
- http://www.crowl.org/Lawrence/programming/Bentley82.html
42. Fundamental Rules
- These rules underlie the basic principles detailed in the next slides.
- Code Simplification
- Most fast programs are simple. Therefore, keep code simple to make it faster.
- Problem Simplification
- To increase the efficiency of a program, simplify the problem it solves.
- Relentless Suspicion
- Question the necessity of each instruction in a time-critical piece of code and each field in a space-critical data structure.
- Early Binding
- Move work forward in time. Specifically, do work now just once in the hope of avoiding doing it many times later.
43. Space for Time Rules
- Introducing redundant information can decrease run time at the cost of increasing the space used.
- Data Structure Augmentation
- The time required for common operations on data can often be reduced by augmenting the structure with extra information, or by changing the information within the structure so that it can be accessed more easily.
- Store Precomputed Results
- The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are then handled by table lookup rather than by computing the function.
- ...
44. Space for Time Rules (cont.)
- Introducing redundant information can decrease run time at the cost of increasing the space used. ...
- Caching
- Data that is accessed most often should be the cheapest to access. (But note: caching can "backfire" and increase the run time of a program if locality is not present in the underlying data.) A small result-cache sketch follows this list.
- Lazy Evaluation
- The strategy of never evaluating an item until it is needed avoids evaluations of unnecessary items.
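- A minimal sketch of such a result cache (the function and use of std::map are illustrative assumptions, not from Bentley's text): compute each value once, then serve repeats by table lookup.
#include <cmath>
#include <map>

double expensive(double x)                  // stands in for any costly computation
{
    return std::exp(-x) * std::sin(x);
}

double cachedExpensive(double x)
{
    static std::map<double, double> cache;  // persists between calls
    auto it = cache.find(x);
    if (it != cache.end())
        return it->second;                  // hit: no recomputation
    double result = expensive(x);
    cache[x] = result;                      // miss: compute once, store
    return result;
}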
45. Loop Rules
- Hot spots in most programs involve loops (roughly in order of importance)
- Move Code Out of Loops
- Instead of performing a certain computation in each iteration of a loop, it is better to perform it only once, outside the loop.
- Loop Fusion
- If two nearby loops operate on the same set of elements, then combine their operational parts and use only one set of loop control operations (see the sketch after this list).
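- A minimal sketch of loop fusion (the arrays and operations are illustrative):
const int N = 1000;

// Before: two loops over the same elements pay the loop control twice
void unfused(double a[], double b[])
{
    for (int i = 0; i < N; i++) a[i] *= 2.0;
    for (int i = 0; i < N; i++) b[i] += a[i];
}

// After: one fused loop, one set of loop-control operations
void fused(double a[], double b[])
{
    for (int i = 0; i < N; i++) {
        a[i] *= 2.0;
        b[i] += a[i];
    }
}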
46. Loop Rules
- Combining Tests
- An efficient inner loop should contain as few tests as possible, and preferably only one. The programmer should therefore try to simulate some of the exit conditions of the loop by other exit conditions. Sentinels are a common application of this rule.
- Loop Unrolling
- A large cost of some short loops is in modifying the loop indices. That cost can often be reduced by unrolling the loop.
47. Logic Rules
- Reordering Tests
- Logical tests should be arranged such that inexpensive and often successful tests precede expensive and rarely successful tests (see the sketch after this list).
- ...
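- A minimal sketch of test reordering (the record type and fields are illustrative): the cheap integer comparison runs first, and && short-circuits so the expensive substring search is only attempted for the few records that pass it.
#include <string>

struct Record {
    int key;
    std::string text;
};

bool matches(const Record& r, int key, const std::string& pattern)
{
    // Cheap comparison first; the costly find() runs only
    // when the key already matches.
    return r.key == key && r.text.find(pattern) != std::string::npos;
}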
48. Procedure Rules
- Define very small functions (usually one-liners) as inline
- This replicates the function body everywhere the function is called, and thus avoids procedure call overhead
- ...
- More at http://www.crowl.org/Lawrence/programming/Bentley82.html
49. Efficiency
50. Static Variables, Inline Functions, Macros
51. Static Variables
- Static data refers to global or static variables whose storage is determined at compile time.
int int_array[100];

int main()
{
    static float float_array[100];
    double double_array[100];
    char *pchar;
    pchar = (char *)malloc(100);
    ...
    return (0);
}
52. Static Variables
- Variables defined local to a function disappear at the end of the function scope.
- When you call the function again, storage for the variables is created anew and the values are re-initialized.
- If you want a value to persist throughout the life of a program, you can define a function's local variable to be static and give it an initial value.
- Initialization is only performed the first time the function is called, and the data retains its value between function calls.
- This way, a function can remember some piece of information between function calls.
- Static versus global variables
- The beauty of a static variable is that it is unavailable outside the scope of the function, so it can't be inadvertently changed.
- Note: the static keyword has several distinct meanings in programming; static variables is one of them.
53. Stack, Heap
- At runtime, the data segment of a program is broken down into three constituent parts:
- static, stack, and heap data.
- Static: global or static variables
- Stack data
- variables that exist within the scope of a function (memory allocated at runtime for local (automatic) variables)
- e.g., double_array in the above example.
- Heap data
- data that is dynamically allocated at runtime (e.g., pchar above).
- remains in memory until it is either freed explicitly or the program terminates.
54. Program Instrumentation
- Understand how your program behaves
- Time
- Actual Time
- Number of operations
55. Measuring program behaviour
record starting time
// your code
record end time
compute elapsed time
56.
- We are going to skip the details of how you can actually measure the time taken, as we are emphasizing the concept of efficiency.
- You can read the second part of this lecture now, or wait a few months; towards the end of CS204 we are going to cover these in more detail.
57. Measuring program behaviour
- Measure Time
- Some issues
- Timer resolution is too low
- CPUs are fast
- Your programs seem to take 0 time (?)
58. Measuring program behaviour
record starting time
iterate N times
    // your code
record end time
compute elapsed time
divide elapsed time by N
59. Measuring program behaviour
record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
60. Measuring program behaviour
- Measure the overhead
- Subtract the overhead from the actual time
record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
61. Actual Code
#include <sys/types.h>
#include <sys/timeb.h>

int main(int argc, char *argv[])
{
    struct _timeb tbeg, tend;
    _ftime(&tbeg);
    for (iterations = 0; iterations < maxiter; iterations++)
    {
        // Code to be measured goes here
    }
    _ftime(&tend);
    ElapsedTime = ((double)tend.time * 1000.0 + (double)tend.millitm)
                - ((double)tbeg.time * 1000.0 + (double)tbeg.millitm);
    ElapsedTime = ElapsedTime / maxiter;
}
- _ftime returns the time in seconds since midnight (00:00:00), January 1, 1970, coordinated universal time (UTC).
62. Actual Code
struct timeb {
    time_t time;             // time in seconds
    unsigned short millitm;  // fraction in milliseconds
    short timezone;
    short dstflag;
};
The resolution is usually 1/60 of a second.
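- _ftime is a Windows/CRT-era interface with coarse resolution; on a modern C++ compiler the same measurement loop can be written portably with std::chrono (a sketch, not from the original slides):
#include <chrono>
#include <iostream>

int main()
{
    const int maxiter = 1000000;
    auto tbeg = std::chrono::steady_clock::now();
    for (int iterations = 0; iterations < maxiter; iterations++) {
        // Code to be measured goes here
    }
    auto tend = std::chrono::steady_clock::now();
    // Elapsed time in milliseconds, averaged over the iterations
    double elapsedMs =
        std::chrono::duration<double, std::milli>(tend - tbeg).count();
    std::cout << elapsedMs / maxiter << " ms per iteration" << std::endl;
    return 0;
}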
63. Other Tools under Linux/Solaris
- Profiling
- compile with the -pg option
- analyze with gprof/prof
- g++ -pg -o mycode mycode.cpp
- ./mycode
- gprof mycode > mycode_profile.txt
64. Sample Profile Output (gprof)
  %   cumulative   self              self     total
time    seconds   seconds    calls  ms/call  ms/call  name
51.7       0.15      0.15   592607     0.00     0.00  dyneditdistance [5]
34.5       0.25      0.10                             internal_mcount [6]
10.3       0.28      0.03       52     0.58     3.65  depthfirst [1]
 3.4       0.29      0.01    93252     0.00     0.00  cuted [4]
 0.0       0.29      0.00     4607     0.00     0.00  editdistance [7]
 0.0       0.29      0.00        1     0.00   190.00  main [2]
65. Sample Profile Output (gprof)
                                     called/total     parents
index  %time    self  descendents    called+self      name           index
                                     called/total     children
                0.03        0.16         52/52        main [2]
[1]     65.5    0.03        0.16         52           depthfirst [1]
                0.01        0.15      93252/93252     cuted [4]
                0.00        0.00       4607/4607      editdistance [7]
-----------------------------------------------
                0.00        0.19          1/1         _start [3]
[2]     65.5    0.00        0.19          1           main [2]
                0.03        0.16         52/52        depthfirst [1]
-----------------------------------------------
....
66. Other Tools on Linux/Solaris
- You can also use gcov as a profiling tool to help discover where your optimization efforts will best affect your code:
- how often each line of code executes
- what lines of code are actually executed
- how much computing time each section of code uses
- http://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro
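- A typical gcov session might look like this (file names are illustrative):
- g++ --coverage -o mycode mycode.cpp
- ./mycode
- gcov mycode.cpp (writes mycode.cpp.gcov with per-line execution counts)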
67. Trading Space for Time
- Neural network simulations very often use a function called the sigmoid
68. Trading Space for Time
- Neural network simulations very often use a function called the sigmoid: sigmoid(x) = 1 / (1 + exp(-x))
- For large positive x, sigmoid(x) → 1
- For large negative x, sigmoid(x) → 0
69. Computing the Sigmoid
double sigmoid(double x)
{
    // k = 1
    return (1.0 / (1.0 + exp(-x)));
}
70. Computing the Sigmoid
- exp(-x) takes too long to compute!
- Such functions are computed using series expansion
- Taylor/Maclaurin series
- Sum of terms of the form ((-x)^n / n!)
- Each term involves floating point calculations
- Typical neural network simulations call this function hundreds of millions of times in a run.
- Hence, sigmoid(x) accounts for most of the time spent (possibly over 70-80%)
71. Computing the Sigmoid
- Instead of computing the function all the time:
- Compute the function at N points once and build a table.
- During an actual call to sigmoid:
- Find the nearest entries to x and look up the values for those
- Perform linear interpolation
72. Computing the Sigmoid
[Figure: the sigmoid curve sampled at a series of points]
73. Computing the Sigmoid
if (x < x0) return (0.0);
. . .
if (x > x99) return (1.0);
74. Computing the Sigmoid
[Figure: actual sigmoid versus local linear interpolation between neighboring sample points Xi and Xi+1]
75. Computing the Sigmoid
[Figure: actual sigmoid versus local linear interpolation; each interval i, starting at x0, stores slope(i) and inter(i) for i = 0 .. 98]
sigmoid(x) ≈ slope(i) * x + inter(i)
76. Computing the Sigmoid
[Figure: the table of slope(i) and inter(i) values, i = 0 .. 98]
sigmoid(x) ≈ slope(i) * x + inter(i)
77. Computing the Sigmoid
- Select the number of points (N = 1000, 10000, etc.) depending on the accuracy you require
- The space needed is 2 floats/doubles per point, for the slope and intercept (8-16 bytes per point)
- Initialize the table when you start execution
78. Computing the Sigmoid
- You also know X0
- Compute Delta = X1 - X0
- Compute Xmax = X0 + N * Delta
- Given X:
- Compute i = (X - X0) / Delta
- 1 float add, 1 float divide
- Compute sigmoid(x) ≈ slope(i) * x + inter(i)
- 1 float multiplication, 1 float add
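- Putting the pieces together, a minimal sketch of the table-lookup sigmoid (the range [-10, 10] and N = 1000 are assumptions for illustration):
#include <cmath>

const int    N     = 1000;
const double X0    = -10.0, XMAX = 10.0;
const double DELTA = (XMAX - X0) / N;

static double slope[N], inter[N];

// Build the table once at startup: for each interval store the line
// through (x1, sigmoid(x1)) and (x2, sigmoid(x2)).
void initSigmoidTable()
{
    for (int i = 0; i < N; i++) {
        double x1 = X0 + i * DELTA, x2 = x1 + DELTA;
        double y1 = 1.0 / (1.0 + std::exp(-x1));
        double y2 = 1.0 / (1.0 + std::exp(-x2));
        slope[i] = (y2 - y1) / DELTA;
        inter[i] = y1 - slope[i] * x1;
    }
}

double fastSigmoid(double x)
{
    if (x < X0)    return 0.0;          // saturated low
    if (x >= XMAX) return 1.0;          // saturated high
    int i = (int)((x - X0) / DELTA);    // 1 subtract, 1 divide
    return slope[i] * x + inter[i];     // 1 multiply, 1 add
}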
79. Results
- Using exp(x):
- Each call takes about 300 nanoseconds (2 GHz Pentium 4)
80. Results
- Using exp(x):
- Each call takes about 300 nanoseconds on a 2 GHz Pentium 4.
- Using table lookup and linear interpolation:
- Each call takes about 30 nanoseconds
- Tenfold speed-up
- in exchange for 64 KB to 640 KB of extra memory usage.