WRITING EFFICIENT PROGRAMS

Transcript and Presenter's Notes
1
WRITING EFFICIENT PROGRAMS
  • Kemal Oflazer
  • Modified and expanded by B. Yanikoglu

2
Overview
  • Part I (covered now)
  • Algorithmic Efficiency versus Program Efficiency
  • Common Patterns/Tricks to Speed up Your Code
  • Part II (covered in later parts of the course)
  • How to Measure Time Taken in a Particular Code
    Piece
  • Profiling
  • Advanced Techniques to Speed up Your Code

3
Efficient Programs
  • Algorithmic versus Programming Efficiency
  • Your programs use the best algorithm you know
    for the task (e.g. what you learn in Data
    Structures)
  • You can now think about improving the efficiency
    of your code

4
Obvious Methods
  • Compilers are VERY good at optimizing your code
  • They analyze your code to death and do nearly
    everything that is mechanically possible
  • e.g. the GNU g++ compiler on Linux/Cygwin
  • g++ -O5 -o myprog myprog.c
  • can improve your performance by 10% to 300%

5
However...
  • There are improvements that you can make that a
    compiler cannot (vice versa too, but that's not
    the point)
  • You should make your program as efficient as
    possible
  • Especially if it will be used often or if time is
    an issue
  • Start by observing where your program is slow
  • To see where you should concentrate to get the
    best return

6
Writing Efficient Code
  • Identify sources of inefficiency
  • redundant computation
  • overhead
  • procedure calls
  • loops
  • Systematically improve efficiency
  • General idea is to trade space for time

7
Initialize Once Use Many Times
  • Before

float f()
{
    double value = sin(0.25);
    ...
}
8
Initialize Once Use Many Times
  • Alternative

double defaultValue = sin(0.25);

float f()
{
    double value = defaultValue;
    ...
}
9
Initialize Once Use Many Times
  • Better Alternative

float f()
{
    static double defaultValue = sin(0.25);
    ...
}
10
Static Variables, Inline Functions, Macros
11
Static Variables
  • Variables defined local to a function disappear
    at the end of the function scope.
  • When you call the function again, storage for the
    variables is created anew and the values are
    re-initialized.
  • If you want a value to be extant throughout the
    life of a program, you can define a function's
    local variable to be static and give it an
    initial value.
  • Initialization is only performed the first time
    the function is called and the data retains its
    value between function calls.
  • This way, a function can remember some piece of
    information between function calls.
  • Static versus Global variables
  • The beauty of a static variable is that it is
    unavailable outside the scope of the function, so
    it can't be inadvertently changed.
  • Note: the static keyword has several distinct
    meanings in programming; the static local
    variable is one of them (see the sketch below).
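As an illustration (not from the original slides; the function name
countedCall is hypothetical), a minimal sketch of a static local
variable that remembers state between calls:

#include <iostream>

int countedCall()
{
    static int calls = 0;   // initialized only on the first call
    return ++calls;         // the count persists between calls
}

int main()
{
    countedCall();
    countedCall();
    std::cout << countedCall() << std::endl;   // prints 3
    return 0;
}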

12
Inline functions
  • If they contain just a few simple lines of code,
    and use no loops or the like, C++ functions can
    be declared inline.
  • Inline code will be inserted by the compiler
    everywhere the function is used
  • The program will be a bit bigger
  • Avoids function call overhead (stack handling)

13
Inline functions
#include <iostream>
#include <cmath>
using namespace std;

inline double hypothenuse(double a, double b)
{
    return sqrt(a * a + b * b);
}

int main()
{
    double k = 6, m = 9;
    cout << hypothenuse(k, m) << endl;
    // cout << sqrt(k * k + m * m) << endl;  // The compiled code looks like this
    return 0;
}

14
Macros
  • #define max(a,b) (a > b ? a : b)
  • Inline functions are similar to macros because
    they are both expanded at compile time
  • macros are expanded by the preprocessor, while
    inline functions are parsed by the compiler.
  • There are several important differences
  • Inline functions follow all the protocols of type
    safety enforced on normal functions.
  • Inline functions are specified using the same
    syntax as any other function except that they
    include the inline keyword in the function
    declaration.
  • Expressions passed as arguments to inline
    functions are evaluated once. In some cases,
    expressions passed as arguments to macros can be
    evaluated more than once.
  • You cannot step through macros in a debugger, but
    you can debug inline functions (see the sketch
    below).
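A minimal sketch (not from the original slides) of the
double-evaluation pitfall named above; MAX and maxInline are
hypothetical names:

#include <iostream>

#define MAX(a,b) ((a) > (b) ? (a) : (b))

inline int maxInline(int a, int b) { return a > b ? a : b; }

int main()
{
    int i = 10;
    int m1 = MAX(i++, 5);        // expands to ((i++) > (5) ? (i++) : (5));
                                 // i++ is evaluated twice: m1 == 11, i == 12
    int j = 10;
    int m2 = maxInline(j++, 5);  // j++ is evaluated exactly once:
                                 // m2 == 10, j == 11
    std::cout << m1 << " " << i << " " << m2 << " " << j << std::endl;
    return 0;
}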

15
Back to Efficiency
16
Initialize Once Use Many Times
  • Better Alternative

float f()
{
    static double defaultValue = sin(0.25);
    ...
}
17
Precompute Values
  • An example of space-time trade-off
  • If you compute something over and over again
  • You should compute it once (and preferably at
    compile time!) and store the results
  • Just access the results at run time.

18
Precompute Values
  • Before

int f(int i)
{
    if (i < 10 && i > 0)
        return i * i - i;
    return 0;
}
19
Precompute Values
  • After

static int values[] = {0*0-0, 1*1-1,
                       2*2-2, ..., 9*9-9};
int f(int i)
{
    if (i < 10 && i > 0)
        return values[i];
    return 0;
}
20
Remove Common Subexpressions
  • Do not compute the same thing over and over
    again!
  • Most compilers are good at recognizing most of
    these and handling them.

21
Remove Common Subexpressions
  • Before

int start = 10 + f.length();
int endPos = 15 + f.length();
22
Remove Common Subexpressions
  • After

int length = f.length();
int start = 10 + length;
int endPos = 15 + length;
23
Use Algebra!
  • Reorganize your tests and expressions so that you
    exploit algebraic identities

24
Use Algebra!
  • Reorganize your tests and expressions so that you
    exploit algebraic identities
  • This is much harder for compilers!

25
Use Algebra!
  • Before

if (a > Math.sqrt(b)) return a*a + 3*a + 2;
26
Use Algebra!
  • After

if (a > Math.sqrt(b)) return a*a + 3*a + 2;

if (a * a > b) return (a+1)*(a+2);
27
Use Sentinels Avoid Unnecessary Tests
  • Before

// Linear Search
static int SIZE = 200;
int *a = new int[SIZE];
// array a is filled
....
int pos = 0;
while (pos < SIZE && a[pos] != searchValue)
    pos++;
return pos;
28
Use Sentinels Avoid Unnecessary Tests
  • Before

// Linear Search
static int SIZE = 200;
int *a = new int[SIZE];
// array a is filled
int pos = 0;
while (pos < SIZE && a[pos] != searchValue)
    pos++;
return pos;
29
Use Sentinels Avoid Unnecessary Tests
  • General Idea
  • Put the value you are searching for at the end of
    the list
  • If you find it before, fine!
  • If you find it at the end, then it was not in the
    list!

30
Use Sentinels Avoid Unnecessary Tests
  • After

static int SIZE = 200;
int *a = new int[SIZE + 1];
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
31
Use Sentinels Avoid Unnecessary Tests
  • After
  • You save about 200 comparisons

static int SIZE = 200;
int *a = new int[SIZE + 1];
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue)
    pos++;
return pos;
32
Move Invariant Code out of the Loop
  • Do not compute something unnecessarily in a loop
    over and over again!
  • Compilers can detect most of these!

33
Move Invariant Code out of the Loop
  • Before

void f(double d)
{
    for (int i = 0; i < 100; i++)
        plot(i, i * sin(d));
}
34
Move Invariant Code out of the Loop
  • After

void f(double d)
{
    double dsin = sin(d);
    for (int i = 0; i < 100; i++)
        plot(i, i * dsin);
}

35
Unroll short loops
  • Avoid loop control overhead by repeating the body
    of the loop

36
Unroll short loops
  • Before

for (int i = j; i < j + 3; i++)
    sum += q[i] - i*7;
37
Unroll short loops
  • After

int i = j;
sum += q[i] - i*7; i++;
sum += q[i] - i*7; i++;
sum += q[i] - i*7;
38
What to Speed Up
  • Try speeding up the code that is responsible for
    most of the action
  • Hot spots!

39
Hints
  • Try speeding up the code that is responsible for
    most of the action
  • Hot spots!
  • Speed-up = Time-Before / Time-After

40
Hints
  • Try speeding up the code that is responsible for
    most of the action
  • Hot spots!
  • Wasting speed-up efforts on infrequently
    executed code has no return on investment

41
From Jon Louis Bentley's "Writing Efficient
Programs"
  • http://www.crowl.org/Lawrence/programming/Bentley82.html

42
Fundamental Rules
  • These rules underlie the basic principles detailed
    in the next slides.
  • Code Simplification
  • Most fast programs are simple. Therefore, keep
    code simple to make it faster.
  • Problem Simplification
  • To increase efficiency of a program, simplify the
    problem it solves.
  • Relentless Suspicion
  • Question the necessity of each instruction in a
    time-critical piece of code and each field in a
    space-critical data structure.
  • Early Binding
  • Move work forward in time. Specifically, do work
    now just once in hope of avoiding doing it many
    times later.

43
Space for Time Rules
  • Introducing redundant information can decrease
    run time at the cost of increasing space used.
  • Data Structure Augmentation
  • The time required for common operations on data
    can often be reduced by augmenting the structure
    with extra information or by changing the
    information within the structure so that it can
    be accessed more easily.
  • Store Precomputed Results
  • The cost of recomputing an expensive function can
    be reduced by computing the function only once
    and storing the results. Subsequent requests for
    the function are then handled by table lookup
    rather than by computing the function (a sketch
    follows below).
  • ...
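A minimal sketch of the Store Precomputed Results rule (not from the
slides; expensiveF and memoF are hypothetical names, and the sin/exp
body merely stands in for real work):

#include <cmath>
#include <map>

static double expensiveF(int n)
{
    return std::sin(n) * std::exp(-n);   // stand-in for an expensive computation
}

double memoF(int n)
{
    static std::map<int, double> cache;  // persists across calls
    auto it = cache.find(n);
    if (it != cache.end())
        return it->second;               // subsequent requests: table lookup
    return cache[n] = expensiveF(n);     // first request: compute and store
}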

44
Space for Time Rules cont.
  • Introducing redundant information can decrease
    run time at the cost of increasing space used....
  • Caching
  • Data that is accessed most often should be the
    cheapest to access. (But note: caching can
    "backfire" and increase the run time of a
    program if locality is not present in the
    underlying data.)
  • Lazy Evaluation
  • The strategy of never evaluating an item until it
    is needed avoids evaluations of unnecessary
    items (see the sketch below).
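A minimal sketch of the Lazy Evaluation rule (not from the slides;
LazyTable is a hypothetical name, and sqrt stands in for an expensive
computation):

#include <cmath>

class LazyTable {
    double values[100];
    bool   computed[100] = {false};
public:
    double get(int i)
    {
        if (!computed[i]) {               // evaluate only when first needed
            values[i]   = std::sqrt(i);   // entries never requested are never computed
            computed[i] = true;
        }
        return values[i];
    }
};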

45
Loop Rules
  • Hot spots in most programs involve loops (roughly
    in order of importance)
  • Move Code Out of Loops
  • Instead of performing a certain computation in
    each iteration of a loop, it is better to perform
    it only once, outside the loop.
  • Loop Fusion
  • If two nearby loops operate on the same set of
    elements, then combine their operational parts
    and use only one set of loop control operations,
    as in the sketch below.
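A minimal sketch of Loop Fusion (not from the slides; the array and
the two accumulators are hypothetical):

void fused(const double a[], int n, double &sum, double &sumsq)
{
    // Before: one loop accumulating sum, a second loop accumulating sumsq.
    // After: one fused loop, paying the loop-control cost only once.
    sum = sumsq = 0;
    for (int i = 0; i < n; i++) {
        sum   += a[i];
        sumsq += a[i] * a[i];
    }
}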

46
Loop Rules
  • Combining Tests
  • An efficient inner loop should contain as few
    tests as possible, and preferably only one. The
    programmer should therefore try to simulate some
    of the exit conditions of the loop by other exit
    conditions. Sentinels are a common application of
    this rule.
  • Loop Unrolling
  • A large cost of some short loops is in modifying
    the loop indices. That cost can often be reduced
    by unrolling the loop.

47
Logic Rules
  • Reordering Tests
  • Logical tests should be arranged such that
    inexpensive and often successful tests precede
    expensive and rarely successful tests, as in the
    sketch below.
  • ...
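A minimal sketch of Reordering Tests (not from the slides;
expensiveMatch and shouldProcess are hypothetical names):

#include <string>

// Stand-in for a costly predicate.
static bool expensiveMatch(const std::string &s)
{
    return s.find("needle") != std::string::npos;
}

static bool shouldProcess(int len, const std::string &s)
{
    // The inexpensive, often-decisive test runs first; since &&
    // short-circuits, expensiveMatch(s) is never called when len == 0.
    return len > 0 && expensiveMatch(s);
}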

48
Procedure Rules
  • Define very small (usually one-line) functions as
    inline
  • This replicates the function body everywhere the
    function is called, and thus avoids procedure
    call overhead
  • ...
  • More at http://www.crowl.org/Lawrence/programming/Bentley82.html

49
Efficiency
  • Part II

50
Static Variables, Inline Functions, Macros
  • Reminder

51
Static Variables
  • Static data refers to global or 'static'
    variables whose storage spaces are determined at
    compile time.

#include <cstdlib>

int int_array[100];

int main()
{
    static float float_array[100];
    double double_array[100];
    char *pchar;
    pchar = (char *)malloc(100);
    return (0);
}

52
Static Variables
  • Variables defined local to a function disappear
    at the end of the function scope.
  • When you call the function again, storage for the
    variables is created anew and the values are
    re-initialized.
  • If you want a value to be extant throughout the
    life of a program, you can define a function's
    local variable to be static and give it an
    initial value.
  • Initialization is only performed the first time
    the function is called and the data retains its
    value between function calls.
  • This way, a function can remember some piece of
    information between function calls.
  • Static versus Global variables
  • The beauty of a static variable is that it is
    unavailable outside the scope of the function, so
    it can't be inadvertently changed.
  • Note: the static keyword has several distinct
    meanings in programming; the static local
    variable is one of them.

53
Stack, heap
  • At runtime, the data segment of a program is
    broken down into three constituent parts
  • - static, stack, and heap data.
  • Static: global or static variables
  • Stack data
  • - variables that exist within the scope of a
    function (memory allocated at runtime for local
    (automatic) variables)
  • - e.g., double_array in the above example.
  • Heap data
  • - data that is dynamically allocated at runtime
    (e.g., pchar above).
  • - remains in memory until it is either freed
    explicitly or the program terminates.

54
Program Instrumentation
  • Understand how your program behaves
  • Time
  • Actual Time
  • Number of operations

55
Measuring program behaviour
  • Measure Time

record starting time
// your code
record end time
compute elapsed time
56
  • We are going to skip the details of how you
    can actually measure the time taken, as we are
    emphasizing the efficiency concept.
  • You can read the second part of this lecture now,
    or wait a few months: towards the end of CS204 we
    are going to cover these in more detail.

57
Measuring program behaviour
  • Measure Time
  • Some issues
  • Timer resolution is too low
  • CPUs are fast
  • Your programs seem to take 0 time (?)

58
Measuring program behaviour
  • Measure Time

record starting time
iterate N times
    // your code
record end time
compute elapsed time
divide elapsed time by N
59
Measuring program behaviour
  • Measure Overhead

record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
60
Measuring program behaviour
  • Measure Overhead
  • Subtract overhead from actual time

record starting time
iterate N times
    // empty loop
record end time
compute elapsed time
divide elapsed time by N
61
Actual Code
#include <sys/types.h>
#include <sys/timeb.h>

int main(int argc, char *argv[])
{
    int iterations;                  // declarations added for completeness
    const int maxiter = 1000000;
    double ElapsedTime;
    struct _timeb tbeg, tend;

    _ftime(&tbeg);
    for (iterations = 0; iterations < maxiter; iterations++)
    {
        // Code to be measured goes here
    }
    _ftime(&tend);
    ElapsedTime = ((double)tend.time * 1000.0 + (double)tend.millitm) -
                  ((double)tbeg.time * 1000.0 + (double)tbeg.millitm);
    ElapsedTime = ElapsedTime / maxiter;
}

_ftime returns time in seconds since midnight
(00:00:00), January 1, 1970, coordinated
universal time (UTC).
62
Actual Code
struct timeb {
    time_t time;             // time in seconds
    unsigned short millitm;  // fraction in milliseconds
    short timezone;
    short dstflag;
};

Resolution is usually 1/60 of a second (a
finer-grained alternative is sketched below).
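_ftime is Windows-specific and coarse. As an alternative (not in the
original slides), the same measurement can be written with
std::chrono, whose steady_clock typically offers far finer resolution:

#include <chrono>
#include <iostream>

int main()
{
    const int maxiter = 1000000;
    auto tbeg = std::chrono::steady_clock::now();     // record starting time
    for (int iterations = 0; iterations < maxiter; iterations++)
    {
        // Code to be measured goes here (an empty loop may be
        // optimized away entirely, so measure real work).
    }
    auto tend = std::chrono::steady_clock::now();     // record end time
    double elapsedMs =
        std::chrono::duration<double, std::milli>(tend - tbeg).count();
    std::cout << elapsedMs / maxiter << " ms per iteration" << std::endl;
    return 0;
}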
63
Other tools under Linux/Solaris
  • Profiling
  • compile with the -pg option
  • analyze with gprof/prof
  • g++ -pg -o mycode mycode.cpp
  • ./mycode
  • gprof mycode > mycode_profile.txt

64
Sample Profile Output (gprof)
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 51.7      0.15      0.15   592607     0.00     0.00  dyneditdistance [5]
 34.5      0.25      0.10                             internal_mcount [6]
 10.3      0.28      0.03       52     0.58     3.65  depthfirst [1]
  3.4      0.29      0.01    93252     0.00     0.00  cuted [4]
  0.0      0.29      0.00     4607     0.00     0.00  editdistance [7]
  0.0      0.29      0.00        1     0.00   190.00  main [2]
65
Sample Profile Output (gprof)

                                    called/total      parents
index  %time    self  descendents   called+self    name       index
                                    called/total      children

                0.03        0.16        52/52         main [2]
[1]     65.5    0.03        0.16        52          depthfirst [1]
                0.01        0.15     93252/93252      cuted [4]
                0.00        0.00      4607/4607       editdistance [7]
-----------------------------------------------
                0.00        0.19         1/1          _start [3]
[2]     65.5    0.00        0.19         1          main [2]
                0.03        0.16        52/52         depthfirst [1]
-----------------------------------------------
....
66
Other Tools on Linux / Solaris
  • You can also use gcov as a profiling tool to help
    discover where your optimization efforts will
    best affect your code
  • how often each line of code executes
  • what lines of code are actually executed
  • how much computing time each section of code uses
  • http://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html

67
Trading Space for Time
  • Neural Network simulations very often use a
    function called the sigmoid

68
Trading Space for Time
  • Neural Network simulations very often use a
    function called the sigmoid
  • sigmoid(x) = 1 / (1 + exp(-x))
  • For large positive x, sigmoid(x) → 1
  • For large negative x, sigmoid(x) → 0

69
Computing the Sigmoid
double sigmoid(double x)   // k = 1
{
    return (1.0 / (1.0 + exp(-x)));
}
70
Computing the Sigmoid
  • exp(-x) takes too long to compute!
  • Such functions use series expansion
  • Taylor / Maclaurin series
  • Sum terms of the form ((-x)^n / n!)
  • Each term involves floating point calculations
  • Typical neural network simulations call this
    function hundreds of millions of times in a run.
  • Hence, sigmoid(x) accounts for most of the time
    spent (possibly over 70-80%)

71
Computing the Sigmoid
  • Instead of computing the function all the time
  • Compute the function at N points once and build a
    table.
  • During an actual call to sigmoid
  • Find the nearest table entries to x and look up
    the values stored for those
  • Perform linear interpolation

72
Computing the Sigmoid
[Figure: the sigmoid sampled at table points x0 ... x99]
73
Computing the Sigmoid
if (x < x0)  return (0.0);
. . .
if (x > x99) return (1.0);
74
Computing the Sigmoid
[Figure: the actual sigmoid versus its local linear interpolation
between neighboring sample points Xi and Xi+1, around a query point X]
75
Computing the Sigmoid
[Figure: the sigmoid approximated by line segments; each interval i
stores a slope(i) and an intercept inter(i), for i = 0 .. 98]

sigmoid(x) ≈ slope(i) * x + inter(i)
76
Computing the Sigmoid
[Figure repeated: the line-segment table of slope(i) and inter(i)]

sigmoid(x) ≈ slope(i) * x + inter(i)
77
Computing the Sigmoid
  • Select the number of points (N = 1000, 10000,
    etc.) depending on the accuracy you require
  • Space needed is 2 floats/doubles for each point,
    for slope and intercept (8-16 bytes per point)
  • Initialize the table when you start execution

78
Computing the Sigmoid
  • You also know X0
  • Compute Delta = X1 - X0
  • Compute Xmax = X0 + N * Delta
  • Given X
  • Compute i = (X - X0) / Delta
  • 1 float add, 1 float divide
  • Compute sigmoid(x) = slope(i) * x + inter(i)
  • 1 float multiplication, 1 float add
  • A sketch of the whole scheme follows below.
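A minimal sketch of the whole scheme (assumptions not in the slides:
N = 1000 points covering [-10, 10]; the names slope, inter, X0 and
Delta follow the slides):

#include <cmath>

const int    N     = 1000;            // number of table intervals
const double X0    = -10.0;           // left edge of the table
const double DELTA = 20.0 / N;        // spacing; table covers [-10, 10]

static double slope[N], inter[N];     // one line segment per interval

static double exactSigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Initialize the table once at startup: fit a line through each pair
// of neighboring sample points (local linear interpolation).
void initSigmoidTable()
{
    for (int i = 0; i < N; i++) {
        double x1 = X0 + i * DELTA, x2 = x1 + DELTA;
        double y1 = exactSigmoid(x1), y2 = exactSigmoid(x2);
        slope[i] = (y2 - y1) / (x2 - x1);
        inter[i] = y1 - slope[i] * x1;
    }
}

double sigmoid(double x)
{
    if (x < X0)              return 0.0;   // saturated for large negative x
    if (x >= X0 + N * DELTA) return 1.0;   // saturated for large positive x
    int i = (int)((x - X0) / DELTA);       // 1 float add, 1 float divide
    return slope[i] * x + inter[i];        // 1 multiply, 1 add
}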

79
Results
  • Using exp(x)
  • Each call takes about 300 nanoseconds (on a 2 GHz
    Pentium 4)

80
Results
  • Using exp(x)
  • Each call takes about 300 nanoseconds on a 2 GHz
    Pentium 4.
  • Using table lookup and linear interpolation
  • Each call takes about 30 nanoseconds
  • Ten-fold speed-up
  • in exchange for 64 KB to 640 KB of extra memory
    usage.