Title: OR682/Math685/CSI700
1. OR682/Math685/CSI700
2. Today's Topics
- Timing and Profiling (Chapter 6)
- Eliminating Clutter (Chapter 7)
- Loop Optimizations (Chapter 8)
3. Timing
- The Unix time command (last week)
  - User time: time spent executing the instructions in your program
  - System time: input/output, page faults, floating-point exceptions
  - CPU time: sum of the above
  - Elapsed time: wall-clock time
- Useful for timing entire programs
4. Timing a Section of a Program
- Use software that is part of the language
- Fortran: etime
    real*4 tarray(2), etime
    start = etime(tarray)
    ...
    finish = etime(tarray)
    print *, finish - start
- Matlab: tic, toc
- Work on an empty machine
- Average several runs
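The same start/finish pattern can be sketched in C, which the slides name alongside Fortran as a compiled option. This is only an illustration, assuming the standard `clock()` routine from `<time.h>` as the timer; the names `average_cpu_seconds` and `sample_work` are hypothetical and not part of the course files.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: average CPU seconds per call of work() over `runs` calls. */
double average_cpu_seconds(void (*work)(void), int runs)
{
    assert(work != NULL && runs > 0);
    clock_t start = clock();               /* CPU time used so far */
    for (int r = 0; r < runs; r++) {
        work();                            /* the section being timed */
    }
    clock_t finish = clock();
    return (double)(finish - start) / CLOCKS_PER_SEC / runs;
}

/* Hypothetical workload: sum the integers 1..1000000. */
static volatile double sink;               /* keeps the loop from being optimized away */

void sample_work(void)
{
    double s = 0.0;
    for (int i = 1; i <= 1000000; i++) {
        s += i;
    }
    sink = s;
}
```

Calling `average_cpu_seconds(sample_work, 10)` returns the mean CPU time of ten runs, which follows the advice to average several runs. Note that `clock()` measures CPU time, not elapsed wall-clock time.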
5. Profiling
- <example using Matlab>
- Gives detailed information on execution times of individual functions and statements
- In many cases, execution time is dominated by a single function or statement
Matlab files: Heath10_3.m, etc.
6. Using .mex Files
- Replace a Matlab function with a function programmed in C or Fortran
- <see example>
- Rest of lecture: other ways to improve performance
  - eliminating clutter
  - loop optimizations
Matlab files: run_mex_setup.m, run_mex.m, yp.m, yprime.c
7. Eliminating Clutter
- Contributions to overhead
  - subroutine calls, indirect memory references, tests within loops, type conversions
- Restrictions to compiler flexibility
  - subroutine calls, indirect memory references, tests within loops, ambiguous pointers
8. Subroutine Calls
- Overhead to process subroutine call
- Compiler often cannot optimize code across subroutines (especially if in different files)
- <see example pages 129-130>
- BUT don't make your program unreadable
Fortran files: eg_sub1.f, eg_sub2.f, eg_sub_run
9. Alternatives to Subroutines
- Macros: simple procedures
  - substituted at pre-processor stage
- Procedure inlining
  - handled by compiler directives
  - useful for not-so-simple procedures
  - see man f77, search for "inlin"
Fortran files: eg_macro.F, eg_macro_run
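The two alternatives can be compared side by side in C. This is a hedged sketch, not the contents of eg_macro.F: `SQUARE_MACRO`, `square_inline`, and the two summing routines are hypothetical names chosen for illustration, and whether the compiler actually inlines the function depends on its optimization settings.

```c
#include <assert.h>

/* Macro: expanded textually by the preprocessor, so no call is made.
   The argument is parenthesized, but it is still evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline function: the compiler may substitute the body at the call site. */
static inline double square_inline(double x)
{
    return x * x;
}

/* Sum of squares using the macro. */
double sum_squares_macro(const double *v, int n)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        s += SQUARE_MACRO(v[i]);
    }
    return s;
}

/* Sum of squares using the inline function. */
double sum_squares_inline(const double *v, int n)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        s += square_inline(v[i]);
    }
    return s;
}
```

Both routines compute the same value. The macro avoids call overhead unconditionally but offers no type checking, while the inline function keeps the readability of a procedure and leaves the substitution to the compiler.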
10. Branches Within Loops
- Can lead to severe inefficiencies
- see examples on pages 134-139
- go over
  - loop invariant conditionals
  - loop index dependent conditionals
  - conditionals that transfer control
Fortran files: eg_except1.f, eg_except2.f, eg_except_run; see man f77, search for "except"
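A loop invariant conditional, the first case listed above, can be illustrated with a short C sketch. The routine names and the `add` flag are hypothetical and are not taken from the textbook examples; the point is only that a test whose outcome never changes inside the loop can be moved outside it.

```c
#include <assert.h>

/* Test inside the loop: `add` never changes, yet it is checked on every pass. */
void update_test_inside(double *a, const double *b, int n, int add)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (add) {
            a[i] = a[i] + b[i];
        } else {
            a[i] = a[i] - b[i];
        }
    }
}

/* Test hoisted: decide once, then run a loop with no branch in its body. */
void update_test_hoisted(double *a, const double *b, int n, int add)
{
    assert(n >= 0);
    if (add) {
        for (int i = 0; i < n; i++) {
            a[i] = a[i] + b[i];
        }
    } else {
        for (int i = 0; i < n; i++) {
            a[i] = a[i] - b[i];
        }
    }
}
```

The two versions produce identical results. The hoisted form duplicates the loop, which costs some readability, so it is most worthwhile in loops that dominate the run time.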
11. Data Type Conversions
- Conversions: integer, single, double
- Adds overhead to loops
- Can be subtle
- 1, 0.3, 3.0e-1, 3.0d-1, .3000000000000000
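The subtlety of constant types can be sketched in C, where `0.3` is a double-precision constant and `0.3f` is single precision (the roles played by `3.0d-1` and `3.0e-1` in Fortran). The routine names below are hypothetical, and whether the conversion costs measurable time depends on the compiler and hardware.

```c
#include <assert.h>

/* Mixed types: 0.3 is a double constant, so each a[i] is widened to double,
   multiplied, and narrowed back to float on every iteration. */
void scale_mixed(float *a, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = a[i] * 0.3;
    }
}

/* Matching types: 0.3f is a float constant, so the multiply stays in single precision. */
void scale_single(float *a, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = a[i] * 0.3f;
    }
}

/* Returns 1 if single-precision 0.3 and double-precision 0.3 are different numbers. */
int single_differs_from_double(void)
{
    float  s = 0.3f;
    double d = 0.3;
    return (double)s != d;
}
```

Because 0.3 has no exact binary representation, the single and double constants are genuinely different values, so the choice of constant can affect results as well as speed.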
12. Loop Optimizations
- Loop unrolling
- Loop interchange
- Blocking
- plus others in text
13. Loop Unrolling
- Sometimes done automatically by compiler
- Reduce loop overhead by explicitly expanding several iterations
    DO I = 1,N,4
      A(I) = A(I) + B(I) * C
      A(I+1) = A(I+1) + B(I+1) * C
      A(I+2) = A(I+2) + B(I+2) * C
      A(I+3) = A(I+3) + B(I+3) * C
    END DO
- Needs tidy-up calculations at end (when N is not a multiple of 4)
http://www.netlib.org/slatec/lin/scopy.f
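The tidy-up step is easiest to see in a complete routine. The following C sketch unrolls an update of the form a(i) = a(i) + b(i)*c by four and finishes the leftover iterations in a second loop; the function names are hypothetical and the code is an illustration, not a copy of the SLATEC routine linked above.

```c
#include <assert.h>

/* Straightforward loop: a[i] = a[i] + b[i] * c. */
void axpy_simple(double *a, const double *b, double c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = a[i] + b[i] * c;
    }
}

/* Same update unrolled by 4, followed by a tidy-up loop for the leftovers. */
void axpy_unrolled(double *a, const double *b, double c, int n)
{
    assert(n >= 0);
    int i;
    int limit = n - (n % 4);          /* largest multiple of 4 not exceeding n */
    for (i = 0; i < limit; i += 4) {
        a[i]     = a[i]     + b[i]     * c;
        a[i + 1] = a[i + 1] + b[i + 1] * c;
        a[i + 2] = a[i + 2] + b[i + 2] * c;
        a[i + 3] = a[i + 3] + b[i + 3] * c;
    }
    for (; i < n; i++) {              /* at most 3 remaining iterations */
        a[i] = a[i] + b[i] * c;
    }
}
```

The unrolled loop performs one loop test and one index increment per four updates instead of one per update. Many compilers apply this transformation on their own at higher optimization levels, so it is worth timing both versions before unrolling by hand.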
14. Loop Interchange
- For nested loops, consider changing the order of the inner and outer loops
- example pages 156-161
- May not be possible for the compiler to figure out
15. Example: Matrix Multiplication
- Traditional formula for C = AB: c(i,j) = sum over k of a(i,k) * b(k,j)
- In original form, not ideal for computation
- Can be improved by re-arrangement
Fortran files: eg_matmul1.f, eg_matmul2.f, eg_matmul_run
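One possible re-arrangement is a loop interchange, sketched here in C under the assumption that the matrices are square and stored row by row in flat arrays. The function names are hypothetical and this is not the code in eg_matmul1.f or eg_matmul2.f. Because C stores rows contiguously and Fortran stores columns contiguously, the loop order that gives unit-stride access differs between the two languages.

```c
#include <assert.h>

/* C = A*B for n-by-n matrices stored row by row.
   Textbook i-j-k order: the inner loop walks down a column of B (stride n). */
void matmul_ijk(const double *a, const double *b, double *c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Interchanged i-k-j order: the inner loop walks along rows of B and C (stride 1). */
void matmul_ikj(const double *a, const double *b, double *c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n * n; i++) {
        c[i] = 0.0;
    }
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            double aik = a[i * n + k];      /* reused across the whole inner loop */
            for (int j = 0; j < n; j++) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both routines perform the same multiplications and additions; only the order of memory accesses changes. For large n the unit-stride version typically makes better use of the cache.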
16. Example: Blocking
- Organize a matrix calculation to work with submatrices of a specified size k
- Allows program to exploit
  - cache
  - parallelism
- computation proportional to k^2 or k^3
- communication proportional to k
Fortran file: eg_block.f
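Blocking can be sketched for matrix multiplication in C as follows. This is a minimal illustration assuming square, row-major matrices; it is not the code in eg_block.f. The block size is called `bs` here (the slide's k) to avoid clashing with the loop index, and `matmul_blocked` and `min_int` are hypothetical names.

```c
#include <assert.h>

/* Smaller of two ints, used to clip blocks at the matrix edge. */
static int min_int(int x, int y)
{
    return x < y ? x : y;
}

/* C = A*B for n-by-n row-major matrices, computed in bs-by-bs blocks. */
void matmul_blocked(const double *a, const double *b, double *c, int n, int bs)
{
    assert(n >= 0 && bs > 0);
    for (int i = 0; i < n * n; i++) {
        c[i] = 0.0;
    }
    /* The outer three loops step from block to block. */
    for (int ii = 0; ii < n; ii += bs) {
        for (int kk = 0; kk < n; kk += bs) {
            for (int jj = 0; jj < n; jj += bs) {
                /* The inner three loops multiply one block of A by one block
                   of B and accumulate the product into one block of C. */
                for (int i = ii; i < min_int(ii + bs, n); i++) {
                    for (int k = kk; k < min_int(kk + bs, n); k++) {
                        double aik = a[i * n + k];
                        for (int j = jj; j < min_int(jj + bs, n); j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

If `bs` is chosen so that three blocks fit in cache together, each block is reused many times before it is evicted. The best value depends on the machine and is usually found by timing a few candidates.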
17. To Finish Up
- What we've accomplished
- What comes next