OR682/Math685/CSI700

Provided by: stephe129

1
OR682/Math685/CSI700
  • Lecture 14
  • Fall 2000

2
Today's Topics
  • Timing and Profiling (Chapter 6)
  • Eliminating Clutter (Chapter 7)
  • Loop Optimizations (Chapter 8)

3
Timing
  • The Unix 'time' command (last week)
  • User time: time spent executing the instructions
    in your program
  • System time: input/output, page faults,
    floating-point exceptions
  • CPU time: sum of the above
  • Elapsed time: wall-clock time
  • Useful for timing entire programs

4
Timing a Section of a Program
  • Use software that is part of the language
  • Fortran: etime
    • real*4 tarray(2), etime
    • start = etime(tarray)
    • finish = etime(tarray)
    • print *, finish-start
  • Matlab: tic, toc
  • Work on an empty machine
  • Average several runs
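
The same start/finish pattern can be sketched in C with the standard clock() function, which reports processor time and so plays roughly the role of etime here. The workload work() and the wrapper time_work() are hypothetical names used only for illustration.

```c
#include <time.h>

/* Hypothetical workload: sum the integers 1..n in floating point. */
double work(long n)
{
    double s = 0.0;
    for (long i = 1; i <= n; i++)
        s += (double)i;
    return s;
}

/* Time one call to work(n). The computed sum is stored in *result so the
   call cannot be discarded; the return value is CPU time in seconds. */
double time_work(long n, double *result)
{
    clock_t start = clock();
    *result = work(n);
    clock_t finish = clock();
    return (double)(finish - start) / CLOCKS_PER_SEC;
}
```

As the slide advises, call time_work several times on a quiet machine and average the returned values; a single short measurement can be smaller than the clock resolution.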

5
Profiling
  • <example using Matlab>
  • Gives detailed information on execution times of
    individual functions and statements
  • In many cases, execution time is dominated by a
    single function or statement

Matlab files: Heath10_3.m, etc.
6
Using .mex Files
  • Replace a Matlab function with a function
    programmed in C or Fortran
  • <see example>
  • Rest of lecture: other ways to improve performance
    • eliminating clutter
    • loop optimizations

Matlab files: run_mex_setup.m, run_mex.m, yp.m,
yprime.c
7
Eliminating Clutter
  • Contributions to overhead:
    • subroutine calls, indirect memory references,
      tests within loops, type conversions
  • Restrictions on compiler flexibility:
    • subroutine calls, indirect memory references,
      tests within loops, ambiguous pointers

8
Subroutine Calls
  • Overhead to process subroutine call
  • Compiler often cannot optimize code across
    subroutines (especially if in different files)
  • <see example, pages 129-130>
  • BUT don't make your program unreadable

Fortran files: eg_sub1.f, eg_sub2.f, eg_sub_run
9
Alternatives to Subroutines
  • Macros: simple procedures
    • substituted at pre-processor stage
  • Procedure inlining
    • handled by compiler directives
    • useful for not-so-simple procedures
    • see man f77; search for "inlin"
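
A minimal C sketch of the two alternatives, assuming a C99 compiler. SQUARE is substituted by the preprocessor, while square() is an ordinary function that the compiler is allowed to inline. All names here are illustrative, not taken from the course files.

```c
/* Macro: expanded textually before compilation. The arguments are
   parenthesized to avoid operator-precedence surprises. */
#define SQUARE(x) ((x) * (x))

/* Inline function: the compiler may substitute the body at the call
   site, and it still type-checks the argument. */
static inline double square(double x)
{
    return x * x;
}

/* Sum of squares using the macro. */
double sum_squares_macro(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += SQUARE(v[i]);
    return s;
}

/* Sum of squares using the inline function. */
double sum_squares_inline(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += square(v[i]);
    return s;
}
```

Both versions avoid a call per element when the substitution happens; the inline function is usually the safer choice because a macro evaluates its argument text more than once.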

Fortran files: eg_macro.F, eg_macro_run
10
Branches Within Loops
  • Can lead to severe inefficiencies
  • see examples on pages 134-139
  • go over:
    • loop invariant conditionals
    • loop index dependent conditionals
    • conditionals that transfer control
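
As an illustration of the first case, a loop invariant conditional can be moved outside the loop. The sketch below is in C with hypothetical function names; the flag scale does not change inside the loop, so it only needs to be tested once.

```c
/* Before: the test on `scale` is repeated on every iteration even
   though its outcome never changes inside the loop. */
void update_slow(double *a, const double *b, int n, int scale, double c)
{
    for (int i = 0; i < n; i++) {
        if (scale)
            a[i] += b[i] * c;
        else
            a[i] += b[i];
    }
}

/* After: test once, then run one of two branch-free loops. */
void update_fast(double *a, const double *b, int n, int scale, double c)
{
    if (scale) {
        for (int i = 0; i < n; i++)
            a[i] += b[i] * c;
    } else {
        for (int i = 0; i < n; i++)
            a[i] += b[i];
    }
}
```

Many optimizing compilers can perform this transformation themselves, so it is worth timing both forms before restructuring code by hand.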

Fortran files: eg_except1.f, eg_except2.f,
eg_except_run; man f77, search for "except"
11
Data Type Conversions
  • Conversions: integer, single, double
  • Adds overhead to loops
  • Can be subtle:
    • 1, 0.3, 3.0e-1, 3.0d-1, .3000000000000000
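
The subtlety carries over to C, where the spelling of a constant decides its type. In this sketch (function names are hypothetical) the constant 0.3 is a double, so every element of a float array is widened, multiplied in double, and narrowed again; writing 0.3f keeps the arithmetic in single precision.

```c
/* Mixed precision: x[i] is converted to double, multiplied by the
   double constant 0.3, and the product is converted back to float. */
void scale_mixed(float *y, const float *x, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] * 0.3;
}

/* Single precision throughout: 0.3f is a float constant, so no
   conversion is needed inside the loop. */
void scale_single(float *y, const float *x, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] * 0.3f;
}
```

The two routines can differ in the last bit of some results, which is one more reason to choose the constant's type deliberately.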

12
Loop Optimizations
  • Loop unrolling
  • Loop interchange
  • Blocking
  • plus others in text

13
Loop Unrolling
  • Sometimes done automatically by compiler
  • Reduce loop overhead by explicitly expanding
    several iterations
  • DO I = 1,N,4
  •   A(I) = A(I) + B(I) * C
  •   A(I+1) = A(I+1) + B(I+1) * C
  •   A(I+2) = A(I+2) + B(I+2) * C
  •   A(I+3) = A(I+3) + B(I+3) * C
  • END DO
  • Needs tidy-up calculations at end
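
A C sketch of the same unrolled update, including the tidy-up loop for the case where N is not a multiple of 4. The function name is hypothetical and indices start at 0 as usual in C.

```c
/* Compute a[i] += b[i] * c for i = 0..n-1, unrolled by a factor of 4. */
void axpy_unrolled(double *a, const double *b, double c, int n)
{
    int i;
    int limit = n - (n % 4);      /* largest multiple of 4 not above n */

    /* Main unrolled loop: one loop test per four updates. */
    for (i = 0; i < limit; i += 4) {
        a[i]     += b[i]     * c;
        a[i + 1] += b[i + 1] * c;
        a[i + 2] += b[i + 2] * c;
        a[i + 3] += b[i + 3] * c;
    }

    /* Tidy-up loop: the remaining 0 to 3 elements. */
    for (; i < n; i++)
        a[i] += b[i] * c;
}
```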

http://www.netlib.org/slatec/lin/scopy.f
14
Loop Interchange
  • For nested loops, consider changing the order of
    the inner and outer loops
  • example: pages 156-161
  • May not be possible for the compiler to figure out
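
A small C sketch of interchanging a loop nest. Note the assumption: C stores arrays by rows, the opposite of Fortran, so in C the favorable order has the column index in the inner loop. The array sizes and function names are chosen only for illustration.

```c
#define NROWS 4
#define NCOLS 3

/* Column index outermost: consecutive inner iterations touch elements
   that are NCOLS doubles apart in memory (strided access in C). */
void scale_col_outer(double a[NROWS][NCOLS], double c)
{
    for (int j = 0; j < NCOLS; j++)
        for (int i = 0; i < NROWS; i++)
            a[i][j] *= c;
}

/* Interchanged: row index outermost, so the inner loop walks through
   adjacent memory locations (unit stride in C). */
void scale_row_outer(double a[NROWS][NCOLS], double c)
{
    for (int i = 0; i < NROWS; i++)
        for (int j = 0; j < NCOLS; j++)
            a[i][j] *= c;
}
```

Both routines produce the same result; the difference in speed only becomes visible when the array is much larger than the cache.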

15
Example: Matrix Multiplication
  • Traditional formula for C = AB:
    c(i,j) = sum over k of a(i,k) b(k,j)
  • In original form, not ideal for computation
  • Can be improved by re-arrangement
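
One possible re-arrangement, sketched in C under the assumption of square n-by-n matrices stored by rows in flat arrays. This is not necessarily the ordering used in the course's Fortran files; because Fortran stores by columns, the favorable loop order there is different.

```c
/* Direct use of the formula: the inner loop over k walks down a
   column of b, which is a stride-n access in row-major storage. */
void matmul_ijk(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Re-arranged: the inner loop over j walks along rows of b and c,
   so every access in the inner loop has unit stride. */
void matmul_ikj(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (int j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```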

Fortran files: eg_matmul1.f, eg_matmul2.f,
eg_matmul_run
16
Example: Blocking
  • Organize a matrix calculation to work with
    submatrices of a specified size k
  • Allows program to exploit:
    • cache
    • parallelism
  • computation proportional to k^2 or k^3
  • communication proportional to k
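
A minimal C sketch of blocking applied to matrix multiplication, assuming square n-by-n matrices stored by rows in flat arrays. The block size k is a parameter, and the helper min_int and the function name are hypothetical.

```c
/* Smaller of two ints; used to clip blocks at the matrix edge. */
static int min_int(int x, int y)
{
    return x < y ? x : y;
}

/* c = a * b, computed one k-by-k submatrix at a time so that the
   blocks being combined can stay in cache while they are reused. */
void matmul_blocked(int n, int k, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (int ii = 0; ii < n; ii += k)
        for (int pp = 0; pp < n; pp += k)
            for (int jj = 0; jj < n; jj += k)
                /* Multiply block (ii,pp) of a by block (pp,jj) of b
                   and accumulate into block (ii,jj) of c. */
                for (int i = ii; i < min_int(ii + k, n); i++)
                    for (int p = pp; p < min_int(pp + k, n); p++) {
                        double aip = a[i * n + p];
                        for (int j = jj; j < min_int(jj + k, n); j++)
                            c[i * n + j] += aip * b[p * n + j];
                    }
}
```

A good value of k depends on the cache size of the machine, so it is usually found by timing a few candidates.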

Fortran file: eg_block.f
17
To Finish Up
  • What we've accomplished
  • What comes next