Fortran 90 Parallel Programming with - PowerPoint PPT Presentation

About This Presentation
Title:

Fortran 90 Parallel Programming with

Description:

OpenMP is an Application Programming Interface (API) consisting of: OpenMP Compiler ... Sentinel. OpenMP Directive. Compile and Execute F90 program. SERIAL run: ... – PowerPoint PPT presentation

Provided by: nickche

Transcript and Presenter's Notes


1
Fortran 90 Parallel Programming with
  • CESUP/UFRGS June 10, 2003

2
Components of OpenMP
  • OpenMP is an Application Programming Interface
    (API) consisting of
  • OpenMP Compiler Directives
  • Runtime Library Routines
  • Environment Variables

3
My First F90 OpenMP program
  • PROGRAM HELLO
  • INTEGER NTHREADS, TID
  • !$ integer OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
  • TID = 0
  • NTHREADS = 1
  • ! Fork a team of NTHREADS threads
  • !$OMP PARALLEL PRIVATE(TID)
  • ! Obtain thread number
  • !$ TID = OMP_GET_THREAD_NUM()
  • PRINT *, 'Hello World from thread ', TID
  • ! Only master thread does this
  • IF (TID .EQ. 0) THEN
  • !$ NTHREADS = OMP_GET_NUM_THREADS()
  • PRINT *, 'Number of threads ', NTHREADS
  • END IF
  • !$OMP END PARALLEL
  • END

Slide callouts: Runtime Library Routines, Sentinel, OpenMP Directive

4
Compile and Execute F90 program
  • SERIAL run
  • f90 -x omp -o hello.x omp_hello.f90
  • hello.x
  • PARALLEL OpenMP run
  • f90 -o hello.x omp_hello.f90
  • env OMP_NUM_THREADS=4 \
  • env OMP_SCHEDULE=STATIC hello.x

5
Output from My First F90 program
  • jupiter/home0/nick/115 f90 -x omp -o hello.x \
  • My_First_OpemMP.f90 hello.x
  • Hello World from thread 0
  • Number of threads 1
  • jupiter/home0/nick/116 f90 -o hello_p.x \
  • My_First_OpemMP.f90
  • jupiter/home0/nick/117 env \
  • OMP_NUM_THREADS=4 env \
  • OMP_SCHEDULE=STATIC hello_p.x
  • Hello World from thread 2
  • Number of threads 4
  • Hello World from thread 0
  • Hello World from thread 1
  • Hello World from thread 3

6
Parallel Region Constructor
  • !$OMP PARALLEL PRIVATE(TID)
  • !$ TID = OMP_GET_THREAD_NUM()
  • PRINT *, 'Hello World from thread ', TID
  • !$OMP END PARALLEL

Figure: the serial region runs on a single thread (TID=0); the parallel region runs on threads TID=0, TID=1, TID=2 and TID=3.

7
Parallel Region Sum Series Example
  • The program presented in the NOTES PAGES sums the
    following series up to N terms
  • Prod_ABC = 1·2·3 + 2·3·4 + ... + N(N+1)(N+2)
  • The closed form for the above sum is
  • Close_Form_Ans = N(N+1)(N+2)(N+3)/4
  • The program is serial (i.e. it uses 1 cpu) and
    our job is to parallelize this program.
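
The program in the notes pages is not part of this transcript. A minimal serial sketch of the series sum might look like the following. The term count n = 100 and the variable roles (Prod_ABC as the individual term, Scal_Prod_ABC as the running sum) are assumptions inferred from the later slides, not the original source.

```fortran
program sum_series
  implicit none
  integer, parameter :: n = 100          ! assumed number of terms
  integer :: i
  double precision :: Prod_ABC, Scal_Prod_ABC, Close_Form_Ans

  Scal_Prod_ABC = 0.0d0
  do i = 1, n
     Prod_ABC = dble(i) * dble(i + 1) * dble(i + 2)   ! i-th term
     Scal_Prod_ABC = Scal_Prod_ABC + Prod_ABC         ! running sum
  end do

  ! Closed form N(N+1)(N+2)(N+3)/4 for comparison
  Close_Form_Ans = dble(n) * dble(n + 1) * dble(n + 2) * dble(n + 3) / 4.0d0

  print *, 'Summed series = ', Scal_Prod_ABC
  print *, 'Closed form   = ', Close_Form_Ans
end program sum_series
```

The two printed values should agree, which gives a simple correctness check for every parallel version that follows.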

8
First Attempt at Parallelizing the Program
  • Following segments were added
  • Defaults for serial run
  • Parallel directive in which
  • i_start and i_end are calculated
  • adjustment of i_end for last slave
  • Many printout statements
  • Looks good, but will it work?
  • Are there any DATA DEPENDENCES?

9
Comments about the First Attempt
  • The variable NUM_THREADS has a SHARED attribute.
    There is no need for each thread in the parallel
    region to call the omp_get_num_threads()
    function.
  • The shared variables Scal_Prod_ABC and Prod_ABC
    must be synchronized.
  • The work in the original DO loop was distributed
    among the THREADS. Each thread starts at its own
    i_start value. All threads perform my_width
    iterations, except for the last thread, whose
    range might be shorter.
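
The points above can be sketched as follows. This is not the program from the notes pages: i_start, i_end and my_width follow the slide, while my_sum, the SINGLE block and the ceiling-division split are assumptions made for illustration.

```fortran
program sum_series_manual
!$ use omp_lib
  implicit none
  integer, parameter :: n = 100          ! assumed number of terms
  integer :: i, i_start, i_end, my_width, tid, num_threads
  double precision :: my_sum, Scal_Prod_ABC

  ! Defaults for a serial run
  tid = 0
  num_threads = 1
  Scal_Prod_ABC = 0.0d0

!$OMP PARALLEL PRIVATE(tid, i, i_start, i_end, my_width, my_sum)
  ! num_threads is SHARED, so one thread sets it for the whole team;
  ! END SINGLE carries an implied barrier
!$OMP SINGLE
!$ num_threads = omp_get_num_threads()
!$OMP END SINGLE
!$ tid = omp_get_thread_num()

  ! Ceiling division: the last thread may get a shorter range
  my_width = (n + num_threads - 1) / num_threads
  i_start = tid * my_width + 1
  i_end = min(i_start + my_width - 1, n)

  my_sum = 0.0d0                         ! hypothetical per-thread partial sum
  do i = i_start, i_end
     my_sum = my_sum + dble(i) * dble(i + 1) * dble(i + 2)
  end do

  ! Synchronize the update of the shared total
!$OMP CRITICAL
  Scal_Prod_ABC = Scal_Prod_ABC + my_sum
!$OMP END CRITICAL
!$OMP END PARALLEL

  print *, 'Summed series = ', Scal_Prod_ABC
end program sum_series_manual
```

Because every OpenMP call sits behind the `!$` sentinel, the same source should also build and run serially with the defaults tid = 0 and num_threads = 1.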

10
Work-sharing construct inside a parallel region
  • The parallel region
  • !$OMP PARALLEL
  • !$OMP DO
  • <do-loop>
  • !$OMP END DO
  • !$OMP DO
  • <another do-loop>
  • !$OMP END DO
  • !$OMP END PARALLEL
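
As a concrete instance of this pattern, the sketch below places two work-shared loops inside one parallel region. The arrays a and b and the size n are hypothetical. The second loop reads elements of a written by other threads, which is safe here because END DO implies a barrier.

```fortran
program two_loops
  implicit none
  integer, parameter :: n = 8            ! assumed array size
  integer :: i
  double precision :: a(n), b(n)

!$OMP PARALLEL
!$OMP DO
  do i = 1, n
     a(i) = dble(i)
  end do
!$OMP END DO
  ! Implied barrier: all of a is defined before the next loop starts
!$OMP DO
  do i = 1, n
     b(i) = 2.0d0 * a(n + 1 - i)
  end do
!$OMP END DO
!$OMP END PARALLEL

  print *, b
end program two_loops
```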

11
PARALLEL DO construct
  • The parallel construct
  • !$OMP PARALLEL DO clause1, ...
  • <single do-loop>
  • !$OMP END PARALLEL DO

12
A more compact formulation
  • The following program uses the PARALLEL DO
    construct and also resolves the DATA DEPENDENCE
    by introducing a CRITICAL directive
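
The program itself lives in the notes pages and is not reproduced here. A minimal sketch of the idea, assuming the series sum from the earlier slides and n = 100, could be:

```fortran
program sum_series_critical
  implicit none
  integer, parameter :: n = 100          ! assumed number of terms
  integer :: i
  double precision :: Prod_ABC, Scal_Prod_ABC

  Scal_Prod_ABC = 0.0d0
!$OMP PARALLEL DO PRIVATE(Prod_ABC)
  do i = 1, n
     Prod_ABC = dble(i) * dble(i + 1) * dble(i + 2)
     ! Only one thread at a time may update the shared sum
!$OMP CRITICAL
     Scal_Prod_ABC = Scal_Prod_ABC + Prod_ABC
!$OMP END CRITICAL
  end do
!$OMP END PARALLEL DO

  print *, 'Summed series = ', Scal_Prod_ABC
end program sum_series_critical
```

This gives the right answer, but the CRITICAL section is entered once per iteration, so the threads spend much of their time waiting for one another.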

13
Comments on the Compact Formulation
  • The directive !$OMP PARALLEL DO is the same as
    the pair of directives
  • !$OMP PARALLEL
  • !$OMP DO
  • <single do-loop>
  • !$OMP END DO
  • !$OMP END PARALLEL

14
Final version with a SUM REDUCTION
  • The PARALLEL DO construct has two clauses
  • REDUCTION(+:Scal_Prod_ABC), which replaces the
    CRITICAL section.
  • PRIVATE clause for the Prod_ABC variable.
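
A sketch of what such a final version might look like, again assuming n = 100 and the variable roles described on the slides:

```fortran
program sum_series_reduction
  implicit none
  integer, parameter :: n = 100          ! assumed number of terms
  integer :: i
  double precision :: Prod_ABC, Scal_Prod_ABC

  Scal_Prod_ABC = 0.0d0
  ! Each thread accumulates a private partial sum; the partial sums
  ! are combined into Scal_Prod_ABC at the end of the loop
!$OMP PARALLEL DO REDUCTION(+:Scal_Prod_ABC) PRIVATE(Prod_ABC)
  do i = 1, n
     Prod_ABC = dble(i) * dble(i + 1) * dble(i + 2)
     Scal_Prod_ABC = Scal_Prod_ABC + Prod_ABC
  end do
!$OMP END PARALLEL DO

  print *, 'Summed series = ', Scal_Prod_ABC
end program sum_series_reduction
```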

15
Parallel Region Constructor - Revisited
  • !$OMP PARALLEL clause1 clause2
  • <structured block of code>
  • !$OMP END PARALLEL
Figure: the master thread runs the serial region; thread0, thread1, thread2 and thread3 run the parallel region.

16
FIRSTPRIVATE and LASTPRIVATE (1/3)

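Figure: with FIRSTPRIVATE, the value X from the serial region is copied into each thread's private copy in the parallel region. With LASTPRIVATE, the threads hold the values A, B, C and D in the parallel region, and the last value, D, is copied into the following serial region.
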
17
FIRSTPRIVATE and LASTPRIVATE (2/3)
  • The initial and final values of PRIVATE variables
    are unspecified
  • A FIRSTPRIVATE variable is private, and its
    initial value is copied from the preceding serial
    region into the current parallel region
  • A LASTPRIVATE variable is private, and its value
    from the sequentially last iteration is copied
    into the serial region following the current
    parallel region
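
A small sketch of both clauses on one loop. The variable names x and y and the loop length n are hypothetical.

```fortran
program first_last_private
  implicit none
  integer, parameter :: n = 8            ! assumed loop length
  integer :: i, x, y

  x = 10                                 ! set in the serial region
  y = -1
!$OMP PARALLEL DO FIRSTPRIVATE(x) LASTPRIVATE(y)
  do i = 1, n
     ! Every thread's private x starts with the serial value of x
     y = x + i
  end do
!$OMP END PARALLEL DO

  ! y now holds the value assigned in iteration i = n
  print *, 'y after the loop = ', y
end program first_last_private
```

With a plain PRIVATE(x, y) instead, x would be undefined inside the loop and y would be undefined after it.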

18
FIRSTPRIVATE and LASTPRIVATE (3/3)
  • Before and after the parallel region, the array
    variable zzz is global.
  • Inside the parallel region zzz is recomputed, but
    only the last two elements of zzz are copied to
    the next serial region.
  • All elements of zzz are zero in the serial region
    following the parallel region, except for the
    first two elements.

19
DATA SCOPING - Lexical and Dynamic
  • program bom_dia
  • !$OMP PARALLEL
  • call greetings
  • !$OMP END PARALLEL
  • end
  • subroutine greetings
  • !$ external OMP_GET_THREAD_NUM
  • !$ integer OMP_GET_THREAD_NUM
  • integer TID
  • character*12, dimension(0:3) :: saludos
  • DATA saludos /"Bom dia","Buenos dias",
  • "Good morning","Bon jour"/
  • !$OMP CRITICAL
  • TID = OMP_GET_THREAD_NUM()
  • write(6,1001) TID, saludos(TID)
  • !$OMP END CRITICAL
  • 1001 format("TID=",i1," ",a12)
  • return
  • end

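Figure labels: LEXICAL EXTENT (the code written textually inside the PARALLEL construct) and DYNAMIC EXTENT (the lexical extent plus the routines called from it, here subroutine greetings).
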
20
Threadprivate Directive (1/5)
  • The program appearing in NOTES PAGES produces
    different results when run serially and when run
    in parallel. Why?

21
Threadprivate Directive (2/5)
  • The variables istart and iend are now passed as
    arguments in the call to subroutine work, and the
    COMMON block was removed, but the results are
    still different. Why?
  • Because the DO REDUCTION uses some values of
    array iarray before they are defined!
  • All values of iarray must be defined first; only
    then can the summation start.

22
Threadprivate Directive (3/5)
  • Here we use a separate region for the sum
    REDUCTION. Now parallel and serial versions
    produce identical results
  • isum = 338350
  • closed_form = 338350
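
The notes-page program is not included in this transcript. The sketch below assumes iarray(i) = i*i and the closed form NN(NN+1)(2NN+1)/6, which is consistent with the values 338350 (NN = 100) and 385 (NN = 10) quoted on the slides, but the original program may differ.

```fortran
program sum_squares
  implicit none
  integer, parameter :: nn = 100
  integer :: i, isum, closed_form
  integer :: iarray(nn)

  ! First region: define every element of iarray
!$OMP PARALLEL DO
  do i = 1, nn
     iarray(i) = i * i                   ! assumed definition of iarray
  end do
!$OMP END PARALLEL DO

  ! Second region: start summing only after iarray is complete
  isum = 0
!$OMP PARALLEL DO REDUCTION(+:isum)
  do i = 1, nn
     isum = isum + iarray(i)
  end do
!$OMP END PARALLEL DO

  closed_form = nn * (nn + 1) * (2 * nn + 1) / 6
  print *, 'isum = ', isum
  print *, 'closed_form = ', closed_form
end program sum_squares
```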

23
Threadprivate Directive (4/5)
  • The threadprivate directive makes the COMMON
    block above it private to each thread. This is a
    useful alternative to the previous program,
    especially when the number of variables that need
    to be private is large.
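
A hedged sketch of this approach follows. The subroutine name work and the variables istart, iend and iarray follow the slides; the COMMON block name /bounds/, the variable chunk and the definition iarray(i) = i*i are assumptions.

```fortran
program threadprivate_demo
!$ use omp_lib
  implicit none
  integer, parameter :: nn = 100
  integer :: istart, iend                ! per-thread loop bounds
  integer :: iarray(nn), isum, closed_form
  integer :: i, tid, nthreads, chunk
  common /bounds/ istart, iend
!$OMP THREADPRIVATE(/bounds/)

  ! Defaults for a serial run
  tid = 0
  nthreads = 1

!$OMP PARALLEL PRIVATE(tid, nthreads, chunk)
!$ tid = omp_get_thread_num()
!$ nthreads = omp_get_num_threads()
  chunk = (nn + nthreads - 1) / nthreads
  ! Each thread writes its own copy of the COMMON block
  istart = tid * chunk + 1
  iend = min(istart + chunk - 1, nn)
  call work(iarray)
!$OMP END PARALLEL

  isum = 0
!$OMP PARALLEL DO REDUCTION(+:isum)
  do i = 1, nn
     isum = isum + iarray(i)
  end do
!$OMP END PARALLEL DO

  closed_form = nn * (nn + 1) * (2 * nn + 1) / 6
  print *, 'isum = ', isum
  print *, 'closed_form = ', closed_form
end program threadprivate_demo

subroutine work(iarray)
  implicit none
  integer :: iarray(*)
  integer :: istart, iend, i
  common /bounds/ istart, iend
  ! The directive must be repeated wherever the block is declared
!$OMP THREADPRIVATE(/bounds/)

  do i = istart, iend
     iarray(i) = i * i                   ! assumed definition of iarray
  end do
end subroutine work
```

Without the THREADPRIVATE directive the COMMON block would be shared, and the threads would overwrite each other's istart and iend before calling work.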

24
Threadprivate Directive (5/5)
  • The COPYIN clause copies the threadprivate
    variables from the master thread to all the
    slaves
  • OUTPUT
  • NN = 10
  • isum = 385
  • closed_form = 385
  • NN = 100
  • isum = 338350
  • closed_form = 338350
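
A minimal sketch of COPYIN, assuming NN is held in a threadprivate COMMON block (the block name /params/ is hypothetical) and that the sum is over squares, as the OUTPUT values suggest:

```fortran
program copyin_demo
  implicit none
  integer :: nn, i, isum, closed_form
  common /params/ nn
!$OMP THREADPRIVATE(/params/)

  nn = 10                                ! set by the master thread only
  isum = 0

  ! COPYIN gives every thread the master's value of nn on entry
!$OMP PARALLEL COPYIN(/params/)
!$OMP DO REDUCTION(+:isum)
  do i = 1, nn
     isum = isum + i * i
  end do
!$OMP END DO
!$OMP END PARALLEL

  closed_form = nn * (nn + 1) * (2 * nn + 1) / 6
  print *, 'NN = ', nn
  print *, 'isum = ', isum
  print *, 'closed_form = ', closed_form
end program copyin_demo
```

Without COPYIN, the copies of nn on the non-master threads would be undefined on entry to the parallel region.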