Title: Introduction to OpenMP
1. Introduction to OpenMP
- For a more detailed tutorial see
- http://www.openmp.org
- Look at the presentations
2. Concepts
- Directive based programming
- declare properties of language structures (sections, loops)
- scope variables
- A few service routines
- get information
- Compiler options
- Environment variables
3. OpenMP Programming Model
- fork-join parallelism
- Master thread spawns a team of threads as needed.
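A minimal sketch of fork-join (the messages are illustrative, not from the slides): the master thread runs alone until it reaches a parallel region, forks a team, and the team joins again at the region's end.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        printf("serial part: master thread only\n");

        #pragma omp parallel    /* fork: a team of threads runs this block */
        {
            printf("hello from thread %d\n", omp_get_thread_num());
        }                       /* join: implicit barrier, back to one thread */

        printf("serial part again: master thread continues\n");
        return 0;
    }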
4. Typical OpenMP Use
- Generally used to parallelize loops
- Find most time consuming loops
- Split iterations up between threads
    /* parallel version */
    void main() {
        double Res[1000];
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            do_huge_comp(Res[i]);
    }

    /* sequential version */
    void main() {
        double Res[1000];
        for (int i = 0; i < 1000; i++)
            do_huge_comp(Res[i]);
    }
5. Thread Interaction
- OpenMP operates using shared memory
- Threads communicate via shared variables
- Unintended sharing can lead to race conditions
- output changes due to thread scheduling
- Control race conditions using synchronization
- synchronization is expensive
- change the way data is stored to minimize the
need for synchronization
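A hedged sketch of the race just described (the counter and loop bound are made up): without synchronization, two threads can read the same value of a shared counter and one increment is lost; a critical section serializes the update.

    int main(void) {
        int hits = 0;

        #pragma omp parallel for
        for (int i = 0; i < 100000; i++) {
            /* a plain hits++ would be a race: the load, add, and store
               of two threads can interleave and lose an increment */
            #pragma omp critical
            hits++;
        }
        return (hits == 100000) ? 0 : 1;   /* always 100000 with the critical */
    }

As the bullets above note, this synchronization is expensive; restructuring the data (for example, per-thread counters combined at the end) reduces the need for it.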
6. Syntax format
- Compiler directives
- C/C++
- #pragma omp construct [clause [clause] ...]
- Fortran
- C$OMP construct [clause [clause] ...]
- !$OMP construct [clause [clause] ...]
- *$OMP construct [clause [clause] ...]
- Since we use directives, no changes need to be made to a program for a compiler that doesn't support OpenMP
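A sketch of that portability point, using the standard _OPENMP macro (the print statements are only illustrative): a compiler without OpenMP support simply ignores the pragma, and the macro guards any calls into the OpenMP runtime.

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>              /* runtime header exists only with OpenMP */
    #endif

    int main(void) {
        #pragma omp parallel      /* unknown pragma: ignored without OpenMP */
        {
    #ifdef _OPENMP
            printf("thread %d\n", omp_get_thread_num());
    #else
            printf("compiled without OpenMP: single thread\n");
    #endif
        }
        return 0;
    }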
7. Using OpenMP
- Compilers can automatically place directives with the option -qsmp=auto
- xlf_r and xlc do a good job
- some loops may speed up, some may slow down
- Compiler option required when you write in directives
- -qsmp=omp (IBM)
- -mp (SGI)
- Can mix directives with automatic parallelization
- -qsmp=auto:omp
- Scoping variables is the hard part!
- shared variables, thread private variables
8. OpenMP Directives
- 5 categories
- Parallel Regions
- Worksharing
- Data Environment
- Synchronization
- Runtime functions / environment variables
- Basically the same between C/C++ and Fortran
9. Parallel Regions
- Create threads with omp parallel
- Threads share A (default behavior)
- Threads all start at the same time, then synchronize at a barrier at the end before continuing with the code.
    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        do_something(ID, A);
    }
10. Sections construct
- The sections construct gives a different structured block to each thread
- By default there is a barrier at the end. Use the nowait clause to turn it off.
    #pragma omp parallel
    #pragma omp sections
    {
        X_calculation();
    #pragma omp section
        y_calculation();
    #pragma omp section
        z_calculation();
    }
11. Work-sharing constructs
- The for construct splits up loop iterations
- By default, there is a barrier at the end of the
omp for. Use the nowait clause to turn off
the barrier.
    #pragma omp parallel
    #pragma omp for
    for (I = 0; I < N; I++)
        NEAT_STUFF(I);
12. Short-hand notation
- Can combine parallel and work sharing constructs
- There is also a parallel sections construct
    #pragma omp parallel for
    for (I = 0; I < N; I++)
        NEAT_STUFF(I);
13. A Rule
- In order to be made parallel, a loop must have
canonical shape
    for (index = start; index {<, <=, >, >=} end;
         {index++, ++index, index--, --index,
          index += inc, index -= inc,
          index = index + inc, index = inc + index,
          index = index - inc})
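For contrast, a small sketch (variable names assumed declared as shown) of one loop that fits the canonical shape and two that do not:

    void shape_examples(double *a, int n, double eps) {
        int i;

        /* canonical: index, test, and increment have the required form */
        for (i = 0; i < n; i += 4)
            a[i] = 0.0;

        /* NOT canonical: the trip count cannot be computed in advance */
        i = 0;
        while (a[i] > eps)
            i++;

        /* NOT canonical: break may leave the loop before the test fails */
        for (i = 0; i < n; i++)
            if (a[i] < 0.0)
                break;
    }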
14. An example
    #pragma omp parallel for private(j)
    for (i = 0; i < BLOCK_SIZE(id, p, n); i++)
        for (j = 0; j < n; j++)
            a[i][j] = MIN(a[i][j], a[i][k] + tmp[j]);
By definition, private variable values are undefined at loop entry and exit. To change this behavior, you can use the firstprivate(var) and lastprivate(var) clauses.
    x[0] = complex_function();
    #pragma omp parallel for private(j) firstprivate(x)
    for (i = 0; i < n; i++) {
        for (j = 1; j < m; j++)
            x[j] = g(i, x[j-1]);
        answer[i] = x[j] * x[i];
    }
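The example above uses firstprivate; here is a complementary hedged sketch of lastprivate (the function and arrays are invented): the value from the sequentially last iteration is copied back to the shared variable when the loop ends.

    double fill_and_keep_last(double *y, int n) {
        double x = 0.0;
        int i;

        #pragma omp parallel for lastprivate(x)
        for (i = 0; i < n; i++) {
            x = (i + 0.5) / n;    /* each thread works on its own x */
            y[i] = x;
        }
        /* lastprivate: x now holds the value from iteration i == n-1 */
        return x;
    }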
15. Scheduling Iterations
- The schedule clause affects how loop iterations are mapped onto threads
- schedule(static, chunk)
- Deal out blocks of iterations of size chunk to each thread.
- schedule(dynamic, chunk)
- Each thread grabs chunk iterations off a queue until all iterations have been handled.
- schedule(guided, chunk)
- Threads dynamically grab blocks of iterations. The size of the block starts large and shrinks down to size chunk as the calculation proceeds.
- schedule(runtime)
- Schedule and chunk size taken from the OMP_SCHEDULE environment variable.
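A minimal sketch of schedule(runtime) (function name and loop body are placeholders): the program is compiled once, and the schedule is picked from the environment at launch.

    #include <math.h>

    /* run with e.g.:  export OMP_SCHEDULE="dynamic,4" */
    void scale_all(double *a, int n) {
        #pragma omp parallel for schedule(runtime)
        for (int i = 0; i < n; i++)
            a[i] = sqrt(a[i]);    /* schedule chosen at run time */
    }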
16. An example
    #pragma omp parallel for private(j) schedule(static, 2)
    for (i = 0; i < n; i++)
        for (j = 1; j < m; j++)
            x[j] = g(i, x[j-1]);
You can play with the chunk size to deal with load-balancing issues, etc.
17. Scheduling considerations
- Dynamic is most general and provides load balancing
- If the choice of scheduling has a (big) impact on performance, something is wrong
- overhead too big -> work in the loop too small
- n can be a specification expression, not just a constant
18. Reductions
- Sometimes you want each thread to calculate part of a value, then collapse all of that into a single value
- Done with the reduction clause
    area = 0.0;
    #pragma omp parallel for private(x) reduction(+:area)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x*x);
    }
    pi = area / n;
19. Fortran Parallel Directives
- PARALLEL / END PARALLEL
- PARALLEL SECTIONS / SECTION / SECTION / END PARALLEL SECTIONS
- DO / END DO
- work-sharing directive for the DO loop immediately following
- PARALLEL DO / END PARALLEL DO
- combines a parallel region and work sharing
20. Serial Directives
- MASTER / END MASTER
- executed by the master thread only
- DO SERIAL / END DO SERIAL
- the loop immediately following should not be parallelized
- useful with -qsmp=omp:auto
21. Synchronization Directives
- BARRIER
- inside PARALLEL, all threads synchronize
- CRITICAL (lock) / END CRITICAL (lock)
- section that can be executed by one thread only
- lock is an optional name that distinguishes several critical constructs from each other
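In C these directives are spelled #pragma omp barrier and #pragma omp critical (name). A hedged two-phase sketch (the phases are invented) shows the barrier at work: no thread may start phase two until every thread has finished phase one.

    #include <omp.h>

    void two_phase(double *a, double *b, int n) {
        #pragma omp parallel
        {
            int id = omp_get_thread_num();
            int nt = omp_get_num_threads();
            int i;

            for (i = id; i < n; i += nt)   /* phase one: write a[] */
                a[i] = 2.0 * a[i];

            #pragma omp barrier            /* all threads synchronize here */

            for (i = id; i < n; i += nt)   /* phase two: read neighbors of a[] */
                b[i] = a[(i + 1) % n];
        }
    }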
22. An example
    double area, pi, x;
    int i, n;

    area = 0.0;
    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        #pragma omp critical
        area += 4.0 / (1.0 + x*x);
    }
    pi = area / n;
23. Scope Rules
- Shared memory programming model
- most variables are shared by default
- Global variables are shared
- But not everything is shared
- stack variables in functions are private
- a variable set and then used in a DO is PRIVATE
- an array whose subscript is constant w.r.t. the PARALLEL DO, and which is set and then used within the DO, is PRIVATE
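A C sketch of these defaults (all names invented): the file-scope array is shared by the team, while a variable declared inside the region, and the locals of a function called from it, are private to each thread.

    #include <omp.h>

    double table[1000];                     /* global: shared by all threads */

    double square(double v) {
        double t = v * v;                   /* stack variable: private */
        return t;
    }

    void fill_table(void) {
        #pragma omp parallel
        {
            int id = omp_get_thread_num();  /* declared in region: private */
            table[id] = square((double)id); /* distinct elements: no race */
        }
    }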
24. Scope Clauses
- The DO and for directives have extra clauses; the most important:
- PRIVATE (variable list)
- REDUCTION (op : variable list)
- op is sum (+), min, max
- variable is a scalar; XLF also allows arrays
25. Scope Clauses (2)
- PARALLEL, PARALLEL DO, and PARALLEL SECTIONS also have
- DEFAULT (PRIVATE | SHARED | NONE)
- scope of unlisted variables is determined by the rules
- SHARED (variable list)
- IF (scalar logical expression)
- Directives are like a programming-language extension, not a compiler option
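A hedged sketch of the IF clause (the function and the 10000 threshold are arbitrary): the region executes in parallel only when the expression is true, so small problems avoid the fork/join overhead.

    void axpy(double *y, const double *x, double a, int n) {
        /* fork a team only when there is enough work to pay for it */
        #pragma omp parallel for if(n > 10000)
        for (int i = 0; i < n; i++)
            y[i] = y[i] + a * x[i];
    }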
26. An Example in Fortran

          integer i, j, n
          real*8  a(n,n), b(n)

          read (1) b
    !$OMP PARALLEL DO PRIVATE (i,j) SHARED (a,b,n)
          do j = 1, n
            do i = 1, n
              a(i,j) = sqrt(1.d0 + b(j)*i)
            end do
          end do
    !$OMP END PARALLEL DO
27. Matrix Multiply
    !$OMP PARALLEL DO PRIVATE(i,j,k)
          do j = 1, n
            do i = 1, n
              do k = 1, n
                c(i,j) = c(i,j) + a(i,k) * b(k,j)
              end do
            end do
          end do
28. Analysis
- Outer loop is parallel (over columns of c)
- Not optimal for cache use
- Can put more directives for each loop
- Then granularity might be too fine
29. OMP Functions
- int omp_get_num_procs()
- int omp_get_num_threads()
- int omp_get_thread_num()
- void omp_set_num_threads(int)
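A small sketch exercising all four routines:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* request one thread per available processor */
        omp_set_num_threads(omp_get_num_procs());

        #pragma omp parallel
        {
            printf("thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }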