Title: High Performance Fortran (HPF)
1 High Performance Fortran (HPF)
- Source: Chapter 7 of "Designing and Building Parallel Programs" (Ian Foster, 1995)
2 Question
- Can't we just have a clever compiler generate a parallel program from a sequential program?
- Fine-grained parallelism
  - x = a*b + c*d
- Trivial parallelism
  - for i = 1 to 100 do
      for j = 1 to 100 do
        C[i,j] = dotproduct(A[i,:], B[:,j])
      od
    od
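- As a concrete illustration (a minimal sketch of my own, not taken from the slides), the loop nest above can be written in Fortran 90; every C(i,j) depends only on a row of A and a column of B, so all 10,000 iterations are independent:

    program trivial_parallelism
      implicit none
      integer :: i, j
      real :: A(100,100), B(100,100), C(100,100)

      call random_number(A)   ! arbitrary test data
      call random_number(B)

      ! Each iteration computes one independent dot product,
      ! so the whole nest could run in parallel.
      do i = 1, 100
         do j = 1, 100
            C(i,j) = dot_product(A(i,:), B(:,j))
         end do
      end do

      print *, 'C(1,1) =', C(1,1)
    end program trivial_parallelism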
3 Automatic parallelism
- Automatic parallelization of arbitrary programs is extremely hard
- Solutions
  - Make restrictions on the source program
  - Restrict the kind of parallelism used
  - Use a semi-automatic approach
  - Use application-domain-oriented languages
4 High Performance Fortran (HPF)
- Designed by a forum from industry, government, and universities
- Extends Fortran 90
- Intended for computationally expensive numerical applications
- Portable to SIMD machines, vector processors, shared-memory MIMD, and distributed-memory MIMD
5 Fortran 90 - Base language of HPF
- Extends Fortran 77 with 'modern' features
  - abstract data types, modules
  - recursion
  - pointers, dynamic storage
- Array operators
  - A = B + C
  - A = A + 1.0
  - A(1:7) = B(1:7) + B(2:8)
  - WHERE (X /= 0) X = 1.0/X
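- The array operators above are shown in isolation; the following self-contained Fortran 90 sketch (array sizes and values are my own choices) puts them in a complete program:

    program array_ops
      implicit none
      real :: A(8), B(8), C(8), X(8)

      B = 1.0
      C = 2.0
      X = (/ 0.0, 1.0, 2.0, 0.0, 4.0, 5.0, 0.0, 7.0 /)

      A = B + C                  ! whole-array addition
      A = A + 1.0                ! scalar broadcast over an array
      A(1:7) = B(1:7) + B(2:8)   ! operation on array sections
      WHERE (X /= 0) X = 1.0/X   ! masked assignment: only nonzero elements

      print *, A
      print *, X
    end program array_ops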
6 Data parallelism
- Data parallelism: the same operation is applied to different data elements in parallel
- Data parallel program: a sequence of data parallel operations
- Overall approach
  - Programmer does the domain decomposition
  - Compiler partitions the operations automatically
- Data may be regular (array) or irregular (tree, sparse matrix)
- Most data parallel languages only deal with arrays
7 Data parallelism - Concurrency
- Explicit parallel operations
  - A = B + C   ! A, B, and C are arrays
- Implicit parallelism
  - do i = 1,m
      do j = 1,n
        A(i,j) = B(i,j) + C(i,j)
      enddo
    enddo
8 Compiling data parallel programs
- Programs are translated automatically into parallel SPMD (Single Program Multiple Data) programs
- Each processor executes the same program on a subset of the data
- Owner-computes rule (see the sketch below)
  - Each processor owns a subset of the data structures
  - Operations required for an element are executed by its owner
  - Each processor may read (but not modify) other elements
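- A sketch of the kind of SPMD code a compiler might generate under the owner-computes rule; the block-size formula and the names myid and P are my own assumptions, not mandated by HPF:

    ! Hypothetical owner-computes update of a block-distributed array X(N).
    ! myid (0..P-1) and P are assumed to come from the SPMD runtime.
    subroutine update_owned(X, N, myid, P)
      implicit none
      integer, intent(in)    :: N, myid, P
      real,    intent(inout) :: X(N)
      integer :: blk, lb, ub, i

      blk = (N + P - 1) / P        ! block size = ceiling(N/P)
      lb  = myid*blk + 1           ! first element this processor owns
      ub  = min((myid+1)*blk, N)   ! last element this processor owns

      do i = lb, ub                ! only the owner updates its elements
         X(i) = X(i) * 3.0
      end do
    end subroutine update_owned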
9 Example
- real s, X(100), Y(100)        ! s is a scalar, X and Y are arrays
  X = X * 3.0                   ! multiply each X(i) by 3.0
  do i = 2,99
    Y(i) = (X(i-1) + X(i+1))/2  ! communication required
  enddo
  s = SUM(X)                    ! communication required
- X and Y are distributed (partitioned)
- s is replicated on each machine
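- A sketch of the same fragment with HPF directives added (the BLOCK distribution over 4 processors and the initialisation of X are my own assumptions); the compiler would then generate the neighbour exchange for Y(i) and the global reduction for SUM:

    program distributed_example
      implicit none
      real    :: s, X(100), Y(100)
      integer :: i
      !HPF$ PROCESSORS pr(4)
      !HPF$ DISTRIBUTE X(BLOCK) ONTO pr
      !HPF$ ALIGN Y(:) WITH X(:)

      X = 1.0                        ! initialisation added for the sketch
      X = X * 3.0                    ! purely local: each owner scales its elements
      do i = 2, 99
         Y(i) = (X(i-1) + X(i+1))/2  ! needs boundary elements from neighbours
      end do
      s = SUM(X)                     ! global reduction; result replicated everywhere
    end program distributed_example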
10 HPF primitives for data distribution
- Directives
  - PROCESSORS: shape and size of the abstract processors
  - ALIGN: align elements of different arrays
  - DISTRIBUTE: distribute (partition) an array
- Directives affect the performance of the program, not its result
11 Processors directive
- !HPF$ PROCESSORS P(32)
- !HPF$ PROCESSORS Q(4,8)
- Mapping of abstract to physical processors is not specified in HPF (implementation-dependent)
12 Alignment directive
- Aligns an array with another array
- Specifies that specific elements should be mapped to the same processor
- real A(50), B(50), C(50,50)
  !HPF$ ALIGN A(I) WITH B(I)
  !HPF$ ALIGN A(I) WITH B(I+2)
  !HPF$ ALIGN A(:) WITH C(:,*)
13 Figure 7.6 from Foster's book
14 Distribution directive
- Specifies how elements should be partitioned among the local memories
- Each dimension can be distributed as follows (see the sketch below)
  - * : no distribution
  - BLOCK(n) : block distribution
  - CYCLIC(n) : cyclic distribution
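- A small sketch combining the three per-dimension formats (the array sizes, names, and processor shape are my own choices):

    program distribute_examples
      real D(12,12), E(12,12), F(12,12)
      !HPF$ PROCESSORS q(4)
      !HPF$ DISTRIBUTE D(BLOCK, *)     ONTO q   ! rows in contiguous blocks; columns not distributed
      !HPF$ DISTRIBUTE E(CYCLIC, *)    ONTO q   ! rows dealt out round-robin over the 4 processors
      !HPF$ DISTRIBUTE F(CYCLIC(2), *) ONTO q   ! rows dealt out in round-robin chunks of 2
      D = 0.0; E = 0.0; F = 0.0
      end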
15 Figure 7.7 from Foster's book
16 Example: Successive Over-Relaxation (SOR)
- Recall the algorithm discussed in the Introduction:

    float G[1:N, 1:M], Gnew[1:N, 1:M];
    for (step = 0; step < NSTEPS; step++)
      for (i = 2; i < N; i++)          /* update grid */
        for (j = 2; j < M; j++)
          Gnew[i,j] = f(G[i,j], G[i-1,j], G[i+1,j], G[i,j-1], G[i,j+1]);
      G = Gnew;
17 Parallel SOR with message passing
- float G[lb-1:ub+1, 1:M], Gnew[lb-1:ub+1, 1:M];
  for (step = 0; step < NSTEPS; step++)
    SEND(cpuid-1, G[lb]);          /* send 1st row left   */
    SEND(cpuid+1, G[ub]);          /* send last row right */
    RECEIVE(cpuid-1, G[lb-1]);     /* receive from left   */
    RECEIVE(cpuid+1, G[ub+1]);     /* receive from right  */
    for (i = lb; i <= ub; i++)     /* update my rows      */
      for (j = 2; j < M; j++)
        Gnew[i,j] = f(G[i,j], G[i-1,j], G[i+1,j], G[i,j-1], G[i,j+1]);
    G = Gnew;
18 Finite differencing (similar to SOR) in HPF
- See Ian Foster, Program 7.2 (uses a convergence criterion instead of a fixed number of steps, as sketched below)
- program hpf_finite_difference
  !HPF$ PROCESSORS pr(4)                    ! use 4 CPUs
  real X(100, 100), New(100, 100)           ! data arrays
  !HPF$ ALIGN New(:,:) WITH X(:,:)
  !HPF$ DISTRIBUTE X(BLOCK,*) ONTO pr       ! row-wise
  New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                     X(2:99, 1:98) + X(2:99, 3:100))/4
  diffmax = MAXVAL(ABS(New - X))
  end
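- A sketch of how the update might be wrapped in a convergence loop (the tolerance, initialisation, and loop structure are my own guesses, not Program 7.2 verbatim):

    program hpf_fd_converge
      !HPF$ PROCESSORS pr(4)
      real X(100,100), New(100,100), diffmax
      !HPF$ ALIGN New(:,:) WITH X(:,:)
      !HPF$ DISTRIBUTE X(BLOCK,*) ONTO pr
      X = 0.0
      X(1,:) = 1.0                          ! assumed boundary condition
      New = X
      diffmax = 1.0
      do while (diffmax > 0.001)            ! iterate until the grid stops changing
         New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                            X(2:99, 1:98) + X(2:99, 3:100))/4
         diffmax = MAXVAL(ABS(New - X))     ! global reduction over distributed arrays
         X = New
      end do
      print *, 'converged, diffmax =', diffmax
      end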
19 Changing the distribution
- Use a block distribution instead of a row distribution
- program hpf_finite_difference
  !HPF$ PROCESSORS pr(2,2)                    ! use a 2x2 grid
  real X(100, 100), New(100, 100)             ! data arrays
  !HPF$ ALIGN New(:,:) WITH X(:,:)
  !HPF$ DISTRIBUTE X(BLOCK, BLOCK) ONTO pr    ! block-wise
  New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                     X(2:99, 1:98) + X(2:99, 3:100))/4
  diffmax = MAXVAL(ABS(New - X))
  end
20 Performance
- Distribution affects
  - Load balance
  - Amount of communication
- Example (communication costs):
    !HPF$ PROCESSORS pr(3)
    integer A(8), B(8), C(8)
    !HPF$ ALIGN B(:) WITH A(:)
    !HPF$ DISTRIBUTE A(BLOCK) ONTO pr
    !HPF$ DISTRIBUTE C(CYCLIC) ONTO pr
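- Worked mapping (my own illustration, assuming the usual HPF block size of ceiling(8/3) = 3):
    A, B (BLOCK):  pr(1) owns 1-3    pr(2) owns 4-6    pr(3) owns 7-8
    C (CYCLIC):    pr(1) owns 1,4,7  pr(2) owns 2,5,8  pr(3) owns 3,6
  So an element-wise statement such as A = A + B needs no communication (B is aligned with A), whereas A = A + C must fetch most elements of C from other processors.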
21 Figure 7.9 from Foster's book
22 Historical Evaluation
- See "The rise and fall of High Performance Fortran: an historical object lesson" by Ken Kennedy, Charles Koelbel, and Hans Zima. In Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, June 2007.
- Optional; obtainable from the ACM Digital Library
23 Problems with HPF
- Immature compiler technology
  - Upgrading to Fortran 90 was complicated
  - Implementing the HPF extensions took much time
- The HPC community was impatient and started using MPI
- Missing features
  - Support for sparse arrays and other irregular data structures
- Obtaining portable performance was difficult
- Performance tuning was difficult
24 Impact of HPF
- Huge impact on parallel language design
- Very frequently cited
- Some impact on OpenMP (shared-memory standard)
- New wave of High Productivity Computing Systems (HPCS) languages: Chapel (Cray), Fortress (Sun), X10 (IBM)
- Used in extended form (HPF/JA) for the Japanese Earth Simulator
25 Conclusions
- High-level model
  - User specifies the data distribution
  - Compiler generates the parallel program and the communication
- More restrictive than the general message-passing model (only data parallelism)
  - Restricted to array-based data structures
- HPF programs are easy to modify, which enhances portability
  - Changing the data distribution only requires changing the directives