Title: HPF (High Performance Fortran)
1. HPF (High Performance Fortran)
2. What is HPF?
- HPF is a standard for data-parallel programming.
- Extends Fortran-77 or Fortran-90.
- Similar extensions exist for C and C++, but
Fortran is really the focus.
3. Principle of HPF
- Extend a sequential language with data
distribution directives.
- Data distribution directives specify on which
processor a certain part of an array should
reside.
- The compiler then produces
- a parallel program,
- communication between the processes.
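As a rough illustration of the principle above, the sketch below annotates an ordinary Fortran-90 program with HPF distribution directives. The processor arrangement name P, its size 4, and the array size n = 16 are assumptions made for this example only. Because the directives are comments to a non-HPF compiler, the program should also build and run sequentially.

```fortran
program distribute_sketch
  implicit none
  integer, parameter :: n = 16      ! assumed problem size
  real :: A(n), B(n)
  ! HPF directives (plain comments to a non-HPF compiler).
  ! P and its size are hypothetical names chosen for this sketch.
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P
!HPF$ ALIGN B(i) WITH A(i)

  A = 1.0          ! each processor would initialize its own block of A
  B = 2.0 * A      ! B(i) lives with A(i), so no communication is needed
  print *, SUM(B)  ! a reduction: partial sums would be combined across processors
end program distribute_sketch
```

The source text stays sequential; the directives only tell an HPF compiler where the data should live.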
4. What the Standard Says
- Can be used with both Fortran-77 and Fortran-90.
- Distribution directives are just a hint; the compiler
can ignore them.
- HPF can be used on both shared memory and
distributed memory hardware platforms.
5. In Commercial Use
- HPF is always used with Fortran-90.
- Distribution directives are a must.
- HPF is used on both shared memory and distributed
memory platforms.
- But the truth is that the language was really
meant for distributed memory platforms.
6. Not to Confuse You
- We will discuss commercial use:
- Fortran-90
- Concurrency extensions to Fortran-90 in HPF.
- HPF data distribution directives.
- How HPF maps to a distributed memory platform.
- Afterwards, we will discuss what the standard
allows in addition.
7. Fortran-90
- Fortran-90 has a number of array features.
- Scalar operations are extended to arrays.
- Intrinsic functions are extended to arrays.
- Additional array-based intrinsic functions.
8. Array Assignment
- Scalar assignment
- integer a, b, c
- a = b + c
- Array assignment
- integer A(10,10), B(10,10), C(10,10)
- A = B + C
9. Requirements for Array Assignment
- Arrays must be conformable:
- have the same number of dimensions, and
- have the same size in each dimension.
- One major exception: a scalar is conformable with any array.
- integer A(10,10), B(10,10), c
- A = B + c
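The conformability rule and the scalar exception can be tried out with a minimal program such as the one below. The scalar is named s here (an assumption of this sketch) because Fortran is case-insensitive, so c and C could not coexist in one program unit.

```fortran
program array_assignment
  implicit none
  integer :: A(10,10), B(10,10), C(10,10)
  integer :: s

  B = 1            ! a scalar is assigned to every element of B
  C = 2
  s = 5

  A = B + C        ! elementwise: all three arrays have shape (10,10)
  print *, A(1,1), A(10,10)    ! both elements are 3

  A = B + s        ! the scalar s is conformable with any array
  print *, A(1,1), A(10,10)    ! both elements are 6
end program array_assignment
```

An assignment such as A = B(1:5,1:5) would be rejected, since the shapes (10,10) and (5,5) differ.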
10. Intrinsic Functions Extended to Arrays
- real A(10,10), B(10,10)
- A = SQRT(A)
- B = ABS(A)
11. Additional Array Intrinsic Functions
- MAXVAL, MINVAL
- MAXLOC, MINLOC
- return array of indices
- SUM, PRODUCT
- MATMUL, DOT_PRODUCT, TRANSPOSE
12. Examples
- real A(100,100), B(100), C(100), s, d
- integer i(1), j(2)
- s = SUM(A)
- i = MAXLOC(B)
- j = MINLOC(A)
- d = DOT_PRODUCT(B, C)
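A scaled-down, runnable version of these intrinsic calls might look as follows. The 3x3 size and the data values are assumptions picked so the results can be checked by hand. Note that DOT_PRODUCT takes two rank-one arrays, and that MAXLOC and MINLOC return one index per dimension.

```fortran
program array_intrinsics
  implicit none
  real :: A(3,3), B(3), C(3), s, d
  integer :: i(1), j(2)

  ! RESHAPE fills A in column-major order: A(1,1)=4, A(2,1)=9, A(3,1)=2, ...
  A = reshape((/ 4.0, 9.0, 2.0, 3.0, 5.0, 7.0, 8.0, 1.0, 6.0 /), (/ 3, 3 /))
  B = (/ 1.0, 5.0, 3.0 /)
  C = (/ 2.0, 2.0, 2.0 /)

  s = SUM(A)               ! 45.0, the sum of all nine elements
  i = MAXLOC(B)            ! (/ 2 /), since B(2) = 5.0 is the largest
  j = MINLOC(A)            ! (/ 2, 3 /), since A(2,3) = 1.0 is the smallest
  d = DOT_PRODUCT(B, C)    ! 1*2 + 5*2 + 3*2 = 18.0

  print *, s, i, j, d
end program array_intrinsics
```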
13. Array Sections
- array( lower_bound : upper_bound : stride )
- Refers to the section of the array between
lower_bound and upper_bound, with an optional
stride specified.
- Multiple dimensions may be specified, with the
obvious meaning.
- Array sections may be used wherever arrays may be
used.
14. Examples
- integer A(10), B(10), C(10)
- integer D(50), E(100), F(100)
- integer max
- integer G(100), H(100,100)
- A(1:8) = B(1:8) + C(2:9)
- D = E(1:100:2) + F(2:100:2)
- max = MAXVAL( G(1:100:10) )
- max = MINVAL( H(1:100, 1:50) )
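To see which elements a strided section selects, a small sketch with 10-element arrays (sizes and values are assumptions for illustration) can be run directly:

```fortran
program array_sections
  implicit none
  integer :: k
  integer :: D(5), E(10), F(10)

  E = (/ (k, k = 1, 10) /)       ! E = 1, 2, ..., 10
  F = (/ (10*k, k = 1, 10) /)    ! F = 10, 20, ..., 100

  ! E(1:10:2) selects E(1), E(3), ..., E(9)   (5 elements)
  ! F(2:10:2) selects F(2), F(4), ..., F(10)  (5 elements)
  D = E(1:10:2) + F(2:10:2)
  print *, D                     ! 21 43 65 87 109

  ! E(1:10:3) selects E(1), E(4), E(7), E(10)
  print *, MAXVAL(E(1:10:3))     ! 10
end program array_sections
```

Both sections on the right-hand side have five elements, so they are conformable with each other and with D.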
15. Semantics of Array Assignments
- First, the entire right hand side is evaluated.
- Then, assignments are made to the left hand side.
16. Example
- integer :: A(4) = (/ 7, 8, 12, 14 /)
- A(2:3) = A(1:2)
- -> results in A being 7, 7, 8, 14
- -> not 7, 7, 7, 14
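The two outcomes can be reproduced side by side. In the sketch below, the array assignment reads the whole right-hand side before storing anything, while the element-by-element DO loop (added here for contrast, along with the copy B) reuses a value it has just overwritten.

```fortran
program assignment_semantics
  implicit none
  integer :: A(4), B(4), i

  A = (/ 7, 8, 12, 14 /)
  B = A

  ! Array assignment: A(1:2) = (7, 8) is read first, then stored into A(2:3)
  A(2:3) = A(1:2)
  print *, A        ! 7 7 8 14

  ! Sequential loop: B(2) is overwritten before it is read for B(3)
  do i = 2, 3
     B(i) = B(i-1)
  end do
  print *, B        ! 7 7 7 14
end program assignment_semantics
```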
17. Sequential/Parallel Fortran-90
- Fortran-90 is a sequential language.
- However, its array assignment semantics make it
easy to parallelize (automatically).
18. Not Perfect, Though (1 of 2)
- do i = 1,100
-   X(i,i) = 0.0
- enddo
- Obviously parallelizable.
- Not expressible as a Fortran-90 array assignment
(only regular sections).
19. Not Perfect, Though (2 of 2)
- integer D(50), E(100), F(100)
- D = E(1:100:2) + F(2:100:2)
- is correct, but
- integer D(100), E(100), F(100)
- D = E(1:100:2) + F(2:100:2)
- is not, because D (100 elements) is not conformable
with the 50-element sections on the right hand side.
20. HPF Additional Expressions of Parallelism
- FORALL array assignment.
- INDEPENDENT construct.
21. FORALL Array Assignment
- FORALL( subscript = lower_bound : upper_bound : stride, mask )
array-assignment
- Execute all iterations of the subscript loop in
parallel for the given set of indices, where mask
is true.
- May have multiple dimensions.
- Same semantics: first compute right hand side,
then assign to left hand side.
- Only one assignment to a particular element (not
checked by the compiler!).
22. Examples (1 of 3)
- do i = 1,100
-   X(i,i) = 0.0
- enddo
- becomes
- FORALL(i = 1:100) X(i,i) = 0.0
23. Examples (2 of 3)
- integer D(100), E(100), F(100)
- D = E(1:100:2) + F(2:100:2)
- becomes (correctly)
- FORALL(i = 1:50) D(i) = E(2*i-1) + F(2*i)
24. Examples (3 of 3)
- A multiple dimension example with use of the mask
option.
- Set all the elements of X above the diagonal to
the sum of their indices.
- FORALL(i = 1:100, j = 1:100, i < j) X(i,j) = i + j
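A scaled-down version of the masked FORALL can be run to inspect the result. The 4x4 size and the initial zero fill are assumptions of this sketch; FORALL is also part of Fortran 95, so an ordinary Fortran compiler should accept it.

```fortran
program forall_mask
  implicit none
  integer, parameter :: n = 4    ! assumed size, small enough to print
  integer :: X(n,n), i, j

  X = 0
  ! Only index pairs with i < j (strictly above the diagonal) are assigned
  FORALL (i = 1:n, j = 1:n, i < j) X(i,j) = i + j

  ! Print row by row; expected rows:
  !   0 3 4 5
  !   0 0 5 6
  !   0 0 0 7
  !   0 0 0 0
  do i = 1, n
     print *, X(i,:)
  end do
end program forall_mask
```

Each element is written at most once, so the single-assignment requirement of FORALL is met.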
25. The INDEPENDENT Clause
- !HPF$ INDEPENDENT
- DO ...
-   ...
- ENDDO
- Specifies that the iterations of the loop can be
executed in any order.
26. Examples (1 of 2)
- !HPF$ INDEPENDENT
- DO i = 1, 100
-   DO j = 1, 100
-     IF(i.NE.j) A(i,j) = 1.0
-     IF(i.EQ.j) A(i,j) = 0.0
-   ENDDO
- ENDDO
27. Examples (2 of 2): Nesting
- !HPF$ INDEPENDENT
- DO i = 1, 100
-   !HPF$ INDEPENDENT
-   DO j = 1, 100
-     IF(i.NE.j) A(i,j) = 1.0
-     IF(i.EQ.j) A(i,j) = 0.0
-   ENDDO
- ENDDO
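The nested example can be wrapped into a complete program as sketched below, with the size reduced to an assumed n = 4. To a compiler without HPF support the !HPF$ lines are ordinary comments, so the loops simply run sequentially and give the same result. The HPF directive also accepts a NEW clause for loop-private variables, which the slides do not use and which is omitted here.

```fortran
program independent_loop
  implicit none
  integer, parameter :: n = 4    ! assumed size for this sketch
  real :: A(n,n)
  integer :: i, j

  ! Each (i,j) iteration writes a different element of A and reads none,
  ! so the iterations can run in any order.
!HPF$ INDEPENDENT
  do i = 1, n
!HPF$ INDEPENDENT
     do j = 1, n
        if (i /= j) A(i,j) = 1.0
        if (i == j) A(i,j) = 0.0
     end do
  end do

  print *, SUM(A)    ! n*n - n off-diagonal ones: 12.0
end program independent_loop
```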
28. HPF/Fortran-90 Matrix Multiply (1 of 4)
29. HPF Matrix Multiply (2 of 4)
- C = 0.0
- DO k = 1, n
-   FORALL(i = 1:n, j = 1:n) &
-     C(i,j) = C(i,j) + A(i,k) * B(k,j)
- ENDDO
30. HPF Matrix Multiply (3 of 4)
- !HPF$ INDEPENDENT
- DO i = 1,n
-   DO j = 1,n
-     C(i,j) = 0.0
-     DO k = 1,n
-       C(i,j) = C(i,j) + A(i,k) * B(k,j)
-     ENDDO
-   ENDDO
- ENDDO
31. HPF Matrix Multiply (4 of 4)
- !HPF$ INDEPENDENT
- DO i = 1,n
-   !HPF$ INDEPENDENT
-   DO j = 1,n
-     C(i,j) = 0.0
-     DO k = 1,n
-       C(i,j) = C(i,j) + A(i,k) * B(k,j)
-     ENDDO
-   ENDDO
- ENDDO
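One way to gain confidence in the loop version is to compare it with the Fortran-90 MATMUL intrinsic. The sketch below does this for an assumed 3x3 case with random inputs. The i and j loops are independent because each (i,j) pair owns its element of C, while the k loop accumulates into C(i,j) and is left sequential.

```fortran
program matmul_check
  implicit none
  integer, parameter :: n = 3    ! assumed size for this sketch
  real :: A(n,n), B(n,n), C(n,n)
  integer :: i, j, k

  call random_number(A)
  call random_number(B)

!HPF$ INDEPENDENT
  do i = 1, n
!HPF$ INDEPENDENT
     do j = 1, n
        C(i,j) = 0.0
        do k = 1, n                       ! sequential accumulation
           C(i,j) = C(i,j) + A(i,k) * B(k,j)
        end do
     end do
  end do

  ! Largest deviation from the intrinsic; expected to be at rounding level
  print *, MAXVAL(ABS(C - MATMUL(A, B)))
end program matmul_check
```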
32. HPF/Fortran-90 SOR (1 of 4)
- TEMP(1:n,1:n) = 0.25 * &
-   ( GRID(1:n,0:n-1) + GRID(1:n,2:n+1) + &
-     GRID(0:n-1,1:n) + GRID(2:n+1,1:n) )
- GRID(1:n,1:n) = TEMP(1:n,1:n)
33. HPF/Fortran-90 SOR (1 of 4)
- GRID(1:n,1:n) = 0.25 * &
-   ( GRID(1:n,0:n-1) + GRID(1:n,2:n+1) + &
-     GRID(0:n-1,1:n) + GRID(2:n+1,1:n) )
- Also works, because of array assignment rules:
the right hand side is fully evaluated before GRID is updated.
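That the in-place form matches the version with a temporary can be checked with a small program. In this sketch the grid size n = 4, the random initial values, and the second copy G2 are assumptions introduced for the comparison; the grid carries a one-cell boundary, hence the bounds 0:n+1.

```fortran
program stencil_update
  implicit none
  integer, parameter :: n = 4    ! assumed interior size
  real :: GRID(0:n+1,0:n+1), G2(0:n+1,0:n+1), TEMP(n,n)

  call random_number(GRID)
  G2 = GRID                      ! identical starting state for both variants

  ! Variant 1: explicit temporary
  TEMP = 0.25 * ( GRID(1:n,0:n-1) + GRID(1:n,2:n+1) + &
                  GRID(0:n-1,1:n) + GRID(2:n+1,1:n) )
  GRID(1:n,1:n) = TEMP

  ! Variant 2: in place; the right-hand side is evaluated before any store
  G2(1:n,1:n) = 0.25 * ( G2(1:n,0:n-1) + G2(1:n,2:n+1) + &
                         G2(0:n-1,1:n) + G2(2:n+1,1:n) )

  ! Largest difference between the two variants; expected to be zero
  print *, MAXVAL(ABS(GRID - G2))
end program stencil_update
```

A sequential DO loop updating GRID in place would not be equivalent, because later iterations would read already updated neighbors.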
34. HPF SOR (2 of 4)
- FORALL(i = 1:n, j = 1:n) &
-   TEMP(i,j) = 0.25 * &
-     ( GRID(i-1,j) + GRID(i+1,j) + &
-       GRID(i,j-1) + GRID(i,j+1) )
- FORALL(i = 1:n, j = 1:n) &
-   GRID(i,j) = TEMP(i,j)
35. HPF SOR (3 of 4)
- !HPF$ INDEPENDENT
- DO i = 1,n
-   DO j = 1,n
-     TEMP(i,j) = 0.25 * &
-       ( GRID(i-1,j) + GRID(i+1,j) + &
-         GRID(i,j-1) + GRID(i,j+1) )
-   ENDDO
- ENDDO
- !HPF$ INDEPENDENT
- DO i = 1,n
-   DO j = 1,n
-     GRID(i,j) = TEMP(i,j)
-   ENDDO
- ENDDO
36. HPF SOR (4 of 4)
- !HPF$ INDEPENDENT
- DO i = 1,n
-   !HPF$ INDEPENDENT
-   DO j = 1,n
-     TEMP(i,j) = 0.25 * &
-       ( GRID(i-1,j) + GRID(i+1,j) + &
-         GRID(i,j-1) + GRID(i,j+1) )
-   ENDDO
- ENDDO
- !HPF$ INDEPENDENT
- DO i = 1,n
-   !HPF$ INDEPENDENT
-   DO j = 1,n
-     GRID(i,j) = TEMP(i,j)
-   ENDDO
- ENDDO