1
Administrative Stuff
  • Location change for the lecture on Friday March 2
    - Education Building Room 2 - one lecture only.

2
CS 320/ECE 392/CSE 302
Data Parallelism in High Performance Fortran

Department of Computer Science, University of
Illinois at Urbana-Champaign
3
Contents
  • High Performance Fortran
  • Parallelism constructs
  • FORALL
  • PURE functions
  • INDEPENDENT
  • Data Distribution Directives
  • ALIGN
  • DISTRIBUTE
  • TEMPLATE
  • PROCESSORS

4
References
  • HPF specification (v2.0), available online:
    http://dacnet.rice.edu/Depts/CRPC/HPFF/versions/hpf2/hpf-v20/index.html
  • Includes material from documentation, slides, and
    papers on HPF at Rice University.

5
What is HPF?
  • HPF is a standard for data-parallel programming.
  • Extends Fortran-77 or Fortran-90 (in theory also
    C, but this is not used in practice).

6
Principle of HPF
  • Extend a sequential language with data
    distribution directives that specify on which
    processor each part of an array should reside.
  • The source program is written as if for a single
    processor.
  • The compiler then produces
  • a data-parallel (SPMD) program,
  • the communication between the processes
    (sketched below).
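
A minimal sketch of the idea (the array, its size, and the loop are
illustrative, not from the slides): the source is ordinary Fortran plus a
directive saying where the data should live; the compiler derives the SPMD
code and the message passing from it.

      REAL a(1000)
!HPF$ DISTRIBUTE a(BLOCK)     ! hint: spread a in contiguous blocks over the processors
      DO i = 1, 1000
         a(i) = 2.0 * a(i)    ! written as plain sequential Fortran
      ENDDO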

7
What the Standard Says
  • Can be used with both Fortran-77 and Fortran-90.
  • Distribution directives are just hints; the
    compiler can ignore them.
  • HPF can be used on both shared memory and
    distributed memory hardware platforms.

8
In Reality
  • HPF is always used with Fortran-90.
  • Distribution directives are a must.
  • HPF used on both shared memory and distributed
    memory platforms.
  • But the truth is that the language was really
    meant for distributed memory platforms.

9
HPF Additional Expressions of Parallelism
  • FORALL (data parallel) array assignment.
  • PURE functions
  • INDEPENDENT construct.

10
FORALL Array Assignment
  • FORALL(subscript = lower_bound : upper_bound :
    stride, mask) array-assignment
  • Execute all iterations of the subscript loop in
    parallel for the given set of indices, where mask
    is true.
  • May have multiple dimensions.
  • Same semantics as array assignment: first compute
    the right-hand side, then assign to the left-hand
    side.
  • Only one assignment to a particular element is
    allowed (not checked by the compiler!).

11
Examples
  • Example 1:
      do i = 1, 100
         X(i,i) = 0.0
      enddo
    becomes
      FORALL(i=1:100) X(i,i) = 0.0
  • Example 2:
      FORALL(i=1:50) D(i) = E(2*i-1) + E(2*i)

12
Examples
  • A multiple-dimension example with use of the mask
    option.
  • Set all the elements of X above the diagonal to
    the sum of their indices.
      FORALL(i=1:100, j=1:100, i<j) X(i,j) = i+j

13
PURE functions/subroutines
  • Defined to be side-effect free, so they can be
    executed concurrently.
  • Example: if nitns() is declared as a PURE
    function, then
      FORALL(i=1:M, j=1:N) mandel(i,j) = nitns(CMPLX(.1*I, .1*J))
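
A hypothetical sketch of what such a PURE declaration of nitns might look
like (the Mandelbrot-style iteration body and the bound of 255 are
illustrative, not from the slides):

      PURE INTEGER FUNCTION nitns(z)
         COMPLEX, INTENT(IN) :: z     ! PURE requires INTENT(IN) dummy arguments
         COMPLEX :: w
         w = (0.0, 0.0)
         nitns = 0
         ! side-effect free: only local variables and the result change
         DO WHILE (ABS(w) < 2.0 .AND. nitns < 255)
            w = w*w + z
            nitns = nitns + 1
         END DO
      END FUNCTION nitns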

14
The INDEPENDENT Clause
      !HPF$ INDEPENDENT
      DO ...
      ENDDO
  • Specifies that the iterations of the loop can be
    executed in any order (concurrently).

15
Examples
      !HPF$ INDEPENDENT
      DO i = 1, 100
         DO j = 1, 100
            IF (i .NE. j) A(i,j) = 1.0
            IF (i .EQ. j) A(i,j) = 0.0
         ENDDO
      ENDDO

16
Examples Nesting
      !HPF$ INDEPENDENT
      DO i = 1, 100
         !HPF$ INDEPENDENT
         DO j = 1, 100
            IF (i .NE. j) A(i,j) = 1.0
            IF (i .EQ. j) A(i,j) = 0.0
         ENDDO
      ENDDO

17
HPF/Fortran-90 Matrix Multiply
      C = MATMUL(A, B)

18
HPF Matrix Multiply
      C = 0.0
      do k = 1, n
         FORALL(i=1:n, j=1:n) &
            C(i,j) = C(i,j) + A(i,k) * B(k,j)
      enddo

19
HPF Matrix Multiply
      !HPF$ INDEPENDENT
      DO i = 1, n
         DO j = 1, n
            C(i,j) = 0.0
            DO k = 1, n
               C(i,j) = C(i,j) + A(i,k) * B(k,j)
            ENDDO
         ENDDO
      ENDDO

20
HPF Matrix Multiply
      !HPF$ INDEPENDENT
      DO i = 1, n
         !HPF$ INDEPENDENT
         DO j = 1, n
            C(i,j) = 0.0
            DO k = 1, n
               C(i,j) = C(i,j) + A(i,k) * B(k,j)
            ENDDO
         ENDDO
      ENDDO

21
PROCESSORS Directive
  • Declare abstract processor arrangements (single
    processors or processor arrays)

!HPF$ PROCESSORS p(4), q(NUMBER_OF_PROCESSORS()/2, 2)
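
A processor arrangement declared this way can then be named in a
distribution; a minimal sketch (the array and its shape are illustrative,
not from the slides):

      REAL a(100,100)
!HPF$ PROCESSORS p(4)
!HPF$ DISTRIBUTE a(BLOCK,*) ONTO p   ! block rows of a mapped onto the 4 processors of p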
22
ALIGN Directive
  • Relates elements of an array to those of another
    array or template such that the aligned elements
    are stored on the same processor(s).

      REAL a(4), b(4), c(8), a2(4,4), b2(4,4)
!HPF$ ALIGN a(:) WITH b(:)
!HPF$ ALIGN a(:) WITH c(2:8:2)
!HPF$ ALIGN a(:) WITH c(4:1:-1)
!HPF$ ALIGN a(:) WITH b2(:, *)
!HPF$ ALIGN a2(:, *) WITH b(:)
!HPF$ ALIGN a2(I,J) WITH b2(J,I)
23
TEMPLATE Directive
  • Declares an abstract scalar or an array
  • no storage allocated
  • used just for data alignment and distribution:
    data objects can be aligned with templates, and
    templates can be distributed.

!HPF$ TEMPLATE t(10), t2(10,10), u(m*n)
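
A template typically serves as a common alignment target for several
arrays; a minimal sketch (the names and sizes are illustrative, not from
the slides):

!HPF$ TEMPLATE t(100)
      REAL x(100), y(50)
!HPF$ ALIGN x(i) WITH t(i)      ! x follows the template element by element
!HPF$ ALIGN y(i) WITH t(2*i)    ! y(i) lives wherever t(2*i) lives
!HPF$ DISTRIBUTE t(BLOCK)       ! distributing the template places x and y as well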
24
Data Distributions
  • HPF provides data distribution directives to
    specify which processor owns what data.
  • Owner-computes rule: the owner of the data does
    the computation on the data.
  • Goal: improve locality, reduce communication, and
    improve performance.

25
Data Distribution Definition
  • !HPF$ DISTRIBUTE <array> <distribution>
  • <distribution> can be (in each dimension):
  • * : no distribution
  • BLOCK
  • BLOCK(k) : k is the block size, default n/p
  • CYCLIC
  • CYCLIC(k) : k is the cycle size, default 1
  • An array without a distribution is replicated
    (see the small illustration below).
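
As a small illustration (the array size and processor count are assumed,
not from the slides), for an array of 8 elements on 2 processors the
formats map elements as follows; only one DISTRIBUTE can apply to a given
array, so the alternatives are shown as comments:

      REAL a(8)
!HPF$ DISTRIBUTE a(BLOCK)       ! p0: a(1:4)            p1: a(5:8)
!     DISTRIBUTE a(CYCLIC)      ! p0: a(1,3,5,7)        p1: a(2,4,6,8)
!     DISTRIBUTE a(CYCLIC(2))   ! p0: a(1:2), a(5:6)    p1: a(3:4), a(7:8)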

26
Data Distribution Examples
  • !HPF$ DISTRIBUTE A(BLOCK,BLOCK)

27
Data Distribution Examples
  • !HPF$ DISTRIBUTE A(BLOCK,*)

28
Data Distribution Examples
  • !HPF$ DISTRIBUTE A(*,BLOCK)

29
Data Distribution Examples
  • !HPF$ DISTRIBUTE A(*,CYCLIC)

30
Data Distribution Examples
  • !HPF$ DISTRIBUTE A(*,CYCLIC(2))

31
Data Distribution Examples
  • !HPF$ DISTRIBUTE A(BLOCK,CYCLIC)

32
Difference between OpenMP and HPF
  • In OpenMP, the user specifies the distribution of
    iterations.
  • Data travels to the processor executing the
    iteration.
  • In HPF, the user specifies the distribution of
    data.
  • Computation is done by the processor owning the
    data (see the sketch below).
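
The contrast in a minimal sketch (the loop and the distribution are
illustrative; the OpenMP directive is written in its Fortran form):

! OpenMP version: the user distributes the iterations of the loop
      INTEGER, PARAMETER :: n = 1000
      REAL a(n), b(n), c(n)
!$OMP PARALLEL DO
      DO i = 1, n
         a(i) = b(i) + c(i)    ! data moves to whichever thread runs iteration i
      ENDDO

! HPF version: the user distributes the data; the owner of a(i) computes a(i)
      INTEGER, PARAMETER :: n = 1000
      REAL a(n), b(n), c(n)
!HPF$ DISTRIBUTE a(BLOCK)
      DO i = 1, n
         a(i) = b(i) + c(i)
      ENDDO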

33
HPF Matrix Multiply
  • !HPF$ DISTRIBUTE C(BLOCK,*)
  • <standard matrix multiply code>
  • Leads to the same computation as the OpenMP
    expression of matrix multiply: each processor
    computes a contiguous set of rows.

34
HPF Matrix Multiply
  • !HPF$ DISTRIBUTE C(*,BLOCK)
  • <standard matrix multiply code>
  • Would cause each processor to compute a
    contiguous set of columns.

35
HPF Matrix Multiply
  • !HPF$ DISTRIBUTE C(BLOCK,BLOCK)
  • <standard matrix multiply code>
  • Each processor computes a rectangular sub-array
    of the result.

36
Gaussian elimination
  • (without pivoting)
      for (i = 0; i < n; i++)
         for (j = i+1; j < n; j++)
            for (k = i+1; k < n; k++)
               a[j][k] = a[j][k] - a[i][k]*a[i][j] / a[j][j];
  • For-j loop is outermost parallelizable loop.

37
OpenMP Gauss
      for (i = 0; i < n; i++) {
      #pragma omp parallel for private(k)
         for (j = i+1; j < n; j++)
            for (k = i+1; k < n; k++)
               a[j][k] = a[j][k] - a[i][k]*a[i][j] / a[j][j];
      }

38
HPF Gauss
      !HPF$ DISTRIBUTE A(CYCLIC,*)
      DO i = 1, n
         !HPF$ INDEPENDENT
         DO j = i+1, n
            DO k = i+1, n
               A(j,k) = A(j,k) - A(i,k)*A(i,j)/A(j,j)
            ENDDO
         ENDDO
      ENDDO

39
Difference with OpenMP Gauss
  • In HPF, cyclic distribution of A is useful for
    load balance.
  • In OpenMP, block scheduling of iterations
    suffices (because iterations are re-distributed
    at each new pragma).

40
Difference with OpenMP Gauss
  • In HPF, each processor keeps on working on the
    same data/row (owner computes).
  • In OpenMP, data/rows move between processors.
  • HPF potentially more efficient (increased
    locality, more about this later).

41
How an HPF compiler works
  • Parallelization based on Fortran-90 and HPF
    concurrency constructs.
  • Assign data to processor based on distributions.
  • Compute data on owning processor.
  • Move other data necessary for computation to that
    processor.

42
Hard Part of HPF Compiler
  • Communication optimization.
  • Avoid lots of small messages, optimize towards
    few large messages.
  • Absolutely critical to good performance.

43
Performance impact of distribution
  • Back to Matrix Multiply
  • !HPF$ DISTRIBUTE C(BLOCK,*)
  • Causes C to be row-distributed, A and B to be
    replicated.
  • No communication.

44
Performance impact of distribution
  • Back to Matrix Multiply
  • !HPF$ DISTRIBUTE C(BLOCK,*), A(BLOCK,*)
  • Causes C and A to be row-distributed, and B to be
    replicated.
  • No communication.

45
Performance impact of distribution
  • Back to Matrix Multiply
  • !HPF$ DISTRIBUTE C(BLOCK,*), A(*,BLOCK)
  • Causes C to be row-distributed, A to be
    column-distributed, and B to be replicated.
  • Lots of communication!

46
Performance impact of distribution
  [Figure: C = A x B, with B replicated; annotation: "This data will have
  to move to processor 0."]
47
Things Can Get Worse
  • Sometimes the compiler cannot determine exactly
    what data needs to be moved.
  • B(i) = A(INDEX(i))
  • (where INDEX(i) is determined dynamically)
  • A conservative estimate must be made.
  • This often leads to a broadcast of all the data.
  • Better methods are known but are difficult.

48
Summary
  • Data parallelism features of HPF
  • Comparison with OpenMP