An Evaluation of Auto-Scoping in OpenMP

1
An Evaluation of Auto-Scoping in OpenMP
  • Michael Voss, Eric Chiu, Patrick Chow, Catherine Wong and Kevin Yuen
  • ECE Department
  • University of Toronto

2
An Overview of Auto-scoping
  • Dieter an Mey proposed auto-scoping as an
    extension to OpenMP (www.cOMPunity.org)
  • Relieves users of the burden of explicit scoping, which is
    • error prone
    • tedious
  • A compromise between explicit and automatic
    parallelization
    • analysis is similar to automatic parallelization
    • successful in 1 of 2 scientific programs

3
Using DEFAULT(AUTO)
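Explicit scoping:
C     The operator between A(J,I) and B(I,J) was lost in the
C     transcript; + is assumed here.
C$OMP PARALLEL DO SHARED(A,B)
C$OMP+PRIVATE(I,J)
      DO I = 1,100
        DO J = 1,100
          A(I,J) = A(J,I) + B(I,J)
        ENDDO
      ENDDO
C$OMP END PARALLEL DO

With DEFAULT(AUTO):
C$OMP PARALLEL DO
C$OMP+DEFAULT(AUTO)
      DO I = 1,100
        DO J = 1,100
          A(I,J) = A(J,I) + B(I,J)
        ENDDO
      ENDDO
C$OMP END PARALLEL DO
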
4
Outline of Talk
  • Introduction
  • Implementing DEFAULT(AUTO) in Polaris
  • An evaluation of DEFAULT(AUTO) in Polaris
    • comparison with EA Sun Studio 9 F95 compiler
  • A discussion of runtime support
  • Related Work
  • Conclusion

5
Implementing DEFAULT(AUTO) in Polaris
  • Polaris is an auto-parallelizer for Fortran 77
  • Supports a range of advanced techniques
    • The Range Test
    • The Omega Test
    • Array and Scalar Privatization
    • Array and Scalar Reduction Recognition
    • Induction Variable Substitution
    • Interprocedural Constant Propagation
    • Most Interprocedural Optimization by Inlining

6
Polaris as an OMP to OMP Translator
[Diagram: three translation paths through Polaris]
  • Original automatic parallelization path:
    Fortran 77 -> Parser -> DDtest pass -> Reduction pass ->
    Privatization pass -> OpenMP Backend -> Fortran 77 + OpenMP
  • OpenMP to explicitly threaded code path:
    Fortran 77 + OpenMP -> Parser -> Moerae Backend ->
    Fortran 77 + Moerae calls
  • New OpenMP to OpenMP path

7
Supporting DEFAULT(AUTO)
  • Parse DEFAULT(AUTO)
  • React appropriately to user directives
    • selective loop parallelization
    • no changes without AUTO directive
    • user scoping overrides Polaris scoping
    • can parallelize loops that cannot be fully
      auto-scoped
  • Limitations
    • only regions with PARALLEL DO semantics
    • bails out on general parallel regions
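
The override rule above can be sketched in a few lines of Python. This is only an illustration, not Polaris code: the function name `resolve_scopes`, the scope strings, and the use of `None` for "the analysis could not decide" are assumptions made for the example.

```python
def resolve_scopes(user_scopes, auto_scopes):
    """Merge explicit user clauses with analysis results.

    user_scopes: dict variable -> scope given explicitly by the user.
    auto_scopes: dict variable -> scope found by analysis, or None if
                 the analysis could not decide (hypothetical convention).
    Returns (resolved, unresolved). A region is only usable in parallel
    if `unresolved` is empty.
    """
    resolved = {}
    unresolved = []
    for var, scope in auto_scopes.items():
        if var in user_scopes:
            # An explicit clause always wins over the analysis result.
            resolved[var] = user_scopes[var]
        elif scope is None:
            # Neither the user nor the analysis scoped this variable.
            unresolved.append(var)
        else:
            resolved[var] = scope
    # Keep explicit clauses for variables the analysis never looked at.
    for var, scope in user_scopes.items():
        resolved.setdefault(var, scope)
    return resolved, sorted(unresolved)


# The analysis fails on D, but the user supplies SHARED(D).
with_hint = resolve_scopes({"D": "shared"},
                           {"D": None, "T": "private", "S": "reduction"})
without_hint = resolve_scopes({}, {"D": None, "T": "private"})
```

With the hint every variable is resolved, so the loop can still be parallelized although it could not be fully auto-scoped. Without it, `D` stays unresolved and a translator following this rule would have to leave the loop serial.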

8
Example 1: No explicit scoping
!$OMP PARALLEL DEFAULT(AUTO)
      DO N = 1,7
        DO M = 1,7
!$OMP DO
          DO L = LSS(itsub),LEE(itsub)
            I = IG(L)
            J = JG(L)
            K = KG(L)
            LIJK = L2IJK(L)
            RHS(L,M) = RHS(L,M)
     &       - FJAC(LIJK,LM00,M,N)*DQCO(i-1,j,k,n,NB)*FM00(L)
     &       - FJAC(LIJK,LP00,M,N)*DQCO(i+1,j,k,n,NB)*FP00(L)
     &       - FJAC(LIJK,L0M0,M,N)*DQCO(i,j-1,k,n,NB)*F0M0(L)
     &       - FJAC(LIJK,L0P0,M,N)*DQCO(i,j+1,k,n,NB)*F0P0(L)
          ENDDO
!$OMP END DO NOWAIT
        ENDDO
      ENDDO
!$OMP END PARALLEL
9
Example 1: No explicit scoping
!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(M,L,N)
      DO n = 1, 7, 1
        DO m = 1, 7, 1
!$OMP DO
          DO l = lss(itsub), lee(itsub), 1
            rhs(l, m) = rhs(l, m)
     &       +(-dqco(ig(l), (-1)+jg(l), kg(l), n, nb))
     &       *f0m0(l)*fjac(l2ijk(l), l0m0, m, n)
     &       +(-dqco(ig(l), 1+jg(l), kg(l), n, nb))
     &       *f0p0(l)*fjac(l2ijk(l), l0p0, m, n)
     &       +(-dqco((-1)+ig(l), jg(l), kg(l), n, nb))
     &       *fjac(l2ijk(l), lm00, m, n)*fm00(l)
     &       +(-dqco(1+ig(l), jg(l), kg(l), n, nb))
     &       *fjac(l2ijk(l), lp00, m, n)*fp00(l)
          ENDDO
!$OMP END DO NOWAIT
        ENDDO
      ENDDO
!$OMP END PARALLEL
10
Example 2: Explicit scoping
C     Arithmetic operators were lost in the transcript; the + and *
C     in the loop body are assumed.
      SUBROUTINE RECURSION(n,k,a,b,c,d,e,f,g,h,s)
      REAL*8 A(*),B(*),C(*),D(*),E(*),F(*),G(*),H(*)
      REAL*8 T,S
      INTEGER N,K,I
      S = 0.0D0
C$OMP PARALLEL SHARED(D)
C$OMP+DEFAULT(AUTO)
C$OMP DO
      DO I = 1,N
        T = F(I) + G(I)
        A(I) = B(I) + C(I)
        D(I+K) = D(I) + E(I)
        H(I) = H(I) * T
        S = S + H(I)
      END DO
C$OMP END DO
C$OMP END PARALLEL
      END
11
Example 2: Explicit scoping
C     Arithmetic operators were lost in the transcript; the + and *
C     in the loop body are assumed.
      SUBROUTINE recursion(n, k, a, b, c, d, e, f, g, h, s)
      DOUBLE PRECISION a, b, c, d, e, f, g, h, s, t
      INTEGER*4 i, k, n
      DIMENSION a(*), b(*), c(*), d(*), e(*), f(*), g(*), h(*)
      s = 0.0D0
!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(T,I)
!$OMP DO
!$OMP+REDUCTION(+:s)
      DO i = 1, n, 1
        t = f(i)+g(i)
        a(i) = b(i)+c(i)
        d(i+k) = d(i)+e(i)
        h(i) = h(i)*t
        s = h(i)+s
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
      RETURN
      END
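
The scoping that appears in the translated Example 2 (T and I private, S a reduction) can be reproduced with a very small rule set. The Python sketch below is a simplification and not the Polaris algorithm: it handles scalars only, assumes a straight-line loop body, does not check that a reduction uses a single associative operator, and ignores whether a private scalar is live after the loop. The function name `scope_scalars` and the statement encoding are invented for the illustration. Arrays such as D need real dependence tests (for example the Range Test) and are left out.

```python
def scope_scalars(loop_index, scalars, statements):
    """Classify scalars of a loop body as shared, private or reduction.

    statements: list of (target, reads) pairs in program order, where
    `reads` is the set of names read on the right-hand side.
    Returns a dict name -> scope; None means "cannot be auto-scoped".
    """
    written = set()            # names assigned so far in the iteration
    read_before_write = set()  # names read before any assignment
    read_elsewhere = set()     # names read by a statement that does
                               # not assign them
    for target, reads in statements:
        for name in reads:
            if name not in written:
                read_before_write.add(name)
            if name != target:
                read_elsewhere.add(name)
        written.add(target)

    scopes = {loop_index: "private"}
    for name in sorted(scalars):
        if name not in written:
            scopes[name] = "shared"       # read-only in the loop
        elif name not in read_before_write:
            scopes[name] = "private"      # defined before use
        elif name not in read_elsewhere:
            scopes[name] = "reduction"    # only updated as s = s op ...
        else:
            scopes[name] = None           # value carried across iterations
    return scopes


# Loop body of Example 2, with array references reduced to their names.
EXAMPLE_2 = [
    ("T", {"F", "G"}),
    ("A", {"B", "C"}),
    ("D", {"D", "E", "K"}),
    ("H", {"H", "T"}),
    ("S", {"S", "H"}),
]
example_2_scopes = scope_scalars("I", {"T", "S", "K"}, EXAMPLE_2)
```

For this input the sketch yields I and T private, S a reduction and K shared, which agrees with the PRIVATE(T,I) and REDUCTION clauses in the translated code.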
12
Evaluation of DEFAULT(AUTO)
  • Fortran 77 benchmarks from SPEC OpenMP
    • removed all explicit scoping
    • added DEFAULT(AUTO) to all regions
    • used Omni OpenMP compiler as backend (-O2)
  • Explicit speedup vs. auto-scope speedup
    • four-processor Xeon server
    • 1.8 GHz processors, 16 GBytes main memory
    • Hyperthreaded, but only used 1 thread per CPU
  • Also used EA Sun Studio 9 Fortran 95 compiler
    • supports DEFAULT(__AUTO)
    • report number of regions auto-scoped

13
Performance of Auto-scoping
Sun results are for the Early Access Version of
the Sun Microsystems Studio 9 Fortran 95 compiler.
14
Discussion
  • Many regions were not fully analyzable
    • Polaris could not fully inline the regions
    • several regions were general parallel regions
  • Early Access Sun Studio 9 compiler
    • auto-scoped fewer regions in general
    • missed important regions in Swim and Mgrid
    • regions could be parallelized but not auto-scoped
  • Sun compiler could auto-scope some regions that
    Polaris could not
    • can analyze general parallel regions

15
A general parallel region from Wupwise: Polaris
fails but the Sun compiler succeeds
C$OMP PARALLEL DEFAULT(AUTO)
      LSCALE = ZERO
      LSSQ = ONE
C$OMP DO
      DO IX = 1, 1 + (N - 1) * INCX, INCX
        IF (DBLE (X(IX)) .NE. ZERO) THEN
          ...
            LSSQ = ONE + LSSQ * (LSCALE / TEMP) ** 2
            LSCALE = TEMP
        END IF
        ...
      END DO
C$OMP END DO
C$OMP CRITICAL
      IF (SCALE .LT. LSCALE) THEN
        SSQ = ((SCALE / LSCALE) ** 2) * SSQ + LSSQ
        SCALE = LSCALE
      ELSE
        SSQ = SSQ + ((LSCALE / SCALE) ** 2) * LSSQ
      END IF
C$OMP END CRITICAL
C$OMP END PARALLEL
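
What makes this a general parallel region is that each thread keeps its own (LSCALE, LSSQ) pair, which is initialized before the work-shared loop and merged into the global (SCALE, SSQ) in a critical section after it. The Python sketch below mimics that arithmetic with real numbers, treating each chunk as the work of one thread. The function names are invented, the elided parts of the loop are filled in with the usual scaled sum-of-squares update as an assumption, and the guard for an all-zero chunk is an addition that the Fortran does not need.

```python
import math


def local_scaled_ssq(values):
    """One thread's partial result: sum(v*v) == scale**2 * ssq."""
    scale, ssq = 0.0, 1.0
    for v in values:
        if v != 0.0:
            temp = abs(v)
            if scale < temp:
                # Rescale the running sum to the new, larger scale.
                ssq = 1.0 + ssq * (scale / temp) ** 2
                scale = temp
            else:
                ssq += (temp / scale) ** 2
    return scale, ssq


def combine(scale, ssq, lscale, lssq):
    """Merge a thread-local pair into the global pair (the CRITICAL part)."""
    if lscale == 0.0:
        # The chunk held only zeros: nothing to add (guard added here).
        return scale, ssq
    if scale < lscale:
        return lscale, ((scale / lscale) ** 2) * ssq + lssq
    return scale, ssq + ((lscale / scale) ** 2) * lssq


def norm2(chunks):
    """Euclidean norm computed chunk by chunk, one chunk per 'thread'."""
    scale, ssq = 0.0, 1.0
    for chunk in chunks:
        scale, ssq = combine(scale, ssq, *local_scaled_ssq(chunk))
    return scale * math.sqrt(ssq)


result = norm2([[3.0, 0.0], [4.0]])
```

An auto-scoper that handles this region has to recognize that LSCALE and LSSQ are private even though they are assigned outside the loop, and that SCALE and SSQ are shared and protected by the critical section.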
16
Runtime Support for Auto-scoping
  • add speculate directive for regions that cannot
    be auto-scoped
  • applies to very few regions in SPEC OpenMP
  • requires interprocedural marking of reads/writes
  • only 2 regions not auto-scoped can be fully
    analyzed

!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(U51K,U41K,U31K,Q,U21K,M,K,I,U41,U31KM1,U51KM1,U21KM1)
!$OMP+PRIVATE(U41KM1,TMP,J)
!$OMP+SPECULATE(UTMP,RTMP)
!$OMP DO
!$OMP+LASTPRIVATE(FLUX2)
      DO j = jst, jend, 1
        ...
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
(a region from the RHS subroutine of Applu)
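
The marking of reads and writes mentioned above can be pictured with a simplified runtime dependence check. The Python sketch below is not the SPECULATE mechanism from the talk and is not a full test of the kind found in the runtime dependence testing literature. It only records which iteration touched which array element and reports a conflict when an element written by one iteration is accessed by another. The function names and the trace format are assumptions, and the traced loop is the D(I+K) statement of Example 2 rather than the Applu region.

```python
def is_parallel_safe(accesses):
    """Check a trace of (iteration, index, kind) marks, kind 'r' or 'w'.

    Conservative rule: an element may be written by at most one
    iteration and may not be read by any other iteration.
    """
    writers = {}  # element index -> iterations that wrote it
    readers = {}  # element index -> iterations that read it
    for iteration, index, kind in accesses:
        table = writers if kind == "w" else readers
        table.setdefault(index, set()).add(iteration)
    for index, writing in writers.items():
        if len(writing) > 1:
            return False  # two iterations write the same element
        if readers.get(index, set()) - writing:
            return False  # another iteration reads what this one wrote
    return True


def trace_recursion_loop(n, k):
    """Marks produced by D(I+K) = D(I) + E(I) for I = 1..n."""
    trace = []
    for i in range(1, n + 1):
        trace.append((i, i, "r"))      # read of D(I)
        trace.append((i, i + k, "w"))  # write of D(I+K)
    return trace


safe_k0 = is_parallel_safe(trace_recursion_loop(3, 0))
safe_k1 = is_parallel_safe(trace_recursion_loop(3, 1))
safe_k5 = is_parallel_safe(trace_recursion_loop(3, 5))
```

Whether the loop is safe depends on the runtime value of K, which is why a static auto-scoper cannot decide it. A speculative scheme would run the region in parallel while collecting such marks and fall back to serial execution when the check fails.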
17
Related Work
  • DEFAULT(AUTO) proposed by Dieter an Mey
  • Many commercial and research auto-parallelizers
    • Polaris, SUIF, CAPO, ...
    • Perform parallelization and scoping
  • The EA Sun Studio 9 Fortran 95 Compiler
    • paper also here at WOMPAT
    • thanks to Yuan Lin for pointing me to it
  • Runtime dependence testing
    • Saltz, Rauchwerger, ...

18
Conclusion
  • Implemented DEFAULT(AUTO) in Polaris
    • created full OpenMP to OpenMP translator
    • added facilities for auto-scoping
  • Evaluated implementation
    • 2 of 5 benchmarks fully auto-scoped
    • remainder showed significant loss of speedup
    • results different from EA Sun compiler
    • performance not portable across compilers
  • Discussed speculative parallelization support

19
Conclusion (cont.)
  • Combination of loop and region analyzer
    • Polaris auto-scoped more regions
    • Sun compiler can handle general regions
  • Performance may not be portable across compilers
    • it never is, but ...
    • sacrifice performance for convenience
    • perhaps a useful tool during manual
      parallelization
  • Future work
    • general region support in Polaris