Co-Array Fortran - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Co-Array Fortran


1
Co-Array Fortran
  • Open-source compilers and tools for
  • scalable global address space computing
  • John Mellor-Crummey
  • Rice University

2
Outline
  • Co-array Fortran
  • language overview
  • CAF compiler status and preliminary results
  • language and compiler research issues
  • interactions
  • OpenMP
  • compiler and runtime strategies for improving
    scalability
  • Dragon tool
  • hybrid MPI + OpenMP
  • Open64 infrastructure
  • source-to-source and source-to-object code
    infrastructure

3
Co-Array Fortran (CAF)
  • Explicitly-parallel extension of Fortran 90/95
    (Numrich & Reid)
  • Global address space SPMD parallel programming
    model
  • one-sided communication
  • Simple, two-level model that supports locality
    management
  • local vs. remote memory
  • Programmer control over performance critical
    decisions
  • data partitioning
  • communication
  • Suitable for mapping to a range of parallel
    architectures
  • shared memory, message passing, hybrid, PIM
  • Much in common with UPC

4
CAF Programming Model Features
  • SPMD process images
  • fixed number of images during execution
  • images operate asynchronously
  • Both private and shared data
  • real x(20, 20)      ! a private 20x20 array in each image
  • real y(20, 20)[*]   ! a shared 20x20 array in each image
  • Simple one-sided shared-memory communication
  • x(:, j:j+2) = y(r, :)[p:p+2]   ! copy rows from p:p+2
    into local columns
  • Flexible synchronization
  • sync_team(notify ,wait)
  • notify: a vector of process ids to signal
  • wait: a vector of process ids to wait for
  • Pointers and (perhaps asymmetric) dynamic
    allocation
  • Parallel I/O
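
The private/shared distinction and the one-sided read above can be sketched as a small self-checking program. This is a minimal sketch in standard Fortran 2008 coarray syntax, not the original CAF dialect shown on the slide: `sync all` stands in for a `sync_all` call, and because the standard does not accept a triplet such as `[p:p+2]` in the co-dimension, only a single image `p` is read. The values of `r` and `j` and the choice of the right-hand neighbor are illustrative assumptions.

```fortran
program caf_features_sketch
  implicit none
  integer, parameter :: n = 20
  real :: x(n, n)        ! private: each image has its own copy, invisible to others
  real :: y(n, n)[*]     ! co-array: each image has a copy that other images can address
  integer :: me, np, p, r, j

  me = this_image()
  np = num_images()
  r = 3                  ! assumed row to fetch (illustrative)
  j = 5                  ! assumed destination column (illustrative)

  x = 0.0
  y = real(me)           ! fill the local part of y with this image's index

  sync all               ! every image has initialized y before anyone reads it

  p = merge(1, me + 1, me == np)   ! right-hand neighbor, wrapping around
  x(:, j) = y(r, :)[p]   ! one-sided get: row r of y on image p becomes local column j

  ! Image p filled its y with real(p), so the fetched column must hold that value.
  if (any(x(:, j) /= real(p))) error stop "unexpected remote data"

  sync all
  if (me == 1) print *, "remote read verified on", np, "image(s)"
end program caf_features_sketch
```

With gfortran, `-fcoarray=single` builds a one-image version in which image 1 reads from itself; running on several images needs a coarray runtime such as OpenCoarrays.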

5
One-sided Communication with Co-Arrays
integer a(10,20)[*]
if (this_image() > 1) a(1:5,1:10) = a(1:5,1:10)[this_image()-1]

6
Finite Element Example (Numrich)
subroutine assemble(start, prin, ghost, neib, x)
  integer start(:), prin(:), ghost(:), neib(:), k1, k2, p
  real x(:)[*]
  call sync_all(neib)
  do p = 1, size(neib)   ! Add contributions from ghost regions
    k1 = start(p); k2 = start(p+1)-1
    x(prin(k1:k2)) = x(prin(k1:k2)) + x(ghost(k1:k2))[neib(p)]
  enddo
  call sync_all(neib)
  do p = 1, size(neib)   ! Update the ghosts
    k1 = start(p); k2 = start(p+1)-1
    x(ghost(k1:k2))[neib(p)] = x(prin(k1:k2))
  enddo
  call sync_all
end subroutine assemble

7
Portable CAF Compiler
  • Compile CAF to Fortran 90 + runtime support
    library
  • source-to-source code generation for wide
    portability
  • expect best performance by leveraging vendor F90
    compiler
  • Co-arrays
  • access data in generated code using F90 pointers
  • allocate storage with dope vector initialization
    outside F90
  • Porting to a new compiler / architecture
  • synthesize compatible dope vectors for co-array
    storage
  • tailor communication to architecture

8
CAF Compiler Status
  • Near production-quality F90 front end from Open64
  • Working prototype for a CAF subset
  • allocate co-arrays using static constructor-like
    strategy
  • co-array access
  • remote data access uses ARMCI get/put
  • process local data access uses load/store
  • synch_all, synch_team synchronization
  • multi-dimensional array section operations
  • Successfully compiled and executed NAS MG
  • platforms: SGI Origin, IA64/Myrinet
  • performance similar to hand-coded MPI

9
NAS MG Efficiency (Class C)
[Chart: efficiency on IA64/Myrinet 2000]

10
CAF Compiler Coming Attractions
  • Co-arrays as procedure arguments
  • Triplet notation for co-dimensions
  • Co-arrays of user defined types
  • types can contain pointers
  • Dynamic allocation of co-arrays
  • Compiler support for parallel I/O

11
CAF Language Research Issues
  • Synchronization
  • locks instead of critical sections
  • split-phase primitives
  • synch_team/synch_all semantics can require
    pairwise notification
  • may need synchronization matching hints to enable
    optimization
  • Language support for efficient reductions
  • manually-coded reductions unlikely to yield
    portable performance
  • Memory consistency model for co-array data
  • Controlling process to processor mapping
  • Support for hierarchical locality domains
  • support work sharing on SMPs?
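
To make the reduction issue concrete, the sketch below contrasts a manually coded reduction with a language-level one. It assumes standard Fortran rather than the CAF dialect of these slides: `co_sum` is a collective that entered the standard with Fortran 2018 and is used here only to illustrate what "language support" could look like. The per-image contribution `real(me)` is made up for the example.

```fortran
program reduction_sketch
  implicit none
  real :: partial[*]     ! one scalar per image, readable from other images
  real :: total_manual, total_builtin
  integer :: me, np, p

  me = this_image()
  np = num_images()
  partial = real(me)     ! assumed per-image contribution (illustrative)

  sync all               ! all contributions are written before image 1 reads them

  ! Manually coded reduction: image 1 pulls every contribution with one-sided
  ! gets. Simple, but it serializes np remote reads on a single image.
  total_manual = 0.0
  if (me == 1) then
    do p = 1, np
      total_manual = total_manual + partial[p]
    end do
  end if

  ! Language-level reduction: the implementation is free to choose the
  ! communication pattern (for example a tree) suited to the machine.
  total_builtin = partial
  call co_sum(total_builtin, result_image=1)

  if (me == 1) then
    if (abs(total_manual - total_builtin) > 1.0e-5 * total_builtin) &
      error stop "manual and built-in sums differ"
    print *, "sum over", np, "image(s) =", total_builtin
  end if
end program reduction_sketch
```

The manual loop fixes one communication pattern in the source, which is why its performance is unlikely to carry over between architectures.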

12
CAF Compiler Research Issues
  • Aim for performance transparency
  • Compiler optimization of communication and I/O
  • multi-mode communication: direct load/store +
    RDMA
  • combine synchronization with communication
  • put/get with flag
  • one-sided ↔ two-sided communication
  • transform from get to put communication
  • exploit split-phase communication and
    synchronization
  • communication vectorization
  • latency hiding for communication and parallel I/O
  • platform-tailored optimization
  • synchronization strength reduction
  • Interoperability with other parallel programming
    models
  • Optimizations to improve node performance

13
CAF Interactions
  • Working with CAF code from Numrich and Wallcraft
    (NRL)
  • Refining ARMCI synchronization with Nieplocha
  • Designing parallel I/O for CAF with UIUC
  • Exploring language design with Numrich and
    Nieplocha
  • Coordinating with Rasmussen (LANL) on Fortran 90
    array dope vector interface library
  • Planning a fall CAF workshop at PSC
  • coordinating with Ralph Roskies, Sergiu
    Sanielevici
  • encouragement from Rich Hirsch, Fred Johnson