UPC at CRD/LBNL - PowerPoint PPT Presentation

1
UPC at CRD/LBNL
  • Kathy Yelick
  • Dan Bonachea, Jason Duell, Paul Hargrove, Parry
    Husbands, Costin Iancu, Mike Welcome, Christian
    Bell

2
What is UPC?
  • UPC is an explicitly parallel language
  • Global address space: can read/write remote
    memory
  • Programmer control over layout and scheduling
  • Derived from Split-C, AC, and PCP
  • Why a new language?
  • Easier to use than MPI, especially for programs
    with complicated data structures
  • Possibly faster on some machines, but the current
    goal is comparable performance

[Figure: global address space diagram with threads p0, p1, p2]
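The layout control mentioned above comes from UPC's blocked shared arrays: in a declaration such as `shared [B] double a[N];`, elements are grouped into blocks of B consecutive elements that are dealt round-robin to the threads, so element i has affinity to thread (i / B) mod THREADS. The plain-C function below is a minimal model of that rule, not code from any UPC implementation; the names `placement` and `place_element` are illustrative. It also computes where the element sits inside its owner's local portion.

```c
#include <assert.h>
#include <stddef.h>

/* Where element i of a UPC array declared `shared [block] T a[N];` lives
 * when the program runs with nthreads threads (plain-C model of the rule). */
typedef struct {
    size_t thread;  /* thread with affinity to element i */
    size_t local;   /* position of element i within that thread's portion */
} placement;

placement place_element(size_t i, size_t block, size_t nthreads)
{
    assert(block > 0 && nthreads > 0);
    size_t blk = i / block;               /* index of the block holding i */
    placement p;
    p.thread = blk % nthreads;            /* blocks are dealt round-robin */
    p.local  = (blk / nthreads) * block   /* earlier blocks on the same thread */
             + i % block;                 /* offset within the block */
    return p;
}
```

For example, with `block = 2` and three threads, element 7 sits in block 3, which wraps back to thread 0 as that thread's second block, so its local position is 3. Setting `block = 1` gives UPC's default cyclic layout.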
3
Background
  • UPC efforts elsewhere
  • IDA: Bill Carlson, UPC promoter
  • GMU (documentation) and UMC (benchmarking)
  • HP (Alpha cluster and CMPI compiler (with MTU))
  • Cray (implementations)
  • Intrepid (SGI and T3E compiler)
  • UPC Book
  • T. El-Ghazawi, B. Carlson, T. Sterling, K. Yelick
  • 3 chapters in draft form; goal is to have proofs
    by SC03
  • Three components of the NERSC effort
  • Compilers for DOE machines (SP and PC clusters)
  • Runtime systems for ours and other compilers
  • Applications and benchmarks

4
UPC Funding
  • Base program funding K52004
  • Compiler/translator work
  • Applications
  • Runtime for DOE machines
  • Part of Pmodels Center K52018
  • Runtime support shared with Titanium (and hopefully
    Co-Array Fortran at some point)
  • Collaboration with ARMCI group
  • NSA funding
  • UPC for clusters

5
Compiler Status
  • NERSC compiler/translator
  • Costin Iancu and Wei Chen
  • Translates UPC to C + Berkeley UPC Runtime
  • Based on Open64 compiler for C
  • Status
  • Complete in prototype form
  • Debugging, tuning, extensions ongoing
  • Release planned for next month
  • Quadrics, Myrinet, IBM/SP, and MPI
  • Shared memory/process implementation is next
  • Investigating optimization opportunities
  • Communication optimizations
  • UPC language optimizations

6
UPC Compiler
  • Compiler based on Open64
  • Multiple front-ends, including gcc
  • Intermediate form called WHIRL
  • Leverage standard optimizations and analyses
  • Pointer analysis
  • Loop optimizations
  • Current focus on C backend
  • IA64 possible in future
  • UPC Runtime built on GASNet
  • Portable
  • Language-independent

[Figure: UPC compiler structure; boxes labeled UPC, Higher WHIRL, Optimizing transformations, C + Runtime, Lower WHIRL, and Assembly (IA64, MIPS) + Runtime]
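The slides do not show the translator's output, so the following is only a hypothetical illustration of the kind of lowering a UPC-to-C translator performs. In UPC, `upc_forall (i = 0; i < N; i++; &a[i])` runs iteration i on the thread with affinity to `a[i]`. The two plain-C functions below (names are invented for this sketch, not Berkeley UPC code) show a naive translation that tests ownership on every iteration and a strength-reduced one that visits only the calling thread's blocks of a `shared [block]` array. Both execute the same iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the loop body of the upc_forall. */
typedef void (*body_fn)(size_t i, void *ctx);

/* Naive lowering: every thread walks all n iterations and runs the body only
 * where it has affinity to a[i] (array declared shared [block]). */
void forall_naive(size_t n, size_t block, size_t nthreads, size_t mythread,
                  body_fn body, void *ctx)
{
    assert(block > 0 && mythread < nthreads);
    for (size_t i = 0; i < n; i++)
        if ((i / block) % nthreads == mythread)
            body(i, ctx);
}

/* Optimized lowering: step directly from one owned block to the next. */
void forall_blocked(size_t n, size_t block, size_t nthreads, size_t mythread,
                    body_fn body, void *ctx)
{
    assert(block > 0 && mythread < nthreads);
    for (size_t start = mythread * block; start < n; start += nthreads * block) {
        size_t end = start + block < n ? start + block : n;
        for (size_t i = start; i < end; i++)
            body(i, ctx);
    }
}

/* Example body: record which iterations ran. */
typedef struct {
    size_t idx[64];
    size_t count;
} iter_log;

void record_iteration(size_t i, void *ctx)
{
    iter_log *rec = ctx;
    if (rec->count < 64)
        rec->idx[rec->count++] = i;
}
```

With n = 10, block = 2, and three threads, both versions run iterations 2, 3, 8, and 9 on thread 1. The second simply skips the ownership test on the other six.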
7
Portable Runtime Support
  • Developing a runtime layer that can be easily
    ported and tuned to multiple architectures.

  • Direct implementations of parts of full GASNet
  • Runtime: global pointers (opaque type with a rich
    set of pointer operations), memory management,
    job startup, etc.
  • Generic support for UPC, CAF, Titanium
  • GASNet Extended API: supports put, get, locks,
    barrier, bulk, scatter/gather
  • GASNet Core API: small interface based on
    Active Messages
  • Core sufficient for functional implementation
  • GASNet released 1/03
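The slide notes that the small Active-Message-based core is sufficient for a functional implementation, meaning higher-level operations such as put can be expressed in terms of it. The sketch below models that idea inside a single process. The handler table, `am_register`, `am_request`, `put_init`, and `put` are hypothetical names invented for illustration and are not GASNet functions. On a real network the handler would run on the remote node and `dest` would lie in that node's memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-process model of an Active Message layer (NOT the
 * GASNet API): a message names a handler by index, and delivering it runs
 * that handler on the receiver with the message's arguments and payload. */
typedef void (*am_handler)(void *dest, const void *payload, size_t nbytes);

#define AM_MAX_HANDLERS 8
static am_handler am_table[AM_MAX_HANDLERS];

/* Install a handler at a table index; returns 0 on success, -1 if out of range. */
int am_register(int index, am_handler h)
{
    if (index < 0 || index >= AM_MAX_HANDLERS)
        return -1;
    am_table[index] = h;
    return 0;
}

/* Send a request; in this model delivery is an immediate local call. */
void am_request(int index, void *dest, const void *payload, size_t nbytes)
{
    assert(index >= 0 && index < AM_MAX_HANDLERS && am_table[index] != NULL);
    am_table[index](dest, payload, nbytes);
}

/* A put operation layered on the single core primitive above. */
enum { PUT_HANDLER = 0 };

static void put_handler(void *dest, const void *payload, size_t nbytes)
{
    memcpy(dest, payload, nbytes);   /* executes "at the target" */
}

int put_init(void)
{
    return am_register(PUT_HANDLER, put_handler);
}

void put(void *remote_dest, const void *src, size_t nbytes)
{
    am_request(PUT_HANDLER, remote_dest, src, nbytes);
}
```

A get would additionally need a reply message carrying the data back to the requester; it is omitted to keep the sketch short.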
8
Communication Optimizations
  • Characterizing performance of current machines
  • Latency and overlap (of communication with
    computation)
  • Plan to optimize automatically using a
    communication performance model
  • Preliminary results: 10x improvement on Matmul
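The slides do not say which communication performance model is used. As a stand-in, the sketch below uses the common latency/bandwidth ("alpha-beta") model, in which a message of m bytes costs alpha + beta * m, to show why aggregating small messages and overlapping communication with computation can pay off. `comm_model` and the function names are hypothetical, and any parameter values are placeholders rather than measurements from these slides.

```c
#include <assert.h>
#include <stddef.h>

/* Latency/bandwidth model: a message of m bytes costs alpha + beta * m. */
typedef struct {
    double alpha;  /* per-message latency and overhead */
    double beta;   /* per-byte cost (inverse bandwidth) */
} comm_model;

double msg_cost(comm_model m, size_t bytes)
{
    assert(m.alpha >= 0.0 && m.beta >= 0.0);
    return m.alpha + m.beta * (double)bytes;
}

/* k separate messages of `bytes` each, sent one after another. */
double fine_grained_cost(comm_model m, size_t k, size_t bytes)
{
    return (double)k * msg_cost(m, bytes);
}

/* The same k * bytes of data packed into a single message. */
double aggregated_cost(comm_model m, size_t k, size_t bytes)
{
    return msg_cost(m, k * bytes);
}

/* Blocking communication: computation waits for the transfer to finish. */
double blocking_time(double comm, double comp)
{
    return comm + comp;
}

/* Ideal overlap with non-blocking communication hides the shorter phase. */
double overlapped_time(double comm, double comp)
{
    return comm > comp ? comm : comp;
}
```

With placeholder values alpha = 10 and beta = 1 (in arbitrary time units), four separate 2-byte messages cost 4 * 12 = 48, while one aggregated 8-byte message costs 10 + 8 = 18; the saving is the repeated per-message latency term.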

9
Performance without Communication
10
Preliminary Parallel Performance
11
Costs of Pointer-to-Shared Arithmetic: Berkeley vs. HP
  • HP is faster for most operations, since the HP
    compiler generates assembly code
  • Both compilers optimize for phaseless pointers
  • For some operations, such as pointer comparison,
    Berkeley beats HP
  • Expect the gap to narrow once the proper
    optimizations are built into Berkeley UPC
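To make the "phaseless" distinction concrete, here is a plain-C model, not the Berkeley or HP representation, of a pointer-to-shared into an array declared `shared [block] T a[N];`, viewed as a (thread, phase, local offset) triple. General arithmetic must carry the phase across block and thread boundaries, while with a block size of 1 the phase is always zero, so an increment only updates the thread and local offset. Comparing two pointers into the same array reduces to comparing the element indices they denote. The type `sptr` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a pointer-to-shared into an array declared
 * `shared [block] T a[N];` with nthreads threads. */
typedef struct {
    size_t thread;  /* thread with affinity to the pointed-to element */
    size_t phase;   /* offset within the current block (0 when block == 1) */
    size_t local;   /* element offset within that thread's local portion */
} sptr;

/* Global element index denoted by p. */
size_t sptr_to_index(sptr p, size_t block, size_t nthreads)
{
    return (p.local / block) * block * nthreads + p.thread * block + p.phase;
}

/* Pointer to global element i. */
sptr index_to_sptr(size_t i, size_t block, size_t nthreads)
{
    size_t blk = i / block;
    sptr p;
    p.thread = blk % nthreads;
    p.phase  = i % block;
    p.local  = (blk / nthreads) * block + p.phase;
    return p;
}

/* General case: p + k carries the phase into the thread and local offset,
 * costing several divisions and remainders per increment. */
sptr sptr_add(sptr p, size_t k, size_t block, size_t nthreads)
{
    assert(block > 0 && nthreads > 0);
    return index_to_sptr(sptr_to_index(p, block, nthreads) + k, block, nthreads);
}

/* Phaseless case (block == 1): no phase to maintain, just thread and offset. */
sptr sptr_add_phaseless(sptr p, size_t k, size_t nthreads)
{
    assert(nthreads > 0 && p.phase == 0);
    size_t t = p.thread + k;
    p.thread = t % nthreads;
    p.local += t / nthreads;
    return p;
}

/* Compare two pointers into the same array: -1, 0, or 1. */
int sptr_cmp(sptr a, sptr b, size_t block, size_t nthreads)
{
    size_t ia = sptr_to_index(a, block, nthreads);
    size_t ib = sptr_to_index(b, block, nthreads);
    return (ia > ib) - (ia < ib);
}
```

For a block size of 2 on three threads, element 5 maps to thread 2, phase 1, local offset 1; adding 4 reaches element 9 on thread 1, phase 1, local offset 3.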

12
Applications
  • NAS Parallel Benchmark-sized apps
  • UPC MG complete
  • UPC CG complete
  • UPC GUPS
  • GWU has done IS, EP, and FT
  • Planned
  • Several Splash benchmarks
  • Sparse Cholesky
  • Possibly AMR

13
Mesh Generation
  • Parallel Mesh Generation in UPC
  • 2D Delaunay triangulation
  • Based on Triangle software by Shewchuk (UCB)
  • Parallel version from NERSC uses dynamic load
    balancing, software caching, and parallel sorting
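The slides describe the parallel strategy but show no code. The geometric core of any Delaunay triangulator is the in-circle test: an edge shared by triangles abc and abd should be flipped when d lies inside the circumcircle of abc. The sketch below is the naive floating-point version of that predicate, not code from Triangle or the NERSC mesh generator; Triangle uses Shewchuk's adaptive-precision robust predicates, whereas this plain version can return the wrong sign for nearly degenerate inputs. `point`, `orient2d`, `incircle`, and `edge_needs_flip` are illustrative names.

```c
#include <assert.h>

/* 2D point. */
typedef struct { double x, y; } point;

/* Twice the signed area of triangle abc: > 0 if a, b, c are in
 * counterclockwise order, < 0 if clockwise, 0 if collinear. */
double orient2d(point a, point b, point c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* In-circle determinant: with a, b, c counterclockwise, the result is > 0
 * if d lies strictly inside their circumcircle, < 0 if outside, 0 if on it.
 * Plain floating point, so not robust for nearly cocircular inputs. */
double incircle(point a, point b, point c, point d)
{
    double adx = a.x - d.x, ady = a.y - d.y;
    double bdx = b.x - d.x, bdy = b.y - d.y;
    double cdx = c.x - d.x, cdy = c.y - d.y;
    double alift = adx * adx + ady * ady;
    double blift = bdx * bdx + bdy * bdy;
    double clift = cdx * cdx + cdy * cdy;
    return alift * (bdx * cdy - cdx * bdy)
         + blift * (cdx * ady - adx * cdy)
         + clift * (adx * bdy - bdx * ady);
}

/* Triangles abc (counterclockwise) and abd share edge ab, with d on the
 * other side of ab from c; the edge is not locally Delaunay, and should be
 * flipped, when d falls inside the circumcircle of abc. */
int edge_needs_flip(point a, point b, point c, point d)
{
    assert(orient2d(a, b, c) > 0 && orient2d(a, b, d) < 0);
    return incircle(a, b, c, d) > 0;
}
```

This is the test applied at each edge flip in incremental Delaunay construction; a parallel version such as the one described above would still evaluate it locally on each partition.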

14
Summary
  • Lots of progress on
  • Compiler
  • Runtime
  • Portable communication layer (GASNet)
  • Applications
  • Working on developing a large application that
    depends on UPC
  • Mesh generation
  • AMR (?), Sparse LU (?)

15
Future Plans
  • Runtime support for Intrepid
  • GCC-based open source compiler
  • Performance tuning of runtime
  • Additional machines (InfiniBand, X1, Dolphin)
  • Optimization of compiled code
  • Communication optimizations
  • Automatic search-based optimizations
  • Application efforts