1
Programming Languages/Models and Compiler Technologies
Moderator: John Mellor-Crummey, Department of Computer Science, Rice University
Microsoft Manycore Workshop, June 21, 2007
2
Panelists
  • David August - Princeton University
  • Saman Amarasinghe - Massachusetts Institute of
    Technology
  • Guy Blelloch - Carnegie Mellon University
  • Charles Leiserson - Massachusetts Institute of
    Technology
  • Uzi Vishkin - University of Maryland, College
    Park

3
Architectural Challenges
  • Significant parallelism
  • Multiple kinds of parallelism
      • cores
      • ILP
      • SIMD
  • Diversity of cores
  • Run-time throttling of cores for power management
  • Memory hierarchy
      • bandwidth
          • near term: will continue to be a significant
            bottleneck
          • long term: 3D stacked memory?
      • long and often non-uniform memory latencies
      • scratch pads

4
Roles of Parallel Programming Models
  • Enhance programmer productivity through
    abstraction
  • Manage platform resources to deliver performance
  • Provide standard interface for platform
    portability

5
The Goal
  • Simpler ways of conceptualizing, expressing,
    debugging, and tuning scalable parallel programs
  • Multiple models will be necessary
  • Models will necessarily trade off simplicity,
    expressivity, relevance to legacy code, and
    performance

6
To Succeed, Parallel Programming Models Must
  • Be ubiquitous
      • cross platform
          • at a minimum: laptops, SMP servers
          • distributed memory clusters?
  • Be expressive
  • Be productive
      • easy to write
      • easy to read and maintain
      • easy to reuse
  • Have a promise of future availability and
    longevity
  • Be efficient
  • Be supported by tools

7
Simplifying Parallel Programming
  • A high-level parallel language should
      • Provide global address space
          • beware exposed buffering
      • Separate concerns: partitioning, mapping, and
        synchronization vs. algorithm specification
          • viscosity comes from premature mingling of
            these issues
      • Enable programmer to manage locality at a high
        level
          • locality matters for performance
          • affinity between data and computation
          • e.g., HPF's ON HOME declarations
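
Illustration (not from the slides): a minimal Python sketch of the owner-computes idea that an ON HOME style declaration expresses, where each iteration runs on the processor that owns the data it touches. The block distribution and the names block_owner, owned_indices, and scale_on_home are assumptions made for this example; ranks are simulated by a plain loop rather than real processes.

```python
def block_owner(i, n, p):
    """Rank owning element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p)
    return i // block


def owned_indices(rank, n, p):
    """Indices of the block assigned to `rank` (may be empty for high ranks)."""
    block = -(-n // p)
    return range(min(rank * block, n), min((rank + 1) * block, n))


def scale_on_home(a, factor, p):
    """Owner-computes update: each simulated rank touches only its own block.

    Returns a map from index to the rank that performed the update.
    """
    n = len(a)
    touched = {}
    for rank in range(p):            # stands in for p processors
        for i in owned_indices(rank, n, p):
            a[i] *= factor           # computation placed where a[i] lives
            touched[i] = rank
    return touched
```

The algorithm (scale every element) is written once; the partitioning lives in block_owner and owned_indices and could be swapped for a cyclic distribution without changing scale_on_home's loop body.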

8
Design Issues I
  • Ultimate control vs. simplicity of use
      • library developers vs. productivity users
      • should it be the same language for both?
      • extensible language model (Sun's Fortress)
      • kitchen sink model (X10)
  • Implicit vs. explicit parallelism
      • implicit parallelism is often more malleable
      • better supports dynamic adaptation
  • Compiler assisted vs. compiler-centric
      • Co-array Fortran and UPC: user control over work
        decomposition, data movement, and synchronization
      • HPF: compiler must deliver or all is lost
  • Lazy vs. eager parallelism
      • Cilk's lazy parallelism provides a model for
        scalable binaries
      • eager parallelism adds unnecessary overhead
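
Illustration (not from the slides): a toy Python contrast between eager and lazy spawning. This is not Cilk's work-stealing scheduler. It is a simplified "run inline unless a processor is idle" policy, and the Spawner class, its idle_slots parameter, and tree_sum are hypothetical names for this sketch. It only counts how many threads each policy creates for the same program.

```python
import threading


class Spawner:
    """Spawn policy that counts the threads it actually creates."""

    def __init__(self, idle_slots):
        # idle_slots=None means eager: every spawn gets its own thread.
        self.eager = idle_slots is None
        self.slots = threading.Semaphore(idle_slots or 0)
        self.created = 0
        self.lock = threading.Lock()

    def spawn(self, fn, *args):
        """Start fn(*args); return a function that waits for its result."""
        if self.eager or self.slots.acquire(blocking=False):
            box = {}

            def run():
                try:
                    box["value"] = fn(*args)
                finally:
                    if not self.eager:
                        self.slots.release()

            t = threading.Thread(target=run)
            with self.lock:
                self.created += 1
            t.start()

            def join():
                t.join()
                return box["value"]

            return join
        value = fn(*args)  # no idle slot: behave like an ordinary call
        return lambda: value


def tree_sum(sp, lo, hi):
    """Sum the integers in [lo, hi) by divide and conquer with one spawn per split."""
    if hi - lo <= 1:
        return lo if hi > lo else 0
    mid = (lo + hi) // 2
    left = sp.spawn(tree_sum, sp, lo, mid)
    right = tree_sum(sp, mid, hi)
    return left() + right
```

With Spawner(None) every split pays for a thread; with Spawner(0) the same source runs as a serial program, and with a small positive idle_slots it creates threads only while slots are free. That is the sense in which one binary can scale with the available cores.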

9
Design Issues II
  • Deterministic vs. non-deterministic models
      • deterministic clocked final model
          • Saraswat et al. (www.saraswat.org/cf.pdf)
  • Static vs. dynamic scheduling
      • dynamic scheduling will be increasingly important
          • irregular computations, task parallelism
          • adaptive scheduling in response to core
            throttling
  • Cooperative vs. independent scheduling of work
      • does benefit of shared cache outweigh difficulty
        of using it?
      • tightly synchronous vs. more loosely synchronous
  • Scalable to distributed-memory ensembles?
      • broad community probably only cares about
        tightly-coupled platforms
      • some government and industry clients will always
        have extreme needs
  • Importance of managing affinity between cores and
    data
      • important for highest efficiency for library
        developers
10
Transactions are not THE Answer
  • Transactions are a piece of the puzzle: atomicity
  • Other aspects of the parallel programming problem
      • identifying concurrency
      • partitioning work
      • ordering actions
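
Illustration (not from the slides): a minimal Python sketch in which a lock stands in for a transaction. The lock is an assumption made for the example, since standard Python has no transactional memory. It makes each update atomic, but the programmer still has to decide how many threads to use, how to split the increments among them, and when to wait for them.

```python
import threading


def parallel_count(n_threads, increments):
    """Each of n_threads adds 1 to a shared counter `increments` times."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:          # atomicity: the read-modify-write is indivisible
                counter += 1

    # Identifying concurrency and partitioning work: chosen by hand here.
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    # Ordering: the result is read only after every worker has finished.
    for t in threads:
        t.join()
    return counter
```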

11
Autotuning
  • Seductive idea
  • Very successful as a library-based approach
      • FFTW, ATLAS, OSKI, ...
  • Much work needed to apply to applications rather
    than kernels
      • huge search space
      • progress in effective truncated search
      • model guidance can be effective
      • autotuning for parallelism
          • dangerously close to automatic parallelization
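
Illustration (not from the slides): a minimal Python sketch of truncated, model-guided search. The autotune function and its measure, budget, and predict parameters are hypothetical names. In practice, measure would time a real kernel variant and predict would be a cheap performance model used to decide which candidates are worth measuring.

```python
def autotune(candidates, measure, budget, predict=None):
    """Return (best_candidate, best_cost) after at most `budget` measurements.

    measure(c) -> observed cost of candidate c (lower is better).
    predict(c) -> optional model estimate used only to order the search.
    """
    order = sorted(candidates, key=predict) if predict else list(candidates)
    best, best_cost = None, float("inf")
    for cand in order[:budget]:      # truncated search: stop at the budget
        cost = measure(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

Even an imperfect model helps when it ranks promising candidates early, because the budget is then spent near the optimum rather than on the first few entries of the list.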

12
Rice Experience: Lessons from HPF
  • Good data and computation partitionings are
    essential
      • without good partitionings, parallelism suffers
      • flexible user control is essential
  • Excess communication undermines scalability
      • both frequency and volume must be right
      • embrace user hints to guide communication
        placement and optimization
          • e.g., HPF/JA directives REFLECT, LOCAL,
            PIPELINE, etc.
  • Single processor efficiency is critical
      • must use caches effectively on microprocessors
          • I-cache: beware of complex machine-generated
            code
          • D-cache: beware of communication footprint
  • Optimizing tightly-coupled algorithms can be hard
      • if the compiler doesn't optimize it, performance
        may be doomed!
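
Illustration (not from the slides): a Python sketch of why both frequency and volume of communication matter, using the common latency-plus-bandwidth cost estimate. The model itself, the function name comm_time, and the latency and bandwidth figures in the note below are assumptions for the example, not measurements of any machine.

```python
def comm_time(messages, bytes_total, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send `bytes_total` bytes split over `messages` messages.

    Each message pays a fixed latency (frequency term); the payload pays
    for bandwidth (volume term).
    """
    return messages * latency_s + bytes_total / bandwidth_bytes_per_s
```

With an assumed 1 microsecond latency and 1 GB/s bandwidth, sending 8000 bytes as 1000 separate 8-byte messages is dominated by the latency term, while sending them as one aggregated message is dominated by the volume term and is far cheaper. Hints that let a compiler coalesce or hoist communication target the first term; hints that avoid sending unneeded data target the second.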

13
Rice Experience: HPF vs. Co-array Fortran
  • Rice dHPF - a decade of investment in compiler
    technology
      • not quite, govt cut funding here too, just like
        architecture
      • polyhedral code generation models (like Lethin
        described)
  • Co-array Fortran for clusters
      • a few years' effort by a pair of students
  • Result: Co-array Fortran bests HPF
      • more expressive
      • higher performance
      • shorter time to solution
      • currently, can be HARDER to program than MPI

14
Principal Compiler and Runtime Challenges
  • Exploiting multiple levels of heterogeneous
    parallelism
  • Choreographing parallelism, data movement,
    synchronization
  • Managing memory hierarchy
      • cache
      • scratch pad

Warning: Don't try this at home.
15
Programming Model Ecosystem Issues
  • Semantic mismatch between programming model and
    execution model
  • Debugging data races and non-determinism
  • Performance analysis: why isn't performance
    scaling?
      • insufficient parallelism
      • parallelism is too fine-grained to be efficient
      • architecture-level issues, e.g., false sharing
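
Illustration (not from the slides): a Python sketch of the memory layout behind false sharing. Python cannot show the slowdown itself, so this only computes which cache line each per-thread counter lands on. The 64-byte line size, the assumption that the array starts on a line boundary, and the function names are all assumptions for the example.

```python
def cache_line_of(index, stride_bytes, line_bytes=64):
    """Cache line holding element `index` of an array with the given stride."""
    return (index * stride_bytes) // line_bytes


def lines_shared(n_counters, stride_bytes, line_bytes=64):
    """Number of cache lines that hold more than one per-thread counter."""
    per_line = {}
    for i in range(n_counters):
        line = cache_line_of(i, stride_bytes, line_bytes)
        per_line[line] = per_line.get(line, 0) + 1
    return sum(1 for count in per_line.values() if count > 1)
```

Eight 8-byte counters packed back to back all fall on one line, so threads updating logically independent counters still contend for it. Padding each counter to a 64-byte stride gives every counter its own line.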

16
A Path Forward
  • Kernel-, benchmark-, and application-driven studies
      • assess strengths and weaknesses of models
  • Explore alternatives: evaluate their effects on
      • simplicity
      • expressiveness
      • correctness
      • performance