Title: Implementing Tomorrow's Programming Languages
1. Implementing Tomorrow's Programming Languages
- Rudi Eigenmann
- Purdue University
- School of ECE
- Computing Research Institute
- Indiana, USA
2. How to find Purdue University
3. Computing Research Institute (CRI)
- Other Discovery Park centers: Bioscience, Nanotechnology, e-Enterprise, Entrepreneurship, Learning, Advanced Manufacturing, Environment, Oncology
- CRI is the high-performance computing branch of Discovery Park.
4. Compilers are the Center of the Universe
- The compiler translates the programmer's view into the machine's view.
5. Why is Writing Compilers Hard? A high-level view
- Translation passes are complex algorithms
- Not enough information at compile time
- Input data not available
- Insufficient knowledge of the target architecture
- Not all source code available
- Even with sufficient information, modeling performance is difficult
- Architectures are moving targets
6. Why is Writing Compilers Hard? From an implementation angle
- Interprocedural analysis
- Alias/dependence analysis (see the sketch after this list)
- Pointer analysis
- Information gathering and propagation
- Link-time, load-time, run-time optimization
- Dynamic compilation/optimization
- Just-in-time compilation
- Autotuning
- Parallel/distributed code generation
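To illustrate why alias/dependence analysis is hard, here is a minimal C sketch (the function and variable names are hypothetical, not from any particular compiler): unless the compiler can prove that the two pointer arguments never overlap, it must assume a cross-iteration dependence and keep the loop serial.

#include <stddef.h>

/* May a and b overlap? Without that knowledge, the loop cannot safely be
 * parallelized or vectorized. */
void scale(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0 * b[i];        /* possible dependence if a aliases b */
}

/* 'restrict' is the programmer's promise that a and b do not alias;
 * this is exactly the fact alias analysis tries to derive automatically. */
void scale_restrict(double * restrict a, const double * restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}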
7. It's Even Harder Tomorrow
- Because we want
- All our programs to work on multicore processors
- Very high-level languages: "Do weather forecast"
- Composition: combine the weather forecast with an energy-reservation and cooling manager
- Reuse: warn me if I'm writing a module that already exists out there.
8. How Do We Get There? Paths towards tomorrow's programming language
- Addressing the (new) multicore challenge
- Automatic Parallelization
- Speculative Parallel Architectures
- SMP languages for distributed systems
- Addressing the (old) general software engineering challenge
- High-level languages
- Composition
- Symbolic analysis
- Autotuning
9. The Multicore Challenge
- We have finally reached the long-expected speed wall for the processor clock.
- (This should not be news to you!)
- It is one of the biggest disruptions in the evolution of information technology.
- Software engineers who do not know parallel programming will be obsolete in no time.
10. Automatic Parallelization: Can we implement standard languages on multicore?
- Polaris, a parallelizing compiler; more specifically, a source-to-source restructuring compiler
- Pipeline: standard Fortran -> Polaris -> Fortran + directives (OpenMP) -> OpenMP backend compiler (sketch below)
- Research issues in such a compiler
- Detecting parallelism
- Mapping parallelism onto the machine
- Performing compiler techniques at runtime
- Compiler infrastructure
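As a rough sketch of the kind of transformation such a restructurer performs (shown in C for brevity; illustrative only, not actual Polaris output, which targets Fortran): a loop whose iterations carry no dependences is annotated with an OpenMP directive and left for the backend compiler.

/* Illustration only: what a source-to-source parallelizer conceptually emits. */
void vector_add(const double *a, const double *b, double *c, int n)
{
    /* Input loop:  for (int i = 0; i < n; i++) c[i] = a[i] + b[i];
     * After parallelism detection, the same loop is emitted with a directive: */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}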
11. State of the Art in Automatic Parallelization
- Advanced optimizing compilers perform well in 50% of all science/engineering applications.
- Caveats: this is true
- in research compilers
- for regular applications, written in Fortran or C without pointers
- Wanted: heroic, black-belt programmers who know the assembly language of HPC
12. Can Speculative Parallel Architectures Help?
- Basic idea
- The compiler splits the program into sections (without considering data dependences)
- The sections are executed in parallel
- The architecture tracks data-dependence violations and takes corrective action.
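A toy software emulation of that idea (the data, the index array idx, and all names are made up; real speculative hardware does this tracking transparently): each section runs on a private copy of the data, read and write sets are recorded, and a detected violation squashes and re-executes the later section.

#include <stdio.h>
#include <string.h>

#define N 8
static int idx[N] = {1, 2, 3, 4, 5, 6, 7, 0};   /* hypothetical access pattern */

/* One loop section: reads a[idx[i]], writes a[i]; records both accesses. */
static void section(double *a, int lo, int hi, int reads[N], int writes[N])
{
    for (int i = lo; i < hi; i++) {
        reads[idx[i]] = 1;
        a[i] = a[idx[i]] + 1.0;
        writes[i] = 1;
    }
}

int main(void)
{
    double a[N] = {0}, a1[N], a2[N];
    int r1[N] = {0}, w1[N] = {0}, r2[N] = {0}, w2[N] = {0};
    memcpy(a1, a, sizeof a);
    memcpy(a2, a, sizeof a);

    section(a1, 0, N / 2, r1, w1);        /* non-speculative section            */
    section(a2, N / 2, N, r2, w2);        /* speculative section, run "ahead"   */

    int violation = 0;                    /* did the later section read data    */
    for (int i = 0; i < N; i++)           /* that the earlier section wrote?    */
        if (w1[i] && r2[i]) violation = 1;

    for (int i = 0; i < N / 2; i++) a[i] = a1[i];      /* commit section 1 */
    if (violation) {
        puts("dependence violation: squash and re-execute section 2");
        int r[N] = {0}, w[N] = {0};
        section(a, N / 2, N, r, w);       /* replay with section 1's results visible */
    } else {
        for (int i = N / 2; i < N; i++) a[i] = a2[i];  /* commit speculative results */
    }
    return 0;
}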
13. Performance of Speculative Multithreading
- SPEC CPU2000 FP programs executed on a 4-core speculative architecture.
14. We May Need Explicit Parallel Programming
- Shared-memory architectures
- OpenMP: a proven model for science/engineering programs
- Suitability for non-numerical programs?
- Distributed computers
- MPI: the assembly language of parallel/distributed systems. Can we do better?
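To make the contrast concrete, here is a hedged sketch of the same reduction in both models (illustrative code, not from the talk): the OpenMP version adds one directive to the serial loop, while the MPI version must manage data distribution and communication explicitly (it assumes MPI_Init has been called by the caller).

#include <mpi.h>

/* OpenMP: one directive on the otherwise serial loop. */
double sum_openmp(const double *a, int n)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* MPI: each rank sums its local block, then the partial sums are combined. */
double sum_mpi(const double *local_a, int local_n)
{
    double local_s = 0.0, s = 0.0;
    for (int i = 0; i < local_n; i++)
        local_s += local_a[i];
    MPI_Allreduce(&local_s, &s, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return s;
}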
15. Beyond Science/Engineering Applications
- 7 Dwarfs
- Structured Grids (including locally structured grids, e.g. Adaptive Mesh Refinement)
- Unstructured Grids
- Fast Fourier Transform
- Dense Linear Algebra
- Sparse Linear Algebra
- Particles
- Monte Carlo
- Search/Sort
- Filter
- Combinational logic
- Finite State Machine
16. Shared-Memory Programming for Distributed Applications?
- Idea 1
- Use an underlying software distributed-shared-memory system (e.g., TreadMarks).
- Idea 2
- Direct translation into message-passing code
17. OpenMP for Software DSM
- Challenges
- S-DSM maintains coherence at page granularity
- Optimizations that reduce false sharing and increase page affinity are very important (see the sketch after the figure below)
- In S-DSMs such as TreadMarks, the stacks are not in the shared address space
- The compiler must identify shared stack variables -> interprocedural analysis
[Figure: page coherence in an S-DSM. The shared address space maps onto the distributed memories of Processor 1 and Processor 2, while each processor's stacks remain outside it. At a barrier, P1 tells P2 "I have written page x"; P2 later requests the page diff from P1.]
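A minimal sketch of the false-sharing/page-affinity point above (illustrative OpenMP code, not from the actual benchmarks): with a cyclic schedule every processor writes into every page of the shared array, whereas a blocked schedule keeps most pages written by a single processor, which matters greatly when coherence is maintained at page granularity.

#define N (1 << 20)

/* Poor for page-based S-DSM: a cyclic schedule interleaves writes, so each
 * shared page is modified by many processors (page-level false sharing). */
void update_cyclic(double *a)
{
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < N; i++)
        a[i] += 1.0;
}

/* Better: a blocked schedule gives each thread one contiguous region, so most
 * pages have a single writer (higher page affinity, fewer diffs to exchange). */
void update_blocked(double *a)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        a[i] += 1.0;
}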
18. Optimized Performance of SPEC OMPM2001 Benchmarks on a TreadMarks S-DSM System
19. Direct Translation of OpenMP into Message Passing
- A question often asked: how is this different from HPF?
- HPF: the emphasis is on data distribution; OpenMP: the starting point is explicit parallel regions.
- HPF implementations apply strict data distribution and owner-computes schemes; our approach uses partial replication of shared data.
- Partial replication leads to
- Synchronization-free serial code
- Communication-free data reads
- Communication for data writes amenable to collective message passing (sketch below)
- Irregular accesses (in our benchmarks) amenable to compile-time analysis
- Note: partial replication is not necessarily data-scalable
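A hedged sketch of what such a translation might emit for a simple parallel loop (the partitioning scheme and names are illustrative, not the translator's actual output): the shared arrays are replicated on every rank, each rank executes only its block of iterations, reads need no communication, and the written block is exchanged with a single collective.

#include <mpi.h>

/* Conceptual OpenMP input:
 *   #pragma omp parallel for
 *   for (int i = 0; i < n; i++) b[i] = 2.0 * a[i];
 *
 * Translated form under partial replication; assumes n % nprocs == 0 for brevity. */
void scale_translated(const double *a, double *b, int n)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int chunk = n / nprocs;
    int lo = rank * chunk;

    for (int i = lo; i < lo + chunk; i++)   /* this rank's block of the loop       */
        b[i] = 2.0 * a[i];                  /* reads of a[] are communication-free */

    /* Communication only for the written data, as one collective operation. */
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  b, chunk, MPI_DOUBLE, MPI_COMM_WORLD);
}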
20. Performance of OpenMP-to-MPI Translation
[Chart: speedups of the OpenMP-to-MPI translated versions (new) versus hand-coded MPI versions (existing); higher is better.]
Performance comparison of our OpenMP-to-MPI translated versions versus (hand-coded) MPI versions of the same programs. Hand-coded MPI represents a practical upper bound; speedup is relative to the serial version.
21. How does the performance compare to the same programs optimized for Software DSM?
[Chart: OpenMP-to-MPI (new) versus OpenMP for S-DSM (existing, Project 2); higher is better.]
22. How Do We Get There? Paths towards tomorrow's programming language
- The (new) multicore challenge
- Automatic Parallelization
- Speculative Parallel Architectures
- SMP languages for distributed systems
- The (old) general software engineering challenge
- High-level languages
- Composition
- Symbolic analysis
- Autotuning
23. (Very) High-Level Languages
- Observation: the number of programming errors is roughly proportional to the number of program lines.
[Chart: abstraction level rises from Assembly to Fortran to object-oriented languages to scripting/Matlab.]
24. Composition: Can we compose software from existing modules?
- Idea
- Add an abstract algorithm (AA) construct to the programming language
- The programmer defines the AA's goal
- An AA is called like a procedure
- The compiler replaces each AA call with a sequence of library calls
- How does the compiler do this?
- It uses a domain-independent planner that accepts procedure specifications as operators
25. Motivation: Programmers Often Write Sequences of Library Calls
- Example: a common BioPerl call sequence. Query a remote database and save the result to local storage:

Query q = bio_db_query_genbank_new("nucleotide",
    "Arabidopsis[ORGN] AND topoisomerase[TITL] AND 0:3000[SLEN]");
DB db = bio_db_genbank_new( );
Stream stream = get_stream_by_query(db, q);
SeqIO seqio = bio_seqio_new(">sequence.fasta", "fasta");
Seq seq = next_seq(stream);
write_seq(seqio, seq);

Example adapted from http://www.bioperl.org/wiki/HOWTO:Beginners
26. Defining and Calling an AA
- The AA (goal) is defined using the glossary:

algorithm save_query_result_locally(db_name, query_string, filename, format)
  => query_result(result, db_name, query_string),
     contains(filename, result),
     in_format(filename, format)

- 1 data type, 1 AA call
27. Ontological Engineering
- The library author provides a domain glossary:
- query_result(result, db, query): result is the outcome of sending query to the database db
- contains(filename, data): the file named filename contains data
- in_format(filename, format): the file named filename is in the format format
28. Implementing the Composition Idea
[Diagram: a domain-independent planner embedded in the compiler. Library specifications supply the planner's operators, the call context supplies the initial state, and the AA definition supplies the goal state; the resulting plan (a sequence of actions, i.e. library calls) is emitted into the executable, which acts as the plan user on the world.]
- Borrowing AI technology: planners
- -> For details, see PLDI 2006
29. Symbolic Program Analysis
- Today, many compiler techniques assume numerical constants
- Needed: techniques that can reason about the program in symbolic terms
- differentiate: a*x^2 -> 2*a*x
- analyze ranges: y = exp; if (c != 0) y = 5 -> y = [exp:exp] ∪ [5:5]
- recognize algorithms:
  DO j = 1, n
    IF (t(j) < v) c = c + 1      ->   c = COUNT(t(1:n) < v)
  ENDDO
30. Autotuning (dynamic compilation/adaptation)
- Moving compile-time decisions to runtime
- A key observation
- Compiler writers solve difficult decisions by creating a command-line option
- -> Finding the best combination of options means making the difficult compiler decisions.
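A minimal sketch of option-level autotuning (an illustration of the idea, not the PEAK system; it assumes a cc-style compiler and a benchmark source kernel.c are available): build the program under each candidate flag combination, time a run, and keep the fastest.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const char *option_sets[] = {
        "-O1", "-O2", "-O3",
        "-O3 -funroll-loops", "-O3 -funroll-loops -ffast-math"
    };
    int n = sizeof option_sets / sizeof option_sets[0];
    double best = 1e30;
    const char *best_opts = NULL;
    char cmd[256];

    for (int i = 0; i < n; i++) {
        snprintf(cmd, sizeof cmd, "cc %s -o kernel kernel.c", option_sets[i]);
        if (system(cmd) != 0) continue;          /* skip variants that fail to build */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (system("./kernel") != 0) continue;   /* run the tuned binary */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("%-35s %.3f s\n", option_sets[i], secs);
        if (secs < best) { best = secs; best_opts = option_sets[i]; }
    }
    if (best_opts)
        printf("best option combination: %s\n", best_opts);
    return 0;
}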
31. Tuning Time
- PEAK is 20 times as fast as whole-program tuning.
- On average, PEAK reduces tuning time from 2.19 hours to 5.85 minutes.
32. Program Performance
- The performance of the tuned programs is the same.
33. Conclusions
- Advanced compiler capabilities are crucial for implementing tomorrow's programming languages
- The multicore challenge -> parallel programs
- Automatic parallelization
- Support for speculative multithreading
- Shared-memory programming support
- High-level constructs
- Composition pursues this goal
- Techniques to reason about programs in symbolic terms
- Dynamic tuning