Introductory Courses in High Performance Computing at Illinois

Transcript and Presenter's Notes
1
Introductory Courses in High Performance
Computing at Illinois
  • David Padua

2
Our oldest course
  • 420 Parallel Programming for Scientists and
    Engineers.
  •  
  • Course intended for non-CS majors (but many CS
    students take it). Taught once a year for the
    last 20 years.
  •  
  • CS 420 Parallel Progrmg Sci Engrg
  • Credit 3 or 4 hours.
  • Fundamental issues in design and development of
    parallel programs for various types of parallel
    computers. Various programming models according
    to both machine type and application area. Cost
    models, debugging, and performance evaluation of
    parallel programs with actual application
    examples. Same as CSE 402 and ECE 492. 3
    undergraduate hours. 3 or 4 graduate hours.
    Prerequisite: CS 400 or CS 225.

3
420 Parallel Programming for Scientists and
Engineers
  • Machines
  • Programming models
  • Shared-memory
  • Distributed memory
  • Data parallel
  • OpenMP/MPI/Fortran 90 (a minimal OpenMP sketch
    follows this list)
  • Clusters/Shared-memory machines/Vector
    supercomputers (in the past)
  • Data parallel numerical algorithms (in Fortran
    90/MATLAB)
  • Sorting/N-Body
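
  A minimal sketch of the shared-memory (OpenMP) model
  listed above; the dot-product kernel, array names, and
  sizes are illustrative, not taken from the course
  materials:

    #include <stdio.h>
    #include <omp.h>

    /* Shared-memory data parallelism: the iterations are
       split across threads, and the reduction clause
       combines the per-thread partial sums into dot. */
    int main(void) {
        enum { N = 1000000 };
        static double x[N], y[N];
        double dot = 0.0;

        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    #pragma omp parallel for reduction(+:dot)
        for (int i = 0; i < N; i++)
            dot += x[i] * y[i];

        printf("dot = %.0f\n", dot); /* expected: 2000000 */
        return 0;
    }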

4
Other courses
  • 4xx Parallel programming. For majors.
  • 4xx Performance Programming. For all issues
    related to performance.
  • 5xx Theory of parallel computing. For advanced
    students.
  • 554 Parallel Numerical Algorithms.

5
4xx Parallel programming. For majors
  • Overview of architectures. Architectural
    characterization of the most important parallel
    systems today. Issues in effective programming of
    parallel architectures: exploitation of
    parallelism, locality (cache, registers), load
    balancing, communication, overhead, consistency,
    coherency, latency avoidance. Transactional
    memories.
  • Programming paradigms. Shared-memory, message
    passing, data parallel or regular, and functional
    programming paradigms. Message-passing
    programming. PGAS programming. Survey of
    programming languages: OpenMP, MPI, TBB, Charm,
    UPC, Co-array Fortran, High-Performance Fortran,
    NESL.
  • Concepts. Basic concepts in parallel programming:
    speedup, efficiency, redundancy, isoefficiency,
    Amdahl's law (a worked example follows this
    list).
  • Programming principles. Reactive parallel
    programming. Memory consistency. Synchronization
    strategies: critical regions, atomic updates,
    races, deadlock avoidance and prevention,
    livelock, starvation, scheduling fairness.
    Lock-free algorithms. Asynchronous algorithms.
    Speculation. Load balancing. Locality
    enhancement.
  • Algorithms. Basic algorithms: element-by-element
    array operations, reductions, parallel prefix,
    linear recurrences, Boolean recurrences. Systolic
    arrays, matrix multiplication, LU decomposition,
    Jacobi relaxation, fixed-point iterations.
    Sorting and searching. Graph algorithms. Data
    mining algorithms. N-body/particle simulations
    (a parallel-prefix sketch follows this list).
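
  As a worked example of the Amdahl's law item above (the
  numbers are chosen for illustration, not from the
  course): if a fraction f of the work is perfectly
  parallel on p processors,

    S(p) = \frac{1}{(1 - f) + f/p}, \qquad E(p) = \frac{S(p)}{p}

  For f = 0.9 and p = 16, S = 1/(0.1 + 0.05625) = 6.4 and
  E = 0.4; no processor count can push S past
  1/(1 - f) = 10.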
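
  A minimal sketch of a parallel prefix sum (scan), one of
  the basic algorithms listed above; the three-phase
  block-scan scheme and all names here are illustrative,
  not the course's code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Inclusive prefix sum in three phases:
       1) each thread scans its own block,
       2) one thread scans the per-block totals,
       3) each thread adds the total of the blocks
          before it. */
    void prefix_sum(double *a, int n) {
        int nt = omp_get_max_threads();
        double *block = calloc(nt + 1, sizeof *block);

    #pragma omp parallel num_threads(nt)
        {
            int t = omp_get_thread_num();
            int lo = (int)((long long)n * t / nt);
            int hi = (int)((long long)n * (t + 1) / nt);

            for (int i = lo + 1; i < hi; i++) /* phase 1 */
                a[i] += a[i - 1];
            block[t + 1] = (hi > lo) ? a[hi - 1] : 0.0;

    #pragma omp barrier
    #pragma omp single                        /* phase 2 */
            for (int b = 1; b <= nt; b++)
                block[b] += block[b - 1];
            /* implicit barrier at the end of single */

            for (int i = lo; i < hi; i++)     /* phase 3 */
                a[i] += block[t];
        }
        free(block);
    }

    int main(void) {
        double a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        prefix_sum(a, 8);
        for (int i = 0; i < 8; i++)
            printf("%.0f ", a[i]);    /* 1 2 3 4 5 6 7 8 */
        printf("\n");
        return 0;
    }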

6
4xx Performance Programming.
  • Sequential performance bottlenecks: CPU
    (pipelining, multiple-issue processors (in-order
    and out-of-order), support for speculation,
    branch prediction, execution units,
    vectorization, registers, register renaming);
    caches (temporal and spatial locality, compulsory
    misses, conflict misses, capacity misses,
    coherence misses); memory (latency, row/column,
    read/write); I/O.
  • Parallel performance bottlenecks: Amdahl's law,
    load imbalance, communication, false sharing,
    granularity of communication (distributed
    memory). A false-sharing sketch follows this
    list.
  • Optimization strategies: algorithm and program
    optimizations. Static and dynamic optimizations.
    Data-dependent optimizations. Machine-dependent
    and machine-independent optimizations.
  • Sequential program optimizations: redundancy
    elimination. Peephole optimizations. Loop
    optimizations. Branch optimizations.
  • Locality optimizations: tiling. Cache-oblivious
    and cache-conscious algorithms. Padding. Hardware
    and software prefetch. A tiling sketch follows
    this list.
  • Parallel programming optimizations: brief
    introduction to parallel programming of
    shared-memory machines. Dependence graphs and
    program optimizations. Privatization, expansion,
    induction variables, wrap-around variables, loop
    fusion and loop fission. Frequently occurring
    kernels (reductions, scans, linear recurrences)
    and their parallel versions. Program
    vectorization. Multimedia extensions and their
    programming. Speculative parallel programming.
    Load balancing. Bottlenecks. Overdecomposition.
    A privatization sketch follows this list.
  • Communication optimizations: aggregation of
    messages. Redundant computation to avoid
    communication. False sharing.
  • Optimization for power.
  • Tools for program tuning. Performance monitors.
    Profiling. Sampling. Compiler switches,
    directives and compiler feedback.
  • Autotuning. Empirical search. Machine-learning
    strategies for program optimization. Library
    generators: ATLAS, FFTW, SPIRAL.
  • Algorithm choice and tuning. Hybrid algorithms.
    Self-optimizing algorithms. Sorting. Data mining.
    Numerical error and algorithm choice.
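
  A minimal false-sharing sketch for the bottleneck item
  above (the 64-byte cache-line size is an assumption
  about the target machine; all names are illustrative):

    #include <stdio.h>
    #include <omp.h>

    #define NTHREADS 4
    #define ITERS 100000000L

    /* All four counters share one cache line: every
       increment by one thread invalidates the line in the
       other cores' caches. volatile keeps the compiler
       from collapsing the loop into one addition. */
    volatile long shared_ctr[NTHREADS];

    /* One counter per 64-byte line removes the false
       sharing. */
    struct padded { volatile long v;
                    char pad[64 - sizeof(long)]; };
    struct padded padded_ctr[NTHREADS];

    int main(void) {
        double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
        {
            int t = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                shared_ctr[t]++;
        }
        double t1 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
        {
            int t = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                padded_ctr[t].v++;
        }
        double t2 = omp_get_wtime();
        printf("falsely shared: %.2fs  padded: %.2fs\n",
               t1 - t0, t2 - t1);
        return 0;
    }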
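
  A minimal tiling (blocking) sketch for the locality item
  above; N, the tile size T, and the matrix-multiply kernel
  are illustrative, and the caller must zero C first:

    #define N 512
    #define T 64 /* tile size; the best value is machine
                    dependent, which is what autotuning
                    searches for */

    /* Tiled matrix multiply: each T x T tile of A, B, and
       C is reused while it is still cache resident,
       instead of streaming whole rows and columns through
       the cache on every pass. */
    void matmul_tiled(const double *A, const double *B,
                      double *C) {
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T; i++)
                        for (int k = kk; k < kk + T; k++) {
                            double aik = A[i * N + k];
                            for (int j = jj; j < jj + T; j++)
                                C[i * N + j] += aik * B[k * N + j];
                        }
    }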
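
  A minimal privatization sketch for the parallel
  programming optimizations item above (variable names are
  illustrative):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void) {
        double a[N], b[N], c[N], t;
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* As written, every iteration writes and reads the
           shared scalar t, so the loop cannot run in
           parallel (a storage-related dependence, not a
           true data dependence). */
        for (int i = 0; i < N; i++) {
            t = a[i] + b[i];
            c[i] = t * t;
        }

        /* Privatization gives each thread its own copy of
           t, removing the dependence so the loop can run
           in parallel. */
    #pragma omp parallel for private(t)
        for (int i = 0; i < N; i++) {
            t = a[i] + b[i];
            c[i] = t * t;
        }

        printf("c[10] = %.0f\n", c[10]); /* (10+20)^2 = 900 */
        return 0;
    }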