Introductory Courses in High Performance Computing at Illinois

Transcript and Presenter's Notes
1
Introductory Courses in High Performance
Computing at Illinois
  • David Padua

2
Our oldest course
  • 420 Parallel Programming for Scientists and
    Engineers.
  •  
  • Course intended for non-CS majors (but many CS
    students take it). Taught once a year for the
    last 20 years.
  •  
  • CS 420 Parallel Progrmg Sci Engrg
  • Credit 3 or 4 hours.
  • Fundamental issues in design and development of
    parallel programs for various types of parallel
    computers. Various programming models according
    to both machine type and application area. Cost
    models, debugging, and performance evaluation of
    parallel programs with actual application
    examples. Same as CSE 402 and ECE 492. 3
    undergraduate hours. 3 or 4 graduate hours.
    Prerequisite: CS 400 or CS 225.

3
420 Parallel Programming for Scientists and
Engineers
  • Machines
  • Programming models
  • Shared-memory
  • Distributed memory
  • Data parallel
  • OpenMP/MPI/Fortran 90 (a minimal OpenMP sketch
    follows this list)
  • Clusters/Shared-memory machines/Vector
    supercomputers (in the past)
  • Data parallel numerical algorithms (in Fortran
    90/MATLAB)
  • Sorting/N-Body
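
  A minimal sketch of the shared-memory (OpenMP) model
  listed above; the dot-product kernel, array names, and
  sizes are illustrative, not taken from the course
  materials:

    #include <stdio.h>
    #include <omp.h>

    /* Shared-memory data parallelism: the iterations are
       split across threads, and the reduction clause
       combines the per-thread partial sums into dot. */
    int main(void) {
        enum { N = 1000000 };
        static double x[N], y[N];
        double dot = 0.0;

        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    #pragma omp parallel for reduction(+:dot)
        for (int i = 0; i < N; i++)
            dot += x[i] * y[i];

        printf("dot = %.0f\n", dot); /* expected: 2000000 */
        return 0;
    }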

4
Other courses
  • 4xx Parallel programming. For majors.
  • 4xx Performance Programming. For all issues
    related to performance.
  • 5xx Theory of parallel computing. For advanced
    students.
  • 554 Parallel Numerical Algorithms.

5
4xx Parallel programming. For majors
  • Overview of architectures. Architectural
    characterization of the most important parallel
    systems today. Issues in effective programming of
    parallel architectures: exploitation of
    parallelism, locality (cache, registers), load
    balancing, communication, overhead, consistency,
    coherency, latency avoidance. Transactional
    memories.
  • Programming paradigms. Shared-memory, message
    passing, data parallel or regular, and functional
    programming paradigms. Message-passing
    programming. PGAS programming. Survey of
    programming languages: OpenMP, MPI, TBB, Charm,
    UPC, Co-array Fortran, High-Performance Fortran,
    NESL.
  • Concepts. Basic concepts in parallel programming:
    speedup, efficiency, redundancy, isoefficiency,
    Amdahl's law (a worked example follows this
    list).
  • Programming principles. Reactive parallel
    programming. Memory consistency. Synchronization
    strategies: critical regions, atomic updates,
    races, deadlock avoidance and prevention,
    livelock, starvation, scheduling fairness.
    Lock-free algorithms. Asynchronous algorithms.
    Speculation. Load balancing. Locality
    enhancement.
  • Algorithms. Basic algorithms: element-by-element
    array operations, reductions, parallel prefix,
    linear recurrences, Boolean recurrences. Systolic
    arrays, matrix multiplication, LU decomposition,
    Jacobi relaxation, fixed-point iterations.
    Sorting and searching. Graph algorithms. Data
    mining algorithms. N-body/particle simulations
    (a parallel-prefix sketch follows this list).
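
  As a worked example of the Amdahl's law item above (the
  numbers are chosen for illustration, not from the
  course): if a fraction f of the work is perfectly
  parallel on p processors,

    S(p) = \frac{1}{(1 - f) + f/p}, \qquad E(p) = \frac{S(p)}{p}

  For f = 0.9 and p = 16, S = 1/(0.1 + 0.05625) = 6.4 and
  E = 0.4; no processor count can push S past
  1/(1 - f) = 10.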
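
  A minimal sketch of a parallel prefix sum (scan), one of
  the basic algorithms listed above; the three-phase
  block-scan scheme and all names here are illustrative,
  not the course's code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Inclusive prefix sum in three phases:
       1) each thread scans its own block,
       2) one thread scans the per-block totals,
       3) each thread adds the total of the blocks
          before it. */
    void prefix_sum(double *a, int n) {
        int nt = omp_get_max_threads();
        double *block = calloc(nt + 1, sizeof *block);

    #pragma omp parallel num_threads(nt)
        {
            int t = omp_get_thread_num();
            int lo = (int)((long long)n * t / nt);
            int hi = (int)((long long)n * (t + 1) / nt);

            for (int i = lo + 1; i < hi; i++) /* phase 1 */
                a[i] += a[i - 1];
            block[t + 1] = (hi > lo) ? a[hi - 1] : 0.0;

    #pragma omp barrier
    #pragma omp single                        /* phase 2 */
            for (int b = 1; b <= nt; b++)
                block[b] += block[b - 1];
            /* implicit barrier at the end of single */

            for (int i = lo; i < hi; i++)     /* phase 3 */
                a[i] += block[t];
        }
        free(block);
    }

    int main(void) {
        double a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        prefix_sum(a, 8);
        for (int i = 0; i < 8; i++)
            printf("%.0f ", a[i]);    /* 1 2 3 4 5 6 7 8 */
        printf("\n");
        return 0;
    }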

6
4xx Performance Programming.
  • Sequential performance bottlenecks: CPU
    (pipelining, multiple-issue processors (in-order
    and out-of-order), support for speculation,
    branch prediction, execution units,
    vectorization, registers, register renaming);
    caches (temporal and spatial locality, compulsory
    misses, conflict misses, capacity misses,
    coherence misses); memory (latency, row/column,
    read/write); I/O.
  • Parallel performance bottlenecks: Amdahl's law,
    load imbalance, communication, false sharing,
    granularity of communication (distributed
    memory). A false-sharing sketch follows this
    list.
  • Optimization strategies: algorithm and program
    optimizations. Static and dynamic optimizations.
    Data-dependent optimizations. Machine-dependent
    and machine-independent optimizations.
  • Sequential program optimizations: redundancy
    elimination. Peephole optimizations. Loop
    optimizations. Branch optimizations.
  • Locality optimizations: tiling. Cache-oblivious
    and cache-conscious algorithms. Padding. Hardware
    and software prefetch. A tiling sketch follows
    this list.
  • Parallel programming optimizations: brief
    introduction to parallel programming of
    shared-memory machines. Dependence graphs and
    program optimizations. Privatization, expansion,
    induction variables, wrap-around variables, loop
    fusion and loop fission. Frequently occurring
    kernels (reductions, scans, linear recurrences)
    and their parallel versions. Program
    vectorization. Multimedia extensions and their
    programming. Speculative parallel programming.
    Load balancing. Bottlenecks. Overdecomposition.
    A privatization sketch follows this list.
  • Communication optimizations: aggregation of
    messages. Redundant computation to avoid
    communication. False sharing.
  • Optimization for power.
  • Tools for program tuning. Performance monitors.
    Profiling. Sampling. Compiler switches,
    directives and compiler feedback.
  • Autotuning. Empirical search. Machine-learning
    strategies for program optimization. Library
    generators: ATLAS, FFTW, SPIRAL.
  • Algorithm choice and tuning. Hybrid algorithms.
    Self-optimizing algorithms. Sorting. Data mining.
    Numerical error and algorithm choice.
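
  A minimal false-sharing sketch for the bottleneck item
  above (the 64-byte cache-line size is an assumption
  about the target machine; all names are illustrative):

    #include <stdio.h>
    #include <omp.h>

    #define NTHREADS 4
    #define ITERS 100000000L

    /* All four counters share one cache line: every
       increment by one thread invalidates the line in the
       other cores' caches. volatile keeps the compiler
       from collapsing the loop into one addition. */
    volatile long shared_ctr[NTHREADS];

    /* One counter per 64-byte line removes the false
       sharing. */
    struct padded { volatile long v;
                    char pad[64 - sizeof(long)]; };
    struct padded padded_ctr[NTHREADS];

    int main(void) {
        double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
        {
            int t = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                shared_ctr[t]++;
        }
        double t1 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
        {
            int t = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                padded_ctr[t].v++;
        }
        double t2 = omp_get_wtime();
        printf("falsely shared: %.2fs  padded: %.2fs\n",
               t1 - t0, t2 - t1);
        return 0;
    }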
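
  A minimal tiling (blocking) sketch for the locality item
  above; N, the tile size T, and the matrix-multiply kernel
  are illustrative, and the caller must zero C first:

    #define N 512
    #define T 64 /* tile size; the best value is machine
                    dependent, which is what autotuning
                    searches for */

    /* Tiled matrix multiply: each T x T tile of A, B, and
       C is reused while it is still cache resident,
       instead of streaming whole rows and columns through
       the cache on every pass. */
    void matmul_tiled(const double *A, const double *B,
                      double *C) {
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T; i++)
                        for (int k = kk; k < kk + T; k++) {
                            double aik = A[i * N + k];
                            for (int j = jj; j < jj + T; j++)
                                C[i * N + j] += aik * B[k * N + j];
                        }
    }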
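
  A minimal privatization sketch for the parallel
  programming optimizations item above (variable names are
  illustrative):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void) {
        double a[N], b[N], c[N], t;
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* As written, every iteration writes and reads the
           shared scalar t, so the loop cannot run in
           parallel (a storage-related dependence, not a
           true data dependence). */
        for (int i = 0; i < N; i++) {
            t = a[i] + b[i];
            c[i] = t * t;
        }

        /* Privatization gives each thread its own copy of
           t, removing the dependence so the loop can run
           in parallel. */
    #pragma omp parallel for private(t)
        for (int i = 0; i < N; i++) {
            t = a[i] + b[i];
            c[i] = t * t;
        }

        printf("c[10] = %.0f\n", c[10]); /* (10+20)^2 = 900 */
        return 0;
    }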