Title: Manycore Algorithms
2 Driving Manycore Applications
Revolutionary change in applications for the 21st century!
3 HPC's success has been its failure!
For decades, HPC has been caught in a vicious cycle of enabling only the applications that already run well on HPC systems.
- Manycore's disruption can open the door to a revolutionary transformation!
- For the first time in decades, manycore will allow innovation in real-world algorithms.
4 Enabling 21st Century Applications (slightly modified slide from 7 years ago!)
- Manycore apps will require
- Integer performance
- Strings, lists, trees, graphs
- Combinatorics
- Optimization
- Computational geometry
- Irregular data accesses
- Dynamic programming, backtracking
- Heuristics and solutions to NP-hard problems
- Innovate: tomorrow's applications must drive manycore!
- Current HPC systems are designed for physics-based simulations that use
- Floating-point, linear algebra
- Top 500 List measures Linpack!
- Regular operations (high degrees of locality)
- e.g., Matrices, FFT, CG
- Low-order polynomial-time algorithms
- Focus of current HPC/petascale algorithms
- Dense linear algebra
- Sparse linear algebra
- FFT or multi-grid
- Global scatter-gather operations
- Dynamically evolving coordinate grids
- Dynamic load-balancing
- Particle-based or lattice-gas algorithms
- Continuum equation solvers
5 May You Live in Tumultuous Times.
May You Live in Interesting Times.
6 Manycore Challenge: List Ranking
- Challenge: given a linked list stored in memory, find the distance from each node to the head
- Sequential approach is trivial (2 lines of code, linear time)
- Linear speedup with the number of processors in theory (PRAM)
- No speedup has ever been reported using MPI
- Rationale: list ranking is the basis for many irregular parallel algorithms and is representative of many client applications.
Input: random list of 2^26 elements
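The gap between the "trivial" sequential approach and the PRAM approach can be seen in a short Python sketch. It is not SWARM code: the function names and the list representation (an array succ of successor indices, with -1 marking the tail) are assumptions made for illustration. rank_sequential is the linear-time walk from the head; rank_pointer_jumping is Wyllie-style pointer jumping on predecessor pointers, where each loop iteration simulates one synchronous PRAM step, so it finishes in about log2(n) rounds.

```python
def rank_sequential(succ, head):
    """Distance from each node to the head, by walking the list once (O(n))."""
    rank = [0] * len(succ)
    node, dist = head, 0
    while node != -1:              # -1 marks the end of the list (assumption)
        rank[node] = dist
        node, dist = succ[node], dist + 1
    return rank


def rank_pointer_jumping(succ, head):
    """Same result via pointer jumping; each loop iteration models one PRAM step."""
    n = len(succ)
    # Invert successor pointers so every node points toward the head.
    pred = [-1] * n
    for i, s in enumerate(succ):
        if s != -1:
            pred[s] = i
    # Invariant: rank[i] is the number of hops from pred[i] to i,
    # or from the head to i once pred[i] == -1.
    rank = [0 if i == head else 1 for i in range(n)]
    while any(p != -1 for p in pred):
        # Every node reads the old arrays, then all nodes "write" at once.
        new_rank = [rank[i] + (rank[pred[i]] if pred[i] != -1 else 0) for i in range(n)]
        new_pred = [pred[pred[i]] if pred[i] != -1 else -1 for i in range(n)]
        rank, pred = new_rank, new_pred
    return rank


# Example list: 2 -> 0 -> 3 -> 1
succ = [3, -1, 0, 1]
print(rank_sequential(succ, 2))        # [1, 3, 0, 2]
print(rank_pointer_jumping(succ, 2))   # [1, 3, 0, 2]
```

Plain pointer jumping performs O(n log n) total work versus O(n) for the sequential walk, a real cost at 2^26 elements, so practical shared-memory implementations tend to favor work-efficient variants.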
- SWARM: SoftWare and Algorithms for Running on Manycore
- Supported by Microsoft Research Faculty Award in Parallelism and Concurrency
- Library of efficient implementations of parallel programming primitives and example kernels
- Prefix-sums, pointer-jumping, list ranking, divide and conquer, pipelining, graph algorithms, symmetry breaking
- Computational model for analyzing algorithms on multicore/manycore systems
- Portable
- Microsoft Visual Studio, Linux, AIX, Solaris
- Intel Xeon, AMD Opteron, IBM Power6, Sun UltraSPARC T1
- Shared Source under the Microsoft Permissive License (Ms-PL)
- http://multicore-swarm.sourceforge.net/
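Prefix-sums head the list of primitives above. The sketch below shows one common shared-memory formulation, the three-phase blocked scan: each of p workers sums its own block, the p block sums are scanned, and each worker then rescans its block starting from its offset. It is written in Python with ThreadPoolExecutor only to show the structure; blocked_prefix_sum and its p parameter are hypothetical names, this is not SWARM's C interface, and CPython's global interpreter lock means these threads will not actually speed up this CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def blocked_prefix_sum(a, p=4):
    """Inclusive prefix sums of a, computed in three phases over p contiguous blocks."""
    n = len(a)
    bounds = [(i * n // p, (i + 1) * n // p) for i in range(p)]  # blocks may be empty if n < p

    def block_sum(i):
        lo, hi = bounds[i]
        return sum(a[lo:hi])

    def block_scan(i, offset):
        lo, hi = bounds[i]
        # accumulate(..., initial=offset) yields offset first; drop it.
        return list(accumulate(a[lo:hi], initial=offset))[1:]

    with ThreadPoolExecutor(max_workers=p) as pool:
        # Phase 1: each worker sums its own block independently.
        sums = list(pool.map(block_sum, range(p)))
        # Phase 2: exclusive prefix of the p block sums (cheap, done serially).
        offsets = [0] + list(accumulate(sums))[:-1]
        # Phase 3: each worker scans its block, starting from its offset.
        parts = list(pool.map(block_scan, range(p), offsets))
    return [x for part in parts for x in part]


print(blocked_prefix_sum([1, 2, 3, 4, 5], p=2))   # [1, 3, 6, 10, 15]
```

The blocked scan performs roughly 2n additions rather than n, in exchange for O(n/p + p) time on p processors.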
7 Concluding Remarks
- Is manycore the next CB radio, or is it for real?
- It's here, baby! Sit back, relax, and enjoy the ride!
- What will multicore do to the computing ecosystem?
- Require major real-world algorithmic innovations for the first time in several decades
- Can innovative algorithmic techniques exploit the opportunities and address the challenges of multicore?
- Absolutely, aided by synergistic architectural components
- How will programming models and supporting system software change to accommodate the unique properties and peculiarities of multicore structures?
- Composability is a must. Models must give a performance advantage for using architectural features
- Challenges for the memory hierarchy between memory and on-chip resources
- Processors are cheap; memory bandwidth is expensive. Compute in the memory when possible → transactions!
- Where will the parallelism come from? HW/SW/compiler/OS?
- The user → explicitly parallel algorithms; the system → architecture-driven thread-level speculation
- Is a single programming model the right solution?
- No: pick the right tool for the problem domain (specialized programming models)
- Do we have to use all those 1024 cores?
- Yes. While there's no more free lunch in computing, most problems can expose this much concurrency.
8 Acknowledgment of Support
- National Science Foundation
- CSR: A Framework for Optimizing Scientific Applications (06-14915)
- CAREER: High-Performance Algorithms for Scientific Applications (06-11589, 00-93039)
- ITR: Building the Tree of Life -- A National Resource for Phyloinformatics and Computational Phylogenetics (EF/BIO 03-31654)
- DBI: Acquisition of a High Performance Shared-Memory Computer for Computational Science and Engineering (04-20513)
- IBM PERCS / DARPA High Productivity Computing Systems (HPCS)
- DARPA Contract NBCH30390004
- IBM Shared University Research (SUR) Grant
- Sony-Toshiba-IBM (STI)
- Sun Academic Excellence Grant
- Microsoft Research