System Architecture: Near, Medium, and Long-term Scalable Architectures

1
System Architecture: Near, Medium, and
Long-term Scalable Architectures
  • Panel Discussion Presentation
  • Sandia CSRI Workshop on Next-generation Scalable
    Applications: When MPI-only is not enough
  • June 4, 2008
  • Kevin Pedretti
  • Scalable System Software Dept.
  • Sandia National Laboratories
  • ktpedre@sandia.gov

Sandia is a multiprogram laboratory operated by
Sandia Corporation, a Lockheed Martin
Company, for the United States Department of
Energy's National Nuclear Security
Administration under contract DE-AC04-94AL85000.
2
Near Term
  • Odds are good, but goods are odd...
  • Multi-core, many-core, mega-core
  • Heterogeneous ISAs, cores, systems
  • Accelerators: GPU, Cell, ClearSpeed, FPGA, etc.
  • Embedded: Tilera, SPI, Ambric (336-core),
    Tensilica
  • Scalable Architectures
  • Peak FLOPS not bottleneck
  • Improving per-socket efficiency on real
    applications is low-hanging fruit
  • Decreasing memory size & bandwidth per core
  • Symbiosis of architecture and system software

3
Near Term (Cont.)
  • Adapting MPI implementations for architecture
  • Shared memory copies vs. NIC
  • Cache pollution, injection
  • Leverage hierarchy / intra-node locality
  • Adapting MPI applications for architecture
  • MPI + shared memory: LIBSM
  • MPI + something else for intra-node
  • OpenMP, Threading Building Blocks, ALF Streaming,
    CUDA, RapidMind, PeakStream/Google, etc.
  • All incompatible, but some share similar concepts
  • Adapting architecture for MPI?
  • Leveraging interconnect capabilities for PGAS
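One way to leverage hierarchy and intra-node locality is a two-level reduction: combine values inside each node through shared memory, then send only one value per node over the network. The Python sketch below simulates this and counts network messages; it is not an MPI implementation, and the contiguous rank-to-node mapping and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Sketch: flat vs. two-level (hierarchical) reduction.
# Assumptions (illustrative only): ranks are mapped to nodes in contiguous
# blocks, and only messages between different nodes go through the NIC.


def node_of(rank, ranks_per_node):
    return rank // ranks_per_node


def flat_reduce(values, ranks_per_node):
    """Every rank sends its value straight to rank 0.

    Returns (total, number of messages that cross the network)."""
    root_node = node_of(0, ranks_per_node)
    internode = sum(1 for r in range(1, len(values))
                    if node_of(r, ranks_per_node) != root_node)
    return sum(values), internode


def hierarchical_reduce(values, ranks_per_node):
    """Reduce within each node first (shared memory), then across node leaders.

    Returns (total, number of messages that cross the network)."""
    partial = defaultdict(int)
    for rank, v in enumerate(values):
        partial[node_of(rank, ranks_per_node)] += v  # intra-node: no NIC traffic
    internode = len(partial) - 1  # each non-root node leader sends one message
    return sum(partial.values()), internode


values = list(range(16))               # 16 ranks on 4 nodes of 4 ranks each
print(flat_reduce(values, 4))          # (120, 12)
print(hierarchical_reduce(values, 4))  # (120, 3)
```

Both versions compute the same sum; the hierarchical one trades 12 network messages for 3 by doing more of the work through intra-node memory.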

4
OS Scalability
At 8192 nodes, CNL (2.0.44) is 49% worse than
Catamount on this Partisn problem. This doesn't
appear to be a bandwidth issue.
5
Task and Memory Placement
  • No standard mechanisms; most punt and hope for
    the best
  • Explicit vs. implicit mechanisms
  • More important than node placement?
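As a hedged illustration of explicit placement, the sketch below maps task ids to cores under two hypothetical policies ("compact" and "scatter") and pins the calling process with Linux `sched_setaffinity`, exposed in Python as `os.sched_setaffinity`. The function names and the socket-major core numbering are assumptions; the implicit alternative is to leave placement to the OS scheduler and its default memory policy, which on Linux allocates pages on the node of the CPU that first touches them.

```python
import os

# Sketch of two explicit task-placement policies.
# Assumption: core ids are numbered socket-major
# (core = socket * cores_per_socket + local_core); real numbering varies
# by machine and should be read from the OS.


def place(n_tasks, sockets, cores_per_socket, policy="compact"):
    """Return the core id assigned to each task id (hypothetical helper)."""
    if n_tasks > sockets * cores_per_socket:
        raise ValueError("more tasks than cores")
    if policy == "compact":
        # Fill socket 0 first: neighbouring tasks share a socket and its caches.
        return list(range(n_tasks))
    if policy == "scatter":
        # Round-robin across sockets: spreads memory-bandwidth demand.
        return [(t % sockets) * cores_per_socket + t // sockets
                for t in range(n_tasks)]
    raise ValueError(f"unknown policy: {policy}")


def bind_self(core):
    """Explicitly pin the calling process to `core` (Linux only).

    Returns False, leaving placement to the scheduler (the implicit
    mechanism), if the call is unavailable or the core is not allowed."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    if core not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {core})
    return True


print(place(4, sockets=2, cores_per_socket=4, policy="compact"))  # [0, 1, 2, 3]
print(place(4, sockets=2, cores_per_socket=4, policy="scatter"))  # [0, 4, 1, 5]
```

In a real job each task would call `bind_self(place(...)[my_task_id])` early, before allocating and touching its data, so that first-touch memory placement follows the chosen core.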

6
Intra-node MPI
7
Virtual Memory: Nice, but Gets in the Way
Figure legend: dashed line = small pages, solid line = large pages (Dual-core Opteron); open shapes = existing logarithmic algorithm (Gibson/Bruck), solid shapes = new constant-time algorithm (Slepoy, Thompson, Plimpton).
Unexpected Behavior Due to TLB
TLB misses increased with large pages, but the time to service a miss decreased dramatically (10x). The page table fits in L1! (vs. 2 MB per GB with small pages)
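The page-table remark can be checked with simple arithmetic. The Python sketch below assumes x86-64 paging with 8-byte entries and counts only the last-level entries; the 64 KiB L1 data-cache size is an assumption about the K8-era dual-core Opteron, not a figure from the slide.

```python
# Sketch: last-level page-table footprint per GiB of mapped memory.
# Assumptions (not from the slide): x86-64 paging with 8-byte entries and
# a 64 KiB L1 data cache per core, as on K8-era Opterons.
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB
PTE_BYTES = 8          # size of one x86-64 page-table entry
L1D_BYTES = 64 * KiB   # assumed L1 data-cache size


def leaf_table_bytes(mapped_bytes, page_bytes):
    """Bytes of last-level page-table entries needed to map `mapped_bytes`."""
    entries = -(-mapped_bytes // page_bytes)  # ceiling division
    return entries * PTE_BYTES


small = leaf_table_bytes(1 * GiB, 4 * KiB)  # 4 KiB pages: 262,144 entries
large = leaf_table_bytes(1 * GiB, 2 * MiB)  # 2 MiB pages: 512 entries

print(f"4 KiB pages: {small // MiB} MiB of entries per GiB")  # 2 MiB
print(f"2 MiB pages: {large // KiB} KiB of entries per GiB")  # 4 KiB
print("large-page entries fit in L1:", large <= L1D_BYTES)    # True
```

This matches the slide's "2 MB per GB with small pages": with large pages the entries shrink 512-fold, so a page walk is far more likely to hit in cache, consistent with the faster miss service the slide reports.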
8
So, Is the Answer Large Pages?
  • DRAM bank conflicts can be considerable depending
    on data alignment
  • OS-level and hardware mitigation strategies
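The dependence on data alignment can be illustrated with a toy address-to-bank mapping. In the sketch below, the 8 KiB interleave unit, the 8 banks, and the helper names are hypothetical (real memory controllers differ and often XOR several address bits). With them, three arrays that each start on a 2 MiB boundary put element i of every array in the same bank. Staggering each array's start by one interleave unit, one possible software mitigation, removes those conflicts.

```python
# Sketch: how data alignment can cause DRAM bank conflicts.
# The address-to-bank mapping is hypothetical and only illustrates the effect.
ROW_BYTES = 8 * 1024        # hypothetical interleave granularity: 8 KiB
N_BANKS = 8                 # hypothetical number of banks
TWO_MIB = 2 * 1024 * 1024   # large-page size


def bank_of(addr):
    """Bank selected by the address bits just above the interleave offset."""
    return (addr // ROW_BYTES) % N_BANKS


def all_same_bank(bases, offset):
    """True if the byte at `offset` in every array falls in one bank."""
    return len({bank_of(b + offset) for b in bases}) == 1


# Three arrays (e.g. a, b, c in c[i] = a[i] + b[i]), each on its own 2 MiB page.
aligned = [0, TWO_MIB, 2 * TWO_MIB]
# Mitigation: stagger each array's start by one interleave unit.
staggered = [base + i * ROW_BYTES for i, base in enumerate(aligned)]

offsets = range(0, TWO_MIB, 4096)
print(sum(all_same_bank(aligned, o) for o in offsets))    # 512: every step conflicts
print(sum(all_same_bank(staggered, o) for o in offsets))  # 0
```

Large pages make this more likely because every array begins at the same offset within a 2 MiB page, so the low address bits that select the bank line up across arrays.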

9
Affects SpMV Also (28-Node HPCCG Run)
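Sparse matrix-vector multiply (SpMV) is exposed to the same memory-system effects because it gathers entries of the input vector through an index array. The sketch below assumes compressed sparse row (CSR) storage; the slides do not show HPCCG's own data structure, and the function name is illustrative.

```python
# Minimal CSR sparse matrix-vector multiply, y = A @ x (illustrative sketch).
# The gather x[col_idx[k]] is an indirect, data-dependent access: when x is
# large and column indices are scattered, consecutive gathers can land on
# different pages, which is where TLB behaviour enters.


def csr_spmv(row_ptr, col_idx, vals, x):
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # indirect gather from x
        y[i] = acc
    return y


# 3x3 example:  [[4, 0, 1],
#                [0, 2, 0],
#                [1, 0, 3]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 1.0, 3.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 4.0]
```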
10
Medium Term
  • More accelerators, normalization
  • Attractive power and memory efficiency
  • Commodity processors will integrate GPUs on-chip
  • HPC-centric off-chip accelerators
  • General-purpose cores not getting much faster
  • Leverage architecture for specific app domains
  • Some common mechanism will/must emerge for
    dealing with data-parallel accelerators
  • General-purpose cores become more light-weight,
    better match for light-weight system software
  • Chip stacking
  • Off-chip optics

11
Long Term
  • MPP-on-a-chip
  • On and off-chip optics
  • More intelligent memory systems
  • Application driven architectures