Architectural Considerations for Petaflops and beyond

Transcript and Presenter's Notes
1
Architectural Considerations for Petaflops and beyond
Bill Camp, Sandia National Labs
March 4, 2003
SOS7, Durango, CO, USA
2
Programming Models: A historical perspective
1948--53: Machine language rules
1953--1973: Single-threaded Fortran
1973--1980: Single-threaded vector Fortran
1978--1995: Shared-memory parallel vector Fortran; directives (multi-, auto- and microtasking)
1987--present: Massively parallel, message-passing Fortran and C
1995--present: Threads-based, shared-memory parallelism
1996--present: Hybrid threads + message passing
3
Programming Models: Some false starts
Late 80s--early 90s: SIMD Fortran for heterogeneous problems
Mid-eighties--present: Dataflow parallelism and functional programming
Mid-eighties--late eighties: AI-based languages, e.g., LISP
Mid-nineties: CRAFT-90 (a shared-memory approach to MPPs)
Early-nineties to ~2000: MPP threads
4
Programming Models -- Observations
Shared-memory programming models have never scaled well.
Directives-based approaches lead to code explosion and are not effective at dealing with Amdahl's Law.
Outer-loop, distributed-memory parallelism requires a physics-centric approach; i.e., it changed the way we think about parallelism but (largely) preserved our code base, didn't lead to code explosion, and made it easier to marginalize the effects of Amdahl's Law.
People will change approaches only for a huge perceived gain.
5
Petaflops -- can we get there with what we have now?
YES
6
What's Important? SURE:
- Scalability
- Usability
- Reliability
- Expense minimization
7
A more REAListic Amdahlian Law
The actual scaled speedup is more like
S(N) = S_Amdahl(N) / (1 + f_comm × R_p/c),
where f_comm is the fraction of work devoted to communications and R_p/c is the ratio of processor speed to communications speed.
8
REAL Law Implications: S_real(N) / S_Amdahl(N)
Let's consider three cases on two computers. The two computers are identical except that the first is balanced (R_p/c = 1), while the second's communications are only 1/20 as fast as its processors (a comm/compute ratio of 0.05, i.e., R_p/c = 20).
The three cases are f_comm = 0.01, 0.05 and 0.10.
9
REAL Law Implications: S(N) / S_Amdahl(N)

                                   f_comm = 0.01   0.05   0.10
R_p/c = 1  (balanced)                       0.99   0.95   0.90
R_p/c = 20 (comm/compute = 0.05)            0.83   0.50   0.33
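As an aside (ours, not from the original slides), here is a minimal Python sketch that reproduces the table above from the REAL-law formula; the function name and the value R_p/c = 20 for the weak-communications machine are assumptions inferred from the table values.

```python
# Illustrative sketch: fraction of the ideal Amdahl speedup retained once
# communications overhead is included, S(N)/S_Amdahl(N) = 1/(1 + f_comm * R_p/c).
# Assumes R_p/c = 1 for the balanced machine and R_p/c = 20 for the
# weak-communications machine (inferred from the table values).

def retained_speedup(f_comm, r_pc):
    """Ratio S(N) / S_Amdahl(N) given by the REAL law."""
    return 1.0 / (1.0 + f_comm * r_pc)

for r_pc, label in [(1.0, "balanced"), (20.0, "weak comm")]:
    row = "  ".join(f"{retained_speedup(f, r_pc):.2f}" for f in (0.01, 0.05, 0.10))
    print(f"R_p/c = {r_pc:>4.0f} ({label}): {row}")
```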
10
Bottom line
A well-balanced architecture is nearly insensitive to communications overhead. By contrast, a system with weak communications can lose over half its power for applications in which communication is important.
11
  • Petaflops -- Why can we get there with what we have now?
  • We only need 3 more spins of Moore's Law:
  • -- Today's 6-GF Hammer becomes a 48-GF processor by 2009
  • -- 10-Gigabit Ethernet becomes 40- or 80-Gbit Ethernet
  • -- Memory capacities and prices continue to improve on the current trend until 2009
  • Disk technology continues on its current trajectory for 6 more years
  • We use small, optical switches to give us 40--80 Gbyte/sec interconnects

12
  • Petaflops -- Why can we get there with what we have now?
  • We need 12,000--25,000 processors to get a peak PETAFLOP (see the arithmetic sketch after this list).
  • It will have 250--1000 TB of memory
  • It will have several hundred petabytes of disk storage
  • It will sustain about a half terabyte/sec of I/O (more costs more)
  • It will have about 30 TB/sec of XC bandwidth
  • It will have about 5--10 PB/sec of memory bandwidth
  • BALANCE REMAINS ESSENTIALLY LIKE THAT IN THE RED STORM DESIGN
  • COST in 2009: $100M--$250M in then-year dollars
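A rough arithmetic sketch (ours, not from the slides) behind the 12,000--25,000 processor estimate; the 40--80 GF per-processor peaks are assumptions bracketing the ~48 GF expected from three Moore's-Law doublings of a 6-GF Hammer.

```python
# Illustrative sketch: processors needed for a peak petaflop under a few
# assumed 2009-era per-processor peaks (GF figures are assumptions that
# bracket the ~48 GF projected for a Hammer follow-on).

PEAK_TARGET_FLOPS = 1.0e15  # one petaflop, peak

for per_proc_gflops in (40, 48, 80):
    n_processors = PEAK_TARGET_FLOPS / (per_proc_gflops * 1.0e9)
    print(f"{per_proc_gflops} GF per processor -> ~{n_processors:,.0f} processors")
# Roughly 25,000, 21,000, and 12,500 processors, consistent with the
# 12,000--25,000 range quoted above.
```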

13
  • Petaflops -- Design issues
  • It will use commodity processors with multiple cores per chip
  • It will run a partitioned OS based on Linux
  • It could have partitions with fast vector processors in a mix-and-match architecture
  • It won't look like the Earth Simulator
  • It won't run IA-64, based on current Intel design intent
  • It will probably run PowerPC or HAMMER follow-ons

14
  • Petaflops -- Why not Earth Simulator?
  • On our codes, commodity processors are nearly as fast as the ES nodes, and they have a 1.5--2.0 order of magnitude cost/performance advantage
  • BTW, this is also true -- but with not as huge a difference -- for the McKinley versus the Pentium-4
  • Example: The geometric mean of the Livermore Loops on the ES is only about 60% faster than on a 2 GHz Pentium-4
  • Example: A real CTH problem is about as fast on that P-4 as it is on the ES

15
  • Petaflops -- Why not Earth Simulator?
  • Amdahl's Law and the high cost of custom processors

16
  • Why not Earth Simulator?
  • Amdahl's Law
  • S = T_S / T_V
  • S = (W/s) / [ pW/(sN) + (1-p)W/(s/M) ]
  • S = [ p/N + M(1-p) ]^-1
  • Let N = M = 4:
  • S = 1 / [ p/4 + 4(1-p) ]
  • (Here W is the total work and s the commodity scalar speed; the vector machine runs vector work at speed sN and non-vector work at speed s/M.)

17
  • Why not Earth Simulator?
  • Amdahl's Law (p = vector fraction of the work)
  • S = [ p/N + M(1-p) ]^-1
  • Let N = M = 4:
  • S = 1 / [ p/4 + 4(1-p) ]
  • Setting S = 1 gives p/4 + 4(1-p) = 1, so p must be greater than or equal to 0.8 for breakeven (checked in the sketch below)!
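A small Python check (ours, not from the slides) of the breakeven claim, evaluating S = 1 / (p/N + M(1-p)) with N = M = 4 over a range of vector fractions p.

```python
# Illustrative check of the vector-vs-commodity Amdahl form above:
# S = 1 / (p/N + M*(1-p)) with N = M = 4.

def vector_speedup(p, n=4.0, m=4.0):
    """Speedup of the vector machine relative to a commodity scalar processor."""
    return 1.0 / (p / n + m * (1.0 - p))

for p in (0.50, 0.70, 0.80, 0.90, 0.99):
    print(f"p = {p:.2f} -> S = {vector_speedup(p):.2f}")
# p = 0.80 gives S = 1.00 (breakeven); smaller vector fractions leave the
# commodity processor ahead, and only p near 1 approaches the 4x vector peak.
```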

18
  • Petaflops -- Why not IA-64?
  • Heat
  • Size
  • Complexity
  • Cost
  • High latency / low BW
  • Difficulty in compilation
  • Competition from Intel

19
(No Transcript)
20
  • The Bad News
  • Somewhere between a petaflop and an exaflop, we will run the string out on this approach to computing

21
The Good News
- For ExaFlops computing, there is lots of potential for innovation
New approaches:
- DNA computers
- New memory-centric technologies (e.g., spin computers)
- (Not) quantum computers
- Very low-power semiconductor-based systems
22
The Good News
- For ExaFlops computing, there is lots of potential for innovation
- The requirements for SURE will not change!
23
  • The Good News
  • I'll be gone fishing!
  • The END (almost)