Architectural Considerations for Petaflops and beyond

Learn more at: https://www.cs.sandia.gov

Transcript and Presenter's Notes

Title: Architectural Considerations for Petaflops and beyond

1
Architectural Considerations for Petaflops and beyond
Bill Camp
Sandia National Labs
March 4, 2003
SOS7, Durango, CO, USA
2
Programming Models: A historical perspective
1948--53: Machine Language Rules
1953--1973: single-threaded Fortran
1973--1980: single-threaded vector Fortran
1978--1995: Shared memory parallel vector Fortran
  Directives: multi-, auto- and microtasking
1987--present: Massively parallel, message-passing Fortran and C
1995--present: Threads-based, shared memory parallelism
1996--present: Hybrid threads + message passing
3
Programming Models: Some false starts
Late 80s--early 90s: SIMD Fortran for heterogeneous problems
Mid-eighties--present: Dataflow parallelism and functional programming
Mid-eighties--late eighties: AI-based languages, e.g., LISP
Mid-nineties: CRAFT-90 (shared memory approach to MPPs)
Early-nineties to ~2000: MPP Threads
4
Programming Models -- Observations
Shared memory programming models have never scaled well
Directives-based approaches lead to code explosion and are not effective at dealing with Amdahl's Law
Outer-loop, distributed memory parallelism requires a physics-centric approach. I.e., it changed the way we think about parallelism but (largely) preserved our code base, didn't lead to code explosion, and made it easier to marginalize the effects of Amdahl's Law.
People will change approaches only for a huge perceived gain
5
Petaflops-- can we get there with what we have
now? YES
6
What's Important? SURE
- Scalability
- Usability
- Reliability
- Expense minimization
7
A more REAListic Amdahlian Law
The actual scaled speedup is more like
$S(N) = S_{\mathrm{Amdahl}}(N) \,/\, \left[ 1 + f_{\mathrm{comm}} \times R_{p/c} \right]$,
where $f_{\mathrm{comm}}$ is the fraction of work devoted to communications and $R_{p/c}$ is the ratio of processor speed to communications speed.
8
REAL Law Implications: $S_{\mathrm{real}}(N) / S_{\mathrm{Amdahl}}(N)$
Let's consider three cases on two computers.
The two computers are identical except that one has an $R_{p/c}$ of 1 and the second an $R_{p/c}$ of 0.05.
The three cases are $f_{\mathrm{comm}}$ = 0.01, 0.05 and 0.10.
9
REAL Law Implications: $S(N) / S_{\mathrm{Amdahl}}(N)$

$R_{p/c}$    $f_{\mathrm{comm}}$ = 0.01    0.05    0.10
1.0          0.99                          0.95    0.9
0.05         0.83                          0.50    0.33
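The tabulated ratios can be checked with a few lines of Python. This is only a sketch under one assumption that the slides do not state: the numbers in the table are reproduced by 1 / (1 + f_comm / b) with b = 1.0 and 0.05, that is, if the tabulated ratio is read as communications speed relative to processor speed (equivalently, processor-to-communications ratios of 1 and 20 in the formula as written on the previous slide). The function and variable names are hypothetical.

```python
# Sketch: reproduce the S(N)/S_Amdahl(N) table.
# ASSUMPTION (not stated in the slides): the tabulated ratio b is
# communications speed relative to processor speed, so the penalty
# term is f_comm / b. The table values match this reading.

F_COMM = (0.01, 0.05, 0.10)   # fraction of work spent communicating
BALANCE = (1.0, 0.05)         # the two machines compared in the table


def real_over_amdahl(f_comm: float, balance: float) -> float:
    """Return S(N)/S_Amdahl(N) = 1 / (1 + f_comm / balance)."""
    return 1.0 / (1.0 + f_comm / balance)


def build_table() -> dict:
    """Map each balance value to its row of ratios, rounded to 2 places."""
    return {
        b: [round(real_over_amdahl(f, b), 2) for f in F_COMM]
        for b in BALANCE
    }


if __name__ == "__main__":
    for b, row in build_table().items():
        print(f"balance {b}: {row}")
```

Under this reading the balanced machine gives roughly 0.99, 0.95 and 0.91 (the last appears as 0.9 in the table), and the weakly connected machine gives 0.83, 0.50 and 0.33.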
10
Bottom line
A well-balanced architecture is nearly insensitive to communications overhead.
By contrast, a system with weak communications can lose over half its power for applications in which communications is important.
11

Petaflops-- Why can we get there with what we have now?
We only need 3 more spins of Moore's Law
--Today's 6-GF Hammer becomes a 48-GF processor by 2009
--10-Gigabit ethernet becomes 40 or 80-Gbit ethernet
--Memory capacities and prices continue to improve on current trend until 2009
Disk technology continues on its current trajectory for 6 more years
We use small, optical switches to give us 40--80 Gbyte/sec interconnects

Petaflops-- Why can we get there with what we have now?
We need 12,000--25,000 processors to get a peak PETAFLOP.
It will have 250--1000 TB memory
It will have several hundred petabytes disk storage
It will sustain about a half terabyte/sec I/O (more costs more)
It will have about 30 TB/sec XC BW
It will have about 5--10 PB/sec memory BW
BALANCE REMAINS ESSENTIALLY LIKE THAT IN THE RED STORM DESIGN
COST in 2009: $100M--$250M in then-year dollars
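The sizing figures above can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and treats the 48-GF part projected on the previous slide as one "processor"; both are assumptions, and the function names are hypothetical.

```python
# Sketch: back-of-the-envelope sizing for a 1-petaflop peak machine.
# ASSUMPTIONS: decimal units, and one "processor" = one 48-GF part
# (6 GF today, doubled three times).

PETAFLOP = 1.0e15  # target peak, flop/s
TB = 1.0e12        # bytes (decimal terabyte, assumed)
PB = 1.0e15        # bytes (decimal petabyte, assumed)


def projected_gflops(today_gflops: float, doublings: int) -> float:
    """Per-processor peak after a number of Moore's Law doublings."""
    return today_gflops * 2 ** doublings


def processors_for_peak(per_proc_gflops: float) -> float:
    """Processors needed to reach the peak target."""
    return PETAFLOP / (per_proc_gflops * 1.0e9)


def per_flop(quantity: float) -> float:
    """Normalize a capacity (bytes) or bandwidth (bytes/s) by peak flop/s."""
    return quantity / PETAFLOP


def summarize() -> dict:
    gf = projected_gflops(6.0, 3)  # 48 GF
    return {
        "per_proc_gflops": gf,
        "processors_at_projected_peak": processors_for_peak(gf),
        # Per-processor peak implied by the 12,000--25,000 range
        "gflops_if_12000_procs": PETAFLOP / 12_000 / 1.0e9,
        "gflops_if_25000_procs": PETAFLOP / 25_000 / 1.0e9,
        "memory_bytes_per_flop": (per_flop(250 * TB), per_flop(1000 * TB)),
        "memory_bw_bytes_per_flop": (per_flop(5 * PB), per_flop(10 * PB)),
        "interconnect_bw_bytes_per_flop": per_flop(30 * TB),
        "io_bw_bytes_per_flop": per_flop(0.5 * TB),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

With these assumptions a 48-GF processor implies about 20,800 processors, inside the quoted 12,000--25,000 range, which in turn corresponds to roughly 40--83 GF per processor.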

Petaflops-- Design issues
It will use commodity processors with multiple cores per chip
It will run a partitioned OS based on Linux
It could have partitions with fast vector processors in a mix-and-match architecture
It won't look like the Earth Simulator
It won't run IA-64 based on current Intel design intent
It will probably run PowerPC or HAMMER follow-ons

Petaflops-- Why not Earth Simulator?
On our codes, commodity processors are nearly as fast as the ES nodes, and they have a cost/performance advantage of 1.5--2.0 orders of magnitude
BTW, this is also true, but with not as huge a difference, for the McKinley versus the Pentium-4
Example: The geometric mean of Livermore Loops on ES is only 60% faster than on a 2 GHz Pentium-4
Example: A real CTH problem is about as fast on that P-4 as it is on the ES

Petaflops-- Why not Earth Simulator?
Amdahl's Law and the high cost of custom processors

Why not Earth Simulator?
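Amdahl's Law
$S = T_S / T_V$
$S = 1 \,/\, \left\{ \left[ \dfrac{pW}{sN} + \dfrac{(1-p)W}{s/M} \right] \Big/ \dfrac{W}{s} \right\}$
$S = \left[ \dfrac{p}{N} + M(1-p) \right]^{-1}$
Let $N = M = 4$:
$S = 1 \,/\, \left[ p/4 + 4(1-p) \right]$.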

Why not Earth Simulator?
Amdahl's Law ($p$ = vector fraction of work)
$S = \left[ \dfrac{p}{N} + M(1-p) \right]^{-1}$
Let $N = M = 4$:
$S = 1 \,/\, \left[ p/4 + 4(1-p) \right]$.
$p$ must be greater than or equal to 0.8 for breakeven!
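The breakeven claim follows from setting S = 1 in the formula above, which gives p = (M - 1) / (M - 1/N). A small Python sketch makes this concrete. It assumes one reading of the symbols that the slides leave implicit: N is the factor by which the vector machine speeds up the vector fraction, and M is the factor by which it slows down the remaining scalar work, both relative to a commodity processor. The function names are hypothetical.

```python
# Sketch: speedup of a vector machine over a commodity processor,
# S = 1 / (p/N + M*(1 - p)), and the vector fraction p at which S = 1.
# ASSUMPTION: N = vector speedup factor on the vector fraction,
#             M = slowdown factor on the scalar fraction.


def vector_speedup(p: float, n: float = 4.0, m: float = 4.0) -> float:
    """Speedup S for vector fraction p (0 <= p <= 1)."""
    return 1.0 / (p / n + m * (1.0 - p))


def breakeven_fraction(n: float = 4.0, m: float = 4.0) -> float:
    """Vector fraction p that gives S = 1: p = (M - 1) / (M - 1/N)."""
    return (m - 1.0) / (m - 1.0 / n)


if __name__ == "__main__":
    print("breakeven p:", breakeven_fraction())
    for p in (0.0, 0.5, 0.8, 0.9, 1.0):
        print(f"p = {p:.1f}  S = {vector_speedup(p):.2f}")
```

For N = M = 4 the breakeven fraction is 3 / 3.75 = 0.8; below it the vector machine is slower than the commodity processor, and even a fully vectorized code (p = 1) gains only the factor N.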

Petaflops-- Why not IA-64?
Heat
Size
Complexity
Cost
High latency/ low BW
Difficulty in Compilability
Competition from Intel

19
(No Transcript)
20

The Bad News
Somewhere between a petaflop and an exaflop, we will run the string out on this approach to computing

21
The Good News
For ExaFlops computing, there is lots of potential for innovation
New approaches:
DNA computers
New memory-centric technologies (e.g., spin computers)
(Not) quantum computers
Very low power semiconductor-based systems
22
The Good News
For ExaFlops computing, there is lots of potential for innovation
The requirements for SURE will not change!
23