Title: Do or die
1. Vision for OSC Computing and Computational Sciences
Thomas Zacharia, Associate Laboratory Director, Computing and Computational Sciences, Oak Ridge National Laboratory
http://www.ccs.ornl.gov/
Earth Simulator Rapid Response Meeting, May 15-16, 2002
2. Charge from Dr. Orbach
- Review ". . . current state of the national computer vendor community relative to high performance computing"
- ". . . Vision for what realistically should be accomplished in the next five years within the Office of Science in high performance computing"
3. Dr. Orbach's Vision for OSC Computing (Statement to ASCAC Committee, May 8, 2002)
- There is a centrality of computation in everything that we do
- Large scale computation is the future of every program in the Office of Science
- We want to have our own computing program in non-defense computational science
[Diagram: computing infrastructure at the center, connecting scientific applications (Astrophysics, Biology, Materials, Climate, Chemistry, Fusion) with computer scientists, applied mathematicians, and theoretical and computational scientists]
4. FY 03 Budget Request for OSC Computing Considerably Lower than Required to Meet Goals
[Bar chart: Budget Dollars (0 to $900,000,000) versus Fiscal Year for DOE-SC, NNSA, and NSF]
5. As a Fraction of Total Budget, OSC Computing is Half that of NNSA and NSF and Needs a Significant Increase to Meet Goals
[Bar chart: Computing Budget / Total Budget (%), 0 to 12%, for DOE-SC, NNSA, and NSF]
6. Earth Simulator has Heightened Urgency for an Infrastructure Strategy for Scientific Computing
- Critical steps:
  - Invest in critical software with integrated science and computer science development teams
  - Deploy scientific computing hardware infrastructure in support of large scale computation
    - Cray, HP, IBM, SGI
    - IBM is the largest US installation
  - Develop new initiative to support advanced architecture research
[Chart: Top 500 Supercomputers; the US has held the #1 position on 12 of the 19 lists]
A concerted effort will be required to regain US leadership in high performance computing. The LINPACK benchmark generally overestimates the effectiveness of an architecture for applications such as climate by a substantial factor. Stability and reliability are also important system properties.
7. Invest in Critical Software with Integrated Science and Computer Science Development Teams
SciDAC is a good start towards scientific computing software:
- Scientific Applications
  - Climate Simulation
  - Computational Chemistry
  - Fusion (5 topics)
  - High Energy and Nuclear Physics (5 topics)
- Collaboratories
  - Four projects
- Middleware and Network Research
  - Six projects
- Computer Science
  - Scalable Systems Software
  - Common Component Architecture
  - Performance Science and Engineering
  - Scientific Data Management
- Applied Mathematics
  - PDE Linear/Nonlinear Solvers and Libraries
  - Structured Grids/AMR
  - Unstructured Grids
Dave Bader, SciDAC PI Meeting, Jan. 15, 2002, Washington, DC
8. Deploy Scientific Computing Hardware Infrastructure to Support Large-Scale Computation
- Provide the most effective and efficient computing resources for a set of scientific applications
- Serve as a focal point for the scientific research community as it adapts to new computing technologies
- Provide the organizational framework needed for multidisciplinary activities
  - Addressing software challenges requires strong, long-term collaborations among disciplinary computational scientists, computer scientists, and applied mathematicians
- Provide the organizational framework needed for development of community codes
  - Implementing many scientific codes requires a wide range of disciplinary expertise
  - Organizational needs will continue to grow as computers advance to the petaflops scale
Dave Bader, SciDAC PI Meeting, Jan. 15, 2002, Washington, DC
9. Earth Simulator has Widened the Gap with DOE Scientific Computing Hardware Infrastructure
[Charts: simulation years/day for the Earth Simulator versus current SC resources (SEABORG, CHEETAH) and versus POWER4-H (40 TFlops) and Power5 (50 TFlops) systems of comparable peak performance, annotated "Technology Gap" and "Widening Gap"]
- Top-left comparison between ES and SC resources highlights the widening gap between SC capabilities and others
- Top-right comparison between ES and US resources of comparable peak performance highlights the architectural difference and the need for a new initiative to close the gap
- Right comparison between ES and US resources of comparable cost
[Chart: simulation years/day for the Earth Simulator versus POWER4-H (340 TFlops) and Power5 (350 TFlops) systems of comparable cost]
10. Possible U.S. Response in the Near Term for Increased Computing Capacity
Earth Simulator:
- 40 TFlops peak
- 5,120 vector processors
- 8 GFlops per processor
- 8 processors per node
- $500M procurement
- $50M/yr maintenance
- Limited software investment to date
- Significant ancillary impact on Biology, Nanoscience, Astrophysics, HENP, Fusion
US Alternative:
- 40 TFlops peak
- 5,120 Power5 processors
- 8 GFlops per processor
- 64 processors per node
- $100M procurement
- $10M/yr maintenance
- SciDAC investment in computational science and related ISICs
- Significant ancillary impact on Biology, Nanoscience, Astrophysics, HENP, Fusion
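The figures above lend themselves to a quick sanity check. The sketch below (illustrative Python using only the numbers on this slide; the five-year horizon and the cost-per-peak-TFlop metric are assumptions added here, not figures from the slide) reproduces the roughly 40 TFlops peak from processor count times per-processor rate and compares rough multi-year costs.

```python
# Back-of-the-envelope comparison using the figures on this slide.
# The 5-year horizon and the cost-per-peak-TFlop metric are assumptions
# for illustration, not data from the slide.

def peak_tflops(processors: int, gflops_per_processor: float) -> float:
    """Aggregate peak performance in TFlops."""
    return processors * gflops_per_processor / 1000.0

def five_year_cost(procurement_m: float, maintenance_m_per_yr: float, years: int = 5) -> float:
    """Total cost in $M over the given number of years."""
    return procurement_m + maintenance_m_per_yr * years

systems = {
    "Earth Simulator": {"procs": 5120, "gflops": 8.0, "procure_m": 500.0, "maint_m": 50.0},
    "US Alternative":  {"procs": 5120, "gflops": 8.0, "procure_m": 100.0, "maint_m": 10.0},
}

for name, s in systems.items():
    peak = peak_tflops(s["procs"], s["gflops"])          # ~41 TFlops for both
    cost = five_year_cost(s["procure_m"], s["maint_m"])  # $750M vs. $150M
    print(f"{name}: {peak:.1f} TFlops peak, ${cost:.0f}M over 5 years, "
          f"${cost / peak:.1f}M per peak TFlop")
```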
11. Best Performance of High Resolution Atmospheric Model
[Scatter plot: performance of the high-resolution atmospheric model, sustained GFlops (10^1 to 10^5) versus inter-node bandwidth (10^2 to 10^5 Mb/s), for Earth Simulator (2560 processors), AlphaES45 (2048), AlphaES40 (256), SP3 WHII (512), and T3E (512)]
12. Develop New Initiative to Support Advanced Architecture: BlueGene Offers a Possible Option
[Chart: price/performance in dollars per GFlops (log scale, roughly $100 to $1,000,000) versus year (1996 to 2005); data points include ASCI C/L/D, Beowulfs/COTS clusters, T3E, JPL, ASCI Blue, ASCI White, ASCI Compaq, and the Columbia and Columbia/IBM machines QCDSP, QCDOC, and Blue Gene/L]
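For a feel for the chart's dollars-per-GFlops axis, a rough calculation from the procurement and peak figures on the previous slide (arithmetic added here for context, not data read off the chart):

\[
\frac{\$500\times 10^{6}}{5120 \times 8\ \text{GFlops}} \approx \$12{,}200/\text{GFlops},
\qquad
\frac{\$100\times 10^{6}}{5120 \times 8\ \text{GFlops}} \approx \$2{,}400/\text{GFlops}.
\]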
13. BlueGene Architecture is a (More) General Purpose Machine that Builds on QCDOC
- QCDSP (600 GF, based on the Texas Instruments DSP C31)
  - Gordon Bell Prize for Most Cost Effective Supercomputer in '98
  - Designed and built by Columbia University
  - Optimized for Quantum Chromodynamics (QCD)
  - 12,000 50 MF processors
  - Commodity 2 MB DRAM
- QCDOC (20 TF, based on IBM System-on-a-Chip)
  - Collaboration between Columbia University and IBM Research
  - Optimized for QCD
  - IBM 7SF technology (ASIC foundry technology)
  - 20,000 1 GF processors (nominal)
  - 4 MB embedded DRAM plus external commodity DDR/SDR SDRAM
- BlueGene L/D (180 TF, based on IBM System-on-a-Chip)
  - Designed by IBM Research in IBM CMOS 8SF technology
  - 64,000 2.8 GF processors (nominal)
  - 4 MB embedded DRAM plus external commodity DDR SDRAM
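The aggregate peak figure quoted for each generation follows directly from processor count times nominal per-processor rate; a quick check (illustrative Python, using only the numbers on this slide):

```python
# Sanity check: aggregate peak = processor count x nominal per-processor rate.
# All inputs are taken from this slide.

machines = [
    ("QCDSP",        12_000, 0.05),   # 50 MFlops per processor
    ("QCDOC",        20_000, 1.0),    # 1 GFlops per processor (nominal)
    ("BlueGene L/D", 64_000, 2.8),    # 2.8 GFlops per processor (nominal)
]

for name, procs, gflops in machines:
    total_gf = procs * gflops
    print(f"{name}: {procs:,} x {gflops} GFlops = {total_gf:,.0f} GFlops "
          f"(~{total_gf / 1000:.1f} TF)")

# QCDSP:        12,000 x 0.05 = 600 GFlops   (the 600 GF quoted)
# QCDOC:        20,000 x 1.0  = 20,000 GFlops  (~20 TF)
# BlueGene L/D: 64,000 x 2.8  = 179,200 GFlops (~180 TF)
```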
14. System Organization (Conceptual)
- Host system
  - Diagnostics, booting, archive
  - Application-dependent requirements
- File server array
  - 500 RAID PC servers
  - Gb Ethernet and/or InfiniBand
  - Application-dependent requirements
- BlueGene/L processing nodes
  - 81,920 nodes
  - Two major partitions:
    - 65,536-node production platform (256 TFlops peak)
    - 16,384 nodes partitioned into code development platforms
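As a compact restatement of the node accounting above (a sketch; the partition sizes are from the slide, the data layout and percentages are added here for illustration):

```python
# Conceptual node accounting for the BlueGene/L system described above.
# Partition sizes are from the slide; everything else is illustrative.

partitions = {
    "production platform (256 TFlops peak)": 65_536,
    "code development platforms":            16_384,
}

total_nodes = sum(partitions.values())
assert total_nodes == 81_920  # matches the 81,920 processing nodes quoted

for name, nodes in partitions.items():
    print(f"{name}: {nodes:,} nodes ({nodes / total_nodes:.0%} of the machine)")
```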
15. Summary
- Continue investment in critical software with integrated science and computer science development teams
- Deploy scientific computing hardware infrastructure in support of large scale computation
- Develop new initiative to support advanced architecture research
- Develop a bold new facilities strategy for OSC computing
- Increase OSC computing budget to support the outlined strategy
Without sustained commitment to scientific computing, key computing and computational sciences capabilities, including personnel, will erode beyond recovery.