1
NERSC and Blue Planet
  • William T.C. Kramer
  • NERSC/LBNL
  • May 28, 2003
  • NERSC User Group Meeting, Chicago, IL

2
What is Blue Planet?
  • Blue Planet is a science-driven design process to develop systems that are simultaneously more effective for science and sustainable and cost-effective for vendors.
  • The white paper uses IBM as an example of what can be done with this process
  • The process can be applied to a number of vendors
  • Blue Planet is a new concept for a sustainable computer architecture that is more effective for science and engineering applications
  • A specific implementation leveraging the IBM roadmap that better balances scientific processing needs with commercial viability
  • Described as an ultrascale system on the order of the Earth Simulator
  • http://www.nersc.gov/news/blueplanet.html and http://www.nersc.gov/news/ArchDevProposal.5.01.pdf

3
Signposts of Change in HPC
  • In early 2002 there were several signposts that signaled a fundamental change in HPC in the US. For NERSC they were:
  • Poor benchmarks for the NERSC workload in our latest procurement (March 2002)
  • Impressive early performance results of the Earth Simulator System (April 2002)
  • Increasing indications that commodity clusters will not be as easy, scalable, or cost-effective as first projected
  • Lack of progress in computer architecture research evident at the Petaflops Workshop (WIMPS, February 2002)
  • Detailed evaluation of current and future processors and systems, which showed that:
  • System designers did not truly understand current and future scientific applications
  • Design target codes do not reflect current and future methods

4
The Conclusion
  • The community has pursued the logical extreme of COTS systems
  • The commodity building block was the
    microprocessor, but is now the entire server
    (SMP).
  • Communications and memory bandwidth are not
    scaling with peak processor power.
  • Near-football-field-sized computers consuming megawatts of electricity
  • DOE Office of Science is the natural leader to
    address this and is making it a priority
  • This is happening against the backdrop of
  • Decreasing interest in HPC by some U.S. vendors
  • Further consolidation/reduction of the number of
    vendors
  • Reduced profitability and reduced technology
    investments
  • New ways to define commodity
  • So, we are in the middle of a fundamental change
    of the basic premises of the HPC market in the
    U.S.

5
The Divergence Problem
  • The requirements of high performance computing
    for science and engineering and the requirements
    of the commercial market are diverging.
  • The commercial-clusters-of-COTS-SMPs approach is
    no longer sufficient to provide the highest level
    of performance
  • Lack of memory bandwidth (see the measurement sketch after this list)
  • High interconnect latency
  • Lack of interconnect bandwidth
  • Lack of high performance parallel I/O
  • High cost of ownership for large scale systems
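
A sense of the memory-bandwidth gap named above can be obtained with a STREAM-style microbenchmark. The following is a minimal sketch in C, not a benchmark from the Blue Planet work itself; the array size, trial count, and scale factor are illustrative assumptions.

  /* stream_triad.c - minimal STREAM-style triad bandwidth sketch.
   * Array size, trial count, and scale factor are illustrative choices. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (1L << 24)          /* ~16M doubles per array: well beyond cache */
  #define NTRIES 10

  int main(void) {
      double *a = malloc(N * sizeof *a);
      double *b = malloc(N * sizeof *b);
      double *c = malloc(N * sizeof *c);
      if (!a || !b || !c) return 1;
      for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

      double best = 1e30;
      for (int t = 0; t < NTRIES; t++) {
          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (long i = 0; i < N; i++)       /* triad: a = b + s*c */
              a[i] = b[i] + 3.0 * c[i];
          clock_gettime(CLOCK_MONOTONIC, &t1);
          double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
          if (sec < best) best = sec;
      }
      /* 24 bytes move per element: load b, load c, store a */
      printf("triad: %.2f GB/s (check %.1f)\n",
             3.0 * N * sizeof(double) / best / 1e9, a[N - 1]);
      free(a); free(b); free(c);
      return 0;
  }

Measurements of this kind, compared against a processor's peak floating-point rate, are what quantify the imbalance described above.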

6
The Divergence Problem
  • In response, NERSC, ANL, and IBM developed a Science-Driven Computer Architecture proposal.
  • It includes a new architecture co-defined with IBM called Blue Planet
  • "Creating Science-Driven Computer Architecture: A New Path to Scientific Leadership" - http://www.nersc.gov/news/ArchDevProposal.5.01.pdf
  • The process is being expanded with other vendors

7
Overall Goals
  • Restore American leadership in capability
    scientific computing by 2005
  • Define a sustainable path for efficient
    scientific computing
  • Focus on achieving high sustained performance
    rather than peak
  • The first step in a long term strategy
  • Petaflop peak by the end of the decade with 40% sustained
  • An initial system with 2x the sustained performance of the ES at 50% of the cost
  • On at least a modest number of strategic large applications
  • Sustained performance of 30-40% on key benchmarks
  • Needs to be 4X the peak performance of the current ES (assuming Moore's Law performance scaling); see the arithmetic after this list
  • 160 TF peak performance
  • Phased delivery plan with the final system available in 2H05/1Q06
  • Assumes a significant funding profile can be
    developed
  • Low risk (build off existing roadmap to the
    extent possible)
  • Full strategy has multiple solutions proposed
  • ANL with Blue Gene/L and ORNL with Cray X1
  • Proposed to be a cooperative development effort
    between NERSC and IBM
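
A quick check of the arithmetic behind the 160 TF target referenced above: the Earth Simulator's peak is roughly 40 TF (an assumed figure, not stated on this slide), so 4x gives about 160 TF peak. At 30-40% sustained, 160 TF corresponds to roughly 48-64 TF, consistent with the 40-50 TF sustained goal quoted later for the full system.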

8
Approach
  • Study applications critical to DOE Office of
    Science and others. For example
  • Material Science
  • Combustion simulation and adaptive methods
  • Computational astrophysics
  • Nanoscience (new drugs and also new microchip
    technologies)
  • Biochemical and Biosciences (protein
    folding/interactions)
  • Climate modeling
  • High Energy Physics (particle accelerators and
    astrophysics)
  • Multigrid eigensolvers and linear algebra methods
  • Identify key bottlenecks found in these critical
    applications
  • Outline a high level approach to address the
    challenges
  • Follow-up meetings for detailed drill-down by IBM computer scientists and application scientists at NERSC
  • Iterate on proposed solution

9
Other Ideas for Consideration
  • Finely tuned libraries for FFT, FMA, and matrix ops (ESSL, PESSL); see the sketch after this list
  • Hardware acceleration engines for performance-critical ops, especially when software tuning is inhibited by other constraints
  • MPI Lite (avoid some of the performance-inhibiting semantics that are seldom used)
  • Trade-offs would be ordering rules and maybe repeatability of results
  • Hardware acceleration engines for MPI (in the processor or adapter)
  • The Unified Parallel C programming model and other new languages
  • Microkernel OS on compute nodes
  • Other OS enhancements for HPC
  • E.g., no paging of well-behaved applications
  • Better hooks for daemon control
  • Advanced cooling to address floor space
  • New CPUs whose clock is limited by packaging and cooling, not by chip technology
  • Compiler technologies for VIVA, VMX, etc.
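
One concrete reading of the tuned-library item above is to route dense matrix operations through a vendor BLAS such as ESSL rather than hand-written loops. The sketch below uses the portable CBLAS interface and arbitrary matrix sizes; the header and link details are assumptions (ESSL's native header is essl.h, and its BLAS entry points are typically the Fortran-style ones), so treat this as an illustration of the idea rather than ESSL-specific code.

  /* dgemm_example.c - route a matrix multiply through a tuned BLAS.
   * Sizes and the CBLAS interface are illustrative assumptions. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <cblas.h>

  int main(void) {
      const int n = 1024;                      /* square matrices, n x n */
      double *A = malloc((size_t)n * n * sizeof *A);
      double *B = malloc((size_t)n * n * sizeof *B);
      double *C = calloc((size_t)n * n, sizeof *C);
      if (!A || !B || !C) return 1;
      for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; }

      /* C = 1.0 * A * B + 0.0 * C, row-major, no transposes */
      cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                  n, n, n, 1.0, A, n, B, n, 0.0, C, n);

      printf("C[0] = %f\n", C[0]);             /* expect 2.0 * n */
      free(A); free(B); free(C);
      return 0;
  }

The same call maps onto whichever tuned implementation the vendor supplies, so the application inherits library-level optimization without source changes.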

10
New Class of Computer Architectures for Science
  • Sustained cooperative development of new computer
    architectures
  • Engaging the scientists with the developers
  • A focus on sustained performance of scientific
    applications not on peak performance
  • Addressing the key bottlenecks of bandwidth and
    latency for memory and processor interconnection
  • A new investment in the computer science research
    and scientific research communities
  • Cost Matters
  • If effective scientific supercomputing is only available at high cost, it will have an impact on only a small part of the scientific community.
  • So, we need to leverage the resources of
    mainstream IT companies like IBM, HP and Intel.
  • And our national science policy should motivate
    them to participate durably.

11
Full IBM Blue Planet System Components
  • New IH wide node - 8 CPUs per node
  • POWER5 GS processor - 2.5 GHz
  • Single-core MCM
  • 2048-node system (8 nodes per frame)
  • 16K processors @ 10 GF per CPU = 160 TF peak (see the arithmetic after this list)
  • Virtual Vector Architecture - VIVA
  • Federation switch - 3-stage topology
  • 8 GB/s per server of unidirectional communication bandwidth
  • 40-50 TF sustained on 2-3 selected applications
  • 256 TB of memory (16 GB per CPU)
  • May be reduced to 128 TB of memory if it can sustain full memory bandwidth
  • 2.5 PB of disk in the I/O system, approximately 48 I/O nodes
  • Approximately 600 frames
  • 256 compute racks, 250 disk racks, 160 switch racks
  • 12,000-15,000 sq. feet; 5-7 MW of power
  • Scientists will focus on application optimization
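
The headline figures above are mutually consistent: 2048 nodes x 8 CPUs per node = 16,384 CPUs; 16,384 CPUs x 10 GF each is about 164 TF, quoted as 160 TF peak; 16,384 CPUs x 16 GB each = 256 TB of memory; and 2048 nodes at 8 nodes per frame account for the 256 compute racks.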

12
Blue Planet - A Conceptual View
  • Increasing memory bandwidth - single core
  • 8 single-core CPUs are matched to the memory address bus limits for full memory bandwidth
  • Increasing switch bandwidth - 8-way nodes
  • Decreased switch latency while increasing span
  • Enabling a vector programming model inside each SMP node
  • Sustained performance on science applications at
    a sustainable cost and development model

13
Issues Under Study
  • Issues Covered
  • Memory Bandwidth
  • Especially for small scattered accesses
  • MPI communication latency (see the ping-pong sketch after this list)
  • Protocol path-length overhead hurts performance
  • Example: UPC is very sensitive to this overhead
  • MPI Collective Communication Performance
  • MPI I/O Performance
  • MPI scaling to a large number of tasks
  • New Issues
  • Microkernel option
  • VIVA follow-up with the Compiler team
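
The MPI latency item above is usually quantified with a ping-pong microbenchmark between two ranks. Below is a minimal sketch in C; the message size and iteration count are illustrative choices, not values from the study.

  /* pingpong.c - minimal MPI ping-pong latency sketch (run with 2 ranks).
   * Message size and iteration count are illustrative choices. */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (size < 2) { MPI_Finalize(); return 1; }

      const int iters = 10000;
      char buf[8] = {0};                       /* tiny message: latency-bound */
      MPI_Status st;

      MPI_Barrier(MPI_COMM_WORLD);
      double t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++) {
          if (rank == 0) {
              MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
          } else if (rank == 1) {
              MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
              MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
          }
      }
      double t1 = MPI_Wtime();

      if (rank == 0)   /* half the round-trip time is the one-way latency */
          printf("one-way latency: %.2f us\n",
                 (t1 - t0) / iters / 2.0 * 1e6);

      MPI_Finalize();
      return 0;
  }

Run with two ranks (e.g. mpirun -np 2 ./pingpong); for small messages the reported one-way time is dominated by the protocol path length and adapter latency the slide refers to.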

14
Progress Already
  • Additional changes to the POWER5 CPU
  • Scaling the switch larger than first planned
  • Software performance changes
  • Close to committing to a small, more memory-intensive node

15
Progress and Status
  • LLNL/ASCI has become very interested in Blue
    Planet
  • 8 meetings with IBM and LLNL to develop the
    ideas and narrow down design choices
  • CPU Design
  • Node Design
  • Switch/Interconnect
  • Software
  • System Level
  • Libraries
  • Special devices
  • Modeling Performance

16
Node Discussion
  • Large SMP vs. small SMP
  • Impacts switch scaling and hence cost
  • Partitioned vs. not partitioned
  • Impacts memory latencies and switch scaling
  • MCM-based vs. DCM-based nodes
  • Impacts cost and performance
  • Memory latency (see the pointer-chase sketch after this list)
  • Performance sensitivity
  • Memory bandwidth
  • STREAM performance sensitivity
  • Cost
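
The memory-latency sensitivity above is typically measured with a pointer-chase loop, in which every load depends on the previous one so the latency cannot be hidden. A minimal sketch in C follows; the working-set size and shuffling scheme are illustrative assumptions.

  /* pchase.c - minimal pointer-chase sketch for memory latency.
   * Working-set size and shuffle scheme are illustrative assumptions. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N ((size_t)1 << 22)          /* 4M entries * 8 B = 32 MB, > cache */

  int main(void) {
      size_t *next = malloc(N * sizeof *next);
      size_t *perm = malloc(N * sizeof *perm);
      if (!next || !perm) return 1;

      /* Build one long cycle visiting indices in shuffled order so the
       * hardware prefetcher cannot predict the next load address. */
      for (size_t i = 0; i < N; i++) perm[i] = i;
      srand(12345);
      for (size_t i = N - 1; i > 0; i--) {     /* Fisher-Yates shuffle */
          size_t j = (size_t)rand() % (i + 1);
          size_t tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
      }
      for (size_t i = 0; i < N; i++)
          next[perm[i]] = perm[(i + 1) % N];

      struct timespec t0, t1;
      size_t p = perm[0];
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (size_t i = 0; i < N; i++)           /* each load depends on the last */
          p = next[p];
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double ns = ((t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec))
                  / N * 1e9;
      printf("avg dependent-load latency: %.1f ns (p=%zu)\n", ns, p);
      free(next); free(perm);
      return 0;
  }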

17
Interconnect Discussion
  • Importance of bisection bandwidth (see the comparison after this list)
  • Collectives vs. point-to-point
  • Application classification
  • Fat-tree (multi-stage network) vs. 3D-mesh designs
  • Is a 3D mesh or hypercube-based approach cost-effective?
  • Sensitivity to communication patterns
  • Need to map the communication patterns to the corresponding bottlenecks
  • What to prioritize
  • Global bandwidth versus local bandwidth
  • InfiniBand switch costs
  • Depend on the scale of the system
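
A back-of-envelope comparison behind the bisection-bandwidth question above, assuming a uniform per-link bandwidth B and the 2048-node scale quoted earlier (the torus dimensions are an assumption): a fat tree with full bisection provides (2048/2) x B = 1024 B of bisection bandwidth, while a 16 x 16 x 8 3D torus cut across one of its long dimensions crosses about 2 x 16 x 8 = 256 links, i.e. 256 B, roughly a quarter as much. Whether that extra global bandwidth justifies its cost depends on how many target applications are dominated by all-to-all communication rather than nearest-neighbor exchanges.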

18
Progress Already
  • System Software
  • Next on the list; consideration is starting
  • Currently IBM is most interested in a limited version of the full-blown system software
  • Studying microkernels and minimal OSs
  • Modeling
  • IBM Research, NCAR, SDSC, PERC, NERSC, LLNL
  • Developing tools and methods

19
Other Progress
  • Other sites are very interested
  • Over 30 sites have asked IBM for a briefing
  • A number have asked to participate in discussions
  • Blue Planet is the basis for several other activities
  • It provided much of the basis for a recent white paper for HECRTF
  • Continued discussions with DOE/SC
  • Having in-depth discussions with SGI, Intel, Cray, and HP
  • SGI has some interesting plans based on DARPA HPCS
  • Considering holding a workshop on Blue Planet

20
Summary
  • Think of Blue Planet as a new process as well as a single instantiation of a computer architecture
  • Waiting for vendors to produce a product, and then evaluating and purchasing it, will only increase the divergence
  • Products are already designed to a different design point
  • Ideas for new, sustainable architectures are sparse
  • Commodity clustering has more than reached its limits of effectiveness
  • A modified approach is needed to improve this situation for capability science
  • Lou Gerstner's book title was "Who Says Elephants Can't Dance?" Blue Planet is a collaborative effort by some users and some parts of IBM, Cray, SGI, and other vendors to do some fancy dancing for scientific computing.