The Impact of Global Petascale Plans on Geoscience Modeling
Transcript and Presenter's Notes

1
The Impact of Global Petascale Plans on
Geoscience Modeling
  • Richard Loft
  • SCD Deputy Director for R&D

2
Outline
  • Good news/bad news about the petascale
    architectures.
  • NSF Petascale Track-1 Solicitation
  • A sample problem description from NSF Track-1.
  • NCAR's petascale science response.
  • CCSM as a petascale application.

3
Good news: the race to petascale is on and is
international
  • Earth Simulator-2 (MEXT) in Japan is committed to
    regaining the ES leadership position by 2011.
  • DOE is deploying a 1 PFLOPS peak system by 2008.
  • Europe has the ENES EVElab program.
  • NSF Track-1 solicitation targets a 2010 system.
  • Lots of opportunities for ES modeling on
    petascale systems - worldwide!
  • But how does this impact ES research/application
    development plans?

4
Bad news: computer architecture is facing
significant challenges
  • Memory wall: memory speeds are not keeping up with
    CPU speeds.
  • Memory is optimized for density, not speed.
  • Which causes CPU latency to memory, measured in
    CPU clocks per load, to increase.
  • Which causes more and more on-chip real estate to
    be used for cache.
  • Which causes cache lines to get longer.
  • Which causes microprocessors to become less
    forgiving of irregular memory access patterns
    (see the sketch after this list).
  • Microprocessor performance improvement has slowed
    since 2002, and is already 3x off the performance
    level projected for 2006 from the pre-2002
    historical rate of improvement.
  • The key driver is power consumption.
  • Future feature-size shrinks will be devoted to
    more CPUs per chip.
  • Rumors at ISCA-33 of Japanese chips with 1024 CPUs
    per chip.
  • Design verification of multi-billion-gate chips,
    fab-ability, reliability (MTBF), and fault
    tolerance are becoming serious issues.
  • Can we program these things?
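A hypothetical micro-benchmark (in Python with NumPy, neither of which appears in the talk) illustrating the penalty behind the irregular-memory-access point: the same reduction run once with unit-stride access and once through a random gather.

import time
import numpy as np

# Illustrative only: sum 16M doubles once with unit-stride access and once
# through a random permutation (a gather) to show how cache-unfriendly
# access patterns slow down a commodity microprocessor.
n = 16 * 1024 * 1024
data = np.random.rand(n)
perm = np.random.permutation(n)

t0 = time.perf_counter()
s_contig = data.sum()                      # unit-stride, cache-line friendly
t1 = time.perf_counter()
s_gather = data[perm].sum()                # random gather defeats the cache
t2 = time.perf_counter()

print(f"contiguous sum: {t1 - t0:.4f} s")
print(f"gathered sum:   {t2 - t1:.4f} s")  # typically several times slower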

5
Best Guess about Architectures in 2010
  • 5 GHz is looking like an aggressive clock speed
    for 2010.
  • For 8 CPUs per socket (chip), that is about 80
    GFLOPS peak/socket (see the sketch after this
    list).
  • 2 PFLOPS is then 25,000 sockets with 200,000 CPUs.
  • The key unknown is which cluster-on-a-chip
    architecture will be most effective (there are
    many ways to organize a CMP).
  • Vector systems will be around, but at what price?
  • Wildcards:
  • Impact of DARPA HPCS program architectures.
  • Exotics in the wings: MTAs, FPGAs, PIMs,
    GPUs, etc.
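A minimal sketch of the peak-rate arithmetic behind the estimates above, assuming 2 floating-point operations per CPU per clock (e.g., a fused multiply-add); that per-cycle rate is an assumption, not something stated on the slide.

# Back out the slide's socket and CPU counts from the clock-rate guess.
clock_hz = 5.0e9           # aggressive 2010 clock speed from the slide
cpus_per_socket = 8
flops_per_cycle = 2        # assumed: one fused multiply-add per CPU per clock

peak_per_socket = clock_hz * cpus_per_socket * flops_per_cycle  # 80 GFLOPS
target_peak = 2.0e15                                            # 2 PFLOPS

sockets = target_peak / peak_per_socket          # 25,000 sockets
cpus = sockets * cpus_per_socket                 # 200,000 CPUs
print(f"{peak_per_socket / 1e9:.0f} GFLOPS peak/socket, "
      f"{sockets:,.0f} sockets, {cpus:,.0f} CPUs")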

6
NSF Track-1 System Background
  • Source of funds: the Presidential Innovation
    Initiative announced in the SOTU.
  • Performance goal: 1 PFLOPS sustained on
    interesting problems.
  • Science goal: breakthroughs.
  • Use model: 12 research teams per year using the
    whole system for days or weeks at a time.
  • Capability system - large everything, fault
    tolerant.
  • Single system in one location.
  • It is not a requirement that the machine be
    upgradable.

7
The NSF Track-1 petascale system solicitation is
out: NSF 06-573
8
Track-1 Project Parameters
  • Funds: $200M over 4 years, starting in FY07.
  • Single award.
  • Money is for the end-to-end system (as in 625).
  • Not intended to fund the facility.
  • Release of funds is tied to meeting hardware and
    software milestones.
  • Deployment stages:
  • Simulator
  • Prototype
  • Petascale system operates FY10-FY15.
  • Operations for FY10-15 are funded separately.
  • Facility costs are not included.

9
Two Stage Award Process Timeline
  • Solicitation out June 6, 2006
  • HPCS down-select July, 2006
  • Preliminary Proposal due September 8, 2006
  • Down selection (invitation to 3-4 to write Full
    Proposal)
  • Full Proposal due February 2, 2007
  • Site visits Spring, 2007
  • Award Sep, 2007

10
NSF's view of the problem
  • NSF recognizes the facility (power, cooling,
    space) challenge of this system.
  • NSF recognizes the need for fault-tolerance
    features.
  • NSF recognizes that applications will need
    significant modification to run on this system.
  • NSF expects Track-1 proposers to discuss needs
    with application experts (many are in this room).
  • NSF application support funds - expect a
    solicitation out in September, 2006.

11
Sample benchmark problem (from Solicitation)
  • A 12,288-cubed simulation of fully developed
    homogeneous turbulence in a periodic domain for
    one eddy turnover time at a value of R_lambda of
    O(2000). The model problem should be solved using
    a dealiased, pseudospectral algorithm, a
    fourth-order explicit Runge-Kutta time-stepping
    scheme, 64-bit floating point (or similar)
    arithmetic, and a time-step of 0.0001 eddy
    turnaround times. Full resolution snapshots of
    the three-dimensional vorticity, velocity and
    pressure fields should be saved to disk every
    0.02 eddy turnaround times. The target wall-clock
    time for completion is 40 hours.
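As a toy-scale illustration of one ingredient of this benchmark, the sketch below applies the common 2/3-rule dealiasing mask to a 3D field using NumPy FFTs at N = 64; the 2/3 rule and the tiny grid are assumptions for illustration, not requirements of the solicitation.

import numpy as np

def dealias_mask(n):
    # 2/3-rule: zero all modes with |k| > n/3 in any direction so the
    # quadratic (advective) nonlinearity does not alias onto kept modes.
    k = np.fft.fftfreq(n, d=1.0 / n)       # integer wavenumbers 0..n/2-1, -n/2..-1
    keep = np.abs(k) <= n // 3
    return keep[:, None, None] & keep[None, :, None] & keep[None, None, :]

n = 64                                     # toy stand-in for the 12,288^3 grid
u = np.random.rand(n, n, n)

u_hat = np.fft.fftn(u)
u_hat *= dealias_mask(n)                   # drop the upper third of the spectrum
u_dealiased = np.real(np.fft.ifftn(u_hat))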

12
Back-of-the-envelope calculations for the turbulence
example
  • N = 12,288
  • One 64-bit variable: 14.8 TB of memory
  • 3 time levels x 5 variables (u,v,w,P,vor): 222 TB
    of memory
  • File output every 0.02 eddy turnover times: 74
    TB/snapshot
  • Total output in the 40-hour run: 3.7 PB
  • I/O BW >> 256 GB/sec
  • 3 x N x N x N x 65 ≈ 361.8 TFLOP/field/step
  • Assume 4 variables (u,v,w,P): 1.447 PFLOP/step
  • Must average 14.4 seconds/step
  • Mean FLOPS rate: 100 TFLOPS (what a coincidence)
  • Real FLOPS rate >> 100 TFLOPS because of I/O and
    comms overhead.
  • Must communicate 3 x 8 x 4 x N x N x N ≈ 178 TB
    per timestep - aggregate network BW >> 12.4 TB/sec
    (see the sketch after this list).
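The slide's arithmetic can be reproduced in a few lines. In the sketch below the factor of 65 is read off the slide (roughly a 5 log2 N FFT operation count per point per direction) and the 3 x 8 x 4 communication factor is likewise taken from the slide rather than derived here.

N = 12288
points = N ** 3                                    # ~1.86e12 grid points

one_field_tb = points * 8 / 1e12                   # 64-bit variable: ~14.8 TB
state_tb = 3 * 5 * one_field_tb                    # 3 time levels x 5 vars: ~222 TB
snapshot_tb = 5 * one_field_tb                     # u,v,w,P,vor: ~74 TB/snapshot
total_output_pb = (1.0 / 0.02) * snapshot_tb / 1e3 # 50 snapshots: ~3.7 PB

flop_field_step = 3 * points * 65                  # ~361.8 TFLOP per field per step
flop_step = 4 * flop_field_step                    # u,v,w,P: ~1.447 PFLOP per step

steps = 1.0 / 1e-4                                 # 10,000 steps per eddy turnover
sec_per_step = 40 * 3600 / steps                   # 14.4 s/step in a 40-hour run
mean_tflops = flop_step / sec_per_step / 1e12      # ~100 TFLOPS sustained

comm_tb_step = 3 * 8 * 4 * points / 1e12           # ~178 TB moved per step
net_bw_tb_s = comm_tb_step / sec_per_step          # ~12.4 TB/s aggregate

print(f"{mean_tflops:.1f} TFLOPS mean, "
      f"{net_bw_tb_s:.1f} TB/s aggregate network BW")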

13
Turbulence problem system resource estimates
  • >> 5 GFLOPS sustained per socket (see the
    per-socket sketch after this list).
  • Computations must scale well on chip.
  • 10 GFLOPS sustained is probably more realistic.
  • Probably doable with optimized RFFT calls (FFTW).
  • > 8 GB memory/socket (1-2 GB/CPU)
  • >> 0.5 GB/sec/socket sustained system bisection
    BW
  • For a 100,000-byte message
  • Realistically (~2 GB/sec/socket)
  • >> 670 disk spindles saturated during I/O
  • Realistically ~2k-4k disks
  • Multi-PB RAID
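A rough sketch of where the per-socket numbers above come from, dividing the previous slide's aggregates by the ~25,000 sockets of the notional 2010 machine. The per-spindle disk rate, and the use of the run-averaged output rate, are assumptions chosen to land near the slide's ~670 spindle figure, not numbers from the talk.

sockets = 25_000                        # notional 2010 machine from the earlier slide

per_socket_gflops = 100e3 / sockets     # 100 TFLOPS mean -> ~4 GFLOPS/socket;
                                        # I/O and comms overhead push this past 5
per_socket_mem_gb = 222e3 / sockets     # 222 TB of state -> ~8.9 GB/socket
per_socket_bw_gb_s = 12.4e3 / sockets   # 12.4 TB/s aggregate -> ~0.5 GB/s/socket

avg_io_gb_s = 3.7e6 / (40 * 3600)       # 3.7 PB over 40 hours -> ~26 GB/s average
per_spindle_mb_s = 38.0                 # assumed per-disk sustained rate (2006-era)
spindles = avg_io_gb_s * 1e3 / per_spindle_mb_s   # ~680 spindles busy on average;
                                        # bursts drive the realistic 2k-4k estimate

print(f"{per_socket_gflops:.1f} GFLOPS, {per_socket_mem_gb:.1f} GB, "
      f"{per_socket_bw_gb_s:.2f} GB/s per socket; ~{spindles:.0f} spindles")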

14
UCAR Peta-Process
  • Define UCAR petascale science goals
  • Develop system requirements
  • Make these available to Track-1 proposal writers
  • Define application development resource
    requirements
  • Fold these into proposals for additional
    resources

15
Peta-process details
  • A few, strategic, specific and realistic science
    goals for exploiting petascale systems. What are
    the killer apps?
  • The CI requirements (system attributes, data
    archive footprint, etc.) of the entire toolchain
    for the science workflow for each.
  • A mechanism to provide part of this information
    to all the consortia competing for the $200M.
  • The project retasking required to ultimately
    write viable proposals for time on petascale
    systems over the next 4 years.
  • Resource requirements for:
  • Staff augmentations
  • Local equipment infrastructure enhancements
  • Build University collaborations to support this
    effort.

16
Relevant Science Areas (from NSF Track-1
Solicitation)
  • The detailed structure of, and the nature of
    intermittency in, stratified and unstratified,
    rotating and non-rotating turbulence in classical
    and magnetic fluids, and in chemically reacting
    mixtures
  • The nonlinear interactions between cloud systems,
    weather systems and the Earth's climate
  • The dynamics of the Earth's coupled carbon,
    nitrogen and hydrologic cycles
  • The decadal dynamics of the hydrology of large
    river basins
  • The onset of coronal mass ejections and their
    interaction with the Earth's magnetic field,
    including modeling magnetic reconnection and
    geo-magnetic sub-storms
  • The coupled dynamics of marine and terrestrial
    ecosystems and oceanic and atmospheric physics
  • The interaction between chemical reactivity and
    fluid dynamics in complex systems such as
    combustion, atmospheric chemistry, and chemical
    processing.

17
Peta-science Ideas (slide 1 of 2)
  • Topic 1. Across-scale modeling: simulation of the
    21st-century climate with a coupled
    atmosphere-ocean model at 0.1-degree resolution
    (eddy resolving in the ocean). For specific time
    periods of the integration, shorter simulations
    at higher spatial resolution: 1 km with a
    nonhydrostatic global atmospheric model and 100 m
    resolution in a nested regional model. Emphasis
    will be put on the explicit representation of
    moist turbulence, convection and the hydrological
    cycle.
  • Topic 2. Interactions between atmospheric layers
    and the response of the atmosphere to solar
    variability. Simulations of the atmospheric
    response to 10-15 solar cycles derived with a
    high-resolution version of WACCM (with explicit
    simulation of the QBO) coupled to an ocean model.

18
Peta-science Ideas (slide 2 of 2)
  • Topic 3. Simulation of chemical weather: a
    high-resolution chemical/dynamical model with a
    rather detailed chemical scheme. Study of global
    air pollution and the impact of mega-cities,
    wildfires, and other pollution sources.
  • Topic 4. Solar MHD: a high-resolution model of
    turbulence in the solar photosphere.

19
Will CCSM qualify on the Track-1 system? Maybe!
  • This system is designed to be 10x bigger than the
    Track-2 systems being funded by OCI money at $30M
    apiece over the next 5 years.
  • Therefore, a case can be made for a qualifying
    application that could run on at least 10-25% of
    the system, even as an ensemble, in order to
    justify needing the resource.
  • This means a good target for one instance of CCSM
    is 10,000 to 50,000 processors.
  • John Dennis (with Bryan and Jones) has done some
    work on POP 2.1 at 0.1 degree that looks
    encouraging at 30,000 CPUs.

20
Some petascale application dos and don'ts
  • Don't assume the node is a rack-mounted server.
  • No local disk drive.
  • No full kernel - beware of relying on system
    calls.
  • No serialization of memory or I/O:
  • Global arrays read in whole and then distributed
  • Serialized I/O (i.e., through node 0) of any kind
    will be a show stopper (see the parallel I/O
    sketch after this list)
  • Multi-executable applications may be problematic.
  • Eschew giant look-up tables.
  • No unaddressed load imbalances:
  • Dynamic load balancing schemes
  • Communication overlapping
  • Algorithms with irregular memory accesses will
    perform poorly.
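A hedged sketch (using mpi4py, which the talk does not mention) of the "no serialized I/O" point: each rank reads only its own slab of a global field with a collective MPI-IO call, instead of rank 0 reading the whole array and scattering it. The file name and grid dimensions are hypothetical.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx, ny, nz = 512, 512, 512              # hypothetical global grid of 8-byte reals
assert nx % size == 0                   # assume a clean slab decomposition
local_nx = nx // size

slab = np.empty((local_nx, ny, nz), dtype=np.float64)
offset = rank * slab.nbytes             # byte offset of this rank's slab in the file

fh = MPI.File.Open(comm, "field.bin", MPI.MODE_RDONLY)
fh.Read_at_all(offset, slab)            # collective read; no node-0 bottleneck
fh.Close()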

21
Some design questions for discussion
  • How can CCSM use different CMP and/or MTA
    architectures effectively? Consider:
  • Power5 (2 CPUs per chip, 2 threads per CPU)
  • Niagara (8 CPUs per chip, 4 threads per CPU)
  • Cell (asymmetrical - 1 scalar + 8 synergistic
    processors per chip)
  • New coupling strategies?
  • Time to scrap multiple executables? (One
    single-executable alternative is sketched after
    this list.)
  • New ensemble strategies?
  • Stacking instances as multiple threads on a CPU?
  • Will the software we rely on scale?
  • MPI
  • ESMF
  • pNetCDF
  • NCL
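On the multi-executable question, one commonly discussed single-executable alternative is to split MPI_COMM_WORLD into per-component (or per-ensemble-member) communicators. The mpi4py sketch below uses a hypothetical half-and-half atmosphere/ocean split, not CCSM's actual layout.

from mpi4py import MPI

world = MPI.COMM_WORLD
rank, size = world.Get_rank(), world.Get_size()

# Hypothetical layout: first half of the ranks run the atmosphere, the rest
# run the ocean; each component works on its own communicator, while coupling
# traffic can still use the world communicator.
color = 0 if rank < size // 2 else 1
comp_comm = world.Split(color=color, key=rank)

component = "atm" if color == 0 else "ocn"
print(f"world rank {rank} -> {component} rank "
      f"{comp_comm.Get_rank()} of {comp_comm.Get_size()}")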