1
CCSM4 - A Flexible New Infrastructure for Earth
System Modeling
  • Mariana Vertenstein
  • NCAR
  • CCSM Software Engineering Group

2
Major Infrastructure Changes since CCSM3
  • CCSM4/CPL7 development could not have occurred
    without the following collaborators
  • DOE/SciDAC
  • Oak Ridge National Laboratory (ORNL)
  • Argonne National Laboratory (ANL)
  • Los Alamos National Laboratory (LANL)
  • Lawrence Livermore National Laboratory (LLNL)
  • NCAR/CISL
  • ESMF

3
Outline
  • What are the software requirements of a community
    earth system model?
  • Overview of current CCSM4
  • How does CCSM4 address requirements?
  • Flexibility permits greater efficiency,
    throughput, ease of porting and model
    development
  • How is CCSM4 being used in new ways?
  • Interactive ensembles - extending traditional
    definition of component
  • Extending CCSM to ultra high resolutions
  • What is CCSM4 Scalability and Performance?
  • Upcoming releases and new CCSM4 scripts

4
CESM General Software Requirements
5
Specific High Resolution Requirements
  • Scalable and flexible coupling infrastructure
  • Parallel I/O throughout model system (for both
    scalable memory and performance)
  • Scalable memory (minimum global arrays) for each
    component
  • Capability to use both MPI and OpenMP effectively
    to address requirements of new multi-core
    architectures
6
CCSM4 Overview
  • Consists of a set of 4 (5 for CESM) geophysical
    component models on potentially different grids
    that exchange boundary data with each other only
    via communication with a coupler (hub and spoke
    architecture)
  • New science is resulting in sharply increasing
    number of fields being communicated between
    components
  • Large code base: >1M lines
  • Fortran 90 (mostly)
  • Developed over 20 years
  • 200-300K lines are critically important --> no
    comp kernels, need good compilers
  • Collaborations are critical
  • DOE/SciDAC, University Community, NSF (PetaApps),
    ESMF

7
What are the CCSM Components?
  • Atmosphere Component: CAM, DATM, (WRF)
  • CAM Modes: Multiple Dycores, Multiple Chemistry
    Options, WACCM, single column
  • Data-ATM: Multiple Forcing/Physics Modes
  • Land Component: CLM, DLND, (VIC)
  • CLM Modes: no BGC, BGC, Dynamic-Vegetation,
    BGC-DV, Prescribed-Veg, Urban
  • Data-LND: Multiple Forcing/Physics Modes
  • Ice Component: CICE, DICE
  • CICE Modes: Fully Prognostic, Prescribed
  • Data-ICE: Multiple Forcing/Physics Modes
  • Ocean Component: POP, DOCN (SOM/DOM), (ROMS)
  • POP Modes: Ecosystem, Fully-coupled, Ocean-only,
    Multiple Physics Options
  • Data-OCN (SOM/DOM): Multiple Forcing/Physics Modes
  • New Land Ice Component
  • Coupler: Regridding, Merging, Calculation of
    ATM/OCN fluxes, Conservation diagnostic
8
CCSM Component Grids
  • Ocean and Sea-Ice must run on same grid
  • displaced pole, tripole
  • Atmosphere and Land can now run on different
    grids
  • these in general are different from the ocean/ice
    grid
  • lat/lon, but also new cubed sphere for CAM
  • Global grids span low resolution (3 degree) to
    ultra-high
  • 0.25° ATM/LND: 1152 x 768
  • 0.50° ATM/LND: 576 x 384
  • 0.1° OCN/ICE: 3600 x 2400
  • Regridding
  • Done in parallel at runtime using mapping files
    that are generated offline using SCRIP
  • In past, grids have been global and logically
    rectangular but now can have single point,
    regional, cubed sphere
  • Regridding issues are rapidly becoming a higher
    priority

9
CCSM Component Parallelism
  • MPI/OpenMP
  • CAM, CLM, CICE, POP have MPI/OpenMP hybrid
    capability
  • Coupler only has MPI capability
  • Data models only have MPI capability
  • Parallel I/O (use of PIO library)
  • CAM, CICE, POP, CPL, Data models all have PIO
    capability

10
New CCSM4 Architecture
11
Advantages of CPL7 Design
  • New flexible coupling strategy
  • Design targets a wide range of architectures -
    massively parallel peta-scale hardware, smaller
    linux clusters, and even single laptop computers
  • Provides efficient support of varying levels of
    parallelism via simple run-time configuration for
    processor layout
  • New CCSM4 scripts provide one simple xml file to
    specify the processor layout of the entire system,
    plus automated timing information to simplify load
    balancing (see the sketch after this list)
  • Scientific unification
  • ALL model development done with one code base -
    elimination of separate stand-alone component
    code bases (CAM, CLM)
  • Code Reuse and Maintainability
  • Lowers cost of support/maintenance
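As a rough illustration of the run-time processor
layout (a minimal sketch, assuming the xmlchange
helper and the NTASKS_*/NTHRDS_*/ROOTPE_* entries of
env_mach_pes.xml in the release scripts; the task
counts shown are illustrative):

    # From the case directory: lay out components across MPI tasks and
    # OpenMP threads by editing one xml file (env_mach_pes.xml), either
    # directly or via the xmlchange helper script.
    ./xmlchange -file env_mach_pes.xml -id NTASKS_ATM -val 1664
    ./xmlchange -file env_mach_pes.xml -id NTHRDS_ATM -val 2
    ./xmlchange -file env_mach_pes.xml -id ROOTPE_ATM -val 0
    ./xmlchange -file env_mach_pes.xml -id NTASKS_OCN -val 4028
    ./xmlchange -file env_mach_pes.xml -id ROOTPE_OCN -val 1664
    # Re-run the case configure step afterwards so the build and run
    # scripts pick up the new layout (exact command depends on release).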

12
More CPL7 advantages
  • Simplicity
  • Easier to debug - much easier to understand time
    flow
  • Easier to port - ported to:
  • IBM p6 (NCAR)
  • Cray XT4/XT5 (NICS,ORNL,NERSC)
  • BGP (Argonne), BGL (LLNL)
  • Linux Clusters (NCAR, NERSC, CCSM4-alpha users)
  • Easier to run - new xml-based scripts permit
    user-friendly capability to create out-of-box
    experiments
  • Performance (throughput and efficiency)
  • Much greater flexibility to achieve optimal load
    balance for different choices of
  • Resolution, Component combinations, Component
    physics
  • Automatically generated timing tables provide
    users with immediate feedback on both performance
    and efficiency

13
CCSM4 Provides a Seamless End-to-End Cycle of
Model Development, Integration and Prediction
with One Unified Model Code Base
14
New frontiers for CCSM
  • Using the coupling infrastructure in novel ways
  • Implementation of interactive ensembles
  • Pushing the limits of high resolution
  • Capability to really exercise the scalability and
    performance of the system

15
CCSM4 and PetaApps
  • CCSM4/CPL7 is an integral piece of an NSF
    PetaApps award
  • Funded 3 year effort aimed at advancing climate
    science capability for petascale systems
  • NCAR, COLA, NERSC, U. Miami
  • Interactive ensembles using CCSM4/CPL7 involves
    both computational and scientific challenges
  • used to understand how oceanic, sea-ice and
    atmospheric noise impacts climate variability
  • can also scale out to tens of thousands of
    processors
  • Also examine use of PGAS language in CCSM

16

Interactive Ensembles and CPL7
  • All Ensemble members run concurrently on
    non-overlapping processor sets
  • Communication with coupler takes place serially
    over ensemble members
  • Setting a new number of ensemble members requires
    editing one line of an xml file (see the
    hypothetical sketch below)
  • 35M CPU hours on TeraGrid (2nd largest allocation)

[Diagram: processor layout under the driver - ensemble
members run concurrently alongside POP, CPL, CLM, and
CICE; axes: processors x time]
Currently being used to perform ocean data
assimilation (using DART) for POP2
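A hypothetical sketch of that one-line change
(NINST_ATM is an illustrative variable name, not one
given in this presentation; the actual entry used by
the interactive-ensemble configuration may differ):

    # Hypothetical: raise the number of concurrently running ensemble
    # members by editing a single xml entry, then revisit the processor
    # layout in env_mach_pes.xml.
    ./xmlchange -file env_mach_pes.xml -id NINST_ATM -val 8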
17
CCSM4 and Ultra High Resolution
  • DOE/LLNL Grand Challenge Simulation
  • 0.25° atmosphere/land and 0.1° ocean/ice
  • Multi-institutional collaboration (ANL, LANL,
    LLNL, NCAR, ORNL)
  • First ever U.S. multi-decadal global climate
    simulation with eddy resolving ocean and high
    resolution atmosphere
  • 0.42 sypd on 4048 cpus (Atlas LLNL cluster)
  • 20 years completed
  • 100 GB/simulated month

18
Ultra High Resolution (cont)
  • NSF/PetaApps Control Simulation (IE baseline) -
    carried out by John Dennis (CISL)
  • 0.5° atmosphere/land and 0.1° ocean/ice
  • Control run in production @ NICS (TeraGrid)
  • 1.9 sypd on 5848 quad-core XT5 cpus (4-5 months
    continuous simulation)
  • 155 years completed
  • 100TB of data generated (generating 0.5-1 TB per
    wall clock day)
  • 18M CPU hours used
  • Transfer output from NICS to NCAR (100-180
    MB/sec sustained); archive on HPSS
  • Data analysis using 55 TB project space at NCAR

19
Next steps at high resolution
  • Future work
  • Use OpenMP capability in all components
    effectively to take advantage of multi-core
    architectures
  • Cray XT5 hex-core and BG/P
  • Improve disk I/O performance - currently using
    10-25% of time
  • Improve memory footprint scalability
  • Future simulations
  • 0.25° atm / 0.1° ocean
  • T341 atm / 0.1° ocean (effect of Eulerian dycore)
  • 1/8° atm (HOMME) / 0.25° land / 0.1° ocean

20
CCSM4 Scalability and Performance
21
New Parallel I/O library (PIO)
  • Interface between the model and the I/O library.
    Supports
  • Binary
  • NetCDF3 (serial netcdf)
  • Parallel NetCDF (pnetcdf) (MPI/IO)
  • NetCDF4
  • User has enormous flexibility to choose what
    works best for their needs
  • Can read one format and write another
  • Rearranges data from the model decomp to an I/O
    friendly decomp (rearranger is framework
    independent); model tasks and I/O tasks can be
    independent

22
PIO in CCSM
  • PIO implemented in CAM, CICE and POP
  • Usage is critical for high resolution, high
    processor count simulations
  • Serial I/O is one of the largest sources of
    global memory in CCSM - will eventually always
    run out of memory
  • Serial I/O results in serious performance penalty
    at higher processor counts
  • Performance benefit noticed even with serial
    netcdf (model output decomposed on output I/O
    tasks)

23
CPL scalability
  • Scales much better than the previous version in
    both memory and throughput
  • Inherently involves a lot of communication versus
    flops
  • New coupler has not been a bottleneck in any
    configuration we have tested so far; other
    issues such as load balance and scaling of other
    processes have dominated
  • Minor impact at 1800 cores (Kraken PetaApps
    control)

24
CCSM4 Cray XT Scalability
[Diagram: processor layout - CAM: 1664, POP: 4028,
CICE: 1800, CPL: 1800 cores; axes: processors x time]
1.9 sypd on 5844 cores with I/O on Kraken
quad-core XT5
(Courtesy of John Dennis)
25
CAM/HOMME Dycore
Cubed-sphere grid overcomes dynamical core
scalability problems inherent with the lat/lon
grid. Work of Mark Taylor (SciDAC), Jim Edwards
(IBM), Brian Eaton (CSEG)
  • PIO library used for all I/O (work COULD NOT have
    been done without PIO)
  • BGP (4 cores/node): Excellent scalability down
    to 1 element per processor (86,200 processors at
    0.25 degree resolution)
  • JaguarPF (12 cores/node): 2-3x faster per core
    than BGP, but scaling not as good - the 1/8 degree
    run loses scalability at 4 elements per processor

26
CAM/HOMME Real Planet 1/8° Simulations
  • CCSM4 - CAM4 physics configuration with cyclical
    year 2000 ocean forcing data sets
  • CAM-HOMME 1/8°, 86,400 cores
  • CLM2 on lat/lon 1/4°, 512 cores
  • Data ocean/ice, 1°, 512 cores
  • Coupler, 8640 cores
  • JaguarPF simulation
  • Excellent scalability - 1/8 degree running at 3
    SYPD on Jaguar
  • Large scale features agree well with Eulerian and
    FV dycores
  • Runs confirm that the scalability of the
    dynamical core is preserved by CAM and the
    scalability of CAM is preserved by the CCSM real
    planet configuration.

27
How will CCSM4 be released?
  • Leverage Subversion revision control system
  • Source code and Input Data obtained from
    Subversion servers (not tar files)
  • Output data of control runs from ESG
  • Advantages
  • Easier for CSEG to produce frequent updates
  • Flexible way to have users obtain new updates of
    source code (and bug fixes)
  • Users can leverage Subversion to merge new
    updates into their sandbox with their
    modifications

28

Obtaining the Code and Updates
Subversion Source Code Repository
(Public) https://svn-ccsm-release.cgd.ucar.edu
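A minimal sketch of obtaining the code with
Subversion (only the server URL above is from this
presentation; the tag path shown is an assumed
example):

    # List the available release tags, then check out one of them.
    svn list https://svn-ccsm-release.cgd.ucar.edu
    svn checkout https://svn-ccsm-release.cgd.ucar.edu/model_versions/ccsm4_0 ccsm4_0
    # Later updates and bug fixes can be brought into the sandbox with
    # 'svn switch' or 'svn merge' against a newer release tag, preserving
    # local modifications.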
29
Creating an Experimental Case
  • New CCSM4 Scripts Simplify
  • Porting CCSM4 to your machine
  • Creating your experiment and obtaining necessary
    input data for your experiment
  • Load Balancing your experiment
  • Debugging your experiment - if something goes
    wrong during the simulation (never happens, of
    course), it is simpler to determine what it is

30
Porting to your machine
  • CCSM4 scripts contain a set of supported machines
    that a user can run on out of the box
  • CCSM4 scripts also support a set of generic
    machines (e.g. linux clusters with a variety of
    compilers) - see the sketch after this list
  • user still needs to determine which generic
    machine most closely resembles their machine and
    to customize the Makefile macros for it
  • user feedback will be leveraged to continuously
    upgrade the generic machine capability
    post-release
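A sketch of starting a port from a generic machine
definition (the machine name, case name, compset,
resolution, and the extra options shown are
assumptions about the release scripts, not details
from this presentation):

    # Create a small test case against a generic Linux/Intel machine
    # definition, then adapt the generated Makefile macros for the local
    # compilers, MPI library, and netCDF installation.
    ./create_newcase -case ~/cases/port_test \
        -res 1.9x2.5_gx1v6 -compset X \
        -mach generic_linux_intel \
        -scratchroot /scratch/$USER \
        -din_loc_root_csmdata /data/inputdata \
        -max_tasks_per_node 8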

31
Obtaining Input Data
  • Input data is now in Subversion repository
  • Entire input data is about 900 GB and growing
  • CCSM4 scripts permit the user to automatically
    obtain only the input data needed for a given
    experimental configuration

32

Accessing input data for your experiment
Set up experiment with create_newcase (component set,
resolution, machine), as sketched below
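A minimal sketch of this step (the case name,
compset, resolution, and machine values are
illustrative; the check_input_data helper and its
-export option are assumptions about the release
scripts):

    # Create a fully coupled case at roughly 1-degree resolution on a
    # supported machine, then configure it.
    ./create_newcase -case ~/cases/b40.test \
        -compset B -res 0.9x1.25_gx1v6 -mach bluefire
    cd ~/cases/b40.test
    ./configure -case
    # Fetch from the input-data Subversion server only the datasets this
    # configuration actually needs (rather than the full ~900 GB).
    ./check_input_data -export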
33
Load Balancing Your Experiment
  • Load balancing exercise must be done before
    starting an experiment
  • Repeat short experiments (20 days) without I/O
    and adjust the processor layout (see the sketch
    after this list) to:
  • optimize throughput
  • minimize idle time (maximize efficiency)
  • Detailed timing results are produced with each
    run
  • Makes load balancing exercise much simpler than
    in CCSM3
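A rough sketch of setting up such a short timing run
(STOP_OPTION and STOP_N are assumed env_run.xml
entries; component history output would be reduced
via the usual namelist settings):

    # Run 20 model days, then inspect the timing summary produced with
    # the run and adjust env_mach_pes.xml (NTASKS_*/NTHRDS_*/ROOTPE_*)
    # until throughput is acceptable and idle time is minimized.
    ./xmlchange -file env_run.xml -id STOP_OPTION -val ndays
    ./xmlchange -file env_run.xml -id STOP_N -val 20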

34
Load Balancing CCSM Example
35
CCSM4 Releases and Timelines
  • January 15, 2010
  • CCSM4.0 alpha release - to a subset of users and
    vendors with minimal documentation (except for the
    Scripts User's Guide)
  • April 1, 2010
  • CCSM4.0 release - Full documentation, including
    User's Guide, Model Reference Documents, and
    experimental data
  • June 1, 2010 CESM1.0 release
  • ocean ecosystem, CAM-AP, interactive chemistry,
    WACCM
  • New CCSM output data web design underway
    (including comprehensive diagnostics)

36
CCSM4.0 alpha release: Extensive CCSM4 User's
Guide already in place; apply for alpha user
access at www.ccsm.ucar.edu/models/ccsm4.0
37
Upcoming Challenges
  • This year
  • Carry out IPCC simulations
  • Release CCSM4 and CESM1 and updates
  • Resolve performance and memory issues with
    ultra-high resolution configuration on Cray XT5
    and BG/P
  • Create user-friendly validation process for
    porting to new machines
  • On the horizon
  • Support regional grids
  • Nested regional modeling in CPL7
  • Migration to optimization for GPUs

38
Big Interdisciplinary Team!
  • Contributors
  • S. Mishra (NCAR)
  • S. Peacock (NCAR)
  • K. Lindsay (NCAR)
  • W. Lipscomb (LANL)
  • R. Loft (NCAR)
  • R. Loy (ANL)
  • J. Michalakes (NCAR)
  • A. Mirin (LLNL)
  • M. Maltrud (LANL)
  • J. McClean (LLNL)
  • R. Nair (NCAR)
  • M. Norman (NCSU)
  • N. Norton (NCAR)
  • T. Qian (NCAR)
  • M. Rothstein (NCAR)
  • C. Stan (COLA)
  • M. Taylor (SNL)
  • H. Tufo (NCAR)
  • M. Vertenstein (NCAR)
  • J. Wolfe (NCAR)
  • P. Worley (ORNL)
  • M. Zhang (SUNYSB)
  • D. Bader (ORNL)
  • D. Bailey (NCAR)
  • C. Bitz (U Washington)
  • F. Bryan (NCAR)
  • T. Craig (NCAR)
  • A. St. Cyr (NCAR)
  • J. Dennis (NCAR)
  • B. Eaton (NCAR)
  • J. Edwards (IBM)
  • B. Fox-Kemper (MIT,CU)
  • N. Hearn (NCAR)
  • E. Hunke (LANL)
  • B. Kauffman (NCAR)
  • E. Kluzek (NCAR)
  • B. Kadlec (CU)
  • D. Ivanova (LLNL)
  • E. Jedlicka (ANL)
  • E. Jessup (CU)
  • Funding
  • DOE-BER CCPP Program Grant
  • DE-FC03-97ER62402
  • DE-PS02-07ER07-06
  • DE-FC02-07ER64340
  • BR KP1206000
  • DOE-ASCR
  • BR KJ0101030
  • NSF Cooperative Grant NSF01
  • NSF PetaApps Award
  • Computer Time
  • Blue Gene/L time
  • NSF MRI Grant
  • NCAR
  • University of Colorado
  • IBM (SUR) program
  • BGW Consortium Days
  • IBM research (Watson)
  • LLNL

39
  • Thanks! Questions?
  • CCSM4.0 alpha release page at
  • www.ccsm.ucar.edu/models/ccsm4.0