Title: CCSM4 - A Flexible New Infrastructure for Earth System Modeling
1. CCSM4 - A Flexible New Infrastructure for Earth System Modeling
- Mariana Vertenstein
- NCAR
- CCSM Software Engineering Group
2. Major Infrastructure Changes since CCSM3
- CCSM4/CPL7 development could not have occurred without the following collaborators:
  - DOE/SciDAC
  - Oak Ridge National Laboratory (ORNL)
  - Argonne National Laboratory (ANL)
  - Los Alamos National Laboratory (LANL)
  - Lawrence Livermore National Laboratory (LLNL)
  - NCAR/CISL
  - ESMF
3. Outline
- What are the software requirements of a community earth system model?
- Overview of the current CCSM4
- How does CCSM4 address these requirements?
  - Flexibility permits greater efficiency, throughput, and ease of porting and model development
- How is CCSM4 being used in new ways?
  - Interactive ensembles - extending the traditional definition of a component
  - Extending CCSM to ultra-high resolutions
- What are CCSM4 scalability and performance?
- Upcoming releases and new CCSM4 scripts
4. CESM General Software Requirements
5. Specific High Resolution Requirements
- Scalable and flexible coupling infrastructure
- Parallel I/O throughout the model system (for both scalable memory and performance)
- Scalable memory (minimal global arrays) for each component
- Capability to use both MPI and OpenMP effectively to address the requirements of new multi-core architectures
6. CCSM4 Overview
- Consists of a set of 4 (5 for CESM) geophysical component models on potentially different grids that exchange boundary data with each other only via communication with a coupler (hub-and-spoke architecture)
- New science is resulting in a sharply increasing number of fields being communicated between components
- Large code base: >1M lines
  - Fortran 90 (mostly)
  - Developed over 20 years
  - 200-300K lines are critically important -> no compute kernels, need good compilers
- Collaborations are critical
  - DOE/SciDAC, University Community, NSF (PetaApps), ESMF
7. What are the CCSM Components?
- Atmosphere Component: CAM, DATM, (WRF)
  - CAM modes: multiple dycores, multiple chemistry options, WACCM, single column
  - Data-ATM: multiple forcing/physics modes
- Land Component: CLM, DLND, (VIC)
  - CLM modes: no BGC, BGC, Dynamic-Vegetation, BGC-DV, Prescribed-Veg, Urban
  - Data-LND: multiple forcing/physics modes
- Ice Component: CICE, DICE
  - CICE modes: fully prognostic, prescribed
  - Data-ICE: multiple forcing/physics modes
- Ocean Component: POP, DOCN (SOM/DOM), (ROMS)
  - POP modes: ecosystem, fully-coupled, ocean-only, multiple physics options
  - Data-OCN: multiple forcing/physics modes (SOM/DOM)
- New Land Ice Component
- Coupler: regridding, merging, calculation of ATM/OCN fluxes, conservation diagnostics
8. CCSM Component Grids
- Ocean and sea ice must run on the same grid
  - displaced pole, tripole
- Atmosphere and land can now run on different grids
  - these are in general different from the ocean/ice grid
  - lat/lon, but also the new cubed sphere for CAM
- Global grids span low resolution (3 degree) to ultra-high:
  - 0.25° ATM/LND: 1152 x 768
  - 0.50° ATM/LND: 576 x 384
  - 0.1° OCN/ICE: 3600 x 2400
- Regridding
  - Done in parallel at runtime using mapping files that are generated offline using SCRIP
  - In the past, grids have been global and logically rectangular, but they can now be single point, regional, or cubed sphere
  - Regridding issues are rapidly becoming a higher priority
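The runtime regridding step reduces to applying precomputed sparse weights: each destination cell is a weighted sum of source cells. A minimal Python sketch of the idea (the triplet layout and function name are illustrative assumptions; the real weights come from SCRIP-generated mapping files and are applied in parallel in Fortran):

```python
# Illustrative sketch of runtime regridding with offline-generated weights.
# Weights are stored as (row, col, weight) triplets, one per nonzero entry:
#   dest[row] += weight * src[col]

def apply_map(n_dest, rows, cols, weights, src):
    """Apply sparse regridding weights to a source field."""
    dest = [0.0] * n_dest
    for r, c, w in zip(rows, cols, weights):
        dest[r] += w * src[c]
    return dest

# Toy example: 2 destination cells, each averaging two source cells.
src = [1.0, 3.0, 5.0, 7.0]
rows = [0, 0, 1, 1]
cols = [0, 1, 2, 3]
weights = [0.5, 0.5, 0.5, 0.5]
print(apply_map(2, rows, cols, weights, src))  # [2.0, 6.0]
```

Conservative maps additionally require the weights for each destination cell to sum appropriately, which is checked by the coupler's conservation diagnostics.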
9. CCSM Component Parallelism
- MPI/OpenMP
  - CAM, CLM, CICE, and POP have hybrid MPI/OpenMP capability
  - The coupler has only MPI capability
  - The data models have only MPI capability
- Parallel I/O (use of the PIO library)
  - CAM, CICE, POP, CPL, and the data models all have PIO capability
10. New CCSM4 Architecture
11. Advantages of the CPL7 Design
- New flexible coupling strategy
  - Design targets a wide range of architectures - massively parallel peta-scale hardware, smaller linux clusters, and even single laptop computers
  - Provides efficient support of varying levels of parallelism via simple run-time configuration of the processor layout
  - New CCSM4 scripts provide one simple xml file to specify the processor layout of the entire system, plus automated timing information to simplify load balancing
- Scientific unification
  - ALL model development done with one code base - elimination of separate stand-alone component code bases (CAM, CLM)
- Code reuse and maintainability
  - Lowers the cost of support/maintenance
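The "one simple xml file" for the processor layout can be pictured as below. This is an illustrative sketch in the env_mach_pes.xml style; the entry names follow the NTASKS_*/NTHRDS_*/ROOTPE_* pattern, but the exact contents and values here are assumptions, not a file from the release:

```xml
<!-- Hypothetical processor-layout fragment (env_mach_pes.xml style). -->
<config_definition>
  <entry id="NTASKS_ATM" value="1664"/>  <!-- MPI tasks for the atmosphere -->
  <entry id="NTHRDS_ATM" value="1"/>     <!-- OpenMP threads per task -->
  <entry id="ROOTPE_ATM" value="0"/>     <!-- first processor of the set -->
  <entry id="NTASKS_OCN" value="4028"/>  <!-- ocean on its own processor range -->
  <entry id="ROOTPE_OCN" value="1816"/>  <!-- disjoint root, so it runs concurrently -->
</config_definition>
```

Editing a handful of such entries is all that is needed to move between a sequential layout on a laptop and a fully concurrent layout on thousands of cores.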
12. More CPL7 Advantages
- Simplicity
  - Easier to debug - much easier to understand the time flow
- Easier to port - ported to:
  - IBM p6 (NCAR)
  - Cray XT4/XT5 (NICS, ORNL, NERSC)
  - BGP (Argonne), BGL (LLNL)
  - Linux clusters (NCAR, NERSC, CCSM4-alpha users)
- Easier to run - new xml-based scripts permit a user-friendly capability to create out-of-the-box experiments
- Performance (throughput and efficiency)
  - Much greater flexibility to achieve optimal load balance for different choices of resolution, component combinations, and component physics
  - Automatically generated timing tables provide users with immediate feedback on both performance and efficiency
13. CCSM4 Provides a Seamless End-to-End Cycle of Model Development, Integration, and Prediction with One Unified Model Code Base
14. New Frontiers for CCSM
- Using the coupling infrastructure in novel ways
- Implementation of interactive ensembles
- Pushing the limits of high resolution
- Capability to really exercise the scalability and
performance of the system
15. CCSM4 and PetaApps
- CCSM4/CPL7 is an integral piece of an NSF PetaApps award
  - Funded 3-year effort aimed at advancing climate science capability for petascale systems
  - NCAR, COLA, NERSC, U. Miami
- Interactive ensembles using CCSM4/CPL7 involve both computational and scientific challenges
  - used to understand how oceanic, sea-ice, and atmospheric noise impacts climate variability
  - can also scale out to tens of thousands of processors
- Also examine the use of a PGAS language in CCSM
16. Interactive Ensembles and CPL7
- All ensemble members run concurrently on non-overlapping processor sets
- Communication with the coupler takes place serially over ensemble members
- Setting a new number of ensemble members requires editing 1 line of an xml file
- 35M CPU hours on TeraGrid - 2nd largest allocation
- Currently being used to perform ocean data assimilation (using DART) for POP2
[Diagram: driver schedules ensemble members of POP, CLM, and CICE on disjoint processor sets, with CPL spanning processors, plotted as processors vs. time]
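The non-overlapping processor sets can be sketched as a simple partitioning of ranks. This is an illustrative Python sketch of the layout arithmetic only (an assumption about the bookkeeping; the actual driver does this in Fortran with MPI communicators):

```python
# Sketch: split a pool of processors into disjoint, contiguous sets,
# one per ensemble member, so all members can run concurrently.

def ensemble_layout(total_pes, n_members):
    """Return {member: range_of_ranks} with non-overlapping contiguous sets."""
    per_member = total_pes // n_members
    layout = {}
    for m in range(n_members):
        root = m * per_member
        layout[m] = range(root, root + per_member)
    return layout

# 8 ensemble members across 1024 processors -> 128 processors each.
layout = ensemble_layout(1024, 8)
print(layout[0][0], layout[0][-1])  # 0 127
print(layout[7][0])                 # 896
```

Changing the ensemble count then amounts to changing a single number, which matches the "edit 1 line of an xml file" workflow above.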
17. CCSM4 and Ultra High Resolution
- DOE/LLNL Grand Challenge Simulation
  - 0.25° atmosphere/land and 0.1° ocean/ice
  - Multi-institutional collaboration (ANL, LANL, LLNL, NCAR, ORNL)
  - First-ever U.S. multi-decadal global climate simulation with an eddy-resolving ocean and a high resolution atmosphere
  - 0.42 sypd on 4048 cpus (Atlas LLNL cluster)
  - 20 years completed
  - 100 GB/simulated month
18. Ultra High Resolution (cont.)
- NSF/PetaApps Control Simulation (IE baseline) - John Dennis (CISL) has carried this out
  - 0.5° atmosphere/land and 0.1° ocean/ice
  - Control run in production @ NICS (TeraGrid)
  - 1.9 sypd on 5848 quad-core XT5 cpus (4-5 months continuous simulation)
  - 155 years completed
  - 100 TB of data generated (generating 0.5-1 TB per wall clock day)
  - 18M CPU hours used
  - Transfer of output from NICS to NCAR (100-180 MB/sec sustained); archive on HPSS
  - Data analysis using 55 TB project space at NCAR
19. Next Steps at High Resolution
- Future work
  - Use OpenMP capability in all components effectively to take advantage of multi-core architectures - Cray XT5 hex-core and BG/P
  - Improve disk I/O performance - currently using 10-25% of run time
  - Improve memory footprint scalability
- Future simulations
  - 0.25° atm / 0.1° ocean
  - T341 atm / 0.1° ocean (effect of Eulerian dycore)
  - 1/8° atm (HOMME) / 0.25° land / 0.1° ocean
20. CCSM4 Scalability and Performance
21. New Parallel I/O Library (PIO)
- Interface between the model and the I/O library. Supports:
  - Binary
  - NetCDF3 (serial netcdf)
  - Parallel NetCDF (pnetcdf) (MPI-IO)
  - NetCDF4
- The user has enormous flexibility to choose what works best for their needs
  - Can read one format and write another
- Rearranges data from the model decomposition to an I/O-friendly decomposition (the rearranger is framework independent) - model tasks and I/O tasks can be independent
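The rearranger's job can be sketched in a few lines: data scattered across many model tasks is regrouped into contiguous slabs on a smaller set of I/O tasks. An illustrative Python sketch only (PIO itself is a Fortran/C library and performs this with MPI communication):

```python
# Sketch of PIO's rearranger idea: regroup data from the model
# decomposition into contiguous slabs, one per I/O task.

def rearrange(model_chunks, n_io_tasks):
    """model_chunks: {global_start_index: [values]} as owned by model tasks.
    Returns a contiguous slab of the flattened global array per I/O task."""
    # Reassemble the global ordering from the scattered model chunks...
    flat = []
    for start in sorted(model_chunks):
        flat.extend(model_chunks[start])
    # ...then split it into contiguous slabs, one per I/O task.
    slab = len(flat) // n_io_tasks
    return [flat[i * slab:(i + 1) * slab] for i in range(n_io_tasks)]

# 4 model tasks own scattered chunks; 2 I/O tasks write contiguous halves.
chunks = {0: [0, 1], 4: [4, 5], 2: [2, 3], 6: [6, 7]}
print(rearrange(chunks, 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because only the I/O tasks ever hold large slabs, no task needs the full global array, which is the memory-scalability point made on the next slide.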
22. PIO in CCSM
- PIO is implemented in CAM, CICE, and POP
- Usage is critical for high resolution, high processor count simulations
  - Serial I/O is one of the largest sources of global memory in CCSM - it will eventually always run out of memory
  - Serial I/O results in a serious performance penalty at higher processor counts
- A performance benefit is noticed even with serial netcdf (model output decomposed onto the output I/O tasks)
23. CPL Scalability
- Scales much better than the previous version, in both memory and throughput
- Inherently involves a lot of communication versus flops
- The new coupler has not been a bottleneck in any configuration we have tested so far - other issues such as load balance and scaling of other processes have dominated
- Minor impact at 1800 cores (kraken peta-apps control)
24. CCSM4 Cray XT Scalability
[Chart: component processor layout, plotted as processors vs. time - CAM 1664, POP 4028, CICE 1800, CPL 1800 cores]
- 1.9 sypd on 5844 cores with I/O on kraken quad-core XT5
(Courtesy of John Dennis)
25. CAM/HOMME Dycore
- The cubed-sphere grid overcomes dynamical core scalability problems inherent in the lat/lon grid
- Work of Mark Taylor (SciDAC), Jim Edwards (IBM), Brian Eaton (CSEG)
- PIO library used for all I/O (work COULD NOT have been done without PIO)
- BGP (4 cores/node): excellent scalability down to 1 element per processor (86,200 processors at 0.25 degree resolution)
- JaguarPF (12 cores/node): 2-3x faster per core than BGP, but scaling not as good - the 1/8 degree run loses scalability at 4 elements per processor
26. CAM/HOMME "Real Planet" 1/8° Simulations
- CCSM4 - CAM4 physics configuration with cyclical year-2000 ocean forcing data sets
  - CAM-HOMME 1/8°, 86400 cores
  - CLM2 on lat/lon 1/4°, 512 cores
  - Data ocean/ice, 1°, 512 cores
  - Coupler, 8640 cores
- JaguarPF simulation
  - Excellent scalability - 1/8 degree running at 3 SYPD on Jaguar
  - Large-scale features agree well with the Eulerian and FV dycores
  - Runs confirm that the scalability of the dynamical core is preserved by CAM, and the scalability of CAM is preserved by the CCSM real planet configuration
27. How Will CCSM4 Be Released?
- Leverages the Subversion revision control system
  - Source code and input data are obtained from Subversion servers (not tar files)
  - Output data of control runs from ESG
- Advantages
  - Easier for CSEG to produce frequent updates
  - Flexible way for users to obtain new updates of the source code (and bug fixes)
  - Users can leverage Subversion to merge new updates into their sandbox with their modifications
28. Obtaining the Code and Updates
- Subversion Source Code Repository (public): https://svn-ccsm-release.cgd.ucar.edu
29. Creating an Experimental Case
- New CCSM4 scripts simplify:
  - Porting CCSM4 to your machine
  - Creating your experiment and obtaining the necessary input data for it
  - Load balancing your experiment
  - Debugging your experiment - if something goes wrong during the simulation (never happens, of course), it is simpler to determine what it is
30. Porting to Your Machine
- CCSM4 scripts contain a set of supported machines the user can run on out of the box
- CCSM4 scripts also support a set of generic machines (e.g., linux clusters with a variety of compilers)
  - the user still needs to determine which generic machine most closely resembles their machine and customize the Makefile macros for it
  - user feedback will be leveraged to continuously upgrade the generic machine capability post-release
31. Obtaining Input Data
- Input data is now in a Subversion repository
- The entire input data set is about 900 GB and growing
- CCSM4 scripts permit the user to automatically obtain only the input data needed for a given experimental configuration
32. Accessing Input Data for Your Experiment
- Set up the experiment with create_newcase (component set, resolution, machine)
33. Load Balancing Your Experiment
- A load balancing exercise must be done before starting an experiment
- Repeat short experiments (20 days) without I/O and adjust the processor layout to:
  - optimize throughput
  - minimize idle time (maximize efficiency)
- Detailed timing results are produced with each run
  - Makes the load balancing exercise much simpler than in CCSM3
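The arithmetic behind those timing tables is simple: with components on disjoint processor sets, the coupled step is paced by the slowest component, and every faster component sits idle in the meantime. A sketch with made-up numbers (the function, the per-component timings, and the layout are all illustrative assumptions):

```python
# Sketch of the load-balancing arithmetic: throughput is set by the
# slowest concurrent component; idle fraction measures wasted cores.

def load_balance(times, pes):
    """times: seconds of wall clock per simulated day, per component.
    pes: cores assigned to each component (disjoint, concurrent sets)."""
    step = max(times.values())              # slowest component paces the run
    sypd = 86400.0 / (step * 365)           # simulated years per wall-clock day
    total_pes = sum(pes.values())
    # Core-seconds idle while faster components wait out each coupled day.
    idle = sum((step - t) * pes[c] for c, t in times.items())
    idle_frac = idle / (step * total_pes)
    return sypd, idle_frac

times = {"ATM": 40.0, "OCN": 50.0, "ICE": 20.0}   # made-up timings
pes = {"ATM": 1664, "OCN": 4028, "ICE": 1800}     # made-up layout
sypd, idle = load_balance(times, pes)
print(round(sypd, 2), round(idle, 2))  # 4.73 0.19
```

Shrinking the fast components' processor sets (or growing the slow one's) drives the idle fraction down, which is exactly the adjustment loop the 20-day test runs support.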
34. Load Balancing CCSM Example
35. CCSM4 Releases and Timelines
- January 15, 2010
  - CCSM4.0 alpha release - to a subset of users and vendors, with minimal documentation (except for the scripts User's Guide)
- April 1, 2010
  - CCSM4.0 release - full documentation, including User's Guide, Model Reference Documents, and experimental data
- June 1, 2010 - CESM1.0 release
  - ocean ecosystem, CAM-AP, interactive chemistry, WACCM
- New CCSM output data web design underway (including comprehensive diagnostics)
36. CCSM4.0 Alpha Release
- An extensive CCSM4 User's Guide is already in place
- Apply for alpha user access at www.ccsm.ucar.edu/models/ccsm4.0
37. Upcoming Challenges
- This year
  - Carry out IPCC simulations
  - Release CCSM4 and CESM1, plus updates
  - Resolve performance and memory issues with the ultra-high resolution configuration on Cray XT5 and BG/P
  - Create a user-friendly validation process for porting to new machines
- On the horizon
  - Support regional grids
  - Nested regional modeling in CPL7
  - Migration to and optimization for GPUs
38. Big Interdisciplinary Team!
- S. Mishra (NCAR), S. Peacock (NCAR), K. Lindsay (NCAR), W. Lipscomb (LANL), R. Loft (NCAR), R. Loy (ANL), J. Michalakes (NCAR), A. Mirin (LLNL), M. Maltrud (LANL), J. McClean (LLNL), R. Nair (NCAR), M. Norman (NCSU), N. Norton (NCAR), T. Qian (NCAR), M. Rothstein (NCAR), C. Stan (COLA), M. Taylor (SNL), H. Tufo (NCAR), M. Vertenstein (NCAR), J. Wolfe (NCAR), P. Worley (ORNL), M. Zhang (SUNYSB)
- Contributors
- D. Bader (ORNL)
- D. Bailey (NCAR)
- C. Bitz (U Washington)
- F. Bryan (NCAR)
- T. Craig (NCAR)
- A. St. Cyr (NCAR)
- J. Dennis (NCAR)
- B. Eaton (NCAR)
- J. Edwards (IBM)
- B. Fox-Kemper (MIT,CU)
- N. Hearn (NCAR)
- E. Hunke (LANL)
- B. Kauffman (NCAR)
- E. Kluzek (NCAR)
- B. Kadlec (CU)
- D. Ivanova (LLNL)
- E. Jedlicka (ANL)
- E. Jessup (CU)
- Funding
- DOE-BER CCPP Program Grant
- DE-FC03-97ER62402
- DE-PS02-07ER07-06
- DE-FC02-07ER64340
- BR KP1206000
- DOE-ASCR
- BR KJ0101030
- NSF Cooperative Grant NSF01
- NSF PetaApps Award
- Computer Time
- Blue Gene/L time
- NSF MRI Grant
- NCAR
- University of Colorado
- IBM (SUR) program
- BGW Consortium Days
- IBM research (Watson)
- LLNL
39. Thanks! Questions?
- CCSM4.0 alpha release page at www.ccsm.ucar.edu/models/ccsm4.0