Title: Scaling of the Community Atmospheric Model to ultrahigh resolution
1. Scaling of the Community Atmospheric Model to ultrahigh resolution
- Michael F. Wehner
- Lawrence Berkeley National Laboratory
- mfwehner_at_lbl.gov
- with
- Pat Worley (ORNL), Art Mirin (LLNL)
- Lenny Oliker (LBNL), John Shalf (LBNL)
2. Motivations
- First meeting of the WCRP Modeling Panel (WMP)
  - Convened at the UK Met Office in October 2005 by Shukla
  - Discussion focused on the benefits and costs of climate and weather models approaching 1km in horizontal resolution
  - Eventual white paper by Shukla and Shapiro for the WMO JSC
- Counting the Clouds, a presentation by Dave Randall (CSU) to DOE SciDAC (June 2005)
  - Dave presents a compelling argument for global atmospheric models that resolve cloud systems rather than parameterize them.
  - Presentation is on the web at www.scidac.org
3. fvCAM
- NCAR Community Atmospheric Model version 3.1
- Finite-volume hydrostatic dynamics (Lin-Rood)
- Parameterized physics is the same as in the spectral version
- Our previous studies focus on the performance of fvCAM with a 0.5° × 0.625° × 28L mesh on a wide variety of platforms (see Pat Worley's talk this afternoon)
- In the present discussion, we consider the scaling behavior of this model over a range of existing mesh configurations and extrapolate to ultra-high horizontal resolution.
4. Operations count
- Exploit three existing horizontal resolutions to establish the scaling behavior of the number of operations per fixed simulation period.
- Existing resolutions (all 28 vertical levels):
  - B: 2° × 2.5°
  - C: 1° × 1.25°
  - D: 0.5° × 0.625°
- Define:
  - m = number of longitudes, n = number of latitudes
5. Operations Count (Scaling)
- Parameterized physics
  - Time step can remain constant
  - Ops ∝ m · n
- Dynamics
  - Time step determined by the Courant condition, so it shrinks as the mesh is refined
  - Ops ∝ m · n · n = m · n²
- Filtering
  - Allows violation of an overly restrictive Courant condition near the poles
  - Ops ∝ m · log(m) · n²
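As a rough illustration (slides 6 through 8 plot the actual counts), the sketch below applies these proportionalities to the existing meshes and to a 1km-class mesh. The grid dimensions (m, n) are the standard fvCAM lat-lon sizes for these resolutions plus the strawman mesh of slide 16; they are assumptions for illustration, not numbers taken from the slides.

```python
import math

# Scalings from slide 5, applied as pure proportionalities. Grid sizes
# (m longitudes, n latitudes) are assumed standard fvCAM dimensions.
meshes = {
    "B (2 x 2.5)":         (144, 91),
    "C (1 x 1.25)":        (288, 181),
    "D (0.5 x 0.625)":     (576, 361),
    "~1km (0.015 x 0.02)": (18000, 12000),
}

def relative_ops(m, n):
    physics = m * n                      # time step held constant
    dynamics = m * n * n                 # Courant condition adds a factor n
    filters = m * math.log(m) * n * n    # longitudinal filter near the poles
    return physics, dynamics, filters

p0, d0, f0 = relative_ops(*meshes["B (2 x 2.5)"])
for name, (m, n) in meshes.items():
    p, d, f = relative_ops(m, n)
    print(f"{name:22s} physics x{p/p0:>10.0f}  dynamics x{d/d0:>10.0f}  "
          f"filters x{f/f0:>10.0f}")
```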
6. Operations Count (Physics)
(figure: physics operation count vs. horizontal resolution)
7. Operations Count (Dynamics)
(figure: dynamics operation count vs. horizontal resolution)
8. Operations Count (Filters)
(figure: filter operation count vs. horizontal resolution)
9. Sustained computation rate requirements
- A reasonable metric in climate modeling is that the model must run 1000 times faster than real time (see the arithmetic below).
  - Millennium-scale control runs complete in a year.
  - Century-scale transient runs complete in a month.
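A worked check of the metric, using calendar arithmetic only (these numbers are not on the slide):

```python
# Worked check of the 1000x-real-time metric.
SPEEDUP = 1000  # simulated time / wall-clock time

for simulated_years in (1000, 100):      # millennium and century runs
    wallclock_days = simulated_years * 365.0 / SPEEDUP
    print(f"{simulated_years:5d} simulated years -> "
          f"{wallclock_days:6.1f} wall-clock days")
# 1000 simulated years take ~365 days (a year);
# 100 simulated years take ~36.5 days (about a month).
```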
10. Can this code scale to these speeds?
- Domain decomposition strategies
  - Np = number of subdomains, Ng = number of grid points
  - Existing strategy is 1D in the horizontal
  - A better strategy is 2D in the horizontal (contrasted in the sketch below)
- Note that fvCAM also uses a vertical decomposition, as well as OpenMP parallelism, to increase processor utilization.
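A minimal sketch of why 2D wins, assuming the 1D strategy can give each subdomain at best one latitude band (Np ≤ n, i.e. Np grows like Ng^1/2) while 2D can go down to one cell per subdomain (Np ≤ m · n = Ng); the three-cells-across practical limit of the next slide tightens both bounds:

```python
# Parallelism limits for the two horizontal decompositions, assuming at
# best one latitude band (1D) or one grid cell (2D) per subdomain.
def max_subdomains_1d(m, n):
    return n            # latitude bands only: Np grows like Ng**0.5

def max_subdomains_2d(m, n):
    return m * n        # one cell per subdomain: Np grows like Ng

for m, n in [(144, 91), (576, 361), (18000, 12000)]:  # B, D, ~1km meshes
    print(f"m={m:6d} n={n:6d}  1D max: {max_subdomains_1d(m, n):8,}  "
          f"2D max: {max_subdomains_2d(m, n):12,}")
```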
11. Processor scaling
- The performance data from fvCAM fits the first model well but tells us little about future technologies.
- A practical constraint is that the number of subdomains is limited to be less than or equal to the number of horizontal cells.
- At three cells across per subdomain, complete communication of the model's data is required.
- This constraint provides an estimate of the maximum number of subdomains (and hence processors) as well as the minimum processor performance required to achieve the 1000X real time metric (in the absence of communication costs).
12. Maximum number of horizontal subdomains
- 2D decomposition: 2,123,366
- 1D decomposition: 3,840
13. Minimum processor speed to achieve 1000X real time
- Assume no vertical decomposition and no OpenMP (the estimate is sketched below).
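Slide 13's figure is not reproduced in the extracted text; the sketch below shows the shape of the estimate, using slide 12's maximum subdomain count and the 10 Petaflops sustained requirement from slide 16. The division itself is ours and is only illustrative.

```python
# Minimum per-processor speed to reach 1000x real time, ignoring
# communication: (total sustained flop/s) / (number of processors).
# Inputs are slide 12/16 numbers; the arithmetic is illustrative.
total_flops = 10e15               # 10 Pflop/s sustained at ~1km (slide 16)

np_2d = 2_123_366                 # max 2D horizontal subdomains (slide 12)
print(f"no vertical decomposition: "
      f"{total_flops / np_2d / 1e9:.1f} Gflop/s per processor")

nprocs = np_2d * 10               # ~20 million with 10 vertical domains
print(f"with 10 vertical domains:  "
      f"{total_flops / nprocs / 1e6:.0f} Mflop/s per processor")
```

With the ten vertical domains of slide 16, this lands near the quoted 500 Mflops per processor.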
14. Total memory requirements
(figure: total memory vs. horizontal resolution)
15. Memory scales slower than processor speed due to the Courant condition.
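The claim follows from the earlier scalings: total memory grows with the number of grid points, m · n · L, while the required sustained speed grows like the dynamics operation count, m · n², because of the Courant-limited time step. A sketch in proportional units, with assumed D-mesh dimensions as the baseline:

```python
# Memory ~ m*n*L (state size); required speed ~ m*n*n (Courant-limited
# dynamics). Refining both horizontal directions by k multiplies memory
# by k**2 but speed by k**3, so memory per flop/s falls like 1/k.
m, n, L = 576, 361, 28                      # D mesh, assumed dimensions
base_mem, base_speed = m * n * L, m * n * n

for k in (1, 2, 4, 8):
    mem = (k * m) * (k * n) * L
    speed = (k * m) * (k * n) * (k * n)
    print(f"refine x{k}: memory x{mem / base_mem:4.0f}  "
          f"speed x{speed / base_speed:4.0f}  "
          f"memory/speed x{(mem / speed) / (base_mem / base_speed):.3f}")
```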
16. Strawman 1km climate computer
- 1km mesh at 1000X real time (sanity-checked in the sketch below)
  - 0.015° × 0.02° × 100L
- 10 Petaflops sustained
- 100 Terabytes total memory
- 2 million horizontal subdomains
- 10 vertical domains
- 20 million processors at 500 Mflops each, sustained, including communication costs
- 5 MB of memory per processor
- 20,000 nearest-neighbor send-receive pairs of 10 KB each, per subdomain per simulated hour
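Back-of-envelope checks tie these numbers together; the inputs are the slide's figures and only the arithmetic is added here:

```python
# Sanity checks on the strawman machine, using the slide's numbers.
PFLOPS     = 10e15        # sustained flop/s
MEMORY     = 100e12       # bytes
SUBDOMAINS = 2_000_000    # horizontal subdomains
VERT       = 10           # vertical domains

procs = SUBDOMAINS * VERT                                   # 20 million
print(f"flops/processor : {PFLOPS / procs / 1e6:.0f} Mflop/s")   # ~500
print(f"memory/processor: {MEMORY / procs / 1e6:.0f} MB")        # ~5

# Communication: 20,000 send-receive pairs of 10 KB per subdomain per
# simulated hour. At 1000x real time a simulated hour passes every 3.6 s.
msgs, msg_bytes = 20_000, 10e3
wall_seconds_per_sim_hour = 3600 / 1000
bw = msgs * msg_bytes / wall_seconds_per_sim_hour
print(f"bandwidth/subdomain: {bw / 1e6:.0f} MB/s")               # ~56
```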
17. Conclusions
- fvCAM could probably be scaled up to a 1.5km mesh.
  - Dynamics would have to be changed to fully non-hydrostatic.
- The scaling of the operations count is superlinear in horizontal resolution because of the Courant condition.
  - Surprisingly, filtering does not dominate the calculation. Physics cost is negligible.
- A one-dimensional horizontal domain decomposition strategy will likely not work.
  - Limits on processor number and performance are too severe.
- A two-dimensional horizontal domain decomposition strategy would be favorable but requires a code rewrite.
- It's not as crazy as it sounds.