Title: GENIEfy: Collaborative study of GENIE Earth System Models on the Grid
1. GENIEfy: Collaborative study of GENIE Earth System Models on the Grid
- NERC Annual eScience Meeting
- 26th-27th April 2006
- Simon Cox
- Southampton Regional e-Science Centre
2. The GENIE / GENIEfy Team
- Principal Investigator - GENIEfy
  - Tim Lenton, UEA Norwich
- Research Team and Collaborators
  - James Annan, FRSGC, Japan
  - Chris Armstrong, Manchester
  - Chris Brockwell, UEA Norwich
  - David Cameron, CEH Edinburgh
  - Peter Cox, Hadley Centre (UKMO)
  - Neil Edwards, Open University
  - Sudipta Goswami, UEA Norwich
  - Robin Hankin, NOC
  - Julia Hargreaves, FRSGC, Japan
  - Phil Harris, CEH Wallingford
  - Zhuoan Jiao, Southampton e-Science Centre
  - Eleftheria Katsiri, London e-Science Centre
  - Valerie Livina, UEA Norwich
  - Dan Lunt, Bristol
  - Richard Myerscough, NOC
- Principal Investigator - GENIE
  - Paul Valdes, Bristol
- Co-Investigators / Management team
  - Peter Challenor, NOC
  - Trevor Cooper-Chadwick, Southampton e-Sci. Centre
  - Simon Cox, Southampton e-Sci. Centre
  - John Darlington, London e-Science Centre
  - Rupert Ford, Manchester
  - Eric Guilyardi, Reading
  - John Gurd, Manchester
  - Richard Harding, CEH Wallingford
  - Robert Marsh, NOC
  - Tony Payne, Bristol
  - Graham Riley, Manchester
  - John Shepherd, NOC
  - Rachel Warren, UEA Norwich
  - Andrew Watson, UEA Norwich
3. GENIE / GENIEfy
GENIEfy: Grid ENabled Integrated Earth System Model for the Community
- The GENIE project has developed a Grid-based system to:
  - Flexibly couple together state-of-the-art components to form a unified Earth system model
  - Execute the resulting model on the Grid
  - Share the distributed data produced in simulations
  - Provide high-level open access to the system, creating and supporting virtual organisations of Earth system modellers
4. Scientific Aims
- Orbital parameters affect incident radiation and climate
- Biological and geological processes interact with, and feed back upon, the climate (via, for instance, CO2)
5. The target GENIE Model
6. Flexible modelling framework
- Modularity
  - Swappable components throughout
  - e.g. Atmosphere: 2D Energy-Moisture Balance Model or 3D Intermediate GCM
- Scalability
  - Variable resolution
  - e.g. Ocean: 18x18, 36x36, 72x72; 8-32 depth layers
- Traceability
  - Common physics when resolution is varied
  - Where a process is not resolved, parameterise it based on a resolution that does resolve it reasonably
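The modularity point can be illustrated with a small sketch. It is not GENIE code: the interface, class names and placeholder physics below are all hypothetical, and only the idea of swapping a 2D energy-moisture balance atmosphere for a 3D intermediate GCM behind a common interface comes from the slide.

```python
from dataclasses import dataclass
from typing import Protocol


class Atmosphere(Protocol):
    """Hypothetical interface that every atmosphere component satisfies."""
    name: str

    def step(self, sea_surface_temp: float) -> float:
        """Advance one coupling step and return a temperature tendency."""
        ...


@dataclass
class EnergyMoistureBalance2D:
    name: str = "2D EMBM"

    def step(self, sea_surface_temp: float) -> float:
        # Placeholder physics: relax towards a fixed 15 degree target.
        return 0.1 * (15.0 - sea_surface_temp)


@dataclass
class IntermediateGCM3D:
    name: str = "3D IGCM"

    def step(self, sea_surface_temp: float) -> float:
        # Placeholder physics with a different (made-up) coefficient.
        return 0.2 * (15.0 - sea_surface_temp)


@dataclass
class CoupledModel:
    atmosphere: Atmosphere          # any component matching the interface
    sea_surface_temp: float = 10.0

    def run(self, n_steps: int) -> float:
        for _ in range(n_steps):
            self.sea_surface_temp += self.atmosphere.step(self.sea_surface_temp)
        return self.sea_surface_temp


# The driver code is identical whichever atmosphere is plugged in.
for atmosphere in (EnergyMoistureBalance2D(), IntermediateGCM3D()):
    model = CoupledModel(atmosphere=atmosphere)
    print(atmosphere.name, round(model.run(50), 3))
```

The coupled model depends only on the interface, so a component can be replaced without touching the driver.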
7. GENIE Science achievements
- Scientific outcomes from the first phase of the GENIE project
8. GENIE achievements
- Ground-breaking science with modularised Earth system models
  - Stability of the ocean thermohaline circulation to multiple forcings
  - Transient simulations from last glacial maximum to pre-industrial
  - Long-term climate change projections with a closed carbon cycle
- Using the power of the Grid to enable the science
  - Extensive exploration of model parameter space
  - Automated simultaneous tuning of multiple model parameters
  - Data assimilation and characterisation of model uncertainty
- Practical software engineering
  - Version control
  - Build-test-deploy cycle
  - Nightly build and test
9. Bi-Stability of the Thermohaline Circulation (THC)
- Single point in model parameter space sensitive to initial conditions
- [Figure: THC ON and OFF states. Panels: Atlantic Meridional Overturning Circulation, MOC (Sv); Annual Average Air Temperature Difference (K), ON - OFF]
- How close are we to collapse of the thermohaline circulation?
Marsh, R. J. et al. (2004) Climate Dynamics
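What sensitivity to initial conditions at a single point in parameter space means can be seen in a toy system. The sketch below is a generic double-well equation, dx/dt = x - x^3 + forcing, chosen purely for illustration; it is not the GENIE model or any ocean model, and the "ON"/"OFF" labels are only an analogy.

```python
def integrate(x0: float, forcing: float = 0.0,
              dt: float = 0.01, n_steps: int = 5000) -> float:
    """Forward-Euler integration of the toy system dx/dt = x - x**3 + forcing."""
    x = x0
    for _ in range(n_steps):
        x += dt * (x - x**3 + forcing)
    return x


# Same parameters (forcing = 0), different initial conditions:
on_state = integrate(x0=+0.5)    # settles near +1 (the "ON" analogue)
off_state = integrate(x0=-0.5)   # settles near -1 (the "OFF" analogue)
print(on_state, off_state)

# With strong enough forcing only one equilibrium remains, so both
# initial conditions end in the same state.
print(integrate(+0.5, forcing=1.0), integrate(-0.5, forcing=1.0))
```

In the bi-stable regime the final state records where the run started; outside it, the initial condition is forgotten.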
10. THC bi-stability search
- Vary 2 parameters controlling N. Atlantic freshwater supply
  - X: Anomaly in Atlantic to Pacific atmospheric moisture transport (DFWX)
  - Y: Atmospheric moisture diffusivity (DIFF), controls Equator to Pole transport
- Top panel shows results of 961 x 4000yr simulations
- Bottom panel shows range from 9 restart experiments
- Total 40 Myr of simulations
- [Figure: ON and OFF regions of the (DFWX, DIFF) plane, running from "Atmosphere more diffusive" to "Atmosphere less diffusive", with the current state marked and a narrow region of bi-stability]
Marsh, R. J. et al. (2004) Climate Dynamics
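The quoted simulation total can be checked with a few lines of arithmetic. Two assumptions are made here that the slide does not state: that the 961 runs form a square 31 x 31 grid over the two parameters, and that each of the 9 restart experiments repeats the full grid for 4000 years.

```python
import math

n_simulations = 961          # from the slide
years_per_simulation = 4000  # from the slide
n_restart_experiments = 9    # from the slide

# Assumption: the runs form a square grid over (DFWX, DIFF).
points_per_axis = math.isqrt(n_simulations)
assert points_per_axis ** 2 == n_simulations

initial_sweep_years = n_simulations * years_per_simulation
# Assumption: every restart experiment covers the whole grid again.
restart_years = n_restart_experiments * initial_sweep_years
total_myr = (initial_sweep_years + restart_years) / 1e6

print(points_per_axis, initial_sweep_years, total_myr)
```

Under these assumptions the total is about 38 Myr, consistent with the rounded "40 Myr" on the slide.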
11. Data assimilation
James Annan, Julia Hargreaves
- Ensemble Kalman Filter (EnKF)
  - A sample of the posterior probability distribution defined by prior beliefs and climate observations
- Data
  - World Ocean Atlas temperature and salinity
  - NCEP reanalysis surface air temperature, humidity
  - THC strength and heat transport at 25°N
- The system is close to a non-linear threshold
- Initial conditions are uncertain
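As a rough illustration of how an ensemble Kalman filter turns a prior ensemble into a posterior sample, here is a minimal scalar sketch using the perturbed-observation form of the update. It is not the scheme used in the study: the state is a single number observed directly, and the prior, observation and error values are made up. Only the ensemble size of 54 echoes a figure quoted elsewhere in the talk.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_error_sd, rng):
    """One EnKF analysis step for a scalar state that is observed directly."""
    prior_var = statistics.variance(ensemble)
    gain = prior_var / (prior_var + obs_error_sd ** 2)   # Kalman gain
    updated = []
    for member in ensemble:
        # Each member assimilates its own noisy copy of the observation,
        # which keeps the posterior spread statistically consistent.
        perturbed_obs = observation + rng.gauss(0.0, obs_error_sd)
        updated.append(member + gain * (perturbed_obs - member))
    return updated


rng = random.Random(0)
prior = [rng.gauss(10.0, 2.0) for _ in range(54)]           # prior beliefs
posterior = enkf_update(prior, observation=14.0, obs_error_sd=0.5, rng=rng)

print(statistics.mean(prior), statistics.mean(posterior))
print(statistics.stdev(prior), statistics.stdev(posterior))
```

The posterior ensemble moves towards the observation and its spread shrinks, which is the sense in which it samples the distribution defined by the prior and the data.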
12. GENIE ensembles compared to CMIP
13. Unpredictability in THC response
Data assimilated 54 member ensemble, 3.3 x CO2
87
Model with enhanced hydrological sensitivity
14. GENIE e-Science Tools
- Software deployed within the GENIE framework to exploit Grid technology
15. Geodise Toolboxes
- Geodise Compute Toolbox
  - Grid access from the Desktop
  - Matlab and Jython interfaces
  - Globus and Condor support
- Geodise Database Toolbox
  - Associate metadata with data
  - Programmatic and GUI access
- OptionsMatlab
  - Engineering Design Optimisation
  - Suite of multi-dimensional optimisation algorithms
16. Grid Computation
Institutional Resources (GT2)
National Grid Service (GT2)
17. Need Metadata
18. Data Management System
19. GENIE Toolbox
- Provide functions to manage generic time-stepping codes on the Grid
- User provides
  - Archive containing the model binary and input data
  - Metadata describing the model configuration
  - Metadata specifying a compute resource
- Client manages
  - Model configuration, transfer and job submission
  - Job monitoring
  - Data retrieval and upload to database
20. Client Session: Job submission
21. Tuning GENIE Models
- Exploiting e-Science tools to tune the free parameters of models in the GENIE framework
22. Scripted Tuning Study
Matlab >> optimum = fminsearch(@genie, params, ...)
[Diagram label: GENIE Database]
function error = genie(params)
  error = gd_query(params)
  metadata.param1 = params(1)
  metadata.param2 = params(2)
  [handle, retrieve] = gc_jobsubmit(metadata, runtime, resource)
  gd_jobpoll(handle)
  error = gc_jobretrieve(retrieve)
  gd_archive(metadata, error)
  return error
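The pattern in the schematic above (query the database first, run the model only when no result is archived, then archive the new result) can be sketched in plain Python. Everything here is a stand-in: the dictionary replaces the GENIE database, `run_model` is a hypothetical cheap function in place of a Grid job, and a golden-section search is used instead of Matlab's `fminsearch` simply to keep the example free of dependencies.

```python
import math

database = {}   # stands in for the GENIE database


def run_model(params):
    """Hypothetical stand-in for a Grid job: returns a model-data error."""
    (x,) = params
    return (x - 0.3) ** 2 + 1.0


def genie(params):
    """Objective function: reuse archived results, otherwise run and archive."""
    key = tuple(round(p, 12) for p in params)
    if key in database:                 # cf. gd_query
        return database[key]
    error = run_model(params)           # cf. job submit / poll / retrieve
    database[key] = error               # cf. gd_archive
    return error


def golden_section(objective, lo, hi, tol=1e-6):
    """Minimise a unimodal 1D objective on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if objective((c,)) < objective((d,)):
            b = d       # minimum lies in [a, d]
        else:
            a = c       # minimum lies in [c, b]
    return (a + b) / 2.0


optimum = golden_section(genie, 0.0, 1.0)
print(optimum, len(database))
```

Because each evaluation is archived, an interrupted or repeated study can pick up earlier results instead of resubmitting the same model runs.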
23. 1D and 2D Optimisation
% Specify a starting point
parameters = [ 0.5 ];
% Perform the minimisation
optimum = fminsearch( @cgoldstein_1D, parameters, optimisation_parameters )

% Specify a starting point
parameters = [ 420 5000000 ];
% Perform the minimisation
optimum = fminsearch( @cgoldstein_2D, parameters, optimisation_parameters )
24. Response Surface Modelling
- Optimisation of 12 parameters in cGOLDSTEIN ocean model
- Each objective function calculation (model run) takes 1 hour
- Direct Search methods require too many evaluations to be practical
- Employ a Kriging method to construct a Response Surface Model
  - Search a stochastic process model of the underlying objective function
  - Iteratively update the metamodel -> Converge on an optimal solution
R^2 = 0.9052
- Optimal solutions: EnKF = 0.4986, ACCPM = 0.4891, Krig = 0.4913
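The fit, search and update loop can be sketched in one dimension. This is a deliberately simplified stand-in, not the method used in the study: it uses a simple-kriging style interpolator with a constant mean and a fixed Gaussian correlation length, searches the surrogate for its minimum only (no expected-improvement criterion), and the "expensive" objective is a made-up cheap function.

```python
import math


def expensive_objective(x):
    """Hypothetical stand-in for a one-hour model run."""
    return (x - 0.6) ** 2 + 0.1 * math.sin(8.0 * x)


def kernel(a, b, length=0.2):
    """Gaussian correlation between two sample points."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_surrogate(xs, ys):
    """Return a predictor that (almost exactly) interpolates the samples."""
    mean = sum(ys) / len(ys)
    gram = [[kernel(a, b) for b in xs] for a in xs]
    for i in range(len(xs)):
        gram[i][i] += 1e-10     # tiny jitter for numerical stability
    weights = solve(gram, [y - mean for y in ys])
    return lambda x: mean + sum(w * kernel(x, xi) for w, xi in zip(weights, xs))


xs = [0.0, 0.25, 0.5, 0.75, 1.0]                # initial design
ys = [expensive_objective(x) for x in xs]
candidates = [i / 200.0 for i in range(201)]

for _ in range(5):                              # iterative metamodel update
    surrogate = fit_surrogate(xs, ys)
    best = min(candidates, key=surrogate)       # cheap search of the surrogate
    if best in xs:                              # nothing new to evaluate
        break
    xs.append(best)
    ys.append(expensive_objective(best))        # one more "expensive" run

print(min(zip(ys, xs)))
```

Only the lines calling `expensive_objective` would cost a model run; the search itself is carried out on the surrogate.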
25. Genetic Algorithm tuning of IGCM
[Figure panels: Default, Tuned, Data]
- 36% reduction in error statistic compared to default parameters
- Similar result to a parallel study performed using the Ensemble Kalman Filter
- Model physics insufficient to perfectly match observational data
26. Multi-Objective Optimisation
- Single objective function
  - Weighted sum of (model - observation) RMS differences
  - Some objectives can be improved at the expense of others
  - Little improvement in the precipitation and evaporation fields
- Multi-objective optimisation
  - Employ a Pareto Front to optimise multiple objectives
  - Implementation of the Non-dominated Sorting Genetic Algorithm (NSGA-II, Deb (2002))
- 3 objective functions
  - Weighted sum of the RMS differences between seasonal averages of model fields and equivalent observational data
  - OBJ1 (sensible heat, latent heat, net solar, net long)
  - OBJ2 (precipitation rate, evaporation)
  - OBJ3 (wind stress_x, wind stress_y)
- IGCM problem definition
  - 32 free parameters (TXBLCNST = TYBLCNST)
  - 2 constraints on the parameters
    - HUMCLOUDMAX > HUMCLOUDMIN
    - SNOLOOK2 > ALBEDO_ICEHSEET
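The core idea behind a Pareto front, keeping every candidate that no other candidate beats on all objectives at once, fits in a few lines. The sketch below shows only that non-dominated filter plus the two parameter constraints; it is not an implementation of NSGA-II (which adds ranking, crowding distance and genetic operators), and the objective values are made up.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Non-dominated subset of a list of objective tuples (minimisation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def feasible(params):
    """The two parameter constraints, with identifiers as given on the slide."""
    return (params["HUMCLOUDMAX"] > params["HUMCLOUDMIN"]
            and params["SNOLOOK2"] > params["ALBEDO_ICEHSEET"])


# Made-up (OBJ1, OBJ2, OBJ3) values for five candidate parameter sets.
scores = [(1.0, 2.0, 3.0), (2.0, 1.0, 3.0), (2.0, 2.0, 4.0),
          (0.5, 3.0, 3.5), (1.0, 2.0, 2.5)]
print(pareto_front(scores))
```

Unlike a single weighted sum, the front keeps the trade-offs visible, so one objective need not be sacrificed silently to improve another.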
27. Pareto Front Progression
- 50 generations of the NSGA-II algorithm
28. Multi-objective Optimisation
- 5000 model invocations
  - Southampton University Condor pool
  - Iridis2 Compute Cluster
  - National Grid Service
- Pareto Front driven towards origin
  - 3 objective functions reduced
- Targeted improvements
  - Evaporation fields improved without compromising other fields
29. Collaborative Ensemble Studies
- Investigation of bi-stability in the THC at varying resolution under a dynamic atmosphere using a distributed Grid-enabled Problem Solving Environment
30. GENIE-2 on 3 different grids: experimental design
- Three grids
  - (i) 36x36 and (ii) 72x72 equal area grids
  - (iii) 64x32 grid of resolution 5.625° (as IGCM)
- 1-D parameter sweeps, varying Atlantic-Pacific freshwater flux adjustment from 0 to 2 x default value (0.32 Sv)
- GENIE-2 run for 1000 years (sufficient for equilibrium)
- Restarts from Conveyor Off and Conveyor On states to investigate model dependence of THC bistability
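An experimental design of this shape can be written down as a list of run definitions. The sketch below is illustrative only: the number of sweep points is hypothetical (the slide does not give it), the dictionary keys are invented, and every grid is paired with both initial states purely to show the structure.

```python
DEFAULT_FLUX_SV = 0.32   # default Atlantic-Pacific flux adjustment (from the slide)
N_POINTS = 11            # hypothetical: the slide does not state the sweep size

# Scale factors from 0 to 2 x the default value.
scale_factors = [2.0 * i / (N_POINTS - 1) for i in range(N_POINTS)]
grids = ["36x36", "72x72", "64x32"]
initial_states = ["conveyor_on", "conveyor_off"]

ensemble = [
    {"grid": grid,
     "initial_state": state,
     "flux_sv": round(factor * DEFAULT_FLUX_SV, 4),
     "years": 1000}
    for grid in grids
    for state in initial_states
    for factor in scale_factors
]

print(len(ensemble), ensemble[0], ensemble[-1])
```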
31. GENIE-2 Configuration
3D atmosphere: IGCM (64 x 32 x 7)
2D slab sea-ice
3D ocean: (36 x 36 x 8), (64 x 32 x 8), (72 x 72 x 16)
2D land surface
32. GENIE Grids
- Some GENIE grids
- (Lenton et al., in prep.)
- Lowest resolution, equal-area grid, used since 2002 (the basis for the GENIE predecessor, C-GOLDSTEIN)
- The original grid featuring constant increments of latitude
- The 5.625° resolution grid which exactly matches that of the IGCM
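The grid spacings are easy to verify. The regular-grid arithmetic follows directly from the 64 x 32 cell counts; the equal-area construction is an assumption for illustration, namely that latitude bands are spaced uniformly in the sine of latitude so that each band covers the same area.

```python
import math

# Regular grid: 64 x 32 cells give equal spacing in longitude and latitude.
dlon = 360.0 / 64
dlat = 180.0 / 32
print(dlon, dlat)   # both 5.625 degrees


def equal_area_lat_edges(n_lat):
    """Latitude edges (degrees), uniform in sin(latitude)."""
    return [math.degrees(math.asin(-1.0 + 2.0 * j / n_lat))
            for j in range(n_lat + 1)]


edges = equal_area_lat_edges(36)
equator_band = edges[19] - edges[18]   # width of the first band north of the equator
polar_band = edges[36] - edges[35]     # width of the northernmost band
print(round(equator_band, 2), round(polar_band, 2))
```

On such an equal-area grid the bands are narrow near the equator and wide near the poles, whereas the regular grid keeps a constant 5.625 degree increment.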
33. Surface freshwater flux correction
- Three zones where A-P flux (Fa) is applied, indicating default values (from Oort 1983)
34. Collaborative Study
35. Collaborative Model Study
- 12 Ensemble Experiments
  - 3 x 1D FWF adjustment
    - 3 GOLDSTEIN grid resolutions (36x36, 64x32, 72x72)
  - 1 x 2D FWF adjustment, Boundary layer factor
    - 36x36 GOLDSTEIN grid
  - 8 x 1D FWF adjustment, restarted from output of phase 1
    - 3 GOLDSTEIN grid resolutions (36x36, 64x32, 72x72)
    - 72x72 model runs performed on both Linux and win32 platforms
- Addition of resources
  - National Grid Service (Oxford, Leeds, Manchester, RAL, Bristol)
  - Condor Pools (Southampton, Bristol, NOC)
  - Clusters
    - Cluster1 (Norwich)
    - Pacifica (Southampton), Iridis2 / Pacifica2 (dual processor, dual core)
36. Experiment Setup
- Create an Experiment definition in the database
  - Specify an ensemble of simulations
  - Associate the control scripts with the experiment definition
  - Record metadata describing the experiment
- Users can contribute by downloading the Setup script
  - Query the database for an experiment to contribute to
  - Click the hyperlink to SetupExperiment
  - Download the file to the local filesystem
- Execute the script
  - Experiment files are retrieved from the database and installed on the client machine
37. Client Session
38. Autonomous worker script
- Update experiment
  - Check that local scripts are up to date
- Stage 1
  - Query for existing jobs
  - Process any completed jobs and archive the restart files and output data
- Stage 2
  - Query database for new work units
  - User specifies logic for job selection
  - Find all simulations yet to be started
  - If available, select restart files with maximum achieved timestep in each simulation
  - Filter out simulations with active jobs
- Stage 3
  - Submission of new jobs
  - Number of timesteps reduced to ensure model finishes at the end of a model year
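Stages 2 and 3 amount to a selection rule and a rounding rule, sketched below. The data structure, field names, the number of timesteps per model year and the per-job step limit are all hypothetical; the sketch also assumes that each archived restart file sits on a model-year boundary.

```python
from dataclasses import dataclass

STEPS_PER_YEAR = 100   # hypothetical: depends on the model configuration


@dataclass
class Simulation:
    name: str
    target_steps: int
    achieved_steps: int = 0        # max timestep among archived restart files
    has_active_job: bool = False


def select_work_units(simulations):
    """Stage 2: unfinished simulations with no job currently running."""
    return [s for s in simulations
            if s.achieved_steps < s.target_steps and not s.has_active_job]


def steps_for_next_job(sim, max_steps_per_job):
    """Stage 3: round the job length down to a whole number of model years."""
    remaining = sim.target_steps - sim.achieved_steps
    steps = min(remaining, max_steps_per_job)
    return steps - steps % STEPS_PER_YEAR


simulations = [
    Simulation("run_a", target_steps=100_000),                        # not started
    Simulation("run_b", target_steps=100_000, achieved_steps=40_000), # restartable
    Simulation("run_c", target_steps=100_000, achieved_steps=100_000),# finished
    Simulation("run_d", target_steps=100_000, has_active_job=True),   # in progress
]

for sim in select_work_units(simulations):
    print(sim.name, steps_for_next_job(sim, max_steps_per_job=12_345))
```

Ending every job on a year boundary means each archived restart file is a clean starting point for whichever client picks the simulation up next.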
39. Resource Usage
- 5 client installations
- 9 Grid resources exploited
- 352 simulations defined (1000 and 2000 yrs)
- 3,736 compute tasks submitted
- 46,992 CPU hours (estimated)
- 428,000 IGCM-GOLDSTEIN years performed
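A few ratios help put these totals in perspective. The inputs are the figures quoted above (the CPU hours being an estimate); the derived averages are simple divisions and say nothing about how the work was actually distributed across resources.

```python
simulations = 352
tasks = 3_736
cpu_hours = 46_992        # estimated, per the slide
model_years = 428_000

tasks_per_simulation = tasks / simulations
cpu_hours_per_task = cpu_hours / tasks
model_years_per_cpu_hour = model_years / cpu_hours

print(round(tasks_per_simulation, 1),
      round(cpu_hours_per_task, 1),
      round(model_years_per_cpu_hour, 1))
```

On average each simulation was split into roughly ten or eleven compute tasks of about half a day each.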
40. Resource Usage
41. Bistability of the thermohaline circulation in all three GENIE-2 resolutions
42. Two Parameter Study
43. Summary
- THC as a function of freshwater fluxes obtained for
  - Two levels of atmospheric complexity (including previous studies)
  - Three model grids (varying horizontal resolution)
- THC bistability is more extensive with
  - More complex atmosphere
  - A regular 5.625° grid
- Results tentatively support the existence of THC bistability in the real world (i.e. towards infinite complexity)
44. NIEeS GENIE Workshop
- 26th-28th June 2006, NIEeS, Cambridge
- Aims of the workshop
  - To showcase the use of Grid technology for Earth system modelling
  - To decide on the next development steps for GENIE, adopting community standards
  - To begin the design of a generic physical coupler for the GENIE framework
  - To start a GENIE user group and ascertain the users' requirements
  - To strengthen our national and international collaborative links
  - To launch the Quaternary QUEST project, which is applying GENIE to understanding glacial-interglacial variations in atmospheric carbon dioxide