Title: Climateprediction.net NERC Annual eScience meeting, April 2006
1. Climateprediction.net: NERC Annual eScience meeting, April 2006
- Nick Faull, Carl Christensen, Myles Allen, Dave Frame and many others
- Department of Physics, University of Oxford
- nfaull@atm.ox.ac.uk
2. The project
- Overall objective is to quantify the range of uncertainty of future climate change.
- This requires hundreds of thousands of climate model (GCM) simulations.
- Use public resource distributed computing to meet the demand: anyone can go to www.climateprediction.net and download the Hadley Centre climate model to run on their PC.
3. Volunteer Computing
- A specialized form of distributed computing, which is really an old idea in computer science: using remote computers to perform the same or similar tasks.
- Was around before 1999 but took off with SETI@home.
- SETI@home capacity with 500K users is about 1 PF (1000 TF); for comparison, the Earth Simulator in Yokohama reaches 35 TF max.
- CPDN is running at about 60 TF (30K users, each with a 2 GF machine on average, i.e. a Pentium 4 at 2 GHz).
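The capacity figures above follow from multiplying the number of hosts by the average speed per host. A back-of-envelope sketch, assuming (as the slide does for CPDN) a uniform 2 GFLOPS per host; the 2 GFLOPS figure for SETI@home is inferred from the slide's 500K users and 1 PF rather than stated separately:

```python
# Back-of-envelope aggregate capacity of a volunteer computing project.
# Assumption: every host sustains the same average speed.

GFLOPS_PER_TFLOPS = 1_000


def aggregate_tflops(hosts: int, gflops_per_host: float) -> float:
    """Total capacity in TFLOPS for `hosts` machines of equal average speed."""
    return hosts * gflops_per_host / GFLOPS_PER_TFLOPS


cpdn_tf = aggregate_tflops(30_000, 2.0)    # 30K users at ~2 GF each
seti_tf = aggregate_tflops(500_000, 2.0)   # 500K users, same per-host assumption
earth_simulator_tf = 35.0                  # maximum figure quoted on the slide

print(f"CPDN: {cpdn_tf:.0f} TF")
print(f"SETI@home: {seti_tf:.0f} TF = {seti_tf / 1_000:.0f} PF")
print(f"CPDN vs Earth Simulator: {cpdn_tf / earth_simulator_tf:.1f}x")
```

Sustained throughput in practice will be lower, since volunteer machines are not always switched on or running the model; the sketch ignores that.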
4. Other Public Resource Distributed Computing (PRDC) Projects
- There are 100-200M PCs connected to the internet.
- Fewer than 1% are involved.
- Project development is getting easier thanks to generic platforms, e.g. the Berkeley Open Infrastructure for Network Computing (BOINC).
5. Running an Ensemble of GCM Simulations is not a Typical PRDC Application
- A typical GCM simulation takes weeks or months, not hours.
- GCMs use more memory.
- GCM simulations produce data for further analysis. Most PRDC projects analyse data or test an hypothesis, so data transfer is not a problem.
- Potentially all the simulations are useful, not just those which find a result.
6. CPDN Volunteer Computing Challenges...
- Climate models (ESMs, AOGCMs, etc.) are very large, complex systems developed by physicists, sometimes over decades (proprietary in the case of UKMO).
- 1 million lines of Fortran code (HadSM3: 550 files, 40MB of source text).
- Little documentation (the science is well documented, but not the software and design of the system per se).
- Also utility code written by various scientists and students over the years (outside of model code: 220 files, 12MB source, 250K lines), often workable but hard to implement on a cross-platform PC project.
- Meant to be run on supercomputers, primarily 64-bit; not designed (or indeed envisioned) to be run on anything other than a supercomputer or, at the very least, a Linux cluster.
7. CPDN and BOINC Integration
Apologies to Bill Watterson
8. Why BOINC?
- BOINC is based on the experiences of the SETI@home team in handling millions of users, downloads and uploads, an investment of >US$1 million.
- So it makes sense to use BOINC, which has a tried and tested framework, instead of continually playing catch-up and reinventing the wheel.
- Basically, BOINC allows us to focus on what we do best (or should be doing best):
  - Climate science, climate modelling, visualisation packages (peer-to-peer perhaps?), cross-platform porting of models, grid applications to clamp on the BOINC server-side
9. Data nodes
- Rely on donated server space
- Data is federated across the nodes
- Do not want end users to FTP the raw data
- Instead:
  - Provide a secure, robust, efficient, scalable environment for data discovery and analysis
  - Distribute the analysis by enabling each data node to process data
  - Allow multiple interfaces to access the data
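Letting each data node process its own data means only small summaries, not raw data, cross the network. A minimal sketch of that idea, assuming the analysis is a global mean that decomposes into per-node sums and counts; the node names, values and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PartialMean:
    """Small summary a data node returns in place of its raw data."""
    total: float
    count: int


def node_summarise(values: Iterable[float]) -> PartialMean:
    """Runs on a data node: reduce the locally held values to (sum, count)."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return PartialMean(total, count)


def combine(partials: List[PartialMean]) -> float:
    """Runs on the coordinator: merge node summaries into one global mean."""
    count = sum(p.count for p in partials)
    if count == 0:
        raise ValueError("no data on any node")
    return sum(p.total for p in partials) / count


# Hypothetical federation: each node holds a different number of values.
node_data = {
    "node_a": [1.5, 2.0, 2.5],
    "node_b": [3.0],
    "node_c": [4.0, 5.0],
}
partials = [node_summarise(vals) for vals in node_data.values()]
print(combine(partials))
```

Returning counts alongside sums matters: because nodes hold different amounts of data, simply averaging the three node means would give a different, wrong answer.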
10. Distributed analysis of data
- Design problems:
  - Data set is federated across data nodes
  - Only one copy of the data set
  - Data set is large
  - Servers are donated, potentially with no root access
  - Analyses are computationally expensive and may take several days
- Web services to analyse data:
  - Lightweight way to build grid-like infrastructure
  - Open, standardised protocols
  - Security features present in software stack
  - Support from industry (Sun, Microsoft, IBM, etc.)
  - Momentum in UK academic community (WSRF, OMII, etc.)
- CPDN will provide data via grid-enabled web services to such providers as the NERC Data Grid http://ndg.badc.rl.ac.uk/
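Because an analysis may run for days, a web service cannot simply hold a request open until it finishes. One common pattern is submit-then-poll: the client gets a job id straight away and checks back later. The slide does not say how CPDN's services work, so this is only a sketch of that pattern, using an in-process stand-in with threads (no real web framework, WSRF or NERC Data Grid API); `AnalysisService`, `wait_for` and the job states are hypothetical:

```python
import itertools
import threading
import time
from enum import Enum
from typing import Callable, Dict, Optional


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class AnalysisService:
    """Stand-in for a service that runs long analyses asynchronously."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._status: Dict[int, Status] = {}
        self._results: Dict[int, object] = {}
        self._lock = threading.Lock()

    def submit(self, analysis: Callable[[], object]) -> int:
        """Register a job and start it in the background; return its id."""
        with self._lock:
            job_id = next(self._ids)
            self._status[job_id] = Status.QUEUED
        threading.Thread(target=self._run, args=(job_id, analysis), daemon=True).start()
        return job_id

    def _run(self, job_id: int, analysis: Callable[[], object]) -> None:
        with self._lock:
            self._status[job_id] = Status.RUNNING
        try:
            result, outcome = analysis(), Status.DONE
        except Exception as exc:  # keep the error so the client can inspect it
            result, outcome = exc, Status.FAILED
        with self._lock:
            self._results[job_id] = result
            self._status[job_id] = outcome

    def status(self, job_id: int) -> Status:
        with self._lock:
            return self._status[job_id]

    def result(self, job_id: int) -> Optional[object]:
        with self._lock:
            return self._results.get(job_id)


def wait_for(service: AnalysisService, job_id: int,
             timeout_s: float = 5.0, poll_s: float = 0.01) -> Status:
    """Client side: poll until the job finishes or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = service.status(job_id)
        if state in (Status.DONE, Status.FAILED):
            return state
        time.sleep(poll_s)
    return service.status(job_id)


service = AnalysisService()
job = service.submit(lambda: sum(range(10)))  # stands in for a multi-day analysis
final = wait_for(service, job)
print(final, service.result(job))
```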
11. Security issues
- Threats to participants (unexpected costs of participation):
  - Software package is digitally signed.
  - Communications are always initiated by the client.
  - HTTP over a Secure Sockets Layer will be used where necessary to protect participant details and guarantee reliable data collection.
  - Digitally signed files can be used where necessary.
- Threats to the experiment (falsified data):
  - Two types of run replication:
    - Small number of repeated identical runs.
    - Large numbers of initial condition ensembles.
  - Checksum tracking of client package files to discourage casual tampering.
  - Opportunity to repeat runs as necessary.
  - Server security management and frequent backups.
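Checksum tracking and run replication can both be expressed as simple comparisons. A minimal sketch, assuming SHA-256 digests (the slide does not name an algorithm) and a manifest recorded when the package is installed; the function names and file names are hypothetical. The tolerance in `replicas_agree` is a parameter because the slide does not say whether identical runs reproduce bit-for-bit on different PCs:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a digest for every file under `root`, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List files that are missing or whose digest no longer matches."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)


def replicas_agree(a: Sequence[float], b: Sequence[float], tol: float = 0.0) -> bool:
    """Compare diagnostics from two runs that should be identical."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model.bin").write_bytes(b"original model binary")
    (root / "settings.txt").write_text("param=1\n")
    manifest = build_manifest(root)
    (root / "settings.txt").write_text("param=2\n")   # simulated tampering
    tampered = changed_files(root, manifest)

print(tampered)
print(replicas_agree([14.1, 17.3], [14.1, 17.3]))
```

As the slide says, this only discourages casual tampering: a determined user could recompute the digests, which is why replicated runs are a separate line of defence.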
12. Climateprediction.net participants
>200,000 volunteers, >150 countries, >13M model-years
13. Climateprediction.net: what it looks like
14. Results from our initial climateprediction.net experiment (Stainforth et al., 2005)
- Using a simplified model ocean to keep runs short
- 15-year calibration phase to compute ocean heat transport
- 15-year control phase with pre-industrial CO2 (280 ppm)
- 15-year 2xCO2 phase with CO2 at 560 ppm
- Repeat with different initial conditions to average out noise and quantify sampling uncertainty
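The control and 2xCO2 phases, repeated over initial conditions, combine into a simple estimate of the warming from doubled CO2 and of its sampling uncertainty. A sketch, assuming each run's warming is its 2xCO2-phase global mean minus its control-phase global mean, averaged across initial-condition members, with the standard deviation across members as the spread. The temperatures are made up and each phase is cut to two years for brevity; a real analysis would need more care, for example over whether the 15-year 2xCO2 phase has reached equilibrium:

```python
from statistics import mean, stdev
from typing import List, Tuple

# One ensemble member = (control-phase annual means, 2xCO2-phase annual means).
Member = Tuple[List[float], List[float]]


def warming(control: List[float], doubled: List[float]) -> float:
    """Mean of the 2xCO2 phase minus mean of the control phase."""
    return mean(doubled) - mean(control)


def ensemble_warming(members: List[Member]) -> Tuple[float, float]:
    """Ensemble-mean warming and its spread across initial-condition members."""
    deltas = [warming(control, doubled) for control, doubled in members]
    spread = stdev(deltas) if len(deltas) > 1 else 0.0
    return mean(deltas), spread


# Hypothetical global-mean temperatures (deg C), two years per phase.
members: List[Member] = [
    ([13.9, 14.1], [16.9, 17.1]),
    ([14.0, 14.2], [17.2, 17.4]),
    ([14.1, 13.9], [17.3, 17.5]),
]
avg, spread = ensemble_warming(members)
print(f"warming {avg:.2f} K, spread {spread:.2f} K across initial conditions")
```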
15. Parameter perturbations
- Critical relative humidity (RHcrit)
- Accretion constant (CT)
- Condensation nuclei concentration (CW)
- Ice fall velocity (VF1)
- Entrainment coefficient (EntCoef)
- Empirically adjusted cloud fraction (EACF)
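Perturbing several parameters at once multiplies the number of model versions. The slide does not say how many values each parameter took or how combinations were chosen, so this sketch uses a full factorial grid with three placeholder levels per parameter, purely to show how quickly the count grows; the level labels are hypothetical:

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical three-level settings for each perturbed parameter. The labels
# are placeholders, NOT the values used by climateprediction.net.
LEVELS = ["low", "default", "high"]
PARAMETERS = ["RHcrit", "CT", "CW", "VF1", "EntCoef", "EACF"]


def model_versions(params: List[str], levels: List[str]) -> Iterator[Dict[str, str]]:
    """Yield every combination of settings (a full factorial design)."""
    for combo in product(levels, repeat=len(params)):
        yield dict(zip(params, combo))


versions = list(model_versions(PARAMETERS, LEVELS))
print(len(versions))
print(versions[0])
```

Each version would then be run from several initial conditions, so the number of runs grows again (729 versions x 4 initial conditions = 2,916 runs, for an arbitrary choice of 4), which is why the experiment needs volunteer computing.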
16. Frequency Distribution of Simulations
From Stainforth et al., 2005
17. Frequency distribution, eliminating drifting control simulations
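Eliminating drifting control simulations is a filter applied before the frequency distribution is drawn. A sketch, assuming drift is measured as the least-squares linear trend of the control phase's global-mean temperature and a run is discarded when that trend exceeds a threshold. The 0.2 K per decade threshold, run ids and all numbers are hypothetical, and the criterion actually used by Stainforth et al. (2005) may differ:

```python
import math
from collections import Counter
from typing import Dict, List


def drift_per_decade(series: List[float]) -> float:
    """Least-squares linear trend of an annual series, per decade."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two years of data")
    xbar = (n - 1) / 2
    ybar = sum(series) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(series))
    den = sum((i - xbar) ** 2 for i in range(n))
    return 10 * num / den


def stable_runs(controls: Dict[str, List[float]], max_drift: float) -> List[str]:
    """Ids of runs whose control-phase trend is within +/- max_drift per decade."""
    return [rid for rid, s in controls.items() if abs(drift_per_decade(s)) <= max_drift]


def histogram(values: List[float], bin_width: float) -> Dict[float, int]:
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical 15-year control-phase global means (deg C) and sensitivities (K).
controls = {
    "run1": [14.0] * 15,                                           # flat
    "run2": [14.0 - 0.1 * i for i in range(15)],                   # cooling 1 K/decade
    "run3": [14.0 + (0.05 if i % 2 else 0.0) for i in range(15)],  # noisy, no trend
}
sensitivity = {"run1": 3.4, "run2": 11.0, "run3": 2.8}

kept = stable_runs(controls, max_drift=0.2)   # threshold is an assumption
print(kept)
print(histogram([sensitivity[r] for r in kept], bin_width=1.0))
```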
18. And at about the same time, our participants started reporting models freezing over
19. Unphysically strong low-cloud versus surface-heat-flux feedback in the equatorial Pacific
20. Climate sensitivities from climateprediction.net
Stainforth et al., 2005
21. And having got excited about the cold ones
22. BBC Climate Change Experiment
- Transient simulation of 1920 to 2080 with HadCM3L, exploring:
  - Model uncertainty in the atmosphere.
  - Model uncertainty in the ocean.
  - Uncertainty in historic forcing.
  - Some uncertainty in future forcing.
  - Natural variability.
23. Over 50,000 active participants running HadCM3L 1920-2080; see bbc.co.uk/climatechange
25. The problem
- An error in a file header caused the model to read in the man-made sulphate emissions from the wrong point in the file.
- This resulted in too little sulphate emission in the 20th century, hence models warmed up too fast (no global dimming effect).
- ...but we can still do useful science with the GHG-only ensemble.
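The slide does not describe the file format, so the following is only a generic, hypothetical illustration of how a wrong header value can shift every year a model reads from a time-series file, together with a cheap consistency check that would catch this particular kind of mismatch. The years, values and function names are invented, and the values are not real emissions:

```python
from typing import Dict, List


def year_lookup(records: List[float], header_first_year: int) -> Dict[int, float]:
    """Map calendar year -> value, trusting the first year stated in the header."""
    return {header_first_year + i: v for i, v in enumerate(records)}


def header_consistent(header_first_year: int, n_records: int,
                      expected_last_year: int) -> bool:
    """Sanity check: does the header imply the last year we expect?"""
    return header_first_year + n_records - 1 == expected_last_year


# Hypothetical file: one value per year from 1860 to 2100, rising steadily.
true_first_year = 1860
records = [float(year - true_first_year) for year in range(1860, 2101)]

good = year_lookup(records, header_first_year=1860)
bad = year_lookup(records, header_first_year=1900)   # header wrong by 40 years

print(good[1950], bad[1950])   # the bad lookup returns an earlier, smaller value
print(header_consistent(1900, len(records), expected_last_year=2100))
```

With a rising series, a header that claims the data start later than they really do makes every lookup return an earlier, smaller value, which is the same qualitative symptom as the too-low 20th-century sulphate described above.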
26. Participants' thoughts
- Whoops! Still that's science for you...
- I would feel better about the error, if I thought the person/people responsible had been sacked!
27. HadCM3L Attribution Project (courtesy Daithi Stone)
28. HadCM3L Attribution Project (courtesy Daithi Stone)
29. Sahel desert drought experiment
- The Sahel desert drought in the 1970s and 1980s created a famine that killed a million people and afflicted more than 50 million.
- It has been suggested that the drought was likely caused by air pollution (global dimming) changing the properties of clouds over the Atlantic Ocean, disturbing the monsoons and shifting the tropical rains southwards.
- With reduced sulphate aerosol in the model, we can test whether this had an impact on rainfall in this region.
30. Distributed computing is not just for climate-resolution models
31. Distributed computing is not just for climate-resolution models
32. The climate that might have been: http://attribution.cpdn.org
HadAM3 N144 model: 288 longitude x 217 latitude x 30 vertical gridboxes
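The grid dimensions above translate into the counts below. A small sketch; the latitude spacing assumes the 217 rows run from pole to pole inclusive (216 intervals), which is an assumption rather than something stated on the slide:

```python
N_LON, N_LAT, N_LEV = 288, 217, 30   # HadAM3 N144 grid dimensions from the slide

columns = N_LON * N_LAT              # horizontal grid points
gridboxes = columns * N_LEV          # 3-D gridboxes
dlon = 360 / N_LON                   # longitude spacing in degrees
dlat = 180 / (N_LAT - 1)             # latitude spacing, assuming rows include both poles

print(f"{columns:,} columns, {gridboxes:,} gridboxes")
print(f"{dlon:.2f} deg x {dlat:.3f} deg")
```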
33. Educational Outreach
- Public education via the website, media and schools is an important facet of CPDN.
- The website has a great deal of information on climate change and on topics related to the CPDN programme.
- Schools are running CPDN and comparing results, with special events at the University of Reading.
- Students will host a debate on climate change issues, compare and contrast their results, etc. Currently focused on UK schools, but as projects are added and staff resources gained, we plan to expand to other European and US schools.
Students at Gosford Hill School, Oxon, viewing their CPDN model
34. Future Plans
- Just released 160-year HadCM3L 1920-2080 hindcast/forecast runs with the BBC's Climate Chaos season of programmes and the Meltdown documentary.
- BBC World will hopefully pick up the programmes and push CPDN in July 2006.
- Received funding from the NERC Knowledge Transfer scheme for regional modelling (PRECIS) via CPDN.
- May have sister or spin-off projects in Germany and the US (depending on proposals/funding).