Title: From
1From lab books to computational Earth science.
- Chris Hill, MIT cnh_at_mit.edu
- Edinburgh, July 2007
2Lab books
A lab notebook is a primary record of research.
Researchers use a lab notebook to document their
hypotheses, experiments and initial analysis or
interpretation of these experiments. The notebook
serves as an organizational tool, a memory aid,
and can also have a role in protecting any
intellectual property that comes from the
research. The guidelines for lab notebooks vary
widely between institutions and between individual
labs, but some guidelines are fairly common. The
lab notebook is usually written in as the
experiments progress, rather than at a later date.
Many say that a lab notebook should be thought of
as a diary of activities that are described in
sufficient detail to allow another scientist to
follow the same steps. To ensure that data cannot
be easily altered, notebooks with permanently
bound pages are often recommended. Researchers
are often encouraged to write only with
unerasable pen, to sign and date each page, and
to have their notebooks inspected periodically by
another scientist who can read and understand it.
All of these guidelines can be useful in proving
exactly when a discovery was made, in the case of
a patent dispute. Several companies now offer
electronic lab notebooks. This format has gained
some popularity, especially in large
pharmaceutical companies, which have large
numbers of researchers and great need to document
their experiments.
(Source: Wikipedia)
3Lab books
- Physical, chemical and biological scientists are
taught lab-book discipline from an early age.
- Reproducible results are the foundation of
scientific and engineering disciplines, e.g.
Michelson/Morley.
- There is even an infamous Journal of Irreproducible
Results.
- In computational science the lab-book
discipline is not so ubiquitous, maybe because:
  - a program is a formal statement of applied
    mathematical axioms
  - axioms are deterministic
  - therefore reproducibility is not an issue
- However, a program, i.e. a complex collection of
simple elemental statements, is hard to
comprehend. If details are not recorded,
reproducibility may well be an issue.
4Some example computational Earth science
experiments.
- Aqua-planet.
- Eddying North Atlantic.
- Global ocean with eddies and seaice.
- IPCC
5A simple GFD configuration
- Some factors that affect the solution:
  - Initial conditions.
  - Atmosphere: clouds, radiation, dynamics, boundary
    layer, temporal and spatial discretization.
  - Seaice: thermodynamics, aging, stress-strain
    relation.
  - Ocean: dynamics, coordinate system,
    vertical/horizontal friction and mixing.
  - Coupling: time stepping, energetics.
  - External forcings: solar insolation, reference
    profiles.
Water covered planet. Atmosphere-ocean-seaice.
Jean-Michel Campin and David Ferreira
6An eddying, ocean only configuration
- Some factors that affect the solution:
  - Initial conditions.
  - Atmosphere fluxes: planetary boundary layer
    scheme.
  - Ocean: dynamics, coordinate system,
    vertical/horizontal friction and mixing.
  - Coupling: time stepping, energetics.
  - External forcings: solar insolation, reference
    profiles, atmospheric reanalysis.
  - Non-linear/turbulent flow, so bitwise
    reproducibility is subject to FP round-off, parallel
    reduction operations etc.
Ocean-only, forced with atmospheric reanalysis
for Jan-Mar.
Red/blue shading: ocean heating/cooling.
Cyan/magenta line: +/-17.5 °C at 200 m.
Streaks: wind stress.
Green thickness: ocean mixed layer depth.
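The point about floating-point round-off and parallel reductions can be illustrated with a small sketch. It is not taken from any of the models above; the function names and the four input values are chosen purely for illustration. It sums the same numbers once serially and once as two partial sums, the way a two-process reduction might.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point sum, with no compensation."""
    total = 0.0
    for value in values:
        total += value
    return total


def chunked_sum(values, n_chunks):
    """Sum as a reduction over n_chunks partial sums (illustrative only)."""
    size = math.ceil(len(values) / n_chunks)
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


# Illustrative data: large and small magnitudes that cancel.
values = [1e16, 1.0, -1e16, 1.0]

serial = chunked_sum(values, 1)     # one "process"
parallel = chunked_sum(values, 2)   # two "processes"
reference = math.fsum(values)       # correctly rounded sum

print(serial, parallel, reference)  # 1.0 0.0 2.0
```

The same data gives three different answers depending only on the order of additions, which is why a change in processor count or reduction algorithm can break bitwise reproducibility of a turbulent simulation.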
7Global eddying ocean, sea-ice decadal ensemble.
50 members.
Ensemble perturbations:
- Numerical formulation
- Ocean parameters
- Seaice parameters
- Initial conditions
- Boundary conditions
8IPCC ocean ACC transports
Couples atmosphere, ocean, seaice, land,
vegetation, chemistry etc
Could I make this plot without too much
difficulty? Yes.
Could I rerun an IPCC scenario
(possibly with some parameter change)? No.
Diagnosing these results is possible today
(PCMDI/ESG archives) for the broad scientific
community. Rerunning experiments (with or without
small changes) is still very hard. Factors
affecting the solution range from bottom drag to
land-surface formulation to emissions profiles.
9Examples summary
- Way forward
  - A hand record is neither practical nor ideal (i.e. not
    as potentially useful as an electronic record).
  - Electronic information should be stored so as to
    be amenable to machine reasoning.
  - This requires defined vocabularies, precise formal
    structure, pattern matching, rules etc.
  - -> W3C/semantic web technologies: XML, RDF, ...
- In theory, using XML, RDF etc., we could describe
model systems and enable reruns for
extra outputs (e.g. transport of S3 by flow) or
derived runs (e.g. modified air-sea coupling
coefficient or formulation).
- In practice this is hard work!
- To reproduce an experiment
  - a significant quantity of information needs to be
    stored
  - it spans broad big-picture information
    (water-covered planet, atmos+ocean+seaice) to
    minute details (bitwise reproducibility may
    require a record of compiler, OS etc.)
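As a minimal sketch of what an electronic "lab book entry" might capture, the snippet below bundles a big-picture description, run parameters and low-level environment details into one machine-readable record. The function name, the record layout and the example parameters are assumptions made for illustration; a real model run would also need compiler flags, library versions, input-file checksums and so on.

```python
import hashlib
import json
import platform
import sys


def experiment_record(description, parameters):
    """Build a JSON-ready record of a run (hypothetical layout)."""
    record = {
        "description": description,          # big-picture information
        "parameters": parameters,            # values that control the run
        "environment": {                     # minute details
            "os": platform.platform(),
            "machine": platform.machine(),
            "python": sys.version,
            "python_compiler": platform.python_compiler(),
        },
    }
    # Fingerprint of the canonical form, so two records can be compared quickly.
    canonical = json.dumps(record, sort_keys=True)
    record["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


record = experiment_record(
    "water-covered planet, atmos+ocean+seaice",
    {"timestep_seconds": 450, "ensemble_member": 1},  # illustrative values
)
print(json.dumps(record, indent=2))
```

Storing the record as structured data rather than free text is what makes it amenable to the machine reasoning mentioned above.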
10Baby steps toward a computational Earth science
model repository.
- What is working today: PCMDI/ESG
- Steps toward the future: ESC
11PCMDI
- Archive of all IPCC model outputs.
- Stored in a common format (netCDF with standard
metadata).
- Stored on a common mesh. Simplifies things, but
can/does degrade information and even mislead
(e.g. conservation in one coordinate system may
be inexact in another).
- Very limited model metadata is held.
- Very successful and technically impressive.
Societal utility is a func. of model quality!
Schmittner et al. (2005, GRL)
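The caveat about a common mesh can be made concrete with a toy one-dimensional example. It is a sketch under simple assumptions (uniform cells, an integer coarsening ratio, made-up values) and does not represent how PCMDI data are actually regridded. Moving a field to a coarser mesh by point sampling changes its integral, while an area-weighted average preserves it.

```python
def integral(cell_values, cell_width):
    """Integral of a piecewise-constant field on uniform cells."""
    total = 0.0
    for value in cell_values:
        total += value * cell_width
    return total


def sample_remap(fine, ratio):
    """Non-conservative: keep the first fine cell inside each coarse cell."""
    return fine[::ratio]


def conservative_remap(fine, ratio):
    """Conservative: average the fine cells inside each coarse cell."""
    return [sum(fine[i:i + ratio]) / ratio for i in range(0, len(fine), ratio)]


fine = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]   # illustrative values, cell width 1.0

print(integral(fine, 1.0))                          # 24.0
print(integral(conservative_remap(fine, 2), 2.0))   # 24.0
print(integral(sample_remap(fine, 2), 2.0))         # 14.0
```

A budget that closes exactly on the native grid can therefore appear not to close after remapping, unless the remapping method is recorded alongside the data.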
12Earth System Curator (ESC)
Can we (for better or worse!) do for models what
PCMDI does for datasets? PCMDI datasets are data
wrapped in a common/standard container
(netCDF). The PCMDI container is
self-describing. This means we can query and
even combine (to some degree) the PCMDI
datasets. A container analogy for modeling
technology is the component architecture
supported by systems like ESMF.
13Building a coupled model oriented solution:
modeling system as a component tree
- Some mathematics component M
- no side-effects
- possible persistent internal state
- Supports representation as DAG such that
e.g
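The DAG requirement can be sketched with Python's standard `graphlib` module. The component names below are hypothetical and the dependency map is an assumption for illustration, not a description of any particular model. A topological sort yields an order in which every component runs after the components it contains or depends on, and a cycle is reported as an error.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on):
    """Return components ordered so dependencies come first; reject cycles."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"component graph is not a DAG: {err.args[1]}") from err


# Hypothetical component tree: each key depends on the components in its set.
depends_on = {
    "coupled_model": {"atmosphere", "ocean", "seaice"},
    "atmosphere": {"dynamics", "physics"},
    "ocean": set(),
    "seaice": set(),
    "dynamics": set(),
    "physics": set(),
}

print(execution_order(depends_on))
```

Because the graph has no cycles, the description of a coupled system can be walked, validated or re-instantiated component by component.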
14Example of actual component tree.
- Tree of components from the GEOS-5 modeling
system.
- Each box is an ESMF component.
- Components adhere to DAG semantics.
Suarez et al.
15Individual components in ESMF
- ESC builds on an ESMF-like component model.
- ESMF Component
  - Container for a sequence of computation that
    implements a particular algorithm (physics
    simulation, e.g. Navier-Stokes solver, or technical
    function, e.g. history manager). An ESMF component
    exposes its external interfaces through an ESMF
    state.
- ESMF State
  - Container data type to transport data between
    components.
- ESMF Field
  - Container data type that can be used to push/pop
    n-dimensional data with an associated mesh from
    an ESMF State.
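The three roles above can be mimicked in a few lines of plain Python. This is only an analogy to show how the pieces relate: it is not the ESMF API, and every class, method and field name here is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """n-dimensional data plus the name of the mesh it lives on."""
    name: str
    mesh: str
    data: list


@dataclass
class State:
    """Container that carries fields between components."""
    fields: dict = field(default_factory=dict)

    def push(self, new_field):
        self.fields[new_field.name] = new_field

    def pop(self, name):
        return self.fields.pop(name)


class Component:
    """Container for a computation; it sees the outside only through states."""

    def run(self, import_state, export_state):
        raise NotImplementedError


class HeatFluxScaler(Component):
    """Hypothetical component that scales an incoming flux by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def run(self, import_state, export_state):
        flux = import_state.pop("heat_flux")
        scaled = [self.factor * value for value in flux.data]
        export_state.push(Field("scaled_heat_flux", flux.mesh, scaled))


incoming, outgoing = State(), State()
incoming.push(Field("heat_flux", "ocean_grid", [1.0, 2.0, 3.0]))
HeatFluxScaler(factor=2.0).run(incoming, outgoing)
print(outgoing.fields["scaled_heat_flux"].data)  # [2.0, 4.0, 6.0]
```

Because a component only reads from and writes to states, its interface can be described from outside without inspecting its internals.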
16Given a component model, like the ESMF paradigm,
ESC
- Describes a component in terms of
  - parameters that control the computation sequence.
  - states and fields that are passed into/out of the
    component.
- Provides two levels of description:
  potential and specific.
  - Potential is a list of all possible parameters
    and fields. It is a virtualized description in
    that it is not describing a specific instance.
  - Specific is a description of an instantiated
    component in which parameters are bound to
    specific values and fields and states are bound
    to specific values.
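The distinction between the two levels can be sketched as a pair of data classes. The names `Potential`, `Specific` and `instantiate`, and the rule that every listed parameter and field must be bound, are assumptions made for this illustration and are not taken from the ESC schemas.

```python
from dataclasses import dataclass


@dataclass
class Potential:
    """Everything a component could be given: names only, no values."""
    name: str
    parameters: set
    fields: set


@dataclass
class Specific:
    """An instantiated component: every name is bound to a value."""
    potential: Potential
    parameter_values: dict
    field_bindings: dict


def instantiate(potential, parameter_values, field_bindings):
    """Bind values to a potential description, rejecting unknown or missing names."""
    for kind, allowed, given in (
        ("parameter", potential.parameters, parameter_values),
        ("field", potential.fields, field_bindings),
    ):
        unknown = set(given) - allowed
        missing = allowed - set(given)
        if unknown or missing:
            raise ValueError(
                f"{kind}s: unknown={sorted(unknown)} missing={sorted(missing)}"
            )
    return Specific(potential, dict(parameter_values), dict(field_bindings))


# Hypothetical ocean component description.
ocean = Potential("ocean", parameters={"timestep", "viscosity"}, fields={"sst"})
run = instantiate(
    ocean,
    {"timestep": 450, "viscosity": 1.0e3},   # illustrative values
    {"sst": "initial_sst.nc"},               # illustrative file name
)
print(run.parameter_values["timestep"])  # 450
```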
17ESC component descriptions are in terms of XML
schema.
- Curator-NMM
  - Describes numerical model parameters, e.g.
    timestep, system requirements.
- Gridspec
  - Describes the numerical mesh.
- Curator-CIAO
  - Describes component inputs and outputs.
- Curator-complete
  - Describes the wiring together of components.
- A coupled component is also a component, i.e. the
schema is recursive.
Some details (more at http://www.earthsystemcurator.org) ..
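The recursive idea, that a coupled component is itself a component, can be shown with a toy XML document and a recursive walk. The element and attribute names below are invented for illustration and do not follow the actual Curator schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical description: a coupled component containing two components.
DOC = """
<component name="coupled">
  <parameter name="timestep" value="450"/>
  <component name="atmosphere">
    <output field="surface_heat_flux"/>
  </component>
  <component name="ocean">
    <input field="surface_heat_flux"/>
  </component>
</component>
"""


def component_tree(element, depth=0):
    """Return (depth, name) for this component and all nested components."""
    entries = [(depth, element.get("name"))]
    for child in element.findall("component"):
        entries.extend(component_tree(child, depth + 1))
    return entries


root = ET.fromstring(DOC.strip())
for depth, name in component_tree(root):
    print("  " * depth + name)
```

The same function handles a single component or an arbitrarily deep coupled system, which is the practical benefit of a recursive description.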
18Curator-NMM
- The Curator-NMM schema describes model
components, their content, and their
connections. It is a superset of the NMM
schema. The main constructs in the Curator-NMM
schema are component, potential model, and
model. Components are "composable" pieces of
code that can be coupled together in various
arrangements to form different models. A
potential model consists of a group of
components, and describes the set of possible
models that can be built from those components.
A model is a fully specified application based on
a potential model and configuration choices.
19Curator-NMM
20Mosaic Grid Specification
- The Mosaic Grid Specification is a standardized
description of multi-patch, structured grids being
developed in coordination with CF activities.
21Mosaic Grid Specification
22Component-component compatibility checking.
- ESC can describe coupled (multi-component)
systems.
- In principle ESC could support recombination of
components from coupled systems, e.g. couple
component A (atmosphere dynamics) with component
B (land-surface).
- Ideally, for this, compatibility constraints need
to be expressed in a standard way.
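A minimal sketch of such a check, assuming each component publishes its exported and imported fields with units and a grid name. The dictionary layout, the field names and the two checked attributes are assumptions for illustration; real compatibility constraints would cover much more (time stepping, conservation properties, calendars and so on).

```python
def coupling_problems(exports, imports):
    """List reasons why `imports` cannot be satisfied by `exports` (empty if none)."""
    problems = []
    for name, needed in imports.items():
        offered = exports.get(name)
        if offered is None:
            problems.append(f"{name}: not exported by the upstream component")
            continue
        for key in ("units", "grid"):
            if offered[key] != needed[key]:
                problems.append(
                    f"{name}: {key} mismatch ({offered[key]} vs {needed[key]})"
                )
    return problems


# Hypothetical component A (atmosphere dynamics) exports.
atmosphere_exports = {
    "precipitation": {"units": "kg m-2 s-1", "grid": "atm_grid"},
    "surface_air_temperature": {"units": "K", "grid": "atm_grid"},
}
# Hypothetical component B (land-surface) imports.
land_imports = {
    "precipitation": {"units": "kg m-2 s-1", "grid": "land_grid"},
    "surface_air_temperature": {"units": "K", "grid": "atm_grid"},
    "downward_shortwave": {"units": "W m-2", "grid": "atm_grid"},
}

for problem in coupling_problems(atmosphere_exports, land_imports):
    print(problem)
```

Here the check reports a grid mismatch for precipitation and a missing shortwave field, the kind of result that could tell a user a regridding step or an extra component is needed before A and B can be coupled.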
23Service architectures
- Standards -> services
- Developing standardized descriptions is a
well-proven method toward a service oriented
approach e.g.
24Some useful (but an incomplete list of) URLs
Component models
- http://www.esmf.ucar.edu
- http://maplcode.org
Metadata standards
- http://www.earthsystemcurator.org
- http://ncas-cms.nerc.ac.uk/NMM/
- http://www.earthsystemgrid.org/
- http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
- http://sbml.org/index.psp
- http://cml.sourceforge.net/wiki/index.php/Main_Page
- http://www.w3.org/
25Summary
- The Earth System Curator project is an activity
developing schema and tools to capture semantic
information about models.
- Such information provides a basis for formally
recording numerical experiments: a computational
Earth science lab book.
- It also provides the basis for a formal approach to
reproducible numerical results: fewer Journal
of Irreproducible Results candidates.
- Other efforts: SBML (systems biology), CML
(chemistry) - already uploads to Science
submissions.
- Maybe soon a computational Earth science
challenge will become how to stop people doing
dumb things with easy-to-use modeling services,
rather than how to get people to use obtuse
legacy modeling systems - maybe!
26ESC collaboration
- NCAR (Cecelia DeLuca, Julien Chastang), MIT
(Chris Hill, Constantinos Evangelinos), Georgia
Tech (Spencer Rugaber, Rocky Dunlap, Angela),
GFDL (Balaji, Sergey), Reading UK (Lois
Steenman-Clark, Katherine Boughton), PRISM
(Sophie Valcke).