Title: Metadata Standards for Gridded Climate Data in the Earth System Grid
UCRL-PRES-149779
Overview
- I. Earth System Grid: Grid Access to Climate Research Data
- II. Metadata Standards for Gridded Climate Data
Part I
- ESG: Grid Access to Climate Research Data
Earth System Grid Overview
- The goal of ESG is to make climate data, particularly climate model data, an easily accessible community resource. The project is funded by the SciDAC (Scientific Discovery through Advanced Computing) program.
- Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.
- Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data transfer, format, and dataset manipulation.
- The foundation is Globus Grid technology.
ESG uses Globus Grid technology.
- Globus middleware supports linking distributed data archives, supercomputers, workstations, and local disk caches into data/computational grids.
- GridFTP: a high-performance, secure, robust data transfer mechanism (protocol, server, client library).
- ESG is integrating OpenDAP (DODS protocol) with the GridFTP protocol.
- Single sign-on using the Grid Security Infrastructure
  - Proxy certificates
  - Community Authorization Service (CAS)
- The Replica Location Service manages copying and placement of files in a distributed environment.
  - Logical vs. physical files
- http://www.globus.org
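The logical-vs-physical distinction managed by the Replica Location Service can be pictured as a catalog that maps one logical file name to several physical copies. The sketch below is a toy in-memory catalog; the class and method names and the gsiftp:// host names are hypothetical and are not the Globus RLS interface.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy mapping from logical file names to physical replica URLs.

    Hypothetical illustration only; this is not the Globus RLS API.
    """

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record one physical copy of a logical file (duplicates ignored)."""
        urls = self._replicas[logical_name]
        if physical_url not in urls:
            urls.append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical URLs for a logical name."""
        return list(self._replicas.get(logical_name, []))

    def choose(self, logical_name, preferred_host=None):
        """Pick a replica, favoring one on preferred_host if there is one."""
        urls = self.lookup(logical_name)
        if not urls:
            raise KeyError(logical_name)
        for url in urls:
            if preferred_host is not None and urlparse(url).hostname == preferred_host:
                return url
        return urls[0]  # fall back to the first registered copy


catalog = ReplicaCatalog()
catalog.register("run1/tas_1990.nc", "gsiftp://storage-a.example.org/archive/tas_1990.nc")
catalog.register("run1/tas_1990.nc", "gsiftp://storage-b.example.org/cache/tas_1990.nc")
print(catalog.choose("run1/tas_1990.nc", preferred_host="storage-b.example.org"))
# prints gsiftp://storage-b.example.org/cache/tas_1990.nc
```

Preferring a particular host is only one possible selection policy, used here for illustration.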
ESG U.S. Collaborations Development
- ANL: Computational grids, grid-based applications
- LBNL: Climate storage facility and access
- LLNL: Model diagnostics and intercomparison
- USC/ISI: Computational grids, grid-based applications
- ORNL: Climate storage, computational resources
- NCAR: Climate change prediction and scenarios
Program for Climate Model Diagnosis and Intercomparison
- Validation and intercomparison of atmospheric general circulation models (GCMs) and coupled ocean-atmosphere models
- Development of analysis software; quality control, archiving, and distribution of model results. The Climate Data Analysis Tools (CDAT) package is a Python-based analysis and visualization system.
- Global warming detection studies
- CMIP (coupled models) and AMIP (atmospheric GCMs) gather model simulation results from thirty modeling groups worldwide.
PCMDI and Model Development
[Figure: data flow between modeling groups, observations, and PCMDI (diagnosis, quality control, data archival). Labeled flows: simulation data, gridded observation data, feedback to modelers, controlled simulation runs, data assimilation.]
ESG-II Architecture
[Figure: layered architecture with Portals, Middleware, and Servers.]
ESG Metadata Services
ESG is leveraging existing software and projects.
- OpenDAP (DODS): Distributed Oceanographic Data System (Unidata)
  - Integration of Globus GridFTP and DODS data access
- THREDDS: THematic Real-time Environmental Distributed Data Services (Unidata)
- LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory)
  - Works with CDAT, Ferret, GrADS
- CDAT: Climate Data Analysis Tools (PCMDI), including CDMS (Climate Data Management System) and VCDAT visualization
- Community Data Portal project (NCAR)
- NCL (NCAR)
- Globus Grid technology (ANL, ISI): GridFTP, CAS (Community Authorization Service)
CDAT: Example of an ESG GUI Client Access
LAS/CDAT: Example of a Web-based Data Portal
- Technology: web-based (end-user requirements)
- LAS, DODS, ESG (i.e., Globus), CDAT
- The portal should hide/simplify the Grid for users:
  - Single sign-on
  - Community-based authorization
  - Simplified resource location
  - Remote job submission and management
- Accesses the ESG Grid Testbed
Part II
- Metadata Standards for Gridded Climate Data
Climate Model Datasets
- Most climate simulation data are in the form of gridded datasets: collections of variables as a function of longitude, latitude, time, and vertical level.
- A dataset is a logical container:
  - A file
  - An aggregation of files
  - A collection of database tables
- Model-generated data
  - Model data
  - Derived data: zonal averages, global averages, virtual variables
- Observational data, including reanalyses
- Attributes in the form of (name, value) pairs, array values
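To make the dataset model concrete, here is a minimal sketch, assuming a plain-Python dictionary layout (the keys and the temperature values are made up for illustration), of a gridded variable with (name, value) attributes and two of the derived quantities listed above: a zonal average and a cos(latitude)-weighted global average.

```python
import math

# A gridded variable: name, dimensions, (name, value) attributes, and data.
# Temperature(latitude, longitude) on a regular lat-lon grid (made-up values).
temperature = {
    "name": "temperature",
    "dimensions": ("latitude", "longitude"),
    "attributes": {"units": "K", "long_name": "surface air temperature"},
    "latitude": [-60.0, 0.0, 60.0],
    "data": [
        [250.0, 254.0],  # row at 60S
        [300.0, 302.0],  # row at the equator
        [260.0, 260.0],  # row at 60N
    ],
}


def zonal_mean(field):
    """Derived data: average each latitude row over all longitudes."""
    return [sum(row) / len(row) for row in field]


def global_mean(field, latitudes_deg):
    """Derived data: area-weighted global mean using cos(latitude) weights.

    Assumes a regular latitude-longitude grid, where cell area is roughly
    proportional to cos(latitude) for equally spaced longitudes.
    """
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    zonal = zonal_mean(field)
    return sum(w * z for w, z in zip(weights, zonal)) / sum(weights)


print(zonal_mean(temperature["data"]))  # [252.0, 301.0, 260.0]
print(global_mean(temperature["data"], temperature["latitude"]))
```

The unweighted mean of the three zonal values would overweight the high-latitude rows, which is why the weights are applied.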
Binary formats
- A suitable basis for storing data, but they lack the metadata needed to support certain application requirements
- netCDF (UCAR)
  - array data model
  - flexible attribute/value metadata model
  - simple API
- HDF (NCSA, NASA)
  - collection of APIs; can be tailored to specific data models, including scientific data sets, satellite data, and point data
Binary formats
- GRIB (WMO, ECMWF, NCEP)
  - mixed sequential/array data model
  - tailored for simulation output; supports common horizontal grid types
  - hardwired metadata model
  - good compression capabilities
  - lacks a standard API
Coordinate Systems
- Self-describing binary formats are flexible, but they underconstrain the representation of coordinate systems.
[Figure: Variable space: Temperature(Time, Latitude, Longitude). Index space: Temperature(i,j,k). Coordinate space (coordinate system): Time(i), Latitude(j,k), Longitude(j,k).]
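The distinction between index space and coordinate space can be sketched as a small Python class that stores Temperature(i,j,k) by index and looks up Time(i), Latitude(j,k), and Longitude(j,k) separately. The class name and layout are assumptions made for illustration; the point is that the binary file alone does not record which coordinate array belongs to which index, which is the gap the metadata conventions fill.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GriddedVariable:
    """Temperature(i,j,k) in index space plus its coordinate system.

    Coordinate arrays: Time(i), Latitude(j,k), Longitude(j,k). The pairing
    of each array with its indices is exactly what a convention must state.
    """

    data: List[List[List[float]]]  # data[i][j][k]
    time: List[float]  # Time(i)
    latitude: List[List[float]]  # Latitude(j,k)
    longitude: List[List[float]]  # Longitude(j,k)

    def coordinates(self, i: int, j: int, k: int) -> Tuple[float, float, float]:
        """Map an index-space position to coordinate-space values."""
        return self.time[i], self.latitude[j][k], self.longitude[j][k]

    def value(self, i: int, j: int, k: int) -> float:
        """Return the stored value at an index-space position."""
        return self.data[i][j][k]


# One time step on a 2 x 2 curvilinear patch (made-up numbers).
var = GriddedVariable(
    data=[[[280.0, 281.0], [279.0, 282.0]]],
    time=[0.0],
    latitude=[[10.0, 10.5], [11.0, 11.5]],
    longitude=[[200.0, 201.0], [200.2, 201.2]],
)
print(var.coordinates(0, 1, 0), var.value(0, 1, 0))
# prints (0.0, 11.0, 200.2) 279.0
```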
Horizontal Grids
- Curvilinear grid: Los Alamos POP ocean model
  Temperature(i,j), Latitude(i,j), Longitude(i,j), Lat_bounds(i,j,4), Lon_bounds(i,j,4)
Horizontal Grids
  Temperature(i,j), Latitude(i), Longitude(i,j), Lat_bounds(i,2), Lon_bounds(i,j,4)
Horizontal Grids
- General grid: Colorado State geodesic grid
  Temperature(npts), Latitude(npts), Longitude(npts), Lat_bounds(npts,6), Lon_bounds(npts,6)
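The three layouts differ mainly in which indices the latitude and longitude arrays depend on. The hypothetical helper below, a sketch rather than part of any library, guesses a grid type from those dimension names; it assumes that the Latitude(i), Longitude(i,j) pattern represents a reduced grid whose longitudes vary from row to row.

```python
def classify_grid(lat_dims, lon_dims):
    """Guess the horizontal grid type from the dimension names of the
    latitude and longitude coordinate arrays (hypothetical helper)."""
    lat_dims, lon_dims = tuple(lat_dims), tuple(lon_dims)
    if len(lat_dims) == 1 and len(lon_dims) == 1:
        # One shared point index means unstructured points (e.g. geodesic);
        # two different 1-D axes mean an ordinary rectilinear grid.
        return "general" if lat_dims == lon_dims else "rectilinear"
    if len(lat_dims) == 2 and lat_dims == lon_dims:
        return "curvilinear"  # Latitude(i,j), Longitude(i,j)
    if len(lat_dims) == 1 and len(lon_dims) == 2 and lon_dims[0] == lat_dims[0]:
        return "reduced"  # Latitude(i), Longitude(i,j): assumed row-wise longitudes
    return "unknown"


print(classify_grid(("i", "j"), ("i", "j")))  # curvilinear (POP-style)
print(classify_grid(("i",), ("i", "j")))  # reduced (assumed interpretation)
print(classify_grid(("npts",), ("npts",)))  # general (geodesic)
print(classify_grid(("lat",), ("lon",)))  # rectilinear
```

The bounds arrays carry the cell shape in their last dimension: 4 vertices for the quadrilateral cells and 6 for the hexagonal geodesic cells.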
Spatial/temporal location
- Applications must be able to recognize the spatial/temporal coordinate axes.
  - Visualization: continental overlays
  - Data selection by axis type
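```python
import cdms

f = cdms.open('sample.nc')
temperature = f['temperature']
# Select by coordinate value on the latitude axis rather than by index
data = temperature(latitude=(-45.0, 45.0))
f.close()
```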
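Selecting latitude=(-45.0, 45.0) means working in coordinate space rather than index space. A minimal sketch of the underlying step, assuming a monotonic 1-D axis (this is not the CDMS implementation), converts a closed coordinate interval into an index slice with binary search:

```python
from bisect import bisect_left, bisect_right


def coord_range_to_slice(axis, lo, hi):
    """Return the index slice covering axis values within [lo, hi].

    Works for monotonically increasing or decreasing 1-D axes, e.g. a
    latitude axis stored north to south.
    """
    if axis[0] <= axis[-1]:
        return slice(bisect_left(axis, lo), bisect_right(axis, hi))
    reversed_axis = axis[::-1]  # make the axis increasing for bisect
    n = len(axis)
    start = bisect_left(reversed_axis, lo)
    stop = bisect_right(reversed_axis, hi)
    return slice(n - stop, n - start)  # map back to original index order


latitude = [-90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0]
sel = coord_range_to_slice(latitude, -45.0, 45.0)
print(latitude[sel])  # [-30.0, 0.0, 30.0]
```

Recognizing which dimension is latitude in the first place is a separate step; the conventions discussed next address it.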
Time representation and calendars
- Climate simulations use different types of calendars:
  - proleptic Gregorian
  - Julian
  - mixed Gregorian/Julian
  - no leap years (noleap)
  - 30-day months
- Climatologies represent multi-year averages.
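Because the calendar changes the lengths of years and months, the same pair of dates can be a different number of days apart. The sketch below, assuming dates given as (year, month, day) tuples and using the CF-style calendar names proleptic_gregorian, julian, noleap, and 360_day, counts the days between two dates. The mixed Gregorian/Julian calendar (with its 1582 switch-over) is deliberately left out, and day values are not validated.

```python
MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year
CALENDARS = {"proleptic_gregorian", "julian", "noleap", "360_day"}


def is_leap(year, calendar):
    """Leap-year rule for the supported 365/366-day calendars."""
    if calendar == "proleptic_gregorian":
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    if calendar == "julian":
        return year % 4 == 0
    return False  # noleap


def day_of_year(year, month, day, calendar):
    """Zero-based day number within the year."""
    if calendar == "360_day":
        return (month - 1) * 30 + (day - 1)  # every month has 30 days
    days = sum(MONTH_DAYS[: month - 1]) + (day - 1)
    if month > 2 and is_leap(year, calendar):
        days += 1
    return days


def year_length(year, calendar):
    if calendar == "360_day":
        return 360
    return 366 if is_leap(year, calendar) else 365


def days_since(ref, when, calendar):
    """Days elapsed from date ref to date when, both (year, month, day)."""
    if calendar not in CALENDARS:
        raise ValueError(f"unsupported calendar: {calendar}")
    (y0, m0, d0), (y1, m1, d1) = ref, when
    if y1 < y0:
        return -days_since(when, ref, calendar)
    whole_years = sum(year_length(y, calendar) for y in range(y0, y1))
    return whole_years + day_of_year(y1, m1, d1, calendar) - day_of_year(y0, m0, d0, calendar)


for cal in ("proleptic_gregorian", "julian", "noleap", "360_day"):
    print(cal, days_since((1900, 1, 1), (1901, 1, 1), cal))
# prints 365, 366, 365, 360: 1900 is a leap year only in the Julian calendar
```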
Metadata conventions
- Several conventions have been developed to augment the netCDF data model.
- They represent a balance between the needs of data producers and data consumers.
- COARDS convention
  - 1-D coordinate axes, rectilinear horizontal grids
  - axis identification based on units
  - variables limited to four dimensions
  - fixed ordering of dimensions
  - http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
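COARDS identifies axes from the units of coordinate variables. A rough sketch of that idea follows; the latitude and longitude spellings are COARDS-style variants (compared case-insensitively here), while the pressure-unit list used to detect a vertical axis is an assumption for illustration and does not cover height or sigma coordinates.

```python
import re

# COARDS-style unit spellings, lowercased for case-insensitive matching.
LATITUDE_UNITS = {"degrees_north", "degree_north", "degrees_n", "degree_n", "degreesn", "degreen"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degrees_e", "degree_e", "degreese", "degreee"}
# Assumption: only pressure levels are recognized as a vertical axis.
PRESSURE_UNITS = {"hpa", "mb", "millibar", "millibars", "pa"}
# Time units have the form "<unit> since <reference date>".
TIME_UNITS = re.compile(r"^[a-z]+\s+since\s+\S+")


def identify_axis(coordinate_units):
    """Guess which axis a 1-D coordinate variable represents from its units."""
    u = coordinate_units.strip().lower()
    if u in LATITUDE_UNITS:
        return "latitude"
    if u in LONGITUDE_UNITS:
        return "longitude"
    if TIME_UNITS.match(u):
        return "time"
    if u in PRESSURE_UNITS:
        return "level"
    return None  # not recognizable from units alone


for units in ("degrees_north", "degrees_east", "days since 1979-01-01", "hPa", "K"):
    print(units, "->", identify_axis(units))
```

Units-only identification is fragile (a data variable in Pa would look like a level axis), which is one motivation for the simplified axis identification in CF.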
Metadata conventions
- CF (Climate and Forecast) convention
  - Based on earlier conventions, COARDS and GDT
  - multidimensional coordinates (auxiliary coordinate variables)
  - simplified axis identification
  - specific representation for several horizontal grid types:
    - rectilinear
    - curvilinear
    - reduced grids
  - variables can have an arbitrary number of dimensions
  - no constraint on the ordering of dimensions
  - non-Gregorian calendars
  - standard name table
  - http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
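With multidimensional coordinates, a data variable names its auxiliary coordinate variables in the CF coordinates attribute. The sketch below resolves that attribute over an assumed in-memory layout (a dictionary of variables, not a netCDF library API), picks out latitude and longitude by their units, and checks the CF requirement that an auxiliary coordinate's dimensions be a subset of the data variable's dimensions.

```python
def auxiliary_coordinates(variables, name):
    """Resolve the CF "coordinates" attribute of variable `name`.

    `variables` maps names to {"dimensions": tuple, "attributes": dict};
    this layout is an assumption for illustration.
    """
    var = variables[name]
    found = {}
    for coord_name in var["attributes"].get("coordinates", "").split():
        coord = variables[coord_name]  # KeyError if the named variable is missing
        # CF: auxiliary coordinate dimensions must be a subset of the variable's.
        if not set(coord["dimensions"]) <= set(var["dimensions"]):
            raise ValueError(f"{coord_name!r} has dimensions not in {name!r}")
        units = coord["attributes"].get("units", "")
        if units == "degrees_north":
            found["latitude"] = coord_name
        elif units == "degrees_east":
            found["longitude"] = coord_name
        else:
            found.setdefault("other", []).append(coord_name)
    return found


# A curvilinear ocean field, Temperature(j,i), with 2-D latitude/longitude.
variables = {
    "temperature": {"dimensions": ("j", "i"), "attributes": {"units": "degC", "coordinates": "lon lat"}},
    "lat": {"dimensions": ("j", "i"), "attributes": {"units": "degrees_north"}},
    "lon": {"dimensions": ("j", "i"), "attributes": {"units": "degrees_east"}},
}
print(auxiliary_coordinates(variables, "temperature"))
```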
Comparability of quantities
- The ability to recognize comparable quantities is fundamental to model intercomparison.
- CF defines a schema for standard name tables.
  - An XML representation is used for the table of standard variable names and descriptions.
  - The standard_name attribute is optional; there is no restriction on variable names.
- Relationship to ontology development?
```xml
<standard_name_table>
  <institution>Program for Climate Model Diagnosis and Intercomparison</institution>
  <contact>support@pcmdi.llnl.gov</contact>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Pressure defined at the level of the mean topography within the grid box.</description>
  </entry>
  <alias id="mean_sea_level_pressure">
    <entry_id>air_pressure_at_sea_level</entry_id>
  </alias>
</standard_name_table>
```
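A table in this form is straightforward to load with Python's standard xml.etree.ElementTree module. The sketch below uses the excerpt above plus one added entry for air_pressure_at_sea_level (an assumption, included only so that the alias resolves), and treats two standard names as comparable when they resolve to the same table entry. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The excerpt above, plus an added air_pressure_at_sea_level entry (assumption).
TABLE_XML = """<standard_name_table>
  <institution>Program for Climate Model Diagnosis and Intercomparison</institution>
  <contact>support@pcmdi.llnl.gov</contact>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Pressure defined at the level of the mean topography within the grid box.</description>
  </entry>
  <entry id="air_pressure_at_sea_level">
    <canonical_units>Pa</canonical_units>
  </entry>
  <alias id="mean_sea_level_pressure">
    <entry_id>air_pressure_at_sea_level</entry_id>
  </alias>
</standard_name_table>"""


def load_table(xml_text):
    """Return (entries, aliases) dictionaries from a standard name table."""
    root = ET.fromstring(xml_text)
    entries = {
        e.get("id"): {
            "canonical_units": e.findtext("canonical_units"),
            "description": e.findtext("description"),
        }
        for e in root.findall("entry")
    }
    aliases = {a.get("id"): a.findtext("entry_id") for a in root.findall("alias")}
    return entries, aliases


def canonical_name(name, entries, aliases):
    """Follow an alias to its entry id; entry ids map to themselves."""
    if name in entries:
        return name
    if name in aliases:
        return aliases[name]
    raise KeyError(f"not in standard name table: {name}")


def comparable(name_a, name_b, entries, aliases):
    """Comparable if both standard names resolve to the same table entry."""
    return canonical_name(name_a, entries, aliases) == canonical_name(name_b, entries, aliases)


entries, aliases = load_table(TABLE_XML)
print(comparable("mean_sea_level_pressure", "air_pressure_at_sea_level", entries, aliases))  # True
```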
ESG metadata
- ESG has adopted the netCDF data model and the CF convention as standards.
- Other standards and conventions will follow.
  - NcML (netCDF Markup Language)
Aggregation
- CF and NcML apply to data aggregates as well as to files.
- Data aggregation: collections of files/datasets are treated as single entities.
  - array model
  - netCDF-like
  - tailored for extraction of 'hyperslabs' of data
- Aspects of aggregation:
  - combining/merging variables
  - joining variables
  - creating new coordinate axes
  - overlaying/adding metadata
  - nesting datasets
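Two of the aspects listed above, combining/merging variables and overlaying/adding metadata, can be sketched over an assumed in-memory dataset layout. The function name aggregate and the dictionary structure are hypothetical; this is not how NcML or cdscan implement aggregation.

```python
def aggregate(*datasets, extra_attributes=None):
    """Merge variables from several datasets into one logical dataset.

    Each dataset is assumed to look like
    {"attributes": {...}, "variables": {name: {"dimensions": (...), ...}}}.
    Variables are combined by name (a shared name must have identical
    dimensions); global attributes are overlaid left to right, and
    extra_attributes are added last. Variable dicts are shared, not copied.
    """
    merged = {"attributes": {}, "variables": {}}
    for ds in datasets:
        merged["attributes"].update(ds.get("attributes", {}))
        for name, var in ds["variables"].items():
            existing = merged["variables"].get(name)
            if existing is None:
                merged["variables"][name] = var
            elif existing["dimensions"] != var["dimensions"]:
                raise ValueError(f"conflicting dimensions for variable {name!r}")
    merged["attributes"].update(extra_attributes or {})
    return merged


atmosphere = {
    "attributes": {"source": "atmosphere model", "institution": "example lab"},
    "variables": {"time": {"dimensions": ("time",)}, "tas": {"dimensions": ("time", "lat", "lon")}},
}
ocean = {
    "attributes": {"source": "ocean model"},
    "variables": {"time": {"dimensions": ("time",)}, "sst": {"dimensions": ("time", "j", "i")}},
}
combined = aggregate(atmosphere, ocean, extra_attributes={"title": "hypothetical coupled run"})
print(sorted(combined["variables"]), combined["attributes"])
```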
Aggregation
- Aggregation maps well to multifile datasets: a multifile dataset can be thought of as 'partitioned' into files, and variables may 'span' multiple files.
- Usually a dataset is partitioned on the time and/or vertical level axes.
- PCMDI's CDAT supports aggregation via the cdscan utility, which uses an XML representation.
- THREDDS/DODS aggregation server (http://www.unidata.ucar.edu/projects/THREDDS/)
[Figure: a multifile dataset partitioned along the Variable, Time, and Level axes.]
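When a variable is partitioned on time, an aggregation has to map a global time index to a (file, local index) pair and read only the files that overlap a requested hyperslab. A minimal sketch, assuming each partition is given as a file name plus its list of time steps (in-memory stand-ins for data that would be read from hypothetical files such as tas_1979.nc):

```python
from bisect import bisect_right
from itertools import accumulate


class TimePartitionedVariable:
    """A variable whose time axis is split across several files.

    partitions is a list of (file_name, time_steps) in time order; the
    time_steps lists stand in for data that would be read from each file.
    """

    def __init__(self, partitions):
        self.partitions = partitions
        # offsets[p] is the global index of the first time step in partition p
        self.offsets = [0] + list(accumulate(len(steps) for _, steps in partitions))

    def __len__(self):
        return self.offsets[-1]

    def locate(self, t):
        """Map a global time index to (file_name, local_index)."""
        if not 0 <= t < len(self):
            raise IndexError(t)
        p = bisect_right(self.offsets, t) - 1
        return self.partitions[p][0], t - self.offsets[p]

    def hyperslab(self, start, stop):
        """Return time steps start..stop-1, touching only overlapping files."""
        out = []
        for p, (_, steps) in enumerate(self.partitions):
            lo, hi = self.offsets[p], self.offsets[p + 1]
            if hi <= start or lo >= stop:
                continue  # this file lies entirely outside the request
            out.extend(steps[max(start, lo) - lo : min(stop, hi) - lo])
        return out


tas = TimePartitionedVariable([
    ("tas_1979.nc", list(range(0, 12))),
    ("tas_1980.nc", list(range(12, 24))),
    ("tas_1981.nc", list(range(24, 30))),
])
print(len(tas), tas.locate(12), tas.hyperslab(10, 14))
# prints 30 ('tas_1980.nc', 0) [10, 11, 12, 13]
```

Partitioning on the vertical level axis would follow the same pattern with a different axis.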
Summary
- The Earth System Grid project is developing metadata services to support a variety of schemas and conventions.
- The initial focus of ESG is to enable climate researchers to make effective use of distributed, model-generated datasets.
- The netCDF schema and CF convention are the foundation for representation of this data.