Metadata Standards for Gridded Climate Data in the Earth System Grid

1
Metadata Standards for Gridded Climate Data in
the Earth System Grid
UCRL-PRES-149779
  • Robert Drach
  • LLNL/PCMDI

2
Overview
  • I. Earth System Grid: Grid Access to Climate
    Research Data
  • II. Metadata Standards for Gridded Climate Data

3
Part I
  • ESG: Grid Access to Climate Research Data

4
Earth System Grid Overview
  • The goal of ESG is to make climate data,
    particularly climate model data, an easily
    accessible community resource. The project is
    funded by the SciDAC (Scientific Discovery
    through Advanced Computing) program.
  • Enabling researchers to understand and make
    effective use of very large, distributed climate
    datasets is critical. The broad strategy is to
    develop a collection of server-side capabilities
    that minimize the amount of data movement.
  • Multiple interfaces to ESG will allow researchers
    to focus on science rather than issues of data
    transfer, format, and data set manipulation.
  • Foundation is Globus Grid technology

5
ESG uses Globus Grid technology.
  • Globus middleware supports linkage of distributed
    data archives, supercomputers, workstations, and
    local disk caches into data/computational grids.
  • GridFTP: high-performance, secure, robust data
    transfer mechanism (protocol, server, client
    library).
  • ESG is integrating OpenDAP (DODS protocol) with
    the GridFTP protocol.
  • Single sign-on using the Grid Security Infrastructure
    • Proxy certificates
    • Community Authorization Service (CAS)
  • Replica Location Service manages copying and
    placement of files in a distributed environment.
    • Logical vs. physical files
  • http://www.globus.org
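The logical-versus-physical distinction behind the Replica Location Service can be illustrated with a toy catalog. This is a minimal Python sketch, not the Globus RLS API: the class and method names, the logical file name, the hostnames, and the gsiftp:// URLs are all hypothetical, and replica selection is reduced to a simple preferred-host rule.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ReplicaCatalog:
    """Toy map from logical file names to physical replicas (hypothetical; not the RLS API)."""
    replicas: dict = field(default_factory=dict)

    def register(self, logical, physical_url):
        # One logical file may have several physical copies at different sites.
        self.replicas.setdefault(logical, []).append(physical_url)

    def lookup(self, logical):
        return list(self.replicas.get(logical, []))

    def choose(self, logical, preferred_host):
        """Return a replica on preferred_host if one exists, else the first registered copy."""
        copies = self.lookup(logical)
        if not copies:
            raise KeyError(logical)
        for url in copies:
            if urlparse(url).hostname == preferred_host:
                return url
        return copies[0]


catalog = ReplicaCatalog()
catalog.register("example/tas_1990.nc", "gsiftp://storage.site-a.example.org/archive/tas_1990.nc")
catalog.register("example/tas_1990.nc", "gsiftp://cache.site-b.example.org/tmp/tas_1990.nc")
print(catalog.choose("example/tas_1990.nc", "cache.site-b.example.org"))
```

Clients name data only by the logical name; which physical copy is read becomes a policy decision the catalog can change without touching client code.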

6
ESG U.S. Collaborations Development
  • ANL: computational grids, grid-based applications
  • LBNL: climate storage facility and access
  • LLNL: model diagnostics and intercomparison
  • USC/ISI: computational grids, grid-based applications
  • ORNL: climate storage and computational resources
  • NCAR: climate change prediction and scenarios

7
Program for Climate Model Diagnosis and
Intercomparison
  • Validation and intercomparison of atmospheric
    general circulation models and coupled
    ocean-atmosphere models
  • Development of analysis software, quality
    control, archiving, and distribution of model
    results. Climate Data Analysis Tools (CDAT) is a
    Python-based analysis and visualization system.
  • Global warming detection studies
  • CMIP (coupled models) and AMIP (atmospheric GCMs)
    gather model simulation results from thirty
    modeling groups worldwide.

8
PCMDI and Model Development
[Diagram: data flow among modeling groups, observations, and PCMDI (diagnosis, quality control, data archival). Labeled flows: simulation data, gridded observation data, feedback to modelers, controlled simulation runs, data assimilation.]

9
ESG-II Architecture
[Diagram: Portals, Middleware, Servers]

10
ESG Metadata Services

11
ESG leverages existing software and
projects.
  • OpenDAP / DODS: Distributed Oceanographic Data
    System (Unidata)
    • Integration of Globus GridFTP with DODS data access
  • THREDDS: THematic Real-time Environmental
    Distributed Data Services (Unidata)
  • LAS: Live Access Server (NOAA Pacific Marine
    Environmental Laboratory)
    • Works with CDAT, Ferret, and GrADS
  • CDAT: Climate Data Analysis Tools (PCMDI);
    includes CDMS (Climate Data Management System)
    and VCDAT visualization
  • Community Data Portal project (NCAR)
  • NCL (NCAR)
  • Globus Grid technology (ANL, ISI): GridFTP, CAS
    (Community Authorization Service)

12
CDAT: Example of ESG Access from a GUI Client

13
LAS/CDAT: Example of a Web-based Data Portal
  • Technology: web-based (end-user requirements)
    • LAS, DODS, ESG (i.e., Globus), CDAT
  • Portal should hide/simplify the Grid for users
    • Single sign-on
    • Community-based authorization
    • Simplified resource location
    • Remote job submission and management
  • Accesses the ESG Grid Testbed

14
Part II
  • Metadata Standards for Gridded Climate Data

15
Climate Model Datasets
  • Most climate simulation data are in the form of
    gridded datasets: collections of variables as
    functions of longitude, latitude, time, and
    vertical level.
  • A dataset is a logical container:
    • A file
    • An aggregation of files
    • A collection of database tables
  • Model-generated data
    • Model data
    • Derived data: zonal averages, global averages,
      virtual variables
  • Observational data, including reanalyses
  • Attributes in the form of (name, value) pairs,
    where values may be arrays
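The dataset notion on this slide (a logical container of variables with (name, value) attributes, where derived data sit alongside stored data) can be sketched in plain Python. The class names, the `tas` variable, and the `derive` hook for virtual variables are hypothetical illustrations, not CDMS or netCDF APIs; nested lists stand in for real arrays, and the CF-style `cell_methods` attribute records how the derived values were formed.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Variable:
    name: str
    dims: tuple                                      # e.g. ("latitude", "longitude")
    attributes: dict = field(default_factory=dict)   # (name, value) pairs
    data: Optional[list] = None                      # stored array values (nested lists here)
    derive: Optional[Callable] = None                # set only for virtual variables

    def read(self, dataset):
        # Stored variables return their data; virtual ones are computed on demand.
        return self.data if self.derive is None else self.derive(dataset)


@dataclass
class Dataset:
    """Logical container: callers never see whether it is backed by one file,
    an aggregation of files, or database tables."""
    variables: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)


def zonal_mean_tas(dataset):
    """Derived data: average 'tas' over longitude for each latitude row."""
    tas = dataset.variables["tas"].read(dataset)
    return [sum(row) / len(row) for row in tas]


ds = Dataset(attributes={"title": "hypothetical example"})
ds.variables["tas"] = Variable(
    "tas", ("latitude", "longitude"), {"units": "K"},
    data=[[270.0, 272.0], [300.0, 302.0]])
ds.variables["tas_zonal"] = Variable(
    "tas_zonal", ("latitude",), {"units": "K", "cell_methods": "longitude: mean"},
    derive=zonal_mean_tas)
print(ds.variables["tas_zonal"].read(ds))  # [271.0, 301.0]
```

Because both kinds of variable answer the same `read` call, a client does not need to know which values were archived and which were computed.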

16
Binary formats
  • Suitable basis for storing data, but lack the
    metadata to support certain application
    requirements
  • netCDF (UCAR)
    • array data model
    • flexible attribute/value metadata model
    • simple API
  • HDF (NCSA, NASA)
    • collection of APIs; can be tailored to specific
      data models, including scientific data sets,
      satellite data, and point data

17
Binary formats
  • GRIB (WMO, ECMWF, NCEP)
    • mixed sequential/array data model
    • tailored for simulation output; supports common
      horizontal grid types
    • hardwired metadata model
    • good compression capabilities
    • lacks a standard API
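Each of these formats announces itself in its first few bytes, which is a quick way for a tool to decide which reader to use. A minimal Python sketch; the function names are hypothetical, it checks only offset 0 (an HDF5 file may carry a user block before its signature, and GRIB files sometimes have a header before the first "GRIB" message), and the netCDF-4 note refers to a later netCDF version built on HDF5.

```python
# Leading signature bytes of each format.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5 (netCDF-4 files also use this)"),
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"CDF\x01", "netCDF classic"),
    (b"CDF\x02", "netCDF 64-bit offset"),
    (b"GRIB", "GRIB"),
]


def sniff(header: bytes) -> str:
    """Identify a file format from its first bytes (offset 0 only)."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            if magic == b"GRIB" and len(header) >= 8:
                # Octet 8 of a GRIB message holds the edition number.
                return f"GRIB edition {header[7]}"
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    with open(path, "rb") as fh:
        return sniff(fh.read(8))


print(sniff(b"CDF\x01\x00\x00\x00\x00"))  # netCDF classic
print(sniff(b"GRIB\x00\x00\x00\x01"))     # GRIB edition 1
```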

18
Coordinate Systems
  • Self-describing binary formats are flexible, but
    underconstrain the representation of coordinate
    systems.

[Diagram: Variable space: Temperature(Time, Latitude, Longitude), corresponding to Temperature(i,j,k) in index space. Coordinate space: coordinate system Time(i), Latitude(j,k), Longitude(j,k).]

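The diagram's point is that a variable lives in index space, and only the coordinate arrays give each index tuple a position in coordinate space; nothing in a generic self-describing format says which arrays play that role. A minimal Python sketch of that mapping, with a hypothetical `coordinate_system` helper and made-up sample values arranged as on the slide (Time(i), Latitude(j,k), Longitude(j,k)):

```python
def coordinate_system(coords):
    """Build a map from index space to coordinate space.

    coords: list of (name, array, positions), where `positions` lists which
    index positions the (possibly multidimensional) coordinate array uses.
    """
    def locate(index):
        result = {}
        for name, array, positions in coords:
            value = array
            for p in positions:
                value = value[index[p]]
            result[name] = value
        return result
    return locate


# Temperature(i,j,k) with Time(i), Latitude(j,k), Longitude(j,k).
time = [15.5, 45.0]                           # e.g. days since some reference date
latitude = [[-10.0, -9.0], [10.0, 11.0]]      # Latitude(j,k)
longitude = [[100.0, 120.0], [101.0, 121.0]]  # Longitude(j,k)

locate = coordinate_system([
    ("time", time, (0,)),
    ("latitude", latitude, (1, 2)),
    ("longitude", longitude, (1, 2)),
])
print(locate((1, 0, 1)))  # {'time': 45.0, 'latitude': -9.0, 'longitude': 120.0}
```

A rectilinear grid is the special case where latitude depends only on position 1 and longitude only on position 2; conventions such as COARDS and CF exist to record these dependencies in the file.
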
19
Horizontal Grids
  • Curvilinear grid - Los Alamos POP ocean model

    Temperature(i,j)   Latitude(i,j)   Longitude(i,j)
    Lat_bounds(i,j,4)  Lon_bounds(i,j,4)

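Because Latitude(i,j) and Longitude(i,j) are both two-dimensional, a curvilinear grid cannot be searched one axis at a time; finding the cell nearest a given point means scanning both index dimensions. A brute-force Python sketch, assuming a spherical Earth; the helper names and the small sheared grid are hypothetical, and a production tool would use a spatial index rather than a full scan.

```python
import math


def central_angle(lat1, lon1, lat2, lon2):
    """Great-circle separation in degrees (spherical law of cosines)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding error


def nearest_cell(lat2d, lon2d, lat, lon):
    """Return the (i, j) whose centre Latitude(i,j), Longitude(i,j) is closest."""
    best = None
    for i, (lat_row, lon_row) in enumerate(zip(lat2d, lon2d)):
        for j, (la, lo) in enumerate(zip(lat_row, lon_row)):
            d = central_angle(la, lo, lat, lon)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


# A small sheared grid: neither latitude nor longitude is constant along an index.
lat2d = [[10.0 * i + 2.0 * j for j in range(3)] for i in range(3)]
lon2d = [[10.0 * j - 2.0 * i for j in range(3)] for i in range(3)]
print(nearest_cell(lat2d, lon2d, 12.5, 7.0))  # (1, 1)
```
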
20
Horizontal Grids
  • Reduced grid

    Temperature(i,j)   Latitude(i)   Longitude(i,j)
    Lat_bounds(i,2)    Lon_bounds(i,j,4)

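On a reduced grid the number of longitudes shrinks toward the poles, so Longitude(i,j) is ragged: row i has only as many valid j entries as that latitude carries. The Python sketch below builds such rows and pads them to a rectangular Longitude(i,j) array with a fill value, which is one way a file could store them. The cos(latitude) rule for points per row is an illustrative assumption (operational reduced Gaussian grids use tabulated counts), and the helper names and fill value are hypothetical.

```python
import math


def reduced_rows(latitudes, nlon_equator, min_points=4):
    """Longitudes for each latitude row; row length scales roughly with cos(lat)."""
    rows = []
    for lat in latitudes:
        n = max(min_points, round(nlon_equator * math.cos(math.radians(lat))))
        rows.append([360.0 * j / n for j in range(n)])
    return rows


def pad_rows(rows, fill=-999.0):
    """Store ragged rows as a rectangular Longitude(i,j) array with a fill value."""
    width = max(len(r) for r in rows)
    return [r + [fill] * (width - len(r)) for r in rows]


latitudes = [-60.0, -30.0, 0.0, 30.0, 60.0]   # Latitude(i)
rows = reduced_rows(latitudes, nlon_equator=16)
print([len(r) for r in rows])                  # [8, 14, 16, 14, 8]
longitude = pad_rows(rows)                     # Longitude(i,j), 5 x 16
print(sum(len(r) for r in rows), "valid points of", len(longitude) * len(longitude[0]))
```

Here 60 of the 80 stored slots hold real grid points; the rest must be recognized as fill by any reader.
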
21
Horizontal Grids
  • General grid: Colorado State geodesic grid

    Temperature(npts)   Latitude(npts)   Longitude(npts)
    Lat_bounds(npts,6)  Lon_bounds(npts,6)

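On a general grid every variable is indexed by a single cell dimension, npts, and the bounds arrays list each cell's vertices. One common use of those bounds is computing cell areas for area-weighted global averages. The Python sketch below uses a local flat-Earth shoelace approximation (longitude scaled by the cosine of the cell's mean latitude); that is an assumption adequate only for small cells away from the poles, and it ignores cells that straddle the date line. The function names and the two hexagonal sample cells are hypothetical.

```python
import math


def cell_area_deg2(lat_bounds, lon_bounds):
    """Approximate area (square degrees) of one polygonal cell via the shoelace formula.

    Longitudes are scaled by cos(mean latitude); curvature is ignored.
    """
    lat_c = sum(lat_bounds) / len(lat_bounds)
    scale = math.cos(math.radians(lat_c))
    xs = [lon * scale for lon in lon_bounds]
    ys = list(lat_bounds)
    n = len(xs)
    twice_area = sum(xs[k] * ys[(k + 1) % n] - xs[(k + 1) % n] * ys[k] for k in range(n))
    return abs(twice_area) / 2.0


def area_weighted_mean(values, areas):
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


def hexagon(lat0, lon0, r=1.0):
    """Hypothetical hexagonal cell: six vertices at radius r (degrees) around a centre."""
    angles = [math.radians(60 * k) for k in range(6)]
    return ([lat0 + r * math.sin(a) for a in angles],
            [lon0 + r * math.cos(a) for a in angles])


# Temperature(npts) with Lat_bounds(npts,6) and Lon_bounds(npts,6), npts = 2.
cells = [hexagon(0.0, 0.0), hexagon(0.0, 10.0, r=2.0)]
areas = [cell_area_deg2(lat_b, lon_b) for lat_b, lon_b in cells]
temperature = [300.0, 290.0]
print(round(areas[0], 3))                                # 2.598 (= 3*sqrt(3)/2)
print(round(area_weighted_mean(temperature, areas), 2))  # 292.0
```

The second cell has four times the area of the first, so its value dominates the mean; an unweighted average would give 295.0 instead.
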
22
Spatial/temporal location
  • Applications must be able to recognize the
    spatial/temporal coordinate axes:
    • Visualization: continental overlays
    • Data selection by axis type

    import cdms
    f = cdms.open('sample.nc')
    temperature = f['temperature']
    # Select by axis type: latitudes from 45S to 45N
    data = temperature(latitude=(-45.0, 45.0))
    f.close()

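CDMS performs this selection natively. To show the idea without CDMS, here is a minimal pure-Python sketch that subsets a nested-list variable by axis type rather than by dimension name or position, so the same call works whether the latitude dimension is named `lat`, `y`, or `latitude`. The `Axis` class, the `kind` labels, and `select` are hypothetical, and the sketch assumes each axis is monotonically increasing so that a coordinate range maps to one contiguous index range.

```python
from dataclasses import dataclass


@dataclass
class Axis:
    name: str      # dimension name as stored in the file, e.g. "lat"
    kind: str      # recognized type: "time", "level", "latitude", "longitude"
    values: list   # coordinate values, assumed monotonically increasing


def index_range(axis, lo, hi):
    """Half-open index range [start, stop) of values within [lo, hi]."""
    hits = [k for k, v in enumerate(axis.values) if lo <= v <= hi]
    return (hits[0], hits[-1] + 1) if hits else (0, 0)


def select(data, axes, **ranges):
    """Subset nested lists by axis kind, e.g. select(data, axes, latitude=(-45, 45))."""
    bounds = [index_range(ax, *ranges[ax.kind]) if ax.kind in ranges
              else (0, len(ax.values)) for ax in axes]

    def cut(block, depth):
        start, stop = bounds[depth]
        part = block[start:stop]
        if depth + 1 == len(bounds):
            return part
        return [cut(sub, depth + 1) for sub in part]

    return cut(data, 0)


axes = [Axis("y", "latitude", [-90.0, -45.0, 0.0, 45.0, 90.0]),
        Axis("x", "longitude", [0.0, 90.0, 180.0, 270.0])]
temperature = [[10 * j + k for k in range(4)] for j in range(5)]
print(select(temperature, axes, latitude=(-45.0, 45.0)))
# [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
```
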
23
Time representation and calendars
  • Climate simulations use different types of
    calendars:
    • proleptic Gregorian
    • Julian
    • mixed Gregorian/Julian
    • no leap years (noleap)
    • 30-day months
  • Climatologies represent multi-year averages.
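The same day count lands on different dates under different calendars, which is why an application must know a dataset's calendar before decoding its time axis. A minimal Python sketch for three of the listed calendars, using the CF calendar names `proleptic_gregorian`, `noleap`, and `360_day`; the function name and the fixed January 1 reference date are simplifying assumptions, and the Julian and mixed Gregorian/Julian calendars are left out.

```python
import datetime

NOLEAP_MONTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def to_date(days, calendar, start_year=2000):
    """Convert whole days since start_year-01-01 into (year, month, day)."""
    if calendar == "proleptic_gregorian":
        # Python's datetime.date already uses the proleptic Gregorian calendar.
        d = datetime.date(start_year, 1, 1) + datetime.timedelta(days=days)
        return d.year, d.month, d.day
    if calendar == "noleap":
        year, doy = start_year + days // 365, days % 365
        month = 1
        for length in NOLEAP_MONTHS:
            if doy < length:
                break
            doy -= length
            month += 1
        return year, month, doy + 1
    if calendar == "360_day":
        year, rem = start_year + days // 360, days % 360
        return year, rem // 30 + 1, rem % 30 + 1
    raise ValueError(f"calendar not handled in this sketch: {calendar}")


for cal in ("proleptic_gregorian", "noleap", "360_day"):
    print(cal, to_date(59, cal))
# proleptic_gregorian (2000, 2, 29)
# noleap (2000, 3, 1)
# 360_day (2000, 2, 30)
```

Day 59 falls on three different dates, including February 30, which exists only in the 360-day calendar.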

24
Metadata conventions
  • Several conventions have been developed to
    augment the netCDF data model.
  • They represent a balance between the needs of data
    producers and data consumers.
  • COARDS convention
    • 1D coordinate axes, rectilinear horizontal
      grids
    • axis identification based on units
    • variables limited to four dimensions
    • ordering of dimensions fixed
    • http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
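COARDS-style axis identification by units, and its fixed dimension order, can be checked mechanically. A minimal Python sketch; the accepted unit spellings are a partial list loosely following COARDS and its CF successor, treating pressure units or a `positive` attribute as marking the vertical axis is a simplification, and the function names are hypothetical.

```python
# Unit strings for latitude and longitude axes, loosely following COARDS/CF
# (compared case-insensitively).
LAT_UNITS = {"degrees_north", "degree_north", "degree_n", "degrees_n", "degreen", "degreesn"}
LON_UNITS = {"degrees_east", "degree_east", "degree_e", "degrees_e", "degreee", "degreese"}
# A small, incomplete set of pressure units for vertical axes.
PRESSURE_UNITS = {"pa", "hpa", "mbar", "millibar", "bar"}


def coards_axis(attrs):
    """Guess the axis (X, Y, Z, T) of a 1D coordinate variable from its units."""
    units = attrs.get("units", "").strip().lower()
    if units in LAT_UNITS:
        return "Y"
    if units in LON_UNITS:
        return "X"
    if " since " in units:          # e.g. "days since 1979-01-01"
        return "T"
    if units in PRESSURE_UNITS or "positive" in attrs:
        return "Z"
    return None


def coards_order_ok(axes):
    """True if at most four dimensions appear in the relative order T, Z, Y, X."""
    if len(axes) > 4 or any(a not in ("T", "Z", "Y", "X") for a in axes):
        return False
    positions = ["TZYX".index(a) for a in axes]
    return positions == sorted(positions)


dims = [{"units": "days since 1979-01-01"}, {"units": "mbar"},
        {"units": "degrees_north"}, {"units": "degrees_east"}]
axes = [coards_axis(a) for a in dims]
print(axes, coards_order_ok(axes))       # ['T', 'Z', 'Y', 'X'] True
print(coards_order_ok(["Y", "T", "X"]))  # False
```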

25
Metadata conventions
  • CF (Climate and Forecast) convention
    • based on earlier conventions, COARDS and GDT
    • multidimensional coordinates (auxiliary
      coordinate variables)
    • simplified axis identification
    • specific representation for several horizontal
      grid types:
      • rectilinear
      • curvilinear
      • reduced grids
    • variables can have an arbitrary number of
      dimensions
    • no constraint on ordering of dimensions
    • non-Gregorian calendars
    • standard name table
    • http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
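Auxiliary coordinate variables are what let CF describe the curvilinear grids shown earlier: a data variable names its two-dimensional latitude and longitude arrays in a blank-separated `coordinates` attribute. A minimal Python sketch that gathers a variable's coordinates under that rule plus the usual one-dimensional coordinate variables; the dictionary layout and function name are hypothetical, and the variable names are modeled loosely on POP output for illustration only.

```python
def coordinates_of(var_name, variables):
    """Collect a data variable's coordinates under CF-style rules.

    variables maps name -> {"dims": tuple, "attrs": dict}.
    Coordinate variables: 1D variables named after one of the variable's dims.
    Auxiliary coordinates: names listed in the blank-separated "coordinates" attribute.
    """
    var = variables[var_name]
    found = {}
    for dim in var["dims"]:
        cv = variables.get(dim)
        if cv is not None and cv["dims"] == (dim,):
            found[dim] = cv["dims"]
    for aux in var["attrs"].get("coordinates", "").split():
        if aux in variables:
            found[aux] = variables[aux]["dims"]
    return found


# A POP-like curvilinear layout (names are illustrative).
variables = {
    "time":  {"dims": ("time",), "attrs": {"units": "days since 0001-01-01"}},
    "TLAT":  {"dims": ("nlat", "nlon"), "attrs": {"units": "degrees_north"}},
    "TLONG": {"dims": ("nlat", "nlon"), "attrs": {"units": "degrees_east"}},
    "TEMP":  {"dims": ("time", "nlat", "nlon"),
              "attrs": {"units": "degC", "coordinates": "TLONG TLAT"}},
}
print(coordinates_of("TEMP", variables))
# {'time': ('time',), 'TLONG': ('nlat', 'nlon'), 'TLAT': ('nlat', 'nlon')}
```

The index dimensions `nlat` and `nlon` have no coordinate variables of their own; position on the globe comes entirely from the auxiliary arrays.
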

26
Comparability of quantities
  • Ability to recognize comparable quantities is
    fundamental to model intercomparison.
  • CF defines a schema for standard name tables
  • XML representation used for table of standard
    variable names and descriptions
  • standard_name attribute is optional. No
    restriction on variable names.
  • Relationship to ontology development?

    <standard_name_table>
      <institution>Program for Climate Model Diagnosis and Intercomparison</institution>
      <contact>support@pcmdi.llnl.gov</contact>
      <entry id="surface_air_pressure">
        <canonical_units>Pa</canonical_units>
        <description>Pressure defined at the level of the mean topography within the grid box.</description>
      </entry>
      <alias id="mean_sea_level_pressure">
        <entry_id>air_pressure_at_sea_level</entry_id>
      </alias>
    </standard_name_table>

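A table in this form can be loaded with Python's standard ElementTree parser and used to check a variable's `standard_name` and units. A minimal sketch, assuming the element names of the table above; the function names are hypothetical, and units are compared as plain strings, whereas a real checker would convert between compatible units (hPa vs. Pa) with a units library such as UDUNITS.

```python
import xml.etree.ElementTree as ET

TABLE_XML = """<standard_name_table>
  <institution>Program for Climate Model Diagnosis and Intercomparison</institution>
  <entry id="surface_air_pressure">
    <canonical_units>Pa</canonical_units>
    <description>Pressure defined at the level of the mean topography within the grid box.</description>
  </entry>
  <alias id="mean_sea_level_pressure">
    <entry_id>air_pressure_at_sea_level</entry_id>
  </alias>
</standard_name_table>"""


def load_table(xml_text):
    """Return ({standard name: canonical units}, {alias: standard name})."""
    root = ET.fromstring(xml_text)
    units = {e.get("id"): e.findtext("canonical_units") for e in root.findall("entry")}
    aliases = {a.get("id"): a.findtext("entry_id") for a in root.findall("alias")}
    return units, aliases


def check_variable(attrs, units, aliases):
    """Classify a variable's standard_name against the table."""
    name = attrs.get("standard_name")
    if name is None:
        return "no standard_name"          # the attribute is optional in CF
    name = aliases.get(name, name)         # map an alias to its current entry
    if name not in units:
        return f"not in table: {name}"
    return "ok" if attrs.get("units") == units[name] else f"units differ from {units[name]}"


units, aliases = load_table(TABLE_XML)
print(check_variable({"standard_name": "surface_air_pressure", "units": "Pa"}, units, aliases))
print(check_variable({"standard_name": "mean_sea_level_pressure", "units": "Pa"}, units, aliases))
```

The second call resolves the alias to `air_pressure_at_sea_level`, which this one-entry excerpt does not define, so it reports the name as missing from the table.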
27
ESG metadata
  • ESG has adopted the netCDF data model and the CF
    convention as standards
  • Other standards and conventions will follow.
  • NcML (netCDF Markup Language)

28
Aggregation
  • CF and NcML apply to data aggregates as well as
    files.
  • Data aggregation: collections of files/datasets
    are treated as single entities.
    • array model
    • netCDF-like
    • tailored for extraction of 'hyperslabs' of data
  • Aspects of aggregation
    • combining/merging variables
    • joining variables
    • creating new coordinate axes
    • overlaying/adding metadata
    • nesting datasets
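As a concrete instance of joining variables and overlaying metadata, the Python sketch below writes an NcML document that joins files along an existing time dimension and adds a global attribute. It uses the element and attribute names of NcML 2.2 (`aggregation`, `type="joinExisting"`, `dimName`, `location`), a schema version that may postdate this talk; the file names, title, and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)   # emit NcML as the default namespace


def q(tag):
    """Qualify a tag name with the NcML namespace."""
    return f"{{{NCML_NS}}}{tag}"


def join_existing(locations, dim="time", title=None):
    """NcML that treats several files as one dataset joined along `dim`."""
    root = ET.Element(q("netcdf"))
    if title is not None:
        # Overlaying metadata: a global attribute on the aggregated dataset.
        ET.SubElement(root, q("attribute"), name="title", value=title)
    agg = ET.SubElement(root, q("aggregation"), dimName=dim, type="joinExisting")
    for loc in locations:
        ET.SubElement(agg, q("netcdf"), location=loc)
    return ET.tostring(root, encoding="unicode")


print(join_existing(["tas_1979.nc", "tas_1980.nc"], title="Monthly tas, 1979-1980"))
```

NcML also defines `union` (combining variables from several files) and `joinNew` (creating a new coordinate axis) aggregation types, which correspond to other items in the list above.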

29
Aggregation
  • Aggregation maps well to multifile datasets: a
    multifile dataset can be thought of as
    'partitioned' into files. Variables may 'span'
    multiple files.
  • Usually a dataset is partitioned on the time and/or
    vertical level axes.
  • PCMDI CDAT supports aggregation via the cdscan
    utility, which uses an XML representation.
  • THREDDS/DODS aggregation server
    (http://www.unidata.ucar.edu/projects/THREDDS/)

[Diagram: a Variable partitioned into files along the Time and Level axes]

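When a dataset is partitioned on time, reading a hyperslab that spans files means translating a global time range into per-file reads. A minimal Python sketch of that bookkeeping; the class name and the `tas_*.nc` file names are hypothetical, it assumes the files are listed in time order with known step counts, and it is not how cdscan or the THREDDS aggregation server are implemented.

```python
import bisect


class TimePartition:
    """Maps a global time index onto files that partition the time axis."""

    def __init__(self, parts):
        # parts: [(file_name, number_of_time_steps), ...] in time order
        self.names = [name for name, _ in parts]
        self.sizes = [n for _, n in parts]
        self.starts = []
        total = 0
        for n in self.sizes:
            self.starts.append(total)
            total += n
        self.length = total

    def _file_index(self, t):
        if not 0 <= t < self.length:
            raise IndexError(t)
        return bisect.bisect_right(self.starts, t) - 1

    def locate(self, t):
        """Global index -> (file name, local index within that file)."""
        k = self._file_index(t)
        return self.names[k], t - self.starts[k]

    def hyperslab(self, start, stop):
        """Split the global range [start, stop) into (file, local_start, local_stop) reads."""
        reads, t = [], start
        while t < stop:
            k = self._file_index(t)
            end = min(stop, self.starts[k] + self.sizes[k])
            reads.append((self.names[k], t - self.starts[k], end - self.starts[k]))
            t = end
        return reads


months = TimePartition([("tas_1979.nc", 12), ("tas_1980.nc", 12), ("tas_1981.nc", 12)])
print(months.locate(13))        # ('tas_1980.nc', 1)
print(months.hyperslab(10, 26))
# [('tas_1979.nc', 10, 12), ('tas_1980.nc', 0, 12), ('tas_1981.nc', 0, 2)]
```

The caller asks for one contiguous range of the aggregated variable; the partition turns it into three file reads whose results are concatenated along time.
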
30
Summary
  • The Earth System Grid project is developing
    metadata services to support a variety of schemas
    and conventions.
  • The initial focus of ESG is to enable climate
    researchers to make effective use of distributed,
    model-generated datasets.
  • The netCDF schema and CF convention are the
    foundation for representation of this data.