ESG Publication Tools - PowerPoint PPT Presentation

Provided by: deanwi
1
ESG Publication Tools
PCMDI Software Team
ESG All Hands Meeting, Boulder, Colorado,
April 29, 2008
LLNL-PRES-403079
2
Overview
  • Publication is the process of generating metadata
    about ESG datasets, and making that information
    available to ESG services
  • Search, browse, download, server-side processing
    rely on published metadata
  • Eventually will tie into a notification service
  • Unit of work is a dataset
  • Question: is there a need to publish individual
    files directly?
  • Publication deals with files and aggregations as
    first-class objects
  • Persons responsible for publication are the data
    publishers

3
Goals
  • Publisher can read metadata in a collection of
    files, and
  • Add new metadata
  • Modify existing metadata
  • Add, update, delete dataset
  • Flexibility to add new projects
  • Static configuration where possible (minimize
    coding)
  • Logic can be encapsulated in project-specific
    handlers
  • Metadata fields of interest are defined by the
    configuration
  • Different projects may have different metadata
    items.
  • CF-1 support
  • Standard names
  • Spatio-temporal coordinates
  • Standard configuration
  • .ini style
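
The idea of project-specific handlers could be sketched roughly as follows. This is only an illustration under assumptions: the class names, the `extract` method, and the `HANDLERS` registry are hypothetical and are not taken from the ESG publisher itself.

```python
class ProjectHandler:
    """Hypothetical base class: maps raw file attributes to project metadata."""

    # Metadata fields of interest; in a real tool these would come from configuration.
    fields = ("project",)

    def __init__(self, project):
        self.project = project

    def extract(self, file_attrs):
        """Return only the fields of interest, then fill in the project name."""
        metadata = {name: file_attrs[name] for name in self.fields if name in file_attrs}
        metadata["project"] = self.project
        return metadata


class IPCCHandler(ProjectHandler):
    """Hypothetical handler for a project that also tracks model and experiment."""

    fields = ("project", "model", "experiment")


# Hypothetical registry: projects without an entry fall back to the generic handler.
HANDLERS = {"IPCC_AR4": IPCCHandler}


def get_handler(project):
    """Look up the handler class for a project and instantiate it."""
    return HANDLERS.get(project, ProjectHandler)(project)
```

Adding a project then means adding one small class and one registry entry, which keeps project logic out of the core scan code.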

4
Goals
  • GUI, but publishing is also scriptable
  • Quality control checks for
      • Duplication of data
      • Validity of coordinate metadata (e.g.,
        monotonicity of time dimension)
      • Validity of standard name
  • Generation of THREDDS catalogs to support LAS,
    harvesting
  • Generation of data aggregations
  • Ability to publish both online and offline
    (tertiary storage) datasets.
  • Offline data requires a list of paths and
    file sizes
  • Support for Dublin Core
  • Some CF fields map to DC
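
Two of the quality control checks listed on this slide (monotonicity of the time dimension and duplication of data) might look like the sketch below. The function names are hypothetical, and the checks work on plain Python sequences rather than on netCDF files.

```python
def is_strictly_monotonic(values):
    """Return True if values are strictly increasing or strictly decreasing."""
    values = list(values)
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


def find_duplicates(paths):
    """Return each path that occurs more than once, in order of first repeat."""
    seen = set()
    duplicates = []
    for path in paths:
        if path in seen and path not in duplicates:
            duplicates.append(path)
        seen.add(path)
    return duplicates
```

A repeated time value counts as a failure here, since a strict ordering is assumed; a real check might instead allow a tolerance.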

5
The Process
  • Specify
  • Project (IPCC_AR4, C-LAMP, NARCCAP, ...)
  • Dataset
  • Metadata may be read from self-describing
    dataset, or input by user
  • Options for specifying a dataset
  • Read paths from a file
  • Regular expression template for paths
  • Directory name and file filter
  • Generate dataset metadata by
  • Scanning self-describing dataset to extract
    metadata
  • Aggregate variables
  • Create/replace/update/delete
  • Publish
  • Generate THREDDS catalog. The form of the catalog
    may depend on whether
  • Dataset is aggregated,
  • Non-aggregated,
  • Offline
  • Release data for harvesting
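
The three options for specifying a dataset listed above (paths from a file, a regular expression template, a directory with a file filter) can be illustrated with standard-library calls. The function names and the default `*.nc` filter are assumptions made for this sketch.

```python
import fnmatch
import os
import re


def paths_from_listing(listing_file):
    """Read one path per line from a text file, skipping blank lines."""
    with open(listing_file) as f:
        return [line.strip() for line in f if line.strip()]


def paths_matching_template(paths, template):
    """Keep only the paths that fully match a regular expression template."""
    pattern = re.compile(template)
    return [p for p in paths if pattern.fullmatch(p)]


def paths_from_directory(directory, file_filter="*.nc"):
    """Walk a directory tree and keep files whose names match a glob filter."""
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if fnmatch.fnmatch(name, file_filter):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```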

6
Dataset publishing on an ESG node: Metadata
specification
  • Dataset pane
  • shows metadata in a file, allows modification
  • is project-specific
  • metadata is extracted from the first file in the
    list
  • Output pane
  • displays logged results
  • log level is configurable

Expansion buttons in left pane correspond to
publication steps.
  • Status bar
  • shows scan progress

7
Data scan
1. Dataset is created or updated based on input
   metadata. Required fields are highlighted.
   Selecting an extraction option starts the
   dataset scan. Options are:
   - create a new dataset, or replace the dataset
     if it exists
   - append or update: the files are added to an
     existing dataset.
2. Files are scanned and internal database
   tables populated.
3. If an aggregation dimension is found or
   specified, variables are aggregated.
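
The scan steps above can be approximated with an in-memory stand-in for the database tables. Everything here is an assumption for illustration: the option names `create_or_replace` and `append_or_update`, the `{path: size}` layout, and the use of each file's first time value to order an aggregation.

```python
from collections import defaultdict


def apply_scan(datasets, name, files, option):
    """Apply one scan to a catalog mapping dataset name -> {path: size}."""
    if option == "create_or_replace":
        # A new dataset is created, or an existing one is replaced wholesale.
        datasets[name] = dict(files)
    elif option == "append_or_update":
        if name not in datasets:
            raise KeyError(f"dataset {name!r} does not exist")
        # New files are added; entries for already-known paths are refreshed.
        datasets[name].update(files)
    else:
        raise ValueError(f"unknown option {option!r}")
    return datasets[name]


def aggregate_variables(file_records):
    """Group (path, variable, first_time) records by variable, ordered in time."""
    by_variable = defaultdict(list)
    for path, variable, first_time in file_records:
        by_variable[variable].append((first_time, path))
    return {
        variable: [path for _time, path in sorted(entries)]
        for variable, entries in by_variable.items()
    }
```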
8
Data aggregation and publication
  • Publication step
  • Generate THREDDS catalog for harvesting,
    server-side configuration
  • Release data for harvesting
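
As a rough illustration of the catalog generation step, the sketch below writes a minimal THREDDS inventory catalog with `xml.etree.ElementTree`. The function name, the service name, and the `/thredds/fileServer/` base are assumptions; the catalogs produced by the actual publisher are likely richer and may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Namespace of the THREDDS InvCatalog 1.0 schema.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def build_catalog(dataset_id, files, service_base="/thredds/fileServer/"):
    """Return catalog XML with one parent dataset and one child per file."""
    ET.register_namespace("", THREDDS_NS)  # serialize without an ns0: prefix
    tag = lambda name: f"{{{THREDDS_NS}}}{name}"

    catalog = ET.Element(tag("catalog"), {"name": dataset_id})
    ET.SubElement(
        catalog,
        tag("service"),
        {"name": "fileServer", "serviceType": "HTTPServer", "base": service_base},
    )
    parent = ET.SubElement(catalog, tag("dataset"), {"name": dataset_id, "ID": dataset_id})
    for path in files:
        ET.SubElement(
            parent,
            tag("dataset"),
            {"name": path.rsplit("/", 1)[-1], "urlPath": path, "serviceName": "fileServer"},
        )
    return ET.tostring(catalog, encoding="unicode")
```

An aggregated or offline dataset would need a different layout, which is why the form of the catalog depends on the dataset type.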

9
Configuration
  • INI style
  • Named section for each application, project
  • Each section contains options for that section
  • Expands %(option)s interpolations
  • Per-project specification of models, experiments,
    standard names
  • Enumerate valid values for fields
  • Per-project handler encapsulates logic for
    reading / generating metadata
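
A configuration of this kind can be read with Python's standard `configparser` module, which expands `%(option)s` references. The section name, option names, and paths below are hypothetical; the experiment values are only sample entries.

```python
import configparser

# Hypothetical configuration text: one DEFAULT section plus one project section.
SAMPLE = """
[DEFAULT]
root = /esg/data

[project:ipcc_ar4]
data_dir = %(root)s/ipcc_ar4
experiment_options = 20c3m, sresa1b
"""


def load_config(text):
    """Parse INI text; %(option)s references are expanded when values are read."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def valid_values(parser, section, option):
    """Split a comma-separated option into the list of values it enumerates."""
    return [value.strip() for value in parser.get(section, option).split(",")]


def check_field(parser, section, option, value):
    """Return True if value is one of the enumerated valid values."""
    return value in valid_values(parser, section, option)
```

For example, `load_config(SAMPLE).get("project:ipcc_ar4", "data_dir")` resolves `%(root)s` from the `DEFAULT` section.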

10
Status
  • Publication GUI is pre-alpha
  • Implemented in Python, Tcl/Tk
  • Metadata DB is MySQL, with flexibility to use
    PostgreSQL
  • ESG-specific data (in addition to THREDDS) needs
    to be defined.
  • Method of data release depends on harvesting
    infrastructure
  • Still to do
  • Dataset management
  • Display existing datasets
  • Delete dataset(s)
  • Individual file deletion
  • Handling multiple datasets
  • Handle non-CF compliant netCDF
  • Improved handling of preferences
  • Interface to backup systems?
  • Interface to authn/authz
  • Checksums?