Title: Automated Scanning of Observational Data Sets for the Generation of Formal Metadata
1 Automated Scanning of Observational Data Sets for the Generation of Formal Metadata
- Jason Fabritz and Donald Denbo
- UW/JISAO-NOAA/PMEL
- Presented by Chris Moore
- UW/JISAO-NOAA/PMEL
2 Project Goals
Create tools to automatically generate FGDC-compliant metadata
- Improve the quality and consistency of NOAA's metadata
- Increase users' ability to find data of interest during web-based data discovery
- Provide flexibility by reading several file formats and creating multiple FGDC metadata output formats
3 Process Overview
- Ingest observational data
- Transform to standard nomenclature
- Accumulate in relational database
- Extract metadata summaries in eXtensible Markup Language (XML)
- Translate XML to standard FGDC format using eXtensible Stylesheet Language (XSL)
4 Ingesting Observational Data Sets
- Modular tool design provides the flexibility required to ingest different file formats (netCDF and NODC p3)
- Program creates a metadata summary in XML using native keywords
- Native XML summary is transformed into a standard XML summary using XSLT
- Standard XML summaries are accumulated in a MySQL database
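
The keyword translation step above can be sketched in Python. This is only an illustration: the project performs the step with XSLT, and the element names (`summary`, `variable`), the native keywords, and the standard terms in `KEYWORD_MAP` are all hypothetical, not taken from the tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from native (file-format-specific) keywords to a
# standard nomenclature. The real tool expresses this step in XSLT.
KEYWORD_MAP = {
    "temp": "sea_water_temperature",
    "sal": "sea_water_salinity",
    "lat": "latitude",
    "lon": "longitude",
}


def native_summary(source_name, variables):
    """Build a native-keyword XML summary from {keyword: (min, max)}."""
    root = ET.Element("summary", {"source": source_name})
    for keyword, (low, high) in variables.items():
        ET.SubElement(
            root, "variable",
            {"name": keyword, "min": str(low), "max": str(high)},
        )
    return root


def to_standard(native_root):
    """Return a copy of the summary with keywords renamed to standard terms."""
    standard = ET.Element("summary", dict(native_root.attrib))
    for var in native_root.findall("variable"):
        attrs = dict(var.attrib)
        # Unknown keywords pass through unchanged so nothing is silently lost.
        attrs["name"] = KEYWORD_MAP.get(attrs["name"], attrs["name"])
        ET.SubElement(standard, "variable", attrs)
    return standard


native = native_summary("cruise_a.nc", {"temp": (2.1, 28.4), "lat": (-5.0, 5.0)})
standard = to_standard(native)
standard_xml = ET.tostring(standard, encoding="unicode")
print(standard_xml)
```

Keeping the mapping in one table mirrors the modular design on this slide: a new file format would need only its own reader and its own keyword table.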
5 metaArchitect Configuration File Generation
6 Ingest/Upload Process
7 Database
- Standard XML summaries are stored in a generic hierarchical structure, for example:
  - Experiment 1
    - Cruise A
      - Section A-1
      - Section A-2
    - Cruise B
  - Experiment 2
    - Cruise A
    - Cruise B
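
One way such a generic hierarchy could be held in a relational database is a single self-referencing table. The sketch below is an assumption, not the project's schema: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `item` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every item (experiment, cruise, section) is a row; parent_id links a row
# to the item that contains it, so the depth of the hierarchy is not fixed.
conn.execute("""
    CREATE TABLE item (
        id          INTEGER PRIMARY KEY,
        parent_id   INTEGER REFERENCES item(id),
        kind        TEXT NOT NULL,
        name        TEXT NOT NULL,
        summary_xml TEXT
    )
""")


def add_item(kind, name, parent_id=None, summary_xml=None):
    """Insert one hierarchy item and return its row id."""
    cur = conn.execute(
        "INSERT INTO item (parent_id, kind, name, summary_xml) "
        "VALUES (?, ?, ?, ?)",
        (parent_id, kind, name, summary_xml),
    )
    return cur.lastrowid


exp1 = add_item("experiment", "Experiment 1")
cruise_a = add_item("cruise", "Cruise A", exp1)
add_item("section", "Section A-1", cruise_a, "<summary/>")
add_item("section", "Section A-2", cruise_a, "<summary/>")
add_item("cruise", "Cruise B", exp1)
exp2 = add_item("experiment", "Experiment 2")
add_item("cruise", "Cruise A", exp2)
add_item("cruise", "Cruise B", exp2)


def subtree(item_id):
    """Return (name, depth) for an item and everything beneath it."""
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM item WHERE id = ?
            UNION ALL
            SELECT item.id, item.name, tree.depth + 1
            FROM item JOIN tree ON item.parent_id = tree.id
        )
        SELECT name, depth FROM tree ORDER BY id
    """, (item_id,)).fetchall()


tree_exp1 = subtree(exp1)
for name, depth in tree_exp1:
    print("  " * depth + name)
```

Ordering by `id` reproduces the outline here only because parents were inserted before their children; a production query would order by an explicit path or sort key.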
8 Database cont.
- Metadata records, in XML format, can be generated for any item in the hierarchy, for example:
  - Experiment
  - Cruise
  - Section
- The XML output is then translated, using XSLT, into FGDC format
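
As a rough illustration of generating a record at any level, the sketch below builds an XML record for a hierarchy item by combining the latitude ranges of everything beneath it. The tree contents, the `Section B-1` entry, the latitude values, and the element names (`metadata`, `latitude`, `contains`) are all hypothetical; the slides do not describe the actual record layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy. Leaves carry a latitude range taken from their
# stored summary; higher levels derive theirs from their children.
TREE = {
    "kind": "experiment", "name": "Experiment 1", "children": [
        {"kind": "cruise", "name": "Cruise A", "children": [
            {"kind": "section", "name": "Section A-1",
             "lat": (-5.0, 0.0), "children": []},
            {"kind": "section", "name": "Section A-2",
             "lat": (0.0, 8.0), "children": []},
        ]},
        {"kind": "cruise", "name": "Cruise B", "children": [
            {"kind": "section", "name": "Section B-1",
             "lat": (-2.0, 2.0), "children": []},
        ]},
    ],
}


def lat_range(node):
    """Combine the latitude ranges of a node and all of its descendants."""
    ranges = [node["lat"]] if "lat" in node else []
    for child in node["children"]:
        child_range = lat_range(child)
        if child_range is not None:
            ranges.append(child_range)
    if not ranges:
        return None
    return (min(low for low, _ in ranges), max(high for _, high in ranges))


def metadata_record(node):
    """Build an XML metadata record for one hierarchy item."""
    record = ET.Element("metadata", {"level": node["kind"], "name": node["name"]})
    extent = lat_range(node)
    if extent is not None:
        ET.SubElement(record, "latitude",
                      {"south": str(extent[0]), "north": str(extent[1])})
    # List the immediate children so the record shows what it summarizes.
    for child in node["children"]:
        ET.SubElement(record, "contains",
                      {"level": child["kind"], "name": child["name"]})
    return record


experiment_record = metadata_record(TREE)
cruise_record = metadata_record(TREE["children"][0])
print(ET.tostring(cruise_record, encoding="unicode"))
```

The same function serves experiments, cruises, and sections, which is the point of storing summaries in a generic hierarchy.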
9 Metadata Generation
10 Technology
- XML
  - Industry standard
  - Third-party parsers available
  - Human readable
- XSL and XSLT
  - Industry standard
  - Shortens development time of XML transformations
- Hierarchical database storage
  - Flexible metadata creation
11 XSL in a Nut Shell
XSL stylesheet:
  <xsl:stylesheet>
    <xsl:template match="book">
      <HTML><BODY><xsl:apply-templates/></BODY></HTML>
    </xsl:template>
    <xsl:template match="title">
      <H1><xsl:apply-templates/></H1>
    </xsl:template>
  </xsl:stylesheet>
XML input:
  <book>
    <title>My Life</title>
    <author>J. Smith</author>
    <publisher>ACME Publishing</publisher>
  </book>
HTML output from the XSLT Processor:
  <HTML>
    <BODY>
      <H1>My Life</H1>
      <H3>J. Smith</H3>
      <P><I>ACME Publishing</I></P>
    </BODY>
  </HTML>
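
The template-matching idea on this slide can be imitated in a few lines of Python. This is not an XSLT processor, only a sketch of the rule-per-element mechanism. The slide shows templates for `book` and `title` only, so the `author` and `publisher` rules below are assumptions chosen to reproduce the HTML shown as output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_templates(node, rules):
    """Process a node's text and children, as <xsl:apply-templates/> does."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(transform(child, rules))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def transform(node, rules):
    """Apply the rule matching this element, or fall back to its content."""
    body = apply_templates(node, rules)
    rule = rules.get(node.tag)
    # With no matching rule, emit the processed content unchanged, which is
    # similar to XSLT's built-in rule for elements.
    return rule(body) if rule is not None else body


RULES = {
    "book": lambda body: "<HTML><BODY>" + body + "</BODY></HTML>",
    "title": lambda body: "<H1>" + body + "</H1>",
    # Assumed rules: not shown on the slide, inferred from its output.
    "author": lambda body: "<H3>" + body + "</H3>",
    "publisher": lambda body: "<P><I>" + body + "</I></P>",
}

book = ET.fromstring(
    "<book><title>My Life</title><author>J. Smith</author>"
    "<publisher>ACME Publishing</publisher></book>"
)
html = transform(book, RULES)
print(html)
```

Each rule only wraps the already-processed content of its element, so adding support for a new element means adding one entry to `RULES`, much as a stylesheet grows by one `xsl:template`.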
12 Future Development
- Ingest tool for EPIC format
- Create NODC p3 format writer
- Ingest tool for NODC p3 format
- Automation tool for FGDC files.
13 Metadata Processing
[Diagram: metadata processing flow. Labeled components: Data, Data Format Specific Converter, Keyword Translation (XSL), Database, Configure Supplement (XML), Output Processor, FGDC Translation (XSL), FGDC XML, FGDC Format, NOAA FGDC Format]