Distributed Data, Distributed Governance, Distributed Vocabularies: The NERC DataGrid

Title: Describing data through metadata and XML: the CSML experience.
Author: Bryan N Lawrence
Last modified by: Bryan N Lawrence
Created Date: 9/27/2005 10:02:00 AM

1
Distributed Data, Distributed Governance,
Distributed Vocabularies: The NERC DataGrid
Bryan Lawrence (on behalf of a big team, and note
also a substantial piece of work with specific
authorship included herein)






BADC, BODC, CCLRC, PML and SOC
2
Outline
  • Motivation
  • Standards
  • Feature Types
  • Taxonomy
  • Overall Architecture
  • NDG Products
  • Discovery Portal
  • Data Extractor
  • MOLES (NumSim relationship with NMM)
  • CSML
  • CSML
  • Description
  • Prototyping in MarineXML
  • Round-Tripping
  • Vocabulary Issues in NDG (Hughes, Kondapalli,
    Lowry)
  • NDG Timeline

3
Complexity + Volume + Remote Access = Grid
Challenge
British Atmospheric Data Centre
http://ndg.nerc.ac.uk
British Oceanographic Data Centre
4
Integration semantics
  • Want interdisciplinary semantic access to
    information, not abstract data
  • getData(potential temperature from ERA-40 dataset
    in North Atlantic from 1990 to 2000)
  • not getData(era40.nc, PTMP, 20:50, 300:340,
    190:200)
  • or even worse
  • for j=1990:2000
  • getData(era40_j.nc, PTMP, 20:50, 300:340)
  • Lossy is OK!
  • Care less about completeness of representation
    than semantic unification
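
The contrast between the semantic request and the file-and-index request can be sketched as a small resolver. This is only an illustration, not NDG code: the lookup tables, the `resolve` function and the `IndexRequest` type are hypothetical, and the index ranges (20 to 50, 300 to 340, 190 to 200) are an assumed reading of the slide's low-level call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRequest:
    """The low-level form: a file, a variable code and index ranges."""
    filename: str
    variable: str
    lat: tuple
    lon: tuple
    time: tuple


# Hypothetical lookup tables. The values only echo the slide's example.
PARAMETERS = {"potential temperature": "PTMP"}
DATASETS = {"ERA-40": "era40.nc"}
REGIONS = {"North Atlantic": ((20, 50), (300, 340))}
PERIODS = {(1990, 2000): (190, 200)}


def resolve(parameter, dataset, region, start_year, end_year):
    """Translate a semantic request into the file-and-index request."""
    lat, lon = REGIONS[region]
    return IndexRequest(
        filename=DATASETS[dataset],
        variable=PARAMETERS[parameter],
        lat=lat,
        lon=lon,
        time=PERIODS[(start_year, end_year)],
    )


request = resolve("potential temperature", "ERA-40", "North Atlantic", 1990, 2000)
print(request)
```

The user only ever writes the semantic form; the file name, variable code and index ranges stay behind the lookup tables.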

5
Standards
  • ISO 19101: Geographic information - Reference
    model

6
Standards
  • Geographic features
  • abstraction of real world phenomena [ISO 19101]
  • Type or instance
  • Encapsulate important semantics in universe of
    discourse
  • Something you can name
  • Application schema
  • Defines semantic content and logical structure
  • ISO standards provide toolkit
  • spatial/temporal referencing
  • geometry (1-, 2-, 3-D)
  • topology
  • dictionaries (phenomena, units, etc.)
  • GML canonical encoding

from ISO 19109: Geographic information - Rules
for Application Schema
7
Architecture NDG Metadata Taxonomy
not one schema, not one solution!
8
Architecture Deployment
9
Architecture Deployment
10
Architecture Deployment
11
Architecture Deployment
12
Current Status
13
Discovery Service
NDG Products Discovery Portal
http://ndg.nerc.ac.uk/discovery
NB Web Service Interface (you can do the search
from your own site and format and present the
results there!)
14
(No Transcript)
15
(No Transcript)
16
NDG Products MOLES
Ugly as sin! A hint of things to come
17
MOLES implementation
  • Core linking concept is the deployment of a Data Production Tool
    at an Observation Station on behalf of an Activity that produces
    a Data Entity
  • Diagram entities: Activity, DataProductionTool, ObservationStation,
    Deployment, Data Entity
  • The Deployment links the metadata records into a structure that
    can be turned into a navigable structure
  • Each of the main metadata objects has security data attached to
    it, so security can be applied to queries on the metadata
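
A minimal sketch of the deployment as a linking record, assuming a simple role-based reading of "security data". The `Record`, `Deployment` and `related` names, the `allowed_roles` field and the example records are hypothetical and are not taken from the MOLES schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One metadata object, carrying its own security data."""
    name: str
    kind: str
    allowed_roles: frozenset = frozenset({"public"})  # assumed security model


@dataclass(frozen=True)
class Deployment:
    """Links a tool, a station, an activity and the data entity produced."""
    activity: Record
    tool: Record
    station: Record
    data_entity: Record


def related(deployments, record, role):
    """Navigate from one record to the others it is deployed with,
    returning only records that `role` is allowed to see."""
    found = []
    for deployment in deployments:
        members = (deployment.activity, deployment.tool,
                   deployment.station, deployment.data_entity)
        if record not in members:
            continue
        for member in members:
            if member != record and role in member.allowed_roles and member not in found:
                found.append(member)
    return found


# Hypothetical example records.
cruise = Record("Example cruise", "Activity")
ctd = Record("CTD", "DataProductionTool")
ship = Record("Research ship", "ObservationStation", frozenset({"staff"}))
casts = Record("CTD casts", "DataEntity")
deployments = [Deployment(cruise, ctd, ship, casts)]

print([r.name for r in related(deployments, ctd, "public")])
```

Because the security data sits on each record, the same navigation query gives different results for different roles without any change to the deployment links.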
18
Simulators as data production tools NumSim
NDG Products NumSim
19
NumSim Example
20
(No Transcript)
21
NDG Products DataExtractor
22
(No Transcript)
23
NDG Products GEOSPLAT
24
  • ERA40
  • All driven from one CDML file, 9 TB online
    spherical harmonics, looking like 40 TB virtual
    gridded!

25
NDG-A Climate Science Modelling Language
  • Aims
  • provide semantic integration mechanism for NDG
    data
  • explore new standards-based interoperability
    framework
  • emphasise content, not container
  • Design principles
  • offload semantics onto parameter type
    (phenomenon, observable, measurand)
  • e.g. wind-profiler, balloon temperature sounding
  • offload semantics onto CRS
  • e.g. scanning radar, sounding radar
  • sensible plotting as discriminant
  • in-principle unsupervised portrayal
  • explicitly aim for small number of weakly-typed
    features (in accordance with governance principle
    and NDG remit)

26
Climate Science Modelling Language
  • CSML feature types
  • defined on basis of geometric and topologic
    structure

CSML feature type    | Description                                                          | Examples
TrajectoryFeature    | Discrete path in time and space of a platform or instrument.         | ship's cruise track, aircraft's flight path
PointFeature         | Single point measurement.                                            | raingauge measurement
ProfileFeature       | Single profile of some parameter along a directed line in space.     | wind sounding, XBT, CTD, radiosonde
GridFeature          | Single time-snapshot of a gridded field.                             | gridded analysis field
PointSeriesFeature   | Series of single datum measurements.                                 | tidegauge, rainfall timeseries
ProfileSeriesFeature | Series of profile-type measurements.                                 | vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature    | Timeseries of gridded parameter fields.                              | numerical weather prediction model, ocean general circulation model
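
Since the seven feature types are distinguished by geometry and by whether they form a series, the table can be read as a lookup. The sketch below assumes exactly that reading; the `Geometry` enum and `feature_type` function are hypothetical names, and only the feature type names come from the table.

```python
from enum import Enum


class Geometry(Enum):
    TRAJECTORY = "trajectory"
    POINT = "point"
    PROFILE = "profile"
    GRID = "grid"


# (geometry, is_series) -> CSML feature type name from the table.
FEATURE_TYPES = {
    (Geometry.TRAJECTORY, False): "TrajectoryFeature",
    (Geometry.POINT, False): "PointFeature",
    (Geometry.PROFILE, False): "ProfileFeature",
    (Geometry.GRID, False): "GridFeature",
    (Geometry.POINT, True): "PointSeriesFeature",
    (Geometry.PROFILE, True): "ProfileSeriesFeature",
    (Geometry.GRID, True): "GridSeriesFeature",
}


def feature_type(geometry, is_series):
    """Pick a feature type from structure alone, ignoring the instrument."""
    try:
        return FEATURE_TYPES[(geometry, is_series)]
    except KeyError:
        raise ValueError(f"no CSML feature type for {geometry}, series={is_series}")


# A radiosonde ascent and an XBT drop land on the same feature type.
print(feature_type(Geometry.PROFILE, False))
# A tide gauge record is a series of point measurements.
print(feature_type(Geometry.POINT, True))
```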
27
Climate Science Modelling Language
  • CSML feature types
  • examples...

28
Climate Science Modelling Language
  • Numerical array descriptors
  • provides wrapper architecture for legacy data
    files
  • Connected to data model numerical content
    through xlink:href
  • Three subtypes
  • InlineArray
  • ArrayGenerator
  • FileExtract (NASAAmes, NetCDF, GRIB)
  • Composite design pattern for aggregation
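
A minimal sketch of the Composite pattern applied to the three array subtypes, so that an aggregate can be used wherever a single descriptor is expected. The class layout, the `values()` method and the injected `reader` callable are assumptions for illustration; no real NASA Ames, NetCDF or GRIB file is read here.

```python
class InlineArray:
    """Leaf: values stored directly in the descriptor."""

    def __init__(self, values):
        self._values = list(values)

    def values(self):
        return list(self._values)


class ArrayGenerator:
    """Leaf: values computed from an inclusive start, step, stop rule."""

    def __init__(self, start, step, stop):
        if step <= 0:
            raise ValueError("step must be positive")
        self.start, self.step, self.stop = start, step, stop

    def values(self):
        count = int((self.stop - self.start) // self.step) + 1
        return [self.start + i * self.step for i in range(count)]


class FileExtract:
    """Leaf: values pulled from a legacy file by an injected reader.
    The reader is a hypothetical callable (file_name, variable_name) -> values."""

    def __init__(self, file_name, variable_name, reader):
        self.file_name = file_name
        self.variable_name = variable_name
        self._reader = reader

    def values(self):
        return list(self._reader(self.file_name, self.variable_name))


class AggregatedArray:
    """Composite: concatenates its parts, which may themselves be composites."""

    def __init__(self, parts):
        self.parts = list(parts)

    def values(self):
        combined = []
        for part in self.parts:
            combined.extend(part.values())
        return combined


def fake_reader(file_name, variable_name):
    # Stand-in for a real file reader, for illustration only.
    return [7.0]


aggregate = AggregatedArray([
    InlineArray([1.0, 2.0]),
    ArrayGenerator(0, 5, 10),
    FileExtract("radar_data.nc", "az", fake_reader),
])
print(aggregate.values())
```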

29
Climate Science Modelling Language
  • Inline array
  • Array generator

<NDGInlineArray>
  <arraySize>5 2</arraySize>
  <uom>udunits.xml#degreeC</uom>
  <numericType>float</numericType>
  <regExpTransform>s/10/9/ge</regExpTransform>
  <numericTransform>5</numericTransform>
  <values>1 2 3 4 5 6 7 8 9 10</values>
</NDGInlineArray>
<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
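
To show how a client might consume an array generator, here is a sketch that parses the descriptor and expands it. It assumes the expression reads start:step:stop (0:5:50000), which is consistent with the declared arraySize of 10001. The `expand_generator` function is hypothetical, and the `#` separator in the unit reference is also an assumption.

```python
import xml.etree.ElementTree as ET

# Reconstructed descriptor; separators in <uom> and <expression> are assumed.
GENERATOR_XML = """
<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
"""


def expand_generator(xml_text):
    """Expand a start:step:stop expression and check it against arraySize."""
    root = ET.fromstring(xml_text.strip())
    start, step, stop = (float(part) for part in root.findtext("expression").split(":"))
    count = int(round((stop - start) / step)) + 1
    values = [start + i * step for i in range(count)]
    declared = int(root.findtext("arraySize"))
    if len(values) != declared:
        raise ValueError(f"expression yields {len(values)} values, arraySize says {declared}")
    return values


minutes = expand_generator(GENERATOR_XML)
print(len(minutes), minutes[0], minutes[-1])
```

Checking the generated length against arraySize catches a descriptor whose expression and declared size have drifted apart.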
30
Climate Science Modelling Language
  • File extract

<NDGNASAAmesExtract>
  <arraySize>526</arraySize>
  <numericType>double</numericType>
  <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
  <variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth">
  <arraySize>10000</arraySize>
  <fileName>radar_data.nc</fileName>
  <variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
31
MarineXML Testbed
  • Data from different parts of the marine community conform to a
    variety of schemas (XSD)
  • Source data sets shown, each with its own XSD, XML and XSLT:
    Biological Species, Chl-a from Satellite, MeasuredHydrodynamics,
    ModelledHydrodynamics, S-57v3 GML
  • For each XSD (for the source data) there is an XSLT to translate
    the data to the Feature Types (FT) defined by CSML. The FTs and
    XSLT are maintained in a MarineXML registry
  • Features in the source XSD must be present in the data dictionary
  • Phenomena in the XSD must have an associated portrayal
  • The result of the translation is an encoding that contains the
    marine data in weakly typed (i.e. generic) Features
  • The FTs can then be translated to equivalent FTs for display in
    the ECDIS system. ECDIS acts as an example client for the data
  • Features described using the S-57v3.1 Application Schema can be
    imported and are equivalent to the same features in CSML
  • Other diagram components: XML Parser, Data Dictionary, S52
    Portrayal Library, MarineGML(NDG) Feature Types, SeeMyDENC, SENC
Slide adapted from Kieran Millard (AUKEGGS, 2005)
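
The registry idea can be sketched in a few lines: one translator per source schema, with the data dictionary and portrayal rules checked on the way through. This is an assumption-laden illustration. Plain Python functions stand in for the XSLT transforms, and the dictionary entries, portrayal names, schema keys and record layouts are all hypothetical.

```python
# Hypothetical data dictionary and portrayal table.
DATA_DICTIONARY = {"chlorophyll-a", "wave height"}
PORTRAYALS = {"chlorophyll-a": "shaded grid", "wave height": "timeseries plot"}


def chl_to_feature(record):
    # Stand-in for the XSLT attached to a satellite Chl-a schema.
    return {"featureType": "GridFeature", "phenomenon": "chlorophyll-a",
            "values": record["grid"]}


def waves_to_feature(record):
    # Stand-in for the XSLT attached to a measured hydrodynamics schema.
    return {"featureType": "PointSeriesFeature", "phenomenon": "wave height",
            "values": record["heights"]}


# Registry: source schema name -> translator to a generic feature.
REGISTRY = {
    "ChlaFromSatellite": chl_to_feature,
    "MeasuredHydrodynamics": waves_to_feature,
}


def translate(schema_name, record):
    """Translate a source record to a weakly typed feature, enforcing the
    data dictionary and portrayal rules."""
    feature = REGISTRY[schema_name](record)
    phenomenon = feature["phenomenon"]
    if phenomenon not in DATA_DICTIONARY:
        raise ValueError(f"{phenomenon!r} is not in the data dictionary")
    if phenomenon not in PORTRAYALS:
        raise ValueError(f"{phenomenon!r} has no associated portrayal")
    feature["portrayal"] = PORTRAYALS[phenomenon]
    return feature


print(translate("MeasuredHydrodynamics", {"heights": [1.2, 1.5, 1.1]}))
```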
32
MarineXML Testbed
Biological sampling station with attributes for
the species sampled at each
Grid of Chl-a from the MERIS instrument on ENVISAT
Predicted and measured wave climate timeseries
(height, direction and period)
Vectors of currents from instruments
Slide adapted from Kieran Millard (AUKEGGS, 2005)
33
The Concept of re-using Features
Here structured XML is converted to plain ASCII
text in the form required for a numerical model
HTML warning service pages are generated on the
fly
Here the same XML is converted to the SENC format
used in a proprietary tool for viewing electronic
navigation charts.
XML can also be converted to SVG to display data
graphically
Slide adapted from Kieran Millard (AUKEGGS, 2005)
34
CSML Round Tripping - 1
Managing semantics
35
CSML Round Tripping - 2
Managing data - 1
36
Managing Data 2
scanner
XSLT
PUBLISH
ISO19115
37
Architecture Deployment
38
Vocabulary Management for NERC DataGrid
  • Michael Hughes, V.Siva Kondapalli and Roy Lowry

39
Vocabulary Presentation Outline
  • Problem and Solution
  • NERC DataGrid Vocabulary Model
  • Vocabulary Technical Governance
  • Vocabulary Content Governance
  • Mappings and Thesaurus Server
  • Potential Role of Local Mappings

40
The Problem
  • NERC DataGrid cannot function operationally
    without metadata and data semantic
    interoperability
  • This will never be achieved without
  • Readily accessible standard terms whose meaning
    is clearly understood
  • Readily accessible semantic maps both within and
    between lists of standard terms
  • Semantic maps between local terms and standard
    terms

41
The Solution?
  • Implementation of a Vocabulary Server
  • Building OWL ontologies mapping between
    domain-relevant de-facto standard vocabularies
  • Deploying the ontologies through a Web Service
    thesaurus server
  • Making tools available for users to build and
    deploy local ontologies

42
NDG Vocabulary Model
43
NDG Vocabulary Model
  • The vocabulary resource is built from Entries
  • An Entry is the representation of a single object in the real
    world, comprising
  • Key - A bit pattern that represents an entity. It
    must be unique, permanent and free from
    semantics.
  • Term - Text used to label the entity to
    facilitate human recognition.
  • Abbreviation - A shortened version of the term
    for use where space is tight. Target size is
    20-30 bytes.
  • Definition - Text that unambiguously specifies
    the entity.
  • Entries are aggregated into Lists (entity class
    or subclass e.g. UK post towns)
  • Lists are aggregated into Constraints (entity
    class e.g. post towns of the world)
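
The Entry, List and Constraint hierarchy maps naturally onto three small classes. The sketch below is an assumption about how that could look, not the NDG implementation: the class and method names are hypothetical, key uniqueness is enforced per list, and the example key and definition are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    key: str           # unique, permanent, free from semantics
    term: str          # human-readable label
    abbreviation: str  # short form for tight spaces
    definition: str    # unambiguous specification

    def abbreviation_within_target(self, max_bytes=30):
        """Check the abbreviation against the 20-30 byte target size."""
        return len(self.abbreviation.encode("utf-8")) <= max_bytes


@dataclass
class VocabList:
    name: str
    entries: dict = field(default_factory=dict)

    def add(self, entry):
        if entry.key in self.entries:
            raise ValueError(f"key {entry.key!r} already used in list {self.name!r}")
        self.entries[entry.key] = entry


@dataclass
class Constraint:
    name: str
    lists: list = field(default_factory=list)

    def find(self, key):
        """Look a key up across every list in the constraint."""
        for vocab_list in self.lists:
            if key in vocab_list.entries:
                return vocab_list.entries[key]
        return None


# Hypothetical example content.
uk_towns = VocabList("UK post towns")
uk_towns.add(Entry("PT000001", "Liverpool", "L'pool", "Post town of Liverpool, UK"))
world_towns = Constraint("Post towns of the world", [uk_towns])
print(world_towns.find("PT000001").term)
```

Keeping the key free of meaning lets the term or definition be corrected later without breaking anything that refers to the entry.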

44
Vocabulary Technical Governance
  • The story so far
  • Lists are available as flat ASCII files or XML
    documents as URLs e.g.
  • http://www.cgd.ucar.edu/cms/eaton/cf-metadata/standard_name.xml
  • ftp://ftp.pol.ac.uk/pub/bodc/jgofs/datadict/new/parameter_group.csv
  • http://www.sea-search.net/cdi_documentation/cdi_sampling_codes.csv
  • http://gcmd.nasa.gov/Resources/valids//gcmd_parameters.html
  • Some (BODC, SEA-SEARCH) include keys
  • Some (CF, BODC) include definitions
  • None are properly versioned

45
Vocabulary Technical Governance
  • Versioning should
  • Provide a unique label for each instantiation of
    the list
  • Enable any previous instantiation of the list to
    be recreated
  • Provide timestamp information for creation and
    modification of every object in the vocabulary
    system
  • Delivery should
  • Be from the master, not a copy
  • Be accessible to software agents to allow
    automated synchronisation of local copies
  • Have a hotline to content governance
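
One way to meet the versioning requirements is an append-only change log: every change gets a new version label and a timestamp, and any earlier instantiation is rebuilt by replaying the log. This is a sketch of that general technique, not a description of the NDG Vocabulary Server back end; `VersionedList` and its methods are hypothetical.

```python
from datetime import datetime, timezone


class VersionedList:
    """Vocabulary list backed by an append-only, timestamped change log."""

    def __init__(self):
        self._log = []    # (version, timestamp, key, term or None)
        self.version = 0  # unique label for the current instantiation

    def set_term(self, key, term, when=None):
        """Create or modify an entry; passing term=None removes it."""
        self.version += 1
        timestamp = when or datetime.now(timezone.utc)
        self._log.append((self.version, timestamp, key, term))
        return self.version

    def snapshot(self, version):
        """Recreate the list exactly as it stood at `version`."""
        state = {}
        for logged_version, _, key, term in self._log:
            if logged_version > version:
                break
            if term is None:
                state.pop(key, None)
            else:
                state[key] = term
        return state

    def last_modified(self, key):
        """Timestamp of the most recent change to `key`, or None."""
        stamps = [stamp for _, stamp, logged_key, _ in self._log if logged_key == key]
        return stamps[-1] if stamps else None


terms = VersionedList()
v1 = terms.set_term("K1", "chlorophyll")
v2 = terms.set_term("K1", "chlorophyll-a")
v3 = terms.set_term("K2", "carotenoids")
print(terms.snapshot(v1), terms.snapshot(v3))
```

Because nothing is ever overwritten, a software agent holding a local copy at version N can ask for the state at N and at the latest version and apply the difference.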

46
Vocabulary Technical Governance
  • NERC DataGrid Vocabulary Server
  • Back End
  • Fully automated record archive, timestamps and
    version numbering. Live April 2006.
  • 47 (of 115) lists publicly accessible.
  • Front End
  • Web Service API. Live June 2006.
  • XML list downloads from website (July 2006?).
  • Web-form tools (August 2006?).

47
Vocabulary Content Governance
  • Standard lists need to respond to ever expanding
    user requirements
  • Change needs to be rapid or users lose interest
  • Standard lists need to maintain information
    quality and internal consistency
  • Content governance has to resolve these
    conflicting requirements

48
Vocabulary Content Governance
  • Content governance in oceanographic and
    atmospheric domains is based on
  • Moderated e-mail discussion lists
  • Benign Dictator and well-meaning volunteers
  • Variable success depending on right people having
    spare time at the right moments
  • More formalism underpinned by more resources
    required
  • But need to be careful about going too far or
    levels of service become unacceptable

49
Mappings and Thesaurus Server
  • There will never be a single list for a given
    topic
  • Term mapping therefore an essential part of
    semantic interoperability
  • Marine Metadata Interoperability
    (http://marinemetadata.org) have developed tooling
    and trialled mappings in the measurement
    phenomena arena

50
Mappings and Thesaurus Server
  • MMI approach
  • Harmonise lists to be mapped in OWL (Voc2OWL
    tool)
  • Map on the basis of "same as", "broader than" and
    "narrower than" relationships (VINE tool)
  • Place a Web Service API over the map to implement
    a term or thesaurus server
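
The three relationships are enough to drive a simple search expansion: a query term matches itself, anything mapped as the same, and anything narrower. The sketch below illustrates that idea only. It does not use the Voc2OWL or VINE tools or any MMI API, and the relation names, function names and example mappings are hypothetical.

```python
def build_index(mappings):
    """Index mappings so each term points at its equivalent and narrower terms.
    Each mapping is (term_a, relation, term_b)."""
    index = {}
    for term_a, relation, term_b in mappings:
        if relation == "same_as":
            index.setdefault(term_a, set()).add(term_b)
            index.setdefault(term_b, set()).add(term_a)
        elif relation == "broader_than":
            index.setdefault(term_a, set()).add(term_b)
        elif relation == "narrower_than":
            index.setdefault(term_b, set()).add(term_a)
        else:
            raise ValueError(f"unknown relation {relation!r}")
    return index


def expand(term, index):
    """Return the term plus everything equivalent to or narrower than it."""
    seen = {term}
    pending = [term]
    while pending:
        for neighbour in index.get(pending.pop(), ()):
            if neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    return seen


# Hypothetical mappings between terms from different lists.
MAPPINGS = [
    ("pigments", "broader_than", "chlorophyll"),
    ("chlorophyll-a", "narrower_than", "chlorophyll"),
]
index = build_index(MAPPINGS)
print(sorted(expand("pigments", index)))
```

A thesaurus server would expose something like `expand` behind its Web Service API, so a discovery client never needs to hold the map itself.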

51
Mappings and Thesaurus Server
  • NERC DataGrid Plans
  • Use MMI technology plus domain expertise
    available in BODC, BADC and their user
    communities to build a complete map between
  • BODC Parameter Discovery Vocabulary (300 terms)
  • CF Standard Names (500-600 terms)
  • GCMD Parameter Valids (200-300 relevant terms)
  • Incorporate this map into the NDG Discovery
    Service to facilitate smart searching (e.g.
    "pigments" finds dataset labelled "chlorophyll")
    through MMI Web Service
  • Integrate ontology maintenance into source list
    maintenance

52
Role of Local Mappings
  • There will always be local terms and
    understanding
  • Pigment data sets could mean
  • Chlorophyll OR carotenoids OR phaeopigments
  • Chlorophyll AND carotenoids AND phaeopigments
  • Depends on point of view

53
Role of Local Mappings
  • Possible solution to this
  • User builds an ontology reflecting local
    perception of the mapping between local terms and
    standard terms
  • Discovery or data integration tools use ontology
    as a plug-in allowing user to operate with
    local terminology
  • Tools (e.g. VINE) could be made available to
    facilitate this
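
The OR versus AND ambiguity can be captured by letting each user's local mapping state which reading it intends. This is a minimal sketch of such a plug-in, assuming a mapping from a local term to a mode and a set of standard terms. The `matches` function and the two example mappings are hypothetical.

```python
def matches(dataset_terms, local_term, local_map):
    """Decide whether a dataset satisfies a user's local term.
    local_map maps local term -> (mode, standard terms), mode is 'any' or 'all'."""
    mode, standard_terms = local_map[local_term]
    hits = [term in dataset_terms for term in standard_terms]
    if mode == "any":
        return any(hits)
    if mode == "all":
        return all(hits)
    raise ValueError(f"unknown mode {mode!r}")


PIGMENTS = ("chlorophyll", "carotenoids", "phaeopigments")

# Two users with different local perceptions of "pigment data sets".
user_a = {"pigments": ("any", PIGMENTS)}
user_b = {"pigments": ("all", PIGMENTS)}

dataset = {"chlorophyll", "temperature"}
print(matches(dataset, "pigments", user_a), matches(dataset, "pigments", user_b))
```

The discovery tool stays the same for both users; only the plugged-in mapping changes.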

54
NDG Timeline
  • NDG2 runs until September 2007
  • NDG-Alpha (June 2006)
  • Not all components in place (particularly
    delivery broker)
  • Not many (maybe only DX) products will be
    deployable by non-NDG participants
  • (too much hard work installing things that
    haven't been optimised for installation)
  • Discovery portal will be (is now) usable, linking
    to NCAR data etc, but isn't very user friendly
    (options not obvious etc).
  • NDG-Beta (Feb 2007)
  • Most components should work, but deployment of
    software may still be difficult by
    non-participants
  • NDG-Prod (Jun 2007)
  • Should be deployable and far more user friendly
    (spending from Feb-June working on deployment and
    friendliness, no new functionality)
  • Last few months working on sustainability etc

http://proj.badc.rl.ac.uk/trac/roadmap