Title: eScience and Metadata
1 e-Science and Metadata
Kevin O'Neill et al.
Data Management Group and the NERC DataGrid project
CCLRC e-Science Centre
k.d.oneill_at_rl.ac.uk
2 Integrating distributed data holdings for distributed users
- Different Data Centres serve different communities
- Different communities have
  - different understandings of the same data
  - different tools accessing the data that have different metadata needs
  - and new capabilities coming along all the time
- This is not a Green Field Site
  - Long-established Centres
  - Have to fit into the existing NERC and international frameworks, technical and legal
  - Heterogeneous data holdings of variable metadata quality
  - Little control over the external providers of data/metadata
3 What is metadata?
- A word that means what the speaker/hearer wants it to mean
  - and the meaning can change from instance to instance
- The data about the data that enables a set of operations or applications
  - Different tasks need different metadata
4 NDG Metadata Taxonomy (1)
5 NDG Metadata Taxonomy (2)
[Diagram: NDG metadata taxonomy; box labels include DATA and BROWSE]
- D(iscovery): the industry standard formats used by discovery portals
- A (aka CSML): Climate Science Markup Language aims to provide a semantic view of the data, covering values and internal structure
- B (aka MOLES): Metadata Object Links for Environmental Science identifies the objects of interest and their significant relations to each other
- S(ummary): the intersection of CSML and MOLES, but expressed with different syntax and semantic emphasis in each
6 D(iscovery) metadata
- Created by data providers and stored and used at discovery portals
- Its role is to help the user find the data
  - by providing enough detail to say whether particular data sets are of interest
  - without having to move large amounts of data or metadata around
- Common formats are FGDC, DIF, and ISO 19115 (including profiles)
- Hierarchical
  - put the entity they are really interested in at the top
- Used in existing community portals (GCMD, NMG, GIGateway)
- NDG encodes in XML and makes available to portals
7 B metadata: a domain ontology
- A high-level statement of entities important to the NDG, and the relations between them
- Identifies entities as semantically important objects
- Takes a top-down view, providing an extensible framework
  - will get richer in detail and relations as this detail becomes available (cf. Earley Suite (numerical model description))
  - carries a lot of semantics in entries from standard dictionaries etc.
- Eventually, use of thesaurus/ontology servers to provide enhanced and more intelligent discovery
Implemented as MOLES (Metadata Object Links for Environmental Science)
8 Role of MOLES
- MOLES is a store of metadata intended to
  - Provide a more complete metadata store than that demanded by the usual discovery formats, leveraging the metadata holdings of the data centres
  - Allow the production of the various industry standard discovery formats
    - DIF, FGDC/GEO, ISO 19115, SensorML, Dublin Core
    - Summarising the key points of the data that the discovery standards require, and that can be populated
  - Add elements and relations that don't appear in the data
  - Allow a smooth link across to the data browse and use elements of the NDG
  - Provide a hook for related systems (e.g. publications, annotations) via the permanent identifier scheme
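The idea of producing a discovery format from a richer store can be sketched as below. This is a minimal illustration only: the record fields and the field-to-element mapping are assumptions made for the example and are not taken from the MOLES schema. Only the Dublin Core element namespace and element names (`identifier`, `title`, `description`) are standard.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical richer record; field names are illustrative, not the MOLES schema.
record = {
    "id": "example-dataset-1",
    "name": "Example surface temperature dataset",
    "abstract": "Illustrative abstract text.",
    "activity": "Example Activity",            # richer detail, not needed for discovery
    "observation_station": "Example Station",  # richer detail, not needed for discovery
}

# Assumed mapping from the illustrative fields to Dublin Core elements.
FIELD_TO_DC = {"id": "identifier", "name": "title", "abstract": "description"}


def to_dublin_core(rec):
    """Summarise a richer record as a small Dublin Core XML fragment."""
    root = ET.Element("record")
    for field, dc_name in FIELD_TO_DC.items():
        if field in rec:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = rec[field]
    return root


dc = to_dublin_core(record)
print(ET.tostring(dc, encoding="unicode"))
```

Other discovery formats would each need their own mapping table; the richer record stays the single source.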
9 MOLES is NOT
- a primary storage format
  - generated from data providers' internal metadata resources
- intended to be a front-line discovery format
  - there are enough already
  - and there will be more
  - but some of the features will be exploited by fully NDG-enabled metadata stores
10 MOLES: a simplified view
11 Linking in MOLES
- Core linking concept is the Deployment of a Data Production Tool at an Observation Station on behalf of an Activity that produces a Data Entity
- Links the metadata records into a structure that can be turned into a navigable/processable XML network of trees with any of the record types as the root element
- Each of the main metadata objects has security data attached to it. This can be applied to queries on the metadata
[Diagram: Deployment linking Activity, DataProductionTool, ObservationStation and Data Entity]
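The Deployment link and the "any record type as root" idea can be sketched roughly as follows. The class names, fields and XML layout here are assumptions made for illustration; they are not the actual MOLES schema, and security data is left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str  # "Activity", "DataProductionTool", "ObservationStation" or "DataEntity"
    name: str


@dataclass(frozen=True)
class Deployment:
    """One link: a tool deployed at a station for an activity, producing a data entity."""
    activity: Record
    tool: Record
    station: Record
    data_entity: Record

    def members(self):
        return (self.activity, self.tool, self.station, self.data_entity)


def tree_rooted_at(root, deployments):
    """Build an XML tree with `root` on top and each deployment it takes part in below."""
    top = ET.Element(root.kind, name=root.name)
    for dep in deployments:
        if root in dep.members():
            dep_el = ET.SubElement(top, "Deployment")
            for rec in dep.members():
                if rec != root:
                    ET.SubElement(dep_el, rec.kind, name=rec.name)
    return top


# Illustrative records only.
activity = Record("Activity", "Activity 1")
tool = Record("DataProductionTool", "Instrument 1")
station = Record("ObservationStation", "ObservationStation 1")
dataset = Record("DataEntity", "Dataset 1")
deployments = [Deployment(activity, tool, station, dataset)]

# The same links viewed from two different roots.
station_view = tree_rooted_at(station, deployments)
dataset_view = tree_rooted_at(dataset, deployments)
print(ET.tostring(station_view, encoding="unicode"))
print(ET.tostring(dataset_view, encoding="unicode"))
```

Keeping the links in Deployment records, rather than inside each object, is what lets any record type serve as the root.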
12 a network of trees
- The core objects can be linked to many deployments
- This provides the means to navigate between objects in a meaningful way
- And there are more named relations in there to exploit (between activities, data sets)
[Diagram: ObservationStation 1, Instrument 1, Instrument 2, Activity 1, Activity 2, Dataset 1, Dataset 2]
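Navigation through shared deployments might look like the sketch below. The object names come from the slide's diagram, but which objects are paired in which deployment is an assumption made for the example, as are the function names.

```python
from collections import defaultdict

# Each tuple is one deployment: (activity, instrument, station, dataset).
# The pairing of objects is assumed for illustration.
deployments = [
    ("Activity 1", "Instrument 1", "ObservationStation 1", "Dataset 1"),
    ("Activity 2", "Instrument 2", "ObservationStation 1", "Dataset 2"),
]


def build_index(deps):
    """Map every object to the deployments it takes part in."""
    index = defaultdict(list)
    for dep in deps:
        for obj in dep:
            index[obj].append(dep)
    return index


def related(obj, index):
    """Objects reachable from `obj` through any deployment it shares."""
    found = set()
    for dep in index[obj]:
        found.update(dep)
    found.discard(obj)
    return found


index = build_index(deployments)

# What do the two datasets have in common?
shared = related("Dataset 1", index) & related("Dataset 2", index)
print(shared)

# Everything linked to the station through its two deployments.
print(sorted(related("ObservationStation 1", index)))
```

Because one station appears in both deployments, it acts as the junction between the two trees.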
13 Vocabularies and taxonomies
- Several initiatives to build these covering a huge range of disciplines
  - sometimes several for the same area (species-naming), but we're not judging
  - a caveat: often more effort goes into the initial definition than is spent on the maintenance and user education (
14 MOLES Futures
- Extension of the core model classes
- Changing the syntax and internal terminology to be ISO compliant
- Mappings to more discovery formats
- Links to
  - publication systems
  - annotation systems
15 Discovery metadata dissemination
- OAI: Open Archives Initiative Digital Library Protocol for harvesting metadata
- NDG supports multiple discovery services: build your own portal
- Discovery portal pulls discovery format into a single corpus
[Diagram: MOLES -> Discovery conversions feeding OAI harvesting]
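A harvesting portal along these lines could be sketched as below. The `ListRecords` verb and the `metadataPrefix` argument (with `oai_dc` as a prefix) are part of the OAI-PMH protocol. The provider names, base URLs and record contents are hypothetical, and the harvest results are stubbed in memory rather than fetched over the network.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def merge_into_corpus(harvests):
    """Pull the records harvested from each provider into a single corpus, keyed by identifier."""
    corpus = {}
    for provider, records in harvests.items():
        for identifier, record in records.items():
            corpus[identifier] = {"provider": provider, **record}
    return corpus


# Hypothetical provider endpoints.
providers = {
    "centre-a": "https://centre-a.example.org/oai",
    "centre-b": "https://centre-b.example.org/oai",
}
urls = {name: list_records_url(base) for name, base in providers.items()}

# Stubbed harvest results; a real portal would fetch and parse each URL above.
harvests = {
    "centre-a": {"oai:centre-a:1": {"title": "Example dataset A1"}},
    "centre-b": {
        "oai:centre-b:1": {"title": "Example dataset B1"},
        "oai:centre-b:2": {"title": "Example dataset B2"},
    },
}

corpus = merge_into_corpus(harvests)
print(urls["centre-a"])
print(len(corpus), "records in the merged corpus")
```

Anyone who knows the provider endpoints can run the same harvest, which is what makes "build your own portal" practical.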
16 Metadata Usage within NDG