eScience and Metadata

Activating metadata workshop, NIEeS, 6-7 July, 2005. e-Science and Metadata. Kevin O'Neill et al.

1
e-Science and Metadata
Kevin O'Neill et al.
Data Management Group and the NERC DataGrid project
CCLRC e-Science Centre
k.d.oneill_at_rl.ac.uk
2
Integrating distributed data holdings for
distributed users
  • Different Data Centres serve different
    communities
  • Different communities have
      • different understandings of the same data
      • different tools accessing the data that have
        different metadata needs
      • and new capabilities coming along all the time
  • This is not a Green Field Site
      • Long-established Centres
      • Have to fit into the existing NERC and
        international frameworks (technical and legal)
      • Heterogeneous data holdings of variable metadata
        quality
      • Little control over the external providers of
        data/metadata

3
What is metadata?
  • A word that means what the speaker/hearer wants
    it to mean
      • and the meaning can change from instance to
        instance
  • The data about the data that enables a set of
    operations or applications
      • Different tasks need different metadata

4
NDG Metadata Taxonomy (1)
5
NDG Metadata Taxonomy (2)
  • D(iscovery): The industry standard formats used by
    discovery portals
  • DATA BROWSE (A, aka CSML): Climate Science
    Markup Language aims to provide a semantic view
    of the data, covering values and internal
    structure
  • B, aka MOLES: Metadata Object Links for
    Environmental Science identifies the objects of
    interest and their significant relations to each
    other
  • S(ummary): The intersection of CSML and
    MOLES, but expressed with different syntax and
    semantic emphasis in each
6
D(iscovery) metadata
  • Created by data providers and stored and used at
    discovery portals
  • Its role is to help the user find the data
      • by providing enough detail to say whether
        particular data sets are of interest
      • without having to move large amounts of data or
        metadata around
  • Common formats are FGDC, DIF, and ISO 19115
    (including profiles)
      • Hierarchical
      • put the entity they are really interested in at
        the top
  • Used in existing community portals (GCMD,
    NMG, GIGateway)
  • NDG encodes it in XML and makes it available to portals

7
B metadata: a domain ontology
  • A high-level statement of entities important to
    the NDG, and the relations between them
  • Identifies entities as semantically important
    objects
  • Takes a top-down view, providing an extensible
    framework
      • will get richer in detail and relations as this
        detail becomes available (cf. Earley Suite
        (numerical model description))
      • carries a lot of semantics in entries from
        standard dictionaries etc.
  • Eventually, use of thesaurus/ontology servers to
    provide enhanced and more intelligent discovery

Implemented as MOLES (Metadata Object Links for
Environmental Science)
8
Role of MOLES
  • MOLES is a store of metadata intended to:
      • Provide a more complete metadata store than that
        demanded by the usual discovery formats,
        leveraging the metadata holdings of the data
        centres
      • Allow the production of the various industry
        standard discovery formats
          • DIF, FGDC/GEO, ISO 19115, SensorML, Dublin Core
          • Summarising the key points of the data that the
            discovery standards require, and that can be
            populated
      • Add elements and relations that don't appear in
        the data
      • Allow a smooth link across to the data browse and
        use elements of the NDG
      • Provide a hook for related systems (e.g.
        publications, annotations) via the permanent
        identifier scheme
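
As a rough illustration of producing a discovery format from a richer store, the sketch below flattens a record into Dublin Core elements. The record's field names and values are hypothetical and are not taken from the MOLES schema; only the Dublin Core namespace and element names (identifier, title, description, relation) are standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical richer record; field names are illustrative, not the MOLES schema.
record = {
    "id": "example.org__dataset__0001",
    "name": "Example surface temperature dataset",
    "abstract": "Hourly surface temperature observations.",
    "activity": "Example field campaign",
    "instrument": "Example thermometer",
}

def to_dublin_core(rec):
    """Summarise a richer record as a flat set of Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    mapping = [
        ("identifier", rec["id"]),
        ("title", rec["name"]),
        ("description", rec["abstract"]),
        # Dublin Core has no dedicated slot for an activity or instrument,
        # so this sketch folds both into generic 'relation' elements.
        ("relation", rec["activity"]),
        ("relation", rec["instrument"]),
    ]
    for tag, value in mapping:
        ET.SubElement(root, f"{{{DC_NS}}}{tag}").text = value
    return root

dc = to_dublin_core(record)
dc_xml = ET.tostring(dc, encoding="unicode")
print(dc_xml)
```

The mapping loses detail on purpose: the discovery record only summarises what the richer store holds, which is why the richer store has to be the source and not the other way round.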

9
MOLES is NOT
  • a primary storage format
      • it is generated from data providers' internal
        metadata resources
  • intended to be a front-line discovery format
      • there are enough already
      • and there will be more
      • but some of the features will be exploited
        by fully NDG-enabled metadata stores

10
MOLES: a simplified view
11
Linking in MOLES
  • Core linking concept is the Deployment of a Data
    Production Tool at an Observation Station on
    behalf of an Activity that produces a Data Entity

Links the metadata records into a structure that
can be turned into a navigable/processable XML
network of trees with any of the record types
as the root element.
Each of the main metadata objects has security
data attached to it. This can be applied to
queries on the metadata.
Diagram labels: Activity, DataProductionTool,
ObservationStation, Deployment, Data Entity
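
The Deployment link described on this slide can be sketched as a small record that ties the four core objects together, from which an XML tree can be built with any record type as the root. This is a minimal sketch under assumptions: the class, the element and attribute names, and the example pairings are all hypothetical and do not reproduce the MOLES schema.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass(frozen=True)
class Deployment:
    """One link record tying the four core objects together (hypothetical layout)."""
    activity: str
    tool: str
    station: str
    data_entity: str

ROLES = ("activity", "tool", "station", "data_entity")

def tree_rooted_at(role, name, deployments):
    """Build an XML tree with the chosen record as root and its deployments as children."""
    root = ET.Element(role, name=name)
    for d in deployments:
        if getattr(d, role) != name:
            continue  # this deployment does not involve the chosen root
        dep = ET.SubElement(root, "deployment")
        for other in ROLES:
            if other != role:
                ET.SubElement(dep, other, name=getattr(d, other))
    return root

# Illustrative pairings only.
deployments = [
    Deployment("Activity 1", "Instrument 1", "ObservationStation 1", "Dataset 1"),
    Deployment("Activity 2", "Instrument 2", "ObservationStation 1", "Dataset 2"),
]

station_view = tree_rooted_at("station", "ObservationStation 1", deployments)
activity_view = tree_rooted_at("activity", "Activity 1", deployments)
print(ET.tostring(station_view, encoding="unicode"))
```

The same list of deployments yields a station-rooted tree with two branches and an activity-rooted tree with one, which is the sense in which any record type can serve as the root element.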
12
a network of trees
  • The core objects can be linked to many deployments
  • This provides the means to navigate between
    objects in a meaningful way
  • And there are more named relations in there to
    exploit (between activities, data sets)

Diagram labels: ObservationStation 1, Instrument 1,
Dataset 1, Activity 1, Dataset 2, Activity 2,
Instrument 2
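
Navigation between objects through shared deployments might look like the sketch below. It assumes each deployment is a plain dictionary naming one object of each core type; the pairings of the numbered objects are illustrative and are not read off the slide's diagram.

```python
from collections import defaultdict

# Each deployment links one object of each core type (illustrative pairings).
deployments = [
    {"activity": "Activity 1", "instrument": "Instrument 1",
     "station": "ObservationStation 1", "dataset": "Dataset 1"},
    {"activity": "Activity 2", "instrument": "Instrument 2",
     "station": "ObservationStation 1", "dataset": "Dataset 2"},
]

# Index: object name -> positions of the deployments that mention it.
index = defaultdict(list)
for pos, dep in enumerate(deployments):
    for name in dep.values():
        index[name].append(pos)

def neighbours(name):
    """Objects reachable from `name` in one hop through a shared deployment."""
    found = set()
    for pos in index.get(name, []):
        found.update(deployments[pos].values())
    found.discard(name)
    return found

def related_datasets(dataset):
    """Other datasets that share any core object with `dataset`."""
    result = set()
    for obj in neighbours(dataset):
        for pos in index[obj]:
            result.add(deployments[pos]["dataset"])
    result.discard(dataset)
    return result

print(sorted(related_datasets("Dataset 1")))
```

Here the two datasets are related only because both deployments share the same observation station, so the link is found by walking dataset, deployment, station, deployment, dataset.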
13
Vocabularies and taxonomies
  • Several initiatives to build these covering a
    huge range of disciplines
      • sometimes several for the same area
        (species-naming), but we're not judging
  • a caveat: often more effort goes into the initial
    definition than is spent on the maintenance and
    user education

14
MOLES Futures
  • Extension of the core model classes
  • Changing the syntax and internal terminology to
    be ISO compliant
  • Mappings to more discovery formats
  • Links to
      • publication systems
      • annotation systems

15
Discovery metadata dissemination
  • Open Archives Initiative: Digital Library
    Protocol for harvesting metadata
  • NDG supports multiple discovery services: build
    your own portal
  • Discovery portal pulls discovery format
    into a single corpus

Diagram labels: OAI (x2), MOLES -> Discovery (x3)
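
A portal pulling discovery records into a single corpus over the Open Archives Initiative protocol (OAI-PMH) could be sketched as follows. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 specification; the base URL, the identifiers, and the hand-written sample response are assumptions so that the example runs without a network connection.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=OAI_NS)
    return ids, (token or None)

# Hand-written sample response; the identifiers are made up.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

corpus = []  # the portal's single corpus of harvested record identifiers
ids, token = parse_list_records(sample)
corpus.extend(ids)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
print(corpus, next_url)
```

A real harvester would fetch `next_url`, parse the response the same way, and repeat until no resumption token is returned, doing so once per data provider it harvests from.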
16
Metadata Usage within NDG