Combining Metadata Standards: Approaches and Benefits - PowerPoint PPT Presentation

About This Presentation
Title:

Combining Metadata Standards: Approaches and Benefits

Description:

Combining Metadata Standards: Approaches and Benefits Arofan Gregory Open Data Foundation – PowerPoint PPT presentation

Slides: 30
Provided by: Arof6
Learn more at: https://unece.org

Transcript and Presenter's Notes


1
Combining Metadata Standards: Approaches and Benefits
  • Arofan Gregory
  • Open Data Foundation

2
Overview
  • Recent events of interest
  • The Standards: Comparison and Explanation
  • Emerging Implementation Approaches
  • DDI and SDMX
  • SDMX and the Semantic Web Technologies
  • Classifications: Multiple Standards
  • Ideas about Future Work

3
Recent Events of Interest
  • Note: Some of these events/implementations have
    been or will be described in detail in other
    papers; they are only mentioned here.
  • Schloss Dagstuhl, Germany, November 2009 (DDI 3
    Workshop)
  • SDMX 2.0 to DDI 3 field-level mapping work started
  • Topic: DDI and the Semantic Web???

4
Recent Events of Interest (2)
  • Semantic Web and SDMX
  • ONS hosted 2-day meeting in the UK, February 2009
    (produced draft SDMX-RDF)
  • Banca d'Italia has a prototype project
  • New project launched at the University of Tilburg
    in the Netherlands (RDF expression of OECD SDMX
    data)
  • Australian Bureau of Statistics (ABS) starts
    looking at SDMX and DDI to support data
    production lifecycle
  • Prototype implementations
  • Some other NSIs also very interested

5
Recent Events of Interest (3)
  • Classifications and ISO/IEC 11179
  • Australia: Government agencies looking to
    exchange classifications with ABS from existing
    ISO/IEC 11179 system, using SDMX, DDI
  • Statistics Canada: Evaluation of IMDB (ISO/IEC
    11179-based metadata repository) for use in
    coordination with Canadian RDC Network (based on
    DDI 3)

6
What Does This Mean?
  • Not a complete list of events/implementations,
    but
  • Indicates the interest we are seeing in the
    combined use of standards!
  • These are not just experiments!
  • Organizations are looking at implementation in a
    serious way now

7
Characterizing the Standards
  • SDMX
  • Data structures and formats
  • Reference metadata structures and formats
  • Web-services architecture based on registry
    services
  • Content-oriented guidelines
  • ISO/IEC 11179
  • Model for managing concepts and data elements
  • Metadata registries and lifecycle
  • ISO 19115
  • Standard metadata model for geographies
  • Used by DDI as geographical model

8
Characterizing the Standards (2)
  • Dublin Core
  • Citation metadata
  • Widely used in the Semantic Web
  • Used natively by DDI for citations
  • Semantic Web/ Linked Data / RDF
  • See Open Issues on the Semantic Web
  • DDI 3
  • Will give more detail, as it is not as familiar
    to the METIS community

9
Characterizing the Standards (3)
  • DDI 1.x/2.x was a standard used by archives and
    data libraries
  • Based on a codebook model
  • Used by some NSIs, especially in the developing
    world because of the IHSN Metadata Management
    Toolkit
  • Used by the European network of data archives,
    CESSDA
  • Used by many data archives in North America
  • Documentation of a single Study (survey)
  • Designed to help researchers find and use
    microdata
  • DDI 3 is more ambitious: capture and use of
    metadata throughout the entire data lifecycle

10
DDI 3 Lifecycle Model
Notice: This is very like a high-level view of
the METIS model!
11
Characterizing the Standards (4)
  • DDI 3 provides machine-actionable metadata to
    support metadata-driven systems throughout the
    lifecycle
  • Focus is on upstream metadata capture and reuse
  • Describes tabulation/aggregation of microdata
  • Provides support for comparison across surveys,
    detailed geography, data processing, register
    data
  • Aggregate NCube model aligned with SDMX
  • No architecture/web services support (yet)

12
An Observation
  • It is easy to say that two standards are
    aligned
  • Many of these standards were intentionally
    aligned as they were developed
  • It is much more difficult to understand how to
    use them in combination effectively

13
Approaches and Benefits
  • SDMX and DDI
  • DDI microdata production/SDMX aggregate
    dissemination
  • Using SDMX data in DDI-based systems (combining
    aggregates and microdata)
  • Combined SDMX/DDI supporting the entire data
    lifecycle
  • DDI register data reported to SDMX collection
    system
  • SDMX and the Semantic Web
  • Classifications and the Standards

14
[Diagram labels: DDI 3 Metadata; Surveys; Registers; Input data; Cleaning, editing, estimation, aggregation, etc.; Dissemination data; Website/Web Service; SDMX-ML Data, Metadata, Structure]
15
DDI-SDMX: Benefits
  • The benefits of this approach are those found by
    using the standards generally
  • Supports metadata-driven system for data
    production throughout the lifecycle (DDI)
  • Metadata-rich dissemination format, preferred by
    data collectors (SDMX)
  • Shared tools: SDMX registry services, Web
    Services for discovery and use of aggregates

16
SDMX-DDI: Integrating Aggregates and Microdata
  • Scenario is common in some research
  • Economic data is often only available as
    aggregates
  • Challenge is to combine aggregates and other
    microdata

17
[Diagram labels: SDMX Web Service; SDMX-to-DDI 3 Transform; Data archive/repository; Surveys (DDI 3); Registers (DDI 3); Processing to produce integrated data and metadata (DDI 3)]
18
SDMX-DDI: Benefits
  • Allows for easy use of official statistics by
    researchers
  • Solves problems of combining aggregates and
    microdata
  • Note: This does not involve disaggregation of
    published data
  • Structural transformation only, to allow DDI 3
    systems to process aggregates easily
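
A structure-only transformation of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the SDMX-to-DDI 3 transform itself: the dictionary shapes, the function name flatten_series, the area and indicator codes, and the numeric values are all made up for the example and do not follow the SDMX-ML or DDI 3 schemas. The point it shows is that series-oriented aggregates become flat rows while every published value is copied unchanged.

```python
from typing import Dict, List


def flatten_series(series_list: List[dict], dimensions: List[str]) -> List[Dict[str, object]]:
    """Turn series-oriented aggregates into flat rows, one per observation.

    Values are copied as-is: the structure changes, the data does not
    (no disaggregation, no re-estimation).
    """
    rows = []
    for series in series_list:
        # The series key holds the dimension values shared by all observations.
        key = {dim: series["key"][dim] for dim in dimensions}
        for period, value in sorted(series["obs"].items()):
            row = dict(key)
            row["TIME_PERIOD"] = period
            row["OBS_VALUE"] = value
            rows.append(row)
    return rows


# Made-up aggregates: hypothetical codes and values, for illustration only.
aggregates = [
    {"key": {"REF_AREA": "AREA_A", "INDICATOR": "IND_X"},
     "obs": {"2008": 1.0, "2009": 2.0}},
    {"key": {"REF_AREA": "AREA_B", "INDICATOR": "IND_X"},
     "obs": {"2008": 3.0, "2009": 4.0}},
]

rows = flatten_series(aggregates, ["REF_AREA", "INDICATOR"])
for row in rows:
    print(row)
```

Each resulting row has the same rectangular shape as a microdata record, which is what lets a record-oriented system handle aggregates alongside survey or register data.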

19
DDI-SDMX: The Data Lifecycle
  • Uses a metadata model capable of expression as
    either SDMX or DDI, depending
  • Provides support for process management
  • Uses many features of SDMX (process model,
    structure sets, reporting taxonomies, etc.)
  • Uses SDMX architecture/services model
  • Designed to allow incorporation of other standards

20
[Diagram labels: Process-management system (BPML); SDMX Registry; Input data store; Dissemination data store; Surveys (DDI 3); Registers (DDI 3); Web site/Print/Web Services; Data and metadata repositories/application databases (SDMX, DDI, etc.)]
All registry interactions use SDMX.
Interactions between systems are DDI or SDMX Web
Services, as appropriate.
21
SDMX-DDI: Benefits
  • Leverages Web-Services technologies (registry,
    event triggers, etc.) for efficient automation,
    migration, flexibility
  • Choice of tools is broad
  • Use the best format for any given task
  • All the benefits of DDI-SDMX case
  • Good support for process management as well as
    data management

22
SDMX and the Semantic Web Technologies
  • Potentially applies to other standards as well
    (DDI, ISO/IEC 11179, etc.)
  • Note that Semantic Web technologies only apply to
    dissemination
  • Not designed to support data production
  • Terms
  • "Raw data" in an SW context does not mean raw
    data
  • "Data" in an SW context means anything that can
    be described using RDF, not just numeric data

23
Assumptions
  • Creation of a harmonized statistical model based
    on proven models/standards, but expressed as RDF
    (ontology or vocabulary in SW terms)
  • Implementation of an SDMX-RDF in standard SDMX
    dissemination packages

24
[Diagram labels: Internal (production environment): SDMX-driven production system; Dissemination data store (SDMX); SDMX Web Service (SDMX-ML). External (dissemination to Web): SDMX-RDF Transform (RDF); Triplestore (SDMX-RDF); SPARQL Queries]
25
SDMX and the Semantic Web: Benefits
  • Leverages the Linked Data phenomenon without
    requiring a deep understanding of RDF, etc.
  • Uses existing standards/models and best practices
    to do heavy lifting (data production)
  • Puts a lot of reliable, quality data into the
    Linked Data Web
  • Helps address issues of provenance

26
Warning
  • RDF is verbose!
  • 4.5 Megs of GESMES/TS corresponds to 45 Megs of
    compact SDMX-ML XML and 420 Megs of RDF triples
  • This may encourage the on-demand production of
    RDF data from web services, rather than static
    files
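
The figures above imply roughly a tenfold growth from GESMES/TS to SDMX-ML (45 / 4.5 = 10) and about a 93-fold growth to RDF (420 / 4.5 ≈ 93). The sketch below illustrates both points in Python under stated assumptions: the namespace http://example.org/, the property names, and the sample observations are placeholders invented for the example, and the output is plain N-Triples rather than any agreed SDMX-RDF vocabulary. A generator emits triples lazily, as a web service producing RDF on demand might, and the byte counts show why one triple per field is so much larger than a compact record.

```python
from typing import Dict, Iterable, Iterator

BASE = "http://example.org/"  # placeholder namespace, not a published vocabulary


def observation_triples(obs_id: str, obs: Dict[str, str]) -> Iterator[str]:
    """Yield one N-Triples line per field of a single observation."""
    subject = f"<{BASE}obs/{obs_id}>"
    for field, value in obs.items():
        predicate = f"<{BASE}prop/{field}>"
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        yield f'{subject} {predicate} "{escaped}" .'


def stream_rdf(observations: Iterable[Dict[str, str]]) -> Iterator[str]:
    """Produce triples lazily, so nothing is materialized until requested."""
    for i, obs in enumerate(observations):
        yield from observation_triples(str(i), obs)


# Made-up observations, for illustration only.
sample = [
    {"REF_AREA": "AREA_A", "TIME_PERIOD": "2009", "OBS_VALUE": "1.0"},
    {"REF_AREA": "AREA_B", "TIME_PERIOD": "2009", "OBS_VALUE": "2.0"},
]

compact = "\n".join(",".join(o.values()) for o in sample)
triples = "\n".join(stream_rdf(sample))
ratio = len(triples.encode("utf-8")) / len(compact.encode("utf-8"))

print(triples)
print(f"compact bytes: {len(compact)}, RDF bytes: {len(triples)}, ratio: {ratio:.1f}")
```

The exact ratio depends on URI length and on how many fields each observation carries, so the number printed here is not comparable to the slide's measurements.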

27
Standards and Classifications
  • Some maintainers of standard classifications are
    looking at expressing them in useful formats
    (SDMX, DDI)
  • This is an easy thing to do
  • It is very useful: it promotes re-use,
    comparability, etc.
  • Could apply to Semantic Web RDF expressions as
    well as XML-based standards
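
As a rough illustration of why this is straightforward, the Python sketch below renders one in-memory classification both as XML and as RDF. Everything specific in it is an assumption made for the example: the two-item classification is made up, the XML element names (Codelist, Code, Label) are generic placeholders rather than the SDMX-ML or DDI schemas, and the choice of the W3C SKOS vocabulary for the RDF side is this sketch's, not something the presentation prescribes.

```python
import xml.etree.ElementTree as ET

# Made-up two-level classification: (code, label, parent code or None).
CLASSIFICATION = [
    ("A", "Example section A", None),
    ("A01", "Example division A01", "A"),
]

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_xml(items, list_id):
    """Render the classification as a generic XML code list (placeholder schema)."""
    root = ET.Element("Codelist", id=list_id)
    for code, label, parent in items:
        el = ET.SubElement(root, "Code", value=code)
        if parent is not None:
            el.set("parent", parent)
        ET.SubElement(el, "Label").text = label
    return ET.tostring(root, encoding="unicode")


def to_ntriples(items, base):
    """Render the same items as N-Triples using SKOS concepts.

    Labels here contain no quotes or backslashes, so no escaping is done.
    """
    lines = []
    for code, label, parent in items:
        subject = f"<{base}{code}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f'{subject} <{SKOS}prefLabel> "{label}" .')
        if parent is not None:
            lines.append(f"{subject} <{SKOS}broader> <{base}{parent}> .")
    return lines


xml_text = to_xml(CLASSIFICATION, "EXAMPLE_CL")
rdf_lines = to_ntriples(CLASSIFICATION, "http://example.org/class/")

print(xml_text)
print("\n".join(rdf_lines))
```

Because both outputs are generated from the same source list, the codes and hierarchy stay identical across formats, which is the re-use and comparability benefit noted above.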

28
Ideas for Future Work
  • Endorse SDMX-DDI mappings now being produced
  • Develop an SDMX-RDF (?) or
  • Develop a harmonized statistical model for
    expression in RDF (based on DDI, SDMX, ISO/IEC
    11179) (?)
  • Encourage tools developers to implement it in
    standard dissemination packages
  • Publish standard classifications in standard
    formats

29
Summary
  • Combined use of standards is becoming a reality
  • Proactive engagement with the Semantic Web world
    could provide benefits to all concerned parties,
    as well as users