eXtending MetaData Registries: XMDR Project

1
eXtending MetaData Registries: XMDR Project
Prototype
  • XMDR Working Group
  • Presentation to SC 32/WG 2 meeting
  • And to SC 32/WG 2 (11179 P2 & 3 (E3), MMF) and
    OMG/ODM Liaison Meeting
  • September, 2005
  • Toronto, Canada

2
XMDR Outline
  • MDRs: Purpose and Goals (slides 3-8)
  • Differentiate and position against other ISO & non-ISO
    standards (slides 9-11)
  • Present XMDR Collaboration Project Prototype:
    Purposes & Goals (slides 12-33)
  • Describe metadata/technical platform/architecture
    for XMDR (slides 28-33)
  • Demonstrate XMDR-it (live demo? Screen snaps?)
  • Explain importance of XMDR-it's three levels of
    constraints
  • XML, RDF, OWL (slides 34-38)
  • Outline current challenges and future plans
    (slides 40-41)
  • Contacts/URLs/Credits (slides 42-43)

3
In the Beginning
  • Organizational structure and funding created a
    culture fostering stove-pipe, media-specific,
    heterogeneous systems
  • Lack of information sharing/integration
  • Lack of ability to aggregate information across
    systems
  • Inability to retrieve data to answer questions
  • Higher cost due to redundant/incompatible data
    maintenance
  • Lack of technology options to enable integration
    of systems and data

Data Rich, Information Poor
4
OLD Approaches to Semantic Integration and
Interoperability
  • Option 1: "Thou Shalt"
  • Everyone adopts a single data model for a
    particular domain
  • GenBank, PDB, HL7 are examples of these sorts of
    models
  • Advantages
  • Ensures interoperability
  • Minimal overhead
  • Disadvantages
  • Not flexible
  • Does not allow data stores for particular use
    cases
  • Unrealistic, especially across different large
    organizations
  • Option 2: Multi-party Agreements
  • Several sites agree on a format for interchanging
    data
  • Sites maintain a local data dictionary, XML
    schema, etc. to describe information model
  • Advantages
  • Flexible
  • Low overhead
  • Disadvantages
  • Works only where existing bilateral (or
    multilateral) agreements exist
  • Each new node must arrange to be interoperable
    with all other nodes or node cluster
Derived from slides by G. Komatsoulis NCICB
5
New Approaches for Semantic Integration and
Interoperability
  • Option 3: Standards-based metadata descriptors
  • Common Data Elements
  • Common terms & concepts
  • Provide a complete description of all attributes
    in a systematic, uniform and unambiguous format
  • Description must be based on a common (but
    expandable) vocabulary
  • Rely on concept codes, not concept names
  • Track quality and accessibility
  • Advantages
  • Provides more ways to surface semantic matches:
    words and immutable codes
  • Allows new systems to find points of
    interoperability with all other data systems at
    once
  • Machine understandable
  • Stable, immutable identifiers
  • Low barriers to entry
  • Disadvantages
  • Requires a very complete description of the
    contents of an attribute
  • Some degree of overhead associated with creating
    and maintaining a compatible system
  • Based on ISO/IEC 11179, Information Technology:
    Metadata Registries (MDR), 2nd Edition, Parts 1-6

6
How Does Registered Metadata Promote Better Data
Management?
  • Provides a model that is consistent/exchangeable
    for capturing data about data
  • Captures unambiguous semantic information in
    one place
  • Documents details on the custodian of the data
  • Links directly to online data
  • Reduces risk/cost of duplicating data
    collections

7
What is a metadata registry good for?
  • Data administration (design time)
  • Databases, DB applications
  • Messaging systems
  • Terminologies, Taxonomies, Ontologies
  • Data Integration (design & run time)
  • federated queries, data warehousing
  • Discovery of hidden relationships between data
  • Support for interactive users (run time)
  • Data entry forms, output explanation
  • Navigation of databases
  • Semantic Web Services (run time)

8
ISO/IEC 11179 MDR Standard
  • Used to record and link
  • Data elements
  • Data element concepts
  • Conceptual Domains
  • Value Domains, e.g., enumerated value domains
  • Classification Schemes
  • ...
  • Goals
  • To record the unambiguous meaning of data
    elements
  • Human understandable: current paradigm is natural
    language definitions
  • Machine understandable: formal definitions (and
    axioms) coming in Edition 3 (?)

9
ISO/IEC 11179 Metadata Registry Standard
  • Spans both
  • Conceptual models of the real world
  • Concepts, data element concepts, classification
    schemes
  • Terminologies, taxonomies, ontologies
  • Information Artifacts
  • Data elements, enumerated values, ...
  • UML models (e.g., in caDSR)

10
Conceptual vs. Information-Centric Metadata
Standards
Information Artifacts Metadata
OMG Standards: MOF, CWM, UML
Ontology Standards: OWL, KIF, CL, ...
Connections ???
Terminology Standards
Conceptual Level
11
Space of Metadata Standards
ISO/IEC 11179 spans both conceptual models and
information artifacts.
About information artifacts: data elements,
schemas, UML models, ...
Ontology Standards: OWL, KIF, CL, XTM, ...
ISO/IEC 11179 Edition 3 Metadata Registry
Standard
OMG Standards: MOF, UML, CWM
Terminology Standards
Conceptual models of the real world
12
Example Users of ISO/IEC 11179 Metadata Registries
  • U.S. EPA Environmental Data Registry (EDR)
  • System of Registries: need to register lots of
    things
  • National Cancer Institute (NCI) Cancer Data
    Standards Registry (caDSR)
  • Registration of data elements and domain object
    models
  • U.S. Veterans Health Administration
  • U.S. Census Bureau
  • U.S. Bureau of Labor Statistics
  • Data Element Concepts, Value Domains
  • Statistics Canada
  • Australian Health Administration
  • Data Elements
  • European Environment Agency

13
Evolution of ISO/IEC 11179
  • Edition 2 is used to register data elements
  • Classification Schemes, Data Element Concepts,
    Value Domains, Object Class, Property, etc.
  • Data element = Data element concept + Value
    Domain (representation)
  • Representation = data type or code set
    (enumerated list)
  • Concepts as optional
  • Supported Object Class
  • Not widely utilized
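
The composition described on this slide (a data element pairs a data element concept with a value domain, whose representation is a data type or an enumerated code set) can be sketched in Python. This is only an illustration, not the ISO/IEC 11179 metamodel: the class and field names are assumptions, the example names are borrowed from the "Country Name" example elsewhere in this deck, and the two country codes are an illustrative subset.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataElementConcept:
    """What is being described: an object class plus a property."""
    object_class: str
    property_name: str


@dataclass
class ValueDomain:
    """How values are represented: a data type or an enumerated code set."""
    name: str
    datatype: Optional[str] = None
    # code -> value meaning; left empty for non-enumerated domains
    permissible_values: Dict[str, str] = field(default_factory=dict)

    def is_enumerated(self) -> bool:
        return bool(self.permissible_values)

    def accepts(self, value: str) -> bool:
        # Enumerated domains accept only listed codes; in this sketch,
        # non-enumerated domains accept any value.
        if self.is_enumerated():
            return value in self.permissible_values
        return True


@dataclass
class DataElement:
    """Data element = data element concept + value domain (representation)."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain


country_codes = ValueDomain(
    name="Country Code",
    permissible_values={"CA": "Canada", "US": "United States"},  # illustrative subset
)
country = DataElement(
    name="Country Name",
    concept=DataElementConcept(object_class="Country", property_name="Label"),
    value_domain=country_codes,
)

print(country.value_domain.accepts("CA"))  # True
print(country.value_domain.accepts("XX"))  # False
```

Keeping the concept and the value domain as separate objects is what lets two data elements share one meaning while using different representations.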

14
Challenges of ISO/IEC 11179 Metadata Registries
  • Need to easily retrieve semantically related
    items
  • Need to Support Discovery
  • Enable navigation within and among taxonomies
  • Even when producer and consumer do not share
    common taxonomy
  • Current Classification Scheme Administered Item
    not sufficient for registration of taxonomies,
    ontologies, etc.
  • Need for consistency and richer metamodel
  • Complex Relationships between items

15
Many taxonomies
  • Purpose of registering taxonomies and ontologies
    along with data elements, values, etc. in one MDR
  • Provide visibility to manage and consume
  • Reuse
  • Harmonization
  • Support Discovery
  • Enable navigation within and among taxonomies
  • Even when producer and consumer do not share
    common taxonomy
  • Ensure availability

16
Example XML Management Challenge: XML, One
Language, Many Vocabularies
<latitude_degrees>30N</latitude_degrees>
<latitude units="degrees" hemisphere="north">30</latitude>
<lat> <hemisphere>N</hemisphere> <deg>30</deg> </lat>
  • These 3 XML fragments are
  • Equally valid ways to express the same data in
    XML
  • Well-formed per the W3C XML Specification
  • Mediation required for interoperability

Courtesy of ghayes_at_mitre.org
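
As a rough sketch of what "mediation" means here, the Python below maps the three fragments (with their angle brackets and attribute quotes restored) onto one canonical value, signed decimal degrees with north positive. The function name and the choice of canonical form are assumptions made for illustration; a registry-driven mediator would look the mappings up rather than hard-code them.

```python
import xml.etree.ElementTree as ET

FRAGMENTS = [
    "<latitude_degrees>30N</latitude_degrees>",
    '<latitude units="degrees" hemisphere="north">30</latitude>',
    "<lat><hemisphere>N</hemisphere><deg>30</deg></lat>",
]


def to_signed_degrees(xml_text: str) -> float:
    """Map any of the three vocabularies to signed degrees (north positive)."""
    root = ET.fromstring(xml_text)
    if root.tag == "latitude_degrees":
        # Value and hemisphere are packed into one string, e.g. "30N".
        text = root.text.strip()
        degrees, hemisphere = float(text[:-1]), text[-1].upper()
    elif root.tag == "latitude":
        # Hemisphere is carried in an attribute, e.g. hemisphere="north".
        degrees = float(root.text)
        hemisphere = root.attrib["hemisphere"][0].upper()
    elif root.tag == "lat":
        # Hemisphere and degrees are separate child elements.
        degrees = float(root.findtext("deg"))
        hemisphere = root.findtext("hemisphere").strip().upper()
    else:
        raise ValueError(f"unknown vocabulary: {root.tag}")
    if hemisphere not in ("N", "S"):
        raise ValueError(f"unknown hemisphere: {hemisphere}")
    return degrees if hemisphere == "N" else -degrees


values = [to_signed_degrees(fragment) for fragment in FRAGMENTS]
print(values)  # [30.0, 30.0, 30.0]
```

Each new vocabulary needs another hand-written branch, which is the cost that registered, shared metadata is meant to reduce.
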
17
Example of Semantic Integration for
Interoperability
[Diagram labels: C1708 Drug/Agent; object classes Agent and Drug;
attributes name, id, nSCNumber, NDCCode, CTEPName, approvalDate,
FDAIndID, approver, IUPACName, fdaCode; concept codes C1708C41243
(shown twice)]
Names are inadequate for performing cross-object
joins; use opaque/immutable concept identifiers
Example provided by Dr. George Komatsoulis,
NCICB
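
A small Python sketch of the point made on this slide: joining two object classes on attribute names misses matches that a join on concept identifiers finds. Everything in the data is hypothetical. The assignment of attributes to Agent and Drug is illustrative, the attribute `NSC_num` is invented to show a naming mismatch, and the `CODE-...` strings are placeholders, not real NCI Thesaurus codes.

```python
from typing import Dict, List, Tuple

# attribute name -> concept code (placeholder codes invented for this sketch)
AGENT_ATTRS: Dict[str, str] = {
    "name": "CODE-NAME",
    "nSCNumber": "CODE-NSC",
    "CTEPName": "CODE-CTEP",
}
DRUG_ATTRS: Dict[str, str] = {
    "name": "CODE-NAME",
    "fdaCode": "CODE-FDA",
    "NSC_num": "CODE-NSC",  # hypothetical: same concept, different name
}


def match_by_name(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair attributes whose names are identical."""
    return sorted((n, n) for n in a.keys() & b.keys())


def match_by_concept(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair attributes that carry the same concept code, whatever their names."""
    by_code: Dict[str, List[str]] = {}
    for attr, code in b.items():
        by_code.setdefault(code, []).append(attr)
    return sorted(
        (attr_a, attr_b)
        for attr_a, code in a.items()
        for attr_b in by_code.get(code, [])
    )


print(match_by_name(AGENT_ATTRS, DRUG_ATTRS))
print(match_by_concept(AGENT_ATTRS, DRUG_ATTRS))
```

The name-based join finds only the `name` pair, while the concept-based join also pairs `nSCNumber` with `NSC_num`.
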
18
Concept Use and Integration with 11179 Part 3,
Edition 2
Conceptual Domain: Agent
Object Class: Chemopreventive Agent
Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol,
Eflornithine, Ursodiol
Data Element Concept: Chemopreventive Agent NSC
Number
Value Domain: NSC Code
Classification Schemes: caDSRTraining
Property: NSCNumber
Representation: Code
Data Element: Chemopreventive Agent Name
Denise: what are 2 EVS squares at bottom? Does
EVS top-right of center apply to both?
Context: caCORE
19
Semantic Framework: NCI Example
Develop Standards
caCORE SDK caFramework All data elements and
objects Registered in XMDR
Object oriented Applications
caDSR 11179 MDR Linked to KOS Concepts
Data Standards
XMDR
NCI Thesaurus NCI Metathesaurus LOINC MGED ...
Concept Systems
20
Where have we been? Where are we now? Where are
we planning to go?
System manuals
Semantic grids
Data dictionaries
Semantics services (SSOA)
11179 E1
XMDR Project
11179 E2
XML related standards
11179 E3
Terminologies, ontologies, etc.
Complex semantics management
Data engineering/XML Data
Semantics management for data
Data Standards/Data Administration
21
What is XMDR? eXtended MetaData Registries
  • A set of collaborative initiatives by groups with
    shared goals
  • Extend the ISO/IEC 11179 metadata registry
    standard (XMDR-s)
  • EPA, NCI, DOD, LBNL, Mayo Clinic, USGS, Ecoterm,
    UNEP, GBIF
  • Align & harmonize various related metadata
    standards (XMDR-h)
  • ISO WG2: 11179, 19763, 20944, 24707; OMG: ODM,
    CWM
  • Say which is which
  • (several of the above groups have members on
    these committees)
  • An open source implementation testbed (XMDR-it)
    to
  • assemble test metadata from diverse sources &
    structures
  • e.g., terminologies, ontologies, etc. for health,
    environment, geography, ...
  • explore emerging semantic technologies (e.g.,
    RDF, OWL, CL, ...)
  • demonstrate new capabilities
  • e.g., ontology lifecycle management &
    harmonization

22
Why do we need metadata registry extensions?
In order to:
  • Enhance capabilities to capture and retrieve
    semantics of information artifacts (e.g.,
    data elements and value domains) in metadata
    registries using terminologies, taxonomies,
    ontologies, etc.
  • Improve representation of relationships between
    data (e.g., objects, data elements & domains) and
    concept structures (ontologies, taxonomies,
    thesauri, terminologies, ...)
  • Register complex semantic metadata (concept
    structures, terminologies) in more formal,
    systematic ways (e.g., description logic) to
    facilitate machine processing for
  • creating and managing names, definitions, terms,
    etc.
  • linking together data elements, etc. across
    multiple systems
  • discovering relationships among data elements &
    terms

23
XMDR Semantic Extensions Goals
  • Sharable data that can easily be identified and
    aggregated across organizations
  • Unambiguous metadata characteristics to convey
    semantic, syntactic and lexical meaning
  • Human AND Machine understandable
  • Registration and management of everything useful
    for administering and managing data, including
    concept systems, ontologies, etc.
  • Machine understanding of semantics to facilitate
    inference, aggregation, and agent services

24
Goals of the open source XMDR-it prototype
implementation testbed
  • Demonstrate feasibility & utility of proposed
    revisions to ISO/IEC 11179
  • Provide open-source reference implementation with
    XMDR capabilities
  • Determine the necessary features to leverage
    semantic interoperability between concept
    systems and data elements
  • e.g., for ontology lifecycle management &
    harmonization
  • Explore benefits of representing XMDR content
    using emerging semantic technologies (e.g., RDF,
    OWL, CL, ...)
  • Integrate open source tools to create, maintain,
    & deploy XMDR standards
  • Test capabilities and performance of candidate
    tools
  • Assemble semantic metadata with different
    structures from diverse sources to test various
    semantic technologies
  • terminologies, thesauri, ontologies, ...
  • from health, environment, geography, ...
  • Help resolve registration & harmonization issues
    for different metadata standards, including ODM
    & MMF

25
Role of terminologies and ontologies in metadata
registries
  • Sources for concepts, concept definitions, object
    classes, properties, value meanings, external
    references
  • Terminologies as classification schemes (e.g.,
    taxonomies)
  • Ontologies to specify semantic relationships
  • is-a, part-of, instance-of, ...
  • inheritance permits more compact definitions
  • semantic pathways for indexing
  • facilitates searching subclasses & inverses
  • Frameworks for integration of multiple schemas
  • Help connect metadata entities via shared terms
  • via automatic indexing of metadata words
  • via text values from specific metadata elements

26
What the XMDR Project IS NOT!
  • An attempt to turn 11179 metadata registries into
    a development and maintenance facility for every
    type of concept structure
  • An attempt to standardize the complete range of
    terminology and ontology data services
  • Production implementation for one organization
  • any other things we want to disavow?

27
XMDR-it example content has been loaded from
diverse sources via LexGrid & XSLT
Concept System A
XSLT script
Harold Solbrig (Mayo Clinic)
A Concepts
Original Source A
Lexgrid Source A
A Relationships
28
Additional Metadata Content will be added to
XMDR Prototype
  • EDR (EPA Environmental Data Registry)
  • caDSR (NCI Cancer Data Standards Registry)
  • IETF RFC 3066 Language Codes
  • NBII Biocomplexity Thesaurus
  • USGS Geographic Names Information System
  • Getty Thesaurus of Geographic Names
  • I.T.I.S. - Integrated Taxonomic Information
    System
  • Adult Mouse Anatomy
  • Foundational Model of Anatomy
  • NASA SWEET (Semantic Web for Earth and Environmental
    Terminology)
  • EPA Chemical Substance Registry
  • GO (Gene Ontology), AGROVOC

29
XMDR-it now contains an XML file for each 11179
item
Administered items (count, item type, example)
  • 1: Context for Administered Items, e.g., XMDR?
  • 7: Concept Systems, e.g., GEMET, DTIC
  • 499: Data Elements, e.g., Country Name
  • 274: Data Element Concepts, e.g., Country Label
  • 1: Conceptual Domains, e.g., Countries of the World
  • 2: Representation Classes, e.g., Code
  • 6: Value Domains, e.g., countries of the world
  • 0: Relationship Types, e.g., ??

Other
30
Each metadata entity (object, concept, data
element) is
  • Logically stored as a separate XML
    file/document
  • Stored in Subversion code management system
  • provides a versioning capability
  • stores files in Berkeley DB
  • Berkeley DB provides transactions, backups, ...
  • Compliant with three complementary standards
  • An XML Schema (document constraints)
  • An RDF Schema (graph constraints)
  • OWL ontology

31
XMDR-it XML schema provides a number of important
benefits
  • Schema specifies what is required as well as what
    is legal
  • Divides metadata into files conforming to XML
    schema
  • Normalizes data (à la relational: one fact in one
    place)
  • Facilitates XSLT transformations by reducing
    degrees of freedom to a canonical encoding within
    the RDF standard
  • RELAX NG used to create and check XMDR-it schema
  • RNG validator enforces many OWL ontology
    constraints
  • Trang automatically translates into XML Schema
    syntax

32
RDF provides complementary benefits on top of XML
  • All the advantages of XML plus
  • RDF provides more explicit semantics than XML
  • Users can employ a growing set of RDF tools
  • e.g., SPARQL query language, SWRL rule language,
    Jena inference
  • More powerful retrieval capabilities
  • Using many different RDF graph query tools
  • RDF's graph data model supports inference
  • e.g., inclusion of subsumed sub-classes
  • Results can be either
  • tuples (à la relational tables)
  • XML/RDF graphs (being developed for W3C's SPARQL)
  • Facilitates integrated use and management of
    multiple related concepts spanning different
    concept systems
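
The inference mentioned on this slide, where a query for a class also returns members of its subsumed sub-classes, can be sketched over plain subject-predicate-object triples. This is a pure-Python illustration, not Jena or SPARQL: the predicate names `subClassOf` and `type` are simplified stand-ins for the RDFS/RDF terms, and the class hierarchy and item names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical graph: a small class hierarchy plus three typed items.
TRIPLES: Set[Triple] = {
    ("ChemopreventiveAgent", "subClassOf", "Agent"),
    ("CyclooxygenaseInhibitor", "subClassOf", "ChemopreventiveAgent"),
    ("item1", "type", "CyclooxygenaseInhibitor"),
    ("item2", "type", "Agent"),
    ("item3", "type", "Drug"),
}


def subclasses_of(cls: str, triples: Set[Triple]) -> Set[str]:
    """Return cls plus every class that reaches it through subClassOf edges."""
    found = {cls}
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls: str, triples: Set[Triple]) -> Set[str]:
    """Items typed with cls or with any of its (transitive) subclasses."""
    classes = subclasses_of(cls, triples)
    return {s for s, p, o in triples if p == "type" and o in classes}


print(sorted(instances_of("Agent", TRIPLES)))  # ['item1', 'item2']
```

Without the subclass closure, the same query would return only `item2`, which is the retrieval gap that graph-based inference closes.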

33
OWL ontology specification adds richer semantics
atop RDF & XML
  • All the advantages of XML & RDF, plus
  • RNG validator enforces many OWL ontology
    constraints
  • Classes and subclasses (is-a relationships)
  • Union classes
  • Inverses
  • Same-as, same-property-as, same-class-as
  • Restriction classes (restrict range, cardinality,
    etc. of property based on type of subject)
  • and tools for creation, editing, visualization,
    and management (Protégé plug-ins)

34
OWL, RDF & XML Schema used to specify XMDR-it as
UML for 11179-X metamodel
OWL XMDR Ontology annotations
Types & Cardinalities
XMDR XML Schema
Trang
XMDR's RELAX NG Schema
Triples: binary labeled relationships
RDF Spec
XML Schema Language spec
XML Objects
What things go in own files? Which property
direction stored? Sequential ordering of
properties
35
XMDR-it Architecture: Initial Implemented Modules
External Interface
RegistryStore
Registry
Java
WritableRegistryStore
Subversion
Authentication Service (defer)
RetrievalIndex
MetadataValidator (defer?) schema-driven syntax
checker
Jena, Xerces
LogicBasedIndex
FullTextIndex
Jena, OWI KS Kowari, Racer
Lucene
MappingEngine (defer)
Ontology Editor
11179 OWL Ontology
Protégé
Composition (tight ownership)
Generalization
Aggregation (loose ownership)
36
XMDR-it Advanced Search Interface helps explore
registry contents
http://erdos.lbl.gov/xmdr2/
Search for "any(country (code name))"
More Results >>
XMDR Web Interface 0.4, LBNL
37
Technical Challenges and Issues for XMDR
Implementation Testbed
  • Complexity
  • Representation of Relationships
  • XML + RDF + OWL is a lot
  • Scalability & performance
  • Currently includes only 60,000 objects
  • maybe indexing and/or distributed registries will
    help?
  • RDF Issues
  • RDF queries yield tuples, not RDF objects (but
    W3C at work)
  • RDF tools won't create XMDR files (add wrapper
    constraints?)
  • User-friendly interface for RDF queries (later)
  • External data sources, ontologies, terminologies
  • Harmonization with ODM and MMF
  • XML/RDF objects: results display & browsing
  • Something like EDR UI with link labels & inverse
    refs

38
XMDR-s 11179 Extensions: Challenges and Issues
  • Harmonize and align XMDR recommendations with
  • ISO 11179 Metadata Registries (MDR)
  • ISO 19763 Framework for Metamodel
    Interoperability (MMF)
  • ISO 24707 Common Logic (CL)
  • ISO 20944 Metadata Interoperability and Bindings
  • OMG's ODM
  • Improve the current Part 3 of ISO 11179 standard
  • Separate registration section of the model so we
    can register "anything"
  • Simplify mechanisms for registering and using
    relationships between administered items, along
    with mechanisms for registering & using
    ontologies
  • Improve the "classification" region in Part 3,
    particularly with regard to concepts and
    relationships

39
XMDR Registry (from XMDR MMF meeting)
40
The Requirements for XMDR (from XMDR MMF
meeting)
Ontology Evolution
11179-3
MOF
MMF
11179-2
Administered Item
Administered Item
Content Management
Metamodels for Basic Ontology Constructs
Registration Metamodel
XMDR Registry
Query Service
ODM Metamodel for CL
Normative Basic Elements
ODM Metamodel for OWL
Terminology Basic Classes Basic
Relationship
Ontologies
Analysis and Extraction
Registering
41
More Information
  • XMDR Web Site
  • http://xmdr.org
  • ISO/IEC 11179 Web Site
  • http://www.metadata-standards.org
  • OMG Web Site
  • http://www.omg.org
  • Annual Open Metadata Forum
  • Kobe, Japan, Spring 2006
  • W3C RDF Access Working Group
  • http://www.w3.org/2001/sw/DataAccess/
  • Bruce Bargmeyer
  • XMDR Principal Investigator
  • Contact concerning open position
  • Lawrence Berkeley National Laboratory
  • bebargmeyer_at_lbl.gov
  • 510-495-2905