Title: eXtending MetaData Registries: XMDR Project
1 eXtending MetaData Registries: XMDR Project Prototype
- XMDR Working Group
- Presentation to SC 32/WG 2 meeting
- And to SC 32/WG 2 (11179 P23 (E3), MMF, and OMG/ODM) Liaison Meeting, September 2005
- Toronto, Canada
2 XMDR Outline
- MDRs: Purpose and Goals (slides 3-8)
- Differentiate and position with other ISO and non-ISO standards (slides 9-11)
- Present XMDR Collaboration Project Prototype purposes and goals (slides 12-33)
- Describe metadata/technical platform/architecture for XMDR (slides 28-33)
- Demonstrate XMDR-it (live demo? screen snaps?)
- Explain importance of XMDR-it's three levels of constraints: XML, RDF, OWL (slides 34-38)
- Outline current challenges and future plans (slides 40-41)
- Contacts/URLs/Credits (slides 42-43)
3 In the Beginning
- Organizational structure and funding created a culture fostering stove-pipe, media-specific, heterogeneous systems
- Lack of information sharing/integration
- Lack of ability to aggregate information across systems
- Inability to retrieve data to answer questions
- Higher cost due to redundant/incompatible data maintenance
- Lack of technology options to enable integration of systems and data
Data Rich, Information Poor
4 OLD Approaches to Semantic Integration and Interoperability
- Option 1: Thou Shalt
- Everyone adopts a single data model for a particular domain
- GenBank, PDB, HL7 are examples of these sorts of models
- Advantages
- Ensures interoperability
- Minimal overhead
- Disadvantages
- Not flexible
- Does not allow data stores for particular use cases
- Unrealistic, especially across different large organizations
- Option 2: Multi-party Agreements
- Several sites agree on a format for interchanging data
- Sites maintain a local data dictionary, XML schema, etc. to describe the information model
- Advantages
- Flexible
- Low overhead
- Disadvantages
- Works only where existing bilateral (or multilateral) agreements exist
- Each new node must arrange to be interoperable with all other nodes or node cluster
Derived from slides by G. Komatsoulis, NCICB
5 New Approaches for Semantic Integration and Interoperability
- Option 3: Standards-based metadata descriptors
- Common Data Elements
- Common terms and concepts
- Provide a complete description of all attributes in a systematic, uniform and unambiguous format
- Description must be based on a common (but expandable) vocabulary
- Rely on concept codes, not concept names
- Track quality and accessibility
- Advantages
- Provides more ways to surface semantic matches: words and immutable codes
- Allows new systems to find points of interoperability with all other data systems at once
- Machine understandable
- Stable, immutable identifiers
- Low barriers to entry
- Disadvantages
- Requires a very complete description of the contents of an attribute
- Some degree of overhead associated with creating and maintaining a compatible system
- Based on ISO 11179 Information Technology, 2nd Edition, Metadata Registries (MDR), Parts 1-6
6 How Does Registered Metadata Promote Better Data Management?
- Provides a model that is consistent/exchangeable for capturing data about data
- Captures unambiguous semantic information in one place
- Documents details on the custodian of the data
- Links directly to online data
- Reduces risk/cost of duplicating data collections
7 What is a metadata registry good for?
- Data administration (design time)
- Databases, DB applications
- Messaging systems
- Terminologies, Taxonomies, Ontologies
- Data Integration (design and run time)
- federated queries, data warehousing
- Discovery of hidden relationships between data
- Support for interactive users (run time)
- Data entry forms, output explanation
- Navigation of databases
- Semantic Web Services (run time)
8 ISO/IEC 11179 MDR Standard
- Used to record and link
- Data elements
- Data element concepts
- Conceptual Domains
- Value Domains, e.g., enumerated value domains
- Classification Schemes
- ...
- Goals
- To record the unambiguous meaning of data elements
- Human understandable: current paradigm is natural-language definitions
- Machine understandable: formal definitions (and axioms) coming in Edition 3 (?)
9 ISO/IEC 11179 Metadata Registry Standard
- Spans both
- Conceptual models of the real world
- Concepts, data element concepts, classification schemes
- Terminologies, taxonomies, ontologies
- Information Artifacts
- Data elements, enumerated values, ...
- UML models (e.g., in caDSR)
10 Conceptual vs. Information-Centric Metadata Standards
[Diagram: information artifacts level (metadata; OMG standards: MOF, CWM, UML) and conceptual level (ontology standards: OWL, KIF, CL, ...; terminology standards); connections between the two levels: ???]
11 Space of Metadata Standards
ISO/IEC 11179 spans both conceptual models and information artifacts.
[Diagram: ISO/IEC 11179 Edition 3 Metadata Registry Standard spanning two regions: information artifacts, i.e., data elements, schemas, UML models, ... (OMG standards: MOF, UML, CWM); and conceptual models of the real world (ontology standards: OWL, KIF, CL, XTM, ...; terminology standards)]
12 Example Users of ISO/IEC 11179 Metadata Registries
- U.S. EPA Environmental Data Registry (EDR)
- System of Registries: needs to register lots of things
- National Cancer Institute (NCI) Cancer Data Standards Registry (caDSR)
- Registration of data elements and domain object models
- U.S. Veterans Health Administration
- U.S. Census Bureau
- U.S. Bureau of Labor Statistics
- Data Element Concepts, Value Domains
- Statistics Canada
- Australian Health Administration
- Data Elements
- European Environment Agency
13 Evolution of ISO/IEC 11179
- Edition 2 used to register data elements
- Classification Schemes, Data Element Concepts, Value Domains, Object Class, Property, etc.
- Data element = Data element concept + Value domain (representation)
- Representation = data type or code set (enumerated list)
- Concepts as optional
- Supported Object Class
- Not widely utilized
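The Edition 2 composition summarized above (data element = data element concept + value domain) can be sketched as a few Python dataclasses. This is a minimal illustration, not the 11179 metamodel or the XMDR-it code: the class names mirror the 11179 terms, while the field names, the `accepts` helper, the name-building rule, and the sample country-code values are hypothetical. The consistency check assumes the 11179-3 rule that a data element's value domain represents the same conceptual domain as its data element concept.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConceptualDomain:
    name: str


@dataclass(frozen=True)
class DataElementConcept:
    # Edition 2: a data element concept pairs an object class with a property
    object_class: str
    property_name: str
    conceptual_domain: ConceptualDomain


@dataclass(frozen=True)
class ValueDomain:
    name: str
    conceptual_domain: ConceptualDomain
    representation_class: str                 # e.g., "Code"
    datatype: str                             # e.g., "string"
    permissible_values: Tuple[str, ...] = ()  # empty tuple = non-enumerated domain

    def accepts(self, value: str) -> bool:
        # Enumerated domains accept only listed values; non-enumerated ones accept any value
        return not self.permissible_values or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    # Data element = data element concept + value domain (representation)
    concept: DataElementConcept
    value_domain: ValueDomain

    def __post_init__(self):
        # Assumed consistency rule: both halves must share one conceptual domain
        if self.concept.conceptual_domain != self.value_domain.conceptual_domain:
            raise ValueError("concept and value domain use different conceptual domains")

    @property
    def name(self) -> str:
        # Hypothetical naming rule: object class + property + representation term
        return " ".join([self.concept.object_class, self.concept.property_name,
                         self.value_domain.representation_class])


countries = ConceptualDomain("Countries of the World")
dec = DataElementConcept("Country", "Identifier", countries)
vd = ValueDomain("Sample country codes", countries, "Code", "string", ("CA", "JP", "US"))
de = DataElement(dec, vd)
print(de.name)                             # Country Identifier Code
print(vd.accepts("CA"), vd.accepts("XX"))  # True False
```

Keeping the concept half and the representation half as separate objects is what lets several data elements reuse one data element concept with different value domains.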
14 Challenges of ISO/IEC 11179 Metadata Registries
- Need to easily retrieve semantically related items
- Need to support discovery
- Enable navigation within and among taxonomies
- Even when producer and consumer do not share a common taxonomy
- Current Classification Scheme administered item not sufficient for registration of taxonomies, ontologies, etc.
- Need for consistency and a richer metamodel
- Complex relationships between items
15 Many taxonomies
- Purpose of registering taxonomies and ontologies along with data elements, values, etc. in one MDR
- Provide visibility to manage and consume
- Reuse
- Harmonization
- Support Discovery
- Enable navigation within and among taxonomies
- Even when producer and consumer do not share a common taxonomy
- Ensure availability
16 Example XML Management Challenge: XML, One Language, Many Vocabularies
<latitude_degrees>30N</latitude_degrees>
<latitude units="degrees" hemisphere="north">30</latitude>
<lat><hemisphere>N</hemisphere><deg>30</deg></lat>
- These 3 XML fragments are
- Equally valid ways to express the same data in XML
- Well-formed per the W3C XML Specification
- Mediation required for interoperability
Courtesy of ghayes_at_mitre.org
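One way to picture the mediation step is a small adapter that maps each of the three vocabularies onto a single canonical form. The sketch below uses Python's standard `xml.etree.ElementTree`; the canonical `(degrees, hemisphere)` pair and the function names are assumptions for illustration, and a real mediator would be driven by registered metadata rather than hard-coded tag names.

```python
import xml.etree.ElementTree as ET

FRAGMENTS = [
    '<latitude_degrees>30N</latitude_degrees>',
    '<latitude units="degrees" hemisphere="north">30</latitude>',
    '<lat><hemisphere>N</hemisphere><deg>30</deg></lat>',
]


def to_canonical(xml_text):
    """Map one of the three vocabularies to (degrees, hemisphere letter)."""
    root = ET.fromstring(xml_text)
    if root.tag == "latitude_degrees":
        text = root.text.strip()                  # e.g., "30N"
        return float(text[:-1]), text[-1].upper()
    if root.tag == "latitude":
        if root.get("units") != "degrees":
            raise ValueError(f"unsupported units: {root.get('units')!r}")
        return float(root.text), root.get("hemisphere")[0].upper()
    if root.tag == "lat":
        return float(root.findtext("deg")), root.findtext("hemisphere").strip().upper()
    raise ValueError(f"unknown vocabulary: {root.tag}")


def signed_latitude(xml_text):
    """Canonical signed value: north positive, south negative."""
    degrees, hemisphere = to_canonical(xml_text)
    return degrees if hemisphere == "N" else -degrees


for fragment in FRAGMENTS:
    print(to_canonical(fragment), signed_latitude(fragment))
```

Each new vocabulary needs its own branch, which is exactly the per-pair mediation cost that registering the meaning of each element is meant to reduce.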
17 Example of Semantic Integration for Interoperability
[Diagram: concept C1708 (Drug/Agent) with classes Agent and Drug; attribute labels include name, id, nSCNumber, NDCCode, CTEPName, approvalDate, approver, FDAIndID, fdaCode, IUPACName; concept code C1708:C41243 appears on both sides]
Names are inadequate for performing cross-object joins: use opaque/immutable concept identifiers
Example provided by Dr. George Komatsoulis, NCICB
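The point of this slide, joining on concept codes instead of attribute names, can be sketched in a few lines of Python. Assigning the slide's C1708 (Drug/Agent) to both classes is an assumption about the diagram; the attribute-level codes (`HYP:...`), the schema dictionaries, the pairing of `FDAIndID` with `fdaCode`, and the record values are hypothetical placeholders, not caDSR or NCI Thesaurus content.

```python
from collections import defaultdict

# Each local model annotates its class and attributes with concept codes.
# The attribute codes are hypothetical placeholders.
AGENT = {"class_concept": "C1708",
         "attributes": {"name": "HYP:NAME", "nSCNumber": "HYP:NSC", "FDAIndID": "HYP:FDA_IND"}}
DRUG = {"class_concept": "C1708",
        "attributes": {"name": "HYP:NAME", "NDCCode": "HYP:NDC", "fdaCode": "HYP:FDA_IND"}}


def aligned_attributes(a, b):
    """Pair attributes of two models that carry the same concept code."""
    attr_by_code = {code: attr for attr, code in b["attributes"].items()}
    return {attr: attr_by_code[code]
            for attr, code in a["attributes"].items() if code in attr_by_code}


def join_on_concept(rows_a, rows_b, a, b, code):
    """Join two record lists on the local attributes bound to `code`."""
    if a["class_concept"] != b["class_concept"]:
        return []  # different kinds of things: nothing to join
    attr_a = next(attr for attr, c in a["attributes"].items() if c == code)
    attr_b = next(attr for attr, c in b["attributes"].items() if c == code)
    index = defaultdict(list)
    for row in rows_b:
        index[row[attr_b]].append(row)
    return [(ra, rb) for ra in rows_a for rb in index.get(ra[attr_a], [])]


agents = [{"name": "agent-1", "nSCNumber": "nsc-1", "FDAIndID": "ind-7"}]
drugs = [{"name": "drug-1", "NDCCode": "ndc-1", "fdaCode": "ind-7"}]
print(aligned_attributes(AGENT, DRUG))  # {'name': 'name', 'FDAIndID': 'fdaCode'}
print(len(join_on_concept(agents, drugs, AGENT, DRUG, "HYP:FDA_IND")))  # 1
```

A name-based matcher would find only the shared `name` attribute and would miss the `FDAIndID`/`fdaCode` correspondence that the shared code exposes.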
18 Concept Use and Integration with 11179 Part 3, Edition 2
- Conceptual Domain: Agent
- Object Class: Chemopreventive Agent
- Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, Ursodiol
- Data Element Concept: Chemopreventive Agent NSC Number
- Value Domain: NSC Code
- Classification Schemes: caDSRTraining
- Property: NSCNumber
- Representation: Code
- Data Element: Chemopreventive Agent Name
- Context: caCORE
19 Semantic Framework: NCI Example
[Diagram labels: Develop Standards; caCORE SDK, caFramework (all data elements and objects registered in XMDR); Object-oriented Applications; caDSR (11179 MDR linked to KOS concepts); Data Standards; XMDR; NCI Thesaurus, NCI Metathesaurus, LOINC, MGED, ...; Concept Systems]
20 Where have we been? Where are we now? Where are we planning to go?
[Diagram labels: System manuals; Semantic grids; Data dictionaries; Semantics services (SSOA); 11179 E1; XMDR Project; 11179 E2; XML-related standards; 11179 E3; Terminologies, ontologies, etc.; Complex semantics management; Data engineering/XML Data; Semantics management for data; Data Standards/Data Administration]
21 What is XMDR? eXtended MetaData Registries
- A set of collaborative initiatives by groups with shared goals
- extend the ISO/IEC 11179 metadata registry standard (XMDR-s)
- EPA, NCI, DOD, LBNL, Mayo Clinic, USGS, Ecoterm, UNEP, GBIF
- align and harmonize various related metadata standards (XMDR-h)
- ISO WG2 11179, 19763, 20944, 24707; OMG ODM, CWM
- (several of the above groups have members on these committees)
- An open source implementation testbed (XMDR-it) to
- assemble test metadata from diverse sources and structures
- e.g., terminologies, ontologies, etc. for health, environment, geography, ...
- explore emerging semantic technologies (e.g., RDF, OWL, CL, ...)
- demonstrate new capabilities
- e.g., ontology lifecycle management and harmonization
22 Why do we need metadata registry extensions?
In order to
- Enhance capabilities to capture and retrieve semantics of information artifacts (e.g., data elements and value domains) in metadata registries using terminologies, taxonomies, ontologies, etc.
- Improve representation of relationships between data (e.g., objects, data elements and domains) and concept structures (ontologies, taxonomies, thesauri, terminologies, ...)
- Register complex semantic metadata (concept structures, terminologies) in more formal, systematic ways (e.g., description logic) to facilitate machine processing for
- creating and managing names, definitions, terms, etc.
- linking together data elements, etc. across multiple systems
- discovering relationships among data elements and terms
23 XMDR Semantic Extensions Goals
- Sharable data that can easily be identified and aggregated across organizations
- Unambiguous metadata characteristics to convey semantic, syntactic and lexical meaning
- Human AND machine understandable
- Registration and management of everything useful for administering and managing data, including concept systems, ontologies, etc.
- Machine understanding of semantics to facilitate inference, aggregation, and agent services
24 Goals of the open source XMDR-it prototype implementation testbed
- Demonstrate feasibility and utility of proposed revisions to ISO/IEC 11179
- Provide open-source reference implementation with XMDR capabilities
- Determine the necessary features to leverage semantic interoperability between concept systems and data elements
- e.g., for ontology lifecycle management and harmonization
- Explore benefits of representing XMDR content using emerging semantic technologies (e.g., RDF, OWL, CL, ...)
- integrate open source tools to create, maintain, and deploy XMDR standards
- test capabilities and performance of candidate tools
- Assemble semantic metadata with different structures from diverse sources to test various semantic technologies
- terminologies, thesauri, ontologies, ...
- From health, environment, geography, ...
- Help resolve registration and harmonization issues for different metadata standards, including ODM and MMF
25 Role of terminologies and ontologies in metadata registries
- Sources for concepts, concept definitions, object classes, properties, value meanings, external references
- Terminologies as classification schemes (e.g., taxonomies)
- Ontologies to specify semantic relationships
- is-a, part-of, instance-of, ...
- inheritance permits more compact definitions
- semantic pathways for indexing
- facilitates searching subclasses and inverses
- Frameworks for integration of multiple schemas
- Help connect metadata entities via shared terms
- via automatic indexing of metadata words
- via text values from specific metadata elements
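Connecting metadata entities through shared terms found by automatic indexing can be sketched as a simple inverted index. This is a toy under stated assumptions: the entity identifiers, the stop-word list, and lowercase tokenizing without stemming are illustrative choices, and a production registry would use a full-text engine instead. The sample texts reuse item examples named elsewhere in this deck (Country Name, Country Label, Countries of the World).

```python
import re
from collections import defaultdict
from itertools import combinations

STOPWORDS = {"a", "an", "and", "for", "of", "the"}


def tokenize(text):
    """Lowercase words minus stop words (no stemming, so 'country' != 'countries')."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def build_index(entities):
    """entities: id -> name/definition text. Returns word -> set of entity ids."""
    index = defaultdict(set)
    for entity_id, text in entities.items():
        for word in tokenize(text):
            index[word].add(entity_id)
    return index


def shared_term_links(index):
    """(id_a, id_b) -> set of indexed words the two entities have in common."""
    links = defaultdict(set)
    for word, ids in index.items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(word)
    return dict(links)


entities = {
    "de_country_name": "Country Name",
    "dec_country_label": "Country Label",
    "cd_countries": "Countries of the World",
    "vd_countries": "countries of the world",
}
for pair, words in sorted(shared_term_links(build_index(entities)).items()):
    print(pair, sorted(words))
```

Because there is no stemming, "country" and "countries" stay unlinked here; linking such variants is where a terminology or ontology would add value over raw word matching.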
26 What the XMDR Project IS NOT!
- An attempt to turn 11179 metadata registries into a development and maintenance facility for every type of concept structure
- An attempt to standardize the complete range of terminology and ontology data services
- Production implementation for one organization
27 XMDR-it example content has been loaded from diverse sources via LexGrid and XSLT
[Diagram labels: Concept System A; XSLT script (Harold Solbrig, Mayo Clinic); A Concepts; Original Source A; LexGrid Source A; A Relationships]
28 Additional Metadata Content will be added to XMDR Prototype
- EDR (EPA Environmental Data Registry)
- caDSR (NCI Cancer Data Standards Registry)
- IETF RFC 3066 Language Codes
- NBII Biocomplexity Thesaurus
- USGS Geographic Names Information System
- Getty Thesaurus of Geographic Names
- I.T.I.S. - Integrated Taxonomic Information System
- Adult Mouse Anatomy
- Foundational Model of Anatomy
- NASA SWEET (Semantic Web for Earth and Environmental Terminology)
- EPA Chemical Substance Registry
- GO (Gene Ontology), Agrovoc
29 XMDR-it now contains an XML file for each 11179 item
Administered items (count shown on the chart):
- Contexts for Administered Items (1), e.g., XMDR?
- Concept Systems (7), e.g., GEMET, DTIC
- Data Elements (499), e.g., Country Name
- Data Element Concepts (274), e.g., Country Label
- Conceptual Domains (1), e.g., Countries of the World
- Representation Classes (2), e.g., Code
- Value Domains (6), e.g., countries of the world
- Relationship Types (0), e.g., ??
Other
30 Each metadata entity (object, concept, data element) is
- Logically stored as a separate XML file/document
- Stored in Subversion code management system
- provides a versioning capability
- stores files in Berkeley DB
- Berkeley DB provides transactions, backups, ...
- Compliant with three complementary standards
- An XML Schema (document constraints)
- An RDF Schema (graph constraints)
- OWL ontology
31 XMDR-it XML schema provides a number of important benefits
- Schema specifies what is required as well as what is legal
- Divides metadata into files conforming to XML schema
- Normalizes data (à la relational: one fact in one place)
- Facilitates XSLT transformations by reducing degrees of freedom to a canonical encoding within the RDF standard
- RELAX NG used to create and check XMDR-it schema
- RNG validator enforces many OWL ontology constraints
- Trang automatically translates it into XML Schema syntax
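The "required versus legal" and "one fact in one place" ideas can be illustrated with a tiny checker built on `xml.etree.ElementTree`. It is only a stand-in: XMDR-it uses RELAX NG, and the element names, the content model, and the rule that references to other items must be bare `ref` attributes are hypothetical, not the XMDR-it schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model for one item type (not the XMDR-it schema)
CONTENT_MODEL = {
    "data_element": {
        "required": {"identifier", "name", "data_element_concept", "value_domain"},
        "optional": {"definition", "classified_by"},
    }
}
# Items that live in their own files: point at them, never copy them
REFERENCE_ELEMENTS = {"data_element_concept", "value_domain", "classified_by"}


def check(xml_text):
    """Return a list of problems; an empty list means the document passes."""
    root = ET.fromstring(xml_text)
    model = CONTENT_MODEL.get(root.tag)
    if model is None:
        return ["unknown item type: " + root.tag]
    present = {child.tag for child in root}
    errors = ["missing required element: " + tag
              for tag in sorted(model["required"] - present)]
    legal = model["required"] | model["optional"]
    for child in root:
        if child.tag not in legal:
            errors.append("illegal element: " + child.tag)
        elif child.tag in REFERENCE_ELEMENTS and (child.get("ref") is None or len(child)):
            errors.append(child.tag + " must be a bare ref to another item")
    return errors


good = """<data_element>
  <identifier>DE-0001</identifier>
  <name>Country Name</name>
  <data_element_concept ref="DEC-0001"/>
  <value_domain ref="VD-0001"/>
</data_element>"""
bad = """<data_element>
  <identifier>DE-0002</identifier>
  <name>Copy-paste example</name>
  <data_element_concept><name>embedded copy</name></data_element_concept>
  <colour>red</colour>
</data_element>"""
print(check(good))
print(check(bad))
```

Forcing references instead of embedded copies is the XML analogue of relational normalization: the data element concept's facts live only in its own file.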
32 RDF provides complementary benefits on top of XML
- All the advantages of XML, plus
- RDF provides more explicit semantics than XML
- Users can employ a growing set of RDF tools
- e.g., SPARQL query language, SWRL rule language, Jena inference
- More powerful retrieval capabilities
- Using many different RDF graph query tools
- RDF's graph data model supports inference
- e.g., inclusion of subsumed sub-classes
- Results can be either
- tuples (à la relational tables)
- XML/RDF graphs (being developed for W3C's SPARQL)
- Facilitates integrated use and management of multiple related concepts spanning different concept systems
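The "inclusion of subsumed sub-classes" bullet corresponds to two RDFS entailment rules: `rdfs:subClassOf` is transitive (rdfs11), and an instance of a class is also an instance of each of its superclasses (rdfs9). A plain-Python sketch over (subject, predicate, object) tuples, standing in for tools such as Jena, could look like this; the `ex:` resources are illustrative, borrowed from the caDSR example earlier in the deck, and are not drawn from a real terminology. Results come back as one-column tuples, echoing the "tuples" result form above.

```python
from collections import defaultdict

SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


def superclasses(triples, cls):
    """cls plus every class it is transitively a subclass of."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)
    seen, stack = {cls}, [cls]
    while stack:  # depth-first walk; `seen` also guards against cycles
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(triples, cls):
    """Instances typed with cls or any subclass of it, as sorted one-column rows."""
    return sorted({(s,) for s, p, o in triples
                   if p == TYPE and cls in superclasses(triples, o)})


triples = [
    ("ex:ChemopreventiveAgent", SUBCLASS_OF, "ex:Agent"),
    ("ex:Eflornithine", TYPE, "ex:ChemopreventiveAgent"),
    ("ex:Ursodiol", TYPE, "ex:ChemopreventiveAgent"),
    ("ex:OtherAgent", TYPE, "ex:Agent"),
]
print(instances_of(triples, "ex:Agent"))
print(instances_of(triples, "ex:ChemopreventiveAgent"))
```

A query for the broader class picks up instances asserted only against the narrower one, which plain XML element matching would miss.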
33 OWL ontology specification adds richer semantics atop RDF and XML
- All the advantages of XML and RDF, plus
- RNG validator enforces many OWL ontology constraints
- Classes and subclasses (is-a relationships)
- Union classes
- Inverses
- Same-as, same-property-as, same-class-as
- Restriction classes (restrict range, cardinality, etc. of property based on type of subject)
- and tools for creation, editing, visualization, and management (Protégé plug-ins)
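Two of the OWL features listed, inverses and cardinality restrictions, can be sketched over the same kind of triple tuples. The property names (`xmdr:hasValueDomain`, `xmdr:valueDomainOf`) and resources are hypothetical. The cardinality check is closed-world validation, which is what a registry validator would typically want; an OWL reasoner works under the open-world assumption and, given two values for a max-1 property, would instead infer that they denote the same individual unless they are declared different.

```python
from collections import Counter

TYPE = "rdf:type"


def materialize_inverses(triples, inverse_pairs):
    """For each (p, q) declared inverse, add (o, q, s) for every (s, p, o), and vice versa."""
    inverse = {}
    for p, q in inverse_pairs:
        inverse[p], inverse[q] = q, p
    result = set(triples)
    for s, p, o in triples:
        if p in inverse:
            result.add((o, inverse[p], s))
    return result


def max_cardinality_violations(triples, cls, prop, max_count):
    """Members of cls with more than max_count values of prop (closed-world check)."""
    members = {s for s, p, o in triples if p == TYPE and o == cls}
    counts = Counter(s for s, p, o in set(triples) if p == prop and s in members)
    return sorted(s for s, n in counts.items() if n > max_count)


triples = {
    ("ex:DE1", TYPE, "xmdr:DataElement"),
    ("ex:DE1", "xmdr:hasValueDomain", "ex:VD1"),
    ("ex:DE2", TYPE, "xmdr:DataElement"),
    ("ex:DE2", "xmdr:hasValueDomain", "ex:VD1"),
    ("ex:DE2", "xmdr:hasValueDomain", "ex:VD2"),
}
closed = materialize_inverses(triples, [("xmdr:hasValueDomain", "xmdr:valueDomainOf")])
print(sorted(t for t in closed if t[1] == "xmdr:valueDomainOf"))
print(max_cardinality_violations(triples, "xmdr:DataElement", "xmdr:hasValueDomain", 1))
```

Declaring the inverse once means only one direction has to be stored per file, while queries can still navigate from a value domain back to the data elements that use it.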
34 OWL, RDF and XML Schema used to specify XMDR-it as UML for 11179-X metamodel
[Diagram labels: OWL XMDR Ontology (annotations, types, cardinalities); XMDR XML Schema; Trang; XMDR's RELAX NG Schema; Triples (binary labeled relationships); RDF Spec; XML Schema Language spec; XML Objects]
Open questions: What things go in their own files? Which property direction is stored? Sequential ordering of properties
35 XMDR-it Architecture: Initial Implemented Modules
[Diagram labels: External Interface; RegistryStore; Registry; Java; WritableRegistryStore; Subversion; Authentication Service (defer); RetrievalIndex; MetadataValidator (defer?), a schema-driven syntax checker; Jena, Xerces; LogicBasedIndex; FullTextIndex; Jena, OWI KS, Kowari, Racer; Lucene; MappingEngine (defer); Ontology Editor; 11179 OWL Ontology; Protégé. Legend: Composition (tight ownership), Generalization, Aggregation (loose ownership)]
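The module split on this slide can be read as a small set of interfaces. The Python sketch below is a hypothetical rendering, not the XMDR-it Java code: the class names `RegistryStore`, `WritableRegistryStore`, `RetrievalIndex`, and `Registry` follow the slide, but every method name and signature is an assumption, the in-memory revision list stands in for the Subversion-backed store, and `WordIndex` stands in for the Lucene-based FullTextIndex.

```python
import re
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from collections import defaultdict


class RegistryStore(ABC):
    """Read access to registry items, one XML document per item."""
    @abstractmethod
    def get(self, item_id, revision=-1): ...


class WritableRegistryStore(RegistryStore):
    @abstractmethod
    def put(self, item_id, xml_text, message): ...


class RetrievalIndex(ABC):
    @abstractmethod
    def add(self, item_id, xml_text): ...

    @abstractmethod
    def search(self, word): ...


class InMemoryVersionedStore(WritableRegistryStore):
    """Keeps every revision of every item (hypothetical stand-in for Subversion)."""
    def __init__(self):
        self._history = defaultdict(list)

    def get(self, item_id, revision=-1):
        return self._history[item_id][revision][0]

    def put(self, item_id, xml_text, message):
        self._history[item_id].append((xml_text, message))
        return len(self._history[item_id])  # 1-based revision number


class WordIndex(RetrievalIndex):
    """Word -> item ids over the XML text content (hypothetical stand-in for Lucene)."""
    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, xml_text):
        for ids in self._postings.values():  # forget the previous revision
            ids.discard(item_id)
        text = " ".join(ET.fromstring(xml_text).itertext())
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(item_id)

    def search(self, word):
        return set(self._postings.get(word.lower(), ()))


class Registry:
    """Facade: writes go to the store, and every index is kept in sync."""
    def __init__(self, store, indexes):
        self.store, self.indexes = store, indexes

    def submit(self, item_id, xml_text, message):
        revision = self.store.put(item_id, xml_text, message)
        for index in self.indexes:
            index.add(item_id, xml_text)
        return revision

    def search(self, word):
        hits = set()
        for index in self.indexes:
            hits |= index.search(word)
        return sorted(hits)


registry = Registry(InMemoryVersionedStore(), [WordIndex()])
registry.submit("DE-1", "<data_element><name>Country Name</name></data_element>", "initial")
registry.submit("DE-1", "<data_element><name>Country Label</name></data_element>", "rename")
print(registry.search("label"), registry.search("name"))  # ['DE-1'] []
print(registry.store.get("DE-1", 0))
```

Routing every write through one facade is one way to keep several indexes (full-text and logic-based) consistent with the versioned store, while older revisions stay retrievable from the store itself.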
36 XMDR-it Advanced Search Interface helps explore registry contents
http://erdos.lbl.gov/xmdr2/
[Screenshot: search for "any(country (code name))"; XMDR Web Interface 0.4, LBNL]
37 Technical Challenges and Issues for XMDR Implementation Testbed
- Complexity
- Representation of relationships
- XML + RDF + OWL is a lot
- Scalability and performance
- Currently includes only 60,000 objects
- Maybe indexing and/or distributed registries will help?
- RDF issues
- RDF queries yield tuples, not RDF objects (but W3C at work)
- RDF tools won't create XMDR files (add wrapper constraints?)
- User-friendly interface for RDF queries (later)
- External data sources, ontologies, terminologies
- Harmonization with ODM and MMF
- XML/RDF object results display and browsing
- Something like EDR UI with link labels and inverse refs
38 XMDR-s 11179 Extensions: Challenges and Issues
- Harmonize and align XMDR recommendations with
- ISO 11179 Metadata Registries (MDR)
- ISO 19763 Framework for Metamodel Interoperability (MMF)
- ISO 24707 Common Logic (CL)
- ISO 20944 Metadata Interoperability and Bindings
- OMG's ODM
- Improve the current Part 3 of ISO 11179 standard
- Separate registration section of the model so we can register "anything"
- Simplify mechanisms for registering and using relationships between administered items, along with mechanisms for registering and using ontologies
- Improve the "classification" region in Part 3, particularly with regard to concepts and relationships
39 XMDR Registry (from XMDR and MMF meeting)
40 The Requirements for XMDR (from XMDR and MMF meeting)
[Diagram labels: Ontology Evolution; 11179-3; MOF; MMF; 11179-2; Administered Item (shown twice); Content Management; Metamodels for Basic Ontology Constructs; Registration Metamodel; XMDR Registry; Query Service; ODM Metamodel for CL; Normative Basic Elements; ODM Metamodel for OWL; Terminology Basic Classes; Basic Relationship; Ontologies; Analysis and Extraction; Registering]
41 More Information
- XMDR Web Site
- http://xmdr.org
- ISO/IEC 11179 Web site
- http://www.metadata-standards.org
- OMG Web Site
- http://www.omg.org
- Annual Open Metadata Forum
- Kobe, Japan, Spring 2006
- W3C RDF Data Access Working Group
- http://www.w3.org/2001/sw/DataAccess/
- Bruce Bargmeyer
- XMDR Principal Investigator
- Contact concerning open position
- Lawrence Berkeley National Laboratory
- bebargmeyer_at_lbl.gov
- 510-495-2905