Challenges to Developing a Syntax to Capture Metadata Mapping within XMDR


Title: Challenges to Developing a Syntax to Capture Metadata Mapping within XMDR


1
Challenges to Developing a Syntax to Capture
Metadata Mapping within XMDR
  • (or A Social Science Data Expert Meets
    Semantics)
  • Fredric Gey
  • UC Berkeley
  • XMDR Project Meeting
  • May, 2007
  • UC Berkeley Faculty Club

2
XMDR Mapping Capture (outline)
  • In January 2007 proposed strawman syntax for XMDR
    to capture mapping between classification systems
  • Heavily influenced by the SIC NAICS content
    loading
  • Very naïve compared to real-world challenges
  • Conceptual framework, example problems
  • Drawn principally from ontology mapping research
  • Discussion points for future development

3
Metadata/Ontology Mapping
  • Ontology mapping is now a substantial research
    area
  • Driven by need to integrate disparate ontologies
    produced by different organizations or different
    units of one organization
  • Commercial and research tools under development
  • Tools generally solve pieces of the mapping
    problem
  • Five workshops on the subject: OM-2006 was held
    in Georgia in Nov 2006; OM-2007 will be in Busan,
    Korea this November
  • Evaluating mapping tools is also being researched
  • Can you compare apples and oranges (or rather
    apple processing equipment versus orange
    processing equipment)?
  • OM Ontology Alignment Initiative (OM 2004, 2005,
    2006)
  • Elucidates problems and generates evaluation
    frameworks
  • Dimensions of mapping challenge pertinent to XMDR

4
Relationship to Schema Mapping
  • Schema mapping is closely related to Ontology
    mapping
  • Schema mapping researched in the database
    community
  • Driven by need to integrate different databases
    produced by different organizations or different
    units of one organization
  • Essential for Federated databases where database
    administration is under local (distributed)
    control
  • Simple example: person name
  • DB1: Name (type text)
  • DB2: Lastname, Firstname, Middleinitial, Title
    (e.g. Dr, Mr, Ms), Suffix (e.g. Jr, III)
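
The person-name example can be sketched in a few lines of Python. This is only an illustration: the class `Db2Name`, the function `db2_to_db1`, and the output format (title first, suffix after a comma) are assumptions, not part of either database.

```python
from dataclasses import dataclass


@dataclass
class Db2Name:
    """DB2's structured name (field names follow the slide)."""
    lastname: str
    firstname: str
    middleinitial: str = ""
    title: str = ""
    suffix: str = ""


def db2_to_db1(name: Db2Name) -> str:
    """Flatten a DB2 name into DB1's single text field.

    The ordering and punctuation are an assumed convention.
    """
    parts = [name.title, name.firstname, name.middleinitial, name.lastname]
    text = " ".join(p for p in parts if p)  # skip empty fields
    return f"{text}, {name.suffix}" if name.suffix else text
```

Going from DB2 to DB1 only concatenates fields. The reverse direction needs parsing rules and can be ambiguous, which is why even this simple case is a real mapping problem.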

5
Conceptual Context of Mapping
  • Mapping (also known as ontology alignment) can
    consist of
  • Integration of multiple component ontologies into
    a unified ontology
  • Alignment between part of a global ontology to an
    external local ontology
  • Alignment between different parts of global
    ontologies
  • Alignment between special subject areas having
    overlapping instances (e.g. medical literature to
    computational biology literature)
  • Mapping tools (Noy & Musen 2002) can
  • Be partial, not global
  • Be instance-based or not (for common or
    non-overlapping instances)
  • Be lexical-based (i.e. utilize lexical clues to
    infer equivalences)
  • Deal only with class hierarchies (not
    instances)
  • Produce only articulation rules between
    ontologies

6
Strawman Syntax for XMDR mapping from an Old
Classification to a New Classification
<lgRel:association association="mapsTo"
    forwardName="mapsTo" reverseName="mappedFrom">
  <targetCodingScheme="coding_scheme_name">
  <lgRel:sourceConcept sourceConcept="source_concept_code">
    <lgRel:targetConcept targetConcept="target_concept_code">
      <lgRel:associationQualification
          associationQualifier="exact | almost_exact | approx" />
      <lgRel:associationQualification
          MappingType="semantic | statistical" />
      <lgRel:associationQualification
          MapsToDegree="fraction" />
      <lgRel:associationQualification
          MapsFromDegree="fraction" />
      <lgRel:associationQualification
          MapsToThreshold="percent" />
    </lgRel:targetConcept>
  </lgRel:sourceConcept>
</lgRel:association>

7
Strawman Syntax Definitions of Qualifiers
  • MappingType
  • Semantic - meaning of the source concept is
    aligned with the meaning of the target concept
  • Statistical - mapping is done using a common
    database indexed/classified by both source and
    target concept codes
  • <associationQualifier="exact | almost_exact |
    approx" />
  • exact - mapping is 1-1
  • almost_exact - for statistical mapping, mapping
    is almost exact within some ε (epsilon) percent
    difference as a MapsToThreshold
  • approx - mapping is inexact with overlaps between
    concepts
  • Degree of (statistical) mapping
  • <lgRel:associationQualification
    MapsToDegree="fraction1" />
  • <lgRel:associationQualification
    MapsFromDegree="fraction2" />
  • i.e. fraction2 of the source concept is
    represented by fraction1 of the target concept
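
The qualifier definitions can be read as a small record with a few validity rules. Below is a minimal Python sketch of that reading. The class name `Mapping`, its field names, and the rule that `almost_exact` must carry a threshold are assumptions made for illustration, not part of the strawman proposal.

```python
from dataclasses import dataclass
from typing import Optional

QUALIFIERS = {"exact", "almost_exact", "approx"}
MAPPING_TYPES = {"semantic", "statistical"}


@dataclass(frozen=True)
class Mapping:
    """One source-to-target mapping with the strawman qualifiers."""
    source_concept: str
    target_concept: str
    qualifier: str                              # associationQualifier
    mapping_type: str                           # MappingType
    maps_to_degree: Optional[float] = None      # fraction1
    maps_from_degree: Optional[float] = None    # fraction2
    maps_to_threshold: Optional[float] = None   # percent (epsilon)

    def __post_init__(self) -> None:
        if self.qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier}")
        if self.mapping_type not in MAPPING_TYPES:
            raise ValueError(f"unknown mapping type: {self.mapping_type}")
        for name in ("maps_to_degree", "maps_from_degree"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be a fraction in [0, 1]")
        # Assumed rule: almost_exact is only meaningful with a threshold.
        if self.qualifier == "almost_exact" and self.maps_to_threshold is None:
            raise ValueError("almost_exact requires maps_to_threshold")
```

For example, `Mapping("SIC1", "NAICS1", "approx", "statistical", 0.35, 0.65)` is accepted, while an unknown qualifier or a degree outside [0, 1] raises `ValueError`. The codes here are placeholders.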

8
Strawman Syntax Additional Qualifiers?
  • MappingMode
  • Automatic - concepts are mapped between source
    and target by some software
  • Manual - mapping is done by human editors
  • Observation: semantic mapping could be automatic
    if done by NLP software which compares the
    textual definitions of source and target concept
    codes
  • ReferenceDatabase - the detailed database which
    has been utilized for the statistical mapping
    (e.g. 1997 Economic Census company establishment
    for SIC and NAICS)
  • Your qualifier goes here
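
As a rough illustration of the observation about automatic semantic mapping, the sketch below compares two textual definitions by word overlap (Jaccard similarity). This is a deliberately simple stand-in for real NLP software; the function names are hypothetical and the measure is an assumption, not something proposed in the talk.

```python
import re


def tokens(definition: str) -> set[str]:
    """Lower-case alphabetic words of a definition."""
    return set(re.findall(r"[a-z]+", definition.lower()))


def definition_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the words in two definitions (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A mapping tool could propose a candidate pair when the score exceeds some cutoff and record the result with MappingMode set to Automatic. The cutoff itself would have to be chosen empirically.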

9
SIC to NAICS Matching (example for Mineral
Industries)

10
NAICS to SIC Matching
  • Important for historical data comparison and
    development of time series

11
NAICS to SIC Matching
  • Mappings are
  • One to one (rarely)
  • Many to one (sometimes)
  • Many to many (usually)
  • Census Bureau supplies
  • Comparable statistics for 1997 Economic Census
  • Downloadable code files with bridge
  • Open Issue
  • Can statistical allocation between code sets be
    captured?
  • I.e. NAICS1 → SIC1 (.35), SIC2 (.65)
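
The open issue can be made concrete with a short Python sketch that splits a value reported under one code across the codes it maps to. The function `allocate`, the `bridge` table layout, and the requirement that shares sum to 1 are assumptions for illustration; the codes and the .35/.65 split are the placeholders from the slide.

```python
def allocate(value: float, shares: dict[str, float]) -> dict[str, float]:
    """Split `value` across target codes in proportion to `shares`.

    Assumes the shares for one source code sum to 1.
    """
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total}, expected 1.0")
    return {code: value * share for code, share in shares.items()}


# Hypothetical bridge table: source code -> {target code: share}
bridge = {"NAICS1": {"SIC1": 0.35, "SIC2": 0.65}}
sic_values = allocate(200.0, bridge["NAICS1"])
```

Capturing this in the mapping syntax would mean storing one share per (source, target) pair, which is what a MapsToDegree-style qualifier could carry.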

12
What are the Limitations/Assumptions of the
Strawman Proposal?
  • It assumes the mapping is instance based
  • Good enough for classification schemes like SIC
    and NAICS where the instances (reporting of
    individual business establishment statistics) are
    identical
  • It only provides for individual
    concept-to-concept equivalence
  • It assumes the structure of the source and target
    will provide context of relationships between
    (exact or amorphous) groups of concepts
  • It does not capture the possibility of large
    numbers of non-overlapping concepts when the
    conceptual overlap between two ontologies is
    small
  • It does not capture structural inconsistency
  • It does not capture temporal drift and evolution
  • It does not capture that corresponding concepts
    may be considerably more detailed (have more
    attributes) in a particular ontology

13
Ontology Integration
  • Multiple component ontologies are integrated into
    a single global ontology
  • Oi ∩ Ok = ∅ (null set), or
  • Oi ∩ Ok = ε
  • Does XMDR need to concern itself with this case?

[Diagram: Oglobal]

14
Ontology Structural Inconsistency (thanks Mala)
  • How do we capture that two different ontologies
    may have the identical concept at a different
    structural positions in their hierarchies/directed
    graphs?

[Diagram: two hierarchies (labels O1, O2, C11, C1m) holding the
same concepts at different structural positions]
Computers
  software
    Internet
    Standalone
  hardware
    Internet
    Standalone
Computing Machines
  Internet
    Software
    Hardware
  Standalone
    Software
    Hardware

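
One way to detect this kind of inconsistency is sketched below in Python, using the Computers and Computing Machines hierarchies from this slide. It treats a leaf concept as the unordered set of labels on its path, which is an assumption made for illustration; the function names and the nested-dict encoding are hypothetical.

```python
def leaf_paths(tree: dict, prefix: tuple = ()) -> list[tuple[str, ...]]:
    """Return the label path (lower-cased) to every leaf of a nested dict."""
    paths = []
    for label, children in tree.items():
        path = prefix + (label.lower(),)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append(path)
    return paths


def structural_mismatches(t1: dict, t2: dict) -> list[tuple[tuple, tuple]]:
    """Pairs of paths that carry the same labels in a different order."""
    by_labels = {frozenset(p): p for p in leaf_paths(t2)}
    mismatches = []
    for path in leaf_paths(t1):
        other = by_labels.get(frozenset(path))
        if other is not None and other != path:
            mismatches.append((path, other))
    return mismatches


computers = {
    "software": {"Internet": {}, "Standalone": {}},
    "hardware": {"Internet": {}, "Standalone": {}},
}
computing_machines = {
    "Internet": {"Software": {}, "Hardware": {}},
    "Standalone": {"Software": {}, "Hardware": {}},
}
mismatches = structural_mismatches(computers, computing_machines)
```

Every leaf in one hierarchy has a counterpart in the other, but each sits under a different parent, so a concept-to-concept mapping alone would not record that the structures disagree.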
15
Ontology Concept Subset Mapping
  • How do we capture that subsets of concepts
    between two different ontologies are equivalent
    rather than individual concepts?

[Diagram: concept subset {C1k, C1m} of O1 and subset {C2r, C2s} of O2]

16
Ontology Overlap/Non-overlap
  • How do we capture that two different ontologies
    each have large numbers of concepts with no
    corresponding concept in the other ontology?
  • |O1 ∩ O2| << |O1 ∪ O2|
  • Is this important?

[Diagram: ontologies O1 and O2 with concepts C1k, C1m and C2r, C2s]

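
The extent of non-overlap could be reported alongside the mappings themselves. The Python sketch below computes it from a set of concept pairs; it assumes the pairs are one-to-one, and the function name and report layout are hypothetical.

```python
def overlap_report(
    o1: set[str], o2: set[str], pairs: set[tuple[str, str]]
) -> dict:
    """Summarize how much of two ontologies is covered by `pairs`.

    Assumes each pair is a 1-1 equivalence, so a mapped pair counts
    as one shared concept in the union.
    """
    mapped1 = {source for source, _ in pairs}
    mapped2 = {target for _, target in pairs}
    if not mapped1 <= o1 or not mapped2 <= o2:
        raise ValueError("pairs refer to concepts outside the ontologies")
    shared = len(pairs)
    union = len(o1) + len(o2) - shared
    return {
        "unmapped_in_o1": sorted(o1 - mapped1),
        "unmapped_in_o2": sorted(o2 - mapped2),
        "overlap_ratio": shared / union if union else 0.0,
    }
```

A small overlap ratio is the situation described on this slide: most concepts on each side have no counterpart, and the mapping syntax currently has no place to say so.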
17
Ontology Granularity
  • How do we capture that concepts in one ontology
    may have significantly greater detail than in
    another ontology
  • In terms of subconcepts
  • In terms of concept attributes/properties
  • Omega is light on geographic places compared to
    GNS

GNS (Geographic Names Server)
Omega (geography)
18
Ontology Granularity (Omega)
  • Omega is light (and inconsistent) on geographic
    places compared to GNS

19
Ontology Granularity (GNS)
  • GeoNames Server search of populated places in
    Finland
  • Tampere, Finland is described by
  • Feature type PPL (populated place)
  • Latitude/Longitude
  • Name variants
  • http://en.wikipedia.org/wiki/Tampere

20
Discussion Issues on XMDR Mapping Syntaxes
  • Beyond individual concept mapping
  • Individual concept to set of concepts
  • Between sets or clusters of equivalent concepts
  • Capturing extent of non-overlap
  • Granularity differences between equivalent
    source and target concepts
  • Capturing structure in the mapping syntax
  • Dealing with structural mismatches
  • Can we borrow from the mathematical literature on
    topological mapping between directed graph
    structures?