Title: Challenges to Developing a Syntax to Capture Metadata Mapping within XMDR
1Challenges to Developing a Syntax to Capture
Metadata Mapping within XMDR
- (or A Social Science Data Expert Meets Semantics)
- Fredric Gey
- UC Berkeley
- XMDR Project Meeting
- May, 2007
- UC Berkeley Faculty Club
2XMDR Mapping Capture (outline)
- In January 2007 proposed strawman syntax for XMDR to capture mapping between classification systems
  - Heavily influenced by the SIC/NAICS content loading
  - Very naïve compared to real-world challenges
- Conceptual framework, example problems
  - Drawn principally from ontology mapping research
- Discussion points for future development
3Metadata/Ontology Mapping
- Ontology mapping is now a substantial research area
  - Driven by need to integrate disparate ontologies produced by different organizations or different units of one organization
  - Commercial and research tools under development
  - Tools generally solve pieces of the mapping problem
  - Five workshops on the subject: OM-2006 was held in Georgia in Nov 2006; OM-2007 will be in Busan, Korea this November
- Evaluating mapping tools is also being researched
  - Can you compare apples and oranges (or rather apple processing equipment versus orange processing equipment)?
  - OM Ontology Alignment Initiative (OM 2004, 2005, 2006)
  - Elucidates problems and generates evaluation frameworks
- Dimensions of mapping challenge pertinent to XMDR
4Relationship to Schema Mapping
- Schema mapping is closely related to ontology mapping
- Schema mapping is researched in the database community
  - Driven by need to integrate different databases produced by different organizations or different units of one organization
  - Essential for federated databases where database administration is under local (distributed) control
- Simple example: person name
  - DB1: Name (type text)
  - DB2: Lastname, Firstname, Middleinitial, Title (e.g. Dr, Mr, Ms), Suffix (e.g. Jr, III)
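The person-name example can be sketched in a few lines of Python. This is only an illustration: the class name `Db2Person`, the function `to_db1_name`, and the field ordering in the output are assumptions, not part of either schema on the slide.

```python
from dataclasses import dataclass


@dataclass
class Db2Person:
    """Hypothetical record shaped like the structured DB2 schema."""
    lastname: str
    firstname: str
    middleinitial: str = ""
    title: str = ""   # e.g. Dr, Mr, Ms
    suffix: str = ""  # e.g. Jr, III


def to_db1_name(person: Db2Person) -> str:
    """Collapse the structured DB2 fields into DB1's single text field."""
    parts = [person.title, person.firstname, person.middleinitial,
             person.lastname, person.suffix]
    # Skip empty fields so no double spaces appear in the result.
    return " ".join(part for part in parts if part)


print(to_db1_name(Db2Person("King", "Martin", "L", "Dr", "Jr")))
```

Mapping in this direction is a simple concatenation. The reverse direction (splitting a free-text DB1 name into DB2 fields) is ambiguous and would need heuristics or manual review.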
5Conceptual Context of Mapping
- Mapping (also known as ontology alignment) can consist of
  - Integration of multiple component ontologies into a unified ontology
  - Alignment between part of a global ontology and an external local ontology
  - Alignment between different parts of global ontologies
  - Alignment between special subject areas having overlapping instances (e.g. medical literature to computational biology literature)
- Mapping tools (Noy & Musen 2002) can
  - Be partial, not global
  - Be instance-based or not (for common or non-overlapping instances)
  - Be lexical-based (i.e. utilize lexical clues to infer equivalences)
  - Deal only with class hierarchies (not instances)
  - Produce only articulation rules between ontologies
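As a rough illustration of what "lexical-based" can mean, the sketch below proposes candidate equivalences by comparing concept labels with Python's standard `difflib.SequenceMatcher`. The function name `lexical_candidates` and the 0.9 threshold are assumptions chosen for the example; real mapping tools use far richer lexical evidence.

```python
from difflib import SequenceMatcher


def lexical_candidates(labels_a, labels_b, threshold=0.9):
    """Return (label_a, label_b) pairs whose lowercased labels are similar.

    Similarity is SequenceMatcher.ratio(), a value between 0 and 1.
    """
    pairs = []
    for a in labels_a:
        for b in labels_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b))
    return pairs


print(lexical_candidates(
    ["Internet Software"],
    ["internet software", "Standalone Hardware"],
))
```

Pairs produced this way are only candidates: similar labels do not guarantee equivalent meaning, and equivalent concepts may carry unrelated labels.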
6Strawman Syntax for XMDR mapping from an Old
Classification to a New Classification
<lgRel:association association="mapsTo" forwardName="mapsTo" reverseName="mappedFrom" targetCodingScheme="coding_scheme_name">
  <lgRel:sourceConcept sourceConcept="source_concept_code">
    <lgRel:targetConcept targetConcept="target_concept_code">
      <lgRel:associationQualification associationQualifier="exact | almost_exact | approx" />
      <lgRel:associationQualification MappingType="semantic | statistical" />
      <lgRel:associationQualification MapsToDegree="fraction" />
      <lgRel:associationQualification MapsFromDegree="fraction" />
      <lgRel:associationQualification MapsToThreshold="percent" />
    </lgRel:targetConcept>
  </lgRel:sourceConcept>
</lgRel:association>
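To make the nesting of the strawman concrete, here is a minimal sketch that builds one such association with Python's standard `xml.etree.ElementTree`. The namespace URI `urn:example:lgRel` is a placeholder (the real URI is not given on the slide), and the function name `build_mapping` and the sample codes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: an assumption, not the real lgRel namespace.
LGREL_NS = "urn:example:lgRel"
ET.register_namespace("lgRel", LGREL_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the (placeholder) lgRel namespace."""
    return f"{{{LGREL_NS}}}{tag}"


def build_mapping(scheme, source, target, qualifiers):
    """Build association > sourceConcept > targetConcept > qualifications."""
    assoc = ET.Element(q("association"), {
        "association": "mapsTo",
        "forwardName": "mapsTo",
        "reverseName": "mappedFrom",
        "targetCodingScheme": scheme,
    })
    src = ET.SubElement(assoc, q("sourceConcept"), {"sourceConcept": source})
    tgt = ET.SubElement(src, q("targetConcept"), {"targetConcept": target})
    # One associationQualification element per qualifier, as in the strawman.
    for name, value in qualifiers.items():
        ET.SubElement(tgt, q("associationQualification"), {name: str(value)})
    return assoc


mapping = build_mapping(
    "coding_scheme_name", "source_concept_code", "target_concept_code",
    {"associationQualifier": "approx", "MappingType": "statistical",
     "MapsToDegree": 0.65},
)
print(ET.tostring(mapping, encoding="unicode"))
```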
7Strawman Syntax Definitions of Qualifiers
- MappingType
  - Semantic: meaning of the source concept is aligned with the meaning of the target concept
  - Statistical: mapping is done using a common database indexed/classified by both source and target concept codes
- <associationQualifier="exact | almost_exact | approx" />
  - exact: mapping is 1-1
  - almost_exact: for statistical mapping, mapping is almost exact within some ε (epsilon) percent difference as a MapsToThreshold
  - approx: mapping is inexact with overlaps between concepts
- Degree of (statistical) mapping
  - <lgRel:associationQualification MapsToDegree="fraction1" />
  - <lgRel:associationQualification MapsFromDegree="fraction2" />
  - i.e. a fraction2 of the source concept is represented by fraction1 of the target concept
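One possible reading of how the three qualifier values relate to the degree and threshold is sketched below. The rule itself is an assumption made for illustration (the slides do not define it this precisely), as is the function name `association_qualifier`: a degree of 1.0 is treated as exact, a shortfall of at most the threshold percent as almost_exact, and anything else as approx.

```python
def association_qualifier(maps_to_degree: float, threshold_percent: float) -> str:
    """Classify a statistical mapping from its MapsToDegree fraction.

    Assumed rule (not from the slides): compare the shortfall from a
    complete mapping, in percent, against MapsToThreshold.
    """
    if not 0.0 <= maps_to_degree <= 1.0:
        raise ValueError("maps_to_degree must be a fraction between 0 and 1")
    if maps_to_degree == 1.0:
        return "exact"
    shortfall_percent = (1.0 - maps_to_degree) * 100.0
    if shortfall_percent <= threshold_percent:
        return "almost_exact"
    return "approx"


for degree in (1.0, 0.98, 0.65):
    print(degree, association_qualifier(degree, threshold_percent=5.0))
```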
8Strawman Syntax Additional Qualifiers?
- MappingMode
  - Automatic: the source concept is mapped to the target concept by some software
  - Manual: mapping is done by human editors
  - Observation: semantic mapping could be automatic if done by NLP software which compares the textual definitions of source and target concept codes
- ReferenceDatabase: the detailed database which has been utilized for the statistical mapping (e.g. 1997 Economic Census company establishment for SIC and NAICS)
- Your qualifier goes here
9SIC to NAICS Matching (example for Mineral Industries)
10NAICS to SIC Matching
- Important for historical data comparison and
development of time series
11NAICS to SIC Matching
- Mappings are
  - One to one (rarely)
  - Many to one (sometimes)
  - Many to many (usually)
- Census Bureau supplies
  - Comparable statistics for 1997 Economic Census
  - Downloadable code files with bridge
- Open Issue
  - Can statistical allocation between code sets be captured?
  - I.e. NAICS1 → SIC1 (.35) + SIC2 (.65)
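The open issue can be made concrete with a small sketch of statistical allocation: a value reported under one NAICS code is split across SIC codes according to bridge weights. The codes `NAICS1`, `SIC1`, `SIC2` and the weights .35/.65 are the placeholders from the slide; the function name `allocate`, the dictionary layout, and the sample value are assumptions.

```python
def allocate(value: float, weights: dict, tol: float = 1e-9) -> dict:
    """Split a value across target codes in proportion to bridge weights."""
    total = sum(weights.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"weights must sum to 1, got {total}")
    return {code: value * weight for code, weight in weights.items()}


# Bridge table keyed by source code; the weights are the slide's placeholders.
bridge = {"NAICS1": {"SIC1": 0.35, "SIC2": 0.65}}

# Allocate a hypothetical statistic of 200.0 reported under NAICS1.
print(allocate(200.0, bridge["NAICS1"]))
```

In the strawman syntax these weights would presumably travel in the MapsToDegree and MapsFromDegree qualifiers, one association per source-target pair.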
12What are the Limitations/Assumptions of the
Strawman Proposal?
- It assumes the mapping is instance based
  - Good enough for classification schemes like SIC and NAICS where the instances (reporting of individual business establishment statistics) are identical
- It only provides for individual concept-to-concept equivalence
- It assumes the structure of the source and target will provide context of relationships between (exact or amorphous) groups of concepts
- It does not capture the possibility of large numbers of non-overlapping concepts when the conceptual overlap between two ontologies is small
- It does not capture structural inconsistency
- It does not capture temporal drift and evolution
- It does not capture that corresponding concepts may be considerably more detailed (have more attributes) in a particular ontology
13Ontology Integration
- Multiple component ontologies are integrated into a single global ontology
  - Oi ∩ Ok = ∅ (null set), or
  - Oi ∩ Ok ≠ ∅
- Does XMDR need to concern itself with this case?
[Figure: component ontologies combined into Oglobal]
14Ontology Structural Inconsistency (thanks Mala)
- How do we capture that two different ontologies
may have the identical concept at a different
structural positions in their hierarchies/directed
graphs?
[Figure: two hierarchies. One has root Computers with children software and hardware; the other has root Computing Machines with children Internet and Standalone. The leaf concepts Internet Software and Standalone Hardware appear in both hierarchies, at different structural positions.]
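The structural mismatch can be shown by comparing the path from the root to the same concept in each hierarchy. The two trees below are a reading of the slide's diagram and should be treated as assumptions, as should the variable names `tree_a` and `tree_b` and the helper `path_to`.

```python
def path_to(tree: dict, root: str, target: str):
    """Depth-first search: return the path from root to target, or None."""
    if root == target:
        return [root]
    for child in tree.get(root, []):
        sub_path = path_to(tree, child, target)
        if sub_path:
            return [root] + sub_path
    return None


# Parent -> children adjacency lists (assumed reconstruction of the figure).
tree_a = {
    "Computers": ["software", "hardware"],
    "software": ["Internet Software"],
    "hardware": ["Standalone Hardware"],
}
tree_b = {
    "Computing Machines": ["Internet", "Standalone"],
    "Internet": ["Internet Software"],
    "Standalone": ["Standalone Hardware"],
}

# Same concept, different ancestors in each hierarchy.
print(path_to(tree_a, "Computers", "Internet Software"))
print(path_to(tree_b, "Computing Machines", "Internet Software"))
```

A concept-to-concept mapping records that the two leaves are equivalent but says nothing about the fact that their parents (software versus Internet) are not.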
15Ontology Concept Subset Mapping
- How do we capture that subsets of concepts
between two different ontologies are equivalent
rather than individual concepts?
[Figure: the concept subset {C1k, C1m} in O1 mapped to the concept subset {C2r, C2s} in O2]
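One way to think about subset mapping is as a record that pairs a set of source concepts with a set of target concepts, rather than pairing single codes. The sketch below is only an illustration of that data shape; the class `SubsetMapping`, the function `targets_for`, and the use of the figure's labels as codes are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetMapping:
    """Hypothetical set-to-set equivalence between two ontologies."""
    source_concepts: frozenset
    target_concepts: frozenset


def targets_for(mappings, source_code: str) -> set:
    """Collect every target concept whose subset mapping includes source_code."""
    found = set()
    for mapping in mappings:
        if source_code in mapping.source_concepts:
            found |= mapping.target_concepts
    return found


mappings = [
    SubsetMapping(frozenset({"C1k", "C1m"}), frozenset({"C2r", "C2s"})),
]
print(sorted(targets_for(mappings, "C1k")))
```

Note that the lookup returns the whole target subset: no individual target is claimed to be equivalent to C1k on its own.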
16Ontology Overlap/Non-overlap
- How do we capture that two different ontologies each have large numbers of concepts with no corresponding concept in the other ontology?
  - |O1 ∩ O2| << |O1 ∪ O2|
- Is this important?
[Figure: ontologies O1 and O2 with concepts C1k ↔ C2r, C1m, C2s]
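The extent of overlap could be summarized as the ratio of shared concepts to all concepts. The sketch below assumes that corresponding concepts have already been given a common identifier, which is itself the hard part of mapping; the function name `overlap_ratio` and the sample identifiers are hypothetical.

```python
def overlap_ratio(o1: set, o2: set) -> float:
    """Size of the intersection divided by size of the union (0 if both empty)."""
    union = o1 | o2
    if not union:
        return 0.0
    return len(o1 & o2) / len(union)


# Hypothetical concept identifiers; only "c_shared" occurs in both ontologies.
o1 = {"c_shared", "c1_a", "c1_b", "c1_c"}
o2 = {"c_shared", "c2_a", "c2_b", "c2_c"}
print(overlap_ratio(o1, o2))
```

A ratio near zero corresponds to the case on the slide, where the intersection is much smaller than the union.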
17Ontology Granularity
- How do we capture that concepts in one ontology may have significantly greater detail than in another ontology?
  - In terms of subconcepts
  - In terms of concept attributes/properties
- Omega is light on geographic places compared to GNS
[Figure: GNS (Geographic Names Server) compared with Omega (geography)]
18Ontology Granularity (Omega)
- Omega is light (and inconsistent) on geographic places compared to GNS
19Ontology Granularity (GNS)
- GeoNames Server search of populated places in Finland
- Tampere, Finland is described by
  - Feature type PPL (populated place)
  - Latitude/Longitude
  - Name variants
- http://en.wikipedia.org/wiki/Tampere
20Discussion Issues on XMDR Mappings Syntaxes
- Beyond individual concept mapping
  - Individual concept to set of concepts
  - Between sets or clusters of equivalent concepts
  - Capturing extent of non-overlap
  - Granularity differences between equivalent source and target concepts
- Capturing structure in the mapping syntax
  - Dealing with structural mismatches
  - Can we borrow from the mathematical literature on topological mapping between directed graph structures?