Title: oMAP: An Implemented Framework for Automatically Aligning OWL Ontologies
1oMAP: An Implemented Framework for Automatically Aligning OWL Ontologies
Raphaël Troncy, Umberto Straccia
ISTI-CNR
straccia@isti.cnr.it
SWAP, December 2005
2Outline
- Motivations
- oMAP
- A formal framework
- The different classifiers used
- Evaluation
- Conclusion
3Motivations
- Heterogeneity of information systems
- Ontologies as a solution to data heterogeneity on the Web
- Ontologies are themselves heterogeneous
- knowledge representation language
- degree of formalization
- Semantic Web
- More and more OWL/RDF ontologies on the Web
- Need for comparing/reusing/merging ontologies
- partially covering the same domain
- different versions of the same ontology
4Motivations (cont.)
- Distributed Information Retrieval
- Resource selection: the agent has to select a subset of relevant resources
- Query reformulation: for every selected resource, the agent has to reformulate its information need accordingly
- Data fusion / rank aggregation: the results from the selected resources finally have to be merged together
5Aligning Ontologies
- A matching operator
- Input: a set of discrete entities (tables, XML elements, classes, properties)
- Output:
- relationship holding between the entities (subsumption, equivalence, disjointness)
- a confidence measure
- Automatic vs manual techniques
- Numerous works from various communities
- schema matching, machine learning, data integration
6Example
(Figure: example alignment illustrating equivalence, subsumption and disjointness relations)
7oMAP A Formal Framework
- Inspirations
- Formal work in data exchange [Fagin et al., 2003]
- GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003]
- Notations
- A mapping is a tuple M = (T, S, Σ)
- S and T are the source and target ontologies
- Si is an OWL entity (class, datatype property, object property) of the source ontology
- Σ is a set of mapping rules αij Tj ← Si
8oMAP Overall Strategy
- A three step process
- Form possible Σ sets and estimate their quality based on the quality measures for their mapping rules
- For each mapping rule Tj ← Si, estimate its confidence αij, which also depends on the Σ it belongs to
- Use heuristics to iteratively build the final set of mappings
9oMAP Combining Classifiers
- Weight of a mapping rule
- αij = w(Si, Tj, Σ)
- Using different classifiers
- w(Si, Tj, CLk) is the classifier's approximation of the rule Tj ← Si
- Combining the approximations
- Use of a priority list CL1, CL2, ..., CLn
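One possible reading of the priority list, sketched in Python: classifiers are tried in order, and the first one that produces a weight decides. The two toy classifiers, their weights (1.0 and 0.8), the `None` convention for "no opinion", and the default of 0.0 are all assumptions made for illustration; the slides do not specify them.

```python
from typing import Callable, Optional, Sequence

# Hypothetical convention: a classifier returns a weight in [0, 1],
# or None when it cannot say anything about the pair.
Classifier = Callable[[str, str], Optional[float]]


def same_name(source: str, target: str) -> Optional[float]:
    """Toy classifier: fires only on identical entity names."""
    return 1.0 if source == target else None


def same_name_ignoring_case(source: str, target: str) -> Optional[float]:
    """Toy classifier: fires on names that differ only by letter case."""
    return 0.8 if source.lower() == target.lower() else None


def combine(source: str, target: str,
            priority_list: Sequence[Classifier],
            default: float = 0.0) -> float:
    """Return the weight of the first classifier in the list that fires."""
    for classifier in priority_list:
        weight = classifier(source, target)
        if weight is not None:
            return weight
    return default


# Classifiers listed from highest to lowest priority (assumed order).
PRIORITY = [same_name, same_name_ignoring_case]
```

With this sketch, `combine("Paper", "Paper", PRIORITY)` is decided by the first classifier, `combine("paper", "Paper", PRIORITY)` falls through to the second, and an unrelated pair gets the default weight.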
10Terminological Classifiers
- Same entity names (or URI)
- Same entity name stems
11Terminological Classifiers
- String distance name
- WordNet distance name
- lcs is the longest common substring between Si and Tj
- sim
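The formulas for these classifiers did not survive in the transcript. As a minimal sketch of the longest-common-substring idea only, the Python below computes lcs by dynamic programming and turns it into a similarity in [0, 1]. The normalisation 2·|lcs| / (|Si| + |Tj|) is an assumption chosen for illustration, not necessarily the measure used in oMAP.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the first longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 2 * |lcs| / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2 * len(longest_common_substring(a, b)) / (len(a) + len(b))
```

For example, `firstName` and `lastName` share the substring `stName`, giving a similarity of 12/17 under this normalisation.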
12Machine Learning-Based Classifiers
- Collecting individuals
- label for the named individuals
- data value for the datatype properties
- type for the anonymous individuals and the range of object properties
- Recursion on the OWL definition
- depth parameter
13Machine Learning-Based Classifiers
- Example
- Individual(x1 type(Conference) value(label "Int. Conf. on WISE") value(location x2))
- Individual(x2 type(Address) value(city "New York City") value(country "USA"))
- u1 = ("Int. Conf. on WISE", "Address")
- u2 = ("Address", "New York City", "USA")
- Naïve Bayes text classifier
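The collection rules can be sketched as follows. This is a hedged illustration in Python, not the oMAP code: the `Individual` class, the use of a missing label to mark an anonymous individual, and the exact meaning of the depth parameter (how many object-property hops are expanded instead of summarised by their type) are assumptions. With depth 0 the sketch reproduces the two text units of the example, which would then be fed to the text classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Individual:
    """Hypothetical, simplified view of an OWL individual."""
    type_name: str
    label: Optional[str] = None  # None marks an anonymous individual
    data_values: Dict[str, str] = field(default_factory=dict)
    object_values: Dict[str, "Individual"] = field(default_factory=dict)


def collect(ind: Individual, depth: int = 0) -> Tuple[str, ...]:
    """Build the text unit describing an individual."""
    tokens = []
    # Named individuals contribute their label, anonymous ones their type.
    tokens.append(ind.label if ind.label is not None else ind.type_name)
    # Datatype properties contribute their data values.
    tokens.extend(ind.data_values.values())
    for filler in ind.object_values.values():
        if depth > 0:
            # Recurse into the filler's own description.
            tokens.extend(collect(filler, depth - 1))
        else:
            # At depth 0, only the type of the property's filler is kept.
            tokens.append(filler.type_name)
    return tuple(tokens)


x2 = Individual("Address",
                data_values={"city": "New York City", "country": "USA"})
x1 = Individual("Conference", label="Int. Conf. on WISE",
                object_values={"location": x2})

u1 = collect(x1)
u2 = collect(x2)
```

Raising the depth to 1 would pull the city and country of `x2` into the text unit of `x1`.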
14Structural and Semantics-Based Classifier
- If Si and Tj are property names
- If Si and Tj are concept names (1)
(1) Where D = D(Si) × D(Tj); D(Si) represents the set of direct parent concepts of Si
15Structural and Semantics-Based Classifier
- Let CS = (Q R.C) and DT = (Q′ R′.D), then (1)
- Let CS = (op C1 ... Cm) and DT = (op′ D1 ... Dm), then (2)
(1) Where Q, Q′ are quantifiers, R, R′ are property names and C, D are concept expressions
(2) Where op, op′ are concept constructors and n, m ≥ 1
16Structural and Semantics-Based Classifier
- Possible values for the wop and wQ weights
- (Table: values of wop and wQ)
17Evaluation
- More and more techniques / tools for aligning ontologies
- difficult to compare all the approaches theoretically
- pragmatism: evaluation campaigns and contests
- I3CON: based on the NIST Text Retrieval Conference model
- EON: systematic benchmark tests on all OWL constructs
- OAEI: http://oaei.inrialpes.fr
- Alignment API [Euzenat, ISWC 2004]
- common format for representing / exchanging the alignments found
- tools and metrics for evaluating these alignments
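Evaluating an alignment against a reference alignment is typically done with precision, recall and F-measure. The sketch below assumes that an alignment can be represented as a set of (source entity, target entity, relation) triples and that only exact matches count; the entity names are invented for illustration, and this is not the Alignment API itself.

```python
from typing import Set, Tuple

# Hypothetical representation: (source entity, target entity, relation).
Correspondence = Tuple[str, str, str]


def evaluate(found: Set[Correspondence],
             reference: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of found w.r.t. reference."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


reference = {
    ("Paper", "Article", "="),
    ("Author", "Writer", "="),
    ("Venue", "Conference", "="),
}
found = {
    ("Paper", "Article", "="),
    ("Author", "Writer", "="),
    ("Title", "Name", "="),
    ("Year", "Date", "="),
}

precision, recall, f_measure = evaluate(found, reference)
```

Here two of the four proposed correspondences are correct, so precision is 1/2, recall is 2/3 and the F-measure is 4/7.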
18
- 3 series of tests on bibliographic ontologies
- simple tests: identity, specialization/generalization of the language
- systematic tests: some features of the initial ontology are progressively discarded
- complex tests: aligning 4 real ontologies available on the Web
- The directory real-world case consists of aligning web site directories using the large dataset
20Conclusion
- oMAP: a formal framework for automatically aligning OWL ontologies
- Combining several specific classifiers
- terminological classifiers
- machine learning-based classifiers
- structural and semantics-based classifier
- Empirical evaluation on benchmark tests
- using traditional information retrieval metrics
- machine resources, memory, computation time not
yet considered
21Future Work
- Alignment
- Using additional classifiers
- kNN, KL-distance, WordNet or other terminological resources
- straightforward theoretically but practically difficult
- Finding complex alignments
- name = firstName + lastName
- Distributed Information Retrieval
- Automated relevant resource selection
22Useful Links
- oMAP: http://homepages.cwi.nl/troncy/oMAP/
- Tutorial: Schema and Ontology Matching @ ESWC: http://dit.unitn.it/accord/Presentations/ESWC'05-MatchingHandOuts.pdf
- Alignment API: http://co4.inrialpes.fr/align/align.html
- OAEI: http://oaei.inrialpes.fr/
- State of the Art
- P. Shvaiko and J. Euzenat: A Survey of Schema-based Matching Approaches. Journal on Data Semantics (JoDS), 2005
- KW Consortium: State of the Art on Ontology Alignment. Knowledge Web D2.2.3, 2004