Title: Affinity-based Schema Matching
1 Affinity-based Schema Matching
- Silvana Castano
- Università di Milano
D2I Modena, 27 April 2001
2 Affinity relationships
- Schema-level matching (no instances)
- Find mappings between schema elements that correspond semantically to each other
- Affinity mappings have an associated calculated degree of similarity
- ARTEMIS is the schema matching component of the MOMIS system; it evaluates the affinity of schema elements (e.g., ODLI3 classes) based on the following comparison features:
  - class names
  - class attributes (name and domain)
  - class references (name and domain)
3 Affinity Coefficients
- Name Affinity ∈ [0,1]
  - Names are compared by exploiting knowledge provided by a reference ontology O (e.g., the Common Thesaurus)
  - Ontology O is organized according to given terminological relationships with associated strengths (e.g., synonymy, hypernymy).
  - Given two names n and n', an affinity function A(n,n') ∈ [0,1] returns the strength of the path (with highest strength) between them in O; they have affinity iff A(n,n') is greater than a threshold α
4 Name Affinity
Thesaurus Definition
Determining the path with highest strength
School_Member
Computing Affinity coefficient
NA(School_Member, Professor) = 0.8 × 0.8 = 0.64
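The computation above can be sketched in code. This is a minimal illustration, assuming the thesaurus is an undirected graph whose edges carry strengths in (0, 1] and that the strength of a path is the product of its edge strengths, as the 0.8 × 0.8 example suggests. The intermediate terms (`University_Person`, `Course`), the helper names, and the 0.5 strength are hypothetical and not taken from the slides.

```python
import heapq


def build_thesaurus(relationships):
    """Turn (term, term, strength) triples into a symmetric adjacency map."""
    graph = {}
    for left, right, strength in relationships:
        graph.setdefault(left, {})[right] = strength
        graph.setdefault(right, {})[left] = strength
    return graph


def name_affinity(graph, source, target):
    """Strength of the strongest path between two names (0.0 if none exists).

    Best-first search: since every edge strength is at most 1, extending a
    path never increases its strength, so the first time the target is
    popped from the queue its strength is the maximum.
    """
    if source == target:
        return 1.0
    best = {source: 1.0}
    queue = [(-1.0, source)]  # negated strengths turn heapq into a max-queue
    while queue:
        negated, term = heapq.heappop(queue)
        strength = -negated
        if term == target:
            return strength
        if strength < best.get(term, 0.0):
            continue  # stale queue entry
        for neighbor, edge_strength in graph.get(term, {}).items():
            candidate = strength * edge_strength
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(queue, (-candidate, neighbor))
    return 0.0


def have_affinity(graph, source, target, alpha):
    """Two names have affinity iff the path strength exceeds the threshold."""
    return name_affinity(graph, source, target) > alpha


# Hypothetical thesaurus fragment; only the two 0.8 strengths echo the slide.
THESAURUS = build_thesaurus([
    ("School_Member", "University_Person", 0.8),
    ("University_Person", "Professor", 0.8),
    ("School_Member", "Course", 0.5),
    ("Course", "Professor", 0.5),
])
NA = name_affinity(THESAURUS, "School_Member", "Professor")
print(round(NA, 2))
```

Of the two paths in this toy thesaurus, the one through `University_Person` is selected because its product of strengths is the larger one.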
5 Affinity Coefficients
- Structural Affinity ∈ [0,1]
  - Domains are compared based on compatibility relationships (e.g., validated attributes in the Common Thesaurus)
  - The coefficient is proportional to the number of matching attributes and referenced classes
- Global Affinity ∈ [0,1]
  - Comprehensive affinity value, weighted sum of the two previous coefficients
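The weighted sum can be sketched as follows. The slides do not state the weights, so this sketch assumes two weights that sum to 1 (which keeps the result in [0,1]) and uses equal weights purely for illustration. The input values are the NA and SA figures reported for (School_Member, Professor) in these slides.

```python
def global_affinity(name_affinity, structural_affinity, name_weight=0.5):
    """Weighted sum of name and structural affinity.

    Assumption of this sketch: the structural weight is 1 - name_weight,
    so the result stays within [0, 1].
    """
    for value in (name_affinity, structural_affinity, name_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("coefficients and weight must lie in [0, 1]")
    structural_weight = 1.0 - name_weight
    return name_weight * name_affinity + structural_weight * structural_affinity


# NA = 0.64 and SA = 0.25 come from the slides; the equal weights do not.
GA = global_affinity(0.64, 0.25, name_weight=0.5)
print(round(GA, 3))
```

Raising `name_weight` makes the result lean towards the thesaurus-based comparison, while lowering it favours the structural comparison.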
6 Structural Affinity (Interactive validation)
ODLI3 Classes
Semantic Correspondences
Validity Check
SA(School_Member, Professor) = 0.25
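One possible way to compute such a coefficient is sketched below. The slides only say that it is proportional to the number of matching attributes and referenced classes, so the normalisation used here (twice the number of validated matches divided by the total number of features) is an assumption. The attribute lists are also hypothetical, chosen so that one validated correspondence over eight features gives 0.25; they are not the classes shown on the slide.

```python
def structural_affinity(features_a, features_b, correspondences):
    """Share of features covered by validated one-to-one correspondences.

    features_a, features_b: attribute and reference names of the two classes.
    correspondences: validated (feature_of_a, feature_of_b) pairs.
    Assumed normalisation: 2 * matches / (|features_a| + |features_b|).
    """
    set_a, set_b = set(features_a), set(features_b)
    if not set_a and not set_b:
        return 0.0
    matched_a, matched_b = set(), set()
    for left, right in correspondences:
        known = left in set_a and right in set_b
        unused = left not in matched_a and right not in matched_b
        if known and unused:  # each feature takes part in at most one match
            matched_a.add(left)
            matched_b.add(right)
    return 2 * len(matched_a) / (len(set_a) + len(set_b))


# Hypothetical classes and a single validated correspondence.
SCHOOL_MEMBER = ["name", "faculty", "year", "email"]
PROFESSOR = ["first_name", "last_name", "rank", "section"]
VALIDATED = [("name", "last_name")]

SA = structural_affinity(SCHOOL_MEMBER, PROFESSOR, VALIDATED)
print(SA)
```

Correspondences that mention unknown features, or that reuse a feature already matched, are ignored, which mirrors the idea that only validated pairs count.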
7 Affinity-based Clustering
- Hierarchical clustering techniques are employed to identify all schema elements that are candidates for integration, based on the evaluated affinities.
- Clusters of candidates are selected interactively based on affinity thresholds
- Clusters are selected so that schema elements inside a cluster have high affinity with the other elements of the cluster and lower affinity with elements outside.
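A minimal sketch of agglomerative (bottom-up) hierarchical clustering over pairwise affinities is given below. The slides do not say how the affinity between two clusters is derived, so the `linkage` parameter (defaulting to the maximum pairwise affinity) is an assumption, as are the class names other than School_Member and Professor and all affinity values.

```python
from itertools import combinations


def cluster_affinity(affinity, members_a, members_b, linkage=max):
    """Affinity between two clusters, derived from their pairwise affinities."""
    return linkage(
        affinity.get(frozenset((a, b)), 0.0)
        for a in members_a
        for b in members_b
    )


def build_affinity_tree(names, affinity, linkage=max):
    """Repeatedly merge the two clusters with the highest affinity.

    Returns the root of a binary tree; every node is a dict with the
    cluster members, the affinity at which it was formed (None for
    leaves), and its two children.
    """
    if not names:
        raise ValueError("at least one schema element is required")
    nodes = [{"members": (name,), "affinity": None, "children": ()}
             for name in names]
    while len(nodes) > 1:
        best = None
        for i, j in combinations(range(len(nodes)), 2):
            value = cluster_affinity(
                affinity, nodes[i]["members"], nodes[j]["members"], linkage)
            if best is None or value > best[0]:
                best = (value, i, j)
        value, i, j = best
        merged = {
            "members": nodes[i]["members"] + nodes[j]["members"],
            "affinity": value,
            "children": (nodes[i], nodes[j]),
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


# Hypothetical global affinity values between four schema elements.
NAMES = ["School_Member", "Professor", "University_Student", "Course"]
AFFINITY = {
    frozenset(("School_Member", "University_Student")): 0.8,
    frozenset(("School_Member", "Professor")): 0.45,
    frozenset(("Professor", "University_Student")): 0.4,
    frozenset(("Course", "School_Member")): 0.1,
    frozenset(("Course", "Professor")): 0.1,
    frozenset(("Course", "University_Student")): 0.1,
}
TREE = build_affinity_tree(NAMES, AFFINITY)
print(TREE["members"], TREE["affinity"])
```

Passing `linkage=min` instead would require every pair of elements in two clusters to be similar before they merge, which gives tighter clusters at the cost of more of them.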
8 Affinity Tree Candidate Clusters
Threshold choice (e.g., 0.6)
Candidate cluster choice
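Selecting candidate clusters from an affinity tree for a chosen threshold can be sketched as a top-down cut. The tree below is hypothetical: inner nodes are `(affinity, left, right)` tuples, leaves are class names, and affinity values are assumed not to increase from the leaves towards the root.

```python
def leaves(node):
    """All class names below a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def candidate_clusters(node, threshold):
    """Cut the tree at the threshold.

    A subtree whose root affinity reaches the threshold becomes one
    candidate cluster; otherwise its two branches are examined separately.
    """
    if isinstance(node, str):
        return [[node]]
    node_affinity, left, right = node
    if node_affinity >= threshold:
        return [leaves(node)]
    return (candidate_clusters(left, threshold)
            + candidate_clusters(right, threshold))


# Hypothetical affinity tree over four schema elements.
AFFINITY_TREE = (
    0.1,
    "Course",
    (0.45, "Professor", (0.8, "School_Member", "University_Student")),
)

CLUSTERS = candidate_clusters(AFFINITY_TREE, 0.6)
print(CLUSTERS)
# [['Course'], ['Professor'], ['School_Member', 'University_Student']]
```

Lowering the threshold to 0.4 in this example would also pull `Professor` into the cluster, which illustrates why the threshold is left to an interactive choice.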
9 Ongoing work and open issues
- Milano has started applying the affinity/clustering schema matching techniques to the integration of XML data sources, and first results have been published [IWASS00, Retis01, DS-901]. Ongoing work addresses XML data reconciliation by providing a further ontological layer on top of integrated representations.
- Affinity and clustering are currently performed based on intensional inter-schema properties
- Extension of the affinity and clustering techniques to also consider extensional inter-schema properties will be carried out in collaboration with Modena and Reggio-Emilia.