Title: A Survey of Approaches to Automatic Schema Matching
1A Survey of Approaches to Automatic Schema Matching
Erhard Rahm Philip A. Bernstein
The VLDB Journal 10: 334-350 (2001)
2The Problem
- Schema matching
- Input schemas
- Output mappings
- Motivations
- Manual schema matching
- Generic and customizable schema matching
3Application Domains
- Schema integration: structures and terminological relationships
- Data warehouses: source-to-warehouse transformation
- E-commerce: message translation
- Semantic query processing: a run-time scenario
4The Match Operator
- Representations of Input Schemas and Output Mapping
- Schema representation
- Schema elements
- Structure
- Mapping representation
- Mapping elements
- Mapping expressions
- Matching Function
- Mathematically unsatisfying
- Heuristics
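The Match operator outlined above takes two schemas as input and returns a mapping made of mapping elements, each optionally carrying a mapping expression. The following is only a minimal sketch of that interface, assuming a schema is a flat list of element names and using a single heuristic (case-insensitive name equality). The names `MappingElement` and `match` are hypothetical and are not taken from the survey.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MappingElement:
    """One correspondence between S1 element(s) and S2 element(s)."""
    s1_elements: Tuple[str, ...]
    s2_elements: Tuple[str, ...]
    expression: Optional[str] = None  # optional mapping expression


def match(s1: List[str], s2: List[str]) -> List[MappingElement]:
    """Toy Match: pair elements whose lower-cased names are equal.

    Assumption: flat schemas and one heuristic; real matchers use many.
    """
    by_name = {name.lower(): name for name in s2}
    mapping = []
    for e1 in s1:
        e2 = by_name.get(e1.lower())
        if e2 is not None:
            mapping.append(MappingElement((e1,), (e2,), f"{e2} = {e1}"))
    return mapping


result = match(["Street", "City", "Zip"], ["street", "CITY", "PostalCode"])
for element in result:
    print(element.s1_elements, "->", element.s2_elements)
```

Zip and PostalCode stay unmatched here, which is why a purely name-based heuristic is rarely enough on its own.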
5Architecture for Generic Match
- Tool 1 (Portal schemas)
- Tool 2 (E-business schemas)
- Tool 3 (Data warehousing schemas)
- Global libraries (dictionaries, schemas, ...)
- Schema import/export
- Generic Match implementation
- Internal schema representation
6Classification of Approaches
- Individual matchers
- Instance vs Schema
- Element vs Structure Matching
- Language vs Constraint
- Matching Cardinality (1:1, 1:n, n:1, and n:m)
- Auxiliary Information
- Combinations of multiple matchers
7Schema-level Approaches
- Granularity of match (element-level vs. structure-level)
- Match cardinality
- Linguistic approaches
- Constraint-based approaches
- Reusing schema and mapping information
8Granularity of match
S1 elements | S2 elements | Match
Address (Street, City, State, Zip) | CustomerAddress (Street, City, USState, PostalCode) | Full structure match of Address and CustomerAddress
AccountOwner (Name, Address, Birthdate, TaxExempt) | Customer (Cname, CAddress, Cphone) | Partial structural match of AccountOwner and Customer
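The difference between a full and a partial structure-level match can be sketched by counting how many components of an S1 structure have a counterpart in the S2 structure. This is an illustrative sketch only: the `CORRESPONDENCES` set is a hand-written assumption standing in for the output of an element-level matcher, and `structure_match` is a hypothetical name.

```python
# Assumed element-level correspondences (hand-written for this example).
CORRESPONDENCES = {
    ("State", "USState"),
    ("Zip", "PostalCode"),
    ("Name", "Cname"),
    ("Address", "CAddress"),
}


def equivalent(a: str, b: str) -> bool:
    """Two elements correspond if names are equal or listed above."""
    return a == b or (a, b) in CORRESPONDENCES


def structure_match(children1, children2) -> float:
    """Fraction of S1 components that have a counterpart in S2."""
    matched = [c for c in children1
               if any(equivalent(c, d) for d in children2)]
    return len(matched) / len(children1)


full = structure_match(
    ["Street", "City", "State", "Zip"],
    ["Street", "City", "USState", "PostalCode"],
)
partial = structure_match(
    ["Name", "Address", "Birthdate", "TaxExempt"],
    ["Cname", "CAddress", "Cphone"],
)
print(full, partial)
```

A score of 1.0 corresponds to a full structure match. Anything between 0 and 1 is a partial match, which a tool might accept or reject depending on a threshold.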
9Match Cardinality
Local match cardinalities | S1 element(s) | S2 element(s) | Matching expression
1. 1:1, element-level | Price | Amount | Amount = Price
2. n:1, element-level | Price, Tax | Cost | Cost = Price * (1 + Tax/100)
3. 1:n, element-level | Name | FirstName, LastName | FirstName, LastName = Extract(Name, ...)
4. n:1, structure-level (n:m element-level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = select B.Title, P.Name from B, P where B.PuNo = P.PuNo
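The four cardinality cases can be read as small transformation functions from S1 rows to S2 rows. The sketch below is one possible rendering, assuming rows are Python dictionaries. The function names are hypothetical, and splitting Name on the first space is an assumption, since the arguments of Extract are not spelled out on the slide.

```python
def one_to_one(row):
    """1:1 element-level: Amount = Price."""
    return {"Amount": row["Price"]}


def n_to_one(row):
    """n:1 element-level: Cost = Price * (1 + Tax/100)."""
    return {"Cost": row["Price"] * (1 + row["Tax"] / 100)}


def one_to_n(row):
    """1:n element-level: split Name (assumed rule: first space)."""
    first, _, last = row["Name"].partition(" ")
    return {"FirstName": first, "LastName": last}


def structure_level(books, publishers):
    """n:1 structure-level: join B and P on PuNo."""
    names = {p["PuNo"]: p["Name"] for p in publishers}
    return [
        {"Book": b["Title"], "Publisher": names[b["PuNo"]]}
        for b in books
        if b["PuNo"] in names
    ]


print(n_to_one({"Price": 80.0, "Tax": 25.0}))
print(one_to_n({"Name": "Ada Lovelace"}))
print(structure_level(
    [{"Title": "Schema Matching", "PuNo": 1}],
    [{"PuNo": 1, "Name": "Example Press"}],
))
```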
10Linguistic Approaches
- Name Matching
- Equality of names
- Equality of canonical name representations
- Equality of synonyms
- Equality of hypernyms
- Similarity of names based on common substrings, edit distance, pronunciation, and soundex
- User-provided name matches
- Description Matching
- Ex. S1: empn // employee name
- Ex. S2: name // name of employee
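Several of the name-matching criteria listed above are easy to sketch in code. The example below is a rough illustration, not a reference implementation: the `SYNONYMS` table is a made-up stand-in for a thesaurus, edit distance is the classic Levenshtein distance, and description matching is approximated by word overlap (Jaccard), which is an assumption of this sketch.

```python
import re

# Hypothetical synonym table; a real tool would consult a thesaurus.
SYNONYMS = {
    frozenset({"car", "automobile"}),
    frozenset({"zip", "postalcode"}),
}


def canonical(name: str) -> str:
    """Canonical form: lower-case, separators and punctuation removed."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1.0 for equal canonical names or synonyms, else scaled distance."""
    a, b = canonical(a), canonical(b)
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def description_similarity(d1: str, d2: str) -> float:
    """Word overlap (Jaccard) between two element descriptions."""
    w1 = set(re.findall(r"[a-z]+", d1.lower()))
    w2 = set(re.findall(r"[a-z]+", d2.lower()))
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0


print(name_similarity("Cust_Name", "custName"))
print(name_similarity("empn", "name"))
print(description_similarity("employee name", "name of employee"))
```

In the empn / name example, the names alone are only loosely similar, while the descriptions share both content words, which is the point of description matching.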
11Constraint-based Approaches
12Reusing Schema and Mapping Information
13Instance-level Approaches
- Linguistic characterization
- Information retrieval techniques
- Ex. Extracting keywords and themes
- Constraint-based characterization
- Numeric value ranges
- Numeric value averages
- Character patterns (PhoneNr, ISBNs, SSNs)
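A constraint-based characterization of instance data can be sketched as a small column profile: numeric range and average where values are numeric, plus the set of character patterns observed. The helper names (`pattern_of`, `profile`, `pattern_overlap`) and the pattern alphabet (9 for digits, A for letters) are assumptions made for this illustration.

```python
import re
from statistics import mean


def pattern_of(value: str) -> str:
    """Abstract a value: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(values):
    """Summarize a column by its patterns and, if numeric, its range."""
    numbers = [float(v) for v in values
               if re.fullmatch(r"-?\d+(\.\d+)?", v)]
    result = {"patterns": {pattern_of(v) for v in values}}
    if numbers:
        result.update(min=min(numbers), max=max(numbers),
                      mean=mean(numbers))
    return result


def pattern_overlap(p1, p2) -> float:
    """Jaccard overlap of the pattern sets of two column profiles."""
    union = p1["patterns"] | p2["patterns"]
    return len(p1["patterns"] & p2["patterns"]) / len(union) if union else 0.0


phones_s1 = profile(["555-1234", "555-9876"])
phones_s2 = profile(["444-0000"])
ages = profile(["10", "20", "30"])
print(phones_s1["patterns"], pattern_overlap(phones_s1, phones_s2))
print(ages["min"], ages["max"], ages["mean"])
```

Two columns with the same dominant pattern or similar value ranges become match candidates even when their names share nothing.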
14Combining Different Matchers
- Hybrid matchers
- Hard-wired combination of multiple matching criteria
- Better performance
- Composite matchers
- Independent basic matchers
- Flexible execution order
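A composite matcher runs independent basic matchers and then combines their results. One simple way to sketch this, assuming each matcher returns a similarity in [0, 1] and the combination is a weighted average, is shown below. The weighting scheme and the names `name_matcher`, `type_matcher`, and `composite` are assumptions for illustration, not a scheme prescribed by the survey.

```python
def name_matcher(e1, e2) -> float:
    """Basic matcher: case-insensitive name equality."""
    return 1.0 if e1["name"].lower() == e2["name"].lower() else 0.0


def type_matcher(e1, e2) -> float:
    """Basic matcher: equality of declared data types."""
    return 1.0 if e1["type"] == e2["type"] else 0.0


def composite(matchers, weights):
    """Combine independent matchers by a weighted average of scores."""
    total = sum(weights)

    def combined(e1, e2) -> float:
        return sum(w * m(e1, e2) for m, w in zip(matchers, weights)) / total

    return combined


matcher = composite([name_matcher, type_matcher], [0.5, 0.5])
score = matcher(
    {"name": "Zip", "type": "string"},
    {"name": "PostalCode", "type": "string"},
)
print(score)
```

Because the basic matchers are independent, they can be added, removed, or reordered without touching each other, which is the flexibility a hard-wired hybrid matcher gives up.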
15Sample Approaches
- SEMINT
- LSD
- SKAT
- TranScm
- DIKE
- ARTEMIS
- CUPID
18Conclusion
- Propose a taxonomy that covers many of the existing approaches
- Suggest quantitative work on the relative performance and accuracy of different approaches