Title: A Survey of Approaches to Automatic Schema Matching
Erhard Rahm Philip A. Bernstein
The VLDB Journal 10: 334-350 (2001)
The Problem
- Schema matching
- Input schemas
- Output mappings
- Motivations
- Manual schema matching
- Generic and customizable schema matching
Application Domains
- Schema integration: structures and terminological relationships
- Data warehouses: source-to-warehouse transformation
- E-commerce: message translation
- Semantic query processing: a run-time scenario
The Match Operator
- Representations of Input Schemas and Output Mapping
- Schema representation
- Schema elements
- Structure
- Mapping representation
- Mapping elements
- Mapping expressions
- Matching Function
- Mathematically unsatisfying
- Heuristics
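Because the matching function is heuristic rather than mathematically defined, Match can only be modelled loosely. A minimal Python sketch, assuming schema elements are identified by qualified names and that a pluggable similarity function scores element pairs, shows one way to represent schemas, mapping elements (with an optional mapping expression), and the Match operator itself. The class names `SchemaElement`, `Schema`, and `MappingElement`, the `similarity` callback, and the 0.5 threshold are hypothetical illustrations, not the survey's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SchemaElement:
    """One element of a schema, e.g. a column or an XML element (hypothetical model)."""
    path: str        # qualified name such as "Cust.Name"
    data_type: str   # declared type such as "string"


@dataclass
class Schema:
    name: str
    elements: List[SchemaElement] = field(default_factory=list)


@dataclass(frozen=True)
class MappingElement:
    """Relates one or more S1 elements to one or more S2 elements."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]
    similarity: float                 # heuristic score in [0, 1]
    expression: Optional[str] = None  # optional mapping expression, e.g. "concat(...)"


Similarity = Callable[[SchemaElement, SchemaElement], float]


def match(s1: Schema, s2: Schema, similarity: Similarity,
          threshold: float = 0.5) -> List[MappingElement]:
    """Heuristic Match: keep every 1:1 element pair whose score reaches the threshold."""
    mapping = []
    for e1 in s1.elements:
        for e2 in s2.elements:
            score = similarity(e1, e2)
            if score >= threshold:
                mapping.append(MappingElement((e1.path,), (e2.path,), score))
    return mapping


# A deliberately naive similarity: compare the last name component, ignoring case.
def same_leaf_name(e1: SchemaElement, e2: SchemaElement) -> float:
    leaf1 = e1.path.split(".")[-1].lower()
    leaf2 = e2.path.split(".")[-1].lower()
    return 1.0 if leaf1 == leaf2 else 0.0


s1 = Schema("S1", [SchemaElement("Cust.Name", "string"), SchemaElement("Cust.Phone", "string")])
s2 = Schema("S2", [SchemaElement("Customer.name", "string"), SchemaElement("Customer.City", "string")])
mapping = match(s1, s2, same_leaf_name)
```

Keeping the similarity function as a parameter mirrors the idea that Match is driven by heuristics: the operator's interface stays fixed while the scoring strategy can be swapped.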
Architecture for Generic Match
[Figure: Tool 1 (portal schemas), Tool 2 (e-business schemas), and Tool 3 (data warehousing schemas) on top of a generic Match implementation, with an internal schema representation, schema import/export, and global libraries (dictionaries, schemas, ...)]
Classification of Approaches
- Individual matchers
- Instance vs. schema
- Element vs. structure matching
- Language-based vs. constraint-based
- Matching cardinality (1:1, 1:n, n:1, and n:m)
- Auxiliary Information
- Combinations of multiple matchers
Schema-level Approaches
- Granularity of match (element-level vs. structure-level)
- Match cardinality
- Linguistic approaches
- Constraint-based approaches
- Reusing schema and mapping information
Granularity of match
Match Cardinality
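Match cardinality describes how many elements of each schema take part in one mapping element. The Python sketch below, assuming a mapping element is given as a tuple of source names and a tuple of target names, labels it 1:1, 1:n, n:1, or n:m, and adds a hypothetical n:1 mapping expression that concatenates `FirstName` and `LastName` into `Name`. The element names and the `concat_name` helper are illustrative only.

```python
from typing import Dict, Tuple


def cardinality(source: Tuple[str, ...], target: Tuple[str, ...]) -> str:
    """Cardinality of a single mapping element, written as source:target."""
    if len(source) == 1:
        return "1:1" if len(target) == 1 else "1:n"
    return "n:1" if len(target) == 1 else "n:m"


def concat_name(row: Dict[str, str]) -> str:
    """Hypothetical n:1 mapping expression: S2.Name = FirstName + ' ' + LastName."""
    return row["FirstName"] + " " + row["LastName"]


examples = {
    "rename": (("Cust.Name",), ("Customer.Name",)),
    "merge": (("Cust.FirstName", "Cust.LastName"), ("Customer.Name",)),
    "split": (("Customer.Name",), ("Cust.FirstName", "Cust.LastName")),
}
for label, (src, tgt) in examples.items():
    print(label, cardinality(src, tgt))
```

An n:1 element such as `merge` needs an expression like `concat_name` to state how the source values combine, whereas a 1:1 rename typically needs none.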
Linguistic Approaches
- Name Matching
- Equality of names
- Equality of canonical name representations
- Equality of synonyms
- Equality of hypernyms
- Similarity of names based on common substrings, edit distance, pronunciation, and Soundex
- User-provided name matches
- Description Matching
- Ex. S1: empn // employee name
- Ex. S2: name // name of employee
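Several of the name-matching criteria above fit in a few lines. The Python sketch below, assuming names and descriptions are plain strings, canonicalizes names (lowercase, separators removed), consults a synonym table, and otherwise falls back to normalized edit (Levenshtein) distance. Description matching is approximated by word overlap (Jaccard) between element comments. The `SYNONYMS` table and all function names are hypothetical; a real matcher would draw synonyms and hypernyms from a thesaurus or dictionary.

```python
import re
from typing import Dict

# Hypothetical synonym table standing in for a thesaurus or dictionary lookup.
SYNONYMS: Dict[str, str] = {"client": "customer", "cust": "customer", "tel": "phone"}


def canonical(name: str) -> str:
    """Canonical form: lowercase with separators removed ('Cust_No' -> 'custno')."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def name_similarity(n1: str, n2: str) -> float:
    """1.0 for equal canonical names or synonyms, else 1 - normalized edit distance."""
    c1, c2 = canonical(n1), canonical(n2)
    c1, c2 = SYNONYMS.get(c1, c1), SYNONYMS.get(c2, c2)
    if c1 == c2:
        return 1.0
    return 1.0 - levenshtein(c1, c2) / max(len(c1), len(c2))


def description_similarity(d1: str, d2: str) -> float:
    """Jaccard overlap of the words in two element descriptions (comments)."""
    w1 = set(re.findall(r"[a-z]+", d1.lower()))
    w2 = set(re.findall(r"[a-z]+", d2.lower()))
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0


# The slide's example: the names share little, but the comments overlap strongly.
print(name_similarity("empn", "name"))                              # 0.0
print(description_similarity("employee name", "name of employee"))  # about 0.67
```

The example illustrates why description matching complements name matching: `empn` and `name` look unrelated as strings, while their comments share two of three words.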
Constraint-based Approaches
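Constraint-based matchers compare the constraints a schema declares, such as data types, value ranges, uniqueness, optionality, and keys. A minimal Python sketch, assuming each column carries a type name, a key flag, and a nullability flag: identical types score highest, types in the same compatibility group score less, and agreeing key and nullability flags add to the score. The `TYPE_GROUPS` table, the `Column` class, and the weights 0.5/0.3/0.2 are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical type-compatibility groups; a real matcher would use a richer table.
TYPE_GROUPS = {
    "int": "numeric", "integer": "numeric", "decimal": "numeric", "float": "numeric",
    "char": "text", "varchar": "text", "string": "text",
    "date": "temporal", "timestamp": "temporal",
}


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    is_key: bool = False
    nullable: bool = True


def constraint_similarity(c1: Column, c2: Column) -> float:
    """Score in [0, 1] from type compatibility, key status, and nullability."""
    t1, t2 = c1.data_type.lower(), c2.data_type.lower()
    g1: Optional[str] = TYPE_GROUPS.get(t1)
    g2: Optional[str] = TYPE_GROUPS.get(t2)
    score = 0.0
    if t1 == t2:
        score += 0.5  # identical declared types
    elif g1 is not None and g1 == g2:
        score += 0.3  # compatible types, e.g. int vs. integer
    if c1.is_key == c2.is_key:
        score += 0.3  # both keys or both non-keys
    if c1.nullable == c2.nullable:
        score += 0.2  # same optionality
    return score


cust_id = Column("CustID", "int", is_key=True, nullable=False)
customer_no = Column("CustomerNo", "INTEGER", is_key=True, nullable=False)
comment = Column("Comment", "varchar")
print(constraint_similarity(cust_id, customer_no))  # compatible types, same key/nullability
print(constraint_similarity(cust_id, comment))      # 0.0: nothing in common
```

Constraints alone rarely single out one candidate (a schema may contain many integer columns), so a score like this is best combined with other matchers.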
Reusing Schema and Mapping Information
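One possible form of reusing earlier match results, sketched here under the assumption that stored mappings are sets of 1:1 correspondences, is to compose them: if S1 was already matched to S2 and S2 to S3, chaining the two mappings proposes S1-S3 correspondences. Similarity is not guaranteed to be transitive, so composed pairs are candidates for user review rather than final results. The `compose` function and the element names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical format: (element in first schema, element in second schema).
Correspondences = Set[Tuple[str, str]]


def compose(m12: Correspondences, m23: Correspondences) -> Correspondences:
    """Chain S1->S2 and S2->S3 correspondences into proposed S1->S3 correspondences."""
    by_middle: Dict[str, List[str]] = {}
    for s2_elem, s3_elem in m23:
        by_middle.setdefault(s2_elem, []).append(s3_elem)
    return {(s1_elem, s3_elem)
            for s1_elem, s2_elem in m12
            for s3_elem in by_middle.get(s2_elem, [])}


m12 = {("S1.empn", "S2.name"), ("S1.phone", "S2.tel")}
m23 = {("S2.name", "S3.EmployeeName")}
proposed = compose(m12, m23)
```

Here `S1.phone` gets no proposal because no stored S2-S3 correspondence starts at `S2.tel`.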
Instance-level Approaches
- Linguistic characterization
- Information retrieval techniques
- Ex. Extracting keywords and themes
- Constraint-based characterization
- Numeric value ranges
- Numeric value averages
- Character patterns (e.g., phone numbers, ISBNs, SSNs)
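Instance-level matchers characterize columns by their data rather than their declarations. The Python sketch below assumes each column is available as a list of sample values. It abstracts strings into character patterns (digits become `9`, letters become `A`) and compares the sets of patterns, and for numeric columns it combines value-range overlap with the distance between averages. The scoring formulas and function names are illustrative assumptions, not taken from the survey.

```python
import re
from statistics import mean
from typing import List


def character_pattern(value: str) -> str:
    """Abstract a value's shape: every digit becomes '9', every letter 'A'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", value))


def pattern_similarity(xs: List[str], ys: List[str]) -> float:
    """Jaccard overlap of the character patterns seen in two columns."""
    p1 = {character_pattern(v) for v in xs}
    p2 = {character_pattern(v) for v in ys}
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


def numeric_similarity(xs: List[float], ys: List[float]) -> float:
    """Range overlap scaled by closeness of averages; both lists must be non-empty."""
    span = max(max(xs), max(ys)) - min(min(xs), min(ys))
    if span == 0:
        return 1.0  # both columns contain only the same single value
    overlap = max(0.0, min(max(xs), max(ys)) - max(min(xs), min(ys))) / span
    avg_gap = abs(mean(xs) - mean(ys)) / span
    return overlap * (1.0 - avg_gap)


phones_a = ["555-1234", "555-9876"]
phones_b = ["555-0000"]
isbns = ["0-306-40615-2"]
ages_a, ages_b, salaries = [20, 30, 40], [25, 35, 45], [30000, 90000]

print(pattern_similarity(phones_a, phones_b))  # 1.0
print(pattern_similarity(phones_a, isbns))     # 0.0
print(numeric_similarity(ages_a, ages_b) > numeric_similarity(ages_a, salaries))  # True
```

Pattern abstraction lets two phone-number columns match even when no individual value is shared, while disjoint value ranges quickly rule out pairs such as ages and salaries.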
Combining Different Matchers
- Hybrid matchers
- Hard-wired combination of multiple matching criteria
- Better performance
- Composite matchers
- Independent basic matchers
- Flexible execution order
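A composite matcher runs independent basic matchers and combines their scores afterwards. The Python sketch below assumes two name-based matchers (a `difflib.SequenceMatcher` ratio standing in for an edit-distance matcher, and character-trigram overlap) and uses a weighted average as the combination strategy. The weights, the threshold, and the function names are hypothetical; other strategies, such as taking the maximum score, are equally possible.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Set, Tuple

Matcher = Callable[[str, str], float]


def edit_matcher(a: str, b: str) -> float:
    """String similarity from difflib, standing in for an edit-distance matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_matcher(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams."""
    def grams(s: str) -> Set[str]:
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    t1, t2 = grams(a), grams(b)
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0


def composite_match(pairs: List[Tuple[str, str]],
                    matchers: Dict[str, Matcher],
                    weights: Dict[str, float],
                    threshold: float) -> List[Tuple[str, str, float]]:
    """Run each independent matcher, then combine the scores by weighted average."""
    total = sum(weights.values())
    results = []
    for a, b in pairs:
        scores = {name: m(a, b) for name, m in matchers.items()}
        combined = sum(weights[name] * s for name, s in scores.items()) / total
        if combined >= threshold:
            results.append((a, b, combined))
    return results


matchers = {"edit": edit_matcher, "trigram": trigram_matcher}
weights = {"edit": 0.6, "trigram": 0.4}
pairs = [("CustName", "customer_name"), ("CustName", "Phone")]
accepted = composite_match(pairs, matchers, weights, threshold=0.4)
```

Because each matcher is independent, matchers can be added, removed, or reordered without touching the others; a hybrid matcher would instead hard-wire the criteria into a single algorithm, trading this flexibility for speed.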
Sample Approaches
- SEMINT
- LSD
- SKAT
- TranScm
- DIKE
- ARTEMIS
- CUPID
Conclusion
- Propose a taxonomy that covers many of the existing approaches
- Suggest quantitative work on the relative performance and accuracy of different approaches