A Survey of Approaches to Automatic Schema Matching

1
A Survey of Approaches to Automatic Schema Matching
Erhard Rahm, Philip A. Bernstein
The VLDB Journal 10: 334-350 (2001)
2
The Problem
  • Schema matching
      • Input: schemas
      • Output: mappings
  • Motivations
      • Manual schema matching
      • Generic and customizable schema matching

3
Application Domains
  • Schema integration: structures and terminological
    relationships
  • Data warehouses: source-to-warehouse
    transformation
  • E-commerce: message translation
  • Semantic query processing: a run-time scenario

4
The Match Operator
  • Representations of input schemas and output
    mapping
      • Schema representation
          • Schema elements
          • Structure
      • Mapping representation
          • Mapping elements
          • Mapping expressions
  • Matching function
      • Mathematically unsatisfying
      • Heuristics
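
The representation outlined on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `SchemaElement` and `MappingElement`, the dotted-path form of element names, and the `similar` callback are hypothetical and do not come from the paper. The matching function is passed in as a heuristic because, as the slide notes, it is not defined mathematically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaElement:
    """One schema element, addressed by its position in the schema structure."""
    path: str                 # hypothetical dotted path, e.g. "Cust.CName"
    data_type: str = "string"


@dataclass(frozen=True)
class MappingElement:
    """Relates elements of schema S1 to elements of schema S2."""
    sources: tuple            # elements of S1
    targets: tuple            # elements of S2
    expression: str = "="     # mapping expression; "=" is an assumed default


def match(s1, s2, similar):
    """Return one mapping element for every pair the heuristic accepts."""
    return [
        MappingElement((a,), (b,))
        for a in s1
        for b in s2
        if similar(a, b)
    ]


def same_leaf_name(a, b):
    """Toy heuristic: compare the last path component, ignoring case."""
    return a.path.split(".")[-1].lower() == b.path.split(".")[-1].lower()


s1 = [SchemaElement("Cust.CName"), SchemaElement("Cust.CAddress")]
s2 = [SchemaElement("Customer.cname")]
mapping = match(s1, s2, same_leaf_name)
```

Here `mapping` holds a single mapping element relating `Cust.CName` to `Customer.cname`; swapping in a different `similar` function changes the result without touching the representation.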

5
Architecture for Generic Match
  • Tool 1 (Portal schemas)
  • Tool 2 (E-business schemas)
  • Tool 3 (Data warehousing schemas)
  • Schema import/export
  • Generic Match implementation
  • Internal schema representation
  • Global libraries (dictionaries, schemas, ...)
6
Classification of Approaches
  • Individual matchers
      • Instance vs. schema
      • Element vs. structure matching
      • Language vs. constraint
      • Matching cardinality (1:1, 1:n, n:1, and n:m)
      • Auxiliary information
  • Combinations of multiple matchers

7
Schema-level Approaches
  • Granularity of match (element-level vs.
    structure-level)
  • Match cardinality
  • Linguistic approaches
  • Constraint-based approaches
  • Reusing schema and mapping information

8
Granularity of match
9
Match Cardinality
10
Linguistic Approaches
  • Name matching
      • Equality of names
      • Equality of canonical name representations
      • Equality of synonyms
      • Equality of hypernyms
      • Similarity of names based on common substrings,
        edit distance, pronunciation, and Soundex
      • User-provided name matches
  • Description matching
      • Ex. S1: empn // employee name
      • Ex. S2: name // name of employee
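
A minimal sketch of a name matcher that covers a subset of the criteria listed on this slide: equality of names, equality of canonical forms, equality of synonyms, and edit-distance similarity. The synonym table, the 0.9 synonym score, and the normalization rule are illustrative assumptions, not values from the paper. Hypernyms, pronunciation, and Soundex are left out.

```python
import re

# Hypothetical synonym table; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"make", "brand"})}


def canonical(name):
    """Canonical form: lowercase with non-alphanumeric characters removed."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def edit_distance(a, b):
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(n1, n2):
    """Score in [0, 1]; 1.0 means equal names or equal canonical forms."""
    c1, c2 = canonical(n1), canonical(n2)
    if n1 == n2 or c1 == c2:
        return 1.0
    if frozenset({c1, c2}) in SYNONYMS:
        return 0.9  # assumed score for a synonym hit
    # Normalize the edit distance by the longer name.
    return 1.0 - edit_distance(c1, c2) / max(len(c1), len(c2))
```

With this scoring, `empn` and `name` from the example above share few characters in order and score low on names alone, which is the case that description matching is meant to catch.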

11
Constraint-based Approaches
12
Reusing Schema and Mapping Information
13
Instance-level Approaches
  • Linguistic characterization
      • Information retrieval techniques
      • Ex.: extracting keywords and themes
  • Constraint-based characterization
      • Numeric value ranges
      • Numeric value averages
      • Character patterns (PhoneNr, ISBNs, SSNs)
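
The constraint-based characterization above can be illustrated with a small profiling function that summarizes a column of instance values. The regular expressions are deliberately simplified, hypothetical patterns (US-style phone numbers and SSNs, unhyphenated ISBN-13), and the profile layout is an assumption made for this sketch.

```python
import re
from statistics import mean

# Simplified, hypothetical character patterns; real data needs looser rules.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "isbn13": re.compile(r"^97[89]\d{10}$"),
    "phone": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),
}


def to_floats(values):
    """Return the values as floats, or None if any value is not numeric."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def characterize(values):
    """Profile a column: character pattern first, then numeric range and average."""
    if not values:
        return {"kind": "empty"}
    for kind, pattern in PATTERNS.items():
        if all(pattern.match(v) for v in values):
            return {"kind": kind}
    numbers = to_floats(values)
    if numbers is not None:
        return {"kind": "numeric", "min": min(numbers),
                "max": max(numbers), "mean": mean(numbers)}
    return {"kind": "text"}


ssn_profile = characterize(["123-45-6789", "987-65-4321"])
salary_profile = characterize(["10", "20", "30"])
```

Two columns whose profiles agree, for example the same pattern kind or overlapping numeric ranges, become match candidates even when their names differ. Patterns are checked before numeric conversion so that an all-digit ISBN is not profiled as a number.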

14
Combining Different Matchers
  • Hybrid matchers
      • Hard-wired combination of multiple matching
        criteria
      • Better performance
  • Composite matchers
      • Independent basic matchers
      • Flexible execution order
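
One way to picture a composite matcher is as a function that runs independent basic matchers and merges their scores. The sketch below uses a weighted average; that combination rule, the two toy matchers, and the dictionary layout of schema elements are assumptions made for illustration only.

```python
def exact_name(a, b):
    """Basic matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def same_type(a, b):
    """Basic matcher: 1.0 if the declared data types are equal, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite(matchers, weights):
    """Combine independent matchers into one by a weighted average of scores."""
    total = sum(weights)

    def combined(a, b):
        return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total

    return combined


score = composite([exact_name, same_type], [2.0, 1.0])

emp_name = {"name": "EmpName", "type": "string"}
empname = {"name": "empname", "type": "string"}
salary = {"name": "salary", "type": "string"}

both_agree = score(emp_name, empname)   # name and type both match
type_only = score(emp_name, salary)     # only the type matches
```

Because each basic matcher is a plain function, matchers can be added, removed, or reweighted without changing the others. A hybrid matcher would instead fold the name and type checks into a single hard-wired function.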

15
Sample Approaches
  • SEMINT
  • LSD
  • SKAT
  • TranScm
  • DIKE
  • ARTEMIS
  • CUPID

18
Conclusion
  • Propose a taxonomy that covers many of the
    existing approaches
  • Suggest quantitative work on the relative
    performance and accuracy of different approaches