1
A Survey of Approaches to Automatic Schema
Matching
Erhard Rahm, Philip A. Bernstein
The VLDB Journal 10: 334-350 (2001)
2
The Problem
  • Schema matching
  • Input schemas
  • Output mappings
  • Motivations
  • Manual schema matching
  • Generic and customizable schema matching

3
Application Domains
  • Schema integration: structures and terminological
    relationships
  • Data warehouses: source-to-warehouse
    transformation
  • E-commerce: message translation
  • Semantic query processing: a run-time scenario

4
The Match Operator
  • Representations of Input Schemas and Output
    Mapping
  • Schema representation
  • Schema elements
  • Structure
  • Mapping representation
  • Mapping elements
  • Mapping expressions
  • Matching Function
  • Mathematically unsatisfying
  • Heuristics
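The slide only names the parts of the Match operator: schema elements with structure, mapping elements, optional mapping expressions, and a heuristic matching function. Below is a minimal Python sketch of those parts, assuming a tree-shaped internal schema representation. The class names, the `match` signature, the 0.8 threshold and the restriction to 1:1 leaf matches are illustrative assumptions, not definitions from the paper. Because match has no precise mathematical definition, the similarity function is passed in as a pluggable heuristic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SchemaElement:
    """A named schema element; non-leaf elements carry their children (structure)."""
    name: str
    children: Tuple["SchemaElement", ...] = ()


@dataclass(frozen=True)
class MappingElement:
    """Relates S1 element(s) to S2 element(s), optionally with a mapping expression."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]
    expression: Optional[str] = None


def leaves(element: SchemaElement) -> List[SchemaElement]:
    """Collect the leaf elements at or below the given element."""
    if not element.children:
        return [element]
    result: List[SchemaElement] = []
    for child in element.children:
        result.extend(leaves(child))
    return result


def match(
    s1: SchemaElement,
    s2: SchemaElement,
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # hypothetical cut-off
) -> List[MappingElement]:
    """Heuristic Match: propose a 1:1 mapping element for every leaf pair
    whose similarity score reaches the threshold."""
    return [
        MappingElement((a.name,), (b.name,))
        for a in leaves(s1)
        for b in leaves(s2)
        if similarity(a.name, b.name) >= threshold
    ]


def same_name(a: str, b: str) -> float:
    """Trivial example heuristic: case-insensitive name equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


s1 = SchemaElement("S1", (SchemaElement("Price"),))
s2 = SchemaElement("S2", (SchemaElement("price"), SchemaElement("Tax")))
print(match(s1, s2, same_name))
# [MappingElement(source=('Price',), target=('price',), expression=None)]
```

Keeping `similarity` as a parameter mirrors the slide's point that the matching function is heuristic: swapping in a different scoring function changes the result without touching the representation.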

5
Architecture for Generic Match
[Figure: architecture diagram with the following components]
  • Tool 1 (Portal schemas)
  • Tool 2 (E-business schemas)
  • Tool 3 (Data warehousing schemas)
  • Schema import/export
  • Generic Match Implementation
  • Internal schema representation
  • Global libraries (dictionaries, schemas, …)
6
Classification of Approaches
  • Individual matchers
  • Instance vs. Schema
  • Element vs. Structure Matching
  • Language vs. Constraint
  • Matching Cardinality (1:1, 1:n, n:1, and n:m)
  • Auxiliary Information
  • Combinations of multiple matchers

7
Schema-level Approaches
  • Granularity of match (element-level vs.
    structure-level)
  • Match cardinality
  • Linguistic approaches
  • Constraint-based approaches
  • Reusing schema and mapping information

8
Granularity of match
  • Full structural match of Address and CustomerAddress
    S1 elements: Address (Street, City, State, Zip)
    S2 elements: CustomerAddress (Street, City, USState, PostalCode)
  • Partial structural match of AccountOwner and Customer
    S1 elements: AccountOwner (Name, Address, Birthdate, TaxExempt)
    S2 elements: Customer (Cname, CAddress, Cphone)
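The slide contrasts a full and a partial structural match but does not say how the distinction is computed. A minimal Python sketch, assuming a structure-level match is judged by how many child elements find an element-level partner: "full" if every child on both sides is matched, "partial" if only some are. The synonym table and this classification rule are hypothetical, chosen only so that the slide's two examples come out as labeled.

```python
from typing import List

# Hypothetical synonym pairs for the element-level matcher (illustrative assumption).
SYNONYMS = {
    frozenset({"state", "usstate"}),
    frozenset({"zip", "postalcode"}),
    frozenset({"name", "cname"}),
    frozenset({"address", "caddress"}),
}


def element_match(a: str, b: str) -> bool:
    """Element-level match: equal names or a listed synonym pair (case-insensitive)."""
    a, b = a.lower(), b.lower()
    return a == b or frozenset({a, b}) in SYNONYMS


def structure_match(s1_children: List[str], s2_children: List[str]) -> str:
    """Classify a structure-level match by how many children find a partner."""
    matched1 = [c for c in s1_children if any(element_match(c, d) for d in s2_children)]
    matched2 = [d for d in s2_children if any(element_match(c, d) for c in s1_children)]
    if len(matched1) == len(s1_children) and len(matched2) == len(s2_children):
        return "full"
    return "partial" if matched1 else "none"


print(structure_match(["Street", "City", "State", "Zip"],
                      ["Street", "City", "USState", "PostalCode"]))  # full
print(structure_match(["Name", "Address", "Birthdate", "TaxExempt"],
                      ["Cname", "CAddress", "Cphone"]))  # partial
```

In the second case Birthdate, TaxExempt and Cphone have no partner, so only part of the structure matches.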
9
Match Cardinality
Local match cardinalities (S1 element(s) → S2 element(s): matching expression)
  1. 1:1, element-level
     Price → Amount: Amount = Price
  2. n:1, element-level
     Price, Tax → Cost: Cost = Price * (1 + Tax/100)
  3. 1:n, element-level
     Name → FirstName, LastName: FirstName, LastName = Extract(Name, …)
  4. n:1, structure-level (n:m element-level)
     B.Title, B.PuNo, P.PuNo, P.Name → A.Book, A.Publisher:
     A.Book, A.Publisher = select B.Title, P.Name from B, P where B.PuNo = P.PuNo
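To make the mapping expressions concrete, here is a minimal Python sketch of cases 2 to 4 as plain functions over Python values. The function names and the row-as-dict representation are illustrative assumptions. The slide leaves the second argument of Extract elided, so splitting on the last space is a hypothetical stand-in, and the join assumes PuNo is a key of P.

```python
from typing import Dict, List, Tuple


def cost(price: float, tax: float) -> float:
    """n:1 element-level: Cost = Price * (1 + Tax/100)."""
    return price * (1 + tax / 100)


def extract_name(name: str) -> Tuple[str, str]:
    """1:n element-level: FirstName, LastName = Extract(Name, ...).
    Assumption: the last space separates first name from last name."""
    first, _, last = name.rpartition(" ")
    return first, last


def book_publisher(books: List[Dict[str, str]],
                   publishers: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """n:1 structure-level:
    select B.Title, P.Name from B, P where B.PuNo = P.PuNo.
    Assumes PuNo is a key of P (one publisher row per PuNo)."""
    name_by_puno = {p["PuNo"]: p["Name"] for p in publishers}
    return [
        {"Book": b["Title"], "Publisher": name_by_puno[b["PuNo"]]}
        for b in books
        if b["PuNo"] in name_by_puno  # inner join: drop books with no publisher row
    ]


print(cost(80.0, 25.0))              # 100.0
print(extract_name("Ada Lovelace"))  # ('Ada', 'Lovelace')
print(book_publisher([{"Title": "Book A", "PuNo": "1"}],
                     [{"PuNo": "1", "Name": "Publisher X"}]))
# [{'Book': 'Book A', 'Publisher': 'Publisher X'}]
```

The structure-level case shows why it is n:m at element level: two target elements are computed from four source elements spread across two relations.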
10
Linguistic Approaches
  • Name Matching
  • Equality of names
  • Equality of canonical name representations
  • Equality of synonyms
  • Equality of hypernyms
  • Similarity of names based on common substrings,
    edit distance, pronunciation, and Soundex
  • User provided name matches
  • Description Matching
  • Ex. S1 empn //employee name
  • Ex. S2 name //name of employee
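Two of the listed techniques can be sketched briefly in Python: equality of canonical names plus an edit-distance similarity for name matching, and a word-overlap score for description matching. The canonicalization rule (lowercase, drop `_`, `-` and spaces), the stopword list, and the use of Jaccard overlap for comments are illustrative assumptions; the slide does not say how descriptions are compared.

```python
import re


def canonical(name: str) -> str:
    """Canonical name representation (assumption: lowercase, drop '_', '-', spaces)."""
    return re.sub(r"[_\-\s]", "", name.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1 - normalized edit distance between canonical names (1.0 = identical)."""
    a, b = canonical(a), canonical(b)
    if not a and not b:
        return 1.0
    return 1 - edit_distance(a, b) / max(len(a), len(b))


STOPWORDS = {"of", "the", "a", "an"}  # illustrative assumption


def description_similarity(d1: str, d2: str) -> float:
    """Jaccard overlap of content words in two element descriptions (comments)."""
    t1 = set(d1.lower().split()) - STOPWORDS
    t2 = set(d2.lower().split()) - STOPWORDS
    if not t1 and not t2:
        return 1.0
    return len(t1 & t2) / len(t1 | t2)


print(name_similarity("Postal_Code", "PostalCode"))                 # 1.0
print(description_similarity("employee name", "name of employee"))  # 1.0
```

For the slide's example, the names `empn` and `name` share little, while their comments overlap completely, which is the case description matching is meant to catch.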

11
Constraint-based Approaches
12
Reusing Schema and Mapping Information
13
Instance-level Approaches
  • Linguistic characterization
  • Information retrieval techniques
  • Ex. Extracting keywords and themes
  • Constraint-based characterization
  • Numeric value ranges
  • Numeric value averages
  • Character patterns (PhoneNr, ISBNs, SSNs)
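The constraint-based characterization above can be sketched in Python by profiling column samples: a dominant character pattern for string values and (min, max, average) for numeric values, with a range-overlap score for comparing two numeric columns. The pattern alphabet (digit → `9`, letter → `A`) and the overlap formula are illustrative assumptions, and the SSN-style values are made-up samples.

```python
from collections import Counter
from statistics import mean
from typing import List, Tuple


def char_pattern(value: str) -> str:
    """Abstract a value into a character pattern: digits -> '9', letters -> 'A'."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def dominant_pattern(values: List[str]) -> str:
    """Most frequent character pattern in a column sample."""
    return Counter(char_pattern(v) for v in values).most_common(1)[0][0]


def numeric_profile(values: List[float]) -> Tuple[float, float, float]:
    """(min, max, average) of a numeric column sample."""
    return min(values), max(values), mean(values)


def range_overlap(r1: Tuple[float, float], r2: Tuple[float, float]) -> float:
    """Length of the overlap of two value ranges divided by their combined span."""
    lo, hi = max(r1[0], r2[0]), min(r1[1], r2[1])
    span = max(r1[1], r2[1]) - min(r1[0], r2[0])
    if span == 0:
        return 1.0  # both ranges are the same single point
    return max(0.0, hi - lo) / span


ssn_a = ["123-45-6789", "987-65-4321"]  # made-up sample values
ssn_b = ["555-12-3456"]
print(dominant_pattern(ssn_a) == dominant_pattern(ssn_b))  # True: both '999-99-9999'
print(numeric_profile([10.0, 20.0, 30.0]))                 # (10.0, 30.0, 20.0)
print(range_overlap((0.0, 10.0), (5.0, 15.0)))
```

Columns whose profiles agree (same dominant pattern, largely overlapping ranges, similar averages) become match candidates even when their names differ.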

14
Combining Different Matchers
  • Hybrid matchers
  • Hard-wired combination of multiple matching
    criteria
  • Better performance
  • Composite matchers
  • Independent basic matchers
  • Flexible execution order
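A composite matcher can be sketched in Python as a list of independent basic matchers whose scores are combined afterwards. Weighted averaging with a 0.5 threshold is one plausible combination strategy assumed here for illustration; the slide does not prescribe it. The two basic matchers and the synonym table are hypothetical. A hybrid matcher would instead fold both criteria into a single hard-wired function.

```python
from typing import Callable, List, Sequence, Tuple

Matcher = Callable[[str, str], float]  # independent basic matcher: score in [0, 1]


def composite_match(
    s1: Sequence[str],
    s2: Sequence[str],
    matchers: Sequence[Tuple[Matcher, float]],
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[str, str, float]]:
    """Run each basic matcher independently, then combine scores by weighted average."""
    total_weight = sum(weight for _, weight in matchers)
    candidates = []
    for a in s1:
        for b in s2:
            score = sum(weight * m(a, b) for m, weight in matchers) / total_weight
            if score >= threshold:
                candidates.append((a, b, score))
    return candidates


def name_equal(a: str, b: str) -> float:
    """Basic matcher: case-insensitive name equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


SYNONYMS = {frozenset({"price", "amount"})}  # hypothetical synonym table


def synonym(a: str, b: str) -> float:
    """Basic matcher: equal names or a listed synonym pair."""
    a, b = a.lower(), b.lower()
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


print(composite_match(["Price", "Name"], ["Amount", "name"],
                      [(name_equal, 1.0), (synonym, 1.0)]))
# [('Price', 'Amount', 0.5), ('Name', 'name', 1.0)]
```

Because each matcher is a separate entry in the list, matchers can be added, removed, or reweighted without changing the others, which is the flexibility the slide attributes to composite matchers.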

15
Sample Approaches
  • SEMINT
  • LSD
  • SKAT
  • TranScm
  • DIKE
  • ARTEMIS
  • CUPID

18
Conclusion
  • Propose a taxonomy that covers many of the
    existing approaches
  • Suggest quantitative work on the relative
    performance and accuracy of different approaches