A survey of approaches to automatic schema matching - PowerPoint PPT Presentation

About This Presentation
Slides: 36
Provided by: steliospa

Transcript and Presenter's Notes



1
A survey of approaches to automatic schema matching
  • Presenter: Pantouvakis Stelios

2
Introduction
  • Schema matching is typically done manually in
    current implementations
  • Drawbacks: time- and effort-consuming, error-prone
  • Automatic schema matching is needed for
  • data (schema) integration
  • e-business
  • data warehousing
  • semantic query processing

3
The Match operator
  • Schema: a set of elements connected by some
    structure
  • Independent of representation (XML, ER model,
    OO model, directed graph, etc.)
  • Mapping: a set of mapping elements (certain
    elements of S1 mapped to certain elements of
    S2), plus a mapping expression for each mapping
    element that specifies how they are related
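A minimal Python sketch of these definitions, assuming elements are identified by dotted path strings and that structure is stored as a parent-to-children map (which can encode an XML tree, an ER model or OO classes alike). The class and field names are hypothetical and not taken from any particular matching system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Schema:
    """A schema: a set of elements plus the structure that connects them."""
    name: str
    elements: set = field(default_factory=set)
    # Structure as a parent -> children map, independent of XML/ER/OO syntax.
    children: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, parent: str, child: str) -> None:
        self.elements.update({parent, child})
        self.children.setdefault(parent, []).append(child)


@dataclass(frozen=True)
class MappingElement:
    """Relates some S1 elements to some S2 elements.

    `expression` is None when the mapping expression is left unspecified."""
    s1: Tuple[str, ...]
    s2: Tuple[str, ...]
    expression: Optional[str] = None


# A mapping is simply a list of mapping elements.
Mapping = List[MappingElement]

s1 = Schema("S1")
for attr in ("Cust.C", "Cust.FirstName", "Cust.LastName"):
    s1.add("Cust", attr)

mapping: Mapping = [
    MappingElement(("Cust.C",), ("Customer.CustID",)),
    MappingElement(("Cust.FirstName", "Cust.LastName"), ("Customer.Contact",),
                   "Concatenate"),
]
```

Keeping the expression optional reflects the slide's point that a mapping element can be recorded before its exact relation is known.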

4
The Match operator
  • Mapping expressions may be
  • scalar relations (e.g. =, <)
  • functions (e.g. addition, concatenation)
  • ER-style relationships (is-a, part-of)
  • set-oriented relationships (overlaps, contains)
  • A dedicated symbol marks mapping elements
    without specifying their mapping expression

5
The Match operator
  • The Match operation is a function that takes two
    schemas S1 and S2 as input and returns a mapping
    between them (the match result)
  • Its implementation resembles a join: for each
    element of S1, check which elements of S2 match
    it and output those pairs. The differences are
  • Match operates on metadata (schema elements)
    rather than on data
  • each element of S1 may match multiple
    elements of S2
  • many comparison expressions may be used
  • mappings may have multiple mapping expressions
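The join analogy can be sketched as a nested loop over both element sets. The similarity function (difflib's `SequenceMatcher.ratio` on lower-cased names) and the 0.5 threshold are illustrative assumptions; any comparison expression could be plugged in instead.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def name_similarity(a: str, b: str) -> float:
    """Similarity of two element names in [0, 1] (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(s1: List[str], s2: List[str],
          similarity: Callable[[str, str], float],
          threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Join-like Match: compare every S1 element with every S2 element and
    keep each pair whose similarity reaches the threshold.

    Unlike an equi-join on data values, this compares metadata (names),
    and one S1 element may appear in several result pairs."""
    result = []
    for a in s1:
        for b in s2:
            score = similarity(a, b)
            if score >= threshold:
                result.append((a, b, score))
    return result


pairs = match(["phone"], ["homePhone", "officePhone", "email"], name_similarity)
print([(a, b) for a, b, _ in pairs])
# [('phone', 'homePhone'), ('phone', 'officePhone')]
```

Note that `phone` matches two S2 elements, which an ordinary key join would not allow.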

6
Example: Mappings
  • Possible mapping elements
  • Cust.C maps to Customer.CustID
  • Concatenate(Cust.FirstName, Cust.LastName)
    maps to Customer.Contact
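These two mapping elements can be read as executable transformations from a `Cust` record to a `Customer` record. The sketch below is hypothetical: it assumes records are Python dicts keyed by attribute name and that Concatenate joins the two names with a single space.

```python
from typing import Callable, Dict

# One mapping expression per target element of S2 (Customer),
# computed from a source record of S1 (Cust).
MappingExprs = Dict[str, Callable[[dict], object]]

cust_to_customer: MappingExprs = {
    # Cust.C maps directly (scalar equality) to Customer.CustID
    "CustID": lambda cust: cust["C"],
    # Concatenate(Cust.FirstName, Cust.LastName) maps to Customer.Contact
    # (the space separator is an assumption)
    "Contact": lambda cust: f'{cust["FirstName"]} {cust["LastName"]}',
}


def apply_mapping(record: dict, exprs: MappingExprs) -> dict:
    """Translate one S1 instance into an S2 instance by evaluating each expression."""
    return {target: expr(record) for target, expr in exprs.items()}


cust = {"C": 17, "FirstName": "Ann", "LastName": "Lee"}
print(apply_mapping(cust, cust_to_customer))  # {'CustID': 17, 'Contact': 'Ann Lee'}
```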

7
Architecture of generic match
8
Architecture of generic match
  • In general, not all matches between two schemas
    can be determined fully automatically
  • The implementation of Match should therefore only
    determine match candidates
  • The user has to accept, reject or change them
  • The user should be able to specify matches for
    elements for which the system was unable to find
    satisfactory match candidates

9
Classification of schema matching approaches
  • One Match operator may use multiple matching
    algorithms (matchers)
  • Different matchers work better in different
    application domains
  • We therefore first classify the individual
    matchers

10
Classification of schema matching approaches
  • Instance vs. schema: matchers can consider
    instance data or only schema-level information
  • Element vs. structure: matchers can match
    individual schema elements or combinations of
    elements
  • Language vs. constraint: matchers can use a
    linguistic approach or a constraint-based
    approach

11
Classification of schema matching approaches
  • Match cardinality: a mapping element may relate
    multiple elements of the two schemas
  • Auxiliary information: matchers may also use
    dictionaries, global schemas, previous matching
    decisions and user input

12
(No Transcript)
13
Schema-level matchers
  • In general
  • Consider schema information such as name,
    description, data type, relationship types
    (part-of, is-a, etc.), constraints and schema
    structure
  • Matchers may find multiple match candidates and
    attach a degree of similarity in the range 0 to 1
    to each, in order to identify the best
    candidates
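Selecting the best candidates from similarity degrees can be sketched as a threshold plus a per-element top-k cut. The scores below are made up, and `threshold` and `top_k` are hypothetical parameters; the returned candidates are what a user would then accept, reject or change.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]


def best_candidates(scores: Scores, threshold: float = 0.5,
                    top_k: int = 2) -> Dict[str, List[Tuple[str, float]]]:
    """For each S1 element, return up to top_k S2 candidates whose similarity
    reaches the threshold, best first."""
    per_element: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), s in scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"similarity of ({a}, {b}) is outside [0, 1]: {s}")
        if s >= threshold:
            per_element.setdefault(a, []).append((b, s))
    return {a: sorted(cands, key=lambda c: c[1], reverse=True)[:top_k]
            for a, cands in per_element.items()}


scores = {  # hypothetical similarity degrees
    ("Cust.C", "Customer.CustID"): 0.8,
    ("Cust.C", "Customer.Contact"): 0.1,
    ("Cust.LastName", "Customer.Contact"): 0.6,
    ("Cust.LastName", "Customer.CustID"): 0.2,
}
print(best_candidates(scores))
# {'Cust.C': [('Customer.CustID', 0.8)], 'Cust.LastName': [('Customer.Contact', 0.6)]}
```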

14
Schema-level matchers: 1. Granularity of match
(element-level vs. structure-level)
  • Element-level matching
  • for each element of S1, determine the matching
    elements in S2
  • may work at the atomic level (attributes) or at a
    higher level (entities, classes, relational
    tables), but considers elements in isolation,
    ignoring their substructure and components

15
Schema-level matchers: 1. Granularity of match
(element-level vs. structure-level)
  • Structure-level matching
  • matches combinations of elements that appear
    together in a structure in S1 with combinations
    of elements in S2
  • full match: the complete structures match
  • partial match: only some components of each
    structure match
  • may use equivalence patterns from a library
    (e.g. an is-a hierarchy corresponds to a single
    structure with a Boolean attribute)
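Full and partial structural matches can be told apart once the components have been matched. This sketch assumes component matches were already found by an element-level matcher and treats a match as full only when every component on both sides is matched; the component names (Street, City, Country) are hypothetical.

```python
from typing import Dict, Set, Tuple


def structure_match(s1: Dict[str, Set[str]], s2: Dict[str, Set[str]],
                    leaf_matches: Set[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Classify each pair of structures as a 'full' or 'partial' match,
    given matches between their components."""
    result = {}
    for p1, comps1 in s1.items():
        for p2, comps2 in s2.items():
            # Components of each side that have a partner on the other side.
            matched1 = {a for a in comps1 if any((a, b) in leaf_matches for b in comps2)}
            matched2 = {b for b in comps2 if any((a, b) in leaf_matches for a in comps1)}
            if matched1 == comps1 and matched2 == comps2:
                result[(p1, p2)] = "full"
            elif matched1:
                result[(p1, p2)] = "partial"
    return result


address = {"Address": {"Street", "City", "ZIP"}}
customer_address = {"CustomerAddress": {"Street", "City", "PostalCode"}}
customer_address_ext = {"CustomerAddress": {"Street", "City", "PostalCode", "Country"}}
leaves = {("Street", "Street"), ("City", "City"), ("ZIP", "PostalCode")}

print(structure_match(address, customer_address, leaves))
# {('Address', 'CustomerAddress'): 'full'}
print(structure_match(address, customer_address_ext, leaves))
# {('Address', 'CustomerAddress'): 'partial'}
```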

16
Example: Full and partial structural match
Atomic-level match
Address.ZIP maps to CustomerAddress.PostalCode
17
Example: Equivalence pattern
18
Schema-level matchers: 2. Match cardinality
  • Each element of S1 (or S2) may participate in 0,
    1 or many mapping elements
  • Within an individual mapping element, one or more
    S1 elements can match one or more S2 elements.
    The cases are
  • 1:1, 1:n, n:1 (local cardinality)
  • n:m (global cardinality, requires structural
    matching)
  • Most existing approaches handle 1:1 and 1:n
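Both notions of cardinality can be computed from a list of mapping elements. In this sketch a mapping element is assumed to be a pair of tuples (S1 side, S2 side); the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

MappingElement = Tuple[Tuple[str, ...], Tuple[str, ...]]


def local_cardinality(elem: MappingElement) -> str:
    """Cardinality inside one mapping element: '1:1', '1:n', 'n:1' or 'n:m'."""
    s1_side, s2_side = elem
    if len(s1_side) == 1:
        return "1:1" if len(s2_side) == 1 else "1:n"
    return "n:1" if len(s2_side) == 1 else "n:m"


def participation(mapping: List[MappingElement]) -> Counter:
    """Number of mapping elements each S1 element participates in.
    Elements that never appear get the Counter default of 0."""
    return Counter(a for s1_side, _ in mapping for a in s1_side)


mapping = [
    (("Cust.C",), ("Customer.CustID",)),
    (("Cust.FirstName", "Cust.LastName"), ("Customer.Contact",)),
    (("phone",), ("homePhone", "officePhone")),
]
print([local_cardinality(m) for m in mapping])  # ['1:1', 'n:1', '1:n']
```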

19
Example: Match cardinalities
20
Schema-level matchers: 3. Linguistic approaches
  • Matchers use names and text to find semantically
    similar schema elements
  • They need dictionaries (general, domain- or
    enterprise-specific, even multilingual)
  • Such specific dictionaries take much effort to
    build
  • Homonyms may mislead the matcher

21
Schema-level matchers: 3. Linguistic approaches
  • Name matching
  • equality of names
  • equality of canonical names (Cust, CustNo)
  • equality of synonyms (make, brand)
  • equality of hypernyms (book is-a publication and
    article is-a publication, so book and article
    can match)
  • similarity based on pronunciation or soundex
  • user-provided name matches (reportsTo,
    manager)
  • May be used by element- or structure-level
    matchers, or even to match elements at different
    levels (author.name, AuthorName)
  • Not limited to 1:1 matches (phone matches both
    homePhone and officePhone)
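Several of these name-matching rules can be sketched as an ordered rule check. The dictionaries below are tiny hypothetical stand-ins for the general, domain- or enterprise-specific dictionaries a real matcher would load, and the tokenizer is one assumed way to build canonical names; soundex matching is omitted.

```python
import re
from typing import Optional

# Hypothetical auxiliary information.
ABBREVIATIONS = {"cust": "customer", "no": "number"}
SYNONYMS = [{"make", "brand"}]
HYPERNYMS = {"book": "publication", "article": "publication"}
USER_MATCHES = {frozenset({"reportsto", "manager"})}


def canonical(name: str) -> str:
    """Split camelCase, dotted or underscored names into lower-case tokens
    and expand known abbreviations, e.g. 'CustNo' -> 'customer number'."""
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return " ".join(ABBREVIATIONS.get(t.lower(), t.lower()) for t in tokens)


def name_match(a: str, b: str) -> Optional[str]:
    """Return the first name-matching rule that relates a and b, or None."""
    ca, cb = canonical(a), canonical(b)
    if a == b:
        return "equal names"
    if ca == cb:
        return "equal canonical names"
    if any({ca, cb} <= group for group in SYNONYMS):
        return "synonyms"
    if ca in HYPERNYMS and HYPERNYMS[ca] == HYPERNYMS.get(cb):
        return "common hypernym"
    if frozenset({a.lower(), b.lower()}) in USER_MATCHES:
        return "user-provided"
    return None


print(name_match("CustNo", "customer_number"))  # equal canonical names
print(name_match("author.name", "AuthorName"))  # equal canonical names
print(name_match("book", "article"))            # common hypernym
```

Canonicalization also lets a dotted path such as `author.name` match a single element name at a different level.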

22
Schema-level matchers: 3. Linguistic approaches
  • Description matching
  • Uses natural-language comments on schema
    elements to match them
  • as simple as extracting words for synonym
    comparison
  • or as sophisticated as using natural language
    understanding technology to find semantically
    equivalent expressions
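The simple end of this range can be sketched as word overlap between two element comments. This version uses the Jaccard coefficient of content words after removing a small, hypothetical stop-word list; a synonym lookup or natural language understanding would go further.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "for", "is", "to", "and", "or", "in"}


def words(text: str) -> set:
    """Lower-case content words of a comment, without stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def description_similarity(d1: str, d2: str) -> float:
    """Jaccard overlap of the content words of two element descriptions (0 to 1)."""
    w1, w2 = words(d1), words(d2)
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)


print(description_similarity("Postal code of the customer address",
                             "The postal code for the address"))  # 0.75
```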

Example
23
Schema-level matchers: 4. Constraint-based
approaches
  • Schemas often contain constraints that define data
    types and value ranges, uniqueness, optionality,
    relationship types and cardinalities
  • If both schemas contain such information, the
    matcher can use it to match elements
  • Used alone, this criterion will obviously produce
    many matching errors, since many unrelated
    elements share the same constraints
  • Still, it can be combined with other matchers to
    limit the number of match candidates
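Using constraints to prune candidates proposed by another matcher can be sketched as follows. The type-compatibility table, the rule that uniqueness must agree, and the constraint dictionaries are all simplifying assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sets of data types treated as compatible (order-insensitive).
COMPATIBLE_TYPES = {frozenset({"int"}), frozenset({"int", "decimal"}),
                    frozenset({"decimal"}), frozenset({"string"}), frozenset({"date"})}


def compatible(c1: Dict, c2: Dict) -> bool:
    """Compare data type and uniqueness constraints of two schema elements."""
    types_ok = frozenset({c1["type"], c2["type"]}) in COMPATIBLE_TYPES
    uniqueness_ok = c1.get("unique", False) == c2.get("unique", False)
    return types_ok and uniqueness_ok


def filter_candidates(candidates: List[Tuple[str, str]],
                      s1: Dict[str, Dict], s2: Dict[str, Dict]) -> List[Tuple[str, str]]:
    """Drop candidate pairs (e.g. from a linguistic matcher) whose constraints conflict."""
    return [(a, b) for a, b in candidates if compatible(s1[a], s2[b])]


s1 = {"Cust.C": {"type": "int", "unique": True}}
s2 = {"Customer.CustID": {"type": "int", "unique": True},
      "Customer.Contact": {"type": "string"}}
print(filter_candidates([("Cust.C", "Customer.CustID"),
                         ("Cust.C", "Customer.Contact")], s1, s2))
# [('Cust.C', 'Customer.CustID')]
```

On its own, `compatible` would accept every pair of unique integer columns, which is why the slide recommends combining it with other matchers.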

24
Schema-level matchers: 5. Reusing schema and
mapping information
  • Improve the effectiveness of Match by supporting
    the reuse of common schema components (schemas
    from the same domain are often very similar)
  • reusable components range from atomic-level
    components to entire schema fragments
  • reuse of previously determined mappings: if S has
    already been matched with S2 and a match between
    S1 and S2 is needed, it may be easier to match S1
    with S and reuse the S-S2 mapping
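Reusing the S-S2 mapping amounts to composing two mappings through their shared S elements. In this sketch mappings are assumed to be sets of element pairs, and the intermediate schema `Client` is hypothetical; the composed pairs are still only candidates for the user to confirm.

```python
from typing import Set, Tuple

Mapping = Set[Tuple[str, str]]


def compose(m_s1_s: Mapping, m_s_s2: Mapping) -> Mapping:
    """Derive S1 -> S2 match candidates by joining an S1 -> S mapping with a
    previously determined S -> S2 mapping on the shared element of S."""
    return {(a, c) for a, b in m_s1_s for b2, c in m_s_s2 if b == b2}


m_s1_s = {("Cust.C", "Client.ID"), ("Cust.LastName", "Client.Surname")}
m_s_s2 = {("Client.ID", "Customer.CustID")}  # reused, already determined
print(compose(m_s1_s, m_s_s2))  # {('Cust.C', 'Customer.CustID')}
```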

25
Example: Reuse of previously determined mappings
26
Instance-level matchers
  • Instance-level data can give insight into the
    contents and meaning of schema elements through
    word frequencies, word combinations, value
    ranges, etc.
  • Useful when schema information is limited or
    when data is semi-structured
  • Even when schema information is available, this
    approach can help decide between equally
    plausible matches

27
Instance-level matchers
  • Applicable to most of the above approaches, but
    especially to
  • linguistic approaches
  • constraint-based approaches
  • e.g. a constraint-based matcher may use an
    instance-level check on the value ranges of the
    three attributes to match Pno with EmpNo rather
    than with DeptNo
  • Main drawback: the potentially large number of
    schema elements whose instances must be evaluated
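The Pno example can be sketched as a value-range comparison over instance data. The sample values are invented, and measuring overlap as intersection length over union length of the two ranges is one assumed choice among many possible instance statistics.

```python
from typing import Dict, List


def range_overlap(v1: List[int], v2: List[int]) -> float:
    """Length of the intersection of two value ranges divided by the length of
    their union (0 = disjoint, 1 = identical ranges)."""
    lo, hi = max(min(v1), min(v2)), min(max(v1), max(v2))
    span = max(max(v1), max(v2)) - min(min(v1), min(v2))
    if span == 0:  # both columns hold one and the same value
        return 1.0
    return max(0, hi - lo) / span


# Hypothetical sample instances of the three attributes.
instances: Dict[str, List[int]] = {
    "Pno": [1012, 1530, 2200, 2999],
    "EmpNo": [1000, 1800, 2500, 3000],
    "DeptNo": [10, 20, 30, 40],
}
best = max(["EmpNo", "DeptNo"],
           key=lambda attr: range_overlap(instances["Pno"], instances[attr]))
print(best)  # EmpNo
```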

28
Combining different matchers
  • There are several types of matchers. They can be
    combined into a single Match operator in two ways
  • Hybrid matcher: integrates multiple matching
    criteria
  • Composite matcher: combines the results of
    independently executed matchers (including hybrid
    matchers)
  • Approaches must decide whether to apply criteria
    simultaneously or in a specific order

29
Combining different matchers: Hybrid matcher
  • Typically uses a hard-wired combination of
    particular matching techniques that are executed
    simultaneously or in a fixed order
  • Better match candidates and better performance
    than a composite matcher
  • poor match candidates can be filtered out early
  • fewer passes are needed
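A hybrid matcher can be sketched as one pass with a fixed order of criteria: a cheap data-type check discards pairs early, and only the survivors are compared by name. Element names, types and the 0.6 threshold are hypothetical; difflib's `SequenceMatcher.ratio` stands in for the linguistic criterion.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def hybrid_match(s1: Dict[str, str], s2: Dict[str, str],
                 threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Hard-wired hybrid matcher: both criteria are applied in a single pass
    over all pairs, with the data-type check first to filter early."""
    result = []
    for a, type_a in s1.items():
        for b, type_b in s2.items():
            if type_a != type_b:  # constraint criterion: filter early
                continue
            # linguistic criterion, only for pairs that survived the filter
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                result.append((a, b, score))
    return result


s1 = {"phone": "string", "zip": "string"}
s2 = {"homePhone": "string", "officePhone": "string", "phoneCount": "int"}
print([(a, b) for a, b, _ in hybrid_match(s1, s2)])
# [('phone', 'homePhone'), ('phone', 'officePhone')]
```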

30
Combining different matchers: Composite matcher
  • Allows a selection among several matchers
  • The user can choose which matchers to execute,
    either simultaneously or in a specific order, and
    how to combine their results so that they better
    fit the particular domain
  • The composite matcher may also determine a
    selection and order automatically
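A composite matcher can be sketched as independently executed matchers whose scores are combined afterwards. The two matchers, the synonym table, the weights and the weighted-average combination are all illustrative assumptions; taking the maximum or requiring agreement would be other valid combination strategies.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Matcher = Callable[[str, str], float]
SYNONYMS = [{"make", "brand"}]  # hypothetical dictionary


def name_matcher(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def synonym_matcher(a: str, b: str) -> float:
    return 1.0 if any({a.lower(), b.lower()} <= g for g in SYNONYMS) else 0.0


def composite_match(s1: List[str], s2: List[str], matchers: Dict[str, Matcher],
                    weights: Dict[str, float],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Run the selected matchers independently on every pair, combine their
    scores by a weighted average and keep pairs reaching the threshold."""
    total = sum(weights[n] for n in matchers)
    result = []
    for a in s1:
        for b in s2:
            combined = sum(weights[n] * m(a, b) for n, m in matchers.items()) / total
            if combined >= threshold:
                result.append((a, b, combined))
    return sorted(result, key=lambda r: r[2], reverse=True)


selected = {"name": name_matcher, "synonym": synonym_matcher}
res = composite_match(["make"], ["brand", "model"], selected,
                      weights={"name": 0.5, "synonym": 0.5})
print([(a, b) for a, b, _ in res])  # [('make', 'brand')]
```

Changing `selected` or `weights` is how a user would tune the composite matcher to a particular domain.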

31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
Conclusion
  • User interaction is necessary in any case,
    because the implementation of Match can only
    determine match candidates, which a user can
    accept, reject or change
  • The more configurable the matcher, the better the
    results that can be obtained
  • Current implementations have yet to take a more
    general view of the problem (independence of
    schema representation, more criteria for the user
    to choose among, applicability in various
    domains)

35
Comments
  • If the user must check all matches and has to
    intervene in most of the matchers' steps, when do
    we actually save time and effort by doing the
    work automatically?
  • Time and space complexity of the (multiple)
    algorithms?