Learning Object Identification Rules for Information Integration

1
Learning Object Identification Rules for
Information Integration
  • Sheila Tejada
  • Craig A. Knoblock
  • Steven Minton
  • University of Southern California

2
Introduction
  • When integrating information, data objects can
    exist in inconsistent text formats across several
    sources
  • Previous methods manually construct mapping rules
    for object identification
  • Active Atlas learns to tailor mapping rules,
    through limited user input, to a specific
    application domain
  • Active Atlas achieves higher accuracy and requires
    less user involvement than previous methods

3
Object Identification Example
4
Ariadne Information Mediator
5
Ariadne Information Mediator (contd.)
6
Active Atlas Approach to Map Objects
  • First, determine the text formatting
    transformations and propose candidate mappings
  • Then, learn domain-specific mapping rules

7
Active Atlas Architecture
8
Mapping Objects (Transformation Functions)
  • General Transformation Functions
    • Type I: Stemming, Soundex, Abbreviation
    • Type II: Equality, Initial, Prefix, Suffix, Substring,
      Abbreviation, Acronym
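
To make the Type II functions concrete, here is a minimal Python sketch of how a few of them might be checked between two tokens. The definitions below are assumptions based only on the function names on this slide, not the authors' definitions; the helper names (`applicable_transformations`, `acronym`) are hypothetical, and Abbreviation is left out because it would need a lookup table.

```python
# Hypothetical readings of some Type II transformations (assumed, not from the slides).

def equality(a: str, b: str) -> bool:
    return a == b

def initial(a: str, b: str) -> bool:
    # a is a single letter that starts b, e.g. "j" and "john"
    return len(a) == 1 and len(b) > 1 and b.startswith(a)

def prefix(a: str, b: str) -> bool:
    return len(a) < len(b) and b.startswith(a)

def suffix(a: str, b: str) -> bool:
    return len(a) < len(b) and b.endswith(a)

def substring(a: str, b: str) -> bool:
    return len(a) < len(b) and a in b

def acronym(a: str, phrase: str) -> bool:
    # a equals the first letters of the words in phrase
    initials = "".join(word[0] for word in phrase.lower().split())
    return a.lower() == initials

def applicable_transformations(a: str, b: str) -> list:
    """Return the names of the token-level transformations relating a to b."""
    a, b = a.strip().lower(), b.strip().lower()
    checks = [
        ("equality", equality),
        ("initial", initial),
        ("prefix", prefix),
        ("suffix", suffix),
        ("substring", substring),
    ]
    return [name for name, check in checks if check(a, b)]

result = applicable_transformations("Caf", "Cafe")
```

Here `result` lists every transformation that relates the two tokens, which is the kind of evidence a candidate generator could use when proposing mappings.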

9
Mapping Objects (Transformation Functions Example)
10
Mapping Objects (Compute Attribute Similarity Scores)
11
Mapping Objects (Compute Total Similarity Scores)
  • Total object similarity score is computed as a
    weighted sum of the attribute similarity scores
  • Each attribute has a uniqueness weight that is a
    heuristic measure of the importance of that
    attribute
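
The weighted sum described on this slide can be sketched in a few lines of Python. The attribute names, scores, and weights below are made up for illustration, and the sketch does not assume the weights are normalized, since the slide does not say how the uniqueness weights are computed.

```python
def total_similarity(attribute_scores: dict, uniqueness_weights: dict) -> float:
    """Weighted sum of per-attribute similarity scores."""
    return sum(
        uniqueness_weights[name] * score
        for name, score in attribute_scores.items()
    )

# Hypothetical values for one candidate mapping between two restaurant records.
scores = {"name": 0.9, "street": 0.5, "phone": 1.0}
weights = {"name": 0.5, "street": 0.2, "phone": 0.3}
total = total_similarity(scores, weights)  # roughly 0.85
```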

12
Mapping Objects (Output of Candidate Generator)
13
Mapping Objects (Mapping-Rule Learning)
  • Decision Tree Learning
  • Passive Learning
    • Requires a large set of training examples
  • Active Learning
    • Uses the query-by-bagging technique
    • Selects a small set of initial training examples
    • Includes a variety of training examples
    • Creates a diverse set of decision tree learners
    • Actively chooses the examples for the user to label
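
The query-by-bagging idea can be sketched as follows: train a committee of learners on bootstrap samples of the labeled examples, then ask the user to label the candidate mapping on which the committee disagrees most. This is a simplified illustration, not the Active Atlas implementation. One-level decision trees (stumps) stand in for full decision trees, each example is assumed to be a tuple of attribute similarity scores, and all function names are hypothetical.

```python
import random

def train_stump(sample):
    """Fit a one-level decision tree: the (feature, threshold) with fewest errors."""
    best = None
    n_features = len(sample[0][0])
    for f in range(n_features):
        for x, _ in sample:
            t = x[f]
            errors = sum((xi[f] >= t) != yi for xi, yi in sample)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda x: x[f] >= t  # True means "these two objects map"

def build_committee(labeled, n_learners, rng):
    """Train each learner on a bootstrap sample (drawn with replacement)."""
    committee = []
    for _ in range(n_learners):
        bootstrap = [rng.choice(labeled) for _ in labeled]
        committee.append(train_stump(bootstrap))
    return committee

def disagreement(committee, x):
    """Size of the minority vote: 0 means the committee is unanimous."""
    votes = sum(learner(x) for learner in committee)
    return min(votes, len(committee) - votes)

def choose_query(committee, unlabeled):
    """Pick the unlabeled example the committee disagrees on most."""
    return max(unlabeled, key=lambda x: disagreement(committee, x))

# Hypothetical data: (name similarity, address similarity) and a mapped/not-mapped label.
rng = random.Random(0)
labeled = [((0.9, 0.8), True), ((0.2, 0.1), False),
           ((0.7, 0.9), True), ((0.3, 0.4), False)]
unlabeled = [(0.95, 0.9), (0.5, 0.5), (0.1, 0.2)]
committee = build_committee(labeled, n_learners=10, rng=rng)
query = choose_query(committee, unlabeled)  # the example to show the user next
```

In a full loop, the user's label for `query` would be added to `labeled`, the committee retrained, and the process repeated until the committee agrees or the labeling budget runs out.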

14
Mapping Objects (Active Learning)
15
Experimental Results
  • Three different domains
    • Restaurants, Companies, and Airports
  • Experiments
    • Two baseline experiments
      • Compare the shared attributes separately
      • Compare the object as a whole
      • Both require choosing an optimal threshold
    • Passive learning
    • Active learning
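
The slides do not say how the optimal threshold for the baselines was chosen. One plausible way, shown here purely as an assumption, is to sweep every observed similarity score and keep the threshold that classifies the most labeled pairs correctly. The function name and the data are hypothetical.

```python
def best_threshold(scored_pairs):
    """scored_pairs: list of (similarity_score, is_true_mapping) tuples.

    A pair is classified as a mapping when its score is >= the threshold.
    """
    candidates = sorted({score for score, _ in scored_pairs})

    def n_correct(threshold):
        return sum((score >= threshold) == label for score, label in scored_pairs)

    return max(candidates, key=n_correct)

# Hypothetical whole-object similarity scores with known labels.
pairs = [(0.95, True), (0.8, True), (0.6, False), (0.4, False), (0.7, True)]
threshold = best_threshold(pairs)
```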

16
Experimental Results (Restaurants)
  • Source A: 331 objects
  • Source B: 533 objects
  • 112 correct mappings
  • 3259 candidate mappings over 10 runs

17
Measurement of Accuracy
  • Accuracy
    • The total number of correct classifications divided by
      the sum of the total number of mappings and the number
      of correct mappings not proposed
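
Read this way, the measure penalizes correct mappings that the candidate generator never proposed by adding them to the denominator. A small sketch under that reading, with made-up counts and hypothetical parameter names:

```python
def accuracy(correct_classifications: int,
             total_mappings: int,
             correct_not_proposed: int) -> float:
    """Correct classifications over (mappings classified + correct mappings missed)."""
    return correct_classifications / (total_mappings + correct_not_proposed)

# Hypothetical counts: 90 of 100 candidate mappings classified correctly,
# and 20 true mappings never proposed as candidates.
example = accuracy(90, 100, 20)  # 90 / 120
```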

18
Experimental Results
19
Experimental Results
20
Related Work
21
Conclusion
  • The research addresses the problem of mapping
    objects between structured web sources
  • The experimental results show that Active Atlas
    can achieve high accuracy while limiting user
    involvement.

22
Future Work