Learning Object Identification Rules for Information Integration

About This Presentation

Slides: 23

Provided by: nove98

Transcript and Presenter's Notes

1
Learning Object Identification Rules for
Information Integration

Sheila Tejada
Craig A. Knoblock
Steven Minton
University of Southern California

2
Introduction

When integrating information, data objects can
exist in inconsistent text formats across several
sources
Previous methods manually construct mapping rules
for object identification
Active Atlas learns to tailor mapping rules,
through limited user input, to a specific
application domain
Active Atlas achieves higher accuracy and requires
less user involvement than previous methods

3
Object Identification Example
4
Ariadne Information Mediator
5
Ariadne Information Mediator (contd.)
6
Active Atlas Approach to Map Objects

First, determine the text formatting
transformations and propose candidate mappings
Then, learn domain-specific mapping rules

7
Active Atlas Architecture
8
Mapping Objects (Transformation Functions)

General Transformation Functions
Type I
Stemming, Soundex, Abbreviation
Type II
Equality, Initial, Prefix, Suffix, Substring,
Abbreviation, Acronym
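
The slide lists the transformation functions by name only. As a rough illustration, a few of the Type II functions could be sketched in Python as token comparisons. The definitions below are guesses from the names, not the presenters' implementation, and every function name and signature is hypothetical.

```python
# Hypothetical sketches of some Type II transformations as token comparisons.
# The slide gives only the names; these definitions are assumptions.

def _norm(token: str) -> str:
    """Lower-case a token so comparisons ignore case."""
    return token.lower()

def equality(a: str, b: str) -> bool:
    """True if the two tokens are identical."""
    return _norm(a) == _norm(b)

def initial(a: str, b: str) -> bool:
    """True if a is a single letter matching the first letter of b."""
    return len(a) == 1 and _norm(b).startswith(_norm(a))

def prefix(a: str, b: str) -> bool:
    """True if a is a proper prefix of b, e.g. "Rest" and "Restaurant"."""
    return len(a) < len(b) and _norm(b).startswith(_norm(a))

def suffix(a: str, b: str) -> bool:
    """True if a is a proper suffix of b."""
    return len(a) < len(b) and _norm(b).endswith(_norm(a))

def substring(a: str, b: str) -> bool:
    """True if a occurs somewhere inside b."""
    return _norm(a) in _norm(b)

def acronym(token: str, words: list[str]) -> bool:
    """True if token is spelled by the first letters of the given words."""
    return _norm(token) == "".join(_norm(w)[0] for w in words if w)
```

Abbreviation is left out because it would need a lookup table that the slides do not show.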

9
Mapping Objects (Transformation Functions Example)
10
Mapping Objects (Compute Attribute Similarity Scores)
11
Mapping Objects (Compute Total Similarity Scores)

Total object similarity score is computed as a
weighted sum of the attribute similarity scores
Each attribute has a uniqueness weight that is a
heuristic measure of the importance of that
attribute
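
The weighted sum described above can be sketched in a few lines of Python. The attribute names, scores, and weights here are invented for illustration, and the function name is hypothetical; the slides do not say how the uniqueness weights are computed.

```python
def total_similarity(attribute_scores: dict[str, float],
                     uniqueness_weights: dict[str, float]) -> float:
    """Weighted sum of per-attribute similarity scores.

    Each attribute's score is multiplied by that attribute's
    uniqueness weight, and the products are added up.
    """
    return sum(uniqueness_weights[name] * score
               for name, score in attribute_scores.items())

# Illustrative values only (not taken from the presentation).
scores = {"name": 0.9, "street": 0.5, "phone": 1.0}
weights = {"name": 0.5, "street": 0.25, "phone": 0.25}
total = total_similarity(scores, weights)
```

An attribute with a high uniqueness weight, such as the name in this made-up example, moves the total score more than one with a low weight.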

12
Mapping Objects (Output of Candidate Generator)
13
Mapping Objects (Mapping-Rule Learning)

Decision Tree Learning
Passive Learning
Requires a large set of training examples
Active Learning
Uses query by bagging technique
Selects a small set of initial training examples
Includes a variety of training examples
Creates a diverse set of decision tree learners
Actively chooses the examples for user to label
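
A minimal sketch of the query-by-bagging loop outlined above, under several assumptions: each candidate mapping is a tuple of attribute similarity scores, the label is True for a correct mapping, and a depth-1 decision tree (a stump) stands in for the full decision tree learner to keep the example short. All function names are hypothetical, and this is not the Active Atlas implementation.

```python
import random
from collections import Counter

def train_stump(examples):
    """Fit a depth-1 tree: predict 'mapped' when one score is >= a threshold.

    examples: list of (scores_tuple, is_mapped). The split with the
    fewest training errors is chosen.
    """
    best = None
    for f in range(len(examples[0][0])):
        for x, _ in examples:
            t = x[f]
            errors = sum((xi[f] >= t) != yi for xi, yi in examples)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda x: x[f] >= t

def bagged_committee(labeled, n_members, rng):
    """Train each member on a bootstrap sample, so the members differ."""
    return [train_stump([rng.choice(labeled) for _ in labeled])
            for _ in range(n_members)]

def disagreement(committee, x):
    """0.0 when all members agree, larger when the vote is split."""
    votes = Counter(member(x) for member in committee)
    return 1.0 - max(votes.values()) / len(committee)

def active_learn(labeled, unlabeled, oracle, rounds, n_members=10, seed=0):
    """Repeatedly ask the user (oracle) to label the most disputed example."""
    rng = random.Random(seed)
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        committee = bagged_committee(labeled, n_members, rng)
        query = max(unlabeled, key=lambda x: disagreement(committee, x))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return bagged_committee(labeled, n_members, rng), labeled

def classify(committee, x):
    """Majority vote of the committee."""
    return sum(member(x) for member in committee) * 2 > len(committee)
```

The loop starts from a small labeled set and queries only the candidate mappings on which the bagged learners disagree most, which is where a user's label is likely to be most informative.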

14
Mapping Objects (Active Learning)
15
Experimental Results

Three different domains
Restaurants, Companies and Airports
Experiments
Two baseline experiments
Compare the shared attributes separately
Compare the object as a whole
Both require choosing an optimal threshold
Passive learning
Active learning
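
The slides do not say how the optimal threshold for the baselines was chosen. One simple possibility, shown here purely as an assumption, is to try every observed similarity score as a cut-off on labeled data and keep the one that classifies the most mappings correctly. The function name and the sample data are hypothetical.

```python
def best_threshold(scored):
    """Pick the score cut-off that classifies the most examples correctly.

    scored: list of (similarity_score, is_true_mapping) pairs.
    A pair is predicted to be a mapping when its score is >= the cut-off.
    """
    candidates = sorted({score for score, _ in scored})

    def n_correct(threshold):
        return sum((score >= threshold) == label for score, label in scored)

    return max(candidates, key=n_correct)

# Illustrative data only (not from the experiments).
sample = [(0.9, True), (0.8, True), (0.4, False), (0.3, False)]
threshold = best_threshold(sample)
```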

16
Experimental Results (Restaurants)

Source A 331 objects
Source B 533 objects
112 correct mappings
3259 candidate mappings over 10 runs

17
Measurement of Accuracy

Accuracy
The total number of correct classifications over
the total number of mappings plus the number of
correct mappings not proposed
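
Read literally, the definition above divides the correct classifications by the number of proposed mappings plus the correct mappings the candidate generator missed. A small Python sketch of that reading follows; the function and parameter names are hypothetical and the numbers in the usage line are invented.

```python
def accuracy(correct_classifications: int,
             total_mappings: int,
             correct_mappings_not_proposed: int) -> float:
    """Accuracy as defined on the slide.

    The denominator counts every proposed mapping plus every correct
    mapping that was never proposed, so missed mappings lower the score.
    """
    denominator = total_mappings + correct_mappings_not_proposed
    return correct_classifications / denominator

# Made-up numbers: 90 correct out of 100 proposed, with 20 correct mappings missed.
example = accuracy(90, 100, 20)
```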

18
Experimental Results
19
Experimental Results
20
Related Work
21
Conclusion

The research addresses the problem of mapping
objects between structured web sources
The experimental results show that Active Atlas
can achieve high accuracy while limiting
user involvement.

22
Future Work
