Title: PASCAL CHALLENGE ON EVALUATING MACHINE LEARNING FOR INFORMATION EXTRACTION
1. PASCAL Challenge on Evaluating Machine Learning for Information Extraction
Neil Ireson, Local Challenge Coordinator
Web Intelligence Group, Department of Computer Science, University of Sheffield, UK
2. Organisers
- Sheffield: Fabio Ciravegna
- UCD Dublin: Nicholas Kushmerick
- ITC-IRST: Alberto Lavelli
- University of Illinois: Mary-Elaine Califf
- Fair Isaac: Dayne Freitag
3. Outline
- Challenge Goals
- Data
- Tasks
- Participants
- Experimental Results
- Conclusions
4. Goal: Provide a testbed for comparative evaluation of ML-based IE
- Standardisation
  - Data
    - Partitioning
    - Same set of features
      - Corpus preprocessed using GATE
      - No features allowed other than the ones provided
  - Explicit tasks
  - Evaluation metrics
- For future use
  - Available for further tests with the same or new systems
  - Possible to publish new corpora or tasks
5. Data (Workshop CFPs)
- Training data: 400 workshop CFPs (1993-2000)
- Testing data: 200 workshop CFPs (2000-2005)
10. Annotation Slots
11. Preprocessing
- GATE
  - Tokenisation
  - Part-of-speech
  - Named entities
    - Date, Location, Person, Number, Money
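As a rough illustration of what the standardised preprocessing provides to every participant, the sketch below represents each token as a dictionary of the supplied feature types (string, part of speech, named-entity label) and builds a simple contextual feature list from it. The field names and the windowing function are hypothetical, for illustration only; they are not the challenge's actual feature schema.

```python
# Illustrative only: one preprocessed token with the kinds of features the
# challenge corpus provides (tokenisation, POS tags, named-entity labels).
# Field names are hypothetical, not the challenge's actual schema.
token = {"string": "Sheffield", "pos": "NNP", "ne": "Location"}

def to_feature_vector(tokens, i, window=2):
    """Build a simple feature list for token i from a +/- `window` context."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            t = tokens[j]
            feats.append(f"w[{offset}]={t['string'].lower()}")
            feats.append(f"pos[{offset}]={t['pos']}")
            if t.get("ne"):
                feats.append(f"ne[{offset}]={t['ne']}")
    return feats

print(to_feature_vector([token, {"string": "UK", "pos": "NNP", "ne": "Location"}], 0))
```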
12. Evaluation Tasks
- Task 1 - ML for IE: annotating implicit information
  - 4-fold cross-validation on the 400 training documents
  - Final test on 200 unseen test documents
- Task 2a - Learning curve
  - Effect of increasing amounts of training data on learning
- Task 2b - Active learning: learning to select documents
  - Given seed documents, select the documents to add to the training set
- Task 3a - Semi-supervised learning (given data)
  - Same as Task 1, but the 500 given unannotated documents may be used
- Task 3b - Semi-supervised learning (any data)
  - Same as Task 1, but all available unannotated documents may be used
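A minimal sketch of the Task 1 protocol listed above, assuming only the published corpus sizes (400 training documents, 200 test documents); integer IDs stand in for real documents, and no actual extraction system is included.

```python
# Sketch of the Task 1 protocol: 4-fold cross-validation over the 400
# training documents, then a final train-on-400 / test-on-200 run.
# Integer IDs stand in for documents; training and scoring are not shown.
from sklearn.model_selection import KFold

train_ids = list(range(400))        # 400 annotated workshop CFPs
test_ids = list(range(400, 600))    # 200 unseen workshop CFPs

for fold, (tr, va) in enumerate(KFold(n_splits=4, shuffle=True,
                                      random_state=0).split(train_ids)):
    # train a system on `tr`, score its extractions on `va`
    print(f"fold {fold}: {len(tr)} training docs, {len(va)} validation docs")

# Final run: train on all 400 documents, extract from the unseen test set,
# and score the output with the MUC scorer (slide 13).
print(f"final: {len(train_ids)} training docs, {len(test_ids)} test docs")
```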
13. Evaluation
- Precision / Recall / F1-measure
- MUC Scorer
- Automatic Evaluation Server
- Exact matching
- Extract every slot occurrence
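To make the exact-matching criterion above concrete, here is a hedged sketch of how precision, recall, and F1 could be computed over every extracted slot occurrence. It illustrates the metric definitions only, not the MUC scorer or the evaluation server, and the slot names in the example are invented.

```python
# Illustration of exact-match scoring over slot occurrences; not the MUC
# scorer. A filler is a (doc_id, slot, start, end) tuple and counts as
# correct only if it matches a gold annotation exactly.
from collections import Counter

def score(gold, predicted):
    gold_c, pred_c = Counter(gold), Counter(predicted)
    correct = sum(min(gold_c[k], pred_c[k]) for k in pred_c)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [("doc1", "workshopdate", 120, 131), ("doc1", "workshoplocation", 200, 210)]
pred = [("doc1", "workshopdate", 120, 131), ("doc1", "workshoplocation", 198, 210)]
print(score(gold, pred))  # (0.5, 0.5, 0.5): the second filler's span is off
```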
14. Participants
15. Task 1
- Information extraction with all the available data
16. Task 1: Test Corpus
19. Task 1: 4-Fold Cross-validation
20. Task 1: 4-Fold and Test Corpus
21. Task 1: Slot F-measure
22. Best Slot F-measures: Task 1 Test Corpus
23. Task 2a
24. Task 2a Learning Curve: F-measure
25. Task 2a Learning Curve: Precision
26. Task 2a Learning Curve: Recall
27. Task 2b
28. Task 2b: Active Learning
- Amilcare
  - Maximum divergence from the expected number of tags
- Hachey
  - Maximum divergence between two classifiers built on different feature sets
- Yaoyong (Gram-Schmidt)
  - Maximum divergence between examples in the selected subset
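As a rough sketch of the committee-style idea behind the second strategy (two classifiers trained on different feature views, with annotation effort spent where they disagree), the functions below rank unlabelled documents by token-level disagreement. The taggers are passed in as hypothetical callables; this is not the Amilcare, Hachey, or Gram-Schmidt implementation.

```python
# Hedged sketch of committee-based selection: rank unlabelled documents by
# how much two taggers (trained on different feature sets) disagree, and
# send the most-contested documents for annotation. Not the actual systems.

def disagreement(doc_tokens, tag_a, tag_b):
    """Fraction of tokens on which the two taggers assign different labels."""
    labels_a, labels_b = tag_a(doc_tokens), tag_b(doc_tokens)
    diff = sum(1 for a, b in zip(labels_a, labels_b) if a != b)
    return diff / max(len(doc_tokens), 1)

def select_for_annotation(unlabelled_docs, tag_a, tag_b, batch_size=10):
    """Return the batch of documents the two taggers disagree on most."""
    ranked = sorted(unlabelled_docs,
                    key=lambda doc: disagreement(doc, tag_a, tag_b),
                    reverse=True)
    return ranked[:batch_size]
```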
29. Task 2b Active Learning: Increased F-measure over Random Selection
30. Task 3
- Semi-supervised learning
- (No significant participation)
31. Conclusions (Task 1)
- The top three (four) systems use different algorithms
  - Rule induction, SVM, CRF, HMM
- The same algorithm (SVM) produced different results
- Brittle performance
  - Large variation in per-slot performance
- Post-processing
32. Conclusions (Tasks 2 and 3)
- Task 2a: Learning curve
  - System performance is largely as expected
- Task 2b: Active learning
  - Two approaches, Amilcare and Hachey, showed benefits
- Task 3: Semi-supervised learning
  - Insufficient participation to evaluate the use of unannotated enrichment data
33. Future Work
- Performance differences
  - Systems: what determines good/bad performance?
  - Slots: different systems were better/worse at identifying different slots
- Combine approaches
  - Active learning
  - Semi-supervised learning
  - Overcoming the need for annotated data
- Extensions
  - Data: use different data sets and other features, including (HTML) structured data
  - Tasks: relation extraction
34. Thank You
- http://tyne.shef.ac.uk/Pascal