1. Sources of Success for Information Extraction Methods
- Seminar for Computational Learning and Adaptation
- Stanford University
- October 25, 2001
- Joseph Smarr
- jsmarr@stanford.edu
- Based on research conducted at UC San Diego in Summer 2001 with Charles Elkan and David Kauchak
2. Overview and Themes: Identifying Sources of Success
- Brief overview of the Information Extraction (IE) paradigm and current methods
- Getting under the hood of current systems to understand the source of their performance and limitations
- Identifying new sources of information to exploit for increased performance and usefulness
3. Motivation for Information Extraction
- Abundance of freely available text in digital form (WWW, MEDLINE, etc.)
- Information contained in un-annotated text is largely inaccessible to computers
- Much of this information appears ripe for the plucking without having to do full text understanding
4. Highly Structured Example: Amazon.com Book Info Pages
Desired info: title, author(s), price, availability, etc.
5. Partially Structured Example: SCLA Speaker Announcement Emails
Desired info: title, speaker, date, abstract, etc.
6. Natural Text Example: MEDLINE Journal Abstracts
BACKGROUND: The most challenging aspect of
revision hip surgery is the management of bone
loss. A reliable and valid measure of bone loss
is important since it will aid in future studies
of hip revisions and in preoperative planning. We
developed a measure of femoral and acetabular
bone loss associated with failed total hip
arthroplasty. The purpose of the present study
was to measure the reliability and the
intraoperative validity of this measure and to
determine how it may be useful in preoperative
planning. METHODS: From July 1997 to December
1998, forty-five consecutive patients with a
failed hip prosthesis in need of revision surgery
were prospectively followed. Three general
orthopaedic surgeons were taught the radiographic
classification system, and two of them classified
standardized preoperative anteroposterior and
lateral hip radiographs with use of the system.
Interobserver testing was carried out in a
blinded fashion. These results were then compared
with the intraoperative findings of the third
surgeon, who was blinded to the preoperative
ratings. Kappa statistics (unweighted and
weighted) were used to assess correlation.
Interobserver reliability was assessed by
examining the agreement between the two
preoperative raters. Prognostic validity was
assessed by examining the agreement between the
assessment by either Rater 1 or Rater 2 and the
intraoperative assessment (reference standard).
RESULTS: With regard to the assessments of both
the femur and the acetabulum, there was
significant agreement (p < 0.0001) between the
preoperative raters (reliability), with weighted
kappa values of >0.75. There was also significant
agreement (p < 0.0001) between each rater's
assessment and the intraoperative assessment
(validity) of both the femur and the acetabulum,
with weighted kappa values of >0.75. CONCLUSIONS:
With use of the newly developed classification
system, preoperative radiographs are reliable and
valid for assessment of the severity of bone loss
that will be found intraoperatively.
Desired info: subject size, study type, condition studied, etc.
7. Current Types of IE Systems
- Hand-built systems
  - Often effective, but slow and expensive to build and adapt
- Stochastic generative models
  - HMMs, N-Grams, PCFGs, etc.
  - Keep separate distributions for content and filler states
- Induced rule-based systems
  - Learn to identify local landmarks for the beginning and end of target information
8. Formalization of Information Extraction
- Performance task
  - Extract specific tokens from a set of documents that contain the desired information
- Performance measure (see the sketch below)
  - Precision = correct returned / total returned
  - Recall = correct returned / total correct
  - F1 = harmonic mean of precision and recall
- Learning paradigm
  - Supervised learning on a set of documents with target fields manually labeled
  - Usually train/test on one field at a time
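
A minimal Python sketch of these metrics, treating extractions as sets of spans (the span representation here is my own illustration, not from the talk):

# IE evaluation metrics as defined on the slide above.
def precision_recall_f1(returned, correct):
    # returned/correct: sets of extracted spans, e.g. (doc_id, start, end)
    true_pos = len(returned & correct)
    precision = true_pos / len(returned) if returned else 0.0
    recall = true_pos / len(correct) if correct else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: 3 spans returned, 2 of them correct, 4 true fields in the corpus.
returned = {("doc1", 5, 7), ("doc1", 12, 14), ("doc2", 0, 2)}
correct = {("doc1", 5, 7), ("doc2", 0, 2), ("doc2", 9, 11), ("doc3", 3, 4)}
print(precision_recall_f1(returned, correct))  # approximately (0.67, 0.50, 0.57)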
9. IE as a Classification Task: Token Extraction as Boundary Detection

Input: a linear sequence of tokens

  Date Thursday , October 25 Time 4 15 - 5 30 PM

Method: binary classification of inter-token boundaries, labeling each as a start of content, an end of content, or an unimportant boundary

  Date | Thursday , October 25 | Time 4 15 - 5 30 PM
      start                   end

Output: the tokens between an identified start/end boundary pair (sketched below)

  Thursday , October 25
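
A small sketch of the output step. The index convention is an assumption of mine: boundary i sits just before tokens[i].

def extract_spans(tokens, start_boundaries, end_boundaries):
    # For each start boundary, take the tokens up to the nearest later end boundary.
    spans = []
    for s in sorted(start_boundaries):
        ends = [e for e in end_boundaries if e > s]
        if ends:
            spans.append(tokens[s:min(ends)])
    return spans

tokens = ["Date", "Thursday", ",", "October", "25", "Time", "4", "15", "-", "5", "30", "PM"]
print(extract_spans(tokens, {1}, {5}))  # [['Thursday', ',', 'October', '25']]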
10. Representation of Boundary Classifiers
- Boundary detectors are pairs of token sequences <p, s>
- A detector matches a boundary iff p matches the text before the boundary and s matches the text after the boundary
- Detectors can contain wildcards, e.g. capitalized word, number, etc.
- Example: <Date, CapitalizedWord> matches the beginning of "Date Thursday, October 25" (see the matcher sketch below)
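
A minimal matcher for such detectors. The wildcard names and this code are illustrative, not the BWI implementation:

WILDCARDS = {
    "<Cap>": lambda t: t[:1].isupper(),  # capitalized word
    "<Num>": str.isdigit,                # number
}

def token_matches(pattern, token):
    return WILDCARDS[pattern](token) if pattern in WILDCARDS else pattern == token

def detector_matches(prefix, suffix, tokens, boundary):
    # Boundary i sits just before tokens[i]; prefix must match the tokens
    # before the boundary and suffix the tokens after it.
    if boundary < len(prefix) or boundary + len(suffix) > len(tokens):
        return False
    before = tokens[boundary - len(prefix):boundary]
    after = tokens[boundary:boundary + len(suffix)]
    return (all(token_matches(p, t) for p, t in zip(prefix, before))
            and all(token_matches(s, t) for s, t in zip(suffix, after)))

tokens = ["Date", "Thursday", ",", "October", "25"]
print(detector_matches(["Date"], ["<Cap>"], tokens, 1))  # True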
11. Boosted Wrapper Induction (BWI): Exemplar of Current Rule-Based Systems
- Wrapper induction is a high-precision, low-recall learner that performs well on highly structured tasks
- Boosting is a technique for combining multiple weak learners into a strong learner by re-weighting examples
- Boosted Wrapper Induction (BWI) was proposed by Freitag and Kushmerick in 2000 as the marriage of these two techniques
12. BWI Algorithm
- Given a set of documents with labeled fore and aft boundaries, induce <F, A, H>
  - F: set of fore detectors
  - A: set of aft detectors
  - H: histogram of field lengths (for pairing fore and aft detectors)
- To learn each boundary detector:
  - Start with an empty rule
  - Exhaustively enumerate all extensions up to lookahead length L
  - Add the best-scoring token extension
  - Repeat until no extension improves the score
- After learning a new detector:
  - Re-weight documents according to AdaBoost (down-weight correctly covered docs, up-weight incorrectly covered docs, normalize all weights); see the sketch below
- Repeat the process, learning a new rule and re-weighting each time
- Stop after a predetermined number of iterations
13. Summary of Original BWI Results
- BWI gives state-of-the-art performance on highly structured and partially structured tasks
- No systematic analysis of why BWI performs well
- BWI proposed as a solution for natural text IE, but no tests conducted
14. Goals of Our Research
- Understand specifically how boosting contributes to BWI's performance
- Investigate the relationship between performance and task regularity
- Identify new sources of information to improve performance, particularly for natural language tasks
15. Comparison Algorithm: Sequential Wrapper Induction (SWI)
- Same formulation as BWI, but uses set covering instead of boosting to learn multiple rules:
  - Find the highest-scoring rule
  - Remove all positive examples covered by the new rule
  - Stop when all positive examples have been removed
- Scoring function, two choices (both sketched below):
  - Greedy-SWI: most positive examples covered without covering any negative examples
  - Root-SWI: sqrt(W+) - sqrt(W-), where W+ and W- are the total weights of positive and negative examples covered
- BWI uses root scoring, but many set covering methods use greedy scoring
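
A sketch of the two scoring functions (the variable names are mine):

import math

def greedy_score(pos_covered, neg_covered):
    # Greedy-SWI: count positives covered, but only rules covering no negatives qualify.
    return len(pos_covered) if not neg_covered else float("-inf")

def root_score(pos_covered, neg_covered, weights):
    # Root-SWI / BWI: sqrt(W+) - sqrt(W-) over the total covered example weight.
    w_pos = sum(weights[i] for i in pos_covered)
    w_neg = sum(weights[i] for i in neg_covered)
    return math.sqrt(w_pos) - math.sqrt(w_neg)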
16. Component Matrix of Algorithms

  Scoring \ Accumulation    Boosting    Set Covering
  Root                      BWI         Root-SWI
  Greedy                    --          Greedy-SWI

(Rows: method for scoring individual detectors; columns: method for accumulating multiple detectors.)
17. Question 1: Does BWI Outperform the Greedy Approach of SWI?
- BWI has higher F1 than Greedy-SWI
- Greedy-SWI tends to have slightly higher precision, but BWI has considerably higher recall
- Does this difference come from the scoring function or the accumulation method?
(Average of 8 partially structured IE tasks)
18. Question 2: How Does Performance Differ by Choice of Scoring Function?
- Greedy-SWI and Root-SWI differ only in their scoring function
- Greedy-SWI has higher precision, Root-SWI has higher recall; their F1 is similar
- BWI still outperforms Root-SWI, even though they use identical scoring functions
- Remaining differences: boosting vs. set covering, and the total number of rules learned
(Average of 8 partially structured IE tasks)
19. Question 3: How Does the Number of Rules Learned Affect Performance?
- BWI learns a predetermined number of rules, but SWI stops when all examples are covered
- Usually BWI learns many more rules than Root-SWI
- Fixed-BWI: stop BWI after it has learned as many rules as Root-SWI
- This reproduces the precision-recall tradeoff of Root-SWI
- BWI outperforms Fixed-BWI
(Average of 8 partially structured IE tasks)
20. Analysis of Experimental Results: Why Does BWI Outperform SWI?
- Key insight: the source of BWI's success is the interaction of two complementary effects, both due to boosting:
  - Re-weighting examples causes increasingly specific rules to be learned to cover exceptional cases (high precision)
  - Re-weighting examples instead of removing them means rules can be learned even after all examples have been covered (high recall)
21. Performance vs. Task Regularity Reveals an Important Interaction
- All methods perform better on tasks with more structure
- The relative power of different algorithmic components varies with task regularity
22. How Do We Quantify Task Regularity?
- Goal: measure the relationship between task regularity and performance
- Proposed solution: the SWI-Ratio
  - SWI-Ratio = (number of iterations Greedy-SWI takes to cover all positive examples) / (total number of positive examples)
  - Most regular case: 1 rule covers all N examples, giving 1/N ≈ 0
  - Least regular case: a separate rule for each example, giving N/N = 1
- Since each new rule must cover at least one example, SWI learns at most N rules for N examples (and usually far fewer), so the SWI-Ratio is always between 0 and 1 (smaller = more regular); see the sketch below
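
A minimal sketch of the measure (the function names are mine; learn_greedy_rule stands in for one Greedy-SWI iteration):

def swi_ratio(positive_examples, learn_greedy_rule):
    # Run Greedy-SWI's set covering to exhaustion; ratio = rules learned / N.
    remaining = set(positive_examples)
    num_rules = 0
    while remaining:
        rule = learn_greedy_rule(remaining)           # best rule on what is left
        covered = {x for x in remaining if rule(x)}
        assert covered, "each rule must cover at least one example"
        remaining -= covered
        num_rules += 1
    return num_rules / len(positive_examples)         # in (0, 1]; smaller = more regular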
23. Desirable Properties of SWI-Ratio
- Relative to the size of the document collection, so it is suitable for comparison across collections of different sizes
- General and objective: SWI is very simple and doesn't allow any negative examples, so it gives an unbiased account of how many non-overlapping rules are needed to perfectly cover all examples
- Quick and easy to run: no free parameters to set (except lookahead, which we kept fixed in all tests)
24. Performance of BWI and Greedy-SWI (F1) vs. Task Regularity (SWI-Ratio)
(Figure: F1 plotted against SWI-Ratio; dotted lines separate the highly structured, partially structured, and natural text domains)
25. Improving IE Performance on Natural Text Documents
- Goal: compensate for weak IE performance on natural language tasks
- Need to look elsewhere for regularities to exploit
- Idea: consider grammatical structure
  - Run a shallow parser on each sentence
  - Flatten the output into a sequence of typed phrase segments, using XML tags to mark the text (see the sketch below)
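
An illustrative sketch of the flattening step; the chunker output format assumed here (phrase-type/token pairs) is mine, not from the talk:

def flatten_chunks(chunks):
    # chunks: list of (phrase_type, tokens) pairs from a shallow parser.
    segments = []
    for phrase_type, tokens in chunks:
        segments.append(f"<{phrase_type}> {' '.join(tokens)} </{phrase_type}>")
    return " ".join(segments)

chunks = [("NP", ["forty-five", "consecutive", "patients"]),
          ("PP", ["with"]),
          ("NP", ["a", "failed", "hip", "prosthesis"])]
print(flatten_chunks(chunks))
# <NP> forty-five consecutive patients </NP> <PP> with </PP> <NP> a failed hip prosthesis </NP>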
26. Typed Phrase Segments Improve BWI's Performance on Natural Text IE Tasks
(Chart: F1 increases of 21%, 65%, and 45% on the natural text tasks)
27. Typed Phrase Segments Increase Regularity of Natural Text IE Tasks
(Chart: SWI-Ratio decreases by 21% on average)
28. Encouraging Results Suggest Exploiting Other Sources of Regularity
- Key insight: we can improve performance on natural text while maintaining the simple IE framework if we expose the right regularities
- Suggests other linguistic abstractions may be useful: more grammatical info, semantic categories, lexical features, etc.
29. Conclusions and Summary
- Boosting is the key source of BWI's success: it learns specific rules, but learns many of them
- IE performance is sensitive to task regularity: the SWI-Ratio is a quantitative, objective measure of regularity (vs. subjective document classes)
- Exploiting more regularities in text is key to IE's future, particularly in natural text
  - Canonical formatting and keywords are often sufficient in structured text documents
  - Exposing grammatical information boosts performance on natural text IE tasks
30. Acknowledgements
- Dayne Freitag, for making the BWI code available
- Mark Craven, for giving us natural text MEDLINE documents with annotated phrase segments
- MedExpert International, Inc., for financial support of this research
- Charles Elkan and David Kauchak, for hosting me at UCSD this summer

This work was conducted as part of the California Institute for Telecommunications and Information Technology, Cal-(IT)2.