Sources of Success for Information Extraction Methods

1
Sources of Success for Information Extraction
Methods
  • Seminar for Computational Learning and Adaptation
  • Stanford University
  • October 25, 2001
  • Joseph Smarr
  • jsmarr@stanford.edu
  • Based on research conducted at UC San Diego in
    Summer 2001 with Charles Elkan and David Kauchak

2
Overview and Themes: Identifying Sources of
Success
  • Brief overview of Information Extraction (IE)
    paradigm and current methods
  • Getting under the hood of current systems to
    understand the source of their performance and
    limitations
  • Identifying new sources of information to exploit
    for increased performance and usefulness

3
Motivation for Information Extraction
  • Abundance of freely available text in digital
    form (WWW, MEDLINE, etc.)
  • Information contained in un-annotated text is
    largely inaccessible to computers
  • Much of this information appears ripe for the
    plucking without having to do full text
    understanding

4
Highly Structured Example: Amazon.com Book Info
Pages
Desired Info: title, author(s), price,
availability, etc.
5
Partially Structured Example: SCLA Speaker
Announcement Emails
Desired Info: title, speaker, date, abstract, etc.
6
Natural Text Example: MEDLINE Journal Abstracts
BACKGROUND: The most challenging aspect of
revision hip surgery is the management of bone
loss. A reliable and valid measure of bone loss
is important since it will aid in future studies
of hip revisions and in preoperative planning. We
developed a measure of femoral and acetabular
bone loss associated with failed total hip
arthroplasty. The purpose of the present study
was to measure the reliability and the
intraoperative validity of this measure and to
determine how it may be useful in preoperative
planning. METHODS: From July 1997 to December
1998, forty-five consecutive patients with a
failed hip prosthesis in need of revision surgery
were prospectively followed. Three general
orthopaedic surgeons were taught the radiographic
classification system, and two of them classified
standardized preoperative anteroposterior and
lateral hip radiographs with use of the system.
Interobserver testing was carried out in a
blinded fashion. These results were then compared
with the intraoperative findings of the third
surgeon, who was blinded to the preoperative
ratings. Kappa statistics (unweighted and
weighted) were used to assess correlation.
Interobserver reliability was assessed by
examining the agreement between the two
preoperative raters. Prognostic validity was
assessed by examining the agreement between the
assessment by either Rater 1 or Rater 2 and the
intraoperative assessment (reference standard).
RESULTS: With regard to the assessments of both
the femur and the acetabulum, there was
significant agreement (p < 0.0001) between the
preoperative raters (reliability), with weighted
kappa values of >0.75. There was also significant
agreement (p < 0.0001) between each rater's
assessment and the intraoperative assessment
(validity) of both the femur and the acetabulum,
with weighted kappa values of >0.75. CONCLUSIONS:
With use of the newly developed classification
system, preoperative radiographs are reliable and
valid for assessment of the severity of bone loss
that will be found intraoperatively.
Desired Info: subject size, study type, condition
studied, etc.
7
Current Types of IE Systems
  • Hand-built systems
  • Often effective, but slow and expensive to build
    and adapt
  • Stochastic generative models
  • HMMs, N-Grams, PCFGs, etc.
  • Keep separate distributions for content and
    filler states
  • Induced rule-based systems
  • Learn to identify local landmarks for beginning
    and end of target information

8
Formalization of Information Extraction
  • Performance task
  • Extract specific tokens from a set of documents
    that contain the desired information
  • Performance measure
  • Precision = correct returned / total returned
  • Recall = correct returned / total correct
  • F1 = harmonic mean of precision and recall (see
    the sketch after this list)
  • Learning paradigm
  • Supervised learning on set of documents with
    target fields manually labeled
  • Usually train/test on one field at a time
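
For concreteness, here is a minimal sketch of these performance measures in Python (not from the original slides; representing extracted and gold-standard fields as sets is an assumption):

    def precision_recall_f1(extracted, gold):
        """Compute IE performance measures.
        extracted, gold: sets of (document, field-span) pairs."""
        correct = len(extracted & gold)          # correct returned
        precision = correct / len(extracted) if extracted else 0.0
        recall = correct / len(gold) if gold else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1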

9
IE as a Classification Task: Token Extraction as
Boundary Detection
Input: a linear sequence of tokens
Date: Thursday , October 25 Time: 4:15 - 5:30 PM
Method: binary classification of inter-token
boundaries, distinguishing start/end-of-content
boundaries from unimportant boundaries
Output: the tokens between an identified start/end
boundary pair, e.g. "Thursday , October 25" for
the date field
10
Representation of Boundary Classifiers
  • Boundary detectors are pairs of token sequences
    ⟨p, s⟩
  • A detector matches a boundary iff p matches the
    text before the boundary and s matches the text
    after the boundary
  • Detectors can contain wildcards, e.g.
    capitalized word, number, etc.
  • Example (sketched in code below):
  • ⟨Date:, CapitalizedWord⟩ matches the beginning of
  • Date: Thursday, October 25
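
A minimal sketch of this representation in Python (an illustration, not the original implementation; the helper names and the encoding of wildcards as predicates are assumptions):

    # Wildcards are modeled as predicates over a single token;
    # literal tokens must match exactly.
    def is_capitalized(tok): return tok[:1].isupper()
    def is_number(tok): return tok.isdigit()

    def token_matches(pattern, token):
        return pattern(token) if callable(pattern) else pattern == token

    def detector_matches(p, s, tokens, boundary):
        """True iff prefix p matches the tokens just before
        `boundary` and suffix s matches the tokens just after it."""
        before = tokens[max(boundary - len(p), 0):boundary]
        after = tokens[boundary:boundary + len(s)]
        return (len(before) == len(p) and len(after) == len(s)
                and all(token_matches(x, t) for x, t in zip(p, before))
                and all(token_matches(x, t) for x, t in zip(s, after)))

    # ⟨Date:, CapitalizedWord⟩ matches the boundary after "Date:"
    tokens = ["Date:", "Thursday", ",", "October", "25"]
    assert detector_matches(["Date:"], [is_capitalized], tokens, 1)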

11
Boosted Wrapper Induction (BWI): Exemplar of
Current Rule-Based Systems
  • Wrapper Induction is a high-precision, low-recall
    learner that performs well for highly structured
    tasks
  • Boosting is a technique for combining multiple
    weak learners into a strong learner by
    reweighting examples
  • Boosted Wrapper Induction (BWI) was proposed by
    Freitag and Kushmerick in 2000 as the marriage of
    these two techniques

12
BWI Algorithm
  • Given a set of documents with labeled fore and aft
    boundaries, induce ⟨F, A, H⟩
  • F = set of fore detectors
  • A = set of aft detectors
  • H = histogram of field lengths (for pairing fore
    and aft detectors)
  • To learn each boundary detector
  • Start with an empty rule
  • Exhaustively enumerate all extensions up to
    lookahead length L
  • Add best scoring token extension
  • Repeat until no extensions improve score
  • After learning a new detector
  • Re-weight documents according to AdaBoost
    (down-weight correctly covered docs, up-weight
    incorrectly covered docs, normalize all weights)
  • Repeat the process, learning a new rule and
    re-weighting each time (sketched below)
  • Stop after a predetermined number of iterations
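
A minimal sketch of this loop in Python (assumption-laden, not the original code: `learn_detector` and `covers` are hypothetical helpers, examples are assumed to carry an `is_positive` flag, and the update shown is standard discrete AdaBoost rather than the confidence-rated variant BWI builds on):

    import math

    def bwi_train_boundary(examples, num_rounds, learn_detector, covers):
        """Learn a weighted set of detectors for one boundary type
        (fore or aft) by boosting."""
        weights = [1.0 / len(examples)] * len(examples)
        detectors = []
        for _ in range(num_rounds):        # fixed number of iterations
            det = learn_detector(examples, weights)
            err = sum(w for ex, w in zip(examples, weights)
                      if covers(det, ex) != ex.is_positive)
            alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
            # Down-weight correctly covered examples, up-weight
            # incorrectly covered ones, then normalize all weights.
            weights = [w * math.exp(-alpha if covers(det, ex) == ex.is_positive
                                    else alpha)
                       for ex, w in zip(examples, weights)]
            total = sum(weights)
            weights = [w / total for w in weights]
            detectors.append((det, alpha))
        return detectors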

13
Summary of Original BWI Results
  • BWI gives state-of-the-art performance on highly
    structured and partially structured tasks
  • No systematic analysis of why BWI performs well
  • BWI proposed as a solution for natural text IE,
    but no tests conducted

14
Goals of Our Research
  • Understand specifically how boosting contributes
    to BWI's performance
  • Investigate the relationship between performance
    and task regularity
  • Identify new sources of information to improve
    performance, particularly for natural language
    tasks

15
Comparison Algorithm: Sequential Wrapper
Induction (SWI)
  • Same formulation as BWI, but uses set covering
    instead of boosting to learn multiple rules
  • Find highest scoring rule
  • Remove all positive examples covered by new rule
  • Stop when all positive examples have been removed
  • Scoring function (two choices, sketched below):
  • Greedy-SWI: most positive examples covered
    without covering any negative examples
  • Root-SWI: sqrt(W+) - sqrt(W-), where W+ and W-
    are the total weight of positive and negative
    examples covered
  • BWI uses root scoring, but many set covering
    methods use greedy scoring
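
A minimal sketch in Python (hypothetical helper names; the search over candidate rules is abstracted away):

    import math

    # Scores for a rule covering total weight W+ of positive and
    # W- of negative examples:
    def greedy_score(w_pos, w_neg):
        # Greedy-SWI: most positives covered, no negatives allowed
        return w_pos if w_neg == 0 else float("-inf")

    def root_score(w_pos, w_neg):
        # Root-SWI (the same scoring function BWI uses)
        return math.sqrt(w_pos) - math.sqrt(w_neg)

    def swi_train(positives, learn_rule, covered_by, score):
        """Set covering: repeatedly take the best-scoring rule and
        REMOVE (rather than re-weight) the positives it covers,
        stopping once every positive example is covered."""
        rules, remaining = [], set(positives)
        while remaining:
            rule = learn_rule(remaining, score)   # assumed helper
            rules.append(rule)
            remaining -= covered_by(rule, remaining)
        return rules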

16
Component Matrix of Algorithms

Method for scoring     Method for accumulating multiple detectors
individual detectors   Boosting         Set Covering
Root                   BWI              Root-SWI
Greedy                 --               Greedy-SWI
17
Question 1: Does BWI Outperform the Greedy
Approach of SWI?
  • BWI has higher F1 than Greedy-SWI
  • Greedy-SWI tends to have slightly higher
    precision, but BWI has considerably higher recall
  • Does this difference come from the scoring
    function or the accumulation method?

Average of 8 partially structured IE tasks
18
Question 2: How Does Performance Differ by
Choice of Scoring Function?
  • Greedy-SWI and Root-SWI differ only by their
    scoring function
  • Greedy-SWI has higher precision, Root-SWI has
    higher recall, and they have similar F1
  • BWI still outperforms Root-SWI, even though they
    use identical scoring functions
  • Remaining differences
  • boosting vs. set covering
  • total number of rules learned

Average of 8 partially structured IE tasks
19
Question 3: How Does the Number of Rules Learned
Affect Performance?
  • BWI learns a predetermined number of rules, but
    SWI stops when all examples are covered
  • Usually BWI learns many more rules than Root-SWI
  • Stop BWI after it has learned as many rules as
    Root-SWI (Fixed-BWI)
  • Fixed-BWI shows the same precision-recall
    tradeoff as Root-SWI
  • BWI outperforms Fixed-BWI

Average of 8 partially structured IE tasks
20
Analysis of Experimental Results: Why Does BWI
Outperform SWI?
  • Key Insight: the source of BWI's success is the
    interaction of two complementary effects, both
    due to boosting
  • Re-weighting examples causes increasingly
    specific rules to be learned to cover exceptional
    cases (high precision)
  • Re-weighting examples instead of removing them
    means rules can be learned even after all
    examples have been covered (high recall)

21
Performance vs. Task Regularity Reveals
Important Interaction
  • All methods perform better on tasks with more
    structure
  • Relative power of different algorithmic
    components varies with task regularity

22
How Do We Quantify Task Regularity?
  • Goal: Measure the relationship between task
    regularity and performance
  • Proposed solution: the SWI-Ratio
  • SWI-Ratio = (# of iterations Greedy-SWI takes to
    cover all positive examples) / (total number of
    positive examples)
  • Most regular case: 1 rule covers all N examples,
    so SWI-Ratio = 1/N ≈ 0
  • Least regular case: a separate rule for each
    example, so SWI-Ratio = N/N = 1
  • Since each new rule must cover at least one
    example, SWI will learn at most N rules for N
    examples (and usually far fewer) ⇒ SWI-Ratio is
    always between 0 and 1 (smaller = more regular),
    as in the sketch below
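
A minimal sketch with a worked example (the numbers are made up for illustration):

    def swi_ratio(num_rules_learned, num_positive_examples):
        """# of Greedy-SWI rules needed to cover all positives,
        divided by the number of positive examples. Ranges from
        ~0 (one rule covers everything) to 1 (one rule each)."""
        return num_rules_learned / num_positive_examples

    # e.g. if Greedy-SWI needs 12 rules to cover 300 labeled fields:
    print(swi_ratio(12, 300))   # 0.04 -> a fairly regular task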

23
Desirable Properties of SWI-Ratio
  • Relative to the size of the document collection ⇒
    suitable for comparison across collections of
    different sizes
  • General and objective
  • SWI is very simple and doesn't allow any negative
    examples ⇒ an unbiased account of how many
    non-overlapping rules are needed to perfectly
    cover all examples
  • Quick and easy to run
  • No free parameters to set (except lookahead,
    which we kept fixed in all tests)

24
Performance of BWI and Greedy-SWI (F1) vs. Task
Regularity (SWI-Ratio)
Dotted lines separate highly structured,
partially structured, and natural text domains
25
Improving IE Performance on Natural Text
Documents
  • Goal: Compensate for weak IE performance on
    natural language tasks
  • Need to look elsewhere for regularities to
    exploit
  • Idea: Consider grammatical structure
  • Run a shallow parser on each sentence
  • Flatten the output into a sequence of typed
    phrase segments (using XML tags to mark the
    text), as illustrated below
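
As a hypothetical illustration (the slides do not show the exact tag inventory used), a shallow parse of a fragment from the MEDLINE abstract above might be flattened into typed phrase segments like this:

    <NP>Three general orthopaedic surgeons</NP>
    <VP>were taught</VP>
    <NP>the radiographic classification system</NP>

Boundary detectors can then anchor on the tags themselves (e.g. a field that always begins just after an opening <NP>), exposing a regularity that raw token sequences hide.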

26
Typed Phrase Segments Improve BWI's Performance
on Natural Text IE Tasks
(21%, 65%, and 45% increases on the tasks shown)
27
Typed Phrase Segments Increase Regularity of
Natural Text IE Tasks
Average SWI-Ratio decrease of 21%
28
Encouraging Results Suggest Exploiting Other
Sources of Regularity
  • Key Insight: We can improve performance on
    natural text while maintaining the simple IE
    framework if we expose the right regularities
  • Suggests other linguistic abstractions may be
    useful
  • More grammatical info, semantic categories,
    lexical features, etc.

29
Conclusions and Summary
  • Boosting is the key source of BWI's success
  • Learns specific rules, but learns many of them
  • IE performance is sensitive to task regularity
  • SWI-Ratio is a quantitative, objective measure of
    regularity (vs. subjective document classes)
  • Exploiting more regularities in text is key to
    IE's future, particularly in natural text
  • Canonical formatting and keywords are often
    sufficient in structured text documents
  • Exposing grammatical information boosts
    performance for natural text IE tasks

30
Acknowledgements
  • Dayne Freitag, for making BWI code available
  • Mark Craven, for giving us natural text MEDLINE
    documents with annotated phrase segments
  • MedExpert International, Inc. for financial
    support of this research
  • Charles Elkan and David Kauchak, for hosting me
    at UCSD this summer

This work was conducted as part of the California
Institute for Telecommunications and Information
Technology, Cal-(IT)².