Learning Ensembles of First-Order Clauses for Recall-Precision Curves - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Learning Ensembles of First-Order Clauses for Recall-Precision Curves
  • Preliminary Thesis Proposal
  • Mark Goadrich
  • Department of Computer Sciences
  • University of Wisconsin–Madison, USA
  • 17 Dec 2004

2
Talk Outline
  • Background
  • Inductive Logic Programming
  • Evaluation Metrics
  • Biomedical Information Extraction
  • Preliminary Work
  • Three Ensemble Approaches
  • Empirical Results
  • Proposed Work
  • Extensions to Algorithms
  • Theoretical Results

3
Inductive Logic Programming
  • Machine Learning
  • Classify data into positive and negative categories
  • Divide data into train and test sets
  • Generate hypotheses on the train set, then measure
    performance on the test set
  • In ILP, data are Objects
  • person, block, molecule, word, phrase, ...
  • and Relations between them
  • grandfather, has_bond, is_member, ...

4
Learning daughter(A,B)
  • Positive Examples
  • daughter(mary, ann)
  • daughter(eve, tom)
  • Negative Examples
  • daughter(tom, ann)
  • daughter(eve, ann)
  • daughter(ian, tom)
  • daughter(ian, ann)
  • Background Knowledge
  • mother(ann, mary)
  • mother(ann, tom)
  • father(tom, eve)
  • father(tom, ian)
  • female(ann)
  • female(mary)
  • female(eve)
  • male(tom)
  • male(ian)

[Family tree: Ann is the mother of Mary and Tom; Tom is the father of Eve and Ian]
  • Possible Clauses
  • daughter(A,B) :- true.
  • daughter(A,B) :- female(A).
  • daughter(A,B) :- female(A), male(B).
  • daughter(A,B) :- female(A), father(B,A).
  • daughter(A,B) :- female(A), mother(B,A).

Correct Theory: the last two clauses together cover exactly the positives
5
ILP Domains
  • Object Learning
  • Trains, Carcinogenesis
  • Link Learning
  • Binary predicates

6
Link Learning
  • Large skew toward negatives
  • 500 relational objects yield 250,000 possible links
  • 5,000 positive links means 245,000 negative links
  • Enormous quantity of data
  • 4,285,199,774 web pages indexed by Google
  • PubMed includes over 15 million citations
  • Difficult to measure success
  • An always-negative classifier is 98% accurate
  • ROC curves look overly optimistic

7
Evaluation Metrics
  • Classification vs Correctness
  • Positive or Negative
  • True or False
  • Evaluation
  • Recall
  • Precision

[Confusion matrix: classification (positive/negative) crossed with
correctness (true/false) gives the counts TP, FN, FP, TN]
  • Recall = TP / (TP + FN)
  • Precision = TP / (TP + FP)
  • True Positive Rate = TP / (TP + FN)
  • False Positive Rate = FP / (FP + TN)
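The four metrics can be computed directly from the confusion-matrix counts. A minimal sketch (function and variable names are my own):

```python
def metrics(tp, fp, tn, fn):
    """Compute the four evaluation metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)      # fraction of positives found
    precision = tp / (tp + fp)   # fraction of predicted positives that are real
    tpr = tp / (tp + fn)         # identical to recall
    fpr = fp / (fp + tn)         # fraction of negatives mistakenly flagged
    return recall, precision, tpr, fpr

# On heavily skewed link-learning data, precision exposes what FPR hides:
# with tp=500, fp=500, fn=500, tn=8500, FPR is only ~0.06 yet precision is 0.50.
```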

8
Evaluation Metrics
  • Area Under Recall-Precision Curve (AURPC)
  • Cumulative measure over recall-precision space
  • All curves standardized to cover full recall
    range
  • Average AURPC over 5 folds

[Plot: recall-precision space, both axes running from 0 to 1.0]
9
AURPC Interpolation
  • Is linear (convex) interpolation valid in RP space?
  • Precision interpolation is counterintuitive
  • Example: 1,000 positive and 9,000 negative examples

TP     FP     TP Rate  FP Rate  Recall  Prec
500    500    0.50     0.06     0.50    0.50
750    4750   0.75     0.53     0.75    0.14
1000   9000   1.00     1.00     1.00    0.10

[Panels: example counts, RP curves, ROC curves]
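The middle row (TP = 750, FP = 4750) comes from interpolating between the two measured points in *count* space rather than in RP space: TP steps linearly, and FP grows at the locally constant rate between the endpoints. A sketch of that interpolation, using the counts from the example:

```python
def interpolate_rp(tp_a, fp_a, tp_b, fp_b, num_pos, steps):
    """Interpolate between two RP points by stepping TP linearly and
    letting FP grow at the constant local rate (fp_b-fp_a)/(tp_b-tp_a)."""
    slope = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for i in range(steps + 1):
        tp = tp_a + i * (tp_b - tp_a) / steps
        fp = fp_a + slope * (tp - tp_a)
        points.append((tp / num_pos, tp / (tp + fp)))  # (recall, precision)
    return points

# Midpoint between (TP=500, FP=500) and (TP=1000, FP=9000):
pts = interpolate_rp(500, 500, 1000, 9000, num_pos=1000, steps=2)
# pts[1] is recall 0.75 with precision 750/5500 ~ 0.14, matching the table;
# a straight line drawn in RP space would instead suggest precision 0.30.
```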
10
AURPC Interpolation
[Figure: interpolated recall-precision curves]
11
Biomedical Information Extraction
[Image courtesy of the National Human Genome Research Institute]
12
Biomedical Information Extraction
  • Given: medical-journal abstracts tagged
    with protein-localization relations
  • Do: construct a system to extract protein-localization
    phrases from unseen text
  • "NPL3 encodes a nuclear protein with an RNA
    recognition motif and similarities to a family of
    proteins involved in RNA metabolism."

13
Biomedical Information Extraction
  • Hand-labeled dataset (Ray & Craven '01)
  • 7,245 sentences from 871 abstracts
  • Examples are phrase-phrase combinations
  • 1,810 positive, 279,154 negative
  • 1.6 GB of background knowledge
  • Structural, Statistical, Lexical and Ontological
  • In total, 200 distinct background predicates

14
Biomedical Information Extraction
  • "NPL3 encodes a nuclear protein with ..."

[Figure: the sentence annotated with features such as
alphanumeric and marked location]
15
Related Work
  • Bagging in ILP (Dutra et al.)
  • Boosting FOIL (Quinlan)
  • Boosting ILP (Hoche)
  • Structural HMMs (Ray and Craven)
  • WAWA-IE (Eliassi-Rad and Shavlik)
  • Markov Logic Networks (Richardson and Domingos)
  • ELCS (Bunescu et al.)

16
Talk Outline
  • Background
  • Inductive Logic Programming
  • Evaluation Metrics
  • Biomedical Information Extraction
  • Preliminary Work
  • Three Ensemble Approaches
  • Empirical Results
  • Proposed Work
  • Extensions to Algorithms
  • Theoretical Results

17
Aleph - Background
  • Seed Example
  • A positive example that our clause must cover
  • Bottom Clause
  • All predicates that are true of the seed example

18
Aleph - Learning
  • Aleph learns theories of clauses (Srinivasan,
    v4, 2003)
  • Pick positive seed example, find bottom clause
  • Use heuristic search to find best clause
  • Pick a new seed from the uncovered positives and
    repeat until a threshold of positives is covered
  • Theory produces one recall-precision point
  • Learning complete theories is time-consuming
  • Can produce ranking with ensembles
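The covering loop above can be sketched as follows; the helpers `bottom_clause`, `best_clause`, and `covers` are hypothetical stand-ins for Aleph's actual saturation, search, and coverage tests:

```python
import random

def aleph_covering_loop(positives, background, threshold,
                        bottom_clause, best_clause, covers):
    """Sketch of Aleph's outer loop: repeatedly pick a seed, search for a
    clause, and remove covered positives until enough are explained."""
    theory, uncovered = [], set(positives)
    target = len(positives) * (1 - threshold)   # stop once this few remain
    while len(uncovered) > target:
        seed = random.choice(sorted(uncovered))   # pick a positive seed
        bottom = bottom_clause(seed, background)  # most specific clause for it
        clause = best_clause(bottom)              # heuristic search under bottom
        theory.append(clause)
        uncovered -= {p for p in uncovered if covers(clause, p)}
    return theory
```

Each pass through the loop adds one clause and shrinks the uncovered set, which is why a full theory yields only a single recall-precision point: coverage grows monotonically until the stopping threshold.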

19
ILP Ensembles
  • Three Approaches
  • Aleph Ensembles of Multiple Theories
  • Clause Weighting of One Theory
  • Gleaner
  • Evaluation
  • Area Under Recall-Precision Curve (AURPC)
  • Time: number of clauses considered

20
Aleph Ensembles
  • We construct ensembles of theories
  • Algorithm (Dutra et al., ILP 2002)
  • Use K different initial seeds
  • Learn K theories, each containing C clauses
  • Rank examples by the number of theories that cover them
  • Need to balance C for high performance
  • Small C leads to low recall
  • Large C leads to converging theories
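The ranking step can be sketched as a vote count: each example is scored by how many of the K theories classify it positive, and sweeping a vote threshold over that ranking traces out an RP curve. A minimal sketch, with `covers` as an assumed theory-membership test:

```python
def rank_by_theory_votes(examples, theories, covers):
    """Score each example by the number of theories that cover it;
    higher vote counts rank earlier (more confidently positive)."""
    votes = {e: sum(1 for t in theories if covers(t, e)) for e in examples}
    return sorted(examples, key=lambda e: -votes[e])
```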

21
Aleph Ensembles (100 theories)
22
Clause Weighting
  • Single-theory ensemble
  • Rank examples by how many clauses cover them
  • Weight clauses using tuning-set statistics
  • Ordered
  • Rank clauses by precision or by lowest false-positive rate
  • Average
  • Average the statistics of all matching clauses
  • Cumulative
  • Precision combined with diversity on negatives
  • F1 score combined with recall
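One way to read the weighting schemes above: each clause gets a weight from its tuning-set statistics (for instance its precision), and an example's score aggregates the weights of the clauses that match it. A minimal sketch of the "Average" variant with precision weights (all names are my own, not Aleph's API):

```python
def score_examples(examples, clauses, matches, tune_precision):
    """Score each example by the average tuning-set precision of the
    clauses that match it; examples matched by no clause score 0."""
    scores = {}
    for e in examples:
        ws = [tune_precision[c] for c in clauses if matches(c, e)]
        scores[e] = sum(ws) / len(ws) if ws else 0.0
    return scores
```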

23
Clause Weighting
24
Gleaner
  • Develop fast ensemble algorithms focused on
    recall and precision evaluation
  • Definition of Gleaner
  • One who gathers grain left behind by reapers
  • Key Ideas of Gleaner
  • Keep wide range of clauses
  • Create separate theories for different recall
    ranges

25
Gleaner - Background
  • Rapid Random Restart (Zelezny et al., ILP 2002)
  • Stochastic selection of initial clause
  • Time-limited local heuristic search
  • Randomly choose new initial clause and repeat

26
Gleaner - Learning
  • Create B Bins
  • Generate Clauses
  • Record Best per Bin
  • Repeat for K seeds

[Plot: recall-precision space divided into B recall bins]
27
Gleaner - Combining
  • Combine K clauses per bin
  • If at least L of K clauses match, call the example
    positive
  • How to choose L?
  • L = 1: high recall, low precision
  • L = K: low recall, high precision
  • Our method
  • Choose L such that ensemble recall matches bin b
  • Bin b's precision should be higher than that of any
    clause in it
  • We should now have a set of high-precision rule
    sets spanning the space of recall levels
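The choice of L can be sketched as a search over vote thresholds: for each bin, pick the L in 1..K whose "at least L of K clauses match" recall lands closest to the bin's target recall. A sketch under that reading; `pos_counts` is assumed to hold, for each positive example, the number of the bin's K clauses that match it:

```python
def choose_L(pos_counts, K, target_recall):
    """Pick the vote threshold L (1..K) whose ensemble recall, measured
    over the positives' match counts, is closest to the bin's target."""
    def recall_at(L):
        return sum(1 for c in pos_counts if c >= L) / len(pos_counts)
    return min(range(1, K + 1), key=lambda L: abs(recall_at(L) - target_recall))
```

Raising L demands agreement from more clauses, so recall falls and precision rises; the chosen L anchors each bin's rule set at its intended recall level.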

28
How to use Gleaner
  • Generate Curve
  • User Selects Recall Bin
  • Return classifications with precision confidence

[Plot: RP curve; e.g., selecting recall 0.50 yields precision 0.70]
29
Experimental Methodology
  • Performed five-fold cross-validation
  • Variation of parameters
  • Gleaner (20 recall bins)
  • seeds: 25, 50, 75, 100
  • clauses: 1K, 10K, 25K, 50K, 100K, 250K, 500K
  • Aleph Ensembles (0.75 minacc, 35,000 nodes)
  • theories: 10, 25, 50, 75, 100
  • clauses per theory: 1, 5, 10, 15, 20, 25, 50
  • Clause Weighting (1 Aleph theory)
  • clauses: 25, 50, 100, 271

30
Empirical Results
31
Results: Testfold 5 at 1,000,000 clauses
[Chart comparing Gleaner with Aleph ensembles]
32
Results: Testfold 5 at 1,000,000 clauses
33
Conclusions
  • Gleaner
  • Focuses on recall and precision
  • Keeps wide spectrum of clauses
  • Aleph ensembles
  • Early stopping helpful
  • Clause Weighting
  • Cumulative statistics important
  • AURPC
  • Useful metric for comparison
  • Interpolation unintuitive

34
Talk Outline
  • Background
  • Inductive Logic Programming
  • Evaluation Metrics
  • Biomedical Information Extraction
  • Preliminary Work
  • Three Ensemble Approaches
  • Empirical Results
  • Proposed Work
  • Extensions to Algorithms
  • Theoretical Results

35
Proposed Work
  • Improve Gleaner in High Recall areas
  • Need more emphasis on diverse clauses
  • Search for clauses that optimize AURPC
  • Use RankBoost and AURPC heuristic
  • Examine more ILP link-learning datasets
  • Focus within Information Extraction
  • Better understanding of AURPC
  • Relationship with ROC curves, F1-score

36
Gleaner Precision Bins
  • Create B Bins
  • Generate Clauses
  • Record Best per Bin
  • Repeat for K seeds

[Plot: recall-precision space divided into B precision bins]
37
Gleaner Save Per Jump
  • Rapid Random Restart makes Jumps
  • Every 1,000 clauses, find new space to search
  • Saving best per jump will increase diversity

38
Gleaner Negative Seeds
  • High Recall clauses found at top of lattice
  • Perform Breadth-First Search
  • Bias search away from Negative Examples

39
ROC vs. RP Curves
40
ROC vs RP Curves
  • What is the relationship between ROC curves and
    RP curves?
  • Will optimizing one optimize the other?

41
Optimizing AURPC
  • WARNING! SLIDE INCOMPLETE!

42
Acknowledgements
  • USA NLM Grant 5T15LM007359-02
  • USA NLM Grant 1R01LM07050-01
  • USA DARPA Grant F30602-01-2-0571
  • USA Air Force Grant F30602-01-2-0571
  • Condor Group
  • David Page
  • Vitor Santos Costa, Ines Dutra
  • Soumya Ray, Marios Skounakis, Mark Craven