Relational Learning of Pattern-Match Rules for Information Extraction

1
Relational Learning of Pattern-Match Rules for
Information Extraction
  • Presentation by Tim Chartrand of a paper by
    Mary Elaine Califf and Raymond J. Mooney

2
Introduction
  • Information Extraction (IE) is the task of
    locating specific pieces of information in NL
    text
  • IE is an important subpart of text understanding
  • IE systems are difficult and time-consuming to
    build, and they don't port well to different
    domains
  • Researchers are combining learning methods with
    NLP methods to automate IE

3
Overview of RAPIER
  • RAPIER: Robust Automated Production of
    Information Extraction Rules
  • Learn IE rules automatically
  • Use a corpus of documents paired with filled
    templates
  • Resulting rules do not require prior parsing or
    subsequent processing
  • Uses limited syntactic information from a POS
    tagger
  • Induced patterns incorporate semantic classes
  • Rules characterize slot-fillers and their context

4
RAPIER Rules
  • Consist of three parts
      • Pre-filler pattern: matches text immediately
        preceding the extracted information
      • Filler pattern: matches the exact text to be
        extracted
      • Post-filler pattern: matches text immediately
        following the extracted information
  • Each pattern is a sequence of pattern items or
    pattern lists
      • Pattern item: specifies constraints for one word
        or symbol
      • Pattern list: specifies constraints for 0..n words
        or symbols
  • Constraints include
      • List of words, one of which must match the item
      • POS tag
      • Semantic class
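
The three kinds of constraint on a single pattern item can be sketched in Python as follows. This is only an illustration, not RAPIER's actual implementation: the `Token` and `PatternItem` classes and their field names are assumptions made for this example, and an empty constraint set is taken to mean "unconstrained".

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Token:
    """One word or symbol of the text (hypothetical representation)."""
    word: str
    tag: str                    # POS tag supplied by a tagger, e.g. "nn"
    sem: Optional[str] = None   # semantic class, if one is known


@dataclass(frozen=True)
class PatternItem:
    """Constraints for exactly one token; an empty set means 'anything'."""
    words: FrozenSet[str] = frozenset()
    tags: FrozenSet[str] = frozenset()
    sems: FrozenSet[str] = frozenset()

    def matches(self, tok: Token) -> bool:
        # Every constraint that is present must be satisfied.
        if self.words and tok.word.lower() not in self.words:
            return False
        if self.tags and tok.tag not in self.tags:
            return False
        if self.sems and tok.sem not in self.sems:
            return False
        return True


# A word-list constraint: one of the listed words must match.
item = PatternItem(words=frozenset({"firm", "company"}))
print(item.matches(Token("firm", "nn")))   # True
print(item.matches(Token("need", "nn")))   # False
```

A pattern list would reuse the same constraint check but apply it to a run of 0..n tokens instead of exactly one.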

5
RAPIER Rules (cont.)
Pre-Filler             Filler                            Post-Filler
1) word: leading       1) list: len 2, tags: [nn, nns]   1) word: [firm, company]
Example: "Leading telecommunications firm in need"

Pre-Filler             Filler                            Post-Filler
1) tag: [nn, nnp]      1) word: undisclosed, tag: jj     1) sem: price
2) list: length 2
Examples: "sold to the bank for an undisclosed amount"; "paid Honeywell an undisclosed price"
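
To make the second rule concrete, here is a rough Python sketch of how such a rule might be applied to tagged text. It is not the authors' matcher. The `Elem` class, the `ends` and `extract` functions, the POS tags attached to each word, and the assignment of "amount" and "price" to a `price` semantic class are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple

Token = Tuple[str, str, Optional[str]]   # (word, POS tag, semantic class)


@dataclass(frozen=True)
class Elem:
    """A pattern item (max_len is None) or a pattern list of 0..max_len tokens."""
    words: FrozenSet[str] = frozenset()
    tags: FrozenSet[str] = frozenset()
    sems: FrozenSet[str] = frozenset()
    max_len: Optional[int] = None

    def ok(self, tok: Token) -> bool:
        word, tag, sem = tok
        return ((not self.words or word.lower() in self.words)
                and (not self.tags or tag in self.tags)
                and (not self.sems or sem in self.sems))


def ends(pattern: Sequence[Elem], toks: Sequence[Token], start: int) -> Set[int]:
    """All positions at which `pattern` can end when matched from `start`."""
    if not pattern:
        return {start}
    head, rest = pattern[0], pattern[1:]
    result: Set[int] = set()
    if head.max_len is None:                 # pattern item: exactly one token
        if start < len(toks) and head.ok(toks[start]):
            result |= ends(rest, toks, start + 1)
        return result
    pos = start                              # pattern list: 0..max_len tokens
    result |= ends(rest, toks, pos)
    while pos < len(toks) and pos - start < head.max_len and head.ok(toks[pos]):
        pos += 1
        result |= ends(rest, toks, pos)
    return result


def extract(pre, filler, post, toks) -> List[str]:
    """Return each distinct filler whose surrounding context also matches."""
    found: List[str] = []
    for s in range(len(toks) + 1):
        for f_start in ends(pre, toks, s):
            for f_end in ends(filler, toks, f_start):
                if f_end > f_start and ends(post, toks, f_end):
                    text = " ".join(w for w, _, _ in toks[f_start:f_end])
                    if text not in found:
                        found.append(text)
    return found


# The second rule from the slide.
PRE = [Elem(tags=frozenset({"nn", "nnp"})), Elem(max_len=2)]
FILLER = [Elem(words=frozenset({"undisclosed"}), tags=frozenset({"jj"}))]
POST = [Elem(sems=frozenset({"price"}))]

# Tags and semantic classes below are assumed, not tagger output.
s1 = [("sold", "vbd", None), ("to", "to", None), ("the", "dt", None),
      ("bank", "nn", None), ("for", "in", None), ("an", "dt", None),
      ("undisclosed", "jj", None), ("amount", "nn", "price")]
s2 = [("paid", "vbd", None), ("Honeywell", "nnp", None), ("an", "dt", None),
      ("undisclosed", "jj", None), ("price", "nn", "price")]

print(extract(PRE, FILLER, POST, s1))   # ['undisclosed']
print(extract(PRE, FILLER, POST, s2))   # ['undisclosed']
```

The pattern list of length 2 is what lets one rule cover both phrases: it absorbs "for an" in the first and only "an" in the second.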

6
Learning Algorithm
Example phrases: "located in Atlanta, Georgia." and "offices in Kansas City, Missouri."

SRULES
  Pre-Filler:   1) word: located, tag: vbn  2) word: in, tag: in
  Filler:       1) word: atlanta, tag: nnp
  Post-Filler:  1) word: ",", tag: ","  2) word: georgia, tag: nnp  3) word: ".", tag: "."
SRULES
  Pre-Filler:   1) word: offices, tag: nns  2) word: in, tag: in
  Filler:       1) word: kansas, tag: nnp  2) word: city, tag: nnp
  Post-Filler:  1) word: ",", tag: ","  2) word: missouri, tag: nnp  3) word: ".", tag: "."
RLIST
  Filler:       1) list: len 2, word: [atlanta, kansas, city], tag: nnp
RLIST
  Filler:       1) list: len 2, tag: nnp
RLIST
  Pre-Filler:   1) word: in, tag: in
  Filler:       1) list: len 2, tag: nnp
  Post-Filler:  1) word: ",", tag: ","  2) tag: nnp, semantic: state
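
The first two generalized fillers in the example above can be reproduced with a small Python sketch. It is a deliberate simplification, not RAPIER's generalization procedure: it only handles the case shown on the slide, where two fillers of different lengths are merged into a single pattern list, and it assumes that differing constraints are generalized either to their disjunction or to no constraint. The function names and the `(max_len, words, tags)` tuple layout are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Tuple

Constraint = Optional[FrozenSet[str]]            # None means "unconstrained"
Item = Tuple[FrozenSet[str], FrozenSet[str]]     # (words, tags) of one pattern item


def generalize_constraint(a: Constraint, b: Constraint) -> List[Constraint]:
    """Candidate generalizations of one constraint taken from two rules."""
    if a is None or b is None:
        return [None]
    if a == b:
        return [a]
    return [a | b, None]      # the disjunction, or drop the constraint entirely


def merged(filler: List[Item], idx: int) -> FrozenSet[str]:
    """Union of one constraint type (0 = words, 1 = tags) over all items."""
    out: FrozenSet[str] = frozenset()
    for item in filler:
        out |= item[idx]
    return out


def generalize_fillers(f1: List[Item], f2: List[Item]):
    """Merge two fillers into candidate pattern lists (max_len, words, tags)."""
    max_len = max(len(f1), len(f2))
    words = generalize_constraint(merged(f1, 0), merged(f2, 0))
    tags = generalize_constraint(merged(f1, 1), merged(f2, 1))
    return [(max_len, w, t) for w, t in product(words, tags)]


nnp = frozenset({"nnp"})
f1 = [(frozenset({"atlanta"}), nnp)]
f2 = [(frozenset({"kansas"}), nnp), (frozenset({"city"}), nnp)]

for max_len, words, tags in generalize_fillers(f1, f2):
    print(max_len, sorted(words) if words else None, sorted(tags) if tags else None)
# 2 ['atlanta', 'city', 'kansas'] ['nnp']
# 2 None ['nnp']
```

Because both fillers share the tag nnp, only the word constraint yields two alternatives, which gives the two candidate fillers listed before the pre-filler and post-filler context is added.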
For each slot, S, in the template being learned
  SlotRules = most specific rules from documents for S
  while compression has failed fewer than lim times
    randomly select r pairs of rules from SlotRules
    find the set L of generalizations of the fillers of the rule pairs
    create rules from L, evaluate, and initialize RuleList
    let n = 0
    while best rule in RuleList produces spurious fillers and
          weighted information value of best rule is improving
      increment n
      specialize each rule in RuleList with generalizations of the
        last n items of the pre-filler patterns of the rule pair
        and add specializations to RuleList
      specialize each rule in RuleList with generalizations of the
        first n items of the post-filler patterns of the rule pair
        and add specializations to RuleList
    if best rule in RuleList produces only valid fillers
      add it to SlotRules
      remove empirically subsumed rules
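
The final step, removing empirically subsumed rules, is what makes the rule set shrink. A minimal Python sketch follows, under the assumption that a rule is "empirically subsumed" when every training filler it covers is also covered by the newly added rule. The `add_rule` function, the rule names, and the example fillers (including the third document) are made up for illustration.

```python
from typing import Dict, FrozenSet, Hashable

# Rule name -> set of training fillers the rule extracts (hypothetical layout).
Coverage = Dict[str, FrozenSet[Hashable]]


def add_rule(slot_rules: Coverage, name: str, covered: FrozenSet[Hashable]) -> Coverage:
    """Add a new rule and drop every old rule whose coverage it contains."""
    kept = {r: c for r, c in slot_rules.items() if not c <= covered}
    kept[name] = covered
    return kept


rules: Coverage = {
    "specific_1": frozenset({("doc1", "atlanta")}),
    "specific_2": frozenset({("doc2", "kansas city")}),
    "specific_3": frozenset({("doc3", "austin")}),
}

# A generalized rule that covers the fillers of the first two specific rules.
general = frozenset({("doc1", "atlanta"), ("doc2", "kansas city")})
rules = add_rule(rules, "general_1", general)

print(sorted(rules))   # ['general_1', 'specific_3']
```

Here three rules become two, which is the kind of compression the outer loop is counting on; a round that fails to produce such a rule counts toward the `lim` failures.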

7
Experimental Results
  • The task: extract information from
    computer-related job postings
  • 17 slots used, including employer, salary, etc.
  • Results do not employ semantic categories
  • 100-document dataset with filled templates,
    evaluated with 10-fold cross-validation
  • Measured precision, recall, and F-measure
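
For reference, the three measures can be computed as in the Python sketch below. It assumes the balanced F-measure (the harmonic mean of precision and recall); the function name and the counts in the usage example are invented and are not results from the paper.

```python
from typing import Tuple


def precision_recall_f(extracted: int, correct: int, total: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one evaluation run.

    extracted: number of fillers the rules produced
    correct:   number of produced fillers that match the filled templates
    total:     number of fillers present in the filled templates
    """
    precision = correct / extracted if extracted else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up counts: 50 fillers extracted, 40 of them correct, 80 in the templates.
p, r, f = precision_recall_f(extracted=50, correct=40, total=80)
print(round(p, 3), round(r, 3), round(f, 3))   # 0.8 0.5 0.615
```

With 10-fold cross-validation these values would be computed on each held-out fold and then averaged.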

8
Experimental Results continued
  • Performance
      • Is comparable to Crystal on a medical domain
      • Is better than AutoSlog and AutoSlog-TS on the
        MUC-4 terrorism task
      • Is hard to compare because of the different
        domains tested
      • Is good because precision is most important

9
Related Work
  • Resolve
      • Uses decision trees
      • Uses annotated coreference examples
  • Crystal
      • Uses a clustering algorithm to build a dictionary
        of extraction patterns
      • Requires patterns identified by an expert
      • Requires prior syntax analysis to identify
        syntactic elements and their relationships
  • AutoSlog
      • Specializes a set of general syntactic patterns
      • An expert must examine the patterns it produces
      • Requires prior syntax analysis
  • Liep
      • Requires prior syntax analysis
      • Makes no real use of semantic information
      • Has not been applied to complex domains

10
Related Work: BYU DEG
  • RAPIER rules correspond closely to DEG data
    frames.
  • Data frames are finer-grained, based on character
    patterns, whereas rules are based on word
    patterns
  • Pre-filler and Post-filler patterns correspond
    closely to data frame contexts and key words
  • Semantic categories correspond closely with
    lexicons
  • The paper does not mention how RAPIER handles
    multiple-record documents
  • RAPIER's data structure is given by the template
    (slots) defined in the input data
  • RAPIER is very similar in purpose to what Joe is
    trying to do: learn extraction rules based on a
    filled-in form

11
Conclusions
  • Extracting desired pieces of information from NL
    text is important
  • Manually constructing IE systems is too hard
  • RAPIER uses relational learning to build a set of
    pattern-match rules given a database of texts and
    filled templates
  • Learned patterns employ syntactic and semantic
    information to match slot fillers and context
  • Fairly accurate results can be obtained for a
    real-world problem with relatively small datasets
  • RAPIER compares favorably with other IE learning
    systems