Information Extraction: A Practical Survey (presentation transcript)
1
Information Extraction: A Practical Survey
Mihai Surdeanu
  • TALP Research Center
  • Dep. Llenguatges i Sistemes Informàtics
  • Universitat Politècnica de Catalunya
  • surdeanu@lsi.upc.es

2
Overview
  • What is information extraction?
  • A traditional system and its problems
  • Pattern learning and classification
  • Beyond patterns

3
What is information extraction?
  • "The extraction or pulling out of pertinent
    information from large volumes of texts."
    (http://www.itl.nist.gov/iad/894.02/related_projects/muc/index.html)
  • Information extraction (IE) systems extract
    concepts, events, and relations that are relevant
    for a given scenario domain.
  • But what is a concept, an event, or a scenario
    domain? Actual implementations of IE systems
    have varied throughout the history of the task:
    MUC, Event99, EELD.
  • The tendency is to simplify the definition (or
    rather the implementation) of the task.

4
Information Extraction at the Message
Understanding Conferences
  • Seven MUC conferences, between 1987 and 1998.
  • Scenario domains driven by template
    specifications (fairly similar to database
    schemas), which define the content to be
    extracted.
  • Each event fills exactly one template (fairly
    similar to a database record).
  • Each template slot contains either text, or
    pointers to other templates.
  • The goal was to use IE technology to populate
    relational databases. This never really happened:
  • The chosen representation was too complicated.
  • It did not address real-world problems, but
    artificial benchmarks.
  • Systems never achieved good-enough accuracy.

5
MUC-6 Management Succession Example
Barry Diller was appointed chief executive
officer of QVC Network Inc.
  • <SUCCESSION_EVENT-9301190125-1>
  •   SUCCESSION_ORG: <ORGANIZATION-9301190125-1>
  •   POST: chief executive officer
  •   IN_AND_OUT:
  •     <IN_AND_OUT-9301190125-1>
  •     <IN_AND_OUT-9301190125-2>
  •   VACANCY_REASON: REASSIGNMENT
  • <IN_AND_OUT-9301190125-1>
  •   IO_PERSON: <PERSON-9301190125-1>
  •   NEW_STATUS: IN
  •   ON_THE_JOB: UNCLEAR
  •   OTHER_ORG: <ORGANIZATION-9301190125-2>
  •   REL_OTHER_ORG: OUTSIDE_ORG
  •   COMMENT: Barry Diller IN
  • <ORGANIZATION-9301190125-1>
  •   ORG_NAME: QVC Network Inc.
  •   ORG_TYPE: COMPANY

MUC-6 template: some slots are filled with text (e.g. POST), while
others point to another template (e.g. SUCCESSION_ORG).
6
Information Extraction at DARPA's HUB-4 Event99
  • Planned as a successor of MUC.
  • Identification and extraction of relevant
    information dictated by templettes, which are
    flat, simplified templates. Slots are filled
    only with text; no pointers to other templettes
    are accepted.
  • Domains closer to real-world applications are
    addressed: natural disasters, bombings, deaths,
    elections, financial fluctuations, illness
    outbreaks.
  • The goal was to provide event-level indexing into
    documents such as news wires, radio and
    television transcripts, etc. Imagine
    querying "BOMBING AND Gaza" in news messages
    and retrieving only the relevant text about
    bombing events in the Gaza area, classified into
    templettes.
  • Event99: A Proposed Event Indexing Task For
    Broadcast News. Lynette Hirschman et al.
    (http://citeseer.nj.nec.com/424439.html)

7
Event99 Death ExampleTemplettes Versus
Templates
The sole survivor of the car crash that killed
Princess Diana and Dodi Fayed last year in France
is remembering more about the accident.

<DEATH-CNN3-1>
  DECEASED: Princess Diana / Dodi Fayed
  MANNER_OF_DEATH: the car crash that killed Princess Diana and
    Dodi Fayed / the accident
  LOCATION: in France
  DATE: last year
8
Information Extraction at DARPA's Evidence
Extraction and Link Detection (EELD) Program
  • IE is used as a tool for the more general problem of
    link discovery: sift through large data
    collections and derive complex rules from
    collections of simpler IE patterns.
  • Example: certain sets of account_number(Person, Account),
    deposit(Account, Amount),
    greater_than(Amount, reporting_amount) patterns
    imply is_a(Person, money_launderer). Note that the
    fact that Person is a money_launderer is not
    stated in any form in the text! (A minimal sketch of
    such a rule follows this list.)
  • IE is used to identify concepts (typically named
    entities), events (typically identified by
    trigger words), and basic entity-entity and
    entity-event relations.
  • Simpler IE problem:
  • No templates or templettes are generated.
  • Does not deal with event merging.
  • Events are always marked by trigger words, e.g.
    "murder" triggers a MURDER event.
  • Relations are always intra-sentential.
  • EELD web portal: http://www.rl.af.mil/tech/programs/eeld/
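To make this concrete, here is a minimal Python sketch (not the EELD formalism) of how a hand-written link-discovery rule can be evaluated over relations produced by an IE front end; the facts and the reporting threshold are invented for illustration.

from itertools import product

# Facts produced by an IE front end: relation name -> set of argument tuples.
facts = {
    "account_number": {("John Smith", "ACC-17")},
    "deposit": {("ACC-17", 25000)},
}
REPORTING_AMOUNT = 10000  # hypothetical reporting threshold

def money_launderer_candidates(facts):
    """Derive is_a(Person, money_launderer) from simpler IE relations."""
    people = set()
    for (person, account), (acc, amount) in product(
            facts["account_number"], facts["deposit"]):
        # Join on the shared Account variable, then test the amount constraint.
        if account == acc and amount > REPORTING_AMOUNT:
            people.add(person)
    return people

print(money_launderer_candidates(facts))  # {'John Smith'}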

9
EELD Example
John Smith is the chief scientist of Hardcom
Corporation.

Entities: Person(John Smith), Organization(Hardcom Corporation)
Events: --
Relations: person-affiliation(Person(John Smith),
           Organization(Hardcom Corporation))

The murder of John Smith

Entities: Person(John Smith)
Events: Murder(murder)
Relations: murder-victim(Person(John Smith), Murder(murder))
10
Overview
  • What is information extraction?
  • A traditional system and its problems
  • Pattern learning and classification
  • Beyond patterns

11
Traditional IE Architecture
  • The Finite State Automaton Text Understanding
    System (FASTUS) approach: cascaded finite state
    automata (FSA).
  • Each FSA level recognizes larger linguistic
    constructs (from tokens to chunks to clauses to
    domain patterns), which become the simplified
    input for the next FSA in the cascade.
  • Why? Speed. Robustness to unstructured input.
    Handles data sparsity well.
  • The FSA cascade is enriched with limited
    discourse processing components: coreference
    resolution and event merging.
  • Most systems in MUC ended up using this
    architecture: CIRCUS from UMass (actually the
    first to introduce the cascaded FSA
    architecture), PROTEUS (NYU), PLUM (BBN), CICERO
    (LCC), and many others.
  • An ocean of information is available:
  • FASTUS: A Cascaded Finite-State Transducer for
    Extracting Information from Natural-Language
    Text. Jerry R. Hobbs et al.
    http://www.ai.sri.com/natural-language/projects/fastus-schabes.html
  • Infrastructure for Open-Domain Information
    Extraction. Mihai Surdeanu and Sanda Harabagiu.
    http://www.languagecomputer.com/papers/hlt2002.pdf
  • Rich IE bibliography maintained by Horacio
    Rodriguez at
    http://www.lsi.upc.es/horacio/varios/sevilla2001.zip

12
Language Computer's CICERO Information
Extraction System

Documents flow through the following cascade:
  • known word recognition: recognizes known concepts using lexicons
    and gazetteers.
  • numerical-entity recognition: identifies numerical entities such as
    money, percents, dates and times (FSA).
  • named-entity recognition: identifies named entities such as person,
    location, and organization names (FSA).
  • name aliasing: disambiguates incomplete or ambiguous names.
    (The entity recognition and aliasing modules also act as a
    stand-alone named-entity recognizer.)
  • phrasal parser: identifies basic noun, verb, and particle
    phrases (TBL + FSA).
  • phrase combiner: identifies domain-dependent complex noun and verb
    phrases (FSA).
  • entity coreference resolution: detects pronominal and nominal
    coreference links.
  • domain pattern recognition: identifies domain-dependent patterns (FSA).
  • event coreference: resolves empty templette slots.
  • event merging: merges templettes belonging to the same event.
The output is a set of templettes/templates.
13
Walk-Through Example (1/5)
At least seven police officers were killed and as
many as 52 other people, including several
children, were injured Monday in a car bombing
that also wrecked a police station. Kirkuk's
police said they had "good information" that
Ansar al-Islam was behind the blast.
  • <BOMBING>
  • BOMB: a car bombing
  • PERPETRATOR: Ansar al-Islam
  • DEAD: At least seven police officers
  • INJURED: as many as 52 other people, including
    several children
  • DAMAGE: a police station
  • LOCATION: Kirkuk
  • DATE: Monday

14
Walk-Through Example (2/5)
15
Walk-Through Example (3/5)
Entity coreference resolution
they → the police; the blast → a car bombing
16
Walk-Through Example (4/5)
At least seven police officers were
killed/PATTERN and as many as 52 other people,
including several children, were injured Monday
in a car bombing/PATTERN car bombing that also
wrecked a police station/PATTERN. Kirkuk's police
said they had "good information" that Ansar
al-Islam was behind the blast/PATTERN.
17
Walk-Through Example (5/5)
18
Coreference for IE
  • Algorithm detailed in Recognizing Referential
    Links: An Information Extraction Perspective.
    Megumi Kameyama.
    http://citeseer.nj.nec.com/kameyama97recognizing.html
  • 3-step algorithm (a minimal sketch follows this list):
  • Identify all anaphoric entities, e.g. pronouns,
    nouns, ambiguous named entities.
  • For each anaphoric entity, identify all possible
    candidates and sort them according to some
    salience ordering, e.g. left-to-right traversal
    in the same sentence, right-to-left traversal in
    previous sentences.
  • Extract the first candidate that matches some
    semantic constraints, e.g. number and gender
    consistency. Merge the candidate with the
    anaphoric entity.
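A minimal Python sketch of the candidate-selection step, assuming candidates arrive already sorted by the salience ordering and that each mention carries number and gender attributes; the data structures are invented for illustration and are not Kameyama's implementation.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Mention:
    text: str
    number: str  # "sg" or "pl"
    gender: str  # "m", "f", or "n"

def resolve(anaphor: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Return the first candidate, in salience order, that satisfies
    simple semantic constraints (number and gender consistency)."""
    for cand in candidates:  # candidates are pre-sorted by salience
        if cand.number == anaphor.number and cand.gender == anaphor.gender:
            return cand
    return None

# "they" -> "the police": plural, neutral gender in this toy encoding.
they = Mention("they", "pl", "n")
cands = [Mention("the police", "pl", "n"), Mention("a car bombing", "sg", "n")]
print(resolve(they, cands).text)  # the police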

19
The Role of Coreference in Named Entity
Recognition
  • Classifies unknown named entities that are
    likely part of a name but cannot be identified
    as such due to insufficient local context.
  • Example: "Michigan National Corp."/ORG said it
    will eliminate some senior management jobs ...
    "Michigan National"/? said the restructuring ...
  • Disambiguates named entities of ambiguous length
    and/or ambiguous type.
  • "Michigan" is changed from LOC to ORG when
    "Michigan Corp." appears in the same context.
  • The text "McDonald's" may contain a person name
    ("McDonald") or an organization name ("McDonald's").
    A non-deterministic FSA is used to maintain both
    alternatives until after name aliasing, when one
    is selected.
  • Disambiguates headline named entities.
  • Headlines are typically capitalized, e.g.
    "McDermott Completes Sale".
  • Processing of headlines is postponed until after the
    body of text is processed.
  • A longest-match approach is used to match the
    headline sequence of tokens against entities
    found in the first body paragraph. For example,
    "McDermott" is labeled as ORG because it matches
    against "McDermott International Inc." in the first
    document paragraph. (A sketch of this longest-match
    step follows the list.)
  • Over 5% increase in accuracy (F-measure), from
    87.81 to 93.64.
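A minimal Python sketch of the longest-match idea, assuming the body entities have already been recognized and typed; the function and data layout are invented for illustration.

def label_headline(headline_tokens, body_entities):
    """body_entities: dict mapping entity token tuples to NE types,
    e.g. {("McDermott", "International", "Inc."): "ORG"}."""
    labels = [None] * len(headline_tokens)
    for i, token in enumerate(headline_tokens):
        best_type = None
        best_len = 0
        for entity_tokens, ne_type in body_entities.items():
            # Count how many consecutive headline tokens match a prefix
            # of this entity; keep the longest such match.
            k = 0
            while (i + k < len(headline_tokens) and k < len(entity_tokens)
                   and headline_tokens[i + k] == entity_tokens[k]):
                k += 1
            if k > best_len:
                best_len, best_type = k, ne_type
        if best_type:
            labels[i] = best_type
    return list(zip(headline_tokens, labels))

body = {("McDermott", "International", "Inc."): "ORG"}
print(label_headline(["McDermott", "Completes", "Sale"], body))
# [('McDermott', 'ORG'), ('Completes', None), ('Sale', None)]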

20
The Role of Coreference in IE
21
The Good
  • Relatively good performance with a simple system:
  • F-measures over 75%, up to 88% for some simpler
    Event99 domains
  • Execution times below 10 seconds per 5KB document
  • Improvements to the FSA-only approach:
  • Coreference almost doubles the FSA-only
    performance
  • More extraction rules add little to the IE
    performance, whereas different forms of
    coreference add more
  • Non-determinism is used to mitigate the limited
    power of FSA grammars

22
The Bad
  • Needs domain-specific lexicons, e.g. an ontology
    of bombing devices. Work to automate this
    process: Learning Dictionaries for Information
    Extraction by Multi-Level Bootstrapping. Ellen
    Riloff and Rosie Jones.
    http://www.cs.utah.edu/riloff/psfiles/aaai99.pdf
    (not covered in this presentation)
  • Domain-specific patterns must be developed, e.g.
    <SUBJECT> explode.
  • Patterns must be classified: what does the above
    pattern mean? Is the subject a bomb, a
    perpetrator, or a location?
  • Patterns cannot cover the flexibility of natural
    language. We need better models that go beyond
    the pattern limitations.
  • Event merging is another NP-complete problem. One
    of the few stochastic models for event merging:
    Probabilistic Coreference in Information
    Extraction. Andrew Kehler.
    http://ling.ucsd.edu/kehler/Papers/emnlp97.ps.gz
    (not covered in this presentation)
  • All of the above components are manually developed,
    which yields a high domain development time (more
    than 40 person-hours per domain). This prohibits
    the use of this approach for real-time
    information extraction.

23
Overview
  • What is information extraction?
  • A traditional system and its problems
  • Pattern learning and classification
  • Beyond patterns

24
Automatically Generating Extraction Patterns from
Untagged Text
  • The first system to successfully discover domain
    patterns: AutoSlog-TS.
  • Automatically Generating Extraction Patterns from
    Untagged Text. Ellen Riloff.
    http://www.cs.utah.edu/riloff/psfiles/aaai96.pdf
  • The intuition is that domain-specific patterns
    will appear more often in documents related to
    the domain of interest than in unrelated
    documents.

25
Weakly-Supervised Pattern Learning Algorithm
(1/2)
  1. Separate the training document set into relevant
    and irrelevant documents (manual process).
  2. Generate all possible patterns in all documents,
    according to some meta-patterns. Examples below.

Meta-Pattern            Example Pattern
<subj> active-verb      <perpetrator> bombed
active-verb <dobj>      bombed <target>
infinitive <dobj>       to kill <victim>
gerund <dobj>           killing <victim>
<np> prep <np>          <bomb> against <target>
26
Weakly-Supervised Pattern Learning Algorithm
(2/2)
  3. Rank all generated patterns according to the
    formula relevance_rate x log2(frequency), where
    the relevance_rate indicates the ratio of
    relevant instances (i.e. in relevant documents
    versus non-relevant documents) of the
    corresponding pattern, and frequency indicates
    the number of times the pattern was seen in
    relevant documents.
  4. Add the top-ranked pattern to the list of learned
    patterns, and mark all documents where the
    pattern appears as relevant.
  5. Repeat the process from Step 3 for N
    iterations. Hence the output of the algorithm is
    N learned patterns. (A minimal sketch of this
    loop follows.)
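A minimal Python sketch of the ranking-and-selection loop, assuming patterns have already been generated and counted per document; the counts are invented, and relevance_rate is taken here as a pattern's frequency in relevant documents divided by its total frequency.

import math

# pattern -> (count in relevant docs, count in all docs); toy counts.
counts = {
    "<subj> exploded":   (20, 22),
    "<subj> was killed": (15, 18),
    "attack on <np>":    (10, 30),
    "<subj> said":       (50, 400),
}

def score(rel, total):
    """relevance_rate x log2(frequency), as on the slide."""
    if rel == 0:
        return 0.0
    relevance_rate = rel / total
    return relevance_rate * math.log2(rel)

def learn_patterns(counts, n_iterations):
    learned = []
    remaining = dict(counts)
    for _ in range(n_iterations):
        best = max(remaining, key=lambda p: score(*remaining[p]))
        learned.append(best)
        del remaining[best]
        # In the full algorithm, documents containing `best` would now be
        # marked relevant and the counts recomputed before the next round.
    return learned

print(learn_patterns(counts, 2))  # ['<subj> exploded', '<subj> was killed']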

27
Examples of Learned Patterns
Patterns learned for the MUC-4 terrorism domain:
<subj> exploded
murder of <np>
assassination of <np>
<subj> was killed
<subj> was kidnapped
attack on <np>
<subj> was injured
exploded in <np>
death of <np>
<subj> took_place
28
The Good and the Bad
  • The good
  • Performance very close to the manually-customized
    system
  • The bad
  • Documents must be separated into
    relevant/irrelevant by hand
  • When does the learning process stop?
  • Pattern classification and event merging still
    developed by human experts

29
The ExDisco IE System
  • Automatic Acquisition of Domain Knowledge for
    Information Extraction. Roman Yangarber et al.
    http://www.cs.nyu.edu/roman/Papers/2000-coling-pub.ps.gz
  • Quasi-automatically separates documents into
    relevant/non-relevant using a set of seed
    patterns selected by the user, e.g. <company>
    appoint-verb <person> for the MUC-6 management
    succession domain.
  • In addition to ranking patterns, ExDisco ranks
    documents based on how many relevant patterns
    they contain → immediate application to text
    filtering.

30
Counter-Training for Pattern Discovery
  • Counter-Training in Discovery of Semantic
    Patterns. Roman Yangarber.
    http://www.cs.nyu.edu/roman/Papers/2003-acl-countertrain-web.pdf
  • Previous approaches are iterative learning
    algorithms, where the output is a continuous
    stream of patterns with degrading precision. What
    is the best stopping point?
  • The approach is to introduce competition among
    multiple scenario learners (e.g. management
    succession, mergers and acquisitions, legal
    actions). Stop when the learners wander into
    territories already discovered by others.
  • Pattern frequency is weighted by document
    relevance.
  • Document relevance receives a negative weight based
    on how many patterns from a different scenario it
    contains.
  • The learning for each scenario stops when the
    best pattern has a negative score. (A minimal
    sketch of this stopping criterion follows.)
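A minimal, heavily simplified Python sketch of the stopping criterion (not Yangarber's implementation): document relevance is rewarded by a scenario's own accepted patterns and penalized by competitors' patterns, and a scenario stops learning when its best candidate no longer scores positively.

def doc_relevance(doc_patterns, own_patterns, other_patterns):
    """+1 per own accepted pattern in the document, -1 per competitor's."""
    return (len(doc_patterns & own_patterns)
            - len(doc_patterns & other_patterns))

def best_candidate(docs, own, others):
    """Score each unaccepted pattern by summing the relevance of the
    documents it occurs in; return the best (pattern, score) pair."""
    scores = {}
    for doc_patterns in docs:
        rel = doc_relevance(doc_patterns, own, others)
        for p in doc_patterns - own - others:
            scores[p] = scores.get(p, 0) + rel
    if not scores:
        return None, float("-inf")
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy corpus: each document is the set of candidate patterns it contains.
docs = [
    {"<company> appointed <person>", "<person> resigned"},
    {"<company> appointed <person>", "<company> acquired <company>"},
    {"<company> acquired <company>", "<company> paid <money>"},
]
succession = {"<person> resigned"}                 # accepted so far
acquisitions = {"<company> acquired <company>"}    # competing scenario

pattern, score = best_candidate(docs, succession, acquisitions)
if score < 0:
    print("stop learning for this scenario")
else:
    print("accept:", pattern, "score:", score)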

31
Pattern Classification
  • Multiple systems perform successful pattern
    acquisition by now, e.g. "attacked <np>" is
    discovered for the bombing domain. But what does
    the <np> actually mean? Is it the victim, the
    physical target, or something else?
  • An Empirical Approach to Conceptual Case Frame
    Acquisition. Ellen Riloff and Mark Schmelzenbach.
    http://www.cs.utah.edu/riloff/psfiles/wvlc98.pdf

32
Pattern Classification Algorithm
  • Requires 5 seed words per semantic category (e.g.
    PERPETRATOR, VICTIM, etc.).
  • Builds a context for each semantic category by
    expanding the seed word set with words that
    appear frequently in the proximity of previous
    seed words.
  • Uses AutoSlog to discover domain patterns.
  • Builds a semantic profile for each discovered
    pattern based on the overlap between the noun
    phrases contained in the pattern and the previous
    semantic contexts.
  • Each pattern is associated with the best-ranked
    semantic category. (A minimal sketch of this
    profiling step follows.)
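A minimal Python sketch of the semantic-profile step, with invented contexts and filler counts; this is an illustration, not the Riloff and Schmelzenbach implementation.

from collections import Counter

# Each category context: seed words expanded with frequently co-occurring words.
contexts = {
    "TERRORIST": {"guerrillas", "rebels", "terrorists", "extremists"},
    "BUILDING":  {"embassy", "office", "headquarters", "building", "home"},
}

# Head nouns of the NPs filling the pattern "attack on <np>" in a toy corpus.
pattern_fillers = Counter({"embassy": 4, "rebels": 1, "office": 2, "convoy": 1})

def semantic_profile(fillers, contexts):
    """Fraction of the pattern's fillers that fall in each category context."""
    total = sum(fillers.values())
    profile = {}
    for category, context in contexts.items():
        overlap = sum(n for word, n in fillers.items() if word in context)
        profile[category] = overlap / total
    return profile

profile = semantic_profile(pattern_fillers, contexts)
print(profile)                        # {'TERRORIST': 0.125, 'BUILDING': 0.75}
print(max(profile, key=profile.get))  # BUILDING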

33
Pattern Classification Example
Semantic Category    Probability
BUILDING             0.10
CIVILIAN             0.03
DATE                 0.05
GOVOFFICIAL          0.03
LOCATION             0.03
MILITARYPEOPLE       0.09
TERRORIST            0.00
VEHICLE              0.03
WEAPON               0.00

Semantic profile for the pattern "attack on <np>"
34
Other Pattern-Learning Systems RAPIER (1/2)
  • Relational Learning of Pattern-Match Rules for
    Information Extraction. Mary Elaine Califf and
    Raymond J. Mooney.
    http://citeseer.nj.nec.com/califf98relational.html
  • Uses Inductive Logic Programming (ILP) to
    implement a bottom-up generalization of patterns.
  • Patterns are specified with pre-fillers (conditions
    on the tokens preceding the pattern), fillers
    (conditions on the tokens included in the
    pattern), and post-fillers (conditions on the
    tokens following the pattern).
  • The only linguistic resource used is a
    part-of-speech (POS) tagger. No parser (full or
    partial) is used!
  • More robust to unstructured text.
  • Applicability limited to simpler domains (e.g.
    job postings).

35
Other Pattern-Learning Systems RAPIER (2/2)
located in Atlanta, Georgia

  Pre-filler:  1) word: located, tag: VBN   2) word: in, tag: IN
  Filler:      1) word: Atlanta, tag: NNP
  Post-filler: 1) word: ",", tag: ","       2) word: Georgia, tag: NNP

offices in Kansas City, Missouri

  Pre-filler:  1) word: offices, tag: NNS   2) word: in, tag: IN
  Filler:      1) word: Kansas, tag: NNP    2) word: City, tag: NNP
  Post-filler: 1) word: ",", tag: ","       2) word: Missouri, tag: NNP

Generalized rule:

  Pre-filler:  1) word: in, tag: IN
  Filler:      1) list of up to 2 words, tag: NNP
  Post-filler: 1) word: ",", tag: ","       2) semantic: STATE, tag: NNP
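A minimal Python sketch of how such a rule can be represented and matched against POS-tagged tokens; this is an illustration only (RAPIER's rule language and ILP-based generalization are richer, and the semantic-class constraint is omitted here).

def item_matches(token, constraint):
    word, tag = token
    return (constraint.get("word") in (None, word)
            and constraint.get("tag") in (None, tag))

def match(tokens, rule, max_filler_len=2):
    """Return the text of the first filler that satisfies the rule, or None."""
    pre, post = rule["pre"], rule["post"]
    for start in range(len(tokens)):
        if start < len(pre):
            continue  # pre-filler must fit immediately before the filler
        if not all(item_matches(tokens[start - len(pre) + i], c)
                   for i, c in enumerate(pre)):
            continue
        for flen in range(1, max_filler_len + 1):
            filler = tokens[start:start + flen]
            after = tokens[start + flen:start + flen + len(post)]
            if (len(filler) == flen and len(after) == len(post)
                    and all(item_matches(t, rule["filler"]) for t in filler)
                    and all(item_matches(t, c) for t, c in zip(after, post))):
                return " ".join(w for w, _ in filler)
    return None

rule = {
    "pre": [{"word": "in", "tag": "IN"}],
    "filler": {"tag": "NNP"},               # list of up to 2 NNP tokens
    "post": [{"word": ",", "tag": ","}, {"tag": "NNP"}],
}
tokens = [("offices", "NNS"), ("in", "IN"), ("Kansas", "NNP"),
          ("City", "NNP"), (",", ","), ("Missouri", "NNP")]
print(match(tokens, rule))  # Kansas City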
36
Other Pattern-Learning Systems
  • SRV
  • Toward General-Purpose Learning for Information
    Extraction. Dayne Freitag.
    http://citeseer.nj.nec.com/freitag98toward.html
  • Supervised machine learning based on FOIL.
    Constructs Horn clauses from examples.
  • Active learning
  • Active Learning for Information Extraction with
    Multiple View Feature Sets. Rosie Jones et al.
    http://www.cs.utah.edu/riloff/psfiles/ecml-wkshp03.pdf
  • Active learning with multiple views. Ion Muslea.
    http://www.ai.sri.com/muslea/PS/dissertation-02.pdf
  • Interactively learn and annotate data to reduce
    human effort in data annotation.

37
Overview
  • What is information extraction?
  • A traditional system and its problems
  • Pattern learning and classification
  • Beyond patterns

38
The Need to Move Beyond the Pattern-Based
Paradigm (1/2)
The space shuttle Challenger/AGENT_OF_DEATH flew
apart over Florida like a billion-dollar
confetti, killing/MANNER_OF_DEATH six
astronauts/DECEASED.

Identifying these roles is hard using surface-level information,
but easier using full parse trees.
(Figure: the AGENT_OF_DEATH, MANNER_OF_DEATH, and DECEASED roles
attached to parse-tree constituents.)
39
The Need to Move Beyond the Pattern-Based
Paradigm (2/2)
  • Pattern-based systems:
  • Have limited power due to the strict formalism →
    accuracy < 60% without additional discourse
    processing.
  • Were also developed due to the historical
    conjuncture: there was no high-performance full
    parser widely available.
  • Recent NLP developments:
  • Full syntactic parsing reaches about 90% accuracy
    [Collins, 1997; Charniak, 2000].
  • Predicate-argument frames provide an open-domain
    event representation [Surdeanu et al., 2003;
    Gildea and Jurafsky, 2002; Gildea and Palmer,
    2002].

40
Goal
  • Novel IE paradigm
  • Syntactic representation provided by full parser.
  • Event representation based on predicate-argument
    frames.
  • Entity coreference provides pronominal and
    nominal anaphora resolution (future work).
  • Event merging merges similar/overlapping events
    (future work).
  • Advantages
  • High accuracy due to enhanced syntactic and
    semantic processing.
  • Minimal domain customization time because most
    components are open-domain.

41
Proposition Bank Overview
Example: "The futures halt was assailed by Big Board floor traders."

  ARG1 (entity assailed): The futures halt
  PRED:                   assailed
  ARG0 (agent):           Big Board floor traders
  • A one million word corpus annotated with
    predicate-argument structures [Kingsbury, 2002].
    Currently only predicates lexicalized by verbs.
  • Numbered arguments from 0 to 5. Typically ARG0 =
    agent, ARG1 = direct object or theme, ARG2 =
    indirect object, benefactive, or instrument, but
    they are predicate dependent!
  • Functional tags: ARGM-LOC (locative), ARGM-TMP
    (temporal), ARGM-DIR (direction).

42
Block Architecture
(Figure: block architecture, including a module for the identification
of predicate-argument structures.)
43
Walk-Through Example
The space shuttle Challenger flew apart over
Florida like a billion-dollar confetti killing
six astronauts.
44
The Model
  • Consists of two tasks: (1) identifying parse tree
    constituents corresponding to predicate
    arguments, and (2) assigning a role to each
    argument constituent.
  • Both tasks are modeled using C5.0 decision tree
    learning and two sets of features: Feature Set 1,
    adapted from [Gildea and Jurafsky, 2002], and
    Feature Set 2, a novel set of semantic and
    syntactic features. (A minimal two-stage sketch
    follows.)
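A minimal two-stage Python sketch, using scikit-learn's DecisionTreeClassifier as a stand-in for C5.0 (an assumption for illustration) and toy encodings of a few Feature Set 1 features.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Toy training data: one dict of FS1-style features per constituent.
X = [
    {"pt": "NP", "pos_before": True,  "voice": "passive", "gov": "S"},
    {"pt": "NP", "pos_before": False, "voice": "passive", "gov": "VP"},
    {"pt": "PP", "pos_before": False, "voice": "passive", "gov": "VP"},
    {"pt": "NP", "pos_before": True,  "voice": "active",  "gov": "S"},
]
is_arg = [1, 1, 0, 1]                      # task 1 labels
roles = ["ARG1", "ARG0", None, "ARG0"]     # task 2 labels (arguments only)

vec = DictVectorizer()
Xv = vec.fit_transform(X)

arg_clf = DecisionTreeClassifier().fit(Xv, is_arg)            # task 1
role_rows = [i for i, y in enumerate(is_arg) if y == 1]
role_clf = DecisionTreeClassifier().fit(Xv[role_rows],        # task 2
                                        [roles[i] for i in role_rows])

test = vec.transform([{"pt": "NP", "pos_before": True,
                       "voice": "passive", "gov": "S"}])
if arg_clf.predict(test)[0] == 1:
    print("role:", role_clf.predict(test)[0])  # role: ARG1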

45
Feature Set 1
  • POSITION (pos): indicates if the constituent appears
    before the predicate in the sentence. E.g. true for ARG1
    and false for ARG2.
  • VOICE (voice): predicate voice (active or
    passive). E.g. passive for PRED.
  • HEAD WORD (hw): head word of the evaluated
    phrase. E.g. "halt" for ARG1.
  • GOVERNING CATEGORY (gov): indicates if an NP is
    dominated by an S phrase or a VP phrase. E.g. S
    for ARG1, VP for ARG0.
  • PREDICATE WORD: the verb with morphological
    information preserved (verb), and the verb
    normalized to lower case and infinitive form
    (lemma). E.g. for PRED, verb is "assailed", lemma
    is "assail".
  • PHRASE TYPE (pt): type of the syntactic phrase as
    argument. E.g. NP for ARG1.
  • PARSE TREE PATH (path): path between the argument
    and the predicate. E.g. NP → S → VP → VP for ARG1.
  • PATH LENGTH (pathLen): number of labels stored in
    the predicate-argument path. E.g. 4 for ARG1.
    (The sketch below collects these FS1 values for
    ARG1 in the running example.)
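Collecting the example values above, the FS1 features for ARG1 ("The futures halt") of the running example can be written as a simple dictionary; this is an illustrative encoding, not the system's internal format.

# FS1 features for ARG1 ("The futures halt") of
# "The futures halt was assailed by Big Board floor traders."
fs1_arg1 = {
    "pos": True,            # constituent appears before the predicate
    "voice": "passive",     # predicate voice
    "hw": "halt",           # head word of the phrase
    "gov": "S",             # NP dominated by an S phrase
    "verb": "assailed",     # predicate word as it appears
    "lemma": "assail",      # normalized predicate
    "pt": "NP",             # phrase type
    "path": "NP>S>VP>VP",   # parse tree path to the predicate
    "pathLen": 4,           # number of labels in the path
}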

46
Observations about Feature Set 1
  • Because most of the argument constituents are
    prepositional attachments (PP) and relative
    clauses (SBAR), often the head word (hw) is not
    the most informative word in the phrase.
  • Due to its strong lexicalization, the model
    suffers from data sparsity. E.g. hw is used in
    < 3% of cases. The problem can be addressed with
    a back-off model from words to part-of-speech
    tags (a minimal back-off sketch follows this list).
  • The features in set 1 capture only syntactic
    information, even though semantic information
    like named-entity tags should help. For example,
    ARGM-TMP typically contains DATE entities, and
    ARGM-LOC includes LOCATION named entities.
  • Feature set 1 does not capture predicates
    lexicalized by phrasal verbs, e.g. "put up".
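A minimal Python sketch of the suggested back-off from head words to POS tags; the counts and the threshold are invented for illustration.

# Use the head word when it was seen often enough in training,
# otherwise fall back to its part-of-speech tag.
hw_counts = {"halt": 12, "traders": 7}   # toy training counts
MIN_COUNT = 5                            # hypothetical back-off threshold

def hw_feature(head_word, head_pos):
    if hw_counts.get(head_word, 0) >= MIN_COUNT:
        return ("hw", head_word)         # lexicalized feature
    return ("hwPos", head_pos)           # backed-off POS feature

print(hw_feature("halt", "NN"))          # ('hw', 'halt')
print(hw_feature("confetti", "NN"))      # ('hwPos', 'NN')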

47
Feature Set 2 (1/2)
  • CONTENT WORD (cw): lexicalized feature that
    selects an informative word from the constituent,
    other than the head. Selection heuristics are
    available in the paper. E.g. "June" for the
    phrase "in last June".
  • PART OF SPEECH OF CONTENT WORD (cPos): part of
    speech tag of the content word. E.g. NNP for the
    phrase "in last June".
  • PART OF SPEECH OF HEAD WORD (hPos): part of
    speech tag of the head word. E.g. NN for the
    phrase "the futures halt".
  • NAMED ENTITY CLASS OF CONTENT WORD (cNE): the
    class of the named entity that includes the
    content word. 7 named entity classes (from the
    MUC-7 specification) are covered. E.g. DATE for
    "in last June".

48
Feature Set 2 (2/2)
  • BOOLEAN NAMED ENTITY FLAGS: set of features that
    indicate if a named entity is included at any
    position in the phrase:
  • neOrganization: set to true if an organization
    name is recognized in the phrase.
  • neLocation: set to true if a location name is
    recognized in the phrase.
  • nePerson: set to true if a person name is
    recognized in the phrase.
  • neMoney: set to true if a currency expression is
    recognized in the phrase.
  • nePercent: set to true if a percentage expression
    is recognized in the phrase.
  • neTime: set to true if a time-of-day expression
    is recognized in the phrase.
  • neDate: set to true if a date temporal expression
    is recognized in the phrase.
  • PHRASAL VERB COLLOCATIONS: set of two features
    that capture information about phrasal verbs
    (a counting sketch follows this list):
  • pvcSum: the frequency with which the verb is
    immediately followed by any preposition or
    particle.
  • pvcMax: the frequency with which the verb is
    followed by its predominant preposition or
    particle.
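A minimal Python sketch of how pvcSum and pvcMax could be computed from a POS-tagged corpus; the corpus representation and the particle tag set are assumptions for illustration.

from collections import Counter

PARTICLE_TAGS = {"IN", "RP", "TO"}   # assumed preposition/particle tags

def phrasal_verb_counts(tagged_sentences, lemma):
    """tagged_sentences: list of [(word, pos_tag, verb_lemma_or_None), ...]."""
    followers = Counter()
    for sent in tagged_sentences:
        for (_, _, lem), (next_word, next_pos, _) in zip(sent, sent[1:]):
            if lem == lemma and next_pos in PARTICLE_TAGS:
                followers[next_word.lower()] += 1
    pvc_sum = sum(followers.values())              # any particle follows
    pvc_max = max(followers.values(), default=0)   # predominant particle
    return pvc_sum, pvc_max

corpus = [
    [("He", "PRP", None), ("put", "VBD", "put"), ("up", "RP", None),
     ("a", "DT", None), ("fight", "NN", None)],
    [("They", "PRP", None), ("put", "VBD", "put"), ("up", "RP", None),
     ("posters", "NNS", None)],
    [("She", "PRP", None), ("put", "VBD", "put"), ("the", "DT", None),
     ("book", "NN", None), ("down", "RP", None)],
]
print(phrasal_verb_counts(corpus, "put"))  # (2, 2)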

49
Experiments (1/3)
  • Trained on PropBank release 2002/7/15, Treebank
    release 2, both without Section 23. Named entity
    information extracted using CiceroLite.
  • Tested on PropBank and Treebank Section 23. Used
    gold-standard trees from Treebank, and named
    entities from CiceroLite.
  • Task 1 (identifying argument constituents):
  • Negative examples: any Treebank phrases not
    tagged in PropBank. Due to memory limitations, we
    used 11% of Treebank.
  • Positive examples: Treebank phrases (from the
    same 11% set) annotated with any PropBank role.
  • Task 2 (assigning roles to argument
    constituents):
  • Due to memory limitations, we limited the example
    set to the first 60% of PropBank annotations.

50
Experiments (2/3)
Features                          Arg P   Arg R   Arg F1   Role A
FS1                               84.96   84.26   84.61    78.76
FS1 + POS tag of head word        92.24   84.50   88.20    79.04
FS1 + content word and POS tag    92.19   84.67   88.27    80.80
FS1 + NE label of content word    83.93   85.69   84.80    79.85
FS1 + phrase NE flags             87.78   85.71   86.73    81.28
FS1 + phrasal verb information    84.88   82.77   83.81    78.62
FS1 + FS2                         91.62   85.06   88.22    83.05
FS1 + FS2 + boosting              93.00   85.29   88.98    83.74

(Arg P/R/F1: argument identification precision, recall, and F-measure;
Role A: role assignment accuracy. All values are percentages.)
51
Experiments (3/3)
  • Four models compared:
  • [Gildea and Palmer, 2002]
  • [Gildea and Palmer, 2002], our implementation
  • Our model with FS1
  • Our model with FS1 + FS2 + boosting

Model            Implementation              Arg F1   Role A
Statistical      Gildea and Palmer           -        82.8
Statistical      This study                  71.86    78.87
Decision Trees   FS1                         84.61    78.76
Decision Trees   FS1 + FS2 + boosting        88.98    83.74
52
Mapping Predicate-Argument Structures to
Templettes
  • The mapping rules from predicate-argument
    structures to templette slots are currently
    manually produced, using training texts and the
    corresponding templettes. Effort per domain is
    under 3 person-hours, if training information is
    available. (A minimal sketch of such mapping
    rules follows this list.)
  • We focused on two Event99 domains:
  • Market change: tracks changes of financial
    instruments. Relevant slots: INSTRUMENT
    (description of the financial instrument),
    AMOUNT_CHANGE (change amount), and CURRENT_VALUE
    (current instrument value after the change).
  • Death: extracts person death events. Relevant
    slots: DECEASED (person deceased),
    MANNER_OF_DEATH (manner of death), and
    AGENT_OF_DEATH (entity that caused the death
    event).
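A minimal Python sketch of what such a hand-written mapping could look like for the Death domain; the rule table and argument labels are illustrative assumptions, not the system's actual mapping rules.

# Map arguments of death-related predicates to Death templette slots.
DEATH_RULES = {
    # predicate lemma -> {argument label: templette slot}
    "kill": {"ARG0": "AGENT_OF_DEATH", "ARG1": "DECEASED"},
    "die":  {"ARG1": "DECEASED", "ARGM-MNR": "MANNER_OF_DEATH"},
}

def fill_templette(pred_arg):
    """pred_arg: {'lemma': ..., 'args': {label: text, ...}}"""
    rules = DEATH_RULES.get(pred_arg["lemma"])
    if rules is None:
        return None
    return {slot: pred_arg["args"][label]
            for label, slot in rules.items() if label in pred_arg["args"]}

structure = {"lemma": "kill",
             "args": {"ARG0": "The space shuttle Challenger",
                      "ARG1": "six astronauts"}}
print(fill_templette(structure))
# {'AGENT_OF_DEATH': 'The space shuttle Challenger', 'DECEASED': 'six astronauts'}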

53
Mappings for Event99 Death and Market Change
Domains
54
Experimental Setup
  • Three systems compared:
  • This model, with predicate-argument structures
    detected using the statistical approach.
  • This model, with predicate-argument structures
    detected using decision trees.
  • Cascaded finite-state-automata system (CICERO).
  • In all systems, entity coreference and event
    fusion are disabled.

55
Experiments
System                  Market Change   Death
Pred/Args Statistical   68.9            58.4
Pred/Args Inductive     82.8            67.0
FSA                     91.3            72.7

System                  Correct   Missed   Incorrect
Pred/Args Statistical   26        16       3
Pred/Args Inductive     33        9        2
FSA                     38        4        2
56
The good and the bad
  • The good:
  • The method achieves over 88% F-measure for the
    task of identifying argument constituents, and
    over 83% accuracy for role labeling.
  • The model scales well to unknown predicates
    because predicate lexical information is used for
    less than 5% of the branching decisions.
  • Domain customization of the complete IE system is
    less than 3 person-hours per domain because most
    of the components are open-domain.
    Domain-specific components can be modeled with
    machine learning (future work).
  • Performance degradation versus a fully-customized
    IE system is only 10%. It will be further
    decreased by including coreference resolution
    (open-domain) and event fusion (domain-specific).
  • The bad:
  • Currently PropBank provides annotations only for
    verb-based predicates. Noun-noun relations cannot
    be modeled for now.
  • Cannot be applied to unstructured text, where
    full parsing does not work.
  • Slower than the cascaded FSA models.

57
Other Pattern-Free Systems
  • Algorithms That Learn To Extract Information:
    BBN Description Of The SIFT System As Used For
    MUC-7. Scott Miller et al.
    http://citeseer.nj.nec.com/miller98algorithms.html
  • Probabilistic model with features extracted from
    full parse trees enhanced with NEs.
  • Kernel Methods for Relation Extraction. Dmitry
    Zelenko and Chinatsu Aone.
    http://citeseer.nj.nec.com/zelenko02kernel.html
  • Tree-based SVM kernels used to discover EELD
    relations.
  • Automatic Pattern Acquisition for Japanese
    Information Extraction. Kiyoshi Sudo et al.
    http://citeseer.nj.nec.com/sudo01automatic.html
  • Learns parse trees that subsume the information
    of interest.

58
End
Gràcies! (Thank you!)