The BIOCREATIVE Task in SEER

1
  • The BIOCREATIVE Task in SEER

2
Outline
  • Background for biomedical information extraction
    and BIOCREATIVE
  • BIOCREATIVE NER Task
  • Stanford-Edinburgh System
  • Problems

3
Terms and Resources
4
Biotechnology Information Explosion
David Landsman NCBI Presentation
5
NER in the Biomedical Domain
  • Many types of entities can be studied in the
    biomedical domain (drug names, chemicals)
  • Much research has focused on molecular biological
    entities, particularly genes and proteins

6
Gene Names
  • Genes and gene products are constantly being
    discovered and new names invented
  • Nomenclatures exist but vary from organism to
    organism
  • Diverse
      • bride of frizzled disco, cheap date, broken
        heart
      • REP2, RFM
  • Ambiguous
      • With other genes
      • With acronyms
      • With proteins, where genes and their products are
        often referred to by the same name (the first gene
        in LocusLink is officially alpha-1-B-glycoprotein)

7
Varying Tasks, Results and Evaluation Methods
8
BIOCREATIVE Motivations
  • Seeking to be the MUC of the biomedical
    information extraction field

9
The BIOCREATIVE NER Task
  • Given a single sentence from an abstract, identify
    all mentions of genes (or proteins where there is
    ambiguity)
  • In November the task was changed to identifying all
    mentions of genes and proteins (without
    distinguishing between them)

10
The BIOCREATIVE NER Data
Data consisted of MEDLINE abstracts annotated for
the single NE GENE
11
The BIOCREATIVE NER Evaluation Method
  • Only exact matches to the gold standard (which
    includes alternate correct boundaries for several
    cases) are counted as correct.
  • Genes detected with incorrect boundaries are
    doubly penalized as false negatives and false
    positives.
  • chloramphenicol acetyl transferase
    reporter gene (FN)
  • transferase reporter gene (FP)
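
The double penalty can be illustrated with a small sketch. This is not the official BIOCREATIVE scorer: it assumes mentions are represented as (start, end) token-offset pairs, the function name `score_exact` and the offsets are hypothetical, and the alternate gold boundaries mentioned above are left out for brevity.

```python
def score_exact(gold, predicted):
    """Score predicted spans against gold spans, counting exact matches only."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # spans that match the gold standard exactly
    fp = len(predicted - gold)   # predicted spans with no exact gold match
    fn = len(gold - predicted)   # gold spans that were not predicted exactly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f_score": f_score}


# Hypothetical token offsets for the phrase
# "chloramphenicol acetyl transferase reporter gene" (tokens 0..4).
gold = [(0, 5)]        # the full gold mention
predicted = [(2, 5)]   # "transferase reporter gene": wrong left boundary

result = score_exact(gold, predicted)
print(result["fp"], result["fn"])  # one boundary error yields one FP and one FN
```

Under this scheme a single boundary mistake lowers both precision and recall, which is why boundary errors weigh so heavily in the reported scores.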

12
Outline
  • Background for BIOCREATIVE and biomedical
    information extraction
  • BIOCREATIVE NER Task
  • → Stanford-Edinburgh System
  • Problems

13
Baseline System
  • Maximum Entropy Tagger in Java
  • Based on the Klein et al. (2003) CoNLL submission
  • Baseline performance: Precision 0.79, Recall 0.74,
    F-Score 0.76
  • Efforts were mostly in trying different features,
    including different POS taggers, NP-chunking,
    Parsing, Gazetteers, Web, Abbreviations, Word
    Shapes, Tokenization
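
As an illustration of one of the feature types listed above, here is a minimal sketch of a word-shape feature. The slides do not specify how the system computed word shapes, so the mapping below (uppercase to X, lowercase to x, digit to d, runs collapsed) is an assumption, and `word_shape` is a hypothetical name.

```python
import re


def word_shape(token):
    """Map a token to a coarse shape string.

    Uppercase letters become 'X', lowercase letters 'x', digits 'd';
    other characters are kept. Runs of the same symbol are collapsed.
    """
    symbols = []
    for ch in token:
        if ch.isupper():
            symbols.append("X")
        elif ch.islower():
            symbols.append("x")
        elif ch.isdigit():
            symbols.append("d")
        else:
            symbols.append(ch)
    # Collapse repeated symbols, e.g. "XXXd" -> "Xd".
    return re.sub(r"(.)\1+", r"\1", "".join(symbols))


for tok in ["REP2", "rLHR-t653", "neurofascin", "Y-8004"]:
    print(tok, word_shape(tok))
```

A shape such as "Xd" groups tokens like REP2 with other short alphanumeric symbols, which lets a classifier generalize to names it has never seen. The Y-8004 example shows the limit of the cue: it has a gene-like shape but is a drug.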

14
Feature Set
15
Features External
16
Postprocessing
  • Discarded results with mismatched parentheses
  • Different boundaries were detected when searching
    the sentence forwards versus backwards
  • Unioned the results of both; in cases where
    boundary disagreements meant that one detected
    gene was contained in the other, we kept the
    shorter gene
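
The postprocessing steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the system's actual code: spans are (start, end) token offsets, the parenthesis filter is applied before the containment rule (the slides do not give the order), and all function names are hypothetical.

```python
def balanced_parens(text):
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def contains(outer, inner):
    """True if span `outer` covers span `inner` and the two differ."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def combine(tokens, forward, backward):
    """Union forward and backward detections, then clean them up."""
    candidates = set(forward) | set(backward)
    # Discard spans whose text has mismatched parentheses.
    candidates = {s for s in candidates
                  if balanced_parens(" ".join(tokens[s[0]:s[1]]))}
    # Where one span contains another, keep only the shorter one.
    kept = {s for s in candidates
            if not any(contains(s, other) for other in candidates)}
    return sorted(kept)


tokens = "the wild type rat LH / CG receptor ( rLHR ) was used".split()
forward = [(3, 8), (9, 10)]    # "rat LH / CG receptor", "rLHR"
backward = [(4, 8), (8, 10)]   # "LH / CG receptor", "( rLHR"
final_spans = combine(tokens, forward, backward)
print([" ".join(tokens[s:e]) for s, e in final_spans])
```

Here the backward span "( rLHR" is dropped for its unmatched parenthesis, and "rat LH / CG receptor" is dropped because it contains the shorter "LH / CG receptor".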

17
Final System and Results
  • Trained on training + development data (1000
    sentences)
  • 1,247,775 features

18
Outline
  • Background for BIOCREATIVE and biomedical
    information extraction
  • BIOCREATIVE NER Task
  • Stanford-Edinburgh System
  • → Problems

19
Performance Discrepancy
20
Gene Entity Pitfalls
  • Language is complex
  • Stably transfected human kidney 293 cells
    expressing the wild type rat LH / CG receptor (
    rLHR ) or receptors with C-terminal tails
    truncated at residues 653 , 631 , or 628
    (designated rLHR-t653 , rLHR-t631 , and rLHR-t628
    ) were used to probe the importance of this
    region on the regulation of hormonal
    responsiveness.
  • Gene names are frequently uncapitalized
  • The chick axon-associated surface glycoprotein
    neurofascin is implicated in axonal growth and
    fasciculation as revealed by antibody
    perturbation experiments .
  • "Looks weird" is not indicative of a gene name
  • A newly synthesized anti-inflammatory agent ,
    Y-8004 demonstrated a greater inhibition than did
    indomethacin ( IM ) . on inflammatory response
    such as ultraviolet erythema in guinea pigs ,
    carrageenin edema , evans blue and
    carrageenin-induced pleuritis and acetic
    acid-induced peritonitis in rats .

21
Boundary Problems
  • Gene names can be long and complex
  • 37% of our false positives and 39% of our false
    negatives were boundary problems
  • Gold: chloramphenicol acetyl transferase reporter
    gene
  • chloramphenicol acetyl transferase reporter gene
    deletion
  • Gold: estrogen receptor
  • estrogen receptor ligand

22
Interannotator Agreement
  • MUC-7 interannotator agreement was measured at 97
    F-Score
  • Demetriou and Gaizauskas: interannotator agreement
    for biomedical terms at 89 F-Score
  • Hirschman measured interannotator agreement for
    gene names at 87 F-Score