Annotators' Overview of the ITR Grant - PowerPoint PPT Presentation

Provided by: IRCS
Transcript and Presenter's Notes


1
Annotators' Overview of the ITR Grant
  • Information Extraction from the Biomedical
    Literature

2
Goal
  • qualitatively better methods for automatically
    extracting information from the biomedical
    literature

3
Example of eventual goal
  • Program input:
  • Amiodarone weakly inhibited CYP2C9, CYP2D6, and
    CYP3A4-mediated activities with Ki values of
    45.1--271.6 µM.
  • Output: database entries meaning
  • amiodarone inhibits CYP2C9 with Ki = 45.1--271.6 µM
  • amiodarone inhibits CYP2D6 with Ki = 45.1--271.6 µM
  • amiodarone inhibits CYP3A4 with Ki = 45.1--271.6 µM
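
The input/output pair above can be made concrete with a small sketch. This is only an illustration of the target mapping, not the project's method: the slides describe statistical parsing and semantic analysis, whereas this example uses a hand-written regular expression that fits this one sentence. The names `InhibitionFact`, `PATTERN`, and `extract_facts` are hypothetical.

```python
import re
from typing import NamedTuple


class InhibitionFact(NamedTuple):
    """One database entry: a compound inhibits an enzyme, with a Ki range in µM."""
    compound: str
    enzyme: str
    ki_low_um: float
    ki_high_um: float


# Hypothetical pattern, hand-written for this single example sentence.
# A real IE system would rely on parsing, not on a fixed template.
PATTERN = re.compile(
    r"(?P<compound>\w+) (?:weakly )?inhibited (?P<enzymes>.+?)-mediated "
    r"activities with Ki values of (?P<low>[\d.]+)--(?P<high>[\d.]+)"
)


def extract_facts(sentence: str) -> list[InhibitionFact]:
    """Return one fact per enzyme named in the coordinated list."""
    match = PATTERN.search(sentence)
    if match is None:
        return []
    # Split the coordination "A, B, and C" into ["A", "B", "C"].
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", match["enzymes"])
    enzymes = [part for part in parts if part]
    low, high = float(match["low"]), float(match["high"])
    return [
        InhibitionFact(match["compound"].lower(), enzyme, low, high)
        for enzyme in enzymes
    ]


SENTENCE = (
    "Amiodarone weakly inhibited CYP2C9, CYP2D6, and CYP3A4-mediated "
    "activities with Ki values of 45.1--271.6 µM."
)

facts = extract_facts(SENTENCE)
for fact in facts:
    print(
        f"{fact.compound} inhibits {fact.enzyme} "
        f"with Ki = {fact.ki_low_um}--{fact.ki_high_um} µM"
    )
```

The sketch copies the whole Ki range onto each enzyme, as the slide's output does. The hard part in practice is the coordination ("CYP2C9, CYP2D6, and CYP3A4-mediated"), which is why the slides stress high-accuracy parsing.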

4
IE techniques
  • high-accuracy parsing (Initial annotation
    work)
  • shallow semantic analysis (Later annotation
    work)
  • integration of large volumes of diverse data

5
Initial focus topics
  • drug development, with the Knowledge Integration
    and Discovery Systems group at GlaxoSmithKline
    (GSK)
  • pediatric oncology, with the eGenome group at
    Children's Hospital of Philadelphia (CHOP)

6
Develop and test
  • new general methods for information extraction
    (IE) from text, based on ongoing Penn research in
    corpus-based:
  • parsing (Treebanking)
  • predicate-argument analysis (Propbanking)
  • reference resolution (coreference annotation,
    named entity tagging)

7
Human and machine annotation: Year 1
  • defining, organizing and implementing various
    kinds of (human) annotation of biomedical texts
  • creating (mostly by statistical methods)
    automatic analyzers to apply these same kinds of
    annotation, especially to improve productivity of
    human annotators

8
Deliverables (1)
  • All in biomedical domain
  • Large corpora
  • Part of speech (lexical)
  • Treebank (syntactic)
  • Named entities (terminological)
  • Propbank (shallow semantic)
  • Factbank (entities and relations)
  • Lexicons and tools
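
One way to picture how these layers relate is as separate sets of labeled spans over the same tokens. The sketch below is an assumption made for illustration, not the project's actual file format: the `Span` and `AnnotatedSentence` classes, the layer names, and the entity labels `substance` and `enzyme` are all hypothetical, and the tags shown are a plausible analysis of a shortened example sentence rather than gold annotation. Factbank relations are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A labeled token range: tokens[start:end] carries `label`."""
    start: int
    end: int
    label: str


@dataclass
class AnnotatedSentence:
    """Tokens plus any number of independent annotation layers."""
    tokens: list[str]
    layers: dict[str, list[Span]] = field(default_factory=dict)

    def add(self, layer: str, start: int, end: int, label: str) -> None:
        self.layers.setdefault(layer, []).append(Span(start, end, label))

    def covered_text(self, span: Span) -> str:
        return " ".join(self.tokens[span.start:span.end])


sentence = AnnotatedSentence(tokens=["Amiodarone", "weakly", "inhibited", "CYP2C9"])

# Part of speech (lexical): one tag per token.
for i, tag in enumerate(["NN", "RB", "VBD", "NN"]):
    sentence.add("pos", i, i + 1, tag)

# Treebank (syntactic): constituents as token ranges.
for start, end, label in [(0, 4, "S"), (0, 1, "NP"), (1, 2, "ADVP"), (2, 4, "VP"), (3, 4, "NP")]:
    sentence.add("treebank", start, end, label)

# Named entities (terminological): hypothetical label set.
sentence.add("named_entity", 0, 1, "substance")
sentence.add("named_entity", 3, 4, "enzyme")

# Propbank (shallow semantic): the predicate and its arguments.
for start, end, label in [(2, 3, "rel"), (0, 1, "ARG0"), (3, 4, "ARG1"), (1, 2, "ARGM-MNR")]:
    sentence.add("propbank", start, end, label)

for layer, spans in sentence.layers.items():
    for span in spans:
        print(f"{layer:13s} {span.label:9s} {sentence.covered_text(span)}")
```

Keeping each layer separate means one annotator's part-of-speech work and another's entity tagging can be produced and checked independently while still referring to the same text.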

9
Deliverables (2)
  • abstracts and full-text articles annotated with
    entities and relations of interest to researchers
    (Factbanks), such as
  • enzyme inhibition by various compounds
  • genotype/phenotype connections

10
Deliverables (3)
  • broad-coverage lexicons and tools for analysis of
    text

11
Summary: Types of annotation
  • Part of speech (POS)
  • (required for automated Treebanking)
  • Syntax (Treebank)
  • Terminology (Named entities)
  • Shallow semantics (Propbank)
  • Entities and relations (Factbank)
  • now in progress starting June 2003

12
Expectations: Time
  • Work week
  • Max.: half-time = 20 hours
  • Min.: 10 hours
  • Weekly meeting
  • These may be modified for particular
    circumstances.

13
Expectations: Pay
  • Starting at $10/hr and up, depending on background
    and task
  • Increasing with experience and productivity. This
    includes getting out of introductory training and
    into production.

14
Expectations: Other
  • Training and study: Learning how to do it is work
    and takes effort, not just looking at the manual
    and coming to meetings.
  • Discussion
  • With each other
  • With leaders