Complex Sentence Processor - PowerPoint PPT Presentation

Slides: 14
Provided by: publicAs5

Transcript and Presenter's Notes

Title: Complex Sentence Processor


1
Complex Sentence Processor
  • Using Link Grammar to simplify complex sentences

2
Problem Statement
  • Extraction of gene-gene interactions from
    unstructured biomedical text.
  • Corpus: biomedical abstracts, curated text
  • Rich in interactions
  • Freely available
  • Approach: verb-based extraction.
  • John played the pipes.

Crux of the sentence
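Verb-based extraction can be illustrated with a short Python sketch. It is an illustration only, not the presenters' implementation: the verb list `INTERACTION_VERBS` and the function `extract_interaction` are hypothetical, and the sketch assumes the input is already a simple sentence with a single interaction verb.

```python
import re

# Hypothetical interaction verbs; the presentation does not list the verbs it uses.
INTERACTION_VERBS = ("inhibits", "activates", "stimulates", "binds to")


def extract_interaction(sentence: str):
    """Return (agent, verb, target) for the first interaction verb found, else None."""
    verbs = "|".join(re.escape(verb) for verb in INTERACTION_VERBS)
    pattern = rf"(?P<agent>.+?)\s+(?P<verb>{verbs})\s+(?P<target>.+?)\.?$"
    match = re.match(pattern, sentence.strip())
    if match is None:
        return None
    return match.group("agent"), match.group("verb"), match.group("target")


print(extract_interaction("HMBA inhibits MEC-1 cell proliferation."))
print(extract_interaction("John played the pipes."))
```

The first call yields the triple built around "inhibits"; the second returns None because "played" is not in the assumed verb list.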
3
Sentences in abstracts
  • Interactions specified in creative ways
  • HMBA inhibits MEC-1 cell proliferation.
  • GBMs commonly overexpress the oncogenes EGFR and
    PDGFR, and contain mutations and deletions of
    tumor suppressor genes PTEN and TP53.
  • Protein kinase B (PKB) has emerged as the focal
    point for many signal transduction pathways,
    regulating multiple cellular processes such as
    glucose metabolism, transcription, apoptosis,
    cell proliferation, angiogenesis, and cell
    motility.

4
Problems that come up
  • Anaphora resolution [Anaphora]
  • Pronominals: It activates HMBA.
  • Sortal anaphora: Both enzymes are
    phosphorylated.
  • Event anaphora: This reaction acts in a mediated
    environment.
  • Multiple interactions: complex sentences

Most of the tumor-suppressive properties of Pten
are dependent on its lipid phosphatase activity,
which inhibits the phosphatidylinositol-3'-kinase
(PI3K)/Akt signaling pathway through
dephosphorylation of
phosphatidylinositol-(3,4,5)-triphosphate.
5
Our solution: Pronoun resolution
  • Pronouns in abstracts: third person
  • It, itself, them, themselves.
  • Replace pronouns with first noun group that
    matches the number.
  • References in the absence of pronouns handled
    by Link Grammar.

6
Pronoun Resolution walkthrough
  • Ku loads onto dsDNA ends and it can diffuse along
    the DNA in an energy-independent manner.

Ku loads onto dsDNA ends and Ku can diffuse along
the DNA in an energy-independent manner.
When breast cancers were examined for NGAL mRNA
and protein levels, they were found to exhibit
heterogeneous expression.
When breast cancers were examined for NGAL mRNA
and protein levels, breast cancers were found
to exhibit heterogeneous expression.
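The replacement rule walked through above can be sketched in Python. This is a minimal illustration and not the presenters' Prolog module: it assumes noun groups have already been identified and tagged as singular or plural by an upstream step, and it reads "first noun group" as the first one that occurs before the pronoun. `NounGroup` and `resolve_pronouns` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Third-person pronouns mapped to grammatical number.
# "they" is not on the slide's list but appears in the walkthrough example.
PRONOUN_NUMBER = {
    "it": "singular", "itself": "singular",
    "they": "plural", "them": "plural", "themselves": "plural",
}


@dataclass
class NounGroup:
    text: str    # surface form, e.g. "breast cancers"
    number: str  # "singular" or "plural" (assumed to come from a chunker)


def resolve_pronouns(sentence: str, noun_groups: list[NounGroup]) -> str:
    """Replace each pronoun with the first preceding noun group of matching number."""
    def replace(match: re.Match) -> str:
        pronoun = match.group(0)
        number = PRONOUN_NUMBER[pronoun.lower()]
        for group in noun_groups:
            position = sentence.find(group.text)
            # Only consider noun groups that occur before the pronoun.
            if group.number == number and 0 <= position < match.start():
                return group.text
        return pronoun  # no candidate found: leave the pronoun unchanged

    pattern = r"\b(" + "|".join(PRONOUN_NUMBER) + r")\b"
    return re.sub(pattern, replace, sentence, flags=re.IGNORECASE)


groups = [NounGroup("Ku", "singular"), NounGroup("dsDNA ends", "plural")]
print(resolve_pronouns(
    "Ku loads onto dsDNA ends and it can diffuse along the DNA "
    "in an energy-independent manner.", groups))
```

Matching on number alone is deliberately naive; it reproduces the two walkthrough sentences but would pick the wrong antecedent whenever an earlier noun group of the same number is not the intended referent.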
7
Complex Sentence Structures
  • Independent clauses with connectives
  • Many dependent clauses with one independent
    clause with / without connectives
  • Multiple agents and goals in a single clause

Gene14 binds to Gene15 in response to 1-b-Gene16
or methylmethanesulfonate; this interaction does
not require Gene17-Gene18-Gene19.
Gene57-Gene58-Gene59-Gene60 is blocked by Gene61,
which binds to Gene62-Gene63-Gene64-Gene65.
Gene96 or Gene97 competes with Gene98 for binding
to Gene99 and Gene100 or Gene101 stimulates
Gene102-Gene103-Gene104 in vitro in the absence
of Gene105.
8
Our Solution: Complex Sentences
  • Identify clauses in complex sentences.
  • Build simple sentences from the clauses.
  • Tool used: Link Grammar Parser [Link]
  • Clause format:
  • Subject, Verb, Object, Modifying phrase
    (adverbial phrase / prepositional phrase)

9
CSP Goal
  • Upon growth factor stimulation of quiescent
    cells, Gene100 declines late in Gene101 and
    Gene102 is replaced by Gene103, which is absent
    in quiescent cells.

Upon growth factor stimulation of quiescent
cells, Gene100 declines late in Gene101.
Gene102 is replaced by Gene103.
Gene103 is absent in quiescent cells.
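The clause format and the goal above can be illustrated with a small data structure. The sketch below assumes the clauses have already been identified (the presentation uses the Link Grammar parser for that step, which is not called here). The `Clause` class and its field names are hypothetical and simply mirror the Subject / Verb / Object / Modifying phrase layout, with an extra field for a phrase that opens the sentence.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject: str
    verb: str
    obj: str = ""            # object; empty when the verb takes none
    modifier: str = ""       # trailing adverbial or prepositional phrase
    lead_modifier: str = ""  # phrase that opens the sentence, if any

    def to_sentence(self) -> str:
        """Render the clause as a stand-alone simple sentence."""
        parts = (self.subject, self.verb, self.obj, self.modifier)
        core = " ".join(part for part in parts if part)
        text = f"{self.lead_modifier}, {core}" if self.lead_modifier else core
        return text[0].upper() + text[1:] + "."


clauses = [
    Clause("Gene100", "declines", modifier="late in Gene101",
           lead_modifier="upon growth factor stimulation of quiescent cells"),
    Clause("Gene102", "is replaced", modifier="by Gene103"),
    Clause("Gene103", "is absent", modifier="in quiescent cells"),
]
for clause in clauses:
    print(clause.to_sentence())
```

Keeping each clause as a separate record means the relative pronoun "which" has to be resolved to its antecedent (Gene103 here) before the record is built; the sketch takes that resolution as given.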
10
Complex Sentence Processor
  • E18 Upon growth factor stimulation of quiescent
    cells, Gene100 declines late in Gene101 and
    Gene102 is replaced by Gene103, which is absent
    in quiescent cells.
  • C2 In Gene11-Gene12, Gene13 stimulates
    Gene14-Gene15-Gene16-Gene17.

CSP
E18 upon growth factor stimulation of quiescent cells , Gene100 declines late in Gene101
E18 Gene102 is replaced by Gene103 , which
E18 Gene103 is absent in quiescent cells
C2 in Gene11-Gene12 , Gene13 stimulates Gene14-Gene15-Gene16-Gene17
Subject Verb Objects Modifying Phrases
Upon declines late in Gene101

11
CSP Data Flow
Pronoun Resolution module
Prolog
Abstracts
Gene Tagger
Pre-Processor
Link Grammar, Java
Complex Sentence Processor
Sentence database
12
Illustration
13
Partial List of References
  • [Link] Sleator, D. and Temperley, D. (1991).
    "Parsing English with a Link Grammar." Carnegie
    Mellon University Computer Science technical
    report CMU-CS-91-196, October 1991.
  • [Kohn] Kohn, K. W. (1999). "Molecular Interaction
    Map of the Mammalian Cell Cycle Control and DNA
    Repair Systems." Molecular Biology of the Cell
    10: 2703-2734.
  • [Locuslink] Pruitt, K. D. and Maglott, D. R.
    (2001). "RefSeq and LocusLink: NCBI gene-centered
    resources." Nucleic Acids Res 29(1): 137-140.
    (http://www.ncbi.nlm.nih.gov/LocusLink/)
  • [Anaphora] Castano, J., Zhang, J., Pustejovsky,
    J., Anaphora Resolution in Biomedical Literature