Title: Complex Sentence Processor
1 Complex Sentence Processor
- Using Link Grammar to simplify complex sentences
2 Problem Statement
- Extraction of gene-gene interactions from
unstructured biomedical text.
- Corpus: biomedical abstracts, curated text
- Rich in interactions
- Freely available
- Approach: verb-based extraction.
- John played the pipes.
Crux of the sentence
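A minimal sketch of what verb-based extraction could look like for a simple sentence: find a known verb and treat the text on either side as agent and target. The verb list and the function name `extract_by_verb` are hypothetical, chosen only for illustration; the slides do not say which verbs or which matching method the system uses.

```python
# Hypothetical verb list for illustration; not taken from the slides.
INTERACTION_VERBS = ("inhibits", "activates", "binds to", "stimulates")


def extract_by_verb(sentence, verbs=INTERACTION_VERBS):
    """Split a simple sentence around the earliest listed verb.

    Returns (agent, verb, target), or None when no listed verb occurs.
    """
    text = sentence.strip().rstrip(".")
    padded = f" {text} "  # pad so verbs match only as whole words
    best = None
    for verb in verbs:
        idx = padded.find(f" {verb} ")
        if idx != -1 and (best is None or idx < best[0]):
            best = (idx, verb)
    if best is None:
        return None
    idx, verb = best
    agent = padded[:idx].strip()
    target = padded[idx + len(verb) + 2:].strip()
    return agent, verb, target


print(extract_by_verb("HMBA inhibits MEC-1 cell proliferation."))
print(extract_by_verb("John played the pipes.", verbs=("played",)))
```

A split like this only works when the sentence holds a single clause, which is why longer sentences from abstracts need to be simplified first.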
3 Sentences in abstracts
- Interactions specified in creative ways
- HMBA inhibits MEC-1 cell proliferation.
- GBMs commonly overexpress the oncogenes EGFR and
PDGFR, and contain mutations and deletions of
tumor suppressor genes PTEN and TP53.
- Protein kinase B (PKB) has emerged as the focal
point for many signal transduction pathways,
regulating multiple cellular processes such as
glucose metabolism, transcription, apoptosis,
cell proliferation, angiogenesis, and cell
motility.
4 Problems that come up
- Anaphora resolution [Anaphora]
- Pronominals: It activates HMBA.
- Sortal anaphora: Both enzymes are
phosphorylated.
- Event anaphora: This reaction acts in a mediated
environment.
- Multiple interactions
- Complex sentences
Most of the tumor-suppressive properties of Pten
are dependent on its lipid phosphatase activity,
which inhibits the phosphatidylinositol-3'-kinase
(PI3K)/Akt signaling pathway through
dephosphorylation of
phosphatidylinositol-(3,4,5)-triphosphate.
5 Our solution: Pronoun resolution
- Pronouns in abstracts are third person:
- It, itself, them, themselves.
- Replace each pronoun with the first noun group that
matches it in number.
- References in the absence of pronouns are handled
by Link Grammar.
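The replacement rule above can be sketched as follows. This is an illustrative Python version, not the module described in the slides. It assumes the noun groups have already been identified and tagged as singular or plural by an earlier step, and it adds "they" to the pronoun list as an assumption. The function name and the `"sg"`/`"pl"` tags are hypothetical.

```python
import re

# Pronouns listed on the slide; "they" is added here as an assumption.
SINGULAR = {"it", "itself"}
PLURAL = {"they", "them", "themselves"}

PRONOUN_RE = re.compile(
    r"\b(?:" + "|".join(sorted(SINGULAR | PLURAL, key=len, reverse=True)) + r")\b",
    flags=re.IGNORECASE,
)


def resolve_pronouns(sentence, noun_groups):
    """Replace each pronoun with the first noun group of matching number.

    noun_groups: (text, number) pairs in sentence order, where number is
    "sg" or "pl". Chunking and number tagging are assumed to be done upstream.
    """
    def first_with(number):
        for text, num in noun_groups:
            if num == number:
                return text
        return None

    def substitute(match):
        word = match.group(0)
        number = "sg" if word.lower() in SINGULAR else "pl"
        antecedent = first_with(number)
        # Leave the pronoun untouched when nothing agrees in number.
        return antecedent if antecedent is not None else word

    return PRONOUN_RE.sub(substitute, sentence)


print(resolve_pronouns(
    "Ku loads onto dsDNA ends and it can diffuse along the DNA.",
    [("Ku", "sg"), ("dsDNA ends", "pl"), ("the DNA", "sg")],
))
```

Picking the first agreeing noun group is a simple heuristic: it needs no parse tree, but it can pick the wrong antecedent when several noun groups share the same number.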
6 Pronoun Resolution walkthrough
- Ku loads onto dsDNA ends and it can diffuse along
the DNA in an energy-independent manner.
Resolved: Ku loads onto dsDNA ends and Ku can diffuse along
the DNA in an energy-independent manner.
- When breast cancers were examined for NGAL mRNA
and protein levels, they were found to exhibit
heterogeneous expression.
Resolved: When breast cancers were examined for NGAL mRNA
and protein levels, breast cancers were found
to exhibit heterogeneous expression.
7 Complex Sentence Structures
- Independent clauses with connectives
- Many dependent clauses with one independent
clause, with or without connectives
- Multiple agents and goals in a single clause
Gene14 binds to Gene15 in response to 1-b-Gene16
or methylmethanesulfonate; this interaction does
not require Gene17-Gene18-Gene19.
Gene57-Gene58-Gene59-Gene60 is blocked by Gene61,
which binds to Gene62-Gene63-Gene64-Gene65.
Gene96 or Gene97 competes with Gene98 for binding
to Gene99 and Gene100 or Gene101 stimulates
Gene102-Gene103-Gene104 in vitro in the absence
of Gene105.
8 Our Solution: Complex Sentences
- Identify clauses in complex sentences.
- Build simple sentences from the clauses.
- Tool used: Link Grammar Parser [Link]
- Clause format:
- Subject, Verb, Object, Modifying phrase
(Adverbial Phrase / Prepositional Phrase)
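One way to picture the clause format is as a small record with a method that renders it back into a simple sentence. The sketch below is illustrative only: the class name `Clause`, its field names, and `to_sentence` are hypothetical, and the clauses are filled in by hand here, whereas the slides describe obtaining them through the Link Grammar Parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clause:
    """One clause in Subject / Verb / Object / Modifying phrase form."""
    subject: str
    verb: str
    obj: str = ""
    modifiers: List[str] = field(default_factory=list)

    def to_sentence(self) -> str:
        # Join the non-empty parts in order and close with a period.
        parts = [self.subject, self.verb, self.obj, *self.modifiers]
        text = " ".join(p for p in parts if p)
        if not text:
            return ""
        return text[0].upper() + text[1:] + "."


clauses = [
    Clause("Gene102", "is replaced", modifiers=["by Gene103"]),
    Clause("Gene103", "is absent", modifiers=["in quiescent cells"]),
]
for clause in clauses:
    print(clause.to_sentence())
```

Keeping modifying phrases as a list lets one clause carry several adverbial or prepositional phrases without changing the record's shape.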
9 CSP Goal
- Upon growth factor stimulation of quiescent
cells, Gene100 declines late in Gene101 and Gene102 is replaced by
Gene103, which is absent in quiescent cells.
Upon growth factor stimulation of quiescent
cells, Gene100 declines late in Gene101.
Gene102 is replaced by Gene103.
Gene103 is absent in quiescent cells.
10 Complex Sentence Processor
- E18: Upon growth factor stimulation of quiescent
cells, Gene100 declines late in Gene101 and
Gene102 is replaced by Gene103, which is absent
in quiescent cells.
- C2: In Gene11-Gene12, Gene13 stimulates
Gene14-Gene15-Gene16-Gene17.
CSP
E18 | upon growth factor stimulation of quiescent cells , Gene100 | declines | late | in Gene101
E18 | Gene102 | is replaced | by Gene103 , which
E18 | Gene103 is absent | in quiescent cells
C2 | in Gene11-Gene12 , Gene13 | stimulates | Gene14-Gene15-Gene16-Gene17
Subject | Verb | Objects | Modifying Phrases
Upon | declines | late | in Gene101
11 CSP Data Flow
[Data-flow diagram. Labels: Abstracts; Gene Tagger; Pre-Processor; Pronoun Resolution module; Prolog; Link Grammar, Java; Complex Sentence Processor; Sentence database]
12 Illustration
13 Partial List of References
- [Link] Daniel Sleator and Davy Temperley. 1991.
Parsing English with a Link Grammar. Carnegie
Mellon University Computer Science technical
report CMU-CS-91-196, October 1991.
- [Kohn] Kohn, K. W. (1999). "Molecular Interaction
Map of the Mammalian Cell Cycle Control and DNA
Repair Systems." Molecular Biology of the Cell
10: 2703-2734.
- [Locuslink] Pruitt, K. D. and D. R. Maglott
(2001). "RefSeq and LocusLink: NCBI gene-centered
resources." Nucleic Acids Res 29(1): 137-140.
(http://www.ncbi.nlm.nih.gov/LocusLink/)
- [Anaphora] Castano, J., Zhang, J., Pustejovsky,
J., Anaphora Resolution in Biomedical Literature