Biological information extraction from natural language text

1
Biological information extraction from natural
language text
  • Chitta Baral
  • Arizona State University

2
Goal
  • Extract simple information from text.
  • This is somewhat simpler than complete natural
    language understanding
  • Examples of simple information (the structure is
    anticipated):
  • John was in Phoenix in March.
  • at(John, Phoenix, March)
  • Protein x, in the presence of enzyme y, breaks down
    into components z and w.
  • breaks_in_presence_of(x, y, z, w)
  • Not-so-simple information (meta-information, or
    unanticipated or untargeted structure):
  • John only visits cities where he has a friend.

3
Main approach
  • Use extraction rules that can extract the
    targeted information
  • Example rule: extract P(X,Y,Z) from a sentence if,
    in that sentence, X is a proper noun, Y is a verb
    that immediately follows X, and Z is a noun
    phrase that immediately follows Y.
  • Coming up with extraction rules
  • Manually
  • Learning extraction rules
  • Develop your own learning program
  • Cast your problem appropriately so as to use
    existing learning programs (such as Progol, FOIL,
    etc.)
  • Take an existing information extraction system
    and adapt it so that it is applicable to our
    case
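
The example rule on this slide can be sketched in Python as follows. This is only an illustration: the `(tag, text)` token format, the `NP` label for an already-chunked noun phrase, and the name `extract_triples` are assumptions that do not appear in the slides.

```python
from typing import List, Tuple

# Each token is (tag, text). "NNP" and "VB*" are Penn Treebank tags;
# "NP" is a hypothetical label for an already-chunked noun phrase.
Token = Tuple[str, str]


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (X, Y, Z) where X is a proper noun, Y is a verb that
    immediately follows X, and Z is a noun phrase right after Y."""
    triples = []
    for i in range(len(tokens) - 2):
        (t1, x), (t2, y), (t3, z) = tokens[i:i + 3]
        if t1 == "NNP" and t2.startswith("VB") and t3 == "NP":
            triples.append((x, y, z))
    return triples


sentence = [("NNP", "HMBA"), ("VBZ", "inhibits"), ("NP", "cell proliferation")]
print(extract_triples(sentence))
# prints [('HMBA', 'inhibits', 'cell proliferation')]
```

Because the rule demands strict adjacency, a modal between the noun and the verb ("HMBA could inhibit ...") would not match; that is the kind of case the learned patterns later in the deck have to handle.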

4
Learning extraction rules
  • Mark the parts of the text that are to be
    extracted
  • Parse the text (with markings) and do
    part-of-speech tagging
  • Extract a pattern
  • Use the pattern on other text, and add conditions
    or modify the pattern to avoid false positives.
  • Repeat the above steps until acceptable
    performance is achieved.
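
One way to make "acceptable performance" measurable is to count how often a pattern fires correctly on hand-marked sentences. The sketch below assumes facts are 3-tuples and uses a deliberately naive toy extractor; `score_pattern`, `naive`, and the sample sentences are made up for illustration and are not from the slides.

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, str, str]


def score_pattern(
    extractor: Callable[[str], Iterable[Fact]],
    labelled: List[Tuple[str, Set[Fact]]],
) -> Tuple[int, int, int]:
    """Count true positives, false positives and misses of `extractor`
    over sentences paired with their hand-marked facts."""
    tp = fp = fn = 0
    for sentence, gold in labelled:
        found = set(extractor(sentence))
        tp += len(found & gold)
        fp += len(found - gold)
        fn += len(gold - found)
    return tp, fp, fn


def naive(sentence: str) -> List[Fact]:
    """Toy extractor (an assumption): 'A verb rest.' -> (A, verb, rest)."""
    words = sentence.rstrip(".").split()
    if len(words) < 3:
        return []
    return [(words[0], words[1], " ".join(words[2:]))]


data = [
    ("HMBA inhibits proliferation.", {("HMBA", "inhibits", "proliferation")}),
    ("We thank reviewers.", set()),  # no interaction should be extracted
]
tp, fp, fn = score_pattern(naive, data)
print(tp, fp, fn)  # prints 1 1 0
```

The false positive on the second sentence is the signal to add a condition (for example, require the first word to be tagged NNP) and run the loop again.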

5
An example
  • HMBA could inhibit the MEC-1 cell proliferation
    by down-regulation of PCNA expression, it could
    also induce apoptosis effectively that might be
    through the way of up-regulation of bax and bcl-2
    gene expression.
  • Interaction(HMBA, inhibit, MEC-1 cell
    proliferation)
  • Interaction(HMBA, down-regulation, PCNA
    expression)

6
Parsing and POS tagging
    word(tag='NNP', arg(1), 'HMBA'),
    vg(word(tag='MD', 'could'),
       word(tag='VB', arg(2), 'inhibit')),
    ng(arg(3), word(tag='DT', 'the'),
       word(tag='NNP', 'MEC-1'),
       word(tag='NN', 'cell'),
       word(tag='NN', 'proliferation')),
    word(tag='IN', 'by'),
    word(tag='NN', 'down-regulation'),
    word(tag='IN', 'of'),
    ng(word(tag='NNP', 'PCNA'),
       word(tag='NN', 'expression')),
    word(tag=',', ','),
    word(tag='PRP', 'it'),
    vg(word(tag='MD', 'could'),
       word(tag='RB', 'also'),
       word(tag='VB', 'induce')),
    word(tag='NN', 'apoptosis'),
    word(tag='RB', 'effectively'),
    word(tag='WDT', 'that'),
    vg(word(tag='MD', 'might'),
       word(tag='VB', 'be')),
    word(tag='IN', 'through'),
    ng(word(tag='DT', 'the'),
       word(tag='NN', 'way')),
    word(tag='IN', 'of'),
    word(tag='NN', 'up-regulation'),
    word(tag='IN', 'of'),
    word(tag='NN', 'bax'),
    word(tag='CC', 'and'),
    ng(word(tag='JJ', 'bcl-2'),
       word(tag='NN', 'gene'),
       word(tag='NN', 'expression'))

7
An alternate way to code
    sentence(s).
    first(s, p1).
    next(p1,p2). next(p2,p3). next(p3,p4). next(p4,p5).
    next(p5,p6). next(p6,p7). next(p7,p8). next(p8,p9).
    next(p9,p10). next(p10,p11). next(p11,p12). next(p12,p13).
    next(p13,p14). next(p14,p15). next(p15,p16). next(p16,p17).
    next(p17,p18). next(p18,p19). next(p19,p20). next(p20,empty).
    type(p1, word). tag(p1, nnp). content(p1, hmba). marked(p1,arg1).
    type(p2, vg).
8
POS tags
  • NNP -- proper noun, singular
  • MD -- modal
  • VB -- verb, base form
  • DT -- determiner
  • NN -- noun, singular or mass
  • IN -- preposition or subordinating conjunction
  • PRP -- personal pronoun
  • RB -- adverb
  • WDT -- wh-determiner
  • CC -- coordinating conjunction
  • JJ -- adjective

9
Extracted interaction rule
    extract([word(tag='NNP', _h18724),
             word(tag='VB', _h18725),
             ng(_h18726)],
            interact(_h18724, _h18725, _h18726),
            true).

10
Tagged text
    interact('HMBA',
             [word(tag='MD', could),
              word(tag='VB', inhibit)],
             [word(tag='DT', the),
              word(tag='NNP', 'MEC-1'),
              word(tag='NN', cell),
              word(tag='NN', proliferation)]).
    interact('HMBA', 'down-regulation',
             [word(tag='NNP', 'PCNA'),
              word(tag='NN', expression)]).

11
Prolog code for learning extraction rules
    :- import append/3 from basics.

    % S: tagged text, I: interaction fact, P: extraction pattern
    learn(S) :-
        find_interact(S, I, P),
        nl, write(I), nl, write(P),
        write_file(P, I).

    % A word marked arg(1) supplies the first argument of the
    % interaction and contributes word(T, A) to the pattern.
    find_interact([word(T, arg(1), X)|R], interact(A, B, C), P) :-
        A = X,
        pattern([word(T, A)|PR], P),
        find_interact(R, interact(A, B, C), PR).
    % More rules for find_interact.

    pattern(W, P) :- P = W.

    % Append the learned rule extract(P, I, true) to extract.P.
    write_file(P, I) :-
        E = extract(P, I, true),
        open('extract.P', append, F),
        write(F, E), write(F, '.'), nl(F),
        close(F).
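
The idea behind the learning step, as read here, is to keep the marked items of a tagged sentence and replace their contents with variables. The Python sketch below illustrates that generalization on a flattened version of the earlier HMBA example; the tuple layout, the variable names A/B/C, and the function `generalize` are assumptions for illustration, and the verb group is flattened into plain words for simplicity.

```python
from typing import List, Optional, Tuple

# (kind, tag, arg, content): kind is "word", "vg" or "ng"; tag is a POS
# tag for words and None for groups; arg is 1..3 for marked items.
Item = Tuple[str, Optional[str], Optional[int], str]
PatternItem = Tuple[str, Optional[str], str]


def generalize(items: List[Item]) -> Tuple[List[PatternItem], Tuple[str, ...]]:
    """Keep the marked items in sentence order and replace each marked
    content with a variable named after its argument position."""
    names = {1: "A", 2: "B", 3: "C"}
    pattern = [(kind, tag, names[arg])
               for kind, tag, arg, _ in items if arg is not None]
    fact = ("interact", "A", "B", "C")
    return pattern, fact


marked = [
    ("word", "NNP", 1, "HMBA"),
    ("word", "MD", None, "could"),
    ("word", "VB", 2, "inhibit"),
    ("ng", None, 3, "the MEC-1 cell proliferation"),
    ("word", "IN", None, "by"),
]
pattern, fact = generalize(marked)
print(pattern)
# prints [('word', 'NNP', 'A'), ('word', 'VB', 'B'), ('ng', None, 'C')]
```

Unmarked items such as the modal "could" are dropped from the pattern, which is why the matcher needs to be able to skip over tokens when the pattern is applied.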

12
A set of extraction patterns
    extract([word(tag='NNP',_h13664), word(tag='VB',_h13665), word(tag='NNP',_h13666)],
            interact(_h13664,_h13665,_h13666), true).
    extract([word(tag='NNP',_h62915), vg(_h62916), ng(_h62917)],
            interact(_h62915,_h62916,_h62917), true).
    extract([word(tag='NNP',_h112469), word(tag='NN',_h112470), ng(_h112471)],
            interact(_h112469,_h112470,_h112471), true).
    extract([word(tag='NNP',_h161953), word(tag='NN',_h161954), word(tag='NNP',_h161955)],
            interact(_h161953,_h161954,_h161955), true).
    extract([word(tag='VB',_h17857), vg(_h17858), ng(_h17859)],
            interact(_h17857,_h17858,_h17859), true).
    extract([word(tag='NNP',_h42739), word(tag='NN',_h42740), ng(_h42741)],
            interact(_h42739,_h42740,_h42741), true).
    extract([word(tag='NNP',_h44071), word(tag='NN',_h44072), ng(_h44073)],
            interact(_h44071,_h44072,_h44073), true).
    extract([word(tag='NNP',_h16431), word(tag='NN',_h16432), ng(_h16433)],
            interact(_h16431,_h16432,_h16433), true).

13
Code that extracts patterns
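    :- load_dyn('extract.P').

    % matcher(Sentence, Pattern, _): the pattern items occur in the
    % sentence in order; sentence items that do not fit are skipped.
    matcher(_, [], _).
    matcher([SH|ST], [SH|PT], _) :- matcher(ST, PT, _).
    matcher([SH|ST], [PH|PT], _) :- SH \= PH,
        matcher(ST, [PH|PT], _).

    run(S) :- process(S).

    % Try every learned pattern; fail forces backtracking into the
    % next extract/3 fact, and the second clause ends the loop.
    process(S) :- extract(P, F, _), matcher(S, P, _),
        write_file(F), fail.
    process(_).

    write_file(I) :- open('interact.P', append, File),
        write(File, I), write(File, '.'), nl(File),
        close(File).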
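
For readers less familiar with Prolog unification, the matching step can be approximated in Python. This sketch assumes the same flattened `(kind, tag, content)` items as a simplification, with pattern items carrying a variable name in the third slot; `match` is a hypothetical helper and it does not reproduce full unification, only the in-order, skip-what-does-not-fit behaviour.

```python
from typing import Dict, List, Optional, Tuple

Item = Tuple[str, Optional[str], str]  # (kind, tag, content or variable)


def match(sentence: List[Item], pattern: List[Item]) -> Optional[Dict[str, str]]:
    """Match pattern items against sentence items in order, skipping
    sentence items whose kind and tag do not fit. Returns the variable
    bindings, or None when the pattern cannot be completed."""
    bindings: Dict[str, str] = {}
    i = 0
    for kind, tag, var in pattern:
        while i < len(sentence) and sentence[i][:2] != (kind, tag):
            i += 1  # skip a sentence item that does not fit
        if i == len(sentence):
            return None
        bindings[var] = sentence[i][2]
        i += 1
    return bindings


sentence = [
    ("word", "NNP", "HMBA"),
    ("word", "MD", "could"),
    ("word", "VB", "inhibit"),
    ("ng", None, "the MEC-1 cell proliferation"),
]
pattern = [("word", "NNP", "A"), ("word", "VB", "B"), ("ng", None, "C")]
print(match(sentence, pattern))
# prints {'A': 'HMBA', 'B': 'inhibit', 'C': 'the MEC-1 cell proliferation'}
```

Skipping non-matching items makes the patterns tolerant of intervening words, at the cost of more false positives than a strict adjacency rule would give.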

14
Applications of interest
  • Finding interactions between genes and proteins
  • Given a set of genes, say obtained from microarray
    experiments, use the extracted information to
    get a rough idea of the genes and proteins that
    interact with these genes.
  • Now build a pathway.
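
As a rough illustration of the first step toward a pathway, extracted facts can be indexed by the genes of interest. The sketch assumes facts are `(agent, relation, target)` triples; the gene and protein names are placeholders made up for the example, not real biology, and `neighbours` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Fact = Tuple[str, str, str]  # (agent, relation, target)


def neighbours(
    facts: Iterable[Fact], genes: Set[str]
) -> Dict[str, Set[Tuple[str, str]]]:
    """For each gene of interest, collect (relation, partner) pairs from
    every extracted fact that mentions the gene on either side."""
    result: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for agent, relation, target in facts:
        if agent in genes:
            result[agent].add((relation, target))
        if target in genes:
            result[target].add((relation, agent))
    return dict(result)


# Placeholder facts, invented for illustration only.
facts = [
    ("geneA", "activates", "proteinX"),
    ("proteinY", "inhibits", "geneA"),
    ("geneB", "binds", "proteinZ"),
]
print(neighbours(facts, {"geneA"}))
```

The resulting adjacency map is only a starting point: it records that two entities were reported to interact, not the direction or order of events a full pathway would need.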