Layers of Annotation for Natural Language Processing

Preslav Nakov, Archana Ganapathi, Ariel Schwartz
http://biotext.berkeley.edu
  • Query Variability
  • <SEN "three-dimensional structure of" <NOUN PHRASE> "from" <NOUN PHRASE>>
  • <SEN <PROTEIN> <WORD=inhibit> <PROTEIN> >
  • <SEN <NOUN PHRASE> <WORD lex=activation> <WORD lex=of> <NOUN PHRASE> <WORD lex=by> <NOUN PHRASE>>
  • <SEN <NOUN PHRASE> <WORD lex=suppresses> <NOUN PHRASE>>
  • <SEN <NOUN PHRASE> <WORD lex=downregulates> <NOUN PHRASE>>
  • <SEN <PROTEIN> ... <WORD=negatively> ... <WORD=regulate> ... <PROTEIN>>
  • <SEN <NOUN PHRASE> <VERB-AUXILIARY lex=is> <VERB-PAST lex=activated> <PREPOSITION lex=by> <NOUN PHRASE>>
  • <SEN <NP <NN1,MeSH> <NN2,MeSH>>>
  • <SEN <NOUN PHRASE> ( <WORD lex=a> <NOUN PHRASE> )>
  • <SEN <PROTEIN> ... <POS=","> <WORD=which> ... <WORD=transactivate> ... <PROTEIN> <POS=",">>

Sample Query: <SEN <NP <NN1,MeSH> <NN2,MeSH> <PROTEIN> > >
ARCHITECTURE
Example: Kinase inhibits RAG-1.
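To make the idea of stacked annotation layers concrete, here is a minimal Python sketch for the example sentence. It is an illustration only: the `Annotation` class, the layer names, and the tags assigned to each word (including treating "Kinase" as a protein mention) are assumptions made for this example, not the system's actual schema.

```python
from dataclasses import dataclass

TEXT = "Kinase inhibits RAG-1."

@dataclass(frozen=True)
class Annotation:
    layer: str   # hypothetical layer name: "sentence", "pos", "shallow_parse", "gene"
    tag: str     # hypothetical tag within that layer
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive

def span(layer, tag, surface):
    """Build an annotation covering the first occurrence of `surface` in TEXT."""
    start = TEXT.index(surface)
    return Annotation(layer, tag, start, start + len(surface))

# Several independent layers annotate the same character range.
ANNOTATIONS = [
    Annotation("sentence", "SEN", 0, len(TEXT)),
    span("pos", "noun", "Kinase"),
    span("pos", "verb", "inhibits"),
    span("pos", "noun", "RAG-1"),
    span("shallow_parse", "NP", "Kinase"),
    span("shallow_parse", "NP", "RAG-1"),
    span("gene", "PROTEIN", "Kinase"),
    span("gene", "PROTEIN", "RAG-1"),
]

def within(inner, outer):
    """True if `inner` lies inside the character range of `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end

def layer_inside(annotations, layer, outer):
    """All annotations of one layer that fall inside `outer`."""
    return [a for a in annotations if a.layer == layer and within(a, outer)]

sentence = ANNOTATIONS[0]
proteins = layer_inside(ANNOTATIONS, "gene", sentence)
protein_text = [TEXT[a.start:a.end] for a in proteins]
```

Because every layer is anchored to character offsets, a query can combine layers (for example, a protein mention that is also an NP) by comparing positions alone.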
Layered Query Language

<document <sentence <shallow_parse tag_type=NP
    <pos tag_type=noun <mesh MeSH_number=G07.553> > sent = sentence nn1=pos mt1=mesh
    <pos tag_type=noun sentence=sent <mesh MeSH_number=D> mt2=mesh > nn2=pos >
    <gene tag_type=prt sentence=sent> > >
print nn1.pmid, nn1.tag_type, nn2.tag_type, mt1.MeSH_number, mt2.MeSH_number

SQL

select nn1.pmid, nn1.tag_type, nn2.tag_type,
       mt1.tree_number, mt2.tree_number
from biotext_annotation_4 nn1
join biotext_annotation_4 np
  on nn1.pmid = np.pmid
 and nn1.section = np.section
 and nn1.sentence = np.sentence
 and nn1.start_char_pos >= np.start_char_pos
join biotext_annotation_4 nn2
  on nn1.pmid = nn2.pmid
 and nn1.section = nn2.section
 and nn1.sequence_pos = nn2.sequence_pos - 1
 and nn2.sentence = np.sentence
 and nn2.end_char_pos = np.end_char_pos
join biotext_annotation_4 mesh1
  on nn1.pmid = mesh1.pmid
 and nn1.section = mesh1.section
 and nn1.sentence = mesh1.sentence
 and nn1.start_char_pos = mesh1.start_char_pos
 and nn1.end_char_pos = mesh1.end_char_pos
join biotext_annotation_4 mesh2
  on nn2.pmid = mesh2.pmid
 and nn2.section = mesh2.section
 and nn2.sentence = mesh2.sentence
 and nn2.start_char_pos = mesh2.start_char_pos
 and nn2.end_char_pos = mesh2.end_char_pos
join biotext_annotation_mesh_tree mt1
  on mt1.descriptor_ui = mesh1.tag_type
join biotext_annotation_mesh_tree mt2
  on mt2.descriptor_ui = mesh2.tag_type
where nn1.layer_id = 1
  and nn2.layer_id = 1
  and np.layer_id = 3
  and nn1.tag_type in (27,30)
  and nn2.tag_type in (27,30)
  and np.tag_type = 31
  and mesh1.layer_id = 6
  and mesh2.layer_id = 6
  and mt1.tree_number like 'G07.553%'
  and mt2.tree_number like 'D%'
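The shape of this self-join can be tried on toy data with Python's built-in sqlite3 module. The sketch below is a simplified stand-in, not the BioText schema: the table names `annotation` and `mesh_tree`, the descriptor identifiers `UI_A` and `UI_B`, and the tree numbers are invented for illustration. The layer and tag codes (1, 3, 6, 27, 31) are copied from the query above, and the sketch assumes that comparisons lost their `=` signs in extraction and that the `like` patterns are prefix matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table annotation (
    pmid integer, section text, sentence integer,
    layer_id integer, tag_type, sequence_pos integer,
    start_char_pos integer, end_char_pos integer
);
create table mesh_tree (descriptor_ui text, tree_number text);
""")

rows = [
    # pmid, section, sentence, layer, tag, seq, start, end
    (1, "abstract", 0, 1, 27, 0, 0, 6),       # first noun (POS layer)
    (1, "abstract", 0, 1, 27, 1, 7, 14),      # second noun (POS layer)
    (1, "abstract", 0, 3, 31, 0, 0, 14),      # NP covering both nouns
    (1, "abstract", 0, 6, "UI_A", 0, 0, 6),   # made-up MeSH label on noun 1
    (1, "abstract", 0, 6, "UI_B", 1, 7, 14),  # made-up MeSH label on noun 2
]
conn.executemany("insert into annotation values (?,?,?,?,?,?,?,?)", rows)
conn.executemany(
    "insert into mesh_tree values (?,?)",
    [("UI_A", "G07.553.100"), ("UI_B", "D12.776.100")],  # invented tree numbers
)

QUERY = """
select nn1.pmid, nn1.tag_type, nn2.tag_type, mt1.tree_number, mt2.tree_number
from annotation nn1
join annotation np
  on nn1.pmid = np.pmid and nn1.section = np.section
 and nn1.sentence = np.sentence
 and nn1.start_char_pos >= np.start_char_pos
join annotation nn2
  on nn1.pmid = nn2.pmid and nn1.section = nn2.section
 and nn1.sequence_pos = nn2.sequence_pos - 1
 and nn2.sentence = np.sentence
 and nn2.end_char_pos = np.end_char_pos
join annotation mesh1
  on nn1.pmid = mesh1.pmid and nn1.section = mesh1.section
 and nn1.sentence = mesh1.sentence
 and nn1.start_char_pos = mesh1.start_char_pos
 and nn1.end_char_pos = mesh1.end_char_pos
join annotation mesh2
  on nn2.pmid = mesh2.pmid and nn2.section = mesh2.section
 and nn2.sentence = mesh2.sentence
 and nn2.start_char_pos = mesh2.start_char_pos
 and nn2.end_char_pos = mesh2.end_char_pos
join mesh_tree mt1 on mt1.descriptor_ui = mesh1.tag_type
join mesh_tree mt2 on mt2.descriptor_ui = mesh2.tag_type
where nn1.layer_id = 1 and nn2.layer_id = 1 and np.layer_id = 3
  and nn1.tag_type in (27, 30) and nn2.tag_type in (27, 30)
  and np.tag_type = 31
  and mesh1.layer_id = 6 and mesh2.layer_id = 6
  and mt1.tree_number like 'G07.553%'
  and mt2.tree_number like 'D%'
"""

result = conn.execute(QUERY).fetchall()
```

Every layer lives in the same table, so each extra layer in a pattern costs one more self-join on position columns. This is the verbosity the layered query language is meant to hide.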
  • Language Features
  • Traverse Hierarchy using operator
  • Sequence: allow/disallow words in between
  • Action to perform upon pattern matching
  • Match position of parent using and operators
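The "sequence" feature can be pictured as ordered matching with or without gaps. The following Python sketch is one possible reading, assuming that "..." in a pattern means other tokens may intervene. The function `match_sequence`, the token tuples, and the tags are hypothetical and do not reproduce the language's real semantics.

```python
def match_sequence(tokens, pattern, allow_gaps):
    """Return tuples of token indices where the pattern items match in order.

    tokens     -- list of (text, tag) pairs
    pattern    -- list of predicates, each taking one token
    allow_gaps -- if False, matched tokens must be adjacent
    """
    results = []

    def extend(start, k, picked):
        if k == len(pattern):
            results.append(tuple(picked))
            return
        if allow_gaps or k == 0:
            # The first item may start anywhere; later items may skip tokens.
            candidates = range(start, len(tokens))
        else:
            # Without gaps, only the very next token is allowed.
            candidates = range(start, min(start + 1, len(tokens)))
        for i in candidates:
            if pattern[k](tokens[i]):
                extend(i + 1, k + 1, picked + [i])

    extend(0, 0, [])
    return results

# Hypothetical tagged sentence: "Kinase strongly inhibits RAG-1"
tokens = [
    ("Kinase", "PROTEIN"),
    ("strongly", "ADV"),
    ("inhibits", "VERB"),
    ("RAG-1", "PROTEIN"),
]

def is_protein(tok):
    return tok[1] == "PROTEIN"

def is_inhibit(tok):
    return tok[0].startswith("inhibit")

pattern = [is_protein, is_inhibit, is_protein]

with_gaps = match_sequence(tokens, pattern, allow_gaps=True)
without_gaps = match_sequence(tokens, pattern, allow_gaps=False)
```

With gaps allowed, the adverb between the protein and the verb is skipped and the pattern matches. With gaps disallowed, the same sentence yields no match.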

[Figure: architecture diagram. Labels: Output. Legend: basic architecture; added for arch. 2; added for arch. 3; added for arch. 4.]
Query-complexity Variation: Each group contains queries of different patterns.
  • Group 1: <PROTEIN>...<VERB=acetylate>...<PROTEIN>
  • Group 2: <PROTEIN>...<VERB=inhibited> <PREPOSITION=by>...<PROTEIN>
  • Group 3: <NP> <VERB=interact> <PREPOSITION=with> <NP>
  • Group 5: "three-dimensional structure of" <NP> "from" <NP>
  • Group 6: <NP> <WORD=transactivation> <WORD=of> <NP> <WORD=by> <NP>
Query-constraint Variation
  • <word>: <PRT>...<WORD=inhibit>...<PRT>
  • <pos>: <PRT>...<VERB=inhibit>...<PRT>
  • "Same sentence" constraint, implicit vs. explicit:
    sen.start_char_pos <= vrb.start_char_pos
    sen.end_char_pos >= vrb.end_char_pos
  • Making the constraint explicit dramatically changes results: the optimizer discovers far better plans.
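The explicit "same sentence" constraint is a containment test on character positions. The Python sketch below illustrates only what the predicate filters out; it says nothing about the database optimizer's plan choice. The `Span` type and all offsets are invented for the example, and the comparisons are assumed to be inclusive.

```python
from itertools import combinations
from typing import NamedTuple

class Span(NamedTuple):
    tag: str
    start_char_pos: int
    end_char_pos: int

def contains(sen, ann):
    """Explicit containment: the annotation lies inside the sentence span."""
    return (sen.start_char_pos <= ann.start_char_pos
            and sen.end_char_pos >= ann.end_char_pos)

# Two hypothetical sentences; the third protein mention is in the second one.
sentences = [Span("SEN", 0, 22), Span("SEN", 23, 40)]
proteins = [Span("PRT", 0, 6), Span("PRT", 16, 21), Span("PRT", 23, 27)]
verb = Span("VERB", 7, 15)

# Without the constraint, every protein pair is a candidate.
unconstrained = list(combinations(proteins, 2))

# With it, both proteins and the verb must share one sentence.
constrained = [
    (a, b)
    for a, b in unconstrained
    if any(contains(s, a) and contains(s, verb) and contains(s, b)
           for s in sentences)
]
```

Here three candidate pairs shrink to one once both proteins and the verb must fall inside the same sentence span.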
Architecture Variation: 50 queries rewritten for architectures 2, 3, 4 and compared to arch. 1. Queries use the following patterns:
  • <NP> <VERB=inhibits> <NP>
  • <NP> <VERB-AUX.=is> <VERB-PAST=inhibited> <PREP.=by> <NP>
  • <NP> <VERB=inhibiting> <NP>
  • <NP> <NOUN=inhibition> <PREPOSITION lex=of> <NP> <PREP.=by> <NP>
  • <NP> <NOUN=inhibition> <PREPOSITION lex=by> <NP>