Title: Layers of Annotation for Natural Language Processing
Preslav Nakov, Archana Ganapathi, Ariel Schwartz
http://biotext.berkeley.edu
- Query Variability
  - <SEN "three-dimensional structure of" <NOUN PHRASE> "from" <NOUN PHRASE>>
  - <SEN <PROTEIN> <WORD=inhibit> <PROTEIN>>
  - <SEN <NOUN PHRASE> <WORD lex=activation> <WORD lex=of> <NOUN PHRASE> <WORD lex=by> <NOUN PHRASE>>
  - <SEN <NOUN PHRASE> <WORD lex=suppresses> <NOUN PHRASE>>
  - <SEN <NOUN PHRASE> <WORD lex=downregulates> <NOUN PHRASE>>
  - <SEN <PROTEIN> ... <WORD=negatively> ... <WORD=regulate> ... <PROTEIN>>
  - <SEN <NOUN PHRASE> <VERB-AUXILIARY lex=is> <VERB-PAST lex=activated> <PREPOSITION lex=by> <NOUN PHRASE>>
  - <SEN <NP <NN1,MeSH> <NN2,MeSH>>>
  - <SEN <NOUN PHRASE> ( <WORD lex=a> <NOUN PHRASE> )>
  - <SEN <PROTEIN> ... <POS=","> <WORD=which> ... <WORD=transactivate> ... <PROTEIN> <POS=",">>
Sample Query: <SEN <NP <NN1,MeSH> <NN2,MeSH> <PROTEIN> > >
ARCHITECTURE
Example: Kinase inhibits RAG-1.
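To make the idea of layered annotation concrete, here is a minimal Python sketch of the example sentence carrying several annotation layers at once. The field names `tag_type`, `start_char_pos` and `end_char_pos` are borrowed from the SQL query in these slides; the layer names, the half-open character offsets, and the choice to tag both "Kinase" and "RAG-1" as proteins are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    layer: str           # hypothetical layer name, e.g. "pos" or "gene"
    tag_type: str        # tag within that layer
    start_char_pos: int  # inclusive character offset
    end_char_pos: int    # exclusive character offset (an assumption)

TEXT = "Kinase inhibits RAG-1."

# Each layer annotates the same text independently of the others.
ANNOTATIONS = [
    Annotation("sentence", "SEN", 0, 22),
    Annotation("shallow_parse", "NP", 0, 6),
    Annotation("shallow_parse", "NP", 16, 21),
    Annotation("pos", "noun", 0, 6),
    Annotation("pos", "verb", 7, 15),
    Annotation("pos", "noun", 16, 21),
    Annotation("gene", "prt", 0, 6),
    Annotation("gene", "prt", 16, 21),
]

def spans(layer):
    """Return the text covered by every annotation in one layer."""
    return [TEXT[a.start_char_pos:a.end_char_pos]
            for a in ANNOTATIONS if a.layer == layer]

def contains(outer, inner):
    """True if `inner` lies within the character span of `outer`."""
    return (outer.start_char_pos <= inner.start_char_pos
            and inner.end_char_pos <= outer.end_char_pos)

print(spans("pos"))
print(spans("gene"))
```

Because every layer shares the same character offsets, a question such as "which proteins fall inside this noun phrase" reduces to a containment test between spans from different layers.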
SQL
Layered Query Language
Layered query:
  <document
    <sentence
      <shallow_parse tag_type=NP
        <pos tag_type=noun
          <mesh MeSH_number=G07.553>
        >
        sent=sentence nn1=pos mt1=mesh
        <pos tag_type=noun sentence=sent
          <mesh MeSH_number=D> mt2=mesh
        > nn2=pos
      >
      <gene tag_type=prt sentence=sent>
    >
  >
  print nn1.pmid, nn1.tag_type, nn2.tag_type, mt1.MeSH_number, mt2.MeSH_number
SQL query:
  select nn1.pmid, nn1.tag_type, nn2.tag_type,
         mt1.tree_number, mt2.tree_number
  from biotext_annotation_4 nn1
  join biotext_annotation_4 np
    on nn1.pmid = np.pmid
   and nn1.section = np.section
   and nn1.sentence = np.sentence
   and nn1.start_char_pos >= np.start_char_pos
  join biotext_annotation_4 nn2
    on nn1.pmid = nn2.pmid
   and nn1.section = nn2.section
   and nn1.sequence_pos = nn2.sequence_pos - 1
   and nn2.sentence = np.sentence
   and nn2.end_char_pos = np.end_char_pos
  join biotext_annotation_4 mesh1
    on nn1.pmid = mesh1.pmid
   and nn1.section = mesh1.section
   and nn1.sentence = mesh1.sentence
   and nn1.start_char_pos = mesh1.start_char_pos
   and nn1.end_char_pos = mesh1.end_char_pos
  join biotext_annotation_4 mesh2
    on nn2.pmid = mesh2.pmid
   and nn2.section = mesh2.section
   and nn2.sentence = mesh2.sentence
   and nn2.start_char_pos = mesh2.start_char_pos
   and nn2.end_char_pos = mesh2.end_char_pos
  join biotext_annotation_mesh_tree mt1
    on mt1.descriptor_ui = mesh1.tag_type
  join biotext_annotation_mesh_tree mt2
    on mt2.descriptor_ui = mesh2.tag_type
  where nn1.layer_id = 1 and nn2.layer_id = 1 and np.layer_id = 3
    and nn1.tag_type in (27,30) and nn2.tag_type in (27,30)
    and np.tag_type = 31
    and mesh1.layer_id = 6 and mesh2.layer_id = 6
    and mt1.tree_number like 'G07.553%'
    and mt2.tree_number like 'D%'
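The self-join above can be tried on a toy database. The following Python sketch uses the standard `sqlite3` module and keeps the table and column names from the slide; everything else is an assumption for illustration: the column types, the trailing `%` wildcards in the `like` patterns, the `>=` comparison on the noun phrase start, and all row values, including the made-up descriptor ids `HYP001` to `HYP003` and their tree numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table biotext_annotation_4 (
    pmid integer, section text, sentence integer, sequence_pos integer,
    layer_id integer,
    tag_type,  -- untyped: numeric tag ids or MeSH descriptor ids
    start_char_pos integer, end_char_pos integer
);
create table biotext_annotation_mesh_tree (descriptor_ui text, tree_number text);
""")

# Invented rows: layer 1 = part of speech, 3 = shallow parse, 6 = MeSH.
rows = [
    # sentence 1: a two-noun phrase whose MeSH tags satisfy the query
    (1, "a", 1, None, 3, 31, 0, 20),
    (1, "a", 1, 1, 1, 27, 0, 9),
    (1, "a", 1, 2, 1, 30, 10, 20),
    (1, "a", 1, None, 6, "HYP001", 0, 9),
    (1, "a", 1, None, 6, "HYP002", 10, 20),
    # sentence 2: same shape, but the second noun is outside the D subtree
    (1, "a", 2, None, 3, 31, 30, 50),
    (1, "a", 2, 1, 1, 27, 30, 39),
    (1, "a", 2, 2, 1, 27, 40, 50),
    (1, "a", 2, None, 6, "HYP001", 30, 39),
    (1, "a", 2, None, 6, "HYP003", 40, 50),
]
conn.executemany("insert into biotext_annotation_4 values (?,?,?,?,?,?,?,?)", rows)
conn.executemany(
    "insert into biotext_annotation_mesh_tree values (?,?)",
    [("HYP001", "G07.553.001"), ("HYP002", "D12.001"), ("HYP003", "C04.001")],
)

QUERY = """
select nn1.pmid, nn1.tag_type, nn2.tag_type, mt1.tree_number, mt2.tree_number
from biotext_annotation_4 nn1
join biotext_annotation_4 np
  on nn1.pmid = np.pmid and nn1.section = np.section
 and nn1.sentence = np.sentence
 and nn1.start_char_pos >= np.start_char_pos
join biotext_annotation_4 nn2
  on nn1.pmid = nn2.pmid and nn1.section = nn2.section
 and nn1.sequence_pos = nn2.sequence_pos - 1
 and nn2.sentence = np.sentence
 and nn2.end_char_pos = np.end_char_pos
join biotext_annotation_4 mesh1
  on nn1.pmid = mesh1.pmid and nn1.section = mesh1.section
 and nn1.sentence = mesh1.sentence
 and nn1.start_char_pos = mesh1.start_char_pos
 and nn1.end_char_pos = mesh1.end_char_pos
join biotext_annotation_4 mesh2
  on nn2.pmid = mesh2.pmid and nn2.section = mesh2.section
 and nn2.sentence = mesh2.sentence
 and nn2.start_char_pos = mesh2.start_char_pos
 and nn2.end_char_pos = mesh2.end_char_pos
join biotext_annotation_mesh_tree mt1 on mt1.descriptor_ui = mesh1.tag_type
join biotext_annotation_mesh_tree mt2 on mt2.descriptor_ui = mesh2.tag_type
where nn1.layer_id = 1 and nn2.layer_id = 1 and np.layer_id = 3
  and nn1.tag_type in (27, 30) and nn2.tag_type in (27, 30)
  and np.tag_type = 31
  and mesh1.layer_id = 6 and mesh2.layer_id = 6
  and mt1.tree_number like 'G07.553%'
  and mt2.tree_number like 'D%'
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

Even on this tiny table the query needs five copies of the annotation table, which illustrates why a more compact layered syntax is attractive.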
- Language Features
  - Hierarchy: traversed using a dedicated operator
  - Sequence: allow or disallow words in between
  - Action: what to perform when a pattern matches
  - Parent position: matched using two dedicated operators
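The "sequence" feature can be pictured with a small matcher over a tagged sentence. This is only a sketch of the idea, not the query language itself: the function name `find_matches`, the `allow_gaps` flag, and the flat list-of-tags representation are all hypothetical.

```python
def find_matches(tags, pattern, allow_gaps):
    """Return index tuples where `pattern` occurs in order within `tags`.

    With allow_gaps=False the matched elements must be adjacent;
    with allow_gaps=True any number of other tokens may intervene.
    """
    results = []

    def extend(next_pat, start_from, chosen):
        if next_pat == len(pattern):
            results.append(tuple(chosen))
            return
        if chosen and not allow_gaps:
            # Only the token right after the previous match may be used.
            candidates = [chosen[-1] + 1]
        else:
            candidates = range(start_from, len(tags))
        for i in candidates:
            if i < len(tags) and tags[i] == pattern[next_pat]:
                extend(next_pat + 1, i + 1, chosen + [i])

    extend(0, 0, [])
    return results

PATTERN = ["PROTEIN", "VERB", "PROTEIN"]
adjacent = ["PROTEIN", "VERB", "PROTEIN"]
spread = ["PROTEIN", "WORD", "VERB", "WORD", "PROTEIN"]

print(find_matches(adjacent, PATTERN, allow_gaps=False))
print(find_matches(spread, PATTERN, allow_gaps=False))
print(find_matches(spread, PATTERN, allow_gaps=True))
```

Allowing gaps corresponds to the `...` that appears between elements in several of the patterns in these slides; disallowing them requires strict adjacency.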
Output
Diagram legend: basic architecture; added for arch. 2; added for arch. 3; added for arch. 4
Query-complexity Variation
Each group contains queries of different patterns.
  Group 1: <PROTEIN> ... <VERB=acetylate> ... <PROTEIN>
  Group 2: <PROTEIN> ... <VERB=inhibited> <PREPOSITION=by> ... <PROTEIN>
  Group 3: <NP> <VERB=interact> <PREPOSITION=with> <NP>
  Group 5: "three-dimensional structure of" <NP> "from" <NP>
  Group 6: <NP> <WORD=transactivation> <WORD=of> <NP> <WORD=by> <NP>
Query-constraint Variation
  <word>: <PRT> ... <WORD=inhibit> ... <PRT>
  <pos>: <PRT> ... <VERB=inhibit> ... <PRT>
"Same sentence" constraint, implicit or explicit:
  sen.start_char_pos <= vrb.start_char_pos
  sen.end_char_pos >= vrb.end_char_pos
Dramatically changes results: the optimizer discovers far better plans.
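One plausible reading of the two forms of the "same sentence" constraint is sketched below in Python with `sqlite3`: one query ties the verb to the sentence through a shared sentence number, the other through character-span containment as in the two comparisons above. The table `ann`, its columns, and all rows are invented for illustration, and which form the slides call implicit or explicit is an assumption. The sketch only shows that both forms select the same rows; it says nothing about the plans a real optimizer would choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table ann (
        pmid integer, sentence integer, layer text,
        start_char_pos integer, end_char_pos integer
    )
""")
conn.executemany("insert into ann values (?,?,?,?,?)", [
    (1, 1, "sen", 0, 22), (1, 1, "vrb", 7, 15),
    (1, 2, "sen", 23, 60), (1, 2, "vrb", 30, 38),
])

# Constraint expressed through the shared sentence number.
BY_SENTENCE_ID = """
select sen.sentence, vrb.start_char_pos
from ann sen join ann vrb on sen.pmid = vrb.pmid
where sen.layer = 'sen' and vrb.layer = 'vrb'
  and sen.sentence = vrb.sentence
order by sen.sentence
"""

# Constraint expressed through character-span containment.
BY_CHAR_SPAN = """
select sen.sentence, vrb.start_char_pos
from ann sen join ann vrb on sen.pmid = vrb.pmid
where sen.layer = 'sen' and vrb.layer = 'vrb'
  and sen.start_char_pos <= vrb.start_char_pos
  and sen.end_char_pos >= vrb.end_char_pos
order by sen.sentence
"""

by_id = conn.execute(BY_SENTENCE_ID).fetchall()
by_span = conn.execute(BY_CHAR_SPAN).fetchall()
print(by_id, by_span)
```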
Architecture Variation
50 queries rewritten for architectures 2, 3, 4 and compared to arch. 1.
Queries use the following patterns:
  <NP> <VERB=inhibits> <NP>
  <NP> <VERB-AUX.=is> <VERB-PAST=inhibited> <PREP.=by> <NP>
  <NP> <VERB=inhibiting> <NP>
  <NP> <NOUN=inhibition> <PREPOSITION lex=of> <NP> <PREP.=by> <NP>
  <NP> <NOUN=inhibition> <PREPOSITION lex=by> <NP>