Title: Richard Tzong-Han Tsai
1HypertenGene: Extracting key hypertension genes from biomedical literature
- Richard Tzong-Han Tsai
- Po-Ting Lai
- Hong-Jie Dai
- Chi-Hsin Huang
- Yue-Yang Bow
- Yen-Ching Chang
- Wen-Harn Pan
- Wen-Lian Hsu
2Where are we from?
Institute of Information Science, Academia Sinica, Taiwan
3InCoB 2009
Institute of Information Science, Academia Sinica, Taiwan
4HypertenGene: Extracting key hypertension genes from biomedical literature with position and automatically-generated template features
- Richard Tzong-Han Tsai
- Po-Ting Lai
- Hong-Jie Dai
- Chi-Hsin Huang
- Yue-Yang Bow
- Yen-Ching Chang
- Wen-Harn Pan
- Wen-Lian Hsu
5Outline
- Motivation
- Major tasks
- Dataset
- Evaluation
- Conclusion
6What Causes Hypertension
7GAD Database
Disease View: Search for All Records, found 930
About 930 PubMed IDs for genes associated with hypertension are recorded in the GAD database (updated to 2008)
8Articles about Hypertension
Over three hundred thousand abstracts about hypertension in PubMed
9Key Hypertension Genes
- Genes which cause hypertension genetically
Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
10HG Pair in a Sentence
Sentence (S): The GNB3 may be considered a genetic marker for hypertension.
Gene (G): GNB3
Hypertension (H): hypertension
HG pair: (GNB3, hypertension)
11Outline
- Motivation
- Major tasks
- Dataset
- Evaluation
- Conclusion
12Major Task
- Gene named entity recognition (NER) and gene normalization (GN)
- Hypertension named entity recognition
- Gene-hypertension relation extraction
13Gene Named Entity Recognition
Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
15Gene Normalization
Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
Gene ID: 2784
- guanine nucleotide binding protein (G protein), beta polypeptide 3
- guanine nucleotide-binding protein, beta-3 subunit
- transducin beta chain 3
- G protein, beta-3 subunit
- GTP-binding regulatory protein beta-3 chain
- GNB3
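Gene normalization as shown on this slide can be pictured as a lookup from any known synonym to a single gene ID. The following is a minimal sketch under that assumption; the names `GENE_SYNONYMS`, `build_index`, and `normalize` are hypothetical, and the slides do not say how the actual system performs the mapping.

```python
# Synonyms and the gene ID are copied from the slide; the exact-match
# dictionary lookup is an illustrative assumption, not the authors' method.
GENE_SYNONYMS = {
    "2784": [
        "guanine nucleotide binding protein (G protein), beta polypeptide 3",
        "guanine nucleotide-binding protein, beta-3 subunit",
        "transducin beta chain 3",
        "G protein, beta-3 subunit",
        "GTP-binding regulatory protein beta-3 chain",
        "GNB3",
    ],
}


def build_index(synonyms):
    """Map each lower-cased synonym to its gene ID."""
    index = {}
    for gene_id, names in synonyms.items():
        for name in names:
            index[name.lower()] = gene_id
    return index


def normalize(mention, index):
    """Return the gene ID for a mention, or None if the mention is unknown."""
    return index.get(mention.strip().lower())


INDEX = build_index(GENE_SYNONYMS)
print(normalize("GNB3", INDEX))
```

An exact-match lookup like this misses spelling variants, so a real normalizer would need fuzzier matching and disambiguation.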
16Major Task
- Gene named entity recognition (NER) and gene normalization (GN)
- Hypertension named entity recognition
- Gene-hypertension relation extraction
17Disease NER
Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
In conclusion, REN 10631A alleles are significantly associated with EHT in the Emirati population. PMID 16138564
EHT: Essential HyperTension
18Disease NEs in Evident Sentences
OBJECTIVE: We sought to determine whether
polymorphisms in the transforming growth factor
(TGF)-beta3 gene are associated with risk of
pregnancy-induced hypertension (PIH) in
case-control mother-baby dyads. ... CONCLUSION: A
fetal TGF-beta3 polymorphism (rs11466414) is
associated with PIH in a predominantly Hispanic
population.
PMID 19628198
19List of Hypertension Acronym
Original name | Acronym
pregnancy-induced hypertension | PIH
Primary pulmonary hypertension | PPH
Family history of hypertension | FH
Pulmonary hypertension | PH
More than 30 pairs were collected by the acronym recognition component.
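One simple way to collect such pairs is to look for a parenthesized acronym and check whether the initials of the preceding words spell it. The sketch below assumes that heuristic; the function name `find_acronyms` is hypothetical, and the slides do not describe how the actual acronym recognition component works.

```python
import re


def find_acronyms(text):
    """Return {acronym: long form} for 'long form (ACR)' patterns.

    A pair is accepted only when the initials of the words just before
    the parenthesis spell the acronym. Hyphens count as word boundaries.
    """
    pairs = {}
    for match in re.finditer(r"\(([A-Z]{2,6})\)", text):
        acronym = match.group(1)
        words = list(re.finditer(r"[A-Za-z]+", text[:match.start()]))
        candidate = words[-len(acronym):]
        if len(candidate) == len(acronym) and all(
            word.group()[0].upper() == letter
            for word, letter in zip(candidate, acronym)
        ):
            pairs[acronym] = text[candidate[0].start():candidate[-1].end()]
    return pairs


sentence = ("polymorphisms in the transforming growth factor (TGF)-beta3 gene "
            "are associated with risk of pregnancy-induced hypertension (PIH)")
print(find_acronyms(sentence))
```

This initial-letter heuristic would miss a pair such as "Family history of hypertension" and FH, where the acronym skips words, so a fuller component needs looser alignment rules.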
20Major Task
- Gene named entity recognition (NER) and gene normalization (GN)
- Hypertension named entity recognition
- Gene-hypertension relation extraction
21Formulation
Binary classification: does a target HG pair have a key relation or not?
- Key Relation
- Not a Key Relation
22Outline
- Motivation
- Major tasks
- Dataset
- Evaluation
- Conclusion
23Datasets
- Our data set consists of 939 sentences from 195 abstracts selected from the GAD
- 1395 HG pairs can be extracted from these 939 sentences
HG pair type | Number of HG pairs
Positive | 349
Negative | 1046
24Training Testing
- Randomly selected 90% of the HG pairs for the training set and 10% for the test set
- Repeated 30 times
- Calculated the averages to compare performance
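The repeated random split can be sketched as follows. The function name `repeated_splits`, the fixed seed, and the placeholder pair labels are assumptions for illustration; only the 90/10 proportion, the 30 repetitions, and the 1395 pairs come from the slides.

```python
import random


def repeated_splits(pairs, train_fraction=0.9, repeats=30, seed=0):
    """Yield (train, test) lists from independent random shuffles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cut = int(len(pairs) * train_fraction)
    for _ in range(repeats):
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]


# Placeholder labels standing in for the 1395 HG pairs of the data set.
pairs = [f"HG{i}" for i in range(1, 1396)]
splits = list(repeated_splits(pairs))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each configuration would be trained and scored on every split, and the scores averaged over the 30 runs.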
25Outline
- Motivation
- Major tasks
- Dataset
- Evaluation
- Conclusion
26Scoring Method F-score
- The weighted harmonic mean of precision and recall
Dataset: HG1 to HG10
[Figure: key genes and predicted genes among the ten HG pairs]
precision = 1/5 = 0.2
recall = 1/3 = 0.33
F-score = (2 × 0.2 × 0.33) / (0.2 + 0.33) = 0.25
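The arithmetic on this slide can be checked with a short sketch. The two sets below are assumptions chosen so that one of five predictions is correct and three key pairs exist, matching the slide's precision and recall. The function name `precision_recall_f` is hypothetical.

```python
def precision_recall_f(key_pairs, predicted_pairs):
    """Return (precision, recall, F-score) for two collections of HG pairs."""
    key_pairs, predicted_pairs = set(key_pairs), set(predicted_pairs)
    correct = len(key_pairs & predicted_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(key_pairs) if key_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Assumed sets: three key pairs, five predictions, one overlap (HG6).
key = {"HG4", "HG5", "HG6"}
predicted = {"HG1", "HG6", "HG7", "HG2", "HG3"}
p, r, f = precision_recall_f(key, predicted)
print(round(p, 2), round(r, 2), round(f, 2))
```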
27AUC of the iP/R curve
n is the total number of correct HG pairs
r_j is the recall at correct HG pair j
p_i is the highest interpolated precision for correct HG pair j at r_j
Interpolated precision p_i is calculated for each recall r by taking the highest precision at r or at any r' > r.
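One form of the measure that is consistent with these definitions is given below. It is assumed here rather than taken from the slide. In it, p(r') denotes the raw precision at recall r', and r_0 = 0 is an added convention.

```latex
A_{\mathrm{iP/R}} = \sum_{j=1}^{n} p_i(r_j)\,\bigl(r_j - r_{j-1}\bigr),
\qquad r_0 = 0,
\qquad p_i(r) = \max_{r' \ge r} p(r')
```

When every correct HG pair appears in the ranked list, each step r_j - r_{j-1} equals 1/n, so the sum reduces to the mean of the interpolated precisions at the correct pairs.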
28Scoring Method AUC
Dataset: HG1 to HG10
Prediction A (ranked): 1st HG1, 2nd HG6, 3rd HG7, 4th HG2, 5th HG3
Precision 0.6, Recall 1, F-score 0.75, AUC 0.733333
Prediction B (ranked): 1st HG1, 2nd HG6, 3rd HG2, 4th HG3, 5th HG7
Precision 0.6, Recall 1, F-score 0.75, AUC 0.833333
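The two rankings have the same precision, recall, and F-score but different AUC values, because AUC rewards placing correct pairs earlier. The sketch below computes the AUC of the interpolated precision/recall curve as the mean interpolated precision over the correct pairs. It assumes the key pairs are HG1, HG2, and HG3, an assumption that reproduces both AUC values on the slide. The function name `auc_ipr` is hypothetical.

```python
def auc_ipr(ranked_predictions, key_pairs):
    """AUC of the interpolated precision/recall curve for a ranked list."""
    n = len(key_pairs)
    precisions = []  # raw precision at each rank where a key pair appears
    hits = 0
    for rank, pair in enumerate(ranked_predictions, start=1):
        if pair in key_pairs:
            hits += 1
            precisions.append(hits / rank)
    # Interpolation: use the highest precision at this or any later recall level.
    return sum(max(precisions[j:]) for j in range(len(precisions))) / n


key = {"HG1", "HG2", "HG3"}  # assumed key pairs, see the note above
ranking_a = ["HG1", "HG6", "HG7", "HG2", "HG3"]
ranking_b = ["HG1", "HG6", "HG2", "HG3", "HG7"]
print(round(auc_ipr(ranking_a, key), 6), round(auc_ipr(ranking_b, key), 6))
```

Ranking B finds its second and third key pairs at ranks 3 and 4 instead of 4 and 5, which raises the interpolated precision from 0.6 to 0.75 at those recall levels.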
29Select Features for Classification
Features
Binary Classification
30Select Features for Classification
The GNB3 may be considered a genetic marker for hypertension.
Key HG pair or not
Features
Binary Classification
31Features
- Basic Word Features
- Chunk Features
- Parse Tree Path Features
- Template Features
- Position Features
32Basic Word Features
The GNB3 may be considered a genetic marker for hypertension.
Words between: may, be, considered, a, genetic, marker, of, predisposition, for
Words between (bigram): may_be, be_considered, considered_a, a_genetic, genetic_marker, marker_of, of_predisposition, predisposition_for
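A minimal sketch of how the words-between features might be extracted is given below. The input sentence includes the words "of predisposition" that appear in the slide's feature list; that wording is an assumption, as are whitespace tokenization and the function name `words_between`.

```python
def words_between(tokens, gene, disease):
    """Return the unigrams and bigrams lying between two mentions."""
    start, end = sorted((tokens.index(gene), tokens.index(disease)))
    between = tokens[start + 1:end]
    bigrams = [f"{a}_{b}" for a, b in zip(between, between[1:])]
    return between, bigrams


# Assumed wording, chosen so the output matches the features listed on the slide.
sentence = ("The GNB3 may be considered a genetic marker "
            "of predisposition for hypertension.")
tokens = sentence.rstrip(".").split()
unigrams, bigrams = words_between(tokens, "GNB3", "hypertension")
print(unigrams)
print(bigrams)
```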
33Parse Tree Path Features
Parse tree path feature: NP_S_VP_NP_PP_NP
34Chunk Features
Word | Chunk
The | B-NP
GNB3 | I-NP
may | B-VP
be | I-VP
considered | I-VP
a | B-NP
genetic | I-NP
marker | I-NP
for | B-PP
hypertension | B-NP
Inter-HG chunk types: VP_NP_PP
Inter-HG chunk head words: consider_marker_hypertension
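The inter-HG chunk types can be read off the chunk tags by collecting the type of every chunk that begins between the gene and the hypertension mention. The sketch below assumes BIO-style tags as shown on this slide; the function name `inter_chunk_types` is hypothetical.

```python
def inter_chunk_types(tags, gene_index, disease_index):
    """Join the types of the chunks that start strictly between two tokens."""
    start, end = sorted((gene_index, disease_index))
    types = [tag[2:] for tag in tags[start + 1:end] if tag.startswith("B-")]
    return "_".join(types)


tokens = ["The", "GNB3", "may", "be", "considered", "a",
          "genetic", "marker", "for", "hypertension"]
tags = ["B-NP", "I-NP", "B-VP", "I-VP", "I-VP", "B-NP",
        "I-NP", "I-NP", "B-PP", "B-NP"]
feature = inter_chunk_types(tags, tokens.index("GNB3"), tokens.index("hypertension"))
print(feature)
```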
35Result of Baseline Features
Config | Precision | Recall | F-score | AUC | SAUC
Baseline | 0.704 | 0.536 | 0.603 | 0.493 | 0.126
Baseline = Basic word + Chunk + Parse Tree
SAUC: standard deviation of AUC
36Template Features
- Especially, a polymorphism in SLC12A was significantly associated with hypertension in women even after correction by the Bonferroni method.
- The leptin gene polymorphism was associated with hypertension independent of obesity.
- On analysis of covariance, the interaction between ND2-237 Leu/Met polymorphism and habitual drinking was significantly associated with both systolic blood pressure and diastolic blood pressure.
Template: gene associated with hypertension
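Once a template such as "gene associated with hypertension" is available, using it as a feature amounts to checking whether a sentence fits it. The sketch below tests only whether the template words occur in order between the two mentions. It does not cover how the templates are generated automatically, which the slide does not describe. The function name `matches_template` is hypothetical.

```python
def matches_template(sentence, gene, disease, template=("associated", "with")):
    """True if the template words occur in order between gene and disease."""
    gene_pos = sentence.find(gene)
    if gene_pos < 0:
        return False
    disease_pos = sentence.find(disease, gene_pos + len(gene))
    if disease_pos < 0:
        return False
    between = iter(sentence[gene_pos + len(gene):disease_pos].split())
    # Each membership test consumes the iterator, which enforces word order.
    return all(word in between for word in template)


s1 = ("Especially, a polymorphism in SLC12A was significantly associated "
      "with hypertension in women.")
s2 = "The GNB3 may be considered a genetic marker for hypertension."
print(matches_template(s1, "SLC12A", "hypertension"))
print(matches_template(s2, "GNB3", "hypertension"))
```

The boolean result could then be added to the feature vector of the HG pair alongside the baseline features.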
37Result of B+T Features
Config | P | R | F-score | AUC | SAUC | ΔAUC | t | AUC > AUCB? (t > 1.67?)
Baseline | 0.704 | 0.536 | 0.603 | 0.493 | 0.126 | N/A | N/A | N/A
B+T | 0.733 | 0.540 | 0.615 | 0.513 | 0.105 | 0.011 | 0.65 | No
B: baseline features (word features, chunk features, parse tree); T: template features; t: t-test
38Position Features
- Relative position features
  Value: 0 to 10
- Section features: divide an abstract into four sections
  Objective, Methods, Result, Conclusions
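The two position features might be computed roughly as follows. The slides give only the value range and the four section names, so the rest of this sketch is an assumption: the scaling formula, the reliance on explicit section headings, and the names `relative_position` and `section_features` are all hypothetical.

```python
HEADINGS = {
    "OBJECTIVE": "Objective",
    "METHODS": "Methods",
    "RESULTS": "Result",
    "CONCLUSION": "Conclusions",
    "CONCLUSIONS": "Conclusions",
}


def relative_position(sentence_index, sentence_count):
    """Scale a 0-based sentence index to an integer from 0 to 10."""
    if sentence_count <= 1:
        return 0
    return (10 * sentence_index) // (sentence_count - 1)


def section_features(sentences):
    """Label each sentence with the most recent section heading seen."""
    current = None  # stays None until a heading appears
    labels = []
    for sentence in sentences:
        words = sentence.split()
        first = words[0].rstrip(":").upper() if words else ""
        if first in HEADINGS:
            current = HEADINGS[first]
        labels.append(current)
    return labels


abstract = [
    "OBJECTIVE: We sought to determine whether polymorphisms are associated with PIH.",
    "Mother-baby dyads were genotyped.",
    "CONCLUSION: A fetal TGF-beta3 polymorphism is associated with PIH.",
]
print([relative_position(i, len(abstract)) for i in range(len(abstract))])
print(section_features(abstract))
```

Abstracts without explicit headings would need a separate sentence classifier to assign sections, which this sketch does not attempt.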
39Before Section Categorization
40After Section Categorization
41PubMed EX
42Result
Config | P | R | F-score | AUC | SAUC | ΔAUC | t | AUC > AUCB? (t > 1.67?)
Baseline | 0.704 | 0.536 | 0.603 | 0.493 | 0.126 | N/A | N/A | N/A
B+T | 0.733 | 0.540 | 0.615 | 0.513 | 0.105 | 0.011 | 0.65 | No
B+P | 0.825 | 0.823 | 0.820 | 0.814 | 0.087 | 0.360 | 11.44 | Yes
B+P+T | 0.815 | 0.879 | 0.841 | 0.818 | 0.084 | 0.378 | 11.75 | Yes
B: baseline features (word features, chunk features, parse tree); P: position features; T: template features
43Outline
- Motivation
- Major tasks
- Dataset
- Evaluation
- Conclusion
44Conclusions-1
- The first systematic study of extracting
hypertension-related genes.
45Conclusions-2
- The first attempt to create a hypertension-gene
relation corpus based on the GAD database.
46Conclusions-3
- Propose a supervised learning approach for
extracting key hypertension-related genes.
47Thanks for your attention