Richard Tzong-Han Tsai - PowerPoint PPT Presentation

Title: Richard Tzong-Han Tsai


1
HypertenGene: Extracting key hypertension
genes from biomedical literature
  • Richard Tzong-Han Tsai
  • Po-Ting Lai
  • Hong-Jie Dai
  • Chi-Hsin Huang
  • Yue-Yang Bow
  • Yen-Ching Chang
  • Wen-Harn Pan
  • Wen-Lian Hsu

2
Where are we from?
Institute of Information Science, Academia
Sinica, Taiwan
3
InCoB 2009
Institute of Information Science, Academia
Sinica, Taiwan
4
HypertenGene: Extracting key hypertension
genes from biomedical literature with position
and automatically-generated template features
  • Richard Tzong-Han Tsai
  • Po-Ting Lai
  • Hong-Jie Dai
  • Chi-Hsin Huang
  • Yue-Yang Bow
  • Yen-Ching Chang
  • Wen-Harn Pan
  • Wen-Lian Hsu

5
Outline
  • Motivation
  • Major tasks
  • Dataset
  • Evaluation
  • Conclusion

6
What Causes Hypertension
7
GAD Database
Disease View        Search for: All        Records found: 930
About 930 PubMed IDs for genes associated with
hypertension were recorded in the GAD database
as of its 2008 update
8
Articles about Hypertension
Over three hundred thousand abstracts about
hypertension in PubMed
9
Key Hypertension Genes
  • Genes which cause hypertension genetically

Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
10
HG Pair in a Sentence
S: The GNB3 may be considered a genetic
   marker for hypertension.
G: GNB3
H: hypertension
HG pair: (GNB3, hypertension)
11
Outline
  • Motivation
  • Major tasks
  • Dataset
  • Evaluation
  • Conclusion

12
Major Task
  1. Gene named entity recognition (NER) and gene
    normalization (GN)
  2. Hypertension named entity recognition
  3. Gene-hypertension relation extraction

13
Gene Named Entity Recognition
Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
15
Gene Normalization
Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
Gene ID: 2784
  • guanine nucleotide binding protein (G protein), beta polypeptide 3
  • guanine nucleotide-binding protein, beta-3 subunit
  • transducin beta chain 3
  • G protein, beta-3 subunit
  • GTP-binding regulatory protein beta-3 chain
  • GNB3
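Gene normalization as shown on this slide can be pictured as a lookup from a surface name to a gene identifier. The following is a minimal Python sketch under that assumption. The `SYNONYMS` table and the `normalize_gene` helper are hypothetical names, the table holds only the GNB3 names listed above (split into individual names as read from the slide), and plain dictionary lookup is a stand-in, not necessarily the matching method used in HypertenGene.

```python
# Hypothetical lexicon: the names listed on the slide for Gene ID 2784.
SYNONYMS = {
    "2784": [
        "guanine nucleotide binding protein (G protein), beta polypeptide 3",
        "guanine nucleotide-binding protein, beta-3 subunit",
        "transducin beta chain 3",
        "G protein, beta-3 subunit",
        "GTP-binding regulatory protein beta-3 chain",
        "GNB3",
    ],
}


def _key(name):
    # Case-insensitive, whitespace-normalised lookup key (an assumption).
    return " ".join(name.lower().split())


LOOKUP = {_key(n): gid for gid, names in SYNONYMS.items() for n in names}


def normalize_gene(mention):
    """Return the gene ID for a mention, or None if it is not in the lexicon."""
    return LOOKUP.get(_key(mention))


print(normalize_gene("GNB3"))
```

A mention that is absent from the lexicon simply returns `None`; a real system would need a much larger dictionary and fuzzier matching.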
16
Major Task
  1. Gene named entity recognition (NER) and gene
    normalization (GN)
  2. Hypertension named entity recognition
  3. Gene-hypertension relation extraction

17
Disease NER
Example
The GNB3 may be considered a genetic marker for hypertension. PMID 14557282
In conclusion, REN 10631A alleles are significantly associated with EHT in the Emirati population. PMID 16138564
EHT: Essential HyperTension
18
Disease NEs in Evident Sentences
OBJECTIVE: We sought to determine whether
polymorphisms in the transforming growth factor
(TGF)-beta3 gene are associated with risk of
pregnancy-induced hypertension (PIH) in
case-control mother-baby dyads. ... CONCLUSION: A
fetal TGF-beta3 polymorphism (rs11466414) is
associated with PIH in a predominantly Hispanic
population.
PMID 19628198
19
List of Hypertension Acronyms
Original Name                     Acronym
pregnancy-induced hypertension    PIH
Primary pulmonary hypertension    PPH
Family history of hypertension    FH
Pulmonary hypertension            PH
More than 30 pairs were collected by the acronym
recognition component
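The slides do not describe how the acronym recognition component works. As a rough illustration only, the Python sketch below pairs a parenthesised acronym with the words just before it when their initials spell the acronym. The `find_acronym_pairs` function and its initial-letter rule are assumptions made for this example, not the component used in HypertenGene.

```python
import re


def find_acronym_pairs(text):
    """Find 'long form (ACRONYM)' pairs whose word initials spell the acronym."""
    pairs = []
    for m in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = m.group(1)
        # Words before the parenthesis; hyphens also separate words.
        words = list(re.finditer(r"[A-Za-z0-9]+", text[:m.start()]))
        if len(words) < len(acronym):
            continue
        candidate = words[-len(acronym):]
        initials = "".join(w.group(0)[0].upper() for w in candidate)
        if initials == acronym:
            long_form = text[candidate[0].start():m.start()].strip()
            pairs.append((long_form, acronym))
    return pairs


sentence = ("are associated with risk of pregnancy-induced hypertension (PIH) "
            "in case-control mother-baby dyads.")
print(find_acronym_pairs(sentence))
```

This simple rule recovers pairs such as pregnancy-induced hypertension / PIH, but it would miss pairs like Family history of hypertension / FH, where the acronym skips words.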
20
Major Task
  1. Gene named entity recognition (NER) and gene
    normalization (GN)
  2. Hypertension named entity recognition
  3. Gene-hypertension relation extraction

21
Formulation
Binary classification: whether a target HG pair
has a key relation or not
  • Key Relation
  • Not a Key Relation
22
Outline
  • Motivation
  • Major tasks
  • Dataset
  • Evaluation
  • Conclusion

23
Datasets
  • Our data set consists of 939 sentences from 195
    abstracts selected from the GAD
  • 1395 HG pairs can be extracted from these 939
    sentences

                     Positive HG pairs   Negative HG pairs
Number of HG pairs   349                 1046
24
Training and Testing
  • Randomly selected 90% of the HG pairs for the
    training set and 10% for the test set
  • Repeated 30 times
  • Calculated the averages to compare
    performance
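The evaluation protocol can be sketched in Python as repeated random splits. This assumes the 90 and 10 on the slide are percentages of the 1395 HG pairs. The `repeated_splits` helper, the fixed seed, and the integer placeholder IDs are illustrative choices and not taken from the slides.

```python
import random


def repeated_splits(pairs, train_fraction=0.9, repeats=30, seed=0):
    """Yield (train, test) lists from repeated random splits of the HG pairs."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    n_train = int(train_fraction * len(pairs))
    for _ in range(repeats):
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


hg_pairs = list(range(1395))  # placeholder IDs standing in for the 1395 HG pairs
splits = list(repeated_splits(hg_pairs))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the 30 splits would be used to train and score a classifier, and the scores averaged across splits.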

25
Outline
  • Motivation
  • Major tasks
  • Dataset
  • Evaluation
  • Conclusion

26
Scoring Method: F-score
  • The weighted harmonic mean of precision and
    recall

Dataset: HG1 to HG10
Prediction: HG4, HG5, HG6, HG1, HG7
precision = 1/5 = 0.2
recall = 1/3 = 0.33
F-score = (2 × 0.2 × 0.33) / (0.2 + 0.33) = 0.25
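The worked example can be reproduced with a short Python sketch. The transcript does not list the key pairs, so the set {HG1, HG2, HG3} is an assumption, chosen because it gives the slide's 1/5 and 1/3. The predicted set {HG4, HG5, HG6, HG1, HG7} is likewise a reading of the flattened slide. The sketch uses the balanced F-score, which is what the slide's arithmetic computes.

```python
def precision_recall_f(key, predicted):
    """Precision, recall and balanced F-score for two sets of HG pairs."""
    correct = len(key & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(key) if key else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


key = {"HG1", "HG2", "HG3"}                      # assumed key pairs
predicted = {"HG4", "HG5", "HG6", "HG1", "HG7"}  # assumed predictions
p, r, f = precision_recall_f(key, predicted)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.2 0.33 0.25
```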
27
AUC of the iP/R curve
n is the total number of correct HG pairs
r_j is the recall at correct HG pair j
p_i is the highest interpolated precision for
the correct HG pair j at r_j
Interpolated precision p_i is calculated for
each recall r by taking the highest precision at
r or any r' > r.
28
Scoring Method: AUC
Dataset: HG1 to HG10

Prediction (ranked): 1st HG1, 2nd HG6, 3rd HG7, 4th HG2, 5th HG3
Precision 0.6, Recall 1, F-score 0.75, AUC 0.733333

Prediction (ranked): 1st HG1, 2nd HG6, 3rd HG2, 4th HG3, 5th HG7
Precision 0.6, Recall 1, F-score 0.75, AUC 0.833333
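The two rankings have the same precision, recall and F-score but different AUC, because AUC rewards placing correct pairs higher in the ranking. The Python sketch below illustrates this under two assumptions that the transcript does not state explicitly. First, the key pairs are taken to be HG1, HG2 and HG3. Second, the AUC is taken to be the mean of the interpolated precision over the n correct pairs. With these assumptions the sketch reproduces both values on the slide.

```python
def auc_ipr(ranked, key):
    """Mean interpolated precision over the recall points of the correct pairs."""
    precisions = []  # precision at the rank of each correct pair
    hits = 0
    for rank, pair in enumerate(ranked, start=1):
        if pair in key:
            hits += 1
            precisions.append(hits / rank)
    # Interpolate: highest precision at this recall point or any later one.
    interpolated = [max(precisions[j:]) for j in range(len(precisions))]
    n = len(key)  # correct pairs that are never retrieved contribute zero
    return sum(interpolated) / n if n else 0.0


key = {"HG1", "HG2", "HG3"}  # assumed key pairs
print(round(auc_ipr(["HG1", "HG6", "HG7", "HG2", "HG3"], key), 6))  # 0.733333
print(round(auc_ipr(["HG1", "HG6", "HG2", "HG3", "HG7"], key), 6))  # 0.833333
```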
29
Select Features for Classification
Features
Binary Classification
30
Select Features for Classification
The GNB3 may be considered a genetic marker for hypertension.
Key HG pair or not
Features
Binary Classification
31
Features
  • Basic Word Features
  • Chunk Features
  • Parse Tree Path Features
  • Template Features
  • Position Features

32
Basic Word Features
The GNB3 may be considered a genetic marker for hypertension.
Words between: may, be, considered, a, genetic, marker, of, predisposition, for
Words between (bigram): may_be, be_considered, considered_a, a_genetic, genetic_marker, marker_of, of_predisposition, predisposition_for
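The word features can be sketched in Python as the tokens, and adjacent token pairs, lying between the gene and the hypertension mention. The `words_between` helper and the whitespace tokenisation are assumptions for illustration. The sketch runs on the sentence as displayed on the slide, so its output does not include "of" and "predisposition", which appear in the slide's lists and presumably come from a longer version of the sentence.

```python
def words_between(tokens, gene, disease):
    """Unigram and bigram features from the tokens between the two entities."""
    i, j = sorted((tokens.index(gene), tokens.index(disease)))
    between = tokens[i + 1:j]
    bigrams = [f"{a}_{b}" for a, b in zip(between, between[1:])]
    return between, bigrams


tokens = "The GNB3 may be considered a genetic marker for hypertension".split()
unigrams, bigrams = words_between(tokens, "GNB3", "hypertension")
print(unigrams)
print(bigrams)
```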
33
Parse Tree Path Features
Parse tree path: NP_S_VP_NP_PP_NP
34
Chunk Features
Word   The   GNB3  may   be    considered  a     genetic  marker  for   hypertension
Chunk  B-NP  I-NP  B-VP  I-VP  I-VP        B-NP  I-NP     I-NP    B-PP  B-NP
The GNB3 may be considered a genetic marker for hypertension.
Inter-HG chunk types: VP_NP_PP
Inter-HG chunk head words: consider_marker_hypertension
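The inter-HG chunk type feature can be derived from the BIO chunk tags shown above. The following Python sketch is one way to do it. The `inter_hg_chunk_types` helper and the token indices for the gene and the hypertension mention are assumptions for illustration. Head-word extraction is not shown, since the slides do not say how heads are chosen.

```python
def inter_hg_chunk_types(chunk_tags, gene_idx, disease_idx):
    """Join the types of the chunks lying strictly between the two entities."""
    # Group BIO tags into [type, start, end] chunks.
    chunks = []
    for i, tag in enumerate(chunk_tags):
        if tag.startswith("B-"):
            chunks.append([tag[2:], i, i])
        elif tag.startswith("I-") and chunks:
            chunks[-1][2] = i
    lo, hi = sorted((gene_idx, disease_idx))
    # Keep chunks that start after the first entity and end before the second.
    return "_".join(t for t, start, end in chunks if start > lo and end < hi)


tags = "B-NP I-NP B-VP I-VP I-VP B-NP I-NP I-NP B-PP B-NP".split()
print(inter_hg_chunk_types(tags, gene_idx=1, disease_idx=9))  # VP_NP_PP
```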
35
Result of Baseline Features
Config    Precision  Recall  F-score  AUC    SAUC
Baseline  0.704      0.536   0.603    0.493  0.126
Baseline: basic word, chunk, and parse tree features
SAUC: standard deviation of AUC
36
Template Features
  • Especially, a polymorphism in SLC12A was
    significantly associated with hypertension in
    women even after correction by the Bonferroni
    method.
  • The leptin gene polymorphism was associated with
    hypertension independent of obesity.
  • On analysis of covariance, the interaction
    between ND2-237 Leu/Met polymorphism and
    habitual drinking was significantly associated
    with both systolic blood pressure and diastolic
    blood pressure.

Template: gene associated with hypertension
37
Result of BT Features
Config    P      R      F-score  AUC    SAUC   ΔAUC   t     AUC > AUC_B? (t > 1.67?)
Baseline  0.704  0.536  0.603    0.493  0.126  N/A    N/A   N/A
BT        0.733  0.540  0.615    0.513  0.105  0.011  0.65  No
B: Baseline features (word features, chunk
features, parse tree); T: Template features; t: t-test
38
Position Features
  • Relative position features

Value: 0 to 10
  • Section features: divide an abstract into four
    sections

Objective, Methods, Result, Conclusions
39
Before Section Categorization
40
After Section Categorization
41
PubMed EX
42
Result
Config    P      R      F-score  AUC    SAUC   ΔAUC   t      AUC > AUC_B? (t > 1.67?)
Baseline  0.704  0.536  0.603    0.493  0.126  N/A    N/A    N/A
BT        0.733  0.540  0.615    0.513  0.105  0.011  0.65   No
BP        0.825  0.823  0.820    0.814  0.087  0.360  11.44  Yes
BPT       0.815  0.879  0.841    0.818  0.084  0.378  11.75  Yes

B: Baseline features (word features, chunk
features, parse tree); P: Position features; T:
Template features
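The t column can be approximated from the AUC and SAUC columns alone. The Python sketch below assumes a two-sample t statistic with 30 runs per configuration (from the 30 repetitions of the train/test split) and treats SAUC as the standard deviation of AUC. These are assumptions about how the slide's values were obtained, and the `t_statistic` helper is hypothetical. Under them the results come out close to the slide's 0.65, 11.44 and 11.75, with small differences that may be due to rounding in the table.

```python
import math


def t_statistic(mean_a, sd_a, mean_b, sd_b, n=30):
    """Two-sample t statistic from summary statistics with n runs per config."""
    standard_error = math.sqrt((sd_a ** 2 + sd_b ** 2) / n)
    return (mean_a - mean_b) / standard_error


baseline = (0.493, 0.126)  # (AUC, SAUC) of the Baseline row
configs = {"BT": (0.513, 0.105), "BP": (0.814, 0.087), "BPT": (0.818, 0.084)}
for name, (auc, sauc) in configs.items():
    t = t_statistic(auc, sauc, *baseline)
    print(name, round(t, 2), "Yes" if t > 1.67 else "No")
```

On this reading, only the configurations that include position features (BP and BPT) exceed the 1.67 threshold.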
43
Outline
  • Motivation
  • Major tasks
  • Dataset
  • Evaluation
  • Conclusion

44
Conclusions-1
  • The first systematic study of extracting
    hypertension-related genes.

45
Conclusions-2
  • The first attempt to create a hypertension-gene
    relation corpus based on the GAD database.

46
Conclusions-3
  • Propose a supervised learning approach for
    extracting key hypertension-related genes.

47
Thanks for your attention