Associating Biomedical Terms: Case Study for Acetylation

Transcript and Presenter's Notes

Title: Associating Biomedical Terms: Case Study for Acetylation


1
Associating Biomedical Terms: Case Study for Acetylation
  • Aaron Buechlein
  • Indiana University School of Informatics
  • Advisor Dr. Predrag Radivojac

2
Overview
  • Background
  • Previous Work
  • Methods
  • Results

3
Central Dogma
Background Previous Work Methods Results
http://www.accessexcellence.org/RC/VL/GG/images/central.gif
4
Post-Translational Modifications (PTMs)
5
Acetylation
  • Acetylation involves the substitution of an
    acetyl group (-COCH3) for hydrogen
  • Typically occurs on N-terminal tails and lysine
    residues (Lys or K)

6
Previous Predictors
  • Several PTM predictors were created prior to
    this work
  • Acetylation predictors also existed prior to
    this work
  • NetAcet predicts N-terminal acetylation sites
    only
  • AutoMotif Server predicts various PTMs,
    including acetylation
  • PAIL predicts lysine acetylation

7
Methods
  • Create dataset
      • Download articles relevant to acetylation and
        extract sites
      • Rank articles in order to elucidate sites quickly
      • Extract sites from SwissProt and the Human
        Protein Reference Database (HPRD)
  • Create predictors
      • Leave-one-protein-out validation
      • Matlab
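
Leave-one-protein-out validation can be sketched as follows. The slides state that Matlab was used; this Python version is only an illustration, and the function name `leave_one_protein_out` and the toy protein labels are assumptions, not part of the original work. The idea is that every lysine of one protein is held out for testing while the model is trained on lysines from all other proteins.

```python
from collections import defaultdict

def leave_one_protein_out(protein_ids):
    """Yield (held_out_protein, train_indices, test_indices) per protein.

    protein_ids holds one protein label per lysine (one row per lysine).
    """
    by_protein = defaultdict(list)
    for index, protein in enumerate(protein_ids):
        by_protein[protein].append(index)
    for protein, test_idx in by_protein.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(protein_ids)) if i not in held_out]
        yield protein, train_idx, test_idx

# Toy labels (hypothetical): ten lysines spread over three proteins.
proteins = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
splits = list(leave_one_protein_out(proteins))
for protein, train_idx, test_idx in splits:
    print(f"hold out protein {protein}: "
          f"train on {len(train_idx)} lysines, test on {len(test_idx)}")
```

Splitting by protein rather than by lysine keeps sites from the same protein out of both the training and test sets at once.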

8
Article Retrieval
  • Searched individual journal sites for articles
    relevant to acetylation
  • Saved the resulting HTML pages for each journal
  • These pages were then used as input to a web
    crawler that downloaded the articles
  • Because journal sites are constructed differently,
    each journal required its own regular expression
    to extract article links
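
A minimal sketch of the link-extraction step, assuming saved search-result pages are available as HTML strings. The journal keys, the URL patterns, and the base URL are all hypothetical placeholders; the actual regular expressions used in the study are not given in the slides.

```python
import re
from urllib.parse import urljoin

# Hypothetical per-journal patterns; each real site would need its own.
LINK_PATTERNS = {
    "journal_a": re.compile(r'href="(/content/[^"]+\.full)"'),
    "journal_b": re.compile(r'href="(/articles/[^"]+)"'),
}

def extract_article_links(html, journal, base_url):
    """Return absolute article URLs found in a saved results page, in order."""
    pattern = LINK_PATTERNS[journal]
    links = []
    for path in pattern.findall(html):
        url = urljoin(base_url, path)
        if url not in links:  # skip repeated links to the same article
            links.append(url)
    return links

sample_html = '''
<a href="/content/1/2/3.full">Paper one</a>
<a href="/about">About</a>
<a href="/content/4/5/6.full">Paper two</a>
<a href="/content/1/2/3.full">Paper one (duplicate link)</a>
'''
links = extract_article_links(sample_html, "journal_a",
                              "https://journal-a.example")
print(links)
```

The resulting list would then be handed to whatever downloader fetches the article pages.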

9
Rank Articles
  • First, locate occurrences of the first phrase
    (phrase 1)
  • $A = \{a_1, a_2, \ldots, a_{|A|}\}$
  • Next, locate occurrences of the second phrase
    (phrase 2)
  • $R = \{r_1, r_2, \ldots, r_{|R|}\}$
  • c and d are constants
  • x is the distance in characters between r and the
    nearest a
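
The scoring formula itself does not survive in this transcript, so the sketch below uses an assumed functional form purely for illustration: each occurrence r contributes c * exp(-x / d), where x is its character distance to the nearest a. The exponential form, the default values of `c` and `d`, and the function names are assumptions, not the author's definition.

```python
import math

def nearest_distance(position, anchors):
    """Character distance from one r position to the closest a position."""
    return min(abs(position - a) for a in anchors)

def article_score(a_positions, r_positions, c=1.0, d=100.0):
    """Assumed proximity score: sum over r of c * exp(-x / d).

    The real formula is not preserved in the slides; only the roles of
    c, d, and x are taken from them.
    """
    if not a_positions:
        return 0.0
    return sum(c * math.exp(-nearest_distance(r, a_positions) / d)
               for r in r_positions)

# Hypothetical character offsets of phrase 1 (a) and phrase 2 (r).
score = article_score([10, 500], [10, 110, 5000])
print(round(score, 4))
```

Under this assumed form, an r that sits next to an a contributes close to c, and an r far from every a contributes almost nothing.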

10
An example: acetylation
1. Word "acetylat": $A = \{a_1, a_2, \ldots, a_m\}$
2. Regular expression (k | lys | lysine)(space)(digit): $R = \{r_1, r_2, \ldots, r_n\}$
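
The two searches in this example can be sketched with Python's `re` module. The pattern follows the slide's (k | lys | lysine)(space)(digit) description; case-insensitive matching, the leading word boundary, and matching the whole number with `\d+` are assumptions added for the sketch, and the sample sentence is invented.

```python
import re

# Phrase 1: the stem "acetylat" (matches acetylation, acetylated, ...).
WORD_A = re.compile(r"acetylat", re.IGNORECASE)
# Phrase 2: k, lys, or lysine, then a space, then a residue number.
SITE_R = re.compile(r"\b(?:k|lys|lysine) \d+", re.IGNORECASE)

def locate(text):
    """Return start offsets of phrase-1 matches (A) and phrase-2 matches (R)."""
    a_positions = [m.start() for m in WORD_A.finditer(text)]
    r_positions = [m.start() for m in SITE_R.finditer(text)]
    return a_positions, r_positions

text = ("Acetylation of Lys 382 and K 120 was observed; "
        "lysine 16 is deacetylated.")
a_positions, r_positions = locate(text)
matches = [m.group(0) for m in SITE_R.finditer(text)]
print(a_positions, r_positions, matches)
```

The offsets in A and R are what a distance-based score would consume.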
11
An example: acetylation
Score for article: S (the defining equations were images and are not preserved in this transcript)
12
An example: acetylation
Papers with S > 100 are rich in sites; papers with S < 30 fall in the twilight zone
13
Elucidate Sites
  • Sites were manually extracted from articles
    beginning with the highest rank
  • The original experimental paper for these sites
    was verified for traceable evidence
  • Sites were extracted from SwissProt
  • Sites were extracted from HPRD

14
Predictors
  • Support Vector Machine
  • Artificial Neural Network
  • Decision Tree

15
Predictor Input
  • Positives taken as all lysines found to be
    acetylated
  • Negatives taken as all lysines not found to be
    acetylated
  • Features created based on characteristics
    surrounding lysines
  • Amino acid content, hydrophobicity, charge,
    disorder, etc.
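
A minimal sketch of how features might be computed from the residues surrounding each lysine. The window half-width, the choice of the Kyte-Doolittle hydropathy scale, the feature names, and the toy sequence are all assumptions; the slides do not specify them. Disorder is omitted because it would require an external predictor.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POSITIVE = set("KR")
NEGATIVE = set("DE")

def lysine_windows(sequence, half_width=2):
    """Yield (1-based position, window) for every lysine in the sequence."""
    for i, residue in enumerate(sequence):
        if residue == "K":
            start = max(0, i - half_width)  # windows are clipped at the ends
            yield i + 1, sequence[start:i + half_width + 1]

def window_features(window):
    """Mean hydropathy, net charge, and glycine fraction of one window."""
    n = len(window)
    return {
        "hydrophobicity": sum(KD[aa] for aa in window) / n,
        "net_charge": sum(aa in POSITIVE for aa in window)
                      - sum(aa in NEGATIVE for aa in window),
        "glycine_fraction": window.count("G") / n,
    }

sequence = "MAGKDEKL"  # toy sequence, not a real protein
windows = list(lysine_windows(sequence))
rows = [(pos, window_features(w)) for pos, w in windows]
print(rows)
```

Each row corresponds to one lysine, which matches the one-row-per-lysine layout of the predictor input.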

16
Predictor Input
Protein  Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6  Acetylated
1        8          1          0.48609    0.001767   0.48979    0.51508    1
1        7          1          0.92146    0.03019    0.96423    0.79416    1
1        0          0          0.50622    0.015251   0.52335    0.51855    0
2        10         2          0.2008     0.038708   0.25441    0.36071    1
2        1          0          0.62016    0.009772   0.62846    0.67525    0
2        0          0          0.27783    0.028957   0.32162    0.34207    0
3        11         1          0.89239    0.018354   0.91884    0.88125    1
3        12         2          0.87354    0.022307   0.90349    0.87446    1
3        8          1          0.81549    0.025339   0.85289    0.85702    1
3        2          0          0.84588    0.024766   0.88219    0.86599    0
17
Article and Ranking Results
  • 4888 articles from 10 sites were searched
      • Nature provided 2147 articles
      • Science Direct provided 1519 articles
  • The highest-ranking article was obtained from the
    Journal of Biological Chemistry
      • Score of 151.87
      • Contained 10 acetylation sites
  • When histones are excluded, the highest-ranking
    article was obtained from Nature
      • Previously ranked 5th
      • Score of 116.36
      • Contained 9 unique acetylation sites

18
Top 25
Rank Score Sites Article Source
1) 151.8667 10 Journal of Biological Chemistry
2) 123.2314 12 Cell / Science Direct
3) 121.9031 6 Nature
4) 117.7988 9 Journal of Proteome Research
5) 116.3582 9 Nature
6) 111.1745 14 Biochemistry
7) 104.4652 6 Cell / Science Direct
8) 104.0166 7 Nature
9) 102.0683 13 Molecular Cell / Science Direct
10) 98.80812 6 Journal of Biological Chemistry
11) 97.64634 6 Biochemistry
12) 96.76536 6 Journal of Biological Chemistry
13) 96.0845 9 International Journal of Mass Spectrometry / Science Direct
14) 88.12967 9 Biochemistry
15) 86.17157 6 Journal of Biological Chemistry
16) 81.78705 5 Nucleic Acids Research
17) 81.30967 6 Biochemistry
18) 81.06128 6 Molecular Cell / Science Direct
19) 80.74899 9 Journal of Biological Chemistry
20) 80.16261 9 Nature
21) 79.65658 6 Molecular Cell / Science Direct
22) 77.9022 4 Cell / Science Direct
23) 77.88304 5 Nucleic Acids Research
24) 77.60087 8 Gene / Science Direct
25) 77.44198 6 Journal of the American Society for Mass Spectrometry
19
Ranking Results
  • Articles with scores greater than 30 had
    potential for providing at least one site
  • As scores approached 30, articles became less
    fruitful

20
Dataset Results
  • Dataset included 1442 total sites and 1085
    non-redundant sites
  • HPRD contributed 90 sites
  • Swiss-Prot contributed 825 sites
  • Our study contributed 527 sites
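
The three source counts sum to the stated total (90 + 825 + 527 = 1442). How redundancy was removed is not described in the slides; the sketch below assumes, for illustration only, that a site is identified by a (protein identifier, residue position) pair and that duplicates across sources are collapsed. The identifiers and positions are invented.

```python
def merge_sites(*sources):
    """Return (total site count, non-redundant site count) across sources.

    Each source is a list of (protein_id, position) pairs; treating that
    pair as the identity of a site is an assumption of this sketch.
    """
    total = sum(len(source) for source in sources)
    unique = set().union(*sources)
    return total, len(unique)

# Hypothetical toy data: P1/K9 appears in two sources.
hprd = [("P1", 9), ("P2", 14)]
swissprot = [("P1", 9), ("P3", 120)]
literature = [("P3", 382), ("P4", 16)]

total, non_redundant = merge_sites(hprd, swissprot, literature)
print(total, non_redundant)
```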

21
Dataset Results
22
Sensitivity, Specificity, and Precision
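  • Sensitivity (sn): $sn = \frac{TP}{TP + FN}$
  • Specificity (sp): $sp = \frac{TN}{TN + FP}$
  • Precision (pr): $pr = \frac{TP}{TP + FP}$
  • TP, FP, TN, and FN are the counts of true
    positives, false positives, true negatives, and
    false negatives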
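
These three measures can be computed from a confusion matrix using their standard definitions. In the sketch below, the true labels reuse the Acetylated column from the Predictor Input table, while the predicted labels are invented for illustration; the function names are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def sn_sp_pr(y_true, y_pred):
    """Sensitivity, specificity, and precision (standard definitions)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

y_true = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]  # Acetylated column of the table
y_pred = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0]  # hypothetical predictions
counts = confusion_counts(y_true, y_pred)
sn, sp, pr = sn_sp_pr(y_true, y_pred)
print(counts, sn, sp, pr)
```

The sketch does not guard against empty classes; a real evaluation would need to handle a zero denominator.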

23
Accuracy and AUC
  • Accuracy (acc): $acc = \frac{sn + sp}{2}$
  • Area Under Curve (AUC)
  • Refers to the area under the Receiver Operating
    Characteristic (ROC) curve
  • The ROC curve is the graphical plot of sensitivity
    vs. 1 - specificity
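
A short sketch of both quantities. The acc values reported in the result tables equal the mean of sn and sp (for example, 52.3 and 71.0 give 61.65 against a reported 61.6), so accuracy is computed that way here. AUC is computed through its rank interpretation, the probability that a randomly chosen positive scores higher than a randomly chosen negative, which equals the area under the ROC curve; the scores are invented.

```python
def balanced_accuracy(sn, sp):
    """Mean of sensitivity and specificity."""
    return (sn + sp) / 2

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

acc = balanced_accuracy(52.3, 71.0)
area = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # hypothetical scores
print(acc, area)
```

The pairwise form is quadratic in the number of sites, which is fine for a sketch but slow for large datasets.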

24
SVM Predictor
Polynomial kernel
Degree       sn    sp    pr    acc   AUC
p = 1        52.3  71.0  24.6  61.6  65.2
p = 2        46.1  69.8  20.3  57.9  62.8
p = 3        31.6  80.8  23.5  56.2  60.3
Gaussian kernel
Width        sn    sp    pr    acc   AUC
σ = 10^-2    43.8  75.8  24.9  59.8  64.3
σ = 10^-3    54.1  72.1  25.9  63.1  68.1
σ = 10^-6    52.8  70.7  24.6  61.8  65.3
25
Artificial Neural Network
Hidden neurons  sn    sp    pr    acc   AUC
1               68.0  47.7  20.7  57.8  61.9
3               65.2  47.7  19.4  56.4  58.9
5               65.0  47.2  19.1  56.1  57.5
26
Decision Tree
Algorithm      sn    sp    pr    acc   AUC
Decision Tree  61.7  45.9  18.3  53.8  42.1
27
Algorithm Comparison
Algorithm sn sp pr acc AUC
SVM 54.1 72.1 25.9 63.1 68.1
Neural Network 68.0 47.7 20.7 57.8 61.9
Decision Tree 61.7 45.9 18.3 53.8 42.1
28
  • I would like to acknowledge those who have
    helped me throughout the duration of this
    project: Dr. Predrag Radivojac,
    Dr. Haixu Tang, and Wyatt Clark

29
I welcome your questions and/or comments