CZ5226: Advanced Bioinformatics Lecture 7: Statistical Learning Methods, Prof. Chen Yu Zong

1
CZ5226 Advanced Bioinformatics Lecture 7
Statistical Learning Methods
Prof. Chen Yu Zong
Tel: 6874-6877
Email: csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1,
National University of Singapore
2
Classification of Drugs or Proteins by SVM
  • A drug or a protein is classified as either
    belonging (+) or not belonging (-) to a class.
  • Examples of drug classes: inhibitor of a
    protein, BBB-penetrating, genotoxic.
  • Examples of protein classes: enzyme EC3.4
    family, DNA-binding.
  • By screening against all classes, the property of
    a drug or the function of a protein can be
    identified.

[Figure: a drug or protein is screened by the Class-1, Class-2 and Class-3 SVMs in turn; the outcome shown is "Drug or Protein belongs to Family-3".]
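The screening idea above is a one-versus-rest scheme: one binary SVM per class, each of which answers whether an object belongs (+) or does not belong (-). The Python sketch below is minimal. It assumes each trained SVM can be reduced to a linear decision function f(x) = w·x + b. The class names, weights and biases are hypothetical and do not come from the lecture.

```python
from typing import Callable, Dict, List, Sequence

# A trained binary SVM reduced to its decision function: score > 0 means
# "belongs to the class" (+), score <= 0 means "does not belong" (-).
Classifier = Callable[[Sequence[float]], float]

def linear_svm(weights: Sequence[float], bias: float) -> Classifier:
    """Decision function f(x) = w . x + b for one class (hypothetical parameters)."""
    def decide(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return decide

def screen(x: Sequence[float], classifiers: Dict[str, Classifier]) -> List[str]:
    """Run x through every class-specific SVM; return the classes that answer (+)."""
    return [name for name, f in classifiers.items() if f(x) > 0]

# Hypothetical models for three classes; the numbers are made up for illustration.
models = {
    "Family-1": linear_svm([1.0, -1.0, 0.0], -0.5),
    "Family-2": linear_svm([-1.0, 0.0, 1.0], -1.5),
    "Family-3": linear_svm([0.0, 1.0, 1.0], -1.5),
}

print(screen([0.0, 1.0, 1.0], models))  # prints ['Family-3']
```

Because every class has its own independent classifier, an object can be assigned to several classes or to none.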
3
Classification of Drugs or Proteins by SVM
  • What is SVM?
  • Support vector machines (SVM): a statistical
    machine learning method that learns by example
    and classifies objects into one of two classes.
  • Advantages of SVM
  • Tolerates diversity of class members (members of
    a class need not resemble one another).
  • Uses structure-derived physico-chemical
    features as the basis for drug or protein
    classification (no structure similarity or
    sequence similarity is required by the algorithm).
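The slide does not write out how a trained SVM assigns an object to one of the two classes. For reference, the standard textbook decision function (as in the Burges tutorial cited in this lecture) for a feature vector x is the following, with training vectors x_i, labels y_i and kernel K:

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad y_i \in \{+1, -1\}, \quad \alpha_i \ge 0
```

Only training vectors with α_i > 0 (the support vectors) contribute. For a linear SVM, K(x_i, x) = x_i · x.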

4
SVM References
  • C. Burges, "A tutorial on support vector machines
    for pattern recognition", Data Mining and
    Knowledge Discovery, Kluwer Academic
    Publishers, 1998 (online).
  • R. Duda, P. Hart, and D. Stork, Pattern
    Classification, John Wiley, 2nd edition, 2001
    (section 5.11, hard copy).
  • S. Gong et al., Dynamic Vision: From Images to
    Face Recognition, Imperial College Press, 2001
    (sections 3.6.2 and 3.7.2, hard copy).
  • Online lecture notes
    (http://www.cs.unr.edu/bebis/MathMethods/SVM/lecture.pdf)
  • Publications on SVM drug prediction
  • J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
  • J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
  • Toxicol. Sci. 79, 170 (2004)

5
SVM References
  • Publications on SVM protein function prediction
  • Bioinformatics 2002 18, 147
  • Nucleic Acids Res 2003 31, 3692
  • Proteins 2004 55, 66
  • RNA 2004 10, 355
  • J Biol Chem 2004 279, 23262
  • Nucleic Acids Res 2004 32(21), 6437-6444
  • Virology 2005 331(1), 136-143
  • Publications on SVM peptide-binder prediction
  • BMC Bioinformatics. 2002 Sep 11;3(1):25
  • Bioinformatics. 2003 Oct 12;19(15):1978-84
  • Protein Sci. 2004 Mar;13(3):596-607
  • Genome Inform Ser Workshop Genome Inform.
    2004;15(1):198-212

6
Other MHC-Peptide Prediction References
  • J Comput Biol. 2004;11(4):683-94
  • Methods. 2004 Dec;34(4):454-9
  • Methods. 2004 Dec;34(4):444-53
  • Methods. 2004 Dec;34(4):436-43
  • Org Biomol Chem. 2004 Nov 21;2(22):3274-83
  • Immunogenetics. 2004 Sep;56(6):405-19
  • J Immunol. 2004 Jun 15;172(12):7495-502
  • J Immunol. 2004 Jun 1;172(11):6783-9
  • Appl Bioinformatics. 2003;2(1):63-6
  • Appl Bioinformatics. 2003;2(3):155-8
  • Bioinformatics. 2004 Jun 12;20(9):1388-97
  • Proteins. 2004 Feb 15;54(3):534-56
  • Novartis Found Symp. 2003;254:102-20; discussion
    120-5, 216-22, 250-2
  • Hum Immunol. 2003 Dec;64(12):1123-43
  • J Mol Graph Model. 2004 Jan;22(3):195-207
  • Neural Comput. 2003 Dec;15(12):2931-42
  • Tissue Antigens. 2003 Nov;62(5):378-84

7
Other MHC-Peptide Prediction References
  • Bioinformatics. 2003 Sep 22;19(14):1765-72
  • Hybrid Hybridomics. 2003 Aug;22(4):229-34
  • Nucleic Acids Res. 2003 Jul 1;31(13):3621-4
  • Bioinformatics. 2003 May 22;19(8):1009-14
  • Methods. 2003 Mar;29(3):236-47
  • J Proteome Res. 2002 May-Jun;1(3):263-72
  • J Mol Biol. 2003 Feb 28;326(4):1157-74
  • BMC Bioinformatics. 2002 Sep 11;3(1):25
  • Hum Immunol. 2002 Sep;63(9):701-9
  • J Comput Biol. 2002;9(3):527-39
  • Mol Med. 2002 Mar;8(3):137-48
  • Immunol Cell Biol. 2002 Jun;80(3):280-5
  • Immunol Cell Biol. 2002 Jun;80(3):270-9
  • BMC Struct Biol. 2002 May 13;2(1):2
  • Biologicals. 2001 Sep-Dec;29(3-4):179-81
  • Bioinformatics. 2001 Dec;17(12):1236-7
  • Bioinformatics. 2001 Oct;17(10):942-8
  • J Med Chem. 2001 Oct 25;44(22):3572-81
  • J Comput Aided Mol Des. 2001 Jun;15(6):573-86

8
Machine Learning Method

Inductive learning (example-based learning)
9
Machine Learning Method


A(1, 1, 1), B(0, 1, 1), C(1, 1, 1), D(0, 1, 1), E(0, 0, 0), F(1, 0, 1)
10
SVM Method

Feature vectors in input space
Feature vector
A(1, 1, 1), B(0, 1, 1), C(1, 1, 1), D(0, 1, 1), E(0, 0, 0), F(1, 0, 1)

11
SVM Method
12
SVM Method
13
SVM Method
14
Best Linear Separator?
15
Best Linear Separator?
16
Find Closest Points in Convex Hulls
[Figure: closest points c and d of the two classes' convex hulls]
17
Plane Bisect Closest Points
[Figure: a plane bisecting the segment between the closest points c and d]
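Once the closest points c and d are known, the separating plane is the perpendicular bisector of the segment joining them. Its normal is w = c - d, and it passes through the midpoint (c + d)/2. The Python sketch below uses hypothetical values for c and d.

```python
from typing import List, Sequence, Tuple

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def bisecting_plane(c: Sequence[float], d: Sequence[float]) -> Tuple[List[float], float]:
    """Plane w . x + b = 0 that perpendicularly bisects the segment from d to c.

    The normal w = c - d points from d towards c, so points on c's side score > 0.
    """
    w = [ci - di for ci, di in zip(c, d)]
    midpoint = [(ci + di) / 2 for ci, di in zip(c, d)]
    b = -dot(w, midpoint)  # forces the midpoint to lie exactly on the plane
    return w, b

def side(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """+1 if x lies on c's side of the plane, -1 otherwise."""
    return 1 if dot(w, x) + b > 0 else -1

# Hypothetical closest points of the two convex hulls.
c, d = [1.0, 2.0], [1.0, 0.0]
w, b = bisecting_plane(c, d)
print(w, b)  # prints [0.0, 2.0] -2.0
```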
18
Find the closest points using a quadratic program
Many existing and new solvers are available.
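The slide leaves the choice of solver open. The sketch below uses the Frank-Wolfe (conditional gradient) method as one simple stand-in; it is not necessarily what the lecture or any cited tool uses. It minimizes ||c - d||² over convex combinations c of class (+) points and d of class (-) points. Each step moves towards a single vertex of each hull, and an exact line search chooses the step size. The point sets are hypothetical. The method is adequate for small illustrative inputs; dedicated SVM solvers such as SMO-based ones scale far better.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def _sub(u: Sequence[float], v: Sequence[float]) -> Vector:
    return [a - b for a, b in zip(u, v)]

def closest_hull_points(A: Sequence[Vector], B: Sequence[Vector],
                        iters: int = 1000, tol: float = 1e-12) -> Tuple[Vector, Vector]:
    """Approximate closest points c in conv(A) and d in conv(B) by Frank-Wolfe.

    Minimizes ||c - d||^2 with c, d constrained to convex combinations of A and B.
    """
    c, d = list(A[0]), list(B[0])          # start from one vertex of each hull
    for _ in range(iters):
        u = _sub(c, d)                     # gradient of ||c - d||^2, up to a factor 2
        # Linear minimization over each simplex selects a single vertex.
        s = min(A, key=lambda a: _dot(u, a))
        t = max(B, key=lambda p: _dot(u, p))
        v = _sub(_sub(s, c), _sub(t, d))   # change in (c - d) along the step
        vv = _dot(v, v)
        if vv < tol:
            break
        step = min(1.0, max(0.0, -_dot(u, v) / vv))  # exact line search on [0, 1]
        if step == 0.0:                    # zero duality gap: already optimal
            break
        c = [ci + step * (si - ci) for ci, si in zip(c, s)]
        d = [di + step * (ti - di) for di, ti in zip(d, t)]
    return c, d

A = [[0.0, 0.0], [2.0, 0.0]]   # hypothetical class (+) points
B = [[1.0, 2.0], [1.0, 3.0]]   # hypothetical class (-) points
c, d = closest_hull_points(A, B)
print(c, d)  # prints [1.0, 0.0] [1.0, 2.0]
```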
19
Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel
supporting planes
Distance = margin
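Write the separating plane as w·x + b = 0 and the two supporting planes as w·x + b = +1 and w·x + b = -1. This is the usual normalization, as in the Burges tutorial cited in this lecture. The distance between the supporting planes is then 2/||w||, so maximizing it is equivalent to the standard quadratic program below. It is shown for linearly separable data with labels y_i ∈ {+1, -1}.

```latex
\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert},
\qquad
\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, N
```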
20
Best Linear Separator?
21
SVM Method
The boundary between the classes is nonlinear
22
SVM Method
23
SVM Method
Non-linear transformation
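The effect of a non-linear transformation can be checked numerically. Map a 2-D point x = (x1, x2) to φ(x) = (x1², √2·x1·x2, x2²). The dot product in the new space then equals the polynomial kernel (x · z)², so an SVM can work in the transformed space without building φ explicitly. The slides do not name the kernel used in the lecture's applications; this degree-2 kernel and the sample points are illustrative assumptions.

```python
import math
from typing import List, Sequence

def quadratic_map(x: Sequence[float]) -> List[float]:
    """Explicit non-linear transformation of a 2-D point into a 3-D feature space."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

# A circular border x1^2 + x2^2 = r^2 in the input space becomes the plane
# phi_1 + phi_3 = r^2 in the mapped space, so a linear separator suffices there.
x, z = [1.0, 2.0], [3.0, -1.0]
explicit = sum(a * b for a, b in zip(quadratic_map(x), quadratic_map(z)))
print(explicit, poly2_kernel(x, z))  # equal up to floating-point rounding
```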
24
SVM Method
25
SVM Method
26
SVM Method
27
SVM Method
28
SVM for Classification of Drugs
  • How to represent a drug?
  • Each structure is represented by a feature
    vector assembled from structural and
    physico-chemical properties:
  • Simple molecular properties (molecular weight,
    number of rotatable bonds, etc.; 18 in total)
  • Molecular connectivity and shape (28 in total)
  • Electro-topological state polarity (84 in total)
  • Quantum chemical properties (electric charge,
    polarizability, etc.; 13 in total)
  • Geometrical properties (molecular size vector,
    van der Waals volume, molecular surface, etc.;
    16 in total)
  • J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
  • J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
  • Toxicol. Sci. 79, 170 (2004)
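The five descriptor groups listed above add up to 18 + 28 + 84 + 13 + 16 = 159 descriptors per compound. The Python sketch below assembles them into one fixed-order feature vector and rescales each feature to [0, 1] across a training set. The group keys are hypothetical names. Min-max scaling is an assumed, commonly used preprocessing step; the slide does not state it.

```python
from typing import Dict, List, Sequence

# Descriptor group sizes as listed on the slide (159 descriptors in total).
GROUP_SIZES: Dict[str, int] = {
    "simple_molecular": 18,
    "connectivity_and_shape": 28,
    "electrotopological_state": 84,
    "quantum_chemical": 13,
    "geometrical": 16,
}

def assemble(descriptors: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate descriptor groups in a fixed order, checking each group's length."""
    vector: List[float] = []
    for group, size in GROUP_SIZES.items():
        values = descriptors[group]
        if len(values) != size:
            raise ValueError(f"{group}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector

def min_max_scale(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every feature to [0, 1] across the set; constant features become 0."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]

# Placeholder descriptor values for one hypothetical compound.
drug = {group: [1.0] * size for group, size in GROUP_SIZES.items()}
print(len(assemble(drug)))  # prints 159
```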

29
SVM-based drug design and property prediction software
Useful for inhibitor/activator/substrate
prediction, drug safety and pharmacokinetic
prediction.
[Figure: workflow "Which class does your drug belong to?". The chemical structure of your drug is input either on a local machine (computer loaded with SVMProt) or through the internet (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi), which sends the structure to the classifier. A support vector machines classifier for every drug class returns the identified classes, so the drug can be designed or its properties predicted.]
J. Chem. Inf. Comput. Sci. 44, 1630 (2004);
J. Chem. Inf. Comput. Sci. 44, 1497 (2004);
Toxicol. Sci. 79, 170 (2004).
30
SVM Drug Prediction Results
  • Protein inhibitor/activator/substrate prediction
  • 86% of the 129 estrogen receptor activators and
    84% of 101 non-activators correctly predicted.
  • 81% of 116 P-glycoprotein substrates and 79% of
    85 non-substrates correctly predicted.
  • Drug toxicity prediction
  • 97% of 102 TdP+ and 84% of 243 TdP- agents
    correctly predicted.
  • 73% of 229 genotoxic and 93% of 631 non-genotoxic
    agents correctly predicted.
  • Pharmacokinetics prediction
  • 95% of 276 BBB+ and 82% of 139 BBB- agents
    correctly predicted.
  • 90% of 131 human intestinal absorption and 80% of
    65 non-absorption agents correctly predicted.
  • J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
  • J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
  • Toxicol. Sci. 79, 170 (2004)
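For each drug class, the results slide gives the rate of correctly predicted positives (sensitivity) and negatives (specificity), together with the size of each set. The Python sketch below shows how these measures are computed and how per-class rates combine into an overall accuracy. The Matthews correlation coefficient is included as an extra, commonly reported measure; the slide does not quote it. Because the reported rates are rounded, any counts recomputed from them are approximate.

```python
import math
from typing import Tuple

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN) (positives found); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def overall_accuracy(sensitivity: float, n_positive: int,
                     specificity: float, n_negative: int) -> float:
    """Overall accuracy recovered from per-class rates and class sizes."""
    correct = sensitivity * n_positive + specificity * n_negative
    return correct / (n_positive + n_negative)

def matthews_cc(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if any marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Estrogen receptor example: 86% of 129 activators, 84% of 101 non-activators.
acc = overall_accuracy(0.86, 129, 0.84, 101)
print(round(acc, 3))
```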

31
SVM for Classification of Proteins
  • How to represent a protein?
  • Each sequence is represented by a feature vector
    assembled from encoded representations of
    tabulated residue properties:
  • Amino acid composition
  • Hydrophobicity
  • Normalized van der Waals volume
  • Polarity
  • Polarizability
  • Charge
  • Surface tension
  • Secondary structure
  • Solvent accessibility
  • Three descriptors, composition (C), transition
    (T), and distribution (D), are used to describe
    the global composition of each of these properties.
  • Nucleic Acids Res. 2003 31 3692-3697
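The C, T and D descriptors can be computed once each residue is replaced by the class it falls into for a given property. The Python sketch below makes two assumptions. It uses a three-class hydrophobicity split (polar / neutral / hydrophobic) that follows the commonly used Dubchak-style grouping, since the slides do not list SVMProt's exact class tables. It also implements D as one common variant: the relative positions of the first, 25%, 50%, 75% and last residue of each class. This yields 3 + 3 + 15 = 21 values per property.

```python
import math
from typing import Dict, List, Set

# Assumed 3-class hydrophobicity grouping (Dubchak-style), not taken from the slides.
HYDROPHOBICITY: Dict[int, Set[str]] = {
    1: set("RKEDQN"),   # polar
    2: set("GASTPHY"),  # neutral
    3: set("CLVIMFW"),  # hydrophobic
}

def encode(sequence: str, groups: Dict[int, Set[str]]) -> List[int]:
    """Replace every residue by the label (1, 2 or 3) of the class it belongs to."""
    labels = []
    for residue in sequence.upper():
        for label, members in groups.items():
            if residue in members:
                labels.append(label)
                break
        else:
            raise ValueError(f"residue {residue!r} is not in any class")
    return labels

def composition(labels: List[int]) -> List[float]:
    """C: fraction of residues in each class."""
    n = len(labels)
    return [labels.count(k) / n for k in (1, 2, 3)]

def transition(labels: List[int]) -> List[float]:
    """T: frequency of neighbouring residues switching between classes 1-2, 1-3, 2-3."""
    pairs = list(zip(labels, labels[1:]))
    n = len(pairs)
    if n == 0:  # a single residue has no neighbours
        return [0.0, 0.0, 0.0]
    def freq(a: int, b: int) -> float:
        return sum(1 for p in pairs if p in ((a, b), (b, a))) / n
    return [freq(1, 2), freq(1, 3), freq(2, 3)]

def distribution(labels: List[int]) -> List[float]:
    """D: relative position of the first, 25%, 50%, 75% and last residue of each class."""
    n = len(labels)
    result: List[float] = []
    for k in (1, 2, 3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == k]
        if not positions:
            result.extend([0.0] * 5)
            continue
        m = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            rank = max(1, math.ceil(frac * m))  # 1-based rank within the class
            result.append(positions[rank - 1] / n)
    return result

def ctd(sequence: str, groups: Dict[int, Set[str]]) -> List[float]:
    """21 descriptors for one property: 3 (C) + 3 (T) + 15 (D)."""
    labels = encode(sequence, groups)
    return composition(labels) + transition(labels) + distribution(labels)

print(len(ctd("RKAL", HYDROPHOBICITY)))  # prints 21
```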

32
SVM for Classification of Proteins
  • How to represent a protein?
  • From protein sequence to feature vector:
    (C_amino acid composition, T_amino acid
    composition, D_amino acid composition,
    C_hydrophobicity, T_hydrophobicity,
    D_hydrophobicity, ...)
Nucleic Acids Res. 2003 31 3692-3697
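The feature vector is built by concatenating named descriptor blocks in a fixed order. The Python sketch below computes the simplest block, the 20-value amino acid composition, and joins blocks by name. The block names and the alphabetical residue order are hypothetical choices. The exact layout SVMProt uses is not given on the slide.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical one-letter codes

def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each standard amino acid; non-standard letters are not counted."""
    seq = sequence.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

def feature_vector(blocks: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Concatenate named descriptor blocks (hypothetical names) in a fixed order."""
    vector: List[float] = []
    for name in order:
        vector.extend(blocks[name])
    return vector

aac = amino_acid_composition("ACDA")
blocks = {"C_amino_acid_composition": aac, "C_hydrophobicity": [0.5, 0.25, 0.25]}
vec = feature_vector(blocks, ["C_amino_acid_composition", "C_hydrophobicity"])
print(len(vec))  # prints 23
```

Fixing the order matters: every protein must place the same descriptor at the same index, or the SVM's learned weights would apply to the wrong features.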
33
SVM for Classification of Proteins
  • How to represent a protein?

34
Protein function prediction software SVMProt
Useful for functional prediction of novel
proteins, distantly related proteins, and homologous
proteins of different functions.
35
Protein function prediction software SVMProt
Useful for functional prediction of novel
proteins, distantly related proteins, and homologous
proteins of different functions. Protein
families covered: 46 enzyme families, 3 receptor
families, 4 transporter and channel families, 6
DNA- and RNA-binding families, 8 structural
families, 2 regulator/factor families. SVMProt
web version: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi
Nucl. Acids Res. 31, 3692-3697 (2003)
36
Protein function prediction software SVMProt
[Figure: probability of correct prediction plotted against the prediction score]
Nucl. Acids Res. 31, 3692-3697 (2003)
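This slide relates an SVM prediction score to the probability that the prediction is correct. The lecture does not say how SVMProt builds that mapping. One common technique, swapped in here for illustration, is Platt scaling: it fits a sigmoid P = 1/(1 + exp(A·f + B)) to scores f on held-out data. In the Python sketch below, A and B are hypothetical fitted parameters.

```python
import math

def score_to_probability(score: float, a: float, b: float) -> float:
    """Platt-style sigmoid P = 1 / (1 + exp(a * score + b)).

    With a < 0 the probability rises with the SVM score. The parameters a and b
    would be fitted on held-out data; the values used below are placeholders.
    """
    z = a * score + b
    # Two algebraically equal forms, chosen so exp() never overflows.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

probs = [score_to_probability(s, a=-2.0, b=0.0) for s in (-1.0, 0.0, 1.0)]
print([round(p, 3) for p in probs])
```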
37
SVMProt Protein Functional Family Prediction Results
  • Overall prediction accuracies
  • 87% of the 34,582 proteins correctly assigned to
    their respective functional families.
  • 97% of the 310,000 non-member proteins correctly
    predicted.
  • Novel enzymes
  • 67% of the 12 non-homologous enzymes (having no
    homologous proteins by PSI-BLAST search of NR
    databases) are correctly assigned.
  • 83% of the 29 non-homologous enzymes (having no
    homologous proteins by PSI-BLAST search of the
    SwissProt database) are correctly assigned.
  • 70% of the 20 pairs of homologous enzymes of
    different functions are correctly assigned.
  • NR databases include all non-redundant GenBank
    CDS translations, PDB, SwissProt, PIR, and PRF
    databases.
  • 92% of 12,900 enzymes correctly assigned by BLAST
    in 1997
  • Nucleic Acids Res 2003 31, 3692