Capstone Project Presentation - PowerPoint PPT Presentation

About This Presentation
Title:

Capstone Project Presentation

Description:

SIFT ... SIFT SA B-factor. Friday 17th December 2004. Stuart Young ... SIFT. S-BLEST (vector contains four sub-shells spreading outward from site) ...

Transcript and Presenter's Notes

Title: Capstone Project Presentation


1
Capstone Project Presentation
  • Predicting Deleterious Mutations
  • Young SP, Radivojac P, Mooney SD

2
Predicting Deleterious Mutations
  • Deleterious
  • Hurtful or injurious to life or health; noxious
  • (Oxford English Dictionary)
  • 'Tis pity wine should be so deleterious, For tea
    and coffee leave us much more serious.
  • (Byron, Don Juan IV, 1821)

3
Predicting Deleterious Mutations
  • SNPs
  • What is an SNP (single nucleotide
    polymorphism)?
  • Why are SNPs important?
  • Some SNPs are nonsynonymous
  • The molecular effects of SNPs vary widely

4
Predicting Deleterious Mutations
  • MOTIVATION
  • Improve on the existing methods for predicting
    deleterious mutations
  • Use protein sequence, evolution and structure
    data combined with machine learning to identify
    potentially disease-causing SNPs

5
Predicting Deleterious Mutations
  • SNP data is increasingly available
  • Over 40 major online databases
  • dbSNP is the primary SNP database (contains
    5,000,000 validated human SNPs)
  • Many databases contain potentially
    disease-causing SNPs related to a particular
    disease

6
Predicting Deleterious Mutations
  • Deleterious effects of mutations on proteins
  • Function
  • Stability
  • Expression
  • Protein-Protein Interactions

7
Current Classification Tools
  • Sequence Approaches
  • BLOSUM62
  • An amino acid substitution score matrix
  • SIFT
  • Collects sequence homologues in multiple
    alignments and identifies non-conservative
    changes in amino acids
  • Ng P, Henikoff S, 'Predicting Deleterious Amino
    Acid Substitutions'. Genome Research, 2001,
    11:863-874.
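
As a rough illustration of how a substitution score matrix is used, the sketch below looks up a score for a wild-type/mutant residue pair. It hard-codes only a handful of BLOSUM62 entries; the names `BLOSUM62_SUBSET`, `blosum62_score` and `is_conservative`, and the "positive score means conservative" rule, are assumptions made for this example and are not taken from the presentation.

```python
# A few BLOSUM62 entries only; the full matrix covers all 20 x 20 residue pairs.
BLOSUM62_SUBSET = {
    ("I", "L"): 2,
    ("D", "E"): 2,
    ("K", "R"): 2,
    ("D", "L"): -4,
    ("D", "W"): -4,
    ("W", "W"): 11,
}


def blosum62_score(wild_type, mutant):
    """Return the substitution score for a residue pair (matrix is symmetric)."""
    key = (wild_type, mutant)
    if key in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[key]
    # Fall back to the mirrored pair; raises KeyError if the pair is not in the subset.
    return BLOSUM62_SUBSET[(mutant, wild_type)]


def is_conservative(wild_type, mutant):
    """Hypothetical rule: treat a positive score as a conservative change."""
    return blosum62_score(wild_type, mutant) > 0


if __name__ == "__main__":
    print(blosum62_score("L", "I"), is_conservative("L", "I"))
    print(blosum62_score("L", "D"), is_conservative("L", "D"))
```

A matrix score ignores the position of the change in the protein, which is the gap that alignment-based tools such as SIFT aim to fill.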

8
Current Classification Tools
  • Structural Approaches
  • Expert rules
  • Uses evolutionary and structural data
  • Sunyaev et al, 'Prediction of deleterious human
    alleles'. Human Molecular Genetics, 2001, Vol.
    10, No. 6, 593.
  • Decision Trees
  • Improved performance based on sequence and
    structural data
  • Produces intuitive rules

9
Our foundation for the project
  • Saunders CT, Baker D
  • 'Evaluation of Structural and Evolutionary
    Contributions to Deleterious Mutation Prediction'
  • J. Mol. Biol. (2002) 322, 891-901
  • Structural and evolutionary features
  • Trained classifiers based on two data sets -
    experimental mutations and human alleles

10
Predicting Deleterious Mutations
  • S&B - Training Sets
  • Experimental mutations (5,000)
  • HIV-1 protease
  • E. coli Lac repressor
  • T4 lysozyme
  • Human alleles (350 mutations)
  • 103 hot human genes

11
Predicting Deleterious Mutations
  • Why two training sets?
  • Unbiased human data is hard to get
  • Many disease-associated mutations are
    discovered through genetic association studies
    and may not be causative (i.e., only linked with
    the causative allele)
  • Effect of mutations is hard to measure
  • Experimental whole-gene mutagenesis data is
    used and considered unbiased

12
Predicting Deleterious Mutations
  • Features used in S&B Study
  • SIFT
  • SIFT + Solvent Accessibility (SA)
  • SIFT + normalized B-factor
  • SIFT + Sunyaev expert rules
  • SIFT + SA + B-factor

13
Predicting Deleterious Mutations

Hypothesis: Can we improve on the results of
Saunders and Baker by using more structural and
sequence properties?

14
Predicting Deleterious Mutations
  • Experimental Design
  • Classification algorithm
  • Decision Trees
  • Support Vector Machines
  • Neural Nets
  • Additional Features
  • Amino acid relative frequencies
  • Additional structural properties

15
Predicting Deleterious Mutations
  • Structural Property Values
  • Russ Altman (Stanford) developed a vector
    representation of protein structural sites
  • Spheres (1.875 Å to 7.5 Å) centered on C-alpha
    atom of the mutation position
  • 66 features
  • Atom/residue counts within sphere and other
    features, e.g.
  • Solubility
  • Solvent accessibility
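
The sphere-based counts can be pictured with a small sketch that bins atoms into concentric shells around a site. The four equal-width shell radii below are an assumption (the slides give only the 1.875 Å to 7.5 Å range and, later, four sub-shells), and `shell_counts` is a hypothetical helper, not part of S-BLEST.

```python
import math

# Assumed outer radii (in angstroms) of four equal-width shells.
SHELL_RADII = [1.875, 3.75, 5.625, 7.5]


def shell_counts(center, atoms, radii=SHELL_RADII):
    """Count atoms falling in each concentric shell around `center`.

    `center` and every entry of `atoms` are (x, y, z) tuples in angstroms.
    Atoms beyond the outermost radius are ignored.
    """
    counts = [0] * len(radii)
    for atom in atoms:
        distance = math.dist(center, atom)
        for i, radius in enumerate(radii):
            if distance <= radius:
                counts[i] += 1
                break  # each atom belongs to exactly one shell
    return counts


if __name__ == "__main__":
    c_alpha = (0.0, 0.0, 0.0)
    # Made-up coordinates purely for illustration.
    neighbours = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 5.0),
                  (7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(shell_counts(c_alpha, neighbours))
```

Restricting `atoms` to one atom type (for example, only C-beta atoms) would give a per-type count vector of the same shape.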

16
Predicting Deleterious Mutations
  • Amino Acid Windows
  • AA frequencies within a window on either side of
    the mutation position
  • 20 AAs → 20 features
  • LEFT and RIGHT → 40 features
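
A minimal sketch of the window features, assuming a zero-based mutation position and a fixed window width on each side. The function name `window_frequencies`, the residue ordering and the handling of empty windows at the sequence ends are choices made for this example only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary fixed order


def window_frequencies(sequence, position, width):
    """Return 40 features: 20 relative frequencies for the LEFT window,
    followed by 20 for the RIGHT window, around `position` (zero-based).

    The mutated residue itself is excluded from both windows.
    """
    left = sequence[max(0, position - width):position]
    right = sequence[position + 1:position + 1 + width]
    features = []
    for window in (left, right):
        for aa in AMINO_ACIDS:
            # An empty window (mutation at a sequence end) yields all zeros.
            features.append(window.count(aa) / len(window) if window else 0.0)
    return features


if __name__ == "__main__":
    feats = window_frequencies("AAGGKLLV", position=4, width=2)
    print(len(feats))
```

Running the function with several widths and concatenating the results would give windows "of varying widths" at the cost of a longer feature vector.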

17
Predicting Deleterious Mutations
  • Amino Acid Windows

18
Predicting Deleterious Mutations
  • Tools
  • Databases
  • PDB - Protein structure data
  • S-BLEST - Structural features
  • Software
  • Perl 5.8.0
  • Matlab (NN, PRTools(DT), SVC)

19
Predicting Deleterious Mutations
  • List of Features Used
  • BLOSUM62, disorder, secondary structure,
    molecular weight
  • Grouped amino acid frequency windows of varying
    widths
  • SIFT
  • S-BLEST (vector contains four sub-shells
    spreading outward from site)
  • Solvent accessibility (C-beta density, i.e., the
    number of C-beta atoms around the site)

20
Predicting Deleterious Mutations
Comparison with S&B Results

21
Predicting Deleterious Mutations
  • 1. Human Data Set
  • Human allele dataset as train and test set
  • Ensembles of decision trees for classification
  • 20-fold cross validation
  • Progressively added features to see their effect
    on performance
  • Because structural data was not available for
    all mutation sites, we used a subset of the
    original Saunders and Baker training set
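
For readers unfamiliar with k-fold cross validation, the sketch below shows how a data set might be split into 20 folds so that every sample is held out for testing exactly once. It is a generic illustration, not the MATLAB code used in the project; `k_fold_indices` is a hypothetical helper, and in practice the samples would normally be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # 100 samples is an arbitrary size chosen for the demonstration.
    for train, test in k_fold_indices(100, 20):
        # A classifier would be trained on `train` and scored on `test` here.
        assert len(train) == 95 and len(test) == 5
    print("20 folds of 5 test samples each")
```

Averaging the per-fold accuracy gives a single estimate that can be compared as features are added one at a time.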

22
Predicting Deleterious Mutations
Best Features

23
Predicting Deleterious Mutations
  • 2. Experimental Data Set
  • Same as human data set but using experimental
    mutations for training and testing

24
Predicting Deleterious Mutations
Evaluation of S-BLEST Using a Random Subset of
the Experimental Training Set

25
Predicting Deleterious Mutations
  • 3. Cross-classification
  • Used the same features described above
  • Trained on one dataset and tested on the other
  • Human to experimental
  • Experimental to human
  • Experimental gene to experimental gene
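
Cross-classification can be sketched with a deliberately simple stand-in classifier: a one-feature decision stump trained on one data set and scored on another. The stump replaces the ensembles of decision trees used in the project, and the scores and labels below are made-up toy values, not results from either data set.

```python
def train_stump(scores, labels):
    """Choose the threshold on a single feature that best fits the labels.

    The stump predicts deleterious (True) when score <= threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted(set(scores)):
        correct = sum((s <= threshold) == y for s, y in zip(scores, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, scores, labels):
    """Fraction of samples the stump classifies correctly."""
    correct = sum((s <= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy data: one numeric feature per mutation, True = deleterious.
    train_scores = [-4, -3, -1, 1, 2]
    train_labels = [True, True, True, False, False]
    test_scores = [-3, -2, 0, 0, 3]
    test_labels = [True, False, True, False, False]

    threshold = train_stump(train_scores, train_labels)
    print("within-set accuracy:", accuracy(threshold, train_scores, train_labels))
    print("cross-set accuracy:", accuracy(threshold, test_scores, test_labels))
```

A gap between the within-set and cross-set figures is the kind of signal that suggests a classifier has fitted peculiarities of its training data.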

26
Predicting Deleterious Mutations

27
Predicting Deleterious Mutations

28
Predicting Deleterious Mutations

29
Predicting Deleterious Mutations

30
Predicting Deleterious Mutations
  • Summary of Results
  • Human data set
  • 80% accuracy (up from 70%)
  • Experimental data set
  • 87% accuracy (up from 79.5%)

31
Predicting Deleterious Mutations
  • Conclusion
  • Prediction tools CAN identify deleterious
    mutations
  • We believe that further study is warranted to
    identify over-fitted classifiers and so further
    improve classification accuracy on real-world data

32
Acknowledgements
  • People
  • Andrew Campen (CCBB IT, IUPUI)
  • Brandon Peters (CCBB, IUPUI)
  • Haixu Tang (Capstone Coordinator, IUB)
  • Funding
  • This work was funded by a grant from the
    Showalter Trust (Sean Mooney, PI), INGEN, and an
    IUPUI McNair Scholarship. The Indiana Genomics
    Initiative (INGEN) of Indiana University is
    supported in part by Lilly Endowment Inc.

33
Predicting Deleterious Mutations

Thank You

34
Predicting Deleterious Mutations

35
Predicting Deleterious Mutations