Title: Introduction to Bioinformatics 10. Machine Learning for Protein Structure Prediction


1
Introduction to Bioinformatics 10. Machine Learning for Protein Structure Prediction 2
  • Course 341
  • Department of Computing
  • Imperial College, London

2
Coursework
  • Link from web page now works
  • Use gap penalty of -1
  • An alignment must have
  • Both sequences the same length after adding gaps
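
A minimal sketch of the two coursework rules above: both rows of an alignment must have the same length once gaps are added, and each gap costs -1. The match and mismatch scores are not given on the slide, so the values below (and the name score_alignment) are assumptions for illustration only.

```python
GAP = "-"
GAP_PENALTY = -1    # from the coursework slide
MATCH_SCORE = 1     # assumption: not stated on the slide
MISMATCH_SCORE = 0  # assumption: not stated on the slide


def score_alignment(row_a, row_b):
    """Score two gapped sequences column by column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned sequences must be the same length after adding gaps")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            raise ValueError("a column cannot pair a gap with a gap")
        if x == GAP or y == GAP:
            total += GAP_PENALTY
        elif x == y:
            total += MATCH_SCORE
        else:
            total += MISMATCH_SCORE
    return total


print(score_alignment("GA-DG", "GAGDG"))
```

With these assumed scores the example has four matching columns and one gap column.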

3
Proteins
  • Macromolecules called heteropolymers
  • Complex conformation of amino acids (residues)
  • Below around 40 residues they're called peptides
  • Conformation = Fold = 3D Structure
  • Dictates the function of the protein
  • Which dictates how cells behave, etc.
  • 40-50 residues tends to be the lower limit
  • On proteins having a biological function
  • See LESK textbook

4
Why Study Proteins?
  • Major biological macromolecule responsible for
    the machinery and scaffold of cellular life
  • Prof. Michael Sternberg
  • 3D Structure provides insight into
  • Function
  • Evolution
  • Experimental design
  • Systematic design of drugs

5
Amino Acids
  • Amino acids form the basic unit of all proteins
  • A single amino acid always has
  • An amino group NH2
  • A carboxyl group COOH
  • A hydrogen H (sometimes left off diagrams)
  • A chemical group or side chain, "R"
  • These are all joined to a central carbon atom
  • The alpha carbon

6
Amino Acid Picture
7
The Primary Structure of Proteins
  • Fixed main chain backbone
  • Variable side chain
  • One of 20 different amino acid side chains
  • Chirality (left and right handedness)

8
Hydrophobic Interactions
  • Atomic charges dictate how folds occur
  • Groups of C-H atoms have little charge
  • Called hydrophobic or non-polar
  • Hydrophobic groups pack together
  • To avoid contact with solvent (aqueous solution)
  • To minimise energy
  • Hydrophobic and hydrophilic regions are
  • Main driving force behind the folding process

9
Protein Folding in an Aqueous Environment
[Figure: an unfolded chain surrounded by H2O molecules, and a folded chain in which many atoms are shielded from water contact]
10
Secondary Structures
  • Energetics of the side chain
  • Stop certain fold conformations
  • Protein structures are not random
  • Some larger (secondary) structures are observable
  • Two common structures
  • Alpha-helix
  • Beta-sheet
  • Connected by turns and loops
  • E.g., Beta-turns

11
Alpha Helices
12
Beta Sheets (front and side)
13
Protein Folds
  • Tertiary structure
  • 3D structure of an entire amino acid chain
  • Sequential arrangements of chain sections
  • Particularly alpha-helices and beta-sheets/strands
  • Quaternary structure
  • Proteins with more than one chain
  • Arrangement of these chains

14
Example of Quaternary Structure
15
Structural Classes of Folds
  • Different classes, including
  • Alpha/Alpha
  • Mainly packing of alpha helices
  • Beta/Beta
  • Mainly one or more beta sheets
  • Alpha/Beta
  • Roughly alternate alpha helices and beta sheets
  • Alpha+Beta
  • Mixed alpha helices and beta
  • Coil
  • Mainly small proteins (fewer than 50 residues)

16
Examples of Common Classes
17
Example Proteins (OspC protein)
18
Triose Phosphate Isomerase
  • TIM (Beta/Alpha)8 Barrel

19
Hemerythrin
  • 4-Alpha-up-down bundle

20
Homologous Genes
  • Similar sequences produce very similar structures
  • Secondary structures are well preserved

21
Back to Machine Learning
  • Remember the task
  • Automatically learn methods for predicting the
    structure of proteins

[Diagram labels: Structure Predictor; Learning process for the chosen representation; Machine readable format]
22
Predicting Secondary Structure for Each Residue
  • G A G D G A N A A A

Structure Predictor
α α α α coil coil β β β β
Alpha Helix
Beta Sheet
Further Processing
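
A minimal sketch of how a sequence such as G A G D G A N A A A might be put into a machine readable format so that a predictor can label each residue. The slides do not say which representation is used; the sliding window, the one-hot encoding, the window size and the name window_features are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def window_features(sequence, half_width=2):
    """Return one feature row per residue.

    Each row is the one-hot encoding of the residues in a window centred
    on that residue. Window positions that fall off either end of the
    chain set an extra "outside the chain" slot instead.
    (Hypothetical representation, not taken from the slides.)
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    slots = len(AMINO_ACIDS) + 1  # 20 amino acids + 1 padding slot
    features = []
    for centre in range(len(sequence)):
        row = []
        for pos in range(centre - half_width, centre + half_width + 1):
            one_hot = [0] * slots
            if 0 <= pos < len(sequence):
                one_hot[index[sequence[pos]]] = 1
            else:
                one_hot[-1] = 1  # position lies beyond the chain end
            row.extend(one_hot)
        features.append(row)
    return features


rows = window_features("GAGDGANAAA")
print(len(rows), len(rows[0]))
```

Each row could then be paired with the known label for its central residue (alpha, beta or coil) to form a training example for whichever learning method is chosen.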
23
The CASP Competition
  • Determining actual structure of a protein is hard
  • It's a good idea to predict the structure
  • Many, varied approaches to structure prediction
  • Which one is the best?
  • Critical Assessment of protein Structure
    Prediction (CASP)
  • Blind trial to evaluate the different approaches
  • Chemists do not publish the structures they've
    found
  • CASP - manual intervention is allowed
  • CAFASP2 - completely automatic prediction only
  • Around 60 targets are used for the evaluation

24
Continuous Server Evaluation
  • Every time a new structure is determined
  • Every method on a server is assessed
  • Regular updates on status are provided
  • Two well known services
  • Livebench (Poland)
  • Eva (Columbia, USA)
  • Roughly 100 structures per annum are tested

25
Current State of the Art
  • Secondary structure prediction
  • Usually in the form of 3-state prediction (Q3)
  • Alpha, Beta, Coil
  • Majority class predictor gets about 40%
  • Prediction has gone from 60%
  • Using rule-based methods
  • To about 80% using machine learning methods
  • Neural networks, SVMs are particularly good
  • Inductive Logic Programming also very useful
  • Learns explainable rules which add to the science
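
The Q3 figures above are per-residue accuracies over the three states. A minimal sketch of how Q3 and the majority class baseline could be computed is given below; the one-letter labels H (alpha), E (beta) and C (coil) and the function names are assumptions for illustration, and the example strings are made up rather than real predictions.

```python
from collections import Counter


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def majority_baseline(observed):
    """Predict the most frequent observed state for every residue."""
    label, _count = Counter(observed).most_common(1)[0]
    return label * len(observed)


observed = "HHHHCCEEEE"   # illustrative labels only
predicted = "HHHCCCEEEE"  # one residue wrong
print(q3(predicted, observed))
print(q3(majority_baseline(observed), observed))
```

The baseline ignores the sequence entirely, which is why it is a useful floor against which learned predictors are compared.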