Title: CS 177 Proteins, part 2 Computational modeling
1 CS 177 Proteins, part 2 (Computational modeling)
Review of protein structures Computational
modeling Three-dimensional structural analysis
in laboratory
2 Structure-Function Relationships
Recommended readings
- A science primer: Molecular modeling. http://www.ncbi.nlm.nih.gov/About/primer/molecularmod.html
- Brown, S.M. (2000) Bioinformatics, Eaton Publishing, pp. 99-119
- Veeramalai, M. and Gilbert, D. Bioinformatics tools for protein structure visualisation and analysis. http://www.brc.dcs.gla.ac.uk/mallika/Publications/scwbiw-article.htm
- Mount, D.W. (2001) Bioinformatics, Cold Spring Harbor Lab Press, pp. 382-478
3 Review of protein structure
Primary structure
Proteins are chains of amino acids joined by
peptide bonds
The structure of two amino acids
The N-Cα-C sequence is repeated throughout the protein, forming the backbone
The bonds on each side of the Cα atom are free to rotate within spatial constraints; the angles of these bonds determine the conformation of the protein backbone
The R side chains also play an important
structural role
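The rotation about a backbone bond can be expressed as a torsion (dihedral) angle defined by four consecutive atoms. What follows is a minimal sketch, assuming plain (x, y, z) tuples as input; the function name `dihedral_degrees` is illustrative and not part of the slides. The sign follows the usual convention in which a cis arrangement is 0 degrees and a trans arrangement is 180 degrees.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3-D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond for atoms p0..p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For phi, the four atoms would be C of the previous residue, N, Cα, and C; for psi, they would be N, Cα, C, and N of the next residue.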
4 Review of protein structure
Secondary structure
Interactions that occur between the C=O and N-H groups on amino acids
Much of the protein core comprises α helices and β sheets, folded into a three-dimensional configuration:
- regular patterns of H bonds are formed between neighboring amino acids
- the amino acids have similar angles
- the formation of these structures neutralizes the polar groups on each amino acid
- the secondary structures are tightly packed in a hydrophobic environment
- each R side group has a limited volume to occupy and a limited number of interactions with other R side groups
β sheet
α helix
5 Secondary structure
α helix
β sheet
6 Secondary structure
Other secondary structure elements (no standardized classification)
- random coil
- loop
- others (e.g. 3₁₀ helix, β-hairpin, paperclip)
Super-secondary structure
- In addition to secondary structure elements
that apply to all proteins (e.g. helix, sheet)
there are some simple structural motifs in some
proteins
- These super-secondary structures (e.g.
transmembrane domains, coiled coils,
helix-turn-helix, signal peptides) can give
important hints about protein function
7 Secondary structure
Structural classification of proteins (SCOP)
Class 1: mainly alpha
Class 2: mainly beta
Class 3: alpha/beta
Class 4: few secondary structures
8 Secondary structure
Alternative SCOP
Class α: only α helices
Class β: antiparallel β sheets
Class α/β: mainly β sheets with intervening α helices
Class α+β: mainly segregated α helices with antiparallel β sheets
Membrane structure: hydrophobic α helices with membrane bilayers
Multidomain: contain more than one class
9 Review of protein structure
Q: If we have all the psi and phi angles in a protein, do we then have enough information to describe the 3-D structure?
10 Tertiary structure
The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide
The tertiary structure is determined by a combination of different types of bonding (covalent bonds, ionic bonds, hydrogen bonding, hydrophobic interactions, van der Waals forces) between the side chains
Many of these bonds are very weak and easy to break, but hundreds or thousands working together give the protein structure great stability
If a protein consists of only one polypeptide chain, this level then describes the complete structure
11 Tertiary structure
Proteins can be divided into two general classes based on their tertiary structure
- Fibrous proteins have elongated structures with the polypeptide chains arranged in long strands. This class of proteins serves as a major structural component of cells. Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation
12 Quaternary structure
The quaternary structure defines the conformation assumed by a multimeric protein.
The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic, hydrogen-bond, and hydrophobic interactions
Example: haemoglobin (4 subunits)
13 Structure displays
Common displays are (among others) cartoon,
spacefill, and backbone
spacefill
backbone
cartoon
14 Summary: protein structure
Primary structure: sequence of amino acids
Secondary structure: interactions that occur between the C=O and N-H groups on amino acids
Tertiary structure: organization in three dimensions of all the atoms in the polypeptide
Quaternary structure: conformation assumed by a multimeric protein
15 Need for analyses of protein structures
A protein performs metabolic, structural, or regulatory functions in a cell. Cellular biochemistry works based on interactions between 3-D molecular structures
The 3-D structure of a protein determines its function
Therefore, the relationship of sequence to function is primarily concerned with understanding the 3-D folding of proteins and inferring protein functions from these 3-D structures (e.g. binding sites, catalytic activities, interactions with other molecules)
The study of protein structure is not only of fundamental scientific interest in terms of understanding biochemical processes, but also produces very valuable practical benefits
- Medicine: the understanding of enzyme function allows the design of new and improved drugs
- Agriculture: therapeutic proteins and drugs for veterinary purposes and for treatment of plant diseases
- Industry: protein engineering has potential for the synthesis of enzymes to carry out various industrial processes on a mass scale
16 Sources of protein structure information
3-D macromolecular structures are stored in databases
The most important database: the Protein Data Bank (PDB)
The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) and can be accessed at three different sites (plus a number of mirror sites outside the USA):
- http://rcsb.rutgers.edu/pdb (Rutgers University)
- http://www.rcsb.org/pdb/ (San Diego Supercomputer Center)
- http://tcsb.nist.gov/pdb/ (National Institute of Standards and Technology)
It is the very first bioinformatics database ever built
17 Sources of protein structure information
Experimental structure determination
In practice, most biomolecular structures (>99% of structures in PDB) are determined using three techniques:
- X-ray crystallography (low to very high resolution)
  Problem: requires crystals; it is difficult to crystallize proteins while maintaining their native conformation; not all proteins can be crystallized
- Nuclear magnetic resonance (NMR) spectroscopy of proteins in solution (medium to high resolution)
  Problem: works only with small and medium-size proteins (50% of proteins cannot be studied with this method); requires high solubility
- Electron microscopy and crystallography (low to medium resolution)
  Problem: (still) relatively low resolution
Experimental methods are still very time consuming and expensive; in most cases the experimental data will contain errors and/or be incomplete. Thus the initial model needs to be refined and rebuilt
18 Sources of protein structure information
Computational modeling
Researchers have been working for decades to develop procedures for predicting protein structure that are not so time consuming and not hindered by size and solubility constraints. As protein sequences are encoded in DNA, it should in principle be possible to translate a gene sequence into an amino acid sequence, and to predict the three-dimensional structure of the resulting chain from this amino acid sequence
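The first step, translating a gene sequence into an amino acid sequence, can be sketched as follows. This is an illustrative example under stated assumptions: the input is a coding-strand DNA string read in frame from its first base, the standard genetic code applies, and translation stops at the first stop codon. The names `CODON_TABLE` and `translate` are hypothetical and not from the slides.

```python
# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon (e.g. "ATG") to a one-letter amino acid code; "*" is a stop.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate coding-strand DNA into a protein string (one-letter codes)."""
    dna = dna.upper()
    protein = []
    # Walk the sequence codon by codon; an incomplete trailing codon is ignored.
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")  # "X" = unknown
        if amino_acid == "*":
            break  # stop codon ends the chain
        protein.append(amino_acid)
    return "".join(protein)
```

Predicting the 3-D structure from the resulting string is the hard part; the translation itself is a simple table lookup.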
19 Computational modeling
How to predict the protein structure?
Ab initio prediction of protein structure from sequence: not yet.
Problem: the information contained in protein structures lies essentially in the conformational torsion angles. Even if we only assume that every amino acid residue has three such torsion angles, and that each of these three can only assume one of three "ideal" values (e.g., 60, 180 and -60 degrees), this still leaves us with 3^3 = 27 possible conformations per residue.
For a typical 200-amino acid protein, this would give 27^200 (roughly 1.87 x 10^286) possible conformations!
Q: Can't we just generate all these conformations, calculate their energy and see which conformation has the lowest energy?
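The size of this search space can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and not from the slides. Python integers have arbitrary precision, so the full count can be computed exactly and then reduced to a mantissa and exponent.

```python
def conformations_per_residue(torsion_angles=3, ideal_values=3):
    """Number of conformations one residue can adopt in the simplified model."""
    return ideal_values ** torsion_angles


def total_conformations(residues, per_residue):
    """Number of conformations for a chain of independent residues."""
    return per_residue ** residues


def mantissa_exponent(n, digits=3):
    """Return (mantissa, exponent) of a positive integer, truncated to `digits`."""
    text = str(n)
    exponent = len(text) - 1
    mantissa = int(text[:digits]) / 10 ** (digits - 1)
    return mantissa, exponent


per_residue = conformations_per_residue()      # 3 values ^ 3 angles = 27
total = total_conformations(200, per_residue)  # 27 ^ 200
print(mantissa_exponent(total))                # (1.87, 286)
```

Even at an assumed rate of 10^12 conformations evaluated per second, enumerating on the order of 10^286 conformations is far out of reach, which is why exhaustive search is not a practical answer to the question above.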
20 Computational modeling
Solution: homology modeling
Homology (comparative) modeling attempts to predict structure on the strength of a protein's sequence similarity to another protein of known structure
Basic idea: a significant alignment of the query sequence with a target sequence from PDB is evidence that the query sequence has a similar 3-D structure (current threshold: 40% sequence identity). Then multiple sequence alignment and pattern analysis can be used to predict the structure of the protein
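The identity threshold can be illustrated with a small sketch that scores an existing pairwise alignment. Several assumptions apply: the two sequences are already aligned and of equal length, gaps are written as "-", and identity is computed over aligned (non-gap) positions only; other definitions of the denominator exist. The function names and the default of 40 are illustrative, the latter taken from the threshold quoted above.

```python
def percent_identity(aligned_query, aligned_target, gap="-"):
    """Percent of aligned (non-gap) positions with identical residues."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [
        (q, t)
        for q, t in zip(aligned_query.upper(), aligned_target.upper())
        if q != gap and t != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for q, t in columns if q == t)
    return 100.0 * matches / len(columns)


def is_modeling_candidate(aligned_query, aligned_target, threshold=40.0):
    """True if the alignment meets the identity threshold for homology modeling."""
    return percent_identity(aligned_query, aligned_target) >= threshold
```

For example, `percent_identity("AC-EF", "ACDEY")` compares four aligned columns, three of which match, giving 75.0.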
21 Computational modeling summary
Partial or full sequences predicted through gene finding
Similarity search against proteins in PDB
Find structures that have a significant level of structural similarity (but not necessarily significant sequence similarity)
Alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
If member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
Infer structural information (e.g. presence of small amino acid motifs; spacing and arrangement of amino acids; certain typical amino acid combinations associated with certain types of secondary structure), which can provide clues as to the presence of active sites and regions of secondary structure
Structural analyses in the lab (X-ray crystallography, NMR)
22 Computational modeling
Viewing protein structures
A number of molecular viewers are freely available and run on most computer platforms and operating systems
Examples:
- Cn3D 4.1 (stand-alone)
- RasMol (stand-alone)
- Chime (Web browser based on RasMol)
- Swiss 3D viewer Spdbv (stand-alone)
All these viewers can use the PDB identification code or the structural file from PDB
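To give a sense of what a viewer reads from a PDB structural file, here is a minimal sketch that extracts the Cα trace, which is the information behind a "backbone" display. It assumes the classic fixed-column PDB text format, in which ATOM records carry the atom name in columns 13-16, the residue name in 18-20, the chain identifier in 22, the residue number in 23-26, and the x, y, z coordinates in columns 31-54. The function name `ca_trace` is hypothetical, and real files have additional cases (alternate locations, multiple models) that this sketch ignores.

```python
def ca_trace(pdb_text, chain_id=None):
    """Return [(residue_name, residue_number, (x, y, z)), ...] for Cα atoms."""
    trace = []
    for line in pdb_text.splitlines():
        # Only standard residue atoms; HETATM records (e.g. calcium ions) are skipped.
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA":
            continue
        if chain_id is not None and line[21] != chain_id:
            continue
        residue_name = line[17:20].strip()
        residue_number = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        trace.append((residue_name, residue_number, xyz))
    return trace
```

Connecting consecutive points of the returned list yields the backbone representation; the other display styles use the full set of ATOM records.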