Title: Globular proteins
Globular proteins, Part II
Lecture 9, BINF 5230
Let us briefly summarize the main points we discussed in the previous lectures.
Forces controlling protein structure
Van der Waals forces: Both attractive and repulsive van der Waals forces influence protein folding. Although each van der Waals interaction is extremely weak relative to the other forces governing conformation, the huge number of such interactions in a large protein molecule makes them significant for the folding of proteins.
Electrostatic forces: Typical charge-charge interactions that favor protein folding are those between oppositely charged amino acids, such as K (Lys) or R (Arg), which carry a positive charge, and D (Asp) or E (Glu), which carry a negative charge.
Covalent bonds between cysteine residues (disulfide bridges)
Hydrogen bonding: Hydrogen bonds form within protein chains and between protein chains and water molecules.
Hydrophobic forces: Proteins are composed of both hydrophilic and hydrophobic amino acids. The interaction of the different amino acids with the aqueous environment plays the major role in folding.
Folding reflects a balance between the H-bonding of hydrophilic residues with the surrounding water and the exclusion of hydrophobic residues from the aqueous environment. This balance of interactions is the driving force that restricts the set of structures into which a protein may fold.
Bonding interactions with water
Tertiary structure
Having discussed the process of protein folding, we understand why a hydrophobic core exists in the spatial structure. The hydrophobic core (or cores) of a protein is surrounded by α-helices and β-sheets, while irregular loops are moved towards the edge of the globule. The loops almost never enter the interior of the protein.
Why? Now we can answer this question.
In the irregular structure of loops, the H-bonding groups do not participate in strand or helix formation. They must instead form bonds with water, because otherwise the stability of the globule would be reduced.
[Figure: Three domains in the structure of the enzyme pyruvate kinase.]
[Figure: The hydrophobic core of the protein structure. Conserved hydrophobic residues (orange) lie inside the structure.]
Detailed structures of proteins are complex, so we have to look at them in different ways.
Wire-frame model: This is the most accurate representation. It shows the main chain and the amino acid side chains.
Backbone wire model: It represents the main-chain fold only. Important side chains can be added (see the blue part).
Space-filling model: It shows the molecular surface. Such models are mainly used to depict details of protein-protein and protein-substrate interfaces. The backbone wire model (right) gives no information about specific surface features of an interface.
Ribbon model: It mainly shows the location of strands (arrows) and helices.
We frequently use simplified models of protein structure, i.e. we pay attention to the secondary structures and their relative orientation and pay NO attention to loop structures or to the sizes of strands and helices.
Classification of proteins: folding patterns, or folds
A protein fold describes the order and spatial relationship of the strands and helices that form the domain.
The number of folds is limited.
[Figure: Schematic drawing of protein folds]
Observation: 80% of protein families can be described by only 20% of the observed folds.
Questions:
Why do most proteins fit a limited set of common folds? or
Why doesn't each protein have its own fold? or
Why don't all proteins have one common fold (like DNA chains)?
More questions:
What is behind this limited number of common folds? Common ancestry? Common functions? Or the necessity to meet some general principles of folding for stable protein structures?
From history: When only very few structures had been solved (the 1970s), each structure was believed to be absolutely unique. Slowly, however, it appeared that new structures resembled old ones, and it became clear that there is a standard design for protein architecture, even when the functions and sequences are utterly different.
Idea: the similarity of protein tertiary structures is caused not only by evolutionary divergence, and not (or not only) by functional convergence of proteins, but simply by restrictions that physical regularities (laws) impose on protein folds.
The total number of possible amino acid sequences is enormous. For a small protein of 100 amino acids, there are 20^100 (about 10^130) possible sequences. Probably only an extremely small fraction of all possible sequences would be able to fold into a stable globular protein with a reasonably well packed hydrophobic core. It is still possible that there are many stable folds that do not exist in nature, since the existing folds are the result of an evolutionary process that has not tried all these sequence combinations. The present estimate is that there are about one thousand folds. Of these folds, a few hundred are known.
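As a quick check on the size of this sequence space, the count 20^100 can be evaluated directly. The sketch below is only illustrative: the function name `sequence_space` is made up for this example, and it assumes an alphabet of the 20 standard amino acids.

```python
import math

ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids


def sequence_space(length: int) -> int:
    """Number of distinct amino acid sequences of the given length."""
    return ALPHABET_SIZE ** length


n = sequence_space(100)
# Order of magnitude: 100 * log10(20), i.e. roughly 130
magnitude = 100 * math.log10(ALPHABET_SIZE)
print(f"20^100 has {len(str(n))} digits (about 10^{magnitude:.1f})")
```

Compared with an estimate of about one thousand folds, this shows how small a part of sequence space natural proteins can have explored.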
Think about the analogy between protein folding and house building. Compare with a construction company: to be profitable, it keeps a set of standard designs that suit 80% of typical people. If the company designed every order from scratch, it would take a lot of money and time, and the price of a house would be very high. Thus typical people have a standard, typical house. What about the other 20%? They are either very rich (a unique palace) or very poor (a unique hut). Both of these groups create unique designs.
Structural classes of proteins (α, β, α-β)
Typical architectures (barrel, sandwich, bundle)
Typical fold topologies (Ig-like, TIM barrel, globin-like)
Globular domains
Rates of success of each step in structure determination: If you clone a certain number of genes into cells, only about 40% will be expressed. Of the genes that are expressed, the products of 20-30% can be purified. Of the purified proteins, almost 50% can be crystallized to yield good crystals. Once these crystals have been obtained, the structure can be determined in about 80% of cases. So at the end of all of this you can determine the structure of only about 3-4% of your original proteins.
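The overall yield is simply the product of the per-step success rates. The short sketch below multiplies the figures quoted above; the constant and function names are invented for this illustration, and "almost 50%" is rounded to 0.50.

```python
# Per-step success rates quoted on the slide; purification is given as a range.
EXPRESSED = 0.40
PURIFIED_RANGE = (0.20, 0.30)
CRYSTALLIZED = 0.50  # "almost 50%", rounded here
STRUCTURE_SOLVED = 0.80


def overall_yield(purified: float) -> float:
    """Fraction of cloned genes that end with a determined structure."""
    return EXPRESSED * purified * CRYSTALLIZED * STRUCTURE_SOLVED


low, high = (overall_yield(p) for p in PURIFIED_RANGE)
print(f"overall yield: {low:.1%} to {high:.1%}")
```

With these inputs the product lies between roughly 3% and 5%, consistent with the figure of about 3-4% given above.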
How do we correctly assign a determined structure to the proper fold? To find the fold pattern, we need to understand which characteristics of the structures really define a given fold, for example the number of strands and helices. The problem, however, is that not all strands and helices are taken into account: small secondary-structure elements can be disregarded. Finding a folding pattern is therefore a little more art than technique.
The best way to classify a protein is to determine the pairwise sequence identity of the query protein with proteins that are already known. Hence, the goal is to determine one structure for each family of proteins and then align a new protein against these families for classification.
Thus, it may seem that all bioinformatics has to do is to cluster all proteins into families of proteins with similar structures, exclude all clusters with known structures, and define the remaining list as the target list for structural genomics. Currently over 18 000 targets have been found.
Reference: J. Liu and B. Rost, "Target space for structural genomics revisited", Bioinformatics, Vol. 18, no. 7, 2002, pages 922-933.
This is the Structural Genomics Project.
Modelling of protein 3D structure
If two protein sequences are more than 35% identical, and one of the proteins has a known 3D structure, a model of the second protein can be created by homology modelling.
sequence identity (%) = 100 × (number of identical residues) / (number of residues in the smaller protein)
Computational sequence alignment
Before alignment:
VLSPADKTNVKAAWGKVLT
VHLTPEEKESAVTALWGKV
After alignment:
-VLSPADK-TNVKAAWGKV
VHLTPEEKESAVTALWGKV
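The identity formula above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `percent_identity` is invented here, and it assumes the two sequences are already aligned and of equal length, with "-" marking gaps. It is applied to the aligned pair shown on the slide.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Identical residues are counted column by column (gap columns never count),
    and the total is divided by the length of the shorter ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    return 100.0 * identical / shorter


# Aligned pair taken from the slide
a = "-VLSPADK-TNVKAAWGKV"
b = "VHLTPEEKESAVTALWGKV"
print(f"{percent_identity(a, b):.1f}% identical")
```

For this pair, 9 columns are identical and the shorter sequence has 17 residues, so the identity is about 53%, above the 35% threshold mentioned for homology modelling.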
Identifying protein relatives
Query: ASAVIFG_SEW
Match: ATAVIFGNWEW
Sequence identity: 8/10 = 80%
Related protein sequences can be aligned to give a multiple sequence alignment. Conserved positions or blocks of positions may indicate functionally important residues.
VWEELDGLDPNRFNPKTFFILHDINSDGVLDEQELEALFTKELEKVYDPK
VWEELDGLDPNRFNPKTFFILHDINSDGVLDEQELEALFTKELEKVYDPK
VWEESDHLEKDQYDPKTFFALHDLNGDGFWNDFELESLFQLELEKMYNET
VWEKQDHMDKNDFDPKTFFSIHDVDSNGYWDEAEVKALFVKELDKVYQSD
VWEKQDHMDKNDFDPKTFFSIHDVDSNGYWDEAEVKALFVKELDKVYQSD
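One simple way to find conserved positions is to scan the alignment column by column and keep the columns in which every sequence has the same residue. The sketch below does this for the five aligned rows from the slide. The helper names `conserved_columns` and `consensus_mask` are made up for this example, and strict identity is only one possible definition of conservation.

```python
# The five aligned rows from the slide (rows 1-2 and rows 4-5 are identical).
ALIGNMENT = [
    "VWEELDGLDPNRFNPKTFFILHDINSDGVLDEQELEALFTKELEKVYDPK",
    "VWEELDGLDPNRFNPKTFFILHDINSDGVLDEQELEALFTKELEKVYDPK",
    "VWEESDHLEKDQYDPKTFFALHDLNGDGFWNDFELESLFQLELEKMYNET",
    "VWEKQDHMDKNDFDPKTFFSIHDVDSNGYWDEAEVKALFVKELDKVYQSD",
    "VWEKQDHMDKNDFDPKTFFSIHDVDSNGYWDEAEVKALFVKELDKVYQSD",
]


def conserved_columns(rows):
    """Return 0-based indices of columns where every row has the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return [i for i, column in enumerate(zip(*rows)) if len(set(column)) == 1]


def consensus_mask(rows):
    """Show conserved residues in place and '.' at variable positions."""
    keep = set(conserved_columns(rows))
    return "".join(c if i in keep else "." for i, c in enumerate(rows[0]))


print(consensus_mask(ALIGNMENT))
```

Runs of conserved residues in the printed mask, such as the block PKTFF, are the kind of conserved blocks that may point to functionally important residues.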
BLAST best hit
>gi|17472322|ref|XP_061555.1| (XM_061555) similar to orphan G protein-coupled receptor GPR26 [Homo sapiens]
Length = 337
Score = 298 bits (762), Expect = 8e-80
Identities = 168/327 (51%)
Query 1   MGPGEALLAGLLVMVLAVALLSNALVLLCCAYSAELRTRASGVLLVNLSLGHLLLAALDM 60
          M    A LAGLLV    V LLSNALVLLC   SA  R  A      NL  G LL     M
Sbjct 1   MNSWNAGLAGLLVGTIGVSLLSNALVLLCLLHSADIRRQAPALFTLNLTCGNLLCTVVNM 60
Query 61  PFTLLGVMRGRTPSAPGACQVIGFLDTFLASNAALSVAALSADQWLAVGFPLRYAGRLRP 120
          P TL GV   R P     C    FLDTFLA N  LS AALS D W AV FPL Y    R
Sbjct 61  PLTLAGVVAQRQPAGDRLCRLAAFLDTFLAANSMLSMAALSIDRWVAVVFPLSYRAKMRL 120
Query 121 RYAGLLLGCAWGQSLAFSGAALGCSWLGYSSAFASCSLRLPPEPERPRFAAFTATLHAVG 180
          R A L     W   L F  AAL  SWLG     ASC L      ER RFA FT   HA
Sbjct 121 RDAALMVAYTWLHALTFPAAALALSWLGFHQLYASCTLCSRRPDERLRFAVFTGAFHALS 180
The accuracy of a comparative model is related to the sequence identity on which it is based.
[Figure: The accuracy of protein structure modeling.] Predicted structures are in red, and actual structures are in blue. Comparative models based on about 60% (A), 40% (B), and 30% (C) sequence identity to their template structure. (D and E) Examples of de novo structure predictions for the CASP4 structure prediction experiment. The accuracy of the models decreases significantly in going from (A) to (E), but the overall structure is still roughly correct.
Comparing de novo predictions with X-ray structures
Databases of folds: The major source of information about protein structures is the Protein Data Bank (PDB). The recognition of the folding pattern is based on the structural classification of proteins. Two databases are currently in wide use; one of them is SCOP (Structural Classification of Proteins).
The genome project: what is next? With the completion of the sequencing of the genomes of human and other organisms, attention has focused on the characterization of the function and structure of proteins, the products of genes. The objective is to make protein structures widely available for clinical and basic studies that will expand the knowledge of the role of proteins both in normal biological processes and in disease. The National Institute of General Medical Sciences (NIGMS) organized a national program, the Protein Structure Initiative, in 1999. http://www.structuralgenomics.org/
Protein Structure Initiative (PSI): The PSI is
accomplishing its goal by determining unique
protein structures in a high-throughput mode of
operation using X-ray crystallography and NMR
spectroscopy to achieve a systematic sampling of
major protein families and thus create a large
collection of protein structures. These
experimentally determined structures will be used
as templates for computational modeling of
related sequence homologs to produce structural
coverage of a majority of sequenced genes.
The long-range goal of PSI is to make the
three-dimensional atomic-level structures of most
proteins easily obtainable from knowledge of
their corresponding DNA sequences.
Benefits of PSI
- Structural descriptions will help researchers illuminate structure-function relationships and thus formulate better hypotheses and design better experiments.
- The PSI collection of structures will serve as the starting point for structure-based drug development by permitting faster identification of lead compounds and their optimization.
- The design of better therapeutics will result from comparisons of the structures of proteins that are from pathogenic and host organisms and from normal and diseased human tissues.
- The PSI collection of structures will assist biomedical investigators in research studies of key biophysical and biochemical problems, such as protein folding, evolution, structure prediction, and the organization of protein families and folds.
The Structural Genomics Project aims at determination of the 3D structure of all proteins. This aim can be achieved in four steps:
1. Organize known protein sequences into families.
2. Select family representatives as targets.
3. Solve the 3D structure of targets by X-ray crystallography or NMR spectroscopy.
4. Build models for other proteins by homology to solved 3D structures.
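The first two steps can be sketched as a toy target-selection routine. Everything in this example is a simplifying assumption made for illustration: the sequences and names are hypothetical, identity is computed position by position without a real alignment, families are formed by a greedy single pass, and the 35% threshold is borrowed from the homology-modelling rule of thumb. Real pipelines use proper alignment and clustering tools.

```python
def identity(a: str, b: str) -> float:
    """Percent identity, compared position by position (no alignment)."""
    n = min(len(a), len(b))
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / n


def cluster_into_families(seqs: dict, threshold: float = 35.0) -> list:
    """Step 1: greedy clustering; the first member of a family represents it."""
    families = []
    for name, seq in seqs.items():
        for family in families:
            if identity(seq, seqs[family[0]]) > threshold:
                family.append(name)
                break
        else:
            families.append([name])
    return families


def select_targets(families: list, solved: set) -> list:
    """Step 2: keep representatives of families with no solved structure."""
    return [f[0] for f in families if not any(m in solved for m in f)]


# Hypothetical toy sequences; "p2" stands for a protein with a known structure.
SEQUENCES = {
    "p1": "AAAAAAAAAA",
    "p2": "AAAAAAAACC",
    "p3": "GGGGGGGGGG",
    "p4": "GGGGGGTTTT",
    "p5": "CCCCCWWWWW",
}
families = cluster_into_families(SEQUENCES)
targets = select_targets(families, solved={"p2"})
print(families, targets)
```

In this toy run the family containing p1 and p2 is excluded because p2 is already solved, and one representative from each remaining family becomes a target for experimental structure determination (step 3), after which the other members would be modelled by homology (step 4).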
PSI Pilot Phase
Project period: September 2000 to June 2005
Funding: $270 million
Number of centers: 9 (e.g. the Northeast Structural Genomics Consortium, based in New Jersey, focused on target proteins from various model organisms, including the fruit fly, yeast, and roundworm)
Solved protein structures: > 1,100
Unique structures solved: > 700 (structures sharing less than 30 percent of their sequence with other known proteins)
[Figure: Crystal structure of a protein from a bacterium that thrives in boiling water.]
Next question: Do we see evolution of protein structures? This contains two questions:
1) Is there a connection between a change of the entire organism and a change in the protein structure? (microscopic evolution of proteins)
2) Do protein structures become more complex with increasing complexity of the organism? (macroscopic evolution of proteins)
Microscopic evolution of proteins?
The question: Do the structures of the same protein differ in different organisms?
The answer: YES, there is microscopic evolution of proteins. Similar organisms that live in different conditions differ slightly from each other.
Organism: llama (mountain animal); protein: "mountain" hemoglobin
Organism: animal living on the plain; protein: "plain" hemoglobin
The "mountain" hemoglobin binds oxygen more strongly. The microscopic changes in the hemoglobin structure that are responsible for this strengthening have been identified. The organism changed a little and its proteins changed slightly: this is microscopic evolution.
How can we explain protein variability? For example, by alternative splicing, which allows a gene to code for multiple proteins, and by evolution.
Alternative splicing produces different versions of the mRNA and, ultimately, different proteins.
[Figure: The exons are shown as colored boxes, the introns as lines.]
A pre-mRNA can be spliced in two different ways.
On the left, the RNA is spliced to include the exons S, V, Cm1, Cm2, Cm3, Cm4, and C. (This form is translated as part of a secreted antibody.)
On the right is a splicing pattern that includes the exons S, V, Cm1, Cm2, Cm3, Cm4 and M. (This form of the mRNA is translated into a protein with a transmembrane anchor region (M) and therefore winds up in the plasma membrane of the cell that produces it.)
Evolution often occurs through amplification (duplication) of a gene with subsequent mutations of its copies, so that one copy of the gene keeps maintaining the previous function, while another copy (or copies) becomes free to mutate. As a result, the protein's function is changed to adapt to a biological need.
Evolution also often occurs through domain migration. A domain-encoding part of a gene can migrate as a whole from one protein to another (for example, the calcium-binding domain of calmodulin and parvalbumin).
Macroscopic evolution of proteins: Do protein structures become more complex with increasing complexity of the organism?
The answer: NO, there is no macroscopic evolution of proteins. The same folding patterns are observed in both eukaryotes and prokaryotes. In other words, we do not see proteins becoming more complicated with increasing organism complexity.
Do eukaryotic and prokaryotic proteins differ from each other? We see no difference between the folds of globular proteins of prokaryotes and eukaryotes.
However, eukaryotic proteins can be larger in size. This is connected with the fact that proteins of higher organisms often have many domains.
Remark: There is, however, one more important macroscopic structural difference, though it is not connected with the chain folds. Proteins of eukaryotes, and of multicellular ones in particular, have many more co- and post-translational chemical modifications (such as glycosylation, iodination, etc.). Modification sites are marked by the primary structure, while the modification itself is carried out by special enzymes.
With a huge number of classified protein structures available, we have found that:
- there exist only a few typical folding patterns;
- they are relatively simple and regular;
- the same folding patterns occur in functionally different proteins.
These observations raise questions about folding patterns:
- What is the physical reason for the simplicity and regularity of typical folding patterns?
- Which folding patterns are most probable in the light of the laws of protein physics (theoretically possible folding patterns)?
- How numerous are these patterns?
- To what extent do they coincide with the folding patterns observed in native proteins?
To answer these questions, we will first of all study the stability of various structures. Structural observation shows that the stability of a domain requires:
- close packing of α-helices and β-sheets;
- α- and β-regions that extend from one edge of the globule to the other;
- irregular regions located outside the globule.
The reason: H-bonds are energetically expensive and therefore must be saturated in any stable protein structure. Each amino acid residue can form H-bonds, either with water or within a secondary structure.
1. Thus only secondary structures (but not irregular loops) have the right to be out of contact with water and belong to the protein's interior.
2. The irregular loops should emerge at the surface.
3. α-helices and β-sheets cannot share the same layer, because in this case H-bonds of the β-sheet would be lost.
[Figure: layers built from β-sheets and α-helices]
Thus separate α- and β-layers are stable elements of the globule, while a mixed layer has an energy defect.
The large majority of domains can be represented by two-, three- or four-layer packing.
Popular structures are simple: α and β layers
Domains with more than four layers are extremely rare, and it is clear why. They would contain too many residue positions screened from water. But given the 1:1 ratio of polar and non-polar residues typical of globular proteins, many polar residues would then be brought into the interior of the globule. This is energetically very unfavorable, and such a protein would be unstable. That is why very large (and hence many-layered) globules must be unstable, so large proteins have to be divided into several domains.
We showed how a random sequence affects the properties of the 3D structure.
Globular proteins have no blocks typical of membrane proteins and no periodicity characteristic of fibrous proteins.
What do "random" or "quasi-random" sequences mean? It means that, taken as a whole, the traces of selection for forming a globular protein are not seen as clearly as the traces of selection for periodicity in fibrous proteins or the traces of selection for blocks in membrane proteins. So spatial structures are usually stabilized by the most common, random-like sequences.
The protein chain has no self-intersection. The loops usually connect antiparallel, not parallel, strands or helices, and they do not cross each other, i.e. they do not form knots in the chain.
Why? Why is it so important to avoid crossing?
[Figure: no loop crossing]
What is wrong with loop crossing? In fact, we do not mean that one loop runs into another; we only mean that one loop passes over another. The lower loop is then pressed against the core (the upper loop screens the lower loop from water) and loses some of its hydrogen bonds to water. To compensate for this loss (the energy defect), a rare specific sequence is needed.
The main conclusions
1. A defect-free structure can be stabilized by many sequences; a structure with a defect can be stabilized by only a small number of selected sequences.
2. The main condition of protein stability is the screening of non-polar groups from water and the H-bonding of all main-chain peptide groups immersed in the compact globule.
3. The standard folding patterns have a compact layered packing of α-helices and β-strands. Their connections avoid crossing. This simple and regular arrangement is most favorable for the globule's stability.
4. The number of such standard stable folding patterns is not large (there are hundreds of them, while proteins number many tens of thousands). Therefore, some of these common structures are shared by proteins that differ in all other respects.
5. Defective folding patterns (patterns that do not strictly obey all packing rules) are not prohibited either. They are rare, since only a small number of sequences can ensure their stability. The greater the defect, the lower the occurrence of "wrong" folds.
Home assignment
- The role of misfolded proteins in the cell. How does the cell respond to misfolded proteins?
- The role of molecular chaperones in protein formation.