Title: Homework grades
Homework grades
Score range | 0-14 (D) | 15-19 (C) | 20-24 (B) | 25-30 (A)
HW1         | 5        | 1         | 9         | 5
HW2         | 0        | 6         | 4         | 9
Protein folding. Anfinsen's experiments.
Assumption: the amino acid sequence completely and uniquely determines the protein's tertiary structure. Protein folding problem: find the native conformation among the large number of alternative conformations. Example: a polypeptide chain of 100 residues can have 9^100 different conformations.
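The estimate of 9^100 conformations corresponds to roughly 9 alternative conformations per residue, multiplied over 100 residues. A quick calculation shows why exhaustive search is hopeless. The sampling rate of 10^13 conformations per second is an assumption chosen only for illustration.

```python
# Size of the conformational space of a 100-residue chain, assuming
# ~9 conformations per residue (the basis of the 9^100 estimate).
n_residues = 100
states_per_residue = 9
n_conformations = states_per_residue ** n_residues  # exact integer

# ASSUMPTION: a hypothetical search that checks 1e13 conformations per second.
rate_per_second = 1e13
seconds_per_year = 365.25 * 24 * 3600
years = n_conformations / rate_per_second / seconds_per_year

print(f"{n_conformations:.2e} conformations")  # 2.66e+95 conformations
print(f"{years:.1e} years to enumerate them all")
```

Even at this generous rate the enumeration would take vastly longer than the age of the universe, which is why folding cannot be a random search over all conformations.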
Protein folding step by step.
Extended chain → disordered globule (hydrophobic residues inside, hydrophilic residues outside) → native, highly ordered conformation.
- Factors that influence the thermodynamics and kinetics of protein folding:
  - size, amino acid content, hydrophobic/hydrophilic content
  - strength of intramolecular interactions, number of S-S bonds
  - domain architecture
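Hydrophobic/hydrophilic content is one of the listed factors. A minimal sketch of how it can be quantified for a sequence; the set of residues counted as hydrophobic is an assumption, since published hydrophobicity scales disagree on borderline residues.

```python
# ASSUMPTION: a simple, commonly used set of hydrophobic residues
# (one-letter codes); other scales classify some residues differently.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues in `sequence` that belong to HYDROPHOBIC."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(residue in HYDROPHOBIC for residue in sequence) / len(sequence)


print(hydrophobic_fraction("ATPIIGGLPY"))  # 0.4 (A, I, I, L)
```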
Fold recognition.
- Unsolved problem: direct prediction of protein structure from physico-chemical principles.
- Solved problem: recognizing which of the known folds are similar to the fold of an unknown protein.
- Fold recognition is based on two observations/assumptions:
  - The overall number of different protein folds is limited (1000-3000 folds).
  - The native protein structure is in its ground state (minimum energy).
Definition of protein folds.
- Protein fold: arrangement of secondary structures into a unique topology/tertiary structure.
- Examples of alpha/beta proteins:
  - TIM beta/alpha-barrel: contains a parallel beta-sheet barrel, closed; n = 8, S = 8 (n: number of strands, S: shear number); strand order 12345678; surrounded by alpha-helices.
  - NAD(P)-binding Rossmann-fold domains: core of 3 layers, alpha/beta/alpha; parallel beta-sheet of 6 strands, order 321456.
Protein structure prediction.
- Prediction of the three-dimensional structure of a protein from its sequence.
- Different approaches:
  - Homology modeling (the predicted structure has a very close homolog in the structure database).
  - Fold recognition (the predicted structure has an existing fold).
  - Ab initio prediction (the predicted structure has a new fold).
Homology modeling.
- Aims to produce protein models with accuracy close to that of experimental structures, and is used for:
  - Protein structure prediction
  - Drug design
  - Prediction of functionally important sites (active or binding sites)
Steps of homology modeling.
- Template recognition and initial alignment.
- Backbone generation.
- Loop modeling.
- Side-chain modeling.
- Model optimization.
- Model validation.
1. Template recognition.
- Recognition of similarity between the target and the template.
  - Target: protein with unknown structure.
  - Template: protein with known structure.
- Main difficulty: deciding which template to pick when there are multiple choices (template structures).
- A template structure can be found by searching the PDB with sequence-sequence alignment methods.
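A minimal sketch of template recognition by sequence-sequence alignment: a Needleman-Wunsch global alignment against each candidate template, keeping the template with the highest sequence identity. The match/mismatch/gap scores, the identity convention, and the template IDs are assumptions for illustration; real searches (e.g. BLAST against the PDB) use substitution matrices, affine gaps, and statistical significance rather than raw identity.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment of a and b; returns the gapped strings.

    ASSUMPTION: toy scoring (match/mismatch/linear gap penalty) instead of a
    substitution matrix such as BLOSUM62."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical positions / gap-free columns, in percent (one of several conventions)."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def pick_template(target: str, templates: dict[str, str]) -> tuple[str, float]:
    """Return (template_id, percent identity) of the most similar template."""
    identities = {tid: percent_identity(*global_alignment(target, seq)) for tid, seq in templates.items()}
    best = max(identities, key=identities.get)
    return best, identities[best]


# HYPOTHETICAL template IDs and sequences, for illustration only.
print(pick_template("AHYATPTTT", {"tmpl_1": "AHTPSS", "tmpl_2": "GWWKE"}))
```

Ranking by identity alone is the simplest choice; it ignores alignment length, which is exactly what the two-zone plot on the next slide adds back in.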
Two zones of sequence alignment. Two sequences are very likely to fold into the same structure if their alignment length and sequence identity fall into the safe zone.
[Figure: sequence identity (%, up to 100) vs. alignment length (residues, up to 200); the safe homology modeling zone lies at high identity, the twilight zone at lower identity.]
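The boundary between the two zones is a length-dependent identity threshold: short alignments need much higher identity to be trusted. A minimal sketch, assuming an HSSP-style curve t(L) = 290.15 · L^(-0.562) for L ≤ 80 residues and a flat 24.8% beyond, capped at 100%. These constants are recalled from Sander and Schneider's HSSP work and should be treated as an assumption; the slide's figure does not give its exact curve.

```python
def hssp_threshold(length: int) -> float:
    """Minimum % sequence identity for the 'safe' zone at a given alignment
    length. ASSUMPTION: HSSP-style curve 290.15 * L**-0.562 for L <= 80,
    flat 24.8% for longer alignments, capped at 100%."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length > 80:
        return 24.8
    return min(100.0, 290.15 * length ** -0.562)


def in_safe_zone(identity_percent: float, length: int) -> bool:
    """True if (length, identity) lies on or above the threshold curve."""
    return identity_percent >= hssp_threshold(length)


for identity, length in [(50, 150), (20, 150), (50, 20)]:
    print(identity, length, in_safe_zone(identity, length))
# 50 150 True; 20 150 False; 50 20 False (threshold at L = 20 is ~53.9%)
```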
2. Backbone generation.
- Once the alignment between target and template is ready, copy the backbone coordinates of the template residues that are aligned.
- If two aligned residues are identical, copy their side-chain coordinates as well.
3. Insertions and deletions.
Example alignment (the gapped region is an insertion in the upper sequence or, equivalently, a deletion in the lower one):
    AHYATPTTT
    AH---TPSS
- Insertions and deletions occur mostly between secondary structure elements, in loop regions. Loop conformations are difficult to predict.
- Approaches to loop modeling:
  - Knowledge-based: search the PDB for loops with known structure.
  - Energy-based: an energy function is used to evaluate the quality of a loop, which is then optimized by energy minimization or Monte Carlo.
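A minimal sketch of the energy-based approach with Metropolis Monte Carlo. The loop is represented only by a list of torsion angles and the energy function is a hypothetical toy (each angle prefers one value); a real loop energy would include sterics, hydrogen bonds, solvation and a loop-closure constraint.

```python
import math
import random


def loop_energy(angles: list[float], preferred: list[float]) -> float:
    """HYPOTHETICAL toy energy: penalize each torsion angle (radians) for
    deviating from a preferred value."""
    return sum(1.0 - math.cos(a - p) for a, p in zip(angles, preferred))


def metropolis_loop_search(angles, preferred, steps=5000, kT=0.1, step_size=0.3, seed=0):
    """Metropolis Monte Carlo over torsion angles; returns the best angles seen."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = loop_energy(current, preferred)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(len(trial))  # perturb one randomly chosen angle
        trial[k] += rng.gauss(0.0, step_size)
        e_trial = loop_energy(trial, preferred)
        # Metropolis criterion: accept downhill moves always,
        # uphill moves with probability exp(-dE / kT).
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


start, preferred = [0.0, 0.0, 0.0], [-1.0, 2.0, 0.5]
best_angles, best_energy = metropolis_loop_search(start, preferred)
print(loop_energy(start, preferred), best_energy)
```

Accepting some uphill moves is what lets Monte Carlo escape local minima, unlike plain energy minimization.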
4. Side-chain modeling.
- Side-chain conformations are called rotamers. In similar proteins, side chains have similar conformations.
- If identity is high, side-chain conformations can be copied from the template to the target.
- If identity is not very high, side chains are modeled using rotamer libraries, and the different rotamers are scored with energy functions.
[Figure: candidate rotamers with energies E1, E2, E3; the chosen rotamer has E = min(E1, E2, E3).]
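The rule E = min(E1, E2, E3) can be sketched as a per-residue minimum over a rotamer library. The rotamer names, energies and the energy callback are hypothetical placeholders; scoring residues independently is a simplification, since real side-chain packers also score interactions between neighboring rotamers.

```python
def pick_rotamers(residues, rotamer_library, energy):
    """For each (position, residue_type), choose the rotamer with the lowest
    energy, i.e. E = min(E1, E2, E3, ...). Residues are scored independently."""
    chosen = {}
    for position, residue_type in residues:
        candidates = rotamer_library[residue_type]
        chosen[position] = min(candidates, key=lambda rot: energy(position, rot))
    return chosen


# HYPOTHETICAL rotamer names and energies (arbitrary units).
library = {"L": ["r1", "r2", "r3"], "V": ["r1", "r2"]}
energies = {
    (5, "r1"): -1.2, (5, "r2"): -2.5, (5, "r3"): 0.3,  # E1, E2, E3 for Leu 5
    (9, "r1"): 0.4, (9, "r2"): -0.1,                    # Val 9
}
print(pick_rotamers([(5, "L"), (9, "V")], library, lambda pos, rot: energies[(pos, rot)]))
# {5: 'r2', 9: 'r2'}
```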
5. Model optimization.
- Energy optimization of the entire structure.
- Since the backbone conformation depends on the side-chain conformations and vice versa, an iterative approach is used.
[Figure: cycle of predict rotamers → shift in backbone → predict rotamers → ...]
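The iteration can be sketched as alternating minimization: fix the backbone and optimize side chains, then fix side chains and shift the backbone, until the energy stops changing. Here both are reduced to single numbers and the coupled energy is a hypothetical toy chosen so that each step has an exact minimizer.

```python
def toy_energy(backbone: float, side_chain: float) -> float:
    """HYPOTHETICAL coupled energy: the backbone prefers 2.0 and the side
    chain prefers to match the backbone."""
    return (backbone - 2.0) ** 2 + (side_chain - backbone) ** 2


def optimize(backbone: float = 0.0, side_chain: float = 0.0, tol: float = 1e-10, max_iter: int = 1000):
    """Alternate the two partial minimizations until the energy change < tol."""
    energy = toy_energy(backbone, side_chain)
    for iteration in range(1, max_iter + 1):
        side_chain = backbone              # "predict rotamers": exact minimizer for fixed backbone
        backbone = 1.0 + side_chain / 2.0  # "shift in backbone": exact minimizer for fixed side chain
        new_energy = toy_energy(backbone, side_chain)
        if abs(energy - new_energy) < tol:
            return backbone, side_chain, new_energy, iteration
        energy = new_energy
    return backbone, side_chain, energy, max_iter


print(optimize())
```

Each half-step can only lower the energy, so the cycle converges, though in general only to a local minimum.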
6. Model validation.
- Correct bond lengths and bond angles.
- Correct placement of functionally important sites.
- >> 3.8 Angstroms
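One geometric check can be automated. A minimal sketch, assuming the 3.8 Å value refers to the standard distance between consecutive Cα atoms in a trans peptide chain (the slide text around that value is incomplete); the 0.2 Å tolerance is an arbitrary assumption. Cis peptide bonds (for example some prolines) give shorter Cα-Cα distances, about 2.9 Å, and would be flagged too.

```python
import math

CA_CA_DISTANCE = 3.8  # Angstroms, consecutive C-alpha atoms, trans peptide


def check_ca_spacing(ca_coords, tolerance=0.2):
    """Return (i, distance) for consecutive C-alpha pairs (i, i+1) whose distance
    deviates from 3.8 A by more than `tolerance` (ASSUMED value)."""
    problems = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - CA_CA_DISTANCE) > tolerance:
            problems.append((i, d))
    return problems


# Toy straight-line chain whose last C-alpha is placed too far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.0, 0.0, 0.0)]
print(check_ca_spacing(coords))
```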
Classwork I: Homology modeling.
- Go to NCBI Entrez and search for gi461699.
- Run a BLAST search against the PDB.
- Run a CD-Search (Conserved Domain Search).
Fold recognition.
- Goal: find in the PDB the fold that best matches a given sequence.
- Since the similarity between the target and its closest template is low, sequence-sequence alignment methods fail to find the closest match.
- Solution: threading, a sequence-structure alignment method.
Threading method for structure prediction.
- Sequence-structure alignment: the target sequence is compared to all structural templates in the database.
- Requires:
  - an alignment method (dynamic programming, Monte Carlo, ...)
  - a scoring function, which yields a relative score for each alternative alignment
Scoring function for threading.
A contact-based scoring function depends on the amino acid types of two residues and on the distance between them, whereas a sequence-sequence alignment scoring function does not depend on the distance between two residues. If the distance between two non-adjacent residues in the template is less than 8 Å, these residues make a contact.
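The contact definition translates directly into code. A minimal sketch, assuming one representative point per residue (for example its Cα atom; the slide does not say which atom is used) and treating residues as non-adjacent when they are at least 2 positions apart in sequence.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=2):
    """List residue pairs (i, j), i < j, that make a contact: distance below
    `cutoff` (8 A) and non-adjacent in sequence (j - i >= min_separation).
    ASSUMPTION: one representative point per residue, e.g. its C-alpha."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


print(contact_map([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0)]))
# [(0, 2), (0, 3), (1, 3)]
```

With a minimum separation of 2, every i, i+2 pair is almost always a contact; some scoring schemes use a larger separation to ignore such trivial local contacts.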
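Scoring function for threading.
[Figure: example with residues Ala, Trp, Tyr, Ile.]
E = \sum_{k=1}^{N} w(a_{i_k}, a_{j_k})
where (i_k, j_k) is the k-th contact in the template, N is the number of contacts, a_i is the amino acid type of the target residue aligned with position i of the template, and w is calculated from the frequency of amino acid contacts in protein structures.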
Classwork II: calculate the score for the target sequence ATPIIGGLPY aligned to the template structure defined by the contact matrix, using the contact weights w:
        A     T     P     Y     I     G     L
A    -0.2  -0.1     0  -0.1   0.5  -0.2   0.2
T           0.3  -0.1  -0.2  -0.3   0.1     0
P                -0.2  -0.4  -0.1   0.1  -0.2
Y                      -0.4  -0.2  -0.1  -0.2
I                             0.3   0.2   0.4
G                                   0.4   0.2
L                                         0.3
[Figure: template contact matrix, positions 1-10 on both axes.]
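The classwork score is a sum of w over the template's contacts. A minimal sketch using the weight table above (symmetric, stored as its upper triangle). The contact list is hypothetical, because the marked contacts of the classwork matrix are not reproduced here; substitute the real contacts to get the classwork answer.

```python
# Contact weights w from the classwork table (upper triangle; w is symmetric).
W_TABLE = {
    "A": {"A": -0.2, "T": -0.1, "P": 0.0, "Y": -0.1, "I": 0.5, "G": -0.2, "L": 0.2},
    "T": {"T": 0.3, "P": -0.1, "Y": -0.2, "I": -0.3, "G": 0.1, "L": 0.0},
    "P": {"P": -0.2, "Y": -0.4, "I": -0.1, "G": 0.1, "L": -0.2},
    "Y": {"Y": -0.4, "I": -0.2, "G": -0.1, "L": -0.2},
    "I": {"I": 0.3, "G": 0.2, "L": 0.4},
    "G": {"G": 0.4, "L": 0.2},
    "L": {"L": 0.3},
}


def w(a: str, b: str) -> float:
    """Symmetric lookup into the upper-triangular table."""
    return W_TABLE[a][b] if b in W_TABLE[a] else W_TABLE[b][a]


def threading_score(target: str, contacts) -> float:
    """E = sum of w(a_i, a_j) over template contacts (i, j), 1-based positions."""
    return sum(w(target[i - 1], target[j - 1]) for i, j in contacts)


target = "ATPIIGGLPY"
# HYPOTHETICAL contacts: (1, 4) = A-I, (3, 10) = P-Y, (5, 8) = I-L.
contacts = [(1, 4), (3, 10), (5, 8)]
print(round(threading_score(target, contacts), 3))  # 0.5
```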
GenThreader (http://bioinf.cs.ucl.ac.uk/psipred)
- Predicts secondary structure for the target sequence.
- Builds sequence profiles (PSSMs) for each template sequence.
- Uses a threading scoring function to find the best-matching profile.
- Evaluates sequence-structure alignment quality using neural networks.
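A sequence profile (PSSM) scores each residue type at each alignment position. A minimal sketch of the general idea, assuming an ungapped multiple alignment, a uniform background frequency of 1/20, a flat pseudocount and no sequence weighting; the profiles used by GenThreader (built with PSI-BLAST) are considerably more elaborate.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM: score = log2(observed frequency / background frequency).
    ASSUMPTIONS: ungapped alignment, uniform background, flat pseudocount."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background) for aa in AMINO_ACIDS})
    return pssm


def score_sequence(pssm: list[dict[str, float]], sequence: str) -> float:
    """Sum of position-specific scores for an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


pssm = build_pssm(["ACD", "ACE", "ACD"])
print(score_sequence(pssm, "ACD"), score_sequence(pssm, "WWW"))
```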
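Evaluation of model accuracy.
Root mean square deviation (RMSD):
RMSD = \sqrt{\frac{1}{M} \sum_{i<j} \left( d_{ij}(A) - d_{ij}(B) \right)^2}
where d_ij(A) is the distance between residues i and j in template structure A, d_ij(B) is the distance between residues i and j in the predicted structure of target sequence B, and M is the number of residue pairs in the sum. Residues i and j in template structure A are aligned to residues i and j in predicted structure B.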
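This distance-based RMSD compares the internal distance patterns of the two structures. A minimal sketch, assuming each structure is given as a list of (x, y, z) coordinates (one point per residue) listed in aligned order; because only intra-structure distances are used, no superposition of A onto B is required.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """RMSD between d_ij(A) and d_ij(B) over all pairs i < j of aligned residues.
    Inputs: equally long lists of (x, y, z), in aligned order."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally long lists with at least 2 residues")
    squared = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))  # len(squared) = M pairs


print(distance_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (2, 0, 0)]))  # 1.0
```

A rigidly translated or rotated copy of a structure gives a distance RMSD of 0, since all its internal distances are unchanged.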
CASP (Critical Assessment of protein Structure Prediction) competitions.
Classwork III.
- Go to http://bioinf.cs.ucl.ac.uk/psipred
- Go over the options of the protein structure prediction program.
- http://bioinf2.cs.ucl.ac.uk/psiout/271020e9dc74fead.mgen.html