Title: Protein threading algorithms
1 Protein threading algorithms
Presented by Jian Qiu
- GenTHREADER: Jones, D. T. JMB (1999) 287, 797-815
- Protein Fold Recognition by Prediction-based Threading: Rost, B., Schneider, R., Sander, C. JMB (1997) 270, 471-480
2 Why do we need protein threading?
- To detect remote homologues → genome annotation
  - Structures are better conserved than sequences.
  - Remote homologues with low sequence similarity may share significant structural similarity.
- To predict protein structure based on a structural template
  - Protein A shares structural similarity with protein B.
  - We could model the structure of protein A using the structure of protein B as a starting point.
3 A successful example by GenTHREADER
- ORF MG276 from Mycoplasma genitalium was predicted to share structural similarity with 1HGX.
- MG276 shares low sequence similarity (10% sequence identity) with 1HGX.
- Supporting evidence
  - MG276 is annotated as an adenine phosphoribosyltransferase, based on high sequence similarity to the Escherichia coli protein.
  - 1HGX is a hypoxanthine-guanine-xanthine phosphoribosyltransferase from the protozoan parasite Tritrichomonas foetus.
  - Four functionally important residues in 1HGX are conserved in MG276.
  - The secondary structure prediction for ORF MG276 agrees very well with the observed secondary structure of 1HGX.
4 Structure of 1HGX
5 Functional residue conservation between 1HGX and MG276
6 GenTHREADER Protocol
Sequence alignment
- For each template structure in the fold library, related sequences were collected using the program BLASTP.
- A multiple sequence alignment of these sequences was generated with a simplified version of MULTAL.
- The optimal alignment between the target sequence and the sequence profile of a template structure is found by dynamic programming.
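The dynamic programming step can be sketched as a global (Needleman-Wunsch style) alignment of a target sequence against a position-specific profile. This is only a minimal illustration, not GenTHREADER's actual implementation: the function name `align_to_profile`, the dictionary-based profile, the default mismatch score of -1, and the linear gap penalty are all assumptions.

```python
def align_to_profile(target, profile, gap=-2.0):
    """Globally align `target` (a string) to `profile`.

    `profile` is a list with one dict per template position, mapping
    residue -> score (hypothetical format). Residues missing from a
    dict score -1.0. Returns (score, pairs), where pairs lists the
    matched (target_index, profile_index) positions.
    """
    n, m = len(target), len(profile)
    # dp[i][j] = best score for aligning target[:i] with profile[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = profile[j - 1].get(target[i - 1], -1.0)
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # match/mismatch
                           dp[i - 1][j] + gap,     # gap in template
                           dp[i][j - 1] + gap)     # gap in target
    # Trace back from the bottom-right corner to recover matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = profile[j - 1].get(target[i - 1], -1.0)
        if dp[i][j] == dp[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Toy profile with three template positions preferring A, C and D.
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
score, pairs = align_to_profile("AD", toy_profile)
```

In this toy case the target "AD" skips the middle template position, so the alignment pairs A with position 0 and D with position 2.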
7 Threading Potentials
Pairwise potential (the pairwise model family)
- k: sequence separation
- s: distance interval
- m_ab: number of pairs ab observed with sequence separation k
- σ: weight given to each observation
- f_k(s): frequency of occurrence of all residue pairs
- f_k^ab(s): frequency of occurrence of residue pair ab
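The formula itself did not survive on this slide. The sketch below assumes the sparse-data-corrected inverse Boltzmann form in which these symbols usually appear, E_k^ab(s) = RT ln[1 + m_ab σ] - RT ln[1 + m_ab σ f_k^ab(s) / f_k(s)]. Treat the formula, the function name `pairwise_energy`, and the default values for σ and RT as assumptions made for illustration.

```python
import math


def pairwise_energy(m_ab, f_ab_ks, f_ks, sigma=0.02, rt=0.582):
    """Pair energy for residue pair ab at separation k, distance bin s.

    m_ab    : number of ab pairs observed at sequence separation k
    f_ab_ks : frequency of pair ab in distance interval s
    f_ks    : frequency of all residue pairs in distance interval s
    sigma   : weight given to each observation (illustrative default)
    rt      : RT in kcal/mol (illustrative default)
    """
    if f_ks <= 0:
        raise ValueError("f_ks must be positive")
    return (rt * math.log(1.0 + m_ab * sigma)
            - rt * math.log(1.0 + m_ab * sigma * f_ab_ks / f_ks))


# A pair seen more often than average at this distance gets a
# favourable (negative) energy; a rarer pair gets a positive one.
favourable = pairwise_energy(m_ab=100, f_ab_ks=0.20, f_ks=0.10)
unfavourable = pairwise_energy(m_ab=100, f_ab_ks=0.05, f_ks=0.10)
```

With few observations (small m_ab) the two logarithms nearly cancel, so the energy shrinks toward zero; that is the point of weighting by σ.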
8 Solvation potential (the profile model family)
- r: the degree of residue burial, i.e. the number of other Cβ atoms located within 10 Å of the residue's Cβ atom
- f_a(r): frequency of occurrence of residue a with burial r
- f(r): frequency of occurrence of all residues with burial r
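As a rough illustration of these definitions, the sketch below counts Cβ neighbours to get the burial r and then converts the two frequencies into an energy. The energy form E_a(r) = -RT ln[f_a(r) / f(r)] is an assumption (the formula is missing from the slide), as are the function names, the coordinate format, and the RT value.

```python
import math


def burial(cb_coords, index, cutoff=10.0):
    """Number of other C-beta atoms within `cutoff` angstroms of the
    C-beta atom of residue `index`. `cb_coords` is a list of (x, y, z)."""
    centre = cb_coords[index]
    return sum(1 for j, xyz in enumerate(cb_coords)
               if j != index and math.dist(centre, xyz) <= cutoff)


def solvation_energy(f_a_r, f_r, rt=0.582):
    """Assumed form: -RT * ln(f_a(r) / f(r)).

    f_a_r : frequency of residue type a with burial r
    f_r   : frequency of all residues with burial r
    """
    if f_a_r <= 0 or f_r <= 0:
        raise ValueError("frequencies must be positive")
    return -rt * math.log(f_a_r / f_r)


# Toy coordinates: residue 0 has two neighbours within 10 angstroms.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 9.0, 0.0), (20.0, 0.0, 0.0)]
r0 = burial(coords, 0)
```

A residue type that is over-represented at a given burial level (f_a(r) > f(r)) gets a negative, favourable energy at that burial.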
9 Variables considered to predict the relationship
- Pairwise energy score
- Solvation energy score
- Sequence alignment score
- Sequence alignment length
- Length of the structure
- Length of the target sequence
10 Artificial Neural Network
A node
11 Neural network architecture in GenTHREADER
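The network diagrams are not reproduced here, so the following is a generic feed-forward sketch rather than the GenTHREADER network itself. It assumes each node applies a sigmoid to a weighted sum of its inputs and that the six variables from slide 9 form the input vector. The hidden-layer size, the weights, and the feature values are made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def node(inputs, weights, bias):
    """One node: sigmoid of the weighted sum of its inputs plus a bias."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(features, hidden_layer, output_node):
    """Feed `features` through one hidden layer and a single output node.

    hidden_layer : list of (weights, bias) tuples, one per hidden node
    output_node  : (weights, bias) for the output node
    """
    hidden = [node(features, w, b) for w, b in hidden_layer]
    w_out, b_out = output_node
    return node(hidden, w_out, b_out)


# Hypothetical, pre-scaled inputs in the order of slide 9: pairwise energy,
# solvation energy, alignment score, alignment length, structure length,
# target sequence length.
features = [0.3, 0.1, 0.8, 0.6, 0.5, 0.5]
hidden_layer = [([0.5, -0.2, 1.0, 0.3, 0.0, 0.1], 0.0),
                ([-0.4, 0.6, 0.2, -0.1, 0.3, 0.0], 0.1)]
output_node = ([1.2, -0.7], 0.0)
network_score = forward(features, hidden_layer, output_node)
```

The output always lies between 0 and 1, which is what allows a single network score to be mapped onto confidence levels.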
12 The effects of sequence alignment score and pairwise potential on the network output
13 Confidence level with different network scores
- Low
- Medium (80%)
- High (99%)
- Certain (100%)
14 Genome analysis of Mycoplasma genitalium
All 468 ORFs were analyzed within one day.
15 Distribution of protein folds in M. genitalium
16 PHD: Predict 1D structure from sequence
Sequence → MaxHom → Multiple Sequence Alignment → PHDsec, PHDacc
- PHDsec: secondary structure, H (helix), E (strand), L (rest)
- PHDacc: solvent accessibility, Buried (<15), Exposed (>15)
17 Threading Protocol
18 Similarity matrix in dynamic programming
- Purely structural similarity matrix
  - six states (combination of three secondary structure states and two solvent accessibility states)
- Purely sequence similarity matrix
  - McLachlan or Blosum62
- Combination of structure and sequence similarity matrices
  - M_ij = m × M_ij(1D structure) + (100 - m) × M_ij(sequence)
  - m = 0: sequence alignment only; m = 100: 1D structure alignment only
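The weighting between the two matrices can be sketched as follows. This is a toy illustration under stated assumptions: the state labels (such as "Hb" for buried helix), the dictionary-based matrices, and all scores are hypothetical, and real runs would use a six-state structure matrix together with McLachlan or Blosum62.

```python
# Six 1D-structure states: three secondary structure states (H, E, L)
# combined with two accessibility states (b = buried, e = exposed).
STATES = [ss + acc for ss in "HEL" for acc in "be"]


def combined_score(struct_score, seq_score, m):
    """m * struct_score + (100 - m) * seq_score, with 0 <= m <= 100."""
    if not 0 <= m <= 100:
        raise ValueError("m must lie between 0 and 100")
    return m * struct_score + (100 - m) * seq_score


def similarity_matrix(target, template, struct_matrix, seq_matrix, m):
    """Build M[i][j] for target position i and template position j.

    target, template : lists of (residue, state) tuples
    struct_matrix    : dict mapping (state_i, state_j) -> score
    seq_matrix       : dict mapping (residue_i, residue_j) -> score
    """
    return [[combined_score(struct_matrix[(si, sj)], seq_matrix[(ri, rj)], m)
             for (rj, sj) in template]
            for (ri, si) in target]


# Toy matrices with made-up values.
toy_struct = {(a, b): (1 if a == b else 0) for a in STATES for b in STATES}
toy_seq = {(a, b): (2 if a == b else -1) for a in "AG" for b in "AG"}
target = [("A", "Hb"), ("G", "Le")]
template = [("A", "Hb"), ("A", "Ee")]
M = similarity_matrix(target, template, toy_struct, toy_seq, m=50)
```

The resulting matrix M is what the dynamic programming alignment consumes; setting m to 0 or 100 reduces it to the pure sequence or pure 1D structure case.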
19 Performance of the algorithm
20 Results on the 11 targets of CASP1
- Correctly detected the remote homologues at first rank in four cases
- Average percentage of correctly aligned residues: 21%
- Average shift: nine residues
- Best performing methods in CASP1
  - Expert-driven usage of THREADER by David Jones and colleagues detected five out of nine proteins correctly at first rank.
  - Best alignments of the potential-based threading method by Manfred Sippl and colleagues were clearly better than the best ones of this algorithm.
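The two alignment-quality numbers above can be illustrated with a small sketch. The slides do not define the metrics, so the definitions used here are assumptions: "correctly aligned" means a target residue is paired with the same template position as in the reference alignment, and "shift" is the absolute offset between the predicted and reference template positions, averaged over residues aligned in both.

```python
def alignment_quality(predicted, reference):
    """Compare two alignments given as dicts: target index -> template index.

    Returns (percent_correct, mean_shift). mean_shift is None when the
    two alignments share no aligned target residues.
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    shared = [i for i in reference if i in predicted]
    correct = sum(1 for i in shared if predicted[i] == reference[i])
    percent_correct = 100.0 * correct / len(reference)
    if not shared:
        return percent_correct, None
    mean_shift = sum(abs(predicted[i] - reference[i]) for i in shared) / len(shared)
    return percent_correct, mean_shift


# Toy example: two of four residues are placed correctly,
# the other two are off by two template positions each.
reference = {0: 0, 1: 1, 2: 2, 3: 3}
predicted = {0: 0, 1: 3, 2: 4, 3: 3}
percent_correct, mean_shift = alignment_quality(predicted, reference)
```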