Protein threading algorithms - PowerPoint PPT Presentation

1
Protein threading algorithms
Presented by Jian Qiu
  • GenTHREADER: Jones, D. T. JMB (1999) 287, 797-815
  • Protein Fold Recognition by Prediction-based Threading:
    Rost, B., Schneider, R. & Sander, C. JMB (1997) 270, 471-480

2
Why do we need protein threading?
  • To detect remote homologues → genome annotation
  • Structures are better conserved than sequences.
  • Remote homologues with low sequence similarity may share
    significant structural similarity.
  • To predict protein structure based on a structural template
  • If protein A shares structural similarity with protein B, we
    could model the structure of protein A using the structure of
    protein B as a starting point.

3
A successful example by GenTHREADER
  • ORF MG276 from Mycoplasma genitalium was predicted to share
    structural similarity with 1HGX.
  • MG276 shares low sequence similarity (10% sequence identity)
    with 1HGX.
  • Supporting evidence:
  • MG276 is annotated as an adenine phosphoribosyltransferase,
    based on high sequence similarity to the Escherichia coli
    protein.
  • 1HGX is a hypoxanthine-guanine-xanthine
    phosphoribosyltransferase from the protozoan parasite
    Tritrichomonas foetus.
  • Four functionally important residues in 1HGX are conserved in
    MG276.
  • The secondary structure prediction for ORF MG276 agrees very
    well with the observed secondary structure of 1HGX.

4
Structure of 1HGX
5
Functional residue conservation between 1HGX and MG276
6
GenTHREADER Protocol
Sequence alignment
  • For each template structure in the fold library, related
    sequences are collected with the program BLASTP.
  • A multiple sequence alignment of these sequences is generated
    with a simplified version of MULTAL.
  • The optimal alignment between the target sequence and the
    sequence profile of a template structure is found by dynamic
    programming.
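
The last step above can be sketched as a global (Needleman-Wunsch) alignment of a sequence against a profile. This is only an illustration under stated assumptions: the profile representation (one dict of residue scores per template position), the linear gap penalty, and the default score for unseen residues are all hypothetical and are not taken from GenTHREADER.

```python
def align_to_profile(target, profile, gap=-2.0, missing=-1.0):
    """Globally align a sequence to a profile by dynamic programming.

    profile[j] maps residue letters to the score of placing that residue
    at template position j (hypothetical format). Residues absent from a
    column score `missing`. Returns (best score, aligned index pairs).
    """
    n, m = len(target), len(profile)
    # score[i][j] = best score for aligning target[:i] with profile[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = profile[j - 1].get(target[i - 1], missing)
            score[i][j] = max(score[i - 1][j - 1] + s,   # match
                              score[i - 1][j] + gap,     # gap in template
                              score[i][j - 1] + gap)     # gap in target
    # Trace back from the corner to recover the aligned positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = profile[j - 1].get(target[i - 1], missing)
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy profile with three template positions.
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
best, aligned = align_to_profile("AD", toy_profile)
```

In the toy call, the target skips the middle template position, so the alignment pairs target residue 0 with position 0 and target residue 1 with position 2.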

7
Threading Potentials
Pairwise potential (the pairwise model family)
  • $k$: sequence separation
  • $s$: distance interval
  • $m_{ab}$: number of pairs ab observed with sequence separation k
  • $\sigma$: weight given to each observation
  • $f_k(s)$: frequency of occurrence of all residue pairs
  • $f_k^{ab}(s)$: frequency of occurrence of residue pair ab
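
A minimal numeric sketch of how these quantities combine, assuming the potential takes the Sippl-style form E = RT ln(1 + m_ab * sigma) - RT ln(1 + m_ab * sigma * f_ab / f_all), where f_ab and f_all are the pair-specific and all-pair frequencies for a given sequence separation and distance interval. The form is an assumption, and the default values of sigma and RT are illustrative only.

```python
import math


def pairwise_potential(m_ab, f_ab, f_all, sigma=0.02, rt=0.582):
    """Pair energy for residue pair ab in one distance interval.

    m_ab  : number of ab pairs observed at this sequence separation
    f_ab  : frequency of pair ab in the distance interval
    f_all : frequency of all residue pairs in the same interval
    sigma : weight given to each observation (illustrative default)
    rt    : RT in kcal/mol (illustrative default)
    """
    if f_all <= 0:
        raise ValueError("f_all must be positive")
    reference = rt * math.log(1.0 + m_ab * sigma)
    observed = rt * math.log(1.0 + m_ab * sigma * f_ab / f_all)
    return reference - observed


# A pair seen more often than average in this interval is favourable.
favourable = pairwise_potential(m_ab=100, f_ab=0.2, f_all=0.1)
```

With few observations (small m_ab) both logarithms approach zero, so sparsely sampled pairs contribute little energy either way.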
8
Solvation potential (the profile model family)
  • $r$: the degree of residue burial, i.e. the number of other Cβ
    atoms located within 10 Å of the residue's Cβ atom
  • $f_a(r)$: frequency of occurrence of residue a with burial r
  • $f(r)$: frequency of occurrence of all residues with burial r
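
As a sketch of these definitions, the burial count and a log-odds energy can be computed as follows. The energy form E = -RT ln(f_a(r) / f(r)) is assumed here as the usual inverse-Boltzmann expression for these symbols, and the RT default is illustrative.

```python
import math


def burial(cb_coords, index, cutoff=10.0):
    """Count the other C-beta atoms within `cutoff` angstroms of one residue."""
    center = cb_coords[index]
    return sum(1 for j, other in enumerate(cb_coords)
               if j != index and math.dist(center, other) <= cutoff)


def solvation_potential(f_a_r, f_r, rt=0.582):
    """Log-odds energy of residue a at burial r (assumed form)."""
    if f_a_r <= 0 or f_r <= 0:
        raise ValueError("frequencies must be positive")
    return -rt * math.log(f_a_r / f_r)


# Four hypothetical C-beta positions (angstroms).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 9.0, 0.0), (20.0, 0.0, 0.0)]
r0 = burial(coords, 0)
```

A residue type that occurs at a given burial more often than residues in general gets a negative (favourable) energy.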
9
Variables considered to predict the relationship
  • Pairwise energy score
  • Solvation energy score
  • Sequence alignment score
  • Sequence alignment length
  • Length of the structure
  • Length of the target sequence

10
Artificial Neural Network
A node
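
A single node can be sketched as a weighted sum of its inputs passed through a squashing function. The logistic activation, the layer layout, and all weights below are assumptions for illustration; they are not the trained GenTHREADER network.

```python
import math


def node(inputs, weights, bias):
    """One node: logistic function of the weighted input sum."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(inputs, hidden_layer, output_node):
    """Feed inputs through one hidden layer and a single output node.

    hidden_layer is a list of (weights, bias) tuples, one per hidden
    node; output_node is a single (weights, bias) tuple.
    """
    hidden = [node(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_node
    return node(hidden, w_out, b_out)


# Hypothetical two-input example with two hidden nodes.
out = forward([0.5, -1.0],
              [([1.0, 0.5], 0.0), ([-0.5, 1.0], 0.1)],
              ([1.0, -1.0], 0.0))
```

The output always lies between 0 and 1, which is what allows a network score to be read as a confidence value.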
11
Neural network architecture in GenTHREADER
12
The effects of sequence alignment score and pairwise potential on
the network output
13
Confidence level with different network scores
Low, Medium (80%), High (99%), Certain (100%)
14
Genome analysis of Mycoplasma genitalium
All the 468 ORFs were analyzed within one day.
15
Distribution of protein folds in M. genitalium
16
PHD: Predict 1D structure from sequence
Sequence → MaxHom → Multiple Sequence Alignment → PHDsec, PHDacc
  • Secondary structure (PHDsec): H (helix), E (strand), L (rest)
  • Solvent accessibility (PHDacc): Buried (<15%), Exposed (≥15%)
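
The two predicted 1D strings can be combined per residue into a single state label. A minimal sketch, assuming relative accessibility is given in percent and using the 15% cut-off from this slide; the label format ("Hb", "Ee", ...) is hypothetical.

```python
SECONDARY_STATES = {"H", "E", "L"}


def one_d_states(secondary, accessibility, threshold=15.0):
    """Combine secondary structure and accessibility into per-residue states.

    secondary     : string over H (helix), E (strand), L (rest)
    accessibility : relative solvent accessibility in percent, per residue
    Returns labels such as "Hb" (buried helix) or "Ee" (exposed strand).
    """
    if len(secondary) != len(accessibility):
        raise ValueError("inputs must have the same length")
    states = []
    for ss, acc in zip(secondary, accessibility):
        if ss not in SECONDARY_STATES:
            raise ValueError(f"unknown secondary structure state: {ss!r}")
        # Buried below the threshold, exposed at or above it.
        states.append(ss + ("e" if acc >= threshold else "b"))
    return states


labels = one_d_states("HEL", [5.0, 15.0, 40.0])
```

Three secondary structure states times two accessibility states give six possible labels per residue.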
17
Threading Protocol
18
Similarity matrix in dynamic programming
  • Purely structural similarity matrix: six states (the
    combination of three secondary structure states and two
    solvent accessibility states)
  • Purely sequence similarity matrix: McLachlan or Blosum62
  • Combination of the structure and sequence similarity matrices:

$M_{ij} = m \cdot M_{ij}^{\text{1D structure}} + (100 - m) \cdot M_{ij}^{\text{sequence}}$
m = 0: sequence alignment only; m = 100: 1D structure alignment
only
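
The mixing rule can be sketched directly. The function names and the list-of-lists matrix layout are illustrative assumptions; the weighting follows the formula on this slide, with m running from 0 to 100.

```python
def combined_score(struct_score, seq_score, m):
    """Mix a 1D-structure score and a sequence score with weight m (0..100)."""
    if not 0 <= m <= 100:
        raise ValueError("m must lie between 0 and 100")
    return m * struct_score + (100 - m) * seq_score


def combined_matrix(struct_matrix, seq_matrix, m):
    """Apply the mixing rule element-wise to two equally sized matrices."""
    return [[combined_score(a, b, m) for a, b in zip(row_s, row_q)]
            for row_s, row_q in zip(struct_matrix, seq_matrix)]


# Toy 2 x 2 example: rows are target positions, columns template positions.
mixed = combined_matrix([[1.0, 0.0], [0.0, 1.0]],
                        [[4.0, -1.0], [-1.0, 4.0]],
                        m=50)
```

The resulting matrix can then be used as the similarity matrix in a dynamic programming alignment.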
19
Performance of the algorithm
20
Results on the 11 targets of CASP1
  • Correctly detected the remote homologues at first rank in four
    cases
  • Average percentage of correctly aligned residues: 21%
  • Average shift: nine residues
  • Best-performing methods in CASP1:
  • Expert-driven usage of THREADER by David Jones and colleagues
    detected five out of nine proteins correctly at first rank.
  • The best alignments of the potential-based threading method by
    Manfred Sippl and colleagues were clearly better than the best
    ones of this algorithm.