Query sequence - PowerPoint PPT Presentation

Provided by: science4

Transcript and Presenter's Notes

Title: Query sequence


1
Structure-Sequence alignment: structure is
better conserved than sequence
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
2
How can we match a sequence and a structure?
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Sequence: similar sequences adopt this structure
(but remember that sequence is less conserved than
structure)
Pair interaction: how well do amino acids get along?
(Do positive residues hate positive ones? Maybe not?)
  • More:
  • Secondary structure prediction.
  • Secondary structure constraints (β-strands forming
    β-sheets).
  • Etc.

Solvation: which amino acids are buried?
3
GenTHREADER
An Efficient and Reliable Protein Fold
Recognition Method for Genomic Sequences
David T. Jones (1999)
What a good presentation! B. Raveh (2003)
4
GenTHREADER overview
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Templates
  • For each template (in the Brookhaven PDB):
  • Construct a sequence profile.
  • Align it with the query sequence.
  • Calculate structural parameters (to be
    continued).
  • Send the parameters to a well-trained neural
    network (as in PSIPRED).
  • Output: match confidence and alignment.

5
STAGE 1: Building a profile for each template
  • Start with the sequence of the template
    peptide: MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTC
  • Run BLASTP against the OWL non-redundant protein
    sequence database, with the template sequence as input.
  • Take all sequences with E-value < 0.01.
  • Align them using the MULTAL multiple sequence
    alignment method.
  • Construct a sequence profile based on the
    BLOSUM50 matrix.
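A minimal sketch of the last three steps, assuming the BLASTP hits (with their E-values) and the MULTAL alignment are already available as Python data. The substitution scores below are toy values standing in for BLOSUM50, and averaging matrix scores over each alignment column is one simple way to build a profile, not necessarily the exact scheme GenTHREADER uses; `filter_hits` and `build_profile` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Toy substitution scores for a 3-letter alphabet (hypothetical values,
# NOT the real BLOSUM50 entries); a real run would use the full BLOSUM50 matrix.
TOY_MATRIX: Dict[Tuple[str, str], float] = {}
for (x, y), s in {("A", "A"): 4, ("K", "K"): 5, ("V", "V"): 4,
                  ("A", "K"): -1, ("A", "V"): 0, ("K", "V"): -2}.items():
    TOY_MATRIX[(x, y)] = TOY_MATRIX[(y, x)] = float(s)  # keep it symmetric


def filter_hits(hits: List[Tuple[str, float]], cutoff: float = 0.01) -> List[str]:
    """Keep hit sequences whose E-value is below the cutoff (E < 0.01 on the slide)."""
    return [seq for seq, evalue in hits if evalue < cutoff]


def build_profile(msa: List[str], matrix: Dict[Tuple[str, str], float],
                  alphabet: str) -> List[Dict[str, float]]:
    """One score row per alignment column: the average substitution score
    of each residue type against the residues observed in that column."""
    profile = []
    for col in range(len(msa[0])):
        observed = [s[col] for s in msa if s[col] != "-"]  # ignore gaps
        row = {}
        for a in alphabet:
            if observed:
                row[a] = sum(matrix[(a, b)] for b in observed) / len(observed)
            else:
                row[a] = 0.0  # all-gap column: neutral scores
        profile.append(row)
    return profile
```

For example, `build_profile(filter_hits(hits), TOY_MATRIX, "AKV")` turns the surviving, already aligned hits into a list of per-column score dictionaries.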

6
STAGE 2: Align the query sequence with a profile
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Quantities taken from the alignment:
  • Alignment score
  • Length of the query sequence
  • Length of the alignment itself
  • Length of the template profile
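One way to align a sequence against such a profile is global dynamic programming with a linear gap penalty, sketched below. The gap value, the `align_to_profile` name and the choice of global rather than local alignment are assumptions for illustration; GenTHREADER's own alignment parameters are not given here.

```python
from typing import Dict, List, Tuple


def align_to_profile(seq: str, profile: List[Dict[str, float]],
                     gap: float = -2.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment of a sequence to a profile with a linear gap penalty.
    Returns the score and the aligned (sequence index, profile column) pairs."""
    n, m = len(seq), len(profile)
    # dp[i][j]: best score for aligning seq[:i] with profile[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap

    def match(i: int, j: int) -> float:
        # Profile score of residue seq[i-1] at column j-1 (0 for unknown residues)
        return profile[j - 1].get(seq[i - 1], 0.0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + match(i, j),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover the aligned pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + match(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

The slide's quantities can then be read off the result: the score, `len(seq)` for the query length, `len(pairs)` as one possible measure of alignment length, and `len(profile)` for the template profile length.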
7
STAGE 3: Calculate (some) structural parameters
In stage 2, the query sequence was aligned to a
profile of the template structure.
The aligned sequence is now threaded onto the 3D
structure of the template, and the resulting model
is used to calculate ENERGY POTENTIALS.
8
STAGE 3 structural parameters (cont.)
E-Pair (pair interaction potential)
  • An energy potential reflecting how probable the
    interactions observed in this structure are.
  • Distances and sequence separations between certain
    atoms of two different amino acids are measured
    (Cβ-Cβ, Cβ-N, Cβ-O, etc.).
  • Statistics from known structures were gathered and
    weighted.
  • The observed interactions are compared to these
    statistics.
  • An energy potential is calculated.
  • In essence, the lower E-Pair, the better.

[Figure: interaction between amino acids 39 and 157]
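The pair term boils down to a sum over residue pairs of the threaded model. This sketch assumes Cβ coordinates only (the slide also mentions Cβ-N and Cβ-O distances), a hypothetical `toy_potential` in place of the knowledge-based potentials derived from real structures, and an assumed 10 Å cut-off; `e_pair` is a hypothetical name.

```python
import math
from typing import Callable, Sequence, Tuple

Coord = Tuple[float, float, float]


def e_pair(residues: str, cb_coords: Sequence[Coord],
           potential: Callable[[str, str, int, float], float],
           max_dist: float = 10.0, min_sep: int = 1) -> float:
    """Sum a pair potential over all residue pairs of a threaded model.

    residues[i] is the query residue placed on template position i and
    cb_coords[i] is that position's Cβ coordinate. potential(a, b, sep, dist)
    returns the energy of residue types a and b at sequence separation sep
    and Cβ-Cβ distance dist."""
    total = 0.0
    n = len(residues)
    for i in range(n):
        for j in range(i + min_sep, n):
            dist = math.dist(cb_coords[i], cb_coords[j])
            if dist <= max_dist:  # ignore pairs beyond the cut-off
                total += potential(residues[i], residues[j], j - i, dist)
    return total


HYDROPHOBIC = set("AVILMFWC")


def toy_potential(a: str, b: str, sep: int, dist: float) -> float:
    # Hypothetical values: favour hydrophobic contacts, mildly penalise the rest
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.25
```

A real potential would look up an energy per (residue pair, separation class, distance bin) derived from the statistics of known structures; only the summation pattern is meant to carry over.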
9
STAGE 3 structural parameters (cont.)
E-Solv (solvation potential)
  • Degree of burial (DOB) of an amino acid: the
    number of other Cβ atoms located within 10 Å of
    the residue's Cβ atom.
  • In general, hydrophobic amino acids like to be
    buried, safely away from water.
  • Hydrophilic amino acids may prefer the outside
    world.
  • The DOB of each amino acid is calculated.
  • It is compared to its statistical occurrence.
  • $\Delta E_{\text{solv}}(AA, r) = -RT \ln\left( \frac{f(AA, r)}{f(r)} \right)$

[Figure: Cβ atoms within 10 Å of a residue's Cβ atom]
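Below, `degree_of_burial` counts neighbouring Cβ atoms within 10 Å, and `solvation_energy` evaluates the formula above from (residue type, DOB) counts. Assumptions: f(AA, r) is read as the fraction of residues of type AA that have burial r, f(r) as the fraction of all residues with burial r, RT = 0.593 kcal/mol (about 25 °C), and the counts would come from a set of known structures; both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def degree_of_burial(cb_coords: Sequence[Coord], radius: float = 10.0) -> List[int]:
    """DOB of each residue: number of other Cβ atoms within `radius` Å of its Cβ."""
    return [sum(1 for j, other in enumerate(cb_coords)
                if j != i and math.dist(ci, other) <= radius)
            for i, ci in enumerate(cb_coords)]


def solvation_energy(aa: str, dob: int, counts: Dict[Tuple[str, int], int],
                     rt: float = 0.593) -> float:
    """-RT ln(f(AA, r) / f(r)) from (residue type, DOB) counts.

    f(AA, r): fraction of residues of type aa that have burial r (assumed reading).
    f(r):     fraction of all residues that have burial r.
    The counts must include (aa, dob), otherwise the logarithm is undefined."""
    n_aa = sum(c for (a, _), c in counts.items() if a == aa)
    n_r = sum(c for (_, r), c in counts.items() if r == dob)
    total = sum(counts.values())
    f_aa_r = counts[(aa, dob)] / n_aa
    f_r = n_r / total
    return -rt * math.log(f_aa_r / f_r)
```

With counts where leucine is over-represented among buried residues, the energy for a buried leucine comes out negative (favourable) and for a buried lysine positive, matching the hydrophobic/hydrophilic intuition above.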
10
STAGE 4: Send it all to the (trained) neural
network
Output: a score between 0 and 1, translated into a
confidence level (LOW, MEDIUM, HIGH, CERTAIN)
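A sketch of what this final stage computes: a small feed-forward network with one hidden layer of sigmoid units turns the stage 2 and 3 parameters (for example alignment score, lengths, E-Pair and E-Solv) into a single score in (0, 1), which is then cut into confidence bands. The weights would come from training; the cut-offs in `THRESHOLDS` and the function names are hypothetical, since the actual boundaries are not given here.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def network_score(features: Sequence[float], w_hidden: List[List[float]],
                  b_hidden: List[float], w_out: List[float], b_out: float) -> float:
    """Forward pass of a one-hidden-layer network with sigmoid units; output in (0, 1)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical score cut-offs, checked from the highest band down.
THRESHOLDS = [(0.9, "CERTAIN"), (0.7, "HIGH"), (0.5, "MEDIUM")]


def confidence(score: float) -> str:
    """Map a network output to a confidence label."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "LOW"
```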
11
12
Who trains the neural network?
  • CAT numbers (the Class, Architecture and Topology
    levels of the CATH classification) were used to
    compare pairs.
  • 9169 chain pairs.
  • 383 pairs shared a common domain fold (these
    should give a positive answer).
  • The network was trained with these pairs.
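The labelling rule above can be sketched as follows: two chains count as a positive pair if any of their domains share the same CAT number, i.e. the first three fields of a CATH code. The chain IDs and codes used with it are illustrative, and `label_pairs` is a hypothetical helper rather than part of the original training setup.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def cat_level(cath_code: str) -> Tuple[str, ...]:
    """Class.Architecture.Topology part of a CATH code such as '3.40.50.300'."""
    return tuple(cath_code.split(".")[:3])


def shares_fold(domains_a: List[str], domains_b: List[str]) -> bool:
    """True if any domain of chain A has the same CAT number as a domain of chain B."""
    cats_a = {cat_level(code) for code in domains_a}
    return any(cat_level(code) in cats_a for code in domains_b)


def label_pairs(chains: Dict[str, List[str]]) -> List[Tuple[Tuple[str, str], int]]:
    """Label every unordered chain pair: 1 = common domain fold (positive), 0 = not."""
    return [((a, b), int(shares_fold(chains[a], chains[b])))
            for a, b in combinations(sorted(chains), 2)]
```

Applied to a set of chains, this gives the positive/negative pairs on which a network can be trained; on the slide's data, 383 of the 9169 pairs were positives.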

13
Neural network black box?
14
Confidence assignment
[Figure: confidence levels LOW, MEDIUM, HIGH and CERTAIN]
15
GenTHREADER: what can we do with it?
  • Results on a classic test set of 68 proteins:
  • High true-positive rate: 73.5% correctly
    recognized, 48.5% with CERTAIN confidence.
  • Extremely reliable: every CERTAIN prediction
    was correct.
  • Fast, automatic method.
  • For 22 of the 68 proteins, the alignment is over
    50% accurate.
  • Let's go analyze the Mycoplasma genitalium genome
    with it!

16
Whole Genome Analysis with GenTHREADER
Mycoplasma genitalium genome analysis: ONE DAY
ONLY!
17
ORF MG276 of Mycoplasma genitalium: spotting a remote
homologue
  • MG276 is an adenine phosphoribosyltransferase
    (but this information is not given to
    GenTHREADER).
  • 1HGX is the template structure of another
    phosphoribosyltransferase.
  • It has only 10% sequence identity with our MG276!
  • GenTHREADER found it as a CERTAIN match.
  • E-Pair saved the situation!
  • But how do we know it's true?

18
Ligand binding site of 1HGX template
19
ORF MG276 of Mycoplasma genitalium: supporting
evidence for 1HGX as a template
  • We cheated all along

20
ORF MG353 of Mycoplasma genitalium: an ORF with no
known function
  • MG353: no homologues found in databases.
  • 1HUE is the template structure of a histone-like
    protein.
  • Very low sequence similarity with our MG353.
  • GenTHREADER found it as a CERTAIN match.
  • Striking similarity in the DNA-binding region,
    despite low overall sequence similarity.

21
GenTHREADER improvements (McGuffin and Jones, May
2003)
  • PSI-BLAST, PSIPRED (secondary structures), and more.
  • Some results

22
AB-INITIO FOLDING: ROSETTA (Simons et al. 1997,
1999; Bystroff and Baker 1998; Bonneau et al.
2001)
Prediction of a protein fold from scratch?
Method I: physically simulate protein folding.
Problem: CPU time; practical only for short
peptides.
APKFFRGGNWKMNGKRSLGELIHTLGDAKLSADTEVVCGI
APSITEKVVFQETKAIADNKDWSKVEVHESRIYGGSVTNC
KELASQHDVDGFLVGGASLKPVDGFLHALAEGLGVDINAKH
Method II: evaluate the probability of all possible
conformations.
Problem: an effectively infinite search space.
Solution: use Mother Nature to decrease the
search space.
23
Decreasing the search space using elements from
short peptides
  • Take fragments of short peptides (3 and 9
    residues long).
  • Join them together.
  • Keep the secondary structures constant.
  • Vary the backbone angles of loop residues.
  • Result: 200,000 decoy structures.
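The steps above amount to a Monte Carlo search over backbone torsion angles: repeatedly copy a fragment's (phi, psi) angles into the chain and keep or reject the move by the Metropolis criterion. This sketch assumes an extended starting chain, a caller-supplied `score` function, and a `fragments` table restricted to the positions allowed to change (the loops, if secondary structure is held fixed); ROSETTA's real scoring, fragment selection and annealing schedule are more elaborate, and `fragment_assembly` is a hypothetical name.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Torsions = List[Tuple[float, float]]  # (phi, psi) per residue, in degrees


def fragment_assembly(n_res: int, fragments: Dict[int, List[Torsions]],
                      score: Callable[[Torsions], float], n_steps: int = 1000,
                      temperature: float = 1.0, seed: int = 0) -> Tuple[Torsions, float]:
    """Monte Carlo fragment insertion producing one decoy.

    fragments[pos] lists candidate torsion windows that may be copied into the
    chain starting at residue pos. Moves are accepted by the Metropolis criterion."""
    rng = random.Random(seed)
    conf: Torsions = [(-150.0, 150.0)] * n_res  # assumed extended starting chain
    current = score(conf)
    positions = sorted(fragments)
    for _ in range(n_steps):
        pos = rng.choice(positions)
        window = rng.choice(fragments[pos])
        if pos + len(window) > n_res:
            continue  # window would run past the chain end
        trial = conf[:pos] + list(window) + conf[pos + len(window):]
        trial_score = score(trial)
        # Always accept improvements; accept worse moves with probability exp(-dE/T)
        if trial_score <= current or rng.random() < math.exp((current - trial_score) / temperature):
            conf, current = trial, trial_score
    return conf, current
```

Each call yields one decoy; running it with many different seeds, e.g. `[fragment_assembly(n, frags, score, seed=k)[0] for k in range(200_000)]`, gives a decoy set of the size mentioned on the slide.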

24
In addition: I-Sites prediction, 13
local-structure 3D motifs with sequence profiles
  • Strong independence of the motifs (fold-initiation
    sites?).
  • Complements secondary structure.

25
Find the correct fold for a given sequence (back
to threading)
  • P(sequence | structure):
  • Solvation
  • Secondary structure vs. amino acid (e.g. proline
    in a helix)
  • Pair interaction
  • I-Sites prediction for this sequence (3D motifs):
    did not contribute to performance
  • Etc.
  • P(structure), sequence-independent:
  • Secondary structure packing
  • Strand hydrogen bonding
  • Strand assembly into sheets
  • Structure compactness
  • Frequency of I-Sites 3D motifs
  • Etc.
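The split into two groups of terms follows Bayes' rule: the probability of a structure given the sequence factors into a sequence-dependent likelihood and a sequence-independent prior. P(sequence) is the same for every candidate structure of a given query, so it can be dropped when ranking folds. How each factor is further decomposed (e.g. treating solvation and pair terms as independent) is a modelling assumption not spelled out here.

```latex
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}{P(\text{sequence})}
  \;\propto\; P(\text{sequence} \mid \text{structure})\, P(\text{structure})

\log P(\text{structure} \mid \text{sequence})
  = \log P(\text{sequence} \mid \text{structure}) + \log P(\text{structure}) + \text{const}
```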

26
RESULTS in CASP4: Baker's a winner
27
We're done!