Single protein analysis - PowerPoint PPT Presentation

1
Single protein analysis
  • Peter Højrup

2
Single protein analysis
  • Physical/chemical constants
      • Mass
      • pI
      • Net charge at a given pH
      • Hydrophobic index
      • Absorption coefficient
      • Etc.
  • Structure analysis
      • Hydrophobic regions (membrane spanning)
      • Coiled coil
      • Secondary structure
  • Secondary database analysis

3
Physical/chemical constants
Number of amino acids: 223
Molecular weight: 23475.5
Theoretical pI: 8.26

Amino acid composition:
  Ala (A)   15    6.7%
  Arg (R)    4    1.8%
  Asn (N)   18    8.1%
  ....

Total number of negatively charged residues (Asp + Glu): 11
Total number of positively charged residues (Arg + Lys): 14

Atomic composition:
  Carbon    C   1020
  Hydrogen  H   1607
  Nitrogen  N    287
  Oxygen    O    321
  Sulfur    S     14

Formula: C1020H1607N287O321S14
Total number of atoms: 3249

Extinction coefficients:
                       276 nm   278 nm   279 nm   280 nm   282 nm
  Ext. coefficient      34070    34362    34120    33720    32720
  Abs 0.1% (= 1 g/l)    1.451    1.464    1.453    1.436    1.394

Estimated half-life:
The N-terminal of the sequence considered is I (Ile).
The estimated half-life is:
  20 hours (mammalian reticulocytes, in vitro).
  30 min (yeast, in vivo).
  >10 hours (Escherichia coli, in vivo).

Instability index:
The instability index (II) is computed to be 32.39.
This classifies the protein as stable.

Aliphatic index: 83.50
Grand average of hydropathicity (GRAVY): -0.037

Expasy - ProtParam
GPMAW
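Some of the figures in the report above can be cross-checked with simple arithmetic. The sketch below assumes the usual relation Abs 0.1% (= 1 g/l) = extinction coefficient / molecular weight for a 1 cm path length; it only re-derives numbers already quoted on the slide, and the variable names are illustrative.

```python
# Values copied from the ProtParam report on the slide.
molecular_weight = 23475.5
atoms = {"C": 1020, "H": 1607, "N": 287, "O": 321, "S": 14}
ext_coefficients = {276: 34070, 278: 34362, 279: 34120, 280: 33720, 282: 32720}

# The total atom count is just the sum over the atomic composition.
total_atoms = sum(atoms.values())

# Assumed relation: absorbance of a 1 g/l (0.1%) solution = epsilon / MW.
abs_01_percent = {
    nm: round(eps / molecular_weight, 3)
    for nm, eps in ext_coefficients.items()
}

print("Total atoms:", total_atoms)
for nm, absorbance in abs_01_percent.items():
    print(f"{nm} nm: Abs 0.1% = {absorbance}")
```

The recomputed values agree with the report, which suggests that the flattened table has been read back correctly.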
4
Sliding window concept
Charge density, window size 7
Sequence:  K  L  A  W  Y  I  G  D  E  H  N  R  T  P  A  E  V  T  G  H  I  K  F  R  T  E  N  Q  W  P  P
Charge:    1  0  0  0  0  0  0 -1 -1  1  0  1  0  0  0 -1  0  0  0  1  0  1  0  1  0 -1  0  0  0  0  0
win1 (residues 1-7): value 1, residue 4 (W)
win2 (residues 2-8): value -1, residue 5 (Y)
win3 (residues 3-9): value -2, residue 6 (I)
Typical uses:
  • Phys/chem properties
  • Dot-plot analysis
  • Other residue assignable parameters
[Figure panels: window size 3; window size 13]
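The charge example on this slide can be reproduced in a few lines. This is a minimal sketch: the function name and the dictionary are illustrative, the charge assignment (K, R, H = +1; D, E = -1) is read off the slide's charge row, and an odd window size is assumed so that each window has a central residue.

```python
# Charge assignment as used in the slide's example row.
CHARGE = {"K": 1, "R": 1, "H": 1, "D": -1, "E": -1}

def sliding_window_sum(sequence, window, scale):
    """Return (centre_position, value) pairs; positions are 1-based.

    Each value is the sum of the per-residue parameter over one window
    and is assigned to the residue in the middle of that window.
    """
    half = window // 2
    results = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        value = sum(scale.get(residue, 0) for residue in segment)
        results.append((start + half + 1, value))
    return results

sequence = "KLAWYIGDEHNRTPAEVTGHIKFRTENQWPP"
profile = sliding_window_sum(sequence, 7, CHARGE)

# First three windows: centred on W (4), Y (5) and I (6).
for centre, value in profile[:3]:
    print(centre, sequence[centre - 1], value)
```

Any other residue-assignable parameter can be profiled by passing a different `scale` dictionary.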
5
Hydrophobicity
The average hydrophobicity in a sliding window
can tell us about transmembrane regions and
propeptides.
Use a window size corresponding to the feature
you are looking for!
[Figure panels: window size 3; window size 13]
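The effect of the window size can be sketched as follows. The slide does not name a hydrophobicity scale, so the Kyte-Doolittle values used here are an assumption, and the test sequence is hypothetical (a hydrophobic stretch flanked by polar residues).

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named on the slide).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window):
    """Mean hydropathy of every window, in order of window start."""
    values = [KD[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical sequence: polar flank + hydrophobic core + polar flank.
sequence = "KDENSTRKDE" + "LLIVALLVIALLIVA" + "KDENSTRKDE"

narrow = hydropathy_profile(sequence, 3)   # follows local detail
wide = hydropathy_profile(sequence, 13)    # smooths over longer features

print(len(narrow), len(wide))
print("window 3 range: ", round(min(narrow), 2), round(max(narrow), 2))
print("window 13 range:", round(min(wide), 2), round(max(wide), 2))
```

A narrow window reacts to single residues, whereas a wide window only rises when a whole stretch is hydrophobic, which is why the window should be matched to the length of the feature being sought.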
6
Prediction of coiled coils
Coiled coil
7
Secondary structure prediction
Trypsin precursor
8
Secondary databases
Secondary databases are
based on the observation that certain residues
are conserved in proteins in a specific pattern
for specific functions and/or families of
proteins. This means that if we can recognize
these patterns, and discriminate them from random
sequences, we can identify
function directly from the primary structure
(even without a search for homologous
sequences). Furthermore, functions may be
obscured in homology searches if the search
hits other proteins that only contain parts of
the search sequence.
9
Secondary databases
  • Secondary databases are built from the
    primary databases, using alignments of
    proteins of known function as input.
  • A record in a secondary database can be:
      • A regular expression
      • A fingerprint
      • A block
      • A profile
      • A hidden Markov model (HMM)

10
Pattern databases
11
Pattern databases
  • PROSITE: Groups of proteins of similar biochemical
    function on the basis of amino acid patterns.
  • BLOCKS: Derived from the PROSITE database. Uses
    position-specific scoring matrices (PSSMs).

12
PROSITE ENTRY
PDOC00124 Serine protease, trypsin family,
signatures and profiles
- Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C
  [H is the active site residue]
- Sequences known to belong to this class
  detected by the pattern: ALL, except for
  complement components C1r and C1s, pig
  plasminogen, bovine protein C, rodent urokinase,
  ancrod, gyroxin and two insect trypsins.
- Other sequence(s) detected in SWISS-PROT: 18.
- Consensus pattern:
  [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
  [S is the active site residue]
- Sequences known to belong to this class
  detected by the pattern: ALL, except for 18
  different proteases which have lost the first
  conserved glycine.
- Other sequence(s) detected in SWISS-PROT: 8.
- Sequences known to belong to this class
  detected by the profile: ALL.
- Other sequence(s) detected in SWISS-PROT: NONE.
- Note: if a protein includes both the serine and
  the histidine active site signatures, the
  probability of it being a trypsin family serine
  protease is 100%.
- Note: this documentation entry is linked to
  both a signature pattern and a profile. As the
  profile is much more sensitive than the pattern,
  you should use it if you have access to the
  necessary software tools to do so.
13
BLOCKS entry
  • Width: number of residues in the block
  • Seqs: number of sequences in the block
  • Parts of the overall alignment are clustered (80%
    identity)
  • The last column is the position-based sequence
    weight (100 = most distant)

14
Regular expressions
Regular expressions are a way to describe sequence
patterns. For example, the N-glycosylation pattern
of Asn followed by any residue except Pro followed
by Ser/Thr/Cys: N-{P}-[STC]
  • Sequence positions are separated by a dash: -
  • Completely conserved residues are written as
    the residue itself: N
  • A choice between multiple residues is written
    in square brackets: [STC]
  • Disallowed residues are written in curly
    brackets: {P}
  • Any residue is written as x
  • Multiple occurrences are given as a number in
    parentheses after the specification, e.g. any
    four residues are x(4); two consecutive
    positions that have to be Glu or Asp are [ED](2)
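The pattern syntax described on this slide maps almost one-to-one onto ordinary regular expressions. The following is a minimal sketch of such a translation, assuming uppercase one-letter sequences; `prosite_to_regex` is a hypothetical helper name, the `<` and `>` terminal anchors are included from the PROSITE syntax although the slide does not mention them, and the test fragment is invented for illustration.

```python
import re

# Map each pattern symbol to its regular-expression equivalent.
_PATTERN_TO_RE = str.maketrans({
    "-": None,   # position separator: no regex counterpart
    "x": ".",    # any residue
    "{": "[^",   # start of a disallowed set
    "}": "]",    # end of a disallowed set
    "(": "{",    # start of a repeat count, e.g. x(4) or x(2,3)
    ")": "}",    # end of a repeat count
    "<": "^",    # pattern anchored at the N-terminus
    ">": "$",    # pattern anchored at the C-terminus
})

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    return pattern.translate(_PATTERN_TO_RE)

histidine_site = prosite_to_regex("[LIVM]-[ST]-A-[STAG]-H-C")
tyr_kinase_site = prosite_to_regex("[RK]-x(2,3)-[DE]-x(2,3)-Y")
print(histidine_site)
print(tyr_kinase_site)

# Hypothetical sequence fragment, for illustration only.
fragment = "GGWVLTAAHCYKS"
match = re.search(histidine_site, fragment)
print(match.group(), "at position", match.start() + 1)
```

Because every symbol is translated in a single pass, the curly brackets of a disallowed set and the parentheses of a repeat count cannot be confused with one another.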
15
Regular expressions
Examples:
  • Kringle region: [FY]-C-R-N-P-[DNR]
  • Myristylation: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    The glycine has to be N-terminal; this is not
    recognized by the search program!
  • Tyrosine kinase phosphorylation site:
    [RK]-x(2,3)-[DE]-x(2,3)-Y
  • Serine protease active site: DACEGDSGGPFV
16
Rules
Short patterns that are not associated with any
families.
Examples:
  • N-glycosylation: N-{P}-[STC]
  • Cell attachment site: RGD
  • ER retention sequence: KDEL
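Scanning a sequence for rule-type motifs such as these needs nothing more than a regular-expression search. The sketch below is illustrative: `scan_rules` and the `RULES` table are hypothetical names, the example sequence is invented, and anchoring KDEL to the C-terminus is an assumption added here rather than something stated on the slide.

```python
import re

# Rule-type motifs written as regular expressions.
# Assumption: KDEL is only counted at the very C-terminus ($).
RULES = {
    "Cell attachment site": "RGD",
    "ER retention sequence": "KDEL$",
}

def scan_rules(sequence, rules=RULES):
    """Return {rule name: [1-based start positions]} for all matches."""
    hits = {}
    for name, regex in rules.items():
        # The lookahead allows overlapping matches to be reported too.
        finder = re.compile(f"(?=({regex}))")
        hits[name] = [m.start() + 1 for m in finder.finditer(sequence)]
    return hits

# Hypothetical sequence, for illustration only.
sequence = "MKRGDLLAVRGDSTPAKDEL"
print(scan_rules(sequence))
```

A hit from such a short motif only marks a candidate site; patterns this short are expected to occur by chance in many unrelated proteins.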
17
Fuzzy regular expressions
Fuzzy regular expressions are a special case used
by the MOTIF system (based on BLOCKS and PRINTS).
The fuzziness comes in when specific amino acid
residues are replaced by fuzzy amino acid groups,
e.g. D replaced by [DENQ], V by [VLI]. This
increases the sensitivity, but the signal-to-noise
ratio is much worse.
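The trade-off can be illustrated with the two substitutions quoted on the slide. This is a sketch only: `fuzzify` is a hypothetical helper, the motif and candidate sequences are invented, and only D and V are expanded because those are the only groups the slide gives.

```python
import re

# Fuzzy groups quoted on the slide; all other residues stay literal.
FUZZY_GROUPS = {"D": "DENQ", "V": "VLI"}

def fuzzify(motif, groups=FUZZY_GROUPS):
    """Turn a literal motif into a regex with fuzzy residue classes."""
    return "".join(
        f"[{groups[residue]}]" if residue in groups else residue
        for residue in motif
    )

motif = "GDSGV"                      # hypothetical literal motif
fuzzy = fuzzify(motif)               # G[DENQ]SG[VLI]
candidates = ["GDSGV", "GESGL", "GNSGI", "GKSGV"]

exact_hits = [s for s in candidates if re.fullmatch(motif, s)]
fuzzy_hits = [s for s in candidates if re.fullmatch(fuzzy, s)]

print(fuzzy)
print("exact:", exact_hits)
print("fuzzy:", fuzzy_hits)
```

The fuzzy form recovers the conservative variants that the literal motif misses, but each widened position also raises the number of chance matches.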
18
Fingerprints, blocks and profiles
  • Fingerprints: matrices of alignments populated
    by the residue frequencies observed at each
    position of the matrix (Position-Specific
    Scoring Matrix, PSSM).
  • Blocks: ungapped aligned segments. Scoring is
    usually with AA substitution matrices (e.g.
    BLOSUM). The score is increased by searching
    for multiple blocks with proper spacing.
  • Profiles: complete alignments that are
    distilled into scoring tables, including
    information on insertions and deletions
    (INDELs).
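How residue frequencies at each position become a scoring matrix can be sketched with a toy ungapped block. Everything specific here is an assumption for illustration: the block and query sequence are invented, the background is taken as uniform (1/20 per residue), and a pseudocount of 1 is added so that unobserved residues get a finite score. Real databases use more careful weighting.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(block, pseudocount=1.0):
    """Per-column log-odds scores from equal-length ungapped segments."""
    background = 1.0 / len(ALPHABET)          # assumed uniform background
    total = len(block) + pseudocount * len(ALPHABET)
    pssm = []
    for column in zip(*block):                # iterate over alignment columns
        counts = Counter(column)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm

def score(pssm, segment):
    """Sum of the column scores for one segment of matching width."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

def best_window(pssm, sequence):
    """Return (score, 1-based start) of the best-scoring window."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i + 1)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical ungapped block and query sequence.
block = ["GDSGGP", "GDSGGP", "GDSGGA", "GDAGGP", "GNSGGP"]
pssm = build_pssm(block)
best_score, best_start = best_window(pssm, "AAKGDSGGPLV")
print(round(best_score, 2), best_start)
```

A profile extends the same idea with position-specific penalties for insertions and deletions, which this ungapped sketch leaves out.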
19
Hidden Markov Models (HMMs)
Hidden Markov Models are probabilistic models that
are used to encode linear chains of match,
insertion and deletion states.
20
Hidden Markov Model
Part of a Hidden Markov Model trained for
recognition of globin sequences. Line width
indicates the extent of use of a path. Bar length
indicates the amino acid distribution. Deletion of
residues 56-60 thus happens in 50% of the training
sequences.
21
Hidden Markov Models (HMMs)
HMMs are also used for predicting a large number
of other features and modifications (e.g.
activation peptides, phosphorylation,
O-glycosylation, subcellular localization,
proteasomal cleavage etc.).
NOTE: Even though HMMs have the most advanced
mathematics, they do not always perform as well as
models based on the PROSITE database, which is
mainly based on hand-crafted alignments. Always
use several secondary database models and always
check the definition of the entry in the database.
E.g. PROSITE has a description of each entry,
including the specificity of the search.
Note also that many of the sites, particularly
enzyme targets like phosphorylation, glycosylation
etc., only indicate a potential modification site.
The actual modification is dependent on physical
exposure of the residues to the enzyme, both in
space and time.