Protein Feature Identification presentation

1
Protein Feature Identification

Microbiology 343
David Wishart
david.wishart_at_ualberta.ca

2
Objectives

To show that almost everything you do in the lab,
or need to do to work with a protein, can
be done on a computer
To learn methods and algorithms for predicting
composition and sequence features
To learn when to use these tools

3
Proteins

Exhibit far more sequence and chemical
complexity than DNA or RNA
Properties and structure are defined by the
sequence and side chains of their constituent
amino acids
The engines of life
>95% of all drug targets are proteins
Favorite topic of post-genomic era

4
The Post-genomic Challenge

How to rapidly identify a protein?
How to rapidly purify a protein?
How to identify post-translational modifications?
How to find information about function?
How to find information about activity?
How to find information about location?
How to find information about structure?

Answer: Look at Protein Features
5
Protein Features
ACEDFHIKNMF SDQWWIPANMC ASDFDPQWERE LIQNMDKQERT QA
TRPQDS...

Sequence View Structure View
6
Different Types of Features

Composition Features
Mass, pI, Absorptivity, Rg, Volume
Sequence Features
Active sites, Binding Sites, Targeting, Location,
Property Profiles, 2° structure
Structure Features
Supersecondary Structure, Global Fold, ASA,
Volume

7
Where To Go
http://www.expasy.org/tools/
8
Compositional Features

Molecular Weight
Amino Acid Frequency
Isoelectric Point
UV Absorptivity
Solubility, Size, Shape
Radius of Gyration
Free Energy of Folding

9
Molecular Weight
10
Molecular Weight

Useful for SDS PAGE and 2D gel analysis
Useful for deciding on SEC matrix
Useful for deciding on MWCO (molecular weight cutoff) for dialysis
Essential in synthetic peptide analysis
Essential in peptide sequencing (classical or
mass-spectrometry based)
Essential in proteomics and high throughput
protein characterization

11
Molecular Weight

Crude MW calculation: $MW = 110 \times \mathrm{Numres}$
Exact MW calculation: $MW = \sum_i AA_i \times MW_i$
Remember to add 1 water (18.01 amu) after adding
all residues
Note isotopic weights
Corrections for CHO, PO4, Acetyl, CONH2
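
The two calculations above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular server: the table holds standard average residue masses (amino acid minus one water), the function names are hypothetical, and no corrections for modifications (CHO, PO4, acetyl, amide) are applied.

```python
# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # average mass of H2O; the slide rounds this to 18.01


def crude_mw(seq):
    """Crude estimate: 110 Da per residue."""
    return 110.0 * len(seq)


def exact_mw(seq):
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


if __name__ == "__main__":
    peptide = "ACDEFHIK"  # hypothetical example sequence
    print(crude_mw(peptide), round(exact_mw(peptide), 2))
```

Keeping the water term separate makes the amino acid versus residue distinction explicit: residues are summed, and the single water accounts for the H on the N-terminus and the OH on the C-terminus.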

12
Amino Acid versus Residue
[Figure: a free amino acid (H2N-CHR-COOH) beside the corresponding residue (-NH-CHR-CO-)]
13
Protein Identification via MW

MOWSE
http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse
CombSearch
http://ca.expasy.org/tools/CombSearch/
Mascot
http://www.matrixscience.com/search_form_select.html
AACompSim/AACompIdent
http://ca.expasy.org/tools/

14
Molecular Weight & Proteomics
[Figure: 2-D Gel; QTOF Mass Spectrometry]
15
Amino Acid Frequency

Deviations greater than 2X average indicate
something of interest
High K or R indicates possible nucleoprotein
High C indicates a stable but hard-to-fold protein
High G, P, Q, or N suggests a lack of stable structure
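
A rough sketch of the "greater than 2X average" check follows. The slide does not say which average is meant, so the background here defaults to a uniform 5% per residue, which is purely an assumption; a real analysis would pass in measured database frequencies. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def unusual_residues(seq, background=None, factor=2.0):
    """Residues whose frequency exceeds factor x the background frequency."""
    if background is None:
        # Assumption: uniform background of 1/20 per residue.
        background = {aa: 1 / 20 for aa in AMINO_ACIDS}
    freqs = composition(seq)
    return sorted(aa for aa in AMINO_ACIDS if freqs[aa] > factor * background[aa])


if __name__ == "__main__":
    # Hypothetical lysine-rich sequence
    print(unusual_residues("KKKKKKKKACDEFGHILMNP"))
```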

16
Isoelectric Point (pI)

The pH at which a protein has a net charge = 0
$Q = \sum_i N_i / (1 + 10^{pH - pK_i})$

Transcendental equation
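
Because the charge equation cannot be solved for pH in closed form, the pI is usually found numerically. The sketch below uses bisection on a signed version of the charge sum (basic groups contribute positive charge, acidic groups negative). The pK values are illustrative assumptions only; published tables differ, which is one reason calculated pI values are approximate.

```python
# Illustrative pK values (assumed; published sets differ from one another).
POSITIVE_PK = {"Nterm": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PK = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of an unfolded chain at the given pH."""
    seq = seq.upper()
    n_pos = {"Nterm": 1, **{aa: seq.count(aa) for aa in "KRH"}}
    n_neg = {"Cterm": 1, **{aa: seq.count(aa) for aa in "DECY"}}
    # Protonated fraction of each basic group carries +1.
    pos = sum(n / (1 + 10 ** (ph - POSITIVE_PK[g])) for g, n in n_pos.items())
    # Deprotonated fraction of each acidic group carries -1.
    neg = sum(n / (1 + 10 ** (NEGATIVE_PK[g] - ph)) for g, n in n_neg.items())
    return pos - neg


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection: net charge falls monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(isoelectric_point("ACDEFHIKNMF"), 2))  # hypothetical sequence
```

Bisection is chosen over a faster root finder because the charge curve is monotonic and the bracket 0 to 14 always contains the root, so it cannot fail to converge.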
17
Isoelectric Point

Calculation is only approximate (± 1 pH)
Does not include 3° structure interactions
Can be used in developing purification protocols
via ion exchange chromatography
Can be used in estimating spot location for
isoelectric focusing gels
Can be used to decide on best pH to store or
analyze protein

18
UV Spectroscopy
19
UV Absorptivity

UV (Ultraviolet light) has a wavelength of 200 to
400 nm
Most proteins and peptides (and all nucleic
acids) absorb UV light quite strongly
UV spectroscopy is the most common form of
spectroscopy performed today
UV spectra can be used to identify or classify
some proteins or protein classes

20
UV Absorptivity

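$OD_{280} = (5690 \times W + 1280 \times Y)/MW \times \mathrm{Conc.}$
$\mathrm{Conc.} = OD_{280} \times MW/(5690 \times W + 1280 \times Y)$
(W, Y = number of Trp and Tyr residues)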
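
As a worked example of the two relations above, the following sketch computes the molar absorptivity from the Trp and Tyr counts and then back-calculates a concentration. It assumes a 1 cm path length and a concentration in mg/mL (equivalently g/L); the function names are hypothetical.

```python
def molar_absorptivity_280(seq):
    """Molar absorptivity at 280 nm from Trp (5690) and Tyr (1280) counts."""
    seq = seq.upper()
    return 5690 * seq.count("W") + 1280 * seq.count("Y")


def concentration_from_od280(od280, seq, mw):
    """Concentration in mg/mL, assuming a 1 cm path length."""
    eps = molar_absorptivity_280(seq)
    if eps == 0:
        raise ValueError("no Trp or Tyr: OD280 cannot be used for this protein")
    return od280 * mw / eps


if __name__ == "__main__":
    # Hypothetical protein: 1 Trp, 2 Tyr, MW 10,000
    print(molar_absorptivity_280("WYY"))
    print(concentration_from_od280(0.825, "WYY", 10000))
```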

21
Hydrophobicity

Indicates Solubility
Indicates Stability
Indicates Location (membrane or cytoplasm)
Indicates Globularity or tendency to form
spherical structure

22
Hydrophobicity

Average Hydrophobicity: $AH = \sum_i AA_i \times H_i$
Hydrophobic Ratio: $RH = \sum H(-) / \sum H(+)$
Hydrophobic Ratio: $RHP = \mathrm{philic}/\mathrm{phobic}$
Linear Charge Density: $LIND = (K + R + D + E + H + 2)/\mathrm{Numres}$
Solubility: $SOL = RH + LIND - 0.05\,AH$

Average AH = -2.5 ± 2.5; Insol > 0.1; Unstrc < -6
Average RH = 1.2 ± 0.4; Insol < 0.8; Unstrc > 1.9
Average RHP = 0.9 ± 0.2; Insol < 0.7; Unstrc > 1.4
Average LIND = 0.25; Insol < 0.2; Unstrc > 0.4
Average SOL = 1.6 ± 0.5; Insol < 1.1; Unstrc > 2.5
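
A small numerical check of the solubility index, under stated assumptions: LIND is read as the count of K, R, D, E and H plus 2 (presumably for the two termini) divided by the number of residues, and SOL as RH + LIND - 0.05 x AH. Both readings are reconstructions of the slide, not confirmed formulas, and the hydrophobicity scale behind AH and RH is not given in the transcript, so those two values are taken as inputs.

```python
def linear_charge_density(seq):
    """(K + R + D + E + H + 2) / number of residues (assumed reading of LIND)."""
    seq = seq.upper()
    charged = sum(seq.count(aa) for aa in "KRDEH")
    return (charged + 2) / len(seq)


def solubility_index(rh, lind, ah):
    """SOL = RH + LIND - 0.05 * AH (assumed reading of the slide)."""
    return rh + lind - 0.05 * ah


if __name__ == "__main__":
    # Average values from the slide, taking average AH as -2.5
    print(solubility_index(1.2, 0.25, -2.5))
```

With the listed averages (RH 1.2, LIND 0.25) and AH taken as -2.5, the index comes out near 1.6, the average SOL quoted on the slide, which supports a negative sign on the average AH.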

23
Different Types of Features

Composition Features
Mass, pI, Absorptivity, Hydrophobicity
Sequence Features
Active sites, Binding Sites, Targeting, Location,
Property Profiles, 2° structure
Structure Features
Supersecondary Structure, Global Fold, ASA,
Volume

24
Sequence Features
AHGQSDFILDEADGMMKSTVPN
HGFDSAAVLDEADHILQWERTY
GGGNDEYIVDEADSVIASDFGH
[LIVM][LIVM]DEAD[LIVM][LIVM] (eIF-4A ATP-dependent helicase)
25
Sites that Support Pattern Queries

OWL Database
http://umber.sbs.man.ac.uk/dbbrowser/OWL/
PIR Website
http://pir.georgetown.edu/pirwww/search/patmatch.html
SCNPSITE at EXPASY
http://ca.expasy.org/tools/scanprosite/
PattinProt
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?pagenpsa_pattinprot.html/

26
Regular Expressions

C[ACG]T - Matches CAT, CCT and CGT only
C.T - Matches CAT, CaT, C1T, CXT, not CT
CA?T - Matches CT or CAT only
C+T - Matches CT, CCT, CCCT, CCCCT
C(HE)?A[TP] - Matches CHEAT, CAT, CHEAP, CAP
S[A-I,L-Q,T-Z]?LK[A-I,L-Q,T-Z]?A - Matches SLKA
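
These expressions can be tried directly with Python's `re` module. In the sketch below the patterns are written with their brackets and quantifiers in place, which is an assumption wherever the transcript lost those characters (for instance, `C+T` is one reading of the fourth pattern that agrees with its listed matches). `re.fullmatch` is used so that the whole string must match.

```python
import re

# Pattern -> strings the slide says it matches.
examples = {
    r"C[ACG]T": ["CAT", "CCT", "CGT"],
    r"C.T": ["CAT", "CaT", "C1T", "CXT"],
    r"CA?T": ["CT", "CAT"],
    r"C+T": ["CT", "CCT", "CCCT", "CCCCT"],
    r"C(HE)?A[TP]": ["CHEAT", "CAT", "CHEAP", "CAP"],
}


def matches(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for pattern, words in examples.items():
        for word in words:
            print(pattern, word, matches(pattern, word))
```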

27
PROSITE Pattern Expressions
C-[ACG]-T - Matches CAT, CCT and CGT only
C-X-T - Matches CAT, CCT, CDT, CET, etc.
C-{A}-T - Matches every CXT except CAT
C(1,3)-T - Matches CT, CCT, CCCT
C-A(2)-[TP] - Matches CAAT, CAAP
[LIV]-[VIC]-X(2)-G-[DENQ]-X-[LIVFM](2)-G
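
PROSITE patterns map almost one-to-one onto regular expressions. The sketch below is a hypothetical helper, not part of any PROSITE tool: it handles only the elements shown on this slide (single residues, `[...]` sets, `{...}` exclusions, `X` wildcards and `(n)` or `(n,m)` repeats) and ignores anchors and other extensions of the full syntax.

```python
import re

# One PROSITE element: a residue, [set] or {exclusion}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {A} means anything except A
        elif core in ("x", "X"):
            core = "."  # wildcard position
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    rx = prosite_to_regex("[LIV]-[VIC]-X(2)-G-[DENQ]-X-[LIVFM](2)-G")
    print(rx)
    print(bool(re.search(rx, "AAALVKRGDPLLGAAA")))  # hypothetical sequence
```

`re.search` rather than `re.fullmatch` is the natural call when scanning a whole protein sequence, since a motif may occur anywhere in the chain.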
28
Sequence Feature Databases

PROSITE - http://ca.expasy.org/prosite/
BLOCKS - http://www.blocks.fhcrc.org/
DOMO - http://www.infobiogen.fr/services/domo/
PFAM - http://pfam.wustl.edu
PRINTS - http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
SEQSITE - PepTool

29
Phosphorylation Sites
[Figure: phosphorylated side chains of tyrosine (pY), threonine (pT) and serine (pS)]
30
Phosphorylation Sites
31
Signaling Sites
32
Protease Cut Sites
33
Binding Sites
34
Family Signature Sequences
35
Enzyme Active Sites
36
Better Methods for Sequence Feature ID

Sequence Profiles/Scoring Matrices
Neural Networks
Hidden Markov Models
Bayesian Belief Nets
Reference Point Logistics

37
What Can Be Predicted?

O-Glycosylation Sites
Phosphorylation Sites
Protease Cut Sites
Nuclear Targeting Sites
Mitochondrial Targeting Sites
Chloroplast Targeting Sites
Signal Sequences
Signal Sequence Cleavage
Peroxisome Targeting Sites
ER Targeting Sites
Transmembrane Sites
Tyrosine Sulfation Sites
GPI (Glycosylphosphatidylinositol) Anchor Sites
PEST Sites
Coiled-Coil Sites
T-Cell/MHC Epitopes
Protein Lifetime
A whole lot more.

38
Cutting Edge Sequence Feature Servers

Membrane Helix Prediction
http://www.cbs.dtu.dk/services/TMHMM-2.0/
T-Cell Epitope Prediction
http://syfpeithi.bmi-heidelberg.com/scripts/MHCServer.dll/home.htm
O-Glycosylation Prediction
http://www.cbs.dtu.dk/services/NetOGlyc/
Phosphorylation Prediction
http://www.cbs.dtu.dk/services/NetPhos/
Protein Localization Prediction
http://psort.nibb.ac.jp/

39
Subcellular Localization
http://www.cs.ualberta.ca/bioinfo/PA/Sub/
40
Profiles & Motifs are Useful

Helped identify active site of HIV protease
Helped identify SH2/SH3 class of STPs
Helped identify important GTP oncoproteins
Helped identify hidden leucine zipper in HGA
Used to scan for lectin binding domains
Regularly used to predict T-cell epitopes

41
Amino Acid Property Profiles
42
Amino Acid Property Profiles

Intent is to predict a protein's physical
properties directly from sequence, as opposed to
composition or wet chemistry
Offers a more detailed, graphical view of
sequence-specific properties than compositional
analysis (more powerful?)
Underlying assumption is that amino acid properties
are additive

43
Common Property Profiles

Hydrophobicity (Watch Scales!)
Helical Wheel (Not a True Profile)
Hydrophobic Moments (Helix & Beta sheet)
Flexibility (Thermal B Factors)
Surface Accessibility (ASA)
Antigenicity (B-cell epitopes/T-cell epitopes)

44
Hydrophobicity Profile

Plotted using $\langle H \rangle_i = \sum_{n=i-k}^{i+k} H_n / (2k + 1)$
Shows location of membrane spanning regions,
epitopes, surface exposed AAs, etc.
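
The windowed average is simple to compute. The sketch below uses the Kyte-Doolittle hydropathy scale, which is an assumption on my part since the slide names no scale (and other slides warn that results depend on it). Each output value is the mean over a window of 2k + 1 residues centred on position i; end positions without a full window are skipped.

```python
# Kyte-Doolittle hydropathy values (scale choice is an assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, k=3):
    """Mean hydropathy over a (2k + 1)-residue window centred on each position."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    window = 2 * k + 1
    return [
        sum(values[i - k:i + k + 1]) / window
        for i in range(k, len(values) - k)
    ]


if __name__ == "__main__":
    # Hypothetical sequence: hydrophobic stretch flanked by charged residues
    profile = hydropathy_profile("KKDEELLIVAVLLIAGKRDE", k=3)
    print([round(h, 2) for h in profile])
```

Larger k smooths the plot and suits the search for membrane-spanning helices of roughly 20 residues; smaller k keeps the local detail useful for surface or epitope prediction.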

45
Flexibility

B factors from X-ray crystallography
Potentially identifies antigenic and active sites
from sequence data alone

46
Membrane Spanning Regions
47
Predicting via Hydrophobicity
Bacteriorhodopsin, OmpA
48
Predicting via Hydrophobicity
49
Predicting via Neural Nets

PHDhtm http://cubic.bioc.columbia.edu/predictprotein/submit_adv.html
TMAP http://www.mbb.ki.se/tmap/index.html
TMPred http://www.ch.embnet.org/software/TMPRED_form.html

ACDEGF...
50
Secondary Structure
51
Secondary Structure Prediction
52
Secondary Structure Prediction

Statistical (Chou-Fasman, GOR)
Homology or Nearest Neighbor (Levin)
Physico-Chemical (Lim, Eisenberg)
Pattern Matching (Cohen, Rooman)
Neural Nets (Qian & Sejnowski, Karplus)
Evolutionary Methods (Barton, Niemann)
Combined Approaches (Rost, Levin, Argos)

53
Chou-Fasman Statistics
54
Prediction Performance
55
Best of the Best

PredictProtein-PHD (72%)
http://cubic.bioc.columbia.edu/predictprotein
Jpred (73-75%)
http://www.compbio.dundee.ac.uk/www-jpred/
PSIpred (77%)
http://bioinf.cs.ucl.ac.uk/psipred/
Proteus (88%)
http://129.128.185.184:8080/proteus/

56
Sample Exam Questions

Here is the sequence for protein X; calculate its
molar absorptivity.
Here is the sequence for protein Y; try to locate
the likely membrane-spanning regions and explain
your reasoning.
Here is the sequence for protein Z; show the
tryptic cleavage points.
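
For the tryptic cleavage question, a minimal sketch of the usual textbook rule may help: trypsin cuts on the C-terminal side of K or R, except when the next residue is P. The rule is stated here as the assumption behind the code; real digests also show missed cleavages, which this does not model. The function name is hypothetical.

```python
def tryptic_peptides(seq):
    """Split seq after every K or R that is not followed by P."""
    seq = seq.upper()
    peptides, start = [], 0
    for i, aa in enumerate(seq[:-1]):  # the last residue never needs a cut
        if aa in "KR" and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence: the R before P is not cleaved
    print(tryptic_peptides("AKRPGKDR"))
```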
