Protein Expression, Structural Proteomics - PowerPoint PPT Presentation

Provided by: Comp684

1
Protein Expression, Structural Proteomics
Bioinformatics
  • David Wishart
  • University of Alberta
  • Edmonton, AB
  • david.wishart_at_ualberta.ca

2
Expression Questions
  • Which host cell system?
  • Which expression vector?
  • Which cloning/expression protocols?
  • Is it membrane or water soluble?
  • Is it single domain or multi-domain?
  • How soluble and how stable?
  • Where will this protein be found?
  • How to purify? How to identify?

3
Host Cell System?
  • Escherichia coli
  • Other bacteria
  • Pichia pastoris
  • Other yeast
  • Baculovirus
  • Animal cell culture
  • Plants
  • Sheep/cows/humans
  • Cell free

Polyhedra
4
Host Cell System?
  • Choice depends on size and character of protein
  • Large proteins (>100 kD)? Choose eukaryote
  • Small proteins (<30 kD)? Choose prokaryote
  • Glycosylation essential? Choose baculovirus or
    mammalian cell culture
  • Isotopic labelling essential? Choose E. coli
  • Post-translational modifications essential?
    Choose yeast, baculovirus or other eukaryote
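
The rules of thumb on this slide can be written as a small lookup. The sketch below is only an illustration: the function name, the order in which the rules are checked, and the handling of the 30-100 kD range (which the slide does not cover) are assumptions.

```python
def suggest_hosts(mass_kd, needs_glycosylation=False,
                  needs_isotope_labelling=False, needs_ptm=False):
    """Return candidate host systems following the slide's rules of thumb."""
    # Assumption: special requirements take priority over size.
    if needs_glycosylation:
        return ["baculovirus", "mammalian cell culture"]
    if needs_ptm:
        return ["yeast", "baculovirus", "other eukaryote"]
    if needs_isotope_labelling:
        return ["E. coli"]
    if mass_kd > 100:
        return ["eukaryote"]
    if mass_kd < 30:
        return ["prokaryote"]
    # The slide gives no rule for 30-100 kD, so both options are returned.
    return ["prokaryote", "eukaryote"]


print(suggest_hosts(22))
print(suggest_hosts(140))
print(suggest_hosts(45, needs_glycosylation=True))
```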

5
Host Cell System?
  • Try different hosts when optimizing expression
    (protease negative, strains with enhanced
    expression of rare tRNAs)
  • Expression levels can vary by a factor of 10 or
    more depending on strain choice
  • Example E. coli strains:
  • MC1061, UT580, GM48, JM101, DH5α, MG1065, NM522,
    MC4100, TOP10F', BL21(DE3), BL21-CodonPlus (DE3)

6
Codon Bias
http://www.kazusa.or.jp/codon/
7
Arginine Codon Bias
Organism        AGA    AGG    Class
E. coli         2.7    1.6    Eubacteria (rare)
M. jannaschii   27.5   9.9    Archaebacteria (abundant)
H. sapiens      11.2   11.1   Eukaryote (normal)
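
To check a gene of interest for the two arginine codons listed here, the codons of its coding sequence can simply be counted. This is a minimal sketch: the function names are hypothetical, and it assumes the slide's figures are frequencies per thousand codons (the convention of the codon usage database cited on the previous slide).

```python
from collections import Counter

RARE_ARG_CODONS = ("AGA", "AGG")  # the two arginine codons shown on the slide


def codon_counts(cds):
    """Count codons in an in-frame coding sequence given 5'->3'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rare_arg_per_thousand(cds, codons=RARE_ARG_CODONS):
    """Frequency of the selected codons per 1000 codons of this sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: 1000.0 * counts[c] / total for c in codons}


# Toy sequence: ATG AGA AGG CGT TAA
print(rare_arg_per_thousand("ATGAGAAGGCGTTAA"))
```

A gene whose AGA/AGG frequency is far above the host's own usage is a candidate for a strain with extra rare tRNAs or for codon optimization.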
8
Host Cell System?
  • American Type Culture Collection
  • http://www.atcc.org
  • Clontech Cell Lines
  • http://www.clontech.com
  • Stratagene Cells (BL21)
  • http://stratagene.com
  • Invitrogen Cell Lines (Pichia)
  • http://www.invitrogen.com

9
Fermentor or Shake Flask?
10
Media Optimization
  • Still using L-broth? Try using T-broth
  • Tryptone - 12 g, Yeast Extract - 24 g, glycerol -
    4 ml, KH2PO4 - 2.3 g, K2HPO4 - 12.5 g
  • Extra Spicy Media
  • More ATP: 10 ml/L glycerol + 10 g/L glucose
  • More AA: add 10 g casamino acids + 10 mg L-Trp
  • Add more media (30) when you induce
  • Add more antibiotic when you induce
  • prevents overgrowth by cells that lost plasmid

11
Expression Questions
  • Which host cell system?
  • Which expression vector?
  • Which cloning/expression protocols?
  • Is it membrane or water soluble?
  • Is it single domain or multi-domain?
  • How soluble and how stable?
  • Where will this protein be found?
  • How to purify? How to identify?

12
Which Vector?
  • Must be compatible with host cell system
    (prokaryotic vectors for prokaryotic cells,
    eukaryotic vectors for eukaryotic cells)
  • Needs a good combination of
  • strong promoters
  • ribosome binding sites
  • termination sequences
  • affinity tag or solubilization sequences
  • multi-enzyme restriction site

13
Which Vector?
  • Promoters
  • arabinose systems (pBAD), phage T7 (pET), Trc/Tac
    promoters, phage lambda PL or PR
  • Tags
  • His6 for metal affinity chromatography (Ni)
  • FLAG epitope tag (DYKDDDDK)
  • CBP-calmodulin binding peptide (26 residues)
  • E-coil/K-coil tags (poly E35 or poly K35)
  • c-myc epitope tag (EQKLISEEDL)
  • Glutathione-S-transferase (GST) tags
  • Cellulose binding domain (CBD) tags

14
Which Vector?
  • VectorDB
  • http://vectordb.atcg.com
  • Invitrogen Vectors
  • http://www.invitrogen.com/vectors.html
  • Qiagen Vectors
  • http://www.qiagen.com/literature/vectors.asp
  • Stratagene Vectors
  • http://stratagene.com/vectors/vectors.htm

15
How to Clone?
Echo Cloning
16
How to Clone?
Yeast Cells
17
How to Clone?
Mammalian Cells
18
Gateway System (Invitrogen)
  • No need to design, construct or ID unique
    restriction sites
  • Uses lambda phage site-specific recombination for
    gene/plasmid integration
  • No need for restriction enzyme digestions
  • No need for gel fragment separation and
    purification
  • Ideal for high throughput proteomics efforts

19
Gateway System (Invitrogen)
[Diagram: PCR product + Entry Vector -> Entry Clone; Entry Clone x Destination Vector -> Desired Clone]
20
Gateway System (Invitrogen)
[Diagram: Entry Clone (Gene flanked by attL1/attL2, Kmr) x Destination Vector (-ve selector (anti-gyrase) flanked by attR1/attR2, Ampr), with Int, IHF and Xis, gives the Desired Clone (Gene, Ampr) and a Dead-end Clone (-ve selector, Kmr)]
21
Gateway Protocol
  • Mix and incubate for 60 min at 25 °C
  • Add proteinase K and incubate for 10 min at 37 °C
  • Transfer to competent E. coli DH5α cells
  • Express for 60 min and plate on LB-Amp

Ingredients
  • Clonase reaction buffer 4 µL
  • Destination Vector 300 ng
  • Entry Clone 100 ng
  • Clonase Enzyme mix 4 µL
  • Total volume 20 µL
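
Because the DNA amounts are given in ng, the pipetting volumes depend on the concentration of each stock. The following sketch works out the volumes for one reaction. It assumes all volumes are in microlitres (a 20 µL reaction), and the function name and the example stock concentrations are hypothetical.

```python
def reaction_volumes(dest_stock_ng_per_ul, entry_stock_ng_per_ul,
                     total_ul=20.0, buffer_ul=4.0, enzyme_ul=4.0,
                     dest_ng=300.0, entry_ng=100.0):
    """Return the volume in microlitres of each component of one reaction."""
    dest_ul = dest_ng / dest_stock_ng_per_ul
    entry_ul = entry_ng / entry_stock_ng_per_ul
    # Whatever is left after buffer, enzyme and DNA is made up with diluent.
    diluent_ul = total_ul - (buffer_ul + enzyme_ul + dest_ul + entry_ul)
    if diluent_ul < 0:
        raise ValueError("DNA stocks are too dilute to fit in the total volume")
    return {
        "reaction buffer": buffer_ul,
        "destination vector": dest_ul,
        "entry clone": entry_ul,
        "enzyme mix": enzyme_ul,
        "diluent": diluent_ul,
    }


# Example with made-up stock concentrations (ng per microlitre).
for name, volume in reaction_volumes(150.0, 50.0).items():
    print(f"{name}: {volume:.1f}")
```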

22
Expression/Cloning -- Which Protocols?
  • Molecular Cloning 3rd Edition (Sambrook and
    Maniatis / Russell)
  • http://www.molecularcloning.com
  • Molecular Biology Protocols
  • http://micro.nwfsc.noaa.gov/protocols/
  • Molecular Biology Shortcuts
  • http://highveld.com/f/fprotocols.html
  • NeeHow Protocols
  • http://www.neehow.org/wonderful/protocols

23
Expression Questions
  • Which host cell system?
  • Which expression vector?
  • Which cloning/expression protocols?
  • Is it membrane or water soluble?
  • Is it single domain or multi-domain?
  • How soluble and how stable?
  • Where will this protein be found?
  • How to purify? How to identify?

24
Membrane or Water Soluble?
25
Membrane or Water Soluble?
  • Most protein scientists prefer to work with water
    soluble proteins or domains
  • Membrane proteins are very difficult to clone,
    express and purify and special techniques must be
    used
  • Potential problems can be avoided by knowing
    whether the protein contains one or more membrane
    spanning helices and where these helices are
    located (cleaved?)

26
Predicting via Hydrophobicity
Bacteriorhodopsin, OmpA
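
A hydrophobicity plot of this kind can be sketched as a sliding-window average over the sequence. The slide does not say which scale or window was used, so the Kyte-Doolittle scale, the 19-residue window and the 1.6 cutoff below are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not named on the slide).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; raises KeyError on non-standard residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows whose mean exceeds the threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window))
            if h > threshold]


toy = "MKTDE" + "LIVALLAVGLLIVAFLLGA" + "RKDEQ"  # made-up sequence
print(candidate_tm_windows(toy))
```

A window tuned to helices would not be expected to flag the short, alternating strands of a beta-barrel protein, which is one reason the dedicated predictors on the next slide exist.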
27
Membrane Helix Prediction
  • Neural Network and HMM methods now claim >80%
    accuracy
  • PredictProtein (PHDhtm)
  • http://cubic.bioc.columbia.edu/predictprotein/
  • TMpred
  • http://www.ch.embnet.org/software/TMPRED_form.html
  • TMHMM
  • http://www.cbs.dtu.dk/services/TMHMM-2.0/

28
TMPred (Principles)
29
TMHMM
30
PredictProtein
31
Expression Questions
  • Which host cell system?
  • Which expression vector?
  • Which cloning/expression protocols?
  • Is it membrane or water soluble?
  • Is it single domain or multi-domain?
  • How soluble and how stable?
  • Where will this protein be found?
  • How to purify? How to identify?

32
Single Domain or MultiDomain?
33
Modular Protein Domains
BH, PDZ, FYVE, PH, DED, DEATH, SH3, 14-3-3, WW, FHA, PTB, SH2
34
Single Domain or MultiDomain?
  • Many eukaryotic proteins are multi-domain
  • Size is a good indicator (roughly 1 domain for
    every 15 kD)
  • Small domains behave better (X-ray and NMR)
  • Limited proteolysis allows experimental
    identification of domains prior to structure
    determination by NMR or X-ray
  • digestion followed by HPLC or MS analysis to
    detect fragments > 10 kD
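
The size rule of thumb (roughly one domain per 15 kD) gives a quick first estimate from sequence length alone. In this sketch the average residue mass of about 110 Da is an assumption that is not stated on the slide, and the function name is hypothetical.

```python
AVG_RESIDUE_MASS_KD = 0.110  # assumption: about 110 Da per residue
KD_PER_DOMAIN = 15.0         # rule of thumb from the slide


def estimate_domains(n_residues):
    """Return (approximate mass in kD, rough domain count) for a chain length."""
    mass_kd = n_residues * AVG_RESIDUE_MASS_KD
    # Any folded protein has at least one domain.
    return mass_kd, max(1, round(mass_kd / KD_PER_DOMAIN))


for length in (90, 400, 1200):
    mass, domains = estimate_domains(length)
    print(f"{length} residues: ~{mass:.0f} kD, ~{domains} domain(s)")
```

This is only a starting guess; limited proteolysis or homology-based prediction is what actually locates the domain boundaries.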

35
Domain Prediction
  • Domain Prediction (PredictProtein-GLOBE)
  • http://cubic.bioc.columbia.edu/predictprotein
  • BLAST alignments can be used to detect or predict
    the presence of domains by sequence homology
  • Protein domains can also be predicted using CDD
    (Conserved Domain Database) at
    http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml

36
(No Transcript)
37
(No Transcript)
38
Expression Questions
  • Which host cell system?
  • Which expression vector?
  • Which cloning/expression protocols?
  • Is it membrane or water soluble?
  • Is it single domain or multi-domain?
  • How soluble and how stable?
  • Where will this protein be found?
  • How to purify? How to identify?

39
Predicting Solubility
  • Even if a protein is identified to be a
    non-membrane protein this does not necessarily
    indicate it will be soluble
  • Solubility depends on many factors
  • size (smaller ones are more soluble)
  • hydrophobicity (average and local hydrophobicity)
  • 3D structure and ligand interactions
  • overall charge, predicted accessibility
  • distribution and frequency of amino acids

40
Predicting Solubility
  • Solvent accessibility prediction
  • PredictProtein (PHDacc)
  • http://cubic.bioc.columbia.edu/predictprotein/
  • Protein property/scale prediction
  • EXPASY ProtScale
  • http://www.expasy.ch/cgi-bin/protscale.pl
  • PepTool
  • www.biotools.com

41
Accessible Surface Area
Reentrant Surface
Accessible Surface
Solvent Probe
Van der Waals Surface
42
Predicted Accessibility
43
Buried Surface Area (BASA) and Fractional Burial
(FB)
  • For an average protein:
  • ASA (NP) = 0.35 x BASA
  • ASA (P) = 0.61 x BASA
  • ASA (+/-) = 0.04 x BASA
  • BASA can be estimated from a protein's amino acid
    composition: $\mathrm{BASA} = \sum_i AA_i \times FB_i$

44
ProtScale
45
ProtScale
46
Solubility (PepTool)
  • Average Hydrophobicity: $AH = \sum_i AA_i \times H_i$
  • Hydrophobic Ratio: $RH = \sum H(-) / \sum H(+)$
  • Hydrophobic Ratio: RHP = philic/phobic
  • Linear Charge Density: LIND = (K + R + D + E + H + 2)/
  • Solubility: SOL = RH + LIND - 0.05 AH
  • Average AH = 2.5 +/- 2.5; Insol > 0.1; Unstrc <
    -6
  • Average RH = 1.2 +/- 0.4; Insol < 0.8; Unstrc >
    1.9
  • Average RHP = 0.9 +/- 0.2; Insol < 0.7; Unstrc >
    1.4
  • Average LIND = 0.25; Insol < 0.2; Unstrc > 0.4
  • Average SOL = 1.6 +/- 0.5; Insol < 1.1; Unstrc >
    2.5
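
The composite solubility score can be evaluated directly once RH, LIND and AH are known. The sketch below reads the slide's formula as SOL = RH + LIND - 0.05 AH, which assumes the operators lost in this transcript were "=" and "+", and applies the SOL cutoffs listed above. The function names and category labels are hypothetical.

```python
def solubility_score(rh, lind, ah):
    """SOL = RH + LIND - 0.05 * AH (operators reconstructed, see lead-in)."""
    return rh + lind - 0.05 * ah


def classify_sol(sol, insoluble_below=1.1, unstructured_above=2.5):
    """Apply the slide's SOL cutoffs for insoluble and unstructured proteins."""
    if sol < insoluble_below:
        return "likely insoluble"
    if sol > unstructured_above:
        return "likely unstructured"
    return "within the typical range"


# Using the slide's average values for RH, LIND and AH.
sol = solubility_score(rh=1.2, lind=0.25, ah=2.5)
print(round(sol, 3), classify_sol(sol))
```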

47
Structural Proteomics and Solubility Prediction
  • Global efforts have led to the cloning and
    attempted expression of more than 5000 water
    soluble proteins
  • Data contained on databases such as TargetDB
    allow correlations to be developed between
    sequence and expression levels and solubility
  • Excellent opportunity to use data mining to find
    rules to predict protein solubility

48

49
Binary Decision Trees
  • Used to partition or classify data that is not
    linearly separable
  • Unknown objects are classified by traversing
    the tree
  • Traversing is accomplished by performing tests at
    each node, direction of traversal determined by
    results of the test
  • Decision trees can be trained (test threshold
    cutoff, test order, architecture)
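
Traversal as described here amounts to a loop that applies one threshold test per node until a leaf is reached. This is a minimal sketch of that idea. The tree, its feature names and its cutoffs are made up for illustration and are not the trained tree from the lecture.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    feature: str                # name of the measurement tested at this node
    threshold: float            # cutoff learned during training
    low: Union["Node", str]     # branch taken when value <= threshold
    high: Union["Node", str]    # branch taken when value > threshold


def traverse(tree, sample):
    """Follow the tests from the root until a leaf label (a string) is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.high if sample[node.feature] > node.threshold else node.low
    return node


# Hypothetical tree: features and cutoffs are illustrative only.
TOY_TREE = Node("max_hydrophobicity", 2.0,
                low=Node("length", 300.0, low="soluble", high="insoluble"),
                high="insoluble")

print(traverse(TOY_TREE, {"max_hydrophobicity": 1.2, "length": 150}))
```

Training would choose the feature tested at each node, its threshold and the tree shape; here they are fixed by hand.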

50
Binary Decision Trees
not forming crystals
forming crystals
51
Predicting Protein Solubility
1) Residue frequency: ACDEFGHIKLMNPQRSTVWY
2) Grouped residue frequency: KR, NR, DE, ST, LIM, FWY, HKR, AVILM, DENQ, GAVL, SCTM
3) Predicted secondary structure: a, b, c
4) Presence of signal sequence
5) Length of polypeptide
6) Number of residues in low complexity region (L,S)
7) Normalized low complexity value (SEG/Len)
8) Maximum hydrophobicity value
9) Length of maximum hydrophobic region
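
The composition-based features in this list (items 1, 2 and 5) can be computed from the sequence alone. The sketch below shows one way to do it. The function name and the feature-key naming are hypothetical, and the remaining features (secondary structure, signal sequence, low complexity, hydrophobic regions) need separate predictors and are not covered.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GROUPS = ["KR", "NR", "DE", "ST", "LIM", "FWY",
          "HKR", "AVILM", "DENQ", "GAVL", "SCTM"]


def composition_features(seq):
    """Length, single-residue frequencies and grouped residue frequencies."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    features = {"length": n}
    for aa in AMINO_ACIDS:
        features["freq_" + aa] = counts[aa] / n
    for group in GROUPS:
        # A group's frequency is the summed frequency of its member residues.
        features["freq_" + group] = sum(counts[aa] for aa in group) / n
    return features


feats = composition_features("MKKRDESTLLIV")  # made-up sequence
print(feats["length"], round(feats["freq_KR"], 3), round(feats["freq_DE"], 3))
```

A feature dictionary like this is the kind of input a decision tree classifier would test, one feature per node.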
52
Solubility Decision Tree
Size of black oval: proportion that are soluble
53
Binary Decision Trees
  • Have been used to predict protein solubility and
    protein crystallization
  • Somewhat similar to self-organizing feature maps
    (SOFM)
  • Bertone P, Kluger Y, Lan N, Zheng D, Christendat
    D, Yee A, Edwards AM, Arrowsmith CH, Montelione
    GT, Gerstein M. Nucleic Acids Res. 2001 Jul 1;
    29(13): 2884-98

54
Predicting Stability
  • Even if a protein expresses and remains soluble
    it may turn out to be quite unstable (easily
    proteolyzed)
  • Proteins that are rich in Proline (P), Glutamic
    acid (E), Serine (S) and Threonine (T) or which
    have regions that are rich in these amino acids
    (PEST sequences) tend to have half lives of less
    than 2 hours
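
A crude first screen for PEST-rich regions is to slide a window along the sequence and report where P, E, S and T dominate. This is a simplified illustration only: it is not the scoring used by the PEST Finder server, and the window length, cutoff and function name are arbitrary assumptions.

```python
PEST_RESIDUES = set("PEST")


def pest_rich_windows(seq, window=12, min_fraction=0.5):
    """Return (start, fraction) for windows rich in P, E, S and T residues."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        fraction = sum(aa in PEST_RESIDUES for aa in seq[i:i + window]) / window
        if fraction >= min_fraction:
            hits.append((i, fraction))
    return hits


toy = "MKLVAAGLPESTPSETPESSTEKLVAAG"  # made-up sequence
print(pest_rich_windows(toy)[:3])
```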

55
PEST Finder
http://www.at.embnet.org/embnet/tools/bio/PESTfind/
56
Expression Questions
  • Which host cell system?
  • Which expression vector?
  • Which cloning/expression protocols?
  • Is it membrane or water soluble?
  • Is it single domain or multi-domain?
  • How soluble and how stable?
  • Where will this protein be found?
  • How to purify? How to identify?

57
Protein Localization
  • Is it exported? Does it go to the nucleus? Does
    it go through the ER? Does it localize to
    mitochondria? Chloroplasts? Does it go to the
    membrane? How do you tell?
  • Eukaryotic signal sequences are usually
    incompatible with prokaryotic signal sequences so
    expressing eukaryotic proteins in bacteria can
    lead to problems

58
Location Prediction
http://psort.nibb.ac.jp
59
Proteome Analyst
http://www.cs.ualberta.ca/bioinfo/PA/Sub/
60
PSORT-B (bacteria)
http://www.psort.org/psortb/index.html
61
Location Prediction
http://www.cbs.dtu.dk/services/TargetP/submission
62
Other Sites or Modifications?
  • Phosphorylation
  • NetPhos http://cbs.dtu.dk/services/NetPhos/
  • O-Glycosylation
  • NetOGlyc http://cbs.dtu.dk/services/NetOGlyc/
  • Coiled-coil dimerization domains
  • www.ch.embnet.org/software/COILS_form.html
  • Tyrosine Sulfation
  • http://ca.expasy.org/tools/sulfinator/

63
NetPhos 2.0
64
Expression Questions
  • Which host cell system?
  • Which expression vector?
  • Which cloning/expression protocols?
  • Is it membrane or water soluble?
  • Is it single domain or multi-domain?
  • How soluble and how stable?
  • Where will this protein be found?
  • How to purify? How to identify?

65
Finding and Identifying Your Protein
66
Isoelectric Point
  • The pH at which the protein has net charge = 0
  • $Q = \sum_i N_i / (1 + 10^{\mathrm{pH} - \mathrm{p}K_i})$
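
The isoelectric point can be found numerically by searching for the pH where the net charge crosses zero. The sketch below uses the slide's expression for basic groups and the mirrored form, with opposite sign, for acidic groups. The pKa values are illustrative textbook-style assumptions (different tools use different sets), and the function names are hypothetical.

```python
# Assumed pKa values; these are not given on the slide.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5, "N_term": 8.6}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1, "C_term": 3.6}


def net_charge(seq, ph):
    """Net charge of a linear polypeptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka))
                   for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph))
                   for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH; valid because net charge only decreases as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("MKDEHRKKAY"), 2))  # made-up sequence
```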

67
Isoelectric Point MW Calculation
68
More Help?
  • http://www.abrf.org
  • http://www.abrf.org.JBT/JBTindex.html
  • http://www.BioTechniques.com
  • http://expasy.ch/alinks.html
  • http://www.neehow.org/wonderful/protocols
  • http://research.newfsc.noaa.gov/protocols.html
  • http://www.horizonpress.com/gateway/protocols.html

69
Bioinformatics and Structural Proteomics
  • Key to identifying targets
  • Key to reducing time and material wastage in
    protein expression/purification steps
  • Key to tracking and communicating target
    progression (multi-lab LIMS)
  • Key to reducing redundancy and duplication by
    other X-ray or NMR structure labs (TargetDB,
    SPINE)

70
TargetDB
http://targetdb.pdb.org/apps/TargetDB.html
71
Structural Proteomics - Status
  • 18 registered centres (30 organisms)
  • 50330 targets have been selected
  • 25202 targets have been cloned
  • 14728 targets have been expressed
  • 5122 targets are soluble
  • 600 X-ray structures determined
  • 164 NMR structures determined
  • 633 Structures deposited in PDB (03/04/04)
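
The attrition between pipeline stages is easier to see as stage-to-stage fractions. The sketch below computes them from the counts on this slide; only the function name is an assumption.

```python
# Counts taken from the status slide.
STAGES = [("selected", 50330), ("cloned", 25202),
          ("expressed", 14728), ("soluble", 5122)]


def stage_yields(stages):
    """Return (from_stage, to_stage, fraction surviving) for consecutive stages."""
    return [(a, b, n_b / n_a)
            for (a, n_a), (b, n_b) in zip(stages, stages[1:])]


for src, dst, fraction in stage_yields(STAGES):
    print(f"{src} -> {dst}: {fraction:.1%}")

overall = STAGES[-1][1] / STAGES[0][1]
print(f"selected -> soluble overall: {overall:.1%}")
```

On these numbers the largest single loss is at the solubility step, which is the motivation for the solubility prediction methods covered earlier.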

72
Structural Proteomics - Status
  • 135 structures deposited by Riken
  • 117 structures deposited by Mid-West
  • 85 structures deposited by North-East
  • 74 structures deposited by New York
  • 59 structures deposited by JCSG (UCSD)
  • 34 structures deposited by Berkeley
  • 24 structures deposited by Montreal/Kingston

73
Protein Expression in E. coli
Categories: good, promising, unfolded, poor, precipitated
Proc. Natl. Acad. Sci. USA, Vol. 99, 1825-1830, 2002
74
Protein Expression in E. coli
M. th. = Methanobacterium thermoautotrophicum
E. coli = Escherichia coli
S. ce. = Saccharomyces cerevisiae
Myx. = Myxoma virus
T. ma. = Thermotoga maritima
75
X-ray vs. NMR Results for Methanobacterium
76
Conclusions
  • The success of proteomics (structural,
    functional, expressional) hinges almost entirely
    on successful protein production and expression
  • Bioinformatics (web databases, servers, data
    mining tools, NNs, HMMs) can and does play an
    increasingly important role in optimizing or
    improving protein expression and coordinating
    large scale proteomics efforts