Title: The Ontario Structural Genomics Initiative
1The Ontario Structural Genomics Initiative
2REFERENCES
- Nature Structural Biology, 7, 903-909, 2000
- Journal of Molecular Biology, 302, 189-203, 2000
- Nature Structural Biology, SG supplement, Nov 2000
- Structure, 6, 265-267, 1998
- Nature Structural Biology, 6, 11-12, 1999
- Current Opinion in Biotechnology, 11, 25-30, 2000
- Nature Genetics, 23, 151-157, 1999
3STRUCTURAL GENOMICS
- The determination of the three-dimensional structures of the proteins encoded by the genes of an entire genome.
- The complete DNA sequences of many organisms are known and there are 100 ongoing genomic sequencing projects.
- The natural extension of sequencing projects is the determination of the corresponding protein structures.
- The goals of current genomics projects are to understand the cellular and molecular functions of all the gene products and, ultimately, to help in the design of diagnostics and therapeutics.
4SEQUENCED GENOMES (NCBI Genome Database)
- A. aeolicus (1522)
- M. thermoautotrophicum (1855)
- A. fulgidus (2407)
- M. jannaschii (1715)
- B. subtilis (4100)
- M. tuberculosis (3918)
- B. burgdorferi (850)
- M. genitalium (467)
- C. elegans (19 099)
- M. pneumoniae (677)
- C. trachomatis (1052)
- P. horikoshii (1979)
- C. pneumoniae (894)
- R. prowazekii (834)
- E. coli (4289)
- S. cerevisiae (5885)
- H. influenzae (1709)
- Synechocystis sp. (3169)
- H. pylori (1566)
- T. pallidum (1031)
- A. thaliana (15 000)
- H. sapiens (?30 000)
- LEGEND: Archaea, Bacteria, Eucarya
5THE PROTEOMICS CHALLENGE
What do all those proteins do?
Similar Function
Known Function
Unknown Function
6FUNCTIONAL PROTEOMICS
uncovering the function of all genes/proteins
- Genome Wide Analysis
- protein-protein interactions
- protein expression/localization
- biochemical assays
- protein structure
Known Function
Unknown Function
7BEYOND SEQUENCING PROJECTS
GENOME
DNA Microarray
Genetic Screens
PROTEOME
Protein Ligand Interactions
Protein-Protein Interactions
Protein Structure
8THE POST-GENOMIC ERA
- Functional proteomics currently exploits several complementary technologies
- DNA Microarray Technology
- For genome-wide transcription profiling
- Protein-Ligand Interactions
- To discover small molecule inhibitors of proteins
- To discover function
- Protein-Protein Interactions
- To define the network of regulatory interactions
- To discover function
9PROTEINS WITH 3D HOMOLOGS
of Proteins
10MAKING STRUCTURAL GENOMICS A REALITY
- Initially the rate-determining step in SG was preparing suitable protein samples.
- - Need faster methods in protein production
- - Must overcome bottleneck of growing crystals
- - Initiated program directed solely at this issue
11GOALS OF STRUCTURAL GENOMICS
- to develop improved methods that will result in high-throughput biology and protein structure determination
- robots, robots, robots
- cloning
- expression
- purification
- crystallization
- to determine new protein folds
- to determine the functions of unknown proteins
12STRUCTURAL GENOMICS
The early years
- A move away from hypothesis-driven research: a system where structures are solved first and questions about the protein are asked later.
- A large number of targets is required, and high-throughput methods must be implemented for such a project to be successful.
- Cloning, expression and purification are important!!
- What targets?
- What is the priority of targets?
13STRUCTURAL GENOMICS PROJECTS
- A. Edwards      U of T      20   M. thermoautotrophicum
- S.H. Kim        Berkeley    12   Methanococcus jannaschii
- S. Yokoyama     Tokyo U     10   Thermus thermophilus
- J. Moult        CARB        10   Haemophilus influenzae
- D. Eisenberg    UCLA         8   Pyrobaculum aerophilum
- A. Sali         BNL          3   S. cerevisiae
14SG CONSORTIUMS
- The NIH/NIGMS have funded 7 SG centers, with each center obtaining about 4 million US dollars per year in funding.
- New York SG Consortium (www.nysgrc.org)
- Midwest Center for SG (UHN/UofT)
- The Berkeley SG Center
- Northeast SG Consortium (UHN/UofT) (www.nesg.org)
- Tuberculosis SG Consortium (www.doe-mbi.ucla.edu/TB)
- The Southeast Collaboratory for SG
- The Joint Center for SG (www.jcsg.org)
15SG COMPANIES
- Integrative Proteomics Inc., Toronto (www.integrativeproteomics.com)
- Structural Genomix Inc., San Diego (www.stromix.com)
- Syrrx Inc., La Jolla (www.syrrx.com)
- Astex Inc., Cambridge (www.astex-technology.com)
- Structure-Function Genomics, Piscataway
16CRYSTALLOGRAPHIC DEVELOPMENTS
- Multiwavelength Anomalous Dispersion
- Synchrotron Radiation
- Cryocrystallography
- CCD Detectors and Image Plates
- Software
17STRUCTURAL BIOLOGY OVER THE YEARS
Structural biology on a genomic scale
1998
Target
Sample
Structure
TIME
18Overview of Structural Proteomics
- Genome Analysis and Target Selection
- Cloning, Expression and Purification
- Crystallography / NMR
- Structure
- Fold and Functional Analysis
FAST
SLOW
FAST
19STRUCTURE SHOW AND TELL
- The structure will reveal the fold of the protein.
- TIM barrel, Rossmann fold
20STRUCTURE SHOW AND TELL
- The structure will reveal the active site.
- protease (Ser-His-Asp)
21STRUCTURE SHOW AND TELL
- The structure may reveal evolutionary links between proteins lacking sequence similarity.
22STRUCTURE SHOW AND TELL
- The structure may reveal the function of the
protein.
23TARGET SELECTION
- Groups are focusing on complete organisms
- thermophilic, mesophilic or halophilic
- eukaryotic or prokaryotic
- classes of proteins from different organisms
- There isn't a coordinated international group that assigns targets (yet!).
- Some groups may solve the same structures (redundant).
- two SG pilot projects solved factor 5A first!!!
- Membrane proteins and proteins whose structures are already solved are eliminated.
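The elimination rule in the last bullet can be sketched as a simple filter. This is a minimal illustration, not the project's actual pipeline: the `Orf` record, its two boolean flags and the placeholder IDs are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    orf_id: str
    is_transmembrane: bool     # hypothetical flag, e.g. from a topology predictor
    has_known_structure: bool  # hypothetical flag, e.g. from a search of solved structures


def select_targets(orfs):
    """Keep ORFs that are neither membrane proteins nor already solved."""
    return [o for o in orfs if not o.is_transmembrane and not o.has_known_structure]


# Made-up records; the IDs are placeholders, not real annotations.
orfs = [
    Orf("orf_a", is_transmembrane=False, has_known_structure=False),
    Orf("orf_b", is_transmembrane=True, has_known_structure=False),
    Orf("orf_c", is_transmembrane=False, has_known_structure=True),
]
targets = select_targets(orfs)
print([o.orf_id for o in targets])
```

Only the first placeholder record passes both checks; the other two are dropped for the two reasons named on the slide.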
24 TARGET SELECTION
[Figure: histogram of the number of ORFs (axis: Number of genes, 0 to 700) by protein size in kDa (bins: <16, 16-31, 31-51, 51-71, 71-100, >100); legend: Transmembrane, Known 3D structure, Genes not selected, Genes targeted]
25DRUG DISCOVERY: ANTIBIOTICS
- Targets in this area of structural genomics are bacterial proteins that are essential for growth and survival.
- cell wall biosynthesis
- aromatic amino acid biosynthesis
- The development of a broad-spectrum antibiotic would draw on the structures of a single protein from different bacterial organisms.
26DRUG DISCOVERY: HUMAN DISEASE
- Targets in this area of structural genomics are G-protein coupled receptors, ion channels, kinases, etc.
- - GPCRs and ion channels are membrane proteins and are more difficult to purify and crystallize
- The development of techniques to allow over-expression, purification and crystallization of these targets is required and in progress.
27AIMS OF PILOT PROJECT
- determine feasibility of a Structural Genomics Project
- develop technologies necessary for large-scale initiatives
- develop high-throughput (HTP) cloning
- develop high-throughput expression
- develop high-throughput purification
28Methanobacterium thermoautotrophicum
- isolated in 1971
- thermophile (optimal growth temperature is 65 °C)
- methanogen (produces methane; grows on H2 and CO2)
- sequenced (Smith, D.R. et al., 1997, J. Bact., 179, 7135)
- 1 751 377 bp and 1855 ORFs
- 13% are similar to eucaryal sequences
- proteins in DNA metabolism, transcription and translation
- archaeal proteins are smaller and more stable than bacterial and eukaryal homologs
29PROTEIN FUNCTION
Assigned Function     Sequence Homology      45%
Conserved Function    Sequence Homology      28%
Unknown Function      No Sequence Homology   27%
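As a rough worked example, the three figures above can be turned into approximate ORF counts. The sketch assumes the figures are percentages of the 1855 ORFs quoted for M. thermoautotrophicum; the rounded counts are estimates, not values taken from the genome paper.

```python
TOTAL_ORFS = 1855  # ORF count quoted for M. thermoautotrophicum

# Figures from the slide, read as percent of all ORFs (an assumption).
percent = {
    "Assigned Function": 45,
    "Conserved Function": 28,
    "Unknown Function": 27,
}

# Approximate number of ORFs in each category.
counts = {name: round(TOTAL_ORFS * pct / 100) for name, pct in percent.items()}

for name, n in counts.items():
    print(f"{name}: about {n} ORFs")
```

On this reading, roughly half of the predicted proteins have no assigned function, which is the pool that structure determination is meant to address.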
30CLONING OF MT GENES
- PCR amplification of gene of interest
- purification of PCR product
- ligation into pET15b expression vector
- T7 promoter
- induced with IPTG
- cleavable hexahistidine fusion tag
- transformation into DH5α E. coli cells
- plasmid prep
- transformation into BL21(DE3) E. coli cells
- expression and purification
31LIMITED PROTEOLYSIS
- single domain proteins and proteins less than 40 kDa can be expressed in E. coli
- multi-domain proteins and proteins greater than 40 kDa are quite difficult to express in E. coli
- these proteins may be expressed in yeast or baculovirus
- OR
- these proteins must be broken down into domains
32PROTEINS DESTINED FOR NMR
Protein <20 kDa
N15 Label
NMR
Protein-Protein Interactions
Aggregated, Unfolded Folded
Co-Expression
Structure
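The size cut-offs on this slide and the previous one (20 kDa for NMR, 40 kDa for straightforward E. coli expression) can be sketched as a routing rule. This is a simplification under stated assumptions: the function name, the `single_domain` flag and the choice to send 20 to 40 kDa proteins to crystallization trials are illustrative and are not stated in the slides.

```python
NMR_MAX_KDA = 20.0    # slide: proteins under 20 kDa are 15N labelled for NMR
ECOLI_MAX_KDA = 40.0  # slide: proteins over 40 kDa are hard to express in E. coli


def route(mass_kda: float, single_domain: bool = True) -> str:
    """Suggest a next step for a target from its size and domain count."""
    if mass_kda >= ECOLI_MAX_KDA or not single_domain:
        # Large or multi-domain: alternative host, or split into domains.
        return "yeast/baculovirus expression or break into domains"
    if mass_kda < NMR_MAX_KDA:
        return "E. coli expression, 15N label, NMR"
    # Assumed default for the 20 to 40 kDa range (not stated in the slides).
    return "E. coli expression, crystallization trials"


for mass, single in [(12.0, True), (30.0, True), (55.0, True), (18.0, False)]:
    print(mass, single, "->", route(mass, single))
```

In practice a small protein may also go to crystallography, so the rule should be read as a default ordering, not a hard partition.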
33COMPARISON OF N15 NMR SPECTRA
Excellent
Poor
34IDENTIFICATION OF A FOLDED DOMAIN
Before
After Proteolysis
35PROTEINS FOR CRYSTALLOGRAPHY
36STRUCTURE DETERMINATION STEPS
- Clone Gene
- Purify Protein
- Crystallize Protein
- Collect X-Ray Diffraction Data
- Identify Selenium Sites
- Calculate Phases using MAD
- Calculate Electron Density Map
- Build Model of Protein in Electron Density
- Refine and Rebuild Protein Model
37PROTEIN CRYSTALLIZATION
- A crystal is an ordered three-dimensional array of molecules in the same orientation held together by non-covalent interactions.
- Crystals are grown by slow, controlled precipitation from crystallization conditions that do not denature the protein.
- These conditions can contain precipitants such as salts (NaCl, (NH4)2SO4), organic solvents (EtOH, MPD) or polymers (PEG), as well as buffers, additives and ions.
38PROTEIN CRYSTALLIZATION cont'd
- Each protein has its own empirically determined crystallization condition.
- pH
- ionic strength
- protein concentration
- temperature
- ions
- precipitant
- We cannot sample complete crystallization matrices.
- We start off with approximately 200 different crystallization solutions and hope for the best.
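To see why a complete matrix cannot be sampled, the sketch below counts the combinations for a deliberately coarse grid over the six variables listed above and compares that with a 200-solution screen. All levels are hypothetical values chosen for illustration, and the random draw is only a stand-in; it is not a claim about how the actual 200 solutions were chosen.

```python
import itertools
import random

# Hypothetical, deliberately coarse levels for each variable on the slide.
levels = {
    "pH": [4.5, 5.5, 6.5, 7.5, 8.5],
    "ionic_strength_mM": [0, 50, 100, 200, 500],
    "protein_mg_per_ml": [5, 10, 20],
    "temperature_C": [4, 20],
    "ion": ["none", "Mg2+", "Ca2+", "Zn2+"],
    "precipitant": ["NaCl", "(NH4)2SO4", "EtOH", "MPD", "PEG"],
}

# Every combination of one level per variable: 5 * 5 * 3 * 2 * 4 * 5 = 3000.
full_grid = list(itertools.product(*levels.values()))

# Draw 200 distinct conditions as a stand-in for an initial screen.
rng = random.Random(0)
screen = rng.sample(full_grid, 200)

coverage = len(screen) / len(full_grid)
print(f"{len(full_grid)} combinations, {len(screen)} screened ({coverage:.1%})")
```

Even with only two to five levels per variable, the screen touches a small fraction of the grid, and finer steps in any variable multiply the total further.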
39PROTEIN CRYSTALLIZATION cont'd
CRYSTAL TRIALS: crystallization solutions used to screen for protein crystallization conditions
40PROTEIN CRYSTAL
100 microns
41X-Ray DIFFRACTION
42X-RAY DIFFRACTION IMAGE
43PROGRESS TOWARDS HTP CLONING
- Initial Rate
- 24 clones per person per week
- Current rate
- 96 clones per person per week
44PROGRESS TOWARDS HTP PROTEIN EXPRESSION
- Established conditions to maximize number of soluble clones
- bacterial strain
- induction conditions
- magic plasmid
45PROGRESS TOWARDS HTP PURIFICATION
- Initial Rate
- 1 protein/person/week
- Current Rate
- 8 proteins/person/week
- Target Rate
- 16 proteins/person/week
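The cloning and purification rates quoted on these slides can be compared with a little arithmetic. The sketch below uses all 1855 M. thermoautotrophicum ORFs as an illustrative upper bound on the workload; that is an assumption, since not every ORF is a target, and `person_weeks` is a helper name introduced here.

```python
def person_weeks(n_items: int, per_person_per_week: float) -> float:
    """Person-weeks needed to process n_items at a given weekly rate."""
    return n_items / per_person_per_week


N_ORFS = 1855  # illustrative upper bound; the real target list is smaller

cloning = {"initial": 24, "current": 96}                  # clones/person/week
purification = {"initial": 1, "current": 8, "target": 16}  # proteins/person/week

for step, rates in (("cloning", cloning), ("purification", purification)):
    for label, rate in rates.items():
        fold = rate / rates["initial"]
        weeks = person_weeks(N_ORFS, rate)
        print(f"{step:12s} {label:8s} {rate:3d}/week  {fold:4.0f}x  {weeks:7.1f} person-weeks")
```

Even at the target purification rate, purification remains several times slower per person than cloning at its current rate.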
46ACHIEVEMENTS
- We have optimized HTP cloning.
- We have optimized HTP expression and purification.
- We are in the process of automating cloning and purification.
47SUMMARY OF MT PROTEINS
Cloned
Expressed
Soluble
Purified
Microcrystals/Promising HSQC
Well diffracting crystals/excellent HSQC
48KNOWN FUNCTION BUT UNKNOWN STRUCTURE
49UNKNOWN BUT STRUCTURE SUGGESTS FUNCTION
MTH152    FMN-binding protein, Ni2+ binding
MTH150    Nicotinamide mononucleotide adenylyltransferase
MTH1615   Nucleic acid binding
MTH538    Phosphorylation-independent 2-component signaling protein
50STILL UNKNOWN
MTH1184
MTH1175
51CONCLUSIONS FROM FEASIBILITY STUDY
- Crystallization is now rate limiting
- NMR can play a significant role
- Solubility presents a major hurdle
- Small, single domain proteins behave better
- Low hanging fruit: 20% of proteome
- Must develop HTP methods for recalcitrant proteins
52STRATEGIES FOR TACKLING RECALCITRANT PROTEINS
1. Focus on domains
2. Empirical bioinformatics
3. Identification of binding partners (proteins and ligands)
53TAKE HOME LESSON
- think about biology on a genomic scale
54
- PROTEINS: Structure, Function and Genetics has inaugurated a new short format of Structure Notes designed to provide brief accounts of structures that contain too little new information to warrant a full length article
- what can you expect from robots!!!
- - Bill L Duax
55THE TEAM
A. Edwards / C. Arrowsmith
Steven Beasley
Asaph Engel
Brian Li
Anthony Semesi
Emil Pai
Vivian Saridakis
Ning Wu
Aiping Dong
Akil Dharamsi
Adelinda Yee
Dinesh Christendat
56 1999 OCI SUMMER STUDENTS
Stephanie Fung
Joanne Loo
Hedyah Javidni
Gundula Min-Oo
Ashleigh Tuite
Fred Hsu
57 2000 OCI SUMMER STUDENTS
- Ashleigh Tuite
- Fred Cheung
- Laura Faye
- Toni Davidson
58COLLABORATORS
- Lawrence McIntosh (UBC)
- Cameron Mackereth
- Mike Kennedy (PNNL)
- John Cort
- Mark Gerstein (Yale)
- Yuval Kluger
- Kalle Gehring (McGill)
- G. Kozlov