Title: Structural Genomics and the Protein Folding Problem
1 Structural Genomics and the Protein Folding Problem
- George N. Phillips, Jr.
- University of Wisconsin-Madison
- February 15, 2006
2 From DNA to biological function
[Diagram labels: Modeling; Inference; Basic Understanding / Applications (e.g. therapeutics); Gene Model; Functional Assignments; High-throughput DNA Sequencing; Structure Determination; Experimental Analysis]
3 Developing a gene model
Glimmer (Gene Locator and Interpolated Markov ModelER)
GlimmerHMM for eukaryotic genomes (more advanced)
Genome sequencing, genome assembly, regulatory elements, identification of ORFs
All but the simplest genomes are works in progress. It is estimated that 80% of gene models have errors at present! Comparative genomics should help the process, as will sequencing of expressed sequence tags and other genomics projects.
Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21(9) (2005), 1782-88.
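To make the "identification of ORFs" step concrete, here is a minimal sketch of a naive open reading frame scan. It is not how Glimmer or GlimmerHMM work (those rely on interpolated and hidden Markov models); it only looks for ATG...stop stretches on the forward strand. The function name `find_orfs`, the `min_codons` parameter, and the toy sequence are all assumptions made for illustration.

```python
# Naive forward-strand ORF scan. Real gene finders such as Glimmer score
# candidate ORFs with Markov models, which this sketch does not attempt.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF; end is exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length counts the stop codon as part of the ORF.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Hypothetical toy sequence, not taken from any genome.
toy = "CCATGAAATTTGGGTAACCATGTAG"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

Even this toy version hints at why gene models carry errors: the choice of minimum length, strand, and start codon already changes which ORFs are reported, before introns or regulatory elements are considered.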
4 The sequence-space of proteins
PSI-BLAST, HMM, Pfam, many others
Universe of all protein sequences
[Figure: clusters of aligned sequence families, each drawn as a block such as]
HYSIELNASLLERGV
HLNIEDNPSCNAMGV
PLNIELNASLNEPGV
WERIELNASLNER--
HQRIEL--SLMMRG-
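Profile methods such as PSI-BLAST and HMMs summarize a family alignment column by column. The sketch below shows only the simplest version of that idea, a per-column residue count with a consensus and a conservation score, using the five aligned sequences from the slide. It is not a PSSM or a profile HMM; the function names and the alphabetical tie-breaking rule are assumptions chosen for illustration.

```python
from collections import Counter

# Five aligned sequences from the slide; '-' marks a gap.
ALIGNMENT = [
    "HYSIELNASLLERGV",
    "HLNIEDNPSCNAMGV",
    "PLNIELNASLNEPGV",
    "WERIELNASLNER--",
    "HQRIEL--SLMMRG-",
]


def column_profile(alignment):
    """Per-column residue counts, ignoring gaps."""
    return [Counter(res for res in column if res != "-")
            for column in zip(*alignment)]


def consensus(alignment):
    """Most frequent residue per column; ties go to the alphabetically first."""
    letters = []
    for counts in column_profile(alignment):
        letters.append(max(sorted(counts), key=counts.get) if counts else "-")
    return "".join(letters)


def conservation(alignment):
    """Fraction of sequences carrying the most frequent residue, per column."""
    n = len(alignment)
    return [max(c.values()) / n if c else 0.0
            for c in column_profile(alignment)]


print(consensus(ALIGNMENT))
print([round(x, 1) for x in conservation(ALIGNMENT)])
```

Columns such as the shared I-E pair come out fully conserved, while the flanking positions vary; real profile methods weight these differences with substitution scores and pseudocounts rather than raw counts.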
5 PFAM domains
Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats and Sean R. Eddy, Nucleic Acids Research (2004) Database Issue 32:D138-D141
6 Flow of information from DNA to functional understanding
[Diagram labels: Modeling; Inference; Basic Understanding / Applications (e.g. therapeutics); Gene Model; Functional Assignments; High-throughput DNA Sequencing; Structure Determination; Experimental Analysis]
7 X-ray Laboratory
8 Crystallography reveals locations of electron clouds of the atoms, and the polypeptide chain can be traced through space
9 The fold-space of proteins
SCOP, CATH
Universe of all protein structures
10 Murzin et al. http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html
11 Glimpses of the fold space of proteins
Hou, Sims, Zhang, and Kim, PNAS 100:2386 (2003)
12 Flow of information from DNA to functional understanding
[Diagram labels: Modeling; Inference; Basic Understanding / Applications (e.g. therapeutics); Gene Model; Functional Assignments; High-throughput DNA Sequencing; Structure Determination; Experimental Analysis]
13 Connections between sequence and structure
Universe of sequences
Universe of structures
14 Connections between sequence and structure
?
Universe of sequences
Universe of structures
15 At what level of homology can one trust a structural inference?
Redfern, Orengo et al., J. Chromatography B 815:97 (2005)
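One common way to put a number on "level of homology" is percent sequence identity over an alignment. The sketch below computes it for two already-aligned sequences, counting only columns where neither has a gap; that choice of denominator is an assumption, and other conventions exist. The cutoff value is purely hypothetical and is not taken from the slide or the cited paper.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical cutoff for illustration only; the slide does not give a number.
ASSUMED_CUTOFF = 30.0


def trust_structural_inference(a, b, cutoff=ASSUMED_CUTOFF):
    """True if the pair is at or above the assumed identity cutoff."""
    return percent_identity(a, b) >= cutoff


# Two aligned toy sequences from the sequence-space slide.
seq_a = "HYSIELNASLLERGV"
seq_b = "PLNIELNASLNEPGV"
print(round(percent_identity(seq_a, seq_b), 1))
print(trust_structural_inference(seq_a, seq_b))
```

A single cutoff is a blunt instrument: alignment length and the family in question both affect how far a structural inference can be trusted, which is the point of the question on the slide.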
16 What is structural genomics?
- Experimental determination of key structures (target selection is a key part of the idea)
- Modeling of family members
- Inferring function (note: infer)
- Making direct use of the new structures
17 Protein Sequences and Folds
- 100,000 families of proteins that cannot be reliably modeled at present (modeling families structure)
- 50% of all domain families can be assigned to a structure under CATH
18 Protein Structure Initiative (PSI) Mission Statement
To make the three-dimensional atomic-level structures of most proteins easily available from knowledge of their corresponding DNA sequences.
19 From John Norvell - NIH
20 Generation of new structures
Chandonia and Brenner, Science 311:347 (2006).
21 Center for Eukaryotic Structural Genomics
- Exclusively eukaryotic targets
- 60% fold-space targets (emphasis on eukaryote-only families)
- 20% disease relevant
- 20% outreach targets from the community
- Overall goals are to reduce the costs of determining structures of proteins from eukaryotes by refining all steps in the pipeline
- Supported by National Institutes of Health
- John Markley, PI; George Phillips and Brian Fox, Co-PIs
22 University of Wisconsin's Center for Eukaryotic Structural Genomics (75 total, 3/4 unique)
23 How does one clone, express, purify, and solve structures not previously studied? An industry-style pipeline
24 Pipeline details: cell-based and cell-free protein production for X-ray and NMR
Note: project involves sequencing, which aids gene modeling!
25 Sesame: integrated LIMS in use at CESG
Open access to the public: structures, protocols, reagents, progress
http://www.uwstructuralgenomics.org
Zolnai et al., J. Struct. Func. Genomics 4:11 (2003)
26 At1g18200
- Mis-annotated prior to our work, but structure
led to discovery of function.
27 Pfam B 13 and 136 matches to s 7198 and 11634
Alignment of GalP_UDP_transf vs 1Z84:A|PDBID|CHAIN|SEQUENCE/15-196
(upper row: GalP_UDP_transf model consensus; lower row: 1Z84 chain A with residue numbers)

            kkfsplDhvhrrynpLtlvwilVsphrakRPikqsqsLidlkkeLwq
1Z84:A  15  GDSVENQSPELRKDPVTNRWVIFSPARAKRP----------------  45

            gavetpkvptdplhdp.dcysakLcpg........atratgevNPdyest
1Z84:A  46  -TDFKSKSPQNPNPKPsSCP---FCIGreqecapeLFRVP-DHDPNWKLR  90

            yvLkspkkftndFyalseDnpyikvsvSNeaIaknplfqlksvrGhelci
1Z84:A  91  VI-------ENLYPALSRN---LETQ------------STQPETG--TSR  116

            VI...CF......SKPehDptlpalakeeirevvdaWqlcteelGyegre
1Z84:A 117  TIvgfGFhdvvieS-PVHSIQLSDIDPVGIGDILIAYKKRINQIA-----  160

            nhpayqnvqIFEmNkGaemGcsnpHPYaYFnEHGQvwatsfiP
1Z84:A 161  QHDSINYIQVFK-NQGASAGASMSHS------HSQMMALPVVP  196

http://www.sanger.ac.uk/Software/Pfam/
28 Blind prediction of structure: CASP and At5g18200
29 Flow of information from DNA to functional understanding
[Diagram labels: Modeling; Inference; Basic Understanding / Applications (e.g. therapeutics); Gene Model; Functional Assignments; High-throughput DNA Sequencing; Structure Determination; Experimental Analysis]
30 Function space of proteins
KEGG: Kyoto Encyclopedia of Genes and Genomes
The Gene Ontology project (GO)
Metabolism
Cellular Processes
Enzymes
Signal Processing
Don't forget that protein-protein interactions exist also!
31 At2g17340
- Related to a human protein associated with
Hallervorden-Spatz syndrome, a neurological
disorder?
32 Parallel Enzyme Activity Testing (Collaboration with University of Toronto)
81 protein samples sent to Toronto: 8 solved CESG structures, 73 randomly chosen
Generalized assays for phosphatase, esterase, phosphodiesterase, protease, amino acid dehydrogenase, alcohol dehydrogenase, organic acid dehydrogenase, amino acid oxidase, alcohol oxidase, organic acid oxidase, beta-lactamase, beta-galactosidase, arylsulfatase, lipase.
Results
- Solid hits: 3 phosphatases, 5 esterases
- Weaker hits: 9 more esterases, 6 phosphodiesterases
- No hits: all others
A. Yakunin et al., Current Opinion in Chemical Biology 8:42 (2004)
33 Target At2g17340/JR5670
Initial Assay: Wide-spectrum
- Absorbance 0.25 is a tentative signal, 0.5 is a strong signal.
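The two absorbance levels above amount to a simple screening rule. A minimal sketch follows, assuming the stated values act as inclusive lower bounds (the slide does not say whether they are thresholds or typical values). The function name, the substrate labels, and the readings are hypothetical and are not assay data from the talk.

```python
TENTATIVE_LEVEL = 0.25  # absorbance level named on the slide
STRONG_LEVEL = 0.5      # absorbance level named on the slide


def classify_absorbance(absorbance):
    """Label a reading; treating the levels as inclusive cutoffs is an assumption."""
    if absorbance >= STRONG_LEVEL:
        return "strong"
    if absorbance >= TENTATIVE_LEVEL:
        return "tentative"
    return "none"


# Hypothetical readings for illustration only, not measured values.
readings = {"substrate_A": 0.62, "substrate_B": 0.31, "substrate_C": 0.08}
for name, value in readings.items():
    print(name, value, classify_absorbance(value))
```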
34 Flow of information from DNA to functional understanding
[Diagram labels: Modeling; Inference; Basic Understanding / Applications (e.g. therapeutics); Gene Model; Functional Assignments; High-throughput DNA Sequencing; Structure Determination; Experimental Analysis]
35 At2g17340
- Enzyme of unknown specificity.
36 A functional annotation lesson
37 Functional Annotation by Inference
From raw DNA sequences, one looks for genomic features such as promoters, alternative splicing of mRNAs, retrotransposons, pseudogenes, tandem duplications, synteny, and homology. It is homology, both from sequence and from structure, that allows functional inferences to be made.
Prosite, Dali, VAST, FFAS03
Some tools integrate knowledge from many sources into one place, acting as meta-servers of clues.
38 Connections between structure and function
Universe of functions
Universe of structures
39 Connections between structure and function
Convergent evolution
Universe of functions
Universe of structures
40 Connections between structure and function
Divergent evolution
Universe of functions
Universe of structures
41 At1g18200
- Misleading annotation prior to our work, but structure led to discovery of function.
42 Flow of information from DNA to functional understanding
[Diagram labels: Modeling; Inference; Basic Understanding / Applications (e.g. therapeutics); Gene Model; Functional Assignments; High-throughput DNA Sequencing; Structure Determination; Experimental Analysis]
43 Summary
- Structural genomics efforts are gaining momentum and helping to assign new functions to ORFs and to fill in the space of all possible protein folds.
44 The Center for Eukaryotic Structural Genomics (supported by NIH GM64598 and GM074901)
Administration: Madison (Primm, Troestler, Markley, Phillips, Fox)
Cloning/sequencing pipeline: Madison (Wrobel, Fox)
Expression pipeline: Madison (Frederick, Fox, Riters)
E. coli cell growth pipeline: Madison (Sreenath, Burns, Seder, Fox)
Cell-Free System: Madison (Vinarov, Markley, Newman)
Protein purification pipeline: Madison (Vojtik, Phillips, Fox, Ellefson, Jeon)
Mass spectrometry: Madison (Aceti, Sabat, Sussman)
NMR spectroscopy: Madison NMRFAM (Song, Tyler, Cornilescu, Markley); Milwaukee MCW (Peterson, Volkman, Lytle)
Crystallization / crystallography: Madison (Bingman, Phillips, Bitto, Han, Bae, Meske); Argonne (Advanced Photon Source)
Bioinformatics: Madison (Bingman, Sun, Phillips, Wesenberg); Indianapolis (Dunker); Milwaukee MCW (Twigger, de la Cruz)
Computational support: Madison (Bingman, Ramirez, Phillips)
Sesame: Madison (Zolnai, Markley, Lee)