1
The Future As Defined by Structural Genomics
  • Philip E. Bourne
  • Dept. of Pharmacology
  • University of California San Diego
  • pbourne_at_ucsd.edu

2
Agenda
  • The Data
  • What is structural genomics exactly?
  • What has it achieved thus far?
  • What are its goals going forward?
  • One possible strategy for selecting targets
  • Unsolved Problems
  • New Challenges

3
Structural Genomics: A Broad Working Definition
  • Structural genomics is the process of
    high-throughput determination of the
    3-dimensional structures of biological
    macromolecules

4
Ah Yes, But What is the Goal?
  • The goal of the Human Genome Project was
    clear-cut. The goal of structural genomics is
    not so clear-cut. Phase I goals:
  • Provision of enough structural templates to
    facilitate homology modeling of most proteins
  • Structures of all proteins in a complete proteome
  • Structural elucidation of a complete biological
    pathway
  • Structural elucidation of a complete disease

5
Example Goals (PSI Phase I)
The hyperthermophilic bacterium Thermotoga
maritima has been the target of choice for
pipeline development and genome-wide fold
coverage.
207
The SGPP consortium will determine and analyze
the three-dimensional structures of a large
number of proteins from major global pathogenic
protozoa, Leishmania major, Trypanosoma brucei,
Trypanosoma cruzi and Plasmodium falciparum.
35
Structural Genomics of Pathogenic Protozoa
It is aimed at determining structures of proteins
and protein complexes directly relevant to human
health and diseases.
79
6
Ah Yes, But What is the Goal? Phase II
7
Growth in the Number of Folds per Year According to SCOP
[Figure: new folds and total folds per year. Source: http://www.pdb.org/pdb/statistics/contentGrowthChart.do?content=fold-scop, Nov. 2008]
8
The Process - X-ray Crystallography
Basic Steps
  • Target Selection
  • Crystallomics: isolation, expression,
    purification, crystallization
  • Data Collection
  • Structure Solution
  • Structure Refinement
  • Functional Annotation
  • Publish
9
What Has The Process Achieved Thus Far?
10
Much of the Data Discussed Will Come from
http://kb.psi-structuralgenomics.org/
Nucleic Acids Research 2006 34 D302-5
11
Current Status of All Centers (2006 / 2008)
  • Targets: 90421 / 200291
  • 56626 / 133958
  • n/a / 89229
  • Structures: 2479 / 6020 (7.5% / 11.1% of the PDB)
Chen et al. 2004 Bioinformatics 20(16) 2860-2
http://targetdb.rcsb.org Oct 20, 2005
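As a rough, back-of-the-envelope illustration, the end-to-end success rate of the pipeline can be computed from the 2006/2008 figures above. This sketch assumes the first row counts registered targets and the last row counts released structures; the function name `success_rate` is hypothetical.

```python
# Figures from the "Current Status of All Centers" slide (2006 / 2008).
targets = {"2006": 90421, "2008": 200291}
structures = {"2006": 2479, "2008": 6020}


def success_rate(n_structures, n_targets):
    """Fraction of registered targets that reached a released structure."""
    return n_structures / n_targets


for year in ("2006", "2008"):
    rate = success_rate(structures[year], targets[year])
    print(f"{year}: {rate:.1%} of targets yielded a structure")
# 2006: 2.7% of targets yielded a structure
# 2008: 3.0% of targets yielded a structure
```

Because both counts are cumulative snapshots, recently registered targets have had little time to progress, so a ratio like this likely understates the eventual success rate.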
12
Total Structures Released per Year
  • 2006: 586
  • 2007: 792
  • 2008: 3483
Chen et al. 2004 Bioinformatics 20(16) 2860-2
http://targetdb.rcsb.org Oct 20, 2005
13
PepcDB http://pepcdb.pdb.org/
Captures the protocols associated with each
experiment
16
What Has The Process Achieved Thus Far?
  • While structural genomics accounts for only 7.5%
    of the current PDB (a 30-year history), it now
    contributes 11% of all structures added in a
    given year
  • Higher throughput is being achieved, and
    traditional laboratories benefit too
  • Useful data are being collected more
    systematically, but the situation could still be
    improved

17
Todd, Marsden, Thornton and Orengo 2005 JMB
348(5) 1235-60 provide the following data, albeit
based on only 316 non-redundant structures
  • Quality and size of structures are comparable
  • 29% of domains revealed an evolutionary
    relationship not apparent from sequence
  • 19% and 11% contributed new superfamilies and
    folds, respectively
  • 9287 reliable homology models built across 206
    completely sequenced genomes

18
What Should be the Target Selection Strategy
Going Forward?
19
One Approach: Pfam5000 (Chandonia and Brenner 2005
Proteins 58(1) 166-179)
  • Would provide fold assignment for 68% of
    prokaryotic proteins and 61% of eukaryotic proteins
  • This is significantly greater than would be
    achieved by completing a single genome
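A minimal sketch of the idea behind a Pfam5000-style strategy, as we understand it: rank families by size, take the largest k, and report the fraction of proteins they cover. It assumes each protein belongs to at most one family and uses hypothetical toy data; the published analysis works on real Pfam and genome data and is more involved.

```python
from typing import Dict, List, Tuple


def largest_families_coverage(
    family_sizes: Dict[str, int], total_proteins: int, k: int
) -> Tuple[List[str], float]:
    """Pick the k largest families and return them with the fraction of proteins covered.

    Assumes families do not overlap, so member counts can simply be summed.
    """
    ranked = sorted(family_sizes, key=lambda f: family_sizes[f], reverse=True)
    chosen = ranked[:k]
    covered = sum(family_sizes[f] for f in chosen)
    return chosen, covered / total_proteins


# Hypothetical toy data: family name -> number of member proteins.
toy = {"PF_A": 500, "PF_B": 300, "PF_C": 120, "PF_D": 50, "PF_E": 30}
chosen, frac = largest_families_coverage(toy, total_proteins=1200, k=2)
print(chosen, f"{frac:.1%}")  # ['PF_A', 'PF_B'] 66.7%
```

The greedy "largest first" ordering is what makes family-based selection cover more proteins than finishing one genome: each solved structure is chosen to represent as many sequences as possible.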

20
Our Approach is to Consider Coverage Relative to
the Human Genome
  • What protein structures would tell us most about
    the human condition if determined?

21
Basic Logic of Our Approach to Target Selection
  • Given the functions of proteins currently in the
    PDB
  • And what we can ascertain about the function of
    structural genomics targets
  • And what we know about the functional coverage of
    the human genome
  • What structures should be determined to increase
    our coverage of functional space
  • Which of those structures are most tractable?
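The logic above can be expressed as set operations over functional categories. A minimal sketch, assuming each source (genome, PDB, targets) has already been reduced to a set of category labels and that a hypothetical per-category tractability score exists; all names and values are illustrative.

```python
# Hypothetical functional categories (e.g. GO terms) per data source.
human_genome = {"GO:a", "GO:b", "GO:c", "GO:d", "GO:e"}
pdb = {"GO:a", "GO:b"}
sg_targets = {"GO:b", "GO:c"}
# Hypothetical tractability scores in [0, 1]; higher means easier to solve.
tractability = {"GO:c": 0.4, "GO:d": 0.9, "GO:e": 0.6}


def coverage(covered, universe):
    """Fraction of the genome's categories represented in `covered`."""
    return len(covered & universe) / len(universe)


def next_targets(universe, solved, targeted, score):
    """Categories with neither a structure nor a target, most tractable first."""
    gaps = universe - solved - targeted
    return sorted(gaps, key=lambda c: score.get(c, 0.0), reverse=True)


print(coverage(pdb, human_genome))               # 0.4
print(coverage(pdb | sg_targets, human_genome))  # 0.6
print(next_targets(human_genome, pdb, sg_targets, tractability))  # ['GO:d', 'GO:e']
```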

22
Coverage of the Human Genome By Structure
[Diagram: PDB, structural genomics targets, GO, Ensembl human genome annotation, Superfamily and EC combined]
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
23
Drill Down to the Appropriate Level
  • Define the level of redundancy
  • Coverage by domain(s) or whole structure
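To illustrate drilling down and controlling redundancy, here is a sketch using EC numbers: annotations are truncated to a chosen level of the EC hierarchy, identical entries are collapsed, and coverage is the fraction of the genome's classes at that level with at least one PDB entry. The annotations are hypothetical, and collapsing identical EC numbers is only a crude stand-in for a real redundancy cutoff.

```python
from collections import Counter


def ec_at_level(ec, level):
    """Truncate an EC number such as '2.4.1.17' to its first `level` fields."""
    return ".".join(ec.split(".")[:level])


def counts_at_level(ec_numbers, level):
    """Count distinct EC numbers per class at `level` (duplicates collapsed first)."""
    return Counter(ec_at_level(ec, level) for ec in set(ec_numbers))


def level_coverage(pdb_ecs, genome_ecs, level):
    """Fraction of the genome's EC classes at `level` with at least one PDB entry."""
    genome_classes = set(counts_at_level(genome_ecs, level))
    pdb_classes = set(counts_at_level(pdb_ecs, level))
    return len(genome_classes & pdb_classes) / len(genome_classes)


# Hypothetical annotations, for illustration only.
pdb_ecs = ["3.4.21.4", "3.4.21.4", "2.5.1.18", "2.4.1.17"]
genome_ecs = ["3.4.21.4", "2.4.1.17", "2.4.2.30", "2.5.1.18"]
for level in (1, 2, 3, 4):
    print(level, level_coverage(pdb_ecs, genome_ecs, level))
# 1 1.0
# 2 1.0
# 3 0.75
# 4 0.75
```

Coverage that looks complete at the top level can fall as one drills down, which is why the level of the hierarchy has to be fixed before comparing sources.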
24
PDB vs Human Genome: Top-Level EC Shows an Even
Distribution
[Figure: top-level EC classes in the PDB vs the Ensembl human genome annotation; 607/1141 structures; 9698 sequences]
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
25
PDB vs Human Genome: EC Transferases Begin to
Illustrate the Bias in the PDB
[Figure: EC sub-classes in the PDB vs the Ensembl human genome annotation]
  • EC 2.5 (transferring alkyl or aryl groups):
    over-represented in the PDB
  • EC 2.4 (glycosyltransferases): under-represented
    in the PDB
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
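One simple way to quantify "over" or "under" representation is to divide a category's share of the PDB by its share of the genome. This is a sketch under that assumption, not necessarily the measure used by Xie and Bourne; the counts below are hypothetical and not taken from the paper.

```python
def representation_ratio(pdb_counts, genome_counts):
    """Share of each category in the PDB divided by its share in the genome.

    A ratio above 1 suggests over-representation in the PDB, below 1
    under-representation. Categories with a zero genome count are skipped.
    """
    pdb_total = sum(pdb_counts.values())
    genome_total = sum(genome_counts.values())
    ratios = {}
    for category, n_genome in genome_counts.items():
        if n_genome == 0:
            continue
        n_pdb = pdb_counts.get(category, 0)
        # Cross-multiply so only one division is performed.
        ratios[category] = (n_pdb * genome_total) / (n_genome * pdb_total)
    return ratios


# Hypothetical counts within EC class 2.
pdb_counts = {"2.4": 10, "2.5": 30, "other": 60}
genome_counts = {"2.4": 40, "2.5": 10, "other": 50}
print(representation_ratio(pdb_counts, genome_counts))
# {'2.4': 0.25, '2.5': 3.0, 'other': 1.2}
```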
26
Functional Coverage (GO Molecular Function) of
the Human Genome By Structure, Targets and Models
[Figure legend: SG targets, human genome, PDB, homology models]
  • As expected, there are few structures of unknown
    function in the PDB at this stage, but a large
    number of targets of unknown function
  • Enzyme regulators are over-represented in the
    PDB, e.g. GTPase, kinase and caspase regulators

Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
27
Target Selection Relative to Disease
[Diagram: PDB, structural genomics targets, OMIM, Swiss-Prot, Superfamily and Ensembl human genome annotation combined]
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
28
Human Disease Coverage
[Figure legend: SG targets, human genome, PDB, homology models]
  • The PDB covers 69% of OMIM disease categories
  • Diseases of the CNS are over-represented among
    targets
  • Diseases of the ear, nose and throat are
    under-represented in the PDB but covered by
    targets and models
  • Cancers: fewer targets at the top level, but
    among structures, female-related cancers are
    over-represented and male-related cancers
    under-represented

29
Structural Coverage of the Human Genome
  • Single domains cover 37% of the functional
    classes identified in the genome; whole
    structures cover 25%
  • With homology models, 37% rises to 56% and
    25% to 31%
  • If all current structural genomics targets were
    solved (3x the current PDB), 37% would rise to
    69% and 25% to 44%

Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
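The figures above read as coverage recomputed as each source of structural evidence is added. A minimal sketch, assuming the layers are cumulative (structures, then models, then solved targets) and using ten hypothetical functional classes; the numbers are illustrative only.

```python
def cumulative_coverage(universe, layers):
    """Coverage of `universe` after each evidence layer is added, in order."""
    covered = set()
    results = []
    for name, categories in layers:
        covered |= categories & universe  # ignore categories outside the genome
        results.append((name, len(covered) / len(universe)))
    return results


functional_classes = set(range(10))  # hypothetical: ten functional classes
layers = [
    ("structures", {0, 1, 2}),
    ("+ homology models", {2, 3, 4}),
    ("+ all SG targets solved", {5, 6, 11}),
]
for name, fraction in cumulative_coverage(functional_classes, layers):
    print(f"{name}: {fraction:.0%}")
# structures: 30%
# + homology models: 50%
# + all SG targets solved: 70%
```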
30
Other Points to Note
  • Coverage by homology models is not even: more
    divergent families are less well represented
  • Transporters and receptors (non-membrane
    regions) are the most pressing
  • It is possible to create a most wanted list of
    structures

Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
31
The Most Wanted List
  • So Far We Have Considered the Functional Coverage
    of Structures, Models and Targets Relative to the
    Human Genome (Based on the Current Level of
    Functional Annotation)
  • What if we turn that around and, rather than
    asking what we know, ask what we do not know?

Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
32
Bottom Line
  • There are approximately 1800 domains which have
    been functionally recognized in the human genome
    for which no structure exists (hence no homology
    models) and for which no target exists

Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org
33
How Do We Get To This List?
  • Start with functional categories without
    structures
  • Select those without Superfamily assignments,
    i.e., those that can't be modeled
  • Prefer those with a disease association
  • Remove those that appear less tractable based on
    predicted transmembrane segments, coiled coils
    and low-complexity regions
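The steps above amount to a filter followed by a ranking. A minimal sketch, assuming each candidate domain carries boolean annotations (all field names are hypothetical stand-ins for the real structure, Superfamily, OMIM and sequence-feature predictions), with "prefer disease association" modeled as a sort key rather than a hard filter:

```python
from dataclasses import dataclass


@dataclass
class Domain:
    # All fields are hypothetical annotations for illustration.
    name: str
    has_structure: bool
    has_superfamily: bool  # a Superfamily assignment implies it can be modeled
    disease_associated: bool
    transmembrane: bool
    coiled_coil: bool
    low_complexity: bool


def most_wanted(domains):
    """Keep unsolved, unmodelable, tractable domains; disease-associated ones first."""
    picked = [
        d for d in domains
        if not d.has_structure
        and not d.has_superfamily
        and not (d.transmembrane or d.coiled_coil or d.low_complexity)
    ]
    # sorted() is stable; False sorts before True, so disease-associated come first.
    return sorted(picked, key=lambda d: not d.disease_associated)


domains = [
    Domain("dom1", False, False, False, False, False, False),
    Domain("dom2", False, False, True, False, False, False),
    Domain("dom3", True, False, True, False, False, False),   # already solved
    Domain("dom4", False, True, True, False, False, False),   # can be modeled
    Domain("dom5", False, False, True, True, False, False),   # transmembrane
]
print([d.name for d in most_wanted(domains)])  # ['dom2', 'dom1']
```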

34
Examples from the Most Wanted List
  • The most understudied structures are various
    kinds of receptors and transporters
  • For catalytic activity, the largest
    under-representation is in protein synthesis and
    gene regulation
  • Congenital adrenal hyperplasia appears to have
    tractable domains without structural
    representation

35
Unsolved Problems
36
Some Problems, with Estimates of What Has Been
Achieved (2006/2008)
  • Basic knowledge of macromolecular structure (40%;
    missing a temporal view, alternative views such
    as a ligand view, and rules for molecular
    recognition)
  • Integrated view of structure as part of a
    biological continuum of data and associated
    knowledge (20%/30%)
  • Structure representation, comparison and
    classification (60%/80%)
  • Structure prediction from sequence (30%/27%)

37
Some Problems, with Estimates of What Has Been
Achieved: The Challenges (2006/2008)
  • Inferring function from structure (20%/30%)
  • Inferring protein interactions (20%/21%)
  • Macromolecular assemblies (30%/40%)
  • Docking (20%/25%)
  • Rational drug discovery (5%/6%)
  • Structural evolution (1%/5%)