Title: Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction
1 Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction
- Lecture 13: Protein Function
2 Sequence-Structure-Function
Sequence → Structure → Function
- Sequence → structure: ab initio prediction and folding (impossible but for the smallest structures); threading
- Structure → function: function prediction from structure (very difficult)
- Sequence → function: homology searching (BLAST)
3 Metabolomics, fluxomics
4 Systems Biology
- Systems biology is the study of the interactions between the components of a biological system, and how these interactions give rise to the function and behaviour of that system (for example, the enzymes and metabolites in a metabolic pathway).
- The aim is to understand the system quantitatively and to be able to predict how the system behaves over time.
- The interactions are nonlinear.
- The interactions give rise to emergent properties, i.e. properties that cannot be explained by the individual components of the system alone.
- Biological processes span many time-scales, many compartments and many interconnected network levels (e.g. regulation, signalling, expression, ...).
5 Systems Biology
- Understanding is often achieved through modeling and simulation of the system's components and interactions.
- Often, the "four Ms" cycle is adopted:
  - Measuring
  - Mining
  - Modeling
  - Manipulating
6 The silicon cell (some people think "silly-con cell")
8 A system response
Apoptosis (programmed cell death) vs. necrosis (accidental cell death)
9 Comparative metabolomics
We need to be able to do automatic pathway comparison (pathway alignment). This pathway diagram shows a comparison of pathways in (left) Homo sapiens (human) and (right) Saccharomyces cerevisiae (baker's yeast). Changes have occurred both in the controlling enzymes (square boxes in red) and in the pathway itself (yeast has one altered (overtaking) path in the graph).
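Real pathway alignment is more involved than a set comparison, but a minimal sketch helps fix the idea. Assuming each pathway is reduced to a set of directed reaction edges (substrate, product), the Python below reports shared and species-specific edges plus a Jaccard similarity. The function name `compare_pathways` and the toy metabolites A-D are hypothetical; a practical aligner would also score enzyme similarity (e.g. by EC number) and allow gaps or alternative routes.

```python
# Minimal sketch: compare two pathways reduced to sets of directed
# (substrate, product) edges. Not a full pathway-alignment algorithm.
Edge = tuple[str, str]


def compare_pathways(a: set[Edge], b: set[Edge]) -> dict[str, object]:
    """Report shared and pathway-specific edges and their Jaccard index."""
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # 1.0 = identical edge sets, 0.0 = no edge in common
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical toy pathways; A-D are placeholder metabolites.
human = {("A", "B"), ("B", "C"), ("C", "D")}
yeast = {("A", "B"), ("B", "D")}  # altered route: B is converted directly to D
result = compare_pathways(human, yeast)
print(result["only_a"])   # [('B', 'C'), ('C', 'D')]
print(result["only_b"])   # [('B', 'D')]
print(result["jaccard"])  # 0.25
```

A Jaccard index of 0.25 here reflects that only one of the four distinct edges is shared.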
10 The citric-acid cycle
http://en.wikipedia.org/wiki/Krebs_cycle
11 The citric-acid cycle
Fig. 1. (a) A graphical representation of the reactions of the citric-acid cycle (CAC), including the connections with pyruvate and phosphoenolpyruvate, and the glyoxylate shunt. When there are two enzymes that are not homologous to each other but that catalyse the same reaction (non-homologous gene displacement), one is marked with a solid line and the other with a dashed line. The oxidative direction is clockwise. The enzymes with their EC numbers are as follows: 1, citrate synthase (4.1.3.7); 2, aconitase (4.2.1.3); 3, isocitrate dehydrogenase (1.1.1.42); 4, 2-ketoglutarate dehydrogenase (solid line; 1.2.4.2 and 2.3.1.61) and 2-ketoglutarate ferredoxin oxidoreductase (dashed line; 1.2.7.3); 5, succinyl-CoA synthetase (solid line; 6.2.1.5) or succinyl-CoA:acetoacetate-CoA transferase (dashed line; 2.8.3.5); 6, succinate dehydrogenase or fumarate reductase (1.3.99.1); 7, fumarase (4.2.1.2) class I (dashed line) and class II (solid line); 8, bacterial-type malate dehydrogenase (solid line) or archaeal-type malate dehydrogenase (dashed line) (1.1.1.37); 9, isocitrate lyase (4.1.3.1); 10, malate synthase (4.1.3.2); 11, phosphoenolpyruvate carboxykinase (4.1.1.49) or phosphoenolpyruvate carboxylase (4.1.1.32); 12, malic enzyme (1.1.1.40 or 1.1.1.38); 13, pyruvate carboxylase or oxaloacetate decarboxylase (6.4.1.1); 14, pyruvate dehydrogenase (solid line; 1.2.4.1 and 2.3.1.12) and pyruvate ferredoxin oxidoreductase (dashed line; 1.2.7.1).
M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol. 7, 281-29 (1999).
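Panel (b) asks, per species, which CAC steps are present. A minimal sketch of that bookkeeping, assuming a genome has already been annotated with EC numbers: each step lists its alternatives from Fig. 1(a) (non-homologous gene displacement), and an alternative counts only if all of its EC numbers are present. `CAC_STEPS`, `missing_steps` and the example genome are hypothetical, only steps 1-8 are encoded, and EC numbers alone cannot tell homologous forms apart (e.g. class I vs. class II fumarase), so this is not the homology-based method of Huynen et al.

```python
# Sketch: which CAC steps (1-8 of Fig. 1a) lack a complete set of enzymes?
# Each step lists alternatives; an alternative is a set of EC numbers that
# must ALL be annotated in the genome (e.g. the two-part dehydrogenase).
CAC_STEPS: dict[str, list[frozenset[str]]] = {
    "1 citrate synthase": [frozenset({"4.1.3.7"})],
    "2 aconitase": [frozenset({"4.2.1.3"})],
    "3 isocitrate dehydrogenase": [frozenset({"1.1.1.42"})],
    "4 2-ketoglutarate oxidation": [
        frozenset({"1.2.4.2", "2.3.1.61"}),  # dehydrogenase (solid line)
        frozenset({"1.2.7.3"}),              # ferredoxin oxidoreductase (dashed line)
    ],
    "5 succinyl-CoA conversion": [frozenset({"6.2.1.5"}), frozenset({"2.8.3.5"})],
    "6 succinate dehydrogenase / fumarate reductase": [frozenset({"1.3.99.1"})],
    "7 fumarase": [frozenset({"4.2.1.2"})],
    "8 malate dehydrogenase": [frozenset({"1.1.1.37"})],
}


def missing_steps(genome_ecs: set[str]) -> list[str]:
    """Return the CAC steps for which no alternative is fully present."""
    return [
        step
        for step, alternatives in CAC_STEPS.items()
        if not any(alt <= genome_ecs for alt in alternatives)
    ]


# Hypothetical genome annotation: only one part (2.3.1.61) of the
# 2-ketoglutarate dehydrogenase is present, so step 4 is reported missing.
genome = {"4.1.3.7", "4.2.1.3", "1.1.1.42", "2.3.1.61",
          "2.8.3.5", "1.3.99.1", "4.2.1.2", "1.1.1.37"}
print(missing_steps(genome))  # ['4 2-ketoglutarate oxidation']
```

Step 5 is counted as present because the dashed-line alternative (2.8.3.5) is annotated, which is exactly the non-homologous displacement case the figure highlights.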
12 The citric-acid cycle
(b) Individual species might not have a complete CAC. This diagram shows the genes for the CAC for each unicellular species for which a genome sequence has been published, together with the phylogeny of the species. The distance-based phylogeny was constructed using the fraction of genes shared between genomes as a similarity criterion (ref. 29). The major kingdoms of life are indicated in red (Archaea), blue (Bacteria) and yellow (Eukarya). Question marks represent reactions for which there is biochemical evidence in the species itself or in a related species but for which no genes could be found. Genes that lie in a single operon are shown in the same color. Genes were assumed to be located in a single operon when they were transcribed in the same direction and the stretches of non-coding DNA separating them were less than 50 nucleotides in length.
M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol. 7, 281-29 (1999).
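The operon criterion in the caption is simple enough to state as code. This sketch assumes genes are given with 1-based inclusive coordinates and a strand; neighbouring genes are merged when they share a strand and fewer than 50 non-coding nucleotides separate them (overlapping genes give a negative gap and are merged too). The `Gene` class, `predict_operons` and the coordinates in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def predict_operons(genes: list[Gene], max_gap: int = 50) -> list[list[str]]:
    """Merge neighbouring genes on the same strand that are separated by
    fewer than `max_gap` non-coding nucleotides (criterion from the caption)."""
    operons: list[list[str]] = []
    prev = None
    for gene in sorted(genes, key=lambda g: g.start):
        same_operon = (
            prev is not None
            and gene.strand == prev.strand
            and gene.start - prev.end - 1 < max_gap  # intergenic gap length
        )
        if same_operon:
            operons[-1].append(gene.name)
        else:
            operons.append([gene.name])
        prev = gene
    return operons


# Hypothetical genes and coordinates.
genes = [
    Gene("g1", 100, 500, "+"),
    Gene("g2", 520, 900, "+"),    # gap 19 nt -> same operon as g1
    Gene("g3", 930, 2700, "+"),   # gap 29 nt -> same operon
    Gene("g4", 3000, 3500, "+"),  # gap 299 nt -> new operon
    Gene("g5", 3520, 4000, "-"),  # opposite strand -> new operon
]
print(predict_operons(genes))  # [['g1', 'g2', 'g3'], ['g4'], ['g5']]
```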
13 Experimental
- Structural genomics
- Functional genomics
- Protein-protein interaction
- Metabolic pathways
- Expression data
14 Communicability: Functional Genomics
- Interpretation of genome-scale gene expression data
Figure (workflow): DNA-chip data → external program → cluster of coregulated genes (gene 1, gene 2, ..., gene n) → PFMP query → pathways affected (pathway 1, pathway 2, ...)
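The slides do not say how the PFMP query maps a gene cluster to affected pathways. One common way to do this step, swapped in here as an assumption, is an overrepresentation test: the hypergeometric probability of seeing at least the observed overlap between the cluster and a pathway by chance. The function names, the threshold `alpha` and the gene/pathway identifiers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p(n_genome: int, n_pathway: int, n_cluster: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap): the chance that a
    random set of n_cluster genes shares at least n_overlap genes with a pathway
    of n_pathway genes, out of n_genome genes in total."""
    upper = min(n_pathway, n_cluster)
    hits = sum(
        comb(n_pathway, k) * comb(n_genome - n_pathway, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / comb(n_genome, n_cluster)


def pathways_affected(cluster: set[str], pathways: dict[str, set[str]],
                      n_genome: int, alpha: float = 0.01) -> list[tuple[str, int, float]]:
    """Return (pathway, overlap, p-value) for pathways overrepresented in the
    cluster, most significant first. No multiple-testing correction."""
    results = []
    for name, members in pathways.items():
        overlap = len(cluster & members)
        if overlap == 0:
            continue
        p = enrichment_p(n_genome, len(members), len(cluster), overlap)
        if p < alpha:
            results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical cluster and pathway memberships.
cluster = {"g1", "g2", "g3", "g7"}
pathways = {"pathway 1": {"g1", "g2", "g3", "g4"}, "pathway 2": {"g7", "g8", "g9"}}
for name, overlap, p in pathways_affected(cluster, pathways, n_genome=6000):
    print(f"{name}: {overlap} genes, p = {p:.2e}")
```

Python integers are arbitrary precision, so `comb` stays exact even for genome-sized inputs; only the final division produces a float.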
15 Communicability: Functional Genomics
- Interpretation of genome-scale gene expression data
Figure (workflow): DNA-chip data → external programs → cluster of coregulated genes (gene 1, gene 2, ..., gene n) → pattern discovery (putative regulatory sites in gene 1, gene 2, ...) → similarities with known regulatory sites (site 1: factor 1; site 2: factor 2; ...) → PFMP query
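Pattern discovery can be approximated, very naively, by counting short words (k-mers) in the upstream regions of the coregulated genes and comparing their frequency with a background set. The sketch below assumes the upstream sequences have already been extracted and ranks k-mers by a pseudocount-smoothed frequency ratio. It is not any particular motif-finding program: it ignores the reverse strand, degenerate motifs and statistical significance. Function names and sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count every overlapping k-mer (one strand only) in the sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def overrepresented_kmers(cluster_seqs: list[str], background_seqs: list[str],
                          k: int = 6, top: int = 5, pseudocount: float = 1.0) -> list[str]:
    """Rank k-mers seen in the cluster by (cluster frequency / background frequency),
    with a pseudocount so k-mers absent from the background do not divide by zero."""
    fg = kmer_counts(cluster_seqs, k)
    bg = kmer_counts(background_seqs, k)
    fg_total = sum(fg.values()) or 1
    bg_total = sum(bg.values()) or 1

    def ratio(kmer: str) -> float:
        return ((fg[kmer] + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)

    return sorted(fg, key=ratio, reverse=True)[:top]


# Hypothetical upstream regions of coregulated genes and background regions.
cluster_upstream = ["GATAGATA", "TTGATACC"]
background_upstream = ["ACGTACGTAC", "TTTTCCCC"]
print(overrepresented_kmers(cluster_upstream, background_upstream, k=4, top=1))  # ['GATA']
```

The candidates returned here would then be compared with known regulatory sites, the next step in the workflow above.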
16 Other Issues
- Partial information (indirect interactions) and subsequent filling of the missing steps
- Negative results (elements that have been shown not to interact, enzymes missing in an organism)
- Putative interactions resulting from computational analyses
17 Protein function categories
- Catalysis (enzymes)
- Binding and transport (active/passive)
- Protein-DNA/RNA binding (e.g. histones, transcription factors)
- Protein-protein interactions (e.g. antibody-lysozyme), determined experimentally by yeast two-hybrid (Y2H) or bacterial two-hybrid (B2H) screening
- Protein-fatty acid binding (e.g. apolipoproteins)
- Protein-small molecule binding (drug interaction, structure decoding)
- Structural component (e.g. crystallin)
- Regulation
- Signalling
- Transcription regulation
- Immune system
- Motor proteins (actin/myosin)
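18 Catalytic properties of enzymes
Michaelis-Menten equation:
$$V = \frac{V_{max}[S]}{K_m + [S]}$$
Reaction scheme: $E + S \rightleftharpoons ES \xrightarrow{k_{cat}} E + P$
- E: enzyme
- S: substrate
- ES: enzyme-substrate complex (a bound intermediate, not the transition state itself)
- P: product
- $K_m$: Michaelis constant
- $k_{cat}$: catalytic rate constant (turnover number)
- $k_{cat}/K_m$: specificity constant (useful for comparison)
Figure: reaction velocity V (moles/s) plotted against substrate concentration [S]; V approaches $V_{max}$ asymptotically and equals $V_{max}/2$ at $[S] = K_m$.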
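The Michaelis-Menten relation can be checked numerically. A minimal sketch in arbitrary but consistent units; the helper names are hypothetical, and the relation $V_{max} = k_{cat}[E]_T$ (with $[E]_T$ the total enzyme concentration) is the standard definition of the turnover number rather than something stated on the slide.

```python
def mm_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten initial velocity: V = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def vmax_from_kcat(kcat: float, enzyme_total: float) -> float:
    """Vmax = kcat * [E]_T (turnover number times total enzyme concentration)."""
    return kcat * enzyme_total


def specificity_constant(kcat: float, km: float) -> float:
    """kcat / Km, used to compare how efficiently enzymes use a substrate."""
    return kcat / km


# Illustrative values in arbitrary but consistent units.
vmax, km = 10.0, 2.0
print(mm_rate(km, vmax, km))             # 5.0, i.e. Vmax/2 at [S] = Km
print(mm_rate(18.0, vmax, km))           # 9.0, approaching Vmax at high [S]
print(specificity_constant(100.0, 0.5))  # 200.0
```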
19 Protein interaction domains
http://pawsonlab.mshri.on.ca/html/domains.html
20 Energy difference upon binding
- Examples of protein interactions (and their functional importance) include:
  - Protein-protein (pathway analysis)
  - Protein-small molecule (drug interaction, structure decoding)
  - Protein-peptide, protein-DNA/RNA (function analysis)
- The change in Gibbs free energy of the protein-ligand binding interaction can be monitored and expressed as:
  $$\Delta G = \Delta H - T\Delta S$$
  (H: enthalpy, S: entropy, T: temperature)
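A worked example of $\Delta G = \Delta H - T\Delta S$, assuming kJ/mol for $\Delta H$, kJ/(mol·K) for $\Delta S$ and kelvin for T. The numbers are illustrative, not measured values: the first case is enthalpy-driven binding opposed by entropy, the second entropy-driven binding; in both, a negative $\Delta G$ means binding is favourable. The function name is hypothetical.

```python
def gibbs_free_energy(delta_h: float, delta_s: float, temperature: float) -> float:
    """Delta G = Delta H - T * Delta S.
    delta_h in kJ/mol, delta_s in kJ/(mol*K), temperature in K; returns kJ/mol."""
    return delta_h - temperature * delta_s


T = 298.15  # K (25 degrees C)
# Illustrative values only, not measurements.
enthalpy_driven = gibbs_free_energy(-50.0, -0.10, T)  # about -20.2 kJ/mol
entropy_driven = gibbs_free_energy(10.0, 0.10, T)     # about -19.8 kJ/mol
for label, dg in [("enthalpy-driven", enthalpy_driven), ("entropy-driven", entropy_driven)]:
    print(f"{label}: dG = {dg:.1f} kJ/mol, favourable = {dg < 0}")
```

Two interactions with nearly the same $\Delta G$ can therefore have very different enthalpic and entropic contributions, which is why both terms are worth monitoring.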
21 Protein function
- Many proteins combine functions
- Some immunoglobulin structures are thought to have more than 100 different functions (and active/binding sites)
- Alternative splicing can generate (partially) alternative structures
22 Protein function: interaction
- Active site / binding cleft
- Shape complementarity
23 Protein function evolution
Chymotrypsin
24 How to infer function
- Experiment
- Deduction from sequence
  - Multiple sequence alignment: conservation patterns
  - Homology searching
- Deduction from structure
  - Threading
  - Structure-structure comparison
  - Homology modelling
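One common way to quantify the conservation patterns of a multiple sequence alignment is the Shannon entropy of each column: 0 bits for a fully conserved column, higher values for variable columns. A minimal sketch, assuming the alignment is a list of equal-length strings with '-' for gaps (gaps are simply ignored here); the toy alignment and function names are hypothetical, and real conservation scores often also weight sequences and account for amino-acid similarity.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum((count / n) * log2(n / count) for count in Counter(residues).values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy per column; low values suggest conserved (possibly functional) positions."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical toy alignment of three sequences.
msa = ["HDSG", "HDAG", "HEVG"]
print([round(h, 2) for h in conservation_profile(msa)])  # [0.0, 0.92, 1.58, 0.0]
```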
25 Cholesterol Biosynthesis
- Cholesterol biosynthesis occurs primarily in eukaryotic cells. Cholesterol is necessary for membrane synthesis and is a precursor for steroid hormone production as well as for vitamin D. While the pathway had previously been assumed to be localized in the cytosol and ER, more recent evidence suggests that many of the enzymes in the pathway exist largely, if not exclusively, in the peroxisome (the enzymes listed in blue in the pathway to the left are thought to be at least partly peroxisomal). Patients with peroxisome biogenesis disorders (PBDs) have a variable deficiency in cholesterol biosynthesis.
26 Cholesterol Biosynthesis from acetyl-CoA to mevalonate
Mevalonate plays a role in epithelial cancers: it can inhibit EGFR.
27 Epidermal Growth Factor as a Clinical Target in Cancer
- A malignant tumour is the product of uncontrolled cell proliferation. Cell growth is controlled by a delicate balance between growth-promoting and growth-inhibiting factors. In normal tissue, the production and activity of these factors result in differentiated cells growing in a controlled and regulated manner that maintains the normal integrity and functioning of the organ. The malignant cell has evaded this control: the natural balance is disturbed (via a variety of mechanisms) and unregulated, aberrant cell growth occurs. A key driver for growth is epidermal growth factor (EGF), and the receptor for EGF (EGFR) has been implicated in the development and progression of a number of human solid tumours, including those of the lung, breast, prostate, colon, ovary, and head and neck.
28 Energy housekeeping
- Adenosine diphosphate (ADP) ⇌ adenosine triphosphate (ATP)
29 Chemical Reaction
30 Enzymatic Catalysis
31 Gene Expression
32 Inhibition
33 Metabolic Pathway: Proline Biosynthesis
34 Transcriptional Regulation
35 Methionine Biosynthesis in E. coli
36 Shortcut Representation
37 High-level Interaction
38 Levels of Resolution
39 Cholesterol Biosynthesis
40 SREBP Pathway
41 Signal Transduction
Important signalling pathways: e.g. the MAP kinase (MAPK) signalling pathway or the TGF-β pathway
42 Transport
43 Phosphate Utilization in Yeast
44 Multiple Levels of Regulation
- Gene expression
- Protein activity
- Protein intracellular location
- Protein degradation
- Substrate transport
45 Graphical Representation: Gene Expression
46 Experimental Data: Gene Expression
47 Experimental Data: Transcriptional Regulation
48 Experimental Data: Transcriptional Regulation
49 Transcriptional Regulation: Integrated View
50 Pathways and Pathway Diagrams
- Pathways
  - Set of nodes (entities) and edges (associations)
- Pathway Diagrams
  - XY coordinates
  - Node splitting allowed
  - Multiple views of the same pathway
  - Different abstraction levels
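The distinction between a pathway (a graph) and a pathway diagram (one drawing of it) can be sketched as two small data structures. In this sketch, which is an assumption rather than any particular tool's data model, a diagram places glyphs at XY coordinates, and several glyphs may point to the same pathway node (node splitting, e.g. drawing a currency metabolite such as ATP more than once). Multiple views are simply several `PathwayDiagram` objects over one `Pathway`; abstraction levels are not modelled. All class, method and glyph names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """A pathway as a graph: entities (nodes) and associations (edges)."""
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_edge(self, source: str, target: str) -> None:
        self.nodes.update((source, target))
        self.edges.add((source, target))


@dataclass
class PathwayDiagram:
    """One view of a pathway: glyphs placed at XY coordinates.
    Several glyphs may refer to the same node (node splitting)."""
    pathway: Pathway
    glyphs: dict[str, tuple[str, float, float]] = field(default_factory=dict)

    def place(self, glyph_id: str, node: str, x: float, y: float) -> None:
        if node not in self.pathway.nodes:
            raise KeyError(f"{node!r} is not a node of the pathway")
        self.glyphs[glyph_id] = (node, x, y)

    def copies_of(self, node: str) -> list[str]:
        """Glyph ids that draw the given pathway node."""
        return [gid for gid, (n, _, _) in self.glyphs.items() if n == node]


# Hypothetical example: ATP is drawn twice to avoid long crossing edges.
p = Pathway()
p.add_edge("glucose", "glucose-6-phosphate")
p.add_edge("ATP", "ADP")
view = PathwayDiagram(p)
view.place("atp_left", "ATP", 0.0, 0.0)
view.place("atp_right", "ATP", 120.0, 40.0)
print(view.copies_of("ATP"))  # ['atp_left', 'atp_right']
```

Keeping coordinates out of `Pathway` means the underlying graph stays unchanged however many diagrams are drawn from it.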
51 Metabolic networks: Glycolysis and Gluconeogenesis
KEGG database (Japan)
52 Gene Ontology (GO)
- Not a genome sequence database
- Developing three structured, controlled vocabularies (ontologies) to describe gene products in terms of their associated
  - biological process
  - cellular component
  - molecular function
- in a species-independent manner
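Because each GO vocabulary is structured as a directed acyclic graph, a gene product annotated to a term is implicitly annotated to all of that term's ancestors (GO's "true path rule"). A minimal sketch of that propagation, assuming a hypothetical parent map with placeholder term names `T1`-`T5` rather than real GO identifiers; the function names are also hypothetical.

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by repeatedly following parent links upward in the DAG."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct: dict[str, list[str]], parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Add every ancestor of each directly annotated term (true path rule)."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in direct.items()
    }


# Hypothetical mini-ontology: T4 has two parents, as is allowed in a DAG.
parents = {"T3": ["T2"], "T2": ["T1"], "T4": ["T1", "T5"]}
annotations = {"geneX": ["T3"], "geneY": ["T4"]}
print(sorted(propagate(annotations, parents)["geneX"]))  # ['T1', 'T2', 'T3']
print(sorted(propagate(annotations, parents)["geneY"]))  # ['T1', 'T4', 'T5']
```

Unlike a strict tree, a term may have several parents, so the traversal keeps a `seen` set to visit each ancestor only once.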
53 The GO ontology
54 Gene Ontology Members
- FlyBase - database for the fruitfly Drosophila melanogaster
- Berkeley Drosophila Genome Project (BDGP) - Drosophila informatics, GO database software, Sequence Ontology development
- Saccharomyces Genome Database (SGD) - database for the budding yeast Saccharomyces cerevisiae
- Mouse Genome Database (MGD) and Gene Expression Database (GXD) - databases for the mouse Mus musculus
- The Arabidopsis Information Resource (TAIR) - database for the brassica family plant Arabidopsis thaliana
- WormBase - database for the nematode Caenorhabditis elegans
- EBI GOA project - annotation of UniProt (Swiss-Prot/TrEMBL/PIR) and InterPro databases
- Rat Genome Database (RGD) - database for the rat Rattus norvegicus
- DictyBase - informatics resource for the slime mold Dictyostelium discoideum
- GeneDB S. pombe - database for the fission yeast Schizosaccharomyces pombe (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
- GeneDB for protozoa - databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei, and several other protozoan parasites (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
- Genome Knowledge Base (GK) - a collaboration between Cold Spring Harbor Laboratory and EBI
- TIGR - The Institute for Genomic Research
- Gramene - A Comparative Mapping Resource for Monocots
- Compugen (with its Internet Research Engine)
- The Zebrafish Information Network (ZFIN) - reference datasets and information on Danio rerio