For your exam preparation: relevant topics from lectures 13-26 - PowerPoint PPT Presentation

Slides: 175
Provided by: Volk73
1
For your exam preparation: relevant topics from
lectures 13-26
2
V13 Protein Networks / Protein Complexes
  • Protein networks could be defined in a number of
    ways:
  • Co-regulated expression of genes/proteins
  • Proteins participating in the same metabolic
    pathways
  • Proteins sharing substrates
  • Proteins that are co-localized
  • Proteins that form permanent supracomplexes
    (protein machineries)
  • Proteins that bind each other transiently
    (signal transduction, bioenergetics ... )
  • Please describe different types of cellular
    networks.

3
Relation between lethality and function as
centers in protein networks
  • Likelihood p(k) of finding proteins in yeast that
    interact with exactly k other proteins.
  • The probability has a power-law dependence on k.
  • (Similar plot for the bacterium Helicobacter pylori.)
  • → The network of protein-protein interactions is a
    very inhomogeneous scale-free network in which a few
    highly connected proteins play central roles in
    mediating the interactions among the other, less
    strongly connected proteins.

Jeong, Mason, Barabási, Oltvai, Nature 411, 41
(2001)
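The degree distribution p(k) described on this slide can be estimated directly from a list of pairwise interactions. A minimal Python sketch, assuming a small hypothetical edge list (the protein names are placeholders, not data from Jeong et al.):

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: p(k)}, the fraction of proteins with exactly k partners."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    counts = Counter(len(partners) for partners in neighbors.values())
    n_nodes = len(neighbors)
    return {k: c / n_nodes for k, c in sorted(counts.items())}

# Hypothetical toy network: one hub (H) and four weakly connected proteins.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]
p_k = degree_distribution(toy_edges)
```

For a scale-free network, plotting log p(k) against log k should give an approximately straight line.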
4
Relation between lethality and function as
centers in protein networks
  • Computational analysis of the tolerance of
    protein networks to random errors (gene
    deletions).
  • Random mutations do not affect the
    overall topology of the network.
  • When hub proteins with many interactions are
    eliminated, the diameter of the network increases
    quickly.

The likelihood that a protein is essential (gene
knock-out is lethal for the cell) depends on its
connectivity in the yeast protein
network. Strongly connected proteins with central
roles in the architecture of the network are 3
times more likely to be essential than proteins
with few connections.
Jeong, Mason, Barabási, Oltvai, Nature 411, 41
(2001)
5
Analysis of protein complexes in yeast (S.
cerevisiae)
Identify proteins by scanning the yeast
protein database for proteins composed of
fragments of suitable mass. Here, the
identified proteins are listed according to
their localization (a). (b) lists the number
of proteins per complex.
Gavin et al. Nature 415, 141 (2002)
6
V14 Protein Networks / Protein Complexes
  • Protein networks could be defined in a number of
    ways:
  • Co-regulated expression of genes/proteins
  • Proteins participating in the same metabolic
    pathways
  • Proteins sharing substrates
  • Proteins that are co-localized
  • Proteins that form permanent supracomplexes
    (protein machineries)
  • Proteins that bind each other transiently
    (signal transduction, bioenergetics ... )

7
Overview
  • Statistical analysis of protein-protein
    interfaces in crystal structures of
    protein-protein complexes: residues in
    interfaces have a significantly different amino
    acid composition than the rest of the protein.
    → Predict protein-protein interaction sites from
    local sequence information.
  • Conservation at protein-protein interfaces:
    interface regions are more conserved than other
    regions on the protein surface.
    → Identify conserved regions on the protein
    surface, e.g. from solvent accessibility.
  • Interacting residues on two binding partners
    often show correlated mutations (among different
    organisms) if being mutated.
    → Identify correlated mutations.
  • Surface patterns of protein-protein interfaces:
    the interface is often formed by a hydrophobic
    patch surrounded by a ring of polar or charged
    residues.
    → Identify suitable patches on the surface if
    the 3D structure is known.
8
1 Analysis of interfaces
The PDB contains 1812 non-redundant protein
complexes (less than 25% identity). Results
do not change significantly if NMR structures,
theoretical models, or structures at lower
resolution (altogether 50) are excluded. Most
interesting are the results for transiently
formed complexes. How many PDB structures of
protein-protein complexes are known? How many
residues are typically at an interface? (ca. 10-20)
Ofran, Rost, J. Mol. Biol. 325, 377 (2003)
9
1 Properties of interfaces
Amino acid composition of six interface types.
The propensities of all residues found in
SWISS-PROT were used as background. If the
frequency of an amino acid is similar to its
frequency in SWISS-PROT, the height of the bar is
close to zero. Over-representation results in a
positive bar, and under-representation results in
a negative bar.
Ofran, Rost, J. Mol. Biol. 325, 377 (2003)
10
1 Pairing frequencies at interfaces
Residue-residue preferences. (A) Intra-domain:
the hydrophobic core is clear; (B) domain-domain; (C)
obligatory homo-oligomers (homo-obligomers); (D)
transient homo-oligomers (homo-complexes); (E)
obligatory hetero-oligomers (hetero-obligomers);
and (F) transient hetero-oligomers
(hetero-complexes). A red square indicates that
the interaction occurs more frequently than
expected; a blue square indicates that it occurs
less frequently than expected. The amino acid
residues are ordered according to hydrophobicity,
with isoleucine as the most hydrophobic and
arginine as the least hydrophobic.
Ofran, Rost, J. Mol. Biol. 325, 377 (2003)
11
3 Correlated mutations at interface
Pazos, Helmer-Citterich, Ausiello, Valencia, J. Mol.
Biol. 271, 511 (1997): correlation information is
sufficient for selecting the correct structural
arrangement of known heterodimers and protein
domains because the correlated pairs between the
monomers tend to accumulate at the contact
interface. Use the same idea to identify interacting
protein pairs.
12
Correlated mutations at interface
Correlated mutations evaluate the similarity in
variation patterns between positions in a
multiple sequence alignment. Similarity of those
variation patterns is thought to be related to
compensatory mutations. For each pair of
positions i and j in the sequence, calculate a rank
correlation coefficient $r_{ij}$:

$r_{ij} = \frac{\sum_{k,l} (S_{ikl} - \bar{S}_i)(S_{jkl} - \bar{S}_j)}{\sqrt{\sum_{k,l} (S_{ikl} - \bar{S}_i)^2}\,\sqrt{\sum_{k,l} (S_{jkl} - \bar{S}_j)^2}}$

where the summations run over every possible pair
of proteins k and l in the multiple sequence
alignment. $S_{ikl}$ is the ranked similarity between
residue i in protein k and residue i in protein
l. $S_{jkl}$ is the same for residue j. $\bar{S}_i$ and
$\bar{S}_j$ are the means of $S_{ikl}$ and $S_{jkl}$.
Pazos, Valencia, Proteins 47, 219 (2002)
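The correlation between two alignment positions can be sketched in a few lines of Python. This is only an illustration, assuming a Pearson-type correlation over the pairwise similarities and a placeholder identity score (1 for identical residues, 0 otherwise) instead of a real residue substitution matrix; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pair_similarities(column, similarity):
    """S_kl for every pair of sequences k < l at one alignment position."""
    return [similarity(a, b) for a, b in combinations(column, 2)]

def correlation(column_i, column_j, similarity):
    """Correlation r_ij between the variation patterns of two columns."""
    s_i = pair_similarities(column_i, similarity)
    s_j = pair_similarities(column_j, similarity)
    mean_i = sum(s_i) / len(s_i)
    mean_j = sum(s_j) / len(s_j)
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(s_i, s_j))
    var_i = sum((a - mean_i) ** 2 for a in s_i)
    var_j = sum((b - mean_j) ** 2 for b in s_j)
    if var_i == 0 or var_j == 0:
        return 0.0  # fully conserved column: no variation to correlate
    return cov / sqrt(var_i * var_j)

def identity(a, b):
    """Placeholder similarity (assumption): 1 if residues match, else 0."""
    return 1.0 if a == b else 0.0

col_i = ["A", "A", "V", "V"]  # position i in four aligned sequences
col_j = ["K", "K", "E", "E"]  # position j changes in the same sequences
col_c = ["G", "G", "G", "G"]  # a fully conserved position
r_ij = correlation(col_i, col_j, identity)
r_ic = correlation(col_i, col_c, identity)
```

Positions that change together across the same sequences, as `col_i` and `col_j` do here, give a coefficient close to 1.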
13
Correlated mutations at interface
For each protein i, generate a multiple sequence
alignment of homologous proteins (HSSP
database). Compare the MSAs of two proteins and reduce
them by keeping only sequences from species that
occur in both alignments (delete the other rows).
Pazos, Valencia, Proteins 47, 219 (2002)
14
i2h method
Schematic representation of the i2h method. (A)
Family alignments are collected for two different
proteins, 1 and 2, including corresponding
sequences from different species (a, b, c, ...).
(B) A virtual alignment is constructed,
concatenating the sequences of the probable
orthologous sequences of the two proteins.
Correlated mutations are calculated. (C) The
distributions of the correlation values are
recorded. We used 10 correlation levels. The
corresponding distributions are represented for
the pairs of residues internal to the two
proteins (P11 and P22) and for the pairs composed
of one residue from each of the two proteins
(P12).
Pazos, Valencia, Proteins 47, 219 (2002)
15
Predictions from correlated mutations
Results obtained by i2h in a set of 14 two-domain
proteins of known structure (proteins with two
interacting domains). Treat the 2 domains as
different proteins. (A) Interaction index for the
133 pairs with 11 or more sequences in common.
The true positive hits are highlighted with
filled squares. (B) Representation of i2h
results, reminiscent of those obtained in the
experimental yeast two-hybrid system. The
diameter of the black circles is proportional to
the interaction index; true pairs are highlighted
with gray squares. Empty spaces correspond to
those cases in which the i2h system could not be
applied, because they contained <11 sequences
from different species in common for the two
domains. In most cases, i2h scored the correct
pair of protein domains above all other possible
interactions.
Pazos, Valencia, Proteins 47, 219 (2002)
16
4 Coevolutionary Analysis
Idea: if co-evolution is relevant, a
ligand-receptor pair should occupy related
positions in phylogenetic trees. Observation:
for ligand-receptor pairs that are part of most
large protein families, the correlation between
their phylogenetic distance matrices is
significantly greater than for uncorrelated
protein families (Goh et al. 2000; Pazos,
Valencia 2001). Finer analysis (Goh, Cohen
2002) shows that within these correlated
phylogenetic trees, the protein pairs that bind
have a higher correlation between their
phylogenetic distance matrices than other
homologs drawn from the ligand and receptor
families that do not bind.
Goh, Cohen J Mol Biol 324, 177 (2002)
17
Summary
There now exists a small zoo of promising
experimental and theoretical methods to analyze
the cellular interactome, i.e. which proteins
interact with each other.
Problem 1: each method detects
too few interactions (as seen by the fact that
the overlap between the predictions of various
methods is very small).
Problem 2: each method has
an intrinsic error rate (producing false
positives and false negatives).
Ideally, everything will converge to a big picture
eventually. Solving Problem 1 will help
solving Problem 2 by combining predictions.
Problem 1 can be partially solved by producing
more data :-) In the meantime, the value of
network analysis (e.g. the identification of
isolated modules) is questionable to some
extent.
18
V15
19
Modularity in molecular networks?
A functional module is, by definition, a discrete
entity whose function is separable from those of
other modules. This separation depends on
chemical isolation, which can originate from
spatial localization or from chemical
specificity. E.g. a ribosome concentrates the
reactions involved in making a polypeptide into a
single particle, thus spatially isolating its
function. A signal transduction system is an
extended module that achieves its isolation
through the specificity of the initial binding of
the chemical signal to receptor proteins, and of
the interactions between signalling proteins
within the cell. Please give 3 reasons for the
occurrence of functional modules in biological
cells.
Hartwell et al. Nature 402, C47 (1999)
20
Modularity in molecular networks
Modules can be insulated from or connected to
each other. Insulation allows the cell to carry
out many diverse reactions without cross-talk
that would harm the cell. Connectivity allows
one function to influence another. The
higher-level properties of cells, such as their
ability to integrate information from multiple
sources, will be described by the pattern of
connections among their functional modules.
Hartwell et al. Nature 402, C47 (1999)
21
Organization of large-scale molecular networks
  • Organization of molecular networks revealed by
    large-scale experiments:
  • power-law distribution $P(k) \sim k^{-\gamma}$ of the
    node degree k (i.e. the number of edges of a node)
  • small-world property (i.e. a high clustering
    coefficient and a small shortest path between
    every pair of nodes)
  • anticorrelation in the node degree of connected
    nodes (i.e. highly interacting nodes tend to be
    connected to low-interacting ones)
  • These properties become evident when hundreds or
    thousands of molecules and their interactions are
    studied together.
  • At the other end of the spectrum: recently
    discovered motifs that consist of 3-4 nodes.

22
Mesoscale properties of networks
Most relevant processes in biological networks
correspond to the mesoscale (5-25 genes or
proteins). It is computationally enormously
expensive to study mesoscale properties of
biological networks, e.g. a network of 1000 nodes
contains $1 \times 10^{23}$ possible 10-node sets. Spirin,
Mirny analyzed a combined network of protein
interactions with data from CELLZOME, MIPS, and
BIND: 6500 interactions.
23
Identify connected subgraphs
The network of protein interactions is typically
presented as an undirected graph with proteins
as nodes and protein interactions as undirected
edges. Aim: identify highly connected subgraphs
(clusters) that have more interactions within
themselves and fewer with the rest of the
graph. A fully connected subgraph, or clique,
that is not a part of any other clique is an
example of such a cluster. In general, clusters
need not be fully connected. Measure the density
of connections by $Q = 2m / (n(n-1))$, where n is the
number of proteins in the cluster and m is the
number of interactions between them.
Spirin, Mirny, PNAS 100, 12123 (2003)
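The density measure can be computed as follows. A minimal sketch, assuming the density is defined as Q = 2m / (n(n-1)), i.e. the fraction of all possible protein pairs in the cluster that actually interact; the function name and the toy edges are hypothetical.

```python
def cluster_density(cluster, edges):
    """Q = 2m / (n(n-1)) for the proteins in `cluster`."""
    members = set(cluster)
    n = len(members)
    if n < 2:
        raise ValueError("density needs at least two proteins")
    # Count each undirected interaction inside the cluster exactly once.
    internal = {frozenset((a, b)) for a, b in edges
                if a in members and b in members and a != b}
    m = len(internal)
    return 2 * m / (n * (n - 1))

# Hypothetical toy graph: a triangle A-B-C with a pendant protein D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
q_triangle = cluster_density({"A", "B", "C"}, toy_edges)
q_all = cluster_density({"A", "B", "C", "D"}, toy_edges)
```

A clique has Q = 1; adding the weakly attached protein D lowers the density to 4 of 6 possible pairs.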
24
(method I) Identify all fully connected subgraphs
(cliques)
Generally, finding all cliques of a graph is an
NP-hard problem. Because the protein interaction
graph is so far very sparse (the number of
interactions (edges) is similar to the number of
proteins (nodes)), this can be done quickly. To
find cliques of size n one needs to enumerate
only the cliques of size n-1. The search for
cliques starts with n = 4: pick all (known) pairs
of edges (6500 × 6500 protein interactions)
successively. For every pair A-B and C-D check
whether there are edges between A and C, A and D,
B and C, and B and D. If these edges are present,
ABCD is a clique. For every clique identified,
ABCD, pick all known proteins successively. For
every picked protein E, if all of the
interactions E-A, E-B, E-C, and E-D are known,
then ABCDE is a clique with size 5. Continue
for n = 6, 7, ... The largest clique found in
the protein-interaction network has size 14.
Spirin, Mirny, PNAS 100, 12123 (2003)
25
(I) Identify all fully connected subgraphs
(cliques)
These results include, however, many redundant
cliques. For example, the clique with size 14
contains 14 cliques with size 13. To find all
nonredundant subgraphs, mark all proteins
comprising the clique of size 14, and out of all
subgraphs of size 13 pick those that have at
least one protein other than the marked ones. After all
redundant cliques of size 13 are removed, proceed
to remove redundant cliques of size 12, etc. In total, only
41 nonredundant cliques with sizes 4 - 14 were
found. Describe an algorithm to detect all
fully-connected subgraphs (cliques) in the
network of protein-protein interactions.
Spirin, Mirny, PNAS 100, 12123 (2003)
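The enumeration described on the last two slides can be sketched in Python. This is a simplified illustration rather than the authors' implementation: 4-cliques are found from pairs of edges, larger cliques are grown one protein at a time, and redundancy is removed with a plain subset test instead of the marking procedure. All names are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def four_cliques(edges, adj):
    """Check every pair of edges A-B, C-D for the four cross edges."""
    found = set()
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue
        if c in adj[a] and d in adj[a] and c in adj[b] and d in adj[b]:
            found.add(frozenset((a, b, c, d)))
    return found

def extend(cliques, adj):
    """Grow each clique of size n-1 by one protein linked to all members."""
    bigger = set()
    for clique in cliques:
        candidates = set.intersection(*(adj[p] for p in clique))
        for extra in candidates:
            bigger.add(clique | {extra})
    return bigger

def nonredundant_cliques(edges):
    adj = build_adjacency(edges)
    levels = [four_cliques(edges, adj)]
    while levels[-1]:
        levels.append(extend(levels[-1], adj))
    all_cliques = [c for level in levels for c in level]
    # Drop every clique that is fully contained in a larger one.
    return {c for c in all_cliques
            if not any(c < other for other in all_cliques)}

# Hypothetical toy graph: a 5-clique, a 4-clique, and one bridging edge.
toy_edges = (list(combinations("ABCDE", 2))
             + list(combinations("WXYZ", 2))
             + [("E", "W")])
cliques = nonredundant_cliques(toy_edges)
```

The pairwise edge scan is quadratic in the number of interactions, which is affordable only because the interaction graph is sparse.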
26
(method III) Monte Carlo Simulation
Use MC to find a tight subgraph of a
predetermined number of nodes M. At time t = 0,
a random set of M nodes is selected. For each
pair of nodes i, j from this set, the shortest
path Lij between i and j on the graph is
calculated. Denote the sum of all shortest paths
Lij from this set as L0. At every time step one
of the M nodes is picked at random, and one node is
picked at random out of all its neighbors. The
new sum of all shortest paths, L1, is calculated
as if the original node were replaced by this
neighbor. If L1 < L0, accept the replacement with
probability 1. If L1 > L0, accept the replacement
with probability $\exp(-(L_1 - L_0)/T)$, where T is
the effective temperature.
Spirin, Mirny, PNAS 100, 12123 (2003)
27
(III) Monte Carlo Simulation
Every tenth time step an attempt is made to
replace one of the nodes from the current set
with a node that has no edges to the current set
to avoid getting caught in an isolated
disconnected subgraph. This process is repeated
(i) until the original set converges to a
complete subgraph, or (ii) for a predetermined
number of steps, after which the tightest
subgraph (the subgraph corresponding to the
smallest L0) is recorded. The recorded clusters
are merged and redundant clusters are removed.
Spirin, Mirny, PNAS 100, 12123 (2003)
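The core of this Monte Carlo search can be sketched as follows. This is a simplified illustration under stated assumptions: the acceptance probability for a worse set is taken to be exp(-(L1 - L0)/T), moves with L1 = L0 are accepted, and the escape move attempted every tenth step as well as the final merging of clusters are left out. Function names and the toy graph are hypothetical.

```python
import math
import random
from collections import deque

def shortest_paths_from(adj, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_sum(adj, nodes):
    """Sum of shortest paths over all pairs in `nodes` (inf if disconnected)."""
    nodes = list(nodes)
    total = 0
    for idx, i in enumerate(nodes):
        dist = shortest_paths_from(adj, i)
        for j in nodes[idx + 1:]:
            total += dist.get(j, math.inf)
    return total

def accept(l_old, l_new, temperature, rng):
    """Metropolis rule: keep non-worsening moves, else exp(-(L1 - L0)/T)."""
    if l_new <= l_old:
        return True
    return rng.random() < math.exp(-(l_new - l_old) / temperature)

def mc_search(adj, m, temperature, steps, seed=0):
    rng = random.Random(seed)
    current = set(rng.sample(sorted(adj), m))
    l_cur = path_sum(adj, current)
    best, l_best = set(current), l_cur
    for _ in range(steps):
        old = rng.choice(sorted(current))
        if not adj[old]:
            continue  # isolated node: no neighbor to move to
        new = rng.choice(sorted(adj[old]))
        if new in current:
            continue
        trial = (current - {old}) | {new}
        l_new = path_sum(adj, trial)
        if accept(l_cur, l_new, temperature, rng):
            current, l_cur = trial, l_new
            if l_cur < l_best:
                best, l_best = set(current), l_cur
    return best, l_best

# Hypothetical toy graph: a 4-clique A-D with a chain D-E-F-G attached.
toy_adj = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"}, "E": {"D", "F"}, "F": {"E", "G"}, "G": {"F"},
}
best_set, best_sum = mc_search(toy_adj, m=4, temperature=4.0, steps=200, seed=1)
```

A set of M nodes forms a complete subgraph exactly when its path sum equals M(M-1)/2, which gives a simple convergence check.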
28
Optimal temperature in MC simulation
For every cluster size there is an optimal
temperature that gives the fastest convergence to
the tightest subgraph.
Time to find a clique with size 7 in MC steps per
site as a function of temperature T. The region
with optimal temperature is shown in Inset. The
required time increases sharply as the
temperature goes to 0, but has a relatively wide
plateau in the region 3 < T < 7. Simulations
suggest that the choice of temperature T ≈ M
would be safe for any cluster size M.
Spirin, Mirny, PNAS 100, 12123 (2003)
29
Merging Overlapping Clusters
A simple statistical test shows that nodes which
have only one link to a cluster are statistically
insignificant. Clean such statistically
insignificant members first. Then merge
overlapping clusters: for every cluster Ai find
all clusters Ak that overlap with this cluster by
at least one protein. For every such
cluster calculate the Q value of a possible merged
cluster Ai ∪ Ak. Record the cluster Abest(i)
which gives the highest Q value if merged with
Ai. After the best match is found for every
cluster, every cluster Ai is replaced by a merged
cluster Ai ∪ Abest(i) unless the Q value of
Ai ∪ Abest(i) is below a certain threshold value QC. This
process continues until there are no more
overlapping clusters or until merging any of the
remaining clusters will make a cluster with a Q
value lower than QC.
Spirin, Mirny, PNAS 100, 12123 (2003)
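A simplified version of the merging step is sketched below. It is a greedy variant, not the procedure on the slide: in each round only the single best-scoring overlapping pair is merged, and the preliminary cleaning of weakly attached members is skipped. The density Q = 2m / (n(n-1)) and all names are assumptions for illustration.

```python
def density(cluster, edge_set):
    """Q = 2m / (n(n-1)) with m counted over edges inside the cluster."""
    n = len(cluster)
    if n < 2:
        return 0.0
    m = sum(1 for edge in edge_set if edge <= cluster)
    return 2 * m / (n * (n - 1))

def merge_clusters(clusters, edges, q_cutoff):
    """Greedily merge overlapping clusters while the merged Q stays >= q_cutoff."""
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    clusters = {frozenset(c) for c in clusters}
    while True:
        best = None  # (Q, a, b) of the best allowed merge in this round
        for a in clusters:
            for b in clusters:
                if a == b or not (a & b):
                    continue  # same cluster or no shared protein
                q = density(a | b, edge_set)
                if q >= q_cutoff and (best is None or q > best[0]):
                    best = (q, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters = (clusters - {a, b}) | {a | b}

# Hypothetical toy data: a 4-clique A-D and a triangle D-E-F sharing D.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
start = [{"A", "B", "C"}, {"B", "C", "D"}, {"D", "E", "F"}]
merged = merge_clusters(start, toy_edges, q_cutoff=0.8)
```

With a cutoff of 0.8 the two overlapping triangles inside the clique are merged (Q = 1), while the union with the D-E-F triangle (Q = 0.6) is rejected.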
30
Statistical significance of complexes and modules
Number of complete cliques (Q = 1) as a function
of clique size enumerated in the network of
protein interactions (red) and in randomly
rewired graphs (blue, averaged over >1,000 graphs
where the number of interactions for each protein is
preserved). Inset shows the same plot in
log-normal scale. Note the dramatic enrichment in
the number of cliques in the protein-interaction
graph compared with the random graphs. Most of
these cliques are parts of bigger complexes and
modules.
Draw the distribution of cliques in the plot.
Spirin, Mirny, PNAS 100, 12123 (2003)
31
Statistical significance of complexes and modules
Distribution of Q of clusters found by the MC
search method. Red bars: original network of
protein interactions. Blue curves: randomly
rewired graphs. Clusters in the protein network
have many more interactions than their
counterparts in the random graphs.
Spirin, Mirny, PNAS 100, 12123 (2003)
32
Discovered functional modules
Examples of discovered functional modules. (A) A
module involved in cell-cycle regulation. This
module consists of cyclins (CLB1-4 and CLN2) and
cyclin-dependent kinases (CKS1 and CDC28) and a
nuclear import protein (NIP29). Although they
have many interactions, these proteins are not
present in the cell at the same time. (B)
Pheromone signal transduction pathway in the
network of protein-protein interactions. This
module includes several MAPK (mitogen-activated
protein kinase) and MAPKK (mitogen-activated
protein kinase kinase) kinases, as well as other
proteins involved in signal transduction. These
proteins do not form a single complex; rather,
they interact in a specific order. Assuming that
functional modules are identified by analyzing
protein-protein interaction networks, will all
proteins belonging to a functional module always
be present simultaneously?
33
Robustness of clusters found
Model the effect of false positives in experimental
data: randomly reconnect, remove or add 10-50% of
the interactions in the network. Cluster recovery
probability as a function of the fraction of
altered links. Black curves correspond to the
case when a fraction of links are rewired. Red:
removed; green: added. Circles represent the
probability to recover 75% of the original
cluster; triangles represent the probability to
recover 50%.
Noise in the form of removal or addition of links
has a less deteriorating effect than random
rewiring. About 75% of clusters can still be
found when 10% of links are rewired.
Spirin, Mirny, PNAS 100, 12123 (2003)
34
Summary
Here, analysis of meso-scale properties
demonstrated the presence of highly connected
clusters of proteins in a network of protein
interactions. Strong support for the suggested
modular architecture of biological
networks. Distinguish 2 types of clusters:
protein complexes and dynamic functional
modules. Both complexes and modules have more
interactions among their members than with the
rest of the network. Dynamic modules are elusive
to experimental purification because they are not
assembled as a complex at any single point in
time. Computational analysis allows detection of
such modules by integrating pairwise molecular
interactions that occur at different times and
places. However, computational analysis alone
does not allow one to distinguish between complexes
and modules or between transient and simultaneous
interactions.
35
Evolution of the yeast protein interaction network
How do biological networks develop? So far, the
protein interaction network of yeast is one of
the best characterized networks. Parts of this
network should be inherited from the last common
ancestor of the three domains of life:
Eubacteria, Archaea, and Eukaryotes. Use again
graph theory to model the yeast protein
interaction network. Proteins = nodes, pairwise
interactions = links between two nodes. Evolution
can be inferred by analyzing the growth pattern
of the graph. Classify all nodes (proteins) into
isotemporal categories based on each protein's
orthologous hits in the COG database.
Qin et al. PNAS 100, 12820 (2003)
36
Evolution of the yeast protein interaction network
Isotemporal categories are designed through a
binary (b) coding scheme. The b code represents
the distribution of each yeast protein's
orthologs in the universal tree of life. Bit
value 1 indicates the presence of at least one
orthologous hit for a yeast protein in a
corresponding group of genomes, and bit value 0
indicates the absence of any orthologous hit. The
presented example is 110011 in the b format and
51 in the d format. Orthologous identifications
are based on COGs at NCBI and in von Mering et
al. (2002).
Previously, phylogenetic profiles were used to
detect protein interaction partners. Here, use
phylogenetic profiles to detect modules.
Qin et al. PNAS 100, 12820 (2003)
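The relation between the binary (b) and decimal (d) formats of an isotemporal code is a plain base-2 conversion. A small sketch, assuming the bits are listed from the most significant to the least significant genome group (the group order used here is hypothetical):

```python
def isotemporal_code(presence_bits):
    """Turn per-group ortholog presence flags into (b code, d code)."""
    b_code = "".join("1" if bit else "0" for bit in presence_bits)
    d_code = int(b_code, 2)  # read the bit string as a base-2 number
    return b_code, d_code

# Presence (1) or absence (0) of an orthologous hit in six genome groups.
b_code, d_code = isotemporal_code([1, 1, 0, 0, 1, 1])
```

For the example on the slide this gives 110011 in the b format and 32 + 16 + 2 + 1 = 51 in the d format.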
37
Evolution of the yeast protein interaction network
Interaction patterns. Z scores for all possible
interactions of the isotemporal categories in the
protein interaction network. For categories i
and j, $Z_{i,j} = (F_{i,j}^{obs} - F_{i,j}^{mean}) / \sigma_{i,j}$, where
$F_{i,j}^{obs}$ is the observed number of interactions,
and $F_{i,j}^{mean}$ and $\sigma_{i,j}$ are the average number of
interactions and the SD, respectively, in 10,000
MS02 null models.
The diagonal distribution of large positive Z
scores indicates that yeast proteins tend to
interact with proteins from the same or closely
related isotemporal categories.
Qin et al. PNAS 100, 12820 (2003)
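The Z score for one pair of categories can be computed from the observed count and the counts in the randomized networks. A minimal sketch, assuming the population standard deviation is meant by SD and using made-up counts in place of the 10,000 null models:

```python
from statistics import mean, pstdev

def z_score(observed, null_counts):
    """Z = (F_obs - F_mean) / sigma over the null-model counts."""
    mu = mean(null_counts)
    sigma = pstdev(null_counts)  # assumption: population SD
    if sigma == 0:
        raise ValueError("null model counts do not vary")
    return (observed - mu) / sigma

# Hypothetical numbers: 14 observed interactions between categories i and j,
# and four randomized networks standing in for the null models.
z = z_score(14, [8, 10, 12, 10])
```

A large positive Z means that the two categories interact more often than expected from the null models.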
38
Evolution of the yeast protein interaction network
The observed intracategory association tendencies
are consistent with the intuitive notion that a
new function likely requires a group of new
proteins, and that the growth of the protein
interaction network is under functional
constraints. Although the turnover rate of the
protein interaction network is suggested to be
very fast, these results suggest that many
isotemporal clusters can still remain well
preserved during evolution. The formation and
conservation of isotemporal clusters during
evolution may be the consequence of selection for
the modular organization of the protein
interaction network. The progressive nature of
the network evolution and significant isotemporal
clustering may have contributed to the
hierarchical organization of modularity in
biological networks in general.
Qin et al. PNAS 100, 12820 (2003)
39
V16
40
V17, V18
Look at Tihamer's overheads: the basics of network
types are important for the exam.
41
V19
42
Computational Studies of Metabolic Networks -
Introduction
Different levels for describing metabolic
networks:
  • classical biochemical pathways
    (glycolysis, TCA cycle, ...)
  • stoichiometric modelling (flux balance
    analysis): theoretical capabilities of an
    integrated cellular process, feasible metabolic
    flux distributions
  • automatic decomposition of metabolic networks
    (elementary modes, extreme pathways ...)
  • kinetic modelling (E-Cell ...); problem: general
    lack of kinetic information on the dynamics and
    regulation of cellular metabolism

As a primer today: EcoCyc, "Global Properties of
the Metabolic Map of E. coli", Ouzounis, Karp,
Genome Research 10, 568 (2000)
43
EcoCyc Database
Genetic complement of E.coli: 4.7 million DNA
bases. How can we characterize the functional
complement of E.coli and according to what
criteria can we compare the biochemical networks
of two organisms? EcoCyc contains the metabolic
map of E.coli defined as the set of all known
pathways, reactions and enzymes of E.coli
small-molecule metabolism. Analyze:
  • the connectivity relationships of the metabolic
    network
  • its partitioning into pathways
  • enzyme activation and inhibition
  • repetition and multiplicity of elements such as
    enzymes, reactions, and substrates.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
44
Reactions
The number of reactions (744) and the number of
enzymes (607) differ in EcoCyc. WHY?
(1) There is no one-to-one mapping between
enzymes and reactions: some enzymes catalyze
multiple reactions, and some reactions are
catalyzed by multiple enzymes.
(2) For some reactions known to be catalyzed by
E.coli, the enzyme has not yet been identified.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
45
Compounds
The 744 reactions of E.coli small-molecule
metabolism involve a total of 791 different
substrates. On average, each reaction contains
4.0 substrates.
Number of reactions containing varying numbers of
substrates (reactants plus products).
Ouzounis, Karp, Genome Res. 10, 568 (2000)
46
Pathways
EcoCyc describes 131 pathways: energy
metabolism, nucleotide and amino acid
biosynthesis, secondary metabolism. Pathways vary
in length from a single reaction step to 16
steps with an average of 5.4 steps. Fill the
distribution of pathway lengths in EcoCyc into
the plot.
Length distribution of EcoCyc pathways
Ouzounis, Karp, Genome Res. 10, 568 (2000)
47
Pathways
However, there is no precise biological
definition of a pathway. The partitioning of the
metabolic network into pathways (including the
well-known examples of biochemical pathways) is
somewhat arbitrary. These decisions of course
also affect the distribution of pathway lengths.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
48
Protein Subunits
A unique property of EcoCyc is that it explicitly
encodes the subunit organization of
proteins. Therefore, one can ask questions such
as: Are protein subunits encoded by neighboring
genes? Interestingly, this is the case for > 80%
of known heteromeric enzymes.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
49
Reactions Catalyzed by More Than one Enzyme
Diagram showing the number of reactions that are
catalyzed by one or more enzymes. Most reactions
are catalyzed by one enzyme, some by two, and
very few by more than two enzymes. For 84
reactions, the corresponding enzyme is not yet
encoded in EcoCyc. What may be the reasons for
isozyme redundancy?
(1) the enzymes that catalyze the same reaction
are homologs and have duplicated (or were
obtained by horizontal gene transfer), acquiring
some specificity but retaining the same mechanism
(divergence)
(2) the reaction is easily invented; therefore,
there is more than one protein family that is
independently able to perform the catalysis
(convergence).
Ouzounis, Karp, Genome Res. 10, 568 (2000)
50
Enzymes that catalyze more than one reaction
Genome predictions usually assign a single
enzymatic function. However, E.coli is known to
contain many multifunctional enzymes. Of the 607
E.coli enzymes, 100 are multifunctional, either
having the same active site and different
substrate specificities or different active
sites. Number of enzymes that catalyze one or
more reactions: most enzymes catalyze one
reaction; some are multifunctional. The
enzymes that catalyze 7 and 9 reactions are
purine nucleoside phosphorylase and nucleoside
diphosphate kinase. Take-home message: the high
proportion of multifunctional enzymes implies
that the genome projects significantly
underpredict multifunctional enzymes!
Ouzounis, Karp, Genome Res. 10, 568 (2000)
51
Reactions participating in more than one pathway
The 99 reactions belonging to
multiple pathways appear to be the
intersection points in the complex network of
chemical processes in the cell. E.g. the
reaction present in 6 pathways corresponds to the
reaction catalyzed by malate dehydrogenase, a
central enzyme in cellular metabolism.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
52
Implications of EcoCyc Analysis
Although 30% of E.coli genes remain unidentified,
enzymes are the best studied and most easily
identifiable class of proteins. Therefore, few
new enzymes can be expected to be discovered. The
metabolic map presented may be 90%
complete. Implication for metabolic maps derived
from automatic genome annotation: automatic
annotation generally does not identify
multifunctional proteins. The network complexity
may therefore be underestimated. EcoCyc results
often cannot be obtained from protein or nucleic
acid sequence databases because they store
protein functions using text descriptions. E.g.
sequence databases do not include precise
information about the subunit organization of
proteins.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
53
Stoichiometric matrix
Stoichiometric matrix: a matrix with reaction
stoichiometries as columns and metabolite
participations as rows. The stoichiometric matrix
is an important part of the in silico model.
With the matrix, the methods of extreme pathway
and elementary mode analyses can be used to
generate a unique set of pathways P1, P2, and P3
(see future lecture).
Papin et al. TIBS 28, 250 (2003)
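The layout of the matrix, with metabolites as rows and reactions as columns, can be illustrated with a small sketch. The three-reaction network and all names below are hypothetical; negative coefficients mark consumed metabolites and positive coefficients mark produced ones.

```python
def stoichiometric_matrix(reactions):
    """Rows = metabolites, columns = reactions, entries = coefficients."""
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix

# Hypothetical toy network: uptake of A, conversion A -> B, secretion of B.
toy_reactions = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "secrete_B": {"B": -1},
}
mets, rxns, S = stoichiometric_matrix(toy_reactions)
```

Each column can be read as one reaction equation, and each row lists all reactions in which one metabolite takes part.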
54
Flux balancing
Any chemical reaction requires
mass conservation. Therefore one may analyze
metabolic systems by requiring mass
conservation. The only required knowledge is the
stoichiometry of the metabolic pathways and the
metabolic demands. For each metabolite, under
steady-state conditions, the mass balance
constraints in a metabolic network can be
represented mathematically by the matrix equation
$S \cdot v = 0$, where the matrix S is the m × n
stoichiometric matrix, m the number of metabolites
and n the number of reactions in the network. The
vector v represents all fluxes in the metabolic
network, including the internal fluxes, transport
fluxes and the growth flux.
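Whether a given flux vector satisfies the steady-state mass balance S · v = 0 can be checked with a plain matrix-vector product. A minimal sketch with a hypothetical 2 × 3 matrix (uptake of A, conversion A -> B, secretion of B); the function names are placeholders.

```python
def mass_balance_residual(S, v):
    """Return S . v, the net production rate of every metabolite."""
    if any(len(row) != len(v) for row in S):
        raise ValueError("each row of S needs one entry per flux")
    return [sum(s * flux for s, flux in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if no metabolite accumulates or is depleted."""
    return all(abs(r) <= tol for r in mass_balance_residual(S, v))

# Rows: metabolites A and B. Columns: uptake_A, A_to_B, secrete_B.
S_toy = [[1, -1, 0],
         [0, 1, -1]]
balanced = is_steady_state(S_toy, [2.0, 2.0, 2.0])
unbalanced = is_steady_state(S_toy, [2.0, 1.0, 1.0])
```

In the second flux vector metabolite A is taken up faster than it is converted, so it would accumulate and the steady-state condition fails.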
55
Flux balance analysis
Since the number of metabolites is generally
smaller than the number of reactions (m < n), the
flux-balance equation is typically
underdetermined. Therefore there are generally
multiple feasible flux distributions that satisfy
the mass balance constraints. The set of
solutions is confined to the nullspace of matrix
S. To find the true biological flux in cells
(→ e.g. Heinzle, Huber, UdS) one needs additional
(experimental) information, or one may impose
constraints on the magnitude of each individual
metabolic flux. The intersection of the
nullspace and the region defined by those linear
inequalities defines a region in flux space: the
feasible set of fluxes.
56
Feasible solution set for a metabolic reaction
network
(A) The steady-state operation of the metabolic
network is restricted to the region within a
cone, defined as the feasible set. The feasible
set contains all flux vectors that satisfy the
physicochemical constraints. Thus, the feasible
set defines the capabilities of the metabolic
network. All feasible metabolic flux
distributions lie within the feasible set.
(B) In the limiting case, where all constraints
on the metabolic network are known, such as the
enzyme kinetics and gene regulation, the feasible
set may be reduced to a single point. This single
point must lie within the feasible set.
57
E.coli in silico
Define αi = 0 for irreversible internal fluxes,
αi = -∞ for reversible internal fluxes (use
biochemical literature). Transport fluxes for
PO42-, NH3, CO2, SO42-, K+, Na+ were
unrestrained. For other metabolites except for
those that are able to leave the metabolic
network (i.e. acetate, ethanol, lactate,
succinate, formate, pyruvate etc.) Find a
particular metabolic flux distribution within the
feasible set by linear programming. LP finds a
solution that minimizes a particular metabolic
objective (subject to the imposed constraints) Z
where
In fact, the method finds the solution that
maximizes the fluxes, i.e. gives maximal biomass.
Edwards & Palsson, PNAS 97, 5528 (2000)
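As a rough illustration of the linear programming step, the sketch below maximizes one flux of a hypothetical toy network subject to S · v = 0 and capacity bounds. The network, the bounds and the choice of v3 as a stand-in for the growth flux are assumptions for illustration only; they are not the E. coli model of Edwards & Palsson. `scipy.optimize.linprog` minimizes, so the objective is negated.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (assumption): uptake -> A -> B -> "biomass"
S = np.array([[1.0, -1.0,  0.0],     # metabolite A
              [0.0,  1.0, -1.0]])    # metabolite B

# Capacity constraints alpha_i <= v_i <= beta_i (all irreversible, uptake limited to 10)
bounds = [(0, 10), (0, 1000), (0, 1000)]

# Maximize v3 by minimizing -v3
c = np.array([0.0, 0.0, -1.0])

res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
v_opt = res.x
print(v_opt)       # optimal flux distribution
print(-res.fun)    # maximal value of the objective flux
```

The uptake bound is the only active capacity constraint here, so the optimum pushes the whole chain to the uptake limit.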
58
Interpretation of gene deletion results
The essential gene products were involved in the
3-carbon stage of glycolysis, 3 reactions of the
TCA cycle, and several points within the
PPP. The remainder of the central metabolic
genes could be removed while E.coli in silico
maintained the potential to support cellular
growth. This suggests that a large number of the
central metabolic genes can be removed without
eliminating the capability of the metabolic
network to support growth under the conditions
considered.
Edwards & Palsson, PNAS 97, 5528 (2000)
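An in silico gene deletion can be sketched by forcing the flux of the deleted reaction to zero and repeating the optimization. The four-reaction network below is a hypothetical example with two parallel routes (v2 and v3); the names `max_growth` and `essential` are illustrative and not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network (assumption): v1: -> A, v2: A -> B, v3: A -> B, v4: B -> "biomass"
S = np.array([[1.0, -1.0, -1.0,  0.0],    # A
              [0.0,  1.0,  1.0, -1.0]])   # B
names = ["v1", "v2", "v3", "v4"]
base_bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]
c = np.array([0.0, 0.0, 0.0, -1.0])       # maximize v4

def max_growth(deleted=None):
    """Maximal v4 with reaction index `deleted` constrained to zero flux."""
    bounds = list(base_bounds)
    if deleted is not None:
        bounds[deleted] = (0, 0)
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    return -res.fun

growth = {name: max_growth(k) for k, name in enumerate(names)}
essential = [name for name, g in growth.items() if g < 1e-9]
print(growth)
print(essential)
```

In this toy case v2 and v3 back each other up, so only the uptake and the output reaction are essential, which mirrors the observation that many central reactions can be removed without abolishing growth.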
59
Summary
FBA analysis constructs the optimal network
utilization simply using stoichiometry of
metabolic reactions and capacity
constraints. For E.coli the in silico results
are consistent with experimental data. FBA shows
that in the E.coli metabolic network there are
relatively few critical gene products in central
metabolism. However, the ability to adjust to
different environments (growth conditions) may be
diminished by gene deletions. FBA identifies the
best the cell can do, not how the cell actually
behaves under a given set of conditions. Here,
survival was equated with growth. FBA does not
directly consider regulation or regulatory
constraints on the metabolic network. This can be
treated separately (see future lecture).
Edwards & Palsson, PNAS 97, 5528 (2000)
60
V20
61
Extreme Pathways
introduced into metabolic analysis by the lab of
Bernard Palsson (Dept. of Bioengineering, UC San
Diego). The publications of this lab are
available at http://gcrg.ucsd.edu/publications/index.html
The extreme pathway technique is based on
the stoichiometric matrix representation of
metabolic networks. All external fluxes
are defined as pointing outwards.
Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)
62
Extreme Pathways algorithm - setup
The algorithm to determine the set of extreme
pathways for a reaction network follows the
principles of algorithms for finding the extremal
rays / generating vectors of convex polyhedral
cones. Combine the n × n identity matrix (I) with
the transpose of the stoichiometric matrix, S^T. I
serves for bookkeeping.
Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)
[Figure: stoichiometric matrix S and the initial tableau ( I | S^T )]
63
separate internal and external fluxes
Examine constraints on each of the exchange fluxes
as given by αj ≤ bj ≤ βj. If the exchange flux is
constrained to be positive do nothing; if the
exchange flux is constrained to be negative
multiply the corresponding row of the initial
matrix by -1. If the exchange flux is
unconstrained move the entire row to a temporary
matrix T(E). This completes the first tableau
T(0). T(0) and T(E) for the example reaction
system are shown on the previous slide. Each
element of these matrices will be designated
Tij. Starting with x = 1 and T(0) = T(x-1) the
next tableau is generated in the following
way:
Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)
64
idea of algorithm
(1) Identify all metabolites that do not have an
unconstrained exchange flux associated with them.
The total number of such metabolites is denoted
by μ. For the example, this is only the case for
metabolite C (μ = 1).
What is the main idea?
- We want to find balanced extreme pathways that
don't change the concentrations of metabolites
when flux flows through (input fluxes are
channelled to products, not to accumulation of
intermediates).
- The stoichiometric matrix
describes the coupling of each reaction to
the concentration of metabolites X.
- Now we need
to balance combinations of reactions that leave
concentrations unchanged. Pathways applied to
metabolites should not change their
concentrations → the matrix entries need to be
brought to 0.
Schilling, Letscher, Palsson, J. theor. Biol.
203, 229 (2000)
65
keep pathways that do not change concentrations
of internal metabolites
(2) Begin forming the new matrix T(x) by
copying all rows from T(x-1) which contain a
zero in the column of S^T that corresponds to the
first metabolite identified in step 1, denoted
by index c. (Here: 3rd column of S^T.)
Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)
T(0) (columns: v1 v2 v3 v4 v5 v6 | A B C D E)
1 0 0 0 0 0 | -1  1  0  0  0
0 1 0 0 0 0 |  0 -1  1  0  0
0 0 1 0 0 0 |  0  1 -1  0  0
0 0 0 1 0 0 |  0  0 -1  1  0
0 0 0 0 1 0 |  0  0  1 -1  0
0 0 0 0 0 1 |  0  0 -1  0  1
T(1) (rows copied so far)
1 0 0 0 0 0 | -1  1  0  0  0
66
balance combinations of other pathways
(3) Of the remaining rows in T(x-1) add
together all possible combinations of rows which
contain values of the opposite sign in column c,
such that the addition produces a zero in this
column.
Schilling et al. JTB 203, 229
T(0) (columns: v1 v2 v3 v4 v5 v6 | A B C D E)
1 0 0 0 0 0 | -1  1  0  0  0
0 1 0 0 0 0 |  0 -1  1  0  0
0 0 1 0 0 0 |  0  1 -1  0  0
0 0 0 1 0 0 |  0  0 -1  1  0
0 0 0 0 1 0 |  0  0  1 -1  0
0 0 0 0 0 1 |  0  0 -1  0  1
T(1)
1 0 0 0 0 0 | -1  1  0  0  0
0 1 1 0 0 0 |  0  0  0  0  0
0 1 0 1 0 0 |  0 -1  0  1  0
0 1 0 0 0 1 |  0 -1  0  0  1
0 0 1 0 1 0 |  0  1  0 -1  0
0 0 0 1 1 0 |  0  0  0  0  0
0 0 0 0 1 1 |  0  0  0 -1  1
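Steps (2) and (3) for the example system can be sketched as follows. The S^T rows are read off the example tableau T(0); the metabolite labels A-E in the comments and the helper name `balance_column` are assumptions for illustration. Step (4), the removal of dependent rows, is not included here.

```python
import numpy as np
from itertools import combinations

# S^T of the internal reactions (rows v1..v6, columns A..E), as in the example T(0)
ST = np.array([
    [-1,  1,  0,  0,  0],   # v1: A -> B
    [ 0, -1,  1,  0,  0],   # v2: B -> C
    [ 0,  1, -1,  0,  0],   # v3: C -> B
    [ 0,  0, -1,  1,  0],   # v4: C -> D
    [ 0,  0,  1, -1,  0],   # v5: D -> C
    [ 0,  0, -1,  0,  1],   # v6: C -> E
])
n = ST.shape[0]
T0 = np.hstack([np.eye(n, dtype=int), ST])   # identity part does the bookkeeping

def balance_column(T, col):
    """Step 2: keep rows with 0 in `col`; step 3: add pairs with opposite sign in `col`."""
    rows = [r for r in T if r[col] == 0]
    rest = [r for r in T if r[col] != 0]
    for r1, r2 in combinations(rest, 2):
        if r1[col] * r2[col] < 0:
            # scale so that the entries in `col` cancel
            rows.append(abs(r2[col]) * r1 + abs(r1[col]) * r2)
    return np.array(rows)

T1 = balance_column(T0, n + 2)   # n + 2 is the column of metabolite C
print(T1)
```

For this system the result has seven rows, all with a zero in the column of metabolite C, in the same order as the T(1) listed on the slide.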
67
remove non-orthogonal pathways
(4) For all of the rows added to T(x) in steps 2
and 3 check to make sure that no row exists that
is a non-negative combination of any other sets
of rows in T(x). One method used is as
follows: let A(i) = set of column indices j for
which the elements of row i = 0, i.e.
A(i) = { j : Ti,j = 0, 1 ≤ j ≤ (n+m) }.
Then check to determine if there exists another row
(h) for which A(i) is a subset of A(h).
If A(i) ⊆ A(h), i ≠ h, then row i must be eliminated
from T(x).
For the example above:
A(1) = {2,3,4,5,6,9,10,11}
A(2) = {1,4,5,6,7,8,9,10,11}
A(3) = {1,3,5,6,7,9,11}
A(4) = {1,3,4,5,7,9,10}
A(5) = {1,2,3,6,7,8,9,10,11}
A(6) = {1,2,3,4,7,8,9}
Schilling et al. JTB 203, 229
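The zero-set test of step (4) could be written as below. This is a minimal sketch on a hypothetical three-row tableau in which the third row is the sum of the first two; `zero_set` and `drop_dependent_rows` are illustrative names. It applies the subset rule literally, so two identical rows would eliminate each other and would need separate handling.

```python
import numpy as np

def zero_set(row):
    """A(i): 1-based column indices at which the row is zero."""
    return {j for j, x in enumerate(row, start=1) if x == 0}

def drop_dependent_rows(T):
    """Eliminate row i if A(i) is a subset of A(h) for some other row h."""
    zsets = [zero_set(r) for r in T]
    keep = [i for i, Ai in enumerate(zsets)
            if not any(i != h and Ai <= Ah for h, Ah in enumerate(zsets))]
    return T[keep]

# Hypothetical tableau (assumption): row 3 = row 1 + row 2
T = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 1, 0]])
T_clean = drop_dependent_rows(T)
print(T_clean)
```

Row 3 has the zero set {4}, which is contained in the zero set {2, 4} of row 1, so row 3 is removed as a non-negative combination of the other rows.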
68
repeat steps for all internal metabolites
(5) With the formation of T(x) complete, repeat steps 2-4
for all of the metabolites that do not have an
unconstrained exchange flux operating on the
metabolite, incrementing x by one up to μ. The
final tableau will be T(μ). Note that the number
of rows in T(μ) will be equal to k, the number
of extreme pathways.
Schilling et al. JTB 203, 229
69
balance external fluxes
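(6) Next we append T(E) to the bottom of T(μ).
(In the example here μ = 1.) This results in the
following tableau:
Schilling et al. JTB 203, 229
T(1/E) (columns: v1 v2 v3 v4 v5 v6 | A B C D E)
1 0 0 0 0 0 | -1  1  0  0  0
0 1 1 0 0 0 |  0  0  0  0  0
0 1 0 1 0 0 |  0 -1  0  1  0
0 1 0 0 0 1 |  0 -1  0  0  1
0 0 1 0 1 0 |  0  1  0 -1  0
0 0 0 1 1 0 |  0  0  0  0  0
0 0 0 0 1 1 |  0  0  0 -1  1
rows of T(E), one per exchange flux:
b1          | -1  0  0  0  0
b2          |  0 -1  0  0  0
b3          |  0  0  0 -1  0
b4          |  0  0  0  0 -1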
70
balance external fluxes
(7) Starting in the (n+1)-th column (or the first
non-zero column on the right side), if Ti,(n+1)
≠ 0 then add the corresponding non-zero row from
T(E) to row i so as to produce 0 in the (n+1)-th
column. This is done by simply multiplying the
corresponding row in T(E) by Ti,(n+1) and adding
this row to row i. Repeat this procedure for
each of the rows in the upper portion of the
tableau so as to create zeros in the entire upper
portion of the (n+1)-th column. When finished,
remove the row in T(E) corresponding to the
exchange flux for the metabolite just
balanced.
Schilling et al. JTB 203, 229
71
balance external fluxes
(8) Follow the same procedure as in step (7) for
each of the columns on the right side of the
tableau containing non-zero entries. (In this
example we need to perform step (7) for every
column except the middle column of the right side,
which corresponds to metabolite C.) The final
tableau T(final) will contain the transpose of
the matrix P containing the extreme pathways in
place of the original identity matrix.
Schilling et al. JTB 203, 229
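Steps (7) and (8) can be sketched for the example system. The upper rows are the T(1) rows of the example; the explicit bookkeeping columns for b1-b4 and the layout v | b | metabolites are assumptions made so that the exchange fluxes show up in the left part of the final tableau.

```python
import numpy as np

# Rows of T(1): columns v1..v6 | A B C D E (from the example)
T1 = np.array([
    [1, 0, 0, 0, 0, 0, -1,  1, 0,  0, 0],
    [0, 1, 1, 0, 0, 0,  0,  0, 0,  0, 0],
    [0, 1, 0, 1, 0, 0,  0, -1, 0,  1, 0],
    [0, 1, 0, 0, 0, 1,  0, -1, 0,  0, 1],
    [0, 0, 1, 0, 1, 0,  0,  1, 0, -1, 0],
    [0, 0, 0, 1, 1, 0,  0,  0, 0,  0, 0],
    [0, 0, 0, 0, 1, 1,  0,  0, 0, -1, 1],
])
n_int, n_ex, n_met = 6, 4, 5
exchange_cols = [0, 1, 3, 4]      # b1..b4 act on metabolites A, B, D, E

# Extended layout (assumption): v1..v6 | b1..b4 | A..E
upper = np.hstack([T1[:, :n_int],
                   np.zeros((T1.shape[0], n_ex), dtype=int),
                   T1[:, n_int:]])
TE = np.zeros((n_ex, n_int + n_ex + n_met), dtype=int)
for j, col in enumerate(exchange_cols):
    TE[j, n_int + j] = 1                 # bookkeeping entry for b_j
    TE[j, n_int + n_ex + col] = -1       # exchange fluxes point outwards

# Step 7/8: add T[i, c] times the matching T(E) row to every upper row i
for j, col in enumerate(exchange_cols):
    c = n_int + n_ex + col
    upper = upper + np.outer(upper[:, c], TE[j])

PT = upper[:, :n_int + n_ex]             # transpose of the pathway matrix P
print(PT)
```

After the loop the metabolite columns of the upper block are all zero, and the left part reproduces the rows p1, p7, p3, p2, p4, p6, p5 of the pathway matrix.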
72
pathway matrix
T(final) = P^T (the right part of the tableau is now all zeros)
Schilling et al. JTB 203, 229
     v1 v2 v3 v4 v5 v6 b1 b2 b3 b4
p1    1  0  0  0  0  0 -1  1  0  0
p7    0  1  1  0  0  0  0  0  0  0
p3    0  1  0  1  0  0  0 -1  1  0
p2    0  1  0  0  0  1  0 -1  0  1
p4    0  0  1  0  1  0  0  1 -1  0
p6    0  0  0  1  1  0  0  0  0  0
p5    0  0  0  0  1  1  0  0 -1  1
73
Extreme Pathways for model system
2 pathways p6 and p7 are not shown (right below)
because all exchange fluxes with the exterior
are 0. Such pathways have no net overall effect
on the functional capabilities of the
network. They belong to the cycling of reactions
v4/v5 and v2/v3.
Schilling et al. JTB 203, 229
     v1 v2 v3 v4 v5 v6 b1 b2 b3 b4
p1    1  0  0  0  0  0 -1  1  0  0
p7    0  1  1  0  0  0  0  0  0  0
p3    0  1  0  1  0  0  0 -1  1  0
p2    0  1  0  0  0  1  0 -1  0  1
p4    0  0  1  0  1  0  0  1 -1  0
p6    0  0  0  1  1  0  0  0  0  0
p5    0  0  0  0  1  1  0  0 -1  1
74
How reactions appear in pathway matrix
In the matrix P of extreme pathways, each column
is an EP and each row corresponds to a reaction
in the network. The numerical value of the i,j-th
element corresponds to the relative flux level
through the i-th reaction in the j-th EP.
Papin, Price, Palsson, Genome Res. 12, 1889
(2002)
75
Properties of pathway matrix
A symmetric Pathway Length Matrix PLM can be
calculated (PLM = P^T · P, with P in binary form),
where the values along the diagonal
correspond to the length of the EPs.
The off-diagonal terms of PLM are the number of
reactions that a pair of extreme pathways have in
common.
Papin, Price, Palsson, Genome Res. 12, 1889 (2002)
76
Properties of pathway matrix
One can also compute a reaction participation
matrix PPM from P (PPM = P · P^T, with P in binary
form), where the diagonal elements
correspond to the number of pathways in which the
given reaction participates.
Papin, Price, Palsson, Genome Res. 12, 1889 (2002)
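Both matrices can be sketched for the example system. The rows of `PT` are the extreme pathways of the model system as listed for T(final); taking the binary form of P before multiplying is an assumption needed so that the products count reactions rather than sum flux values.

```python
import numpy as np

# P^T of the example system: rows p1, p7, p3, p2, p4, p6, p5; columns v1..v6, b1..b4
PT = np.array([
    [1, 0, 0, 0, 0, 0, -1,  1,  0, 0],
    [0, 1, 1, 0, 0, 0,  0,  0,  0, 0],
    [0, 1, 0, 1, 0, 0,  0, -1,  1, 0],
    [0, 1, 0, 0, 0, 1,  0, -1,  0, 1],
    [0, 0, 1, 0, 1, 0,  0,  1, -1, 0],
    [0, 0, 0, 1, 1, 0,  0,  0,  0, 0],
    [0, 0, 0, 0, 1, 1,  0,  0, -1, 1],
])
P = PT.T                          # rows = reactions, columns = extreme pathways
Pb = (P != 0).astype(int)         # binary form: does reaction i take part in pathway j?

PLM = Pb.T @ Pb                   # pathway length matrix (pathways x pathways)
PPM = Pb @ Pb.T                   # reaction participation matrix (reactions x reactions)

print(np.diag(PLM))               # lengths of the extreme pathways
print(np.diag(PPM))               # number of pathways each reaction participates in
```

For instance, p1 uses v1, b1 and b2, so its length is 3, while the internal cycles p6 and p7 have length 2.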
77
Extreme Pathway Analysis
Calculation of EPs for increasingly large
networks is computationally intensive and results
in the generation of large data sets. Even for
integrated genome-scale models for microbes under
simple conditions, EP analysis can generate
thousands of vectors! Interpretation: the
metabolic network of H. influenzae has an order of
magnitude larger degree of pathway redundancy
than the metabolic network of H. pylori. Found
elsewhere: the number of reactions that
participate in EPs that produce a particular
product is poorly correlated to the product yield
and the molecular complexity of the
product. Possible way out?
Papin, Price, Palsson, Genome Res. 12, 1889 (2002)
78
Diagonalisation of pathway matrix?
http://mathworld.wolfram.com
79
Singular Value Decomposition of EP matrices
For a given EP matrix P ∈ ℝ^(n×p), SVD decomposes P
into 3 matrices
P = U · Σ · V^T
where U ∈ ℝ^(n×n) is an orthonormal matrix of the
left singular vectors, V ∈ ℝ^(p×p) is an analogous
orthonormal matrix of the right singular vectors,
and Σ is a diagonal matrix whose leading r × r
block contains the
singular values σi (i = 1..r) arranged in descending
order, where r is the rank of P. The first r
columns of U and V, referred to as the left and
right singular vectors, or modes, are unique and
form the orthonormal basis for the column space
and row space of P. The singular values are the
square roots of the eigenvalues of P^T P. The
magnitude of the singular values in Σ indicates
the relative contribution of the singular vectors
in U and V in reconstructing P. E.g. the second
singular value contributes less to the
construction of P than the first singular value
etc.
Price et al. Biophys J 84, 794 (2003)
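The decomposition and the relation between singular values and the eigenvalues of P^T P can be checked numerically. The sketch below uses the small pathway matrix of the model system as input, which is an illustrative choice; the genome-scale matrices analysed by Price et al. are far larger.

```python
import numpy as np

# P^T of the example system (rows = extreme pathways, columns v1..v6, b1..b4)
PT = np.array([
    [1, 0, 0, 0, 0, 0, -1,  1,  0, 0],
    [0, 1, 1, 0, 0, 0,  0,  0,  0, 0],
    [0, 1, 0, 1, 0, 0,  0, -1,  1, 0],
    [0, 1, 0, 0, 0, 1,  0, -1,  0, 1],
    [0, 0, 1, 0, 1, 0,  0,  1, -1, 0],
    [0, 0, 0, 1, 1, 0,  0,  0,  0, 0],
    [0, 0, 0, 0, 1, 1,  0,  0, -1, 1],
], dtype=float)
P = PT.T                                              # n reactions x p pathways

# Economy-size SVD: U is n x p, s holds the singular values in descending order
U, s, Vt = np.linalg.svd(P, full_matrices=False)

P_rec = U @ np.diag(s) @ Vt                           # reconstruction of P
eig = np.sort(np.linalg.eigvalsh(P.T @ P))[::-1]      # eigenvalues of P^T P, descending

print(s)
print(np.sqrt(np.clip(eig, 0, None)))                 # should agree with s
```

The first column of U is the first mode; truncating the sum after a few singular values gives a low-rank approximation of P.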
80
Singular Value Decomposition of EP: Interpretation
The first mode (as the other modes) corresponds
to a valid biochemical pathway through the
network. The first mode will point into the
portions of the cone with highest density of EPs.
Price et al. Biophys J 84, 794 (2003)
81
SVD applied to the H. influenzae and H. pylori systems
Cumulative fractional contributions for the
singular value decomposition of the EP matrices
of H. influenzae and H. pylori. This plot
represents the contribution of the first n modes
to the overall description of the system.
Price et al. Biophys J 84, 794 (2003)
82
Summary
Extreme pathway analysis provides a
mathematically rigorous way to dissect complex
biochemical networks. The matrix products P^T · P
and P · P^T are useful ways to interpret pathway
lengths and reaction participation. However, the
number of computed vectors may range in the
thousands. Therefore, meta-methods (e.g.
singular value decomposition) are required that
reduce the dimensionality to a useful number that
can be inspected by humans. Singular value
decomposition may be one useful method ... and
there are more to come.
Price et al. Biophys J 84, 794 (2003)
83
V21
84
Metabolic Pathway Analysis: Elementary Modes
The technique of Elementary Flux Modes (EFM) was
developed prior to extreme pathways (EP) by
Stephan Schuster, Thomas Dandekar and
co-workers: Pfeiffer et al. Bioinformatics 15,
251 (1999); Schuster et al. Nature Biotech. 18,
326 (2000). The method is very similar to the
extreme pathway method to construct a basis for
metabolic flux states based on methods from
convex algebra. Extreme pathways are a subset of
elementary modes, and for many systems, both
methods coincide. Are the subtle differences
important?
85
Review Metabolite Balancing
For analyzing a biochemical network, its
structure is expressed by the stoichiometric matrix
S consisting of m rows corresponding to the
substances (metabolites) and n columns corresponding
to the reactions; the entries are the stoichiometric
coefficients of the
metabolites in each reaction. A vector v denotes
the reaction rates (mmol / (g dry weight · hour)) and
a vector c describes the metabolite
concentrations. Due to the high turnover of
metabolite pools one often assumes pseudo-steady
state (c(t) = constant), leading to the
fundamental Metabolic Balancing
Equation
0 = S · v   (1)
Flux distributions v satisfying
this relationship lie in the null space of S and
are able to balance all metabolites.
Klamt et al. Bioinformatics 19, 261 (2003)
86
Review Metabolic flux analysis
Metabolic flux analysis (MFA) determines
preferably all components of the
flux distribution v in a metabolic network during
a certain stationary growth experiment. Typically
some measured or known rates must be provided to
calculate unknown rates. Accordingly, v and S are
partitioned into the known (vb, Sb) and unknown
part (va, Sa). (1) leads to the central
equation for MFA describing a flux scenario:
0 = S · v = Sa · va + Sb · vb.
The rank of Sa
determines whether this scenario is redundant
and/or underdetermined. Redundant systems can be
checked for inconsistencies. In underdetermined
scenarios, only some elements of va are uniquely
calculable.
Klamt et al. Bioinformatics 19, 261
(2003)
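A flux scenario of this kind can be sketched numerically. The two-metabolite chain and the measured value v1 = 5 are hypothetical; the classification follows the usual convention that the scenario is underdetermined if rank(Sa) is smaller than the number of unknown rates and redundant if rank(Sa) is smaller than the number of metabolites.

```python
import numpy as np

# Hypothetical chain (assumption): v1: -> A, v2: A -> B, v3: B ->
S = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0]])
known, unknown = [0], [1, 2]
vb = np.array([5.0])                    # measured rate of v1 (made-up value)

Sa, Sb = S[:, unknown], S[:, known]
rank = np.linalg.matrix_rank(Sa)
underdetermined = rank < Sa.shape[1]    # fewer independent equations than unknowns
redundant = rank < Sa.shape[0]          # more balances than needed: consistency check possible

# 0 = Sa va + Sb vb  =>  Sa va = -Sb vb, solved in the least-squares sense
va, *_ = np.linalg.lstsq(Sa, -Sb @ vb, rcond=None)
print(rank, underdetermined, redundant)
print(va)
```

Here Sa is square with full rank, so the scenario is neither redundant nor underdetermined and both unknown rates follow uniquely from the single measurement.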
87
Review structural network analysis (SNA)
Whereas MFA focuses on a single flux
distribution, techniques of Structural
(Stoichiometric, Topological) Network Analysis
(SNA) address general topological properties,
overall capabilities, and the inherent pathway
structure of a metabolic network. Basic
topological properties are, e.g., conserved
moieties. Flux Balance Analysis (FBA) searches
for single optimal flux distributions (mostly
with respect to the synthesis of biomass)
fulfilling S · v = 0 and additionally
reversibility and capacity restrictions for each
reaction (αi ≤ vi ≤ βi).
Klamt et al.
Bioinformatics 19, 261 (2003)
88
Review Metabolic Pathway Analysis (MPA)
Metabolic Pathway Analysis searches for
meaningful structural and functional units in
metabolic networks. The most promising, very
similar approaches are based on convex analysis
and use the sets of elementary flux modes
(Schuster et al. 1999, 2000) and extreme pathways
(Schilling et al. 2000). Both sets span the
space of feasible steady-state flux distributions
by non-decomposable routes, i.e. no subset of
reactions involved in an EFM or EP can hold the
network balanced using non-trivial fluxes. MPA
can be used to study e.g.
- routing flexibility/redundancy of networks
- functionality of networks
- identification of futile cycles
- all (sub)optimal pathways
with respect to product/biomass yield
- calculability studies in MFA
Klamt et al. Bioinformatics 19, 261 (2003)
89
Elementary Flux Modes
Start from a list of reaction equations and a
declaration of reversible and irreversible
reactions and of internal and external
metabolites. E.g. reaction scheme of
monosaccharide metabolism (Fig. 1). It includes
15 internal metabolites and 19 reactions. → S
has dimension 15 × 19. It is convenient to
reduce this matrix by lumping those reactions
that necessarily operate together:
→ {Gap,Pgk,Gpm,Eno,Pyk}, → {Zwf,Pgl,Gnd}. Such
groups of enzymes can be detected
automatically. This reveals another two sequences:
{Fba,TpiA} and {2 Rpe,TktI,Tal,TktII}.
Schuster et al. Nature Biotech 18, 326 (2000)
90
Elementary Flux Modes
Lumping the reactions in any one sequence gives
the following reduced system. Construct the initial
tableau by combining the transpose of S with the
identity matrix:
T(0) (left part: 9 × 9 identity matrix, one row per reaction; right part shown below)
                              Ru5P FP2 F6P GAP R5P
reversible
row1  Pgi                        0   0   1   0   0
row2  {Fba,TpiA}                 0  -1   0   2   0
row3  Rpi                       -1   0   0   0   1
row4  {2Rpe,TktI,Tal,TktII}     -2   0   2   1  -1
irreversible
row5  {Gap,Pgk,Gpm,Eno,Pyk}      0   0   0  -1   0
row6  {Zwf,Pgl,Gnd}              1   0   0   0   0
row7  Pfk                        0   1  -1   0   0
row8  Fbp                        0  -1   1   0   0
row9  Prs_DeoB                   0   0   0   0  -1
Schuster et al. Nature Biotech 18, 326 (2000)
91
Elementary Flux Modes
Aim again: bring all entries of the right part of
the matrix to 0. E.g. 2·row3 - row4 gives a
reversible row with 0 in column 10. New
irreversible rows with 0 entry in column 10 are
obtained by row3 + row6 and by row4 + 2·row6. In general,
linear combinations of 2 rows corresponding to
the same type of directionality go into the
part of the respective type in the tableau.
Combinations by different types go into the
irreversible tableau because at least 1
reaction is irreversible. Irreversible
reactions can only be combined using positive
coefficients.
T(0) (rows of the identity part | Ru5P FP2 F6P GAP R5P)
rev  row1             |  0  0  1  0  0
rev  row2             |  0 -1  0  2  0
rev  row3             | -1  0  0  0  1
rev  row4             | -2  0  2  1 -1
irr  row5             |  0  0  0 -1  0
irr  row6             |  1  0  0  0  0
irr  row7             |  0  1 -1  0  0
irr  row8             |  0 -1  1  0  0
irr  row9             |  0  0  0  0 -1
T(1) (rows written as combinations of T(0) rows | Ru5P FP2 F6P GAP R5P)
rev  row1             |  0  0  1  0  0
rev  row2             |  0 -1  0  2  0
rev  2·row3 - row4    |  0  0 -2 -1  3
irr  row5             |  0  0  0 -1  0
irr  row7             |  0  1 -1  0  0
irr  row8             |  0 -1  1  0  0
irr  row9             |  0  0  0  0 -1
irr  row3 + row6      |  0  0  0  0  1
irr  row4 + 2·row6    |  0  0  2  1 -1
Schuster et al. Nature Biotech 18, 326 (2000)
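The directionality rule for combining two rows can be sketched as follows. The right-hand part is the reduced monosaccharide system from the initial tableau; `combine` is an illustrative helper and only covers the pairwise combination, not the elementarity and duplicate checks of the full algorithm.

```python
import numpy as np

# Right part of T(0): rows = reactions, columns = Ru5P FP2 F6P GAP R5P
right = np.array([
    [ 0,  0,  1,  0,  0],   # row1 Pgi (rev)
    [ 0, -1,  0,  2,  0],   # row2 {Fba,TpiA} (rev)
    [-1,  0,  0,  0,  1],   # row3 Rpi (rev)
    [-2,  0,  2,  1, -1],   # row4 {2Rpe,TktI,Tal,TktII} (rev)
    [ 0,  0,  0, -1,  0],   # row5 {Gap,...,Pyk} (irr)
    [ 1,  0,  0,  0,  0],   # row6 {Zwf,Pgl,Gnd} (irr)
    [ 0,  1, -1,  0,  0],   # row7 Pfk (irr)
    [ 0, -1,  1,  0,  0],   # row8 Fbp (irr)
    [ 0,  0,  0,  0, -1],   # row9 Prs_DeoB (irr)
])
T0 = np.hstack([np.eye(9, dtype=int), right])

def combine(a, a_rev, b, b_rev, col):
    """Combine rows a and b so that column `col` becomes 0.

    Irreversible rows may only get positive coefficients; a reversible row
    may be flipped. Returns (new_row, is_reversible) or None if not allowed.
    """
    x, y = a[col], b[col]
    if x == 0 or y == 0:
        return None
    if x * y < 0:
        ca, cb = abs(y), abs(x)
    elif b_rev:
        ca, cb = abs(y), -abs(x)      # flip the reversible row b
    elif a_rev:
        ca, cb = -abs(y), abs(x)      # flip the reversible row a
    else:
        return None                   # two irreversible rows with the same sign
    return ca * a + cb * b, (a_rev and b_rev)

col = 9                               # 0-based index of column 10 (Ru5P)
rev_row, rev_flag = combine(T0[2], True, T0[3], True, col)     # 2*row3 - row4
irr_row, irr_flag = combine(T0[2], True, T0[5], False, col)    # row3 + row6
print(rev_row[9:], rev_flag)
print(irr_row[9:], irr_flag)
```

The first combination stays in the reversible part of the tableau, the second goes to the irreversible part because row6 is irreversible.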
92
Elementary Flux Modes
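Aim: zero column 11. Include all possible
(direction-wise allowed) linear combinations of
rows. Continue with columns 12-14.
T(1) (rows written as combinations of T(0) rows | Ru5P FP2 F6P GAP R5P)
rev  row1             |  0  0  1  0  0
rev  row2             |  0 -1  0  2  0
rev  2·row3 - row4    |  0  0 -2 -1  3
irr  row5             |  0  0  0 -1  0
irr  row7             |  0  1 -1  0  0
irr  row8             |  0 -1  1  0  0
irr  row9             |  0  0  0  0 -1
irr  row3 + row6      |  0  0  0  0  1
irr  row4 + 2·row6    |  0  0  2  1 -1
T(2)
rev  row1             |  0  0  1  0  0
rev  2·row3 - row4    |  0  0 -2 -1  3
irr  row5             |  0  0  0 -1  0
irr  row9             |  0  0  0  0 -1
irr  row3 + row6      |  0  0  0  0  1
irr  row4 + 2·row6    |  0  0  2  1 -1
irr  row2 + row7      |  0  0 -1  2  0
irr  -row2 + row8     |  0  0  1 -2  0
irr  row7 + row8      |  0  0  0  0  0
Schuster et al. Nature Biotech 18, 326 (2000)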
93
Elementary Flux Modes
In the course of the algorithm, one must avoid
- calculation of nonelementary modes (rows that
contain fewer zeros than the row already
present)
- duplicate modes (a pair of rows is
only combined if it fulfills the condition
S(mi(j)) ∩ S(mk(j)) ⊄ S(ml(j+1)), where
S(ml(j+1)) is the set of positions of 0 in this
row)
- flux modes violating the sign restriction
for the irreversible reactions.
Final tableau T(5) (left part; the right part is all zeros):
 Pgi {Fba,TpiA} Rpi {2Rpe,..} {Gap,..} {Zwf,..} Pfk Fbp Prs_DeoB
   1      1      0      0        2        0      1   0     0
  -2      0      1      1        1        3      0   0     0
   0      2      1      1        5        3      2   0     0
   0      0      1      0        0        1      0   0     1
   5      1      4     -2        0        0      1   0     6
  -5     -1      2      2        0        6      0   1     0
   0      0      0      0        0        0      1   1     0
This shows that the
number of rows may decrease or increase in the
course of the algorithm. All constructed
elementary modes are irreversible.
Schuster et al. Nature Biotech 18, 326 (2000)
94
Two approaches for Metabolic Pathway Analysis?
The pathway P(e) is an elementary flux mode if it
fulfills conditions C1-C3.
(C1) Pseudo steady-state: S · e = 0. This ensures that none
of the metabolites is consumed or produced in the
overall stoichiometry.
(C2) Feasibility: rate ei
≥ 0 if reaction i is irreversible. This demands
that only thermodynamically realizable fluxes are
contained in e.
(C3) Non-decomposability: there
is no vector v (unequal to the zero vector and to
e) fulfilling C1 and C2 such that P(v) is a proper
subset of P(e). This is the core characteristic
of EFMs and EPs and supplies the decomposition
of the network into smallest units (able to hold
the network in steady state). C3 is often called
genetic independence because it implies that
the enzymes in one EFM or EP are not a subset of
the enzymes from another EFM or EP.
Klamt & Stelling Trends Biotech 21, 64 (2003)
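Conditions C1 and C2 are easy to test for a given flux vector. The sketch below applies them to the seven modes of the reduced monosaccharide system (reaction order Pgi, {Fba,TpiA}, Rpi, {2Rpe,..}, {Gap,..}, {Zwf,..}, Pfk, Fbp, Prs_DeoB). The C3 test shown here is only a pairwise support comparison within the given set, which is a simplifying assumption and weaker than the full non-decomposability condition.

```python
import numpy as np

# S^T of the reduced system: rows = reactions, columns = Ru5P FP2 F6P GAP R5P
ST = np.array([
    [ 0,  0,  1,  0,  0],   # Pgi (rev)
    [ 0, -1,  0,  2,  0],   # {Fba,TpiA} (rev)
    [-1,  0,  0,  0,  1],   # Rpi (rev)
    [-2,  0,  2,  1, -1],   # {2Rpe,TktI,Tal,TktII} (rev)
    [ 0,  0,  0, -1,  0],   # {Gap,Pgk,Gpm,Eno,Pyk} (irr)
    [ 1,  0,  0,  0,  0],   # {Zwf,Pgl,Gnd} (irr)
    [ 0,  1, -1,  0,  0],   # Pfk (irr)
    [ 0, -1,  1,  0,  0],   # Fbp (irr)
    [ 0,  0,  0,  0, -1],   # Prs_DeoB (irr)
])
S = ST.T
irreversible = np.array([False] * 4 + [True] * 5)

# Rows of the final tableau of the example (one candidate mode per row)
modes = np.array([
    [ 1,  1, 0,  0, 2, 0, 1, 0, 0],
    [-2,  0, 1,  1, 1, 3, 0, 0, 0],
    [ 0,  2, 1,  1, 5, 3, 2, 0, 0],
    [ 0,  0, 1,  0, 0, 1, 0, 0, 1],
    [ 5,  1, 4, -2, 0, 0, 1, 0, 6],
    [-5, -1, 2,  2, 0, 6, 0, 1, 0],
    [ 0,  0, 0,  0, 0, 0, 1, 1, 0],
])

def c1(e):
    """Pseudo steady state: S e = 0."""
    return not np.any(S @ e)

def c2(e):
    """Feasibility: irreversible reactions carry non-negative flux."""
    return bool(np.all(e[irreversible] >= 0))

def support(e):
    return set(np.flatnonzero(e))

def c3_within(modes, k):
    """Pairwise check only: no other listed mode uses a proper subset of the reactions."""
    sk = support(modes[k])
    return not any(support(m) < sk for i, m in enumerate(modes) if i != k)

for k, e in enumerate(modes):
    print(k + 1, c1(e), c2(e), c3_within(modes, k))
```

All seven rows pass the three checks, so each of them balances the internal metabolites without violating an irreversibility constraint.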
95
Two approaches for Metabolic Pathway Analysis?
The pathway P(e) is an extreme pathway if it
fulfills conditions C1-C3 AND conditions C4-C5.
(C4) Network reconfiguration: Each reaction
must be classified either as exchange flux or as
internal reaction. All reversible internal
reactions must be split up into two separate,
irreversible reactions (forward and backward
reaction).
(C5) Systemic independence: the set
of EPs in a network is the minimal set of EFMs
that can describe all feasible steady-state flux
distributions.
Klamt & Stelling Trends
Biotech 21, 64 (2003)
96
Two approaches for Metabolic Pathway Analysis?
[Figure: example network with external metabolites A(ext), B(ext), C(ext), internal metabolites A, B, C, D, P and reactions R1-R9]
Klamt & Stelling Trends Biotech 21, 64 (2003)
97
Reconfigured Network
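[Figure: reconfigured network with external metabolites A(ext), B(ext), C(ext), internal metabolites A, B, C, D, P and reactions R1-R6, R7f, R7b, R8, R9]
3 EFMs are not systemically independent:
EFM1 = EP4 + EP5
EFM2 = EP3 + EP5
EFM4 = EP2 + EP3
Klamt & Stelling Trends Biotech 21, 64 (2003)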
98
Property 1 of EFMs
The only difference in the set of EFMs emerging
upon reconfiguration consists in the two-cycles
that result from splitting up reversible
reactions. However, two-cycles are not considered
as meaningful pathways. Valid for any network:
Property 1: Reconfiguring a network by splitting
up reversible reactions leads to the same set of
meaningful EFMs.
Klamt & Stelling Trends Biotech 21, 64 (2003)
99
Software FluxAnalyzer
What is the consequence when all exchange
fluxes (and hence all reactions in the network)
are irreversible?
EFMs and EPs always coincide!
Klamt & Stelling Trends Biotech 21, 64 (2003)
100
Property 2 of EFMs
Property 2: If all exchange reactions in a network
are irreversible then the sets of meaningful EFMs
(both in the original and in the reconfigured
network) and EPs coincide.
Klamt & Stelling Trends Biotech 21, 64 (2003)
101
Reconfigured Network
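[Figure: reconfigured network with external metabolites A(ext), B(ext), C(ext), internal metabolites A, B, C, D, P and reactions R1-R6, R7f, R7b, R8, R9]
3 EFMs are not systemically independent:
EFM1 = EP4 + EP5
EFM2 = EP3 + EP5
EFM4 = EP2 + EP3
Klamt & Stelling Trends Biotech 21, 64 (2003)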
102
Comparison of EFMs and EPs
Problem: Recognition of operational modes: routes
for converting exclusively A to P.
EFM (network N1): 4 genetically independent
routes (EFM1-EFM4).
EP (network N2): Set of EPs does not contain
all genetically independent routes. Searching for
EPs leading from A to P via B, no pathway would be
found.
Interpret the property of the EFM and EP
network to recognize operational modes, finding
all the routes, ... In the exam I would give you
the desired property and ask you for the
interpretation.
Klamt & Stelling Trends Biotech 21, 64 (2003)
103
Comparison of EFMs and EPs
Problem: Finding all the optimal routes: optimal
pathways for synthesizing P during growth on A alone.
EFM (network N1): EFM1 and EFM2 are optimal
because they yield one mole P per mole substrate
A (i.e. R3/R1 = 1), whereas EFM3 and EFM4 are only
suboptimal (R3/R1 = 0.5).
EP (network N2): One would only find the
suboptimal EP1, not the optimal routes
EFM1 and EFM2.
Klamt & Stelling Trends Biotech 21, 64 (2003)
104
Comparison of EFMs and EPs
EFM (network N1): 4 pathways convert A to P
(EFM1-EFM4), whereas for B only one route (EFM8)
exists. When one of the internal reactions
(R4-R9) fails, for production of P from A 2
pathways will always survive. By contrast,
removing reaction R8 already stops the production
of P from B alone.
EP (network N2): Only 1 EP exists for producing
P by substrate A alone, and 1 EP for synthesizing
P by (only) substrate B. One might suggest that
both substrates possess the same redundancy of
pathways, but as shown by EFM analysis, growth on
substrate A is much more flexible than on B.
Problem: Analysis of network flexibility
(structural robustness, redundancy): relative
robustness of exclusive growth on A or B.
Klamt & Stelling Trends Biotech 21, 64 (2003)
105
Comparison of EFMs and EPs
EFM (network N1): R8 is essential for producing P
by substrate B, whereas for A there is no
structurally favored reaction (R4-R9 all occur
twice in EFM1-EFM4). However, considering the
optimal modes EFM1, EFM2, one recognizes the
importance of R8 also for growth on A.
EP (network N2): Consider again biosynthesis of
P from substrate A (EP1 only). Because R8 is not
involved in EP1 one might think that this
reaction is not important for synthesizing P from
A. However, without this reaction, it is
impossible to obtain optimal yields (1 P per A:
EFM1 and EFM2).
Problem: Relative importance of single
reactions: relative importance of reaction R8.
Klamt & Stelling Trends Biotech 21, 64 (2003)
106
Comparison of EFMs and EPs
EFM (network N1): R6 and R9 are an enzyme subset.
By contrast, R6 and R9 never occur together with
R8 in an EFM. Thus (R6,R8) and (R8,R9) are
excluding reaction pairs. (In an arbitrary
composable steady-state flux distribution they
might occur together.)
EP (network N2): The EPs wrongly suggest R4 and R8 to be
an excluding reaction pair, but they are not
(EFM2). The enzyme subsets would be correctly
identified. However, one can construct simple
examples where the EPs would also suggest wrong
enzyme subsets (not shown).
Problem: Enzyme subsets and excluding reaction
pairs: suggest regulatory structures or rules.
Klamt & Stelling Trends Biotech 21, 64 (2003)
107
Comparison of EFMs and EPs
EFM (network N1): The shortest pathway from A to
P needs 2 internal reactions (EFM2), the longest
4 (EFM4).
EP (network N2): Both the shortest (EFM2) and
the longest (EFM4) pathway f