Title: Application of MCMC in the Bayesian Inference of Metabolic Pathways
1. Application of MCMC in the Bayesian Inference of Metabolic Pathways
2. Bayesian parameter estimation
We have incomplete kinetics/equations of the systems.
Using Bayesian inference, we estimate the parameters in a principled manner that incorporates these uncertainties and yields a credible interval for each estimate.
3. Posterior distribution
$p(V_{max} \mid z) = \frac{p(z \mid V_{max})\, p(V_{max})}{p(z)}$
MCMC is used to estimate this distribution.
4. Brief Summary of MCMC
Bayus, an economist who shares an ancestor with Bayes, postulates that the level of happiness $H$ is related to the income $I$ by a formula $H = h(I)$.
He was asked to estimate the average happiness of people in the USA given a density function $p(I)$ of income.
Using his theory, he only needs to compute
$E[H] = \int h(I)\, p(I)\, dI$.
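5. Metropolis-Hastings Algorithm
- Initialize $I_0$, set $t = 0$.
- For $t = 0, 1, 2, 3, \ldots$, do
  - Sample $y$ from a proposal distribution $q(\cdot \mid I_t)$.
  - Sample $u$ from a uniform distribution on $(0, 1)$.
  - If $u \le \alpha(I_t, y)$, where $\alpha(I_t, y) = \min\left\{1, \frac{p(y)\, q(I_t \mid y)}{p(I_t)\, q(y \mid I_t)}\right\}$, then set $I_{t+1} = y$; otherwise set $I_{t+1} = I_t$.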
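The steps above can be sketched in Python as a random-walk variant of Metropolis-Hastings. This is a minimal sketch under stated assumptions: the income density (log-normal with hypothetical parameters `MU` and `SIGMA`) and the happiness formula `h(I) = log(I)` are stand-ins, since the slide's own formulas are not given here, and all function names are illustrative. Because the Gaussian proposal is symmetric, the $q$ terms in $\alpha$ cancel.

```python
import numpy as np

# Hypothetical stand-ins: the slide's actual income density and happiness
# formula are not reproduced here, so we assume a log-normal income density
# and h(I) = log(I).
MU, SIGMA = 1.0, 0.5


def log_income_density(i):
    """Unnormalized log-density of the assumed log-normal income distribution."""
    if i <= 0:
        return -np.inf  # incomes must be positive
    return -np.log(i) - (np.log(i) - MU) ** 2 / (2 * SIGMA**2)


def happiness(i):
    """Hypothetical happiness formula H = h(I)."""
    return np.log(i)


def metropolis_hastings(log_p, i0, n_steps, step, rng):
    """Random-walk Metropolis-Hastings targeting exp(log_p).

    The Gaussian proposal q(. | I_t) is symmetric, so q(I_t | y) / q(y | I_t) = 1
    and the acceptance ratio reduces to p(y) / p(I_t).
    """
    chain = np.empty(n_steps)
    current, log_p_current = i0, log_p(i0)
    for t in range(n_steps):
        y = current + step * rng.normal()          # sample y ~ q(. | I_t)
        log_p_y = log_p(y)
        u = rng.uniform()                          # sample u ~ U(0, 1)
        if np.log(u) < min(0.0, log_p_y - log_p_current):
            current, log_p_current = y, log_p_y    # accept: I_{t+1} = y
        chain[t] = current                         # if rejected, I_{t+1} = I_t
    return chain


rng = np.random.default_rng(0)
estimates = []
for _ in range(3):  # three independent simulations
    chain = metropolis_hastings(log_income_density, 1.0, 20_000, 0.5, rng)
    estimates.append(np.mean(happiness(chain[2_000:])))  # drop burn-in, average h(I_t)
print(estimates)
```

Under these assumptions the exact answer is $E[\log I] = $ `MU` $= 1$, so each of the three runs should land close to 1; the first 2,000 draws of every run are discarded as burn-in.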
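6.
The happiness level is estimated using
$\hat{H} = \frac{1}{N} \sum_{t=1}^{N} h(I_t)$, where $I_1, \ldots, I_N$ are the Metropolis-Hastings samples.
Three simulations give the estimates 6.1345, 6.1315 and 6.1183. Hence, he inferred that people in the USA are, on average, happy.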
7. Common Approach of Parameter Estimation
Find parameter values that minimize an objective function, typically the magnitude of the estimation error, subject to some constraints.
This can give many different solutions.
Vmax of yeast glycolysis pathways from Pritchard and Kell (2002)
Vmax of yeast glycolysis pathways from Wilkinson, Benson, Kell (2007)
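A hypothetical one-metabolite example shows how minimizing an estimation error can give many different solutions. Assume a constant input flux `V_IN` consumed by a Michaelis-Menten reaction with parameters `vmax` and `km`; the steady state is $x_{ss} = K_m V_{in} / (V_{max} - V_{in})$, so a single measured steady state constrains only a curve of parameter pairs. All names and values are illustrative assumptions.

```python
import numpy as np

V_IN = 1.0  # hypothetical constant input flux


def steady_state(vmax, km):
    """x_ss of dx/dt = V_IN - vmax * x / (km + x); finite only when vmax > V_IN."""
    return km * V_IN / (vmax - V_IN) if vmax > V_IN else np.inf


def objective(params, z):
    """Squared estimation error between the model steady state and the datum z."""
    vmax, km = params
    return (steady_state(vmax, km) - z) ** 2


z = 2.0  # hypothetical measured steady-state concentration
# Every (vmax, km) with km = z * (vmax - V_IN) / V_IN reproduces z exactly.
candidates = [(vmax, z * (vmax - V_IN) / V_IN) for vmax in (1.5, 3.0, 10.0)]
for p in candidates:
    print(p, objective(p, z))
```

Each candidate attains zero error, so an optimizer started from different initial guesses can return any point on this curve.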
8. Which One Should Be Used?
Using three different parameter sets from these papers, the MCA control coefficients give conflicting results.
9. Metabolic Pathways
- Let the kinetic equations be given by $\dot{x} = f(x, V_{max})$, $x(0) = x_0$.
- Denote by $x_{ss}$ the steady-state value, which depends on $V_{max}$ and $x_0$.
- We can define a mapping $(V_{max}, x_0) \mapsto y_{ss}$, with the convention that $y_{ss} = \infty$ if no steady-state value exists for some $V_{max}$ and $x_0$.
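The mapping from $(V_{max}, x_0)$ to the steady state, including the $\infty$ convention, might be sketched as below for a hypothetical two-step pathway with Michaelis-Menten kinetics (`V_IN`, `KM` and the pathway itself are assumptions). Explicit Euler integration with a fixed step is used only to keep the sketch dependency-free; a stiff ODE solver would be a better choice for real glycolysis models.

```python
import numpy as np

# Hypothetical two-step pathway: constant input flux V_IN into x1,
# x1 -> x2 at rate v1(x1) and x2 -> out at rate v2(x2), with
# Michaelis-Menten rates v_k = Vmax_k * x / (KM + x).
V_IN, KM = 1.0, 1.0


def rhs(x, vmax):
    """Right-hand side f(x, Vmax) of dx/dt = f(x, Vmax)."""
    v = vmax * x / (KM + x)
    return np.array([V_IN - v[0], v[0] - v[1]])


def steady_state_map(vmax, x0, dt=0.05, t_max=2_000.0, tol=1e-8):
    """Map (Vmax, x0) to the steady state by integrating with explicit Euler
    until max |dx/dt| < tol.

    Following the slide's convention, returns inf when no steady state is
    reached within t_max (for this toy, none exists if some Vmax_k <= V_IN).
    """
    x = np.asarray(x0, dtype=float)
    vmax = np.asarray(vmax, dtype=float)
    for _ in range(int(t_max / dt)):
        dx = rhs(x, vmax)
        if np.max(np.abs(dx)) < tol:
            return x
        x = x + dt * dx
    return np.full_like(x, np.inf)


print(steady_state_map([2.0, 3.0], [0.1, 0.1]))  # converges near [1.0, 0.5]
print(steady_state_map([0.5, 3.0], [0.1, 0.1]))  # no steady state: [inf, inf]
```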
10. Experimental Data
- Assume that the measurement data are corrupted by Gaussian noise with zero mean and variance $\sigma^2$.
- Given a measured datum $z$, it is reasonable to assume that $z = y_{ss} + w$, where $w$ is the measurement noise.
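11. Conditional Distribution
- Now, given a measured datum $z$ and a prior distribution of $V_{max}$ and $x_0$, we are interested in the conditional distribution $p(V_{max}, x_0 \mid z) = \frac{p(z \mid V_{max}, x_0)\, p(V_{max}, x_0)}{p(z)}$.
- In Bayesian terms, the posterior distribution $p(V_{max} \mid z)$ has the likelihood function $p(z \mid V_{max}, x_0) \propto \exp\left(-\frac{\lVert z - y_{ss} \rVert^2}{2\sigma^2}\right)$, where $y_{ss}$ is the steady state obtained from $V_{max}$ and $x_0$.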
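Under the Gaussian noise assumption, the log-likelihood can be evaluated as in this sketch. It uses a hypothetical toy model, a reversible conversion A <-> B with first-order rate constants standing in for $V_{max}$, chosen because its conserved total makes the steady state depend on $x_0$ as well as on the rates. Returning $-\infty$ implements the convention that parameters without a steady state have zero likelihood; all names are illustrative.

```python
import numpy as np


def steady_state_map(v, x0):
    """Hypothetical toy mapping: reversible conversion A <-> B with first-order
    rates V[0] * x_A and V[1] * x_B. Because x_A + x_B is conserved, the steady
    state depends on both V and the initial state x0 (shape (2,) or (n, 2))."""
    v = np.asarray(v, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    if v[0] + v[1] <= 0:
        # No unique convergent steady state for this toy: y_ss = inf convention.
        return np.full(x0.shape, np.inf)
    total = x0.sum(axis=-1, keepdims=True)
    return total * np.array([v[1], v[0]]) / (v[0] + v[1])


def log_likelihood(z, v, x0, sigma):
    """log p(z | V, x0) under z = y_ss + w with w ~ N(0, sigma^2 I)."""
    y_ss = steady_state_map(v, x0)
    if not np.all(np.isfinite(y_ss)):
        return -np.inf  # y_ss = inf gives zero likelihood
    r = np.asarray(z, dtype=float) - y_ss
    return -0.5 * r.size * np.log(2 * np.pi * sigma**2) - 0.5 * np.sum(r**2) / sigma**2


print(log_likelihood([0.4, 0.6], [3.0, 2.0], [0.5, 0.5], 0.1))
```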
12. Glycolysis example
Consider again the glycolysis metabolic pathway.
Simulate the system with known parameters and evaluate 30 steady-state samples with different initial states and with measurement noise.
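Generating such a synthetic data set can be sketched with a hypothetical A <-> B toy model (first-order rates, conserved total): each of the 30 samples gets its own random initial state and independent Gaussian noise. `TRUE_V`, `SIGMA` and the sampling range of $x_0$ are illustrative assumptions, not the values used for the glycolysis example.

```python
import numpy as np

TRUE_V = np.array([3.0, 2.0])  # hypothetical "known" parameters
SIGMA = 0.02                   # hypothetical noise standard deviation


def steady_state_map(v, x0):
    """Hypothetical A <-> B toy: the steady state scales with the conserved total."""
    total = np.asarray(x0, dtype=float).sum(axis=-1, keepdims=True)
    return total * np.array([v[1], v[0]]) / (v[0] + v[1])


def simulate_data(n_samples, rng):
    """Return initial states x0_j and noisy steady-state data z_j = y_ss,j + w_j."""
    x0 = rng.uniform(0.5, 1.5, size=(n_samples, 2))      # a different initial state per sample
    y_ss = steady_state_map(TRUE_V, x0)                  # shape (n_samples, 2)
    z = y_ss + rng.normal(0.0, SIGMA, size=y_ss.shape)   # additive Gaussian measurement noise
    return x0, z


rng = np.random.default_rng(42)
x0, z = simulate_data(30, rng)
print(z[:3])
```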
13. Histogram of Posterior distribution
14. Median and Credible Interval
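Given MCMC draws, the posterior histogram, median and credible interval can be computed directly from the samples. This sketch assumes an equal-tailed 95% interval (other choices, such as highest-density intervals, are possible) and uses normally distributed stand-in draws in place of real MCMC output.

```python
import numpy as np


def summarize_posterior(samples, level=0.95):
    """Per-parameter median and equal-tailed credible interval.

    `samples` has shape (n_samples, n_params), e.g. MCMC draws after burn-in."""
    samples = np.asarray(samples, dtype=float)
    tail = (1.0 - level) / 2.0
    median = np.median(samples, axis=0)
    lower, upper = np.quantile(samples, [tail, 1.0 - tail], axis=0)
    return median, lower, upper


# Stand-in posterior draws (hypothetical): two independent normal parameters.
rng = np.random.default_rng(0)
draws = rng.normal(loc=[0.0, 5.0], scale=[1.0, 0.5], size=(200_000, 2))
median, lower, upper = summarize_posterior(draws)
counts, edges = np.histogram(draws[:, 0], bins=30)  # histogram of the first parameter
print(median, lower, upper)
```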
17. Perturbation Analysis
Figure panels: Perturbed, Normal
18. Comparing the posterior distributions
From both figures, it can be expected that
19. Further examples
- Five different scenarios are considered.
21. Simulation setup
- We use simulated data (30 steady-state samples for each case).
- We assume measurements of internal glucose, ATP, G6P, ADP, F6P, F16BP, AMP, DHAP, GAP, NAD, BPG, NADH, P3G, P2G, PEP, PYR and acetaldehyde, and of the glucose, glycerol, succinate, pyruvate, glycogen and trehalose fluxes.
22. Case A and Case B
23. Case A and Case C
24. Case A and Case D
25. Case A and Case E
26. Lactic Acid Bacteria
- Taken from Hoefnagel et al. (2002).
- It has 13 limiting-rate constants and 12 species.
- It contains experimental data from the normal case, the LDH knockout (reaction 2) and NOX overexpression (reaction 13). Reaction 10 is acetoin efflux.
27. Normal and LDH-knockout Case
28. Normal and NOX-overexpression Case
29. End
30. Markov Chain Monte Carlo analysis
- Given a probability density function $p$, it is generally difficult to compute $E_p[f(X)] = \int f(x)\, p(x)\, dx$.
- Markov chain Monte Carlo can be used to estimate this quantity.
- Monte Carlo integration draws $N$ samples $x_1, \ldots, x_N$ from $p$ and estimates $E_p[f(X)]$ by $\frac{1}{N} \sum_{i=1}^{N} f(x_i)$.
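A minimal Monte Carlo integration sketch, assuming independent draws from $p$ are available; the standard error it also returns is a common addition not mentioned on the slide. When such draws are not available, MCMC supplies correlated draws that are averaged in the same way.

```python
import numpy as np


def monte_carlo_expectation(f, draw, n, rng):
    """Estimate E_p[f(X)] by (1/N) * sum_i f(x_i) with x_i drawn from p.

    Also returns the Monte Carlo standard error of the estimate."""
    x = draw(rng, n)
    fx = f(x)
    return fx.mean(), fx.std(ddof=1) / np.sqrt(n)


rng = np.random.default_rng(0)
# Example: p = N(0, 1) and f(x) = x**2, whose exact expectation is 1.
estimate, std_err = monte_carlo_expectation(
    lambda x: x**2, lambda r, n: r.normal(size=n), 100_000, rng
)
print(estimate, std_err)
```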
31. Markov Chains
- Consider a sequence of random variables $X_0, X_1, X_2, \ldots$, where the next state $X_{t+1}$ is sampled from a distribution $P(X_{t+1} \mid X_t)$.
- Subject to regularity conditions, the distribution of $X_t$ given $X_0$, $P^{t}(X_t \mid X_0)$, will forget its initial state, and $P^{t}(\cdot \mid X_0)$ converges to a unique stationary distribution $p$.
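For a finite state space, this forgetting of the initial state can be checked numerically: row $s$ of the matrix power $P^t$ is $P^t(\cdot \mid X_0 = s)$, and for an irreducible, aperiodic chain all rows approach the same stationary distribution. The 3-state transition matrix below is a hypothetical example.

```python
import numpy as np

# Hypothetical 3-state transition matrix: row s is P(X_{t+1} = . | X_t = s).
P = np.array([[0.90, 0.075, 0.025],
              [0.15, 0.80, 0.05],
              [0.25, 0.25, 0.50]])

# Row s of P^t is the distribution P^t(. | X_0 = s).
P_t = np.linalg.matrix_power(P, 200)

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
stationary = stationary / stationary.sum()

print(P_t)         # all rows (initial states) agree
print(stationary)  # and equal the stationary distribution
```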
32.
- To ensure that the MCMC samples approximate the target distribution, the residual effect of the initial conditions should be reduced by discarding the first few samples. This is called burn-in.
- To get less correlated samples, a thinning procedure of size $M$ can be used, keeping only every $M$-th sample.
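Burn-in and thinning are simple slicing operations on the stored chain. In this sketch an autocorrelated AR(1) sequence stands in for MCMC output and is deliberately started far from its stationary mean; the burn-in length (500) and thinning size $M = 10$ are arbitrary illustrative values, and the function names are hypothetical.

```python
import numpy as np


def ar1_chain(n, phi, x0, rng):
    """Autocorrelated chain X_{t+1} = phi * X_t + noise (a stand-in for MCMC output)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


def burn_in_and_thin(chain, burn_in, thin):
    """Discard the first `burn_in` samples, then keep every `thin`-th sample (M = thin)."""
    return chain[burn_in::thin]


def lag1_autocorrelation(x):
    """Sample correlation between consecutive values of x."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)


rng = np.random.default_rng(0)
raw = ar1_chain(20_000, phi=0.9, x0=50.0, rng=rng)  # start far from the stationary mean 0
kept = burn_in_and_thin(raw, burn_in=500, thin=10)
print(len(kept), lag1_autocorrelation(raw), lag1_autocorrelation(kept))
```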
33. Convergence assessment
- One can run a single simulation with a very large number of MCMC samples to ensure convergence to the target distribution.
- Alternatively, several parallel simulations with different initial conditions are run, and a convergence measure is used to decide when to stop (see also Gilks et al. (1996) and Gelman (1996)).
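The slide does not say which convergence measure is used; one widely used choice is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variances. A minimal sketch, assuming equal-length chains stored as rows of an array:

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m chains of equal length n.

    `chains` has shape (m, n); values close to 1 suggest the chains agree."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)            # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n          # pooled estimate of the target variance
    return np.sqrt(var_hat / w)


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 5_000))                          # chains sampling the same distribution
stuck = mixed + np.array([0.0, 3.0, 6.0, 9.0])[:, None]      # chains stuck in different regions
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values well above 1 indicate that the parallel chains have not yet converged to the same distribution; the stand-in chains here are i.i.d. normal draws, shifted to imitate chains stuck in different regions.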
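34.
- The value $p(z)$ is a normalizing constant, which implies $p(V_{max}, x_0 \mid z) \propto p(z \mid V_{max}, x_0)\, p(V_{max}, x_0)$.
- The Metropolis-Hastings algorithm is used to draw samples from $p(V_{max}, x_0 \mid z)$. By marginalizing the sequences over $x_0$, we get samples from $p(V_{max} \mid z)$.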
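A sketch of this scheme on a hypothetical A <-> B toy model (first-order rates $V_1, V_2$, conserved total): the sampler works with the unnormalized log-posterior, so $p(z)$ is never computed, and marginalizing over $x_0$ amounts to discarding the $x_0$ columns of the chain. The uniform priors, step sizes, single datum `z` and noise level are assumptions for illustration.

```python
import numpy as np


def steady_state_map(v, x0):
    """Hypothetical A <-> B toy (first-order rates V[0], V[1]; conserved total)."""
    return np.sum(x0) * np.array([v[1], v[0]]) / (v[0] + v[1])


def log_post_unnormalized(theta, z, sigma):
    """log[p(z | V, x0) p(V, x0)] up to an additive constant; p(z) is not needed.

    theta = (V_1, V_2, x0_1, x0_2). Assumed priors: V_i ~ U(0, 10), x0_i ~ U(0, 2)."""
    v, x0 = theta[:2], theta[2:]
    if np.any(v <= 0) or np.any(v >= 10) or np.any(x0 <= 0) or np.any(x0 >= 2):
        return -np.inf  # outside the prior support
    r = z - steady_state_map(v, x0)
    return -0.5 * np.sum(r**2) / sigma**2


def sample_joint(z, sigma, theta0, step, n_iter, rng):
    """Random-walk Metropolis-Hastings on the joint (V, x0); p(z) cancels in the ratio."""
    chain = np.empty((n_iter, theta0.size))
    theta, lp = theta0.copy(), log_post_unnormalized(theta0, z, sigma)
    for t in range(n_iter):
        prop = theta + step * rng.normal(size=theta.size)
        lp_prop = log_post_unnormalized(prop, z, sigma)
        if np.log(rng.uniform()) < lp_prop - lp:  # symmetric proposal: q terms cancel
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain


rng = np.random.default_rng(1)
z = np.array([0.4, 0.6])  # hypothetical single datum
chain = sample_joint(z, 0.02, np.array([1.0, 1.0, 0.5, 0.5]),
                     np.array([0.3, 0.3, 0.03, 0.03]), 50_000, rng)
v_samples = chain[5_000:, :2]                     # marginalize over x0: drop the x0 columns
ratio = v_samples[:, 0] / v_samples.sum(axis=1)   # V_1 / (V_1 + V_2), identified by z
print(np.median(ratio))
```

With a single datum the overall scale of $V$ is not identified, but the ratio $V_1/(V_1+V_2)$ is, and for this toy its marginal posterior concentrates near $z_2/(z_1+z_2) = 0.6$.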
35. Perturbation Analysis
- Let $p(V_{max} \mid z_{\text{normal}})$ be the conditional distribution for the untreated (wild-type) organism.
- Let $p(V_{max} \mid z_{\text{perturbed}})$ be the conditional distribution for the drug-treated (mutant) organism.
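36.
- The perturbation effect is inferred by finding which $V_{max}$ values have changed from the normal case.
- One alternative is to calculate, for each $V_{max}$, a probability of change from the two conditional distributions.
- In most cases, it is difficult to evaluate this.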
37.
- Using MCMC, we can draw samples of $V_{max}$ from both cases and approximate this quantity by averaging an indicator function (equal to 1 when its condition holds and 0 otherwise) over the samples.
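The slides do not spell out the quantity being approximated. Assuming, for illustration, that it is $\Pr(V_{max,i}^{\text{perturbed}} > V_{max,i}^{\text{normal}})$ for each rate $i$, the indicator average over all pairs of normal and perturbed samples can be sketched as follows; the stand-in samples and the function name are hypothetical.

```python
import numpy as np


def prob_increase(samples_normal, samples_perturbed):
    """Estimate Pr(V_i^perturbed > V_i^normal) for each rate i.

    Averages the indicator 1(V_{k,i}^perturbed > V_{j,i}^normal) over all
    sample pairs (j, k); inputs have shape (n_samples, n_rates)."""
    a = np.asarray(samples_normal, dtype=float)[:, None, :]
    b = np.asarray(samples_perturbed, dtype=float)[None, :, :]
    return np.mean(b > a, axis=(0, 1))


rng = np.random.default_rng(0)
normal = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(2_000, 2))     # stand-in posterior draws
perturbed = rng.normal(loc=[2.0, 1.0], scale=0.1, size=(2_000, 2))  # rate 1 doubled, rate 2 unchanged
print(prob_increase(normal, perturbed))  # close to 1 for rate 1 and to 0.5 for rate 2
```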
38. MCA-based Perturbation Analysis
- The control coefficient $C^{A}_{i}$ describes a small increment in a variable $A$ (flux or concentration) due to a small increment in the i-th steady-state rate.
- Hence, the vector $C_i = (C^{A_1}_{i}, \ldots, C^{A_M}_{i})$ defines the direction in which all variables of interest move due to an increment in the i-th rate.
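A finite-difference sketch of the vector $C_i$, assuming the standard scaled MCA definition $C^{A}_{i} = \partial \ln A / \partial \ln v_i$ and a hypothetical two-step pathway with fixed input flux, in which the rate $v_i$ is proportional to $V_{max,i}$ so that only the concentrations respond. The pathway and function names are illustrative.

```python
import numpy as np

V_IN, KM = 1.0, 1.0  # hypothetical pathway constants


def steady_state(vmax):
    """Closed-form steady state (x1, x2) of the hypothetical pathway
    V_IN -> x1 -> x2 -> out with Michaelis-Menten rates; needs every Vmax_k > V_IN."""
    vmax = np.asarray(vmax, dtype=float)
    return KM * V_IN / (vmax - V_IN)


def control_vector(vmax, i, h=1e-4):
    """Scaled control coefficients d ln A / d ln Vmax_i for every variable A,
    by central differences in log space (each rate v_i is proportional to Vmax_i)."""
    vmax = np.asarray(vmax, dtype=float)
    up, down = vmax.copy(), vmax.copy()
    up[i] *= np.exp(h)
    down[i] *= np.exp(-h)
    return (np.log(steady_state(up)) - np.log(steady_state(down))) / (2 * h)


print(control_vector([2.0, 3.0], 0))  # direction in (x1, x2) space for rate 1
print(control_vector([2.0, 3.0], 1))  # direction for rate 2
```

For this toy, raising $V_{max,1}$ lowers $x_1$ and leaves $x_2$ unchanged, so $C_1$ points along the negative $x_1$ axis.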
39. MCA-based Perturbation Analysis
- Let $(A_{j,\text{normal}})_{j=1}^{M}$ and $(A_{j,\text{perturbed}})_{j=1}^{M}$ denote the steady-state measurement data in the normal and perturbed cases.
- Then the value would describe the contribution of the i-th rate to the changes in the measurement data.
40. Extended simulations on Glycolysis
- Taken from Teusink (2000).
- It has 14 limiting-rate constants and 17 species.
41. Perturbation Analysis in Metabolic Pathways
- There are various conditions that make it appealing to adopt Bayesian inference:
  - There are uncertainties in the system equations and parameters.
  - Initial concentrations are difficult to observe.
  - Measurements are prone to systematic errors.
- Setting up the mathematical problem.
42. Monte Carlo Integration