Title: GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR) Herschel Rabitz Department of Chemistry, Princeton University, Princeton, New Jersey 08544
2. HDMR Methodology
- HDMR expresses a system output as a hierarchical
correlated function expansion of inputs
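For reference, the hierarchical expansion is commonly written in the HDMR literature in the form below. This is a sketch in standard notation; the symbols n (number of inputs), f_0, f_i, and f_ij are assumptions chosen here for illustration.

```latex
f(\mathbf{x}) = f_0
  + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j)
  + \cdots
  + f_{12 \ldots n}(x_1, x_2, \ldots, x_n)
```

Here f_0 is a constant, each f_i(x_i) describes the independent action of input x_i, and each f_ij(x_i, x_j) describes the cooperative action of a pair of inputs.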
3. HDMR Methodology (Contd.)
- HDMR component functions are optimally defined in terms of the unconditional and conditional probability density functions of the inputs
4. RS (Random Sampling) HDMR (Contd.)
- RS-HDMR component functions are approximated by expansions of orthonormal polynomials
- Inputs can be sampled independently and/or in a correlated fashion
- Only one set of data is needed to determine all of the component functions
- Statistical analysis (F-test) is used for proper truncation of the RS-HDMR expansion
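A minimal sketch of how first-order component functions might be estimated from a single random sample. It assumes independent inputs uniform on [0, 1] and uses shifted Legendre polynomials as the orthonormal basis; the toy model and the names `phi`, `fit_first_order`, and `predict_first_order` are hypothetical. Correlated inputs and the F-test truncation are not covered.

```python
import numpy as np

def phi(x, r):
    """Orthonormal shifted Legendre polynomial of degree r (1..3) on [0, 1]."""
    if r == 1:
        return np.sqrt(3.0) * (2.0 * x - 1.0)
    if r == 2:
        return np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0)
    if r == 3:
        return np.sqrt(7.0) * (20.0 * x**3 - 30.0 * x**2 + 12.0 * x - 1.0)
    raise ValueError("only degrees 1..3 are implemented in this sketch")

def fit_first_order(X, y, max_degree=3):
    """Estimate f0 and the expansion coefficients alpha[i, r-1] by sample means."""
    f0 = y.mean()
    alpha = np.empty((X.shape[1], max_degree))
    for i in range(X.shape[1]):
        for r in range(1, max_degree + 1):
            # alpha = E[(f(x) - f0) * phi_r(x_i)] for independent uniform inputs
            alpha[i, r - 1] = np.mean((y - f0) * phi(X[:, i], r))
    return f0, alpha

def predict_first_order(X, f0, alpha):
    """Evaluate f0 + sum_i f_i(x_i) with f_i(x_i) = sum_r alpha[i, r-1] * phi_r(x_i)."""
    y_hat = np.full(X.shape[0], f0)
    for i in range(alpha.shape[0]):
        for r in range(1, alpha.shape[1] + 1):
            y_hat += alpha[i, r - 1] * phi(X[:, i], r)
    return y_hat

# Hypothetical additive toy model: one data set determines every coefficient.
rng = np.random.default_rng(0)
X = rng.random((20_000, 2))
y = 1.0 + 2.0 * X[:, 0] + X[:, 1] ** 2

f0, alpha = fit_first_order(X, y)
rmse = np.sqrt(np.mean((predict_first_order(X, f0, alpha) - y) ** 2))
```

Because the toy model is additive, the first-order expansion should reproduce it up to Monte Carlo error in the coefficients.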
5. Global Sensitivity Analysis by RS-HDMR
- Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions
- The contributions are defined as the covariances of the individual component functions with f(x)
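A short sketch of the covariance-based decomposition, assuming the component functions have already been evaluated on the sample. The normalization by the output variance, the toy model, and the name `sensitivity_indexes` are assumptions made for illustration.

```python
import numpy as np

def sensitivity_indexes(components, y):
    """Return Cov(f_p, f) / Var(f) for each sampled component function f_p."""
    var_y = np.var(y)
    y_centered = y - y.mean()
    return {name: np.mean((fp - fp.mean()) * y_centered) / var_y
            for name, fp in components.items()}

rng = np.random.default_rng(1)
x1, x2 = rng.random(50_000), rng.random(50_000)

# Hypothetical toy model built from known zero-mean component functions
f1 = 2.0 * (x1 - 0.5)              # independent contribution of x1
f2 = x2**2 - 1.0 / 3.0             # independent contribution of x2
f12 = (x1 - 0.5) * (x2 - 0.5)      # cooperative contribution of x1 and x2
y = 1.0 + f1 + f2 + f12

S = sensitivity_indexes({"x1": f1, "x2": f2, "x1,x2": f12}, y)
total = sum(S.values())            # the covariances add up to the output variance
```

Since the output is the sum of its component functions, the indexes sum to one, which is what allows the output variance to be apportioned among the inputs.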
6. A Propellant Ignition Model
Calculated profiles of temperature and major mole
fractions for the ignition and combustion of the
M10 solid propellant
7. A Propellant Ignition Model
- 10 independent and 44 cooperative contributions
of inputs were identified as significant
8. A Propellant Ignition Model
- Nonlinear global sensitivity indexes efficiently
identified all significant contributions of inputs
9. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
Microenvironmental/exposure/dose modeling system
Structure of TCE-PBPK model (adapted from Fisher et al., 1998)
10. Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
- The coupled microenvironmental/pharmacokinetic model
  - Three exposure routes (inhalation, ingestion, and dermal absorption)
  - Release of TCE from water into the air within the residence
  - Activities of individuals and physiological uptake processes
- Seven input variables: age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), and time in bathroom after shower (x7) are used to construct the RS-HDMR orthonormal polynomials
- Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)
  - The amount inhaled or ingested
  - The amount absorbed
  - C(t): exposure concentration; IR(t): inhalation or ingestion rate; Kp: permeability coefficient; SA(t): surface area exposed
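A hedged numerical sketch of the two dose terms. It assumes the common exposure-assessment forms, intake as the time integral of C(t) IR(t) and dermal uptake as the time integral of Kp C(t) SA(t); these integral forms, the function names, and all numerical values are assumptions for illustration.

```python
import numpy as np

def trapezoid(values, t):
    """Trapezoidal-rule integral of sampled values over the time grid t."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(t)))

def intake_dose(t, C, IR):
    """Amount inhaled or ingested, assumed to be the integral of C(t) * IR(t)."""
    return trapezoid(C * IR, t)

def dermal_uptake(t, C, SA, Kp):
    """Amount absorbed, assumed to be the integral of Kp * C(t) * SA(t)."""
    return trapezoid(Kp * C * SA, t)

# Hypothetical constant profiles in arbitrary units
t = np.linspace(0.0, 10.0, 101)
C = np.full_like(t, 2.0)     # exposure concentration
IR = np.full_like(t, 0.5)    # inhalation or ingestion rate
SA = np.full_like(t, 3.0)    # surface area exposed
Kp = 0.1                     # permeability coefficient

inhaled = intake_dose(t, C, IR)
absorbed = dermal_uptake(t, C, SA, Kp)
```

With constant profiles the integrals reduce to products, so the two results can be checked by hand (2 x 0.5 x 10 and 0.1 x 2 x 3 x 10).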
11. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
- Inputs (x1, x2, x3, x4) have a uniform distribution, and inputs (x5, x6, x7) have a triangular distribution
- 10,000 input-output data points were generated
The data distributions for the uniformly
distributed variable x1 and the triangularly
distributed variable x5
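A minimal sketch of drawing such an input sample with NumPy. The ranges and triangular parameters below are hypothetical placeholders, not the values used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Hypothetical (low, high) ranges for the uniformly distributed inputs x1..x4
uniform_ranges = [(0.0, 70.0), (0.0, 1.0), (1.0, 3.0), (0.5, 2.5)]
# Hypothetical (left, mode, right) parameters for the triangular inputs x5..x7
triangular_params = [(5.0, 8.0, 12.0), (3.0, 10.0, 20.0), (0.0, 5.0, 15.0)]

columns = [rng.uniform(low, high, n_samples) for low, high in uniform_ranges]
columns += [rng.triangular(left, mode, right, n_samples)
            for left, mode, right in triangular_params]

# One row per model run, one column per input x1..x7
X = np.column_stack(columns)
```

Each row of `X` would then be passed through the model once to produce the corresponding output.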
12. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
- Seven independent, fifteen 2nd order and one 3rd
order cooperative contributions of inputs were
identified as significant
First order sensitivity indexes
13. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
- Nonlinear global sensitivity indexes (2nd order
and above) efficiently identified all significant
contributions of inputs
The ten largest 2nd and 3rd order sensitivity
indexes
14. Identification of bionetwork model parameters
- Characteristics of the problem
  - System nonlinearity
  - Limited number and type of experiments
  - Considerable biological and measurement noise
Multiple solutions exist!
- Problems with traditional identification methods
  - Provide only one or a few solutions for each parameter
  - Assume linear propagation from data noise to parameter uncertainties
- The closed-loop identification protocol (CLIP)
  - Extract the full parameter distribution by global identification
  - Iteratively look for the most informative experiments for minimizing parameter uncertainty
15. General operation of CLIP
Pre-lab analysis and design of the most
informative experiments
Iterative experiment optimization and data
acquisition
Global parameter identification
16. Isoleucyl-tRNA synthetase proofreading valyl-tRNAIle
Rate constants to be identified
Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)
17. The inversion module: identifying the rate constant distribution
- The Genetic Algorithm (GA)
- Mutation
- 1101 1111 1100 0010
- 1101 1101 1100 0110
- Crossover
- 1101 1100 1111 0010
- 1101 0010 1111 1100
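The two GA operators can be sketched on bit strings as follows. This is a generic illustration, assuming bit-flip mutation and two-point crossover; the mutation rate, the function names, and the choice of cut points are hypothetical and not taken from the inversion module itself.

```python
import random

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in bits)

def crossover(parent_a, parent_b, rng):
    """Two-point crossover: the parents exchange the segment between two cuts."""
    i, j = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

rng = random.Random(0)
parent_a = "1101110011110010"
parent_b = "1101001011111100"

mutant = mutate(parent_a, rate=0.1, rng=rng)
child_a, child_b = crossover(parent_a, parent_b, rng)
```

In a full GA these operators would be applied to a population of encoded rate-constant sets, with selection driven by the inversion cost function.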
The inversion cost function
Typical rate constant distribution after random
perturbation/control
Inversion quality index Q
18. The analysis module: estimating the most informative experiments
- Estimate the best species for monitoring system behavior
- Determine the best species for perturbing the system
- Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)
19. Optimally controlled identification: squeezing on the rate constant distribution
- The control cost function
Inversion quality
Non-
Feng and Rabitz, Biophys. J., 86:1270-1281 (2004)
Feng, Rabitz, Turinici, and LeBris, J. Phys. Chem. A, 110:7755-7762 (2006)
20. Network property optimization
- A. Identifying the best targeted network locations for intervention
- B. Identifying the optimal network control
[Flowchart blocks: Initial Guess/Random Control, Control Design, Biological System, Observed Response, Control Objective, Learning Algorithm, Optimal Controls, Optimal Network Performance]
21. A. Molecular target identification for network engineering
Random-sampling high dimensional model
representation (RS-HDMR)
Randomly sample k
- Advantages of RS-HDMR
- Global sensitivity analysis
- Nonlinear component functions
- Physically meaningful representation
- Favorable scalability
Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)
22. Laboratory data on the mutants
k10 - k13 fixed
k6 fixed
k6
k10 - k13
Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)
23. Example: Biochemical multi-component formulation mapping
- Allosteric regulation of aspartate transcarbamoylase (ATCase) in vitro by all four ribonucleotide triphosphates (NTPs)
- ATCase activity (output) was measured for 300 random NTP concentration combinations (inputs) in the laboratory
- A second order RS-HDMR was constructed as an input -> output map. Its accuracy is comparable with the laboratory error
The absolute error of repeated measurements
24. Biochemical multi-component formulation mapping
The comparison of the laboratory data and the 2nd order RS-HDMR approximation for the used and test data
Note: The two parallel lines mark an absolute error of 0.2
25. The s-space network identification procedure (SNIP)
Laboratory data on the transcriptional cascade
aTc: x1; IPTG: x2; EYFP: y(x1, x2)
Encode: x1 -> x1 m1(s), x2 -> x2 m2(s)
Response measurement: y -> y(s)
Decode: Fourier transform
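The encode/measure/decode steps can be illustrated with a toy calculation. The sketch below assumes cosine modulation functions with distinct integer frequencies and a simple product response standing in for the measured output; the frequencies, modulation depth, and response model are all hypothetical.

```python
import numpy as np

N = 256
s = np.arange(N) / N            # encoding variable sampled on [0, 1)
f1, f2 = 5, 13                  # hypothetical distinct encoding frequencies
depth = 0.5                     # hypothetical modulation depth
x1, x2 = 1.0, 2.0               # nominal input levels

# Encode: each input is multiplied by its own modulation function m_i(s)
x1_s = x1 * (1.0 + depth * np.cos(2.0 * np.pi * f1 * s))
x2_s = x2 * (1.0 + depth * np.cos(2.0 * np.pi * f2 * s))

# Response measurement: a toy cooperative response replaces the laboratory data
y_s = x1_s * x2_s

# Decode: the Fourier transform separates contributions by frequency
# (the scaling gives cosine amplitudes for every bin except the zero-frequency one)
amplitude = np.abs(np.fft.rfft(y_s)) * 2.0 / N

independent = amplitude[[f1, f2]]                 # lines at f1 and f2
cooperative = amplitude[[f2 - f1, f1 + f2]]       # lines at f2 - f1 and f1 + f2
```

In this toy case the lines at the combination frequencies are nonzero only because the response contains a product of the two inputs, which is how cooperative behavior would show up after decoding.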
26. Nonlinear property prediction by SNIP
Unmeasured region correctly predicted
Nonlinear, cooperative behavior revealed
Feng, Nichols, Mitra, Hooshangi, Weiss, and
Rabitz, In preparation
27. SNIP application to an intracellular signaling network
Laboratory single cell measurement data
Sachs, et al., Science, 308:523-529 (2005)
28. Identified network with predictive capability
Network connections identified by SNIP and
Bayesian analysis
Reliable SNIP prediction of Akt levels
29. Example: Ionospheric measured data
- The ionospheric critical frequencies determined from ground-based ionosonde measurements at Huancayo, Peru from 1957 to 1987 (8694 points)
- Input: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), previous day's value of foE
- Output: ionospheric critical frequencies foE
- The inputs are not controllable and not independent: the pdf of the inputs is not separable and is not explicitly known
30. Ionospheric measured data
The dependence of foE on the input day
Ionosonde data distribution: the dependences between the normalized input variables year and f10.7, and kp and dst, for the data at 12 UT
31. Ionospheric measured data
The accuracy of the 2nd order RS-HDMR expansion
for the output, foE
32. Quantitative molecular property prediction
Standard QSAR
General strategy: Molecular activity is a function of its chemical/physical/structural descriptors
- Problems
  - Overfitting (choice of descriptors)
  - Underlying physics
A simple solution: y = f(x1, x2), x1 = 1, 2, ..., N1, x2 = 1, 2, ..., N2
Descriptor-free quantitative molecular property interpolation
33. Descriptor-free property prediction from an arbitrary substituent order
34. Property prediction from the optimal substituent order
Cost function
Complexity of the search: N1! N2! = 14! 8! ≈ 10^15
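The size of the search space can be checked with a few lines of arithmetic. The variable names are chosen here for illustration; the values N1 = 14 and N2 = 8 come from the slide.

```python
import math

n1, n2 = 14, 8   # number of substituents at each of the two sites

# Every ordering of site 1 can be combined with every ordering of site 2
orderings = math.factorial(n1) * math.factorial(n2)
order_of_magnitude = math.floor(math.log10(orderings))
```

The product is about 3.5 x 10^15, so an exhaustive search over all orderings is impractical and the reordering has to be found by optimization of the cost function.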
Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)
35. Application to a chromophore transition metal complex library
Before reordering
After reordering
Cost function
Outliers captured by the reordering algorithm
Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109:5842-5854 (2005)
36. Application to a drug compound library
15 of data
>14,000 compounds
Cost function
Reorder
Prediction
37. THE MODERN WAY TO DO SCIENCE
Adaptively under high duty cycle and automated
"You should understand the physics, write down the correct equations, and let nature do the calculations." Peter Debye