Title: From Data to Differential Equations
1From Data to Differential Equations
- Jim Ramsay
- McGill University
- With inspirations from
- Paul Speckman and Chong Gu
2The themes
- Differential equations are powerful tools for
modeling data. - We have new methods for estimating differential
equations directly from data. - Some examples are offered, drawn from chemical
engineering and medicine.
3Differential Equations as Models
- DIFES make explicit the relation between one or
more derivatives and the function itself. - An example is the harmonic motion equation
4Why Differential Equations?
- The behavior of a derivative is often of more
interest than the function itself, especially
over short and medium time periods. - How rapidly a system responds rather than its
level of response is often what matters. - Velocity and acceleration can reflect energy
exchange within a system. Recall equations like f
ma and e mc2.
5- Natural scientists often provide theory to
biologists and engineers in the form of DIFEs. - Many fields such as pharmacokinetics and
industrial process control routinely use DIFEs
as models. - Especially for input/output systems, and for
systems with two or more functional variables
mutually influencing each other. - DIFEs arise when feedback systems must be
developed to control the behavior of systems.
6- The solution to an mth order linear DIFE is an
m-dimensional function space, and thus the
equation can model variation over replications as
well as average behavior. - A DIFE requires that derivatives behave smoothly,
since they are linked to the function itself. - Nonlinear DIFEs can provide compact and elegant
models for systems exhibiting exceedingly complex
behavior.
7The Rössler Equations
- This nearly linear system exhibits chaotic
behavior that would be virtually impossible to
model without using a DIFE
8Stochastic DIFEs
- We can introduce randomness into DIFEs in many
ways - Random coefficient functions.
- Random forcing functions.
- Random initial, boundary, and other constraints.
- Time unfolding at a random rate.
9Deliverables
- If we can model data on functions or functional
input/output systems, we will have a modeling
tool that greatly extends the power and scope of
existing nonparametric curve-fitting techniques. - We may also get better estimates of functional
parameters and their derivatives.
10A simple input/output system
- We begin by looking at a first order DIFE for a
single output function x(t) and a single input
function u(t). (SISO) - But our goal is the linking of multiple inputs to
multiple outputs (MIMO) by linear or nonlinear
systems of arbitrary order m.
11- u(t) is often called the forcing function, and
- is an exogenous functional independent
- variable.
- Dx(t) -ß(t)x(t) is called the homogeneous
- part of the equation.
- a(t) and ß(t) are the coefficient functions
- that define the DIFE.
- The system is linear in these coefficient
- functions, and in the input u(t) and output
- x(t).
12In this simple case, an analytic solution is
possible
However, it is necessary to use numerical
methods to find the solution to most DIFES.
13A simpler constant coefficient example
- We can see more clearly what happens when
- the coefficients a and ß are constants,
- a 1, x0 0, and
- u(t) is a step function, stepping from 0 to 1 at
time t1
14- Constant a/ß is the gain in the system.
- Constant ß controls the responsivity of the
system to a change in input.
15A Real Example Lupus treatment
- Lupus is an incurable auto-immune disease that
mainly afflicts women. - It flares unpredictably, inflicting wide damage
with severe symptoms. - The treatment is prednisone, an immune system
suppressant used in transplants. - But prednisone has serious short- and long-term
side affects, and exposure to it must be
controlled.
16(No Transcript)
17How to Estimate a Differential Equation from Raw
Data
- A previous method, principal differential
analysis, first smoothed the data to get
functions x(t) and u(t), and then estimated the
coefficient functions defining the DIFE. - This two-stage procedure is inelegant and
probably inefficient. Going directly from data
to DIFE would be better.
18Profile Least Squares
- The idea is to replace the function fitting the
raw data, x(t), by the equations defining the fit
to the data conditional on the DIFE. - Then we optimize the fit with respect to only the
unknown parameters defining the DIFE itself. - The fit x(t) is defined as a by-product of the
process, but does not itself require additional
parameters.
19- This profiling process is often used in nonlinear
least squares problems where some parameters are
easily solved for given other parameters. - There we express the conditional estimates of the
these easy-to-estimate parameters as functions of
the unknown hard-to-estimate parameters, and
optimize only with respect to the hard
parameters. - This saves both computational time and degrees of
freedom. - An alternative strategy is to integrate over the
easy parameters, and optimize with respect to the
hard ones this is the M-step in the EM
algorithm.
20The DIFE as a linear differential operator
- We can re-express the first order DIFE as a
linear differential operator
More compactly, suppressing (t), and making
explicit the dependency of L on a and ß,
21Smoothing data with the operator L
- If we know the differential equation, then the
differential operator L defines a data smoother
(Heckman and Ramsay, 2000). - The fitting criterion is
The larger ? is, the more the fitting function
x(t) is forced to be a solution of the
differential equation Laßx(t) 0.
22- Let x(t) be expanded in terms of a set K basis
functions fk(t),
- Let N by K matrix Z contain the values of these
basis functions at time points ti , and - Let y be the vector of raw data.
23- Then the smooth values have the expression Zc,
- where c is the vector of coefficients.
- But these coefficients are easy parameters to
estimate given operator Laß . The expression for
them is
- We therefore remove parameter vector c by
- replacing it with the expression above.
24How to estimate L
- L is a function of weight coefficients a(t) and
ß(t). - If these have the basis function expansions
then we can optimize the profiled error sum of
squares
with respect to coefficient vectors a and b.
25- It is also a simple matter to
- constrain some coefficient functions to be zero
or a constant. - force some coefficient functions to be smooth,
employing specific linear differential operators
to smooth them towards specific target spaces.
We do this by appending penalties to SSE(a,b),
such as
where M is a linear differential operator for
penalizing the roughness of ß.
26And more
- This approach is easily generalizable to
- DIFEs and differential operators of any order.
- Multiple inputs uj(t) and outputs xi(t).
- Replicated functional data.
- Nonlinear DIFEs and operators.
27Adaptive smoothing
- We can also use this approach to have the level
of smoothing vary. We modify the differential
operator as follows
The exponent function ?(t) plays the role of a
log ? that varies with t.
28- Choosing the smoothing parameter ? is always a
delicate matter. - The right value of ? will be rather large if the
data can be well-modeled by a low-order DIFE. - But it should not so large as to smooth away
additional functional variation that may be
important. - Estimating ? by generalized cross-validation
seems to work reasonably well, at least for
providing a tentative value to be explored
further.
29A First Example
- The first example simulates replicated data where
the true curves are a set of tilted sinusoids. - The operator L is of order 4 with constant
coefficients. - How precisely can we estimate these coefficients?
- How accurately can we estimate the curves and
first two derivatives?
30- For replications i1,,N and time values j1,,n,
let
where the ciks and the eijs are N(0,1) and t
0(0.01)1. The functional variation satisfies
the differential equation
where ß0(t) ß1(t) ß3(t)0 and ß2(t) (6p)2
355.3.
31(No Transcript)
32- For simulated data with N 20 replications and
constant bases for ß0(t) ,, ß3(t), we get - L D4 best results are for ?10-10 and the
RIMSEs for derivatives 0, 1 and 2 are 0.32, 9.3
and 315.6, respectively. - L estimated best results are for ?10-5 and the
RIMSEs are 0.18, 2.8, and 49.3, respectively. - giving precision ratios of 1.8, 3.3 and 6.4,
resp. - ß2 was estimated as 353.6 whereas the true value
was 355.3. - ß3 was 0.1, with true value 0.0.
33(No Transcript)
34- In addition to better curve estimates and much
better derivative estimates, note that the
derivative RMSEs do not go wild at the end
points, which is usually a serious problem with
polynomial spline smoothing. - This is because the DIFE ties the derivatives to
the function values, and the function values are
tamed at the end points by the data.
35A decaying harmonic with a forcing function
- Data from a second order equation defining
harmonic behavior with decay, forced by a step
function, is generated by - ß0 4.04, ß1 0.4, a -2.0.
- u(t) 0, t lt 2p, u(t) 1, t 2p.
- Adding noise with std. dev. 0.2.
36(No Transcript)
37With only one replication, using minimum
generalized cross-validation to choose ?, the
results estimated for 100 trials are
Parameter True Value Mean Estimate Std. Error
ß0 4.040 4.041 0.073
ß1 0.400 0.397 0.048
a -2.000 -1.998 0.088
38An oil refinery example
- The single input is reflux flow and the output
is tray 47 level in a distillation column. - There were 194 sampling points.
- 30 B-spline basis functions were used to fit the
output, and a step function was used to model the
input.
39(No Transcript)
40- After some experimentation with first and second
order models, and with constant and varying
coefficient models, the clear conclusion seems to
be the constant coefficient model
The standard errors of ß and a in this model,
as estimated by parametric bootstrapping,
were 0.0004 and 0.0023, respectively. The delta
method yielded 0.0004 and 0.0024,
respectively. Pretty much the same.
41(No Transcript)
42Monotone smoothing
- Some constrained functions can be expressed as
DIFEs. - A smooth strictly monotone function can be
expressed as the second order DIFE
43- We can monotonically smooth data by estimating
the second order DIFE directly. - We constrain ß0(t) 0, and give ß1(t) enough
flexibility to smooth the data. - In the following artificial example, the
smoothing parameter was chosen by generalized
cross-validation. ß1(t) was expanded in terms of
13 B-splines.
44(No Transcript)
45(No Transcript)
46Analyzing the Lupus data
- Weight function ß(t) defining an order 1 DIFE for
symptoms estimated with and without prednisone
dose as a forcing function. - Weight expanded using B-splines with knots at
every observation time. - Weight a(t) for prednisone is constant.
47The forced DIFE for lupus
48The data fit
49- Adding the forcing function halved the LS fitting
criterion being minimized. - We see that the fit improves where the dose is
used to control the symptoms, but not where it is
not used. - These results are only suggestive, and much more
needs to be done. - We want to model treatment and symptom as
mutually influencing each other. This requires a
system of two differential equations.
50Summary
- We can estimate differential equations directly
from noisy data with little bias and good
precision. - This gives us a lot more modeling power,
especially for fitting input/output functional
data. - Estimates of derivatives can be much better,
relative to smoothing methods. - Functions with special properties such as
monotonicity can be fit by estimating the DIFE
that defines them.