DataBased Mechanistic Modelling - PowerPoint PPT Presentation

1 / 61
About This Presentation
Title:

DataBased Mechanistic Modelling

Description:

Research carried out by Stuart Parkinson and Peter Young (1996) ... Research by Peter Young and Andrew Jarvis (September 2004) ... – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 62
Provided by: peter557
Category:

less

Transcript and Presenter's Notes

Title: DataBased Mechanistic Modelling


1
Dominant Mode Analysis Simplicity Out of
Complexity
Peter Young Centre for Research on Environmental
Systems and Statistics Lancaster
University and Centre for Resource and
Environmental Studies Australian National
University Prepared for SAMO Summer School,
Venice, 2004
  • Data-Based Mechanistic Modelling
  • Dominant Mode Analysis and Modelling
  • Global Climate Examples
  • Conclusions
  • A simple illustrative simulation example

2
Data-Based Mechanistic (DBM) Modelling
  • An Inductive Approach to Modelling Stochastic,
    Dynamic Systems

3
Data-Based Mechanistic (DBM) Modelling
Data-Based Mechanistic (DBM) modelling has been
developed at Lancaster over the past twenty
years but is derived from my research dating
back to the early 1970s.
  • The general approach of DBM modelling is as
    follows
  • Define the objectives of the modelling exercise.
  • Prior to the acquisition of data (or in
    situations of insufficient data), develop a
    simulation model that satisfies the defined
    objectives. Then, evaluate the sensitivity of the
    model to uncertainty and develop a reduced order,
    dominant mode model that captures the most
    important aspects of its dynamic behaviour.
  • When sufficient data become available identify
    and estimate from these data a parsimonious,
    dominant mode, stochastic model that again
    satisfies the defined objectives reflects the
    information content in the data and can be
    interpreted in physically meaningful terms.
  • Attempt to reconcile the models obtained in 1.
    and 2.

Note not all stages in this modelling process
will be required this will depend on the nature
of the problem the available date and the
modelling objectives.
4
The DBM Approach to Modelling When Data are
Available
  • The most parametrically efficient (parsimonious,
    identifiable) model structure is first inferred
    statistically from the available time series data
    in an inductive manner, based on a generic class
    of dynamic models.
  • After this initial inductive modelling stage is
    complete, the model is interpreted in a
    physically meaningful, mechanistic manner based
    on the nature of the system under study and the
    physical, chemical, biological or socio-economic
    laws that are most likely to control its
    behaviour.
  • This inductive approach can be contrasted with
    the alternative hypothetico-deductive modelling
    approach (including simple grey-box modelling),
    where the physically meaningful model structure
    is based on prior assumptions and the parameters
    that characterise this simplified structure are
    estimated from data only after this structure has
    been specified by the modeller.
  • By delaying the mechanistic interpretation of
    the model in this manner, the DBM modeller
    avoids the temptation to attach too much
    importance to prior, subjective judgement when
    formulating the model equations.

5
Methods of Recursive Time Series Analysis
  • The methodological tools used in DBM modelling
    are largely based
  • on recursive methods (Gauss, 1826 see Young,
    1984) of time series
  • analysis that perform various tasks
  • In the case of time-invariant parameter
    (stationary) models, the statistical
    identification of the Transfer Function model (an
    ODE model in the continuous-time case) structure
    and the optimal estimation of the constant
    parameters that characterise this structure.
  • In the case of nonstationary and nonlinear
    models, the estimation of Time Variable (TVP) or
    State-Dependent (SDP) Parameters that,
    respectively, characterise the model in these
    cases.
  • On-line forecasting and data-assimilation
    (Bayesian recursive estimation).

In addition, MCS-based tools for uncertainty and
sensitivity analysis are used to evaluate the
implications of uncertainty in (i) derived
physically meaningful parameters (e.g. time
constants, residence times, decay rates,
partition percentages, etc. ) and (ii) model
predictions.
6
Dominant Mode Analysis and Modelling
7
Dominant Mode Analysis (DMA) for Linear Systems
The Standard Reduced Order Model Forms

In DMA, the primary identification and estimation
tools in the case of linear systems are (i) the
Refined Instrumental Variable (RIV) estimation
algorithm for discrete-time systems (ii)
the Refined Instrumental Variable for
Continuous-time(RIVC) estimation algorithm for
systems described by continuous-time TFs or
differential equations
8
Dominant Mode Analysis (DMA) for Time variable
Parameter (TVP) and Nonlinear Systems

In the case of TVP systems, the recursive forms
of the RIV and RIVC algorithms, together with
Fixed Interval Smoothing (FIS) versions of these
algorithms, provide the main engines for
identification and estimation In the case of
nonlinear systems, related State Dependent
Parameter (SDP) estimation is used for
identification and estimation, again exploiting
FIS where is the state variable on
which the associated parameter is identified to
be dependent.
9
Modes of Dynamic Behaviour
  • The dynamic behaviour of a TF (i.e. a
    differential equation or its discrete-time
    equivalent) is controlled primarily by its
    dynamic modes. These are defined by the
    eigenvalues (which in turn define time constants
    or natural frequencies) and steady state gains of
    the TF. For instance, a first order, continuous
    time TF can be expressed in the form

  • or
  • where is the SSG and is the time
    constant (or residence time).
  • In the case of nth order processes, the TF can
    be decomposed into n modes, either of this type
    or second order, defining oscillatory behaviour
    (complex eigenvalues).
  • A (normally small) number of these modes will
    dominate the response - these are the Dominant
    Modes.

10
The Identification, Estimation and Validation
Procedures for DMA
  • Identification and estimation of the reduced
    order, dominant mode model is based on simulated
    data obtained from planned experiments performed
    on the large model e.g. applying special inputs,
    such as impulse, step and PRBS excitation or
    using historical measured inputs from the system.
  • It exploits the tools outlined in previous
    slides, together with model structure/order order
    identification statistics, such as the
    Coefficient of Determination based on the
    simulated response error or the YIC and AIC
    statistics that indicate when the model order is
    greater than can be justified by the information
    content of the data.
  • Validation is based on how well the model can
    mimic the response of the high order model when
    the input variable is different from that used
    for identification and estimation.


11
A Comment on Estimation
  • Refined Instrumental Variable (RIV) estimation
    is NOT linear least squares regression in fact,
    methods of dynamic system estimation based on
    linear least squares estimation are very bad for
    DBM modelling and DMA.

  • For example, in the single input, single output
    case, it is based on the Transfer Function (TF)
    model
  • where is a fairly general noise process (see
    next slide)
  • Estimation of the parameters in this model is
    clearly a nonlinear estimation problem.

12
A Comment on Estimation - continued
  • The best known methods of estimation for TF
    models are the Maximum Likelihood (ML) methods,
    such as Box-Jenkins ML method or the Prediction
    Error Minimization (PEM) method in Matlab. These
    assume that the noise has rational spectral
    density i.e.
  • where is a zero mean, serially uncorrelated
    sequence of random variables white noise.
  • A powerful advantage of optimal instrumental
    variable estimation for TF models, in relation to
    these other methods, is that, while it is optimal
    (ML) when the noise has rational spectral
    density, it produces consistent, asymptotically
    unbiased parameter estimates even if the noise
    does not have these properties, provided only
    that it is uncorrelated with .


13
Initial Simulation Example
14
The Simulation Experiment

This initial simulation example is based on a
26th order, discrete-time, linear system with one
sample pure time delay and consists of 26 first
order processes in series, each with unity gain
and lag coefficient ranging from 0.1 in steps of
0.02 to 0.6 (e.g. time constant (residence times)
from 0.43 to 1.96 days). In this sense, it could
represent a lag and route model of flow in a
long river or an ADZ model of solute transport,
also in a long river. Since the model is an all
pole TF (i.e. the numerator TF is a scalar), the
SRIV identification stage in the analysis is
constrained to such models i.e., it considers
models of the general form with
. The input signal is a unit
impulse and there is no additive noise.
15
DMA Identification Results for Unconstrained
Models

A selection of the identification
results which show that the 7 1 2
model is clearly selected. All higher order
models were rejected by the RIV algorithm. The
estimation and validation results for this model
are shown on the next slide.
16
Initial Simulation Example Estimation and
Validation of an Unconstrained 7 1 2 Dominant
Mode Model
17
Constrained Reduced Order Model Estimation
However, the 7 1 2 model has complex roots. So
it is necessary to constrain the estimation to
have real roots, as in the high order simulated
system. Now the constrained estimated 7 1 2
model does not explain the data nearly so well
and it is necessary to search again over a wider
range of models. The best model then has the
following, very special 13th order form with all
poles identical (the Nash Cascade of hydrology,
characterized by only three parameters) where

. For higher order models of this
special form, the model fit deteriorates as the
identifiability becomes increasingly poor
(condition number for 26th order model is
2.74x108 and models with non-identical
eigenvalues are not identifiable at all)

18
Initial Simulation Example Conclusions
  • It is not possible to identify and estimate the
    high order simulation model. Without constraints,
    the SRIV algorithm finds lower order models that
    explain the data very well indeed and model
    orders greater than 9 are not identifiable and
    are rejected.
  • While the unconstrained models are validated
    well and are excellent predictive tools, they
    have complex eigenvalues, so they are not
    physically meaningful and so unacceptable from a
    DBM standpoint.
  • The only model that is estimated well and has
    real eigenvalues is the 13th order Nash cascade
    type with all identical eigenvalues (and only 3
    parameters).
  • So it is clearly impossible to identify and
    estimate the original 26th, non-identical
    eigenvalue system from the experimental data,
    even though these are completely noise-free in
    this experiment.
  • But the reduced order, dominant mode models
    mimic its behaviour very well.


19
Example 2 Uncertainty and Dominant Mode Analysis
of the Enting-Lassey Global Carbon Cycle Model
Analysis I
  • Research carried out by Stuart Parkinson and
    Peter Young (1996)
  • (based on a Simulink version of a E-L simulation
    model developed by Stuart Parkinson)

20
Globally Averaged Carbon Cycle Climate Data
21
The Enting-Lassey Global Carbon Cycle Model
22
Comparison of Uncertainty Bounds for
Deterministic Model Stochastic Uncertainty
Analysis Using MCS
23
Identification and Estimation of the Reduced
order Model

Here, identification and estimation of the
reduced order model is based on the impulse
response of the EL model. This can be in
discrete-time, when the TF model takes the
form or continuous-time, when the TF model
takes the form This can be written in the
differential equation form
24
Estimation Comparison of Reduced 4th Order and
23rd Order EL Model Impulse Responses
25
Validation Comparison of Reduced 4th Order and
23rd Order EL Model Responses to Measured Fossil
Fuel Input
26
Parallel Pathway Decomposition of Reduced 4th
Order Model Physical Interpretation
27
MCS Analysis of Reduced Order Model Showing
Uncertainty in the Residence Time Estimates
28
Example 2 Dominant Mode Analysis of the
Enting-Lassey Global Carbon Cycle Model Analysis
II
  • Research carried out by Peter Young (2004)
  • (based on a Simulink version of a E-L simulation
    model developed by Stuart Parkinson)

29
The Simulated Data Used in the DMA Model
Reduction
30
Identification and Estimation of the Reduced
Order 4 5 0 Model Based on the Differenced
Data
31
Validation of the Reduced Order 4 5 0 Model
32
Feedback Decomposition of Reduced Order 4 5 0
Model Physical Interpretation
Carbon Dioxide Emissions
Atmospheric Carbon Dioxide
Instantaneous

Residence Time
Residence Time
Residence Time
Residence Time
33
Example 3 Dominant Mode Analysis of the Lenton
Global Carbon Cycle (GCC) Simulation Model
  • Research currently being carried out by Andrew
    Jervis, Peter Watkins and Peter Young
  • (based on a Simulink version of a GCC simulation
    model developed by Tim Lenton, CEH, Bush,
    Edinburgh)

34
Lenton Global Carbon Cycle (GCC) Simulation Top
Level of Complete Hierarchical Model
35
Lenton GCC Model, Second Level Sub-Model Inside
Green Block of Complete Model
36
Lenton GCC Model, Third Level Model Inside Green
Block of Sub-Model
37
Second Order DBM Model Estimated from the
Response of the High Order Simulation Model to an
1000 Unit Impulsive Emissions Input
38
Second Order DBM Model Mimics the Behaviour of
the High Order Simulation Model
39
Second Order DBM Model Based on the Impulse
Response Estimation Mimics the Response of the
High Order Simulation Model to the Historic
Emissions Input
40
Block Diagram and Physical Interpretation of
Second Order DBM Reduced Order Model
41
Example 4 Dominant Mode Analysis of the Hadley
HADCM3 Global Circulation Model
  • Research by Peter Young and Andrew Jarvis
    (September 2004)
  • (based on Hadley model simulation data from
    http//unfccc.int/program/mis/brazil/climate.html)

42
Identification and Estimation of the Reduced
Order Model Based on the HADCM3 Simulated Data
43

How complicated is the HADCM3 GCM?
How complicated is the HADCM3 model and long did
it take to run this simulation? I dont know but
there is a web site climateprediction.net that
allows you to participate in a Monte Carlo
experiment running the HADCM3 model. I could not
find out the approximate run time from the web
site. However, below is a comment from one
participant that I found (there were others of a
similar kind) which suggests that it is a
l-o-n-g time!
  • As long as it takes to run one of the CPDN WU's
    there definitely should be one because 6-7 weeks
    is a long time without having to reboot or shut
    your computer off for one reason or another I
    think anyway

44
Block Diagram and Physical Interpretation of
Reduced Order Model Feedback Decomposition
Relative Radiative Forcing
Forward Path
Globally Averaged Temperature Rise

Feedback Path
45
Block Diagram and Physical Interpretation of
Reduced Order Model Parallel Pathway
Decomposition
Relative Radiative Forcing
Fast Pathway
Globally Averaged Temperature Rise
SlowPathway
46
MCS Analysis of Reduced Order Model Showing
Uncertainty in the Gain and Residence Time
Estimates
Quick Pathway Characteristics
47
MCS Analysis of Reduced Order Model Showing
Uncertainty in the Gain and Residence Time
Estimates
Slow Pathway Characteristics
48
DBM Modelling from Real Data
49
DBM Modelling of the Global Carbon Cycle from
Globally Averaged Data
  • Based on recent research by Jervis and Young

50
Globally Averaged Carbon Cycle Climate Data
51
Parameterisation of the Identified Non-Parametric
SDP Model

The initial non-parametric modelling phase
suggests the following SDP differential equation
model form
where is a very small, coloured noise
process and is a pure time delay. A variety of
parameterisations of the SDP
were investigated but linear and sigmoidal
relation-ships in were found to be most
suitable. Note the pure time delay this
suggests that higher order dynamic processes are
possibly operative but their aggregative effect
is to introduce the pure time delay (a common
effect in real systems).
52
Parametric Model with External Temperature as an
Input Estimation and Validation
Note the expanded scale on error plot
53
Wigley (1993, 2000) Model Fitting Results
54
A Controlled Future?The use of DBM models in
control system design
  • Hypothetical but showing how SDP models can
    provide a vehicle for assessing future carbon
    dioxide emissions policy

55
Achieving Stabilization Concentration Profiles
  • One approach to a more discerning definition of
    emission scenarios is to consider the inverse
    problem (e.g. Wigley, 2000) i.e. compute the
    emission scenarios that are able to achieve a
    range of stabilization concentration profiles
    for atmospheric , such as those utilized in
    the studies carried out by the IPCC (Schimel, et
    al 1996).
  • An alternative and more flexible approach that
    we have used is to exploit automatic control
    theory and so generate an emissions scenario
    (control input) that achieves some required
    objective, as defined by a specified optimal
    criterion function. Here, the control system is
    designed to adjust the emissions input so that
    one of the required stabilization concentration
    profiles suggested by Wigley is followed as
    closely as possible.

56
Optimal PIP Control of Emissions for Future
Atmospheric Carbon Dioxide and Global Temperature
Stabilization Simulink Diagram
Notemost conservative linear model
57
PIP Emissions Policy for Future Atmospheric
Carbon Dioxide and Global Temperature
Stabilization using the Linear DBM Model
58
Comparison with Wigleys Stabilization Results
  • We see that the objectives have been realized
    by 2200, with an emissions scenario stabilizing
    at a level of 7.4 Gt y-1, with a 5-95
    percentile range between 6.1 and 8.5 Gt y-1 i.e.
    between 3.6 to 6 Gt y-1 higher than that computed
    by Wigley for the same profile i.e. the same
    result is achieved with a much smaller and more
    practically feasible reduction in the emissions.
    Note also that, unlike Wigley, we are able to
    provide a estimate of the uncert-ainties involved
  • We believe that this is a very important
    result. Wigleys specifications for the
    emissions would be impossible to achieve in
    practice whereas those suggested here are
    reasonably realistic (provided, of course, that
    binding international agreements are negotiated
    and maintained - which is an entirely different
    matter!)

59
Data Assimilation and Forecasting
All of the DMA models can be used for data
assimilation and forecasting Using standard or
modified version of the Kalman Filter. The
standard KF can be used for linear, near linear
or systems with only input nonlinearities. And
the SDP version of the KF can be used for
SDP-type nonlinear systems.
60
CONCLUSIONS
  • Large simulation models are products of the
    scientist's (hopefully inspired) imagination,
    most often constructed within a
    deterministic-reductionist (bottom-up') paradigm
    using a hypothetico-deductive approach to
    modelling.
  • The responses to external stimuli of such models
    (and the systems they seek to simulate) are
    dictated by the dominant modes of the system
    behaviour, which are normally few in number. As
    a result, there is only sufficient information
    in the data (simulated or real) to identify and
    estimate the parameters that characterise the
    dominant modal behaviour i.e. the large model is
    not fully identifiable from the data.
  • Data-Based Mechanistic (DBM) Top-Down' models,
    identified and estimated in relation to
    observational data (simulated or real) using an
    inductive approach, capture the essence of the
    modally-dominant mechanisms.

61
CONCLUSIONS continued
  • Large simulation models can be reduced to their
    modally dominant form by DBM model reduction
    Dominant Mode Analysis (DMA) applied to data
    obtained from experiments conducted on the
    simulation model. This can help the modeller to
    better understand the strengths and limitations
    of the simulation model and what parts of it may
    be estimated from real data.
  • Often, the DBM models obtained from either
    simulated or real data will not have sufficient
    fine detail to allow for what-if' studies.
    However, such detail can be added to the model in
    the same way that it would be in the synthesis of
    a simulation model (e.g. adding a wier to a DBM
    flow-routing model).
  • All of the analyses reported here were carried
    out using our MatlabToolbox CAPTAIN
    http//www.es.lancs.ac.uk/cres/captain/.
Write a Comment
User Comments (0)
About PowerShow.com