Title: Advanced Techniques in Data Assimilation
Slide 1: Advanced Techniques in Data Assimilation
Richard Ménard, Meteorological Service of Canada
Department of Atmospheric and Oceanic Sciences, McGill University
GCC Summer School, Banff, May 22-28, 2004
Slide 2: Outline
1. Comparing observations with models
2. How do we get the error statistics
3. Covariance modelling
4. Fitting covariance models with innovation data
5. Analysis
6. When and why do we need advanced data assimilation
7. Analogy between numerical modelling and data assimilation
8. Kalman filtering
9. Computational simplification for the Kalman filter
10. Estimation of Q
11. Estimation of model error bias
12. Some thoughts about distinguishing model error bias from observation error bias
13. Some remarks and future directions
Slide 3: Some useful references
- Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press. (Very good textbook, but somewhat outdated.)
- Daley, R., and E. Barker, 2000: NAVDAS Source Book 2000. NRL Publication NRL/PU/7530-00-418, Naval Research Laboratory, Monterey, CA, 151 pp. (Technical description of an operational 3D-Var scheme with MPI implementation.)
- Ghil, M., et al. (eds.), 1997: Special Issue of the Second International Symposium on Assimilation of Observations in Meteorology and Oceanography. J. Meteorol. Soc. Japan, vol. 75. (Contains a number of review papers.)
- Kasibhatla, P., et al. (eds.): Inverse Methods in Global Biogeochemical Cycles. (Comes with a CD-ROM with Matlab exercises.)
- Course notes of Saroja Polavarapu, University of Toronto
- Course notes of Ricardo Todling, Introduction to Data Assimilation
Slide 4: New books
- Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press.
- Bennett, A., 2002: Inverse Modeling of the Ocean and Atmosphere. Cambridge University Press. (Computer codes can be obtained from an FTP site.)
Slide 5: 1. Comparing observations with models
In trying to validate a model formulation using observations, you have two choices:
1. Compare climate model runs, for which the sensitivity to the initial condition has disappeared. Then we compare the observation climatology with the model climatology.
   Problem: Even if the first and second moments (in practice only the variance) of the model and observations agree, this does not mean that we have instantaneous agreement between observation and model.
2. Attack the problem of specifying the initial condition using data assimilation.
   Problem: Even if an assimilation run sticks close to the truth, how can we disentangle whether the differences between the forecast and the analysis are due to initial conditions or to modelling error?
Slide 6: 2. How do we get the error statistics
2.1 The innovation method
The innovation d = y^o - H x^f has covariance E[d d^T] = H P^f H^T + R.
- Observation errors are spatially uncorrelated.
- Forecast error is the spatially correlated part of the innovation covariance.
Slide 7: Observation and forecast error variances
[Figures: observation and forecast error variances estimated from innovations; Mitchell et al. (1990)]
Slide 8:
If covariances are homogeneous, variances are independent of space. Covariances are not homogeneous.
If correlations are homogeneous, correlation lengths are independent of location. Correlations are not homogeneous.
Slide 9: Correlations are not isotropic
[Figures: Gustafsson (1981); Daley (1991)]
Slide 10: Innovation method: advantages and disadvantages
Advantages:
- Estimates an observation error that includes representativeness error.
- Provides reasonably accurate error statistics at observation locations and for the observed variables.
Disadvantages:
- The error statistics are incomplete for constructing Pf, because information is lacking at all grid points and for all variables.
In practice: To complete Pf (on the model grid) we fit the forecast error information obtained from innovations to a forecast error correlation model. A good fitting procedure for this is the maximum likelihood method, which gives unbiased estimates of the covariance parameters. The drawback is that most correlation models are homogeneous and isotropic; inhomogeneous correlation models are hard to come by, and the few tricks that have been used to construct them do not provide significantly superior analyses.
Slide 11: 2.2 The lagged-forecast method (NMC method)
[Timeline: forecasts launched at -48 h and -24 h, both valid at 0 h]
- Compares 24-h and 48-h forecasts valid at the same time
- Provides global, multivariate correlations with full vertical and spectral resolution
- Not used for variances
- Assumes forecast differences approximate forecast error
Why 24 h and 48 h?
- Starting from a 24-h forecast avoids spin-up problems
- The 24-h period is short enough to claim similarity with the 0-6 h forecast error; the final difference is scaled by an empirical factor
- The 24-h period is long enough that the forecasts are dissimilar despite the lack of data to update the starting analysis
- 0-6 h forecast differences reflect assumptions made in the OI background error covariances
Slide 12: Properties (Bouttier 1994)
- For linear H, and taking the difference between analyses (0-hour forecasts) and 6-h forecasts, the lagged-forecast method measures the difference between the forecast and analysis error covariances.
- The NMC method breaks down if there are no data between the launches of the two forecasts: with no data the lagged-forecast variance is zero!
- For observations at every grid point, and if observation and background error variances are equal, the lagged-forecast method produces estimates that are half of the true value.
Slide 13:
- The lagged-forecast error correlation remains close to the forecast error correlation for any observation-to-forecast error variance ratio, as long as the observations are dense.
Advantages:
- Provides global error statistics on all variables for Pf.
- Estimates of forecast error can be given in spectral space.
Disadvantages:
- Provides no information on observation error statistics.
- Only the information on correlations is reliable, and only provided the observation network is dense.
Slide 14: In practice
We combine both methods. The innovation method is used to estimate the observation error variance and the forecast error variance for the observed variables. The lagged-forecast method is used to estimate the error correlations for all variables on the global domain, and the error variances for the unobserved variables. However, since the observation density can alter the estimated error correlation, the lagged-forecast data are used to fit a homogeneous isotropic correlation model, which can be compactly represented in spectral space.
Slide 15: 3. Covariance modelling
Positive definite matrix (Horn and Johnson 1985, Chap. 7)
A real n x n symmetric matrix A is positive definite if c^T A c > 0 for any nonzero vector c. A is said to be positive semi-definite if c^T A c >= 0.
Properties:
- The sum of any positive definite matrices of the same size is also positive definite.
- Each eigenvalue of a positive definite matrix is a positive real number.
- The trace and determinant are positive real numbers.
Covariance matrix
The covariance matrix P of a random vector X = (X1, X2, ..., Xn)^T is the matrix P = (Pij) in which Pij = E[(Xi - E[Xi])(Xj - E[Xj])], where E is the mathematical expectation.
Slide 16: Property: a covariance matrix is positive semi-definite
Remarks:
1. It is often necessary in data assimilation to invert the covariance matrices, and thus we need to have positive definite covariances.
2. The positive definite property is a global property of a matrix, and it is not trivial to obtain.
Slide 17: Examples
a) Truncated parabola: the correlation matrix obtained for n = 4 and its eigenvalues. [Equations and eigenvalues not transcribed]
Slide 18: b) Triangle: the correlation matrix obtained for n = 4 and its eigenvalues. [Equations and eigenvalues not transcribed]
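The eigenvalue test of slides 17-18 can be reproduced numerically. Below is a minimal sketch (the grid spacing and cutoff length L are assumptions, not values taken from the slides): each candidate function is evaluated on a 4-point grid and the eigenvalues of the resulting matrix are inspected to see whether it is positive semi-definite.

```python
import numpy as np

def build_matrix(corr_fn, n=4, L=2.0, spacing=1.0):
    """Evaluate a candidate correlation function on an n-point 1D grid."""
    x = spacing * np.arange(n)
    r = np.abs(x[:, None] - x[None, :])          # pairwise distances
    return corr_fn(r, L)

def truncated_parabola(r, L):
    return np.maximum(1.0 - (r / L) ** 2, 0.0)

def triangle(r, L):
    return np.maximum(1.0 - r / L, 0.0)

for name, fn in [("truncated parabola", truncated_parabola), ("triangle", triangle)]:
    P = build_matrix(fn)
    eigvals = np.linalg.eigvalsh(P)              # symmetric matrix: real eigenvalues
    print(f"{name}: eigenvalues = {np.round(eigvals, 3)}, "
          f"all non-negative: {bool(np.all(eigvals >= -1e-12))}")
```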
Slide 19: Covariance functions (Gaspari and Cohn 1999)
Definition 1: A function P(r,r') is a covariance function of a random field X if P(r,r') = E[(X(r) - E[X(r)])(X(r') - E[X(r')])].
Definition 2: A covariance function P(r,r') is a function that defines positive semi-definite matrices when evaluated on any grid. That is, letting ri and rj be any two grid points, the matrix P whose elements are Pij = P(ri,rj) defines a covariance matrix when P is a covariance function.
The equivalence between Definitions 1 and 2 can be found in Wahba (1990, p. 1-2). The covariance function is also known as the reproducing kernel.
Remark: Suppose a covariance function is defined in a 3D space. Restricting the value of r to remain on a manifold (e.g. the surface of a unit sphere) will also define a covariance function, and a covariance matrix (e.g. a covariance matrix on the surface of a sphere).
Slide 20: Correlation function
A correlation function C(r,r') is a covariance function P(r,r') normalized by the standard deviations at the points r and r': C(r,r') = P(r,r') / (sigma(r) sigma(r')).
Homogeneous and isotropic correlation function
If a correlation function is invariant under all translations and all orthogonal transformations, then the correlation function becomes a function only of the distance between the two points, C(r,r') = C(|r - r'|).
Smoothness properties
- The continuity at the origin determines the continuity allowed on the rest of the domain. For example, if the first derivative is discontinuous at the origin, then a first-derivative discontinuity is allowed elsewhere (see the example with the triangle function).
Spectral representation
- On a unit circle: C(theta) = sum_m a_m cos(m*theta), where theta is the angle between the two position vectors and all the Fourier coefficients a_m are nonnegative.
- On a unit sphere: C(theta) = sum_m b_m P_m(cos theta), where P_m are the Legendre polynomials and all the Legendre coefficients b_m are nonnegative.
Slide 21: Examples of correlation functions
1. Spatially uncorrelated model (black): C(r) = 1 for r = 0, C(r) = 0 otherwise.
2. First-order auto-regressive model (FOAR) (blue): C(r) = exp(-|r|/L).
3. Second-order auto-regressive model (SOAR) (cyan): C(r) = (1 + |r|/L) exp(-|r|/L).
4. Gaussian model (red): C(r) = exp(-r^2 / (2 L^2)),
where L is the correlation length scale.
Slide 22: Exercise: construct a correlation matrix on a one-dimensional periodic domain
Consider a 2D Gaussian model on a plane, C(r1, r2) = exp(-|r1 - r2|^2 / (2 L^2)).
Along the circle of radius a, the (chordal) distance between two points with angular separation theta is |r1 - r2| = 2 a sin(theta/2).
Now define a coordinate x = a*theta along the circle; then we get the periodic Gaussian correlation
C(x1, x2) = exp( -(2 a^2 / L^2) sin^2( (x1 - x2) / (2a) ) ).
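A short numerical check of this exercise may help. The sketch below (grid size, radius a and length scale L are illustrative choices) builds the periodic correlation matrix from the chordal distance and verifies that its smallest eigenvalue is non-negative.

```python
import numpy as np

def periodic_gaussian_correlation(n=64, a=1.0, L=0.3):
    """Correlation matrix on a 1D periodic domain: a 2D Gaussian correlation
    restricted to a circle of radius a, using the chordal distance 2*a*sin(theta/2)."""
    x = np.linspace(0.0, 2.0 * np.pi * a, n, endpoint=False)   # coordinate along the circle
    dx = x[:, None] - x[None, :]
    chord2 = (2.0 * a * np.sin(dx / (2.0 * a))) ** 2           # squared chordal distance
    return np.exp(-chord2 / (2.0 * L ** 2))

C = periodic_gaussian_correlation()
print("min eigenvalue:", np.linalg.eigvalsh(C).min())          # expected >= 0 up to round-off
```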
Slide 23: Ways to define new covariances by transformation
[Equations not transcribed]
Slide 25: Examples in Gaspari and Cohn (1999)
The Hadamard product of an arbitrary covariance function with a compactly supported function is very useful in ensemble Kalman filtering to create compactly supported covariances.
Slide 26:
When formulated this way, state-dependent errors can be introduced in estimation theory for data assimilation.
E.g., to create a relative error formulation for observation error, the relative error is scaled by the forecast interpolated to the observation location (not by the observed value!).
Slide 27: 4. Fitting covariance models with innovation data: application of the maximum likelihood method
The method of maximum likelihood applies when we estimate a deterministic variable (no error statistics) from noisy measurements.
The method works as follows. Suppose that the innovation (O-F) covariance depends on a number of parameters alpha, such as the error variances, the error correlation length scales, etc. Assuming that the actual innovations are uncorrelated in time and Gaussian distributed, the conditional probability of the innovations for a given value of the covariance parameters is
p(nu_1, ..., nu_n | alpha) = prod_k (2 pi)^(-p/2) |S_k(alpha)|^(-1/2) exp( -(1/2) nu_k^T S_k(alpha)^(-1) nu_k ),
where S_k(alpha) = H P^f(alpha) H^T + R(alpha) is the innovation covariance.
Slide 28: The value of alpha which maximizes the probability p(nu_n, ..., nu_1 | alpha) is the maximum likelihood estimate.
- In practice, we work with the negative of the log of p(nu_n, ..., nu_1 | alpha), called the log-likelihood function L(alpha), and search for its minimum.
- The curvature of the log-likelihood function is a measure of the estimation accuracy. A pronounced curvature indicates a reliable estimate, and vice versa.
Slide 29: Example: Doppler imager HRDI
Coincident measurements in two channels: B band and gamma band.
[Figure not transcribed]
Slide 30: 5. Analysis
Definition: Best estimate (e.g. minimum error variance, maximum a posteriori) of an atmospheric model state using observations at a given time.
Estimators
Best Linear Unbiased Estimator (BLUE):
- Assumes that the estimate is a linear combination of observations and model state.
- Assumes that the observation error is uncorrelated with the model state error.
- Doesn't assume any knowledge of the probability density functions (e.g. Gaussian).
Using the linear form x^a = x^f + K (y^o - H x^f), and minimizing the analysis error variance Tr(P^a), we get
K = P^f H^T (H P^f H^T + R)^(-1).
Slide 31: Bayesian estimators: MV and MAP
Analytical expressions for p(x|y) can be found for Gaussian probabilities, lognormal probabilities, alpha-stable distributions (generalized Gaussians), and Poisson distributions. Analytical expressions may not be obtained with a mixture of p.d.f.'s.
- MAP estimator: the value of x that maximizes p(x|y). This estimator is the one used in variational data assimilation schemes, such as 3D-Var and 4D-Var.
- MV estimator: minimizes Tr(Pa), i.e. the conditional mean E[x|y]. This estimator is the one used in Kalman filtering, OI, and ensemble Kalman filtering.
Slide 32: Properties of the analysis
1. Reduces the error variance.
2. Interpolates the observation information to the model grid.
3. Filters out the small-scale noise.
Example 1: One observation
Let Pf be the periodic Gaussian correlation model of the previous example.
[Figures: analysis error variance and correlation along the periodic domain]
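To make the single-observation example concrete, here is a minimal sketch (the forecast and observation error variances are illustrative assumptions) that uses the periodic Gaussian correlation of the earlier exercise as Pf, assimilates one observation, and shows how the analysis error variance is reduced near the observation while remaining close to the forecast value far away.

```python
import numpy as np

def periodic_gaussian_correlation(n=64, a=1.0, L=0.3):
    x = np.linspace(0.0, 2.0 * np.pi * a, n, endpoint=False)
    dx = x[:, None] - x[None, :]
    return np.exp(-(2.0 * a * np.sin(dx / (2.0 * a))) ** 2 / (2.0 * L ** 2))

n = 64
Pf = 1.0 * periodic_gaussian_correlation(n)            # forecast error covariance (unit variance)
H = np.zeros((1, n)); H[0, 0] = 1.0                    # one observation at grid point 0
R = np.array([[0.25]])                                 # observation error variance

K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)         # BLUE / Kalman gain
Pa = (np.eye(n) - K @ H) @ Pf                          # analysis error covariance

print("analysis error variance at the obs point:", round(Pa[0, 0], 3))
print("analysis error variance far from the obs:", round(Pa[n // 2, n // 2], 3))
```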
Slide 33:
[Figures: analysis error variance when varying the correlation length scale]
Two observations: the reduction of error at one point can help reduce the error further at neighbouring points. This is what happens in 3D-Var vs. 1D-Var or retrievals.
Slide 34: Analysis of positive quantities
Even if the forecast error correlation is everywhere positive, and the observed and forecast quantities are positive, there is a possibility that the analysis value (away from the observation location) be negative.
Example: Consider two grid points located at x1 and x2, and a single observation y^o coinciding with grid point x1. The forecast error covariance between the two grid points can be written as P12 = rho sigma1 sigma2, where sigma1 and sigma2 are the error standard deviations at grid points 1 and 2, and rho is the error correlation, assumed to be positive. The analysis at grid point 2 is then x2^a = x2^f + rho sigma1 sigma2 / (sigma1^2 + sigma_o^2) * (y^o - x1^f).
If the forecast value at grid point 2 is close to zero, then if the innovation (y^o - x1^f) is negative, the analysis at grid point 2 can be negative.
Slide 35: Is using log x (the logarithm of the mixing ratio or concentration) in the analysis equation a reasonable solution?
The analysis in logs will produce a best estimate of the logarithm of the concentration, but we would have exp(E[log x]) = E[x] only if x were not a random variable (or if its variance were very small). If w is Gaussian distributed, then x = exp(w) is lognormally distributed.
Slide 36: Letting w = log x, the analysis equation is then of the form
w^a = w^f + B^f H^T (H B^f H^T + B^o)^(-1) (w^o + b - H w^f),
where the B's are the error covariances defined in log space, and b is an observational correction. b has a different form depending on whether we consider that it is the measurement in physical space that is unbiased, or that it is the measurement in log space that is unbiased.
When we consider that it is the measurement of the physical concentration variable that is unbiased, Cohn (1997) gives the explicit form of b. [Formula not transcribed]
Transformation of mean and covariance between log and physical quantities: for w ~ N(mu, sigma^2) and x = exp(w), E[x] = exp(mu + sigma^2/2) and var(x) = (exp(sigma^2) - 1) exp(2 mu + sigma^2).
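The point that the exponential of the best estimate in log space is not the best estimate of the physical variable can be checked numerically; below is a minimal sketch (mu and sigma are arbitrary illustration values).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0
w = rng.normal(mu, sigma, size=200_000)        # w = log x is Gaussian
x = np.exp(w)                                  # x is lognormal

print("exp(mean of log x):", round(np.exp(w.mean()), 3))   # ~ exp(mu) = 1.0
print("mean of x:         ", round(x.mean(), 3))           # ~ exp(mu + sigma^2/2) ~ 1.65
```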
Slide 37: When the observations are retrievals
Retrievals are usually 1D (vertical) analyses, using the radiance as the observation and a prior x^p with a given error covariance P^p.
[Figure: averaging kernel A for the limb sounder HRDI]
So in a data assimilation scheme the observation operator and observation error covariance involve the averaging kernel A.
Problem! A is usually rank deficient, so unless R is full rank, the innovation covariance cannot be inverted.
Solution: Use an SVD and use the singular vectors as observations (Joiner and da Silva 1998).
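The rank-deficiency argument can be illustrated schematically. The sketch below is not the exact Joiner and da Silva (1998) formulation, only an assumed simplification in which the retrieval behaves like y = A x + error: the SVD of a rank-deficient averaging kernel A is used to project the retrieval onto its leading left singular vectors, giving a transformed observation operator and a full-rank transformed observation error covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
n_levels, true_rank = 30, 6
# A rank-deficient averaging kernel, built here purely for illustration
A = rng.normal(size=(n_levels, true_rank)) @ rng.normal(size=(true_rank, n_levels))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))              # numerical rank of the averaging kernel
Ur = U[:, :r]                                  # leading left singular vectors

R = 0.1 * np.eye(n_levels)                     # illustrative retrieval error covariance
H_transformed = Ur.T @ A                       # observation operator for z = Ur^T y
R_transformed = Ur.T @ R @ Ur                  # r x r, full rank, hence invertible

print("rank of A:", r, "out of", n_levels)
print("transformed obs error covariance shape:", R_transformed.shape)
```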
Slide 38: 6. When and why do we need advanced data assimilation
- When the observation network is non-stationary, e.g. LEO (low Earth orbit) satellites.
[Figure: MLS assimilation of ozone using the sequential method]
Slide 39:
- When the data coverage is sparse: to increase information content with dynamically evolving error covariances.
[Figure: impact of HALOE observations (30 profiles per day and a 1/2-hour model time step)]
Slide 40:
- To infer model parameter values.
[Figure: estimates of photolysis rates during an assimilation experiment with simulated data: randomly disturbed photolysis rates (solid, thick), two-sigma bounds due to specified uncertainties (dashed), two-sigma bounds after assimilation of simulated measurements (solid)]
Slide 41:
- To infer unobserved variables. Here wind increments are developed from ozone observations in a 4D-Var scheme, although there is no error correlation between ozone and winds in the background error covariance.
Slide 42: 7. Data assimilation / numerical modelling analogy
- A data assimilation scheme can be unstable:
  - in data-sparse areas and with wrong error covariances.
- The numerical properties of the tangent linear model and adjoint matter:
  - e.g. with an advection scheme that is limited by the Courant number, the adjoint model is described by the flux scheme with a Lipschitz condition (creates numerical noise at the poles with a grid-point model).
- Parametrization of the error covariances:
  - Because of the size of the error covariances (e.g. 10^7 x 10^7) they cannot be stored, so they need to be modelled.
Slides 43-45:
- Initial forecast error covariance derived from zonal monthly climatology
- Retrieval error used as observation error
- No model error (assume perfect model)
[The same configuration is shown on three successive slides with different result figures]
Slide 46: 8. Kalman filtering
In a Kalman filter, the error covariance is dynamically evolved between the analyses, so that both the state vector and the error covariance are updated by the dynamics and by the observations.
Prediction:
x^f(t_n) = M x^a(t_{n-1}),  P^f(t_n) = M P^a(t_{n-1}) M^T + Q.
Analysis (when observations are available at t_n):
K = P^f H^T (H P^f H^T + R)^(-1),  x^a = x^f + K (y^o - H x^f),  P^a = (I - K H) P^f.
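The prediction-analysis cycle above can be written compactly in code. The following is a minimal sketch for a linear model: a simple circular shift stands in for advection, and the grid size, observation network, Q and R are illustrative assumptions only.

```python
import numpy as np

def kalman_filter_step(xf, Pf, y, H, R, M, Q):
    """One Kalman filter cycle: analysis at the current time, then prediction."""
    # Analysis
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)     # Kalman gain
    xa = xf + K @ (y - H @ xf)
    Pa = (np.eye(len(xf)) - K @ H) @ Pf
    # Prediction to the next analysis time
    return xa, Pa, M @ xa, M @ Pa @ M.T + Q

n = 40
M = np.roll(np.eye(n), 1, axis=0)            # shift operator = linear advection model
Q = 0.01 * np.eye(n)                         # assumed model error covariance
H = np.eye(n)[::8]                           # observe every 8th grid point
R = 0.25 * np.eye(H.shape[0])

rng = np.random.default_rng(1)
truth = np.sin(2 * np.pi * np.arange(n) / n)
xf, Pf = np.zeros(n), np.eye(n)              # initial forecast and error covariance

for step in range(20):
    y = H @ truth + rng.normal(scale=0.5, size=H.shape[0])
    xa, Pa, xf, Pf = kalman_filter_step(xf, Pf, y, H, R, M, Q)
    truth = M @ truth                        # evolve the "true" state
print("final mean analysis error variance:", np.trace(Pa) / n)
```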
Slide 47: Definition
The Kalman filter produces the best estimate of the atmospheric state given all current and past observations, and yet the algorithm is sequential in time, in the form of a predictor-corrector scheme.
From a Bayesian point of view, the Kalman filter constructs an estimate based on p(x_n | y_n, y_{n-1}, ..., y_1).
The time-sequential property of a Kalman filter is not easy to show.
Example: Estimating a scalar constant using no prior, and assuming white-noise observation error with constant error variance. The MV estimate can be shown to be the simple time average
x_hat_n = (1/n) sum_{k=1}^{n} y_k,
which can be re-written as a sequential scheme,
x_hat_n = x_hat_{n-1} + (1/n) (y_n - x_hat_{n-1}).
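A two-line numerical check of this equivalence (the true value 3.0 and the noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
y = 3.0 + rng.normal(scale=1.0, size=500)    # noisy measurements of a constant

x_hat = 0.0
for n, yn in enumerate(y, start=1):
    x_hat += (yn - x_hat) / n                # sequential (predictor-corrector) form

print(x_hat, y.mean())                       # the two estimates agree
```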
Slide 48: Prediction of errors
x^f(t_n) = M[x^a(t_{n-1})], where M is a model of the atmosphere which expresses our theoretical understanding, based on the physical laws of dynamics, radiation, etc.
Use the same model to represent the evolution of the true state x^t:
x^t(t_n) = M[x^t(t_{n-1})] + eps^m, where eps^m is called the model error (or modelling error).
Linearizing the model about the analysis state, i.e. M[x^t] ~ M[x^a] + M (x^t - x^a), gives the forecast error covariance prediction P^f(t_n) = M P^a(t_{n-1}) M^T + Q.
It is usually assumed that the model error is white, or equivalently that the forecast error can be represented as a first-order Markov process, in which case we can show that the covariance prediction equation above holds with Q = E[eps^m (eps^m)^T].
Slide 49: Results from the assimilation of CH4 observations from UARS (Ménard and Chang 2000)
[Figure: error variance vs. time for the Kalman filter and for OI and other sub-optimal schemes; curves shown for the forecast error with no assimilation, the observation error, the analysis error with no assimilation, OI, and the KF analysis error with evolving variance]
The accumulation of information over time is limited by model error, so that in practice we need to integrate a Kalman filter only over some finite time to obtain the optimal solution.
Slide 50: 8.2 Asymptotic stability and observability
- We say that we have observability when it is possible for a data assimilation system with a perfect model and perfect observations to determine a unique initial state from a finite time sequence of observations.
Example: one-dimensional advection with speed U over a periodic domain of length L. Continuous observations at a single point x0 over a time T = L/U determine uniquely the initial condition phi_0(x) = phi(x,0).
Theorem: If an assimilation system is observable and the model is imperfect (Q nonzero), then the sequence of forecast error covariances converges geometrically to an asymptotic covariance which is independent of the initial error covariance.
Slide 51: Information about error covariances is still needed!
Slide 52: 9. Computational simplification for the Kalman filter
Ensemble Kalman filtering (see the sketch after this list):
- Monte Carlo approach. Storage of the error covariance is never needed, except to compute the Kalman gain, which requires the storage of an n x p forecast error covariance matrix.
- Because of the spatial correlation of forecast errors, the sample size of the ensemble can be significantly reduced, and it is found that a sample size of 100-1000 gives reasonable results.
Reduced-rank formulation:
- By using a singular value decomposition of either the model matrix or the forecast error covariance, only a number of singular modes is needed to evolve the error covariances if the dynamics is unstable. Typically of the order of 100 modes may be needed.
- This formulation is not efficient for stable dynamics problems.
Variance evolving scheme for chemical constituents (discussed on the following slides).
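Below is a minimal sketch of the stochastic (perturbed-observation) ensemble Kalman filter analysis step referred to in the first item above; it forms Pf H^T directly from the ensemble perturbations, so the full n x n covariance is never stored. The state size, observation network and error variances are assumptions for illustration.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """Stochastic (perturbed-observations) EnKF analysis step.
    ensemble : (n_members, n_state) forecast states
    y, H, R  : observations, linear observation operator, obs error covariance
    """
    n_members = ensemble.shape[0]
    Xf = ensemble - ensemble.mean(axis=0)              # forecast perturbations
    Pf_HT = Xf.T @ (Xf @ H.T) / (n_members - 1)        # Pf H^T without forming Pf
    K = Pf_HT @ np.linalg.inv(H @ Pf_HT + R)           # Kalman gain
    # Perturbed observations give the correct analysis spread
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n_members)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T

rng = np.random.default_rng(2)
n_state, n_members = 40, 100
ensemble = rng.normal(size=(n_members, n_state))       # illustrative forecast ensemble
H = np.eye(n_state)[::8]
R = 0.25 * np.eye(H.shape[0])
y = np.zeros(H.shape[0])

analysis = enkf_analysis(ensemble, y, H, R, rng)
print("forecast spread:", ensemble.std(axis=0).mean(),
      "analysis spread:", analysis.std(axis=0).mean())
```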
Slides 53-57: Evolution of the forecast-error covariance for the advection equation
Daley, R., 1992: Forecast-error Statistics for Homogeneous Observation Networks. Mon. Wea. Rev., 120, 627-643.
- Kalman filter formulation
- M is a linear advection model
[Animation over five slides: the forecast error covariance matrix (elements i, j = 1, ..., N) at successive times; the final frame highlights the advection of the error variance]
Slide 58: Theory
Mass conservation: dn/dt + n div(v) = 0, where n is the number density (molecules m^-3).
The mixing ratio mu is a conserved quantity: d(mu)/dt = 0 following the flow.
Lagrangian description: trajectories dX/dt = v(X, t); conservation mu(X(t), t) = mu(X(0), 0).
Slide 59: Evolution of errors
The mixing ratio error eps is also conserved along trajectories.
Evolution of the error covariance: the covariance of mixing ratio error between a pair of points X1 and X2 is P(X1, X2, t) = E[eps(X1, t) eps(X2, t)].
Multiplying the conservation equation for eps(X1, t) by eps(X2, t), and likewise for the eps1 d(eps2)/dt term, and taking the expectation gives dP(X1, X2, t)/dt = 0 following the pair of trajectories:
the error covariance is conserved (without model error).
Slide 60: Forecast error variance equation
The equation dP(X1, X2, t)/dt = 0 has two characteristic equations, dX1/dt = v(X1, t) and dX2/dt = v(X2, t), whose solutions are the trajectories X1(t) and X2(t).
Since no trajectories will cross each other, if initially X1 and X2 coincide, then the characteristics will be coincident at all times. It follows that the error variance sigma^2(x, t) = P(x, x, t) obeys the following variance evolution equation:
d(sigma^2)/dt = 0 following the flow, i.e. the error variance is simply advected.
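A trivial numerical illustration of this variance evolution equation (constant wind and integer Courant number, so the advection reduces to an exact grid shift; the grid and wind are arbitrary choices):

```python
import numpy as np

# Error variance advected by a constant wind on a periodic 1D grid.
n_steps, shift = 20, 1
var = np.exp(-0.5 * ((np.arange(100) - 30) / 5.0) ** 2)   # initial variance "bump"

for _ in range(n_steps):
    var = np.roll(var, shift)          # sigma^2(x, t+dt) = sigma^2(x - U dt, t)

print("peak location moved from 30 to", int(np.argmax(var)))
print("peak value unchanged:", var.max())
```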
Slide 61: Analysis error variance: approximation by sequential processing (Dee 2002)
- Although the KF scheme is sequential in time, the state estimate is optimal, accounting for all observations since the beginning of the integration.
- Observations can be processed one at a time (sequential processing of observations).
Updating the whole Pa is costly. Dee (2002) suggests updating only the error variance (keeping the error correlation fixed) in the sequential processing of the observations.
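A minimal sketch of this variance-only sequential update (the Gaussian correlation, the 1D grid and the observation error variance are illustrative assumptions, not the configuration used by Dee 2002): each observation is processed one at a time, and only the error variances are updated while the correlation matrix is kept fixed.

```python
import numpy as np

def variance_only_update(var, corr, obs_idx, obs_err_var):
    """Sequential processing of observations, updating only the error variances
    while keeping the error correlation matrix fixed."""
    var = var.copy()
    for k, r in zip(obs_idx, obs_err_var):
        s = var[k] + r                                   # innovation variance at the obs point
        # Pf_ik = corr[i,k]*sqrt(var_i*var_k); Pa_ii = var_i - Pf_ik^2 / s
        var = var - (corr[:, k] ** 2) * var * var[k] / s
    return var

x = np.linspace(0.0, 10.0, 101)
corr = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 1.0 ** 2))   # fixed Gaussian correlation
var_f = np.ones_like(x)
var_a = variance_only_update(var_f, corr,
                             obs_idx=range(0, 101, 10), obs_err_var=[0.25] * 11)
print("min / max analysis error variance:", var_a.min(), var_a.max())
```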
Slide 62: Analysis error estimation: an illustration
Observing 10 grid point values on a 1D grid. Approximation errors arise because correlations are not updated.
Slide 63:
2D Kalman filter code (parallel MPI code, Eulerian and Lagrangian formulations):
ftp://dao.gsfc.nasa.gov/pub/papers/lyster/lagn_euln_hpcc_r_1_0_0.tar.gz
3D sequential method: http://acd.ucar.edu/boris/research.htm
- The variance evolving scheme (also known as the sequential scheme) is the most effective method to simplify the Kalman filter.
- This scheme is limited to hyperbolic problems, but not to univariate problems. For instance, we can address the problem of assimilating several chemically reacting species using the same approach.
- Remark: Computational simplification for parabolic problems is currently being investigated.
Slide 64: 10. Estimation of Q for the Kalman filter
Chi-square is easily obtained in:
- 3D-Var: chi^2 = 2 J_min
- PSAS: chi^2 = nu^T (H P^f H^T + R)^(-1) nu, available from the PSAS solution vector
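As a hedged illustration of how the chi-square diagnostic can be used to check (and hence tune) the specified error covariances, the sketch below compares the mean chi-square of simulated innovations against the number of observations; the inflation factor and array sizes are arbitrary.

```python
import numpy as np

def mean_chi2(innovations, S):
    """Mean chi-square of the innovations; close to the number of observations p
    when the specified innovation covariance S = H Pf H^T + R is consistent."""
    Sinv = np.linalg.inv(S)
    return np.einsum('ti,ij,tj->t', innovations, Sinv, innovations).mean()

rng = np.random.default_rng(6)
p = 10
S_assumed = np.eye(p)                        # innovation covariance assumed by the assimilation
innov = rng.multivariate_normal(np.zeros(p), 1.5 * S_assumed, size=1000)

print("mean chi-square:", round(mean_chi2(innov, S_assumed), 2), "vs p =", p)
# A mean chi-square larger than p suggests the forecast error (and hence Q) is
# underestimated; smaller than p suggests it is overestimated.
```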
Slide 65: The lagged-forecast method revisited
Slide 66: Taking the average over a day shows that the observation contribution balances the model error contribution.
In this context, the lagged-forecast method actually provides information about the model error.
Slide 67: 11. Estimation of model error bias
The transport of a chemical tracer is governed by dc/dt = -L(c) + S, where L represents the advection, diffusion, and chemical lifetime, and S is the source.
Slide 68: Measurement and analysis
- Let y^o be an observation, or set of observations, at time t_n, and H the interpolation (or observation) operator: y^o_n = H c^t_n + eps^o, where the superscript t denotes the true value (or value we are estimating), and eps^o is the observation error.
- There are two parameters that we are estimating on the basis of observations: the initial state C_0 and the source S.
- Let's consider a state-augmented variable z = (c, S)^T.
- Suppose we have only observations of the concentration. The measurement equation becomes y^o_n = H~ z^t_n + eps^o, where H~ acts only on the concentration part of z,
- and the state equation becomes z_{n+1} = M~ z_n, where M~ propagates the concentration (with its source term) and keeps the source constant in time.
Slide 69: Analysis
- Even if initially there is no correlation between the state error and the source error, the effect of transport creates this coupling, so the forecast error covariance of the augmented state has the block form with elements sigma_cc, sigma_cs and sigma_ss.
- Let's calculate the analysis error covariance from Pa = (I - K H~) Pf.
- For a single concentration observation with error variance r, the gain for the state estimate is K_c = sigma_cc / (sigma_cc + r), and the gain for the source estimate is K_s = sigma_cs / (sigma_cc + r).
- It is from the correlation between the state error and the source error that observational information can be projected onto the source.
Slide 71:
- We note that the measurement of the state is equally efficient for reducing the <cc> and <cs> error covariances.
We thus observe the following:
1. When the state-source error is uncorrelated, i.e. sigma_cs = 0, then there is no variance reduction of the source error.
Slide 72: Summary
The system we are studying has two unknowns, the (initial) state and the source. The effect of observing is encapsulated in the state-state, state-source, and source-source error covariances. In particular, we note that we can't gain any information about the source if there is no cross state-source error correlation. Ironically, the effect of observing is to reduce the cross state-source error correlation. Finally, we note that the state error variance is reduced more by observations than the source error variance is.
Then, after a bit of algebra, we find that the analysis state-source error correlation is smaller in magnitude than the forecast one; that is, the effect of observing the state reduces the cross state-source error correlation.
Slide 73: Forecast
- Let's continue with the scalar model, and parametrize the model evolution by the simple equation dc/dt = -L c + S, where L is a constant that represents chemical lifetime and diffusion effects.
- Also, let's consider a constant source.
- In order to compute the evolution of error, a model of the truth is needed. To simplify, let's assume that the main uncertainty arises from the modelling of the source dynamics rather than from the modelling of the transport, dc^t/dt = -L c^t + S^t, where the superscript t denotes the true value.
- The error thus follows the same equation, written for the departures from the truth (denoted by tildes).
Slide 76: Numerical experiment
[Figures: no assimilation vs. with assimilation, R Q100]
Slide 77: Time correlation of errors: a way to increase sensitivity to the sources
- In synthetic inversions of the source, long time-integration periods are used. Let's examine how longer time integrations may carry more information about the source. For this we will examine the time correlations.
- Using the formal solutions of the error equations, we get the time correlations. [Expressions not transcribed]
Slide 78: Numerical experiment (same conditions as before)
[Figure: red, gain for the source; blue, gain for the state]
Slide 79: 12. Some thoughts about distinguishing model error bias from observation error bias
- Model error bias is a growing (time-evolving) quantity, in contrast with observation error bias, which is nearly constant in time.
- From a strict point of view of estimation theory, there is a fundamental difference between observation error estimation and model error estimation: in model error estimation, the model bias is an unobserved variable, whereas in observation bias estimation, the observation bias is an observed variable.
Slide 80: 13. Some remarks and future directions
- Data assimilation is part of nearly all geophysical and planetary science investigations.
- Especially with advanced data assimilation, the problem becomes that of estimating the model error and observation error, thus touching upon the scientific question of distinguishing model error from observation error.
- It is through lagged-forecast error covariances (e.g. 4D-Var) that we can get a lot of sensitivity to model error.
- In order to make progress in data assimilation, there is a need to develop systematic ways to determine all the error statistics.
Slide 81: Useful formulas
Differentiation with inner products: d(a^T x)/dx = a;  d(x^T A x)/dx = (A + A^T) x.
Differentiation with traces of matrices: d tr(A B)/dA = B^T;  d tr(A B A^T)/dA = A (B + B^T).
Slide 82: Sherman-Morrison-Woodbury formula (matrix inversion lemma)
(A + U C V)^(-1) = A^(-1) - A^(-1) U (C^(-1) + V A^(-1) U)^(-1) V A^(-1).