Title: Untitled slide
1 A Latent Approach to the Statistical Analysis of Space-time Data
Dani Gamerman
Instituto de Matemática, Universidade Federal do Rio de Janeiro, Brasil
http://acd.ufrj.br/dani
17th International Workshop on Statistical Modelling
Chania, Crete, Greece, 8-12 July 2002
2 World Cup Algorithm
? 2006
3 A Latent Approach to the Statistical Analysis of Space-time Data
Dani Gamerman
Instituto de Matemática, Universidade Federal do Rio de Janeiro, Brasil
http://acd.ufrj.br/dani
Joint work with:
Marina S. Paez (IM-UFRJ)
Flavia Landim (IM-UFRJ)
Victor de Oliveira (Arkansas)
Alan Gelfand (Connecticut)
Sudipto Banerjee (Minnesota)
17th International Workshop on Statistical Modelling
Chania, Crete, Greece, 8-12 July 2002
4 Introduction
Environmental science often produces data in the form of a collection of time series that are geographically referenced.
Similar data can be found in other areas.
Examples:
1) measurements of pollutants in time over a set of monitoring stations
2) selling prices of properties around a neighborhood of interest
3) counts of morbidity/mortality events in time over a collection of geographic regions
5 Main objective: spatial interpolation
6 The picture showed mean interpolated values.
Other features of interest can also be obtained.
Example: pollution in Rio de Janeiro
$\mathrm{Prob}(\mathrm{PM10} > 100\ \mu g/m^3 \mid Y^{obs})$
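A minimal sketch of how such an exceedance probability could be estimated, assuming posterior predictive draws of PM10 at each grid point are already available. The function name, the array layout and the toy numbers are hypothetical and are not taken from the talk; the probability is simply approximated by the fraction of draws above the threshold.

```python
import numpy as np

def exceedance_probability(draws, threshold):
    """Monte Carlo estimate of P(Y > threshold | Yobs) at each grid point.

    draws: array of shape (n_draws, n_grid_points) holding posterior
           predictive samples (hypothetical input, assumed available).
    """
    draws = np.asarray(draws, dtype=float)
    # Fraction of draws strictly above the threshold, per grid point.
    return (draws > threshold).mean(axis=0)

# Toy illustration with made-up draws for 3 grid points.
toy_draws = np.array([[ 90.0, 120.0, 101.0],
                      [110.0, 130.0,  99.0],
                      [ 95.0, 105.0, 100.0],
                      [ 80.0, 140.0, 102.0]])
probs = exceedance_probability(toy_draws, threshold=100.0)
```

The same draws can be reused for other summaries (means, quantiles), which is why working with samples rather than a single interpolated surface is convenient.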
7 Spatial Interpolation
$m$: number of observed sites; $g$: number of grid points
$s_1, \ldots, s_m$: observed sites
$s_1^n, \ldots, s_g^n$: grid points (to interpolate)
$Y_1^n, \ldots, Y_g^n$: observations at the grid points
8 $\psi$: all model parameters; $Y^{mis}$: missing data, treated as parameters
If $\psi$ is fixed at a known value with probability 1, then
- Obtain $P(Y^n \mid Y^{obs})$ by simulation.
Steps to generate from $Y^n \mid Y^{obs}$:
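9 Gaussian Process (GP) (or Gaussian Random Field)
$S$: region of $R^p$ (in general, $p = 2$)
$\{ w(s) : s \in S \}$ is a GP if, $\forall n$ and $s_1, \ldots, s_n \in S$,
$( w(s_1), \ldots, w(s_n) ) \sim N_n(\mu, \Sigma)$
where $\mu = (\mu_1, \ldots, \mu_n)$ with $\mu_i = E[w(s_i)]$
and $\Sigma = ( \sigma_i \sigma_j \, \rho[w(s_i), w(s_j)] )_{i,j}$
Usual simplifications:
1) Isotropy: $\rho[w(s_i), w(s_j)] = \rho_\theta(h_{ij})$ with $h_{ij} = |s_i - s_j|$
2) Homoscedasticity: $\sigma_i = \sigma$, $\forall i$
Notation: $w(\cdot) \sim GP(\mu(\cdot), \sigma^2, \rho_\theta)$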
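To make the definition concrete, here is a minimal sketch of one draw from an isotropic, homoscedastic GP at a finite set of sites. It assumes a constant mean and the exponential correlation $\rho_\theta(h) = \exp(-\theta h)$; the site coordinates, parameter values and function names are hypothetical choices for illustration.

```python
import numpy as np

def exponential_correlation(sites, theta):
    """Isotropic exponential correlation matrix, rho(h) = exp(-theta * h)."""
    diff = sites[:, None, :] - sites[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    return np.exp(-theta * h)

def simulate_gp(sites, mu, sigma2, theta, rng):
    """One draw of (w(s_1), ..., w(s_n)) ~ N_n(mu * 1, sigma2 * R_theta)."""
    n = sites.shape[0]
    cov = sigma2 * exponential_correlation(sites, theta)
    # Small diagonal jitter keeps the Cholesky factorization stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    return mu + chol @ rng.standard_normal(n)

rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 10.0, size=(6, 2))   # hypothetical sites in R^2
w = simulate_gp(sites, mu=0.0, sigma2=1.0, theta=0.5, rng=rng)
```

Larger values of `theta` make the correlation decay faster with distance, so the simulated surface becomes rougher.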
10 Statistical Analysis
Starting point: regression models
$Y_t(s) = \mu_t(s) + e_t(s)$
where $\mu_t(s) = \beta_0 + \beta_1 X_{t1}(s) + \cdots + \beta_p X_{tp}(s)$
and $e_t(s) \sim N(0, \sigma_e^2)$, independent
Suppose that the $X_{tj}(s)$ handle temporal autocorrelation. Otherwise, we can include a temporal component.
Usually $e_t(s)$ remains spatially correlated.
In this case, $e_t(s) = e_0(s) + e_{t1}(s)$
$e_0(s)$: spatially correlated errors
$e_{t1}(s)$: pure residual (white noise)
$\Rightarrow \beta_0(s) = \beta_0 + e_0(s)$
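11 Traditional approach: geostatistical
$\beta_0(\cdot) \sim GP(\beta_0, \sigma^2, \rho_\theta)$ or $e_0(\cdot) = \beta_0(\cdot) - \beta_0 \sim GP(0, \sigma^2, \rho_\theta)$
then $\beta_0^{obs} \sim N(\beta_0 1, \sigma^2 R)$, with $\beta_0^{obs} = (\beta_0(s_1), \ldots, \beta_0(s_m))$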
Hyperparameters: $\sigma_e^2$, $\sigma^2$ and $\theta$
Inference: 1. At first (3 steps)
(a) $\beta_0, \beta_1, \ldots, \beta_p$ are estimated in the regression model and the residuals $r_{t0}(s) = Y_t(s) - \mu_t(s)$ are constructed
(b) $\sigma_e^2$, $\sigma^2$ and $\theta$ are estimated from $r_{t0}(s)$
12 3) Natural solution (Kitanidis, 1986; Handcock and Stein, 1993):
- specify a distribution for the hyperparameters
- perform Bayesian inference
13 Model generalization
Recall: $E[Y_t(s)] = \beta_0(s) + \beta_1 X_{t1}(s) + \cdots + \beta_p X_{tp}(s)$
Spatial heterogeneity does not have to be restricted to $\beta_0$.
Example: site-by-site effect of temperature in the Rio pollution data
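14 We can accommodate spatial variation for the other coefficients $\beta_j$, $j = 1, \ldots, p$.
Extension of the previous model:
$E[Y_t(s)] = \beta_0(s) + \beta_1(s) X_{t1}(s) + \cdots + \beta_p(s) X_{tp}(s)$
Previous model:
$E[Y_t(s)] = \beta_0(s) + \beta_1 X_{t1}(s) + \cdots + \beta_p X_{tp}(s)$
$\beta(\cdot) = (\beta_0(\cdot), \beta_1(\cdot), \ldots, \beta_p(\cdot))$
One possibility: $\beta(\cdot) \sim GP(\gamma, \Sigma, \rho_\theta)$
Hyperparameters: $\Psi = (\gamma, \sigma_e^2, \Sigma, \theta)$, where $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_p)$
Special cases for the $\beta_j(\cdot)$'s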
15 How to estimate $\beta_j(s)$, $j = 0, 1, \ldots, p$?
2) Natural solution: specify a prior distribution for $\Psi$
In general, independent and non-informative priors are used.
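16 Model Summary
- Parameters: $(\beta^{obs}, \Psi)$ where $\Psi = (\gamma, \sigma_e^2, \Sigma, \theta)$
- $\beta_j^{obs} = (\beta_j(s_1), \ldots, \beta_j(s_m))$, $j = 0, 1, \ldots, p$
- $\beta^{obs} = (\beta_0^{obs}, \ldots, \beta_p^{obs})$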
- $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_p)$
- Data: $Y^{obs} = (Y_1(s_1), \ldots, Y_T(s_m))$
- $X^{obs} = (X_1(s_1), \ldots, X_T(s_m))$
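17 Simulated data
$Y_t(s) = \mu_t(s) + e_t(s)$, $t = 1, \ldots, 30$
$\mu_t(s) = \beta_0(s) + \beta_1(s) X_t(s)$
$e_t(s) \sim N(0, \sigma_e^2)$ independent, with $\sigma_e^2 = 1$
$\beta_0 \sim N(\gamma_0, \sigma_0^2, \rho_0(\cdot))$
$\beta_1 \sim N(\gamma_1, \sigma_1^2, \rho_1(\cdot))$
$X_t(s) \sim N(\gamma_2, \sigma_2^2, \rho_2(\cdot))$, for all time $t$
Exponential correlation functions: $\rho_j(x) = \exp\{-\theta_j x\}$
$\gamma_0 = 100$, $\gamma_1 = 5$, $\gamma_2 = 0$
$\theta_0 = 0.4$, $\theta_1 = 0.8$, $\theta_2 = 1.5$
$\sigma_0^2 = 0.1$, $\sigma_1^2 = 1$, $\sigma_2^2 = 0.333$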
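The simulation design on this slide can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used for the talk: the sites are placed on a 5 x 5 regular grid with unit spacing (the spacing is an assumption), the three fields are drawn independently of each other, and the helper names are hypothetical.

```python
import numpy as np

def exp_corr(sites, theta):
    """Exponential correlation matrix exp(-theta * distance)."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    return np.exp(-theta * d)

def draw_field(sites, mean, var, theta, rng):
    """One Gaussian field draw with constant mean and variance."""
    cov = var * exp_corr(sites, theta)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(sites)))
    return mean + chol @ rng.standard_normal(len(sites))

rng = np.random.default_rng(2002)
# Assumed layout: 5 x 5 regular grid with unit spacing (25 sites).
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
sites = np.column_stack([gx.ravel(), gy.ravel()])

T = 30
gamma = (100.0, 5.0, 0.0)      # means of beta_0, beta_1 and X
theta = (0.4, 0.8, 1.5)        # correlation decay parameters
sigma2 = (0.1, 1.0, 0.333)     # variances of beta_0, beta_1 and X
sigma2_e = 1.0                 # observation error variance

beta0 = draw_field(sites, gamma[0], sigma2[0], theta[0], rng)
beta1 = draw_field(sites, gamma[1], sigma2[1], theta[1], rng)
# A fresh covariate field for every time point.
X = np.stack([draw_field(sites, gamma[2], sigma2[2], theta[2], rng)
              for _ in range(T)])
mu = beta0 + beta1 * X          # shape (T, 25) by broadcasting
Y = mu + np.sqrt(sigma2_e) * rng.standard_normal(mu.shape)
```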
18 Simulated Data
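19 Inference
Parameters: $(\beta^{obs}, \Psi)$, $\Psi = (\gamma, \sigma_e^2, \Sigma, \theta)$
Likelihood: $L(\beta^{obs}, \Psi) = p(Y^{obs} \mid \beta^{obs}, \sigma_e^2)$
Prior: $p(\beta^{obs}, \Psi) = p(\beta^{obs} \mid \Psi)\, p(\gamma)\, p(\sigma_e^2)\, p(\Sigma)\, p(\theta)$
Posterior: $\pi(\beta^{obs}, \Psi) \propto L(\beta^{obs}, \Psi) \times p(\beta^{obs}, \Psi)$
- Complicated functional form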
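For reference, the likelihood can be written out explicitly. This is a sketch that assumes every site is observed at every time point (no missing data), with $T$ time points, $m$ sites, independent normal errors of variance $\sigma_e^2$, and $\beta_j(s_i)$ denoting the value of the $j$-th coefficient at site $s_i$:

```latex
L(\beta^{obs}, \Psi) = p(Y^{obs} \mid \beta^{obs}, \sigma_e^2)
  = \prod_{t=1}^{T} \prod_{i=1}^{m}
    \frac{1}{\sqrt{2\pi\sigma_e^2}}
    \exp\left\{ -\frac{1}{2\sigma_e^2}
      \Big( Y_t(s_i) - \beta_0(s_i)
            - \sum_{j=1}^{p} \beta_j(s_i) X_{tj}(s_i) \Big)^2 \right\}
```

The likelihood itself is simple; the complication in the posterior comes from the prior on the coefficients at the observed sites, whose covariance depends on the spatial parameters.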
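20 Full Conditionals
(a) $\beta^{obs} \mid \text{rest} \sim$ Normal
(b) $\gamma \mid \text{rest} \sim$ Normal
→ use the $\beta_j^{obs}$ as if they were data
(c) $\sigma_e^2 \mid \text{rest} = \sigma_e^2 \mid Y^{obs}, \beta^{obs} \sim$ Inverse Gamma
(d) $\Sigma \mid \text{rest} = \Sigma \mid \beta^{obs}, \gamma, \theta \sim$ Inverse Wishart
(e) $\theta \mid \text{rest} \propto p(\beta^{obs} \mid \gamma, \Sigma, \theta)\, p(\theta)$
→ again, use the $\beta_j^{obs}$ as data (geostatistical analysis)
→ hard to sample → Metropolis-Hastings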
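The Metropolis-Hastings step for the correlation parameter can be sketched as follows. This is a simplified illustration, not the sampler used in the talk: it treats a single coefficient process with exponential correlation, uses a random walk on the log scale, and assumes an Exponential prior on the correlation parameter. The prior, the step size and all function names are assumptions made for the example.

```python
import numpy as np

def log_density_beta(beta_obs, gamma, sigma2, theta, dist):
    """log N(beta_obs; gamma * 1, sigma2 * R_theta), R_theta = exp(-theta * dist)."""
    m = len(beta_obs)
    cov = sigma2 * np.exp(-theta * dist) + 1e-10 * np.eye(m)
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, beta_obs - gamma)
    return -0.5 * z @ z - np.log(np.diag(chol)).sum() - 0.5 * m * np.log(2 * np.pi)

def mh_step_theta(theta, beta_obs, gamma, sigma2, dist, rng,
                  step=0.3, prior_rate=1.0):
    """One random-walk Metropolis-Hastings update of theta on the log scale.

    Assumption (not from the source): theta ~ Exponential(prior_rate).
    """
    proposal = theta * np.exp(step * rng.standard_normal())

    def log_target(th):
        # log likelihood + log prior + log Jacobian of the log transform
        return (log_density_beta(beta_obs, gamma, sigma2, th, dist)
                - prior_rate * th + np.log(th))

    if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
        return proposal, True
    return theta, False

# Toy run: beta_obs stands in for the current MCMC value of the coefficients.
rng = np.random.default_rng(1)
sites = rng.uniform(0.0, 5.0, size=(10, 2))
dist = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
beta_obs = rng.standard_normal(10)
theta, chain = 1.0, []
for _ in range(200):
    theta, accepted = mh_step_theta(theta, beta_obs, 0.0, 1.0, dist, rng)
    chain.append(theta)
```

Proposing on the log scale keeps the parameter positive without rejecting out-of-range proposals; the `np.log(th)` term accounts for the resulting asymmetry of the proposal.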
21 Results (based on a regular grid of $m = 25$ sites)
22 Spatial Interpolation
Interpolation grid: $s_1^n, \ldots, s_g^n$
$\beta_j^n = (\beta_j(s_1^n), \ldots, \beta_j(s_g^n))$, $j = 0, 1, \ldots, p$
$\beta^n = (\beta_0^n, \ldots, \beta_p^n)$
We need to interpolate the $\beta_j$'s in order to interpolate $Y^n$.
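23 Spatial Interpolation of the $\beta$'s
$\pi(\beta^n, \beta^{obs}, \Psi \mid Y^{obs}) = \pi(\beta^n \mid \beta^{obs}, \Psi, Y^{obs})\, \pi(\beta^{obs}, \Psi \mid Y^{obs}) = \pi(\beta^n \mid \beta^{obs}, \Psi)\, \pi(\beta^{obs}, \Psi \mid Y^{obs})$
Simulation of $\beta^n \mid Y^{obs}$ in 2 steps:
(a) $\beta^{obs}, \Psi \mid Y^{obs}$ → using MCMC
(b) $\beta^n \mid \beta^{obs}, \Psi$ → using Multivariate Normal
Interpolation of the $Y$'s
$\pi(Y^n, \beta^n, \Psi \mid Y^{obs}) = \pi(Y^n \mid \beta^n, \Psi, Y^{obs})\, \pi(\beta^n, \Psi \mid Y^{obs}) = \pi(Y^n \mid \beta^n, \Psi)\, \pi(\beta^n, \Psi \mid Y^{obs})$
Simulation of $Y^n \mid Y^{obs}$ also in 2 steps:
(a) $\beta^n, \Psi \mid Y^{obs}$ → MCMC and Spatial Interpolation
(b) $Y^n \mid \beta^n, \Psi$ → using Multivariate Normal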
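Step (b) in each case is a draw from a conditional multivariate normal. The sketch below shows one way this could look for a single coefficient process with exponential correlation, treating the coefficients as independent of each other; that independence, the one-covariate form of the second function, and the function names are simplifying assumptions made for illustration. In practice the function would be called once per retained MCMC draw.

```python
import numpy as np

def exp_corr(a, b, theta):
    """Exponential correlation between two sets of sites."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-theta * d)

def sample_beta_new(beta_obs, sites_obs, sites_new, gamma, sigma2, theta, rng):
    """Draw beta^n | beta^obs, Psi for a single coefficient process.

    Uses the conditional distribution of a multivariate normal:
      mean = gamma + R_no R_oo^{-1} (beta_obs - gamma)
      cov  = sigma2 * (R_nn - R_no R_oo^{-1} R_on)
    """
    R_oo = exp_corr(sites_obs, sites_obs, theta)
    R_no = exp_corr(sites_new, sites_obs, theta)
    R_nn = exp_corr(sites_new, sites_new, theta)
    A = np.linalg.solve(R_oo, R_no.T).T            # R_no R_oo^{-1}
    mean = gamma + A @ (beta_obs - gamma)
    cov = sigma2 * (R_nn - A @ R_no.T)
    # Symmetrize and add jitter so the Cholesky factorization succeeds.
    cov = 0.5 * (cov + cov.T) + 1e-8 * np.eye(len(sites_new))
    return mean + np.linalg.cholesky(cov) @ rng.standard_normal(len(sites_new))

def sample_y_new(beta0_new, beta1_new, x_new, sigma2_e, rng):
    """Draw Y^n | beta^n, Psi: independent normals around the regression mean."""
    mean = beta0_new + beta1_new * x_new
    return mean + np.sqrt(sigma2_e) * rng.standard_normal(mean.shape)
```

A quick sanity check of the conditional formula: if a grid point coincides with an observed site, the conditional variance collapses to (almost) zero and the draw reproduces the observed coefficient value.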
26 Interpolation of the $X$'s
These interpolations assume that the covariates $X_j$ are available at the interpolation sites for $j = 1, \ldots, p$.
Otherwise, we must interpolate them as well.
28 Results obtained by interpolating $X$
Precisions less sparse than when $X$ is known
31 Application to the pollution data
$Y_t(s)$: square root of PM10 at site $s$ and time $t$
$X_t$ = (MON, TUE, WED, THU, FRI, SAT)
$e_t(s)$ independent $N(0, \sigma_e^2)$
$\beta_0 \sim N(\gamma_0, \sigma_0^2, \rho_0(\cdot))$
$\beta_1 \sim N(\gamma_1, \sigma_1^2, \rho_1(\cdot))$
$\rho_i(\cdot)$, $i = 0, 1$, are exponential correlation functions
32 Results for the pollution data in Rio
33 Interpolation of the $\beta_1$ coefficient
Prior for $\tau$
$c$: obtained by exploratory site-by-site analysis (OLS)
The same idea can be used for $\theta$ (exploratory geostatistical analysis)
34 We can also accommodate temporal variation of the coefficients $\beta_j$, $j = 0, \ldots, p$.
Natural specification: $\beta_t(\cdot) \mid \gamma_t \sim GP(\gamma_t, \Sigma, \rho_\theta)$, independent in time
The model must be completed with
(a) a prior for $(\sigma_e^2, \Sigma, \theta)$ → as before
(b) a specification of the temporal evolution of the $\gamma_t$'s
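35 Suggestion: use dynamic models (SVP/TVM) (Landim and Gamerman, 2000)
$\gamma_t \mid \gamma_{t-1} \sim N(G_t \gamma_{t-1}, W_t)$
The $\gamma$ evolution involves unknown parameters.
Model parameters: $\beta^{obs}$, $\gamma$, $\Psi$
$\Psi = (\gamma_0, \sigma_e^2, \Sigma, \theta, W)$
where $\gamma = (\gamma_1, \ldots, \gamma_T)$ and $\gamma_t = (\gamma_{t0}, \gamma_{t1}, \ldots, \gamma_{tp})$, $t = 1, \ldots, T$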
Simulation cycle has 2 changes:
I) an additional step for $\gamma$
II) a modified step for $\Psi$
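36 Application to simulated data
Multivariate observations: $Y_t(s) = (Y_{t1}(s), Y_{t2}(s))$
$Y_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + e_t(s)$
$\beta_t(\cdot) \sim GP(\gamma_t, \Sigma, \rho_\theta)$
$\gamma_t = \gamma_{t-1} + w_t$
Same spatial correlation for $\beta_0$ and $\beta_1$
$\rho(\cdot)$: exponential correlation function with $\theta = 1$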
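A minimal sketch of how data from a model of this kind could be generated. For simplicity it uses a scalar response rather than the bivariate one on the slide, and a separable covariance (the same spatial correlation for both coefficients, scaled by a 2 x 2 matrix). The grid, the starting mean, the two covariance matrices, the error standard deviation and the covariate are all made-up values for illustration; only the correlation parameter of 1 comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(36)
T, n_coef = 20, 2                                   # time points; (beta_0, beta_1)
gx, gy = np.meshgrid(np.arange(4.0), np.arange(4.0))
sites = np.column_stack([gx.ravel(), gy.ravel()])   # hypothetical 4 x 4 grid
m = len(sites)

theta = 1.0                                         # as on the slide
dist = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
R = np.exp(-theta * dist)                           # spatial correlation matrix
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])          # assumed cross-covariance
W = 0.05 * np.eye(n_coef)                           # assumed evolution variance
sigma_e = 1.0                                       # assumed error sd

L_R = np.linalg.cholesky(R + 1e-10 * np.eye(m))
L_S = np.linalg.cholesky(Sigma)
L_W = np.linalg.cholesky(W)

gamma = np.array([10.0, 2.0])                       # assumed starting mean
Y = np.empty((T, m))
for t in range(T):
    # Random-walk evolution of the mean: gamma_t = gamma_{t-1} + w_t.
    gamma = gamma + L_W @ rng.standard_normal(n_coef)
    # Separable draw: rows correlated through R, columns through Sigma.
    beta_t = gamma + L_R @ rng.standard_normal((m, n_coef)) @ L_S.T
    x_t = rng.standard_normal(m)                    # made-up covariate
    Y[t] = beta_t[:, 0] + beta_t[:, 1] * x_t + sigma_e * rng.standard_normal(m)
```

Drawing the coefficient surfaces through the two Cholesky factors avoids forming the full Kronecker-product covariance matrix.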
38 Interpolation
Samples from $Y_t^n \mid Y^{obs}$ are obtained through the algorithm below:
1. Sample from $\beta_t^{obs}, \psi \mid Y^{obs}$, through MCMC
2. Sample from $\beta_t^n \mid \beta_t^{obs}, \psi$, through the Gaussian process
3. Sample from $Y_t^n \mid \beta_t^n, \psi$, through independent normal draws
Once again, $X_t^n$ must be known; otherwise, it will have to be interpolated.
39 Spatially- and time-varying parameters (STVP)
Another possibility: temporal evolution applied directly to the $\beta_t$ processes rather than to their means
$Y_t(s) = \beta_{t0}(s) + \beta_{t1}(s) X_{t1}(s) + e_t(s)$
$\beta_t(\cdot) = \beta_{t-1}(\cdot) + w_t(\cdot)$
$w_t(\cdot) \sim GP(\gamma_t, \Sigma, \rho_\theta)$, independent in time
Completed with $\beta_0(\cdot) \mid \gamma_0 \sim N(\gamma_0, R)$
Marginal prior: $\beta_t(\cdot) \mid \gamma_t \sim GP(\gamma_t, t\Sigma, \rho_\theta)$
Not separable at the latent level, unlike SVP/TVM
40 Computations
The MCMC algorithm must explore the correlation structures → parameters are visited in blocks (Landim and Gamerman, 2000; Frühwirth-Schnatter, 1994)
Prediction
Based on the forecast distribution of $Y_{T+h} \mid Y^{obs}$, for $Y_{T+h} = (Y_{T+h}(s_1^f), \ldots, Y_{T+h}(s_F^f))$ and any collection $(s_1^f, \ldots, s_F^f)$
1. Sample from $\beta_T^{obs}, \psi \mid Y^{obs}$, through MCMC
2. Sample from $\beta_{T+h}^f \mid \beta_T^{obs}, \psi$, obtained by introduction of $\beta_{T+h}^{obs}$:
$\beta_T^{obs} \to \beta_{T+h}^{obs}$ by successive evolution of the $\beta$ process
$\beta_{T+h}^{obs} \to \beta_{T+h}^f$ by interpolation with the Gaussian process
3. Sample from $Y_{T+h} \mid \beta_{T+h}^f, \psi$, through independent normal draws
41 Time-varying locations
Assume locations $s_t = (s_{t1}, \ldots, s_{t n_t})$ at time $t$
$\beta_t^{obs}$ is an $n_t$-dimensional vector, $t = 1, \ldots, T$
Both densities in the integrand are multivariate normal.
The convolution of these two densities can be shown to be normal, and the required evolution equation for $\beta$ can be obtained.
42 Non-Gaussian Observations
Two distinct types of non-normal data:
$\lambda$ estimated jointly with other model parameters (de Oliveira, Kedem and Short, 1997)
For example, in the Bernoulli or Poisson form
standard approach: $y_t(s) \sim EF(\mu_t(s))$
spatio-temporal modeling issues: similar
computations: harder
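43 Non-Gaussian Evolution
Abrupt changes in the process → normality is not suitable
Robust alternative: $w_t(\cdot) \sim GP(\gamma_t, \Sigma, \rho_\theta)$ is replaced by
$w_t(\cdot) \mid \lambda_t \sim GP(\gamma_t, \lambda_t^{-1}\Sigma, \rho_\theta)$ and $\lambda_t \sim G(\nu_t, \nu_t)$, independent for $t = 1, \ldots, T$
Therefore, $w_t(\cdot) \sim tP(\gamma_t, \Sigma, \rho_\theta)$
The $\lambda_t$'s control the magnitude of the evolution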
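The scale-mixture construction can be illustrated at a finite set of sites. The sketch below draws a mixing variable from a Gamma distribution with equal shape and rate, then draws the innovation from a normal with covariance divided by that variable. With shape and rate both equal to `a`, the marginal distribution of the innovation is multivariate Student-t with `2 * a` degrees of freedom. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def draw_robust_innovation(mean, cov, a, rng):
    """Draw w | lam ~ N(mean, cov / lam) with lam ~ Gamma(shape=a, rate=a).

    Marginally, w is multivariate Student-t with 2 * a degrees of freedom.
    """
    lam = rng.gamma(shape=a, scale=1.0 / a)     # NumPy uses shape and scale
    chol = np.linalg.cholesky(cov)
    w = mean + chol @ rng.standard_normal(len(mean)) / np.sqrt(lam)
    return w, lam

rng = np.random.default_rng(43)
sites = rng.uniform(0.0, 5.0, size=(8, 2))      # hypothetical sites
dist = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
cov = np.exp(-1.0 * dist) + 1e-10 * np.eye(8)   # exponential correlation
w, lam = draw_robust_innovation(np.zeros(8), cov, a=2.0, rng=rng)
```

A small value of the mixing variable inflates the covariance for that time point, which is how an occasional abrupt change can be accommodated without disturbing the other time points.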
44 Final Comments
- More flexibility to accommodate variations in time and space.
- Static coefficient models: samples from the posterior were generated in the software BUGS, with interpolation done in FORTRAN.
- Extensions to observations in the exponential family and estimation of the normalizing transformation.
- Extension to accommodate anisotropic processes for some components of the model.