Title: Additional Topics in Prediction Methodology
1. Additional Topics in Prediction Methodology
2. Introduction
- The predictive distribution for the random variable Y_0 is meant to capture all the information about Y_0 that is contained in the training data Y^n.
- It does not completely specify Y_0, but it does provide a probability distribution over more likely and less likely values of Y_0.
- E[Y_0 | Y^n] is the best MSPE (minimum mean squared prediction error) predictor of Y_0.
3. Hierarchical models have two stages
- X ⊂ R^d (the input space)
- f_0 = f(x_0): known p × 1 vector of regressors at x_0
- F = (f_j(x_i)): known n × p matrix of regressors at the training inputs
- β: unknown p × 1 vector of regression coefficients
- R = (R(x_i − x_j)): known n × n matrix of correlations among the training data Y^n
- r_0 = (R(x_i − x_0)): known n × 1 vector of correlations of Y_0 with Y^n (a sketch building these quantities follows below)
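To make these quantities concrete, here is a minimal Python/NumPy sketch for a one-dimensional input space. The constant-mean regression (f(x) ≡ 1), the Gaussian correlation function, and the value of theta are illustrative assumptions, not choices made on the slides.

```python
import numpy as np

def gauss_corr(x1, x2, theta=4.0):
    """Gaussian (squared-exponential) correlation R(h) = exp(-theta * h^2)."""
    return np.exp(-theta * (x1 - x2) ** 2)

# Training inputs x_1, ..., x_n and a prediction point x_0 (illustrative values)
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # n = 5 design points
x0 = 0.6                                    # new input
n = len(x)

# Regression: constant mean, so f(x) = 1 and p = 1
F = np.ones((n, 1))          # n x p matrix of regressors f_j(x_i)
f0 = np.ones((1, 1))         # p x 1 vector f(x_0)

# Correlations among the training data and between Y_0 and Y^n
R = gauss_corr(x[:, None], x[None, :])      # n x n matrix R(x_i - x_j)
r0 = gauss_corr(x, x0)[:, None]             # n x 1 vector R(x_i - x_0)

print("R shape:", R.shape, " r0 shape:", r0.shape)
```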
4. Predictive Distributions when σ_Z², R and r_0 are known
5. (No transcript)
6. Interesting features of (a) and (b)
- The non-informative prior is the limit of the normal prior as the prior variance tends to infinity.
- While the non-informative prior is not a proper distribution, the corresponding predictive distribution is proper.
- The same conditioning argument can be applied to derive the posterior mean for both the non-informative prior and the normal prior.
7. The mean and variance of the predictive distribution (mean)
- μ_0n(x_0) and σ²_0n(x_0) depend on x_0 only through the regression vector f_0 and the correlation vector r_0.
- μ_0n(x_0) is a linear unbiased predictor of Y(x_0).
- The continuity and other smoothness properties of μ_0n(x_0) are inherited from the correlation function R(·) and the regressors f_j(·), j = 1, …, p.
8. (continued)
- μ_0n(x_0) depends on the parameters σ_Z² and τ² only through their ratio.
- μ_0n(x_0) interpolates the training data: when x_0 = x_i, f_0 = f(x_i) and r_0^T R^{-1} = e_i^T, the i-th unit vector, so the predictor reproduces the observed value (see the sketch below).
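The formulas behind the posterior mean sit on untranscribed slides; under the non-informative prior it reduces to the BLUP form μ_0n(x_0) = f_0^T β̂ + r_0^T R^{-1}(Y^n − F β̂). A minimal sketch under the same illustrative constant-mean/Gaussian-correlation assumptions as above, checking the interpolation property:

```python
import numpy as np

def gauss_corr(x1, x2, theta=4.0):
    return np.exp(-theta * (x1 - x2) ** 2)

def blup(x, y, x0, theta=4.0):
    """Posterior mean mu_0n(x0) = f0' bhat + r0' R^{-1} (y - F bhat), constant mean."""
    n = len(x)
    F = np.ones((n, 1))
    f0 = np.ones(1)
    R = gauss_corr(x[:, None], x[None, :], theta)
    r0 = gauss_corr(x, x0, theta)
    Rinv = np.linalg.inv(R)
    bhat = np.linalg.solve(F.T @ Rinv @ F, F.T @ Rinv @ y)   # GLS estimate of beta
    return float(f0 @ bhat + r0 @ Rinv @ (y - F @ bhat))

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.sin(2 * np.pi * x)                    # made-up training responses

print(blup(x, y, 0.6))                       # prediction at a new input
print(blup(x, y, 0.5), y[2])                 # interpolation: matches the third observed value
```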
9. (No transcript)
10. The mean and variance of the predictive distribution (variance)
- MSPE(μ_0n(x_0)) = σ²_0n(x_0).
- The variance of the posterior of Y(x_0) given Y^n should be 0 whenever x_0 = x_i, and indeed σ²_0n(x_i) = 0 at every training point (see the sketch below).
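For the non-informative prior on β with known σ_Z², the predictive variance takes the universal-kriging form σ²_0n(x_0) = σ_Z² [1 − r_0^T R^{-1} r_0 + h^T (F^T R^{-1} F)^{-1} h] with h = f_0 − F^T R^{-1} r_0; that specific form is an assumption here, since the slide carrying the formula was not transcribed. A sketch verifying that it vanishes at the training points:

```python
import numpy as np

def gauss_corr(x1, x2, theta=4.0):
    return np.exp(-theta * (x1 - x2) ** 2)

def pred_var(x, x0, sigma2_z=1.0, theta=4.0):
    """sigma^2_0n(x0) for the non-informative prior on beta, constant mean."""
    n = len(x)
    F = np.ones((n, 1))
    f0 = np.ones(1)
    R = gauss_corr(x[:, None], x[None, :], theta)
    r0 = gauss_corr(x, x0, theta)
    Rinv = np.linalg.inv(R)
    h = f0 - F.T @ Rinv @ r0                       # accounts for estimating beta
    quad = h @ np.linalg.solve(F.T @ Rinv @ F, h)
    return float(sigma2_z * (1.0 - r0 @ Rinv @ r0 + quad))

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
print(pred_var(x, 0.6))    # positive between design points
print(pred_var(x, 0.5))    # (numerically) zero at a training point
```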
11. Most important use of Theorem 4.1.1
12. Predictive Distributions when R and r_0 are known
The posterior is a location-shifted and scaled univariate t distribution whose degrees of freedom are increased when there is informative prior information for either β or σ_Z².
13. (No transcript)
14. (No transcript)
15. Degrees of freedom
- Base value for the degrees of freedom: ν_i = n − p.
- p additional degrees of freedom when the prior on β is informative.
- ν_0 additional degrees of freedom when the prior on σ_Z² is informative.
16. Location shift
The centering value is the same as in Theorem 4.1.1 (known σ_Z²). The non-informative prior gives the BLUP.
17. Scale factor σ_i²(x_0) (compare (4.1.15) with (4.1.6))
- σ_i²(x_0) is an estimate of the scale factor σ²_0n(x_0).
- Q_i²/ν_i estimates σ_Z².
- Q_i² combines information about σ_Z² from the conditional distribution of Y^n given σ_Z² with information from the prior on σ_Z².
- σ_i²(x_i) = 0 when x_i is any of the training data points (see the sketch below).
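A sketch of the non-informative-prior case described above: ν = n − p degrees of freedom, Q²/ν as the estimate of σ_Z², and a t-based prediction interval centered at the BLUP. The constant mean, Gaussian correlation, data, and the exact scale formula are illustrative assumptions rather than content taken from the slides.

```python
import numpy as np
from scipy.stats import t as student_t

def gauss_corr(x1, x2, theta=4.0):
    return np.exp(-theta * (x1 - x2) ** 2)

def t_predict(x, y, x0, theta=4.0, level=0.95):
    """Center and interval of the t predictive distribution
    (non-informative priors on beta and sigma_Z^2, constant mean)."""
    n, p = len(x), 1
    F = np.ones((n, p))
    f0 = np.ones(p)
    R = gauss_corr(x[:, None], x[None, :], theta)
    r0 = gauss_corr(x, x0, theta)
    Rinv = np.linalg.inv(R)
    FtRi = F.T @ Rinv
    bhat = np.linalg.solve(FtRi @ F, FtRi @ y)
    resid = y - F @ bhat
    nu = n - p                                   # base degrees of freedom
    s2 = (resid @ Rinv @ resid) / nu             # Q^2 / nu, estimate of sigma_Z^2
    mu = float(f0 @ bhat + r0 @ Rinv @ resid)    # same centering value as the BLUP
    h = f0 - FtRi @ r0
    scale2 = s2 * (1.0 - r0 @ Rinv @ r0 + h @ np.linalg.solve(FtRi @ F, h))
    half = student_t.ppf(0.5 + level / 2, nu) * np.sqrt(max(scale2, 0.0))
    return mu, mu - half, mu + half

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.sin(2 * np.pi * x)
print(t_predict(x, y, 0.6))   # point prediction and 95% prediction interval
```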
18. Predictive Distributions when Correlation Parameters are Unknown
- What if the correlations among the observations are unknown (R and r_0 are unknown)?
- Assume y(·) has a Gaussian prior with correlation function R(· | θ), where θ is an unknown vector of parameters.
- Two issues:
- The standard error of the plug-in predictor μ_0n(x_0 | θ̂), obtained by substituting an estimate θ̂ from MLE or REML (see the sketch below).
- A Bayesian approach to the uncertainty in θ, which is to model it by a prior distribution.
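A minimal sketch of the plug-in route: the profile log-likelihood of a one-parameter Gaussian correlation is maximized by a crude grid search (in practice one would use a numerical optimizer, or REML), and the resulting θ̂ would then be substituted into μ_0n(x_0 | θ̂). The data and grid are made up.

```python
import numpy as np

def gauss_corr(x1, x2, theta):
    return np.exp(-theta * (x1 - x2) ** 2)

def profile_loglik(theta, x, y):
    """Profile log-likelihood of theta, with beta and sigma_Z^2 profiled out."""
    n = len(x)
    F = np.ones((n, 1))
    R = gauss_corr(x[:, None], x[None, :], theta)
    R += 1e-10 * np.eye(n)                      # small nugget for numerical stability
    Rinv = np.linalg.inv(R)
    FtRi = F.T @ Rinv
    bhat = np.linalg.solve(FtRi @ F, FtRi @ y)
    resid = y - F @ bhat
    s2 = (resid @ Rinv @ resid) / n             # MLE of sigma_Z^2 given theta
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * (n * np.log(s2) + logdet)

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.sin(2 * np.pi * x)

grid = np.linspace(0.5, 50.0, 200)              # candidate values of theta
theta_hat = grid[np.argmax([profile_loglik(t, x, y) for t in grid])]
print("MLE of theta:", theta_hat)
# theta_hat is then plugged into mu_0n(x0 | theta_hat); the plug-in standard error
# ignores the uncertainty in theta_hat, which is exactly the issue raised above.
```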
19. Prediction for Multiple Response Models
- Several outputs are available from a computer experiment.
- Several codes are available for computing the same response (e.g., a fast code and a slow code).
- Competing responses.
- Several stochastic models for the joint response.
- These models are used to describe the optimal predictor for one of the several computed responses.
20. Modeling Multiple Outputs
- Z_i(·): marginally mean-zero stationary Gaussian stochastic processes with unknown variances and correlation functions R_i.
- Stationarity of Z_i(·) implies that the correlation between Z_i(x_1) and Z_i(x_2) depends only on x_1 − x_2.
- Assume Cov(Z_i(x_1), Z_j(x_2)) = σ_i σ_j R_ij(x_1 − x_2).
- R_ij(·): the cross-correlation function of Z_i(·) and Z_j(·).
- The linear model is the global mean of the Y_i process; the f_i(·) are known regression functions and the β_i are unknown regression parameters.
21. Selection of correlation and cross-correlation functions is complicated
- Reason: for any set of input sites, the multivariate normally distributed random vector (Z_1(x_11), …)^T must have a nonnegative definite covariance matrix.
- Solution: construct the Z_i(·) from a set of elementary processes (usually these processes are mutually independent).
22. Example by Kennedy and O'Hagan
- Y_i(x): prior for the i-th code level (i = m is the top-level code). The autoregressive model is
- Y_i(x) = ρ_{i−1} Y_{i−1}(x) + δ_i(x), i = 2, …, m.
- The output of each successively higher-level code i at x is related to the output of the less precise code i−1 at x plus the refinement δ_i(x) (see the sketch below).
- Cov(Y_i(x), Y_{i−1}(w) | Y_{i−1}(x)) = 0 for all w ≠ x.
- No additional second-order knowledge of code i at x can be obtained from the lower-level code i−1 once the value of code i−1 at x is known (a Markov property on the hierarchy of codes).
- Since some applications have no natural hierarchy of computer codes, something better is needed there.
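To illustrate what the autoregressive structure implies, the sketch below takes m = 2 (a cheap code Y_1 and a top-level code Y_2 = ρ Y_1 + δ, with Y_1 and δ independent mean-zero Gaussian processes), builds the joint covariance matrix of the two codes on a common grid, and checks that it is a valid covariance. The Gaussian correlations, ρ, and the variances are illustrative values, not taken from Kennedy and O'Hagan.

```python
import numpy as np

def gauss_corr(x1, x2, theta):
    return np.exp(-theta * (x1 - x2) ** 2)

# Illustrative parameters for Y1 (cheap code) and delta (refinement), with
# Y2(x) = rho * Y1(x) + delta(x) and Y1, delta independent mean-zero GPs.
rho, s1, sd = 0.8, 1.0, 0.3
theta1, thetad = 4.0, 8.0

x = np.linspace(0.0, 1.0, 6)
R1 = gauss_corr(x[:, None], x[None, :], theta1)
Rd = gauss_corr(x[:, None], x[None, :], thetad)

# Covariance blocks implied by the autoregressive model
C11 = s1**2 * R1                        # Cov(Y1, Y1)
C21 = rho * s1**2 * R1                  # Cov(Y2, Y1)
C22 = rho**2 * s1**2 * R1 + sd**2 * Rd  # Cov(Y2, Y2)

C = np.block([[C11, C21.T],
              [C21, C22]])
print("valid covariance:", np.all(np.linalg.eigvalsh(C) > -1e-10))

# A joint draw of (Y1, Y2) on the grid, consistent with the hierarchy of codes
sample = np.random.default_rng(0).multivariate_normal(np.zeros(2 * len(x)), C)
print(sample.shape)
```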
23. More reasonable model
- Each constraint function is associated with the objective function plus a refinement:
- Y_i(x) = ρ_i Y_1(x) + δ_i(x), i = 2, …, m + 1.
- Ver Hoef and Barry form models in the environmental sciences.
- These models include an unknown smooth surface plus a random measurement error.
- They are constructed as moving averages over white noise processes.
24. Morris and Mitchell model
- Prior information about y(x) is specified by a Gaussian process Y(·).
- Prior information about the partial derivatives y^(j)(x) is obtained by considering the derivative processes of Y(·).
- y_1(·) = y(·), y_2(·) = y^(1)(·), …, y_{1+m}(·) = y^(m)(·).
- The natural prior for y^(j)(x) is the derivative process Y^(j)(·).
- The covariances between Y(x_1) and Y^(j)(x_2), and between Y^(i)(x_1) and Y^(j)(x_2), are obtained by differentiating the covariance function of Y(·) (see the sketch below).
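The covariance formulas themselves are on an untranscribed slide; for a one-dimensional Gaussian covariance k(x_1, x_2) = σ² exp(−θ(x_1 − x_2)²) they follow by differentiating k, as sketched below. The block matrix built here is the joint covariance of (Y(x_i), Y'(x_i)); the parameter values are illustrative.

```python
import numpy as np

# Gaussian covariance k(x1, x2) = s2 * exp(-theta * h^2), h = x1 - x2, and the
# covariances involving the derivative process, obtained by differentiating k:
#   Cov(Y(x1),  Y'(x2)) = d/dx2 k       = 2*s2*theta*h*exp(-theta*h^2)
#   Cov(Y'(x1), Y'(x2)) = d^2/dx1dx2 k  = 2*s2*theta*(1 - 2*theta*h^2)*exp(-theta*h^2)
s2, theta = 1.0, 4.0

def k00(x1, x2):            # Cov(Y(x1), Y(x2))
    h = x1 - x2
    return s2 * np.exp(-theta * h**2)

def k01(x1, x2):            # Cov(Y(x1), Y'(x2))
    h = x1 - x2
    return 2 * s2 * theta * h * np.exp(-theta * h**2)

def k11(x1, x2):            # Cov(Y'(x1), Y'(x2))
    h = x1 - x2
    return 2 * s2 * theta * (1 - 2 * theta * h**2) * np.exp(-theta * h**2)

x = np.array([0.0, 0.3, 0.7, 1.0])
X1, X2 = x[:, None], x[None, :]
K01 = k01(X1, X2)                         # Cov(Y(x_i), Y'(x_j))
K = np.block([[k00(X1, X2), K01],
              [K01.T, k11(X1, X2)]])      # joint covariance of (Y(x), Y'(x))
print("nonnegative definite:", np.all(np.linalg.eigvalsh(K) > -1e-10))
```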
25. Optimal Predictors for Multiple Outputs
- The best MSPE predictor based on the training data is the conditional mean of Y_0 given all the observed outputs,
- where Y_0 = Y_1(x_0), Y_i^{n_i} = (Y_i(x_1i), …)^T, and y_i^{n_i} is its observed value, for i = 1, …, m.
26. The joint distribution is the multivariate normal distribution
27. Conditional expectation
- In practice, this is not usable directly: it requires knowledge of the marginal correlation functions, the joint (cross-)correlation functions, and the ratios of all the process variances.
- Empirical versions are of practical use (see the sketch below).
- In each case we assume the correlation matrices R_i and cross-correlation matrices R_ij are known up to a vector of parameters.
- Estimate these parameters using MLE or REML.
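The empirical predictor is just multivariate-normal conditioning with the estimated covariances plugged in. A generic sketch of that conditioning step, with made-up numbers standing in for the quantities one would assemble from the estimated R_i, R_ij, process variances, and regression fits:

```python
import numpy as np

def conditional_mean_and_var(mu0, mu_n, cov_00, cov_0n, cov_nn, y_n):
    """E[Y0 | Y^n = y_n] and Var[Y0 | Y^n = y_n] for jointly normal (Y0, Y^n)."""
    w = np.linalg.solve(cov_nn, y_n - mu_n)
    mean = mu0 + cov_0n @ w
    var = cov_00 - cov_0n @ np.linalg.solve(cov_nn, cov_0n)
    return mean, var

# Illustrative plug-in quantities (not from the slides): in the empirical version
# these would be built from the estimated correlation/cross-correlation matrices.
mu0 = 0.0
mu_n = np.zeros(4)
cov_00 = 1.0
cov_0n = np.array([0.8, 0.5, 0.4, 0.2])     # covariances of Y0 with the observed outputs
cov_nn = np.array([[1.0, 0.6, 0.5, 0.3],    # joint covariance of the observed outputs
                   [0.6, 1.0, 0.4, 0.2],
                   [0.5, 0.4, 1.0, 0.6],
                   [0.3, 0.2, 0.6, 1.0]])
y_n = np.array([0.7, 0.2, -0.1, 0.4])

print(conditional_mean_and_var(mu0, mu_n, cov_00, cov_0n, cov_nn, y_n))
```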
28. Example 1
- The 14-point training design is space-filling: it allows us to learn about the response over the entire input space.
- Compare two models:
- the predictor of y(·) based on y(·) alone;
- the predictor of y(·) based on (y(·), y^(1)(·), y^(2)(·)).
- The second is both a visually better fit and has a 24% smaller ERMSPE.
29. (No transcript)
30. Thank you!