Title: Ka-fu Wong University of Hong Kong
Modeling and Forecasting Trends
2Background
- The unobserved components approach to modeling
and forecasting economic time series assumes that
the typical economic time series, $y_t$, is the
sum of three independent components:
  - a time trend component
  - a seasonal component
  - an irregular or cyclical component.
- $y_t$ = time trend + seasonal + cyclical = $T_t + S_t + C_t$
- The time trend refers to the long-run average
behavior of the series.
- The seasonal refers to the annual predictable
cyclical behavior of the series associated with
weather patterns, holiday patterns, etc.
- The cyclical component refers to the remainder of
the series after the trend and seasonal have been
accounted for.
3Background
- The assumption that these components are
determined independently means that each
component is determined and influenced by its own
set of forces and, consequently, each component
can be studied separately.
- The approach is called an unobserved components
approach because we do not directly observe each
of the three components; we only get to observe
their sum. Our job will be to model and estimate
the various components and use these estimates as
the basis for forecasting the components and
their sum.
4Background
- Whether the assumption underlying the unobserved
components approach, that the trend, seasonal,
and cyclical components are determined
independently, is plausible or not is debatable
and is, in fact, an issue of some controversy
among economists.
- For example, many macroeconomists argue that
economic growth (trend) and the business cycle
(cyclical) are determined by a common set of
forces.
5U.S. Female Labor Force Participation Rate
6U.S. Male Labor Force Participation Rate
7Hong Kong labor force participation rates (male
and female)
8China's per Capita Real GDP
9Modeling the Trend
- If we look at China's per capita real GDP time
series or any one of your time series, the first
thing that stands out to us is the obvious tendency
of the series to grow (or, in some cases, to
fall) over time.
- That is, it is immediately apparent from the time
series plot that the average change in the series
is positive (or, in some cases, negative). This
tendency is the series' trend.
- The simplest model of the time trend is the
linear trend model:
  - $T_t = \beta_0 + \beta_1 t$, $t = 1, \ldots, T$
10Modeling the Trend
- The trend component is a straight line with
intercept $\beta_0$ and slope $\beta_1$: $T_1 = \beta_0 + \beta_1$, $T_2 = \beta_0 + 2\beta_1$, ..., $T_T = \beta_0 + T\beta_1$.
- Note that $\beta_1 = dT_t/dt$ and $\beta_1 = T_t - T_{t-1}$. So,
  - $\beta_1 > 0$ if y has a positive trend and
  - $\beta_1 < 0$ if y has a negative trend.
- The intercept, as is often the case in
econometric models, does not have a meaningful
interpretation and its sign can be positive or
negative, regardless of the trend's sign.
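The constant-change property of the linear trend can be checked with a minimal sketch. The parameter values below are hypothetical and chosen only for illustration; `linear_trend` is a helper name introduced here, not part of the lecture.

```python
def linear_trend(beta0, beta1, n_periods):
    """Return [T_1, ..., T_n] for the linear trend T_t = beta0 + beta1 * t."""
    return [beta0 + beta1 * t for t in range(1, n_periods + 1)]


# Hypothetical parameter values, for illustration only.
trend = linear_trend(beta0=10.0, beta1=0.5, n_periods=5)

# Period-to-period change T_t - T_{t-1}; it should equal beta1 everywhere.
changes = [later - earlier for earlier, later in zip(trend, trend[1:])]

print(trend)    # [10.5, 11.0, 11.5, 12.0, 12.5]
print(changes)  # [0.5, 0.5, 0.5, 0.5]
```

Every change equals the slope, which is why a positive slope corresponds to a positive trend regardless of the intercept.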
11Graphical view of linear trend
A downward trend
An upward trend
12Polynomial trend model
- In some cases, a linear trend is inadequate to
capture the trend of a time series. A natural
generalization of the linear trend model is the
polynomial trend model:
  - $T_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p$, where $p$ is a
positive integer.
- Note that the linear trend model is a special
case of the polynomial trend model ($p = 1$).
- For economic time series we almost never require
$p > 2$. That is, if the linear trend model is not
adequate, the quadratic trend model will usually
work:
  - $T_t = \beta_0 + \beta_1 t + \beta_2 t^2$
- In the quadratic model, $dT_t/dt = \beta_1 + 2\beta_2 t$.
13Graphical view of Quadratic Trends
14Graphical view of Quadratic Trends
15The Log Linear Trend Model
- Another alternative to the linear trend model is
the log linear trend model, which is also called
the exponential trend model:
  - $T_t = \beta_0 \exp(\beta_1 t)$
  - or, taking natural logs on both sides,
  - $\log(T_t) = \log(\beta_0) + \beta_1 t$
  - so that the log of the trend component is
linear.
- Note that for the log linear trend model
  - $\beta_1 = \log(T_t) - \log(T_{t-1})$, which is approximately the proportional (percentage) change in $T$.
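A short numerical sketch of this property, using hypothetical parameter values: the log difference of an exponential trend equals the slope exactly, while the simple growth rate is close to it when the slope is small. The helper name `exponential_trend` is introduced here for illustration.

```python
import math


def exponential_trend(beta0, beta1, n_periods):
    """Return [T_1, ..., T_n] for T_t = beta0 * exp(beta1 * t)."""
    return [beta0 * math.exp(beta1 * t) for t in range(1, n_periods + 1)]


# Hypothetical parameter values, for illustration only.
beta1 = 0.03
trend = exponential_trend(beta0=100.0, beta1=beta1, n_periods=6)

# log(T_t) - log(T_{t-1}) should equal beta1 in every period.
log_diffs = [math.log(b) - math.log(a) for a, b in zip(trend, trend[1:])]

# The simple growth rate T_t / T_{t-1} - 1 equals exp(beta1) - 1,
# which is close to beta1 when beta1 is small.
growth_rates = [b / a - 1.0 for a, b in zip(trend, trend[1:])]

print(log_diffs)
print(growth_rates)
```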
16Graphical view of exponential trends
17Graphical view of exponential trends
18Which trend model to use?
- Knowing the differences among these models can
help us decide whether the linear, quadratic or
log linear trend model is more appropriate for
our data.
- In the linear trend model the change in T is
constant over time.
- In the quadratic trend model the change in T has
a linear trend.
- In the log linear trend model the growth rate
of T is constant over time.
- However, in practice, it is not always obvious by
simply looking at the time series plot which form
the trend model should take: linear, log linear,
quadratic, or something else?
- Practice and experience are the most helpful.
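The three distinguishing properties can be turned into a rough differencing check. The sketch below uses noise-free, hypothetical trends, so the relevant differences are constant up to rounding; with real data the differences will only be roughly constant, and this is a diagnostic aid rather than a formal test.

```python
import math


def diff(series):
    """First differences: x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def spread(values):
    """Range of a list; near zero means the values are (almost) constant."""
    return max(values) - min(values)


# Hypothetical noise-free trends, for illustration only.
t_values = range(1, 21)
linear = [2.0 + 0.5 * t for t in t_values]
quadratic = [2.0 + 0.5 * t + 0.1 * t * t for t in t_values]
exponential = [2.0 * math.exp(0.05 * t) for t in t_values]

# Linear trend: the change is constant.
print(spread(diff(linear)))
# Quadratic trend: the change has a linear trend, so the change of the
# change (second difference) is constant.
print(spread(diff(diff(quadratic))))
# Log linear trend: the log difference (growth rate) is constant.
print(spread(diff([math.log(v) for v in exponential])))
# By contrast, the first difference of the quadratic trend is not constant.
print(spread(diff(quadratic)))
```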
19All Deterministic Trend Models
- Note that in all of these models, the trend is
deterministic, i.e., perfectly forecastable. For
instance, in the linear trend model, the forecast
of $T_{T+h}$ made at time $T$ is
  - $\beta_0 + (T+h)\beta_1 = T_{T+h}$
- (Later in the course we will talk about
stochastic trend models, in which the trend of
the series is not perfectly forecastable.)
- However, even if we correctly specify the shape
of the trend (linear, quadratic, exponential, ...),
the parameters of the trend model are unknown.
So, in practice, we will have to estimate these
parameters, which will introduce errors (called
sampling or estimation error) into our trend
forecasts.
20Estimating the Trend Model
- Our assumption at this point is that our time
series, $y_t$, can be modeled as
  - $y_t = T_t(\theta) + \varepsilon_t$
- where
  - $T_t$ is one of the trend models we discussed
earlier,
  - $\theta$ is the set of parameters, e.g., $\theta = (\beta_0, \beta_1)$ in a
linear trend model,
  - $\varepsilon_t$ denotes the other factors (i.e., the seasonal
and cyclical components) that determine $y_t$.
- Since $\theta$ is unknown, it is natural to estimate the
trend model via the least squares approach:
  - $\hat{\theta} = \arg\min_{\theta} \sum_{t=1}^{T} (y_t - T_t(\theta))^2$
  - The squared deviation is the quadratic loss, and $\hat{\theta}$ is
the choice of $\theta$ that minimizes the objective
function.
21Estimating the Trend Model via the Least Squares
approach
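- For linear trend model
  - $\min_{\beta_0, \beta_1} \sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 t)^2$: can use OLS
- For quadratic trend model
  - $\min_{\beta_0, \beta_1, \beta_2} \sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 t - \beta_2 t^2)^2$: can use OLS
- For exponential trend model
  - $\min_{\beta_0, \beta_1} \sum_{t=1}^{T} (y_t - \beta_0 \exp(\beta_1 t))^2$: nonlinear, has to be estimated numerically,
  - or $\min_{\beta_0, \beta_1} \sum_{t=1}^{T} (\log y_t - \log \beta_0 - \beta_1 t)^2$: can use OLS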
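A minimal sketch of the two OLS cases, on simulated data with hypothetical parameter values. It uses the closed-form least squares solution for a regression on a constant and one regressor; `ols_line` is a helper introduced here. For the exponential trend, the sketch assumes multiplicative noise so that the log of the series is linear in time. Note that regressing logs on time minimizes squared errors in logs, which is a different objective from nonlinear least squares in levels.

```python
import math
import random


def ols_line(x, y):
    """Least squares fit of y = a + b * x; returns (a_hat, b_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


random.seed(0)
T = 200
time = list(range(1, T + 1))

# Simulated linear trend plus noise (hypothetical true values 5 and 0.3).
y_lin = [5.0 + 0.3 * t + random.gauss(0.0, 1.0) for t in time]
b0_hat, b1_hat = ols_line(time, y_lin)

# Simulated exponential trend with multiplicative noise
# (hypothetical true values beta0 = 2 and beta1 = 0.02).
y_exp = [2.0 * math.exp(0.02 * t + random.gauss(0.0, 0.05)) for t in time]
log_b0_hat, g_hat = ols_line(time, [math.log(v) for v in y_exp])
beta0_hat = math.exp(log_b0_hat)  # back out beta0 from the log intercept

print(b0_hat, b1_hat)
print(beta0_hat, g_hat)
```

The quadratic trend model can be estimated the same way by adding the squared time index as a second regressor, which requires a multiple-regression routine rather than this one-regressor formula.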
22Property of the Ordinary Least Squares Estimators
- Under the assumptions of the unobserved
components model, the OLS estimator of the linear
and quadratic trend models is
  - unbiased,
  - consistent, and
  - asymptotically efficient.
- Standard regression procedures can be applied to
test hypotheses about the $\beta$s and construct
interval estimates. This is true even though the
regression errors will generally be serially
correlated and heteroskedastic.
23Forecasting the Trend
- Once we have specified a trend model our forecast
of the h-step ahead trend component of y will
simply be
  - $T_{t+h}(\theta)$
- When $\theta$ is unknown, we can estimate it as
discussed earlier and substitute the estimate
into the function above.
24Forecasting the Trend
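We would like to forecast $y_{T+h}$ based on all
information available at time $T$.
- Assume that the trend is linear:
  - $y_{T+h} = \beta_0 + \beta_1 \mathrm{TIME}_{T+h} + \varepsilon_{T+h}$
If we know the true parameters, the part
$\beta_0 + \beta_1 \mathrm{TIME}_{T+h}$ can be forecasted perfectly.
Can we forecast $\varepsilon_{T+h}$? Sometimes YES. Sometimes
NO.
NO when $\varepsilon_t$ is known to be an independent
zero-mean random noise.
If $\varepsilon_t$ is an i.i.d. sequence with zero mean then
$E(\varepsilon_{T+h} \mid \text{information available at time } T) = E(\varepsilon_{T+h}) = E(\varepsilon_t) = 0$,
where the first equality uses independence, the second uses
identical distribution, and the third uses the zero mean.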
25Forecasting the Trend
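Assume $\varepsilon_t$ is known to be an independent zero-mean
random noise.
Forecast when parameters are known:
$y_{T+h,T} = \beta_0 + \beta_1 \mathrm{TIME}_{T+h}$
The subscript emphasizes that the forecast is made at time $T$,
utilizing all information that is available at
time $T$ (usually all past information).
Forecast error:
$e_{T+h,T} = y_{T+h} - y_{T+h,T} = \varepsilon_{T+h}$
Fundamental uncertainty! Unavoidable!
Forecast when parameters are unknown:
substitute in the estimates from the OLS
regression,
$\hat{y}_{T+h,T} = \hat{\beta}_0 + \hat{\beta}_1 \mathrm{TIME}_{T+h}$
Forecast error:
$\hat{e}_{T+h,T} = y_{T+h} - \hat{y}_{T+h,T} = (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1) \mathrm{TIME}_{T+h} + \varepsilon_{T+h}$
The first two terms are due to parameter uncertainty (which increases with $h$).
Note: $\mathrm{TIME}_{T+h} = T + h$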
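The split of the forecast error into a fundamental part and a parameter part can be illustrated by simulation. This is a sketch under stated assumptions: a linear trend with i.i.d. normal noise, hypothetical true parameter values, and a helper `ols_line` introduced here for the closed-form OLS fit.

```python
import random


def ols_line(x, y):
    """Least squares fit of y = a + b * x; returns (a_hat, b_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


random.seed(1)
beta0, beta1, sigma = 5.0, 0.3, 1.0  # hypothetical true values
T, h = 100, 4
time = list(range(1, T + 1))
y = [beta0 + beta1 * t + random.gauss(0.0, sigma) for t in time]
b0_hat, b1_hat = ols_line(time, y)

time_future = T + h  # TIME_{T+h} = T + h
eps_future = random.gauss(0.0, sigma)  # unknowable at time T
y_future = beta0 + beta1 * time_future + eps_future

# Known parameters: the only error is the future noise term.
error_known = y_future - (beta0 + beta1 * time_future)

# Estimated parameters: the error also contains a parameter part.
error_estimated = y_future - (b0_hat + b1_hat * time_future)
parameter_part = (beta0 - b0_hat) + (beta1 - b1_hat) * time_future

print(error_known, eps_future)
print(error_estimated, parameter_part + eps_future)
```

Because the slope error is multiplied by the time index, the parameter part tends to matter more as the horizon grows.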
26Density forecast
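- Suppose we have no parameter uncertainty. Then we have
  - $y_{T+h} = \beta_0 + \beta_1 \mathrm{TIME}_{T+h} + \varepsilon_{T+h}$ and $y_{T+h,T} = \beta_0 + \beta_1 \mathrm{TIME}_{T+h}$.
  - The forecast error is $e_{T+h,T} = y_{T+h} - y_{T+h,T} = \varepsilon_{T+h}$.
- Then the distribution of the forecast error will
simply be the distribution of $\varepsilon_{T+h}$. That is, for
any real number $c$,
  - $\mathrm{Prob}(e_{T+h,T} \le c) = \mathrm{Prob}(\varepsilon_{T+h} \le c)$,
  - $E(e_{T+h,T}) = 0$, $\mathrm{Var}(e_{T+h,T}) = \sigma^2$, where $\sigma^2 = \mathrm{var}(\varepsilon_t)$.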
27Density forecast
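- Further assume the $\varepsilon$s are i.i.d. $N(0, \sigma^2)$, while
continuing to ignore parameter uncertainty. Then
the density forecast will be that
  - $y_{T+h} \sim N(y_{T+h,T}, \sigma^2)$, where $y_{T+h,T}$ is the point forecast of the trend.
- Note that this density forecast depends on the
unknown parameter $\sigma^2$. To make the density
forecast operational, we can replace $\sigma^2$ with an
unbiased and consistent estimator, such as, for the linear trend model,
  - $s^2 = \frac{1}{T-2} \sum_{t=1}^{T} \hat{\varepsilon}_t^2$, where the $\hat{\varepsilon}_t$ are the OLS residuals.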
28Density forecast
- Now consider the case with parameter uncertainty.
- Under usual assumptions, the forecast error due
to parameter uncertainty is asymptotically normal.
- Thus, $e_{T+h,T}$ will be asymptotically normal.
- The unknown variance of $e_{T+h,T}$ can be estimated from the
fitted regression; the estimate takes a specific form here
because we assume a linear trend.
29(No Transcript)
30Density forecast
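Then we act as though $y_{T+h}$ is distributed as
$N(\hat{y}_{T+h,T}, \hat{\sigma}_{T+h,T}^2)$, where $\hat{y}_{T+h,T}$ is the point forecast and
$\hat{\sigma}_{T+h,T}^2$ is the estimated forecast error variance,
or, equivalently, $(y_{T+h} - \hat{y}_{T+h,T}) / \hat{\sigma}_{T+h,T} \sim N(0,1)$.
So, for example, for any real number $c$,
$\mathrm{Prob}(y_{T+h} \le c) = \mathrm{Prob}(Z \le (c - \hat{y}_{T+h,T}) / \hat{\sigma}_{T+h,T})$,
where $Z$ is an $N(0,1)$ random variable.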
31Density forecast
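- Further, we can construct interval forecasts of
$y_{T+h}$ according to
  - $\hat{y}_{T+h,T} \pm Z_{1-\alpha/2} \, \hat{\sigma}_{T+h,T}$
is a $(1-\alpha)100\%$ forecast interval for $y_{T+h}$,
where $\hat{y}_{T+h,T}$ is the point forecast, $\hat{\sigma}_{T+h,T}$ is the
estimated standard deviation of the forecast error, and
$Z_{1-\alpha/2}$ is the $(1-\alpha/2)100$th percentile
of the $N(0,1)$ distribution.
- For example, if $\alpha = .05$ then we obtain
  - $\hat{y}_{T+h,T} \pm 1.96 \, \hat{\sigma}_{T+h,T}$,
a 95-percent forecast interval for $y_{T+h}$,
since 1.96 is the 97.5th percentile of the $N(0,1)$.
- Recall the interpretation of this kind of
interval: 95% of the time, this procedure will
produce an interval that will turn out to include
the actual value of $y_{T+h}$.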
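A minimal sketch of an interval forecast for a linear trend model, on simulated data with hypothetical parameter values. As a simplifying assumption it ignores parameter uncertainty and uses the residual standard error as the forecast error standard deviation, so the interval is somewhat too narrow. `ols_line` is a helper introduced here; `statistics.NormalDist` is from the Python standard library.

```python
import math
import random
from statistics import NormalDist


def ols_line(x, y):
    """Least squares fit of y = a + b * x; returns (a_hat, b_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


random.seed(2)
T, h = 120, 6
time = list(range(1, T + 1))
y = [5.0 + 0.3 * t + random.gauss(0.0, 1.0) for t in time]  # simulated data

b0_hat, b1_hat = ols_line(time, y)
residuals = [yt - (b0_hat + b1_hat * t) for t, yt in zip(time, y)]

# Residual standard error; T - 2 because two parameters are estimated.
s = math.sqrt(sum(e * e for e in residuals) / (T - 2))

alpha = 0.05
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # roughly 1.96 for alpha = 0.05

point = b0_hat + b1_hat * (T + h)
lower, upper = point - z * s, point + z * s
print(point, (lower, upper))
```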
32Selecting Forecasting Models: R-square as a
criterion
- Consider the mean squared error (MSE)
  - $\mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} e_t^2$,
where $T$ is the sample size and $e_t = y_t - \hat{y}_t$ is the in-sample residual.
- Note that the model with the smallest MSE is also the
model with the smallest sum of squared residuals,
because scaling the sum of squared residuals by
a constant ($1/T$) will not change the ranking.
33Selecting Forecasting Models: R-square as a
criterion
- Consider the R-square ($R^2$)
  - $R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$
  - The denominator depends only on the data, not on the model.
- Thus, the model with the largest R-square is also
the model with the smallest MSE, and also the
model with the smallest sum of squared residuals,
because scaling the sum of squared residuals by a
model-independent quantity will not change the
ranking.
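The equivalence of the MSE and R-square rankings can be seen in a small sketch. The data and fitted values below are hypothetical numbers made up for illustration; `mse` and `r_squared` are helper names introduced here.

```python
def mse(actual, fitted):
    """Mean squared error: sum of squared residuals divided by T."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


def r_squared(actual, fitted):
    """R-square: 1 - SSR / total sum of squares around the sample mean."""
    mean = sum(actual) / len(actual)
    ssr = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    tss = sum((a - mean) ** 2 for a in actual)  # depends only on the data
    return 1.0 - ssr / tss


# Hypothetical data and two hypothetical sets of fitted values.
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0]
fit_trend = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # a linear trend fit
fit_mean = [sum(y) / len(y)] * len(y)       # a constant-only fit

for name, fitted in [("trend", fit_trend), ("mean", fit_mean)]:
    print(name, mse(y, fitted), r_squared(y, fitted))
```

The model with the smaller MSE has the larger R-square, and the constant-only fit has an R-square of zero up to rounding.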
34Selecting Forecasting Models: R-square as a
criterion
- The R-square ($R^2$) may be a good measure of
in-sample fit but a bad measure of out-of-sample
fit.
- If we add a regressor to the model, we will
always obtain an R-square no less than the
one with fewer regressors. That is, a polynomial
trend model with a larger p will almost always
result in a smaller MSE and hence a larger
R-square.
- In fact, give me a time series and specify an $R^2$,
and, subject to data availability, I can almost always
produce a trend model that will attain the
specified $R^2$.
- This effect is called in-sample overfitting or
data mining.
35Selecting Forecasting Models: R-square as a
criterion
- In short, the MSE is a biased estimator of the
out-of-sample h-step-ahead prediction error
variance, because the forecast error consists of two
parts:
  - Fundamental uncertainty (unavoidable even if we
know the parameters)
  - Parameter uncertainty (increases with the number
of parameters in the model)
- To reduce the bias associated with MSE and
R-square, we need to penalize for the number of
parameters included in the model (or the degrees
of freedom).
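36Selecting Forecasting Models: Adjusted R-square as
a criterion
- $\bar{R}^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2 / (T-k)}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T-1)}$, with $s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T-k}$,
where $k$ is the number of parameters, so that $T - k$ is the degrees of freedom.
- Maximizing adjusted R-square is like minimizing
$s^2$.
- For a given sum of squared residuals, $s^2$ increases with the number of parameters.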
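A small sketch of the degrees-of-freedom penalty, using hypothetical sums of squares. It assumes the usual definitions: s-squared is the sum of squared residuals divided by T - k, and the adjusted R-square compares it with the total sum of squares divided by T - 1. The function names are introduced here for illustration.

```python
def s_squared(ssr, T, k):
    """Residual variance estimate with a degrees-of-freedom correction."""
    return ssr / (T - k)


def r_squared(ssr, tss):
    return 1.0 - ssr / tss


def adjusted_r_squared(ssr, tss, T, k):
    """Adjusted R-square: 1 - [ssr / (T - k)] / [tss / (T - 1)]."""
    return 1.0 - s_squared(ssr, T, k) / (tss / (T - 1))


# Hypothetical numbers: adding a third parameter lowers the sum of
# squared residuals only slightly, from 50.0 to 49.5.
T, tss = 40, 200.0
small = {"ssr": 50.0, "k": 2}
large = {"ssr": 49.5, "k": 3}

for model in (small, large):
    print(
        model["k"],
        r_squared(model["ssr"], tss),
        adjusted_r_squared(model["ssr"], tss, T, model["k"]),
        s_squared(model["ssr"], T, model["k"]),
    )
```

In this example the plain R-square prefers the larger model, while the adjusted R-square and s-squared prefer the smaller one, because the small gain in fit does not offset the lost degree of freedom.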
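37Selecting Forecasting Models: Criteria that
penalize number of model parameters
- Akaike information criterion (AIC)
  - $\mathrm{AIC} = \exp(2k/T) \, \mathrm{MSE}$
- Schwarz information criterion (SIC)
  - $\mathrm{SIC} = T^{k/T} \, \mathrm{MSE}$
- Here $k$ is the number of parameters and $T$ is the sample size.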
38The variation of criteria with k/T
39Use the consistent model selection criteria
- A model selection criterion is consistent if the
following conditions are met:
  - When the true model (i.e., the data-generating
process, DGP) is among the models considered,
the probability of selecting the true DGP
approaches 1 as the sample size gets large.
  - When the true model is not among those
considered, so that it is impossible to select
the true DGP, the probability of selecting the
best approximation to the true DGP approaches 1
as the sample size gets large.
- SIC is consistent but AIC is not.
40Use the asymptotically efficient model selection
criteria
- An asymptotically efficient model selection
criterion chooses a sequence of models, as the
sample size gets large, whose 1-step-ahead
forecast error variances approach the one that
would be obtained using the true model with known
parameters at a rate at least as fast as that of
any other model selection criterion.
- AIC is asymptotically efficient but SIC is not.
41AIC or SIC
- Usually AIC and SIC suggest the same model.
- When AIC and SIC suggest different models, we
usually choose the model selected by SIC because
the SIC often suggests a more parsimonious model
(i.e., smaller number of parameters).
42AIC and SIC reported across software packages
- $\ln(\mathrm{AIC}) = \ln(\mathrm{MSE}) + 2k/T$
- $\ln(\mathrm{SIC}) = \ln(\mathrm{MSE}) + k \ln(T)/T$
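The log forms above are easy to compute directly. The sketch below applies them to hypothetical in-sample MSE values for candidate models with different numbers of parameters; the numbers are made up to show a case where the two criteria disagree.

```python
import math


def log_aic(mse, k, T):
    """ln(AIC) = ln(MSE) + 2k/T."""
    return math.log(mse) + 2.0 * k / T


def log_sic(mse, k, T):
    """ln(SIC) = ln(MSE) + k ln(T)/T."""
    return math.log(mse) + k * math.log(T) / T


# Hypothetical in-sample MSEs, keyed by number of parameters k.
T = 100
mse_by_k = {2: 1.30, 3: 1.00, 4: 0.97, 5: 0.965}

# Smaller is better for both criteria.
best_aic = min(mse_by_k, key=lambda k: log_aic(mse_by_k[k], k, T))
best_sic = min(mse_by_k, key=lambda k: log_sic(mse_by_k[k], k, T))

print(best_aic)  # 4
print(best_sic)  # 3
```

The SIC penalty per parameter, ln(T)/T, exceeds the AIC penalty, 2/T, whenever T is at least 8, which is why the SIC never selects a larger model than the AIC in such samples.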
43Out-of-sample fitting
- The AIC and SIC are in-sample fit criteria,
although they account for the costs of
overfitting through the inclusion of a penalty
term.
- What we are really interested in is the question:
  - Having fit the model over the sample period, how
well does it forecast outside of that sample?
- The in-sample fit criteria that we discussed do
not directly answer this question.
44Out-of-sample fitting
- Suppose we have a data sample $y_1, \ldots, y_T$.
- Break it up into two parts (where $n \ll T$):
  - $y_1, \ldots, y_{T-n}$ (first $T-n$ observations)
  - $y_{T-n+1}, \ldots, y_T$ (last $n$ observations)
[Figure: timeline from 1 to $T$. Observations 1 to $T-n$ are used to estimate the model; observations $T-n+1$ to $T$ are saved for checking the out-of-sample fit.]
45Out-of-sample fitting
- Fit the shortened sample, $y_1, \ldots, y_{T-n}$, to various
trend models that may seem like plausible choices
based on time series plots and in-sample fit
criteria: linear, quadratic, the one selected
by AIC/SIC, log linear, ...
- For each estimated trend model, forecast
$y_{T-n+1}, \ldots, y_T$ and compute the forecast errors
$e_1, \ldots, e_n$.
- Compare the errors across the various models using
  - time series plots (of the forecasts and actual
values of $y_{T-n+1}, \ldots, y_T$, and of the forecast errors)
  - tables of the forecasts, actuals, and errors
  - mean squared prediction errors (MSPE)
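The hold-out comparison can be sketched as follows. The data are simulated from a hypothetical exponential trend, and only two candidate models (linear and log linear) are compared, both fit with a closed-form OLS helper, `ols_line`, introduced here. The log linear forecast is converted back to levels by simple exponentiation, which ignores any retransformation adjustment.

```python
import math
import random


def ols_line(x, y):
    """Least squares fit of y = a + b * x; returns (a_hat, b_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


def mspe(actual, forecast):
    """Mean squared prediction error over the hold-out observations."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


random.seed(3)
T, n = 100, 10
time = list(range(1, T + 1))
# Simulated series with a hypothetical exponential trend.
y = [10.0 * math.exp(0.03 * t + random.gauss(0.0, 0.02)) for t in time]

# First T - n observations for estimation, last n for checking.
fit_time, fit_y = time[: T - n], y[: T - n]
hold_time, hold_y = time[T - n:], y[T - n:]

# Linear trend fit in levels.
a_lin, b_lin = ols_line(fit_time, fit_y)
forecast_lin = [a_lin + b_lin * t for t in hold_time]

# Log linear trend fit by OLS on logs, then converted back to levels.
a_log, b_log = ols_line(fit_time, [math.log(v) for v in fit_y])
forecast_log = [math.exp(a_log + b_log * t) for t in hold_time]

print(mspe(hold_y, forecast_lin))
print(mspe(hold_y, forecast_log))
```

Once a model has been chosen this way, it would be re-estimated on all T observations before producing forecasts beyond the sample.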
46Out-of-sample fitting
- The advantage of this approach is that we are
actually comparing the trend models in terms of
their out-of-sample forecasting performance.
- A disadvantage is that the comparison is based on
models fit over $T-n$ observations rather than the
$T$ observations we have available. (Note that if
you do use this approach and, for example, settle
on the quadratic model, then when you proceed to
construct your forecasts for $T+1, \ldots$, you should
use the quadratic model fit to the full $T$
observations in your sample.)
- Will the fact that, for example, the quadratic
trend model outperformed other models in
forecasting out of sample based on the short
sample mean that it will perform best in
forecasting beyond the full sample? No.
47End