Title: Forecasting Theory
1. Forecasting Theory
2. What is Forecasting?
- Forecast: to calculate or predict some future event or condition, usually as a result of rational study or analysis of pertinent data. (Webster's Dictionary)
3. What is Forecasting?
- Forecasting methods
  - Qualitative: intuitive, educated guesses that may or may not depend on past data
  - Quantitative: based on mathematical or statistical models
- The goal of forecasting is to reduce forecast error.
4. What is Forecasting?
- We will consider two types of forecasts based on mathematical models:
  - Regression forecasting
  - Single-variable (time series) forecasting
5. What is Forecasting?
- Regression forecasting
  - We use the relationship between the variable of interest and the other variables that explain its variation to make predictions.
  - The explanatory variables are non-stochastic.
  - The explanatory variables are independent; the variable of interest is dependent.
6. What is Forecasting?
- Regression forecasting
  - Height is the independent variable.
  - Weight is the dependent variable.
7. What is Forecasting?
- Single-variable (time series) forecasting
  - We use the past history of the variable of interest to predict the future.
  - Predictions exploit correlations between past history and the future.
  - Past history is stochastic.
8. What is Forecasting?
- Single-variable (time series) forecasting
9. Normal Distribution
- A continuous random variable X is normally distributed if its density function is given by
  f(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²)).
- In this case,
  - E[X] = μ
  - var(X) = σ².
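As a quick numeric check of these facts, the sketch below (Python with numpy; the values μ = 2 and σ = 1.5 are made up for illustration) evaluates the density on a fine grid and approximates the integrals for total probability, E[X], and var(X):

```python
import numpy as np

# Normal density f(x) = (1 / (sigma * sqrt(2 pi))) * exp(-(x - mu)^2 / (2 sigma^2))
def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 2.0, 1.5          # illustrative values, not from the slides
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
f = normal_pdf(x, mu, sigma)
dx = x[1] - x[0]

print(np.sum(f * dx))                   # ~1.0 : total probability
print(np.sum(x * f * dx))               # ~2.0 : E[X] = mu
print(np.sum((x - mu) ** 2 * f * dx))   # ~2.25: var(X) = sigma^2
```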
10. Normal Density Function
11. Maximum Likelihood Estimation
- Suppose that Y_1, …, Y_n are continuous random variables with respective densities f_i(y; θ) that depend on some common parameter θ (which can be vector-valued). Assume that
  - θ is unknown
  - we observe y_1, …, y_n.
- We want to estimate the value of θ associated with Y_1, …, Y_n. Intuitively, we want to find the value of θ that is most likely to give rise to the data sample y_1, …, y_n.
12. Maximum Likelihood Estimation
Example: Consider the data sample y_1, …, y_20 below. Assume that all the densities are the same, and that the unknown parameter is the mean. Which of the two distributions most likely produced the data sample below?
13. Maximum Likelihood Estimation
- Assume that the observations are independent. We define the likelihood function
  L(θ; y_1, …, y_n) = f_1(y_1; θ) × ⋯ × f_n(y_n; θ) = ∏_{i=1}^n f_i(y_i; θ).   (*)
- In maximum likelihood estimation, we choose the value of θ that maximizes the likelihood function.
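The slide-12 question can be answered mechanically. A minimal Python sketch (the sample and the two candidate means are made up here, since the slides' data are not reproduced) evaluates the likelihood under each candidate and compares:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample standing in for y_1, ..., y_20.
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=1.0, size=20)

# Likelihood L(theta) = prod_i f(y_i; theta), assuming a common
# N(theta, 1) density for every observation.
for theta in (0.0, 5.0):
    L = np.prod(norm.pdf(y, loc=theta, scale=1.0))
    print(theta, L)
# The candidate whose density sits where the data actually fall
# yields the (much) larger likelihood.
```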
14. Maximum Likelihood Estimation
- Furthermore, since the logarithm is a monotone increasing function, the value of θ that maximizes (*) also maximizes the log of the likelihood function,
  log L(θ; y_1, …, y_n) = ∑_{i=1}^n log f_i(y_i; θ).
15. Maximum Likelihood Estimation and Least Squares Estimation
- Now assume that Y_i is normally distributed with mean μ_i(θ), where θ is unknown. Assume also that all of the densities have a common known variance σ². Then the log likelihood function becomes
  log L(θ; y_1, …, y_n) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (y_i − μ_i(θ))².
16. Maximum Likelihood Estimation and Least Squares Estimation
- Hence maximizing L(θ; y_1, …, y_n) is equivalent to minimizing the sum of squared deviations
  S(θ) = ∑_{i=1}^n (y_i − μ_i(θ))².
- The value of θ that minimizes S(θ) is called the least squares estimate of θ.
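To see the equivalence concretely, a small sketch (assuming the constant-mean case μ_i(θ) = θ, with made-up data and a known σ) evaluates both S(θ) and the Gaussian log likelihood over a grid; the minimizer of one is the maximizer of the other:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta_true, sigma = 3.0, 2.0
y = rng.normal(theta_true, sigma, size=50)   # constant-mean model: mu_i(theta) = theta

thetas = np.linspace(0, 6, 601)
# S(theta) = sum_i (y_i - theta)^2
S = np.array([np.sum((y - t) ** 2) for t in thetas])
# log L(theta) = -(n/2) log(2 pi sigma^2) - S(theta) / (2 sigma^2)
logL = np.array([np.sum(norm.logpdf(y, t, sigma)) for t in thetas])

# The grid point minimizing S is exactly the one maximizing log L.
print(thetas[np.argmin(S)], thetas[np.argmax(logL)])
```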
17. Regression Forecasting
- We suppose that Y is a variable of interest, and X_1, …, X_p are explanatory or predictor variables such that
  Y = h(X_1, …, X_p; β).
- h is the mathematical model that determines the relationship between the variable of interest and the explanatory variables.
- β = (β_0, …, β_m)′ are the model parameters.
18. Regression Forecasting
- Further assume that
  - we know h (i.e., the model is known), but we do not know β
  - we have noisy measurements of the variable of interest, Y:
    y_i = h(x_i1, …, x_ip; β) + e_i.
19. Regression Forecasting
- The random noise terms e_i satisfy:
  - E[e_i] = 0 for all i.
  - var(e_i) = σ², a constant that does not depend on i.
  - The e_i's are uncorrelated.
  - The e_i's are each normally distributed (which, together with their being uncorrelated, implies that they are independent), i.e.,
    e_i ~ N(0, σ²).
20. Regression Forecasting
- Note that since
  y_i = h(x_i1, …, x_ip; β) + e_i
- and
  e_i ~ N(0, σ²),
- then
  y_i ~ N(h(x_i1, …, x_ip; β), σ²).
21. Regression Forecasting
- For any values of the explanatory variables x_1, …, x_p, if β is known, we can predict y as
  y = h(x_1, …, x_p; β).
- Since β is unknown, we use least squares estimation to estimate β, which we denote by β̂. In this case, we forecast y as
  ŷ = h(x_1, …, x_p; β̂).
22. Regression Forecasting
Example: What are the best values for β_0 and β_1?
23. Regression Forecasting
Residuals are the differences between the observed values and the predicted values. We define the residual for the ith observation as
  e_i = y_i − ŷ_i.
A good set of parameters is one for which the residuals are small.
24. Regression Forecasting
- More specifically, if
  e_i = y_i − h(x_i1, …, x_ip; β),
- then we choose β̂ to minimize
  S(β) = ∑_{i=1}^n e_i² = ∑_{i=1}^n (y_i − h(x_i1, …, x_ip; β))².
25. Regression Forecasting
- Examples of Regression Models
26. Constant Mean Regression
- Suppose that the y_i's are a constant value plus noise:
  y_i = β_0 + e_i,
- i.e., β = β_0. Hence
  y_i ~ N(β_0, σ²).
- We want to determine the value of β_0 that minimizes
  S(β_0) = ∑_{i=1}^n (y_i − β_0)².
27. Constant Mean Regression
- Taking the derivative of S(β_0) gives
  S′(β_0) = −2 ∑_{i=1}^n (y_i − β_0).
- Finally, setting this equal to zero leads to
  β̂_0 = (1/n) ∑_{i=1}^n y_i = ȳ.
- Hence the sample mean is the least squares estimator for β_0.
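A quick numeric check (the observations below are made up) confirms that no grid value of β_0 beats the sample mean:

```python
import numpy as np

y = np.array([4.1, 5.3, 3.8, 4.9, 5.6, 4.4])   # made-up observations

def S(b0):
    # Sum of squared deviations around the candidate constant b0.
    return np.sum((y - b0) ** 2)

b0_hat = y.mean()          # the claimed least squares estimate
grid = np.linspace(y.min(), y.max(), 1001)
print(b0_hat)                                   # ~4.683
print(grid[np.argmin([S(b) for b in grid])])    # grid minimizer sits at the mean
```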
28. Constant Mean Regression
29. Simple Linear Regression
- Consider the model
  y_i = β_0 + β_1 x_i + e_i,
- i.e., β = (β_0, β_1)′. Hence
  y_i ~ N(β_0 + β_1 x_i, σ²).
- We want to determine the values of β_0 and β_1 that minimize
  S(β_0, β_1) = ∑_{i=1}^n (y_i − β_0 − β_1 x_i)².
30. Simple Linear Regression
- Setting the first partial derivatives equal to zero gives
  ∂S/∂β_0 = −2 ∑ (y_i − β_0 − β_1 x_i) = 0,
  ∂S/∂β_1 = −2 ∑ x_i (y_i − β_0 − β_1 x_i) = 0.
31. Simple Linear Regression
- Solving for β_0 and β_1 leads to the least squares estimates
  β̂_1 = ∑ (x_i − x̄)(y_i − ȳ) / ∑ (x_i − x̄)²,
  β̂_0 = ȳ − β̂_1 x̄.
- (This is left as a homework exercise.)
32. Simple Linear Regression
- Define e = (e_1, …, e_n)′, where
  e_i = y_i − β̂_0 − β̂_1 x_i.
- The equations
  ∑ (y_i − β̂_0 − β̂_1 x_i) = 0 and ∑ x_i (y_i − β̂_0 − β̂_1 x_i) = 0
- imply that
  ∑ e_i = 0 and ∑ x_i e_i = 0.
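Both the closed-form estimates from slide 31 and the residual identities from slide 32 are easy to verify numerically. The (x, y) data below are made up; np.polyfit serves as an independent cross-check:

```python
import numpy as np

# Made-up (x, y) data for illustration; not the slides' example.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
print(b0, b1)

# Residuals e_i = y_i - (b0 + b1 x_i) satisfy the two identities above.
e = y - (b0 + b1 * x)
print(np.sum(e))        # ~0
print(np.sum(x * e))    # ~0

# Cross-check against numpy's own least squares fit.
print(np.polyfit(x, y, 1))   # returns [b1, b0]
```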
33. Simple Linear Regression
34. Simple Linear Regression
35. Simple Linear Regression
- Example (continued)
- Regression equation: ŷ = β̂_0 + β̂_1 x
36. General Linear Regression
- Consider the linear regression model
  y_i = β_0 + β_1 x_i1 + ⋯ + β_p x_ip + e_i,
- or
  y_i = x_i′ β + e_i,
- where x_i = (1, x_i1, …, x_ip)′ and β = (β_0, …, β_p)′.
37. General Linear Regression
- Suppose that we have n observations y_i. We introduce matrix notation and define
  y = (y_1, …, y_n)′, e = (e_1, …, e_n)′, and X = the matrix whose ith row is x_i′ = (1, x_i1, …, x_ip).
- Note that y is n × 1, e is n × 1, and X is n × (p + 1).
38. General Linear Regression
- Then we can write the regression model as
  y = Xβ + e.
- Note that y has a mean vector and covariance matrix given by
  E[y] = Xβ, var(y) = σ² I,
- where I is the n × n identity matrix.
39. General Linear Regression
- Note that by var(y), we mean the matrix whose (i, j) entry is cov(y_i, y_j), with the variances var(y_i) on the diagonal.
- Note that this is a symmetric matrix.
40. General Linear Regression
- We assume that the matrix X, which is called the design matrix, is of full rank. This means that the columns of the X matrix are not linearly dependent.
- A violation of this assumption would indicate that some of the independent variables are redundant, since at least one of the variables would contain the same information as a linear combination of the others. A small sketch after this bullet makes the condition concrete.
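In the numpy sketch below (hypothetical data), duplicating the information in a column leaves X with deficient rank, and X′X becomes singular:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
X_ok  = np.column_stack([np.ones(4), x1])            # full rank: rank 2
X_bad = np.column_stack([np.ones(4), x1, 2 * x1])    # third column = 2 * second

print(np.linalg.matrix_rank(X_ok), np.linalg.matrix_rank(X_bad))   # 2 and 2
# X_bad has 3 columns but rank 2, so X'X is singular and
# (X'X)^{-1} does not exist:
print(np.linalg.det(X_bad.T @ X_bad))   # ~0
```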
41. General Linear Regression
- In matrix notation, the least squares criterion can be expressed as minimizing
  S(β) = (y − Xβ)′(y − Xβ).
- The least squares estimator is given by
  β̂ = (X′X)⁻¹ X′y.
- (Proof is omitted.)
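A minimal sketch of the estimator (made-up design matrix and coefficients; it solves the normal equations rather than forming the inverse, which is numerically preferable):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # n x (p+1) design
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

# beta_hat = (X'X)^{-1} X'y, computed by solving (X'X) beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
print(np.linalg.lstsq(X, y, rcond=None)[0])   # same answer via lstsq
```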
42. General Linear Regression
- It follows that the prediction for Y is given by
  ŷ = Xβ̂.
43. General Linear Regression
- Example: Simple Linear Regression
44. General Linear Regression
- Example: Simple Linear Regression (p = 1)
  - Here x_i = (1, x_i)′, so X is the n × 2 matrix with a column of ones and a column of the x_i's.
45. General Linear Regression
- Example: Simple Linear Regression (p = 1)
  - Evaluating β̂ = (X′X)⁻¹X′y in this case reproduces the estimates β̂_0 and β̂_1 of slide 31.
46. Properties of Least Squares Estimators
- The least squares estimator of β is unbiased: E[β̂] = β.
- The (p + 1) × (p + 1) covariance matrix of the least squares estimator is given by
  var(β̂) = σ² (X′X)⁻¹.
- If the errors are normally distributed, then
  β̂ ~ N(β, σ² (X′X)⁻¹).
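These properties can be checked by simulation. The sketch below (made-up design and parameters) refits the model across many redraws of the noise; the empirical mean and covariance of β̂ should approach β and σ²(X′X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 40, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([2.0, -1.0])

# Repeatedly redraw the noise and refit by least squares.
fits = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(scale=sigma, size=n)))
    for _ in range(20000)
])
print(fits.mean(axis=0))                     # ~beta (unbiasedness)
print(np.cov(fits.T))                        # ~sigma^2 (X'X)^{-1}
print(sigma ** 2 * np.linalg.inv(X.T @ X))
```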
47. Properties of Least Squares Estimators
- It follows from the derivation of the least squares estimate that the residuals satisfy
  X′e = 0.
- In particular, since the first column of X is a column of ones,
  ∑ e_i = 0.
48. Testing the Regression Model
- How well does the regression line describe the relationship between the independent and dependent variables?
49. Testing the Regression Model
- The total deviation y_i − ȳ decomposes into the explained deviation ŷ_i − ȳ and the unexplained deviation y_i − ŷ_i.
50. Testing the Regression Model
- Let's analyze these variations:
  ∑ (y_i − ȳ)² = ∑ (ŷ_i − ȳ)² + ∑ (y_i − ŷ_i)² + 2 ∑ (ŷ_i − ȳ)(y_i − ŷ_i).
- But the cross-product term vanishes: since X′e = 0, we have ∑ (ŷ_i − ȳ)(y_i − ŷ_i) = ∑ (ŷ_i − ȳ) e_i = 0.
51. Testing the Regression Model
- Hence
  ∑ (y_i − ȳ)² = ∑ (ŷ_i − ȳ)² + ∑ (y_i − ŷ_i)²,
- that is, SSTO = SSR + SSE, where
  - Total sum of squares: SSTO = ∑ (y_i − ȳ)²
  - Sum of squares due to regression: SSR = ∑ (ŷ_i − ȳ)²
  - Sum of squares due to error: SSE = ∑ (y_i − ŷ_i)².
52. Coefficient of Determination
- The coefficient of determination R² is a measure of how well the model is doing in explaining the variation of the observations around their mean:
  R² = SSR / SSTO = 1 − SSE / SSTO.
- A large R² (near 1) indicates that a large portion of the variation is explained by the model.
- A small value of R² (near 0) indicates that only a small fraction of the variation is explained by the model.
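The decomposition and R² are easy to verify on a toy data set (made up here); for simple linear regression, R² also equals the squared sample correlation between x and y:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ssto = np.sum((y - y.mean()) ** 2)
ssr  = np.sum((y_hat - y.mean()) ** 2)
sse  = np.sum((y - y_hat) ** 2)
print(ssto, ssr + sse)                # the decomposition: SSTO = SSR + SSE
print(ssr / ssto)                     # R^2
print(np.corrcoef(x, y)[0, 1] ** 2)   # equals R^2 for simple linear regression
```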
53. Correlation Coefficient
- The correlation coefficient R is the square root of the coefficient of determination. For simple linear regression, it can also be expressed as
  R = ∑ (x_i − x̄)(y_i − ȳ) / √( ∑ (x_i − x̄)² ∑ (y_i − ȳ)² ).
- It varies between −1 and 1, and quantifies the strength of the association between the independent and dependent variables. A value of R close to 1 indicates a strong positive correlation; a value close to −1 indicates a strong negative correlation. A value close to zero indicates weak or no correlation.
54. Correlation
(figure: scatter plots illustrating positive correlation, negative correlation, and no correlation)
55. Testing the Regression Model
- Example: Simple Linear Regression
- Regression equation: ŷ = β̂_0 + β̂_1 x
56. Estimating the Variance
- So far we have assumed that we know the variance σ². But in general this value will be unknown. We can estimate σ² from the sample data by
  s² = SSE / (n − p − 1) = ∑ e_i² / (n − p − 1).
57. Confidence Interval for Regression Line
- Simple Linear Regression: Suppose x_0 is a specified value of the independent variable. A 100(1 − α)% confidence interval for the value of the mean of the dependent variable y_0 at x_0 is given by
  ŷ_0 ± t_{α/2, n−2} s √( 1/n + (x_0 − x̄)² / ∑ (x_i − x̄)² ),
- where ŷ_0 = β̂_0 + β̂_1 x_0.
58. Prediction Interval for an Observation
- Simple Linear Regression: A 100(1 − α)% prediction interval for an observation y_0 associated with x_0 is given by
  ŷ_0 ± t_{α/2, n−2} s √( 1 + 1/n + (x_0 − x̄)² / ∑ (x_i − x̄)² ).
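A sketch computing both intervals for made-up data (x_0 = 3.5 is an arbitrary choice; scipy supplies the t critical value). Note that the prediction interval is wider than the confidence interval, by the extra 1 under the square root:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)
b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)

s = np.sqrt(np.sum(e ** 2) / (n - 2))     # s^2 = SSE / (n - p - 1), here p = 1
x0 = 3.5                                  # point at which we predict
y0_hat = b0 + b1 * x0
t = stats.t.ppf(0.975, df=n - 2)          # 95% two-sided critical value
sxx = np.sum((x - x.mean()) ** 2)

half_ci = t * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)       # mean response
half_pi = t * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)   # new observation
print(y0_hat, (y0_hat - half_ci, y0_hat + half_ci))
print(y0_hat, (y0_hat - half_pi, y0_hat + half_pi))
```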
59. What is Forecasting (Revisited)?
- Statistical forecasting is not predicting
  - a value.
- Statistical forecasting is predicting
  - the expected value
  - the variability about the expected value.
60. Homework
- Complete the proof for the result that, for the simple linear regression model,
  β̂_1 = ∑ (x_i − x̄)(y_i − ȳ) / ∑ (x_i − x̄)² and β̂_0 = ȳ − β̂_1 x̄.
- Prove that if Y is a random variable with finite expected value, then the constant c that minimizes E[(Y − c)²] is c = E[Y].
61. Homework
- Suppose that the following data represent the total costs and the number of units produced by a company.
  - Graph the relationship between X and Y.
  - Determine the simple linear regression line relating Y to X.
  - Predict the costs for producing 10 units. Give a 95% confidence interval for the costs, and for the expected value (mean) of the costs associated with 10 units.
  - Compute the SSTO, SSR, SSE, R and R². Interpret the value of R².
62. Homework
- Consider the fuel consumption data on the next slide, and the following model, which relates fuel consumption (Y) to the average hourly temperature (X1) and the chill index (X2):
  Y = β_0 + β_1 X1 + β_2 X2 + e.
  - Plot Y versus X1 and Y versus X2.
  - Determine the least squares estimates for the model parameters.
  - Predict the fuel consumption when the temperature is 35 and the chill index is 10.
  - Compute the SSTO, SSR, SSE and R². Interpret the value of R².
63. Data for Problem 4