Title: Autocorrelation in Regression Analysis
1. Autocorrelation in Regression Analysis
- Tests for Autocorrelation
- Examples
- Durbin-Watson Tests
- Modeling Autoregressive Relationships
2. What causes autocorrelation?
- Misspecification
- Data manipulation
  - Before receipt
  - After receipt
- Event inertia
- Spatial ordering
3. Checking for Autocorrelation
- Test the Durbin-Watson statistic
4. Consider the following regression
      Source |       SS       df       MS              Number of obs =     328
-------------+------------------------------           F(  2,   325) =   52.63
       Model |  .354067287     2  .177033643           Prob > F      =  0.0000
    Residual |  1.09315071   325  .003363541           R-squared     =  0.2447
-------------+------------------------------           Adj R-squared =  0.2400
       Total |    1.447218   327  .004425743           Root MSE      =    .058
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ice |    .060075    .006827     8.80   0.000     .0466443    .0735056
    quantity |  -2.27e-06   2.91e-07    -7.79   0.000    -2.84e-06   -1.69e-06
       _cons |   .2783773   .0077177    36.07   0.000     .2631944    .2935602
------------------------------------------------------------------------------
Because this is time series data, we should consider the possibility of autocorrelation. To run the Durbin-Watson test, first we have to declare the data as time series with the tsset command. Next we use the dwstat command.
Durbin-Watson d-statistic( 3, 328) = .2109072
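The steps above can be sketched as follows; the name of the time index variable is an assumption, and in newer Stata releases dwstat is provided as the postestimation command estat dwatson:

```stata
* Declare the data as time series, fit the model, then request the
* Durbin-Watson statistic. "time" is an assumed time index variable.
tsset time
regress price ice quantity
dwstat            // estat dwatson in newer Stata versions
```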
5. Find the D-upper and D-lower
- Check a Durbin-Watson table for the values of d-upper and d-lower.
- http://hadm.sph.sc.edu/courses/J716/Dw.html
- For n = 20 and k = 2 at α = .05, the values are:
  - Lower: 1.643
  - Upper: 1.704
Durbin's alternative test for autocorrelation
------------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+----------------------------------------------------------------
      1      |       1292.509               1                    0.0000
------------------------------------------------------------------------------
H0: no serial correlation
6. Alternatives to the d-statistic
- The d-statistic is not valid in models with a lagged dependent variable.
- In the case of a lagged LHS variable you must use the Durbin alternative test (the command is durbina in Stata).
- Also, the d-statistic is only for first-order autocorrelation. In other instances you may use the Durbin alternative test.
- Why would you suspect other than 1st-order autocorrelation?
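A minimal sketch of the test described above, assuming a time index named time; in recent Stata releases the durbina command is available as the postestimation command estat durbinalt, which can also test higher lag orders (lags 1-4 here are an illustrative choice):

```stata
* Fit a model with a lagged dependent variable on the RHS, then run
* Durbin's alternative test at several lag orders.
tsset time
regress price L.price ice quantity
estat durbinalt, lags(1/4)
```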
7. The Runs Test
- An alternative to the D-W test is a formalized examination of the signs of the residuals. We would expect that the signs of the residuals will be random in the absence of autocorrelation.
- The first step is to estimate the model and predict the residuals.
8. Runs continued
- Next, order the signs of the residuals against
time (or spatial ordering in the case of
cross-sectional data) and see if there are
excessive runs of positives or negatives.
Alternatively, you can graph the residuals and
look for the same trends.
9. Runs test continued
The final step is to use the expected mean and deviation in a standard t-test. Stata does this automatically with the runtest command!
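The steps on the last three slides might look like this; variable names are assumptions:

```stata
* Estimate the model, predict the residuals, and apply the runs test
* to their ordering.
regress price ice quantity
predict e, resid
runtest e
```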
10. Visual diagnosis of autocorrelation (in a single series)
- A correlogram is a good tool to identify whether a series is autocorrelated.
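A correlogram sketch in Stata, with the time index name assumed:

```stata
* ac draws the autocorrelation plot; corrgram lists the
* autocorrelations with portmanteau (Q) statistics.
tsset time
ac price
corrgram price
```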
11. Dealing with autocorrelation
- D-W is not appropriate for auto-regressive (AR) models, where the lagged dependent variable appears on the right-hand side (Y_t depends on Y_(t-1)).
- In this case, we use the Durbin alternative test.
- For AR models, we need to explicitly estimate the correlation between Y_t and Y_(t-1) as a model parameter.
- Techniques:
  - AR1 models (closest to regression; 1st order only)
  - ARIMA (any order)
12. Dealing with Autocorrelation
- There are several approaches to resolving problems of autocorrelation:
- Lagged dependent variables
- Differencing the dependent variable
- GLS
- ARIMA
13. Lagged dependent variables
- The most common solution.
- Simply create a new variable that equals Y at t-1, and use it as a RHS variable.
- To do this in Stata, simply use the generate command with the new variable equal to L.variable:
  - gen lagy = L.y
  - gen laglagy = L2.y
- This correction should be based on a theoretical belief for the specification.
- May cause more problems than it solves.
- Also costs a degree of freedom (lost observation).
- There are several advanced techniques for dealing with this as well.
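Applied to this lecture's variables, the fix above might be sketched as (time index name assumed):

```stata
* Generate the lag of the dependent variable and add it to the RHS.
* The first observation is lost to the lag.
tsset time
gen lagprice = L.price
regress price lagprice ice quantity
```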
14. Differencing
- Differencing is simply the act of subtracting the previous observation's value from the current observation.
- To do this in Stata, again use the generate command with a capital D (instead of the L for lags).
- This process is effective; however, it is an EXPENSIVE correction.
- This technique throws away long-term trends.
- Assumes that rho = 1 exactly.
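A differencing sketch, with the time index name assumed:

```stata
* D.price equals price - L.price once the data are tsset.
tsset time
gen dprice = D.price
regress dprice ice quantity
```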
15. GLS and ARIMA
- GLS approaches use maximum likelihood to estimate rho and correct the model.
- These are good corrections, and can be replicated in OLS.
- ARIMA is an acronym for Autoregressive Integrated Moving Average.
- This process is a univariate filter used to cleanse variables of a variety of pathologies before analysis.
16. Corrections based on Rho
- There are several ways to estimate rho, the simplest being to calculate it from the residuals:

    rho-hat = ( sum_t e_t * e_(t-1) ) / ( sum_t e_(t-1)^2 )

- We then estimate the regression by transforming the regressors (the Cochrane-Orcutt transformation) so that

    Y*_t = Y_t - rho-hat * Y_(t-1)    and    X*_t = X_t - rho-hat * X_(t-1)

- This gives the regression

    Y*_t = b0(1 - rho-hat) + b1 * X*_t + u_t
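The calculation above can be done by hand in Stata; this sketch assumes the lecture's variable names and a time index called time:

```stata
* Estimate rho from the OLS residuals, then re-run the regression
* on the rho-transformed (quasi-differenced) variables.
tsset time
regress price ice quantity
predict e, resid
regress e L.e, noconstant        // slope is the estimate of rho
scalar rhohat = _b[L.e]
gen ystar   = price    - rhohat*L.price
gen icestar = ice      - rhohat*L.ice
gen qstar   = quantity - rhohat*L.quantity
regress ystar icestar qstar
```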
17. High tech solutions
- Stata also offers the option of estimating the model with the AR structure built in (with multiple ways of estimating rho). There is also what is known as a Prais-Winsten regression, which generates values for the lost observation.
- For the truly adventurous, there is also the option of doing a full ARIMA model.
18. Prais-Winsten regression

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =     328
-------------+------------------------------           F(  2,   325) =   15.39
       Model |  .012722308     2  .006361154           Prob > F      =  0.0000
    Residual |  .134323736   325  .000413304           R-squared     =  0.0865
-------------+------------------------------           Adj R-squared =  0.0809
       Total |  .147046044   327  .000449682           Root MSE      =  .02033
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ice |   .0098603   .0059994     1.64   0.101    -.0019422    .0216629
    quantity |  -1.11e-07   1.70e-07    -0.66   0.512    -4.45e-07    2.22e-07
       _cons |   .2517135   .0195727    12.86   0.000     .2132082    .2902188
-------------+----------------------------------------------------------------
         rho |   .9436986
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    0.210907
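Output of this form comes from the prais command; a minimal sketch, with the time index name assumed:

```stata
* Prais-Winsten (iterated) estimation; the corc option would give
* Cochrane-Orcutt instead, which drops the first observation.
tsset time
prais price ice quantity
```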
19. ARIMA
- The ARIMA model allows us to test the hypothesis of autocorrelation and remove it from the data.
- This is an iterative process akin to the purging we did when creating the ystar variable.
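A sketch of the commands behind the next two slides, with the time index name assumed:

```stata
* AR(1) model of price alone, then with the covariates added.
tsset time
arima price, ar(1)
arima price ice quantity, ar(1)
```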
20. The model

ARIMA regression

Sample:  1 to 328                                Number of obs   =        328
                                                 Wald chi2(1)    =    3804.80
Log likelihood =  811.6018                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |                 OPG
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price        |
       _cons |   .2558135   .0207937    12.30   0.000     .2150587    .2965683
-------------+----------------------------------------------------------------
ARMA         |
ar           |
         L1. |   .9567067     .01551    61.68   0.000     .9263076    .9871058
-------------+----------------------------------------------------------------
      /sigma |   .0203009    .000342    59.35   0.000     .0196305    .0209713
------------------------------------------------------------------------------

The ar L1. coefficient is the estimate of rho -- a significant lag.
21. The residuals of the ARIMA model
There are a few significant lags a ways back. Generally we should expect some, but this mess is probably an indicator of a seasonal trend (well beyond the scope of this lecture)!
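Checking the ARIMA residuals as described might be sketched as (residual variable name is an assumption):

```stata
* After arima, predict the residuals and inspect their correlogram
* for leftover structure such as seasonal lags.
predict ehat, resid
ac ehat
corrgram ehat
```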
22. ARIMA with a covariate

ARIMA regression

Sample:  1 to 328                                Number of obs   =        328
                                                 Wald chi2(3)    =    3569.57
Log likelihood =  812.9607                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |                 OPG
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price        |
         ice |   .0095013   .0064945     1.46   0.143    -.0032276    .0222303
    quantity |  -1.04e-07   1.22e-07    -0.85   0.393    -3.43e-07    1.35e-07
       _cons |   .2531552   .0220777    11.47   0.000     .2098838    .2964267
-------------+----------------------------------------------------------------
ARMA         |
ar           |
         L1. |   .9542692     .01628    58.62   0.000     .9223611    .9861773
------------------------------------------------------------------------------
23. Final thoughts
- Each correction has a best application.
- If we wanted to evaluate a mean shift (dummy-variable-only model), calculating rho will not be a good choice. Then we would want to use the lagged dependent variable.
- Also, where we want to test the effect of inertia, it is probably better to use the lag.
24. Final Thoughts Continued
- In small-N settings, calculating rho tends to be more accurate.
- ARIMA is one of the best options; however, it is very complicated!
- When dealing with time, the number of time periods and the spacing of the observations is VERY IMPORTANT!
- When using estimates of rho, a good rule of thumb is to make sure you have 25-30 time points at a minimum. More if the observations are too close together for the process you are observing!
25. Next Time
- Review for Exam
- Plenary Session
- Exam Posting
  - Available after class Wednesday