Autocorrelation in Regression Analysis

Transcript and Presenter's Notes
1
Autocorrelation in Regression Analysis
  • Tests for Autocorrelation
  • Examples
  • Durbin-Watson Tests
  • Modeling Autoregressive Relationships

2
What causes autocorrelation?
  • Misspecification
  • Data Manipulation
    • Before receipt
    • After receipt
  • Event Inertia
  • Spatial ordering

3
Checking for Autocorrelation
  • Compute the Durbin-Watson test statistic

4
Consider the following regression:

      Source |       SS       df       MS              Number of obs =     328
-------------+------------------------------           F(  2,   325) =   52.63
       Model |  .354067287     2  .177033643           Prob > F      =  0.0000
    Residual |  1.09315071   325  .003363541           R-squared     =  0.2447
-------------+------------------------------           Adj R-squared =  0.2400
       Total |    1.447218   327  .004425743           Root MSE      =    .058
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ice |    .060075    .006827     8.80   0.000     .0466443    .0735056
    quantity |  -2.27e-06   2.91e-07    -7.79   0.000    -2.84e-06   -1.69e-06
       _cons |   .2783773   .0077177    36.07   0.000     .2631944    .2935602
------------------------------------------------------------------------------

Because this is time series data, we should consider the possibility of
autocorrelation. To run the Durbin-Watson test, first we have to declare the
data as time series with the tsset command. Next we use the dwstat command.

Durbin-Watson d-statistic(3, 328) = .2109072
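Stata's dwstat computes d = sum((e_t - e_{t-1})^2) / sum(e_t^2). As a rough illustration outside Stata, here is a minimal Python sketch on simulated residuals (not the price/ice series above):

```python
import numpy as np

def durbin_watson(e):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no
    first-order autocorrelation, near 0 strong positive correlation."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
white = rng.standard_normal(500)      # independent residuals -> d near 2
walk = np.cumsum(white)               # highly autocorrelated -> d near 0
d_white = durbin_watson(white)
d_walk = durbin_watson(walk)
```

A d of .21, as in the output above, is the "d near 0" case: strong positive serial correlation.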
5
Find the d-upper and d-lower
  • Check a Durbin-Watson table for the values of d-upper and d-lower.
  • http://hadm.sph.sc.edu/courses/J716/Dw.html
  • For n = 20 and k = 2 at alpha = .05, the values are:
  • Lower: 1.643
  • Upper: 1.704
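The table lookup turns into a simple three-way decision rule. A sketch, hard-coding the bounds quoted above for n = 20, k = 2:

```python
def dw_decision(d, d_lower, d_upper):
    """One-sided Durbin-Watson test for positive autocorrelation."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d > d_upper:
        return "fail to reject H0"
    return "inconclusive"

# Bounds quoted above for n = 20, k = 2, alpha = .05
verdict = dw_decision(0.21, 1.643, 1.704)
```

Values between the two bounds are the famous inconclusive region of the test.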

Durbin's alternative test for autocorrelation
-----------------------------------------------------------------------
    lags(p)  |          chi2               df             Prob > chi2
-------------+---------------------------------------------------------
       1     |      1292.509                1                  0.0000
-----------------------------------------------------------------------
H0: no serial correlation
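Durbin's alternative test regresses the residuals on the original regressors plus their own lag and tests the lag coefficient; the chi2 reported above is essentially the squared t statistic on that lag. A rough Python sketch of that idea, on simulated residuals:

```python
import numpy as np

def durbin_alternative_t(e, X):
    """Regress e_t on X_t and e_{t-1}; return the t statistic on e_{t-1}.
    Under H0 (no serial correlation) t^2 is approximately chi2(1)."""
    Z = np.column_stack([X[1:], e[:-1]])
    g, *_ = np.linalg.lstsq(Z, e[1:], rcond=None)
    resid = e[1:] - Z @ g
    s2 = resid @ resid / (len(resid) - Z.shape[1])
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return g[-1] / np.sqrt(cov[-1, -1])

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
e = np.zeros(n)
u = rng.standard_normal(n)
for t in range(1, n):                 # AR(1) residuals, rho = 0.7
    e[t] = 0.7 * e[t - 1] + u[t]
t_ar = durbin_alternative_t(e, X)     # large and positive
t_iid = durbin_alternative_t(rng.standard_normal(n), X)  # near zero
```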
6
Alternatives to the d-statistic
  • The d-statistic is not valid in models with a lagged dependent variable
  • In the case of a lagged LHS variable you must use Durbin's alternative
    test (the command is durbina in Stata)
  • Also, the d-statistic tests only for first-order autocorrelation. In
    other instances you may use Durbin's alternative test
  • Why would you suspect other than 1st-order autocorrelation?

7
The Runs Test
  • An alternative to the D-W test is a formalized
    examination of the signs of the residuals. We
    would expect that the signs of the residuals will
    be random in the absence of autocorrelation.
  • The first step is to estimate the model and
    predict the residuals.

8
Runs continued
  • Next, order the signs of the residuals against
    time (or spatial ordering in the case of
    cross-sectional data) and see if there are
    excessive runs of positives or negatives.
    Alternatively, you can graph the residuals and
    look for the same trends.

9
Runs test continued
The final step is to compare the observed number of runs to its expected
mean and standard deviation in a standard t-test. Stata does this
automatically with the runtest command!
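runtest compares the observed number of sign runs to its expectation under randomness. A minimal Python version of the z statistic behind it, on simulated residuals:

```python
import numpy as np

def runs_z(residuals):
    """Wald-Wolfowitz runs test on residual signs. Positive
    autocorrelation produces too FEW runs, so z comes out negative."""
    s = np.sign(residuals)
    s = s[s != 0]
    n1 = np.sum(s > 0)
    n2 = np.sum(s < 0)
    n = n1 + n2
    runs = 1 + np.sum(s[1:] != s[:-1])
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / np.sqrt(var)

rng = np.random.default_rng(1)
z_iid = runs_z(rng.standard_normal(300))
e = np.zeros(300)
u = rng.standard_normal(300)
for t in range(1, 300):               # AR(1) residuals, rho = 0.9
    e[t] = 0.9 * e[t - 1] + u[t]
z_ar = runs_z(e)                      # strongly negative: too few runs
```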
10
Visual diagnosis of autocorrelation (in a single
series)
  • A correlogram is a good tool to identify if a
    series is autocorrelated
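A correlogram simply plots the sample autocorrelations r_k against the lag k. A hand-rolled version, on a simulated AR(1) series for illustration:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_1 .. r_nlags of a single series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:-k]) / denom
                     for k in range(1, nlags + 1)])

rng = np.random.default_rng(2)
y = np.zeros(1000)
u = rng.standard_normal(1000)
for t in range(1, 1000):
    y[t] = 0.8 * y[t - 1] + u[t]
r = acf(y, 5)   # for AR(1) with rho .8, r_k decays roughly like .8**k
```

The geometric decay of the bars is the visual signature of an autocorrelated series.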

11
Dealing with autocorrelation
  • D-W is not appropriate for auto-regressive (AR) models, where the lagged
    value of Y appears on the right-hand side
  • In this case, we use the Durbin alternative test
  • For AR models, we need to explicitly estimate the correlation between Yt
    and Yt-1 as a model parameter
  • Techniques:
  • AR(1) models (closest to regression; 1st order only)
  • ARIMA (any order)

12
Dealing with Autocorrelation
  • There are several approaches to resolving
    problems of autocorrelation.
  • Lagged dependent variables
  • Differencing the Dependent variable
  • GLS
  • ARIMA

13
Lagged dependent variables
  • The most common solution
  • Simply create a new variable that equals Y at t-1, and use it as a RHS
    variable
  • To do this in Stata, simply use the generate command with the new
    variable set equal to L.variable
  • gen lagy = L.y
  • gen laglagy = L2.y
  • This correction should be based on a theoretical belief about the
    specification
  • May cause more problems than it solves
  • Also costs a degree of freedom (one lost observation)
  • There are several advanced techniques for dealing with this as well
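The gen lagy = L.y idiom has a direct pandas equivalent (the numbers here are made up); note the NaN rows, which are the lost observations mentioned above:

```python
import pandas as pd

df = pd.DataFrame({"y": [1.0, 2.0, 4.0, 8.0]})
df["lagy"] = df["y"].shift(1)      # Stata: gen lagy = L.y
df["laglagy"] = df["y"].shift(2)   # Stata: gen laglagy = L2.y
# Each lag makes one more leading row NaN (one lost observation).
```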

14
Differencing
  • Differencing is simply the act of subtracting the previous observation's
    value from the current observation.
  • To do this in Stata, again use the generate command with a capital D
    (instead of the L for lags)
  • This process is effective; however, it is an EXPENSIVE correction
  • This technique throws away long-term trends
  • It assumes that rho = 1 exactly
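The same idiom with D in place of L, again sketched in pandas with made-up numbers:

```python
import pandas as pd

s = pd.Series([10.0, 12.0, 15.0, 15.0])
dy = s.diff()      # the capital-D idiom: y_t - y_{t-1}
# Any long-run level or trend in s is gone from dy, which is
# why this correction is "expensive" for long-term questions.
```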

15
GLS and ARIMA
  • GLS approaches use maximum likelihood to estimate
    Rho and correct the model
  • These are good corrections, and can be replicated
    in OLS
  • ARIMA is an acronym for Autoregressive Integrated
    Moving Average
  • This process is a univariate filter used to
    cleanse variables of a variety of pathologies
    before analysis

16
Corrections based on rho
  • There are several ways to estimate rho, the simplest being to calculate
    it from the residuals:

    rho-hat = sum(et * et-1) / sum(et-1^2)

We then estimate the regression by transforming the regressors so that

    Yt* = Yt - rho Yt-1    and    Xt* = Xt - rho Xt-1

This gives the regression

    Yt* = b0 (1 - rho) + b1 Xt* + ut
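The estimate-rho-then-transform step can be sketched in Python on a simulated regression with AR(1) errors (true rho = 0.7; all names and numbers here are illustrative):

```python
import numpy as np

def rho_from_residuals(e):
    """rho-hat = sum(e_t * e_{t-1}) / sum(e_{t-1}^2)."""
    return np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal(n)
e = np.zeros(n)
u = 0.3 * rng.standard_normal(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + u[t]
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + e      # true model: y = 1 + 2x + e

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
rho = rho_from_residuals(y - X @ b_ols)
y_star = y[1:] - rho * y[:-1]         # quasi-differenced ystar
X_star = X[1:] - rho * X[:-1]         # the constant column becomes (1 - rho)
b_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
```

Note that one observation is lost in the transformation, which is exactly what the Prais-Winsten approach on the next slide repairs.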
17
High tech solutions
  • Stata also offers the option of estimating the model with the AR
    structure (with multiple ways of estimating rho). There is also what is
    known as a Prais-Winsten regression, which generates a value for the
    lost observation
  • For the truly adventurous, there is also the option of doing a full
    ARIMA model
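The distinguishing feature of Prais-Winsten is that the first observation is rescaled by sqrt(1 - rho^2) instead of being thrown away. A sketch of just that transform:

```python
import numpy as np

def prais_winsten_transform(y, rho):
    """Quasi-difference a series but keep observation 0, scaled by
    sqrt(1 - rho^2), rather than dropping it (the 'lost' observation)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    out[0] = np.sqrt(1 - rho ** 2) * y[0]
    out[1:] = y[1:] - rho * y[:-1]
    return out

y_star = prais_winsten_transform([1.0, 2.0, 3.0], rho=0.6)
```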

18
Prais-Winsten regression

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =     328
-------------+------------------------------           F(  2,   325) =   15.39
       Model |  .012722308     2  .006361154           Prob > F      =  0.0000
    Residual |  .134323736   325  .000413304           R-squared     =  0.0865
-------------+------------------------------           Adj R-squared =  0.0809
       Total |  .147046044   327  .000449682           Root MSE      =  .02033
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ice |   .0098603   .0059994     1.64   0.101    -.0019422    .0216629
    quantity |  -1.11e-07   1.70e-07    -0.66   0.512    -4.45e-07    2.22e-07
       _cons |   .2517135   .0195727    12.86   0.000     .2132082    .2902188
-------------+----------------------------------------------------------------
         rho |   .9436986
------------------------------------------------------------------------------
Durbin-Watson statistic (original) = 0.210907

19
ARIMA
  • The ARIMA model allows us to test the hypothesis
    of autocorrelation and remove it from the data.
  • This is an iterative process akin to the purging
    we did when creating the ystar variable.
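As a rough stand-in for an ARIMA fit, an AR(1) can be estimated by conditional least squares on the demeaned series; the residuals it leaves behind are the "purged" series. Simulated data (true phi = 0.95, chosen to be of the same magnitude as the rho in the output that follows):

```python
import numpy as np

def fit_ar1(y):
    """Conditional least-squares AR(1): (y_t - mu) = phi (y_{t-1} - mu) + u_t."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    phi = np.sum(z[1:] * z[:-1]) / np.sum(z[:-1] ** 2)
    resid = z[1:] - phi * z[:-1]
    sigma = np.sqrt(np.mean(resid ** 2))
    return phi, sigma, resid

rng = np.random.default_rng(4)
y = np.zeros(3000)
u = rng.standard_normal(3000)
for t in range(1, 3000):
    y[t] = 0.95 * y[t - 1] + u[t]
phi, sigma, resid = fit_ar1(y)
# resid should now be close to white noise (autocorrelation purged)
```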

20
The model

ARIMA regression

Sample:  1 to 328                               Number of obs   =        328
                                                Wald chi2(1)    =    3804.80
Log likelihood =  811.6018                      Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |                 OPG
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price        |
       _cons |   .2558135   .0207937    12.30   0.000     .2150587    .2965683
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .9567067     .01551    61.68   0.000     .9263076    .9871058
-------------+----------------------------------------------------------------
      /sigma |   .0203009    .000342    59.35   0.000     .0196305    .0209713
------------------------------------------------------------------------------

The L1. coefficient in the ar equation (.9567) is the estimate of rho, and it
is a significant lag.
21
The residuals of the ARIMA model
There are a few significant lags further back. Generally we should expect
some, but this pattern is probably an indicator of a seasonal trend (well
beyond the scope of this lecture)!
22
ARIMA with a covariate

ARIMA regression

Sample:  1 to 328                               Number of obs   =        328
                                                Wald chi2(3)    =    3569.57
Log likelihood =  812.9607                      Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |                 OPG
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price        |
         ice |   .0095013   .0064945     1.46   0.143    -.0032276    .0222303
    quantity |  -1.04e-07   1.22e-07    -0.85   0.393    -3.43e-07    1.35e-07
       _cons |   .2531552   .0220777    11.47   0.000     .2098838    .2964267
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .9542692     .01628    58.62   0.000     .9223611    .9861773
------------------------------------------------------------------------------

23
Final thoughts
  • Each correction has a best application.
  • If we wanted to evaluate a mean shift (a dummy-variable-only model),
    calculating rho would not be a good choice; there we would want to use
    the lagged dependent variable
  • Also, where we want to test the effect of inertia, it is probably
    better to use the lag

24
Final Thoughts Continued
  • In small-N settings, calculating rho tends to be more accurate
  • ARIMA is one of the best options; however, it is very complicated!
  • When dealing with time, the number of time periods and the spacing of
    the observations are VERY IMPORTANT!
  • When using estimates of rho, a good rule of thumb is to make sure you
    have 25-30 time points at a minimum. More if the observations are too
    close together for the process you are observing!

25
Next Time
  • Review for Exam
  • Plenary Session
  • Exam Posting
  • Available after class Wednesday