Multicollinearity
1
Lecture 8
Studenmund (2006): Chapter 8
Multicollinearity
Objectives
  • Perfect and imperfect multicollinearity
  • Effects of multicollinearity
  • Detecting multicollinearity
  • Remedies for multicollinearity

2
The Nature of Multicollinearity
Perfect multicollinearity: an exact linear (functional) relationship exists among the independent variables, that is,
Σ λiXi = 0, or λ1X1 + λ2X2 + λ3X3 + ... + λiXi = 0 (with not all λi equal to zero),
such as 1·X1 + 2·X2 = 0, i.e. X1 = −2X2.
If multicollinearity is perfect, the regression coefficients of the Xi variables, the βi's, are indeterminate, and their standard errors, Se(βi), are infinite.
4
If multicollinearity is imperfect,
x2 = λ1x1 + ε, where ε is a stochastic error term
(or x2 = λ0 + λ1x1 + ε),
then the regression coefficients, although determinate, possess large standard errors: the coefficients can still be estimated, but with less accuracy.
5
Example: production function
Yi = β0 + β1X1i + β2X2i + β3X3i + εi

Y     X1    X2    X3
122   10    50    52
170   15    75    75
202   18    90    97
270   24   120   129
330   30   150   152

Y: output; X1: capital; X2: labor; X3: land.
Note that X2 = 5X1 in every observation: perfect multicollinearity.
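A minimal numpy sketch (not part of the original slides) of what goes wrong numerically with this data set: because X2 = 5·X1, the design matrix has deficient rank, so X'X cannot be inverted and the individual OLS coefficients are indeterminate.

```python
import numpy as np

# Data from the slide; X2 = 5 * X1 exactly (perfect multicollinearity).
Y  = np.array([122., 170., 202., 270., 330.])
X1 = np.array([10., 15., 18., 24., 30.])
X2 = 5.0 * X1
X3 = np.array([52., 75., 97., 129., 152.])

# Design matrix: intercept column plus X1, X2, X3.
X = np.column_stack([np.ones_like(X1), X1, X2, X3])

# Rank 3 < 4 columns: one regressor is an exact linear combination
# of another, so X'X is singular and (X'X)^-1 does not exist.
print("rank of X:", np.linalg.matrix_rank(X))
print("condition number of X'X:", np.linalg.cond(X.T @ X))
```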
6
Example: perfect multicollinearity
a. Suppose D1, D2, D3 and D4 = 1 for spring, summer, autumn and winter, respectively:
Yi = α0 + α1D1i + α2D2i + α3D3i + α4D4i + β1X1i + εi.
Since D1i + D2i + D3i + D4i = 1 for every observation, the four dummies are perfectly collinear with the intercept (the dummy variable trap).
b. Yi = β0 + β1X1i + β2X2i + β3X3i + εi, where
X1 = nominal interest rate, X2 = real interest rate, X3 = CPI.
If the real rate is constructed as the nominal rate minus CPI inflation, the three regressors are exactly linearly related.
c. Yt = β0 + β1ΔXt + β2Xt + β3Xt-1 + εt,
where ΔXt = Xt − Xt-1 is called the first difference; ΔXt is an exact linear combination of Xt and Xt-1.
7
Imperfect Multicollinearity
Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi
When some independent variables are linearly correlated but the relation is not exact, there is imperfect multicollinearity:
λ0 + λ1X1i + λ2X2i + ... + λKXKi + ui = 0,
where u is a random error term and λk ≠ 0 for some k.
When will it be a problem?
8
Consequences of imperfect multicollinearity
1. The estimated coefficients are still BLUE; however, the OLS estimators have large variances and covariances, making estimation less precise.
2. Confidence intervals tend to be much wider, so the zero null hypothesis is accepted (not rejected) more readily.
3. The t-statistics of the coefficients tend to be statistically insignificant.
4. The R² can nevertheless be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
A simulation sketch of symptoms 2–4 follows.
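The following Python sketch (illustrative only; the data are simulated, not from the slides) reproduces symptoms 2–4: two nearly identical regressors give a high R² while each coefficient carries a large standard error and a low t-value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)     # x2 is almost identical to x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

e = y - X @ beta
s2 = e @ e / (n - X.shape[1])                       # residual variance
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))  # OLS standard errors
r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)

print("R^2 :", round(float(r2), 3))   # very high overall fit
print("beta:", beta.round(2))
print("se  :", se.round(2))           # large SEs on x1 and x2 ...
print("t   :", (beta / se).round(2))  # ... hence low t-values
```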
9
OLS estimators are still BLUE under imperfect
multicollinearity
Why???
  • Remarks
  • Unbiasedness is a repeated-sampling property, not a property of the estimates in any given sample
  • Minimum variance does not mean small variance
  • Imperfect multicollinearity is just a sample phenomenon

10
Effects of Imperfect Multicollinearity
  • Unaffected
  • OLS estimators are still BLUE.
  • The overall fit of the equation
  • The estimation of the coefficients of
    non-multicollinear variables

11
The variances of OLS estimators increase with the
degree of multicollinearity
Regression model: Yi = β0 + β1X1i + β2X2i + εi
  • High correlation between X1 and X2
  • Difficult to isolate the effects of X1 and X2 from each other

12
  • Closer relation between X1 and X2
  • larger r12²
  • larger VIF
  • larger variances

where VIFk = 1/(1 − Rk²), k = 1, ..., K, and Rk² is the coefficient of determination from regressing Xk on all the other (K − 1) explanatory variables.
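A minimal Python sketch of this definition (function and variable names are mine, not from the slides): regress each Xk on the remaining regressors plus an intercept, take the R² of that auxiliary regression, and form VIFk = 1/(1 − Rk²).

```python
import numpy as np

def vif(X):
    """VIF_k = 1 / (1 - R_k^2), where R_k^2 is the R-squared from
    regressing column k of X on all other columns plus an intercept."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        e = y - Z @ b
        r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)   # highly collinear with x1
x3 = rng.normal(size=100)              # unrelated regressor
print(vif(np.column_stack([x1, x2, x3])).round(1))  # large, large, ~1
```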
14
a. More likely to get unexpected signs.
b. Larger variances tend to increase the standard errors of the estimated coefficients.
c. Larger standard errors → lower t-values.
15
d. Larger standard errors → wider confidence intervals → less precise interval estimates.
16
Detection of Multicollinearity
Example: data set CONS8 (pp. 254–255)
COi = β0 + β1Ydi + β2LAi + εi
CO: annual consumption expenditure; Yd: annual disposable income; LA: liquid assets.
17
Studenmund (2006), Eq. 8.9, p. 254
Since LA (liquid assets: savings, etc.) is highly related to Yd (disposable income), drop one of the two variables.
18
OLS estimates and standard errors can be sensitive to the specification and to small changes in the data.
Specification changes: adding or dropping variables.
Small data changes: adding or dropping observations, or changing some data values.
A small sketch of this sensitivity follows.
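An illustrative Python sketch (simulated data, not from the slides): with nearly collinear regressors, deleting just a few observations can noticeably shift how the estimated effect is split between the two collinear variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)           # near-perfect collinearity
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(y, x1, x2):
    """OLS coefficients (intercept, b1, b2) via least squares."""
    X = np.column_stack([np.ones(len(y)), x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

keep = np.arange(n) >= 3                      # drop the first 3 observations
print("full sample  :", ols(y, x1, x2).round(2))
print("3 obs dropped:", ols(y[keep], x1[keep], x2[keep]).round(2))
```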
19
High Simple Correlation Coefficients
Remark: a high rij for some pair (i, j) is a sufficient indicator of multicollinearity, but not a necessary one: multicollinearity can involve several variables jointly even when every pairwise correlation is moderate.
20
Variance Inflation Factors (VIF) Method
Procedure: regress each Xk on the other explanatory variables and compute VIFk = 1/(1 − Rk²).
Rule of thumb: VIF > 5 suggests multicollinearity.
Notes: (a) Using VIF is not a statistical test. (b) The cutoff point is arbitrary.
21
Remedial Measures
1. Drop the Redundant Variable
Use theory to pick the variable(s) to drop. Do not drop a variable that is strongly supported by theory (danger of specification error).
22
Since M1 and M2 are highly related, use only one of them.
Other examples: CPI ↔ WPI; CD rate ↔ TB rate; GDP ≈ GNP ≈ GNI.
23
  • Checks after dropping variables:
  • The estimates of the coefficients of the other variables are not affected. (necessary)
  • R² does not fall much when the collinear variables are dropped. (necessary)
  • More significant t-values, i.e. smaller standard errors. (likely)

24
2. Redesign the Regression Model
There is no definite rule for this method.
Example (Studenmund (2006), p. 268):
Ft: average pounds of fish consumed per capita; PFt: price index for fish; PBt: price index for beef; Ydt: real per capita disposable income; N: the number of Catholics; P: dummy = 1 after the Pope's 1966 decision, 0 otherwise.
25
High correlations:
VIF(PF) = 43.4, VIF(lnYd) = 23.3, VIF(PB) = 18.9, VIF(N) = 18.5, VIF(P) = 4.4
Signs are unexpected; most t-values are insignificant.
26
Dropping N: not improved.
Combining the two price indices into the relative price of fish to beef, RPt = PFt/PBt: improved.
27
Improved much:
using the lagged term of RP to allow for a lagged effect in the regression,
Ft = β0 + β1RPt-1 + β2lnYdt + β3Pt + εt
28
3. Use A Priori Information
From previous empirical work, e.g.
Consi = β0 + β1Incomei + β2Wealthi + εi,
with the a priori information that β2 = 0.1. Construct a new variable
Cons*i = Consi − 0.1·Wealthi,
then run OLS on
Cons*i = β0 + β1Incomei + εi.
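A short Python sketch of this substitution (simulated data; the variable names are mine): impose β2 = 0.1 by moving 0.1·Wealth to the left-hand side, then estimate the remaining coefficients by OLS.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
income = rng.normal(50.0, 10.0, n)
wealth = 5.0 * income + rng.normal(0.0, 5.0, n)   # wealth collinear with income
cons = 10.0 + 0.8 * income + 0.1 * wealth + rng.normal(0.0, 2.0, n)

# Impose the a priori restriction beta2 = 0.1:
cons_star = cons - 0.1 * wealth                   # Cons* = Cons - 0.1 * Wealth

X = np.column_stack([np.ones(n), income])
b, *_ = np.linalg.lstsq(X, cons_star, rcond=None)
print("beta0, beta1:", b.round(2))                # close to the true 10 and 0.8
```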
29
4. Transformation of the Model
Take first differences of the time-series data.
Original regression model: Yt = β0 + β1X1t + β2X2t + εt
Transformed model (first differences): ΔYt = β1ΔX1t + β2ΔX2t + ut
where ΔYt = Yt − Yt-1 (Yt-1 is called a lagged term), ΔX1t = X1t − X1,t-1, ΔX2t = X2t − X2,t-1, and ut = εt − εt-1. The intercept drops out because the first difference of a constant is zero.
A sketch of the transformation follows.
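A brief Python sketch (trending data of my own invention, not from the slides): difference Y, X1 and X2 with np.diff and fit the differenced equation. Note that differencing the error term induces serial correlation in ut, which this sketch ignores.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 100
t = np.arange(T, dtype=float)
x1 = t + rng.normal(size=T)            # trending regressors:
x2 = 2.0 * t + rng.normal(size=T)      # strongly collinear in levels
y = 3.0 + 0.5 * x1 + 0.2 * x2 + rng.normal(size=T)

# First differences; the intercept of the levels model drops out.
dy, dx1, dx2 = np.diff(y), np.diff(x1), np.diff(x2)

X = np.column_stack([dx1, dx2])        # no intercept column
b, *_ = np.linalg.lstsq(X, dy, rcond=None)
print("beta1, beta2:", b.round(2))     # roughly 0.5 and 0.2
```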
30
5. Collect More Data (expand the sample size)
A larger sample size means smaller variances of the estimators.
6. Do Nothing
Since OLS estimators remain unbiased under imperfect multicollinearity, do nothing unless the collinearity causes serious problems and a change of specification clearly gives better results.