Title: Econometric Methods 1
1. Econometric Methods 1
- Lecture 4: Specification Tests and Violations of the OLS Assumptions
2. Tests of multiple hypotheses
- We can test more than one hypothesis at a time in a regression, e.g. that the constant equals -60 and that the slope coefficients are equal but opposite in sign.
- By a multiple hypothesis we mean there is more than one equals sign (i.e. more than one restriction) in H0. e.g. the above hypothesis can be written
- H0: β1 = -60 and β2 = -β3   vs   H1: β1 ≠ -60 or β2 ≠ -β3, or both.
- A hypothesis is essentially a restriction on the model, a simplification. Here we have q = 2 restrictions in the null hypothesis. We might consider estimating both the unrestricted and the restricted model.
3. Tests of multiple hypotheses
- Unrestricted model:
- y = β1 + β2 x2 + β3 x3 + u
- while the restricted version is
- y = -60 + β2 (x2 - x3) + u'
4. Example
- Illustrate with the farm data. Let our unrestricted regression be:

      Source |       SS       df       MS              Number of obs =     369
-------------+------------------------------           F(  6,   362) =   29.56
       Model |  67.6016953     6  11.2669492           Prob > F      =  0.0000
    Residual |  137.966779   362  .381123698           R-squared     =  0.3289
-------------+------------------------------           Adj R-squared =  0.3177
       Total |  205.568474   368  .558609983           Root MSE      =  .61735

------------------------------------------------------------------------------
         lva |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfs |  -.1879987   .0519786    -3.62   0.000    -.2902165   -.0857808
    propwell |    1.01175   .1179663     8.58   0.000     .7797651    1.243736
     propcan |    1.14663   .1882754     6.09   0.000     .7763792    1.516881
       lpopd |   .2375268   .0527616     4.50   0.000     .1337692    .3412845
      lplotn |   .2219142    .071631     3.10   0.002      .081049    .3627793
    lfamsize |   .0256384   .0846921     0.30   0.762    -.1409118    .1921886
       _cons |   7.071058    .171034    41.34   0.000     6.734713    7.407403
------------------------------------------------------------------------------
5. Example
- Test the joint restriction b(lfamsize) = 0 and b(propwell) = b(propcan) by excluding lfamsize and creating a variable (propirr) which is the sum of propwell and propcan:

      Source |       SS       df       MS              Number of obs =     369
-------------+------------------------------           F(  4,   364) =   44.37
       Model |  67.3785946     4  16.8446486           Prob > F      =  0.0000
    Residual |  138.189879   364  .379642526           R-squared     =  0.3278
-------------+------------------------------           Adj R-squared =  0.3204
       Total |  205.568474   368  .558609983           Root MSE      =  .61615

------------------------------------------------------------------------------
         lva |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfs |  -.1845138   .0516282    -3.57   0.000    -.2860408   -.0829868
     propirr |   1.034815     .11124     9.30   0.000     .8160611    1.253569
       lpopd |   .2299474   .0515964     4.46   0.000     .1284829    .3314119
      lplotn |   .2234388   .0682978     3.27   0.001      .089131    .3577466
       _cons |   7.110554   .1296769    54.83   0.000     6.855544    7.365564
------------------------------------------------------------------------------
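- Assuming the lecture's farm dataset is in memory (variable names as above), the same joint restriction could also be checked with a Wald test after the unrestricted regression, without constructing propirr by hand; a minimal sketch:

    * Wald/F test of two joint restrictions after the unrestricted regression
    regress lva lfs propwell propcan lpopd lplotn lfamsize
    test (lfamsize = 0) (propwell = propcan)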
6. Example of hypothesis testing
- We can test using classical or ML methods, comparing the two regressions (restricted and unrestricted). We need information on the RSS and the likelihoods.
- Remember that
- F = [(RSSr - RSSu)/q] / [RSSu/(n - k)] ~ F(q, n - k)
- and LR = -2(logLr - logLu) ~ χ²(q).
7. Example
- For the F test:
- F = 0.2886 < 3.0, the 5% critical value of F(2, 362). Cannot reject H0.
- q is the number of restrictions in the null.
- It is sometimes tricky to work out the number of restrictions in hypothesis tests. q (the degrees of freedom in the numerator) can be calculated as the difference between the degrees of freedom of the restricted and unrestricted models. In the denominator we simply have the number of degrees of freedom of the unrestricted model.
- LR test: -2(logLr - logLu) = 0.6 < 5.99, the 5% critical value of χ²(2). Again, cannot reject H0.
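- A minimal sketch of the F calculation, using the RSS figures reported in the two regressions above (the slide reports 0.2886; small differences reflect rounding in the reported values):

    * F test of q = 2 restrictions computed from the restricted and unrestricted RSS
    scalar RSSu = 137.966779
    scalar RSSr = 138.189879
    scalar q    = 2
    scalar dfu  = 362
    scalar F    = ((RSSr - RSSu)/q) / (RSSu/dfu)
    display "F = " F "   5% critical value = " invFtail(q, dfu, 0.05)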
8. Specification Problems: Stability testing
- A common problem is that the regression may not apply over the whole sample.
- e.g. the oil price shocks in the 1970s may have altered price and wage setting and hence the parameters of the Phillips curve.
- We use a Chow test.
- Suppose we want to test whether the sample should be split into two sub-samples. We could
- (a) estimate one equation for each sub-sample, or
- (b) estimate one equation over the whole sample.
- The RSS in the latter case must be larger than the (sum of the) former. If there is a big enough difference we reject the null of stability.
9. Chow Test
- Formally, we wish to test for a break after n1 periods.
- Estimate y = Xβ + e over the first n1 observations. Obtain RSS1.
- Then estimate y = Xβ' + e over the remaining n2 observations. Obtain RSS2.
- Calculate RSSU = RSS1 + RSS2.
- Estimate the equation over the whole sample. Obtain RSSR.
- Use the F test: F = [(RSSR - RSSU)/k] / [RSSU/(n1 + n2 - 2k)] ~ F(k, n1 + n2 - 2k).
- If it is greater than the critical value, reject H0: the coefficients are stable.
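- A minimal sketch of the Chow calculation on made-up data (the break point, sample size and data-generating process are all assumptions for illustration):

    * Chow test: compare the pooled RSS with the sum of the two sub-sample RSS
    clear
    set obs 100
    set seed 123
    gen t = _n
    gen x = rnormal()
    gen y = 1 + 2*x + cond(t > 50, 0.5*x, 0) + rnormal()   // slope shifts after obs 50
    regress y x if t <= 50
    scalar RSS1 = e(rss)
    regress y x if t > 50
    scalar RSS2 = e(rss)
    regress y x
    scalar RSSR = e(rss)
    scalar k = 2                                            // constant + slope
    scalar F = ((RSSR - RSS1 - RSS2)/k) / ((RSS1 + RSS2)/(_N - 2*k))
    display "Chow F = " F "   5% critical value = " invFtail(k, _N - 2*k, 0.05)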
10. Specification Problems: Omitted explanatory variables
- The true model is Y = β1 + β2 X2 + β3 X3 + u
- but we estimate Y = β1 + β2 X2 + u by mistake, i.e. we leave out X3.
- Consequences: the OLS estimate of β2 is in general biased:
- b2 = β2 + β3 (Σx2 x3 / Σx2²) + Σx2 u / Σx2²  (in deviation form)
- and E(b2) = β2 + β3 (Σx2 x3 / Σx2²).
- This equals β2 only if (a) β3 = 0 or (b) there is zero correlation between X2 and X3.
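- A small simulation sketch of this bias (the numbers and the correlation between X2 and X3 are illustrative assumptions):

    * Omitted-variable bias: x2 and x3 are positively correlated and b3 > 0,
    * so dropping x3 biases the coefficient on x2 upwards
    clear
    set obs 1000
    set seed 456
    gen x3 = rnormal()
    gen x2 = 0.6*x3 + rnormal()
    gen y  = 1 + 1*x2 + 2*x3 + rnormal()
    regress y x2 x3        // correct model: coefficient on x2 close to 1
    regress y x2           // x3 omitted: coefficient on x2 biased upwards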
11. Omitted explanatory variables: other consequences
- The estimates are also inconsistent, unless all correlations between included and excluded variables disappear as n → ∞.
- The estimated variance of b2 in the mis-specified model could be larger or smaller than in the correct model. In practice, it is often 'too large'.
- Omission could lead to symptoms such as autocorrelation or heteroscedasticity in the error term (see later).
- Tests of functional form, stability and normality of the errors may be failed.
- The exercise had an example of the omission of house characteristics in a regression of price on distance to an incinerator.
12. Specification Problems: Inclusion of an irrelevant regressor
- This is generally a less severe problem.
- The true model is Y = β1 + β2 X2 + u
- We estimate Y = β1 + β2 X2 + β3 X3 + u
- The coefficient b3 should be insignificantly different from zero (its true value). The estimate of b2 will be unbiased but, in general, there is some loss in efficiency,
- i.e. the sampling variance of b2 is larger than it need be. Hence, we should omit X3.
13. Specification Problems: Multicollinearity
- If two variables move together (over time or in cross-section) then it is difficult to tell which is influencing Y.
- If they move exactly together, then it is impossible. Multicollinearity is a problem of degree, ranging from perfect to zero.
- Perfect multicollinearity:
- There is an exact linear relationship between variables, e.g. X2 = a + bX3.
- X has less than full column rank.
- (X'X)-1 does not exist, so there are no estimates.
- In practice, perfect multicollinearity often occurs by mistake: entering the same X variable twice, or too many dummy variables.
14. Near Perfect Multicollinearity
- Approximate linear dependence: X still has full column rank, so OLS estimates exist.
- (X'X) is nearly singular (i.e. its determinant is near to 0) and (X'X)-1 is large.
- OLS is still BLUE, but V(b) may be large.
- Hence inference may be affected: t-ratios are small and you will think variables are not significant when in fact they are.
- All models contain some multicollinearity; the question is when it is a problem.
15. Near Perfect Multicollinearity
- Symptoms
- There is no one test, since multicollinearity is a question of degree.
- High R² and F, but high variances and low t-statistics.
- High correlations between X variables (neither necessary nor sufficient).
- Small changes in the sample data lead to big changes in the estimates (of coefficients and variances).
- Cures
- Respecify after thinking hard about the possible causalities and correlations.
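- One common diagnostic, sketched below on Stata's bundled auto data (not the farm data), is the variance inflation factor of each regressor:

    * Variance inflation factors after an OLS regression
    sysuse auto, clear
    regress price mpg weight length
    estat vif        // large VIFs (e.g. above 10) suggest troublesome collinearity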
16. Specification Problems: Functional form
- We have assumed the equation is linear. Ramsey's RESET tests whether this is appropriate.
- One idea is to add powers of the X variables to the estimating equation. Instead of using powers of X, however, RESET uses powers of the fitted values ŷ.
- The procedure is:
- 1. estimate Y = b1 + b2 X2 + ... + bk Xk + u
- 2. obtain the fitted values ŷ
- 3. estimate Y = b'1 + b'2 X2 + ... + b'k Xk + c·ŷ² + u'
- 4. test for the significance of c. If it is significantly different from 0, reject the linear specification.
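- A sketch of this procedure in Stata, assuming the lecture's farm data is in memory; it mirrors the regression reported on the next slide, and estat ovtest is Stata's built-in RESET (which uses several powers of the fitted values):

    * Manual RESET: add the squared fitted values and test their significance
    regress lva lfs propirr lpopd frag
    predict fits, xb
    gen fits2 = fits^2
    regress lva lfs propirr lpopd frag fits2
    * Built-in version of the Ramsey RESET test
    regress lva lfs propirr lpopd frag
    estat ovtest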
17. Example: RESET Test on Farm data
- Take the fitted values and square them → fits2.
- reg lva lfs propirr lpopd frag fits2

      Source |       SS       df       MS              Number of obs =     369
-------------+------------------------------           F(  5,   363) =   35.79
       Model |  67.8727105     5  13.5745421           Prob > F      =  0.0000
    Residual |  137.695763   363  .379327172           R-squared     =  0.3302
-------------+------------------------------           Adj R-squared =  0.3209
       Total |  205.568474   368  .558609983           Root MSE      =   .6159

------------------------------------------------------------------------------
         lva |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfs |  -.6413701   .4036006    -1.59   0.113    -1.435059    .1523188
     propirr |   3.653044   2.296729     1.59   0.113    -.8635211    8.169609
       lpopd |   .7999175   .5020516     1.59   0.112    -.1873774    1.787212
        frag |   .7609251   .4758566     1.60   0.111    -.1748567    1.696707
       fits2 |  -.1606089   .1407221    -1.14   0.254    -.4373418     .116124
       _cons |   15.21461   7.101781     2.14   0.033     1.248809    29.18041
------------------------------------------------------------------------------
18. Specification Problems: Normality
- The errors are assumed to be Normal for inference. Are they?
- Tests are based on the skewness and kurtosis:
- skewness is related to the 3rd moment, kurtosis to the 4th;
- these should be 0 and 3 respectively under Normality.
- The Jarque-Bera test is based on this, with H0: Normality:
- JB = n [S²/6 + (K - 3)²/24] ~ χ²(2) under the null,
- where S = m3/m2^(3/2), K = m4/m2², and each mj is the sum of the residuals raised to the power j, divided by the sample size.
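- A minimal sketch of the calculation on Stata's auto data (the regression itself is only illustrative):

    * Jarque-Bera statistic from the skewness and kurtosis of the residuals
    sysuse auto, clear
    regress price mpg weight
    predict ehat, resid
    summarize ehat, detail
    scalar JB = r(N)/6 * (r(skewness)^2 + (r(kurtosis) - 3)^2/4)
    display "JB = " JB "   5% critical value of chi2(2) = " invchi2tail(2, 0.05)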
19. Heteroscedasticity
- We now relax some of the classical assumptions, starting with assumption A2.
- Recall
- A2: E(uu') = σ²I.
- This is a matrix with σ² on the diagonal and zeroes everywhere else. We can relax this in two ways:
- 1. have non-constant values on the diagonal;
- 2. have non-zero elements off the diagonal.
- 1. leads to the problem of heteroscedasticity,
- 2. leads to autocorrelation (later).
20. Heteroscedasticity
- The variance of the error term now varies across observations.
- It may increase with one X variable, with time, or in some more complex way.
- (NB we still assume independence of the errors.)
- It often occurs in cross-section data when there is a wide range to the X variables,
- e.g. holiday expenditure against income: expenditure probably rises with income but also becomes more variable.
- It can also arise in grouped data, when each observation is an average for a group (village, industry, etc.) and the groups are of different sizes.
21. Heteroscedasticity
- We now assume
- V(u) = E(uu') = σ²Ω, a diagonal matrix with σ1², σ2², ..., σn² on the diagonal and zeroes elsewhere,
- and in general σ1² ≠ σ2², etc.
22. Heteroscedasticity
23. Consequences of heteroscedasticity
- OLS estimates of β are unbiased and consistent,
- but inefficient (not BLUE) and not asymptotically efficient.
- E(b) = β + E(Σxu/Σx²) = β: there is still no correlation between x and u (E(xu) = 0), hence OLS is still unbiased.
- The estimate of the error variance is biased, invalidating inference.
- Inefficiency is harder to demonstrate:
- OLS gives equal weight to all observations.
- If we knew which error terms have high variance, it would be better to give less weight to them.
24. Goldfeld-Quandt test
- Applicable if the heteroscedasticity is related to only one of the x variables, say x1. The procedure is (where σ² increases with x1):
- order your observations in increasing order of x1;
- omit the central r ≈ n/3 observations;
- run a regression on the first n1 = (n - r)/2 observations, obtaining RSS1;
- run a regression on the last n2 = (n - r)/2 observations, obtaining RSS2;
- the test statistic for H0: σ1² = σ2² = ... = σn² is
- F = RSS2/RSS1 ~ F(c, c), where c = (n - r - 2k)/2.
- It is a bit restrictive since you have to know which variable is x1.
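- A minimal sketch on Stata's auto data, taking weight as the suspect x1 (the split points are illustrative):

    * Goldfeld-Quandt: compare residual variances in the low-x1 and high-x1 blocks
    sysuse auto, clear
    sort weight                           // order by the suspect variable
    regress price mpg weight in 1/25      // first block
    scalar RSS1 = e(rss)
    scalar c1   = e(df_r)
    regress price mpg weight in 50/74     // last block (central ~1/3 omitted)
    scalar RSS2 = e(rss)
    scalar c2   = e(df_r)
    display "GQ F = " RSS2/RSS1 "   5% critical value = " invFtail(c2, c1, 0.05)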
25. White's test
- We assume that V(ui) = σ² + γ[E(Yi)]²,
- i.e. the variance of u depends upon the expected value of Y (and hence on at least one of the x variables). Hence
- V(ui) = σ² + γ(β1 + β2 x2 + ... + βk xk)².
- If H0: γ = 0 holds then we have homoscedasticity, otherwise heteroscedasticity.
- Hence we regress e² on x2², x3², etc. and x2·x3, x3·x4, etc. (i.e. all the cross-products).
- From this auxiliary regression we obtain nR² as an LM statistic, distributed as χ² with degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the constant).
26. Example: Farm data
- After running our standard regression, regress the squared residuals on the original Xs, plus their squares and cross-products.
- nR² = 369 × 0.0803 = 29.6 ~ χ²(14). Here the 5% critical value is 26.1 and the 1% value is 29.1, so it is significant at the 1% level.
- Reject the null of homoscedasticity.
- A simpler version uses the squared fitted values ŷ² instead of all the Xs.
- In other words the auxiliary regression is
- e² = a + b·ŷ² + v.
- The test is of H0: b = 0.
- This gives similar results: t = -2.1, again between the 5% and 1% critical values.
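- Stata has built-in close analogues of both versions, sketched here on the bundled auto data (the farm data would be used in the same way):

    * White's general test and a fitted-values-based variant
    sysuse auto, clear
    regress price mpg weight
    estat imtest, white      // nR² test with squares and cross-products of the Xs
    estat hettest            // Breusch-Pagan/Cook-Weisberg test using the fitted values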
27. A now-standard cure
- White's heteroscedasticity-consistent standard errors (HCSEs).
- Since it is only the standard errors, rather than the coefficients themselves, which are biased (and inconsistent), another approach is to find alternative estimates of the standard errors.
- White suggests replacing σi² = σ² with σ̂i² = ei², the square of the appropriate OLS residual. This is implemented in Stata using the robust option.
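- A minimal sketch of the robust option (auto data for illustration):

    * Conventional versus heteroscedasticity-robust standard errors
    sysuse auto, clear
    regress price mpg weight                 // conventional OLS standard errors
    regress price mpg weight, vce(robust)    // White/Huber robust standard errors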
28. Estimation when heteroscedasticity is present
- We illustrate the use of generalised least squares (GLS).
- OLS assumes E(uu') = σ²I, a very restricted form of the error structure.
- GLS allows the assumption that E(uu') = σ²Ω, where Ω is any n × n matrix.
- By taking account of Ω, GLS estimates give us BLUEs.
- There are two (equivalent) ways of obtaining GLS estimates:
- use a different formula from OLS;
- transform the data and use OLS on the transformed data.
29. Generalised least squares (GLS)
- For efficient estimates we need to use the GLS estimator b = (X' Ω-1 X)-1 X' Ω-1 y.
- Let y = Xβ + u with E(uu') = σ²Ω, e.g. Ω with λi on the diagonal and zeroes elsewhere.
- Let PP' = Ω. (Such a P always exists if Ω is symmetric and positive-definite.)
- Hence P-1 Ω P-1' = I (easy to show)
- and P-1' P-1 = Ω-1 (NOTE the primes).
- Now premultiply the original model by P-1:
- P-1 y = P-1 Xβ + P-1 u, i.e. y* = X*β + u*,
- where y* = P-1 y, etc.
30. Generalised least squares (GLS)
- Note now that E(u*u*') = E(P-1 u u' P-1') = P-1 E(uu') P-1' = P-1 σ²Ω P-1' = σ²I.
- So the starred model satisfies the classical assumptions about u* and we could apply OLS to it to obtain BLUEs.
- We have b = (X*'X*)-1 X*'y* = (X' P-1' P-1 X)-1 X' P-1' P-1 y = (X' Ω-1 X)-1 X' Ω-1 y.
- Note that Ω-1 is the matrix with 1/λi on the diagonal.
- V(b) = σ² (X*'X*)-1 = σ² (X' Ω-1 X)-1,
- and our estimate of σ² is e' Ω-1 e/(n - k).
31. Generalised least squares (GLS)
- If we know Ω we can use the GLS formulae to find BLUEs. Alternatively, from Ω find P-1, transform the data and use OLS.
- How do we find P-1? We have to assume something, giving us feasible GLS. Suppose a graph suggests that σi² ∝ x2i², i.e. E(ui²) = σ² x2i².
- Hence Ω is the diagonal matrix with x21², x22², ..., x2n² on the diagonal.
32. Generalised least squares (GLS)
- Hence P-1 is the diagonal matrix with 1/x21, 1/x22, ..., 1/x2n on the diagonal,
- and so P-1 y = (y1/x21, y2/x22, ..., yn/x2n)'.
- Similarly, all the explanatory variables are divided by x2. Hence instead of estimating
- y = β1 + β2 x2 + ... + βk xk + u, we estimate (by OLS)
- y/x2 = β1/x2 + β2 + β3 x3/x2 + ... + βk xk/x2 + u/x2,
- i.e. we deflate all variables by x2.
- Note that V(u/x2) = V(u)/x2² = σ² x2²/x2² = σ².
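- A minimal sketch of this "deflate and run OLS" approach on simulated data (the data-generating process is an assumption for illustration); Stata's analytic weights give the equivalent weighted least squares fit directly:

    * Feasible GLS by transformation when V(u_i) = sigma^2 * x2_i^2
    clear
    set obs 200
    set seed 789
    gen x2 = 1 + 4*runiform()
    gen x3 = rnormal()
    gen y  = 1 + 0.5*x2 + 2*x3 + x2*rnormal()      // error s.d. proportional to x2
    regress y x2 x3                                 // plain OLS: unbiased but inefficient
    gen ystar  = y/x2
    gen invx2  = 1/x2
    gen x3star = x3/x2
    regress ystar invx2 x3star                      // transformed model: the constant estimates b2
    regress y x2 x3 [aweight=1/(x2^2)]              // equivalent WLS via analytic weights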