Title: QM1 Week 8: F-Test in Multiple Linear Regressions and OLS Assumptions
- Dr Alexander Moradi
- University of Oxford, Dept. of Economics
- Email: alexander.moradi_at_economics.ox.ac.uk
9.1 F-Test
- Each t-statistic indicates the statistical significance of one regressor
- What if we want to test whether a group of variables has an effect on the dependent variable?
- F-Test: a test of multiple linear restrictions
- Example:
  - y = a + b1x1 + b2x2 + b3x3 + b4x4 + e
  - H0: b1 = b2 = b3 = 0
  - In words: the explanatory variables x1, x2 and x3 do not jointly influence y
  - H1: H0 is not true
- In contrast, t-statistics refer to a test of each single coefficient:
  - H0: b1 = 0 vs. H1: b1 ≠ 0
  - H0: b2 = 0 vs. H1: b2 ≠ 0
  - H0: b3 = 0 vs. H1: b3 ≠ 0
9.1 F-Test
- What is the effect of imposing the restrictions (b1 = b2 = b3 = 0)?
- Two regressions: (1) without and (2) with the restrictions in H0
- n = number of observations
- k = number of estimated coefficients in the unrestricted model
- m = number of restrictions
- RSS_U = residual sum of squares in the unrestricted model
- RSS_R = residual sum of squares in the restricted model
- Test statistic: F(m, n-k) = [(RSS_R - RSS_U)/m] / [RSS_U/(n-k)]
- Example:
  - (1) Unrestricted model: y = a + b1x1 + b2x2 + b3x3 + b4x4 + e
  - H0: b1 = b2 = b3 = 0
  - (2) Restricted model: y = d + b5x4 + u
  - k = 4, m = 3, n = 64, RSS_U = Σe² from (1), RSS_R = Σu² from (2)
9.1 F-Test
- If F(m, n-k) > Fcrit, reject H0
- → The restrictions are rejected
- → The variables significantly explain the variation in the dependent variable, and therefore must not be excluded from the regression model
- Fcrit depends on m, n and k. The exact value can be found in F-distribution tables (appendix of almost any statistics textbook)
- Example: Fcrit = 2.76 for m = 3, n-k = 60
- STATA reports the p-value
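The arithmetic behind the decision rule can be sketched in a few lines of plain Python. Only m = 3, n = 64, k = 4 and Fcrit = 2.76 come from the slides; the RSS values are made-up illustration numbers, and this is a sketch outside the course's Stata workflow.

```python
# Sketch of the F-test arithmetic. The RSS values are hypothetical
# illustration numbers; only m, n, k and Fcrit = 2.76 come from the slides.

def f_statistic(rss_r, rss_u, m, n, k):
    """F(m, n-k) = [(RSS_R - RSS_U)/m] / [RSS_U/(n-k)]."""
    return ((rss_r - rss_u) / m) / (rss_u / (n - k))

# Slide example dimensions: m = 3 restrictions, n - k = 60
f = f_statistic(rss_r=120.0, rss_u=100.0, m=3, n=64, k=4)
print(round(f, 2))   # 4.0
print(f > 2.76)      # True -> reject H0, keep the variables in the model
```

Imposing the restrictions raised the RSS from 100 to 120, which is a large enough increase relative to the unrestricted fit to reject H0 at the 5% level.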
9.1 F-Test
- Hint for joint significance: if adjusted R² decreases considerably when excluding variables with low t-values, an F-Test should be carried out
- The F-Test is used for all kinds of joint hypotheses, e.g. whether there is a structural break, b1 = b2 = b3 = 0, etc.
- Intuition: if we impose restrictions on the parameters of a regression model, will the residual sum of squares significantly increase (and the goodness of fit decrease)?
OLS Assumptions
9.2 OLS Assumptions
- 1. e is normally distributed
- 2. E(e) = 0 (no systematic influence of the error term on y)
- 3. var(e) = constant (homoscedasticity)
- 4. cov(ei, ej) = 0 (residuals do not correlate)
- 5. cov(xi, e) = 0 (error term and the explanatory variables do not correlate)
- If one or more of these assumptions are violated, OLS leads to inconsistent estimates and invalid confidence intervals
9.3 Problems of Model Specification
- Fundamental requirement: the relationship between the dependent variable and the explanatory variables is correctly modelled
- What is the underlying model that the data follow?
- What is the functional form? Is the relationship linear?
- Is the list of explanatory variables complete?
- Are there structural breaks (are the parameters stable)?
9.4 Homoscedasticity
Regression model: y = a + bx + e (5 observations were drawn)

[Figure: scatterplot of the 5 observations x1, ..., x5 around the regression line y = a + bx]

Homoscedasticity means that the variance of the error term is equal across all observations: var(e) = constant
9.4 Heteroscedasticity

[Figure: scatterplot of observations x1, ..., x5 around the regression line y = a + bx; the residuals follow a horn-shaped pattern]

Heteroscedasticity: the error term is normally distributed with mean 0, but the variance is no longer constant; the variance of the error term differs across observations
9.4 Consequences of Heteroscedasticity
- Note: random differences in the size of the residuals across observations do not constitute heteroscedasticity
- → The error term must show a clear and systematic (statistically significant) pattern of distortion
- Consequences:
  - OLS regression coefficients are unbiased
  - The variance of the residuals and the standard errors are affected → t-tests, F-tests and confidence intervals are inconsistent and should not be interpreted
- → Test for heteroscedasticity before testing for statistical significance
9.4 Breusch-Pagan Test
- Is there a pattern in the variation of the residuals?
- Breusch-Pagan test for heteroscedasticity: if var(e) = constant, there should be no significant correlation of the squared residuals with the independent variables
- Example with three explanatory variables:
  - y = a + b1x1 + b2x2 + b3x3 + e
  - Auxiliary regression: e² = c + d1x1 + d2x2 + d3x3 + v
- Test: do the regression coefficients (except for the constant c) jointly differ from 0 (H0: d1 = d2 = d3 = 0)?
- H0: constant variance/homoscedasticity
- H1: heteroscedasticity
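In the course this test would be run in Stata; as a language-agnostic illustration, here is a pure-Python sketch of the Breusch-Pagan idea for a single regressor, on simulated data. All names and numbers are made up, and the data are constructed to be heteroscedastic.

```python
import random

# Sketch of the Breusch-Pagan test with one regressor, on simulated data.
# All names and numbers are illustrative.

def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = ols_simple(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(42)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
# error standard deviation grows with x -> heteroscedastic by construction
y = [2.0 + 0.5 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

# Step 1: estimate the model and take the squared residuals
a, b = ols_simple(x, y)
e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]

# Step 2: auxiliary regression of e^2 on x; LM statistic = n * R^2,
# chi-squared with 1 degree of freedom under H0 (homoscedasticity)
lm = n * r_squared(x, e2)
print("reject H0" if lm > 3.84 else "do not reject H0")
```

With this horn-shaped error structure the LM statistic comfortably exceeds the 5% chi-squared critical value of 3.84, so the sketch rejects homoscedasticity, which is exactly what the construction of the data implies.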
- WAGE = a + b·AGE + e
- The variance of the error term is not constant
- Residuals follow a horn-shaped pattern
- Squared residuals: residuals are mirrored at the regression line
- Squared residuals increase with the values of the explanatory variable
9.4 Causes and Remedies
- 1. Differences in the scale of variables
  - Remedy: transformation of the dependent/explanatory variables, e.g. log, square root, square, etc.
- 2. Omitted variables/factors that gain in importance as the dependent variable varies
  - Remedy: include the omitted explanatory variables
- 3. True heteroscedasticity
  - Remedies:
    - Weighted Least Squares: weight the observations with a factor that removes heteroscedasticity, i.e. 1/var(ei)
    - Heteroscedasticity-robust standard errors (Huber/White/sandwich estimate of variance)
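The two remedies for true heteroscedasticity can be sketched in pure Python on simulated data. The data-generating process, weights, and all numbers are illustrative assumptions, not from the slides; in the course these steps correspond to weighted regression and robust standard errors in Stata.

```python
import random

# Sketch of the two remedies for true heteroscedasticity:
# (1) Weighted Least Squares with weights 1/var(e_i), here var(e_i) grows with x_i^2;
# (2) White/HC0 heteroscedasticity-robust standard error for the OLS slope.
# Data are simulated; all numbers are illustrative.

random.seed(1)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]  # sd grows with x

def wls_simple(x, y, w):
    """WLS for y = a + b*x, minimising sum_i w_i * (y_i - a - b*x_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
    return my - b * mx, b

# OLS is WLS with equal weights
a_ols, b_ols = wls_simple(x, y, [1.0] * n)

# (1) WLS: weights proportional to 1/var(e_i) = 1/x_i^2
a_wls, b_wls = wls_simple(x, y, [1.0 / xi ** 2 for xi in x])

# (2) White/HC0 robust variance of the OLS slope:
#     var(b) = sum (x_i - xbar)^2 * e_i^2 / [sum (x_i - xbar)^2]^2
mx = sum(x) / n
e = [yi - (a_ols + b_ols * xi) for xi, yi in zip(x, y)]
sxx = sum((xi - mx) ** 2 for xi in x)
se_robust = (sum((xi - mx) ** 2 * ei ** 2 for xi, ei in zip(x, e)) / sxx ** 2) ** 0.5

print(f"OLS slope {b_ols:.3f}, WLS slope {b_wls:.3f}, robust SE {se_robust:.3f}")
```

Both slope estimates are close to the true value 0.5 (heteroscedasticity leaves OLS coefficients unbiased); WLS uses the known error-variance structure for efficiency, while the robust standard error repairs inference without reweighting.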
9.5 Model Misspecification
- Including irrelevant variables in a regression model (i.e. variables that have no partial effect on y in the population → population coefficient is 0):
  - Regression coefficients are unbiased
  - Standard errors of the regression coefficients are larger
- Omitting relevant variables (the true underlying model contains determinants that we omit from our regression model):
  - Regression coefficients are biased
  - Test statistics (t-statistics) are biased and invalid
9.5 One Source of Endogeneity: Omitted Variable Bias
- Other sources of endogeneity (cov(xi, e) ≠ 0):
  - Reverse causality
  - Simultaneity
- If we know the true model, we can predict the size and direction of the OVB
- Example:
  - True model: y = a + b1x1 + b2x2 + e1
  - Estimated model: y = c + dx1 + e2
  - If cov(x1, x2) ≠ 0, then cov(x1, e2) ≠ 0
9.5 Omitted Variable Bias
True model: y = a + b1x1 + b2x2 + e1
Estimated model: y = c + dx1 + e2

[Figure: path diagram relating x1 and x2 to y, with effects b1 and b2, the link b3 between x1 and x2, and the estimated effect d of x1 on y]

- → Consequence: biased estimate of the influence of x1
- Here, with x2 = b3x1 + u, we get d = b1 + b2b3 (d = estimated impact of x1 on y)
- → d ≠ b1:
  - overestimation of the influence of edu if b2b3 > 0
  - underestimation of the influence of edu if b2b3 < 0
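The formula d = b1 + b2b3 can be checked by simulation. A minimal pure-Python sketch, with made-up coefficient values:

```python
import random

# Simulation check of the omitted-variable-bias formula d = b1 + b2*b3.
# True model: y = a + b1*x1 + b2*x2 + e1, with x2 = b3*x1 + u.
# Regressing y on x1 alone should give a slope d close to b1 + b2*b3.
# All coefficient values are made up for illustration.

random.seed(7)
n = 2000
a, b1, b2, b3 = 1.0, 1.0, 0.5, 0.8

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [b3 * v + random.gauss(0, 1) for v in x1]
y = [a + b1 * v1 + b2 * v2 + random.gauss(0, 1) for v1, v2 in zip(x1, x2)]

def ols_slope(x, y):
    """OLS slope of y = c + d*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

d = ols_slope(x1, y)
print(f"d = {d:.3f}")  # close to b1 + b2*b3 = 1.4, not to b1 = 1.0
```

Since b2b3 = 0.4 > 0 here, the short regression overestimates the influence of x1, exactly as the slide predicts.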
9.5 Omitted Variable Bias
- Regression coefficients are unbiased if:
  - the omitted variable is irrelevant (it does not appear in the true model → b2 = 0), or
  - the omitted variable does not correlate with the explanatory variables included in the model (→ b3 = 0)
- The more omitted variables there are, and the less clear the collinearities between included and omitted variables, the less clear are the size and direction of the bias
9.1 Exercise: F-Test
- Data set: weimar_election.dta
- 1. Run a regression of Nazi votes on unemployment rate, share of workers, Catholics, farmers, voter participation, and dummy variables for each general election
- 2. Are the explanatory variables jointly significant?
- 3. Is the influence of unemployment on Nazi votes constant over time? Test for parameter stability in the unemployment rate over the four elections. Hint: use interaction terms election dummy * unemployment
  - p_nsdap = F(unemp, workers, cath, farmers, votpart, d3207, d3211, d3303, d3207*unemp, d3211*unemp, d3303*unemp)
- 4. Is the model in (3) correctly specified? What about changing influences of the other explanatory variables? Run a regression for each election. Do the parameters vary significantly over the four elections?
- 5. Repeat (3) with the last three elections (t = 3207, 3211, 3303). Test whether the influence of unemployment varied significantly in the last three elections
Exercise: Heteroscedasticity
- Data set: india.dta
- 1. Estimate the model WI = a + b1AGE + b2EDU + b3FEMALE + b4EDU*FEMALE + e. Interpret the results
- 2. Test for heteroscedasticity
- 3. Plot a scatterplot with the residuals from the model in (1) on the vertical axis and AGE on the horizontal axis
- 4. Would you expect the variance in the residuals to depend mainly on AGE? What about the dummy variables EDU and FEMALE?
- 5. What is the likely cause of heteroscedasticity?
- 6. Use the ladder of powers to arrive at a suitable transformation of the wage variable
- 7. Use the log of wage as dependent variable. Test for heteroscedasticity
- 8. Estimate the model in (1) using robust standard errors. Interpret the results. Compare the results with (1) and (6). What model specification would you prefer?
STATA commands
Homework Exercises Week 8
- Read chapter 9.3 of FT (pp. 268-272)
- Do the following exercise from FT (p. 278): 3
- Read chapter 11 of FT (pp. 300-311, 316-325)
- Do the following exercises from FT (pp. 325-329): 1, 6, 8