Title: QM1 Week 8: F-Test in Multiple Linear Regressions and OLS Assumptions
- Dr Alexander Moradi
- University of Oxford, Dept. of Economics
- Email: alexander.moradi_at_economics.ox.ac.uk
9.1 F-Test
- Each t-statistic indicates the statistical significance of one regressor
- What if we want to test whether a group of variables has an effect on the dependent variable?
- F-Test: a test of multiple linear restrictions
- Example:
  - y = a + b1x1 + b2x2 + b3x3 + b4x4 + e
  - H0: b1 = b2 = b3 = 0
  - In words: the explanatory variables x1, x2 and x3 do not jointly influence y
  - H1: H0 is not true
- In contrast, t-statistics refer to a test of each single coefficient:
  - H0: b1 = 0 vs. H1: b1 ≠ 0
  - H0: b2 = 0 vs. H1: b2 ≠ 0
  - H0: b3 = 0 vs. H1: b3 ≠ 0
9.1 F-Test
- What is the effect of imposing the restrictions (b1 = b2 = b3 = 0)?
- Two regressions: (1) without and (2) with the restrictions in H0
- n = number of observations
- k = number of estimated coefficients in the unrestricted model
- m = number of restrictions
- RSS_U = residual sum of squares in the unrestricted model
- RSS_R = residual sum of squares in the restricted model
- Test statistic: F(m, n-k) = [(RSS_R - RSS_U)/m] / [RSS_U/(n-k)]
- Example:
  - (1) Unrestricted model: y = a + b1x1 + b2x2 + b3x3 + b4x4 + e
  - H0: b1 = b2 = b3 = 0
  - (2) Restricted model: y = d + b5x4 + u
  - k = 4, m = 3, n = 64, RSS_U = Σe² from (1), RSS_R = Σu² from (2)
9.1 F-Test
- If F(m, n-k) > Fcrit, reject H0
- → The restrictions are rejected
- → The variables significantly explain the variation in the dependent variable, and therefore must not be excluded from the regression model
- Fcrit depends on m, n and k. The exact value can be found in F-distribution tables (appendix of almost any statistics textbook)
- Example: Fcrit = 2.76 for m = 3, n-k = 60
- STATA reports the p-value
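The arithmetic behind the decision rule can be sketched in a few lines of plain Python. Only m = 3, n = 64, k = 4 and Fcrit = 2.76 come from the slides; the RSS values are made-up illustration numbers, and this is a sketch outside the course's Stata workflow.

```python
# Sketch of the F-test arithmetic. The RSS values are hypothetical
# illustration numbers; only m, n, k and Fcrit = 2.76 come from the slides.

def f_statistic(rss_r, rss_u, m, n, k):
    """F(m, n-k) = [(RSS_R - RSS_U)/m] / [RSS_U/(n-k)]."""
    return ((rss_r - rss_u) / m) / (rss_u / (n - k))

# Slide example dimensions: m = 3 restrictions, n - k = 60
f = f_statistic(rss_r=120.0, rss_u=100.0, m=3, n=64, k=4)
print(round(f, 2))   # 4.0
print(f > 2.76)      # True -> reject H0, keep the variables in the model
```

Imposing the restrictions raised the RSS from 100 to 120, which is a large enough increase relative to the unrestricted fit to reject H0 at the 5% level.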
9.1 F-Test
- Hint for joint significance: if adjusted R² decreases considerably when excluding variables with low t-values, an F-Test should be carried out
- The F-Test is used for all kinds of joint hypotheses, e.g. whether there is a structural break, b1 = b2 = b3 = 0, etc.
- Intuition: if we impose restrictions on the parameters of a regression model, will the residual sum of squares significantly increase (and the goodness of fit decrease)?
OLS Assumptions
9.2 OLS Assumptions
- 1. e is normally distributed
- 2. E(e) = 0 (no systematic influence of the error term on y)
- 3. var(e) = constant (homoscedasticity)
- 4. cov(ei, ej) = 0 (residuals do not correlate)
- 5. cov(xi, e) = 0 (error term and the explanatory variables do not correlate)
- If one or more of these assumptions are violated, OLS leads to inconsistent estimates and invalid confidence intervals
9.3 Problems of Model Specification
- Fundamental requirement: the relationship between the dependent variable and the explanatory variables is correctly modelled
- What is the underlying model that the data follow?
- What is the functional form? Is the relationship linear?
- Is the list of explanatory variables complete?
- Are there structural breaks (are the parameters stable)?
9.4 Homoscedasticity
Regression model: y = a + bx + e (5 observations were drawn)

[Figure: scatterplot of the 5 observations x1, ..., x5 around the regression line y = a + bx]

Homoscedasticity means that the variance of the error term is equal across all observations: var(e) = constant
9.4 Heteroscedasticity

[Figure: scatterplot of observations x1, ..., x5 around the regression line y = a + bx; the residuals follow a horn-shaped pattern]

Heteroscedasticity: the error term is normally distributed with mean 0, but the variance is no longer constant; the variance of the error term differs across observations
9.4 Consequences of Heteroscedasticity
- Note: random differences in the size of the residuals across observations do not constitute heteroscedasticity
- → The error term must show a clear and systematic (statistically significant) pattern of distortion
- Consequences:
  - OLS regression coefficients are unbiased
  - The variance of the residuals and the standard errors are affected → t-tests, F-tests and confidence intervals are inconsistent and should not be interpreted
- → Test for heteroscedasticity before testing for statistical significance
9.4 Breusch-Pagan Test
- Is there a pattern in the variation of the residuals?
- Breusch-Pagan test for heteroscedasticity: if var(e) = constant, there should be no significant correlation of the squared residuals with the independent variables
- Example with three explanatory variables:
  - y = a + b1x1 + b2x2 + b3x3 + e
  - Auxiliary regression: e² = c + d1x1 + d2x2 + d3x3 + v
- Test: do the regression coefficients (except for the constant c) jointly differ from 0 (H0: d1 = d2 = d3 = 0)?
- H0: constant variance/homoscedasticity
- H1: heteroscedasticity
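In the course this test would be run in Stata; as a language-agnostic illustration, here is a pure-Python sketch of the Breusch-Pagan idea for a single regressor, on simulated data. All names and numbers are made up, and the data are constructed to be heteroscedastic.

```python
import random

# Sketch of the Breusch-Pagan test with one regressor, on simulated data.
# All names and numbers are illustrative.

def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = ols_simple(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(42)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
# error standard deviation grows with x -> heteroscedastic by construction
y = [2.0 + 0.5 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

# Step 1: estimate the model and take the squared residuals
a, b = ols_simple(x, y)
e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]

# Step 2: auxiliary regression of e^2 on x; LM statistic = n * R^2,
# chi-squared with 1 degree of freedom under H0 (homoscedasticity)
lm = n * r_squared(x, e2)
print("reject H0" if lm > 3.84 else "do not reject H0")
```

With this horn-shaped error structure the LM statistic comfortably exceeds the 5% chi-squared critical value of 3.84, so the sketch rejects homoscedasticity, which is exactly what the construction of the data implies.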
- WAGE = a + b·AGE + e
- The variance of the error term is not constant
- Residuals follow a horn-shaped pattern
- Squared residuals: residuals are mirrored at the regression line
- Squared residuals increase with the values of the explanatory variable
9.4 Causes and Remedies
- 1. Differences in the scale of variables
  - Remedy: transformation of the dependent/explanatory variables, e.g. log, square root, square, etc.
- 2. Omitted variables/factors that gain in importance as the dependent variable varies
  - Remedy: include the omitted explanatory variables
- 3. True heteroscedasticity
  - Remedies:
    - Weighted Least Squares: weight the observations with a factor that removes heteroscedasticity, i.e. 1/var(ei)
    - Heteroscedasticity-robust standard errors (Huber/White/sandwich estimate of variance)
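The two remedies for true heteroscedasticity can be sketched in pure Python on simulated data. The data-generating process, weights, and all numbers are illustrative assumptions, not from the slides; in the course these steps correspond to weighted regression and robust standard errors in Stata.

```python
import random

# Sketch of the two remedies for true heteroscedasticity:
# (1) Weighted Least Squares with weights 1/var(e_i), here var(e_i) grows with x_i^2;
# (2) White/HC0 heteroscedasticity-robust standard error for the OLS slope.
# Data are simulated; all numbers are illustrative.

random.seed(1)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]  # sd grows with x

def wls_simple(x, y, w):
    """WLS for y = a + b*x, minimising sum_i w_i * (y_i - a - b*x_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
    return my - b * mx, b

# OLS is WLS with equal weights
a_ols, b_ols = wls_simple(x, y, [1.0] * n)

# (1) WLS: weights proportional to 1/var(e_i) = 1/x_i^2
a_wls, b_wls = wls_simple(x, y, [1.0 / xi ** 2 for xi in x])

# (2) White/HC0 robust variance of the OLS slope:
#     var(b) = sum (x_i - xbar)^2 * e_i^2 / [sum (x_i - xbar)^2]^2
mx = sum(x) / n
e = [yi - (a_ols + b_ols * xi) for xi, yi in zip(x, y)]
sxx = sum((xi - mx) ** 2 for xi in x)
se_robust = (sum((xi - mx) ** 2 * ei ** 2 for xi, ei in zip(x, e)) / sxx ** 2) ** 0.5

print(f"OLS slope {b_ols:.3f}, WLS slope {b_wls:.3f}, robust SE {se_robust:.3f}")
```

Both slope estimates are close to the true value 0.5 (heteroscedasticity leaves OLS coefficients unbiased); WLS uses the known error-variance structure for efficiency, while the robust standard error repairs inference without reweighting.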
9.5 Model Misspecification
- Including irrelevant variables in a regression model (i.e. variables that have no partial effect on y in the population → population coefficient is 0):
  - Regression coefficients are unbiased
  - Standard errors of the regression coefficients are larger
- Omitting relevant variables (the true underlying model contains determinants that we omit from our regression model):
  - Regression coefficients are biased
  - Test statistics (t-statistics) are biased and invalid
9.5 One Source of Endogeneity: Omitted Variable Bias
- Other sources of endogeneity (cov(xi, e) ≠ 0):
  - Reverse causality
  - Simultaneity
- If we know the true model, we can predict the size and direction of the OVB
- Example:
  - True model: y = a + b1x1 + b2x2 + e1
  - Estimated model: y = c + dx1 + e2
  - If cov(x1, x2) ≠ 0, then cov(x1, e2) ≠ 0
9.5 Omitted Variable Bias
True model: y = a + b1x1 + b2x2 + e1
Estimated model: y = c + dx1 + e2

[Figure: path diagram relating x1 and x2 to y, with effects b1 and b2, the link b3 between x1 and x2, and the estimated effect d of x1 on y]

- → Consequence: biased estimate of the influence of x1
- Here, with x2 = b3x1 + u, we get d = b1 + b2b3 (d = estimated impact of x1 on y)
- → d ≠ b1:
  - overestimation of the influence of edu if b2b3 > 0
  - underestimation of the influence of edu if b2b3 < 0
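The formula d = b1 + b2b3 can be checked by simulation. A minimal pure-Python sketch, with made-up coefficient values:

```python
import random

# Simulation check of the omitted-variable-bias formula d = b1 + b2*b3.
# True model: y = a + b1*x1 + b2*x2 + e1, with x2 = b3*x1 + u.
# Regressing y on x1 alone should give a slope d close to b1 + b2*b3.
# All coefficient values are made up for illustration.

random.seed(7)
n = 2000
a, b1, b2, b3 = 1.0, 1.0, 0.5, 0.8

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [b3 * v + random.gauss(0, 1) for v in x1]
y = [a + b1 * v1 + b2 * v2 + random.gauss(0, 1) for v1, v2 in zip(x1, x2)]

def ols_slope(x, y):
    """OLS slope of y = c + d*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

d = ols_slope(x1, y)
print(f"d = {d:.3f}")  # close to b1 + b2*b3 = 1.4, not to b1 = 1.0
```

Since b2b3 = 0.4 > 0 here, the short regression overestimates the influence of x1, exactly as the slide predicts.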
9.5 Omitted Variable Bias
- Regression coefficients are unbiased if:
  - the omitted variable is irrelevant (it does not appear in the true model → b2 = 0), or
  - the omitted variable does not correlate with the explanatory variables included in the model (→ b3 = 0)
- The more omitted variables there are, and the less clear the collinearities between included and omitted variables, the less clear are the size and direction of the bias
9.1 Exercise: F-Test
- Data set: weimar_election.dta
- 1. Run a regression of Nazi votes on unemployment rate, share of workers, Catholics, farmers, voter participation, and dummy variables for each general election
- 2. Are the explanatory variables jointly significant?
- 3. Is the influence of unemployment on Nazi votes constant over time? Test for parameter stability in the unemployment rate over the four elections. Hint: use interaction terms election dummy * unemployment
  - p_nsdap = F(unemp, workers, cath, farmers, votpart, d3207, d3211, d3303, d3207*unemp, d3211*unemp, d3303*unemp)
- 4. Is the model in (3) correctly specified? What about changing influences of the other explanatory variables? Run a regression for each election. Do the parameters vary significantly over the four elections?
- 5. Repeat (3) with the last three elections (t = 3207, 3211, 3303). Test whether the influence of unemployment varied significantly in the last three elections
Exercise: Heteroscedasticity
- Data set: india.dta
- 1. Estimate the model WI = a + b1AGE + b2EDU + b3FEMALE + b4EDU*FEMALE + e. Interpret the results
- 2. Test for heteroscedasticity
- 3. Plot a scatterplot with the residuals from the model in (1) on the vertical axis and AGE on the horizontal axis
- 4. Would you expect the variance in the residuals to depend mainly on AGE? What about the dummy variables EDU and FEMALE?
- 5. What is the likely cause of heteroscedasticity?
- 6. Use the ladder of powers to arrive at a suitable transformation of the wage variable
- 7. Use the log of wage as dependent variable. Test for heteroscedasticity
- 8. Estimate the model in (1) using robust standard errors. Interpret the results. Compare the results with (1) and (6). What model specification would you prefer?
STATA commands
Homework Exercises Week 8
- Read chapter 9.3 of FT (pp. 268-272)
- Do the following exercise from FT (p. 278): 3
- Read chapter 11 of FT (pp. 300-311, 316-325)
- Do the following exercises from FT (pp. 325-329): 1, 6, 8