1
Hypothesis Testing, Model Specification and
Multicollinearity
2
Computing p-Values for t-Tests
  • Selecting and committing to a significance level
    before we make the decision on a hypothesis can
    hide useful information about the test's outcome
  • Example: Suppose we want to test the null
    hypothesis that a coefficient is zero against a
    two-sided alternative
  • Suppose that with 40 degrees of freedom in our
    model we obtain a t-statistic of 1.85
  • If we select a 5% significance level, we do not
    reject the null hypothesis since the critical
    t-value is 2.021

3
Computing p-Values for t-Tests
  • If our goal is NOT to reject the null, we would
    report the above outcome at the 5% level of
    significance
  • However, at the 10% level of significance, the
    null would be rejected (critical t-value 1.684)
  • Rather than selecting and testing at different
    significance levels, it is better to provide an
    answer to the following question:
  • What is the smallest significance level at which
    the null would be rejected?

4
Computing p-Values for t-Tests
  • This level is known as the p-value and gives us
    the probability of a type I error if we reject
    the null hypothesis
  • P-values are computed by software packages, such
    as SPSS
  • Interpretation: the p-value is the probability of
    observing a t-statistic as extreme as we did if
    the null hypothesis is true
  • Implication: small p-values are evidence against
    the null, while large p-values provide little
    evidence against it (see the sketch below for
    the running example)
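A minimal sketch of the p-value computation for the running example (t = 1.85, 40 degrees of freedom), using Python's scipy rather than the SPSS mentioned above:

```python
# Two-sided p-value for the example t-statistic (t = 1.85, df = 40)
from scipy import stats

t_stat, df = 1.85, 40
p_value = 2 * stats.t.sf(abs(t_stat), df)  # P(|T| >= 1.85) under H0
print(f"p-value = {p_value:.4f}")          # ~0.072: reject at 10%, not at 5%
```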

5
The F-Test Testing the Overall Significance of
the Model
  • The t-test cannot be used to test hypotheses
    about more than one coefficient in our model
  • The F-test is used to test the overall
    significance of the regression equation
  • In a model with K explanatory variables, the two
    hypotheses are
  • H0: β1 = β2 = … = βK = 0
  • H1: H0 is not true

6
The F-Test Testing the Overall Significance of
the Model
  • The F-statistic is F = (R² / K) / ((1 − R²) / (n − K − 1))
  • The decision rule is
  • Reject H0 if F ≥ Fc
  • Do not reject H0 if F < Fc
  • where Fc is the critical value determined by the
    table of the F-distribution with K and n − K − 1
    degrees of freedom (see the sketch below)
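As an illustration of this decision rule, the sketch below evaluates the F-statistic for hypothetical values of R-sq, K and n (none of these numbers come from the slides):

```python
# Overall F-test: compare the F-statistic with the 5% critical value Fc
from scipy import stats

R2, K, n = 0.35, 3, 120                  # illustrative fit, regressors, sample size
F = (R2 / K) / ((1 - R2) / (n - K - 1))  # F-statistic with K and n-K-1 df
Fc = stats.f.ppf(0.95, K, n - K - 1)     # 5% critical value
print(F, Fc, F >= Fc)                    # reject H0 if F >= Fc
```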

7
Model Specification Errors Omitting Relevant
Variables and Including Irrelevant Variables
  • To properly estimate a regression model, we need
    to have specified the correct model
  • A typical specification error occurs when the
    estimated model does not include the correct set
    of explanatory variables
  • This specification error takes two forms:
  • Omitting one or more relevant explanatory
    variables
  • Including one or more irrelevant explanatory
    variables
  • Either form of specification error results in
    problems with OLS estimates

8
Model Specification Errors Omitting Relevant
Variables
  • Example: Two-factor model of stock returns
  • Suppose that the true model that explains a
    particular stock's returns is given by a
    two-factor model with the growth of GDP and the
    inflation rate as factors:
  • Rt = β0 + β1GDPt + β2INFt + εt
  • Suppose instead that we estimated the following
    model:
  • Rt = β0 + β1GDPt + εt*

9
Model Specification Errors Omitting Relevant
Variables
  • The above model has been estimated by omitting
    the explanatory variable INF
  • Thus, the error term of this model is actually
    equal to εt* = β2INFt + εt
  • If there is any correlation between the omitted
    variable (INF) and the explanatory variable
    (GDP), then there is a violation of classical
    assumption III

10
Model Specification Errors Omitting Relevant
Variables
  • This means that the explanatory variable and the
    error term are correlated
  • If that is the case, the OLS estimate of β1 (the
    coefficient of GDP) will be biased, as the
    simulation sketched below illustrates
  • As in the above example, it is highly likely that
    there will be some correlation between two
    financial (or economic) variables
  • If, however, the correlation is low or the true
    coefficient of the omitted variable is zero, then
    the specification error is very small
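A small simulation of omitted variable bias under a hypothetical data-generating process in the spirit of the two-factor example (the coefficients 2.0 and 1.5 and the 0.6 loading of INF on GDP are illustrative assumptions, not values from the slides):

```python
# Omitted variable bias: leaving out INF biases the GDP coefficient
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
gdp = rng.normal(size=n)
inf = 0.6 * gdp + rng.normal(size=n)              # INF correlated with GDP
ret = 1.0 + 2.0 * gdp + 1.5 * inf + rng.normal(size=n)

full = sm.OLS(ret, sm.add_constant(np.column_stack([gdp, inf]))).fit()
short = sm.OLS(ret, sm.add_constant(gdp)).fit()   # omits INF
print(full.params[1])   # ~2.0: unbiased estimate of the GDP coefficient
print(short.params[1])  # ~2.9: biased upward by 1.5 * 0.6 = 0.9
```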

11
Model Specification Errors Omitting Relevant
Variables
  • How can we correct the omitted variable bias in a
    model?
  • A simple solution is to add the omitted variable
    back to the model; the difficulty with this
    solution is detecting which variable has been
    omitted
  • Omitted variable bias is hard to detect, but
    there can be some obvious indications of this
    specification error
  • For example, the estimated model has a
    significant coefficient with the opposite sign
    from the one our arguments lead us to expect

12
Model Specification Errors Omitting Relevant
Variables
  • The best way to detect the omitted variable
    specification bias is to rely on the theoretical
    arguments behind the model
  • Which variables does the theory suggest should be
    included?
  • What are the expected signs of the coefficients?
  • Have we omitted a variable that most other
    similar studies include in the estimated model?
  • Note, though, that a significant coefficient with
    the unexpected sign can also occur due to a small
    sample size
  • However, most of the data sets used in empirical
    finance are large enough that this most likely is
    not the cause of the specification bias

13
Model Specification Errors Including Irrelevant
Variables
  • Example: Going back to the two-factor model,
    suppose that we include a third explanatory
    variable in the model, for example, the degree of
    wage inequality (INEQ)
  • So, we estimate the following model:
  • Rt = β0 + β1GDPt + β2INFt + β3INEQt + εt
  • The inclusion of an irrelevant variable (INEQ) in
    the model increases the standard errors of the
    estimated coefficients and, thus, decreases the
    t-statistics

14
Model Specification Errors Including Irrelevant
Variables
  • This implies that it will be more difficult to
    reject a null hypothesis that the coefficient of
    one of the explanatory variables is equal to zero
  • Also, the inclusion of an irrelevant variable
    will usually decrease the adjusted R-sq (but not
    the R-sq)
  • Finally, we can show that the inclusion of an
    irrelevant variable still allows us to obtain
    unbiased estimates of the model's coefficients,
    as the sketch below illustrates
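A companion simulation, again with a hypothetical data-generating process: the true coefficient of the extra regressor is zero, so including it leaves the estimate of the x coefficient unbiased but inflates its standard error:

```python
# Irrelevant variable: unbiased estimates, but larger standard errors
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
junk = 0.9 * x + rng.normal(scale=0.5, size=n)  # true coefficient of junk is 0
y = 1.0 + 2.0 * x + rng.normal(size=n)

lean = sm.OLS(y, sm.add_constant(x)).fit()
bloated = sm.OLS(y, sm.add_constant(np.column_stack([x, junk]))).fit()
print(lean.params[1], bloated.params[1])  # both ~2.0 (unbiased)
print(lean.bse[1], bloated.bse[1])        # standard error of x's coefficient rises
```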

15
Dummy Variables Incorporating Qualitative
Information in the Model
  • In several cases, it may be necessary to include
    explanatory information in the model in the form
    of a qualitative variable
  • Example: Suppose we want to estimate a model of
    the relationship between firm performance (ROE)
    and board independence
  • We are interested in empirically testing the
    argument that greater board independence leads to
    better firm performance
  • In our estimated model, the null hypothesis would
    be that greater board independence either does
    not affect firm performance or results in worse
    performance

16
Dummy Variables Incorporating Qualitative
Information in the Model
  • We can measure board independence by, for
    example, the proportion of independent directors
  • However, how can we measure the impact of the
    fact that in some firms the CEO is also the
    Chairman of the Board?
  • We assume that, everything else equal,
    performance will differ between firms with this
    attribute and firms without it
  • We can capture this effect through the inclusion
    of a dummy variable as an explanatory variable in
    our model

17
Dummy Variables Incorporating Qualitative
Information in the Model
  • In this example, the dummy variable will take the
    value of:
  • 1 for firms where the CEO is also the Chairman
  • 0 otherwise
  • Therefore, we estimate the following model (in
    general form):
  • Yi = β0 + β1X1i + … + βKXKi + βK+1Di + εi
  • where Di = 1 if the ith observation satisfies
    our condition, and Di = 0 otherwise

18
Dummy Variables Incorporating Qualitative
Information in the Model
[Figure: two parallel regression lines of Y on X; the line for Di = 1 lies βK+1 above the line for Di = 0, whose intercept is β0. The dummy shifts the intercept, not the slope]
19
Dummy Variables Incorporating Qualitative
Information in the Model
  • Example: Suppose we empirically test the
    relationship between a firm's size and its
    monthly stock returns with a sample of time
    series data
  • In our model, we should include a dummy variable
    that accounts for the well-known January effect
    (sketched in code below)
  • The dummy variable will take the following values:
  • 1 for observations of returns in the month of
    January
  • 0 for all other observations of monthly returns
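A minimal sketch of this intercept dummy on simulated monthly data (the firm-size regressor and all coefficients are illustrative assumptions):

```python
# Intercept dummy for the January effect in a monthly returns regression
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
months = np.tile(np.arange(1, 13), 20)    # 20 years of monthly observations
firm_size = rng.normal(size=months.size)  # hypothetical size variable
jan = (months == 1).astype(float)         # D = 1 in January, 0 otherwise
ret = 0.5 - 0.2 * firm_size + 1.0 * jan + rng.normal(size=months.size)

X = sm.add_constant(np.column_stack([firm_size, jan]))
fit = sm.OLS(ret, X).fit()
print(fit.params)  # [intercept, size coefficient, January intercept shift]
```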

20
Dummy Variables Incorporating Qualitative
Information in the Model
  • The above cases are examples of intercept dummy
    variables, meaning that inclusion of the dummy
    variable shifts the regression line but does not
    change its slope
  • Another form of dummy variable is a slope dummy
    variable, which allows the slope of a regression
    to change depending on whether a condition is
    satisfied
  • Example: Suppose we want to test the argument
    that the relationship between credit card lending
    and loan losses for a particular bank has changed
    in the last five years

21
Dummy Variables Incorporating Qualitative
Information in the Model
  • I.e., suppose that credit card lending
    contributes less to this bank's loan losses due
    to the implementation of better risk evaluation
    methods (credit scoring)
  • We estimate the following model:
  • LOSSESt = β0 + β1CARDLNt − β2(Tt × CARDLNt) + εt
  • where the variable T is a time dummy variable
    that takes the value
  • 1 for observations in the past five years
  • 0 otherwise

22
Dummy Variables Incorporating Qualitative
Information in the Model
  • In this case, the coefficient of the CARDLN
    variable is
  • β1 − β2 for observations from the past five years
  • β1 otherwise (see the sketch below)
  • Dummy variable trap: always include one fewer
    dummy variable than the number of possible
    qualitative states in the data
  • Example: If there are 3 qualitative states, for
    example, small, medium and large firms, we should
    include 2 dummy variables
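The sketch below implements the slope dummy through an interaction term under a hypothetical data-generating process; note that the interaction here enters with a plus sign, so the recent-period slope is b1 + b2, which matches the slides' β1 − β2 once β2 is defined with the opposite sign:

```python
# Slope dummy: the CARDLN slope changes in the recent (T = 1) period
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
cardln = rng.uniform(1, 10, size=n)         # credit card lending volume
T = (np.arange(n) >= n - 60).astype(float)  # last 60 obs = recent period
losses = 2.0 + 0.8 * cardln - 0.3 * T * cardln + rng.normal(size=n)

X = sm.add_constant(np.column_stack([cardln, T * cardln]))
b0, b1, b2 = sm.OLS(losses, X).fit().params
print(b1, b1 + b2)  # slope ~0.8 when T = 0, ~0.5 when T = 1
```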

23
Multicollinearity
  • Multicollinearity occurs when some or all of the
    explanatory variables in the regression model are
    highly correlated
  • In this case, assumption VI of the classical
    model does not hold and the OLS estimates lose
    some of their desirable properties
  • It is common, particularly in the case of time
    series data, for two or more explanatory
    variables to be correlated
  • When multicollinearity is present, the estimated
    coefficients are unstable in their statistical
    significance, magnitude and sign

24
The Impact of Multicollinearity on the OLS
Estimates
  • Multicollinearity has the following consequences
    for the OLS estimated model:
  • The OLS estimates remain unbiased
  • The standard errors of the estimated coefficients
    are higher and, thus, the t-statistics fall
  • OLS estimates become very sensitive to the
    addition or removal of explanatory variables or
    to changes in the data sample
  • The overall fit of the regression (and the
    significance of non-multicollinear coefficients)
    is to a large extent unaffected
  • This implies that a telltale sign of
    multicollinearity is a high adjusted R-sq with no
    statistically significant coefficients

25
Detecting Multicollinearity
  • One approach to detecting multicollinearity is to
    examine the simple correlation coefficients
    between explanatory variables
  • These appear in the correlation matrix of the
    model's variables (a code sketch follows)
  • Some researchers consider a correlation
    coefficient with an absolute value above .80 to
    be an indication of concern for multicollinearity
  • A second detection approach is to use the
    Variance Inflation Factor (VIF)
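A quick numpy sketch of the correlation-matrix check; the regressors are simulated, with x2 built to be nearly collinear with x1:

```python
# Correlation matrix of the explanatory variables
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=300)  # nearly collinear with x1
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)  # columns are variables
print(np.round(corr, 2))             # |corr(x1, x2)| is well above .80
```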

26
Detecting Multicollinearity
  • The VIF method tries to detect multicollinearity
    by examining the degree to which a given
    explanatory variable is explained by the others
  • The method involves the following steps (sketched
    in code below):
  • Run an OLS regression of the explanatory variable
    Xi on all other explanatory variables
  • Calculate the VIF for the coefficient of variable
    Xi, given by 1/(1 − R²i), where R²i is the R-sq
    of that regression
  • Evaluate the size of the VIF
  • Rule of thumb: if the VIF of the coefficient of
    explanatory variable Xi is greater than 5, then
    multicollinearity has a substantial impact on the
    estimated coefficient of this variable
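These steps translate directly into code; the sketch below recomputes the VIFs for the simulated regressors of the previous example (statsmodels' variance_inflation_factor does the same in one call):

```python
# VIF: regress each X_i on the others and compute 1 / (1 - R^2_i)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

for i in range(X.shape[1]):
    others = np.delete(X, i, axis=1)
    r2 = sm.OLS(X[:, i], sm.add_constant(others)).fit().rsquared
    print(f"VIF(x{i + 1}) = {1.0 / (1.0 - r2):.1f}")  # x1, x2 far above 5
```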