Title: Hypothesis Testing, Model Specification and Multicollinearity
Computing p-Values for t-Tests
- Selecting and committing to a significance level before we make a decision on a hypothesis can hide useful information about the test's outcome
- Example: Suppose we want to test the null hypothesis that a coefficient is zero against a two-sided alternative
- Suppose that with 40 degrees of freedom in our model we obtain a t-statistic of 1.85
- If we select a 5% significance level, we do not reject the null hypothesis since the critical t-value is 2.021
Computing p-Values for t-Tests
- If our goal is NOT to reject the null, we would report the above outcome at the 5% level of significance
- However, at the 10% level of significance, the null would be rejected (critical t-value is about 1.684)
- Rather than selecting and testing at different significance levels, it is better to answer the following question:
- What is the smallest significance level at which the null would be rejected?
Computing p-Values for t-Tests
- This level is known as the p-value and gives us the probability of a type I error if we reject the null hypothesis
- p-values are computed by software packages, such as SPSS
- Interpretation: the p-value is the probability of observing a t-statistic as extreme as the one we obtained if the null hypothesis is true
- Implication: small p-values are evidence against the null, while large p-values provide little evidence against the null (a sketch of the computation follows below)
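A minimal sketch in Python of how the p-value and critical values for the example above could be computed; scipy is assumed to be available (the slides themselves refer to packages such as SPSS).

```python
from scipy import stats

# Numbers from the example: t = 1.85 with 40 degrees of freedom.
t_stat, df = 1.85, 40

# Two-sided p-value: probability of a t-statistic at least this
# extreme when the null hypothesis is true.
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(round(p_value, 4))                 # roughly 0.07

# Fixed-level critical values for comparison.
print(round(stats.t.ppf(0.975, df), 3))  # about 2.021 (5% two-sided)
print(round(stats.t.ppf(0.950, df), 3))  # about 1.684 (10% two-sided)
```

A p-value of roughly 0.07 matches the discussion above: the null is rejected at the 10% level but not at the 5% level.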
The F-Test: Testing the Overall Significance of the Model
- The t-test cannot be used to test hypotheses about more than one coefficient in our model
- The F-test is used to test the overall significance of the regression equation
- In a model with K explanatory variables, the two hypotheses are
- H0: β1 = β2 = ... = βK = 0
- H1: H0 is not true
The F-Test: Testing the Overall Significance of the Model
- The F-statistic is F = (R²/K) / ((1 − R²)/(n − K − 1))
- The decision rule is:
- Reject H0 if F ≥ Fc
- Do not reject H0 if F < Fc
- where Fc is the critical value from the table of the F-distribution with K and n − K − 1 degrees of freedom (a sketch of the test follows below)
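A minimal sketch in Python of the overall-significance F-test written directly from the formula above; scipy is assumed, and the numeric inputs are hypothetical.

```python
from scipy import stats

def overall_f_test(r_squared, n, k, alpha=0.05):
    """Overall significance F-test for a regression with k slope
    coefficients estimated from n observations."""
    f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
    f_crit = stats.f.ppf(1 - alpha, k, n - k - 1)
    return f_stat, f_crit, f_stat >= f_crit   # True means reject H0

# Hypothetical values: R-sq = 0.40, n = 46 observations, K = 5 regressors.
print(overall_f_test(0.40, 46, 5))
```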
Model Specification Errors: Omitting Relevant Variables and Including Irrelevant Variables
- To properly estimate a regression model, we need to have specified the correct model
- A typical specification error occurs when the estimated model does not include the correct set of explanatory variables
- This specification error takes two forms:
- Omitting one or more relevant explanatory variables
- Including one or more irrelevant explanatory variables
- Either form of specification error results in problems with the OLS estimates
Model Specification Errors: Omitting Relevant Variables
- Example: Two-factor model of stock returns
- Suppose that the true model that explains a particular stock's returns is given by a two-factor model with the growth of GDP and the inflation rate as factors: Rt = β0 + β1·GDPt + β2·INFt + εt
- Suppose instead that we estimated the following model, which leaves out inflation: Rt = β0 + β1·GDPt + εt*
Model Specification Errors: Omitting Relevant Variables
- The above model has been estimated by omitting the explanatory variable INF
- Thus, the error term of this model is actually equal to εt* = β2·INFt + εt
- If there is any correlation between the omitted variable (INF) and the included explanatory variable (GDP), then there is a violation of classical assumption III
Model Specification Errors: Omitting Relevant Variables
- This means that the explanatory variable and the error term are no longer uncorrelated
- If that is the case, the OLS estimate of β1 (the coefficient of GDP) will be biased
- As in the above example, it is highly likely that there will be some correlation between two financial (or economic) variables
- If, however, the correlation is low or the true coefficient of the omitted variable is zero, then the specification error is very small (the simulation sketch below illustrates the bias)
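A small simulated illustration, not taken from the slides, of how omitting a correlated factor biases the OLS slope; it assumes numpy and statsmodels and uses made-up coefficient values.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
gdp = rng.normal(size=n)
inf = 0.6 * gdp + rng.normal(size=n)          # INF correlated with GDP
ret = 1.0 + 2.0 * gdp - 1.5 * inf + rng.normal(size=n)

# Full model: both factors included.
full = sm.OLS(ret, sm.add_constant(np.column_stack([gdp, inf]))).fit()
# Misspecified model: INF omitted, so its effect leaks into the error term.
omit = sm.OLS(ret, sm.add_constant(gdp)).fit()

print(full.params)   # GDP slope close to the true value of 2.0
print(omit.params)   # GDP slope biased towards 2.0 + (-1.5)(0.6) = 1.1
```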
Model Specification Errors: Omitting Relevant Variables
- How can we correct omitted variable bias in a model?
- A simple solution is to add the omitted variable back to the model; the difficulty with this solution is detecting which variable has been omitted
- Omitted variable bias is hard to detect, but there can be some obvious indications of this specification error
- For example, our estimated model may have a significant coefficient with the opposite sign from that expected by our arguments
Model Specification Errors: Omitting Relevant Variables
- The best way to detect omitted variable specification bias is to rely on the theoretical arguments behind the model
- Which variables does the theory suggest should be included?
- What are the expected signs of the coefficients?
- Have we omitted a variable that most other similar studies include in the estimated model?
- Note, though, that a significant coefficient with an unexpected sign can also occur due to a small sample size
- However, most of the data sets used in empirical finance are large enough that this is most likely not the cause of the specification bias
Model Specification Errors: Including Irrelevant Variables
- Example: Going back to the two-factor model, suppose that we include a third explanatory variable, for example, the degree of wage inequality (INEQ)
- So, we estimate the following model: Rt = β0 + β1·GDPt + β2·INFt + β3·INEQt + εt
- The inclusion of an irrelevant variable (INEQ) in the model increases the standard errors of the estimated coefficients and, thus, decreases the t-statistics
Model Specification Errors: Including Irrelevant Variables
- This implies that it will be more difficult to reject a null hypothesis that a coefficient of one of the explanatory variables is equal to zero
- Also, the inclusion of an irrelevant variable will usually decrease the adjusted R-sq (but not the R-sq)
- Finally, it can be shown that the inclusion of an irrelevant variable still allows us to obtain unbiased estimates of the model's coefficients (see the simulated sketch below)
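A companion simulated sketch, again with made-up data and assuming numpy and statsmodels: adding an irrelevant regressor leaves the GDP slope unbiased but inflates its standard error and typically lowers the adjusted R-sq.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
gdp = rng.normal(size=n)
ineq = 0.5 * gdp + rng.normal(size=n)         # irrelevant but correlated
ret = 1.0 + 2.0 * gdp + rng.normal(size=n)    # true model uses GDP only

small = sm.OLS(ret, sm.add_constant(gdp)).fit()
big = sm.OLS(ret, sm.add_constant(np.column_stack([gdp, ineq]))).fit()

print(small.bse[1], big.bse[1])               # standard error of GDP rises
print(small.rsquared_adj, big.rsquared_adj)   # adjusted R-sq usually falls
```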
Dummy Variables: Incorporating Qualitative Information in the Model
- In several cases, it may be necessary to include explanatory information in the model in the form of a qualitative variable
- Example: Suppose we want to estimate a model of the relationship between firm performance (ROE) and board independence
- We are interested in empirically testing the argument that greater board independence leads to better firm performance
- In our estimated model, the null hypothesis would be that greater board independence does not affect, or worsens, firm performance
Dummy Variables: Incorporating Qualitative Information in the Model
- We can measure board independence by, for example, the proportion of independent directors
- However, how can we measure the impact of the fact that in some firms the CEO is also the Chairman of the Board?
- We assume that performance will be different in firms with this attribute (everything else being equal) compared to those where this is not true
- We can capture this effect through the inclusion of a dummy variable as an explanatory variable in our model
Dummy Variables: Incorporating Qualitative Information in the Model
- In this example, the dummy variable will take the value:
- 1 for firms where the CEO is also the Chairman
- 0 otherwise
- Therefore, we estimate the following model (in general form): Yi = β0 + β1·X1i + ... + βK·XKi + βK+1·Di + εi
- where Di = 1 if the ith observation satisfies our condition, and 0 otherwise (a sketch of such a regression follows below)
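A minimal sketch of an intercept-dummy regression in Python using statsmodels formulas; the data frame and variable names (roe, indep, ceo_chair) are hypothetical stand-ins for the variables discussed above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical firm-level data; ceo_chair = 1 if the CEO also chairs
# the board, 0 otherwise.
df = pd.DataFrame({
    "roe":       [0.12, 0.08, 0.15, 0.05, 0.10, 0.09],
    "indep":     [0.60, 0.40, 0.80, 0.30, 0.70, 0.50],  # share of independent directors
    "ceo_chair": [0, 1, 0, 1, 0, 1],
})

# Intercept dummy: ceo_chair shifts the regression line up or down
# without changing the slope on board independence.
model = smf.ols("roe ~ indep + ceo_chair", data=df).fit()
print(model.params)
```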
Dummy Variables: Incorporating Qualitative Information in the Model
[Figure: Y plotted against X with two parallel regression lines; the line for Di = 1 lies βK+1 above the line for Di = 0, whose intercept is β0]
Dummy Variables: Incorporating Qualitative Information in the Model
- Example: Suppose we empirically test the relationship between a firm's size and its monthly stock returns with a sample of time series data
- In our model, we should include a dummy variable that accounts for the well-known January effect
- The dummy variable will take the following values:
- 1 for observations of returns in the month of January
- 0 for all other observations of monthly returns
Dummy Variables: Incorporating Qualitative Information in the Model
- The above cases are examples of intercept dummy variables, meaning that inclusion of the dummy variable shifts the regression line but does not change its slope
- Another form of dummy variable is a slope dummy variable, which allows the slope of the regression to change depending on whether a condition is satisfied
- Example: Suppose we want to test the argument that the relationship between credit card lending and loan losses for a particular bank has changed in the last five years
Dummy Variables: Incorporating Qualitative Information in the Model
- That is, suppose that credit card lending contributes less to this bank's loan losses due to the implementation of better risk evaluation methods (credit scoring)
- We estimate the following model: LOSSESt = β0 + β1·CARDLNt + β2·(CARDLNt × Tt) + εt
- where the variable T is a time dummy variable that takes the value:
- 1 for observations in the past five years
- 0 otherwise
Dummy Variables: Incorporating Qualitative Information in the Model
- In this case, the coefficient of the CARDLN variable is:
- β1 + β2 for observations from the past five years
- β1 otherwise
- Dummy variable trap: always include one less dummy variable than the number of possible qualitative states in the data
- Example: If there are 3 qualitative states, for example small, medium and large firms, we should include 2 dummy variables (see the sketch below)
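A minimal sketch of a slope-dummy (interaction) regression with statsmodels formulas; the data are made up, and the column recent plays the role of the time dummy T from the slides.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical bank data; recent = 1 for observations from the past
# five years, 0 otherwise (the role of T above).
df = pd.DataFrame({
    "losses": [3.1, 2.8, 3.5, 2.0, 1.8, 1.6],
    "cardln": [10.0, 9.0, 11.0, 10.5, 9.5, 10.2],
    "recent": [0, 0, 0, 1, 1, 1],
})

# Slope dummy: the interaction term lets the slope on CARDLN differ
# between periods (beta1 before, beta1 + beta2 in the recent period).
model = smf.ols("losses ~ cardln + cardln:recent", data=df).fit()
print(model.params)

# Dummy variable trap: with 3 qualitative states (small/medium/large),
# keep only 2 dummies, e.g. pd.get_dummies(sizes, drop_first=True).
```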
Multicollinearity
- Multicollinearity occurs when some or all of the explanatory variables in the regression model are highly correlated
- In this case, assumption VI of the classical model does not hold and the OLS estimates lose some of their desirable properties
- It is common, particularly in the case of time series data, for two or more explanatory variables to be correlated
- When multicollinearity is present, the estimated coefficients are unstable in their degree of statistical significance, magnitude and sign
The Impact of Multicollinearity on the OLS Estimates
- Multicollinearity has the following consequences for the OLS estimated model:
- The OLS estimates remain unbiased
- The standard errors of the estimated coefficients are higher and, thus, the t-statistics fall
- The OLS estimates become very sensitive to the addition or removal of explanatory variables or to changes in the data sample
- The overall fit of the regression (and the significance of non-multicollinear coefficients) is to a large extent unaffected
- This implies that a sign of multicollinearity is a high adjusted R-sq combined with no statistically significant coefficients
Detecting Multicollinearity
- One approach to detecting multicollinearity is to examine the simple correlation coefficients between the explanatory variables
- These appear in the correlation matrix of the model's variables (a sketch follows below)
- Some researchers consider a correlation coefficient with an absolute value above 0.80 to be an indication of concern about multicollinearity
- A second detection approach is to use the Variance Inflation Factor (VIF)
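A minimal sketch of the correlation-matrix check in Python with pandas; the variables x1, x2, x3 are hypothetical, with x1 and x2 built to be nearly collinear so the warning pattern shows up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + 0.1 * rng.normal(size=100),   # nearly collinear with x1
    "x3": rng.normal(size=100),
})

# Pairwise correlations; an absolute value above roughly 0.80 is the
# usual warning sign mentioned above.
print(X.corr().round(2))
```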
Detecting Multicollinearity
- The VIF method tries to detect multicollinearity by examining the degree to which a given explanatory variable is explained by the other explanatory variables
- The method involves the following steps:
- Run an OLS regression of the explanatory variable Xi on all the other explanatory variables
- Calculate the VIF for the coefficient of variable Xi, given by 1/(1 − R²i), where R²i is the R-sq from that auxiliary regression
- Evaluate the size of the VIF
- Rule of thumb: if the VIF of the coefficient of explanatory variable Xi is greater than 5, then multicollinearity has a large impact on the estimated coefficient of this variable (a sketch of the calculation follows below)
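A minimal sketch of the VIF steps above in Python, assuming pandas and statsmodels; the data are the same kind of hypothetical, nearly collinear variables as in the previous sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def vif(X: pd.DataFrame) -> pd.Series:
    """For each column, regress it on the other columns and return
    1 / (1 - R-squared) of that auxiliary regression."""
    out = {}
    for col in X.columns:
        others = sm.add_constant(X.drop(columns=col))
        r2 = sm.OLS(X[col], others).fit().rsquared
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)

# Hypothetical data with two nearly collinear regressors.
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
X = pd.DataFrame({"x1": x1,
                  "x2": x1 + 0.1 * rng.normal(size=100),
                  "x3": rng.normal(size=100)})

print(vif(X))   # x1 and x2 should exceed the rule-of-thumb value of 5
```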