1
Hypothesis Testing, Model Specification and
Multicollinearity
2
Computing p-Values for t-Tests
  • Selecting and committing to a significance level
    before we make the decision on a hypothesis can
    hide useful information about the test's outcome
  • Example: Suppose we want to test the null
    hypothesis that a coefficient is zero against a
    two-sided alternative
  • Suppose that with 40 degrees of freedom in our
    model we obtain a t-statistic of 1.85
  • If we select a 5% significance level, we do not
    reject the null hypothesis since the critical
    t-value is 2.021

3
Computing p-Values for t-Tests
  • If our goal is NOT to reject the null, we would
    report the above outcome at the 5% level of
    significance
  • However, at the 10% level of significance, the
    null would be rejected (critical t-value 1.684)
  • Rather than selecting and testing at different
    significance levels, it is better to provide an
    answer to the following question:
  • What is the smallest significance level at which
    the null would be rejected?

4
Computing p-Values for t-Tests
  • This level is known as the p-value and gives us
    the probability of a type I error if we reject
    the null hypothesis
  • P-values are computed by software packages, such
    as SPSS
  • Interpretation: the p-value is the probability of
    observing a t-statistic as extreme as we did if
    the null hypothesis is true
  • Implication: small p-values are evidence against
    the null, while large p-values provide little
    evidence against it (see the sketch below for
    the running example)
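A minimal sketch of the p-value computation for the running example (t = 1.85, 40 degrees of freedom), using Python's scipy rather than the SPSS mentioned above:

```python
# Two-sided p-value for the example t-statistic (t = 1.85, df = 40)
from scipy import stats

t_stat, df = 1.85, 40
p_value = 2 * stats.t.sf(abs(t_stat), df)  # P(|T| >= 1.85) under H0
print(f"p-value = {p_value:.4f}")          # ~0.072: reject at 10%, not at 5%
```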

5
The F-Test Testing the Overall Significance of
the Model
  • The t-test cannot be used to test hypotheses
    about more than one coefficient in our model
  • The F-test is used to test the overall
    significance of the regression equation
  • In a model with K explanatory variables, the two
    hypotheses are
  • H0: β1 = β2 = … = βK = 0
  • H1: H0 is not true

6
The F-Test Testing the Overall Significance of
the Model
  • The F-statistic is F = (R² / K) / ((1 − R²) / (n − K − 1))
  • The decision rule is
  • Reject H0 if F ≥ Fc
  • Do not reject H0 if F < Fc
  • where Fc is the critical value determined by the
    table of the F-distribution with K and n − K − 1
    degrees of freedom (see the sketch below)
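As an illustration of this decision rule, the sketch below evaluates the F-statistic for hypothetical values of R-sq, K and n (none of these numbers come from the slides):

```python
# Overall F-test: compare the F-statistic with the 5% critical value Fc
from scipy import stats

R2, K, n = 0.35, 3, 120                  # illustrative fit, regressors, sample size
F = (R2 / K) / ((1 - R2) / (n - K - 1))  # F-statistic with K and n-K-1 df
Fc = stats.f.ppf(0.95, K, n - K - 1)     # 5% critical value
print(F, Fc, F >= Fc)                    # reject H0 if F >= Fc
```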

7
Model Specification Errors Omitting Relevant
Variables and Including Irrelevant Variables
  • To properly estimate a regression model, we need
    to have specified the correct model
  • A typical specification error occurs when the
    estimated model does not include the correct set
    of explanatory variables
  • This specification error takes two forms:
  • Omitting one or more relevant explanatory
    variables
  • Including one or more irrelevant explanatory
    variables
  • Either form of specification error results in
    problems with OLS estimates

8
Model Specification Errors Omitting Relevant
Variables
  • Example: Two-factor model of stock returns
  • Suppose that the true model that explains a
    particular stock's returns is given by a
    two-factor model with the growth of GDP and the
    inflation rate as factors:
  • Rt = β0 + β1GDPt + β2INFt + εt
  • Suppose instead that we estimated the following
    model:
  • Rt = β0 + β1GDPt + εt*

9
Model Specification Errors Omitting Relevant
Variables
  • The above model has been estimated by omitting
    the explanatory variable INF
  • Thus, the error term of this model is actually
    equal to εt* = β2INFt + εt
  • If there is any correlation between the omitted
    variable (INF) and the explanatory variable
    (GDP), then there is a violation of classical
    assumption III

10
Model Specification Errors Omitting Relevant
Variables
  • This means that the explanatory variable and the
    error term are correlated
  • If that is the case, the OLS estimate of β1 (the
    coefficient of GDP) will be biased, as the
    simulation sketched below illustrates
  • As in the above example, it is highly likely that
    there will be some correlation between two
    financial (or economic) variables
  • If, however, the correlation is low or the true
    coefficient of the omitted variable is zero, then
    the specification error is very small
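A small simulation of omitted variable bias under a hypothetical data-generating process in the spirit of the two-factor example (the coefficients 2.0 and 1.5 and the 0.6 loading of INF on GDP are illustrative assumptions, not values from the slides):

```python
# Omitted variable bias: leaving out INF biases the GDP coefficient
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
gdp = rng.normal(size=n)
inf = 0.6 * gdp + rng.normal(size=n)              # INF correlated with GDP
ret = 1.0 + 2.0 * gdp + 1.5 * inf + rng.normal(size=n)

full = sm.OLS(ret, sm.add_constant(np.column_stack([gdp, inf]))).fit()
short = sm.OLS(ret, sm.add_constant(gdp)).fit()   # omits INF
print(full.params[1])   # ~2.0: unbiased estimate of the GDP coefficient
print(short.params[1])  # ~2.9: biased upward by 1.5 * 0.6 = 0.9
```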

11
Model Specification Errors Omitting Relevant
Variables
  • How can we correct the omitted variable bias in a
    model?
  • A simple solution is to add the omitted variable
    back to the model; the difficulty with this
    solution is detecting which variable has been
    omitted
  • Omitted variable bias is hard to detect, but
    there can be some obvious indications of this
    specification error
  • For example, the estimated model has a
    significant coefficient with the opposite sign
    from the one our arguments lead us to expect

12
Model Specification Errors Omitting Relevant
Variables
  • The best way to detect the omitted variable
    specification bias is to rely on the theoretical
    arguments behind the model
  • Which variables does the theory suggest should be
    included?
  • What are the expected signs of the coefficients?
  • Have we omitted a variable that most other
    similar studies include in the estimated model?
  • Note, though, that a significant coefficient with
    the unexpected sign can also occur due to a small
    sample size
  • However, most of the data sets used in empirical
    finance are large enough that this most likely is
    not the cause of the specification bias

13
Model Specification Errors Including Irrelevant
Variables
  • Example: Going back to the two-factor model,
    suppose that we include a third explanatory
    variable in the model, for example, the degree of
    wage inequality (INEQ)
  • So, we estimate the following model:
  • Rt = β0 + β1GDPt + β2INFt + β3INEQt + εt
  • The inclusion of an irrelevant variable (INEQ) in
    the model increases the standard errors of the
    estimated coefficients and, thus, decreases the
    t-statistics

14
Model Specification Errors Including Irrelevant
Variables
  • This implies that it will be more difficult to
    reject a null hypothesis that the coefficient of
    one of the explanatory variables is equal to zero
  • Also, the inclusion of an irrelevant variable
    will usually decrease the adjusted R-sq (but not
    the R-sq)
  • Finally, we can show that the inclusion of an
    irrelevant variable still allows us to obtain
    unbiased estimates of the model's coefficients,
    as the sketch below illustrates
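A companion simulation, again with a hypothetical data-generating process: the true coefficient of the extra regressor is zero, so including it leaves the estimate of the x coefficient unbiased but inflates its standard error:

```python
# Irrelevant variable: unbiased estimates, but larger standard errors
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
junk = 0.9 * x + rng.normal(scale=0.5, size=n)  # true coefficient of junk is 0
y = 1.0 + 2.0 * x + rng.normal(size=n)

lean = sm.OLS(y, sm.add_constant(x)).fit()
bloated = sm.OLS(y, sm.add_constant(np.column_stack([x, junk]))).fit()
print(lean.params[1], bloated.params[1])  # both ~2.0 (unbiased)
print(lean.bse[1], bloated.bse[1])        # standard error of x's coefficient rises
```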

15
Dummy Variables Incorporating Qualitative
Information in the Model
  • In several cases, it may be necessary to include
    explanatory information in the model in the form
    of a qualitative variable
  • Example: Suppose we want to estimate a model of
    the relationship between firm performance (ROE)
    and board independence
  • We are interested in empirically testing the
    argument that greater board independence leads to
    better firm performance
  • In our estimated model, the null hypothesis would
    be that greater board independence either does
    not affect firm performance or results in worse
    performance

16
Dummy Variables Incorporating Qualitative
Information in the Model
  • We can measure board independence by, for
    example, the proportion of independent directors
  • However, how can we measure the impact of the
    fact that in some firms the CEO is also the
    Chairman of the Board?
  • We assume that, everything else equal,
    performance will differ between firms with this
    attribute and firms without it
  • We can capture this effect through the inclusion
    of a dummy variable as an explanatory variable in
    our model

17
Dummy Variables Incorporating Qualitative
Information in the Model
  • In this example, the dummy variable will take the
    value of:
  • 1 for firms where the CEO is also the Chairman
  • 0 otherwise
  • Therefore, we estimate the following model (in
    general form):
  • Yi = β0 + β1X1i + … + βKXKi + βK+1Di + εi
  • where Di = 1 if the ith observation satisfies
    our condition, and Di = 0 otherwise

18
Dummy Variables Incorporating Qualitative
Information in the Model
[Figure: two parallel regression lines of Y on X; the line for Di = 1 lies βK+1 above the line for Di = 0, whose intercept is β0. The dummy shifts the intercept, not the slope]
19
Dummy Variables Incorporating Qualitative
Information in the Model
  • Example: Suppose we empirically test the
    relationship between a firm's size and its
    monthly stock returns with a sample of time
    series data
  • In our model, we should include a dummy variable
    that accounts for the well-known January effect
    (sketched in code below)
  • The dummy variable will take the following values:
  • 1 for observations of returns in the month of
    January
  • 0 for all other observations of monthly returns
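A minimal sketch of this intercept dummy on simulated monthly data (the firm-size regressor and all coefficients are illustrative assumptions):

```python
# Intercept dummy for the January effect in a monthly returns regression
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
months = np.tile(np.arange(1, 13), 20)    # 20 years of monthly observations
firm_size = rng.normal(size=months.size)  # hypothetical size variable
jan = (months == 1).astype(float)         # D = 1 in January, 0 otherwise
ret = 0.5 - 0.2 * firm_size + 1.0 * jan + rng.normal(size=months.size)

X = sm.add_constant(np.column_stack([firm_size, jan]))
fit = sm.OLS(ret, X).fit()
print(fit.params)  # [intercept, size coefficient, January intercept shift]
```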

20
Dummy Variables Incorporating Qualitative
Information in the Model
  • The above cases are examples of intercept dummy
    variables, meaning that inclusion of the dummy
    variable shifts the regression line but does not
    change its slope
  • Another form of dummy variable is a slope dummy
    variable, which allows the slope of a regression
    to change depending on whether a condition is
    satisfied
  • Example: Suppose we want to test the argument
    that the relationship between credit card lending
    and loan losses for a particular bank has changed
    in the last five years

21
Dummy Variables Incorporating Qualitative
Information in the Model
  • I.e., suppose that credit card lending
    contributes less to this bank's loan losses due
    to the implementation of better risk evaluation
    methods (credit scoring)
  • We estimate the following model:
  • LOSSESt = β0 + β1CARDLNt − β2(Tt × CARDLNt) + εt
  • where the variable T is a time dummy variable
    that takes the value
  • 1 for observations in the past five years
  • 0 otherwise

22
Dummy Variables Incorporating Qualitative
Information in the Model
  • In this case, the coefficient of the CARDLN
    variable is
  • β1 − β2 for observations from the past five years
  • β1 otherwise (see the sketch below)
  • Dummy variable trap: always include one fewer
    dummy variable than the number of possible
    qualitative states in the data
  • Example: If there are 3 qualitative states, for
    example, small, medium and large firms, we should
    include 2 dummy variables
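The sketch below implements the slope dummy through an interaction term under a hypothetical data-generating process; note that the interaction here enters with a plus sign, so the recent-period slope is b1 + b2, which matches the slides' β1 − β2 once β2 is defined with the opposite sign:

```python
# Slope dummy: the CARDLN slope changes in the recent (T = 1) period
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
cardln = rng.uniform(1, 10, size=n)         # credit card lending volume
T = (np.arange(n) >= n - 60).astype(float)  # last 60 obs = recent period
losses = 2.0 + 0.8 * cardln - 0.3 * T * cardln + rng.normal(size=n)

X = sm.add_constant(np.column_stack([cardln, T * cardln]))
b0, b1, b2 = sm.OLS(losses, X).fit().params
print(b1, b1 + b2)  # slope ~0.8 when T = 0, ~0.5 when T = 1
```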

23
Multicollinearity
  • Multicollinearity occurs when some or all of the
    explanatory variables in the regression model are
    highly correlated
  • In this case, assumption VI of the classical
    model does not hold and the OLS estimates lose
    some of their desirable properties
  • It is common, particularly in the case of time
    series data, for two or more explanatory
    variables to be correlated
  • When multicollinearity is present, the estimated
    coefficients are unstable in their statistical
    significance, magnitude and sign

24
The Impact of Multicollinearity on the OLS
Estimates
  • Multicollinearity has the following consequences
    for the OLS estimated model:
  • The OLS estimates remain unbiased
  • The standard errors of the estimated coefficients
    are higher and, thus, the t-statistics fall
  • OLS estimates become very sensitive to the
    addition or removal of explanatory variables or
    to changes in the data sample
  • The overall fit of the regression (and the
    significance of non-multicollinear coefficients)
    is to a large extent unaffected
  • This implies that a telltale sign of
    multicollinearity is a high adjusted R-sq with no
    statistically significant coefficients

25
Detecting Multicollinearity
  • One approach to detecting multicollinearity is to
    examine the simple correlation coefficients
    between explanatory variables
  • These appear in the correlation matrix of the
    model's variables (a code sketch follows)
  • Some researchers consider a correlation
    coefficient with an absolute value above .80 to
    be an indication of concern for multicollinearity
  • A second detection approach is to use the
    Variance Inflation Factor (VIF)
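A quick numpy sketch of the correlation-matrix check; the regressors are simulated, with x2 built to be nearly collinear with x1:

```python
# Correlation matrix of the explanatory variables
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=300)  # nearly collinear with x1
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)  # columns are variables
print(np.round(corr, 2))             # |corr(x1, x2)| is well above .80
```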

26
Detecting Multicollinearity
  • The VIF method tries to detect multicollinearity
    by examining the degree to which a given
    explanatory variable is explained by the others
  • The method involves the following steps (sketched
    in code below):
  • Run an OLS regression of the explanatory variable
    Xi on all other explanatory variables
  • Calculate the VIF for the coefficient of variable
    Xi, given by 1/(1 − R²i), where R²i is the R-sq
    of that regression
  • Evaluate the size of the VIF
  • Rule of thumb: if the VIF of the coefficient of
    explanatory variable Xi is greater than 5, then
    multicollinearity has a substantial impact on the
    estimated coefficient of this variable
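These steps translate directly into code; the sketch below recomputes the VIFs for the simulated regressors of the previous example (statsmodels' variance_inflation_factor does the same in one call):

```python
# VIF: regress each X_i on the others and compute 1 / (1 - R^2_i)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

for i in range(X.shape[1]):
    others = np.delete(X, i, axis=1)
    r2 = sm.OLS(X[:, i], sm.add_constant(others)).fit().rsquared
    print(f"VIF(x{i + 1}) = {1.0 / (1.0 - r2):.1f}")  # x1, x2 far above 5
```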