INTERPRETATION OF A REGRESSION EQUATION

The scatter diagram shows hourly earnings in 1994 plotted against highest grade completed for a sample of 570 respondents.
. reg EARNINGS S

      Source |       SS       df       MS              Number of obs =     570
-------------+------------------------------          F(  1,   568) =   65.64
       Model |  3977.38016     1  3977.38016          Prob > F      =  0.0000
    Residual |  34419.6569   568  60.5979875          R-squared     =  0.1036
-------------+------------------------------          Adj R-squared =  0.1020
       Total |  38397.0371   569  67.4816117          Root MSE      =  7.7845

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   1.073055   .1324501    8.102   0.000     .8129028    1.333206
       _cons |  -1.391004   1.820305   -0.764   0.445    -4.966354    2.184347
------------------------------------------------------------------------------

This is the output from a regression of earnings on highest grade completed, using Stata.
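If you want to reproduce this kind of regression outside Stata, here is a minimal sketch using Python's statsmodels. The file name earnings.csv and its column layout are assumptions for illustration, not part of the original data set.

import pandas as pd
import statsmodels.api as sm

# Hypothetical file with columns EARNINGS (dollars per hour) and S (highest grade completed)
df = pd.read_csv("earnings.csv")
X = sm.add_constant(df["S"])            # adds the intercept (_cons in Stata)
result = sm.OLS(df["EARNINGS"], X).fit()
print(result.summary())                 # coefficient table analogous to Stata's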
For the time being, we will be concerned only with the estimates of the parameters. The variables in the regression are listed in the first column, and the second column gives the estimates of their coefficients.
In this case there is only one variable, S, and its coefficient is 1.073. _cons, in Stata, refers to the constant. The estimate of the intercept is -1.391.
Here is the scatter diagram again, with the regression line shown.
What do the coefficients actually mean?
To answer this question, you must refer to the units in which the variables are measured.
S is measured in years (strictly speaking, grades completed), and EARNINGS in dollars per hour. So the slope coefficient implies that hourly earnings increase by $1.07 for each extra year of schooling.
We will look at a geometrical representation of this interpretation. To do this, we will enlarge the marked section of the scatter diagram.
The regression line indicates that completing 12th grade instead of 11th grade would increase earnings by $1.073, from $10.413 to $11.486, as a general tendency.
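These fitted values come straight from the estimated equation, so they are easy to verify; a quick check in Python:

# Fitted earnings from EARNINGS-hat = -1.391004 + 1.073055 * S
b1, b2 = -1.391004, 1.073055
for s in (11, 12):
    print(s, round(b1 + b2 * s, 3))   # 11 -> 10.413, 12 -> 11.486
# The difference between the two fitted values is the slope coefficient, 1.073.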
You should ask yourself whether this is a plausible figure. If it is implausible, this could be a sign that your model is misspecified in some way.
For low levels of education it might be plausible. But for high levels it would seem to be an underestimate.
What about the constant term? (Try to answer this question yourself before continuing with this sequence.)
Literally, the constant indicates that an individual with no years of education would have to pay $1.39 per hour to be allowed to work.
This does not make any sense at all; an interpretation involving negative payment is impossible to sustain.
A safe solution to the problem is to limit the interpretation to the range of the sample data, and to refuse to extrapolate on the grounds that we have no evidence outside the data range.
With this explanation, the only function of the constant term is to enable you to draw the regression line at the correct height on the scatter diagram. It has no meaning of its own.
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

Specifically, we will look at an earnings function model where hourly earnings, EARNINGS, depend on years of schooling (highest grade completed), S, and a measure of cognitive ability, A:

EARNINGS = b1 + b2 S + b3 A + u

The model has three dimensions, one each for EARNINGS, S, and A. The starting point for investigating the determination of EARNINGS is the intercept, b1.

Literally, the intercept gives EARNINGS for those respondents who have no schooling and who scored zero on the ability test. However, the ability score is scaled in such a way as to make it impossible to score zero. Hence a literal interpretation of b1 would be unwise.

The next term on the right side of the equation gives the effect of variations in S. A one-year increase in S causes EARNINGS to increase by b2 dollars, holding A constant.

Similarly, the third term gives the effect of variations in A. A one-point increase in A causes earnings to increase by b3 dollars, holding S constant.

The final element of the model is the disturbance term, u. In any given observation, u may happen to take a positive or a negative value.
A sample consists of a number of observations generated in this way. Note that the interpretation of the model does not depend on whether S and A are correlated or not.

However, we do assume that the effects of S and A on EARNINGS are additive. The impact of a difference in S on EARNINGS is not affected by the value of A, or vice versa.

The regression coefficients are derived using the same least squares principle used in simple regression analysis. The fitted value of Y in observation i depends on our choice of b1, b2, and b3.
The residual ei in observation i is the difference between the actual and fitted values of Y.
We define RSS, the sum of the squares of the residuals, and choose b1, b2, and b3 so as to minimize it.
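A minimal numerical sketch of this least squares principle, written in Python with simulated data. The coefficients and distributions below are invented for illustration; they are not the survey data used in the text.

import numpy as np

rng = np.random.default_rng(0)
n = 570
S = rng.integers(6, 21, n).astype(float)     # hypothetical years of schooling
A = rng.normal(50.0, 10.0, n)                # hypothetical ability scores
u = rng.normal(0.0, 7.0, n)                  # disturbance term
EARNINGS = -4.6 + 0.74 * S + 0.15 * A + u    # simulated earnings

# Choose b1, b2, b3 to minimize RSS: the least squares solution
X = np.column_stack([np.ones(n), S, A])
b, rss, _, _ = np.linalg.lstsq(X, EARNINGS, rcond=None)
print(b)     # estimates of b1, b2, b3
print(rss)   # the minimized residual sum of squares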
. reg EARNINGS S ASVABC

      Source |       SS       df       MS              Number of obs =     570
-------------+------------------------------          F(  2,   567) =   39.98
       Model |  4745.74965     2  2372.87483          Prob > F      =  0.0000
    Residual |  33651.2874   567  59.3497133          R-squared     =  0.1236
-------------+------------------------------          Adj R-squared =  0.1205
       Total |  38397.0371   569  67.4816117          Root MSE      =  7.7039

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .7390366   .1606216    4.601   0.000     .4235506    1.054523
      ASVABC |   .1545341   .0429486    3.598   0.000     .0701764    .2388918
       _cons |  -4.624749     2.0132   -2.297   0.022    -8.578989   -.6705095
------------------------------------------------------------------------------

Here is the regression output for the earnings function using Data Set 21. The ability variable A is measured here by the test score ASVABC.
It indicates that earnings increase by $0.74 for every extra year of schooling and by $0.15 for every extra point on the ability score.
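For instance, plugging in illustrative values (S = 12 and an ability score of 50, chosen arbitrarily here rather than taken from the text) gives the fitted hourly earnings:

# Fitted value from EARNINGS-hat = -4.624749 + 0.7390366*S + 0.1545341*ASVABC
b1, b2, b3 = -4.624749, 0.7390366, 0.1545341
S, ASVABC = 12, 50                            # hypothetical respondent
print(round(b1 + b2 * S + b3 * ASVABC, 2))    # 11.97 dollars per hour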
Literally, the intercept indicates that an individual who had no schooling and an ability score of zero would have hourly earnings of -$4.62.
Obviously, this is impossible. The lowest value of S in the sample was 6, and the lowest ability score was 22. We have obtained a nonsense estimate because we have extrapolated too far from the data range.
t TEST OF A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT

Critical values of t

Degrees of    Two-tailed test:     10%      5%      2%      1%    0.2%    0.1%
freedom       One-tailed test:      5%    2.5%      1%    0.5%    0.1%   0.05%
------------------------------------------------------------------------------
      1                          6.314  12.706  31.821  63.657  318.31  636.62
      2                          2.920   4.303   6.965   9.925  22.327  31.598
      3                          2.353   3.182   4.541   5.841  10.214  12.924
      4                          2.132   2.776   3.747   4.604   7.173   8.610
      5                          2.015   2.571   3.365   4.032   5.893   6.869
    ...
     18                          1.734   2.101   2.552   2.878   3.610   3.922
     19                          1.729   2.093   2.539   2.861   3.579   3.883
     20                          1.725   2.086   2.528   2.845   3.552   3.850
    ...
    120                          1.658   1.980   2.358   2.617   3.160   3.373
      ∞                          1.645   1.960   2.326   2.576   3.090   3.291

For this reason we need to refer to a table of critical values of t when performing significance tests on the coefficients of a regression equation.
At the top of the table are listed possible significance levels for a test. For the time being we will be performing two-tailed tests, so ignore the line for one-tailed tests.
Hence if we are performing a (two-tailed) 5% significance test, we should use the 5% column of the table.
The left-hand vertical column lists degrees of freedom. The number of degrees of freedom in a regression is defined to be the number of observations minus the number of parameters estimated.
In a simple regression, we estimate just two parameters, the constant and the slope coefficient, so the number of degrees of freedom is n - 2 if there are n observations.
If we were performing a regression with 20 observations, as in the price inflation/wage inflation example, the number of degrees of freedom would be 18 and the critical value of t for a 5% test would be 2.101.
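If no printed table is at hand, the same critical values can be recovered numerically; a small sketch, assuming scipy is installed:

from scipy.stats import t

# Two-tailed critical values: put half the significance level in each tail
for level in (0.05, 0.01):
    print(level, round(t.ppf(1 - level / 2, df=18), 3))
# 0.05 -> 2.101 and 0.01 -> 2.878, matching the 18-degrees-of-freedom row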
Note that as the number of degrees of freedom becomes large, the critical value converges on 1.96, the critical value for the normal distribution. This is because the t distribution converges on the normal distribution.
If instead we wished to perform a 1% significance test, we would use the 1% column. Note that as the number of degrees of freedom becomes large, the critical value converges to 2.58, the critical value for the normal distribution.
For a simple regression with 20 observations, the critical value of t at the 1% level is 2.878.
If the standard deviation of b2 is known: z = (b2 - b2^0) / s.d.(b2), the discrepancy between the hypothetical value and the sample estimate, measured in terms of standard deviations. For a 5% significance test, reject H0: b2 = b2^0 if z > 1.96 or z < -1.96.

If the standard deviation of b2 is not known: t = (b2 - b2^0) / s.e.(b2), the discrepancy between the hypothetical value and the sample estimate, measured in terms of standard errors. For a 1% significance test with 18 degrees of freedom, reject H0: b2 = b2^0 if t > 2.878 or t < -2.878.

So we should use this figure, 2.878, in the test procedure for a 1% test.
EXERCISE

3.10 A researcher with a sample of 50 individuals with similar education but differing amounts of training hypothesizes that hourly earnings, EARNINGS, may be related to hours of training, TRAINING, according to the relationship

EARNINGS = b1 + b2 TRAINING + u

He is prepared to test the null hypothesis H0: b2 = 0 against the alternative hypothesis H1: b2 ≠ 0 at the 5 percent and 1 percent levels. What should he report:

1. if b2 = 0.30, s.e.(b2) = 0.12?
2. if b2 = 0.55, s.e.(b2) = 0.12?
3. if b2 = 0.10, s.e.(b2) = 0.12?
4. if b2 = -0.27, s.e.(b2) = 0.12?
EARNINGS = b1 + b2 TRAINING + u
H0: b2 = 0, H1: b2 ≠ 0
n = 50, so 48 degrees of freedom

There are 50 observations and 2 parameters have been estimated, so there are 48 degrees of freedom.
tcrit, 5% = 2.01; tcrit, 1% = 2.68

The table giving the critical values of t does not give the values for 48 degrees of freedom. We will use the values for 50 as a guide. For the 5% level the value is 2.01, and for the 1% level it is 2.68. The critical values for 48 will be slightly higher.
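The exact values for 48 degrees of freedom can be checked numerically (again assuming scipy); they are indeed slightly above the 50-degree table values:

from scipy.stats import t

for df_ in (48, 50):
    print(df_, round(t.ppf(0.975, df_), 3), round(t.ppf(0.995, df_), 3))
# 48 -> 2.011 and 2.682; 50 -> 2.009 and 2.678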
1. If b2 = 0.30, s.e.(b2) = 0.12? t = 2.50.

In the first case, the t statistic is 2.50.
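The t statistics for all four cases follow directly from t = b2 / s.e.(b2) under H0: b2 = 0; a one-line check:

# t statistic under H0: b2 = 0 is the estimate divided by its standard error
for b2 in (0.30, 0.55, 0.10, -0.27):
    print(b2, round(b2 / 0.12, 2))   # 2.5, 4.58, 0.83, -2.25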
This is greater than the critical value of t at the 5% level, but less than the critical value at the 1% level. So we reject H0 at the 5% level but not at the 1% level.
In this case we should mention both tests. It is not enough to say "Reject at the 5% level", because it leaves open the possibility that we might be able to reject at the 1% level.
Likewise it is not enough to say "Do not reject at the 1% level", because this does not reveal whether the result is significant at the 5% level or not.
2. If b2 = 0.55, s.e.(b2) = 0.12? t = 4.58.

In the second case, t is equal to 4.58.
Reject H0 at the 1% level. We report only the result of the 1% test. There is no need to mention the 5% test. If you do, you reveal that you do not understand that rejection at the 1% level automatically means rejection at the 5% level, and you look ignorant.
Actually, given the large t statistic, it is a good idea to investigate whether we can reject H0 at the 0.1% level. It turns out that we can. The critical value for 50 degrees of freedom is 3.50 (tcrit, 0.1% = 3.50), so we just report the outcome of this test. There is no need to mention the 1% test.
Why is it a good idea to press on to a 0.1% test, if the t statistic is large? Try to answer this question before reading on.
The reason is that rejection at the 1% level still leaves open the possibility of a 1% risk of having made a Type I error (rejecting the null hypothesis when it is in fact true). So there is a 1% risk of the "significant" result having occurred as a matter of chance.
If you can reject at the 0.1% level, you reduce that risk to one tenth of 1%. This means that the result is almost certainly genuine.
3. If b2 = 0.10, s.e.(b2) = 0.12? t = 0.83.

In the third case, t is equal to 0.83.
Do not reject H0 at the 5% level. We report only the result of the 5% test. There is no need to mention the 1% test. If you do, you reveal that you do not understand that not rejecting at the 5% level automatically means not rejecting at the 1% level, and you look ignorant.
4. If b2 = -0.27, s.e.(b2) = 0.12? t = -2.25.

In the fourth case, t is equal to -2.25.
Reject H0 at the 5% level but not at the 1% level. The absolute value of the t statistic is between the critical values for the 5% and 1% tests, so we mention both tests, as in the first case.
F TESTS OF GOODNESS OF FIT

This sequence describes two F tests of goodness of fit in a multiple regression model. The first relates to the goodness of fit of the equation as a whole.
We will consider the general case where there are k - 1 explanatory variables. For the F test of goodness of fit of the equation as a whole, the null hypothesis, in words, is that the model has no explanatory power at all.
Of course we hope to reject it and conclude that the model does have some explanatory power.
The model will have no explanatory power if it turns out that Y is unrelated to any of the explanatory variables. Mathematically, therefore, the null hypothesis is that all the coefficients b2, ..., bk are zero.
The alternative hypothesis is that at least one of these b coefficients is different from zero.
In the multiple regression model there is a difference between the roles of the F and t tests. The F test tests the joint explanatory power of the variables, while the t tests test their explanatory power individually.
In the simple regression model the F test was equivalent to the (two-tailed) t test on the slope coefficient, because the "group" consisted of just one variable.
The F statistic for the test was defined in the last sequence in Chapter 3: F(k - 1, n - k) = [ESS / (k - 1)] / [RSS / (n - k)], where ESS is the explained sum of squares and RSS is the residual sum of squares.
It can be expressed in terms of R2 by dividing the numerator and the denominator by TSS, the total sum of squares: F = [R2 / (k - 1)] / [(1 - R2) / (n - k)].
ESS / TSS is equal to R2, and RSS / TSS is equal to (1 - R2). (For proofs, see the last sequence in Chapter 3.)
The educational attainment model will be used as an example. We will suppose that S depends on ASVABC, the ability score, and on SM and SF, the highest grade completed by the mother and father of the respondent, respectively.
The null hypothesis for the F test of goodness of fit is that all three slope coefficients are equal to zero. The alternative hypothesis is that at least one of them is non-zero.
. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     570
-------------+------------------------------          F(  3,   566) =  110.83
       Model |  1278.24153     3  426.080508          Prob > F      =  0.0000
    Residual |  2176.00584   566  3.84453329          R-squared     =  0.3700
-------------+------------------------------          Adj R-squared =  0.3667
       Total |  3454.24737   569  6.07073351          Root MSE      =  1.9607

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1295006   .0099544   13.009   0.000     .1099486    .1490527
          SM |    .069403   .0422974    1.641   0.101     -.013676     .152482
          SF |   .1102684   .0311948    3.535   0.000     .0489967    .1715401
       _cons |   4.914654   .5063527    9.706   0.000     3.920094    5.909214
------------------------------------------------------------------------------

Here is the regression output using Data Set 21.
In this example, k - 1, the number of explanatory variables, is equal to 3, and n - k, the number of degrees of freedom, is equal to 566.
The numerator of the F statistic is the explained sum of squares divided by k - 1. In the Stata output these numbers are given in the Model row.
The denominator is the residual sum of squares divided by the number of degrees of freedom remaining.
Hence the F statistic is 110.8. All serious regression packages compute it for you as part of the diagnostics in the regression output.
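A quick numerical check of both forms of the formula, using the sums of squares and R2 from the output above (assuming scipy is available for the p-value):

from scipy.stats import f

ess, rss, r2 = 1278.24153, 2176.00584, 0.3700
k_minus_1, n_minus_k = 3, 566

F = (ess / k_minus_1) / (rss / n_minus_k)
print(round(F, 2))                                           # 110.83
print(round((r2 / k_minus_1) / ((1 - r2) / n_minus_k), 2))   # same, up to rounding of R2
print(f.sf(F, k_minus_1, n_minus_k))                         # p-value, effectively zero (Prob > F)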
The critical value for F(3,566) is not given in the F tables, but we know it must be lower than the critical value for F(3,120), which is given. At the 0.1% level, this is 5.78. Hence we easily reject H0 at the 0.1% level.
This result could have been anticipated, because both ASVABC and SF have highly significant t statistics. So we knew in advance that both b2 and b4 were non-zero.
It is unusual for the F statistic not to be significant if some of the t statistics are significant. In principle it could happen, though. Suppose that you ran a regression with 40 explanatory variables, none being a true determinant of the dependent variable.
Then the F statistic should be low enough for H0 not to be rejected. However, if you are performing t tests on the slope coefficients at the 5% level, with a 5% chance of a Type I error, on average 2 of the 40 variables could be expected to have "significant" coefficients.
The opposite can easily happen, though. Suppose you have a multiple regression model which is correctly specified and the R2 is high. You would expect to have a highly significant F statistic.
However, if the explanatory variables are highly correlated and the model is subject to severe multicollinearity, the standard errors of the slope coefficients could all be so large that none of the t statistics is significant.
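This situation is easy to reproduce in a simulation; a small sketch, with artificial data generated purely to illustrate the point (assuming numpy and statsmodels are available):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 almost identical to x1
y = x1 + x2 + rng.normal(size=n)           # y genuinely depends on both

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()
print(res.fvalue, res.f_pvalue)   # joint F test: overwhelmingly significant
print(res.tvalues[1:])            # individual t statistics: typically small
# The slope standard errors are inflated by the near-collinearity of x1 and x2.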