Title: Our Friend, the Standard Error
1. Our Friend, the Standard Error
2. What is a Standard Error again?
- Think back to the very first day, when we were summarizing variables.
- We wanted to describe dispersion
- Example: candy bar consumption
  - One group consumes 8, 5, 6, 7, 9 (mean 7)
  - Another group consumes 2, 4, 12, 7, 10 (mean 7)
- The difference is not in the mean; the difference is in dispersion
- We describe this with the mean deviation, the variance (s² / σ²), or the standard deviation (s / σ)
3. Std. Dev. Measures Dispersion
[Figure: two normal curves, one with less dispersion, one with more dispersion]
4. So we can look at IQ scores
[Figure: normal curve of IQ scores, σ = 15, with ticks at 70, 85, µ = 100, 115, 130]
5. Sampling Distribution
- What if, instead of looking at the probability of a single score falling at or below a certain level, we took the probability of drawing a sample of 10 people whose average is at or below a certain level?
- How will the shape of the distribution change?
6. Mean IQ scores for samples of 10 people
[Figure: sampling distribution of the mean, σ ≈ 5, with ticks at 90, 95, µ = 100, 105, 110]
7. Sampling Distribution vs. Probability Distribution
[Figure: sampling distribution (σ ≈ 5) overlaid on the individual probability distribution (σ = 15)]
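The narrowing of the distribution can be illustrated with a quick simulation (a sketch, not from the original slides): the standard deviation of sample means is σ/√n, so for samples of 10 IQ scores it shrinks from 15 to about 4.74.

```python
import random, statistics

# Simulate the sampling distribution of the mean for samples of 10 IQ scores.
# Individual scores: normal with mu = 100, sigma = 15.
random.seed(1)
means = [statistics.mean(random.gauss(100, 15) for _ in range(10))
         for _ in range(20000)]

sd_of_means = statistics.stdev(means)
print(round(sd_of_means, 2))   # close to 15 / sqrt(10) = 4.74
```

The slides round this to σ ≈ 5 for the picture.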
8. Sampling Distributions
- Sampling distribution for means
  - Take a sample of 10, get the mean
  - Take another sample, get the mean
  - Repeat samples: what is the distribution of the mean?
- Sampling distribution for a difference of means
  - Take a sample of 10 men and a sample of 10 women; find the difference between their means
  - Take another sample of 10 men and another sample of 10 women; find the difference between their means
  - Repeat samples: what is the distribution of the difference between means?
  - This distribution describes all possible differences for samples of 10 men and 10 women
9. Sampling Distributions
- How should we conceive of the sampling distribution for a regression coefficient, b?
  - Take a sample of 50 people and measure their opinion on x and y. Compute b by the formula.
  - Take another sample of 50 people and measure their opinion on x and y. Compute b again.
  - Repeat samples, calculating b for each one.
- The sampling distribution describes all possible values of b for samples of size 50.
10. Standard Error of b
- The standard error is the standard deviation of a sampling distribution
- Recall that for a CI for means, we don't know where µ is, but we can estimate the standard error and know that, wherever µ is, 95% of cases lie within t standard errors of the mean.
- We estimate the std. error, and we can use t to create a confidence interval or do a direct hypothesis test
11. Steps for a Confidence Interval for Means
- Example: People rate their approval of the Iraq war on a scale from 0-100. We survey 30 people and find a mean of 42 and a std. dev. of 13. Estimate the true approval in the population.
- Step 1: Get the information
  - Mean = 42
  - Std. Dev. = 13
  - n = 30
12.
- Step 2: Estimate the Std. Err.
- Step 3: Determine degrees of freedom, and choose a value of t that tells us how far we must go from the mean of the distribution to get 95% of cases
  - d.f. = 30 - 1 = 29
  - t = 2.045
13.
- Step 4: Plug and chug
- How would you interpret this, both substantively and statistically?
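The plug-and-chug step can be sketched as follows (a check, not from the slides; note that the slide's ±4.93 on the next page corresponds to dividing by √(n−1) = √29 rather than the usual √n):

```python
import math

# 95% CI for a mean: x_bar +/- t * (s / sqrt(n))
x_bar, s, n, t = 42, 13, 30, 2.045

se = s / math.sqrt(n)            # estimated standard error of the mean
margin = t * se                  # half-width of the interval
ci = (x_bar - margin, x_bar + margin)
print(round(margin, 2), tuple(round(v, 2) for v in ci))
```

With s/√n the margin comes to about ±4.85 rather than ±4.93; the substantive interpretation is the same either way.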
14. Interpretation
- Our estimate is that the mean support score for the Iraq war is 42 ± 4.93
- In repeated samples of the same size from the same population, 95% of all samples would yield an interval that contains the true population mean.
- While it is possible that our sample is one of the few that doesn't contain the true mean, it most likely does contain it.
15. Same Logic Applies to Regression
- Step 1: Get your information
  - Run your regression (in Stata or by hand)
  - Find the sample regression coefficient (b) and the estimated root mean square error (RMSE) from the regression
16. Step 2: Estimate the Standard Error
- For 1 independent variable,
- Let's talk about this for a minute
- What is σε²? Have we seen it before?
  - It is the variance of the error term; its sample counterpart is the Sum of Squared Errors (RSS)
- We don't know it, but the RMSE is our estimate
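The slide's formula did not survive extraction; for one independent variable the standard error of b is the familiar

```latex
\widehat{SE}(b) \;=\; \frac{s_e}{\sqrt{\sum_{i}(x_i-\bar{x})^2}},
\qquad
s_e \;=\; \mathrm{RMSE} \;=\; \sqrt{\frac{\mathrm{RSS}}{n-k-1}}
```

so the RMSE stands in for the unknown error standard deviation, scaled by the spread of x.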
17. Step 3: Determine Degrees of Freedom
- d.f. = n - k - 1
- Choose an appropriate value of t from the table.
18. Step 4: Calculate the C.I.
- What do we know about the sampling distribution of b?
- We do not know the true value of β
- We know something about the shape of the distribution
  - If the 10 assumptions hold, it is distributed t with n-k-1 degrees of freedom, with β as its mean.
- We still don't know β, but wherever it is, 95% of possible sample b's are within t standard deviations
- If 95% of sample b's are within t std. devs. of the mean, then we can make an interval around our b, and this strategy will, 95 times out of 100, yield an interval that contains the true population β.
19. What Do We Know Now?
- If we took repeated samples, 95% would yield an interval that contains the true β
- If our interval does not contain 0, we are 95% confident that β ≠ 0 (but it could be that our interval doesn't contain β and that β = 0, so there is still 5% risk)
- If our interval does contain 0, we cannot be sure that β ≠ 0. So, we say our value of b is not statistically significant (we fail to reject the null that β = 0)
20. Example (by hand)
- We want to predict someone's F.T. score for Bush in 2000 by knowing how they feel about Gore. We sample 50 people
- What is the D.V.? The I.V.?
- We find b = -.82, RMSE = 2.518
21. Example (by hand)
- Step 3: Get the t information
  - d.f. = n - k - 1 = 50 - 1 - 1 = 48
  - t = 2.021
- Step 4: Plug and chug
  - The CI for b is (-.98, -.65). In repeated samples, 95% of such CIs would contain β
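The step-4 arithmetic can be sketched using the standard error (.0833) that the Stata output on the next slide reports for goreft; the t of 2.021 is the slide's table value, slightly larger than Stata's exact 48-d.f. value, so the by-hand bounds differ from Stata's in the third decimal.

```python
# CI for a regression slope: b +/- t * SE(b)
b, se, t = -0.8195, 0.0833, 2.021   # slope and std. err. from Stata; critical t

lower, upper = b - t * se, b + t * se
print(round(lower, 3), round(upper, 3))
```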
22. In Stata

. regress bushft goreft

      Source |       SS       df       MS            Number of obs =      50
-------------+------------------------------         F(  1,    48) =   96.83
       Model |  26535.5757     1   2735.4457         Prob > F      =  0.0000
    Residual |  25531.2593    48   1403.3364         R-squared     =  0.6686
-------------+------------------------------         Adj R-squared =  0.6617
       Total |    43937.38    49  896.681224         Root MSE      =   2.518

------------------------------------------------------------------------------
      bushft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      goreft |  -.8194653   .0832774    -9.84   0.000    -.9869057   -.6520249
       _cons |   95.43093    5.42934    17.58   0.000     84.51451    106.3474
------------------------------------------------------------------------------

- The confidence interval for b does not contain 0, so b is significant
- The confidence interval for a does not contain 0, so a is significant
23. If We Had Seen...

. regress bushft perotft

      Source |       SS       df       MS            Number of obs =      50
-------------+------------------------------         F(  1,    48) =   12.83
       Model |  29375.4547     1  29375.4547         Prob > F      =  0.0434
    Residual |  14561.9253    48  303.373444         R-squared     =  0.1263
-------------+------------------------------         Adj R-squared =  0.0127
       Total |    43937.38    49  896.681224         Root MSE      =  27.418

------------------------------------------------------------------------------
      bushft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     perotft |  -.3922048     .24779    -1.58   0.000    -.9869057    .2024961
       _cons |   51.43093    5.42934    12.58   0.000     36.51451     65.3474
------------------------------------------------------------------------------

- The confidence interval for b contains 0, so b is not significant
- The confidence interval for a does not contain 0, so a is significant
24. Practice
- To highlight the way this works, let's show that we can work backwards.
- Can we figure out the standard error?

[Same regress bushft goreft output as slide 22, but with the Std. Err. entry for goreft replaced by a "?"]
25. Yes!
- Formula for the confidence interval: b ± t × SE(b)
- Fill in what we know: the interval's endpoints and t let us solve for the standard error
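A minimal sketch of working backwards: the half-width of the interval is t × SE, so the standard error falls right out of the reported bounds.

```python
# Working backwards: half-width = t * SE, so SE = (upper - lower) / (2 * t)
lower, upper = -0.9869057, -0.6520249   # 95% CI bounds from the Stata output
t = 2.0106                              # two-tailed critical t for 48 d.f. (assumed)

se = (upper - lower) / (2 * t)
print(round(se, 4))
```

This recovers the .0833 standard error that Stata reported for goreft.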
26. C.I. for Multiple Regression
- Two differences:
  - Each coefficient (b1, b2, b3, ..., bk) has its own sampling distribution, so each one has its own standard error. Each one could take on a totally different range of values with changes in samples
  - The formula for the std. error changes
27. One Last Trick
- Stata automatically gives a 95% C.I. What if you want 99% C.I.s?
- Remember the formula? The only adjustment is in your choice of t. You can go back and do this by hand, using b ± t(S.E.), just choosing t from the 99% column
- Alternatively, tell Stata you want 99%:
  - regress y x1 x2 x3, level(99)
28. What You Should Know and Be Able to Do
- Interpret the confidence intervals in Stata output. Do this for all intervals (not just 95%)
- Calculate C.I.s for any level of risk by hand, given the formulas and the necessary information
- Do both of these in both bivariate and multiple regression settings
- Work backwards through this, given the necessary information
29. Direct Hypothesis Testing with Regression Coefficients
- Recall that when we wanted to see a difference between two means, we could take either the C.I. approach or the hypothesis-test approach
- If we assume the real difference is 0 in the population, we can calculate a t score and assess the chances of getting a difference this big by sampling error alone.
30.
- A random sample of 16 governments from European countries with parliamentary systems finds an average government duration of 3.4 years with a standard deviation of 1.2 years. 11 randomly sampled non-European countries with parliamentary systems had an average government duration of 2.7 years with a standard deviation of 1.5 years.
- Test the hypothesis that European and non-European governments differ in average duration.
31. Hypo Test Steps
- 1. State the null and alternative hypotheses
  - Null: no difference between Euro govts and non-Euro govts
  - Alternative: European and non-European govts are different
- 2. Compute the standard error for the sampling distribution for the difference of means
  - n1 = 16, x̄1 = 3.4, s1 = 1.2
  - n2 = 11, x̄2 = 2.7, s2 = 1.5
32. With a Direct Hypothesis Test
- 3. Get an appropriate t value from the t table (2.060, for 25 d.f.)
- 4. Compute the t score for your difference
33. Conclusion
- Step 5: The t-score we computed (1.29) is less than the critical t from the table (2.060), so we fail to reject the null hypothesis
- In repeated samples of these sizes from the same population, more than 5% of samples would show a difference this big by sampling error alone
34. By the Pictures
[Figure: t distribution with critical values at -2.060 and 2.060 around 0; the computed t of 1.29 falls inside them. Axis: units of standard deviation]
35. Hypothesis Testing with Regression Coefficients
- Step 1: State the hypotheses
  - Null: β = 0 (no relationship between x and y)
  - Alternative: β ≠ 0 (some relationship)
- Step 2: Compute the standard error of the sampling distribution for b
36. Hypo Test Steps
- Step 3: Choose the critical t (t*) from the table
- Recall that if the errors are normally distributed, we can use the t distribution with n-k-1 degrees of freedom
- This gives us the range of values b could take on 95% of the time by random chance (given our sample size) if the true population regression coefficient is zero
37. Hypo Test Steps
- Step 4: Compute the t score for your data: t = b / SE(b)
- Step 5: If your t-score is greater than the critical value, t*, from the table, reject the null hypothesis. If your t-score is less than the critical value, fail to reject the null hypothesis
38. Example
- Regression using a sample of size 50 yields the following equation:
  - ADA Score = 5.04 + .00047 × RegDems
- Std. Err. for a = 1.45, for b = .00018
- Step 1: State the nulls
- Step 2: Standard errors are given
- Step 3: Choose t* = 2.021
39. Example
- Step 4: Calculate t from your data
- Step 5: For a, t > t*, so we reject the null hypothesis (it is significant). For b, t > t*, so we again reject the null (it is also statistically significant).
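The step-4 arithmetic, sketched with the slide's numbers:

```python
# t = coefficient / its standard error; compare to critical t* = 2.021
a, se_a = 5.04, 1.45
b, se_b = 0.00047, 0.00018

t_a, t_b = a / se_a, b / se_b
print(round(t_a, 2), round(t_b, 2))   # both exceed 2.021
```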
40. Review: Confidence Levels
- We are used to setting α (level of risk) at .05. This gives a 95% level of confidence
- We have also switched to 99% or 90% levels of confidence (α = .01 or .1, respectively)
- What are the tradeoffs involved?
- Said another way, α represents the risk of Type I error (rejecting a true null); lowering α raises the risk of Type II error (failing to reject a false null)
41. New Concept: p-values
- Some regression coefficients might be significant (we reject the null) at the 95% confidence level (α = .05), but not significant at the 99% level.
- Others might be significant at the 99.99% level, but we don't realize it if we only look at the 95% level
- What if we could know the exact smallest level of α at which we still reject the null?
- i.e., we reject at 95% and fail to reject at 99%; we could search and find that the coefficient is significant at the 97.4% level (α = .026), but not at 97.5% (α = .025)
42. How Is This Done?
- Doing this many confidence intervals by hand would ultimately be painful
- Remember, though, that t = b / SE(b)
- So we can just go to the t-table, scan across the columns, and see what the best we can do is. Of course, we only have .20, .10, .05, .025, .01, and .001 on our tables, so we cannot be terribly precise
- Suppose t = 2.87, two-tailed. Assume 13 d.f.
43.
- t = 2.87
- So our p-value is less than .02, but more than .01
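A self-contained numerical sketch of the exact p-value that Stata computes (integrating the t density's tail with the standard library, since the table only brackets it between .01 and .02):

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, hi=60.0, steps=50000):
    """Two-tailed p-value via midpoint-rule integration of the upper tail."""
    h = (hi - t) / steps
    tail = sum(t_pdf(t + (i + 0.5) * h, df) * h for i in range(steps))
    return 2 * tail

p = two_tailed_p(2.87, 13)
print(round(p, 3))   # lands between .01 and .02, as the table implies
```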
44. How Is This Done?
- Stata does it automatically and with precision.
- If the P>|t| column is < .05, we reject the null

. regress turnout diplomau mdnincm

      Source |       SS       df       MS            Number of obs =     426
-------------+------------------------------         F(  2,   423) =   31.37
       Model |  1.4313e+11     2  7.1565e+10         Prob > F      =  0.0000
    Residual |  9.6512e+11   423  2.2816e+09         R-squared     =  0.1291
-------------+------------------------------         Adj R-squared =  0.1250
       Total |  1.1082e+12   425  2.6076e+09         Root MSE      =   47766

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1101.359   504.4476     2.18   0.030      109.823    2092.895
     mdnincm |   1.111589   .4325834     2.65   0.009      .261308    1.961869
       _cons |   154154.4   9641.523    15.99   0.000     135203.1    173105.6
------------------------------------------------------------------------------
45. Substantive Significance
- A variable may be statistically significant at the .05 level (95% confidence level)
- This does not mean the variable is very important. The coefficient could be significant but very small.
- Example: states try to reduce high school class sizes to improve the quality of education
- We find these results:
46.
- Dep. Var.: index of educational quality ranging from 0-100

                                   Coefficient (95% C.I.)
  Class Size                       -.104  (-.192, -.016)
  Education Spending (per pupil)    .012  (.008, .016)
  Median Income (in 1,000s)         .042  (-.0012, .0852)
  Constant                         24.35  (20.02, 28.68)

- Interpret Class Size: statistically significant; substantively?
- Interpret Spending: statistically significant; substantively significant
- Interpret Med. Income: not statistically significant
- Constant: statistically significant; no independent interpretation
47. What You Should Know and Be Able to Do
- Execute a hypothesis test for the significance of regression coefficients by hand, given b and the standard error of b (or a and the standard error of a)
- Interpret the results of a by-hand hypothesis test
- Interpret the hypothesis-test output in Stata, including t-scores and p-values
- Explain what the standard error of b means
- Explain hypothesis tests from the standpoint of repeated samples
- Evaluate substantive significance
48. Old Trick for a New Dog
- We did one-tailed tests for differences of means when we knew that one group would be greater than the other
- If we know that there is a positive relationship between x and y (β > 0), we can specify a one-tailed test
- The same holds for knowing that there is a negative relationship (β < 0)
49. Guidelines for a 1-Tailed Test
- You must specify the one-tailed hypothesis before you type regress
- Stata doesn't do 1-tailed tests
- You must convert Stata's two-tailed test into a one-tailed test
- Take the reported p-value (not the t-value!) and divide by two to get the one-tailed p-value (then compare that to .05 or .01 or whatever)
50. Example
- Stata reports a p-value for education of .03. If we divide by two, the one-tailed p-value is .015
- The most important difference is for variables with two-tailed p-values between .05 and .10: once halved, they become significant at the .05 level

[Same regress turnout diplomau mdnincm output as slide 44]
51. New Trick, New Dog
- Remember, we are testing to see if the sample regression coefficient, b, is different from 0 in the population
- What if we wanted to test whether the coefficient was different from 1 in the population?
52. Example
- The NES often calls people before and after the election and gauges their F.T. score for candidates.
- We might be interested in knowing how their pre-election score affects their post-election score
- If β = 1, there is no change
- If β > 1, people increased their ratings of Bush
- If β < 1, people decreased their ratings of Bush
53. Why Test 1 and Not 0?
- We are sure that "pre" affects "post"; we want to know if there is change or no change
- If there is no change, there is no rally-around-the-leader effect
- We need to know whether β ≠ 1, not just whether β ≠ 0
54. What Is the Difference?
- All steps of hypothesis testing remain the same except:
  - Null hypothesis: β = 1 (not β = 0)
  - Alternative hypothesis: β ≠ 1 (instead of β ≠ 0)
  - The t-score: instead of dividing b by its standard error, we divide (b - 1) by the standard error (if we are testing b against 1)
- Alternatively, we can look at our confidence interval to see if it contains 1
55. Given Stata Output

. regress postbush prebush

      Source |       SS       df       MS            Number of obs =    1000
-------------+------------------------------         F(  1,   998) = 3179.96
       Model |  862797.186     1  862797.186         Prob > F      =  0.0000
    Residual |  270780.456   998  271.323102         R-squared     =  0.7611
-------------+------------------------------         Adj R-squared =  0.7609
       Total |  1133577.64   999  1134.71235         Root MSE      =  16.472

------------------------------------------------------------------------------
    postbush |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prebush |   1.003752   .0177998    56.39   0.000     .9688223    1.038681
       _cons |   19.19367   1.019167    18.83   0.000     17.19371    21.19362
------------------------------------------------------------------------------

- The confidence interval contains 1, so we fail to reject the null.
- t = (1.003752 - 1) / .0177998 ≈ 0.21, less than t* = 1.96
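The test against 1 rather than 0, sketched with the coefficients from the output above:

```python
b, se = 1.003752, 0.0177998   # prebush coefficient and std. err. from Stata

t_score = (b - 1) / se        # testing against beta = 1, not beta = 0
print(round(t_score, 2))      # far below the 1.96 critical value
```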
56. What If My Variables Are Not Significant?
- Insignificant variables, for all we know, are unrelated to the dependent variable in the population.
- They still represent the sample regression function
- If you are convinced it really was sampling error, you can try collecting a new sample
- Don't just drop out insignificant variables
57. Specification Searches
- You may find that including or excluding certain variables improves things.
- This can lead to "model searches," where you try to find the model that makes your key variable work best, or gives the smallest RMSE
- This is bad. The hypothesis tests assume you are testing one specification on one randomly drawn set of data
- Each time you respecify, you increase the chances of Type I error (finding a significant result when there is none)
58. Example
- I created 20 new variables. Each one consists of randomly selected numbers between 0 and 1.
- I use these 20 random variables to predict turnout in the 2000 turnout dataset
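The same demonstration can be sketched without Stata (a simplified version: each noise variable is tested against the outcome one at a time, rather than in one multiple regression, and the outcome here is itself simulated noise rather than the turnout data):

```python
import math, random

# 20 pure-noise regressors vs. an unrelated outcome: about 1 in 20 should
# come out "significant" at the .05 level by sampling error alone.
random.seed(42)
n = 426
y = [random.random() for _ in range(n)]

def t_stat(x, y):
    """t statistic for the correlation between x and y."""
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))

sig = sum(abs(t_stat([random.random() for _ in range(n)], y)) > 1.96
          for _ in range(20))
print(sig)   # typically 0-3 of the 20 noise variables look "significant"
```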
59.

. regress turnout blah1-blah20        (model-fit output omitted)

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       blah1 |   6998.188   8636.168     0.81   0.418    -9979.126     23975.5
       blah2 |  -8231.304   8514.165    -0.97   0.334    -24968.78    8506.171
       blah3 |  -5606.377   9151.171    -0.61   0.540     -23596.1    12383.35
       blah4 |   6530.433   9005.308     0.73   0.469    -11172.55    24233.42
       blah5 |  -6244.087   8705.139    -0.72   0.474    -23356.98    10868.81
       blah6 |   6738.672   8891.378     0.76   0.449    -10740.34    24217.69
       blah7 |  -3470.607    8745.91    -0.40   0.692    -20663.65    13722.44
       blah8 |   3793.932   8602.303     0.44   0.659    -13116.81    20704.67
       blah9 |    18200.8   8605.186     2.12   0.035     1284.393    35117.21
      blah10 |  -10410.87   8953.021    -1.16   0.246    -28011.07    7189.323
      blah11 |   7139.257   8799.115     0.81   0.418    -10158.38     24436.9
      blah12 |  -7153.182   8729.034    -0.82   0.413    -24313.05    10006.69
      blah13 |   10615.99   8772.349     1.21   0.227    -6629.034    27861.01
      blah14 |   5923.244   8771.629     0.68   0.500    -11320.36    23166.85
      blah15 |  -8837.456   8436.582    -1.05   0.295    -25422.41    7747.503
      blah16 |   1169.621   8719.185     0.13   0.893    -15970.89    18310.13
      blah17 |   7458.636    8567.25     0.87   0.384    -9383.195    24300.47
      blah18 |   22153.02   9047.698     2.45   0.015     4366.703    39939.33
      blah19 |   1652.157   8698.105     0.19   0.849    -15446.91    18751.23
      blah20 |  -3329.179   8815.673    -0.38   0.706    -20659.37    14001.01
       _cons |   193705.8   20601.31     9.40   0.000       153207    234204.7
------------------------------------------------------------------------------
60. Problem
- Each variable has some probability (5/100, or 1/20) of being significant when no relationship exists, because of sampling error.
- We could choose the couple of variables that we know are unrelated (they are randomly generated) but are significant, and put them in
- If we repeat with another 20 variables, we keep finding "significant" variables by sampling error.
- If we drop the insignificant variables and keep the significant ones, we mess up the probabilities in the later analysis.
61.

. regress turnout diplomau mdnincm blah9 blah18

      Source |       SS       df       MS            Number of obs =     426
-------------+------------------------------         F(  4,   421) =   18.16
       Model |  1.6308e+11     4  4.0770e+10         Prob > F      =  0.0000
    Residual |  9.4516e+11   421  2.2450e+09         R-squared     =  0.1472
-------------+------------------------------         Adj R-squared =  0.1390
       Total |  1.1082e+12   425  2.6076e+09         Root MSE      =   47382

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1143.341   500.6077     2.28   0.023     159.3394    2127.343
     mdnincm |   1.026752   .4316698     2.38   0.018     .1782557    1.875249
       blah9 |   19881.04   7898.252     2.52   0.012     4356.114    35405.96
      blah18 |   13559.12   8325.046     1.63   0.104    -2804.717    29922.95
       _cons |   139305.5   10783.39    12.92   0.000     118109.5    160501.4
------------------------------------------------------------------------------
62. Moral of the Story
- Be wary of specification searches
- Specify your model beforehand, then go to Stata
- Don't drop out insignificant variables
- If you try enough variables, eventually you can get a regression full of variables that appear significant but probably are not
- Schemes that try different combinations of variables to maximize significance are generally called "stepwise regression" or "regression using stepwise entry."
- Such schemes have little use.
63. Testing Multiple Coefficients
- We may be interested in knowing the probability that all of the coefficients are actually zero in the population.
- How do we make this comparison?
- Here is our model: y = a + b1x1 + b2x2 + ... + bkxk + e
- Here is our model if all of the coefficients are zero: y = a + e
64. Joint Hypothesis Test
- Why do this?
- It gives us a sense of whether this combination of variables collectively does anything to explain y
- If we have two independent variables correlated above r = .9, we may find that one or both is insignificant due to collinearity. We can test to see if they jointly have an effect.
- This is not the same as the hypothesis test for each variable (given by the t-test)
- This doesn't say all of the variables are significant, but that jointly the regression predicts y better than using the mean alone
65. How Does It Work?
- It sounds a lot like r². It is.
- We compare how well the regression predicts y with how well the restricted model predicts y (the model where all of the coefficients are zero, resulting in a model based on the mean alone)
- To avoid the pitfalls of r², we want it to account for sample size and the number of independent variables.
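The statistic this describes (reconstructed here, since the slide's formula did not survive extraction) is the standard F:

```latex
F \;=\; \frac{\mathrm{RegSS}/k}{\mathrm{RSS}/(n-k-1)}
  \;=\; \frac{R^{2}/k}{(1-R^{2})/(n-k-1)}
```

The second form makes the link to r² explicit: it is the explained-to-unexplained ratio, penalized by the number of independent variables (k) and rewarded by sample size (through n - k - 1).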
66. Working Parts
- The working parts for computing F are found in the Stata output.

. regress hispthrm immithrm prej

      Source |       SS       df       MS            Number of obs =    2059
-------------+------------------------------         F(  2,  2056) =  422.27
       Model |  217673.966     2  108836.983         Prob > F      =  0.0000
    Residual |  529917.313  2056  257.741884         R-squared     =  0.2912
-------------+------------------------------         Adj R-squared =  0.2905
       Total |  747591.279  2058  363.261069         Root MSE      =  16.054

------------------------------------------------------------------------------
    hispthrm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    immithrm |   .4837819   .0180266    26.84   0.000     .4484297    .5191341
        prej |   .6161944   .0747776     8.24   0.000     .4695466    .7628421
       _cons |   19.85906   1.849749    10.74   0.000     16.23148    23.48664
------------------------------------------------------------------------------

- RSS / (n-k-1) = Residual MS
- RegSS / k = Model (regression) MS
- TSS / (n-1) = Total MS
- F = 108,836.983 / 257.741884 = 422.27123
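The arithmetic of the working parts, sketched from the sums of squares in the output:

```python
# F from the Stata "working parts": Model MS over Residual MS
reg_ss, k = 217673.966, 2          # Model SS and its d.f.
rss, df_resid = 529917.313, 2056   # Residual SS and its d.f.

f = (reg_ss / k) / (rss / df_resid)
print(round(f, 2))   # matches the F( 2, 2056) = 422.27 in the header
```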
67. Working Parts
- Of course, Stata also gives F directly: F( 2, 2056) = 422.27 appears in the header of the same regress hispthrm immithrm prej output.
68. Samples and Populations
- Of course, this F-statistic is based on our sample. It may be that in the population, the b's have no real effect
- This test statistic is not normally distributed. It has its own distribution, the F-distribution
- The numerator has k degrees of freedom
- The denominator has n-k-1 degrees of freedom
- Using those degrees of freedom, you can take your value of F to an F table. If your F is greater than the critical F from the table, the coefficients are jointly significant
- If your F is smaller than the critical F, none of your regression coefficients matter. This is bad news.
- Rather than sending you to a table, Stata gives you a p-value
69. Working Parts
- If the p-value (Prob > F, which is 0.0000 in the regress hispthrm immithrm prej output above) is less than .05, your regression coefficients are not jointly 0 in the population (your regression has meaning)
- Interpretation: in the population, this set of variables has a real effect on y.
70. What You Should Know and Be Able to Do
- Perform a 1-tailed hypothesis test given Stata output (or given b, the std. err. of b, and a one-tailed t-table)
- Perform hypothesis tests against null hypotheses other than β = 0
- Understand the consequences of adding or dropping variables (specification searches)
- Conduct an F-test to see if the coefficients are jointly significant