Provided by: Dam948
1
Our Friend, the Standard Error
2
What is a Standard Error again?
  • Think back to the very first day. We were
    summarizing variables.
  • We wanted to describe dispersion
  • Candy Bar consumption
  • One group consumes 8, 5, 6, 7, 9. Mean = 7
  • Another group consumes 2, 4, 12, 7, 10. Mean = 7
  • The difference is not in the mean; the difference is in
    dispersion
  • We define this with mean deviation, variance (σ² / s²),
    or standard deviation (σ / s)
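
The candy bar comparison can be checked in a few lines. This is a minimal sketch using Python's standard library; the variable names are illustrative and not from the slides, and the sample formulas (dividing by n - 1) are assumed.

```python
import statistics

# Candy bar consumption for the two groups on the slide
group_a = [8, 5, 6, 7, 9]
group_b = [2, 4, 12, 7, 10]

for name, data in (("Group A", group_a), ("Group B", group_b)):
    mean = statistics.mean(data)
    var = statistics.variance(data)   # sample variance s^2 (divides by n - 1)
    sd = statistics.stdev(data)       # sample standard deviation s
    print(f"{name}: mean = {mean}, s^2 = {var:.2f}, s = {sd:.2f}")
```

Both groups have the same mean, but the second group's standard deviation is more than twice as large, which is exactly the difference in dispersion the slide describes.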

3
Std. Dev. Measures Dispersion
Less Dispersion
More Dispersion
4
So we can look at IQ scores
[Figure: normal curve of individual IQ scores with σ = 15; axis marked at 70, 85, µ = 100, 115, 130]
5
Sampling Distribution
  • What if, instead of looking at the probability of
    getting at or below a certain level, we took the
    probability of drawing a sample of 10 people
    whose average is at or below a certain level?
  • How will the shape of the distribution change?

6
Mean IQ scores for samples of 10 people
[Figure: sampling distribution of mean IQ for samples of 10 with σ = 5; axis marked at 90, 95, µ = 100, 105, 110]
7
Sampling Distribution vs. Probability Distribution
Sampling Distribution: σ = 5
Individual Probability Distribution: σ = 15
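
Why the sampling distribution is so much narrower can be sketched with a small simulation. The code below assumes IQ scores are normal with mean 100 and standard deviation 15 (as on the slides) and uses the textbook result that the standard error of a mean is σ/√n; the seed and the number of repetitions are arbitrary choices, not from the slides.

```python
import math
import random
import statistics

MU, SIGMA, N = 100.0, 15.0, 10   # IQ mean and std. dev. from the slides; samples of 10

# Theory: the std. dev. of the sampling distribution of the mean is sigma / sqrt(n).
theoretical_se = SIGMA / math.sqrt(N)

# Simulation: draw many samples of 10 people and record each sample mean.
rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable
sample_means = [
    statistics.mean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(20_000)
]
simulated_se = statistics.stdev(sample_means)

print(f"theoretical SE = {theoretical_se:.2f}")
print(f"simulated SE   = {simulated_se:.2f}")
```

The theoretical value is about 4.74, which the slides round to 5.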
8
Sampling Distributions
  • Sampling Distribution for means
  • Take a sample of 10, get the mean
  • Take another sample, get the mean
  • Repeat samples, what is the distribution of the
    mean?
  • Sampling Distribution for difference of means
  • Take a sample of 10 men, a sample of 10 women,
    find the difference between their means
  • Take another sample of 10 men and another sample
    of 10 women. Find the difference between their means
  • Repeat samples, what is the distribution of the
    difference between means?
  • This distribution describes all possible
    differences for samples of 10 men and 10 women

9
Sampling Distributions
  • How should we conceive of the sampling
    distribution for a regression coefficient, b ?
  • Take a sample of 50 people and measure their
    opinion on x and y. Compute b by the formula.
  • Take another sample of 50 people and measure
    their opinion on x and y. Compute b again
  • Repeat samples, calculating b for each one.
  • Sampling Distribution describes all possible
    values of b for samples of size 50.

10
Standard Error of b
  • Standard Error is the Standard Deviation of a
    sampling distribution
  • Recall that for a CI for means, we don't know
    where µ is, but we can estimate the standard
    error, and we know that wherever µ is, 95% of sample
    means lie within t standard errors of it.
  • We estimate the std. error and we can use t to
    create a confidence interval or do a direct
    hypothesis test

11
Steps for a Confidence Interval for means
  • Example: People rate their approval of the Iraq
    war on a scale from 0-100. We survey 30 people
    and find a mean of 42 and a std. dev. of 13.
    Estimate the true approval in the population.
  • Step 1: Get the information
  • Mean = 42
  • Std. Dev. = 13
  • n = 30

12
  • Step 2: Estimate the Std. Err.
  • Step 3: Determine Degrees of Freedom, and choose
    a value of t that tells us how far we must go
    from the mean of the distribution to get 95% of
    cases
  • d.f. = 30 - 1 = 29
  • t = 2.045

13
  • Step 4: Plug and Chug
  • How would you interpret this, both substantively
    and statistically?

14
Interpretation
  • Our estimate is that the mean support score for
    the Iraq war is 42 ± 4.93
  • In repeated samples of the same size from the
    same population, 95% of all samples would yield
    an interval that contains the true population
    mean.
  • While it is possible that our sample is one of
    the few that doesn't contain the true mean, it is
    most likely that it does contain it.
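
The four steps can be sketched in code. The standard error formula is not shown in this transcript, so the sketch assumes SE = s/√(n - 1), which gives a margin of about 4.94 and comes closest to the ± 4.93 reported above; the more common s/√n would give about 4.85. The function name and its `divisor` parameter are hypothetical.

```python
import math

def mean_confidence_interval(mean, sd, n_divisor, t_crit):
    """Return (std. error, margin, (lower, upper)) for a CI around a mean.

    n_divisor is the number under the square root: n - 1 is assumed here
    (it matches the slide's answer most closely); pass n for the s / sqrt(n) version.
    """
    se = sd / math.sqrt(n_divisor)
    margin = t_crit * se
    return se, margin, (mean - margin, mean + margin)

# Slide values: mean 42, std. dev. 13, n = 30, t = 2.045 for 29 d.f.
se, margin, (lower, upper) = mean_confidence_interval(42, 13, 30 - 1, 2.045)
print(f"SE = {se:.3f}, margin = {margin:.3f}, CI = ({lower:.2f}, {upper:.2f})")
```

Only the choice of t changes for other confidence levels; the standard error stays the same.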

15
Same logic applies to Regression
  • Step 1: Get your information
  • Run your regression (in Stata or by hand)
  • Find sample regression coefficient (b) and
    estimated root mean square error from regression

16
Step 2: Estimate Standard Error
  • For 1 independent variable,
  • Let's talk about this for a minute
  • What is σe²? Have we seen it before?
  • Sum of Squared Errors (RSS)
  • We don't know it, but RMSE is our estimate

17
Step 3: Determine Degrees of Freedom
  • d.f. = n - k - 1
  • Choose an appropriate value of t from the table.

18
Step 4: Calculate the C.I.
  • What do we know about the samp. distrib. of b?
  • We do not know the true value of β
  • We know something about the shape of the
    distribution
  • If the 10 assumptions hold, it is distributed t
    with n-k-1 degrees of freedom, with β as its mean.
  • We still don't know β, but wherever it is, 95% of
    possible sample b's are within t standard
    deviations of it
  • If 95% of sample b's are within t std. devs. of
    the mean, then we can make an interval around our
    b, and this strategy will, 95 times out of 100,
    yield an interval that contains the true
    population β.

19
What do we Know Now ?
  • If we took repeated samples, 95% would yield an
    interval that contains the true β
  • If our interval does not contain 0, we are 95%
    confident that β ≠ 0 (but it could be that our
    interval doesn't contain β and that β = 0, so there
    is still a 5% risk)
  • If our interval does contain 0, we cannot be sure
    that β ≠ 0. So, we say our value of b is not
    statistically significant (we fail to reject the
    null that β = 0)

20
Example (by hand)
  • We want to predict someone's FT score for Bush
    in 2000 by knowing how they feel about Gore. We
    sample 50 people
  • What is the D.V.? The I.V.?
  • We find b = -.82, RMSE = 2.518

21
Example (by hand)
  • Step 3: Get t stuff
  • d.f. = n-k-1 = 50-1-1 = 48
  • t = 2.021
  • Step 4: Plug and Chug
  • CI for b is (-.98, -.65). In repeated samples,
    95% of CIs would contain β
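
The plug-and-chug step can be sketched as b ± t(S.E.). The standard error used below (.0832774) is taken from the Stata output for this regression shown on the next slide, and the function name is hypothetical. Using the table value t = 2.021 gives an interval very slightly wider than Stata's, which is expected if the table value comes from a row with fewer than 48 d.f. (an assumption about the table, not stated on the slide).

```python
def coef_confidence_interval(b, se, t_crit):
    """Confidence interval for a regression coefficient: b +/- t * SE."""
    margin = t_crit * se
    return b - margin, b + margin

# b and its std. error from the bushft/goreft regression; t from the t table
lower, upper = coef_confidence_interval(-0.8194653, 0.0832774, 2.021)
print(f"95% CI for b: ({lower:.3f}, {upper:.3f})")
```

Because the whole interval lies below zero, the coefficient is statistically significant at the .05 level.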

22
In Stata
. regress bushft goreft

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   96.83
       Model |  26535.5757     1   2735.4457           Prob > F      =  0.0000
    Residual |  25531.2593    48   1403.3364           R-squared     =  0.6686
-------------+------------------------------           Adj R-squared =  0.6617
       Total |    43937.38    49  896.681224           Root MSE      =   2.518

------------------------------------------------------------------------------
      bushft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      goreft |  -.8194653   .0832774    -9.84   0.000    -.9869057   -.6520249
       _cons |   95.43093    5.42934    17.58   0.000     84.51451    106.3474
------------------------------------------------------------------------------

Confidence Interval for b does not contain 0; it is significant
Confidence Interval for a does not contain 0; it is significant
23
If We had seen
. regress bushft perotft

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   12.83
       Model |  29375.4547     1  29375.4547           Prob > F      =  0.0434
    Residual |  14561.9253    48  303.373444           R-squared     =  0.1263
-------------+------------------------------           Adj R-squared =  0.0127
       Total |    43937.38    49  896.681224           Root MSE      =  27.418

------------------------------------------------------------------------------
      bushft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     perotft |  -.3922048     .24779    -1.58   0.000    -.9869057    .2024961
       _cons |   51.43093    5.42934    12.58   0.000     36.51451     65.3474
------------------------------------------------------------------------------

Confidence Interval for b contains 0; it is not significant
Confidence Interval for a does not contain 0; it is significant
24
Practice
  • To highlight the way this works, let's show that
    we can work backwards.
  • Can we figure out the standard error?

. regress bushft goreft

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   96.83
       Model |  26535.5757     1   2735.4457           Prob > F      =  0.0000
    Residual |  25531.2593    48   1403.3364           R-squared     =  0.6686
-------------+------------------------------           Adj R-squared =  0.6617
       Total |    43937.38    49  896.681224           Root MSE      =   2.518

------------------------------------------------------------------------------
      bushft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      goreft |  -.8194653   .0832774    -9.84   0.000    -.9869057   -.6520249
       _cons |   95.43093    5.42934    17.58   0.000     84.51451    106.3474
------------------------------------------------------------------------------
?
25
Yes!
  • Formula for Confidence interval
  • Fill in what we know
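
Working backwards can be sketched two ways: from the confidence interval (its half-width is t times the standard error) or from the reported t-score (t = b / S.E.). The critical value 2.0106 for 48 d.f. is an assumption supplied here, not a number from the slides, and the function names are hypothetical.

```python
def se_from_interval(lower, upper, t_crit):
    """Half the interval width equals t_crit * SE, so SE = width / (2 * t_crit)."""
    return (upper - lower) / (2 * t_crit)

def se_from_t_score(b, t_score):
    """The reported t-score is b / SE, so SE = b / t."""
    return b / t_score

# Values read off the Stata output for goreft
se_a = se_from_interval(-0.9869057, -0.6520249, 2.0106)   # 2.0106: assumed t for 48 d.f.
se_b = se_from_t_score(-0.8194653, -9.84)
print(f"SE from interval: {se_a:.4f}")
print(f"SE from t-score:  {se_b:.4f}")
```

Both routes land on roughly .0833, the standard error Stata reports.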

26
C.I. for Multiple Regression
  • Two differences
  • Each coefficient (b1, b2, b3, ..., bn) has its own
    sampling distribution, so each one has its own
    standard error. Each one could take on a totally
    different range of values with changes in samples
  • The formula for the std. error changes

27
One last trick
  • Stata automatically gives a 95% C.I. What if you
    want 99% C.I.s?
  • Remember the formula? The only adjustment is in your
    choice of t. You can go back and do this by
    hand, using b ± t(S.E.), just choosing t from
    the 99% column
  • Alternatively, tell Stata you want 99%
  • regress y x1 x2 x3, level(99)

28
What you should know and be able to do
  • Interpret the Confidence Intervals in Stata
    output. Do this for all intervals (not just 95%)
  • Calculate C.I.s for any level of risk by hand
    given the formulas and necessary information
  • Do both of these in both bivariate and multiple
    regression settings
  • Work backwards through this, given necessary
    information

29
Direct Hypothesis Testing with Regression
Coefficients
  • Recall that when we wanted to see a difference
    between two means, we could use either the C.I.
    approach or a hypothesis test
  • If we assume the real difference is 0 in the
    population, we can calculate a t score and assess
    the chances of getting a difference this big by
    sampling error alone.

30
  • A random sample of 16 governments from European
    countries with parliamentary systems finds an
    average government length of 3.4 years with a
    standard deviation of 1.2 years. 11 randomly
    sampled Non-European countries with parliamentary
    systems had an average government duration of 2.7
    years with a standard deviation of 1.5 years.
  • Test the hypothesis that European Governments and
    Non-European governments differ in average
    duration.

31
Hypo Test Steps
  • 1. State the null and alternative hypotheses
  • Null: No difference between European and
    non-European govts.
  • Alternative: European and non-European govts.
    are different
  • 2. Compute the standard error for the sampling
    distribution for difference of means

n1 = 16, x̄1 = 3.4, s1 = 1.2
n2 = 11, x̄2 = 2.7, s2 = 1.5
32
With a Direct Hypothesis Test
  • 3. Get an appropriate t value from the t table
    (2.060, for 25 d.f.).
  • 4. Compute the t score for your difference

33
Conclusion
  • Step 5: The t-score we computed (1.29) is less
    than the critical t from the table (2.060), so we
    fail to reject the null hypothesis
  • In repeated samples of these sizes from the same
    population, more than 5% of samples would show a
    difference this big by sampling error alone
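
The t-score of 1.29 can be reproduced as a sketch. The standard error formula is not shown in this transcript; the code assumes the unpooled form √(s1²/n1 + s2²/n2), which is the version that yields 1.29 with these numbers. The function name is hypothetical.

```python
import math

def diff_of_means_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (std. error, t-score) for a difference of two sample means.

    Assumes the unpooled standard error sqrt(s1^2/n1 + s2^2/n2).
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t_score = (mean1 - mean2) / se
    return se, t_score

# European vs. non-European government duration, from the slide
se_diff, t_diff = diff_of_means_t(3.4, 1.2, 16, 2.7, 1.5, 11)
T_CRIT = 2.060   # critical t for 25 d.f., from the slide
decision = "reject" if abs(t_diff) > T_CRIT else "fail to reject"
print(f"SE = {se_diff:.3f}, t = {t_diff:.2f}: {decision} the null")
```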

34
By the Pictures
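[Figure: sampling distribution centered at 0, with critical values marked at -2.060 and 2.060 and the computed t-score at 1.29. Axis: Hypothesis Test Units of Standard Deviation]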
35
Hypothesis Testing with Regression Coefficients
  • Step 1: State Hypothesis
  • Null: β = 0 (No relationship between x and y)
  • Alternative: β ≠ 0 (Some relationship)
  • Step 2: Compute the Standard Error of the
    sampling distribution for b

36
Hypo Test Steps
  • Step 3: Choose critical t (t*) from the table
  • Recall that if the errors are normally distributed, we
    can use the t distribution with n-k-1 degrees of
    freedom
  • This gives us the range in which 95% of the values b
    could take on by random chance (given our sample
    size) would fall if the true population regression
    coefficient is zero

37
Hypo Test Steps
  • Step 4: Compute the t score for your data
  • Step 5: If the absolute value of your t-score is
    greater than the critical value, t*, from the table,
    we reject the null hypothesis. If it is less than
    the critical value, we fail to reject the null
    hypothesis

38
Example
  • Regression using a sample of size 50 yields the
    following equation
  • ADA Score = 5.04 + .00047(RegDems)
  • Std. Err. for a = 1.45, for b = .00018
  • Step 1: State Nulls
  • Step 2: Standard Errors given
  • Step 3: Choose t* = 2.021

39
Example
  • Step 4: Calculate t from your data
  • Step 5: For a, t > t*, so we reject the null
    hypothesis (it is significant). For b, t > t*,
    so we again reject the null (it is also
    statistically significant).
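
Step 4 for this example can be sketched as each estimate divided by its standard error, compared against the critical value 2.021. The function name is hypothetical; the numbers are the ones given on the slide.

```python
def t_score(estimate, std_err, null_value=0.0):
    """t-score for testing an estimate against a hypothesized value (default 0)."""
    return (estimate - null_value) / std_err

T_CRIT = 2.021   # critical t chosen in Step 3

results = {}
for name, estimate, std_err in (("a", 5.04, 1.45), ("b", 0.00047, 0.00018)):
    t = t_score(estimate, std_err)
    results[name] = t
    verdict = "reject" if abs(t) > T_CRIT else "fail to reject"
    print(f"{name}: t = {t:.2f}, {verdict} the null")
```

Note that b is tiny in absolute terms yet still clears the critical value, because its standard error is even smaller.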

40
Review Confidence Levels
  • We are used to setting α (level of risk) at .05.
    This gives a 95% level of confidence
  • We have also switched to 99% or 90% levels of
    confidence (α = .01 or .1, respectively)
  • What are the tradeoffs involved?
  • Said another way, α is the probability of a Type I
    Error (rejecting a true null). Lowering α makes a
    Type II Error (failing to reject a false null) more
    likely; (1 - α) is the confidence level, not the
    Type II Error rate

41
New Concept p values
  • Some regression coefficients might be
    significant (we reject the null) at the 95%
    confidence level (α = .05), but not significant
    at the 99% level.
  • Others might be significant at the 99.99% level,
    but we don't realize it if we only look at the
    95% level
  • What if we could know the exact smallest level
    of α at which we still reject the null?
  • i.e., we reject at 95%, we fail to reject at 99%;
    we could search and find that it is significant
    at the 97.4% level (α = 2.6%), but not at the 97.5%
    level (α = 2.5%)

42
How is this done?
  • Doing this many confidence intervals by hand
    would ultimately be painful
  • Remember, though, that
  • So we can just go to the t-table and scan across
    the columns, and see what the best we can do is.
    Of course, we only have .20, .10, .05, .025, .01,
    and .001 on our tables, so we cannot be terribly
    precise
  • Suppose t = 2.87, two-tailed. Assume 13 d.f.

43
  • t = 2.87
  • So our p value is less than .02, but more than .01
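
Scanning across a row of the t table can be sketched as a lookup. The two-tailed critical values for 13 d.f. below are standard table values supplied here as an assumption (the table itself is not reproduced on the slides), and the function name is hypothetical.

```python
# Two-tailed critical values of t for 13 d.f., keyed by alpha (assumed table row)
T_TABLE_13_DF = {0.20: 1.350, 0.10: 1.771, 0.05: 2.160,
                 0.02: 2.650, 0.01: 3.012, 0.001: 4.221}

def bracket_p_value(t_stat, table):
    """Return (lower, upper) bounds on the two-tailed p-value from a table row."""
    t_abs = abs(t_stat)
    lower, upper = 0.0, 1.0
    for alpha in sorted(table, reverse=True):   # scan from largest alpha to smallest
        if t_abs >= table[alpha]:
            upper = alpha    # we can reject at this alpha, so p <= alpha
        else:
            lower = alpha    # we cannot reject here, so p > alpha
            break
    return lower, upper

low, high = bracket_p_value(2.87, T_TABLE_13_DF)
print(f"p is between {low} and {high}")
```

A table can only bracket the p-value between two adjacent columns; software computes it exactly.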

44
How is this done?
  • Stata does it automatically and with precision.
  • If the P>|t| column is < .05, we reject the null

. regress turnout diplomau mdnincm

      Source |       SS       df       MS              Number of obs =     426
-------------+------------------------------           F(  2,   423) =   31.37
       Model |  1.4313e+11     2  7.1565e+10           Prob > F      =  0.0000
    Residual |  9.6512e+11   423  2.2816e+09           R-squared     =  0.1291
-------------+------------------------------           Adj R-squared =  0.1250
       Total |  1.1082e+12   425  2.6076e+09           Root MSE      =   47766

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1101.359   504.4476     2.18   0.030      109.823    2092.895
     mdnincm |   1.111589   .4325834     2.65   0.009      .261308    1.961869
       _cons |   154154.4   9641.523    15.99   0.000     135203.1    173105.6
------------------------------------------------------------------------------
45
Substantive significance
  • A variable may be statistically significant at
    the .05 level (95% confidence level)
  • This does not mean this variable is very
    important. The coefficient could be significant
    but very small.
  • Example: States try to reduce high school class
    sizes to improve the quality of education
  • We find these results

46
  • Dep. Var.: Index for educational quality ranging
    from 0-100
  • Interpret Class Size: statistically significant;
    substantively?
  • Interpret Spending: statistically significant;
    substantively sig.
  • Interpret Med. Income: not statistically sig.
  • Constant: statistically sig.; no independent interp.

Variable                          Coefficient   (95% C.I.)
Class Size                        -.104         (-.192, -.016)
Education Spending (per pupil)     .012         (.008, .016)
Median Income (in 1,000s)          .042         (-.0012, .0852)
Constant                          24.35         (20.02, 28.68)
47
What you should know and be able to do
  • Execute a hypothesis test for the significance of
    regression coefficients by hand given b and the
    standard error of b (or a and the standard error
    of a)
  • Interpret the results of a by-hand hypothesis
    test
  • Interpret the hypothesis-test output in Stata,
    including t-scores and p-values
  • Explain what the standard error of b means
  • Explain hypothesis tests from the standpoint of
    repeated samples
  • Evaluate substantive Significance

48
Old trick for a new dog
  • We did one-tailed tests for difference of means
    when we knew that one group would be greater than
    the other
  • If we know that there is a positive relationship
    between x and y (b > 0), we can specify a
    one-tailed test
  • Same holds for knowing that there is a negative
    relationship (b < 0)

49
Guidelines for 1-tailed test
  • You must specify 1-tailed before you type regress
  • Stata doesn't do 1-tailed tests
  • You must convert Stata's two-tailed test into a
    one-tailed test
  • Take the reported p-value (not the t-value!) and
    divide by two to get the one-tailed p-value (then
    compare that to .05 or .01 or whatever)

50
Example
  • Stata reports a p-value for education of .03. If
    we divide by two, the one-tailed p-value is .015
  • The most important difference will be for values
    between .10 and .05 (they become sig. at the
    .05-.025 level)
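
The conversion can be sketched as a small helper. Halving the two-tailed p-value is the rule from the slides; the branch for a coefficient whose sign contradicts the hypothesis (where the one-tailed p-value is 1 - p/2) is a standard addition made here, not something the slides state. The function name and its parameters are hypothetical.

```python
def one_tailed_p(two_tailed_p, coef, expected_sign):
    """Convert a two-tailed p-value to a one-tailed one.

    expected_sign is +1 if the hypothesis (stated in advance) says the
    coefficient is positive, -1 if it says the coefficient is negative.
    """
    half = two_tailed_p / 2
    agrees = (coef > 0) == (expected_sign > 0)
    # If the estimate points the wrong way, the one-tailed p is 1 - p/2.
    return half if agrees else 1 - half

# Education (diplomau): coefficient 1101.359, two-tailed p = .03, hypothesized positive
p_one = one_tailed_p(0.03, 1101.359, +1)
print(f"one-tailed p = {p_one:.3f}")
```

The direction has to be chosen before looking at the output; picking it afterwards to match the sign of b doubles the real risk of a Type I error.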

. regress turnout diplomau mdnincm

      Source |       SS       df       MS              Number of obs =     426
-------------+------------------------------           F(  2,   423) =   31.37
       Model |  1.4313e+11     2  7.1565e+10           Prob > F      =  0.0000
    Residual |  9.6512e+11   423  2.2816e+09           R-squared     =  0.1291
-------------+------------------------------           Adj R-squared =  0.1250
       Total |  1.1082e+12   425  2.6076e+09           Root MSE      =   47766

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1101.359   504.4476     2.18   0.030      109.823    2092.895
     mdnincm |   1.111589   .4325834     2.65   0.009      .261308    1.961869
       _cons |   154154.4   9641.523    15.99   0.000     135203.1    173105.6
------------------------------------------------------------------------------
51
New Trick, New Dog
  • Remember, we are testing to see if the sample
    regression coefficient, b, is different from 0
    in the population
  • What if we wanted to test to make sure the
    coefficient was different from 1 in the
    population?

52
Example
  • The NES often calls people before and after the
    election and gauges their F.T. score for
    candidates.
  • We might be interested in knowing how their
    pre-election score affects their post-election
    score
  • If β = 1, there is no
    change
  • If β > 1, people
    increased their ratings of Bush
  • If β < 1, people
    decreased their ratings of Bush

53
Why test 1 and not 0?
  • We are sure that pre affects post; we want
    to know if there is change or if there is no
    change
  • If there is no change, there is no rally around
    the leader effect
  • We need to know if β = 1, not just if β = 0

54
What is the difference?
  • All steps of hypothesis testing remain the same
    except
  • Null Hypothesis: β = 1 (not β = 0)
  • Alternative Hypothesis: β ≠ 1 (instead of β ≠ 0)
  • The t-score. Instead of dividing (b - 0) by the
    standard error, we divide (b - 1) by the standard
    error (if we are testing b against 1).
  • Alternatively, we can look at our confidence
    interval to see if it contains 1

55
Given Stata Output
. regress postbush prebush

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) = 3179.96
       Model |  862797.186     1  862797.186           Prob > F      =  0.0000
    Residual |  270780.456   998  271.323102           R-squared     =  0.7611
-------------+------------------------------           Adj R-squared =  0.7609
       Total |  1133577.64   999  1134.71235           Root MSE      =  16.472

------------------------------------------------------------------------------
    postbush |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     prebush |   1.003752   .0177998    56.39   0.000     .9688223    1.038681
       _cons |   19.19367   1.019167    18.83   0.000     17.19371    21.19362
------------------------------------------------------------------------------

  1. Confidence interval contains 1, so we fail to reject
    the null.

  2. t = (1.003752 - 1) / .0177998 = 0.21, less than t*, 1.96
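
Both checks against a null of 1 can be sketched from the numbers in the Stata output above. The function name is hypothetical; the critical value 1.96 is the one used on the slide.

```python
def t_against(b, std_err, null_value):
    """t-score for testing a coefficient against an arbitrary null value."""
    return (b - null_value) / std_err

b, se = 1.003752, 0.0177998           # prebush coefficient and std. error
ci_lower, ci_upper = 0.9688223, 1.038681
T_CRIT = 1.96

t_vs_one = t_against(b, se, 1.0)      # the test we care about
t_vs_zero = t_against(b, se, 0.0)     # the test Stata reports by default
contains_one = ci_lower <= 1.0 <= ci_upper

print(f"t against 1 = {t_vs_one:.2f}, t against 0 = {t_vs_zero:.2f}")
print(f"CI contains 1: {contains_one}")
```

The t-score Stata prints (56.39) answers a different question, whether the coefficient differs from 0, so it cannot be reused for this test.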

56
What if my variables are not significant ?
  • Insignificant variables, for all we know, are
    unrelated to the Dependent Variable in the
    population.
  • They still represent the sample regression
    function
  • If you are convinced it really was sampling
    error, you can try collecting a new sample
  • Don't just drop out insignificant variables

57
Specification searches
  • You may find that including or excluding certain
    variables improves things.
  • This can lead to model searches where you try
    to find the model that makes your key variable
    work best, or gives the smallest RMSE
  • This is bad. The hypothesis tests assume you are
    testing one specification on one randomly drawn
    set of data
  • Each time you respecify, you increase the chances
    of Type I error (finding a significant result
    when there is none)

58
Example
  • I created 20 new variables. Each one consists of
    randomly selected numbers between 0 and 1.
  • I use these 20 random variables to predict
    turnout in the turnout 2000 dataset

59
. regress turnout blah1-blah20

(output omitted)

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       blah1 |   6998.188   8636.168     0.81   0.418    -9979.126     23975.5
       blah2 |  -8231.304   8514.165    -0.97   0.334    -24968.78    8506.171
       blah3 |  -5606.377   9151.171    -0.61   0.540     -23596.1    12383.35
       blah4 |   6530.433   9005.308     0.73   0.469    -11172.55    24233.42
       blah5 |  -6244.087   8705.139    -0.72   0.474    -23356.98    10868.81
       blah6 |   6738.672   8891.378     0.76   0.449    -10740.34    24217.69
       blah7 |  -3470.607    8745.91    -0.40   0.692    -20663.65    13722.44
       blah8 |   3793.932   8602.303     0.44   0.659    -13116.81    20704.67
       blah9 |    18200.8   8605.186     2.12   0.035     1284.393    35117.21
      blah10 |  -10410.87   8953.021    -1.16   0.246    -28011.07    7189.323
      blah11 |   7139.257   8799.115     0.81   0.418    -10158.38     24436.9
      blah12 |  -7153.182   8729.034    -0.82   0.413    -24313.05    10006.69
      blah13 |   10615.99   8772.349     1.21   0.227    -6629.034    27861.01
      blah14 |   5923.244   8771.629     0.68   0.500    -11320.36    23166.85
      blah15 |  -8837.456   8436.582    -1.05   0.295    -25422.41    7747.503
      blah16 |   1169.621   8719.185     0.13   0.893    -15970.89    18310.13
      blah17 |   7458.636    8567.25     0.87   0.384    -9383.195    24300.47
      blah18 |   22153.02   9047.698     2.45   0.015     4366.703    39939.33
      blah19 |   1652.157   8698.105     0.19   0.849    -15446.91    18751.23
      blah20 |  -3329.179   8815.673    -0.38   0.706    -20659.37    14001.01
       _cons |   193705.8   20601.31     9.40   0.000       153207    234204.7
------------------------------------------------------------------------------
60
Problem
  • Each has some probability (5/100 or 1/20) of
    being significant even when no relationship exists,
    purely because of sampling error.
  • We could choose the couple of variables that we
    know are unrelated (they are randomly generated)
    but are significant, and put them in
  • If we repeat with another 20 variables, we keep
    finding significant variables by sampling
    error.
  • If we drop the insignificant variables and keep
    the significant ones, we mess up the
    probabilities in the later analysis.
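
How quickly the 1-in-20 risk adds up can be sketched with a short calculation. It assumes the 20 tests are independent, which is a simplification (coefficients in one regression are not exactly independent); the function name is hypothetical.

```python
def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance that at least one of n independent tests is significant by luck alone."""
    return 1 - (1 - alpha) ** n_tests

ALPHA = 0.05
N_TESTS = 20   # the 20 random "blah" variables

expected_false_positives = ALPHA * N_TESTS
p_any = prob_at_least_one_false_positive(ALPHA, N_TESTS)
print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one):          {p_any:.3f}")
```

With 20 pure-noise variables we expect about one to look significant, and the chance of at least one is roughly 64%, so finding blah9 and blah18 significant is not surprising.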

61
. regress turnout diplomau mdnincm blah9 blah18

      Source |       SS       df       MS              Number of obs =     426
-------------+------------------------------           F(  4,   421) =   18.16
       Model |  1.6308e+11     4  4.0770e+10           Prob > F      =  0.0000
    Residual |  9.4516e+11   421  2.2450e+09           R-squared     =  0.1472
-------------+------------------------------           Adj R-squared =  0.1390
       Total |  1.1082e+12   425  2.6076e+09           Root MSE      =   47382

------------------------------------------------------------------------------
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1143.341   500.6077     2.28   0.023     159.3394    2127.343
     mdnincm |   1.026752   .4316698     2.38   0.018     .1782557    1.875249
       blah9 |   19881.04   7898.252     2.52   0.012     4356.114    35405.96
      blah18 |   13559.12   8325.046     1.63   0.104    -2804.717    29922.95
       _cons |   139305.5   10783.39    12.92   0.000     118109.5    160501.4
------------------------------------------------------------------------------
62
Moral of the Story
  • Be wary of specification searches
  • Specify your model beforehand, then go to Stata
  • Don't drop out insignificant variables
  • If you try enough variables, eventually, you can
    get a regression full of variables that appear
    significant, but probably are not
  • Schemes to try different combinations of
    variables to maximize significance are generally
    called stepwise regression or regression using
    stepwise entry.
  • Such schemes have little use.

63
Testing multiple coefficients
  • We may be interested in knowing the probability
    that all of the coefficients are actually zero in
    the population.
  • How do we make this comparison?
  • Here is our model
  • Here is our model if all of the coefficients are
    zero

64
Joint Hypothesis Test
  • Why do this?
  • Gives us a sense if this combination of variables
    collectively does anything to explain y
  • If we have two independent variables correlated
    above r = .9, we may find that one or both is
    insignificant due to collinearity. We can test
    to see if they jointly have an effect.
  • This is not the same as the hypothesis test for
    each variable (given by the t-test)
  • This doesn't say all of the variables are
    significant, but that jointly the regression
    predicts y better than using the mean alone

65
How does it work?
  • It sounds a lot like r². It is
  • We compare how well the regression predicts y
    with how well the restricted model predicts y (the
    model where all of the coefficients are zero,
    resulting in a model based on the mean alone)
  • To avoid the pitfalls of r², we want it to
    account for sample size and the number of
    independent variables.

66
Working Parts
  • The working parts for computing F are found in
    Stata output.

. regress hispthrm immithrm prej

      Source |       SS       df       MS              Number of obs =    2059
-------------+------------------------------           F(  2,  2056) =  422.27
       Model |  217673.966     2  108836.983           Prob > F      =  0.0000
    Residual |  529917.313  2056  257.741884           R-squared     =  0.2912
-------------+------------------------------           Adj R-squared =  0.2905
       Total |  747591.279  2058  363.261069           Root MSE      =  16.054

------------------------------------------------------------------------------
    hispthrm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    immithrm |   .4837819   .0180266    26.84   0.000     .4484297    .5191341
        prej |   .6161944   .0747776     8.24   0.000     .4695466    .7628421
       _cons |   19.85906   1.849749    10.74   0.000     16.23148    23.48664
------------------------------------------------------------------------------

RSS / (n-k-1) = Residual MS
RegSS / k = Regression MS
TSS / (n-1) = Total MS
F = 108,836.983 / 257.741884 = 422.27123
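
The F calculation can be sketched from the sums of squares in the Stata output. The second function, which computes F from R², is an equivalent standard form added here for comparison and does not appear on the slides; both function names are hypothetical.

```python
def f_statistic(reg_ss, res_ss, n, k):
    """F = (RegSS / k) / (RSS / (n - k - 1))."""
    reg_ms = reg_ss / k
    res_ms = res_ss / (n - k - 1)
    return reg_ms / res_ms

def f_from_r2(r2, n, k):
    """Same statistic written in terms of R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Values from the hispthrm regression output
REG_SS, RES_SS, TOTAL_SS = 217673.966, 529917.313, 747591.279
N_OBS, K = 2059, 2

f_value = f_statistic(REG_SS, RES_SS, N_OBS, K)
f_check = f_from_r2(REG_SS / TOTAL_SS, N_OBS, K)
print(f"F = {f_value:.2f} (from sums of squares), {f_check:.2f} (from R-squared)")
```

The R² form makes the adjustment visible: the same R² produces a smaller F when the sample is smaller or the number of independent variables is larger.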
67
Working Parts
  • Of course, Stata also gives it directly

. regress hispthrm immithrm prej

      Source |       SS       df       MS              Number of obs =    2059
-------------+------------------------------           F(  2,  2056) =  422.27
       Model |  217673.966     2  108836.983           Prob > F      =  0.0000
    Residual |  529917.313  2056  257.741884           R-squared     =  0.2912
-------------+------------------------------           Adj R-squared =  0.2905
       Total |  747591.279  2058  363.261069           Root MSE      =  16.054

------------------------------------------------------------------------------
    hispthrm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    immithrm |   .4837819   .0180266    26.84   0.000     .4484297    .5191341
        prej |   .6161944   .0747776     8.24   0.000     .4695466    .7628421
       _cons |   19.85906   1.849749    10.74   0.000     16.23148    23.48664
------------------------------------------------------------------------------
68
Samples and Populations
  • Of course, this F-statistic is based on our
    sample. It may be that in the population, the
    b's have no real effect
  • This test statistic is not normally distributed. It
    has its own distribution, the F-distribution
  • The numerator has k degrees of freedom
  • The denominator has n-k-1 degrees of freedom
  • Using those degrees of freedom, you can take your
    value of F to an F table. If your F is greater
    than the critical F from the table, the coefficients
    are jointly significant
  • If your F is smaller than the critical F, we cannot
    reject the null that all of the regression
    coefficients are zero. This is bad news.
  • Rather than sending you to a table, Stata gives
    you a p-value

69
Working Parts
  • If this p-value is less than .05, your regression
    coefficients are not jointly 0 in the population
    (your regression has meaning)

. regress hispthrm immithrm prej

      Source |       SS       df       MS              Number of obs =    2059
-------------+------------------------------           F(  2,  2056) =  422.27
       Model |  217673.966     2  108836.983           Prob > F      =  0.0000
    Residual |  529917.313  2056  257.741884           R-squared     =  0.2912
-------------+------------------------------           Adj R-squared =  0.2905
       Total |  747591.279  2058  363.261069           Root MSE      =  16.054

------------------------------------------------------------------------------
    hispthrm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    immithrm |   .4837819   .0180266    26.84   0.000     .4484297    .5191341
        prej |   .6161944   .0747776     8.24   0.000     .4695466    .7628421
       _cons |   19.85906   1.849749    10.74   0.000     16.23148    23.48664
------------------------------------------------------------------------------

Interpretation: In the population, this set of
variables has a real effect on y.
70
What you should know and be able to do
  • Perform a 1-tailed hypothesis test given Stata
    output (or given b, std. err. of b, and a
    one-tailed t-table)
  • Perform hypothesis tests against null hypotheses
    other than β = 0.
  • Understand the consequences of adding or dropping
    variables (specification searches)
  • Conduct an F-test to see if the coefficients are
    jointly significant