Title: Properties of Estimators
1Properties of Estimators
2OLS review
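Ordinary Least Squares (OLS) picks the regression line that minimizes the sum of squared errors, where each error is the vertical distance between an observation and the line.
The standard error of the regression is, roughly, the typical deviation of the observations from that line.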
3Residuals
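[Figure: scatterplot of Political Tolerance (vertical axis, 0 to 6) against Education (horizontal axis, 0 to 6), with the regression line (labeled "Slope") and the mean of Political Tolerance (labeled "Mean") drawn in.]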
4Residuals review
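- Residuals of an OLS analysis (errors around the regression line) have a mean of zero.
- This is true by construction: the line is computed by minimizing the squared residuals, and with a constant in the model that forces the residuals to average to zero.
- We also assume that they are normally distributed.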
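As a rough illustration of the two review slides above, the sketch below fits a one-variable OLS line with the textbook closed-form formulas and checks that the residuals average to zero. The data values and the helper name `fit_ols` are made up for this example and do not come from the lecture.

```python
from statistics import mean

def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: education (x) and political tolerance (y).
education = [0, 1, 2, 3, 4, 5, 6]
tolerance = [1.0, 1.5, 1.8, 3.1, 3.0, 4.2, 4.9]

intercept, slope = fit_ols(education, tolerance)
predicted = [intercept + slope * xi for xi in education]
# Residual = observed minus predicted (signed, not squared).
residuals = [yi - pi for yi, pi in zip(tolerance, predicted)]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"mean residual = {mean(residuals):.2e}")  # zero up to rounding error
```

The mean residual prints as a tiny number rather than an exact zero only because of floating-point rounding.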
5Regression results review
------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|        Beta
-------------+----------------------------------------------------
    prestg80 |  -.0380391   .0209348   -1.82   0.103    -.518061
       _cons |   3.330371   .8050567    4.14   0.003           .
------------------------------------------------------------------
6Residuals are variables
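For each observation, the residual is the signed vertical distance between the observed value and the regression line (observed minus predicted).
OLS squares these distances only when it adds them up.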
7Residuals and OLS
- Therefore they have a distribution: we assume a normal distribution with a mean of zero.
- The standard deviation is not necessarily 1 (so this is not the standard normal), but it is assumed to be constant across all values of x.
- Foreshadowing: if this constant-variance assumption does not hold, you are not advised to use OLS.
8What is the question that we ask in scientific
analysis?
- Are we wrong about our theory?
- Or how likely is it that we are wrong about our theory?
- Is there a non-zero relationship?
- How much better than the mean have we done in
predicting the dependent variable from the
independent variable?
9The Null Hypothesis
- The null hypothesis is that the relationship is zero, that the slope is zero, that we are doing no better than the mean.
- We are trying to reject the null hypothesis.
10Confidence in point estimates
- We have a point estimate of y for each value of x.
- The set of predicted values is a variable.
- Predicted values form a line, but the slope of that line is only true for our sample.
- We do not know the population value directly.
11Error in estimation
- So, we know that there is error in our estimate. We put bounds around that estimate.
- So, to reject the null hypothesis, the interval between the lower and upper bound of our estimate must not contain zero.
12Strange question to ask
- How likely is it that the true value from the population is zero (not different from the mean of y)?
- How likely is it that the true value of the slope is NOT zero?
13A Caveat
- Standardization review
- Z scores
- Normal distribution
- Standard normal distribution
14Standardized variable review
- Z scores are linear transformations of variables
- Z score of x = (x - mean of x) / (standard deviation of x)
15Z scores
- Z scores always have
- a mean of zero
- a standard deviation of 1
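The two properties above can be checked numerically. This is a minimal sketch using only the Python standard library; the numbers in `scores` and the helper name `z_scores` are invented for illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population SD; statistics.stdev gives the sample SD
    return [(v - m) / s for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers
z = z_scores(scores)

print([round(v, 2) for v in z])
print(f"mean of z = {mean(z):.3f}, SD of z = {pstdev(z):.3f}")
```

Whatever the units of the original variable, the standardized version has a mean of zero and a standard deviation of 1.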
16Normal distribution review
17- Approximately 68 percent of the area under a normal curve lies within one standard deviation on either side of the mean.
18- Approximately 95 percent of the area lies within 2 standard deviations of the mean.
19- Approximately 99.7 percent of the area lies within 3 standard deviations of the mean.
20Standard normal distribution
21Attributes of standard normal
- Mean is zero
- Standard deviation is 1
- About 68 percent of the area lies between -1 and 1
- About 95 percent of the area lies between -2 and 2
- About 99.7 percent of the area lies between -3 and 3
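These areas can be reproduced from the error function in Python's `math` module. The sketch below is only an illustration; the helper name `area_within` is an invented label, not something from the lecture.

```python
from math import erf, sqrt

def area_within(k):
    """Share of a normal curve lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.1f}%")
```

The rule of thumb "2 standard deviations" is a rounding: the area within exactly 2 is slightly more than 95 percent, and exactly 95 percent corresponds to about 1.96 standard deviations.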
22 95% confidence interval
- Generally, we want to be at least 95 percent confident that our interval estimate does not include zero.
- So, to be 95 percent confident, the slope must be about two standard errors away from zero, which is the mean of the standard normal curve.
23Review Central limit theorem
- The central limit theorem is based on a theory of repeated samples.
- A 95 percent confidence interval means that if this process of estimation occurred in 100 samples from the same population, about 95 of the resulting intervals would contain the true population value and about 5 would not.
24We are trying to reject the hypothesis that the
relationship is zero
- So, the farther the slope is from zero, the more confident we are that the relationship is not zero.
- We know that the area under the normal curve beyond 2 standard deviations above zero (the mean) is approximately 2.5 percent of the area of the curve.
- We also know that the area beyond 2 standard deviations from the mean in the other direction is another 2.5 percent of the area of the curve.
25T statistic
If the slope falls outside the range of 2 standard errors on either side of 0, then we can say that we are 95 percent confident that the relationship is not zero.
26Formula for t
- t = slope / standard error
- If t is at least 2 in absolute value, then the slope is two standard deviations from the mean of the curve, which is zero (why is it 0?), and we are 95 percent confident that the relationship is NOT zero.
- Significance is a transformation of the t statistic based on the theory of the normal curve. It is not a linear transformation: it is the tail area beyond t.
- Significance levels are also known as probability values (p).
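To make the formula concrete, the sketch below recomputes t for prestg80 from the coefficient and standard error in the regression results, then converts it to a two-sided p-value. Two things here are assumptions of this example rather than part of the slides: the p-value uses Student's t distribution with 9 degrees of freedom (11 observations minus 2 estimated coefficients) instead of the normal curve, and the tail area is found by a simple numerical integration. The helper names `t_density` and `two_sided_p` are invented.

```python
from math import gamma, pi, sqrt

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_value, df, steps=2000):
    """Two-sided p-value: area of the t curve beyond -|t| and +|t|.

    The area between 0 and |t| is integrated with Simpson's rule
    (steps must be even), then doubled and subtracted from 1.
    """
    b = abs(t_value)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_density(0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_b = total * h / 3
    return 1 - 2 * area_zero_to_b

slope, std_err, df = -0.0380391, 0.0209348, 9  # values from the regression output
t = slope / std_err
p = two_sided_p(t, df)
print(f"t = {t:.2f}, p = {p:.3f}")
```

With only 9 degrees of freedom the t curve has fatter tails than the normal curve, so the "t of about 2" rule is a little too generous in small samples like this one.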
27How confident are we?
- If the slope falls within two standard errors of zero, then we have a difficult time saying that we are confident.
- Because the theory of repeated samples tells us how often a slope this far from zero would turn up if the population relationship were zero, we can estimate how confident we are.
28T = 1
- Approximately 68 percent of the area under a normal curve lies within one standard deviation of the mean.
- If t = 1, then we are 68 percent confident.
- That is not very confident.
29T = 3
Approximately 99.7 percent of the area lies within 3 standard deviations of the mean. If t = 3, then the theory (from which theorem?) is that if we repeated samples, 99.7 percent of the time the sample slope would not be zero.
30One tailed versus two tailed test
[Figure: normal curve with 95% of the area in the center and 2.5% in each tail.]
You can use theory to rule out one of the areas covering 2.5 percent. If you know the slope should be positive, then you can cross out the 2.5 percent on the left. Then you are 97.5 percent confident that the relationship is not zero.
31One tailed versus two tailed test
[Figure: normal curve with 95% of the area in the center and 2.5% in each tail.]
You can use theory to rule out one of the areas covering 2.5 percent. If you know the slope should be negative, then you can cross out the 2.5 percent on the right. Then you are 97.5 percent confident that the relationship is not zero.
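The tail areas in the two slides above can be checked with the complementary error function. This is a small sketch under the normal-curve approximation used in the lecture; the helper name `upper_tail` and the cutoff of 1.96 (the more precise version of "about 2") are choices made for this example.

```python
from math import erfc, sqrt

def upper_tail(z):
    """Area of the standard normal curve to the right of z."""
    return 0.5 * erfc(z / sqrt(2))

z = 1.96  # the "about 2" cutoff, stated more precisely
one_tail = upper_tail(z)
two_tails = 2 * one_tail  # the curve is symmetric, so both tails are equal

print(f"one tail: {one_tail:.3f}, both tails: {two_tails:.3f}")
print(f"two-tailed confidence: {1 - two_tails:.3f}")
print(f"one-tailed confidence: {1 - one_tail:.3f}")
```

Crossing out one tail on theoretical grounds leaves only 2.5 percent of the area as evidence against the claim, which is where the 97.5 percent figure comes from.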
32Defining the meaning of 95 confidence
If a certain interval is a 95 percent confidence interval, then we can say that if we repeated the procedure of drawing random samples and computing confidence intervals over and over again, 95 percent of those confidence intervals would include the true value from the population. This is not to say that we are 95 percent confident that the true value lies between the upper and lower bound of this one interval.
33Defining the meaning of 95 confidence
- Instead, I am 95 percent confident that a confidence interval covers the true value from the population, based not on this single confidence interval from this single test, but rather on what would happen were I to repeat the process of drawing samples and doing this test over and over again.
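The repeated-sampling idea can be simulated. The sketch below is an illustration under stated assumptions, not the lecture's own procedure: it estimates a population mean rather than a regression slope (to keep the code short), the population values `TRUE_MEAN` and `TRUE_SD` are invented, and the intervals use the normal cutoff of 1.96.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 10.0, 2.0  # hypothetical population
N, REPEATS = 100, 2000          # sample size and number of repeated samples

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    estimate = mean(sample)
    std_err = stdev(sample) / sqrt(N)
    lower = estimate - 1.96 * std_err
    upper = estimate + 1.96 * std_err
    if lower <= TRUE_MEAN <= upper:
        covered += 1

coverage = covered / REPEATS
print(f"share of intervals covering the true value: {coverage:.3f}")
```

Each individual interval either covers the true value or it does not; the 95 percent describes the share of intervals that do across the repeated samples.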
34Happiness and occupational prestige
. regr happy prestg80

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.30
       Model |  1.31753739     1  1.31753739           Prob > F      =  0.1026
    Residual |  3.59155351     9  .399061502           R-squared     =  0.2684
-------------+------------------------------           Adj R-squared =  0.1871
       Total |  4.90909091    10  .490909091           Root MSE      =  .63171

------------------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    prestg80 |  -.0380391   .0209348    -1.82   0.103     -.085397    .0093187
       _cons |   3.330371   .8050567     4.14   0.003     1.509207    5.151536
------------------------------------------------------------------------------
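The confidence interval for prestg80 in the regression output on this slide can be rebuilt by hand from the coefficient and its standard error. In this sketch the critical value 2.262 is an assumption of the example: it is the two-sided 95 percent cutoff of the t distribution with 9 degrees of freedom taken from a standard t table, used in place of the rule-of-thumb value of 2.

```python
coef, std_err = -0.0380391, 0.0209348  # prestg80 row of the output
T_CRIT_9DF = 2.262  # two-sided 95% critical value of t with 9 df (from a t table)

lower = coef - T_CRIT_9DF * std_err
upper = coef + T_CRIT_9DF * std_err
contains_zero = lower <= 0 <= upper

print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
print(f"interval contains zero: {contains_zero}")
```

Because the interval runs from a negative lower bound to a positive upper bound, it contains zero, which matches the p-value above 0.05: the null hypothesis of no relationship between occupational prestige and happiness cannot be rejected in this sample.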
35Effect of Index of Signals on the Number of Cases
on the U.S. Supreme Court Agenda, 1953-1995
[Figure: estimated effect by Lag Year (horizontal axis, 1 to 6), plotted with the upper and lower bounds of the 95% confidence interval; the vertical axis runs from -2 to 8. Data labels on the plot: 4.62, 3.85, 2.11, 1.27, 1.19, 1.34.]
36The Effect of Supreme Court Signals on Amicus
Briefs at Courts of Appeals
37[Figure: point estimate of the slope with the upper and lower bounds of the 95% confidence interval.]
38[Figure: point estimate of the slope with the upper and lower bounds of the 95% confidence interval.]