Title: 45733: lecture 9 chapter 8
145-733 lecture 9 (chapter 8)
2Interval estimation, intro
- There is a population we are interested in
- We are interested in variables in this pop
- Variables fully described by distribution
- Distribution summarized by a parameter
- Parameter may be calculated from census
- Parameter may be estimated from sample
- Most of statistics is about figuring out
parameter from information in sample
3Interval estimation, intro
- Parameter may be estimated from sample
- Estimate vs. estimator
- Some estimators better than others
- Biasedness
- Efficiency
- Mean squared error
4Interval estimation, intro
- Now, suppose we have settled on an estimator
which we think is good
- Rule turning sample to estimate
- Need a sample in order to calculate estimate
- So, now we go out and collect a sample
- Using the sample, we calculate an estimate
5Interval estimation, intro
- The topic of interval estimation
- Given a sample, an estimator, and therefore an
estimate
- How much does our estimate tell us about the
parameter we are trying to figure out?
- An interval estimate is a range of values between
which a parameter is likely to lie
6Interval estimation, intro
- Example
- How do our salespeople's salaries compare to the
industry's?
- Population: salespeople in our industry
- Variable: S, annual salary
- Parameter: E(S)
- To estimate, we take a sample of n salespeople
and find out the value of S for each
7Interval estimation, intro
- Example
- Suppose we take a sample of 9 people.
- A good estimator (unbiased, anyway) of E(S) is
the sample mean.
- Suppose in this sample, the sample mean is 64.5
thousand dollars
8Interval estimation, intro
- Example
- Is E(S), the true parameter, likely to be exactly
64.5?
- Based on our sample and estimator, could E(S) be
- 60?
- 70?
- 35?
9Interval estimation, intro
- Example
- 64.5 is our best guess at the value of E(S)
- What we would like is a range of values within
which we are pretty sure E(S) falls
- For example: I am 95% sure that E(S) falls
between 27.6 and 101.4
- For example: I am 80% sure that E(S) falls
between 42.0 and 87.0
10What is a confidence interval?
- A confidence interval is both
- A range of values into which a parameter likely
falls
- A likelihood that the parameter falls into that
range
- For example
- I am 95% sure that E(S) falls
between 27.6 and 101.4
- I am 80% sure that E(S) falls
between 42.0 and 87.0
11What is a confidence interval?
- A confidence interval
- Depends on
- The sample used to calculate it
- The distribution of the underlying random
variables
- The estimator it is based on
- The width of the interval depends on
- All of the above
- How certain we wish to be
12What is a confidence interval?
- Width of a confidence interval
- Consider our example again
- For example: I am 95% sure that E(S) falls
between 27.6 and 101.4
- For example: I am 80% sure that E(S) falls
between 42.0 and 87.0
- To be more sure, I must widen the confidence
interval
- Many-handed economists
13Common types of CI
- CI for the population mean using the sample mean
- CI for the population variance using the sample
variance
- CI for the population proportion using the sample
proportion
- CI for the population median (other percentiles)
using the sample median (other percentiles)
14Common types of CI
- CI for the difference in population means using
the difference in sample means
- CI for the difference in population variances
using the sample variances
- CI for the difference in population proportions
using the difference in sample proportions
- CI for difference in population medians (other
percentiles) using the difference in sample
medians (other percentiles)
15CI for the mean of a population
- This is by far the most common type of CI to
calculate
- We want to know E(X) = μ_x, the population mean of a
random variable X
- We have a random sample X1, X2, ..., Xn
- We have calculated
- Sample mean
- Sample standard deviation
16CI for the mean of a population
- Our best guess at the population mean is X-bar,
the sample mean
- It is unbiased
- It can be shown (though we don't do it) that the
sample mean is the best estimator of the
population mean under certain circumstances
- But what is a range of reasonable values that
E(X) could be?
17CI for the mean, N() known σ
- This is a hard problem, let's start by making
some assumptions
- Let's assume that X is distributed Normal with
mean μ_x and variance σ^2
- Let's further assume that we know σ^2
18CI for the mean, N() known σ
- Now, from our prior discussions, we know that
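The fact being recalled here is presumably the sampling distribution of the sample mean. As a sketch, writing σ^2 for the known variance and n for the sample size (the symbols are assumed notation, not taken from the slide):

```latex
\bar{X} \sim N\!\left(\mu_x, \frac{\sigma^2}{n}\right)
\quad\Longrightarrow\quad
Z = \frac{\bar{X} - \mu_x}{\sigma/\sqrt{n}} \sim N(0, 1)
```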
19CI for the mean, N() known σ
- Now, one thing we know how to do is calculate
probabilities about the normal
20CI for the mean, N() known σ
- What we would really like to do is calculate a
probability that E(X) lies between two fixed numbers
- But this is stupid
- There are no random variables in there!
- The probability is either one or zero, and we
don't know which!
21CI for the mean, N() known σ
- Our strategy will be to start with a probability
we can calculate
- And try to turn it into something like what we
want
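A sketch of how that strategy plays out, assuming Z denotes the standardized sample mean, Φ the standard normal CDF, and a > 0 a cutoff still to be chosen (all assumed notation):

```latex
\begin{aligned}
P(-a < Z < a) &= \Phi(a) - \Phi(-a) \\
P\!\left(-a < \frac{\bar{X} - \mu_x}{\sigma/\sqrt{n}} < a\right) &= \Phi(a) - \Phi(-a) \\
P\!\left(\bar{X} - a\frac{\sigma}{\sqrt{n}} < \mu_x < \bar{X} + a\frac{\sigma}{\sqrt{n}}\right) &= \Phi(a) - \Phi(-a)
\end{aligned}
```

The random quantity in the last line is X-bar, not μ_x: the endpoints move from sample to sample while μ_x stays fixed.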
22CI for the mean, N() known σ
23CI for the mean, N() known σ
24CI for the mean, N() known σ
- Almost everything in that formula can now be
calculated
- We know X-bar from our estimation
- We are assuming we know the variance
- We can use the normal table to look up the Phis
- But, what about a? What is it?
25CI for the mean, N() known σ
- Now, we need to choose a width or size for
the confidence interval
- The width of our confidence interval represents
how certain we want to be about our result (what
% of the time we want to be right)
- Typical choices are
- 99%, 95%, 90%, 80%
- The width of the confidence interval determines a
26CI for the mean, N() known σ
- The width of the confidence interval determines a
27CI for the mean, N() known σ
- The width of the confidence interval determines a
- (picture: standard normal curve with -a, 0, and a marked on the axis)
28CI for the mean, N() known σ
- The width of the confidence interval determines a
- For example, suppose we want a 90% CI
- We want to find a so that P(-a < Z < a) = 0.90
- This is a so that P(Z < a) = 0.95
- From the table, this is 1.645
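The table lookup can also be done numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_cutoff` is made up for illustration.

```python
from statistics import NormalDist

def z_cutoff(confidence):
    """Return a such that P(-a < Z < a) = confidence for a standard normal Z."""
    upper_tail = (1.0 - confidence) / 2.0   # probability left in each tail
    return NormalDist().inv_cdf(1.0 - upper_tail)

# The typical confidence levels mentioned on the slides
for level in (0.99, 0.95, 0.90, 0.80):
    print(f"{level:.0%} CI: a = {z_cutoff(level):.3f}")
```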
29CI for the mean, N() known σ
- The width of the confidence interval determines a
30CI for the mean, N() known σ
- An example
- Recall our salary example
- Suppose we know that salaries are distributed
normally with mean unknown but standard deviation
equal to 15
- Let's calculate a 95% CI for E(S)
- P(-a < Z < a) = 0.95
31CI for the mean, N() known σ
- An example
- Recall our salary example
- Suppose we know that salaries are distributed
normally with mean unknown but standard deviation
equal to 15
- Let's calculate a 90% CI for E(S)
- P(-a < Z < a) = 0.90
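A minimal sketch of the salary calculation at both confidence levels. It assumes the sample size is the n = 9 from the slide that introduces the sample, with sample mean 64.5 and known standard deviation 15; the function name `mean_ci_known_sigma` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, confidence):
    """CI for the mean of a normal population when sigma is known."""
    a = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = a * sigma / sqrt(n)        # a times the standard error
    return x_bar - half_width, x_bar + half_width

# Salary example (thousands of dollars); n = 9 is an assumption from the earlier slide
for level in (0.95, 0.90):
    lo, hi = mean_ci_known_sigma(64.5, 15.0, 9, level)
    print(f"{level:.0%} CI for E(S): ({lo:.1f}, {hi:.1f})")
```

Lowering the confidence level from 95% to 90% shrinks the cutoff a, and with it the interval.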
32Interpreting a CI
- What does a CI mean?
- Is the CI a probability statement about the
population mean?
- Suppose we say that our estimate of mean family
income in the US is 57,000
- Suppose we calculate a 95% CI for our estimate to
be 56,000 to 58,000
- Does this mean that there is a 95% probability
that the true population mean is between 56K and
58K?
- NO!
33Interpreting a CI
- What does a CI mean?
- P(56 < E(income) < 58) is either
- 1 if E(income) is between 56 and 58
- 0 if E(income) is not between 56 and 58
- There are NO RANDOM VARIABLES in this probability
statement
34Interpreting a CI
- What does a CI mean?
- But our calculation of a CI looks like a
probability statement about something
35Interpreting a CI
- What does a CI mean?
- It is a probability statement about
- X-bar
- And about the CI itself!
- The endpoints of a CI are random variables
36Interpreting a CI
- What does a CI mean?
- So, a CI is a random interval if you like
- Sometimes this random interval will contain the
true value of the parameter
- Sometimes this random interval will not contain
the true value of the parameter
37Interpreting a CI
- What does a CI mean?
- If you construct it properly, the (random) 95% CI
will contain the true parameter 95% of the time
- Imagine taking 1000 separate samples from the
same population
- Construct a 95% CI for each of the 1000 samples
- About 950 of those 1000 CIs will contain E(X)
- (picture)
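The thought experiment can be run as a small simulation. The sketch below assumes a normal population with a made-up true mean of 60 and known standard deviation 15, samples of size 9, and the known-σ interval; the function name and all parameter values are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_count(true_mean, sigma, n, confidence, n_samples, seed=0):
    """Count how many known-sigma CIs, one per simulated sample, contain the true mean."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    a = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = a * sigma / sqrt(n)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = fmean(sample)               # the interval's center is random
        if x_bar - half_width < true_mean < x_bar + half_width:
            hits += 1
    return hits

hits = coverage_count(true_mean=60.0, sigma=15.0, n=9, confidence=0.95, n_samples=1000)
print(f"{hits} of 1000 intervals contain the true mean")
```

The true mean never changes inside the loop; only the interval does, which is the sense in which the 95% refers to the procedure rather than to any single interval.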
38CI for the mean, N() unknown σ
- Assuming that we know the variance is a bit odd
- Let's try to drop that assumption, now
39CI for the mean, N() unknown σ
- Recall the basis for our calculation was that (if
we knew the variance), we could get from the
normal table
40CI for the mean, N() unknown σ
- Now, we don't know the standard error, so we lack
one piece of info.
- But, we know how to make a good estimate of the
standard error
41CI for the mean, N() unknown σ
- Can we calculate some probability like this
42CI for the mean, N() unknown σ
- Recall the basis for the earlier calculation
- There is a similar fact
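The similar fact is presumably the standard result for a sample from a normal population when the standard deviation is estimated by the sample standard deviation. As a sketch (s and the subscript n-1 for degrees of freedom are assumed notation):

```latex
\frac{\bar{X} - \mu_x}{\sigma/\sqrt{n}} \sim N(0, 1)
\qquad\text{and}\qquad
\frac{\bar{X} - \mu_x}{s/\sqrt{n}} \sim t_{n-1}
```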
43CI for the mean, N() unknown σ
- The t-distribution
- Also called Student's t distribution
- Looks very similar to the standard normal
- Slightly higher variance than standard normal
- Has one parameter called degrees of freedom
- As the degrees of freedom rise, the variance of
the t-distribution goes down
- As the degrees of freedom approach infinity, the
t-distribution becomes identical to the normal
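To make the variance claim concrete: for more than 2 degrees of freedom, the variance of the t-distribution is df / (df - 2), which falls toward the standard normal's variance of 1 as df grows. A small sketch (the function name is made up):

```python
def t_variance(df):
    """Variance of Student's t with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("variance is not finite for df <= 2")
    return df / (df - 2)

for df in (3, 5, 10, 30, 100, 1000):
    print(f"df = {df:4d}: variance = {t_variance(df):.3f}")
```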
44CI for the mean, N() unknown σ
- Can we calculate some probability like this
- Yes, we just need to have a t-table like the
normal table
45CI for the mean, N() unknown σ
- Example
- Recall our salary example
- Suppose our sample is (in thousands) 55, 62, 43, 77,
89, 61
- We know the mean is 64.5
46CI for the mean, N() unknown σ
- Example
- Let's calculate the sample variance
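A sketch of the calculation for the six salaries listed in the example, using Python's standard library. `statistics.variance` divides by n - 1, which is the sample variance wanted here.

```python
from math import sqrt
from statistics import fmean, variance

salaries = [55.0, 62.0, 43.0, 77.0, 89.0, 61.0]  # in thousands, from the example

n = len(salaries)
x_bar = fmean(salaries)      # sample mean
s2 = variance(salaries)      # sample variance (divides by n - 1)
s = sqrt(s2)                 # sample standard deviation
std_err = s / sqrt(n)        # estimated standard error of the sample mean

print(f"mean = {x_bar}, s^2 = {s2}, s = {s:.2f}, SE = {std_err:.2f}")
```

The standard error rounds to 6.65.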
47CI for the mean, N() unknown σ
- Example
- Now, let's calculate a 90% CI
- Sample mean is 64.5
- Sample standard error is 6.65
- We want a 90% CI, so we start with
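A sketch of how the calculation finishes, assuming the six listed salaries are the whole sample, so there are 5 degrees of freedom. Python's standard library has no t quantile function, so the cutoff 2.015 is hard-coded from a t-table (the value with 5% in the upper tail at 5 degrees of freedom).

```python
from math import sqrt
from statistics import fmean, stdev

salaries = [55.0, 62.0, 43.0, 77.0, 89.0, 61.0]  # in thousands, from the example
T_CUTOFF_90_DF5 = 2.015  # t-table value: P(T > 2.015) = 0.05 with 5 degrees of freedom

n = len(salaries)
x_bar = fmean(salaries)
std_err = stdev(salaries) / sqrt(n)   # s / sqrt(n), with s dividing by n - 1
lo = x_bar - T_CUTOFF_90_DF5 * std_err
hi = x_bar + T_CUTOFF_90_DF5 * std_err
print(f"90% CI for E(S): ({lo:.1f}, {hi:.1f})")
```

The t cutoff (2.015) is larger than the normal cutoff for the same confidence level (1.645), so the interval is wider than it would be if the standard deviation were known to equal s.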
48CI for the mean, unknown σ
- Often, data really do come from a normal or
near-normal distribution
- More often, perhaps, they do not
- So, if we have data (a variable X) which is not
normally distributed, what should we do?
49CI for the mean, unknown σ
- Recall, that the important probability
calculation we need to be able to do is
50CI for the mean, unknown σ
- The key fact we needed to know in order to do
this calculation is NOT the normality of X, but
51CI for the mean, unknown σ
- If, somehow, we could know that
- Even when X is not normal, then we could again do
the calculation of a confidence interval
52CI for the mean, unknown σ
- The central limit theorem comes to the rescue.
- Recall that the CLT says
53CI for the mean, unknown σ
- A modification of the CLT also says
- So that, as long as the sample size is large, we
can proceed as if X is distributed normally, and
only a small error results
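The two statements can be sketched as follows for a random sample from any population with mean μ_x and finite variance σ^2 (the notation, including s for the sample standard deviation, is assumed):

```latex
\frac{\bar{X} - \mu_x}{\sigma/\sqrt{n}} \;\xrightarrow{d}\; N(0, 1)
\qquad\text{and}\qquad
\frac{\bar{X} - \mu_x}{s/\sqrt{n}} \;\xrightarrow{d}\; N(0, 1)
\quad\text{as } n \to \infty
```

For large n this gives the approximate interval statement

```latex
P\!\left(\bar{X} - a\frac{s}{\sqrt{n}} < \mu_x < \bar{X} + a\frac{s}{\sqrt{n}}\right) \approx \Phi(a) - \Phi(-a)
```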
54CI for the mean, unknown σ
55CI for the mean, unknown σ
- Example
- Problem 6 on page 290
56CI for the proportion, large n
- Another parameter we are often interested in is
the p from a Bernoulli random variable
- An unbiased (and in some contexts, best)
estimator for this is the sample proportion
57CI for the proportion, large n
- Recall that for large n, the sample proportion is
distributed approximately normal
- With mean p
- With variance p(1-p)/n
58CI for the proportion, large n
- So we could base a CI on
- Oops, that requires that we already know p!
59CI for the proportion, large n
- But, we have a good estimator of p in p-hat
- We know that p-hat is equal to p in expectation
- We know that p-hat's variance goes to zero as n
goes to infinity
- So, for large n, p-hat is a very good estimator
for p
60CI for the proportion, large n
- So, we have a right to hope that
- This turns out to be true, also
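A minimal sketch of the resulting large-sample interval, which plugs p-hat into the variance p(1-p)/n in place of the unknown p. The data (120 successes in a sample of 400) and the function name are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample CI for p, using p-hat in place of p in the standard error."""
    p_hat = successes / n
    a = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = a * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 120 "successes" in a sample of 400
lo, hi = proportion_ci(120, 400, 0.95)
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
```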
61CI for the proportion, large n
- Example pg 299, problem 22
62CI for the variance, normal X
- It is often of interest to calculate a confidence
interval for the population variance of a
variable
- Recall, we based confidence intervals for the
mean on
63CI for the variance, normal X
- Well, we know a similar fact about variances from
a normal population
64CI for the variance, normal X
- Using the chi-squared table, it is not too hard
to calculate things like
65CI for the variance, normal X
- Because the chi-squared distribution is always
positive, it makes no sense to pick -a and a.
- But how should we choose a and b?
- Many combinations of a and b would give the
probability equal to, say, 95%
66CI for the variance, normal X
- The usual way of choosing a and b
- Choose a and b so that
- First, the confidence interval has the chosen
width
- Second, so that
- (draw picture)
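As a sketch of where this leads, assuming the standard fact for a sample from a normal population and the usual equal-tails choice of a and b. Here s^2 is the sample variance and 1 - α the chosen confidence level (assumed notation):

```latex
\begin{aligned}
\frac{(n-1)s^2}{\sigma^2} &\sim \chi^2_{n-1} \\
P\!\left(a < \frac{(n-1)s^2}{\sigma^2} < b\right) &= 1 - \alpha,
\qquad P(\chi^2_{n-1} < a) = P(\chi^2_{n-1} > b) = \frac{\alpha}{2} \\
P\!\left(\frac{(n-1)s^2}{b} < \sigma^2 < \frac{(n-1)s^2}{a}\right) &= 1 - \alpha
\end{aligned}
```

Note that the larger cutoff b produces the lower endpoint and the smaller cutoff a the upper endpoint, because σ^2 sits in the denominator.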
67CI for the variance, normal X
68CI for the variance, normal X
- Example problem 32, page 300
69CI for the variance, normal X
- Some review problems
- Page 319, problem 58
- Page 291, problem 12 (add the variance)