45733: lecture 9 chapter 8 - PowerPoint PPT Presentation

Slides: 70
Provided by: andre4

1
45-733 lecture 9 (chapter 8)
  • Interval Estimation

2
Interval estimation, intro
  • There is a population we are interested in
  • We are interested in variables in this pop
  • Variables fully described by distribution
  • Distribution summarized by a parameter
  • Parameter may be calculated from census
  • Parameter may be estimated from sample
  • Most of statistics is about figuring out
    parameter from information in sample

3
Interval estimation, intro
  • Parameter may be estimated from sample
  • Estimate vs. estimator
  • Some estimators better than others
  • Biasedness
  • Efficiency
  • Mean squared error
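
A rough way to see bias and mean squared error in action is to simulate many samples and apply two estimators to each. The sketch below is illustrative only: the population (Normal with mean 50 and standard deviation 10), the sample size of 9, and the helper names are assumptions, not something taken from the lecture.

```python
import random
import statistics

def simulate_bias_and_mse(estimator, true_value, draw_sample, reps=5000, seed=0):
    """Approximate the bias and MSE of an estimator by repeated sampling."""
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(reps)]
    bias = statistics.fmean(estimates) - true_value
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return bias, mse

# Hypothetical population: Normal with mean 50 and standard deviation 10.
def draw_normal_sample(rng, n=9, mu=50.0, sigma=10.0):
    return [rng.gauss(mu, sigma) for _ in range(n)]

mean_bias, mean_mse = simulate_bias_and_mse(statistics.fmean, 50.0, draw_normal_sample)
median_bias, median_mse = simulate_bias_and_mse(statistics.median, 50.0, draw_normal_sample)
print(f"sample mean:   bias={mean_bias:.3f}, MSE={mean_mse:.2f}")
print(f"sample median: bias={median_bias:.3f}, MSE={median_mse:.2f}")
```

Both estimators come out close to unbiased here, but for normal data the sample mean should show the smaller MSE, which is the sense in which it is the more efficient of the two.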

4
Interval estimation, intro
  • Now, suppose we have settled on an estimator
    which we think is good
  • Rule turning sample to estimate
  • Need a sample in order to calculate estimate
  • So, now we go out and collect a sample
  • Using the sample, we calculate an estimate

5
Interval estimation, intro
  • The topic of interval estimation
  • Given a sample and an estimator, and therefore an
    estimate
  • How much does our estimate tell us about the
    parameter we are trying to figure out?
  • An interval estimate is a range of values between
    which a parameter is likely to lie

6
Interval estimation, intro
  • Example
  • How do our salespeople's salaries compare to the
    industry's?
  • Population: salespeople in our industry
  • Variable: S, annual salary
  • Parameter: E(S)
  • To estimate, we take a sample of n salespeople
    and find out the value of S for each

7
Interval estimation, intro
  • Example
  • Suppose we take a sample of 9 people.
  • A good estimator (unbiased, anyway) of E(S) is
    the sample mean.
  • Suppose in this sample, the sample mean is 64.5
    thousand dollars

8
Interval estimation, intro
  • Example
  • Is E(S), the true parameter, likely to be exactly
    64.5?
  • Based on our sample and estimator, could E(S) be
  • 60?
  • 70?
  • 35?

9
Interval estimation, intro
  • Example
  • 64.5 is our best guess at the value of E(S)
  • What we would like is a range of values within
    which we are pretty sure E(S) falls
  • For example: I am 95% sure that E(S) falls
    between 27.6 and 101.4
  • For example: I am 80% sure that E(S) falls
    between 42.0 and 87.0

10
What is a confidence interval?
  • A confidence interval is both
  • A range of values into which a parameter likely
    falls
  • A likelihood that the parameter falls into that
    range
  • For example
  • I am 95% sure that E(S) falls
    between 27.6 and 101.4
  • I am 80% sure that E(S) falls
    between 42.0 and 87.0

11
What is a confidence interval?
  • A confidence interval
  • Depends on
  • The sample used to calculate it
  • The distribution of the underlying random
    variables
  • The estimator it is based on
  • The width of the interval depends on
  • All of the above
  • How certain we wish to be

12
What is a confidence interval?
  • Width of a confidence interval
  • Consider our example again
  • For example: I am 95% sure that E(S) falls
    between 27.6 and 101.4
  • For example: I am 80% sure that E(S) falls
    between 42.0 and 87.0
  • To be more sure, I must widen the confidence
    interval
  • Many-handed economists

13
Common types of CI
  • CI for the population mean using the sample mean
  • CI for the population variance using the sample
    variance
  • CI for the population proportion using the sample
    proportion
  • CI for the population median (other percentiles)
    using the sample median (other percentiles)

14
Common types of CI
  • CI for the difference in population means using
    the difference in sample means
  • CI for the difference in population variances
    using the sample variances
  • CI for the difference in population proportions
    using the difference in sample proportions
  • CI for difference in population medians (other
    percentiles) using the difference in sample
    medians (other percentiles)

15
CI for the mean of a population
  • This is by far the most common type of CI to
    calculate
  • We want to know E(X) = μ_x, the population mean of a
    random variable X
  • We have a random sample X_1, X_2, ..., X_n
  • We have calculated
  • Sample mean
  • Sample standard deviation

16
CI for the mean of a population
  • Our best guess at the population mean is X-bar,
    the sample mean
  • It is unbiased
  • It can be shown (though we don't do it) that the
    sample mean is the best estimator of the
    population mean under certain circumstances
  • But what is a range of reasonable values that
    E(X) could be?

17
CI for the mean, N(), known σ
  • This is a hard problem, so let's start by making
    some assumptions
  • Let's assume that X is distributed Normal with
    mean μ_x and variance σ²
  • Let's further assume that we know σ²

18
CI for the mean, N(), known σ
  • Now, from our prior discussions, we know that
    X-bar is distributed Normal with mean μ_x and
    variance σ²/n

19
CI for the mean, N(), known σ
  • Now, one thing we know how to do is calculate
    probabilities about the normal

20
CI for the mean, N(), known σ
  • What we would really like to do is calculate a
    probability like P(c < μ_x < d), with c and d
    fixed numbers
  • But this is stupid
  • There are no random variables in there!
  • The probability is either one or zero, and we
    don't know which!

21
CI for the mean, N(), known σ
  • Our strategy will be to start with a probability
    we can calculate
  • And try to turn it into something like what we
    want
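
One way to carry out this strategy, assuming the standard result that Z = (X-bar - μ_x)/(σ/√n) is standard normal when X is normal with known σ, is the chain of equivalent events below. Here a > 0 is a constant still to be chosen and Φ is the standard normal CDF; this is a sketch of the usual textbook derivation, not necessarily the exact steps shown on the original slides.

```latex
\begin{aligned}
P(-a < Z < a) &= \Phi(a) - \Phi(-a) \\
P\!\left(-a < \frac{\bar{X} - \mu_x}{\sigma/\sqrt{n}} < a\right) &= \Phi(a) - \Phi(-a) \\
P\!\left(-a\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu_x < a\,\frac{\sigma}{\sqrt{n}}\right) &= \Phi(a) - \Phi(-a) \\
P\!\left(\bar{X} - a\,\frac{\sigma}{\sqrt{n}} < \mu_x < \bar{X} + a\,\frac{\sigma}{\sqrt{n}}\right) &= \Phi(a) - \Phi(-a)
\end{aligned}
```

The first line is a probability we can calculate from the normal table; the last line has μ_x in the middle, with endpoints that depend on the random variable X-bar.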

22
CI for the mean, N(), known σ

23
CI for the mean, N(), known σ

24
CI for the mean, N(), known σ
  • Almost everything in that formula can now be
    calculated
  • We know X-bar from our estimation
  • We are assuming we know the variance
  • We can use the normal table to look up the Φ's
  • But what about a? What is it?

25
CI for the mean, N(), known σ
  • Now, we need to choose a width, or size, for
    the confidence interval
  • The width of our confidence interval represents
    how certain we want to be about our result (what
    % of the time we want to be right)
  • Typical choices are
  • 99%, 95%, 90%, 80%
  • The width of the confidence interval determines a

26
CI for the mean, N(), known σ
  • The width of the confidence interval determines a

27
CI for the mean, N(), known σ
  • The width of the confidence interval determines a
  • (picture: standard normal density with -a, 0, and a
    marked on the axis)
28
CI for the mean, N(), known σ
  • The width of the confidence interval determines a
  • For example, suppose we want a 90% CI
  • We want to find a so that P(-a < Z < a) = 0.90
  • This is a so that P(Z < a) = 0.95
  • From the table, this is 1.645
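
The table lookup can also be done numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `critical_value` is just an illustrative choice.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Return a such that P(-a < Z < a) = confidence for standard normal Z."""
    # P(-a < Z < a) = confidence is the same as P(Z < a) = (1 + confidence) / 2.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.99, 0.95, 0.90, 0.80):
    print(f"{level:.0%} CI: a = {critical_value(level):.3f}")
```

For the typical choices this gives a of about 2.576 (99%), 1.960 (95%), 1.645 (90%), and 1.282 (80%), so a higher confidence level means a larger a.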

29
CI for the mean, N(), known σ
  • The width of the confidence interval determines a

30
CI for the mean, N(), known σ
  • An example
  • Recall our salary example
  • Suppose we know that salaries are distributed
    normally with mean unknown but standard deviation
    equal to 15
  • Let's calculate a 95% CI for E(S)
  • P(-a < Z < a) = 0.95

31
CI for the mean, N(), known σ
  • An example
  • Recall our salary example
  • Suppose we know that salaries are distributed
    normally with mean unknown but standard deviation
    equal to 15
  • Let's calculate a 90% CI for E(S)
  • P(-a < Z < a) = 0.90
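
The two salary calculations can be checked with a few lines of code. This is a minimal sketch that assumes the sample size of 9 and the sample mean of 64.5 from the earlier salary slides still apply; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(sample_mean, sigma, n, confidence):
    """CI for the mean of a normal population whose standard deviation is known."""
    a = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = a * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Salary example: n = 9 (assumed carried over), sample mean 64.5, sigma = 15, in thousands.
for level in (0.95, 0.90):
    low, high = mean_ci_known_sigma(64.5, 15.0, 9, level)
    print(f"{level:.0%} CI for E(S): ({low:.1f}, {high:.1f})")
```

Under those assumptions the 95% interval is roughly (54.7, 74.3) and the 90% interval roughly (56.3, 72.7): asking for less confidence gives a narrower interval.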

32
Interpreting a CI
  • What does a CI mean?
  • Is the CI a probability statement about the
    population mean?
  • Suppose we say that our estimate of mean family
    income in the US is 57,000
  • Suppose we calculate a 95% CI for our estimate to
    be 56,000 to 58,000
  • Does this mean that there is a 95% probability
    that the true population mean is between 56K and
    58K?
  • NO!

33
Interpreting a CI
  • What does a CI mean?
  • P(56 < E(income) < 58) is either
  • 1 if E(income) is between 56 and 58
  • 0 if E(income) is not between 56 and 58
  • There are NO RANDOM VARIABLES in this probability
    statement

34
Interpreting a CI
  • What does a CI mean?
  • But our calculation of a CI looks like a
    probability statement about something

35
Interpreting a CI
  • What does a CI mean?
  • It is a probability statement about
  • X-bar
  • And about the CI itself!
  • The endpoints of a CI are random variables

36
Interpreting a CI
  • What does a CI mean?
  • So, a CI is a random interval if you like
  • Sometimes this random interval will contain the
    true value of the parameter
  • Sometimes this random interval will not contain
    the true value of the parameter

37
Interpreting a CI
  • What does a CI mean?
  • If you construct it properly, the (random) 95% CI
    will contain the true parameter 95% of the time
  • Imagine taking 1000 separate samples from the
    same population
  • Construct a 95% CI for each of the 1000 samples
  • About 950 of those 1000 CIs will contain E(X)
  • (picture)
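
The thought experiment of 1000 separate samples is easy to imitate by simulation. The sketch below assumes a normal population with known σ; the particular values (mean 60, σ = 15, n = 9) and the function name are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence=0.95, n_samples=1000, seed=1):
    """Fraction of known-sigma CIs, over repeated samples, that contain mu."""
    rng = random.Random(seed)
    a = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = a * sigma / sqrt(n)
    hits = 0
    for _ in range(n_samples):
        # Each new sample gives a new X-bar, hence a new (random) interval.
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / n_samples

print(coverage(mu=60.0, sigma=15.0, n=9))
```

The printed fraction should land near 0.95. The parameter mu never moves; it is the interval endpoints that change from sample to sample.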

38
CI for the mean, N(), unknown σ
  • Assuming that we know the variance is a bit odd
  • Let's try to drop that assumption, now

39
CI for the mean, N(), unknown σ
  • Recall the basis for our calculation was that (if
    we knew the variance), we could get from the
    normal table

40
CI for the mean, N(), unknown σ
  • Now, we don't know the standard error, so we lack
    one piece of info.
  • But, we know how to make a good estimate of the
    standard error

41
CI for the mean, N(), unknown σ
  • Can we calculate some probability like this

42
CI for the mean, N(), unknown σ
  • Recall the basis for the earlier calculation
  • There is a similar fact

43
CI for the mean, N(), unknown σ
  • The t-distribution
  • Also called Student's t distribution
  • Looks very similar to the standard normal
  • Slightly higher variance than standard normal
  • Has one parameter called degrees of freedom
  • As the degrees of freedom rise, the variance of
    the t-distribution goes down
  • As the degrees of freedom approach infinity, the
    t-distribution becomes identical to the normal
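
A small simulation can make these properties concrete. The sketch below standardizes X-bar with the estimated standard error s/√n (the statistic that follows a t-distribution with n - 1 degrees of freedom when X is normal) and counts how often it lands beyond ±1.96, the 95% cutoff from the normal table. The function name and the sample sizes tried are illustrative assumptions.

```python
import random
from math import sqrt

def tail_fraction(n, cutoff=1.96, reps=10000, seed=2):
    """Fraction of normal samples where |x_bar / (s / sqrt(n))| exceeds cutoff."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = sum(sample) / n
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t_stat = x_bar / (s / sqrt(n))  # the true mean is 0
        if abs(t_stat) > cutoff:
            exceed += 1
    return exceed / reps

for n in (6, 30, 100):
    print(n, tail_fraction(n))
```

With a known σ the fraction would be about 0.05 for every n. With an estimated σ it is noticeably larger for small n and drifts toward 0.05 as n (and so the degrees of freedom) grows, which is why the normal table cannot be used as is for small samples.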

44
CI for the mean, N(), unknown σ
  • Can we calculate some probability like this
  • Yes, we just need to have a t-table like the
    normal table

45
CI for the mean, N(), unknown σ
  • Example
  • Recall our salary example
  • Suppose our sample is (in thousands): 55, 62, 43, 77,
    89, 61
  • We know the mean is 64.5

46
CI for the mean, N(), unknown σ
  • Example
  • Let's calculate the sample variance

47
CI for the mean, N(), unknown σ
  • Example
  • Now, let's calculate a 90% CI
  • Sample mean is 64.5
  • Sample standard error is 6.65
  • We want a 90% CI, so we start with
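
The whole salary example can be reproduced in a few lines. This sketch uses the six sample values listed on the earlier slide. The t cutoff of 2.015 (the 0.95 quantile with 5 degrees of freedom) is copied from a standard t-table rather than computed, since the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import fmean, variance

salaries = [55, 62, 43, 77, 89, 61]  # sample from the slides, in thousands
n = len(salaries)

x_bar = fmean(salaries)    # sample mean
s2 = variance(salaries)    # sample variance (divides by n - 1)
std_err = sqrt(s2 / n)     # estimated standard error of the mean

# Assumption: 0.95 quantile of t with n - 1 = 5 degrees of freedom, from a t-table.
T_CRIT_90_DF5 = 2.015

low = x_bar - T_CRIT_90_DF5 * std_err
high = x_bar + T_CRIT_90_DF5 * std_err
print(f"mean={x_bar}, s^2={s2}, SE={std_err:.2f}")
print(f"90% CI: ({low:.1f}, {high:.1f})")
```

This reproduces the sample mean of 64.5 and the standard error of 6.65, with a sample variance of 265.5, and gives a 90% CI of roughly (51.1, 77.9).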

48
CI for the mean, unknown σ
  • Often, data really do come from a normal or
    near-normal distribution
  • More often, perhaps, they do not
  • So, if we have data (a variable X) which is not
    normally distributed, what should we do?

49
CI for the mean, unknown σ
  • Recall, that the important probability
    calculation we need to be able to do is

50
CI for the mean, unknown σ
  • The key fact we needed to know in order to do
    this calculation is NOT the normality of X, but

51
CI for the mean, unknown σ
  • If, somehow, we could know that
  • Even when X is not normal, then we could again do
    the calculation of a confidence interval

52
CI for the mean, unknown σ
  • The central limit theorem comes to the rescue.
  • Recall that the CLT says

53
CI for the mean, unknown σ
  • A modification of the CLT also says
  • So that, as long as the sample size is large, we
    can proceed as if X is distributed normally, and
    only a small error results
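
How small the error is can be gauged by simulation. The sketch below draws clearly non-normal data (an exponential distribution with mean 1, chosen here purely as an illustration) and checks how often the interval X-bar ± a·s/√n, with a taken from the normal table, contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist

def clt_coverage(n, confidence=0.95, reps=2000, seed=3):
    """Coverage of the large-sample CI x_bar +/- a * s / sqrt(n) for skewed data."""
    rng = random.Random(seed)
    a = NormalDist().inv_cdf((1 + confidence) / 2)
    true_mean = 1.0  # mean of an exponential distribution with rate 1
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        x_bar = sum(sample) / n
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        half_width = a * s / sqrt(n)
        if x_bar - half_width < true_mean < x_bar + half_width:
            hits += 1
    return hits / reps

for n in (5, 30, 200):
    print(n, clt_coverage(n))
```

For very small n the nominal 95% interval covers the mean noticeably less than 95% of the time; as n grows the coverage moves close to 95%, which is the practical content of the large-sample argument.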

54
CI for the mean, unknown σ
  • So, we can use

55
CI for the mean, unknown σ
  • Example
  • Problem 6 on page 290

56
CI for the proportion, large n
  • Another parameter we are often interested in is
    the p from a Bernoulli random variable
  • An unbiased (and in some contexts, best)
    estimator for this is the sample proportion

57
CI for the proportion, large n
  • Recall that for large n, the sample proportion is
    distributed approximately normal
  • With mean p
  • With variance p(1-p)/n

58
CI for the proportion, large n
  • So we could base a CI on
  • Oops, that requires that we already know p!

59
CI for the proportion, large n
  • But, we have a good estimator of p in p-hat
  • We know that p-hat is equal to p in expectation
  • We know that p-hat's variance goes to zero as n
    goes to infinity
  • So, for large n, p-hat is a very good estimator
    for p

60
CI for the proportion, large n
  • So, we have a right to hope that
  • This turns out to be true, also
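
In practice this means replacing p by p-hat in the variance p(1-p)/n and using the normal table as before. The sketch below shows that calculation; the data (120 successes in 400 trials) are hypothetical and not from the lecture, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample CI for a Bernoulli p, plugging p-hat into the variance."""
    p_hat = successes / n
    a = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = a * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data (not from the slides): 120 successes in 400 trials.
low, high = proportion_ci(120, 400)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With these made-up numbers p-hat is 0.30 and the interval is roughly (0.255, 0.345). The approximation is only meant for large n.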

61
CI for the proportion, large n
  • Example: page 299, problem 22

62
CI for the variance, normal X
  • It is often of interest to calculate a confidence
    interval for the population variance of a
    variable
  • Recall, we based confidence intervals for the
    mean on

63
CI for the variance, normal X
  • Well, we know a similar fact about variances from
    a normal population

64
CI for the variance, normal X
  • Using the chi-squared table, it is not too hard
    to calculate things like

65
CI for the variance, normal X
  • Because the chi-squared distribution is always
    positive, it makes no sense to pick -a and a.
  • But how should we choose a and b?
  • Many combinations of a and b would give the
    probability equal to, say, 95%

66
CI for the variance, normal X
  • The usual way of choosing a and b
  • Choose a and b so that
  • First, the confidence interval has the chosen
    width
  • Second, so that the two tails have equal
    probability: P(χ² < a) = P(χ² > b)
  • (draw picture)

67
CI for the variance, normal X
  • Once a and b are chosen
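
For a normal population, the standard fact is that (n-1)s²/σ² follows a chi-squared distribution with n - 1 degrees of freedom, so P(a < (n-1)s²/σ² < b) can be rearranged into the interval ((n-1)s²/b, (n-1)s²/a) for σ². The sketch below applies this to the six salary values from the earlier example, assuming equal tails and a 90% level; the two cutoffs are copied from a standard chi-squared table rather than computed.

```python
from statistics import variance

salaries = [55, 62, 43, 77, 89, 61]  # salary sample from the earlier slides, in thousands
n = len(salaries)
s2 = variance(salaries)  # sample variance (divides by n - 1)

# Assumption: equal-tail cutoffs for 5 degrees of freedom, from a chi-squared table.
CHI2_LOWER_DF5 = 1.1455   # a, with P(chi2 < a) = 0.05
CHI2_UPPER_DF5 = 11.0705  # b, with P(chi2 > b) = 0.05

# Dividing by the larger cutoff gives the lower endpoint, and vice versa.
var_low = (n - 1) * s2 / CHI2_UPPER_DF5
var_high = (n - 1) * s2 / CHI2_LOWER_DF5
print(f"s^2 = {s2}")
print(f"90% CI for the variance: ({var_low:.1f}, {var_high:.1f})")
```

The interval is far from symmetric around s² = 265.5 (roughly 120 to 1159), reflecting the skew of the chi-squared distribution and the small sample.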

68
CI for the variance, normal X
  • Example: problem 32, page 300

69
CI for the variance, normal X
  • Some review problems
  • Page 319, problem 58
  • Page 291, problem 12 (add the variance)