Confidence Interval Estimation - PowerPoint PPT Presentation

Provided by: joeycook
Transcript and Presenter's Notes

1
Confidence Interval Estimation
  • For statistical inference in decision making
  • Chapter 6

2
Objectives
  • Central Limit Theorem
  • Confidence Interval Estimation of the Mean (σ
    known)
  • Interpretation of the Confidence Interval
  • Confidence Interval Estimation of the Mean (σ
    unknown)
  • Confidence Interval Estimation for the Proportion
  • Determining Sample Size

3
Central Limit Theorem
  • Irrespective of the shape of the underlying
    distribution of the population, the sampling
    distributions of sample means and sample
    proportions approximate normal distributions
    when the sample size is sufficiently large.
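
A minimal simulation sketch of the theorem, assuming a strongly right-skewed exponential population with mean 1 (a hypothetical choice, not from the slides) and only Python's standard library. It draws many samples of size 30 and summarizes the resulting sample means.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from `n` draws of draw(rng)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

def draw_exponential(rng):
    # Assumed population: exponential with mean 1 (far from normal).
    return rng.expovariate(1.0)

means = sample_means(draw_exponential, n=30, reps=5000)
center = statistics.fmean(means)   # compare with the population mean, 1
spread = statistics.stdev(means)   # compare with 1 / sqrt(30)
# Share of sample means within 2 standard deviations of their center.
within_2 = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(round(center, 2), round(spread, 2), round(within_2, 2))
```

If the theorem holds, the sample means center near 1, their standard deviation is near 1/√30 ≈ 0.18, and roughly 95% of them fall within 2 standard deviations of their center, even though the population itself is far from normal.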

4
Central Limit Theorem in action
5
How large must a sample be for the Central Limit
theorem to apply?
  • The required sample size varies according to the
    shape of the population.
  • However, for our use, a sample size of 30 or
    larger will suffice.

6
Must sample sizes be 30 or larger for populations
that are normally distributed?
  • No. If the population is normally distributed,
    the sample means are normally distributed for
    sample sizes as small as n = 1.

7
Why not just always pick a sample size of 30?
8
How can I tell the shape of the underlying
population?
  • CHECK FOR NORMALITY
  • Use descriptive statistics. Construct
    stem-and-leaf plots for small or moderate-sized
    data sets and frequency distributions and
    histograms for large data sets.
  • Compute measures of central tendency (mean and
    median) and compare with the theoretical and
    practical properties of the normal distribution.
    Compute the interquartile range. Does it
    approximate 1.33 times the standard
    deviation?
  • How are the observations in the data set
    distributed? Do approximately two-thirds of the
    observations lie within plus or minus 1 standard
    deviation of the mean? Do approximately
    four-fifths of the observations lie within plus
    or minus 1.28 standard deviations of the mean?
    Do approximately 19 out of every 20 observations
    lie within plus or minus 2 standard deviations
    of the mean?
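
The checks above can be sketched as a small helper. This is an illustration only: the function name `normality_checks` and the simulated demo data are assumptions, not part of the slides.

```python
import random
import statistics

def normality_checks(data):
    """Compare a data set with benchmarks that hold for a normal distribution."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # first and third quartiles

    def share_within(k):
        # Proportion of observations within k standard deviations of the mean.
        return sum(abs(x - mean) <= k * sd for x in data) / len(data)

    return {
        "mean_minus_median": mean - statistics.median(data),  # near 0 if symmetric
        "iqr_over_sd": (q3 - q1) / sd,          # slide benchmark: about 1.33
        "within_1_sd": share_within(1),         # about two-thirds
        "within_1.28_sd": share_within(1.28),   # about four-fifths
        "within_2_sd": share_within(2),         # about 19 out of 20
    }

# Hypothetical demo data: 2000 simulated draws from a normal population.
rng = random.Random(1)
demo = [rng.gauss(100, 15) for _ in range(2000)]
report = normality_checks(demo)
for name, value in report.items():
    print(name, round(value, 3))
```

For exactly normal data the interquartile range is closer to 1.35 standard deviations, so a ratio anywhere near 1.3 to 1.4 is consistent with the slide's 1.33 rule of thumb.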

9
Why do I care if X-bar, the sample mean, is
normally distributed?
10
Because I want to use Z scores to analyze sample
means.
  • But to use Z scores, the data must be normally
    distributed.
  • That's where the Central Limit Theorem steps in.
  • Recall that the Central Limit Theorem states that
    sample means are normally distributed regardless
    of the shape of the underlying population if the
    sample size is sufficiently large.

11
Recall from Chapter 5
  • \( Z = \frac{X - \mu}{\sigma} \)
  • If sample means are normally distributed, the Z
    score formula applied to sample means would be
  • \( Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \)

12
Background
  • To determine \( \mu_{\bar{X}} \), we would need to randomly
    draw out all possible samples of the given size
    from the population, compute the sample means,
    and average them. This task is unrealistic.
    Fortunately, \( \mu_{\bar{X}} \) equals the population mean µ,
    which is easier to access.
  • Likewise, to compute \( \sigma_{\bar{X}} \), we would
    have to take all possible samples of a given size
    from a population, compute the sample means, and
    determine the standard deviation of sample means.
    This task is also unrealistic. Fortunately,
    \( \sigma_{\bar{X}} \) can be computed as the population
    standard deviation divided by the square root of
    the sample size: \( \sigma_{\bar{X}} = \sigma / \sqrt{n} \).

13
Note
  • As the sample size increases,
  • the standard deviation of the sample means
    becomes smaller and smaller
  • because the population standard deviation is
    being divided by larger and larger values of the
    square root of n.
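
A short sketch of this shrinkage, assuming a hypothetical population standard deviation of 9.00 and a handful of illustrative sample sizes:

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation; sample sizes are illustrative.
for n in (1, 4, 25, 100, 400):
    print(n, round(standard_error(9.0, n), 2))
```

Quadrupling the sample size only halves the standard deviation of the sample means, because n enters through its square root.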

14
The ultimate benefit of the central limit theorem
is a useful version of the Z formula for sample
means.
15
Z Formula for Sample Means: \( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \)
16
Example
  • The mean expenditure per customer at a tire store
    is $85.00, with a standard deviation of $9.00.
  • If a random sample of 40 customers is taken,
    what is the probability that the sample average
    expenditure per customer for this sample will be
    $87.00 or more?

17
Because the sample size is greater than 30, the
central limit theorem says the sample means are
normally distributed.
  • \( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \)
  • \( Z = \frac{87.00 - 85.00}{9.00 / \sqrt{40}} \)
  • \( Z = 2.00 / 1.42 = 1.41 \)

18
  • For Z = 1.41 in the Z distribution table, the
    probability is .4207.
  • This represents the probability of getting a
    sample mean between the population mean, 85.00,
    and 87.00.
  • Solving for the tail of the distribution
    yields
  • \( .5000 - .4207 = .0793 \)
  • This is the probability of X-bar ≥ 87.00.
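
The same tail probability can be checked without a table. A minimal sketch using the standard library's `NormalDist`; the helper name is an assumption for illustration.

```python
import math
from statistics import NormalDist

def prob_sample_mean_at_least(x_bar, mu, sigma, n):
    """Return (Z, P(X-bar >= x_bar)) for an approximately normal sample mean."""
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    return z, 1.0 - NormalDist().cdf(z)

# Tire store example: mean 85.00, standard deviation 9.00, n = 40.
z, tail = prob_sample_mean_at_least(87.00, 85.00, 9.00, 40)
print(round(z, 2), round(tail, 4))
```

Carrying the unrounded Z through gives a tail area of about .080; the slide's .0793 comes from rounding Z to 1.41 before the table lookup.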

19
Interpretations
  • Therefore, 7.93% of the time, a random sample of
    40 customers from this population will yield a
    mean expenditure of $87.00 or more.
  • OR
  • From any random sample of 40 customers, 7.93% of
    them will spend on average $87.00 or more.

20
Interpretations
  • Correct: 7.93% of the time, a random sample
    of 40 customers from this population will yield a
    mean expenditure of $87.00 or more.
  • Incorrect: From any random sample of 40 customers,
    7.93% of them will spend on average $87.00 or more.
    The probability describes the sample mean, not
    individual customers.

21
Solve
  • Suppose that during any hour in a large
    department store, the average number of shoppers
    is 448, with a standard deviation of 21 shoppers.
  • What is the probability that a random sample of
    49 different shopping hours will yield a sample
    mean between 441 and 446 shoppers?
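
One way to check a hand solution to this exercise, sketched under the assumption that the central limit theorem applies (n = 49): model the sample mean as normal with standard deviation σ/√n and subtract two cumulative probabilities. The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def prob_sample_mean_between(low, high, mu, sigma, n):
    """P(low <= X-bar <= high) for an approximately normal sample mean."""
    sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

p = prob_sample_mean_between(441, 446, mu=448, sigma=21, n=49)
print(round(p, 4))
```

A table-based answer may differ in the third decimal place because the Z scores (about -2.33 and -0.67) are rounded before lookup.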

22
Statistical Inference
23
Statistical Inference facilitates decision making.
24
Via sample data, we can estimate something
about our population, such as its average value
µ, by using the corresponding sample mean,
X-bar.
25
Recall that µ, the population mean to be
estimated, is a parameter, while X-bar, the
sample mean, is a statistic.
26
Point Estimate
  • A point estimate is a statistic taken from a
    sample and is used to estimate a population
    parameter.
  • However, a point estimate is only as good as the
    sample it represents. If other random samples
    are taken from the population, the point
    estimates derived from those samples are likely
    to vary.
  • Because of variation in sample statistics,
    estimating a population parameter with a
    confidence interval is often preferable to using
    a point estimate.

27
Confidence Interval
  • A confidence interval is a range of values within
    which it is estimated with some confidence the
    population parameter lies.
  • Confidence intervals can be one or two-tailed.

28
Confidence Interval to Estimate µ
  • By rearranging the Z formula for sample means, a
    confidence interval formula is constructed
  • \( \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \)
  • Where
  • α = the area under the normal curve outside the
    confidence interval
  • α/2 = the area in one tail of the distribution
    outside the confidence interval

29
  • The confidence interval formula yields a range
    (interval) within which we feel with some
    confidence the population mean is located.
  • It is not certain that the population mean is in
    the interval unless we have a 100% confidence
    interval that is infinitely wide, so wide that it
    is meaningless.

30
Confidence interval estimates for five different
samples of n = 25, taken from a population where
µ = 368 and σ = 15
31
Common levels of confidence used by
analysts are 90%, 95%, 98%, and 99%.
32
95% Confidence Interval
  • For 95% confidence, α = .05 and α/2 = .025. The
    value of \( Z_{.025} \) is found by looking in the
    standard normal table under .5000 - .025 = .4750.
    This area in the table is associated with a Z
    value of 1.96.
  • An alternate method: multiply the confidence
    level, 95%, by ½ (since the distribution is
    symmetric and the interval extends equally on
    each side of the mean).
  • (½)(.95) = .4750 (the area on each side of the
    mean) has a corresponding Z value of 1.96.
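
The table lookup can be reproduced with the inverse normal CDF. A minimal sketch; `z_critical` is a hypothetical helper name, and the confidence level is passed as a fraction.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value Z_{alpha/2} for a confidence level in (0, 1)."""
    alpha = 1.0 - confidence
    # Area to the left of the upper critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_critical(level), 3))
```

For the four common levels this gives values that match the usual table entries of 1.645, 1.96, 2.33, and 2.575 to within table rounding.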

33
In other words, of all the possible X-bar values
along the horizontal axis of the normal
distribution curve, 95% of them should be within
a Z score of ±1.96 from the mean.
34
Margin of Error
  • \( Z \frac{\sigma}{\sqrt{n}} \)

35
Example
  • A business analyst for a cellular telephone company
    takes a random sample of 85 bills for a recent
    month and from these bills computes a sample mean
    of 153 minutes. If the company uses the sample
    mean of 153 minutes as an estimate for the
    population mean, then the sample mean is being
    used as a POINT ESTIMATE. Past history and
    similar studies indicate that the population
    standard deviation is 46 minutes.
  • The value of Z is decided by the level of
    confidence desired. A confidence level of 95% has
    been selected.

36
\( 153 \pm 1.96\,(46 / \sqrt{85}) \), giving \( 143.22 \le \mu \le 162.78 \)
  • The confidence interval is constructed from the
    point estimate, 153 minutes, and the margin of
    error of this estimate, ± 9.78 minutes.
  • The resulting confidence interval is 143.22 ≤ µ ≤
    162.78.
  • The cellular telephone company business analyst
    is 95% confident that the average length of a
    call for the population is between 143.22 and
    162.78 minutes.
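
The interval arithmetic can be sketched as follows, assuming σ is known. The function name `z_interval` is illustrative, not from the slides.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known.

    Returns (lower, upper, margin_of_error).
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Cellular example: sample mean 153 minutes, sigma 46 minutes, n = 85.
low, high, margin = z_interval(153, 46, 85)
print(round(low, 2), round(high, 2), round(margin, 2))
```

Raising the confidence level or shrinking n widens the interval, since both increase the margin of error.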

37
Interpreting a Confidence Interval
  • For the previous 95% confidence interval, the
    following conclusions are valid:
  • I am 95% confident that the average length of a
    call for the population, µ, lies between 143.22
    and 162.78 minutes.
  • If I repeatedly obtained samples of size 85, then
    95% of the resulting confidence intervals would
    contain µ and 5% would not. QUESTION: Does this
    confidence interval, 143.22 to 162.78, contain µ?
    ANSWER: I don't know. All I can say is that this
    procedure leads to an interval containing µ 95%
    of the time.
  • I am 95% confident that my estimate of µ, namely
    153 minutes, is within 9.78 minutes of the actual
    value of µ. RECALL: 9.78 is the margin of error.
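
The repeated-sampling interpretation can be illustrated with a simulation. This sketch assumes a normal population with a hypothetical true mean of 150 minutes (in practice µ is unknown) and reuses σ = 46 and n = 85 from the example.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, reps, seed=0):
    """Share of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

# mu = 150 is an assumed value chosen only for the simulation.
share = coverage(mu=150, sigma=46, n=85, confidence=0.95, reps=4000)
print(round(share, 3))
```

The share of intervals that capture µ should land close to .95. Any single interval either contains µ or does not, which is why the 95% describes the procedure rather than one computed interval.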

38
Be Careful! The following statement is NOT true
  • The probability that µ lies between 143.22 and
    162.78 is .95.
  • Once you have inserted your sample results into
    the confidence interval formula, the word
    PROBABILITY can no longer be used to describe the
    resulting confidence interval.

39
Confidence Interval Estimation of the Mean (σ
Unknown)
  • In reality, the actual standard deviation of the
    population, σ, is usually unknown.
  • Therefore, we use s (the sample standard deviation)
    to compute the confidence interval for the
    population mean, µ.
  • However, by using s in place of σ, the
    standard normal Z distribution no longer applies.
  • Fortunately, the t-distribution will work,
    provided the population from which we obtain the
    sample is normally distributed.

40
Assumptions necessary to use t-distribution
  • Assumes random variable x is normally distributed
  • However, if sample size is large enough (> 30),
    t-distribution can be used when σ is unknown.
  • But if sample size is small, evaluate the shape
    of the sample data using a histogram or
    stem-and-leaf.
  • As the sample size increases, the t-distribution
    approaches the Z distribution.

41
Confidence Interval using a t-distribution
  • \( \bar{X} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} \)
  • α = the area outside the confidence interval
    (1 minus the confidence level)
  • n - 1 = degrees of freedom

42
Example
  • As a consultant I have been employed to estimate
    the average amount of comp time accumulated per
    week for managers in the aerospace industry.
  • I randomly sample 18 managers and measure the
    amount of extra time they work during a specific
    week and obtain the following results (in hours).
    Assume a 90% confidence interval.
  • AEROSPACE DATA
  • 6 21 17 20 7 0 8 16 29
  • 3 8 12 11 9 21 25 15 16

43
Solution
  • To construct a 90% confidence interval to
    estimate the average amount of extra time per
    week worked by a manager in the aerospace
    industry, I assume that comp time is normally
    distributed in the population.
  • The sample size is 18, so df = 17.
  • A 90% level of confidence results in an α/2 =
    .05 area in each tail.
  • The table t-value is \( t_{.05,17} = 1.740 \).

44
  • With a sample mean of 13.56 hours, and a
    sample standard deviation of 7.8 hours, the
    confidence interval is computed
  • \( \bar{X} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} \)
  • \( 13.56 \pm 1.740\,(7.8 / \sqrt{18}) = 13.56 \pm 3.20 \)
  • \( 10.36 \le \mu \le 16.76 \)
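
The computation can be reproduced from the raw aerospace data. Python's standard library has no t-distribution, so this sketch takes the table value 1.740 from the slides as an input; the function name `t_interval` is an assumption.

```python
import math
import statistics

# Aerospace comp-time data from the example (hours per week, n = 18).
hours = [6, 21, 17, 20, 7, 0, 8, 16, 29,
         3, 8, 12, 11, 9, 21, 25, 15, 16]

def t_interval(data, t_value):
    """Confidence interval for the mean using a supplied table t-value.

    Returns (x_bar, s, margin, (lower, upper)).
    """
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    margin = t_value * s / math.sqrt(n)
    return x_bar, s, margin, (x_bar - margin, x_bar + margin)

x_bar, s, margin, (low, high) = t_interval(hours, t_value=1.740)
print(round(x_bar, 2), round(s, 2), round(margin, 2))
print(round(low, 2), round(high, 2))
```

The unrounded upper limit is about 16.75; the slide's 16.76 comes from adding the already rounded values 13.56 and 3.20.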

45
Interpretation
  • The point estimate for this problem is 13.56
    hours, with an error of ± 3.20 hours.
  • I am 90% confident that the average amount of
    comp time accumulated by a manager per week in
    this industry is between 10.36 and 16.76 hours.

46
Recommendations
  • From these figures, the aerospace industry could
    attempt to build a reward system for such extra
    work or evaluate the regular 40-hour week to
    determine how to use the normal work hours more
    effectively and thus reduce comp time.

47
Solve
  • I own a large equipment rental company and I want
    to make a quick estimate of the average number of
    days a piece of ditch digging equipment is rented
    out per person per time. The company has records
    of all rentals, but the amount of time required
    to conduct an audit of all accounts would be
    prohibitive.
  • I decide to take a random sample of rental
    invoices.
  • Fourteen different rentals of ditch diggers are
    selected randomly from the files.
  • Use the following data to construct a 99%
    confidence interval to estimate the average
    number of days that a ditch digger is rented and
    assume that the number of days per rental is
    normally distributed in the population.

48
Ditch Digger Data
  • 3 1 3 2 5 1 2 1 4 2 1 3 1 1

49
Stay-tuned
50
Estimating the Population Proportion
  • For most businesses, estimating market share
    (their proportion of the market) is important because
    many company decisions evolve from market share
    information
  • What proportion of my customers pay late?
  • What proportion don't pay at all?
  • What proportion of the produced goods are
    defective?
  • What proportion of the population has cats/ dogs/
    horses/ kids/ exercises/ reads?

51
Confidence Interval Estimate for the Proportion
  • \( p_s \pm Z \sqrt{p_s(1 - p_s)/n} \)
  • \( p_s - Z \sqrt{p_s(1 - p_s)/n} \le p \le p_s + Z \sqrt{p_s(1 - p_s)/n} \)
  • \( p_s \) = sample proportion = X / n = number of
    successes / sample size. This is the POINT
    ESTIMATE.
  • p = population proportion
  • Z = critical value from the standardized normal
    distribution
  • n = sample size

52
\( p_s \pm Z \sqrt{p_s(1 - p_s)/n} \)
  • NOTE: This formula can be applied only when np
    and n(1 - p) are at least 5.

53
Example
  • A study of 87 randomly selected companies with a
    telemarketing operation revealed that 39% of the
    sampled companies had used telemarketing to
    assist them in order processing.
  • Using this information, how could a researcher
    estimate the population proportion of
    telemarketing companies that use their
    telemarketing operation to assist them in order
    processing?

54
Solution
  • The sample proportion \( p_s \) = .39.
  • This is the point estimate of the population
    proportion, p.
  • The Z value for 95% confidence is 1.96.
  • The value of \( (1 - p_s) = 1 - .39 = .61 \).

55
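\( p_s \pm Z \sqrt{p_s(1 - p_s)/n} \)
  • \( p_s - Z \sqrt{p_s(1 - p_s)/n} \le p \le p_s + Z \sqrt{p_s(1 - p_s)/n} \)
  • The confidence interval estimate is
  • \( .39 - 1.96 \sqrt{(.39)(.61)/87} \le p \le .39 + 1.96 \sqrt{(.39)(.61)/87} \)
  • \( .39 - .10 \le p \le .39 + .10 \)
  • \( .29 \le p \le .49 \)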
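
A minimal sketch of the proportion interval. The function name is illustrative, and the np and n(1 - p) check uses the sample proportion as a stand-in for the unknown p, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def proportion_interval(p_s, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion.

    Returns (lower, upper, margin_of_error).
    """
    # Rule of thumb from the slides, checked with p_s in place of p.
    if n * p_s < 5 or n * (1 - p_s) < 5:
        raise ValueError("np and n(1-p) should both be at least 5")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * math.sqrt(p_s * (1 - p_s) / n)
    return p_s - margin, p_s + margin, margin

# Telemarketing example: p_s = .39, n = 87, 95% confidence.
low, high, margin = proportion_interval(0.39, 87)
print(round(low, 2), round(high, 2), round(margin, 2))
```

The same call with a different confidence level, for example `proportion_interval(34 / 212, 212, confidence=0.90)`, handles a 90% interval for 34 successes in 212 trials.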

56
Interpretation
  • We are 95% confident that the population
    proportion of telemarketing firms that use their
    operation to assist order processing is somewhere
    between .29 and .49.
  • There is a point estimate of .39 with a margin of
    error of ± .10.

57
Solve
  • A clothing company produces men's jeans. The
    jeans are made and sold with either a regular cut
    or a boot cut.
  • In an effort to estimate the proportion of their
    men's jeans market in Oklahoma City that is for
    boot-cut jeans, the analyst takes a random sample
    of 212 jeans sales from the company's two
    Oklahoma City retail outlets.
  • Only 34 of the sales were for boot-cut jeans.
  • Construct a 90% confidence interval to estimate
    the proportion of the population in Oklahoma City
    who prefer boot-cut jeans.

58
Solution
  • \( p_s = 34/212 = .16 \)
  • A point estimate for boot-cut jeans is .16 or
    16%.
  • The Z value for 90% level of confidence is 1.645.
  • The confidence interval estimate is
  • \( p_s - Z \sqrt{p_s(1 - p_s)/n} \le p \le p_s + Z \sqrt{p_s(1 - p_s)/n} \)
  • \( .16 - 1.645 \sqrt{(.16)(.84)/212} \le p \le .16 + 1.645 \sqrt{(.16)(.84)/212} \)
  • \( .16 - .04 \le p \le .16 + .04 \)
  • \( .12 \le p \le .20 \)
  • We are 90% confident that the proportion of
    boot-cut jeans is between 12% and 20%.

59
Estimating Sample Size
  • The amount of sampling error you are willing to
    accept and the level of confidence desired
    determine the size of your sample.

60
Sample size when Estimating µ
  • \( n = Z^2 \sigma^2 / e^2 \)
  • \( e = Z \, (\sigma / \sqrt{n}) \)

61
To determine sample size
  • Know the desired confidence level, which
    determines the value of Z (the critical value
    from the standardized normal distribution).
    Determining the confidence level is subjective.
  • Know the acceptable sampling error, e: the amount
    of error that can be tolerated.
  • Know the standard deviation, σ. If unknown,
    estimate by
  • past data
  • educated guess
  • estimate σ ≈ range/4. This estimate is
    derived from the empirical rule stating that
    approximately 95% of the values in a normal
    distribution are within ± 2σ of the mean,
    so the range within which most of the values
    are located spans about 4σ.

62
Example
  • Suppose the marketing manager wishes to estimate
    the population mean annual usage of home heating
    oil to within ± 50 gallons of the true value,
    and he wants to be 95% confident of correctly
    estimating the true mean.
  • On the basis of a study taken the previous year,
    he believes that the standard deviation can be
    estimated as 325 gallons.
  • Find the sample size needed.

63
Solution
  • With e = 50, σ = 325, and 95% confidence (Z =
    1.96)
  • \( n = Z^2 \sigma^2 / e^2 = (1.96)^2 (325)^2 / (50)^2 \)
  • n = 162.31
  • Therefore, n = 163. As a general rule for
    determining sample size, always round up to the
    next integer value in order to slightly over
    satisfy the criteria desired.
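
The sample-size formula and the round-up rule can be sketched as follows; the function name is an assumption for illustration.

```python
import math

def sample_size_for_mean(z, sigma, e):
    """Return (raw n, n rounded up) from n = Z^2 * sigma^2 / e^2."""
    raw = (z ** 2) * (sigma ** 2) / (e ** 2)
    return raw, math.ceil(raw)  # always round up to the next integer

# Heating oil example: Z = 1.96, sigma = 325 gallons, e = 50 gallons.
raw, n = sample_size_for_mean(z=1.96, sigma=325, e=50)
print(round(raw, 2), n)
```

Because e is squared in the denominator, halving the tolerated error roughly quadruples the required sample size.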

64
Solve
  • Suppose you want to estimate the average age of
    all Boeing 727 airplanes now in active domestic
    U.S. service.
  • You want to be 95% confident, and you want your
    estimate to be within 2 years of the actual
    figure.
  • The 727 was first placed in service about 30
    years ago, but you believe that no active 727s in
    the U.S. domestic fleet are more than 25 years
    old.
  • How large a sample should you take?

65
Solution
  • With e = 2 years,
  • Z value for 95% = 1.96,
  • and σ unknown,
  • it must be estimated by using σ ≈ range / 4. As
    the range of ages is 0 to 25 years, σ ≈ 25 / 4 =
    6.25.

66
\( n = Z^2 \sigma^2 / e^2 \)
  • \( n = Z^2 \sigma^2 / e^2 = (1.96)^2 (6.25)^2 / (2)^2 \)
  • = 37.52 airplanes.
  • Because you cannot sample 37.52 units, the
    required sample size is 38.
  • If you randomly sample 38 planes, you can
    estimate the average age of active 727s within 2
    years and be 95% confident of the results.

67
Solve
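  • Determine the sample size necessary to estimate µ
    when values range from 80 to 500, error is to be
    within 10, and the confidence level is 90%.
  • \( n = Z^2 \sigma^2 / e^2 \)
  • Answer: with σ ≈ (500 - 80) / 4 = 105 and Z = 1.645,
    n = (1.645)² (105)² / (10)² = 298.34, so the
    required sample size is 299.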

68
Determining sample size for proportion
  • \( n = Z^2 p(1 - p) / e^2 \)
  • p = population proportion (if unknown, analysts
    use .5 as an estimate of p in the formula)
  • e = error of estimation, equal to \( (p_s - p) \), the
    difference between the sample proportion and the
    parameter to be estimated, p. Represents the amount
    of error you are willing to tolerate.

69
Solve
  • The Packer, a produce industry trade publication,
    wants to survey Americans and ask whether they
    are eating more fresh fruits and vegetables than
    they did 1 year ago.
  • The organization wants to be 90% confident in
    its results and maintain an error within .05. How
    large a sample should it take?
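
One way to check a hand solution, sketched under the slide's convention of using p = .5 when the population proportion is unknown and Z = 1.645 for 90% confidence. The function name is hypothetical.

```python
import math

def sample_size_for_proportion(z, e, p=0.5):
    """Return (raw n, n rounded up) from n = Z^2 * p * (1 - p) / e^2."""
    raw = (z ** 2) * p * (1 - p) / (e ** 2)
    return raw, math.ceil(raw)  # round up to the next integer

# Packer survey: 90% confidence (Z = 1.645), error within .05, p unknown.
raw, n = sample_size_for_proportion(z=1.645, e=0.05)
print(round(raw, 2), n)
```

Using p = .5 is the conservative choice because p(1 - p) is largest there, so the resulting n is sufficient whatever the true proportion turns out to be.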
