Inferences Based on a Single Sample

Provided by: computi251

1
Chapter 7
  • Inferences Based on a Single Sample

2
Parameters and Statistics
  • A parameter is a numeric characteristic of a
    population or distribution, usually symbolized by
    a Greek letter, such as µ, the population mean.
  • Inferential statistics uses sample information to
    estimate parameters.
  • A statistic is a number calculated from data.
  • There are usually statistics that do the same job
    for samples that the parameters do for
    populations, such as x̄, the sample mean.

3
Using Samples for Estimation
[Figure: a sample (known statistic) is used to estimate the population (unknown parameter, µ).]

4
The Idea of Estimation
  • We want to find a way to estimate the population
    parameters.
  • We only have information from a sample, available
    in the form of statistics.
  • The sample mean, x̄, is an estimator of the
    population mean, µ.
  • This is called a point estimate because it is
    one point, or a single value.

5
Interval Estimation
  • There is variation in x̄, since it is a random
    variable calculated from data.
  • A point estimate doesn't reveal anything about
    how much the estimate varies.
  • An interval estimate gives a range of values that
    is likely to contain the parameter.
  • Intervals are often reported in polls, such as
    "56% ± 4% favor candidate A." This suggests we
    are not sure it is exactly 56%, but we are quite
    sure that it is between 52% and 60%.
  • 56% is the point estimate, whereas (52%, 60%) is
    the interval estimate.

6
The Confidence Interval
  • A confidence interval is a special interval
    estimate involving a percent, called the
    confidence level.
  • The confidence level tells how often, if samples
    were repeatedly taken, the interval estimate
    would surround the true parameter.
  • We can write the interval as (L, U) or (LCL, UCL).
  • L and U stand for Lower and Upper endpoints. The
    longer versions, LCL and UCL, stand for Lower
    Confidence Limit and Upper Confidence Limit.
  • This interval is built around the point estimate.

7
Theory of Confidence Intervals
  • Alpha (α) represents the probability that, when
    the sample is taken, the calculated CI will miss
    the parameter.
  • The confidence level is given by (1 − α)100% and
    is used to name the interval; for example, we may
    have a 90% CI for µ.
  • After sampling, we say that we are, for example,
    90% confident that we have captured the true
    parameter. (There is no probability at this
    point. Either we did or we didn't, but we don't
    know.)
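The "how often, if samples were repeatedly taken" idea can be checked by simulation. This is a minimal Python sketch, not part of the original slides: the population values (µ = 100, σ = 15), the sample size n = 25, the number of repetitions and the seed are all arbitrary assumptions, and the interval used is the σ-known z interval x̄ ± z(α/2)·σ/√n covered later in the chapter. The hypothetical function `coverage` returns the fraction of intervals that captured µ.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=25, conf=0.90, reps=2000, seed=1):
    """Fraction of simulated (1 - alpha)100% z intervals that contain mu.

    mu, sigma, n, reps and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_(alpha/2)
    bound = z * sigma / math.sqrt(n)              # margin of error
    hits = 0
    for _ in range(reps):
        # Draw one sample from a normal population and compute its mean.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - bound < mu < xbar + bound:
            hits += 1
    return hits / reps


print(f"Observed coverage of 90% intervals: {coverage():.3f}")
```

The printed fraction should land close to 0.90. Any single interval either caught µ or missed it; the 90% describes the long-run behavior of the procedure.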

8
How to Calculate CIs
  • Many CIs have the following basic structure:
  • P ± TS
  • where P is the point estimate of the parameter,
  • T is a table value equal to the number of
    standard deviations needed for the confidence
    level,
  • and S is the standard deviation of the estimate.
  • The quantity TS is also called the Error Bound
    (B) or Margin of Error.
  • The CI should be written as (L, U), where
    L = P − TS and U = P + TS.
  • Don't forget to convert your P ± TS expression to
    confidence interval form, including parentheses!
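Every interval in this chapter uses the same P ± TS arithmetic, so it can be written once. This is a minimal Python sketch. The function name `ci_from_estimate` is hypothetical, and the usage numbers are an assumption chosen to match the σ-known example later in the chapter (P = x̄ = 50, T = 1.96, S = σ/√n = 10/√25 = 2).

```python
def ci_from_estimate(p, t, s):
    """Return (L, U) for a CI of the form P ± TS.

    p: point estimate, t: table value, s: standard deviation of the estimate.
    """
    bound = t * s  # the Error Bound B, or Margin of Error
    return (p - bound, p + bound)


L, U = ci_from_estimate(50, 1.96, 2)
print(f"({L:.2f}, {U:.2f})")  # (46.08, 53.92)
```

Returning the pair (L, U) rather than "P ± B" mirrors the advice to always report the interval in parenthesized form.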

9
A Confidence Interval for µ
  • If σ is known, and
  • the population is normally distributed, or n > 30
    (so that we can say x̄ is approximately
    normally distributed), then
    $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
    gives the endpoints for a (1 − α)100% CI for µ.
  • Note how this corresponds to the P ± TS formula
    given earlier.

10
Distribution Details
  • What is $z_{\alpha/2}$?
  • α is the significance level, P(CI will miss).
  • The subscript on z refers to the upper tail
    probability, that is, $P(Z > z_{\alpha/2}) = \alpha/2$.
  • To find this value in the table, look up
    the z-value for a probability of .5 − α/2.
  • Examples
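The table lookup can also be done in software. This minimal Python sketch uses the standard-library `statistics.NormalDist`; the helper name `z_upper` is hypothetical. It returns $z_{\alpha/2}$ directly and also prints the area between 0 and z, which is the .5 − α/2 value you would look up in a table of that type.

```python
from statistics import NormalDist


def z_upper(alpha):
    """z_(alpha/2): the z-value with upper-tail probability alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z = z_upper(alpha)
    # Area between 0 and z, i.e. the .5 - alpha/2 entry in the table.
    area_0_to_z = NormalDist().cdf(z) - 0.5
    print(f"{conf:.0%}: z = {z:.3f}, area from 0 to z = {area_0_to_z:.3f}")
```

This should reproduce the familiar values z = 1.645, 1.960 and 2.576 for 90%, 95% and 99% confidence.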

11
Example: Estimation of µ (σ Known)
  • A random sample of 25 items resulted in a sample
    mean of 50. Construct a 95% confidence interval
    estimate for µ if σ = 10.
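A minimal Python sketch of the σ-known interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ applied to this example. The function name `z_interval` is hypothetical; the inputs (x̄ = 50, σ = 10, n = 25, 95%) come from the slide.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """(1 - alpha)100% CI for mu when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_(alpha/2), about 1.96 here
    bound = z * sigma / math.sqrt(n)         # margin of error B
    return (xbar - bound, xbar + bound)


L, U = z_interval(xbar=50, sigma=10, n=25)
print(f"({L:.2f}, {U:.2f})")  # (46.08, 53.92)
```

By hand: 50 ± 1.96(10/√25) = 50 ± 3.92, giving (46.08, 53.92).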

12
Confidence Interval Estimates
[Figure: chart of confidence interval estimates, branching into Proportion, Mean (σ known or σ unknown), and Variance.]

13
Estimation of µ (σ unknown)
  • We now turn to the situation where σ is unknown
    but the sample size is large or the sampled
    population is normal.
  • Since σ is unknown, we use s in its place.
  • However, without knowing σ, we are not able to
    make use of the z table in building a confidence
    interval.
  • Instead, we will use a distribution called t
    (Student's t).
  • The t distribution is symmetric and bell-shaped
    like the standard normal, and also has µ = 0, but
    σ > 1, so the shape is flatter in the middle and
    thicker in the tails.

14
  • Student's t-Distributions
  • Degrees of Freedom, df
  • A parameter that identifies each different
    distribution in the Student's t family. For
    the methods presented in this chapter, the value
    of df will be the sample size minus 1, df = n − 1.

[Figure: standard normal distribution compared with Student's t distributions for df = 15 and df = 5.]

15
Using t
  • As the previous graph shows, the t distribution
    has another parameter, called degrees of freedom
    (df). So this is actually a family of
    distributions, with different df values.
  • The higher the df, the closer the t distribution
    comes to the standard normal.
  • For our purposes, df = n − 1, the same n − 1 that
    appears in the denominator of the formula for s².
  • There is a t-table in the back of the book. It
    is different from the z-table, so we have to
    understand how it works.

16
The t table
  • Refer to the table. First you will notice the
    left-hand column is for df.
  • When df > 100, the z-table can be used, because
    the values will be very close.
  • This table gives tail probabilities, similar to
    z(α). However, only a selection of probabilities
    is given, across the top of the table.
  • The interior of the table gives the t-values, so
    it is arranged almost the opposite way from the
    z-table.
  • The notation used for t-values is t(df, α).
  • Just like z(α), α refers to the upper tail
    probability.

17
  • t-Distribution Showing t(df, α)

18
  • Example: Find the value of t(12, 0.025).

[Figure: portion of the t-table.]

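When the table does not have the df or tail probability you need, t(df, α) can be computed. Python's standard library has no t distribution, so this sketch is an assumption-laden stand-in: it integrates the Student's t density with Simpson's rule and inverts the upper-tail probability by bisection. The names `t_pdf`, `t_upper_tail` and `t_value` are hypothetical, and the search assumes the answer is below 50, which holds for ordinary table ranges but not for extreme cases such as df = 1 with a tiny α. In practice a library routine (for example SciPy's t distribution) would be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of gamma() for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_value(df, a, hi=50.0):
    """t(df, a): the t-value whose upper-tail probability is a (0 < a < 0.5).

    Bisection on [0, hi]; assumes the answer is below hi.
    """
    lo = 0.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > a:
            lo = mid  # tail area still too big, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2


print(f"t(12, 0.025) = {t_value(12, 0.025):.3f}")
for df in (5, 15, 30, 100):
    print(df, round(t_value(df, 0.025), 3))
```

The first line should print t(12, 0.025) = 2.179, the value read from the t table. The loop illustrates the earlier point that t(df, 0.025) shrinks toward z = 1.960 as df grows.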
19
Confidence Intervals
  • When we build our confidence interval, α refers
    to the probability in both tails.
  • This is not the same α used in looking up the
    distribution! So what we have to look up is
    actually α/2, because that's the upper tail
    probability.
  • And so we come to the formula for a (1 − α)100% CI
    for µ when σ is unknown:
    $\bar{x} \pm t(n-1, \alpha/2)\,\frac{s}{\sqrt{n}}$

20
  • Example: A study is conducted to learn how long
    it takes the typical taxpayer to complete his or
    her federal income tax return. A random sample
    of 17 income tax filers showed a mean time (in
    hours) of 7.8 and a standard deviation of 2.3.
    Find a 95% confidence interval for the true mean
    time required to complete a federal income tax
    return. Assume the time to complete the return
    is normally distributed.
  • Solution
  • 1. Parameter of interest: the mean time required
    to complete a federal income tax return.
  • 2. Confidence interval criteria
  • a. Assumptions: sampled population assumed
    normal, σ unknown.
  • b. Distribution: the table value t will be used.
  • c. Confidence level: 1 − α = 0.95

21
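  • 3. The sample evidence: n = 17, x̄ = 7.8, s = 2.3
  • 4. Calculations: t(16, 0.025) = 2.120, so
    $7.8 \pm 2.120\,\frac{2.3}{\sqrt{17}} = 7.8 \pm 1.18$
  • 5. (6.62, 8.98) is the 95% confidence interval
    for µ.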
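A minimal Python sketch of the σ-unknown interval $\bar{x} \pm t(n-1, \alpha/2)\,s/\sqrt{n}$ for the tax-return example. The function name `t_interval` is hypothetical. To keep the sketch short, the table value is passed in rather than computed; t(16, 0.025) = 2.120 is the standard t-table entry.

```python
import math


def t_interval(xbar, s, n, t_table):
    """CI for mu with sigma unknown: xbar ± t(n - 1, alpha/2) * s / sqrt(n).

    t_table is t(n - 1, alpha/2), looked up from a t table.
    """
    bound = t_table * s / math.sqrt(n)  # margin of error B
    return (xbar - bound, xbar + bound)


# Tax-return example: n = 17, xbar = 7.8 hours, s = 2.3 hours, 95% confidence.
L, U = t_interval(xbar=7.8, s=2.3, n=17, t_table=2.120)
print(f"({L:.2f}, {U:.2f})")  # (6.62, 8.98)
```

Using 1.96 from the z table here would give a narrower interval than the data justify, which is why the t value is used when σ is estimated by s.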

22
Confidence Interval for a Proportion
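  • Assumptions
  • Population Follows Binomial Distribution
  • Normal Approximation Can Be Used if
    $\hat{p} \pm 3\sqrt{\hat{p}\hat{q}/n}$ does not include 0
    or 1
  • Or (older guideline) if np and nq are both large
    enough
  • Confidence Interval Estimate:
    $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$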

23
Example
  • A random sample of 400 graduates showed 32 went
    to grad school. Set up a 95% confidence interval
    estimate for p.
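A minimal Python sketch of the large-sample proportion interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ for the graduate-school example (x = 32, n = 400). The function name `prop_interval` is hypothetical, and the validity check assumes the "$\hat{p} \pm 3\sqrt{\hat{p}\hat{q}/n}$ does not include 0 or 1" guideline.

```python
import math
from statistics import NormalDist


def prop_interval(x, n, conf=0.95):
    """Large-sample CI for p, plus a flag for the 3-standard-deviation check."""
    p_hat = x / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_(alpha/2)
    sd = math.sqrt(p_hat * q_hat / n)             # standard deviation of p_hat
    # Normal approximation is judged OK if p_hat ± 3*sd stays inside (0, 1).
    ok = 0 < p_hat - 3 * sd and p_hat + 3 * sd < 1
    return (p_hat - z * sd, p_hat + z * sd), ok


(L, U), ok = prop_interval(32, 400)
print(f"({L:.4f}, {U:.4f}), normal approximation OK: {ok}")
```

By hand: p̂ = 32/400 = .08 and .08 ± 1.96√(.08 × .92/400) = .08 ± .0266, so the interval is about (.0534, .1066).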

24
New Method
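  • A new method (Agresti & Coull, 1998) can be used
    to avoid the problems with extreme values of p.
    There is no need to check the np or nq values
    with this method.
  • Define $\tilde{p} = \frac{x + 2}{n + 4}$, where x is the
    number of successes.
  • Then a (1 − α)100% CI for p is given by
    $\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$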

25
Example
  • In the 2004 presidential election, Ralph Nader
    had about 0.34% of the vote. Suppose an exit
    poll was taken to estimate Nader's share of the
    vote, with a sample size of 200, and 2 people
    indicated they voted for Nader.
  • Note that with the traditional method,
    $\hat{p} \pm 3\sqrt{\hat{p}\hat{q}/n} = .01 \pm .021$ includes 0,
    so the formula is not valid.
  • Use the p̃ method to construct a 95% CI for p.
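A minimal Python sketch of the adjusted (Agresti–Coull) interval for the Nader exit poll, x = 2 and n = 200. The function name `adjusted_prop_interval` is hypothetical; it implements $\tilde{p} = (x+2)/(n+4)$ and $\tilde{p} \pm z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$. It also repeats the traditional 3-standard-deviation check to show why that method fails here.

```python
import math
from statistics import NormalDist


def adjusted_prop_interval(x, n, conf=0.95):
    """Adjusted CI for p using p_tilde = (x + 2) / (n + 4)."""
    p_tilde = (x + 2) / (n + 4)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_(alpha/2)
    bound = z * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return (p_tilde - bound, p_tilde + bound)


x, n = 2, 200
p_hat = x / n
sd = math.sqrt(p_hat * (1 - p_hat) / n)
print("traditional check fails:", p_hat - 3 * sd < 0)  # True

L, U = adjusted_prop_interval(x, n)
print(f"({L:.4f}, {U:.4f})")  # (0.0006, 0.0386)
```

Here p̃ = 4/204 ≈ .0196, and the interval (.0006, .0386) stays above 0 and contains Nader's actual share of .0034.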

26
Choosing CI Formulas
27
Sample Size Calculation
  • We may wish to decide upon a sample size so that
    we can get a confidence interval with a
    pre-determined width.
  • This is common in polls, where the margin of
    error is usually decided in advance.
  • All CIs we have seen so far have the form P ± B,
    where B is the margin of error.
  • We want to fix B in advance.

28
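Sample Size for Estimating µ, σ Known
  • Suppose X is a random variable with σ = 10 and we
    want a 90% CI to have a Bound, or Margin of
    Error, of 3.
  • Use the formula $B = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$.
  • Fill in the numbers: $3 = 1.645\,\frac{10}{\sqrt{n}}$
  • Solve: $n = \left(\frac{1.645 \times 10}{3}\right)^2 = 30.07$
  • This is the minimum sample size, but we need a
    whole number, so round up to n = 31.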
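Solving $B = z_{\alpha/2}\,\sigma/\sqrt{n}$ for n gives $n = (z_{\alpha/2}\,\sigma/B)^2$, rounded up. A minimal Python sketch with a hypothetical function name `n_for_mean`, using the slide's numbers (σ = 10, B = 3, 90% confidence).

```python
import math
from statistics import NormalDist


def n_for_mean(sigma, bound, conf):
    """Smallest n whose z interval for mu has margin of error <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_(alpha/2)
    return math.ceil((z * sigma / bound) ** 2)    # always round up


print(n_for_mean(sigma=10, bound=3, conf=0.90))  # 31
```

`math.ceil` does the rounding up, since rounding 30.07 down to 30 would leave the margin of error slightly above 3.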

29
Sample Size for Estimating µ, σ Unknown
  • If σ is unknown, the confidence interval will be
    calculated using the t distribution, unless n is
    very large.
  • But the degrees of freedom depend on n, which we
    don't know.
  • The calculation also depends on s, which we don't
    know until after sampling.
  • We must have an initial guess for σ, and then use
    the normal distribution to approximate the t
    distribution, since it does not require knowing n.

30
Example (σ unknown)
  • A manufacturer needs to be able to estimate the
    width of a new part to within 2 mm with 95%
    confidence. There is not enough history to know
    what σ would be, so a pilot study is run by
    measuring 6 parts, and finding s = 3.4 mm.
  • Using s as the guess for σ:
    $n = \left(\frac{1.96 \times 3.4}{2}\right)^2 = 11.10$
  • Rounding up to the next whole number gives n = 12.
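The same formula works when σ is unknown, with the pilot-study s as the initial guess and z standing in for t. A minimal Python sketch using the slide's values (s = 3.4 mm, B = 2 mm, 95% confidence); the variable names are illustrative.

```python
import math
from statistics import NormalDist

s_pilot = 3.4  # pilot-study estimate of sigma (mm), used as the initial guess
bound = 2.0    # desired margin of error (mm)

# z_(.025) stands in for t, because df = n - 1 is not known yet.
z = NormalDist().inv_cdf(0.975)
n_exact = (z * s_pilot / bound) ** 2
n = math.ceil(n_exact)
print(f"n = {n_exact:.2f}, round up to {n}")  # n = 11.10, round up to 12
```

Because s comes from only 6 parts, the resulting n is only as good as that guess.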

31
Sample Size for Estimating p, a Population
Proportion
  • With a population proportion, we also have a
    problem in getting the standard deviation part of
    the Margin of Error, since it depends on p, the
    thing we are trying to estimate.
  • There are two possibilities
  • 1) We may have a preliminary guess about p that
    we can use, or
  • 2) We can use p = .5 because that maximizes
    p(1 − p), and hence the standard deviation.
  • The sample size will be calculated from the
    desired margin of error, or error bound.

32
Example (proportion)
  • A pollster wants to do a simple random sample to
    estimate the proportion of the population
    favoring an increase in property taxes for school
    funding. He wants a margin of error of 3%, with
    90% confidence. The general belief is that it
    will be a close election, so an initial value of
    p = .5 is reasonable.
  • $n = \frac{z_{\alpha/2}^2\,p(1-p)}{B^2} = \frac{1.645^2(.5)(.5)}{.03^2} = 751.67$
  • Rounding up to the next whole number gives n = 752.
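A minimal Python sketch of $n = z_{\alpha/2}^2\,p(1-p)/B^2$ with a hypothetical function name `n_for_proportion`. The second call uses an arbitrary guess of p = .3, not from the slides, only to show that p = .5 gives the largest, most conservative n.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, bound, conf):
    """Smallest n whose CI for a proportion has margin of error <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_(alpha/2)
    return math.ceil(z ** 2 * p * (1 - p) / bound ** 2)


print(n_for_proportion(p=0.5, bound=0.03, conf=0.90))  # 752
print(n_for_proportion(p=0.3, bound=0.03, conf=0.90))  # smaller than for p = .5
```

The margin of error is entered as a proportion (.03), not as a percent (3).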

33
Misc. Notes
  • The CI for µ formula using z is also called the
    Large Sample CI. It is valid when σ is known,
    for any sample size if the population is normal,
    but it also serves as an approximation of the t
    formula (using s) when n is large. How large?
    Many books say n > 30. I recommend making use of
    the t table up to n = 100, since that is how far
    it goes. Statistical computer programs will
    always calculate t values, regardless of how
    large n is, for the σ unknown case.

34
Misc. Notes
  • The CI for µ formula using t is also called the
    Small Sample CI, but only because the other one
    is called Large Sample. It is valid for any
    sample size when σ is unknown and the population
    is normal.
  • We do not cover methods for small samples that do
    not come from a normal population in this course
    (non-parametric methods).

35
Misc. Notes
  • The t table is limited because it does not have a
    very good selection of probabilities. It also
    jumps in the df column. It is possible to use
    the closest value or interpolate when you can't
    find what you need, but a better option is to use
    the Excel functions, TDIST and TINV.
  • However, you have to be VERY careful about what
    Excel is giving you.

36
Excel's TDIST Function
  • TDIST takes a t value and returns the tail
    probability. You can choose one or two tails.

37
Excel's TINV Function
  • The TINV Function takes a two-tailed probability
    and returns a t-value (just what we need now).

38
Excel Function Comparison
  • The NORMSINV Function, by contrast, takes a
    left-tailed probability and returns a z-value.
    This means you have to enter α/2 and take the
    negative, or else use 1 − α/2 as the argument.
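The same left-tail convention appears outside Excel. As a sketch of the equivalence (not a replacement for Excel), Python's `statistics.NormalDist.inv_cdf`, like NORMSINV, takes a left-tail probability, so both routes described above lead to the same $z_{\alpha/2}$.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Route 1: enter alpha/2 (a left-tail probability) and take the negative.
z_route1 = -std_normal.inv_cdf(alpha / 2)

# Route 2: enter 1 - alpha/2 directly.
z_route2 = std_normal.inv_cdf(1 - alpha / 2)

print(f"{z_route1:.3f} {z_route2:.3f}")  # 1.960 1.960
```

Forgetting the sign in route 1 gives −1.96, which would flip the interval's endpoints.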