Title: Estimation in Sampling
1Estimation in Sampling
2Conceptual Setting
- How do we come to conclusions from empirical evidence?
- Isn't common sense enough?
- Why?
- Systematic methods for drawing conclusions from data
- Statistical inference
- Inductive versus deductive reasoning
3Drawing Conclusions
- Statistical inference
- Based on the laws of probability
- What would happen if?
- You ran your experiment hundreds of times
- You repeated your survey over and over again
- Statistic and Parameter
- The proportion of the population who are <disabled>, usually denoted by $p$ (the parameter)
- In an SRS of 1000 people, the proportion of the people who are <disabled>, usually denoted by $\hat{p}$ (p-hat, the statistic)
4Estimating with Confidence
- Say you are conducting an opinion poll
- SRS of 1000 adult television viewers
- You ask these folks if they trust Walter Cronkite when he delivers the nightly news
- Out of 1000, 570 say they trust him
- 57% of the people trust Walter
- $\hat{p}$ is 0.57
- If you collect another set of 1000 television
viewers, what will the rating be?
5Confidence Statement
- We need to add a confidence statement
- We need to say something about the margin of error
- Confidence statements are based on the distribution of the values of the sample proportion that would occur if many independent SRSs were taken from the same population
- The sampling distribution of the statistic
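The idea of "many independent SRSs from the same population" can be sketched with a small simulation. This is only an illustration: it assumes the true proportion is 0.57 (in practice it is unknown), treats the population as very large, and the function name and seed are made up for the example.

```python
import random
import statistics

def simulate_sample_proportions(true_p, n, repetitions, seed=0):
    """Draw `repetitions` simple random samples of size n; return each sample's p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(repetitions):
        # Each respondent answers "yes" with probability true_p (an assumed value).
        yes_count = sum(1 for _ in range(n) if rng.random() < true_p)
        p_hats.append(yes_count / n)
    return p_hats

# Hypothetical setup: pretend the true proportion is 0.57 and repeat the poll 1000 times.
p_hats = simulate_sample_proportions(true_p=0.57, n=1000, repetitions=1000)
center = statistics.mean(p_hats)
spread = statistics.stdev(p_hats)
print(f"mean of the p-hat values: {center:.3f}")
print(f"standard deviation of the p-hat values: {spread:.4f}")
```

The list `p_hats` is an approximation of the sampling distribution of the statistic: the values cluster around the assumed true proportion, and their spread is what a margin of error describes.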
6Terminology Review
- Sample
- Population
- Statistic
- A numerical characteristic associated with a sample
- Parameter
- A numerical characteristic associated with the population
- Sampling error
- The need for interval estimation
7Point Estimation
- Point estimation of a parameter is the value of a statistic that is used to estimate the parameter
- Compute statistic (e.g., mean)
- Use it to estimate corresponding population parameter
- Point Estimators of Population Parameters (see next slide)
8Point Estimators for Population Parameters
Population parameter    Sample statistic    Calculating formula
9Interval Estimation
- Sample point estimators are usually not absolutely precise
- How close or how distant is the calculated sample statistic from the population parameter?
- We can say that the sample statistic is within a certain range or interval of the population parameter.
- The determination of this range is the basis for interval estimation
10Interval Estimation (2)
- A confidence interval (CI) represents the level of precision associated with a population estimate
- Width of the interval is determined by
- Sample size,
- variability of the population, and
- the probability level or the level of confidence selected
11Sampling Distribution of the Mean
- The distribution of all possible sample means for a sample of a given size
- Use the mean of a sample to estimate and draw conclusions about the mean of that entire population
- So we have samples of a particular size
- We need formulas to determine the mean and the standard deviation of all possible sample means for samples of a given size from a population
12Sample and Population Mean
- For samples of size n, the mean of the variable $\bar{x}$
- Is equal to the mean of the variable under consideration: $\mu_{\bar{x}} = \mu$
- The mean of all possible sample means is equal to the population mean
13Sample Standard Deviation
- For samples of size n, the standard deviation of the variable $\bar{x}$
- Is equal to the standard deviation of the variable under consideration, divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
- For each sample size, the standard deviation of all possible sample means equals the population standard deviation divided by the square root of the sample size
14Central Limit Theorem
- Suppose all possible random samples of size n are drawn from an infinitely large, normally distributed population having a mean $\mu$ and a standard deviation $\sigma$
- The frequency distribution of these sample means will have
- A mean of $\mu$ (the population mean)
- A normal distribution around this population mean
- A standard deviation of $\sigma / \sqrt{n}$
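A rough numerical check of these three properties, under assumed values ($\mu = 10$, $\sigma = 3$, n = 50, all hypothetical) and with 2000 samples standing in for "all possible" samples:

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=1):
    """Return the means of `repetitions` samples of size n from a normal population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]

# Assumed, illustrative population values; they do not come from the slides.
mu, sigma, n = 10.0, 3.0, 50
means = simulate_sample_means(mu, sigma, n, repetitions=2000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"std. dev. of sample means: {sd_of_means:.3f} (sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

With a finite number of simulated samples the agreement is approximate, not exact.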
15Sampling Error
- Standard error of the mean (SEM) is a basic measure for the amount of sampling error
- $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
- SEM indicates how much a typical sample mean is likely to differ from a true population mean
- Sample size and population standard deviation affect the sampling error
16Sampling Error (2)
- The larger the sample size, the smaller the amount of sampling error
- The larger the standard deviation, the greater the amount of sampling error
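Both statements follow from the standard error of the mean, $\sigma / \sqrt{n}$. A minimal sketch, with illustrative values for $\sigma$ and n that are assumptions for the example:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population standard deviation over sqrt(sample size)."""
    return sigma / math.sqrt(n)

# Larger samples shrink the standard error (sigma held at an assumed 3.0).
for n in (25, 100, 400):
    print(f"n = {n:>3}: SEM = {standard_error(3.0, n):.3f}")

# Larger population variability inflates it (n held at an assumed 100).
for sigma in (1.0, 3.0, 6.0):
    print(f"sigma = {sigma}: SEM = {standard_error(sigma, 100):.3f}")
```

Because n enters under a square root, quadrupling the sample size only halves the standard error.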
17Finite Population Correction Factor
- The frequency distribution of the sample means is approximately normal if the sample size is large
- n < 30 (small sample); n > 30 (large sample)
- If you have a finite population, then you need to introduce a correction, i.e., the fpc rule/factor, in the estimation process
- where fpc = finite population correction
- n = sample size
- N = population size
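The slide's own expression for the fpc did not survive in this transcript. A commonly used form, given here as an assumption and not necessarily the exact expression on the slide, is:

```latex
\mathrm{fpc} = \sqrt{\frac{N - n}{N - 1}}
```

In this form the factor is close to 1 when n is small relative to N and falls to 0 when the whole population is sampled (n = N).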
18Standard Error of the Mean for Finite Populations
- When including the fpc, the standard error of the mean should be $\sigma / \sqrt{n}$ multiplied by the fpc
- In general, you include the fpc in the population estimates only when the ratio of sample size to population size exceeds 5%, or
- when n / N > 0.05
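The 5% rule can be sketched as a small helper. The fpc form $\sqrt{(N - n)/(N - 1)}$ used here is the common textbook version and is an assumption, as are the function name and the example numbers.

```python
import math

def sem_with_optional_fpc(sigma, n, N, threshold=0.05):
    """Standard error of the mean, applying the fpc only when n / N exceeds the threshold."""
    sem = sigma / math.sqrt(n)
    if n / N > threshold:
        # Assumed fpc form: sqrt((N - n) / (N - 1)).
        sem *= math.sqrt((N - n) / (N - 1))
    return sem

# Hypothetical populations: the same sample of 50 drawn from N = 5000 and from N = 500.
large_pop = sem_with_optional_fpc(sigma=3.0, n=50, N=5000)  # n/N = 0.01, no correction
small_pop = sem_with_optional_fpc(sigma=3.0, n=50, N=500)   # n/N = 0.10, corrected
print(f"N = 5000: SEM = {large_pop:.4f}")
print(f"N = 500:  SEM = {small_pop:.4f}")
```

The correction always reduces the standard error, since sampling a large share of a finite population leaves less room for sampling error.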
19Constructing Confidence Intervals
- A random sample of 50 commuters reveals that their average journey-to-work distance was 9.6 miles
- A recent study has determined that the std. deviation of journey-to-work distance is approximately 3 miles
- What is the CI around this sample mean of 9.6 that guarantees with 90% certainty that the true population mean is enclosed within that interval?
20Confidence Interval for the Mean
- CI = $\bar{x} \pm Z \, (\sigma / \sqrt{n})$
- Z value associated with a 90% confidence level (Z = 1.65)
- The sample mean is the best estimate of the true population mean
- CI
- 9.6 + 1.65 (3 / $\sqrt{50}$) = 10.30 miles
- 9.6 - 1.65 (3 / $\sqrt{50}$) = 8.90 miles
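The commuter calculation can be reproduced in a few lines. The function name is made up for the example; the inputs (9.6, 3, 50, Z = 1.65) are the ones given on the slides.

```python
import math

def z_confidence_interval(sample_mean, sigma, n, z):
    """Confidence interval for a mean when the population standard deviation is known."""
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=9.6, sigma=3.0, n=50, z=1.65)
print(f"90% CI: {low:.2f} to {high:.2f} miles")  # prints "90% CI: 8.90 to 10.30 miles"
```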
21Confidence Interval
- We say that the sample statistic is within a certain range or interval of the population parameter
- e.g., in our sample, 57% of the viewers thought Walter Cronkite is trustworthy
- In the general population, between 54% and 60% of viewers think that Walter Cronkite is trustworthy
- Or, in our sample, the average commuting distance was 9.6 miles
- In the population, we calculated that the average commute is likely to be somewhere between 8.9 miles and 10.3 miles
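The slides do not show how the 54 to 60 range for the poll was obtained. One way to arrive at roughly those numbers, offered as an assumption, is the usual normal approximation for a proportion at a 95% confidence level (Z = 1.96) with standard error $\sqrt{\hat{p}(1 - \hat{p})/n}$:

```python
import math

def proportion_confidence_interval(p_hat, n, z):
    """Normal-approximation confidence interval for a population proportion."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Poll values from the slides; z = 1.96 (95% confidence) is an assumed choice.
low, high = proportion_confidence_interval(p_hat=0.57, n=1000, z=1.96)
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions the margin of error is about three percentage points on either side of 57%.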
22Confidence Level
- Gives you an understanding of how reliable your previous statement regarding the confidence interval is
- The probability that the interval actually includes the population parameter
- For example, the confidence level refers to the probability that the interval (8.9 miles to 10.3 miles) actually encompasses the TRUE population mean (90%, 95%, 99.7%)
- Confidence level probability is 1 - α
23Significance Level
- α (alpha)
- The probability that the interval that surrounds the sample statistic DOES NOT include the population parameter
- E.g., the probability that the average commuting distance does not fall between 8.9 miles and 10.3 miles
- α = 0.10 (90%), 0.05 (95%), 0.003 (99.7%)
- Confidence interval width increases as α decreases
24Sampling Error
- Total sampling error = α
- Probability that the sample statistic will fall into either tail of the distribution is
- α/2
- If you want 99.7% confidence (i.e., low error), then you have to settle for giving a less precise estimate (the CI is wider)
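The trade-off between confidence and precision can be made concrete by placing α/2 in each tail of the standard normal curve and looking up the matching Z value. This sketch uses Python's `statistics.NormalDist` and reuses the commuter figures ($\sigma$ = 3, n = 50) purely as an illustration; the helper names are made up.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Z value that leaves alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_width(confidence, sigma, n):
    """Full width of the Z-based confidence interval for a mean."""
    return 2 * z_for_confidence(confidence) * sigma / math.sqrt(n)

levels = (0.90, 0.95, 0.997)
widths = [ci_width(level, sigma=3.0, n=50) for level in levels]
for level, width in zip(levels, widths):
    print(f"{level:.1%} confidence: Z = {z_for_confidence(level):.3f}, width = {width:.2f} miles")
```

The exact Z for 90% is about 1.645, which tables often round to 1.65. Each step up in confidence widens the interval.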
25If the Standard Deviation is Unknown
- If we don't know the population mean, it's likely we don't know the standard deviation
- What you are likely to have is the variance and standard deviation of your sample
- Also, you have a small population, so you have to use the finite population correction factor that was discussed earlier
- Once you have the formula for standard error, then you can proceed as before to determine the confidence interval
26Standard Error
27Student's t Distribution
- William Gosset (1876-1937)
- Published his contributions to statistical theory under a pseudonym
- Student's t distribution is used in performing inferences for a population mean when
- The population being sampled is approximately normally distributed
- The population standard deviation is unknown
- And the sample size is small (n < 30)
28Characteristics of the t-Distribution
- A t curve is symmetric, bell shaped
- Exact shape of distribution varies with sample size
- When n nears 30, the value of t approaches the standard normal Z value
- A particular distribution is identified by defining its degrees of freedom (df)
- For a t distribution, df = n - 1
29Properties of t Curves
- The total area under a t curve = 1
- A t curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis
- A t curve is symmetrical about 0
- As the degrees of freedom become larger, t curves look increasingly like the standard normal curve
- We need to use a t-table and look for values of t, instead of Z, to determine the confidence interval
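In place of a printed t-table, the critical values can be approximated numerically. The sketch below is one possible approach, not a standard library routine: it integrates the t density with Simpson's rule and finds the critical value by bisection. The helper names, step counts, and search range are all assumptions chosen for the example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus the 0.5 that lies left of 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, n):
    """Two-sided critical t value for a sample of size n (df = n - 1), by bisection."""
    df = n - 1
    target = 1 - (1 - confidence) / 2
    low, high = 0.0, 100.0  # assumed search range, wide enough for the cases shown
    for _ in range(50):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# 90% critical values shrink toward the normal value (about 1.645) as n grows.
for n in (5, 10, 25, 30, 1000):
    print(f"n = {n:>4}, df = {n - 1:>3}: t = {t_critical(0.90, n):.3f}")
```

For small samples the t value is noticeably larger than Z, so the resulting confidence interval is wider.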
30Calculating various CIs
- Sampling
- SRS, systematic, or stratified
- Parameters
- Mean, total, or proportion
- Six situations
- Consider whether to use fpc
- when n/N > 0.05
- Consider whether to use Z or t
- use t when n < 30
31If Random or Systematic Sample
- Estimate of Population Mean
- Best estimate is $\bar{x}$ (the sample mean)
- Estimate of sampling error
- Standard error of the mean (inc. fpc)
32If Stratified Sample
- Estimate of population mean
- Still equal to sample mean but
- Std. Error of the mean (inc. fpc)
where m = number of strata and i refers to a particular stratum
33Minimum Sample Size
- Before going out to the field, you want to know how big the sample ought to be for your research problem
- Sample must be large enough to achieve the precision and CI width that you desire
- Formulas to determine the three basic population parameters with random sampling
34Sample Size Selection - Mean
- Your goal is to determine the minimum sample size
- You want to situate the estimated population
mean, in a specified CI
E = amount of error you are willing to tolerate
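The slide's formula is missing from this transcript. The usual expression for the minimum sample size when estimating a mean, stated here as an assumption about what the slide showed, comes from setting the margin of error $Z \sigma / \sqrt{n}$ equal to E and solving for n:

```latex
E = Z \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
n = \left( \frac{Z \sigma}{E} \right)^{2}
```

The result is rounded up to the next whole number, since a fractional household or respondent cannot be sampled.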
37Example 1
- We are looking at Neighborhood X
- 3,500 households
- Sample size 25 households
- Sample mean 2.73
- Sample variance 2.6
- CI 90%
- Find the mean number of people per household
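One way to work this example with the rules from the earlier slides (fpc only when n/N > 0.05, t instead of Z when n < 30) is sketched below. The slides do not give the answer, so this is a worked attempt under those rules. The t value 1.711 is the standard table value for df = 24 at 90% (two-sided), and the fpc form is the common textbook one; both are assumptions.

```python
import math

# Values given on the slide.
N = 3500                 # households in Neighborhood X
n = 25                   # sample size
sample_mean = 2.73       # persons per household
sample_variance = 2.6

# Assumption: two-sided 90% critical value for df = n - 1 = 24, from a standard t-table.
T_90_DF24 = 1.711

use_fpc = n / N > 0.05   # 25 / 3500 is about 0.007, so no correction is applied
standard_error = math.sqrt(sample_variance) / math.sqrt(n)
if use_fpc:
    # Assumed fpc form: sqrt((N - n) / (N - 1)).
    standard_error *= math.sqrt((N - n) / (N - 1))

margin = T_90_DF24 * standard_error
low, high = sample_mean - margin, sample_mean + margin
print(f"fpc applied: {use_fpc}")
print(f"90% CI: {low:.2f} to {high:.2f} persons per household")
```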
38Example 2
- Sample of 30 households
- Sample standard deviation is 1.25
- What sample size is needed to estimate the mean number of persons per household in Neighborhood X and be 90% confident that your estimate will be within 0.3 persons of the true population mean?
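A sketch of this calculation, assuming the usual sample-size formula $n = (Z \sigma / E)^2$, with the sample standard deviation 1.25 standing in for $\sigma$ and Z = 1.65 for 90% confidence as used earlier in the slides. The function name is made up for the example.

```python
import math

def minimum_sample_size_for_mean(z, sigma, tolerable_error):
    """Smallest whole n for which z * sigma / sqrt(n) does not exceed the tolerable error."""
    return math.ceil((z * sigma / tolerable_error) ** 2)

# Z = 1.65 (90%), s = 1.25 used as an estimate of sigma, E = 0.3 persons.
required_n = minimum_sample_size_for_mean(z=1.65, sigma=1.25, tolerable_error=0.3)
print(f"minimum sample size: {required_n} households")  # prints 48
```

Under these assumptions the pilot sample of 30 households would not be large enough to reach the desired precision.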