Title: Inference About a Population
Introduction
- In this chapter we use the approach developed earlier to describe a population:
- Identify the parameter to be estimated or tested.
- Specify the parameter's estimator and its sampling distribution.
- Construct a confidence interval estimator or perform a hypothesis test.
Introduction
- We shall develop techniques to estimate and test three population parameters:
- The expected value $\mu$
- The variance $\sigma^2$
- The population proportion $p$ (for qualitative data)
12.1 Inference About a Population Mean When the Population Standard Deviation is Unknown
- Recall: by the central limit theorem, when $\sigma^2$ is known, the sample mean $\bar{x}$ is normally distributed if
- the sample is drawn from a normal population, or
- the population is not normal but the sample is sufficiently large (in which case the distribution is approximately normal).
- When $\sigma^2$ is unknown, another random variable describes the distribution of $\bar{x}$.
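The t-Statistic
When $\sigma^2$ is unknown, we use $s^2$ instead, and the Z statistic is then replaced by the t-statistic:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad\longrightarrow\quad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
When the sampled population is normally distributed, the statistic t is Student t distributed with $n - 1$ degrees of freedom.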
The t-Statistic
Using the t-table
The Student t distribution is mound-shaped and symmetrical around zero.
The degrees of freedom determine the distribution shape.
[Figure: Student t density curves for two sample sizes, n1 < n2, both centered at 0]
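How the degrees of freedom shape the curve can be illustrated with a minimal sketch. The function names `t_pdf` and `normal_pdf` and the choice of 5 and 50 degrees of freedom are illustrative assumptions, not part of the slides. The densities are evaluated at the center and at a point in the tail.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Illustrative degrees of freedom (assumed values, not from the slides)
for df in (5, 50):
    print(f"df={df:2d}  f(0)={t_pdf(0.0, df):.4f}  f(3)={t_pdf(3.0, df):.5f}")
print(f"normal f(0)={normal_pdf(0.0):.4f}  f(3)={normal_pdf(3.0):.5f}")
```

With fewer degrees of freedom the curve is lower at the center and heavier in the tails; as the degrees of freedom grow it approaches the standard normal curve.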
Testing $\mu$ when $\sigma$ is unknown
- Example 12.1: Productivity of newly hired trainees
Testing $\mu$ when $\sigma$ is unknown
- Example 12.1
- In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied.
- It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring.
- Can we conclude that this belief is correct, based on productivity observations of 50 trainees? (The raw data are in the file Xm12-01.)
Testing $\mu$ when $\sigma$ is unknown
- Example 12.1: Solution
- The problem objective is to describe the population of the number of packages processed in one hour.
- The data are quantitative.
- $H_0: \mu = 450$, $H_1: \mu > 450$
- The test statistic is the t statistic, with d.f. = n - 1 = 49.
We want to prove that the trainees reach 90% of the productivity of experienced workers.
Testing $\mu$ when $\sigma$ is unknown
- Solution, continued
- Observe that $H_1$ has the form $\mu > \mu_0$; thus the rejection region lies in the right tail of the sampling distribution of $\bar{x}$.
After transforming into a t-statistic we express the rejection region in terms of the statistic t:
$$t \ge t_{\alpha, n-1}$$
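Testing $\mu$ when $\sigma$ is unknown
- Solution, continued (solving by hand): the rejection region is $t > t_{\alpha, n-1}$, where $t_{\alpha, n-1} = t_{.05, 49} \approx 1.676$.
- With $\bar{x} = 460.38$ and $s = 38.83$, the test statistic is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89$$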
Testing $\mu$ when $\sigma$ is unknown
[Figure: t distribution with the rejection region to the right of 1.676; the test statistic t = 1.89 falls inside it]
- Since 1.89 > 1.676, we reject the null hypothesis in favor of the alternative.
- Conclusion: There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages per hour at the .05 significance level.
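The hand calculation for Example 12.1 can be reproduced with a short sketch. It uses the sample mean and standard deviation reported in the software output for this example and the table value 1.676 quoted for the rejection region. The variable names are illustrative assumptions.

```python
import math

# Summary statistics reported for Example 12.1
n = 50            # number of trainees sampled
x_bar = 460.38    # sample mean, packages per hour
s = 38.8271       # sample standard deviation
mu_0 = 450.0      # hypothesized mean under H0
t_critical = 1.676  # t_{.05,49} as read from the t-table

# t = (x_bar - mu_0) / (s / sqrt(n))
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Right-tail test: reject H0 when t exceeds the critical value
reject_h0 = t_stat > t_critical
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```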
Testing $\mu$ when $\sigma$ is unknown
Using Data Analysis Plus and the p-value approach to test the mean (file Xm12-01.xls).
t-Test: Mean
                          Packages
Mean                      460.38
Standard Deviation        38.8271
Hypothesized Mean         450
df                        49
t Stat                    1.8904
P(T<=t) one-tail          0.0323
t Critical one-tail       1.6766
Since .0323 < .05, we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages per hour at the .05 significance level.
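The one-tail p-value can also be approximated without a statistics package. The sketch below integrates the Student t density numerically with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` and the step count are assumptions made for illustration; this is not how Data Analysis Plus computes the value.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even. By symmetry P(T > 0) = 0.5, so the upper tail
    is 0.5 minus the area under the density between 0 and t.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# t statistic and degrees of freedom from Example 12.1
p_value = t_upper_tail(1.8904, 49)
print(f"one-tail p-value = {p_value:.4f}")
```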
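Estimating $\mu$ when $\sigma$ is unknown
- Confidence interval estimator of $\mu$ when $\sigma^2$ is unknown:
$$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$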
Estimating $\mu$ when $\sigma$ is unknown
- Example 12.2
- An investor is trying to estimate the return on investment in companies that won quality awards last year.
- A random sample of 83 such companies is selected, and the return the investor would have earned by investing in them is calculated.
- Construct a 95% confidence interval for the mean return.
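Estimating $\mu$ when $\sigma$ is unknown
- Solution (solving by hand)
- The problem objective is to describe the population of annual returns from buying shares of quality award-winners.
- The data are quantitative.
- Solving by hand
- From the data we determine the sample mean $\bar{x}$ and the sample standard deviation $s$.
- Since the t-table has no entry for 82 degrees of freedom, we use $t_{.025,82} \approx t_{.025,80}$.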
Estimating $\mu$ when $\sigma$ is unknown
Using Data Analysis Plus
t-Estimate: Mean
                          Returns
Mean                      15.0172
Standard Deviation        8.3054
LCL                       13.2037
UCL                       16.8307
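The interval for Example 12.2 can be checked by hand with the sketch below. It uses the reported sample mean and standard deviation. The table value 1.990 for $t_{.025,80}$ is not quoted in the slides and is supplied here as an assumption, following the approximation $t_{.025,82} \approx t_{.025,80}$.

```python
import math

n = 83            # companies sampled
x_bar = 15.0172   # sample mean return (reported)
s = 8.3054        # sample standard deviation (reported)
t_crit = 1.990    # assumed t-table value for t_{.025,80}, standing in for t_{.025,82}

# x_bar +/- t_{alpha/2, n-1} * s / sqrt(n)
half_width = t_crit * s / math.sqrt(n)
lcl = x_bar - half_width
ucl = x_bar + half_width
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```

Because the critical value is rounded and taken at 80 rather than 82 degrees of freedom, the limits agree with the software output only to about two decimal places.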
Checking the required conditions
- We need to check that the population is normally distributed, or at least not extremely non-normal.
- There are statistical methods that can be used to test for normality (to be introduced later in the book, but not discussed here).
- From the sample histograms we see:
[Figure: A histogram for XM-11-01 (Packages)]
[Figure: A histogram for XM-11-02 (Returns)]
12.2 Inference About a Population Variance
- Sometimes we are interested in making inferences about the variability of processes.
- Examples:
- The consistency of a production process for quality control purposes.
- The risk associated with different investments.
- To draw inferences about variability, the parameter of interest is $\sigma^2$.
Inference About a Population Variance
- The population variance can be estimated or its value tested using the sample variance $s^2$.
- The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$.
- The inference about $\sigma^2$ is made by using a sample statistic that incorporates $s^2$ and $\sigma^2$.
Inference About a Population Variance
- This statistic is
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
- It has a distribution called chi-squared, with $n - 1$ degrees of freedom, if the population is normally distributed.
Inference About a Population Variance
The degrees of freedom (df) determine the distribution shape.
Testing the population variance: Left-hand tail test
- Example 1 (operations management application)
- A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the filling will be less than 1 cc (.001 liter).
- To test this belief, a random sample of 25 1-liter fills was taken and the results recorded (Xm12-03.xls).
- Do these data support the belief that the variance is less than 1 cc at the 5% significance level?
Testing the population variance
- Solution
- The problem objective is to describe the population of 1-liter fills from a filling machine.
- The data are quantitative, and we are interested in the variability of the fills.
- The two hypotheses are $H_0: \sigma^2 = 1$
- $H_1: \sigma^2 < 1$
We want to prove that the process is consistent.
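Testing the population variance
- Solving by hand, using the $\chi^2$ table:
- The sample yields $(n-1)s^2 = 20.78$, so the test statistic is
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{20.78}{1} = 20.78$$
- The rejection region is
$$\chi^2 < \chi^2_{1-\alpha,\, n-1} = \chi^2_{.95,\, 25-1} = 13.8484$$
[Figure: chi-squared distribution with the rejection region to the left of 13.8484; the test statistic 20.78 falls outside it]
Since 20.78 > 13.8484, do not reject the null hypothesis.
There is insufficient evidence to reject the hypothesis that the variance is equal to 1 cc.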
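The left-hand tail test for the filling machine can be sketched in a few lines, using the quantity $(n-1)s^2 = 20.78$ and the table value 13.8484 quoted for this example. The variable names are illustrative assumptions.

```python
n = 25                # number of fills sampled
sum_sq = 20.78        # (n - 1) * s^2, as computed from the sample
sigma2_0 = 1.0        # variance under H0
chi2_crit = 13.8484   # chi-squared table value for area .95 to the right, 24 d.f.

# chi^2 = (n - 1) * s^2 / sigma_0^2
chi2_stat = sum_sq / sigma2_0

# Left-tail test: reject H0 only when the statistic falls below the critical value
reject_h0 = chi2_stat < chi2_crit

sample_variance = sum_sq / (n - 1)
print(f"s^2 = {sample_variance:.4f}, chi2 = {chi2_stat:.2f}, reject H0: {reject_h0}")
```

The sample variance is below 1, but not by enough for the statistic to fall into the rejection region, which is why the null hypothesis is retained.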
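Testing the population variance: Right-hand tail test, Two-tail test
- A right-hand tail test
- $H_0: \sigma^2 = \text{value}$, $H_1: \sigma^2 > \text{value}$
- Rejection region: $\chi^2 > \chi^2_{\alpha,\, n-1}$
- A two-tail test
- $H_0: \sigma^2 = \text{value}$, $H_1: \sigma^2 \ne \text{value}$
- Rejection region: $\chi^2 < \chi^2_{1-\alpha/2,\, n-1}$ or $\chi^2 > \chi^2_{\alpha/2,\, n-1}$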
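The three decision rules for a variance test (left-tail, right-tail and two-tail) can be collected in one small helper. This is a sketch only: the function name `reject_variance_h0` and its parameters are hypothetical, and the critical values are assumed to be read from a chi-squared table by the caller.

```python
from typing import Optional

def reject_variance_h0(chi2_stat: float, alternative: str,
                       lower_crit: Optional[float] = None,
                       upper_crit: Optional[float] = None) -> bool:
    """Apply the chi-squared rejection rule for a test about sigma^2.

    alternative: "less" (left tail), "greater" (right tail) or "two-sided".
    lower_crit / upper_crit: table values bounding the rejection region(s).
    """
    if alternative == "less":
        return chi2_stat < lower_crit
    if alternative == "greater":
        return chi2_stat > upper_crit
    if alternative == "two-sided":
        return chi2_stat < lower_crit or chi2_stat > upper_crit
    raise ValueError(f"unknown alternative: {alternative!r}")

# Left-tail test from the filling-machine example: statistic 20.78, critical value 13.8484
print(reject_variance_h0(20.78, "less", lower_crit=13.8484))
```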
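Estimating the population variance
- From the following probability statement
$$P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$$
we have (by substituting $\chi^2 = (n-1)s^2/\sigma^2$)
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
This is the confidence interval for $\sigma^2$ with $1-\alpha$ confidence level.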
Estimating the population variance
- Example 2
- Estimate the variance of fills in example 12.3 with 99% confidence.
- Solution
- We have $(n-1)s^2 = 20.78$. From the chi-squared table we have
$$\chi^2_{\alpha/2,\, n-1} = \chi^2_{.005,\, 24} = 45.5585$$
$$\chi^2_{1-\alpha/2,\, n-1} = \chi^2_{.995,\, 24} = 9.88623$$
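Estimating the population variance
- The confidence interval is
$$\text{LCL} = \frac{(n-1)s^2}{\chi^2_{\alpha/2,\, n-1}} = \frac{20.78}{45.5585} = .4561, \qquad \text{UCL} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\, n-1}} = \frac{20.78}{9.88623} = 2.1019$$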
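The 99% interval for the variance of fills follows directly from the confidence interval formula and the two table values quoted in the solution. A minimal sketch, with illustrative variable names:

```python
sum_sq = 20.78         # (n - 1) * s^2 from the sample of 25 fills
chi2_upper = 45.5585   # chi-squared table value, area .005 to the right, 24 d.f.
chi2_lower = 9.88623   # chi-squared table value, area .995 to the right, 24 d.f.

# Dividing by the larger critical value gives the lower limit, and vice versa
lcl = sum_sq / chi2_upper
ucl = sum_sq / chi2_lower
print(f"LCL = {lcl:.4f}, UCL = {ucl:.4f}")
```

Unlike the interval for a mean, this interval is not symmetric around the point estimate $s^2$, because the chi-squared distribution is skewed.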
12.4 Inference About a Population Proportion
- When the population consists of nominal or categorical data, the only inference we can make is about the proportion of occurrence of a certain value.
- The parameter $p$ was used before to calculate these proportions under the binomial distribution.
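12.4 Inference About a Population Proportion
- Statistic and sampling distribution
- The statistic used when making inference about $p$ is the sample proportion
$$\hat{p} = \frac{x}{n}$$
where $x$ is the number of successes and $n$ is the sample size.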
Testing and estimating the Proportion
- Interval estimator for $p$ ($1-\alpha$ confidence level):
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$$
Testing the Proportion
- Example 12.5 (Predicting the winner on election day)
- Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day.
- Based on the data presented in Xm12.5.xls (where 1 = Democrat and 2 = Republican), can the network conclude that the Republican candidate will win the state college vote?
Testing the Proportion
- Solution
- The problem objective is to describe the population of votes in the state.
- The parameter to be tested is $p$.
- Success is defined as a Republican vote.
- The hypotheses are
- $H_0: p = .5$
- $H_1: p > .5$
More than 50% vote Republican.
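Testing the Proportion
- Solving by hand
- The rejection region is $z > z_\alpha = z_{.05} = 1.645$.
- From file Xm12.5.xls we count 407 successes. The number of voters participating is 765.
- The sample proportion is $\hat{p} = 407/765 = .532$.
- The value of the test statistic is
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.532 - .5}{\sqrt{.5(1-.5)/765}} = 1.77$$
- The p-value is $P(Z > 1.77) = .0382$.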
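The exit-poll calculation can be reproduced with Python's standard library, which provides the normal distribution through `statistics.NormalDist`. The counts come from the example; the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

successes = 407   # Republican votes counted in the sample
n = 765           # voters participating in the exit poll
p0 = 0.5          # proportion under H0

p_hat = successes / n

# z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n); the standard error uses p0, not p_hat
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

std_normal = NormalDist()             # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(0.95)     # z_{.05}
p_value = 1 - std_normal.cdf(z)       # right-tail area

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, z_crit = {z_crit:.3f}, p-value = {p_value:.4f}")
```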
Testing the Proportion
Using Data Analysis Plus we obtain a p-value < 0.05.
There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level we can conclude that more than 50% voted Republican.
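Estimating the Proportion
- Example (marketing application)
- In a survey of 2000 TV viewers at 11:40 p.m. on a certain night, 226 indicated they watched The Tonight Show.
- Estimate the number of TVs tuned to The Tonight Show on a typical night, if there are 100 million potential television sets. Use a 95% confidence level.
- Solution
Estimating the Proportion
- The sample proportion is $\hat{p} = 226/2000 = .113$, so the 95% interval estimate of $p$ is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96\sqrt{.113(.887)/2000} = .113 \pm .0139$$
- Multiplying by 100 million sets, between about 9.9 million and 12.7 million TVs are estimated to be tuned to the show.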
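The Tonight Show estimate can be checked with a short sketch that applies the interval estimator for a proportion and then scales the limits to the 100 million sets. The variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

viewers = 226              # respondents who watched the show
n = 2000                   # TV viewers surveyed
total_sets = 100_000_000   # potential television sets

p_hat = viewers / n
z = NormalDist().inv_cdf(0.975)   # z_{.025}, about 1.96

# p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)

# Scale the proportion limits to a count of television sets
lcl_sets = (p_hat - half_width) * total_sets
ucl_sets = (p_hat + half_width) * total_sets
print(f"{lcl_sets / 1e6:.1f} million to {ucl_sets / 1e6:.1f} million sets")
```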
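Selecting the Sample Size to Estimate the Proportion
- Recall: the confidence interval for the proportion is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$
- Thus, to estimate the proportion to within W, we can write
$$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$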
Selecting the Sample Size to Estimate the Proportion
- The required sample size is
$$n = \left( \frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2$$
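The sample size formula is easy to wrap in a helper. The sketch below is illustrative: the function name `sample_size_for_proportion` and its parameter names are hypothetical, and the inputs (a planning value of .5, W = .03, z = 1.96) are chosen only as an example. The result is rounded up, since a fractional observation cannot be sampled.

```python
import math

def sample_size_for_proportion(p_guess: float, width: float, z: float) -> int:
    """Smallest n such that z * sqrt(p_guess * (1 - p_guess) / n) <= width.

    p_guess: planning value for the sample proportion (unknown before sampling)
    width:   desired half-width W of the confidence interval
    z:       critical value z_{alpha/2}
    """
    n = (z * math.sqrt(p_guess * (1 - p_guess)) / width) ** 2
    return math.ceil(n)

# Illustrative inputs: planning value .5, W = .03, 95% confidence (z = 1.96)
n_required = sample_size_for_proportion(0.5, 0.03, 1.96)
print(n_required)
```

A planning value of .5 maximizes $\hat{p}(1-\hat{p})$ and therefore gives the most conservative sample size; any other planning value yields a smaller n.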
Sample Size to Estimate the Proportion
- Example
- Suppose we want to estimate the proportion of customers who prefer our company's brand to within .03 with 95% confidence.
- Find the sample size needed to guarantee that this requirement is met.
- Solution
- $W = .03$; $1 - \alpha = .95$,
- therefore $\alpha/2 = .025$,
- so $z_{.025} = 1.96$
Since the sample has not yet been taken, the sample proportion is still unknown.
We proceed using either one of the following two methods.
Sample Size to Estimate the Proportion
- Method 1
- There is no knowledge about the value of $\hat{p}$.
- Let $\hat{p} = .5$. This results in the largest possible n needed for a $1-\alpha$ confidence interval of the form $\hat{p} \pm W$.
- If the sample proportion does not equal .5, the actual W will be narrower than .03 with the n obtained by the formula below.