Title: Review
1 Review 2
- Chapter 9
- Chapter 10
- Chapter 11 and 12
2 Chapter 9: Sampling Distributions
- A statistic is a random variable describing a
characteristic of a random sample.
- Sample mean
- Sample variance
- We use statistic values in inferential statistics
(making inferences about population characteristics
from sample characteristics).
- Statistics have distributions of their own.
3 Chapter 9: The Central Limit Theorem
- The distribution of the sample mean is normal if
the parent distribution is normal.
- The distribution of the sample mean approaches
the normal distribution for sufficiently large
samples (\(n \ge 30\)), even if the parent
distribution is not normal.
- The parameters of the sampling distribution of the
mean are
- Mean: \(\mu_{\bar{x}} = \mu\)
- Standard deviation: \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)
- (Assumption: The population is sufficiently
large, so no finite-population correction is needed
in the calculation of the variance.)
4 Chapter 9: The Central Limit Theorem
- Problem 1 (using Excel): Given a normal
population whose mean is 50 and whose standard
deviation is 5,
- Question 1: Find the probability that a random
sample of 4 has a mean between 49 and 52.
- Answer: \(\sigma_{\bar{x}} = 5/\sqrt{4} = 2.5\), so
\(P(49 < \bar{x} < 52) = P(-0.4 < Z < 0.8) \approx .4436\)
5 Chapter 9: The Central Limit Theorem
- Problem 1 (using the normal table): Given a normal
population whose mean is 50 and whose standard
deviation is 5,
- Question 1: Find the probability that a random
sample of 4 has a mean between 49 and 52.
- Answer: \(P(49 < \bar{x} < 52) = P(-0.4 < Z < 0.8) = .7881 - .3446 = .4435\)
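6 Chapter 9: The Central Limit Theorem
- Problem 1 (continued)
- Question 2: Find the probability that a random
sample of 16 has a mean between 49 and 52.
- Answer: \(\sigma_{\bar{x}} = 5/\sqrt{16} = 1.25\), so
\(P(49 < \bar{x} < 52) = P(-0.8 < Z < 1.6) = .9452 - .2119 = .7333\)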
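The two questions of Problem 1 can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist` in place of Excel or the normal table; the helper name `prob_mean_between` is hypothetical and not part of the original slides.

```python
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) for samples of size n from N(mu, sigma)."""
    standard_error = sigma / n ** 0.5          # sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

# Problem 1: population mean 50, standard deviation 5
p_n4 = prob_mean_between(50, 5, 4, 49, 52)     # Question 1, n = 4
p_n16 = prob_mean_between(50, 5, 16, 49, 52)   # Question 2, n = 16
print(f"n=4: {p_n4:.4f}, n=16: {p_n16:.4f}")
```

The larger sample concentrates the sample mean around 50, so the probability of landing in the same interval goes up.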
7 Chapter 9: The Central Limit Theorem
- Problem 2: The amount of time per day spent by
adults watching TV is normally distributed with
\(\mu = 6\) and \(\sigma = 1.5\) hours.
- Question 1: What is the probability that a
randomly selected adult watches TV for more
than 7 hours a day?
- Answer: \(P(X > 7) = P(Z > (7 - 6)/1.5) = P(Z > 0.67) \approx .25\)
- Question 2: What is the probability that 5 adults
watch TV on average 7 or more hours?
- Answer: \(P(\bar{x} \ge 7) = P(Z \ge (7 - 6)/(1.5/\sqrt{5})) = P(Z \ge 1.49) \approx .068\)
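8 Chapter 9: The Central Limit Theorem
- Problem 2 (continued)
- Question 3: What is the probability that the
total time of watching TV of the five adults will
not exceed 28 hours?
- Answer: A total of 28 hours corresponds to a sample mean of 28/5 = 5.6, so
\(P(\bar{x} \le 5.6) = P(Z \le (5.6 - 6)/.6708) = P(Z \le -0.60) \approx .27\)
- Question 4: What total TV watching time is
exceeded by only 3% of the population for samples
of 5 adults?
- Answer: \(z_{.03} = 1.88\), so \(\bar{x} = 6 + 1.88(.6708) \approx 7.26\) and the total is \(5(7.26) \approx 36.3\) hours.
Comments:
1. Excel returns X for a given left-hand tail probability.
2. \(.670822 = 1.5/\sqrt{5}\)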
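All four questions of Problem 2 follow the same pattern: pick the right distribution (one adult, or the mean of five), then read off a tail probability or a quantile. A minimal sketch with the standard library is shown here; the variable names are illustrative only, and `inv_cdf` plays the role of Excel's inverse-normal lookup.

```python
from statistics import NormalDist

mu, sigma, n = 6.0, 1.5, 5
single_adult = NormalDist(mu, sigma)              # one adult's daily TV time
mean_of_5 = NormalDist(mu, sigma / n ** 0.5)      # sample mean of 5 adults

p_q1 = 1 - single_adult.cdf(7)        # Q1: one adult watches more than 7 hours
p_q2 = 1 - mean_of_5.cdf(7)           # Q2: five adults average 7 hours or more
p_q3 = mean_of_5.cdf(28 / n)          # Q3: total <= 28 hours, i.e. mean <= 5.6

# Q4: total exceeded by only 3% of samples, i.e. the 97th percentile of the
# sample mean, scaled back up to a total over 5 adults.
total_q4 = n * mean_of_5.inv_cdf(0.97)

print(round(p_q1, 4), round(p_q2, 4), round(p_q3, 4), round(total_q4, 2))
```

Exact values differ slightly from table-based answers because the table rounds z to two decimals.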
9 Chapter 9: The Central Limit Theorem
- Problem 3
- Assume that the mean monthly rent paid by students
in a particular town is $350 with a standard
deviation of $40. A random sample of 100 students
who rented apartments was taken.
- Question 1: What is the probability that the
sample mean of the monthly rent exceeds $355?
10 Chapter 9: The Central Limit Theorem
- Problem 3 (continued)
- Question 2: What is the probability that the
total revenue from renting 10 randomly selected
apartments falls between 3300 and 3700 dollars?
11 Chapter 9: The Central Limit Theorem
- Problem 3 (continued)
- Question 3: Let's assume the population mean was
unknown, but the standard deviation was known to
be $40. A sample of 100 rentals was selected in
order to estimate the mean monthly rent paid by
the whole student population. What is the
probability that the sample mean differs from the
actual mean by more than $5? How about more than
$10?
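The three questions of Problem 3 can be sketched as follows. This is an illustration, not the slides' own solution: it uses the standard-library `NormalDist`, and for Question 2 it assumes rents are roughly normal, since a sample of 10 is too small to lean on the central limit theorem alone.

```python
from statistics import NormalDist

mu, sigma = 350.0, 40.0
Z = NormalDist()                       # standard normal

# Question 1: n = 100, P(sample mean > 355)
se_100 = sigma / 100 ** 0.5            # standard error = 4
p_q1 = 1 - NormalDist(mu, se_100).cdf(355)

# Question 2: total of 10 rents between 3300 and 3700 is the same event
# as the sample mean of 10 rents falling between 330 and 370.
mean_of_10 = NormalDist(mu, sigma / 10 ** 0.5)
p_q2 = mean_of_10.cdf(370) - mean_of_10.cdf(330)

# Question 3: P(|sample mean - mu| > d) = 2 * P(Z > d / standard error);
# the unknown mu cancels out, so only sigma and n are needed.
p_q3_diff5 = 2 * (1 - Z.cdf(5 / se_100))
p_q3_diff10 = 2 * (1 - Z.cdf(10 / se_100))

print(round(p_q1, 4), round(p_q2, 4), round(p_q3_diff5, 4), round(p_q3_diff10, 4))
```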
12 Chapter 9: The Central Limit Theorem
13 Chapter 9: Sampling distribution of the sample
proportion
- In a sample of size n, if \(np > 5\) and \(n(1-p) > 5\),
then the sample proportion \(\hat{p} = x/n\) is
approximately normally distributed with the
following parameters:
- Mean: \(\mu_{\hat{p}} = p\)
- Standard deviation: \(\sigma_{\hat{p}} = \sqrt{p(1-p)/n}\)
- (Assumption: The population is sufficiently
large, so no finite-population correction is needed
in the calculation of the variance.)
14 Sampling distribution of the sample proportion
- Problem 4
- A commercial by a household appliance
manufacturer claims that less than 5% of all of
its products require a service call in the first
year.
- A survey of 400 households that recently
purchased the manufacturer's products was conducted
to check the claim.
15 Sampling distribution of the sample proportion
- Problem 4 (continued): Assuming the
manufacturer is right, what is the probability
that more than 10% of the surveyed households
require a service call within the first year?
- If indeed 10% of the sampled households reported
a call for service within the first year, what
does it tell you about the manufacturer's
claim?
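Problem 4 can be sketched numerically as below. The sketch takes p = .05 as the boundary value of the manufacturer's claim (an assumption, since the claim is "less than 5%") and uses the normal approximation for the sample proportion; variable names are illustrative.

```python
from statistics import NormalDist

p_claimed, n = 0.05, 400

# Normal approximation is reasonable only if np > 5 and n(1-p) > 5
approximation_ok = n * p_claimed > 5 and n * (1 - p_claimed) > 5

# Standard deviation of the sample proportion: sqrt(p(1-p)/n)
se = (p_claimed * (1 - p_claimed) / n) ** 0.5

# P(sample proportion > .10) when the true proportion is .05
z = (0.10 - p_claimed) / se
p_more_than_10pct = 1 - NormalDist().cdf(z)

print(approximation_ok, round(z, 2), p_more_than_10pct)
```

A sample proportion of 10% would sit more than four standard deviations above the claimed 5%, so observing it would cast serious doubt on the claim.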
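16 Sampling Distribution of the Difference Between
two Means
- If two independent variables are normally
distributed with means and variances \(\mu_1, \sigma_1^2\)
and \(\mu_2, \sigma_2^2\) respectively, then \(\bar{x}_1 - \bar{x}_2\) is also
normally distributed with
- Mean: \(\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2\)
- Variance: \(\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma_1^2/n_1 + \sigma_2^2/n_2\)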
17 Sampling Distribution of the Difference Between
two Means
- When at least one of the populations is not
normally distributed but the sample sizes are
both at least 30, \(\bar{x}_1 - \bar{x}_2\) is approximately
normally distributed, with a mean and a variance
as indicated above.
18 Sampling Distribution of the Difference Between
two Means
- Example: A national TV telethon committee is
interested in determining whether donations made
by males are on average larger than those
made by females by $4. Two samples of 25 males
and 25 females were selected, and the donations
made were recorded. If the standard deviations of the
male and female populations are 2.4 and 1.8
respectively, what is the probability that the sample
mean of the male donations exceeds the sample
mean of the female donations by at least $5?
Assume donations for the two populations are
normally distributed.
19 Sampling Distribution of the Difference Between
two Means
- For males: \(n_1 = 25\), \(\sigma_1 = 2.4\). For females: \(n_2 = 25\), \(\sigma_2 = 1.8\).
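One way to work the telethon example is sketched below. It assumes the true mean difference equals the hypothesized $4 (males minus females), which is how the question reads; the variable names are illustrative.

```python
from statistics import NormalDist

mu_diff = 4.0                     # assumed true difference mu1 - mu2
sigma_m, n_m = 2.4, 25            # males
sigma_f, n_f = 1.8, 25            # females

# Standard deviation of (mean_males - mean_females):
# sqrt(sigma1^2/n1 + sigma2^2/n2)
se_diff = (sigma_m ** 2 / n_m + sigma_f ** 2 / n_f) ** 0.5

# P(difference of sample means >= 5)
p_at_least_5 = 1 - NormalDist(mu_diff, se_diff).cdf(5)

print(round(se_diff, 3), round(p_at_least_5, 4))
```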
20 Chapter 10: Introduction to Estimation
- A population parameter can be estimated by a
point estimator and by an interval estimator.
- A confidence interval with a \(1-\alpha\) confidence level
is an interval estimator that covers the
estimated parameter \((1-\alpha)100\%\) of the time.
- Confidence intervals are constructed using
sampling distributions.
21 Confidence interval of the mean: Known Variance
- We use the central limit theorem to build the
following confidence interval:
\(\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}\)
22 Confidence interval of the mean: Known Variance
- Problem 5: How many classes do university students
miss each semester? A survey of 100 students was
conducted. (See Data next.)
- Assuming the standard deviation of the number of
classes missed is 2.2, estimate the mean number
of classes missed per student. Use a 99% confidence
level.
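23 Confidence interval of the mean: Known Variance
- Solution:
\(1 - \alpha = .99\), \(\alpha = .01\), \(\alpha/2 = .005\), \(z_{\alpha/2} = z_{.005} = 2.575\)
\(\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} = 10.21 \pm 2.575(2.2/\sqrt{100}) = 10.21 \pm .57\)
LCL = 9.64, UCL = 10.78
You can use Data Analysis Plus > Z-Estimate: Mean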
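The interval in Problem 5 can be reproduced without Excel. This is a minimal sketch; `z_interval` is a hypothetical helper name, and the sample mean 10.21 is taken from the slide since the raw data are not shown.

```python
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width

lcl, ucl = z_interval(sample_mean=10.21, sigma=2.2, n=100, confidence=0.99)
print(round(lcl, 2), round(ucl, 2))
```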
24 Confidence interval of the mean: Known Variance
- Solution (using Data Analysis Plus)
- Shade the data set (you may include the title
label).
- Select Data Analysis Plus, then Z-Estimate:
Mean.
- Type in the sigma (2.2), check Labels (if
appropriate), type in alpha (.01), click OK.
25 Selecting the sample size
- The shorter the confidence interval, the more
precise the estimate.
- We can, therefore, limit the width of the
interval to 2W, and get \(W = z_{\alpha/2}\,\sigma/\sqrt{n}\)
- From here we have \(n = (z_{\alpha/2}\,\sigma/W)^2\)
- W is called the margin of error, or the bound on the
error of estimation.
26 Selecting the sample size
- Problem 6: An operations manager wants to estimate
the average amount of time needed by a worker to
assemble a new electronic component.
- Sigma is known to be 6 minutes.
- The required estimate accuracy is within 20
seconds.
- The confidence level is 90% or 95%.
- Find the sample size.
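27 Selecting the sample size
- Solution
- \(\sigma = 6\) min, W = 20 sec = 1/3 min
- \(1 - \alpha = .90\), \(z_{\alpha/2} = z_{.05} = 1.645\): \(n = (1.645(6)/(1/3))^2 = 876.75\), so n = 877
- \(1 - \alpha = .95\), \(z_{\alpha/2} = z_{.025} = 1.96\): \(n = (1.96(6)/(1/3))^2 = 1244.68\), so n = 1245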
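The sample-size calculation for Problem 6 can be sketched as follows. The helper name `sample_size` is hypothetical; it rounds up because a fractional observation is not possible and rounding down would widen the interval beyond the target.

```python
import math
from statistics import NormalDist

def sample_size(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)

# sigma = 6 minutes, margin of error W = 20 seconds = 1/3 minute
n_90 = sample_size(sigma=6, margin=1 / 3, confidence=0.90)
n_95 = sample_size(sigma=6, margin=1 / 3, confidence=0.95)
print(n_90, n_95)
```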
28 Chapter 11: Hypotheses tests
- In hypothesis tests we hypothesize on a value of
a population parameter, and test to see if there
is sufficient evidence to support our belief.
- The structure of a hypotheses test
- Formulate two hypotheses.
- \(H_0\): The null hypothesis, the one we try to reject in favor of \(H_1\).
- \(H_1\): The alternative hypothesis, the one we try to
prove.
- Define a significance level \(\alpha\).
29 Hypotheses tests
- The significance level is the probability of
erroneously rejecting the null hypothesis.
- \(\alpha = P(\text{reject } H_0 \text{ when } H_0 \text{ is true})\)
- Sample from the population and calculate a
statistic that provides an indication whether or
not the parameter value under \(H_1\) is more likely
to be true.
- We shall test the population mean assuming the
standard deviation is known.
30 Hypotheses tests of the Mean: Known Variance
- Problem 7: A machine is set so that the average
diameter of ball bearings it produces is .50
inch. In a sample of 100 ball bearings the mean
diameter was .51 inch. Assuming the standard
deviation is .05 inch, can we conclude at the 5%
significance level that the mean diameter is not
.50 inch?
31 Hypotheses tests of the Mean: Known Variance
- Solution: The population studied is the
ball-bearing diameters.
- We hypothesize on the population mean.
- A good point estimator for the population mean is
the sample mean.
- We use the distribution of the sample mean to
build a sample statistic to test whether \(\mu = .50\)
inch.
32 Hypotheses tests of the Mean: Known Variance
- Solution (a two-tail rejection region)
- Define the hypotheses
- \(H_0\colon \mu = .50\)
- \(H_1\colon \mu \ne .50\)
- \(\alpha = .05\): the probability of committing a Type I error
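33 Hypotheses tests of the Mean: Known Variance
Solution: a two-tail rejection region
- Critical Z: \(z_{.025} = 1.96\) (obtained from the Z-table).
- Build a rejection region: \(z_{\text{sample}} > z_{\alpha/2}\) or
\(z_{\text{sample}} < -z_{\alpha/2}\), that is, \(z_{\text{sample}} > 1.96\) or \(z_{\text{sample}} < -1.96\).
- Calculate the value of the sample Z statistic
and compare it to the critical value:
\(z_{\text{sample}} = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) = (.51 - .50)/(.05/\sqrt{100}) = 2\)
- Since 2 > 1.96, there is sufficient evidence to
reject \(H_0\) in favor of \(H_1\) at the 5% significance
level.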
34 Hypotheses tests of the Mean: Known Variance
Solution: a two-tail rejection region
- We can perform the test in terms of the mean
value.
- Let us find the critical mean values for
rejection:
- \(\bar{x}_{L2} = \mu_0 + z_{.025}\,\sigma/\sqrt{n} = .50 + 1.96(.05)/\sqrt{100} = .5098\)
- \(\bar{x}_{L1} = \mu_0 - z_{.025}\,\sigma/\sqrt{n} = .50 - 1.96(.05)/\sqrt{100} = .4902\)
- Since .51 > .5098, there is sufficient evidence to
reject the null hypothesis at the 5% significance
level.
35 Hypotheses tests of the Mean: Known Variance
- Calculate the p-value of this test.
- Solution: p-value \(= P(Z > z_{\text{sample}}) + P(Z < -z_{\text{sample}}) = P(Z > 2) + P(Z < -2) = 2P(Z > 2) = 2[1 - .9772] = .0456\)
- Since .0456 < .05, \(H_0\) is rejected.
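The three equivalent views of Problem 7 (standardized statistic, critical sample means, p-value) can be sketched in a few lines. This is an illustration with the standard library, not the slides' own tooling; exact values differ from table values in the fourth decimal.

```python
from statistics import NormalDist

Z = NormalDist()
sample_mean, mu0, sigma, n, alpha = 0.51, 0.50, 0.05, 100, 0.05

se = sigma / n ** 0.5                          # standard error = .005
z_sample = (sample_mean - mu0) / se            # standardized test statistic
z_crit = Z.inv_cdf(1 - alpha / 2)              # about 1.96

# Rejection region expressed on the sample-mean scale
lower_crit = mu0 - z_crit * se
upper_crit = mu0 + z_crit * se

# Two-tail p-value
p_value = 2 * (1 - Z.cdf(abs(z_sample)))

reject_h0 = abs(z_sample) > z_crit
print(round(z_sample, 2), round(lower_crit, 4), round(upper_crit, 4),
      round(p_value, 4), reject_h0)
```

All three checks agree: the statistic exceeds the critical value, the sample mean lies above the upper critical mean, and the p-value is below alpha.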
36 Hypotheses tests of the Mean: Known Variance
- Problem 8
- The average annual return on investment for
American banks was found to be 10.2% with a
standard deviation of 0.8%.
- It is believed that banks that exercise
comprehensive planning do better.
- A sample of 26 banks that exercise comprehensive
training provided the following result: mean
return = 10.5%.
- Can we infer that the belief about bank
performance is supported at the 10% significance
level by this sample result?
37 Hypotheses tests of the Mean: Known Variance
- Solution (a right-hand tail rejection
region): The population tested is the annual rate
of return.
- \(H_0\colon \mu = 10.2\)
- \(H_1\colon \mu > 10.2\)
- Let us perform the test with the standardized
rejection region approach: \(z_{\text{sample}} > z_{.10}\)
(right-hand tail rejection region), \(z_{.10} = 1.28\).
Reject \(H_0\) if \(z_{\text{sample}} > 1.28\).
38 Hypotheses tests of the Mean: Known Variance
- Conclusion
- At the 10% significance level there is sufficient
evidence in the data to reject \(H_0\) in favor of \(H_1\),
since the sample statistic falls inside the
rejection region.
- Interpretation
- If we are willing to accept a 10% chance of making
the wrong conclusion, we can conclude that banks
conducting comprehensive training perform better
than banks that do not.
39 Hypotheses tests of the Mean: Known Variance
- Let us perform the test with the p-value method.
- \(P(\bar{x} > 10.5 \mid \mu = 10.2) = P(Z > (10.5 - 10.2)/(.8/\sqrt{26})) = P(Z > 1.91) = .5 - .4719 = .0281\)
- Since .0281 < .10 we reject the null hypothesis
at the 10% significance level.
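The right-tail test of Problem 8 can be sketched with both decision rules side by side. The figures are the ones quoted on the slides; the exact p-value comes out slightly below the table-based .0281 because z is not rounded to 1.91.

```python
from statistics import NormalDist

Z = NormalDist()
sample_mean, mu0, sigma, n, alpha = 10.5, 10.2, 0.8, 26, 0.10

z_sample = (sample_mean - mu0) / (sigma / n ** 0.5)
z_crit = Z.inv_cdf(1 - alpha)                  # right-tail critical value, about 1.28
p_value = 1 - Z.cdf(z_sample)                  # right-tail p-value

reject_by_region = z_sample > z_crit
reject_by_p_value = p_value < alpha
print(round(z_sample, 3), round(p_value, 4), reject_by_region, reject_by_p_value)
```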
40 Hypotheses tests of the Mean: Known Variance
- Note the equivalence between the standardized
(rejection region) method and the
p-value method.
- \(P(Z > z_{.10}) = .10\), \(z_{.10} = 1.28\)
- The statement "the p-value is smaller than alpha" is
equivalent to the statement "the test statistic
falls in the rejection region": here \(z_{\text{sample}} = 1.91 > 1.28\).
41 Hypotheses tests of the Mean: Known Variance
- Problem 9
- In the midst of labor-management negotiations,
the president of a company argues that the
company's blue-collar workers, who are paid an
average of $30K a year, are well paid because the
mean annual pay for blue-collar workers in the
country is less than $30K.
- This figure is disputed by the union. To test the
president's belief an arbitrator draws a random
sample of 350 blue-collar workers from across the
country and records their income (see file
Salaries).
- If the arbitrator assumes that income is normally
distributed with a standard deviation of $8,000,
can it be inferred at the 5% significance level that
the company's president is correct?
42 Hypotheses tests of the Mean: Known Variance
- Solution (a left-hand tail rejection region): The
population tested is the annual salary.
- \(H_0\colon \mu = 30\text{K}\), \(H_1\colon \mu < 30\text{K}\)
- Left-hand tail rejection region: \(Z < -z_{.05}\), or \(Z < -1.645\)
- \(z_{\text{sample}} = (29{,}119.5 - 30{,}000)/(8{,}000/\sqrt{350}) = -2.059\)
- Since -2.059 < -1.645 there is sufficient
evidence to infer that on average blue-collar
workers' income is lower than $30K at the 5%
significance level.
43 Hypotheses tests of the Mean: Known Variance
- Calculate the p-value of this test.
- Solution: p-value \(= P(Z < z_{\text{sample}}) = P(Z < -2.059) \approx .0197\)
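The left-tail p-value for Problem 9 can be computed directly. This short sketch uses the sample mean 29,119.5 quoted in the solution, since the Salaries file itself is not reproduced here.

```python
from statistics import NormalDist

sample_mean, mu0, sigma, n, alpha = 29_119.5, 30_000, 8_000, 350, 0.05

z_sample = (sample_mean - mu0) / (sigma / n ** 0.5)
p_value = NormalDist().cdf(z_sample)           # left-tail p-value: P(Z < z_sample)

print(round(z_sample, 3), round(p_value, 4), p_value < alpha)
```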
44 Type II Error
- Problem 7a: Calculate \(\beta\) for the two-tail
hypotheses test performed in Problem 7, when the
actual mean diameter is .515 inch.
- Solution
- The rejection region in terms of the critical
values of the sample mean was found before: \(\bar{x}_{L1} = .4902\), \(\bar{x}_{L2} = .5098\).
- \(\beta = P(\text{do not reject } H_0 \text{ when } H_1 \text{ is true}) = P(.4902 < \bar{x} < .5098 \text{ when } \mu = .515)\)
\(= P((.4902 - .515)/(.05/\sqrt{100}) < Z < (.5098 - .515)/(.05/\sqrt{100})) = P(-4.96 < Z < -1.04)\)
\(= P(1.04 < Z < 4.96) = P(Z < 4.96) - P(Z < 1.04) \approx 1 - .8508 = .1492\)
- This large probability may be reduced by taking
larger samples.
- Here \(H_0\colon \mu = .500\) and the alternative value is \(\mu = .515\).
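The Type II error calculation can be sketched as a small function, which also makes it easy to see the effect of a larger sample. The function name `type_ii_error` is hypothetical, and the sample size of 400 in the second call is an arbitrary illustration, not a figure from the slides.

```python
from statistics import NormalDist

def type_ii_error(mu0, mu_actual, sigma, n, z_crit=1.96):
    """beta for a two-tail z test: P(sample mean lands in the
    non-rejection region when the true mean is mu_actual)."""
    se = sigma / n ** 0.5
    lower_crit = mu0 - z_crit * se
    upper_crit = mu0 + z_crit * se
    true_dist = NormalDist(mu_actual, se)      # sampling distribution under the actual mean
    return true_dist.cdf(upper_crit) - true_dist.cdf(lower_crit)

beta_n100 = type_ii_error(mu0=0.50, mu_actual=0.515, sigma=0.05, n=100)
beta_n400 = type_ii_error(mu0=0.50, mu_actual=0.515, sigma=0.05, n=400)
print(round(beta_n100, 4), beta_n400)
```

Quadrupling the sample size halves the standard error, which pulls the critical means toward .50 and makes a true mean of .515 far harder to miss.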
45 Ch 12: Inference when the Variance is Unknown
- Generally, the variance may be unknown.
- In this case we change the test statistic from
Z to t, when testing the population mean.
- To test the population proportion we'll use the
normal distribution (under certain conditions).
46 Testing the mean: unknown variance
- Replace the statistic Z with t:
\(t = (\bar{x} - \mu)/(s/\sqrt{n})\), with \(n - 1\) degrees of freedom
- The original distribution must be normal (or at
least mound shaped).
47 Testing the mean: unknown variance
- Problem 10
- A federal agency inspects packages to determine
if the contents are at least as large as
advertised.
- A random sample of (i) 5, (ii) 50 containers whose
packaging states that the weight is 8.04 ounces
was drawn (data is provided later).
- From the sample results:
- Can we conclude that the average weight does not
meet the weight stated? (Use \(\alpha = .05\).)
- Estimate the mean weight of all containers with
99% confidence.
- What assumption must be met?
48 Testing the mean: unknown variance
- Solution
- We hypothesize on the mean weight.
- \(H_0\colon \mu = 8.04\)
- \(H_1\colon \mu < 8.04\)
- (i) n = 5. For small samples let us solve
manually. Assume the sample was 8.07, 8.03,
7.99, 7.95, 7.94.
- The rejection region: \(t < -t_{\alpha, n-1} = -t_{.05, 4} = -2.132\). What is \(t_{\text{sample}}\)?
- Mean = (8.07 + ... + 7.94)/5 = 7.996
- Std. Dev. = \(\{[(8.07 - 7.996)^2 + \dots + (7.94 - 7.996)^2]/4\}^{1/2} = 0.0546\)
49 Testing the mean: unknown variance
- The \(t_{\text{sample}}\) is calculated as follows:
\(t_{\text{sample}} = (\bar{x} - \mu)/(s/\sqrt{n}) = (7.996 - 8.04)/(0.0546/\sqrt{5}) = -1.80\)
- Since -1.80 > -2.132 the sample statistic does
not fall in the rejection region. There is
insufficient evidence to conclude that the mean
weight is smaller than 8.04, at the 5% significance
level.
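The small-sample t statistic can be recomputed from the five weights listed in the solution. This is a sketch under the stated hypotheses (H0: mean = 8.04, left-tail test); with those inputs the statistic works out to about -1.80. The critical value 2.132 is taken from the slide's t table rather than computed, because the standard library has no t distribution.

```python
from statistics import mean, stdev

weights = [8.07, 8.03, 7.99, 7.95, 7.94]   # sample from the solution slide
mu0 = 8.04                                  # stated package weight (H0)
t_crit = 2.132                              # t_{.05, 4}, quoted from the table

n = len(weights)
sample_mean = mean(weights)
sample_sd = stdev(weights)                  # divides by n - 1
t_sample = (sample_mean - mu0) / (sample_sd / n ** 0.5)

reject_h0 = t_sample < -t_crit              # left-tail rejection region
print(round(sample_mean, 3), round(sample_sd, 4), round(t_sample, 2), reject_h0)
```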
50 Testing the mean: unknown variance
- (ii) n = 50. To calculate the sample statistics we
use Excel, Descriptive Statistics from the
Tools > Data Analysis menu. From the sample we
obtain: Mean = 8.02, Std. Dev. = .04.
- The confidence interval is calculated by
\(\bar{x} \pm t_{\alpha/2, n-1}\,s/\sqrt{n} = 8.02 \pm 2.678(.04/\sqrt{50}) = 8.02 \pm .015\)
LCL = 8.005, UCL = 8.035
51 Testing the mean: unknown variance
- Comments
- Check whether it appears that the distribution is
normal.
52 Using Excel
- To obtain an exact value for t use the TINV
function: TINV(two-tail probability, degrees of freedom)
- The exact value: TINV(0.01,49) = 2.6799535
- .01 is the two-tail probability (.005 x 2); 49 is the degrees of freedom.
53 Testing the mean: unknown variance
- Problem 11
- Engineers in charge of the production of car
seats are concerned about the compliance of the
springs used with design specifications.
- Springs are designed to be 500mm long.
- Springs too long or too short must be reworked.
- A standard deviation of 2mm in spring length
will result in an acceptable number of reworked
springs.
- A sample of 100 springs was taken and measured.
54 Testing the mean: unknown variance
- Problem 11 (continued)
- Can we infer at the 10% significance level that the
mean spring length is not 500mm?
- Solution: \(H_0\colon \mu = 500\), \(H_1\colon \mu \ne 500\)
- Since the standard deviation is unknown, we need to
run a t-test, assuming the
spring length is normally distributed.
- Rejection region: \(t < -t_{\alpha/2}\) or \(t > t_{\alpha/2}\) with d.f. = 99,
that is, \(t < -1.6604\) or \(t > 1.6604\)
55 Inference about a population proportion
- The test and the confidence interval are based on
the approximate normal distribution of the
sample proportion, if \(np > 5\) and \(n(1-p) > 5\).
- For the confidence interval of p we have
\(\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\)
- where \(\hat{p} = x/n\)
- For the hypotheses test, we use a Z test:
\(Z = (\hat{p} - p)/\sqrt{p(1-p)/n}\)
56 Inference about a population proportion
- Problem 12 (Problem 11 continued). The engineers
were interested in the percentage of springs that
are the correct length. They marked each spring
in the sample as
- Correct = 1
- Too long = 2
- Too short = 3
- Can we infer that less than 90% of the springs
are the correct length, at the 10% significance
level?
57 Inference about a population proportion
- Problem 12: Solution
- \(H_0\colon p = .9\), \(H_1\colon p < .9\)
- Rejection region: \(Z < -z_{\alpha}\), or \(Z < -1.28\)
- Conclusion: Since -1.33 < -1.28 we can infer
that less than 90% of the springs do not need
reworking.
58 Inference about a population proportion
- Problem 12: solution continued
- Let us estimate the proportion of good springs at
the 99% confidence level.
59 Inference about a population proportion
- Problem 12: solution continued
- Find the sample size if the proportion of good
springs is to be estimated to within .035.
Consider the given sample an initial sample.
60 Inference about a population proportion
- Problem 13
- A consumer protection group runs a survey of 400
dentists to check a claim that more than 4 out of
5 dentists recommend ingredients included in a
certain toothpaste.
- The survey results are as follows: 71 No, 329
Yes.
- At the 5% significance level, can the consumer group
infer that the claim is true?
61 Inference about a population proportion
- Problem 13: Solution
- The two hypotheses are
- \(H_0\colon p = .8\)
- \(H_1\colon p > .8\)
- The rejection region: \(Z > z_{\alpha}\), with \(z_{.05} = 1.645\)
- \(z_{\text{sample}} = (329/400 - .8)/\sqrt{.8(.2)/400} = 1.125\)
- Conclusion: Since 1.125 < 1.645 the consumer
group cannot confirm the claim at the 5% significance
level.
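The proportion test of Problem 13 can be sketched as follows. Note that the standard error uses the hypothesized p = .8 (the value under H0), not the sample proportion; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()
yes, n = 329, 400
p0, alpha = 0.8, 0.05

p_hat = yes / n                                   # sample proportion
se_under_h0 = (p0 * (1 - p0) / n) ** 0.5          # sqrt(p0(1-p0)/n)
z_sample = (p_hat - p0) / se_under_h0

z_crit = Z.inv_cdf(1 - alpha)                     # right-tail critical value, about 1.645
p_value = 1 - Z.cdf(z_sample)
reject_h0 = z_sample > z_crit
print(round(p_hat, 4), round(z_sample, 3), round(p_value, 4), reject_h0)
```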
62 Summary Example
- An automotive expert claims that the large number
of self-serve gas stations has resulted in poor
automobile maintenance, and that the average tire
pressure is more than 4.5 psi below its
manufacturer specifications.
- A random sample of 50 tires revealed the results
stored in the file TirePressure.
- Assume the tire pressure is normally distributed
with \(\sigma = 1.5\) psi, and answer the following
questions.
63 Summary Example
- At the 10% significance level can we infer that the
expert is correct? What is the p-value?
- Solution
- The hypotheses: \(H_0\colon \mu = 4.5\), \(H_1\colon \mu > 4.5\)
- The rejection region: \(Z > z_{.10}\), or \(Z > 1.28\).
- From the data we have mean = 5.04, so
\(Z = (5.04 - 4.5)/(1.5/\sqrt{50}) = 2.545\)
- Since 2.545 > 1.28, there is sufficient evidence
to infer that the expert is correct.
- The p-value \(= P(\bar{x} > 5.04 \text{ when } \mu = 4.5) = P(Z > 2.545) = 1 - .9945 = .0055\)
64 Summary Example
- Find the probability of making a Type II error
when the actual tire under-inflation is 5 psi on
average.
- Solution: The rejection region in terms of the
sample mean is found first:
\(z_L = 1.28 = (\bar{x}_L - 4.5)/(1.5/\sqrt{50})\), so \(\bar{x}_L = 4.5 + 1.28(1.5/\sqrt{50}) = 4.77\).
- So, the rejection region is: sample mean > 4.77.
- \(\beta = P(\text{accept } H_0 \text{ when } H_1 \text{ is true}) = P(\bar{x} < 4.77 \text{ when } \mu = 5) = P(Z < (4.77 - 5)/(1.5/\sqrt{50})) = P(Z < -1.08)\)
- From Excel: NORMSDIST(-1.077) = .1407
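The whole summary example (test statistic, p-value, and Type II error) can be sketched in one place. The sketch keeps the table value 1.28 for the critical z so that the numbers line up with the slides; using the exact quantile instead would shift beta slightly.

```python
from statistics import NormalDist

Z = NormalDist()
sample_mean, mu0, sigma, n = 5.04, 4.5, 1.5, 50
z_crit = 1.28                                   # z_{.10}, table value used on the slides

se = sigma / n ** 0.5
z_sample = (sample_mean - mu0) / se
p_value = 1 - Z.cdf(z_sample)                   # right-tail p-value
reject_h0 = z_sample > z_crit

# Type II error when the actual mean under-inflation is 5 psi:
# probability the sample mean stays below the critical mean.
mu_actual = 5.0
critical_mean = mu0 + z_crit * se
beta = NormalDist(mu_actual, se).cdf(critical_mean)

print(round(z_sample, 3), round(p_value, 4), round(critical_mean, 2), round(beta, 4))
```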
65 Inference about the population Variance
- The following statistic is \(\chi^2\) (chi-squared)
distributed with \(n-1\) degrees of freedom:
\(\chi^2 = (n-1)s^2/\sigma^2\)
- We use this relationship to test and estimate the
variance.
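66 Inference about the population Variance
- The hypotheses tested are \(H_0\colon \sigma^2 = \sigma_0^2\) against
\(H_1\colon \sigma^2 > \sigma_0^2\), \(H_1\colon \sigma^2 < \sigma_0^2\), or \(H_1\colon \sigma^2 \ne \sigma_0^2\).
- The rejection region is, respectively,
\(\chi^2 > \chi^2_{\alpha, n-1}\); \(\chi^2 < \chi^2_{1-\alpha, n-1}\); or
\(\chi^2 > \chi^2_{\alpha/2, n-1}\) or \(\chi^2 < \chi^2_{1-\alpha/2, n-1}\).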
67 Testing the Variance
- Problem 15
- Engineers in charge of the production of car
seats are concerned about the compliance of the
springs used with design specifications.
- Springs are designed to be 500mm long.
- Springs too long or too short must be reworked.
- A standard deviation of 2mm in spring length
will result in an acceptable number of reworked
springs.
- A sample of 100 springs was taken and measured.
68 Testing the Variance
- Problem 15 (continued): Can we infer at the 10%
significance level that the number of springs
requiring reworking is unacceptably large?
- The number of springs requiring reworking depends
on the standard deviation, or the variance.
- \(H_0\colon \sigma^2 = 4\), \(H_1\colon \sigma^2 > 4\)
- Rejection region: \(\chi^2_{\text{sample}} > \chi^2_{\alpha, \text{d.f.}}\) with d.f. = 99, that is,
\(\chi^2_{\text{sample}} > 117.4069\)
69 Testing the Variance
- Problem 15 (conclusion): Since 161.25 > 117.4069,
we can infer at the 10% significance level that the
standard deviation is greater than 2, thus the
number of springs that require reworking is
unacceptably large.
70 Testing the Variance
- Problem 16
- A random sample of 100 observations was taken
from a normal population. The sample variance
was 29.76.
- Can we infer at the 2.5% significance level that the
population variance does not exceed 30?
- Estimate the population variance with 90%
confidence.
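71 Testing the Variance
- Problem 16: Solution
- \(H_0\colon \sigma^2 = 30\)
- \(H_1\colon \sigma^2 < 30\)
- \(\chi^2 = (n-1)s^2/\sigma^2 = 99(29.76)/30 = 98.21\)
- Rejection region: \(\chi^2 < \chi^2_{1-\alpha, n-1}\), that is, \(\chi^2 < 73.36\)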
72 Testing the Variance
- Problem 16 (conclusion): Since 98.208 > 73.36 we
conclude that there is insufficient evidence at the
2.5% significance level to infer that the
variance is smaller than 30.
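The chi-squared statistic for Problem 16 is plain arithmetic and can be checked as below. The critical value 73.36 is quoted from the slides (Excel's CHIINV), not computed, since the standard library has no chi-squared distribution; the helper name `chi2_statistic` is hypothetical.

```python
def chi2_statistic(n, sample_variance, hypothesized_variance):
    """Chi-squared test statistic (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * sample_variance / hypothesized_variance

chi2_sample = chi2_statistic(n=100, sample_variance=29.76, hypothesized_variance=30)
chi2_crit_left = 73.36            # chi^2_{.975, 99}, quoted from the slides

# Left-tail test (H1: variance < 30): reject only if the statistic is small
reject_h0 = chi2_sample < chi2_crit_left
print(round(chi2_sample, 3), reject_h0)
```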
73 Using Excel
- We can get an exact value of the probability
\(P(\chi^2_{\text{d.f.}} > \chi^2)\) for a given \(\chi^2\) and known d.f.,
and then determine the p-value.
- Use the CHIDIST function: CHIDIST(\(\chi^2\), d.f.). For example,
CHIDIST(98.208,99) = .50359
- That is, \(P(\chi^2_{99} > 98.208) = .50359\)
- In our example we had a left-hand tail rejection
region, and therefore the p-value is \(P(\chi^2_{99} < 98.208) = 1 - .50359 = .49641 > .025\)
74 Using Excel
- We can get the exact \(\chi^2\) value for which
\(P(\chi^2_{\text{d.f.}} > \chi^2) = \alpha\), for any given probability \(\alpha\)
and known d.f., then define the rejection region.
- Use the CHIINV function: CHIINV(\(\alpha\), d.f.). For example,
CHIINV(.975,99) = 73.36
- That is, \(P(\chi^2_{99} > \chi^2) = .975\) gives \(\chi^2 = 73.36\). The
rejection region is \(\chi^2 < 73.36\).