Title: Inferences About Process Quality
Chapter 3: Inferences About Process Quality

3-1. Statistics and Sampling Distributions
- Statistical methods are used to make decisions about a process:
- Is the process out of control?
- Is the process average you were given the true value?
- What is the true process variability?

3-1. Statistics and Sampling Distributions
- Statistics are quantities calculated from a random sample taken from a population of interest.
- The probability distribution of a statistic is called a sampling distribution.
3-1.1 Sampling from a Normal Distribution
- Let X represent measurements taken from a normal distribution with mean μ and variance σ².
- Select a sample of size n, at random, and calculate the sample mean, Xbar.
- Then Xbar is normally distributed with mean μ and variance σ²/n.

3-1.1 Sampling from a Normal Distribution
- Probability example
- The life of an automotive battery is normally distributed with mean 900 days and standard deviation 35 days. What is the probability that a random sample of 25 batteries will have an average life of more than 910 days?
Example
- Z = (910 − 900)/(35/√25) = 1.429
- P(Xbar > 910) = 1 − Φ(1.429) = 1 − 0.9235 = 0.0765
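The calculation can be checked numerically; a minimal Python sketch using only the standard library and the slide's values:

```python
from math import sqrt
from statistics import NormalDist

# Battery life: X ~ N(900, 35^2); sample of n = 25, so Xbar ~ N(900, 35^2/25)
mu, sigma, n = 900, 35, 25
se = sigma / sqrt(n)              # standard error of the mean = 7
z = (910 - mu) / se               # 1.429
p = 1 - NormalDist().cdf(z)       # P(Xbar > 910), about 0.0766
print(round(z, 3), round(p, 4))
```

The exact tail probability, 0.0766, differs from the slide's 0.0765 only because the slide rounds Z to a two-decimal table entry.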
3-1.1 Sampling from a Normal Distribution
- Chi-square (χ²) Distribution
- If x1, x2, …, xn are normally and independently distributed random variables with mean zero and variance one, then the random variable y = x1² + x2² + … + xn² is distributed as chi-square with n degrees of freedom.

3-1.1 Sampling from a Normal Distribution
- Chi-square (χ²) Distribution
- Furthermore, the sampling distribution of (n − 1)S²/σ² is chi-square with n − 1 degrees of freedom when sampling from a normal population.
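A quick Monte Carlo sketch (illustrative, not from the text) confirms the defining property: a sum of n squared standard normals has mean n and variance 2n, matching the chi-square distribution with n degrees of freedom.

```python
import random

random.seed(0)
n, reps = 10, 20000
# Each draw is x1^2 + ... + xn^2 for independent standard normals
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
print(round(mean, 2), round(var, 2))   # close to n = 10 and 2n = 20
```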
3-1.1 Sampling from a Normal Distribution
- Chi-square (χ²) distribution for various degrees of freedom (density curves shown for 5, 10, and 20 degrees of freedom).
3-1.1 Sampling from a Normal Distribution
- t-distribution
- If x is a standard normal random variable and if y is a chi-square random variable with k degrees of freedom, then t = x/√(y/k) is distributed as t with k degrees of freedom.

3-1.1 Sampling from a Normal Distribution
- F-distribution
- If w and y are two independent chi-square random variables with u and v degrees of freedom, respectively, then F = (w/u)/(y/v) is distributed as F with u numerator degrees of freedom and v denominator degrees of freedom.
3-1.2 Sampling from a Bernoulli Distribution
- A random variable x with probability function p(x) = p^x (1 − p)^(1−x), x = 0, 1, is called a Bernoulli random variable.
- The sum of a sample from a Bernoulli process has a binomial distribution with parameters n and p.

3-1.2 Sampling from a Bernoulli Distribution
- For x1, x2, …, xn taken from a Bernoulli process, the sample mean is a discrete random variable given by Xbar = (x1 + x2 + … + xn)/n.
- The mean and variance of Xbar are E(Xbar) = p and Var(Xbar) = p(1 − p)/n.
3-1.3 Sampling from a Poisson Distribution
- Consider a random sample of size n, x1, x2, …, xn, taken from a Poisson process with parameter λ.
- The sum x = x1 + x2 + … + xn is also Poisson, with parameter nλ.
- The sample mean is a discrete random variable given by Xbar = x/n.
- The mean and variance of Xbar are E(Xbar) = λ and Var(Xbar) = λ/n.
3-2. Point Estimation of Process Parameters
- Parameters are values representing the population, e.g., the population mean μ and variance σ².
- Parameters in reality are often unknown and must be estimated.
- Statistics are estimates of parameters, e.g., the sample mean Xbar and sample variance S².

3-2. Point Estimation of Process Parameters
- Two properties of good point estimators:
- The point estimator should be unbiased: E(θ̂) = θ.
- The point estimator should have minimum variance.
Unbiased estimators
- The sample mean (Xbar) and variance (S²) are unbiased estimators of the population mean (μ) and variance (σ²).
- The sample standard deviation (S) is not an unbiased estimator of the standard deviation (σ): E(S) = c4·σ.
- So σ̂ = S/c4.
- App. Table VI gives values of c4 for 2 ≤ n ≤ 25.
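The constant c4 has the closed form c4 = √(2/(n−1))·Γ(n/2)/Γ((n−1)/2). The sketch below (standard library only, simulated normal data) computes it and checks E(S) ≈ c4·σ by simulation:

```python
import math
import random
import statistics

def c4(n):
    """Unbiasing constant for S: E(S) = c4 * sigma for normal samples of size n."""
    return math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

print(round(c4(5), 4))   # 0.94, matching App. Table VI for n = 5

# Monte Carlo check: the average S over many samples is close to c4 * sigma
random.seed(1)
sigma, n = 2.0, 5
s_vals = [statistics.stdev([random.gauss(0, sigma) for _ in range(n)])
          for _ in range(20000)]
print(round(sum(s_vals) / len(s_vals), 2), round(c4(n) * sigma, 2))
```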
Range Method
- The range of a sample is often used in QC
- R = xmax − xmin
- The relative range is given by W = R/σ
- W has been well studied
- E(W) = d2

Range method
- Since W = R/σ
- Then σW = R
- And σ = R/W
- Replacing W by its expected value d2 gives σ̂ = R/d2
- Values of d2 for 2 ≤ n ≤ 25 are in App. Table VI
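As a sketch of the estimator σ̂ = R/d2 (the measurements below are hypothetical, not from the text), with d2 = 2.326 for n = 5 from App. Table VI:

```python
# Hypothetical sample of n = 5 measurements
data = [74.03, 74.01, 74.02, 73.99, 74.00]
d2 = 2.326                      # d2 for n = 5 (App. Table VI)
R = max(data) - min(data)       # sample range
sigma_hat = R / d2              # range-based estimate of sigma
print(round(R, 2), round(sigma_hat, 4))   # 0.04 0.0172
```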
Range method
- Using S is better than using the range method
- But for small sample sizes (n < 10) the range method is acceptable
- Oftentimes, sample sizes are n = 5 or n = 6
- The relative efficiency of the range method is shown on the next frame

Range method
3-3. Statistical Inference for a Single Sample
- Two categories of statistical inference:
- Parameter Estimation
- Hypothesis Testing

3-3. Statistical Inference for a Single Sample
- A statistical hypothesis is a statement about the values of the parameters of a probability distribution.
3-3. Statistical Inference for a Single Sample
- Steps in Hypothesis Testing:
- Identify the parameter of interest
- State the null hypothesis, H0, and the alternative hypothesis, H1
- Choose a significance level
- State the appropriate test statistic
- State the rejection region
- Compare the value of the test statistic to the rejection region. Can the null hypothesis be rejected?
3-3. Statistical Inference for a Single Sample
- Example: An automobile manufacturer claims a particular automobile averages 35 mpg (highway).
- Suppose we are interested in testing this claim. We will sample 25 of these particular autos and, under identical conditions, calculate the average mpg for this sample.
- Before actually collecting the data, we decide that if we get a sample average less than 33 mpg or more than 37 mpg, we will reject the maker's claim. (Critical Values)

3-3. Statistical Inference for a Single Sample
- Example (continued)
- H0: μ = 35
- H1: μ ≠ 35
- From the sample of 25 cars, the average mpg was found to be 31.5. What is your conclusion? (Since 31.5 < 33, the sample mean falls in the rejection region, so we reject the maker's claim.)
3-3. Statistical Inference for a Single Sample
- Choice of Critical Values
- How are the critical values chosen?
- Wouldn't it be easier to decide how much room for error you will allow instead of finding the exact critical values for every problem you encounter?
- OR
- Wouldn't it be easier to set the size of the rejection region, rather than setting the critical values for every problem?
3-3. Statistical Inference for a Single Sample
- Significance Level
- The level of significance, α, determines the size of the rejection region.
- The level of significance is a probability. It is also known as the probability of a Type I error (want this to be small).
- Type I error: rejecting the null hypothesis when it is true.
- How small? Usually we want α small, commonly 0.05 or 0.01.

3-3. Statistical Inference for a Single Sample
- Types of Error
- Type I error: rejecting the null hypothesis when it is true. Pr(Type I error) = α. Sometimes called the producer's risk.
- Type II error: not rejecting the null hypothesis when it is false. Pr(Type II error) = β. Sometimes called the consumer's risk.
3-3. Statistical Inference for a Single Sample
- Power of a Test
- The power of a test of hypothesis is given by 1 − β.
- That is, 1 − β is the probability of correctly rejecting the null hypothesis, or the probability of rejecting the null hypothesis when the alternative is true.
3-3.1 Inference on the Mean of a Population, Variance Known
- Hypothesis Testing
- Hypotheses: H0: μ = μ0;  H1: μ ≠ μ0
- Test Statistic: Z0 = (Xbar − μ0)/(σ/√n)
- Significance Level, α
- Rejection Region: Z0 > Zα/2 or Z0 < −Zα/2
- If Z0 falls into either of the two regions above, reject H0
3-3.1 Inference on the Mean of a Population, Variance Known
- Example 3-1
- Hypotheses: H0: μ = 175;  H1: μ > 175
- Test Statistic: Z0 = (Xbar − 175)/(σ/√n) = 3.50
- Significance Level, α = 0.05
- Rejection Region: Z0 > Z0.05 = 1.645
- Since 3.50 > 1.645, reject H0 and conclude that the lot mean pressure strength exceeds 175 psi
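A sketch of this test in Python. Here σ = 10 and n = 25 are assumptions chosen to be consistent with the slide's Z0 = 3.50 and the Xbar = 182 reported in Example 3-2 (so that σ/√n = 2); they are not stated on this slide.

```python
from math import sqrt
from statistics import NormalDist

# One-sided z-test: H0: mu = 175 vs H1: mu > 175
xbar, mu0 = 182, 175
sigma, n = 10, 25                # assumed values (sigma/sqrt(n) = 2)
alpha = 0.05

z0 = (xbar - mu0) / (sigma / sqrt(n))      # 3.5
z_crit = NormalDist().inv_cdf(1 - alpha)   # 1.645
print(round(z0, 2), round(z_crit, 3), z0 > z_crit)   # 3.5 1.645 True
```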
3-3.1 Inference on the Mean of a Population, Variance Known
- Confidence Intervals
- A general 100(1 − α)% two-sided confidence interval on the true population mean μ is an interval (L, U) such that P(L ≤ μ ≤ U) = 1 − α.
- 100(1 − α)% one-sided confidence intervals are:
- Upper: μ ≤ U;  Lower: L ≤ μ

3-3.1 Inference on the Mean of a Population, Variance Known
- Confidence Interval on the Mean with Variance Known
- Two-Sided: Xbar − Zα/2·σ/√n ≤ μ ≤ Xbar + Zα/2·σ/√n
- See the text for one-sided confidence intervals
3-3.1 Inference on the Mean of a Population, Variance Known
- Example 3-2
- Reconsider Example 3-1. Suppose a 95% two-sided confidence interval is specified. Using Equation (3-28) we compute 182 ± 1.96·σ/√n = 182 ± 3.92.
- Our estimate of the mean bursting strength is 182 psi ± 3.92 psi with 95% confidence
3-3.2 The Use of P-Values in Hypothesis Testing
- If it is not enough to know whether your test statistic Z0 falls into a rejection region, a measure of just how significant your test statistic is can be computed: the P-value.
- P-values are probabilities associated with the test statistic, Z0.

3-3.2 The Use of P-Values in Hypothesis Testing
- Definition
- The P-value is the smallest level of significance that would lead to rejection of the null hypothesis H0.
3-3.2 The Use of P-Values in Hypothesis Testing
- Example
- Reconsider Example 3-1. The test statistic was calculated to be Z0 = 3.50 for a right-tailed hypothesis test. The P-value for this problem is then
- P = 1 − Φ(3.50) = 0.00023
- Thus, H0: μ = 175 would be rejected at any level of significance α ≥ P = 0.00023
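The normal-tail calculation is one line of Python:

```python
from statistics import NormalDist

# Right-tailed P-value for Z0 = 3.50
p = 1 - NormalDist().cdf(3.50)
print(round(p, 5))   # 0.00023
```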
3-3.3 Inference on the Mean of a Population, Variance Unknown
- Hypothesis Testing
- Hypotheses: H0: μ = μ0;  H1: μ ≠ μ0
- Test Statistic: t0 = (Xbar − μ0)/(S/√n)
- Significance Level, α
- Rejection Region: reject H0 if |t0| > tα/2,n−1

3-3.3 Inference on the Mean of a Population, Variance Unknown
- Confidence Interval on the Mean with Variance Unknown
- Two-Sided: Xbar − tα/2,n−1·S/√n ≤ μ ≤ Xbar + tα/2,n−1·S/√n
- See the text for the one-sided confidence intervals.
3-3.3 Inference on the Mean of a Population, Variance Unknown
3-3.4 Inference on the Variance of a Normal Distribution
- Hypothesis Testing
- Hypotheses: H0: σ² = σ0²;  H1: σ² ≠ σ0²
- Test Statistic: χ0² = (n − 1)S²/σ0²
- Significance Level, α
- Rejection Region: χ0² > χ²α/2,n−1 or χ0² < χ²1−α/2,n−1

3-3.4 Inference on the Variance of a Normal Distribution
- Confidence Interval on the Variance
- Two-Sided: (n − 1)S²/χ²α/2,n−1 ≤ σ² ≤ (n − 1)S²/χ²1−α/2,n−1
- See the text for the one-sided confidence intervals.
Example
- Compute a 95% two-sided CI on the data in Table 3.1 (n = 16)
- S² = 2.76
- Find χ²0.025,15 = 27.49 and χ²0.975,15 = 6.27
- 15(2.76)/27.49 ≤ σ² ≤ 15(2.76)/6.27
- 1.51 ≤ σ² ≤ 6.60
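A sketch of the interval computation, using the chi-square percentage points quoted on the slide (stdlib Python has no chi-square quantile function, so they are hard-coded from the table):

```python
# 95% two-sided CI on sigma^2: n = 16 (15 df), S^2 = 2.76
n, s2 = 16, 2.76
chi_upper, chi_lower = 27.49, 6.27    # chi^2_{.025,15} and chi^2_{.975,15}
lo = (n - 1) * s2 / chi_upper
hi = (n - 1) * s2 / chi_lower
print(round(lo, 2), round(hi, 2))     # 1.51 6.6
```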
3-3.5 Inference on a Population Proportion
- Hypothesis Testing
- Hypotheses: H0: p = p0;  H1: p ≠ p0
- Test Statistic (normal approximation to the binomial, with continuity correction): Z0 = [(x − 0.5) − np0]/√(np0(1 − p0)) if x > np0, or Z0 = [(x + 0.5) − np0]/√(np0(1 − p0)) if x < np0
- Significance Level, α
- Rejection Region: |Z0| > Zα/2

3-3.5 Inference on a Population Proportion
- Confidence Interval on the Population Proportion
- Two-Sided: p̂ − Zα/2·√(p̂(1 − p̂)/n) ≤ p ≤ p̂ + Zα/2·√(p̂(1 − p̂)/n)
- See the text for the one-sided confidence intervals.
Example
- A foundry produces castings used in the automotive industry. We wish to test the hypothesis that the fraction nonconforming from this process is 10%. In a random sample of 250 castings, 41 were found to be nonconforming.
- H0: p = 0.10
- H1: p ≠ 0.10
Example, cont.
- np0 = 250(0.10) = 25
- So x = 41 > np0
- Use Z0 = [(x − 0.5) − np0]/√(np0(1 − p0))
- Z0 = [(41 − 0.5) − 25]/√(250(0.1)(1 − 0.1)) = 3.27
- At α = 0.05, Z0.025 = 1.96
- So, reject H0
- P = 2(1 − 0.99946) = 2(0.00054) = 0.00108
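The continuity-corrected test can be sketched as:

```python
from math import sqrt
from statistics import NormalDist

# H0: p = 0.10 vs H1: p != 0.10, with n = 250 and x = 41 nonconforming
n, x, p0 = 250, 41, 0.10
np0 = n * p0                                      # 25
# x > np0, so apply the (x - 0.5) continuity correction
z0 = ((x - 0.5) - np0) / sqrt(n * p0 * (1 - p0))
p_value = 2 * (1 - NormalDist().cdf(z0))
print(round(z0, 2), round(p_value, 4))            # 3.27 0.0011
```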
3-3.6 The Probability of Type II Error
- Calculation of P(Type II Error)
- Assume the test of interest is H0: μ = μ0 vs. H1: μ ≠ μ0
- P(Type II Error) is found to be β = Φ(Zα/2 − δ√n/σ) − Φ(−Zα/2 − δ√n/σ)
- The Power of the test is then 1 − β

Explanation
- The test statistic is Z0 = (Xbar − μ0)/(σ/√n) ~ N(0, 1)
- Assume H0 is false
- Then find the distribution of Z0
- Suppose that the mean is really μ1 = μ0 + δ, where δ > 0
- Under this assumption, Z0 ~ N(δ/(σ/√n), 1)
- Or, Z0 ~ N(δ√n/σ, 1)
Explanation, cont.
- Now, take a look at Fig. 3-6
- We are trying to calculate the shaded area
- We must calculate Φ(Zα/2 − δ√n/σ) − Φ(−Zα/2 − δ√n/σ)
- So, we use the equation shown previously
Example
- Mean contents of coffee cans
- Given that σ = 0.1 oz (and, from the calculation that follows, n = 9)
- H0: μ = 16.0
- H1: μ ≠ 16.0
- Suppose we want to find β if μ = 16.1 oz.
Example, cont.
- Now, δ = 16.1 − 16.0 = 0.1
- Z0.025 = 1.96
- Then β = Φ(1.96 − (0.1)(3)/0.1) − Φ(−1.96 − (0.1)(3)/0.1)
- Or, Φ(−1.04) − Φ(−4.96)
- = 0.1492
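The β calculation as a sketch, applied directly from the formula β = Φ(Zα/2 − δ√n/σ) − Φ(−Zα/2 − δ√n/σ):

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist().cdf
delta, sigma, n, alpha = 0.1, 0.1, 9, 0.05
z_half = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96
shift = delta * sqrt(n) / sigma                # 3.0
beta = phi(z_half - shift) - phi(-z_half - shift)
print(round(beta, 4), round(1 - beta, 4))      # 0.1492 0.8508
```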
3-3.6 The Probability of Type II Error
- Operating Characteristic (OC) Curves
- An Operating Characteristic (OC) curve is a graph representing the relationship between β, α, δ, and n.
- OC curves are useful in determining how large a sample is required to detect a specified difference with a particular probability.

3-3.6 The Probability of Type II Error
- Operating Characteristic (OC) Curves
- Previous example: d = δ/σ = 0.1/0.1 = 1, β = 0.1492 (read from the OC curve, which plots β against d = δ/σ).
3-3.7 Probability Plotting
- Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution, based on a subjective visual examination of the data.
- Probability plotting uses special graph paper known as probability paper. Probability paper is available for the normal, lognormal, and Weibull distributions, among others.
- A computer can also be used.

3-3.7 Probability Plotting
Tensile strength data
- Note: μ and σ can be estimated from the probability plot as shown.
3-4. Statistical Inference for Two Samples
- The previous section presented hypothesis testing and confidence intervals for a single population parameter.
- These results are extended to the case of two independent populations.
- Statistical inference on the difference in population means, μ1 − μ2.
3-4.1 Inference For a Difference in Means, Variances Known
- Assumptions
- X11, X12, …, X1n1 is a random sample from population 1.
- X21, X22, …, X2n2 is a random sample from population 2.
- The two populations represented by X1 and X2 are independent.
- Both populations are normal, or if they are not normal, the conditions of the central limit theorem apply.

3-4.1 Inference For a Difference in Means, Variances Known
- The point estimator for μ1 − μ2 is Xbar1 − Xbar2, where E(Xbar1 − Xbar2) = μ1 − μ2 and Var(Xbar1 − Xbar2) = σ1²/n1 + σ2²/n2.
3-4.1 Inference For a Difference in Means, Variances Known
- Hypothesis Tests for a Difference in Means, Variances Known
- Null Hypothesis: H0: μ1 − μ2 = Δ0
- Test Statistic: Z0 = (Xbar1 − Xbar2 − Δ0)/√(σ1²/n1 + σ2²/n2)

3-4.1 Inference For a Difference in Means, Variances Known
- Hypothesis Tests for a Difference in Means, Variances Known
- Alternative Hypotheses and Rejection Criteria:
- H1: μ1 − μ2 ≠ Δ0, reject if |Z0| > Zα/2
- H1: μ1 − μ2 > Δ0, reject if Z0 > Zα
- H1: μ1 − μ2 < Δ0, reject if Z0 < −Zα
3-4.1 Inference For a Difference in Means, Variances Known
- Confidence Interval on a Difference in Means, Variances Known
- The 100(1 − α)% confidence interval on the difference in means is given by
- Xbar1 − Xbar2 − Zα/2·√(σ1²/n1 + σ2²/n2) ≤ μ1 − μ2 ≤ Xbar1 − Xbar2 + Zα/2·√(σ1²/n1 + σ2²/n2)
Example 3-9
- Drying time is being studied
- 10 samples of each paint formulation
- Xbar1 = 121 and Xbar2 = 112
- Standard deviation, σ = 8, unaffected by paint formulation
- H0: μ1 − μ2 = 0
- H1: μ1 − μ2 > 0
Example, cont.
- Z0 = (121 − 112)/√(8²/10 + 8²/10) = 2.52 > Z0.05 = 1.645
- Therefore, reject H0
- P-value = 1 − Φ(2.52) = 0.0059
- H0 would be rejected at any significance level α ≥ 0.0059
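Example 3-9 as a sketch:

```python
from math import sqrt
from statistics import NormalDist

# Two-sample z-test, variances known: H0: mu1 - mu2 = 0 vs H1: mu1 - mu2 > 0
n1 = n2 = 10
xbar1, xbar2, sigma = 121, 112, 8
z0 = (xbar1 - xbar2) / sqrt(sigma**2 / n1 + sigma**2 / n2)
p_value = 1 - NormalDist().cdf(z0)
print(round(z0, 2), round(p_value, 4))   # 2.52 0.0059
```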
Confidence Interval
- On a difference in means, variances known
- See Eq. 3-49
3-4.2 Inference For a Difference in Means, Variances Unknown
- Hypothesis Tests for a Difference in Means, Variances Unknown
- Case I: σ1² = σ2² = σ²
- The point estimator for μ1 − μ2 is Xbar1 − Xbar2, where the common variance σ² is estimated by the pooled variance Sp².

3-4.2 Inference For a Difference in Means, Variances Unknown
- Hypothesis Tests for a Difference in Means, Variances Unknown
- Case I: σ1² = σ2² = σ²
- The pooled estimate of σ², denoted by Sp², is defined by
- Sp² = [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2)
3-4.2 Inference For a Difference in Means, Variances Unknown
- Case I: σ1² = σ2² = σ²
- Null Hypothesis: H0: μ1 − μ2 = Δ0
- Test Statistic: t0 = (Xbar1 − Xbar2 − Δ0)/(Sp·√(1/n1 + 1/n2))

3-4.2 Inference For a Difference in Means, Variances Unknown
- Hypothesis Tests for a Difference in Means, Variances Unknown
- Alternative Hypotheses and Rejection Criteria:
- H1: μ1 − μ2 ≠ Δ0, reject if |t0| > tα/2,n1+n2−2
- H1: μ1 − μ2 > Δ0, reject if t0 > tα,n1+n2−2
- H1: μ1 − μ2 < Δ0, reject if t0 < −tα,n1+n2−2
Example 3-10
- Discuss this example on pgs. 120-121
- The variances are assumed to be equal, so they are pooled
- t0.025,14 = 2.145
3-4.2 Inference For a Difference in Means, Variances Unknown
- Case II: σ1² ≠ σ2²
- Null Hypothesis: H0: μ1 − μ2 = Δ0
- Test Statistic: t0* = (Xbar1 − Xbar2 − Δ0)/√(S1²/n1 + S2²/n2)

3-4.2 Inference For a Difference in Means, Variances Unknown
- Case II: σ1² ≠ σ2²
- The degrees of freedom for t0* are given by
- ν = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1)]
3-4.2 Inference For a Difference in Means, Variances Unknown
- Confidence Intervals on a Difference in Means, Case I
- The 100(1 − α)% confidence interval on the difference in means is given by
- Xbar1 − Xbar2 ± tα/2,n1+n2−2·Sp·√(1/n1 + 1/n2)

3-4.2 Inference For a Difference in Means, Variances Unknown
- Confidence Intervals on a Difference in Means, Case II
- The 100(1 − α)% confidence interval on the difference in means is given by
- Xbar1 − Xbar2 ± tα/2,ν·√(S1²/n1 + S2²/n2)
Example 3-11
- Discuss this example beginning on pg. 122
- The variances are assumed to be equal, so they are pooled
- t0.025,23 = 2.069
- Note that the 95% CI contains zero, so we cannot conclude that there is a difference in the means
Minitab solution

Two-sample T for Catalyst 1 vs Catalyst 2

              N    Mean   StDev  SE Mean
Catalyst 1    8   92.26    2.39     0.84
Catalyst 2    8   92.73    2.99      1.1

Difference = mu Catalyst 1 - mu Catalyst 2
Estimate for difference: -0.47
95% CI for difference: (-3.37, 2.42)
T-Test of difference = 0 (vs not =): T-Value = -0.35  P-Value = 0.731  DF = 14
Both use Pooled StDev = 2.70

Note: 0 is included in the CI
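The pooled analysis can be reproduced from the summary statistics alone (a sketch; t0.025,14 = 2.145 is taken from the t-table, and small differences from Minitab's output come from the rounded means and standard deviations):

```python
from math import sqrt

n1, xbar1, s1 = 8, 92.26, 2.39
n2, xbar2, s2 = 8, 92.73, 2.99

sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # pooled StDev
se = sp * sqrt(1 / n1 + 1 / n2)
t0 = (xbar1 - xbar2) / se                                         # about -0.35
t_crit = 2.145                                                    # t_{.025,14}
ci = (xbar1 - xbar2 - t_crit * se, xbar1 - xbar2 + t_crit * se)
print(round(sp, 2), round(t0, 2), round(ci[0], 2), round(ci[1], 2))
```

Since the interval contains zero and |t0| < 2.145, H0 is not rejected, matching the P-value of 0.731.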
Minitab solution
- No obvious differences

Minitab solution
- Normality and equal variances are reasonable
3-4.2 Paired Data
- Observations in an experiment are often paired to prevent extraneous factors from inflating the estimate of the variance.
- A difference is obtained on each pair of observations: dj = x1j − x2j, where j = 1, 2, …, n.
- Test the hypothesis that the mean of the difference, μd, is zero.

3-4.2 Paired Data
- The differences, dj, represent the new set of data, with summary statistics dbar (the sample mean of the differences) and Sd (their sample standard deviation).
3-4.2 Paired Data
- Hypothesis Testing
- Hypotheses: H0: μd = 0;  H1: μd ≠ 0
- Test Statistic: t0 = dbar/(Sd/√n)
- Significance Level, α
- Rejection Region: |t0| > tα/2,n−1
Example 3-12
- dbar = −1.38
- t0 = −1.46
- t0.025,7 = 2.365
- Conclusion: no strong evidence to indicate that the two machines differ in their tensile strength
- P-value = 0.1877
Solution using Minitab

Paired T for Mach 1 - Mach 2

              N    Mean   StDev  SE Mean
Mach 1        8   69.13    5.96     2.11
Mach 2        8   70.50    6.07     2.15
Difference    8  -1.375   2.669    0.944

95% CI for mean difference: (-3.608, 0.858)
T-Test of mean difference = 0 (vs not = 0): T-Value = -1.46  P-Value = 0.188
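The paired t-statistic follows directly from the difference row of the output (a sketch):

```python
from math import sqrt

# Difference summary from the paired analysis: n = 8, dbar = -1.375, Sd = 2.669
n, dbar, sd = 8, -1.375, 2.669
se = sd / sqrt(n)        # 0.944
t0 = dbar / se           # -1.46
t_crit = 2.365           # t_{.025,7}
print(round(se, 3), round(t0, 2), abs(t0) > t_crit)   # 0.944 -1.46 False
```

Since |t0| < 2.365, H0 is not rejected, matching the P-value of 0.188.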
3-4.3 Inferences on the Variances of Two Normal Distributions
- Hypothesis Testing
- Consider testing the hypothesis that the variances of two independent normal distributions are equal.
- Assume random samples of sizes n1 and n2 are taken from populations 1 and 2, respectively.

3-4.3 Inferences on the Variances of Two Normal Distributions
- Hypothesis Testing
- Hypotheses: H0: σ1² = σ2²;  H1: σ1² ≠ σ2²
- Test Statistic: F0 = S1²/S2²
- Significance Level, α
- Rejection Region: F0 > Fα/2,n1−1,n2−1 or F0 < F1−α/2,n1−1,n2−1
3-4.3 Inferences on the Variances of Two Normal Distributions
- One-sided Alternative Hypotheses, Test Statistics, and Rejection Regions:
- H1: σ1² > σ2²: F0 = S1²/S2², reject if F0 > Fα,n1−1,n2−1
- H1: σ1² < σ2²: F0 = S2²/S1², reject if F0 > Fα,n2−1,n1−1

3-4.3 Inferences on the Variances of Two Normal Distributions
- Confidence Intervals on the Ratio of the Variances of Two Normal Distributions
- The 100(1 − α)% two-sided confidence interval on the ratio of variances is given by
- (S1²/S2²)·F1−α/2,n2−1,n1−1 ≤ σ1²/σ2² ≤ (S1²/S2²)·Fα/2,n2−1,n1−1
3-4.4 Inference on Two Population Proportions
- Large-Sample Hypothesis Testing
- Hypotheses: H0: p1 = p2;  H1: p1 ≠ p2
- Test Statistic: Z0 = (p̂1 − p̂2)/√(p̂(1 − p̂)(1/n1 + 1/n2)), where p̂ = (x1 + x2)/(n1 + n2)
- Significance Level, α
- Rejection Region: |Z0| > Zα/2

3-4.4 Inference on Two Population Proportions
- Alternative Hypotheses and Rejection Regions:
- H1: p1 ≠ p2, reject if |Z0| > Zα/2
- H1: p1 > p2, reject if Z0 > Zα
- H1: p1 < p2, reject if Z0 < −Zα
3-4.4 Inference on Two Population Proportions
- Confidence Interval on the Difference in Two Population Proportions
- Two-Sided: p̂1 − p̂2 ± Zα/2·√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
- See the text for the one-sided confidence intervals.
3-5. What If We Have More Than Two Populations?
- Example: Investigating the effect of one factor (with several levels) on some response. See Table 3-5.

  Hardwood                  Observations
  Concentration    1   2   3   4   5   6   Totals     Avg
        5          7   8  15  11   9  10      60    10
       10         12  17  13  18  19  15      94    15.67
       15         14  18  19  17  16  18     102    17
       20         19  25  22  23  18  20     127    21.17
                                 Overall     383    15.96
3-5. What If We Have More Than Two Populations?
- Analysis of Variance
- It is always good practice to compare the levels of the factor using graphical methods such as boxplots.
- Comparative boxplots show the variability of the observations within a factor level and the variability between factor levels.

3-5. What If We Have More Than Two Populations?
3-5. What If We Have More Than Two Populations?
- The observations yij can be modeled by yij = μ + τi + εij, for i = 1, 2, …, a and j = 1, 2, …, n
- a = number of factor levels
- n = number of replicates of observations per treatment (factor) level
3-5. What If We Have More Than Two Populations?
- The hypotheses being tested are
- H0: τ1 = τ2 = … = τa = 0
- H1: τi ≠ 0 for at least one i
- Total variability can be measured by the total corrected sum of squares, SST = ΣiΣj (yij − ybar..)²

3-5. What If We Have More Than Two Populations?
- The sum of squares identity is
- ΣiΣj (yij − ybar..)² = n·Σi (ybari. − ybar..)² + ΣiΣj (yij − ybari.)²
- Notationally, this is often written as
- SST = SSTreatments + SSE
3-5. What If We Have More Than Two Populations?
- The expected value of the treatment sum of squares is
- E(SSTreatments) = (a − 1)σ² + n·Σi τi²
- If the null hypothesis is true, then E[SSTreatments/(a − 1)] = σ²

3-5. What If We Have More Than Two Populations?
- The error mean square is MSE = SSE/[a(n − 1)], with E(MSE) = σ²
- If the null hypothesis is true, the ratio
- F0 = [SSTreatments/(a − 1)]/[SSE/(a(n − 1))]
- has an F-distribution with a − 1 and a(n − 1) degrees of freedom.
3-5. What If We Have More Than Two Populations?
- The following formulas can be used to calculate the sums of squares.
- Total Sum of Squares: SST = ΣiΣj yij² − y..²/(an)
- Sum of Squares for the Treatments: SSTreatments = (1/n)·Σi yi.² − y..²/(an)
- Sum of Squares for Error: SSE = SST − SSTreatments
3-5. What If We Have More Than Two Populations?
- Analysis of Variance Table 3-7
Example 3-13
- Four different hardwood concentrations (treatments)
- Six observations of each
- Completely randomized design
- Hypotheses:
- H0: τ1 = τ2 = τ3 = τ4 = 0
- H1: τi ≠ 0 for at least one i
Example 3-13
- SST = 7² + 8² + … + 20² − (383)²/24 = 512.96
- SSTreatments = (60² + 94² + 102² + 127²)/6 − (383)²/24 = 382.79
- SSE = SST − SSTreatments = 512.96 − 382.79 = 130.17
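The whole analysis can be sketched from the Table 3-5 data using the sum-of-squares formulas above:

```python
# One-way ANOVA for the hardwood-concentration data (Table 3-5)
data = {
    5:  [7, 8, 15, 11, 9, 10],
    10: [12, 17, 13, 18, 19, 15],
    15: [14, 18, 19, 17, 16, 18],
    20: [19, 25, 22, 23, 18, 20],
}
a, n = len(data), 6                       # 4 treatments, 6 replicates each
all_obs = [y for ys in data.values() for y in ys]
correction = sum(all_obs) ** 2 / (a * n)  # (383)^2 / 24

sst = sum(y * y for y in all_obs) - correction
sstrt = sum(sum(ys) ** 2 for ys in data.values()) / n - correction
sse = sst - sstrt
f0 = (sstrt / (a - 1)) / (sse / (a * (n - 1)))
print(round(sst, 2), round(sstrt, 2), round(sse, 2), round(f0, 2))
# 512.96 382.79 130.17 19.61
```

F0 ≈ 19.61 far exceeds F0.05,3,20 = 3.10, so H0 is rejected.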
Example 3-13, cont.

Example 3-13, cont.
- Since F0 > F0.05,3,20 = 3.10, reject H0 (H0: no difference in concentrations)
- Since F0 > F0.01,3,20 = 4.94, reject H0 (H0: no difference in concentrations)
3-5. What If We Have More Than Two Populations?
- Analysis of Variance Table 3-8

3-5. What If We Have More Than Two Populations?
- Residual Analysis
- Assumptions: model errors are normally and independently distributed with equal variance.
- Check the assumptions by looking at residual plots.
3-5. What If We Have More Than Two Populations?
- Residual Analysis
- A residual is given by eij = yij − ybari.
- ybar1. = 10
- e11 = 7 − 10 = −3, and so on
3-5. What If We Have More Than Two Populations?
- Residual Analysis
- Plot of residuals versus factor levels

3-5. What If We Have More Than Two Populations?
- Residual Analysis
- Normal probability plot of residuals
Exercises
- Work as many odd-numbered exercises as necessary to make sure that you understand this chapter

End