Title: Analysis of Variance (ANOVA)
1. Analysis of Variance (ANOVA)
- Engineering Experimental Design
- Valerie L. Young
2. Example Problem
- As a chicken nutritionist, you need reliable
measures of lasalocid sodium levels in chicken
feed, so you decide to test the analytical labs.
- You send a sample of feed containing 85 mg/kg
lasalocid sodium to each of three independent
labs, and get back the results at left.
- Do the analyses agree with each other?
- Do the analyses agree with the true value?
3. Graph it First
4. Interpret the Graph
- Laboratory B's results appear to be higher
- All labs' results overlap with each other
- All labs' results overlap with the true value (85
mg/kg)
- Based on the graph alone, I cannot tell whether
any of the labs differ significantly from each
other or from the true value.
5. ANOVA: What does it tell me?
- ANOVA = Analysis of Variance
- ANOVA will tell me whether I have sufficient
evidence to say that measurements from at least
one lab differ significantly from at least one
other.
- It will not tell me which ones differ, or how
many differ.
6. ANOVA vs. t-test
- ANOVA is like a t-test among multiple data sets
simultaneously
- t-tests can only be done between two data sets,
or between one set and a true value
- ANOVA uses the F distribution instead of the
t-distribution
- ANOVA assumes that all of the data sets have
equal variances
- Use caution on close decisions if they don't
- Consult a professional
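One way to see the link between the two tests: for exactly two data sets, the one-way ANOVA F statistic equals the square of the pooled (equal-variance) two-sample t statistic. The sketch below checks this numerically. The function names and the data are invented for illustration and are not from the lab study.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """F = MSTr / MSE for a list of independent groups."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_treat = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (n_total - len(groups)))

# Hypothetical measurements (mg/kg), invented for this illustration only
set_x = [84.1, 85.3, 86.0, 83.7, 85.9]
set_y = [87.2, 88.1, 86.5, 89.0, 87.7]

t_stat = pooled_t(set_x, set_y)
f_stat = one_way_f([set_x, set_y])
print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")  # the two numbers coincide
```

With three or more data sets there is no single t statistic to square, which is where ANOVA takes over.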
7. ANOVA: a Hypothesis Test
- H0: There is no significant difference among the
results provided by these three laboratories.
- H1: At least one of these laboratories provides
results significantly different from at least one
other.
8. Excel and ANOVA
- Tools > Data Analysis >
- ANOVA: Single Factor
- ANOVA: Two-Factor with Replication
- ANOVA: Two-Factor without Replication
- So how many factors do we have here?
- Factor = Independent Variable
- The I.V. here is Laboratory
- We have a SINGLE factor with THREE levels (A, B, C)
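For readers without Excel, the quantities in a single-factor ANOVA table can be sketched in a few lines of Python. This is a minimal illustration, not Excel's implementation: the function name `anova_single_factor`, the dictionary layout, and the toy numbers are all assumptions made for the example.

```python
from statistics import mean

def anova_single_factor(groups):
    """One-way ANOVA quantities for a dict mapping level name -> measurements."""
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)  # number of levels, total observations

    ss_treat = 0.0  # between-group (treatment) sum of squares
    ss_error = 0.0  # within-group (error) sum of squares
    for g in groups.values():
        g_mean = mean(g)
        ss_treat += len(g) * (g_mean - grand_mean) ** 2
        ss_error += sum((x - g_mean) ** 2 for x in g)

    ms_treat = ss_treat / (k - 1)
    ms_error = ss_error / (n - k)
    return {"SSTr": ss_treat, "SSE": ss_error, "df": (k - 1, n - k),
            "MSTr": ms_treat, "MSE": ms_error, "F": ms_treat / ms_error}

# Toy data: ONE factor with THREE levels (not the real lab results)
toy = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}
table = anova_single_factor(toy)
print(f"MSTr = {table['MSTr']:.2f}, MSE = {table['MSE']:.2f}, F = {table['F']:.2f}")
# prints: MSTr = 13.00, MSE = 1.00, F = 13.00
```

The single factor shows up as the one dictionary key per level; a two-factor design would need a second index on every measurement.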
9. ANOVA Results
10. Focus on ANOVA Table
F = MSTr / MSE = 59.233 / 6.2407
Treatments row: MSTr
Error row: MSE
- F = ratio of variability (between groups) due to
treatment to variability (within groups) due to
random error
- P = probability of getting an F value at least
this large if these were 3 sets of 10
measurements from the same population
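As a rough check on these two definitions, F and P can be recomputed from the two mean squares in the table. The sketch assumes 2 degrees of freedom for treatments (3 labs) and 27 for error (3 sets of 10 measurements). It relies on the fact that when the numerator has exactly 2 degrees of freedom, the upper tail of the F distribution has a simple closed form, so no statistics library is needed. The helper name `f_upper_tail_df1_2` is made up for this example.

```python
def f_upper_tail_df1_2(f_value, df_error):
    """P(F >= f_value) for an F distribution with 2 numerator df.

    Closed form valid ONLY for 2 numerator degrees of freedom.
    """
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)

ms_treat, ms_error = 59.233, 6.2407  # mean squares from the ANOVA table
n_labs, n_per_lab = 3, 10            # 3 sets of 10 measurements
df_treat = n_labs - 1                # 2
df_error = n_labs * n_per_lab - n_labs  # 27

f_ratio = ms_treat / ms_error
p_value = f_upper_tail_df1_2(f_ratio, df_error)
print(f"F = {f_ratio:.2f}, P = {p_value:.4f}")  # F is about 9.49; P is below 0.001
```

For other numerator degrees of freedom the closed form does not apply and a table or a statistics package would be needed.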
11. Decision Based on ANOVA
F = MSTr / MSE = 59.233 / 6.2407
- F > F critical
- Reject H0
- P < 0.05 (chosen significance level)
- Reject H0
- If H0 were true, the probability of getting 3
sets of data like this is less than 0.1%
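The F-critical comparison can be sketched the same way. Assuming 2 and 27 degrees of freedom (3 labs, 10 measurements each) and a 0.05 significance level, the critical value follows from inverting the closed-form tail probability that holds when the numerator has 2 degrees of freedom. The helper name `f_critical_df1_2` is hypothetical.

```python
def f_critical_df1_2(alpha, df_error):
    """Critical F for 2 numerator df: solves (1 + 2F/df_error)**(-df_error/2) = alpha."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)

alpha = 0.05                  # chosen significance level
df_error = 3 * 10 - 3         # 27, assuming 3 labs x 10 measurements
f_ratio = 59.233 / 6.2407     # MSTr / MSE from the ANOVA table

f_crit = f_critical_df1_2(alpha, df_error)
reject_h0 = f_ratio > f_crit
print(f"F = {f_ratio:.2f}, F critical = {f_crit:.2f}, reject H0: {reject_h0}")
# F critical comes out near 3.35, so H0 is rejected
```

Comparing F with F critical and comparing P with the significance level are the same decision stated two ways, which is why both bullets lead to "Reject H0".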
12. Where Does the Difference Lie?
- ANOVA only shows that a difference exists
- To find the difference, consider:
- Graphical representation
- Mean with confidence limits
- Effects plot
- Analysis of Means (ANOM)
- For one factor at multiple levels, ANOM is a
better technique than ANOVA
- For multiple factors, ANOVA is required. We are
showing ANOVA for one factor so you can better
understand it when it is properly applied for
multiple factors.
13. Descriptive (Summary) Statistics
Note: I have not used the pooled variance (MSE)
to calculate the confidence limits. Since the
analyses were performed by different labs, I
decided to allow for them to have different
uncertainties.
14. Focus on Descriptive Statistics
Mean of 10 replicate measurements by laboratory
A. This sample mean is an estimate of the true
concentration.
Standard deviation of 10 replicate measurements
by laboratory A. This sample std dev is an
estimate of the true std dev for lab A.
95% confidence interval on the mean.
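These three statistics can be sketched for a single lab as follows. The data are hypothetical, and the t critical value 2.262 (two-sided 95%, 9 degrees of freedom) is taken from a standard t table because the Python standard library has no t distribution. The interval uses the lab's own standard deviation rather than a pooled variance, so each lab is allowed its own uncertainty.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_95_DF9 = 2.262  # two-sided 95% Student's t value for 9 df (from a t table)

def summarize_10_replicates(values):
    """Mean, sample std dev, and 95% CI on the mean for exactly 10 replicates."""
    if len(values) != 10:
        raise ValueError("T_CRIT_95_DF9 is only valid for 10 measurements")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    half_width = T_CRIT_95_DF9 * s / sqrt(len(values))
    return m, s, (m - half_width, m + half_width)

# Hypothetical replicate results in mg/kg (not the real lab data)
lab_results = [83, 84, 85, 86, 87, 83, 84, 85, 86, 87]
m, s, (low, high) = summarize_10_replicates(lab_results)
print(f"mean = {m:.2f}, std dev = {s:.2f}, 95% CI = ({low:.2f}, {high:.2f})")

target = 85.0  # known concentration in the feed sample
print("CI contains target:", low <= target <= high)
```

A lab whose interval excludes the target value is a candidate for the "differs from the true value" conclusion.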
15. Graphical Representation
[Figure: mean for each lab with 95% CI on the mean, plotted against the target value]
16. Interpretation of Graph
- Results from Lab B differ significantly from Lab
C and from the known value (85 mg/kg)
- Results from Lab A and Lab C agree with one
another and with the known value
- Lab B's analysis is unacceptable
- This is the appropriate conclusion even though
analyses by Labs A and B are apparently not
significantly different
17. Example: Lead Contamination
- Lead was banned as a gasoline additive in 1978.
Soil lead levels were monitored at 80 randomly
chosen locations in the United States to
determine whether the lead ban resulted in
significant reduction of environmental lead
levels over time. The following results were
obtained.
18. ANOVA: Lead Contamination
- You must have all of the original data to
calculate the values for SSyear and SStotal.
- F-crit can be found on Table B.7, last row (∞),
3rd column (2).
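For reference, the sums of squares named above follow the usual one-way ANOVA definitions. The notation is an assumption made for this sketch: $x_{ij}$ is the $j$-th soil lead measurement in year $i$, $n_i$ is the number of measurements in year $i$, $k$ is the number of years, $N$ is the total number of measurements, $\bar{x}_i$ is the mean for year $i$, and $\bar{x}$ is the grand mean.

```latex
\[
\begin{aligned}
SS_{\text{total}} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( x_{ij} - \bar{x} \right)^2 \\
SS_{\text{year}}  &= \sum_{i=1}^{k} n_i \left( \bar{x}_i - \bar{x} \right)^2 \\
SS_{\text{error}} &= SS_{\text{total}} - SS_{\text{year}} \\
F &= \frac{SS_{\text{year}} / (k - 1)}{SS_{\text{error}} / (N - k)}
\end{aligned}
\]
```

The double sum in $SS_{\text{total}}$ runs over every individual measurement, which is why the original data are needed.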
19. CRD: Completely Randomized Design
- An experiment with multiple independent groups of
data is a Completely Randomized Design (CRD)
- ANOVA is like a t-test among multiple independent
groups of data
- If specific data points between groups are linked
(like in a paired t-test) then it is not
single-factor ANOVA
- Linking data between groups with some other
factor is called blocking