AOV Assumption Checking and Transformations - PowerPoint PPT Presentation

Slides: 35

Provided by: imadk

1
AOV Assumption Checking and Transformations

How do we check the Normality of residuals assumption in AOV?
How do we check the Homogeneity of variances assumption in AOV?
What to do if these assumptions are not met?

2
Model Assumptions

Homoscedasticity (common group variances)
Normality of residuals
Effect additivity
Independence of residuals

3
Checking the Equal Variance Assumption
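$H_0$: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_t^2$
$H_A$: some of the variances are different from each other
Little work but little power
Hartley's Test: A logical extension of the F test for $t = 2$.
Requires equal replication, $n$, among groups.
T.S. $F_{max} = s^2_{max} / s^2_{min}$
Reject if $F_{max}$ exceeds $F_{\alpha,t,n-1}$ in the $F_{max}$ table (Table 12).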
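As a rough illustration of Hartley's statistic, the sketch below computes the ratio of the largest to the smallest group sample variance. The function name `hartley_fmax` and the data are hypothetical, and the critical value still has to be read from an Fmax table.

```python
from statistics import variance

def hartley_fmax(groups):
    """Return Hartley's F_max: largest group sample variance over the smallest."""
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("Hartley's test requires equal replication in every group")
    variances = [variance(g) for g in groups]  # sample variances (divisor n - 1)
    return max(variances) / min(variances)

# Hypothetical data: t = 3 groups with n = 4 observations each
groups = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, 1.5, 2.0, 2.5]]
fmax = hartley_fmax(groups)
print(f"F_max = {fmax:.3f}")
```

The equal-replication check mirrors the requirement stated on the slide; with unequal group sizes Bartlett's or Levene's test would be the usual alternative.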
4
Bartlett's Test
More work but better power
Bartlett's Test: Allows unequal replication.
T.S.
If $C > \chi^2_{(k-1),\alpha}$ then apply the correction term
R.R. Reject if $C/CF > \chi^2_{(k-1),\alpha}$
k = average replicates per group.
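The formulas for C and CF did not survive in the transcript. The sketch below assumes the usual textbook form of Bartlett's statistic, with t groups, group sizes n_i and N total observations; the notation on the slide may differ, and the function name and data are hypothetical.

```python
from math import log
from statistics import variance

def bartlett_statistic(groups):
    """Return (C, CF): Bartlett's uncorrected statistic and its correction factor.

    Assumed textbook form (not recovered from the slide):
      C  = (N - t) ln(s_p^2) - sum_i (n_i - 1) ln(s_i^2)
      CF = 1 + [sum_i 1/(n_i - 1) - 1/(N - t)] / (3 (t - 1))
    """
    t = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    variances = [variance(g) for g in groups]
    # Pooled variance: weighted average of the group variances
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (N - t)
    C = (N - t) * log(pooled) - sum((n - 1) * log(s2) for n, s2 in zip(sizes, variances))
    CF = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (N - t)) / (3 * (t - 1))
    return C, CF

# Hypothetical data with unequal replication
groups = [[4.1, 5.0, 6.2, 5.5], [3.0, 7.5, 9.1], [5.2, 5.4, 5.1, 5.6, 5.3]]
C, CF = bartlett_statistic(groups)
print(f"C = {C:.4f}, CF = {CF:.4f}, C/CF = {C / CF:.4f}")
```

Because CF is greater than 1, C/CF is smaller than C, which is why the correction only needs to be applied when the uncorrected C already exceeds the chi-square critical value.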
5
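Levene's Test
More work but powerful result
Let $z_{ij} = |y_{ij} - \tilde{y}_i|$, where $\tilde{y}_i$ = sample median of i-th group
T.S. $L = \dfrac{\sum_{i=1}^{t} n_i (\bar{z}_{i.} - \bar{z}_{..})^2 / (t-1)}{\sum_{i=1}^{t} \sum_{j=1}^{n_i} (z_{ij} - \bar{z}_{i.})^2 / (N-t)}$
$df_1 = t-1$, $df_2 = N-t$
R.R. Reject $H_0$ if $L \ge F_{\alpha, df_1, df_2}$
Use Table 8.
Essentially an AOV on the $z_{ij}$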
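Since the test is essentially a one-way AOV on the absolute deviations from the group medians, it can be sketched in a few lines. The function name `levene_median_f` and the data are hypothetical; the F critical value still comes from a table.

```python
from statistics import mean, median

def levene_median_f(groups):
    """One-way AOV F statistic on z_ij = |y_ij - median_i|; returns (F, df1, df2)."""
    z = [[abs(y - median(g)) for y in g] for g in groups]
    t = len(z)
    N = sum(len(zi) for zi in z)
    grand = sum(sum(zi) for zi in z) / N
    # Between-group and within-group sums of squares of the z values
    ss_between = sum(len(zi) * (mean(zi) - grand) ** 2 for zi in z)
    ss_within = sum((v - mean(zi)) ** 2 for zi in z for v in zi)
    f_stat = (ss_between / (t - 1)) / (ss_within / (N - t))
    return f_stat, t - 1, N - t

# Hypothetical data: the second group is twice as spread out as the first
groups = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]]
f_stat, df1, df2 = levene_median_f(groups)
print(f"L = {f_stat:.4f} on ({df1}, {df2}) df")
```

Replacing `median` with `mean` in the first line of the function would give the absolute-deviation-from-means variant instead.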
6
SAS Program
proc glm data=stress;
  class sand;
  model resistance = sand / solution;
  means sand / hovtest=bartlett;
  means sand / hovtest=levene(type=abs);
  means sand / hovtest=levene(type=square);
  means sand / hovtest=bf;  /* Brown and Forsythe mod of Levene */
  title1 'Compression resistance in concrete beams as';
  title2 ' a function of percent sand in the mix';
run;
HOVTEST only works when there is one factor on the right-hand side of the model.
7
SAS
hovtest=bartlett
Bartlett's Test for Homogeneity of resistance Variance
Source   DF   Chi-Square   Pr > ChiSq
sand      4       1.8901       0.7560

hovtest=levene(type=abs)
Levene's Test for Homogeneity of resistance Variance
ANOVA of Absolute Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
sand      4           8.8320        2.2080      0.95   0.4573
Error    20          46.6080        2.3304

hovtest=levene(type=square)
Levene's Test for Homogeneity of resistance Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
sand      4            202.2       50.5504      0.85   0.5076
Error    20           1182.8       59.1400

hovtest=bf
Brown and Forsythe's Test for Homogeneity of resistance Variance
ANOVA of Absolute Deviations from Group Medians
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
sand      4           7.4400        1.8600      0.46   0.7623
Error    20          80.4000        4.0200
8
Checking for Normality
Reminder: Normality of the RESIDUALS is assumed. The original data are assumed normal also, but each group may have a different mean if HA is true. Practice is to first fit the model, THEN output the residuals, then test for normality of the residuals. This APPROACH is always correct.
TOOLS

Histogram of all residuals (eij).
Normal probability (Q-Q) plot.
Formal test for normality.

9
Histogram of Residuals
proc glm data=stress;
  class sand;
  model resistance = sand / solution;
  output out=resid r=r_resis p=p_resis;
  title1 'Compression resistance in concrete beams as';
  title2 ' a function of percent sand in the mix';
run;
proc capability data=resid;
  histogram r_resis / normal;
  ppplot r_resis / normal square;
run;
10
Probability Plots
A scatter plot of the percentiles of the residuals versus the percentiles of a standard normal distribution. The basic idea is that if the residuals are truly normally distributed, values for these percentiles should lie on a straight line.

Compute and sort the residuals e(1), e(2), ..., e(n).
Associate to each residual a standard normal percentile: z(i) = normsinv((i-.5)/n).
Plot z(i) versus e(i). Compare to straight line.
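The three steps above can be sketched with the Python standard library, where `NormalDist().inv_cdf` plays the role of the spreadsheet function normsinv. The function name `normal_scores` and the residual values are hypothetical; the resulting pairs would then be handed to any plotting tool.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Pair each sorted residual e(i) with its normal percentile z(i)."""
    e = sorted(residuals)
    n = len(e)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, e))

# Hypothetical residuals from a fitted model
residuals = [0.3, -1.2, 0.8, -0.1, 0.2]
pairs = normal_scores(residuals)
for z_i, e_i in pairs:
    print(f"z = {z_i:7.4f}   e = {e_i:5.2f}")
```

If the residuals are close to normal, the printed pairs should fall near a straight line when plotted.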

11
Spreadsheet
Percentile: $p_i = (i-0.5)/n$
Normal percentile: NORMSINV($p_i$)
Use EXCEL for a scatterplot of percentile versus Normal percentile. Use the Add Line option.
12
Excel Probability Plot
13
Excel Probability Plot
14
Probability Plot
Minitab
SAS (note axes changed)
These look normal!
15
Non Normal Residuals
Examples of non-normal looking residuals.
Note the strong deviations from a straight line.
16
Formal Normality Tests
Many, many tests (a favorite pastime of statisticians is developing new tests for normality).

Kolmogorov-Smirnov test.
Shapiro-Wilk test (n < 50).
D'Agostino's test (n > 50).

All are quite sensitive: they tend to reject the hypothesis of normality more often than they should.
17
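Shapiro-Wilk W test
$e_1, e_2, \dots, e_n$ represent data ranked from smallest to largest.
$H_0$: The population has a normal distribution.
$H_A$: The population does not have a normal distribution.
T.S. $W = \dfrac{1}{D}\left[\sum_{i=1}^{k} a_i \,(e_{n-i+1} - e_i)\right]^2$, where $D = \sum_{j=1}^{n} (e_j - \bar{e})^2$
$k = n/2$ if n is even; $k = (n-1)/2$ if n is odd.
Coefficients $a_i$ come from a table.
R.R. Reject $H_0$ if $W < W_{0.05}$
Critical values of $W_\alpha$ come from a table.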
18
Shapiro-Wilk Coefficients
19
Shapiro-Wilk Coefficients
20
Shapiro-Wilk W Table
21
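D'Agostino's Test
$e_1, e_2, \dots, e_n$ represent data ranked from smallest to largest.
$H_0$: The population has a normal distribution.
$H_A$: The population does not have a normal distribution.
T.S. $D = \dfrac{\sum_{j=1}^{n} \left(j - \tfrac{n+1}{2}\right) e_j}{\sqrt{n^3 \sum_{j=1}^{n} (e_j - \bar{e})^2}}$, $\quad Y = \dfrac{\sqrt{n}\,(D - 0.28209479)}{0.02998598}$
R.R. (two-sided test) Reject $H_0$ if $Y < Y_{0.025}$ or $Y > Y_{0.975}$.
$Y_{0.025}$ and $Y_{0.975}$ come from a table of percentiles of the Y statistic.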
23
K-S test

Too difficult to explain.
Not as powerful as the Shapiro-Wilk or D'Agostino tests.

What do we do if the residuals are not normal or
the variances not equal?
24
Handling Heterogeneity
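Flowchart: Regression? If no (ANOVA), fit the effect model and test for homoscedasticity: accept means OK, reject means transform. If yes, fit the linear model and plot the residuals: OK, or Not OK means transform. Transformation choices: traditional, power family, or Box/Cox family, giving transformed data.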
25
Transformations to Achieve Normality
Flowchart: Regression? If no (ANOVA), estimate group means; if yes, fit the linear model. Check the residuals with a probability plot and formal tests. Residuals normal? If yes, OK; if no, transform or use a different model.
26
Square Root Transformation
Response is positive and continuous.
This transformation works when we notice the variance changes as a linear function of the mean: $\sigma^2 = k\mu$, $k > 0$.

Useful for count data (Poisson distributed).
For small values of Y, use Y + 0.5 in place of Y.

Typical use: Counts of items when counts are between 0 and 10.
27
Logarithmic Transformation
Response is positive and continuous.
This transformation tends to work when the variance is a linear function of the square of the mean: $\sigma^2 = k\mu^2$, $k > 0$.

Replace Y by Y + 1 if zero occurs.
Useful if effects are multiplicative, or if there is considerable heterogeneity in the data.

Typical use: Growth over time. Concentrations. Counts greater than 10.
28
ARCSINE SQUARE ROOT
Response is a proportion.
With proportions, the variance is a linear
function of the mean times (1-mean) where the
sample mean is the expected proportion.

Y is a proportion (decimal between 0 and 1).
Zero counts should be replaced by 1/4, and counts of N by N - 1/4, before converting to percentages.

Typical use: Proportion of seeds germinating. Proportion responding.
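A minimal sketch of the arcsine square root transformation, including the adjustment for counts of zero and of N described on this slide. The function name `arcsine_sqrt` and the germination counts are hypothetical, and the result is in radians.

```python
from math import asin, sqrt

def arcsine_sqrt(count, n):
    """Arcsine square root of the proportion count/n, in radians.

    Counts of 0 are replaced by 1/4 and counts of n by n - 1/4
    before the proportion is formed.
    """
    if count == 0:
        count = 0.25
    elif count == n:
        count = n - 0.25
    return asin(sqrt(count / n))

# Hypothetical data: seeds germinating out of 20 per dish
germinated = [0, 5, 10, 20]
transformed = [arcsine_sqrt(c, 20) for c in germinated]
print([round(v, 4) for v in transformed])
```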
29
Reciprocal Transformation
Response is positive and continuous.
This transformation works when the variance is a linear function of the fourth power of the mean.

Use Y + 1 if zero occurs.
Useful if the reciprocal of the original scale has meaning.

Typical use: Survival time.
30
Power Family of Transformations (1)
Suppose we apply the power transformation $Z = Y^p$.
Suppose the true situation is that the variability is proportional to a power of the mean: $\sigma_Y \propto \mu^k$.
In the transformed variable we will have $\sigma_Z \propto \mu^{k+p-1}$.
If p is taken as 1-k, then the variance of Z will not depend on the mean.
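One way to see why p = 1-k works is a first-order Taylor (delta method) argument. The sketch below assumes the power transformation is Z = Y^p and that the standard deviation of Y is proportional to the k-th power of its mean; both are assumptions made to match the conclusion p = 1-k, since the slide's formulas are missing from the transcript.

```latex
\begin{align*}
Z = Y^{p} &\approx \mu^{p} + p\,\mu^{p-1}\,(Y - \mu)
  && \text{(first-order expansion about } \mu = E[Y]\text{)} \\
\sigma_Z &\approx |p|\,\mu^{p-1}\,\sigma_Y \\
         &\propto \mu^{p-1}\,\mu^{k} = \mu^{\,k+p-1}
  && \text{(using } \sigma_Y \propto \mu^{k}\text{)} \\
k + p - 1 = 0 &\iff p = 1 - k
  && \text{(the exponent vanishes, so } \sigma_Z \text{ is free of } \mu\text{)}
\end{align*}
```

Under this reading, k = 1/2 gives the square root (p = 1/2), k = 1 gives the logarithm (the limiting case p = 0), and k = 2 gives the reciprocal (p = -1).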
31
Power Family of Transformations (2)
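With replicated data, k can sometimes be found empirically by fitting $\log s_i = a + k \log \bar{y}_i$ across groups, where $s_i$ and $\bar{y}_i$ are the sample standard deviation and mean of group i.
k can be estimated by least squares (regression, Next Unit).
Estimate $\hat{p} = 1 - \hat{k}$.
If $\hat{p}$ is zero use the logarithmic transformation.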
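A small sketch of the empirical approach, assuming k is the slope of a simple least-squares line of the log group standard deviations on the log group means. That regression form is an assumption, since the fitted equation is missing from the transcript; the function name `estimate_k` and the data are hypothetical.

```python
from math import log
from statistics import mean, stdev

def estimate_k(groups):
    """Least-squares slope of log(s_i) on log(ybar_i) across groups."""
    x = [log(mean(g)) for g in groups]   # log group means
    y = [log(stdev(g)) for g in groups]  # log group standard deviations
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical replicated data whose spread grows in proportion to the mean
groups = [[9.0, 10.0, 11.0], [18.0, 20.0, 22.0], [36.0, 40.0, 44.0]]
k_hat = estimate_k(groups)
p_hat = 1 - k_hat
print(f"k_hat = {k_hat:.3f}, suggested power p = {p_hat:.3f}")
```

In this made-up example the standard deviation is exactly one tenth of the mean in every group, so the estimated power is essentially zero and the logarithmic transformation would be suggested.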
32
Box and Cox Transformations
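Suggested transformation:
$y^{(\lambda)} = \dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda-1}}$ for $\lambda \ne 0$, and $y^{(\lambda)} = \dot{y}\,\ln y$ for $\lambda = 0$,
where $\dot{y}$ = geometric mean of the original data.
Exponent, $\lambda$, is unknown. Hence the model can be viewed as having an additional parameter which must be estimated.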
33
Box and Cox Transformations
Find the value of $\lambda$ that minimizes the residual sum of squares.
If $SSE_0$ denotes the minimum sum of squares, then values of $\lambda$ corresponding to the critical sum of squares
$SS^* = SSE_0 \left(1 + \dfrac{t^2_{\alpha/2,\nu}}{\nu}\right)$,
where $\nu$ is the residual degrees of freedom, provide an approximate $100(1-\alpha)\%$ CI on the power.
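The search can be sketched as a grid evaluation for a one-way layout. The code assumes the geometric-mean-scaled form of the Box and Cox transformation and a critical sum of squares of SSE0(1 + t²/ν); both are the usual textbook forms rather than formulas recovered from the slide. The function names and data are hypothetical, and the t critical value is passed in from a table.

```python
from math import exp, log
from statistics import mean

def boxcox(y, lam, gm):
    """Scaled Box-Cox transform of one positive value; gm is the geometric mean."""
    if abs(lam) < 1e-12:
        return gm * log(y)
    return (y ** lam - 1) / (lam * gm ** (lam - 1))

def sse_oneway(groups):
    """Residual sum of squares for the one-way (group means) model."""
    return sum((v - mean(g)) ** 2 for g in groups for v in g)

def boxcox_profile(groups, lambdas):
    """Return [(lambda, SSE)] over a grid of candidate powers."""
    all_y = [v for g in groups for v in g]
    gm = exp(mean([log(v) for v in all_y]))
    return [(lam, sse_oneway([[boxcox(v, lam, gm) for v in g] for g in groups]))
            for lam in lambdas]

def confidence_set(profile, nu, t_crit):
    """Grid values of lambda whose SSE is at most SSE0 * (1 + t_crit**2 / nu)."""
    sse0 = min(sse for _, sse in profile)
    cutoff = sse0 * (1 + t_crit ** 2 / nu)
    return [lam for lam, sse in profile if sse <= cutoff]

# Hypothetical data: 3 groups of 3, spread proportional to the mean
groups = [[9.0, 10.0, 11.0], [18.0, 20.0, 22.0], [36.0, 40.0, 44.0]]
lambdas = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
profile = boxcox_profile(groups, lambdas)
best_lam, sse0 = min(profile, key=lambda pair: pair[1])
# nu = N - t = 6 residual df; 2.447 is the tabled t value for alpha/2 = 0.025
ci = confidence_set(profile, nu=6, t_crit=2.447)
print(best_lam, round(sse0, 3), ci)
```

Dividing by the geometric mean term keeps the sums of squares comparable across values of λ, which is what makes minimizing the SSE over the grid meaningful.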
34
Conclusions

What have we learned?
How to check ANOVA model assumptions?
What are typical transformations that might be used to correct for violations of these assumptions?
Other models? Next unit.
