Title: Topic 9: Remedies
1. Topic 9: Remedies
2. Outline
- Review diagnostics for residuals
- Discuss remedies
  - Nonlinear relationship
  - Nonconstant variance
  - Nonnormal distribution
  - Outliers
3. Diagnostics for residuals
- Look at residuals to find serious violations of the model assumptions
  - nonlinear relationship
  - nonconstant variance
  - nonnormal errors
  - presence of outliers
  - a strongly skewed distribution
4. Recommendations for checking assumptions
- Plot Y vs X (is it a linear relationship?)
- Look at distribution of residuals
- Plot residuals vs X, time, or any other potential explanatory variable
- Use the i=sm option (e.g., i=sm50) in the symbol statement to get smoothed curves
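The recommendations above can be sketched in SAS as follows. This is a minimal sketch, not the course's own program: the dataset name mydata, the variables y and x, and the output names resids and resid are all hypothetical.

```sas
/* mydata, y, x, resids, and resid are hypothetical names */
proc reg data=mydata;
  model y = x;
  output out=resids r=resid;   /* save the residuals */
run;

/* the sm smoother expects the data sorted by the horizontal-axis variable */
proc sort data=resids;
  by x;
run;

symbol1 v=circle i=sm50;       /* circles for points, smoothed curve through them */
proc gplot data=resids;
  plot resid*x / vref=0;       /* reference line at zero */
run;
```

A curved smooth in this plot would point toward a nonlinear relationship, while a fan shape in the points would point toward nonconstant variance.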
5. Plots of Residuals
- Plot residuals vs
  - Time (order)
  - Explanatory variables
- Look for
  - nonrandom patterns
  - outliers (unusual observations)
6. Residuals vs Order
- Pattern in plot suggests dependent errors / lack of independence
- Pattern is usually a linear or quadratic trend and/or cyclical
- If you are interested, read NKNW pp 104-105
7. Tests for normality
- H0: data are an i.i.d. sample from a normal population
- Ha: data are not an i.i.d. sample from a normal population
- NKNW (p 111) suggest a correlation test that requires a table look-up
8. Tests for normality
- We have several choices for a significance testing procedure
- Proc univariate with the normal option provides four
  - proc univariate normal;
- Shapiro-Wilk is a common choice
9. Tests for normality (proc univariate output)
Test                  Statistic        p Value
Shapiro-Wilk          W      0.978     Pr < W      0.8626
Kolmogorov-Smirnov    D      0.095     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.033     Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.207     Pr > A-Sq   >0.2500
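Output of this kind comes from proc univariate with the normal option. A minimal sketch, assuming the residuals were saved in a dataset called resids as a variable called resid (both hypothetical names):

```sas
/* resids and resid are hypothetical names for a dataset of saved residuals */
proc univariate data=resids normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);  /* normal quantile plot as a visual check */
run;
```

The qqplot statement is an addition here, included because a plot is often easier to interpret than the test statistics alone.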
10. Other tests for model assumptions
- Durbin-Watson test for serially correlated errors (NKNW p 110)
- Modified Levene test for homogeneity of variance (NKNW p 112-114)
- Breusch-Pagan test for homogeneity of variance (NKNW p 115)
- For SAS commands see nknw110.sas
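Two of these tests can be sketched with standard options. This is not the content of nknw110.sas. The dataset mydata, the variables y and x, and the cutoff 2.5 used to split the cases into two groups are all hypothetical. The Brown-Forsythe option is used here as a stand-in for the modified Levene test, since both compare absolute deviations from group medians.

```sas
/* mydata, y, and x are hypothetical, and the data are assumed to be in time order */
proc reg data=mydata;
  model y = x / dw;              /* dw prints the Durbin-Watson statistic */
  output out=resids r=resid;
run;

/* split the cases into low-X and high-X groups at a hypothetical cutoff */
data resids2;
  set resids;
  group = (x >= 2.5);
run;

proc glm data=resids2;
  class group;
  model resid = group;
  means group / hovtest=bf;      /* Brown-Forsythe test on the residuals */
run;
quit;
```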
11. Plots vs significance test
- Plots are more likely to suggest a remedy
- Significance test results depend heavily on the sample size; with sufficiently large samples we can reject most null hypotheses
12. Lack of fit
- When we have repeat observations at different values of X, we can do a significance test for nonlinearity
- Browse through NKNW 3.7
- We will do details when we get to NKNW 17.9, p 742
- Basic idea is to compare two models
- Gplot with a smooth is a better (i.e., simpler) approach
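For reference, PROC REG can carry out the lack-of-fit test through the lackfit option on the model statement. A minimal sketch, assuming a hypothetical dataset mydata that has repeat observations at some values of x:

```sas
/* mydata, y, and x are hypothetical names */
proc reg data=mydata;
  model y = x / lackfit;   /* splits the error SS into lack of fit and pure error */
run;
```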
13. Nonlinear relationships
- We can model many nonlinear relationships with linear models; some have several explanatory variables (i.e., multiple linear regression)
- $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$ (quadratic)
- $Y = \beta_0 + \beta_1 \log(X) + \varepsilon$
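Both models are linear in the parameters, so they can be fit with PROC REG once the new explanatory variables are created in a data step. A minimal sketch with hypothetical names (mydata, y, x, x2, logx):

```sas
/* mydata, y, and x are hypothetical names */
data mydata2;
  set mydata;
  x2 = x*x;          /* squared term for the quadratic model */
  logx = log(x);     /* natural log, requires x > 0 */
run;

proc reg data=mydata2;
  quadratic: model y = x x2;
  logmodel:  model y = logx;
run;
```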
14. Nonlinear Relationships
- Sometimes can transform a nonlinear equation into a linear equation
- Consider $Y = \beta_0 \exp(\beta_1 X) + \varepsilon$
- Can form linear model using log
- $\log(Y) = \log(\beta_0) + \beta_1 X + \varepsilon$
- Note that we have changed our assumption about the error
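One way to see the change in the error assumption: the linear model on the log scale corresponds exactly to an exponential model whose error is multiplicative rather than additive. Writing the error as $\varepsilon$ (a notational assumption here):

```latex
\begin{align*}
Y &= \beta_0 \exp(\beta_1 X)\,\exp(\varepsilon) \\
\log(Y) &= \log(\beta_0) + \beta_1 X + \varepsilon
\end{align*}
```

So an additive error with constant variance on the log scale means the spread of Y on the original scale grows in proportion to its mean.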
15. Nonlinear Relationship
- We can perform a nonlinear regression analysis
  - NKNW Chapter 13
  - SAS PROC NLIN
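As an illustration, an exponential model with additive error could be fit directly with PROC NLIN. This is a sketch only: the dataset mydata, the variables y and x, the parameter names b0 and b1, and the starting values are all hypothetical.

```sas
/* mydata, y, x, and the starting values are hypothetical */
proc nlin data=mydata;
  parms b0=10 b1=-0.5;        /* starting values, for example from a log-linear fit */
  model y = b0*exp(b1*x);
run;
```

Unlike the log transformation, this keeps the error additive on the original scale of Y.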
16. Nonconstant variance
- Sometimes we model the way in which the error variance changes
  - may be linearly related to X
- We can then use a weighted analysis
- NKNW 10.1
- Use a weight statement in PROC REG
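One possible version of this weighted analysis, assuming the error variance is roughly linear in X: estimate the variance function by regressing the squared residuals on X, then weight each case by the reciprocal of its estimated variance. All dataset and variable names below are hypothetical.

```sas
/* step 1: unweighted fit, save residuals (mydata, y, x are hypothetical) */
proc reg data=mydata;
  model y = x;
  output out=r1 r=resid;
run;

/* step 2: model the variance by regressing squared residuals on x */
data r2;
  set r1;
  sqresid = resid**2;
run;
proc reg data=r2;
  model sqresid = x;
  output out=r3 p=vhat;       /* vhat = estimated error variance */
run;

/* step 3: weights are reciprocals of the estimated variances */
data r4;
  set r3;
  if vhat > 0 then wt = 1/vhat;   /* nonpositive vhat leaves wt missing */
run;

/* step 4: weighted fit */
proc reg data=r4;
  model y = x;
  weight wt;
run;
```

Cases with a missing weight are left out of the weighted fit, so it is worth checking how many estimated variances were nonpositive.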
17. Nonnormal errors
- Transformations often help
- Use a procedure that allows different distributions for the error term
  - SAS PROC GENMOD
18. GENMOD
- Possible distributions of Y
  - Binomial (Y/N or percentage data)
  - Poisson (Count data)
  - Gamma (exponential)
  - Inverse gaussian
  - Negative binomial
  - Multinomial
- Specify a link function for E(Y)
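The distribution and the link are both given as options on the model statement. Two minimal sketches with hypothetical datasets and variables (counts with count and x; trials with r successes out of n and x):

```sas
/* count data: Poisson distribution with log link (names are hypothetical) */
proc genmod data=counts;
  model count = x / dist=poisson link=log;
run;

/* r successes out of n trials: binomial distribution with logit link */
proc genmod data=trials;
  model r/n = x / dist=binomial link=logit;
run;
```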
19. Ladder of Reexpression (transformations)
[Figure: ladder of powers p running from 1.5 down to -1.0 in steps of 0.5; the transformation is x^p]
20. Circle of Transformations
[Figure: circle drawn on X and Y axes, with its four quadrants labeled "X up, Y up", "X down, Y up", "X up, Y down", and "X down, Y down"]
21. Box-Cox Transformations
- Also called power transformations
- These transformations adjust for nonnormality and nonconstant variance
- $Y' = Y^\lambda$ or $Y' = (Y^\lambda - 1)/\lambda$
- In the second form, the limit as $\lambda$ approaches zero is the (natural) log
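The limit in the second form can be checked by writing the power as an exponential and expanding in a series (for $Y > 0$):

```latex
\begin{align*}
Y^\lambda &= \exp(\lambda \log Y)
           = 1 + \lambda \log Y + \frac{(\lambda \log Y)^2}{2} + \cdots \\
\frac{Y^\lambda - 1}{\lambda} &= \log Y + \frac{\lambda (\log Y)^2}{2} + \cdots
  \;\longrightarrow\; \log Y \quad \text{as } \lambda \to 0
\end{align*}
```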
22. Important Special Cases
- $\lambda = 1$, $Y' = Y^1$, no transformation
- $\lambda = .5$, $Y' = Y^{1/2}$, square root
- $\lambda = -.5$, $Y' = Y^{-1/2}$, one over square root
- $\lambda = -1$, $Y' = Y^{-1} = 1/Y$, inverse
- $\lambda = 0$, $Y'$ = (natural) log of Y
23. Box-Cox Details
- We can estimate $\lambda$ by including it as a parameter in a nonlinear model
  - $Y^\lambda = \beta_0 + \beta_1 X + \varepsilon$
- and using the method of maximum likelihood
- Details are in NKNW p 132-133
- SAS code is in nknw132.sas
24. Box-Cox Solution
- Standardized transformed Y is
  - $K_1 (Y^\lambda - 1)$ if $\lambda \neq 0$
  - $K_2 \log(Y)$ if $\lambda = 0$
- where $K_2 = (\prod_i Y_i)^{1/n}$ (the geometric mean)
- and $K_1 = 1/(\lambda K_2^{\lambda - 1})$
- Run regressions with X as explanatory variable
- estimated $\lambda$ minimizes SSE
25.
data a1;
  input age plasma @@;
  cards;
0 13.44 0 12.84 0 11.91 0 20.09 0 15.60
1 10.11 1 11.38 1 10.28 1  8.96 1  8.59
2  9.83 2  9.00 2  8.65 2  7.85 2  8.88
3  7.94 3  6.01 3  5.14 3  6.90 3  6.77
4  4.86 4  5.10 4  5.67 4  5.75 4  6.23
;
27. The first part of the program gets the geometric mean
data a2; set a1;
  lplasma = log(plasma);
proc univariate data=a2 noprint;
  var lplasma;
  output out=a3 mean=meanl;   /* meanl = mean of log(plasma) */
28.
data a4; set a2;
  if _n_ eq 1 then set a3;
  keep age yl l;
  k2 = exp(meanl);                 /* geometric mean of plasma */
  do l = -1.0 to 1.0 by .1;
    k1 = 1/(l*k2**(l-1));
    yl = k1*(plasma**l - 1);
    if abs(l) < 1E-8 then yl = k2*log(plasma);   /* lambda = 0 case */
    output;
  end;
29.
proc sort data=a4 out=a4; by l;
proc reg data=a4 noprint outest=a5;
  model yl=age;
  by l;
data a5; set a5;
  n=25; p=2;
  sse=(n-p)*(_rmse_)**2;           /* recover SSE from the root MSE */
proc print data=a5;
  var l sse;
30.
Obs      l        sse
  1   -1.0    33.9089
  2   -0.9    32.7044
  3   -0.8    31.7645
  4   -0.7    31.0907
  5   -0.6    30.6868
  6   -0.5    30.5596
  7   -0.4    30.7186
  8   -0.3    31.1763
  9   -0.2    31.9487
 10   -0.1    33.0552
31.
symbol1 v=none i=join;
proc gplot data=a5;
  plot sse*l;
run;
33.
data a1; set a1;
  tplasma = plasma**(-.5);
  tage = (age+.5)**(-.5);
symbol1 v=circle i=sm50;
proc gplot; plot tplasma*age;
proc sort; by tage;
proc gplot; plot tplasma*tage;
run;
36. Box-Cox Procedure
There is a fairly new procedure that will find the Box-Cox transformation
proc transreg data=a1;
  model boxcox(plasma)=identity(age);
run;
37. Transformation Information for BoxCox(plasma)
Lambda    R-Square    Log Like
 -2.50        0.76    -17.0444
 -2.00        0.80    -12.3665
 -1.50        0.83     -8.1127
 -1.00        0.86     -4.8523
 -0.50        0.87     -3.5523 <
  0.00        0.85     -5.0754
  0.50        0.82     -9.2925
  1.00        0.75    -15.2625
  1.50        0.67    -22.1378
  2.00        0.59    -29.4720
  2.50        0.50    -37.0844
< - Best Lambda
* - Confidence Interval
+ - Convenient Lambda
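To search the same grid as the manual program (-1 to 1 by 0.1), the list of candidate values can be passed to boxcox through its lambda= option. A sketch under that assumption, using the a1 dataset from these notes:

```sas
/* same model as above, with an explicit grid of lambda values */
proc transreg data=a1;
  model boxcox(plasma / lambda=-1 to 1 by 0.1) = identity(age);
run;
```

The manual program reports its smallest SSE at -0.5, so the best lambda from this search should be comparable, though the two criteria (SSE of the standardized response and log likelihood) are reported on different scales.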
38. Background Reading
- Sections 3.4 - 3.7 describe significance tests for assumptions (read them if you are interested).
- Box-Cox transformation is in nknw132.sas
- Read sections 4.1, 4.2, 4.4, 4.5, and 4.6