Title: Assumptions of Ordinary Least Squares Regression Part 2
7. Errors not normally distributed
- Problem
  - Parameter estimates are unbiased
  - P-values are unreliable
  - Regression fits the mean; with skewed residuals, the mean is not a good measure of central tendency
- Diagnosis: examine a QQ plot of the residuals
  - Save the residuals to the dataset using Models -> Add Observation Statistics to Data
  - Make a QQ plot
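A QQ plot pairs each sorted residual with the quantile a standard normal distribution would place at that rank. If the residuals are normal, the points fall close to a straight line. The sketch below is in Python rather than the R Commander menus the slides use. It assumes plotting positions (i - 0.5) / n, which is one common convention, and uses made-up residuals; `qq_points` is a hypothetical helper name.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Return (theoretical quantile, sample quantile) pairs for a normal QQ plot.

    Plotting positions (i - 0.5) / n are one common convention (an assumption;
    packages differ slightly in the exact formula).
    """
    n = len(residuals)
    sample = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Made-up residuals with a long right tail (not the Chlorophyll data).
resid = [-0.9, -0.8, -0.7, -0.6, -0.5, -0.3, -0.2, 0.1, 0.6, 3.3]
for q, r in qq_points(resid):
    print(f"{q:6.2f}  {r:6.2f}")
```

With right-skewed residuals like these, the largest sample values sit well above the line traced by the middle points.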
Errors not normally distributed: a better diagnosis
- Examine a QQ plot of Studentized residuals
  - Studentized residuals correct for bias in estimates of residual variance
- Make the plot with Models -> Graphs -> Residual Quantile-Comparison Plot
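Studentizing divides each raw residual e_i by an estimate of its own standard deviation, s_(i) * sqrt(1 - h_ii), where h_ii is the leverage of point i. Raw residuals at high-leverage points have smaller variance, so without this step they look deceptively small. Here is a minimal sketch for a single-predictor fit. It assumes the externally studentized (leave-one-out) version, which is what R's `rstudent()` returns. The function name and data are hypothetical.

```python
import math


def studentized_residuals(x, y):
    """Externally studentized residuals for a one-predictor OLS fit (sketch).

    t_i = e_i / (s_(i) * sqrt(1 - h_ii)), where h_ii is the leverage of point i
    and s_(i) is the residual standard error with point i left out.
    """
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)

    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # leverage in simple regression
        # Leave-one-out sum of squares via the identity
        # SSE_(i) = SSE - e_i^2 / (1 - h_ii), so no refitting is needed.
        s2_loo = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_loo * (1 - h)))
    return out


# Made-up data whose last point sits well above the trend.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 9.0]
print([round(t, 2) for t in studentized_residuals(x, y)])
```

Under normal errors each t_i follows a t distribution with n - p - 1 degrees of freedom, so these values can be compared against t quantiles rather than normal ones.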
Errors not normally distributed: solutions
- Transform the dependent variable
  - May create nonlinearity in the model
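One quick way to see both sides of this trade-off is to compare sample skewness before and after a square-root transform, and then check what the same transform does to a perfectly linear response. The sketch uses made-up values. In practice, refit the model and check its residuals, not the raw response. `skewness` is a hypothetical helper based on the plain moment estimator.

```python
import math


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (moment estimator, no bias correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed response (not the Chlorophyll data).
y = [1, 2, 2, 3, 4, 5, 7, 10, 16, 40]
sqrt_y = [math.sqrt(v) for v in y]
print("skewness raw:", round(skewness(y), 2))
print("skewness sqrt:", round(skewness(sqrt_y), 2))

# The catch: a response that is exactly linear in x is no longer linear after sqrt.
lin = [1 + 3 * x for x in range(5)]  # 1, 4, 7, 10, 13: constant steps of 3
steps = [math.sqrt(b) - math.sqrt(a) for a, b in zip(lin, lin[1:])]
print("sqrt-scale steps:", [round(s, 3) for s in steps])  # steps shrink as x grows
```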
Try transforming the response variable
Call:
lm(formula = Sqrt.C ~ Phosphorus + Phosphorus:Nitrogen, data = Chlorophyll)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2788 -0.7979 -0.3315  0.6360  3.3290

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         2.5212494  0.4412690   5.714 9.54e-06 ***
Phosphorus          0.0091702  0.0032668   2.807 0.010268 *
Phosphorus:Nitrogen 0.0019191  0.0004201   4.569 0.000150 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.436 on 22 degrees of freedom
Multiple R-Squared: 0.8521,  Adjusted R-squared: 0.8386
F-statistic: 63.35 on 2 and 22 DF,  p-value: 7.433e-10
Create the transformed response (Sqrt.C) with Data -> Manage Variables -> Compute New Variable
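The last lines of the summary are linked. There are 22 residual degrees of freedom and 3 coefficients, so n = 25 observations and k = 2 predictors. From those counts, adjusted R-squared and the overall F statistic follow from R-squared alone. The small sketch below checks this against the printed output. The helper names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """Overall F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# 22 residual df + 3 coefficients => n = 25 observations, k = 2 predictors.
n, k, r2 = 25, 2, 0.8521
print("adjusted R-squared:", round(adjusted_r2(r2, n, k), 4))
print("F statistic:", round(overall_f(r2, n, k), 2))
```

Tiny differences from the printed values come from R-squared itself being rounded to four digits in the output.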
The residuals are less skewed
But we've introduced nonlinearity
(Figure: Actual by Predicted plots for Chlorophyll and sqrtChlorophyll)
Errors not normally distributed: another solution
- Fit a generalized linear model (GLM)
  - Allows us to assume the residuals follow a different distribution (binomial, gamma, etc.)
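In R, models like this are fitted with `glm()`. For example, `family = Gamma(link = "log")` suits a positive, right-skewed response. To show what such a fit does internally, here is a hedged Python sketch of iteratively reweighted least squares (IRLS) for one predictor. It assumes a gamma family with a log link. For that combination every working weight equals 1, so each step reduces to an ordinary least-squares fit. The function name and data are hypothetical, and the sketch omits standard errors and the dispersion estimate.

```python
import math


def fit_gamma_log_glm(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by IRLS, assuming a Gamma family with log link.

    With V(mu) = mu**2 and d(mu)/d(eta) = mu, every IRLS weight equals 1,
    so each iteration is an unweighted least-squares fit of the working
    response z = eta + (y - mu) / mu on x.
    """
    if any(v <= 0 for v in y):
        raise ValueError("a Gamma GLM needs strictly positive responses")
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    mu = list(y)  # start with fitted means equal to the data
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        zbar = sum(z) / n
        new_b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
        new_b0 = zbar - new_b1 * xbar
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if done:
            break
    return b0, b1


# Made-up positive, right-skewed response.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [2.0, 3.1, 2.7, 5.2, 6.0, 9.5, 8.1, 14.0]
b0, b1 = fit_gamma_log_glm(x, y)
print(f"log(mean) = {b0:.3f} + {b1:.3f} * x")
```

Unlike transforming the response, the GLM keeps the model linear on the log-mean scale while letting the errors be skewed.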
Summary of OLS assumptions
Fixing assumptions via data transformations is an iterative process
- After each modification, fit the new model and look at all the assumptions again
What can we do about the chlorophyll regression?
- The square-root transform helps a little with non-normality and a lot with heteroskedasticity
- But it creates nonlinearity
A new model... it's linear,
... it has normal residuals (sort of) and is homoskedastic,
... and it fits well!
Call:
lm(formula = sqrt(Chlorophyll.a) ~ sqrt(Phosphorus) + sqrt(Phosphorus * Nitrogen), data = Chlorophyll)

Residuals:
    Min      1Q  Median      3Q     Max
-1.5846 -0.7758 -0.1640  0.6975  2.5464

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -0.90141    0.61584  -1.464 0.157414
sqrt(Phosphorus)             0.21408    0.09547   2.242 0.035348 *
sqrt(Phosphorus * Nitrogen)  0.15133    0.03742   4.044 0.000542 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.198 on 22 degrees of freedom
Multiple R-Squared: 0.897,  Adjusted R-squared: 0.8876
F-statistic: 95.77 on 2 and 22 DF,  p-value: 1.388e-11
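The final model predicts sqrt(Chlorophyll.a), so its predictions must be squared to return to the original scale. The sketch below uses the coefficients printed above. The phosphorus and nitrogen inputs (49 and 4) are arbitrary illustration values. The mean correction assumes the errors on the square-root scale are normal, with standard deviation equal to the residual standard error. The function names are hypothetical.

```python
import math

# Coefficients copied from the lm() output above (square-root scale).
B0, B_SQRT_P, B_SQRT_PN = -0.90141, 0.21408, 0.15133
RESID_SE = 1.198  # residual standard error on the sqrt scale


def predict_sqrt_chl(phosphorus, nitrogen):
    """Predicted mean of sqrt(Chlorophyll.a) from the final model."""
    return (B0 + B_SQRT_P * math.sqrt(phosphorus)
            + B_SQRT_PN * math.sqrt(phosphorus * nitrogen))


def predict_chl(phosphorus, nitrogen):
    """Return (squared prediction, mean-corrected prediction) on the original scale.

    If sqrt(Y) ~ Normal(m, s**2), then E[Y] = m**2 + s**2, so squaring m alone
    underestimates the mean; RESID_SE stands in for the unknown s.
    """
    m = predict_sqrt_chl(phosphorus, nitrogen)
    return m ** 2, m ** 2 + RESID_SE ** 2


# Arbitrary illustration inputs, in whatever units the dataset uses.
squared, mean_corrected = predict_chl(49, 4)
print(round(squared, 2), round(mean_corrected, 2))
```

The correction is only approximate, because RESID_SE is itself an estimate and the uncertainty in the coefficients is ignored.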