Assumptions of Ordinary Least Squares Regression Part 2

Provided by: brucek64


1
Assumptions of Ordinary Least Squares Regression (Part 2)
  • ESM 206
  • Jan 21, 2008

2
7. Errors not normally distributed
  • Problem
      • Parameter estimates are unbiased
      • P-values are unreliable
      • Regression fits the mean; with skewed residuals,
        the mean is not a good measure of central
        tendency
  • Diagnosis: examine a QQ plot of the residuals
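A quick illustration of the point about the mean, using a small made-up set of right-skewed residuals (hypothetical numbers, not the chlorophyll data). Like OLS residuals they average to zero, but the long right tail pulls the mean above the median, so a fit that tracks the mean sits above most of the observations. The sketch uses only Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical right-skewed residuals: many small negatives, one long positive tail
residuals = [-2.0, -1.5, -1.0, -1.0, -0.5, 0.0, 6.0]

print(f"mean   = {mean(residuals):.3f}")    # 0.000, as for OLS residuals
print(f"median = {median(residuals):.3f}")  # -1.000: most points lie below the fitted mean
```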

3
  • Save residuals to the dataset using Models >
    Add Observation Statistics to Data
  • Make a QQ plot
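What a QQ plot computes can be sketched by hand: sort the residuals and pair each one with the standard normal quantile at a matching plotting position; points that bend away from a straight line suggest non-normal errors. This Python sketch assumes the positions (i - 0.5)/n, one common convention; R's `qqnorm()` uses `ppoints()`, which switches to (i - 3/8)/(n + 1/4) when n <= 10. The function name `qq_points` and the residual values are hypothetical.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair sorted residuals with standard normal quantiles for a QQ plot."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n for i = 1..n (one common convention)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))

# Hypothetical residuals; a curved pattern in the pairs suggests skewed errors
resid = [3.3, -0.8, 0.6, -2.3, -0.3, 0.1]
for theo, obs in qq_points(resid):
    print(f"{theo:7.3f} {obs:7.3f}")
```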

4
Errors not normally distributed
  • Problem
      • Parameter estimates are unbiased
      • P-values are unreliable
      • Regression fits the mean; with skewed residuals,
        the mean is not a good measure of central
        tendency
  • Diagnosis: examine a QQ plot of Studentized
    residuals
      • Corrects for bias in estimates of residual
        variance
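Raw residuals do not all have the same variance: Var(e_i) = σ²(1 - h_ii), where h_ii is the leverage of observation i. Studentizing divides each residual by an estimate of its own standard deviation, s·sqrt(1 - h_ii); the externally studentized version also re-estimates s with observation i left out. The Python sketch below assumes a straight-line fit with a single predictor so that the leverage has the simple form h_ii = 1/n + (x_i - x̄)²/Sxx (with several predictors it comes from the hat matrix). The function name and data are hypothetical; in R, `rstandard()` and `rstudent()` return the internally and externally studentized residuals of a fitted `lm`.

```python
import math

def studentized_residuals(x, y):
    """Internally and externally studentized residuals for the fit y = b0 + b1*x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance estimate
    internal, external = [], []
    for e, h in zip(resid, leverage):
        # Var(e_i) = sigma^2 * (1 - h_ii), so scale each residual by its own SD
        internal.append(e / math.sqrt(s2 * (1 - h)))
        # Leave-one-out variance estimate for the external version
        s2_i = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)
        external.append(e / math.sqrt(s2_i * (1 - h)))
    return internal, external

# Hypothetical data with one unusually large response at the end
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.3, 2.8, 4.2, 4.9, 9.0]
internal, external = studentized_residuals(x, y)
print([round(r, 2) for r in internal])
print([round(t, 2) for t in external])
```

The large last point gets a much bigger externally studentized residual, because leaving it out shrinks the variance estimate used to scale it.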

5
  • Models > Graphs > Residual Quantile-Comparison
    Plot

6
Errors not normally distributed
  • Problem
      • Parameter estimates are unbiased
      • P-values are unreliable
      • Regression fits the mean; with skewed residuals,
        the mean is not a good measure of central
        tendency
  • Diagnosis: examine a QQ plot of Studentized
    residuals
      • Corrects for bias in estimates of residual
        variance
  • Solutions
      • Transform the dependent variable
          • May create nonlinearity in the model
7
Try transforming the response variable
  • Data > Manage Variables > Compute New Variable

    Call:
    lm(formula = Sqrt.C ~ Phosphorus + Phosphorus:Nitrogen, data = Chlorophyll)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.2788 -0.7979 -0.3315  0.6360  3.3290

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)         2.5212494  0.4412690   5.714 9.54e-06 ***
    Phosphorus          0.0091702  0.0032668   2.807 0.010268 *
    Phosphorus:Nitrogen 0.0019191  0.0004201   4.569 0.000150 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.436 on 22 degrees of freedom
    Multiple R-Squared: 0.8521, Adjusted R-squared: 0.8386
    F-statistic: 63.35 on 2 and 22 DF,  p-value: 7.433e-10
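The numbers in this summary hang together arithmetically, which is a useful check when reading `lm()` output. This Python sketch recomputes each t value as Estimate / Std. Error, and the adjusted R² and F statistic from the multiple R², using the n = 22 + 2 + 1 = 25 observations implied by the residual degrees of freedom. Because the inputs are the rounded printed values, the last digit can differ slightly from R's output.

```python
# Printed values from the Sqrt.C summary: (Estimate, Std. Error, t value)
coefs = {
    "(Intercept)": (2.5212494, 0.4412690, 5.714),
    "Phosphorus": (0.0091702, 0.0032668, 2.807),
    "Phosphorus:Nitrogen": (0.0019191, 0.0004201, 4.569),
}
r2, df_resid, k = 0.8521, 22, 2  # k = predictors besides the intercept
n = df_resid + k + 1             # 25 observations

for name, (est, se, t_printed) in coefs.items():
    print(f"{name:20s} {est / se:6.3f} (printed {t_printed})")

adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = (r2 / k) / ((1 - r2) / df_resid)
print(f"adjusted R^2 = {adj_r2:.4f} (printed 0.8386)")
print(f"F = {f_stat:.2f} on {k} and {df_resid} DF (printed 63.35)")
```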
8
The residuals are less skewed
9
But we've introduced nonlinearity
[Figure: Actual by Predicted Plot (Chlorophyll) and Actual by Predicted Plot (sqrtChlorophyll)]
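Why the transform bends the relationship: if the mean of the raw response is linear in a predictor, the mean of its square root is (approximately) the square root of a straight line, which is a curve. This tiny Python sketch uses a hypothetical linear mean 1 + 2x and shows equal steps on the raw scale turning into shrinking steps on the square-root scale.

```python
import math

# Hypothetical case where the mean response is exactly linear in x
xs = [0, 1, 2, 3, 4]
mean_y = [1 + 2 * x for x in xs]            # equal steps of 2 on the raw scale
sqrt_mean = [math.sqrt(m) for m in mean_y]  # roughly what a model for sqrt(y) must follow

steps = [b - a for a, b in zip(sqrt_mean, sqrt_mean[1:])]
print([round(s, 3) for s in steps])  # [0.732, 0.504, 0.41, 0.354]: the line has become a curve
```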
10
Errors not normally distributed
  • Problem
      • Parameter estimates are unbiased
      • P-values are unreliable
      • Regression fits the mean; with skewed residuals,
        the mean is not a good measure of central
        tendency
  • Diagnosis: examine a QQ plot of Studentized
    residuals
      • Corrects for bias in estimates of residual
        variance
  • Solutions
      • Transform the dependent variable
          • May create nonlinearity in the model
      • Fit a generalized linear model (GLM)
          • Allows us to assume the residuals follow a
            different distribution (binomial, gamma, etc.)
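A minimal sketch of what fitting a GLM involves, under assumptions chosen for brevity: one predictor, a gamma distribution for the response and a log link, fitted by iteratively reweighted least squares (IRLS). For this family and link every IRLS weight equals 1, so each step is an ordinary least-squares fit of a "working response" on x. The function name and data are hypothetical; in R the corresponding fit would be `glm(y ~ x, family = Gamma(link = "log"))`.

```python
import math

def fit_gamma_log_glm(x, y, iterations=50):
    """Fit log(E[y]) = b0 + b1*x for gamma-distributed y by IRLS (y must be > 0)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    eta = [math.log(yi) for yi in y]  # start from the log of the data
    b0 = b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # Working response; with a gamma family and log link every IRLS weight is 1
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        z_bar = sum(z) / n
        b1 = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
        b0 = z_bar - b1 * x_bar
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1

# Hypothetical positive, right-skewed responses
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 5.2, 6.8, 12.5, 16.0]
b0, b1 = fit_gamma_log_glm(x, y)
print(f"log(mean y) = {b0:.3f} + {b1:.3f} * x")
```

At convergence the gamma score equations hold: the relative residuals (y - μ)/μ sum to zero and are uncorrelated with x.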

11
Summary of OLS assumptions
12
Fixing assumptions via data transformations is an
iterative process
  • After each modification, fit the new model and
    look at all the assumptions again
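The fit-and-recheck loop can be sketched as code: apply a candidate transform, refit, and recompute the diagnostics before deciding on the next change. This Python sketch uses a one-predictor fit and two crude numeric stand-ins for the plots: residual skewness for normality, and the correlation between fitted values and absolute residuals for heteroskedasticity. The data, helper names and choice of diagnostics are hypothetical, and the sketch does not check linearity, which still calls for an actual-by-predicted or residual plot.

```python
import math
from statistics import mean, pstdev

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sum((a - x_bar) ** 2 for a in x)
    return y_bar - b1 * x_bar, b1

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def diagnostics(x, y):
    """Crude numeric stand-ins for the residual plots (hypothetical choices)."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    m, sd = mean(resid), pstdev(resid)
    skew = mean([(e - m) ** 3 for e in resid]) / sd ** 3
    # Spread trend: positive when residuals fan out as the fitted values grow
    spread_trend = corr(fitted, [abs(e) for e in resid])
    return skew, spread_trend

# Hypothetical data; after each candidate transform, refit and recheck everything
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 2.0, 3.5, 3.9, 6.8, 6.1, 11.0, 9.0]
transforms = {"y": lambda v: v, "sqrt(y)": math.sqrt, "log(y)": math.log}
results = {}
for label, f in transforms.items():
    results[label] = diagnostics(x, [f(v) for v in y])
    skew, trend = results[label]
    print(f"{label:8s} residual skewness {skew:6.2f}  spread trend {trend:6.2f}")
```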

13
What can we do about the chlorophyll regression?
  • The square-root transform helps a little with
    non-normality and a lot with heteroskedasticity
  • But it creates nonlinearity
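A standard first-order (delta-method) argument, offered as an illustration rather than as the reasoning on the slide, shows when a square root stabilizes the variance. Suppose the variance of the response grows in proportion to its mean, Var(Y) ≈ c·μ with μ = E[Y], as for count-like data. Then

```latex
\operatorname{Var}\left(\sqrt{Y}\right)
  \approx \left(\left.\frac{d\sqrt{y}}{dy}\right|_{y=\mu}\right)^{2} \operatorname{Var}(Y)
  = \frac{1}{4\mu}\, c\,\mu
  = \frac{c}{4},
```

which no longer depends on μ. If instead the standard deviation grows in proportion to the mean, the same argument points toward a log transform.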

14
A new model: it's linear
15
it has normal residuals (sort of) and is
homoskedastic
16
and it fits well!
    Call:
    lm(formula = sqrt(Chlorophyll.a) ~ sqrt(Phosphorus) + sqrt(Phosphorus * Nitrogen), data = Chlorophyll)

    Residuals:
        Min      1Q  Median      3Q     Max
    -1.5846 -0.7758 -0.1640  0.6975  2.5464

    Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                 -0.90141    0.61584  -1.464 0.157414
    sqrt(Phosphorus)             0.21408    0.09547   2.242 0.035348 *
    sqrt(Phosphorus * Nitrogen)  0.15133    0.03742   4.044 0.000542 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.198 on 22 degrees of freedom
    Multiple R-Squared: 0.897, Adjusted R-squared: 0.8876
    F-statistic: 95.77 on 2 and 22 DF,  p-value: 1.388e-11
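Both square-root-response models have 22 residual degrees of freedom, so their printed summaries can be compared directly. This Python sketch recovers the residual sum of squares as RSE² × 22 and the total sum of squares as RSS / (1 - R²). The slides do not say exactly how Sqrt.C was computed; the two totals come out nearly equal (about 307), which is consistent with Sqrt.C being the same square-root response as sqrt(Chlorophyll.a). On that assumption, the drop in RSS (from about 45 to about 32) reflects a better fit rather than a change of scale.

```python
# Residual standard error and multiple R^2 copied from the two lm() summaries
summaries = {
    "Sqrt.C ~ Phosphorus + Phosphorus:Nitrogen": (1.436, 0.8521),
    "sqrt(Chlorophyll.a) ~ sqrt(Phosphorus) + sqrt(Phosphorus * Nitrogen)": (1.198, 0.897),
}
df_resid = 22  # both models: 25 observations, 3 coefficients

results = {}
for formula, (rse, r2) in summaries.items():
    rss = rse ** 2 * df_resid  # residual sum of squares
    tss = rss / (1 - r2)       # total sum of squares of the response
    results[formula] = (rss, tss)
    print(f"{formula}\n    RSS = {rss:.1f}, TSS = {tss:.1f}")
```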
17
(No Transcript)