Title: Influential Observations in Regression
1Influential Observations in Regression
- Measurements on Heat Production as a Function of
Body Mass and Work Effort.
- M. Greenwood (1918). On the Efficiency of
Muscular Work, Proc. Roy. Soc. of London, Series
B, Vol. 90, 627, pp. 199-214
2Data Description
- Study involved Algerians accustomed to heavy
labor. Experiment consisted of several hours on
stationary bicycle.
- Dependent (Response) Variable
  - Heat Production (Calories)
- Independent (Explanatory/Predictor) Variables
  - Work Effort (Calories)
  - Body Mass (kg)
- Model
  - $H = \beta_0 + \beta_1 W + \beta_2 M + \varepsilon$
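
As a minimal sketch of how a model of this form can be fitted by ordinary least squares, the block below solves the normal equations $(X'X)b = X'y$ in plain Python. The helper names `solve` and `fit_ols` are hypothetical, and the data are synthetic, noise-free values invented for illustration; they are not Greenwood's Table III.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy, A is not modified
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(work, mass, heat):
    """Return [b0, b1, b2], the least-squares estimates for H = b0 + b1*W + b2*M."""
    X = [[1.0, w, m] for w, m in zip(work, mass)]  # design matrix with intercept column
    k = 3
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * h for row, h in zip(X, heat)) for i in range(k)]
    return solve(XtX, Xty)


# Synthetic, noise-free values for illustration only (NOT Greenwood's Table III).
work = [100.0, 150.0, 200.0, 250.0, 300.0, 120.0]
mass = [55.0, 60.0, 52.0, 70.0, 65.0, 58.0]
heat = [500.0 + 3.0 * w + 20.0 * m for w, m in zip(work, mass)]

b0, b1, b2 = fit_ols(work, mass, heat)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

Because the synthetic responses contain no noise, the fit should recover the generating coefficients up to rounding. With real data a statistics package would normally be used instead, since it also reports standard errors and p-values.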
3Raw Data (Table III, p.203)
4Estimated Regression Coefficients
- Controlling for the other factor:
  - We can conclude that a Work Effort increase ⇒ Heat Production increase (p = .0136)
  - We cannot conclude that a Body Mass increase ⇒ Heat Production increase (p = .1957)
5Plot of Residuals versus Fitted Values
Huge positive residual
6Influential Measures (I)
- Note: n = 37, p = 3 parameters
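
The cutoffs shown on this slide did not survive extraction. As a hedged illustration, the sketch below evaluates three commonly quoted rules of thumb for flagging cases: leverage above $2p/n$, $|DFFITS|$ above $2\sqrt{p/n}$, and $|DFBETAS|$ above $2/\sqrt{n}$. These are assumed conventions, not necessarily the ones the slide used, and the function name `influence_cutoffs` is hypothetical.

```python
import math


def influence_cutoffs(n, p):
    """Rule-of-thumb cutoffs for n cases and p regression parameters (assumed conventions)."""
    return {
        "leverage": 2 * p / n,            # flag hat values h_ii above this
        "dffits": 2 * math.sqrt(p / n),   # flag |DFFITS_i| above this
        "dfbetas": 2 / math.sqrt(n),      # flag |DFBETAS_j(i)| above this
    }


cutoffs = influence_cutoffs(n=37, p=3)
for name, value in cutoffs.items():
    print(f"{name}: {value:.3f}")
```

For n = 37 and p = 3 these come to roughly 0.162, 0.569, and 0.329. Such thresholds are screening aids only; a flagged case still needs to be examined individually.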
7Standardized / Studentized Residuals
8Influential Measures (II)
9Influential Measures (III)
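
The formulas on the preceding slides were lost in extraction. A minimal sketch using the standard textbook definitions is given here: hat values $h_{ii} = x_i'(X'X)^{-1}x_i$, internally studentized residuals $r_i = e_i / \sqrt{MSE\,(1 - h_{ii})}$, and Cook's distance $D_i = r_i^2 h_{ii} / (p\,(1 - h_{ii}))$. The slides may have used externally studentized (deleted) residuals instead; that is not covered. The function names and the synthetic 3 x 3 design are assumptions made for illustration, with a 1000-unit error injected into the response of the central case.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def influence_diagnostics(X, y):
    """Return (hat values, internally studentized residuals, Cook's distances)."""
    n, p = len(X), len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    mse = sum(e * e for e in resid) / (n - p)
    # h_ii = x_i' (X'X)^{-1} x_i, obtained by solving (X'X) z = x_i for each case.
    hat = [sum(x * z for x, z in zip(r, solve(XtX, r))) for r in X]
    student = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, hat)]
    cooks = [s * s / p * h / (1 - h) for s, h in zip(student, hat)]
    return hat, student, cooks


# Synthetic 3 x 3 design (NOT Greenwood's data): columns are intercept, work, mass.
X = [[1.0, w, m] for w in (100.0, 200.0, 300.0) for m in (50.0, 60.0, 70.0)]
noise = [5.0, -3.0, 8.0, -6.0, 0.0, 4.0, -7.0, 2.0, -3.0]
y = [500.0 + 3.0 * r[1] + 20.0 * r[2] + e for r, e in zip(X, noise)]
center = 4            # the case at (work, mass) = (200, 60), the middle of the design
y[center] += 1000.0   # injected response error

hat, student, cooks = influence_diagnostics(X, y)
for i, (h, s, d) in enumerate(zip(hat, student, cooks)):
    print(f"case {i + 1}: hat={h:.3f}  studentized={s:.2f}  cooks_d={d:.3f}")
```

In this synthetic design the corrupted case sits at the center of the predictors, so it has the smallest hat value of all cases while still producing the largest studentized residual and the largest Cook's distance. That is the same qualitative pattern as a low-leverage case with a badly wrong response.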
10Diagnosing Influential Observations
- Clearly, Observation 19 exerts a huge influence
(although it has a small hat or leverage value,
so it must be near the center of the Mass/Work
observations).
- Compared against the author's original
calculations provided in the paper, the mean and S.D.
computed from the tabulated data are much too high
for H (but exactly the same for M and W).
- Could the observation have been a typo?
- Try replacing $H_{19} = 3936$ with $H_{19} = 2936$
- Note: Do not do this arbitrarily; check your data
sources in practice.
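
The summary-statistics check described above can be sketched as follows. A single response keyed 1000 units too high among n = 37 cases raises the sample mean by exactly 1000/37, about 27.03, and also inflates the standard deviation, so recomputing the mean and S.D. from the tabulated values and comparing them with the published ones can expose such an error. The values below are synthetic and evenly spaced purely for illustration; they are not Greenwood's measurements.

```python
import statistics

n = 37
# Synthetic heat values for illustration only (NOT Greenwood's data).
heat_true = [2500.0 + 25.0 * i for i in range(n)]
heat_typo = heat_true[:]
heat_typo[18] += 1000.0  # observation 19 (1-based) keyed 1000 units too high

mean_true, sd_true = statistics.mean(heat_true), statistics.stdev(heat_true)
mean_typo, sd_typo = statistics.mean(heat_typo), statistics.stdev(heat_typo)

shift = mean_typo - mean_true
print(f"mean shift: {shift:.2f} (expected 1000/n = {1000.0 / n:.2f})")
print(f"S.D. without error: {sd_true:.1f}, with error: {sd_typo:.1f}")
```

The same comparison run on the predictor columns would show no discrepancy if only the response was mis-keyed, which matches the pattern reported for M and W.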
11Analysis with Corrected Data Point
Note that both factors are now significant, and that
the intercept and Body Mass coefficients have
changed drastically.
12Plot of Residuals versus Predicted Values