Title: Stat 112: Lecture 7 Notes
1Stat 112 Lecture 7 Notes
- Homework 2 Due next Thursday
- The Multiple Linear Regression model (Chapter 4.1)
- Inferences from multiple regression analysis
(Chapter 4.2)
2Interpretation of Regression Coefficients
- Gas mileage regression from Car89.JMP
3Partial Slopes vs. Marginal Slopes
- Multiple Linear Regression Model:
$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + e$
- The coefficient $\beta_j$ is a partial slope. It
indicates the change in the mean of y that is
associated with a one unit increase in $x_j$ while
holding all other variables fixed.
- A marginal slope is obtained when we perform a
simple regression with only one X, ignoring all
other variables. Consequently the other
variables are not held fixed.
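The difference between the two slopes can be sketched numerically. The example below uses made-up synthetic data (not Car89.JMP or rainfall.JMP) and NumPy rather than JMP; the variable names and the helper `least_squares` are assumptions for illustration only. Here x2 tends to fall as x1 rises, so the marginal slope of x1 should come out negative even though its partial slope is positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (hypothetical, not Car89.JMP or rainfall.JMP):
# x2 falls as x1 rises, and y depends positively on both.
x1 = rng.normal(size=n)
x2 = -0.8 * x1 + rng.normal(scale=0.3, size=n)
y = 1.0 + 2.0 * x1 + 5.0 * x2 + rng.normal(scale=0.5, size=n)

def least_squares(y, *columns):
    """Return least squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

marginal_slope = least_squares(y, x1)[1]      # simple regression on x1 only
partial_slope = least_squares(y, x1, x2)[1]   # multiple regression, x2 held fixed

print(f"marginal slope of x1: {marginal_slope:.2f}")
print(f"partial slope of x1:  {partial_slope:.2f}")
```

By construction the partial slope estimates the 2.0 in the simulated model, while the marginal slope also absorbs the effect of x2 moving along with x1.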
4Partial vs. Marginal Slopes Example
5Partial Slopes vs. Marginal Slopes: Another Example
- In order to evaluate the benefits of a proposed
irrigation scheme in a certain region, suppose
that the relation of yield Y to rainfall R is
investigated over several years.
- Data is in rainfall.JMP.
6(No Transcript)
7Higher rainfall is associated with lower
temperature.
8Rainfall is estimated to be beneficial once
temperature is held fixed.
Multiple regression provides a better picture of
the benefits of an irrigation scheme because
temperature would be held fixed in an irrigation
scheme.
9Inferences about Regression Coefficients
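- Confidence intervals: a $100(1-\alpha)\%$
confidence interval for $\beta_j$ is
$b_j \pm t_{\alpha/2,\,n-(K+1)}\, s_{b_j}$.
- Degrees of freedom for t equals $n-(K+1)$.
The standard error of $b_j$, $s_{b_j}$, is found on JMP
output.
- Hypothesis test: $H_0: \beta_j = \beta_j^*$ vs.
$H_a: \beta_j \neq \beta_j^*$.
- Decision rule for test: Reject $H_0$ if
$t > t_{\alpha/2,\,n-(K+1)}$ or $t < -t_{\alpha/2,\,n-(K+1)}$,
where $t = (b_j - \beta_j^*) / s_{b_j}$.
- The p-value for testing $H_0: \beta_j = 0$ vs.
$H_a: \beta_j \neq 0$ is printed in JMP output under
Prob>|t|.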
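As a rough check on how these quantities fit together, the sketch below computes coefficient estimates, standard errors, 95% confidence intervals, t statistics and two-sided p-values by hand. It uses synthetic data and NumPy/SciPy in place of JMP; the data, the variable names and the choice of testing each coefficient against 0 are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, K = 60, 3  # n observations, K explanatory variables

# Synthetic data (hypothetical): the third variable has a true coefficient of 0.
X = rng.normal(size=(n, K))
y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])        # add the intercept column
b, *_ = np.linalg.lstsq(design, y, rcond=None)   # least squares estimates b_0..b_K

df = n - (K + 1)                                 # degrees of freedom for t
residuals = y - design @ b
s2 = residuals @ residuals / df                  # estimated disturbance variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(design.T @ design)))  # standard errors

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)          # critical value t_{alpha/2, df}
lower, upper = b - t_crit * se, b + t_crit * se  # confidence intervals

t_stat = b / se                                  # tests H0: beta_j = 0
p_value = 2 * stats.t.sf(np.abs(t_stat), df)     # two-sided p-value
reject = np.abs(t_stat) > t_crit                 # decision rule at level alpha

for j in range(K + 1):
    print(f"b_{j}={b[j]: .3f}  CI=({lower[j]: .3f}, {upper[j]: .3f})  "
          f"t={t_stat[j]: .2f}  p={p_value[j]:.4f}  reject H0: {reject[j]}")
```

Rejecting when |t| exceeds the critical value and rejecting when the p-value is below alpha are the same decision, so the `reject` column should agree with `p_value < alpha`.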
10Inference Examples
- Find a 95% confidence interval for ?
- Is seating of any help in predicting gas mileage
once horsepower, weight and cargo have been taken
into account? Carry out a test at the 0.05
significance level.
11(No Transcript)
12Checking Assumptions
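Multiple Linear Regression Model
- $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + e_i$
- The expected value of the disturbances $e_i$ is zero
for each $i$, i.e., $E(e_i) = 0$.
- The variance of each $e_i$ is equal to $\sigma_e^2$, i.e.,
$Var(e_i) = \sigma_e^2$.
- The $e_i$ are normally distributed.
- The $e_i$ are independent.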
13Plots for Checking Assumptions
- We can construct residual plots of each
explanatory variable Xk vs. the residuals.
- We save the residuals by clicking the red
triangle next to Response after fitting the model
and clicking Save Columns and then Residuals. We
then plot Xk vs. the residuals using Fit Y by X
(where Y = the residuals). We can plot a
horizontal line at 0 by fitting the least squares
line in Fit Y by X (it is a property of multiple
linear regression that the least squares line for
the regression of the residuals on any Xk is a
horizontal line at 0).
- A useful summary of the residual plots for each
explanatory variable is the Residual by Predicted
plot that is automatically plotted after using
Fit Model. The residual by predicted plot is a
plot of the predicted values $\hat{y}_i$,
$i = 1, \ldots, n$, vs. the residuals
$\hat{e}_i = y_i - \hat{y}_i$.
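The horizontal-line property can be checked numerically. The sketch below is an illustration on synthetic data, using NumPy instead of JMP; all names and data are hypothetical. The simulated relationship is deliberately curved in x1, so the fitted linear model is wrong, yet the least squares line through the residuals should still be flat at 0.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Synthetic data (hypothetical): the true relationship is curved in x1,
# so the fitted linear model is deliberately misspecified.
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
y = 3.0 + 0.5 * x1**2 + 2.0 * x2 + rng.normal(size=n)

design = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ b
residuals = y - predicted   # the analogue of the saved residuals column

# Regress the residuals on each explanatory variable and on the predicted values.
fits = {}
for name, x in [("x1", x1), ("x2", x2), ("predicted", predicted)]:
    slope, intercept = np.polyfit(x, residuals, deg=1)
    fits[name] = (slope, intercept)
    print(f"residuals on {name}: slope={slope:.2e}, intercept={intercept:.2e}")
```

Because the fitted line is flat whether or not the model is right, it is only a reference line; the pattern of the points around it is what signals a problem.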
14Checking Assumptions
- Linearity: Check that in the residual by predicted
plot, the mean of the residuals for each range of
the predicted values is about zero. Also check
that in each residual plot, the mean of the
residuals for each range of the explanatory
variable is about zero.
- Constant Variance: Check that in the residual by
predicted plot, for each range of the
predicted values, the spread of the residuals is
about the same.
- Normality: Plot a histogram of the residuals.
Check that the histogram is bell shaped.
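These visual checks have a simple numerical counterpart, sketched below on synthetic data with NumPy. The data, the choice of 5 ranges and the 8 histogram bins are arbitrary assumptions for illustration; this is a rough companion to the plots, not a replacement for looking at them.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Synthetic data (hypothetical) that satisfies the model assumptions.
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
y = 3.0 + 1.2 * x1 + 2.0 * x2 + rng.normal(scale=1.0, size=n)

design = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ b
residuals = y - predicted

# Sort by predicted value and cut the residuals into 5 equal-sized ranges.
order = np.argsort(predicted)
groups = np.array_split(residuals[order], 5)

for k, g in enumerate(groups, start=1):
    # Means near 0 support linearity; similar SDs support constant variance.
    print(f"range {k}: mean residual={g.mean(): .2f}, SD={g.std(ddof=1):.2f}")

# Bin counts are a text stand-in for the histogram used to check normality.
counts, edges = np.histogram(residuals, bins=8)
print("histogram counts:", counts)
```

The same grouping can be repeated with `x1` or `x2` in place of `predicted` to mirror the residual plot for each explanatory variable.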
15Residual by predicted plot does not suggest
nonlinearity and suggests approximately constant
variance.
Plot of horsepower vs. residuals suggests
linearity is okay.
Plot of weight vs. residuals suggests linearity
is okay. One potential concern is that highest
weight cars all have negative residuals.
16Plot of residuals vs. horsepower suggests
linearity is okay. Highest 4 horsepower cars all
have negative residuals but next 5 highest
horsepower cars all have positive residuals.
Plot of residuals vs. seating suggests linearity
is not perfect for seating. Residuals for
small and high seating seem to have a mean that
is smaller than 0.
17Coefficient of Determination
- The coefficient of determination $R^2$ for
multiple regression is defined as for simple
linear regression: $R^2 = SSR/SST = 1 - SSE/SST$,
where SST, SSR and SSE are the total, regression
and error sums of squares.
- Represents the percentage of variation in y that is
explained by the multiple regression line.
- $R^2$ is between 0 and 1. The closer to 1, the
better the fit of the regression equation to the
data.
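A minimal sketch of the calculation, assuming the usual sums-of-squares notation (SST for total, SSR for regression, SSE for error variation) and synthetic data; the names and data are hypothetical, and NumPy stands in for JMP.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80

# Synthetic data (hypothetical).
X = rng.normal(size=(n, 2))
y = 10.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=2.0, size=n)

design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ b

sst = np.sum((y - y.mean()) ** 2)          # total variation in y
sse = np.sum((y - predicted) ** 2)         # variation left unexplained
ssr = np.sum((predicted - y.mean()) ** 2)  # variation explained by the regression

r_squared = ssr / sst
print(f"R^2 = SSR/SST   = {r_squared:.3f}")
print(f"1 - SSE/SST     = {1 - sse / sst:.3f}")
```

The two printed values agree because SST = SSR + SSE whenever the model includes an intercept.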
18Assessing Quality of Prediction (Chapter 3.5.3)
- R squared is a measure of the fit of the
regression to the sample data. It is not
generally considered an adequate measure of the
regression's ability to predict the responses for
new observations.
- One method of assessing the ability of the
regression to predict the responses for new
observations is data splitting.
- We split the data into two groups: a training
sample and a holdout sample (also called a
validation sample). We fit the regression model
to the training sample and then assess the
quality of the regression model's predictions on
the holdout sample.
19College Data in collegeclass.JMP
- Training Sample: First 40 observations.
- Holdout Sample: Last 10 observations.
- Mean Squared Deviation: Mean squared prediction
error over the holdout sample,
$MSD = \frac{1}{n_2} \sum_{i=1}^{n_2} (y_i - \hat{y}_i)^2$,
where the sum is over the $n_2$ (10 here)
observations in the holdout sample.
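The data-splitting procedure can be sketched as follows. This uses synthetic data with the same 40/10 split rather than collegeclass.JMP, and NumPy rather than JMP; the variable names and the simulated model are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_holdout = 50, 10

# Synthetic data (hypothetical, not collegeclass.JMP).
X = rng.normal(size=(n, 2))
y = 2.0 + 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
train = slice(0, n - n_holdout)      # first 40 rows
holdout = slice(n - n_holdout, n)    # last 10 rows

# Fit the regression on the training sample only.
b, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)

# Predict the holdout sample and average the squared prediction errors.
errors = y[holdout] - design[holdout] @ b
msd = np.mean(errors ** 2)

# In-sample mean squared residual on the training sample, for comparison.
train_mse = np.mean((y[train] - design[train] @ b) ** 2)

print(f"holdout MSD                    = {msd:.3f}")
print(f"training mean squared residual = {train_mse:.3f}")
```

The holdout rows play no part in estimating `b`, so the MSD reflects prediction for observations the fit has not seen, which the in-sample figure does not.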