Stat 112: Lecture 7 Notes

Slides: 20

Provided by: dsma3

Learn more at: https://statistics.wharton.upenn.edu


1
Stat 112 Lecture 7 Notes

Homework 2 due next Thursday
The Multiple Linear Regression model (Chapter 4.1)
Inferences from multiple regression analysis (Chapter 4.2)

2
Interpretation of Regression Coefficients

Gas mileage regression from Car89.JMP

3
Partial Slopes vs. Marginal Slopes

Multiple Linear Regression Model:
$E(Y \mid X_1, \ldots, X_K) = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K$
The coefficient $\beta_k$ is a partial slope. It
indicates the change in the mean of y that is
associated with a one unit increase in $X_k$ while
holding all other variables fixed.
A marginal slope is obtained when we perform a
simple regression with only one X, ignoring all
other variables. Consequently the other
variables are not held fixed.
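
The link between the two kinds of slope can be written out for the case of two explanatory variables. As a sketch, let $b_1$ and $b_2$ be the least squares partial slopes from regressing y on $X_1$ and $X_2$, let $\tilde{b}_1$ be the marginal slope from regressing y on $X_1$ alone, and let $d_1$ be the slope from regressing $X_2$ on $X_1$ (these symbols are introduced here for illustration and do not appear in the slides). Then, in the sample,

```latex
\tilde{b}_1 = b_1 + b_2 \, d_1
```

So the marginal slope equals the partial slope only when $b_2 = 0$ or when $X_1$ and $X_2$ are uncorrelated in the sample ($d_1 = 0$).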

4
Partial vs. Marginal Slopes Example
5
Partial Slopes vs. Marginal Slopes: Another Example

In order to evaluate the benefits of a proposed
irrigation scheme in a certain region, suppose
that the relation of yield Y to rainfall R is
investigated over several years.
Data is in rainfall.JMP.

6
(No Transcript)
7
Higher rainfall is associated with lower
temperature.
8
Rainfall is estimated to be beneficial once
temperature is held fixed.
Multiple regression provides a better picture of
the benefits of an irrigation scheme because
temperature would be held fixed in an irrigation
scheme.
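
A small simulation can illustrate how a marginal slope and a partial slope can even have opposite signs. The sketch below uses made-up data, not the values in rainfall.JMP; the variable names, the helper `least_squares`, and all numeric settings are assumptions chosen so that higher rainfall goes with lower temperature while both help yield.

```python
import numpy as np

# Simulated data for illustration only; these are NOT the rainfall.JMP values.
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(70.0, 5.0, n)
# Assumed relation: higher temperature goes with lower rainfall.
rainfall = 40.0 - 0.4 * temperature + rng.normal(0.0, 1.0, n)
# Assumed model: both rainfall and temperature increase yield.
crop_yield = 10.0 + 1.0 * rainfall + 1.0 * temperature + rng.normal(0.0, 1.0, n)


def least_squares(y, *xs):
    """Return least squares coefficients of y on xs, intercept first."""
    design = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Marginal slope: simple regression of yield on rainfall only.
marginal_slope = least_squares(crop_yield, rainfall)[1]
# Partial slope: rainfall coefficient with temperature also in the model.
partial_slope = least_squares(crop_yield, rainfall, temperature)[1]

print(f"marginal slope of rainfall: {marginal_slope:.2f}")
print(f"partial slope of rainfall:  {partial_slope:.2f}")
```

In this simulated setting the marginal slope comes out negative because rainy years also tend to be cool years, while the partial slope recovers the positive effect of rainfall at a fixed temperature.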
9
Inferences about Regression Coefficients

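Confidence intervals
$100(1-\alpha)\%$ confidence interval for $\beta_k$:
$b_k \pm t_{\alpha/2,\, n-(K+1)} \, s_{b_k}$
Degrees of freedom for t equals $n-(K+1)$.
Standard error of $b_k$, $s_{b_k}$, found on JMP
output.
Hypothesis Test: $H_0: \beta_k = \beta_k^*$ vs. $H_a: \beta_k \neq \beta_k^*$
Decision rule for test: Reject $H_0$ if
$t > t_{\alpha/2,\, n-(K+1)}$ or $t < -t_{\alpha/2,\, n-(K+1)}$,
where $t = (b_k - \beta_k^*) / s_{b_k}$.
p-value for testing $H_0: \beta_k = 0$ vs. $H_a: \beta_k \neq 0$ is
printed in JMP output under Prob>|t|.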
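
The quantities JMP reports can be reproduced by hand. The following is a minimal sketch on simulated data (not Car89.JMP), assuming NumPy and SciPy are available; the sample size, the coefficients used to simulate y, and all variable names are illustrative choices.

```python
import numpy as np
from scipy import stats

# Simulated data for illustration only; not the Car89.JMP data.
rng = np.random.default_rng(1)
n, K = 100, 3
X = rng.normal(size=(n, K))
y = 2.0 + X @ np.array([1.5, 0.0, -0.7]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])        # intercept column first
b, *_ = np.linalg.lstsq(design, y, rcond=None)   # estimates b_0, ..., b_K
residuals = y - design @ b
df = n - (K + 1)                                 # degrees of freedom for t
s2 = residuals @ residuals / df                  # estimated disturbance variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(design.T @ design)))  # standard errors

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)          # critical value t_{alpha/2, df}
ci_lower, ci_upper = b - t_crit * se, b + t_crit * se

t_stat = b / se                                  # tests H0: beta_k = 0
p_value = 2 * stats.t.sf(np.abs(t_stat), df)     # two-sided p-value
reject = np.abs(t_stat) > t_crit                 # decision rule at level alpha
```

Rejecting when the absolute t statistic exceeds the critical value gives the same decision as rejecting when the two-sided p-value is below `alpha`.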

10
Inference Examples

Find a 95% confidence interval for ?
Is seating of any help in predicting gas mileage
once horsepower, weight and cargo have been taken
into account? Carry out a test at the 0.05
significance level.

11
(No Transcript)
12
Checking Assumptions
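Multiple Linear Regression Model:
$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + e_i$

The expected value of the disturbances is zero
for each $i$, $E(e_i) = 0$.
The variance of each $e_i$ is equal to $\sigma_e^2$, i.e.,
$Var(e_i) = \sigma_e^2$.
The $e_i$ are normally distributed.
The $e_i$ are independent.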

13
Plots for Checking Assumptions

We can construct residual plots of each
explanatory variable $X_k$ vs. the residuals.
We save the residuals by clicking the red
triangle next to Response after fitting the model
and clicking Save Columns and then residuals. We
then plot $X_k$ vs. the residuals using Fit Y by X
(where Y = the residuals). We can plot a
horizontal line at 0 by fitting a line in Fit Y
by X (it is a property of multiple linear
regression that the least squares line for the
regression of the residuals on any $X_k$ is a
horizontal line at 0).
A useful summary of the residual plots for each
explanatory variable is the Residual by Predicted
plot that is automatically plotted after using
Fit Model. The residual by predicted plot is a
plot of the predicted values,
$\hat{y}_i = b_0 + b_1 x_{i1} + \cdots + b_K x_{iK}$, vs. the
residuals, $\hat{e}_i = y_i - \hat{y}_i$.
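
The horizontal-line property mentioned above can be checked numerically. This is a sketch on simulated data with NumPy, not a JMP procedure; the data and variable names are assumptions for illustration.

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(2)
n = 80
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ b
residuals = y - predicted

# Least squares line of the residuals on each explanatory variable.
# np.polyfit with degree 1 returns (slope, intercept).
lines = [np.polyfit(X[:, k], residuals, 1) for k in range(X.shape[1])]
# The same holds for the residual by predicted plot.
line_vs_predicted = np.polyfit(predicted, residuals, 1)

for k, (slope, intercept) in enumerate(lines, start=1):
    print(f"X{k}: slope {slope:.1e}, intercept {intercept:.1e}")
```

Every slope and intercept is zero up to rounding error, so a fitted line in a residual plot is always flat at 0. Departures from the assumptions show up as curvature or changing spread around that line, not as a tilt in the line itself.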

14
Checking Assumptions

Linearity:
Check that in the residual by predicted plot, the
mean of the residuals for each range of the
predicted values is about zero.
Check that in each residual plot, the mean of the
residuals for each range of the explanatory
variable is about zero.
Constant Variance: Check that in the residual by
predicted plot, for each range of the
predicted values, the spread of the residuals is
about the same.
Normality: Plot a histogram of the residuals.
Check that the histogram is bell shaped.
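
The "mean about zero" and "spread about the same" checks are usually made by eye, but they can also be summarized numerically. The sketch below is one possible way to do that, assuming NumPy; the helper `residual_summary_by_range`, the choice of four equal-count ranges, and the simulated data are all illustrative and are not part of the lecture.

```python
import numpy as np


def residual_summary_by_range(predicted, residuals, n_bins=4):
    """Mean and standard deviation of the residuals within
    equal-count ranges of the predicted values."""
    order = np.argsort(predicted)
    groups = np.array_split(order, n_bins)
    return [(residuals[g].mean(), residuals[g].std(ddof=1)) for g in groups]


# Simulated data in which the model assumptions hold.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0.0, 10.0, n)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

design = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ b
residuals = y - predicted

summary = residual_summary_by_range(predicted, residuals)
for mean, spread in summary:
    print(f"mean residual {mean:6.3f}, spread {spread:5.3f}")
```

When the assumptions hold, as in this simulated case, the means stay near zero and the spreads are similar across ranges. A mean that drifts away from zero in some range points to nonlinearity, and spreads that grow or shrink point to nonconstant variance.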

15
Residual by predicted plot does not suggest
nonlinearity and suggests approximately constant
variance.
Plot of horsepower vs. residuals suggests
linearity is okay.
Plot of weight vs. residuals suggests linearity
is okay. One potential concern is that highest
weight cars all have negative residuals.
16
Plot of residuals vs. horsepower suggests
linearity is okay. Highest 4 horsepower cars all
have negative residuals but next 5 highest
horsepower cars all have positive residuals.
Plot of residuals vs. seating suggests linearity
is not perfect for seating. Residuals for
small and high seating seem to have a mean that
is smaller than 0.
17
Coefficient of Determination

The coefficient of determination $R^2$ for
multiple regression is defined as for simple
linear regression:
$R^2 = 1 - SSE/SST$, where $SSE = \sum_i (y_i - \hat{y}_i)^2$
and $SST = \sum_i (y_i - \bar{y})^2$.
Represents the proportion of variation in y that is
explained by the multiple regression line.
$R^2$ is between 0 and 1. The closer to 1, the
better the fit of the regression equation to the
data.
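
As a sketch of the calculation, the coefficient of determination can be computed directly from the sums of squares. The data below are simulated and the variable names are illustrative assumptions.

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(4)
n = 60
X = rng.normal(size=(n, 2))
y = 5.0 + X @ np.array([1.0, 3.0]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ b

sse = np.sum((y - predicted) ** 2)   # variation left unexplained
sst = np.sum((y - y.mean()) ** 2)    # total variation in y
r_squared = 1.0 - sse / sst

print(f"R squared: {r_squared:.3f}")
```

For a least squares fit with an intercept, this value also equals the squared correlation between y and the predicted values.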

18
Assessing Quality of Prediction (Chapter 3.5.3)

R squared is a measure of the fit of the
regression to the sample data. It is not
generally considered an adequate measure of the
regression's ability to predict the responses for
new observations.
One method of assessing the ability of the
regression to predict the responses for new
observations is data splitting.
We split the data into two groups: a training
sample and a holdout sample (also called a
validation sample). We fit the regression model
to the training sample and then assess the
quality of its predictions for
the holdout sample.
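
Data splitting can be sketched in a few lines. The example below uses simulated data rather than collegeclass.JMP; the two-thirds training share, the helper `add_intercept`, and the use of root mean squared error as the quality measure are assumptions made for illustration.

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(5)
n = 120
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Randomly assign two thirds of the rows to the training sample
# (an assumed split; other proportions are possible).
shuffled = rng.permutation(n)
train, holdout = shuffled[:80], shuffled[80:]


def add_intercept(M):
    """Prepend a column of ones to the explanatory variables."""
    return np.column_stack([np.ones(len(M)), M])


# Fit on the training sample only.
b, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)

# Assess predictions on the holdout sample, which the fit never saw.
holdout_errors = y[holdout] - add_intercept(X[holdout]) @ b
holdout_rmse = np.sqrt(np.mean(holdout_errors ** 2))

print(f"holdout RMSE: {holdout_rmse:.3f}")
```

Because the holdout observations played no part in estimating the coefficients, the holdout error gives a more honest picture of how the regression would predict new responses than R squared on the training sample does.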

19
College Data in collegeclass.JMP