Stat 112: Lecture 7 Notes - PowerPoint PPT Presentation

Slides: 20
Provided by: dsma3

Transcript and Presenter's Notes



1
Stat 112 Lecture 7 Notes
  • Homework 2 due next Thursday
  • The Multiple Linear Regression model (Chapter
    4.1)
  • Inferences from multiple regression analysis
    (Chapter 4.2)

2
Interpretation of Regression Coefficients
  • Gas mileage regression from Car89.JMP

3
Partial Slopes vs. Marginal Slopes
  • Multiple Linear Regression Model:
    $E(Y \mid X_1,\ldots,X_K) = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K$
  • The coefficient $\beta_k$ is a partial slope. It
    indicates the change in the mean of $Y$ that is
    associated with a one-unit increase in $X_k$ while
    holding all other variables
    $X_1,\ldots,X_{k-1},X_{k+1},\ldots,X_K$ fixed.
  • A marginal slope is obtained when we perform a
    simple regression with only one X, ignoring all
    other variables. Consequently, the other
    variables are not held fixed.
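The difference can be checked numerically. The sketch below is a minimal illustration using numpy on simulated data; the variables `x1`, `x2`, `y` and their coefficients are hypothetical and not taken from any course data set. It fits the multiple regression and the simple regression on $X_1$ alone, and confirms the least squares identity linking them: the marginal slope of $X_1$ equals $\hat\beta_1 + \hat\beta_2 d$, where $d$ is the slope from regressing $X_2$ on $X_1$.

```python
import numpy as np

def ols(X, y):
    """Least squares fit of y on the columns of X plus an intercept; returns [b0, b1, ...]."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)              # x2 moves together with x1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

partial = ols(np.column_stack([x1, x2]), y)     # multiple regression: [b0, b1, b2]
marginal = ols(x1, y)                           # simple regression on x1 only: [a0, a1]
d = ols(x1, x2)[1]                              # slope of x2 on x1

print("partial slope of x1: ", partial[1])
print("marginal slope of x1:", marginal[1])
# Exact least squares identity (holds in every sample): marginal slope = b1 + b2 * d
print("b1 + b2 * d:         ", partial[1] + partial[2] * d)
```

Because `x2` was simulated as 0.8 * `x1` plus noise, the population marginal slope is 2.0 + (-1.5)(0.8) = 0.8, so the simple regression should report a slope well below the partial slope of about 2.0.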

4
Partial vs. Marginal Slopes Example
5
Partial Slopes vs. Marginal Slopes: Another Example
  • To evaluate the benefits of a proposed
    irrigation scheme in a certain region, suppose
    that the relation of yield Y to rainfall R is
    investigated over several years.
  • The data are in rainfall.JMP.

6
(No Transcript)
7
Higher rainfall is associated with lower
temperature.
8
Rainfall is estimated to be beneficial once
temperature is held fixed.
Multiple regression provides a better picture of
the benefits of an irrigation scheme because
irrigation adds water without changing the
temperature, so the relevant effect is the effect
of rainfall with temperature held fixed.
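A small simulation can reproduce the pattern described for the rainfall data. This is a hedged sketch with made-up numbers: `rain`, `temp`, `crop_yield` and every coefficient are hypothetical, not estimates from rainfall.JMP. Wetter years are simulated to be cooler, and both rainfall and warmth raise yield, so a simple regression that ignores temperature makes rainfall look harmful.

```python
import numpy as np

def slopes(y, *xs):
    """Least squares slopes (intercept dropped) of y on the given explanatory variables."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(1)
n = 500
rain = rng.uniform(10, 40, size=n)                         # hypothetical rainfall
temp = 30 - 0.5 * rain + rng.normal(0, 2, size=n)          # more rain, lower temperature
crop_yield = 5 + 1.0 * rain + 3.0 * temp + rng.normal(0, 3, size=n)

marginal_rain = slopes(crop_yield, rain)[0]                # temperature ignored
partial_rain, partial_temp = slopes(crop_yield, rain, temp)  # temperature held fixed

print(f"marginal slope of rainfall: {marginal_rain:.2f}")
print(f"partial slope of rainfall:  {partial_rain:.2f}")
```

In this simulation the population marginal slope is 1.0 + 3.0 * (-0.5) = -0.5, while the partial slope is +1.0, so the two regressions disagree even in sign.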
9
Inferences about Regression Coefficients
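  • Confidence intervals: a $100(1-\alpha)\%$
    confidence interval for $\beta_k$ is
    $\hat\beta_k \pm t_{\alpha/2,\,n-(K+1)} \, SE(\hat\beta_k)$
  • Degrees of freedom for t equals $n-(K+1)$.
    The standard error of $\hat\beta_k$, $SE(\hat\beta_k)$, is found on the JMP
    output.
  • Hypothesis test: $H_0: \beta_k = \beta_k^*$ vs. $H_a: \beta_k \neq \beta_k^*$
  • Decision rule for the test: Reject $H_0$ if
    $t \ge t_{\alpha/2,\,n-(K+1)}$ or $t \le -t_{\alpha/2,\,n-(K+1)}$,
    where $t = (\hat\beta_k - \beta_k^*)/SE(\hat\beta_k)$.
  • The p-value for testing $H_0: \beta_k = 0$ is
    printed in JMP output under Prob>|t|.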
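The interval and test above can be computed from the estimate and standard error that JMP reports. This is a sketch assuming SciPy is available (`scipy.stats.t.ppf` for the critical value, `scipy.stats.t.sf` for the upper tail area); the function name `coef_inference` and the numbers in the usage line are hypothetical.

```python
from scipy import stats

def coef_inference(beta_hat, se, n, K, beta_null=0.0, alpha=0.05):
    """CI and two-sided t test for one coefficient in a regression with K explanatory variables."""
    df = n - (K + 1)                             # degrees of freedom n-(K+1)
    t_crit = stats.t.ppf(1 - alpha / 2, df)      # t_{alpha/2, n-(K+1)}
    ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
    t_stat = (beta_hat - beta_null) / se
    p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-sided; matches Prob>|t| when beta_null = 0
    reject = abs(t_stat) >= t_crit               # decision rule at level alpha
    return {"df": df, "t_crit": t_crit, "ci": ci, "t": t_stat, "p": p_value, "reject": reject}

# Hypothetical numbers (not from Car89.JMP): estimate 2.0, SE 0.5, n = 14, K = 3
res = coef_inference(2.0, 0.5, n=14, K=3)
print(res)
```

With $n = 14$ and $K = 3$ there are 10 degrees of freedom, so the critical value is about 2.228 and the test statistic is 2.0 / 0.5 = 4.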

10
Inference Examples
  • Find a 95% confidence interval for ?
  • Is seating of any help in predicting gas mileage
    once horsepower, weight and cargo have been taken
    into account? Carry out a test at the 0.05
    significance level.
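To see where JMP's Estimate, Std Error, t Ratio and Prob>|t| columns come from, here is a sketch that computes them with numpy and SciPy using the standard formula $SE(\hat\beta_k) = \sqrt{s^2 [(X^\top X)^{-1}]_{kk}}$ with $s^2 = SSE/(n-(K+1))$. The car data are not reproduced here: the columns `horsepower`, `weight`, `cargo`, `seating` and `mpg` are simulated stand-ins with made-up coefficients, so the printed numbers say nothing about Car89.JMP.

```python
import numpy as np
from scipy import stats

def parameter_estimates(X, y):
    """Estimate, std error, t ratio and two-sided p-value per coefficient (intercept first)."""
    n, K = X.shape
    Xd = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ coef
    df = n - (K + 1)
    s2 = resid @ resid / df                          # estimate of the disturbance variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    t = coef / se                                    # t ratio for H0: beta_k = 0
    p = 2 * stats.t.sf(np.abs(t), df)                # two-sided p-value (Prob>|t|)
    return coef, se, t, p

# Simulated stand-in for the car data; seating has no real effect by construction.
rng = np.random.default_rng(2)
n = 60
horsepower = rng.normal(130, 30, n)
weight = rng.normal(3000, 400, n)
cargo = rng.normal(15, 4, n)
seating = rng.integers(4, 7, n).astype(float)
mpg = 50 - 0.03 * horsepower - 0.006 * weight - 0.1 * cargo + rng.normal(0, 1.5, n)

X = np.column_stack([horsepower, weight, cargo, seating])
coef, se, t, p = parameter_estimates(X, mpg)
for name, row in zip(["Intercept", "horsepower", "weight", "cargo", "seating"],
                     zip(coef, se, t, p)):
    print("{:<11} est={:9.4f} se={:8.4f} t={:6.2f} p={:.4f}".format(name, *row))
print("Reject H0: beta_seating = 0 at 0.05?", p[4] < 0.05)
```

The seating question corresponds to the last row: reject $H_0: \beta_{\text{seating}} = 0$ at the 0.05 level if its p-value is below 0.05. Since seating was given no effect in this simulation, a rejection here would be a Type I error.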

11
(No Transcript)
12
Checking Assumptions
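Multiple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_K X_{iK} + e_i$, $i = 1,\ldots,n$

  • The expected value of the disturbances is zero
    for each $i$: $E(e_i) = 0$.
  • The variance of each $e_i$ is equal to $\sigma_e^2$,
    i.e., $\mathrm{Var}(e_i) = \sigma_e^2$.
  • The $e_i$ are normally distributed.
  • The $e_i$ are independent.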

13
Plots for Checking Assumptions
  • We can construct residual plots of each
    explanatory variable $X_k$ vs. the residuals.
  • We save the residuals by clicking the red
    triangle next to Response after fitting the model
    and clicking Save Columns and then Residuals. We
    then plot $X_k$ vs. the residuals using Fit Y by X
    (where Y = the residuals). We can add a
    horizontal line at 0 by fitting a line in Fit Y by X:
    it is a property of multiple linear regression that the
    least squares line for the regression of the
    residuals on any $X_k$ is a horizontal line at 0.
  • A useful summary of the residual plots for each
    explanatory variable is the Residual by Predicted
    plot that is automatically produced by
    Fit Model. The residual by predicted plot is a
    plot of the predicted values
    $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{i1} + \cdots + \hat\beta_K X_{iK}$,
    $i = 1,\ldots,n$, vs. the residuals $\hat e_i = Y_i - \hat Y_i$.
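The horizontal-line property can be verified directly. This numpy sketch (simulated data, hypothetical coefficients) fits a multiple regression, then fits a least squares line of the residuals on each $X_k$ and on the predicted values. Every intercept and slope comes out as zero up to floating-point rounding, because least squares residuals are orthogonal to the intercept column and to every explanatory variable, and hence to any linear combination of them such as $\hat Y$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 50, 3
X = rng.normal(size=(n, K))                      # hypothetical explanatory variables
y = 2 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta_hat                            # predicted values
resid = y - y_hat                                # residuals

def ls_line(x, r):
    """Intercept and slope of the least squares line of r on x."""
    Z = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(Z, r, rcond=None)[0]

lines = [ls_line(X[:, k], resid) for k in range(K)]
for k, (a, b) in enumerate(lines, start=1):
    print(f"residuals on X{k}: intercept={a:.1e}, slope={b:.1e}")  # both ~0 up to rounding
pred_line = ls_line(y_hat, resid)                # same holds in the residual by predicted plot
print("residuals on predicted values:", pred_line)
```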

14
Checking Assumptions
  • Linearity:
      • Check that in the residual by predicted plot, the
        mean of the residuals for each range of the
        predicted values is about zero.
      • Check that in each residual plot, the mean of the
        residuals for each range of the explanatory
        variable is about zero.
  • Constant variance: Check that in the residual by
    predicted plot, the spread of the residuals is
    about the same for each range of the predicted
    values.
  • Normality: Plot a histogram of the residuals.
    Check that the histogram is bell shaped.
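These checks are visual, but a rough numerical companion can help. The sketch below uses hypothetical helpers (not anything JMP provides): it sorts observations by predicted value, splits them into equal-count bins, and reports the mean residual (about 0 if linearity holds) and the residual SD (roughly equal across bins if the variance is constant). It also prints a crude text histogram for the normality check. The example data are simulated with a spread that grows with the predicted value, so the SD column should increase.

```python
import numpy as np

def binned_residual_summary(y_hat, resid, n_bins=5):
    """Mean and SD of residuals within equal-count bins of the predicted values."""
    order = np.argsort(y_hat)
    summary = []
    for idx in np.array_split(order, n_bins):
        summary.append((y_hat[idx].min(), y_hat[idx].max(),
                        resid[idx].mean(), resid[idx].std(ddof=1)))
    return summary

def text_histogram(resid, bins=8):
    """Crude text histogram of the residuals for eyeballing a bell shape."""
    counts, edges = np.histogram(resid, bins=bins)
    return [f"{lo:7.2f} | {'*' * int(c)}" for c, lo in zip(counts, edges[:-1])]

# Simulated example whose residual spread grows with the predicted value
rng = np.random.default_rng(4)
y_hat = rng.uniform(10, 30, 200)
resid = rng.normal(0, 0.1 * y_hat)
for lo, hi, m, s in binned_residual_summary(y_hat, resid):
    print(f"predicted {lo:5.1f}-{hi:5.1f}: mean={m:6.2f}  sd={s:5.2f}")
print("\n".join(text_histogram(resid)))
```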

15
Residual by predicted plot does not suggest a problem
with linearity and suggests approximately constant variance.
Plot of horsepower vs. residuals suggests
linearity is okay.
Plot of weight vs. residuals suggests linearity
is okay. One potential concern is that the
highest-weight cars all have negative residuals.
16
Plot of residuals vs. horsepower suggests
linearity is okay. The 4 highest-horsepower cars all
have negative residuals, but the next 5
highest-horsepower cars all have positive residuals.
Plot of residuals vs. seating suggests linearity
is not perfect for seating. Residuals for
low and high seating values seem to have a mean
that is less than 0.
17
Coefficient of Determination
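  • The coefficient of determination $R^2$ for
    multiple regression is defined as for simple
    linear regression:
    $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$,
    where $SSE = \sum_i (Y_i - \hat Y_i)^2$ and $SST = \sum_i (Y_i - \bar Y)^2$.
  • $R^2$ represents the proportion of the variation in $Y$
    that is explained by the multiple regression equation.
  • $R^2$ is between 0 and 1. The closer to 1, the
    better the fit of the regression equation to the
    data.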
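A minimal numpy sketch of the definition, using a tiny made-up data set (the function name `r_squared` is hypothetical). For a least squares fit that includes an intercept, $SSR/SST$ and $1 - SSE/SST$ agree, which the last line checks; the same function applies unchanged to predicted values from a multiple regression.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

# Tiny hypothetical data set, fit by least squares with an intercept
x = np.array([1.0, 2, 3, 4, 5])
y = np.array([2.0, 4, 5, 4, 5])
Xd = np.column_stack([np.ones(len(x)), x])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ beta_hat

ssr_over_sst = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
print(r_squared(y, y_hat), ssr_over_sst)   # both 0.6 (up to rounding)
```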

18
Assessing Quality of Prediction (Chapter 3.5.3)
  • $R^2$ is a measure of the fit of the
    regression to the sample data. It is not
    generally considered an adequate measure of the
    regression's ability to predict the responses for
    new observations.
  • One method of assessing the ability of the
    regression to predict the responses for new
    observations is data splitting.
  • We split the data into two groups: a training
    sample and a holdout sample (also called a
    validation sample). We fit the regression model
    to the training sample and then assess the
    quality of the model's predictions for the
    holdout sample.
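Data splitting can be sketched in a few lines of numpy. This version trains on the first rows and holds out the rest; the function names and the simulated data are hypothetical. If the rows are ordered in some meaningful way (by time or by size, say), a random split using `np.random.default_rng().permutation` may give a fairer holdout sample.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares coefficients [b0, b1, ..., bK] with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted values for the rows of X from coefficients [b0, b1, ..., bK]."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def split_fit_predict(X, y, n_train):
    """Fit on rows [0, n_train) (training sample); predict rows [n_train, n) (holdout sample)."""
    coef = fit_ols(X[:n_train], y[:n_train])
    return y[n_train:], predict(coef, X[n_train:])

# Usage with simulated data: first 40 rows train the model, last 10 are held out
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 2))
y = 3 + X @ np.array([1.5, -2.0]) + rng.normal(size=50)
y_holdout, y_pred = split_fit_predict(X, y, n_train=40)
print(np.column_stack([y_holdout, y_pred]))
```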

19
College Data in collegeclass.JMP
  • Training sample: 40 observations.
  • Holdout sample: last 10 observations.
  • Mean squared deviation (MSD): mean squared prediction
    error over the holdout sample,
    $MSD = \frac{1}{n_2}\sum_{i=1}^{n_2} (Y_i - \hat Y_i)^2$,
    where the sum is over the $n_2$ (10 here)
    observations in the holdout sample.
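The MSD is simply the average squared prediction error on the holdout rows. The sketch below uses a simulated 50-row data set (a hypothetical stand-in, not collegeclass.JMP) with the first 40 rows for training and the last 10 held out, and reports the in-sample mean squared residual next to the holdout MSD for comparison.

```python
import numpy as np

def msd(y_holdout, y_pred):
    """Mean squared deviation: average squared prediction error over the n2 holdout rows."""
    y_holdout = np.asarray(y_holdout, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_holdout - y_pred) ** 2)

# Simulated data: 50 rows, first 40 for training, last 10 held out
rng = np.random.default_rng(6)
X = rng.normal(size=(50, 2))
y = 1 + X @ np.array([0.7, 0.4]) + rng.normal(size=50)
Xd = np.column_stack([np.ones(50), X])
beta_hat, *_ = np.linalg.lstsq(Xd[:40], y[:40], rcond=None)

train_mse = msd(y[:40], Xd[:40] @ beta_hat)      # in-sample fit
holdout_msd = msd(y[40:], Xd[40:] @ beta_hat)    # out-of-sample prediction error
print(f"training mean squared residual: {train_mse:.3f}")
print(f"holdout MSD:                    {holdout_msd:.3f}")
```

The training figure divides SSE by $n_1$ rather than by $n_1-(K+1)$, so it is slightly smaller than $s^2$. The holdout MSD is typically larger than the in-sample figure, since the model was not fit to those rows.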