1
Multiple Regression
  • 12.1 The Linear Regression Model and Assumptions
  • 12.2 The Least Squares Estimates and Prediction
  • 12.3 The Mean Squared Error and the Standard Error
  • 12.4 Model Utility: R² and Adjusted R²
  • 12.5 The Overall F-Test
  • 12.6 Testing Significance of Independent Variables
  • 12.10 Dummy Variables

2
Multiple Regression
  • One independent variable may not be sufficient to adequately explain the variation in our dependent variable.
  • We may have to include more than one independent variable in the model.
  • There is a separate slope coefficient for each independent variable.
  • We can use the multiple regression model to make predictions of the dependent variable, y.

3
12.1 The Linear Regression Model
The linear regression model relating y to x1, x2, ..., xk is

    y = β0 + β1x1 + β2x2 + ... + βkxk + ε

where
  • β0 + β1x1 + β2x2 + ... + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2, ..., xk
  • β0, β1, β2, ..., βk are unknown regression parameters
  • ε is an error term that describes the effects on y of all factors other than x1, x2, ..., xk
4
The Regression Model Assumptions
Assumptions about the model error terms, the ε's:
  • Mean Zero: The mean of the error terms is equal to 0.
  • Constant Variance: The variance of the error terms, σ², is the same for every combination of values of x1, x2, ..., xk.
  • Normality: The error terms follow a normal distribution for every combination of values of x1, x2, ..., xk.
  • Independence: The values of the error terms are statistically independent of each other.
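Not part of the original slides: a minimal Python simulation sketch that makes the model and its error-term assumptions concrete, generating y = β0 + β1x1 + β2x2 + ε with independent, mean-zero, constant-variance normal errors. All parameter values and ranges below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100

# Hypothetical "true" parameters (illustration only)
beta0, beta1, beta2 = 13.0, -0.09, 0.08
sigma = 0.4  # constant error standard deviation

# Two independent variables and iid normal errors: mean 0, variance sigma^2
x1 = rng.uniform(20, 65, n)
x2 = rng.uniform(0, 25, n)
eps = rng.normal(0.0, sigma, n)

# The linear regression model
y = beta0 + beta1 * x1 + beta2 * x2 + eps
```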
5
12.2 Least Squares Estimates and Prediction
Estimation/Prediction Equation:

    ŷ = b0 + b1x01 + b2x02 + ... + bkx0k

ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k.

b1, b2, ..., bk are the least squares point estimates of the parameters β1, β2, ..., βk. x01, x02, ..., x0k are specified values of the independent predictor variables x1, x2, ..., xk.
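As a rough illustration of obtaining the least squares estimates and a point prediction in Python with statsmodels (added here, not from the slides; the data values below are made up and are not the textbook case data):

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: dependent variable y with two predictors x1 and x2
x1 = np.array([28.0, 32.5, 39.0, 45.9, 50.2, 57.8, 58.1, 62.5])
x2 = np.array([18.0, 24.0, 22.0,  8.0, 16.0, 16.0,  1.0,  0.0])
y  = np.array([12.4, 12.4, 10.8,  9.4,  9.9,  9.5,  8.0,  7.5])

# Design matrix with an intercept column; OLS gives b0, b1, b2
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)

# Point estimate / prediction at specified values x01 = 40, x02 = 10
x_new = np.array([[1.0, 40.0, 10.0]])
print(fit.predict(x_new))
```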
6
Example: The Linear Regression Model
Example 12.1: The Fuel Consumption Case
9
Multiple Regression: Fuel Consumption Based on Temperature and Chill Index
Example 12.3: The Fuel Consumption Case

Minitab Output:

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor    Coef      StDev     T       P
Constant     13.1087   0.8557    15.32   0.000
Temp         -0.09001  0.01408   -6.39   0.001
Chill        0.08249   0.02200   3.75    0.013

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source          DF   SS       MS       F       P
Regression      2    24.875   12.438   92.30   0.000
Residual Error  5    0.674    0.135
Total           7    25.549

Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI           95.0% PI
10.333   0.170       (9.895, 10.771)    (9.293, 11.374)
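As a quick sanity check (added, not part of the slides), the point prediction in the output above can be reproduced directly from the reported coefficients:

```python
# Point prediction at Temp = 40, Chill = 10 using the coefficients above
b0, b_temp, b_chill = 13.1087, -0.09001, 0.08249
y_hat = b0 + b_temp * 40 + b_chill * 10
print(round(y_hat, 3))  # 10.333, matching the "Fit" value
```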
10
Example: Point Predictions and Residuals
Example 12.3: The Fuel Consumption Case
11
Interpreting Slope Estimates in a Multiple Regression Model

Interpretation is similar to simple regression, but each slope is interpreted with the other explanatory variables in the model held constant.

Temp: For a given chill index, mean weekly fuel consumption is expected to decrease by 0.09 units for each additional one-degree (°F) rise in temperature.
12
Interpreting Slope Estimates in a Multiple Regression Model

Chill: For a given average outside temperature, mean weekly fuel consumption is expected to increase by 0.083 units for each one-unit increase in the chill index.
13
12.3 Mean Square Error and Standard Error

Sum of Squared Errors:

    SSE = Σ(yi - ŷi)²

Mean Square Error, the point estimate of the residual variance σ²:

    MSE = s² = SSE / (n - (k+1))

Standard Error, the point estimate of the residual standard deviation σ:

    s = √MSE

Example 12.3: The Fuel Consumption Case

Analysis of Variance
Source          DF   SS       MS       F       P
Regression      2    24.875   12.438   92.30   0.000
Residual Error  5    0.674    0.135
Total           7    25.549
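A short check of these definitions against the ANOVA table above (an added illustration, not from the slides):

```python
import math

sse = 0.674      # Residual Error sum of squares from the ANOVA table
n, k = 8, 2      # 8 observations, 2 independent variables

mse = sse / (n - (k + 1))
s = math.sqrt(mse)
print(round(mse, 3), round(s, 3))  # about 0.135 and 0.367, matching MS and S
```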
14
12.4 Model Utility: Multiple Coefficient of Determination, R²

R² is the proportion of the total variation in y explained by the linear regression model:

    R² = Explained variation / Total variation
15
The Adjusted R2
  • Adding an independent variable to a multiple regression will raise R².
  • R² will rise slightly even if the new variable has no relationship to y.
  • The adjusted R² corrects this tendency in R².
  • As a result, it gives a better estimate of the importance of the independent variables (a worked check follows below).
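Added for illustration (the slides do not show this computation): a common form of the adjusted R² formula is R̄² = 1 - (1 - R²)(n - 1)/(n - (k + 1)). Applying it to the Minitab values reported earlier:

```python
r2, n, k = 0.974, 8, 2   # R-Sq, sample size, number of independent variables

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - (k + 1))
print(round(adj_r2, 3))  # about 0.964, in line with R-Sq(adj) = 96.3%
                         # (the small gap comes from rounding R-Sq to 97.4%)
```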

16
Comparison with Simple Coefficient of Determination

The simple coefficient of determination, r², is the proportion of the total variation in y explained by the simple linear regression model:

    r² = Explained variation / Total variation
17
Comparison with Simple Correlation Coefficient

The simple correlation coefficient measures the strength of the linear relationship between y and x and is denoted by r:

    r = +√r²  if b1 is positive,   r = -√r²  if b1 is negative

where b1 is the slope of the least squares line.
18
12.5 Model Utility: F Test for the Multiple Regression Model. Are Any Variables Useful?

To test H0: β1 = β2 = ... = βk = 0 versus Ha: at least one of β1, β2, ..., βk is not equal to 0. Sometimes referred to as the Global F-test.

Test Statistic:

    F(model) = (Explained variation / k) / (SSE / (n - (k+1)))

Reject H0 in favor of Ha if F(model) > Fα or p-value < α. Fα is based on k numerator and n - (k+1) denominator degrees of freedom.
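A minimal Python sketch of this test (an added illustration; it assumes the explained variation, SSE, n, and k are already known, and uses scipy for the F distribution):

```python
from scipy import stats

explained, sse = 24.875, 0.674   # Regression SS and Residual Error SS
n, k = 8, 2

f_model = (explained / k) / (sse / (n - (k + 1)))
p_value = stats.f.sf(f_model, k, n - (k + 1))   # upper-tail p-value
print(round(f_model, 2), round(p_value, 4))     # about 92.3 and a p-value near 0
```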
19
Example: F Test for Linear Regression
Example 12.5: The Fuel Consumption Case, Minitab Output

Analysis of Variance
Source          DF   SS       MS       F       P
Regression      2    24.875   12.438   92.30   0.000
Residual Error  5    0.674    0.135
Total           7    25.549

Test Statistic: F(model) = (24.875 / 2) / (0.674 / 5) ≈ 92.3, which matches the F value in the Minitab output.
20
12.6 Testing Significance of an Independent Variable: Which Ones Are Significant?

If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α.

Test Statistic:

    t = bj / s_bj   (s_bj is the standard error of the estimate bj)

Alternative      Reject H0 if     p-Value
Ha: βj ≠ 0       |t| > tα/2       twice the area under the t curve to the right of |t|
Ha: βj > 0       t > tα           area under the t curve to the right of t
Ha: βj < 0       t < -tα          area under the t curve to the left of t

100(1 - α)% Confidence Interval for βj:

    [bj ± tα/2 · s_bj]

tα, tα/2 and p-values are based on n - (k+1) degrees of freedom.
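A minimal Python sketch of this test and interval for a single coefficient, using the Chill estimate and standard error from the Minitab output shown earlier (an added illustration; scipy supplies the t distribution):

```python
from scipy import stats

b_j, se_bj = 0.08249, 0.02200   # Chill coefficient and its standard error
df = 8 - (2 + 1)                # n - (k+1) = 5 degrees of freedom

t_stat = b_j / se_bj
p_value = 2 * stats.t.sf(abs(t_stat), df)       # two-sided p-value
t_crit = stats.t.ppf(1 - 0.05 / 2, df)          # t_{alpha/2} for a 95% interval
ci = (b_j - t_crit * se_bj, b_j + t_crit * se_bj)
print(round(t_stat, 2), round(p_value, 3),
      [round(v, 3) for v in ci])                # about 3.75, 0.013, (0.026, 0.139)
```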
21
Example: Testing and Estimation for the β's
Example 12.6: The Fuel Consumption Case, Minitab Output

Predictor    Coef      StDev     T       P
Constant     13.1087   0.8557    15.32   0.000
Temp         -0.09001  0.01408   -6.39   0.001
Chill        0.08249   0.02200   3.75    0.013

Test: for Chill, t = 0.08249 / 0.02200 ≈ 3.75, with p-value 0.013.
Interval: the 95% confidence interval for the Chill coefficient is 0.08249 ± 2.571(0.02200), approximately (0.026, 0.139).

Chill is significant at the α = 0.05 level, but not at the α = 0.01 level.

tα, tα/2 and p-values are based on 5 degrees of freedom.
22
12.7 Confidence and Prediction Intervals in Simple Regression Compared to Multiple Regression

If the regression assumptions hold:

100(1 - α)% confidence interval for the mean value of y, μy|x0:

    [ŷ ± tα/2 · s · √(1/n + (x0 - x̄)² / SSxx)]

100(1 - α)% prediction interval for an individual value of y:

    [ŷ ± tα/2 · s · √(1 + 1/n + (x0 - x̄)² / SSxx)]

where SSxx = Σ(xi - x̄)², and tα/2 is based on n - 2 degrees of freedom.
23
Example: C.I. and P.I. in Simple Regression
Example 11.7: The Fuel Consumption Case, Minitab Output (predicted FuelCons when Temp, x = 40)

Predicted Values
Fit      StDev Fit   95.0% CI            95.0% PI
10.721   0.241       (10.130, 11.312)    (9.014, 12.428)
24
C.I. and P.I. in Multiple Regression Prediction

If the regression assumptions hold:

100(1 - α)% confidence interval for the mean value of y:

    [ŷ ± tα/2 · s · √(distance value)]

100(1 - α)% prediction interval for an individual value of y:

    [ŷ ± tα/2 · s · √(1 + distance value)]

(The distance value requires matrix algebra; it is provided in MegaStat output.)

tα/2 is based on n - (k+1) degrees of freedom.
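These intervals can also be read off fitted-model output; a minimal Python sketch with statsmodels follows (an added illustration using made-up data, not the textbook case):

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: y with two predictors (illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 2))
y = 5 + 2 * X[:, 0] - X[:, 1] + rng.normal(0, 1, 30)

fit = sm.OLS(y, sm.add_constant(X)).fit()

# 95% C.I. for the mean value of y and 95% P.I. for an individual value
# of y at a new point (x01, x02) = (4, 7)
x_new = np.array([[1.0, 4.0, 7.0]])
frame = fit.get_prediction(x_new).summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper",
             "obs_ci_lower", "obs_ci_upper"]])
```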
25
Example: C.I. and P.I. in Multiple Regression
Example 12.9: The Fuel Consumption Case, Minitab Output

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI           95.0% PI
10.333   0.170       (9.895, 10.771)    (9.293, 11.374)

95% Confidence Interval: 10.333 ± 2.571(0.170), approximately (9.895, 10.771)
95% Prediction Interval: (9.293, 11.374)
26
12.10 Using Dummy Variables to Model Qualitative Independent Variables
Part 3
  • So far, we have only looked at including quantitative data in a regression model.
  • However, we may wish to include descriptive qualitative data as well.
  • For example, we might want to include the gender of respondents.
  • We can model the effects of different levels of a qualitative variable by using what are called dummy variables.
  • Also known as indicator variables.

27
How to Construct Dummy Variables
  • A dummy variable always has a value of either 0 or 1.
  • For example, to model sales at two locations, we would code the first location as 0 and the second as 1.
  • Operationally, it does not matter which location is coded 0 and which is coded 1.

28
What If We Have More Than Two Categories?
  • Consider having three categories, say A, B, and C.
  • We cannot code this using one dummy variable.
  • A = 0, B = 1, and C = 2 would be invalid: it assumes the difference between A and B is the same as the difference between B and C.
  • We must use multiple dummy variables. Specifically, a categories require a - 1 dummy variables.
  • For A, B, and C, we would need two dummy variables:
  • x1 is 1 for A, zero otherwise.
  • x2 is 1 for B, zero otherwise.
  • If x1 and x2 are both zero, the category must be C; this is why a third dummy variable is not needed (see the sketch after this list).
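A minimal Python sketch of this coding (an added illustration; the category values below are made up):

```python
import pandas as pd

# Hypothetical categorical data with three levels A, B, C
df = pd.DataFrame({"category": ["A", "B", "C", "B", "A", "C"]})

# Two dummy variables for three categories, with C as the reference level:
# x1 = 1 for A (0 otherwise), x2 = 1 for B (0 otherwise); (0, 0) identifies C
df["x1"] = (df["category"] == "A").astype(int)
df["x2"] = (df["category"] == "B").astype(int)
print(df)
```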

29
12.10 Dummy Variables
Example 12.11 The Electronics World Case
Code 0 for the category you wish to be the
reference
30
Example: Regression with a Dummy Variable
Example 12.11: The Electronics World Case, Minitab Output

Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor    Coef      StDev     T       P
Constant     17.360    9.447     1.84    0.109
Househol     0.85105   0.06524   13.04   0.000
Mall         29.216    5.594     5.22    0.001

S = 7.329   R-Sq = 98.3%   R-Sq(adj) = 97.8%

Analysis of Variance
Source          DF   SS      MS      F        P
Regression      2    21412   10706   199.32   0.000
Residual Error  7    376     54
Total           9    21788
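A rough sketch of fitting a model with a 0/1 dummy in Python (an added illustration; the variable names follow the output above but the data values are made up, not the case data):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: Sales, Households, and a 0/1 dummy DM for mall location
df = pd.DataFrame({
    "Sales":      [120, 150, 210, 180, 250, 300, 280, 330, 360, 400],
    "Households": [110, 140, 200, 175, 220, 280, 270, 310, 340, 380],
    "DM":         [0,   0,   1,   0,   1,   1,   0,   1,   1,   1],
})

# The dummy shifts the intercept: stores with DM = 1 get an extra b_DM units
# of mean Sales at any given number of Households
fit = smf.ols("Sales ~ Households + DM", data=df).fit()
print(fit.params)
```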
31
Interpreting Slope Estimates for Dummy Variables

Since the variable's values are limited to 0 and 1, we cannot refer to a unit increase. Instead we describe the change in the mean y value when x = 1 compared to when x = 0.

Mall: For a given number of households, the mean sales volume for stores in a mall location is expected to be 29.22 units higher than that for stores in other (street) locations.