Title: TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL
Test 1: Are Any of the x's Useful in Predicting y?
- We are asking: Can we conclude that at least one of the β's (other than β0) ≠ 0?
- H0: β1 = β2 = β3 = β4 = 0
- HA: At least one of these β's ≠ 0
- α = .05
Idea of the Test
- Measure the overall average variability due to changes in the x's.
- Measure the overall average variability that is due to randomness (error).
- If the overall average variability due to changes in the x's IS A LOT LARGER than the average variability due to error, we conclude that at least one β is non-zero, i.e., at least one factor (x) is useful in predicting y.
Total Variability
- Just as with simple linear regression, we have the total sum of squares due to regression (SSR) and the total sum of squares due to error (SSE), both of which are printed on the EXCEL output.
- The formulas are more complicated (they involve matrix operations).
Average Variability
- Average variability (mean variability) for a group is defined as the total variability divided by the degrees of freedom associated with that group.
- Mean Squares Due to Regression
- MSR = SSR/DFR
- Mean Squares Due to Error
- MSE = SSE/DFE
Degrees of Freedom
- Total number of degrees of freedom: DF(Total) is always n - 1.
- Degrees of freedom for regression (DFR): the number of factors in the regression (i.e., the number of x's in the linear regression).
- Degrees of freedom for error (DFE): the difference between the two, DF(Total) - DFR (which equals n - DFR - 1).
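The degrees-of-freedom rules and the mean-square definitions can be combined in a few lines. This is a sketch with a hypothetical function name; the SSR and SSE values in the usage line are made up, while n = 38 and k = 3 are chosen to match the F.05,3,34 comparison used later in this module.

```python
def anova_table(ssr, sse, n, k):
    """Degrees of freedom and mean squares for a regression with k x's on n observations."""
    df_total = n - 1          # always n - 1
    dfr = k                   # one degree of freedom per x
    dfe = df_total - dfr      # = n - k - 1
    return {
        "DF(Total)": df_total,
        "DFR": dfr,
        "DFE": dfe,
        "MSR": ssr / dfr,     # mean squares due to regression
        "MSE": sse / dfe,     # mean squares due to error
    }


# Made-up sums of squares; n and k match an F.05,3,34 comparison
table = anova_table(ssr=300.0, sse=340.0, n=38, k=3)
print(table)
```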
The F-Statistic
- The F-statistic is defined as the ratio of two measures of variability. Here, F = MSR/MSE.
- Recall we are saying that if MSR is large compared to MSE, at least one β ≠ 0.
- Thus if F is large, we conclude that HA is true, i.e., at least one β ≠ 0.
The F-test
- Large compared to what?
- F-tables give critical values for given values of α.
- TEST: REJECT H0 (accept HA) if
- F = MSR/MSE > Fα,DFR,DFE
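The F-statistic and the rejection rule can be sketched as below. The Python standard library has no F-distribution quantile function, so the critical value Fα,DFR,DFE is passed in, taken from an F-table or from Excel's FINV(α, DFR, DFE) (with SciPy installed, scipy.stats.f.ppf(1 - alpha, dfr, dfe) gives the same right-tail critical value). The function name and the SSR/SSE numbers are hypothetical.

```python
def overall_f_test(ssr, sse, n, k, f_crit):
    """Return (F, reject_H0) for H0: all slope coefficients are 0.

    f_crit is the critical value F(alpha, DFR, DFE) from a table or FINV.
    """
    dfr = k               # degrees of freedom for regression
    dfe = n - k - 1       # degrees of freedom for error
    msr = ssr / dfr
    mse = sse / dfe
    f = msr / mse
    return f, f > f_crit  # reject H0 (accept HA) when F exceeds the critical value


# Made-up sums of squares; 2.882601 is the FINV(.05,3,34) value quoted in this module
f, reject = overall_f_test(ssr=300.0, sse=340.0, n=38, k=3, f_crit=2.882601)
print(f, reject)
```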
RESULTS
- If we do not get a large F statistic:
- We cannot conclude that any of the variables in this model are significant in predicting y.
- If we do get a large F statistic:
- We can conclude that at least one of the variables is significant for predicting y.
- NATURAL QUESTION --
- WHICH ONES?
Results
- We see that the F statistic is 20.89856.
- This would be compared to F.05,3,34.
- From the F.05 table, the value of F.05,3,34 is not given.
- But F.05,3,30 = 2.92 and F.05,3,40 = 2.84.
- And 20.89856 > either of these numbers.
- The actual value of F.05,3,34 can be calculated in Excel with =FINV(.05,3,34), which gives 2.882601.
- USE SIGNIFICANCE F
- This is the p-value for the F-test.
- Significance F = 7.46 × 10^-8 = .0000000746 < .05
- We can conclude that at least one x is useful in predicting y.
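The reasoning on this slide can be checked in a few lines using only the numbers it quotes. Because F.05,3,34 lies between the two neighbouring table values, beating the larger one (2.92) is enough. The linear interpolation between the df = 30 and df = 40 rows is a rough approximation added here for illustration only; it is not how Excel computes FINV, but it lands close to 2.882601.

```python
# Values quoted on the Results slide
f_stat = 20.89856
f_30, f_40 = 2.92, 2.84        # F.05,3,30 and F.05,3,40 from the F.05 table
f_34_excel = 2.882601          # Excel: =FINV(.05,3,34)
significance_f = 7.46e-8       # p-value for the F-test
alpha = 0.05

# F.05,3,34 lies between the two table values, so exceeding the larger one suffices
bracket_ok = f_stat > max(f_30, f_40)

# Rough linear interpolation between the table rows (approximation only)
f_34_interp = f_30 + (34 - 30) / (40 - 30) * (f_40 - f_30)

reject_by_critical_value = f_stat > f_34_excel
reject_by_p_value = significance_f < alpha
print(bracket_ok, round(f_34_interp, 3), reject_by_critical_value, reject_by_p_value)
```

Both routes, the critical value and Significance F, lead to the same decision, as they must for the same α.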
Test 2: Which Variables Are Significant IN THIS MODEL?
- The question we are asking is: taking all the other factors (x's) into consideration, does a change in a particular x (x3, say) significantly affect y?
- This is another hypothesis test (a t-test).
- To test if the age of the house is significant:
- H0: β3 = 0 (x3 is not significant in this model)
- HA: β3 ≠ 0 (x3 is significant in this model)
The t-test for a particular factor IN THIS MODEL
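The formula for this test did not survive in this transcript; the standard form is assumed below. Here b_j is the estimated coefficient (the "Coefficients" column of the Excel printout) and s_{b_j} is its standard error (the "Standard Error" column); Excel's "t Stat" column is this ratio and its "P-value" column is the two-sided p-value. In the house-price example DFE = 34, inferred from the F.05,3,34 comparison.

```latex
t = \frac{b_j}{s_{b_j}}, \qquad \mathrm{df} = \mathrm{DFE} = n - k - 1,
\qquad \text{reject } H_0\colon \beta_j = 0 \ \text{ if } \ |t| > t_{\alpha/2,\,\mathrm{DFE}}
```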
Reading Printout for the t-test
- Simply look at the p-value.
- p-value for testing β3 = 0 is .021931 < .05
- Thus the age of the house is significant in this model.
- The other variables:
- p-value for testing β1 = 0 is .0000839 < .05
- Thus square feet is significant in this model.
- p-value for testing β2 = 0 is .15501 > .05
- Thus land (acres) is not significant in this model.
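Reading the printout amounts to comparing each p-value with α. A minimal sketch using the three p-values listed above; the function name significance_in_model is hypothetical.

```python
ALPHA = 0.05

# p-values for H0: beta_j = 0, as read from the Excel printout in this module
p_values = {
    "square feet (beta1)": 0.0000839,
    "land, acres (beta2)": 0.15501,
    "age of house (beta3)": 0.021931,
}


def significance_in_model(p_values, alpha=ALPHA):
    """Map each variable to True if its t-test rejects H0: beta_j = 0 at level alpha."""
    return {name: p < alpha for name, p in p_values.items()}


result = significance_in_model(p_values)
for name, significant in result.items():
    verdict = "significant" if significant else "not significant"
    print(f"{name}: {verdict} in this model")
```

The verdicts hold only for this particular set of x's; dropping or adding a variable changes every p-value.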
Does a Poor t-value Imply the Variable Is Not Useful in Predicting y?
- NO
- It says the variable is not significant IN THIS MODEL when we consider all the other factors.
- In this model, land is not significant when included with square footage and age.
- But if we had run this model without square footage, we would have gotten the output on the next slide.
Can it even happen that F says at least one variable is significant, but none of the t's indicate a useful variable?
- YES
- EXAMPLES IN WHICH THIS MIGHT HAPPEN:
- Miles per gallon vs. horsepower and engine size
- Salary vs. GPA and GPA in major
- Income vs. age and experience
- HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND
- There is a relation between the x's.
- Multicollinearity
Approaches That Could Be Used When Multicollinearity Is Detected
- Eliminate some variables and run again.
- Stepwise regression
- This is discussed in a future module.
Test 3: What Proportion of the Overall Variability in y Is Due to Changes in the x's?
- R2
- R2 = .442199
- Overall, 44% of the total variation in sales price is explained by changes in square footage, land, and age of the house.
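R2 is the share of the total sum of squares that the regression explains, R2 = SSR/SST with SST = SSR + SSE. A minimal sketch with a hypothetical function name; the last line only shows how the printout's R2 = .442199 reads as a percentage.

```python
def r_squared(ssr, sse):
    """Proportion of the total variability in y explained by the x's: SSR / SST."""
    sst = ssr + sse   # total sum of squares
    return ssr / sst


# Reporting the printout's R2 = .442199 as a percentage
print(f"{0.442199:.0%} of the variation in sales price is explained by the x's")
```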
What Is Adjusted R2?
- Adjusted R2 adjusts R2 to take degrees of freedom into account.
- By assuming a higher-order equation for y, we can force the curve to fit this one set of data points in the model, eliminating much of the variability (see next slide).
- But this does not reflect what is really going on!
- R2 might be higher, but adjusted R2 might be much lower.
- Adjusted R2 takes this into account.
- Adjusted R2 = 1 - MSE/MST, where MST = SST/(n - 1)
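The effect described above can be seen numerically: adding x's that barely reduce SSE still raises R2, but the lost error degrees of freedom push adjusted R2 down. All numbers below are made up for illustration, and the function names are hypothetical.

```python
def r_squared(sse, sst):
    """R2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R2 = 1 - MSE/MST, with MSE = SSE/(n - k - 1) and MST = SST/(n - 1)."""
    mse = sse / (n - k - 1)
    mst = sst / (n - 1)
    return 1 - mse / mst


n, sst = 20, 100.0  # made-up sample size and total sum of squares

# (label, number of x's, SSE): the second model adds 4 x's that barely reduce SSE
models = [("2 x's", 2, 40.0), ("6 x's", 6, 38.0)]
for label, k, sse in models:
    r2 = r_squared(sse, sst)
    adj = adjusted_r_squared(sse, sst, n, k)
    print(f"{label}: R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")
```

Equivalently, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), which makes the penalty for each extra x explicit.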
Scatterplot
Review
- Are any of the x's useful in predicting y IN THIS MODEL?
- Look at the p-value for the F-test: Significance F.
- F = MSR/MSE would be compared to Fα,DFR,DFE.
- Which variables are significant in this model?
- Look at the p-values for the individual t-tests.
- What proportion of the total variance in y can be explained by changes in the x's?
- R2
- Adjusted R2 takes into account the reduced degrees of freedom for the error term that results from including more terms in the model.
4 Places to Look on Excel Printout