TESTING THE STRENGTH - PowerPoint PPT Presentation

About This Presentation
Title:

TESTING THE STRENGTH

Description:

Test 1: Are Any of the x's Useful in Predicting y? ... Miles per gallon vs. horsepower and engine size. Salary vs. GPA and GPA in major ... – PowerPoint PPT presentation

Slides: 26
Provided by: JohnLa

Transcript and Presenter's Notes


1
  • TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

2
Test 1: Are Any of the x's Useful in Predicting y?
  • We are asking: Can we conclude at least one of
    the β's (other than β0) ≠ 0?
  • H0: β1 = β2 = β3 = β4 = 0
  • HA: At least one of these β's ≠ 0
  • α = .05

3
Idea of the Test
  • Measure the overall average variability due to
    changes in the x's
  • Measure the overall average variability that is
    due to randomness (error)
  • If the overall average variability due to
    changes in the x's IS A LOT LARGER than the average
    variability due to error, we conclude at least one β
    is non-zero, i.e. at least one factor (x) is
    useful in predicting y

4
Total Variability
  • Just like with simple linear regression, we have
    the total sum of squares due to regression, SSR, and
    the total sum of squares due to error, SSE, both of
    which are printed on the Excel output.
  • The formulas are a bit more complicated (they involve
    matrix operations)
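
As a rough illustration of the matrix operations behind SSR and SSE, the sketch below solves the normal equations (X'X)b = X'y by Gaussian elimination and then sums the squared deviations. The function names and the data are hypothetical, and this is not how Excel computes its output; it only shows where the two sums of squares come from.

```python
def fit_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination.

    X is a list of rows of x values (no intercept column); a column of
    1s is added here so that b[0] is the intercept.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(A[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (A[i][p] - s) / A[i][i]
    return b


def sums_of_squares(X, y, b):
    """Return (SSR, SSE) for the fitted coefficients b."""
    y_bar = sum(y) / len(y)
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ssr, sse


# Hypothetical data: y depends on two x's plus a little noise.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [8.1, 6.9, 19.2, 17.8, 30.1, 28.9]
b = fit_least_squares(X, y)
ssr, sse = sums_of_squares(X, y, b)
print("coefficients:", b)
print("SSR:", ssr, "SSE:", sse)
```

Because the model includes an intercept, SSR and SSE add up to the total sum of squares of y about its mean.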

5
Average Variability
  • Average variability (Mean variability) for a
    group is defined as the Total Variability divided
    by the degrees of freedom associated with that
    group
  • Mean Squares Due to Regression
  • MSR = SSR/DFR
  • Mean Squares Due to Error
  • MSE = SSE/DFE

6
Degrees of Freedom
  • Total number of degrees of freedom: DF(Total) =
    n - 1, always
  • Degrees of freedom for regression (DFR) = the
    number of factors in the regression (i.e. the
    number of x's in the linear regression)
  • Degrees of freedom for error (DFE) = the difference
    between the two = DF(Total) - DFR

7
The F-Statistic
  • The F-statistic is defined as the ratio of two
    measures of variability. Here, F = MSR/MSE.
  • Recall we are saying that if MSR is large compared
    to MSE, at least one β ≠ 0.
  • Thus if F is large, we conclude
    that HA is true, i.e. at least one β ≠ 0.
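
The degrees of freedom, mean squares, and F ratio can be put together in a few lines. This is a minimal sketch: the function name and the sums of squares are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def anova_table(ssr, sse, n, k):
    """Mean squares and F for a regression with k x's and n observations."""
    df_total = n - 1
    dfr = k                  # one degree of freedom per x
    dfe = df_total - dfr     # what is left over for error
    msr = ssr / dfr
    mse = sse / dfe
    return {"DFR": dfr, "DFE": dfe, "MSR": msr, "MSE": mse, "F": msr / mse}


# Hypothetical sums of squares for n = 38 observations and k = 3 x's.
table = anova_table(ssr=900.0, sse=510.0, n=38, k=3)
print(table)
```

With these made-up inputs, MSR = 900/3 = 300 and MSE = 510/34 = 15, so F = 20.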

8
The F-test
  • Large compared to what?
  • F-tables give critical values for given values of α
  • TEST: REJECT H0 (Accept HA) if
  • F = MSR/MSE > Fα,DFR,DFE

9
RESULTS
  • If we do not get a large F statistic
  • We cannot conclude that any of the variables in
    this model are significant in predicting y.
  • If we do get a large F statistic
  • We can conclude at least one of the variables is
    significant for predicting y.
  • NATURAL QUESTION --
  • WHICH ONES?

10
(No Transcript)
11
(No Transcript)
12
Results
  • We see that the F statistic is 20.89856
  • This would be compared to F.05,3,34
  • From the F.05 Table, the value of F.05,3,34 is
    not given.
  • But F.05,3,30 = 2.92 and F.05,3,40 = 2.84.
  • And 20.89856 > either of these numbers.
  • The actual value of F.05,3,34 can be calculated
    by Excel: =FINV(.05,3,34) gives 2.882601
  • USE SIGNIFICANCE F
  • This is the p-value for the F-Test
  • Significance F = 7.46 x 10^-8 = .0000000746 < .05
  • Can conclude that at least one x is useful in
    predicting y
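
The critical value and the Significance F can also be checked outside Excel. The sketch below is one pure-Python way to do it, not Excel's method: it uses the identity P(F > f) = I_x(DFE/2, DFR/2) with x = DFE/(DFE + DFR·f), evaluates the incomplete beta integral with Simpson's rule, and finds the critical value by bisection. The function names are hypothetical, and it assumes f > 0 and DFE ≥ 2. With SciPy installed, scipy.stats.f.sf and scipy.stats.f.isf would be the more usual route.

```python
import math


def f_p_value(f_stat, dfr, dfe, steps=2000):
    """Upper-tail probability P(F > f_stat) for an F(dfr, dfe) distribution.

    Integrates the beta density on [0, x] with Simpson's rule, where
    x = dfe / (dfe + dfr * f_stat). Assumes f_stat > 0 and dfe >= 2.
    """
    a, b = dfe / 2.0, dfr / 2.0
    x = dfe / (dfe + dfr * f_stat)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0 / math.exp(log_beta)


def f_critical(alpha, dfr, dfe):
    """Value c with P(F > c) = alpha, found by bisection."""
    lo, hi = 1e-9, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f_p_value(mid, dfr, dfe) > alpha:
            lo = mid   # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Figures from the slide: F = 20.89856 with DFR = 3 and DFE = 34.
print("critical value:", f_critical(0.05, 3, 34))
print("p-value:", f_p_value(20.89856, 3, 34))
```

Both results should land close to the slide's 2.882601 and 7.46 x 10^-8.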

13
Test 2: Which Variables Are Significant IN THIS MODEL?
  • The question we are asking is: taking all the
    other factors (x's) into consideration, does a
    change in a particular x (x3, say) value
    significantly affect y?
  • This is another hypothesis test (a t-test).
  • To test if the age of the house is significant:
  • H0: β3 = 0 (x3 is not significant in this
    model)
  • HA: β3 ≠ 0 (x3 is significant in this model)

14
The t-test for a particular factor IN THIS MODEL
  • Reject H0 (Accept HA) if
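
For reference, the usual two-sided rule for this kind of test rejects H0 when |t| > t(α/2, DFE), where t is the estimated coefficient divided by its standard error. The sketch below assumes that standard rule. The function name, the coefficient, and the standard error are hypothetical, and the critical value 2.032 is an approximate table value for t.025 with 34 degrees of freedom.

```python
def t_test_rejects(coefficient, std_error, t_critical):
    """Two-sided t-test of H0: beta_i = 0 in this model.

    t_critical is t(alpha/2, DFE), taken from a t-table or computed
    elsewhere (for example with Excel's TINV(alpha, DFE)).
    Returns the t statistic and whether H0 is rejected.
    """
    t_stat = coefficient / std_error
    return t_stat, abs(t_stat) > t_critical


# Hypothetical coefficient and standard error for one x.
t_stat, reject = t_test_rejects(coefficient=-0.8, std_error=0.33, t_critical=2.032)
print("t =", round(t_stat, 3), "reject H0:", reject)
```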

15
(No Transcript)
16
Reading Printout for the t-test
  • Simply look at the p-value
  • p-value for β3 = 0 is .021931 < .05
  • Thus the age of the house is significant in this
    model
  • The other variables
  • p-value for β1 = 0 is .0000839 < .05
  • Thus square feet is significant in this model
  • p-value for β2 = 0 is .15501 > .05
  • Thus the land (acres) is not significant in this
    model
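
The same reading of the printout can be written as a comparison of each p-value with α. The p-values below are the ones quoted on this slide; the dictionary layout and variable names are just for illustration.

```python
ALPHA = 0.05

# p-values quoted on the slide for the three x's.
p_values = {
    "square feet": 0.0000839,
    "land (acres)": 0.15501,
    "age": 0.021931,
}

# A variable is significant in this model when its p-value is below alpha.
significant = {name: p < ALPHA for name, p in p_values.items()}

for name, flag in significant.items():
    verdict = "significant" if flag else "not significant"
    print(f"{name}: p = {p_values[name]} -> {verdict} in this model")
```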

17
Does A Poor t-value Imply the Variable is not
Useful in Predicting y?
  • NO
  • It says the variable is not significant IN THIS
    MODEL when we consider all the other factors.
  • In this model, land is not significant when
    included with square footage and age.
  • But if we had run this model without
    square footage, we would have gotten the output on
    the next slide.

18
(No Transcript)
19
Can it even happen that F says at least one
variable is significant, but none of the t's
indicate a useful variable?
  • YES
  • EXAMPLES IN WHICH THIS MIGHT HAPPEN
  • Miles per gallon vs. horsepower and engine size
  • Salary vs. GPA and GPA in major
  • Income vs. age and experience
  • HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND
  • There is a relation between the x's
  • Multicollinearity
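
One simple way to spot a relation between two x's is to compute their correlation before running the regression. The sketch below does this for made-up square footage and land figures. The data and the function name are hypothetical, and a high correlation is only a warning sign, not a formal test.

```python
import math


def correlation(u, v):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


# Hypothetical houses: bigger houses tend to sit on bigger lots.
square_feet = [1200, 1500, 1800, 2100, 2400, 2700, 3000]
land_acres = [0.20, 0.26, 0.29, 0.36, 0.41, 0.44, 0.52]

r = correlation(square_feet, land_acres)
print("correlation between the x's:", round(r, 3))
```

When two x's move together this closely, the model cannot easily separate their individual effects, which is how a large F can appear alongside unimpressive t-values.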

20
Approaches That Could Be Used When
Multicollinearity Is Detected
  • Eliminate some variables and run again
  • Stepwise regression
  • This is discussed in a future module.

21
Test 3: What Proportion of the Overall
Variability in y Is Due to Changes in the x's?
  • R²
  • R² = .442199
  • Overall, 44% of the total variation in sales price
    is explained by changes in square footage, land,
    and age of the house.

22
What is Adjusted R²?
  • Adjusted R² adjusts R² to take into account
    degrees of freedom.
  • By assuming a higher order equation for y, we can
    force the curve to fit this one set of data
    points in the model, eliminating much of the
    variability (See next slide).
  • But this is not what is going on!
  • R² might be higher but adjusted R² might be
    much lower
  • Adjusted R² takes this into account
  • Adjusted R² = 1 - MSE/MST, where MST = SST/(n - 1)
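
A small numeric sketch of this effect, assuming the usual definitions R² = SSR/SST and adjusted R² = 1 - MSE/MST with MST = SST/(n - 1). The function names and sums of squares are hypothetical: ten data points fitted first with one x, then with six terms of a higher order equation.

```python
def r_squared(ssr, sse):
    """Proportion of total variation explained: SSR / SST."""
    return ssr / (ssr + sse)


def adjusted_r_squared(ssr, sse, n, k):
    """1 - MSE/MST, where MSE = SSE/(n-k-1) and MST = SST/(n-1)."""
    sst = ssr + sse
    mse = sse / (n - k - 1)
    mst = sst / (n - 1)
    return 1 - mse / mst


n = 10
# Hypothetical fits to the same 10 points (SST = 100 in both cases).
simple = {"ssr": 60.0, "sse": 40.0, "k": 1}        # one x
higher_order = {"ssr": 70.0, "sse": 30.0, "k": 6}  # six terms

for label, m in (("simple", simple), ("higher order", higher_order)):
    r2 = r_squared(m["ssr"], m["sse"])
    adj = adjusted_r_squared(m["ssr"], m["sse"], n, m["k"])
    print(f"{label}: R2 = {r2:.2f}, adjusted R2 = {adj:.2f}")
```

In this made-up case the extra terms raise R² from 0.60 to 0.70, yet adjusted R² falls from 0.55 to 0.10 because only 3 degrees of freedom are left for error.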

23
Scatterplot
24
Review
  • Are any of the x's useful in predicting y IN THIS
    MODEL?
  • Look at p-value for F-test: Significance F
  • F = MSR/MSE would be compared to Fα,DFR,DFE
  • Which variables are significant in this model?
  • Look at p-values for the individual t-tests
  • What proportion of the total variance in y can be
    explained by changes in the x's?
  • R²
  • Adjusted R² takes into account the reduced
    degrees of freedom for the error term caused by
    including more terms in the model

25
4 Places to Look on Excel Printout