Part II: Inference for MLR - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Part II: Inference for MLR


1
Chapter 9
  • Part II: Inference for MLR

2
SLR alternative
  • So far we have discussed the different parts of
    the JMP output: Summary of Fit, Parameter
    Estimates, and (just barely) Analysis of Variance
    (ANOVA from here out)
  • We can use the Parameter Estimates to get a CI and
    do a hypothesis test for
  • H0: β1 = 0 vs HA: β1 ≠ 0
  • There is another way to do this using the ANOVA
    table in SLR

3
SLR alternative
  • Fact: In a SLR context, under the SLR model, if
    H0: β1 = 0 is true, then
  • F = MSM/MSE = SSM / (SSE/(n - 2))
  • has a so-called F(1, n-2) distribution

4
F-distribution
  • An F-distribution with degrees of freedom ν1 and
    ν2 is labeled F(ν1, ν2)
  • Table B.6A-E gives some F distribution quantiles

5
F-distribution
  • The F distribution quantile tables are very similar to
    t-tables

6
F-distribution
  • Example: U ~ F(2,8). Find a so that P(U > a) = 0.05.
  • The 0.95 quantile of F(2,8) is 4.46,
  • i.e. P(F(2,8) < 4.46) = 0.95, so
  • P(F(2,8) > 4.46) = 0.05
  • Example: V ~ F(1,6). Find a so that P(V > a) = 0.01.
  • The 0.99 quantile of F(1,6) is 13.75,
  • i.e. P(F(1,6) < 13.75) = 0.99, so
  • P(F(1,6) > 13.75) = 0.01 (a software check of these lookups follows)
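  • (A quick software check of these lookups, as a sketch only: the
    notes use Table B.6, and SciPy here is an outside assumption.)

    from scipy import stats

    # quantiles, to compare against Table B.6
    print(stats.f.ppf(0.95, dfn=2, dfd=8))   # about 4.46
    print(stats.f.ppf(0.99, dfn=1, dfd=6))   # about 13.75
    # right-tail probabilities at those cutoffs
    print(stats.f.sf(4.46, 2, 8))            # about 0.05
    print(stats.f.sf(13.75, 1, 6))           # about 0.01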

7
F-distribution
  • Find the p-value for each of the following:
  • f = 4, ν1 = 3, ν2 = 10: at the ν1 = 3, ν2 = 10
    spot,
  • Q(.95) = 3.71 < 4 < 6.55 = Q(.99),
  • so .01 < p-value < .05
  • f = 10, ν1 = 2, ν2 = 20: at the ν1 = 2, ν2 = 20
    spot,
  • Q(.999) = 9.95 < 10,
  • so p-value < .001
  • f = 1, ν1 = 8, ν2 = 30: at the ν1 = 8, ν2 = 30
    spot,
  • 1 < 1.37 = Q(.75),
  • so p-value > .25 (exact values via software follow)
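  • (The tables only bracket each p-value; exact tail areas can be had
    from software. Again a sketch assuming SciPy.)

    from scipy import stats

    print(stats.f.sf(4, 3, 10))    # roughly 0.04, consistent with .01 < p-value < .05
    print(stats.f.sf(10, 2, 20))   # roughly 0.001, consistent with p-value < .001
    print(stats.f.sf(1, 8, 30))    # roughly 0.45, consistent with p-value > .25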

8
SLR alternative
  • Fact: The square of the t-statistic for testing
    H0: β1 = 0 is
  • F = t² = (b1 / SE(b1))² = MSM/MSE,
  • which has an F(1, n-2) distribution if H0 is true
    and tends to be larger if H0 is false
  • i.e. we can use large F as evidence against
    H0, which is a sensible testing method (a quick
    numerical check is sketched below)
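  • (A numeric illustration, not from the notes: fit an SLR by least
    squares and check that the slope t-statistic squared equals the
    ANOVA F. statsmodels and the simulated data are outside assumptions.)

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=20)
    y = 2 + 0.5 * x + rng.normal(scale=1.0, size=20)

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    t_slope = fit.tvalues[1]          # t-statistic for the slope b1
    print(t_slope**2, fit.fvalue)     # the two values agree up to rounding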

9
SLR alternative
  • These calculations are summarized in the ANOVA
    table
  • ANOVA table from SLR (for testing H0: β1 = 0)
  • F is the test statistic for this test, and the
    table gives the corresponding p-value as well

10
SLR alternative
  • Example: Stress / time till failure
  • x = uniaxial stress applied (kg/mm2)
  • y = time till fracture (hours)

11
SLR alternative
  [JMP output for the stress/failure-time fit: Summary of Fit,
  Analysis of Variance, and Parameter Estimates reports]
12
SLR alternative
  • Stress / time till failure ANOVA table
  • F = 13.77 on (1, 8) degrees of freedom, p-value = 0.006: reject H0
    and conclude β1 ≠ 0.
  • Using the Parameter Estimates table yields exactly the
    same result

13
Multiple Linear Regression Review
  • MLR: the term used to describe fitting equations
    with multiple experimental (x) variables
  • e.g. yi = β0 + β1x1i + β2x2i + εi or yi = β0 +
    β1xi + β2xi² + εi
  • The Principle of Least Squares is still used to
    fit such models
  • Minimize Σ(yi - ŷi)², the sum of squared deviations of the
    observations from the fitted equation
  • Recall: the SLR hand formulas don't apply here, so we rely on
    JMP to get our least squares estimates (a rough software
    sketch follows)
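  • (Illustration only: the notes use JMP for these fits, but the same
    least squares estimates for, say, a degree-2 polynomial model can be
    computed with NumPy. The data below are made up, not the example data.)

    import numpy as np

    x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
    y = np.array([2.1, 3.9, 7.8, 12.2, 18.5, 26.0, 36.1, 47.9])

    # design matrix with columns 1, x, x^2 (parameters b0, b1, b2)
    X = np.column_stack([np.ones_like(x), x, x**2])
    b, resid, rank, sv = np.linalg.lstsq(X, y, rcond=None)
    print("least squares estimates b0, b1, b2:", b)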

14
Example
  • Model I: yi = β0 + β1xi + εi

15
Example
  • Model II: yi = β0 + β1xi + β2xi² + εi

16
Example
  • Model III: yi = β0 + β1xi + β2xi² + β3xi³ + εi

17
Example
  • Use the output from the three models to write
    out the following fitted models with the appropriate
    estimates:
  • Model that predicts the mean of y for all x values
  • Polynomial model of degree 1
  • Polynomial model of degree 2
  • Polynomial model of degree 3

18
Example
  • Which model should we use and why?
  • So far, we can only look at residual plots and R²
  • Model I: R² = 0.211 and the residual plot is
    quadratic: BAD
  • Model II: R² = 0.873 and the residual plot is
    random: not too bad
  • Model III: R² = 0.876 and the residual plot is
    random: not too bad, but slightly worse than Model
    II

19
Multiple Linear Regression Review
  • Common models, each with εi iid N(0, σ²):
  • Constant mean: yi = β0 + εi
  • Simple linear regression: yi = β0 + β1xi + εi
  • Multiple linear regression: yi = β0 + β1x1i + ... + βkxki + εi

20
Multiple Linear Regression Review
  • Example: A table in the textbook contains data
    from the operation of a plant for the oxidation
    of ammonia to nitric acid. In plant operation,
    the nitric oxides produced are absorbed in an
    absorption tower. The three experimental
    (predictor/x) variables are x1 (the rate of
    operation of the plant), x2 (cooling water inlet
    temperature), and x3 (acid concentration, which
    is the percent circulating minus 50, times 10).
    The response variable is y (stack loss, which is
    ten times the percentage of ingoing ammonia that
    escapes the absorption column unabsorbed, i.e.,
    an inverse measure of overall plant efficiency).
  • Note: In any model fitting exercise, the first
    step should be to visualize the data. For
    multiple linear regression models, a good place
    to start is by examining the correlation
    matrix and all possible bivariate scatterplots.

21
Multiple Linear Regression Review
  • Producing the correlation matrix and all possible
    scatterplots in JMP
  • Directions: Click Analyze, then Multivariate
    Methods, and finally Multivariate. For each
    variable, highlight the name of the variable and
    click Y, Columns. Click OK. (A rough non-JMP
    equivalent is sketched below.)
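  • (A rough non-JMP equivalent, as a sketch only: pandas/matplotlib and
    the file name stack_loss.csv with columns x1, x2, x3, y are
    assumptions, not part of the notes.)

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("stack_loss.csv")               # hypothetical file name
    print(df[["x1", "x2", "x3", "y"]].corr())        # correlation matrix
    pd.plotting.scatter_matrix(df[["x1", "x2", "x3", "y"]])  # all bivariate scatterplots
    plt.show()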

22
Multiple Linear Regression Review
  • If I gave you the following models with their
    respective R² values, which model would you
    choose?
  • Model 1: R² = .950
  • Model 2: R² = .695
  • Model 3: R² = .165
  • Model 4: R² = .973 (only .023 higher than Model 1)
  • Model 5: R² = .952
  • Model 6: R² = .975 (only .002 higher than Model 4; too small a gain?)

23
Multiple Linear Regression Review
  • What model would you choose?
  • The 1st model with R² = .950?
  • The 4th model with R² = .973?
  • The 6th model with R² = .975?
  • Is the difference between the 4th and 6th significant
    enough to warrant adding an extra term to the
    model?
  • No!
  • From the way R² is calculated, adding terms inflates
    R² slightly regardless of whether they actually help the
    model
  • Check R²adj instead (its formula is given below)
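  • (For reference, R²adj = 1 - (1 - R²)(n - 1)/(n - p), which penalizes
    extra parameters. In the sketch below, n = 17 and p = 3 are
    illustrative values, not stated in the notes, chosen so the result
    matches the .969 shown on the next slide.)

    def r2_adj(r2, n, p):
        # adjusted R^2 for n observations and p parameters (including the intercept)
        return 1 - (1 - r2) * (n - 1) / (n - p)

    print(round(r2_adj(0.973, 17, 3), 3))   # 0.969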

24
Multiple Linear Regression Review
  • If I gave you the following models with their
    respective R² and R²adj values, which model would
    you choose?
  • Model 1: R² = .950, R²adj = .947
  • Model 2: R² = .695, R²adj = .674
  • Model 3: R² = .165, R²adj = .109
  • Model 4: R² = .973, R²adj = .969
  • Model 5: R² = .952, R²adj = .945
  • Model 6: R² = .975, R²adj = .969

25
Multiple Linear Regression Review
  • When building statistical models, we must be
    careful not to put everything but the kitchen
    sink into the model
  • Occam's razor: "All things being equal, the
    simplest solution tends to be the best one."
  • Recall with MLR: we need to look at residual plots
    for the whole model (observed vs. predicted) and
    for each x variable individually

26
Inference for MLR
  • If the model yi = β0 + β1x1i + ... + βkxki + εi holds
    and E(εi) = 0, then E(yi) = β0 + β1x1i + ... + βkxki
  • Note: our assumptions for the model don't change
    when we add variables
  • i.e. we still need εi iid N(0, σ²), which can
    still be checked using residual plots
  • From the SLR notes, we know how to get a confidence
    interval and perform appropriate hypothesis tests
    for β0 and β1. What happens to each part of the
    JMP output when we add variables and include
    polynomials?

27
Example
  • Model I: yi = β0 + β1xi + εi

28
Example
  • Model II: yi = β0 + β1xi + β2xi² + εi

29
Example
  • Model III: yi = β0 + β1xi + β2xi² + β3xi³ + εi

30
Inference for MLR
  • Summary of Fit box (RSquare, Mean of Response,
    Observations)
  • Nothing changes
  • For MLR, we generally look at R²adj instead of R²
  • Parameter Estimates (Term, Estimate, Std Error,
    t Ratio, Prob>|t|)
  • Nothing changes; we simply add more terms
  • Produces t-test results for every term

31
Inference for MLR
  • What will happen to the ANOVA table? (Source,
    DF, SS, MS, F Ratio)
  • The DF change to reflect the number of parameters in
    the model
  • The F-test is different depending on the model
  • The rows stay the same; the column values and their
    interpretations change as the model changes

32
Inference for MLR
  • In general, the ANOVA table for n observations and p
    parameters in the model is
  • Source     DF      Sum of Squares      Mean Square    F Ratio
  • Model      p - 1   SSM (from table)    SSM/(p - 1)    MSM/MSE
  • Error      n - p   SSE (from table)    SSE/(n - p)    Prob > F: p-value
  • C. Total   n - 1   SST (from table)
  • Recall: p is the number of parameters (the number of
    βs, including β0) and n is the number of observations
  • The F ratio performs a test of H0: β1 = β2 = ... = βp-1 = 0 vs
    HA: at least one βi ≠ 0 for i = 1, 2, ..., p - 1

33
Inference for MLR
  • We can use F-tests to compare two models!
  • What does R² represent? R² = SSM/SST, the fraction of the
    total variation in y explained by the model
  • How do you calculate SSTot? SST = Σ(yi - ȳ)²

34
Inference for MLR
  • We can use F-tests to compare two models!
  • How do you calculate SSE? SSE = Σ(yi - ŷi)²
  • How do you calculate SSM? SSM = SST - SSE = Σ(ŷi - ȳ)²

35
Inference for MLR
  • Recall the 3 models compared earlier; compare the
    following values
  • Note:
  • For all three models, SST stays the same (this is
    because the predicted values do not appear in its
    formula)
  • As model complexity increases, so does SSM, which
    means SSE decreases

36
Inference for MLR
  • Using these ideas, we can test to find the better
    model between two nested models
  • Nested models: all the parameters in the reduced
    model are contained in the fuller model, along with
    at least one more
  • Example: yi = β0 + β1xi + εi is nested inside
    yi = β0 + β1xi + β2xi² + εi, which in turn is nested
    inside the degree 3 polynomial model

37
Inference for MLR
  • We can compare the ANOVA tables for the
    polynomial degree 1 and the polynomial degree 2
    models

p - 1 = 1, so there is 1 non-intercept term; this
is our degree 1 polynomial. Its F-test tests H0: β1 =
0 vs HA: β1 ≠ 0.
p - 1 = 2, so there are 2 non-intercept terms;
this is our degree 2 polynomial. Its F-test tests
H0: β1 = β2 = 0 vs HA: at least one βi ≠ 0, i =
1, 2.
38
Inference for MLR
  • Example: Fill in the missing blanks in the
    ANOVA table for the following model: yi = β0 +
    β1x1i + β2x2i + εi
  • State the hypotheses and interpret the p-value
    for this test

39
Inference for MLR
  • Example: Fill in the missing blanks in the
    ANOVA table for the following model: yi = β0 +
    β1x1i + β2x2i + εi
  • From the table we know the C. Total DF is 42 (so n = 43);
    from the model we know p = 3
  • 1. DFModel = p - 1 = 2
  • 2. DFError = DFTotal - DFModel = 42 - 2 = 40
  • 3. SSE = SST - SSM = 80,000
  • 4. MSM = SSM/DFModel = 10,000
  • 5. MSE = SSE/DFError = 2,000
  • 6. F = MSM/MSE = 5 (an arithmetic check follows below)
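  • (A quick arithmetic check of these entries. SST = 100,000 and
    SSM = 20,000 are the table values implied by the answers above,
    not numbers stated in the transcript.)

    p, df_total = 3, 42
    sst, ssm = 100_000, 20_000              # assumed ANOVA-table values
    df_model = p - 1                        # 2
    df_error = df_total - df_model          # 40
    sse = sst - ssm                         # 80,000
    msm, mse = ssm / df_model, sse / df_error   # 10,000 and 2,000
    print(msm / mse)                        # F = 5.0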

40
Inference for MLR
  • State the hypotheses that this F-ratio is testing
    and find the p-value.
  • H0: β1 = β2 = 0 vs. HA: at least one βi ≠ 0 for
    i = 1, 2
  • This F-ratio follows an F(2,40) distribution, so
    the p-value is
  • P(F(2,40) > 5)
  • Q(.95) = 3.23 < 5 < 5.18 = Q(.99),
  • so .01 < p-value < .05 (checked with software below)
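  • (Software check of the bracketed p-value, again assuming SciPy.)

    from scipy import stats
    print(stats.f.sf(5, 2, 40))   # about 0.012, which is indeed between .01 and .05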

41
Inference for MLR
  • In general, when we compare two nested models, the
    test statistic is
  • F = [(SSEred - SSEfull)/(dfred - dffull)] / [SSEfull/dffull],
    which has an F(dfred - dffull, dffull) distribution when H0
    (the reduced model is adequate) holds
  • Note: FTR (failing to reject) H0 means choose the reduced
    model; rejecting H0 means choose the full model

42
Inference for MLR
  • Use the polynomial degree 2 and degree 3 models
    from before to test these two nested models
  • Models: reduced model yi = β0 + β1xi + β2xi² + εi (degree 2);
    full model yi = β0 + β1xi + β2xi² + β3xi³ + εi (degree 3,
    with the extra term β3xi³)
  • Hypotheses: H0: β3 = 0 (the reduced model is adequate) vs
    HA: β3 ≠ 0
  • Other important terms:
  • SSEfull = 683,453, pfull = 4, dffull = 11
  • SSEred = 702,153.0, pred = 3, dfred = 12
  • n = 15
43
Inference for MLR
  • Calculate the test statistic:
  • F = [(SSEred - SSEfull)/(dfred - dffull)] / [SSEfull/dffull]
    = [(702,153 - 683,453)/1] / [683,453/11] ≈ 0.30
  • Determine P(Fν1,ν2 > f) and state the appropriate
    conclusion
  • P(F(1,11) > 0.3): so p-value > 0.25
  • With a large p-value (larger than α = 0.05), we
    FTR H0 and conclude that the x³ term is not useful to add
    to the model, so we should go with the reduced
    model (a software check of this test follows)
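  • (Software check of this nested-model test, assuming SciPy.)

    from scipy import stats

    sse_full, df_full = 683_453, 11
    sse_red, df_red = 702_153.0, 12

    f = ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)
    p = stats.f.sf(f, df_red - df_full, df_full)
    print(f, p)   # about 0.30 and about 0.59: a large p-value, so FTR H0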