Multiple Regression - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Multiple Regression


1
Multiple Regression
  • In multiple regression, we consider the response,
    y, to be a function of more than one predictor
    variable, x1, ..., xk
  • Easiest to express in terms of matrices

2
Multiple Regression
  • Let Y be a col vector whose rows are the
    observations of the response
  • Let X be a matrix.
  • First col of X contains all 1s
  • Other cols contain the observations of the
    predictor vars (see the sketch below)
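  A minimal MATLAB sketch of this setup, assuming the response is in a
  column vector y and using two illustrative predictor columns x1 and x2
  (the names are not from the slides):

    n = length(y);            % number of observations
    Y = y;                    % response as a column vector
    X = [ones(n,1) x1 x2];    % first col all 1s, then the predictor columns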

3
Multiple Regression
  • Y = Xb + e
  • b is a col vector of coefficients
  • e is a col vector of the normal errors
  • Matlab does matrices very well, but you MUST
    watch the sizes and orders when you multiply

4
Multiple Regression
  • A lot of MR is similar to simple regression
  • B = X\Y gives the coefficients
  • YH = X*B gives the fitted values
  • SSE = (y - yh)'(y - yh)
  • SSR = (yh - yavg)'(yh - yavg)
  • SST = (y - yavg)'(y - yavg)

5
Multiple Regression
  • We can set up the ANOVA table
  • df for Regr = number of predictor vars
  • (So df = 1 for simple regr)
  • F test is as before
  • R2 is as before, with the same interpretation
    (see the sketch below)
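  A sketch in base MATLAB that follows the formulas on these slides,
  continuing the Y and X from the earlier sketch (fcdf is from the
  Statistics Toolbox):

    b    = X\Y;                      % coefficients
    yh   = X*b;                      % fitted values
    yavg = mean(Y);
    SSE  = (Y - yh)'*(Y - yh);       % error sum of squares
    SSR  = (yh - yavg)'*(yh - yavg); % regression sum of squares
    SST  = (Y - yavg)'*(Y - yavg);   % total: SST = SSR + SSE
    n    = length(Y);
    k    = size(X,2) - 1;            % df for Regr = number of predictors
    MSR  = SSR/k;
    MSE  = SSE/(n - k - 1);
    F    = MSR/MSE;                  % overall F test, as before
    pv   = 1 - fcdf(F, k, n-k-1);    % p-value for the overall F test
    R2   = SSR/SST;                  % same interpretation as before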

6
Matrix Formulation
  • Instead of X\Y, we can solve the equations for B
  • B = (X'X)^(-1) X'Y (see the sketch below)
  • We saw things like X'X before as sums of squares
  • Because of the shape of X, X'X is square, so it
    makes sense to use its inverse
  • (Actually, X'X is always square)
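  The same coefficients via the normal equations, as a sketch (in
  practice the backslash form is preferred numerically, but the two
  agree):

    B = inv(X'*X) * (X'*Y);   % normal-equations solution
    % max(abs(B - X\Y)) is zero up to rounding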

7
Matrix Formulation
  • When we consider the coefficients, we not only
    have variances (SDs), but the relationship
    between coeffs
  • The SD of a coefficient is the estimated error
    SD times the square root of the corresponding
    diagonal element of (X'X)^(-1)
  • We can use this to get conf int for coefficients
    (and other information); see the sketch below
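  A sketch of the coefficient SDs and a 95% confidence interval for one
  coefficient, reusing b, MSE, n, and k from the sketch above (tinv is
  from the Statistics Toolbox; the index j is illustrative):

    covB = MSE * inv(X'*X);          % estimated covariance matrix of the coefficients
    sdB  = sqrt(diag(covB));         % SD of each coefficient
    j    = 2;                        % e.g., the coefficient on the first predictor
    t    = tinv(0.975, n - k - 1);   % critical value for a 95% interval
    ci   = [b(j) - t*sdB(j), b(j) + t*sdB(j)];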

8
Matrix Formulation
  • Exercises
  • 1. Compute coeffs, ANOVA and SD(coeff) for
    Fig13.11, p 608, where time = f(vol, wt, shift).
    Find the p-value for testing B1 = 0. Find a 95%
    confidence interval for B1.
  • 2. Repeat for Fig 13.17, p. 613, where y = run time
  • 3. Repeat for DS 13.2.1, p 614, where y = sales
    volume

9
Comparing Models
  • Suppose we have two models in mind
  • Model 1 uses a set of predictors
  • Model 2 includes model 1's predictors, but has
    extra variables
  • SSE for model 2 is never greater than SSE for
    model 1
  • We can always consider a model for 2 which has
    zeros for the new coefficients

10
Comparing Models
  • As always, we have to ask: Is the decrease in
    SSE unusually large?
  • Suppose that model 1 has p variables and model 2
    has p + k variables
  • SSE1 is Chi2 with df = N - p
  • SSE2 is Chi2 with df = N - (p + k)
  • Then SSE1 - SSE2 is Chi2 with df = k

11
Comparing Models
  • Partial F = [(SSE1 - SSE2)/k] / MSE2
  • Note that the numerator is a Chi2 divided by its df
  • The denominator is the MSE for the model with more
    variables
  • Note that the subtraction is larger minus smaller
    (see the sketch below)
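  A sketch of the partial F for nested models, assuming X1 is the design
  matrix for model 1 and X2 is X1 plus k extra columns:

    b1 = X1\Y;  SSE1 = (Y - X1*b1)'*(Y - X1*b1);
    b2 = X2\Y;  SSE2 = (Y - X2*b2)'*(Y - X2*b2);
    k    = size(X2,2) - size(X1,2);        % number of added variables
    df2  = length(Y) - size(X2,2);         % error df for the larger model
    MSE2 = SSE2/df2;
    partialF = ((SSE1 - SSE2)/k) / MSE2;   % larger SSE minus smaller SSE
    pv = 1 - fcdf(partialF, k, df2);       % Statistics Toolbox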

12
Comparing Models
  • Consider Fig 13.11 on p. 608
  • Y = unloading time
  • Model 1: X = volume
  • F = 203, p-value very small
  • Model 2: X = volume and wt
  • F = 96 and p-value still very small

13
Comparing Models
  • SSE1 = 215.6991
  • SSE2 = 215.1867
  • So model 2 is better (smaller SSE), but only
    trivially
  • MSE2 = 12.6580
  • Partial F = 0.0405
  • So the decrease in SSE is not significant at all,
    even though Model2 is significant
  • (In part because Model2 includes Model1)

14
Comparing Models
  • Recall that the SD of the coefficients can be
    found
  • The SD of a coefficient is the estimated error
    SD times the square root of the corresponding
    diagonal element of (X'X)^(-1)
  • In model 2 of the example, the 3rd diagonal
    element = 0.0030
  • SD of coeff3 = 0.1961
  • Coeff3/SD = 0.2012, so the coefficient is 0.2012
    SDs away from zero
  • (0.2012)^2 = 0.0405 = partial F

15
Comparing Models
  • Suppose we have a number of variables to choose
    from
  • What set of variables should we use?
  • Several approaches
  • Stepwise regression
  • Step up or step down
  • Either start from scratch and add variables or
    start with all variables and delete variables

16
Comparing Models
  • Step up method
  • Fit a regression using each variable on its own
  • If the best model (smallest SSE or largest F) is
    significant, then continue
  • Using the variable identified at step 1, add each
    of the other variables one at a time
  • Of all these models, consider the one with the
    smallest SSE (or largest F)
  • Compute the partial F to see if this model is
    better than the single-variable model (see the
    sketch below)
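  A rough sketch of one round of the step-up search, assuming the
  candidate predictors are the columns of a matrix W (an illustrative
  name) and Y is the response:

    n      = length(Y);
    Xcur   = ones(n,1);                      % current model: intercept only
    rcur   = Y - Xcur*(Xcur\Y);
    SSEcur = rcur'*rcur;
    bestSSE = Inf;  best = 0;
    for j = 1:size(W,2)                      % try adding each candidate on its own
        Xtry = [Xcur W(:,j)];
        r    = Y - Xtry*(Xtry\Y);
        if r'*r < bestSSE, bestSSE = r'*r; best = j; end
    end
    dfE = n - size(Xcur,2) - 1;              % error df if the best variable is added
    partialF = (SSEcur - bestSSE) / (bestSSE/dfE);
    % if partialF is significant, append W(:,best) to Xcur and repeat;
    % otherwise stop (and re-check the variables already in the model)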

17
Comparing Models
  • We can continue until the best variable to add
    does not have a significant partial F
  • To be complete, after we have added a variable,
    we should check to be sure that all the variables
    in the model are still needed

18
Comparing Models
  • One by one, drop each other variable from the
    model
  • Compute partial F
  • If partial F is small, then we can drop this
    variable
  • After all variables have been dropped that can
    be, we can resume adding variables

19
Comparing Models
  • Recall that when adding single variables, we can
    find the partial F by squaring coeff/SD(coeff)
  • So, for single variables, we don't need to
    compute a large number of models because the
    partial Fs can be computed in one step from the
    larger model (see the sketch below)
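  A sketch of that shortcut: fit the larger model once, then square each
  coefficient's t-ratio to get the partial F for dropping that single
  variable:

    b    = X\Y;
    res  = Y - X*b;
    MSE  = (res'*res) / (length(Y) - size(X,2));
    sdB  = sqrt(MSE * diag(inv(X'*X)));
    partialF = (b ./ sdB).^2;   % one partial F per variable (slide 14: 0.2012^2 = 0.0405)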

20
Other Models
  • Because it allows for multiple predictors, MLR
    is very flexible
  • We can fit polynomials by including not only X,
    but additional columns for powers of X (see the
    sketch below)
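  A sketch of a quadratic fit in a single predictor x; a cubic would
  simply add an x.^3 column:

    n = length(x);
    X = [ones(n,1) x x.^2];   % columns for 1, x, x^2
    b = X\y;                  % polynomial coefficients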

21
Other Models
  • Consider Fig13.7, p 605
  • Yield is a fn of Temp
  • Model 1: Temp only
  • F = 162, highly significant
  • Model 2: Temp and Temp^2
  • F = 326
  • Partial F = 29.46
  • Conclude that the quadratic model is
    significantly better than the linear model

22
Other Models
  • Would a cubic model work better?
  • Partial F = 4.5e-4
  • So the cubic model is NOT preferred

23
Other Models
  • Taylor's Theorem
  • Smooth functions are locally well approximated by
    polynomials
  • In Calc, we started with the function and used
    the fact that the coefficients are related to the
    derivatives
  • Here, we do not know the function, but can find
    (estimate) the coefficients

24
Other Models
  • Consider Fig13.15, p. 611
  • If y = f(water, fertilizer) is fit as linear in
    both, then F < 1
  • Plot y vs each variable
  • VERY linear fn of water
  • Somewhat quadratic fn of fertilizer
  • Consider a quadratic fn of both (and the product);
    see the sketch below
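  A sketch of the full quadratic design matrix for the two predictors,
  here called water and fert (illustrative names for the data columns):

    n = length(y);
    X = [ones(n,1) water fert water.^2 fert.^2 water.*fert];
    b = X\y;   % b(4), b(5) are the quadratic terms; b(6) is the cross product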

25
Other Models
  • F is about 17 and pv is near 0
  • All partial Fs are large, so should keep all
    terms in model
  • Look at coeffs
  • Quadratics are neg, so the surface has a local
    max

26
Other Models
  • Solve for the max response (see the sketch below)
  • Water = 6.3753, Fert = 11.1667
  • Which is within the range of values, but in the
    lower left corner
  • (1) We can find a confidence interval for where
    the max occurs
  • (2) Because of the cross-product term, the
    optimal fertilizer varies with water
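  A sketch of finding the stationary point: set both partial derivatives
  of the fitted quadratic to zero and solve the resulting 2-by-2 linear
  system (coefficient order as in the design matrix above):

    % yhat = b(1) + b(2)*w + b(3)*f + b(4)*w^2 + b(5)*f^2 + b(6)*w*f
    % d/dw = b(2) + 2*b(4)*w + b(6)*f = 0
    % d/df = b(3) + b(6)*w + 2*b(5)*f = 0
    A  = [2*b(4)  b(6); b(6)  2*b(5)];
    wf = A \ [-b(2); -b(3)];   % [water; fertilizer] at the stationary point
    % a local max when the quadratic coefficients are negative (A negative definite)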

27
Other Models
  • Exercises
  • 1. Consider Fig13.48, p 721. (Fix the line that
    starts 24.2. The 3rd col should be 10.6.) Is
    there any evidence of a quadratic relation?
  • 2. Consider Fig13.49, p. 721. Fit the response
    model. Comment. Plot y vs yh. What is the est SD
    of the residuals?

28
Indicator Variables
  • For simple regression, if we used an indicator
    variable, we were doing a 2 sample t test
  • We can use indicator variables and multiple
    regression to do ANOVA

29
Indicator Variables
  • Return to Fig11.4 on blood flow
  • Build the indicators with
  • for i = 1:max(ndx), y(:,i) = (ndx == i); end
  • VERY IMPORTANT
  • If you are going to use the intercept, then you
    must leave out one column of the indicators
    (usually the last col); see the sketch below
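  A sketch of building the indicator matrix from a group index vector
  ndx and dropping the last column when an intercept is used (the
  variable names ind and g are illustrative):

    g   = max(ndx);                 % number of groups
    ind = zeros(length(ndx), g);
    for i = 1:g
        ind(:,i) = (ndx == i);      % one 0/1 column per group
    end
    X = [ones(length(ndx),1) ind(:,1:g-1)];   % intercept, drop the last indicator
    b = X\y;   % b(1) = mean of the dropped group; the rest are differences from it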

30
Indicator Variables
  • F is the same for regression as for ANOVA
  • The intercept is the avg of the group that was
    left out of indicators
  • The other coefficients are the differences
    between their avg and the intercept

31
Indicator Variables
  • Exercises
  • Compare the sumstats approach and the regression
    approach for Fig11.4, Fig11.5 on p. 488, 489

32
Other ANOVA
  • Why bother with a second way to solve a problem
    we already can solve?
  • The regression approach works easily for other
    problems
  • But note that we cannot use regression approach
    on summary stats

33
Other ANOVA
  • Two-way ANOVA
  • Want to compare Treatments, but the data has
    another component that we want to control for
  • Called Blocks, from its origin in agricultural
    testing

34
Other ANOVA
  • So we have 2 category variables, one for
    Treatment and one for Blocks
  • Set up indicators for both and use all of these
    for X
  • Omit one column from each set (see the sketch
    below)
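  A sketch of the two-way setup, assuming index vectors trt and blk for
  Treatment and Block (illustrative names), with one indicator column
  omitted from each set:

    n  = length(y);
    i1 = zeros(n, max(trt) - 1);
    i2 = zeros(n, max(blk) - 1);
    for i = 1:max(trt) - 1, i1(:,i) = (trt == i); end
    for i = 1:max(blk) - 1, i2(:,i) = (blk == i); end
    X = [ones(n,1) i1 i2];   % intercept plus both sets of indicators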

35
Other ANOVA
  • We would like to separate the Treatment effect
    from the Block effect
  • Use partial F
  • ANOVA table often includes the change in SS
    separately for Treatment and Blocks

36
Other ANOVA
  • Consider Fig 14.4 on p 640
  • 3 machines and 4 solder methods
  • The problem doesn't tell us which is Treatment
    and which is Blocks, so we'll let machines be
    Treatments

37
Other ANOVA
  • >> x = [i1 i2]
  • >> [b,f,pv,aov,invxtx] = multregr(x,y); aov
  • aov =
  •   60.6610    5.0000   12.1322   13.9598
  •   26.0725   30.0000    0.8691    0.2425
  • This is for both sets of indicators

38
Other ANOVA
  • For just machine
  • >> x = i1
  • >> [b,f,pv,aov,invxtx] = multregr(x,y); aov
  • aov =
  •    1.8145    2.0000    0.9073    0.3526
  •   84.9189   33.0000    2.5733    0.5805
  • The change in SS is 60.6610 - 1.8145 = 58.8465
    when we use Solder as well

39
Other ANOVA
  • For just solder
  • >> x = i2
  • >> [b,f,pv,aov,invxtx] = multregr(x,y); aov
  • aov =
  •   58.8465    3.0000   19.6155   22.5085
  •   27.8870   32.0000    0.8715    0.1986
  • SSR is the same as the previous difference
  • If we list them separately, then we use SSE for
    model with both vars so that it will properly add
    up to SSTotal

40
Interaction
  • The effect of Solder may not be the same for each
    Machine
  • This is called interaction, where a combination
    may not be the sum of the parts
  • We can measure interaction by using products of
    the indicator variables
  • Need all possible products (2 x 3 = 6 in this
    case); see the sketch below
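  A sketch of forming the interaction columns as all products of the
  treatment and block indicator columns (2 x 3 = 6 of them here):

    inter = [];
    for a = 1:size(i1,2)
        for c = 1:size(i2,2)
            inter = [inter, i1(:,a).*i2(:,c)];   % one product column per pair
        end
    end
    X = [ones(n,1) i1 i2 inter];   % main effects plus interaction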

41
Interaction
  • Including interaction
  • >> [b,f,pv,aov,invxtx] = multregr(x,y); aov
  • aov =
  •   64.6193   11.0000    5.8745    6.3754
  •   22.1142   24.0000    0.9214    0.3144
  • We can subtract to get the SS for Interaction
  • 64.6193 - 60.6610 = 3.9583
  • See ANOVA table on p 654

42
Interaction
  • We can do interaction between categorical
    variables and quantitative variables
  • Allows for different slopes for different
    categories
  • Can also add an indicator, which allows for
    different intercepts for different categories
  • With this approach, we are assuming a single
    error SD across all the categories
  • May or may not be a good idea (see the sketch
    below)
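  A sketch with one quantitative predictor x and one 0/1 indicator d:
  the indicator shifts the intercept and the product term shifts the
  slope (a single error SD is assumed across both groups):

    X = [ones(n,1) x d x.*d];
    b = X\y;
    % group with d = 0: intercept b(1), slope b(2)
    % group with d = 1: intercept b(1)+b(3), slope b(2)+b(4)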

43
ANACOVA
  • We can do regression with a combination of
    categorical and quantitative variables
  • The quantitative variable is sometimes called a
    co-variate
  • Suppose we want to see if test scores vary among
    different groups
  • But the different groups may come from different
    backgrounds, which would affect their scores
  • Use some measure of background (quantitative) in
    the regression

44
ANACOVA
  • Then the partial F for the category variable,
    after starting with the quantitative variable,
    will measure the differences among groups after
    correcting for background

45
ANACOVA
  • Suppose we want to know if mercury levels in fish
    vary among 4 locations
  • We catch some fish in each location and measure
    Hg
  • But the amount of Hg could depend on size (which
    indicates age), so we also measure that
  • Then we regress on both Size and the indicators
    for Location
  • If the partial F for Location is large, then we
    say that Location matters after correcting for
    Size (see the sketch below)
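  A sketch of the mercury example using the nested-model partial F from
  earlier; hg, sz, and locind (the Hg measurements, fish size, and the 3
  Location indicator columns) are illustrative names:

    n  = length(hg);
    X1 = [ones(n,1) sz];                    % reduced model: Size only
    X2 = [X1 locind];                       % full model: Size plus Location indicators
    SSE1 = (hg - X1*(X1\hg))'*(hg - X1*(X1\hg));
    SSE2 = (hg - X2*(X2\hg))'*(hg - X2*(X2\hg));
    k    = size(locind,2);                  % 3 indicators for 4 locations
    df2  = n - size(X2,2);
    partialF = ((SSE1 - SSE2)/k) / (SSE2/df2);
    pv = 1 - fcdf(partialF, k, df2);        % small pv: Location matters after Size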