C12 Multiple Regression

Provided by: drandre8


1
C12 - Multiple Regression
  • Diet example
  • In many situations, a better fitting model can be
    developed if more than one explanatory variable
    (IV) is considered.
  • Several IVs can be used to predict the value of
    a DV
  • Many variables can be considered; the problem is
    choosing which ones to use.
  • Book example, 2 IVs, Price (X1) and Promotion
    (X2) are used to predict Sales(Yhat).
  • OMNI.XLS
  • Output on p584; develop a prediction equation
  • Sales = 5837 - 53.2 * X1i + 3.61 * X2i
  • Sales = 5837 - 53.2 * (Price) + 3.61 * (Promotion)
  • notice the negatives
  • predict sales with Price = 79 and Promotion = 400
  • Sales = 5837 - 53.2 * (79) + 3.61 * (400)
  • Sales = 3079.57 (the rounded coefficients shown give about 3078.2)
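
The prediction above can be checked with a few lines of Python. This is only a sketch: the function name `predict_sales` is made up for illustration, and the coefficients are the rounded values shown on the slide, so the result may differ slightly from a value computed with the unrounded Excel output.

```python
def predict_sales(price, promotion):
    """Predicted sales from the rounded slide coefficients (assumed values)."""
    intercept = 5837.0
    b_price = -53.2       # negative: a higher price lowers predicted sales
    b_promotion = 3.61    # positive: more promotion raises predicted sales
    return intercept + b_price * price + b_promotion * promotion


# Price = 79 and Promotion = 400, as in the slide
print(round(predict_sales(79, 400), 1))
```

With the rounded coefficients this evaluates to about 3078.2.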

2
r2 and adjusted r2
  • We are searching for the best model to predict
    the DV. The question is which IVs to use.
  • r2 = the proportion of variation in Y that is
    explained by the IVs (Xs) in the regression
    model.
  • adjusted r2 reflects both the number of IVs in
    the model and the sample size. Necessary when we
    are comparing regression models that have
    different numbers of IVs and different sample
    sizes.
  • Book example: adjusted r2 = .74, or 74% of the
    variation in Sales can be explained by our
    multiple regression model, adjusted for number
    of predictors and sample size.
  • 26% remains unexplained
  • 4 Ps
  • weather, month. (are these the same?)
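
For reference, the usual textbook adjustment is adjusted r2 = 1 - (1 - r2)(n - 1)/(n - k - 1), where n is the sample size and k is the number of IVs. The sketch below applies that formula. The function name and the sample inputs are hypothetical, since the slide does not give n for the book example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r2 for a model with k IVs fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than IVs plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs (not from the book): same r2, different numbers of IVs
print(round(adjusted_r2(0.80, n=30, k=2), 4))
print(round(adjusted_r2(0.80, n=30, k=5), 4))
```

Holding r2 and n fixed, the adjusted value drops as more IVs are added, which is why it is the fairer yardstick when models have different numbers of IVs.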

3
Correlation
  • Tools | Data Analysis | Correlation
  • All the variables are entered in one range
    (A1:C18)
  • Include labels for each column in the range
  • Correlation matrix
  • The two IVs have a very small correlation. I
    don't think so??? Promotion adds to Price.
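
Outside Excel, the same kind of correlation matrix can be sketched in plain Python. The numbers below are made up for illustration and are NOT the OMNI.XLS data; the helper names `pearson` and `correlation_matrix` are also assumptions of this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for labelled columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up illustrative data, one list per labelled column
data = {
    "Sales": [4100, 3800, 3300, 3000, 2500, 2100],
    "Price": [59, 59, 79, 79, 99, 99],
    "Promotion": [200, 400, 200, 400, 200, 400],
}
matrix = correlation_matrix(data)
for row in data:
    print(row, [round(matrix[row][col], 2) for col in data])
```

In this made-up data set Price and Promotion are uncorrelated by construction, which is the situation where both IVs can contribute separately to a model.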

4
Testing Portions of a Multiple Regression
  • The objective is to utilize only the explanatory
    variables (IVs) that are useful in predicting
    the value of the DV
  • If an IV is not helpful in making the prediction,
    it can be deleted from the multiple regression
    model.
  • There are many methods of doing this; the most common are:
  • forward stepwise - start with the IV that has the highest
    correlation with the DV and add the next highest correlation,
    over and over, until the adjusted r2 stops
    growing
  • backward stepwise - start with all the IVs that
    are available and eliminate the smallest
    correlation, then the next smallest correlation,
    over and over, until the adjusted r2 starts to
    get smaller.
  • All possible regressions - evaluate all the
    possible combinations of the IVs
  • Step A: evaluate the correlation matrix; this
    should give you clues about what to expect.
  • Step B: evaluate all possible regressions using
    the adjusted r2. Evaluate the p-value for each IV
    in the model to see if that IV will contribute to
    a multiple regression model. P-value analysis is
    the same as an F-test in the Excel output. Choose the
    best model, then make a prediction.
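
To get a feel for how many models "all possible regressions" involves, the sketch below simply enumerates every non-empty combination of IVs (it does not fit any regressions). With k IVs there are 2^k - 1 combinations. The function name `all_subsets` and the placeholder IV labels are assumptions of this example.

```python
from itertools import combinations


def all_subsets(ivs):
    """Every non-empty combination of IVs, smallest models first."""
    return [combo
            for size in range(1, len(ivs) + 1)
            for combo in combinations(ivs, size)]


# Two IVs, as in the book example
models = all_subsets(["Price", "Promotion"])
print(models)

# Seven IVs (placeholder labels) already give 2**7 - 1 candidate models
print(len(all_subsets(["IV%d" % i for i in range(1, 8)])))
```

Two IVs give three candidate models, while seven IVs give 127, which is why a summary sheet and a simplicity rule are needed to narrow the field.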

5
Book example OMNI.XLS
  • Correlation matrix p601
  • IVs have a small correlation; both should
    contribute to the model
  • 3 possible models to choose from
  • only Promotion as the IV: adjusted r2 = .264,
    p-value = .001, borderline situation
  • only Price as the IV: adjusted r2 = .526, p-value
    = .0000, strong rejection; this IV should contribute
    to a multiple regression model
  • both Price and Promotion as IVs: adjusted r2 =
    .74
  • Choose the model with both Price and Promotion,
    highest adjusted r2

6
  • 3 Excel procedures need to be performed to
    generate the output needed to answer the
    following 8 questions.
  • a) PHStat | Regression | Best Subsets
  • Include labels of the columns in the DV and IV
    ranges
  • Only a single range is entered for the IVs, so
    all the IVs need to be in adjacent columns
    (C1:I90)
  • p636
  • This procedure will provide an all-variable
    regression in a sheet called AllX
  • This procedure will provide summary information
    for all the regressions in a sheet called
    BESTWS
  • 1) Sort by Yes/No
  • 2) Sort by adjusted R square
  • 3) Choose highest adjusted R square, while taking
    into account the KISS principle.
  • b) Tools | Data Analysis | Correlation will provide a
    correlation matrix. (B1:I90)
  • c) PHStat | Regression | Multiple Regression will
    provide the information to generate the
    prediction equation for the chosen model.
  • 1) Re-arrange Columns
  • 2) Run PHStat | Regression | Multiple Regression
  • 3) Write equation
  • 4) Predict value for a set of IVs

7
Auto96 Multiple Regression Problem
  • File Name: auto_96_data.xls
  • Solution File Name: auto_96_solution.xls
  • Goal/Purpose: Predict MPG for a car with
  • Type of Drive = 0
  • Fuel Type = 1
  • Length = 190
  • Turning Circle = 40
  • Weight = 3295
  • Luggage Capacity = 12
  • Front Head Room = 2.8
  • MPG = Intercept + (Beta * IV?) + (Beta * IV?) +
    ...
  • MPG = 55.30 - (.0069 * Weight) - (.2665 *
    Turning Circle)

8
Multiple Regression Questions
  • 1) Determine the multiple independent variables
    and the one dependent variable.
  • DV - MPG
  • IVs - Weight, Type of Drive, Fuel Type, Length,
    Turning Circle, Luggage Capacity, Front Head
    Room.
  • 2) What is the maximum percentage of variation of
    the DV that the IVs can explain?
  • From the AllX regression, using all the IVs,
    r2 = .8388. Using all the information we have,
    we can explain about 84% of the variation in MPG.
    About 16% of the variation in MPG is
    unexplained, caused by variables that we have yet
    to define.
  • 3) Which IV or IVs appear to be the best
    predictors of the DV? Place them in order from
    strongest relationship to weakest. Provide
    correlations.
  • Weight -0.91
  • Length -0.83
  • Turning Circle -0.73
  • Type of Drive 0.41
  • Luggage Capacity -0.37
  • Fuel Type 0.31
  • Front Head Room 0.05
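
The ordering above is by the size of the correlation, ignoring its sign. A minimal sketch of that ranking, using the correlations listed on this slide (the variable names in the code are only illustrative):

```python
# Correlations of each IV with MPG, as listed on the slide
correlations = {
    "Weight": -0.91,
    "Type of Drive": 0.41,
    "Fuel Type": 0.31,
    "Length": -0.83,
    "Turning Circle": -0.73,
    "Luggage Capacity": -0.37,
    "Front Head Room": 0.05,
}

# Strength of a linear relationship is the absolute value of r
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name:18s} {r:+.2f}")
```

A strongly negative correlation such as Weight at -0.91 ranks ahead of a weakly positive one such as Front Head Room at 0.05.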

9
  • 4) Which prediction models did you consider and
    why?
  • 24 models that meet the criterion Cp <= p + 1
    were considered. The one-variable model with
    Weight did not meet the criterion. Adjusted R
    square and the KISS principle were used in
    considering the following models, listed from
    simplest to most complex.
  • # of Variables   Variables          Cp <= p + 1   adj r2   r2
  • 1                Weight             No            .818     .831
  • 2                Weight, TC         Yes           .828     .833
  • 3                Weight, TC, FHR    Yes           .827     .835
  • 4                                   Yes           .827     .835
  • 5) Which prediction model did you choose and why?
  • The two-variable model with Weight and Turning
    Circle was selected because it had the highest
    adjusted r2 and the fewest variables of the
    models that met the criterion.
    This model explains about 83% of the variation in
    MPG, the same 83% explained by the three-variable
    model. The two-variable model is
    smaller and easier to use than the AllX model and
    does almost as good a job of predicting MPG. The
    three-variable model is larger and did not
    provide any increase in prediction accuracy.
  • The prediction model is
  • MPG = 55.30 - (.0069 * Weight) - (.2665 *
    Turning Circle)
  • DO NOT use the AllX worksheet for this equation
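
The selection logic described above (keep only models that pass the Cp check, then take the highest adjusted r2, preferring fewer variables on a tie) can be sketched as follows. The tuples are taken from the table on this slide; the label for the four-variable row is a placeholder because the slide does not list its variables.

```python
candidates = [
    # (number of IVs, label, meets Cp criterion, adjusted r2)
    (1, "Weight", False, 0.818),
    (2, "Weight, TC", True, 0.828),
    (3, "Weight, TC, FHR", True, 0.827),
    (4, "four-variable model (IVs not listed)", True, 0.827),
]

# Step 1: keep only the "Yes" rows
eligible = [c for c in candidates if c[2]]

# Step 2: highest adjusted r2 wins; fewer IVs breaks ties (KISS)
best = max(eligible, key=lambda c: (c[3], -c[0]))
print(best[1], best[3])
```

This picks the two-variable model with Weight and Turning Circle, matching the choice explained in the answer.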

10
Multiple Regression Questions
  • 6) For a given set of IV information, what is
    your predicted value of the dependent variable?
  • Type of Drive = 0
  • Fuel Type = 1
  • Length = 190
  • Turning Circle = 40
  • Weight = 3295
  • Luggage Capacity = 12
  • Front Head Room = 2.8
  • MPG = 55.30 - (.0069 * Weight) - (.2665 *
    Turning Circle)
  • MPG = 55.30 - (.0069 * 3295) - (.2665 * 40)
  • MPG = 21.96
  • 7) Interpret the r2 for your prediction model.
  • Variations in weight and turning circle account
    for 83.2% of the variation in MPG. About 16.8%
    of the variation in MPG is left unexplained by
    the model. Using a model with all the variables
    we could have explained about 84% of the
    variation in MPG.
  • 8) Identify other factors (independent variables)
    that were not considered that might be useful in
    predicting the dependent variable.
  • Aerodynamics, Foreign/Domestic, age of design,
    engine size, Auto/Manual transmission, tire size.
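
As a quick check of question 6, the two-variable model can be evaluated in Python. This is a sketch using the rounded coefficients printed on the slide; the function name `predict_mpg` is made up. The other five IVs in the given set are ignored because they are not in the chosen model.

```python
def predict_mpg(weight, turning_circle):
    """Two-variable MPG model with the rounded slide coefficients."""
    return 55.30 - 0.0069 * weight - 0.2665 * turning_circle


mpg = predict_mpg(weight=3295, turning_circle=40)
print(round(mpg, 2))

# Share of variation left unexplained when r2 = .832
print(round(1 - 0.832, 3))
```

With the rounded coefficients the prediction comes out near 21.90, slightly below the slide's 21.96. The gap is presumably due to the slide's answer being computed from unrounded Excel coefficients, which is an assumption here.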

11
BB97 Multiple Regression Problem
  • Data File Name: BB97_data.xls
  • Solution File Name: BB97_solution.xls
  • Goal/Purpose: Predict wins for a team with
  • E.R.A. = 4.7, Runs Scored = 775
  • Hits Allowed = 1500, Walks Allowed = 600
  • Saves = 35, Errors = 120
  • League = 0
  • Wins = Intercept + (Beta * IV?) + (Beta * IV?) +
    ...
  • 4 variable model
  • Wins = Intercept + (Beta * E.R.A.) + (Beta *
    Runs Scored) + (Beta * Saves) + (Beta * Hits
    Allowed)
  • Wins = 76.05 wins
  • 3 variable model
  • Wins = Intercept + (Beta * E.R.A.) + (Beta *
    Runs Scored) + (Beta * Saves)
  • Wins = 75.23 wins

12
Multiple Regression Questions
  • 1) Determine the multiple independent variables
    and the one dependent variable.
  • DV - Wins
  • IVs - League, E.R.A., Hits Allowed, Runs Scored,
    Walks Allowed, Saves, Errors.
  • 2) What is the maximum percentage of variation of
    the DV that the IVs can explain?
  • From the AllX regression, using all the IVs,
    r2 = .895. Using all the information we have, we
    can explain about 90% of the variation in Wins.
    About 10% of the variation in Wins is
    unexplained, caused by variables that we have yet
    to define.
  • 3) Which IV or IVs appear to be the best
    predictors of the DV? Place them in order from
    strongest relationship to weakest. Provide
    correlations.
  • E.R.A. -0.60
  • Saves 0.53
  • Runs Scored 0.49
  • Hits Allowed -0.45
  • Walks Allowed -0.21
  • League 0.08
  • Errors 0.05

13
  • 4) Which prediction models did you consider and
    why?
  • 14 models that meet the criterion Cp <= p + 1
    were considered. No one-variable or two-variable
    models meet the criterion. Adjusted R square and
    the KISS principle were used in considering the
    following models, listed from simplest to most
    complex.
  • # of Variables   Variables             Cp <= p + 1   adj r2   r2
  • 1                ERA                   No            .339
  • 2                ERA, RS               No            .799
  • 3                ERA, RS, Saves        Yes           .859     .874
  • 4                ERA, RS, Saves, HA    Yes           .866     .885
  • 5) Which prediction model did you choose and why?
  • The three-variable model with ERA, Runs
    Scored and Saves was selected because a 1%
    increase in adjusted r2 did not justify adding
    an additional variable (Hits Allowed). Hits
    Allowed had a non-significant p-value in the
    four-variable model, indicating that it
    contributed very little to the model. This
    three-variable model explains about 86% of the
    variation in Wins, compared to about 87% explained
    by the four-variable model. The two-variable model
    did not meet the Cp <= p + 1 criterion. The
    three-variable model is smaller than the
    four-variable model and does almost as good a job
    of predicting Wins.
  • The prediction model is
  • Wins = 57.92 - (12.42 * ERA) + (.3695 *
    Saves) + (.081 * RS)
  • DO NOT use the AllX worksheet for this equation
  • Or..
  • The four-variable model was chosen because a 1%
    increase in adjusted r2 justified adding an
    additional variable (Hits Allowed). This
    four-variable model explains about 87% of the
    variation in Wins, compared to the three-variable
    model, which explained about 86%.
  • The prediction model is
  • Wins = 73.08 - (8.87 * ERA) + (.344 * Saves) +
    (.088 * RS) - (.024 * HA)
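
The two acceptable answers to question 5 differ only in how much adjusted r2 gain is considered worth one more IV. A hedged sketch of that KISS trade-off, using the adjusted r2 values from the table on this slide (the function `pick_model` and its `min_gain` threshold are illustrative assumptions, not part of PHStat):

```python
def pick_model(models, min_gain=0.01):
    """Pick the simplest model unless a larger one adds at least min_gain.

    models: list of (number of IVs, label, adjusted r2) for eligible models.
    """
    ordered = sorted(models)       # simplest model first
    chosen = ordered[0]
    for model in ordered[1:]:
        if model[2] - chosen[2] >= min_gain:
            chosen = model         # the extra IVs earn their place
    return chosen


eligible = [
    (3, "ERA, RS, Saves", 0.859),
    (4, "ERA, RS, Saves, HA", 0.866),
]

print(pick_model(eligible)[1])                  # strict KISS threshold
print(pick_model(eligible, min_gain=0.005)[1])  # looser threshold
```

With a threshold of one percentage point the three-variable model is kept; with a looser threshold the four-variable model is chosen, which mirrors the "Or.." answer.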

14
Multiple Regression Questions
  • 6) For a given set of IV information, what is
    your predicted value of the dependent variable?
  • E.R.A. = 4.7, Runs Scored = 775
  • Hits Allowed = 1500, Walks Allowed = 600
  • Saves = 35, Errors = 120
  • League = 0
  • Wins = 57.92 - (12.42 * ERA) + (.3695 *
    Saves) + (.081 * RS)
  • Wins = 57.92 - (12.42 * 4.7) + (.3695 * 35) +
    (.081 * 775)
  • Wins = 75.23
  • Wins = 73.08 - (8.87 * ERA) + (.344 * Saves) +
    (.088 * RS) - (.024 * HA)
  • Wins = 73.08 - (8.87 * 4.7) + (.344 * 35) +
    (.088 * 775) - (.024 * 1500)
  • Wins = 76.05
  • 7) Interpret the r2 for your prediction model.
  • Variations in ERA, Runs Scored and Saves
    account for 87.4% of the variation in Wins.
    About 12.6% of the variation in Wins is left
    unexplained by the model. Using a model with all
    the variables we could have explained about 89.5%
    of the variation in Wins.
  • 8) Identify other factors (independent variables)
    that were not considered that might be useful in
    predicting the dependent variable.
  • Salary of players, salary of coaches, weather,
    size of market, amount bet on games, TV
    appearance, dome.
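
Both candidate models from question 6 can be evaluated side by side. This is a sketch with the rounded coefficients shown on the slide; the function names are made up for illustration.

```python
def wins_3var(era, saves, rs):
    """Three-variable model: ERA, Saves, Runs Scored."""
    return 57.92 - 12.42 * era + 0.3695 * saves + 0.081 * rs


def wins_4var(era, saves, rs, ha):
    """Four-variable model: adds Hits Allowed."""
    return 73.08 - 8.87 * era + 0.344 * saves + 0.088 * rs - 0.024 * ha


w3 = wins_3var(era=4.7, saves=35, rs=775)
w4 = wins_4var(era=4.7, saves=35, rs=775, ha=1500)
print(round(w3, 2), round(w4, 2))
```

The rounded coefficients give roughly 75.25 and 75.63 wins, against 75.23 and 76.05 on the slide. The differences are presumably rounding effects in the printed coefficients, which is an assumption; either way the two models predict within about one win of each other.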

15
Final
  • Times
  • Night section
  • Day section
  • You can take it with the other section.
  • 3 hour time limit
  • Comprehensive
  • 40 Multiple choice (160 points)
  • some verbal/concepts
  • mostly problems
  • Problems
  • (40 points) Multiple Regression, like BB97 and
    Auto96: choose the best model and use it to
    predict.