Multiple Regression Models: Interactions and Indicator Variables - PowerPoint PPT Presentation

Slides: 27
Provided by: ITC97
Learn more at: https://www.uky.edu

1
Multiple Regression Models: Interactions and
Indicator Variables
2
Today's Data Set
  • A collector of antique grandfather clocks knows
    that the price received for the clocks increases
    linearly with the age of the clocks. Moreover,
    the collector hypothesizes that the auction price
    of the clocks will increase linearly as the
    number of bidders increases.
  • (Let's hypothesize a first-order MLR model.)

3
First-order model
  • y = β₀ + β₁x₁ + β₂x₂ + ε
  • y = auction price
  • x₁ = age of clock (years)
  • x₂ = number of bidders

4
The regression equation is
  AuctionPrice = -1339 + 12.7 AgeOfClock + 86.0 Bidders

Analysis of Variance
  Source          DF       SS       MS       F      P
  Regression       2  4283063  2141531  120.19  0.000
  Residual Error  29   516727    17818
  Total           31  4799790

5
Tests of Individual β Parameters
  Predictor      Coef  SE Coef      T      P
  Constant    -1339.0    173.8  -7.70  0.000
  AgeOfClock  12.7406   0.9047  14.08  0.000
  Bidders      85.953    8.729   9.85  0.000
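
The printed output can be checked by hand. The following is a minimal sketch that recomputes the mean squares, the F statistic, and the t statistics from the numbers in the two tables above. R² does not appear in the printed output; it is derived here as SS(Regression) / SS(Total). All variable names are illustrative.

```python
# Values copied from the Minitab output above.
ss_regression, df_regression = 4283063, 2
ss_error, df_error = 516727, 29
ss_total = 4799790

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Global F statistic and (derived, not printed in the output) R-squared.
f_stat = ms_regression / ms_error
r_squared = ss_regression / ss_total

# Each t statistic is the coefficient divided by its standard error.
coefficients = {
    "Constant": (-1339.0, 173.8),
    "AgeOfClock": (12.7406, 0.9047),
    "Bidders": (85.953, 8.729),
}
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

print(f"MS(regression) = {ms_regression:.1f}")
print(f"MS(error)      = {ms_error:.1f}")
print(f"F              = {f_stat:.2f}")
print(f"R-squared      = {r_squared:.3f}")
for name, t in t_stats.items():
    print(f"t({name}) = {t:.2f}")
```

The recomputed F and t values agree with the printed columns to two decimal places, which is a quick way to confirm that the table was read correctly.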

6
What if the relationship between E(y) and either
of the independent variables depends on the other?
  • In this case, the two independent variables
    interact, and we model this as a cross-product of
    the independent variables.
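
As a sketch of what the cross-product term does, the first-order model can be extended with one extra term. The coefficient name β3 is an assumption chosen to continue the numbering of the first-order model.

```latex
\begin{align*}
E(y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
     &= (\beta_0 + \beta_2 x_2) + (\beta_1 + \beta_3 x_2)\, x_1
\end{align*}
```

Holding x2 fixed, the slope of E(y) against x1 is β1 + β3x2. If β3 = 0, the lines for different values of x2 are parallel; if β3 ≠ 0, the slope changes with x2, which is what it means for the two variables to interact.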

7
For our example
  • Do age and the number of bidders interact?
  • In other words, is the rate of increase of the
    auction price with age driven upward by a large
    number of bidders?
  • In this case, as the number of bidders increases,
    the slope of the price versus age line increases.
  • To facilitate investigation, the number of bidders
    has been separated into three groups:
      A: 0-6 bidders
      B: 7-10 bidders
      C: 11-15 bidders

8
Are these slopes parallel, or do they change with
the number of bidders?
9
Caution!
  • Once an interaction has been deemed important in
    a model, all associated first-order terms should
    be kept in the model, regardless of the magnitude
    of their p-values.

10
Another Example: Graph and interpret the
following findings
  • Let's say we want to study how hard students work
    on tests. We have some achievement-oriented
    students and some achievement-avoiders. We split
    each group at random into two halves, giving one
    half a challenging test and the other half an
    easy test. We measure how hard the students work
    on the test. The means of this study are:

                    Achievement-oriented   Achievement-avoiders
                         (n = 100)              (n = 100)
  Challenging test          10                      5
  Easy test                  5                     10
11
Conclusions
  • E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂
  • The effect of test difficulty (x₁) on effort (y)
    depends on a student's achievement orientation
    (x₂).
  • Thus, the type of achievement orientation and
    test difficulty interact in their effect on
    effort.
  • This is an example of a two-way interaction
    between achievement orientation and test
    difficulty.
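
One way to see the interaction numerically is to fit the model to the four cell means. The sketch below assumes a 0/1 coding that the slides do not specify: x1 = 1 for the challenging test and x2 = 1 for achievement-oriented students. With four means and four parameters, the fit is exact.

```python
# Cell means from the study, keyed by (x1, x2).
# Assumed coding (not given in the slides):
#   x1 = 1 for the challenging test, 0 for the easy test
#   x2 = 1 for achievement-oriented students, 0 for achievement-avoiders
cell_means = {
    (1, 1): 10,  # challenging test, achievement-oriented
    (1, 0): 5,   # challenging test, achievement-avoiders
    (0, 1): 5,   # easy test, achievement-oriented
    (0, 0): 10,  # easy test, achievement-avoiders
}

# Solve E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 exactly at the four cells.
b0 = cell_means[(0, 0)]
b1 = cell_means[(1, 0)] - b0
b2 = cell_means[(0, 1)] - b0
b3 = cell_means[(1, 1)] - b0 - b1 - b2


def predicted_mean(x1, x2):
    """Mean effort implied by the fitted interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


# Effect of test difficulty (challenging minus easy) within each group.
effect_for_avoiders = predicted_mean(1, 0) - predicted_mean(0, 0)
effect_for_oriented = predicted_mean(1, 1) - predicted_mean(0, 1)

print("b0, b1, b2, b3 =", b0, b1, b2, b3)
print("difficulty effect, achievement-avoiders:", effect_for_avoiders)
print("difficulty effect, achievement-oriented:", effect_for_oriented)
```

Under this coding, the effect of test difficulty is b1 for achievement-avoiders and b1 + b3 for achievement-oriented students. The two effects have opposite signs, so the lines cross when the means are graphed.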

12
Basic premises up to this point
  • We have used continuous variables (we can assume
    that having a value of 2 on a variable means
    having twice as much of it as a 1.)
  • We often work with categorical variables in which
    the different values have no real numerical
    relationship with each other (race, political
    affiliation, sex, marital status)
  • Democrat(1), Independent(2), Republican(3)
  • Is a Republican three times as politically
    affiliated as a Democrat?
  • How do we resolve this problem?

13
Dummy Variables
  • A dummy variable is a numerical variable used in
    regression analysis to represent subgroups of the
    sample in your study.
  • Dummy variables take two values: 0 and 1.
  • "Republican" variable: someone assigned a 1 on
    this variable is Republican, and someone with a 0
    is not.
  • They act like 'switches' that turn various
    parameters on and off in an equation.

14
Creating Dummy Variables
  • In Minitab, we can recode the categorical
    variable into a set of dummy variables, each of
    which has two levels.
  • In the regression model, we will use all but one
    of the original levels.
  • The level that is not included in the analysis
    is the category to which all other categories
    will be compared (the base level). You decide this.
  • The coefficient on each dummy variable in your
    regression shows the estimated difference in the
    mean of the dependent variable between that
    category and the base level, holding the other
    predictors constant.
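
The recoding step is not specific to Minitab. As a rough sketch, the helper below builds one 0/1 column for every level except the chosen base level. The function name `make_indicators` and the sample data are hypothetical and are used only for illustration.

```python
def make_indicators(values, base_level):
    """Return {level: [0/1, ...]} for every level except base_level."""
    levels = sorted(set(values))
    if base_level not in levels:
        raise ValueError(f"base level {base_level!r} does not appear in the data")
    return {
        level: [1 if value == level else 0 for value in values]
        for level in levels
        if level != base_level
    }


# Hypothetical sample of party affiliations, for illustration only.
affiliation = ["Democrat", "Republican", "Independent", "Republican", "Democrat"]

# Three levels produce two dummy variables; Democrat is the base level.
indicators = make_indicators(affiliation, base_level="Democrat")
for level, column in indicators.items():
    print(level, column)
```

A respondent with a 0 on every indicator belongs to the base level, so no information is lost by leaving that level out.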

15
Returning to the Clocks at Auction Data Set
  • The collector of antique grandfather clocks knows
    that the price received for the clocks increases
    linearly with the age of the clocks, and he
    hypothesized that the auction price of the clocks
    will increase linearly as the number of bidders
    increases. But let's say he doesn't have the
    exact number of bidders and only knows whether
    there was a high number of bidders (we'll say 9
    and above) or a low number (below 9).

16
Let's Create a Dummy Variable in Minitab
  • We'll use the Bidders2Cat column
  • Calc > Make Indicator Variables
  • In the top box, specify that you want to make
    indicator variables for Bidders2Cat
  • Let's store results in C10-C11
  • Once you have created the variables, name columns
    10 and 11 ManyBidders and FewBidders
  • Which is which? Why?
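
Outside Minitab, the same pair of indicator columns can be sketched in a few lines. The bidder counts below are hypothetical, and the cutoff of 9 is the one defined for the high/low split. The code does not reproduce the column order Minitab would choose.

```python
CUTOFF = 9  # "high" means 9 or more bidders

# Hypothetical bidder counts, for illustration only (not the clock data).
bidders = [13, 6, 9, 8, 15, 7]

many_bidders = [1 if n >= CUTOFF else 0 for n in bidders]
few_bidders = [1 - flag for flag in many_bidders]

for n, many, few in zip(bidders, many_bidders, few_bidders):
    print(f"bidders={n:2d}  ManyBidders={many}  FewBidders={few}")

# The two columns always sum to 1, so together they duplicate the intercept.
# That redundancy is why only one of them enters the regression.
assert all(m + f == 1 for m, f in zip(many_bidders, few_bidders))
```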

17
Before we run the analysis
  • Let's say we decide to include ManyBidders
    (FewBidders is the base level).
  • Because FewBidders is not included, we can
    determine whether ManyBidders predicts a different
    auction price than FewBidders.
  • If ManyBidders is significant in our regression,
    with a positive β coefficient, we conclude that
    clocks sold with many bidders bring a significantly
    higher mean auction price than clocks sold with
    few bidders.

18
Thinking Through the Variables
  • What is x₁?
  • (Let's hypothesize the model in plain English,
    just looking at high/low bidders.)
  • What's the Null Hypothesis?

19
Run the Analysis
  • Results of the t-test?
  • What would happen if we used ManyBidders as our
    base?
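
One way to think through these questions is to write out a model that contains only the dummy variable. Leaving out the age of the clock is a simplifying assumption for illustration, and the starred symbols are notation introduced here for the version with the other base level.

```latex
\begin{align*}
\text{FewBidders as base:}\quad
  E(y) &= \beta_0 + \beta_1 x_1, &
  x_1 &= \begin{cases} 1 & \text{many bidders} \\ 0 & \text{few bidders} \end{cases} \\
  E(y \mid \text{few}) &= \beta_0, &
  E(y \mid \text{many}) &= \beta_0 + \beta_1 \\[6pt]
\text{ManyBidders as base:}\quad
  E(y) &= \beta_0^{*} + \beta_1^{*} x_1^{*}, &
  x_1^{*} &= 1 - x_1 \\
  \beta_0^{*} &= \beta_0 + \beta_1, &
  \beta_1^{*} &= -\beta_1
\end{align*}
```

In either form, the t-test of H0: β1 = 0 asks whether the mean auction price is the same in the two groups. Switching the base level flips the sign of the coefficient but leaves the size of the t statistic and its p-value unchanged.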

20
Let's look at your Journal Application
  • What does it mean to create a dummy variable and
    when is it appropriate to do this?
  • What are all the terms in the original model?
  • This researcher started with a complex model and
    simplified it. Which model was better? How can we
    know?

21
Nested Models
  • Two models are nested if both contain the same
    terms and one has at least one additional term.
  • Example
  • The first (straight-line) model is nested within
    the second (curvilinear) model.
  • The first model is the reduced model and the
    second is the full or complete model.

22
Which is better? How do we decide?
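  • In this example, we would test
      H0: The coefficients on the additional
          (curvature) terms all equal zero
      Ha: At least one of these coefficients is nonzero
  • To test, we compare the SSE for the reduced model
    (SSE_R) and the SSE for the complete model (SSE_C).
  • Which will be larger?
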
23
Error Is Never Smaller for the Reduced Model
  • SSE_R ≥ SSE_C: adding terms to a model cannot
    increase its SSE
  • Is the drop in SSE from fitting the complete
    model large enough?
  • We use an F-test to compare models
  • Here, we test the null hypothesis that our
    curvature coefficients simultaneously equal zero.

24
Test statistic F
  • F = (drop in SSE / number of βs being tested) /
        (s² for the larger model),
    where drop in SSE = SSE_R - SSE_C
  • Table C4 (p. 766) gives you the critical value
    for F
  • df for numerator (ν₁):
      number of βs being tested
  • df for denominator (ν₂):
      n minus the number of βs in the complete model
      (counting β₀)
  • If F > the critical F value, reject H0. At least
    one of the additional terms contributes
    information about the response.
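
The test statistic is simple to compute directly. The sketch below uses a hypothetical helper, `nested_f`, and applies it to the one nested comparison for which the transcript supplies every number: an intercept-only reduced model (whose SSE is the Total SS) against the first-order clock model, with n = 32 taken from the Total DF of 31.

```python
def nested_f(sse_reduced, sse_complete, num_tested, n, num_betas_complete):
    """F statistic for comparing a reduced model with a complete model.

    num_tested: number of extra betas in the complete model (numerator df)
    num_betas_complete: all betas in the complete model, intercept included
    Returns (F, numerator df, denominator df).
    """
    df_denominator = n - num_betas_complete
    if num_tested <= 0 or df_denominator <= 0:
        raise ValueError("degrees of freedom must be positive")
    drop_in_sse = sse_reduced - sse_complete
    s2_complete = sse_complete / df_denominator  # s^2 for the larger model
    f_stat = (drop_in_sse / num_tested) / s2_complete
    return f_stat, num_tested, df_denominator


# Clock data: reduced model = intercept only (its SSE is the Total SS),
# complete model = first-order model with AgeOfClock and Bidders.
f_stat, df1, df2 = nested_f(
    sse_reduced=4799790,
    sse_complete=516727,
    num_tested=2,
    n=32,
    num_betas_complete=3,
)
print(f"F = {f_stat:.2f} on {df1} and {df2} df")
```

This reproduces the F of 120.19 from the clock ANOVA table, which shows that the global F-test is itself a nested-model comparison. For any other pair of nested models, the same function applies once both SSE values are available; the result is then compared with the tabled critical value.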

25
Conclusions?
  • Parsimonious models are preferable to big models
    as long as both have similar predictive power.
  • A model is parsimonious when it has a small
    number of predictors
  • In the end, choice of model is subjective.

26
Question 3 (Journal)
  • What type of error do we risk making by
    conducting multiple t-tests?
  • Pages 184, 188