Introduction to Multivariate Regression - PowerPoint PPT Presentation

1
Introduction to Multivariate Regression
  • Jeff Grynaviski

2
Motivation
  • In the natural sciences, researchers can often
    conduct controlled experiments to isolate the
    effects of one variable on another.
  • Example: An agronomist wants to know the effect
    of nitrogen on the growth of a certain plant.
  • However, she knows that the plant's growth
    also depends on how much light, water, and
    nutrients (e.g. nitrogen, phosphorus, potassium,
    and boron) it receives.
  • To isolate the effect of nitrogen on plant
    growth, the researcher can conduct a controlled
    experiment involving 50 similar plants, and then
    try to give each plant the same amounts of light,
    water, and important nutrients (other than
    nitrogen), in order to control for these effects
    on plant growth.
  • These 50 plants will be given different amounts
    of nitrogen and their growth recorded.
  • Because the other important growth factors have
    been held constant, the researcher can attribute
    the observed variations in plant growth to the
    different applications of nitrogen.
  • There will inevitably be chance variation in
    observed plant growth because a) not all plants
    are created equal, b) there are errors in
    measurement, c) the application of light, water,
    and nutrients cannot be precisely identical for
    each plant, and d) there are always other growth
    influences over which the researcher has no
    control.
  • Nevertheless, the most important influences have
    been identified and controls carefully imposed.
  • The experiment should yield useful data for
    estimating the equation
  • $Y = B_0 + B_1 X_1 + e$
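
To make the estimation step concrete, here is a minimal sketch of the controlled experiment as a simulation. The "true" intercept and slope, the nitrogen doses, and the noise level are all assumptions chosen for illustration; only the 50 plants and the form of the equation come from the slide.

```python
import random
import statistics

random.seed(1)

# Assumed "true" values, chosen only for illustration.
TRUE_B0, TRUE_B1 = 5.0, 2.0

# 50 similar plants receive different nitrogen doses (hypothetical units).
# Light, water, and other nutrients are held fixed, so they do not appear.
nitrogen = [0.2 * i for i in range(50)]
growth = [TRUE_B0 + TRUE_B1 * x + random.gauss(0, 1) for x in nitrogen]

# Bivariate OLS: slope = sum of cross-deviations / sum of squared deviations.
mean_x = statistics.fmean(nitrogen)
mean_y = statistics.fmean(growth)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nitrogen, growth))
sxx = sum((x - mean_x) ** 2 for x in nitrogen)
b1_hat = sxy / sxx
b0_hat = mean_y - b1_hat * mean_x

print(f"estimated B0 = {b0_hat:.2f}, estimated B1 = {b1_hat:.2f}")
```

Because the other growth factors are held constant by design, the chance variation only blurs the estimates; it does not push them systematically away from the assumed values.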

3
Motivation (cont.)
  • In the social sciences, researchers can seldom
    set up these sorts of controlled laboratory
    experiments. Instead, they must make do with
    nature's experiments and try to unravel the
    effects of one variable on another in a world in
    which many variables are changing at the same
    time.
  • In multiple regression, an equation with several
    explanatory variables is estimated in an attempt
    to isolate the separate effect of each
    explanatory variable on the dependent variable.
  • In other words, rather than trying to impose
    restrictions on the values of influences like
    sunlight and nutrients through physical
    controls, we measure these variables, along with
    the independent variable in which we are
    interested.

4
The Set-Up
  • In general, a multiple regression can be written
    as
  • $Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_k X_k + e$
  • Thus, the dependent variable Y depends on k
    explanatory variables plus an error term. (It
    does not matter in what order the explanatory
    variables are listed.)
  • The regression coefficient B1 gauges the effect
    of the first explanatory variable X1 on the
    dependent variable Y, holding the other variables
    constant. Similarly, B2 gives the effect of X2 on
    Y.
  • In effect, the regression coefficient B1 measures
    the same thing that controlled laboratory
    experiments seek to measure: the effect on Y of
    an isolated change in one explanatory variable.

5
Example
  • $\text{Plant Growth} = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + e$
  • X1 = Average hours of light / day
  • X2 = ml of nitrogen / day
  • X3 = ml of water / day
  • B1 measures the effect of the number of hours of
    light per day on plant growth, holding constant
    the effects of nitrogen and water.
  • B2 measures the effect of nitrogen on plant
    growth, holding constant the effects of light and
    water.
  • B3 measures the effect of water on plant growth,
    holding constant the effects of light and
    nitrogen.
  • What this shows is that while a controlled
    experiment can estimate a regression coefficient
    Bi by holding the other variables constant, a
    multiple regression procedure estimates Bi by
    taking into account how uncontrolled changes in
    the other variables influence Y.
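
The contrast between physical control and statistical control can be sketched with simulated data. Everything numeric below is an assumption made for illustration: the coefficient values, the sample size, and in particular the built-in tendency for plants that get more light to also get more nitrogen. NumPy is used for the fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Uncontrolled "nature's experiment": plants with more light also tend
# to get more nitrogen (an assumption built into this simulation).
light = rng.uniform(6, 14, n)                   # X1: hours of light / day
nitrogen = 0.5 * light + rng.normal(0, 1, n)    # X2: ml of nitrogen / day
water = rng.uniform(10, 50, n)                  # X3: ml of water / day

true_b = np.array([1.0, 0.8, 1.5, 0.1])         # assumed B0, B1, B2, B3
growth = (true_b[0] + true_b[1] * light + true_b[2] * nitrogen
          + true_b[3] * water + rng.normal(0, 1, n))

# Bivariate regression of growth on nitrogen alone: light is ignored,
# so part of its effect is wrongly credited to nitrogen.
slope_nitrogen_only = np.polyfit(nitrogen, growth, 1)[0]

# Multiple regression: light and water are measured and included.
X = np.column_stack([np.ones(n), light, nitrogen, water])
b_hat, *_ = np.linalg.lstsq(X, growth, rcond=None)

print(f"nitrogen slope, bivariate fit: {slope_nitrogen_only:.2f}")
print(f"nitrogen slope, multiple fit:  {b_hat[2]:.2f} (assumed true value 1.5)")
```

In this setup the bivariate slope overstates the effect of nitrogen, while the multiple regression coefficient lands near the assumed value, which is the sense in which including the other variables "holds them constant".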

6
Derivation of OLS estimators
  • The OLS estimators for the multiple regression
    model are chosen to minimize the sum of squared
    errors, which is precisely the approach that we
    took with the bivariate regression model.
  • In other words, for the model
  • $Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_k X_k + e$
  • $B_0, B_1, B_2, \dots, B_k$ are chosen in order to minimize
  • $\sum_i e_i^2 = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i (Y_i - (B_0 + B_1 X_{1i} + \dots + B_k X_{ki}))^2$
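
The minimization criterion can be checked numerically. The sketch below uses a small made-up data set with two explanatory variables, lets NumPy's least-squares routine find the coefficients, and then confirms that nudging any single coefficient away from that solution raises the sum of squared errors. The data values and the step size of 0.1 are arbitrary choices.

```python
import numpy as np

# Hypothetical data: 6 observations, 2 explanatory variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix with a leading column of ones for the intercept B0.
X = np.column_stack([np.ones_like(X1), X1, X2])


def sum_squared_errors(b):
    """Sum over i of (Y_i - (B0 + B1*X1_i + B2*X2_i))^2."""
    residuals = Y - X @ b
    return float(residuals @ residuals)


b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
sse_ols = sum_squared_errors(b_ols)

# Moving any one coefficient up or down should increase the criterion.
perturbed_sse = []
for j in range(len(b_ols)):
    for step in (-0.1, 0.1):
        b_alt = b_ols.copy()
        b_alt[j] += step
        perturbed_sse.append(sum_squared_errors(b_alt))

print(f"SSE at the OLS solution: {sse_ols:.4f}")
print(f"smallest SSE among perturbed coefficients: {min(perturbed_sse):.4f}")
```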

7
Derivation of OLS estimators (cont.)
  • The solution to the optimization problem
  • $\min_{B_0, B_1, B_2, \dots, B_k} \sum_i (Y_i - (B_0 + B_1 X_{1i} + \dots + B_k X_{ki}))^2$
  • is quite complicated, and requires a fair amount
    of linear algebra.
  • It is enough for you to know that statistical
    routines in Excel or SPSS are more than adequate
    to determine the values of the regression
    intercept and coefficients.
  • In addition, computer software will provide
    estimates of
  • - the standard errors for each regression
    coefficient, which are interpreted as the amount
    of uncertainty about B0, B1, ..., Bk and which we
    use for hypothesis tests about each of the
    independent variables
  • - the $R^2$, which indicates the proportion of the
    variance in the dependent variable Y that is
    explained by our independent variables
  • - and other statistics that we may talk about later.
  • The key point for you is that each of the
    concepts from bivariate regression has an analog
    in multiple regression, and should be interpreted
    in essentially the same way.
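
For readers curious about what the software reports, the sketch below computes the coefficients, their standard errors, and the R-squared with the standard textbook matrix formulas in NumPy. It is not a description of how Excel or SPSS work internally, and the simulated data and assumed coefficients (1.0, 2.0, -0.5) are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 2                                   # observations, explanatory variables

# Simulated data with assumed coefficients B0 = 1.0, B1 = 2.0, B2 = -0.5.
X_vars = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X_vars[:, 0] - 0.5 * X_vars[:, 1] + rng.normal(size=n)

X = np.column_stack([np.ones(n), X_vars])       # add the intercept column

# Coefficients from the normal equations: b = (X'X)^(-1) X'y.
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

# Residual variance, using n - k - 1 degrees of freedom.
resid = y - X @ b
sigma2 = (resid @ resid) / (n - k - 1)

# Standard errors: square roots of the diagonal of sigma^2 (X'X)^(-1).
se = np.sqrt(np.diag(sigma2 * XtX_inv))

# R-squared: share of the variance in y accounted for by the regression.
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

for name, coef, err in zip(["B0", "B1", "B2"], b, se):
    print(f"{name}: estimate = {coef:.3f}, std. error = {err:.3f}, t = {coef / err:.2f}")
print(f"R-squared = {r2:.3f}")
```

Each coefficient divided by its standard error gives the t statistic used in the hypothesis tests, just as in the bivariate case.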

8
Example
  • Rather than presenting a lot of math, or a
    fictitious example, we shall go over an actual
    application of multiple regression.

9
  • State Policy Components of Interstate Migration
    in the United States
  • Robert Preuhs
  • Political Research Quarterly
  • 1999

10
Overview of the Argument
  • 1) States are laboratories for democracy
  • 2) Better laboratories may provide more
    desirable places to live for some people.
  • 3) People vote with their feet
  • 4) Governments compete for citizen resources
    (they want more tax dollars).

11
Theory
  • The Consumer-Voter Model of Political Behavior
  • Citizens make migration decisions to maximize
    their utility.
  • Utility for a location is a function of:
  • Public Policy
  • - the ratio of government investment spending to
    government consumption spending
  • - state ideology (provides a general indicator of
    a state's policy preferences)
  • Economic Performance
  • Climate
12
Epistemic Relationships
  • The Dependent Variable is the Percent Change in a
    state's total population from 1991 to 1994.
  • Independent Variables
  • A. Public Policy Variables
  • The Ratio of Government Investment Spending to
    Government Consumption Spending is measured by
  • ( Per Capita State Highway Spending + Per Capita
    State Education Spending )
  • / ( Per Capita State Welfare Spending ), FY
    1987-1989.
  • State Ideology is measured by the average
    ideological identification of respondents to
    public opinion polls in the state (-100 =
    conservative; 0 = moderate; 100 = liberal).
  • Tax Burden is measured by the average per capita
    state and local taxes in the State from FY
    1987-1999.

13
Epistemic Relationships (cont.)
  • Independent Variables
  • B. Economic Variables
  • New Jobs is measured by
  • ( Number of Jobs in 1990 - Number of Jobs in 1987 )
    / Number of Jobs in 1987
  • Income is measured by
  • ( average median income for a family of four in
    1990 )
  • / ( national mean income for a family of four
    in 1990 ).
  • C. Control Variables.
  • Migration in the 1980s: Average annual change in
    population for state i from 1980-1989.
  • Mean Temperature: Mean temperature for the largest
    city in state i.
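
The constructed variables are simple ratios, and a short sketch may help fix the definitions. The function names and every number below are hypothetical, invented for an imaginary state rather than taken from the study. The sketch also assumes that the two investment items (highway and education spending) are added together and that New Jobs is the proportional change in jobs between 1987 and 1990.

```python
def investment_consumption_ratio(highway_pc, education_pc, welfare_pc):
    """(per capita highway + per capita education) / per capita welfare spending."""
    return (highway_pc + education_pc) / welfare_pc


def new_jobs(jobs_1990, jobs_1987):
    """Proportional change in the number of jobs from 1987 to 1990."""
    return (jobs_1990 - jobs_1987) / jobs_1987


def relative_income(state_median_income, national_mean_income):
    """State family income relative to the national figure."""
    return state_median_income / national_mean_income


# Made-up figures for an imaginary state.
ratio = investment_consumption_ratio(highway_pc=300.0, education_pc=700.0,
                                     welfare_pc=250.0)
jobs = new_jobs(jobs_1990=1_050_000, jobs_1987=1_000_000)
income = relative_income(state_median_income=44_000, national_mean_income=40_000)

print(f"investment/consumption ratio = {ratio:.2f}")
print(f"new jobs = {jobs:.3f}")
print(f"relative income = {income:.2f}")
```

A ratio above 1 here means the imaginary state spends more per capita on the investment items than on welfare, and a relative income above 1 means its families earn more than the national figure.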

14
Regression Model and Hypotheses
  • The following OLS model was estimated (with
    research hypotheses in parentheses).
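  • $\text{Net Migration}_i =$
  • $B_0 + B_1\,\text{Tax Burden}_i$ (H1: $B_1 < 0$)
  • $+\, B_2\,\text{Investment/Consumption Ratio}_i$ (H1: $B_2 > 0$)
  • $+\, B_3\,\text{Ideology}_i$ (H1: $B_3$ ?)
  • $+\, B_4\,\text{New Jobs}_i$ (H1: $B_4 > 0$)
  • $+\, B_5\,\text{Migration in the 1980s}_i$ (H1: $B_5 > 0$)
  • $+\, B_6\,\text{Mean Temperature}_i$ (H1: $B_6 > 0$)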
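
Because most of these hypotheses predict a direction, they call for one-tailed tests. The sketch below shows the mechanics under a simplifying assumption: it uses the standard normal distribution in place of the t distribution, which is only an approximation and is reasonable mainly when the sample is fairly large. The function name, the coefficient, and the standard error are hypothetical and are not results from the study.

```python
from statistics import NormalDist


def one_tailed_p_value(estimate, std_error, alternative):
    """Approximate one-tailed p-value for the null hypothesis B = 0.

    alternative is "less" (H1: B < 0) or "greater" (H1: B > 0).
    The standard normal stands in for the t distribution here.
    """
    z = estimate / std_error
    if alternative == "less":
        return NormalDist().cdf(z)
    if alternative == "greater":
        return 1.0 - NormalDist().cdf(z)
    raise ValueError("alternative must be 'less' or 'greater'")


# Hypothetical coefficient and standard error, not taken from the study.
p = one_tailed_p_value(estimate=-0.8, std_error=0.4, alternative="less")
print(f"p = {p:.3f}; significant at .05 in a one-tailed test: {p < 0.05}")
```

A coefficient with the wrong sign can never be significant in a one-tailed test, however large it is, because only the predicted tail counts as evidence for H1.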

15
Findings
* Indicates that a regression coefficient is
significant at the .05 level in a one-tailed test.