Multiple regression - PowerPoint PPT Presentation

Slides: 63
Provided by: johnrott

Transcript and Presenter's Notes



1
Multiple regression
  • V506 Class 12
  • November 12, 2009

2
Overview
  • Theory and common sense
  • Stepwise regression
  • Stepwise regression in SPSS
  • Qualitative (dummy) variables
  • Curvilinear relationships
  • Transforming variables in SPSS
  • Multicollinearity

3
Theory and common sense
  • How do you select the independent variables for a
    multiple regression model?
  • Theory suggesting causal relationships provides
    some guidance
  • Other variables may be suggested by common sense
    or general knowledge of what might affect the
    dependent variable
  • But there should be a reason for including every
    independent variable

4
Stepwise regression
  • Given a list of candidate independent variables,
    which do you use for the regression model?
  • The SPSS Linear Regression procedure includes
    stepwise regression
  • Stepwise regression enters (and may later remove)
    variables one at a time, at each step choosing the
    variable that most improves prediction of the
    dependent variable
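The one-at-a-time entry and removal logic can be sketched in Python with NumPy. This is a simplified illustration, not SPSS's implementation. It uses a partial F statistic with entry and removal thresholds (3.84 and 2.71) that are assumptions chosen for illustration. The function names (`stepwise`, `partial_f`) and the simulated variables `x1`, `x2`, `x3` are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def design(candidates, names, n):
    """Design matrix: an intercept column plus the named candidate variables."""
    return np.column_stack([np.ones(n)] + [candidates[name] for name in names])


def partial_f(candidates, y, base, extra):
    """Partial F statistic for adding `extra` to a model that already has `base`."""
    n = len(y)
    full = base + [extra]
    rss_small = rss(design(candidates, base, n), y)
    rss_full = rss(design(candidates, full, n), y)
    df_resid = n - len(full) - 1  # observations minus slopes minus intercept
    return (rss_small - rss_full) / (rss_full / df_resid)


def stepwise(candidates, y, f_enter=3.84, f_remove=2.71, max_steps=50):
    """Forward entry with backward removal; thresholds are illustrative assumptions."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Entry step: add the remaining candidate with the largest partial F.
        remaining = [c for c in candidates if c not in selected]
        if remaining:
            scores = {c: partial_f(candidates, y, selected, c) for c in remaining}
            best = max(scores, key=scores.get)
            if scores[best] >= f_enter:
                selected.append(best)
                changed = True
        # Removal step: drop the selected variable whose partial F is now weakest.
        if len(selected) > 1:
            scores = {
                c: partial_f(candidates, y, [s for s in selected if s != c], c)
                for c in selected
            }
            worst = min(scores, key=scores.get)
            if scores[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Usage with simulated data: y depends on x1 and x2; x3 is pure noise.
rng = np.random.default_rng(0)
n = 200
candidates = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 2 + 3 * candidates["x1"] - 2 * candidates["x2"] + rng.normal(size=n)
print(stepwise(candidates, y))
```

The genuine predictors enter easily. As the following slides warn, a pure-noise candidate such as `x3` can still pass the entry threshold by chance, which is why the candidate list should be chosen carefully.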

5
Stepwise regression
  • Stepwise regression has (deservedly) gained a bad
    reputation among statisticians
  • Some analysts throw in every possible variable as
    a candidate independent variable and let the
    stepwise procedure sort through them to find a
    regression model

6
Stepwise regression
  • Used this way, the procedure is likely to find
    relationships that reflect random variation in
    the sample rather than true relationships in the
    population
  • But with a carefully selected, limited set of
    candidate independent variables, stepwise
    regression can be a valuable tool for finding a
    regression model

7
Stepwise regression to predict housing value
  • Predict housing value using the data in
    AffHsgEx.sav
  • As independent variables, choose percent of
    renter-occupied units with rent below $200,
    metropolitan area population, percent population
    change, and median family income

8
Stepwise regression output: variables entered/removed
9
Stepwise regression output: model summary
10
Stepwise regression output: ANOVA
11
Stepwise regression output: coefficients
12
Stepwise regression output: excluded variables
13
Stepwise regression in SPSS
  • Use the Analyze, Regression, Linear command
  • Enter all of the candidate independent variables
  • Set Method to Stepwise

14
Using qualitative variables in regression
  • So far we have used one or more quantitative
    independent variables as predictor(s) of a
    continuous dependent variable
  • But sometimes it can be useful to include
    qualitative, categorical variables as predictors
    in multiple regressions

15
Predicting housing values using median family income
  • Start by trying to predict the median value of
    owner-occupied houses in metropolitan areas using
    median family income

16
Regression results
17
Regression results (continued)
18
Including a categorical variable
  • Other analyses suggest that the region of the
    country has a strong effect on housing values
  • Median housing values are highest in the West

19
Boxplots of housing value by region
20
ANOVA of housing value by region
21
Creating a West dummy variable
  • Housing values are higher in the West
  • Want to include in the regression whether each
    metropolitan area is in the West
  • Create a dummy variable:
      • Value 1 if the metro area is in the West
      • Value 0 if the metro area is not in the West
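Coding a dummy variable and entering it next to a quantitative predictor can be sketched in a few lines of NumPy. The metro-area records below are invented so that the true West premium is exactly 20,000. They are not the AffHsgEx.sav data, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical metro-area records: (median family income, region, median house value).
# Built as value = 10000 + 2 * income + 20000 * West, with no noise.
areas = [
    (20000, "West", 70000),
    (22000, "South", 54000),
    (18000, "West", 66000),
    (25000, "Northeast", 60000),
    (21000, "Midwest", 52000),
    (24000, "West", 78000),
]

income = np.array([a[0] for a in areas], dtype=float)
# West dummy: 1 if the metro area is in the West, 0 otherwise.
west = np.array([1.0 if a[1] == "West" else 0.0 for a in areas])
value = np.array([a[2] for a in areas], dtype=float)

# Regress value on an intercept, income, and the West dummy.
X = np.column_stack([np.ones(len(areas)), income, west])
b0, b_income, b_west = np.linalg.lstsq(X, value, rcond=None)[0]
print(f"income slope: {b_income:.3f}, West dummy: {b_west:.1f}")
```

Only one dummy is needed for a two-way split (West versus not West). Adding a second column for "not West" would duplicate the intercept.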

22
Regression results
23
Regression results
24
Interpreting dummy variable regression coefficient
  • The regression coefficient for the West dummy
    variable is 22895.976
  • SPSS reports Sig. = .000 (that is, p < .001), so
    the coefficient is significantly different from
    zero (reject the null hypothesis that it equals
    zero)
  • The coefficient says median housing values are
    about $22,896 higher in the West than in other
    regions, after controlling for the effect of
    median family income
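A short worked equation shows why the dummy coefficient reads as a fixed shift. Here $b_0$ and $b_1$ stand for the intercept and the income slope from the regression output, whose values are not reproduced in this transcript:

```latex
\begin{aligned}
\widehat{\text{Value}} &= b_0 + b_1\,\text{Income} + 22895.976\,\text{West} \\
\text{West}=1:\quad \widehat{\text{Value}} &= (b_0 + 22895.976) + b_1\,\text{Income} \\
\text{West}=0:\quad \widehat{\text{Value}} &= b_0 + b_1\,\text{Income}
\end{aligned}
```

The two prediction lines share the same income slope, and the West line sits 22,895.976 higher at every income level.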

25
Nonlinear relationships
  • Regression assumes linear relationships between
    the dependent variable and the independent
    variables
  • But variables can be related to one another with
    a nonlinear, curvilinear relationship

26
Relationship of rent of new units to population
  • Looking at data for 101 of the largest
    metropolitan areas in 1980
  • Dependent variable: rent of renter-occupied units
    built 1975-1980
  • Independent variable: population

27
Scatterplot
28
Regression results
29
Regression results
30
Scatterplot illustrating nonlinear relationship
31
Form of relationship
  • Rent of new units increases with population
  • But the amount of that increase seems to decline
    as population increases
  • This suggests a nonlinear relationship

32
Variable transformation
  • Can often handle nonlinear relationships by doing
    a mathematical transformation of the independent
    or dependent variable that makes the relationship
    linear
  • Curve on the scatterplot shows what the
    relationship would be if rent were related to the
    natural logarithm of population

33
Doing variable transformation
  • Create new variable that has the value of the
    natural logarithm of population
  • Use that new variable as the independent variable
    in the regression
  • Regression equation: $\widehat{\text{Rent}} = b_0 + b_1 \ln(\text{Population})$
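The two steps above (create ln(population), then regress on it) can be sketched with NumPy. The population and rent values are invented: rent is built as an exact logarithmic function of population so that the comparison is easy to read. The helper name `fit_r2` is hypothetical.

```python
import numpy as np


def fit_r2(x, y):
    """Fit y = b0 + b1*x by OLS; return (b0, b1, R-squared)."""
    X = np.column_stack([np.ones(len(x)), x])
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - (b0 + b1 * x)
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2


# Hypothetical data: rent rises with population, but ever more slowly.
population = np.array([0.5e6, 1e6, 2e6, 4e6, 8e6, 16e6])
rent = 100 + 40 * np.log(population)

# Step 1: create the new variable ln(population).
ln_pop = np.log(population)

# Step 2: use it in place of population as the independent variable.
_, _, r2_linear = fit_r2(population, rent)
b0, b1, r2_log = fit_r2(ln_pop, rent)
print(f"linear R^2 = {r2_linear:.3f}, log R^2 = {r2_log:.3f}, b1 = {b1:.2f}")
```

With real data neither fit would be perfect. The comparison of the two R-squared values, together with the scatterplots, is what suggests the transformation helps.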

34
Scatterplot of rent versus log of population
35
Regression results using log of population
36
Regression results using log of population
37
Fitted linear and logarithmic regressions
38
Analysis of residuals
  • Sometimes it is easier to understand what is
    going on in a regression by looking at the
    residuals, the errors in predicting the
    dependent variable
  • Can plot the residuals against the predicted
    values to look for patterns
  • Normally plot standardized residuals
  • A perfect fit would put every point on a
    horizontal line at 0
  • Scatter above and below the line indicates
    errors in prediction
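Standardized residuals and predicted values can be computed in Python. This sketch assumes that "standardized" means each residual divided by the standard error of the estimate (the root mean squared error). The data are invented: a concave curve is deliberately fitted with a straight line, so the residuals show the kind of curved pattern discussed next. The function name is hypothetical.

```python
import numpy as np


def standardized_residuals(X, y):
    """Return predicted values and residuals divided by the standard error of the estimate.

    X must already contain an intercept column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ beta
    resid = y - predicted
    n, p = X.shape
    see = np.sqrt(resid @ resid / (n - p))  # standard error of the estimate
    return predicted, resid / see


# Hypothetical data: a curved relationship fitted with a straight line.
x = np.arange(1.0, 11.0)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1])
y = 10 * np.sqrt(x) + noise
X = np.column_stack([np.ones(len(x)), x])
pred, z = standardized_residuals(X, y)
for p_i, z_i in zip(pred, z):
    print(f"predicted {p_i:7.2f}   standardized residual {z_i:6.2f}")
```

The residuals are negative at both ends and positive in the middle. A straight line fitted to a curve leaves exactly this pattern.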

39
Residuals: predicting rent with population
40
Interpreting the plot of residuals
  • Note the curve in the pattern of the residuals
  • This indicates the presence of a nonlinear
    relationship
  • Suggests using some transform of the independent
    variable to create a more linear relationship
  • Also much more variation (larger residuals) for
    areas with smaller populations: a problem of
    heteroskedasticity
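A rough way to put a number on the uneven spread is to compare the standard deviation of the residuals for the smaller and larger halves of the areas. This is a descriptive check only, not a formal test for heteroskedasticity. The residuals and populations below are invented, and the function name is hypothetical.

```python
import numpy as np


def residual_spread_by_group(x, resid):
    """Compare residual spread for the lower and upper halves of x.

    A rough descriptive check for heteroskedasticity, not a formal test.
    Returns (sd_low, sd_high).
    """
    order = np.argsort(x)
    half = len(x) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return float(np.std(low)), float(np.std(high))


# Hypothetical residuals: larger errors for smaller populations.
population = np.array([0.6, 0.8, 1.0, 1.2, 3.0, 5.0, 8.0, 12.0])  # millions
resid = np.array([4.0, -4.0, 4.0, -4.0, 1.0, -1.0, 1.0, -1.0])
sd_low, sd_high = residual_spread_by_group(population, resid)
print(f"SD of residuals, smaller areas: {sd_low:.2f}; larger areas: {sd_high:.2f}")
```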

41
Residuals: predicting rent with log of population
42
Interpreting plot for the revised regression
  • More of a random scattering of the residuals than
    before
  • Lack of distinct pattern suggests relationship is
    closer to being linear
  • Also, somewhat more even amounts of variation at
    different population levels, except for the
    highest: less of a problem of heteroskedasticity

43
Library use as a function of travel time
  • Percent using the library in different zip codes
    as a function of distance to the library
      • Linear
      • Negative exponential
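One common way to fit a negative exponential, use = a·exp(−b·time), is to take the natural log of the dependent variable. This makes the model linear: ln(use) = ln(a) − b·time. The sketch below assumes that approach; the transcript does not show how the slides' curve was fitted. The travel times and the true values a = 60 and b = 0.08 are invented.

```python
import numpy as np

# Hypothetical zip-code data: travel time to the library (minutes) and percent using it.
travel_time = np.array([2.0, 5.0, 10.0, 15.0, 20.0, 30.0])
a_true, b_true = 60.0, 0.08
pct_using = a_true * np.exp(-b_true * travel_time)

# Transform the dependent variable so the negative exponential becomes linear.
ln_use = np.log(pct_using)
X = np.column_stack([np.ones(len(travel_time)), travel_time])
(intercept, slope), *_ = np.linalg.lstsq(X, ln_use, rcond=None)

a_hat = np.exp(intercept)  # back-transform the intercept
b_hat = -slope             # decay rate
print(f"fitted: use = {a_hat:.1f} * exp(-{b_hat:.3f} * time)")

# For comparison, a straight-line fit on the original scale.
(lin_a, lin_b), *_ = np.linalg.lstsq(X, pct_using, rcond=None)
print(f"linear fit: use = {lin_a:.1f} + ({lin_b:.3f}) * time")
```

With noisy data, fitting on the log scale weights errors differently than fitting the curve directly. This is a modeling choice, not a neutral step.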

44
Library use as a function of travel time: linear
45
Library use as a function of travel time: negative exponential
46
Possible forms for variable transformation
  • Could use any mathematical function
  • Commonly used functions include
      • Natural logarithm of the variable
      • Square of the variable
      • Inverse of the variable
      • The variable and its square (quadratic function)
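The listed transformations are one-liners in NumPy. The quadratic case keeps the original variable and adds its square as a second predictor. The values of `x` and the curved outcome `y` are invented for illustration.

```python
import numpy as np

# Hypothetical independent variable (must be positive for the log and inverse).
x = np.array([1.0, 2.0, 4.0, 8.0])

transforms = {
    "natural log": np.log(x),
    "square": x ** 2,
    "inverse": 1.0 / x,
}
for name, values in transforms.items():
    print(f"{name:12s} {values}")

# Quadratic function: keep x and add x squared as a second predictor.
X_quadratic = np.column_stack([np.ones(len(x)), x, x ** 2])
y = 5 + 3 * x - 0.2 * x ** 2  # hypothetical curved outcome
coef, *_ = np.linalg.lstsq(X_quadratic, y, rcond=None)
print(coef)  # intercept, linear term, squared term
```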

47
Transforming variables in SPSS
  • Use Transform, Compute command
  • For Target Variable, enter a name for the new
    variable
  • Use the Type & Label button to enter a variable
    label for the new variable
  • Build the Numeric Expression from the other
    variable(s), using the Functions list if necessary

48
Multicollinearity
  • High levels of intercorrelation among independent
    variables can produce unstable estimates of
    regression coefficients and insignificant results
  • This can produce regression models that are not
    very informative or useful

49
Percent births to teens in Marion County
  • Attempt to predict the percentage of births to
    teenage mothers by census tract in Marion County
  • Hypothesis: this percentage is affected by
    socioeconomic status
  • Conduct a first regression using percent college
    graduates and median family income

50
First regression output
51
First regression output
52
Expanding the regression
  • Both independent variables are significant
  • Because the socioeconomic-status reasoning seems
    sound, add percent of persons below the poverty
    level and percent high school graduates

53
Second regression output
54
Second regression output
55
Multicollinearity in the results
  • Note that the regression coefficients for percent
    college graduates and median family income are
    very different and are no longer significant
  • Nor is percent below poverty level significant
  • This is the problem of multicollinearity
  • It results from high correlations among the
    independent variables
  • The regression is no longer very useful

56
Correlations among variables
57
Signs of multicollinearity
  • Significant correlations between pairs of
    independent variables
  • Nonsignificant tests for some or all of the
    regression coefficients even though the overall
    model is significant
  • Coefficient signs opposite to what is expected
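Besides the pairwise correlation matrix, a common diagnostic not shown on these slides is the variance inflation factor, VIF = 1 / (1 − R²). Here R² comes from regressing one independent variable on all the others. The tract values below are invented, with income built to track percent college graduates closely. The function names are hypothetical. A VIF above 10 is a widely used rule of thumb, not a hard cutoff.

```python
import numpy as np


def r_squared(X, y):
    """R-squared from an OLS fit of y on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def vif(predictors):
    """Variance inflation factor for each column of a 2-D array of predictors."""
    n, k = predictors.shape
    out = []
    for j in range(k):
        # Regress column j on an intercept plus all the other columns.
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        out.append(1.0 / (1.0 - r_squared(X, predictors[:, j])))
    return np.array(out)


# Hypothetical tract data: percent college graduates, median income ($1,000s), a third variable.
pct_college = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
income_k = np.array([30.5, 39.8, 50.2, 59.9, 70.1, 79.6])  # nearly 10 + 2 * pct_college
other = np.array([12.0, 8.0, 15.0, 9.0, 14.0, 10.0])
tracts = np.column_stack([pct_college, income_k, other])

print(np.round(np.corrcoef(tracts, rowvar=False), 3))
print(np.round(vif(tracts), 1))
```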

58
SPSS scatterplot creation
  • Use the Graphs, Chart Builder command
  • Click on the Gallery tab and select Scatter/Dot
    on the list of graph types in the lower left
  • Drag the Simple Scatter icon (the first one) onto
    the canvas (the blank area at the top)
  • Drag the independent variable into the x-axis
    drop zone
  • Drag the dependent variable into the y-axis drop
    zone

59
SPSS correlation
  • Use the Analyze, Correlate, Bivariate command
  • Select the variables for the correlation matrix
  • Check Pearson
  • Can specify one-tailed tests of significance if
    desired (or halve the reported two-tailed Sig.,
    provided the correlation is in the hypothesized
    direction)
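Halving a two-tailed Sig. gives the one-tailed value only when the observed correlation falls in the hypothesized direction. Otherwise the one-tailed p-value is 1 minus half the two-tailed value. A minimal Python sketch of the conversion follows; the function name and the example numbers are hypothetical.

```python
def one_tailed_p(two_tailed_p, r, expected_sign):
    """Convert a two-tailed p-value for a correlation r into a one-tailed p-value.

    expected_sign is +1 if the hypothesis predicts a positive correlation, -1 if negative.
    Halving is valid only when r falls in the hypothesized direction.
    """
    if r * expected_sign > 0:
        return two_tailed_p / 2
    return 1 - two_tailed_p / 2


print(one_tailed_p(0.08, r=0.35, expected_sign=1))   # direction matches the hypothesis
print(one_tailed_p(0.08, r=-0.35, expected_sign=1))  # direction opposite to the hypothesis
```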

60
SPSS linear regression
  • Use the Analyze, Regression, Linear command
  • Select the Dependent variable
  • Select the Independent variable(s)
  • Can use the Statistics button to request
    descriptive statistics if desired
