Class 17: Tuesday, Nov. 9

1
Class 17: Tuesday, Nov. 9
  • Another example of interpreting multiple
    regression coefficients
  • Steps in multiple regression analysis and example
    analysis
  • Omitted Variables Bias
  • Discuss final project

2
Interpreting Multiple Regression Coefficients
Another Example
  • A marketing firm studied the demand for a new
    type of personal digital assistant (PDA). The
    firm surveyed a sample of 75 consumers. Each
    respondent was initially shown the new device and
    then asked to rate the likelihood of purchase on
    a scale of 1 to 10, with 1 implying little chance
    of purchase and 10 indicating almost certain
    purchase. The age (in years) and income (in
    thousands of dollars) were recorded for each
    respondent. The data are in pda.JMP.

3
(No Transcript)
4
Simple Regressions to Predict Rating (Likelihood
of Purchase)
  • As income rises, the likelihood of purchase also
    increases; specifically, a $10,000 increase in
    income is associated with a 0.7 increase in
    rating.
  • As age increases, the likelihood of purchase also
    increases; specifically, a 10-year increase in age
    is associated with a 0.9 increase in rating.

5
Multiple Regression
  • For any fixed level of income, the average rating
    decreases by 0.7 if age increases by 10 years.
  • At every fixed income level, older consumers have
    lower ratings on average than younger consumers,
    and at every fixed age level, average ratings
    increase as income rises.
  • The positive association between age and rating
    in the simple regression is a result of the
    positive association between age and income.
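
The sign change between the simple and multiple regressions can be reproduced with a small simulation. The sketch below does not use pda.JMP; it generates synthetic data under assumed coefficients (all numbers, the seed, and the helper `ols` are illustrative assumptions) in which income rises with age, rating rises with income, and rating falls with age at fixed income.

```python
import random

def ols(columns, y):
    """Least squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(17)  # fixed seed so the sketch is reproducible
n = 75                   # same sample size as the survey; the data are synthetic

age = [rng.uniform(20, 70) for _ in range(n)]
# Assumed: older respondents tend to earn more (income in thousands of dollars)
income = [10 + 1.5 * a + rng.gauss(0, 10) for a in age]
# Assumed: rating rises with income and falls with age at fixed income
rating = [0.5 + 0.10 * inc - 0.07 * a + rng.gauss(0, 0.5)
          for a, inc in zip(age, income)]

simple_age = ols([age], rating)[1]        # rating on age alone
simple_income = ols([income], rating)[1]  # rating on income alone
_, multi_age, multi_income = ols([age, income], rating)

print("simple regression slope on age:     ", round(simple_age, 3))
print("simple regression slope on income:  ", round(simple_income, 3))
print("multiple regression coef. on age:   ", round(multi_age, 3))
print("multiple regression coef. on income:", round(multi_income, 3))
```

In this setup the slope on age is positive when age is the only explanatory variable and negative once income is held fixed, which is the pattern described on the slide.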

6
Air Pollution and Mortality
  • Data set pollution.JMP provides information about
    the relationship between pollution and mortality
    for 60 cities during 1959-1961.
  • The variables are:
  • y (MORT) = total age-adjusted mortality in deaths
    per 100,000 population
  • PRECIP = mean annual precipitation (in inches)
  • EDUC = median number of school years completed for
    persons 25 and older
  • NONWHITE = percentage of 1960 population that is
    nonwhite
  • NOX = relative pollution potential of NOx
    (related to tons of NOx emitted per day per
    square kilometer)
  • SO2 = relative pollution potential of SO2

7
Multiple Regression: Steps in Analysis
  1. Preliminaries: Define the question of interest.
    Review the design of the study. Correct errors
    in the data.
  2. Explore the data. Use graphical tools, e.g., a
    scatterplot matrix; consider transformations of
    explanatory variables; fit a tentative model;
    check for outliers and influential points.
  3. Formulate an inferential model. Word the
    questions of interest in terms of model
    parameters.

8
Multiple Regression: Steps in Analysis (Continued)
  4. Check the model. (a) Check the model assumptions
    of linearity, constant variance, and normality. (b)
    If needed, return to step 2 and make changes to
    the model (such as transformations or adding
    terms for interaction and curvature). (c) Drop
    variables from the model that are not of central
    interest and are not significant.
  5. Infer the answers to the questions of interest
    using appropriate inferential tools (e.g.,
    confidence intervals, hypothesis tests,
    prediction intervals).
  6. Presentation: Communicate the results to the
    intended audience.

9
Air Pollution and Mortality
  • Question of interest: What is the association
    between mortality and the air pollution variables
    (NOX and SO2) once environmental variables
    (precipitation) and demographic variables have
    been taken into account?

10
Curvature in relationship between Mortality and
SO2. Tukey's Bulging Rule suggests transforming
SO2 to log SO2 as a possible remedy. The
scatterplot of Mortality vs. NOX is crunched.
When a scatterplot of a response against an
explanatory variable is crunched, transforming the
explanatory variable to log(explanatory variable)
is a good idea.
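
A rough way to see why the log transformation helps is to compare how linear the relationship looks before and after transforming. The sketch below uses synthetic data, not pollution.JMP: the explanatory variable is assumed to be strongly right-skewed (taken here as a stand-in for a "crunched" plot, with most points bunched near zero and a few very large values), and the response is assumed to be linear in its logarithm. The helper `pearson`, the seed, and all constants are illustrative assumptions.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

rng = random.Random(10)  # fixed seed so the sketch is reproducible
n = 60

# Assumed: a right-skewed explanatory variable (log-normal draws)
x = [math.exp(rng.gauss(0, 1.5)) for _ in range(n)]
# Assumed: the response is linear in log(x), plus noise
y = [900 + 20 * math.log(xi) + rng.gauss(0, 5) for xi in x]

r_raw = pearson(x, y)                          # response vs. x
r_log = pearson([math.log(xi) for xi in x], y) # response vs. log(x)

print("correlation with x:     ", round(r_raw, 3))
print("correlation with log(x):", round(r_log, 3))
```

Under these assumptions the correlation with log(x) is much closer to 1 than the correlation with x, so a straight-line model fits the transformed variable better.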
11
Initial Model
Checking for influential points: New Orleans
has Cook's distance of 1.75 and
leverage 0.45 > (3*6/60) = 0.30. We should remove New
Orleans, noting that it has unusual explanatory
variables and that our conclusions do not apply
to explanatory variables in the range of New
Orleans.
12

Because New Orleans is an influential point and
has leverage 0.45 > (3*6/60) = 0.30, we remove it and
note that our model does not apply to observations in
the range of explanatory variables of New
Orleans.
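
The leverage comparison can be checked with a few lines of arithmetic. The sketch below assumes the cutoff has the form 3(K+1)/n with K = 5 explanatory variables and n = 60 cities, which matches the 3*6/60 on the slide. The Cook's distance threshold of 1 is a common rule of thumb and is an assumption here, not something the slides state. The function names and the toy data are illustrative.

```python
def leverage_cutoff(n_obs, n_explanatory):
    """Rule-of-thumb cutoff 3 * (K + 1) / n for flagging high leverage."""
    return 3 * (n_explanatory + 1) / n_obs

# Values quoted on the slide for New Orleans
new_orleans_leverage = 0.45
new_orleans_cooks_d = 1.75

cutoff = leverage_cutoff(n_obs=60, n_explanatory=5)
high_leverage = new_orleans_leverage > cutoff
# Assumption: Cook's distance above 1 is treated as influential
influential = new_orleans_cooks_d > 1.0
print("cutoff:", cutoff, "high leverage:", high_leverage,
      "influential:", influential)

def simple_regression_leverages(x):
    """Leverages h_i = 1/n + (x_i - mean)^2 / Sxx for a one-variable regression."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1 / n + (xi - mean) ** 2 / sxx for xi in x]

# Toy data (hypothetical): nine ordinary x values and one far from the rest
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
h = simple_regression_leverages(x)
toy_cutoff = leverage_cutoff(n_obs=len(x), n_explanatory=1)
flagged = [i for i, hi in enumerate(h) if hi > toy_cutoff]
print("flagged observations:", flagged)
```

The second half shows where leverage comes from in the simplest case: an observation whose explanatory value is far from the others gets a leverage near 1, and the same cutoff rule flags it.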
13
(No Transcript)
14
Checking the Model
15
(No Transcript)
16
Model Building
  • Model Parsimony: If a variable is not of central
    interest and is not significant, we remove it
    from the model.
  • We can remove Education. We don't remove log NOX
    since it is of central interest.

17
Inference About Questions of Interest
  • Strong evidence that mortality is positively
    associated with SO2 for fixed levels of
    precipitation, education, nonwhite, NOX.
  • No strong evidence that mortality is associated
    with NOX for fixed levels of precipitation,
    education, nonwhite, SO2.

18
Multiple Regression and Causal Inference
  • Goal: Figure out what the causal effect on
    mortality would be of decreasing air pollution
    (and keeping everything else in the world fixed).
  • Lurking variable: A variable that is associated
    with both air pollution in a city and mortality
    in a city.
  • In order to figure out whether air pollution
    causes mortality, we want to compare mean
    mortality among cities with different air
    pollution levels but the same values of the
    confounding variables.
  • If we include all of the lurking variables in the
    multiple regression model, the coefficient on air
    pollution represents the change in the mean of
    mortality that is caused by a one unit increase
    in air pollution.

19
Omitted Variables
  • What happens if we omit a lurking variable from
    the regression, e.g., percentage of smokers?
  • Suppose we are interested in the causal effect of
    $x_1$ on y and believe that there are lurking
    variables $x_2, \ldots, x_p$ and that $\beta_1$ in
    $E(y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
    is the causal effect of $x_1$ on y. If we
    omit the confounding variable, $x_p$, then the
    multiple regression will be estimating the
    coefficient $\beta_1^*$ as the coefficient on $x_1$.
    How different are $\beta_1^*$ and $\beta_1$?
20
Omitted Variables Bias Formula
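  • Suppose that
    $E(y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
    and
    $E(x_p \mid x_1, \ldots, x_{p-1}) = \delta_0 + \delta_1 x_1 + \cdots + \delta_{p-1} x_{p-1}$.
  • Then
    $E(y \mid x_1, \ldots, x_{p-1}) = \beta_0^* + \beta_1^* x_1 + \cdots + \beta_{p-1}^* x_{p-1}$,
    where $\beta_1^* = \beta_1 + \beta_p \delta_1$.
  • Formula tells us about direction and magnitude of
    bias from omitting a variable in estimating a
    causal effect.
  • Omitted variable bias: $\beta_1^* - \beta_1 = \beta_p \delta_1$
  • Formula also applies to least squares estimates,
    i.e., $b_1^* = b_1 + b_p d_1$.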
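
The least squares version of the formula can be verified numerically. The sketch below uses synthetic data with three explanatory variables and treats the third as the omitted one. It fits the long regression (all three), the auxiliary regression of the omitted variable on the other two, and the short regression (omitted variable left out), then checks that the short-regression coefficient on the first variable equals the long-regression coefficient plus the product of the omitted variable's coefficient and the auxiliary slope. The helper `ols`, the seed, the variable names, and all coefficients are illustrative assumptions.

```python
import random

def ols(columns, y):
    """Least squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 60

x1 = [rng.gauss(0, 1) for _ in range(n)]  # variable of interest
x2 = [rng.gauss(0, 1) for _ in range(n)]  # an included control
# Assumed: the omitted variable is correlated with x1 and x2
x3 = [0.8 * a - 0.3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
y = [2 + 1.0 * a + 0.5 * b + 3.0 * c + rng.gauss(0, 1)
     for a, b, c in zip(x1, x2, x3)]

b0, b1, b2, b3 = ols([x1, x2, x3], y)  # long regression
d0, d1, d2 = ols([x1, x2], x3)         # auxiliary: omitted variable on the rest
s0, s1, s2 = ols([x1, x2], y)          # short regression, x3 omitted

print("short-regression coefficient on x1:", s1)
print("b1 + b3 * d1:                      ", b1 + b3 * d1)
print("omitted variable bias (b3 * d1):   ", b3 * d1)
```

The identity holds exactly for least squares fits (up to floating-point error), so the two printed coefficients agree. The sign of the bias is the sign of the omitted variable's coefficient times the sign of the auxiliary slope.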