Title: Class 17: Tuesday, Nov. 9
- Another example of interpreting multiple regression coefficients
- Steps in multiple regression analysis and an example analysis
- Omitted variables bias
- Discuss final project
Interpreting Multiple Regression Coefficients: Another Example
- A marketing firm studied the demand for a new
type of personal digital assistant (PDA). The
firm surveyed a sample of 75 consumers. Each
respondent was initially shown the new device and
then asked to rate the likelihood of purchase on
a scale of 1 to 10, with 1 implying little chance
of purchase and 10 indicating almost certain
purchase. The age (in years) and income (in
thousands of dollars) were recorded for each
respondent. The data are in pda.JMP.
Simple Regressions to Predict Rating (Likelihood of Purchase)
- As income rises, the likelihood of purchase also increases; specifically, a $10,000 increase in income is associated with a 0.7 increase in average rating.
- As age increases, the likelihood of purchase also increases; specifically, a 10-year increase in age is associated with a 0.9 increase in average rating.
Multiple Regression
- For any fixed level of income, the average rating decreases by 0.7 if age increases by 10 years.
- At every fixed income level, younger consumers have higher average ratings than older consumers, and at every fixed age level, average ratings increase as income rises.
- The positive association between age and rating in the simple regression is a result of the positive association between age and income: older consumers tend to have higher incomes, and higher income is associated with higher ratings.
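The sign flip between the simple and the multiple regression can be reproduced with a small simulation. This is a sketch on hypothetical simulated data, not pda.JMP: the coefficients, the age range, and the income-age relationship are assumptions chosen so that income rises with age while, at fixed income, rating falls with age.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

rng = np.random.default_rng(17)
n = 5000
age = rng.uniform(20, 60, n)                      # years (hypothetical range)
income = 20 + 1.5 * age + rng.normal(0, 10, n)   # thousands of dollars; rises with age
# Hypothetical truth: rating rises with income but falls with age at fixed income.
rating = 1 + 0.10 * income - 0.07 * age + rng.normal(0, 1, n)

simple_age = ols(age, rating)[1]                  # slope of rating on age alone
_, multi_income, multi_age = ols(np.column_stack([income, age]), rating)

print(f"simple slope on age:       {simple_age:+.3f}")
print(f"multiple: income {multi_income:+.3f}, age {multi_age:+.3f}")
```

In this simulation income rises by about 1.5 (thousand dollars) per year of age, so the simple slope on age absorbs part of the income effect; its population value is 1.5 * 0.10 - 0.07 = 0.08, positive even though the age coefficient at fixed income is negative.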
Air Pollution and Mortality
- Data set pollution.JMP provides information about the relationship between pollution and mortality for 60 cities between 1959 and 1961.
- The variables are
  - y (MORT): total age-adjusted mortality, in deaths per 100,000 population
  - PRECIP: mean annual precipitation (in inches)
  - EDUC: median number of school years completed for persons 25 and older
  - NONWHITE: percentage of the 1960 population that is nonwhite
  - NOX: relative pollution potential of NOx (related to the tons of NOx emitted per day per square kilometer)
  - SO2: relative pollution potential of SO2
Multiple Regression: Steps in Analysis
1. Preliminaries: Define the question of interest. Review the design of the study. Correct errors in the data.
2. Explore the data: Use graphical tools (e.g., a scatterplot matrix); consider transformations of explanatory variables; fit a tentative model; check for outliers and influential points.
3. Formulate an inferential model: Word the questions of interest in terms of model parameters.
Multiple Regression: Steps in Analysis (Continued)
4. Check the model: (a) Check the model assumptions of linearity, constant variance, and normality. (b) If needed, return to step 2 and make changes to the model (such as transformations or adding terms for interaction and curvature). (c) Drop variables from the model that are not of central interest and are not significant.
5. Infer the answers to the questions of interest using appropriate inferential tools (e.g., confidence intervals, hypothesis tests, prediction intervals).
6. Presentation: Communicate the results to the intended audience.
Air Pollution and Mortality
- Question of interest: What is the association between the air pollution variables (NOX and SO2) and mortality once environmental variables (precipitation) and demographic variables (education, percentage nonwhite) have been taken into account?
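As a sketch of wording this question in terms of model parameters, assuming the log transformations of NOX and SO2 adopted in the model below and writing $\mu\{\cdot\}$ for mean mortality, the questions of interest become hypotheses about $\beta_4$ and $\beta_5$ (the parameter labels are chosen here for illustration):

```latex
\mu\{\text{MORT} \mid \cdot\} = \beta_0 + \beta_1\,\text{PRECIP} + \beta_2\,\text{EDUC}
  + \beta_3\,\text{NONWHITE} + \beta_4 \log(\text{NOX}) + \beta_5 \log(\text{SO2})

H_0\colon \beta_4 = 0 \quad \text{(no association with NOX, other variables held fixed)}
H_0\colon \beta_5 = 0 \quad \text{(no association with SO2, other variables held fixed)}
```

This model has K = 5 explanatory variables and K + 1 = 6 coefficients.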
There is curvature in the relationship between mortality and SO2. Tukey's Bulging Rule suggests transforming SO2 to log SO2 as a possible remedy. The scatterplot of mortality vs. NOX is crunched (most of the points are bunched into a small part of the plot). When a scatterplot between a response and an explanatory variable is crunched, transforming the explanatory variable to log(explanatory variable) is a good idea.
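A small simulation illustrates why the log helps with a crunched, right-skewed explanatory variable. This sketch uses hypothetical data (not pollution.JMP) in which mean response is linear in log(x); the constants 900, 20, and the noise level are assumptions.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of the simple least-squares regression of y on x (squared correlation)."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

rng = np.random.default_rng(10)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # right-skewed: most points crunched near 0
y = 900 + 20 * np.log(x) + rng.normal(0, 5, size=500)

r2_raw = r_squared(x, y)           # straight line in x misses the curvature
r2_log = r_squared(np.log(x), y)   # log spreads out the crunched values
print(f"R^2 using x:      {r2_raw:.3f}")
print(f"R^2 using log(x): {r2_log:.3f}")
```

Comparing residual plots for the two fits (not shown) is the more direct check, but the jump in R^2 already signals that log(x) straightens the relationship.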
Initial Model
Checking for influential points: New Orleans has a Cook's distance of 1.75 (greater than 1) and leverage 0.45 > 3(6)/60 = 0.30. We should remove New Orleans, noting that it has unusual explanatory variables and that our conclusions do not apply to explanatory variables in the range of New Orleans.
Because New Orleans is an influential point and has leverage 0.45 > 3(6)/60 = 0.30, we remove it and note that our model does not apply to observations in the range of the explanatory variables of New Orleans.
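The leverage and Cook's distance checks can be sketched directly from the hat matrix. The data below are simulated and hypothetical (60 rows, K = 5 explanatory variables, one planted unusual row standing in for a city like New Orleans); only the cutoffs come from the slides.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """Leverage (hat values) and Cook's distances for the OLS fit of y on X plus an intercept."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    p = X1.shape[1]                                # K + 1 coefficients
    H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)      # hat matrix
    h = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)                   # residual mean square
    cooks = resid ** 2 / (p * s2) * h / (1 - h) ** 2
    return h, cooks

rng = np.random.default_rng(60)
n, K = 60, 5
X = rng.normal(size=(n, K))
beta = np.array([1.0, -0.5, 0.3, 0.8, 0.2])
y = 2 + X @ beta + rng.normal(size=n)
X[-1] = 4.0                        # hypothetical city with unusual explanatory variables
y[-1] = 2 + X[-1] @ beta + 15      # ... and a response far from the fitted plane

h, cooks = leverage_and_cooks(X, y)
cutoff = 3 * (K + 1) / n           # 3(6)/60 = 0.30
flagged = np.flatnonzero((h > cutoff) & (cooks > 1))
print(f"leverage cutoff {cutoff:.2f}; flagged rows: {flagged}")
```

Forming the full n-by-n hat matrix is fine for 60 cities; for large data sets one would compute only its diagonal.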
Checking the Model
Model Building
- Model parsimony: If a variable is not of central interest and is not significant, we remove it from the model.
- We can remove education. We don't remove log NOX, even though it is not significant, since it is of central interest.
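The parsimony rule can be written as a small filter. This is a sketch: the |t| < 2 cutoff is an assumed rough stand-in for "not significant at about the 5% level", and the t-statistics below are hypothetical illustrations, not the JMP output for pollution.JMP.

```python
def drop_candidates(names, t_stats, protected, cutoff=2.0):
    """Return variables that are neither of central interest nor significant (|t| < cutoff)."""
    return [name for name, t in zip(names, t_stats)
            if name not in protected and abs(t) < cutoff]

names = ["PRECIP", "EDUC", "NONWHITE", "log NOX", "log SO2"]
t_stats = [2.5, -1.2, 4.1, 0.9, 3.3]      # hypothetical t-statistics
protected = {"log NOX", "log SO2"}        # variables of central interest are never dropped

result = drop_candidates(names, t_stats, protected)
print(result)                             # ['EDUC']
```

Note that log NOX survives despite |t| < 2 because it is protected; in practice one would refit after each removal, since t-statistics change when a variable leaves the model.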
Inference About Questions of Interest
- Strong evidence that mortality is positively associated with SO2 for fixed levels of precipitation, education, percentage nonwhite, and NOX.
- No strong evidence that mortality is associated with NOX for fixed levels of precipitation, education, percentage nonwhite, and SO2.
Multiple Regression and Causal Inference
- Goal: Figure out what the causal effect on mortality of decreasing air pollution would be (keeping everything else in the world fixed).
- Lurking variable: A variable that is associated with both air pollution in a city and mortality in a city.
- To figure out whether air pollution causes mortality, we want to compare mean mortality among cities with different air pollution levels but the same values of the confounding (lurking) variables.
- If we include all of the lurking variables in the multiple regression model, the coefficient on air pollution represents the change in mean mortality that is caused by a one-unit increase in air pollution.
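A simulation sketch of this claim: when the lurking variable is included, the regression coefficient on pollution lands near the causal effect; when it is omitted, the coefficient is biased. Everything here is hypothetical: the percentage of smokers as the lurking variable, the causal effects 2.0 and 4.0, and the other constants are assumptions, not estimates from pollution.JMP.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

rng = np.random.default_rng(18)
n = 10_000
smokers = rng.uniform(10, 40, n)                     # hypothetical lurking variable (% smokers)
pollution = 5 + 0.5 * smokers + rng.normal(0, 3, n)  # associated with the lurking variable
# Hypothetical causal model: +2.0 deaths per 100,000 per unit pollution, +4.0 per % smokers.
mortality = 800 + 2.0 * pollution + 4.0 * smokers + rng.normal(0, 20, n)

with_lurking = ols(np.column_stack([pollution, smokers]), mortality)[1]
without_lurking = ols(pollution, mortality)[1]
print(f"pollution coefficient, smokers included: {with_lurking:.2f}")
print(f"pollution coefficient, smokers omitted:  {without_lurking:.2f}")
```

Omitting smokers here inflates the pollution coefficient well above 2.0, because cities with more smokers also have more pollution in this simulation.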
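Omitted Variables
- What happens if we omit a lurking variable, e.g., the percentage of smokers, from the regression?
- Suppose we are interested in the causal effect of $x_1$ on $y$ and believe that there are lurking variables $x_2, \ldots, x_p$ and that $E(y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$.
- $\beta_1$ is the causal effect of $x_1$ on $y$. If we omit the confounding variable $x_p$, then the multiple regression will be estimating the coefficient $\beta_1^{\text{omit}}$ in $E(y \mid x_1, \ldots, x_{p-1}) = \beta_0^{\text{omit}} + \beta_1^{\text{omit}} x_1 + \cdots + \beta_{p-1}^{\text{omit}} x_{p-1}$ as the coefficient on $x_1$. How different are $\beta_1$ and $\beta_1^{\text{omit}}$?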
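Omitted Variables Bias Formula
- Suppose that $E(y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$ and $E(x_p \mid x_1, \ldots, x_{p-1}) = \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_{p-1} x_{p-1}$.
- Then $E(y \mid x_1, \ldots, x_{p-1}) = (\beta_0 + \beta_p \gamma_0) + (\beta_1 + \beta_p \gamma_1) x_1 + \cdots + (\beta_{p-1} + \beta_p \gamma_{p-1}) x_{p-1}$, so the coefficient on $x_1$ when $x_p$ is omitted is $\beta_1^{\text{omit}} = \beta_1 + \beta_p \gamma_1$.
- The formula tells us about the direction and magnitude of the bias from omitting a variable in estimating a causal effect.
- Omitted variable bias: $\beta_1^{\text{omit}} - \beta_1 = \beta_p \gamma_1$.
- The formula also applies to least squares estimates, i.e., $\hat\beta_1^{\text{omit}} = \hat\beta_1 + \hat\beta_p \hat\gamma_1$.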
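For least squares the identity $\hat\beta_1^{\text{omit}} = \hat\beta_1 + \hat\beta_p \hat\gamma_1$ holds exactly in every sample (when all regressions include an intercept), not only on average. A quick numerical check on hypothetical simulated data with $p = 2$, so the omitted variable $x_p$ is $x_2$:

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

rng = np.random.default_rng(20)
n = 200
x1 = rng.normal(size=n)                      # variable of interest
x2 = 0.7 * x1 + rng.normal(size=n)           # lurking variable, correlated with x1
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

b = ols(np.column_stack([x1, x2]), y)        # full regression: b[1] = beta1-hat, b[2] = beta2-hat
g = ols(x1, x2)                              # omitted variable on the rest: g[1] = gamma1-hat
b_omit = ols(x1, y)                          # regression that omits x2

print(b_omit[1], b[1] + b[2] * g[1])         # equal up to floating-point rounding
```

Because $\hat\beta_2 > 0$ and $\hat\gamma_1 > 0$ in this simulation, the estimate that omits $x_2$ overstates the effect of $x_1$.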