Notes 6: Multiple Linear Regression
Transcript and Presenter's Notes
1
Notes 6: Multiple Linear Regression
1. The Multiple Linear Regression Model
2. Estimates and Plug-in Prediction
3. Confidence Intervals and Hypothesis Tests
4. Fits, residuals, R-squared, and the overall F-test
5. Categorical Variables as Regressors
6. Issues with Regression: Outliers, Nonlinearities, and Omitted Variables
The data, regression output, and some of the plots in these slides can be found in this file (MidCity_reg.xls).
2
1. The Multiple Linear Regression Model
The plug-in predictive interval for the price of a house given its size is quite large. Is this bad? Not necessarily. There is a lot of variation in this relationship. Put differently, you can't accurately predict the price of a house just based on its size. The width of our predictive interval reflects this.
How can we predict the price of a house more accurately? If we know more about a house, we should have a better idea of its price!!
3
Our data has more variables than just size and price (price and size are each divided by 1,000). The first 7 rows of the spreadsheet show the columns x1, x2, x3, and y.
Before, we tried to predict price given size. Suppose we also know the number of bedrooms and bathrooms a house has. What is our prediction for price?
Let xij = the value of the jth explanatory variable associated with observation i. So xi1 = the number of bedrooms in the ith house.
In the spreadsheet, xij is in the ith row of the jth column.
4
The Multiple Linear Regression Model
Yi = α + β1 xi1 + β2 xi2 + ... + βk xik + ei
Y is a linear combination of the x variables plus error. The error works exactly the same way as in simple linear regression!! We assume the e are independent of all the x's.
xij is the value of the jth explanatory variable (or regressor) associated with observation i. There are k regressors.
5
How do we interpret this model?
We can't plot the line anymore, but α is still the intercept, our guess for Y when all the x's = 0.
There are now k slope coefficients βi, one for each x.
Big difference: Each coefficient βi now describes the change in Y when xi increases by one unit, holding all of the other x's fixed. You can think of this as controlling for all of the other x's.
6
Another way to think about the model: the conditional distribution of Y given all of the x's is normal, with the mean depending on the x's through a linear combination,
Y | x1, x2, ..., xk ~ N(α + β1 x1 + ... + βk xk, σ²).
The mean, α + β1 x1 + ... + βk xk, is the guess we would make for Y given values for each x1, x2, ..., xk. The variance σ² is the variance of the errors, or how wrong our guess can be.
Notice that σ² has the same interpretation here as it did in the simple linear regression model.
7
Suppose we model price as depending on nbed, nbath, and size. Then we have
price = α + β1 nbed + β2 nbath + β3 size + e.
If we knew α, β1, β2, β3, and σ, could we predict price?
Predict the price of a house that has 3 bedrooms, 2 bathrooms, and total size of 2200 square feet.
Our guess for price would be α + β1(3) + β2(2) + β3(2200).
Now since we know that e ~ N(0, σ²), with 95% probability
price is within α + β1(3) + β2(2) + β3(2200) +/- 2σ.
8
But again, we don't know the parameters α, β1, β2, β3, or σ. We have to estimate them from some data.
Given data, we have estimates of α, the βi, and σ:
a is our estimate of α. b1, b2, b3 are our estimates of β1, β2, and β3. se is our estimate of σ.
9
2. Estimates and Plug-in Prediction
Here is the output from the regression of price on size (SqFt), nbed (Bedrooms), and nbath (Bathrooms) in StatPro. The output reports the estimates a, b1, b2, b3 and the residual standard error se.
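If you want to reproduce this outside StatPro, here is a minimal sketch in Python's statsmodels. The file name comes from the slides, but the column names (Price, SqFt, Bedrooms, Bathrooms) are assumptions about the spreadsheet, not part of the original output.

    import pandas as pd
    import statsmodels.formula.api as smf

    houses = pd.read_excel("MidCity_reg.xls")   # assumed column names below
    fit = smf.ols("Price ~ Bedrooms + Bathrooms + SqFt", data=houses).fit()
    print(fit.params)              # the estimates a, b1, b2, b3
    print(fit.summary())           # estimates, standard errors, t-stats, R-squared
    print(fit.mse_resid ** 0.5)    # se, the residual standard error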
10
For simple linear regression we wrote down formulas for a and b. We could do that for multiple regression, too, but we'd need matrix algebra.
Once we have our intercept and slope estimates, though, we estimate σ just like before.
Define the residual ei as
ei = yi - a - b1 xi1 - b2 xi2 - ... - bk xik.
As before, the residual is the difference between our guess for Y and the actual value yi we observed. Also as before, these estimates a, b1, ..., bk are the least squares estimates. They minimize the sum of squared residuals, Σ ei².
11
Our estimate of σ is just the sample standard deviation of the residuals ei:
se = sqrt( Σ ei² / (n - k - 1) ).
Remember for simple regression, k = 1, so this is really the same formula. We divide by n - k - 1 for the same reason.
se just asks: on average, how far are our observed values yi away from the line we fitted?
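A minimal numpy sketch of both steps, the least squares estimates from the previous slide and se from this one, assuming X is an n-by-k array of regressors and y the response vector (names are assumptions):

    import numpy as np

    def least_squares(X, y):
        n, k = X.shape
        A = np.column_stack([np.ones(n), X])             # add the intercept column
        coefs, *_ = np.linalg.lstsq(A, y, rcond=None)    # a, b1, ..., bk minimize sum of ei^2
        resid = y - A @ coefs                            # residuals ei
        se = np.sqrt(np.sum(resid ** 2) / (n - k - 1))   # estimate of sigma
        return coefs, se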
12
Our estimated relationship is
Price = -5.64 + 10.46 nbed + 13.55 nbath + 35.64 size, +/- 2(20.36),
where 10.46 is b1.
Interpret (remember UNITS!): With size and nbath held fixed, how does adding one bedroom affect the value of the house?
Answer: price increases by 10.46 thousand dollars ($10,460).
With nbed and nbath held fixed, adding 1000 square feet increases the price of the house by $35,640.
13
Suppose a house has size 2.2, 3 bedrooms, and 2 bathrooms. What is your (estimated) prediction for its price?
-5.64 + 10.46(3) + 13.55(2) + 35.64(2.2) = 131.248
(plugging in a, b1, b2, and b3), and 2se = 40.72, so the interval is
131.248 +/- 40.72.
This is our multiple regression plug-in predictive interval. We just plug in our estimates a, b1, b2, b3, and se in place of the unknown parameters α, β1, β2, β3, and σ.
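The same arithmetic as a tiny Python check, using the estimates from the slides:

    a, b1, b2, b3, se = -5.64, 10.46, 13.55, 35.64, 20.36
    pred = a + b1 * 3 + b2 * 2 + b3 * 2.2         # 131.248, in thousands of dollars
    interval = (pred - 2 * se, pred + 2 * se)     # rough 95% plug-in predictive interval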
14
Note (1): When we regressed price on size alone, the coefficient was about 70. Now the coefficient for size is about 36. WHY?
Without nbath and nbed in the regression, an increase in size can be associated with an increase in nbath and nbed in the background. If all I know is that one house is a lot bigger than another, I might expect the bigger house to have more beds and baths! With nbath and nbed held fixed, the effect of size is smaller.
15
Example: Suppose I build a 1000 square foot addition to my house. This addition includes two bedrooms and one bathroom. How does this affect the value of my house?
10.46(2) + 13.55(1) + 35.64(1) = 70.11
The value of the house goes up by $70,110. This is almost exactly the relationship we estimated before!
But now we can say: if the 1000 square foot addition is only a basement or screened-in porch, the increase in value is much smaller. This is a much more realistic model!!
16
Note (2): Significant coefficients do not equal predictive power.
With just size, the +/- of our predictive interval was 2se = 44.952 (se was about 22.5). With nbath and nbed added to the model, the +/- is 2(20.36) = 40.72. The additional information makes our prediction more precise (but not by a whole lot in this case; we still need a "better model"). And we can do this! We have more info in our sample, and we might be able to use this info more efficiently.
17
3. Confidence Intervals and Hypothesis Tests
95% confidence interval for α: estimate +/- 2 standard errors. AGAIN!!!!!!! (The exact interval can be computed in Excel.)
95% confidence interval for βi: bi +/- 2 standard errors (again computable exactly in Excel).
(Recall that k is the number of explanatory variables in the model.)
18
For example, b2 = 13.55 with a standard error of 4.22.
The 95% CI for β2 is 13.55 +/- 2(4.22).
Again, StatPro (and nearly every other software package) prints out the 95% confidence intervals for the intercept and each slope coefficient.
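In Python, the same "estimate +/- 2 standard errors" intervals can be sketched from the statsmodels result (here `fit` and `pd` from the earlier sketch, an assumption rather than StatPro output):

    ci_rough = pd.DataFrame({
        "lower": fit.params - 2 * fit.bse,   # fit.bse holds the standard errors
        "upper": fit.params + 2 * fit.bse,
    })
    print(ci_rough)
    print(fit.conf_int(alpha=0.05))          # the exact t-based 95% intervals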
19
Hypothesis tests on coefficients
To test the null hypothesis H0: α = α0 vs. H1: α is not α0, we use the "t statistic"
t = (a - α0) / SE(a).
If n > 30 or so, we reject at level .05 if the t statistic is bigger than 2 in absolute value!! Otherwise, we fail to reject.
Intuitively, we reject if the estimate is more than 2 SE's away from the proposed value.
20
Same for the slopes (gee, this looks familiar):
To test the null hypothesis H0: βi = βi0 vs. H1: βi is not βi0, we reject at level .05 if
|t| = |(bi - βi0) / SE(bi)| > 2.
Otherwise, we fail to reject.
Again, we reject if the estimate is more than 2 SE's away from the proposed value.
21
Example
StatPro automatically prints out the t-statistics for testing whether the intercept = 0 and whether each slope = 0, as well as the associated p-values.
e.g., 35.64 / 10.67 = 3.34 > 2, so we reject H0: β3 = 0.
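The same t statistics and p-values, assuming `fit` from the earlier sketch:

    tstats = fit.params / fit.bse    # e.g. roughly 35.64 / 10.67 = 3.34 for size
    print(tstats)
    print(fit.pvalues)               # p-values for H0: coefficient = 0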
22
What does this mean?
In this sample, we have evidence that each of our explanatory variables has a significant impact on the price of a house.
Even so, adding two variables doesn't help our predictions that much! Our predictive interval is still pretty wide.
In many applications we will have LOTS (sometimes hundreds) of x's. We will want to ask, "which x's really belong in our model?". This is called model selection. One way to answer this is to throw out all the x's whose coefficients have t-stats less than 2. But this isn't necessarily the BEST way; more on this later.
23
Be careful interpreting these tests. Example: 1993 data on 50 states and D.C.
vcrmrate_93i = violent crimes per 100,000 population in state i
black_93i = proportion of black people in the population of state i
Increase the proportion of black people in a state's population by 1% and I predict 28.5 more violent crimes per 100,000 population.
24
metro_93i = % of the state's population living in metro areas
unem_93i = unemployment rate in state i
pcpolice_93i = avg size of police force per capita in state i
prison_93i = prison inmates per 100,000 population
When I control for these other factors, black_93 is no longer significant!
More importantly: correlation does not imply causation!!! We should not conclude that police cause crime!
25
4. Fits, residuals, R-squared, and the overall F-test
In multiple regression the fit (fitted value) is
ŷi = a + b1 xi1 + ... + bk xik,
"the part of y related to the x's", and as before the residual is the part left over:
ei = yi - ŷi.
Just like for simple regression, we would like to split up Y into two parts and ask how much can be explained by the x's.
26
In multiple regression, the residuals ei have sample mean 0 and are uncorrelated with each of the x's and the fitted values. So in the decomposition
yi = ŷi + ei,
ŷi is the part of y that is explained by the x's, and ei is the part of y that has nothing to do with the x's.
27
This is the plot of the residuals from the multiple regression of price on size, nbath, and nbed vs. the fitted values. We can see the 0 correlation.
Scatterplots of residuals vs. each of the x's would look similar.
28
So, just as with one x, we have
total variation in y = variation explained by the x's + unexplained variation.
29
R-squared
R² = (variation explained by the x's) / (total variation in y): the closer R-squared is to 1, the better the fit.
30
In our housing example
31
R² is also the square of the correlation between the fitted values, ŷ, and the observed values, y.
Regression finds the linear combination of the x's which is most correlated with y.
(Recall that with just size, the correlation between fits and y was .553.)
So R² here is just (.663)² = 0.439569.
32
The "Multiple R" in the StatPro output is the
correlation between y and the fits.
R2 (.663)2 0.439569
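To check this relationship in Python (assuming `fit`, `houses`, and the column name Price from the earlier sketch):

    import numpy as np

    multiple_r = np.corrcoef(fit.fittedvalues, houses["Price"])[0, 1]   # correlation of fits and y
    print(multiple_r ** 2, fit.rsquared)                                # these should agree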
33
Aside: Model Selection
In general I might have a LOT of x's and I'll want to ask, which of these variables belong in my model?
THERE IS NO ONE RIGHT ANSWER TO THIS.
One way is to ask, which of the coefficients are significant? But we know significant coefficients don't necessarily mean we will do better predicting.
Another way might be to ask, what happens to R² when I add more variables? CAREFUL, though: it turns out that when you add variables, R² will NEVER go down!!
34
The overall F-test
The p-value beside "F" is a test of the null hypothesis
H0: β1 = β2 = ... = βk = 0 (all the slopes are 0).
Here we reject the null; at least some of the slopes are not 0.
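The F statistic can be written in terms of R²; assuming `fit` from the earlier sketch, statsmodels reports it directly:

    # F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    print(fit.fvalue, fit.f_pvalue)   # overall F statistic and its p-value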
35
What does this mean?
I sometimes call the overall F-test the "kitchen sink test". Notice that if the null hypothesis is true, then NONE of the x's have ANY explanatory power in our linear model!!
We've thrown everything but the kitchen sink at Y. We want to know, can ANY of our x's predict Y??
That's fine. But in practice this test is VERY sensitive. You're being maximally skeptical here, so you will usually reject H0. If, on the other hand, we don't reject the null in this test, we probably need to rethink things!!
36
5. Categorical Variables as Regressors
Here, again, are the first 7 rows of our housing data.
Does whether a house is brick or not affect the price of the house? This is a categorical variable. Can we use multiple regression with categorical x's?! What about the neighborhood? (Location, location, location!!)
37
Here's the price/size scatterplot again. In this one, brick houses are in pink. What kind of model would you like to fit here?
38
Adding a Binary Categorical x
To add "brick" as an explanatory variable in our regression, we create the dummy variable (the "brick dummy") which is 1 if the house is brick and 0 otherwise.
39
Note: I created the dummy by using the Excel formula =IF(Brick="Yes",1,0), but we'll see that StatPro has a nice utility for creating dummies.
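The pandas analogue of that Excel formula would be something like the line below, assuming a Brick column coded "Yes"/"No" (the column name is an assumption):

    houses["brickdum"] = (houses["Brick"] == "Yes").astype(int)   # 1 if brick, 0 otherwise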
40
As a simple first example, let's regress price on size and brickdum. Here is our model:
price = α + β1 size + β2 brickdum + e.
How do you interpret β2?
41
What is the expected price of a brick house of a given size, s?
α + β2 + β1 s (intercept α + β2, slope β1).
What is the expected price of a non-brick house given its size?
α + β1 s (intercept α, slope β1).
β2 is the expected difference in price between a brick and a non-brick house, controlling for size.
42
Let's try it!!
+/- 2se = 39.3; this is the best we've done!
What is the brick effect? b2 +/- 2 sb2 = 23.4 +/- 2(3.7) = 23.4 +/- 7.4.
43
We can see the effect of the dummy by
plotting the fitted values vs size.
(StatPro does this for you.) The upper line is
for the brick houses and the lower line is for
the non-brick houses.
44
One more scatterplot with fitted and actual values. In this case we can really visualize our model: we just fit two lines with different intercepts!! The blue line (non-brick houses) has intercept a; the pink line's (brick houses) intercept is a + b2.
45
Note: You could also create a dummy which was 1 if a house was non-brick and 0 if brick. That would be fine, but the meaning of b2 would change. (Here it would just change sign.)
IMPORTANT: You CANNOT put both dummies in! Given one, the information in the other is redundant. You will get nasty error messages if you try to do this!!
46
We can interpret b2 as a shift in the intercept. But note that our model still assumes that the price difference between a brick and a non-brick house does not depend on the size! In other words, we are fitting two lines with different intercepts, but the slopes are still the same. The two variables do not "interact". Sometimes we expect variables to interact. We'll get to this next week.
47
Now let's add brick to the regression of price on size, nbath, and nbed.
+/- 2se = 35.2. Adding brick seems to be a good idea!!
48
This is a really useful technique! Suppose my model is
Bwght_i = α + β1 Faminc_i + β2 Cigs_i + e_i,
where Bwght_i = birthweight of the ith newborn, in ounces; Faminc_i = annual income of family i, in thousands of dollars; Cigs_i = 1 if the mother smoked during pregnancy, 0 otherwise.
What is this telling me about the impact of smoking on infant health?
Why do you think it is important to have family income in the model?
49
Adding a Categorical x in General
Let's regress price on size and
neighborhood. This time let's use StatPro's data
utility for creating dummies.
StatPro / Data Utilities / Create Dummy
Variable(s)
50
I used StatPro to create one dummy for each of the neighborhoods.
e.g., Nbhd_1 indicates whether the house is in neighborhood 1 or not.
51
Here's the scatterplot broken out by neighborhood. Again we might want to fit lines with different intercepts, particularly for neighborhood 3.
52
Now we add any TWO of the three dummies. Given any two, the information in the third is redundant.
Let's first do price on size and neighborhood, where now I've used N2 to denote the dummy for neighborhood 2, and N3 for neighborhood 3.
53
Our model:
price = α + β1 size + β2 N2 + β3 N3 + e.
β2 = difference in intercepts between neighborhoods 1 and 2; β3 = difference in intercepts between neighborhoods 1 and 3.
The neighborhood corresponding to the dummy we leave out becomes the "base case" we compare to.
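A sketch of this regression in Python, using `smf` and `houses` from the earlier sketch and assuming a Nbhd column coded 1/2/3 (the column name is an assumption). The C() term builds the dummies and drops one level automatically, so neighborhood 1 becomes the base case:

    fit_nbhd = smf.ols("Price ~ SqFt + C(Nbhd)", data=houses).fit()
    print(fit_nbhd.params)   # C(Nbhd)[T.2] and C(Nbhd)[T.3] are intercept shifts vs. neighborhood 1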
54
Let's try it!
For two houses of equal size in neighborhoods 1 and 3, the house in neighborhood 3 is worth between $34K and $48K more!
+/- 2se = 30.52!!
Both neighborhood effects are significant.
55
Here are the fits vs. size. Which line corresponds to which neighborhood? Where do you want to live?
Again we assume size and nbhd do not interact, i.e., the slopes are the same.
56
Aside: Omitted Variables
When we just regress price on size, we get a coefficient of about 70. With neighborhood effects included, the coefficient on size drops to 46. Why??
Here's the line we would have fitted if we had ignored neighborhood effects.
57
Aside: Omitted Variables
With just size, our data says a bigger house
costs more, but this is partly because a bigger
house is more likely to be in the nicer
neighborhood. If we include the neighborhood in
the regression then we are controlling for
neighborhood effects, so the effect of size looks
smaller.
This happens because size and neighborhood are
correlated!! More on this later
58
OK, let's try price on size, nbed, nbath, brick, and neighborhood.
Notice, though, that our estimate for the bedrooms coefficient is no longer significantly different from zero. This will often happen in multiple regression.
+/- 2se = 24!! Controlling for brick and nbhd makes our predictions much more accurate.
59
Maybe we don't need bedrooms.
Dropping bedrooms did not increase se or decrease R-squared, so there is no need to bother with it.
60
Summary: Adding a Categorical x
In general, to add a categorical x, you create dummies, one for each possible category (or level, as we sometimes call it). Use all but one of the dummies. It does not matter which one you drop for the fit, but the interpretation of the coefficients will depend on which one you choose to drop.
61
Summary of Multiple Regression: Regression finds a linear combination of the x variables that is like y.
62
(Plots: price vs. the fitted combination of size, nbath, brick, and nbhd; and price vs. size alone.)
With more information we can often make more accurate predictions!
63
The residuals are the part of y not related
to the x's.
64
6. Outliers, Nonlinearities, and Omitted Variables
Regression of murder rates on the unemployment rate and the presence of capital punishment (50 states over 3 years).
The murder rate doesn't seem correlated with capital punishment, but it seems strongly related to bad economic conditions.
65
Here's the scatterplot. The three huge outliers are for Washington DC (1987, 1990, and 1993).
66
Same regression, but without the DC observations. The fit looks better, but be careful interpreting this!! Remember, correlation does not imply causation!!!
Murder rates now seem significantly higher in states with capital punishment. Why did this change? DC did not employ capital punishment.
Whether or not we throw these points out, we need to understand how they affect our results!
67
The Log Transformation
Suppose we have a multiplicative relationship,
Y = c x^β (1 + ν).
Here (1 + ν) is a multiplicative error; ν is the percentage error. Often we see this: the size of the error is a percentage of the expected response. This is obviously a nonlinear relationship. Can we estimate this model using linear regression?
68
Yes! Just take the log of both sides:
log(Y) = α + β log(x) + e,
where α = log(c) and e = log(1 + ν).
We can estimate a multiplicative relationship by regressing the log of Y on the log of x.
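A minimal sketch of this log-log regression in Python, assuming numpy arrays x and y (for instance the body and brain weights in the example that follows):

    import numpy as np
    import statsmodels.api as sm

    fit_log = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
    a_hat, b_hat = fit_log.params     # a_hat estimates log(c); b_hat is the elasticity
    c_hat = np.exp(a_hat)             # back out the multiplicative constant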
69
Key Idea: Taking logs turns these nonlinear relationships into linear ones in terms of the transformed variables. It also takes a multiplicative (percentage) error and turns it into the additive error of the regression model. In practice, logging y and/or x can also often be a good cure for heteroskedasticity.
And you can always try several different transformations and compare them.
70
FACT: When we do a linear regression,
Y = α + βX + e,
β tells us: when X changes by one UNIT, by how many UNITS does Y change?
When we do the same regression in logs,
log(Y) = α + β log(X) + e,
β tells us: when X changes by one PERCENT, what is the PERCENTAGE change in Y?
One reason we might want to do this is to avoid making negative predictions (e.g., housing prices).
71
Example
Goal: relate the brain weight of a mammal to its body weight. Each observation corresponds to a mammal species. y = brain weight (grams), x = body weight (grams).
Does a linear model make sense?
72
log y vs. log x
Looks pretty nice!! Let's try a linear regression of log brain weight on log body weight.
73
Standardized residuals vs. fits.
The big residual is the chinchilla.
Very few people know that the chinchilla is a
master race of supreme intelligence.
74
No.
The book I got this from had chinchilla at 64
grams instead of 6.4 grams (which I found in
another book). The next biggest positive
residual is man, which is what we would have
expected. (Well, maybe before reality TV)
75
We can also model nonlinearities by fitting a polynomial: y = polynomial in the x's + error. For example, with two x's we might include x1², x2², and the product x1x2 as extra regressors.
With many x's there are a lot of possibilities! Note that product terms give us interaction: it is no longer true that the effect of changing one x does not depend on the value of the others.
Example: Suppose y = salary, x1 = education, x2 = age. Including x1x2 in the regression allows the returns to education to depend on your age.
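As a sketch in statsmodels formulas, squared and product terms are just extra terms. The data frame df and the column names salary, education, and age are assumptions for illustration:

    fit_int = smf.ols("salary ~ education + age + I(education**2) + I(age**2)"
                      " + education:age", data=df).fit()
    # The education:age coefficient lets the return to education depend on age.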
76
Example: Interactions with Dummy Variables
The housing data again. (It makes no sense to square or log a dummy!!!)
y = price, s = size, N2 = dummy for neighborhood 2, N3 = dummy for neighborhood 3.
Model: regress price on s, N2, N3, and the products s·N2 and s·N3. The dummies shift the intercept, and the product terms shift the slope, so each neighborhood gets its own line.
77
Fits vs. size. Now we see that the lines don't have to be parallel! But it does not seem that there is much interaction.
On the other hand, the lower slope for the "worst" neighborhood makes sense!!
78
Here is the regression output.
What happens if you throw out each variable with a t-statistic less than 2?
79
Omitted Variables (Why do regression coefficients change?)
Regress price on size:

Coefficients:
              Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  -10091.130    18966.104   -0.532     0.596
SqFt             70.226        9.426    7.450  1.30e-11

Regress price on size and two dummies for neighborhood:

Coefficients:
              Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)   21241.174    13133.642    1.617   0.10835
SqFt             46.386        6.746    6.876  2.67e-10
Nbhd2         10568.698     3301.096    3.202   0.00174
Nbhd3         41535.306     3533.668   11.754   < 2e-16

When we add the neighborhood, the coefficient for size drops from 70 to 46. Why?
80
In this plot we have price vs. size, but the points from different neighborhoods are plotted with different colors and symbols. We also have the regression lines fit with just the points in each neighborhood.
Here's the line we would have fitted if we had ignored neighborhood effects.
81
With just size, our data says a bigger house
costs more, but this is partly because a bigger
house is more likely to be in the nicer
neighborhood. If we include the neighborhood in
the regression then we are controlling for
neighborhood effects, so the effect of size looks
smaller.
This happens because size and neighborhood are
correlated!!
82
Key idea: When we add explanatory variables (x's) to a regression model, the regression coefficients change. How they change depends on how the x's are correlated with each other.
83
The Sales Data (Omitted Variable Bias)
In this data we have weekly observations on
Sales = our firm's sales in units (in excess of base level),
p1 = price charged by our firm (in excess of base),
p2 = competitor's price (in excess of base).

   p1        p2        Sales
   5.13567    5.2042   144.49
   3.49546    8.0597   637.25
   7.27534   11.6760   620.79
   4.66282    8.3644   549.01
   ...        ...      ...

(Each row corresponds to a week.)
84
If we regress Sales on own price (p1), we obtain the very surprising conclusion that a higher price is associated with more sales. The regression line has a positive slope!!
This means we can raise prices and expect to sell more! WE'LL BE RICH!!
85
No. We've left something out.
Sales on own price: the regression equation is
Sales = 211 + 63.7 p1.
A multiple regression of Sales on own price (p1) and competitor's price (p2) yields more sensible results:
Sales = 116 - 97.7 p1 + 109 p2.
Remember, -97.7 is the effect on sales of a change in p1 with p2 held fixed. Demand for our product depends on our price AND our competitor's price, and these prices are correlated!! We MUST control for our competitor's price when we ask how changing the price we charge affects sales.
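A sketch of the same comparison in Python, assuming a data frame sales_df with columns Sales, p1, and p2 (the names are assumptions) and `smf` from the earlier sketch:

    short = smf.ols("Sales ~ p1", data=sales_df).fit()        # misleading positive slope on p1
    both  = smf.ols("Sales ~ p1 + p2", data=sales_df).fit()   # controls for the competitor's price
    print(short.params["p1"], both.params["p1"])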
86
How can we see what is going on?
First look at the scatterplot of p1 versus p2. Note the strong relationship between p1 and p2!! In weeks 82 and 99 of our sample, p2 is roughly the same.
When we look at the relationship between sales and p1 with p2 held fixed, it is strongly negative!!
(The plot highlights the points for week 82 and week 99.)
87
Here we select a subset of points where p1 varies and p2 is held (approximately) constant.
Looking at those same points on the Sales vs. p1 plot, we see that for a fixed level of p2, variations in p1 are negatively correlated with sales!!
88
Different colors indicate different ranges of p2 (plots of Sales vs. p1 and of p2 vs. p1).
For each fixed level of p2 there is a negative relationship between sales and p1; larger p1 are associated with larger p2.
89
Because p1 and p2 are correlated, when we leave
p2 out of the regression we get a very misleading
answer!
This is known as omitted variable bias.
In this example, since a large p1 is associated
with a large p2, when we regress Sales on just p1
it looks like higher p1 makes sales higher. BUT,
with p2 fixed, a larger p1 leads to lower
sales!! Assuming linearity, the multiple
regression can figure out the effect of p1 with
p2 fixed.
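The effect is easy to reproduce with simulated data; here is a sketch where all the numbers are invented for illustration, chosen so that p1 and p2 are positively correlated and the true effect of p1 with p2 fixed is negative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    p2 = rng.normal(size=500)
    p1 = 0.8 * p2 + 0.2 * rng.normal(size=500)           # our price tracks the competitor's price
    sales = 10 - 5 * p1 + 8 * p2 + rng.normal(size=500)  # true p1 effect (p2 fixed) is -5

    print(sm.OLS(sales, sm.add_constant(p1)).fit().params)    # short regression: p1 slope comes out positive
    print(sm.OLS(sales, sm.add_constant(np.column_stack([p1, p2]))).fit().params)  # close to -5 and 8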
90
LOOK AT YOUR DATA !!!
THINK ABOUT WHAT YOUR MODEL IS SAYING !!!
ALL of the statistical tools we've talked about are based on models, which make ASSUMPTIONS about the data we see.
If these model assumptions are violated, your
results can be highly misleading!!
Looking at your data and thinking carefully about
what your model is saying can tell you whether
your results are reasonable!!