Title: Basic linear regression and multiple regression
1. Basic linear regression and multiple regression
- Psych 350
- Lecture 12, R. Chris Fraley
- http://www.yourpersonality.net/psych350/fall2012/
2. Example
- Let's say we wish to model the relationship between coffee consumption and happiness.
3. Some Possible Functions
4. Lines
- Linear relationships
- Y = a + bX
- a = Y-intercept (the value of Y when X = 0)
- b = slope (the rise over the run, the steepness of the line); a weight
- Example: Y = 1 + 2X
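As an illustration, here is a minimal Python sketch of this kind of linear function; the X values are made up purely for the example.

```python
# A linear prediction function, Y = a + bX.

def predict_linear(a, b, x):
    """Return the model-implied (predicted) value of Y for a given X."""
    return a + b * x

xs = [-4, -2, 0, 2, 4]                       # hypothetical coffee values
ys = [predict_linear(1, 2, x) for x in xs]   # Y = 1 + 2X
print(ys)                                    # [-7, -3, 1, 5, 9]
```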
5. Lines and intercepts
- Y = a + 2X
- Notice that the implied values of Y go up as we increase a.
- By changing a, we are changing the elevation of the line.
- Examples: Y = 5 + 2X, Y = 3 + 2X, Y = 1 + 2X
6. Lines and slopes
- Slope as rise over run: how much of a change in Y is there given a 1-unit increase in X?
- As we move up 1 unit on X, we go up 2 units on Y.
- 2/1 = 2 (the slope)
- [Figure: the line Y = 1 + 2X; moving from X = 0 to X = 1 (the run), Y rises from 1 to 3 (a 2-unit change).]
7. Lines and slopes
- Notice that as we increase the slope, b, we increase the steepness of the line.
- [Figure: HAPPINESS plotted against COFFEE, comparing Y = 1 + 2X with the steeper Y = 1 + 4X.]
8. Lines and slopes
- We can also have negative slopes and slopes of zero.
- When the slope is zero, the predicted values of Y are equal to a: Y = a + 0X, so Y = a.
- [Figure: HAPPINESS plotted against COFFEE for lines with b = 4, 2, 0, -2, and -4.]
9. Other functions
- Quadratic function
- Y = a + bX²
- a still represents the intercept (the value of Y when X = 0)
- b still represents a weight, and influences the magnitude of the squaring function
10. Quadratic and intercepts
- As we increase a, the elevation of the curve increases.
- [Figure: HAPPINESS plotted against COFFEE, comparing the curves Y = 5 + 1X² and Y = 0 + 1X².]
11. Quadratic and Weight
- When we increase the weight, b, the quadratic effect is accentuated.
- [Figure: HAPPINESS plotted against COFFEE, comparing Y = 0 + 5X² with Y = 0 + 1X².]
12. Quadratic and Weight
- As before, we can have negative weights for quadratic functions.
- In this case, negative values of b flip the curve upside-down.
- As before, when b = 0, the value of Y = a for all values of X.
- [Figure: HAPPINESS plotted against COFFEE for the curves Y = 0 + 5X², Y = 0 + 1X², Y = 0 + 0X², Y = 0 - 1X², and Y = 0 - 5X².]
13. Linear + Quadratic Combinations
- When linear and quadratic terms are present in the same equation, one can derive J-shaped curves (see the sketch below).
- Y = a + b₁X + b₂X²
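A small Python sketch of this combined model; the parameter values are made up for illustration, chosen so the curve rises steeply on one side only.

```python
# Predictions from a model with both linear and quadratic terms:
# Y = a + b1*X + b2*X^2.

def predict_quadratic(a, b1, b2, x):
    return a + b1 * x + b2 * x ** 2

xs = [-4, -2, 0, 2, 4]
ys = [predict_quadratic(1, 2, 0.5, x) for x in xs]
print(ys)  # [1.0, -1.0, 1.0, 7.0, 17.0] -- an asymmetric, J-shaped pattern
```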
14. Some terminology
- When the relation between variables is expressed in this manner, we call the relevant equation(s) mathematical models.
- The intercept and weight values are called parameters of the model.
- Although one can describe the relationship between two variables in the way we have done here, from now on we'll assume that our models are causal models, such that the variable on the left-hand side of the equation is assumed to be caused by the variable(s) on the right-hand side.
15. Terminology
- The values of Y in these models are often called predicted values, sometimes abbreviated as Y-hat or Ŷ. Why? They are the values of Y that are implied by the specific parameters of the model.
16. Estimation
- Up to this point, we have assumed that our models are correct.
- There are two important issues we need to deal with, however:
- Assuming the basic model is correct (e.g., linear), what are the correct parameters for the model?
- Is the basic form of the model correct? That is, is a linear, as opposed to a quadratic, model the appropriate model for characterizing the relationship between variables?
17. Estimation
- The process of obtaining the correct parameter values (assuming we are working with the right model) is called parameter estimation.
18. Parameter Estimation example
- Let's assume that we believe there is a linear relationship between X and Y.
- Assume we have collected the following data. [Data table shown on the original slide.]
- Which set of parameter values will bring us closest to representing the data accurately?
19. Estimation example
- We begin by picking some values, plugging them into the linear equation, and seeing how well the implied values correspond to the observed values.
- We can quantify what we mean by "how well" by examining the difference between the model-implied Y and the actual Y value.
- This difference, Y − Ŷ, is often called error in prediction.
20. Estimation example
- Let's try a different value of b and see what happens.
- Now the implied values of Y are getting closer to the actual values of Y, but we're still off by quite a bit.
21. Estimation example
- Things are getting better, but certainly things could improve.
22. Estimation example
23. Estimation example
- Now that's very nice.
- There is a perfect correspondence between the implied values of Y and the actual values of Y.
24. Estimation example
- Whoa. That's a little worse.
- Simply increasing b doesn't seem to make things increasingly better.
25. Estimation example
- Ugh. Things are getting worse again.
26. Parameter Estimation example
- Here is one way to think about what we're doing.
- We are trying to find a set of parameter values that will give us a small (ideally the smallest) discrepancy between the predicted Y values and the actual values of Y.
- How can we quantify this?
27. Parameter Estimation example
- One way to do so is to find the difference between each value of Y and the corresponding predicted value (we called these differences errors before), square these differences, and average them together.
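The equation itself appeared as an image on the original slide; written out from the verbal description (a simple average of the squared errors), it is:

\[
s^2_{\text{error}} \;=\; \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{n}
\]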
28. Parameter Estimation example
- The form of this equation should be familiar. Notice that it represents some kind of average of squared deviations.
- This average is often called error variance.
29. Parameter Estimation example
- In estimating the parameters of our model, we are trying to find a set of parameters that minimizes the error variance. In other words, we want the error variance to be as small as it possibly can be.
- The process of finding this minimum value is called least-squares estimation.
30. Parameter Estimation example
- In this graph I have plotted the error variance as a function of the different parameter values we chose for b.
- Notice that our error was large at first (at b = -2), but got smaller as we made b larger. Eventually, the error reached a minimum when b = 2 and then began to increase again as we made b larger.
- [Figure: error variance plotted against different values of b.]
31. Parameter Estimation example
- The minimum in this example occurred when b = 2. This is the best value of b, when we define "best" as the value that minimizes the error variance.
- There is no other value of b that will make the error smaller. (0 is as low as you can go.)
- [Figure: error variance plotted against different values of b, with the minimum at b = 2.]
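Here is a small Python sketch of this search over values of b. The data are made up to be consistent with the example described on slide 37 (a line of the form Y = 2 + 2X), and the intercept is held fixed at 2 so the search is over b alone.

```python
# Brute-force (grid) search for the slope b that minimizes error variance.
# Hypothetical data consistent with Y = 2 + 2X; intercept a is fixed at 2.
xs = [0, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 10]

def error_variance(a, b):
    errors = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in errors) / len(errors)

candidates = [b / 10 for b in range(-20, 41)]          # b from -2.0 to 4.0 in steps of 0.1
best_b = min(candidates, key=lambda b: error_variance(2, b))
print(best_b, error_variance(2, best_b))               # 2.0 0.0
```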
32. Ways to estimate parameters
- The method we just used is sometimes called the brute-force or gradient descent approach to estimating parameters.
- More formally, gradient descent involves starting with a viable parameter value, calculating the error using a slightly different value, moving the best-guess parameter value in the direction of smaller error, and then repeating this process until the error is as small as it can be. (A sketch follows this list.)
- Analytic methods
- With simple linear models, the equation is so simple that brute-force methods are unnecessary.
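A minimal Python sketch of the step-by-step procedure just described, done numerically by nudging b rather than with calculus; the data, starting value, and step size are all made up.

```python
# Iteratively move b in whichever direction lowers the error variance,
# stopping when no small move helps. Same hypothetical data as before.
xs = [0, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 10]

def error_variance(a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

b, step = -2.0, 0.01          # start from a viable (if poor) guess for b
for _ in range(2000):
    here = error_variance(2, b)
    if error_variance(2, b + step) < here:
        b += step             # moving up improves the fit
    elif error_variance(2, b - step) < here:
        b -= step             # moving down improves the fit
    else:
        break                 # no small move helps: error is at its minimum
print(round(b, 2))            # 2.0
```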
33. Analytic least-squares estimation
- Specifically, one can use calculus to find the values of a and b that will minimize the error function.
34. Analytic least-squares estimation
- When this is done (we won't actually do the calculus here), we obtain the following equations.
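The equations themselves appeared as images on the original slides. The standard least-squares solutions they refer to, stated (as the next slide does) in terms of the correlation, the SDs, and the means, are:

\[
b \;=\; r_{XY}\,\frac{s_Y}{s_X},
\qquad
a \;=\; \bar{Y} - b\,\bar{X}
\]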
35. Analytic least-squares estimation
- Thus, we can easily find the least-squares estimates of a and b from simple knowledge of (1) the correlation between X and Y, (2) the SDs of X and Y, and (3) the means of X and Y.
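A short Python sketch of these analytic estimates, using the same made-up data as the search examples above (statistics.correlation requires Python 3.10+).

```python
# Analytic least-squares estimates of a and b from the correlation,
# standard deviations, and means of X and Y.
import statistics as st

xs = [0, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 10]

r = st.correlation(xs, ys)              # Pearson correlation between X and Y
b = r * st.stdev(ys) / st.stdev(xs)     # slope: r * (SD of Y / SD of X)
a = st.mean(ys) - b * st.mean(xs)       # intercept: mean of Y minus b * mean of X
print(round(a, 3), round(b, 3))         # 2.0 2.0 -- matches the search result
```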
36. A neat fact
- Notice what happens when X and Y are in standard-score form: both means are 0 and both SDs are 1.
- Thus, the equations simplify; see the substitution below.
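The simplified equations were shown as an image on the slide; substituting means of 0 and SDs of 1 into the formulas above gives the result referred to again on slide 42:

\[
b \;=\; r_{XY}\,\frac{1}{1} \;=\; r_{XY},
\qquad
a \;=\; 0 - b \cdot 0 \;=\; 0
\]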
37.
- In the parameter estimation example, we dealt with a situation in which a linear model of the form Y = 2 + 2X perfectly accounted for the data. (That is, there was no discrepancy between the values implied by the model and the actual data.)
- Even when this is not the case (i.e., when the model doesn't explain the data perfectly), we can still find least-squares estimates of the parameters.
38. (No transcript)
39. Error Variance
- In this example, the value of b that minimizes the error variance is also 2. However, even when b = 2, there are discrepancies between the predictions entailed by the model and the actual data values.
- Thus, the error variance becomes not only a way to estimate parameters, but also a way to evaluate the basic model itself.
40. R-squared
- In short, when the model is a good representation of the relationship between Y and X, the error variance of the model should be relatively low.
- This is typically quantified by an index called the multiple R, or the squared version of it, R².
41. R-squared
- R-squared represents the proportion of the variance in Y that is accounted for by the model.
- When the model doesn't do any better than guessing the mean, R² will equal zero. When the model is perfect (i.e., it accounts for the data perfectly), R² will equal 1.00.
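A brief Python sketch of this definition: compare the model's error variance with the variance of Y around its own mean. The data and fitted line below are made up, and deliberately imperfect so R² falls between 0 and 1.

```python
# R-squared as the proportion of variance in Y accounted for by the model:
# R^2 = 1 - (error variance) / (variance of Y around its mean).
xs = [0, 1, 2, 3, 4]
ys = [2.5, 3.5, 6.5, 7.5, 10.0]
a, b = 2.1, 1.9                      # a hypothetical fitted line (values made up)

preds = [a + b * x for x in xs]
mean_y = sum(ys) / len(ys)
error_var = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
total_var = sum((y - mean_y) ** 2 for y in ys) / len(ys)
print(round(1 - error_var / total_var, 3))   # 0.974 for these made-up numbers
```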
42. Neat fact
- When dealing with a simple linear model with one X, R² is equal to the correlation of X and Y, squared.
- Why? Keep in mind that R² is in a standardized metric by virtue of having divided the error variance by the variance of Y. Previously, when working with standardized scores in simple linear regression equations, we found that the parameter b is equal to r. Since b is estimated via least-squares techniques, it is directly related to R².
43. Why is R² useful?
- R² is useful because it is a standard metric for interpreting model fit.
- It doesn't matter how large the variance of Y is, because everything is evaluated relative to the variance of Y.
- Set end-points: 1 is perfect and 0 is as bad as a model can be.
44. Multiple Regression
- In many situations in personality psychology we are interested in modeling Y not only as a function of a single X variable, but of potentially many X variables.
- Example: We might attempt to explain variation in academic achievement as a function of SES and maternal education.
45.
- Y = a + b₁SES + b₂MATEDU
- Notice that adding a new variable to the model is simple. This equation states that Y, academic achievement, is a function of at least two things: SES and MATEDU.
46.
- However, what the regression coefficients now represent is not merely the change in Y expected given a 1-unit increase in X. They represent the change in Y given a 1-unit change in X, holding all the other variables in the equation constant.
- In other words, these coefficients are kind of like partial correlations (technically, they are related to semi-partial correlations). We're statistically controlling SES when estimating the effect of MATEDU.
47.
- Estimating regression coefficients in SPSS
- Correlations:

              SES     MATEDU   ACHIEVEG5
  SES         1.00    .542     .279
  MATEDU      .542    1.00     .364
  ACHIEVEG5   .279    .364     1.00
48. (No transcript)
49. (No transcript)
50. Note: The regression parameter estimates are in the column labeled B. Constant = a = the intercept.
51. Achievement = 76.86 + 1.443(MATEDU) + .539(SES)
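A quick Python sketch that plugs values into this fitted equation; only the coefficients come from the slide, and the predictor values below are made up to illustrate the calculation.

```python
# Predicted achievement from the fitted (unstandardized) equation:
# Achievement = 76.86 + 1.443*MATEDU + 0.539*SES
def predicted_achievement(matedu, ses):
    return 76.86 + 1.443 * matedu + 0.539 * ses

print(round(predicted_achievement(12, 2), 2))   # 95.25
print(round(predicted_achievement(16, 2), 2))   # 101.03
```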
52.
- These parameter estimates imply that moving up one unit on maternal education leads to about a 1.4-unit increase in achievement.
- Moreover, moving up 1 unit on SES corresponds to about a half-unit increase in achievement.
53.
- Does this mean that Maternal Education matters more than SES in predicting educational achievement?
- Not necessarily. As it stands, the two variables might be on very different metrics. (Perhaps MATEDU ranges from 0 to 20 and SES ranges from 0 to 4.) To evaluate their relative contributions to Y, one can standardize both variables or examine standardized regression coefficients.
54. Z(Achievement) = 0 + .301·Z(MATEDU) + .118·Z(SES)
55. The multiple R and the R-squared for the full model are listed in the SPSS output. This particular model explains 14% of the variance in academic achievement.
56. Adding SES×SES (SES²) to the model improves R-squared by about 1%. These parameters suggest that higher SES predicts higher achievement, but in a limiting way: there are diminishing returns at the high end of SES.
57. Predicted Z(Achievement) at different values of Z(SES), holding Z(MATEDU) at 0:

  SES    a    b1·MATEDU   b2·SES        b3·SES·SES         Y-hat
  -2     0    .256·0      .436·(-2)     -.320·(-2)·(-2)    -2.15
  -1     0    .256·0      .436·(-1)     -.320·(-1)·(-1)    -0.76
   0     0    .256·0      .436·0        -.320·0·0           0.00
   1     0    .256·0      .436·1        -.320·1·1           0.12
   2     0    .256·0      .436·2        -.320·2·2          -0.41
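A short Python sketch that reproduces the Y-hat column of this table from the standardized coefficients, with MATEDU held at its mean (Z(MATEDU) = 0).

```python
# Reproduce the predicted Z(Achievement) values from the table above.
# Standardized coefficients: .256 (MATEDU), .436 (SES), -.320 (SES squared).
def predicted_z_achievement(z_matedu, z_ses):
    return 0 + 0.256 * z_matedu + 0.436 * z_ses + (-0.320) * z_ses ** 2

for z_ses in [-2, -1, 0, 1, 2]:
    print(z_ses, round(predicted_z_achievement(0, z_ses), 2))
# -2 -2.15, -1 -0.76, 0 0.0, 1 0.12, 2 -0.41
```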
58. [Figure: Predicted Z(Achievement) plotted against Z(SES), showing the curvilinear, diminishing-returns pattern.]