1
Basic linear regression and multiple regression
  • Psych 437 - Fraley

2
Example
  • Let's say we wish to model the relationship
    between coffee consumption and happiness

3
Some Possible Functions
4
Lines
  • Linear relationships
  • Y = a + bX
  • a = Y-intercept (the value of Y when X = 0)
  • b = slope (the rise over the run, the steepness
    of the line); also called a weight

Y = 1 + 2X
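
As a quick illustration (the particular values of a and b are arbitrary), a
minimal Python sketch that evaluates a line of this form:

    # Evaluate Y = a + bX at a few values of X
    a, b = 1, 2                  # intercept and slope (arbitrary example values)
    for x in [0, 1, 2, 3]:
        y = a + b * x
        print(x, y)              # 0 1, 1 3, 2 5, 3 7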
5
Lines and intercepts
  • Y = a + 2X
  • Notice that the implied values of Y go up as we
    increase a.
  • By changing a, we are changing the elevation of
    the line.

Y = 5 + 2X
Y = 3 + 2X
Y = 1 + 2X
6
Lines and slopes
  • Slope as rise over run: how much of a change in
    Y there is given a 1-unit increase in X.
  • As we move up 1 unit on X, we go up 2 units on Y
  • 2/1 = 2 (the slope)

[Figure: the line Y = 1 + 2X, with the rise (from 1 to 3, a 2-unit change) and the run (a move from 0 to 1) marked]
7
Lines and slopes
  • Notice that as we increase the slope, b, we
    increase the steepness of the line

[Figure: the lines Y = 1 + 4X and Y = 1 + 2X plotted with COFFEE (-4 to 4) on the x-axis and HAPPINESS (-5 to 10) on the y-axis; the line with the larger slope is steeper]
8
Lines and slopes
  • We can also have negative slopes and slopes of
    zero.
  • When the slope is zero, the predicted values of Y
    are equal to a: Y = a + 0X = a.

[Figure: lines with slopes b = 4, 2, 0, -2, and -4 plotted with COFFEE (-4 to 4) on the x-axis and HAPPINESS (-5 to 10) on the y-axis]
9
Other functions
  • Quadratic function
  • Y = a + bX²
  • a still represents the intercept (the value of Y
    when X = 0)
  • b still represents a weight, and influences the
    magnitude of the squaring function

10
Quadratic and intercepts
  • As we increase a, the elevation of the curve
    increases

[Figure: the curves Y = 5 + 1X² and Y = 0 + 1X² plotted with COFFEE (-4 to 4) on the x-axis and HAPPINESS (0 to 30) on the y-axis]
11
Quadratic and Weight
  • When we increase the weight, b, the quadratic
    effect is accentuated

[Figure: the curves Y = 0 + 5X² and Y = 0 + 1X² plotted with COFFEE (-4 to 4) on the x-axis and HAPPINESS (0 to 120) on the y-axis]
12
Quadratic and Weight
  • As before, we can have negative weights for
    quadratic functions.
  • In this case, negative values of b flip the curve
    upside-down.
  • As before, when b = 0, the value of Y = a for
    all values of X.

[Figure: curves with quadratic weights b = 5, 1, 0, -1, and -5 (e.g., Y = 0 + 5X² down to Y = 0 - 5X²) plotted with COFFEE (-4 to 4) on the x-axis and HAPPINESS (-100 to 100) on the y-axis]
13
Linear + Quadratic Combinations
  • When linear and quadratic terms are present in
    the same equation, one can derive J-shaped curves
    (see the sketch below)
  • Y = a + b1X + b2X²
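
As a quick illustration (arbitrary example parameters), a small Python sketch
of how a linear and a quadratic term combine:

    # Y = a + b1*X + b2*X**2 with arbitrary example parameters
    a, b1, b2 = 0, 1.0, 0.5
    xs = [-4, -2, 0, 2, 4]
    ys = [a + b1 * x + b2 * x**2 for x in xs]
    print(ys)    # [4.0, 0.0, 0.0, 4.0, 12.0] -- the curve rises faster on the right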

14
Some terminology
  • When the relations between variables are
    expressed in this manner, we call the relevant
    equation(s) mathematical models.
  • The intercept and weight values are called
    parameters of the model.
  • Although one can describe the relationship
    between two variables in the way we have done
    here, from now on we'll assume that our models are
    causal models, such that the variable on the
    left-hand side of the equation is assumed to be
    caused by the variable(s) on the right-hand side.

15
Terminology
  • The values of Y in these models are often called
    predicted values, sometimes abbreviated as Y-hat
    or Ŷ. Why? They are the values of Y that are
    implied by the specific parameters of the model.

16
Estimation
  • Up to this point, we have assumed that our models
    are correct.
  • There are two important issues we need to deal
    with, however:
  • Assuming the basic model is correct (e.g.,
    linear), what are the correct parameters for the
    model?
  • Is the basic form of the model correct? That is,
    is a linear, as opposed to a quadratic, model the
    appropriate model for characterizing the
    relationship between variables?

17
Estimation
  • The process of obtaining the correct parameter
    values (assuming we are working with the right
    model) is called parameter estimation.

18
Parameter Estimation example
  • Let's assume that we believe there is a linear
    relationship between X and Y.
  • Assume we have collected the following data.
  • Which set of parameter values will bring us
    closest to representing the data accurately?

19
Estimation example
  • We begin by picking some values, plugging them
    into the linear equation, and seeing how well the
    implied values correspond to the observed values
  • We can quantify what we mean by "how well" by
    examining the difference between the
    model-implied Y and the actual Y value
  • This difference, Y − Ŷ, is often called the
    error in prediction
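
For example, if the model implies Ŷ = 4 for a person whose observed score is
Y = 6, the error in prediction for that person is 6 − 4 = 2.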

20
Estimation example
  • Let's try a different value of b and see what
    happens
  • Now the implied values of Y are getting closer to
    the actual values of Y, but we're still off by
    quite a bit

21
Estimation example
  • Things are getting better, but certainly things
    could improve

22
Estimation example
  • Ah, much better

23
Estimation example
  • Now that's very nice
  • There is a perfect correspondence between the
    implied values of Y and the actual values of Y

24
Estimation example
  • Whoa. That's a little worse.
  • Simply increasing b doesn't seem to make things
    increasingly better

25
Estimation example
  • Ugh. Things are getting worse again.

26
Parameter Estimation example
  • Here is one way to think about what we're doing:
  • We are trying to find a set of parameter values
    that will give us the smallest possible
    discrepancy between the predicted Y values and
    the actual values of Y.
  • How can we quantify this?

27
Parameter Estimation example
  • One way to do so is to find the difference
    between each value of Y and the corresponding
    predicted value (we called these differences
    errors before), square these differences, and
    average them together
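
In symbols, this is the mean of the squared errors: error variance =
(1/n) · Σ(Y − Ŷ)². A small Python sketch with made-up numbers:

    # Average of squared differences between observed Y and model-implied Y-hat
    y     = [1, 3, 5, 7]                      # observed values (made up)
    y_hat = [2, 3, 4, 8]                      # model-implied values (made up)
    errors = [yi - yhi for yi, yhi in zip(y, y_hat)]
    error_variance = sum(e ** 2 for e in errors) / len(errors)
    print(error_variance)                     # (1 + 0 + 1 + 1) / 4 = 0.75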

28
Parameter Estimation example
  • The form of this equation should be familiar.
    Notice that it represents some kind of average of
    squared deviations
  • This average is often called error variance.

29
Parameter Estimation example
  • In estimating the parameters of our model, we are
    trying to find a set of parameters that minimizes
    the error variance. In other words, we want the
    error variance to be as small as it possibly can
    be.
  • The process of finding this minimum value is
    called least-squares estimation.

30
Parameter Estimation example
  • In this graph I have plotted the error variance
    as a function of the different parameter values
    we chose for b.
  • Notice that our error was large at first (at b =
    -2), but got smaller as we made b larger.
    Eventually, the error reached a minimum when b =
    2 and then began to increase again as we made b
    larger.

[Figure: error variance plotted against different values of b]
31
Parameter Estimation example
  • The minimum in this example occurred when b = 2.
    This is the "best" value of b, when we define
    "best" as the value that minimizes the error
    variance.
  • There is no other value of b that will make the
    error smaller. (0 is as low as you can go.)

[Figure: error variance plotted against different values of b]
32
Ways to estimate parameters
  • The method we just used is sometimes called the
    brute-force (or gradient descent) method of
    estimating parameters. (A sketch of the
    brute-force search appears below.)
  • More formally, gradient descent involves starting
    with a viable parameter value, calculating the
    error at slightly different values, moving the
    best-guess parameter value in the direction of
    the smallest error, and then repeating this
    process until the error is as small as it can be.
  • Analytic methods
  • With simple linear models, the equation is so
    simple that brute-force methods are unnecessary.
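
A rough Python sketch of the brute-force search (the data are made up and
follow Y = 0 + 2X, with a fixed at 0 for simplicity):

    # Try many candidate slopes and keep the one with the smallest error variance
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]                      # made-up data following Y = 0 + 2X

    def error_variance(a, b):
        return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)

    candidates = [i / 10 for i in range(-20, 41)]      # b from -2.0 to 4.0
    best_b = min(candidates, key=lambda b: error_variance(0, b))
    print(best_b)                             # 2.0 -- the error-minimizing slope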

33
Analytic least-squares estimation
  • Specifically, one can use calculus to find the
    values of a and b that will minimize the error
    function

34
Analytic least-squares estimation
  • When this is done (we won't actually do the
    calculus here), we obtain the following equations
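
The usual least-squares solutions, written in terms of the quantities listed
on the next slide (the correlation, the SDs, and the means), are:

    b = r(X,Y) · (SD of Y / SD of X)
    a = Mean(Y) − b · Mean(X)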

35
Analytic least-squares estimation
  • Thus, we can easily find the least-squares
    estimates of a and b from simple knowledge of (1)
    the correlation between X and Y, (2) the SDs of
    X and Y, and (3) the means of X and Y

36
A neat fact
  • Notice what happens when X and Y are in standard
    score form: both SDs are 1 and both means are 0,
    so b = r and a = 0.
  • Thus, the predicted standard score of Y is simply
    r times the standard score of X.

37
  • In the parameter estimation example, we dealt
    with a situation in which a linear model of the
    form Y = 2 + 2X perfectly accounted for the data.
    (That is, there was no discrepancy between the
    values implied by the model and the actual data.)
  • Even when this is not the case (i.e., when the
    model doesn't explain the data perfectly), we can
    still find least-squares estimates of the
    parameters.

38
(No Transcript)
39
Error Variance
  • In this example, the value of b that minimizes
    the error variance is also 2. However, even when
    b = 2, there are discrepancies between the
    predictions entailed by the model and the actual
    data values.
  • Thus, the error variance becomes not only a way
    to estimate parameters, but a way to evaluate the
    basic model itself.

40
R-squared
  • In short, when the model is a good representation
    of the relationship between Y and X, the error
    variance of the model should be relatively low.
  • This is typically quantified by an index called
    the multiple R or the squared version of it, R².

41
R-squared
  • R-squared represents the proportion of the
    variance in Y that is accounted for by the model
  • When the model doesn't do any better than
    guessing the mean, R² will equal zero. When the
    model is perfect (i.e., it accounts for the data
    perfectly), R² will equal 1.00.
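
Expressed as a formula (one standard way of writing the proportion-of-variance
idea, consistent with the description on the next slide):
R² = 1 − (error variance of the model / variance of Y).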

42
Neat fact
  • When dealing with a simple linear model with one
    X, R² is equal to the correlation of X and Y,
    squared.
  • Why? Keep in mind that R² is in a standardized
    metric by virtue of having divided the error
    variance by the variance of Y. Previously, when
    working with standardized scores in simple linear
    regression equations, we found that the parameter
    b is equal to r. Since b is estimated via
    least-squares techniques, it is directly related
    to R².

43
Why is R² useful?
  • R² is useful because it is a standard metric for
    interpreting model fit.
  • It doesn't matter how large the variance of Y is
    because everything is evaluated relative to the
    variance of Y
  • Set end-points: 1 is perfect and 0 is as bad as a
    model can be.

44
Multiple Regression
  • In many situations in personality psychology we
    are interested in modeling Y not only as a
    function of a single X variable, but potentially
    many X variables.
  • Example: We might attempt to explain variation in
    academic achievement as a function of SES and
    maternal education.

45
  • Y = a + b1·SES + b2·MATEDU
  • Notice that adding a new variable to the model
    is simple. This equation states that academic
    achievement is a function of at least two things,
    SES and MATEDU.
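
A minimal sketch of estimating such a model in Python using ordinary least
squares (the data here are made up for illustration and are not the SPSS data
summarized on the following slides):

    import numpy as np

    # Made-up illustrative data: achievement modeled from SES and maternal education
    ses     = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    matedu  = np.array([10., 12., 12., 14., 16., 18.])
    achieve = np.array([70., 74., 75., 79., 83., 87.])

    # Design matrix: a column of 1s for the intercept a, then the two predictors
    X = np.column_stack([np.ones_like(ses), ses, matedu])
    coef, *_ = np.linalg.lstsq(X, achieve, rcond=None)
    a, b_ses, b_matedu = coef
    print(a, b_ses, b_matedu)    # intercept and the two partial regression weights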

46
  • However, what the regression coefficients now
    represent is not merely the change in Y expected
    given a 1-unit increase in X. They represent the
    change in Y given a 1-unit change in X, holding
    all the other variables in the equation constant.
  • In other words, these coefficients behave kind of
    like partial correlations (technically, they are
    related to semi-partial correlations). We're
    statistically controlling for SES when estimating
    the effect of MATEDU.

47
  • Estimating regression coefficients in SPSS
  • Correlations

                SES     MATEDU   ACHIEVEG5
    SES         1.00    .542     .279
    MATEDU      .542    1.00     .364
    ACHIEVEG5   .279    .364     1.00

48
(No Transcript)
49
(No Transcript)
50
Note: The regression parameter estimates are in
the column labeled B. Constant = a = intercept.
51
Achievement = 76.86 + 1.443·MATEDU + .539·SES
52
  • These parameter estimates imply that moving up
    one unit on maternal education leads to a 1.4-unit
    increase in achievement.
  • Moreover, moving up 1 unit in SES corresponds to
    roughly a half-unit increase in achievement.

53
  • Does this mean that Maternal Education matters
    more than SES in predicting educational
    achievement?
  • Not necessarily. As it stands, the two variables
    might be on very different metrics. (Perhaps
    MATEDU ranges from 0 to 20 and SES ranges from 0
    to 4.) To evaluate their relative contributions
    to Y, one can standardize both variables or
    examine standardized regression coefficients.
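
A small sketch of this standardization step in Python (again with made-up
data; z-scoring each variable before refitting puts the weights on a common
metric):

    import numpy as np

    def z(v):
        v = np.asarray(v, dtype=float)
        return (v - v.mean()) / v.std()     # convert to standard scores

    # Made-up illustrative data (same form as the earlier sketch)
    ses     = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    matedu  = [10., 12., 12., 14., 16., 18.]
    achieve = [70., 74., 75., 79., 83., 87.]

    Xz = np.column_stack([np.ones(6), z(ses), z(matedu)])
    beta, *_ = np.linalg.lstsq(Xz, z(achieve), rcond=None)
    print(beta)    # intercept near 0, then the two standardized weights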

54
Z(Achievement) = 0 + .301·Z(MATEDU) + .118·Z(SES)
55
The multiple R and the R-squared for the full
model are listed here. This particular model
explains 14% of the variance in academic
achievement.
56
Adding SES·SES (SES²) improves R-squared by about
1%. These parameters suggest that higher SES
predicts higher achievement, but in a limiting
way: there are diminishing returns at the high
end of SES.
57
    SES    a    B1·MATEDU   B2·SES      B3·SES·SES        Y-hat
    -2     0    .256·0      .436·(-2)   -.320·(-2)·(-2)   -2.15
    -1     0    .256·0      .436·(-1)   -.320·(-1)·(-1)   -0.76
     0     0    .256·0      .436·0      -.320·0·0          0.00
     1     0    .256·0      .436·1      -.320·1·1          0.12
     2     0    .256·0      .436·2      -.320·2·2         -0.41
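
A quick check of the arithmetic in the table above (using the slide's
coefficients, with Z(MATEDU) held at 0):

    # Predicted Z(Achievement) from the quadratic model, with Z(MATEDU) fixed at 0
    a, b1, b2, b3 = 0, .256, .436, -.320
    for z_ses in [-2, -1, 0, 1, 2]:
        y_hat = a + b1 * 0 + b2 * z_ses + b3 * z_ses ** 2
        print(z_ses, round(y_hat, 2))    # -2.15, -0.76, 0.0, 0.12, -0.41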
58
[Figure: predicted Z(Achievement) plotted against Z(SES)]