Title: PSY 360
1. PSY 360
2. Correlation and Regression
- Both examine linear (straight-line) relationships
- Both work with a pair of scores, one on each of two variables, X and Y
- Correlation
  - Defined as the degree of linear relationship between X and Y
  - Is measured/described by the statistic r
- Regression
  - Describes the form or function of the linear relationship between X and Y
  - Is concerned with the prediction of Y from X
  - Forms a prediction equation to predict Y from X
3. Why do we care?
- Knowing the single most representative score (central tendency) and a measure of spread or dispersion (variability) is critical for DESCRIBING the characteristics of a distribution
- However, sometimes we are interested in the relationship between variables or, to be more precise, in how the value of one variable changes when the value of another variable changes. Correlation and regression help us understand these relationships
4. Correlation
- The aspect of the data that we want to describe/measure is the degree of linear relationship between X and Y
- The statistic r describes/measures the degree of linear relationship between X and Y
- $r = \sum z_X z_Y / N$, the average product of z scores for X and Y
- Works with two variables, X and Y
- $-1 \le r \le +1$; the sign of r distinguishes negative, zero, and positive relationships
- Measures only the degree of linear relationship
- $r^2$ = proportion of variability in Y that is explained by X
- r is undefined if X or Y has zero spread
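The definition of r as the average product of z scores can be sketched in a few lines of Python. This is only an illustration: the five pairs of scores are made up, the function names are hypothetical, and the z scores are assumed to use the standard deviation computed with N (not N - 1) in the denominator, which is what makes the "divide by N" formula come out right.

```python
import math

def z_scores(values):
    """Convert raw scores to z scores using the SD computed with N in the denominator."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("z scores are undefined when the standard deviation is zero")
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """r = sum(zX * zY) / N, the average product of paired z scores."""
    if len(x) != len(y):
        raise ValueError("X and Y must contain the same number of scores")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical scores for five people (not taken from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")  # r = 0.775, r^2 = 0.600
```

Reversing the direction of Y (for example, replacing each score with its negative) flips the sign of r but leaves its size unchanged.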
5. Correlation: $-1 \le r \le +1$
The sign of r shows the type of linear relationship between X and Y. We can use the definitional formula for r and these scatterplots to see positive, negative, and zero relationships.
[Scatterplots: three panels of Y against X, labeled r = 1, r = 0, and r = -1]
11. Interpreting Correlation Coefficients
Size of the correlation coefficient    General interpretation
.8 to 1.0                              Very strong relationship
.6 to .8                               Strong relationship
.4 to .6                               Moderate relationship
.2 to .4                               Weak relationship
.0 to .2                               Weak or no relationship
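As a small illustration, the interpretation guide can be written as a lookup function. The function name is hypothetical, and two choices here are assumptions rather than something the guide states: the size of r is taken as its absolute value, and a value that falls exactly on a boundary (such as .6) receives the stronger label.

```python
def describe_strength(r):
    """Return the general interpretation for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # assumption: strength depends on the size of r, not its sign
    # Assumption: a value exactly on a boundary takes the stronger label.
    if size >= 0.8:
        return "very strong relationship"
    if size >= 0.6:
        return "strong relationship"
    if size >= 0.4:
        return "moderate relationship"
    if size >= 0.2:
        return "weak relationship"
    return "weak or no relationship"

for r in (0.68, -0.70, 0.15):
    print(f"r = {r:+.2f}: {describe_strength(r)}")
```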
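16. Correlation: Linear
- If there is a curvilinear relationship between X and Y, then r will not detect it. The value of r will be zero if there is no linear relationship between X and Y, even when the two variables are strongly related along a curve.
[Scatterplots: two panels, each labeled r = 0]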
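A quick way to see this is to build data in which Y is completely determined by X, but along a U-shaped curve. The sketch below uses made-up scores and a hypothetical helper function; it is meant as an illustration, not as a general test for curvilinearity.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from the sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: X is symmetric around zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [v ** 2 for v in x]      # perfect U-shaped (curvilinear) relationship
y_line = [2 * v + 1 for v in x]     # perfect straight-line relationship

r_curved = pearson_r(x, y_curved)
r_line = pearson_r(x, y_line)
print(f"U-shaped relationship: r = {r_curved:.2f}")
print(f"Straight-line relationship: r = {r_line:.2f}")
```

Even though X predicts Y perfectly in the U-shaped case, the positive and negative cross-products cancel, so r comes out as zero. A scatterplot is the usual way to catch this.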
17. Correlation: $r^2$
- $r^2$ = proportion of variability in Y that is explained by X.
- If r = .5, then $r^2$ = .25, so the proportion of variability in Y that is explained by X is .25 (as a percentage, 25% is explained by X and 75% is unexplained).
- Scatterplots: r = .5, $r^2$ = .25; r = .7, $r^2$ = .49; r = .9, $r^2$ = .81
- Venn diagrams: $r^2$ is represented by the proportion of overlap.
[Venn diagrams: three pairs of overlapping circles labeled Y and X]
18. Correlation: Undefined
- If there is no spread in X or Y, then r is undefined. Note that any z is undefined if the standard deviation is zero, and $r = \sum z_X z_Y / N$.
[Scatterplots: two panels of Y against X, one with $s_Y = 0$ and one with $s_X = 0$; r is undefined in both]
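In code, the zero-spread case shows up as a division by zero, so it has to be handled explicitly. The following is a minimal sketch with made-up scores; the function name and the choice to return `None` for an undefined r are assumptions made for illustration.

```python
import math

def pearson_r_or_none(x, y):
    """Return Pearson r, or None when X or Y has zero spread (r is undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        # A standard deviation of zero would mean dividing by zero in every z score.
        return None
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: everyone has the same Y score, so the spread of Y is zero.
x = [1, 2, 3, 4]
y_constant = [5, 5, 5, 5]
print(pearson_r_or_none(x, y_constant))  # None
```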
19. Correlation
- Example of correlation
  - Murder rates and ice cream sales are positively correlated
  - As murder rates increase, ice cream sales also increase
  - Why?
- CORRELATION DOES NOT MEAN CAUSATION
  - Murders may cause increases in ice cream sales
  - Ice cream sales may cause more murders
  - Some other variable may cause both murders and ice cream sales
20. Correlation
- Things to remember
  - Correlations can range from -1 to 1
  - The absolute value of the correlation coefficient reflects the strength of the correlation, so a correlation of -.7 is stronger than a correlation of .5
  - Do not assign a value judgment to the sign of the correlation. Many students assume that negative correlations are bad and positive correlations are good. This is not true!
  - Population correlation coefficient, ρ (rho)
  - Impact on r
    - Restriction of range
    - Extreme scores (outliers)
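The last two points, restriction of range and extreme scores, are easiest to appreciate by recomputing r on altered versions of the same data. The sketch below uses made-up scores and a hypothetical helper function; the particular cutoffs (X between 4 and 7) and the added outlier are arbitrary choices for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from the sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores with a strong positive linear trend (not from the slides).
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(x, y)

# Restriction of range: keep only the people whose X score is between 4 and 7.
kept = [(xi, yi) for xi, yi in zip(x, y) if 4 <= xi <= 7]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

# Extreme score: add one person with a very high X and a very low Y.
r_outlier = pearson_r(x + [12], y + [0])

print(f"full range:        r = {r_full:.2f}")
print(f"restricted range:  r = {r_restricted:.2f}")
print(f"with one outlier:  r = {r_outlier:.2f}")
```

In this made-up data set r is lower in both altered versions. That is not guaranteed in general: an outlier that happens to fall along the trend can inflate r instead of shrinking it.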
21. Regression
- Not only can we compute the degree to which two variables are related (the correlation coefficient), but we can also use these correlations as the basis for predicting the value of one variable from the value of another
- Prediction is an activity that computes future outcomes from present ones. When we want to predict one variable from another, we first need to compute the correlation between the two variables
22. Regression
Total High School GPA and First-Year College GPA are Correlated
23. Regression
r = .68 for these two variables
24. Regression
- Regression is concerned with forming a prediction equation to predict Y from X
- Uses the formula for a straight line, $Y' = bX + a$
  - $Y'$ is the predicted Y score on the criterion variable
  - b is the slope, $b = \Delta Y / \Delta X$ = rise/run
  - X is a score on the predictor variable
  - a is the Y-intercept, where the line crosses the Y axis, the value of $Y'$ when X = 0
- Example: if b = .695, a = .739, and X = 3.5, then $Y' = .695(3.5) + .739 = 3.17$
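The worked example can be checked with a one-line function. The function name is hypothetical; the slope, intercept, and X value are the ones given in the example.

```python
def predict_y(x, slope, intercept):
    """Predicted score from a straight-line equation: slope * X + intercept."""
    return slope * x + intercept

# Values from the example: b = .695, a = .739, X = 3.5
y_pred = predict_y(3.5, slope=0.695, intercept=0.739)
print(f"Predicted Y = {y_pred:.2f}")  # Predicted Y = 3.17
```

Setting X to 0 returns the intercept itself, which is what "the value of the predicted score when X = 0" means.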
25. Regression
Regression line of Y on X
26. Regression
Prediction of Y, given X = 3.5
27. Regression
- Linear only
- Generalize only within the range of X values in your sample
- The actual observed Y differs from the predicted $Y'$ by an amount called error, e; that is, $Y = Y' + e$
- Error in regression is $e = Y - Y'$
- Many different potential regression lines
28. Regression: Best Line
- There are many different potential regression lines, but only one best-fitting line
- The statistics b and a are computed so as to minimize the sum of squared errors
- $\sum e^2 = \sum (Y - Y')^2$ is a minimum, which is called the Least Squares Criterion.
- This means that the line minimizes the total squared vertical distance between the individual data points and the regression line
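The least squares idea can be made concrete by computing b and a for a small data set and then confirming that other lines produce a larger sum of squared errors. The slides do not give computing formulas for b and a, so the ones used here (slope = sum of cross-products divided by the sum of squares of X; intercept = mean of Y minus slope times mean of X) are the standard least squares solutions, supplied as an assumption. The data and function names are made up, and the predicted score is written Y'.

```python
def least_squares_line(x, y):
    """Return (b, a) that minimize the sum of squared errors for Y' = bX + a."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    if ss_x == 0:
        raise ValueError("the slope is undefined when X has zero spread")
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a

def sum_squared_errors(x, y, b, a):
    """Sum of e^2, where e = Y - Y' and Y' = bX + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

# Hypothetical scores (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, a = least_squares_line(x, y)
best = sum_squared_errors(x, y, b, a)
print(f"b = {b:.2f}, a = {a:.2f}, sum of squared errors = {best:.2f}")

# Any other line fits worse, that is, has a larger sum of squared errors.
for other_b, other_a in [(b + 0.1, a), (b, a + 0.5), (1.0, 1.0)]:
    print(other_b, other_a, sum_squared_errors(x, y, other_b, other_a) > best)
```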
29. Regression
- Error in prediction: the distance between each individual data point and the regression line (a direct reflection of the correlation between the two variables)
[Figure: error in prediction shown for the data point X = 3.3, Y = 3.7]
30. Regression: $s_{Y \cdot X}$
- Standard error of estimate is a statistic that measures/describes the spread of errors, that is, of Y scores around the regression line
- $s_{Y \cdot X}$ is the standard deviation of errors in regression
- $s_{Y \cdot X} = \sqrt{\sum e^2 / (N-2)} = \sqrt{\sum (Y - Y')^2 / (N-2)}$
- $s_{Y \cdot X} = \sqrt{(N-1)/(N-2)} \; s_Y \sqrt{1 - r^2}$
- As $r^2$ increases, $s_{Y \cdot X}$ decreases. For example, if N = 100 and $s_Y = 4$:

r      $r^2$    $s_{Y \cdot X}$
.2     .04      3.94
.4     .16      3.68
.6     .36      3.22
.8     .64      2.41
.9     .81      1.75

$s_{Y \cdot X}$ is the standard deviation of Y around the regression line $Y'$
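The tabled values can be reproduced from the second formula for the standard error of estimate. In the sketch below the function name is hypothetical, and the values .2 through .9 are treated as r (so that $r^2$ is .04 through .81); under that reading the formula returns exactly the tabled results for N = 100 and a standard deviation of Y equal to 4.

```python
import math

def standard_error_of_estimate(r, s_y, n):
    """sqrt((N - 1) / (N - 2)) * s_Y * sqrt(1 - r^2)."""
    if n <= 2:
        raise ValueError("N must be greater than 2")
    return math.sqrt((n - 1) / (n - 2)) * s_y * math.sqrt(1 - r ** 2)

# Example values: N = 100 and s_Y = 4.
for r in (0.2, 0.4, 0.6, 0.8, 0.9):
    s_yx = standard_error_of_estimate(r, s_y=4, n=100)
    print(f"r = {r:.1f}   r^2 = {r * r:.2f}   s_yx = {s_yx:.2f}")
```

With r = 0 the result is slightly larger than the standard deviation of Y itself, because of the (N - 1)/(N - 2) correction; with r = 1 or r = -1 it is zero, meaning every point falls on the line.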
31. Regression: Partitioning
- Partitioning total variability
- Total = Explained + Not Explained
- This is true for proportion of spread and amount of spread.
  - Proportion: $1 = r^2 + (1 - r^2)$
  - Amount: $s_Y^2 = r^2 s_Y^2 + (1 - r^2) s_Y^2$
- Formulas

              Total      Expl.          Not Expl.
  Proportion  1          $r^2$          $1 - r^2$
  Amount      $s_Y^2$    $r^2 s_Y^2$    $(1 - r^2) s_Y^2$

- Example: r = .7, $s_Y^2$ = 150

              Total      Expl.          Not Expl.
  Proportion  1          .49            .51
  Amount      150        73.5           76.5
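The partitioning example amounts to two multiplications, sketched below. The function name and the dictionary layout are hypothetical; the inputs are the example's r = .7 and variance of Y = 150.

```python
def partition_variability(r, var_y):
    """Split the variability of Y into explained and not-explained parts."""
    explained_prop = r ** 2
    unexplained_prop = 1 - explained_prop
    return {
        # (total, explained, not explained)
        "proportion": (1.0, explained_prop, unexplained_prop),
        "amount": (var_y, explained_prop * var_y, unexplained_prop * var_y),
    }

# Example values: r = .7 and a variance of Y equal to 150.
parts = partition_variability(0.7, 150)
total, explained, unexplained = parts["amount"]
print(f"Total = {total}, explained = {explained:.1f}, not explained = {unexplained:.1f}")
```

The explained and not-explained amounts always add back up to the total, which is the point of the partition.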
32.
Quiz 1    Quiz 2
9.8       10.4
8.6       9.9
10.0      10.4
7.0       7.8
9.1       8.5
8.1       9.4
7.8       9.4
6.1       5.4
4.8       5.0
7.6       8.0
7.3       9.5
3.8       1.5
7.4       3.5
5.8       4.2
8.8       7.4
8.8       5.5
6.0       7.9
6.1       6.4

Compute the correlation between Quiz 1 and Quiz 2 scores using SPSS. Run a regression model predicting Quiz 2 scores from Quiz 1 scores. What is the slope? What is the y-intercept? Based on the regression results, estimate the Quiz 2 score of someone who earned a 9.0 on Quiz 1.
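The exercise asks for SPSS, but the same quantities can be cross-checked by hand or in a short script. The Python sketch below is not SPSS output and is offered only as a way to verify your answers; the function name is hypothetical, and the slope and intercept formulas are the standard least squares solutions rather than anything stated on the slides.

```python
import math

# Quiz scores from the table (Quiz 1 is the predictor X, Quiz 2 is the criterion Y).
quiz1 = [9.8, 8.6, 10.0, 7.0, 9.1, 8.1, 7.8, 6.1, 4.8,
         7.6, 7.3, 3.8, 7.4, 5.8, 8.8, 8.8, 6.0, 6.1]
quiz2 = [10.4, 9.9, 10.4, 7.8, 8.5, 9.4, 9.4, 5.4, 5.0,
         8.0, 9.5, 1.5, 3.5, 4.2, 7.4, 5.5, 7.9, 6.4]

def correlation_and_regression(x, y):
    """Return (r, b, a) for predicting Y from X with a straight line bX + a."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    r = sp / math.sqrt(ss_x * ss_y)   # correlation coefficient
    b = sp / ss_x                     # slope
    a = mean_y - b * mean_x           # y-intercept
    return r, b, a

r, b, a = correlation_and_regression(quiz1, quiz2)
predicted = b * 9.0 + a  # estimated Quiz 2 score for a Quiz 1 score of 9.0

print(f"r = {r:.3f}")
print(f"slope b = {b:.3f}, y-intercept a = {a:.3f}")
print(f"predicted Quiz 2 score for Quiz 1 = 9.0: {predicted:.2f}")
```

Small differences from SPSS in the last decimal place would come from rounding in the display, not from a different method.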