Title: Pearson's correlation
1. Pearson's correlation
Diane S. Mendoza
2.
- It is named after Karl Pearson, who developed the correlational method to do agricultural research.
- It is designated by the Greek letter rho (ρ) for a population; the sample coefficient is written r.
- The "product moment" part of the name comes from the way in which it is calculated: by summing up the products of the deviations of the scores from the mean.
- A correlation is a number between -1 and 1 that measures the degree of association between two variables (call them X and Y).
- A positive value for the correlation implies a positive association.
- A negative value for the correlation implies a negative or inverse association.
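3. The formula for the Pearson correlation
Suppose we have two variables X and Y, with means $\bar{X}$ and $\bar{Y}$ respectively and standard deviations $S_X$ and $S_Y$ respectively (each standard deviation computed with N in the denominator). The correlation is computed as

$$ r = \frac{\sum z_X z_Y}{N}, \qquad z_X = \frac{X - \bar{X}}{S_X}, \quad z_Y = \frac{Y - \bar{Y}}{S_Y} $$

that is, as the sum of the products of the Z-scores for the two variables divided by the number of scores.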
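The Z-score description above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `pearson_from_z_scores` and the data are hypothetical, and the standard deviations are assumed to be population values (dividing by N), which is what makes the mean product of Z-scores fall between -1 and 1.

```python
import math


def pearson_from_z_scores(xs, ys):
    """Sum of the products of paired Z-scores, divided by the number of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumption: population standard deviations (divide by N, not N - 1).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / n


# Hypothetical scores, not the reading/spelling data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_from_z_scores(xs, ys)
print(f"r = {r:.4f}")
```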
4. If we substitute the formulas for the Z-scores into this formula, we get the following formula for the Pearson Product Moment Correlation Coefficient, which we will use as a definitional formula:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \, S_X S_Y} $$

The numerator of this formula says that we sum up the products of the deviation of each subject's X score from the mean of the Xs and the deviation of that subject's Y score from the mean of the Ys. This summation of the products of the deviation scores is divided by the number of subjects times the standard deviation of the X variable times the standard deviation of the Y variable.
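To make the numerator and denominator concrete, here is a small Python sketch that lists each subject's deviations and their product before dividing. The function name `definitional_r` and the four data pairs are hypothetical, chosen only for illustration, and the standard deviations are assumed to divide by N.

```python
import math


def definitional_r(xs, ys):
    """Return (r, rows); each row is (X deviation, Y deviation, their product)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    rows = [(x - mean_x, y - mean_y, (x - mean_x) * (y - mean_y))
            for x, y in zip(xs, ys)]
    sum_of_products = sum(product for _, _, product in rows)
    # Assumption: population standard deviations (divide by N).
    sd_x = math.sqrt(sum(dx ** 2 for dx, _, _ in rows) / n)
    sd_y = math.sqrt(sum(dy ** 2 for _, dy, _ in rows) / n)
    return sum_of_products / (n * sd_x * sd_y), rows


# Hypothetical scores, not the data from the slides.
xs = [2, 4, 6, 8]
ys = [9, 7, 4, 4]
r, rows = definitional_r(xs, ys)
for dx, dy, product in rows:
    print(f"dev X = {dx:+.1f}   dev Y = {dy:+.1f}   product = {product:+.1f}")
print(f"r = {r:.4f}")
```

Every product in this example is negative, so the sum in the numerator, and therefore r, is negative.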
5. When will a correlation be positive?
- Suppose that an X value was above average, and that the associated Y value was also above average. Then the product of their deviations from the means would be the product of two positive numbers, which would be positive.
- If the X value and the Y value were both below average, then the product would be of two negative numbers, which would also be positive.
- Therefore, a positive correlation is evidence of a general tendency that large values of X are associated with large values of Y and small values of X are associated with small values of Y.
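The sign argument above can be checked with a short Python sketch. The helper `deviation_products` and the data are hypothetical; the data were picked so that Y rises whenever X rises.

```python
def deviation_products(xs, ys):
    """Product of each pair's deviations from the X mean and the Y mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


# Hypothetical data in which large X goes with large Y.
xs = [1, 2, 3, 4, 5]
ys = [10, 20, 30, 40, 50]
products = deviation_products(xs, ys)
print(products)       # no product is negative
print(sum(products))  # so the sum in the numerator of r is positive
```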
6. When will a correlation be negative?
- Suppose that an X value was above average, and that the associated Y value was instead below average. Then the product of their deviations from the means would be the product of a positive and a negative number, which would make the product negative.
- If the X value was below average and the Y value was above average, then the product would also be negative.
- Therefore, a negative correlation is evidence of a general tendency that large values of X are associated with small values of Y and small values of X are associated with large values of Y.
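Real data rarely line up perfectly, so some products are usually positive and some negative; the sign of the correlation depends on which kind dominates the sum. The sketch below tallies the signs for hypothetical data with a mostly inverse trend (the function name `tally_product_signs` is an assumption made for this illustration).

```python
def tally_product_signs(xs, ys):
    """Count negative, positive and zero deviation products; also return their sum."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    tally = {"negative": 0, "positive": 0, "zero": 0}
    for product in products:
        if product < 0:
            tally["negative"] += 1
        elif product > 0:
            tally["positive"] += 1
        else:
            tally["zero"] += 1
    return tally, sum(products)


# Hypothetical data: Y mostly falls as X rises, with one pair going the other way.
xs = [1, 2, 3, 4, 5]
ys = [9, 8, 6, 7, 2]
tally, total = tally_product_signs(xs, ys)
print(tally)
print(total < 0)  # the negative products outweigh the positive one
```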
7. Interpretation of the correlation coefficient
The correlation coefficient measures the strength of a linear relationship between two variables. The correlation coefficient is always between -1 and 1. The closer the correlation is to ±1, the closer the relationship is to a perfect linear one. Here is how to interpret correlations.
-1.0 to -0.7   strong negative association
-0.7 to -0.3   weak negative association
-0.3 to  0.3   little or no association
 0.3 to  0.7   weak positive association
 0.7 to  1.0   strong positive association
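The interpretation ranges above can be expressed as a small lookup in Python. This is only a sketch: the function name `describe_correlation` is hypothetical, and because the ranges share their endpoints, assigning exactly ±0.3 to "weak" and exactly ±0.7 to "strong" is an assumption, not something the slides specify.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the verbal labels used in the slides."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    magnitude = abs(r)
    # Assumption: boundary values go to the stronger category.
    if magnitude < 0.3:
        return "little or no association"
    strength = "strong" if magnitude >= 0.7 else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} association"


for value in (-0.9, -0.36, 0.1, 0.5, 0.8):
    print(value, "->", describe_correlation(value))
```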
8.
- Let's calculate the correlation between Reading
(X) and Spelling (Y) for the 10 students. There
is a fair amount of calculation required as you
can see from the table below. First we have to
sum up the X values (55) and then divide this
number by the number of subjects (10) to find the
mean for the X values (5.5). Then we have to do
the same thing with the Y values to find their
mean (10.3).
We then calculate the correlation with the definitional formula.
The correlation we obtained was -.36, showing us
that there is a small negative correlation
between reading and spelling. The correlation
coefficient is a number that can range from -1
(perfect negative correlation) through 0 (no
correlation) to 1 (perfect positive correlation).
10. The computational (raw score) formula for the Pearsonian r is

$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}} $$

By looking at the formula we can see that we need the following items to calculate r using the raw score formula:
- The number of subjects, N
- The sum of each subject's X score times the Y score, summation XY
- The sum of the X scores, summation X
- The sum of the Y scores, summation Y
- The sum of the squared X scores, summation X squared
- The sum of the squared Y scores, summation Y squared
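The six quantities listed above are all that a raw score calculation needs. A minimal Python sketch, assuming the standard raw score form of Pearson's r and using hypothetical data (the function name `pearson_raw_score` is also an assumption):

```python
import math


def pearson_raw_score(xs, ys):
    """Pearson's r from N and the five raw-score sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical scores, not the reading/spelling data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [5, 3, 4, 1, 2]
r = pearson_raw_score(xs, ys)
print(r)
```

For these numbers N = 5, summation XY = 37, summation X = summation Y = 15, and both sums of squares are 55, so r = (185 - 225) / 50 = -0.8.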
12. If we plug each of these sums into the raw score formula, we can calculate the correlation coefficient.
We can see that we got the same answer for the correlation coefficient (-.36) with the raw score formula as we did with the definitional formula.
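The agreement between the two formulas is not specific to this data set; algebraically they are the same quantity. A quick Python check on hypothetical data (the function names and scores below are assumptions for illustration, and the definitional version uses standard deviations that divide by N):

```python
import math


def r_definitional(xs, ys):
    """Sum of deviation products divided by N * SX * SY."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return products / (n * sd_x * sd_y)


def r_raw_score(xs, ys):
    """The same coefficient computed from raw-score sums only."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


# Hypothetical scores, not the data from the slides.
xs = [3, 1, 4, 1, 5, 9, 2, 6]
ys = [2, 7, 1, 8, 2, 8, 1, 8]
same = math.isclose(r_definitional(xs, ys), r_raw_score(xs, ys), rel_tol=1e-9)
print(same)
```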