Title: Chapter 7 Part 1
1Chapter 7 - Part 1
2Correlation Topics
- Correlational research: what is it, and how do you do it?
- The three questions:
- Is it a linear or curvilinear correlation?
- Is it a positive or negative relationship?
- How strong is the relationship?
- Answering these questions with t scores and r, the estimated correlation coefficient derived from the tX and tY scores of individuals in a random sample.
3Correlational research: how to start
- To begin a correlational study, we select a population or, far more frequently, a random sample from a population.
- We then obtain two scores from each individual, one score on each of two variables. (These are usually variables that we think might be related to each other for interesting reasons.) We call one variable X and the other Y.
4Rho and its estimate, r
- Since we use samples most of the time, we will mostly use the formulae and symbols for estimating the correlation in the population from a sample.
- The actual correlation in the population is called rho, or Pearson's rho.
- Your best estimate of rho, derived from a random sample, is called r, or Pearson's r.
- Pearson invented the technique.
5Comparing tX and tY scores to compute r
- We translate the raw scores on the X variable to t scores (called tX scores) and the raw scores on the Y variable to tY scores.
- So each individual has a pair of scores, a tX score and a tY score.
- You determine how similar or different the tX and tY scores in the pairs are, on the average, by subtracting tY from tX, then squaring, summing, and (kind of) averaging the squared differences.
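The translation from raw scores to t scores can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the worked example later in this chapter does) that a t score is the raw score's deviation from the sample mean divided by the sample standard deviation computed with n - 1. The function name `to_t_scores` is made up for this sketch.

```python
import math

def to_t_scores(scores):
    """Convert raw sample scores to t scores (estimated Z scores)."""
    n = len(scores)
    mean = sum(scores) / n
    # Sample standard deviation: sum of squared deviations divided by n - 1.
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return [(x - mean) / s for x in scores]

x_raw = [2, 4, 6, 8, 10]   # X scores from the chapter's worked example
t_x = to_t_scores(x_raw)
print([round(t, 2) for t in t_x])   # [-1.26, -0.63, 0.0, 0.63, 1.26]
```

Scores below the mean come out negative and scores above the mean come out positive, which is what makes the pair-by-pair comparison of tX and tY meaningful.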
6The estimated correlation coefficient, Pearson's r
- With a simple formula, you transform the average squared difference between the tX and tY scores into Pearson's correlation coefficient, r.
- Pearson's r indicates, with a single number, both the direction and the strength of the relationship between the two variables in your sample.
- r also estimates the correlation in the population from which the sample was drawn.
- In Ch. 8, you will learn when you can use r that way.
7Going from pairs of raw scores to r: Linearity, a preliminary question
- Once you have scores on two variables, you ask, "Is this a linear or curvilinear relationship?"
- If you mistake a curvilinear relationship for a linear one and then use the correlation to predict values of Y from values of X, you can wind up predicting that the average 70-year-old will be 13 feet tall!
- We don't like making that kind of mistake.
- So you have to watch out for curvilinearity.
8Linearity vs. Curvilinearity
- In a linear relationship, as scores on one variable go from low to high, scores on the other variable either generally increase or generally decrease.
- In a curvilinear relationship, as scores on one variable go from low to high, scores on the other variable change direction. They can go (1) down and then up, (2) up and then down, (3) up and down and then up again, (4) up or down and then flat, and so on.
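A short sketch of why the linearity check matters. It applies the formula for r given later in this chapter (slide 52) to made-up data in which Y goes down and then up in a perfect U shape. The data and the helper names `t_scores` and `pearson_r` are assumptions for illustration only.

```python
import math

def t_scores(scores):
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in scores) / (n - 1))
    return [(v - mean) / s for v in scores]

def pearson_r(x, y):
    # r = 1 - (1/2) * sum((tX - tY)^2) / (nP - 1), the chapter's formula
    tx, ty = t_scores(x), t_scores(y)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(tx, ty))
    return 1 - 0.5 * sum_sq_diff / (len(x) - 1)

x = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [v ** 2 for v in x]       # down and then up: a perfect U shape
y_linear = [2 * v + 1 for v in x]    # a perfect straight line

r_curved = pearson_r(x, y_curved)
r_linear = pearson_r(x, y_linear)
print(abs(round(r_curved, 3)), round(r_linear, 3))
```

Y is completely determined by X in both cases, yet r is essentially zero for the U-shaped data and 1 for the straight line. A correlation coefficient computed on a curvilinear relationship can badly understate how related the variables are.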
9Curvilinearity: An example
- New furniture can be fairly expensive. In contrast, it is hard to get very much for used furniture unless it is very old. At some point such furniture, if in reasonably good condition, becomes a set of antiques and can be worth a good deal of money.
- Thus the value of furniture goes from high to low and then, when enough time has passed, from low to very high.
- GRAPH that relationship. See how the line changes direction.
10Examples of linear relationships
- For example, think of the relationship between the size of a pleasure boat (X) and its cost (Y).
- As scores on one variable (boat size) increase, scores on the other variable (cost) also increase.
- Another example of a linear relationship: the relationship between the size of a car and the number of miles per gallon it gets.
- In general, as cars get gradually larger (X), they tend to get fewer miles per gallon (Y).
11A curvilinear relationship
- In a curvilinear relationship, as scores on the X variable go gradually from low to high, the Y variable changes direction.
- For example, think of the relationship between age (X) and height (Y).
- As age increases from 0 to 14 or so, height increases also.
- But then people stop growing. As age increases further, height stays the same.
- Thus the Y variable, height, changes direction. It goes from gradually rising to flat.
- If you graph age and height, the best fitting line is a curved line.
12Correlation characteristics: Which line best shows the relationship between age (X) and height (Y)?
[Figure: Linear vs. Curvilinear]
13Another non-linear relationship: shortstops and linemen. Great shortstops may be too small to be great football linemen.
Football potential: Terrible, Average, Average, Very Good, Excellent, Good, Poor
Is this a linear relationship?
14Plot the dots!
- To check whether a relationship is linear, make a graph and place the scores on it.
- That's what I mean by "Plot the dots."
- If you really want to know what is going on with data, Plot the dots!
- Here is a graph for the baseball skills and football potential data.
15When you plot the dots, is this linear?
[Scatterplot of Football Skill against Baseball Skill; points labeled Al, Ben, Chuck, David, Ed, Frank, and George.]
NO! It is best described by a curved line. It is a curvilinear relationship!
16After you know a correlation is linear, there are two other questions: the direction and the strength of the correlation. But first, a definition of high and low scores.
- Definition of high and low scores:
- High scores are scores above the mean of each variable. They are represented by positive t scores.
- Low scores are scores below the mean of each variable. They are represented by negative t scores.
17Positive relationships
- In a positive relationship, as X scores gradually increase, Y scores tend to increase as well. Example: The longer a sailboat is, the more it tends to cost. As length goes up, price tends to go up.
- In a positive correlation, X and Y scores tend to be on the same side of their respective means. Scores below the mean on X tend to be paired with scores below the mean on Y, and scores above the mean on X tend to be paired with scores above the mean on Y.
- As a result, the tX and tY scores tend to be similar, and the difference between them $(t_X - t_Y)$ tends to be small.
- Since $(t_X - t_Y)$ is small, the squared difference between them, $(t_X - t_Y)^2$, also tends to be small.
19In a positive correlation, the tX and tY scores
are relatively __________, so the difference and
the squared difference between the t scores in
each pair tends to be ________.
20In a positive correlation, the tX and tY scores
are relatively similar, so the difference and the
squared difference between the t scores in each
pair tends to be small (or, to put it another
way, close to zero).
21Graphing a positive relationship.
- In a positive correlation, high scores on X tend to go with high scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores also generally get higher. So, the line will tend to rise as it runs.
- Remember from math: slope equals how far a line rises on the Y axis for each unit it moves from left to right (runs) along the X axis.
- If a line rises from left to right, rise is positive. Run is always positive. So a positive rise divided by an (always) positive run results in a positive slope. (That's why we call it a positive correlation.)
22Positive vs Negative scatterplot
23Graphic display of a strong POSITIVE correlation.
24Negative relationships
- In a negative relationship, as X scores gradually increase, Y scores tend to decrease. Example: The larger a car is, the fewer miles it tends to get for each gallon of gas. As size goes up, miles per gallon tends to go down.
- In a negative correlation, X and Y scores tend to be on opposite sides of their respective means.
- As a result, the tX and tY scores tend to be dissimilar, and the difference between them $(t_X - t_Y)$ tends to be large.
- Since $(t_X - t_Y)$ is large, the squared difference between them, $(t_X - t_Y)^2$, also tends to be large.
25Graphing a negative relationship
- In a negative correlation, high scores on X tend to go with low scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores get lower. So, the line will tend to fall as it runs.
- Remember from math: slope equals how far a line rises on the Y axis for each unit it moves from left to right (runs) along the X axis.
- If a line falls from left to right, rise is negative. Run is always positive. So a negative rise divided by an (always) positive run results in a negative slope. (That's why we call it a negative correlation.)
26Positive vs Negative scatterplot
27Summary
- When t scores are consistently more similar than different, we have a positive correlation. On a graph the dots will rise from your left to your right. So, a best fitting line will have a positive slope.
- When t scores are consistently more different than similar, we have a negative correlation. On a graph the dots will fall from your left to your right. So, a best fitting line will have a negative slope.
28Positive vs Negative scatterplot
29How strong is the relationship between the tX and
tY scores?
- Here the question is about the consistency with
which tX and tY scores are either similar or
dissimilar.
30t scores: sign and size
- There are two aspects to the consistency of the relationship between tX and tY scores.
- First, are the t scores consistently of the same sign (positive correlation) or of opposite signs (negative correlation)?
- If they are almost always one way or the other, you have at least a moderately strong relationship.
- On the other hand, if you sometimes see t scores on the same side of the mean and sometimes on opposite sides, you have a relatively weak correlation.
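The sign check described above can be sketched as a simple count. The t scores are the rounded values from the chapter's worked example (slide 64). The function name `count_signs` and the choice to tally pairs that contain a score exactly at the mean separately are assumptions of this sketch.

```python
t_x = [-1.26, -0.63, 0.00, 0.63, 1.26]   # tX scores from the chapter's worked example
t_y = [-1.26, 0.00, -0.63, 0.63, 1.26]   # tY scores from the same example

def count_signs(tx, ty):
    """Count pairs with the same sign, opposite signs, or a score at the mean."""
    same = sum(1 for a, b in zip(tx, ty) if a * b > 0)
    opposite = sum(1 for a, b in zip(tx, ty) if a * b < 0)
    at_mean = len(tx) - same - opposite   # at least one t score is exactly 0
    return same, opposite, at_mean

print(count_signs(t_x, t_y))   # (3, 0, 2)
```

No pair has opposite signs, which is the pattern expected of a consistent positive relationship.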
31t scores: sign and size
- If there is a consistent pattern of same-signed t scores (positive correlation) or a consistent pattern of opposite-signed t scores (negative correlation), then whether the tX and tY scores are about the same distance from the mean comes into play.
- The large majority of t scores (close to 90%) usually range from -1.50 to 1.50.
- Given a consistent positive or negative correlation, the more similar in size the t scores, the stronger the correlation. This is especially true at the extremes (t < -1.5 or t > 1.5).
32Positive correlations
- Perfect: tX and tY scores are all the same sign and are identical in size.
- Strong: tX and tY scores are almost all the same sign and are fairly similar in size.
- Moderate: tX and tY scores are predominantly the same sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar.
- Weak: tX and tY scores are a little more often the same sign than opposite in sign. Nothing can be said about size.
33Negative correlations
- Perfect: tX and tY scores are all of opposite sign and are identical in size.
- Strong: tX and tY scores are almost all of opposite sign and are fairly similar in size.
- Moderate: tX and tY scores are predominantly opposite in sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar.
- Weak: tX and tY scores are a little more often of opposite signs than of the same sign. Nothing can be said about size.
34Unrelated (independent) variables
- When the size and sign of the tX scores bear no relationship to the size and sign of the tY scores, the variables are unrelated.
- We can also call the variables independent of, or orthogonal to, each other. The three terms, unrelated, independent, and orthogonal, are synonymous in this context.
35Graphing it on t axes: The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best fitting line.
- Perfect - scores fall exactly on a straight line whose slope will be 1.00 or -1.00.
- Strong - most scores fall near the line, whose slope will be close to .750 or -.750.
- Moderate - some are near the line, some not. The slope of the line will be close to .500 or -.500.
36Graphing it on t axes: The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best fitting line.
- Weak - some scores fall fairly close to the line, but others fall quite far from it. The slope of the line will be close to .250 or -.250.
- Independent - the scores are not close to the line and form a circular or square pattern. The best fitting line will be the X axis, a line with a slope of 0.000.
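The link between strength and slope on t axes can be checked numerically. The sketch below uses the X and Y data from the chapter's worked example. It takes "best fitting line" to mean the ordinary least-squares line predicting tY from tX, which is an assumption here, and compares its slope with r from the chapter's formula.

```python
import math

def t_scores(scores):
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in scores) / (n - 1))
    return [(v - mean) / s for v in scores]

x = [2, 4, 6, 8, 10]      # X data from the chapter's worked example
y = [9, 11, 10, 12, 13]   # Y data from the chapter's worked example
tx, ty = t_scores(x), t_scores(y)

# Least-squares slope of the line predicting tY from tX.
# Both sets of t scores have a mean of 0, so no extra centering is needed.
slope = sum(a * b for a, b in zip(tx, ty)) / sum(a * a for a in tx)

# The chapter's formula: r = 1 - (1/2) * sum((tX - tY)^2) / (nP - 1)
r = 1 - 0.5 * sum((a - b) ** 2 for a, b in zip(tx, ty)) / (len(x) - 1)

print(round(slope, 3), round(r, 3))   # 0.9 0.9
```

For these data the slope on t axes and r agree, which is why a strong relationship goes with a slope near .750 and a weak one with a slope near .250.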
37Strength of a relationship
38Strength of a relationship
39Strength of a relationship
Moderate
40Strength of a relationship
41Computing the correlation coefficient.
42Comparing apples to oranges? Use Z or t scores!
- You can use correlation to look for the relationship between ANY two values that you can measure on a single subject.
- However, there may not be any relationship (the variables may be independent).
- A correlation tells us whether scores are consistently similar on two measures, consistently different from each other, or have no real pattern.
43Comparing apples to oranges? Use t scores!
- To compare scores on two different variables, you transform them into ZX and ZY scores if you are studying a population, or into tX and tY scores if you have a sample.
- ZX and ZY scores (or tX and tY scores) can be directly compared to each other to see whether they are consistently similar, consistently quite different, or show no consistent pattern of similarity or difference.
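One reason t scores allow apples-to-oranges comparisons is that they do not depend on the unit of measurement. The sketch below uses made-up heights, expressed first in inches and then in centimeters, and checks that both versions yield the same t scores. The data and the function name `t_scores` are hypothetical.

```python
import math

def t_scores(scores):
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in scores) / (n - 1))
    return [(v - mean) / s for v in scores]

inches = [60, 64, 68, 72, 76]              # hypothetical heights in inches
centimeters = [v * 2.54 for v in inches]   # the same heights in centimeters

t_in = t_scores(inches)
t_cm = t_scores(centimeters)

# Compare with a small tolerance to allow for floating-point rounding.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(t_in, t_cm))
print(same)   # True
```

Because the units drop out, a t score on hat size and a t score on drawing ability are on the same footing: each says how far a person is from the mean in standard deviation units.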
44Comparing variables
- Anxiety symptoms (e.g., heartbeat) with number of hours driving to class.
- Hat size with drawing ability.
- Math ability with verbal ability.
- Number of children with IQ.
- Turn them all into Z or t scores.
45Pearson's Correlation Coefficient
- coefficient - noun, a number that serves as a measure of some property.
- The correlation coefficient indexes BOTH the consistency and the direction of a correlation with a single number.
46rho: the population parameter
- Pearson's rho ($\rho$) is the parameter that characterizes the strength and direction of a linear relationship (and only a linear relationship) between two variables. To compute rho, you must have the entire population. Then you can compute sigma, mu, Z scores, and rho.
- The formula: $\rho = 1 - \frac{1}{2} \cdot \frac{\sum (Z_X - Z_Y)^2}{N_P}$, where $N_P$ is the number of pairs of Z scores in the population.
- In English: The correlation coefficient equals 1 minus half the average squared distance between the pairs of Z scores.
47Pearson's rho
- When you have a perfect positive correlation, the Z scores will be identical in size and sign. So the average squared distance will be zero and
- $\rho = 1.000 - \frac{1}{2}(0.000) = 1.000$
48Pearson's rho
- When you have a perfect negative correlation, the Z scores will be identical in size and opposite in sign. It can be proven algebraically that the average squared distance in that case will be 4.000.
- $\rho = 1.000 - \frac{1}{2}(4.000) = -1.000$
49Pearson's rho
- When you have two totally independent variables, the average squared distance will be 2.000 (halfway between 0.000 and 4.000).
- $\rho = 1.000 - \frac{1}{2}(2.000) = 0.000$
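The algebra behind these three cases can be sketched as follows. It assumes the conventional definition of rho as the average cross-product of Z scores, $\rho = \sum Z_X Z_Y / N_P$, which these slides do not state, together with the fact that squared Z scores average 1.000 in a population.

```latex
\begin{align*}
\frac{\sum (Z_X - Z_Y)^2}{N_P}
  &= \frac{\sum Z_X^2}{N_P} - 2\,\frac{\sum Z_X Z_Y}{N_P} + \frac{\sum Z_Y^2}{N_P} \\
  &= 1 - 2\rho + 1 \\
  &= 2 - 2\rho \\
\Rightarrow\quad \rho &= 1 - \frac{1}{2} \cdot \frac{\sum (Z_X - Z_Y)^2}{N_P}
\end{align*}
```

Under that assumption, setting rho to 1.000, 0.000, and -1.000 gives average squared distances of 0.000, 2.000, and 4.000. For the perfect negative case the result can also be seen directly: if $Z_Y = -Z_X$ for every pair, then $(Z_X - Z_Y)^2 = 4 Z_X^2$, which averages 4.000.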
50Pearson's Correlation Coefficient
- Thus, rho varies from -1.000 (perfect negative correlation) to 0.000 (independent variables) to 1.000 (perfect positive correlation).
- A negative value indicates a negative relationship; a positive value indicates a positive relationship.
- Values of rho close to 1.000 or -1.000 indicate a strong (consistent) relationship; values close to 0.000 indicate a weak (inconsistent) or independent relationship.
51Estimating rho with r
- Computing rho involves finding the actual average squared distance between the ZX and ZY scores in the whole population.
- In computing r, we are estimating rho.
52The formula for r
- Pearson's r is a least squares, unbiased estimate of rho, based on the relationships found between tX and tY scores in a random sample.
- $r = 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$, where $n_P - 1$ equals one less than the number of pairs of t scores in the sample.
- In English: Pearson's r equals 1.000 minus half the estimated average squared difference between the Z scores in the population, based on the squared differences between the t scores in the sample.
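Readers who have met Pearson's r in its more conventional cross-product form may want to see that the two versions agree. The sketch below implements both and compares them on made-up scores. The function names and the data are hypothetical, and the cross-product form is not taken from these slides.

```python
import math

def t_scores(scores):
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in scores) / (n - 1))
    return [(v - mean) / s for v in scores]

def r_from_t_differences(x, y):
    """The slides' formula: r = 1 - (1/2) * sum((tX - tY)^2) / (nP - 1)."""
    tx, ty = t_scores(x), t_scores(y)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(tx, ty))
    return 1 - 0.5 * sum_sq_diff / (len(x) - 1)

def r_cross_product(x, y):
    """Conventional form: sum of cross-products over the root of the sums of squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 3, 4, 6, 9, 10]   # hypothetical scores, for illustration only
y = [2, 1, 5, 4, 8, 7]

r1 = r_from_t_differences(x, y)
r2 = r_cross_product(x, y)
print(math.isclose(r1, r2))   # True
```

The two functions are algebraically the same quantity, so any difference between them is floating-point rounding.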
53Look at those formulae again.
- $\sum (Z_X - Z_Y)^2 / N_P$ is the average squared distance between the Z scores.
- The rest of the formula simply transforms the average squared distance between the Z scores into a variable that goes from -1.000 to 1.000.
- $\rho = 1 - \frac{1}{2} \cdot \frac{\sum (Z_X - Z_Y)^2}{N_P}$, where $N_P$ is the number of pairs of Z scores in the population.
54Look at those formulae again.
- $\sum (t_X - t_Y)^2 / (n_P - 1)$ is a least squares, unbiased estimate of the average squared difference between the Z scores in the population, based on the differences between the tX and tY scores in a random sample.
- The rest of the formula simply transforms the estimated average squared distance between the Z scores into a variable that goes from -1.000 to 1.000.
- $r = 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$, where $n_P - 1$ equals one less than the number of pairs of t scores in the sample.
- REMEMBER, t scores are estimated Z scores.
55Thus, r, the least squares, unbiased estimate of rho, is basically an estimate of the average squared difference between the ZX and ZY scores in the population, transformed into a variable that goes from -1.00 to 1.00.
56Similarities of r and rho
- r and rho vary from -1.000 to 1.000.
- For both r and rho, a negative value indicates a negative relationship; a positive value indicates a positive relationship.
- Values of r or rho close to 1.000 or -1.000 indicate a strong (consistent) relationship; values close to 0.000 indicate a weak (inconsistent) or independent relationship.
57Since we are almost always studying random samples, not populations, we almost always compute Pearson's r, not Pearson's rho.
58r: strength and direction
Perfect, positive      1.00
Strong, positive        .75
Moderate, positive      .50
Weak, positive          .25
Independent             .00
Weak, negative         -.25
Moderate, negative     -.50
Strong, negative       -.75
Perfect, negative     -1.00
59Calculating Pearson's r
- Select a random sample from a population; obtain scores on two variables, which we will call X and Y.
- Convert all the scores into t scores.
60Calculating Pearson's r
- First, subtract the tY score from the tX score in each pair.
- Then square all of the differences and add them up, that is, $\sum (t_X - t_Y)^2$.
61Calculating Pearson's r
- Estimate the average squared distance between ZX and ZY by dividing the sum of squared differences between the t scores by $(n_P - 1)$: $\sum (t_X - t_Y)^2 / (n_P - 1)$.
- To turn this estimate into Pearson's r, use the formula $r = 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$.
62Example: Calculate t scores for X
DATA: 2, 4, 6, 8, 10
MSW = 40.00/(5-1) = 10.00
sX = 3.16
63Calculate t scores for Y
DATA: 9, 11, 10, 12, 13
MSW = 10.00/(5-1) = 2.50
sY = 1.58
64Calculate r
tX             -1.26  -0.63   0.00   0.63   1.26
tY             -1.26   0.00  -0.63   0.63   1.26
tX - tY         0.00  -0.63   0.63   0.00   0.00
(tX - tY)^2     0.00   0.40   0.40   0.00   0.00
$\sum (t_X - t_Y)^2 / (n_P - 1) = 0.80 / 4 = 0.200$
$r = 1.000 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$
$r = 1.000 - \frac{1}{2}(0.200) = 1.000 - 0.100 = 0.900$
This is a very strong, positive relationship.
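The worked example can be re-run step by step in Python as a check on the arithmetic. This is a sketch that follows the slides' steps with the slides' data. The helper names `mean` and `sample_sd` are made up for the illustration.

```python
import math

x = [2, 4, 6, 8, 10]      # X data from slide 62
y = [9, 11, 10, 12, 13]   # Y data from slide 63
n_pairs = len(x)

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    # MSW: sum of squared deviations around the mean, divided by n - 1.
    msw = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(msw)

s_x, s_y = sample_sd(x), sample_sd(y)
t_x = [(v - mean(x)) / s_x for v in x]
t_y = [(v - mean(y)) / s_y for v in y]

sq_diffs = [(a - b) ** 2 for a, b in zip(t_x, t_y)]
avg_sq_diff = sum(sq_diffs) / (n_pairs - 1)   # estimated average squared distance
r = 1 - 0.5 * avg_sq_diff

print(round(s_x, 2), round(s_y, 2))         # 3.16 1.58
print([round(d, 2) for d in sq_diffs])      # [0.0, 0.4, 0.4, 0.0, 0.0]
print(round(avg_sq_diff, 3), round(r, 3))   # 0.2 0.9
```

Each printed value matches the corresponding figure on slides 62 through 64.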
65By the way - True graphs.
- Ch. 7 has true graphs, displays in which each dot stands for a score on two (in this case) or more (in more advanced cases) variables.
- In Ch. 1 through Ch. 6, most of the figures represented the frequency of scores on a single variable.
- Formally, displays of frequencies are figures, but they are not graphs.
66Note: a seeming exception
- Usually we divide a sum of squared deviations around a mean by df to estimate the variance.
- Here the sum of squares is not around a mean, and we are not estimating a variance.
- So you divide $\sum (t_X - t_Y)^2$ by $(n_P - 1)$.
- $n_P - 1$ is not the df for correlation and regression ($df_{REG} = n_P - 2$).
67End ch. 7, part 1 slides here.