Chapter 7 Part 1 - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter 7 Part 1

Description:

How strong is the relationship? Solving these questions with t scores and ... good condition, becomes a set of antiques and can be worth a good deal of money. ... – PowerPoint PPT presentation

Slides: 68
Provided by: john77
Transcript and Presenter's Notes

Title: Chapter 7 Part 1


1
Chapter 7 -Part 1
  • Correlation

2
Correlation Topics
  • Correlational research: what is it, and how do
    you do correlational research?
  • The three questions:
  • Is it a linear or curvilinear correlation?
  • Is it a positive or negative relationship?
  • How strong is the relationship?
  • Solving these questions with t scores and r, the
    estimated correlation coefficient derived from
    the tX and tY scores of individuals in a random
    sample.

3
Correlational research: how to start.
  • To begin a correlational study, we select a
    population or, far more frequently, select a
    random sample from a population.
  • We then obtain two scores from each individual,
    one score on each of two variables. (These are
    usually variables that we think might be related
    to each other for interesting reasons.) We call
    one variable X and the other Y.

4
Rho and its estimate, r.
  • (Since we use samples most of the time, we will
    mostly use the formulae and symbols for
    estimating the correlation in the population
    from a sample.)
  • The actual correlation in the population is
    called rho or Pearson's rho.
  • Your best estimate of rho, derived from a random
    sample, is called r or Pearson's r.
  • Pearson invented the technique.

5
Comparing tX and tY scores to compute r
  • We translate the raw scores on the X variable to
    t scores (called tX scores) and the raw scores on
    the Y variable to tY scores.
  • So each individual has a pair of scores, a tX
    score and a tY score.
  • You determine how similar or different the tX and
    tY scores in the pairs are, on the average, by
    subtracting tY from tX, then squaring, summing,
    and (kind of) averaging the differences.
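
As a rough illustration of the steps above, the sketch below converts two small sets of raw scores to t scores and then squares, sums, and (kind of) averages the differences. It assumes a t score is (raw score - sample mean) / s, with s computed using n - 1. The data and the names `t_scores`, `x`, and `y` are made up for illustration.

```python
from statistics import mean, stdev


def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / s."""
    m = mean(raw)
    s = stdev(raw)  # sample standard deviation (divides by n - 1)
    return [(score - m) / s for score in raw]


# Hypothetical raw scores for five individuals (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

t_x = t_scores(x)
t_y = t_scores(y)

# Subtract tY from tX in each pair, square, sum, then divide by
# (number of pairs - 1): the "kind of" average.
squared_diffs = [(a - b) ** 2 for a, b in zip(t_x, t_y)]
kind_of_average = sum(squared_diffs) / (len(squared_diffs) - 1)

print([round(t, 2) for t in t_x])
print([round(t, 2) for t in t_y])
print(round(kind_of_average, 3))
```

Dividing by the number of pairs minus one, instead of the number of pairs, is what makes this a "kind of" average.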

6
The estimated correlation coefficient, Pearson's r
  • With a simple formula, you transform the average
    squared difference between the tX and tY scores
    into Pearson's correlation coefficient, r.
  • Pearson's r indicates, with a single number,
    both the direction and strength of the
    relationship between the two variables in your
    sample.
  • r also estimates the correlation in the
    population from which the sample was drawn.
  • In Ch. 8, you will learn when you can use r that
    way.

7
Going from pairs of raw scores to r: Linearity,
a preliminary question.
  • Once you have scores on two variables, you
    ask, "Is this a linear or curvilinear
    relationship?"
  • If you mistake a curvilinear relationship for a
    linear one and then use the correlation to
    predict values of Y from values of X, you can
    wind up predicting that the average 70-year-old
    will be 13 feet tall!
  • We don't like making that kind of mistake.
  • So you have to watch out for curvilinearity.

8
Linearity vs. Curvilinearity
  • In a linear relationship, as scores on one
    variable go from low to high, scores on the
    other variable either generally increase or
    generally decrease.
  • In a curvilinear relationship, as scores on one
    variable go from low to high, scores on the
    other variable change direction. They can go
    1) down and then up, 2) up and then down, 3)
    up and down and then up again, 4) up or down
    then flat, and so on.

9
Curvilinearity: An example
  • New furniture can be fairly expensive.
    However, it is hard to get very much for
    used furniture, unless it is very old. At some
    point such furniture, if in reasonably good
    condition, becomes a set of antiques and can be
    worth a good deal of money.
  • Thus the value of furniture goes from high to
    low, then, when enough time has passed, from low
    to very high.
  • GRAPH that relationship. See how the line changes
    direction.

10
Examples of linear relationships.
  • For example, think of the relationship between
    the size of a pleasure boat (X) and its cost (Y).
  • As one variable (boat size) increases, scores
    on the other variable (cost) also increase.
  • Another example of a linear relationship is the
    relationship between the size of a car and the
    number of miles per gallon it gets.
  • In general, as cars get gradually larger (X),
    they tend to get fewer miles per gallon (Y).

11
A curvilinear relationship
  • In a curvilinear relationship, as scores on the X
    variable go gradually from low to high, the Y
    variable changes direction.
  • For example, think of the relationship between
    age (X) and height (Y).
  • As age increases from 0-14 or so, height
    increases also.
  • But then people stop growing. As age increases,
    height stays the same.
  • Thus the Y variable, height, changes direction.
    It goes from gradually rising to flat.
  • If you graph age and height, the best fitting
    line is a curved line.

12
Correlation Characteristics: Which line best
shows the relationship between age (X) and height
(Y)?
[Figure: linear vs. curvilinear best fitting lines]
13
Another non-linear relationship: shortstops and
linemen. Great shortstops may be too small to be
great football linemen.
Football potential: Terrible, Average, Average, Very
Good, Excellent, Good, Poor
Is this a linear relationship?
14
Plot the dots!
  • To check whether a relationship is linear, make a
    graph and place the scores on it.
  • That's what I mean by "Plot the dots."
  • If you really want to know what is going on with
    data, Plot the dots!
  • Here is a graph for the baseball skills and
    football potential data.

15
When you plot the dots, is this linear?
[Scatterplot of Football Skill against Baseball Skill;
points labeled Al, Ben, Chuck, David, Ed, Frank, George]
NO! It is best described by a curved line. It is
a curvilinear relationship!
16
After you know a correlation is linear, there are
two other questions: the direction and the strength
of the correlation. But first, a definition of high
and low scores.
  • Definition of high and low scores
  • High scores are scores above the mean of each
    variable. They are represented by positive t
    scores.
  • Low scores are scores below the mean of each
    variable. They are represented by negative t
    scores.

17
Positive relationships
  • In a positive relationship, as X scores gradually
    increase, Y scores tend to increase as well.
    Example: The longer a sailboat is, the more it
    tends to cost. As length goes up, price tends to
    go up.
  • In a positive correlation, X and Y scores tend to
    be on the same side of their respective means.
    Scores below the mean on X tend to be paired with
    scores below the mean on Y, and scores above the
    mean on X tend to be paired with scores above the
    mean on Y.
  • As a result, the tX and tY scores tend to be
    similar and the difference between them, $t_X - t_Y$,
    tends to be small.
  • Since $t_X - t_Y$ is small, the squared difference
    between them, $(t_X - t_Y)^2$, also tends to be small.

18
(No Transcript)
19
In a positive correlation, the tX and tY scores
are relatively __________, so the difference and
the squared difference between the t scores in
each pair tends to be ________.
20
In a positive correlation, the tX and tY scores
are relatively similar, so the difference and the
squared difference between the t scores in each
pair tends to be small (or, to put it another
way, close to zero).
21
Graphing a positive relationship.
  • In a positive correlation, high scores on X tend
    to go with high scores on Y. On a graph, as the
    line runs from left to right, scores increase on
    the X axis. At the same time, Y scores also
    generally get higher. So, the line will tend to
    rise as it runs.
  • Remember from math, slope equals how far a line
    rises on the Y axis for each unit it moves from
    left to right or runs along the X axis.
  • If a line rises from left to right, rise is
    positive. Run is always positive. So a positive
    rise divided by an (always) positive run results
    in a positive slope. (That's why we call it a
    positive correlation.)

22
Positive vs Negative scatterplot
23
Graphic display of a strong POSITIVE correlation.
24
Negative relationships
  • In a negative relationship, as X scores gradually
    increase, Y scores tend to decrease. Example:
    The larger a car is, the fewer miles it tends to
    get for each gallon of gas. As size goes up,
    miles per gallon tends to go down.
  • In a negative correlation, X and Y scores tend to
    be on opposite sides of their respective means.
  • As a result, the tX and tY scores tend to be
    dissimilar and the difference between them,
    $t_X - t_Y$, tends to be large.
  • Since $t_X - t_Y$ is large, the squared difference
    between them, $(t_X - t_Y)^2$, also tends to be large.

25
Graphing a negative relationship
  • In a negative correlation, high scores on X tend
    to go with low scores on Y. On a graph, as the
    line runs from left to right, scores increase on
    the X axis. At the same time, Y scores get lower.
    So, the line will tend to fall as it runs.
  • Remember from math, slope equals how far a line
    rises on the Y axis for each unit it moves from
    left to right or runs along the X axis.
  • If a line falls from left to right, rise is
    negative. Run is always positive. So a negative
    rise divided by an (always) positive run results
    in a negative slope. (That's why we call it a
    negative correlation.)

26
Positive vs Negative scatterplot
27
Summary
  • When t scores are consistently more similar than
    different, we have a positive correlation. On a
    graph the dots will rise from your left to your
    right. So, a best fitting line will have a
    positive slope.
  • When t scores are consistently more different
    than similar, we have a negative correlation. On
    a graph the dots will fall from your left to your
    right. So, a best fitting line will have a
    negative slope.
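
A small sketch of this summary, using made-up data: the same X scores are paired once with a Y that tends to rise with X and once with a Y that tends to fall. It assumes t scores are computed as (score - mean) / s with s based on n - 1; the function names are hypothetical.

```python
from statistics import mean, stdev


def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / s."""
    m, s = mean(raw), stdev(raw)
    return [(v - m) / s for v in raw]


def avg_squared_diff(x, y):
    """Sum of squared (tX - tY) differences divided by (pairs - 1)."""
    pairs = list(zip(t_scores(x), t_scores(y)))
    return sum((a - b) ** 2 for a, b in pairs) / (len(pairs) - 1)


def same_sign_pairs(x, y):
    """Count pairs whose tX and tY are on the same side of their means."""
    return sum(1 for a, b in zip(t_scores(x), t_scores(y)) if a * b > 0)


# Hypothetical scores (not from the slides).
x = [1, 2, 3, 4, 5]
y_rising = [2, 1, 4, 3, 5]   # tends to go up as X goes up
y_falling = [5, 3, 4, 1, 2]  # tends to go down as X goes up

print(same_sign_pairs(x, y_rising), round(avg_squared_diff(x, y_rising), 2))
print(same_sign_pairs(x, y_falling), round(avg_squared_diff(x, y_falling), 2))
```

In the rising case the t scores mostly share a sign and the squared differences stay small; in the falling case no pair shares a sign and the squared differences are much larger.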

28
Positive vs Negative scatterplot
29
How strong is the relationship between the tX and
tY scores?
  • Here the question is about the consistency with
    which tX and tY scores are either similar or
    dissimilar.

30
t scores: sign and size
  • There are two aspects to the consistency of the
    relationship between tX and tY scores.
  • First, are the t scores consistently of the same
    sign (positive correlation) or opposite signs
    (negative correlation)?
  • If they are almost always one way or the other,
    you have at least a moderately strong
    relationship.
  • On the other hand, if you sometimes see t scores
    on the same side of the mean and sometimes on
    opposite sides, you have a relatively weak
    correlation.

31
t scores: sign and size
  • If there is a consistent pattern of same signed t
    scores (positive correlation) or a consistent
    pattern of opposite signed t scores (negative
    correlation), then whether the tX and tY scores
    are about the same distance from the mean comes
    into play.
  • The large majority of t scores (close to 90%)
    fall between -1.50 and +1.50.
  • Given a consistent positive or negative
    correlation, the more similar in size the t
    scores, the stronger the correlation. This is
    especially true at the extremes (t < -1.5 or
    t > 1.5).

32
Positive correlations
  • Perfect: tX and tY scores are all the same sign
    and are identical in size.
  • Strong: tX and tY scores are almost all the same
    sign and are fairly similar in size.
  • Moderate: tX and tY scores are predominantly the
    same sign. This is especially true for pairs in
    which one of the values is one or more standard
    deviations from the mean. Size may be fairly
    dissimilar.
  • Weak: tX and tY scores are a little more often
    the same sign than opposite in sign. Nothing can
    be said about size.

33
Negative correlations
  • Perfect: tX and tY scores are all of the opposite
    sign and are identical in size.
  • Strong: tX and tY scores are almost all of
    opposite sign and are fairly similar in size.
  • Moderate: tX and tY scores are predominantly
    opposite in sign. This is especially true for
    pairs in which one of the values is one or more
    standard deviations from the mean. Size may be
    fairly dissimilar.
  • Weak: tX and tY scores are a little more often of
    opposite signs than the same in sign. Nothing can
    be said about size.

34
Unrelated (independent) variables
  • When the size and sign of the tX scores bear no
    relationship to the size and sign of the tY
    scores, the variables are unrelated.
  • We also can call the variables "independent of"
    or "orthogonal to" each other. The three terms,
    unrelated, independent, and orthogonal, are
    synonymous in this context.

35
Graphing it on t axes: The strength of a
relationship tells us approximately how the dots
representing pairs of t scores will fall around a
best fitting line.
  • Perfect - scores fall exactly on a straight line
    whose slope will be 1.00 or -1.00.
  • Strong - most scores fall near the line whose
    slope will be close to .750 or -.750.
  • Moderate - some are near the line, some not. The
    slope of the line will be close to .500 or -.500.

36
Graphing it on t axes: The strength of a
relationship tells us approximately how the dots
representing pairs of t scores will fall around a
best fitting line.
  • Weak - some scores fall fairly close to the line,
    but others fall quite far from it. The slope of
    the line will be close to .250 or -.250.
  • Independent - the scores are not close to the
    line and form a circular or square pattern. The
    best fitting line will be the X axis, a line with
    a slope of 0.000.
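
The link between slope and strength can be checked numerically. The sketch below assumes that "best fitting line" means the ordinary least squares line for predicting tY from tX, and it compares that slope with r computed from the squared t score differences (the formula given later in these slides). The data and function names are made up.

```python
from statistics import mean, stdev


def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / s."""
    m, s = mean(raw), stdev(raw)
    return [(v - m) / s for v in raw]


def slope_on_t_axes(x, y):
    """Least squares slope for predicting tY from tX."""
    t_x, t_y = t_scores(x), t_scores(y)
    # Both sets of t scores have a mean of 0, so the slope reduces to
    # sum(tX * tY) / sum(tX ** 2).
    return sum(a * b for a, b in zip(t_x, t_y)) / sum(a * a for a in t_x)


def r_from_squared_diffs(x, y):
    """r = 1 - 1/2 * sum((tX - tY)^2) / (number of pairs - 1)."""
    t_x, t_y = t_scores(x), t_scores(y)
    total = sum((a - b) ** 2 for a, b in zip(t_x, t_y))
    return 1 - 0.5 * total / (len(t_x) - 1)


# Hypothetical scores (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print(round(slope_on_t_axes(x, y), 3), round(r_from_squared_diffs(x, y), 3))
```

For these data the two numbers agree, which is why the slopes listed above line up with the strength labels.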

37
Strength of a relationship
38
Strength of a relationship
39
Strength of a relationship
Moderate
40
Strength of a relationship
41
Computing the correlation coefficient.
42
Comparing apples to oranges? Use Z or t scores!
  • You can use correlation to look for the
    relationship between ANY two values that you can
    measure on a single subject.
  • However, there may not be any relationship (the
    variables may be independent).
  • A correlation tells us whether scores are
    consistently similar on two measures, consistently
    different from each other, or have no real pattern.

43
Comparing apples to oranges? Use t scores!
  • To compare scores on two different variables, you
    transform them into ZX and ZY scores if you are
    studying a population or tX and tY scores if you
    have a sample.
  • ZX and ZY scores (or tX and tY scores) can be
    directly compared to each other to see whether
    they are consistently similar, consistently quite
    different, or show no consistent pattern of
    similarity or difference

44
Comparing variables
  • Anxiety symptoms, e.g., heartbeat, with number of
    hours driving to class.
  • Hat size with drawing ability.
  • Math ability with verbal ability.
  • Number of children with IQ.
  • Turn them all into Z or t scores

45
Pearson's Correlation Coefficient
  • coefficient - noun, a number that serves as a
    measure of some property.
  • The correlation coefficient indexes BOTH the
    consistency and direction of a correlation with a
    single number.

46
rho: the population parameter
  • Pearson's rho ($\rho$) is the parameter that
    characterizes the strength and direction of a
    linear relationship (and only a linear
    relationship) between two variables. To compute
    rho, you must have the entire population. Then
    you can compute sigma, mu, Z scores and rho.
  • The formula: $\rho = 1 - \frac{1}{2}\left(\sum (Z_X - Z_Y)^2 / N_P\right)$,
    where $N_P$ is the number of pairs of Z scores in
    the population.
  • In English: The correlation coefficient equals 1
    minus half the average squared distance between
    the pairs of Z scores.

47
Pearson's rho
  • When you have a perfect positive correlation, the
    Z scores will be identical in size and sign. So
    the average squared distance will be zero and
  • $\rho = 1.000 - \frac{1}{2}(0.000) = 1.000$

48
Pearson's rho
  • When you have a perfect negative correlation, the
    Z scores will be identical in size and opposite
    in sign. It can be proven algebraically that the
    average squared distance in that case will be
    4.000.
  • $\rho = 1.000 - \frac{1}{2}(4.000) = -1.000$

49
Pearson's rho
  • When you have two totally independent variables,
    the average squared distance will be 2.000
    (halfway between 0.000 and 4.000).
  • $\rho = 1.000 - \frac{1}{2}(2.000) = 0.000$
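
The three anchor cases (rho of 1, -1, and 0) can be tried out on tiny made-up "populations". This is only a sketch: it assumes Z scores use the population standard deviation (dividing by N), and the data and function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(population):
    """Z scores use the population mean (mu) and SD (sigma)."""
    mu = mean(population)
    sigma = pstdev(population)  # divides by N, not N - 1
    return [(v - mu) / sigma for v in population]


def rho(x, y):
    """1 minus half the average squared distance between Z score pairs."""
    z_x, z_y = z_scores(x), z_scores(y)
    avg_sq_distance = sum((a - b) ** 2 for a, b in zip(z_x, z_y)) / len(z_x)
    return 1 - 0.5 * avg_sq_distance


# Three tiny made-up populations of four pairs each.
x = [1, 2, 3, 4]
perfect_positive = rho(x, [3, 5, 7, 9])  # Y rises in step with X
perfect_negative = rho(x, [8, 6, 4, 2])  # Y falls in step with X
no_linear_link = rho(x, [1, -1, -1, 1])  # chosen so the linear link is zero

print(perfect_positive, perfect_negative, no_linear_link)
```

Up to floating-point rounding, the three results should come out at 1, -1, and 0, matching average squared distances of 0, 4, and 2.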

50
Pearson's Correlation Coefficient
  • Thus, rho varies from -1.000 (perfect negative
    correlation) to 0.000 (independent variables) to
    1.000 (perfect positive correlation).
  • A negative value indicates a negative
    relationship; a positive value indicates a
    positive relationship.
  • Values of rho close to 1.000 or -1.000 indicate a
    strong (consistent) relationship; values close
    to 0.000 indicate a weak (inconsistent) or
    independent relationship.

51
Estimating rho with r
  • Computing rho involves finding the actual average
    squared distance between the ZX and ZY scores in
    the whole population.
  • In computing r, we are estimating rho.

52
The formula for r
  • Pearson's r is a least squares, unbiased estimate
    of rho, based on the relationships found between
    tX and tY scores in a random sample.
  • $r = 1 - \frac{1}{2}\left(\sum (t_X - t_Y)^2 / (n_P - 1)\right)$, where $n_P - 1$
    equals one less than the number of pairs of t
    scores in the sample.
  • In English: Pearson's r equals 1.000 minus half
    the estimated average squared difference between
    the Z scores in the population based on squared
    differences between the t scores in the sample.
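
One way to see why this formula works is sketched below. It assumes t scores are computed with the sample standard deviation, so that $\sum t_X^2 / (n_P - 1) = 1$ and likewise for tY, and it writes r in its conventional product-moment form, $\sum t_X t_Y / (n_P - 1)$, which these slides do not use.

```latex
\begin{align*}
\frac{\sum (t_X - t_Y)^2}{n_P - 1}
  &= \frac{\sum t_X^2}{n_P - 1} + \frac{\sum t_Y^2}{n_P - 1}
     - 2\,\frac{\sum t_X t_Y}{n_P - 1} \\
  &= 1 + 1 - 2r = 2(1 - r), \\
\text{so}\quad r &= 1 - \frac{1}{2}\,\frac{\sum (t_X - t_Y)^2}{n_P - 1}.
\end{align*}
```

Setting r to 1, 0, and -1 gives 0, 2, and 4 for the estimated average squared difference, the same anchor values quoted for rho.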

53
Look at those formulae again.
  • $\sum (Z_X - Z_Y)^2 / N_P$ is the average squared
    distance between the Z scores.
  • The rest of the formula simply transforms the
    average squared distance between the Z scores
    into a variable that goes from -1.000 to 1.000.
  • $\rho = 1 - \frac{1}{2}\left(\sum (Z_X - Z_Y)^2 / N_P\right)$, where $N_P$ is the
    number of pairs of Z scores in the population.

54
Look at those formulae again.
  • $\sum (t_X - t_Y)^2 / (n_P - 1)$ is a least squares,
    unbiased estimate of the average squared
    difference between the Z scores in the population
    based on the differences between the tX and tY
    scores in a random sample.
  • The rest of the formula simply transforms the
    estimated average squared distance between the Z
    scores into a variable that goes from -1.000 to
    1.000.
  • $r = 1 - \frac{1}{2}\left(\sum (t_X - t_Y)^2 / (n_P - 1)\right)$, where $n_P - 1$
    equals one less than the number of pairs of t
    scores in the sample.
  • REMEMBER, t scores are estimated Z scores.

55
Thus, r, the least squares, unbiased estimate of
rho, is basically an estimate of the average
squared difference between the ZX and ZY scores
in the population, transformed into a variable
that goes from -1.00 to 1.00.
56
Similarities of r and rho
  • r and rho vary from -1.000 to 1.000.
  • For both r and rho, a negative value indicates a
    negative relationship; a positive value indicates
    a positive relationship.
  • Values of r or rho close to 1.000 or -1.000
    indicate a strong (consistent) relationship;
    values close to 0.000 indicate a weak
    (inconsistent) or independent relationship.

57
Since we almost always are studying random
samples, not populations, we almost always
compute Pearson's r, not Pearson's rho.
58
r, strength and direction
Perfect, positive      1.00
Strong, positive        .75
Moderate, positive      .50
Weak, positive          .25
Independent             .00
Weak, negative         -.25
Moderate, negative     -.50
Strong, negative       -.75
Perfect, negative     -1.00
59
Calculating Pearson's r
  • Select a random sample from a population and
    obtain scores on two variables, which we will
    call X and Y.
  • Convert all the scores into t scores.

60
Calculating Pearson's r
  • First, subtract the tY score from the tX score in
    each pair.
  • Then square all of the differences and add them
    up, that is, $\sum (t_X - t_Y)^2$.

61
Calculating Pearson's r
  • Estimate the average squared distance between ZX
    and ZY by dividing the sum of squared
    differences between the t scores by $n_P - 1$:
    $\sum (t_X - t_Y)^2 / (n_P - 1)$
  • To turn this estimate into Pearson's r, use the
    formula $r = 1 - \frac{1}{2}\left(\sum (t_X - t_Y)^2 / (n_P - 1)\right)$
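
The calculation steps from the last three slides can be collected into one function. This is a minimal sketch, assuming t scores are (score - sample mean) / s; the name `pearson_r_from_t_scores`, its input checks, and the sample data are illustrative assumptions, not part of the slides.

```python
from statistics import mean, stdev


def pearson_r_from_t_scores(x, y):
    """Estimate rho following the steps in the slides.

    1. Convert X and Y to t scores.
    2. Subtract tY from tX in each pair, square, and sum.
    3. Divide by (number of pairs - 1).
    4. r = 1 - half of that estimated average squared difference.
    """
    if len(x) != len(y):
        raise ValueError("X and Y must contain the same number of scores")
    if len(x) < 2:
        raise ValueError("need at least two pairs of scores")

    s_x, s_y = stdev(x), stdev(y)
    if s_x == 0 or s_y == 0:
        raise ValueError("t scores are undefined when all scores are equal")

    m_x, m_y = mean(x), mean(y)
    t_x = [(v - m_x) / s_x for v in x]
    t_y = [(v - m_y) / s_y for v in y]

    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(t_x, t_y))
    est_avg_sq_diff = sum_sq_diff / (len(x) - 1)
    return 1 - 0.5 * est_avg_sq_diff


# Hypothetical scores (not from the slides).
print(round(pearson_r_from_t_scores([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))
```

The input checks are a design choice for the sketch: with fewer than two pairs, or with no spread in a variable, t scores cannot be computed at all.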

62
Example: Calculate t scores for X
DATA: 2 4 6 8 10
MSW = 40.00/(5-1) = 10.00
sX = 3.16
63
Calculate t scores for Y
DATA: 9 11 10 12 13
MSW = 10.00/(5-1) = 2.50
sY = 1.58
64
Calculate r
tX                 -1.26  -0.63   0.00   0.63   1.26
tY                 -1.26   0.00  -0.63   0.63   1.26
$t_X - t_Y$         0.00  -0.63   0.63   0.00   0.00
$(t_X - t_Y)^2$     0.00   0.40   0.40   0.00   0.00
$\sum (t_X - t_Y)^2 / (n_P - 1) = 0.80 / 4 = 0.200$
$r = 1.000 - \frac{1}{2}\left(\sum (t_X - t_Y)^2 / (n_P - 1)\right)$
$r = 1.000 - \frac{1}{2}(0.200)$
$r = 1.000 - 0.100 = 0.900$
This is a very strong, positive relationship.
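
As a cross-check on this worked example, the sketch below recomputes r from the slide data in two ways: with the t score formula used here, and with the conventional product-moment formula, which is not part of these slides and is included only for comparison.

```python
from math import sqrt
from statistics import mean, stdev

# Data from the worked example.
x = [2, 4, 6, 8, 10]
y = [9, 11, 10, 12, 13]
n_pairs = len(x)

# Route 1: the t score formula from the slides.
t_x = [(v - mean(x)) / stdev(x) for v in x]
t_y = [(v - mean(y)) / stdev(y) for v in y]
avg_sq_diff = sum((a - b) ** 2 for a, b in zip(t_x, t_y)) / (n_pairs - 1)
r_slides = 1 - 0.5 * avg_sq_diff

# Route 2: product-moment formula (assumption: shown for comparison only).
dx = [v - mean(x) for v in x]
dy = [v - mean(y) for v in y]
r_product_moment = sum(a * b for a, b in zip(dx, dy)) / sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy)
)

print(round(avg_sq_diff, 3), round(r_slides, 3), round(r_product_moment, 3))
```

Both routes should agree with the hand calculation: an estimated average squared difference of about .200 and an r of about .900.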
65
By the way - True graphs.
  • Ch.7 has true graphs, displays in which each dot
    stands for a score on two (in this case) or more
    (in more advanced cases) variables.
  • In Ch. 1 through Ch. 6, most of the figures
    represented the frequency of scores on a single
    variable.
  • Formally, displays of frequencies are figures,
    but they are not graphs.

66
Note: a seeming exception
  • Usually we divide a sum of squared deviations
    around a mean by df to estimate the variance.
  • Here the sum of squares is not around a mean and
    we are not estimating a variance.
  • So you divide $\sum (t_X - t_Y)^2$ by $n_P - 1$.
  • $n_P - 1$ is not the df for correlation and
    regression ($df_{REG} = n_P - 2$).

67
End ch. 7, part 1 slides here.