Linear correlation - PowerPoint PPT Presentation

Slides: 23
Provided by: MarkB9

Transcript and Presenter's Notes


1
Chapter 9
  • Linear correlation

2
Relationships between variables
  • So far we have been testing hypotheses about
    differences.
  • There are other things we can do with statistics.
  • We can investigate the relationships between
    variables.

3
The value of relationships: Knowledge compression
  • If A depends on B and B depends on C and C
    depends on D and so on
  • Then we can compress the amount of information we
    need to have about the world.
  • For example, in the above, knowing about D tells
    us what we need to know about C, B, and A.
  • Our mental capacity is finite, so it pays to be
    efficient.

4
The value of relationships: Structure
  • Imagine a world where all variables varied
    independently from each other.
  • This would be a very chaotic and meaningless
    world.
  • The relationships between variables give
    structure and meaning to life.
  • So, relationships are good, but what sort of
    relationships should we look for?
  • And, how should we look for them?

5
Challenge 1
  • Given Xi = 1, 2, 3, 4, 5 and Yi = 10, 8, 6, 4, 2
  • Arrange these numbers into pairs, attempting to
    maximize the sum of the products of the pairs.
  • For example
  • (1, 8)
  • (2, 10)
  • (3, 4)
  • (4, 6)
  • (5, 2)
  • 1×8 + 2×10 + 3×4 + 4×6 + 5×2 = 74
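With only five numbers in each list, the challenge can be settled by brute force. The sketch below tries every ordering of the Yi against the fixed Xi and keeps the pairing with the largest sum of products. The function and variable names are illustrative and not from the slides.

```python
from itertools import permutations

X = [1, 2, 3, 4, 5]
Y = [10, 8, 6, 4, 2]

def sum_of_products(xs, ys):
    """Sum of the pairwise products for one particular pairing."""
    return sum(x * y for x, y in zip(xs, ys))

# Try every possible ordering of Y against the fixed ordering of X
# (5! = 120 candidates) and keep the one with the largest sum.
best_pairing = max(permutations(Y), key=lambda ys: sum_of_products(X, ys))
best_sum = sum_of_products(X, best_pairing)

print(list(zip(X, best_pairing)), best_sum)
```

Calling `sum_of_products(X, [8, 10, 4, 6, 2])` reproduces the 74 from the example pairing, which gives a baseline to compare against the best pairing found.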

6
Challenge 2
  • Given 200 yards of fencing, find the rectangular
    cattle pen dimensions with the maximal area
  • For example
  • 75 yd × 25 yd = 1875 sq yd
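The fencing problem can be explored the same way. The sketch below assumes all 200 yards are used and searches whole-yard widths only, which is a simplification to keep the search finite. The names are illustrative.

```python
PERIMETER = 200  # yards of fencing available

def area(width, perimeter=PERIMETER):
    """Area of a rectangle of the given width that uses the whole perimeter."""
    height = perimeter / 2 - width
    return width * height

# Assumption: only whole-yard widths from 0 to 100 are considered.
best_width = max(range(0, PERIMETER // 2 + 1), key=area)
best_height = PERIMETER / 2 - best_width

print(best_width, best_height, area(best_width))
```

`area(75)` reproduces the 1875 square yards of the example, so the search result can be compared directly with it.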

7
What have we learned?
  • By matching ______ coefficients, we maximize a
    product.
  • Implicit in this pairing is a relationship
  • When Xi is large Yi is large and when Xi is small
    Yi is small.
  • Many pairs of variables have such a relationship.
  • The more you drink in an hour the more
    intoxicated you are.
  • The more gas you have in your tank, the farther
    you can drive.
  • The colder it is, the thicker the ice on a lake.

8
Applying what we've learned
  • We now have an interesting relationship and a way
    to measure it.
  • The relationship: As Xi increases, Yi increases.
    As Xi decreases, Yi decreases.
  • The measure: $\frac{1}{N}\sum_{i=1}^{N} X_i Y_i$
  • Divide by N because we don't care how many scores
    we have.
  • If the Xi and Yi are properly matched, our
    measure will be high.
  • If the pairing is random, it will be low.
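A small experiment illustrates the last two bullets. It assumes the measure is the mean of the pairwise products, as the "divide by N" bullet suggests. The data are made up for illustration, and the seed is fixed only so the run is repeatable.

```python
import random

def mean_product(xs, ys):
    """Sum of pairwise products divided by the number of pairs."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

# Made-up data in which Y rises with X.
X = list(range(1, 21))
Y = [2 * x for x in X]

matched = mean_product(X, Y)

# Break the relationship by shuffling Y before pairing it with X.
rng = random.Random(0)
shuffled_Y = Y[:]
rng.shuffle(shuffled_Y)
scrambled = mean_product(X, shuffled_Y)

print(matched, scrambled)
```

"High" and "low" are relative here: the scrambled value is still a large positive number, because every score is positive. That is one hint that the raw measure needs further work.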

9
Our measure of correlation is not perfect
  • Consider an extremely simple example.
  • Xi is perfectly correlated with Xi.
  • Example: Let Ci be the highs in a 6-day forecast.
  • Ci = -13, -15, -21, -17, -14, -8
  • correlation = (-13 × -13) + (-15 × -15) + (-21 × -21) +
    (-17 × -17) + (-14 × -14) + (-8 × -8)
  • Now, what if we substitute the Fahrenheit
    equivalents for the second variable?
  • F = 9/5 C + 32
  • correlation = (-13 × 8) + (-15 × 5) + (-21 × -5) +
    (-17 × 2) + (-14 × 7) + (-8 × 8)
  • Problem: our correlation will change based purely
    on a change of unit.
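The unit problem is easy to reproduce. The sketch below takes the fifth high to be -14, as in the products on this slide. It uses the exact conversion F = 9/5 C + 32 rather than the slide's rounded Fahrenheit figures, and it divides by N as in the measure defined earlier.

```python
def mean_product(xs, ys):
    """Raw measure: sum of pairwise products divided by N."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

# Celsius highs (fifth value assumed to be -14).
C = [-13, -15, -21, -17, -14, -8]

# The same temperatures expressed in Fahrenheit.
F = [9 / 5 * c + 32 for c in C]

same_units = mean_product(C, C)   # Celsius paired with Celsius
mixed_units = mean_product(C, F)  # Celsius paired with Fahrenheit

print(same_units, mixed_units)
```

Both pairings describe the same six days, yet the two numbers differ, and they even differ in sign.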

10
Our measure of correlation is not perfect
  • What happened?
  • F = 9/5 C + 32
  • Multiplying by 9/5 scales all scores.
  • Adding 32 translates the mean.
  • We don't want things to change if we scale by
    some factor or add a constant.
  • What did we do last time?

11
Solution
  • Standardize by mapping our raw scores into z
    values.
  • This neutralizes any scaling or translation of
    the mean.
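The formula for r did not survive in this transcript. The sketch below assumes r is the mean of the products of paired z scores, with each z score computed from the population standard deviation (dividing by N). Under that assumption, the Celsius and Fahrenheit versions of the forecast highs give the same result. The function names are illustrative.

```python
from statistics import fmean, pstdev

def z_scores(xs):
    """Map raw scores to z scores using the population standard deviation."""
    mean = fmean(xs)
    sd = pstdev(xs)
    return [(x - mean) / sd for x in xs]

def correlation_r(xs, ys):
    """Assumed definition: mean of the products of paired z scores."""
    return fmean([zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))])

C = [-13, -15, -21, -17, -14, -8]   # fifth value assumed to be -14
F = [9 / 5 * c + 32 for c in C]     # same highs in Fahrenheit

print(correlation_r(C, C), correlation_r(C, F))
```

Scaling by 9/5 and adding 32 changes the mean and the standard deviation of the raw scores, but not the z scores themselves, so the two printed values agree up to rounding error.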

12
Properties of r
  • A set of z scores has an expected mean of 0.
  • Thus, a z score is as likely to be positive as
    negative.
  • Therefore, if the zXi and zYi are paired
    randomly, what will be the correlation r?
  • So, no correlation implies r ____________________

13
Properties of r
  • We know that zi is perfectly correlated with
    itself.
  • What correlation r does this give?
  • This is the variance of z.
  • What is the variance of z (the standard normal
    distribution)?
  • Therefore, perfect correlation means r
    ___________
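Answers to the questions on these two slides can be checked empirically. The simulation below assumes r is the mean of the products of paired z scores (population standard deviation). It pairs two independently generated sets of scores, then pairs one set with itself. The sample size and seed are arbitrary choices.

```python
import random
from statistics import fmean, pstdev

def standardize(xs):
    """Convert raw scores to z scores (population standard deviation)."""
    mean, sd = fmean(xs), pstdev(xs)
    return [(x - mean) / sd for x in xs]

def mean_z_product(xs, ys):
    """Assumed definition of r: mean product of paired z scores."""
    return fmean([a * b for a, b in zip(standardize(xs), standardize(ys))])

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(10_000)]
y = [rng.gauss(0, 1) for _ in range(10_000)]  # generated independently of x

r_random = mean_z_product(x, y)  # random pairing
r_self = mean_z_product(x, x)    # a variable paired with itself

print(r_random, r_self)
```

Rerunning with other seeds shows how much `r_random` wobbles from sample to sample, which is the motivation for the significance test later in the chapter.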

14
Graphing correlation with SPSS
  • Using grades.sav
  • Graphs -> Scatter/Dot
  • Simple Scatter
  • Define
  • Select total for X.
  • Select percent for Y
  • Guess if the correlation r will be large or
    small.
  • OK

15
Computing correlation with SPSS
  • Now compute r.
  • Analyze -> Correlate -> Bivariate
  • Choose total and percent.
  • Check Pearson.
  • OK
  • Next, compute the graph and the correlation for:
  • Quiz1 and final
  • Id and GPA
  • Predict the degree of correlation before testing
    with SPSS.

16
Affine functions and relationships
  • A straight line plot represents an affine
    function.
  • Y = aX + b
  • Often mistakenly called linear.
  • Strictly, a linear function has no constant term:
    Y = aX
  • Non-linear
  • Sounds very sophisticated
  • Could be almost anything

17
Correlation and non-affine relationships
  • Some variables are related by non-affine
    relationships.
  • This doesn't mean they are any less related.
  • Correlation, as we've defined it, will not detect
    this relationship.
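A parabola makes the point concrete. In the sketch below Y is completely determined by X, yet r comes out as zero. As before, r is assumed to be the mean product of z scores with population standard deviations, and the data are made up.

```python
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Mean product of paired z scores (population standard deviations)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

X = [-3, -2, -1, 0, 1, 2, 3]

# Non-affine relationship: Y is a function of X, but not a straight line.
parabola = [x ** 2 for x in X]

# Affine relationship for comparison.
line = [2 * x + 5 for x in X]

r_parabola = pearson_r(X, parabola)
r_line = pearson_r(X, line)

print(r_parabola, r_line)
```

The positive products on the right half of the parabola are cancelled exactly by the negative products on the left half, so the measure reports no relationship even though one exists.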

18
An efficient formula for computation of
correlation
  • It eliminates the need for repeated subtraction
    of the means.
  • It is partially derived on page 253.
  • A bias is introduced in the case of estimating σ
    with s.
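The formula itself is missing from this transcript. One commonly used raw-score form, shown here as an assumption that may differ in layout from the slide or from page 253, works from running sums and never subtracts the means explicitly:

```latex
r = \frac{N\sum X_i Y_i - \sum X_i \sum Y_i}
         {\sqrt{\left[N\sum X_i^2 - \left(\sum X_i\right)^2\right]
                \left[N\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}
```

Multiplying out the mean product of z scores, with standard deviations that divide by N, gives the same value, so this form is a rearrangement and not a different measure.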

19
Given some r ≠ 0, can we assume that there is
some correlation?
  • Non-zero r might have arisen by chance.
  • As usual, look to the null hypothesis:
  • H0: ρ = 0, where ρ is the correlation for the
    entire population.
  • It turns out that the distribution of r, assuming
    ρ = 0, is normal, or at least a t distribution.
  • So we can do a t test to see if the r ≠ 0 we got
    arose by chance.

20
All we need is the standard error for r
  • The gods tell us that the standard error for r
    is
  • So, by substitution
  • The gods also tell us that df for r is N-2.
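The two formulas on this slide are missing from the transcript. The sketch below assumes the usual textbook standard error for r under the null hypothesis, sqrt((1 - r²) / (N - 2)), and the test statistic t = r / (standard error), with df = N - 2 as stated above. The function name and the sample values are hypothetical.

```python
import math

def t_statistic_for_r(r, n):
    """Return (t, df) for testing H0: rho = 0.

    Assumed formulas (not recovered from the slide):
        standard error = sqrt((1 - r^2) / (n - 2))
        t = r / standard error, with df = n - 2
    Requires n > 2 and -1 < r < 1; r = +/-1 would divide by zero.
    """
    df = n - 2
    standard_error = math.sqrt((1 - r ** 2) / df)
    return r / standard_error, df

# Hypothetical sample: r = 0.5 observed from N = 27 pairs of scores.
t, df = t_statistic_for_r(0.5, 27)
print(t, df)
```

The resulting t is then compared with the critical value from a t table at the chosen significance level and the returned degrees of freedom.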

21
New assumptions
  • Distributions are bivariate normal.
  • This can be assumed if the sample is large
    (> 30) and the bivariate distribution is
    approximately normal.

22
Exercises
  • Page 251 1-4, 9
  • Page 262 1, 2, 5, 7 (Use SPSS.), 8