Title: Linear correlation
1 Chapter 9
2 Relationships between variables
- So far we have been testing hypotheses about differences.
- There are other things we can do with statistics.
- We can investigate the relationships between variables.
3 The value of relationships: Knowledge compression
- If A depends on B, and B depends on C, and C depends on D, and so on,
- then we can compress the amount of information we need to have about the world.
- For example, in the above, knowing about D tells us what we need to know about C, B, and A.
- Our mental capacity is finite, so it pays to be efficient.
4 The value of relationships: Structure
- Imagine a world where all variables varied independently of each other.
- This would be a very chaotic and meaningless world.
- The relationships between variables give structure and meaning to life.
- So, relationships are good, but what sort of relationships should we look for?
- And how should we look for them?
5 Challenge 1
- Given Xi = 1, 2, 3, 4, 5 and Yi = 10, 8, 6, 4, 2
- Arrange these numbers into pairs, attempting to maximize the sum of the products of the pairs.
- For example
- 1 8
- 2 10
- 3 4
- 4 6
- 5 2
- (1 × 8) + (2 × 10) + (3 × 4) + (4 × 6) + (5 × 2) = 74
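Challenge 1 is small enough to check by brute force. The sketch below tries every way of pairing the Yi with the Xi and keeps the pairing with the largest sum of products. The function name `sum_of_products` is an illustrative choice, not something from the slides.

```python
from itertools import permutations

xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]

def sum_of_products(xs, ys):
    """Sum of the pairwise products for one particular pairing."""
    return sum(x * y for x, y in zip(xs, ys))

# Try every ordering of ys against the fixed xs (5! = 120 pairings).
best_pairing = max(permutations(ys), key=lambda p: sum_of_products(xs, p))
best_total = sum_of_products(xs, best_pairing)

# The example pairing from the slide, for comparison.
example_total = sum_of_products(xs, [8, 10, 4, 6, 2])

print(best_pairing, best_total)
print(example_total)
```

Comparing `best_pairing` with the original order of the Yi shows which kind of matching produces the largest total.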
6 Challenge 2
- Given 200 yards of fencing, find the rectangular cattle pen dimensions with the maximal area.
- For example
- 75 yd × 25 yd = 1875 sq yd
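Challenge 2 can be explored the same way. This is a minimal sketch that assumes whole-yard side lengths (an assumption made only to keep the search finite) and that all 200 yards of fencing are used.

```python
PERIMETER = 200  # yards of fencing

def area(width, perimeter=PERIMETER):
    """Area of a rectangle of the given width that uses the whole perimeter."""
    length = perimeter / 2 - width
    return width * length

# Assumption: only whole-yard widths are searched.
widths = range(1, PERIMETER // 2)
best_width = max(widths, key=area)
best_length = PERIMETER // 2 - best_width

print(best_width, best_length, area(best_width))
print(area(75))  # the 75 yd by 25 yd example pen
```

The pen with the largest area is the one whose two side lengths match, which parallels the pairing result in Challenge 1.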
7 What have we learned?
- By matching ______ coefficients, we maximize a product.
- Implicit in this pairing is a relationship:
- When Xi is large, Yi is large, and when Xi is small, Yi is small.
- Many pairs of variables have such a relationship.
- The more you drink in an hour, the more intoxicated you are.
- The more gas you have in your tank, the farther you can drive.
- The colder it is, the thicker the ice on a lake.
8 Applying what we've learned
- We now have an interesting relationship and a way to measure it.
- The relationship: As Xi increases, Yi increases. As Xi decreases, Yi decreases.
- The measure: correlation = (Σ Xi × Yi) / N
- Divide by N because we don't care how many scores we have.
- If the Xi and Yi are properly matched, our measure will be high.
- If the pairing is random, it will be low.
9 Our measure of correlation is not perfect
- Consider an extremely simple example.
- Xi is perfectly correlated with Xi.
- Example: Let Ci be the highs in a 6-day forecast.
- Ci = -13, -15, -21, -17, -14, -8
- correlation = [(-13 × -13) + (-15 × -15) + (-21 × -21) + (-17 × -17) + (-14 × -14) + (-8 × -8)] / 6
- Now, what if we substitute the Fahrenheit equivalents for the second variable?
- F = (9/5)C + 32
- correlation = [(-13 × 8) + (-15 × 5) + (-21 × -5) + (-17 × 2) + (-14 × 7) + (-8 × 8)] / 6
- Problem: our correlation will change based purely on a change of unit.
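The unit problem can be reproduced numerically. The sketch below computes the raw measure (the mean of the paired products) for the forecast highs against themselves, and then against their exact Fahrenheit conversions. It assumes the fifth high is -14, as in the products shown, and `mean_product` is an illustrative name.

```python
celsius = [-13, -15, -21, -17, -14, -8]  # assumption: fifth value is -14
fahrenheit = [9 / 5 * c + 32 for c in celsius]  # exact, unrounded conversions

def mean_product(xs, ys):
    """Raw measure: sum of the paired products divided by N."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

same_units = mean_product(celsius, celsius)
mixed_units = mean_product(celsius, fahrenheit)

print(same_units)   # Celsius paired with Celsius
print(mixed_units)  # Celsius paired with the same temperatures in Fahrenheit
```

Both calls pair each temperature with itself, yet the two results differ in size and even in sign, purely because of the change of unit.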
10 Our measure of correlation is not perfect
- What happened?
- F = (9/5)C + 32
- Multiplying by 9/5 scales all scores.
- Adding 32 translates the mean.
- We don't want things to change if we scale by some factor or add a constant.
- What did we do last time?
11 Solution
- Standardize by mapping our raw scores into z values.
- This neutralizes any scaling or translation of the mean.
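A minimal sketch of this fix: convert each variable to z scores first, then take the mean of the paired products. It assumes the population standard deviation (dividing by N), and reuses the forecast highs with the fifth value taken as -14. The function names are illustrative.

```python
from math import sqrt

def z_scores(values):
    """Standardize values using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Mean product of the paired z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

celsius = [-13, -15, -21, -17, -14, -8]  # assumption: fifth value is -14
fahrenheit = [9 / 5 * c + 32 for c in celsius]

r_same = correlation(celsius, celsius)
r_mixed = correlation(celsius, fahrenheit)
print(r_same, r_mixed)
```

After standardizing, the Celsius-to-Fahrenheit conversion no longer changes the result: both values agree up to floating-point rounding.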
12 Properties of r
- A set of z scores has an expected mean of 0.
- Thus, a z score is as likely to be positive as negative.
- Therefore, if the zXi and zYi are paired randomly, what will be the correlation r?
- So, no correlation implies r ________
13 Properties of r
- We know that zi is perfectly correlated with itself.
- What correlation r does this give?
- r = (Σ zi × zi) / N = (Σ zi²) / N
- This is the variance of z.
- What is the variance of z (the standard normal distribution)?
- Therefore, perfect correlation means r ___________
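The two properties of r can be checked with a small simulation. This is a sketch under assumptions: the raw scores are made up (drawn from a normal distribution with a fixed seed), and z scores use the population standard deviation.

```python
import random
from math import sqrt

def z_scores(values):
    """Standardize values using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def mean_product(zx, zy):
    """Mean of the paired products of two lists of z scores."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

random.seed(0)  # fixed seed so the run is repeatable
scores = [random.gauss(100, 15) for _ in range(2000)]  # made-up raw scores
z = z_scores(scores)

shuffled = z[:]           # same z scores ...
random.shuffle(shuffled)  # ... paired at random

r_random = mean_product(z, shuffled)  # random pairing
r_self = mean_product(z, z)           # each z score paired with itself
print(round(r_random, 3), round(r_self, 3))
```

The first printed value corresponds to the random-pairing case and the second to the perfect-correlation case.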
14 Graphing correlation with SPSS
- Using grades.sav
- Graphs -> Scatter/Dot
- Simple Scatter
- Define
- Select total for X.
- Select percent for Y.
- Guess if the correlation r will be large or small.
- OK
15 Computing correlation with SPSS
- Now compute r.
- Analyze -> Correlate -> Bivariate
- Choose total and percent.
- Check Pearson.
- OK
- Next, compute the graph and the correlation for
- Quiz1 and final
- Id and GPA
- Predict the degree of correlation before testing with SPSS.
16 Affine functions and relationships
- A straight-line plot represents an affine function.
- Y = aX + b
- Often mistakenly called linear.
- Strictly linear: Y = aX
- Non-linear
- Sounds very sophisticated
- Could be almost anything
17 Correlation and non-affine relationships
- Some variables are related by non-affine relationships.
- This doesn't mean they are any less related.
- Correlation, as we've defined it, can fail to detect this relationship.
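A small illustration of this point, using made-up data in which Y is completely determined by X through a non-affine rule (Y = X squared). The correlation is computed as the mean product of z scores, assuming the population standard deviation.

```python
from math import sqrt

def z_scores(values):
    """Standardize values using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Mean product of the paired z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # Y depends entirely on X, but not along a straight line

r = correlation(xs, ys)
print(abs(r) < 1e-9)  # True: r is zero up to rounding
```

Knowing X tells us Y exactly here, yet r reports no relationship, because the positive and negative halves of the curve cancel.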
18 An efficient formula for computation of correlation
- It eliminates the need for repeated subtraction of the means.
- It is partially derived on page 253.
- A bias is introduced in the case of estimating σ with s.
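The formula itself is not preserved on this slide. As an assumption, the sketch below uses the standard raw-score (computational) form of Pearson's r, which works from running sums and never subtracts the means. The function name is illustrative, and the data are the numbers from Challenge 1.

```python
from math import sqrt

def pearson_r_computational(xs, ys):
    """Pearson r from raw sums: no per-score subtraction of the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
r_original_order = pearson_r_computational(xs, [10, 8, 6, 4, 2])
r_example_pairing = pearson_r_computational(xs, [8, 10, 4, 6, 2])
print(r_original_order, r_example_pairing)
```

Each sum can be accumulated in a single pass over the data, which is what makes this form convenient for hand or calculator work.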
19 Given some r ≠ 0, can we assume that there is some correlation?
- Non-zero r might have arisen by chance.
- As usual, look to the null hypothesis.
- H0: ρ = 0, where ρ is the correlation for the entire population.
- It turns out that the distribution of r, assuming ρ = 0, is normal, or at least a t distribution.
- So we can do a t test to see if the r ≠ 0 we got arose by chance.
20 All we need is the standard error for r
- The gods tell us that the standard error for r is sr = sqrt((1 - r²) / (N - 2))
- So, by substitution: t = r / sr = r × sqrt(N - 2) / sqrt(1 - r²)
- The gods also tell us that df for r is N - 2.
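The test can be sketched as follows. The slide's formulas are not preserved, so this assumes the standard textbook standard error for r under the null hypothesis, sqrt((1 - r²) / (N - 2)). The function name and the sample values r = 0.5 and N = 27 are hypothetical.

```python
from math import sqrt

def t_for_r(r, n):
    """Return (t, df) for testing whether a sample correlation r differs from 0."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    standard_error = sqrt((1 - r ** 2) / df)  # assumed standard error of r
    return r / standard_error, df

# Hypothetical sample: r = 0.5 observed on N = 27 pairs of scores.
t, df = t_for_r(0.5, 27)
print(round(t, 3), df)
```

The resulting t is then compared with the critical value from a t table for the returned degrees of freedom.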
21 New assumptions
- Distributions are bivariate normal.
- This can be assumed if the sample is large (> 30) and the bivariate distribution is approximately normal.
22 Exercises
- Page 251: 1-4, 9
- Page 262: 1, 2, 5, 7 (Use SPSS.), 8