Title: Chapter_4_Field_2005 Correlation
1Chapter_4_Field_2005 Correlation
- If the cock crows on the dunghill, the weather
will change or remain as it is (original German
saying).
Cock-a-doodle-doo
2What is a correlation?
- A correlation is a measure of a linear relation
between variables (Clark, 2005, 107).
- Statistically, two measures express such a
relation: covariance and correlation coefficient(s).
3Covariation
- Reminder: Variation is the variability of a
measure in a sample, e.g., height of students in
this class.
- Covariation is the relation between the variance
of two variables, e.g., height and weight of
students in this class.
- Q: Are changes in one variable related to similar
changes in the other variable?
4Measure of Covariance
- $\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$
- Compare the measure of variance:
- $s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$
- Instead of squaring the differences between x and
the mean $\bar{x}$, as in the measure of variance, we
multiply them by the differences between the
other variable y and its mean, $y_i - \bar{y}$.
Example: X = weight, Y = height
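The covariance and variance formulas above can be sketched in a few lines of Python. This is only an illustration: the weights and heights are made-up values for five hypothetical students, and the function names are not taken from the slides or from SPSS.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, divided by N - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least 2 cases")
    mean_x, mean_y = mean(x), mean(y)
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross_products / (len(x) - 1)


def variance(x):
    """Sample variance, i.e. the covariance of a variable with itself."""
    return covariance(x, x)


# Hypothetical data (assumption, not from the slides)
weight_kg = [55, 60, 65, 70, 75]
height_cm = [160, 168, 165, 178, 179]

print("cov(weight, height) =", covariance(weight_kg, height_cm))
print("var(weight)         =", variance(weight_kg))
```

Writing the variance as `covariance(x, x)` mirrors the point of the slide: the two formulas differ only in whether a deviation is multiplied by itself or by the deviation of the other variable.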
5Problem with covariance
- The measure of covariance is not standardized,
e.g., one cannot compare the covariances of
two sets of data that are measured in different
units.
- --> Convert the covariance into a standard set of
units.
- The standardized unit of measurement is the
standard deviation (SD).
6Pearson product-moment correlation coefficient r
- Correlation coefficient:
- $r = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$
- $s_x$ = SD of the first variable
- $s_y$ = SD of the second variable
- --> Due to standardization, r will always fall
between -1 and 1.
- Note that r is also used as a measure of the
effect size of an experiment (see chapter 1).
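A minimal Python sketch of why standardization matters, using made-up data: the covariance changes when height is expressed in metres instead of centimetres, while r stays the same. The data and function names are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the two SDs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 cases")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return cov / (s_x * s_y)


# Hypothetical data (assumption, not from the slides)
weight_kg = [55, 60, 65, 70, 75]
height_cm = [160, 168, 165, 178, 179]
height_m = [h / 100 for h in height_cm]  # same heights, different unit

r_cm = pearson_r(weight_kg, height_cm)
r_m = pearson_r(weight_kg, height_m)
print(f"r with height in cm: {r_cm:.3f}")
print(f"r with height in m:  {r_m:.3f}")
```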
7Various kinds of correlation
[Figure: scatterplot panels showing r = 1; r positive, weak; r positive, strong; r = 0; r negative, strong; r = -1; and two nonlinear relations]
8Example: Correlation analysis using SPSS (Field,
2005, pp. 112 ff., data set ExamAnxiety.sav)
- 1. Visual inspection: Scatterplot
Graphs --> Interactive --> Scatterplot
Exam Performance (y), Exam Anxiety (x), Gender
(Style); Fit: None
9Output of scatterplot: Exam performance x exam anxiety
- Most students have high levels of anxiety
- No outliers
- Negative correlation
- No gender effect
103D-scatterplots
Graphs --> Interactive --> Scatterplot
Use the following values
- 3D-scatterplot of exam performance plotted
against exam anxiety and the amount of time spent
revising
11Overlay scatterplots (not to be used with
'interactive graph'!)
- In an overlay scatterplot, pairs of variables are
plotted on the same axes.
- Graphs --> Scatter, choose 'Overlay'.
- Then 'define' the following pairs: exam (Y) x
anxiety (X), overlaid with exam (Y) x
revise (X) (use 'swap pairs' if necessary
to bring Y and X into the right order).
12Overlay scatterplot
- Exam scores against both exam anxiety and time
spent revising
- --> positive relation between exam performance and
exam revision
- --> negative relation between exam performance and
exam anxiety
[Figure: overlay scatterplot; y-axis: Exam Performance; x-axis: Exam anxiety / Time spent revising]
13Matrix scatterplot: shows the relation between all
combinations of different pairs of variables
- Graphs --> Scatter --> Matrix --> Define
[Figure: matrix scatterplot with panels Perf (Y) x Anxiety (X); Perf (Y) x Rev time (X); Anxiety (Y) x Perf (X); Anxiety (Y) x Rev time (X); Rev time (Y) x Perf (X); Rev time (Y) x Anxiety (X); one outlier marked]
14Bivariate Correlation using file Advert.sav
Q: How are watching ads and buying the product
related? Task: Bivariate correlation of both
variables
Analyze --> Correlate --> Bivariate
- Bivariate: Pearson's product-moment correlation
- The correlation coefficient requires interval scale level
- The test for significance requires a normal distribution
15Bivariate Correlations using ExamAnxiety.sav
Analyze --> Correlate --> Bivariate
- Negative correlation between anxiety and performance, r = -.441
- Positive correlation between revision and performance, r = .397
- Negative correlation between anxiety and revision, r = -.709
16The coefficient of determination R²
- When we want to know how much of the overall
variability in the first variable can be
accounted for by the second variable, we square
Pearson's r.
- This coefficient of determination is written as
R².
- Example: r = .871 (ads x packets bought) gives
R² = .758
- 75.8% of the variance in the buying behavior can be
accounted for by the number of ads.
- Caveat: Still, R² cannot be interpreted in a
causal way.
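As a quick check of the arithmetic, squaring r can be sketched in Python. The function name is an assumption for illustration. Squaring the rounded value .871 gives a figure that rounds to .759 rather than the .758 on the slide, presumably because the slide's value was computed from the unrounded correlation.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient to get the share of variance accounted for."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r ** 2


r_ads_packets = 0.871  # value reported on the slide
r_squared = coefficient_of_determination(r_ads_packets)
print(f"R squared = {r_squared:.3f}, i.e. {r_squared * 100:.1f}% of the variance")
```

Because the sign is lost when squaring, R² says how much variance is shared but not whether the relation is positive or negative.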
17Spearman's correlation coefficient r_s using the
grades.sav data
- r_s is used when the variables are measured
not on an interval but on an ordinal scale. On the
ordinal level, the assumption of a normal
distribution does not have to be made.
- Example: Correlation between statistics grades x math grades
There is a positive correlation between math and
statistics grades, r_s = .455
1-tailed test because of a directed hypothesis
(positive relation between math x stats)
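One common way to obtain Spearman's coefficient is to rank both variables (giving tied values their average rank) and then compute Pearson's r on the ranks. The sketch below assumes that approach; the grades are invented and are not the contents of grades.sav.

```python
import math


def average_ranks(values):
    """Rank values from 1..N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_rs(x, y):
    """Spearman's r_s: Pearson's r computed on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ordinal grades (assumption, not the grades.sav data)
stats_grades = [1, 2, 2, 3, 4, 5]
math_grades = [2, 1, 3, 3, 5, 4]
print(f"r_s = {spearman_rs(stats_grades, math_grades):.3f}")
```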
18Kendall's tau (τ) coefficient using the grades.sav
data
- Kendall's tau (τ) is also a non-parametric
correlation coefficient.
- It is used for small data sets with large numbers
of tied ranks.
- It is said to be a more accurate estimate of the
true correlation than Spearman's r_s.
Analyze --> Correlate --> Bivariate --> Kendall's tau
Same positive correlation for Kendall's τ between
statistics and math grades as in Spearman's r_s,
but τ is smaller.
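Kendall's tau is based on counting concordant and discordant pairs of cases. The sketch below assumes the tau-b variant, which adjusts the denominator for ties; it uses a simple pairwise loop that is fine for small data sets. The function name and the example data are assumptions for illustration.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: counts for neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both variables order the pair the same way
        else:
            discordant += 1  # the variables order the pair in opposite ways
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau is undefined when a variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical ordinal grades with ties (assumption, not the grades.sav data)
stats_grades = [1, 2, 2, 3, 4, 5]
math_grades = [2, 1, 3, 3, 5, 4]
print(f"tau-b = {kendall_tau_b(stats_grades, math_grades):.3f}")
```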
19Biserial and point-biserial correlations
- Biserial and point-biserial correlations are used
when one of the variables is dichotomous.
- The point-biserial correlation coefficient r_pb is used when
the underlying variable is truly dichotomous,
e.g., male/female, pregnant/not pregnant, dead/alive, etc.
- The point-biserial correlation coefficient is Pearson's r,
computed with the dichotomous variable coded numerically (e.g., 0 and 1).
- The biserial correlation coefficient r_b is used when the
variable is dichotomous on the
surface, e.g., having passed or failed an
exam, but underlyingly continuous, as expressed
in the exact points earned.
- r_b cannot be calculated directly by SPSS.
20Point-biserial Correlation using the pbcorr.sav
data
- Q: How is 'time roaming around' in a cat sample
correlated with gender (male, female)?
- The Pearson coefficient is r = .378
- R² = .143, i.e., gender accounts for 14.3% of the
variance of cats' roaming around.
- Whether the coefficient is positive or negative depends on
which category is assigned which code. It
reverses (from positive to negative) if instead of 'gender'
the variable 'recode' is used (1 = female, 0 = male).
21Point-biserial Correlation using the pbcorr.sav
data
- Analyze --> Correlate --> Bivariate --> Pearson
Whether the correlation is positive or negative depends entirely
on the coding of the variable (male/female):
Male = 1, female = 0
Male = 0, female = 1
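The effect of the coding can be sketched in Python: the point-biserial coefficient is an ordinary Pearson's r on a 0/1 variable, and swapping the codes only flips its sign. The roaming times below are invented for six hypothetical cats and are not the pbcorr.sav data.

```python
import math


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data (assumption, not from pbcorr.sav)
roaming_time = [41, 40, 50, 60, 52, 65]
gender = [0, 0, 0, 1, 1, 1]          # 1 = male, 0 = female
recode = [1 - g for g in gender]     # 1 = female, 0 = male

r_pb = pearson_r(roaming_time, gender)
r_pb_recoded = pearson_r(roaming_time, recode)
print(f"male = 1, female = 0: r_pb = {r_pb:.3f}")
print(f"male = 0, female = 1: r_pb = {r_pb_recoded:.3f}")
```

The magnitude, and therefore R², is identical under both codings; only the direction of the sign has to be read against the coding scheme.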
22Computing r_b (biserial r) from r_pb
(point-biserial r)
- If the underlying variable 'gender' is not truly
dichotomous (because of some neutered male cats),
then the r_b coefficient can be used, using the
equation
- (E 4.4) $r_b = \frac{r_{pb} \sqrt{P_1 P_2}}{y}$
- where P1 is the proportion of cases that fell
into category 1 and P2 the proportion that fell into category 2 (male and
female).
23Computing r_b from r_pb
- In the menu 'Frequencies' we can obtain that
- P1 = 53.3% male
- P2 = 46.7% female
- y = the ordinate of the normal distribution at the point where P1
stops and P2 begins
- In Appendix A.1, we find y = .3977 for .468 as
the smaller portion and .532 as the larger
portion (the closest table entries).
- Computing Equation 4.4 with y = .3977 yields an r_b
of .475
- r_b is much higher than r_pb!
- --> It makes a difference whether the variable is
truly dichotomous or continuous!
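The table lookup and Equation 4.4 can be reproduced with Python's standard library. This is a sketch under the assumption that y is the height (ordinate) of the standard normal curve at the point that cuts off a proportion P1; the function name is hypothetical. Because it uses the exact proportion rather than the nearest table entry, the result can differ from the slide's value in the third decimal.

```python
import math
from statistics import NormalDist


def biserial_from_point_biserial(r_pb, p1):
    """Return (r_b, y) using r_b = r_pb * sqrt(P1 * P2) / y."""
    if not 0.0 < p1 < 1.0:
        raise ValueError("p1 must be a proportion strictly between 0 and 1")
    p2 = 1.0 - p1
    standard_normal = NormalDist()      # mean 0, SD 1
    z = standard_normal.inv_cdf(p1)     # point that splits the curve into P1 and P2
    y = standard_normal.pdf(z)          # ordinate of the normal curve at that point
    return r_pb * math.sqrt(p1 * p2) / y, y


# Values taken from the slides: r_pb = .378, P1 = 53.3% male
r_b, y = biserial_from_point_biserial(0.378, 0.533)
print(f"y   = {y:.4f}")
print(f"r_b = {r_b:.3f}")
```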
24Correlation and causality: The standard disclaimer
- The correlation between two variables is an
undirected relationship.
- A causal interpretation is a directed
relationship with the causing variable (causer)
necessarily preceding and determining the caused
variable (causee) in a meaningful way.
- Problem 1: There could be a third variable
mediating between the first two variables.
- Problem 2: There could be a complex interaction
between the two variables, e.g., a positive
feedback loop between exam anxiety and exam
performance.
- Therefore, never ever interpret a correlation as
a causal story.
25Example: Correlation ≠ Causation
- Correlation:
- The number of storks and the number of human
babies in a country are positively correlated.
- Causation?
- Does the birth rate go up because there are more
storks? Do the storks bring the babies after all?
26Partial correlation using the ExamAnxiety.sav data
- A partial correlation correlates two variables
while keeping constant one or more additional
variables.
- In the ExamAnxiety data,
- Revision time is related to exam performance
- Revision time is related to anxiety
- In order to find out the true correlation between
exam performance and anxiety, we have to partial
out revision time.
27Segmenting the variance
- Anxiety accounts for 19.4% of the variance of
performance (r = -.441).
- Revision time accounts for 15.7% of the variance of
performance (r = .397).
- Revision time accounts for 50.2% of the variance of
anxiety (r = -.709).
- --> Part of the 19.4% of the performance variance that
anxiety accounts for may actually be
accounted for by revision time.
- --> Partial correlations may address the
third-variable problem to some extent.
28Diagram depicting partial correlations
[Diagram: Exam performance, Revision time and Exam Anxiety, with regions labelled: unique variance explained by revision time; unique variance explained by exam anxiety; variance explained by both exam anxiety and revision time]
29Partial correlation between exam anxiety and exam
performance while controlling for revision time
Analyze --> Correlate --> Partial
30Partial Correlation
- The partial correlation between anxiety and
performance is r = -.2467 (R² = .06), having
taken out revision time.
- (Originally it had been r = -.441.)
- R² has shrunk from 19.4% to 6%.
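The partial correlation reported above can be reproduced from the three bivariate correlations with the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). The sketch below assumes that formula; the function and variable names are illustrative, and the input values are the rounded correlations from the slides.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when z correlates perfectly with x or y")
    return (r_xy - r_xz * r_yz) / denominator


# Rounded bivariate correlations from the ExamAnxiety slides
r_anxiety_performance = -0.441
r_anxiety_revision = -0.709
r_performance_revision = 0.397

r_partial = partial_r(r_anxiety_performance, r_anxiety_revision, r_performance_revision)
print(f"partial r = {r_partial:.4f}")
print(f"R squared = {r_partial ** 2:.3f}")
```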
31Semi-partial (Part) correlations
- In a semi-partial (part) correlation, the effect
of a third variable on one of the two variables
is controlled, so that the unique corr between
the two can be assessed.
- In a partial correlation, the effect of a third
variable on both of the two variables is
controlled
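The difference between the two definitions can be made concrete with the usual first-order formulas. In the sketch below, the semi-partial correlation removes revision time from anxiety only, so the denominator contains just one correction term; the partial correlation removes it from both variables. The formulas are the standard textbook ones, the names are illustrative, and the semi-partial value is computed here from the rounded slide correlations rather than reported in the slides.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """z is removed from both x and y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def semi_partial_r(r_xy, r_xz, r_yz):
    """z is removed from x only; y is left as it is."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)


# x = anxiety, y = exam performance, z = revision time (rounded slide values)
r_xy, r_xz, r_yz = -0.441, -0.709, 0.397

r_part = semi_partial_r(r_xy, r_xz, r_yz)
r_partial = partial_r(r_xy, r_xz, r_yz)
print(f"semi-partial (part) r = {r_part:.4f}")
print(f"partial r             = {r_partial:.4f}")
```

Both coefficients share the same numerator, so the semi-partial value can never exceed the partial value in magnitude.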
[Two diagrams, each relating Revision, Exam and Anxiety]
32Default Homework
- Answer 'Smart Alex's tasks' in Chapter 4 (p.
141).
33Collective Homework
- Obtain a sample from this Statistics course with
the following variables: weight, height, age, sex
- Explore the data with the commands for
descriptive statistics you have learned so far:
frequencies, mean, median, mode, SD, SE, range,
variance, etc.
- Inspect the sample visually with histograms, box
plots, scatterplots (simple, overlay, multiple)
- Test for normal distribution (K-S test) and
homogeneity of variances (Levene); split files if
necessary
- Correlate all variables; see how high the correlations
are and whether they are positive or negative
- If necessary, partial out variances