Chapter_4_Field_2005 Correlation
Provided by: iiMet

1
Chapter_4_Field_2005 Correlation
  • 'If the cock crows on the dunghill, the weather
    will change or remain as it is' (original German
    saying)

Kikeriki (German for 'cock-a-doodle-doo')
2
What is a correlation?
  • A correlation is a measure of a linear relation
    between variables (Clark, 2005, p. 107)
  • Statistically, two measures express such a
    relation: the covariance and the correlation
    coefficient(s)
3
Covariation
  • Reminder: Variation is the variability of a
    measure in a sample, e.g., the height of students
    in this class.
  • Covariation is the way two variables vary
    together, e.g., the height and weight of
    students in this class.
  • Q: Are changes in one variable related to similar
    changes in the other variable?

4
Measure of Covariance
  • Covariance:
    $$\mathrm{cov}(x,y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
  • Compare the measure of variance:
    $$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{N - 1}$$
  • Instead of squaring the differences between x and
    the mean $\bar{x}$, as in the measure of variance,
    we multiply them by the differences between the
    other variable y and its mean $\bar{y}$.

[Figure: example scatterplot, X = weight, Y = height]
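The two formulas above can be sketched in a few lines of Python. The function names and the weight/height numbers are hypothetical, chosen only to make the arithmetic visible; as in the formulas, both measures divide by N - 1, and the variance is simply the covariance of a variable with itself.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, divided by N - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (N >= 2)")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def variance(x):
    """Sample variance: squared deviations instead of cross-products."""
    return covariance(x, x)


# Hypothetical data for five students
weight = [60, 72, 55, 80, 68]       # kg
height = [165, 180, 160, 185, 175]  # cm

print(covariance(weight, height))   # 101.25
print(variance(weight))             # 97.0
```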
5
Problem with covariance
  • The measure of covariance is not standardized:
    its size depends on the units of measurement, so
    one cannot compare the covariances of two data
    sets that are measured in different units.
  • → Convert the covariance into a standard set of
    units.
  • The standardized unit of measurement is the
    standard deviation (SD).
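A quick way to see the problem, using hypothetical data: the same heights give a covariance 100 times smaller when they are recorded in metres instead of centimetres, although the relation itself has not changed. The helper function is an illustrative sketch, not an SPSS procedure.

```python
def covariance(x, y):
    """Sample covariance with N - 1 in the denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)


weight_kg = [60, 72, 55, 80, 68]          # hypothetical data
height_cm = [165, 180, 160, 185, 175]
height_m = [h / 100 for h in height_cm]   # same heights, different unit

print(covariance(weight_kg, height_cm))   # 101.25
print(covariance(weight_kg, height_m))    # about 1.0125 (floating-point rounding)
```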

6
Pearson product-moment correlation coefficient r
  • Correlation coefficient:
    $$r = \frac{\mathrm{cov}(x,y)}{s_x s_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
  • $s_x$ = SD of the first variable
  • $s_y$ = SD of the second variable
  • → Due to standardization, r always falls
    between -1 and +1.
  • Note that r is also used as a measure of the
    effect size of an experiment (see Chapter 1).
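A minimal sketch of the formula above: divide the covariance by the product of the two standard deviations. It uses only Python's standard `statistics` module (`stdev` also divides by N - 1); the data are hypothetical. Because the units cancel, r stays the same when height is converted from centimetres to metres, which is exactly what the standardization is for.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's r = cov(x, y) / (s_x * s_y), all based on N - 1."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))


weight_kg = [60, 72, 55, 80, 68]   # hypothetical data
height_cm = [165, 180, 160, 185, 175]
height_m = [h / 100 for h in height_cm]

r_cm = pearson_r(weight_kg, height_cm)
r_m = pearson_r(weight_kg, height_m)
print(round(r_cm, 3), round(r_m, 3))  # same value: the units cancel out
```

In Python 3.10 and later, `statistics.correlation(x, y)` computes Pearson's r directly; the hand-written version is kept here to mirror the formula.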

7
Various kinds of correlation
[Figure: example scatterplots for r = 1, a weak positive r, a strong positive r, r = 0, a strong negative r, r = -1, and two nonlinear relations]
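The panels can be mimicked with small hypothetical data sets. A perfect linear relation gives r = 1 or r = -1, while a perfectly systematic but U-shaped (nonlinear) relation can give r = 0, since r only measures the linear part of a relation. This is why the scatterplot should be inspected before relying on r.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x, [2 * v + 1 for v in x]))  # perfect positive: r = 1
print(pearson_r(x, [-v for v in x]))         # perfect negative: r = -1
print(pearson_r(x, [v * v for v in x]))      # U-shaped: r = 0
```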
8
Example: Correlation analysis using SPSS (Field,
2005, pp. 112; data set ExamAnxiety.sav)
  • 1. Visual inspection: scatterplot

Graphs → Interactive → Scatterplot: Y = Exam
Performance, X = Exam Anxiety, Style = Gender
(Fit: None)
9
Output of scatterplot: exam performance × exam anxiety
  • Most students have high levels of anxiety
  • No outliers
  • Negative correlation
  • No gender effect

10
3D-scatterplots
Graphs → Interactive → Scatterplot
Use the following values:
  • 3D-scatterplot of exam performance plotted
    against exam anxiety and the amount of time spent
    revising

11
Overlay scatterplots (not to be used with
'Interactive' graphs!)
  • In an overlay scatterplot, pairs of variables are
    plotted on the same axes
  • Graphs → Scatter, choose 'Overlay'

Then 'define' the following pairs: exam (Y) and
anxiety (X), overlaid with exam (Y) and revise (X)
(use swap pairs if necessary to bring Y and X into
the right order)
12
Overlay scatterplot
  • Exam scores against both exam anxiety and time
    spent revising

→ positive relation between exam performance and
revision time; → negative relation between exam
performance and exam anxiety
[Figure: overlay scatterplot; Y axis: exam performance; X axis: exam anxiety / time spent revising]
13
Matrix scatterplot: shows the relation between all
combinations of different pairs of variables
  • Graphs → Scatter → Matrix → Define

[Figure: matrix scatterplot with panels perf (Y) × anxiety (X), perf (Y) × rev time (X), anxiety (Y) × perf (X), anxiety (Y) × rev time (X), rev time (Y) × perf (X), rev time (Y) × anxiety (X); one outlier is marked]
14
Bivariate correlation using the file Advert.sav
Q: How are watching ads and buying the product
related? Task: bivariate correlation of both
variables
Analyze → Correlate → Bivariate
  • Bivariate Pearson's product-moment correlation
  • The correlation coefficient requires interval-scale
    data
  • The test for significance requires normally
    distributed data
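For illustration only: the significance test for Pearson's r is commonly based on the statistic t = r·√(N - 2) / √(1 - r²) with N - 2 degrees of freedom, which relies on the normality assumption noted above. The sketch computes only t; turning it into a p-value needs the t distribution (for example from a statistics library), which is left out here. The function name and the inputs are hypothetical.

```python
from math import sqrt


def t_for_r(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


t = t_for_r(0.5, 27)   # hypothetical r and sample size
print(round(t, 3), "with", 27 - 2, "df")
```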

15
Bivariate correlations using ExamAnxiety.sav
Analyze → Correlate → Bivariate
  • Negative correlation between anxiety and
    performance, r = -.441
  • Positive correlation between revision and
    performance, r = .397
  • Negative correlation between anxiety and
    revision, r = -.709
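The Bivariate dialog reports every pairwise r at once. A rough Python analogue, assuming the data are stored as a dictionary of equally long columns, is sketched below; the variable names and values are hypothetical and are not the ExamAnxiety.sav data, though they are chosen to show the same pattern of signs.

```python
from itertools import combinations
from statistics import mean


def pearson_r(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(data):
    """Return {(var1, var2): r} for every pair of columns in `data`."""
    return {(a, b): pearson_r(data[a], data[b])
            for a, b in combinations(data, 2)}


data = {  # hypothetical scores for six students
    "exam": [40, 55, 62, 70, 85, 90],
    "anxiety": [90, 85, 80, 70, 60, 50],
    "revise": [5, 10, 12, 20, 25, 30],
}
for (a, b), r in correlation_matrix(data).items():
    print(f"{a} x {b}: r = {r:.3f}")
```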

16
The coefficient of determination R²
  • When we want to know how much of the overall
    variability in one variable can be accounted for
    by the other variable, we square Pearson's r.
  • This coefficient of determination is written as
    R².
  • Example: r = .871 (ads × packets bought) gives
    R² = .758.
  • 75.8% of the variance in buying behavior can be
    accounted for by the number of ads.
  • Caveat: still, R² cannot be interpreted in a
    causal way.
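The arithmetic is just squaring r and reading the result as a proportion of shared variance; the sign of r disappears. The r values below are taken from the slides. Squaring the rounded r = .871 gives about .759 rather than the printed .758, a difference that presumably comes from r being rounded to three decimals.

```python
def r_squared(r):
    """Coefficient of determination: the proportion of shared variance."""
    return r * r


for r in (0.871, -0.441, 0.378):
    print(f"r = {r:+.3f} -> R^2 = {r_squared(r):.3f} ({100 * r_squared(r):.1f}%)")
```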

17
Spearman's correlation coefficient rs using the
grades.sav data
  • rs is used when the variables are measured on an
    ordinal rather than an interval scale. At the
    ordinal level, no normal distribution has to be
    assumed.
  • Example: correlation between statistics grades
    and math grades

There is a positive correlation between math and
statistics grades, rs = .455.
A 1-tailed test is used because of the directed
hypothesis (positive relation between math and
statistics grades).
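Spearman's rs is Pearson's r computed on ranks instead of raw values. A minimal sketch, assuming tied values receive the average of the ranks they occupy (the usual convention); the function names are hypothetical and the data in the usage lines are made up, not the grades.sav data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rs(x, y):
    """Spearman's rs: Pearson's r on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 30]))                # [1.0, 2.5, 2.5, 4.0]
print(spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotonic, so rs = 1
```

Because only the order matters, any monotonic relation (like the squares above) gives rs = 1 even though it is not linear.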
18
Kendall's tau (τ) coefficient using the grades.sav
data
  • Kendall's τ is also a non-parametric
    correlation coefficient
  • It is used for small data sets with a large
    number of tied ranks.
  • It is said to give a more accurate estimate of the
    true (population) correlation than Spearman's rs

Analyze → Correlate → Bivariate → Kendall's tau
Kendall's τ shows the same positive correlation
between statistics and math grades as Spearman's rs,
but τ is smaller.
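Kendall's τ compares every pair of cases: a pair is concordant if both variables order it the same way and discordant otherwise. The sketch below implements the tie-corrected version, tau-b, which, as far as we know, is the variant SPSS reports; the function name and data are hypothetical. For x = [1, 2, 3, 4] and y = [1, 3, 2, 4] it gives τ = .667, while Spearman's rs for the same data is .8, illustrating that τ tends to be smaller.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    n_pairs = tied_x = tied_y = concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        n_pairs += 1
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1          # pair tied on x
        if dy == 0:
            tied_y += 1          # pair tied on y
        if dx * dy > 0:
            concordant += 1      # same ordering on both variables
        elif dx * dy < 0:
            discordant += 1      # opposite ordering
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


print(round(kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # 0.667
```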
19
Biserial and point-biserial correlations
  • Biserial and point-biserial correlations are used
    when one of the two variables is dichotomous
  • The point-biserial correlation coefficient rpb is
    used when the underlying variable is truly
    dichotomous, e.g., male/female, pregnant/not
    pregnant, dead/alive, etc.
  • The point-biserial correlation coefficient is
    simply Pearson's r (with the dichotomous variable
    coded 0/1)
  • The biserial correlation coefficient rb is used
    when the underlying variable is dichotomous on the
    surface, e.g., having passed or failed an exam,
    but underlyingly continuous, as expressed in the
    exact points earned.
  • rb cannot be calculated directly in SPSS

20
Point-biserial correlation using the pbcorr.sav
data
  • Q: How is 'time roaming around' correlated with
    gender (male, female) in a sample of cats?
  • The Pearson coefficient is r = .378
  • R² = .143, i.e., gender accounts for 14.3% of the
    variance in the cats' roaming around
  • Whether the coefficient is positive or negative
    depends on which category is assigned which code.
    It reverses sign (from positive to negative) if
    the variable 'recode' (1 = female, 0 = male) is
    used instead of 'gender'
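A small sketch of both points: the point-biserial r is plain Pearson's r with the dichotomous variable coded 0/1, and swapping the codes flips only the sign. The roaming times are hypothetical, not the pbcorr.sav data.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson's r; with a 0/1-coded x this is the point-biserial r_pb."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


roaming = [20, 25, 18, 30, 12, 15, 10, 14]  # hypothetical hours per week
male = [1, 1, 1, 1, 0, 0, 0, 0]             # coding: 1 = male, 0 = female
female = [1 - g for g in male]              # recoded: 1 = female, 0 = male

r_pb = pearson_r(male, roaming)
r_pb_recoded = pearson_r(female, roaming)
print(round(r_pb, 3), round(r_pb_recoded, 3))  # same size, opposite sign
```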

21
Point-biserial correlation using the pbcorr.sav
data
  • Analyze → Correlate → Bivariate → Pearson

Coding male = 1, female = 0 vs. coding male = 0,
female = 1: whether the correlation is positive or
negative depends entirely on the coding of the
variable (male/female)
22
Computing rb (biserial r) from rpb
(point-biserial r)
  • If the underlying variable 'gender' is not truly
    dichotomous (because some male cats are
    neutered), then the rb coefficient can be used,
    computed with the equation
  • (E 4.4) $$r_b = \frac{r_{pb}\sqrt{P_1 P_2}}{y}$$
  • where P1 is the proportion of cases that fell
    into category 1 (male) and P2 the proportion in
    category 2 (female).

23
Computing rb from rpb
  • From the 'Frequencies' menu we obtain
  • P1 = 53.3% (male)
  • P2 = 46.7% (female)
  • y = the ordinate (height) of the standard normal
    distribution at the point where P1 stops and P2
    begins
  • In Appendix A.1, we find .3977 for .468 as the
    smaller portion and .532 as the bigger portion.
  • Computing Equation 4.4 with y = .3977 yields an
    rb of .475
  • rb is much higher than rpb (.475 vs. .378)!
  • → It makes a difference whether the variable is
    truly dichotomous or continuous!
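Equation 4.4 can also be evaluated without the table: Python's `statistics.NormalDist` (Python 3.8+) provides the inverse CDF to locate the split point and the density to obtain the ordinate y. A sketch with the slide's values (rpb = .378, P1 = .533); it gives about .474, which agrees with the slide's .475 up to rounding of the inputs. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def biserial_from_point_biserial(r_pb, p1):
    """Equation 4.4: r_b = r_pb * sqrt(P1 * P2) / y."""
    p2 = 1.0 - p1
    std_normal = NormalDist()                     # mean 0, SD 1
    z_split = std_normal.inv_cdf(min(p1, p2))     # where P1 stops and P2 begins
    y = std_normal.pdf(z_split)                   # ordinate of the normal curve there
    return r_pb * sqrt(p1 * p2) / y


r_b = biserial_from_point_biserial(0.378, 0.533)
print(round(r_b, 3))
```

Because the normal curve is symmetric, using the smaller or the larger portion gives the same ordinate y.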

24
Correlation and causality: The standard disclaimer
  • The correlation between two variables is an
    undirected relationship
  • A causal interpretation is a directed
    relationship, with the causing variable (causer)
    necessarily preceding and determining the caused
    variable (causee) in a meaningful way
  • Problem 1: There could be a third variable
    mediating between the first two variables.
  • Problem 2: There could be a complex interaction
    between the two variables, e.g., a positive
    feedback loop between exam anxiety and exam
    performance
  • Therefore, never ever interpret a correlation as
    a causal story

25
Example: Correlation ≠ Causation
  • Correlation
  • The number of storks and the number of human
    babies in a country are positively correlated.
  • Causation?
  • Does the birth rate go up because there are more
    storks? Do the storks bring the babies after all?

26
Partial correlation using the ExamAnxiety.sav data
  • A partial correlation correlates two variables
    while holding one or more additional variables
    constant
  • In the ExamAnxiety data,
  • Revision time is related to exam performance
  • Revision time is related to anxiety
  • In order to find out the true correlation between
    exam performance and anxiety, we have to partial
    out revision time

27
Segmenting the variance
  • Anxiety accounts for 19.4% of the variance in
    performance (r = -.441)
  • Revision time accounts for 15.7% of the variance
    in performance (r = .397)
  • Revision time accounts for 50.2% of the variance
    in anxiety (r = -.709)
  • → Part of the 19.4% of the variation in
    performance that anxiety accounts for may
    actually be accounted for by revision time
  • → Partial correlations may address the
    third-variable problem to some extent

28
Diagram depicting partial correlations
[Figure: Venn diagram of exam performance, revision time, and exam anxiety, showing the unique variance explained by revision time, the unique variance explained by exam anxiety, and the variance explained by both]
29
Partial correlation between exam anxiety and exam
performance while controlling for revision time
Analyze → Correlate → Partial
30
Partial Correlation
  • The partial correlation between anxiety and
    performance is r = -.2467 (R² = .06) once
    revision time has been taken out.
  • (Originally it had been r = -.441)
  • R² has shrunk from 19.4% to 6%
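The SPSS result can be checked by hand from the three bivariate correlations with the standard first-order partial correlation formula r_xy·z = (r_xy - r_xz·r_yz) / √((1 - r_xz²)(1 - r_yz²)). A sketch (function name hypothetical) using the values from the slides; it gives about -.246, which matches -.2467 up to rounding of the input correlations.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = exam performance, y = exam anxiety, z = revision time (values from the slides)
r_perf_anx, r_perf_rev, r_anx_rev = -0.441, 0.397, -0.709
pr = partial_r(r_perf_anx, r_perf_rev, r_anx_rev)
print(round(pr, 3), round(pr ** 2, 3))  # partial r and its R^2
```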

31
Semi-partial (Part) correlations
  • In a semi-partial (part) correlation, the effect
    of a third variable on only one of the two
    variables is controlled, so that the unique
    correlation between the two can be assessed.
  • In a partial correlation, the effect of a third
    variable on both of the two variables is
    controlled

[Figure: two diagrams of revision, exam, and anxiety, contrasting partial and semi-partial correlation]
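For comparison, the semi-partial correlation removes revision time from anxiety only, not from performance. A sketch using the standard formula r_x(y·z) = (r_xy - r_xz·r_yz) / √(1 - r_yz²) and the correlations reported earlier (function name hypothetical). It gives about -.226, smaller in size than the partial correlation, because the denominator only corrects for the variance that z shares with y.

```python
from math import sqrt


def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of x with the part of y that is independent of z
    (z is partialled out of y only)."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_yz ** 2)


# x = exam performance, y = exam anxiety, z = revision time (values from the slides)
r_perf_anx, r_perf_rev, r_anx_rev = -0.441, 0.397, -0.709
sr = semipartial_r(r_perf_anx, r_perf_rev, r_anx_rev)
print(round(sr, 3))
```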
32
Default Homework
  • Answer the 'Smart Alex's tasks' in Chapter 4 (p.
    141)

33
Collective Homework
  • Obtain a sample from this statistics course with
    the following variables:
  • Weight, height, age, sex
  • Explore the data with the commands for
    descriptive statistics you have learned so far:
    frequencies, mean, median, mode, SD, SE, range,
    variance, etc.
  • Inspect the sample visually with histograms, box
    plots, and scatterplots (simple, overlay, multiple)
  • Test for normal distribution (K-S-Test) and
    homogeneity of variances (Levene), split files if
    necessary
  • Correlate all variables; see how high the
    correlations are and whether they are positive or
    negative
  • If necessary, partial out variances