
1
Correlation & Regression
2
Correlation
  • T-tests and ANOVA examine the mean differences
    between two or more levels of one or more IVs on
    a DV
  • i.e. differences between males and females (2
    levels of the IV gender) on exam scores
  • What if, instead of average differences, we were
    more interested in the relationship between two
    variables?
  • Relationship = how one variable changes as a
    function of another variable

3
Correlation
  • i.e. the relationship between anxiety prior to a
    medical procedure and the patient's post-op
    recovery
  • This type of question concerns what is called a
    correlation
  • Correlation = relationship between two variables
  • NOTE: if we were looking at average post-op
    recovery (the DV) in groups both high and low in
    pre-op anxiety (2 levels of the IV anxiety), we
    would be looking at mean differences, and an
    ANOVA would be more appropriate than correlation

4
Correlation
  • The easiest means of representing this
    relationship/correlation is via the use of a
    scatterplot
  • Scatterplot = a graph in which the individual
    data points are plotted in two dimensions

5
Correlation
  • Predictor Variable = traditionally the variable
    on the x-axis (in this case Depression)
  • Criterion Variable = traditionally the variable
    on the y-axis (in this case Pessimism)
  • Best-Fit Line/Regression Line = the line that
    represents the area in space that each data point
    is minimally distant from, i.e. the line that
    best represents the data

6
Correlation
  • Regression Line
  • Best-fit line that minimizes the average distance
    from all data points (i.e. the residuals)
  • Residual = amount that a data point deviates from
    this line

7
Correlation
  • It is important to note that although the
    predictor is usually the variable on the x-axis,
    and the criterion the variable on the y-axis,
    often these conventions are not adhered to and
    the variables are named arbitrarily
  • Also, the fact that one variable is called the
    predictor does not mean that it predicts the
    criterion in the sense that it can tell you what
    the criterion is before it occurs
  • i.e. to say that depression predicts pessimism
    does not mean that depression comes first and
    causes you to be pessimistic!

8
Correlation
  • Correlation does not equal causation!
  • the only way that you can say that one variable
    predicts another in time is through the design of
    your experiment
  • if depression were assessed in January and
    pessimism were assessed in December, and the two
    were found to be related, then you can say that
    one predicts the other in time
  • statistical prediction ≠ prediction in time
  • if the two variables were measured at the same
    time, we do not know which one caused the other

9
Correlation
  • to determine causation (that one variable caused
    another) we need to show several things
  • that the predictor preceded the criterion in time
    (this also shows that the criterion did not cause
    the predictor)
  • that other variables did not cause both the
    criterion and the predictor at the same time,
    resulting in their relationship
  • [Diagram: a third variable, Var 1, causing both
    the IV and the DV]

10
Correlation
  • i.e. if we were studying the relationship
    (correlation) between two variables, the length
    of grass and ice cream consumption
  • If they were measured simultaneously, it would be
    impossible to tell which caused which
  • If both were measured at two time points, July
    and December, we would find that they both
    increase and decrease at the same time (i.e. one
    does not seem to cause the other) = no causation
  • If we measured temperature as well, we would find
    that both are correlated because increases in
    temperature cause both, which explains why they
    increase and decrease at the same time

11
Correlation
  • Correlation is represented by the Pearson
    Product-Moment Correlation Coefficient (r)
  • r can range from -1 to 1, where 1 represents a
    strong positive relationship, -1 a strong
    negative relationship, and 0 no relationship
    between the two variables
  • both strong positive and strong negative
    relationships are, nonetheless, robust
    relationships and are generally meaningful; a
    negative relationship is not "bad"
  • r is only used when the two variables are
    continuous/dimensional

12
Correlation
  • Positive Relationship (r = .82)
  • As BDI2TOT increases, MASQGDD also increases

13
Correlation
  • Negative Relationship (r = -.679)
  • As MASQAD increases, TMMSREP decreases

14
Correlation
  • No Relationship (r = .00)
  • Information about Explanatory Flexibility tells
    you nothing about Emotional Insight

15
Correlation
  • Pearson's r is heavily reliant on the covariance
  • cov_xy = Σ(X - mean(X))(Y - mean(Y)) / (N - 1)
  • If variance = s² = Σ(X - mean(X))² / (N - 1),
    then cov is just the average variability shared
    in both x and y

16
Correlation
  • Error variance = average amount each point
    deviates from the best-fit line = standard error
    of the estimate, s_y.x
  • s_y.x = √( Σ(Y - Ŷ)² / (N - 2) )
  • If Ŷ is the point on the best-fit line (the
    predicted value of Y), then s_y.x = standard
    deviation of the residuals, and s²_y.x = variance
    of the residuals/error = error variance
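
(A minimal sketch of the standard error of the estimate, assuming Python with numpy and hypothetical data; the N - 2 degrees of freedom reflect the estimated slope and intercept.)

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # fit the best-fit line, then compute predicted values (Y-hat) and residuals
    b, a = np.polyfit(x, y, 1)
    residuals = y - (b * x + a)

    # s_y.x = sqrt( sum of squared residuals / (N - 2) )
    s_yx = np.sqrt(np.sum(residuals**2) / (len(x) - 2))
    print(s_yx)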

17
Correlation
  • Pearson's r = cov_xy / (s_x × s_y)
  • Correlation = amount of shared variability /
    √(total variability)
  • Since it's like a %, r ranges from 0 to (±)1.00
  • In fact, by squaring r (r²) you get the % of
    variability that is shared between x and y
  • Previous example of BDI2 and MASQGDD: r = .82,
    r² = .67 → 67% of the variance in BDI2 is
    predicted by MASQGDD

18
Correlation
  • Hypotheses in Correlation
  • H0: ρ = 0
  • ρ (rho) = correlation in the population (a
    parameter)
  • H1: ρ ≠ 0

19
Correlation
  • Assumptions of Correlation (Pearson's r)
  • Nonlinear/Curvilinear Relationships
  • If the relationship between the two variables is
    not linear, and is instead U-shaped or
    bell-shaped (like our normal distribution), our
    attempts at finding a best-fit line will fail,
    and it will seem as though our two variables are
    unrelated (r will approximate 0), when in fact a
    relationship exists, but is nonlinear
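
(A quick demonstration of this point, assuming numpy: a perfect U-shaped relationship still yields r near 0.)

    import numpy as np

    x = np.linspace(-3, 3, 101)
    y = x**2  # y is perfectly (but nonlinearly) determined by x

    print(np.corrcoef(x, y)[0, 1])  # ~0: Pearson's r misses the U-shape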

20
Correlation
  • Above is an example of a curvilinear
    relationship: although the two variables are
    clearly related, their correlation is only
    r = -.205
  • Note how the best-fit line does not represent the
    data points well

21
Correlation
  • Assumptions of Correlation (Pearson's r)
  • Normality
  • Both variables must be normally distributed;
    otherwise the correlation will appear smaller
    than it is
  • If our data are non-normal, correlation
    coefficients other than r can be used

22
Correlation
  • We can also calculate r if our data are ordinal
    instead of continuous/dimensional
  • Remember: data on an ordinal scale are ranked,
    which means that we can tell that one number is
    higher than another, but not how much higher
    (interval scales have this), and there is no zero
    point (ratio scales have this); i.e. 1st place,
    2nd place, etc. = ordinal data
  • Correlation here is represented by Spearman's r_s
  • The difference between r and r_s is that r_s
    requires that the data be monotonic, or
    constantly rising or falling: if data are
    arranged in rank order, they can only go up or
    down; you can't go from 1st place to 9th place to
    2nd place if the places are arranged in order
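
(A sketch contrasting r and r_s, assuming Python with scipy; the data are hypothetical but perfectly monotonic.)

    from scipy import stats

    # y rises with x at every step, but not at a constant rate
    x = [1, 2, 3, 4, 5, 6]
    y = [1, 4, 9, 16, 25, 36]

    print(stats.pearsonr(x, y))   # r < 1: the relationship is not linear
    print(stats.spearmanr(x, y))  # r_s = 1: the ranks agree perfectly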

23
Correlation
  • Other correlation coefficients
  • The Point-Biserial Correlation coefficient (r_pb)
    - used if one variable is continuous/dimensional
    and the other dichotomous (a nominal scale where
    the variable can take only two possible values)
  • Dichotomous variables, e.g. Gender
    (Male/Female), Yes/No answers, Race (if it is
    coded as Caucasian or Minority), etc.
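
(A sketch of the point-biserial coefficient, assuming scipy and hypothetical scores; the dichotomous variable is coded 0/1.)

    from scipy import stats

    gender = [0, 0, 0, 0, 1, 1, 1, 1]          # dichotomous (0 = male, 1 = female)
    score  = [72, 65, 80, 70, 85, 90, 78, 88]  # continuous exam scores

    print(stats.pointbiserialr(gender, score))  # r_pb and its p-value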

24
Correlation
  • Other correlation coefficients
  • Phi (φ) = used when both variables are
    dichotomous
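
(Phi can be computed as Pearson's r on two 0/1-coded variables; a sketch assuming numpy and hypothetical data.)

    import numpy as np

    var1 = np.array([1, 1, 0, 0, 1, 0, 1, 0])  # dichotomous
    var2 = np.array([1, 0, 0, 0, 1, 0, 1, 1])  # dichotomous

    # phi = Pearson's r applied to two dichotomous variables
    print(np.corrcoef(var1, var2)[0, 1])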

25
Correlation
  • Factors that bias correlation coefficients
  • Range Restriction
  • Typically, restricting range reduces correlations
  • Full Dataset (r = .82); Only BDI > 30 (r = .490)
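
(A simulation of range restriction, assuming numpy; the population r is built in at about .8, mirroring the full-vs.-restricted contrast above.)

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = 0.8 * x + rng.normal(scale=0.6, size=1000)  # true r is about .8

    full_r = np.corrcoef(x, y)[0, 1]
    keep = x > 1.0                                  # keep only high scorers
    restricted_r = np.corrcoef(x[keep], y[keep])[0, 1]
    print(full_r, restricted_r)                     # restricted r is much smaller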

26
Correlation
  • However, restricting range increases correlations
    if the relationship is curvilinear, because it
    makes the relationship linear
  • Full Dataset (r = -.205); Only Var1 ≤ 5
    (r = -.982)

27
Correlation
  • Problems of range restriction are common in
    psychological research, because researchers want
    their groups to be as different from each other
    as possible to increase the effect sizes that
    they obtain
  • Remember: the formula for effect size for ANOVA
    (Cohen's d) is the mean for Group 1 minus the
    mean for Group 2, divided by s_p
  • To get highly different groups, researchers
    sample those high and low on a particular
    variable
  • i.e. comparing those highest on aggression to
    those lowest on aggression
  • This is identical to only looking at BDI2 scores
    higher than 30; when looking at the full range of
    scores, correlations will be more accurate

28
Correlation
  • Factors that bias correlation coefficients
  • Heterogeneous Subsamples
  • This is a problem when there is an interaction
    present (i.e. our age by gender interaction
    mentioned in the discussion of Factorial ANOVA)

29
  • If males' performance increases as they age, and
    women's performance remains the same, when the
    two genders are averaged together and age and
    performance are correlated regardless of gender,
    the correlation will be smaller
  • Strong correlation of age and performance for
    males + weak correlation of age and performance
    for females = biased correlation when the two are
    added together
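
(A simulation of this gender-by-age scenario, assuming numpy and hypothetical effect sizes.)

    import numpy as np

    rng = np.random.default_rng(1)
    age = rng.uniform(20, 60, size=200)
    male = np.repeat([1, 0], 100)
    # males' performance rises with age; females' stays flat
    perf = np.where(male == 1, 0.5 * age, 20.0) + rng.normal(scale=3, size=200)

    print(np.corrcoef(age[male == 1], perf[male == 1])[0, 1])  # strong for males
    print(np.corrcoef(age[male == 0], perf[male == 0])[0, 1])  # ~0 for females
    print(np.corrcoef(age, perf)[0, 1])                        # diluted combined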

30
Correlation
  • Factors that bias correlation coefficients
  • Outliers
  • No Outliers (r = .989); Outlier (r = .522)
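
(A sketch of the outlier effect, assuming numpy and hypothetical data: one extreme point drags r down.)

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 2.0, 2.9, 4.2, 5.0])
    print(np.corrcoef(x, y)[0, 1])          # near-perfect correlation

    x_out = np.append(x, 10.0)              # add a single outlier
    y_out = np.append(y, -5.0)
    print(np.corrcoef(x_out, y_out)[0, 1])  # r drops sharply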

31
Correlation
  • Testing correlations for significance
  • just like t- and F-statistics, r-statistics can
    be tested for significance
  • just like t- and F-statistics, with increasing
    sample size (n), smaller correlations (r's) will
    be significant
  • with 25 people, r = .396 is significant at
    p < .05; with 1000 people you only need an
    r = .062 (see Table E.2, page 515 in your text)
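
(A sketch of testing r for significance, assuming scipy; pearsonr returns both r and its two-tailed p-value.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(size=25)
    y = 0.4 * x + rng.normal(size=25)  # modest built-in relationship

    r, p = stats.pearsonr(x, y)
    print(r, p)  # with n = 25, r must be fairly large to reach p < .05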

32
Correlation
  • Testing correlations for significance
  • the r-statistic is also its own built-in effect
    size statistic
  • Cohen's conventions for r: .1 = small,
    .3 = medium, and .5 = large effects
  • by squaring r (r²), you also get a relatively
    unbiased effect size estimate that is interpreted
    identically to η² and ω²
  • Remember: η² and ω² represent the percent of
    variability in one variable accounted for by the
    other

33
Correlation
  • Testing correlations for significance
  • Therefore, if
  • r = .5, p = .00001: you can state that your two
    variables are strongly (effect size) and reliably
    (p-value) related
  • r = .5, p = .65: you can conclude that your two
    variables are strongly related, but that you
    probably didn't have enough subjects for this to
    be reflected in your p-value
  • r = .1, p = .00001: you can conclude that the
    large sample size inflated the significance of
    your p-value, and your variables are probably not
    meaningfully related
  • r = .1, p = .65: you can conclude that your two
    variables are neither strongly nor reliably
    related

34
Regression
  • The best-fit line allows us to make educated
    guesses about what a score on one variable would
    be, given a score on the other
  • Extrapolate = make an educated guess at a score
    that is either higher or lower than any actual
    score obtained
  • Interpolate = make an educated guess at a score
    that is in the range of the scores obtained, but
    that was not actually obtained

35
Regression
  • Range of scores on Depression = 0 - 49
  • Range of scores on Pessimism = 1 - 7
  • Extrapolation: What pessimism score would be
    associated with a depression score of 50? (6.8)
  • Interpolation: What pessimism score would be
    associated with a depression score of 45? (5.5)
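
(A sketch of interpolation and extrapolation from a best-fit line, assuming numpy; the depression/pessimism values are hypothetical stand-ins for the slide's data.)

    import numpy as np

    dep  = np.array([0, 5, 12, 20, 28, 35, 42, 49])  # observed range 0-49
    pess = np.array([1.2, 1.8, 2.5, 3.4, 4.2, 5.0, 5.9, 6.7])

    b, a = np.polyfit(dep, pess, 1)  # slope and intercept of the best-fit line

    print(b * 45 + a)  # interpolation: 45 lies inside the observed range
    print(b * 50 + a)  # extrapolation: 50 lies just above any observed score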

36
Regression
  • Interested in the linear relationship between 2
    variables = use correlation
  • Interested in the linear relationship(s) among 3
    or more dimensional variables = use regression
  • DV = Symptoms of paranoia
  • IV = Treatment vs. Control groups → ANOVA
  • IV is discrete (dichotomous/polychotomous)
  • IV = # of sessions of treatment → Regression
  • IV is dimensional/continuous

37
Regression
  • DV = Criterion, IVs = Predictors
  • Criterion = b1x1 + b2x2 + b3x3 + a
  • x1 = predictor 1; b1 = slope of x1 and the DV;
    a = intercept; slope = rate of change
  • b = .75 → a 1 pt. increase in the IV is
    associated with a .75 pt. increase in the DV
  • i.e. for every 1 pt. increase in pessimism, Dep
    increases .75 pt.
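
(A sketch of fitting Criterion = b1x1 + b2x2 + a by least squares, assuming numpy; the true coefficients are built into the simulated data.)

    import numpy as np

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=50)
    x2 = rng.normal(size=50)
    y = 0.75 * x1 + 0.30 * x2 + 2.0 + rng.normal(scale=0.5, size=50)

    # columns: predictor 1, predictor 2, and a constant column for the intercept a
    X = np.column_stack([x1, x2, np.ones_like(x1)])
    b1, b2, a = np.linalg.lstsq(X, y, rcond=None)[0]
    print(b1, b2, a)  # close to the built-in .75, .30, and 2.0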

38
Regression
  • Slope
  • Slope w/ raw data = b
  • i.e. b = .45 in the prediction of GPA from IQ →
    a 1 pt. increase in IQ is associated with a
    roughly ½ pt. increase in GPA
  • Slope w/ standardized data = β
  • Standardize data (i.e. convert to z-scores) to
    compare slopes between experiments
  • β = b(s_x / s_y)
  • i.e. β = .53 → a 1 s.d. increase in IQ is
    associated with a roughly ½ s.d. increase in GPA
  • b is more interpretable if the scale of the
    variables is meaningful
  • Intercept = value of the DV when the IV = 0
  • In the previous example, Pess = 3 when Dep = 0,
    so a = 3
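
(A sketch of the raw vs. standardized slope, assuming numpy; it checks that β = b(s_x / s_y) equals the slope obtained after z-scoring both variables.)

    import numpy as np

    rng = np.random.default_rng(4)
    iq  = rng.normal(100, 15, size=200)
    gpa = 0.02 * iq + rng.normal(scale=0.3, size=200)

    b = np.polyfit(iq, gpa, 1)[0]                # raw slope
    beta = b * iq.std(ddof=1) / gpa.std(ddof=1)  # beta = b * (s_x / s_y)

    # the same beta falls out of regressing z-scored GPA on z-scored IQ
    z = lambda v: (v - v.mean()) / v.std(ddof=1)
    print(beta, np.polyfit(z(iq), z(gpa), 1)[0])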

39
Regression
  • Regression can test
  • The overall ability of all of your IVs to
    predict your criterion (overall model/omnibus R²)
  • The ability of each IV to predict your criterion
    (b or β)
  • Each of these statistics is associated with a
    p-value and is tested for significance
  • Regression can also be used to make predictions
    based on the best-fit/regression line (less
    common)

40
Regression
  • Hypotheses in Regression
  • H0: b/β/R² (in the population) = 0
  • H1: b/β/R² (in the population) ≠ 0

41
Regression
  • Assumptions of Regression
  • Linearity of Regression
  • Variables are linearly related to one another
  • Normality in Arrays
  • Actual values of the DV are normally distributed
    around the predicted values (i.e. the regression
    line), AKA the regression line is a good
    approximation of the population parameter
  • Homogeneity of Variance in Arrays
  • Assumes that the variance of the criterion is
    equal for all levels of the predictor(s)
  • Sound familiar?
  • Variance of the DV is equal for all levels of the
    IV(s)

42
Correlation/Regression
  • Correlation & Regression can also answer other
    kinds of questions
  • Can test the difference between 2 independent
    r's/b's
  • Is r_ab > r_cd?
  • Is the correlation between depression and anxiety
    using the BDI and BAI larger than the same
    correlation using the MASQ-AD and MASQ-AA
    subscales?
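
(One common way to test two independent r's is Fisher's z transformation; a sketch assuming numpy and scipy, with a hypothetical helper name and made-up r's and n's.)

    import numpy as np
    from scipy import stats

    def compare_independent_rs(r1, n1, r2, n2):
        # Fisher z-transform each r, then compare with a z-test
        z1, z2 = np.arctanh(r1), np.arctanh(r2)
        se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
        z = (z1 - z2) / se
        p = 2 * stats.norm.sf(abs(z))  # two-tailed p-value
        return z, p

    print(compare_independent_rs(.60, 100, .40, 120))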

43
Correlation/Regression
  • Can test the difference between 2 dependent
    r's/b's
  • Is r_ab > r_bc?
  • Is the correlation between rumination and
    depression as high as between rumination and
    generalized anxiety?
  • Is the correlation between rumination and
    depression at Time 1 the same at Time 2, 4 weeks
    later?
  • Don't worry about how to do these calculations by
    hand