
1
Correlation & Regression
2
Correlation
  • T-tests and ANOVA examine the mean differences
    between two or more levels of one or more IVs on
    a DV
  • i.e. differences between males and females (2
    levels of the IV gender) on exam scores
  • What if, instead of average differences, we were
    more interested in the relationship between two
    variables?
  • Relationship = how one variable changes as a
    function of another variable

3
Correlation
  • i.e. the relationship between anxiety prior to a
    medical procedure and the patient's post-op
    recovery
  • This type of question concerns what is called a
    correlation
  • Correlation = relationship between two variables
  • NOTE: if we were looking at average post-op
    recovery (the DV) in groups both high and low in
    pre-op anxiety (2 levels of the IV anxiety), we
    would be looking at mean differences, and an
    ANOVA would be more appropriate than correlation

4
Correlation
  • The easiest means of representing this
    relationship/correlation is via the use of a
    scatterplot
  • Scatterplot = a graph in which the individual
    data points are plotted in two dimensions

5
Correlation
  • Predictor Variable = traditionally the variable
    on the x-axis (in this case Depression)
  • Criterion Variable = traditionally the variable
    on the y-axis (in this case Pessimism)
  • Best-Fit Line/Regression Line = the line that
    represents the area in space that each data point
    is minimally distant from, i.e. the line that
    best represents the data

6
Correlation
  • Regression Line
  • Best-fit line that minimizes the average distance
    from all data points (i.e. the residuals)
  • Residual = amount that a data point deviates from
    this line

7
Correlation
  • It is important to note that although the
    predictor is usually the variable on the x-axis,
    and the criterion the variable on the y-axis,
    often these conventions are not adhered to and
    the variables are named arbitrarily
  • Also, the fact that one variable is called the
    predictor does not mean that it predicts the
    criterion in the sense that it can tell you what
    the criterion is before it occurs
  • i.e. to say that depression predicts pessimism
    does not mean that depression comes first and
    causes you to be pessimistic!

8
Correlation
  • Correlation does not equal causation!
  • the only way that you can say that one variable
    predicts another in time is through the design of
    your experiment
  • if depression were assessed in January and
    pessimism were assessed in December, and the two
    were found to be related, then you can say that
    one predicts the other in time
  • statistical prediction ≠ prediction in time
  • if the two variables were measured at the same
    time, we do not know which one caused the other

9
Correlation
  • to determine causation (that one variable caused
    another) we need to show several things
  • that the predictor preceded the criterion in time
    (this also shows that the criterion did not cause
    the predictor)
  • that other variables did not cause both the
    criterion and the predictor at the same time,
    resulting in their relationship
  • [Diagram: a third variable, Var 1, causing both
    the IV and the DV]

10
Correlation
  • i.e. if we were studying the relationship
    (correlation) between two variables, the length
    of grass and ice cream consumption
  • If they were measured simultaneously, it would be
    impossible to tell which caused which
  • If both were measured at two time points, July
    and December, we would find that they both
    increase and decrease at the same time (i.e. one
    does not seem to cause the other) = no causation
  • If we measured temperature as well, we would find
    that both are correlated because increases in
    temperature cause both, which explains why they
    increase and decrease at the same time

11
Correlation
  • Correlation is represented by the Pearson
    Product-Moment Correlation Coefficient (r)
  • r can range from -1 to 1, where 1 represents a
    strong positive relationship, -1 a strong
    negative relationship, and 0 no relationship
    between the two variables
  • both strong positive and strong negative
    relationships are, nonetheless, robust
    relationships and are generally meaningful; a
    negative relationship is not "bad"
  • r is only used when the two variables are
    continuous/dimensional

12
Correlation
  • Positive Relationship (r = .82)
  • As BDI2TOT increases, MASQGDD also increases

13
Correlation
  • Negative Relationship (r = -.679)
  • As MASQAD increases, TMMSREP decreases

14
Correlation
  • No Relationship (r = .00)
  • Information about Explanatory Flexibility tells
    you nothing about Emotional Insight

15
Correlation
  • Pearson's r is heavily reliant on the covariance
  • cov_xy = Σ(X - mean(X))(Y - mean(Y)) / (N - 1)
  • If variance = s² = Σ(X - mean(X))² / (N - 1),
    then cov is just the average variability shared
    in both x and y

16
Correlation
  • Error variance = average amount each point
    deviates from the best-fit line = standard error
    of the estimate, s_y.x
  • s_y.x = √( Σ(Y - Ŷ)² / (N - 2) )
  • If Ŷ is the point on the best-fit line (the
    predicted value of Y), then s_y.x = standard
    deviation of the residuals, and s²_y.x = variance
    of the residuals/error = error variance
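
(A minimal sketch of the standard error of the estimate, assuming Python with numpy and hypothetical data; the N - 2 degrees of freedom reflect the estimated slope and intercept.)

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # fit the best-fit line, then compute predicted values (Y-hat) and residuals
    b, a = np.polyfit(x, y, 1)
    residuals = y - (b * x + a)

    # s_y.x = sqrt( sum of squared residuals / (N - 2) )
    s_yx = np.sqrt(np.sum(residuals**2) / (len(x) - 2))
    print(s_yx)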

17
Correlation
  • Pearson's r = cov_xy / (s_x × s_y)
  • Correlation = amount of shared variability /
    √(total variability)
  • Since it's like a %, r ranges from 0 to (±)1.00
  • In fact, by squaring r (r²) you get the % of
    variability that is shared between x and y
  • Previous example of BDI2 and MASQGDD: r = .82,
    r² = .67 → 67% of the variance in BDI2 is
    predicted by MASQGDD

18
Correlation
  • Hypotheses in Correlation
  • H0: ρ = 0
  • ρ (rho) = correlation in the population (a
    parameter)
  • H1: ρ ≠ 0

19
Correlation
  • Assumptions of Correlation (Pearson's r)
  • Nonlinear/Curvilinear Relationships
  • If the relationship between the two variables is
    not linear, and is instead U-shaped or
    bell-shaped (like our normal distribution), our
    attempts at finding a best-fit line will fail,
    and it will seem as though our two variables are
    unrelated (r will approximate 0), when in fact a
    relationship exists, but is nonlinear
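
(A quick demonstration of this point, assuming numpy: a perfect U-shaped relationship still yields r near 0.)

    import numpy as np

    x = np.linspace(-3, 3, 101)
    y = x**2  # y is perfectly (but nonlinearly) determined by x

    print(np.corrcoef(x, y)[0, 1])  # ~0: Pearson's r misses the U-shape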

20
Correlation
  • Above is an example of a curvilinear
    relationship: although the two variables are
    clearly related, their correlation is only
    r = -.205
  • Note how the best-fit line does not represent the
    data points well

21
Correlation
  • Assumptions of Correlation (Pearson's r)
  • Normality
  • Both variables must be normally distributed;
    otherwise the correlation will appear smaller
    than it is
  • If our data are non-normal, correlation
    coefficients other than r can be used

22
Correlation
  • We can also calculate r if our data are ordinal
    instead of continuous/dimensional
  • Remember: data on an ordinal scale are ranked,
    which means that we can tell that one number is
    higher than another, but not how much higher
    (interval scales have this), and there is no zero
    point (ratio scales have this); i.e. 1st place,
    2nd place, etc. = ordinal data
  • Correlation here is represented by Spearman's r_s
  • The difference between r and r_s is that r_s
    requires that the data be monotonic, or
    constantly rising or falling: if data are
    arranged in rank order, they can only go up or
    down; you can't go from 1st place to 9th place to
    2nd place if the places are arranged in order
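
(A sketch contrasting r and r_s, assuming Python with scipy; the data are hypothetical but perfectly monotonic.)

    from scipy import stats

    # y rises with x at every step, but not at a constant rate
    x = [1, 2, 3, 4, 5, 6]
    y = [1, 4, 9, 16, 25, 36]

    print(stats.pearsonr(x, y))   # r < 1: the relationship is not linear
    print(stats.spearmanr(x, y))  # r_s = 1: the ranks agree perfectly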

23
Correlation
  • Other correlation coefficients
  • The Point-Biserial Correlation coefficient (r_pb)
    - used if one variable is continuous/dimensional
    and the other dichotomous (a nominal scale where
    the variable can take only two possible values)
  • Dichotomous variables, e.g. Gender
    (Male/Female), Yes/No answers, Race (if it is
    coded as Caucasian or Minority), etc.
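
(A sketch of the point-biserial coefficient, assuming scipy and hypothetical scores; the dichotomous variable is coded 0/1.)

    from scipy import stats

    gender = [0, 0, 0, 0, 1, 1, 1, 1]          # dichotomous (0 = male, 1 = female)
    score  = [72, 65, 80, 70, 85, 90, 78, 88]  # continuous exam scores

    print(stats.pointbiserialr(gender, score))  # r_pb and its p-value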

24
Correlation
  • Other correlation coefficients
  • Phi (φ) = used when both variables are
    dichotomous
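
(Phi can be computed as Pearson's r on two 0/1-coded variables; a sketch assuming numpy and hypothetical data.)

    import numpy as np

    var1 = np.array([1, 1, 0, 0, 1, 0, 1, 0])  # dichotomous
    var2 = np.array([1, 0, 0, 0, 1, 0, 1, 1])  # dichotomous

    # phi = Pearson's r applied to two dichotomous variables
    print(np.corrcoef(var1, var2)[0, 1])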

25
Correlation
  • Factors that bias correlation coefficients
  • Range Restriction
  • Typically, restricting range reduces correlations
  • Full Dataset (r = .82); Only BDI > 30 (r = .490)
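
(A simulation of range restriction, assuming numpy; the population r is built in at about .8, mirroring the full-vs.-restricted contrast above.)

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = 0.8 * x + rng.normal(scale=0.6, size=1000)  # true r is about .8

    full_r = np.corrcoef(x, y)[0, 1]
    keep = x > 1.0                                  # keep only high scorers
    restricted_r = np.corrcoef(x[keep], y[keep])[0, 1]
    print(full_r, restricted_r)                     # restricted r is much smaller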

26
Correlation
  • However, restricting range increases correlations
    if the relationship is curvilinear, because it
    makes the relationship linear
  • Full Dataset (r = -.205); Only Var1 ≤ 5
    (r = -.982)

27
Correlation
  • Problems of range restriction are common in
    psychological research, because researchers want
    their groups to be as different from each other
    as possible to increase the effect sizes that
    they obtain
  • Remember: the formula for effect size for ANOVA
    (Cohen's d) is the mean for Group 1 minus the
    mean for Group 2, divided by s_p
  • To get highly different groups, researchers
    sample those high and low on a particular
    variable
  • i.e. comparing those highest on aggression to
    those lowest on aggression
  • This is identical to only looking at BDI2 scores
    higher than 30; when looking at the full range of
    scores, correlations will be more accurate

28
Correlation
  • Factors that bias correlation coefficients
  • Heterogeneous Subsamples
  • This is a problem when there is an interaction
    present (i.e. our age by gender interaction
    mentioned in the discussion of Factorial ANOVA)

29
  • If males' performance increases as they age, and
    women's performance remains the same, when the
    two genders are averaged together and age and
    performance are correlated regardless of gender,
    the correlation will be smaller
  • Strong correlation of age and performance for
    males + weak correlation of age and performance
    for females = biased correlation when the two are
    added together
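
(A simulation of this gender-by-age scenario, assuming numpy and hypothetical effect sizes.)

    import numpy as np

    rng = np.random.default_rng(1)
    age = rng.uniform(20, 60, size=200)
    male = np.repeat([1, 0], 100)
    # males' performance rises with age; females' stays flat
    perf = np.where(male == 1, 0.5 * age, 20.0) + rng.normal(scale=3, size=200)

    print(np.corrcoef(age[male == 1], perf[male == 1])[0, 1])  # strong for males
    print(np.corrcoef(age[male == 0], perf[male == 0])[0, 1])  # ~0 for females
    print(np.corrcoef(age, perf)[0, 1])                        # diluted combined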

30
Correlation
  • Factors that bias correlation coefficients
  • Outliers
  • No Outliers (r = .989); Outlier (r = .522)
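
(A sketch of the outlier effect, assuming numpy and hypothetical data: one extreme point drags r down.)

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 2.0, 2.9, 4.2, 5.0])
    print(np.corrcoef(x, y)[0, 1])          # near-perfect correlation

    x_out = np.append(x, 10.0)              # add a single outlier
    y_out = np.append(y, -5.0)
    print(np.corrcoef(x_out, y_out)[0, 1])  # r drops sharply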

31
Correlation
  • Testing correlations for significance
  • just like t- and F-statistics, r-statistics can
    be tested for significance
  • just like t- and F-statistics, with increasing
    sample size (n), smaller correlations (r's) will
    be significant
  • with 25 people, r = .396 is significant at
    p < .05; with 1000 people you only need an
    r = .062 (see Table E.2, page 515 in your text)
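
(A sketch of testing r for significance, assuming scipy; pearsonr returns both r and its two-tailed p-value.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(size=25)
    y = 0.4 * x + rng.normal(size=25)  # modest built-in relationship

    r, p = stats.pearsonr(x, y)
    print(r, p)  # with n = 25, r must be fairly large to reach p < .05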

32
Correlation
  • Testing correlations for significance
  • the r-statistic is also its own built-in effect
    size statistic
  • Cohen's conventions for r: .1 = small,
    .3 = medium, and .5 = large effects
  • by squaring r (r²), you also get a relatively
    unbiased effect size estimate that is interpreted
    identically to η² and ω²
  • Remember: η² and ω² represent the percent of
    variability in one variable accounted for by the
    other

33
Correlation
  • Testing correlations for significance
  • Therefore, if
  • r = .5, p = .00001: you can state that your two
    variables are strongly (effect size) and reliably
    (p-value) related
  • r = .5, p = .65: you can conclude that your two
    variables are strongly related, but that you
    probably didn't have enough subjects for this to
    be reflected in your p-value
  • r = .1, p = .00001: you can conclude that the
    large sample size inflated the significance of
    your p-value, and your variables are probably not
    meaningfully related
  • r = .1, p = .65: you can conclude that your two
    variables are neither strongly nor reliably
    related

34
Regression
  • The best-fit line allows us to make educated
    guesses about what a score on one variable would
    be, given a score on the other
  • Extrapolate = make an educated guess at a score
    that is either higher or lower than any actual
    score obtained
  • Interpolate = make an educated guess at a score
    that is in the range of the scores obtained, but
    that was not actually obtained

35
Regression
  • Range of scores on Depression = 0 - 49
  • Range of scores on Pessimism = 1 - 7
  • Extrapolation: What pessimism score would be
    associated with a depression score of 50? (6.8)
  • Interpolation: What pessimism score would be
    associated with a depression score of 45? (5.5)
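
(A sketch of interpolation and extrapolation from a best-fit line, assuming numpy; the depression/pessimism values are hypothetical stand-ins for the slide's data.)

    import numpy as np

    dep  = np.array([0, 5, 12, 20, 28, 35, 42, 49])  # observed range 0-49
    pess = np.array([1.2, 1.8, 2.5, 3.4, 4.2, 5.0, 5.9, 6.7])

    b, a = np.polyfit(dep, pess, 1)  # slope and intercept of the best-fit line

    print(b * 45 + a)  # interpolation: 45 lies inside the observed range
    print(b * 50 + a)  # extrapolation: 50 lies just above any observed score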

36
Regression
  • Interested in the linear relationship between 2
    variables = use correlation
  • Interested in the linear relationship(s) among 3
    or more dimensional variables = use regression
  • DV = Symptoms of paranoia
  • IV = Treatment vs. Control groups → ANOVA
  • IV is discrete (dichotomous/polychotomous)
  • IV = # of sessions of treatment → Regression
  • IV is dimensional/continuous

37
Regression
  • DV = Criterion, IVs = Predictors
  • Criterion = b1x1 + b2x2 + b3x3 + a
  • x1 = predictor 1; b1 = slope of x1 and the DV;
    a = intercept; slope = rate of change
  • b = .75 → a 1 pt. increase in the IV is
    associated with a .75 pt. increase in the DV
  • i.e. for every 1 pt. increase in pessimism, Dep
    increases .75 pt.
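
(A sketch of fitting Criterion = b1x1 + b2x2 + a by least squares, assuming numpy; the true coefficients are built into the simulated data.)

    import numpy as np

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=50)
    x2 = rng.normal(size=50)
    y = 0.75 * x1 + 0.30 * x2 + 2.0 + rng.normal(scale=0.5, size=50)

    # columns: predictor 1, predictor 2, and a constant column for the intercept a
    X = np.column_stack([x1, x2, np.ones_like(x1)])
    b1, b2, a = np.linalg.lstsq(X, y, rcond=None)[0]
    print(b1, b2, a)  # close to the built-in .75, .30, and 2.0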

38
Regression
  • Slope
  • Slope w/ raw data = b
  • i.e. b = .45 in the prediction of GPA from IQ →
    a 1 pt. increase in IQ is associated with a
    roughly ½ pt. increase in GPA
  • Slope w/ standardized data = β
  • Standardize data (i.e. convert to z-scores) to
    compare slopes between experiments
  • β = b(s_x / s_y)
  • i.e. β = .53 → a 1 s.d. increase in IQ is
    associated with a roughly ½ s.d. increase in GPA
  • b is more interpretable if the scale of the
    variables is meaningful
  • Intercept = value of the DV when the IV = 0
  • In the previous example, Pess = 3 when Dep = 0,
    so a = 3
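
(A sketch of the raw vs. standardized slope, assuming numpy; it checks that β = b(s_x / s_y) equals the slope obtained after z-scoring both variables.)

    import numpy as np

    rng = np.random.default_rng(4)
    iq  = rng.normal(100, 15, size=200)
    gpa = 0.02 * iq + rng.normal(scale=0.3, size=200)

    b = np.polyfit(iq, gpa, 1)[0]                # raw slope
    beta = b * iq.std(ddof=1) / gpa.std(ddof=1)  # beta = b * (s_x / s_y)

    # the same beta falls out of regressing z-scored GPA on z-scored IQ
    z = lambda v: (v - v.mean()) / v.std(ddof=1)
    print(beta, np.polyfit(z(iq), z(gpa), 1)[0])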

39
Regression
  • Regression can test
  • The overall ability of all of your IVs to
    predict your criterion (overall model/omnibus R²)
  • The ability of each IV to predict your criterion
    (b or β)
  • Each of these statistics is associated with a
    p-value and is tested for significance
  • Regression can also be used to make predictions
    based on the best-fit/regression line (less
    common)

40
Regression
  • Hypotheses in Regression
  • H0: b/β/R² (in the population) = 0
  • H1: b/β/R² (in the population) ≠ 0

41
Regression
  • Assumptions of Regression
  • Linearity of Regression
  • Variables are linearly related to one another
  • Normality in Arrays
  • Actual values of the DV are normally distributed
    around the predicted values (i.e. the regression
    line), AKA the regression line is a good
    approximation of the population parameter
  • Homogeneity of Variance in Arrays
  • Assumes that the variance of the criterion is
    equal for all levels of the predictor(s)
  • Sound familiar?
  • Variance of the DV is equal for all levels of the
    IV(s)

42
Correlation/Regression
  • Correlation & Regression can also answer other
    kinds of questions
  • Can test the difference between 2 independent
    r's/b's
  • Is r_ab > r_cd?
  • Is the correlation between depression and anxiety
    using the BDI and BAI larger than the same
    correlation using the MASQ-AD and MASQ-AA
    subscales?
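
(One common way to test two independent r's is Fisher's z transformation; a sketch assuming numpy and scipy, with a hypothetical helper name and made-up r's and n's.)

    import numpy as np
    from scipy import stats

    def compare_independent_rs(r1, n1, r2, n2):
        # Fisher z-transform each r, then compare with a z-test
        z1, z2 = np.arctanh(r1), np.arctanh(r2)
        se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
        z = (z1 - z2) / se
        p = 2 * stats.norm.sf(abs(z))  # two-tailed p-value
        return z, p

    print(compare_independent_rs(.60, 100, .40, 120))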

43
Correlation/Regression
  • Can test the difference between 2 dependent
    r's/b's
  • Is r_ab > r_bc?
  • Is the correlation between rumination and
    depression as high as between rumination and
    generalized anxiety?
  • Is the correlation between rumination and
    depression at Time 1 the same at Time 2, 4 weeks
    later?
  • Don't worry about how to do these calculations by
    hand