Canonical Correlation - PowerPoint PPT Presentation

Slides: 21
Provided by: ValuedGate849
Transcript and Presenter's Notes

1
Canonical Correlation
  • Equation is big brother to little r and multiple
    regression
  • little r: y = x
  • regression: y = x1 + x2 + x3
  • canonical correlation (Rc): y1 + y2 + y3 = x1 + x2 + x3
  • Analyzing the relation between 2 sets of
    variables
  • generally no IVs or DVs
  • e.g., facets of Neuroticism (N) and health
  • set 1, N: anxiety, vulnerability, worry
  • set 2, Health: depression, well-being, physical
  • variables on both sides are combined in an
    optimal way to maximize the relationships
    between the 2 sides

2
The Model
[Figure: path diagram. Anxiety, vulnerable, and worry combine into N; depression, well-being, and physical combine into Health; N and Health are linked by Rc.]
3
Canonical Correlation
  • linear combination of variables on each side
    creates a new variable
  • referred to as a canonical variate (CV) or a
    synthetic variable
  • each CV represents a dimension
  • e.g., N on one side and health on the other
  • examine the correlation between CVs
  • CVs come in pairs, of which you can have multiple
  • however, individual variables may differentially
    contribute to an individual CV

4
Things we can do with this technique
  • Number of CV pairs
  • can have zero if variables are unrelated
  • typically have at least one, but you can have
    multiple
  • first CV pair is the most reliable
  • i.e., maximizes the correlation between a CV pair
  • e.g.,
  • 1 CV pair: N (for set 1) and Health (for set 2)
  • 2 CV pairs:
  • pair 1: Emotional N and Psychological Health
  • pair 2: Cognitive N and Physical Health

5
Things we can do with this technique
  • Interpretation of CVs
  • what is their meaning?
  • Importance of CVs
  • how strongly are variables on one side related
    to each other
  • how strongly are variables related to variables
    on the other side
  • Canonical variate scores

6
The Process
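  • First evaluate R11 (correlations within set 1), R22 (within
    set 2), and R12 (between sets; R21 is its transpose)
  • (1) = set 1 (N), (2) = set 2 (Health)

                             1     2     3     4     5     6
  1. anxiety (1)            ---   .70   .50   .30  -.35  -.20
  2. vulnerability (1)            ---   .40   .20  -.25  -.30
  3. worry (1)                          ---   .40  -.35  -.35
  4. depression (2)                           ---  -.75  -.50
  5. well-being (2)                                 ---   .45
  6. physical (2)                                         ---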

7
The Process
  • From these matrices, linear combinations of
    variables are formed
  • reducing the variable sets to CVs
  • these CVs maximize Rc, using canonical weights
    (we will talk about these shortly)
  • squaring Rc gives a familiar index (yes?)
  • SPSS calls this the Squared Correlation
  • indication of overlapping variance in a CV pair
  • do you smell an index of effect size here?!
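
The step from the correlation matrices to Rc can be sketched in Python with NumPy. This is a minimal illustration, not the procedure SPSS runs: it takes the slide 6 matrix, splits it into R11, R22, and R12, whitens each set with a Cholesky factor, and reads the canonical correlations off a singular value decomposition. The function name `canonical_correlations` is made up for this sketch, and the values it returns come from the slide 6 matrix only, so they need not match the illustrative Rc used on later slides.

```python
import numpy as np

# Upper triangle of the slide 6 correlation matrix.
# Order: anxiety, vulnerability, worry | depression, well-being, physical.
upper = np.array([
    [1.00, 0.70, 0.50, 0.30, -0.35, -0.20],
    [0.00, 1.00, 0.40, 0.20, -0.25, -0.30],
    [0.00, 0.00, 1.00, 0.40, -0.35, -0.35],
    [0.00, 0.00, 0.00, 1.00, -0.75, -0.50],
    [0.00, 0.00, 0.00, 0.00, 1.00, 0.45],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
])
R = upper + upper.T - np.eye(6)  # mirror to a full symmetric matrix

R11 = R[:3, :3]   # within set 1 (N)
R22 = R[3:, 3:]   # within set 2 (Health)
R12 = R[:3, 3:]   # between sets


def canonical_correlations(R11, R22, R12):
    """Return canonical correlations, largest first (hypothetical helper)."""
    L1 = np.linalg.cholesky(R11)  # R11 = L1 @ L1.T
    L2 = np.linalg.cholesky(R22)  # R22 = L2 @ L2.T
    # K = L1^{-1} R12 L2^{-T}; its singular values are the canonical correlations.
    K = np.linalg.solve(L1, R12)
    K = np.linalg.solve(L2, K.T).T
    return np.linalg.svd(K, compute_uv=False)


rc = canonical_correlations(R11, R22, R12)
print("Rc:  ", np.round(rc, 3))
print("Rc^2:", np.round(rc ** 2, 3))  # overlapping variance per CV pair
```

Computed this way, Rc is non-negative by construction; the sign of a reported Rc depends on how the two variates are oriented.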

8
The Process
  • Wilks' Lambda (Λ) or Bartlett's test is used to
    determine if Rc is statistically significant
  • both test statistics are evaluated against a χ²
    distribution
  • df
  • first CV pair: (# vars. in set 1)(# vars. in set
    2)
  • second CV pair: (# vars. in set 1 - 1)(# vars.
    in set 2 - 1)
  • Wilks' Λ = error variance / total variance
  • for now
  • low Λ values are good... why???
  • η² = 1 - Λ is another measure of effect size
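
A small sketch of why low Λ is good. It assumes the standard identity that Wilks' Λ is the product of (1 - Rc²) over the CV pairs, which the slide does not state, and it uses made-up canonical correlations. The names `wilks_lambda`, `rcs`, and `eta_sq` are hypothetical.

```python
def wilks_lambda(canonical_rs):
    """Wilks' Lambda as the product of (1 - Rc^2) over all CV pairs."""
    lam = 1.0
    for rc in canonical_rs:
        lam *= 1.0 - rc ** 2
    return lam


# Hypothetical canonical correlations for three CV pairs (not from the slides).
rcs = [0.60, 0.20, 0.05]

lam = wilks_lambda(rcs)   # share of variance NOT shared by the two sets
eta_sq = 1.0 - lam        # effect size: share of variance that IS shared

print(f"Lambda = {lam:.3f}, eta squared = {eta_sq:.3f}")
```

Because Λ is the unexplained share, stronger canonical correlations push it toward 0 and push η² toward 1.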

9
Overall Significance Tests
  • Statistically determine how many CVs
  • # possible CV pairs = # variables in the smaller
    variable set
  • first, the strongest CV is tested (highest Rc)
  • if significant, at least first CV pair is
    significant
  • if not significant, your linear combos are bad
  • second test
  • removes first CV pair, conducted on residual
    correlation matrices
  • determines if a second CV pair is significant
  • orthogonal to first CV pair
  • if significant, the second CV pair adds something
    unique
  • if not significant, only interpret the first CV

10
Interpreting Overall Tests and Relations between
CVs and individual variables
  • Technically what the statistical tests are doing
  • the first test is actually testing if "all" CV
    pairs explain significant variance
  • e.g., CV pairs 1, 2, and 3
  • the second test is actually testing if "all but
    the first" CV pairs explain significant variance
  • and similarly for the third test
  • if the first test is significant but the second
    test is not we infer that only the 1st CV pair is
    important
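
The sequence of overall tests can be sketched as follows. This assumes Bartlett's χ² approximation, χ² = -[N - 1 - (p + q + 1)/2] ln Λ, with Λ recomputed from the CV pairs that remain at each step; the slides do not give this formula, and software may use a slightly different multiplier for the later tests. The canonical correlations, sample size, and the name `bartlett_sequence` are all hypothetical.

```python
import math


def bartlett_sequence(canonical_rs, n_cases, p, q):
    """Return (test number, Lambda, chi-square, df) for each sequential test.

    Test 1 covers all CV pairs, test 2 all but the first, and so on.
    p and q are the numbers of variables in set 1 and set 2.
    """
    results = []
    multiplier = n_cases - 1 - (p + q + 1) / 2.0
    for k in range(len(canonical_rs)):
        # Lambda for the CV pairs still in play after removing the first k.
        lam = math.prod(1.0 - rc ** 2 for rc in canonical_rs[k:])
        chi_sq = -multiplier * math.log(lam)
        df = (p - k) * (q - k)
        results.append((k + 1, lam, chi_sq, df))
    return results


# Hypothetical inputs: three CV pairs, 200 cases, 3 variables per set.
tests = bartlett_sequence([0.60, 0.20, 0.05], n_cases=200, p=3, q=3)
for number, lam, chi_sq, df in tests:
    print(f"test {number}: Lambda = {lam:.3f}, chi-square = {chi_sq:.2f}, df = {df}")
```

Each χ² is then compared with the critical value for its df; the first non-significant test marks where interpretation of further CV pairs stops.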

11
Interpreting Overall Tests and Relations between
CVs and individual variables
  • Look at Rc, Rc², and η²
  • Rc of .30 or more: practical significance is met
  • at Rc = .30, Rc² (and η²) accounts for 9% of the variance
  • Now we can interpret the relations between
    individual variables and the CV pair
  • variance accounted for by each CV and its own set
  • variance accounted for by each CV and the other
    set

12
Relations between CVs and individual variables
  • Canonical coefficients (or weights)
  • unique contribution of each variable to its CV
  • can be either raw or standardized
  • e.g., canonical weight matrix for first CV pair

Variables        N     Health
Anxiety         .50
Vulnerable      .30
Worry           .05
Depression              -.32
Well-Being               .31
Physical                 .05
13
Relations between CVs and individual variables
  • Correlations between variables and CV pairs are
    called loadings
  • e.g., loading or structure matrix for 2
    hypothetical CV pairs

Variables        1      2
For N
Anxiety         .85    .05
Vulnerable      .70    .25
Worry           .25    .80
For Health
Depression     -.90    .02
Well-Being      .80    .23
Physical        .20   -.50
14
Relations between CVs and variables for 1st CV
pair
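[Figure: path diagram for the 1st CV pair. Loadings on N: anxiety .85, vulnerable .70, worry .25. Loadings on Health: depression -.90, well-being .80, physical .20. Rc between N and Health: -.60.]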
15
Relations between CVs and individual variables
  • Canonical Adequacy Coefficient (CAC)
  • proportion of variance extracted by CV in
    intradomain variables
  • i.e., for own set of variables (same-set)
  • CAC = Σ(loadings²) / (# variables in set)
  • CV (N) = (.85² + .70² + .25²) / 3 = .42
  • CV (Health) = ((-.90)² + .80² + .20²) / 3 = .50
  • Are we happy with this?
  • FYI Thompson hates this index
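
The adequacy arithmetic above can be checked with a few lines of Python. The loadings are the first-pair values from the slide 13 structure matrix; the function name `adequacy` is made up for this sketch.

```python
def adequacy(loadings):
    """Canonical adequacy: mean squared loading of a set on its own CV."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# First CV pair loadings from the slide 13 structure matrix.
n_loadings = [0.85, 0.70, 0.25]        # anxiety, vulnerable, worry
health_loadings = [-0.90, 0.80, 0.20]  # depression, well-being, physical

cac_n = adequacy(n_loadings)
cac_health = adequacy(health_loadings)

print(f"CAC for CV (N):      {cac_n:.3f}")
print(f"CAC for CV (Health): {cac_health:.3f}")
```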

16
Relations between CVs and individual variables
  • Redundancy (Red)
  • proportion of variance extracted by CV for
    other-set of variables
  • Red = [Σ(loadings²) / (# variables in own set)] × Rc²
  • remember Rc² = (-.60)² = .36
  • CV (Health) = [(.85² + .70² + .25²) / 3] × .36 = .15
  • CV (N) = [((-.90)² + .80² + .20²) / 3] × .36 = .18
  • Are we happy with this?
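
A matching sketch for redundancy: the mean squared loading of a set, scaled by Rc². The loadings and Rc = -.60 are the slide values; the function name `redundancy` is made up for this sketch.

```python
def redundancy(loadings, rc):
    """Redundancy: mean squared loading of a set times the squared Rc."""
    mean_sq_loading = sum(loading ** 2 for loading in loadings) / len(loadings)
    return mean_sq_loading * rc ** 2


rc = -0.60                             # canonical correlation for the 1st CV pair
n_loadings = [0.85, 0.70, 0.25]        # anxiety, vulnerable, worry
health_loadings = [-0.90, 0.80, 0.20]  # depression, well-being, physical

# Variance in the N variables accounted for by the Health CV, and vice versa.
red_n_vars = redundancy(n_loadings, rc)
red_health_vars = redundancy(health_loadings, rc)

print(f"N variables explained by CV (Health): {red_n_vars:.3f}")
print(f"Health variables explained by CV (N): {red_health_vars:.3f}")
```

Squaring Rc removes its sign, so the direction of the canonical correlation does not affect redundancy.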

17
Summary of 1-CV Pair Solution
  • We found that one canonical variate pair
    explained the data
  • first CV represents N, the second CV represents
    Health
  • The canonical correlation (-.60) and overall
    variance accounted for (.36) were fairly strong
  • At the variable level
  • anxiety and vulnerability loaded on N
  • depression and well-being loaded on Health
  • Variables accounted for an appreciable amount of
    variance in their own CV and the other CV

18
Practical Issues
  • Normality, linearity, and multicollinearity/singularity
    are key
  • because correlational data is what we have!
  • Be aware of sample size issues
  • want 15-20 cases per variable
  • can have fewer if your variables are highly
    reliable
  • e.g., reliability of .80 or higher
  • if sample size is lacking
  • bootstrap canonical correlational analysis
  • e.g., CANSTRAP programs

19
Practical Issues continued
  • Calculate canonical variate scores
  • represent scores on the synthetic variable
  • 1. multiply an individual's standardized score
    for a variable by its canonical weight
  • 2. then sum across all of these products
  • using the canonical weight matrix (slide 12)

Variables        N weight   z-score
Anxiety            .50       2.50
Vulnerability      .30       2.00
Worry              .05       1.50

CV score for N = (.50)(2.50) + (.30)(2.00) + (.05)(1.50) = 1.93
This variable, then, can be used in other
analyses!
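
The two steps for a canonical variate score can be written as a short Python sketch using the weights and z-scores from this slide. The dictionary layout and the name `cv_score` are assumptions made for illustration.

```python
def cv_score(weights, z_scores):
    """Sum of (canonical weight x standardized score) over the set's variables."""
    return sum(weights[name] * z_scores[name] for name in weights)


# Canonical weights for N (slide 12) and one person's z-scores (this slide).
weights = {"anxiety": 0.50, "vulnerability": 0.30, "worry": 0.05}
z_scores = {"anxiety": 2.50, "vulnerability": 2.00, "worry": 1.50}

score_n = cv_score(weights, z_scores)
print(f"CV score for N: {score_n:.3f}")
```

Repeating this for every case yields a new column of scores on the synthetic variable.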
20
Practical Issues Continued
  • Types of variables
  • generally continuous
  • but you can use binary
  • Other follow-up analyses
  • regression
  • regress each individual variable from one set on
    all variables from the other set simultaneously
  • e.g., anxiety ON depression, well-being, physical
    health
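
As a rough sketch of the follow-up regression, the standardized weights for anxiety regressed on the three Health variables can be obtained directly from the slide 6 correlations, since for standardized variables beta = R22⁻¹ r. This uses NumPy and the slide 6 values; the variable names are made up for illustration, and a real analysis would normally be run on the raw data.

```python
import numpy as np

# Correlations among the Health variables (slide 6):
# depression, well-being, physical.
R22 = np.array([
    [1.00, -0.75, -0.50],
    [-0.75, 1.00, 0.45],
    [-0.50, 0.45, 1.00],
])

# Correlations of anxiety with depression, well-being, physical (slide 6).
r_anxiety = np.array([0.30, -0.35, -0.20])

# Standardized regression weights: solve R22 @ betas = r_anxiety.
betas = np.linalg.solve(R22, r_anxiety)

# Squared multiple correlation for standardized predictors.
r_squared = float(r_anxiety @ betas)

print("standardized betas:", np.round(betas, 3))
print(f"R squared: {r_squared:.3f}")
```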