Canonical Correlation

Slides: 21

Provided by: ValuedGate849

Transcript and Presenter's Notes

1
Canonical Correlation

The equation is the big brother of little r and multiple
regression
little r: y = x
regression: y = x1 + x2 + x3
canonical correlation (Rc): y1 + y2 + y3 = x1 + x2 + x3
Analyzes the relation between 2 sets of
variables
generally no IVs or DVs
e.g., facets of Neuroticism (N) and health
set 1, N: anxiety, vulnerability, worry
set 2, Health: depression, well-being, physical
variables on both sides are combined in an
optimal way to maximize the relationship
between the 2 sides

2
The Model
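[Path diagram: anxiety, vulnerable, and worry define the N variate;
depression, well-being, and physical define the Health variate;
Rc links the N and Health variates]
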
3
Canonical Correlation

linear combination of variables on each side
creates a new variable
referred to as a canonical variate (CV) or a
synthetic variable
each CV represents a dimension
e.g., N on one side and Health on the other
examine the correlation between CVs
CVs come in pairs, of which you can have multiple
however, individual variables may differentially
contribute to an individual CV

4
Things we can do with this technique

Number of CV pairs
can have zero if variables are unrelated
typically have at least one, but you can have
multiple
first CV pair is the most reliable
i.e., maximizes the correlation between a CV pair
e.g.,
1 CV pair: N (for set 1) and Health (for set 2)
2 CV pairs:
pair 1: Emotional N and Psychological Health
pair 2: Cognitive N and Physical Health

5
Things we can do with this technique

Interpretation of CVs
what is their meaning?
Importance of CVs
how strongly are variables on one side related
to each other
how strongly are variables related to variables
on the other side
Canonical variate scores

6
The Process

First evaluate R11, R22, R12 (R21)

Variable (set)           1     2     3     4     5     6
1 anxiety (1)          ---   .70   .50   .30  -.35  -.20
2 vulnerability (1)          ---   .40   .20  -.25  -.30
3 worry (1)                        ---   .40  -.35  -.35
4 depression (2)                         ---  -.75  -.50
5 well-being (2)                               ---   .45
6 physical (2)                                       ---
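
As a rough illustration of where this evaluation leads, the sketch below splits the matrix above into R11, R22, and R12 and extracts the canonical correlations with NumPy. The SVD route and the helper name inv_sqrt are assumptions made for this example, not necessarily how SPSS or the presenter computes Rc. Note that singular values are unsigned, so the sign of Rc is not recovered here.

```python
import numpy as np

# Upper triangle as printed on the slide; variable order is
# anxiety, vulnerability, worry, depression, well-being, physical.
upper = np.array([
    [1.00, 0.70, 0.50, 0.30, -0.35, -0.20],
    [0.00, 1.00, 0.40, 0.20, -0.25, -0.30],
    [0.00, 0.00, 1.00, 0.40, -0.35, -0.35],
    [0.00, 0.00, 0.00, 1.00, -0.75, -0.50],
    [0.00, 0.00, 0.00, 0.00, 1.00, 0.45],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
])
R = upper + upper.T - np.eye(6)  # mirror into a full symmetric matrix

R11 = R[:3, :3]  # correlations within set 1 (N)
R22 = R[3:, 3:]  # correlations within set 2 (Health)
R12 = R[:3, 3:]  # correlations between the sets; R21 is its transpose


def inv_sqrt(m):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


# The singular values of R11^(-1/2) R12 R22^(-1/2) are the canonical
# correlations, returned from largest to smallest.
rc = np.linalg.svd(inv_sqrt(R11) @ R12 @ inv_sqrt(R22), compute_uv=False)
print("Rc:", np.round(rc, 3))
print("Rc squared:", np.round(rc ** 2, 3))
```

Three values come back because each set has three variables, which matches the maximum number of CV pairs.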

7
The Process

From these matrices, linear combinations of
variables are formed
reducing the variable sets to CVs
these CVs maximize Rc, using canonical weights
(we will talk about these shortly)
squaring Rc gives a familiar index (yes?)
SPSS calls this the Squared Correlation
indication of overlapping variance in a CV pair
do you smell an index of effect size here?!

8
The Process

Wilks' Lambda (Λ) or Bartlett's test is used to
determine if Rc is statistically significant
both test statistics are evaluated against a χ² distribution
df
first CV pair: (# vars. in set 1)(# vars. in set 2)
second CV pair: (# vars. in set 1 - 1)(# vars. in set 2 - 1)
Wilks' Λ = error variance / total variance
for now
low Λ values are good. Why???
η² = 1 - Λ; η² is another measure of effect size
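
A minimal sketch of these quantities, assuming the usual product form of Wilks' Λ over the canonical correlations and Bartlett's χ² approximation. The Rc values and the sample size n are hypothetical numbers chosen for illustration, not results from the presentation.

```python
import math

rc = [0.60, 0.30, 0.10]  # hypothetical canonical correlations
n = 200                  # hypothetical sample size
p, q = 3, 3              # number of variables in set 1 and set 2

# Wilks' Lambda: product of the variance left unexplained by each CV pair.
wilks = math.prod(1 - r ** 2 for r in rc)

# Effect size from the slide: eta squared = 1 - Lambda.
eta_sq = 1 - wilks

# Bartlett's chi-square approximation for the first (overall) test.
chi_sq = -(n - 1 - (p + q + 1) / 2) * math.log(wilks)
df = p * q

print(f"Lambda = {wilks:.3f}, eta squared = {eta_sq:.3f}")
print(f"chi-square = {chi_sq:.2f} on {df} df")
```

A low Λ means little variance is left unexplained, which is why low values are good and why η² rises as Λ falls.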

9
Overall Significance Tests

Statistically determine how many CVs
possible CV pairs = # variables in the smallest
variable set
first, the strongest CV is tested (highest Rc)
if significant, at least first CV pair is
significant
if not significant, your linear combos are bad
second test
removes first CV pair, conducted on residual
correlation matrices
determines if a second CV pair is significant
orthogonal to first CV pair
if significant, the second CV pair adds something
unique
if not significant, only interpret the first CV

10
Interpreting Overall Tests and Relations between
CVs and individual variables

Technically what the statistical tests are doing
the first test is actually testing if "all" CV
pairs explain significant variance
e.g., CV pairs 1, 2, and 3
the second test is actually testing if "all but
the first" CV pairs explain significant variance
and similarly for the third test
if the first test is significant but the second
test is not we infer that only the 1st CV pair is
important
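
The peel-off logic described here can be sketched as a loop: each test drops the CV pairs already examined and recomputes Λ from the ones that remain. The function name sequential_tests and the input values are hypothetical, and Bartlett's approximation is assumed for the χ² statistic.

```python
import math


def sequential_tests(rc, n, p, q):
    """Return (first pair tested, Lambda, chi-square, df) for each step.

    Step k tests pairs k..last together: "all", then "all but the
    first", and so on.
    """
    results = []
    for k in range(len(rc)):
        lam = math.prod(1 - r ** 2 for r in rc[k:])
        chi_sq = -(n - 1 - (p + q + 1) / 2) * math.log(lam)
        df = (p - k) * (q - k)
        results.append((k + 1, lam, chi_sq, df))
    return results


# Hypothetical canonical correlations and sample size.
for first, lam, chi_sq, df in sequential_tests([0.60, 0.30, 0.10], 200, 3, 3):
    print(f"pairs {first}+: Lambda={lam:.3f}, chi-square={chi_sq:.2f}, df={df}")
```

Each χ² would then be compared with the critical value for its df; the first non-significant step marks where interpretation stops.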

11
Interpreting Overall Tests and Relations between
CVs and individual variables

Look at Rc, Rc², and η²
Rc ≥ .30: practical significance is met
Rc² and η² then account for at least 9% of the variance
Now we can interpret the relations between
individual variables and the CV pair
variance accounted for by each CV and its own set
variance accounted for by each CV and the other
set

12
Relations between CVs and individual variables

Canonical coefficients (or weights)
unique contribution of each variable to its CV
can be either raw or standardized
e.g., canonical weight matrix for first CV pair

Variables       N       Health
Anxiety        .50
Vulnerable     .30
Worry          .05
Depression              -.32
Well-Being               .31
Physical                 .05

13
Relations between CVs and individual variables

Correlations between variables and CV pairs are
called loadings
e.g., loading or structure matrix for 2
hypothetical CV pairs

Variables       1       2
For N
Anxiety        .85     .05
Vulnerable     .70     .25
Worry          .25     .80
For Health
Depression    -.90     .02
Well-Being     .80     .23
Physical       .20    -.50

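
To show how weights and loadings relate, here is a sketch that derives standardized canonical weights for the first CV pair from the slide 6 correlation matrix and then turns them into loadings. It assumes standardized variables, so a loading is simply the within-set correlation matrix times the weight vector. The results will not reproduce the hypothetical tables in these slides, and the sign of each variate is arbitrary.

```python
import numpy as np

# Slide 6 correlations; order: anxiety, vulnerability, worry,
# depression, well-being, physical.
upper = np.array([
    [1.00, 0.70, 0.50, 0.30, -0.35, -0.20],
    [0.00, 1.00, 0.40, 0.20, -0.25, -0.30],
    [0.00, 0.00, 1.00, 0.40, -0.35, -0.35],
    [0.00, 0.00, 0.00, 1.00, -0.75, -0.50],
    [0.00, 0.00, 0.00, 0.00, 1.00, 0.45],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
])
R = upper + upper.T - np.eye(6)
R11, R22, R12 = R[:3, :3], R[3:, 3:], R[:3, 3:]


def inv_sqrt(m):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


U, s, Vt = np.linalg.svd(inv_sqrt(R11) @ R12 @ inv_sqrt(R22))

# Standardized canonical weights for the first CV pair.
w_n = inv_sqrt(R11) @ U[:, 0]
w_health = inv_sqrt(R22) @ Vt[0]

# Loadings: correlation of each variable with its own CV.
load_n = R11 @ w_n
load_health = R22 @ w_health

print("weights  (N):", np.round(w_n, 2))
print("loadings (N):", np.round(load_n, 2))
print("weights  (Health):", np.round(w_health, 2))
print("loadings (Health):", np.round(load_health, 2))
```

Because the variables within a set are correlated, a variable can carry a small weight yet a large loading, which is why both matrices are usually inspected.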
14
Relations between CVs and variables for 1st CV
pair
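[Path diagram: loadings on the N variate are anxiety .85, vulnerable .70,
worry .25; loadings on the Health variate are depression -.90,
well-being .80, physical .20; Rc between the N and Health variates is -.60]
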
15
Relations between CVs and individual variables

Canonical Adequacy Coefficient (CAC)
proportion of variance extracted by CV in
intradomain variables
i.e., for own set of variables (same-set)
CAC = Σ(loadings²) / (# variables in set)
CV (N) = (.85² + .70² + .25²) / 3 = .42
CV (Health) = ((-.90)² + .80² + .20²) / 3 = .50
Are we happy with this?
FYI Thompson hates this index
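
The adequacy arithmetic can be checked with a few lines of Python, using the first-pair loadings from slide 13. The function name adequacy is just a label for this sketch.

```python
def adequacy(loadings):
    """Mean squared loading: share of a set's variance captured by its own CV."""
    return sum(l ** 2 for l in loadings) / len(loadings)


loadings_n = [0.85, 0.70, 0.25]        # anxiety, vulnerable, worry
loadings_health = [-0.90, 0.80, 0.20]  # depression, well-being, physical

print(f"CAC (N)      = {adequacy(loadings_n):.3f}")
print(f"CAC (Health) = {adequacy(loadings_health):.3f}")
```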

16
Relations between CVs and individual variables

Redundancy (Red)
proportion of variance extracted by CV for
other-set of variables
Red = [Σ(loadings²) / (# variables in own set)] × Rc²
remember Rc² = (-.60)² = .36
CV (Health) = ((.85² + .70² + .25²) / 3)(.36) = .15
CV (N) = (((-.90)² + .80² + .20²) / 3)(.36) = .18
Are we happy with this?
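
The same check for redundancy, as a sketch: it is the mean squared loading of a set multiplied by Rc², using the slide's loadings and Rc = -.60. The function name redundancy is hypothetical.

```python
def redundancy(loadings, rc):
    """Mean squared loading of a set, scaled by the squared canonical correlation."""
    adequacy = sum(l ** 2 for l in loadings) / len(loadings)
    return adequacy * rc ** 2


rc = -0.60
loadings_n = [0.85, 0.70, 0.25]        # anxiety, vulnerable, worry
loadings_health = [-0.90, 0.80, 0.20]  # depression, well-being, physical

# Variance in the N variables accounted for by the Health CV.
print(f"Red, CV (Health) = {redundancy(loadings_n, rc):.3f}")
# Variance in the Health variables accounted for by the N CV.
print(f"Red, CV (N)      = {redundancy(loadings_health, rc):.3f}")
```

Redundancy is always smaller than the corresponding adequacy value, because it is that value discounted by Rc².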

17
Summary of 1-CV Pair Solution

We found that one canonical variate pair
explained the data
the first CV in the pair represents N, the second
represents Health
The canonical correlation (-.60) and overall
variance accounted for (.36) were fairly strong
At the variable level
anxiety and vulnerability loaded on N
depression and well-being loaded on Health
Variables accounted for an appreciable amount of
variance in their own CV and the other CV

18
Practical Issues

Normality, linearity, multicollinearity/singularity
are key
because correlational data is what we have!
Be aware of sample size issues
want 15-20 cases per variable
can have fewer if your variables are highly
reliable
e.g., reliability .80
if lacking
bootstrap canonical correlational analysis
e.g., CANSTRAP programs

19
Practical Issues continued

Calculate canonical variate scores
represent scores on the synthetic variable
1. multiply an individual's standardized score
for each variable by its canonical weight
2. then sum across all of these products
using the canonical weight matrix (slide 12)

Variables        N      z-score
Anxiety         .50     2.50
Vulnerability   .30     2.00
Worry           .05     1.50
CV score for N = (.50)(2.50) + (.30)(2.00) + (.05)(1.50) = 1.93
This variable, then, can be used in other
analyses!
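
The two steps amount to a dot product between the weights and one person's z-scores. A small sketch with the slide's numbers:

```python
# Canonical weights for the N variate (slide 12) and one person's z-scores.
weights = {"anxiety": 0.50, "vulnerability": 0.30, "worry": 0.05}
z_scores = {"anxiety": 2.50, "vulnerability": 2.00, "worry": 1.50}

# Step 1: multiply each z-score by its weight. Step 2: sum the products.
cv_score = sum(weights[name] * z_scores[name] for name in weights)

print(f"CV score for N = {cv_score:.3f}")
```

With a full data set, the same multiplication is done for every case, giving one CV score per person per variate.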
20
Practical Issues Continued

Types of variables
generally continuous
but you can use binary
Other follow-up analyses
regression
regress each individual variable from one set on
all variables from the other set simultaneously
e.g., anxiety ON depression, well-being, physical
health
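
As one way to picture this follow-up, the sketch below regresses anxiety on the three health variables using only the slide 6 correlations. With standardized variables the regression weights solve R22 · beta = r, where r holds anxiety's correlations with the health variables. Working from the correlation matrix rather than raw data is an assumption made for this example.

```python
import numpy as np

# Slide 6 correlations among the health variables
# (depression, well-being, physical).
R22 = np.array([
    [1.00, -0.75, -0.50],
    [-0.75, 1.00, 0.45],
    [-0.50, 0.45, 1.00],
])

# Anxiety's correlations with depression, well-being, physical.
r_anxiety = np.array([0.30, -0.35, -0.20])

# Standardized regression weights and the resulting R squared.
beta = np.linalg.solve(R22, r_anxiety)
r_squared = r_anxiety @ beta

print("standardized betas:", np.round(beta, 3))
print(f"R squared = {r_squared:.3f}")
```

Repeating this for each variable in turn shows which individual variables are best predicted by the other set.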
