Title: Canonical Correlation Analysis for Feature Reduction
1. Canonical Correlation Analysis for Feature Reduction
- Jieping Ye
- Department of Computer Science and Engineering
- Arizona State University
- http://www.public.asu.edu/jye02
2. Outline of lecture
- Overview of feature reduction
- Canonical Correlation Analysis (CCA)
- Nonlinear CCA using Kernels
- Applications
3. Overview of feature reduction
- Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
- The criterion for feature reduction differs across problem settings:
  - Unsupervised setting: minimize the information loss.
  - Supervised setting: maximize the class discrimination.
- Given a set of data points of p variables, compute a linear transformation (projection) onto the lower-dimensional space.
4. Overview of feature reduction
[Figure: the original data are mapped to the reduced data by a linear transformation]
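As a minimal sketch of what the linear transformation does (NumPy, with a hypothetical random projection matrix G for illustration; in practice G would come from PCA, LDA, or CCA):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))  # 100 data points with p = 10 variables
G = rng.standard_normal((10, 2))    # hypothetical projection matrix: p = 10 -> d = 2
Z = X @ G                           # reduced data: each row is a 2-dimensional projection
print(Z.shape)                      # (100, 2)
```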
5. Overview of feature reduction
- Unsupervised
  - Latent Semantic Indexing (LSI): truncated SVD
  - Principal Component Analysis (PCA)
  - Canonical Correlation Analysis (CCA)
- Supervised
  - Linear Discriminant Analysis (LDA)
- Semi-supervised
  - An active research topic
6. Outline of lecture
- Overview of feature reduction
- Canonical Correlation Analysis (CCA)
- Nonlinear CCA using Kernels
- Applications
8. Canonical Correlation Analysis (CCA)
- CCA was first developed by H. Hotelling.
  - H. Hotelling. Relations between two sets of variates. Biometrika, 28:321-377, 1936.
- CCA measures the linear relationship between two multidimensional variables.
- CCA finds two bases, one for each variable, that are optimal with respect to correlations.
- Applications in economics, medical studies, bioinformatics and other areas.
9. Canonical Correlation Analysis (CCA)
- Two multidimensional variables: two different measurements on the same set of objects
  - Web images and associated text
  - Protein (or gene) sequences and related literature (text)
  - Protein sequences and corresponding gene expression
  - In classification: feature vector and class label
- Two measurements on the same object are likely to be correlated.
  - The correlation may not be obvious in the original measurements.
  - CCA finds the maximum correlation in a transformed space.
10. Canonical Correlation Analysis (CCA)
[Figure: each measurement is mapped through a transformation, and the correlation is computed between the transformed data]
11. Problem definition
- Find two sets of basis vectors, one for x and the other for y, such that the correlations between the projections of the variables onto these basis vectors are maximized.
- Given n pairs of samples of x and y, compute two basis vectors w_x and w_y.
12. Problem definition
- Compute the two basis vectors so that the correlations of the projections onto these vectors are maximized.
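Written out explicitly (in the standard CCA notation, with C_{xx}, C_{yy} the within-set and C_{xy} the between-set covariance matrices):

```latex
\rho \;=\; \max_{w_x,\, w_y} \operatorname{corr}\!\left(w_x^T x,\; w_y^T y\right)
\;=\; \max_{w_x,\, w_y} \frac{w_x^T C_{xy} w_y}
{\sqrt{\left(w_x^T C_{xx} w_x\right)\left(w_y^T C_{yy} w_y\right)}}
```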
13. Algebraic derivation of CCA
The optimization problem is equivalent to
  max_{w_x, w_y} w_x^T C_xy w_y,  subject to  w_x^T C_xx w_x = 1 and w_y^T C_yy w_y = 1,
where C_xx and C_yy are the within-set covariance matrices and C_xy = C_yx^T is the between-set covariance matrix.
14. Algebraic derivation of CCA
Maximization of the correlation is equivalent to the minimization of the distance between the two sets of projected data.
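One way to see this equivalence (a standard argument; X and Y here denote the centered data matrices whose columns are the samples):

```latex
\min_{w_x,\, w_y} \left\| X^T w_x - Y^T w_y \right\|^2
\quad \text{subject to} \quad w_x^T C_{xx} w_x = w_y^T C_{yy} w_y = 1 .
```

Expanding the squared norm gives $w_x^T X X^T w_x + w_y^T Y Y^T w_y - 2\, w_x^T X Y^T w_y$; under the unit-variance constraints the first two terms are fixed, so minimizing the distance is the same as maximizing the correlation term $w_x^T C_{xy} w_y$.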
15. Algebraic derivation of CCA
Eliminating w_y, the optimization problem is equivalent to
  max_{w_x} w_x^T C_xy C_yy^{-1} C_yx w_x,  subject to  w_x^T C_xx w_x = 1.
16. Algebraic derivation of CCA
17. Algebraic derivation of CCA
It can be rewritten as the generalized eigenvalue problem
  C_xy C_yy^{-1} C_yx w_x = η C_xx w_x.
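A minimal NumPy/SciPy sketch of this step, solving C_xy C_yy^{-1} C_yx w_x = ρ² C_xx w_x as a generalized symmetric eigenproblem (the synthetic two-view data and all variable names are illustrative; the two views share a latent signal z):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)  # shared latent signal linking the two views
X = np.column_stack([z + 0.1 * rng.standard_normal(n), rng.standard_normal(n)])
Y = np.column_stack([z + 0.1 * rng.standard_normal(n),
                     rng.standard_normal(n), rng.standard_normal(n)])

# center the data and form the covariance matrices
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx = Xc.T @ Xc / n
Cyy = Yc.T @ Yc / n
Cxy = Xc.T @ Yc / n

# generalized eigenproblem: (Cxy Cyy^{-1} Cyx) wx = rho^2 Cxx wx
M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
vals, vecs = eigh(M, Cxx)            # eigenvalues in ascending order
rho = np.sqrt(np.clip(vals[-1], 0.0, 1.0))  # top canonical correlation
wx = vecs[:, -1]
wy = np.linalg.solve(Cyy, Cxy.T @ wx)       # wy recovered from wx (up to scale)
```

The recovered ρ equals the empirical correlation of the two projections Xc·w_x and Yc·w_y, which is close to 1 here because the two views share the latent signal.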
18. Algebraic derivation of CCA
Next consider the second set of basis vectors, with the additional constraint that the new projections be uncorrelated with the first; the solution is given by the second eigenvector of the same generalized eigenvalue problem.
19. Algebraic derivation of CCA
- In general, the k-th basis vectors are given by the k-th eigenvector of the generalized eigenvalue problem above.
- The two transformations W_x and W_y are formed by collecting the leading eigenvectors as columns.
20. Outline of lecture
- Overview of feature reduction
- Canonical Correlation Analysis (CCA)
- Nonlinear CCA using Kernels
- Applications
21. Nonlinear CCA using Kernels
Key idea: rewrite the CCA formulation in terms of inner products, so that only inner products of the data appear.
22. Nonlinear CCA using Kernels
Recall that the basis vectors can be written as linear combinations of the data points. Apply nonlinear transformations φ_x and φ_y to x and y, and define the two kernel matrices K_x and K_y whose entries are inner products in the transformed spaces.
23. Nonlinear CCA using Kernels
Define the Lagrangian of the constrained optimization problem, take the derivatives with respect to the expansion coefficients, and set them to zero.
24. Nonlinear CCA using Kernels
- Two limitations: overfitting and the singularity problem.
- Solution: apply a regularization technique to both x and y.
- The solution is then given by computing the corresponding eigen-decomposition.
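A small sketch of regularized kernel CCA along these lines, solving the standard regularized eigenproblem (K_x + κI)^{-1} K_y (K_y + κI)^{-1} K_x α = ρ² α. The RBF kernel, its bandwidth, the ridge parameter κ, and the synthetic data (y is a nonlinear function of x) are all illustrative choices, not the lecture's exact setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(-1.0, 1.0, (n, 1))
y = x**2 + 0.05 * rng.standard_normal((n, 1))  # nonlinear, near-deterministic relation

def rbf(A, gamma=2.0):
    # Gram matrix of a Gaussian (RBF) kernel
    d = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def center(K):
    # double-center the Gram matrix (kernel analogue of mean-centering)
    H = np.eye(len(K)) - np.ones((len(K), len(K))) / len(K)
    return H @ K @ H

Kx, Ky = center(rbf(x)), center(rbf(y))
kappa = 0.1  # regularization: guards against overfitting and singular Gram matrices

A = np.linalg.solve(Kx + kappa * np.eye(n), Ky)
B = np.linalg.solve(Ky + kappa * np.eye(n), Kx)
rho = np.sqrt(np.clip(np.linalg.eigvals(A @ B).real.max(), 0.0, 1.0))

# the linear correlation between x and y is near zero (y = x^2 is symmetric),
# but the kernelized canonical correlation is strong
lin = abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])
```

This illustrates the point of the kernel extension: a relationship invisible to linear CCA on the raw measurements becomes detectable after the nonlinear (kernel) mapping.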
25. Outline of lecture
- Overview of feature reduction
- Canonical Correlation Analysis (CCA)
- Nonlinear CCA using Kernels
- Applications
26. Applications in bioinformatics
- CCA can be extended to multiple views of the data
  - Multiple (more than 2) data sources
- Two different ways to combine different data sources:
  - Multiple CCA: consider all pairwise correlations
  - Integrated CCA: divide into two disjoint sources
27. Applications in bioinformatics
Source: Extraction of Correlated Gene Clusters from Multiple Genomic Data by Generalized Kernel Canonical Correlation Analysis. ISMB 2003. http://cg.ensmp.fr/vert/publi/ismb03/ismb03.pdf
28. Applications in bioinformatics
- It is crucial to investigate the correlation which exists between multiple biological attributes, and eventually to use this correlation in order to extract biologically meaningful features from heterogeneous genomic data.
- A correlation detected between multiple datasets is likely to be due to some hidden biological phenomenon. Moreover, by selecting the genes responsible for the correlation, one can expect to select groups of genes which play a special role in, or are affected by, the underlying biological phenomenon.
29. Next class
- Topic
  - Manifold learning
- Reading
  - A global geometric framework for nonlinear dimensionality reduction. Tenenbaum JB, de Silva V, and Langford JC. Science, 290:2319-2323, 2000.
  - Nonlinear Dimensionality Reduction by Locally Linear Embedding. Roweis and Saul. Science, 290:2323-2326, 2000.