1. Principal Component Regression Analysis
- Pseudo Inverse
- Heisenberg Uncertainty for Data Mining
- Explicit Principal Components
- Implicit Principal Components
- NIPALS Algorithm for Eigenvalues and Eigenvectors
- Scripts
  - PCA transformation of data
  - Pharma-plots
  - PCA training and testing
  - Bootstrap PCA
  - NIPALS and other PCA algorithms
- Examples
- Feature selection
2. Classical Regression Analysis
- Pseudo inverse (Penrose inverse)
- Least-squares optimization (see the sketch below)
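A minimal NumPy sketch, not from the slides, showing least-squares regression solved with the normal equations and, equivalently, with the Moore-Penrose pseudo-inverse; the names A, w, y and the toy data are illustrative assumptions:

    import numpy as np

    # Toy data: 10 examples, 3 descriptive features (illustrative only)
    rng = np.random.default_rng(0)
    A = rng.normal(size=(10, 3))
    y = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=10)

    # Normal equations: w = (A^T A)^{-1} A^T y
    w_normal = np.linalg.inv(A.T @ A) @ A.T @ y

    # Same solution via the Moore-Penrose pseudo-inverse
    w_pinv = np.linalg.pinv(A) @ y

    print(np.allclose(w_normal, w_pinv))   # True when A^T A is well-conditioned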
3. The Machine Learning Paradox
- If data can be learned from, they must contain redundancy
- If there is redundancy, (X^T X)^{-1} is ill-conditioned (see the sketch below)
  - similar data patterns
  - closely correlated descriptive features
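A small illustration, not from the slides, of how closely correlated features make X^T X ill-conditioned; the synthetic feature construction is an assumption for demonstration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100

    # Independent features: X^T X is well-conditioned
    X_indep = rng.normal(size=(n, 3))

    # Redundant features: the third column is almost a copy of the first
    X_corr = X_indep.copy()
    X_corr[:, 2] = X_corr[:, 0] + 1e-4 * rng.normal(size=n)

    print(np.linalg.cond(X_indep.T @ X_indep))  # moderate
    print(np.linalg.cond(X_corr.T @ X_corr))    # huge: inverting it is unreliable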
4. Beyond Regression
- Paul Werbos motivated "beyond regression" in 1972
- In addition, there are related statistical duals (PCA, PLS, SVM)
- Principal component analysis with h principal components
- Trick: eliminate poor conditioning by using the h PCs with the largest eigenvalues
- Now the matrix to invert is small and well-conditioned
- Generally include 2 to 6 PCAs
- A better PCA regression is PLS (Please Listen to Svante Wold)
- A better PLS is nonlinear PLS (PNLS)
5. Explicit PCA Regression
- We had the ordinary least-squares regression model
- Assume we derive PCA features for A according to the PCA score transformation
- We now have a regression on the h principal component scores (see the reconstruction below)
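A reconstruction of explicit PCA regression under assumed notation (data matrix A, response y, loading matrix B with h eigenvector columns, scores T, coefficients b, matching the later slides):

    y = A w,    w_hat = (A^T A)^{-1} A^T y        (ordinary least squares)
    T = A B,    B = [b_1 ... b_h]                 (h principal components)
    y = T b,    b_hat = (T^T T)^{-1} T^T y        (regression on the scores)

The h x h matrix T^T T is small and well-conditioned, which is exactly the trick from the previous slide.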
6. Explicit PCA Regression on a training/test set
- We have the scores, loadings, and regression coefficients fit on the training set
- And for the test set, the same loadings and coefficients are applied (see the sketch below)
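A minimal NumPy sketch, with illustrative data and variable names (this is not the course's Analyze or StripMiner tooling), of explicit PCA regression fit on a training set and then applied to a test set:

    import numpy as np

    rng = np.random.default_rng(0)
    X_train, X_test = rng.normal(size=(80, 6)), rng.normal(size=(20, 6))
    y_train = X_train @ rng.normal(size=6) + 0.1 * rng.normal(size=80)
    h = 3                                    # number of principal components kept

    # Center and scale with training statistics only
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    Xc_train, Xc_test = (X_train - mu) / sd, (X_test - mu) / sd

    # B: the h eigenvectors of the correlation matrix with the largest eigenvalues
    C = Xc_train.T @ Xc_train / (len(Xc_train) - 1)
    eigvals, eigvecs = np.linalg.eigh(C)
    B = eigvecs[:, np.argsort(eigvals)[::-1][:h]]

    # Training: scores T and regression coefficients b on the scores
    T_train = Xc_train @ B
    b = np.linalg.lstsq(T_train, y_train, rcond=None)[0]

    # Testing: transform with the training B and apply the trained b
    T_test = Xc_test @ B
    y_pred = T_test @ b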
7. Implicit PCA Regression
- h principal components
- How to apply? Calculate T and B with the NIPALS algorithm
- Determine b, and apply it to the data matrix
8. Algorithm
- The B matrix is a matrix of eigenvectors of the correlation matrix C
- If the features are zero centered we have C = X^T X / (n - 1)
- We only consider the h eigenvectors corresponding to the largest eigenvalues
- The eigenvalues are the variances of the corresponding scores
- Eigenvectors are normalized to 1 and are solutions of C b_i = λ_i b_i
- Use the NIPALS algorithm to build up B and T
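A short NumPy check, illustrative rather than part of the slides, of these properties: the eigenvectors of the correlation matrix form B, they have unit length, and the variances of the scores equal the eigenvalues:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 4))
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # zero centered, unit variance

    C = X.T @ X / (len(X) - 1)                           # correlation matrix
    eigvals, B = np.linalg.eigh(C)                       # columns of B: unit-norm eigenvectors

    T = X @ B                                            # scores
    print(np.allclose(np.linalg.norm(B, axis=0), 1.0))   # eigenvectors normalized to 1
    print(np.allclose(T.var(axis=0, ddof=1), eigvals))   # eigenvalues are the score variances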
9. NIPALS Algorithm Part 2
- h principal components
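A minimal NumPy sketch of the standard NIPALS iteration for PCA (score/loading updates with deflation), offered as an illustration since the slide's own derivation is not shown here; the function name, tolerance, and initialization are assumptions:

    import numpy as np

    def nipals_pca(X, h, tol=1e-10, max_iter=500):
        """Build the score matrix T and loading matrix B one component at a time."""
        X = X.copy()
        n, m = X.shape
        T, B = np.zeros((n, h)), np.zeros((m, h))
        for k in range(h):
            t = X[:, np.argmax(X.var(axis=0))].copy()  # start from the highest-variance column
            for _ in range(max_iter):
                p = X.T @ t / (t @ t)                  # loading estimate
                p /= np.linalg.norm(p)                 # normalize the eigenvector to length 1
                t_new = X @ p                          # updated score
                if np.linalg.norm(t_new - t) < tol:
                    t = t_new
                    break
                t = t_new
            T[:, k], B[:, k] = t, p
            X -= np.outer(t, p)                        # deflate: remove the found component
        return T, B

Because only the first h components are computed rather than a full eigendecomposition of X^T X, this implicit route is generally the faster choice when h is small.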
10. PRACTICAL TIPS FOR PCA
- The NIPALS algorithm assumes the features are zero centered
- It is standard practice to do a Mahalanobis scaling of the data (see the sketch below)
- PCA regression does not consider the response data
- The t's are called the scores
- Use 3-10 PCAs
- I usually use 4 PCAs
- It is common practice to drop 4-sigma outlier features (if there are many features)
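A small illustrative sketch of the zero centering and Mahalanobis (auto)scaling mentioned above, with the training statistics reused for new data; the helper name and data are assumptions:

    import numpy as np

    def mahalanobis_scale(X, mean=None, std=None):
        """Zero center and scale each feature to unit standard deviation.
        Pass the training mean/std when scaling test data."""
        mean = X.mean(axis=0) if mean is None else mean
        std = X.std(axis=0, ddof=1) if std is None else std
        return (X - mean) / std, mean, std

    rng = np.random.default_rng(0)
    X_train = rng.normal(loc=5.0, scale=3.0, size=(50, 4))
    Xs, mu, sd = mahalanobis_scale(X_train)
    print(Xs.mean(axis=0).round(10), Xs.std(axis=0, ddof=1))   # ~0 and ~1 per feature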
11. PCA with Analyze
- Several options: option 17 for training and 18 for testing
- (the weight vectors after training are in the file bbmatrixx.txt)
- The file num_eg.txt contains a number equal to the number of PCAs
- Option 17 is the NIPALS algorithm and is generally the faster choice
- Analyze has options for calculating the T's, B's, and λ's
  - option 36 transforms a data matrix to its PCAs
  - option 36 also saves the eigenvalues and eigenvectors of X^T X
- Analyze also has an option for bootstrap PCA (-33)
12. StripMiner Scripts
- last lecture: iris_pca.bat (make PCAs and visualize)
- iris.bat (split the data into a training and validation set and predict)
- iris_boot.bat (bootstrap prediction)
13. Bootstrap Prediction (iris_boot.bat)
- Make different models for the training set
- Predict the test set with the average model (see the sketch below)
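A hedged sketch of the idea rather than the actual iris_boot.bat script: fit a PCA-regression model on each bootstrap resample of the training set and average the resulting test-set predictions; pcr_fit and all data here are illustrative assumptions:

    import numpy as np

    def pcr_fit(X, y, h):
        """Explicit PCA regression with h components; returns loadings B and coefficients b."""
        C = X.T @ X / (len(X) - 1)
        eigvals, eigvecs = np.linalg.eigh(C)
        B = eigvecs[:, np.argsort(eigvals)[::-1][:h]]
        b = np.linalg.lstsq(X @ B, y, rcond=None)[0]
        return B, b

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(60, 5))
    y_train = X_train @ rng.normal(size=5)
    X_test = rng.normal(size=(15, 5))

    # Bootstrap: resample the training set, fit one model per resample,
    # then average the test-set predictions over all models
    preds = []
    for _ in range(25):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        B, b = pcr_fit(X_train[idx], y_train[idx], h=3)
        preds.append(X_test @ B @ b)
    y_test_pred = np.mean(preds, axis=0)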
14. Neural Network Interpretation of PCA
15. PCA in DATA SPACE
[Figure: neural-network diagram with inputs x1 ... xi ... xM feeding layers of summation nodes; its annotations follow]
- Means that the similarity score with each data point will be weighted (i.e., effectively incorporating Mahalanobis scaling in data space)
- This layer gives a similarity score with each data point
- Kind of a nearest-neighbor weighted prediction score (see the sketch below)
- Weights correspond to the h eigenvectors corresponding to the largest eigenvalues of X^T X
- Weights correspond to the dependent variable for the entire training data
- Weights correspond to the scores or PCAs for the entire training set
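A hedged NumPy sketch, not from the slides, of the data-space view: using the SVD of the training matrix, the explicit PCA-regression prediction for a new point can be rewritten as a weighted sum of its similarity (dot product) with every training point, which is the nearest-neighbor-like reading above; all names and data are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))              # training data (assumed already centered/scaled)
    y = rng.normal(size=30)
    h = 3

    # Explicit PCA regression: scores T = X V_h, coefficients b on the scores
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V_h = Vt[:h].T
    T = X @ V_h
    b = np.linalg.lstsq(T, y, rcond=None)[0]

    x_new = rng.normal(size=5)
    y_feature_space = x_new @ V_h @ b

    # Data-space (dual) view: alpha_i weights the similarity x_new . x_i of the
    # new point with each training point
    alpha = U[:, :h] @ np.diag(1.0 / s[:h]) @ b
    y_data_space = (X @ x_new) @ alpha

    print(np.allclose(y_feature_space, y_data_space))   # True: the two views agree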