Principal Component Regression Analysis — PowerPoint transcript
Provided by: MarkEmb8 — Learn more at: https://www.rpi.edu

Transcript and Presenter's Notes
1
Principal Component Regression Analysis
  • Pseudo inverse
  • Heisenberg Uncertainty for Data Mining
  • Explicit Principal Components
  • Implicit Principal Components
  • NIPALS Algorithm for Eigenvalues and Eigenvectors
  • Scripts: PCA transformation of data; Pharma-plots; PCA training and testing; Bootstrap PCA; NIPALS and other PCA algorithms
  • Examples
  • Feature selection
2
Classical Regression Analysis
  • Pseudo inverse (Penrose inverse)
  • Least-squares optimization
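The least-squares solution via the pseudo-inverse can be sketched in numpy (a minimal illustration, not the deck's Analyze/StripMiner tooling):

```python
import numpy as np

# Least-squares weights via the Moore-Penrose pseudo-inverse:
# w = (XTX)-1 XTy  =  pinv(X) @ y
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # 20 samples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                         # noiseless response

w = np.linalg.pinv(X) @ y              # pseudo-inverse solution
print(np.allclose(w, w_true))          # recovers the true weights
```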
3
The Machine Learning Paradox
If data can be learned from, they must have redundancy. If there is redundancy, (XTX)-1 is ill-conditioned:
  • similar data patterns
  • closely correlated descriptive features
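The ill-conditioning can be seen numerically: two closely correlated features blow up the condition number of XTX (a small synthetic demonstration):

```python
import numpy as np

# Two nearly identical (closely correlated) features make XTX
# ill-conditioned: its condition number explodes.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
X_good = np.column_stack([x1, rng.normal(size=100)])             # independent features
X_bad = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=100)])  # redundant near-copy

print(np.linalg.cond(X_good.T @ X_good))   # modest
print(np.linalg.cond(X_bad.T @ X_bad))     # enormous
```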
4
Beyond Regression
  • Paul Werbos motivated going beyond regression in 1972
  • In addition, there are related statistical duals (PCA, PLS, SVM)
  • Principal component analysis

h Principal components
  • Trick: eliminate poor conditioning by using the h PCs with the largest eigenvalues (λ)
  • Now the matrix to invert is small and well-conditioned
  • Generally include 2, 4, or 6 PCs
  • A better PCA regression is PLS ("Please Listen to Svante Wold")
  • A better PLS is nonlinear PNLS

5
Explicit PCA Regression
  • We had w = (ATA)-1 ATy
  • Assume we derive PCA features for A according to T = A Bh
  • We now have b = (TTT)-1 TTy

h Principal components (Bh holds the h eigenvectors with the largest eigenvalues)
6
Explicit PCA Regression on training/test set
  • We have for the training set: T_train = A_train Bh, b = (T_trainT T_train)-1 T_trainT y_train
  • And for the test set: T_test = A_test Bh, y_test = T_test b
7
Implicit PCA Regression
h Principal components
How to apply?
  • Calculate T and B with the NIPALS algorithm
  • Determine b, and apply to the data matrix
8
Algorithm
h Principal components
  • The B matrix is a matrix of eigenvectors of the correlation matrix C
  • If the features are zero centered we have C = XTX
  • We only consider the h eigenvectors corresponding to the largest eigenvalues
  • The eigenvalues are the variances
  • Eigenvectors are normalized to 1 and are solutions of Cb = λb
  • Use the NIPALS algorithm to build up B and T
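A minimal NIPALS sketch, following the description above (iterate score/loading estimates, then deflate); this is an illustration of the algorithm, not the Analyze implementation:

```python
import numpy as np

def nipals_pca(X, h, tol=1e-12, max_iter=10000):
    """Extract h principal components of X with NIPALS.
    Returns scores T (n x h) and loadings B (p x h)."""
    X = X - X.mean(axis=0)          # NIPALS assumes zero-centered features
    n, p = X.shape
    T = np.zeros((n, h))
    B = np.zeros((p, h))
    for k in range(h):
        t = X[:, np.argmax(X.var(axis=0))].copy()  # start from most variable column
        for _ in range(max_iter):
            b = X.T @ t / (t @ t)   # loading estimate
            b /= np.linalg.norm(b)  # eigenvectors normalized to 1
            t_new = X @ b           # score estimate
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        T[:, k], B[:, k] = t, b
        X = X - np.outer(t, b)      # deflate: remove this component
    return T, B

# quick check against a direct eigendecomposition of XTX
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
T, B = nipals_pca(X, h=2)
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc)
# first loading matches the top eigenvector up to sign
print(np.allclose(np.abs(B[:, 0]), np.abs(evecs[:, -1]), atol=1e-5))
```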
9
NIPALS Algorithm Part 2
h Principal components
10
PRACTICAL TIPS FOR PCA
  • The NIPALS algorithm assumes the features are zero centered
  • It is standard practice to do a Mahalanobis scaling of the data
  • PCA regression does not consider the response data
  • The t's are called the scores
  • Use 3-10 PCs
  • I usually use 4 PCs
  • It is common practice to drop 4-sigma outlier features (if there are many features)
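The centering and scaling tips above can be sketched as follows, taking "Mahalanobis scaling" in its common chemometrics sense of column autoscaling (zero mean, unit standard deviation per feature) — an assumption about the slide's intent:

```python
import numpy as np

# Autoscale each feature column: zero mean, unit standard deviation.
rng = np.random.default_rng(4)
X = rng.normal(loc=5.0, scale=3.0, size=(30, 4))
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(np.allclose(Xs.mean(axis=0), 0))          # zero centered
print(np.allclose(Xs.std(axis=0, ddof=1), 1))   # unit variance
```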

11
PCA with Analyze
  • Several options: option 17 for training and 18 for testing
  • (the weight vectors after training are in the file bbmatrixx.txt)
  • The file num_eg.txt contains a number equal to the number of PCs
  • Option 17 is the NIPALS algorithm and is generally faster
  • Analyze has options for calculating Ts, Bs and λs
  • - option 36 transforms a data matrix to its PCs
  • - option 36 also saves eigenvalues and eigenvectors of XTX
  • Analyze also has an option for bootstrap PCA (-33)

12
StripMiner Scripts
  • From the last lecture: iris_pca.bat (make PCs and visualize)
  • iris.bat (split the data into training and validation sets and predict)
  • iris_boot.bat (bootstrap prediction)

13
Bootstrap Prediction (iris_boot.bat)
  • Make different models for the training set
  • Predict the test set with the average model

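The two steps above can be sketched in numpy (a synthetic illustration of bootstrap-averaged prediction, not the iris_boot.bat script itself):

```python
import numpy as np

# Fit several least-squares models on bootstrap resamples of the
# training set, then predict the test set with the average model.
rng = np.random.default_rng(5)
X_train, X_test = rng.normal(size=(60, 3)), rng.normal(size=(15, 3))
w_true = np.array([2.0, -1.0, 0.5])
y_train = X_train @ w_true + 0.1 * rng.normal(size=60)

n_boot = 25
W = np.zeros((n_boot, 3))
for i in range(n_boot):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # resample with replacement
    W[i] = np.linalg.pinv(X_train[idx]) @ y_train[idx]      # one model per resample

w_avg = W.mean(axis=0)            # the "average model"
y_pred = X_test @ w_avg
print(y_pred.shape)               # one prediction per test row
```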
14
Neural Network Interpretation of PCA
15
PCA in DATA SPACE
(Figure: network diagram with inputs x1 ... xi ... xM feeding layers of summation nodes; annotations below)
  • Weights correspond to the h eigenvectors corresponding to the largest eigenvalues of XTX
  • This layer gives a similarity score with each data point
  • Means that the similarity score with each data point will be weighed (i.e., effectively incorporating Mahalanobis scaling in data space)
  • Weights correspond to the scores (PCs) for the entire training set
  • Weights correspond to the dependent variable for the entire training data
  • Kind of a nearest-neighbor weighted prediction score