Title: Pattern Recognition Lecture 9
1. Pattern Recognition, Lecture 9
- Methods for reducing the dimensionality of the feature-space
2. Topics
- Last time
- Evaluating features
- Dependencies between features
- Covariance and correlation
- Today
- Methods for exploiting the dependencies between features
- Reduce the number of features -> reduce the dimensionality
3. Reduce the number of features
- Why?
- The curse of dimensionality
- Visualization
- Remove noise (10 dependent features -> 1 independent)
- Faster processing
- How?
- If features are correlated -> redundancy
- Remove redundancy
- Before the break
- Methods where we DON'T consider the classes
- Unsupervised
- After the break
- Methods where we DO consider the classes
- Supervised
4. Methods where we DON'T consider the classes
- Unsupervised
- Ignore that samples come from different classes
- Reduce the dimensionality (compression)
- Methods
- Hierarchical dimensionality reduction
- Principal component analysis (PCA)
5. Hierarchical dimensionality reduction
- Correlation matrix
- Algorithm
- 1) Calc. the correlation matrix
- 2) Find max Cij
- 3) Merge feature Fi and Fj
- 4) Save merged feature as Fi
- 5) Delete Fj
- 6) Stop or go to 1)
- Stop criterion
- Max Cij is too small
- The number of dimensions is OK
- Others
- Merge features
- Keep Fi and delete Fj
- (Fi + Fj) / 2
- (w1*Fi + w2*Fj) / 2
- Others
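A minimal Matlab sketch of the algorithm above (assumed implementation, not from the slides; hier_reduce, MAXC and target_dim are names chosen here, and merging by simple averaging, (Fi + Fj)/2, is just one of the options listed):

    % Merge the most correlated pair of features until the largest remaining
    % correlation drops below MAXC or the target dimension is reached.
    % X is an N-by-d data matrix (one sample per row).
    function X = hier_reduce(X, MAXC, target_dim)
      while size(X, 2) > target_dim
        C = abs(corrcoef(X));               % 1) correlation matrix
        C(logical(eye(size(C)))) = 0;       % ignore the diagonal (Cii = 1)
        [cmax, idx] = max(C(:));            % 2) find max Cij
        if cmax < MAXC, break; end          % stop: max Cij is too small
        [i, j] = ind2sub(size(C), idx);
        X(:, i) = (X(:, i) + X(:, j)) / 2;  % 3)-4) merge Fi and Fj, store as Fi
        X(:, j) = [];                       % 5) delete Fj
      end
    end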
6. Principal Component Analysis (PCA)
7. Principal Component Analysis (PCA)
- Combines features into new features, and then ignores some of the new features
- PCA is used a lot, especially when you have many dimensions
- Basic idea: features with a large variance separate the classes better
- If both features have large variances, then what?
- Transform the feature-space, so we get large variances and no correlation!
- Variance = Information!
8. PCA Transform
- Ignore y2 without losing info when classifying
- y1 and y2 are the principal components
9. PCA: How to
- Collect data (x)
- Calc. the covariance matrix Cx
- Matlab: Cx = cov(x)
- Solve the eigenvalue problem -> A and Cy
- Matlab: [Evec, Eval] = eig(Cx)
- Transform x -> y: y = A*(x - m)
- Analyze (PCA)
- M-method
- Variability measure from SEPCOR
- J-measure
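A minimal Matlab sketch of the recipe above (the slides only give the cov and eig calls; variable names such as X, m, A and Y are assumptions here). X is an N-by-d data matrix with one sample per row:

    m  = mean(X);                             % sample mean of each feature
    Cx = cov(X);                              % covariance matrix of the input
    [Evec, Eval] = eig(Cx);                   % eigenvectors/eigenvalues of Cx
    [~, order] = sort(diag(Eval), 'descend'); % largest variance first
    A  = Evec(:, order)';                     % rows of A are the principal axes
    Y  = (A * (X - m)')';                     % y = A*(x - m) for every sample
    Cy = cov(Y);                              % diagonal: variances of y1, y2, ...

Cy should come out (close to) diagonal, so the new features y1, y2, ... are uncorrelated and can be ranked by their variance.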
10. What to remember
- Feature reduction where we don't use the fact that data comes from different classes
- Unsupervised
- Hierarchical dimensionality reduction
- Correlation matrix
- Merge features or delete features
- Principal Component Analysis (PCA)
- Combine features into new features
- Ignore some of the new features
- VARIANCE = INFORMATION
- Transform the feature-space using the eigenvectors of the covariance matrix -> uncorrelated features!
- Analyze
- M-method (no class info.)
- Use class info.
- Variability measure from SEPCOR
- J-measure
11. Break
12. Reduce the number of features
- Why?
- The curse of dimensionality
- Too many parameters to tune/train
- Visualization
- Remove noise (10 dependent features -> 1 independent)
- Faster processing
- How?
- If features are correlated -> redundancy
- Remove redundancy
- Before the break
- Methods where we DON'T consider the classes
- Unsupervised
- After the break
- Methods where we DO consider the classes
- Supervised
13. Methods where we DO use the class information
- Supervised
- Use the class information to reduce the dimensionality
- Methods
- SEPCOR
- Linear Discriminant Methods
14. SEPCOR
- Inspired by hierarchical dimensionality reduction
- Method to choose the X best (most discriminative) features
- Idea: combine hierarchical dimensionality reduction with class info.
- SEPCOR = SEParability + CORrelation
- Principle
- Calc. a measure of how good (discriminative) each feature, xi, is wrt. classification
- Variability measure V(xi)
- Keep the most discriminative features, which have a low correlation with the other features
15. SEPCOR: Variability measure
- V(xi) = (the variance of the class means) / (the mean of the class variances)
- V(xi) large -> good feature wrt. classification
- That is, a large numerator and a small denominator
- (Figure: x1 vs. x2 scatter; x2 has V >> 1, x1 has V ~ 1, so V(x1) < V(x2) -> x2 is the better feature)
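A minimal Matlab sketch of this measure (assumed implementation following the description above: the variance of the class means divided by the mean of the class variances; the function name variability is chosen here):

    % X is N-by-d (one sample per row), labels is N-by-1 with class ids.
    % Returns one V value per feature (1-by-d).
    function V = variability(X, labels)
      classes = unique(labels);
      K = numel(classes);  d = size(X, 2);
      mu = zeros(K, d);  va = zeros(K, d);
      for k = 1:K
        Xk = X(labels == classes(k), :);
        mu(k, :) = mean(Xk);                % class means
        va(k, :) = var(Xk);                 % class variances
      end
      V = var(mu) ./ mean(va);              % variance of means / mean of variances
    end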
16. SEPCOR: The algorithm
- Make a list of the features, ordered by V-value
- Repeat until we have the desired number of features or the list is empty
- Remove and store the feature with the largest V-value
- Find the correlation between the removed feature and all the other features in the list
- Ignore (drop from the list) all features with a correlation bigger than MAXCOR
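A minimal Matlab sketch of this selection loop (assumed implementation; it reuses the variability helper sketched above, and n_wanted and MAXCOR are parameter names chosen here):

    % Returns the column indices of the selected features.
    function selected = sepcor(X, labels, n_wanted, MAXCOR)
      V = variability(X, labels);
      [~, order] = sort(V, 'descend');      % list of features ordered by V-value
      list = order(:)';
      selected = [];
      while numel(selected) < n_wanted && ~isempty(list)
        best = list(1);  list(1) = [];      % feature with the largest V-value
        selected(end + 1) = best;
        if isempty(list), break; end
        C = corrcoef([X(:, best), X(:, list)]);
        list = list(abs(C(1, 2:end)) <= MAXCOR);  % drop features correlated > MAXCOR
      end
    end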
17. Linear Discriminant Methods
18. Linear Discriminant Methods
- Transform data to the new feature space
- Linear transform (rotation): y = A*x
- The transform is defined so that classification becomes as easy as possible
- Info = discriminative power
- Fisher Linear Discriminant method
- Map data to one dimension
- Multiple Discriminant Analysis
- Map data to an M-dimensional space
19. Fisher Linear Discriminant
- Idea: map data to a line, y
- The orientation of the line is defined so that the classes are as separated as possible
- Transform: y = w^T x, where w is the direction of the line, y
- In PCA, w is defined as the 1st eigenvector of the covariance matrix (vis. prob.)
20. Fisher Linear Discriminant
Transformation: y = w^T x. Find w.
21. Fisher Linear Discriminant
- Transform
- y = w^T x
- Find w so that the following criterion function is maximized
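The criterion itself was an equation image on the original slide and is missing from this text dump; the standard Fisher criterion (presumably what is shown) is J(w) = (difference of the projected class means)^2 / (sum of the projected class scatters) = (w^T Sb w) / (w^T Sw w), which is maximized by w proportional to Sw^-1 (m1 - m2). A minimal Matlab sketch (assumed implementation; X1 and X2 hold the samples of the two classes, one per row):

    m1 = mean(X1)';  m2 = mean(X2)';    % class means as column vectors
    X1c = X1 - m1';  X2c = X2 - m2';    % centered class samples
    Sw = X1c' * X1c + X2c' * X2c;       % within-class scatter matrix
    w  = Sw \ (m1 - m2);                % direction maximizing J(w)
    w  = w / norm(w);                   % only the direction matters
    y1 = X1 * w;  y2 = X2 * w;          % projections y = w^T x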
22. Multiple Discriminant Analysis
- Generalized Fisher Linear Discriminant method
- N classes
- Mapped into an M-dimensional space (M < N)
- E.g. 3 points will span a plane
- Example with 3 classes in 3D mapped into two different sub-spaces
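A minimal Matlab sketch of MDA (assumed implementation; the M directions are taken as the top generalized eigenvectors of the between-class scatter Sb against the within-class scatter Sw, which is the standard formulation of the generalized Fisher method):

    % X is N-by-d (one sample per row), labels is N-by-1, M < number of classes.
    function [W, Y] = mda(X, labels, M)
      classes = unique(labels);
      d = size(X, 2);  m = mean(X)';
      Sw = zeros(d);  Sb = zeros(d);
      for k = 1:numel(classes)
        Xk = X(labels == classes(k), :);
        mk = mean(Xk)';
        Xc = Xk - mk';
        Sw = Sw + Xc' * Xc;                            % within-class scatter
        Sb = Sb + size(Xk, 1) * (mk - m) * (mk - m)';  % between-class scatter
      end
      [V, D] = eig(Sb, Sw);                      % generalized eigenvalue problem
      [~, order] = sort(real(diag(D)), 'descend');
      W = V(:, order(1:M));                      % the M most discriminative directions
      Y = X * W;                                 % data mapped into the M-dim. space
    end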
23. What to remember
- Feature reduction where we use the class info.
- Discriminative power = information
- SEPCOR (ignore some of the features)
- Hierarchical dimensionality reduction (correlation)
- Variability measure
- The variance of the means / the mean of the variances
- Linear Discriminant Methods (make new features and ignore some)
- Fisher Linear Discriminant (map onto a line)
- Transform: y = w^T x, where w is the direction of the line, y
- Variability measure
- Multiple Discriminant Analysis
- Generalized Fisher Linear Discriminant method
- N classes
- Map data into an M-dimensional space (M < N)