Title: F2E5414 Pattern Classification
1. F2E5414 Pattern Classification
- Jonas Samuelsson
- Sound and Image Processing Lab, Dept. of Signals, Sensors, and Systems, KTH (Royal Institute of Technology), Stockholm
2. Jonas Samuelsson
- Swedish, born in Vallentuna (north of Stockholm), grew up in Karlskoga (250 km west of Stockholm).
- Undergrad studies at Chalmers, Gothenburg, in electrical engineering.
- Grad studies also at Chalmers, in information theory.
- Signal compression and quantization theory
- Vector quantization
- High rate theory
- Post-doc at KTH from 2002, assistant professor (forskarassistent) since 2004.
- Speech enhancement
- Model-based enhancement with application to non-stationary noisy environments
- Source and channel coding
- Gaussian mixture model based source coding
3. Today's agenda
- Course overview (Jonas)
- Presentation of problems 1-3 (Students)
- Discussions on the text of chapter 1 and section 2.11 (All)
- Solving problem 2.50 (Students)
- Overview of material of next meeting (chapter 2)
4. Course Topics
- Bayesian decision theory
- ML and MAP parameter estimation. Bayesian pdf estimation.
- Non-parametric techniques (for pdf estimation).
- Linear discriminant functions
- Multilayer neural networks
- Non-metric methods
- Algorithm independent machine learning.
- Unsupervised learning and clustering.
- Feature selection and extraction.
5. Session 1: Introduction
- Sea-bass and salmon classification
- Introduction to Bayesian decision theory
- Bayesian belief networks
6. Session 2: Bayesian decision theory
- ML and Bayesian (MAP) decision theory.
- Class-conditional densities, priors, posterior pdfs, Bayes rule.
- How to minimize the Bayes risk? (See the worked equations after this list.)
- Cost functions
- Zero-one loss, minimax, Neyman-Pearson
- The Gaussian case
- Error probabilities
- Chernoff bound -> Bhattacharyya bound
- Missing/noisy features
- Classification of sequences
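For reference during the discussion, the quantities listed above fit together as follows; the notation (classes ω_i, actions α_i, loss λ) roughly follows the textbook and is otherwise my own shorthand.

```latex
% Posterior probabilities via Bayes rule
P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{\sum_j p(x \mid \omega_j)\, P(\omega_j)}

% Conditional risk of action \alpha_i, and the Bayes decision rule:
% take the action with minimum conditional risk
R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x),
\qquad
\alpha^{\ast}(x) = \arg\min_i R(\alpha_i \mid x)
```

With the zero-one loss, minimizing the conditional risk reduces to choosing the class with the largest posterior probability.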
7. Session 3: Parametric density estimation
- Maximum likelihood parameter estimation
- Maximum a posteriori parameter estimation
- Bayesian pdf estimation
- Gaussian case
- The expectation-maximization (EM) algorithm
- Gaussian mixture models (GMMs)
- EM algorithm for GMM optimization (see the sketch after this list)
- K-means algorithm
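To make the EM items above concrete (see the sketch referenced there), here is a minimal sketch of the EM iteration for a one-dimensional, two-component Gaussian mixture. It is an illustration only; the function name, initialization, and iteration count are my own assumptions, not the book's presentation.

```python
# Sketch: EM for a 1-D, two-component Gaussian mixture model.
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Fit p(x) = w1*N(mu1, var1) + w2*N(mu2, var2) by EM (ad-hoc initialization)."""
    mu = np.array([x.min(), x.max()], dtype=float)   # crude starting means
    var = np.array([x.var(), x.var()], dtype=float)  # shared starting variance
    w = np.array([0.5, 0.5])                         # equal starting weights
    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = P(component k | x_n)
        lik = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        gamma = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = gamma.sum(axis=0)
        w = nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy usage: two well-separated clusters
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 200)])
print(em_gmm_1d(data))
```

The K-means algorithm can be viewed as a hard-assignment limit of the same E-step/M-step loop.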
8. Session 4: Non-parametric estimation
- The histogram and Parzen windows
- Estimation of class-conditional pdfs p(x|class)
- k_n-nearest-neighbor estimation (see the sketch after this list)
- Estimation of the posterior probabilities p(class|x)
- Nearest neighbor classification
- Direct design of the decision function
- Error rate and bounds
- k-nearest neighbor classification
- Different metrics
- Fuzzy classification, reduced Coulomb energy networks, approximations by series expansions
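As a companion to the Parzen-window and nearest-neighbor items above (see the sketch referenced there), here is a minimal sketch of a Gaussian Parzen-window density estimate and a k-nearest-neighbor classifier; the function names, the Gaussian kernel, and the Euclidean metric are my own choices for illustration.

```python
# Sketch: Parzen-window density estimation and k-NN classification.
import numpy as np

def parzen_density(x, samples, h=1.0):
    """Parzen-window estimate of p(x); samples is (N, d), x is (d,), h is the window width."""
    d = samples.shape[1]
    kernel = np.exp(-np.sum((samples - x) ** 2, axis=1) / (2 * h ** 2))
    return kernel.mean() / (2 * np.pi * h ** 2) ** (d / 2)

def knn_classify(x, train_x, train_y, k=5):
    """Assign x to the majority class among its k nearest training points (Euclidean metric)."""
    dist = np.linalg.norm(train_x - x, axis=1)
    nearest = train_y[np.argsort(dist)[:k]]          # labels of the k closest points
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote
```

The k-NN rule can be read as a direct estimate of the posterior p(class|x) from the labels of the k neighbors.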
9. Session 5: Linear discriminant functions
Classify x to class i if g_i(x) > g_j(x) for all j != i, where g_i(x) = w_i^t x + w_{i0} is a linear discriminant function.
- Decision surfaces
- Polytopes in the multi-category case
- The perceptron criterion function (see the sketch after this list)
- For linearly separable data sets
- Least squares procedures
- Based on the pseudo inverse
- The LMS-algorithm
- The Ho-Kashyap procedure
- Fisher's linear discriminant (from Ch. 3)
- Support vector machines (possibly from extra material)
- Generalizations to the multi-category case
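To illustrate the perceptron item above (see the sketch referenced there), here is a minimal fixed-increment perceptron training loop for a linearly separable two-class problem; the function name, learning rate, and epoch limit are my own assumptions.

```python
# Sketch: fixed-increment perceptron rule with augmented feature vectors.
import numpy as np

def perceptron_train(x, y, lr=1.0, n_epochs=100):
    """x: (N, d) features, y: (N,) labels in {-1, +1}; returns a weight vector of length d+1."""
    xa = np.hstack([np.ones((len(x), 1)), x])   # augment with a bias component
    a = np.zeros(xa.shape[1])
    for _ in range(n_epochs):
        errors = 0
        for xi, yi in zip(xa, y):
            if yi * (a @ xi) <= 0:              # xi is misclassified (or on the boundary)
                a += lr * yi * xi               # move the decision boundary toward xi
                errors += 1
        if errors == 0:                         # converged: every sample classified correctly
            break
    return a
```

For linearly separable data the loop terminates; the least-squares and Ho-Kashyap procedures handle the non-separable case differently.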
10. Session 6: Multilayer neural networks, part I
(Figure: a neural network with 5 hidden units for 3-d feature vectors, two-category case)
- Classification based on neural networks (NNs)
- The expressive power of NNs
- The backpropagation algorithm (update equations sketched after this list)
- Optimizes the weights of the network
- Relation to Bayes discriminants
- NNs trained by backprop implement the optimal Bayesian classifier
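For reference (see the note in the backpropagation item above), the weight updates computed by backpropagation are gradient-descent steps on the training error J; the notation below (learning rate η, activation f, net activation net_j, target t_j, output z_j, and x_i for whatever input feeds weight w_{ji}, a feature component or a hidden-unit output) is approximate shorthand rather than the book's exact formulation.

```latex
% Gradient-descent weight update with learning rate \eta
\Delta w_{ji} = -\eta\, \frac{\partial J}{\partial w_{ji}} = \eta\, \delta_j\, x_i,
\qquad
\delta_j =
\begin{cases}
(t_j - z_j)\, f'(net_j) & \text{for an output unit } j,\\[4pt]
f'(net_j) \sum_k w_{kj}\, \delta_k & \text{for a hidden unit } j.
\end{cases}
```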
11. Session 7: Multilayer neural networks, part II
- Practical (ad-hoc) techniques in backpropagation
- Activation function (the sigmoid)
- Target values (not the posterior probability)
- Number of hidden units? Number of hidden layers?
- Initialization
- Other types of networks
- Radial basis function networks
- Recurrent networks
- Overfitting
- Pruning techniques (optimal brain damage, optimal brain surgeon)
12. Session 8: Non-metric methods
- Discrete features
- List of attributes (describe a fruit by, e.g., color, texture, taste, size)
- Decision trees (an impurity sketch follows this list)
- CART, ID3, C4.5
- Rule based recognition
- (Text) string recognition
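As a small illustration of the splitting criterion behind CART/ID3-style trees (see the note in the decision-tree item above), here is a sketch of the entropy impurity and the information gain of a candidate split; the helper names are my own.

```python
# Sketch: entropy impurity and information gain for decision-tree splits.
import numpy as np

def entropy(labels):
    """Impurity of a node; labels is a 1-D array of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left_mask):
    """Drop in impurity when a node is split into labels[left_mask] and the rest."""
    n = len(labels)
    left, right = labels[left_mask], labels[~left_mask]
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

A tree grower evaluates this gain for each candidate query and keeps the best one; CART typically uses the Gini impurity instead of the entropy.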
13. Session 9: Algorithm independent machine learning
- The No Free Lunch theorem
- No classifier can be said to be better than another until they have been evaluated on the particular classification problem at hand.
- Bias and variance for classification
- Resampling of training data for pdf parameter estimation (see the sketch after this list)
- Jackknife, bootstrap
- Resampling for classifier design
- Bagging, boosting
- Evaluation and comparison of classifiers
- Combining classifiers
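To illustrate the resampling idea (see the note in the resampling item above), here is a minimal bootstrap sketch estimating the standard error of a statistic; the function name and defaults are my own assumptions.

```python
# Sketch: bootstrap estimate of the standard error of a statistic.
import numpy as np

def bootstrap_std_error(x, statistic=np.mean, n_boot=1000, seed=0):
    """Resample x with replacement n_boot times and measure the spread of the statistic."""
    rng = np.random.default_rng(seed)
    stats = [statistic(rng.choice(x, size=len(x), replace=True))
             for _ in range(n_boot)]
    return np.std(stats)
```

Bagging applies the same idea to classifier design: train one classifier per bootstrap replicate and combine their decisions, e.g. by voting.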
14. Session 10: Unsupervised learning and clustering
- Different scenarios
- The classes are unknown (training data is not labeled)
- The number of classes is unknown
- (Gaussian) mixture densities (revisited)
- k-means clustering
- Unsupervised Bayesian learning
- Data description and clustering
- Similarity measures of patterns, criterion functions for clustering
- Online clustering
- (Non-linear) principal component analysis (PCA) (see the sketch after this list)
- Low-dimensional representations, multidimensional scaling
- Kohonen self-organizing feature maps
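As a sketch of the linear PCA step mentioned above (see the note in that item), here is a minimal eigendecomposition-based projection onto the leading principal components; the function name and defaults are my own.

```python
# Sketch: principal component analysis via the sample covariance matrix.
import numpy as np

def pca(x, n_components=2):
    """Project the rows of x (shape (N, d)) onto the top n_components principal directions."""
    xc = x - x.mean(axis=0)                        # center the data
    cov = np.cov(xc, rowvar=False)                 # d-by-d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return xc @ top                                # low-dimensional representation
```

Non-linear PCA, multidimensional scaling, and self-organizing maps pursue the same goal of a low-dimensional data description with different machinery.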
15. Requirements to pass
- Active participation during 7 out of 10 course meetings, or doing a take-home exam (or possibly an oral exam).
- Solving at least 80% of the problems indicated in the schedule, and at least 50% of the problems for each meeting. All mandatory problems (marked as such in the schedule) should also be solved. The solutions should be handed in at the course meeting, or posted or faxed (solutions must be readable, though) the same day by remote students.
- Demonstration of solutions during course meetings, see above. If a student cannot participate in a meeting, their solutions will be subject to a very thorough examination.
If the requirements are fulfilled, the student has passed the first part of the course and is recommended to be awarded 5 credits.