Title: Pattern Recognition and Machine Learning
Chapter 1 Introduction
Example: Handwritten Digit Recognition
Polynomial Curve Fitting
Sum-of-Squares Error Function
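The curve-fitting example can be sketched in a few lines: fit polynomials of increasing order to noisy samples of a sine curve and evaluate the sum-of-squares error. The data set size and noise level below are assumptions for illustration, not values taken from the text.

```python
import numpy as np

# Synthetic data in the spirit of the running example:
# noisy samples of sin(2*pi*x). Sample size and noise level are assumed.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

def fit_polynomial(x, t, M):
    """Least-squares fit of an order-M polynomial y(x, w) = sum_j w_j x^j."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns x^0 ... x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    Phi = np.vander(x, len(w), increasing=True)
    return 0.5 * np.sum((Phi @ w - t) ** 2)

for M in (0, 1, 3, 9):
    w = fit_polynomial(x, t, M)
    print(M, sum_of_squares_error(w, x, t))
```

The training error shrinks as the order grows; an order-9 polynomial through ten points drives it essentially to zero, which sets up the over-fitting discussion below.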
0th Order Polynomial
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial
Over-fitting
Root-Mean-Square (RMS) Error
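The RMS error, E_RMS = sqrt(2 E(w*) / N), makes errors comparable across data-set sizes, and separating training from test data exposes over-fitting. A minimal sketch, with assumed data (noisy sine samples) rather than the text's exact sets:

```python
import numpy as np

# Train on 10 noisy sine samples, test on 100 fresh ones (sizes assumed).
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=10)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

def rms_error(w, x, t):
    """E_RMS = sqrt(2 E(w) / N)."""
    Phi = np.vander(x, len(w), increasing=True)
    E = 0.5 * np.sum((Phi @ w - t) ** 2)
    return np.sqrt(2 * E / len(x))

errors = {}
for M in (0, 1, 3, 9):
    Phi = np.vander(x_train, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t_train, rcond=None)
    errors[M] = (rms_error(w, x_train, t_train), rms_error(w, x_test, t_test))
    print(M, errors[M])
```

Training error falls monotonically with M, but at M = 9 the test error is far larger than the near-zero training error: the classic over-fitting signature.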
Polynomial Coefficients
Data Set Size: 9th Order Polynomial
Data Set Size: 9th Order Polynomial
Regularization
- Penalize large coefficient values
Regularization
Regularization
Regularization: E_RMS vs. ln λ
Polynomial Coefficients
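Penalizing large coefficients means minimizing a regularized error, Ẽ(w) = 1/2 Σ(y(xₙ, w) − tₙ)² + (λ/2)‖w‖², which has the closed form w = (λI + ΦᵀΦ)⁻¹Φᵀt. A sketch under assumed data and an arbitrary λ:

```python
import numpy as np

# Assumed noisy sine data; M = 9 so the unregularized fit over-fits badly.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)
Phi = np.vander(x, 10, increasing=True)  # columns x^0 ... x^9

def fit_ridge(Phi, t, lam):
    """Minimize (1/2)||Phi w - t||^2 + (lam/2)||w||^2 in closed form."""
    return np.linalg.solve(lam * np.eye(Phi.shape[1]) + Phi.T @ Phi, Phi.T @ t)

w_unreg, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # lambda = 0
w_reg = fit_ridge(Phi, t, 1.0)                     # lambda chosen arbitrarily

# The penalty shrinks the huge unregularized coefficients to modest values.
print(np.abs(w_unreg).max(), np.abs(w_reg).max())
```

This mirrors the coefficient tables in the slides: without regularization the order-9 coefficients explode; with the quadratic penalty they stay small.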
Probability Theory
Apples and Oranges
Probability Theory
- Marginal Probability
- Conditional Probability
- Joint Probability
Probability Theory
Product Rule
The Rules of Probability
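The sum and product rules can be verified on a small joint table. The numbers below, for a two-box, two-fruit setup in the spirit of the apples-and-oranges example, are assumed for illustration:

```python
import numpy as np

# Joint distribution p(Box, Fruit): rows = box (red, blue),
# columns = fruit (apple, orange). Numbers assumed for illustration.
p_joint = np.array([[0.10, 0.30],
                    [0.45, 0.15]])

# Sum rule: p(X) = sum_Y p(X, Y)
p_box = p_joint.sum(axis=1)    # marginal over fruit
p_fruit = p_joint.sum(axis=0)  # marginal over box

# Product rule: p(X, Y) = p(Y | X) p(X)
p_fruit_given_box = p_joint / p_box[:, None]
reconstructed = p_fruit_given_box * p_box[:, None]

print(p_box, p_fruit)
print(np.allclose(reconstructed, p_joint))  # product rule recovers the joint
```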
Bayes' Theorem
posterior ∝ likelihood × prior
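"Posterior ∝ likelihood × prior" in a few lines, on the same two-box setup with assumed numbers: observe an orange, then ask which box it came from.

```python
# Bayes' theorem on a two-box setup; all numbers assumed for illustration.
prior = {"red": 0.4, "blue": 0.6}         # p(Box)
likelihood = {"red": 0.75, "blue": 0.25}  # p(Fruit = orange | Box)

unnormalized = {b: likelihood[b] * prior[b] for b in prior}  # likelihood x prior
evidence = sum(unnormalized.values())                        # p(orange), the normalizer
posterior = {b: unnormalized[b] / evidence for b in prior}   # p(Box | orange)

print(posterior["red"])  # 0.3 / 0.45 = 2/3: seeing an orange favors the red box
```

Note that the prior favored the blue box, yet the orange observation reverses that preference; this is exactly the prior-to-posterior update the slide illustrates.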
Probability Densities
Transformed Densities
Expectations
Conditional Expectation (discrete)
Approximate Expectation (discrete and continuous)
Variances and Covariances
The Gaussian Distribution
Gaussian Mean and Variance
The Multivariate Gaussian
Gaussian Parameter Estimation
Likelihood function
Maximum (Log) Likelihood
Properties of μ_ML and σ²_ML
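The key property here is that the maximum-likelihood variance is biased: E[σ²_ML] = ((N − 1)/N) σ², so it systematically underestimates the true variance. This is easy to confirm empirically (sample sizes and parameters below are arbitrary choices):

```python
import numpy as np

# Empirical check of the bias of sigma^2_ML for a Gaussian.
rng = np.random.default_rng(3)
N, trials = 5, 200_000
samples = rng.normal(loc=1.0, scale=2.0, size=(trials, N))  # true variance = 4

mu_ml = samples.mean(axis=1)
var_ml = ((samples - mu_ml[:, None]) ** 2).mean(axis=1)  # divides by N, not N - 1

print(var_ml.mean())  # close to ((N - 1) / N) * 4 = 3.2, not 4
```

Averaged over many data sets, the estimator lands near 3.2 rather than 4, matching the (N − 1)/N factor; dividing by N − 1 instead of N removes the bias.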
Curve Fitting Re-visited
Maximum Likelihood
Determine w_ML by minimizing the sum-of-squares error, E(w).
Predictive Distribution
MAP: A Step towards Bayes
Determine w_MAP by minimizing the regularized sum-of-squares error, Ẽ(w).
Bayesian Curve Fitting
Bayesian Predictive Distribution
Model Selection
Curse of Dimensionality
Curse of Dimensionality
- Polynomial curve fitting, M = 3
- Gaussian densities in higher dimensions
Decision Theory
- Inference step
- Determine either p(x, C_k) or p(C_k | x).
- Decision step
- For given x, determine optimal t.
Minimum Misclassification Rate
Minimum Expected Loss
- Example: classify medical images as cancer or normal
Minimum Expected Loss
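Minimizing expected loss means choosing the decision k that minimizes Σ_j p(C_j | x) L[j, k]. A sketch for the cancer/normal example; the loss matrix values are assumed for illustration:

```python
import numpy as np

# Assumed loss matrix L[true class, decision]:
# missing a cancer costs far more than a false alarm.
L = np.array([[0, 1000],   # true cancer: decide cancer -> 0, decide normal -> 1000
              [1,    0]])  # true normal: decide cancer -> 1, decide normal -> 0

def decide(posterior):
    """posterior = [p(cancer | x), p(normal | x)]; minimize expected loss."""
    expected_loss = posterior @ L  # E[loss | decide k] = sum_j p(C_j | x) L[j, k]
    return ("cancer", "normal")[int(np.argmin(expected_loss))]

print(decide(np.array([0.01, 0.99])))      # even 1% cancer probability -> "cancer"
print(decide(np.array([0.0005, 0.9995])))  # -> "normal"
```

The asymmetric losses shift the decision boundary far from 50/50, which is exactly why minimizing expected loss differs from minimizing misclassification rate.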
Reject Option
Why Separate Inference and Decision?
- Minimizing risk (loss matrix may change over time)
- Reject option
- Unbalanced class priors
- Combining models
Decision Theory for Regression
- Inference step
- Determine p(t | x).
- Decision step
- For given x, make optimal prediction, y(x), for t.
- Loss function
The Squared Loss Function
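Under squared loss, the optimal prediction is the conditional mean, y(x) = E[t | x]. This can be checked numerically at a single fixed x: among constant predictions y, the sample mean minimizes the average squared loss. Distribution parameters are arbitrary:

```python
import numpy as np

# Draws of t at one fixed x (parameters chosen arbitrarily).
rng = np.random.default_rng(4)
t = rng.normal(loc=2.0, scale=1.0, size=10_000)

# Scan candidate constant predictions y and pick the one with lowest mean squared loss.
candidates = np.linspace(0.0, 4.0, 401)
avg_loss = np.array([np.mean((t - y) ** 2) for y in candidates])
best = candidates[int(np.argmin(avg_loss))]

print(best, t.mean())  # the minimizer sits at the sample mean (up to grid spacing)
```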
Generative vs. Discriminative
- Generative approach
- Model p(x | C_k) and p(C_k)
- Use Bayes' theorem
- Discriminative approach
- Model p(C_k | x) directly
Entropy
- Important quantity in
- coding theory
- statistical physics
- machine learning
Entropy
- Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?
- All states equally likely
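For 8 equally likely states, H[x] = −Σ p log₂ p = 3 bits, matching a fixed 3-bit code; a skewed distribution has lower entropy, achievable with a variable-length code. A sketch (the non-uniform probabilities are assumed for illustration):

```python
import numpy as np

def entropy_bits(p):
    """H[x] = -sum_i p_i log2 p_i, in bits; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# 8 equally likely states: 3 bits.
print(entropy_bits(np.full(8, 1 / 8)))  # 3.0

# A non-uniform 8-state distribution (probabilities assumed): only 2 bits.
print(entropy_bits([1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]))  # 2.0
```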
Entropy
Entropy
- In how many ways can N identical objects be allocated to M bins?
- Entropy maximized when p_i = 1/M for all i
Entropy
Differential Entropy
- Put bins of width Δ along the real line
- Differential entropy maximized (for fixed variance σ²) when p(x) is Gaussian
- in which case H[x] = (1/2)(1 + ln(2πσ²))
Conditional Entropy
The Kullback-Leibler Divergence
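KL(p ‖ q) = Σ p ln(p/q) is non-negative, vanishes only when p = q, and is not symmetric, so it is not a true distance. A small numerical sketch on assumed distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i ln(p_i / q_i); terms with p_i = 0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([1/3, 1/3, 1/3])

print(kl_divergence(p, q))  # positive: q differs from p
print(kl_divergence(p, p))  # 0.0: vanishes when the distributions match
print(kl_divergence(p, q), kl_divergence(q, p))  # asymmetric
```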
Mutual Information
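Mutual information is the KL divergence between the joint distribution and the product of its marginals, I[x, y] = KL(p(x, y) ‖ p(x) p(y)); it is zero exactly when x and y are independent. A sketch on two assumed joint tables:

```python
import numpy as np

def mutual_information(p_joint):
    """I[x, y] = KL( p(x, y) || p(x) p(y) ) for a 2-D joint probability table."""
    px = p_joint.sum(axis=1, keepdims=True)  # marginal p(x)
    py = p_joint.sum(axis=0, keepdims=True)  # marginal p(y)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (px * py)[mask])))

dependent = np.array([[0.4, 0.1],
                      [0.1, 0.4]])              # x and y tend to agree
independent = np.outer([0.5, 0.5], [0.3, 0.7])  # joint = product of marginals

print(mutual_information(dependent))    # positive
print(mutual_information(independent))  # 0: x carries no information about y
```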