Part 2: Unsupervised Learning - PowerPoint PPT Presentation

1 / 72

About This Presentation

Title:

Part 2: Unsupervised Learning

Description:

Machine Learning Techniques for Computer Vision (ECCV 2004) Christopher M. Bishop ... Automatic relevance determination (ARD) ML PCA. Bayesian PCA ... – PowerPoint PPT presentation

Number of Views:233

Avg rating:3.0/5.0

Slides: 73

Provided by: cmbi5

Category:

more less

Transcript and Presenter's Notes

Title: Part 2: Unsupervised Learning

1
Machine Learning Techniques for Computer Vision

Part 2 Unsupervised Learning

Christopher M. Bishop
Microsoft Research Cambridge
ECCV 2004, Prague
2
Overview of Part 2

Mixture models
EM
Variational Inference
Bayesian model complexity
Continuous latent variables

3
The Gaussian Distribution

Multivariate Gaussian
Maximum likelihood

mean
4
Gaussian Mixtures

Linear super-position of Gaussians
Normalization and positivity require

5
Example Mixture of 3 Gaussians
6
Maximum Likelihood for the GMM

Log likelihood function
Sum over components appears inside the log
no closed form ML solution

7
EM Algorithm Informal Derivation
8
EM Algorithm Informal Derivation

M step equations

9
EM Algorithm Informal Derivation

E step equation

10
EM Algorithm Informal Derivation

Can interpret the mixing coefficients as prior
probabilities
Corresponding posterior probabilities
(responsibilities)

11
Old Faithful Data Set
Time betweeneruptions (minutes)
Duration of eruption (minutes)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
Latent Variable View of EM

To sample from a Gaussian mixture
first pick one of the components with probability
then draw a sample from that component
repeat these two steps for each new data point

19
Latent Variable View of EM

Goal given a data set, find
Suppose we knew the colours
maximum likelihood would involve fitting each
component to the corresponding cluster
Problem the colours are latent (hidden) variables

20
Incomplete and Complete Data
incomplete
complete
21
Latent Variable Viewpoint
22
Latent Variable Viewpoint

Binary latent variables
describing which component generated each data
point
Conditional distribution of observed variable
Prior distribution of latent variables
Marginalizing over the latent variables we obtain

23
Graphical Representation of GMM
24
Latent Variable View of EM

Suppose we knew the values for the latent
variables
maximize the complete-data log likelihood
trivial closed-form solution fit each component
to the corresponding set of data points
We dont know the values of the latent variables
however, for given parameter values we can
compute the expected values of the latent
variables

25
Posterior Probabilities (colour coded)
26
Over-fitting in Gaussian Mixture Models

Infinities in likelihood function when a
component collapses onto a data point
with
Also, maximum likelihood cannot determine the
number K of components

27
Cross Validation

Can select model complexity using an independent
validation data set
If data is scarce use cross-validation
partition data into S subsets
train on S-1 subsets
test on remainder
repeat and average
Disadvantages
computationally expensive
can only determine one or two complexity
parameters

28
Bayesian Mixture of Gaussians

Parameters and latent variables appear on equal
footing
Conjugate priors

29
Data Set Size

Problem 1 learn the functionfor
from 100 (slightly) noisy examples
data set is computationally small but
statistically large
Problem 2 learn to recognize 1,000 everyday
objects from 5,000,000 natural images
data set is computationally large but
statistically small
Bayesian inference
computationally more demanding than ML or
MAP(but see discussion of Gaussian mixtures
later)
significant benefit for statistically small data
sets

30
Variational Inference