David Newman, UC Irvine. Lecture 10: Mixture Models 1

1
CS 277 Data Mining, Lecture 10: Mixture Models
  • David Newman
  • Department of Computer Science
  • University of California, Irvine

2
Notices
  • Homework 2 due Tuesday Nov 6 in class
  • Any questions?
  • Do you need some hints?
  • Are you learning anything?

3
Clustering
4
Different Types of Clustering Algorithms
  • Partition-based clustering
    - K-means
  • Probabilistic model-based clustering
    - mixture models
  • (the above work with measurement data, e.g., feature vectors)
  • Hierarchical clustering
    - hierarchical agglomerative clustering
  • Graph-based clustering
    - min-cut algorithms
  • (the above work with distance data, e.g., a distance matrix)

5
K-Means: After 1 iteration
6
K-Means: Converged solution
7
Finite Mixture Models
  • f(x) = Σ_{k=1..K} w_k f_k(x | θ_k): each component k contributes a Weight (w_k), a Component Model (f_k), and Parameters (θ_k)
15
Interpretation of Mixtures
  • 1. C has a direct (physical) interpretation
  • e.g., C = age of fish, C ∈ {male, female}, C ∈ {Australian, American}

16
Interpretation of Mixtures
  • 1. C has a direct (physical) interpretation
  • e.g., C = age of fish, C ∈ {male, female}
  • 2. C is a convenient hidden variable (i.e., the cluster variable)
    - focuses attention on subsets of the data, e.g., for visualization, clustering, etc.
    - C might have a physical/real interpretation, but not necessarily so

17
Probabilistic Clustering: Mixture Models
  • assume a probabilistic model for each component cluster
  • mixture model: f(x) = Σ_{k=1..K} w_k f_k(x | θ_k)
  • the w_k are the K mixing weights: 0 ≤ w_k ≤ 1 and Σ_{k=1..K} w_k = 1
  • the K component densities f_k(x | θ_k) can be
    - Gaussian
    - Poisson
    - exponential
    - ...
  • Note:
    - Assumes a model for the data (with both advantages and disadvantages)
    - Results in probabilistic memberships p(cluster k | x), also called responsibilities
      (a small numerical sketch follows this slide)
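
The mixture density above can be evaluated directly once the weights and component parameters are fixed. Below is a minimal sketch in Python (not from the lecture; the two-component 1-D Gaussian parameters are made-up illustrative values).

```python
import numpy as np
from scipy.stats import norm

# Illustrative (made-up) 1-D mixture: f(x) = sum_k w_k * f_k(x | theta_k)
weights = np.array([0.3, 0.7])    # w_k, nonnegative and summing to 1
means = np.array([-2.0, 1.5])     # component means
stds = np.array([0.5, 1.0])       # component standard deviations

def mixture_density(x):
    """Evaluate f(x) = sum_k w_k N(x | mu_k, sigma_k^2) at the points x."""
    x = np.atleast_1d(x)
    comps = norm.pdf(x[:, None], loc=means, scale=stds)  # shape (n, K)
    return comps @ weights

print(mixture_density([-2.0, 0.0, 1.5]))
```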

18
Gaussian Mixture Models (GMM)
  • model for the k-th component is normal: N(μ_k, Σ_k)
  • often assume a diagonal covariance: Σ_jj = σ_j², Σ_ij = 0 for i ≠ j
  • or sometimes even simpler: Σ_jj = σ², Σ_ij = 0 for i ≠ j
  • f(x) = Σ_{k=1..K} w_k f_k(x | θ_k) with θ_k = <μ_k, σ_k> or <μ_k, Σ_k>
  • generative model (see the sampling sketch below):
    - randomly choose a component, selected with probability w_k
    - generate x ~ N(μ_k, Σ_k)
  • note: μ_k and σ_k are both d-dimensional vectors
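
The generative model on this slide translates almost line-for-line into code. This is a small illustrative sketch (not lecture code); the 2-D, two-component, diagonal-covariance parameters are made-up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 2-D GMM parameters (K = 2 components, diagonal covariance)
weights = np.array([0.4, 0.6])                  # w_k
means = np.array([[0.0, 0.0], [3.0, 3.0]])      # mu_k, each d-dimensional
stds = np.array([[1.0, 0.5], [0.7, 1.2]])       # sigma_k (diagonal), each d-dimensional

def sample_gmm(n):
    """Generative model: pick component k with prob w_k, then x ~ N(mu_k, diag(sigma_k^2))."""
    ks = rng.choice(len(weights), size=n, p=weights)
    return means[ks] + stds[ks] * rng.standard_normal((n, means.shape[1]))

X = sample_gmm(500)
print(X.shape)  # (500, 2)
```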

19
Learning Mixture Models from Data
  • Score function: the log-likelihood L(θ)
  • L(θ) = log p(X | θ) = log Σ_H p(X, H | θ)
  • H = hidden variables (the cluster membership of each x)
  • L(θ) cannot be optimized directly
  • EM procedure:
    - General technique for maximizing the log-likelihood with missing data
    - For mixtures:
      E-step: compute memberships p(k | x) = w_k f_k(x | θ_k) / f(x)
      M-step: pick a new θ to maximize the expected data log-likelihood
    - Iterate: guaranteed to climb to a (local) maximum of L(θ)
      (a complete worked sketch follows this slide)
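
As a concrete illustration of the E-step/M-step iteration (my own sketch, not the lecture's code), here is EM for a 1-D Gaussian mixture; it monitors the log-likelihood L(θ) to decide when to stop.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, K, n_iter=100, tol=1e-6, seed=0):
    """EM for a 1-D Gaussian mixture: returns weights, means, stds, log-likelihood."""
    rng = np.random.default_rng(seed)
    n = len(x)
    w = np.full(K, 1.0 / K)                       # mixing weights w_k
    mu = rng.choice(x, size=K, replace=False)     # initialize means from the data
    sigma = np.full(K, x.std())                   # common initial spread
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities p(k | x_i) = w_k f_k(x_i) / f(x_i)
        dens = w * norm.pdf(x[:, None], loc=mu, scale=sigma)   # (n, K)
        f = dens.sum(axis=1, keepdims=True)
        resp = dens / f
        # M-step: re-estimate parameters from the weighted data
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Evaluate the log-likelihood and check convergence
        ll = np.log(f).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sigma, ll

# Usage on synthetic data
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1.5, 1.0, 700)])
print(em_gmm_1d(x, K=2))
```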

20
The E (Expectation) Step: Responsibilities
Given the current K clusters and their parameters, and n data points, the E-step computes p(data point i is in group k) for every point and cluster.
21
The M (Maximization) Step: Re-estimate Parameters
Given the n data points and their memberships, the M-step computes new parameters θ for each of the K clusters.
22
Complexity of EM for Mixtures
With K component models and n data points, the complexity per iteration scales as O(n K f(p)), where f(p) depends on the component model and p is the dimensionality.
23
Comments on Mixtures and EM Learning
  • Complexity of each EM iteration
    - Depends on the probabilistic model being used
    - e.g., for Gaussians, the E-step is O(nK), the M-step is O(nKp²)
    - Sometimes the E- or M-step is not available in closed form
      => can require numerical optimization or sampling within each iteration
  • Generalized EM (GEM): instead of maximizing the likelihood, just increase it
  • EM can be thought of as hill-climbing, with direction and step size provided automatically
  • K-means as a special case of EM (see the sketch after this list)
    - Gaussian mixtures with isotropic (diagonal, equal-variance) Σ_k's
    - Approximate the E-step by choosing the most likely cluster (instead of using membership probabilities)
  • Generalizations
    - Mixtures of multinomials for text data
    - Mixtures of Markov chains for Web sequences
    - and more
    - Will be discussed later in the lectures on text and Web data
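
The "K-means as a special case of EM" bullet can be made concrete with a short sketch (my illustration, not lecture code): the E-step is approximated by a hard assignment of each point to its most likely, i.e. nearest, cluster, and the M-step re-estimates the cluster means.

```python
import numpy as np

def kmeans_hard_em(X, K, n_iter=50, seed=0):
    """K-means viewed as EM with hard assignments and isotropic, equal-variance components."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Hard E-step: assign each point to the nearest center
        # (equivalently, the most likely isotropic Gaussian component)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (n, K)
        assign = d2.argmin(axis=1)
        # M-step: re-estimate each center as the mean of its assigned points
        new_centers = np.array(
            [X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k] for k in range(K)]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign

# Usage on synthetic 2-D data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
centers, assign = kmeans_hard_em(X, K=2)
print(centers)
```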

24
EM for Gaussian Mixtures
  • Gaussian mixture density (equation on slide)
  • Log-likelihood (equation on slide)
25
EM for Gaussian Mixtures
  • E-Step: compute responsibilities
  • M-Step: re-estimate parameters
  • Evaluate the log-likelihood
    (the standard equations are reconstructed below)
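
The equations on these two slides did not survive the transcript. The standard forms for a Gaussian mixture (a reconstruction consistent with the notation above, not copied from the slides) are:

```latex
% Gaussian mixture density and log-likelihood
f(\mathbf{x}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad
L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)

% E-step: responsibilities
\gamma_{ik} = \frac{w_k \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
                   {\sum_{j=1}^{K} w_j \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}

% M-step: re-estimate parameters, with n_k = \sum_i \gamma_{ik}
w_k = \frac{n_k}{n}, \qquad
\boldsymbol{\mu}_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik}\, \mathbf{x}_i, \qquad
\boldsymbol{\Sigma}_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik}\,
    (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^\top
```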

35
Selecting K in mixture models
  • cannot just choose the K that maximizes the likelihood: L(θ) is always larger for larger K
  • Model selection alternatives:
  • 1) Penalize complexity
    - e.g., BIC = L(θ) - (d/2) log n, with d = number of parameters (Bayesian Information Criterion)
    - Asymptotically correct under certain assumptions
    - Often used in practice for mixture models even though the assumptions behind the theory are not met
    - (a small BIC sketch follows this slide)
  • 2) Bayesian: compute the posteriors p(K | data)
    - p(K | data) requires computing p(data | K), the marginal likelihood
    - Can be tricky to compute for mixture models
    - Recent work on Dirichlet process priors has made this more practical
  • 3) (Cross-)validation
    - Score different models by log p(X_test | θ)
    - Split the data into training and validation sets
    - Works well on large data sets
    - Can be noisy on small data sets (the log-likelihood is sensitive to outliers)
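
A minimal sketch of the BIC-style score from alternative 1) (my illustration; it assumes a full-covariance Gaussian mixture when counting free parameters, and the log-likelihood values below are made-up):

```python
import numpy as np

def gmm_num_params(K, d):
    """Free parameters of a full-covariance GMM:
    (K - 1) mixing weights + K*d means + K*d*(d+1)/2 covariance entries."""
    return (K - 1) + K * d + K * d * (d + 1) // 2

def bic_score(log_lik, K, d, n):
    """Penalized score from the slide: L(theta) - (d_params / 2) * log(n).
    Larger is better under this sign convention."""
    return log_lik - 0.5 * gmm_num_params(K, d) * np.log(n)

# Illustrative (made-up) log-likelihoods of fitted models with K = 1..4
n, d = 1000, 2
for K, ll in zip(range(1, 5), [-4200.0, -3900.0, -3890.0, -3885.0]):
    print(K, round(bic_score(ll, K, d, n), 1))
```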

36
Example of BIC Score for Red-Blood Cell Data
37
Example of BIC Score for Red-Blood Cell Data
True number of classes (2) selected by BIC
38
Relationship between K-Means and Mix-of-Gaussians
  • Welling (clustering notes):
    - introduce a parameter a and use p(x, i) → p(x, i)^a
    - a → 0: get Mixture of Gaussians
    - a → ∞: get K-Means
  • Bishop:
    - Σ = εI (diagonal/isotropic covariance matrix)
    - ε → 0: get K-Means

39
Next
  • Use EM to learn a model for documents