Spectral Clustering and Embedding with Hidden Markov Models - PowerPoint PPT Presentation
1
Spectral Clustering and Embedding with Hidden Markov Models
Tony Jebara, Yingbo Song, Kapil Thadani
Department of Computer Science, Columbia University
2
Outline
  • Unsupervised learning: parametric vs. nonparametric
  • Density estimation: parametric vs. nonparametric
  • Semi-parametric likelihood (NIPS07)
  • Clustering: parametric vs. nonparametric
  • Expectation maximization
  • Spectral clustering
  • Semi-parametric clustering (ECML07)
  • Probability product kernels (PPK)
  • Hidden Markov model kernel
  • Spectral clustering on PPK and Results
  • Multidimensional scaling on PPK and Results
  • Future/Upcoming Work

3
Unsupervised Learning
  • Parametric methods (sufficient stats, e-family)
  • do not grow with data
  • Density estimation: maximum likelihood
  • Clustering: expectation maximization
  • Visualization: hidden variables (GTM)
  • Models: mixtures, Bayes nets, hidden Markov models
  • Nonparametric frequentist methods
  • grow with data
  • Density estimation: Parzen, l1 fitting, infinite mixture
  • Clustering: spectral clustering
  • Visualization: kNN, multidimensional scaling, LLE
  • Models: kernels, distance metrics, graphs on data

4
Density Estimation
  • Density estimation: most generally, given samples, find the density
  • Nonparametric: assumes independently distributed (i.d.) data
  • Parametric: assumes independent, identically distributed (i.i.d.) data
  • Can we combine the two? Semi-parametric density (NIPS)
  • A kernel pulls the models together

5
Density Estimation
  • Nonparametric estimate
  • Parametric estimate
  • Semi-parametric estimate
  • probability kernel pulls models together

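The two estimates above contrast as follows. A minimal numpy sketch (function names and the bandwidth `h` are illustrative choices, not from the slides): the nonparametric Parzen estimate places a kernel on every sample and grows with the data, while the parametric ML estimate compresses the data into sufficient statistics (mean and variance).

```python
import numpy as np

def parzen_density(x_query, samples, h):
    """Nonparametric Parzen-window estimate with a Gaussian kernel of
    bandwidth h: the average of kernels centered on each sample."""
    z = (x_query[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

def gaussian_ml_density(x_query, samples):
    """Parametric estimate: fit a single Gaussian by maximum likelihood
    (sample mean and variance are the sufficient statistics)."""
    mu, var = samples.mean(), samples.var()
    return np.exp(-0.5 * (x_query - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
```

Both return valid densities; the Parzen estimate tracks multi-modal structure that a single ML Gaussian averages away.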
6
Probability Product Kernel
  • Natural similarity measure between 2 distributions
  • To compute the kernel for a pair of inputs:
  • 1) Estimate densities (maximum likelihood, ML)
  • 2) Kernel: k(p, p') = ∫ p(x)^ρ p'(x)^ρ dx
  • Probability product kernel uses either ρ = 1 (expected likelihood) or ρ = 1/2 (Bhattacharyya)
  • Non-negative; the latter equals 1 iff p = p'
  • Measures overlap of two distributions, pulls pairs together

7
Probability Product Kernel
  • For the exponential family p(x|θ) = exp(A(x) + θᵀT(x) − K(θ))
  • The kernel (at ρ = 1/2) is k(p, p') = exp(K((θ + θ')/2) − K(θ)/2 − K(θ')/2)
  • For the Gaussian case, this gives an RBF kernel

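For two spherical Gaussians with a shared variance, the ρ = 1/2 (Bhattacharyya) case of the PPK has a well-known closed form that is exactly an RBF kernel on the means. A small sketch (the function name is ours; the closed form assumes equal spherical covariances):

```python
import numpy as np

def ppk_gaussian(mu1, mu2, sigma2, rho=0.5):
    """Probability product kernel between N(mu1, sigma2*I) and N(mu2, sigma2*I).
    For rho = 1/2 the integral of sqrt(p * p') has the closed form
    exp(-||mu1 - mu2||^2 / (8 * sigma2)), i.e. an RBF kernel on the means."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    if rho != 0.5:
        raise NotImplementedError("closed form shown for rho = 1/2 only")
    d2 = np.sum((mu1 - mu2) ** 2)
    return np.exp(-d2 / (8.0 * sigma2))
```

Note k(p, p) = 1 and the value decays smoothly as the means separate, exactly like an RBF affinity.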
8
Probability Product Kernel
  • For hidden Markov models
  • The brute-force kernel is exponential work: it sums over all pairs of state sequences

9
Probability Product Kernel
  • Instead of the brute-force cross product, use forward-backward
  • Only compute sub-kernels ψ(s, u) for common parents
  • Forms clique functions and sums via the junction tree

10
Probability Product Kernel
  • PPK for 2 Gaussian HMMs with S and U states
  • Get an S×U interaction table between all pairs of emissions
  • Then simple pseudo-code using the state prior and transition matrices
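The pseudo-code above can be sketched concretely. This is a hedged reconstruction for fixed-length sequences and spherical Gaussian emissions (function names and the `sigma2` parameter are ours): instead of the exponential sum over all S^T × U^T pairs of state paths, a joint recursion over the S×U table costs O(T·S²·U²).

```python
import numpy as np

def emission_kernel(mu1, mu2, sigma2):
    # Bhattacharyya (rho = 1/2) kernel between every pair of spherical
    # Gaussian emissions: psi[s, u] = exp(-||mu1[s] - mu2[u]||^2 / (8 sigma2))
    d2 = np.sum((mu1[:, None, :] - mu2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (8.0 * sigma2))

def ppk_hmm(pi1, A1, mu1, pi2, A2, mu2, T, sigma2=1.0, rho=0.5):
    """PPK between two Gaussian-emission HMMs over length-T sequences.
    Backward recursion over the S x U state-pair table:
    Phi_t(s,u) = psi(s,u) * sum_{s',u'} A1(s,s')^rho A2(u,u')^rho Phi_{t+1}(s',u')
    and finally k = sum_{s,u} pi1(s)^rho pi2(u)^rho Phi_1(s,u)."""
    psi = emission_kernel(mu1, mu2, sigma2)   # S x U emission sub-kernels
    phi = psi.copy()                          # Phi at the final time step
    A1r, A2r = A1 ** rho, A2 ** rho
    for _ in range(T - 1):                    # walk back through the trellis
        phi = psi * (A1r @ phi @ A2r.T)
    return float(np.sum((pi1 ** rho)[:, None] * (pi2 ** rho)[None, :] * phi))
```

The recursion reproduces the brute-force sum over all pairs of state paths, which is easy to verify exhaustively for tiny state spaces.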
11
Clustering
  • Parametric clustering (EM mixture model)
  • local minima
  • strict shape assumptions
  • Nonparametric clustering (spectral cut, maxcut)
  • global optimum
  • no parametric assumptions
  • instead: kernel tweaking
  • Semi-parametric clustering (probability kernel pulls models)
  • makes a parametric (Markov) assumption about each datum but not about overall cluster shapes

12
Parametric EM Clustering
  • Parametric clustering
  • E: Given two models (one per class), get the responsibility for x_n
  • M: Maximize the expected complete likelihood
  • What if each x is a sequence? Cluster two HMM models.
  • Just extend EM to an HMM mixture with a hidden state trellis
  • (Alon & Sclaroff)

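The E and M steps above can be illustrated on the simplest case, a two-component 1-D Gaussian mixture (unit variances assumed for brevity; function name and initialization are our choices, not the slides'). The HMM-mixture version replaces the Gaussian likelihood with an HMM likelihood computed on the hidden state trellis.

```python
import numpy as np

def em_two_gaussians(x, n_iter=100):
    """E-M for a two-component 1-D Gaussian mixture with unit variances.
    E-step: responsibilities r_nk (posterior of component k for point n);
    M-step: re-estimate mixing weights and means from the expected counts."""
    mu = np.array([x.min(), x.max()])     # spread-out initialization
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: unnormalized component likelihoods, then normalize per point
        lik = w * np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete log-likelihood
        Nk = r.sum(axis=0)
        w = Nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / Nk
    return w, mu
```

With well-separated components this converges to the true means; with overlapping or drifting components it can stall in a local minimum, which is the failure mode the next slide discusses.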
13
Parametric EM Clustering
  • EM clustering works well if we have a true mixture
  • Problem: what if we don't have a mixture of 2 Gaussians or HMMs?
  • Example: sequences are from two slowly drifting HMMs

14
Nonparametric Spectral Clustering
  • Spectral clustering is agnostic about the shape of clusters!
  • A popular one is stabilized clustering (Ng, Jordan, Weiss)
  • Get top eigenvectors of the normalized Laplacian L = D^(-1/2) A D^(-1/2)
  • Usually use an RBF affinity
  • What if each datum is a time series? Can use the Yin-Yang kernel
  • But how to use parametric assumptions on each datum?
  • For example, extend so each datum is a 2-state HMM?

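The recipe above, given any affinity matrix, is a short computation. A minimal sketch of Ng-Jordan-Weiss-style clustering (the farthest-point k-means initialization is our simplification, not part of the published algorithm):

```python
import numpy as np

def spectral_cluster(K, k=2, n_iter=50):
    """Spectral clustering from an affinity matrix K: form the normalized
    matrix L = D^{-1/2} K D^{-1/2}, take its top-k eigenvectors,
    row-normalize them, and run k-means on the rows."""
    d = 1.0 / np.sqrt(K.sum(axis=1))
    L = d[:, None] * K * d[None, :]
    _, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    X = vecs[:, -k:]                             # top-k eigenvectors as columns
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centers = np.empty((k, k))
    centers[0] = X[0]
    for j in range(1, k):                        # farthest-point initialization
        d2 = ((X[:, None] - centers[None, :j]) ** 2).sum(-1).min(axis=1)
        centers[j] = X[np.argmax(d2)]
    for _ in range(n_iter):                      # plain k-means on embedding rows
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

With an RBF affinity on vector data this recovers clusters of arbitrary shape; the next slides swap in the PPK affinity between fitted HMMs.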
15
Motion Capture
  • Rotating walk/run motion data
  • Each sequence is a 2-state HMM
  • But each cluster shape is circular

16
Spectral Clustering with PPK
  • 1) For each time series, parametrically learn an HMM
  • 2) Compute the kernel between all pairs of HMMs
  • 3) Nonparametric spectral clustering or embedding (MDS)

17
Spectral Clustering with PPK
  • Algorithm for spectral clustering HMM models

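The three-step SC-PPK recipe can be sketched end to end. To keep the sketch short, a single spherical Gaussian per sequence stands in for the 2-state HMM fit of step 1 (the talk fits HMMs by EM); steps 2 and 3 are the PPK Gram matrix and the spectral cut. All names here are illustrative.

```python
import numpy as np

def sc_ppk(seqs, sigma2=None):
    """SC-PPK sketch: (1) fit a parametric model per sequence, (2) build the
    probability-product-kernel affinity, (3) spectral-cluster it."""
    # Step 1 (simplified): ML Gaussian fit per sequence, pooled variance
    mus = np.array([s.mean(axis=0) for s in seqs])
    v = np.mean([s.var() for s in seqs]) if sigma2 is None else sigma2
    # Step 2: Bhattacharyya PPK between the fitted Gaussians (rho = 1/2)
    D2 = ((mus[:, None] - mus[None]) ** 2).sum(-1)
    K = np.exp(-D2 / (8.0 * v))
    # Step 3: normalized affinity, top-2 eigenvectors, k-means on the rows
    d = 1.0 / np.sqrt(K.sum(axis=1))
    L = d[:, None] * K * d[None, :]
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, -2:]
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    far = int(np.argmax(((X - X[0]) ** 2).sum(-1)))
    c = np.stack([X[0], X[far]])              # two far-apart rows as seeds
    for _ in range(20):                       # tiny k-means on embedding rows
        lab = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), axis=1)
        for j in (0, 1):
            if np.any(lab == j):
                c[j] = X[lab == j].mean(axis=0)
    return lab
```

Replacing step 1 with Baum-Welch HMM training and step 2 with the HMM PPK recursion gives the full algorithm; the clustering step is unchanged.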
18
Clustering MOCAP
  • Starting with a single movie of walking and running
  • Generate several rotated versions of each
  • Two clusters of sequences: walk and run
  • Used 2-state Gaussian HMMs in SC-PPK
  • Get 2 circular clusters; better than EM and the time-series kernel

19
Clustering MOCAP
  • Built a dataset from sequences of motion
  • Two motion categories mixed, with several sequences of each (one sequence is a 123-dimensional time series)
  • Used 2-state Gaussian-emission HMMs
  • Spectral cluster to predict classes
  • Number in parentheses is the subject

20
Clustering Arabic Characters
  • Dataset is example sequences of two different characters
  • About 20-30 examples per class
  • Each sequence is a 2-dimensional time series
  • Used 2-state Gaussian-emission HMMs

21
Clustering Sign Language
  • Sign language dataset, each sign is a time series
  • Have two categories of expressions
  • Used multi-state HMMs with Gaussian emissions

22
Clustering Network Traces
  • Clustering network hosts in the Columbia CS department
  • Features: packets per port per hour over 24 hours
  • Fit an HMM to each host and cluster them
  • Example cluster (hosts in the cluster and their packet volume)
  • All are web servers, NFS or database servers.

( 1) 128.59.20.66   zinc.cs.columbia.edu.        num packets 75707059
( 2) 128.59.20.227  planetlab2.cs.columbia.edu.  num packets 43710510
( 3) 128.59.21.157  bagpipe.cs.columbia.edu.     num packets 42139618
( 4) 128.59.16.20   cs.columbia.edu.             num packets 39047751
( 5) 128.59.16.108  hellfire.cs.columbia.edu.    num packets 39019003
( 6) 128.59.23.17   manycore.cs.columbia.edu.    num packets 38135241
( 7) 128.59.22.220  nemo.cs.columbia.edu.        num packets 26873532
( 8) 128.59.18.100  ober.cs.columbia.edu.        num packets 25070903
( 9) 128.59.22.184  db-pc03.cs.columbia.edu.     num packets 24431779
(10) 128.59.16.101  ground.cs.columbia.edu.      num packets 23581185
(11) 128.59.16.145  flame.cs.columbia.edu.       num packets 19861350
(12) 128.59.21.33   bosch.cs.columbia.edu.       num packets 17715535
23
Clustering MOCAP: Runtime
  • SC-PPK is faster: it runs EM on each single HMM independently
  • Then the spectral clustering step is O(N^3)
  • EM and k-means clustering must iterate HMM training

24
Embedding MOCAP
  • MDS embedding of rotated walking and running
  • a) Yin-Yang kernel
  • b) Probability product kernel

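The embedding step is classical multidimensional scaling applied to the kernel matrix. A minimal sketch (function name is ours): convert kernel values to squared distances, double-center, and embed with the top eigenvectors of the resulting Gram matrix.

```python
import numpy as np

def mds_from_kernel(K, dim=2):
    """Classical MDS embedding from a kernel (similarity) matrix K.
    Kernel values give squared distances d_ij^2 = K_ii + K_jj - 2 K_ij;
    double-centering yields a Gram matrix B whose top eigenvectors,
    scaled by sqrt(eigenvalue), are the embedding coordinates."""
    n = K.shape[0]
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D2 @ J                        # Gram matrix of centered points
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]           # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

Feeding in the PPK Gram matrix between fitted HMMs produces the 2-D scatter plots shown on this and the next slide.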
25
Embedding Arabic & ASL
  • MDS embedding using the PPK of:
  • a) Arabic dataset: each character is a time-series datum of spatial coordinates
  • b) Sign-language dataset: each datum is a time series of hand-movement coordinates

26
Conclusions
  • Semiparametric methods explore the waters between complementary parametric and nonparametric approaches
  • Semiparametric clustering avoids shape assumptions on clusters but keeps assumptions on each datum
  • Novel semiparametric likelihood interpolates between i.d. (nonparametric) and i.i.d. (parametric)
  • encourages model agreement via the probability product kernel
  • Also gives rise to a new clustering criterion:
  • 1) fit each datum with parametric maximum likelihood
  • 2) compute kernels between models
  • 3) solve for spectral clustering or embedding
  • Next: semiparametric density estimation aspect (NIPS)
  • iteratively maximize likelihood to avoid overfitting HMMs
  • Next: sneak peek at some new applications