Statistical Models for Partial Membership - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Statistical Models for Partial Membership
  • Katherine Heller
  • Gatsby Computational Neuroscience Unit, UCL
  • Sinead Williamson and Zoubin Ghahramani
  • University of Cambridge

2
Partial Membership
  • Example: a person with a mixed ethnic background.
  • Someone who is 50% Asian and 50% European partly
    belongs to 2 different groups (ethnicities).
  • This partial membership may be relevant for
    predicting this person's phenotype or food
    preferences.
  • Conceptually not the same as uncertain
    membership.
  • Being certain that someone is half Asian and half
    European is very different from being unsure of
    their ethnicity.
  • More evidence (like DNA tests) can help resolve
    uncertainty but will not change their ethnicity
    memberships.
  • Prior work on modeling partial membership comes
    from the fuzzy logic community.

3
Outline
  • Goal: Describe a fully probabilistic approach to
    data modeling with partial memberships.
  • Introduction
  • Bayesian Partial Membership Model (BPM)
  • BPM Learning
  • Experiments
    • Synthetic
    • Senate Roll Call data
  • Related Work
  • Conclusions
  • Nonparametric Extension?

4
Finite Mixture Models
Consider modeling a data set, X, using a finite mixture of K
components.
Generative Process
1) Choose a cluster
2) Generate a data point from that cluster
Binary indicator variables, one per data point and cluster and
summing to one over clusters, denote memberships of data points to
clusters.
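
The two-step generative process above can be sketched as follows. This is a minimal illustration, assuming 1-D Gaussian clusters; the function name `sample_finite_mixture` and all parameter values are hypothetical and do not come from the slides.

```python
import numpy as np

def sample_finite_mixture(n_points, weights, means, stds, seed=0):
    """Sample from a 1-D Gaussian mixture via the two-step generative process."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Step 1: choose a cluster for every data point.
    z = rng.choice(len(weights), size=n_points, p=weights)
    # One-hot indicators: each row has exactly one 1 (full membership).
    indicators = np.eye(len(weights), dtype=int)[z]
    # Step 2: generate each data point from its chosen cluster.
    x = rng.normal(means[z], stds[z])
    return x, indicators

# Illustrative values only (assumed, not from the slides).
x, indicators = sample_finite_mixture(5, [0.3, 0.7], [-2.0, 2.0], [0.5, 0.5])
```

Each row of `indicators` is a membership vector with a single 1, which is the all-or-nothing membership that the partial membership model relaxes.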
5
Finite Mixture Models
Continuous Relaxation
The indicator variables are relaxed to take any value in [0, 1],
still summing to one over clusters; they now denote partial
memberships of data points to clusters.
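
The equations on the two Finite Mixture Models slides did not survive in this transcript. The block below is a reconstruction under stated assumptions, not a copy of the slides: the symbols $\rho_k$ (mixing proportions), $p_k(\cdot \mid \theta_k)$ (component densities), $\pi_n$ (membership vector of data point $n$), and $c$ (normalizing constant) are assumed notation.

```latex
% Finite mixture with mixing proportions \rho_k
p(x_n \mid \Theta) = \sum_{k=1}^{K} \rho_k \, p_k(x_n \mid \theta_k)

% The same model written with binary indicator vectors \pi_n
p(x_n \mid \Theta) = \sum_{\pi_n} p(\pi_n) \prod_{k=1}^{K} p_k(x_n \mid \theta_k)^{\pi_{nk}},
\qquad \pi_{nk} \in \{0, 1\}, \quad \sum_{k} \pi_{nk} = 1

% Continuous relaxation: the sum over indicators becomes an integral
p(x_n \mid \Theta) = \int p(\pi_n) \, \frac{1}{c} \prod_{k=1}^{K} p_k(x_n \mid \theta_k)^{\pi_{nk}} \, d\pi_n,
\qquad \pi_{nk} \in [0, 1], \quad \sum_{k} \pi_{nk} = 1
```

The only change between the second and third forms is the range of $\pi_{nk}$; the normalizing constant $c$ is needed because a product of densities raised to fractional powers does not in general integrate to one.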
6
Why does this make sense?
[Figure: membership values (0,1), (.5,.5), and (1,0) under the Partial Membership model and the Mixture Model.]
  • If there is an Asian cluster and a European
    cluster, the partial membership model will better
    capture people with mixed ethnicity, whose
    features lie in between.
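
A small numerical sketch of this point, assuming 1-D Gaussian clusters and assuming that partial membership combines the clusters' natural parameters with weights given by the membership vector. All function names and values here are hypothetical.

```python
import numpy as np

def gaussian_to_natural(mean, var):
    """Natural parameters of a 1-D Gaussian: (mean/var, -1/(2*var))."""
    return np.array([mean / var, -0.5 / var])

def natural_to_gaussian(eta):
    """Invert gaussian_to_natural."""
    var = -0.5 / eta[1]
    return eta[0] * var, var

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def partial_membership_density(x, pi, means, variances):
    """Density whose natural parameters are the pi-weighted combination."""
    etas = np.array([gaussian_to_natural(m, v) for m, v in zip(means, variances)])
    mean, var = natural_to_gaussian(np.asarray(pi, dtype=float) @ etas)
    return normal_pdf(x, mean, var)

def mixture_density(x, weights, means, variances):
    """Ordinary mixture: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Two well-separated clusters (illustrative values, assumed).
means, variances = [-3.0, 3.0], [1.0, 1.0]
pm_mid = partial_membership_density(0.0, [0.5, 0.5], means, variances)
mix_mid = mixture_density(0.0, [0.5, 0.5], means, variances)
```

With these values the (.5,.5) partial membership density is centered halfway between the clusters, so `pm_mid` is large, while the 50/50 mixture puts almost no mass there, so `mix_mid` is small.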

7
Exponential Family Distributions
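Let's consider the case where each cluster distribution is in the
exponential family, written in terms of its sufficient statistics and
natural parameters.
It follows that the partial membership distribution of a data point is
in the same exponential family, with natural parameters given by the
membership-weighted convex combination of the cluster natural
parameters.
The conjugate prior on the natural parameters can be written in the
usual exponential family form.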
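
The formulas on this slide were lost in extraction. A reconstruction consistent with the surviving labels ("Sufficient Statistics", "Natural Parameters") is sketched below; the symbols $s$, $h$, $g$, $f$, $\lambda$, $\nu$, and $\pi_{nk}$ are assumed notation, not confirmed by the transcript.

```latex
% Exponential family cluster distribution:
% s(x_n) are the sufficient statistics, \theta_k the natural parameters
p_k(x_n \mid \theta_k) = \exp\{ s(x_n)^\top \theta_k + h(x_n) + g(\theta_k) \}

% Product of components raised to the memberships (using \sum_k \pi_{nk} = 1)
\prod_{k} p_k(x_n \mid \theta_k)^{\pi_{nk}}
  = \exp\Big\{ s(x_n)^\top \Big( \sum_k \pi_{nk} \theta_k \Big) + h(x_n) + \sum_k \pi_{nk} \, g(\theta_k) \Big\}

% After normalizing over x_n, the result is the same family
% with natural parameter \sum_k \pi_{nk} \theta_k
p(x_n \mid \pi_n, \Theta)
  = \exp\Big\{ s(x_n)^\top \Big( \sum_k \pi_{nk} \theta_k \Big) + h(x_n) + g\Big( \sum_k \pi_{nk} \theta_k \Big) \Big\}

% Conjugate prior on the natural parameters
p(\theta_k \mid \lambda, \nu) = \exp\{ \lambda^\top \theta_k + \nu \, g(\theta_k) + f(\lambda, \nu) \}
```

The middle step holds because the terms that depend on $x_n$ are $s(x_n)^\top \sum_k \pi_{nk}\theta_k$ and $h(x_n)$ only, so normalizing replaces $\sum_k \pi_{nk} g(\theta_k)$ with the log-normalizer evaluated at the combined natural parameter.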
8
Bayesian Partial Membership Model
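Generative Process, with the ethnicity example:
  • For each k: draw cluster parameters $\theta_k$. Defines a
    distribution over features for each of the K ethnic groups.
  • Draw population-level proportions $\rho$. Defines the ethnic
    composition of the population.
  • Draw a scale $a$. Controls how similar to the population an
    individual is expected to be.
  • For each n: draw partial memberships $\pi_n$ (the ethnic
    composition of individual n), then draw the data point $x_n$
    (the feature values of individual n).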
9
Bayesian Partial Membership Model
Generative Process
For each k
For each n
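
The distributions on this slide are not preserved in the transcript. The sketch below assumes the following generative process for binary data: $\rho \sim \mathrm{Dir}(\alpha)$, $a \sim \mathrm{Exp}(b)$, $\pi_n \sim \mathrm{Dir}(a\rho)$, and each $x_n$ drawn from a Bernoulli whose natural parameters (log-odds) are $\sum_k \pi_{nk}\theta_k$. The Gaussian prior on $\theta_k$ is a stand-in chosen for simplicity, not the conjugate prior, and every name and value is hypothetical.

```python
import numpy as np

def sample_bpm_bernoulli(n_points, n_dims, alpha, b=1.0, a=None, seed=0):
    """Sample binary data from an assumed Bernoulli partial membership model."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    n_clusters = len(alpha)
    # For each k: cluster natural parameters (log-odds per dimension).
    # ASSUMPTION: a Gaussian prior stands in for the conjugate prior.
    theta = rng.normal(0.0, 2.0, size=(n_clusters, n_dims))
    # Population-level membership proportions.
    rho = rng.dirichlet(alpha)
    # Scale: large a keeps individuals close to rho, small a pushes
    # them toward single clusters. ASSUMPTION: b is a rate parameter.
    if a is None:
        a = rng.exponential(scale=1.0 / b)
    # For each n: partial memberships, then the data point.
    pi = rng.dirichlet(a * rho, size=n_points)
    eta = pi @ theta                      # convex combination of natural parameters
    probs = 1.0 / (1.0 + np.exp(-eta))    # Bernoulli mean from log-odds
    x = (rng.random((n_points, n_dims)) < probs).astype(int)
    return x, pi, rho, a, theta

x, pi, rho, a, theta = sample_bpm_bernoulli(n_points=50, n_dims=32, alpha=[1.0, 1.0, 1.0])
```

Passing `a` explicitly makes it easy to see the effect of the scale on how mixed the sampled memberships are.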
10
BPM Sampled Data
  • Each of the four plots shows 3000 data points
    drawn from the BPM with the same 3
    full-covariance Gaussian clusters.

11
BPM Theory
Lemma 1: In the limit as $a \to 0$, the exponential
family BPM model is a mixture of K components
with mixing proportions $\rho$.
Lemma 2: In the limit as $a \to \infty$, the exponential
family BPM model has only one component with
natural parameters $\sum_k \rho_k \theta_k$.
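
A quick numerical illustration of the two limits, assuming the partial memberships are drawn as $\pi_n \sim \mathrm{Dir}(a\rho)$ (an assumption, since the slide's formulas are not preserved). The values of `rho` and `a` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = np.array([0.2, 0.3, 0.5])  # illustrative population proportions (assumed)

def sample_memberships(a, n=2000):
    """Draw n membership vectors from Dir(a * rho)."""
    return rng.dirichlet(a * rho, size=n)

# Small a: draws sit near the corners of the simplex, so each point
# belongs almost entirely to one cluster (mixture-like behavior).
small = sample_memberships(0.1)
# Large a: draws concentrate around rho, so every point shares nearly
# the same combination of clusters (single-component behavior).
large = sample_memberships(10000.0)

mean_largest_membership = small.max(axis=1).mean()
largest_deviation_from_rho = np.abs(large - rho).max()
```

With small `a` the largest membership of each point is close to 1, and with large `a` every membership vector is close to `rho`.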
12
BPM Learning
  • Want to infer all unknowns given X
  • We treat the prior hyperparameters as fixed
  • Goal: Infer the posterior over the unknowns using MCMC
  • All parameters in the BPM are continuous so we
    can use Hybrid Monte Carlo.
  • Hybrid Monte Carlo is an efficient MCMC method
    that uses gradient information to find high
    probability regions.
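
A minimal sketch of Hybrid (Hamiltonian) Monte Carlo with leapfrog integration, shown on a toy target rather than on the BPM. The target here is a standard 2-D Gaussian, and the function names, step size, and trajectory length are illustrative assumptions. Applying this to the BPM would also require handling its constrained variables, which this sketch does not attempt.

```python
import numpy as np

def hmc_sample(energy, grad_energy, q0, n_samples, step_size=0.1, n_leapfrog=20, seed=0):
    """Draw samples from exp(-energy(q)) using HMC with leapfrog steps."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    for i in range(n_samples):
        p = rng.normal(size=q.size)          # fresh momentum each iteration
        q_new, p_new = q.copy(), p.copy()
        # Leapfrog: half momentum step, alternating full steps, half momentum step.
        p_new -= 0.5 * step_size * grad_energy(q_new)
        for step in range(n_leapfrog):
            q_new += step_size * p_new
            if step < n_leapfrog - 1:
                p_new -= step_size * grad_energy(q_new)
        p_new -= 0.5 * step_size * grad_energy(q_new)
        # Metropolis accept/reject on the total energy keeps the chain exact.
        h_old = energy(q) + 0.5 * p @ p
        h_new = energy(q_new) + 0.5 * p_new @ p_new
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples[i] = q
    return samples

# Toy target: standard 2-D Gaussian, energy = negative log density up to a constant.
samples = hmc_sample(lambda q: 0.5 * q @ q, lambda q: q, q0=[3.0, -3.0], n_samples=2000)
```

The gradient of the energy steers each trajectory toward high probability regions, and the accept/reject step corrects for the discretization error of the leapfrog integrator.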

13
Synthetic Data
  • Generated synthetic binary data set of 50 data
    points, 32 dimensions, and 3 clusters. Ran HMC
    sampler for 4000 iterations. Computed

14
Senate Roll Call Data (2001-2002)
  • (99 senators + 1 outcome) x 633 votes
  • K = 2 multivariate Bernoulli clusters
  • Model adapted to handle missing data

15
Senate Roll Call Comparisons
  • Fuzzy K-means

[Figure legend: Blue: Senator Schumer; Black: Outcome; Red: Senator Ensign]
  • Partial membership values are very sensitive to
    the exponent
  • For no value of the exponent do the membership
    values make sense

16
Senate Roll Call Comparisons
  • Dirichlet Process Mixtures
  • DPM confidently infers 4 clusters
  • Uncertainty is not a good substitute
    for partial membership

[Table: negative log predictive probability (in bits) across senators, for BPM and DPM; columns: Mean, Median, Min, Max, Outcome.]
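
For reference, a negative log predictive probability in bits can be computed as below. This is a generic sketch for binary votes; how the slide's numbers were aggregated per senator is not stated in the transcript, and the function name and inputs are hypothetical.

```python
import numpy as np

def negative_log_predictive_bits(votes, predicted_prob_yes):
    """Sum of -log2 p(observed vote) over a set of held-out binary votes."""
    votes = np.asarray(votes)
    p_yes = np.asarray(predicted_prob_yes, dtype=float)
    # Probability the model assigned to what was actually observed.
    prob_of_observed = np.where(votes == 1, p_yes, 1.0 - p_yes)
    return float(-np.log2(prob_of_observed).sum())

# A model that predicts 0.5 for every vote pays exactly 1 bit per vote.
coin_flip_cost = negative_log_predictive_bits([1, 0, 1], [0.5, 0.5, 0.5])
```

Lower values mean the model assigned higher probability to the votes that were actually cast.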
17
Image Data
  • 329 Tower and Sunset images with 240 simple
    binary texture and color features and K = 2
    clusters.

18
Related Work
  • Latent Dirichlet Allocation (LDA)
  • Mixed Membership Models
  • Fuzzy Clustering
  • Exponential Family PCA

19
Future Work
  • Would be nice to have a nonparametric version.
  • Obvious thing to try: Hierarchical Dirichlet
    Processes. But this would require summing over
    all infinitely many elements of $\pi$, which isn't
    computationally feasible. Also semantically not
    very nice.
  • Indian Buffet Processes might work. Sample an IBP
    matrix with interpretation that a 1 means having
    some non-zero amount of membership in that
    cluster, then draw continuous exact amount
    separately.

20
Conclusions
  • Developed a fully probabilistic approach to data
    modeling with partial membership.
  • Uses continuous latent variables and can be seen
    as a relaxation of clustering with standard
    mixture models.
  • Used Hybrid Monte Carlo for inference, which was
    extremely fast (finding sensible partial
    membership structure after very few samples).

21
Thank You
22
Partial Membership
  • Cornerstone of fuzzy set theory
  • Traditional set theory: Items belong to a set or
    they don't, {0,1}.
  • Fuzzy set theory: membership function
    $\mu_A(x) \in [0,1]$, where $\mu_A(x)$ denotes the degree
    to which $x$ belongs to set $A$
  • Fuzzy logic versus probabilistic models
  • Misguided arguments that fuzzy logic is different
    from or supersedes probability theory.
  • While it might be easy to dismiss fuzzy logic,
    its framework for representing partial membership
    has inspired many researchers.
  • Google Scholar: Over 45,000 fuzzy clustering
    papers. The most cited papers are cited as
    frequently as the most cited NIPS-area papers.

23
Related Work - Latent Dirichlet Allocation (LDA)
and Mixed Membership Models
  • BPM generates data points at the document level
    of LDA (no word plate).
  • Whereas LDA (or Mixed Membership models) assume
    words (or attributes) are drawn using $\pi$ as
    mixing proportions in a mixture model, and are
    factorized, the BPM uses $\pi$ to form a convex
    combination of natural parameters. Attributes are
    not drawn from a mixture model and need not be
    factorized.
  • BPM: potentially faster MCMC sampling, since the
    BPM has all continuous parameters and LDA must
    infer a discrete topic assignment for each word.

24
Mixed Membership Model Generation
25
Related Work - Fuzzy Clustering
  • Fuzzy k-means iteratively minimizes the following
    objective:
    $J = \sum_{n} \sum_{k} \pi_{nk}^{\gamma} \, d^2(x_n, c_k)$
  • where d is the distance between a data point and
    a cluster center, $\pi_{nk}$ is the degree of membership
    of a data point in a cluster, and $\gamma$ controls the
    amount of partial membership ($\gamma = 1$ is normal
    k-means)
  • None of these variables have probabilistic
    interpretations.
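
A minimal sketch of fuzzy k-means with the standard alternating updates, assuming squared Euclidean distance and an exponent greater than 1. The function names, the deterministic initialization, and the toy data are illustrative assumptions; this is not the code behind the Senate comparison.

```python
import numpy as np

def squared_distances(x, centers):
    """Squared Euclidean distance from every point to every center."""
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def fuzzy_kmeans(x, n_clusters, gamma=2.0, n_iters=50):
    """Alternate membership and center updates for the fuzzy k-means objective."""
    if gamma <= 1.0:
        raise ValueError("this update rule needs gamma > 1")
    # ASSUMPTION: simple deterministic init from evenly spaced data rows.
    idx = np.linspace(0, len(x) - 1, n_clusters).astype(int)
    centers = x[idx].copy()
    for _ in range(n_iters):
        d2 = squared_distances(x, centers) + 1e-12   # avoid division by zero
        inv = d2 ** (-1.0 / (gamma - 1.0))
        memberships = inv / inv.sum(axis=1, keepdims=True)
        weights = memberships ** gamma
        centers = (weights.T @ x) / weights.sum(axis=0)[:, None]
    # Objective evaluated at the final centers and last memberships.
    objective = (memberships ** gamma * squared_distances(x, centers)).sum()
    return memberships, centers, objective

# Toy data: two separated blobs (illustrative values, assumed).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-3.0, 0.5, size=(40, 2)), rng.normal(3.0, 0.5, size=(40, 2))])
sharp, _, _ = fuzzy_kmeans(x, 2, gamma=1.5)
soft, _, _ = fuzzy_kmeans(x, 2, gamma=5.0)
```

On the same data, the memberships from `gamma=1.5` are close to 0 or 1 while those from `gamma=5.0` are much more mixed, which illustrates how strongly the exponent drives the result.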

26
Related Work - Exponential Family PCA
  • Originally formulated in terms of Bregman
    divergences, it can be seen as a non-Bayesian
    version of the BPM where the $\pi$s are not
    constrained (to normalize to 1 or be positive).
  • Not a convex combination of natural parameters
    with the same sort of partial membership
    interpretation.
  • If we wanted, we could relax these same
    constraints to get a Bayesian version of
    Exponential Family PCA, but we'd have to tweak
    the model, e.g. a Gaussian prior on $\pi$.

27
BPM Learning
  • Hybrid Monte Carlo is an MCMC method that uses
    gradient information.
  • Hybrid Monte Carlo simulates dynamics of a system
    with continuous state variables on an energy
    function, the negative log of the target
    probability.
  • Gradients of the energy provide forces on the
    state variables which encourage the system to
    find high probability regions, while maintaining
    detailed balance.

28
Bregman Divergence
  • $D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$
  • F is a strictly convex function, p and q are
    points
  • Intuitively, the difference between the value of F
    at p and the value of the first order Taylor
    expansion of F around q, evaluated at p.
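
The definition can be checked numerically with a short sketch. The function name `bregman_divergence` and the two example choices of F are illustrative; squared Euclidean distance and KL divergence are the standard special cases.

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """F(p) minus the first order Taylor expansion of F around q, evaluated at p."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(F(p) - F(q) - grad_F(q) @ (p - q))

# F(v) = ||v||^2 gives the squared Euclidean distance.
p, q = np.array([1.0, 2.0]), np.array([3.0, 0.0])
squared_euclidean = bregman_divergence(lambda v: v @ v, lambda v: 2.0 * v, p, q)

# F(v) = sum v log v (negative entropy) gives the KL divergence
# when p and q are probability vectors.
u, w = np.array([0.5, 0.5]), np.array([0.25, 0.75])
kl = bregman_divergence(lambda v: np.sum(v * np.log(v)), lambda v: np.log(v) + 1.0, u, w)
```

Because F is strictly convex, the divergence is zero only when p equals q, and it is generally not symmetric in its arguments.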

29
LDA Review
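  • 1. For each topic z = 1, ..., K:
  •    Draw a distribution over words for topic z
  • 2. For each document d = 1, ..., D:
  •    a) Draw topic proportions for document d
  •    b) For each word n = 1, ..., N_d:
  •       i. Draw a topic assignment from the document's
          topic proportions
  •       ii. Draw the word from the assigned topic's
          distribution over words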
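
The LDA generative process can be sketched as follows, assuming symmetric Dirichlet priors and a fixed document length for simplicity. The function name, the hyperparameters `alpha` and `eta`, and all sizes are hypothetical.

```python
import numpy as np

def sample_lda_corpus(n_docs, doc_length, n_topics, vocab_size, alpha=0.5, eta=0.5, seed=0):
    """Generate word-index documents from the LDA generative process."""
    rng = np.random.default_rng(seed)
    # 1. For each topic, draw a distribution over the vocabulary.
    topics = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)
    docs, proportions = [], []
    # 2. For each document:
    for _ in range(n_docs):
        # a) Draw the document's topic proportions.
        theta = rng.dirichlet(np.full(n_topics, alpha))
        # b-i) Draw a discrete topic assignment for every word.
        z = rng.choice(n_topics, size=doc_length, p=theta)
        # b-ii) Draw each word from its assigned topic.
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        docs.append(words)
        proportions.append(theta)
    return docs, np.array(proportions), topics

docs, proportions, topics = sample_lda_corpus(n_docs=3, doc_length=10, n_topics=2, vocab_size=20)
```

The per-word discrete assignment `z` is the extra latent variable that LDA must infer for every word, in contrast to the all-continuous parameters of the BPM.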