Statistical Models for Partial Membership

1
Statistical Models for Partial Membership

Katherine Heller
Gatsby Computational Neuroscience Unit, UCL
Sinead Williamson and Zoubin Ghahramani
University of Cambridge

2
Partial Membership

Example: Person with mixed ethnic background.
Someone who is 50% Asian and 50% European partly
belongs to 2 different groups (ethnicities).
This partial membership may be relevant for
predicting this person's phenotype or food
preferences.

Conceptually not the same as uncertain
membership.
Being certain that someone is half Asian and half
European is very different from being unsure of
their ethnicity.
More evidence (like DNA tests) can help resolve
uncertainty but will not change their ethnicity
memberships.
Prior work on modeling partial membership comes
from the fuzzy logic community.

3
Outline

Goal: Describe a fully probabilistic approach to
data modeling with partial memberships.
Introduction
Bayesian Partial Membership Model (BPM)
BPM Learning
Experiments
Synthetic
Senate Roll Call data
Related Work
Conclusions
Nonparametric Extension?

4
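Finite Mixture Models
Consider modeling a data set, $X = \{x_n\}_{n=1}^N$, using a finite mixture of K
components:
$p(x_n \mid \Theta) = \sum_{k=1}^K \rho_k \, p_k(x_n \mid \theta_k)$
Generative Process:
1) Choose a cluster
2) Generate a data point from that cluster
Equivalently,
$p(x_n \mid \Theta) = \sum_{\pi_n} p(\pi_n) \prod_{k=1}^K p_k(x_n \mid \theta_k)^{\pi_{nk}}$
where $\pi_{nk} \in \{0,1\}$ and $\sum_k \pi_{nk} = 1$.
The $\pi_{nk}$ denote memberships of data points to clusters!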
5
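Finite Mixture Models
Continuous Relaxation:
$p(x_n \mid \pi_n, \Theta) = \frac{1}{c} \prod_{k=1}^K p_k(x_n \mid \theta_k)^{\pi_{nk}}$
where $\pi_{nk} \in [0,1]$, $\sum_k \pi_{nk} = 1$, and c is a normalizing constant.
The $\pi_{nk}$ now denote partial memberships of data points to
clusters!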
6
Why does this make sense?
[Figure: Partial Membership vs. Mixture Model for memberships (0,1), (.5,.5), and (1,0)]

If there is an Asian cluster and a European
cluster, the partial membership model will better
capture people with mixed ethnicity, whose
features lie in between.
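
The contrast can be sketched numerically. The snippet below is a minimal illustration, not the presenters' code: it assumes two 1-D Gaussian clusters (the means -3 and 3 and the unit variances are made-up values) and compares a standard 50/50 mixture with the normalized product of the cluster densities raised to the memberships (.5,.5). The function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Standard mixture: weighted sum of the cluster densities."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

def partial_membership_pdf(x, pi, means, variances):
    """Normalized product of cluster densities raised to the powers pi.

    For Gaussians the result is again a Gaussian whose precision and
    precision-weighted mean are pi-weighted sums over the clusters.
    """
    pi = np.asarray(pi, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    precision = np.sum(pi * precisions)
    mean = np.sum(pi * precisions * np.asarray(means, dtype=float)) / precision
    return gaussian_pdf(x, mean, 1.0 / precision)

# Assumed toy clusters, chosen only for illustration
means, variances = [-3.0, 3.0], [1.0, 1.0]
x = np.linspace(-8.0, 8.0, 1601)
mix = mixture_pdf(x, [0.5, 0.5], means, variances)
pm = partial_membership_pdf(x, [0.5, 0.5], means, variances)
print("a mode of the mixture:", x[np.argmax(mix)])
print("mode of the partial membership density:", x[np.argmax(pm)])
```

Under these assumptions the mixture keeps its mass at the two cluster centers, while the partial membership density peaks in between them, which is the behavior the slide describes for people with mixed ethnicity.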

7
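Exponential Family Distributions
Let's consider the case where each cluster distribution is in the exponential family:
$p_k(x \mid \theta_k) = \exp\{ s(x)^\top \theta_k + h(x) + g(\theta_k) \}$
$s(x)$: Sufficient Statistics
$\theta_k$: Natural Parameters
It follows that
$x_n \mid \pi_n, \Theta \sim \mathrm{Expon}\big( \sum_k \pi_{nk} \theta_k \big)$
Conjugate prior can be written as
$p(\theta_k \mid \lambda, \nu) = \exp\{ \lambda^\top \theta_k + \nu \, g(\theta_k) + f(\lambda, \nu) \}$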
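
The "it follows that" step can be spelled out. The derivation below is a sketch under assumed notation: $s(x)$ for the sufficient statistics, $\theta_k$ for the natural parameters, $g$ for the log-normalizer term, $\pi_k \ge 0$ with $\sum_k \pi_k = 1$ for one data point's partial memberships, and $c$ for the normalizing constant.

```latex
\begin{align*}
p_k(x \mid \theta_k) &= \exp\{ s(x)^\top \theta_k + h(x) + g(\theta_k) \} \\
\prod_{k=1}^{K} p_k(x \mid \theta_k)^{\pi_k}
  &= \exp\Big\{ s(x)^\top \sum_k \pi_k \theta_k
       + h(x) \sum_k \pi_k + \sum_k \pi_k \, g(\theta_k) \Big\} \\
  &= \exp\{ s(x)^\top \tilde{\theta} + h(x) \}
     \cdot \exp\Big\{ \sum_k \pi_k \, g(\theta_k) \Big\},
     \qquad \tilde{\theta} = \sum_k \pi_k \theta_k \\
\frac{1}{c} \prod_{k=1}^{K} p_k(x \mid \theta_k)^{\pi_k}
  &= \exp\{ s(x)^\top \tilde{\theta} + h(x) + g(\tilde{\theta}) \}
\end{align*}
```

The last factor in the third line does not depend on $x$, so it is absorbed by the normalization. The result is a member of the same exponential family whose natural parameter is the convex combination $\tilde{\theta}$.
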
8
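Bayesian Partial Membership Model
Generative Process (Ethnicity Example)
For each k:
$\theta_k \sim \mathrm{Conj}(\lambda, \nu)$: Defines a distribution over features for each of
the K ethnic groups
$\rho \sim \mathrm{Dir}(\alpha)$: Defines ethnic composition of the population
$a \sim \mathrm{Exp}(b)$: Controls how similar to the population an
individual is expected to be
For each n:
$\pi_n \sim \mathrm{Dir}(a\rho)$: Ethnic composition of individual n
$x_n \sim \mathrm{Expon}(\sum_k \pi_{nk} \theta_k)$: Feature values of individual n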
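
A generative process of this shape can be sketched for binary features. The code below is an illustration under stated assumptions, not the presenters' implementation: clusters are multivariate Bernoulli with log-odds as natural parameters, the conjugate prior is realized as a Beta draw on each feature probability, the prior on the scaling variable `a` is taken to be exponential with rate `b`, and the function name and argument names are hypothetical.

```python
import numpy as np

def sample_bpm_bernoulli(n_points, n_dims, alpha, b, rng, lam=1.0, nu=2.0):
    """Draw binary data from a BPM-style generative process (illustrative sketch)."""
    alpha = np.asarray(alpha, dtype=float)
    n_clusters = alpha.size
    # Per-cluster feature probabilities from Beta(lam, nu - lam), mapped to log-odds.
    # Clipping keeps the log-odds finite.
    p = np.clip(rng.beta(lam, nu - lam, size=(n_clusters, n_dims)), 1e-6, 1 - 1e-6)
    theta = np.log(p) - np.log1p(-p)
    rho = rng.dirichlet(alpha)                    # composition of the population
    a = rng.exponential(scale=1.0 / b)            # assumed: rate b, i.e. mean 1/b
    pi = rng.dirichlet(a * rho, size=n_points)    # composition of each individual
    logits = pi @ theta                           # convex combination of natural parameters
    probs = 1.0 / (1.0 + np.exp(-logits))
    x = (rng.random((n_points, n_dims)) < probs).astype(int)
    return x, pi, theta, rho, a

rng = np.random.default_rng(0)
x, pi, theta, rho, a = sample_bpm_bernoulli(50, 32, alpha=[1.0, 1.0, 1.0], b=1.0, rng=rng)
print(x.shape, pi.shape)
```

Each row of `pi` sums to 1, and each data point is drawn from a single Bernoulli distribution whose log-odds are the membership-weighted average of the cluster log-odds, rather than from one cluster chosen at random.
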
9
Bayesian Partial Membership Model
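Generative Process
$\rho \sim \mathrm{Dir}(\alpha)$, $a \sim \mathrm{Exp}(b)$
For each k: $\theta_k \sim \mathrm{Conj}(\lambda, \nu)$
For each n: $\pi_n \sim \mathrm{Dir}(a\rho)$, $x_n \sim \mathrm{Expon}(\sum_k \pi_{nk} \theta_k)$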
10
BPM Sampled Data

Each of the four plots shows 3000 data points
drawn from the BPM with the same 3
full-covariance Gaussian clusters.

11
BPM Theory
Lemma 1: In the limit as $a \to 0$ the exponential
family BPM model is a mixture of K components
with mixing proportions $\rho$.
Lemma 2: In the limit as $a \to \infty$ the exponential
family BPM model has only one component with
natural parameters $\sum_k \rho_k \theta_k$.
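
The two limits can be checked empirically on the membership vectors alone. This is a rough sketch that assumes memberships are drawn from a Dirichlet with parameter `a * rho`; the values of `rho`, the two settings of `a`, and the helper name are illustrative choices.

```python
import numpy as np

def sample_memberships(a, rho, n_samples, rng):
    """Draw membership vectors from Dirichlet(a * rho)."""
    return rng.dirichlet(a * np.asarray(rho, dtype=float), size=n_samples)

rng = np.random.default_rng(0)
rho = np.array([0.5, 0.3, 0.2])

small = sample_memberships(0.05, rho, 5000, rng)    # small a: nearly one-hot memberships
large = sample_memberships(1000.0, rho, 5000, rng)  # large a: memberships close to rho

print("small a, mean largest membership:", small.max(axis=1).mean())
print("small a, how often each cluster dominates:",
      np.bincount(small.argmax(axis=1), minlength=3) / len(small))
print("large a, mean membership vector:", large.mean(axis=0))
print("large a, largest deviation from rho:", np.abs(large - rho).max())
```

With small `a` almost every point belongs to a single cluster, and clusters dominate with frequencies close to `rho`, which matches the mixture-model limit. With large `a` every point has memberships close to `rho`, so all points share nearly the same combined natural parameters.
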
12
BPM Learning

Want to infer all unknowns given X.
We treat the remaining parameters as fixed hyperparameters.
Goal: Infer the posterior over the unknowns using MCMC.
All parameters in the BPM are continuous so we
can use Hybrid Monte Carlo.
Hybrid Monte Carlo is an efficient MCMC method
that uses gradient information to find high
probability regions.

13
Synthetic Data

Generated synthetic binary data set of 50 data
points, 32 dimensions, and 3 clusters. Ran HMC
sampler for 4000 iterations. Computed

14
Senate Roll Call Data (2001-2002)

(99 senators + 1 outcome) x 633 votes
K = 2 multivariate Bernoulli clusters
Model adapted to handle missing data

15
Senate Roll Call Comparisons

Fuzzy K-means

Blue: Senator Schumer
Black: Outcome
Red: Senator Ensign

Partial membership values are very sensitive to
the exponent.
For no value of the exponent do the membership
values make sense.

16
Senate Roll Call Comparisons

Dirichlet Process Mixtures
DPM confidently infers 4 clusters
Uncertainty is not a good substitute
for partial membership

[Table: negative log predictive probability (in bits) across senators, comparing BPM and DPM by Mean, Median, Min, Max, and Outcome]
17
Image Data

329 Tower and Sunset Images with 240 simple
binary texture and color features and K = 2
clusters.

18
Related Work

Latent Dirichlet Allocation (LDA)
Mixed Membership Models
Fuzzy Clustering
Exponential Family PCA

19
Future Work

Would be nice to have a nonparametric version.
Obvious thing to try: Hierarchical Dirichlet
Processes. But this would require summing over
all infinitely many elements of the membership
vector, which isn't computationally feasible.
Also semantically not very nice.
Indian Buffet Processes might work. Sample an IBP
matrix with interpretation that a 1 means having
some non-zero amount of membership in that
cluster, then draw continuous exact amount
separately.

20
Conclusions

Developed a fully probabilistic approach to data
modeling with partial membership.
Uses continuous latent variables and can be seen
as a relaxation of clustering with standard
mixture models.
Used Hybrid Monte Carlo for inference, which was
extremely fast (finding sensible partial
membership structure after very few samples).

21
Thank You
22
Partial Membership

Cornerstone of fuzzy set theory
Traditional set theory: Items belong to a set or
they don't, {0,1}.
Fuzzy set theory: membership function
$\mu_S : X \to [0,1]$, where $\mu_S(x)$ denotes the degree
to which x belongs to set S.
Fuzzy logic versus probabilistic models
Misguided arguments that fuzzy logic is different
from or supersedes probability theory.
While it might be easy to dismiss fuzzy logic,
its framework for representing partial membership
has inspired many researchers.
Google Scholar: Over 45,000 fuzzy clustering
papers. Most cited papers cited as frequently as
most cited NIPS area papers.

23
Related Work - Latent Dirichlet Allocation (LDA)
and Mixed Membership Models

BPM generates data points at the document level
of LDA (no word plate).
Whereas LDA (or Mixed Membership models) assumes
words (or attributes) are drawn using the
memberships as mixing proportions in a mixture
model, and are factorized, the BPM uses the
memberships to form a convex combination of
natural parameters. Attributes are not drawn from
a mixture model and need not be factorized.
BPM - potentially faster MCMC sampling since BPM
has all continuous parameters and LDA must infer
a discrete topic assignment for each word.

24
Mixed Membership Model Generation
25
Related Work - Fuzzy Clustering

Fuzzy k-means iteratively minimizes the following
objective:
$J = \sum_n \sum_k \pi_{nk}^{\gamma} \, d^2(x_n, c_k)$
where d is the distance between a data point and
a cluster center, $\pi_{nk}$ is the degree of membership
of a data point in a cluster, and $\gamma$ controls the
amount of partial membership ($\gamma = 1$ is normal
k-means).
None of these variables have probabilistic
interpretations.
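
For reference, the standard fuzzy k-means (fuzzy c-means) updates can be sketched in a few lines. This is a generic textbook version, not the code behind the slides: it assumes squared Euclidean distance, an exponent `gamma` greater than 1, a fixed number of iterations, and caller-supplied initial centers (`init_centers` is a hypothetical argument). The toy data are made up.

```python
import numpy as np

def memberships(x, centers, gamma):
    """Membership update: u_nk proportional to d_nk^(-2 / (gamma - 1))."""
    # Squared distances, shape (n_points, n_clusters); the floor avoids division by zero
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
    inv = d2 ** (-1.0 / (gamma - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_kmeans(x, init_centers, gamma, n_iters=100):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iters):
        w = memberships(x, centers, gamma) ** gamma
        # Center update: mean of the data weighted by u^gamma
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
    return memberships(x, centers, gamma), centers

# Toy 1-D data: two tight groups near 0 and 10, plus one point at 3.0
x = np.concatenate([np.linspace(-1, 1, 20), np.linspace(9, 11, 20), [3.0]])[:, None]
for gamma in (1.1, 2.0, 5.0):
    u, centers = fuzzy_kmeans(x, [[0.0], [10.0]], gamma)
    print(f"gamma={gamma}: memberships of the point at 3.0 =", np.round(u[-1], 3))
```

On this toy data the memberships of the same point move from nearly hard assignment toward an even split as `gamma` grows, which illustrates how strongly the exponent drives the result.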

26
Related Work - Exponential Family PCA

Originally formulated in terms of Bregman
divergences, it can be seen as a non-Bayesian
version of the BPM where the membership weights
are not constrained (to normalize to 1 or be
positive).
Not a convex combination of natural parameters
with the same sort of partial membership
interpretation.
If we wanted, we could relax these same
constraints to get a Bayesian version of
Exponential Family PCA, but we'd have to tweak
the model, e.g. put a Gaussian prior on the weights.

27
BPM Learning

Hybrid Monte Carlo is an MCMC method that uses
gradient information.
Hybrid Monte Carlo simulates dynamics of a system
with continuous state variables on an energy
function.
Gradients of the energy function provide forces
on the state variables which encourage the system
to find high probability regions, while
maintaining detailed balance.
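
A minimal Hybrid (Hamiltonian) Monte Carlo sampler makes these steps concrete. The sketch below is a generic version, not the BPM sampler: it assumes unit-mass Gaussian momenta, leapfrog integration, and a Metropolis accept step, and it targets a toy 2-D Gaussian chosen only for illustration. The step size and trajectory length are arbitrary.

```python
import numpy as np

def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step_size, n_leapfrog, rng):
    """Generic HMC: energy is -log_prob(x), momenta are standard normal."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)              # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog: half step for momentum, full steps for position, half step for momentum
        p_new = p_new + 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new = x_new + step_size * p_new
            if i < n_leapfrog - 1:
                p_new = p_new + step_size * grad_log_prob(x_new)
        p_new = p_new + 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis accept/reject on the change in total energy
        h_old = -log_prob(x) + 0.5 * np.sum(p ** 2)
        h_new = -log_prob(x_new) + 0.5 * np.sum(p_new ** 2)
        if np.log(rng.random()) < h_old - h_new:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Toy target: unit-variance Gaussian centered at mu
mu = np.array([1.0, -2.0])
log_prob = lambda x: -0.5 * np.sum((x - mu) ** 2)
grad_log_prob = lambda x: -(x - mu)

rng = np.random.default_rng(0)
samples = hmc_sample(log_prob, grad_log_prob, np.zeros(2), 2000, 0.2, 10, rng)
print("sample mean:", samples.mean(axis=0))
```

The gradient of the log probability plays the role of the force, and the accept step corrects the discretization error of the leapfrog integrator so the chain keeps the target distribution invariant.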

28
Bregman Divergence

$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q), \, p - q \rangle$
F is a strictly convex function, p and q are
points.
Intuitively the difference between the value of F
at p and the value of the first order Taylor
expansion of F around q, evaluated at p.
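
The definition translates directly into code. The sketch below uses hypothetical helper names and two standard choices of F as examples: the squared Euclidean norm, which gives squared Euclidean distance, and negative entropy, which gives the KL divergence when both arguments are probability vectors.

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """F(p) minus the first-order Taylor expansion of F around q, evaluated at p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# F(v) = ||v||^2 gives squared Euclidean distance
sq_norm = lambda v: np.dot(v, v)
sq_norm_grad = lambda v: 2.0 * v

# F(v) = sum v log v (negative entropy) gives KL divergence on probability vectors
neg_entropy = lambda v: np.sum(v * np.log(v))
neg_entropy_grad = lambda v: np.log(v) + 1.0

p, q = np.array([0.2, 0.8]), np.array([0.5, 0.5])
print("squared Euclidean:", bregman_divergence(sq_norm, sq_norm_grad, p, q))
print("KL(p || q):", bregman_divergence(neg_entropy, neg_entropy_grad, p, q))
```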

29
LDA Review