Title: Information Bottleneck EM
1. Information Bottleneck EM
- Gal Elidan and Nir Friedman
School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
2. Learning with Hidden Variables
- Input: data over X1 ... XN with a hidden variable T whose values are unobserved (marked "?" in the data)
- Output: a model P(X,T)
[Figure: network structure with hidden T over observed X1 ... XN; data table with missing entries for T]
- Problem: no closed-form solution for ML estimation → use Expectation Maximization (EM)
- Problem: EM gets stuck in inferior local maxima
  - Random restarts
  - Deterministic annealing
  - Simulated annealing
- This talk: EM + information regularization for learning parameters
3. Learning Parameters
- Input: fully observed data over X1 ... XN
- Output: a model P_θ(X)
[Figure: network structure over X1 ... XN and the data table]
- Empirical distribution Q(X)
- Parametrization θ of P is read off Q in closed form, e.g.
  P_θ(X1) = Q(X1),  P_θ(X2|X1) = Q(X2|X1),  P_θ(X3|X1) = Q(X3|X1)
  (a minimal sketch of this estimation follows below)
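As a concrete illustration (not from the slides), here is a minimal sketch of this closed-form estimation for a small discrete Bayesian network. It assumes a pandas DataFrame of complete data; the variable names, the toy data, and the helper `ml_cpd` are all hypothetical.

```python
import pandas as pd

def ml_cpd(data: pd.DataFrame, child: str, parents: list[str]) -> pd.Series:
    """Closed-form ML estimate P_theta(child | parents) = Q(child | parents),
    i.e. conditional frequencies in the empirical distribution Q."""
    if not parents:
        return data[child].value_counts(normalize=True)
    return data.groupby(parents)[child].value_counts(normalize=True)

# Hypothetical fully observed data over X1, X2, X3 (structure X1 -> X2, X1 -> X3)
data = pd.DataFrame({
    "X1": [0, 0, 1, 1, 1, 0],
    "X2": [1, 0, 1, 1, 0, 0],
    "X3": [0, 0, 1, 0, 1, 1],
})
print(ml_cpd(data, "X1", []))        # P_theta(X1)      = Q(X1)
print(ml_cpd(data, "X2", ["X1"]))    # P_theta(X2 | X1) = Q(X2 | X1)
print(ml_cpd(data, "X3", ["X1"]))    # P_theta(X3 | X1) = Q(X3 | X1)
```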
4. Learning with Hidden Variables
- Input: the desired structure over X1 ... XN and T, plus data in which T is unobserved
[Figure: the structure and the data table with "?" entries for T]
EM iterations:
- Start from a guess of θ
- The empirical distribution Q(X,T) is not directly available, since T is unobserved
- For each instance ID y, complete the value of T, giving Q(T|Y)
- Form the completed empirical distribution Q(X,T,Y) = Q(X,Y) Q(T|Y)
- Re-estimate the parametrization θ of P, and repeat
5. EM Functional
The EM algorithm:
- E-Step: generate the completed empirical distribution Q
- M-Step: maximize the expected log-likelihood using Q
- EM is equivalent to optimizing a single functional of Q and P_θ (written out below)
- Each step increases the value of this functional
(Neal and Hinton, 1998)
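The functional referred to here is the Neal and Hinton (1998) free-energy / lower-bound form; writing Y for the instance ID as in the rest of the talk (notation may differ slightly from the original slide):

```latex
\mathcal{F}_{\text{EM}}[Q, \theta]
  \;=\; \mathbb{E}_{Q}\!\big[\log P_\theta(X, T)\big] \;+\; \mathbb{H}_Q(T \mid Y)
  \;=\; \log P_\theta(\mathcal{D})
        \;-\; \mathrm{KL}\big(Q(T \mid Y)\,\big\|\,P_\theta(T \mid X)\big)
```

The E-step maximizes F over Q (setting Q(T|y) = P_θ(T | x[y])) and the M-step maximizes it over θ, so each step increases F, which lower-bounds the log-likelihood.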
6. Information Bottleneck EM
- Target: trade off the EM target against the information between the hidden variable and the instance ID (written out below)
- In the rest of the talk:
  - Understanding this objective
  - How to use it to learn better models
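Combining the two labeled terms above (and the weighting that appears explicitly on the multiple-hidden-variables slide later in the deck), the target has the following form; γ ∈ [0,1] is the trade-off parameter and Y the instance ID. The exact sign and scaling convention in the paper may differ, so treat this as a reconstruction:

```latex
\mathcal{L}_{\text{IB-EM}}[Q, \theta]
  \;=\; \gamma\, \mathcal{F}_{\text{EM}}[Q, \theta] \;-\; (1 - \gamma)\, I_Q(T; Y)
```

At γ = 1 this is exactly the EM functional; at γ = 0 only the compression term I_Q(T;Y) remains.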
7. Information Regularization
- Motivating idea
  - Fit training data: set T to be the instance ID, so it fully predicts X
  - Generalization: forget the ID and keep only the essence of X
- Objective: a parameter-free regularization of Q, trading a (lower bound of the) likelihood of P_θ against compression of the instance ID
(Tishby et al., 1999)
8. Clustering Example
[Figure: the EM target plotted against the compression measure]
9Clustering example
?1
EMTarget
Compressionmeasure
1
5
6
total preservation
11
4
7
3
?1
10
2
8
9
T ? ID
10. Clustering Example
- At a smaller γ: the desired solution, with T compressed down to |T| = 2 clusters
[Figure: the EM target vs. the compression measure at the desired trade-off]
11. Information Bottleneck EM
- Formal equivalence with the Information Bottleneck: the target above can be rewritten in EM-functional form
- At γ = 1, EM and the Information Bottleneck coincide
- This generalizes the result of Slonim and Weiss for the univariate case
12. Information Bottleneck EM
- Formal equivalence with the Information Bottleneck
- The maximum over Q(T|Y) is obtained at a self-consistent point combining the prediction of T using P_θ, the marginal of T in Q, and a normalization term (reconstructed below)
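Matching the three labels on this slide (prediction of T using P_θ, marginal of T in Q, normalization), the self-consistent optimum can be written as below; this is reconstructed under the target convention given earlier, so the exact exponents are an assumption:

```latex
Q(t \mid y) \;=\; \frac{1}{Z(y, \gamma)}\; Q(t)^{1-\gamma}\;
                  \big(P_\theta(t \mid \mathbf{x}[y])\big)^{\gamma},
\qquad
Q(t) \;=\; \sum_y Q(y)\, Q(t \mid y)
```

where x[y] denotes the observed values of instance y and Z(y, γ) normalizes over t. At γ = 1 this reduces to the standard E-step posterior.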
13. The IB-EM Algorithm for fixed γ
- Iterate until convergence:
  - E-Step: maximize L_IB-EM by optimizing Q (the self-consistent update above)
  - M-Step: maximize L_IB-EM by optimizing P_θ (same as the standard M-step)
- Each step improves L_IB-EM
- Guaranteed to converge
(a minimal code sketch follows below)
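A minimal runnable sketch of this fixed-γ loop for a naive Bayes model with a single hidden variable T over binary features; all names, the toy data, and the single-pass E-step update are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 5))          # toy binary data: 100 instances, 5 features
K = 3                                          # cardinality of the hidden variable T

# Parameters theta of a naive Bayes model: P(T) and P(X_j = 1 | T)
p_t = np.full(K, 1.0 / K)
p_x = rng.uniform(0.3, 0.7, size=(K, X.shape[1]))

def posterior(X, p_t, p_x):
    """P_theta(T | x[y]) for every instance y; returns an (M, K) array."""
    log_lik = (X[:, None, :] * np.log(p_x)[None]
               + (1 - X[:, None, :]) * np.log(1 - p_x)[None]).sum(-1)
    log_post = np.log(p_t)[None] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def ib_em_fixed_gamma(X, p_t, p_x, gamma, iters=50):
    """E/M iterations at a fixed gamma (one pass of the self-consistent
    E-step update per iteration; the paper may iterate it to convergence)."""
    q = np.full((X.shape[0], len(p_t)), 1.0 / len(p_t))   # Q(T | Y)
    for _ in range(iters):
        # E-step: Q(t|y) proportional to Q(t)^(1-gamma) * P_theta(t | x[y])^gamma
        q = q.mean(0)[None] ** (1 - gamma) * posterior(X, p_t, p_x) ** gamma
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate theta from the completed empirical distribution
        p_t = q.mean(0)
        p_x = (q.T @ X + 1e-3) / (q.sum(0)[:, None] + 2e-3)   # light smoothing
    return q, p_t, p_x

q, p_t, p_x = ib_em_fixed_gamma(X, p_t, p_x, gamma=0.5)
```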
14. Information Bottleneck EM
- Target: trade off the EM target against the information between the hidden variable and the instance ID
- In the rest of the talk:
  - Understanding this objective
  - How to use it to learn better models
15. Continuation
- Follow the ridge of L_IB-EM from the optimum at γ = 0 (easy) towards γ = 1 (hard)
[Figure: surface of L_IB-EM over (Q, γ) with γ from 0 to 1, showing the ridge from the easy optimum to the hard one]
16. Continuation
- Recall: if Q is a local maximum of L_IB-EM, then the gradient of L_IB-EM with respect to Q(t|y) vanishes for all t and y
- We want to follow a path in (Q, γ) space along which this stationarity holds for every γ, i.e. a path that stays at a local maximum for all γ (spelled out below)
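Spelled out (the equations are missing from the extracted slide, so this is the standard continuation condition, stated as an assumption about what the slide showed): keeping L_IB-EM stationary in Q while γ changes means the step direction (dQ, dγ) must satisfy

```latex
\frac{\partial \mathcal{L}_{\text{IB-EM}}}{\partial Q(t \mid y)} = 0
\quad \text{for all } t, y
\qquad\Longrightarrow\qquad
\frac{\partial^2 \mathcal{L}_{\text{IB-EM}}}{\partial Q\, \partial Q}\, dQ
 \;+\; \frac{\partial^2 \mathcal{L}_{\text{IB-EM}}}{\partial Q\, \partial \gamma}\, d\gamma \;=\; 0
```

i.e. the total derivative of each stationarity condition along the path must vanish, which is what makes the path a "local maximum for all γ".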
17. Continuation Step
- Start at (Q, γ) where the gradient with respect to Q vanishes
- Compute the gradient of the stationarity conditions
- Take the direction orthogonal to it (tangent to the solution path)
- Take a step in the desired direction
[Figure: the path in (Q, γ) space with the starting point and the step direction]
18. Staying on the Ridge
- Potential problem: the direction is only tangent to the path, so a finite step can miss the optimum
- Solution: use EM steps (at the new γ) to regain the path
[Figure: the tangent step drifting off the ridge and EM steps pulling it back]
19. The IB-EM Algorithm
- Set γ = 0 (start at the easy solution)
- Iterate until γ = 1 (the EM solution is reached):
  - Iterate (stay on the ridge):
    - E-Step: maximize L_IB-EM by optimizing Q
    - M-Step: maximize L_IB-EM by optimizing P_θ
  - Step (follow the ridge):
    - Compute the gradient and the step direction
    - Take the step by changing γ and Q
(a schematic outer loop is sketched below)
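A schematic of the outer loop, reusing the fixed-γ routine sketched on slide 13 as the "regain the ridge" corrector. The γ schedule here is just a naive fixed grid, which replaces the gradient-based direction and calibrated step size of the actual algorithm, so it illustrates the control flow only.

```python
import numpy as np

rng = np.random.default_rng(1)

def ib_em(X, K, gammas=None):
    """Anneal gamma from 0 towards 1; at each gamma, re-optimize (Q, theta)
    with ib_em_fixed_gamma (slide 13) so we stay on the ridge."""
    if gammas is None:
        gammas = np.linspace(0.0, 1.0, 21)     # naive grid; the paper calibrates the steps
    p_t = np.full(K, 1.0 / K)
    p_x = rng.uniform(0.45, 0.55, size=(K, X.shape[1]))   # small asymmetry breaks ties
    for gamma in gammas:
        q, p_t, p_x = ib_em_fixed_gamma(X, p_t, p_x, gamma)
    return q, p_t, p_x
```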
20. Calibrating the Step Size
- Potential problem
  - Step size too small → too slow
  - Step size too large → overshoot the target
21. Calibrating the Step Size
- Recall that I(T;Y) measures compression of the instance ID; when I(T;Y) rises, more of the data is captured
- Use the change in I(T;Y) to calibrate the step (see the sketch below)
[Figure: a naive fixed schedule of γ values is too sparse in the interesting area; I(T;Y)-calibrated steps concentrate there]
- Non-parametric: involves only Q
- Can be bounded: I(T;Y) ≤ log2 |T|
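Because Q(T|Y) is represented explicitly, I(T;Y) is cheap to compute from Q alone, and the change it undergoes across a step can be used to accept, shrink, or enlarge the step. A sketch; the thresholds are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def mutual_info_t_y(q):
    """I(T;Y) in bits under the empirical Q: Y is uniform over the M instances
    and q[y, t] = Q(t | y).  Always bounded above by log2 |T|."""
    q_t = q.mean(axis=0)                                  # marginal Q(T)
    return (q * (np.log2(q + 1e-12) - np.log2(q_t + 1e-12)[None])).sum(axis=1).mean()

def calibrate_step(q_old, q_new, d_gamma, target_bits=0.05):
    """Shrink the gamma step if I(T;Y) jumped by more than target_bits
    (overshoot), enlarge it if almost nothing changed (too slow)."""
    delta = abs(mutual_info_t_y(q_new) - mutual_info_t_y(q_old))
    if delta > target_bits:
        return d_gamma * 0.5
    if delta < target_bits / 10:
        return d_gamma * 2.0
    return d_gamma
```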
22. The IB-EM Algorithm
- Set γ = 0
- Iterate until γ = 1 (the EM solution is reached):
  - Iterate (stay on the ridge):
    - E-Step: maximize L_IB-EM by optimizing Q
    - M-Step: maximize L_IB-EM by optimizing P_θ
  - Step (follow the ridge):
    - Compute the gradient and the step direction
    - Calibrate the step size using I(T;Y)
    - Take the step by changing γ and Q
23. The Stock Dataset
- Naive Bayes model
- Daily changes of 20 NASDAQ stocks; 1213 train instances, 303 test instances
- IB-EM outperforms the best of the EM solutions
- I(T;Y) follows the changes in likelihood
- Continuation follows the region of change
[Figure: train likelihood (roughly -23 to -19) as a function of γ from 0 to 1; the IB-EM trajectory against the best-of-EM baseline; marks show the evaluated values of γ]
(Boyen et al., 1999)
24. Multiple Hidden Variables
- We want to learn a model with many hidden variables
- Naive approach: potentially exponential in the number of hidden variables
- Variational approximation: use a factorized (Mean Field) form for the completion Q(T|Y), keeping the structure of P_θ (see below)
- L_IB-EM = γ (variational EM functional) − (1 − γ) (information regularization)
(Friedman et al., 2002)
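The factorized form referred to here is the standard mean-field restriction on the completion distribution; with T = (T_1, ..., T_L) the hidden variables and Y the instance ID (notation as in the rest of the talk, details hedged):

```latex
Q(\mathbf{T} \mid Y) \;\approx\; \prod_{i=1}^{L} Q(T_i \mid Y)
```

so the E-step only has to maintain one small table Q(T_i | Y) per hidden variable, instead of a joint table that is exponential in the number of hidden variables, while P_θ keeps its full structure.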
25. The USPS Digits Dataset
- 400 samples, 21 hidden variables
- Superior to all Mean Field EM runs
- Running time ≈ a single exact EM run
  - single IB-EM: 27 min; exact EM: 25 min/run
  - 3/50 EM runs are ≥ IB-EM; EM needs ≈ x17 the time for similar results
- Offers good value for your time!
26. Yeast Stress Response
- 173 experiments (variables), 6152 genes (samples), 25 hidden variables
- Superior to all Mean Field EM runs
- An order of magnitude faster than exact EM
  - IB-EM: 6 hours; exact EM: >60 hours (5-24 experiments)
- Effective when the exact solution becomes intractable!
27. Summary
- New framework for learning hidden variables
- Formal relation between the Information Bottleneck and EM
- Continuation for bypassing local maxima
- Flexible structure / variational approximation
Future Work
- Learn the optimal γ (≤ 1) for better generalization
- Explore other approximations of Q(T|Y)
- Model selection: learning cardinality and enriching structure
28. Relation to Weight Annealing
Weight Annealing (WA):
- Init: temperature hot
- Iterate until temperature is cold:
  - Perturb the instance weights w ∝ temperature
  - Use the reweighted Q_W and optimize
  - Cool down
[Figure: the data table of M instances with per-instance weights W over X1 ... XN]
- Similarities
  - Change the empirical distribution Q
  - Morph towards the EM solution
- Differences
  - IB-EM uses information regularization
  - IB-EM uses continuation
  - WA requires a cooling policy
  - WA is applicable to a wider range of problems
(Elidan et al., 2002)
29. Relation to Deterministic Annealing
Deterministic Annealing (DA):
- Init: temperature hot
- Iterate until temperature is cold:
  - Insert entropy ∝ temperature into the model
  - Optimize the noisy model
  - Cool down
[Figure: the model over X1 ... XN and the data table of M instances]
- Similarities
  - Use an information measure
  - Morph towards the EM solution
- Differences
  - DA is parameterization dependent
  - IB-EM uses continuation
  - DA requires a cooling policy
  - DA is applicable to a wider range of problems