Transcript and Presenter's Notes

Title: Hierarchical Mixture of Experts


1
Hierarchical Mixture of Experts
  • Presented by Qi An
  • Machine learning reading group
  • Duke University
  • 07/15/2005

2
Outline
  • Background
  • Hierarchical tree structure
  • Gating networks
  • Expert networks
  • E-M algorithm
  • Experimental results
  • Conclusions

3
Background
  • The idea of mixture of experts
  • First presented by Jacobs and Hinton in 1988
  • Hierarchical mixture of experts
  • Proposed by Jordan and Jacobs in 1994
  • Difference from previous mixture models
  • Mixing weights depend on the input (and, through the
    posterior weights used in training, on the output as well)

4
Example (ME)
5
One-layer structure
[Figure: one-layer mixture-of-experts structure. The input x feeds a gating network with ellipsoidal gating functions, producing weights g1, g2, g3, and three expert networks, producing outputs µ1, µ2, µ3; the overall output µ is the gate-weighted blend of the expert outputs. A code sketch of this forward pass follows.]
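To make the one-layer structure concrete, here is a minimal NumPy sketch of a mixture-of-experts forward pass. It uses softmax gating over linear experts for simplicity (the figure's ellipsoidal gating functions are not reproduced), and all shapes and names are illustrative assumptions rather than code from the talk.

```python
import numpy as np

def me_forward(x, V, U):
    """One-layer mixture of experts (illustrative sketch).
    x: input vector, shape (d,)
    V: gating parameters, shape (K, d)    -> weights g_1..g_K
    U: expert parameters, shape (K, p, d) -> outputs mu_1..mu_K
    """
    xi = V @ x                                   # gating linear predictors
    g = np.exp(xi - xi.max())
    g /= g.sum()                                 # softmax gating weights g_i
    mu_experts = np.einsum('kpd,d->kp', U, x)    # expert outputs mu_i (identity link)
    mu = g @ mu_experts                          # blended output mu = sum_i g_i mu_i
    return mu, g, mu_experts

# Example: 3 experts, 2-D input, 1-D output
rng = np.random.default_rng(0)
x = rng.normal(size=2)
V = rng.normal(size=(3, 2))
U = rng.normal(size=(3, 1, 2))
mu, g, _ = me_forward(x, V, U)
```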
6
Example (HME)
7
Hierarchical tree structure
[Figure: two-level hierarchical tree structure. Linear gating functions sit at the internal nodes of the tree; expert networks sit at the leaves.]
8
  • Expert network
  • Sits at the leaves of the tree
  • For each expert (i, j): linear predictor $\xi_{ij} = U_{ij} x$
  • Output of the expert: $\mu_{ij} = f(\xi_{ij})$, where $f$ is a
    link function, for example the logistic function for binary
    classification (a sketch in code follows)
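A minimal sketch of one expert network, assuming the logistic link from the binary-classification example above; the names U_ij and x follow the slide notation, everything else is illustrative.

```python
import numpy as np

def expert_output(U_ij, x):
    """Expert (i, j): linear predictor passed through a logistic link.
    U_ij: (d,) parameter vector for a scalar output, x: (d,) input."""
    xi = U_ij @ x                        # linear predictor xi_ij = U_ij x
    return 1.0 / (1.0 + np.exp(-xi))     # mu_ij = f(xi_ij), logistic link
```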
9
  • Gating network
  • Sits at the nonterminal (internal) nodes of the tree
  • Top layer: $g_i = \frac{e^{\xi_i}}{\sum_k e^{\xi_k}}$ with $\xi_i = v_i^T x$
  • Other layers: $g_{j|i} = \frac{e^{\xi_{ij}}}{\sum_k e^{\xi_{ik}}}$ with
    $\xi_{ij} = v_{ij}^T x$ (sketched in code below)

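A sketch of both levels of softmax gating for a two-level tree, matching the formulas above; the array shapes and variable names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # subtract max for numerical stability
    return e / e.sum()

def gating_weights(x, V_top, V_lower):
    """Gating weights for a two-level HME.
    V_top:   (I, d) rows v_i     -> top-level weights g_i
    V_lower: (I, J, d) rows v_ij -> conditional weights g_{j|i}"""
    g_i = softmax(V_top @ x)
    g_j_given_i = np.stack([softmax(V_lower[i] @ x)
                            for i in range(V_lower.shape[0])])
    return g_i, g_j_given_i
```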
10
  • Output
  • Formed at the non-leaf nodes by blending the outputs of the children
  • Top node: $\mu = \sum_i g_i \mu_i$
  • Other nodes: $\mu_i = \sum_j g_{j|i} \mu_{ij}$ (a combined
    forward-pass sketch follows)

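Putting the gating and expert pieces together, here is a self-contained sketch of the full two-level forward pass. It assumes linear experts with an identity link (a regression setting), which the talk does not fix, and all shapes and names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hme_forward(x, V_top, V_lower, U):
    """Two-level HME output: mu = sum_i g_i sum_j g_{j|i} mu_{ij}.
    V_top: (I, d), V_lower: (I, J, d) gating parameters;
    U: (I, J, d) linear experts with identity link (scalar outputs)."""
    g_i = softmax(V_top @ x)                                  # top-level gates g_i
    g_ji = np.stack([softmax(V_lower[i] @ x)                  # conditional gates g_{j|i}
                     for i in range(U.shape[0])])
    mu_ij = U @ x                                             # expert means, shape (I, J)
    mu_i = (g_ji * mu_ij).sum(axis=1)                         # internal-node outputs
    return float(g_i @ mu_i)                                  # top-node output
```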
11
Probability model
  • For each expert, assume the true output y is drawn from a
    distribution P with mean $\mu_{ij}$
  • Therefore, the total probability of generating y from x is the
    gated mixture
    $P(y \mid x, \theta) = \sum_i g_i \sum_j g_{j|i} P(y \mid x, \theta_{ij})$
    (computed in the sketch below)

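A sketch of that total probability, assuming Gaussian experts with a fixed variance; the talk leaves the form of P open, so the Gaussian choice and the variable names here are assumptions.

```python
import numpy as np

def mixture_likelihood(y, mu_ij, g_i, g_j_given_i, sigma=1.0):
    """P(y | x) = sum_i g_i sum_j g_{j|i} P_ij(y), with Gaussian P_ij of mean mu_ij.
    mu_ij: (I, J) expert means, g_i: (I,), g_j_given_i: (I, J)."""
    p_ij = np.exp(-0.5 * ((y - mu_ij) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return float(g_i @ (g_j_given_i * p_ij).sum(axis=1))
```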
12
Posterior probabilities
  • Since $g_{j|i}$ and $g_i$ are computed from the input x alone,
    we refer to them as prior probabilities
  • With knowledge of both the input x and the output y, Bayes'
    rule gives the posterior probabilities
    $h_i = \frac{g_i \sum_j g_{j|i} P_{ij}(y)}{\sum_k g_k \sum_l g_{l|k} P_{kl}(y)}$,
    $h_{j|i} = \frac{g_{j|i} P_{ij}(y)}{\sum_l g_{l|i} P_{il}(y)}$,
    and the joint posterior $h_{ij} = h_i h_{j|i}$
    (computed in the sketch below)

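A sketch of the posterior computation, reusing the Gaussian-expert assumption from the likelihood sketch above; everything beyond the formulas on the slide is illustrative.

```python
import numpy as np

def posteriors(y, mu_ij, g_i, g_j_given_i, sigma=1.0):
    """Posterior branch probabilities h_i, h_{j|i}, h_{ij} via Bayes' rule.
    mu_ij: (I, J) expert means, g_i: (I,), g_j_given_i: (I, J)."""
    p_ij = np.exp(-0.5 * ((y - mu_ij) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    joint = g_i[:, None] * g_j_given_i * p_ij               # g_i g_{j|i} P_ij(y)
    h_ij = joint / joint.sum()                              # joint posterior h_ij
    h_i = h_ij.sum(axis=1)                                  # top-level posterior h_i
    h_j_given_i = h_ij / np.maximum(h_i[:, None], 1e-12)    # conditional posterior h_{j|i}
    return h_i, h_j_given_i, h_ij
```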
13
E-M algorithm
  • Introduce auxiliary indicator variables $z_{ij}$, which can be
    interpreted as labels identifying the expert that generated
    each data point
  • With knowledge of the auxiliary variables, the probability
    model simplifies to
    $P(y, z \mid x, \theta) = \prod_i \prod_j \left[ g_i\, g_{j|i}\, P_{ij}(y) \right]^{z_{ij}}$

14
E-M algorithm
  • Complete-data log-likelihood
    $l_c(\theta) = \sum_t \sum_i \sum_j z_{ij}^{(t)} \left[ \ln g_i^{(t)} + \ln g_{j|i}^{(t)} + \ln P_{ij}(y^{(t)}) \right]$
  • The E-step replaces each $z_{ij}^{(t)}$ with its expected value,
    the posterior $h_{ij}^{(t)}$, giving
    $Q(\theta, \theta^{(k)}) = \sum_t \sum_i \sum_j h_{ij}^{(t)} \left[ \ln g_i^{(t)} + \ln g_{j|i}^{(t)} + \ln P_{ij}(y^{(t)}) \right]$

15
E-M algorithm
  • The M-step decouples into separate weighted maximum-likelihood problems
  • Expert networks: $\theta_{ij}^{(k+1)} = \arg\max_{\theta_{ij}} \sum_t h_{ij}^{(t)} \ln P_{ij}(y^{(t)})$
  • Top-level gating: $v_i^{(k+1)} = \arg\max_{v_i} \sum_t \sum_k h_k^{(t)} \ln g_k^{(t)}$
  • Lower-level gating: $v_{ij}^{(k+1)} = \arg\max_{v_{ij}} \sum_t \sum_l h_i^{(t)} h_{l|i}^{(t)} \ln g_{l|i}^{(t)}$
  • Each is a weighted GLM fit, solved with IRLS (a closed-form
    special case is sketched below)

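For the special case of Gaussian experts with an identity link, the expert M-step has a closed form as a weighted least-squares fit. The sketch below shows that case only; the Gaussian/identity assumption and the names are mine, and the general case uses IRLS as on the next slide.

```python
import numpy as np

def expert_m_step(X, y, h_ij):
    """Weighted least-squares M-step for one expert (Gaussian, identity link).
    X: (T, d) inputs, y: (T,) targets, h_ij: (T,) posterior weights h_ij^(t)."""
    A = (h_ij[:, None] * X).T @ X        # sum_t h_ij^(t) x^(t) x^(t)^T
    b = X.T @ (h_ij * y)                 # sum_t h_ij^(t) y^(t) x^(t)
    return np.linalg.solve(A, b)         # new expert weight vector u_ij
```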
16
IRLS
  • Iteratively reweighted least squares (IRLS) algorithm
  • An iterative algorithm for computing the maximum
    likelihood estimates of the parameters of a
    generalized linear model
  • A special case of the Fisher scoring method (one iteration
    is sketched below)

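A sketch of IRLS for a weighted logistic-regression GLM, the kind of fit the M-step requires for the gating networks and logistic experts; the per-case weights argument and all names are assumptions for illustration.

```python
import numpy as np

def irls_logistic(X, y, case_weights=None, n_iter=25, tol=1e-8):
    """Fit a logistic GLM by IRLS (Fisher scoring).
    X: (T, d) design matrix, y: (T,) targets in [0, 1],
    case_weights: (T,) optional weights, e.g. posteriors h_ij from the E-step."""
    T, d = X.shape
    w = np.ones(T) if case_weights is None else case_weights
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # current fitted means
        s = w * p * (1.0 - p) + 1e-12            # IRLS working weights
        grad = X.T @ (w * (y - p))               # score vector
        fisher = (s[:, None] * X).T @ X          # expected information matrix
        step = np.linalg.solve(fisher, grad)
        beta += step                             # Fisher-scoring update
        if np.linalg.norm(step) < tol:
            break
    return beta
```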
17
Algorithm
  • E-step: compute the posteriors $h_i$, $h_{j|i}$, $h_{ij}$ for
    each data point using the current parameters
  • M-step: re-fit the expert and gating networks by solving the
    weighted maximum-likelihood problems (via IRLS), then repeat
    until convergence
18
Online algorithm
  • This algorithm can also be used for online regression
  • For the expert networks, the parameters $U_{ij}$ are updated
    recursively after each observation, where $R_{ij}$ is the
    inverse covariance matrix for expert network EN(i, j)
    (an illustrative sketch follows)

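The slide's update equations did not survive extraction, so the following is only an illustrative posterior-weighted recursive-least-squares step of the general kind described, not the exact update from the talk; the decay factor lam and all other names are assumptions. The gating-network updates on the next slide follow the same pattern with $S_i$ and $S_{ij}$ in place of $R_{ij}$.

```python
import numpy as np

def online_expert_update(u_ij, R_ij, x, y, h_ij, lam=1.0):
    """One posterior-weighted RLS step for a scalar-output linear expert.
    u_ij: (d,) expert weights, R_ij: (d, d) inverse covariance matrix,
    h_ij: posterior weight of this expert on the current example."""
    err = y - u_ij @ x                                            # prediction error
    denom = lam / max(h_ij, 1e-12) + x @ R_ij @ x
    R_ij = (R_ij - np.outer(R_ij @ x, x @ R_ij) / denom) / lam    # matrix-inversion-lemma update
    u_ij = u_ij + h_ij * err * (R_ij @ x)                         # recursive parameter update
    return u_ij, R_ij
```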
19
Online algorithm
  • For the gating networks, the parameters are updated recursively
    in the same fashion, where $S_i$ is the inverse covariance
    matrix for the top-level gating network
  • and $S_{ij}$ is the inverse covariance matrix for the
    lower-level gating network

20
Results
  • Simulated data of a four-joint robot arm moving
    in three-dimensional space

21
Results
22
Conclusions
  • Introduces a tree-structured architecture for
    supervised learning
  • Trains much faster than the traditional back-propagation
    algorithm
  • Can be used for on-line learning

23
Thank you
  • Questions?