Title: Convex Point Estimation using Undirected Bayesian Transfer Hierarchies
Slide 1: Convex Point Estimation using Undirected Bayesian Transfer Hierarchies
- Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller
- Computer Science Dept., Stanford University
- UAI 2008
Presented by Haojun Chen, August 1st, 2008
Slide 2: Outline
- Background and motivation
- Undirected transfer hierarchies
- Experiments
- Degree of transfer coefficients
- Experiments
- Summary
Slide 3: Background (1/2)
- Transfer learning
  - Data from similar tasks/distributions are used to compensate for the sparsity of training data in the primary class or task
  - Example: use rhinos to help learn an elephant's shape
Resources: http://velblod.videolectures.net/2008/pascal2/uai08_helsinki/packer_cpe/uai08_packer_cpe_01.ppt
Slide 4: Background (2/2)
- Hierarchical Bayes (HB) framework
  - A principled approach to transfer learning
  - Joint distribution over the observed data and all class parameters:

      P(\mathcal{D}, \theta) = P(\theta_{root}) \prod_{c} P(\theta_c \mid \theta_{pa(c)}) \prod_{c} P(\mathcal{D}_c \mid \theta_c)

    where pa(c) denotes the parent of class c in the hierarchy
[Figure: example of a hierarchical Bayes parameterization.]
Slide 5: Motivation
- In practice, MAP point estimation is desirable, since full Bayesian computation can be difficult and computationally demanding
- Efficient point estimation is often out of reach in standard hierarchical Bayes models, because common conjugate priors such as the Dirichlet or normal-inverse-Wishart lead to objectives that are not convex in the parameters
- This paper proposes an undirected hierarchical Bayes (HB) reformulation that allows efficient point estimation
Slide 6: Undirected HB Reformulation
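The slide's body was a figure that did not survive extraction. As a sketch based on the paper's formulation, the undirected reformulation drops the directed conditional priors and scores all parameters with a single unnormalized objective that trades data fit against child-parent divergence penalties:

    \max_{\theta} \; F_{data}(\mathcal{D} \mid \theta) \;-\; \sum_{c} \mathrm{Div}\big(\theta_c, \theta_{pa(c)}\big)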
Slide 7: Purpose of Reformulation
- Easy to specify
  - Fdata can be a likelihood, classification, or other objective
  - Divergence can be L1 norm, L2 norm, ε-insensitive loss, KL divergence, etc.
  - No conjugacy or proper-prior restrictions
- Easy to optimize
  - Convex over θ if Fdata is concave and Divergence is convex
Slide 8: Experiment: Text categorization
20 Newsgroups dataset
- Bag-of-words model
- Fdata: regularized multinomial log-likelihood, Σ_i n_i log θ_i, where n_i is the frequency of word i
- Divergence: L2 norm
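Below is a minimal runnable sketch of this setup (the toy counts, lam, and all variable names are assumptions for illustration, not taken from the slides). It instantiates Slide 7's recipe: Fdata is a concave multinomial log-likelihood and the divergence is a convex L2 penalty to the parent, so point estimation is a convex program over the simplex-constrained parameters.

    import numpy as np
    from scipy.optimize import minimize

    # Toy hierarchy: one root and two child classes over a 5-word vocabulary.
    # counts[c][i] = frequency of word i in class c's training data.
    counts = np.array([[4., 1., 0., 2., 3.],
                       [0., 2., 5., 1., 1.]])
    C, V = counts.shape
    lam = 1.0  # strength of the child-parent divergence penalty

    def neg_objective(x):
        theta = x.reshape(C + 1, V)      # row 0: root, rows 1..C: children
        root, children = theta[0], theta[1:]
        loglik = np.sum(counts * np.log(children + 1e-12))  # F_data (concave)
        div = lam * np.sum((children - root) ** 2)          # L2 divergence (convex)
        return -(loglik - div)                              # minimize the negative

    # Each of the C+1 parameter vectors must lie on the probability simplex.
    cons = [{"type": "eq",
             "fun": lambda x, k=k: x.reshape(C + 1, V)[k].sum() - 1.0}
            for k in range(C + 1)]
    x0 = np.full((C + 1) * V, 1.0 / V)
    res = minimize(neg_objective, x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * ((C + 1) * V), constraints=cons)
    print(res.x.reshape(C + 1, V).round(3))

The children are pulled toward the shared root estimate; increasing lam strengthens the transfer.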
Slide 9: Text categorization: Result
Baselines: maximum likelihood at each node (no hierarchy); cross-validated regularization (no hierarchy); Shrinkage (McCallum et al. '98, with hierarchy)
[Figure: Newsgroup topic classification rate (y-axis, 0.35-0.7) vs. total number of training instances (x-axis, 75-375) for Max Likelihood (no regularization), Regularized Max Likelihood, Shrinkage, and Undirected HB.]
Slide 10: Experiment: Shape Modeling
Mammals dataset (Fink, '05)
- Task: density estimation (test likelihood)
- Instances represented by 60 x-y coordinates of landmarks on the animal's outline
- Fdata: regularized Gaussian log-likelihood, parameterized by the mean landmark location and the covariance over landmarks
- Divergence: L2 norm over mean and variance
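The slide's equation is missing from the extraction; only its callouts (mean landmark location, covariance over landmarks, regularization) survived. A plausible instantiation of "L2 norm over mean and variance", with assumed symbols \mu_c for class c's mean landmark vector and \sigma^2_c for its landmark variances, is:

    \mathrm{Div}\big(\theta_c, \theta_{pa(c)}\big) = \|\mu_c - \mu_{pa(c)}\|_2^2 + \|\sigma^2_c - \sigma^2_{pa(c)}\|_2^2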
Slide 11: Undirected HB Shape Modeling: Result
[Figure: Delta log-loss per instance (y-axis, -350 to 50) vs. total number of training instances (x-axis, 6-30), relative to Regularized Max Likelihood, for mammal pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino.]
Slide 12: Problem in Transfer
Not all parameters deserve equal sharing
Slide 13: Degrees of Transfer (DOT)
The divergence is split into subcomponents with weights λ, so that different strengths are allowed for different subcomponents and child-parent pairs:
- λ → 0 forces parameters to agree
- λ → ∞ allows parameters to be flexible
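One way to write the resulting objective, consistent with these limits (a sketch; the subscript convention is assumed here, with s indexing parameter subcomponents), is to scale each subcomponent's divergence by 1/λ:

    \max_{\theta} \; F_{data}(\mathcal{D} \mid \theta) \;-\; \sum_{c} \sum_{s} \frac{1}{\lambda_{c,s}} \mathrm{Div}\big(\theta_{c,s}, \theta_{pa(c),s}\big)

so that λ_{c,s} → 0 makes the penalty infinite (forcing agreement) while λ_{c,s} → ∞ removes it (allowing flexibility).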
Slide 14: Estimation of DOT Parameters
- Hyper-prior approach
  - Bayesian idea: put a prior on λ and add it to the optimization as a parameter alongside θ
  - Concretely: an inverse-Gamma prior (λ is forced to be positive)
[Figure: prior on the degree of transfer.]
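As a sketch of what the hyper-prior contributes to the objective (α and β are generic inverse-Gamma shape and scale hyperparameters, not taken from the slides), each λ adds the log of an inverse-Gamma density:

    \log p(\lambda) = -(\alpha + 1) \log \lambda - \frac{\beta}{\lambda} + \mathrm{const}, \qquad \lambda > 0

and λ is then optimized jointly with θ.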
Slide 15: DOT Shape Modeling: Result
[Figure: Delta log-loss per instance (y-axis, -15 to 15) vs. total number of training instances (x-axis, 6-30) for the Hyperprior approach, relative to Regularized Max Likelihood, for mammal pairs: Bison-Rhino, Elephant-Bison, Elephant-Rhino, Giraffe-Bison, Giraffe-Elephant, Giraffe-Rhino, Llama-Bison, Llama-Elephant, Llama-Giraffe, Llama-Rhino.]
Slide 16: Distribution of DOT coefficients
[Figure: Histogram of DOT coefficients under the hyper-prior approach (x-axis: 1/λ, 0-50; y-axis: count, 0-20), with θ_root marked; larger 1/λ indicates stronger transfer and smaller 1/λ weaker transfer, per the λ → 0 limit on Slide 13.]
Slide 17: Summary
- An undirected reformulation of the hierarchical Bayes framework is proposed for efficient convex point estimation
- Different degrees of transfer for different parameters are introduced, so that some parts of the distribution can be transferred to a greater extent than others