1
Discriminative Naïve Bayesian Classifiers

Kaizhu Huang
Supervisors: Prof. Irwin King,
Prof. Michael R. Lyu
Markers: Prof. Lai Wan Chan,
Prof. Kin Hong Wong

2
Outline

Background
Classifiers
Discriminative classifiers: Support Vector
Machines
Generative classifiers: Naïve Bayesian
Classifiers
Motivation
Discriminative Naïve Bayesian Classifiers
Experiments
Discussions
Conclusion

3
Background

Discriminative Classifiers
Directly maximize a discriminative function or
posterior function
Example: Support Vector Machines

4
Background

Generative Classifiers
Model the joint distribution of the features for each
class, P(x|C), and then use Bayes' rule to construct
the posterior classifier P(C|x).
Example: Naïve Bayesian Classifiers
Model the distribution for each class under the
assumption that each feature of the data is
independent of the other features, given the
class label.
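
The generative recipe above (estimate P(C) and each P(x_j|C), then apply Bayes' rule) can be sketched for discrete features as follows. This is a minimal illustration, not the presenter's implementation: the function names, the Laplace smoothing parameter `alpha`, and the assumption that every feature takes values in `0..n_values-1` are all introduced here for the example.

```python
import math
from collections import Counter

def train_nb(X, y, n_values, alpha=1.0):
    """Estimate P(C) and P(x_j = v | C) by smoothed relative frequencies.

    Assumes every feature takes integer values in range(n_values);
    alpha > 0 is Laplace smoothing (an assumption of this sketch).
    """
    classes = sorted(set(y))
    n_features = len(X[0])
    class_count = Counter(y)
    prior = {c: class_count[c] / len(y) for c in classes}
    cond = {}  # cond[c][j][v] = P(x_j = v | C = c)
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        total = len(rows) + alpha * n_values
        cond[c] = []
        for j in range(n_features):
            counts = Counter(row[j] for row in rows)
            cond[c].append([(counts[v] + alpha) / total for v in range(n_values)])
    return prior, cond

def log_joint(x, c, prior, cond):
    """log P(C = c) + sum_j log P(x_j | C = c): the naive independence assumption."""
    return math.log(prior[c]) + sum(math.log(cond[c][j][v]) for j, v in enumerate(x))

def predict(x, prior, cond):
    """Bayes' rule: the class with the largest joint also has the largest posterior."""
    return max(prior, key=lambda c: log_joint(x, c, prior, cond))

# Toy usage with two binary features (made-up data).
X = [[0, 0], [0, 1], [1, 1], [1, 0]]
y = [0, 0, 1, 1]
prior, cond = train_nb(X, y, n_values=2)
label = predict([0, 0], prior, cond)
```

Each class is fitted from its own rows only, which is the sense in which the generative approach ignores the other class during training.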

5
Background

Comparison

Example of Missing Information
From left to right: original digit, digit with 50% missing,
digit with 75% missing, and occluded digit.
6
Background

Why are generative classifiers not as accurate as
discriminative classifiers?

Generative classifiers only approximate the
intra-class information, which is incomplete.
The inter-class discriminative information
is discarded.

Scheme for generative classifiers in two-category
classification tasks
7
Background

Why are generative classifiers superior to
discriminative classifiers in handling missing
information problems?
SVM lacks the ability to perform inference under uncertainty.
NB can perform inference under uncertainty using the
estimated distribution.

A is the feature set; T is the subset of A that
is missing.
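
A short sketch of why NB handles missing features cheaply: under the independence assumption, summing a missing feature out of the joint contributes a factor of 1, so the posterior over the observed features A − T is obtained by simply skipping the features in T. The model parameters below are hypothetical numbers chosen for illustration, and `None` is used here as the marker for a missing value.

```python
# Hypothetical two-class model over three binary features (illustrative numbers).
prior = {0: 0.5, 1: 0.5}
cond = {  # cond[c][j][v] = P(x_j = v | C = c)
    0: [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],
    1: [[0.2, 0.8], [0.3, 0.7], [0.5, 0.5]],
}

def posterior(x, prior, cond):
    """Return P(C | observed features); entries of x equal to None are missing."""
    joint = {}
    for c in prior:
        p = prior[c]
        for j, v in enumerate(x):
            if v is not None:  # a missing feature sums out to a factor of 1
                p *= cond[c][j][v]
        joint[c] = p
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

# Only the first feature is observed; the other two are in the missing subset T.
post = posterior([0, None, None], prior, cond)
```

The cost is linear in the number of observed features, regardless of how many features are missing.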
8
Motivation

It seems that a good classifier should combine
the strategies of discriminative classifiers and
generative classifiers.
Our work trains one generative classifier, the
Naïve Bayesian classifier, in a discriminative way.

9
Roadmap of our work

Discriminative training
10
How does our work relate to other work?
Jaakkola and Haussler, NIPS 98
Difference: Our method performs the reverse
process, from generative classifiers to
discriminative classifiers.
Beaufays et al., ICASSP 99; Hastie et al., JRSS 96
Difference: Our method is designed for Bayesian
classifiers.
11
How does our work relate to other work?
3. Optimization on the posterior distribution P(C|x)
Logistic Regression (LR)
Difference: LR encounters computational
difficulties in handling missing information
problems. When the number of missing or unknown
features grows, inference becomes
intractable.
12
Roadmap of our work

13
Discriminative Naïve Bayesian Classifiers
Mathematical explanation of the Naïve Bayesian
classifier
Easily solved by the Lagrange multiplier method
Working scheme of the Naïve Bayesian classifier
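
The slide's own equations are not preserved in this transcript. As a hedged reconstruction of the standard argument, the Lagrange multiplier step for one class c with discrete features can be written as below. The symbols are assumptions introduced here: N_c is the number of training examples of class c, N_{cjv} is the number of those with feature j equal to v, and \theta_{cjv} stands for P(x_j = v | C = c).

```latex
\begin{aligned}
&\max_{\theta_c} \; \sum_{j}\sum_{v} N_{cjv}\,\log\theta_{cjv}
\quad \text{s.t.} \quad \sum_{v}\theta_{cjv} = 1 \;\; \text{for every feature } j, \\
&\mathcal{L} = \sum_{j}\sum_{v} N_{cjv}\,\log\theta_{cjv}
 + \sum_{j}\lambda_{j}\Bigl(1-\sum_{v}\theta_{cjv}\Bigr), \\
&\frac{\partial\mathcal{L}}{\partial\theta_{cjv}}
 = \frac{N_{cjv}}{\theta_{cjv}} - \lambda_{j} = 0
 \;\Rightarrow\; \theta_{cjv} = \frac{N_{cjv}}{\lambda_{j}}, \\
&\sum_{v}\theta_{cjv} = 1
 \;\Rightarrow\; \lambda_{j} = \sum_{v} N_{cjv} = N_c
 \;\Rightarrow\; \theta_{cjv} = \frac{N_{cjv}}{N_c}.
\end{aligned}
```

The solution is the relative frequency within the class, and the problem decouples across classes, which is why plain NB can fit each class separately.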
14
Discriminative Naïve Bayesian Classifiers (DNB)

Optimization function of DNB

Divergence term

On one hand, minimizing this function
approximates the dataset as accurately as
possible.
On the other hand, it also enlarges the
divergence between classes.
Optimizing the joint distribution directly
inherits the ability of NB to handle missing
information problems.
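
The objective itself did not survive in this transcript, so the following is only a sketch of the two goals described above. Assumptions made here and not taken from the slides: the fit term is a sum of per-feature KL divergences between empirical marginals and model marginals (which differs from the KL against the full empirical joint only by a constant that does not depend on the model), the divergence term is the reciprocal of the KL divergence between the two class models, and `W` is a hypothetical trade-off weight.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def factorized_kl(P, Q):
    """KL between two fully factorized distributions: the sum of per-feature KLs."""
    return sum(kl(p, q) for p, q in zip(P, Q))

def dnb_objective(emp1, emp2, P1, P2, W):
    """Fit term plus W times an ASSUMED divergence term (reciprocal KL).

    emp1, emp2: empirical per-feature marginals of class 1 and class 2.
    P1, P2:     model per-feature distributions of class 1 and class 2.
    Undefined (division by zero) when P1 and P2 are identical.
    """
    fit = factorized_kl(emp1, P1) + factorized_kl(emp2, P2)
    divergence_term = 1.0 / factorized_kl(P1, P2)  # small when classes are far apart
    return fit + W * divergence_term

# One binary feature per class, illustrative numbers only.
emp1 = [[0.9, 0.1]]
emp2 = [[0.2, 0.8]]
value_at_nb_solution = dnb_objective(emp1, emp2, emp1, emp2, W=1.0)
```

With `W = 0` the minimizer is the plain NB estimate; with `W > 0` both class models appear in the same term, so they can no longer be fitted independently.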

15
Discriminative Naïve Bayesian Classifiers (DNB)

Complete Optimization problem

The parameters of the two classes cannot be optimized
separately as in NB, since they are now interacting variables.
16
Discriminative Naïve Bayesian Classifiers (DNB)

Solve the Optimization problem
This is a nonlinear optimization problem under linear
constraints, solved using Rosen's gradient projection
method.

17
Discriminative Naïve Bayesian Classifiers (DNB)
Gradient and Projection matrix
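
The gradient and projection matrix from the slide are not preserved here. As a hedged illustration of the idea, for a single linear equality constraint A·theta = 1 with A = [1, ..., 1] (a probability vector summing to one), the projection matrix P = I − Aᵀ(AAᵀ)⁻¹A reduces to I − (1/n)·11ᵀ, so projecting the gradient means subtracting its mean. The sketch below covers only this equality constraint; handling active non-negativity bounds, which the full method also does, is not shown, and the function names are introduced for the example.

```python
def project_gradient(grad):
    """Apply P = I - (1/n) * ones * ones^T, the projector for sum(theta) = 1."""
    mean = sum(grad) / len(grad)
    return [g - mean for g in grad]

def projected_step(theta, grad, step):
    """Move against the projected gradient; sum(theta) is left unchanged."""
    direction = project_gradient(grad)
    return [t - step * d for t, d in zip(theta, direction)]

# Illustrative numbers: a distribution over three values and a made-up gradient.
theta = [0.5, 0.3, 0.2]
grad = [1.0, 2.0, 6.0]
new_theta = projected_step(theta, grad, step=0.01)
```

Because the projected direction sums to zero, every step stays on the constraint surface without a separate renormalization.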
18
Extension to Multi-category Classification
problems
19
Experimental results

Experimental Setup
Datasets
Five benchmark datasets from the UCI Machine Learning
Repository
Experimental Environment
Platform: Windows 2000
Development tool: Matlab 6.5

20
Without information missing

Observations
DNB outperforms NB on every dataset.
Compared with SVM, DNB wins on two datasets and loses
on three.
SVM outperforms DNB on Segment and Satimages.

21
With information missing

DNB uses the estimated distribution, marginalized
over the missing features, to conduct inference when
information is missing.
SVM sets the missing features to 0 (the
default way to process unknown features in
LIBSVM).

22
With information missing
23
With information missing
24
With information missing
25
With information missing

Observations
NB demonstrates a robust ability to handle
missing information problems.
DNB inherits the ability of NB to handle
missing information problems while achieving
higher classification accuracy than NB.
SVM cannot deal with missing information problems
easily.
On small datasets, DNB performs better
than NB.

26
Discussion

Why does SVM outperform DNB when no information
is missing?

SVM
DNB

SVM directly minimizes the error rate, while DNB
minimizes an intermediate term.
SVM assumes no model, while DNB assumes
independence among features. "All
models are wrong, but some are useful."

27
Discussion

How does DNB relate to the Fisher Discriminant (FD)?

Using the difference between the means of the two
classes as the divergence measure is less
informative than using the
distributions.
FD is usually used as a dimension reduction method
rather than a classification method.
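
For reference, the textbook two-class Fisher criterion makes the comparison concrete. This form is not taken from the slides, and the symbols are assumptions: \mu_1, \mu_2 are the class means and S_1, S_2 the within-class scatter matrices.

```latex
J(w) = \frac{\bigl(w^{\top}(\mu_1 - \mu_2)\bigr)^{2}}{w^{\top}(S_1 + S_2)\,w},
\qquad
w^{*} \propto (S_1 + S_2)^{-1}(\mu_1 - \mu_2)
```

Class separation enters only through the mean difference \mu_1 - \mu_2, whereas a divergence between full class distributions also reflects differences in their shape.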

28
Discussion

Can DNB be extended to general Bayesian Network
(BN) classifiers?
Finding optimal general Bayesian Network
classifiers is an NP-complete problem.
The structure learning problem will be involved.
Direct application of DNB will encounter
difficulties since the structure is not fixed in
restricted BNs.
Work on the tree-like discriminative Bayesian Network
classifier is ongoing.

29
Discussion
Discriminative training of Tree-like Bayesian
Network Classifiers
Two reference distributions are used in each
iteration.
Approximate the empirical distribution as closely
as possible,
and stay as far as possible from the distribution of
the other dataset.
30
Future work

Extensive evaluations of discriminative Bayesian
network classifiers, including Discriminative
Naïve Bayesian Classifiers and tree-like Bayesian
Network Classifiers.

31
Conclusion