Title: Discriminative Naïve Bayesian Classifiers
1Discriminative Naïve Bayesian Classifiers
- Kaizhu Huang
- Supervisors: Prof. Irwin King, Prof. Michael R. Lyu
- Markers: Prof. Lai Wan Chan, Prof. Kin Hong Wong
2Outline
- Background
- Classifiers
- Discriminative classifiers: Support Vector Machines
- Generative classifiers: Naïve Bayesian Classifiers
- Motivation
- Discriminative Naïve Bayesian Classifiers
- Experiments
- Discussions
- Conclusion
3Background
- Discriminative Classifiers
- Directly maximize a discriminative function or posterior function
- Example: Support Vector Machines
4Background
- Generative Classifiers
- Model the distribution for each class, P(x|C), and then use Bayes' rule to construct the posterior classifier P(C|x).
- Example: Naïve Bayesian Classifiers
- Model the distribution for each class under the assumption that each feature of the data is independent of the other features, given the class label.
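The generative recipe above (estimate a per-class distribution, then apply Bayes' rule) can be sketched in a few lines of Python. This is a minimal illustration for discrete features, not the presenters' implementation: the function names `fit_nb` and `predict_nb`, the `n_values` argument, and the Laplace smoothing constant `alpha` are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

def fit_nb(X, y, n_values, alpha=1.0):
    """Estimate P(C) and P(x_j | C) for discrete features by smoothed counting.

    X: list of feature tuples, y: list of class labels,
    n_values: number of distinct values each feature can take (assumed known),
    alpha: Laplace smoothing constant (an assumption, not from the slides).
    """
    classes = sorted(set(y))
    class_count = Counter(y)
    prior = {c: class_count[c] / len(y) for c in classes}
    # counts[(c, j)][v] = number of class-c samples whose feature j equals v
    counts = defaultdict(Counter)
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[(c, j)][v] += 1

    def cond(c, j, v):
        # Smoothed estimate of P(x_j = v | C = c)
        return (counts[(c, j)][v] + alpha) / (class_count[c] + alpha * n_values[j])

    return classes, prior, cond

def predict_nb(model, x):
    """Return argmax_C of log P(C) + sum_j log P(x_j | C), i.e. Bayes' rule
    combined with the feature-independence assumption."""
    classes, prior, cond = model
    scores = {
        c: math.log(prior[c]) + sum(math.log(cond(c, j, v)) for j, v in enumerate(x))
        for c in classes
    }
    return max(scores, key=scores.get)
```

The normalizing constant P(x) is the same for every class, so comparing the unnormalized log scores is enough to pick the most probable class.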
5Background
Example of Missing Information
From left to right: original digit, 50% missing digit, 75% missing digit, and occluded digit.
6Background
- Why are generative classifiers not as accurate as discriminative classifiers?
- It is incomplete for generative classifiers to approximate only the within-class information.
- The inter-class discriminative information is discarded.
Scheme for generative classifiers in two-category classification tasks
7Background
- Why are generative classifiers superior to discriminative classifiers in handling missing information problems?
- SVM lacks the ability to perform inference under uncertainty.
- NB can conduct uncertainty inference under the estimated distribution.
A is the feature set; T is the subset of A that is missing.
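For a naïve Bayes model, inference with a missing subset T is cheap: summing P(x_j|C) over all values of a missing feature gives 1, so the missing features simply drop out of the product. The sketch below illustrates this; the function name, the dictionary layout of the model, and the use of `None` to mark a missing feature are assumptions made for the example, not notation from the slides.

```python
import math

def nb_posterior_with_missing(prior, cond, x):
    """Posterior P(C | observed features) for a naive Bayes model.

    prior: {class: P(C)}
    cond:  {class: [ {value: P(x_j = value | C)} for each feature j ]}
    x:     list of feature values, with None marking a missing feature (in T).
    """
    log_joint = {}
    for c in prior:
        s = math.log(prior[c])
        for j, v in enumerate(x):
            if v is None:
                continue  # marginalized out: sum over values of P(x_j | C) is 1
            s += math.log(cond[c][j][v])
        log_joint[c] = s
    # Normalize in a numerically stable way
    m = max(log_joint.values())
    unnorm = {c: math.exp(s - m) for c, s in log_joint.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}
```

With every feature missing, the function returns the class prior, which is the expected limiting behaviour.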
8Motivation
- It seems that a good classifier should combine the strategies of discriminative classifiers and generative classifiers.
- Our work trains one generative classifier, the Naïve Bayesian Classifier, in a discriminative way.
9Roadmap of our work
Discriminative training
10How our work relates to other work?
Jaakkola and Haussler, NIPS 98
Difference: Our method performs the reverse process, from generative classifiers to discriminative classifiers.
Beaufays et al., ICASSP 99; Hastie et al., JRSS 96
Difference: Our method is designed for Bayesian classifiers.
11How our work relates to other work?
Optimization on the posterior distribution P(C|x)
Logistic Regression (LR)
Difference: LR encounters computational difficulties in handling missing information problems. As the number of missing or unknown features grows, inference becomes intractable.
12Roadmap of our work
13Discriminative Naïve Bayesian Classifiers
Easily solved by the Lagrange multiplier method
Mathematical Explanation of the Naïve Bayesian Classifier
Working Scheme of the Naïve Bayesian Classifier
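The equations on this slide did not survive extraction. As a hedged reconstruction of the standard argument (not necessarily the slide's exact notation), the Lagrange multiplier step for one class c and one discrete feature j can be written as follows. The symbols are assumptions introduced here: n_{cjv} is the number of class-c samples with feature j equal to v, n_c is the number of class-c samples, and \theta_{cjv} stands for P(x_j = v | C = c).

```latex
\begin{aligned}
&\max_{\theta}\; \sum_{v} n_{cjv}\,\log\theta_{cjv}
\quad\text{subject to}\quad \sum_{v}\theta_{cjv}=1,\\
&\mathcal{L}=\sum_{v} n_{cjv}\log\theta_{cjv}
  +\lambda\Bigl(1-\sum_{v}\theta_{cjv}\Bigr),\\
&\frac{\partial\mathcal{L}}{\partial\theta_{cjv}}
  =\frac{n_{cjv}}{\theta_{cjv}}-\lambda=0
  \;\Rightarrow\;\theta_{cjv}=\frac{n_{cjv}}{\lambda},\\
&\sum_{v}\theta_{cjv}=1\;\Rightarrow\;\lambda=\sum_{v}n_{cjv}=n_c,
\qquad \theta_{cjv}=\frac{n_{cjv}}{n_c}.
\end{aligned}
```

Each class and each feature can be solved independently in this way, which is why NB training reduces to counting relative frequencies.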
14Discriminative Naïve Bayesian Classifiers (DNB)
- Optimization function of DNB
Divergence term
- On one hand, minimizing this function tries to approximate the dataset as accurately as possible.
- On the other hand, optimizing this function also tries to enlarge the divergence between classes.
- Optimizing the joint distribution directly inherits the ability of NB to handle missing information problems.
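The optimization function itself is not recoverable from the extracted slide. The sketch below only illustrates the trade-off described in the bullets, under loudly stated assumptions: the fit term is taken to be the KL divergence from each class's empirical feature marginals to its model, and the divergence term is taken to be the reciprocal of the KL divergence between the two class models, weighted by a hypothetical factor `w`. The names `kl`, `factorized_kl`, and `dnb_style_objective` are illustrative and do not come from the presentation.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def factorized_kl(P, Q):
    """KL between two fully factorized distributions: the per-feature KLs add up."""
    return sum(kl(p, q) for p, q in zip(P, Q))

def dnb_style_objective(emp1, emp2, p1, p2, w=1.0):
    """Fit term plus w times a term that shrinks as the class models move apart.

    emp1, emp2: empirical per-feature marginals of class 1 and class 2.
    p1, p2:     candidate per-feature model distributions.
    ASSUMPTION: the reciprocal-of-KL divergence term and the weight w are
    illustrative choices; the slide's actual formula was not recoverable.
    The value is undefined when p1 and p2 coincide (division by zero).
    """
    fit = factorized_kl(emp1, p1) + factorized_kl(emp2, p2)
    divergence_term = 1.0 / factorized_kl(p1, p2)
    return fit + w * divergence_term
```

With `w=0` the objective is minimized by the empirical marginals, which is the NB solution; with `w>0` the minimizer is pulled toward class models that lie further apart.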
15Discriminative Naïve Bayesian Classifiers (DNB)
- Complete Optimization problem
The parameters of the two classes cannot be optimized separately as in NB, since they are now interacting variables.
16Discriminative Naïve Bayesian Classifiers (DNB)
- Solve the Optimization problem
- Nonlinear optimization problem under linear constraints, solved using Rosen's gradient projection method.
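The core idea of gradient projection for linear equality constraints A·theta = b is to move along the negative gradient projected with P = I - Aᵀ(AAᵀ)⁻¹A, so that every step stays feasible. The sketch below is a simplified illustration for a single sum-to-one constraint, where P reduces to subtracting the mean. It is not the presenters' implementation: the function names, the fixed step size, and the step-halving rule used to keep probabilities positive (in place of Rosen's active-set handling of inequality constraints) are all assumptions.

```python
def project_onto_sum_zero(d):
    """Apply P = I - (1/n) 1 1^T, the projector for the constraint sum(theta) = 1."""
    mean = sum(d) / len(d)
    return [di - mean for di in d]

def projected_gradient_descent(grad, theta, step=0.05, iters=500):
    """Minimize a function over the probability simplex, given its gradient.

    grad:  function mapping theta (list of floats) to the gradient (list of floats).
    theta: feasible starting point (positive entries summing to 1).
    Only the equality constraint is projected; positivity is kept by halving
    the step, a simplification of Rosen's active-set treatment.
    """
    for _ in range(iters):
        direction = project_onto_sum_zero([-g for g in grad(theta)])
        t = step
        while any(th + t * d <= 0 for th, d in zip(theta, direction)):
            t /= 2
        theta = [th + t * d for th, d in zip(theta, direction)]
    return theta
```

As a sanity check, minimizing the cross-entropy against a fixed empirical distribution from a uniform start should recover that empirical distribution, matching the closed-form NB estimate.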
17Discriminative Naïve Bayesian Classifiers (DNB)
Gradient and Projection matrix
18Extension to Multi-category Classification problems
19Experimental results
- Experimental Setup
- Datasets
- 5 benchmark datasets from the UCI machine learning repository
- Experimental Environment
- Platform: Windows 2000
- Development tool: Matlab 6.5
20Without information missing
- Observations
- DNB outperforms NB on every dataset
- DNB wins on 2 datasets and loses on 3 datasets in comparison with SVM
- SVM outperforms DNB on Segment and Satimages
21With information missing
- DNB uses the estimated distribution, marginalized over the missing features, to conduct inference when information is missing
- SVM sets the missing features to 0 (the default way to process unknown features in LIBSVM)
22With information missing
23With information missing
24With information missing
25With information missing
- Observations
- NB demonstrates a robust ability to handle missing information problems.
- DNB inherits the ability of NB to handle missing information problems while achieving higher classification accuracy than NB.
- SVM cannot deal with missing information problems easily.
- On small datasets, DNB demonstrates a superior ability compared with NB.
26Discussion
- Why does SVM outperform DNB when no information is missing?
SVM
DNB
- SVM directly minimizes the error rate, while DNB minimizes an intermediate term.
- SVM assumes no model, while DNB assumes independence among features. "All models are wrong, but some are useful."
27Discussion
- How does DNB relate to the Fisher Discriminant (FD)?
FD
- Using the difference of the means between two classes as the divergence measure is less informative than using distributions.
- FD is usually used as a dimension reduction method rather than a classification method.
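The FD formula on this slide did not survive extraction. For reference, the standard two-class Fisher criterion is given below; the symbols are assumptions introduced here (m_1, m_2 are the class means, S_1, S_2 the within-class scatter matrices, w the projection direction), and the slide may have used different notation.

```latex
\begin{aligned}
J(w) &= \frac{\bigl(w^{\top}(m_1-m_2)\bigr)^{2}}{w^{\top} S_W\, w},
\qquad S_W = S_1 + S_2,\\
w^{*} &\propto S_W^{-1}\,(m_1-m_2).
\end{aligned}
```

The numerator depends on the two classes only through the difference of their means, which is the point of the first bullet: two classes with equal means but different distributions are indistinguishable to this criterion.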
28Discussion
- Can DNB be extended to general Bayesian Network (BN) classifiers?
- Finding optimal general Bayesian Network classifiers is an NP-complete problem.
- A structure learning problem will be involved. Direct application of DNB will encounter difficulties since the structure is not fixed in restricted BNs.
- Work on the tree-like discriminative Bayesian Network classifier is ongoing.
29Discussion
Discriminative training of tree-like Bayesian Network classifiers
Approximate the empirical distribution as closely as possible, and stay as far as possible from the distribution of the other dataset.
Two reference distributions are used in each iteration.
30Future work
- Extensive evaluations on discriminative Bayesian
network classifiers including Discriminative
Naïve Bayesian Classifiers and tree-like Bayesian
Network Classifiers.
31Conclusion
- We develop a novel model named Discriminative Naïve Bayesian Classifiers.
- It outperforms Naïve Bayesian Classifiers when no information is missing.
- It outperforms SVMs in handling missing information problems.