A Comparative Study of Classification Methods for Microarray Data Analysis

1
A Comparative Study of Classification Methods for
Microarray Data Analysis

Hong Hu, Jiuyong Li, Ashley Plank, Hua Wang and
Grant Daggard
Department of Mathematics and Computing
University of Southern Queensland, Australia

2
Microarray Data Classification

The task of classification is to build a model
(classifier) from categorized historical
Microarray data (training data), and then to use
the model to categorize future incoming data
(test data) automatically. It involves two
stages: learning and classification.

[Diagram: in the learning stage, an algorithm builds a model from the training data; in the classification stage, the model produces a prediction for the test data.]
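
The two stages can be sketched in a few lines of Python. This is only an illustration, not one of the methods compared in this study: the nearest-centroid rule, the function names `learn` and `classify`, and the expression values are all assumptions made for the example.

```python
from collections import defaultdict

def learn(training_data):
    """Learning stage: build a model (one centroid per class) from labelled samples."""
    sums, counts = {}, defaultdict(int)
    for features, label in training_data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def classify(model, features):
    """Classification stage: predict the class whose centroid is nearest."""
    def sq_dist(centroid):
        return sum((c - x) ** 2 for c, x in zip(centroid, features))
    return min(model, key=lambda label: sq_dist(model[label]))

# Made-up expression values for two genes; the labels are illustrative only.
training_data = [
    ([5.1, 0.2], "tumour"),
    ([4.8, 0.4], "tumour"),
    ([1.0, 3.9], "normal"),
    ([1.3, 4.2], "normal"),
]
model = learn(training_data)              # learning
prediction = classify(model, [4.5, 0.5])  # classification of a new sample
```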
3
Gene expression Microarray Data

4
Motivations

We are very interested in classifying Microarray
data using single-tree classifiers and ensemble
tree methods.
Reading through the literature on Microarray data
classification, it is difficult to find consensus
conclusions on their relative performance.

5
Ensemble method

An ensemble method combines multiple classifiers
(models) built on a set of re-sampled training
datasets or generated from various classification
methods on a training dataset. This set of
classifiers forms a decision committee, which
classifies future incoming samples.
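
A minimal sketch of the re-sampling variant (bagging) with a majority-vote committee follows. The base learner here is a one-threshold stump standing in for a decision tree; it is not C4.5, and the two class labels "A" and "B", the function names and the data are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Base learner (stand-in for a decision tree): best single threshold on one feature."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for low_label, high_label in (("A", "B"), ("B", "A")):
            errors = sum(
                (low_label if x <= threshold else high_label) != y for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low_label, high_label)
    _, threshold, low_label, high_label = best
    return lambda x: low_label if x <= threshold else high_label

def bagging(samples, n_models=11, seed=0):
    """Train each committee member on a bootstrap re-sample of the training data."""
    rng = random.Random(seed)
    committee = []
    for _ in range(n_models):
        resample = [rng.choice(samples) for _ in samples]
        committee.append(train_stump(resample))
    return committee

def vote(committee, x):
    """The committee classifies a new sample by majority vote."""
    return Counter(member(x) for member in committee).most_common(1)[0][0]

# Made-up one-feature data with two classes.
samples = [(0.1, "A"), (0.4, "A"), (0.5, "A"), (1.6, "B"), (1.9, "B"), (2.2, "B")]
committee = bagging(samples)
label = vote(committee, 0.3)
```

An odd number of committee members avoids tied votes in the two-class case.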

6
Algorithms selected for comparison

SVMs (Support Vector Machines)
C4.5 (decision tree)
BaggingC4.5
AdaBoostingC4.5
Random Forest

7
Experimental design methodology

Test data sets
Seven data sets from the Kent Ridge Biological
Data Set Repository were selected for our
experiments. They were collected from
well-researched journal papers.
Breast cancer
Lung cancer
Lymphoma
ALL-AML Leukemia
Colon
Ovarian
Prostate
Software used for comparison
Weka-3-5-2 package

8
Experimental design methodology (cont.)

Two sets of experiments on Microarray data, with
and without pre-processing
Ten-fold cross-validation
Sign test and Wilcoxon signed rank test
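
Ten-fold cross-validation can be sketched as below. This is not the Weka implementation used in the experiments; the helper names, the majority-class baseline and the data are hypothetical, and the sketch assumes there are at least as many samples as folds.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    """Average accuracy over k rounds; each fold is the test set exactly once."""
    folds = k_fold_indices(len(samples), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = train_fn([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Hypothetical baseline: always predict the most common training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

samples = list(range(20))                        # placeholder feature values
labels = ["tumour"] * 14 + ["normal"] * 6        # made-up class labels
accuracy = cross_validate(samples, labels, train_majority, predict_majority)
```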

9
Experimental design methodology (cont.)

Sign test
The sign test is used to test whether one random
variable in a pair tends to be larger than the
other random variable in the pair. Given n pairs
of observations, each pair is assigned a plus, a
tie or a minus. A plus means that the first value
is greater than the second, a minus means that it
is less, and a tie means that the two values are
equal. The null hypothesis is that pluses and
minuses are equally likely. If the null
hypothesis is rejected, then one random variable
tends to be greater than the other.
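
The procedure can be sketched as a two-sided exact test based on the binomial distribution. The function name `sign_test` and the accuracy values are made up for illustration; dropping ties before counting is a common convention and is assumed here.

```python
from math import comb

def sign_test(first, second):
    """Two-sided exact sign test on paired observations; ties are dropped."""
    plus = sum(a > b for a, b in zip(first, second))
    minus = sum(a < b for a, b in zip(first, second))
    n = plus + minus
    if n == 0:
        return plus, minus, 1.0
    k = min(plus, minus)
    # Under the null hypothesis each non-tied pair is a plus with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

# Made-up accuracies of two methods on seven data sets.
method_a = [0.90, 0.85, 0.92, 0.88, 0.95, 0.91, 0.87]
method_b = [0.80, 0.84, 0.90, 0.88, 0.90, 0.89, 0.80]
plus, minus, p_value = sign_test(method_a, method_b)
```

With these values there are six pluses, no minuses and one tie, so the two-sided p-value is 2/64.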

10
Experimental design methodology (cont.)

Wilcoxon signed rank test
The sign test uses only the information of
whether one value in a pair is greater than, less
than or equal to the other. The Wilcoxon signed
rank test also uses the size of the differences
between pairs. Pairs with a difference of zero
are discarded, and the absolute differences of
the remaining pairs are ranked in ascending
order. When several pairs have equal absolute
differences, each of them is assigned the average
of the ranks they would otherwise have received.
The null hypothesis is that the differences have
a mean of 0.
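
The ranking step, and an exact two-sided p-value for a small number of pairs, can be sketched as follows. Attaching the sign of each difference to its rank and enumerating all sign assignments is the usual textbook construction and is assumed here. The function names and the accuracies are made up, and integer percentages are used so that tied differences compare exactly.

```python
from itertools import product

def signed_ranks(first, second):
    """Rank absolute differences (zeros dropped, ties averaged) and attach signs."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return [(r if d > 0 else -r) for r, d in zip(ranks, diffs)]

def wilcoxon_exact(first, second):
    """Two-sided exact p-value by enumerating all sign assignments (small n only)."""
    signed = signed_ranks(first, second)
    magnitudes = [abs(r) for r in signed]
    total = sum(magnitudes)
    w_plus = sum(r for r in signed if r > 0)
    statistic = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=len(magnitudes)):
        candidate = sum(m for m, s in zip(magnitudes, signs) if s)
        if min(candidate, total - candidate) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** len(magnitudes)

# Made-up accuracies (in percent) of two methods on seven data sets.
method_a = [90, 85, 92, 88, 95, 91, 87]
method_b = [80, 86, 90, 88, 90, 89, 80]
statistic, p_value = wilcoxon_exact(method_a, method_b)
```

Enumeration visits 2^n sign assignments, which is practical for the seven data sets used here but not for large n.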

11
Experimental results based on preprocessed data
Table 1: Average accuracy on seven preprocessed
data sets

With preprocessed data sets, all ensemble methods
on average perform better than C4.5 and LibSVM.
C4.5 and LibSVM perform similarly to each other.

12
The results of sign test
Table 2: Summary of the sign test at 95%
confidence between compared algorithms
13
The results of sign test (cont.)

The limitation of sign test
The sign test measures the direction of a
difference but not its magnitude. Differences of
0.01 and 10.0 are therefore treated the same in
the sign test, since only a plus or a minus is
recorded.

14
The results of Wilcoxon signed rank test
Table 3: Summary of the Wilcoxon signed rank test
at 95% confidence between compared algorithms
15
Experimental results based on original data sets
Table 4: Average accuracy on seven original data
sets. The last row shows, for every compared
classification method, the difference between
the average accuracy on preprocessed data and the
average accuracy on original data.
16
Sign test and Wilcoxon signed rank test
Table 5: Summary of the sign test at 95%
confidence between the differences of compared
algorithms
Table 6: Summary of the Wilcoxon signed rank test
at 95% confidence between the differences of
compared algorithms
17
Conclusion

All ensemble methods are significantly more
accurate than C4.5.
Data pre-processing significantly improves the
accuracy of all five compared methods.
There is insufficient evidence of a performance
difference between the SVMs and an ensemble
method, although the average accuracy of the SVMs
is much lower than that of an ensemble method.
The Wilcoxon signed rank test is better than the
sign test for the evaluation of Microarray
classification.

18
Questions?

Hong Hu
PhD student
Email huhong_at_usq.edu.au
Department of Maths and Computing, Faculty of
Sciences, USQ

19
Thank you
