A Comparative Study of Classification Methods for Microarray Data Analysis


Title: A Comparative Study of Classification Methods for Microarray Data Analysis


1
A Comparative Study of Classification Methods for
Microarray Data Analysis
  • Hong Hu, Jiuyong Li, Ashley Plank, Hua Wang, and
    Grant Daggard
  • Department of Mathematics and Computing
  • University of Southern Queensland, Australia

2
Microarray Data Classification
  • The task of classification is to build a model
    (classifier) from categorized historical
    microarray data (training data), and then use the
    model to categorize future incoming data (test
    data) automatically. It involves two stages:
    learning and classification.
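
The two stages can be sketched in a few lines of Python. This is only an illustration: a nearest-centroid classifier stands in for the methods compared in this talk, and the expression values, class labels, and function names are made up for the example.

```python
def learn(training_data):
    """Learning stage: build a model (one mean expression profile per class)."""
    sums, counts = {}, {}
    for profile, label in training_data:
        counts[label] = counts.get(label, 0) + 1
        running = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            running[i] += value
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(model, profile):
    """Classification stage: assign the class whose mean profile is closest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(profile, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical two-gene expression profiles with known categories.
training_data = [
    ([2.0, 0.1], "tumour"),
    ([1.8, 0.3], "tumour"),
    ([0.2, 1.9], "normal"),
    ([0.4, 2.1], "normal"),
]
model = learn(training_data)              # stage 1: learning
prediction = classify(model, [1.9, 0.2])  # stage 2: classification
print(prediction)
```

The model is built once from the training data and can then be reused for any number of incoming samples.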

[Figure: two-stage workflow. Learning: training data, algorithm, model. Classification: test data, model, prediction.]
3
Gene expression Microarray Data

4
Motivations
  • We are interested in classifying microarray data
    using single-tree classifiers and ensemble tree
    methods.
  • In the literature on microarray data
    classification, it is difficult to find consensus
    on their relative performance.

5
Ensemble method
  • An ensemble method combines multiple classifiers
    (models) built on a set of re-sampled training
    datasets or generated by various classification
    methods on one training dataset. This set of
    classifiers forms a decision committee, which
    classifies future incoming samples.
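
A decision committee built on re-sampled training data can be sketched as follows. This is a minimal illustration of bagging with a majority vote, assuming a one-feature threshold classifier (a "stump") as a stand-in for a full tree such as C4.5. The data and all function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-feature threshold classifier on (value, label) pairs, labels 0/1."""
    best = None
    for threshold, _ in samples:
        for polarity in (0, 1):
            # polarity is the label predicted when value >= threshold
            errors = sum(
                1 for value, label in samples
                if (polarity if value >= threshold else 1 - polarity) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda value: polarity if value >= threshold else 1 - polarity


def bagging(samples, n_models=25, seed=0):
    """Train one stump per bootstrap re-sample of the training data."""
    rng = random.Random(seed)
    committee = []
    for _ in range(n_models):
        resample = [rng.choice(samples) for _ in samples]  # draw with replacement
        committee.append(train_stump(resample))
    return committee


def vote(committee, value):
    """Classify a sample by majority vote of the committee."""
    return Counter(model(value) for model in committee).most_common(1)[0][0]


# Hypothetical one-gene training data: (expression value, class).
training = [(0.1, 0), (0.4, 0), (0.5, 0), (1.9, 1), (2.2, 1), (2.8, 1)]
committee = bagging(training)
print(vote(committee, 0.2), vote(committee, 2.9))
```

Individual stumps trained on an unlucky re-sample can be wrong, but the vote averages those mistakes out, which is the motivation for using a committee.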

6
Algorithms selected for comparison
  • SVMs (support vector machines)
  • C4.5 (decision tree)
  • Bagging C4.5
  • AdaBoost C4.5
  • Random Forest

7
Experimental design methodology
  • Test data sets
  • Seven data sets from the Kent Ridge Biological
    Data Set Repository were selected for our
    experiments. They were collected from
    well-researched journal papers.
  • Breast cancer
  • Lung cancer
  • Lymphoma
  • ALL-AML Leukemia
  • Colon
  • Ovarian
  • Prostate
  • Software used for comparison
  • Weka 3.5.2 package

8
Experimental design methodology (cont.)
  • Two sets of experiments on microarray data, with
    and without preprocessing
  • Ten-fold cross-validation
  • Sign test and Wilcoxon signed rank test
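
Ten-fold cross-validation can be sketched as below. This is a plain, unstratified split written for illustration, not a reproduction of the Weka procedure used in the experiments. The helper names, the majority-class learner, and the 50 samples are assumptions made for the example.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(samples, labels, fit, k=10):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(model(samples[i]) == labels[i] for i in test)
    return correct / len(samples)


def fit_majority(train_samples, train_labels):
    """Toy learner (hypothetical): always predict the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda sample: majority


samples = list(range(50))          # placeholder samples
labels = [0] * 30 + [1] * 20       # placeholder class labels
accuracy = cross_validated_accuracy(samples, labels, fit_majority)
print(accuracy)
```

Any learner with the same `fit(samples, labels)` shape can be passed in place of `fit_majority`.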

9
Experimental design methodology (cont.)
  • Sign test
  • The sign test checks whether one random variable
    in a pair tends to be larger than the other.
    Given n pairs of observations, each pair is
    assigned a plus, a tie, or a minus: a plus when
    the first value is greater than the second, a
    minus when it is less, and a tie when the two are
    equal. The null hypothesis is that pluses and
    minuses are equally likely. If the null
    hypothesis is rejected, one random variable tends
    to be greater than the other.
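
The sign test described above can be sketched with an exact binomial calculation. The sketch assumes the common convention of dropping ties before counting, and the accuracy values are invented for illustration; they are not the results reported in this talk.

```python
from math import comb


def sign_test(first, second):
    """Return (pluses, minuses, two-sided exact p-value) for paired values."""
    plus = sum(a > b for a, b in zip(first, second))
    minus = sum(a < b for a, b in zip(first, second))
    n = plus + minus  # ties are dropped (assumed convention)
    if n == 0:
        return plus, minus, 1.0
    k = min(plus, minus)
    # Under the null hypothesis each non-tied pair is a plus with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)


# Hypothetical accuracies of two methods on seven data sets.
method_a = [0.90, 0.95, 0.80, 0.92, 0.85, 0.97, 0.70]
method_b = [0.85, 0.90, 0.75, 0.91, 0.80, 0.96, 0.65]
plus, minus, p_value = sign_test(method_a, method_b)
print(plus, minus, p_value)
```

With seven pairs, the smallest two-sided p-value the test can produce is 2/128, reached only when one method wins on every data set.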

10
Experimental design methodology (cont.)
  • Wilcoxon signed rank test
  • The sign test only uses whether one value in a
    pair is greater than, less than, or equal to the
    other. The Wilcoxon signed rank test also uses
    the size of the differences. Pairs with a
    difference of zero are discarded, and the
    absolute differences of the remaining pairs are
    ranked in ascending order. When several pairs
    have equal absolute differences, each is assigned
    the average of the ranks they would otherwise
    have received. The null hypothesis is that the
    differences are distributed symmetrically about
    0.
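
The ranking procedure can be sketched as follows. The function returns the two rank sums from which the test statistic is taken; it does not compute a p-value. The function name and the integer accuracy values (in percent) are assumptions made for illustration.

```python
def signed_rank_sums(first, second):
    """Return (W+, W-): rank sums of the positive and negative differences."""
    # Discard pairs whose difference is zero.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)  # ascending absolute difference
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average = (i + 1 + j + 1) / 2  # mean of the ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = average
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical accuracies (percent) of two methods on six data sets.
method_a = [90, 85, 70, 95, 80, 88]
method_b = [88, 85, 73, 92, 82, 80]
w_plus, w_minus = signed_rank_sums(method_a, method_b)
print(w_plus, w_minus)
```

Here the differences are 2, 0, -3, 3, -2, 8. The zero is discarded, the two differences of size 2 share rank 1.5, the two of size 3 share rank 3.5, and the difference of 8 gets rank 5. Integer values are used so that ties in absolute difference compare exactly.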

11
Experimental results based on preprocessed data
Table 1: Average accuracy on the seven preprocessed
data sets
  • With preprocessed data sets, all ensemble methods
    on average perform better than C4.5 and LibSVM.
  • C4.5 and LibSVM perform similarly to each other.

12
The results of sign test
Table 2: Summary of the sign test at the 95%
confidence level between compared algorithms
13
The results of sign test (cont.)
  • The limitation of the sign test
  • The sign test records the direction of a
    difference but not its magnitude. Differences of
    0.01 and 10.0 are therefore treated the same,
    since only a plus or a minus is used.

14
The results of Wilcoxon signed rank test
Table 3: Summary of the Wilcoxon signed rank test at
the 95% confidence level between compared algorithms
15
Experimental results based on original data sets
Table 4: Average accuracy on the seven original data
sets. The last row shows, for every compared
classification method, the difference between its
average accuracy on preprocessed data and on
original data.
16
Sign test and Wilcoxon signed rank test
Table 5: Summary of the sign test at the 95%
confidence level between the differences of compared
algorithms
Table 6: Summary of the Wilcoxon signed rank test at
the 95% confidence level between the differences of
compared algorithms
17
Conclusion
  • All ensemble methods are significantly more
    accurate than C4.5.
  • Data preprocessing significantly improves the
    accuracy of all five compared methods.
  • There is insufficient evidence of a performance
    difference between SVMs and an ensemble method,
    although the average accuracy of SVMs is much
    lower than that of an ensemble method.
  • The Wilcoxon signed rank test is better than the
    sign test for evaluating microarray
    classification.

18
Questions?
  • Hong Hu
  • PhD student
  • Email: huhong_at_usq.edu.au
  • Department of Mathematics and Computing, Faculty
    of Sciences, USQ

19
Thank you