Ensemble Clustering in Medical Diagnostics

1
Ensemble Clustering in Medical Diagnostics
  • Derek Greene, Alexey Tsymbal
  • Nadia Bolshakova, Pádraig Cunningham

Department of Computer Science, Trinity College
Dublin, Ireland
2
Agenda
  • Cluster Analysis
      • Overview
      • Applications in Medical Diagnostics
  • Ensemble Clustering
      • Motivation
      • General Model
      • Design Issues
  • Experimental Evaluation
      • Ensemble techniques
      • Empirical results
  • Implementation
  • Conclusion

3
Cluster Analysis
  • Data mining approach to discover hidden patterns
    in data.
  • Divide a dataset into groups or clusters of
    objects based on a given similarity criterion.
  • Unsupervised learning procedure
  • Often no information exists concerning underlying
    structure of partition.
  • i.e. number of clusters or their composition.

4
Applications in Medical Diagnostics
  • Examples
      • Categorization of patients into cohesive
        sub-groups.
          • e.g. clustering of cancer patient data to define
            previously unrecognized tumour sub-types
      • Analysis of medical imaging data.
          • e.g. identification of cell tissue types from MRI
            image data
      • Gene expression analysis.
          • e.g. identification of co-regulated gene groups

5
Common Cluster Analysis Methods
  • Hierarchical Methods
  • e.g. hierarchical agglomerative

6
Ensemble Clustering - Overview
  • Ensembles in Supervised Learning
      • Ensembles have been successfully applied in cases
        where classes are well-defined (e.g. Breiman,
        1996).
  • Ensemble Clustering
      • Combine the strengths of multiple partitions to
        produce a superior clustering (Strehl & Ghosh,
        2002).

7
Ensemble Clustering - Motivation
  • Accuracy in unsupervised learning
      • No pre-defined model for the data.
      • No definitive measure of accuracy.
      • A clustering that agrees with domain expert
        opinion is desirable.
  • Issues with common clustering algorithms
      • May be influenced by bias of clustering algorithm
        toward cluster shape and dispersion.
  • Goal for Ensemble Clustering
      • Aggregate a collection of base clusterings to
        produce a more accurate partition of a dataset.

8
Ensemble Clustering - Model
  • Generic model for Ensemble Clustering

[Figure: generic ensemble clustering model; input: Dataset]
9
Ensemble Design Decisions
  • Base Algorithm
      • Which clustering algorithm to apply to produce
        the base clusterings?
      • e.g. k-means, k-medoids, weak clustering
  • Generation Strategy
      • How many base clusterings to generate?
      • How can we ensure diversity among the base
        clusterings?
  • Integration Strategy
      • How should the base clusterings be aggregated?

10
Experimental Overview
  • Goal
      • Evaluate ensemble generation and integration
        strategies on a varied collection of datasets.
  • Data
      • Benchmark datasets
          • Iris, 2-Spirals, Half-rings
      • Real-world medical databases from UCI ML
        repository
          • Breast cancer
          • Pima Indians diabetes
          • Cleveland heart disease
          • BUPA liver-disorders
          • Lymphography
          • Thyroid disease

11
Generation Strategies
  • Plain
      • Rely on stochastic element in base clustering
        algorithm.
  • Random-k
      • Randomly select number of clusters (k).
  • Bagging
      • Generate clusterings on random subset of data.
  • Random projection
      • Randomly transform data to new set of features.
  • Random subspacing
      • Randomly select subset of original features.
  • Heterogeneous ensembles
      • Use multiple different base clustering algorithms.
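
The generation strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: the tiny `kmeans` helper, the function names, the 80% subset size for bagging and the Gaussian projection matrix are all hypothetical choices. Heterogeneous ensembles are omitted because they need a second base algorithm.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Tiny Lloyd-style k-means (assumed base algorithm); one integer label per point."""
    centres = rng.sample(points, k)  # random initial centres: the stochastic element
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def plain(points, k, rng):
    """Plain: rely only on the random initialisation of the base algorithm."""
    return kmeans(points, k, rng)

def random_k(points, k_range, rng):
    """Random-k: draw the number of clusters uniformly from k_range (inclusive)."""
    return kmeans(points, rng.randint(*k_range), rng)

def bagging(points, k, rng, fraction=0.8):
    """Bagging: cluster a random subset; objects left out get the label None."""
    idx = rng.sample(range(len(points)), int(fraction * len(points)))
    sub_labels = kmeans([points[i] for i in idx], k, rng)
    labels = [None] * len(points)
    for i, lab in zip(idx, sub_labels):
        labels[i] = lab
    return labels

def random_subspace(points, k, rng, n_features):
    """Random subspacing: keep a random subset of the original features."""
    dims = rng.sample(range(len(points[0])), n_features)
    return kmeans([tuple(p[d] for d in dims) for p in points], k, rng)

def random_projection(points, k, rng, n_components):
    """Random projection: map the data through a random Gaussian matrix."""
    d = len(points[0])
    matrix = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_components)]
    projected = [tuple(sum(row[j] * p[j] for j in range(d)) for row in matrix)
                 for p in points]
    return kmeans(projected, k, rng)

# Toy data: two groups of 20 points in three dimensions (made up for illustration).
rng = random.Random(0)  # fixed seed so the run is repeatable
data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3), rng.gauss(0, 1))
        for cx, cy in [(0, 0), (3, 3)] for _ in range(20)]
ensemble = [
    plain(data, 2, rng),
    random_k(data, (2, 4), rng),
    bagging(data, 2, rng),
    random_subspace(data, 2, rng, 2),
    random_projection(data, 2, rng, 2),
]
```

Each entry of `ensemble` is one base clustering over the same 40 objects. The bagging member labels only the sampled objects, so an integration step would have to skip its `None` entries.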

12
Integration Strategies
  • Co-Occurrence Method
  • Determine the level of association between each
    pair of objects in a dataset (Jain & Fred, 2002).

[Figure: base clusterings of objects A, B, C, D, E and the resulting co-occurrence matrix]
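
A minimal sketch of the co-occurrence idea, assuming each base clustering is a list with one label per object and that every clustering labels every object. The three base clusterings over objects A to E are invented for illustration; they are not the ones shown on the slide.

```python
def co_occurrence(clusterings):
    """Fraction of base clusterings in which each pair of objects shares a cluster."""
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise counts to [0, 1] by the number of base clusterings.
    return [[c / len(clusterings) for c in row] for row in counts]

objects = ["A", "B", "C", "D", "E"]
base_clusterings = [
    [0, 0, 1, 1, 1],  # {A, B} {C, D, E}
    [0, 0, 0, 1, 1],  # {A, B, C} {D, E}
    [0, 1, 1, 2, 2],  # {A} {B, C} {D, E}
]
matrix = co_occurrence(base_clusterings)
for name, row in zip(objects, matrix):
    print(name, [round(v, 2) for v in row])
```

In this toy case D and E share a cluster in all three clusterings (entry 1.0), while A and E never do (entry 0.0).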
13
Integration Strategies (cont.)
  • Which algorithm to use for meta-clustering?
      • We apply a hierarchical agglomerative clustering
        algorithm to the co-occurrence matrix.
          • Single-linkage
          • Complete-linkage
          • Average-linkage
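
The meta-clustering step might look like the sketch below. It is a naive agglomerative procedure written for readability, not the authors' code: converting co-occurrence to distance as `1 - similarity`, the function name `agglomerative` and the example matrix are all assumptions made here.

```python
def agglomerative(similarity, n_clusters, linkage="average"):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    n = len(similarity)
    # Assumed conversion: objects that always co-occur are at distance 0.
    dist = [[1.0 - similarity[i][j] for j in range(n)] for i in range(n)]
    combine = {
        "single": min,                            # closest pair of members
        "complete": max,                          # farthest pair of members
        "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
    }[linkage]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([dist[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:  # ties go to the first pair found
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is still valid
        clusters[a].extend(merged)
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical co-occurrence matrix over five objects A..E.
t, u = 1 / 3, 2 / 3
similarity = [
    [1, u, t, 0, 0],
    [u, 1, u, 0, 0],
    [t, u, 1, t, t],
    [0, 0, t, 1, 1],
    [0, 0, t, 1, 1],
]
for linkage in ("single", "complete", "average"):
    print(linkage, agglomerative(similarity, 2, linkage))
```

The three linkage rules differ only in how the distance between two clusters is summarised, which is why the choice can change the final partition on less clear-cut matrices.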

14
Evaluation - Accuracy
  • Comparison to single k-means algorithm based on
    Jaccard accuracy score
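
The slides do not spell out the Jaccard accuracy score. One common reading, assumed here, is the pair-counting Jaccard index: over all pairs of objects, the number of pairs grouped together in both the clustering and the reference classes, divided by the number grouped together in at least one of them. The function name and the toy labels are illustrative.

```python
from itertools import combinations

def jaccard_score(predicted, truth):
    """Pair-counting Jaccard index between a clustering and reference labels."""
    both = either = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        both += same_pred and same_true   # pair together in both labelings
        either += same_pred or same_true  # pair together in at least one
    return both / either if either else 1.0

truth = [0, 0, 0, 1, 1]
predicted = [0, 0, 1, 1, 1]
score = jaccard_score(predicted, truth)
print(round(score, 3))
```

Because only co-membership of pairs is compared, the score does not depend on how cluster labels are numbered.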

15
Evaluation - Diversity v. Accuracy
  • Comparison of generator diversity with Jaccard
    accuracy scores across all datasets
  • Results indicate that diversity alone is not
    sufficient to yield an improved solution.
  • Base accuracy is also important.
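
To make the diversity versus accuracy comparison concrete, here is a small sketch. The diversity measure is an assumption, since the slides do not define one: it is taken as the mean pairwise disagreement (one minus pair-counting Jaccard agreement) among base clusterings, and base accuracy as their mean agreement with the reference labels.

```python
from itertools import combinations

def pair_agreement(a, b):
    """Pair-counting Jaccard agreement between two labelings of the same objects."""
    both = either = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0

def ensemble_profile(base_clusterings, truth):
    """Return (diversity, mean base accuracy) for an ensemble (assumed measures)."""
    pairs = list(combinations(base_clusterings, 2))
    diversity = sum(1 - pair_agreement(a, b) for a, b in pairs) / len(pairs)
    accuracy = sum(pair_agreement(c, truth) for c in base_clusterings) / len(base_clusterings)
    return diversity, accuracy

truth = [0, 0, 0, 1, 1]
uniform = [[0, 0, 0, 1, 1], [1, 1, 1, 0, 0]]  # same partition twice, relabelled
mixed = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]    # one correct member, one partly wrong
print(ensemble_profile(uniform, truth))
print(ensemble_profile(mixed, truth))
```

The second ensemble is more diverse than the first but its members are less accurate on average, which is the trade-off the slide points to.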

16
Evaluation - Meta-clustering Algorithms
  • Comparison of hierarchical meta-clustering
    algorithms based on Jaccard accuracy scores
    across all datasets
  • Results indicate that choice of integration
    strategy is important
  • Choice of algorithm may often be domain-specific.

17
Implementation
  • MachaonClustering Framework

http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html
18
Conclusion
  • Summary
      • Ensemble clustering offers potential to improve
        our ability to identify hidden patterns in data.
      • To exploit this, appropriate design decisions
        must be made:
          • Sufficient level of diversity in base
            clusterings.
          • Suitable meta-clustering algorithm.
  • Future Work
      • Examine relationship between accuracy of ensemble
        members and final output.
      • Consider alternative integration strategies.

19
Contact Details
Derek Greene
Department of Computer Science
Trinity College Dublin, Ireland
Derek.Greene_at_cs.tcd.ie
20
IEEE CBMS 2005
Trinity College Dublin, June 23-24
The 18th IEEE Symposium on Computer-Based Medical Systems