Comparison of clustering techniques for breast cancer data

1
Comparison of clustering techniques for breast cancer data
  • Daniele Soria
  • http://www.cs.nott.ac.uk/dqs
  • 22nd European Conference on Operational Research
  • Prague, 11-07-2007

2
Outline
  • Introduction
  • Case study
  • Methods
  • Results
  • Future work

3
Background
  • Identification of biologically distinct groups
    with clinical and prognostic relevance.
  • TMA (tissue microarray) technology allows
    concomitant analysis of many proteins on tumour
    samples.
  • Starting point: Abd El-Rehim et al. (2005) [1]
  • IHC (immunohistochemistry) applied to TMA
    preparations of invasive breast cancer cases to
    study the combined protein expression profiles of
    a large panel of biomarkers.

4
Abd El-Rehim et al. (2005)
  • IHC results analysed with hierarchical clustering
  • ANN (artificial neural network) used to
    categorise cases into groups
  • Six groups obtained
  • Each group driven by different markers
  • Number of clusters chosen arbitrarily

5
Aims of our work
  • Apply different clustering techniques
  • Integrate previous results and compare them with
    ours
  • Verify stability of results across different
    methods
  • Refine the phenotypic characterisation of breast
    cancer

6
Case study
  • Patients entered into the Nottingham Tenovus
    Primary Breast Carcinoma Series between 1986 and
    1998
  • 1076 cases informative for all 25 biological
    markers
  • Clinical information (grade, size, age, survival,
    follow-up, etc.)

7
Methods
  • Three different algorithms (number of clusters
    from 2 to 20)
  • Fuzzy c-means (FCM)
  • K-means (KM)
  • Partitioning Around Medoids (PAM)
  • Validity indices computed to help choose the
    number of clusters
  • Software R used (www.r-project.org)
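The slides do not say which validity indices were computed. As one hedged illustration, the sketch below scans the number of clusters k from 2 to 20 and scores each partition with the average silhouette width described by Kaufman and Rousseeuw. The choice of index, the Euclidean distance and the names `mean_silhouette`, `choose_k` and `cluster_fn` are assumptions of this example, and it is written in Python rather than the R used in the original work.

```python
import math
from collections import defaultdict


def mean_silhouette(points, labels):
    """Average silhouette width over all cases (Kaufman and Rousseeuw).

    Singleton clusters contribute s(i) = 0. Requires at least two clusters.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two non-empty clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a(i): mean distance to the other members of its own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def choose_k(points, cluster_fn, k_values=range(2, 21)):
    """Score every k with the mean silhouette; return the best k and all scores.

    cluster_fn(points, k) is any (hypothetical) routine returning one label per point.
    """
    scores = {}
    for k in k_values:
        if k >= len(points):
            break
        scores[k] = mean_silhouette(points, cluster_fn(points, k))
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

A higher mean silhouette indicates better-separated clusters. `cluster_fn` must return at least two distinct labels for every k, otherwise `mean_silhouette` raises `ValueError`.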

8
FCM method
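  • Minimization of the objective function
    $J_m(U,V) = \sum_{i=1}^{n}\sum_{j=1}^{c} \mu_{ij}^{m}\,\lVert x_i - v_j \rVert^{2}$,
    subject to $\sum_{j=1}^{c} \mu_{ij} = 1$ for each $i$
  • $X = \{x_1, x_2, \dots, x_n\}$: $n$ data points
  • $V = \{v_1, v_2, \dots, v_c\}$: $c$ cluster centres
  • $U = (\mu_{ij})_{n \times c}$: fuzzy partition matrix
  • $\mu_{ij}$: membership degree of $x_i$ to $v_j$
  • $m$: fuzziness index (often $m = 2$)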
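The FCM objective is usually minimised by alternating two updates: centres as membership-weighted means, then memberships from relative distances to the centres. The pure-Python sketch below assumes Euclidean distance, a random initial U and a stopping tolerance on U. The slides mention a hierarchical starting point, which could be supplied through the hypothetical `init_u` argument. This is an illustration, not the authors' R code.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, tol=1e-5, max_iter=300, seed=0, init_u=None):
    """Alternate centre and membership updates until U stops changing.

    points: list of equal-length numeric sequences (one per case).
    init_u: optional n x c starting membership matrix (rows summing to 1);
            a random one is drawn when it is None (an assumption of this sketch).
    """
    n, dim = len(points), len(points[0])
    if init_u is None:
        rng = random.Random(seed)
        u = []
        for _ in range(n):
            row = [rng.random() for _ in range(c)]
            total = sum(row)
            u.append([x / total for x in row])
    else:
        u = [list(row) for row in init_u]

    exponent = 2.0 / (m - 1.0)
    centres = []
    for _ in range(max_iter):
        # v_j = sum_i mu_ij^m x_i / sum_i mu_ij^m
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # mu_ij = 1 / sum_k (||x_i - v_j|| / ||x_i - v_k||)^(2/(m-1))
        new_u = []
        for i in range(n):
            dist = [math.dist(points[i], v) for v in centres]
            if min(dist) == 0.0:
                # The case sits exactly on a centre: share full membership among such centres.
                hits = [1.0 if dd == 0.0 else 0.0 for dd in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dist[j] / dist[k]) ** exponent for k in range(c))
                              for j in range(c)])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centres, u


def fcm_objective(points, centres, u, m=2.0):
    """J_m(U, V) = sum_i sum_j mu_ij^m ||x_i - v_j||^2."""
    return sum(u[i][j] ** m * math.dist(points[i], centres[j]) ** 2
               for i in range(len(points)) for j in range(len(centres)))


def hard_labels(u):
    """Assign each case to the cluster with its highest membership degree."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```

For example, `centres, u = fuzzy_c_means(cases, c=4)` followed by `set(range(4)) - set(hard_labels(u))` lists the clusters that receive no case when every case is assigned to its highest-membership cluster.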

9
KM method
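  • Minimization of the objective function
    $J(V) = \sum_{j=1}^{c} \sum_{x_i \in \text{cluster } j} \lVert x_i - v_j \rVert^{2}$
  • $\lVert x_i - v_j \rVert$: Euclidean distance between $x_i$ and $v_j$
  • $c_j$: number of data points in cluster $j$
  • $v_j$ can be calculated as
    $v_j = \frac{1}{c_j} \sum_{x_i \in \text{cluster } j} x_i$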
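This objective is typically minimised with Lloyd's iteration: assign each point to its nearest centre, then recompute each centre as the mean of the points in its cluster, until the assignment stops changing. In the sketch below, the random initial centres drawn from the data, the seed and the handling of empty clusters (the old centre is kept) are assumptions, not details taken from the slides.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centres, label per point, objective J)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (an assumption).
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centre in Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: v_j = (1 / c_j) * sum of the points in cluster j.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centre if cluster j is empty
                centres[j] = [sum(coord) / len(members) for coord in zip(*members)]
    objective = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return centres, labels, objective
```

Because the result depends on the starting centres, several seeds are normally tried and the run with the smallest `objective` kept.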

10
PAM method
  • Based on the search for k representative objects
    (medoids) among the observations
  • Medoids chosen to minimize the sum of
    dissimilarities of the observations to their
    closest medoid
  • k clusters are constructed by assigning each
    observation to the nearest medoid
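A naive sketch of the two phases of PAM as described by Kaufman and Rousseeuw: a greedy BUILD step that picks k initial medoids, then a SWAP step that exchanges a medoid with a non-medoid while this lowers the total dissimilarity. Euclidean dissimilarity and the brute-force cost recomputation are simplifications made for this example; in R, the `cluster` package provides `pam()`.

```python
import math


def pam(points, k, max_iter=100):
    """Naive PAM: greedy BUILD phase followed by best-improvement SWAP phase.

    Returns (medoid indices, label per point, total dissimilarity to closest medoid).
    """
    n = len(points)
    # Dissimilarity matrix; Euclidean distance is an assumption of this sketch.
    d = [[math.dist(p, q) for q in points] for p in points]

    def total_cost(medoids):
        # Sum over observations of the dissimilarity to their closest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    # BUILD: add, one at a time, the object that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [o for o in range(n) if o not in medoids]
        medoids.append(min(candidates, key=lambda o: total_cost(medoids + [o])))

    # SWAP: apply the medoid/non-medoid exchange that lowers the cost most;
    # stop when no exchange improves it.
    cost = total_cost(medoids)
    for _ in range(max_iter):
        best_medoids, best_cost = None, cost
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:pos] + [o] + medoids[pos + 1:]
                trial_cost = total_cost(trial)
                if trial_cost < best_cost - 1e-12:
                    best_medoids, best_cost = trial, trial_cost
        if best_medoids is None:
            break
        medoids, cost = best_medoids, best_cost

    # Each observation joins the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost
```

Each SWAP pass here recomputes the full cost for every candidate exchange (about k²n² look-ups), so this version only suits small samples.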

11
Fuzzy C-means results
  • Hierarchical clustering result used as the
    starting point
  • If c > 3, some clusters have no data assigned
    to them
  • Unclear classification (low membership degrees)
  • Validity indices gave no clear indication of the
    number of clusters

12
K-means results
[Figure; only the values 6, 3, 3, 3, 6, 4 survive extraction]

13
PAM results
[Figure; only the values 4, 4, 4, 4, 4, 4 survive extraction]

14
Principal Components
[Figure: panels K-Means (clusters km1 to km6) and PAM (clusters pam1 to pam4)]

15
Principal Components
[Figure: panels Hierarchical (clusters h1 to h6) and ART (clusters art1 to art6)]

16
From Clusters to Classes

17
Classes
[Figure: 3-D visualisation; classes class1 to class6; annotation "62% of data"]

18
Previous papers

19
Our results

20
Current and future work
  • New data available from NCH
  • How to assign the new cases to the classes?
  • k-nearest neighbour (k-NN) classification
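The slides give no details of the k-NN step. A minimal sketch, assuming Euclidean distance on the marker values, majority voting and an arbitrary default of k = 5; `knn_classify`, `train_points` and `train_labels` are hypothetical names, with the already classified cases acting as the training set.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=5):
    """Predict the class of `query` by majority vote of its k nearest labelled cases.

    Ties between classes are broken in favour of the class whose member is closest.
    """
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest = order[:k]
    votes = Counter(train_labels[i] for i in nearest)
    top = max(votes.values())
    for i in nearest:  # nearest first, so the first tied class found is the closest one
        if votes[train_labels[i]] == top:
            return train_labels[i]
```

Applied to a new case, `train_points` would hold the marker profiles of the cases already assigned to classes and `train_labels` their class names.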

21
Main references
  • D.M. Abd El-Rehim, G. Ball, S.E. Pinder,
    E. Rakha, C. Paish, J.F. Robertson, D. Macmillan,
    R.W. Blamey, I.O. Ellis, High-throughput protein
    expression analysis using tissue microarray
    technology of a large well-characterised series
    identifies biologically distinct classes of
    breast cancer confirming recent cDNA expression
    analyses, International Journal of Cancer, 116,
    340-350, 2005.
  • L. Kaufman, P.J. Rousseeuw, Finding Groups in
    Data, Wiley Series in Probability and
    Mathematical Statistics, 1990.
  • A. Weingessel, E. Dimitriadou, S. Dolnicar, An
    Examination of Indexes for Determining the Number
    of Clusters in Binary Data Sets, Working Paper
    No. 29, 1999.

22
Thank You!
  • This work was supported by Marie Curie Action
    MEST-CT-2004-7597 under the Sixth Framework
    Programme of the European Community
  • Contact: daniele.soria_at_cs.nott.ac.uk