BIOPTRAIN Meeting - PowerPoint PPT Presentation

About This Presentation
Title:

BIOPTRAIN Meeting

Description:

Combination of various data sets available at the centers into a single European ... and Adaptive Resonance Theory (Liverpool University), other methods (Milan) ... – PowerPoint PPT presentation

Slides: 16
Provided by: dqs

Transcript and Presenter's Notes

1
BIOPTRAIN Meeting
  • Daniele Soria

2
Who I am
  • Daniele Soria (Room B79)
  • Bioinformatic analysis of breast cancer data
  • First supervisor: Dr. Jon Garibaldi

3
Main goals
  • Collaboration between U.o.N. and I.N.T.
  • Combination of various data sets available at the
    centers into a single European resource for
    breast cancer data
  • Investigation of novel computational analysis
    methods applicable across bioinformatics data and
    routine clinical information

4
How to reach the goals
  • SQL data management, in order to work better with
    databases
  • Meetings with pathologists and researchers of the
    School of Molecular Medical Sciences at
    Nottingham City Hospital

5
Current work (1)
  • Database management
  • Discrepancies in different clinical databases
    from QMC also involving Gene Microarray data and
    Mass Spectrometry data
  • Unique dataset with all information available
    (still in progress).

6
Current work (2)
  • Analysis of Breast Cancer populations involving
    various centers and different steps
  • 1. Cluster definition: Fuzzy C-means clustering
    (UoN), hierarchical clustering with Euclidean
    distance (NTU), K-means and Adaptive Resonance
    Theory (Liverpool University), other methods (Milan)
  • 2. First statistical analysis: data description,
    cluster comparison and evaluation in terms of
    clinical characteristics (Kaplan-Meier curves of
    survival, disease-free interval and time to
    recurrence)

7
Database management
  • 15 different databases provided by Nottingham
    City Hospital and Nottingham Trent University
  • 2500 cases with almost 300 variables (clinical
    data and biomarker determinations)
  • Find discrepancies between data and check all
    possible duplications
  • Merge everything into a single dataset
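
The discrepancy and duplication checks listed above could be sketched as follows. This is only an illustration in plain Python: the key name `case_id`, the source names and the field names are hypothetical, and the real databases would more likely be handled in SQL.

```python
from collections import defaultdict

def merge_records(sources, key="case_id"):
    """Merge rows from several sources by `key` and report problems.

    `sources` maps a source name to a list of dict rows.
    Returns (merged rows, conflicting values, duplicated keys).
    All field names are hypothetical placeholders.
    """
    merged = {}                     # key -> combined row
    first_source = defaultdict(dict)  # key -> {field: source that set it}
    conflicts = defaultdict(dict)   # key -> {field: {source: value}}
    duplicates = []                 # (source, key) repeated within one source

    for name, rows in sources.items():
        keys_in_source = set()
        for row in rows:
            k = row[key]
            if k in keys_in_source:
                duplicates.append((name, k))
            keys_in_source.add(k)
            target = merged.setdefault(k, {key: k})
            for field, value in row.items():
                if field == key or value is None:
                    continue
                if field not in target:
                    # First time this field is seen for this case.
                    target[field] = value
                    first_source[k][field] = name
                elif target[field] != value:
                    # Same case, same field, different value: a discrepancy.
                    entry = conflicts[k].setdefault(
                        field, {first_source[k][field]: target[field]})
                    entry[name] = value
    return merged, dict(conflicts), duplicates
```

The merged row keeps the first value seen for each field, so every entry in the conflict report still needs a manual decision.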

8
Analysis of BC populations
  • Fuzzy C-means clustering
  • Minimization of the objective function
    $J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \| x_i - v_j \|^2$
  • $X = \{x_1, x_2, \ldots, x_n\}$: collection of n data points;
    $V = \{v_1, v_2, \ldots, v_c\}$: set of the corresponding c
    cluster centres in the dataset X
  • $\mu_{ij}$: membership degree of datum $x_i$ to the
    cluster centre $v_j$
  • $m$ (fuzziness index): used to control the
    fuzziness of membership of each datum (often $m = 2$)
9
Analysis of BC populations
  • Xie-Beni validity index
  • Evaluation of the quality of clustering
  • May be used to find the optimal number of
    clusters
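  • Measure of cluster compactness and separation
  • If $S$ represents the overall validity index, $\pi$ is
    the compactness and $s$ is the separation of the
    fuzzy c-partition of the dataset, then the Xie-Beni
    validity index can be expressed as
    $S = \frac{\pi}{s}$, with
    $\pi = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{2} \, \| x_i - v_j \|^2$
    and $s = \min_{j \neq k} \| v_j - v_k \|^2$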
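
As a hedged illustration, the index can be computed from a fuzzy partition as compactness divided by separation, using the original Xie-Beni definitions: compactness is the mean membership-weighted squared distance to the centres (memberships squared), and separation is the smallest squared distance between two centres. The function name and argument layout are assumptions made for this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def xie_beni(X, U, V):
    """Xie-Beni index S = compactness / separation (lower is better).

    X: data points, U[i][j]: membership of X[i] in cluster j,
    V: cluster centres. Requires at least two distinct centres.
    """
    n, c = len(X), len(V)
    # Compactness: average membership-weighted squared distance to centres.
    compactness = sum(U[i][j] ** 2 * sq_dist(X[i], V[j])
                      for i in range(n) for j in range(c)) / n
    # Separation: smallest squared distance between two different centres.
    separation = min(sq_dist(V[j], V[k])
                     for j in range(c) for k in range(j + 1, c))
    return compactness / separation
```

To look for an optimal number of clusters, the index can be evaluated for each candidate number and the partition with the smallest value preferred.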

10
First results (1)
  • Split of data in two clusters
  • Calculation of Xie-Beni validity index for all
    clustering methods used
  • Visualisation of Kaplan-Meier curves
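
For readers unfamiliar with Kaplan-Meier curves, a minimal sketch of the product-limit estimator is given below. It assumes each case has a follow-up time and an indicator that is 1 if the event (for example death or recurrence) was observed and 0 if the case was censored. The function name and the data layout are illustrative.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times[i]: follow-up time of case i.
    events[i]: 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # cases still under observation just before t
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0      # events at time t
        removed = 0       # events plus censored cases at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - observed / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve
```

Running the function separately on the cases of each cluster gives one step curve per cluster, which can then be plotted together for comparison.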

11
First results (2)

12
First results (3)

Next steps
  • Chi-squared tests on survival between clusters
    (cut-off: 5 years without recurrence)
  • Mann-Whitney test on continuous variables divided
    by cluster groups
  • Semi-parametric statistical models (Cox)
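
The first two planned tests can be sketched with the standard library alone. The code below computes the Pearson chi-squared statistic for a contingency table (for example cluster by recurrence status at 5 years), its p-value for one degree of freedom (a 2x2 table), and the Mann-Whitney U statistics with mid-ranks for ties. No continuity correction is applied and no p-value is computed for U. The function names are illustrative assumptions.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def chi_squared_p_df1(stat):
    """Upper-tail p-value for 1 degree of freedom (2x2 table only)."""
    return math.erfc(math.sqrt(stat / 2.0))

def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two samples, using mid-ranks for tied values."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2.0
    u_b = len(a) * len(b) - u_a
    return u_a, u_b
```

In a 2x2 table the rows could be the two clusters and the columns the counts of cases with and without recurrence within 5 years.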

14
Acknowledgments
  • Dr. J. Garibaldi
  • I.N.T. Dr. E. Biganzoli, Dr. P. Boracchi
  • Dr. A. Green, Dr. G. Balls
  • Marie Curie Action MEST-CT-2004-7597, 6th Framework
    Programme (FP6) of the European Community
  • Other PhD students (especially Xiao Y Wang)

15
The End Thank you!
  • World Champions
  • ITALIA