Incidence of Missing Values in Hierarchical Clustering of Microarrays Data
Alexandre G. de Brevern (1), Serge Hazout (1) and Alain Malpertuy (2)
(1) Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM E0346, Université PARIS VII, 2 place Jussieu, case 7113, 75251 Paris, France - debrevern,hazout_at_urbb.jussieu.fr
(2) Atragene Bioinformatics, 4 Rue Pierre Fontaine, 91000 Evry, France - alain.malpertuy_at_atragene.com
  • 1 - Introduction
  • Microarray technologies make it possible to
    determine which genes are expressed in various
    cell types, at various times and under various
    experimental conditions. These experiments are
    performed on a large scale and produce large
    amounts of data. Clustering methods such as
    Hierarchical Clustering (HC) [1] or
    Self-Organizing Maps (SOM) [2] are frequently
    used to analyse microarray data and have led to
    the identification of co-expressed genes.
    However, microarray datasets often contain
    missing values (MV) that arise during the
    multistep experimental process (spotting,
    hybridization, image scanning). These missing
    data are a major drawback for the use of HC,
    K-means [3] and SOM, and they do not allow
    Principal Component Analysis [4] to be performed.
  • Since the presence of MV may significantly modify
    the clustering results, users usually remove the
    genes with MV or replace the MV with a constant
    (zero or the average expression level of the
    entire experiment). Recently, Troyanskaya et al.
    showed that MV can be estimated with the weighted
    K-Nearest Neighbours method (KNN). The KNN
    approach computes the estimated value from the k
    closest expression profiles in the dataset. The
    authors considered that KNN, with k = 15, is the
    most accurate method to estimate MV in microarray
    data [5].
  • To evaluate the influence of MV on clustering
    results, we analysed the distribution of missing
    values in several public Saccharomyces cerevisiae
    datasets and studied the effects of MV on HC.
    Finally, we compared the KNN replacement method
    with the most frequently used approach,
    replacement by zero.
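
The two replacement strategies compared here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' or Troyanskaya et al.'s implementation: the function names `knn_impute` and `zero_impute`, the Euclidean distance, the inverse-distance weights and the rule that neighbours must be complete on the compared columns are all assumptions made for the example. Missing values are represented by `None`.

```python
import math


def knn_impute(profiles, k=15):
    """Fill None entries with a distance-weighted mean of the k nearest genes.

    Assumptions (not taken from the poster): Euclidean distance on the
    columns observed for the incomplete gene, inverse-distance weights.
    """
    filled = [row[:] for row in profiles]
    for i, row in enumerate(profiles):
        observed = [c for c, v in enumerate(row) if v is not None]
        missing = [c for c, v in enumerate(row) if v is None]
        for j in missing:
            candidates = []
            for n, other in enumerate(profiles):
                # A neighbour must have a value in column j and in every
                # column used to measure the distance to gene i.
                if n == i or other[j] is None:
                    continue
                if any(other[c] is None for c in observed):
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in observed))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # no usable neighbour: the value stays missing
            # The small constant avoids a division by zero for identical profiles.
            weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
            total = sum(w * value for w, (_, value) in zip(weights, nearest))
            filled[i][j] = total / sum(weights)
    return filled


def zero_impute(profiles):
    """Replace every missing value (None) with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in profiles]


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are experiments.
    toy = [
        [1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [10.0, 20.0, 30.0],
    ]
    print(knn_impute(toy, k=2))
    print(zero_impute(toy))
```

In the toy matrix the first gene has the same observed values as the second, so the KNN estimate is pulled towards that neighbour, whereas the zero method ignores the neighbouring profiles entirely.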
  • 2 - Materials and Methods
  • We used the following microarray datasets from
    Saccharomyces cerevisiae: Ogawa et al. [6],
    Spellman et al. [7], Ferrea et al. [8] and Gasch
    et al. [9]. To study the effect of MV in
    microarray experiments, the genes containing MV
    were removed from the original datasets and
    different subsets were selected from the
    resulting set. For example, processing the data
    published by Ogawa et al. gave a set of 5783
    genes, from which six subsets of 827, 968, 1159,
    1448, 1929 and 2892 genes were drawn. For each
    subset, we randomly introduced MV at a rate e
    (1.0% to 50.0%). Hierarchical Clustering was then
    performed with seven agglomeration methods,
    referred to here as metrics (average, centroid,
    complete, mcquitty, median, single and ward,
    using hclust from the R software), and compared
    with the HC results obtained with the
    corresponding original subsets containing no MV.
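
The random introduction of MV described above can be sketched as follows. This is an illustrative sketch only: the helper name `introduce_missing` is hypothetical, and it assumes that the rate e is a percentage of all cells of the expression matrix, which the text does not state explicitly (a per-gene rate would be another reading).

```python
import random


def introduce_missing(profiles, rate_percent, seed=None):
    """Return a copy of the matrix with rate_percent % of its cells set to None.

    Assumption: the rate applies to the whole matrix (genes x experiments),
    and the cells to mask are drawn uniformly without replacement.
    """
    rng = random.Random(seed)
    n_rows = len(profiles)
    n_cols = len(profiles[0])
    n_missing = round(n_rows * n_cols * rate_percent / 100.0)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    masked = [row[:] for row in profiles]
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


if __name__ == "__main__":
    # 10 genes x 10 experiments, 20% of the values removed at random.
    complete = [[float(i + j) for j in range(10)] for i in range(10)]
    masked = introduce_missing(complete, rate_percent=20.0, seed=1)
    print(sum(v is None for row in masked for v in row))
```

Repeating the call with different seeds gives independent simulations for the same rate, while the complete matrix is left untouched as the reference.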
  • For the 5783-gene set and the subsets containing
    no MV, we defined C clusters from the HC trees
    (figure 1). We then compared the consistency of
    these C clusters with the corresponding trees
    computed with MV (100 distinct simulations were
    run for each value of e). The number of clusters
    C is a function of the metric used. C must be
    chosen so that at least ten clusters together
    represent 80% of all genes.
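
The transcript does not give the formula used to compare clusterings, so the sketch below only illustrates one plausible way to do it. Both helpers are hypothetical: `top_clusters_coverage` reads the criterion on C as "the ten largest clusters cover at least 80% of the genes", and `pair_conservation` measures the fraction of gene pairs grouped together in the reference clustering that are still grouped together after MV were introduced. Cluster assignments are given as one label per gene.

```python
from collections import Counter
from itertools import combinations


def top_clusters_coverage(labels, n_top=10):
    """Fraction of genes that fall in the n_top largest clusters."""
    sizes = Counter(labels)
    covered = sum(size for _, size in sizes.most_common(n_top))
    return covered / len(labels)


def pair_conservation(reference, perturbed):
    """Fraction of co-clustered gene pairs in `reference` kept in `perturbed`.

    Assumption: this pair-based score stands in for the consistency
    measure, which is not specified in the text.
    """
    kept = 0
    total = 0
    for a, b in combinations(range(len(reference)), 2):
        if reference[a] == reference[b]:
            total += 1
            if perturbed[a] == perturbed[b]:
                kept += 1
    return kept / total if total else 1.0


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 1]   # clusters cut from the tree without MV
    perturbed = [0, 0, 1, 1, 1]   # clusters cut from a tree built with MV
    print(pair_conservation(reference, perturbed))
    print(top_clusters_coverage(reference, n_top=1))
```

Under this reading, a candidate C would be accepted when `top_clusters_coverage(labels, n_top=10)` is at least 0.8, and the conservation score would be averaged over the 100 simulations run for each rate.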
  • In a second step, we used the KNN and zero
    methods to replace the MV and performed HC on the
    resulting sets.
  • Evaluation of the KNN replacement method
  • Table 2: Optimal k values obtained from the
    Ogawa sets.
  • We observed that the optimal k values were lower
    in small subsets than in sets with a higher
    number of genes. The error rate decreased as the
    number of genes increased.
  • Figure 3: Comparison of the distributions of
    real and KNN-predicted values.