Title: MicroArray Data Analysis Candice Quadros
1 MicroArray Data Analysis
Candice Quadros, Amol Kothari
2 Neural Network for classification
- Harnessing the power of a neural network for
classifying samples.
3 Neural Network for classification
- Reduce the number of genes
- We have to reduce the data dimensionality, i.e. reduce the number of genes to consider.
- PCA can be used to select the most informative genes, but it is computationally expensive to obtain the eigenvectors for high-dimensional data.
- Use the method suggested by Golub et al. to obtain the informative genes.
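
The gene-selection step can be sketched as follows. Golub et al. rank genes by a signal-to-noise score, the difference of the two class means divided by the sum of the two class standard deviations. The function names, the 0/1 label encoding and the choice to keep the k genes with the largest absolute score are assumptions made for illustration; they are not taken from the slides.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Score one gene: (mean0 - mean1) / (sd0 + sd1).

    Assumes every class has at least two samples and that the
    gene is not constant within both classes.
    """
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))


def select_informative_genes(expression, labels, k):
    """Return the indices of the k genes with the largest |score|.

    expression: list of genes, each a list with one value per sample.
    labels: one 0/1 class label per sample.
    Ranking by absolute value is a simplification chosen for this sketch.
    """
    scores = [signal_to_noise(gene, labels) for gene in expression]
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    toy = [[1, 2, 9, 10], [5, 5.1, 5, 5.2], [10, 9, 1, 2]]
    print(select_informative_genes(toy, [0, 0, 1, 1], k=2))
```

Only a score and a sort are needed per gene, so this avoids the eigenvector computation that makes PCA costly on thousands of genes.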
4 Neural Network for classification
- Steps in classification
- Obtain the informative genes using Golub's method.
- Normalize the genes by subtracting the mean and dividing by the standard deviation.
- Train the neural network using the training data and targets, and get the weights.
- Classify the test data using the weights obtained above.
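
A minimal sketch of the normalize, train and classify steps, assuming the informative genes have already been selected. The slides do not describe the network, so the single hidden layer, the sigmoid units, the per-sample gradient updates and all function names here are assumptions for illustration.

```python
import math
import random
from statistics import mean, pstdev


def fit_zscore(train_samples):
    """Per-gene mean and (population) standard deviation of the training set."""
    columns = list(zip(*train_samples))
    # A constant gene would give sd = 0; fall back to 1.0 to avoid dividing by zero.
    return [mean(c) for c in columns], [pstdev(c) or 1.0 for c in columns]


def apply_zscore(samples, means, stds):
    """Subtract the mean and divide by the standard deviation, gene by gene."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in samples]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return hidden activations and the output; the last weight is the bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def train(samples, targets, n_hidden=3, epochs=500, lr=0.5, seed=0):
    """Fit the weights with per-sample gradient steps on the cross-entropy loss."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            hidden, y = forward(x, w_hidden, w_out)
            delta_out = y - t  # loss gradient at the output pre-activation
            # Update hidden weights first, while w_out still holds the old values.
            for j, h in enumerate(hidden):
                delta_h = delta_out * w_out[j] * h * (1.0 - h)
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_h * v
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
    return w_hidden, w_out


def classify(samples, w_hidden, w_out):
    """Label each sample 1 if the network output is at least 0.5, else 0."""
    return [1 if forward(x, w_hidden, w_out)[1] >= 0.5 else 0 for x in samples]
```

The normalization statistics are estimated on the training samples only and then reused for the test samples, so that no information from the test set leaks into training.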
5 Neural Network for classification
Informative genes   No. of hidden units   NN accuracy (%)   Golub accuracy (%)
100                 3                     70.55             61.76
200                 13                    76.74             58.82
6 Hierarchical Merging: When to stop?
- Question: When to stop the merging?
- Suggested solutions:
- Diameter(C) ≤ MaxD
- Avg(sim(Oi, Oj)) ≥ a similarity threshold, for Oi, Oj ∈ C
- Difficult to estimate the parameters in high
dimensions.
7 Hierarchical Merging: When to stop?
- Another solution: stop merging when m clusters are present.
- Problem: the m clusters might contain single-point clusters.
- Use the concept of MinPts (from DBSCAN). A set of points is a significant cluster only if it contains at least MinPts points.
- When there are m significant clusters, stop.
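
The stopping rule above can be sketched as a plain agglomerative loop. The slides do not say how cluster distance is measured, so the centroid distance used here is an assumption, as are the function names.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def merge_until_significant(points, m, min_pts):
    """Merge the two closest clusters until m clusters are significant.

    A cluster is significant when it holds at least min_pts points.
    If that never happens, merging ends with a single cluster.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        significant = [c for c in clusters if len(c) >= min_pts]
        if len(significant) >= m:
            break
        # Find the closest pair of clusters (centroid distance is an assumption).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    for c in merge_until_significant(pts, m=2, min_pts=3):
        print(c)
```

In the toy data the outlier at (50, 50) stays a single-point cluster and does not count toward m, which is the case the MinPts condition is meant to handle.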
8 Hierarchical Merging: When to stop?
[Figure; axis labels: No. of significant clusters, No. of iterations]
9 Visualization of data: Vizstruct
10 Visualization of data: Vizstruct
- Equation used
- How do we weight each dimension, i.e. how do we select λ? Default value: 0.5.
- Use the eigenvalue of each dimension to obtain the value of λ.
11 Visualization of data: Vizstruct
- Steps for visualization
- Project the data into eigenspace.
- The eigenvalue of each dimension i is λi.
- Now use the same formula for calculating the 2D point, where λi is the eigenvalue of the ith dimension.
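
A rough sketch of the last step, for one sample already expressed in eigenspace. The mapping equation is not reproduced in this transcript, so the form used here is an assumption: a weighted first Fourier harmonic sum, with the eigenvalue λi as the weight of dimension i. The function name, the argument layout and the sign of the exponent are likewise illustrative choices, and the eigenspace coordinates and eigenvalues are expected to come from a separate PCA routine.

```python
import cmath
import math


def project_2d(coords, eigenvalues):
    """Map one sample to a 2D point.

    Assumed form: z = sum_k lambda_k * x[k] * exp(2*pi*i*k / N),
    where x[k] is the k-th eigenspace coordinate and lambda_k the
    matching eigenvalue. The 2D point is (Re z, Im z).
    """
    n = len(coords)
    z = sum(
        lam * x * cmath.exp(2j * math.pi * k / n)
        for k, (lam, x) in enumerate(zip(eigenvalues, coords))
    )
    return z.real, z.imag


if __name__ == "__main__":
    eigenvalues = [2.0, 1.0, 0.5, 0.25]
    print(project_2d([1.0, 0.0, 0.0, 0.0], eigenvalues))
    print(project_2d([0.3, -1.2, 0.4, 0.1], eigenvalues))
```

With eigenvalues as weights, the dimensions that carry the most variance pull the 2D point hardest, whereas a fixed weight of 0.5 treats every dimension alike.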
12 Visualization of data: Vizstruct
- Results
- The visualization obtained by this method is more representative of the data than the one produced by Vizstruct.
- Demo