PUMAdb: Data Analysis Tutorial


Provided by: johnm109


1
PUMAdb Data Analysis Tutorial

June 1, 2004

2
User Help: Help, Tutorials and Workshops

Help, FAQ
http://puma.princeton.edu/help/
http://puma.princeton.edu/help/FAQ.shtml
Tutorials (regularly scheduled):
Welcome tutorial
Data analysis, Normalization and Clustering
Interested? Email array@genomics.princeton.edu
Hybridization and Scanning: Individual Instruction
Email dstorton@molbio.princeton.edu

3
PUMAdb Data Analysis

Data Analysis Background
Data normalization
Clustering algorithms
Data centering
Using the Database's Analysis Pipeline
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

4
Data Analysis Background

Data normalization
Transforms data for cross-array comparison by
eliminating or compensating for some biases.
Clustering algorithms
Identifies and reveals patterns within the data.
Data centering
Transforms data for within-array comparison.

5
What is data normalization?

Normalization is an attempt to correct for
systematic bias in data.
Normalization allows you to compare data from one
array to another.
In practice, we do not always understand the data;
inevitably, some biology will be removed too (or
at least not revealed).

6
[Figure: Tumor vs. Pool of Cell Lines]
7
Such biases have consequences

Plotting the frequency of un-normalized
intensities reveals the differential effect
between the two channels.

8
How do we deal with this?

Normalization
In general, an assumption is made that the
average gene does not change.
You need to understand your data to know whether
that is an appropriate assumption.
The number of reporters (clones or genes) you
are assaying will affect this.

9
Normalization
10
Effect on log ratios
[Figure: frequency vs. log ratio, un-normalized and normalized]
11
Total Intensity Normalization

For those spots that are thought to be well
measured, calculate the mean or median log ratio.
Use this as a normalization factor to adjust all
log ratios.
Equivalent to assuming the same total intensity in
both channels.
Our current software:
provides two simple methods for selecting
well-measured spots: pixel-by-pixel regression,
and foreground over background intensity;
calculates normalized values for all channel 2
measurements, and ratios.
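A minimal sketch of total intensity normalization, assuming log ratios are log2(red/green) with red as channel 2, that the "well measured" spots are already flagged by a boolean mask, and that the median (rather than the mean) is used as the factor. The function name is hypothetical, not PUMAdb's code.

```python
import math
from statistics import median

def normalize_log_ratios(red, green, well_measured):
    """Return (normalized log2 ratios, normalization factor).

    Hypothetical helper: the factor is the median log2(red/green) over the
    spots marked True in `well_measured`; it is then subtracted from every
    spot. Subtracting in log space is the same as dividing channel 2 (red)
    by 2**factor.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    factor = median(lr for lr, ok in zip(log_ratios, well_measured) if ok)
    return [lr - factor for lr in log_ratios], factor

normalized, factor = normalize_log_ratios(
    red=[200, 400, 800, 50], green=[100, 200, 400, 100],
    well_measured=[True, True, True, False])
print(factor, normalized)
```

Here the three well-measured spots all read twice as bright in red, so the factor is 1 (a twofold channel bias) and those spots end up with a log ratio of 0 after normalization.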

12
Normalization by Subset

Housekeeping genes
Calculate normalization based on biologically
determined stable genes.
Not always valid: even very stable genes can
respond to some conditions.
Spiking or doping controls
Calculate based on introduced DNA species.
Requires careful measurement of total DNA in each
channel.
Our software accepts a global (per array),
user-defined normalization factor for this
purpose.

13
PUMAdb Data Analysis

Data Analysis Background
Data normalization
Clustering algorithms
Data centering
Using the Database's Analysis Pipeline
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

14
Clustering Algorithms
In microarray studies, we often use clustering
algorithms to help us identify patterns in
complex data. For example, we can randomize the
data used to represent this painting and see if
clustering will help us visualize the pattern.
15
Clustering algorithms
The painting is sliced into rows which are then
randomized.
16
Clustering algorithms
Rows ordered by hierarchical clustering with
nodes flipped to optimize ordering
17
Clustering algorithms
Rows ordered by Self-Organizing Maps
18
Clustering: Random vs. Biological Data
From Eisen MB, et al., PNAS 1998; 95(25):14863-8
19
How does clustering work?

1. Compare all expression patterns to each other.
2. Join patterns that are the most similar out of
all patterns.
3. Compare all joined and unjoined patterns.
4. Go to step 2, and repeat until all patterns are
joined.

20
How do we compare expression profiles?

Treat expression data for a gene as a
multidimensional vector.
Decide on a distance metric to compare the
vectors.
Plenty to choose from: Pearson correlation,
Euclidean distance, Manhattan distance, etc.

21
Expression Vectors

Crucial concept for understanding clustering
Each gene is represented by a vector whose
coordinates are its values (log(ratio)) in each
experiment:
x = log(ratio) in expt1
y = log(ratio) in expt2
z = log(ratio) in expt3
etc.
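As a small illustration of expression vectors, the sketch below turns per-experiment ratios into log(ratio) coordinates. It assumes base-2 logarithms and uses None for a missing measurement; the gene names and the input layout are hypothetical.

```python
import math

def expression_vectors(ratios):
    """Map each gene to its vector of log2(ratio) values, one coordinate per experiment.

    `ratios` is a hypothetical dict: gene name -> list of red/green ratios,
    one per experiment; None marks a missing measurement and stays None.
    """
    return {
        gene: [None if r is None else math.log2(r) for r in values]
        for gene, values in ratios.items()
    }

vectors = expression_vectors({"geneA": [2.0, 0.5, 1.0], "geneB": [4.0, None, 0.25]})
print(vectors)
```

geneA becomes the point (1, -1, 0) in a three-experiment space: induced twofold, repressed twofold, unchanged.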

22
Distance Metrics

Distances are measured between expression
vectors
Distance metrics define the way we measure
distances
Many different ways to measure distance
Euclidean distance
Pearson correlation coefficient(s)
Manhattan distance
Mutual information
Kendall's tau
etc.
Each has different properties and can reveal
different features of the data

23
Euclidean distance

The Euclidean distance metric detects similar
vectors by identifying those that are closest in
space.
In this example, A and C are closest to one
another.

24
Pearson correlation

The Pearson correlation disregards the magnitude
of the vectors and instead compares their
directions.
In this example, Gene A and Gene B have the same
slope, so they would be most similar to each other.

25
Distance Metric: Pearson vs. Euclidean
[Figure: expression profiles of genes A, B, and C]

By Euclidean distance, A and B are most similar.
By Pearson correlation, A and C are most similar.
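The contrast between the two metrics can be reproduced with three made-up profiles (not the ones in the figure): B sits close to A in space but has a different shape, while C has exactly A's shape at ten times the amplitude. Turning r into a distance as 1 - r is one common convention, assumed here.

```python
import math

def euclidean(u, v):
    """Straight-line distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pearson(u, v):
    """Pearson correlation coefficient r of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    return num / math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))

def pearson_distance(u, v):
    """0 for identical shape, 2 for opposite shape."""
    return 1.0 - pearson(u, v)

# Hypothetical log-ratio profiles over four experiments
A = [1.0, 2.0, 3.0, 4.0]
B = [2.0, 1.0, 4.0, 3.0]      # near A in space, different shape
C = [10.0, 20.0, 30.0, 40.0]  # same shape as A, larger amplitude

print(euclidean(A, B), euclidean(A, C))
print(pearson_distance(A, B), pearson_distance(A, C))
```

Euclidean distance puts B closest to A (2.0 vs. about 49), whereas the Pearson distance puts C closest (r = 1, distance 0, against r = 0.6 for B).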

26
Hierarchical Clustering

1. Calculate the distance between all genes. Find
the smallest distance. If several pairs share the
same similarity, use a predetermined rule to
decide between alternatives.
2. Fuse the two selected clusters to produce a new
cluster that now contains at least two objects.
3. Calculate the distance between the new cluster
and all other clusters.
4. Find the smallest remaining distance and repeat
steps 2 and 3 until only a single cluster
remains.
5. Draw a tree representing the results.
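A compact sketch of the agglomerative procedure above. It assumes average linkage (the distance between two clusters is the mean of their gene-to-gene distances), which is one common choice and not necessarily what PUMAdb uses; the tie rule is simply "first pair examined wins".

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hierarchical_cluster(vectors, dist=euclidean):
    """Average-linkage agglomerative clustering (a sketch).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples, where
    each cluster is a tuple of gene indices; a tree can be drawn from this history.
    """
    # Distance between every pair of genes
    pair = {(i, j): dist(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

    def linkage(a, b):
        # Cluster-to-cluster distance = mean of the gene-to-gene distances
        d = [pair[min(i, j), max(i, j)] for i in a for j in b]
        return sum(d) / len(d)

    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        # Find the smallest distance; strict '<' keeps the first pair seen on ties
        best = None
        for a, b in combinations(clusters, 2):
            d = linkage(a, b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, _ = best
        # Fuse the two selected clusters
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append(best)
    return merges

genes = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
for merge in hierarchical_cluster(genes):
    print(merge)
```

Recomputing cluster distances from the stored gene-to-gene distances keeps the code short; real implementations update a distance matrix instead, which is much faster for thousands of genes.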

27
Clustering: Optimizing node order

When joining a gene vector to another, it is
important to think about the order in which the
nodes are joined.
In this example, ASH1 is allegedly most similar
to PIR1, so their patterns are displayed adjacent
to one another.

28
And we finally get a cluster
29
Clustering: Two-way clustering

Just as gene patterns are clustered, array
patterns can be clustered.
All the data points for an array can be used to
construct a vector for that array and the vectors
of multiple arrays can be compared.
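Array vectors come from the same genes-by-arrays table read column-wise, so any gene distance metric can be reused on them. A minimal sketch, assuming the data are held as a list of rows (one per gene) with hypothetical values:

```python
def array_vectors(matrix):
    """Turn a genes x arrays table into one vector per array (read it column-wise).

    matrix[i][j] is the log ratio of gene i on array j (None = missing).
    """
    return [list(column) for column in zip(*matrix)]

genes_by_arrays = [
    [1.0, 0.8, -0.2],   # gene 1 on arrays 1-3
    [-1.0, -0.9, 0.1],  # gene 2 on arrays 1-3
]
print(array_vectors(genes_by_arrays))  # three vectors of length 2, one per array
```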

30
Clustering: Two-way Clustering
Two-way clustering can help show which samples
are most similar, as well as which genes.
31
So is clustering the solution?

Advantages
Simple
Easy to implement
Easy to visualize
Disadvantages
Can lead to incorrect/incomplete conclusions
Discarding of subtleties in 2-way clustering
May be driven by strong sub-clusters

32
Clustering: Partitioning Methods

Split data up into smaller, more homogeneous sets
Should avoid artifacts associated with
incorrectly joining dissimilar vectors
Can cluster each partition independently of
others
Self-Organizing Maps (SOMs) are one partitioning method

33
Clustering: Self-Organizing Maps

SOMs result in genes being assigned to partitions
of most similar genes.
Neighboring partitions are more similar to each
other than they are to distant partitions.
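A minimal self-organizing map, to show how genes end up in partitions and why neighboring partitions resemble each other: the winning node and its grid neighbors are all pulled toward each gene. The grid size, learning-rate schedule, shrinking Gaussian neighborhood, and deterministic initialization are illustrative assumptions, not PUMAdb's settings.

```python
import math

def train_som(vectors, rows, cols, epochs=50, lr0=0.5):
    """Minimal self-organizing map; returns (assignments, prototypes).

    Nodes sit on a rows x cols grid, each holding a prototype vector.
    assignments[i] is the index of the node (partition) gene i falls into.
    """
    n_nodes = rows * cols
    grid = [(k // cols, k % cols) for k in range(n_nodes)]
    # Deterministic start: spread the initial prototypes over the input genes
    prototypes = [list(vectors[k * len(vectors) // n_nodes]) for k in range(n_nodes)]
    radius0 = max(rows, cols) / 2.0

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def best_node(v):
        return min(range(n_nodes), key=lambda k: sq_dist(v, prototypes[k]))

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays over time
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
        for v in vectors:
            winner = best_node(v)
            for k in range(n_nodes):
                g = math.dist(grid[k], grid[winner])  # distance on the map, not in data space
                if g <= radius:
                    h = math.exp(-(g * g) / (2.0 * radius * radius))
                    prototypes[k] = [p + lr * h * (a - p) for p, a in zip(prototypes[k], v)]
    return [best_node(v) for v in vectors], prototypes

genes = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
         [5.0, 5.0, 5.0], [5.1, 5.0, 5.0], [5.0, 5.1, 5.0]]
assignments, _ = train_som(genes, rows=1, cols=2)
print(assignments)
```

With a 1 x 2 map, the two well-separated groups of made-up profiles fall into different partitions.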

34
The $64,000 question

How many partitions do I use?
Ask a statistician:
Tibshirani R, et al. (2000) Estimating the number
of clusters in a dataset via the Gap statistic
http://www-stat.stanford.edu/~tibs/ftp/gap.pdf
Ask us, and we'll say trial and error ;-)
The ideal outcome is a single expression pattern
in each partition, and each partition distinct
from the others.

35
PUMAdb Data Analysis

Data Analysis Background
Data normalization
Clustering algorithms
Data centering
Using the Database's Analysis Pipeline
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

36
Data Centering

Centering sets the average value of a vector to
zero.
This results in a loss of information, but may
reveal important patterns.

37
Data Centering

Gene centering is useful when the actual value of
the ratio is not important or is not meaningful
(e.g., common reference).
Centering is generally not appropriate when using
a biologically meaningful control sample, such as
a matched, untreated sample, or a zero timepoint.

38
Data Transformation: Centering

To illustrate how centering affects data, a small
sample of data was duplicated, and a constant was
added to the second copy of each row.
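The duplicated-row experiment can be mimicked in a few lines: centering subtracts each vector's mean (or median), so a row and the same row plus a constant become identical. The helper name and the sample values are hypothetical; None marks missing data.

```python
from statistics import mean, median

def center(vector, how="mean"):
    """Shift a vector so its mean (or median) is zero; None values are left as None."""
    present = [x for x in vector if x is not None]
    shift = mean(present) if how == "mean" else median(present)
    return [None if x is None else x - shift for x in vector]

row = [1.0, -0.5, 2.0, 0.5]
shifted = [x + 3.0 for x in row]   # duplicated row with a constant added
print(center(row), center(shifted))
```

Both calls return [0.25, -1.25, 1.25, -0.25]: the constant offset, and with it the absolute level of the ratios, is gone, which is exactly the information loss the slides mention.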

39
Data Centering: Effects of Different Centering
Strategies
Uncentered Data, No Centering Metric During
Clustering
Uncentered Data, Centering Metric During
Clustering
Centered Data, No Centering Metric During
Clustering
Centered Data, Centering Metric During Clustering
40
PUMAdb Data Analysis

Data Analysis Background
Data normalization
Clustering algorithms
Data centering
Using the Database's Analysis Pipeline
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

41
Data Retrieval and Analysis

Experiment names will be listed with feature
extraction software indicated.

42
Gene Selection and Annotation

Specify genes or clones
Collapse data by SUID or LUID
Determine UID column
Choose biological annotation
Label result set

43
Gene Selection: Specify Genes or Clones

Use all genes or clones on an array
Select a genelist from your loader account
Enter a list of genes to select. The names
should be separated by two colons

44
Gene Selection: All Genes

Ten arrays
All genes
No control or empty spots, Spot flag 0
8690 SUIDs used in cluster
Using all genes results in a very long cluster!

45
Gene Selection: Genelists

Ten arrays
500-gene genelist
No control or empty spots, Spot flag 0
380 SUIDs used for cluster
Using a genelist reduces the length of the cluster

46
Gene Selection: Specify Genes or Clones

Using all genes or clones on an array will give
you a very long list of genes. This is the best
option when you have no pre-existing expectations
about your data and simply want to see what is
happening.
Selecting a genelist from your loader account
will give you a more select group of genes. This
can be appropriate for testing hypotheses.

47
Gene Selection: Retrieving and Collapsing Data

Collapsing (averaging) occurs within a single
array. Multiple instances of the same entity
will be combined as specified.
Duplicated entities can be defined in three ways:
Sequence Unique ID (the identifier for a
reporter). A SUID refers to the sequence itself.
Laboratory Unique ID (the identifier for the
source of the sample in the lab). An LUID refers
to a specific microtiter well. Multiple LUIDs
may correspond to one SUID.
SPOT (the number corresponding to a feature on a
print). This option only appears for retrieval
from a single print (array design). Multiple
spots/features on an array may contain a single
LUID or SUID.

48
Gene Selection: Collapse by SUID

Ten arrays
500-gene genelist
No control or empty spots, Spot flag 0
380 SUIDs used for cluster

49
Gene Selection: Collapse by LUID

Ten arrays
Gene list of 500 genes
No control or empty spots
Retrieve by LUID
397 LUIDs used for cluster
Retrieving via LUIDs may increase the number of
gene vectors generated

50
Gene Selection: Collapse Data

Retrieving by SUID (the database's identifier for
sequence) yields 380 genes -- samples that came
from different microtiter wells will be collapsed
if they are called the same sequence
Retrieving by LUID (the identifier for the
original microtiter well location of the sample)
yields 397 genes -- even if samples are the
same sequence, they will not be collapsed if they
come from different microtiter wells
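The SUID/LUID difference comes down to which key the spots of one array are grouped by before averaging. In this sketch the field names, identifiers, and values are hypothetical, and every spot in a group is weighted equally (an assumption about how collapsing averages).

```python
from collections import defaultdict
from statistics import mean

def collapse(spots, key):
    """Average the spots on ONE array that share the same identifier.

    spots: list of dicts with hypothetical keys 'SUID', 'LUID', 'log_ratio'
    key:   'SUID' or 'LUID'
    Spots whose log_ratio is None (e.g. removed by spot filters) are skipped.
    """
    groups = defaultdict(list)
    for spot in spots:
        if spot["log_ratio"] is not None:
            groups[spot[key]].append(spot["log_ratio"])
    return {uid: mean(values) for uid, values in groups.items()}

# Two microtiter wells (LUIDs) holding the same sequence (SUID)
spots = [
    {"SUID": 101, "LUID": "plate1_A01", "log_ratio": 1.0},
    {"SUID": 101, "LUID": "plate1_A01", "log_ratio": 1.5},  # replicate spot of the same well
    {"SUID": 101, "LUID": "plate7_H12", "log_ratio": 0.5},
]
print(collapse(spots, "SUID"))  # one row for the sequence
print(collapse(spots, "LUID"))  # one row per well
```

Collapsing by LUID keeps the two wells apart, which is why the LUID retrieval above produces more gene vectors (397) than the SUID retrieval (380).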

51
Gene Annotation: UID column

Rows of data can be labeled with one of four
options:
Systematic name / clone ID (the default)
SUID gives the database's unique ID
LUID gives the lab's unique ID (we don't always
have data for this; defaults to SUID)
SPOT gives the spot number

52
Gene Annotation: Biological Annotation

The list includes all information stored within
the database for any gene from the organism in
question. Not all genes will have all
annotations.
Annotations from a genelist (selected earlier)
can be used to describe the genes

53
Array Annotation: Name Choices

Arrays (hybridizations) are identified in the
database by slide name (e.g., serial number) and
experiment name, both unique.
Agilent and Affymetrix data sets are further
identified by a result set name (possibly more
than one per hybridization, and not guaranteed to
be unique).

54
Gene Selection and Annotation: Summary

Specify genes or clones
Collapse data by SUID or LUID
Determine UID column
Choose biological annotation
Label arrays/hybridizations

55
PUMAdb Data Analysis

Data Analysis Background
Data normalization
Clustering algorithms
Data centering
Using the Database's Analysis Pipeline
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

56
Data Filtering

Choose data column to retrieve
Elect to invert reverse dye replicates
Elect to filter by spot flag
Select spot criteria for filtering
Define image presentation options

57
Data Filtering: Choose Data to Retrieve

You can retrieve and cluster any numerical
measurement from your data.
Clustering doesn't necessarily make sense for all
fields.
Default (and most appropriate) fields for
clustering are log ratio (two-channel data) and
signal/intensity (single-channel data).

58
Data Filtering: Spot Flags, Reverse Replicates

Unreliable spots (identified by software or
visual inspection) can be flagged. Spots that
are not flagged are given a flag value of 0.
Autoflags (GenePix 5.0) are included in this
option.
If your experiments are identified as reverse
replicates, clicking on the reverse option will
properly invert the ratio and log ratio data.
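Inverting a reverse (dye-swapped) replicate means taking the reciprocal of each ratio and negating each log ratio, so that the same biological change reads the same way on both arrays. A minimal sketch with a hypothetical helper name:

```python
def invert_reverse_replicate(ratios, log_ratios):
    """Invert a dye-swapped array: ratio -> 1/ratio, log ratio -> -log ratio.

    None marks a missing measurement and is passed through unchanged.
    """
    inv_ratios = [None if r is None else 1.0 / r for r in ratios]
    inv_logs = [None if lr is None else -lr for lr in log_ratios]
    return inv_ratios, inv_logs

ratios, logs = invert_reverse_replicate([4.0, 0.5, None], [2.0, -1.0, None])
print(ratios, logs)
```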

59
Data Filtering: Selecting Filtering Criteria

Each spot will be individually assessed as
specified, prior to any averaging or collapse.
Each filter can be made active and customized as
desired.
Filters can be combined using logical operators
(filter string), defaulting to a logical AND.
Filters available will be appropriate to the
feature extraction software used. The exception
is ScanAlyze and older versions of GenePix, which
get (but can't use) all options for GenePix.

60
Data Filtering: Default Spot Filters

Regression correlation measures pixel-by-pixel
agreement between the two channels.
Foreground/Background intensities are a simple
measure of signal to noise.
Absolute intensity cutoffs impose a minimum net
signal.
"Failed" and "Is Contaminated" refer to the
quality of the spotted material.
Equivalent defaults are presented for Agilent
data.
Affymetrix data can be filtered on detection,
detection p-value, etc.
Any data, including biological annotations, can
be used for customized filters.
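The sketch below treats each spot filter as a yes/no test on one spot and combines them with AND by default (or OR, by passing `any`). The field names are hypothetical; the thresholds are the ones used in the examples later in this tutorial. A spot that fails keeps its place but loses its value, which is what shows up as gray in the cluster image.

```python
def passes(spot, filters, combine=all):
    """Apply spot filters to ONE spot; by default every filter must pass (logical AND)."""
    return combine(f(spot) for f in filters)

# Hypothetical field names; thresholds taken from this tutorial's examples
def good_regression(s):
    return s["regression_correlation"] > 0.6

def bright_enough(s):
    return s["ch1_net"] > 350 or s["ch2_net"] > 350

def above_background(s):
    return s["ch2_mean"] / s["ch2_median_bg"] > 2.5

def apply_spot_filters(spots, filters, combine=all):
    """Failing spots become None (missing/gray); they are not removed."""
    return [s["log_ratio"] if passes(s, filters, combine) else None for s in spots]

spots = [
    {"regression_correlation": 0.9, "ch1_net": 800, "ch2_net": 100,
     "ch2_mean": 600, "ch2_median_bg": 100, "log_ratio": 1.2},
    {"regression_correlation": 0.4, "ch1_net": 900, "ch2_net": 900,
     "ch2_mean": 1200, "ch2_median_bg": 100, "log_ratio": -0.3},
    {"regression_correlation": 0.8, "ch1_net": 200, "ch2_net": 300,
     "ch2_mean": 400, "ch2_median_bg": 200, "log_ratio": 0.7},
]
filters = [good_regression, bright_enough, above_background]
print(apply_spot_filters(spots, filters))              # AND
print(apply_spot_filters(spots, filters, combine=any)) # OR
```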

61
Data Filtering: Filter Selection

Data filters should be customized for the data
retrieved.
Uniform filter values will be applied to each
array retrieved.
The database makes available some basic tools for
examining data and choosing appropriate filter
values.

62
Data Filtering: Filter Selection

Any numerical field can be plotted against any
other (or none), in a scatter plot or histogram.
This is useful for quality assessment, and for
selecting filters.

63
Data Filtering: Regression Correlation

Plot filter field (here regression correlation)
against test field (log ratio).
Log ratios should center around 0.
Here, the log ratios appear to diverge below a
regression correlation of about 0.4 - 0.6.

64
Spots with low regression correlation
65
Data Filtering

Ten arrays
500-gene genelist
Spot flag 0
No other filters
380 SUIDs used for cluster

66
Data Filtering: Regression Correlation

Ten arrays
500-gene genelist
Spot flag 0
Regression correlation > 0.6
380 SUIDs used for clustering
Filtering away spots with low regression
correlation removes many spots

67
Data Filtering: Regression Correlation

Ten arrays
500-gene genelist
Spot flag 0
Regression correlation > 0.8
364 SUIDs used for clustering
A more stringent filter reduces the data quite a
bit and even removes some genes entirely

68
Data Filtering: Foreground to Background
Intensity Ratios

FG/BG (log scale) versus log ratio
Data center around 0
Impose cutoff at 2.5 (linear) to eliminate
flare at low relative intensity.

69
Data Filtering: Intensity to Background Ratios

Ten arrays used
500-gene genelist
Spot flag 0
Normalized Channel 2 (red) mean intensity divided
by Normalized Channel 2 median background greater
than 2.5
371 SUIDs used for clustering
Some arrays show very high background and some
genes show such high background that they did not
pass this filter in any array

70
Data Filtering: Intensity to Background Ratios

Ten arrays used
500-gene genelist
Spot flag 0
Channel 1 (green) mean intensity divided by
Channel 1 median background greater than 2.5
377 SUIDs used for clustering
Often, background can be higher in one channel --
note that fewer data are removed here than when
we used the same filter on Channel 2 (red)

71
Data Filtering: Intensity to Background Ratios
72
Data Filtering: Intensity Cutoff

More than one way to look at a fish.

73
Data Filtering: Combinations of Filters

Ten arrays
500-gene genelist
Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
374 SUIDs selected for clustering
This data set was formed by selecting spots that
are good quality (via the regression correlation)
and good intensity in at least one channel

74
Data Filtering

No filters: 380 SUIDs
Regression correlation > 0.8: 364 SUIDs
Ratio of intensity to background in both channels
> 2.5: 370 SUIDs
Net intensity in either channel > 350: 377 SUIDs
70% of pixels within one standard deviation of
background: 345 SUIDs
Regression correlation > 0.6 AND net intensity in
either channel > 350: 374 SUIDs

75
Data Filtering: Image Presentation Options

"Retrieve spot coordinates" will allow you to see
an assembled image of each array after
clustering. (However, multiple spots with the
same contents interact poorly with the use of
systematic names as IDs: only one spot image
will be shown.)
"Show all spots" allows you to view the spots you
filtered out (in addition to the ones that passed
filtering) after clustering. This slows down
retrieval.

76
Data Filtering: Summary

Choose data column to retrieve
Elect to invert reverse-dye replicates
Elect to filter by spot flag
Select spot criteria for filtering (spot filters
don't remove genes, but just gray data that
don't pass, unless all spots are removed)
Define image presentation options

77
PUMAdb Data Analysis

Data Analysis Background
Data normalization
Clustering algorithms
Data centering
Using the Database's Analysis Pipeline
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

78
Data Retrieval

General results and progress
PreClustering (.pcl) file
Data retrieval summary report
Option to deposit data in repository
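For readers who want to produce or inspect PreClustering files outside the database, here is a sketch of a writer. It assumes the PCL layout read by the Cluster/TreeView family of tools: a tab-delimited header with a UID column, NAME, GWEIGHT and one column per array, then an EWEIGHT row, then one row per gene with blank cells for missing values. PUMAdb's own .pcl output may carry different header labels or extra columns, and the helper name is hypothetical.

```python
def to_pcl(uids, names, arrays, values):
    """Render a minimal PCL table as tab-delimited text (a sketch).

    values[i][j] is the value for gene i on array j; None becomes an empty cell.
    All gene and array weights are set to 1.
    """
    lines = ["\t".join(["UID", "NAME", "GWEIGHT"] + arrays)]
    lines.append("\t".join(["EWEIGHT", "", ""] + ["1"] * len(arrays)))
    for uid, name, row in zip(uids, names, values):
        cells = ["" if v is None else f"{v:g}" for v in row]
        lines.append("\t".join([str(uid), name, "1"] + cells))
    return "\n".join(lines) + "\n"

text = to_pcl(["101"], ["geneA"], ["array1", "array2"], [[1.5, None]])
print(text)
```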

79
Data Retrieval Summary
80
Data Processing and Clustering

Experiment Selection
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

81
Gene Filtering

Transform single-channel data
Filter genes based on data distribution
Data centering
Filter genes based on data values
Filter genes and arrays based on spot filter
criteria

82
Gene Filtering: Transformation

Single-channel (e.g., Affymetrix) data only.
Adjust arrays for simple cross-array
normalization.
Log-transform data for clustering.
May add a constant for variance stabilization
May replace non-positive values with very small
values
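A sketch of the single-channel transformation steps. Two assumptions here: the "simple cross-array normalization" is taken to mean rescaling every array to a common median signal, and base-2 logs are used. The offset and floor values are illustrative, not PUMAdb defaults.

```python
import math
from statistics import median

def scale_arrays(columns):
    """Rescale each array (column) so every array has the same median signal.

    One simple cross-array adjustment; None marks a missing value.
    """
    medians = [median(v for v in col if v is not None) for col in columns]
    target = median(medians)
    return [[None if v is None else v * target / m for v in col]
            for col, m in zip(columns, medians)]

def log_transform(signals, offset=0.0, floor=1e-3):
    """log2-transform signals: add `offset` first, replace non-positive values by `floor`."""
    out = []
    for s in signals:
        if s is None:
            out.append(None)
        else:
            s += offset
            out.append(math.log2(s if s > 0 else floor))
    return out

print(scale_arrays([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))
print(log_transform([8.0, 0.0, -5.0, None], floor=0.25))
```

Replacing non-positive signals with a small value avoids taking the log of zero or a negative number, but it also creates an artificial floor in the data, so such values deserve a second look.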

83
Gene Filtering: Data Distribution

Rank selects genes whose retrieved value is
in the top Nth percentile for M or more arrays.
Deviations selects genes whose retrieved value
is significantly above or below the mean (by N
standard deviations) for M or more
arrays.
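Both distribution filters count, gene by gene, how many arrays meet a condition and keep the gene if that count reaches M. This sketch assumes percentile ranks and the mean/standard deviation are computed within each array across genes, and uses the population standard deviation; the slides do not pin down these details.

```python
from statistics import mean, pstdev

def percentile_ranks(column):
    """Percentile rank (0-100] of each present value within ONE array; None stays None."""
    present = [v for v in column if v is not None]
    n = len(present)
    return [None if v is None else 100.0 * sum(p <= v for p in present) / n
            for v in column]

def rank_filter(matrix, n_pct, m_arrays):
    """Indices of genes ranked above the n_pct percentile in at least m_arrays arrays."""
    ranks_by_array = [percentile_ranks(col) for col in zip(*matrix)]
    keep = []
    for i in range(len(matrix)):
        hits = sum(r[i] is not None and r[i] > n_pct for r in ranks_by_array)
        if hits >= m_arrays:
            keep.append(i)
    return keep

def deviation_filter(matrix, n_sd, m_arrays):
    """Indices of genes more than n_sd SDs from the array mean in >= m_arrays arrays."""
    stats = []
    for col in zip(*matrix):
        present = [v for v in col if v is not None]
        stats.append((mean(present), pstdev(present)))
    keep = []
    for i, row in enumerate(matrix):
        hits = sum(v is not None and abs(v - mu) > n_sd * sd
                   for v, (mu, sd) in zip(row, stats))
        if hits >= m_arrays:
            keep.append(i)
    return keep

# Five hypothetical genes on two arrays (log ratios)
matrix = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0], [0.0, -0.1], [3.0, 0.0]]
print(rank_filter(matrix, 95, 1), deviation_filter(matrix, 1, 1))
```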

84
Gene Filtering: Percentile Rank

Ten arrays
500-gene genelist
Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Rank > 95 in at least one array
66 SUIDs are used for clustering
Many spots are removed, since only the spots that
were very intense in the red channel were
included

85
Gene Filtering: Deviation from Mean Value

Ten arrays, 500-gene genelist
Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Genes whose Log(Normalized Red/Green) is more
than one standard deviation from the mean in at least
one array
70 SUIDs selected for clustering
This filter removes genes that do not deviate
significantly from the mean -- a good way
to identify genes with potentially interesting
behavior

86
Gene Filtering: Centering Data

Data can be centered at this stage. This
transforms the data so that the mean value is
equal to zero. Images and downloaded files will
reflect this transformation.
During clustering, data can be treated as if they
were centered, but the values of the data are not
affected.
Data centering and centering during clustering
can be combined in all four possible ways.
Gene centering is useful for common references.
Array centering amounts to renormalizing each
array, using the spots that pass the spot filter
criteria.
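"Centering during clustering" can be pictured as the difference between a centered metric (Pearson correlation, which subtracts each vector's mean inside the calculation) and an uncentered one (the cosine of the raw vectors). Both variants exist in Eisen-style clustering software; whether PUMAdb implements centering exactly this way is an assumption. Note that the stored values are never modified.

```python
import math

def correlation(u, v, centered=True):
    """Pearson correlation (centered=True) or uncentered correlation (cosine).

    Centering happens on local copies inside the calculation only;
    the caller's data are left untouched.
    """
    if centered:
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        u = [a - mu for a in u]
        v = [b - mv for b in v]
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

u = [1.0, 2.0, 3.0]
v = [3.0, 4.0, 5.0]   # same shape as u, shifted up by 2
print(correlation(u, v), correlation(u, v, centered=False))
```

The centered metric calls u and v identical (r = 1) because it ignores the offset; the uncentered metric (about 0.98) still sees it.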

87
Data Centering: Effects of Different Centering
Strategies
Uncentered Data, No Centering Metric During
Clustering
Uncentered Data, Centering Metric During
Clustering
Centered Data, No Centering Metric During
Clustering
Centered Data, Centering Metric During Clustering
88
Gene Filtering: Center Genes
[Figure panels: Centered (left), Uncentered (right)]

Ten arrays, 500-gene genelist, Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Genes centered -- no effect on number of SUIDs
clustered, but distribution of signal is changed
(centered data is displayed on left)

89
Gene Filtering: Data Values

Cutoff requires data to exceed a user-defined
value in at least M arrays. This is perhaps our
least useful filter. Especially when data are
centered, you could be losing important
information.
Distance requires that the length of the gene's
expression vector, across all arrays, be greater
than a user-defined value. This is a general
measure of response to experimental conditions.
Only available for log ratio data.
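A sketch of the two value-based filters. Following the Log(Red/Green) example, the cutoff is applied to the absolute value; the vector length is the Euclidean norm of the gene's log ratios with missing values simply skipped. Both details are assumptions about how the database computes them.

```python
import math

def cutoff_filter(matrix, cutoff, m_arrays):
    """Indices of genes whose |value| exceeds `cutoff` in at least m_arrays arrays."""
    return [i for i, row in enumerate(matrix)
            if sum(v is not None and abs(v) > cutoff for v in row) >= m_arrays]

def vector_length(row):
    """Euclidean length of a gene's log-ratio vector, skipping missing values."""
    return math.sqrt(sum(v * v for v in row if v is not None))

def distance_filter(matrix, min_length):
    """Indices of genes whose expression vector is longer than min_length."""
    return [i for i, row in enumerate(matrix) if vector_length(row) > min_length]

# Three hypothetical genes on three arrays (log ratios; None = missing)
matrix = [[0.5, -2.5, 0.1], [1.0, 1.0, 1.0], [None, 3.0, None]]
print(cutoff_filter(matrix, 2, 1), distance_filter(matrix, 2.0))
```

The second gene never exceeds the cutoff and its vector is short (about 1.7), so both filters drop it even though it responds consistently in every array; that is the kind of information loss the slide warns about.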

90
Gene Filtering: Values of Log(Red/Green)

Ten arrays, 500-gene genelist, Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Absolute value of Log of Red/Green Normalized Ratio
(Mean) > 2 for at least 1 array
57 SUIDs selected for clustering
Since this is a filter based on values, caution
should be exercised -- values often change during
normalization and centering.

91
Gene Filtering: Spot Filter Criteria

Genes can be screened out if they do not meet the
spot criteria a given percentage of the time, as
specified by the user.
Arrays can be similarly filtered out if they do
not meet the spot filter criteria.
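In this sketch, spots that failed the spot filters are already None, and a gene (row) or array (column) is kept only if at least a given fraction of its values is present. Filtering arrays first and then judging genes on the remaining arrays is an assumption about the order of operations; the helper names are hypothetical.

```python
def fraction_present(values):
    """Fraction of entries that passed the spot filters (are not None)."""
    return sum(v is not None for v in values) / len(values)

def filter_by_presence(matrix, min_fraction=0.8, filter_arrays=False):
    """Return (kept gene indices, kept array indices) for a genes x arrays matrix."""
    cols = list(range(len(matrix[0])))
    if filter_arrays:
        cols = [j for j in cols
                if fraction_present([row[j] for row in matrix]) >= min_fraction]
    rows = [i for i, row in enumerate(matrix)
            if fraction_present([row[j] for j in cols]) >= min_fraction]
    return rows, cols

# Five hypothetical genes on five arrays; array 5 is mostly failed spots
matrix = [
    [1, 1, 1, 1, None],
    [1, 1, 1, 1, None],
    [1, None, 1, 1, None],
    [1, 1, 1, 1, 1],
    [None, None, 1, 1, None],
]
print(filter_by_presence(matrix))
print(filter_by_presence(matrix, filter_arrays=True))
```

Dropping the poor arrays first lets gene 3 survive: its missing values were concentrated on arrays that are themselves unreliable.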

92
Gene Filtering: Amount of Data Passing Filters

Ten arrays, 500-gene genelist, Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Centered genes and arrays
Genes must have 80% of spots pass filters
285 SUIDs are used for the cluster
This reduces the number of genes with missing data
and permits the clustering to be performed on
genes with more data points.

93
Gene Filtering: Amount of Data Passing Filters

Ten arrays, 500-gene genelist, Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Centered genes and arrays
Genes and arrays must have 80% of spots pass
filters
285 SUIDs are used for the cluster
Filtering away arrays whose spots fail the
filters at a high frequency is a good way to
remove pathologically bad arrays

94
Spot Filtering vs. Gene Filtering
Gene filters remove the genes that do not meet
the filter criteria often enough. This reduces
the number of genes.
Spot filters remove individual data points. That
means there will be more missing (gray) data.
95
Gene Filtering: Summary

Correct selection of filters will retain
interesting data and remove those that are
unreliable or uninteresting.
A good understanding of your experiment is
REQUIRED before you can decide which filters make
biological sense.
Not all filtering criteria are useful for all
experiments.

96
Gene Filtering: Results

The numbers of genes and arrays are shown
PreClustering files (.pcl) can be downloaded
Summary report is available
May deposit to repository at this stage.
Proceed to clustering

97
Gene Filtering: Data Retrieval Summary Report
98
Gene Filtering: Summary

Transform single-channel data
Filter genes based on data distribution
Center data
Filter genes based on data values
Filter genes and arrays based on spot filter
criteria

99
PUMAdb Data Analysis

Data Analysis Background
Data normalization
Clustering algorithms
Data centering
Using the Database's Analysis Pipeline
Gene Selection and Annotation
Data Filtering
Data Retrieval
Gene Filtering
Clustering and Image Generation

100
Clustering and Image Generation

Partitioning options
Clustering metric selections
Correlated genes
Image generation options

101
Clustering: Metric Selections

Genes and arrays can be clustered.
Pearson correlation treats vectors as if they
were the same (unit) length.
Euclidean distance measures the absolute distance
between two points in space. Therefore Euclidean
distance will be affected by both the direction
and the amplitude of the vectors.

102
Clustering: Gene Clustering

Ten arrays, 500-gene genelist, Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Centered genes
Genes must have 80% of spots pass filters
274 SUIDs are used for the cluster
No centering during clustering
Pearson correlation, genes clustered

103
Clustering: Tree Displays

Clustered gene arrays are displayed adjacent to
most similar arrays.
The nodes of the trees indicate the members of an
array and the degree of similarity to its
neighbor.

104
Clustering: Array Clustering

Ten arrays, 500-gene genelist, Spot flag 0
Regression correlation > 0.6
Net intensity in either channel > 350
Centered genes, 80% must pass filters
274 SUIDs are used for the cluster
No centering during clustering
Pearson correlation, clustering genes and arrays
Clustering of arrays will change the order of the
arrays in your display

105
Clustering: Tree Displays

Clustering arrays will give a tree for the arrays
that is very similar to that for the genes

106
Clustering: Array Clustering
[Figure panels: No Array Clustering; With Array Clustering]
107
Clustering: Partitioning Data

Data can be partitioned into a Self-Organizing
Map (SOM)
If partitioned, dimensions of the SOM must be
specified

108
Clustering: Self-Organizing Maps

SOMs result in genes being assigned to partitions
of most similar genes
Neighboring partitions are more similar to each
other than they are to distant partitions

109
Clustering: Correlated Genes

A file listing the best-correlated genes, for
each gene retrieved, can be produced.
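A correlated-genes list can be sketched as: for every gene, score all other genes by Pearson correlation and keep the best few. The ranking by Pearson r, the `top` parameter, and the profiles are illustrative assumptions about what the database file contains.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient r of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    return sum(a * b for a, b in zip(du, dv)) / math.sqrt(
        sum(a * a for a in du) * sum(b * b for b in dv))

def best_correlated(vectors, top=5):
    """For each gene, the `top` other genes with the highest Pearson r, best first."""
    result = {}
    for gene, u in vectors.items():
        scores = sorted(((pearson(u, v), other) for other, v in vectors.items()
                         if other != gene), reverse=True)
        result[gene] = [(other, r) for r, other in scores[:top]]
    return result

profiles = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 4.0, 6.5], "geneC": [3.0, 2.0, 1.0]}
print(best_correlated(profiles, top=1))
```

Because only the highest r is kept, a perfectly anti-correlated gene (geneC against geneA, r = -1) never appears; a list of the most negatively correlated genes would need a separate ranking.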

110
Clustering: Image Generation Options

Contrast can be modified
Missing data can be assigned different colors of
gray
Both red/green and blue/yellow schemes can be
used
You can elect to view spot images

111
Clustering: Visualization

Click on the image to get a dynamic display.
Click on one of the other options to see static
displays with or without the spot images.
Downloadable files (.cdt, .atr, .gtr, report) for
use with other tools (e.g., TreeView).

112
Clustering: Cluster Image

Scale is indicated on the color bar
Gene names are at the right
Tree generated by hierarchical clustering is at
the left

113
Clustering: Display Clustered Spot Images
114
Clustering: Display Adjacent Cluster and
Clustered Spot Images
115
Clustering: Display Hierarchical Cluster View
116
Clustering and Image Generation: Summary