Visualization of Multivariate Data - PowerPoint PPT Presentation

Visualization of Multivariate Data

Transcript and Presenter's Notes

1
Visualization of Multivariate Data
Christine Steinhoff
Max Planck Institute for Molecular Genetics, Berlin, Germany
2
Outline
Motivation: data integration
Data types: expression, aCGH, patient information
Problems: distribution, scale
Procedure: discretization, filtering, indicator matrix, MCA, towards a distance definition
Results
3
Data Sources
4
... Basic output: Microarrays
Affymetrix
x(i,j): real value
Idea: x(gene i, slide j) should be correlated in some way with the number of mRNA molecules of the sequence of gene i in the sample on slide j
5
... What does arrayCGH aim at?
x(i,j): real value
Idea: x(gene i, slide j) should be correlated in some way with the number of DNA copies of the sequence of gene i in the sample on slide j
6
ArrayCGH
Intuitive idea: just take the same chip but put DNA extract instead of mRNA onto it!
-> Many more sophisticated methods have been developed.
Even though the concept is very similar, there are profound differences.
7
DATA INTEGRATION
Patient covariates: information on the patients under study
8
PROBLEMS
Discrete categories
After appropriate normalization: approximately lognormal, symmetric
Not symmetric, skewed
Scale and distribution differ!
9
Motivation
10
The AIM
Gene x1: overexpressed
Gene x2: amplified
Gene x3: loss
Gene x4: overexpressed
11
Approaches
Generalized Singular Value Decomposition
Berger et al., Huang et al., Jefferey et al.
[Figure: aCGH and Expr matrices, genes (n, p) by samples (m)]
Preprocessing: scale and distribution transformation
12
Approaches
Berger et al., Huang et al., Jefferey et al.
The columns of X are the generalized singular vectors of R
13
Approaches
  • Problems
  • Scaling, Distribution transformation
  • Only two datasets
  • Does not allow for categorical variables

Berger et al Huang et al Jefferey et al
14
Procedure
Data input -> Discretization -> Filtering -> Indicator coding -> Multiple Correspondence Analysis
15
Step 1: Discretization
Patient covariates: categorical, e.g. staging, grading, smoking, mutation, ...
arrayCGH
Expression
16
Step 1: Discretization
arrayCGH: for example CBS (package DNAcopy), segmentation and discretization of arrayCGH data
Expression: for example a fold-change criterion
Step 1: Discretization
Patient covariates
arrayCGH
Expression
Typically n ≈ 23,000 -> reduce the number
18
Step 2: Filtering (optional)
  • Suggestions:
  • Neglect all genes with no change in any patient
  • Choose genes with the highest variance across
    patients
  • Select for high correlation between arrayCGH and
    expression
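
The three filtering suggestions can be sketched as small helper functions. This is a sketch under assumptions: the matrices are genes x patients, "no change" means state 0, and correlation means the per-gene Pearson correlation. All function names and the toy data are hypothetical.

```python
import numpy as np

def changed_in_any_patient(states):
    """Boolean mask: True for genes whose state is non-zero in at least one patient."""
    return (np.asarray(states) != 0).any(axis=1)

def top_variance_genes(values, k):
    """Indices of the k genes with the highest variance across patients."""
    variances = np.var(np.asarray(values, dtype=float), axis=1)
    return np.argsort(variances)[::-1][:k]

def rowwise_correlation(acgh, expr):
    """Pearson correlation between the aCGH and expression profile of each gene.

    Genes with a constant profile get correlation 0 instead of NaN.
    """
    a = np.asarray(acgh, dtype=float)
    e = np.asarray(expr, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    e = e - e.mean(axis=1, keepdims=True)
    denom = np.sqrt((a ** 2).sum(axis=1) * (e ** 2).sum(axis=1))
    corr = np.zeros(a.shape[0])
    np.divide((a * e).sum(axis=1), denom, out=corr, where=denom > 0)
    return corr

# Toy data: 3 genes x 4 patients, already discretized to -1/0/1.
acgh = np.array([[0, 0, 0, 0], [1, 1, 0, -1], [0, 1, 1, 0]])
expr = np.array([[0, 0, 0, 0], [1, 1, 0, -1], [0, -1, 0, 1]])

keep = changed_in_any_patient(acgh) | changed_in_any_patient(expr)
print(keep)
print(top_variance_genes(acgh, k=1))
print(rowwise_correlation(acgh, expr))
```

Whether the criteria are combined or used as alternatives is a design choice the slide leaves open.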

19
Step 3: Indicator Matrix - Binary Coding
Indicator matrix with binary coding
Original matrix with categories
20
Step 3: Indicator Matrix - Binary Coding
Indicator matrix with binary coding
Original matrix with categories
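A minimal sketch of the binary coding step, assuming the original matrix is patients x genes with the three states -1, 0, 1. The function name `indicator_matrix` and the toy data are hypothetical. Each gene is expanded into one column per state, and exactly one of those columns is 1 for each patient.

```python
import numpy as np

def indicator_matrix(states, categories=(-1, 0, 1)):
    """Binary-code a patients x genes matrix of categorical states.

    Each gene becomes len(categories) columns, ordered as in `categories`.
    """
    s = np.asarray(states)
    cats = np.asarray(categories)
    # s[:, [j]] has shape (patients, 1); comparing with cats broadcasts
    # to (patients, len(categories)).
    blocks = [(s[:, [j]] == cats).astype(int) for j in range(s.shape[1])]
    return np.hstack(blocks)

# Toy original matrix: 3 patients x 2 genes.
original = np.array([[1, 0],
                     [-1, 0],
                     [0, 1]])
Z = indicator_matrix(original)
print(Z)
```

Columns are ordered G1 (-1), G1 (0), G1 (1), G2 (-1), and so on. Every row sums to the number of genes.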
21
Step 4: Appending Matrices
Experimental: A, E
Supplemental covariates: P
22
Multiple Correspondence Analysis with Supplementary Information
23
Multiple Correspondence Analysis
Row and column labels: G1 (-1), G1 (0), G1 (1), G2 (-1), ... (e.g. gene 251, state 1)
Block matrix:
t(E)E  t(E)A
t(A)E  t(A)A
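The block matrix above is the cross-product of the appended indicator matrices (often called a Burt matrix). The sketch below builds it and runs a plain correspondence analysis on it via an SVD of the standardized residuals. It assumes E and A are patients x states indicator matrices and that every state occurs at least once. The function names and toy data are hypothetical, and the projection of the supplementary covariates is not shown.

```python
import numpy as np

def burt_matrix(E, A):
    """Cross-product of the appended indicator matrices [E A].

    The result has the block structure t(E)E, t(E)A / t(A)E, t(A)A.
    """
    Z = np.hstack([np.asarray(E), np.asarray(A)])
    return Z.T @ Z

def correspondence_analysis(N, n_components=2):
    """Principal column coordinates and principal inertias of a table N.

    Assumes all row and column sums of N are positive.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return col_coords[:, :n_components], sv[:n_components] ** 2

# Toy indicator matrices for 4 patients.
E = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])  # one gene, 3 states
A = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])              # one gene, 2 observed states

B = burt_matrix(E, A)
coords, inertia = correspondence_analysis(B)
print(coords.round(3))
print(inertia.round(3))
```

Each row of `coords` is the position of one gene state in the low-dimensional display.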
24
Patients Information
25
EXAMPLE PUBLISHED DATA
26
Covariate States Display
27
Gene States Display
28
Towards Distance Definition
  • Determine angle and vector length
  • Select genes according to a predefined angle, or
  • Select genes according to angle and length

29
How to select candidate genes?
x1 = angle; x2 = 1/|covariate-state vector|; x3 = 1/|gene-state vector| -> minimize!
$L2_w(x_1, x_2, x_3) = \sqrt{w_1 x_1^2 + w_2 x_2^2 + w_3 x_3^2}$
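A minimal sketch of the weighted score, assuming that the gene state and the covariate state are given as coordinate vectors in the MCA display, that the angle is measured in radians, and that "1/vector" means the reciprocal of the vector length. The function name `selection_score`, the default weights, and the example vectors are hypothetical.

```python
import numpy as np

def selection_score(gene_vec, cov_vec, weights=(1.0, 1.0, 1.0)):
    """Weighted L2 score of (angle, 1/|covariate vector|, 1/|gene vector|).

    Smaller is better: a small angle and long vectors give a low score.
    Both vectors are assumed to be non-zero.
    """
    g = np.asarray(gene_vec, dtype=float)
    c = np.asarray(cov_vec, dtype=float)
    len_g = np.linalg.norm(g)
    len_c = np.linalg.norm(c)
    cos_angle = np.clip(g @ c / (len_g * len_c), -1.0, 1.0)
    x1 = np.arccos(cos_angle)  # angle in radians (assumed unit)
    x2 = 1.0 / len_c
    x3 = 1.0 / len_g
    w1, w2, w3 = weights
    return np.sqrt(w1 * x1 ** 2 + w2 * x2 ** 2 + w3 * x3 ** 2)

covariate_state = [3.0, 0.0]
aligned_gene = [2.0, 0.0]     # same direction as the covariate state
orthogonal_gene = [0.0, 2.0]  # at a right angle to it

print(selection_score(aligned_gene, covariate_state))
print(selection_score(orthogonal_gene, covariate_state))
```

Candidate genes for a covariate state would then be those with the smallest scores. The choice of weights controls how much the angle counts relative to the vector lengths.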
30
  • How does the analysis compare with
  • Just aCGH
  • Just expression
  • Joint analysis?

31
  • How does the analysis compare with
  • Just aCGH
  • Just expression
  • Joint analysis?

32
Explore ERBB2 and MYC, which were found in Berger et al.
ERBB2 amplified in aCGH
ERBB2 normal in aCGH
ERBB2 overexpression
33
ERBB2 underexpression
ERBB2 loss in aCGH
34
MYC Overexpression
MYC amplification
35
MYC normal in aCGH
MYC underexpression
36
Enrichment of GO Categories
37
SUMMARY
Pipeline for joint visualization of
(a) experimental continuous data, e.g. arrayCGH and expression data
(b) patient covariates
Application
Data set: parallel investigation of arrayCGH and expression in breast cancer patients; covariate data available
Determination of candidate gene sets: enrichment of specific cancer-related GO categories
38
FURTHER DIRECTIONS AND OPEN QUESTIONS
  • Integration of variable data sources
  • Appropriate discretization methods
  • Avoid filtering by choosing an algorithm for the
    decomposition of sparse matrices
  • Evaluation scheme (problem of simulation and
    adding noise)
  • Investigation of Robustness
  • ...

39
ACKNOWLEDGEMENT
Sensor Lab, CNR-INFM
Max Planck Institute for Molecular Genetics
Martin Vingron
Matteo Pardo
40
Gene Expression Arrays Technology
  • Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995
  • Lennon GG, Lehrach H. Hybridization analyses of arrayed cDNA libraries. Trends Genet. 1991
  • Commercial, 1998: Affymetrix
  • ... In the meantime, several more!
41
Gene Expression Arrays Technology
Affymetrix
42
Which technology platforms are there?
Hybridization
Affymetrix
Red/Green
...AATGGGTCAGAAGGACTCCTATGTGGGTG...
TTACCCAGTCTTCCTGAGGATACACCCAC
TTACCCAGTCTTGCTGAGGATACACCCAC
43
... Some differences
Affymetrix
Red/Green
  • - Nylon filter
  • - one sample
  • radioactive signal
  • - many spots possible
  • - large area / local effects
  • - signal bleed-over
  • - only one sample per hybridization run
  • - Glass slide
  • - red and green sample
  • fluorescence signal
  • up to 20,000 spots possible
  • - simultaneous hybridization of sample and control (red/green)
  • - Chip
  • - one probe consisting of 16-20 repeats and the corresponding mismatches
  • commercial chip
  • good, reproducible data
  • only one sample per hybridization run