Visualization of Multivariate Data

Description:

Max Planck Institute for Molecular Genetics ... lobular. 2. 4. Patient 2. mutant. ductal. 2. 3. Patient 1. p53. ER. Histo. Grade. Met. Node. Stage ...

1
Visualization of Multivariate Data
Christine Steinhoff
Max Planck Institute for Molecular Genetics, Berlin, Germany
2
Outline
  • Motivation: DATA INTEGRATION
  • Data types: EXPRESSION, aCGH, patients' information
  • Problems: DISTRIBUTION, SCALE
  • Procedure: DISCRETIZATION, FILTERING, INDICATOR MATRIX, MCA, TOWARDS DISTANCE DEFINITION
  • Results
3
Data Sources
4
DATA INTEGRATION
Patients' covariates: information on patients under study
5
DATA INTEGRATION
Gene x1: overexpressed
Gene x2: amplified
Gene x3: loss
Gene x4: overexpressed
6
PROBLEMS
  • Discrete categories
  • After appropriate normalization: approximately lognormal, symmetric
  • Not symmetric, skewed
  • Scale and distribution differ!
7
Procedure
  • Data input
  • Discretization
  • Filtering
  • Indicator coding
  • Multiple Correspondence Analysis
8
Step 1: Discretization
Patients' covariates
arrayCGH
Expression
Categorical, e.g. staging, grading, smoking, mutation, ...
9
Step 1: Discretization
  • arrayCGH: for example CBS (package DNAcopy), segmentation and discretization of arrayCGH data
  • Expression: for example a fold-change criterion
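A fold-change criterion for the expression data could look like the minimal sketch below. The slides do not give a cut-off, so the threshold of 1 on the log2 scale (a two-fold change) and the function name `discretize_log_ratios` are assumptions; the states -1, 0 and 1 follow the gene-state labels used on the MCA slide.

```python
def discretize_log_ratios(log_ratios, threshold=1.0):
    """Map each log2 ratio to -1 (decrease), 0 (no change) or 1 (increase).

    threshold is a hypothetical cut-off on the log2 scale, not a value
    taken from the presentation.
    """
    states = []
    for value in log_ratios:
        if value >= threshold:
            states.append(1)
        elif value <= -threshold:
            states.append(-1)
        else:
            states.append(0)
    return states


# Made-up log2 expression ratios of one gene across five patients.
example = [1.8, -0.2, -1.4, 0.3, 1.0]
print(discretize_log_ratios(example))
```

The same three-state coding can be applied to segmented arrayCGH values (loss, no change, gain), which keeps both data types on a common categorical scale.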
10
Step 1: Discretization
Patients' covariates
arrayCGH
Expression
Typically n = 23,000 -> reduce number
11
Step 2: Filtering (optional)
  • Suggestion:
  • Neglect all genes with no change in any patient
  • Select for high correlation between arrayCGH and
    expression
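The two suggestions can be sketched as follows. This is only an illustration under stated assumptions: a gene counts as unchanged when its discretized state is 0 in every patient for both data types, both filters are applied one after the other, the correlation is a Pearson correlation on the continuous values, and the cut-off `min_corr=0.5` is hypothetical. The slides specify none of these details.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_genes(acgh_states, expr_states, acgh_values, expr_values, min_corr=0.5):
    """Return the genes that change in some patient and whose arrayCGH and
    expression values correlate at least min_corr (hypothetical cut-off).

    All arguments are dicts mapping a gene name to a list with one entry per patient.
    """
    kept = []
    for gene in acgh_states:
        changed = any(s != 0 for s in acgh_states[gene]) or any(
            s != 0 for s in expr_states[gene]
        )
        if not changed:
            continue  # no change in any patient
        if pearson(acgh_values[gene], expr_values[gene]) >= min_corr:
            kept.append(gene)
    return kept


# Made-up data for three genes and four patients.
acgh_values = {"g1": [0.9, 0.1, -0.8, 0.0], "g2": [0.0, 0.1, 0.0, -0.1], "g3": [0.7, -0.6, 0.1, 0.8]}
expr_values = {"g1": [1.5, 0.2, -1.2, 0.1], "g2": [0.1, 0.0, -0.1, 0.2], "g3": [-1.1, 1.3, 0.0, -1.4]}
acgh_states = {"g1": [1, 0, -1, 0], "g2": [0, 0, 0, 0], "g3": [1, -1, 0, 1]}
expr_states = {"g1": [1, 0, -1, 0], "g2": [0, 0, 0, 0], "g3": [-1, 1, 0, -1]}

print(filter_genes(acgh_states, expr_states, acgh_values, expr_values))
```

In this toy data g2 is dropped because it never changes, and g3 is dropped because its copy number and expression are negatively correlated.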

12
Step 3: Indicator Matrix - Binary Coding
Original matrix with categories -> indicator matrix with binary coding
13
Step 3: Indicator Matrix - Binary Coding
Original matrix with categories -> indicator matrix with binary coding
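Binary coding of a category matrix can be sketched as below. The layout (one row per patient, one column per gene state) is an assumption, chosen so that the columns match the gene-state labels G1 (-1), G1 (0), G1 (1), ... used later; the function name `indicator_matrix` is hypothetical.

```python
def indicator_matrix(states, gene_names, categories=(-1, 0, 1)):
    """Expand a patients x genes category matrix into a binary indicator matrix.

    Each gene becomes len(categories) columns; exactly one of them is 1
    for every patient.
    """
    columns = [f"{gene} ({cat})" for gene in gene_names for cat in categories]
    rows = []
    for patient in states:
        row = []
        for value in patient:
            row.extend(1 if value == cat else 0 for cat in categories)
        rows.append(row)
    return columns, rows


# Made-up states of two genes in three patients.
columns, Z = indicator_matrix([[1, 0], [-1, 0], [0, 1]], ["G1", "G2"])
print(columns)
for row in Z:
    print(row)
```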
14
Step 4: Appending Matrices
A (arrayCGH), E (expression): experimental
P (patients' covariates): supplemental covariates
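Appending the indicator matrices column-wise might look like the sketch below. Reading A as the arrayCGH block, E as the expression block and P as the patients' covariate block is an assumption based on the labels on this slide; the helper `append_matrices` and the toy values are hypothetical.

```python
def append_matrices(*blocks):
    """Concatenate indicator matrices column-wise; every block has one row per patient."""
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("all blocks need the same number of rows (patients)")
    return [[x for block in blocks for x in block[i]] for i in range(n_rows)]


# Made-up indicator blocks for two patients.
A = [[1, 0, 0], [0, 1, 0]]  # arrayCGH states of one gene
E = [[0, 0, 1], [0, 1, 0]]  # expression states of one gene
P = [[1, 0], [0, 1]]        # one covariate with two categories

full = append_matrices(A, E, P)
n_active = len(A[0]) + len(E[0])  # experimental columns; the rest is supplemental
print(full, n_active)
```

Keeping track of `n_active` makes it easy to split the appended matrix again into the experimental part and the supplemental part.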
15
Multiple Correspondence Analysis with supplementary information
16
Multiple Correspondence Analysis
Example label: Gene 251, state 1
Rows and columns are indexed by gene states: G1 (-1), G1 (0), G1 (1), G2 (-1), ...
Block matrix:
t(E)E  t(E)A
t(A)E  t(A)A
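A minimal NumPy sketch of this step is given below, assuming the standard formulation of correspondence analysis on an indicator matrix via a singular value decomposition; the presentation does not say which implementation was used. The cross-product of the indicator matrix with itself (often called the Burt matrix) gives the block structure t(E)E, t(E)A, t(A)E, t(A)A when the columns are ordered by data type. Supplementary covariate categories are projected afterwards and do not influence the axes. The function names and the toy data are hypothetical.

```python
import numpy as np


def mca(Z, n_dims=2):
    """Correspondence analysis of an indicator matrix Z (patients x gene states).

    Columns that are zero for all patients must be removed beforehand,
    otherwise the column masses contain zeros.
    """
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()              # correspondence matrix
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    U, sigma, Vt = U[:, :n_dims], sigma[:n_dims], Vt[:n_dims]
    row_standard = U / np.sqrt(r)[:, None]               # patients
    col_principal = Vt.T * sigma / np.sqrt(c)[:, None]   # gene states
    return row_standard, col_principal, sigma


def project_supplementary(Z_sup, row_standard):
    """Place supplementary categories at the weighted average of their patients."""
    Z_sup = np.asarray(Z_sup, dtype=float)
    profiles = Z_sup / Z_sup.sum(axis=0)
    return profiles.T @ row_standard


# Made-up indicator matrix: five patients, two genes with states -1, 0, 1.
Z = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
])
# Made-up supplementary covariate with two categories.
P_sup = np.array([[1, 0], [0, 1], [1, 0], [1, 0], [0, 1]])

burt = Z.T @ Z                       # gene states x gene states
rows, cols, sigma = mca(Z)
sup = project_supplementary(P_sup, rows)
print(cols.round(3))
print(sup.round(3))
```

Projecting the active gene-state columns with `project_supplementary` reproduces `cols`, so active and supplementary points share one coordinate system and can be drawn in the same plot.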
17
Multiple Correspondence Analysis
How to read: gene states cluster according to
  • Distance from origin
  • Angle
18
Patients' Information
19
Towards Distance Definition
  • Determine angle and vector length
  • Select genes according to a predefined angle, or
  • Select genes according to angle and length
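The selection rule can be sketched as follows for two-dimensional MCA coordinates. The slides do not state what the angle is measured against, so this sketch assumes a reference direction from the origin, for example the position of a patient covariate category; the cut-offs `max_angle` and `min_length`, the function names and the coordinates are hypothetical.

```python
import math


def angle_and_length(point, reference):
    """Angle in degrees between point and reference (both seen as vectors
    from the origin) and the vector length of point."""
    length = math.hypot(point[0], point[1])
    ref_length = math.hypot(reference[0], reference[1])
    cosine = (point[0] * reference[0] + point[1] * reference[1]) / (length * ref_length)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine)), length


def select_gene_states(coords, reference, max_angle=30.0, min_length=0.0):
    """Keep gene states within max_angle of the reference direction and,
    optionally, at least min_length away from the origin."""
    selected = []
    for name, point in coords.items():
        if point[0] == 0 and point[1] == 0:
            continue  # the angle is undefined at the origin
        angle, length = angle_and_length(point, reference)
        if angle <= max_angle and length >= min_length:
            selected.append(name)
    return selected


# Made-up coordinates of three gene states and one covariate category.
coords = {"G1 (1)": (1.0, 0.1), "G2 (-1)": (0.2, 0.05), "G3 (0)": (-0.8, 0.6)}
reference = (1.0, 0.0)

print(select_gene_states(coords, reference))                  # angle only
print(select_gene_states(coords, reference, min_length=0.5))  # angle and length
```

Adding the length criterion removes points close to the origin, whose direction is less informative.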

20
EXAMPLE PUBLISHED DATA
21
EXAMPLE PUBLISHED DATA
22
EXAMPLE PUBLISHED DATA
P = 0.006
23
EXAMPLE PUBLISHED DATA
P = 0.005
P = 0.008
24
SUMMARY
  • Pipeline for joint visualization of (a) experimental continuous data, e.g. arrayCGH and expression data, and (b) patients' covariates
  • Application: data set with parallel investigation of arrayCGH and expression in breast cancer patients; covariate data available
  • Determination of candidate gene sets: enrichment of specific cancer-related pathways
25
FURTHER DIRECTIONS AND OPEN QUESTIONS
  • Integration of variable data sources
  • Appropriate discretization methods
  • Avoid filtering by choosing an algorithm for the
    decomposition of sparse matrices
  • Evaluation scheme (problem of simulation and
    adding noise)
  • Appropriate comparison with the Berger et al.
    approach on continuous data
    (no implementation of patients' covariates)
  • ...

26
ACKNOWLEDGEMENT
Sensor Lab, CNR-INFM
Max Planck Institute for Molecular Genetics
Martin Vingron
Matteo Pardo