Title: Logical Analysis of Diffuse Large B Cell Lymphoma
1 Logical Analysis of Diffuse Large B Cell Lymphoma
- Gabriela Alexe(1), Sorin Alexe(1), David Axelrod(2), Peter Hammer(1), and David Weissmann(3)
- RUTCOR(1) and Department of Genetics(2), Rutgers University, and Robert Wood Johnson Medical School(3)
2 This Talk
- Lymphoma
- Gene Expression Level Analysis
- cDNA Microarray
- Applied to Diffuse Large B-Cell Lymphoma
- Logical Analysis of Data
- Discretization/Binarization
- Support Sets
- Pattern Generation
- Theories and Models
- Prediction
3 Lymphoma
4 Lymphoma
- Cancer of lymphoid cells
- Clonal
- Uncontrolled growth
- Metastasis
- Lymphoma
- Diagnosis
- Grade
5 Diffuse Large B Cell Lymphoma (DLBCL)
- 31% of non-Hodgkin lymphoma cases
- 50% long-term, disease-free survival
- Clinical variability
- Prognosis and therapy
- IPI
- Morphology
- Gene expression
6 Diffuse Large B Cell Lymphoma
7 Spleen with Diffuse Large B Cell Lymphoma
8 Gene Expression Level Analysis
9 DNA-RNA Hybridization
10 Gene Expression Profiling
cDNA microarray analysis
11 DLBCL cDNA Microarray Analysis
- Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Alizadeh et al., Nature, Vol 403, pp 503-511
- cDNA microarray data -> unsupervised hierarchical agglomerative clustering
- Germinal center signature: 76% survival at 5 years
- Activated B cell signature: 16% at 5 years
12 DLBCL Clustering
Each case (patient) is a point in N-dimensional space, where N = number of genes
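As a rough illustration of the clustering idea, the sketch below treats each case as a point in gene-expression space and merges the closest clusters bottom-up. Euclidean distance, average linkage, the function names, and the toy expression values are all assumptions made for this example; the slides do not state which distance or linkage was used.

```python
import math

def euclidean(a, b):
    """Distance between two cases viewed as points in gene space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points, n_clusters):
    """Start with one cluster per case and merge the closest pair repeatedly."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical expression levels: 5 cases, N = 3 genes.
cases = [
    [5.1, 0.9, 3.0],
    [4.8, 1.1, 3.2],
    [1.0, 4.2, 0.5],
    [1.2, 3.9, 0.7],
    [5.0, 1.0, 2.9],
]
groups = agglomerate(cases, 2)
print(groups)
```

Stopping at a fixed number of clusters is only one way to cut the hierarchy; keeping the full merge history would give the tree usually drawn beside a microarray heat map.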
13 DLBCL Survival by Type
14 Supervised Learning Classification of DLBCL
- Diffuse large B-cell lymphoma prediction by gene-expression profiling and supervised machine learning, Shipp et al., Nature Medicine, vol 8, pp 68-74
- Prognosis of DLBCL
- Highly correlated genes -> weighted voting algorithm
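A weighted voting classifier can be sketched as below. It assumes the commonly described scheme in which each gene votes with a signal-to-noise weight, (mean A - mean B) / (std A + std B), measured from the midpoint between the two class means. The function names, the use of population standard deviation, and the toy data are assumptions for illustration; this is not a reproduction of the Shipp et al. predictor.

```python
from statistics import mean, pstdev

def train_weighted_voting(class_a, class_b):
    """Return one (weight, boundary) pair per gene from two lists of cases."""
    params = []
    for g in range(len(class_a[0])):
        a_vals = [case[g] for case in class_a]
        b_vals = [case[g] for case in class_b]
        mu_a, mu_b = mean(a_vals), mean(b_vals)
        # Assumes each gene varies within at least one class (nonzero spread).
        spread = pstdev(a_vals) + pstdev(b_vals)
        weight = (mu_a - mu_b) / spread      # signal-to-noise weight
        boundary = (mu_a + mu_b) / 2         # midpoint between class means
        params.append((weight, boundary))
    return params

def predict(params, case):
    """Sum the per-gene votes; a positive total favors class A."""
    total = sum(w * (x - b) for (w, b), x in zip(params, case))
    return "A" if total > 0 else "B"

# Hypothetical training data: 3 cases per class, 2 genes.
class_a = [[8.0, 2.0], [9.0, 3.0], [10.0, 2.5]]
class_b = [[3.0, 6.0], [4.0, 7.0], [2.0, 6.5]]
params = train_weighted_voting(class_a, class_b)
print(predict(params, [8.5, 2.2]), predict(params, [3.5, 6.8]))
```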
15 Shipp's 13-Gene Predictor
16 Logical Analysis of Data
17 Logical Analysis of Data (LAD)
- Non-statistical method based on
- Combinatorics
- Optimization
- Logic
- Based on dataset of cases/patients
- LAD learns patterns characteristic of classes
- Subsets of patients who are +/- for a condition
- Collections of patterns are extensible
- Predictions
18 The Problem: Approximation of Hidden Function
Dataset
Hidden Function
LAD Approximation
19 Main Components of LAD
- Discretization/Binarization
- Support Sets
- Pattern Generation
- Theories and Models
- Prediction
20 Discretization
Separating Cutpoints
Minimum Set of Separating Cutpoints
21 Cutpoints and Support Set
- Minimization is NP-hard
- Numerous powerful methods
- Support set
- Cutpoints define a grid in which ideally no cell contains both + and - cases
- Cutpoints simplify data and decrease noise
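The cutpoint idea can be illustrated with a small sketch. It assumes candidate cutpoints are midpoints between consecutive expression values whose cases differ in class, and it checks whether a chosen set of cutpoints leaves any grid cell with both + and - cases. The function names and data are hypothetical, and no attempt is made to find a minimum set of cutpoints, which the slide notes is NP-hard.

```python
def candidate_cutpoints(values, labels):
    """Midpoints between consecutive distinct values unless both belong purely to one class."""
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        same_pure_class = classes_at[lo] == classes_at[hi] and len(classes_at[lo]) == 1
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts

def binarize(case, cutpoints):
    """One 0/1 attribute per (gene, cutpoint): 1 if the level is above the cutpoint."""
    return [int(case[g] > c) for g in sorted(cutpoints) for c in cutpoints[g]]

def is_separating(binary_cases, labels):
    """True if no grid cell (identical bit vector) holds cases of both classes."""
    seen = {}
    for bits, y in zip(binary_cases, labels):
        if seen.setdefault(tuple(bits), y) != y:
            return False
    return True

# Hypothetical data: 4 cases, 2 genes (indexed 0 and 1).
cases = [[10, 5], [20, 7], [30, 6], [40, 4]]
labels = ["+", "+", "-", "-"]

cuts_gene0 = candidate_cutpoints([c[0] for c in cases], labels)
cuts_gene1 = candidate_cutpoints([c[1] for c in cases], labels)
print(cuts_gene0, cuts_gene1)

# A single cutpoint on gene 0 already separates the classes,
# whereas a single cutpoint on gene 1 does not.
small = {0: cuts_gene0}
print(is_separating([binarize(c, small) for c in cases], labels))
print(is_separating([binarize(c, {1: [5.5]}) for c in cases], labels))
```

In this toy case gene 0 alone carries enough cutpoints to separate the classes, which is the sense in which a small support set simplifies the data.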
22 Patterns
- Examples
- Gene A > 34 and gene B < 24 and gene C < 2
- Positive and negative patterns
- Pattern parameters
- Degree (number of conditions)
- Prevalence (% of +/- cases that satisfy it)
- Homogeneity (proportion of +/- cases among those it covers)
- Best: low degree, large prevalence, high homogeneity
- Patterns are extensible!
23 Pattern Generation
- Generate patterns based on learning set
- Stipulate control parameters. For example:
- Degree 4
- Prevalences > 70%
- Homogeneities 100%
- All 75 patterns in 1.2 seconds on a Pentium IV 1 GHz PC
- Evaluate set
- Average number of patterns covering each observation
- Accuracy applied to evaluation set
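A brute-force version of pattern generation under stipulated control parameters might look like the sketch below. It enumerates every conjunction of cutpoint conditions up to a maximum degree and keeps those meeting the prevalence and homogeneity thresholds. Exhaustive enumeration, the parameter names, and the data are assumptions for illustration; the slides do not describe the actual generation algorithm, which is presumably far more efficient.

```python
from itertools import combinations

def covers(pattern, case):
    """Each condition is (gene, cutpoint, above): level > cutpoint must equal `above`."""
    return all((case[g] > cut) == above for g, cut, above in pattern)

def generate_patterns(cases, labels, cutpoints, target,
                      max_degree=2, min_prevalence=0.7, min_homogeneity=1.0):
    literals = [(g, c, above) for g, cuts in cutpoints.items()
                for c in cuts for above in (True, False)]
    n_target = sum(y == target for y in labels)
    found = []
    for d in range(1, max_degree + 1):
        for pattern in combinations(literals, d):
            # Skip conjunctions that merely extend an already accepted pattern.
            if any(set(p) <= set(pattern) for p in found):
                continue
            covered = [y for case, y in zip(cases, labels) if covers(pattern, case)]
            if not covered:
                continue
            hits = sum(y == target for y in covered)
            if hits / n_target >= min_prevalence and hits / len(covered) >= min_homogeneity:
                found.append(pattern)
    return found

# Hypothetical learning set with one cutpoint per gene.
cases = [
    {"A": 40, "B": 10},
    {"A": 38, "B": 12},
    {"A": 36, "B": 15},
    {"A": 20, "B": 11},
    {"A": 39, "B": 35},
]
labels = ["+", "+", "+", "-", "-"]
cutpoints = {"A": [30], "B": [20]}

positive = generate_patterns(cases, labels, cutpoints, "+")
print(positive)
```

On this toy set no single condition is fully homogeneous, but the degree-2 conjunction "A > 30 and B <= 20" covers all positive cases and no negative ones.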
24 Patterns Illustration
Negative Pattern
Positive Pattern
25 Theories: Approximations of the 2 Regions
A theory is a set of positive (or negative)
patterns such that every positive (or negative)
case is covered.
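One simple way to assemble a theory from generated patterns is a greedy cover: keep picking the pattern that covers the most still-uncovered cases. Greedy selection is an assumption made for this sketch (the definition above only requires that every case be covered), and the pattern names and coverage sets are hypothetical.

```python
def greedy_theory(coverage, cases_to_cover):
    """coverage maps a pattern name to the set of case ids it covers."""
    uncovered = set(cases_to_cover)
    theory = []
    while uncovered:
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # the remaining cases are covered by no pattern
        theory.append(best)
        uncovered -= gained
    return theory, uncovered

# Hypothetical positive patterns and the positive cases each one covers.
coverage = {
    "P1": {1, 2, 3},
    "P2": {3, 4},
    "P3": {4, 5},
    "P4": {2},
}
theory, uncovered = greedy_theory(coverage, {1, 2, 3, 4, 5})
print(theory, uncovered)
```

The same routine applied to negative patterns and negative cases would give a negative theory.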
26 Models
- A model is the combination of a positive theory and a negative theory
- A good model
- Small number of features (genes)
- Patterns are high quality
- Low degrees
- High prevalences
- High homogeneities
- Number of patterns is small
- Maximize their biologic interpretability
27 Theories and Models
Unexplained Area
Positive Theory
Negative Theory
Model
Positive Area
Discordant Area
Negative Area
28 LAD Prediction
- A new case = a set of gene expression levels
- Satisfies some positive patterns and no negative ones?
- Satisfies some negative patterns and no positive ones?
- Satisfies some of both?
- Which more?
- Does not satisfy any (rare)
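The decision logic on this slide can be sketched as a comparison between the share of positive patterns and the share of negative patterns that a new case satisfies. Comparing shares rather than raw counts, returning "unclassified" on ties, and all patterns and cases below are assumptions made for illustration.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}

def covers(pattern, case):
    """A case satisfies a pattern when every (gene, comparison, cutpoint) condition holds."""
    return all(OPS[op](case[gene], cut) for gene, op, cut in pattern)

def classify(case, positive_patterns, negative_patterns):
    pos = sum(covers(p, case) for p in positive_patterns)
    neg = sum(covers(p, case) for p in negative_patterns)
    if pos == 0 and neg == 0:
        return "unclassified"          # satisfies no pattern at all
    pos_share = pos / len(positive_patterns) if positive_patterns else 0.0
    neg_share = neg / len(negative_patterns) if negative_patterns else 0.0
    if pos_share > neg_share:
        return "positive"
    if neg_share > pos_share:
        return "negative"
    return "unclassified"              # satisfies both kinds equally

# Hypothetical model: two positive and two negative patterns.
positive_patterns = [[("A", ">", 34), ("B", "<", 24)], [("C", "<", 2)]]
negative_patterns = [[("A", "<", 30)], [("B", ">", 28), ("C", ">", 3)]]

print(classify({"A": 40, "B": 20, "C": 1}, positive_patterns, negative_patterns))
print(classify({"A": 25, "B": 30, "C": 5}, positive_patterns, negative_patterns))
print(classify({"A": 32, "B": 26, "C": 2.5}, positive_patterns, negative_patterns))
print(classify({"A": 25, "B": 20, "C": 1}, positive_patterns, negative_patterns))
```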
29 8-Gene Classification Model
30 Accuracy of Prognosis
31 Conclusion
- Logical Analysis of Data (LAD): a versatile new classification method, here applied to diagnosis and prognosis of lymphoma.
- LAD genes differ almost entirely from those specified by other studies.
- Genes not individually correlated with diagnosis or prognosis but highly correlated in combinations of as few as two genes.
- Patterns suggest biologic pathways
- LAD provides highly accurate prognosis of DLBCL
32 Contacts
- Gabriela Alexe galexe_at_us.ibm.com
- Sorin Alexe salexe_at_rutcor.rutgers.edu
- David Axelrod axelrod_at_biology.rutgers.edu
- Peter Hammer hammer_at_rutcor.rutgers.edu
- David Weissmann weissmdj_at_umdnj.edu