Title: GEPAS
1GEPAS
- Microarray Data Analysis Differential Gene
Expression
Edinburg, October 2008 Ana Conesa aconesa_at_cipf.es
http://bioinfo.cipf.es/aconesa Bioinformatics
and Genomics Department Centro de Investigacion
Principe Felipe (CIPF) (Valencia, Spain)
2Analysis of Differential Gene Expression
3Data
- Gene normalized intensities
- Gene names or gene identifiers
- Microarray information
- In an independent text file
- Using special lines and reserved words within the
text file containing gene intensity measurements
4Data entry
File with ID and intensity measurements
Class file
Intensity file with special line for microarray
class
Tab-delimited text files
5Data entry
- Reserved words
- NAMES
- CLASS
- INDEPENDENT_VARIABLE
- TIME_VARIABLE
- CENSORING_VARIABLE
- CONTIN
- SERIES
- Comment lines
- Avoid empty lines
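A minimal sketch of reading such a file is given here. The slides do not spell out the exact syntax, so the layout is an assumption: a reserved-word line is taken to start with "#" immediately followed by the word (for example "#CLASS"), any other "#" line is treated as a comment, and every remaining line is a gene identifier followed by its intensities. The function name parse_expression_text and the toy values are hypothetical.

```python
RESERVED = {"NAMES", "CLASS", "INDEPENDENT_VARIABLE", "TIME_VARIABLE",
            "CENSORING_VARIABLE", "CONTIN", "SERIES"}


def parse_expression_text(text):
    """Split a tab-delimited expression file into annotations and gene rows.

    Assumed layout (not confirmed by the slides):
      #WORD<TAB>v1<TAB>v2...    reserved-word line describing the arrays
      #anything else            comment line, ignored
      gene_id<TAB>x1<TAB>x2...  intensity row
    """
    annotations = {}
    genes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate empty lines, although the slides advise against them
        fields = line.split("\t")
        if fields[0].startswith("#"):
            word = fields[0][1:].strip()
            if word in RESERVED:
                annotations[word] = fields[1:]
            continue  # reserved-word and comment lines carry no intensities
        genes[fields[0]] = [float(v) for v in fields[1:]]
    return annotations, genes


# Toy input, values are made up for illustration only.
example = (
    "# toy data set\n"
    "#NAMES\ta1\ta2\ta3\ta4\n"
    "#CLASS\tcontrol\tcontrol\ttumor\ttumor\n"
    "geneA\t1.2\t0.9\t2.8\t3.1\n"
    "geneB\t0.1\t-0.2\t0.0\t0.3\n"
)
annotations, genes = parse_expression_text(example)
```

Keeping the array annotations in a dictionary keyed by reserved word makes it easy to pick the one a given test needs, for example CLASS for a two class comparison.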
6Results
Parameter estimates, p-values, adjusted
p-values, posterior probabilities
Raw data
Output file Ordered genes
Graphic display of results Ordered genes
Redirect output to other GEPAS tools
7Results
Ordered genes and ordered arrays
- We perform one hypothesis test for each gene
- There is an increased chance of finding false
positives
- We need to adjust p-values to control
- FWER (family-wise error rate)
- FDR (false discovery rate)
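The two kinds of adjustment can be sketched as follows. The slides name FWER and FDR but not a specific procedure, so Bonferroni (FWER) and Benjamini-Hochberg (FDR) are used here as common examples, not necessarily the ones GEPAS applies. The raw p-values are made up.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: Benjamini-Hochberg step-up adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.20]   # made-up p-values, one per gene
fwer_adjusted = bonferroni(raw)
fdr_adjusted = benjamini_hochberg(raw)
```

With these values the Bonferroni adjustment leaves two genes below 0.05, while the less conservative FDR adjustment leaves the same two and brings two more close to the threshold.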
8Two class comparison
We can rank the genes according to a
straightforward biological meaning
9Two class comparison
- t-test
- data-adaptive statistic
- Empirical Bayes (hierarchical mixture model)
- CLEAR test
10t test for a gene expression
For each gene, we check if its mean expression is
equal or different across the two classes
Null hypothesis: the mean expression is equal in
both groups.
Alternative hypothesis: the mean expression is
different between the groups.
Mean in group 1
Mean in group 2
Test Statistic
Estimation of the variability of the differences
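The statistic labelled above (difference of the two group means divided by an estimate of its variability) can be sketched for a single gene as follows. The slide does not say whether the variances are pooled, so the unequal-variance (Welch) form is an assumption, and the values are made up.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """Welch t statistic: difference in means over its estimated standard error."""
    n1, n2 = len(group1), len(group2)
    standard_error = sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group1) - mean(group2)) / standard_error


gene_class1 = [2.0, 4.0, 6.0]   # made-up intensities of one gene in class 1
gene_class2 = [1.0, 2.0, 3.0]   # made-up intensities of the same gene in class 2
t = t_statistic(gene_class1, gene_class2)
```

The sign of t tells which class has the higher mean, which is what gives the gene ranking its direct biological meaning.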
11p - value
Under the null hypothesis
Frequency histogram
13p - value
Under the null hypothesis
Frequency histogram
p-value (area)
Distribution function
Rejection region
Confidence region
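The idea of a p-value as a tail area of the null distribution can be illustrated by building that distribution explicitly. This sketch swaps in an exact permutation test (relabelling the arrays) in place of the theoretical t distribution that the slide most likely depicts. The data are made up.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group1, group2):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    n2 = len(pooled) - n1
    observed = abs(mean(group1) - mean(group2))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of relabelling n1 of the arrays as "class 1".
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        difference = abs(s1 / n1 - (total - s1) / n2)
        count += 1
        # Small tolerance so floating-point noise does not hide ties.
        if difference >= observed - 1e-12:
            extreme += 1
    # Fraction of relabellings at least as extreme as the observed one.
    return extreme / count


p = permutation_p_value([5.1, 4.8, 5.5, 5.0], [3.9, 4.1, 4.0, 3.7])
```

The returned fraction is the area of the rejection-side tails of the frequency histogram of relabelled differences.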
14p - value
- If we reject when p-value < 0.05, there is a 5%
chance of getting a false positive
- On average, when no gene is truly different
- If you test 100 hypotheses, 5 will be false
positives (wrongly appear significant)
- If you test 10000 hypotheses, 500 will appear as
false positives
- Multiple testing correction is needed
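The arithmetic above can be checked with a small simulation. It assumes every null hypothesis is true, so p-values are uniform between 0 and 1. The function name and seed are arbitrary.

```python
import random


def count_false_positives(n_tests, alpha=0.05, seed=0):
    """Count p-values below alpha when every null hypothesis is true."""
    rng = random.Random(seed)
    # Under the null hypothesis, p-values are uniform on [0, 1).
    return sum(rng.random() < alpha for _ in range(n_tests))


expected = 10000 * 0.05                    # false positives expected on average
observed = count_false_positives(10000)    # one simulated experiment
```

The simulated count fluctuates around the expected value from run to run, which is why the slide says "on average".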
15Multi-class comparison
It is not clear how to arrange genes by their
pattern across classes
16Multi-class comparison
17Gene expression related to a continuous variable
Expression data
Continuous Variable INDEPENDENT_VARIABLE
Assessing linear relationships
18Regression
19Regression
gene1 slope
21Regression
gene2 slope gene3 slope
gene1 slope
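The per-gene slopes shown in these slides can be sketched with ordinary least squares, fitting each gene's expression against the continuous variable. The gene names follow the slide labels, and all values are made up.

```python
from statistics import mean


def least_squares_slope(x, y):
    """Slope b of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


variable = [1.0, 2.0, 3.0, 4.0]   # made-up INDEPENDENT_VARIABLE values
expression = {
    "gene1": [2.1, 3.9, 6.2, 7.8],   # increases with the variable
    "gene2": [5.0, 5.1, 4.9, 5.0],   # roughly flat
    "gene3": [8.0, 6.1, 3.9, 2.0],   # decreases with the variable
}
slopes = {g: least_squares_slope(variable, y) for g, y in expression.items()}
```

A slope near zero, as for gene2, indicates no linear relationship with the variable.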
22Correlation
- Pearson correlation coefficient
- Spearman correlation coefficient
- Linear regression
Arrays ranked according to the independent
variable
Genes ranked by correlation to the continuous
variable
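A sketch of ranking genes by correlation to the continuous variable, using both coefficients named above. Ranking by absolute Spearman correlation is an illustrative choice, not a statement of how GEPAS orders its output. The data are made up.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


variable = [10.0, 20.0, 30.0, 40.0]          # made-up continuous variable
expression = {"geneA": [1.0, 2.0, 4.0, 8.0], "geneB": [3.0, 1.0, 2.5, 0.5]}
ranked = sorted(expression,
                key=lambda g: abs(spearman(variable, expression[g])),
                reverse=True)
```

geneA rises monotonically but not linearly, so its Spearman coefficient is 1 while its Pearson coefficient is lower; this is the practical difference between the two measures.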
23Survival data
Expression
Survival times TIME_VARIABLE
Censoring indicator CENSORING_VARIABLE
Cox proportional hazards regression model
24Survival data
- Cox model coefficients
- Estimate for the statistics
- p-values
Arrays ranked according to the survival time
Genes ranked by their relationship with survival
times
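For reference, a Cox model fitted one gene at a time can be written as below. Fitting genes one at a time is an assumption suggested by the per-gene ranking, and the notation is ours, not the slides': h_i(t) is the hazard for array i at time t, h_0(t) the baseline hazard, x_{gi} the expression of gene g on array i, and beta_g the coefficient reported for that gene.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_g\, x_{gi}\right),
\qquad H_0:\ \beta_g = 0
```

A positive beta_g means higher expression is associated with higher hazard (shorter survival). The p-value tests whether beta_g differs from zero.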
25Time course analysis / Dose analysis
Expression data
Expression data
Complexity of the model
- Time variable and series classification
- CONTIN
- SERIES
Clustering
26Time course analysis / Dose analysis
maSigPro method
Polynomial regression model on time, treatments
and time vs. treatment
Result: p-values for
- time change (slope, curvature of curves)
- treatment changes (different intercepts)
- time vs. treatment interactions (different evolutions)
Clustering according to patterns
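As an illustration, a polynomial model of this kind for two series and a quadratic time effect could be written as below. The degree, the number of series and the symbols are assumptions made for the sketch: y_{it} is the expression of one gene in series i at time t, and D_i is a dummy variable equal to 1 for the second series and 0 for the first.

```latex
y_{it} = \beta_0 + \beta_1 D_i
       + \delta_0\, t + \delta_1\, t\, D_i
       + \gamma_0\, t^2 + \gamma_1\, t^2 D_i
       + \varepsilon_{it}
```

In this notation delta_0 and gamma_0 describe the time change (slope and curvature), beta_1 the treatment change (different intercepts), and delta_1 and gamma_1 the time vs. treatment interactions (different evolutions).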
27Time course analysis / Dose analysis
28Redirecting to Babelomics
29The End
www.gepas.org