GEPAS

1
GEPAS

Microarray Data Analysis: Differential Gene Expression

Edinburgh, October 2008
Ana Conesa, aconesa_at_cipf.es
http://bioinfo.cipf.es/aconesa
Bioinformatics and Genomics Department
Centro de Investigacion Principe Felipe (CIPF), Valencia, Spain
2
Analysis of Differential Gene Expression
3
Data

Gene normalized intensities
Gene names or gene identifiers
Microarray information
In an independent text file
Using special lines and reserved words within the
text file containing gene intensity measurements

4
Data entry
File with ID and intensity measurements
Class file
Intensity file with special line for microarray
class
Tab-delimited text files
5
Data entry

Reserved words
NAMES
CLASS
INDEPENDENT_VARIABLE
TIME_VARIABLE
CENSORING_VARIABLE
CONTIN
SERIES
Comment lines
Avoid empty lines
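
As an illustration of the input format described on these slides, here is a minimal Python sketch that separates annotation lines from gene rows. The slides do not show how a reserved word is marked in the file, so the leading "#" marker, the function name parse_expression_text and the example contents are all assumptions made for this sketch, not the documented GEPAS syntax.

```python
RESERVED = {"NAMES", "CLASS", "INDEPENDENT_VARIABLE", "TIME_VARIABLE",
            "CENSORING_VARIABLE", "CONTIN", "SERIES"}


def parse_expression_text(text, marker="#"):
    """Split a tab-delimited expression file into annotations and gene rows.

    Assumption (not confirmed by the slides): an annotation line starts with
    `marker` followed by a reserved word; any other `marker` line is a comment.
    """
    annotations = {}
    genes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # the slides advise against empty lines; skip them anyway
        fields = line.split("\t")
        first = fields[0]
        if first.startswith(marker):
            word = first[len(marker):].strip()
            if word in RESERVED:
                annotations[word] = fields[1:]
            continue  # every other marker line is treated as a comment
        genes[first] = [float(value) for value in fields[1:]]
    return annotations, genes


# Hypothetical file contents, invented for this example.
example = (
    "# hypothetical two-class experiment\n"
    "#NAMES\ta1\ta2\ta3\ta4\n"
    "#CLASS\tcontrol\tcontrol\ttumour\ttumour\n"
    "geneA\t1.0\t1.2\t3.1\t2.9\n"
    "geneB\t0.5\t0.4\t0.6\t0.5\n"
)
annotations, genes = parse_expression_text(example)
print(annotations["CLASS"])
print(genes["geneA"])
```

Keeping the class labels in the same file as the intensities, as in the third data entry option, means one pass over the file yields both the measurements and the array annotation.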

6
Results
Parameter estimates, p-values, adjusted
p-values, posterior probabilities
Raw data
Output file: ordered genes
Graphic display of results: ordered genes
Redirect output to other GEPAS tools
7
Results
Ordered genes and ordered arrays

We perform one hypothesis test for each gene
There is an increased chance of finding false
positives
We need to adjust p-values to control
FWER (family-wise error rate)
FDR (false discovery rate)
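
The two kinds of adjustment can be sketched in a few lines of Python. The slides do not name specific procedures, so the choice of the Bonferroni correction (for FWER) and the Benjamini-Hochberg step-up procedure (for FDR) is an assumption of this sketch, as are the function names and the example p-values.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.20]
fwer_adjusted = bonferroni(pvals)
fdr_adjusted = benjamini_hochberg(pvals)
print(fwer_adjusted)
print(fdr_adjusted)
```

FDR adjustment is usually less conservative than FWER adjustment, which is why the adjusted values from the second function are never larger than those from the first.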

8
Two class comparison
We can rank the genes according to a
straightforward biological meaning
9
Two class comparison

t-test
data-adaptive statistic
Empirical Bayes (hierarchical mixture model)
CLEAR test

10
t-test for the expression of a gene
For each gene, we check whether its mean expression is equal or different across the two classes
Null hypothesis: the mean expression is equal in both groups.
Alternative hypothesis: the mean expression is different between the groups.
Test statistic: t = (mean in group 1 - mean in group 2) / (estimate of the variability of the difference)
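
A minimal sketch of the test statistic for one gene, using only the Python standard library. The slide does not say which estimate of variability is used, so the unequal-variance (Welch) standard error below is an assumption; the function name and the intensities are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """Difference of class means divided by its estimated standard error.

    Assumption: Welch's unequal-variance standard error. With equal group
    sizes it gives the same value as the pooled-variance version.
    """
    n1, n2 = len(group1), len(group2)
    standard_error = sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group1) - mean(group2)) / standard_error


# Hypothetical normalized intensities of one gene in the two classes.
class_a = [1.0, 2.0, 3.0]
class_b = [4.0, 5.0, 6.0]
t_value = t_statistic(class_a, class_b)
print(t_value)
```

Turning the statistic into a p-value requires the t distribution, which is not in the standard library, so that step is left out of the sketch.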
11
p-value
Under the null hypothesis
Frequency histogram
12
p-value
Under the null hypothesis
Frequency histogram
13
p-value
Under the null hypothesis
Frequency histogram
p-value (area)
Distribution function
Rejection region
Confidence region
14
p-value

If we reject when the p-value < 0.05, there is a 5% chance of getting a false positive for each true null hypothesis
On average, when the null hypotheses are true:
If you test 100 hypotheses, 5 will be false positives (they wrongly appear significant)
If you test 10000 hypotheses, 500 will appear as false positives
Multiple testing correction is needed
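
The arithmetic behind these numbers can be checked with a short Python sketch. It assumes that every null hypothesis is true and, for the second function, that the tests are independent; both function names are made up for this example.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Average number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (100, 10000):
    print(n, expected_false_positives(n), prob_at_least_one_false_positive(n))
```

With 100 tests the chance of at least one false positive is already above 99%, which is the motivation for adjusting the p-values.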

15
Multi-class comparison
It is not clear how to arrange genes by their
pattern across classes
16
Multi-class comparison

ANOVA
CLEAR
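
For the ANOVA option, the per-gene statistic can be sketched as a one-way F ratio in plain Python. This is a textbook formulation written for illustration, not the tool's own code; the function name and the example intensities are assumptions.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-class over within-class mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical intensities of one gene in three classes.
gene_by_class = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = anova_f(gene_by_class)
print(f_value)
```

A large F value says only that at least one class mean differs, not which one, which fits the remark that genes are hard to arrange by their pattern across classes.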

17
Gene expression related to a continuous variable
Expression data
Continuous variable: INDEPENDENT_VARIABLE
Assessing linear relationships
18
Regression
19
Regression
gene1 slope
20
Regression
gene1 slope
21
Regression
gene2 slope gene3 slope
gene1 slope
22
Correlation

Pearson correlation coefficient
Spearman correlation coefficient
Linear regression

Arrays ranked according to the independent
variable
Genes ranked by correlation to the continuous
variable
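
The three measures listed on this slide can be sketched in plain Python as follows. The function names and the example values are assumptions made for illustration; ties in the Spearman ranks are handled with average ranks.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Hypothetical continuous variable and one gene's intensities per array.
variable = [1.0, 2.0, 3.0, 4.0, 5.0]
gene = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(variable, gene), spearman(variable, gene), slope(variable, gene))
```

In this example the relationship is monotonic but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is slightly below 1.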
23
Survival data
Expression
Survival times: TIME_VARIABLE
Censoring indicator: CENSORING_VARIABLE
Cox proportional hazards regression model
24
Survival data

Cox model coefficients
Estimate for the statistics
p-values

Arrays ranked according to the survival time
Genes ranked by their relationship with survival
times
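
To make the Cox model less abstract, the sketch below computes the log partial likelihood for a single gene used as the only covariate. The coding of the censoring indicator (1 for an observed event, 0 for a censored time), the Breslow treatment of tied times, the function name and the data are all assumptions of this example. Fitting the model means maximising this function over beta, which is not shown.

```python
from math import exp, log


def cox_partial_loglik(beta, expression, times, events):
    """Cox log partial likelihood for one covariate (Breslow ties).

    Assumption: events[i] is 1 if the event was observed for array i
    and 0 if its survival time is censored.
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored arrays contribute only through risk sets
        # Risk set: arrays still under observation at time t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denominator = sum(exp(beta * expression[j]) for j in risk_set)
        total += beta * expression[i] - log(denominator)
    return total


# Hypothetical data: higher expression goes with shorter survival.
expression = [3.0, 2.0, 1.0, 0.0]
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 0, 1, 1]
print(cox_partial_loglik(0.0, expression, times, events))
print(cox_partial_loglik(1.0, expression, times, events))
```

A positive coefficient fits these made-up data better than beta = 0, which corresponds to a gene whose expression increases the hazard.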
25
Time course analysis / Dose analysis
Expression data
Expression data
Complexity of the model

Time variable and series classification
CONTIN
SERIES

Clustering
26
Time course analysis / Dose analysis
maSigPro method
Polynomial regression model on time, treatments and time vs. treatment
Result: p-values for
time changes (slope, curvature of curves)
treatment changes (different intercepts)
time vs. treatment (t vs. T) interactions (different evolutions)
Clustering according to patterns
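
The kind of regression model described here can be illustrated by building its design matrix for two series. This is a sketch of a generic polynomial model with a treatment dummy variable and is not taken from the maSigPro code; the function name, the quadratic degree and the column labels are assumptions.

```python
def design_row(time, treated, degree=2):
    """One design-matrix row for a polynomial time-by-treatment model.

    Columns: intercept, time powers (slope, curvature), treatment dummy
    (different intercept), dummy x time powers (different evolution).
    """
    powers = [time ** d for d in range(1, degree + 1)]
    dummy = 1.0 if treated else 0.0
    return [1.0] + powers + [dummy] + [dummy * p for p in powers]


columns = ["intercept", "t", "t^2", "T", "T:t", "T:t^2"]

# Hypothetical experiment: three time points in a control and a treated series.
design = [design_row(t, treated)
          for treated in (False, True)
          for t in (1.0, 2.0, 3.0)]
for row in design:
    print(dict(zip(columns, row)))
```

Each group of result p-values maps to a group of columns: the time powers capture slope and curvature, the dummy captures a shifted intercept, and the product terms capture curves that evolve differently under treatment.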
27
Time course analysis / Dose analysis
28
Redirecting to Babelomics
29
The End
www.gepas.org
