Title: Systems Biology through Pathway Statistics
1 Systems Biology through Pathway Statistics
- Chris Evelo
- BiGCaT Bioinformatics Group, BMT-TU/e and UM
- Diepenbeek, May 14, 2004
2 Where the cat hunts
BiGCaT Bioinformatics
3 BiGCaT Bioinformatics, bridge between two universities
TU/e: Ideas, Experience in Data Handling
Universiteit Maastricht: Patients, Experiments, Arrays and Loads of Data
BiGCaT
LUC Diepenbeek: Statistical Foundations
4 BiGCaT Bioinformatics, between two research fields
Nutritional and Environmental Research
Cardiovascular Research
BiGCaT
5 Our usual prey: gene expression arrays
- Microarrays: relative fluorescence signals. Identification.
- Macroarrays: absolute radioactive signal. Validation.
6 Transcriptomics
- The study of genome-wide gene expression on the transcriptional level
- Where genome-wide means >20K genes.
- And transcriptional level means that somehow >20K mRNA sequences have to be analyzed
- And >20K expression values have to be filtered, normalized, replicate treated, clustered and understood
- Thus no transcriptomics without bioinformatics
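The first steps named on this slide (filtering, normalizing, replicate treatment) can be sketched roughly as follows. This is only an illustration, not the group's actual pipeline: the function names, the `min_signal` cutoff of 100 and the choice of median centering are all assumptions made for the example.

```python
import math
import statistics


def log_ratios(sample, reference, min_signal=100.0):
    """Filter weak spots, then return log2(sample / reference) per gene.

    `min_signal` is a hypothetical signal floor, not a value from the talk.
    """
    ratios = {}
    for gene, s in sample.items():
        r = reference.get(gene)
        # Filtering: drop genes missing from the reference or below the floor.
        if r is None or s < min_signal or r < min_signal:
            continue
        ratios[gene] = math.log2(s / r)
    return ratios


def median_center(ratios):
    """Normalization (one simple choice): subtract the array-wide median."""
    m = statistics.median(ratios.values())
    return {gene: value - m for gene, value in ratios.items()}


def average_replicates(replicates):
    """Replicate treatment: mean log ratio for genes present on every array."""
    shared = set.intersection(*(set(r) for r in replicates))
    return {g: statistics.mean(r[g] for r in replicates) for g in sorted(shared)}
```

Each function takes plain dictionaries mapping a gene identifier to a value, so the three steps can be chained per array and then combined across replicate arrays.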
7 No separate statistics?
- Previous slide: "have to be filtered, normalized, replicate treated, clustered and understood"
- Don't we have to know which genes really changed?
8 Changed?
- We need statistical proof of genes changing because
- Scientists ask for it.
- Journals ask for it.
- But do we really need it?
9 No we don't!
- Biologists will double check anyway
- The largest problem is false positives: 1 in 1000 means 20 on an array! Replicate filtering gets rid of that, losing very little power
- Of course that needed statistical proof
- To understand we need pathways, not single genes (or proteins)
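The "1 in 1000 means 20 on an array" arithmetic, and why replicate filtering helps, can be checked with a small sketch. It assumes an array of 20,000 reporters (the talk says >20K), that replicate arrays are independent, and that "replicate filtering" means a gene must pass on every replicate; the function name is hypothetical.

```python
def expected_false_positives(n_genes, false_positive_rate, n_replicates=1):
    """Expected number of non-changing genes that still pass the filter.

    Assumes independent replicates: a false positive must occur on every
    replicate array to survive, so the per-gene rate is raised to the
    power of the number of replicates.
    """
    return n_genes * false_positive_rate ** n_replicates


single = expected_false_positives(20_000, 0.001)                   # about 20
duplicate = expected_false_positives(20_000, 0.001, n_replicates=2)  # about 0.02
```

Under these assumptions a single array yields about 20 false positives, while requiring agreement between two replicates brings the expectation down to about 0.02.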
10 Two types of arrays
Single longer (>60 mer) cDNA reporters: Agilent, Incyte, custom. 1 value per reporter. Reference variability or multi-array stats.
Multi short (25 mer) oligo reporters: Affymetrix. 16-20 values per reporter. Single array statistics.
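Why 16-20 values per reporter allow statistics on a single array, while 1 value per reporter does not, can be illustrated with a one-sample t statistic over the probe-level values of one gene. This is a generic illustration and not the vendor's algorithm; the function name and the use of probe-level log ratios as input are assumptions.

```python
import math
import statistics


def single_array_t(probe_values):
    """One-sample t statistic: does the mean probe-level value differ from 0?

    With only one value per reporter there is no within-array variance,
    so the statistic cannot be computed from a single array.
    """
    n = len(probe_values)
    if n < 2:
        raise ValueError("need at least two probe values per reporter")
    mean = statistics.mean(probe_values)
    sd = statistics.stdev(probe_values)  # sample standard deviation
    return mean / (sd / math.sqrt(n))
```

For the single-value reporter types the variability has to come from elsewhere, for example a reference design or several arrays, which is what the slide summarizes as "reference variability or multi-array stats".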
11 Systems Biology Triangle
Systems Biology
Transcriptomics: microarrays, 20 K (available)
Metabolomics: large scale analytical chemistry (developing outside)
Proteomics: 2D-gels, antibody techniques (developing inside)
12 Proteomics would be
- The study of genome-wide gene expression on the translational level
- Where genome-wide would mean >20K proteins.
- Then proteomics does not yet exist!
13 Protein variants derived from single genes
Figure labels: Alternative splicing? Phosphorylation? Modification?
14 Two types of omics
Transcriptomics: Microarrays. Values for 20 K genes. Annotation difficult.
Proteomics: Currently only 2D/MS. Only 20-50 identified proteins. Annotation is identification. Plus modifications.
15 Gene Ontology (GO) levels (I)
The Gene Ontology (GO) project gives consistent descriptions of gene products from different databases.
AmiGO browser: http://www.godatabase.org/cgi-bin/go.cgi
GO consortium: http://www.geneontology.org
16 Gene Ontology (GO) levels (II)
17 Use of GO classification: GenMAPP
- GenMAPP: Gene MicroArray Pathway Profiler
- Program to visualize gene expression data on MAPPs representing biological pathways and groupings of genes
- Local MAPPs: contain pathways made by specific research institutes
- Gene Ontology (GO) MAPPs: contain pathways with functionally related genes from the public Gene Ontology Project
18 Example Local MAPP
19 Example GO MAPP
20 Local MAPP
21 GO MAPP
22 Understanding changes
- Map changed genes/proteins (quantitatively or qualitatively) to known pathways.
- Or use information from the Gene Ontology (GO) database
- Steal and smartly adapt a transcriptomics tool
- GenMAPP/MAPPFinder
- Rachel will show some examples
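Mapping changed genes to pathways or GO terms usually ends in a ranking of pathways by how over-represented the changed genes are. The sketch below scores each pathway with a z-score based on the hypergeometric distribution. It is similar in spirit to what a tool such as MAPPFinder reports, but treating it as that tool's exact formula is an assumption; the function names and the dictionary layout of `pathways` are hypothetical.

```python
import math


def pathway_z_score(pathway_genes, changed_genes, measured_genes):
    """Z-score for over-representation of changed genes in one pathway.

    N = genes measured, R = changed genes, n = pathway genes measured,
    r = changed genes in the pathway. Expected r is n*R/N, with the
    hypergeometric variance n*(R/N)*(1-R/N)*(1-(n-1)/(N-1)).
    """
    measured = set(measured_genes)
    in_pathway = set(pathway_genes) & measured
    changed = set(changed_genes) & measured
    N, R, n = len(measured), len(changed), len(in_pathway)
    r = len(in_pathway & changed)
    if N < 2 or n == 0:
        return 0.0
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        return 0.0  # nothing changed, everything changed, or pathway is the whole array
    return (r - expected) / math.sqrt(variance)


def rank_pathways(pathways, changed_genes, measured_genes):
    """Return (pathway name, z-score) pairs, most over-represented first."""
    scores = {
        name: pathway_z_score(genes, changed_genes, measured_genes)
        for name, genes in pathways.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the score only needs a list of "changed" identifiers, the same calculation applies to a qualitative list of identified proteins as well as to quantitative expression changes, which is the sense in which a transcriptomics tool can be adapted.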