1. Genic activity and pathway networks: Embedding biological knowledge in genomic statistical analysis
daniel.remondini_at_unibo.it
2. Collaborations
- Brown University: Brain Research Center, Genomic/Proteomic Center, Applied Math
- Bologna University: Department of Physics (Dip. Fisica), CIG
- ISS, Roma
- Unilever Research Center, UK
- ITB CNR, Milano
3. Our group
- Systems Biology (Genomics, IS, NS)
- EMF effects in biological tissues/cells/organisms
- F. Bersani - Physics
- G. Castellani - Physics
- M. Francesconi - Biotechnology
- P. Mesirca - Physics
- G. Procopio - Physics
- D. Remondini - Physics
- I. Zironi - Biology
4. Typical genomic experiment
- Comparison between two sample (cell) "states", one of which is the normal/basal state
- 1) sample preparation
- 2) microarray hybridization
- 3) statistical analysis
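The slides do not say which test is used in step 3. As a minimal sketch, assuming log-scale expression values and Welch's two-sample t statistic (one common choice, not necessarily the one used here), a per-gene score for the two states could be computed as follows; the gene names and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    # statistics.variance uses the n - 1 (sample) denominator.
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / math.sqrt(se2)


# Hypothetical log2 expression values, 3 replicates per state.
basal = {"geneA": [5.0, 5.2, 4.9], "geneB": [8.1, 8.0, 8.2]}
treated = {"geneA": [6.1, 6.3, 6.0], "geneB": [8.2, 7.9, 8.1]}

scores = {g: welch_t(treated[g], basal[g]) for g in basal}
for gene, t in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{gene}: t = {t:.2f}")
```

Genes are listed by decreasing |t|; turning t into a p-value would also need the Welch-Satterthwaite degrees of freedom, omitted here for brevity.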
5. Genomic analysis drawbacks
- Single-gene analysis is not sufficient to understand the cellular mechanisms triggered by the experimental conditions
- Cell behaviour is a complex phenomenon: several elements (e.g. genes) act together to generate it
6. Single-gene statistical analysis
- Statistical significance is not necessarily related to biological significance, because:
- the most interesting genes often show the slightest variations in expression (low statistical significance)
- they are often poorly expressed in absolute terms, where the experimental signal is noisier, which further lowers their significance
7. "Typical" microarray picture
8. Introducing biological knowledge
- Gene significance must also rely on the known role that the gene plays in cellular mechanisms
- Single-gene statistical analysis must be integrated with a priori, higher-level biological knowledge
- -> known biological pathways (e.g. KEGG)
9. Improvements
- The statistical analysis becomes more robust, since each pathway contains many genes
- The single-gene significance threshold can be lowered, because a higher-level (pathway) significance filter is applied afterwards
- A priori knowledge can be exploited for further analysis/comparison
10. Pathway network structure
- Libraries like KEGG also have an interesting network structure: biologically relevant information may be retrieved from the topology of nodes (pathways) and edges (genes shared between two pathways)
- Topologically relevant edges can be focal areas from which biological messages spread throughout the network (as hubs do among nodes)
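A minimal sketch of the node/edge construction described above, assuming each pathway is given as a set of gene identifiers. The pathway names and memberships are hypothetical toy data, not KEGG content: an edge links two pathways whenever their gene sets intersect, and it stores the shared genes.

```python
from itertools import combinations


def pathway_network(pathways: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Edges join pathways that share at least one gene; each edge stores the shared genes."""
    edges = {}
    for p, q in combinations(sorted(pathways), 2):
        shared = pathways[p] & pathways[q]
        if shared:
            edges[(p, q)] = shared
    return edges


# Hypothetical toy gene sets (not real KEGG memberships).
toy = {
    "PathwayA": {"g1", "g2", "g3"},
    "PathwayB": {"g3", "g4"},
    "PathwayC": {"g5"},
}
net = pathway_network(toy)
print(net)  # {('PathwayA', 'PathwayB'): {'g3'}}
```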
11. KEGG database
- Kyoto Encyclopedia of Genes and Genomes
- Gene relationships (interactions/pathway clustering) derive from knowledge of biochemical/physical interactions
- Several organisms (Human, Mouse, Rat, etc.)
- Only a portion of the whole genome is embedded in KEGG (continuously updated)
12. KEGG pathway network (Mouse)
~10^4 probes, ~10^2 pathways
13. KEGG (human): betweenness centrality
14. Analysis pipeline
- 1) single-gene statistical analysis
- 2) integration with a priori biological knowledge (KEGG pathway network)
- 3) larger-scale (global) analysis:
- pathway statistical significance
- pathway network structure (nodes and/or edges)
15. Pathway significance analysis
- Significance analysis of nodes (pathways) or edges (intersections)
- The total number of genes represented in KEGG and the total number of statistically significant genes are compared with the number of significant genes found in a node or edge and its total number of elements
- Test based on the hypergeometric distribution
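With N genes represented in KEGG, K of them significant, and a node or edge containing n genes of which k are significant, the count k can be compared with a hypergeometric distribution. The sketch below computes both tails with exact binomial coefficients; the function names and the example numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x) when drawing n genes out of N, of which K are significant."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def enrichment_pvalues(k: int, N: int, K: int, n: int) -> tuple[float, float]:
    """One-sided p-values (over, under) for k significant genes in a set of n."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    if not lo <= k <= hi:
        raise ValueError("k is incompatible with N, K and n")
    p_over = sum(hypergeom_pmf(x, N, K, n) for x in range(k, hi + 1))
    p_under = sum(hypergeom_pmf(x, N, K, n) for x in range(lo, k + 1))
    return p_over, p_under


# Hypothetical numbers: 10000 KEGG genes, 500 significant overall,
# a pathway with 50 genes of which 10 are significant (2.5 expected by chance).
p_over, p_under = enrichment_pvalues(10, N=10000, K=500, n=50)
print(f"over-representation p = {p_over:.2e}, under-representation p = {p_under:.3f}")
```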
16. Pathway network analysis
- Given the statistically significant nodes and edges, the significant pathway network can be reconstructed
- Edges and nodes can be ranked by their centrality in the network (e.g. connectivity degree or betweenness)
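A sketch of the reconstruction and degree ranking, assuming each node and edge has already been coded as -1, 0 or +1 (as in the network representation used for the case studies). The slide does not say whether a significant edge survives when its endpoint pathways are not themselves significant; here, as an assumption, significant edges are kept together with their endpoints. The pathway labels are hypothetical.

```python
from collections import Counter


def significant_network(node_status, edge_status):
    """Keep significant edges (status != 0), their endpoints, and significant nodes."""
    # Assumption: a significant edge is kept even if its endpoint pathways are not.
    edges = {e for e, s in edge_status.items() if s != 0}
    nodes = {n for n, s in node_status.items() if s != 0}
    nodes |= {n for e in edges for n in e}
    return nodes, edges


def rank_by_degree(nodes, edges):
    """Sort nodes by connectivity degree in the reconstructed network."""
    degree = Counter({n: 0 for n in nodes})
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Highest degree first; ties broken alphabetically for reproducibility.
    return sorted(nodes, key=lambda n: (-degree[n], n)), degree


# Hypothetical -1/0/+1 codes for four pathways and their intersections.
node_status = {"A": 1, "B": 0, "C": -1, "D": 0}
edge_status = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 0, ("C", "D"): -1}
nodes, edges = significant_network(node_status, edge_status)
ranking, degree = rank_by_degree(nodes, edges)
print(ranking)  # ['A', 'C', 'B', 'D']
```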
17. Betweenness centrality (BC)
- BC of a network element (node/edge):
- the sum, over all pairs (i, j), of the number of shortest paths connecting i and j that pass through the element, divided by the total number of shortest paths connecting i and j
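The definition above corresponds to the standard (Freeman) betweenness. Writing \sigma_{ij} for the number of shortest paths between i and j, and \sigma_{ij}(x) for the number of those passing through element x (notation introduced here, not taken from the slides), node and edge BC read:

```latex
C_B(v) = \sum_{i \neq v \neq j} \frac{\sigma_{ij}(v)}{\sigma_{ij}},
\qquad
C_B(e) = \sum_{i \neq j} \frac{\sigma_{ij}(e)}{\sigma_{ij}}
```

For undirected networks each unordered pair is often counted once (i < j), which halves the ordered-pair sum.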
18. Betweenness centrality
- BC is a very interesting parameter because:
- it can be calculated for both nodes and edges
- it measures the potential information flow through that element: if the element is affected by the experimental conditions, the perturbation is likely to spread more easily to the whole system
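Both node and edge BC can be obtained in a single pass per source node with Brandes' algorithm. The sketch below assumes an unweighted, undirected network stored as an adjacency dictionary (a representation chosen here for illustration) and counts each unordered pair once.

```python
from collections import deque


def brandes_betweenness(adj):
    """Node and edge betweenness for an unweighted, undirected graph (Brandes' algorithm).

    adj maps each node to the set of its neighbours. Each unordered pair {i, j}
    is counted once, matching a sum over i < j.
    """
    node_bc = {v: 0.0 for v in adj}
    edge_bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge_bc[frozenset((v, w))] += share
                delta[v] += share
            if w != s:
                node_bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return ({v: b / 2 for v, b in node_bc.items()},
            {e: b / 2 for e, b in edge_bc.items()})


# Path graph a - b - c: only b lies between another pair (a, c).
nodes_bc, edges_bc = brandes_betweenness({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}})
print(nodes_bc)  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

NetworkX provides `betweenness_centrality` and `edge_betweenness_centrality` for the same quantities, with optional normalization.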
19. Case studies
- 1) c-myc induction (rat fibroblast cell line)
- 2) TAC induction (mouse heart cells)
- 3) Lifespan dataset (human lymphocytes)
- 4) Ewing Sarcoma dataset (human)
20. c-Myc-triggered gene expression
- c-Myc encodes a transcriptional regulator whose inappropriate expression is correlated with a wide array of human malignancies.
- Up-regulation of Myc enforces growth, antagonizes cell-cycle withdrawal and differentiation, and in some situations promotes apoptosis.
- c-myc-/- cells reconstituted with the conditionally active, tamoxifen-specific c-Myc-estrogen receptor fusion protein (MycER) allow fine and selective modulation of c-Myc activity by tamoxifen.
- Time-series data: 5 time points in triplicate (9000 probes)
- J.M. Sedivy lab (Brown Univ., USA); O'Connell et al., JBC 2004
21. Network representation
- Significantly underrepresented: -1
- Significantly overrepresented: +1
- Not significant: 0
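One way to obtain this three-level coding, assuming the hypergeometric setting of the pathway significance test (N KEGG genes, K significant, a node or edge with n genes of which k are significant) and a hypothetical threshold alpha = 0.05 applied to each tail without multiple-testing correction. The function name `status` and the counts are illustrative.

```python
from math import comb


def status(k: int, N: int, K: int, n: int, alpha: float = 0.05) -> int:
    """Code a node/edge as +1 (over-), -1 (under-represented) or 0 (not significant)."""

    def pmf(x: int) -> float:
        # Hypergeometric probability of x significant genes among the n in the set.
        return comb(K, x) * comb(N - K, n - x) / comb(N, n)

    lo, hi = max(0, n - (N - K)), min(n, K)
    p_over = sum(pmf(x) for x in range(k, hi + 1))
    p_under = sum(pmf(x) for x in range(lo, k + 1))
    # alpha is a hypothetical per-tail threshold; no multiple-testing correction.
    if p_over < alpha:
        return 1
    if p_under < alpha:
        return -1
    return 0


# Hypothetical counts: 10000 KEGG genes, 500 significant overall.
print(status(10, 10000, 500, 50))   # 10 observed vs 2.5 expected
print(status(0, 10000, 500, 200))   # 0 observed vs 10 expected
print(status(10, 10000, 500, 200))  # 10 observed vs 10 expected
```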
22. c-Myc off
23. c-Myc on
24. Is the BC ranking affected by the KEGG network structure?
c-Myc ON vs. KEGG
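The slide does not state how the two BC rankings are compared. One plausible check, sketched here as an assumption, is Spearman's rank correlation between the BC values of pathways present in both networks: values near 1 would suggest the ranking mostly reflects the underlying KEGG topology. The pathway labels and BC values below are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical BC values in the c-Myc ON network and in the full KEGG network.
bc_on = {"P1": 0.40, "P2": 0.25, "P3": 0.10, "P4": 0.05}
bc_kegg = {"P1": 0.30, "P2": 0.35, "P3": 0.08, "P4": 0.02, "P5": 0.50}
common = sorted(bc_on.keys() & bc_kegg.keys())
rho = spearman([bc_on[p] for p in common], [bc_kegg[p] for p in common])
print(f"Spearman rho = {rho:.2f}")
```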
25. Gene expression time correlation
c-Myc off
c-Myc on
Remondini et al., PNAS 2005
26. Animal model (mouse) of left ventricular hypertrophy (LVH) induced by transverse aortic constriction (TAC)
- Time-series experimental design
- 15 Affymetrix chips at T1 = 0, T2 = 2 and T3 = 4 weeks after TAC
- Each time point was repeated with 5 replicates
27. TAC gene expression
28. TAC 2 weeks
29. TAC 4 weeks
30. Lifespan experiment
- Lymphocyte mRNA extracted from 25 samples, aged 25 to 93 years (20000 probes)
- 5 age groups with 5 samples each, analysed by ANOVA
- Custom arrays (Unilever labs, UK)
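With 5 age groups of 5 samples each, a per-gene one-way ANOVA has 4 and 20 degrees of freedom. A minimal sketch of the F statistic follows; the expression values are hypothetical, and `scipy.stats.f_oneway` also returns the p-value if SciPy is available.

```python
def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log expression of one gene in 5 age groups x 5 samples.
age_groups = [
    [7.1, 7.3, 7.0, 7.2, 7.4],
    [7.2, 7.5, 7.3, 7.1, 7.4],
    [7.6, 7.8, 7.5, 7.7, 7.9],
    [8.0, 8.2, 7.9, 8.1, 8.3],
    [8.4, 8.6, 8.3, 8.5, 8.7],
]
f_age = one_way_anova_f(age_groups)
print(f"F(4, 20) = {f_age:.1f}")
```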
31. Lifespan gene expression
32. Time correlation
33. Lifespan KEGG network
34. Pathway ranking by BC
Ranks 1-12:
- 1 PPAR signaling pathway
- 2 Adipocytokine signaling pathway
- 3 Inositol phosphate metabolism
- 4 Jak-STAT signaling pathway
- 5 Phosphatidylinositol signaling system
- 6 Purine metabolism
- 7 Glyoxylate and dicarboxylate metabolism
- 8 Cysteine metabolism
- 9 B cell receptor signaling pathway
- 10 Glycolysis/Gluconeogenesis
- 11 Styrene degradation
- 12 Long-term depression
Ranks 26-37:
- 26 Apoptosis
- 27 Carbon fixation
- 28 Colorectal cancer
- 29 Glutathione metabolism
- 30 gamma-Hexachlorocyclohexane degradation
- 31 Antigen processing and presentation
- 32 Cyanoamino acid metabolism
- 33 Gap junction
- 34 Taurine and hypotaurine metabolism
- 35 Alanine and aspartate metabolism
- 36 Leukocyte transendothelial migration
- 37 Atrazine degradation
35. Ewing sarcoma dataset
- About 30 human Ewing sarcoma samples, classified by whether or not they responded positively to therapy
- Affymetrix absolute arrays U133_Plus_2 (about 50000 probes)
36. Ewing sarcoma dataset
37. Top 30 pathways ranked by BC
Red: overexpressed; green: underexpressed
38. Digression: the basal state (gene expression) of a cell
- Gene interaction model: spin glass / Boolean network
- Role of the (complex) structure of the interactions
39. Digression: the basal state (gene expression) of a cell
- Frustration? Structure of the ground state(s) and their basins
- Is there a temperature? (transregulation noise)
- Role of external (site-specific) perturbations
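To make the spin-glass analogy concrete, the toy sketch below treats genes as binary spins s_i = ±1 coupled by activating (J > 0) or repressive (J < 0) interactions, with energy E = -Σ J_ij s_i s_j, and finds the ground states by exhaustive search. This is an illustrative assumption, not the model used in the talk; the couplings are hypothetical.

```python
from itertools import product


def energy(spins: tuple[int, ...], couplings: dict[tuple[int, int], int]) -> int:
    """Spin-glass energy E = -sum over (i, j) of J_ij * s_i * s_j, with s_i in {-1, +1}."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def ground_states(n: int, couplings: dict[tuple[int, int], int]):
    """Brute-force all 2**n configurations; feasible only for small n."""
    configs = list(product((-1, 1), repeat=n))
    energies = [energy(s, couplings) for s in configs]
    e_min = min(energies)
    return e_min, [s for s, e in zip(configs, energies) if e == e_min]


# Hypothetical 3-gene loop with one repressive link: the product of the couplings
# around the loop is negative, so the loop is frustrated and no configuration
# satisfies all three links at once (the ground state is degenerate).
frustrated = {(0, 1): 1, (1, 2): 1, (0, 2): -1}
e_min, states = ground_states(3, frustrated)
print(e_min, len(states))  # -1 6
```

Replacing the repressive link with an activating one removes the frustration: the minimum energy drops to -3 and only the two fully aligned configurations remain.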