Title: Objective Bayesian Nets for Integrating Cancer Knowledge
1 Objective Bayesian Nets for Integrating Cancer Knowledge
Sylvia Nagl, PhD, Cancer Systems Biology
Biomedical Informatics, UCL, London
2 caOBNET Overview
- Knowledge integration by objective Bayesian networks (obNETs)
- Maximum entropy method
- An integrated clinico-genomic obNET for breast cancer
- Conclusions
3 Bayesian networks
- Graphical models
- Directed acyclic graph (DAG)
- Joint multivariate probability distribution with conditional independencies between variables
- Given the data, optimal network topology can be estimated
- Heuristic search algorithms and scoring criteria
- Statistical significance of edge strengths
- Bayesian methods
- Bootstrapping
Apolipoprotein E gene SNPs and plasma apoE level (Rodin and Boerwinkle 2005)
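For reference, the joint distribution of a Bayesian network factorises over its DAG in the standard way shown below. The notation (variables x_i, parent sets pa(x_i)) is assumed here for illustration and is not taken from the slides.

```latex
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
```

Each factor corresponds to one conditional probability table (CPT) attached to a node of the graph.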
4 Knowledge integration
- Cancer treatment decisions should be based on all available knowledge
- Knowledge is complex and varied
- Patient's symptoms, expert knowledge, clinical databases relating to past patients, molecular databases, scientific papers, medical informatics systems
- Generated by independent studies with diverse protocols
5 Knowledge integration
- Diverse data types
- Genomic, transcriptomic, proteomic, SNPs, tissue microarray, histopathology, clinical etc.
- New data types, e.g., epigenetic data
- All data types capture different characteristics of a dynamic complex system
- At different spatial and temporal scales
- Cell, tumour, patient, and therapeutic system of patient-therapy interactions
- How can this disparate data be used for an integrated understanding on which to base our actions?
6 Objective Bayesianism
- Data and knowledge impinge on belief: we try to find a coherent set of beliefs with best fit
- Beliefs based on undefeated items of knowledge
- In case of conflict, try to find compromise beliefs
- Objective Bayesianism offers a formalism for determining the beliefs that best fit background knowledge
- Applying Bayesian theory, an agent's degree of belief should be representable by a probability function p
- Empirical knowledge imposes quantitative constraints on p
- Represented in an obNET (learnt from database)
7 obNETs for prediction
- Standard algorithms can be used to calculate the probability of a specific outcome
- A direct link between variables may suggest a causal connection
8 Bayesian networks
- Can BNs be integrated?
- Spanning genetic/molecular and clinical levels
- obNETs offer a principled path to knowledge integration
9 Maximum entropy principle
- From all the probability functions p that satisfy the constraints, adopt the one that is maximally equivocal
- Williamson, J. (2002) Maximising Entropy Efficiently.
- Williamson, J. (2005a) Bayesian Nets and Causality.
- Williamson, J. (2005b) Objective Bayesian nets.
- www.kent.ac.uk/secl/philosophy/jw/
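In symbols, the maximum entropy principle can be sketched as follows. The notation is an assumption made for illustration (E for the set of probability functions satisfying the constraints, Ω for the outcome space) and does not come from the slides.

```latex
p^{*} = \arg\max_{p \in \mathbb{E}} H(p), \qquad
H(p) = -\sum_{\omega \in \Omega} p(\omega) \log p(\omega)
```

"Maximally equivocal" then means the admissible p with the largest entropy H, i.e. the one closest to the uniform distribution.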
10 Example
- Two items of empirical knowledge may conflict
- Study 1: Cancer will recur in 50% of patients with given set of characteristics
- Degree of belief in recurrence in individual patient: 0.5
- Study 2: Frequency of recurrence is 30%
- Degree of belief will be constrained to closed interval [0.3, 0.5]
- In general
- Belief function will lie within a closed, convex set of probability functions
- There will be a unique function that maximises entropy
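The recurrence example can be checked numerically. The following is a minimal sketch, assuming a yes/no outcome (recurrence or no recurrence) so that entropy depends on a single probability; the function names are hypothetical and chosen only for this illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a yes/no outcome with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def max_entropy_in_interval(low: float, high: float) -> float:
    """Return the p in [low, high] that maximises binary entropy.

    Binary entropy is concave with its peak at 0.5, so the maximiser
    is 0.5 clamped to the interval.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("need 0 <= low <= high <= 1")
    return min(max(0.5, low), high)


# Study 1 suggests 0.5, Study 2 suggests 0.3: belief lies in [0.3, 0.5].
belief = max_entropy_in_interval(0.3, 0.5)
print(belief)  # 0.5
```

With the two studies above, the least committal degree of belief in recurrence is 0.5, because it is the point of the interval closest to complete equivocation.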
11 obNET integration
12 obNET integration
Original obNETs provide probability distributions
13 obNET integration
14 obNET integration
15 obNET integration
16 obNET integration
Maximum entropy principle: if CPTs for merged nodes disagree on probabilities, assign a closed interval and take the least committal value in that range
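The merging rule for disagreeing CPTs can be sketched in code. This is a simplified illustration, assuming a binary merged node and treating each CPT entry separately; a full obNET treatment would maximise entropy over the whole joint distribution. The CPT values and the names `merge_binary_cpts` and `least_committal` are hypothetical.

```python
def least_committal(low: float, high: float) -> float:
    """Value in [low, high] closest to 0.5, i.e. with maximal binary entropy."""
    return min(max(0.5, low), high)


def merge_binary_cpts(cpt_a: dict, cpt_b: dict) -> dict:
    """Merge two CPTs for the same binary node.

    Each CPT maps a parent configuration to P(node = 1 | parents).
    Where the two sources disagree, the entry is constrained to the
    closed interval between them and the least committal value is taken.
    """
    if cpt_a.keys() != cpt_b.keys():
        raise ValueError("CPTs must cover the same parent configurations")
    merged = {}
    for config in cpt_a:
        low, high = sorted((cpt_a[config], cpt_b[config]))
        merged[config] = least_committal(low, high)
    return merged


# HYPOTHETICAL numbers, not taken from the SEER or progenetix networks.
cpt_from_net_a = {"parent=0": 0.20, "parent=1": 0.70}
cpt_from_net_b = {"parent=0": 0.40, "parent=1": 0.60}

merged = merge_binary_cpts(cpt_from_net_a, cpt_from_net_b)
print(merged)  # {'parent=0': 0.4, 'parent=1': 0.6}
```

Where the two sources agree, the interval collapses to a point and the entry is kept unchanged.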
17 obNET integration: proof of principle
- Two obNETs from breast cancer knowledge domain
- Genomic: comparative genomic hybridisation (CGH) data from the progenetix database
- Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)
- Clinical: American Surveillance, Epidemiology and End Results (SEER) database
18 Clinical and genomic nets (Hugin 6.6)
SEER database: 4731 cases
progenetix database: 28 bands, 502 cases
?
19 obNET integration
obNET learnt from 2nd progenetix dataset: 119 cases with clinical annotation (lymph node status, tumour size, grade)
CPT
22q12     -1      0       1
LN = 0    0.148   0.5     0.148
LN = 1    0.852   0.5     0.852
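To illustrate how such a CPT feeds into a probability calculation, here is a minimal sketch that reads the table as P(LN | 22q12) and combines it with a prior over the 22q12 states. The prior is HYPOTHETICAL (the slides do not give one), and the reading of the states -1, 0, 1 as distinct copy-number categories is an assumption.

```python
# CPT from the slide, read as P(LN | 22q12): outer key is the 22q12 state,
# inner key is the lymph node status LN.
CPT_LN = {
    -1: {0: 0.148, 1: 0.852},
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.148, 1: 0.852},
}

# HYPOTHETICAL prior over 22q12 states, for illustration only.
PRIOR_22Q12 = {-1: 0.3, 0: 0.5, 1: 0.2}


def marginal_ln(ln: int) -> float:
    """P(LN = ln), obtained by summing out the 22q12 state."""
    return sum(PRIOR_22Q12[s] * CPT_LN[s][ln] for s in PRIOR_22Q12)


def posterior_22q12(ln: int) -> dict:
    """P(22q12 = s | LN = ln) for every state s, via Bayes' rule."""
    evidence = marginal_ln(ln)
    return {s: PRIOR_22Q12[s] * CPT_LN[s][ln] / evidence for s in PRIOR_22Q12}


print(marginal_ln(1))
print(posterior_22q12(1))
```

In a network with more nodes the same sums are carried out by standard inference algorithms rather than by hand-written enumeration.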
20 Additional empirical knowledge
[Figure: chr. 22; Fridlyand et al. 2006]
21 obNET integration
[Figure: chr. 22; Fridlyand et al. 2006; CPT]
22 obNET integration
[Figure: chr. 22; Fridlyand et al. 2006; CPT]
23 Metastasis-associated genes
KREMEN1 MYH9
cadherin11
CD97
BMP7, ELMO2, BCAS1, BCAS4, ZNF217
24 KREMEN1
Howard et al., 2003
Biological knowledge suggests a possible causal link (in the context of the whole obNET: HR status!)
25 Knowledge integration
Multi-scale obNETs
Cancer clinical data epidemiology
Translation of clinical data to genomics research
Predictive markers
Molecular profiling of tumours
26 Acknowledgements
- Jon Williamson (Philosophy, University of Kent)
- www.kent.ac.uk/secl/philosophy/jw/
- Matt Williams (Cancer Research UK)
- Nadjet El-Mehidi (Cancer Systems Biology, UCL)
- Vivek Patkar (Cancer Research UK)
- Contact: s.nagl_at_ucl.ac.uk
27 obNET integration: proof of principle
- Two obNETs
- Non-independent rearrangements at chromosomal locations in breast cancer, from comparative genomic hybridisation (CGH) data in the progenetix database
- Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)
- Probabilistic dependencies between clinical parameters from the American Surveillance, Epidemiology and End Results (SEER) database
28 HR status link
29 Genomic systems
- Genomes are dynamic molecular systems
- Selection acts on unstable cancer genomes as integrated wholes, not just on individual oncogenes or tumour suppressors
- A multitude of ways to solve the problems of achieving a survival advantage in cancer cells
- Irreversible evolutionary processes
- Randomness of mutation
- Modularity and redundancy of complex systems
30 Genome-wide rearrangements
- Can we identify probabilistic dependency networks in large sample sets of genomic data from individual tumours?
- If so, under which conditions may these be interpreted as causal networks?
- Can we identify probabilistic dependency networks involving molecular and clinical levels?
31 Systems Biology and Causation
- Profound conceptual challenge regarding physical causation in complex biological systems
- Mutual dependence of physical causes
- The biological relevance of any factor, and therefore the information it conveys, is jointly determined, frequently in a statistically interactive fashion, by that factor and the system state (Susan Oyama, The Ontogeny of Information, 2000)
- The influence of a gene, or a genetic mutation, depends on the context, such as availability of other molecular agents and the state of the biological system, including the rest of the genome
32 System state
[Figure label: agents]
Cell networks are dynamically instantiated: genes for components are switched on or off in response to signals and cell state
33 System state
Cell networks are reconfigured in response to changes in environment or the cell's internal state
34 System state
Cell computation networks are reconfigured in response to changes in environment or the cell's internal state
35 Cancer: genome instability re-programs cell networks
Selection for increased proliferation, resistance, invasiveness etc., driven by tumour cell-tissue interactions
36 Genome-wide rearrangements
- Can we identify probabilistic dependency networks in large sample sets of genomic data from individual tumours?
- Can we identify probabilistic dependency networks involving molecular and clinical levels?
37 Proof of principle
- Screen the whole genome for chromosomal abnormalities in one experiment
- Cytogenetics
- Comparative genomic hybridization (CGH)
- Fluorescence in situ hybridization (FISH) and multicolour fluorescence in situ hybridization (MFISH)
- Detection of allelic instabilities, loss of heterozygosity (LOH)