Title: Using Bayesian Networks to Analyze Expression Data
1 Using Bayesian Networks to Analyze Expression Data
Nir Friedman, Iftach Nachman, Dana Pe'er
Institute of Computer Science, Hebrew University
2 Outline
- Biological Background
- Bayesian Networks
- Analyzing Expression data
- Technical Aspects
- Experiment
- Conclusion and Future Work
3 Part I: Biological Background
- DNA is a double-stranded molecule
- Hereditary information is encoded
- Complementation rules
- A gene is a segment of DNA
- Contains the information required to make a protein
4 Part I: Biological Background
- Gene expression refers to the processes involved
in converting genetic information from a DNA
sequence into an amino acid sequence, or protein.
- The processes: [figure omitted; label: gene]
5 Part I: Biological Background
- Transcription (the focus of this research):
the transfer of the genetic information from DNA
to messenger RNA (mRNA), a complementary copy of
the gene.
6 Part I: Biological Background
- Each gene encodes a protein and proteins are the
functional units of life
- Every gene is present in every cell, but only a
fraction of the genes are expressed at any time
- Many diseases result from the interaction
between genes
- Understanding the mechanisms that determine
which genes are expressed, and when they are
expressed, is the key to the development of new
treatments of diseases
7 Part I: Biological Background
- Why are some genes expressed while others are not,
or expressed at different levels?
- gene expressions are not independent
- interactions between genes exist
e.g. the expression of gene A promotes the
expression of gene B
e.g. the expression of gene C inhibits the
expression of gene D
- interactions can be complicated
8 Part I: Biological Background
- Traditionally: experimental means
- Inefficient and insufficient
- Newly developed technique: DNA microarray
- Makes it possible to measure and quantitatively
compare the expression levels of tens of thousands
of genes in cells in a single experiment.
9 Part II: Bayesian Networks
- Prior work: clustering of expression data
- Groups together genes with similar expression
patterns
- Disadvantage: does not reveal structural
relations between genes
- Extract meaningful information from the
expression data
- Discover interactions between genes based on the
measurements
10 Part II: Bayesian Networks
11 Part II: Bayesian Networks
- A Bayesian Network (BN) is a graphical
representation of a probability distribution
- Compact intuitive representation
- Useful for describing processes composed of
locally interacting components
- Have a good statistical foundation
- Efficient model learning algorithm
- Capture causal relationships
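As a minimal sketch of what "a graphical representation of a probability distribution" means, the toy network below has a single edge A -> B over two binary genes, so the joint distribution factorizes as P(A, B) = P(A) P(B | A). The variable names and all probabilities are made-up illustrative values, not numbers from the expression data.

```python
# Toy Bayesian network A -> B over two binary variables (1 = expressed).
# All probabilities are hypothetical, chosen only for illustration.
p_a = {0: 0.7, 1: 0.3}                  # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},     # P(B | A=0)
               1: {0: 0.2, 1: 0.8}}     # P(B | A=1)


def joint(a, b):
    """P(A=a, B=b) from the network factorization P(A) * P(B | A)."""
    return p_a[a] * p_b_given_a[a][b]


def marginal_b(b):
    """P(B=b), obtained by summing the joint over the parent A."""
    return sum(joint(a, b) for a in (0, 1))


def posterior_a(a, b):
    """P(A=a | B=b) by Bayes' rule."""
    return joint(a, b) / marginal_b(b)
```

Only the local tables P(A) and P(B | A) are stored; every other quantity is derived from them, which is what makes the representation compact when each variable has few parents.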
12 Part II: Bayesian Networks
- Why is it suitable for this problem?
- Gene expression is an inherently stochastic
phenomenon
- To capture the nature of interactions between
genes, especially causal connections
- Microarray techniques are associated with
missing and noisy data values
13 Part III: Analyzing Expression Data
- Practical problem: small data sets
- Variables: hundreds or thousands of genes
- Samples: just tens of microarray experiments
- On the positive side, genetic regulation networks
are sparse!!!
- Characterize and learn features that are common
to most of these networks
14 Part III: Analyzing Expression Data
- The first feature: Markov relations
- Symmetric relation: Y is in X's Markov blanket
iff there is either an edge between them, or
both are parents of another variable (Pearl 98).
- Biological interpretation: a Markov relation
indicates that the two genes are related in some
joint biological interaction or process
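The Markov relation defined above can be checked directly on a network structure. The sketch below assumes the DAG is given as a dict mapping each node to the set of its parents; the function name and the example graph are hypothetical.

```python
def markov_related(parents, x, y):
    """True if x and y share an edge or are both parents of a common child.

    `parents` maps each node of a DAG to the set of its parents.
    """
    if x == y:
        return False
    # Direct edge in either direction.
    if y in parents.get(x, set()) or x in parents.get(y, set()):
        return True
    # Common child: some node has both x and y among its parents.
    return any(x in pa and y in pa for pa in parents.values())


# Hypothetical DAG: A -> C <- B, C -> D
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
```

Here A and B are Markov-related through their common child C even though no edge joins them, while A and D are not.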
15 Part III: Analyzing Expression Data
- The second feature: order relations
- Global property: A is an ancestor of B in all
the equivalent Bayesian networks learned
- Biological interpretation: an order relation
indicates that the transcription of one gene may be
a cause (direct or indirect) of the transcription
of another gene
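A minimal sketch of checking an order relation, assuming each network is a dict mapping a node to the set of its parents and that the collection of networks to test has already been produced (enumerating an equivalence class is not shown). The function names and the two example graphs are illustrative assumptions; the graphs are not claimed to form an equivalence class.

```python
def is_ancestor(parents, a, b):
    """True if there is a directed path from a to b in the DAG."""
    stack, seen = list(parents.get(b, ())), set()
    while stack:
        node = stack.pop()
        if node == a:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return False


def order_relation(networks, a, b):
    """True if a is an ancestor of b in every network of the collection."""
    return all(is_ancestor(g, a, b) for g in networks)


# Two hypothetical networks, used only to exercise the check.
g1 = {"A": set(), "B": {"A"}, "C": {"B"}}      # A -> B -> C
g2 = {"A": set(), "B": set(), "C": {"A", "B"}}  # A -> C <- B
```

A is an ancestor of C in both graphs, so the relation holds for (A, C); it fails for (A, B) because g2 has no directed path from A to B.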
16 Part IV: Technical Aspects
- Learning algorithm: induce network structure
- Sparse Candidate Algorithm
- Feature estimation: extract useful features
- A bootstrap approach
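17 Part IV: Technical Aspects
- Sparse Candidate Algorithm
- A heuristic, iterative approach
- At each iteration $n$, identify a relatively small set
of candidate parents $C_i^n$ for each variable (gene)
$X_i$, based on simple local statistics
- Restrict the parents of $X_i$ to the candidates:
$\mathrm{Pa}^{G_n}(X_i) \subseteq C_i^n$
- Evaluate a candidate $X_j$ by comparing
$\mathrm{Score}(X_i, \mathrm{Pa}^{G_{n-1}}(X_i) \cup \{X_j\} : D)$ with
$\mathrm{Score}(X_i, \mathrm{Pa}^{G_{n-1}}(X_i) : D)$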
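The candidate-selection step can be sketched as follows. This illustration assumes empirical mutual information on discretized values as the "simple local statistic" and a fixed candidate-set size k. Both are assumptions for the example, and the score-based comparison against the previous network's parents is not implemented here. All function names are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum((c / n) * log(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())


def candidate_parents(data, k):
    """For each variable, keep the k other variables with the highest MI.

    `data` maps a variable name to its list of discretized values.
    """
    result = {}
    for xi, col in data.items():
        ranked = sorted((xj for xj in data if xj != xi),
                        key=lambda xj: mutual_information(col, data[xj]),
                        reverse=True)
        result[xi] = ranked[:k]
    return result


# Made-up discretized expression values for three genes.
data = {
    "g1": [0, 0, 1, 1, 0, 1, 0, 1],
    "g2": [0, 0, 1, 1, 0, 1, 0, 1],   # identical to g1
    "g3": [0, 1, 0, 1, 0, 1, 1, 0],   # empirically independent of g1
}
candidates = candidate_parents(data, k=1)
```

Restricting each gene to k candidates shrinks the structure search space, which is the point of the algorithm when there are hundreds of variables.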
18 Part IV: Technical Aspects
- Generate perturbed versions of the original data
set, and learn from them
- For $i = 1, \dots, m$:
Resample with replacement $N$ instances from $D$
(call the result $D_i$)
Learn on $D_i$ to induce a network structure $G_i$
- For each feature $f$ of interest, calculate
$\mathrm{conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$, where $f(G_i) = 1$
if $f$ is a feature in $G_i$ and $0$ otherwise
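A minimal sketch of the bootstrap confidence estimate, with the structure learner passed in as a function. The `toy_learner` below is a hypothetical stand-in (it links two binary variables that agree in at least 80% of the rows) and is not the learning algorithm from the slides. The data rows and feature names are likewise made up.

```python
import random


def bootstrap_confidence(data, learn, features, m=200, seed=0):
    """Estimate conf(f) = (1/m) * sum_i f(G_i) over m bootstrap replicates.

    data     : list of instances (the data set D)
    learn    : function mapping a data set to a network structure G
    features : dict name -> indicator function f(G) returning True/False
    """
    rng = random.Random(seed)
    n = len(data)
    counts = {name: 0 for name in features}
    for _ in range(m):
        # D_i: N instances drawn from D with replacement.
        resampled = [data[rng.randrange(n)] for _ in range(n)]
        g_i = learn(resampled)
        for name, f in features.items():
            counts[name] += 1 if f(g_i) else 0
    return {name: c / m for name, c in counts.items()}


def toy_learner(rows):
    """Hypothetical placeholder learner: returns a set of undirected links."""
    names = sorted(rows[0])
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            agree = sum(r[a] == r[b] for r in rows) / len(rows)
            if agree >= 0.8:
                edges.add((a, b))
    return edges


# Made-up data: B always equals A, C is always the opposite of A.
rows = [{"A": v, "B": v, "C": 1 - v} for v in (0, 1)] * 10
conf = bootstrap_confidence(
    rows, toy_learner,
    {"A-B": lambda g: ("A", "B") in g, "A-C": lambda g: ("A", "C") in g},
    m=50)
```

Features that appear in most replicates receive confidence near 1, and features that depend on a few particular samples receive low confidence.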
19 Part V: Experiment
- Induce Bayesian Networks for 250 yeast genes from
76 Microarray measurements
- Analyze features in the networks
20 Part V: Experiment
- The map on the left is an example of Markov relation
features for gene SVS1.
- The width of edges corresponds to the confidence.
21 Part V: Experiment
- List of top Markov relations
22 Part VI: Conclusions
- Biological motivation
- The development of microarray technology calls for
methodologies that are both statistically sound
and computationally tractable for analyzing data
sets and inferring biological interactions from
them
- Advantages of Bayesian network models
- Can describe locally interacting components
- Can reveal the structure of the transcription
regulation process
- Provide clear methodologies for learning from data
- Can deal with incomplete data sets
23 Part VI: Future Work
- Incorporate biological knowledge as a prior
- Model condition attributes in the network,
such as temporal indicators, background variables,
and exogenous cellular conditions
- Learn from continuous data
- Combine Bayesian methods with clustering
algorithms to learn models over clustered genes
24 VII: Some Useful References
- A Brief Introduction to Graphical Models and
Bayesian Networks: http://www.ai.mit.edu/murphyk/Bayes/Bayes.html
- DNA Microarray: http://www.gene-chips.com/
- Project description: http://genome-www.stanford.edu/cellcycle/
25 The End
26 Complementation rules
A (adenine), C (cytosine), T (thymine), G (guanine)
A pairs with T; C pairs with G
[Figure: paired bases on the two DNA strands]
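The complementation rule (A with T, C with G) can be written as a small lookup. This is an illustrative sketch; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand):
    """Complementary strand written in its own reading direction."""
    return complement(strand)[::-1]
```

For example, `complement("ATCG")` gives `"TAGC"`, and `reverse_complement("ATCG")` gives `"CGAT"`.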
27 DNA Microarray (informal, intuitive)
[Figure labels: gene; experimental; controlled; referential cDNA]
28 DNA Microarray (cont.)
- Each slot corresponds to one gene
- All genes to be studied are present