Title: DNA chip Data Analysis with Bayesian Networks
1. DNA Chip Data Analysis with Bayesian Networks
- Introduction
- Part 1: Microfabrication and Process Steps
- Part 2: Basics of Bayesian Networks
- Part 3: DNA Data Analysis with BN
- Conclusion
- Cmpe 530 Term Project
- Prepared by Ali Osman Sevim
2. INTRODUCTION: What is a DNA chip?
- A DNA microarray (also called a genome chip or DNA chip) is a collection of microscopic DNA spots attached to a solid surface, forming an array. It is used for expression profiling: monitoring the expression levels of thousands of genes simultaneously.
[Figure: DNA chips (nanobiotechnology) in relation to biology, nanotechnology, and artificial intelligence]
3. Why gene chips (microarrays)?
- Traditional experimental methods yield inefficient and insufficient results.
- Microarrays make it possible to measure and quantitatively compare the expression levels of tens of thousands of genes in cells in a single experiment.
- Large PCR machines carry a high NRE (non-recurring engineering) cost compared to micro DNA analysers.
4. Part 1: Fabrication
- Fabrication via printing
  - DNA sequence stuck to glass substrate
  - DNA solution pre-synthesized in the lab
- Fabrication in situ
  - Sequence built on the chip itself
  - Photolithographic techniques use light to release capping chemicals
  - 365 nm light allows 20 µm resolution
5. DNA Microarrays
- Each probe consists of thousands of strands with identical nucleotide sequences
- The DNA sequences at each probe represent important genes (or parts of genes)
- Printing systems
  - Ex.: HP, Corning Inc.
  - Printing systems can build lengths of DNA up to 60 nucleotides long
  - 1.28 x 1.28 cm glass wafer
  - Each print head has a 100 µm diameter, and the heads are separated by 100 µm (≈ 5,000 to 20,000 probes)
- Photolithographic chips
  - Ex.: Affymetrix
  - 1.28 x 1.28 cm glass/silicon wafer
  - 24 x 24 µm probe site (≈ 500,000 probes)
  - Lengths of DNA up to 25 nucleotides long
  - Requires a new set of masks for each new array type
6. The Biological Process (In Vitro Transcription)
7. Microarray Image Construction
Then, analyze the image on a computer.
8. Part 2: Bayesian Networks
Bayesian networks are based on a statistical approach presented by the mathematician Thomas Bayes in 1763:
- an approach for calculating probabilities
- among several variables
- where the variables are causally related (cause or effect)
- and the relationships cannot easily be derived by experimentation.
Bayes' formula provides the mathematical tool that combines prior knowledge with current data to produce a posterior distribution.
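Written out, Bayes' formula takes the following standard form. The symbols H (a hypothesis or unknown quantity) and D (the observed data) are introduced here for illustration and do not appear on the slide.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Here P(H) plays the role of the prior knowledge, P(D | H) is the likelihood of the current data, and P(H | D) is the posterior.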
9. Representation of Graphical Models
- Graphical models are graphs in which nodes represent random variables.
- A Bayesian network is a kind of directed graphical model.
- It takes into account the directionality of the arcs (the arrows between nodes).
- An arc from A to B can be read as indicating that A "causes" B.
[Diagram: A → B]
10. Types of Inference
- (a) Predictive: a can cause b
- (b) Diagnostic: b is evidence of a
- (c) Intercausal: a and b can both cause c
  - a explains c, so it is evidence against b
  - (explaining away, Berkson's paradox, or "selection bias")
[Diagrams: (a) a → b; (b) a → b; (c) a → c ← b]
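The intercausal case (c) can be checked numerically. The sketch below enumerates the joint distribution of a three-node network a → c ← b and compares the belief in b before and after a is also observed. All probabilities are hypothetical values chosen only for illustration, and the helper names `joint` and `prob` are likewise made up for this sketch.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_A = 0.1          # prior Pr(a = 1)
P_B = 0.1          # prior Pr(b = 1)
P_C_GIVEN = {      # Pr(c = 1 | a, b)
    (0, 0): 0.01,
    (1, 0): 0.90,
    (0, 1): 0.90,
    (1, 1): 0.99,
}

def joint(a, b, c):
    """Pr(a, b, c) for the network a -> c <- b."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C_GIVEN[(a, b)] if c else 1 - P_C_GIVEN[(a, b)]
    return pa * pb * pc

def prob(**fixed):
    """Sum the joint over all assignments consistent with the fixed values."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(a, b, c)
    return total

# Belief in b after seeing the common effect c ...
p_b_given_c = prob(b=1, c=1) / prob(c=1)
# ... and after additionally learning that the other cause a is present.
p_b_given_c_and_a = prob(a=1, b=1, c=1) / prob(a=1, c=1)

print(f"Pr(b=1 | c=1)      = {p_b_given_c:.3f}")
print(f"Pr(b=1 | c=1, a=1) = {p_b_given_c_and_a:.3f}")
```

With these illustrative numbers, learning that a is present lowers the belief in b from about 0.51 to about 0.11, even though a and b are independent a priori. That drop is the explaining-away effect.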
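11. Simple Example
12. Inference
Assume we observe that the grass is wet. There are two possible causes: the sprinkler or rain. Which is more probable?
$$\Pr(S=1 \mid W=1) = \frac{\Pr(S=1, W=1)}{\Pr(W=1)} = \frac{0.2781}{0.6471} \approx 0.43$$
$$\Pr(R=1 \mid W=1) = \frac{\Pr(R=1, W=1)}{\Pr(W=1)} = \frac{0.4581}{0.6471} \approx 0.71$$
Each joint probability in a numerator is obtained by summing the full joint distribution over the unobserved variables. The normalizing constant is $\Pr(W=1) = 0.6471$.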
13. Inference Two
$$\Pr(S=1 \mid W=1) = 0.2781 / 0.6471 \approx 0.4298$$
$$\Pr(R=1 \mid W=1) = 0.4581 / 0.6471 \approx 0.7079$$
It is more likely that the grass is wet because it is raining.
The first kind of inference is bottom-up: reasoning in the Bayesian network from effects to causes. The second is top-down reasoning: using the example above, we can also deduce the probability that the grass is wet given that it is cloudy.
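The numbers above can be reproduced by brute-force enumeration of the joint distribution. The conditional probability tables are not given in the text, so the values below are an assumption: they are the ones commonly used for the cloudy / sprinkler / rain / wet-grass textbook example, and they happen to yield 0.2781, 0.4581, and 0.6471. The helper names are hypothetical.

```python
from itertools import product

# Assumed CPTs (not stated in the text): values commonly used for the
# cloudy (C) / sprinkler (S) / rain (R) / wet grass (W) textbook example.
P_CLOUDY = 0.5                      # Pr(C=1)
P_SPRINKLER = {0: 0.5, 1: 0.1}      # Pr(S=1 | C=c)
P_RAIN = {0: 0.2, 1: 0.8}           # Pr(R=1 | C=c)
P_WET = {(0, 0): 0.0, (1, 0): 0.9,  # Pr(W=1 | S=s, R=r), keyed by (s, r)
         (0, 1): 0.9, (1, 1): 0.99}

def bernoulli(p_one, value):
    """Probability of a binary value given Pr(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(c, s, r, w):
    """Pr(C, S, R, W) factorized along the network C -> S, C -> R, (S, R) -> W."""
    return (bernoulli(P_CLOUDY, c)
            * bernoulli(P_SPRINKLER[c], s)
            * bernoulli(P_RAIN[c], r)
            * bernoulli(P_WET[(s, r)], w))

def prob(**fixed):
    """Marginal probability of the fixed values, summing out everything else."""
    total = 0.0
    for c, s, r, w in product((0, 1), repeat=4):
        values = {"c": c, "s": s, "r": r, "w": w}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(c, s, r, w)
    return total

p_w = prob(w=1)                           # normalizing constant Pr(W=1)
p_s_given_w = prob(s=1, w=1) / p_w        # bottom-up: effect to cause
p_r_given_w = prob(r=1, w=1) / p_w
p_w_given_c = prob(w=1, c=1) / prob(c=1)  # top-down: cause to effect

print(f"Pr(W=1)       = {p_w:.4f}")
print(f"Pr(S=1 | W=1) = {p_s_given_w:.4f}")
print(f"Pr(R=1 | W=1) = {p_r_given_w:.4f}")
print(f"Pr(W=1 | C=1) = {p_w_given_c:.4f}")
```

Enumeration is exponential in the number of variables, so it only suits small illustrative networks like this one.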
14. The Biological Process (In Vitro Transcription)
15. Data Mining
16. Part 3: Data Analysis
[Figure: expression matrix A, indexed by experiments (i) and genes (j), as input to a Bayesian network learning algorithm]
- A_ij: the mRNA level of gene j in experiment i
- Goal
  - Learn regulatory networks
  - Identify causal sources of the biological problem of interest
17. A BN is a representation of a joint probability distribution
- It consists of two parts:
  - a directed acyclic graph, or DAG (the structure of the network)
  - a set of parameters for the DAG (the statistical hypothesis)
- The DAG represents the causal relations among a set of random variables (gene expression levels).
- X is a direct cause of Y if and only if there is a directed edge from X to Y.
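The joint distribution that a BN represents factorizes along the DAG. In standard notation, with Pa(X_i) denoting the parents of X_i in the graph (a symbol introduced here for illustration):

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Each factor is one conditional distribution, so the parameters of the network are the parameters of these per-variable conditionals.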
18. Bayesian Networks
- A Bayesian network has two components:
  - G: a directed acyclic graph structure
  - Θ: a set of parameters for the conditional distribution of each variable
- Given a training set $D = \{x_1, \dots, x_N\}$ of independent instances of X, the task is to find a network $B = \langle G, \Theta \rangle$ that best matches D.
- The score function for a network is defined in terms of:
  - C, a constant independent of G
  - the marginal likelihood, which averages the probability of the data over all possible parameter assignments to G.
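The formula itself is not reproduced in the text. A common form of the Bayesian score that matches the description (a constant C independent of G, plus a marginal likelihood term) is given below as an assumption, not as the slide's exact equation:

```latex
\mathrm{Score}(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C

P(D \mid G) = \int P(D \mid G, \Theta)\, P(\Theta \mid G)\, d\Theta
```

In this form, P(G) is a prior over structures, and the integral is the marginal likelihood: the probability of the data averaged over all parameter assignments Θ for G.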
19. Bayesian Networks (2)
Obtaining the score
- The model with the highest log-likelihood is the model that best predicts the data D.
- The score can be built from local criteria: $\sum_i S_{\text{local}}(X_i, \mathrm{Pa}_G(X_i), D)$, where $\mathrm{Pa}_G(X_i)$ is the set of parents of $X_i$ in G.
- Assign a score to each DAG based on the sample data, search for the highest-scoring one, and obtain a particular DAG. An example is shown on the next slide.
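The score-and-search idea can be sketched on a toy problem. The code below scores every DAG over three binary variables with a decomposable local score and keeps the best one. It swaps in a BIC-style score (log-likelihood minus a complexity penalty) as a simple stand-in for the marginal likelihood. The data set is synthetic and all function names are hypothetical. Real expression data has far too many genes for exhaustive search, so this is only an illustration of the principle.

```python
import math
from itertools import combinations, product

def local_score(data, child, parents):
    """BIC-style local score of one binary variable given its parent set."""
    counts = {}
    for row in data:
        config = tuple(row[p] for p in parents)
        counts.setdefault(config, [0, 0])[row[child]] += 1
    log_lik = 0.0
    for zeros, ones in counts.values():
        for k in (zeros, ones):
            if k:
                log_lik += k * math.log(k / (zeros + ones))
    # One free parameter per parent configuration for a binary child.
    penalty = 0.5 * (2 ** len(parents)) * math.log(len(data))
    return log_lik - penalty

def score(data, graph):
    """Decomposable score: sum of local scores over all variables."""
    return sum(local_score(data, v, pa) for v, pa in graph.items())

def is_acyclic(graph):
    """Repeatedly peel off nodes that have no parents among the remaining nodes."""
    remaining = set(graph)
    while remaining:
        roots = {v for v in remaining if not set(graph[v]) & remaining}
        if not roots:
            return False
        remaining -= roots
    return True

def best_dag(data, n_vars):
    """Exhaustively score every DAG (node -> tuple of parents); return the best."""
    options = []
    for v in range(n_vars):
        others = [u for u in range(n_vars) if u != v]
        options.append([pa for k in range(len(others) + 1)
                        for pa in combinations(others, k)])
    best = None
    for choice in product(*options):
        graph = dict(enumerate(choice))
        if is_acyclic(graph):
            s = score(data, graph)
            if best is None or s > best[0]:
                best = (s, graph)
    return best

# Synthetic toy data: X1 mostly copies X0, X2 is unrelated to both.
DATA = []
for i in range(40):
    x0 = i % 2
    x1 = 1 - x0 if i % 10 == 0 else x0
    x2 = (i // 2) % 2
    DATA.append((x0, x1, x2))

best_score, best_graph = best_dag(DATA, 3)
print(best_score, best_graph)
```

On this toy data the search should link X0 and X1 and leave X2 isolated. The direction of the X0/X1 edge is not identifiable from the score, since both orientations fit the data equally well, which is one reason causal readings of learned networks need care.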
20. Example: Constructed Bayesian Network (G)
21. Conclusion
- Bayesian-network-based data analysis yields better estimates than fuzzy networks.
- Good probabilistic relation between DNA base pairs and the targeted disease.
- High success rate for serious diseases such as cancer.
- However:
  - The data is very noisy. mRNA expression data alone gives only a partial picture that does not reflect key events (translation and protein (in)activation).
  - The number of samples does not provide enough information to build a fully detailed model.
22. Future Research
- Research into more accurate data gathering
- Anthropological research on fossils
- Similar nanobiotechnological chips can be designed for the diagnosis and treatment of illnesses
- Other AI methods can be applied to gene-chip data, in addition to the fuzzy and Bayesian network approaches