DNA chip Data Analysis with Bayesian Networks - PowerPoint PPT Presentation

Title:

DNA chip Data Analysis with Bayesian Networks


Slides: 24
Provided by: alio9

Transcript and Presenter's Notes

1
DNA chip Data Analysis with Bayesian Networks
  • Introduction
  • Part 1: Microfabrication and Process Steps
  • Part 2: Basics of Bayesian Networks
  • Part 3: DNA data analysis with BN
  • Conclusion
  • Cmpe 530 Term Project
  • Prepared by
  • Ali Osman Sevim

2
INTRODUCTION What is a DNA chip?
  • A DNA microarray (also called a genome chip or
    DNA chip) is a collection of microscopic DNA
    spots attached to a solid surface, forming an
    array used for expression profiling: monitoring
    the expression levels of thousands of genes
    simultaneously.

[Diagram: Biology, Nanotechnology, and Artificial Intelligence meet in DNA chips (nanobiotechnology)]
3
WHY Gene-chips (micro-arrays)?
  • Traditional experimental methods give inefficient
    and insufficient results
  • Microarrays make it possible to measure and
    compare quantitatively the expression levels of
    tens of thousands of genes in cells in a single
    experiment.
  • Large PCR machines have a high NRE (non-recurring
    engineering) cost compared to micro DNA analysers.

4
Part 1 Fabrication
  • Fabrication via Printing
  • DNA sequence stuck to glass substrate
  • DNA solution pre-synthesized in the lab
  • Fabrication In Situ
  • Sequence built directly on the chip
  • Photolithographic techniques use light to release
    capping chemicals
  • 365 nm light allows 20 µm resolution

5
DNA Microarrays
  • Each probe consists of thousands of identical
    strands of nucleotides
  • The DNA sequences at each probe represent
    important genes (or parts of genes)
  • Printing Systems
  • Ex: HP, Corning Inc.
  • Printing systems can build lengths of DNA up to
    60 nucleotides long
  • 1.28 x 1.28 cm glass wafer
  • Each print head has a 100 µm diameter, and the
    heads are separated by 100 µm (≈ 5,000 to 20,000
    probes)
  • Photolithographic Chips
  • Ex: Affymetrix
  • 1.28 x 1.28 cm glass/silicon wafer
  • 24 x 24 µm probe site (≈ 500,000 probes)
  • Lengths of DNA up to 25 nucleotides long
  • Requires a new set of masks for each new array
    type

6
The Biologic Process
(In-vitro Transcription)
7
Microarray Image Construction
Then, analyze image on computer
8
Part 2 Bayesian Network
Bayesian networks are based on a statistical
approach presented by the mathematician Thomas
Bayes in 1763:
  • an approach for calculating probabilities
  • among several variables
  • where the variables are causally related (cause
    or effect)
  • and the relationships can't easily be derived by
    experimentation.
Bayes' formula provides the mathematical tool that
combines prior knowledge with current data to
produce a posterior distribution.
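For reference, Bayes' formula can be written as below. The symbols are chosen here for illustration and are not taken from the slides: θ stands for the unknown quantity, D for the observed data, P(θ) for the prior, and P(θ | D) for the posterior.

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)},
\qquad
P(D) = \sum_{\theta'} P(D \mid \theta')\, P(\theta')
```

The denominator P(D) only normalizes the result, which is why it appears as a "normalizing" constant in inference calculations.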
9
Representation of Graphical Models
  • Graphical models are graphs in which nodes
    represent random variables.
  • A Bayesian network is a kind of directed
    graphical model:
  • it takes into account the directionality of the
    arcs (arrows between nodes).
  • An arc from A to B can be read as indicating that
    A "causes" B.

[Figure: node A with an arc to node B]
10
Types of inferences
  • (a) Predictive - a can cause b
  • (b) Diagnostic - b is evidence of a
  • (c) Intercausal - a and b can both cause c
  • a explains c, so it is evidence against b
    (explaining away, Berkson's paradox, or
    "selection bias")
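The intercausal case (c) can be checked numerically. The sketch below is a minimal illustration, not something from the slides: it assumes two independent causes a and b with hypothetical prior probabilities of 0.1 each, and an effect c that occurs exactly when a or b occurs. All names and numbers are illustrative.

```python
from itertools import product

# Hypothetical priors for the two independent causes (not from the slides).
P_A = 0.1
P_B = 0.1


def joint(a, b, c):
    """Joint probability of one full assignment; c is a deterministic OR of a and b."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = 1.0 if c == (a or b) else 0.0
    return pa * pb * pc


def prob(query, evidence):
    """P(query | evidence) by brute-force enumeration over all assignments."""
    num = den = 0.0
    for a, b, c in product((0, 1), repeat=3):
        world = {"a": a, "b": b, "c": c}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(a, b, c)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


# Seeing the effect c raises the belief in b above its prior ...
p_b_given_c = prob({"b": 1}, {"c": 1})
# ... but also learning that a occurred "explains away" c, and b drops back.
p_b_given_c_and_a = prob({"b": 1}, {"c": 1, "a": 1})
print(p_b_given_c, p_b_given_c_and_a)
```

Under these assumed numbers, observing c alone makes b much more likely than its prior, while additionally observing a returns b to its prior of 0.1.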



[Figure: graphs over nodes a, b, and c illustrating the inference types]
11
Simple Example
12
Inference
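Assume we observe that the grass is wet. There are
two possible causes: the sprinkler or rain. Which
is more probable?
Pr(S=1 | W=1) = Pr(S=1, W=1) / Pr(W=1) = 0.2781 / Pr(W=1)
Pr(R=1 | W=1) = Pr(R=1, W=1) / Pr(W=1) = 0.4581 / Pr(W=1)
Each joint term is a sum (Σ) over the unobserved
variables. The normalizing constant is
Pr(W=1) = 0.6471.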
13
Inference Two
Pr(S=1 | W=1) = 0.2781 / 0.6471 ≈ 0.4298
Pr(R=1 | W=1) = 0.4581 / 0.6471 ≈ 0.7079
So it is more likely that the grass is wet because
it is raining. This first inference is bottom-up:
it reasons through the Bayes network from effects
to causes. Top-down reasoning is also possible:
using the example above, we can deduce the
probability that the grass is wet given that it is
cloudy.
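Both directions of reasoning can be reproduced by brute-force enumeration. The network figure and its probability tables did not survive in this transcript, so the sketch below assumes the conditional probability tables of the widely used cloudy/sprinkler/rain/wet-grass textbook example. That is an assumption, but those tables do reproduce the 0.2781, 0.4581, and 0.6471 figures quoted above. The function and variable names are illustrative.

```python
from itertools import product

# Assumed CPTs (standard textbook values; the slide's own tables are missing).
P_C1 = 0.5                                  # Pr(Cloudy = 1)
P_S1_GIVEN_C = {0: 0.5, 1: 0.1}             # Pr(Sprinkler = 1 | Cloudy)
P_R1_GIVEN_C = {0: 0.2, 1: 0.8}             # Pr(Rain = 1 | Cloudy)
P_W1_GIVEN_SR = {(0, 0): 0.0, (1, 0): 0.9,  # Pr(Wet = 1 | Sprinkler, Rain)
                 (0, 1): 0.9, (1, 1): 0.99}


def bern(p_one, value):
    """Probability of a binary value when Pr(value = 1) is p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, s, r, w):
    """Joint probability as the product of each node's CPT entry."""
    return (bern(P_C1, c) * bern(P_S1_GIVEN_C[c], s)
            * bern(P_R1_GIVEN_C[c], r) * bern(P_W1_GIVEN_SR[(s, r)], w))


def prob(**assignment):
    """Marginal probability of a partial assignment, e.g. prob(s=1, w=1)."""
    total = 0.0
    for c, s, r, w in product((0, 1), repeat=4):
        world = {"c": c, "s": s, "r": r, "w": w}
        if all(world[k] == v for k, v in assignment.items()):
            total += joint(c, s, r, w)
    return total


# Bottom-up (diagnostic): from the effect "wet grass" back to its causes.
p_sprinkler_given_wet = prob(s=1, w=1) / prob(w=1)   # about 0.4298
p_rain_given_wet = prob(r=1, w=1) / prob(w=1)        # about 0.7079

# Top-down (predictive): from the cause "cloudy" forward to "wet grass".
p_wet_given_cloudy = prob(c=1, w=1) / prob(c=1)

print(p_sprinkler_given_wet, p_rain_given_wet, p_wet_given_cloudy)
```

Enumeration is exponential in the number of variables, so it only suits small networks like this one.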
14
The Biologic Process
(In-vitro Transcription)
15
Data Mining

16
Part 3 Data Analysis
[Figure: data matrix A, indexed by experiments i and genes j, passed to a Bayesian network learning algorithm]
Aij - the mRNA level of gene j in experiment i
  • Goal
  • Learn regulatory networks
  • Identify causal sources of the biological problem
    of interest

17
A BN is a representation of a joint probability
distribution.
  • Consists of two parts
  • Directed Acyclic Graph (structure of the network)
  • Set of parameters for the DAG (statistical
    hypothesis)
  • The DAG represents the causal relations among a
    set of random variables (gene expression levels)
  • X directly causes Y if and only if there is a
    directed edge from X to Y
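The two parts combine through the standard Bayesian network factorization: the joint distribution is the product of one conditional distribution per variable given its parents in the DAG. The notation Pa_G(X_i) for the parents of X_i in graph G follows the later slides; the variable count n is introduced here for illustration.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}_G(X_i)\bigr)
```

Each factor on the right corresponds to one node of the DAG, and its entries are the parameters attached to that node.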

18
Bayesian Networks
  • A Bayesian network has two components.
  • G: a directed acyclic graph structure
  • Θ: a set of parameters for the conditional
    distribution of each variable
  • Given a training set D = {x1, ..., xN} of
    independent instances of X,
  • the task is to find a network B = <G, Θ> that
    best matches D.
  • The score function for a network is defined as,
  • where C is a constant independent of G
  • The marginal likelihood, which averages the
    probability of the data over all possible
    parameter assignments to G.
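The score formula itself did not survive in this transcript. A standard Bayesian score that matches the bullets above (a constant C independent of G, plus a marginal likelihood term) is the log posterior of the structure given the data. Treat the following as a likely reconstruction under that assumption, not as the slide's exact content.

```latex
S(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C,
\qquad
P(D \mid G) = \int P(D \mid G, \Theta)\, P(\Theta \mid G)\, d\Theta
```

Here P(D | G) is the marginal likelihood and P(G) is a prior over structures.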

19
Bayesian Networks (2)
Obtain score
  • The model with the highest log-likelihood is the
    one that best predicts the data D
  • The score can use local criteria: Σi Slocal(Xi,
    PaG(Xi), D)

PaG(Xi) is the set of parents of Xi.
  • Assign a score to each DAG based on the sample
    data, then search for the highest-scoring DAG.
    For example, see the constructed network on the
    next slide.
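The score-and-search idea can be sketched in a few lines for a toy problem. This is an illustration under stated assumptions, not the method used in the presentation: it swaps in a BIC-style local score (maximized log-likelihood minus a complexity penalty) in place of the marginal likelihood, assumes binary variables, and searches exhaustively over variable orderings, which is only feasible for a handful of variables. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, permutations


def local_score(data, child, parents):
    """BIC-style local score for one binary variable given a parent set."""
    n = len(data)
    # Count (parent configuration, child value) pairs and parent configurations.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized log-likelihood of the child's conditional distribution.
    loglik = sum(cnt * math.log(cnt / parent_counts[pa])
                 for (pa, _), cnt in joint.items())
    # One free parameter per parent configuration (binary child assumed).
    n_params = 2 ** len(parents)
    return loglik - 0.5 * math.log(n) * n_params


def best_dag(data, variables):
    """Return (score, {child: parents}) maximizing the summed local scores."""
    best_score, best_graph = -math.inf, None
    # Every DAG is consistent with some ordering, so try all orderings and
    # let each variable pick its best parent set among its predecessors.
    for order in permutations(variables):
        total, graph = 0.0, {}
        for i, child in enumerate(order):
            preds = order[:i]
            candidates = [ps for k in range(len(preds) + 1)
                          for ps in combinations(preds, k)]
            score, parents = max((local_score(data, child, ps), ps)
                                 for ps in candidates)
            total += score
            graph[child] = parents
        if total > best_score:
            best_score, best_graph = total, graph
    return best_score, best_graph


# Toy data: B always copies A, while C is independent of both.
rows = [{"A": a, "B": a, "C": c}
        for a in (0, 1) for c in (0, 1) for _ in range(10)]
score, graph = best_dag(rows, ("A", "B", "C"))
print(score, graph)
```

On this toy data the search links A and B and leaves C unconnected. The direction of the A and B edge is not identifiable from the score alone, since both directions score the same. Real expression data has far more genes than samples, so practical tools use heuristic search (for example greedy hill climbing) instead of enumeration.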

20
Example, Constructed Bayesian Network (G)
21
Conclusion
  • However,
  • The data is very noisy; mRNA expression data
    alone gives only a partial picture that does not
    reflect key events (translation and protein
    (in)activation)
  • The number of samples does not provide enough
    information to build a fully detailed model
  • Bayesian-network-based data analysis yields
    better estimates than fuzzy networks.
  • Good probabilistic relation between DNA base
    pairs and the targeted disease.
  • High success rate for serious diseases such as
    cancer.

22
Future Research
  • Research into more accurate data gathering
  • Anthropological research on fossils
  • Similar nanobiotechnological chips can be designed
    for the diagnosis and treatment of illnesses
  • Other AI methods can be applied to gene-chip
    data, in addition to the fuzzy and Bayesian
    network approaches

23
Resources
  • Introduction to Gene Chips and Microarray
    Expression Data
  • Dr. Travis Doom, Assistant Professor, BIRG
    Lab, Department of Computer Science and
    Engineering, Wright State University
  • http://learn.genetics.utah.edu/units/biotech/microarray/
  • Introduction to MEMS, Lecture Notes, Dr. Senol
    Mutlu, BU
  • Introduction to microarray, Lecture notes, Bin
    Yao, byao_at_med.wayne.edu
  • http://www.affymetrix.com/support/technical/datasheets/100k_datasheet.pdf
  • Biostatistical Methods in Molecular Biology
    (Clifton, N.J.) V. 184, Looney, Stephen W.
  • THANKS FOR YOUR ATTENTION