Title: Computational modeling of microarrays
1Computational modeling of microarrays
- Li Zhang
- Department of Biostatistics and Applied Mathematics
- The University of Texas MD Anderson Cancer Center
- URL: http://odin.mdacc.tmc.edu/zhangli
2Part I
- Introduction to Bioinformatics
3Bio-nanotechnology: miniaturization and automation
Figure (scale bars: 100 mm, 200 nm):
A. Multilayer elastomer microfluidics for cell sorting and single cell gene expression profiling.
B. Nanomechanical sensor for DNA binding.
C. Semiconductor nanowire sensor for protein binding.
(Hood et al., 2004. Science 306.)
4High throughput technologies lead to data
explosion
- High throughput
  - Microarray: 0.1 mg sample on a 2 cm x 2 cm chip with 10^6 probe features, giving 10^6 measurements.
- Data explosion
  - DNA: sequence, mutation, copy number, methylation, DNA-protein binding.
  - RNA: dynamic abundance.
  - Protein: dynamic abundance, chemical modification, protein-protein interaction.
5New opportunities
- Global view: characterize cellular life on a systemic level.
- Systems biology: integration of high throughput data to build and characterize gene networks.
- Biomarkers: diagnosis of diseases, identification of risk factors for prevention, and treatment response markers for personalized medicine.
6Network model of galactose utilization in yeast
Hood et al., Science. Vol 306. p640. (2004)
7Bioinformatics at MD Anderson Cancer Center
- An award winning team
  - Microarray: CAMDA 2002, 2003, 2004.
  - Proteomics: PAMDA 2003.
  - MDACC Faculty Scholar Award 2002, 2003, 2004.
  - Mitchell Prize 2003.
- Web site: http://bioinformatics.mdanderson.org
- Graduate School (GSBS): http://gsbs.gs.uth.tmc.edu/
8Challenges
- Data quality: noise and technical bias.
- Complex data structure: heterogeneity.
- Biomarkers: multiple testing problem.
- Network: curse of dimensionality.
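To make the multiple testing problem concrete, here is a minimal sketch. It assumes 10^6 independent tests (an illustrative figure, one test per probe feature) at a per-test significance level of 0.05, with no real effect anywhere. The function names are hypothetical, and the Bonferroni correction is used only as the simplest illustration; the slides do not name a correction method.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_tests = 10**6  # assumed number of tests, for illustration only
alpha = 0.05     # conventional per-test significance level

print(expected_false_positives(n_tests, alpha))
print(bonferroni_threshold(n_tests, alpha))
```

Under these assumptions, tens of thousands of features would pass an uncorrected threshold by chance alone, which is why a candidate biomarker list needs some form of correction.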
9Part II
10Microarray Platforms
- Spotted arrays
  - Inserts from cDNA libraries, PCR products, or oligonucleotides
  - Probed with labeled RNA or cDNA from 2 samples
- Affymetrix GeneChip arrays
  - 25mer oligonucleotides synthesized on a glass wafer
  - Probed with labeled RNA or cDNA from a single sample
11Protocol of a microarray experiment
12Affymetrix GeneChip Probe Arrays
Figure labels: GeneChip Probe Array (1.28 cm); Hybridized Probe Cell (24 µm); Oligonucleotide probe; Single stranded, fluorescently labeled DNA target; Image of Hybridized Probe Array.
- Each probe cell or feature contains millions of copies of a specific oligonucleotide probe.
- Over 250,000 different probes complementary to genetic information of interest.
13Double helix on microarrays
- The probe is a 25-mer DNA oligo
ATCAGCATACGAGAGAATGATGGAT
ATCAGCATACGACAGAATGATGGAT
AAUAGUCGUAUGCUCUCUUACUACCUAGC
cRNA fragment from solution
Average distance between probes is 80 Å
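The relationship between the three sequences on this slide can be checked with a short sketch. It assumes that the first 25-mer is the perfect match (PM) probe and the second is the mismatch (MM) probe (the slide does not label them), and it pairs the strands position by position as they are written on the slide. The helper names are hypothetical.

```python
PM = "ATCAGCATACGAGAGAATGATGGAT"          # assumed perfect match probe
MM = "ATCAGCATACGACAGAATGATGGAT"          # assumed mismatch probe
TARGET = "AAUAGUCGUAUGCUCUCUUACUACCUAGC"  # cRNA fragment from the slide

# Watson-Crick partner (RNA base) for each DNA base on the probe.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def paired_rna(probe: str) -> str:
    """RNA strand that pairs with the probe, written in the slide's orientation."""
    return "".join(DNA_TO_RNA_PARTNER[base] for base in probe)


print(len(PM), mismatch_positions(PM, MM))  # probe length and differing positions
print(TARGET.find(paired_rna(PM)))          # offset of the PM pairing region in the target
print(TARGET.find(paired_rna(MM)))          # -1 means no exact pairing region
```

The two probes differ only at the middle base, and only the first one pairs exactly with a stretch of the cRNA fragment.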
14Technical factors affecting gene expression measurements
- Interaction between base pairs (stacking)
- Interaction with microarray surface
- Interaction with unintended targets (cross hybridization)
- Kinetic process (equilibration, washing)
- Physical properties of RNA sample
  - Degradation (missing 5' ends)
  - Alternative splicing (missing exons)
  - Secondary structure (RNA hairpins, loops)
  - Biotinylation
15Technical factors affecting gene expression measurements
- Interaction between base pairs (stacking)
  - Nearest-neighbor model
- Interaction with microarray surface
  - Positional dependent weights for stacking energies
- Interaction with unintended targets (cross hybridization)
  - PDNN mean field theory
- Kinetic process (equilibration, washing)
  - Langmuir and Sips model
- Physical properties of RNA sample
  - Degradation (missing 5' ends)
  - Alternative splicing (missing exons)
  - Secondary structure (RNA hairpins, loops)
  - Biotinylation
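For the kinetic factor, the Langmuir and Sips isotherms named above can be sketched in their textbook forms. Here `conc` is the target concentration, `K` a binding constant and `a` the Sips exponent. All values and function names are illustrative assumptions; the slides do not give parameters or say how the models were applied.

```python
def langmuir(conc: float, K: float) -> float:
    """Fraction of occupied probe sites at equilibrium: K*c / (1 + K*c)."""
    x = K * conc
    return x / (1.0 + x)


def sips(conc: float, K: float, a: float) -> float:
    """Sips isotherm: (K*c)**a / (1 + (K*c)**a); equals Langmuir when a = 1."""
    x = (K * conc) ** a
    return x / (1.0 + x)


# Illustrative values only: occupancy rises with concentration and saturates below 1.
for conc in (0.1, 1.0, 10.0, 100.0):
    print(conc, round(langmuir(conc, K=1.0), 3), round(sips(conc, K=1.0, a=0.7), 3))
```

Both curves saturate at high concentration. An exponent below 1 flattens the Sips curve relative to Langmuir, which is one common way to represent heterogeneous binding sites.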
16Assumption: two types of binding
- Gene-specific binding: 25 n.t. exact complementary sequences (binding with the intended target).
- Non-specific binding: many (>5) mismatches or short stretches (binding with unintended targets).
17Positional Dependent Nearest-Neighbor (PDNN) model of molecular interactions
Weighted sum of base-pair stacking energies
Gene-specific binding energy
Non-specific binding energy
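A weighted sum of base-pair stacking energies can be sketched as follows. Every number here is a placeholder: the stacking energies are set to a uniform value, and the weights are given an arbitrary shape that is lower at the probe ends. The function and variable names are hypothetical. In this sketch the gene-specific and non-specific energies would use the same function with two different parameter sets.

```python
from itertools import product


def weighted_stacking_energy(probe, weights, pair_energy):
    """Sum of position-weighted stacking energies over adjacent base pairs."""
    pairs = [probe[k:k + 2] for k in range(len(probe) - 1)]
    if len(weights) != len(pairs):
        raise ValueError("need one weight per adjacent base pair")
    return sum(w * pair_energy[p] for w, p in zip(weights, pairs))


# Hypothetical parameters for illustration only (not fitted values).
pair_energy = {a + b: -1.0 for a, b in product("ACGT", repeat=2)}
n_pairs = 24  # a 25-mer has 24 adjacent base pairs
center = (n_pairs - 1) / 2
# Placeholder weight profile: 0.5 at the probe ends, close to 1 in the middle.
weights = [1.0 - 0.5 * ((k - center) / center) ** 2 for k in range(n_pairs)]

probe = "ATCAGCATACGAGAGAATGATGGAT"
energy = weighted_stacking_energy(probe, weights, pair_energy)
print(round(energy, 3))
```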
18PDNN model of probe signals
Probe Signal
Fitness
- N, B are the same on a microarray
- Nj is the same in a probe set.
Constraints
- Energy parameters
- B, N, Nj
Minimization of T
Software available at http://odin.mdacc.tmc.edu/zhangli/PerfectMatch
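The equations on this slide did not survive as text. The sketch below shows how a model with these parameters could be evaluated, under two stated assumptions: (1) the probe signal is the sum of a gene-specific term N_j / (1 + exp(E)), a non-specific term N / (1 + exp(E*)) and a background B, and (2) the fitness is the mean squared difference between the logarithms of observed and predicted signals. Function names and numbers are hypothetical, and this is not the PerfectMatch implementation.

```python
import math


def probe_signal(e_specific, e_nonspecific, n_j, n, b):
    """Assumed signal model: gene-specific term + non-specific term + background."""
    specific = n_j / (1.0 + math.exp(e_specific))
    nonspecific = n / (1.0 + math.exp(e_nonspecific))
    return specific + nonspecific + b


def fitness(observed, predicted):
    """Assumed fitness: mean squared difference of log signals."""
    diffs = [(math.log(o) - math.log(p)) ** 2 for o, p in zip(observed, predicted)]
    return sum(diffs) / len(diffs)


# Illustrative values: N and B shared across the array, N_j shared within a probe set.
predicted = [probe_signal(e, e_ns, n_j=100.0, n=50.0, b=10.0)
             for e, e_ns in [(0.0, 0.0), (-1.0, 0.5), (1.0, -0.5)]]
observed = [90.0, 100.0, 70.0]
print([round(p, 2) for p in predicted], round(fitness(observed, predicted), 4))
```

In a fit, the energy parameters and B, N, N_j would be adjusted to minimize the fitness value, subject to the constraints listed on the slide.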
19Fitting PDNN model
Figure: ln(signal) plotted against probe index.
20Energy parameters in PDNN model
Weight factors
Stacking energy terms
21Baseline of non-specific binding
Non-specific binding energy
22Effects of Mismatches
- A mismatch disrupts double helix formation.
- Energetically, it is unfavorable for binding.
- The size of the effect depends on the surrounding DNA sequence.
23Effect of mismatch at base 13 depends on the nearest neighbors
Figure labels: C, T, A, A, G
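One way to see why the neighbors matter: in a nearest-neighbor description, a base takes part in two stacking pairs, one with each adjacent base, so changing base 13 changes both of them. The sketch below lists those pairs for the 25-mer probe shown earlier in the deck. The function name is hypothetical.

```python
def affected_stacking_pairs(probe: str, position: int) -> dict:
    """Adjacent base pairs (dinucleotides) that include the 1-based position."""
    i = position - 1
    pairs = {}
    if i > 0:
        pairs[(position - 1, position)] = probe[i - 1:i + 1]
    if i < len(probe) - 1:
        pairs[(position, position + 1)] = probe[i:i + 2]
    return pairs


probe = "ATCAGCATACGAGAGAATGATGGAT"  # 25-mer probe from the double helix slide
print(affected_stacking_pairs(probe, 13))
```

A base at either end of the probe takes part in only one stacking pair, so the function returns a single entry there.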
24Sequence dependence of free energy cost of single
mismatch in DNA duplexes
25Pattern of cross hybridization: MM and PM probes bind to different molecules
Figure axes: Var(ln MM) and Var(ln PM).
Data source: Affymetrix HG-U133 spike-in data set. Large variation indicates response to spike-ins. Number of arrays: 42. Number of probes on an array: 0.5 million.
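The quantity on the axes, the variance of the log signal of each probe across arrays, can be computed as in the sketch below. The data are a tiny synthetic example (3 arrays, 2 probes), not the spike-in data set, and the function name is hypothetical.

```python
import math
import statistics


def log_variance_per_probe(signals_by_array):
    """Sample variance of ln(signal) across arrays, one value per probe.

    signals_by_array: list of arrays, each a list of probe signals in the same order.
    """
    n_probes = len(signals_by_array[0])
    return [
        statistics.variance([math.log(array[p]) for array in signals_by_array])
        for p in range(n_probes)
    ]


# Synthetic signals: probe 0 is flat across arrays, probe 1 changes strongly.
pm_signals = [
    [100.0, 50.0],
    [100.0, 200.0],
    [100.0, 800.0],
]
variances = log_variance_per_probe(pm_signals)
print([round(v, 3) for v in variances])
```

A probe whose signal follows a spiked-in transcript shows a large variance, while a probe that does not respond stays near zero. Computing the same quantity for the MM and PM probe of each pair gives the two values compared on the slide.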
26Microarray surface effects
- DNA and RNA are negatively charged.
- The glass surface is also charged.
- Repulsion
27Pattern of cross hybridization: bias towards the 5' end
28Sense and antisense
- Upon binding, sense and antisense probes form the same double helix structure.
- The same interactions should lead to the same binding energy.
- The observed data contradict this prediction.
29Contrast of sense and antisense probe signals
Model fitted:
- Y = -0.17 + 0.05 Nt + 0.05 Na + 0.02 Ng
- Y = ln(sense probe signal / antisense probe signal)
- R^2 = 0.67. Sample size = 875.
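A fit of this kind, a linear model of the log signal ratio on nucleotide counts, can be sketched with ordinary least squares. Reading Nt, Na and Ng as the numbers of T, A and G bases in a probe is an assumption. The data are synthetic, and the coefficients used to generate them are hypothetical, not the values reported on the slide.

```python
import numpy as np


def fit_nucleotide_model(counts, log_ratio):
    """Least-squares fit of log_ratio = b0 + bt*Nt + ba*Na + bg*Ng; returns (coef, R^2)."""
    X = np.column_stack([np.ones(len(counts)), counts])
    coef, *_ = np.linalg.lstsq(X, log_ratio, rcond=None)
    predicted = X @ coef
    ss_res = np.sum((log_ratio - predicted) ** 2)
    ss_tot = np.sum((log_ratio - log_ratio.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


# Synthetic data: 200 probes with made-up nucleotide counts and made-up coefficients.
rng = np.random.default_rng(0)
counts = rng.integers(0, 13, size=(200, 3))        # columns: Nt, Na, Ng (assumed meaning)
true_coef = np.array([0.1, 0.2, -0.3, 0.4])        # hypothetical intercept and slopes
log_ratio = true_coef[0] + counts @ true_coef[1:] + rng.normal(0.0, 0.01, size=200)

coef, r_squared = fit_nucleotide_model(counts, log_ratio)
print(np.round(coef, 2), round(float(r_squared), 3))
```

With real probe pairs, `log_ratio` would hold ln(sense signal / antisense signal) for each pair and `counts` the nucleotide counts of the corresponding probe.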
30Summary
- Binding on array surface: probe binding free energy can be approximated by a weighted sum of base-pair stacking energies, with the probe ends contributing less.
- Mismatches: mismatches disrupt hybridization, especially in cross hybridization. The effects of mismatches depend on the sequence. The surface also has an effect.
- Surface effects: cross hybridization is biased towards the 5' end of the probes. Repulsion from the surface depends on the nucleotides.
31Acknowledgements
Ken Hess, Keith A. Baggerly, Kevin R. Coombes, James Mitchell, Norris Clift, Lianchun Xiao, Roberto Carta, Chunlei Wu, Haitao Zhao, Kenneth D Aldape, Michael F Miles