Title: Introduction to DNA Microarray
1. Introduction to DNA Microarray
- Neha Jain
- Lecturer
- School of Biotechnology
- Devi Ahilya University, Indore
2.
Genes can be regulated at many levels
Usually, when we speak of gene regulation, we are
referring to transcriptional regulation. The
complete set of RNA transcripts produced from the
genome is referred to as the transcriptome.
3.
- In the last dozen years, it has become possible to look at the entire transcriptome in a single experiment!
- High throughput: simultaneous analysis of all genes in a genome.
- The high-throughput analysis of all expressed genes is termed transcriptome analysis: the expression analysis of the full set of RNA molecules produced by a cell under a given set of conditions.
- Transcriptome analysis facilitates our understanding of how sets of genes work together to form metabolic, regulatory, and signalling pathways within the cell.
4. Genomic analysis of gene expression
- Methods capable of giving a snapshot of RNA expression of all genes
- Can be used as a diagnostic profile
  - Example: cancer diagnosis
- Can show how RNA levels change during development, after exposure to a stimulus, during the cell cycle, etc.
- Provides large amounts of data
- Can help us start to understand how whole systems function
5. Types of Gene Expression Analysis
- While there are a number of variations, there are essentially two basic ways of doing expressed gene analysis: sequencing-based methods and microarrays.
- These have largely replaced older methods such as subtractive hybridization and differential display.
- Sequencing-based methods are very powerful but have typically been prohibitively expensive.
- However, with recent advances in low-cost, high-throughput next-generation sequencing, these methods (referred to as RNA-seq) are becoming more common and may soon be dominant.
6. RNA-seq
- Although details of the methods vary, the concept behind RNA-seq is simple:
  - Isolate all mRNA
  - Convert to cDNA using reverse transcriptase
  - Sequence the cDNA
  - Map sequences to the genome
- The more times a given sequence is detected, the more abundantly transcribed it is.
- If enough sequences are generated, a comprehensive and quantitative view of the entire transcriptome of an organism or tissue can be obtained.
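The counting idea behind RNA-seq can be sketched in a few lines of Python. This is only a minimal illustration: it assumes the reads have already been mapped and that each mapped read is reported as a gene name. The gene names and the `mapped_reads` list are made up for the example, and real pipelines also correct for gene length and sequencing depth.

```python
from collections import Counter

def count_reads(mapped_reads):
    """Count how many reads were assigned to each gene."""
    return Counter(mapped_reads)

def relative_abundance(counts):
    """Convert raw counts to fractions of all mapped reads."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical output of the mapping step: one gene name per mapped read.
mapped_reads = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB"]

counts = count_reads(mapped_reads)
abundance = relative_abundance(counts)
for gene, n in counts.most_common():
    print(gene, n, round(abundance[gene], 2))
```

In this toy input, geneA is detected most often and would therefore be read as the most abundantly transcribed of the three genes.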
7. DNA microarrays
8. DNA microarrays
- Microarrays may eventually be eclipsed by sequence-based methods, but meanwhile have become incredibly popular since their inception in 1995 (Schena et al. (1995) Science 270:467-70).
- DNA microarrays rely on the hybridization properties of nucleic acids to monitor DNA or RNA abundance on a genomic scale in different types of cells.
- In other words, the principle behind microarrays is the ability of complementary strands of DNA (or DNA and RNA) to hybridize to one another in solution with high specificity.
9. Nucleic acid hybridization
10. Introduction
- A microarray (or gene chip) is a slide carrying a high-density array of immobilized DNA oligomers (sometimes cDNAs) representing the entire genome of the species under study.
- Each DNA is attached to a solid support
  - Glass, plastic, or nylon
- Each oligomer is spotted on the slide and serves as a probe for binding to a unique, complementary cDNA.
- The cDNA population, labelled with fluorescent dyes or radioisotopes, is allowed to hybridize with the oligo probes on the chip.
- The amount of fluorescent or radioactive label at each spot position reflects the amount of corresponding mRNA in the cell.
- Sets of genes involved in the same regulatory or metabolic pathways can potentially be identified.
11. The Process
[Figure: microarray workflow diagram. Stages: building the chip; DNA/RNA preparation; hybing the chip. Step labels: massive PCR; PCR purification and preparation; preparing slides; printing; post processing; cell culture and harvest; RNA isolation; cDNA production; probe labeling; array hybridization; data analysis.]
14.
- For each spot on the microarray, red and green fluorescence signals are recorded.
- The two fluorescence images from the scanner are then overlaid to create a composite image, which indicates the relative expression levels of each gene.
- Thus, the measurement from the composite image reflects the ratio of the two color intensities.
- If a gene is expressed at a higher level in the experimental condition (red) than in the control (green), the spot displays a reddish color.
- If the gene is expressed at a lower level than the control, the spot appears greenish.
- Unchanged gene expression, with equal amounts of green and red fluorescence, results in a yellow spot.
- The colored image is stored as a computer file (in TIFF format) for further processing.
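The red/green/yellow reading of a spot can be expressed as a simple rule on the ratio of the two intensities. The sketch below is illustrative only: the `fold` threshold of 2 and the intensity values are assumptions made for the example, not values taken from the slides.

```python
def spot_color(red, green, fold=2.0):
    """Classify a spot from its red (experimental) and green (control) signal.

    `fold` is an assumed threshold: ratios strictly between 1/fold and
    fold are treated as unchanged (yellow).
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    ratio = red / green
    if ratio >= fold:
        return "red"      # higher in the experimental condition
    if ratio <= 1.0 / fold:
        return "green"    # lower than in the control
    return "yellow"       # roughly equal expression

# Hypothetical (red, green) intensities for three spots.
spots = {"gene1": (4000, 1000), "gene2": (500, 2000), "gene3": (1500, 1400)}
colors = {gene: spot_color(r, g) for gene, (r, g) in spots.items()}
print(colors)
```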
15. Microarray life cycle
[Figure: cycle diagram with the stages Biological Question, Sample Preparation, Microarray Reaction, Microarray Detection, Data Analysis. Taken from Schena and Davis.]
16. Steps of Microarray Experiment
- A typical DNA microarray experiment involves a multistep procedure:
  - Fabrication of microarrays by fixing properly designed oligonucleotides representing specific genes
  - Hybridization of cDNA populations onto the microarray
  - Scanning hybridization signals and image analysis
  - Transformation and normalization of data
  - Analyzing data to identify differentially expressed genes as well as sets of genes that are co-regulated
17. Some Important Points about Microarray
- DNA microarrays are generated by fixing oligonucleotides onto a solid support such as a glass slide using a robotic device.
- The probes should be specific enough to minimize cross-hybridization with non-target genes.
- This requires BLAST searches against genome databases to find sequence regions with the least sequence similarity to non-target genes.
- The probes should be sensitive and devoid of low-complexity regions (e.g., a string of identical nucleotides).
- The oligonucleotide sequences should not form stable internal secondary structures.
- A number of programs have been developed for designing probe sequences for microarray spotting:
  - OligoWiz
  - OligoArray
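Two of the probe criteria above (no low-complexity runs, no stable internal secondary structure) can be approximated with very simple sequence checks. The sketch below is a rough illustration, not how OligoWiz or OligoArray work: the function names, the `max_run` and `stem` thresholds, and the example sequences are all assumptions, and the hairpin check only looks for an exact reverse-complement stretch rather than computing folding energies. Specificity (the BLAST step) is not covered.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def longest_homopolymer(seq):
    """Length of the longest run of a single repeated nucleotide."""
    runs = (len(m.group(0)) for m in re.finditer(r"A+|C+|G+|T+", seq))
    return max(runs, default=0)

def has_self_complementary_stem(seq, stem=6):
    """Crude hairpin check: does any `stem`-mer reappear downstream
    (without overlap) as its own reverse complement?"""
    for i in range(len(seq) - stem + 1):
        kmer = seq[i:i + stem]
        reverse_complement = kmer.translate(COMPLEMENT)[::-1]
        if seq.find(reverse_complement, i + stem) != -1:
            return True
    return False

def passes_basic_filters(seq, max_run=4, stem=6):
    """Apply both checks with assumed thresholds."""
    seq = seq.upper()
    return (longest_homopolymer(seq) <= max_run
            and not has_self_complementary_stem(seq, stem))

# Hypothetical candidate probes.
good_probe = "ATGCGTACCTGAAGTCCATG"
low_complexity = "ATGC" + "A" * 7 + "GTCCTGACG"
hairpin_prone = "GCATGC" + "AATT" + "GCATGC"

for name, probe in [("good", good_probe),
                    ("low complexity", low_complexity),
                    ("hairpin prone", hairpin_prone)]:
    print(name, passes_basic_filters(probe))
```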
18. Image Processing
- Image processing serves to locate and quantitate hybridization spots and to separate true hybridization signals from background noise.
- The background noise and artifacts produced in this step include nonspecific hybridization, unevenness of the slide surface, and the presence of contaminants such as dust on the surface of the slide.
- Computer programs are used to correctly locate the boundaries of the spots and measure the intensities of the spot images after subtracting the background pixels.
- After subtracting the background noise, the array signals are converted into numbers and reported as ratios between Cy5 and Cy3 for each spot.
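The last step, going from background-subtracted signals to a Cy5/Cy3 ratio per spot, can be sketched as follows. Spot finding itself is not shown. The dictionary layout, the intensity values, and the `floor` used to keep corrected signals positive are assumptions for the example.

```python
def background_corrected(signal, background, floor=1.0):
    """Subtract the local background; clamp to an assumed small positive
    floor so that later ratios and logarithms stay defined."""
    return max(signal - background, floor)

def cy5_cy3_ratio(spot):
    """Return the background-corrected Cy5/Cy3 ratio for one spot."""
    cy5 = background_corrected(spot["cy5"], spot["cy5_bg"])
    cy3 = background_corrected(spot["cy3"], spot["cy3_bg"])
    return cy5 / cy3

# Hypothetical mean pixel intensities and local backgrounds per spot.
spots = {
    "gene1": {"cy5": 2100, "cy5_bg": 100, "cy3": 600, "cy3_bg": 100},
    "gene2": {"cy5": 350, "cy5_bg": 100, "cy3": 1100, "cy3_bg": 100},
}
ratios = {gene: cy5_cy3_ratio(s) for gene, s in spots.items()}
print(ratios)
```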
19. ArrayDB (http://genome.nhgri.nih.gov/arraydb/), ScanAlyze (http://rana.lbl.gov/EisenSoftware.htm), and TIGR Spotfinder (http://www.tigr.org/softlab/) are Windows programs for microarray image processing using the TIFF image format.
20. Data Transformation and Normalization
- Following image processing, the digitized gene expression data need to be further processed before differentially expressed genes can be identified.
- This processing is referred to as data normalization and is designed to correct bias owing to variations in microarray data collection rather than intrinsic biological differences.
- When the raw fluorescence intensity of Cy5 is plotted against that of Cy3, most of the data are clustered near the bottom left of the plot, showing a non-normal distribution of the raw data.
- One way to improve the data discrimination is to transform the raw Cy5 and Cy3 values by taking the logarithm to the base 2.
21.
- This has the major advantage that it treats up-regulation and down-regulation equally, and also has a continuous mapping space.
- For example, if the expression ratio is 1, then log2(1) equals 0, which represents no change in expression. If the expression ratio is 4, then log2(4) equals 2, and for an expression ratio of 1/4, log2(1/4) equals -2.
- Thus, in this transformation the mapping space is continuous and up-regulation and down-regulation are comparable.
- Normalization: When one compares the expression levels of genes that should not change in the two conditions (say, housekeeping genes), what one quite often finds is that the average expression ratio of such genes deviates from 1. This may be due to various reasons, for example, variation caused by differential labelling efficiency of the two fluorescent dyes or different amounts of starting mRNA material in the two samples. Thus, in the case of microarray experiments, as for any large-scale experiment, there are many sources of systematic variation that affect measurements of gene expression levels.
- Normalization is the term used to describe the process of eliminating such variations to allow appropriate comparison of data obtained from the two samples.
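The log2 transformation and a very simple normalization can be illustrated together. The sketch below shifts all log2 ratios so that a set of reference genes has a median of 0 (that is, a ratio of 1). This median-centering approach is an assumption chosen for brevity; it is a global correction, not an intensity-dependent method. The gene names and ratios are made up.

```python
import math
from statistics import median

def log2_ratios(ratios):
    """Transform Cy5/Cy3 ratios so that up- and down-regulation are symmetric."""
    return {gene: math.log2(r) for gene, r in ratios.items()}

def normalize(log_ratios, reference_genes):
    """Shift all log2 ratios so the reference genes have a median of 0."""
    offset = median(log_ratios[g] for g in reference_genes)
    return {gene: value - offset for gene, value in log_ratios.items()}

# Hypothetical Cy5/Cy3 ratios; hk1..hk3 stand in for housekeeping genes
# whose ratio should be 1 but is systematically inflated here.
ratios = {"hk1": 2.0, "hk2": 2.0, "hk3": 4.0, "geneX": 8.0, "geneY": 0.5}

logs = log2_ratios(ratios)
normalized = normalize(logs, ["hk1", "hk2", "hk3"])
print(logs)
print(normalized)
```

After the shift, a ratio that looked like an eightfold increase for geneX is read as a fourfold increase (log2 value 2), because the reference genes suggest a twofold bias toward Cy5 in this toy dataset.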
22.
- One method to normalize the data is Lowess (locally weighted scatterplot smoother) regression.
- The following two freely available software programs are specialized in image analysis and data normalization:
  - Arrayplot
  - SNOMAD
23. Statistical Analysis to Identify Differentially Expressed Genes
- One of the reasons to carry out a microarray experiment is to monitor the expression level of genes at a genome scale. The processed data, after the normalization procedure, can then be represented in the form of a matrix, often called the gene expression matrix. Each row in the matrix corresponds to a particular gene, and each column corresponds to either an experimental condition or a specific time point at which expression of the genes has been measured. Once we have obtained the gene expression matrix, additional levels of annotation can be added either to the gene or to the sample. For example, the function of the genes can be provided, or additional details on the biology of the sample may be provided, such as 'disease state' or 'normal state'.
- Depending on whether the annotation is used or not, analysis of gene expression data can be classified into two different types:
  - Supervised learning: we use the annotation of either the gene or the sample, and create clusters of genes or samples in order to identify patterns that are characteristic of the cluster.
  - Unsupervised learning: the expression data are analysed to identify patterns that can group genes or samples into clusters without the use of any form of annotation. For example, genes with similar expression profiles can be clustered together without the use of any annotation.
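A gene expression matrix with sample annotation can be represented very simply. The sketch below assumes log2 expression values, invented gene and sample names, and a 'disease'/'normal' label per sample. It shows the kind of annotation-driven summary a supervised analysis starts from and is not a full classifier.

```python
from statistics import mean

# Hypothetical log2 expression matrix: rows are genes, columns are samples.
samples = ["s1", "s2", "s3", "s4"]
matrix = {
    "geneA": [2.1, 1.9, 0.1, -0.1],
    "geneB": [0.0, 0.2, 0.1, -0.1],
}

# Sample-level annotation: the extra information a supervised analysis uses.
sample_state = {"s1": "disease", "s2": "disease", "s3": "normal", "s4": "normal"}

def class_means(values, samples, annotation):
    """Average one gene's row separately for each annotated class."""
    groups = {}
    for sample, value in zip(samples, values):
        groups.setdefault(annotation[sample], []).append(value)
    return {label: mean(vals) for label, vals in groups.items()}

summary = {gene: class_means(row, samples, sample_state)
           for gene, row in matrix.items()}
for gene, means in summary.items():
    print(gene, means)
```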
24. Statistical Analysis to Identify Differentially Expressed Genes
- To separate genes that are differentially expressed, a cutoff of twofold in the normalized expression ratio is commonly used as a criterion.
- But a data point above or below the cutoff line could simply be there by chance or because of error.
- The only way to ensure that a gene that appears to be differentially expressed is truly differentially expressed is to perform multiple replicate experiments and to perform statistical testing.
- The repeat experiments provide replicate data points that offer information about the variability of the expression data at a particular condition.
- The main hindrance to obtaining multiple replicate datasets is often the cost: microarray experiments are extremely expensive for regular research laboratories.
- Two tests are commonly used for the statistical analysis: ANOVA (analysis of variance) and the t-test.
- Software:
  - MA-ANOVA
  - Cyber-T
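Why replicates matter can be seen from a small t-test calculation. The sketch below computes a one-sample t statistic on replicate log2 ratios, testing whether the mean differs from 0. It is an illustration with invented values, not the procedure used by MA-ANOVA or Cyber-T, and it stops at the t statistic: a p-value would require the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev

def one_sample_t(log_ratios):
    """t statistic for H0: mean log2 ratio == 0, plus degrees of freedom."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    standard_error = stdev(log_ratios) / math.sqrt(n)
    return mean(log_ratios) / standard_error, n - 1

# Hypothetical log2 ratios from four replicate arrays.
replicates = {
    "geneA": [1.2, 1.0, 1.1, 1.3],   # consistently about twofold up
    "geneB": [1.5, -0.9, 0.2, 1.4],  # above twofold on two arrays, but noisy
}

results = {gene: one_sample_t(values) for gene, values in replicates.items()}
for gene, (t, df) in results.items():
    print(gene, round(mean(replicates[gene]), 2), round(t, 2), df)
```

geneB exceeds the twofold cutoff (log2 ratio above 1) on two of its four arrays, yet its t statistic is small because the replicates disagree. geneA has a much larger t statistic because its replicates are consistent.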
25. Microarray Data Clustering
- One of the goals of microarray data analysis is to cluster genes or samples with similar expression profiles together, in order to make meaningful biological inferences about the set of genes or samples.
- Similar expression patterns are often a result of the fact that the genes involved are in the same metabolic pathway and have similar functions.
- The genetic basis of the co-regulation could be common promoters and regulatory regions.
26.
- Clustering is one of the unsupervised approaches to classify data into groups of genes or samples with similar patterns that are characteristic of the group.
- Clustering methods can be
  - Hierarchical (grouping objects into clusters and specifying relationships among objects in a cluster, resembling a phylogenetic tree). This can be of 2 types:
    - Agglomerative (starting with the assumption that each object is a cluster and grouping similar objects into bigger clusters)
    - Divisive (starting from grouping all objects into one cluster and subsequently breaking the big cluster into smaller clusters with similar properties)
  - Non-hierarchical (grouping into clusters without specifying relationships between objects in a cluster). Non-hierarchical clustering requires predetermination of the number of clusters. It then groups existing objects into these predefined clusters rather than organizing them into a hierarchical structure.
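The agglomerative strategy described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: Euclidean distance between expression profiles, average linkage between clusters, and stopping at a requested number of clusters instead of recording the full merge tree. The profiles are invented.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[g1], profiles[g2]) for g1 in c1 for g2 in c2]
    return sum(dists) / len(dists)

def agglomerative(profiles, n_clusters):
    """Start with one cluster per gene and repeatedly merge the closest pair."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

# Hypothetical log2 expression profiles over four time points.
profiles = {
    "geneA": [0.1, 1.0, 2.0, 2.1],
    "geneB": [0.0, 1.1, 1.9, 2.0],
    "geneC": [2.0, 1.0, 0.1, 0.0],
    "geneD": [2.1, 0.9, 0.0, 0.1],
}
clusters = agglomerative(profiles, n_clusters=2)
print(clusters)
```

Other distance measures (for example, one based on correlation) or linkage rules can be substituted in `euclidean` and `average_linkage` without changing the merging loop.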
27. Experimental Design for Microarrays
- There are a number of important experimental design considerations for a microarray experiment:
  - Technical vs biological replicates
  - Amplification of RNA
  - Dye swaps
28. Experimental Design for Microarrays
- Technical vs biological replicates
  - Technical replicates are repeat hybridizations using the same RNA isolate.
  - Biological replicates use RNA isolated from separate experiments/experimental organisms.
  - Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment.
29. Experimental Design for Microarrays
- Amplification of RNA
  - Linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells. It's not clear to what degree this affects results, especially with respect to rare transcripts, but it seems to be generally acceptable if done correctly.
30. Experimental Design for Microarrays
- Dye swaps
  - When using 2-color arrays, it's important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity due to which dye is used (even when the sequence is the same).
  - Normally 2 dyes are used: Cy5 (red fluorescence, for infected/experimental samples) and Cy3 (green fluorescence, for control samples).
[Figure: dye-swap diagram with samples S1 and S2]
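How a dye-swap pair cancels dye bias can be shown with a short calculation. The sketch assumes a simple model in which the dye bias multiplies the Cy5/Cy3 ratio by the same factor on both arrays. The function name and the numbers are invented for illustration.

```python
import math

def dye_swap_estimate(ratio_forward, ratio_swapped):
    """Combine Cy5/Cy3 ratios from a dye-swap pair.

    ratio_forward: experimental sample labelled Cy5, control Cy3.
    ratio_swapped: labels reversed (experimental Cy3, control Cy5).
    Returns (log2 fold change of experimental vs control, log2 dye bias).
    """
    m_forward = math.log2(ratio_forward)
    m_swapped = math.log2(ratio_swapped)
    log_fold_change = (m_forward - m_swapped) / 2  # bias cancels
    dye_bias = (m_forward + m_swapped) / 2         # biology cancels
    return log_fold_change, dye_bias

# Hypothetical gene: true fourfold induction, with Cy5 reading twofold high.
# Forward array: 4 * 2 = 8; swapped array: (1/4) * 2 = 0.5.
lfc, bias = dye_swap_estimate(ratio_forward=8.0, ratio_swapped=0.5)
print(lfc, bias)
```

Under this model, the forward array alone would suggest an eightfold change; combining it with the swapped array recovers a log2 fold change of 2 (fourfold) and attributes the remaining factor of two to the dye.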
31. Thanks