Title: Introduction to Bioinformatics, Microarrays 1: Microarray Technology
1 Introduction to Bioinformatics: Microarrays 1
Microarray Technology
- Course 341
- Department of Computing
- Imperial College, London
- Moustafa Ghanem
2 Aims for the 2nd Part of the Course: Microarray Bioinformatics
- Appreciate the bigger picture of bioinformatics
- Bioinformatics is more than nucleotide sequence analysis
- Functional Genomics and Drug Discovery
- Understand basic microarray technology and its use in gene expression analysis.
- Learn basic data analysis methods and how to apply them in the analysis of gene expression data
- Data Clustering
- Data Classification
- Statistical Analysis
3 Recommended Texts
- For this part of the course
- Lecture Notes
- Handouts
- General overview of microarray data analysis
- Microarray Gene Expression Data Analysis: A Beginner's Guide (Causton, Quackenbush and Brazma)
- Microarray Bioinformatics (Stekel)
- Data Mining
- Data Mining: Concepts and Techniques (Han)
4 Microarray Technology: Lecture Overview
- Aims, Motivation and Overview of 2nd Part of Course
- Biology Background
- Basic Idea of Microarrays
- Types of Microarray technologies and how they work
- Outputs of Microarrays
- Image Analysis required to transform output to gene expression matrices
- Generating Gene Expression Matrices
5 Background: Functional Genomics
- Functional Genomics
- Systematic analysis of gene activity in healthy and diseased tissues.
- The study of genome functions as a whole, including the expression profiles at the mRNA level and the protein level.
- Functional Genome Analysis
- Used to understand the functions of genes and proteins in an organism. This is typically known as genome annotation.
- Used in integrative biology and systems biology studies aiming to understand health and disease states (e.g. cancer, obesity, etc.)
- Used as an important step in the search for new target molecules in the drug discovery process.
6 Background: The Drug Discovery Pipeline
- Drug Discovery is a lengthy process that takes years and requires the use of bioinformatics, chemoinformatics and clinical-informatics tools.
- Functional genomics plays an important role in speeding up the pipeline and also in allowing us to try new therapeutic methods.
7 Background: Drug Discovery
- Functional genomics plays an important role in identifying functions of potential therapeutic targets such as encoded proteins. Gene expression studies play an important role in most stages.
- Target Identification
- Understand disease states, identify genetic changes that cause disease (genes, proteins, tissues, environmental conditions, etc.)
- Target Validation
- Understand the role of a target and the effects of manipulating a target candidate (e.g. what if I knock a gene out?)
- Compound Screening
- Understand a compound's effect on the target and its risk profile
- Pre-clinical and clinical trials
- Prioritise studies
8 Background: Biology, Cells and DNA
- All living organisms consist of cells. Humans have trillions of cells; yeast has one cell.
- Cells are of many different types (blood, skin, nerve), but all arose from a single cell (the fertilized egg).
- Each cell contains a complete copy of the genome (the program for making the organism), encoded in DNA.
- A gene is a segment of DNA that specifies how to make a protein. Human DNA has about 30,000-35,000 genes; rice has about 50,000-60,000, but shorter, genes.
9 What is?
- Gene Expression
- The process by which the information encoded in a gene is converted into an observable phenotype (most commonly production of a protein).
- The degree to which a gene is active in a certain tissue of the body, measured by the amount of mRNA in the tissue.
- Microarrays
- Tools used to measure the presence and abundance of gene expression in tissue.
- Microarray technologies provide a powerful tool by which the expression patterns of thousands of genes can be monitored simultaneously.
10 Background: Gene Expression
- Cells are different because of differential gene expression.
- About 40% of human genes are expressed at one time.
- A gene is expressed by transcribing DNA into single-stranded mRNA.
- mRNA is later translated into a protein.
- Microarrays measure the level of mRNA expression.
11 A Dynamic View: Gene expression depends on environment!
[Figure labels: Interactions, Environment, DNA, Protein, Growth rate, Expression]
12 A Dynamic View: Gene expression varies with time!
13 Microarray Technology: Quantitative Measurement of Gene Expression
- Also known as DNA microarrays, DNA arrays, DNA chips or gene chips. Whatever the name, their use is effectively transforming a living cell from a black box into a transparent box.
14 Applications of Microarray Technology
15 Data Analysis over Microarray Data
- What type of data analysis is required to
- Identify genes expressed in different cell types (e.g. liver vs. finger)
- Learn how expression levels change in different developmental stages (embryo vs. adult)
- Learn how expression levels change in different disease states (cancerous vs. non-cancerous)
- Learn how groups of genes inter-relate (gene-gene interactions)
- Identify cellular processes that genes participate in (structure, repair, metabolism, replication, etc.)
- Applications are covered only as example contexts; the emphasis is on analysis methods.
16 Microarrays: Basic Idea
Affymetrix Inc. is the leading provider of Microarray technology (GeneChip) http://www.affymetrix.com/
- A Microarray is a device that detects the presence and abundance of labelled nucleic acids in a biological sample.
- In the majority of experiments, the labelled nucleic acids are derived from the mRNA of a sample or tissue.
- The Microarray consists of a solid surface onto which known DNA molecules have been chemically bonded at special locations.
- Each array location is typically known as a probe and contains many replicates of the same molecule.
- The molecules in each array location are carefully chosen so as to hybridise only with mRNA molecules corresponding to a single gene.
17 Basic Idea
Several companies sell equipment to make DNA chips, including spotters to deposit the DNA on the surface and scanners to detect the fluorescent or radioactive signals.
- A Microarray works by exploiting the ability of a given mRNA molecule to bind specifically to, or hybridize to, the DNA template from which it originated.
- By using an array containing many DNA samples, scientists can determine, in a single experiment, the expression levels of hundreds or thousands of genes within a cell by measuring the amount of mRNA bound to each site on the array.
- With the aid of a computer, the amount of mRNA bound to the spots on the Microarray is precisely measured, generating a profile of gene expression in the cell.
18 Background: DNA/RNA Hybridization
- DNA molecules
- DNA molecules are long double-stranded chains. Four types of bases are attached to the backbone: adenine (A), guanine (G), cytosine (C), and thymine (T). A pairs with T, C with G.
- DNA-RNA hybridization
- When a mixture of DNA and RNA is heated to denaturation temperatures to form single strands and then cooled, RNA can hybridize (form a double helix) with DNA that has a complementary nucleotide sequence.
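The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the check allows no mismatches at all (real hybridization tolerates some), and it relies on one fact not stated on the slide, namely that RNA carries uracil (U) where DNA carries thymine (T).

```python
# Watson-Crick pairing between a DNA probe base and the RNA base it binds.
# Assumption (not on the slide): in RNA, U takes the place of T.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complementary_rna(dna_probe: str) -> str:
    """Return the RNA sequence (5'->3') that pairs with a DNA probe (5'->3')."""
    # The two strands of a double helix run antiparallel, so each base is
    # complemented and the order is reversed.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(dna_probe.upper()))


def can_hybridise(dna_probe: str, rna_target: str) -> bool:
    """True if the target is the exact complement of the probe (toy criterion)."""
    return complementary_rna(dna_probe) == rna_target.upper()


probe = "ATGC"
print(complementary_rna(probe))        # RNA that would bind this probe
print(can_hybridise(probe, "GCAU"))    # perfectly complementary target
print(can_hybridise(probe, "GCAA"))    # one mismatched base
```

The all-or-nothing test is a deliberate simplification; it is enough to show why a probe sequence can be chosen to match the mRNA of a single gene.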
19 The Array
The technology for making DNA chips has become so well-defined that it is even possible to construct all of the equipment for under $50,000 using directions on the Internet from Professor Pat Brown's laboratory at Stanford.
http://cmgm.stanford.edu/pbrown/
20 Applying a Labelled Sample
- The molecules in the target biological sample are labelled using a fluorescent dye before the sample is applied to the array.
- If a gene is expressed in the sample, the corresponding mRNA hybridises with the molecules on a given probe (array location).
- If a gene is not expressed, no hybridisation occurs on the corresponding probe.
- Reading the array output
- After the sample is applied, a laser light source is applied to the array.
- The fluorescent label enables the detection of which probes have hybridised (presence) via the light emitted from the probe.
- If a gene is highly expressed, more mRNA exists and thus more mRNA hybridises to the probe molecules; this abundance is read from the intensity of the light emitted.
21 The Process
Chemistry Basics
- Surface chemistry is used to attach the probe molecules to the glass substrate.
- Chemical reactions are used to attach the fluorescent dyes to the target molecules.
- Probe and target hybridise to form a double helix.
22 The Array
23 Steps of a Microarray Experiment
- Prepare DNA chip(s) by choosing probes and attaching them to glass substrate. Note location and properties of each probe.
- Generate a hybridization solution containing a mixture of fluorescently labelled targets.
- Incubate hybridization mixture.
- Detect probe hybridization using laser technology
- Scan the arrays and store output as images
- Quantify each spot
- Subtract background
- Normalize
- Export a table of fluorescent intensities for each gene in the array
- Analyze data using computational methods.
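The quantify, subtract-background and normalize steps can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the pixel values are invented, the function names are hypothetical, and median-based background correction followed by scaling to a common array median is only one of several possible schemes.

```python
import statistics


def quantify_spot(foreground_pixels, background_pixels):
    """Background-corrected intensity: median spot pixel minus median local background."""
    signal = statistics.median(foreground_pixels) - statistics.median(background_pixels)
    # Clamp at zero: a spot dimmer than its surroundings carries no signal.
    return max(signal, 0.0)


def normalise_to_median(intensities, target=1.0):
    """Scale all intensities so that the array median equals `target`.

    Assumes the array median is non-zero (hypothetical simplification).
    """
    scale = target / statistics.median(list(intensities.values()))
    return {gene: value * scale for gene, value in intensities.items()}


# Invented pixel intensities: (spot pixels, surrounding background pixels).
spots = {
    "geneA": ([520, 540, 560], [100, 110, 120]),
    "geneB": ([190, 200, 210], [100, 100, 100]),
    "geneC": ([90, 95, 100], [100, 105, 110]),
}

corrected = {gene: quantify_spot(fg, bg) for gene, (fg, bg) in spots.items()}
table = normalise_to_median(corrected)

# Export step: one row of normalised intensity per gene.
for gene, value in table.items():
    print(f"{gene}\t{value:.2f}")
```

Scaling every array to the same median is what makes intensities from different chips roughly comparable before further analysis.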
24 Types of Microarrays
- How are Microarrays made?
- What molecules make the probes?
- cDNA (PCR products) vs. Oligos
- How are the probes added to the chip?
- Spotting vs. In-situ synthesis
- Output type
- Single label vs. Dual label
- Why? Appreciation of some of the concepts of the technology.
- Helps us understand and choose between available technologies.
- Helps us design our experiments.
- Helps us understand sources of errors in array outputs and compensate for them.
25 Designing the Probes
Each probe represents the measurement for a single gene. An array represents measurements for many genes.
- The probes need to be of high specificity to avoid hybridization with the wrong target molecules.
- The probes need to generate an output that is easy to read (spots lie in defined positions and are of regular size, shape and spacing).
- The probes need high sensitivity to detect the mRNA, and the intensity of the light from a spot must be distinguishable from background noise.
- The intensity of the light from a spot also needs to correlate with the abundance of the target molecule in the sample.
- Results must be reproducible across multiple experiments.
26 Probe Types
Different chip manufacturers use different technologies. As an end user you will use the probe types recommended for the chips, but would have to select the sequences for the probes to be used in your experiments. Affymetrix technology is based on oligos (25 bases per probe).
- The DNA probes used on an array can either be polymerase chain reaction (PCR) products (cDNAs) or oligonucleotides.
- In the first case (cDNA), highly parallel PCR is used to amplify DNA from a clone library, and the amplified DNA is purified. The clones are typically long sequences (complete genes or ESTs).
- In the second case, DNA oligonucleotides are presynthesised for use on the array. An oligonucleotide, or oligo as it is commonly called, is a short fragment of single-stranded DNA that is typically 5 to 50 nucleotides long. This can achieve a higher density of probes per chip.
- In both cases the probes are attached (fixed or immobilized) to a glass (or nylon) surface using special surface chemical techniques (beyond this course).
27 Spotting vs. In-situ Synthesis: Spotting
- Spotting works for both cDNA probes and oligo probes
- The Spotting Process
- The DNA probes are produced and stored in wells.
- A Spotting robot is used to deposit them onto individual locations on the glass slide
- The glass slide is post-processed so no further DNA can attach to it.
- Spotting is easy to automate but may generate poor quality spots (irregular spots of different shapes and sizes)
28 The Spotting Robot
- The Operation of the Spotting Robot
1. The pins are dipped into the wells to collect the first batch of DNA.
2. This DNA is spotted onto a number of different arrays, depending on the number of arrays being made and the amount of liquid the pins can hold.
3. The pins are washed to remove any residual solution and ensure no contamination of the next sample.
4. The pins are dipped into the next set of wells.
5. Return to step 2 and repeat until the array is complete.
29 Spotting Process
30 Spotting vs. In-situ Synthesis: In-situ Synthesis
Affymetrix technology is based on in-situ synthesis in a series of addition steps separated by mask addition and then photo-deprotection.
- Since oligos are short synthesized sequences, their bases can be added to the glass surface one at a time.
- Using high-tech processes, this can generate the best quality (regular, even spots).
- Different patented technologies are used to enable this to happen while not allowing more than one base to be added at a time, including
- Photodeprotection technology (Affymetrix)
- Inkjet Array Synthesis
31 In-situ Synthesis: Affymetrix
32 Comparison of Probe Types
Many other variations of the technology exist, such as the use of longer oligos, the use of fibre optics, etc.
PCR Products / cDNA Probes
- Advantages
- Flexibility to study cDNAs from any source.
- cDNAs do not require any a priori information about the corresponding genes.
- Longer sequences increase hybridization specificity, which reduces false positives.
- Limitations
- Isolation of individual cDNAs to immobilize on each spot can be cumbersome.
- Density is lower than synthesizing oligonucleotides on the surface of the chip.
- cDNAs are longer sequences and are more likely to randomly contain sequences found in target DNA, which results in cross-reactivity.
In-situ Synthesis / Oligos
- Advantages
- No need to isolate and purify cDNAs because oligonucleotides can be synthesized.
- Short oligonucleotides are less likely to have cross-reactivity with other sequences in the target DNA.
- Density of chips is higher than with cDNAs.
- Limitations
- The sequence has to be known.
- Synthesis can be expensive and time-consuming.
- The short sequences are not as specific for target DNA, so appropriate controls must be added.
33 Single Label vs. Dual Label: Single Channel vs. Dual Channel
Affymetrix technology is based on the use of single labels
- Most laboratories use fluorescent labelling, with the two dyes Cy3 (excited by a green laser) and Cy5 (excited by a red laser).
- In Dual label experiments, two samples are hybridised to the arrays, one labelled with each dye; this allows the simultaneous measurement of two samples (e.g. for differential analysis).
- In Single label experiments, only one sample is hybridised to the arrays, labelled with one dye (in which case the control needs to be measured using a separate chip).
- Choice between single and dual label is governed by array technology and underlying chemistry.
34 Dual Label Experiments
- Typically used in custom-made cDNA chips
- Typically used to study one sample (e.g. diseased tissue) vs. a control sample (e.g. normal tissue)
- Separate images are obtained for each channel, and then combined
35 Qualitative Interpretation of Double Label Experiments
- GREEN represents High Control hybridization
- RED represents High Sample hybridization
- YELLOW represents a combination of Control and Sample where both hybridized equally.
- BLACK represents areas where neither the Control nor Sample hybridized.
- Main issue is to quantify the results
- How green is green?
- What is the ratio of the signal to background noise?
- How to compare multiple experiments using different chips?
- How to quantify cross hybridization (if any)?
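One common way to put a number on "how green is green" is the log2 ratio of the sample intensity to the control intensity. The Python sketch below is illustrative only: the function names, the two-fold threshold and the detection floor are hypothetical choices, and the intensities are assumed to be background-corrected already.

```python
import math


def log2_ratio(sample, control):
    """log2(sample / control); 0 means equal, +1 two-fold up, -1 two-fold down."""
    # Clip at 1.0 so that zero intensities do not break the logarithm.
    return math.log2(max(sample, 1.0) / max(control, 1.0))


def colour_call(sample, control, threshold=1.0, floor=50.0):
    """Map a spot to the qualitative colours used on the slide.

    `threshold` (in log2 units) and `floor` (minimum detectable intensity)
    are hypothetical parameters chosen for this example.
    """
    if sample < floor and control < floor:
        return "black"    # neither channel hybridized
    ratio = log2_ratio(sample, control)
    if ratio >= threshold:
        return "red"      # sample dominates
    if ratio <= -threshold:
        return "green"    # control dominates
    return "yellow"       # roughly equal hybridization


for sample, control in [(800, 200), (200, 800), (400, 400), (10, 20)]:
    print(sample, control, round(log2_ratio(sample, control), 2), colour_call(sample, control))
```

Working on the log scale treats a two-fold increase and a two-fold decrease symmetrically, which a raw ratio does not.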
36 Affymetrix GeneChip: Example of Single Label Chips
- Hundreds of thousands of oligonucleotide probes packed at extremely high densities. The probes are designed to maximize sensitivity, specificity, and reproducibility, allowing consistent discrimination between specific and background signals, and between closely related target sequences.
- RNA is labelled and scanned in a single colour: one sample per chip
37 Interpreting Affymetrix Output: Perfect Match/Mismatch Strategy
- GeneChips use a Perfect Match/Mismatch probe strategy
- For each probe designed to be perfectly complementary to a target sequence, a partner probe is generated that is identical except for a single base mismatch in its centre.
- These probe pairs, called the Perfect Match probe (PM) and the Mismatch probe (MM), allow the quantitation and subtraction of signals caused by non-specific cross-hybridization.
- The difference in hybridization signals between the partners, as well as their intensity ratios, serve as indicators of specific target abundance.
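A minimal Python sketch of how PM/MM differences might be summarised for one gene is shown below. It is not the proprietary Affymetrix algorithm: the function names are hypothetical, the intensities are invented, and a plain average of PM minus MM is used purely to illustrate the idea of subtracting the cross-hybridization signal.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over all probe pairs of one gene."""
    if not pm or len(pm) != len(mm):
        raise ValueError("need the same non-zero number of PM and MM intensities")
    differences = [p - m for p, m in zip(pm, mm)]
    return sum(differences) / len(differences)


def fraction_pm_brighter(pm, mm):
    """Fraction of probe pairs in which the PM probe is brighter than its MM partner."""
    if not pm or len(pm) != len(mm):
        raise ValueError("need the same non-zero number of PM and MM intensities")
    return sum(p > m for p, m in zip(pm, mm)) / len(pm)


# Invented intensities for four probe pairs targeting the same gene.
pm = [1200, 950, 1100, 300]
mm = [400, 350, 500, 320]

print(average_difference(pm, mm))     # specific signal estimate
print(fraction_pm_brighter(pm, mm))   # how consistently PM exceeds MM
```

A large average difference together with a high fraction of PM-brighter pairs suggests specific hybridization, whereas similar PM and MM intensities suggest the signal is mostly cross-hybridization.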
38 Affymetrix GeneChips: Perfect Matches and Mismatches
- PM: to maximize hybridization
- MM: to ascertain the degree of cross-hybridization
39 Other Image Processing Problems: Spot Quality Problems
Various image processing techniques may be applied to read and interpret the outputs of Microarrays.
Commercial Microarray systems (e.g. Affymetrix) use proprietary software.
Image analysis software packages exist for the analysis of the output of custom-made chips (e.g. GenePix Pro, Array Vision, TIGR Spot Finder, etc.)
40 From Microarray Images to Gene Expression Matrices
41 From Microarray Images to Gene Expression Matrices
- In spot quantitation matrices, each row typically holds all the measurements made from an individual spot on the array. These can include mean and median pixel intensities of the spot and local background, etc.
- An experiment typically consists of one or more spot quantitation matrices representing all arrays used in the study.
- In the gene expression matrix, rows represent genes (as opposed to features/spots on the array) and columns represent measurements from different experimental conditions measured on individual arrays.
- An example is each column representing measurements at different time points (t0, t1, t2, ...) in time course experiments
- A second example is each column representing a different tissue type
- A third is each column representing a different individual
- A fourth is having groups of columns representing measurements from diseased cells, and other groups representing measurements from healthy cells
- etc.
- Each of the above matrices requires the application of data normalisation techniques as discussed in the next lecture.
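The step from spot quantitation matrices to a gene expression matrix can be sketched as follows. The field names (`gene`, `median_intensity`, `median_background`), the sample values and the choice to average replicate spots are all assumptions made for this illustration; normalisation is deliberately left out.

```python
import statistics
from collections import defaultdict


def summarise_array(spot_rows):
    """Collapse one spot quantitation matrix (one row per spot) into {gene: value}."""
    per_gene = defaultdict(list)
    for row in spot_rows:
        corrected = row["median_intensity"] - row["median_background"]
        per_gene[row["gene"]].append(corrected)
    # Replicate spots of the same gene are averaged (hypothetical choice).
    return {gene: statistics.mean(values) for gene, values in per_gene.items()}


def build_expression_matrix(arrays):
    """arrays maps a condition name (one array each) to its spot rows.

    Returns (genes, conditions, matrix) with one matrix row per gene and one
    column per condition; a gene missing from an array is recorded as None.
    """
    summaries = {cond: summarise_array(rows) for cond, rows in arrays.items()}
    conditions = list(arrays)
    genes = sorted(set().union(*(s.keys() for s in summaries.values())))
    matrix = [[summaries[c].get(g) for c in conditions] for g in genes]
    return genes, conditions, matrix


# Two invented arrays from a time course experiment (t0 and t1).
arrays = {
    "t0": [
        {"gene": "geneA", "median_intensity": 500, "median_background": 100},
        {"gene": "geneA", "median_intensity": 520, "median_background": 100},
        {"gene": "geneB", "median_intensity": 300, "median_background": 100},
    ],
    "t1": [
        {"gene": "geneA", "median_intensity": 900, "median_background": 100},
        {"gene": "geneB", "median_intensity": 150, "median_background": 100},
        {"gene": "geneB", "median_intensity": 170, "median_background": 100},
    ],
}

genes, conditions, matrix = build_expression_matrix(arrays)
print("gene", *conditions, sep="\t")
for gene, row in zip(genes, matrix):
    print(gene, *row, sep="\t")
```

Keeping genes as rows and conditions as columns matches the layout described above and is the form that clustering and classification methods usually expect.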
42 Summary: Microarrays
- Basic Concept
- Based on Watson-Crick hybridization
- Different Microarray technologies exist.
- Probe type (cDNA vs. oligo)
- Spotting vs. in-situ synthesis
- Single vs. dual channel
- Output is typically an image
- Sources of errors
- Image processing is required
- Images are converted into gene expression matrices for further analysis