Title: Various Career Options Available
1 Introduction to Microarray
Dr G. P. S. Raghava
2 Molecular Biology Overview
Figure labels: nucleus, cell, chromosome, gene (DNA), gene (mRNA, single strand), protein
3 Measuring Gene Expression
Idea: measure the amount of mRNA to see which
genes are being expressed in (used by) the cell.
Measuring protein would be more direct, but is
currently harder.
4 Figure: reverse transcription (RT)
5 The Goals
- Basic understanding
  - Arrays can take a snapshot of which subset of genes in a cell is actively making proteins
  - Heat shock experiments
- Medical diagnosis
  - Some microarrays can indicate where mutations lie that might be linked to a disease. Still others are used to determine whether a person's genetic profile would make him or her more or less susceptible to drug side effects
  - 1999: a GeneChip containing 6,800 human genes was used to distinguish between acute myeloid leukemia and acute lymphoblastic leukemia, using a set of 50 genes that have different activity levels
- Drug design
  - Pharmaceutical firms are in a rush to translate the human genome results into new products
  - Potential profits are huge
  - First, though, they must figure out what the genes do, how they interact, and how they relate to diseases
  - Evaluation, specificity, response
6 Microarray Potential Applications
- Biological discovery
  - new and better molecular diagnostics
  - new molecular targets for therapy
  - finding and refining biological pathways
- Recent examples
  - molecular diagnosis of leukemia, breast cancer, ...
  - appropriate treatment for genetic signature
  - potential new drug targets
7 History
- 1980s: antibody-based assay (protein chip?)
- 1991: high-density DNA-synthetic chemistry (Affymetrix/oligo chips)
- 1995: microspotting (Stanford Univ./cDNA chips)
- Replacing porous surface with solid surface
- Replacing radioactive label with fluorescent label
- Improvement in sensitivity
8 What is a DNA Microarray?
Genes or gene fragments attached to a substrate (glass).
Tens of thousands of spots/genes: the entire genome in 1 experiment. A revolution in biology.
Figure labels: hybridized slide, two dyes, image analyzed
9 Gene Expression Microarrays
- The main types of gene expression microarrays
- Short oligonucleotide arrays (Affymetrix)
- cDNA or spotted arrays (Brown/Botstein).
- Long oligonucleotide arrays (Agilent Inkjet)
10 Terms/Jargon
- Stanford/cDNA chip
  - one slide per experiment
  - one spot
  - 1 gene -> one spot or a few spots (replicas)
  - control: control spots
  - control: two fluorescent dyes (Cy3/Cy5)
- Affymetrix/oligo chip
  - one chip per experiment
  - one probe/feature/cell
  - 1 gene -> many probes (20-25 mers)
  - control: perfect-match and mismatch cells
11 Affymetrix Microarrays
Figure: raw image of a 1.28 cm chip
10^7 oligonucleotides: half perfectly match the mRNA (PM), half have one mismatch (MM).
Raw gene expression is the intensity difference PM - MM.
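The PM - MM idea can be sketched in a few lines of Python. This is a minimal illustration, assuming one probe set with paired PM and MM intensities; it averages the per-pair differences, similar in spirit to the "average difference" summary of early Affymetrix software, and is not a reproduction of any vendor algorithm. The function name and intensities are hypothetical.

```python
from statistics import mean

def average_difference(pm, mm):
    """Average of PM - MM over all probe pairs of one probe set (one gene)."""
    if not pm or len(pm) != len(mm):
        raise ValueError("PM and MM must be non-empty and of equal length")
    return mean(p - m for p, m in zip(pm, mm))

# Hypothetical intensities for a probe set with 4 probe pairs
pm = [1200.0, 950.0, 1430.0, 800.0]
mm = [300.0, 410.0, 380.0, 290.0]
print(average_difference(pm, mm))  # 750.0
```

Averaging over many probe pairs damps the effect of a single noisy probe; a known weakness of this simple summary is that individual differences can be negative when MM exceeds PM.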
12 DNA Microarrays
- Each probe consists of thousands of strands of identical oligonucleotides
- The DNA sequences at each probe represent important genes (or parts of genes)
- Printing systems
  - Ex: HP, Corning Inc.
  - Printing systems can build lengths of DNA up to 60 nucleotides long
  - 1.28 x 1.28 cm glass wafer
  - Each print head has a 100 µm diameter, and the heads are separated by 100 µm (≈ 5,000 to 20,000 probes)
- Photolithographic chips
  - Ex: Affymetrix
  - 1.28 x 1.28 cm glass/silicon wafer
  - 24 x 24 µm probe site (≈ 500,000 probes)
  - Lengths of DNA up to 25 nucleotides long
  - Requires a new set of masks for each new array type
13 The Process
Figure: in-vitro transcription
14 Hybridization and Staining
Figure labels: biotin-labeled cRNA, GeneChip, hybridized array, SAPE (streptavidin-phycoerythrin)
15 Microarray Data
- First, the problems
  - The fabrication process is not error free
  - Probes have a maximum length of 25-60 nucleotides
  - Biological processes such as hybridization are stochastic
  - Background light may skew the fluorescence
  - How do we decide if/how strongly a particular gene is being expressed?
- Solutions to these problems are still in their infancy
16 Affymetrix GeneChip System
- Uses 25-base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
- RNA labeled and scanned in a single color
  - one sample per chip
- Can have as many as 20,000 genes on a chip
- Arrays get smaller every year (more genes)
- Chips are expensive
- Proprietary system: "black box" software, can only use their chips
17 cDNA Microarray Technologies
- Spot cloned cDNAs onto a glass microscope slide
  - usually PCR-amplified segments of plasmids
- Label 2 RNA samples with 2 different colors of fluorescent dye
  - control vs. experimental
- Mix the two labeled RNAs and hybridize to the chip
- Make two scans, one for each color
- Combine the images to calculate ratios of amounts of each RNA that bind to each spot
18 cDNA Microarrays
- Compare the gene expression in two samples of cells
PRINT: cDNA from one gene on each spot
SAMPLES: cDNA labelled red/green, e.g. treatment / control, normal / tumor tissue
19 HYBRIDIZE: add equal amounts of labelled cDNA samples to the microarray.
SCAN
Figure labels: laser, detector
20 Long Oligos
- Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
- Relies on genome sequence databases and bioinformatics
- Reduces cross-hybridization
- Cheaper and possibly more sensitive than the Affymetrix system
21 Images from Scanner
- Resolution
  - standard 10 µm currently, max 5 µm
  - 100 µm spot on chip = 10 pixels in diameter
- Image format
  - TIFF (tagged image file format), 16 bit (65,536 levels of grey)
  - 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed)
  - other formats exist, e.g. SCN (used at Stanford University)
- Separate image for each fluorescent sample
  - channel 1, channel 2, etc.
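The resolution and file-size figures above follow from simple arithmetic. This sketch uses hypothetical helper names and assumes square pixels, a square scan area, and one uncompressed 16-bit channel with no file-format overhead.

```python
def spot_diameter_px(spot_um: float, pixel_um: float) -> float:
    """Number of pixels spanning a spot of the given diameter."""
    return spot_um / pixel_um

def image_bytes(side_cm: float, pixel_um: float, bits_per_pixel: int = 16) -> int:
    """Uncompressed size of one square single-channel scan, in bytes."""
    pixels_per_side = round(side_cm * 10_000 / pixel_um)  # 1 cm = 10,000 µm
    return pixels_per_side ** 2 * bits_per_pixel // 8

print(spot_diameter_px(100, 10))  # 10.0 pixels across a 100 µm spot
print(image_bytes(1.0, 10))       # 2000000 bytes, about 2 MB per channel
print(2 ** 16)                    # 65536 grey levels in a 16-bit image
```

Halving the pixel size to the 5 µm maximum resolution quadruples the file size, to about 8 MB per channel.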
22 Processing of Images
- Addressing or gridding
  - Assigning coordinates to each of the spots
- Segmentation
  - Classification of pixels either as foreground or as background
- Intensity determination for each spot
  - Foreground fluorescence intensity pairs (R, G)
  - Background intensities
- Quality measures
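Segmentation and intensity determination for a single spot can be sketched as follows. This is a minimal example, assuming the spot centre is already known from gridding and using fixed-circle segmentation, one of the simplest choices (image-analysis software offers other methods); medians are used for robustness. The function name and pixel values are hypothetical.

```python
from statistics import median

def fixed_circle_intensities(patch, cx, cy, radius):
    """Fixed-circle segmentation of one spot.

    patch: 2-D list of pixel values (list of rows) around a gridded spot.
    (cx, cy): spot centre as (column, row), taken from gridding.
    Pixels inside the circle are foreground, the rest are background.
    Returns (median foreground, median background).
    """
    foreground, background = [], []
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                foreground.append(value)
            else:
                background.append(value)
    return median(foreground), median(background)

# Hypothetical 5 x 5 patch with a bright spot in the middle
patch = [
    [100, 100, 100, 100, 100],
    [100, 900, 1000, 900, 100],
    [100, 1000, 1200, 1000, 100],
    [100, 900, 1000, 900, 100],
    [100, 100, 100, 100, 100],
]
print(fixed_circle_intensities(patch, cx=2, cy=2, radius=1.5))  # (1000, 100.0)
```

Running this once per channel gives the foreground and background intensities (Rfg, Rbg) and (Gfg, Gbg) for the spot.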
23 Images in Analysis Software
- The two 16-bit images (Cy3, Cy5) are compressed into 8-bit images
- Display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image
- RGB image
  - Blue values (B) are set to 0
  - Red values (R) are used for Cy5 intensities
  - Green values (G) are used for Cy3 intensities
- Qualitative representation of results
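The RGB overlay can be sketched like this. The slide does not say how the 16-bit to 8-bit compression is done; this sketch assumes the simplest linear choice of keeping the high byte (real software may use other transforms). The function names are hypothetical.

```python
def to_8bit(value16: int) -> int:
    """Compress a 16-bit intensity (0-65535) to 8 bits by keeping the high byte."""
    return value16 >> 8

def overlay_pixel(cy5: int, cy3: int) -> tuple:
    """One 24-bit RGB pixel: R from Cy5, G from Cy3, B fixed at 0."""
    return (to_8bit(cy5), to_8bit(cy3), 0)

def overlay_image(cy5_image, cy3_image):
    """Combine two same-sized 16-bit images (lists of rows) into an RGB overlay."""
    return [
        [overlay_pixel(r, g) for r, g in zip(cy5_row, cy3_row)]
        for cy5_row, cy3_row in zip(cy5_image, cy3_image)
    ]

print(overlay_pixel(65535, 0))      # (255, 0, 0): Cy5 only, shows red
print(overlay_pixel(0, 65535))      # (0, 255, 0): Cy3 only, shows green
print(overlay_pixel(40000, 40000))  # (156, 156, 0): equal signal, shows yellow
```

Because red and green add up to yellow, spots with similar intensity in both channels appear yellow, which is why the overlay is only a qualitative view.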
24 Image Examples
25 Quantification of Expression
- For each spot on the slide we calculate
  - Red intensity R = Rfg - Rbg
  - Green intensity G = Gfg - Gbg
  - (fg = foreground, bg = background)
- and combine them in the log (base 2) ratio
  - log2(Red intensity / Green intensity)
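The per-spot calculation above translates directly into Python. The function name is hypothetical, and returning None for spots whose background exceeds the foreground is an assumption of this sketch, not a rule from the slides.

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """log2(R / G) for one spot, with R = Rfg - Rbg and G = Gfg - Gbg."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        # Background exceeds foreground: the ratio is undefined, report it as missing
        return None
    return math.log2(red / green)

print(log_ratio(4200, 200, 1200, 200))  # 2.0  (red four times green)
print(log_ratio(700, 200, 1200, 200))   # -1.0 (red half of green)
```

Using base 2 makes a twofold increase and a twofold decrease symmetric around 0 (+1 and -1).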
26 Gene Expression Data
- Data on p genes for n slides: p is O(10,000), n is O(10-100), but growing

Log2 ratios (rows: genes, columns: slides)
Gene   slide 1  slide 2  slide 3  slide 4  slide 5  ...
1         0.46     0.30     0.80     1.51     0.90  ...
2        -0.10     0.49     0.24     0.06     0.46  ...
3         0.15     0.74     0.04     0.10     0.20  ...
4        -0.45    -1.03    -0.79    -0.56    -0.32  ...
5        -0.06     1.06     1.35     1.09    -1.09  ...

Gene expression level of gene 5 in slide 4 = log2(Red intensity / Green intensity) = 1.09.
These values are conventionally displayed on a red (> 0), yellow (0), green (< 0) scale.
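The red/yellow/green display convention can be sketched as a small color map. The linear blend and the saturation value of 2.0 (a fourfold change) are hypothetical choices for illustration; display software uses its own scales.

```python
def ratio_to_rgb(m, saturation=2.0):
    """Map a log2 ratio to red (> 0), yellow (0) or green (< 0).

    Values at or beyond +/- saturation get pure red or pure green;
    values in between are blended linearly towards yellow (255, 255, 0).
    """
    fraction = min(abs(m), saturation) / saturation
    fade = round(255 * (1 - fraction))
    if m > 0:
        return (255, fade, 0)  # keep red, fade out green
    if m < 0:
        return (fade, 255, 0)  # keep green, fade out red
    return (255, 255, 0)       # exactly 0: yellow

# Gene 5 across the five slides in the table above
row_gene5 = [-0.06, 1.06, 1.35, 1.09, -1.09]
print([ratio_to_rgb(m) for m in row_gene5])
```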
27 Biological question (differentially expressed genes, sample class prediction, etc.)
-> Experimental design
-> Microarray experiment
-> 16-bit TIFF files
-> Image analysis, giving (Rfg, Rbg), (Gfg, Gbg)
-> Normalization, giving R, G
-> Estimation / Testing / Clustering / Discrimination
-> Biological verification and interpretation
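The pipeline names a normalization step without fixing a method. One common simple choice is global median centering of each slide's log2 ratios, sketched below; it assumes most genes are not differentially expressed, so an overall shift mostly reflects dye or scanner bias. Intensity-dependent methods such as loess normalization are a frequent alternative. The function name and data are hypothetical.

```python
from statistics import median

def median_normalize(log_ratios):
    """Global normalization of one slide: shift log2 ratios so their median is 0."""
    centre = median(log_ratios)
    return [m - centre for m in log_ratios]

slide = [0.5, 1.0, 1.5]         # hypothetical slide with an overall red bias
print(median_normalize(slide))  # [-0.5, 0.0, 0.5]
```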
28 Quality Control (-> Flag)
- How good are the foreground and background measurements?
  - Variability measures in pixel values within each spot mask
  - Spot size
  - Circularity measure
  - Relative signal to background intensity
  - Dapple
    - b-value: fraction of background intensities less than the median foreground intensity
    - p-score: extent to which the position of a spot deviates from a rigid rectangular grid
- Flag spots based on these criteria
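The b-value definition above can be computed directly and used to flag spots. This is a minimal sketch, not Dapple's implementation; the function names and the cut-off `min_b=0.9` are hypothetical.

```python
from statistics import median

def b_value(foreground, background):
    """Fraction of background pixel intensities below the median foreground intensity."""
    fg_median = median(foreground)
    below = sum(1 for b in background if b < fg_median)
    return below / len(background)

def flag_spot(foreground, background, min_b=0.9):
    """Flag a spot whose b-value falls below a hypothetical cut-off."""
    return b_value(foreground, background) < min_b

fg = [900, 1000, 1100]
bg = [100, 120, 1050, 90]  # one bright background pixel, e.g. dust
print(b_value(fg, bg))     # 0.75
print(flag_spot(fg, bg))   # True
```

A clean spot has nearly all background pixels below the foreground median, giving a b-value close to 1.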
29 Replication
- Why?
  - To reduce variability
  - To increase generalizability
- What is it?
  - Duplicate spots
  - Duplicate slides
  - Technical replicates
  - Biological replicates
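How replication reduces variability can be shown with a small calculation: averaging n replicate log2 ratios gives an estimate whose standard error falls as 1/sqrt(n). The function name and values below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_replicates(values):
    """Mean of replicate log2 ratios and its standard error, stdev / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical log2 ratios for one gene on four replicate slides
m, se = summarize_replicates([1.1, 0.9, 1.0, 1.0])
print(round(m, 3), round(se, 3))
```

The standard error only reflects the replicates actually taken: technical replicates capture measurement noise, while generalizing to a population needs biological replicates.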
30 Practical Application of DNA Microarrays
- DNA microarrays are used to study gene activity (expression)
  - What proteins are being actively produced by a group of cells?
  - Which genes are being expressed?
- How?
  - When a cell is making a protein, it transcribes the gene (made of DNA) that codes for the protein into RNA, which is then used in the protein's production
  - The RNA present in a cell can be extracted
  - If a gene has been expressed in a cell
    - its RNA will bind (hybridize) to the complementary probe on the array
    - RNA with no complementary site will wash off the array
  - The RNA can be tagged with a fluorescent dye to determine its presence
- DNA microarrays provide a high-throughput technique for quantifying the presence of specific RNA sequences
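The binding rule can be illustrated with a toy sketch. It assumes exact Watson-Crick pairing only, writes all sequences 5' to 3' in the DNA alphabet (T rather than U), and uses made-up sequences and probe names; real hybridization also tolerates partial mismatches (cross-hybridization).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def bound_probes(target: str, probes: dict) -> list:
    """Names of probes that pair perfectly with some stretch of the target.

    A probe binds where the target contains the probe's reverse complement;
    probes with no complementary site would wash off.
    """
    return [name for name, probe in probes.items() if reverse_complement(probe) in target]

target = "ATGGCCATTGTAATG"                       # hypothetical labeled target
probes = {"probe_1": "CAATGGC", "probe_2": "TTTTTTT"}
print(bound_probes(target, probes))              # ['probe_1']
```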
31 Analysis and Management of Microarray Data
- Magnitude of data
  - Experiments
    - 50,000 genes in human
    - 320 cell types
    - 2,000 compounds
    - 3 time points
    - 2 concentrations
    - 2 replicates
  - Data volume
    - 4 x 10^11 data points
    - 10^15 bytes = 1 petabyte of data
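The data-point estimate is the product of the experiment factors listed above, which a quick Python check confirms:

```python
from math import prod

factors = {
    "genes": 50_000,
    "cell types": 320,
    "compounds": 2_000,
    "time points": 3,
    "concentrations": 2,
    "replicates": 2,
}
data_points = prod(factors.values())
print(data_points)  # 384000000000, about 4 x 10^11
```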
32 Thanks