Title: Statistics for Microarrays
1Statistics for Microarrays
Biological background: Gene Expression and Molecular Laboratory Techniques
Class web site: http://statwww.epfl.ch/davison/teaching/Microarrays/
2Basic principles in physics, chemistry and biology
                        Principles            Known?
Physics    Matter       Elementary Particles  Yes
Chemistry  Compound     Elements              Yes
Biology    Organism     Genes                 No
3Central Paradigm
4(RT)
6Protein Synthesis
7Transcription
- Transcription is a complex process involving several steps and many proteins (enzymes)
- RNA polymerase synthesizes a single strand of RNA against the DNA template strand (anti-sense strand), adding nucleotides to the 3' end of the RNA chain
- Initiation is regulated by transcription factors acting at promoters, usually an initiator element and TATA box, lying just upstream (at the 5' end) of the coding region
- The 3' end is cleaved just downstream of the AAUAAA signal, and a poly-A tail is added
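The template-strand rule above can be illustrated with a small Python sketch. It only applies the base-pairing rule; promoters, transcription factors and poly-A processing are ignored. The function name `transcribe` and the example sequence are assumptions made for illustration, not part of the course material.

```python
# Base pairing used when RNA is synthesized against a DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made against a DNA template strand.

    The template is given 3'->5', so walking it left to right mirrors
    the RNA chain growing at its 3' end.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


# Hypothetical template strand, written 3'->5'.
print(transcribe("TACGGTATT"))
```

The resulting RNA has the same sequence as the sense (coding) DNA strand, with U in place of T.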
8Exons and Introns
- Most of the genome consists of non-coding regions
- Some non-coding regions (centromeres and telomeres) may have specific chromosomal functions
- Other non-coding regions have regulatory purposes
- Non-coding, non-functional DNA is often called "junk DNA", but may have some effect on biological functions
- The terms exon and intron refer to coding and non-coding DNA, respectively
9Intron Splicing
10Transcription Overview
12Transcription Illustration
13Translation
- The AUG start codon is recognized by methionyl-tRNAiMet
- Once the start codon has been identified, the ribosome incorporates amino acids into a polypeptide chain
- RNA is decoded by tRNA (transfer RNA) molecules, which each transport specific amino acids to the growing chain
- Translation ends when a stop codon (UAA, UAG, UGA) is reached
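The start-to-stop decoding described above can be sketched in Python as follows. The codon table is the standard genetic code; the function name `translate` and the example mRNA are illustrative assumptions, and the sketch ignores the ribosome, tRNA charging and everything else about the real machinery.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid codes; '*' marks the stop codons UAA, UAG, UGA.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (or the end)."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: two leading bases, AUG, one codon, then a stop codon.
print(translate("GGAUGCCAUAAGG"))
```

Here M stands for methionine (the AUG codon) and P for proline.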
14Translation Illustrated
15From Primary Transcript to Protein
16Alternative Splicing (of Exons)
- How is it possible that there are over 1,000,000 human antibodies when there are only about 30,000 genes?
- Alternative splicing refers to the different ways the exons of a gene may be combined, producing different forms of proteins from the same gene-coding region
- Alternative pre-mRNA splicing is an important mechanism for regulating gene expression in higher eukaryotes
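To see why combining exons multiplies the number of possible products, the following Python sketch enumerates every in-order, non-empty combination of a gene's exons. This is a deliberately crude assumption: real splicing is constrained and does not allow arbitrary subsets. The function name `splice_variants` and the exon sequences are hypothetical.

```python
from itertools import combinations


def splice_variants(exons):
    """Return every in-order, non-empty combination of exons, joined.

    combinations() keeps the original exon order, so exons are skipped
    but never rearranged.
    """
    variants = []
    for k in range(1, len(exons) + 1):
        for combo in combinations(exons, k):
            variants.append("".join(combo))
    return variants


# Three hypothetical exons give 2**3 - 1 = 7 candidate transcripts.
exons = ["AUGGCU", "GAAUUC", "UGGUAA"]
for variant in splice_variants(exons):
    print(variant)
```

Under this simplification a gene with n exons has 2^n - 1 candidate transcripts, so the count grows quickly with the number of exons.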
18Molecular Laboratory Techniques
- Hybridizing DNA
- Copying DNA
- Cutting DNA
- Probing DNA
19Hybridization
- Hybridization exploits a potent feature of the DNA duplex: the sequence complementarity of the two strands
- Remarkably, DNA can reassemble with perfect fidelity from separated strands
- Strands can be separated (denatured) by heating
21Polymerase Chain Reaction (PCR)
- PCR is used to amplify (copy) specific DNA sequences in a complex mixture when the ends of the sequence are known
- Source DNA is denatured into single strands
- Two synthetic oligonucleotides complementary to the 3' ends of the segment of interest are added in great excess to the denatured DNA, then the temperature is lowered
- The genomic DNA remains denatured, because the complementary strands are at too low a concentration to encounter each other during the period of incubation, but the specific oligonucleotides hybridize with their complementary sequences in the genomic DNA
22PCR, ctd
- The hybridized oligos then serve as primers for DNA synthesis, which begins upon addition of a supply of nucleotides and a temperature-resistant polymerase such as Taq polymerase, from Thermus aquaticus (a bacterium that lives in hot springs)
- Taq polymerase extends the primers at temperatures up to 72 °C
- When synthesis is complete, the whole mixture is heated further (to 95 °C) to melt the newly formed duplexes
- Repeated cycles (25-30) of synthesis (cooling) and melting (heating) quickly provide many DNA copies
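A short calculation shows why 25-30 cycles are enough. The Python sketch below assumes idealized amplification in which every copy is duplicated in every cycle; the `efficiency` parameter is a hypothetical knob for imperfect cycles and is not something the slides specify.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles, assuming each cycle multiplies
    the count by (1 + efficiency); efficiency = 1.0 is perfect doubling."""
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (25, 30):
    print(f"{n} cycles: {pcr_copies(1, n):.3g} copies per starting molecule")
```

With perfect doubling, one starting molecule yields 2^25 (about 3.4 x 10^7) copies after 25 cycles and 2^30 (about 1.1 x 10^9) after 30. Real reactions fall short of this as primers and nucleotides are used up.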
24(BREAK)
25Types of Viruses
A virus is a nucleic acid in a protein coat.
Reverse transcriptase makes a complementary DNA
copy from RNA.
26Reverse transcription
Clone cDNA strands, complementary to the mRNA
mRNA: G U A A U C C U C
Reverse transcriptase
cDNA (partial): T T A G G A G
cDNA (complete): C A T T A G G A G
[Figure: many cloned copies of the cDNA strand C A T T A G G A G]
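The pairing in the diagram above can be reproduced with a few lines of Python. The function name `reverse_transcribe` is an assumption for illustration. The cDNA is returned base by base in the same left-to-right order as the mRNA, matching how the slide aligns the two strands.

```python
# Pairing of each mRNA base with the DNA base in the cDNA copy.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return the cDNA base paired with each mRNA base, in mRNA order.

    If the mRNA is written 5'->3', the returned cDNA reads 3'->5'.
    Spaces are ignored so slide-style input can be pasted directly.
    """
    return "".join(RNA_TO_DNA[base] for base in mrna.upper().replace(" ", ""))


print(reverse_transcribe("G U A A U C C U C"))
```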
27RT-PCR
28Restriction Enzymes Cut DNA
29Restriction Enzymes
- When a bacterium is invaded by a DNA-containing organism (e.g. virus), it can defend itself with restriction enzymes (REs; also called restriction endonucleases)
- REs recognize a specific short sequence of DNA and cut both strands
- The recognition sequence is typically a palindrome, i.e. the sequence in one strand is the same as in the other, read in the other direction (e.g. GAATTC)
- REs are named after the bacteria in which they occur, plus a sequence number (e.g. EcoRI)
30RE Example (Eco RI)
5' G^AATTC 3'
3' CTTAA^G 5'
(^ marks the cut in each strand)
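As a rough sketch of the EcoRI example, the Python below checks that a recognition site is palindromic in the sense used above and cuts one strand at each GAATTC site, between the G and the first A. The function names and the test sequence are hypothetical, and only the top strand is modelled; the matching cut on the other strand leaves the familiar AATT overhang.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts after the G: G^AATTC


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site.translate(COMPLEMENT)[::-1] == site


def digest(dna: str, site: str = ECORI_SITE, cut_offset: int = ECORI_CUT_OFFSET):
    """Cut one strand (written 5'->3') at every occurrence of the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


print(is_palindromic(ECORI_SITE))
print(digest("AAGAATTCGGGAATTCTT"))
```

Joining the fragments in order recovers the original sequence, which is a quick sanity check on any digest routine of this kind.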
31Probing DNA
- One way to study a specific DNA fragment within a genome is to probe for the sequence of the fragment
- A probe is a labeled (usually radioactive or fluorescent) single-stranded oligonucleotide, synthesized to be complementary to the sequence of interest (the probe sequence is known)
- Attach single-stranded DNA to a membrane (or other solid support) and incubate with the probe so that it hybridizes
- Visualize the probe (e.g. by X-ray film for radioactive probes)
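The idea of a probe finding its complementary sequence can be sketched as a string search. The Python below assumes perfect complementarity and ignores mismatches, labeling and hybridization conditions; `reverse_complement`, `probe_binding_sites` and the sequences are names and data made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complementary strand, written 5'->3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def probe_binding_sites(target: str, probe: str):
    """Start positions (0-based) in the single-stranded target where the
    probe would pair perfectly, i.e. where the target contains the
    reverse complement of the probe."""
    site = reverse_complement(probe)
    target = target.upper()
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target.startswith(site, i)
    ]


# Hypothetical membrane-bound strand and a probe against the motif GACC.
target = "TTGACCTGAATTGACC"
probe = reverse_complement("GACC")
print(probe, probe_binding_sites(target, probe))
```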
32The Southern blotting technique
33Sample Autoradiograph (Gel)
34Types of Blots
- Southern Blot: use DNA to probe DNA
- Northern Blot: use DNA to probe RNA
- Western Blot: use antibodies to probe protein
35Measuring Gene Expression
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein would be more direct, but is currently harder.
36Microarrays provide a means to measure gene
expression
37Areas Being Studied with Microarrays
- Differential gene expression between two (or more) sample types
- Similar gene expression across treatments
- Tumor sub-class identification using gene expression profiles
- Classification of malignancies into known classes
- Identification of marker genes that characterize different tumor classes
- Identification of genes associated with clinical outcomes (e.g. survival)
38cDNA microarray experiments
- mRNA levels compared in many different contexts
  - Different tissues, same organism (brain v. liver)
  - Same tissue, same organism (treatment v. control, tumor v. non-tumor)
  - Same tissue, different organisms (wild type v. knockout, transgenic, or mutant)
  - Time course experiments (effect of treatment, development)
  - Other special designs (e.g. to detect spatial patterns)
39Web animation of a cDNA microarray experiment
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
40Yeast genome on a chip
41Brief outline of steps for producing a microarray
- cDNA probes attached to, or synthesized on, a solid support
- Hybridize targets
- Scan array
42cDNA microarrays
cDNA clones
43cDNA microarrays
- Compare the genetic expression in two samples of cells
PRINT: cDNA from one gene on each spot
SAMPLES: cDNA labelled red/green, e.g. treatment / control, normal / tumor tissue
44HYBRIDIZE: Add equal amounts of labelled cDNA samples to microarray.
SCAN: Laser, Detector
45Quantification of expression
- For each spot on the slide we calculate
  - Red intensity = Rfg - Rbg
  - Green intensity = Gfg - Gbg
  - (fg = foreground, bg = background)
- and combine them in the log (base 2) ratio
  - log2(Red intensity / Green intensity)
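The per-spot calculation above can be written as a short Python function. The function name `log_ratio`, the example intensities and the choice to return `None` when a background-corrected intensity is not positive are assumptions for illustration; the slides do not say how such spots should be handled.

```python
import math


def log_ratio(r_fg: float, r_bg: float, g_fg: float, g_bg: float):
    """log2(Red intensity / Green intensity) for one spot.

    Red intensity = Rfg - Rbg and Green intensity = Gfg - Gbg.
    Returns None if either corrected intensity is <= 0, since the
    log ratio is then undefined (this handling is an assumption).
    """
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)


# Hypothetical spot: red 1200 over background 200, green 600 over 100.
print(log_ratio(1200, 200, 600, 100))
```

On the log2 scale a value of 1 means twice as much red as green signal, -1 means half as much, and 0 means equal signal, so up- and down-regulation are treated symmetrically.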
46Gene Expression Data
- On p genes for n slides: p is O(10,000), n is O(10-100), but growing.

                          Slides
Genes   slide 1  slide 2  slide 3  slide 4  slide 5  ...
1         0.46     0.30     0.80     1.51     0.90   ...
2        -0.10     0.49     0.24     0.06     0.46   ...
3         0.15     0.74     0.04     0.10     0.20   ...
4        -0.45    -1.03    -0.79    -0.56    -0.32   ...
5        -0.06     1.06     1.35     1.09    -1.09   ...

Gene expression level of gene 5 in slide 4 (1.09) = log2(Red intensity / Green intensity)
These values are conventionally displayed on a red (>0), yellow (0), green (<0) scale.
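The genes-by-slides layout can be held as a simple matrix. The Python sketch below stores only the 5 x 5 corner shown on the slide and looks up entries with the slide's 1-based gene and slide numbers; the names `LOG_RATIOS`, `expression` and `display_colour` are illustrative assumptions.

```python
# Rows are genes 1-5, columns are slides 1-5 (values from the slide).
LOG_RATIOS = [
    [0.46, 0.30, 0.80, 1.51, 0.90],
    [-0.10, 0.49, 0.24, 0.06, 0.46],
    [0.15, 0.74, 0.04, 0.10, 0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06, 1.06, 1.35, 1.09, -1.09],
]


def expression(gene: int, slide: int) -> float:
    """Log2 ratio for a gene on a slide, using 1-based numbering."""
    return LOG_RATIOS[gene - 1][slide - 1]


def display_colour(value: float) -> str:
    """Conventional colour: red above 0, green below 0, yellow at 0."""
    if value > 0:
        return "red"
    if value < 0:
        return "green"
    return "yellow"


value = expression(5, 4)
print(value, display_colour(value))
```

A full data set would have thousands of rows, so in practice an array library would replace the nested lists; plain lists are used here only to keep the sketch self-contained.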
47Biological question: differentially expressed genes, sample class prediction, etc.
Experimental design
Microarray experiment
16-bit TIFF files
Image analysis
(Rfg, Rbg), (Gfg, Gbg)
Normalization
R, G
Estimation
Testing
Clustering
Discrimination
Biological verification and interpretation