Title: Arrays
1. Arrays: How do they work? What are they?
2. Arrays are inverted Northerns
Tissue sample
quantitate
Extract target RNA
3. Probe preparation
Acquire or generate probes: all the genes you want
Spot
4. Hybridise, Scan
5. Arrays: How do you make them?
6. Arrayers
7. Pins
- Pin type: blunt, ring, quill, coated...
- Breaking, bending, sticking
- Consistency of spots: coffee-cup, splash, drip
- Contamination: carry-over, dust, hairs, crystals
- Etc.
8. Slides
- Cracking
- Splitting
- Exfoliating
- Fluorescing
- Coatings: hydrophobic, hydrophilic, correctly aged poly-lysine (a bit of an art)
- Home-made vs bought (cost of internal vs external quality control)
- Scan before coating, scan after coating, scan after arraying, scan after hyb-ing: all part of QC
- Etc.
9. RNA Quality control
10. What biological questions can you answer with arrays?
11. Global analysis of gene expression
"I'm a big fan of ignorance-based techniques because humans have a lot of ignorance, and we want to play to our strong suit." - Eric Lander
12. What it looks like
Before processing, we have a LOT of spots
13. Example Hybridisation
After processing, we have a LOT of objective data
14. Sorting out gene families
microarray
5 hormone response gene family members in different experiments
Biopsy type
15. However...
Duplication in genomes is a real problem
Human
Plant
Yeast
16. Apart from wholesale duplication
Gene families (plant) (members as a proportion of the genome)
- Conservation between genes
- 37% of genes are highly conserved
- (TBLASTX E < 10^-30)
- 10% more are partially conserved
- (TBLASTX E < 10^-5)
17. Segmental Duplication of the Human Genome
Sequence Identity
Exp < 1e-10
18. What goes on the slide?
One choice would be amplifications of cDNAs chosen by partial sequence (ESTs)
19. ESTs have inherent problems
20. Better solutions
- GSTs (gene specific tags)
- Oligo arrays
- Affymetrix genechips
22. Bioinformatics I: How do you design the experiments to best sort out the data?
23. Replication of spots (hybridisation controls)
- 3 is a statistical minimum
- mean
- median
- or mode?
Eliminate rogues: throw away inconsistent data
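One way to read the slide above is as a small routine per gene: take the replicate spots, drop any rogue, then report mean and median. The sketch below is only an illustration. The function name `summarise_replicates` and the 2-fold cut-off `max_fold` are assumptions, not values from the slides.

```python
from statistics import mean, median

def summarise_replicates(intensities, max_fold=2.0):
    """Summarise replicate spot intensities for one gene.

    A spot is treated as a rogue when it differs from the replicate
    median by more than `max_fold` (an assumed, illustrative cut-off).
    Intensities are assumed to be positive, background-corrected values.
    """
    if any(value <= 0 for value in intensities):
        raise ValueError("intensities must be positive")
    centre = median(intensities)
    kept, rogues = [], []
    for value in intensities:
        # Fold difference from the median, always >= 1.
        fold = max(value, centre) / min(value, centre)
        (rogues if fold > max_fold else kept).append(value)
    return {
        "kept": kept,
        "rogues": rogues,
        "mean": mean(kept) if kept else None,
        "median": median(kept) if kept else None,
    }

# Three replicate spots for one gene; the third is inconsistent.
result = summarise_replicates([1020.0, 980.0, 3100.0])
print(result)
```

With only three replicates the median is a more robust centre than the mean, which is why the rogue test here is made against the median.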
24. Why replicate spots?
- spot variability
- hybridisation variability
- detection / analysis variability
- Lee et al. (2000) PNAS 97:9834-9839
- Comparison of 3 replicates of 288 genes, 6 mm apart on one slide, using only one channel (Cy3).
- 9 false positives for 1 of the replicates (2,500)
- 0.7 false positives for 3 spots per gene (200)
25. Replication between slides
(Using the same target RNA sample)
- slide variability (spotting batches)
- Reduced by large batch generation and batch QC
- Fluors
- Swapped labelling experiments.
- Also: differential degradation (re-scanning)
26. Replication between samples
- RNA probes (pooling vs replicates)
- equivalence of material
- environment
- sampling
- extraction
27. Percentage CV as Estimate of Variability
- CV is a measure of variability amongst replicates of a single condition
- Defined as the standard deviation divided by the mean, multiplied by 100
- Example: 5 signal values representing 5 replicates
- - 230.4, 241.7, 252.9, 338.8, 178.9
- - Mean 248.54 ± 57.9 (SD), CV 23.28%
- CV helps you assess pilot studies
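The CV example above can be checked with a few lines of Python. This is a minimal sketch; it assumes the standard deviation on the slide is the sample standard deviation (n - 1 denominator), which is what `statistics.stdev` computes. The result should agree with the slide's figures to within rounding.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Percentage CV: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100

# The five replicate signal values from the slide.
signals = [230.4, 241.7, 252.9, 338.8, 178.9]

print(f"mean = {mean(signals):.2f}")
print(f"SD   = {stdev(signals):.2f}")
print(f"CV   = {percent_cv(signals):.2f}%")
```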
31. Bioinformatics II: How do you display the data from arrays to make sense of it?
32. Problems with arraying
33. The Yeast array
34. Scanning software
- e.g. GenePix Pro to analyse images from GenePix scanner
- semi-automated spot finding
- produces mean, median, SD of the pixels in each spot
- reports background intensities around each spot
- reports a normalisation factor and other QC measures
35. Primary Analysis (Analysing the raw spot intensities)
- Ratios of intensities
- - Comparing the intensity to the control channel
- Logged intensities
- - Makes variation of intensities and ratios of intensities more independent of absolute magnitude
- Normalised or standardised data
- - Removes systematic variations in experiments.
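The ratio and log steps above might look like this for a single spot. It is a sketch under stated assumptions: Cy3 is taken as the control channel, local background is subtracted before the ratio, and log base 2 is used so that a 2-fold change up or down gives +1 or -1. The function name and the example intensities are illustrative only.

```python
import math

def log2_ratio(cy5, cy3, cy5_background=0.0, cy3_background=0.0):
    """Background-corrected log2(Cy5 / Cy3) for one spot.

    Returns None when either corrected intensity is not positive,
    since the log ratio is undefined for such spots.
    """
    red = cy5 - cy5_background
    green = cy3 - cy3_background
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)

up = log2_ratio(2100.0, 600.0, 100.0, 100.0)    # 2000 / 500 = 4-fold up
down = log2_ratio(600.0, 2100.0, 100.0, 100.0)  # 500 / 2000 = 4-fold down
unusable = log2_ratio(90.0, 600.0, 100.0, 100.0)  # signal below background
print(up, down, unusable)
```

On the log scale a 4-fold increase and a 4-fold decrease are symmetric about zero, which is not true of the raw ratios (4 and 0.25).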
36. Primary Analysis - Affy Data
37. Why Normalise?
- To correct for systematic measurement error and bias in data
- - Differences in probe labeling
- - Target concentration
- - Hybridization efficiency
- - Scanner noise
- Allows for data comparison
38. Data Normalization Methods
- Scaling Factor (linear) normalization
- - Global or selected gene set
- - Works well when data quality metrics are consistent
- - Simplifies database construction
- - Weakness: assumes error is uniform across all genes; assumes total mRNA is the same for all cells
- Non-linear
- - Can provide higher precision, especially at the extremes
- - Requires selected gene (invariant) set
- - May give false confidence in poor data
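A global scaling factor can be sketched in a few lines. The version below is one possible choice, not the method of any particular package: it scales the Cy5 channel so that the median Cy5/Cy3 ratio across all spots becomes 1. The function name and the intensities are made up for illustration, and the approach carries the weakness noted above, namely that it assumes most genes do not change.

```python
from statistics import median

def global_scale(cy5, cy3):
    """Linear (scaling factor) normalisation of one two-colour slide.

    Returns the factor and the scaled Cy5 intensities, chosen so that
    the median Cy5/Cy3 ratio over all spots is 1 after scaling.
    """
    ratios = [red / green for red, green in zip(cy5, cy3)]
    factor = 1.0 / median(ratios)
    return factor, [red * factor for red in cy5]

# Illustrative intensities: the Cy5 channel is uniformly twice as
# bright, except for the last spot, which is genuinely lower.
cy5 = [200.0, 400.0, 800.0, 1600.0, 300.0]
cy3 = [100.0, 200.0, 400.0, 800.0, 600.0]

factor, cy5_scaled = global_scale(cy5, cy3)
print(factor, cy5_scaled)
```

Restricting `cy5` and `cy3` to a selected gene set before computing the factor gives the "selected gene set" variant listed on the slide.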
39. Secondary Analysis (Analysing intensity ratios across whole slide, gene expression histories)
- scatter plot of array data e.g.
- log Cy5 vs. log Cy3
- Separate Affy chips
40. Normalization Curves
Not normalized
Normalized
41. 2 Slide - scatter
Using this you can plot two slides against each
other (this is with (optional) log scales).
Naturally you can click on each gene.
42. Standards for Storing Data
Minimum Information About a Microarray Experiment (MIAME) http://www.mged.org
- Experimental design: time course, dose response, normal vs. treated
- Array description: description of array, which physical copy used
- Description of sample: growth conditions, dev. stage, labelling
- Hybridisation conditions: wash procedure, time, concentration
- Measurements: scanning hardware and software
- Normalisation controls: housekeeping genes, spiking controls
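To make the checklist above concrete, here is a minimal sketch of a record holding the six MIAME categories, with a check for anything left blank. The class, the field names and the example values are hypothetical; they are not the MIAME schema itself or any MAGE format.

```python
from dataclasses import dataclass, asdict, field

@dataclass
class MiameRecord:
    """Hypothetical container for the six categories on the slide."""
    experimental_design: str
    array_description: str
    sample_description: str
    hybridisation_conditions: str
    measurements: str
    normalisation_controls: list = field(default_factory=list)

    def missing_fields(self):
        """Names of categories that were left empty."""
        return [name for name, value in asdict(self).items() if not value]

# Made-up example values; the array description is deliberately blank.
record = MiameRecord(
    experimental_design="time course",
    array_description="",
    sample_description="growth conditions, developmental stage, labelling",
    hybridisation_conditions="wash procedure, time, concentration",
    measurements="scanner and software used",
    normalisation_controls=["housekeeping genes", "spiking controls"],
)
print(record.missing_fields())
```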
43. Database Software for Storing Data
ArrayExpress
- A data model designed by EBI for array data
- Modelled to support MIAME standards
- Continuing adoption and support of new standards
- MicroArray Gene Expression Object Model (MAGE-OM)
- MicroArray Gene Expression Markup Language (MAGE-ML)
44. Secondary Analysis (Analysing intensity ratios across whole slide, gene expression histories)
Gene expression history: how has my favourite gene been expressed in all experiments in the database?
45. Scatter plot 2: two genes, many slides
Apetala (x) vs Agamous (y)
petal
46. Tertiary Analysis (Clustering)
- Clustering of genes based on similar expression profiles
- Several techniques have been applied to array data
- Hierarchical clustering
- Non-hierarchical clustering methods
- Principal Component Analysis, Self-Organising Maps, K-means clustering
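As an illustration of hierarchical clustering on expression profiles, the sketch below merges the two closest clusters repeatedly until a chosen number remain. It assumes average linkage and a distance of 1 minus the Pearson correlation, which is one common choice among several. The gene names and values are made up, and profiles must not be constant (the correlation would be undefined).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering on 1 - Pearson r."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up profiles: A and B rise across conditions, C and D fall.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.2, 3.9, 2.0],
}
clusters = hierarchical_clusters(profiles, 2)
print(clusters)
```

Using correlation rather than absolute difference groups genes by the shape of their profile, so geneA and geneB cluster together even though geneB is expressed at about twice the level.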
47. Yeast cell cycle array data set: Organised by Gene type
48. Yeast cell cycle array data set: Organised by cycle expression
49. An Expression Roadmap to Wood formation
Bark, Phloem, Cambium, Xylem
Division, Expansion, 2nd cell-wall
50. Differential expression patterns
51. Cluster analysis
A3, A10
52. Lignin biosynthesis
Figure: lignin biosynthesis pathway annotated with fold change.
Fold Change scale: 15, 4, 3, 2, 1.5, 1, 1, 1.5, 2, 3, 4, 15
Enzymes: PAL (EC 4.3.1.5), C4H (EC 1.14.13.11), C3H, 4CL (EC 6.2.1.12), CCoA-3H, CCoAOMT (EC 2.1.1.104), CCR (EC 1.2.1.44), F5H (EC 1.14.13.-), COMT (EC 2.1.1.68), CAD (EC 1.1.1.195), Anionic Peroxidases (EC 1.11.1.7), Laccase (EC 1.10.3.2), Dirigent-like
Other labels: EC 4.1.1.28, EC 3.2.1.21, EC 1.14.11.9, Coumarine, Flavonoids, Transport, Polymerisation, LIGNIN, panels A to E
53. ...or general metabolites
54. Amniotic membrane for Ocular surface reconstruction
Oligo array
2D protein
55. Caution!
These are just CLUES!!
56. What software should I use?
- Free
- - Expression profiler
- - D-Chip
- Commercial
- - J-express
- - Genespring
but don't forget Excel