Arrays

1
Arrays: How do they work? What are they?
2
Arrays are inverted Northerns
Tissue sample
quantitate
Extract target RNA
3
Probe preparation
Acquire or generate probes: all the genes you want
Spot
4
Hybridise, then scan
5
Arrays: How do you make them?
6
Arrayers
7
Pins
Pin type: blunt, ring, quill, coated...
Breaking, bending, sticking
Consistency of spots: coffee-cup, splash, drip
Contamination: carry-over, dust, hairs, crystals, etc.
8
Slides

Cracking
Splitting
Exfoliating
Fluorescing
Coatings: hydrophobic, hydrophilic, correctly aged poly-lysine (a bit of an art)
Home-made vs bought (cost of internal vs external quality control)
Scan before coating, scan after coating, scan after arraying, scan after hyb-ing: all part of QC
Etc.

9
RNA Quality control
10
What biological questions can you answer with arrays?
11
Global analysis of gene expression
"I'm a big fan of ignorance-based techniques because humans have a lot of ignorance, and we want to play to our strong suit." - Eric Lander
12
What it looks like
Before processing, we have a LOT of spots
13
Example Hybridisation
After processing, we have a LOT of objective data
14
Sorting out gene families
microarray
5 hormone response gene family members in different experiments
Biopsy type
15
However...
Duplication in genomes is a real problem
Human
Plant
Yeast
16
Apart from wholesale duplication
Gene families (plant) (members as a proportion of the genome)

Conservation between genes
37% of genes are highly conserved (TBLASTX, E < 1e-30)
10% more are partially conserved (TBLASTX, E < 1e-5)

17
Segmental Duplication of the Human Genome
Sequence Identity
Exp < 1e-10
18
What goes on the slide?
One choice would be amplifications of cDNAs chosen by partial sequence (ESTs)
19
ESTs have inherent problems
20

Better solutions
GSTs (gene specific tags)
Oligo arrays
Affymetrix genechips

21
(No Transcript)
22
Bioinformatics I: How do you design the experiments to best sort out the data?
23
Replication of spots (hybridisation controls)

3 is a statistical minimum
mean, median or mode?

Eliminate rogues: throw away inconsistent data
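
A minimal sketch of how replicate spots for one gene might be summarised and rogue spots discarded. The function name, the `max_ratio` cut-off, and the example intensities are all hypothetical, not taken from the presentation. The mode is left out because continuous intensity values rarely repeat exactly.

```python
from statistics import mean, median

def summarise_replicates(intensities, max_ratio=2.0):
    """Summarise replicate spot intensities for one gene.

    max_ratio is a hypothetical cut-off: a spot counts as a rogue when it
    differs from the median of all replicates by more than this fold.
    Intensities are assumed to be positive.
    """
    if len(intensities) < 3:
        raise ValueError("need at least 3 replicate spots")
    centre = median(intensities)

    def is_consistent(x):
        return centre / max_ratio <= x <= centre * max_ratio

    kept = [x for x in intensities if is_consistent(x)]
    rogues = [x for x in intensities if not is_consistent(x)]
    return {"mean": mean(kept), "median": median(kept), "rogues": rogues}

# Three replicate spots, one of them wildly inconsistent (made-up values).
summary = summarise_replicates([1020.0, 980.0, 4100.0])
print(summary)
```

The median is used as the reference point because a single rogue spot shifts the mean far more than it shifts the median.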
24
Why replicate spots?

spot variability
hybridisation variability
detection / analysis variability
Lee et al. (2000) PNAS 97:9834-9839
Comparison of 3 replicates of 288 genes, 6 mm apart on one slide, using only one channel (Cy3).
9% false positives for 1 of the replicates (2,500)
0.7% false positives for 3 spots per gene (200)

25
Replication between slides
(Using the same target RNA sample)

slide variability (spotting batches)
Reduced by large batch generation and batch QC
Fluors
Swapped labelling experiments.
Also differential degradation (re-scanning)

26
Replication between samples

RNA probes (pooling vs replicates)
equivalence of material
environment
sampling
extraction

27
Percentage CV as Estimate of Variability

CV (coefficient of variation) is a measure of variability amongst replicates of a single condition
Defined as the standard deviation divided by the mean, multiplied by 100
Example: 5 signal values representing 5 replicates
- 230.4, 241.7, 252.9, 338.8, 178.9
- Mean 248.54 ± 57.9 (SD), CV 23.28%
CV helps you assess pilot studies
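
The example can be checked with a short sketch. It assumes the quoted standard deviation is the sample standard deviation (n - 1 denominator), which reproduces the 57.9 on the slide. The function name `percent_cv` is illustrative.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Sample standard deviation divided by the mean, multiplied by 100."""
    return stdev(values) / mean(values) * 100

# The five replicate signal values from the slide.
signals = [230.4, 241.7, 252.9, 338.8, 178.9]

print(round(mean(signals), 2))        # 248.54
print(round(stdev(signals), 1))       # 57.9
print(round(percent_cv(signals), 2))  # 23.28
```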

28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
Bioinformatics II: How do you display the data from arrays to make sense of it?
32
Problems with arraying
29
70
33
The Yeast array
34
Scanning software

e.g. GenePix Pro to analyse images from GenePix
scanner
semi-automated spot finding
produces mean, median, SD of the pixels in each
spot
reports background intensities around each spot
reports a normalisation factor and other QC measures

35
Primary Analysis (Analysing the raw spot intensities)

Ratios of intensities: comparing the intensity to the control channel
Logged intensities: makes variation of intensities and ratios of intensities more independent of absolute magnitude
Normalised or standardised data: removes systematic variations in experiments.
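
As a rough illustration of the first two steps, the sketch below computes a logged intensity ratio for one spot. It assumes Cy5 is the experimental channel and Cy3 the control channel, uses log base 2, and subtracts a local background first. These choices and the function name are assumptions, not something the slides specify.

```python
import math

def log_ratio(cy5, cy3, cy5_bg=0.0, cy3_bg=0.0):
    """Return log2 of background-subtracted Cy5 / Cy3.

    Returns None when either channel is not above its background,
    because the ratio is then undefined or meaningless.
    """
    red = cy5 - cy5_bg
    green = cy3 - cy3_bg
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)

print(log_ratio(2000.0, 1000.0))                # 1.0  (two-fold up)
print(log_ratio(500.0, 1000.0))                 # -1.0 (two-fold down)
print(log_ratio(1200.0, 700.0, 200.0, 200.0))   # 1.0 after background subtraction
```

On the log scale a two-fold increase and a two-fold decrease sit symmetrically at +1 and -1, whereas the raw ratios would be 2 and 0.5.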

36
Primary Analysis - Affy Data
37
Why Normalise?

To correct for systematic measurement error and
bias in data
- Differences in probe labeling
- Target concentration
- Hybridization efficiency
- Scanner noise
Allows for data comparison

38
Data Normalization Methods

Scaling factor (linear) normalization
- Global or selected gene set
- Works well when data quality metrics are consistent
- Simplifies database construction
- Weakness: assumes error is uniform across all genes; assumes total mRNA is the same for all cells
Non-linear
- Can provide higher precision, especially at the extremes
- Requires selected gene (invariant) set
- May give false confidence in poor data
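
A minimal sketch of scaling factor (linear) normalization: each array is multiplied by a single factor so that its mean intensity matches a common target. Using the mean as the summary, taking the average of the per-array means as the target, and the names `normalise_arrays` and `gene_subset` are assumptions made for illustration. The non-linear methods are not shown.

```python
from statistics import mean

def normalise_arrays(arrays, gene_subset=None):
    """Scale every array to a common mean intensity.

    arrays: dict mapping array name -> list of intensities (same gene order).
    gene_subset: optional list of indices for a selected gene set; when
    None, all genes are used (global scaling).
    """
    def reference(values):
        chosen = values if gene_subset is None else [values[i] for i in gene_subset]
        return mean(chosen)

    # Common target: the average of the per-array reference intensities.
    target = mean(reference(v) for v in arrays.values())
    return {
        name: [x * target / reference(values) for x in values]
        for name, values in arrays.items()
    }

# Two made-up arrays where the second is uniformly twice as bright.
raw = {"array_a": [100, 200, 300], "array_b": [200, 400, 600]}
scaled = normalise_arrays(raw)
print(scaled)
```

Because one factor is applied to every gene on an array, this only corrects error that is uniform across all genes, which is the weakness noted above.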

39
Secondary Analysis (Analysing intensity ratios across whole slide, gene expression histories)

scatter plot of array data e.g.
log Cy5 vs. log Cy3
Separate Affy chips

40
Normalization Curves
Not normalized
Normalized
41
2 Slide - scatter
Using this you can plot two slides against each other (shown here with the optional log scales). Naturally, you can click on each gene.
42
Standards for Storing Data
Minimum Information About a Microarray Experiment (MIAME) http://www.mged.org

Experimental design: time course, dose response, normal vs. treated
Array description: description of array, which physical copy used
Description of sample: growth conditions, dev. stage, labelling
Hybridisation conditions: wash procedure, time, concentration
Measurements: scanning hardware and software
Normalisation controls: housekeeping genes, spiking controls
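
One way to picture these six sections is as a simple record that can be checked for completeness before data is stored. The sketch below is purely illustrative: the class, field names, and example values are hypothetical and are not the official MIAME schema or any database format.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class MiameRecord:
    """Hypothetical container mirroring the six sections listed above."""
    experimental_design: str
    array_description: str
    sample_description: str
    hybridisation_conditions: str
    measurements: str
    normalisation_controls: list = field(default_factory=list)

    def missing_sections(self):
        """Return the names of sections that have been left empty."""
        return [name for name, value in asdict(self).items() if not value]

# Made-up example with one section left blank.
record = MiameRecord(
    experimental_design="time course",
    array_description="batch 12, slide 7",
    sample_description="leaf tissue, 4 weeks, Cy3/Cy5 labelled",
    hybridisation_conditions="overnight, standard wash",
    measurements="",
    normalisation_controls=["housekeeping genes", "spiking controls"],
)
print(record.missing_sections())  # ['measurements']
```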

43
Database Software for Storing Data
ArrayExpress

ArrayExpress
A data model designed by EBI for array data
Modelled to support MIAME standards
Continuing adoption and support of new standards
MicroArray Gene Expression Object Model
(MAGE-OM)
MicroArray Gene Expression Markup Language
(MAGE-ML)

44
Secondary Analysis (Analysing intensity ratios across whole slide, gene expression histories)
Gene expression history: how has my favourite gene been expressed in all experiments in the database?
45
Scatter plot 2: two genes, many slides
Apetala (x) vs. Agamous (y)
petal
46
Tertiary Analysis (Clustering)

Clustering of genes based on similar expression
profiles
Several techniques have been applied to array
data
Hierarchical clustering
Non-hierarchical clustering methods
Principal Component Analysis, Self-Organising Maps, K-means clustering
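
To make the idea of grouping genes by similar expression profiles concrete, here is a small K-means sketch in plain Python. The gene names and log ratios are invented, the starting centroids are simply the first k genes, and squared Euclidean distance is assumed. Real analyses would normally use an established package rather than this illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(profiles, k, iterations=20):
    """Cluster genes by expression profile.

    profiles: dict mapping gene name -> list of log ratios, one per experiment.
    Returns a dict mapping gene name -> cluster index.
    """
    genes = list(profiles)
    # Simplistic deterministic start: the first k genes act as centroids.
    centroids = [list(profiles[g]) for g in genes[:k]]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            g: min(range(k), key=lambda c: squared_distance(profiles[g], centroids[c]))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:
                # New centroid is the per-experiment mean of its members.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical genes measured in three experiments.
profiles = {
    "geneA": [2.0, 1.8, -0.1],
    "geneB": [-1.5, -1.7, 0.2],
    "geneC": [1.9, 2.1, 0.0],
    "geneD": [-1.6, -1.4, 0.1],
}
clusters = k_means(profiles, k=2)
print(clusters)
```

With these values the two induced genes (geneA, geneC) end up in one cluster and the two repressed genes (geneB, geneD) in the other.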

47
Yeast cell cycle array data set: organised by gene type
48
Yeast cell cycle array data set: organised by cycle expression
49
An Expression Roadmap to Wood formation
Bark, phloem, cambium, xylem
Division, expansion, 2nd cell-wall
50
Differential expression patterns
51
Cluster analysis
(Figure panels labelled A3 and A10)
52
Lignin biosynthesis
(Figure: lignin biosynthesis pathway with a fold change scale)
Fold change scale: 15, 4, 3, 2, 1.5, 1, 1, 1.5, 2, 3, 4, 15
Enzymes shown: PAL (EC 4.3.1.5), C4H (EC 1.14.13.11), C3H, 4CL (EC 6.2.1.12), CCoA-3H, CCoAOMT (EC 2.1.1.104), CCR (EC 1.2.1.44), F5H (EC 1.14.13.-), COMT (EC 2.1.1.68), CAD (EC 1.1.1.195), anionic peroxidases (EC 1.11.1.7), laccase (EC 1.10.3.2), dirigent-like
Other labels: EC 4.1.1.28, EC 3.2.1.21, EC 1.14.11.9, coumarine, flavonoids
Stages: transport, polymerisation, lignin
Panels: A, B, C, D, E
53
...or general metabolites
54
Amniotic membrane for Ocular surface
reconstruction
Oligo array
2D protein
55
Caution!
These are just CLUES!!
56
What software should I use?

Free
Expression profiler
D-Chip
Commercial
J-express
Genespring

...but don't forget Excel
