Data normalisation for two-colour microarrays

Provided by: sallyw

1
Data normalisation for two-colour microarrays
  • References for microarray analysis
  • 1. Leung and Cavalieri (2003) Fundamentals of
    cDNA microarray data analysis. Trends in
    Genetics 19: 649-659
  • 2. Hegde et al. (2000) A concise guide to cDNA
    microarray analysis. Biotechniques 29(3): 548-554
  • 3. Quackenbush (2002) Microarray data normalization
    and transformation. Nature Genetics 32: 496-501
  • Acknowledgements to John Quackenbush for slide
    material

2
Outline
  • Why Normalise?
  • Normalisation methods
  • Available Normalisation packages
  • DNMAD (GEPAS)
  • MIDAS (TIGR)
  • R

3
A two-colour normalisation
4
Why normalise?
  • Technical variation can arise during probe
    preparation, including
  • Unequal amounts of cDNA
  • Differences in dye properties
  • Differences in dye incorporation
  • Differences in scanning
  • Normalisation aims to correct for these variations

5
Assumptions
  • Most global normalisation methods assume the two
    dyes are related by a constant factor
  • R = kG
  • Most normalisation methods assume that the mean
    ratio is 1, or that the log of this ratio is 0,
    i.e. that, on average, genes do not change their
    expression under the condition being studied.

6
Remember: ratio vs log ratio
[Figure: ratio R (axis 0 to 4) for Gene1 and Gene2; A = Cy3, B = Cy5]
Advantage of log transformation: treats
up-regulated and down-regulated genes
symmetrically
7
Log2(ratio) measures treat up- and
down-regulated genes equally
log2(1) = 0, log2(2) = 1, log2(1/2) = -1
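
The symmetry can be checked with a few lines of Python. This is a minimal sketch; the function name and the intensities are made up for illustration.

```python
import math

def log2_ratio(red, green):
    """Return log2(red / green) for one spot; both intensities must be > 0."""
    return math.log2(red / green)

# Hypothetical spot intensities.
up = log2_ratio(2000.0, 1000.0)    # two-fold up-regulated, raw ratio 2
down = log2_ratio(1000.0, 2000.0)  # two-fold down-regulated, raw ratio 0.5
same = log2_ratio(1500.0, 1500.0)  # unchanged, raw ratio 1
```

On the raw scale the two-fold changes sit at 2 and 0.5, at unequal distances from 1. On the log2 scale they sit at equal distances either side of 0.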
8
Normalisation methods
  • Within array normalisation
  • Small targeted arrays: housekeeping genes,
    internal spikes
  • Larger randomly printed arrays: total signal
    intensity, linear regression, non-linear
    regression and Lowess
  • Between array normalisation

9
Normalisation to housekeeping genes
  • Idea: some genes shouldn't be differentially
    expressed
  • But what are these genes?
  • Perhaps actin, ubiquitin, ribosomal RNAs etc.
  • Normalisation constant: Cy3/Cy5 for one such
    gene, or the median value of Cy3/Cy5 over
    several housekeeping genes
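
A minimal sketch of the median-based constant, assuming the reported expression ratio is Cy5/Cy3 (the slide does not say which way round it is). The function names and intensities are hypothetical.

```python
from statistics import median

def housekeeping_constant(cy3, cy5):
    """Median Cy3/Cy5 ratio over the housekeeping spots."""
    return median(g / r for g, r in zip(cy3, cy5))

def normalise_ratio(cy3, cy5, k):
    """Scale one spot's Cy5/Cy3 ratio by the constant k (assumed orientation)."""
    return k * cy5 / cy3

# Hypothetical intensities for three housekeeping spots.
hk_cy3 = [1000.0, 2000.0, 1500.0]
hk_cy5 = [500.0, 1000.0, 750.0]
k = housekeeping_constant(hk_cy3, hk_cy5)
```

With these numbers the Cy5 channel is uniformly half as bright as Cy3, so multiplying by `k` brings the housekeeping ratios back to 1.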

10
Normalisation to spikes or exogenous RNAs
  • On the array, place a number of sequences from a
    different organism: genes that have low homology
    to any gene in the organism of interest.
  • Synthesise RNA for each of these genes by in
    vitro transcription (IVT)
  • Spike known quantities of these RNAs into known
    quantities of sample (NB: must start with the
    same amount of sample RNA)
  • Set the normalisation constant so that the
    ratios for the exogenously added genes take
    their expected values

11
Normalisation to Total Signal
  • Assume equal total gene expression/signal in each
    sample
  • Global/non-selective microarray
  • Normalisation constant
  • Σ(Cy3)/Σ(Cy5) or
  • Σ(Cy5)/Σ(Cy3) for the dye swap experiment
  • For each gene, multiply the ratio by the
    normalisation constant.
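
The total-signal calculation can be sketched as follows, assuming the per-gene ratio is Cy5/Cy3 for the non-swapped array. Function names and intensities are illustrative only.

```python
def total_intensity_constant(cy3, cy5):
    """Normalisation constant: summed Cy3 signal over summed Cy5 signal."""
    return sum(cy3) / sum(cy5)

def normalise_ratios(cy3, cy5):
    """Multiply each gene's Cy5/Cy3 ratio by the total-intensity constant."""
    k = total_intensity_constant(cy3, cy5)
    return [k * r / g for g, r in zip(cy3, cy5)]

# Hypothetical intensities for four spots.
cy3 = [100.0, 200.0, 300.0, 400.0]
cy5 = [50.0, 100.0, 300.0, 50.0]
normalised = normalise_ratios(cy3, cy5)
```

After scaling, the two channels have the same total signal. Individual genes can still differ from a ratio of 1, which is the signal of interest.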

12
Linear regression of Cy3 vs Cy5
  • Assume expression of the majority of genes doesn't
    change between samples
  • Scatter plot of raw data (green vs red)
  • Subtract median background from mean
    foreground? You can decide
  • Problem: data is bunched up in the left-hand corner
  • Solution: log transformation

13
Linear regression of log(Cy3) vs log(Cy5)
  • Draw scatter plot of log(green) vs log(red)
  • Draw linear best fit line
  • y = ax + b, where a = 0.878, b = 1.419 (red line)
  • x(normalised) = ax + b
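
A minimal least-squares sketch of this step. It assumes x is the log2 green (Cy3) intensity and y is the log2 red (Cy5) intensity; the slide does not state which channel is on which axis, and the function names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def normalise_log_green(green, red):
    """Map log2(green) onto the red scale using the fitted line a*x + b."""
    x = [math.log2(g) for g in green]
    y = [math.log2(r) for r in red]
    a, b = fit_line(x, y)
    return [a * xi + b for xi in x]

# Hypothetical data in which red is uniformly twice as bright as green.
green = [100.0, 400.0, 1600.0, 6400.0]
red = [200.0, 800.0, 3200.0, 12800.0]
x_normalised = normalise_log_green(green, red)
```

Here the fitted line has slope 1 and intercept 1 on the log2 scale, so the normalised green values coincide with the log2 red values.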

14
Before and after normalisation
Red: raw data; blue: normalised data
15
The RI or MA plot
  • Checks if data exhibits an intensity-dependent
    structure
  • Uncertainty in ratio measurements generally
    greater at lower intensities
  • For RI plot: log2(R/G) vs log2(R*G)
  • For MA plot: log2(R/G) vs (½)log2(R*G)
  • Remember
  • log(R/G) = log(R) - log(G)
  • log(R*G) = log(R) + log(G)
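
The MA coordinates follow directly from the two log identities. A minimal sketch, with a hypothetical function name and intensities:

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = (1/2) log2(R*G)."""
    log_r, log_g = math.log2(red), math.log2(green)
    m = log_r - log_g          # log ratio
    a = 0.5 * (log_r + log_g)  # average log intensity
    return m, a

# Hypothetical spot: red = 4096, green = 1024.
m, a = ma_values(4096.0, 1024.0)
```

For this spot M is 2 (red is four-fold brighter) and A is 11 (the mean of log2 values 12 and 10).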

16
MA plots pre- and post-normalisation
After normalisation, the fit of the data to the
horizontal line through 0 is much better
17
Example of good data: RI plot
18
Example of bad data: RI plot
Each print tip is coloured differently. NB: the
data is curved, so a straight-line normalisation
is not a good idea!
19
Non-linear regression
  • Global lowess (locally weighted scatterplot
    smoothing OR locally weighted linear regression)
  • Print tip lowess
  • 2D lowess (spatial bias)
  • DNA spots on array must be arranged randomly

20
Applying a Global Lowess to a single sample
  • Produce an MA plot (calculating the average log
    intensity and log ratio for each feature)
  • A curve fits the data far better than a straight
    line because the data is curved.
  • Apply a lowess regression to the data
  • Calculate the normalised log ratio for each
    feature by subtracting the value of the fitted
    curve at that feature's intensity

21
How does Lowess work?
  • Lowess performs a large number of local
    regressions in overlapping windows.
  • The regressions are then combined to form a
    smooth curve: the Lowess data set.
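
The idea can be sketched in plain Python. This is a simplified illustration, not the implementation used by any of the packages named in these slides: it fits one tricube-weighted straight line per point and omits the robustness iterations of full lowess. The function names and the `frac` parameter are assumptions.

```python
def lowess_fit(a, m, frac=0.5):
    """Local linear regression estimate of m at each value of a."""
    n = len(a)
    k = max(2, int(round(frac * n)))  # number of points per window
    fitted = []
    for a0 in a:
        # Window half-width: distance to the k-th nearest point.
        dists = sorted(abs(ai - a0) for ai in a)
        h = dists[k - 1] or 1e-12
        # Tricube weights: 1 at a0, falling to 0 at the window edge.
        w = [(1 - min(abs(ai - a0) / h, 1.0) ** 3) ** 3 for ai in a]
        sw = sum(w)
        mean_a = sum(wi * ai for wi, ai in zip(w, a)) / sw
        mean_m = sum(wi * mi for wi, mi in zip(w, m)) / sw
        sxx = sum(wi * (ai - mean_a) ** 2 for wi, ai in zip(w, a))
        sxy = sum(wi * (ai - mean_a) * (mi - mean_m)
                  for wi, ai, mi in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate this window's line at a0 only.
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

def lowess_normalise(a, m, frac=0.5):
    """Subtract the fitted curve from each log ratio."""
    return [mi - fi for mi, fi in zip(m, lowess_fit(a, m, frac))]

# Hypothetical MA data with a purely intensity-dependent trend.
a_vals = [float(v) for v in range(6, 16)]
m_vals = [0.5 * v - 3.0 for v in a_vals]
residuals = lowess_normalise(a_vals, m_vals)
```

Each point gets its own window, and neighbouring windows overlap, so the fitted values join into a smooth curve. For the straight-line trend above the residuals are essentially zero.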

22
Print-tip lowess
  • Application of lowess to a sub-set of the
    data (i.e. print group or subgrid), provided
    sufficient genes are printed per tip
  • Advantage: can correct for systematic spatial
    variation (inconsistencies between pins,
    variability in slide surface, slight local
    differences in hybridisation conditions)
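
The grouping step can be sketched as below. To keep the example short, each group is simply centred on its median, which is a stand-in for the lowess fit and not lowess itself; a real print-tip lowess would also pass each group's A values to the fit. All names and values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def centre_on_median(m_values):
    """Stand-in normaliser: subtract the group's median log ratio."""
    mid = median(m_values)
    return [m - mid for m in m_values]

def normalise_by_print_tip(m_values, tip_ids, normalise=centre_on_median):
    """Apply a normaliser separately to the spots printed by each tip."""
    groups = defaultdict(list)
    for index, tip in enumerate(tip_ids):
        groups[tip].append(index)
    result = [0.0] * len(m_values)
    for indices in groups.values():
        corrected = normalise([m_values[i] for i in indices])
        for i, value in zip(indices, corrected):
            result[i] = value
    return result

# Hypothetical log ratios from two print tips with different offsets.
m_vals = [1.0, 1.5, 2.0, -1.0, -0.5, 0.0]
tips = [1, 1, 1, 2, 2, 2]
corrected = normalise_by_print_tip(m_vals, tips)
```

Because each tip is handled on its own, an offset specific to one pin is removed without affecting the spots printed by the other pins.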

Print tips normalised for mean around zero
Print-tip layout
23
Print-tip lowess
24
A potential problem: spatial effects
  • In some experiments there is a spatial difference
    between the two channels, resulting in parts of
    the array being brighter in the Cy3 channel and
    other parts in the Cy5 channel.
  • A typical cause is slides with an uneven surface,
    resulting in the different lasers being out of
    focus if they do not adjust as they scan.

25
Correcting for Spatial Effects
  • 2D lowess regression: fits a 2-dimensional
    polynomial surface to the data
  • Block-by-block lowess regression
  • MA plots do not show gradients across slides,
    therefore a pseudo colour or false colour overlay
    is required for each slide

26
Normalisation methods
  • Within array normalisation
  • Small targeted arrays: housekeeping genes,
    internal spikes
  • Larger random arrays: total signal intensity,
    linear regression, non-linear regression and
    Lowess
  • Between array normalisation

27
Between array normalisations
  • Usually you will want to do more than one array,
    either as replicates or as additional samples.
  • You will need to normalise between arrays if you
    want to compare the results.
  • You can even normalise across different platforms
    (e.g. spotted and Affymetrix) and laboratories!

28
Main assumption
  • The variation is due to experimental artefacts,
    NOT biology!
  • If you expect large differences between samples,
    you should not try to normalise!
  • Data needs to be centred → mean of log ratios = 0
  • Data needs to be scaled → s.d. = 1
  • Data needs to be normally distributed →
    distributions are identical
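
Centring and scaling one array's log ratios can be sketched as follows. Using the sample standard deviation (n - 1 denominator) is an assumption, as the slide does not specify it; the function name and values are illustrative.

```python
from statistics import mean, stdev

def centre_and_scale(log_ratios):
    """Shift log ratios to mean 0 and rescale them to standard deviation 1."""
    mu = mean(log_ratios)
    sd = stdev(log_ratios)  # sample s.d.; an assumed choice
    return [(x - mu) / sd for x in log_ratios]

# Hypothetical log ratios from one array.
array_one = [0.5, 1.0, 1.5, 2.0, 2.5]
scaled = centre_and_scale(array_one)
```

Applying the same transformation to every array puts their log ratios on a common centre and spread before they are compared.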

29
Scaling across slides

Equal spread of variation between slides
30
Normalization Approaches
The Solution(?)
  • Can minimise normalisation by adjusting the PMTs
    during scanning so that the mean/median of the
    ratios = 1
  • The best technique is experiment dependent
  • Check diagnostic plots carefully
  • All analysis methods depend on good
    experimental design

31
Normalisation packages
  • Genepix
  • Excel
  • DNMAD (GEPAS)
  • MIDAS (TIGR)
  • R

32
What you can do in Genepix
  • Array quality control
  • Scatter plots (cy3 vs cy5)
  • MA plots
  • Normalisation using housekeeping genes or
    internal spikes
  • Global normalisation
  • total intensity (mean/median of ratios = 1)
  • linear regression (regression ratio = 1)

By default, flagged spots are excluded from
normalisation
33
Excel
  • What you can do in Excel
  • Scatter plots (cy3 vs cy5)
  • MA plots
  • Normalisation using housekeeping genes or
    internal spikes
  • Normalisation using total intensity
  • Linear regression normalisation

34
DNMAD
  • http://dnmad.bioinfo.cnio.es/
  • Input: GPR files
  • Input: array layout (e.g. Arabidopsis Arizona
    slides: main grid 12 rows x 4 columns, sub-grid
    26 rows x 25 columns)
  • Normalisation options, e.g. use flags, return flags
    as NA, use background subtraction, use global
    lowess
  • Work through the tutorial:
    http://bioinfo.cnio.es/docus/courses/dnmad/index.html

35
Output
Can download normalised M values as a txt file
(remember to multiply dye swaps by -1)
36
Box plot of M values by print-tip
37
MA plots pre- and post-LOWESS
38
Slide1 diagnostic plots
39
Slide1 Image plots
Background and M plots also shown
40
Slide scale normalisation
Next: send to Preprocessor. Download normalised M
values (txt file)
41
Microarray Data Flow at TIGR
Image Analysis
.tiff Image File
Raw Gene Expression Data
Gene Annotation
Normalization / Filtering
Normalized Data with Gene Annotation
Expression Analysis
Data Entry / Management
Interpretation of Analysis Results
42
MIDAS is a Normalization and Filtering tool
for microarray data analysis!
Serves as a data pre-processor for clustering
analysis (MeV).
Requires Java Runtime environment and Windows XP
43
MIDAS data analysis methods
  • 7 normalization/transformation methods

Total Intensity normalization
LOWESS (Locfit) normalization
Iterative linear regression normalization
Iterative log mean centering normalization
Ratio Statistics normalization
Standard deviation regularization
In-slide replicates analysis
  • 8 filtering (quality control) methods

Low intensity filter
Slice analysis
Flip-dye consistency checking
Ratio Statistics Confidence Interval checking
Invalid-intensity checking
Signal/Noise checking
Spot QC flag checking
Cross-file-trim
44
Graphical scripting language
45
Graphical scripting language
  • Read input files
  • Define analysis pipeline and set parameters for
    each analysis module
  • Write output files
  • NB: input MEV files only (convert GPR files using
    Expression Converter)
  • Click create PDF report

46
Using R
statistical microarray analysis (sma) module
  • sma will normalise, compare slides, and do
    statistical tests on data
  • Allows simultaneous multiple slide analysis
  • To process the data
  • load experiments into R
  • describe slide printing configuration
  • load experiments into a working data set
  • Analyse data

47
Normalisation exercises
  • Two exercises using DNMAD
  • http://bioinfo.cnio.es/docus/courses/dnmad/index.html
  • Using your own data
  • Try linear regression normalisation (in Excel) vs
    print-tip LOWESS (DNMAD and/or MIDAS)