Introduction to Bioinformatics Microarrays2: Microarray Data Normalisation - PowerPoint PPT Presentation

1
Introduction to Bioinformatics Microarrays2
Microarray Data Normalisation

Course 341
Department of Computing
Imperial College, London
Moustafa Ghanem

2
Lecture Overview

Background and Motivation
Introduction
Microarray experiments and microarray data analysis
Sources of variability
Experimental design
Normalisation Examples
Probe intensity values
Two colour arrays
Positive controls
Spatial normalisation within array
Between array normalisation
Normalisation Methods
Total intensity normalisation
Scaling and centring
Linear regression
MA plots and Lowess

3
Background: Microarrays
A microarray works by exploiting the ability of an mRNA molecule to hybridise to its complementary DNA probe. The mRNA molecules in a target biological sample are labelled using a fluorescent dye and applied to the array. The fluorescent label enables the detection of which probes have hybridised (presence) via the light emitted from the probe.

A microarray is a device that detects the presence and abundance of labelled nucleic acids in a biological sample.
The microarray consists of a solid surface onto which known DNA molecules have been chemically bonded at specific locations.
Each array location is typically known as a probe
and contains many replicates of the same
molecule.
The molecules in each array location are
carefully chosen so as to hybridise only with
mRNA molecules corresponding to a single gene.

4
Background: Microarray Data Analysis
Figure: microarray data analysis pipeline
Biological question → experimental design (sample attributes, platform choice) → microarray experiment (16-bit TIFF files) → image analysis ((Rspot, Rbkg), (Gspot, Gbkg)) → normalization → statistical analysis, clustering, classification, data mining, pattern discovery → biological verification and interpretation
5
Motivation

Data generated from microarray experiments are inherently highly variable.
First, there is the multiple-comparisons problem: any measurement of thousands of values will find some large differences purely by chance (e.g. values in the tails of a normal distribution).
However, most genes do not change their expression across experiments.
Replication (e.g. different patients, different experiments) and statistics are needed to show that measured differences are real.
Second, there are also systematic sources of variability, e.g. errors in scanning microarray images, differences between the properties of the Cy3 and Cy5 channels, etc.
Systematic methods are needed for addressing such errors.
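The first point can be made concrete with a short simulation. This is a minimal sketch under assumed settings: 10,000 hypothetical genes whose true log2 ratio is 0, observed with normally distributed noise of standard deviation 0.5 (an illustrative value, not one taken from the lecture). It counts how many of these unchanged genes nonetheless show an apparent two-fold change (|log2 ratio| >= 1).

```python
import random

def count_chance_changes(n_genes=10_000, noise_sd=0.5, threshold=1.0, seed=1):
    """Count unchanged genes that nevertheless look at least two-fold changed.

    Every gene's true log2 ratio is 0, so the observed value is pure noise.
    noise_sd and threshold are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    observed = [rng.gauss(0.0, noise_sd) for _ in range(n_genes)]
    return sum(1 for m in observed if abs(m) >= threshold)

if __name__ == "__main__":
    print(count_chance_changes())
```

With these settings roughly 4.5% of the unchanged genes (those beyond two standard deviations) cross the two-fold threshold, i.e. a few hundred genes that replication and statistics must rule out.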

6
Motivation

Normalisation is a general term for a collection of methods that are directed at reasoning about and resolving the systematic errors and bias introduced by microarray experimental platforms.
Normalisation methods stand in contrast with the data analysis methods described in other lectures (e.g. differential gene expression analysis, classification and clustering).
Our overall aim is to be able to quantify measured/calculated variability, differentials and similarity:
are they biologically significant, or just side effects of the experimental platforms and conditions?

7
Introduction: Sources of Microarray Data Variability
The measured gene expression in any experiment includes true gene expression, together with contributions from many sources of variability.

There are several levels of variability in
measured gene expression of a feature.
At the highest level, there is biological
variability in the population from which the
sample derives.
At an experimental level, there is
variability between preparations and labelling of
the sample,
variability between hybridisations of the same
sample to different arrays, and
variability between the signal on replicate
features on the same array.

8
Introduction: Sources of Microarray Data Variability
There are many standard experimental protocols that biologists need to follow when conducting their experiments to minimise variability.

Population Variation
Whose mRNA are we using? May need to test different samples in parallel.
May need many replicates to study biological variation.
Sample Treatment
Experimental conditions
Tissue preparation
Target Preparation
RNA isolation: use identical amounts of tissue and identical extraction methods, use the minimum number of steps, and measure the amount of RNA and normalise the concentration.
Labelling: account for and measure the incorporation of label, and normalise samples to the same concentration.
Amount: add the same amount of label to each hybridisation.

9
Introduction: Sources of Microarray Data Variability
Oligonucleotide (oligo) probes reduce probe variability compared to PCR products. In-situ synthesis standardises probe production, produces better spot quality and reduces errors in image acquisition.

Arrays
The same sample may be hybridised to different arrays in different labs.
PCR products: probes are prepared through amplification directly from cells; the same amount of product must be added to each spot on the filter.
Uniformity of spotting: must use an arraying tool for filter arrays or a robot for microarrays.
Treatment and handling of filters or slides
Hybridization and washing
Time: long hybridisation ensures that hybridisation goes to completion.
Temperature: most hybridisations are performed between 45 and 65 °C.
Data acquisition
Image acquisition
Spot and background detection

10
Introduction: Biological and Technical Variability

Biological variability: variation between individuals in the population, which is independent of the microarray process itself.
Population variability can be measured with pilot studies.
Technical variability: variation that depends on the microarray process itself.
Technical variability is measured in calibration experiments.
In good experiments, technical variation should be much less than biological variation.

11
Introduction: Experimental Design

Tree representation of replicate experiments
The first level is that of biological replicates.
This is followed by two independent mRNA extractions, and reciprocal Cy3 and Cy5 labelling.
Finally, on each array, each probe is printed in triplicate.
In this example, each data point in the experiment is replicated a total of 24 times.
Furthermore, in each microarray experiment, each gene (each probe or probe set) is really a separate experiment in its own right.
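The total of 24 comes from multiplying the branching factor at each level of the replicate tree. The slide does not state how many biological replicates there are; two is inferred here because 2 × 2 × 2 × 3 = 24. A minimal Python sketch that enumerates every leaf of the tree for one gene (the level names and labels are illustrative):

```python
from itertools import product

# Branching at each level of the replicate tree for a single gene.
# The count of 2 biological replicates is inferred from the total of 24.
LEVELS = {
    "biological_replicate": ["bio1", "bio2"],
    "mrna_extraction": ["extract1", "extract2"],
    "labelling": ["exp=Cy3/ctrl=Cy5", "exp=Cy5/ctrl=Cy3"],  # reciprocal labelling
    "spot": ["spot1", "spot2", "spot3"],  # each probe printed in triplicate
}

def replicate_measurements():
    """Return one tuple per measurement of the gene (one per root-to-leaf path)."""
    return list(product(*LEVELS.values()))

if __name__ == "__main__":
    print(len(replicate_measurements()))  # 2 * 2 * 2 * 3 = 24
```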

12
Introduction: Conducting Good Experiments
13
Introduction: Gene Expression Matrices
14
Normalisation Examples: Probe Intensity Value
Typical problem: usually more variability at low intensity

The raw intensities of signal from each spot on
the array are not directly comparable. Depending
on the types of experiments done, a number of
different approaches to normalization may be
needed. Not all types of normalization are
appropriate in all experiments. Some experiments
may use more than one type of normalization.
Reasonable assumption: intensities of fluorescent molecules reflect the abundance of the mRNA molecules (generally true, but could be problematic).
Example:
intensity of the gene A spot is 100 units in the normal-tissue array
intensity of the gene A spot is 50 units in the cancer-tissue array
Conclusion: gene A's expression level in normal tissue is significantly higher than in cancer tissue

15
Normalisation Examples: Probe Intensity Value
Images showing examples of how background intensity can be calculated

Problem: what if the overall background intensity of the normal-tissue array is 95 units while the background intensity of the cancer-tissue array is 10 units?
Solutions:
Subtract the background intensity value
Take the ratio of spot intensity to background intensity (preferable)
In both cases, we have to decide where to measure background intensity (e.g. local to the spot or globally per chip).
In general, there could be many factors contributing to the background intensity of a microarray chip.
To compare microarray data across different chips, the data (intensity levels) need to be normalised to the same level.
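Applying both solutions to the gene A numbers from the example shows why the background matters. This minimal sketch uses a single global background value per chip, which is one of the two choices mentioned above; the function names are illustrative.

```python
def background_subtracted(spot, background):
    """Spot intensity minus background intensity."""
    return spot - background

def signal_to_background(spot, background):
    """Ratio of spot intensity to background intensity."""
    return spot / background

if __name__ == "__main__":
    # Gene A: spot and global background intensities from the example
    arrays = {"normal": (100.0, 95.0), "cancer": (50.0, 10.0)}
    for name, (spot, bkg) in arrays.items():
        print(name, background_subtracted(spot, bkg),
              round(signal_to_background(spot, bkg), 2))
    # normal 5.0 1.05
    # cancer 40.0 5.0
```

Under either correction, gene A now appears higher in the cancer tissue (5 vs 40 units after subtraction, 1.05 vs 5.0 as a ratio), reversing the naive conclusion drawn from the raw intensities.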

16
Normalisation Examples: Two Colour Arrays

Reasonable assumption: for two-colour arrays, in a self-self hybridisation, we expect Red = Green for each spot.
Problem: this is not necessarily true due to labelling effects, chemistry (dye properties), scanner properties, etc.
Dye bias in two-channel microarrays: intensity in one channel may be higher than in the other.
Solutions:
Dye-swapping experiments: in the first replicate, label the control with red and the experiment with green; in the second replicate, swap the colours.
Calibration experiments (self vs. self hybridisation): label the same extract with both colours and calculate the variation.
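Why dye swapping helps can be shown with a little arithmetic. This is a minimal sketch that assumes the dye bias is a single constant multiplier k on the red channel; real dye bias is often intensity-dependent, which is what intensity-dependent methods such as Lowess address. The replicate labelling follows the dye-swap scheme described above, and the function name is hypothetical.

```python
import math

def dye_swap_log_ratio(r1, g1, r2, g2):
    """Combine a dye-swap pair into one log2(experiment / control) estimate.

    Replicate 1: control labelled red, experiment green -> log2(g1 / r1)
    Replicate 2: colours swapped, experiment red         -> log2(r2 / g2)
    A constant red-channel bias k adds -log2(k) to the first term and
    +log2(k) to the second, so it cancels in the average.
    """
    m1 = math.log2(g1 / r1)
    m2 = math.log2(r2 / g2)
    return (m1 + m2) / 2

if __name__ == "__main__":
    # Hypothetical gene: true experiment/control ratio 4, red channel 1.5x too bright
    k, experiment, control = 1.5, 400.0, 100.0
    print(dye_swap_log_ratio(r1=k * control, g1=experiment,
                             r2=k * experiment, g2=control))  # close to 2.0
```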

17
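Normalisation Examples: Two Colour Arrays

Error correction

Fitted regression line: $y = ax$; ideal (diagonal) line: $y = x$

Possible ways of correction: 1. dividing all y by a; 2. multiplying all x by a
Can easily be extended when the regression line is $y = ax + b$ (subtract b, then divide by a)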
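A minimal sketch of the correction by rescaling one channel, assuming the slope a is estimated by least squares through the origin (the slide does not say how a is obtained). Dividing every y value by a maps the fitted line y = ax onto the diagonal y = x; the channel values are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope a for the model y = a * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def rescale_to_diagonal(x, y):
    """Divide every y value by the fitted slope so the line becomes y = x."""
    a = slope_through_origin(x, y)
    return [yi / a for yi in y], a

if __name__ == "__main__":
    green = [100.0, 200.0, 400.0, 800.0]   # hypothetical x-channel intensities
    red = [150.0, 300.0, 600.0, 1200.0]    # hypothetical y-channel, 1.5x brighter
    corrected, a = rescale_to_diagonal(green, red)
    print(a, corrected)  # 1.5 [100.0, 200.0, 400.0, 800.0]
```

Multiplying every x value by a instead gives the same alignment with the diagonal, just on the y channel's scale.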

18
Normalisation Examples: Ratio of Signal to Positive Control
How does this approach compare to the Affymetrix PM/MM probes?

Problem: is there any cross-hybridisation?
Solution: it is often useful to spike the labelling reaction with some foreign RNA or DNA that is not normally in the RNA population.
The signal $s_i$ for gene $i$ would therefore be the raw count $g_i$ divided by the median of the counts for the vector spots: $s_i = g_i / \mathrm{median}(\text{vector-spot counts})$.
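The positive-control normalisation is a single division per gene. A minimal sketch, with hypothetical counts, that divides each raw count g_i by the median of the vector-spot counts (the function name is illustrative):

```python
from statistics import median

def normalise_to_spike(raw_counts, vector_spot_counts):
    """s_i = g_i / median(counts on the vector, i.e. spiked-control, spots)."""
    ref = median(vector_spot_counts)
    return {gene: g / ref for gene, g in raw_counts.items()}

if __name__ == "__main__":
    raw = {"geneA": 1200.0, "geneB": 300.0}   # hypothetical raw counts g_i
    vector = [580.0, 600.0, 640.0]            # hypothetical vector-spot counts
    print(normalise_to_spike(raw, vector))     # {'geneA': 2.0, 'geneB': 0.5}
```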

19
Normalisation Examples: Ratio of Signal to Positive Control

Normalization of signal for each gene to a ratio
makes it possible to compare ratios between
experiments, provided that the spiked controls
are the same in all experiments.
Normalization to a positive control is typically
used in single-label experiments. Comparison of
one experiment to another can either be done by
plotting signal $s_i$ directly on a graph, or
signals from two experiments can be converted
into a ratio, usually by choosing one treatment
as a control.
For example, in a time course, a 0 hour time
point might be chosen, and signal from all other
time points divided by the signal for the 0 hour
time point, to give a ratio.
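The time-course example can be sketched the same way: each time point's signal is divided by the 0 hour signal, so the reference time point always has a ratio of 1. The signal values and the function name below are hypothetical.

```python
def ratios_to_reference(signals, reference="0h"):
    """Divide each time point's signal s_i by the signal at the reference time point."""
    ref = signals[reference]
    return {t: s / ref for t, s in signals.items()}

if __name__ == "__main__":
    # Hypothetical normalised signals s_i for one gene across a time course
    course = {"0h": 2.0, "2h": 4.0, "6h": 1.0}
    print(ratios_to_reference(course))  # {'0h': 1.0, '2h': 2.0, '6h': 0.5}
```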

20
Normalisation Examples: Spatial variation within array

Problem: signal varies according to spot location,
particularly at the corners (less hybridisation solution reaches them),
and also because of the print-tip group of the robot.
Solutions:
Calculate the ratio to the mean or total intensity value
Use Locally Weighted Regression (Lowess)
Use Block-Block Lowess
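One way to apply the first solution locally is to divide each spot by the mean intensity of its own print-tip block, so that a block which is uniformly dimmer (e.g. in a corner) is put on the same footing as the others. This is a minimal sketch; the block labels, values and function name are hypothetical, and a single scale factor per block cannot remove intensity-dependent effects, which is what the Lowess variants target.

```python
from collections import defaultdict
from statistics import mean

def ratio_to_block_mean(intensities, blocks):
    """Divide each spot's intensity by the mean intensity of its block.

    intensities[i] is spot i's intensity and blocks[i] its print-tip block label.
    """
    by_block = defaultdict(list)
    for value, block in zip(intensities, blocks):
        by_block[block].append(value)
    block_mean = {b: mean(vals) for b, vals in by_block.items()}
    return [value / block_mean[block] for value, block in zip(intensities, blocks)]

if __name__ == "__main__":
    # The "corner" block is uniformly half as bright as the "centre" block
    values = [100.0, 300.0, 50.0, 150.0]
    tips = ["centre", "centre", "corner", "corner"]
    print(ratio_to_block_mean(values, tips))  # [0.5, 1.5, 0.5, 1.5]
```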

21
Normalisation Examples: Between Array Normalisation

Assumption: the overall intensities across two arrays should be similar.
Problem: this is not always the case.
Solution 1: ensure that the data points in the two-intensity coordinate system are roughly centred around the diagonal.

Solution 2: use total intensity normalisation for a large number of genes.
22
Normalisation Methods: Between Array Normalisation

Mean/median centring: the mean/median intensity of every chip is brought to the same level.
Total intensity normalisation: a scaling factor is determined by summing intensities.
Spiked-control, housekeeping normalisation (positive controls)
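The first two methods can be sketched in a few lines. This minimal version assumes the first array is the reference for total intensity scaling, and applies median centring to log2 intensities so that bringing every chip to the same level becomes a subtraction; both choices, the function names and the example values are assumptions.

```python
import math
from statistics import median

def total_intensity_normalise(arrays):
    """Scale every array so its summed intensity matches the first array's."""
    target = sum(arrays[0])
    return [[v * target / sum(arr) for v in arr] for arr in arrays]

def median_centre_log(arrays):
    """Shift each array's log2 intensities so that its median becomes 0."""
    centred = []
    for arr in arrays:
        logs = [math.log2(v) for v in arr]
        m = median(logs)
        centred.append([x - m for x in logs])
    return centred

if __name__ == "__main__":
    chip1 = [100.0, 200.0, 400.0]
    chip2 = [200.0, 400.0, 800.0]   # same profile, scanned twice as bright
    print(total_intensity_normalise([chip1, chip2]))
    # [[100.0, 200.0, 400.0], [100.0, 200.0, 400.0]]
    print(median_centre_log([chip1, chip2]))  # both rows about [-1, 0, 1]
```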

23
Normalisation Methods: Centring and Scaling

Data is scaled to ensure that the means and the standard deviations of all of the distributions are equal. For each measurement on the array, subtract the mean measurement of the array and divide by the standard deviation. Following centring and scaling, the mean measurement on each array will be zero, and the standard deviation will be 1.
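Centring and scaling is the familiar z-score transform applied to each array separately. A minimal sketch; the slide does not specify whether the sample or the population standard deviation is used, and this version assumes the sample standard deviation (`statistics.stdev`).

```python
from statistics import mean, stdev

def centre_and_scale(values):
    """Subtract the array mean, then divide by the array (sample) standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

if __name__ == "__main__":
    print(centre_and_scale([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```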

24
Normalisation Methods: Normalized ratios usually expressed as logs

To facilitate easier mathematical handling of the data, as well as comparisons over a wide range of expression levels, ratios are usually expressed as logs.
For example, if a gene is expressed at 4 times the level in the control as in the mutant, the mutant/control log ratio is $\log_2(1/4) = -2$. A log ratio of 0 is therefore indicative of a gene whose expression is the same in both conditions or treatments.

Ratio: $T_g = R_g / G_g$
Log ratio: $\log_2 T_g = \log_2(R_g / G_g)$
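A minimal sketch of the log ratio for one gene g, treating R_g and G_g as its red and green intensities. Which channel holds the mutant and which the control is an assumed convention here; the point is the symmetry of the log scale, where a four-fold decrease and a four-fold increase have the same magnitude.

```python
import math

def log_ratio(red, green):
    """log2(R_g / G_g) for one gene g."""
    return math.log2(red / green)

if __name__ == "__main__":
    print(log_ratio(100.0, 400.0))  # -2.0: four-fold lower in the red sample
    print(log_ratio(400.0, 100.0))  # 2.0: four-fold higher, same magnitude
    print(log_ratio(250.0, 250.0))  # 0.0: no change
```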
25
Normalisation Methods: Regression Normalisation

Regression normalization
Fit the linear regression model $y = ax + b$.
Test the significance of the intercept b. Fit a linear regression without b if it is insignificant.
Transform the data, e.g. $y' = (y - b)/a$, so that the fitted line maps onto the diagonal $y = x$.
Problem: the linear assumption may not hold due to a nonlinear trend.
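The three steps of regression normalisation can be sketched with ordinary least squares. This minimal version assumes a rough cut-off of |t| >= 2 for a significant intercept (about a 5% two-sided test when n is large) rather than an exact t-distribution quantile, and transforms the data with y' = (y - b)/a so that the fitted line lands on the diagonal. The function name and the cut-off are illustrative assumptions.

```python
import math

def regression_normalise(x, y, t_crit=2.0):
    """Fit y = a*x + b, drop b if insignificant, then map the fit onto y = x.

    t_crit = 2.0 is an assumed rough cut-off for the intercept's t statistic.
    Returns (transformed y values, a, b).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    # Standard error of the intercept from the residual variance
    rss = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) * (1 / n + mx ** 2 / sxx))
    if abs(b) <= t_crit * se_b:
        # Intercept not significant: refit the line through the origin
        a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        b = 0.0
    return [(yi - b) / a for yi in y], a, b
```

For data that genuinely pass through the origin the intercept is dropped and only the slope is removed; adding a constant offset to the same data makes the intercept significant, and it is then subtracted before dividing by a.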

26
Normalisation Methods: From Scatter Plot to MA Plot

Instead of plotting the two intensity values against one another (scatter plot), it is common to use an MA plot:
$M = \log_2(R/G)$: log ratio of the two intensities
$A = \log_2\sqrt{RG} = \tfrac{1}{2}\log_2(RG)$: mean log intensity of the two values
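The MA coordinates are what the Lowess normalisation on the next slide works with. A minimal sketch: compute M and A for each spot, estimate the trend of M against A with a locally weighted straight-line fit (tricube weights over the nearest `frac` fraction of spots), and subtract that trend so the normalised log ratios scatter around M = 0. This is a simplified Lowess without the robustness iterations of the full procedure, with roughly quadratic cost per array; the function names and the default `frac=0.5` are assumptions. Libraries such as statsmodels (`lowess`) or Bioconductor's limma (`normalizeWithinArrays`) provide tested implementations.

```python
import math

def to_ma(r, g):
    """M = log2(R/G) and A = (1/2) * log2(R*G) for one spot."""
    return math.log2(r / g), 0.5 * math.log2(r * g)

def local_linear_fit(a_vals, m_vals, a0, frac=0.5):
    """Estimate the M-versus-A trend at a0 with a tricube-weighted line fit."""
    n = len(a_vals)
    k = min(n, max(2, math.ceil(frac * n)))               # neighbours in the window
    h = sorted(abs(a - a0) for a in a_vals)[k - 1] or 1e-12  # window half-width
    w = [max(0.0, 1 - (abs(a - a0) / h) ** 3) ** 3 for a in a_vals]
    sw = sum(w)
    a_bar = sum(wi * a for wi, a in zip(w, a_vals)) / sw
    m_bar = sum(wi * m for wi, m in zip(w, m_vals)) / sw
    saa = sum(wi * (a - a_bar) ** 2 for wi, a in zip(w, a_vals))
    sam = sum(wi * (a - a_bar) * (m - m_bar) for wi, a, m in zip(w, a_vals, m_vals))
    slope = sam / saa if saa > 0 else 0.0
    return m_bar + slope * (a0 - a_bar)

def lowess_normalise(reds, greens, frac=0.5):
    """Return normalised M values: each spot's M minus the fitted trend at its A."""
    ma = [to_ma(r, g) for r, g in zip(reds, greens)]
    a_vals = [a for _, a in ma]
    m_vals = [m for m, _ in ma]
    return [m - local_linear_fit(a_vals, m_vals, a, frac) for m, a in ma]
```

Because each local fit is a weighted straight line, an intensity-dependent bias that is exactly linear in A is removed completely; curved trends are followed piece by piece, which a single global regression cannot do.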

27
Normalisation Methods: Lowess Normalization