Introduction to Bioinformatics Microarrays2: Microarray Data Normalisation

1
Introduction to Bioinformatics Microarrays2
Microarray Data Normalisation
  • Course 341
  • Department of Computing
  • Imperial College, London
  • Moustafa Ghanem

2
Lecture Overview
  • Background and Motivation
  • Introduction
  • Microarray experiments and microarray data
    analysis
  • Sources of variability
  • Experimental design
  • Normalisation Examples
  • Probe intensity values
  • Two colour arrays
  • Positive controls
  • Spatial normalisation within array
  • Between array normalisation
  • Normalisation Methods
  • Total intensity normalisation
  • Scaling and centring
  • Linear regression
  • MA plots and Lowess

3
Background: Microarrays
A Microarray works by exploiting the ability of
an mRNA molecule to hybridize to its complementary
DNA probe. The mRNA molecules in a target
biological sample are labelled using a
fluorescent dye and applied to the array. The
fluorescent label enables the detection of which
probes have hybridised (presence) via the light
emitted from the probe.
  • A Microarray is a device that detects the presence and
    abundance of labelled nucleic acids in a
    biological sample.
  • The Microarray consists of a solid surface onto
    which known DNA molecules have been chemically
    bonded at specific locations.
  • Each array location is typically known as a probe
    and contains many replicates of the same
    molecule.
  • The molecules in each array location are
    carefully chosen so as to hybridise only with
    mRNA molecules corresponding to a single gene.

4
Background: Microarray Data Analysis
Workflow diagram, in order:
  • Biological question
  • Experimental design (sample attributes, platform choice)
  • Microarray experiment, producing 16-bit TIFF files
  • Image analysis, producing (Rspot, Rbkg), (Gspot, Gbkg)
  • Normalization
  • Statistical analysis, clustering, data mining,
    pattern discovery, classification
  • Biological verification and interpretation
5
Motivation
  • Data generated from Microarray experiments are
    inherently highly variable.
  • First, there is the Law of Large Numbers
  • Any measurement of thousands of values will find
    some large differences due to chance (normal
    distribution)
  • However, the average gene does not change its
    expression across experiments
  • Must have replication (e.g. different patients,
    different experiments) and statistics to show
    that measured differences are real.
  • Second, there are also Systematic Sources of
    Variability
  • e.g. errors in scanning microarray images,
    differences between properties of Cy3 and Cy5
    channels, etc.
  • Must have systematic methods for addressing such
    errors.

6
Motivation
  • Normalisation is a general term for a collection
    of methods that are directed at reasoning about
    and resolving the systematic errors and bias
    introduced by microarray experimental platforms
  • Normalisation methods stand in contrast with the
    data analysis methods described in other lectures
    (e.g. differential gene expression analysis,
    classification and clustering).
  • Our overall aim is to be able to quantify
    measured/calculated variability, differentials
    and similarity
  • Are they biologically significant or just side
    effects of the experimental platforms and
    conditions?

7
Introduction: Sources of Microarray Data
Variability
The measured gene expression in any experiment
includes true gene expression, together with
contributions from many sources of variability.
  • There are several levels of variability in
    measured gene expression of a feature.
  • At the highest level, there is biological
    variability in the population from which the
    sample derives.
  • At an experimental level, there is:
  • variability between preparations and labelling of
    the sample,
  • variability between hybridisations of the same
    sample to different arrays, and
  • variability between the signal on replicate
    features on the same array.

8
Introduction: Sources of Microarray Data
Variability
There are many standard experimental protocols
that biologists need to follow when conducting
their experiments to minimize variability.
  • Population Variation
  • Whose mRNA are we using? May need to test
    different samples in parallel.
  • May need many replicates to study biological
    variation
  • Sample Treatment
  • Experimental conditions
  • Tissue preparation
  • Target Preparation
  • RNA isolation: need to use identical amounts
    of tissue and identical extraction methods, use a
    minimum number of steps, measure the amount of RNA
    and normalize the concentration
  • Labelling: need to account for and measure
    incorporation of label and normalize samples to
    the same concentration
  • Amount: need to add the same amount of label to each
    hybridization

9
Introduction: Sources of Microarray Data
Variability
Oligos reduce variability of probes compared to
PCR products. In-situ synthesis standardises
probe production, produces better spot quality
and reduces errors in image acquisition.
  • Arrays
  • Same sample may be hybridized to different arrays
    in different labs
  • PCR products: probes are prepared through
    amplification directly from cells; must add the same
    amount of product to each spot on the filter
  • Uniformity of spotting: must use an arraying tool
    for filter arrays or a robot for microarrays.
  • Treatment and handling of filters or slides
  • Hybridization and washing
  • Time: long hybridization ensures that
    hybridization goes to completion.
  • Temperature: most hybridisations are performed
    between 45 and 65 °C
  • Data acquisition
  • Image acquisition
  • Spot and background detection

10
Introduction: Biological and Technical Variability
  • Biological Variability is variation between
    individuals in the population and is independent
    of the microarray process itself
  • Population variability can be measured with pilot
    studies
  • Technical Variability is dependent on the
    microarray process itself.
  • Technical variability is measured in calibration
    experiments.
  • In good experiments, technical variation should
    be much less than biological variation

11
Introduction: Experimental Design
  • Tree representation of replicate experiments
  • The first level is at the level of biological
    replicates
  • This is followed by two independent mRNA
    extractions, and reciprocal Cy3 and Cy5 labelling
  • Finally on each array, each probe is printed in
    triplicate.
  • In this example, each data point in the
    experiment is replicated a total of 24 times.
  • Furthermore, in each microarray experiment, each
    gene (each probe or probe set) is really a
    separate experiment in its own right

12
Introduction: Conducting Good Experiments
13
Introduction: Gene Expression Matrices
14
Normalisation Examples: Probe Intensity Value
Typical problem: usually more variability at low
intensity
  • The raw intensities of signal from each spot on
    the array are not directly comparable. Depending
    on the types of experiments done, a number of
    different approaches to normalization may be
    needed. Not all types of normalization are
    appropriate in all experiments. Some experiments
    may use more than one type of normalization.
  • Reasonable assumption: intensities of fluorescent
    molecules reflect the abundance of the mRNA
    molecules (generally true, but could be
    problematic)
  • Example
  • intensity of gene A spot is 100 units in
    normal-tissue array
  • intensity of gene A spot is 50 units in
    cancer-tissue array
  • Conclusion: gene A's expression level in normal
    tissue is significantly higher than in cancer
    tissue

15
Normalisation Examples: Probe Intensity Value
Images showing examples of how background
intensity can be calculated
  • Problem: what if the overall background intensity
    of the normal-tissue array is 95 units while the
    background intensity of cancer-tissue array is 10
    units?
  • Solutions
  • Subtract background intensity value
  • Take ratio of spot intensity to background
    intensity (preferable)
  • In both cases have to decide where to measure
    background intensity (e.g. local to spot or
    globally per chip)
  • In general, there could be many factors
    contributing to the background intensity of a
    microarray chip
  • To compare microarray data across different
    chips, data (intensity levels) need to be
    normalized to the same level

16
Normalisation Examples: Two Colour Arrays
  • Reasonable assumption: for two colour arrays, in
    a self-self hybridization, we expect for each
    spot Red = Green
  • Problem: this is not necessarily true due to
    labelling effects, chemistry (dye properties),
    scanner properties, etc.
  • Dye bias in two-channel microarrays: intensity in
    one channel may be higher than the other
  • Solutions
  • Dye swapping experiments: in the first replicate,
    label control with red and experiment with green;
    in the second replicate, swap colours
  • Calibration experiments (self vs. self
    hybridisation): label the same extract with both
    colours and calculate variation

17
Normalisation Examples: Two Colour Arrays
  • Error correction

Fitted regression line: y = ax
Expected line: y = x
  • Possible ways of correction
  • 1. dividing all y by a
  • 2. multiplying all x by a
  • Can easily be extended when the regression line is
    y = ax + b
18
Normalisation Examples: Ratio of Signal to
Positive Control
How does this approach compare to the Affymetrix
PM/MM probes?
  • Problem: is there any cross hybridisation?
  • Solution: it is often useful to spike the
    labelling reaction with some foreign RNA or DNA
    that is not normally in the RNA population.
  • The signal s_i for gene i would therefore be the raw
    counts g_i divided by the median of the counts for
    the vector spots (the spots for the spiked control).
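
A minimal sketch of this ratio, assuming the raw counts for the genes and for the spiked control spots have already been extracted. The function name, gene labels and counts are hypothetical.

```python
from statistics import median


def normalise_to_control(raw_counts, control_counts):
    """Divide each gene's raw count by the median count of the control spots."""
    control_median = median(control_counts)
    return {gene: count / control_median for gene, count in raw_counts.items()}


# Illustrative numbers only.
raw_counts = {"geneA": 1200.0, "geneB": 300.0, "geneC": 600.0}
control_counts = [580.0, 600.0, 650.0]   # spots for the spiked control

signal = normalise_to_control(raw_counts, control_counts)
print(signal)
```

Using the median rather than the mean keeps a single aberrant control spot from shifting every signal on the array.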

19
Normalisation Examples: Ratio of Signal to
Positive Control
  • Normalization of signal for each gene to a ratio
    makes it possible to compare ratios between
    experiments, provided that the spiked controls
    are the same in all experiments.
  • Normalization to a positive control is typically
    used in single-label experiments. Comparison of
    one experiment to another can either be done by
    plotting signal s_i directly on a graph, or
    signals from two experiments can be converted
    into a ratio, usually by choosing one treatment
    as a control.
  • For example, in a time course, a 0 hour time
    point might be chosen, and signal from all other
    time points divided by the signal for the 0 hour
    time point, to give a ratio.

20
Normalisation Examples: Spatial variation within
array
  • Problem: signal varies according to spot location
  • Particularly at the corners: less hybridization solution
  • Also because of the print-tip group of the robot
  • Solutions
  • Calculate ratio to mean or total intensity value
  • Use Locally Weighted Regression (Lowess)
  • Use Block-Block Lowess

21
Normalisation Examples: Between Array Normalisation
  • Assumption: the overall intensities across two
    arrays should be similar
  • Problem: not always the case
  • Solution 1: ensure that data points in the
    two-intensity coordinate system are roughly
    centered around the diagonal

Solution 2: use total intensity normalization for
large number
22
Normalisation Methods: Between Array Normalisation
  • Mean/Median centering: mean/median intensity of
    every chip brought to same level
  • Total intensity normalization: scaling factor
    determined by summing intensities
  • Spiked-control, housekeeping normalization
    (Positive Controls)
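
The first two methods can be sketched as simple rescalings. This is an illustration under stated assumptions rather than a definitive implementation: the function names are hypothetical, the intensities are made up, and one array is arbitrarily treated as the reference.

```python
from statistics import median


def total_intensity_factor(reference, array):
    """Scaling factor that makes the array's summed intensity match the reference."""
    return sum(reference) / sum(array)


def scale_to_median(array, target_median):
    """Rescale an array so that its median intensity equals target_median."""
    factor = target_median / median(array)
    return [value * factor for value in array]


# Made-up intensities: array2 is uniformly half as bright as array1.
array1 = [100.0, 200.0, 300.0, 400.0]
array2 = [50.0, 100.0, 150.0, 200.0]

# Total intensity normalisation.
k = total_intensity_factor(array1, array2)
array2_by_total = [value * k for value in array2]

# Median centring: bring array2's median to array1's median.
array2_by_median = scale_to_median(array2, median(array1))

print(k, array2_by_total, array2_by_median)
```

Both approaches rest on the assumption stated earlier, that the overall intensities of the arrays being compared should be similar.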

23
Normalisation Methods: Centring and Scaling
  • Data is scaled to ensure that the means and the
    standard deviations of all of the distributions
    are equal. For each measurement on the array,
    subtract the mean measurement of the array and
    divide by the standard deviation. Following
    centring and scaling, the mean measurement on
    each array will be zero, and the standard
    deviation will be 1.
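
A short sketch of centring and scaling for one array. The values are made up, and the population standard deviation is used here as an assumption; the sample standard deviation would work the same way.

```python
from statistics import mean, pstdev


def centre_and_scale(values):
    """Subtract the array mean and divide by the array standard deviation."""
    m = mean(values)
    s = pstdev(values)   # population SD; an assumption for this sketch
    return [(v - m) / s for v in values]


# Made-up measurements with mean 5 and population SD 2.
array = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = centre_and_scale(array)

print(z)
print(mean(z), pstdev(z))
```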

24
Normalisation Methods: Normalized ratios usually
expressed as logs
  • To facilitate easier mathematical handling of the
    data, as well as comparisons over a wide range of
    expression levels, ratios are usually expressed
    as logs.
  • For example, if a gene is expressed at 4 times
    the level in the control compared with the mutant,
    the log ratio of mutant to control is
    log2(1/4) = -2. A log ratio of 0 is therefore
    indicative of a gene whose expression is the same
    in both conditions or treatments.

Ratio: $T_g = \dfrac{R_g}{G_g}$
Log ratio: $\log_2(T_g) = \log_2\left(\dfrac{R_g}{G_g}\right)$
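
The symmetry that makes log ratios convenient can be checked with a few made-up intensities. The function name is hypothetical; R and G are the red and green intensities for one gene.

```python
from math import log2


def log_ratio(red, green):
    """log2 of the red/green intensity ratio for one gene."""
    return log2(red / green)


up = log_ratio(400.0, 100.0)         # four-fold higher in red
down = log_ratio(100.0, 400.0)       # four-fold lower in red
unchanged = log_ratio(250.0, 250.0)  # same in both channels

print(up, down, unchanged)
```

A four-fold increase and a four-fold decrease sit at equal distances on either side of zero, whereas the raw ratios 4 and 0.25 do not.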
25
Normalisation Methods: Regression Normalisation
  • Regression normalization
  • Fit the linear regression model y = ax + b
  • Test the significance of the intercept b. Fit a
    linear regression without b if it is
    insignificant.
  • Transform the data
  • Problem: assumption may not hold due to nonlinear
    trend
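
The fit and transform steps can be sketched with ordinary least squares. This is an illustration only: the function name and data are made up, and the significance test on the intercept b is not shown.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up intensities following y = 2x + 3 exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi + 3.0 for xi in x]

a, b = fit_line(x, y)

# Transform the data: undo the fitted line so that y_norm is comparable to x.
y_norm = [(yi - b) / a for yi in y]

print(a, b)
print(y_norm)
```

On real data the points scatter around the fitted line, so the transformed values only agree with x on average, and a curved trend is not removed at all.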

26
Normalisation Methods: From Scatter Plot to MA Plot
  • Instead of plotting the two intensity values
    against one another (Scatter plot), it is common
    to use an MA plot
  • $M = \log_2(R/G)$: log ratio of the two intensities
  • $A = \log_2\sqrt{RG} = \tfrac{1}{2}\log_2(RG)$: mean log
    intensity of the two values
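
A minimal sketch of the transform from a pair of intensities (R, G) to a point (M, A) on the MA plot. The function name and the intensities are illustrative.

```python
from math import log2


def ma_transform(red, green):
    """Return (M, A) for one spot: log ratio and mean log intensity."""
    m = log2(red / green)
    a = 0.5 * log2(red * green)
    return m, a


# Made-up spot: red is four times green.
m, a = ma_transform(1024.0, 256.0)
print(m, a)
```

Here A is the average of log2(1024) = 10 and log2(256) = 8, which is why it is read as the mean log intensity of the spot.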

27
Normalisation Methods: Lowess Normalization
  • Locally weighted least squares regression
  • Assumption: variation in data is intensity
    dependent
  • Smooths the intensity function
  • Lowess is typically applied to M-A plots
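
To make the idea concrete, the sketch below implements a deliberately simplified lowess: a tricube-weighted straight line is fitted around each point, with no robustness iterations, so it is not equivalent to a full library implementation. The function name, the `frac` parameter and the data are assumptions for illustration. The fitted trend of M against A is subtracted from M to give normalised log ratios.

```python
from math import ceil


def lowess_fit(x, y, frac=0.5):
    """Simplified lowess: for each x[i], fit a tricube-weighted straight
    line to its nearest neighbours and return the fitted value at x[i]."""
    n = len(x)
    k = max(3, ceil(frac * n))              # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = [abs(xj - x0) for xj in x]
        h = sorted(dists)[k - 1]            # bandwidth: distance to k-th nearest
        if h == 0.0:
            # k or more points share this x value: fall back to their mean y
            same = [yj for dj, yj in zip(dists, y) if dj == 0.0]
            fitted.append(sum(same) / len(same))
            continue
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up MA data with a purely intensity-dependent bias in M.
A = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
M_raw = [0.1 * (a - 10.0) for a in A]

trend = lowess_fit(A, M_raw, frac=0.5)
M_norm = [m - t for m, t in zip(M_raw, trend)]

print([round(v, 6) for v in M_norm])
```

Because the bias in this toy data is a straight line, the local fits recover it and the normalised M values are zero up to rounding. On real arrays the trend is usually curved, which is the case lowess is intended for.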

28
Summary
  • Normalisation is used to identify whether variation is
    due to experimental conditions.
  • Typical sources of variation are
  • Population, Sample, Target, Array (Probe),
    Hybridisation, Data Acquisition
  • Different Normalisation Examples
  • Probe intensity values
  • Two colour arrays
  • Positive controls
  • Spatial normalisation within array
  • Between array normalisation
  • Common Normalisation Methods
  • Mainly scaling factors and regression