Normalisation and Analysis of the Affymetrix Data

1
Normalisation and Analysis of the Affymetrix Data
  • David J Craigon

2
What I am not going to talk about
  • General microarray topics
  • Biology

3
The introduction
4
Affymetrix workflow
  • Biological sample of some sort
  • Extract mRNA
  • Amplify
  • Label and Fragment
  • Hybridise to a chip
  • Scan chip
  • Find features in scan
  • Analyse down to one number per gene
5
What do we want to find out?
  • We want to find out how much mRNA of each type
    was in the original sample

6
Each of these steps needs to be proportional
  • Biological sample of some sort
  • Extract mRNA
  • Amplify
  • Label and Fragment
  • Hybridise to a chip
  • Scan chip
  • Find features in scan
  • Analyse down to one number per gene
7
  • Biological sample of some sort
  • Extract mRNA
  • Amplify
  • Label and Fragment
  • Hybridise to a chip
  • Scan chip
  • Find features in scan
  • Analyse down to one number per gene (this talk
    is about this bit)
8
Affymetrix Chips
  • On an Affymetrix chip each oligo takes up a
    square
  • The RNA extracted from the plant is first
    amplified and then labelled, which allows the
    scanner to see it.
  • The RNA is then hybridised to the array. Matching
    RNA for that square sticks to the square, and can
    be seen by the scanner.
  • By observing the intensity of a square, the
    amount of RNA bound to that oligo can be
    calculated

9
Design of the oligos
(Figure labels: 5' and 3' ends)
  • Series of oligos designed for one gene
  • Each oligo comes in two versions

10
Match and mismatch
  • The exact match is a section of the mRNA sequence
    you wish to probe for
  • The mismatch is identical except for one base
    difference from its exact match counterpart, and
    is used to calculate a background.
  • There are typically 11 probe pairs scattered
    around the chip; together they are called a
    probe set.
  • By combining the expression values for a probe
    set, a value for the expression of mRNA can be
    found.

11
EXP, DAT, CEL, CHP files
  • EXP file: the experiment file
  • DAT file: the picture, like a TIFF
  • CEL file: an unnormalised number for each probe
  • CHP file: one number for each probe set

12
What do you think of it so far?
  • So far
  • What we want to find out is the amount of each
    mRNA in the starting sample.
  • The mRNA hybridises to a series of probes.
  • We can get a number for each probe from the CEL
    file.

13
The rest of this talk
  • We are going to go through four distinct ways of
    determining Signal values from CEL file data
  • MAS 4
  • MAS 5
  • MBEI (dChip)
  • RMA

14
Mismatch probes in detail
15
All about mismatch probes
Target sequence      ATGCTGTACAATAGCTTGATACTGG
Perfect match probe  ATGCTGTACAATAGCTTGATACTGG
Mismatch probe       ATGCTGTACAATCGCTTGATACTGG
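As a quick check on the sequences above, this minimal Python sketch finds where the mismatch probe differs from the perfect match probe. The function name is made up for illustration; the two sequences are the ones shown on the slide.

```python
def mismatch_positions(pm: str, mm: str) -> list[int]:
    """Return the 0-based positions where two equal-length probes differ."""
    if len(pm) != len(mm):
        raise ValueError("probes must have the same length")
    return [i for i, (a, b) in enumerate(zip(pm, mm)) if a != b]


pm = "ATGCTGTACAATAGCTTGATACTGG"  # perfect match probe from the slide
mm = "ATGCTGTACAATCGCTTGATACTGG"  # mismatch probe from the slide

positions = mismatch_positions(pm, mm)
print(len(pm), positions)  # prints: 25 [12]
```

The probes are 25 bases long and differ only at index 12, which is the middle (13th) base.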
16
Why do we have mismatch probes?
  • Mismatch probes (MM) are trying to detect
    background.
  • The mismatch probes are supposed to detect things
    that are close but not an exact match.
  • It is assumed that these things also bind to the
    perfect match (PM), erroneously.

17
Yes folks, it's Expression Method No 1!
  • The original method that was used by MAS 4

18
MAS 4 Algorithm
  • For a probe set
  • A is the set of probes you haven't thrown away
    due to being outliers
  • j runs over the probe pairs in the probe set
  • In English, the formula is very simple: throw
    away the outliers, then simply average the
    differences between PM and MM of the probes
    you've got left.
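The formula itself did not survive in this transcript. Going only by the description above, it amounts to AvgDiff = (1/|A|) × the sum over j in A of (PM_j − MM_j). The Python sketch below follows that description. The outlier rule (drop pairs whose difference lies more than `n_sd` standard deviations from the mean) is an assumption made for illustration, not necessarily the exact MAS 4 rule, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def average_difference(pm, mm, n_sd=3.0):
    """Average PM - MM over the probe pairs that are not flagged as outliers.

    The outlier rule here is an illustrative assumption: a pair is dropped
    when its difference is more than n_sd standard deviations from the mean.
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    centre, spread = mean(diffs), pstdev(diffs)
    kept = [d for d in diffs if spread == 0 or abs(d - centre) <= n_sd * spread]
    return mean(kept)


# Toy probe set with four probe pairs; nothing is far enough out to be dropped.
print(average_difference([120, 150, 130, 140], [100, 110, 105, 100]))  # 31.25
```

Because the differences are averaged on the raw scale, a single very bright probe pair can still dominate the result unless it is removed as an outlier.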

19
Problems with the MAS4 algorithm
  • Better fit with log(PM) preferred

20
Expression Method No 2!
  • MAS 5 method.
  • Still used by GCOS, so it is the current
    Affymetrix-supplied method.

21
Normalisation Procedure
  • Before any work is done with the CEL data, the
    CEL file is normalised.
  • Corrects for intra-chip differences

22
Normalisation Procedure
  • Divides the chip into K zones (by default, 16
    zones)
  • Select the lowest 2% of probes (of any
    description) in each zone
  • Assume these are switched off

23
Normalisation Procedure
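  • Calculate the mean and SD of these switched-off
    probes for each zone.
  • The mean is used as that zone's background.
  • Each point's local background is a weighted
    combination of the zone backgrounds, weighted by
    the point's distance from each zone
  • Subtract the local background from each probe.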
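The zone procedure above can be sketched in Python as follows. This is a simplified illustration, not the Affymetrix implementation: the chip is a small list of lists, the zones form a square grid that divides the chip evenly, the weight `1 / (distance² + smooth)` and the `smooth` value are assumptions, and the per-zone SD (which the real algorithm uses to stop values going negative) is left out. All function names are hypothetical.

```python
from statistics import mean


def zone_backgrounds(chip, zones_per_side=4, fraction=0.02):
    """Return (centre_row, centre_col, background) for each zone.

    The background of a zone is the mean of its lowest `fraction` of cells.
    Assumes the chip dimensions divide evenly by zones_per_side.
    """
    zone_h = len(chip) // zones_per_side
    zone_w = len(chip[0]) // zones_per_side
    zones = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            cells = sorted(
                chip[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            lowest = cells[: max(1, int(len(cells) * fraction))]
            centre = (zr * zone_h + zone_h / 2, zc * zone_w + zone_w / 2)
            zones.append((centre[0], centre[1], mean(lowest)))
    return zones


def local_background(row, col, zones, smooth=100.0):
    """Distance-weighted average of the zone backgrounds (weights are assumed)."""
    weights = [1.0 / ((row - zr) ** 2 + (col - zc) ** 2 + smooth) for zr, zc, _ in zones]
    return sum(w * bg for w, (_, _, bg) in zip(weights, zones)) / sum(weights)


def subtract_background(chip, zones_per_side=4):
    """Subtract each cell's local background from its intensity."""
    zones = zone_backgrounds(chip, zones_per_side)
    return [
        [value - local_background(r, c, zones) for c, value in enumerate(row)]
        for r, row in enumerate(chip)
    ]


# Toy 4x4 "chip" that is dim on the left and brighter on the right.
toy_chip = [[10.0, 10.0, 30.0, 30.0] for _ in range(4)]
toy_zones = zone_backgrounds(toy_chip, zones_per_side=2)
print(local_background(0, 0, toy_zones), local_background(0, 3, toy_zones))
```

On the toy chip the estimated background is lower at the left edge than at the right edge, which is the intra-chip difference the procedure is meant to correct.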

24
MAS 5 Algorithm
  • For a probe set
  • Tukey's biweight is an average that minimises the
    effect of outliers.
  • IM is the ideal mismatch. This is the same as
    the MM intensity, except where the MM is greater
    than the PM, in which case a new MM value is
    calculated based on other probes nearby
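The formula for this slide is missing from the transcript. From the two definitions above, it is presumably of the form Signal = TukeyBiweight(log2(PM_j − IM_j)) over the probe pairs j in the probe set. The sketch below implements a one-step Tukey biweight in Python and applies it that way. The tuning constants `c=5` and `epsilon=0.0001` are assumed values, the ideal mismatch is taken as given rather than computed, and the function names are hypothetical.

```python
from math import log2
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers.

    Points further than c median-absolute-deviations from the median get
    weight zero; c and epsilon are assumed tuning constants.
    """
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    weights = []
    for v in values:
        u = (v - centre) / (c * mad + epsilon)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def mas5_style_signal(pm, ideal_mm):
    """Log2-scale signal for one probe set; requires PM > IM for every pair."""
    return tukey_biweight([log2(p - im) for p, im in zip(pm, ideal_mm)])


# The ordinary mean of these values is 22; the biweight ignores the 100.
print(tukey_biweight([1, 2, 3, 4, 100]))
```

Working on the log scale and using a robust average are the two main changes from the MAS 4 average difference.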

25
(No Transcript)
26
MAS4 to MAS5 comparison
27
Signal Normalisation
  • To try to eliminate chip-to-chip variability.
  • Sort the signal values and remove the top and
    bottom 2%
  • Calculate a scaling factor to adjust the mean of
    this middle 96% to 100 (configurable, and variable)
  • Multiply all signal values by the scaling factor
  • Affymetrix state that scaling factors should be
    similar for arrays to be comparable
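The scaling steps above are simple enough to sketch directly. This Python example assumes a target of 100 and a 2% trim at each end, as on the slide; the function names are made up for illustration.

```python
from statistics import mean


def scaling_factor(signals, target=100.0, trim=0.02):
    """Factor that moves the mean of the middle 96% of signals to `target`."""
    ordered = sorted(signals)
    cut = int(len(ordered) * trim)  # number of values dropped at each end
    middle = ordered[cut: len(ordered) - cut]
    return target / mean(middle)


def scale_signals(signals, target=100.0, trim=0.02):
    """Multiply every signal on the chip by that chip's scaling factor."""
    factor = scaling_factor(signals, target, trim)
    return [s * factor for s in signals]


# Toy chip: mostly 50, with one very low and one very high value.
toy_signals = [50.0] * 98 + [0.0, 1e6]
print(scaling_factor(toy_signals))  # 2.0
```

The extreme values are trimmed before the mean is taken, so they do not affect the scaling factor, but they are still multiplied by it afterwards.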

28
Expression Method No 3!
  • The MBEI method of Li and Wong.
  • Found in dChip, so often known as the dChip
    method.

29
Observation
30
Observation
  • The probes are vastly variable in effectiveness
  • Li and Wong point out that the difference between
    probes is much greater than the difference
    between arrays!
  • They contend that any proper model should take
    this into account.

31
MBEI model
32
MBEI model
  • Baseline response due to noise
  • Rate of increase of MM probe as signal increases
    (really? See later)
  • Error term
  • Rate of increase of PM probe as signal increases
    (separate for each probe)
  • Expression value (the thing we are interested in)
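The equation these labels pointed at is missing from the transcript. For reference, the model as published by Li and Wong is usually written as below; the symbol names are theirs, not taken from the slide, so treat the mapping to the labels as a best guess. Here i indexes arrays and j indexes probe pairs.

```latex
\begin{aligned}
MM_{ij} &= \nu_j + \theta_i \alpha_j + \varepsilon \\
PM_{ij} &= \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \varepsilon \\
PM_{ij} - MM_{ij} &= \theta_i \phi_j + \varepsilon_{ij}
\end{aligned}
```

In this notation \nu_j is the baseline response, \alpha_j is the rate of increase of the MM probe, \phi_j is the additional rate of increase of the PM probe, \theta_i is the expression value and \varepsilon is the error term. The third line is what is actually fitted: the baseline and the MM response cancel when MM is subtracted from PM.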
33
Model is fitted over all chips
  • Processes an entire experiment at once
  • Model is fitted by minimising the residual sum
    of squares
  • In their paper on the subject they talk a lot
    about how you can use this model to detect
    outliers, scratches on the array, etc. I'm not
    going to talk about that.
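One way to minimise the residual sum of squares for a model of the form difference = expression × probe sensitivity is alternating least squares, sketched below in Python. This is a bare-bones illustration under stated assumptions, not the dChip implementation: it has no outlier handling, it assumes the input is PM − MM for one probe set with at least one non-zero value, and the constraint that the squared sensitivities sum to the number of probes is an assumed convention to make the scale identifiable. The names `fit_mbei`, `theta` and `phi` are chosen for this example.

```python
def fit_mbei(diffs, iterations=50):
    """Fit diffs[i][j] ~ theta[i] * phi[j] by alternating least squares.

    diffs has one row per array and one column per probe pair (PM - MM).
    Returns (theta, phi): one expression value per array and one
    sensitivity per probe.
    """
    n_arrays, n_probes = len(diffs), len(diffs[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_arrays
    for _ in range(iterations):
        # With phi fixed, each theta[i] is a least-squares slope through the origin.
        phi_sq = sum(p * p for p in phi)
        theta = [sum(row[j] * phi[j] for j in range(n_probes)) / phi_sq for row in diffs]
        # With theta fixed, the same holds for each phi[j].
        theta_sq = sum(t * t for t in theta)
        phi = [
            sum(diffs[i][j] * theta[i] for i in range(n_arrays)) / theta_sq
            for j in range(n_probes)
        ]
        # Rescale so that sum(phi[j]^2) equals the number of probes;
        # the products theta[i] * phi[j] are unchanged by this step.
        scale = (sum(p * p for p in phi) / n_probes) ** 0.5
        phi = [p / scale for p in phi]
        theta = [t * scale for t in theta]
    return theta, phi


# Toy experiment: 3 arrays, 3 probes, built from known values with no noise.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5]
toy_diffs = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_mbei(toy_diffs)
print(theta, phi)
```

Because every array shares the same probe sensitivities, the whole experiment has to be processed together, which is why the model is fitted over all chips at once.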

34
RMA paper observations
35
A spike-in experiment from the RMA paper
  • It would be useful if we had an experiment where
    we knew the answer
  • Run a series of experiments with a fixed
    background, but spike in some artificial RNA for
    a series of probes, at different concentrations.

36
Mismatch probes
  • Mismatch probes are supposed to measure how much
    similar material hybridises to the probes, to
    detect background for the PM probes.
  • The background should be at a relatively low
    level most of the time

37
Yikes!
  • Actually MM > PM between 33% and 40% of the time!

38
Mismatch probes
  • Mismatch probes are supposed to measure how much
    similar material hybridises to the probes, to
    detect background for the PM probes.
  • The amount of this stuff shouldn't depend on how
    much interesting RNA there is about

39
Man the lifeboats!
40
Some observations from the RMA paper
Perfect match probes appear to be additive (on
the log scale)
41
  • The amount of signal does affect mismatch probes.
  • Clearly some of the useful mRNA is hybridising to
    the MM probes.
  • This kind of shock has led to some people
    abandoning the use of MM probes altogether!

42
What's going on?
43
Perfect match probes
In RMA, probe effects are assumed to be additive
on the log scale
44
How RMA (roughly) works
45
RMA process
  • Normalise array
  • Fit model

46
Normalisation procedure involves adjusting
distributions
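The slide does not name the method. Published RMA uses quantile normalisation, which forces every chip to have the same distribution of intensities, so the Python sketch below assumes that is what is meant. Ties are broken by position here, which is a simplification, and the function name is made up for illustration.

```python
def quantile_normalise(arrays):
    """Give every array the same distribution of values.

    arrays: list of equal-length lists, one list of intensities per chip.
    Returns new lists; the inputs are not modified.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: the mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from lowest to highest
        normalised = [0.0] * n
        for rank, i in enumerate(order):
            normalised[i] = reference[rank]  # replace each value by the reference at its rank
        result.append(normalised)
    return result


print(quantile_normalise([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalisation each chip keeps its own ranking of probes, but the sorted values are identical on every chip.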
47
RMA process
  • Normalise array
  • Fit model

48
Fit model
  • Correct background using estimate from all
    mismatch probes for each array.
  • Fit model

Additive probe affinity effect for this probe
over all slides
Background corrected PM value
Log scale expression value
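The equation behind these labels is missing from the transcript. From the labels, it is presumably log2(background-corrected PM_ij) = e_i + a_j + error, where e_i is the log-scale expression value on array i and a_j is the affinity effect of probe j. Published RMA fits this with median polish, which the slides do not mention, so the Python sketch below is an assumption about the fitting step. The function names and the toy data are hypothetical.

```python
from math import log2
from statistics import median


def median_polish(table, iterations=10):
    """Fit table[r][c] ~ overall + row_effect[r] + col_effect[c] using medians."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [row[:] for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(iterations):
        # Sweep the row medians out of the residuals into the row effects.
        for r in range(n_rows):
            m = median(resid[r])
            row_eff[r] += m
            resid[r] = [v - m for v in resid[r]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the column medians out of the residuals into the column effects.
        for c in range(n_cols):
            m = median([resid[r][c] for r in range(n_rows)])
            col_eff[c] += m
            for r in range(n_rows):
                resid[r][c] -= m
        m = median(row_eff)
        overall += m
        row_eff = [e - m for e in row_eff]
    return overall, row_eff, col_eff


def rma_style_summary(pm_corrected):
    """Log2 expression per array for one probe set.

    pm_corrected[j][i] is the background-corrected PM value of probe j on array i.
    """
    logged = [[log2(v) for v in row] for row in pm_corrected]
    overall, _probe_effects, array_effects = median_polish(logged)
    return [overall + a for a in array_effects]


# Toy probe set: 3 probes (rows) on 3 arrays (columns), exactly additive in log2.
toy_pm = [
    [2 ** 7, 2 ** 8, 2 ** 9],
    [2 ** 8, 2 ** 9, 2 ** 10],
    [2 ** 9, 2 ** 10, 2 ** 11],
]
print(rma_style_summary(toy_pm))
```

On the toy data the three probes differ by a constant on the log scale, so the fit separates cleanly into one expression value per array (8, 9 and 10) and one affinity per probe.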
49
In summary then
  • There are various ways you can get from a CEL
    file to expression estimates.
  • These models are derived by considering the
    behaviour of PM and MM probes
  • Both dChip and RMA show better results than the
    standard Affy algorithm
  • MM probes in particular behave contrary to how
    you would expect.

50
Enough theory: how do you actually do these
things?
  • The MAS5 algorithm can be performed using (erm)
    MAS5!
  • dChip is a piece of software that will be making
    an appearance later this afternoon, and can do
    the MBEI algorithm
  • The RMA authors have a piece of software called
    RMAExpress, which runs RMA on Windows.
  • All of these algorithms can be run using
    Bioconductor packages in R.

51
(No Transcript)