Microarray Basics - PowerPoint PPT Presentation

Provided by: clinlabsP
1
Microarray Basics
  • Part 1: Choosing a platform, setting up, data
    preprocessing

2
Experimental design
  • What type of microarray
  • What overall design strategy
  • How many replicates

3
Type of Microarray
  • One colour or two colour
  • cDNA, long oligo or short oligo
  • Genome wide or custom
  • Considerations: availability, cost, represented
    genes, need, perceived accuracy/reproducibility
4
Experimental Design Strategies
5
How many replicates?
                 True situation
Your call        Not diff expressed           Diff expressed
NS               correct                      Type 2 error (power)
S                Type 1 error (confidence)    correct
Technical replicates do NOT count as different
samples in the power calculation
6
Power analysis requires decisions about
  • Difference in mean that you are trying to detect
  • The std dev of the population variability
  • Power you are trying to achieve
  • Significance level that you are trying to achieve
Experimental design
You have a 10,000 gene chip, and want to identify
95% of the genes that are 2 fold up or down
regulated in samples following treatment. You
will tolerate 1 false positive call out of the
10,000 genes tested. The coefficient of
variability in your population is 50%. You are
doing a paired analysis.
One can conclude that you will need 22 patients
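
The four decisions above map onto the inputs of a sample size formula. The following is a minimal sketch using the normal approximation for a two-sided paired test; the function name, the parameter names and the value of `sd_diff` are illustrative assumptions, not taken from the slides. Because the result depends on how the coefficient of variability is converted to a standard deviation of paired log differences, this sketch is not expected to reproduce the figure of 22 patients.

```python
import math
from statistics import NormalDist


def paired_sample_size(delta, sd_diff, alpha, power):
    """Approximate number of pairs needed for a two-sided paired test.

    Normal approximation: n = ((z_(1-alpha/2) + z_power) * sd_diff / delta)^2.
    An exact t-based calculation would give a slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = ((z_alpha + z_power) * sd_diff / delta) ** 2
    return math.ceil(n)


# 2 fold change = 1 unit on the log2 scale; 1 false positive in 10,000 tests;
# 95% power. sd_diff = 0.7 is a made-up value for illustration only.
n_pairs = paired_sample_size(delta=1.0, sd_diff=0.7, alpha=1 / 10_000, power=0.95)
print(n_pairs)
```

Lowering the tolerated false positive rate or raising the power both increase the number of pairs, which is why the per-gene significance level matters so much on a 10,000 gene chip.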
7
Technical replicates
  • Most publications recommend at least 3 if that is
    possible
  • These are considered to be replicates at the
    level of the experimental platform
  • Beware of doing 2 now and hoping to add one more
    later
  • In downstream analysis, it is generally suggested
    to use the average of technical replicates; these
    are not different samples for analysis
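
Averaging technical replicates before analysis can be sketched as follows. This is a minimal example, assuming the data are held as (sample, gene, log ratio) tuples; that layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_technical_replicates(measurements):
    """Collapse technical replicates to one value per (sample, gene).

    `measurements` is an iterable of (sample_id, gene_id, log_ratio) tuples,
    where repeated (sample_id, gene_id) pairs are technical replicates.
    """
    grouped = defaultdict(list)
    for sample_id, gene_id, value in measurements:
        grouped[(sample_id, gene_id)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


replicates = [
    ("patient1", "geneA", 0.9),
    ("patient1", "geneA", 1.1),
    ("patient1", "geneA", 1.0),
    ("patient2", "geneA", -0.5),
]
averaged = average_technical_replicates(replicates)
```

After this step each sample contributes a single value per gene, so the replicate chips are not counted as extra samples.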

8
RNA required to get started
  • Source of both experimental and reference RNA
  • Will need about 10-20 µg of total RNA from each
    source for each experiment or chip
  • This RNA needs to be of high quality
  • How do you check quality?

9
Common sources of RNA
  • Cultured animal cells: generally easy to disrupt
    and get large amounts of high quality RNA
  • Animal tissues: some require harsh disruption
    treatments (such as soft tissues like kidney or
    liver) and some may require additional treatments
    (such as fatty tissues or fibrous tissues that
    may require more stringent lysis)
  • Blood: may be influenced by the anticoagulant in
    the collection system, and also seems to contain
    enzyme inhibitors
  • Plant material: some metabolites make
    purification difficult; extractions may also be
    highly viscous
  • Bacteria: may want to consider stabilization
10
Checking RNA quality
  • Conventional methods include agarose gel
    electrophoresis to look for evidence of
    degradation
  • Spectrophotometric readings to give an idea of
    purity
  • Bioanalyzer scan to provide integrity and
    quantity measurements

11
Provides an RIN (RNA integrity number)
Provides a
Requires 1 µl of 50 ng/µl stock
12
RNA amplification
  • When quantity of RNA is limited, may have to
    consider amplification
  • Several strategies, but need to decide up front
    if you want sense or antisense amplified material

15
What do you get back after an experiment?
  • TIFF images: one image for each fluor used in the
    experiment, so the same chip is scanned twice (or
    more times if multiple scans were done to
    compensate for intensity)
  • Spreadsheet of quantitated data

16
TIFF images
  • Generally named as bar code_fluor_PMT
    setting_laser setting
  • These settings will not necessarily be the same
    for your two scans from the same chip- they are
    manipulated to try to produce scans of even
    intensity from the two fluors
  • The final image should have only a few white
    spots over the whole array; these represent
    saturated spots
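
The naming convention above can be read back programmatically. This is a minimal sketch, assuming the four parts are separated by underscores and contain none themselves; the example filename is invented for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class ScanInfo(NamedTuple):
    barcode: str
    fluor: str
    pmt: str    # PMT setting, kept as text because its format is not specified
    laser: str  # laser setting, kept as text for the same reason


def parse_scan_name(filename):
    """Split a scan name of the form barcode_fluor_PMT_laser into its parts."""
    parts = Path(filename).stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected scan name: {filename!r}")
    return ScanInfo(*parts)


# Hypothetical filename, not a real scan.
info = parse_scan_name("12345678_Cy5_65_80.tif")
```

Comparing the PMT and laser fields of the two scans from one chip shows directly how the settings were adjusted between fluors.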

17
How can you tell anything about the quality of
your data?
  • Easiest way to start is to look at your TIFF
    images
  • Look for blank areas on the slide
  • Look for areas where one fluor consistently is
    brighter than the other
  • Look for gradients of intensity
  • Differentiate between artifacts introduced by
    slide quality and those by RNA quality and those
    by experimental procedure

18
Slide issues- printing
  • Presence of donuts
  • Smeared spots
  • Scratches on surface of slide
  • Non circular spots
  • Spots off the grid
  • No signals in areas
  • Consistent problems with the same area of each
    subarray

19
RNA quality issues
  • General low intensity
  • Consistent problems with one sample, regardless
    of fluor used
  • High level of background, grainy over entire
    slide

20
Experimental issues
  • One fluor consistently not giving good signal
    regardless of RNA sample labelled
  • Areas of high local background, not covering
    entire slide
  • Obvious intensity gradients
  • Bubbles over surface of chip

21
  • After looking at your images you should have a
    sense of whether or not these data are likely to
    be clean and high enough quality to warrant
    proceeding
  • If not you need to try to determine where the
    problem originates

22
Image processing
  • Choice of methods for quantitating image
  • Fixed circle: good for arrays with regular sizes
    of spots
  • Variable circle: better for arrays with irregular
    sizes
  • Histogram: best for arrays with irregular sizes
    and shapes
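
The fixed circle method is the simplest of the three to illustrate. The sketch below assumes the image is a list of rows of pixel values and that the spot centre and radius are already known; it is not how QuantArray or any other package implements it.

```python
from statistics import mean


def fixed_circle_intensity(image, centre_row, centre_col, radius):
    """Mean pixel value inside a circle of fixed radius around a spot centre."""
    pixels = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            # Keep pixels whose distance from the centre is within the radius.
            if (r - centre_row) ** 2 + (c - centre_col) ** 2 <= radius ** 2:
                pixels.append(value)
    return mean(pixels)


# Toy 5 x 5 image: background of 10 with a small bright spot in the middle.
toy_image = [[10] * 5 for _ in range(5)]
for r, c in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    toy_image[r][c] = 100
spot_signal = fixed_circle_intensity(toy_image, 2, 2, 1)
```

If the real spot is larger or smaller than the fixed radius, background pixels are counted as signal or signal pixels are missed, which is why this method suits arrays with regular spot sizes.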

23
Data quantitation
  • The images are quantitated, generating a lengthy
    spreadsheet
  • This is done in the facility using QuantArray,
    but can be done using other freeware (ScanAlyze)
    or commercial software
  • The output can generally be opened in Excel for
    first pass manipulation of data

24
QuantArray output
  • QA generates a series of columns that many people
    find confusing
  • In general, it provides the data in two ways on a
    single sheet: first as one channel expressed as a
    proportion of the other, then as absolute pixel
    counts for each channel

25
Information about the experiment
Data presented as ratios
Raw quantitated data
28
Locator and identifier columns
  • A: unique number assigned to that spot
  • B: row of subgrid
  • C: column of subgrid
  • D: row of spot within subgrid
  • E: column of spot within subgrid
  • F: gene identification
  • G: x coordinate of each spot
  • H: y coordinate of each spot

29
Spot Values
  • I/U: intensity of signal in ch1/ch2
  • J/V: intensity of background in ch1/ch2
  • K/W: std dev of intensity of signal in ch1/ch2
  • L/X: std dev of background in ch1/ch2

30
Quality control measurements
  • M/Y: spot diameter
  • N/Z: spot area
  • O/AA: spot footprint
  • P/AB: spot circularity
  • Q/AC: spot uniformity
  • R/AD: background uniformity
  • S/AE: signal to noise ratio
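
The column letters listed for the spot values and quality control measurements can be turned into a lookup for reading rows exported from the spreadsheet. This is a sketch only: the field names are invented here, and it assumes each row is a list whose first element is column A.

```python
def column_index(letters):
    """Convert a spreadsheet column label such as 'A' or 'AE' to a 0-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# (ch1 column, ch2 column) for each measurement, following the slides.
SPOT_FIELDS = {
    "intensity": ("I", "U"),
    "background": ("J", "V"),
    "intensity_sd": ("K", "W"),
    "background_sd": ("L", "X"),
    "diameter": ("M", "Y"),
    "area": ("N", "Z"),
    "footprint": ("O", "AA"),
    "circularity": ("P", "AB"),
    "uniformity": ("Q", "AC"),
    "background_uniformity": ("R", "AD"),
    "signal_to_noise": ("S", "AE"),
}


def spot_values(row, field):
    """Return the (ch1, ch2) values of one measurement from a spreadsheet row."""
    ch1_col, ch2_col = SPOT_FIELDS[field]
    return row[column_index(ch1_col)], row[column_index(ch2_col)]
```

For example, `spot_values(row, "intensity")` picks out columns I and U of a row.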

31
Data Cleaning
  • Are there flagged spots? You may see flags in the
    last column; these are added by the user during
    quantitation
  • Are there areas of the images that you just
    wouldn't trust?
  • Are there saturated spots?
  • You have the option of removing, recalculating,
    ignoring, flagging or resetting the results of
    these spots so that they don't interfere with
    downstream analysis
  • At this stage, you may also want to background
    subtract the raw intensities
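
One way to combine these cleaning steps is sketched below. It takes the "removing" option for flagged and saturated spots, and it assumes 16-bit scans (saturation at 65535) and that spots at or below background are also dropped; both are assumptions of this example, not rules from the slides.

```python
SATURATION = 65535  # assumes 16-bit TIFF scans


def clean_spot(signal, background, flagged=False, saturation=SATURATION):
    """Return the background-subtracted intensity, or None to exclude the spot."""
    if flagged or signal >= saturation:
        return None  # user-flagged or saturated: remove from analysis
    corrected = signal - background
    # Excluding non-positive values is a choice made for this sketch.
    return corrected if corrected > 0 else None
```

Returning `None` rather than zero keeps excluded spots from being mistaken for genuine low signals downstream.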
32
On chip controls and how they behave
  • Blank spots: generally 3X SSC (print buffer)
  • Expect no signal; can use the average or median
    intensity of these spots as the lower cutoff for
    what represents a real signal
  • However, not all empty spots are the same on some
    chips
  • There is a possibility of carryover from
    non-empty spots printed with the same pin
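
Using blank spots as a lower cutoff can be sketched as follows, here with the median. The dictionary layout and function names are assumptions made for the example.

```python
from statistics import median


def detection_cutoff(blank_intensities):
    """Lower cutoff for a real signal: the median intensity of the blank spots."""
    return median(blank_intensities)


def spots_above_cutoff(intensities, cutoff):
    """Keep only the spots whose intensity exceeds the cutoff."""
    return {gene: value for gene, value in intensities.items() if value > cutoff}


blanks = [110, 95, 130, 100, 400]  # one blank shows carryover
cutoff = detection_cutoff(blanks)
detected = spots_above_cutoff({"geneA": 2500, "geneB": 105, "geneC": 90}, cutoff)
```

The median is less affected than the mean by a blank spot that has picked up carryover from the same pin.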

33
On chip controls
  • Multiple spots of the same gene
  • In general, if it is exactly the same sequence,
    can use the variability of these spots to
    assess artifacts of geography on the chip
  • If it is not the same sequence, less
    straightforward
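
For identical sequences spotted several times, the coefficient of variation is one simple summary of spot-to-spot variability. This sketch uses it as an assumed choice of measure; the slides do not name one.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate spot intensities."""
    return stdev(values) / mean(values)


# Intensities of the same sequence printed in three places (invented values).
cv = coefficient_of_variation([90, 100, 110])
```

A high value for replicate spots printed in different regions points to a positional artifact rather than a biological difference.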

34
On chip controls
  • Housekeeping genes: if you can identify a set of
    genes that should remain at constant expression,
    can use these to standardize the two channels
  • Correctly identifying such genes is difficult
  • May also have exogenous controls that can be
    added, but must identify these prior to
    hybridizing the slides
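
Standardizing the two channels on a housekeeping set can be sketched as a single scale factor. The example assumes intensities are stored per gene in dictionaries and that the median log ratio of the housekeeping genes should be zero; the gene names are placeholders.

```python
import math
from statistics import median


def housekeeping_scale_factor(ch1, ch2, housekeeping_ids):
    """Factor to multiply ch2 by so housekeeping genes have a median ratio of 1."""
    log_ratios = [math.log2(ch1[g] / ch2[g]) for g in housekeeping_ids]
    return 2 ** median(log_ratios)


ch1 = {"hk1": 200, "hk2": 400, "hk3": 800, "geneX": 50}
ch2 = {"hk1": 100, "hk2": 200, "hk3": 400, "geneX": 500}
factor = housekeeping_scale_factor(ch1, ch2, ["hk1", "hk2", "hk3"])
ch2_scaled = {gene: value * factor for gene, value in ch2.items()}
```

The same calculation applies to exogenous controls, provided they were identified before hybridization.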

35
Log transformation of data
Raw data: most data bunched in lower left corner;
variability increases with intensity
Log-transformed data: data are spread more evenly;
variability is more even
36
Within array normalization
In two colour arrays, you are measuring two different
samples, labelled in two different reactions with
two different fluors and measured using two
different lasers at two different wavelengths.
In addition, you are dealing with the distribution
of spots across a relatively large surface.
Need to try to eliminate some of these potential
sources of variation so that the variation that
is left is more likely to be due to biological
effects
37
Dye Bias
  • The two dyes incorporate differently into DNA of
    different abundance
  • The two dyes may have different emission
    responses to the laser at different abundances
  • The two dye emissions may be measured by the PMT
    differently at different intensities
  • The intensities of the dyes may vary over the
    surface of the slide, but not in synch, as the
    focus of each laser is separate

38
Correcting for dye bias
  • Global normalization using median or mean
  • Linear regression of Cy3 against Cy5
  • Linear regression of the log ratio against the
    average intensity (MA plots)
  • Non linear regression of the log ratio against
    the average intensity (loess)
  • All of these assume that most genes are not
    differentially expressed

39
Simple global normalization to try to fit the data
  • A slope not equal to 1 means one channel responds
    more at higher intensity
  • A non-zero intercept means one channel is
    consistently brighter
  • A line that is not straight means non-linearity in
    the intensity responses of the two channels
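
The slope and intercept referred to here come from an ordinary least-squares fit of one channel against the other. A minimal sketch, with Cy3 on the y axis and Cy5 on the x axis as an assumed orientation and invented values:

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


cy5 = [1, 2, 3, 4]  # toy log intensities
cy3 = [3, 5, 7, 9]
slope, intercept = fit_line(cy5, cy3)
```

In this toy case the slope differs from 1 and the intercept from 0, so both kinds of channel imbalance described above are present.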
40
Linear regression of Cy3 against Cy5
41
MA plots
  • Regressing one channel against the other has the
    disadvantage of treating the two sets of signals
    separately
  • It has also been suggested that the human eye has
    a harder time seeing deviations from a diagonal
    line than from a horizontal line
  • MA plots get around both these issues
  • Basically a rotation and rescaling of the data
X axis: $A = (\log_2 R + \log_2 G)/2$
Y axis: $M = \log_2 R - \log_2 G$
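
The two quantities can be computed per spot as in this short sketch, where `red` and `green` stand for the background-corrected intensities R and G (the function name is an assumption):

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot: the log ratio and the average log intensity."""
    log_r, log_g = math.log2(red), math.log2(green)
    m = log_r - log_g        # y axis
    a = (log_r + log_g) / 2  # x axis
    return m, a


m, a = ma_values(1024, 256)
```

A spot with equal intensity in both channels has M = 0 whatever its brightness, which is why an unbiased array plots as a horizontal band.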
42
Scatterplot of intensities
MA plot of same data
43
Non linear normalization
  • Normalization that takes into account intensity
    effects
  • Lowess, or loess, is a locally weighted
    polynomial regression
  • User defines the size of the bins used to
    calculate the best fit line
Taken from Stekel (2003) Microarray Bioinformatics
44
Adjusted values are calculated for each point on
the x axis (average intensity for each feature)
using the loess regression
You should now see the data centred around 0 and
running straight along the horizontal axis
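
The adjustment can be sketched with a small locally weighted linear regression using tricube weights, one common form of loess. This is an illustrative implementation, not the one used in the presentation; `span` (the fraction of points used in each local fit) is a hypothetical parameter name, and the double loop is too slow for a full array without optimisation.

```python
def loess_fit(a_values, m_values, span=0.4):
    """Fitted M at each A from a local linear regression with tricube weights."""
    n = len(a_values)
    k = min(n, max(3, round(span * n)))  # points used in each local fit
    fitted = []
    for a0 in a_values:
        # Bandwidth: distance from a0 to its k-th nearest point.
        h = sorted(abs(a - a0) for a in a_values)[k - 1]
        if h == 0:
            h = 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for a, m in zip(a_values, m_values):
            u = abs(a - a0) / h
            if u >= 1:
                continue
            w = (1 - u ** 3) ** 3  # tricube weight
            dx = a - a0            # centre x on the point being fitted
            sw += w
            swx += w * dx
            swy += w * m
            swxx += w * dx * dx
            swxy += w * dx * m
        denom = sw * swxx - swx ** 2
        if denom <= 1e-12:
            fitted.append(swy / sw)  # fall back to a weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw)  # value of the local line at a0
    return fitted


def loess_normalise(a_values, m_values, span=0.4):
    """Subtract the fitted curve so that M is centred on 0 at every intensity."""
    return [m - f for m, f in zip(m_values, loess_fit(a_values, m_values, span))]
```

A larger `span` gives a smoother curve; a smaller one follows local structure more closely and risks removing real differential expression.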
45
Spatial defects over the slide
  • In some cases, you may notice a spatial bias of
    the two channels
  • May be a result of the slide not lying completely
    flat in the scanner
  • This will not be corrected by the methods
    discussed before

46
Regressions for spatial bias
  • Carry out normal loess regression but treat each
    subgrid as an entire array (block by block loess)
  • Corrects best for artifacts introduced by the
    pins, as opposed to artifacts of regions of the
    slide
  • Because each subgrid has relatively few spots,
    you risk having a subgrid where a substantial
    proportion of spots are really differentially
    expressed; you will lose data if you apply a
    loess regression to that block
  • May also perform a 2-D loess: plot the log ratio
    for each feature against its x and y coordinates
    and perform the regression
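
The block by block idea can be illustrated with a deliberately simpler stand-in: instead of a loess fit per subgrid, the sketch below subtracts each subgrid's median log ratio. It shows only the grouping step, and the dictionary keys are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def centre_by_block(spots):
    """Subtract each subgrid's median log ratio from the spots in that subgrid.

    `spots` is a list of dicts with 'block' (subgrid row, subgrid column)
    and 'm' (log ratio). This is a median shift per block, not a loess fit.
    """
    by_block = defaultdict(list)
    for spot in spots:
        by_block[spot["block"]].append(spot["m"])
    offsets = {block: median(values) for block, values in by_block.items()}
    return [spot["m"] - offsets[spot["block"]] for spot in spots]


spots = [
    {"block": (1, 1), "m": 0.4},
    {"block": (1, 1), "m": 0.5},
    {"block": (1, 1), "m": 0.6},
    {"block": (1, 2), "m": -0.2},
    {"block": (1, 2), "m": -0.1},
    {"block": (1, 2), "m": 0.0},
]
centred = centre_by_block(spots)
```

The same caveat applies as for a per-block loess: if most spots in a subgrid are truly differentially expressed, centring that block removes real signal.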

47
Acknowledgements
  • Perseus Missirlis
  • Natasha Gallo
  • Jim Gore
  • Jennifer Kreiger
  • Scott Davey