Title: Two Color Microarrays
1Two Color Microarrays
- EPP 245
- Statistical Analysis of
- Laboratory Data
2Two-Color Arrays
- Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.
- If a spot is too large, for example, both signals will be too big, and the difference or ratio will eliminate that source of variability.
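The cancellation described above can be sketched with a toy calculation. This is only an illustration under an assumed, purely multiplicative spot-size effect; the variable names (`spot_factor`, `red_true`, `green_true`) and all numbers are made up.

```r
# Toy illustration (assumed model): each spot's size scales both channels
# by the same unknown factor, so the factor cancels in the ratio.
set.seed(1)
n_spots     <- 5
red_true    <- c(200, 400, 800, 1600, 3200)  # hypothetical signals, sample 1
green_true  <- c(100, 400, 400, 3200, 3200)  # hypothetical signals, sample 2
spot_factor <- runif(n_spots, 0.5, 2)        # spot-to-spot size variation

red_obs   <- red_true   * spot_factor
green_obs <- green_true * spot_factor

# The observed log ratio equals the true log ratio; the spot factor is gone.
log_ratio_obs  <- log2(red_obs / green_obs)
log_ratio_true <- log2(red_true / green_true)
print(cbind(log_ratio_obs, log_ratio_true))
```

An additive spot effect would cancel in the difference of the two signals in the same way.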
3Dyes
- The most common dye sets are Cy3 (green) and Cy5 (red), which absorb maximally at approximately 550 nm and 649 nm and fluoresce at approximately 570 nm and 670 nm respectively (for reference, red light is about 700 nm and green light about 550 nm).
- The dyes are excited with lasers at 532 nm (Cy3, green) and 635 nm (Cy5, red).
- The emissions are read via filters using a CCD device.
7File Format
- A slide scanned with Axon GenePix produces a file with extension .gpr that contains the results: http://www.axon.com/gn_GenePix_File_Formats.html
- This contains 29 rows of headers followed by 43 columns of data (in our example files).
- For full analysis one may also need a .gal file that describes the layout of the arrays.
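Reading such a file into R might look like the sketch below. It assumes the file follows the ATF text layout, in which the first line is a version tag and the second line gives the number of optional header records and the number of data columns, so the header count does not have to be hard-coded. `read_gpr` is a hypothetical helper name, and the small file written here is synthetic, not real scanner output.

```r
# Hypothetical helper: read the data table from a GenePix .gpr file.
# Assumes the ATF layout: line 1 is a version tag, line 2 holds the number
# of optional header records and the number of data columns.
read_gpr <- function(path) {
  top    <- readLines(path, n = 2)
  counts <- as.integer(strsplit(trimws(top[2]), "[ \t]+")[[1]])
  dat <- read.delim(path, skip = 2 + counts[1], check.names = FALSE,
                    stringsAsFactors = FALSE)
  stopifnot(ncol(dat) == counts[2])  # sanity check against the declared width
  dat
}

# Tiny synthetic file (made-up content) to exercise the helper.
tmp <- tempfile(fileext = ".gpr")
writeLines(c("ATF\t1.0",
             "2\t4",
             "\"Type=GenePix Results\"",
             "\"Scanner=hypothetical\"",
             "\"Block\"\t\"Name\"\t\"F635 Median\"\t\"F532 Median\"",
             "1\t\"geneA\"\t1100\t900",
             "1\t\"geneB\"\t110\t95"),
           tmp)
gpr <- read_gpr(tmp)
print(gpr)
```

Setting `check.names = FALSE` keeps column names such as `F635 Median` exactly as they appear in the file, so they can be selected with `gpr[["F635 Median"]]`.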
8
"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat." "F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat." "Ratio of Medians (635/532)" "Ratio of Means (635/532)" "Median of Ratios (635/532)" "Mean of Ratios (635/532)" "Ratios SD (635/532)" "Rgn Ratio (635/532)" "Rgn R² (635/532)" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)" "F635 Median - B635" "F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"
9Analysis Choices
- Mean or median foreground intensity
- Background corrected or not
- Log transform (base 2, e, or 10) or glog transform
- Log is compatible only with no background correction, because background-corrected intensities can be zero or negative
- Glog is best with background correction
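A minimal sketch of why the glog transform tolerates background-corrected data follows. It assumes the common form glog(y) = log(y + sqrt(y^2 + lambda)); the constant `lambda` would be estimated from the data in practice, and the value used here, like the intensities, is arbitrary.

```r
# Generalized log (glog) sketch. lambda is a tuning constant that would be
# estimated from the data in practice; the value below is arbitrary.
glog <- function(y, lambda) log(y + sqrt(y^2 + lambda))

bg_corrected <- c(-50, 0, 10, 1000, 50000)  # hypothetical corrected intensities
lambda <- 100^2

# The ordinary log fails on the first two values (NaN and -Inf).
plain_log <- suppressWarnings(log(bg_corrected))
print(plain_log)

# glog is finite for every value, including the negative one.
glog_values <- glog(bg_corrected, lambda)
print(glog_values)

# For large intensities glog(y) is close to log(2y) = log(y) + log(2).
print(glog(50000, lambda) - log(2 * 50000))
```

At high intensity the two transforms differ only by a constant, so log ratios of bright spots are essentially unchanged.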
10Array normalization
- Array normalization is meant to increase the precision of comparisons by adjusting for variations that cover entire arrays.
- Without normalization, the analysis would be valid, but possibly less sensitive.
- However, a poor normalization method will be worse than none at all.
11Possible normalization methods
- We can equalize the mean or median intensity by adding or multiplying a correction term.
- We can use different normalizations at different intensity levels (intensity-based normalization), for example by lowess or quantiles.
- We can normalize for other things such as print tips.
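As one illustration of the quantile option, here is a minimal base-R sketch. `quantile_normalize` is a hypothetical name, ties are handled naively, and the input is a small matrix of illustrative intensities with genes in rows and arrays in columns.

```r
# Minimal quantile normalization sketch (ties handled naively).
# Each column is forced to share the same set of values: the mean of the
# sorted columns, reassigned according to each column's own ranks.
quantile_normalize <- function(mat) {
  mat <- as.matrix(mat)
  sorted_cols <- apply(mat, 2, sort)   # sort each array separately
  target <- rowMeans(sorted_cols)      # reference distribution
  out <- apply(mat, 2, function(x) target[rank(x, ties.method = "first")])
  dimnames(out) <- dimnames(mat)
  out
}

x <- matrix(c(1100, 110, 80, 900, 95, 65, 425, 85, 55, 550, 110, 80), ncol = 4)
qn <- quantile_normalize(x)
print(qn)
```

After this step every array has exactly the same distribution of values, which is a much stronger adjustment than equalizing only the mean or median.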
12Example for Normalization
         Group 1   Group 1   Group 2   Group 2
         Array 1   Array 2   Array 3   Array 4
Gene 1      1100       900       425       550
Gene 2       110        95        85       110
Gene 3        80        65        55        80
13
> normex <- matrix(c(1100,110,80,900,95,65,425,85,55,550,110,80),ncol=4)
> normex
     [,1] [,2] [,3] [,4]
[1,] 1100  900  425  550
[2,]  110   95   85  110
[3,]   80   65   55   80
> group <- as.factor(c(1,1,2,2))
> anova(lm(normex[1,] ~ group))
Analysis of Variance Table

Response: normex[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 262656  262656  18.888 0.04908 *
Residuals  2  27812   13906
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
14
> anova(lm(normex[2,] ~ group))
Analysis of Variance Table

Response: normex[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5
> anova(lm(normex[3,] ~ group))
Analysis of Variance Table

Response: normex[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   25.0    25.0  0.1176 0.7643
Residuals  2  425.0   212.5
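The three per-gene fits above can also be run in one pass over the rows of the matrix. The sketch below is one way to do that; `gene_pvalue` is a hypothetical helper name, and the p-values it returns should agree with the ANOVA tables shown on the slides.

```r
normex <- matrix(c(1100, 110, 80, 900, 95, 65, 425, 85, 55, 550, 110, 80),
                 ncol = 4)
group  <- as.factor(c(1, 1, 2, 2))

# One-way ANOVA p-value for a single gene (one row of the matrix).
gene_pvalue <- function(y, group) anova(lm(y ~ group))[1, "Pr(>F)"]

# Apply the test to every row, i.e. to every gene.
pvals <- apply(normex, 1, gene_pvalue, group = group)
print(round(pvals, 4))
```

Only gene 1 reaches the 0.05 level on the raw data, even though all three genes are roughly halved or unchanged relative to their own scale.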
15Additive Normalization by Means
         Group 1   Group 1   Group 2   Group 2
         Array 1   Array 2   Array 3   Array 4
Gene 1       975       851       541       608
Gene 2       -15        46       201       168
Gene 3       -45        16       171       138
16
> cmn <- apply(normex,2,mean)
> cmn
[1] 430.0000 353.3333 188.3333 246.6667
> mn <- mean(cmn)
> normex - rbind(cmn,cmn,cmn)+mn
         [,1]   [,2]   [,3]     [,4]
cmn 974.58333 851.25 541.25 607.9167
cmn -15.41667  46.25 201.25 167.9167
cmn -45.41667  16.25 171.25 137.9167
> normex.1 <- normex - rbind(cmn,cmn,cmn)+mn
17
> anova(lm(normex.1[1,] ~ group))
Analysis of Variance Table

Response: normex.1[1, ]
          Df Sum Sq Mean Sq F value  Pr(>F)
group      1 114469  114469  23.295 0.04035
Residuals  2   9828    4914
> anova(lm(normex.1[2,] ~ group))
Analysis of Variance Table

Response: normex.1[2, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035
Residuals  2  2456.9  1228.5
> anova(lm(normex.1[3,] ~ group))
Analysis of Variance Table

Response: normex.1[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 28617.4 28617.4  23.295 0.04035
Residuals  2  2456.9  1228.5
18Multiplicative Normalization by Means
         Group 1   Group 1   Group 2   Group 2
         Array 1   Array 2   Array 3   Array 4
Gene 1       779       776       687       679
Gene 2        78        82       137       136
Gene 3        57        56        89        99
19
> normex*mn/rbind(cmn,cmn,cmn)
         [,1]      [,2]      [,3]      [,4]
cmn 779.16667 775.82547 687.33407 679.13851
cmn  77.91667  81.89269 137.46681 135.82770
cmn  56.66667  56.03184  88.94912  98.78378
> normex.2 <- normex*mn/rbind(cmn,cmn,cmn)
> anova(lm(normex.2[1,] ~ group))
Response: normex.2[1, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 8884.9  8884.9  453.71 0.002197
Residuals  2   39.2    19.6
> anova(lm(normex.2[2,] ~ group))
Response: normex.2[2, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 3219.7  3219.7  696.33 0.001433
Residuals  2    9.2     4.6
> anova(lm(normex.2[3,] ~ group))
Response: normex.2[3, ]
          Df  Sum Sq Mean Sq F value  Pr(>F)
group      1 1407.54 1407.54  57.969 0.01682
Residuals  2   48.56   24.28
20Multiplicative Normalization by Medians
         Group 1   Group 1   Group 2   Group 2
         Array 1   Array 2   Array 3   Array 4
Gene 1      1000       947       500       500
Gene 2       100       100       100       100
Gene 3        73        68        65        73
21
> cmd <- apply(normex,2,median)
> cmd
[1] 110  95  85 110
> md <- mean(cmd)   # not shown on the slide; inferred, since md = 100 reproduces the output below
> normex.3 <- normex*md/rbind(cmd,cmd,cmd)
> normex.3
          [,1]      [,2]      [,3]      [,4]
cmd 1000.00000 947.36842 500.00000 500.00000
cmd  100.00000 100.00000 100.00000 100.00000
cmd   72.72727  68.42105  64.70588  72.72727
> anova(lm(normex.3[1,] ~ group))
Response: normex.3[1, ]
          Df Sum Sq Mean Sq F value   Pr(>F)
group      1 224377  224377     324 0.003072
Residuals  2   1385     693
> anova(lm(normex.3[2,] ~ group))
Response: normex.3[2, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1      0       0
Residuals  2      0       0
> anova(lm(normex.3[3,] ~ group))
Response: normex.3[3, ]
          Df Sum Sq Mean Sq F value Pr(>F)
group      1  3.451   3.451  0.1665 0.7228
Residuals  2 41.443  20.722
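To compare the normalizations side by side, the per-gene p-values can be collected into one table. The sketch below is a hedged restatement of the preceding slides, not the original code: it uses `sweep` in place of the `rbind` construction, the list and helper names are hypothetical, and the overall scaling constant for the median method is taken to be the mean of the column medians (an assumption that does not affect the p-values, since rescaling every value by the same constant leaves the F statistic unchanged).

```r
normex <- matrix(c(1100, 110, 80, 900, 95, 65, 425, 85, 55, 550, 110, 80),
                 ncol = 4,
                 dimnames = list(paste("Gene", 1:3), paste("Array", 1:4)))
group <- as.factor(c(1, 1, 2, 2))

cmn <- colMeans(normex)           # column (array) means
cmd <- apply(normex, 2, median)   # column (array) medians

# Each entry is the data matrix under one normalization.
normalized <- list(
  none          = normex,
  additive_mean = sweep(normex, 2, cmn - mean(cmn), "-"),
  mult_mean     = sweep(normex, 2, cmn / mean(cmn), "/"),
  mult_median   = sweep(normex, 2, cmd / mean(cmd), "/")
)

# Warnings are suppressed because gene 2 becomes constant under median
# normalization, which makes its F-test degenerate.
gene_pvalue <- function(y) suppressWarnings(anova(lm(y ~ group))[1, "Pr(>F)"])
pval_table <- sapply(normalized, function(m) apply(m, 1, gene_pvalue))
print(signif(pval_table, 4))
```

The table makes the point of the example visible at a glance: the choice of normalization changes which genes appear significant, so it should be made deliberately.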
22Intensity-based normalization
- Normalize by means, medians, etc., but do so only in groups of genes with similar expression levels.
- lowess is a procedure that produces a running estimate of the middle, like a robustified mean.
- If we subtract the lowess of each array and add the average of the lowess curves, we get the lowess normalization.
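23
# Additive normalization: shift each column (array) so that all column
# means equal the average of the original column means.
norm <- function(mat1)
{
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]                  # number of genes (rows)
  n <- dim(mat2)[2]                  # number of arrays (columns)
  cmean <- apply(mat2,2,mean)
  cmean <- cmean - mean(cmean)       # offset of each array from the overall mean
  mnmat <- matrix(rep(cmean,p),byrow=TRUE,ncol=n)
  return(mat2-mnmat)
}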
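As a quick check on what this function does, the same shift can be written with `sweep` and applied to the small example matrix from the earlier slides. This is an equivalent formulation offered as a sketch, not the original code; `col_offsets` and `normex_add` are illustrative names.

```r
normex <- matrix(c(1100, 110, 80, 900, 95, 65, 425, 85, 55, 550, 110, 80),
                 ncol = 4)

# Offset of each array's mean from the average of the array means.
col_offsets <- colMeans(normex) - mean(colMeans(normex))

# Subtract each array's offset from every value in that array.
normex_add <- sweep(normex, 2, col_offsets, "-")
print(round(normex_add, 2))

# After the shift, every array has the same mean.
print(colMeans(normex_add))
```

Differences between genes within an array are untouched by this adjustment; only the level of each array moves.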
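24
# Lowess normalization: subtract each array's lowess curve (fitted against
# the rank of the gene's mean intensity) and add back the average curve.
lnorm <- function(mat1,span=.1)
{
  mat2 <- as.matrix(mat1)
  p <- dim(mat2)[1]
  n <- dim(mat2)[2]
  rmeans <- apply(mat2,1,mean)
  rranks <- rank(rmeans,ties.method="first")
  matsort <- mat2[order(rranks),]    # genes sorted by mean intensity
  r0 <- 1:p
  lcol <- function(x)
  {
    lx <- lowess(r0,x,f=span)$y      # smoothed values for one array
    return(lx)
  }
  lmeans <- apply(matsort,2,lcol)
  lgrand <- apply(lmeans,1,mean)     # average curve across arrays
  lgrand <- matrix(rep(lgrand,n),byrow=FALSE,ncol=n)
  matnorm0 <- matsort-lmeans+lgrand
  matnorm1 <- matnorm0[rranks,]      # restore the original gene order
  return(matnorm1)
}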
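The effect of this procedure can be seen on simulated data. The sketch below uses `lowess_normalize`, a hypothetical compact restatement of the same idea rather than the function from the slide, and the simulated expression levels and bias shapes are assumptions chosen only for illustration.

```r
# Sketch of lowess normalization on simulated log intensities.
# Sort genes by average intensity, smooth each array with lowess, subtract
# each array's curve, and add back the average curve.
lowess_normalize <- function(mat, span = 0.1) {
  mat <- as.matrix(mat)
  ord <- order(rowMeans(mat))                 # genes from dim to bright
  sorted <- mat[ord, , drop = FALSE]
  idx <- seq_len(nrow(sorted))
  curves <- apply(sorted, 2, function(x) lowess(idx, x, f = span)$y)
  adjusted <- sorted - curves + rowMeans(curves)
  adjusted[order(ord), , drop = FALSE]        # restore original gene order
}

set.seed(42)
p <- 500
true_level <- runif(p, 6, 14)                 # simulated log2 expression
# Each array gets its own intensity-dependent bias (assumed shapes).
bias <- cbind(0, 0.5, -0.3 + 0.05 * true_level, 0.02 * (true_level - 10)^2)
sim <- true_level + bias + matrix(rnorm(p * 4, sd = 0.2), ncol = 4)

sim_norm <- lowess_normalize(sim)
print(rbind(before = colMeans(sim), after = colMeans(sim_norm)))
```

Because each array's curve is replaced by the average curve, the mean of each gene across arrays is left unchanged; only the array-specific, intensity-dependent part is removed.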