Understanding the design matrix in linear models for microarray experiments Natalie Thorne

1
Understanding the design matrix in linear models
for microarray experiments
Natalie Thorne

Many thanks to Terry Speed, Gordon Smyth, Jean
(Yee Hwa) Yang, Ingrid Lonnstedt, Matthew Ritchie
for sharing their teaching material with me.

The Hutchison/MRC Research Center
2
Linear models: from experimental design to model
3
Specifying the design
Sample types (single-channel representation)
1. Represent the effect measured by each sample type.
Possible parameters (log-ratio representation)
1. What differences are important?
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
4
Specifying the design
Sample types
1. Represent the effect measured by each sample type.
   A = baseline + a
   B = baseline + b
   Ref = baseline
Possible parameters
1. What differences are important?
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
This representation is based on your idea of what is measured in each
sample type (in relation to the other sample types). We do this to help
you understand and interpret results from your log-ratios.
5
Specifying the design
Sample types
1. Represent the effect measured by each sample type.
   A = baseline + a
   B = baseline + b
   Ref = baseline
Possible parameters
1. What differences are important?
   A - Ref = (baseline + a) - (baseline) = a
   B - Ref = (baseline + b) - (baseline) = b
   A - B = (baseline + a) - (baseline + b) = a - b
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
Parameters in two-colour experiments need to be representative of
log-ratio comparisons (this is the data that you would typically model).
Once you've defined the effects in each sample type, it is fairly easy
to define possible parameters for your model.
6
Specifying the design
Samples
1. Represent the samples by the effects present in each.
   A = baseline + a
   B = baseline + b
   Ref = baseline
Possible parameters
1. What differences are important?
   A - Ref = a
   B - Ref = b
   A - B = a - b
Choose parameters
1. Parameters must be independent!
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
7
Specifying the design
Samples
1. Represent the samples by the effects present in each.
   A = baseline + a
   B = baseline + b
   Ref = baseline
Possible parameters
1. What differences are important?
   A - Ref = a
   B - Ref = b
   A - B = a - b
Choose parameters
1. Parameters must be independent!
These two are independent: a and b.
But this parameter (crossed out on the slide) is not independent, i.e. a - b = (a) - (b)
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3, with one parameter crossed out]
8
Specifying the design
Samples
1. Represent the samples by the effects present in each.
   A = baseline + a
   B = baseline + b
   Ref = baseline
Possible parameters
1. What differences are important?
   A - Ref = a
   B - Ref = b
   A - B = a - b
Choose parameters
1. Parameters must be independent!
These two are independent: a and a - b.
But this parameter (crossed out on the slide) is not independent, i.e. b = (a) - (a - b)
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3, with one parameter crossed out]
A definition of independent parameters: no combination of the other
parameters can equal any one of the parameters.
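The independence rule can be checked numerically by writing each candidate parameter as a row of coefficients on the effects (a, b) and computing the rank of the stacked rows. The following is only a minimal sketch in plain Python; the helper name `rank` and the coefficient encoding are illustrative assumptions and are not part of the original slides.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# Candidate parameters written as coefficients on the effects (a, b).
a_minus_ref = [1, 0]   # A - Ref = a
b_minus_ref = [0, 1]   # B - Ref = b
a_minus_b = [1, -1]    # A - B   = a - b

# Two parameters with rank 2: independent.
print(rank([a_minus_ref, b_minus_ref]))
# Three parameters but still rank 2: one of them is a combination of the others.
print(rank([a_minus_ref, b_minus_ref, a_minus_b]))
```

A set of parameters is independent exactly when the rank equals the number of parameters, so any two of the three candidates pass and all three together do not.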
9
Specifying the design
Samples
1. Represent the samples by the effects present in each.
   A = baseline + a
   B = baseline + b
   Ref = baseline
Possible parameters
1. What differences are important?
   A - Ref = a
   B - Ref = b
   A - B = a - b
Choose parameters
1. Parameters must be independent!
   A - Ref = a
   B - Ref = b
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
Now, specify the design and the parameters that you've chosen as a matrix.
10
Specifying the design as a matrix
Parameters (columns): A - Ref, B - Ref
Observations (rows): slide 1, slide 2, slide 3
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
Represent the log-ratio from each slide by a parameter => specify the model for your data
12
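Specifying the design as a matrix
Parameters (columns): A - Ref, B - Ref
Observations (rows): slide 1, slide 2, slide 3
$$\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{pmatrix}$$
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
Represent the log-ratio from each slide by a parameter => specify the model for your data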
14
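Specifying the design as a matrix
Parameters (columns): A - Ref, B - Ref
Observations (rows): slide 1, slide 2, slide 3
$$\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{pmatrix}$$
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]
This is called the design matrix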
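One way to see where the rows of a design matrix come from is to compute each row as the effect of the sample in the numerator of the log-ratio minus the effect of the sample in the denominator. This is a minimal sketch under stated assumptions: the names `effects`, `slides` and `design_row` are illustrative, and the orientation of slide 2 (Ref over A, a dye swap of slide 1) is inferred from the row (-1, 0) rather than stated on the slide.

```python
# Effect of each sample type relative to the reference, as coefficients on the
# parameters (a, b), where a = A - Ref and b = B - Ref.
effects = {"A": (1, 0), "B": (0, 1), "Ref": (0, 0)}

# Each slide as (numerator sample, denominator sample) of its log-ratio.
# Assumed orientation: slide 2 is the dye swap of slide 1.
slides = [("A", "Ref"), ("Ref", "A"), ("B", "Ref")]

def design_row(numerator, denominator):
    """One row of the design matrix: effect of numerator minus effect of denominator."""
    return [n - d for n, d in zip(effects[numerator], effects[denominator])]

design = [design_row(num, den) for num, den in slides]
for row in design:
    print(row)
```

The printed rows reproduce the matrix shown on the slide, one row per slide and one column per parameter.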
15
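Write the model using design matrix
Observations (rows): slide 1, slide 2, slide 3
Parameters (columns): A - Ref, B - Ref
$$\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{pmatrix}$$
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]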
17
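Write the model using design matrix
Observed data modelled by these parameters (matrix notation):
$$\begin{pmatrix} \text{slide 1} \\ \text{slide 2} \\ \text{slide 3} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{pmatrix} \times \begin{pmatrix} A - \mathrm{Ref} \\ B - \mathrm{Ref} \end{pmatrix}$$
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]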
18
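Write the model using design matrix
$$E\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{pmatrix} \times \begin{pmatrix} a \\ b \end{pmatrix}$$
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]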
19
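Write the model using design matrix
Matrix multiplication
$$E\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{pmatrix} \times \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1 \times a + 0 \times b \\ -1 \times a + 0 \times b \\ 0 \times a + 1 \times b \end{pmatrix} = \begin{pmatrix} a \\ -a \\ b \end{pmatrix}$$
[Figure: design diagram of samples A, B and Ref connected by arrays 1, 2 and 3]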
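The same matrix multiplication can be carried out in a few lines of code. This is a small sketch only; the helper `mat_vec` and the numeric values chosen for a and b are hypothetical and serve purely as an illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * v for x, v in zip(row, vector)) for row in matrix]

# Design matrix: rows are slides 1 to 3, columns are the parameters a and b.
X = [[1, 0],
     [-1, 0],
     [0, 1]]

# Hypothetical parameter values, made up for illustration.
a, b = 2, -1

expected = mat_vec(X, [a, b])
print(expected)  # [2, -2, -1], i.e. (a, -a, b)
```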
22
Modelling data
With two observations the line is DETERMINED!!!
23
Modelling data
With many observations the line is NOT DETERMINED
- we must estimate it!!!!
24
Simple regression
Minimize the difference between each observation
and its prediction according to the line. Method
of least squares: find the line that minimizes
the sum of squared errors.
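As a rough illustration of the method of least squares for a single line, the closed-form estimates of intercept and slope can be computed directly. This is a sketch in plain Python; the function names and the data points are hypothetical and were made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared differences between observations and the line's predictions."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations, made up for illustration.
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 4]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
print(sum_squared_errors(xs, ys, intercept, slope))
```

Any other choice of intercept or slope gives a larger sum of squared errors for the same data, which is what "least squares" refers to.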
25
Linear model for different groups
[Figure: example data for one gene, M (log-ratios) for groups A, B and C; and a diagram of an experiment, with samples A, B, C and R, that might result in this data]
Minimise the errors around the means of each group. Notice that only
with replication in each group can we estimate a mean for each group
and fit a statistical model to the data.
26
Linear model for different groups at two levels
Drugs result in different transcriptional responses (main effects).
The effect over time is also different for different drugs (interaction).
[Figure: example data for one gene, with groups A, B and C each shown at two levels; and a diagram of an experiment, with reference R, that might result in this data, i.e. for drug A one sample type is treatment with drug A 1 hr later and the other is treatment with drug A 24 hrs later]
27
Linear model and array platforms

Linear modelling approach applies to both single
channel (Affymetrix) and two-colour spotted
arrays.
Two colour with common reference is virtually
equivalent to single channel from an analysis
point of view
Need to cover some special features of two-colour
arrays using direct comparisons.

28
Specifying the design
Sample types
1. Represent the effect measured by each sample type.
   Single-channel representation (2 types of samples)
   A = a, B = b
Possible parameters
1. What differences are important?
   Log-ratio representation (choose 2 - 1 = 1 parameters)
   A - B = a - b
   B - A = b - a
[Figure: design diagram of samples A and B connected by arrays 1, 2, 3 and 4]
Choose one parameter to model your data. Write the design matrix for this experiment.
29
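Write the model using design matrix
Samples: 2
Parameters: 1
A = a, B = b
B - A = b - a
[Figure: design diagram of samples A and B connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} b - a \end{pmatrix}}_{\beta}$$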
30
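Write the model using design matrix
[Figure: design diagram of samples A and B connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} -1 \\ 1 \\ -1 \\ 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} b - a \end{pmatrix}}_{\beta} = \begin{pmatrix} -1 \times (b - a) \\ 1 \times (b - a) \\ -1 \times (b - a) \\ 1 \times (b - a) \end{pmatrix} = \begin{pmatrix} a - b \\ b - a \\ a - b \\ b - a \end{pmatrix}$$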
31
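Write the model using design matrix
[Figure: design diagram of samples A and B connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a - b \end{pmatrix}}_{\beta} = \begin{pmatrix} 1 \times (a - b) \\ -1 \times (a - b) \\ 1 \times (a - b) \\ -1 \times (a - b) \end{pmatrix} = \begin{pmatrix} a - b \\ b - a \\ a - b \\ b - a \end{pmatrix}$$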
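The two parameterisations describe the same four arrays and differ only in the sign of the single parameter. A minimal sketch of this follows, using the standard least-squares formula for a one-column design (the sum of x times y divided by the sum of x squared); the function name and the log-ratio values are hypothetical.

```python
def estimate_single_parameter(x_column, y):
    """Least-squares estimate for a one-parameter model: sum(x*y) / sum(x*x)."""
    return sum(x * v for x, v in zip(x_column, y)) / sum(x * x for x in x_column)

# Hypothetical log-ratios from arrays 1 to 4, made up for illustration.
y = [-1.0, 1.5, -0.5, 1.0]

# Design column (-1, 1, -1, 1) with parameter b - a.
b_minus_a = estimate_single_parameter([-1, 1, -1, 1], y)
# Design column (1, -1, 1, -1) with parameter a - b.
a_minus_b = estimate_single_parameter([1, -1, 1, -1], y)

print(b_minus_a, a_minus_b)  # 1.0 -1.0
```

Either choice fits the data equally well; the estimate simply changes sign, so the choice of parameter is about which comparison is more convenient to interpret.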
32
Specifying the design
Sample types
1. Represent the effect measured by each sample type.
   Single-channel representation (3 types of samples)
   A = base + a, B = base + b, R = base
Possible parameters
1. What differences are important?
   Log-ratio representation (choose 3 - 1 = 2 parameters)
   A - B = (base + a) - (base + b)
   A - R = a
   B - R = b
   more
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
33
Specifying the design (alternative)
Sample types
1. Represent the effect measured by each sample type.
   Single-channel representation (3 types of samples)
   A = a, B = b, R = r
Possible parameters
1. What differences are important?
   Log-ratio representation (choose 3 - 1 = 2 parameters)
   A - B = a - b
   A - R = a - r
   B - R = b - r
   more
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
Choose two parameters to model your data. Write the design matrix for this experiment.
34
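Write the model using design matrix
Samples: 3
Parameters: 2
A = a, B = b, R = r
A - R = a - r, A - B = a - b
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} a - r \\ a - b \end{pmatrix}}_{\beta}$$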
35
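Write the model using design matrix
Samples: 3
Parameters: 2
A = a, B = b, R = r
A - R = a - r, A - B = a - b
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 1 & -1 \\ -1 & 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a - r \\ a - b \end{pmatrix}}_{\beta} = \begin{pmatrix} (a - r) + 0 \\ -1(a - r) + 0 \\ (a - r) - (a - b) \\ -1(a - r) + (a - b) \end{pmatrix} = \begin{pmatrix} a - r \\ r - a \\ b - r \\ r - b \end{pmatrix}$$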
36
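Design matrix with alternative parameterisation
Samples: 3
Parameters: 2
A = r + a, B = r + b, R = r
A - R = a, B - R = b
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} a \\ b \end{pmatrix}}_{\beta}$$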
37
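Design matrix with alternative parameterisation
Samples: 3
Parameters: 2
A = r + a, B = r + b, R = r
A - R = a, B - R = b
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \\ b \end{pmatrix}}_{\beta} = \begin{pmatrix} a + 0 \\ -a + 0 \\ 0 + b \\ 0 - b \end{pmatrix} = \begin{pmatrix} a \\ -a \\ b \\ -b \end{pmatrix}$$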
38
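Specifying a contrast matrix
Samples: 3
Parameters: 2
A = r + a, B = r + b, R = r
A - R = a, B - R = b
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
Linear model estimates of parameters: $\hat{a}$, $\hat{b}$
Parameter estimates (called coefficients in limma)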
39
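Specifying a contrast matrix
Samples: 3
Parameters: 2
A = r + a, B = r + b, R = r
A - R = a, B - R = b
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \\ b \end{pmatrix}}_{\beta} = \begin{pmatrix} a \\ -a \\ b \\ -b \end{pmatrix}$$
$$\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \\ 0.5 & 0.5 \end{pmatrix}}_{\text{Contrast matrix}} \times \underbrace{\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix}}_{\text{Parameter estimates}}$$
Parameter estimates (called coefficients in limma)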
40
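Specifying a contrast matrix
Samples: 3
Parameters: 2
A = r + a, B = r + b, R = r
A - R = a, B - R = b
[Figure: design diagram of samples A, B and R connected by arrays 1, 2, 3 and 4]
$$\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \\ 0.5 & 0.5 \end{pmatrix}}_{\text{Contrast matrix}} \times \underbrace{\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix}}_{\text{Parameter estimates}} = \begin{pmatrix} \hat{a} \\ \hat{b} \\ \hat{a} - \hat{b} \\ 0.5(\hat{a} + \hat{b}) \end{pmatrix} \quad \underbrace{\begin{matrix} A \\ B \\ A - B \\ \tfrac{1}{2}(A + B) \end{matrix}}_{\text{Contrasts of interest}}$$
Parameter estimates (called coefficients in limma)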
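Applying a contrast matrix is again just a matrix multiplication, this time on the estimated coefficients. The sketch below uses the contrast matrix from the slide, laid out with one row per contrast; the helper `mat_vec` and the coefficient values are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical coefficient estimates for a = A - R and b = B - R.
a_hat, b_hat = 2.0, 0.5

# One row per contrast, one column per coefficient (slide layout).
contrast_matrix = [
    [1, 0],      # A
    [0, 1],      # B
    [1, -1],     # A - B
    [0.5, 0.5],  # 1/2 (A + B)
]

contrasts = mat_vec(contrast_matrix, [a_hat, b_hat])
print(contrasts)  # [2.0, 0.5, 1.5, 1.25]
```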
41
Specifying the design
Sample types
1. Represent the effect measured by each sample type.
   Single-channel representation (3 types of samples)
   A = a, B = b, C = c
Possible parameters
1. What differences are important?
   Log-ratio representation (choose 3 - 1 = 2 parameters)
   A - B = a - b
   B - C = b - c
   C - A = c - a
   more
[Figure: design diagram of samples A, B and C connected in a loop by arrays 1 to 6]
Choose two parameters to model your data. Write the design matrix for this experiment.
42
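Write the model using design matrix
Samples: 3
Parameters: 2
A = a, B = b, C = c
A - B = a - b, B - C = b - c
[Figure: design diagram of samples A, B and C connected in a loop by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} a - b \\ b - c \end{pmatrix}}_{\beta}$$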
43
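Write the model using design matrix
Samples: 3
Parameters: 2
A = a, B = b, C = c
A - B = a - b, B - C = b - c
[Figure: design diagram of samples A, B and C connected in a loop by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & -1 \\ 0 & 1 \\ -1 & -1 \\ 1 & 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a - b \\ b - c \end{pmatrix}}_{\beta} = \begin{pmatrix} (a - b) + 0 \\ -1(a - b) + 0 \\ 0 - 1(b - c) \\ 0 + 1(b - c) \\ -1(a - b) - (b - c) \\ (a - b) + (b - c) \end{pmatrix} = \begin{pmatrix} a - b \\ b - a \\ c - b \\ b - c \\ c - a \\ a - c \end{pmatrix}$$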
44
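Specify contrasts of interest
Samples: 3
Parameters: 2
A = a, B = b, C = c
A - B = a - b, B - C = b - c
[Figure: design diagram of samples A, B and C connected in a loop by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & -1 \\ 0 & 1 \\ -1 & -1 \\ 1 & 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a - b \\ b - c \end{pmatrix}}_{\beta}$$
Contrast matrix x parameter estimates (called coefficients in limma) = contrasts of interest
In limma, the contrast matrix is the transpose of the above!! i.e. rows
become the columns and the columns become the rows.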
45
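Specify contrasts of interest
Samples: 3
Parameters: 2
A = a, B = b, C = c
A - B = a - b, B - C = b - c
[Figure: design diagram of samples A, B and C connected in a loop by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & -1 \\ 0 & 1 \\ -1 & -1 \\ 1 & 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a - b \\ b - c \end{pmatrix}}_{\beta}$$
$$\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}}_{\text{Contrast matrix}} \times \text{parameter estimates (called coefficients in limma)} = \text{contrasts of interest}$$
In limma, the contrast matrix is the transpose of the above!! i.e. rows
become the columns and the columns become the rows.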
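The transposition described here is a purely mechanical step. The following sketch shows it in plain Python for the contrast matrix on this slide; the variable names are illustrative, and the code only demonstrates the change of layout, not a call into limma itself.

```python
# Contrast matrix as drawn on the slide: one row per contrast,
# one column per model parameter (a - b, b - c).
slide_contrasts = [
    [1, 0],  # a - b
    [0, 1],  # b - c
    [1, 1],  # a - c = (a - b) + (b - c)
]

# Transposed layout: one row per parameter, one column per contrast.
transposed = [list(column) for column in zip(*slide_contrasts)]
print(transposed)  # [[1, 0, 1], [0, 1, 1]]
```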
46
Factorial experiment: one sample as a common reference
Parameters: 3
Samples: 4
A - C = a, B - C = b, AB - C = ab
A = base + a, B = base + b, C = base, AB = base + ab
[Figure: design diagram of samples A, B, AB and C connected by arrays 1 to 6]
47
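Factorial experiment: one sample as a common reference
Parameters: 3
Samples: 4
A - C = a, B - C = b, AB - C = ab
A = base + a, B = base + b, C = base, AB = base + ab
[Figure: design diagram of samples A, B, AB and C connected by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} a \\ b \\ \mathit{ab} \end{pmatrix}}_{\beta}$$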
48
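Factorial experiment: one sample as a common reference
Parameters: 3
Samples: 4
A - C = a, B - C = b, AB - C = ab
A = base + a, B = base + b, C = base, AB = base + ab
[Figure: design diagram of samples A, B, AB and C connected by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \\ b \\ \mathit{ab} \end{pmatrix}}_{\beta} = \begin{pmatrix} 0 + b + 0 \\ a + 0 + 0 \\ -a + 0 + \mathit{ab} \\ 0 + 0 + \mathit{ab} \\ a - b + 0 \\ 0 - b + \mathit{ab} \end{pmatrix} = \begin{pmatrix} b \\ a \\ -a + \mathit{ab} \\ \mathit{ab} \\ a - b \\ -b + \mathit{ab} \end{pmatrix}$$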
49
Design for factorial experiment with interaction
Parameters: 3
Samples: 4
A = c + a, B = c + b, C = c, AB = c + a + b + ab
A - C = a, B - C = b, AB - A - B + C = ab
[Figure: design diagram of samples A, B, AB and C connected by arrays 1 to 6]
50
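Design for factorial experiment with interaction
Parameters: 3
Samples: 4
A = c + a, B = c + b, C = c, AB = c + a + b + ab
A - C = a, B - C = b, AB - A - B + C = ab
[Figure: design diagram of samples A, B, AB and C connected by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} a \\ b \\ \mathit{ab} \end{pmatrix}}_{\beta}$$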
51
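Design for factorial experiment with interaction
Parameters: 3
Samples: 4
A - C = a, B - C = b, AB - A - B + C = ab
A = c + a, B = c + b, C = c, AB = c + a + b + ab
[Figure: design diagram of samples A, B, AB and C connected by arrays 1 to 6]
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \\ b \\ \mathit{ab} \end{pmatrix}}_{\beta} = \begin{pmatrix} 0 + b + 0 \\ a + 0 + 0 \\ 0 + b + \mathit{ab} \\ a + b + \mathit{ab} \\ a - b + 0 \\ a + 0 + \mathit{ab} \end{pmatrix} = \begin{pmatrix} b \\ a \\ b + \mathit{ab} \\ a + b + \mathit{ab} \\ a - b \\ a + \mathit{ab} \end{pmatrix}$$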
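The interaction parameter can be read as the part of the joint treatment's response that the two main effects do not explain. A small numeric sketch of the identity ab = AB - A - B + C follows; all values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
# Hypothetical log-expression levels for one gene, made up for illustration.
c = 5.0                      # control sample C
a, b, ab = 1.0, 0.5, 2.0     # main effects of A and B, and their interaction

levels = {
    "C": c,
    "A": c + a,
    "B": c + b,
    "AB": c + a + b + ab,
}

# The interaction is what is left of AB after removing both main effects.
interaction = levels["AB"] - levels["A"] - levels["B"] + levels["C"]
print(interaction)  # 2.0

# With no interaction the joint treatment would sit at c + a + b.
additive_prediction = levels["A"] + levels["B"] - levels["C"]
print(levels["AB"] - additive_prediction)  # 2.0
```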
52
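Interaction
[Figure: two panels, "ab positive" and "ab negative", each showing the levels c, c + a and c + b and the joint level c + a + b + ab for treatments A, B and the joint treatment]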
53
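Trend analysis
Parameters: 1
Samples: 5
T2 - T1 = a
C = base, T1 = base + a, T2 = base + 2a, T3 = base + 3a, T4 = base + 4a
Note: the possible number of parameters is 4, but we choose here to use only one parameter.
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} a \end{pmatrix}}_{\beta}$$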
54
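Trend analysis
Parameters: 1
Samples: 5
T2 - T1 = a
C = base, T1 = base + a, T2 = base + 2a, T3 = base + 3a, T4 = base + 4a
[Figure: design diagram of samples C, T1, T2, T3 and T4 connected by arrays 1 to 6; plot against time illustrating a big a, a small a and a large negative a]
A straight line model is fitted.
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ -4 \\ 2 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \end{pmatrix}}_{\beta}$$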
55
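Trend analysis
Parameters: 1
Samples: 5
T1 - C = a
C = base, T1 = base + a, T2 = base + 4a, T3 = base + 9a, T4 = base + 16a
[Figure: design diagram of samples C, T1, T2, T3 and T4 connected by arrays 1 to 6; plot against time illustrating a big a, a small a and a large negative a]
A quadratic model is fitted.
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 \\ 3 \\ 5 \\ 7 \\ -16 \\ 4 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \end{pmatrix}}_{\beta}$$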
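The single design column for a trend model can be generated from the time multiplier attached to each sample. The sketch below is an illustration only: the array layout in `arrays` is an assumption inferred from the entries of the design columns on the slides (the transcript does not list which samples go on which array), and the names `time_index` and `trend_column` are hypothetical.

```python
# Time index of each sample: control C is time 0, T1..T4 are times 1..4.
time_index = {"C": 0, "T1": 1, "T2": 2, "T3": 3, "T4": 4}

# Arrays 1 to 6 as (numerator, denominator) pairs. Assumed layout, inferred
# from the design-matrix entries rather than read from the design diagram.
arrays = [("T1", "C"), ("T2", "T1"), ("T3", "T2"),
          ("T4", "T3"), ("C", "T4"), ("T2", "C")]

def trend_column(multiplier):
    """Design column when a sample at time t is modelled as base + multiplier(t) * a."""
    return [multiplier(time_index[num]) - multiplier(time_index[den])
            for num, den in arrays]

linear = trend_column(lambda t: t)         # base + t * a
quadratic = trend_column(lambda t: t * t)  # base + t^2 * a

print(linear)     # [1, 1, 1, 1, -4, 2]
print(quadratic)  # [1, 3, 5, 7, -16, 4]
```

Changing only the multiplier function switches between the straight-line and quadratic models while the arrays and the single parameter a stay the same.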
56
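Trend analysis
Parameters: 1
Samples: 5
T4 - C = a
C = base, T1 = base + 16a, T2 = base + 9a, T3 = base + 4a, T4 = base + a
[Figure: design diagram of samples C, T1, T2, T3 and T4 connected by arrays 1 to 6; plot against time illustrating a big a, a small a and a large negative a]
A quadratic model is fitted.
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 16 \\ -7 \\ -5 \\ -3 \\ -1 \\ 9 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \end{pmatrix}}_{\beta}$$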
57
2 by 3 factorial experiment
Identify DE genes that have different time profiles between different mutants.
a = time effect, b = strains, ab = interaction effect
[Figure: M against time (0, 12, 24) for strain A and strain B, for the case a > 0, b = 0, ab = 0]
58
Design matrix for single-colour arrays
Samples = Parameters
Samples
1. Represent the effect measured by each sample.
   Single-channel representation (4 types of samples)
   A = a, B = b, C = c, D = d
Replicates: A on arrays 1, 2, 3; B on arrays 4, 5, 6; C on arrays 7, 8; D on arrays 9, 10
[Figure: squares represent single-colour arrays, numbers represent the array number]
Observations (data) are log-intensities.
59
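Samples = Parameters
A = a, B = b, C = c, D = d
Replicates: A on arrays 1, 2, 3; B on arrays 4, 5, 6; C on arrays 7, 8; D on arrays 9, 10
Data are log-intensities NOT log-ratios
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{pmatrix}}_{Y} = \underbrace{X}_{X} \times \underbrace{\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}}_{\beta}$$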
60
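Samples = Parameters
A = a, B = b, C = c, D = d
Replicates: A on arrays 1, 2, 3; B on arrays 4, 5, 6; C on arrays 7, 8; D on arrays 9, 10
Data are log-intensities NOT log-ratios. Design matrix is EASY!!!
$$E\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{X} \times \underbrace{\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}}_{\beta} = \begin{pmatrix} a \\ a \\ a \\ b \\ b \\ b \\ c \\ c \\ d \\ d \end{pmatrix}$$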
61
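Contrast matrix
Replicates: A on arrays 1, 2, 3; B on arrays 4, 5, 6; C on arrays 7, 8; D on arrays 9, 10
$$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \end{pmatrix}$$
(rows are contrasts, columns are the parameters a, b, c, d)
In limma, the contrast matrix is the transpose of the above!! i.e. rows
become the columns and the columns become the rows.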
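For single-colour arrays the design matrix is an indicator matrix built from the group label of each array, and for such a design the least-squares estimate of each parameter is simply the mean of that group. The sketch below illustrates this under stated assumptions: the group assignment follows the replicate layout of arrays 1 to 10, the log-intensities are hypothetical, and all names are illustrative.

```python
# Group label of arrays 1 to 10.
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"]
levels = ["A", "B", "C", "D"]

# Indicator design matrix: one row per array, one column per sample type.
design = [[1 if g == level else 0 for level in levels] for g in groups]

def group_means(y):
    """Per-group means, which are the least-squares estimates for this design."""
    return [sum(v for v, g in zip(y, groups) if g == level) / groups.count(level)
            for level in levels]

# Hypothetical log-intensities for one gene, made up for illustration.
y = [8.0, 9.0, 10.0, 7.0, 7.0, 7.0, 6.0, 8.0, 5.0, 6.0]
estimates = group_means(y)

# Contrast matrix in slide layout: one row per contrast.
contrast_matrix = [[1, -1, 0, 0],   # a - b
                   [1, 0, -1, 0]]   # a - c
contrasts = [sum(k * est for k, est in zip(row, estimates)) for row in contrast_matrix]

print(design[0], design[3], design[6], design[8])
print(estimates)  # [9.0, 7.0, 7.0, 5.5]
print(contrasts)  # [2.0, 2.0]
```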
62
Recipe: getting the design matrix right every time
63

Step 1: draw a picture of your experiment
  Make sure your arrays are connected
Step 2: decide on the parameters of interest
  Make sure your parameters of interest do not form a loop in your experimental design picture
  Make sure your parameters involve every treatment type at least once
  Try a few different ways of parameterising your experiments
Step 3: label your parameters
Step 4: specify each slide using the parameters you selected
  Some slides will need a combination of parameters in order to specify them
Step 5

64
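Experimental design: what is hybridised to what?
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
65
A suitable parameterisation
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
66
Can we add another parameter to our model?
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]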
68
Can we add another parameter to our model? No!
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
There are 7 treatments, so there can only be 6 parameters: any more
parameters would create a loop, any fewer would leave out one of the
treatments.
69
Can we add this parameter to our model?
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
70
Can we add this parameter to our model? No! We created a loop.
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
71
What's wrong with this parameterisation?
There are 6 (n-1) parameters! There is no loop! Have we left out a treatment?
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
72
What's wrong with this parameterisation? No parameter involving w13
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
No combination of parameters can specify treatment w13
73
What's wrong with this parameterisation? No parameter involving w13
[Figure: the same design diagram, rearranged]
74
What's wrong with this parameterisation? No parameter involving w13.
There is also a loop! (we can easily see this when we rearrange the picture)
[Figure: the same design diagram, rearranged]
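The three checks used on these slides (n - 1 parameters, no loop, no treatment left out) can be automated by treating each parameter as an edge between two treatments. This is a minimal sketch using a simple union-find structure; the function name is hypothetical, and the two example parameter sets are made up, because the arrows drawn on the slides cannot be recovered from the transcript.

```python
def check_parameterisation(treatments, parameters):
    """Check pairwise parameters: n - 1 of them, no loop, every treatment involved."""
    parent = {t: t for t in treatments}

    def find(t):
        # Follow parent links up to the representative of t's connected group.
        while parent[t] != t:
            t = parent[t]
        return t

    has_loop = False
    for left, right in parameters:
        root_left, root_right = find(left), find(right)
        if root_left == root_right:
            has_loop = True  # both ends are already connected: this edge closes a loop
        else:
            parent[root_left] = root_right

    involved = {t for pair in parameters for t in pair}
    return {
        "right_count": len(parameters) == len(treatments) - 1,
        "has_loop": has_loop,
        "left_out": sorted(set(treatments) - involved),
    }

treatments = ["a4", "a26", "a30", "ref", "w14", "w13", "w8"]

# Hypothetical parameter choices, made up for illustration.
good = [("a4", "ref"), ("a26", "ref"), ("a30", "ref"),
        ("w14", "ref"), ("w13", "ref"), ("w8", "ref")]
bad = [("a4", "ref"), ("a26", "ref"), ("a26", "a4"),
       ("a30", "ref"), ("w14", "ref"), ("w8", "ref")]

print(check_parameterisation(treatments, good))
print(check_parameterisation(treatments, bad))
```

The second example has the right number of parameters but still fails: it contains a loop (a4, a26, ref) and leaves out w13, which is the same pair of problems discussed on the slide.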
75
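A suitable parameterisation
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
76
Another suitable parameterisation
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]
77
Yet another suitable parameterisation!
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8]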
78
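Naming the parameters allows you to specify each array according to
your parameterisation (or model)
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8, with parameters labelled a4, a26, a30, w13, w8 and w14]
79
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8, with parameters labelled a4a30, a26a4, a30, a4w14, a30w8 and a26w13]
80
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8, with parameters labelled a4a30, a26a4, a26w13, a30w8, a4w14 and w13]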
81
A statistician might name your parameters differently
[Figure: design diagram of the treatments a4, a26, a30, ref, w14, w13 and w8, with parameters labelled β1, β2, β3, β4, β5 and β6]
