Title: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions
1. Statistical Analysis of cDNA Microarray Data: Challenges and Solutions
- Toni Reverter
- CSIRO Livestock Industries
AAHL Seminar - 12 Dec. 2002
2. Challenges
- Time Dependent: Chronology
- Data Dependent: Paradigm
- Human Dependent: Skill Integration
3. Challenges
Human Dependent
Historical
- Traditionally, Statistics grew alongside Agriculture
- Introduction to Statistical Analysis
  - Law of Large Numbers
  - Central Limit Theorem
  - SST = SSM + SSE
- Nowadays, Statistics grows alongside (Bio)Technology
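The sum-of-squares identity listed above (total = model + error) can be checked numerically. The following is a minimal sketch, assuming a one-way classification; the group names and observations are made up for illustration and do not come from the seminar.

```python
# Hypothetical observations for three groups (illustrative values only).
groups = {
    "control": [4.0, 5.0, 6.0],
    "diet_a": [7.0, 8.0, 9.0],
    "diet_b": [1.0, 2.0, 3.0],
}


def sums_of_squares(groups):
    """Return (SST, SSM, SSE) for a one-way classification."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    # Total variation of every observation around the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in values)
    ssm = 0.0
    sse = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        # Variation explained by the group means (model).
        ssm += len(ys) * (group_mean - grand_mean) ** 2
        # Variation left within each group (error).
        sse += sum((y - group_mean) ** 2 for y in ys)
    return sst, ssm, sse


sst, ssm, sse = sums_of_squares(groups)
print(f"SST={sst:.2f}  SSM={ssm:.2f}  SSE={sse:.2f}")
```

With these made-up numbers the model term dominates, which is the situation an F-test in ANOVA would flag as a group effect.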
4. Challenges
Human Dependent
Excitement (source of)
E.g. Always log spot intensities and ratios
Speed. Hints and Prejudices
- Biochemist: "My software does it, therefore it's great!"
- Statistician: "Well, I need further evidence to be convinced."
5. Challenges
Human Dependent
Balance
Evidence: It takes 1 ship 10 days to cross the ocean.
Question: How many days does it take for 10 ships to cross the ocean?
Evidence: It takes 1 builder 10 days to build a wall.
Question: How many days does it take for 10 builders to build a wall?
6. Challenges
Human Dependent
Balance
PHD SCHOLARSHIP
Statistical Science Program
MATHEMATICAL SCIENCES INSTITUTE
THE AUSTRALIAN NATIONAL UNIVERSITY
Stipend: $22,771 (2002 rate, indexed annually, tax free)
A PhD Scholarship (APAI) is being offered by the Mathematical Sciences Institute at The ANU. An ARC Linkage Grant held by Professors Peter Hall (ANU) and Don Poskitt (Monash University), in conjunction with BAE Systems, Melbourne, will fund the scholarship. The research problem is in the area of stochastic control applied to ship motion, and involves the development and implementation of both parametric and nonparametric methods. The successful applicant will have a strong interest in statistical methodology, computational techniques, theoretical analysis, and the development of statistical research problems.
7. Challenges
Human Dependent
Balance
              Treated?
              No     Yes
Died?   No    100    150
        Yes   120    120
Survival Rates: Treated 150/270 = 55.56%; Non-Treated 100/220 = 45.45%
22% Increase!
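The arithmetic behind these survival rates can be reproduced directly from the four counts in the table. The sketch below is only an illustration; the variable names are my own. It shows that the quoted 22% is a relative increase, while the absolute difference is about 10 percentage points.

```python
# Counts as read from the slide's 2x2 table (Treated? by Died?).
survived = {"treated": 150, "untreated": 100}
died = {"treated": 120, "untreated": 120}


def survival_rate(group):
    """Proportion of animals in the group that did not die."""
    return survived[group] / (survived[group] + died[group])


rate_treated = survival_rate("treated")      # 150 / 270
rate_untreated = survival_rate("untreated")  # 100 / 220

# Relative change (ratio of rates) versus absolute change (difference of rates).
relative_increase = rate_treated / rate_untreated - 1.0
absolute_increase = rate_treated - rate_untreated

print(f"treated:    {rate_treated:.2%}")
print(f"untreated:  {rate_untreated:.2%}")
print(f"relative increase: {relative_increase:.1%}")
print(f"absolute increase: {absolute_increase:.1%} (percentage points)")
```

Which of the two figures gets reported changes how impressive the treatment looks, even though the underlying counts are identical.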
9. Challenges
Human Dependent
Balance
[Figure: scatter plot of y against x, annotated with three correlations: r = 0.87, r = 0.00 and r = 0.00]
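The plot itself is not recoverable from the transcript. One common way to obtain a strong overall correlation together with two zero correlations is to pool two groups that each show no association but sit in different places. The sketch below illustrates that mechanism with invented points. It is an assumption about what the figure conveyed, not a reconstruction of the original data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical group 1: the four corners of a square, so r is exactly 0.
group1_x = [-1.0, -1.0, 1.0, 1.0]
group1_y = [-1.0, 1.0, -1.0, 1.0]
# Hypothetical group 2: the same square shifted by 5 units on both axes.
group2_x = [x + 5.0 for x in group1_x]
group2_y = [y + 5.0 for y in group1_y]

r_group1 = pearson_r(group1_x, group1_y)
r_group2 = pearson_r(group2_x, group2_y)
r_pooled = pearson_r(group1_x + group2_x, group1_y + group2_y)

print(f"group 1: r = {r_group1:.2f}")
print(f"group 2: r = {r_group2:.2f}")
print(f"pooled:  r = {r_pooled:.2f}")
```

The pooled coefficient is driven entirely by the distance between the two groups, not by any relationship within them.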
10. Challenges
Human Dependent
Interdisciplinary Skills
Minimal knowledge of the application discipline is needed
...failing that, the Statisticians will win,
...but with the wrong weapons.
- Amount of Expression Amount of Response
- Same cut-off point to judge all genes
- Over-emphasis on normalization (Thus, reject Boutique Arrays)
- Over-emphasis on variance stabilization
11. Challenges
Human Dependent
Interdisciplinary Skills
Minimal knowledge of the application discipline is needed: Animal Breeding and Genetics
Ex. 1: What's a Steer?
Ex. 2: Ralf Moser's Data
[Figure: Wt Gain (kg) plotted against Lung Disease]
Options:
1. Gain vs. Disease
2. Medians instead of Means
3. Regression coefficients
12. Solutions
[Figure: Wt Gain (kg) plotted against Disease for four groups: O = Control (Untreated), A = Treatment A, B = Treatment B, AB = Both Treatments]
Model: $\mu_O = \mu$; $\mu_A = \mu + \alpha$; $\mu_B = \mu + \beta$; $\mu_{AB} = \mu + \alpha + \beta + \gamma$
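Under a 2x2 factorial reading of this model, the parameters can be recovered from the four group means. The sketch below assumes the usual parameterisation: the control mean is the baseline, A and B each add a main effect, and AB adds both plus an interaction. The group means are hypothetical, since the slide's data are not in the transcript.

```python
# Hypothetical group means of weight gain in kg (illustrative values only).
means = {"O": 10.0, "A": 12.0, "B": 13.0, "AB": 18.0}


def factorial_effects(m):
    """Solve the 2x2 factorial model for baseline, main effects and interaction."""
    baseline = m["O"]                # control (untreated) mean
    effect_a = m["A"] - m["O"]       # effect of treatment A alone
    effect_b = m["B"] - m["O"]       # effect of treatment B alone
    # What AB shows beyond the sum of the two separate effects.
    interaction = m["AB"] - m["A"] - m["B"] + m["O"]
    return baseline, effect_a, effect_b, interaction


baseline, effect_a, effect_b, interaction = factorial_effects(means)
print(f"baseline={baseline}, A={effect_a}, B={effect_b}, interaction={interaction}")
```

A non-zero interaction means the combined treatment cannot be predicted by adding the two single-treatment effects.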
13. Solutions
[Figure: diagram of the four groups O, A, B and AB]
14. Solutions
[Figure: three experimental designs connecting the groups O, A, B and AB: Reference, Loop and All-Pairs]
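The three layouts named here differ in which pairs of samples share a two-colour array. The sketch below enumerates the pairs for each layout so the number of arrays can be compared. It assumes O serves as the reference and that the loop runs O, A, AB, B. The arrows in the original figure are not recoverable, so both choices are illustrative.

```python
from itertools import combinations

samples = ["O", "A", "B", "AB"]


def reference_design(samples, reference="O"):
    """Every non-reference sample is hybridised against the reference."""
    return [(reference, s) for s in samples if s != reference]


def loop_design(order):
    """Each sample is hybridised with the next one, closing the loop at the end."""
    return [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]


def all_pairs_design(samples):
    """Every pair of samples shares one array."""
    return list(combinations(samples, 2))


designs = {
    "Reference": reference_design(samples),
    "Loop": loop_design(["O", "A", "AB", "B"]),  # assumed loop order
    "All-Pairs": all_pairs_design(samples),
}
for name, pairs in designs.items():
    print(f"{name}: {len(pairs)} arrays -> {pairs}")
```

Listing the pairs makes the trade-off explicit: more arrays buy more direct comparisons between treatments.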
15. Solutions
Probability of both Female?
Case 1. No Information: 1/4
Case 2. The one on the left is female: 1/2
Case 3. At least one of them is female: 1/3
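The three answers follow from counting equally likely outcomes under each piece of information. The sketch below enumerates the sample space for two individuals, assuming each is female or male with equal probability and independently of the other. The function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All (left, right) outcomes for two individuals: FF, FM, MF, MM.
outcomes = list(product("FM", repeat=2))


def prob_both_female(condition):
    """P(both female | condition), by counting the outcomes that remain possible."""
    possible = [o for o in outcomes if condition(o)]
    favourable = [o for o in possible if o == ("F", "F")]
    return Fraction(len(favourable), len(possible))


case1 = prob_both_female(lambda o: True)         # no information
case2 = prob_both_female(lambda o: o[0] == "F")  # the one on the left is female
case3 = prob_both_female(lambda o: "F" in o)     # at least one is female

print(case1, case2, case3)
```

Cases 2 and 3 sound similar but rule out different outcomes: case 2 leaves FF and FM, while case 3 leaves FF, FM and MF.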
16. Solutions
17. Solutions
Clever Programming
Tailored to your needs
    N=1
    for filename in R16T0S1.gpr R16T0S2.gpr R16T24S1.gpr R16T24S2.gpr \
                    S32T0S1.gpr S32T0S2.gpr S32T24S1.gpr S32T24S2.gpr
    do
      # Get valid readings, compute log ratios
      awk 'NR>30 && NF>0 && $4!="no_spot" &&
           substr($4,1,5)!="score" && substr($4,1,5)!="custo" &&
           substr($4,1,6)!="spotre" && $9>$12 && $18>$21 {
             print $4, $9-$12, $18-$21,
                   log($9-$12)/log(2.0), log($18-$21)/log(2.0)
           }' "$filename" | sort > junk1
      # Drop spots with equal channels; append log ratio and mean log intensity
      awk '$2!=$3 {print $0, $4-$5, 0.5*($4+$5)}' junk1 > junk2
      # Get the median of the log ratios (column 6)
      REC=$(wc -l junk2 | awk '{print int($1/2)}')
      MED=$(sort -n -k6,6 junk2 | awk -v rec="$REC" 'NR==rec {print $6}')
      echo "Median of file" "$filename" "=" "$MED"
      # Global normalization: subtract the median from each log ratio
      # int(slide/2+.5) maps slides 1..8 to groups 1,1,2,2,3,3,4,4
      awk -v median="$MED" -v slide="$N" \
          '{print "Slide_"slide, int(slide/2+.5), $1, $6-median}' junk2 |
        sort -k3 > "dat.$N"
      N=$((N + 1))
    done
    cat dat.1 dat.2 dat.3 dat.4 dat.5 dat.6 dat.7 dat.8 > total.dat
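The per-slide steps of the script (background subtraction, log ratio, subtracting the slide median) can also be sketched in Python. This is an illustrative equivalent under stated assumptions, not the author's code. The tuple layout and the sample intensities are hypothetical, and `statistics.median` averages the two middle values for an even count, whereas the shell script takes a single middle row.

```python
import math
from statistics import median


def normalise_slide(spots):
    """Median-centre the log2 ratios of one slide.

    spots: iterable of (gene_id, red_fg, red_bg, green_fg, green_bg).
    Returns a list of (gene_id, normalised_log_ratio, mean_log_intensity).
    """
    rows = []
    for gene_id, red_fg, red_bg, green_fg, green_bg in spots:
        red = red_fg - red_bg
        green = green_fg - green_bg
        # Keep only spots above background in both channels and with
        # unequal channels, mirroring the filters in the shell script.
        if red <= 0 or green <= 0 or red == green:
            continue
        log_red, log_green = math.log2(red), math.log2(green)
        rows.append((gene_id, log_red - log_green, 0.5 * (log_red + log_green)))
    centre = median(m for _, m, _ in rows)
    # Global normalisation: subtract the slide median from every log ratio.
    return [(gene_id, m - centre, a) for gene_id, m, a in rows]


# Hypothetical foreground/background readings for four spots.
spots = [
    ("g1", 500, 100, 300, 100),
    ("g2", 900, 100, 200, 100),
    ("g3", 150, 100, 500, 100),
    ("g4", 90, 100, 400, 100),   # red channel below background: dropped
]
normalised = normalise_slide(spots)
for gene_id, m, a in normalised:
    print(gene_id, round(m, 3), round(a, 3))
```

After this step the median log ratio of every slide is zero, so slides can be compared on a common scale.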
18. Solutions
Clever Programming
Tailored to your needs
- Your Needs: Important values are
  - Away from (0,0)
  - In quadrants 1 and 4
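A selection rule of this kind takes only a few lines once the two axes are fixed. The sketch below keeps points with a positive x value (quadrants 1 and 4) that lie at least a chosen distance from the origin. The transcript does not say what the axes represent, so the coordinates, the gene labels and the `min_distance` threshold are all hypothetical.

```python
import math


def is_important(x, y, min_distance=1.0):
    """True for points with x > 0 (quadrants 1 and 4) far enough from (0, 0)."""
    return x > 0 and math.hypot(x, y) >= min_distance


# Hypothetical (x, y) values for five genes.
points = {
    "g1": (2.0, 1.5),    # quadrant 1, far from origin: kept
    "g2": (0.2, -0.1),   # quadrant 4, but too close to origin
    "g3": (-2.0, 2.0),   # quadrant 2: rejected
    "g4": (1.5, -2.5),   # quadrant 4, far from origin: kept
    "g5": (-0.1, -0.3),  # quadrant 3: rejected
}
selected = sorted(name for name, (x, y) in points.items() if is_important(x, y))
print(selected)
```

Encoding the rule explicitly is what "tailored to your needs" amounts to in practice: the criterion comes from the biological question, not from a software default.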
19. Solutions
Clever Programming
Tailored to your needs
20. Solutions
Clever Programming
Tailored to your needs
Get to know/use all the available options:
1. t-Statistics
   - Standard
   - Penalised
2. Clustering
   - Location-Based (k-Means, ...)
   - Model-Based (Mixtures of Distributions)
3. ANOVA (Linear Models)
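The difference between the standard and penalised t-statistics of option 1 can be illustrated with two invented genes. The sketch below assumes that "penalised" means adding a positive constant to the standard error in the denominator, one common form of the idea. The slide does not specify a formula, and the `penalty` value and log ratios are hypothetical.

```python
import math
from statistics import mean, stdev


def standard_t(values):
    """One-sample t-statistic testing whether the mean log ratio is zero."""
    standard_error = stdev(values) / math.sqrt(len(values))
    return mean(values) / standard_error


def penalised_t(values, penalty):
    """Same statistic with a constant added to the standard error (assumed form)."""
    standard_error = stdev(values) / math.sqrt(len(values))
    return mean(values) / (standard_error + penalty)


# Hypothetical replicate log ratios for two genes.
tiny_but_stable = [0.10, 0.11, 0.09, 0.10]   # small effect, very small variance
large_but_noisy = [1.0, 1.4, 0.6, 1.0]       # large effect, moderate variance

for name, values in [("tiny_but_stable", tiny_but_stable),
                     ("large_but_noisy", large_but_noisy)]:
    print(name, round(standard_t(values), 2), round(penalised_t(values, 0.1), 2))
```

With the standard statistic the gene with the tiny but very consistent effect ranks first; with the penalty the gene with the larger effect takes over, which is the usual motivation for penalising.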
21. Conclusions
Statistical Analysis of cDNA Microarray Data
- GENERAL
  - Still in its infancy (possibly even embryonic stage)
  - Many decisions have a heuristic rather than a theoretical foundation
  - No hope for a "one size fits all" software
  - Safer to aim towards "tailor to one's needs"
  - Integration of interdisciplinary skills is a must
- LIVESTOCK SPECIES
  - Tailing humans (at the moment)
  - Strong background knowledge of genetics accumulated
  - Journals will soon be inundated
  - CLI has the opportunity to participate