Title: Whole genome transcriptome variation in Arabidopsis thaliana
1Whole genome transcriptome variation in Arabidopsis thaliana
Xu Zhang, Borevitz Lab
2Arabidopsis thaliana has adapted to highly variable environments
3Transcription and splicing
[Diagram: chromosomal DNA (exons 1-3, introns 1-2) is transcribed into nuclear RNA; RNA splicing yields messenger RNA (exons 1, 2, 3; or exons 1 and 3)]
4Whole genome tiling array
- High density and resolution: 1.6M unique probes at 35 bp spacing
- Without bias toward known transcripts
Genetic hybridization polymorphisms could affect the estimation of gene expression
5The experiment
Col ♀ x Col ♂
Van ♀ x Van ♂
Col ♀ x Van ♂
Van ♀ x Col ♂
- Parental strains and reciprocal F1 hybrids
- mRNA from total RNA, and genomic DNA
6Double-stranded random labeling
[Diagram: poly(A) mRNA (AAAAA), random reverse transcription, double-stranded cDNA, random priming]
7Outline
- Sequence polymorphisms
- Gene expression variation
- Splicing variation
- HMM for de novo transcription profiling
- Deletions/duplications underlying expression variation
9Single Feature Polymorphisms and indels
[Figure labels: SFP; SFPs; deletion or duplication in Van]
10Sequence polymorphisms
Type    FDR (%)  Col > Van  Van > Col  Total
SFPs    11.82    135769     14934      150703
SFPs    7.66     126443     9479       135922
SFPs    5.22     118381     6662       125043
SFPs    3.88     110861     4979       115840
SFPs    3.15     104115     3820       107935

Type    Model selection  Deletion  Duplication  Total
Indels  BIC              527       32           559
Indels  AIC              1180      100          1280

SFPs were removed before gene expression analysis. The analyzed indels were >200 bp.
11Deletions vs duplications
12Distribution of indels along chromosomes
14Additive, dominant and maternal effects of gene
expression
15The linear model
Gene probe intensity = additive + dominant + maternal + ε
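A minimal sketch of how the three effects could be estimated for one gene. The contrast coding is an assumption, not taken from the slides: Col = mean + additive, Van = mean - additive, and the two reciprocal F1 hybrids = mean + dominant ± maternal. With four genotype groups and four parameters, the least-squares estimates under this coding reduce to contrasts of group means. The names `f1c` and `f1v` (F1 with a Col or Van mother) are hypothetical labels.

```python
from statistics import mean

def genotype_effects(col, van, f1c, f1v):
    """Estimate additive, dominant and maternal effects for one gene.

    Each argument is a list of replicate intensities for one genotype.
    f1c / f1v: F1 hybrids with a Col / Van mother (assumed naming).
    """
    m_col, m_van = mean(col), mean(van)
    m_f1c, m_f1v = mean(f1c), mean(f1v)
    midparent = (m_col + m_van) / 2
    return {
        "mean": midparent,
        "additive": (m_col - m_van) / 2,              # half the parental difference
        "dominant": (m_f1c + m_f1v) / 2 - midparent,  # F1 deviation from midparent
        "maternal": (m_f1c - m_f1v) / 2,              # half the reciprocal-cross difference
    }

effects = genotype_effects([10, 12], [6, 8], [9.5, 10.5], [8.5, 9.5])
```

Testing each effect for significance would additionally need the residual variance from the replicates, which this sketch leaves out.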
16Gene expression variation between genotypes
17The pattern of gene expression inheritance
[Figure: mean gene intensity in Col, Van, F1v and F1c; inheritance classes shown: paternal, maternal, Col dominant, Van dominant, F1v dominant, F1c dominant, over dominant]
18The pattern of gene expression inheritance
20Default expression status of exons and introns
Exon/intron probe intensity = additive + dominant + maternal + ε
- Exons: correction for gene expression before comparison
  - corrected by gene mean
  - corrected by a median polish
  - splicing index (Mean_exon / Mean_gene)
- Introns: direct comparison
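Two of the exon corrections listed above can be sketched for a single gene in a single sample. This is an illustration under assumptions: the gene mean is taken over all exon probes of the gene, the mean correction is a simple subtraction (as one would do on a log scale), and the function name and input layout are hypothetical. The median-polish variant is not shown.

```python
from statistics import mean

def exon_summaries(probe_intensity, probe_exon):
    """Summarise each exon of one gene relative to the whole gene.

    probe_intensity: intensities of the gene's exon probes
    probe_exon: exon label of each probe, in the same order
    """
    gene_mean = mean(probe_intensity)
    summaries = {}
    for exon in dict.fromkeys(probe_exon):  # unique labels, original order
        values = [x for x, label in zip(probe_intensity, probe_exon) if label == exon]
        exon_mean = mean(values)
        summaries[exon] = {
            "splicing_index": exon_mean / gene_mean,  # Mean_exon / Mean_gene
            "mean_corrected": exon_mean - gene_mean,  # assumed subtractive correction
        }
    return summaries

result = exon_summaries([2, 2, 4, 4, 6, 6], ["e1", "e1", "e2", "e2", "e3", "e3"])
```

An exon with a splicing index well below 1 in one genotype but near 1 in the other would be a candidate for differential splicing, independent of the overall expression level of the gene.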
21Differential exonic splicing
Method                    Delta  Sig+  Sig-  Total  False  FDR (%)
Correction by mean        0.1    410   141   551    1378   250
Correction by mean        0.2    143   47    190    197    104
Correction by mean        0.3    87    27    114    70     61.10
Correction by mean        0.4    46    19    65     34     51.60
Correction by mean        0.5    26    13    39     19     47.90
Correction by med-polish  0.3    267   182   449    69     15.30
Correction by med-polish  0.4    126   52    178    33     18.40
Correction by med-polish  0.5    68    31    99     18     18.10
Correction by med-polish  0.6    43    21    64     11     16.60
Correction by med-polish  0.7    28    18    46     7      14.50
Splicing index            0.6    675   500   1175   6      0.52
Splicing index            0.7    532   347   879    2      0.25
Splicing index            0.8    409   261   670    1      0.12
Splicing index            0.9    343   183   526    0      0.06
Splicing index            1      273   148   421    0      0.03
Exon probe intensity = additive + dominant + maternal + ε
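The FDR column in this table appears consistent with the number of expected false calls divided by the total number of calls, in percent (for example 1378 / 551 is about 250%). A small sketch of that arithmetic, with a hypothetical function name; how the false calls themselves were estimated is not stated on the slide.

```python
def estimated_fdr_percent(false_calls, total_calls):
    """Expected false calls divided by total calls, expressed in percent."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return 100.0 * false_calls / total_calls

# Rows quoted from the table as (Sig+, Sig-, False)
rows = [(410, 141, 1378), (267, 182, 69), (675, 500, 6)]
for sig_up, sig_down, false_calls in rows:
    total = sig_up + sig_down
    print(total, round(estimated_fdr_percent(false_calls, total), 2))
```

A value above 100% simply means the expected number of false calls exceeds the number of exons called at that threshold, so the calls at that Delta carry no usable signal.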
22Comparison for enrichment in known alternatively spliced exons
                                      Threshold 1          Threshold 2          Threshold 3
Method               Class            Called  Not called   Called  Not called   Called  Not called
Corrected by mean    Known            24      1054         12      1090         10      1096
Corrected by mean    Not known        437     72429        156     73272        91      73467
Corrected by mean    Fold enrichment  3.8                  5.2                  7.4
Corrected by mean    p-value          1.21E-07             8.83E-06             2.78E-06
Corrected by median  Known            22      1104         10      1116         5       1121
Corrected by median  Not known        371     73369        145     73595        84      73656
Corrected by median  Fold enrichment  3.9                  4.5                  3.9
Corrected by median  p-value          1.98E-07             1.31E-04             1.12E-02
Splicing index       Known            11      1115         6       1120         4       1122
Splicing index       Not known        456     73284        160     73580        98      73642
Splicing index       Fold enrichment  1.6                  2.5                  2.7
Splicing index       p-value          1.26E-01             4.04E-02             6.84E-02
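The fold-enrichment values in this table can be reproduced as the odds ratio of each 2x2 table (known vs. not known, called vs. not called). The sketch below checks this for the "corrected by mean" rows; the function name is hypothetical, and the test behind the p-values is not stated on the slide, so it is not reproduced here.

```python
def odds_ratio(known_called, known_not_called, other_called, other_not_called):
    """Odds of being called among known exons over the odds among the rest."""
    return (known_called * other_not_called) / (known_not_called * other_called)

# "Corrected by mean" counts at thresholds 1 to 3, quoted from the table
corrected_by_mean = [
    (24, 1054, 437, 72429),
    (12, 1090, 156, 73272),
    (10, 1096, 91, 73467),
]
folds = [round(odds_ratio(*counts), 1) for counts in corrected_by_mean]
```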
23Differential intronic splicing
Delta  Sig+  Sig-  Total  False  FDR (%)
0.3    187   166   353    17     4.83
0.4    119   92    211    7      3.22
0.5    84    65    149    4      2.44
0.6    61    59    120    2      1.77
0.7    53    53    106    1      1.22
Intron probe intensity = additive + dominant + maternal + ε
24Experimentally determined FDR for differential splicing
Method                      Significant  Estimated FDR (%)  Tested  Confirmed  Experimental FDR (%)
Exon (corrected by mean)    551          250                45      24         46.70
Exon (corrected by mean)    190          104                25      14         44.00
Exon (corrected by mean)    27           61                 8       7          12.50
Exon (corrected by median)  449          15.30              33      17         48.50
Exon (corrected by median)  178          18.40              19      10         47.40
Exon (corrected by median)  46           14.50              7       5          28.60
Intron                      353          4.83               59      38         35.60
Intron                      93           1.01               38      25         34.20
Intron                      70           0.97               35      24         31.40
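The experimental FDR values match the fraction of tested candidates that were not confirmed, in percent (for example (45 - 24) / 45 is about 46.7%). A short sketch of that calculation, with a hypothetical function name:

```python
def experimental_fdr_percent(tested, confirmed):
    """Share of experimentally tested calls that failed to confirm, in percent."""
    if tested <= 0 or not 0 <= confirmed <= tested:
        raise ValueError("need tested > 0 and 0 <= confirmed <= tested")
    return 100.0 * (tested - confirmed) / tested

# (tested, confirmed) pairs quoted from the table
pairs = [(45, 24), (8, 7), (59, 38)]
rates = [round(experimental_fdr_percent(t, c), 1) for t, c in pairs]
```

With as few as 7 or 8 candidates tested in some rows, these percentages are coarse estimates.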
25Differential splicing is predominantly additive
in F1 hybrids
26Examples of differential splicing
27Limitations of annotation-based splicing analysis
- Depends on annotation
- The correction for gene expression is conservative overall
- Quality of probes
29Generalized tiling array HMM
(by Jake Byrnes)
- 3-state HMM
- Discrete distribution for emission probability
- Transition probability accounts for probe spacing
- Baum-Welch parameter estimation
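A rough sketch of the kind of model these bullets describe, not the implementation used in the talk. Everything specific here is an assumption: the three states are read as "Col > Van", "no difference" and "Van > Col"; the discrete emissions are bins of a per-probe difference statistic; and the spacing-aware transition lets the chance of keeping the current state decay exponentially with the gap between probes (`scale_bp` is a hypothetical parameter). Only the forward-backward posterior step is shown; Baum-Welch re-estimation is omitted.

```python
import math

def transition(i, j, gap_bp, start, scale_bp=200.0):
    """Assumed spacing-aware transition: nearby probes tend to share a state."""
    keep = math.exp(-gap_bp / scale_bp)
    return keep * (1.0 if i == j else 0.0) + (1.0 - keep) * start[j]

def normalise(row):
    total = sum(row)
    return [v / total for v in row]

def posteriors(obs, positions, start, emit, scale_bp=200.0):
    """Scaled forward-backward: P(state | all observations) for each probe.

    obs: discrete symbol index per probe; emit[state][symbol] are emission
    probabilities; positions are probe coordinates in bp.
    """
    n, k = len(obs), len(start)
    fwd = [normalise([start[s] * emit[s][obs[0]] for s in range(k)])]
    for t in range(1, n):
        gap = positions[t] - positions[t - 1]
        fwd.append(normalise([
            emit[s][obs[t]] * sum(fwd[-1][r] * transition(r, s, gap, start, scale_bp)
                                  for r in range(k))
            for s in range(k)
        ]))
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        gap = positions[t + 1] - positions[t]
        bwd[t] = normalise([
            sum(transition(s, r, gap, start, scale_bp) * emit[r][obs[t + 1]] * bwd[t + 1][r]
                for r in range(k))
            for s in range(k)
        ])
    return [normalise([f * b for f, b in zip(fr, br)]) for fr, br in zip(fwd, bwd)]

# Toy example: 16 probes at 35 bp spacing with a run of "Van > Col"-like symbols
start = [0.05, 0.9, 0.05]
emit = [[0.8, 0.15, 0.05], [0.1, 0.8, 0.1], [0.05, 0.15, 0.8]]
obs = [1] * 3 + [2] * 10 + [1] * 3
post = posteriors(obs, [35 * i for i in range(len(obs))], start, emit)
```

Tying the transition to distance means that widely separated probes carry less information about each other than adjacent ones, which matters on an array where probe spacing is uneven.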
30Generalized tiling array HMM
(by Jake Byrnes)
31A nice model also needs a better array
- Array density is not enough to distinguish exon/intron boundaries
- Probe quality
32Differential segments: >3 continuous probes with posterior probability >0.99.
Differentially expressed genes: annotated genes for which ≥33% of their probes reside within the observed differential segments.
Differentially spliced genes: annotated genes for which <33% of probes reside within the differential segment, or annotated genes containing 2 differential segments with different states.
Novel gene boundaries: differential segments with >5 probes extending beyond an annotated gene boundary.
Novel transcripts: differential segments with >5 probes and outside any annotated gene boundary.
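The calling rules on this slide can be sketched as follows. This is an illustration under assumptions: the gene-level thresholds are read as 33% of a gene's probes, genes and segments are addressed by probe index, and a gene overlapping segments of two different states is treated as spliced regardless of coverage. Function names and the input layout are hypothetical.

```python
from itertools import groupby

def call_segments(probe_calls, cutoff=0.99, min_probes=4):
    """Find runs of more than 3 consecutive confident probes in one state.

    probe_calls: one (state, posterior) pair per probe, where state is a
    differential state label or None for "no difference".
    Returns (start, end, state) tuples with end exclusive.
    """
    confident = [s if s is not None and p > cutoff else None for s, p in probe_calls]
    segments, start = [], 0
    for state, run in groupby(confident):
        length = len(list(run))
        if state is not None and length >= min_probes:
            segments.append((start, start + length, state))
        start += length
    return segments

def classify_gene(gene_start, gene_end, segments, min_fraction=0.33):
    """Classify one annotated gene spanning probes gene_start..gene_end-1."""
    hits = [(max(s, gene_start), min(e, gene_end), state)
            for s, e, state in segments if s < gene_end and e > gene_start]
    if not hits:
        return "unchanged"
    if len({state for _, _, state in hits}) > 1:
        return "differentially spliced"
    covered = sum(e - s for s, e, _ in hits)
    if covered / (gene_end - gene_start) >= min_fraction:
        return "differentially expressed"
    return "differentially spliced"

calls = [(None, 0.0)] * 2 + [("Col>Van", 0.995)] * 5 + [(None, 0.0)] * 3
segments = call_segments(calls)
```

Novel gene boundaries and novel transcripts would follow the same pattern, comparing segments of more than 5 probes against annotated gene coordinates.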
33Length distribution of segments called by HMM
34An example of HMM detected segments
35Comparison of annotation-based analysis and HMM
Method      Category                       Col > Van  Van > Col  Total
Annotation  Differential expression        1264       679        1943
Annotation  Alternative exonic splicing    267        182        449
Annotation  Alternative intronic splicing  187        166        353
HMM         Differential expression        1676       966        2642
HMM         Differential splicing          876        540        1416
HMM         Un-annotated transcript        35         42         77
HMM         Un-annotated 5'                32         21         53
HMM         Un-annotated 3'                30         10         40
36Comparison of annotation-based analysis and HMM
                                HMM:  Expression   Expression   Splicing     Splicing
Annotation               Total        (Col > Van)  (Van > Col)  (Col > Van)  (Van > Col)
HMM total                             1676         966          925          560
Expression (Col > Van)   1264         1033                      138
Expression (Van > Col)   679                       553                       72
Splicing (Col > Van)     544          183                       53
Splicing (Van > Col)     270                       72                        35
38Genes with deletions in Van also have low
expression in Col
39Differential expression of genes with
deletion/duplication
40Acknowledgements
Justin Borevitz, Yan Li, Christos Noutsos, Geoff Morris, Andy Cal, Jake Byrnes, Josh Rest