Title: RNA surveillance and degradation: the Yin Yang of RNA
1 RNA surveillance and degradation: the Yin Yang of RNA
[Figure: RNA production (RNA Pol II, polyadenylated RNA, ribosome) versus RNA destruction]
2 MODEL
[Figure: Mtr4 and Trf4p; polyadenylation of hypomodified tRNAiMet by Trf4p; degradation of hypomodified tRNAiMet by the exosome (Csl4p, Rrp4p, Rrp40p, Rrp41p, Rrp42p, Rrp43p, Rrp44p, Rrp45p, Rrp46p, Mtr3p, Rrp6p)]
- Hypothetical diagram of the exosome
3 Workflow
4 Next Gen sequencing PolyA-Seq
[Figure: TRAMP complex (Papd5, Mtr4, ZCCHC7) with polyadenylated RNA; siRNA knockdown]
5 Library creation for NGS
6 Map paired end reads to genome
- BWA (Burrows-Wheeler Aligner) algorithm used to map each pair of reads to the genome
- Report each pair of reads as a single nucleotide position within the genome where polyadenylation was detected in an RNA sample
- Average insert size 300
- Read size 45
7 Raw reads vs Mapped reads
Data type / KD type   Raw reads     Mapped reads   Positions
Replicate data
  Mtr4                15,135,078    10,853,534     651,551
  Ctrl                16,348,780    11,708,310     652,128
  Rrp6                15,971,926    12,388,266     705,173
Original data
  Mtr4                ND            34,204,534     1,124,968
  Ctrl                ND            7,195,942      582,256
  Rrp6                ND            8,241,505      597,672
Normalization of data: reads per million (rpm)
8 Analysis
- Starting with RefSeq database
- Raw read counts converted to reads per million
- (Reads at position / total reads in sample) x 1,000,000
- Remove all non-coding RNAs
- From each sample collect normalized reads mapping at the 3' end +/- 50 bases of each RefSeq encoding protein
- Dot plot normalized reads on log scale, X axis = control and Y axis = mMtr4 KD
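The normalization and 3' end collection steps above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names, the `(chromosome, strand, position)` keys, and the toy counts are all hypothetical, and the choice of denominator (here simply `total_reads`) is an assumption.

```python
def reads_per_million(position_counts, total_reads):
    """Convert raw read counts per pA position to reads per million (rpm)."""
    scale = 1_000_000 / total_reads
    return {pos: count * scale for pos, count in position_counts.items()}


def sum_near_three_prime_end(rpm_by_position, chrom, strand, three_prime_end, flank=50):
    """Sum rpm at pA positions within +/- flank bases of an annotated 3' end."""
    lo, hi = three_prime_end - flank, three_prime_end + flank
    return sum(
        value
        for (c, s, pos), value in rpm_by_position.items()
        if c == chrom and s == strand and lo <= pos <= hi
    )


# Hypothetical toy data: (chromosome, strand, position) -> raw read count
raw = {("chr12", "+", 1000): 30, ("chr12", "+", 1040): 10, ("chr12", "+", 5000): 60}
rpm = reads_per_million(raw, total_reads=100)

# Signal for a hypothetical protein-coding RefSeq 3' end at position 1010
signal = sum_near_three_prime_end(rpm, "chr12", "+", three_prime_end=1010)
```

Computing one such value per gene for the control and for the knockdown gives the paired points for the dot plot.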
9 mRNA polyadenylation does not change between Mtr4 and control KD
R² = 0.95141
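For reference, an R² value for a log-scale dot plot can be computed as the squared Pearson correlation of the log-transformed values. The sketch below assumes this definition and a pseudocount of 1 to handle zeros; the slides do not state how the reported value was calculated, and the example numbers are hypothetical.

```python
import math


def r_squared_log10(x_values, y_values, pseudocount=1.0):
    """Squared Pearson correlation of log10-transformed paired values."""
    xs = [math.log10(x + pseudocount) for x in x_values]
    ys = [math.log10(y + pseudocount) for y in y_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-gene rpm values (control on X, knockdown on Y)
control_rpm = [9.0, 99.0, 999.0, 9999.0]
mtr4_kd_rpm = [12.0, 80.0, 1100.0, 9000.0]
r2 = r_squared_log10(control_rpm, mtr4_kd_rpm)
```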
10 Problems encountered
- Sequencing read depth very different in the original data
- 34 million mapped reads in one sample, 8 million in the other
- Lack of 3 replicates for robust statistical analysis of data
- Removal of internal A
- Sequencing reads that map to an oligoadenylate tract in the genome
- Algorithm developed misses many
- Manual removal takes too much time.
11 Remove Internal A
[Figure: internal A tract, AAAAAAAA paired with TTTTTTTTT]
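One common heuristic for flagging internal A sites is to inspect the genomic sequence immediately downstream of the mapped pA position and reject sites where it is A-rich. The sketch below is an assumption-laden illustration of that heuristic, not the algorithm referred to in the slides; the function name and the thresholds (`window`, `min_a`, `min_run`) are hypothetical and would need tuning.

```python
def looks_like_internal_a(genome_seq, cleavage_index, window=10, min_a=7, min_run=6):
    """Flag a plus-strand pA site whose downstream genomic sequence is A-rich.

    genome_seq     : genomic sequence as a string (plus strand)
    cleavage_index : 0-based index of the last templated base before the tail
    """
    downstream = genome_seq[cleavage_index + 1 : cleavage_index + 1 + window].upper()
    a_count = downstream.count("A")
    has_run = "A" * min_run in downstream
    # Either many A's overall or one long uninterrupted run marks the site as suspect
    return a_count >= min_a or has_run


# Hypothetical sequences: the first has a genomic A tract right after the site
with_tract = "GCTAGCTAGC" + "AAAAAAAA" + "GCGC"
without_tract = "GCTAGCTAGCGTCGATCGGTCA"

flag_tract = looks_like_internal_a(with_tract, cleavage_index=9)
flag_clean = looks_like_internal_a(without_tract, cleavage_index=9)
```

For minus-strand sites the same check would be applied to the reverse complement, which corresponds to a T tract upstream on the plus strand.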
12 How to mine the data based on a hypothesis
- Hypothesis: PolyA RNAs of unknown identity will accumulate upon depletion of mMtr4 vs. the control.
- How can the transcriptome be queried?
- How detailed should a query be?
- Every pA position, or only those exhibiting greater than x number of raw/normalized reads?
- How do we find significant differences with one sample, or possibly two?
- How can repetitive elements be accounted for in the data?
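One possible way to query every pA position while applying a read cutoff x is sketched below. It ranks positions by log2 fold change between knockdown and control; it does not establish statistical significance, which still requires replicates. The function name, cutoffs, and pseudocount are hypothetical choices for illustration.

```python
import math


def candidate_positions(kd_rpm, ctrl_rpm, min_rpm=1.0, min_log2_fc=1.0, pseudocount=0.1):
    """Return pA positions enriched in the knockdown, sorted by log2 fold change."""
    results = []
    for pos in set(kd_rpm) | set(ctrl_rpm):
        kd = kd_rpm.get(pos, 0.0)
        ctrl = ctrl_rpm.get(pos, 0.0)
        # Skip positions below the read cutoff x in both samples
        if max(kd, ctrl) < min_rpm:
            continue
        log2_fc = math.log2((kd + pseudocount) / (ctrl + pseudocount))
        if log2_fc >= min_log2_fc:
            results.append((pos, kd, ctrl, log2_fc))
    return sorted(results, key=lambda r: r[3], reverse=True)


# Hypothetical normalized reads keyed by (chromosome, position)
kd = {("chr12", 100): 7.9, ("chr12", 200): 2.0, ("chr12", 300): 0.2}
ctrl = {("chr12", 100): 0.9, ("chr12", 200): 2.0}
hits = candidate_positions(kd, ctrl)
```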
13 Custom annotation to remove bias from existing annotations
- Data mapped with Bowtie to mouse genome mm10 build
- Mapped data from KD and control compared using Cufflinks to explore gene expression differences using a custom annotation
- Custom annotation
- 1000 base pair genes with 500 base pair overlap with next gene
- This did not work well
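A tiling of this kind (1000 bp "genes", each overlapping the next by 500 bp) could be generated as sketched below. The output as GTF lines, the `custom` source label, and the gene naming scheme are assumptions for illustration; the slides do not say which annotation format or names were used.

```python
def tiled_windows(chrom_length, window=1000, step=500):
    """Yield 1-based inclusive (start, end) windows; neighbours overlap by window - step."""
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        yield start, end
        if end == chrom_length:
            break
        start += step


def gtf_line(chrom, start, end, strand, gene_id):
    """Format one window as a 9-column GTF exon record."""
    attributes = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
    fields = [chrom, "custom", "exon", str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(fields)


# Hypothetical 2500 bp chromosome, plus strand only
windows = list(tiled_windows(2500))
lines = [
    gtf_line("chr12", s, e, "+", f"chr12_plus_{i:06d}")
    for i, (s, e) in enumerate(windows, start=1)
]
```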
14 Problems with using custom annotation
- First real problem was that no computing resource could handle more than 5000 genes of the custom annotation at a time
- One chromosome had 147K genes
- There was a problem with assignment when the reads overlapped
- Cuffdiff would randomly assign the reads to only one of the genes.
- Overlaps split into two fasta files, but we could not capture differences in the data that we knew exist.
- Cuffdiff collects data from the entire 1000 bp gene and compares between 2 samples
- This method leads to false negatives for pA data where the focus is on one or a few positions as a pA event.
15 What next?
16 F-Seq
- Tags to identify specific sequence features for different library preparations (ChIP-seq), (DNase-seq) and (pA-seq).
- Will summarize and display individual sequence data as an accurate and interpretable signal, by generating a continuous tag sequence density estimation.
17 Generating Peaks with F-Seq
- 1. Estimate kernel density to estimate pdf
- 2. Compute threshold
- n_w = nw/L
- x_c
- Repeat step 2 k times
- s SDs above the mean
- 2.1 threshold output module is modifiable
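One reading of the threshold steps above is sketched below: take n_w = nw/L as the expected number of tags in a window of size w (n tags on a chromosome of length L), scatter n_w tags uniformly in the window, evaluate the kernel density at a fixed point x_c, repeat k times, and set the threshold s standard deviations above the mean of those values. This is a simplified illustration, not F-Seq's code; the Gaussian kernel, the bandwidth, and all parameter values are assumptions.

```python
import math
import random
import statistics


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density_at(x, positions, bandwidth, n_total):
    """Kernel density estimate at x: (1 / (n * b)) * sum of K((x - x_i) / b)."""
    total = sum(gaussian_kernel((x - p) / bandwidth) for p in positions)
    return total / (n_total * bandwidth)


def simulated_threshold(n, chrom_length, window, bandwidth, k=1000, s=4.0, seed=0):
    """Threshold = mean + s * SD of the density at x_c under uniform tag placement."""
    rng = random.Random(seed)
    n_w = max(1, round(n * window / chrom_length))  # n_w = nw/L
    x_c = window / 2.0  # fixed point at the window centre
    samples = []
    for _ in range(k):
        positions = [rng.uniform(0, window) for _ in range(n_w)]
        samples.append(density_at(x_c, positions, bandwidth, n))
    return statistics.mean(samples) + s * statistics.stdev(samples)


# Hypothetical numbers: 1000 tags on a 1 Mb chromosome, 6 kb window, 200 bp bandwidth
threshold = simulated_threshold(n=1000, chrom_length=1_000_000, window=6000,
                                bandwidth=200, k=500)

# A cluster of 50 tags at one position should rise above the background threshold
peak_density = density_at(500_000, [500_000] * 50, bandwidth=200, n_total=1000)
is_peak = peak_density > threshold
```

Positions whose density exceeds the threshold would be reported as peaks; raising s makes peak calling more conservative.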
18 Magnitude of data: one sample, both strands
51 million bases of Chromosome 12
12 thousand bases of Chromosome 12
Chromosome 12 is 121 million base pairs long
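19 rRNA workflow
20 pA reads intersecting 45S pre-rRNA
[Figure: pA reads along the 45S pre-rRNA; labels 18S, 28S, 5.8S]
21 pA reads intersecting 45S pre-rRNA
[Figure: pA reads along the 45S pre-rRNA; labels 18S, 5.8S, 28S]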
22 Accumulation of micro RNA processed 5' leader upon depletion of Mtr4
- Comparison of Mtr4 vs. control KD
- Abundant polyA found near 5' end of annotated Mir322
- Confirmed using molecular technique