RNA surveillance and degradation: the Yin Yang of RNA

About This Presentation

Slides: 23

Provided by: NewU311

Learn more at: http://www.mscs.mu.edu

Transcript and Presenter's Notes

1
RNA surveillance and degradation: the Yin Yang of RNA
[Figure: RNA production by RNA Pol II and RNA destruction, with labels for a polyadenylated RNA (AAA) and the ribosome]
2
MODEL
[Figure: polyadenylation of hypomodified tRNAiMet by Trf4p (with Mtr4), followed by degradation of hypomodified tRNAiMet by the exosome. Hypothetical diagram of the exosome with subunits Rrp41p, Rrp42p, Rrp43p, Rrp45p, Rrp46p, Mtr3p, Rrp4p, Rrp40p, Csl4p, Rrp44p and Rrp6p]
3
Workflow
4
Next Gen sequencing: PolyA-Seq
[Figure: TRAMP complex (Papd5, Mtr4, ZCCHC7) with polyadenylated RNAs (AAAA); siRNA knockdown]
5
Library creation for NGS
6
Map paired end reads to genome

The BWA (Burrows-Wheeler Aligner) algorithm was used to map each read pair to the genome
Each read pair is reported as a single nucleotide position in the genome where polyadenylation was detected in an RNA sample
Average insert size: 300 bp
Read size: 45 nt
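A minimal sketch of collapsing one mapped read pair to a single nucleotide position. The slides do not say which mate abuts the poly(A) tail or how coordinates are encoded, so this assumes (hypothetically) 1-based inclusive coordinates for the mate whose aligned end marks the transcript 3' end: the alignment end on the plus strand, the alignment start on the minus strand. `MateAlignment`, `pa_site` and `count_pa_sites` are illustrative names, not part of BWA.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MateAlignment:
    """Hypothetical record for the mate whose aligned end abuts the poly(A) tail."""
    chrom: str
    start: int   # 1-based, inclusive leftmost aligned base
    end: int     # 1-based, inclusive rightmost aligned base
    strand: str  # transcript strand, '+' or '-'


def pa_site(aln: MateAlignment) -> Tuple[str, str, int]:
    """Collapse one read pair to the single position where the poly(A) tail starts.

    Assumption: the transcript 3' end is the rightmost aligned base on '+'
    and the leftmost aligned base on '-'.
    """
    if aln.strand == "+":
        return (aln.chrom, "+", aln.end)
    if aln.strand == "-":
        return (aln.chrom, "-", aln.start)
    raise ValueError(f"unknown strand: {aln.strand!r}")


def count_pa_sites(alignments: Iterable[MateAlignment]) -> Counter:
    """Number of read pairs supporting each (chrom, strand, position)."""
    return Counter(pa_site(a) for a in alignments)


if __name__ == "__main__":
    # Toy 45 nt alignments with made-up coordinates.
    pairs = [
        MateAlignment("chr12", 1001, 1045, "+"),
        MateAlignment("chr12", 1001, 1045, "+"),
        MateAlignment("chr12", 5001, 5045, "-"),
    ]
    print(count_pa_sites(pairs))
```

Each key of the resulting counter is one pA position; its value is the raw read count that the normalization step below converts to rpm.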

7
Raw reads vs Mapped reads
Data type / KD type   Raw reads    Mapped reads   Positions
Replicate data
  Mtr4                15,135,078   10,853,534     651,551
  Ctrl                16,348,780   11,708,310     652,128
  Rrp6                15,971,926   12,388,266     705,173
Original data
  Mtr4                ND           34,204,534     1,124,968
  Ctrl                ND           7,195,942      582,256
  Rrp6                ND           8,241,505      597,672
Normalization of data: reads per million (rpm)
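Because the original libraries differ nearly five-fold in depth, raw counts are not comparable across samples. A minimal sketch of reads-per-million scaling, assuming (the slides do not specify) that the denominator is the mapped read total from the "Original data" rows above; it shows how much one read at a position is worth in each sample.

```python
# Mapped read totals from the "Original data" rows of the table above.
MAPPED_READS = {
    "Mtr4": 34_204_534,
    "Ctrl": 7_195_942,
    "Rrp6": 8_241_505,
}


def rpm(count: float, total_reads: int) -> float:
    """Reads per million: count / total reads in sample * 1e6."""
    return count / total_reads * 1_000_000


if __name__ == "__main__":
    for sample, total in MAPPED_READS.items():
        # rpm contributed by a single read at one position in this sample
        print(f"{sample}: 1 read = {rpm(1, total):.4f} rpm")
```

A single read in the original control library is therefore worth almost five times as many rpm as a single read in the original Mtr4 library.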
8
Analysis

Starting with the RefSeq database:
Raw read counts converted to reads per million (rpm)
rpm = (reads at position / total reads in sample) × 10^6
Remove all non-coding RNAs
From each sample, collect normalized reads mapping within ±50 bases of the 3' end of each protein-coding RefSeq transcript
Dot plot of normalized reads on a log scale: X axis = control, Y axis = mMtr4 KD
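A minimal sketch of the collection step, assuming hypothetical in-memory inputs: `pa_rpm` maps (chrom, strand, position) to normalized reads, and each protein-coding RefSeq transcript is given as (name, chrom, strand, tx_start, tx_end) with 1-based inclusive coordinates. The 3' end is taken as tx_end on '+' and tx_start on '-', and reads within ±50 bases are summed per transcript.

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, str, int]                  # (chrom, strand, position)
Transcript = Tuple[str, str, str, int, int]  # (name, chrom, strand, tx_start, tx_end)

WINDOW = 50  # +/- bases around the annotated 3' end, as on the slide


def three_prime_end(strand: str, tx_start: int, tx_end: int) -> int:
    """Annotated 3' end: tx_end on '+', tx_start on '-' (1-based, inclusive)."""
    return tx_end if strand == "+" else tx_start


def reads_near_3p_ends(pa_rpm: Dict[Site, float],
                       transcripts: Iterable[Transcript],
                       window: int = WINDOW) -> Dict[str, float]:
    """Sum normalized reads within +/- window bases of each transcript's 3' end."""
    totals: Dict[str, float] = {}
    for name, chrom, strand, tx_start, tx_end in transcripts:
        end = three_prime_end(strand, tx_start, tx_end)
        totals[name] = sum(
            pa_rpm.get((chrom, strand, pos), 0.0)
            for pos in range(end - window, end + window + 1)
        )
    return totals


if __name__ == "__main__":
    # Made-up sample and transcript: sites at 2000 and 2040 fall in the window,
    # the site at 2100 does not.
    ctrl = {("chr1", "+", 2000): 3.0, ("chr1", "+", 2040): 1.0, ("chr1", "+", 2100): 9.0}
    print(reads_near_3p_ends(ctrl, [("tx1", "chr1", "+", 1000, 2000)]))
```

Running it on the control and mMtr4 KD samples with the same transcript list gives the paired values for the log-scale dot plot.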

9
mRNA polyadenylation does not change between Mtr4
and control KD
R² = 0.95141
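The slide does not say how R² was computed. A common choice, assumed here, is the squared Pearson correlation of log10-transformed normalized reads, with a hypothetical pseudocount of 1 so that positions with zero reads stay defined.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_r_squared(ctrl: Sequence[float], kd: Sequence[float],
                  pseudocount: float = 1.0) -> float:
    """R² between log10(value + pseudocount) of control and KD, per transcript."""
    x = [math.log10(v + pseudocount) for v in ctrl]
    y = [math.log10(v + pseudocount) for v in kd]
    return pearson_r(x, y) ** 2
```

A value close to 1 on the log scale means most transcripts lie near the diagonal of the dot plot.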
10
Problems encountered

Sequencing read depth was very different between samples in the original data
34 million mapped reads in one sample vs. 8 million in another
Lack of three replicates for robust statistical analysis of the data
Removal of internal A:
Sequencing reads that map to an oligoadenylate tract in the genome
The algorithm we developed misses many of them
Manual removal takes too much time

11
Remove Internal A
[Figure: poly(T) sequences (TTTTTTTTT) aligned against a genome-encoded internal A stretch (AAAAAAAA)]
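One common way to flag internal priming, sketched here with thresholds the slides do not give: look at the genomic sequence immediately downstream of each putative pA site (read on the transcript strand) and discard the site if that stretch is A-rich, since oligo(dT) can anneal there without a real poly(A) tail. The 10-base window and 6-A cutoff are illustrative assumptions, not the authors' algorithm.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def downstream_seq(genome: str, pos: int, strand: str, length: int) -> str:
    """Return `length` bases just downstream of 1-based `pos`, read 5'->3' on `strand`."""
    if strand == "+":
        # 1-based bases pos+1 .. pos+length are 0-based indices pos .. pos+length-1
        return genome[pos:pos + length].upper()
    # minus strand: 1-based bases pos-length .. pos-1, reverse-complemented
    start = max(pos - 1 - length, 0)
    return genome[start:pos - 1].upper().translate(COMPLEMENT)[::-1]


def is_internal_priming(genome: str, pos: int, strand: str,
                        window: int = 10, min_a: int = 6) -> bool:
    """True if the downstream genomic window holds at least `min_a` A's (hypothetical cutoff)."""
    return downstream_seq(genome, pos, strand, window).count("A") >= min_a
```

Sites flagged this way would be dropped automatically instead of removed by hand; the window and cutoff trade sensitivity against discarding genuine sites that happen to lie upstream of A-rich sequence.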
12
How to mine the data based on a hypothesis

Hypothesis: polyadenylated RNAs of unknown identity will accumulate upon depletion of mMtr4 relative to the control.
How can the transcriptome be queried?
How detailed should a query be?
Every pA position, or only those with more than x raw/normalized reads?
How do we find significant differences with only one sample per condition, or possibly two?
How can repetitive elements be accounted for in
the data?
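One way to make the "more than x reads" question concrete, sketched with hypothetical parameters: keep pA positions whose normalized reads exceed a cutoff in either sample, then rank them by log2 fold change (KD over control) with a pseudocount. With a single replicate per condition this gives a ranking, not a significance test; `min_rpm` and `pseudocount` are assumptions.

```python
import math
from typing import Dict, Hashable, List, Tuple


def candidate_sites(ctrl: Dict[Hashable, float], kd: Dict[Hashable, float],
                    min_rpm: float = 1.0,
                    pseudocount: float = 0.5) -> List[Tuple[Hashable, float]]:
    """(site, log2 fold change) for sites above min_rpm in either sample, largest increase first."""
    results = []
    for site in set(ctrl) | set(kd):
        c = ctrl.get(site, 0.0)
        k = kd.get(site, 0.0)
        if max(c, k) <= min_rpm:
            continue  # too few reads in both samples to consider
        lfc = math.log2((k + pseudocount) / (c + pseudocount))
        results.append((site, lfc))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Sites at the top of the list are the candidate polyadenylated RNAs that accumulate upon mMtr4 depletion.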

13
Custom annotation to remove bias from existing
annotations

Data mapped with Bowtie to the mouse genome (mm10 build)
Mapped data from KD and control compared with Cufflinks, using a custom annotation, to explore gene expression differences
Custom annotation:
1000 bp "genes", each overlapping the next gene by 500 bp
This did not work well
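A sketch of how such a tiled annotation might be written as GTF for Cufflinks, assuming 1000 bp windows stepping by 500 bp so that each window overlaps the next by 500 bp. The `source` value `tiled` and the `chrom:start-end:strand` naming used for `gene_id`/`transcript_id` are hypothetical choices.

```python
from typing import Iterator, Tuple


def tiled_windows(chrom_len: int, size: int = 1000,
                  step: int = 500) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) windows; neighbours overlap by size - step."""
    start = 1
    while True:
        end = min(start + size - 1, chrom_len)  # last window is clipped to the chromosome end
        yield start, end
        if end == chrom_len:
            break
        start += step


def gtf_lines(chrom: str, chrom_len: int, strand: str = "+",
              size: int = 1000, step: int = 500) -> Iterator[str]:
    """One GTF exon record per window, each its own gene and transcript."""
    for start, end in tiled_windows(chrom_len, size, step):
        gid = f"{chrom}:{start}-{end}:{strand}"
        attrs = f'gene_id "{gid}"; transcript_id "{gid}";'
        yield "\t".join([chrom, "tiled", "exon", str(start), str(end),
                         ".", strand, ".", attrs])


if __name__ == "__main__":
    for line in gtf_lines("chr12", 2000):
        print(line)
```

Because every base away from the chromosome ends falls in two windows, the number of "genes" per chromosome grows to roughly twice its length in kilobases.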

14
Problems with using custom annotation

The first real problem was that no available computer could handle more than 5,000 genes of the custom annotation at a time
One chromosome had 147K genes
Reads falling in overlapping regions were a problem for assignment:
Cuffdiff would randomly assign the reads to only one of the genes
The overlapping genes were split into two FASTA files, but we still could not capture differences we knew existed in the data
Cuffdiff collects reads from the entire 1000 bp gene and compares them between the 2 samples
This leads to false negatives for pA data, where a pA event is concentrated at one or a few positions
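A toy calculation of why whole-gene aggregation hides pA changes, using made-up numbers: suppose a 1000 bp window has 1 rpm at every position in both samples except one pA site that rises from 10 to 100 rpm upon knockdown.

```python
WINDOW_BP = 1000
BACKGROUND_RPM = 1.0              # hypothetical signal at every other position
SITE_CTRL, SITE_KD = 10.0, 100.0  # hypothetical single pA site


def window_total(site_value: float) -> float:
    """Total rpm over the window: background everywhere except the one pA site."""
    return BACKGROUND_RPM * (WINDOW_BP - 1) + site_value


site_fold = SITE_KD / SITE_CTRL
window_fold = window_total(SITE_KD) / window_total(SITE_CTRL)
print(f"site fold change:   {site_fold:.2f}")
print(f"window fold change: {window_fold:.2f}")
```

A 10-fold change at the site becomes only about a 1.09-fold change (1099 / 1009) when summed over the whole window, which a gene-level test is likely to call unchanged.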

15
What next?
16
F-Seq

F-Seq uses sequence tags to identify specific sequence features for different library preparations (ChIP-seq, DNase-seq and pA-seq)
It summarizes and displays individual sequence data as an accurate, interpretable signal by generating a continuous tag sequence density estimate

17
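Generating Peaks with F-Seq

1. Use kernel density estimation to estimate the pdf (probability density function) of tag positions
2. Compute the threshold:
   2.1 For n tags on a sequence of length L, place $n_w = \frac{nw}{L}$ tags in a window of length w and compute the density at the window center $x_c$
   2.2 Repeat step 2.1 k times
   2.3 The threshold is s SDs above the mean of these k densities
The threshold output module is modifiable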
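A compact sketch of the two steps, assuming a Gaussian kernel with a hypothetical bandwidth `bw` and uniformly random tag placement for the background windows; F-Seq's own kernel, bandwidth and defaults are not given on the slide. Densities are left as kernel sums (tags per base, not divided by n) so that the data and the background windows are on the same scale. The defaults `k=100`, `s=4.0` and the grid `step` are illustrative.

```python
import math
import random
import statistics
from typing import List, Sequence


def kernel_density(x: float, tags: Sequence[float], bw: float) -> float:
    """Sum of Gaussian kernels centred on the tags, evaluated at x (tags per base)."""
    norm = 1.0 / (bw * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - t) / bw) ** 2) for t in tags)


def background_threshold(n: int, L: int, w: int, bw: float,
                         k: int = 100, s: float = 4.0, seed: int = 0) -> float:
    """Mean + s SDs of densities at the centre x_c of k random windows of length w."""
    rng = random.Random(seed)
    n_w = max(1, round(n * w / L))  # expected tags per window, n_w = n*w/L (at least 1)
    x_c = w / 2
    densities = []
    for _ in range(k):
        tags = [rng.uniform(0, w) for _ in range(n_w)]
        densities.append(kernel_density(x_c, tags, bw))
    return statistics.mean(densities) + s * statistics.stdev(densities)


def call_peaks(tags: Sequence[float], L: int, w: int, bw: float,
               step: int = 10, **kw) -> List[int]:
    """Grid positions whose tag density exceeds the background threshold."""
    thr = background_threshold(len(tags), L, w, bw, **kw)
    return [x for x in range(0, L, step) if kernel_density(x, tags, bw) > thr]


if __name__ == "__main__":
    # Toy data: evenly spaced background tags plus one dense cluster near 5000.
    background = list(range(50, 10_000, 100))
    cluster = [5_000 + d for d in range(-10, 11)] * 5
    print(call_peaks(background + cluster, L=10_000, w=1_000, bw=20))
```

The naive double loop is fine for a toy example; at chromosome scale (tens of millions of positions per strand) the density would need to be computed with a sliding or binned approach instead.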

18
Magnitude of data: one sample, both strands
51 million bases of chromosome 12
12 thousand bases of chromosome 12
Chromosome 12 is 121 million base pairs long
19
rRNA workflow
20
pA reads intersecting 45S pre-rRNA
[Figure: pA reads along the 45S pre-rRNA, with the 18S, 5.8S and 28S regions marked]
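A minimal interval-lookup sketch for tallying pA reads by 45S pre-rRNA region. The region coordinates below are placeholders, not real annotation; the actual 18S, 5.8S and 28S coordinates of the rDNA unit being used would have to be substituted. Positions outside the mature rRNAs are counted as "spacer".

```python
from bisect import bisect_right
from collections import Counter
from typing import Callable, Dict, List, Tuple

Region = Tuple[str, int, int]  # (name, start, end), 1-based inclusive

# Placeholder coordinates along a 45S pre-rRNA; NOT real annotation.
REGIONS: List[Region] = [
    ("18S", 4_000, 5_900),
    ("5.8S", 7_000, 7_160),
    ("28S", 8_100, 12_800),
]


def make_lookup(regions: List[Region]) -> Callable[[int], str]:
    """Binary-search lookup for sorted, non-overlapping regions."""
    regions = sorted(regions, key=lambda r: r[1])
    starts = [r[1] for r in regions]

    def lookup(pos: int) -> str:
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos <= regions[i][2]:
            return regions[i][0]
        return "spacer"

    return lookup


def tally_by_region(site_counts: Dict[int, int],
                    regions: List[Region] = REGIONS) -> Counter:
    """Sum pA reads per region; site_counts maps position -> reads."""
    lookup = make_lookup(regions)
    totals: Counter = Counter()
    for pos, reads in site_counts.items():
        totals[lookup(pos)] += reads
    return totals
```

The same lookup works for the reverse question of which pA positions fall inside a given mature rRNA, once real coordinates are supplied.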
21
pA reads intersecting 45S pre-rRNA
[Figure: pA reads along the 45S pre-rRNA, with the 18S, 5.8S and 28S regions marked]
22
Accumulation of the processed microRNA 5' leader upon depletion of Mtr4