Title: Genomics of microRNA genes and their target mRNAs
1. Genomics of microRNA genes and their target mRNAs
2. Readings
- Bentwich (review for background)
- Smalheiser and Torvik (experimental, for detailed analysis)
- Hill et al. (points to the future)
4. Where are miR genes?
- Intergenic: tendency to cluster
- Primary transcripts are polycistronic
- Promoter, (splicing), 5'-cap, poly-A tail
- Intronic: most are in sense orientation
- may also be in clusters
- At least some are transcribed by pol II
- No promoter needed?
- Relation to genes they regulate? Chimeric transcripts, antisense orientation, origin from pseudogenes or inverted duplication of target mRNAs
- miRs in viral genomes too
6. Rules for predicting hairpins as being miR precursors
- Stem-loop, stem is over 23 bases
- LOTS of hairpins in the genome all over
- predicted loop often smaller than actual size
- cross-species conservation and pattern of change over evolution (loop vs. arm vs. arm), drop of conservation just outside the hairpin
- nucleotide composition, melting temperature, symmetrical bubbles
- Clustering (better signal-to-noise ratio)
- validation using small RNA cloning
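
The first rule above (a stem-loop whose stem is over 23 bases) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the stem is required to be uninterrupted (no bubbles or bulges), G-U wobble pairs count as paired, and the names `longest_stem`, `looks_like_precursor`, `min_loop` and `stem_cutoff` are hypothetical. Real precursor finders also use folding energy and conservation, which are not modeled here.

```python
# Watson-Crick pairs plus G-U wobble, in the RNA alphabet.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(seq, min_loop=3):
    """Length of the longest uninterrupted stem that seq can form with itself.

    Every candidate innermost pair (i, j) enclosing at least min_loop
    unpaired bases is extended outward while the bases keep pairing.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    best = 0
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            while i - k >= 0 and j + k < n and (seq[i - k], seq[j + k]) in PAIRS:
                k += 1
            best = max(best, k)
    return best


def looks_like_precursor(seq, stem_cutoff=23):
    """Apply the 'stem is over 23 bases' rule (the cut-off is a parameter)."""
    return longest_stem(seq) > stem_cutoff
```

Because the genome contains very many hairpins, a filter like this would only be a first pass; the conservation, composition and clustering criteria in the list are what improve the signal-to-noise ratio.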
7. Machine Learning
- Training sets: positive and negative examples
- Goal: recognize patterns, summarize features
- Features to be encoded (may be redundant, nonlinear or non-independent)
- Algorithm for learning or deciding something about the testing set. How to weight each feature, how to combine features?
- Optimization criterion (when to stop analyzing??)
- Form clusters, set cut-off values, optimize multiple parameters into a single ranked value, predict groups
- Process is only as robust as the training sets!
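
As a toy illustration of "set cut-off values" from positive and negative training examples, the sketch below picks the threshold on a single numeric feature that maximizes training accuracy. The function name `best_cutoff` and the choice of accuracy as the optimization criterion are assumptions made for the example; any real predictor would combine many features.

```python
def best_cutoff(positives, negatives):
    """Return (cutoff, accuracy) for the rule 'score >= cutoff means positive'.

    Candidate cut-offs are the observed scores; ties keep the lowest cut-off.
    """
    total = len(positives) + len(negatives)
    best_c, best_acc = None, -1.0
    for c in sorted(set(positives) | set(negatives)):
        correct = sum(p >= c for p in positives) + sum(n < c for n in negatives)
        acc = correct / total
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc
```

With well-separated toy scores such as `best_cutoff([5, 6, 7], [1, 2, 3])` the rule separates the two sets perfectly; with overlapping scores the chosen cut-off depends entirely on which examples happen to be in the training sets, which is the point of the last bullet.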
8. Where do miR genes come from originally?
- some come from inverted target duplications, genomic repeats, pseudogenes: these give automatic targets too in some cases
- Very large pool of candidate hairpins!!
- could evolve by changes in hairpin sequence that allow Drosha or Dicer to cut it
- or could acquire a promoter to transcribe the hairpin
9. Computational Prediction of miR targets (in Animals)
- Machine learning with positive and negative examples
- Prediction confidence vs. biological reality
- Improve signal-to-noise ratio
- cross-species conservation
- Binding energy of miR to target sequence
- Multiple hits to same target
- Multiple hits from different miRs on same target
- Restrict attention to 3'-UTR
- 5' seeds, 6-8 nt, with or without GU matches allowed
- Mismatch between miR and target in central region
- Validation by cloning, expression, genetics
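
The seed criterion in the list above can be sketched as a simple scan. The following is a minimal example, assuming the seed starts at miR position 2 (0-based `offset=1`) and is 7 nt long; both are parameters, since the slide only says 6-8. The function name `seed_sites` is hypothetical, and binding energy, conservation and multiple-hit scoring are not modeled.

```python
WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq):
    return seq.upper().replace("T", "U")


def seed_sites(mirna, target, seed_len=7, offset=1, allow_gu=False):
    """Return 0-based start positions in target that pair with the miR seed.

    mirna and target are both written 5'->3'; pairing is antiparallel,
    so the seed is compared against each target window read backwards.
    """
    mirna, target = to_rna(mirna), to_rna(target)
    seed = mirna[offset:offset + seed_len]
    allowed = WC | WOBBLE if allow_gu else WC
    hits = []
    for pos in range(len(target) - seed_len + 1):
        window = target[pos:pos + seed_len]
        if all((m, t) in allowed for m, t in zip(seed, reversed(window))):
            hits.append(pos)
    return hits
```

Restricting `target` to the 3'-UTR, or requiring the same site in orthologous transcripts, would be applied outside this function.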
10. What is the purpose of miR-target interaction?
- Turn OFF genes that should not be expressed at a given time or in a given cell type?
- Lim paper and other recent papers showing OPPOSITE pattern of expression of miR and target mRNAs
- Or the opposite, modulate levels up and down dynamically?
- Examples in plants
- miR-214 hits all five synaptic targets.
- miR-134 in brain, regulates a brain target, LIMK1
- combinatorial miR targets with CO-expression
11. miRs are constrained by targets and vice versa
- miRs often NOT co-expressed with target mRNAs (in time or tissue type)
- housekeeping genes have short 3'-UTRs to avoid regulation by miRs (anti-targets)
- so, miRs constrain target sequences
- conversely, targets constrain miR seed sequences too!!!
- (many different miRs that are otherwise unrelated share the same seed sequence)
12. Evolution of miR genes
- duplication and divergence
- keep one miR conserved (paired with target mRNA)
- but the others may drift in terms of the pattern of their expression
- or the sequence of their target mRNAs
14. Our Study of Human microRNA targets (BMC Bioinformatics 2004)
- No training set, no heuristics required.
- Compare the characteristics of the ENTIRE SET of miRs vs. scrambled counterparts against ENTIRE human RefSeq mRNAs, look for statistical outliers
- Identify outlying tails for exact hit length, gapped-BLAST score, proximity of hits from distinct microRNAs on same target
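
The comparison of real miRs against scrambled counterparts can be sketched as follows. This is not the published pipeline: it only covers the exact-hit-length statistic, uses a plain longest-common-substring search instead of BLAST, and the names `longest_exact_hit`, `scramble` and `hits_at_least` are hypothetical.

```python
import random

COMP = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    return seq.upper().replace("U", "T")


def revcomp(seq):
    return to_dna(seq).translate(COMP)[::-1]


def longest_exact_hit(mirna, mrna):
    """Longest stretch of the mRNA exactly complementary to the miR.

    Computed as the longest common substring between the reverse
    complement of the miR and the mRNA (Watson-Crick only, no GU).
    """
    site, target = revcomp(mirna), to_dna(mrna)
    best = 0
    prev = [0] * (len(target) + 1)
    for ch in site:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if ch == t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def scramble(seq, rng):
    """Shuffle a sequence, preserving its nucleotide composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def hits_at_least(mirnas, mrnas, x):
    """Number of (miR, mRNA) pairs whose exact hit length is x or greater."""
    return sum(1 for m in mirnas for t in mrnas if longest_exact_hit(m, t) >= x)
```

Calling `hits_at_least` once with the real miR set and once with `[scramble(m, random.Random(seed)) for m in mirnas]` gives the two curves to compare; an excess of real hits at large x is the outlying tail the slide refers to.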
15. [Figure: number of hits of exact hit length x or greater (y-axis) vs. exact hit length, x (x-axis); microRNA sequences (nonredundant set) compared with scrambled]
18. [Figure: number of microRNA sequences (y-axis) vs. number of distinct targets hit (x-axis); microRNA sequences (nonredundant set) compared with scrambled]
19. Take-Home Lessons for Human Long Seed Targets
- Longer exact hit length, better overall match, closer multiple hits all predict true targets (unlikely to arise by chance)
- >10 bases up to perfect and near-perfect complementarity even in mammals
- NO preference for 3'-UTR on target
- NO preference for 5' end of miR to be a perfect 6 base match
20. Perfect microRNA-target interactions
- A few examples of microRNAs exhibiting perfect complementarity have been described (miR-196, miR-127 and miR-136).
- During our analysis of long-seed interactions we were struck by the existence of targets that had perfect complementarity to microRNAs along their full length.
- Whereas long-seeds are defined as having exact Watson-Crick base pairing with no GU matches, recent data suggest that complementarity interactions that contain up to, say, 5-7 GU matches (but no frank mismatches) could still be deemed perfect in gene silencing.
- In particular, we noted that human miR-95 was perfectly complementary (including up to 4 GU matches) with scores of human mRNAs and ESTs (fig. 6). Similarly, miR-151 was perfectly complementary to 6 transcripts, which was significantly above the level expected by chance.
21.
   SW  perc perc perc  query          position in query          matching     repeat         position in repeat
score  div. del. ins.  sequence       begin  end (left)          repeat       class/family    begin    end (left)  ID
  203  10.0  0.0  0.0  hsa-mir-151       2   31   (59)  C  L2           LINE/L2         (94)   3178   3149   1
  193  28.0  8.0  0.0  hsa-mir-151       4   78   (12)  +  L2           LINE/L2         3102   3182   (90)   2
  202  17.6  0.0  0.0  hsa-mir-28        2   35   (51)  C  L2           LINE/L2         (93)   3179   3146   3
  236  18.2  0.0  0.0  hsa-mir-28       43   86    (0)  +  L2           LINE/L2         3137   3180   (92)   3
  392   2.2  0.0  0.0  hsa-mir-321       2   47   (12)  +  tRNA-Arg-AGG tRNA              28     73    (3)   4
  194  22.5  0.0  0.0  hsa-mir-325       3   42   (56)  +  L2           LINE/L2         3220   3259   (13)   5
  219  23.4  0.0  0.0  hsa-mir-325      52   98    (0)  C  L2           LINE/L2         (18)   3295   3249   6
  252  10.5  0.0  0.0  hsa-mir-340       9   46   (49)  C  MARNA        DNA/Mariner    (346)    240    203   7
  206  15.2  0.0  0.0  hsa-mir-95        2   34   (47)  +  L2           LINE/L2         3273   3305    (8)   8
  201  17.1  0.0  0.0  hsa-mir-95       47   81    (0)  C  L2           LINE/L2          (7)   3306   3272   8
  195   7.1  0.0  0.0  mmu-mir-151       1   28   (40)  C  L2           LINE/L2         (96)   3176   3149   1
  193  20.6  0.0  0.0  mmu-mir-28        2   35   (51)  C  L2           LINE/L2         (93)   3179   3146   2
  214  20.4  0.0  0.0  mmu-mir-28       43   86    (0)  +  L2           LINE/L2         3137   3180   (92)   2
  304  21.9  2.7  0.0  mmu-mir-297-1     1   73    (3)  +  RMER12       LTR/ERVK         774    848  (474)   3
  185  22.8  0.0  3.4  mmu-mir-297-2     3   61    (3)  +  (TATATG)n    Simple_repeat      2     58    (0)   4
  392   2.2  0.0  0.0  mmu-mir-321       2   47   (12)  +  tRNA-Arg-AGG tRNA              28     73    (3)   5
  202  20.0  0.0  0.0  mmu-mir-325       3   42   (56)  +  L2           LINE/L2         3220   3259   (13)   6
  240  21.3  0.0  0.0  mmu-mir-325      52   98    (0)  C  L2           LINE/L2         (18)   3254   3208   6
  252  10.5  0.0  0.0  mmu-mir-340      12   49   (49)  C  MARNA        DNA/Mariner    (346)    240    203   7
  402  15.1  0.0  0.0  mmu-mir-341      12   84   (12)  +  (CGGT)n      Simple_repeat      3     75    (0)   8
  203  10.0  0.0  0.0  rno-mir-151       7   36   (61)  C  L2           LINE/L2         (94)   3178   3149   1
  194  10.3  2.4  4.9  rno-mir-151      38   78   (19)  +  L2           LINE/L2         3180   3219   (94)   2
  236  19.2  0.0  0.0  rno-mir-28       35   86    (0)  +  L2           LINE/L2         3137   3221   (92)   3
  486   8.8  0.0  0.0  rno-mir-297       1   68    (0)  +  (TATG)n      Simple_repeat      1     68    (0)   4
  392   2.2  0.0  0.0  rno-mir-321       2   47   (12)  +  tRNA-Arg-AGG tRNA              28     73    (3)   5
  202  20.0  0.0  0.0  rno-mir-325       3   42   (56)  +  L2           LINE/L2         3220   3259   (13)   6
  240  21.3  0.0  0.0  rno-mir-325      52   98    (0)  C  L2           LINE/L2         (18)   3254   3208   6
  451  23.1  2.2  0.0  rno-mir-327       4   94    (0)  C  RodERV21     LTR/ERV1      (2811)   2180   2088   7
  476   5.0  0.0  0.0  rno-mir-333      36   95    (0)  +  B2_Rat1      SINE/B2            2     61  (127)   8
  234  13.2  0.0  0.0  rno-mir-340      12   49   (49)  C  MARNA        DNA/Mariner    (346)    240    203   9
  402  15.1  0.0  0.0  rno-mir-341      12   84   (12)  +  (CGGT)n      Simple_repeat      3
23. [Alignment figure; sequences as extracted, spacing not preserved]
mir-151 (<--22 ... <--1): atgatctgacactcgaggagct
mir-95 (<--22 ... <--1): acgagttatttatgggcaactt
mir-28 (<--22 ... <--1): gagttatctgacactcgaggaa
mir-325 (1--> ... 21-->): cctagtaggtgtccagtaagt
L2A consensus (3188 --> 3305/3314):
tccactagaatgtaagctccatgagggcagggactttgtctgttttgttcactgctgtatccccagcgcctagmacagtgcctggcacatagtaggc
L2B consensus (294 --> 411/419):
cccactggactgtgagctccgcgagggcagggactgtgtctgtcttgttcaccactgtatccccagcgcctagcacagtgcctggcacatagcaggcgctcagtaaatgtttgttgaa
25. miR-151 perfect hits
1) NM_014400.1 - Homo sapiens GPI-anchored metastasis-associated protein homolog (C4.4A), mRNA
mir-151      22    TACTAGACTGTGAGCTCCTCGA  1
hit C4.4A    1584  TACTAGACTGTGAGCTCCTCGA  1605/1698  3'UTR
Repeatmasker predicts a L2 element within the mRNA at position 1575-1673.
2) NM_005373.1 - Homo sapiens myeloproliferative leukemia virus oncogene (MPL), mRNA
mir-151      22    TACTAGACTGTGAGCTCCTCGA  1
hit MPL 5'   3526  TACTAGATTGTGAGCTCCTTGA  3547/3646  3'UTR
Repeatmasker predicts a L2 element within the mRNA at position 3504-3646.
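
The two hits above can be checked against the "GU matches but no frank mismatches" notion of a perfect interaction with a short script. It is a sketch under one stated assumption: the mir-151 string printed 22 to 1 on this slide is taken to be the reverse complement of the mature miR, so the miR itself (5' to 3') is written here as `MIR151`. The function name `pairing_summary` is hypothetical.

```python
WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU = {("G", "U"), ("U", "G")}


def to_rna(seq):
    return seq.upper().replace("T", "U")


def pairing_summary(mirna, site):
    """Count Watson-Crick pairs, GU wobbles and mismatches in an ungapped duplex.

    mirna and site are both written 5'->3' and must have equal length;
    they are paired antiparallel (miR read backwards against the site).
    """
    mirna, site = to_rna(mirna), to_rna(site)
    if len(mirna) != len(site):
        raise ValueError("ungapped comparison needs equal lengths")
    counts = {"wc": 0, "gu": 0, "mismatch": 0}
    for m, t in zip(reversed(mirna), site):
        if (m, t) in WC:
            counts["wc"] += 1
        elif (m, t) in GU:
            counts["gu"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Assumption: reverse complement of the 22->1 string shown on the slide.
MIR151 = "UCGAGGAGCUCACAGUCUAGUA"
C44A_SITE = "TACTAGACTGTGAGCTCCTCGA"  # C4.4A, positions 1584-1605
MPL_SITE = "TACTAGATTGTGAGCTCCTTGA"   # MPL, positions 3526-3547
```

Under that assumption, `pairing_summary(MIR151, C44A_SITE)` reports 22 Watson-Crick pairs, while `pairing_summary(MIR151, MPL_SITE)` reports 20 Watson-Crick pairs plus 2 GU matches and no mismatches, which is why the MPL site still counts as a perfect hit.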
26. miR-28 showed an excellent hit upon transcription factor E2F6
miR    3'       GAGT--T-ATCTGACACTCGAGGAA  5'         w/GU matches (BLAST)
Target 5' 2207  TTTACCATTAGACTGTGAGCTCCTT  2231/2342
27. LINE-2 derived miRs
- The LINE-2 repeat derived microRNAs appear to recognize transcripts that share repeats in their 3'-UTR regions.
- when they bind with perfect or near-perfect complementarity, they may be expected to lead to degradation of the transcripts.
- This could serve as a mechanism for detecting and neutralizing aberrant transcripts (having readthrough transcription from retained introns or neighboring genomic regions)
- as well as serving to regulate specific mRNAs.
28. Take-Home Lessons
- miRs may not all derive from the same genomic sources, and may not all interact with their targets via the same rules.
- Transposable elements have contributed to both miR genes and targets.
- Helps explain why targets have generic tags, why many targets are in the 3'-UTR region, why miRs are not all conserved.
- Alu repeats as a putative miR target!
29. CAGCACUUUGGGAG: 14-mer from Alu consensus, position 32-45
a)
sequence    length  microRNA
CAGCACUUU   9       93, 372, 106b
AGCACUUUG   9       17-5p, 20b, 520g,h
CAGCACUU    8       512-3p
AGCACUUU    8       520a,b,c,d,e, 526b
GCACUUUG    8       519d
AGCACUU     7       302a,b,c,d, 373
UUGGGAG     7       150
b)
AGCACUUU    8       106a, 20a
AGCACUU     7       520f
GCACUUU     7       519a,b,c,e
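
Each sequence in the table is a substring of the Alu 14-mer. A small sketch, assuming the 14-mer occupies Alu consensus positions 32-45 as stated on the slide, maps each one back to its Alu coordinates; the name `locate` is hypothetical.

```python
ALU_14MER = "CAGCACUUUGGGAG"
ALU_START = 32  # 1-based position of the first base, as given on the slide

TABLE_SEQUENCES = [
    "CAGCACUUU", "AGCACUUUG", "CAGCACUU", "AGCACUUU",
    "GCACUUUG", "AGCACUU", "UUGGGAG", "GCACUUU",
]


def locate(kmer, ref=ALU_14MER, ref_start=ALU_START):
    """Return (first, last) Alu positions of kmer, or None if it is absent."""
    i = ref.find(kmer)
    if i < 0:
        return None
    return ref_start + i, ref_start + i + len(kmer) - 1


if __name__ == "__main__":
    for s in TABLE_SEQUENCES:
        print(s, len(s), locate(s))
```

Most of the listed sequences overlap the 5' end of the 14-mer; only UUGGGAG (miR-150) sits at its 3' end.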
31. Hill et al.: do introns have miR-like effects?
- Transfected isolated introns into HeLa cells (incorporated into genome)
- Lots of genes went up and down, some pattern to the effects
- Reminiscent of Lim et al. with miRs, but...
- Did not show if Drosha/Dicer is involved.
- Did not show if there is sequence complementarity involved with targets.
- Did not show if introns are processed to small RNAs
- Much less whether they are 22 nt in length.
- But, introns could be a superset of the small RNA world?!