Title: Comparative genomics in flies and mammals
Slide 1: Comparative genomics in flies and mammals
Broad Institute of MIT and Harvard
MIT Computer Science and Artificial Intelligence Laboratory
Slide 2: Resolving power in mammals, flies, fungi
- 12 flies
- 32 mammals
- 20 fungi
- Many species lead to high resolving power at close distances
Slide 3: Comparative genomics and evolutionary signatures
- Comparative genomics can reveal functional elements
- For example, exons are deeply conserved to mouse, chicken, fish
- Many other elements are also strongly conserved: exons / regulatory?
- Can we also pinpoint specific functions of each region? Yes!
- Patterns of change distinguish different types of functional elements
- Specific function → Selective pressures → Patterns of mutation/insertion/deletion
- Develop evolutionary signatures characteristic of each function
Slide 4: 1. Evolutionary signature of protein-coding genes
- Revise protein-coding gene catalogue
Slide 5: Protein-coding evolution vs. nucleotide conservation
Slide 6: 2. Evolutionary signatures of RNA genes
- Typical substitutions
- Compensatory changes
- GC → GU, GU → AU
- Prediction methodology
- Jakob Pedersen: EvoFold with very stringent parameters
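The substitution patterns listed above (pair-preserving changes such as GC → GU and GU → AU, and compensatory changes) can be illustrated with a small sketch. This is not EvoFold; it only classifies how a single base pair in a putative stem differs between two species. The function name `classify_pair_change` and the category labels are hypothetical choices made for this illustration.

```python
# Base pairs that can form in an RNA stem: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair_change(ref_pair, other_pair):
    """Classify how a paired position differs between two species.

    ref_pair and other_pair are (5' base, 3' base) tuples taken from the
    two alignment columns that are predicted to pair with each other.
    The labels returned here are illustrative, not a published scheme.
    """
    if ref_pair not in PAIRS:
        raise ValueError("reference bases do not pair")
    if other_pair not in PAIRS:
        return "disruptive"            # pairing is lost in the other species
    changed = sum(a != b for a, b in zip(ref_pair, other_pair))
    if changed == 0:
        return "unchanged"
    if changed == 1:
        return "single, pairing kept"  # e.g. GC -> GU or GU -> AU
    return "double, compensatory"      # both bases changed, pairing kept


examples = [
    (("G", "C"), ("G", "U")),
    (("G", "U"), ("A", "U")),
    (("G", "C"), ("A", "U")),
    (("G", "C"), ("G", "A")),
]
labels = [classify_pair_change(ref, other) for ref, other in examples]
```

A stem whose columns are dominated by the two pairing-preserving classes, with few disruptive changes, is the kind of pattern the slide describes as characteristic of RNA structures.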
Slide 7: Reveal novel RNA genes and structures
- Intronic: enriched in A-to-I editing, also novel ncRNAs
- Coding: A-to-I editing, also translational regulation
- 3' UTRs: enriched in regulators of mRNA localization
- 5' UTRs: translational regulation, ribosomal proteins
  - 3' and 5' UTR structures mostly on coding strand (75%, 80%)
Slide 8: 3. Structural and evolutionary signatures of miRNAs
Discover novel miRNAs
- Recognize miRNA hairpin
- Length of hairpin, length of arms
- Fold stability, symmetric/asymmetric bulges
- Conservation profile: high-low-high
- Pinpoint mature miRNA 5' end
- Perfect 8-mer conservation at start
- Predominance of 5' U (78%)
- Number of paired bases is bounded
- Complementary to 3' UTR motifs
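Two of the 5' end criteria above (perfect 8-mer conservation at the start, and a 5' U) are simple enough to sketch in code. The example below scans a toy gap-free alignment for positions that satisfy both. The sequences, species labels, and the function name `candidate_starts` are hypothetical, and the remaining criteria (pairing, 3' UTR complementarity) are not modelled.

```python
# Toy gap-free alignment of a hairpin arm in three species (hypothetical data).
# DNA letters are used, so the 5' U of the mature miRNA appears as T.
ALIGNED_ARM = {
    "species_1": "ACGTGAGGTAGTAGGTTGTA",
    "species_2": "ATGTGAGGTAGTAGGTCGTA",
    "species_3": "ACATGAGGTAGTAGGTTGCA",
}


def candidate_starts(alignment, seed_len=8):
    """Return 0-based positions where a perfectly conserved seed_len-mer
    begins with T (U in RNA) in every species."""
    rows = list(alignment.values())
    # True for columns where all species carry the same base.
    conserved = [len(set(column)) == 1 for column in zip(*rows)]
    reference = rows[0]
    starts = []
    for pos in range(len(reference) - seed_len + 1):
        window_conserved = all(conserved[pos:pos + seed_len])
        if window_conserved and reference[pos] == "T":
            starts.append(pos)
    return starts


starts = candidate_starts(ALIGNED_ARM)
```

With this toy input two positions pass, which shows why the slide lists further criteria: conservation and a 5' U narrow the candidates but do not always identify a single start.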
Revise existing miRNAs
Slide 9: 4. Evolutionary signatures for regulatory motifs
Known engrailed site (footprint)
D.mel CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
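As a rough illustration of how such a footprint stands out, the sketch below takes the six aligned sequences from this slide and finds the longest run of columns that are identical in every species. The function names are hypothetical, and real motif discovery uses far more than a single perfectly conserved run.

```python
# Aligned sequences as shown on the slide, rejoined after line wrapping.
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}


def conserved_columns(alignment):
    """Flag columns where every species has the same non-gap base."""
    rows = list(alignment.values())
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*rows)]


def longest_conserved_run(alignment):
    """Return the longest stretch of fully conserved columns,
    read from the first sequence in the alignment."""
    flags = conserved_columns(alignment) + [False]  # sentinel closes last run
    best_start, best_len, run_start = 0, 0, None
    for i, is_conserved in enumerate(flags):
        if is_conserved and run_start is None:
            run_start = i
        elif not is_conserved and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    reference = next(iter(alignment.values()))
    return reference[best_start:best_start + best_len]


run = longest_conserved_run(ALIGNMENT)
contains_core = "TAATTA" in run  # TAATTA is the shared core visible in all six rows
```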
- Motifs discovered
  - Recover known regulators
  - Many novel motifs
- Evidence for novel motifs
- Tissue-specific enrichment
- Functional enrichment
- In promoters and enhancers
- Surprises
- Core promoter elements
- miRNA motifs in coding exons
Slide 10: Functions of discovered motifs
Positional biases
Tissue-specific enrichment and clustering
miRNA targeting in coding regions
Slide 11: 5. Evolutionary signatures of motif instances
- Allow for motif movements
- Sequencing/alignment errors
- Loss, movement, divergence
- Measure branch-length score (BLS)
- Sum evidence along branches
- Close species: little contribution
BLS 25%
BLS 83%
Mef2: YTAWWWWTAR
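A minimal sketch of a branch-length score follows, assuming BLS is the total branch length of the subtree connecting the species in which the motif is found, divided by the total branch length of the whole tree. The tree topology, branch lengths, and sequences are hypothetical toy values, and the sketch does not model motif movement or alignment errors; only the Mef2 consensus YTAWWWWTAR comes from the slide.

```python
import re

# IUPAC codes needed for the Mef2 consensus (Y = C/T, W = A/T, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}

# Toy tree: a node is (label_or_children, branch_length_above_node).
# Topology and lengths are hypothetical and chosen for easy arithmetic.
TREE = ([
    ([
        ([("D.mel", 1.0), ("D.sim", 1.0)], 1.0),
        ("D.yak", 2.0),
    ], 1.0),
    ("D.ana", 4.0),
], 0.0)

# Hypothetical sequences from one aligned region.
REGION = {
    "D.mel": "GGCTAAAAATAGCC",
    "D.sim": "GGCTATTTATAGCC",
    "D.yak": "GGCTAAAGATAGCC",   # G inside the W stretch, so no match
    "D.ana": "GGTTAAATATAACC",
}


def leaves(node):
    label, _ = node
    if isinstance(label, str):
        return {label}
    return set().union(*(leaves(child) for child in label))


def spanning_length(node, keep):
    """Sum branch lengths of the minimal subtree connecting leaves in keep."""
    children, _ = node
    if isinstance(children, str):
        return 0.0
    total = 0.0
    for child in children:
        below = leaves(child) & keep
        if below and below != keep:   # branch separates some kept leaves
            total += child[1]
        total += spanning_length(child, keep)
    return total


def branch_length_score(consensus, region, tree):
    pattern = re.compile("".join(IUPAC[c] for c in consensus))
    with_motif = {sp for sp, seq in region.items() if pattern.search(seq)}
    return spanning_length(tree, with_motif) / spanning_length(tree, leaves(tree))


bls = branch_length_score("YTAWWWWTAR", REGION, TREE)
```

Because branch lengths are summed, a match in a distant species (here the long D.ana branch) adds much more to the score than a match in a close relative, in line with the "close species: little contribution" point above.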
Slide 12: Motif confidence selects functional instances
Transcription factor motifs
[Plot axis label: Confidence]
Increasing BLS → Increasing confidence
Confidence selects functional regions
Confidence selects in vivo bound sites
High sensitivity
microRNA motifs
Confidence selects positive strand
Increasing BLS → Increasing confidence
Confidence selects functional regions
Slide 13: 6. Initial regulatory network for an animal genome
- ChIP-grade quality
- Similar functional enrichment
- High sensitivity, high specificity
- Systems-level
- 81% of transcription factors
- 86% of microRNAs
- 8k 2k targets
- 46k connections
- Lessons learned
- Pre- and post-transcriptional regulation are correlated (hi-hi/lo-lo)
- Regulators are heavily targeted, feedback loops
Slide 14: Network captures literature-supported connections
Slide 15: Network captures co-expression supported edges
Red: co-expressed. Grey: not co-expressed. Named: literature-supported. Bold: literature-supported.
Slide 16: 7. ChIP vs. conservation: similar power / complementary
- Together best → complementary
- Bound but not conserved: reduced enrichment → selects functional
- All-ChIP vs. All-cons: similar enrichment → similar power
- Cons-only vs. ChIP-all: similar → additional sites
Slide 17: Recovery of regulatory motif instances in mammals (80% confidence)
[Figure: miRNA motif instances recovered (80% confidence) vs. total branch length of inf. species. Y-axis ticks 2k to 10k; annotations: 11,000 instances, 6X. Species sets with total branch length: HMRD (0.74), mamm. (4.33), pl-mam (3.36), Hnon-mamm. (6.36), HMRD non-mam (6.96), All vertebr. (9.66)]
- Performance increases with branch length (requires closely-related species)
- Measure number of recovered motif instances at a fixed confidence (80%) / FDR (20%)
- Discovery power 6-fold higher than HMRD (branch length also 6-fold higher)
- With 20 currently-aligned mammals
- Transcription factor motifs: 47 TFs, 16,000 instances, 340 targets on avg
- microRNA motifs: 21 miRNAs, 11,000 instances, 523 targets on avg
- An initial regulatory network for mammalian genomes
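The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 6-fold branch-length statement compares the mamm. set (4.33) with HMRD (0.74), that FDR is the complement of confidence, and that "targets on avg" is instances divided by the number of regulators; none of these readings is stated explicitly on the slide.

```python
# Total branch lengths as printed in the figure legend.
branch_length = {"HMRD": 0.74, "mamm.": 4.33}

# Ratio behind "branch length also 6-fold higher" (assumed comparison).
branch_ratio = branch_length["mamm."] / branch_length["HMRD"]   # about 5.85

# A fixed 80% confidence corresponds to a 20% false discovery rate,
# assuming FDR = 100% - confidence.
confidence_pct = 80
fdr_pct = 100 - confidence_pct

# Average targets per regulator, assuming instances / regulators,
# truncated to a whole number.
tf_targets_avg = 16_000 // 47       # transcription factors
mirna_targets_avg = 11_000 // 21    # microRNAs
```

Under these assumptions the computed averages agree with the 340 and 523 quoted on the slide, and the branch-length ratio rounds to the stated 6-fold.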
Slide 18: New insights into animal biology
Slide 19: 1. Large-scale evidence of translational read-through
[Figure labels: Protein-coding conservation; Stop codon read-through; Continued protein-coding conservation; No more conservation]
- New mechanism of post-transcriptional control.
- Hundreds of fly genes, handful of human genes.
- Enriched in brain proteins, ion channels.
- Experiments show ADAR necessary and sufficient (Reenan Lab).
- Many questions remain
- A-to-I editing of stop codon: TAG/TGA/TAA → TGG
- Cryptic splice sites? RNA secondary structure?
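The stop codon editing idea above can be made concrete: inosine is read as G during translation, so an A-to-I edit behaves like an A → G change. The sketch below counts the minimum number of such edits needed to turn each stop codon into TGG (tryptophan). The function name `edits_to_reach` is hypothetical.

```python
from itertools import combinations

STOP_CODONS = ("TAG", "TGA", "TAA")


def edits_to_reach(codon, target="TGG"):
    """Return the smallest number of A-to-I edits (read as A -> G) that
    turns codon into target, or None if no combination of edits does."""
    a_positions = [i for i, base in enumerate(codon) if base == "A"]
    for n_edits in range(1, len(a_positions) + 1):
        for chosen in combinations(a_positions, n_edits):
            edited = "".join(
                "G" if i in chosen else base for i, base in enumerate(codon)
            )
            if edited == target:
                return n_edits
    return None


min_edits = {codon: edits_to_reach(codon) for codon in STOP_CODONS}
```

TAG and TGA each need a single edit, while TAA needs both adenosines edited, so the three stop codons are not equally easy to recode by this mechanism.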
Slide 20: 2. Stop codon read-through in mammals
Four candidates found: GPX2, OPRK1, OPRL1, GRIA2, mostly neuronal
A look at FOXP2: possible 3' UTR function (not in fish, yes in frog)
Slide 21: 3. New insights into miRNA regulation: miRNA* function
- Both miRNA arms can be functional
- High scores, abundant processing, conserved targets
- Hox miRNAs: miR-10 and miR-iab-4 as master Hox regulators
Slide 22: 4. New insights into miRNA regulation: miR-AS function
- A single miRNA locus transcribed from both strands
- Both processed to mature miRNAs: mir-iab-4, miR-iab-4AS (anti-sense)
- The two miRNAs show distinct expression domains (mutually exclusive)
- The two show distinct Hox targets: another Hox master regulator
Slide 23: 5. New insights into miRNA regulation: miR-AS function
[Figure: WT haltere compared with sense and antisense mis-expression; haltere → wing, → wing with sensory bristles. Note: panels C, D, E at same magnification]
- Mis-expression of mir-iab-4S and AS: halteres → wings homeotic transformation
- Stronger phenotype for AS miRNA
- Sense/anti-sense pairs as general building blocks for miRNA regulation
- 9 new anti-sense miRNAs in mouse
Slide 24: Summary of Contributions
- Evolutionary signatures specific to each function
- Protein-coding genes: revised catalogue affects 10% of genes
- RNA: hundreds of new high-confidence structures discovered
- miRNAs: double number of genes, families, targeting density
- Motifs: double number of motifs, tissue and positional enrichment
- Targets: ChIP-grade quality, global scale, experimental support
- New insights on animal biology
- Genes: abundant stop codon read-through in neuronal proteins
- RNA: abundant structures in RNA editing, translational regulation
- Motifs: coding regions show miRNA targeting
- miRNAs: miR/miR* and sense/anti-sense pairs as building blocks
- Networks: TF vs. miRNA targets, redundancy and integration
- Methods are general, applicable in any species
Slide 25: Next steps: Drosophila and Human ENCODE
- modENCODE: White / Ren / Kellis / Posakony
- Hundreds of sequence-specific factors
- Dozens of chromatin / histone modifications
- Dozens of tissues / stages / conditions
- humENCODE: Bernstein / Lander / Kellis / Broad
- ChIP-seq for dozens of chromatin modifications
- Follow differentiation lineages: activation and inactivation
- Discover tissue-specific regulatory motifs
- Many open questions remain
- Dynamics of tissue-specific regulatory networks
- Sequence determinants of chromatin establishment and maintenance
- Global views of pre- and post-transcriptional regulation
- Many open positions remain (postdoc/grad/ugrad)
Slide 26: Acknowledgements
Alex Stark
Mike Lin
Pouya Kheradpour
Matt Rasmussen
Genes: FlyBase, BDGP, Bill Gelbart, Sue Celniker, Lynn Crosby
miRNAs: Leo Parts, Julius Brennecke, Greg Hannon, David Bartel
iab-4AS: Natascha Bushati, Steve Cohen, Julius, Greg Hannon
12-flies: Andy Clark, Mike Eisen, Bill Gelbart, Doug Smith
24 mammals: Sante Gnerre, Michele Clamp, Manuel Garber, Eric Lander