Title: Identification of Transposable Elements Using Multiple Alignments of Related Genomes
1Identification of Transposable Elements Using
Multiple Alignments of Related Genomes
- I690 Paper Presentation
- Yin Wu
2Transposable Elements
- Transposable Elements (TEs) are the chief cause
of gapped regions in up to 10% of currently
sequenced genomes.
- Alignment gaps that have little or no alignment
to other genomes lead to signatures within
multiple alignments that can be used to identify
TEs.
3Multiple Alignments Between Related Genomes
By aligning genomes of related species, it is
possible to identify TEs. Consider a speciation
event causing the recent divergence of genomes S1
and S2. We expect to see some gaps in the
alignment due to small insertions and deletions.
Gaps that are long and repeated, however, are
likely to be TEs. As additional related genomes
are added to the alignment, the chance of
misalignment decreases.
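The idea of reading candidate insertions off a multiple alignment can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `insertion_regions`, the dict-of-aligned-strings input, and the `min_len` cutoff are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def insertion_regions(
    alignment: Dict[str, str], target: str, min_len: int = 30
) -> List[Tuple[int, int]]:
    """Return half-open column ranges where only `target` has sequence.

    `alignment` maps a genome name to its aligned row ('-' marks a gap).
    `min_len` is a hypothetical cutoff used to skip small indels.
    """
    target_row = alignment[target]
    others = [row for name, row in alignment.items() if name != target]
    regions: List[Tuple[int, int]] = []
    start = None
    for col in range(len(target_row)):
        # A column belongs to an insertion if the target has a base
        # and every other genome has a gap there.
        is_insert = target_row[col] != "-" and all(r[col] == "-" for r in others)
        if is_insert and start is None:
            start = col
        elif not is_insert and start is not None:
            if col - start >= min_len:
                regions.append((start, col))
            start = None
    # Close a region that runs to the end of the alignment.
    if start is not None and len(target_row) - start >= min_len:
        regions.append((start, len(target_row)))
    return regions


toy_alignment = {
    "S1": "ACGT" + "TTTTGGGGCC" + "ACGT",
    "S2": "ACGT" + "-" * 10 + "ACGT",
    "S3": "ACGA" + "-" * 10 + "ACGT",
}
found = insertion_regions(toy_alignment, "S1", min_len=5)
```

In the toy alignment, S1 carries a 10-column stretch that is gapped in both S2 and S3, so it is reported as one candidate region; with a larger `min_len` the same stretch would be treated as a small indel and skipped.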
4Method
- Multiple alignment of homologous regions of
related genomes to find Insertion Regions (IRs)
- Local alignment of each set of IRs to find
Repeated Insertion Regions (RIRs)
- Filter and assemble RIRs.
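The local-alignment step can be illustrated with a small sketch. The slides do not name the aligner or its parameters, so this uses a plain Smith-Waterman score as a stand-in; the scoring values, the `min_score` threshold, and the function names are assumptions.

```python
from typing import Dict, Set


def local_alignment_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def repeated_insertion_regions(irs: Dict[str, str], min_score: int) -> Set[str]:
    """Names of IRs that locally align to at least one other IR.

    `min_score` is a hypothetical threshold, not a value from the slides.
    """
    names = list(irs)
    repeated: Set[str] = set()
    for x in range(len(names)):
        for y in range(x + 1, len(names)):
            score = local_alignment_score(irs[names[x]], irs[names[y]])
            if score >= min_score:
                repeated.update((names[x], names[y]))
    return repeated


toy_irs = {
    "ir1": "TTACGTACGTAGG",
    "ir2": "CCACGTACGTACC",
    "ir3": "GGGGGGGG",
}
rirs = repeated_insertion_regions(toy_irs, min_score=16)
```

The all-against-all loop is quadratic in the number of IRs, which is fine for a toy example; a genome-scale run would need a faster local aligner.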
5Types of Insertions
Micro-satellite (not a TE)
Tandem Repeats
Nested Repeats
Concatenated Repeats
6Filter and assemble RIRs
- Micro-satellite regions: Short (<20 bp) repeats
with close and sequential hits to self.
- Tandem repeats: Long (>30 bp) repeats which
sequentially align to both self and to
subcomponents in other IRs.
7Filter and assemble RIRs (contd)
- Nested repeats: Long non-overlapping (>30 bp)
subcomponents that sequentially align to other
IRs, where there is no intersection between the
sets of IRs to which each subcomponent aligned.
- Concatenated repeats: IRs within a certain
genomic distance (<700 bp) that align
sequentially to other insertion regions.
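The distance part of the concatenation rule can be sketched as an interval merge. This is only an illustration under assumptions: it applies the <700 bp proximity criterion alone and does not check that the regions align sequentially to other IRs, and the function name and tuple representation are hypothetical.

```python
from typing import List, Tuple


def concatenate_regions(
    regions: List[Tuple[int, int]], max_gap: int = 700
) -> List[Tuple[int, int]]:
    """Merge (start, end) regions separated by fewer than `max_gap` bp."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


toy_regions = [(3000, 3300), (100, 400), (900, 1200)]
assembled = concatenate_regions(toy_regions)
```

Here the first two regions lie 500 bp apart and are joined, while the third is 1800 bp downstream and stays separate.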
8Case Study
- Case study on four Drosophila genomes
- D. melanogaster, D. yakuba, D. pseudoobscura, and D. virilis
- The result is compared against the BDGP natural
TE annotation set
(http://www.fruitfly.org/p_disrupt/TE.html).
9Case Study (contd)
[Figure legend: Conserved Region; Insertion Region (gap); Annotated TEs]
10Case Study (contd)
Chr arm   BDGP Trans   Not in Alignment   Not in RIR (false neg)   In RIR (true pos)
X         276          8                  52                       216
2L        305          16                 62                       227
2R        312          3                  52                       257
3L        288          13                 65                       210
3R        288          5                  64                       219
4         102          21                 33                       48
Total     1571         66                 349                      1156
%         100          4.2                22                       74
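The percentage row follows from the totals row, as the short calculation below shows. The variable names are illustrative, and the last figure (recall restricted to annotated TEs that fall inside the alignment) is an extra derived quantity that does not appear on the slide.

```python
# Totals row of the table: annotated BDGP TEs and how they were classified.
total_annotated = 1571
counts = {
    "not in alignment": 66,
    "not in RIR (false negative)": 349,
    "in RIR (true positive)": 1156,
}

# Share of all annotated TEs in each class, in percent.
percentages = {
    label: round(100 * n / total_annotated, 1) for label, n in counts.items()
}

# Derived here, not on the slide: recall among TEs covered by the alignment.
aligned = counts["not in RIR (false negative)"] + counts["in RIR (true positive)"]
recall_in_alignment = round(100 * counts["in RIR (true positive)"] / aligned, 1)

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"recall within aligned regions: {recall_in_alignment}%")
```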
11Case Study (contd)
- Identification of new instances of known TE
families
  - 355 instances of recent elements
  - 232 instances of ancient elements
- Proposed new families in euchromatin
  - Define a cluster of RIRs to be a new family if
  the intracluster variability is within a certain
  threshold
  - Six new families of TEs are proposed, each
  containing more than five instances.
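The new-family criterion can be sketched as a threshold test on a cluster. The slides do not say how intracluster variability is measured or what the threshold is, so everything specific here is an assumption: variability is taken as the mean pairwise distance, the distance is a toy mismatch fraction for equal-length strings, and the minimum cluster size of six mirrors the "more than five instances" remark rather than a stated rule.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Sequence


def mismatch_fraction(a: str, b: str) -> float:
    """Toy distance: fraction of differing positions (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("toy distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def intracluster_variability(
    members: Sequence[str], distance: Callable[[str, str], float]
) -> float:
    """Mean pairwise distance within a cluster (assumed definition)."""
    pairs = list(combinations(members, 2))
    return mean(distance(a, b) for a, b in pairs) if pairs else 0.0


def is_candidate_family(
    members: Sequence[str],
    distance: Callable[[str, str], float],
    max_variability: float,
    min_instances: int = 6,
) -> bool:
    """True if the cluster is large and tight enough (hypothetical cutoffs)."""
    if len(members) < min_instances:
        return False
    return intracluster_variability(members, distance) <= max_variability


tight_cluster = ["ACGTACGT"] * 5 + ["ACGTACGA"]
loose_cluster = ["ACGTACGT", "TTTTTTTT", "GGGGGGGG", "CCCCCCCC", "AAAAAAAA", "ACGTTGCA"]
tight_ok = is_candidate_family(tight_cluster, mismatch_fraction, max_variability=0.1)
loose_ok = is_candidate_family(loose_cluster, mismatch_fraction, max_variability=0.1)
```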
12Limitation
- The proposed method depends on the
annotation of homologous regions (to avoid gene
rearrangements).
- Inversions and translocations are not detected
by this method.