Title: Differential insertion of transposable elements in Anopheles gambiae M and S genomes
Jenica L. Abrudan, Ryan C. Kennedy, Maria F.
Unger, Michael R. Olson, Scott J. Emrich, Frank
H. Collins, Nora J. Besansky
Eck Institute of Global Health, University of
Notre Dame
Generating the start-up data
Indels between M/S and PEST were determined by
mapping the M and S reads to the PEST assembly and
comparing the distance between the mate pairs.
Abstract
Mosquitoes in the Anopheles gambiae species
complex are the major vectors of malaria in
Africa. The original A. gambiae genome sequenced
was the PEST strain, which was later discovered
to be a composite of the A. gambiae M and S
forms. These two sympatric forms demonstrate
reproductive isolation and are believed to be
incipient or different species. They have been
individually sequenced recently, so we are
performing computational analysis of the three
genomes to identify sequence differences. We
hypothesize that transposable elements may be
influencing the speciation of A. gambiae.
Sequences inserted into the S genome assembly but
not into the M or PEST assembly: 6,767 sequences
Sequences inserted into the M genome assembly but
not into the S or PEST assembly: 6,792 sequences
Sequences inserted into the M and PEST genome
assemblies but not into S: 1,301 sequences
Insertions of transposable elements have been
associated with alterations in chromosome
structure, recombination, replication, and gene
regulation. Recent studies have indicated the
existence of speciation islands and numerous
genes differentially expressed across multiple
developmental stages between the M and S forms,
though many of those genes lie outside the
speciation islands, implying that more causal
factors remain to be discovered. We have
identified sequences differentially inserted
between the M and S genomes relative to PEST. We
then identified the subset of those sequences that
contain transposable elements using a discovery
pipeline we have developed. We are currently
using this subset to identify those sequences
that lie within 1 kb of gene elements, and will
perform experiments
designed to measure the expression levels of
those genes. We hope to find a correlation
between the differentially inserted transposons
and the observed gene expression differences.
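The 1 kb proximity step can be illustrated with a minimal Python sketch. It is not the pipeline used for this work: the tuple layouts, the function names `gap` and `genes_near`, and the gene names in the usage note are assumptions made only for illustration. Coordinates are treated as closed intervals on the same chromosome arm.

```python
def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two closed intervals; 0 if they overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def genes_near(insertion, genes, window=1000):
    """Return names of genes lying within `window` bp of an insertion.

    insertion: (chrom, start, end)              -- hypothetical layout
    genes:     list of (name, chrom, start, end) -- hypothetical layout
    """
    chrom, start, end = insertion
    return [
        name
        for name, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and gap(start, end, g_start, g_end) <= window
    ]
```

For example, with made-up genes `("geneA", "2L", 1200, 1300)` and `("geneB", "2L", 5000, 6000)`, an insertion at `("2L", 100, 200)` would be reported as near `geneA` only, since the gap to `geneA` is exactly 1,000 bp.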
Sequences inserted into the S and PEST genome
assemblies but not into M: 2,128 sequences
- The differentially inserted sequences were
identified by mapping the M and S reads to the
PEST genome and comparing the expected distance
between the mate pairs with the distance between
the positions where the mates map on PEST.
- Suspicious sequences fell into two categories:
- Sequences present in only one assembly (either M
or S)
- Sequences present in either M or S and in PEST,
but not in all three assemblies
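The mate-pair comparison described above can be sketched in Python as follows. This is a minimal illustration, not the method actually used: the `MappedPair` record, the `expected_span` and `tolerance` parameters, and the three labels are all assumptions, and the poster does not state the library insert size or any threshold.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    name: str
    left_start: int  # start of the leftmost mate on PEST
    right_end: int   # end of the rightmost mate on PEST


def classify_pair(pair, expected_span, tolerance):
    """Compare the span observed on PEST with the expected library span.

    expected_span and tolerance are hypothetical parameters.
    """
    observed_span = pair.right_end - pair.left_start
    if observed_span > expected_span + tolerance:
        # Mates map too far apart: PEST carries sequence the read genome lacks.
        return "extra_in_PEST"
    if observed_span < expected_span - tolerance:
        # Mates map too close together: the read genome carries extra sequence.
        return "extra_in_reads"
    return "concordant"
```

In practice a single discordant pair is weak evidence; a real analysis would presumably require several independent pairs supporting the same indel before reporting it.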
Steps in analyzing the potential differential
insertions between M and S
Putative insertion in M relative to S
blastn
Sequences that were computationally found to
differ between the M and S assemblies were
further analyzed for the presence of transposable
elements. For this purpose, a database was
compiled from the transposable elements known to
be present in A. gambiae. The sequences were
searched (blastn) against this database, and only
those with an e-value of 10^-26 or less were
considered further.
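The e-value filter can be sketched in Python as below. The sketch assumes the blastn results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), where the e-value is the eleventh column. The output format actually used here is not stated, and the function name `te_hits` is hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-26  # threshold stated in the text


def te_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the best hit at or below cutoff.

    Assumes tab-separated columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best
```

Queries that appear in the returned dictionary are those retained as transposable element related; each is labeled with its lowest e-value subject.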
1,075 transposable element related putative
insertions into M relative to S
Transposable elements compiled database
- Input data
- M genome assembly
- S genome assembly
- PEST genome assembly
- DNA sequences from the Repbase and TEfam
databases for the known transposable elements in
Anopheles gambiae
- Bioperl
1,146 transposable element related putative
insertions into S relative to M
Putative insertion in S relative to M
- Future plans
- Verify that the insertions are fixed between M
and S
- Look at the insertion site relative to genes
- Look at the possible influence of transposable
element related sequences on gene expression
The distribution of transposable element families
among the indels appears highly similar between
the two genomes.
Class I non-LTR retrotransposons and Class II DNA
transposons have the highest representation among
the potentially different insertions between the
two genomes, both in diversity of sequences and
in number of sequences present.
References
D. Lawson et al., VectorBase: a data resource for
invertebrate vector genomics. Nucleic Acids
Research, 37:D583-D587, 2009.
Repbase. http://www.girinst.org/repbase/index.html
VectorBase. http://www.vectorbase.org
TEfam. http://tefam.biochem.vt.edu
Bioperl. http://bioperl.org
While the numbers differ between the two genomes,
Mariner is the most abundant DNA transposon in
both, with 212 sequences for S and 189 for M.
Tc1, the next most abundant element of the same
class in M, is much less frequent in S (151 in M
vs 94 in S).
The VectorBase project is funded by the US
National Institute of Allergy and Infectious
Diseases (NIAID), contract HHSN266200400039C.