Title: Coding Domain Sequence Prediction and Alternative Splicing Detection in Human Malaria Gambiae
1. Coding Domain Sequence Prediction and Alternative
Splicing Detection in Human Malaria Gambiae
- Jun Li1, Bing-Bing Wang2, Jose M. Ribeiro3, Kenneth D. Vernick1,4
- 1. Dept. of Microbiology, University of Minnesota, St. Paul, MN
- 2. Pioneer Hi-Bred International, Johnston, IA
- 3. LMVR/NIAID, NIH, MD
- 4. UGGIV, Institut Pasteur, Paris, France
2. Introduction
- Nearly two-thirds of the world's population is at risk of malaria.
- 1.5 to 2.5 million children die annually.
- A. gambiae is the major malaria vector.
- Genome-wide research needs good CDS structure prediction and alternative splicing (AS) information.
- The A. gambiae CDS structures currently in use were predicted by comparative algorithms that are too conservative, so many genes are missing.
- Comparative gene prediction algorithms also have problems predicting terminal exons; as a result, >40% of the CDS they predict lack start and/or stop codons.
- The purpose of this work is to create an A. gambiae-specific gene model, complete the incomplete CDS, and provide AS information.
3. Combinational Gene Prediction Algorithm
- Gold gene set for training
- GlimmerHMM
- Open-Reading-Frame-Selection Algorithm
- Exon-Gene-Union Algorithm
Notation: x is a base pair, A is the ab initio predicted CDS,
P is the comparatively predicted CDS, and C is the combinational CDS.
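The slide does not preserve the details of the Exon-Gene-Union step, so the following is only a minimal sketch of one plausible reading: a base pair x belongs to the combinational CDS C if it is covered by either the ab initio CDS A or the comparative CDS P. The function name `union_exons`, the interval representation, and the coordinates are hypothetical and are not taken from the authors' package.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is a 1-based, inclusive (start, end) pair.
Interval = Tuple[int, int]


def union_exons(ab_initio: List[Interval], comparative: List[Interval]) -> List[Interval]:
    """Return exons covering every base pair found in either input list.

    Assumption: "union" means base-pair-level set union of A and P.
    Reading frame is NOT checked here.
    """
    merged: List[Interval] = []
    for start, end in sorted(ab_initio + comparative):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous exon.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up coordinates for illustration only.
A = [(100, 200), (300, 400)]   # ab initio predicted CDS
P = [(150, 250), (500, 600)]   # comparative predicted CDS
C = union_exons(A, P)          # combinational CDS
print(C)
```

A real implementation would also need to confirm that the merged CDS stays in frame, which is presumably where the Open-Reading-Frame-Selection step comes in.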
4. Combinational algorithm improves on single-algorithm
prediction
Algorithm                 Sensitivity (%)  Specificity (%)  Complete rate (%)
GlimmerHMM                95               90               100
Ensembl                   92               99               60
Combinational algorithm   96               99               95
Comparison of CDS structure from the combinational
algorithm and Ensembl.
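The slide does not state how the three columns were computed. As a hedged illustration, the sketch below uses the common gene-prediction convention at the nucleotide level, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP), and treats a CDS as complete when it starts with ATG, ends with a stop codon, and has a length divisible by three. These definitions, the function names, and the example coordinates are assumptions, not the authors' stated method.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); hypothetical format
STOP_CODONS = {"TAA", "TAG", "TGA"}


def covered(exons: List[Interval]) -> Set[int]:
    """Set of base-pair positions covered by a list of exons."""
    return {x for start, end in exons for x in range(start, end + 1)}


def sensitivity_specificity(predicted: List[Interval],
                            reference: List[Interval]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and specificity (assumed definitions)."""
    pred, ref = covered(predicted), covered(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0   # TP / (TP + FN)
    specificity = true_pos / len(pred) if pred else 0.0  # TP / (TP + FP)
    return sensitivity, specificity


def is_complete(cds_seq: str) -> bool:
    """True if the CDS has a start codon, a stop codon, and whole codons."""
    seq = cds_seq.upper()
    return (len(seq) >= 6 and len(seq) % 3 == 0
            and seq.startswith("ATG") and seq[-3:] in STOP_CODONS)


# Made-up example: the prediction misses the first 10 bases of the reference.
sn, sp = sensitivity_specificity([(11, 100)], [(1, 100)])
print(sn, sp, is_complete("ATGAAATAA"))
```

Under these definitions, the complete rate would be the fraction of predicted CDS for which `is_complete` returns true.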
5. Alternative splicing detection in A. gambiae
AS distribution in A. gambiae
EST-aided AS detection algorithm
- Align ESTs to the genome, process the alignments, and extract exon/intron information.
- Upload to a MySQL database.
- Run quality control, build EST clusters, and merge introns and exons from individual alignments.
- Compare intron/intron and intron/exon pairs, find overlapping events, and classify AS events.
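The final comparison step can be sketched as follows. This is an illustrative simplification, not the authors' implementation: it assumes plus-strand coordinates, 1-based inclusive intervals, and introns and exons that have already been merged within one EST cluster. The event labels and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); plus strand assumed


def classify_intron_pair(i1: Interval, i2: Interval) -> Optional[str]:
    """Classify two distinct, overlapping introns from the same cluster."""
    if i1 == i2 or i1[0] > i2[1] or i2[0] > i1[1]:
        return None  # identical or non-overlapping: no event
    if i1[0] == i2[0]:
        return "alternative_3prime_site"  # shared donor, different acceptor
    if i1[1] == i2[1]:
        return "alternative_5prime_site"  # shared acceptor, different donor
    return "other_overlap"


def classify_intron_exon(intron: Interval, exon: Interval) -> Optional[str]:
    """Classify an intron against an exon from another alignment."""
    if exon[0] <= intron[0] and intron[1] <= exon[1]:
        return "intron_retention"  # intron lies entirely inside an exon
    if intron[0] < exon[0] and exon[1] < intron[1]:
        return "exon_skipping"     # exon lies entirely inside an intron
    return None


def detect_events(introns: List[Interval], exons: List[Interval]):
    """Compare all intron/intron and intron/exon pairs in one cluster."""
    events = []
    for idx, i1 in enumerate(introns):
        for i2 in introns[idx + 1:]:
            kind = classify_intron_pair(i1, i2)
            if kind:
                events.append((kind, i1, i2))
    for intron in introns:
        for exon in exons:
            kind = classify_intron_exon(intron, exon)
            if kind:
                events.append((kind, intron, exon))
    return events


# Made-up cluster for illustration only.
introns = [(201, 300), (201, 350), (401, 700)]
exons = [(501, 600), (150, 320)]
events = detect_events(introns, exons)
for event in events:
    print(event)
```

The pairwise loops are quadratic in cluster size, which is usually acceptable per cluster; in the pipeline described above, the candidate pairs would presumably come from queries against the MySQL tables.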
Conclusion: 1512 CDS have alternative splicing.
Most AS events occur in the CDS region, which can
enrich protein structure and function. Manual
curation shows that the false-positive rate (due to
EST contamination) is low (10). The AS type
distribution indicates that the mosquito is
closer to plants than to mammals.
6. Software package and web presentation
The combinational CDS prediction and alternative
splicing detection pipelines have been integrated
into our open-source package (collaboration is
welcome). Results are also accessible
through the web.