Title: Distribution of Introns among Full Length cDNA
1 Distribution of Introns among Full Length cDNA
Bioinformatics Capstone
- By Xin Hong
- Advisors: Dr. Michael Lynch and Dr. Sun Kim
2 Main Points
- Motivation
- Background
- Data sources
- Method
- Results and discussion
3 Motivation
- Genomic sequences
- Full-length cDNA project
- Gene prediction programs do not include UTR regions.
- UTR structure and function, and the NMD theory.
4 Definition of UTRs and Introns
- 5' UTR sequences were defined as the mRNA region spanning from the cap site to the start codon (excluded).
- 3' UTR sequences were defined as the mRNA region spanning from the stop codon (excluded) to the poly(A) start site.
- The coding region begins with the initiation codon, which is normally ATG. It ends with one of three termination codons: TAA, TAG or TGA.
[Figure: genomic sequence, pre-mRNA (segments labeled 1, 2, 3) and mRNA (5' UTR, CDS, 3' UTR)]
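The definitions above can be sketched in code. This is only an illustration, assuming the CDS coordinates are already known from a CDS annotation and that the cDNA runs from the cap site to the poly(A) start site; the function name `split_cdna` and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_cdna(cdna, cds_start, cds_end):
    """Split a full-length cDNA into 5' UTR, CDS and 3' UTR.

    cds_start: 0-based index of the first base of the start codon.
    cds_end:   0-based index one past the last base of the stop codon.
    """
    if not (0 <= cds_start < cds_end <= len(cdna)):
        raise ValueError("CDS coordinates fall outside the cDNA")
    if (cds_end - cds_start) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    utr5 = cdna[:cds_start]        # cap site up to, not including, the start codon
    cds = cdna[cds_start:cds_end]  # start codon through stop codon
    utr3 = cdna[cds_end:]          # after the stop codon to the poly(A) start site
    return utr5, cds, utr3

# Hypothetical toy sequence: 6 nt 5' UTR, 12 nt CDS, 11 nt 3' UTR.
example = "GGCACC" + "ATGAAACCCTAA" + "TTTGCAATAAA"
utr5, cds, utr3 = split_cdna(example, 6, 18)
```

Here the CDS keeps both the start and the stop codon, so neither UTR contains them, matching the "(excluded)" wording of the definitions.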
5 Function of UTRs
- Translational control
- mRNA subcellular localization
- mRNA stability
Pesole, 2001
6 Nonsense-Mediated Decay (NMD)
- An mRNA is immune to NMD if translation terminates less than 50-55 nucleotides upstream of, or anywhere downstream of, the 3'-most exon-exon junction, which marks the position of the last intron in the cDNA.
- NMD is an mRNA surveillance mechanism that leads to selective degradation of transcripts containing a premature termination codon.
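The positional rule can be written as a small check. This is a rough sketch, not a full model of NMD: the function name `is_nmd_candidate` is hypothetical, and the default threshold of 55 nt is an assumption taken from the upper end of the 50-55 nt range quoted above.

```python
def is_nmd_candidate(exon_lengths, stop_end, threshold=55):
    """Return True if translation terminates at least `threshold`
    nucleotides upstream of the 3'-most exon-exon junction.

    exon_lengths: exon lengths in mRNA order (5' to 3').
    stop_end:     0-based mRNA coordinate one past the stop codon.
    threshold:    assumed cutoff in nt (the source gives 50-55).
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule cannot apply
    # mRNA coordinate of the 3'-most exon-exon junction
    last_junction = sum(exon_lengths[:-1])
    # positive when the stop codon lies upstream of the junction
    distance_upstream = last_junction - stop_end
    return distance_upstream >= threshold

# Hypothetical transcript with exons of 200, 150 and 300 nt:
# the last junction sits at mRNA position 350.
exons = [200, 150, 300]
```

With these exons, a stop codon ending at position 250 is 100 nt upstream of the last junction and would be flagged, while one ending at position 320 (30 nt upstream) or 500 (in the last exon) would not.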
7 Objectives
- To explore introns in the UTR regions.
- To find the rules of intron distribution within UTR regions.
- To compare the intron distribution between UTRs and CDS.
- To compare the intron distribution rules among different species.
8 Data source
- Full-length cDNA sequences
  - MGC (Mammalian Gene Collection): mammalian
  - BDGP: fruit fly
  - KOME: plant
- Genomic sequences
  - GenBank
  - Ensembl
- CDS prediction (Furuno et al. 2003)
  - ProCrest
  - rsCDS
  - NCBI predictor
  - DECODER
  - Experiment
Human (hs)                     15504  15458
Mouse (mm)                     12828  12803
Rat (rn)                         641    634
Drosophila melanogaster (dm)    9152   9096
Arabidopsis thaliana (at)      18415  18414
9 Method
- Align the cDNA sequences to the genomic sequence.
- How to handle gaps, overlaps and even polymorphisms?
- BLAST, MegaBLAST, ...
- sim4, gap2, Spidey, BLAT and GeneSeqer
Jim Kent - the Blat Rap
10 Steps
- Clean the full-length cDNA and genomic sequences.
- Parse each cDNA into three parts: 5' UTR, CDS and 3' UTR.
- Align the cDNA to the genomic sequence with BLAT.
- Parse the BLAT result to get the locations of exons and introns.
- Get the sequences of the exons and introns.
- Check that the sum of the exon lengths equals the cDNA length, to remove suspect candidates.
- Calculate the average length of the cDNA, the average number of introns per cDNA, etc.
- Compare the intron distribution of the 5' UTR, CDS and 3' UTR regions.
- Compare the intron distribution rules among different species.
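The parsing and checking steps might look roughly like the sketch below. It assumes BLAT was run with its default PSL output (21 tab-separated columns, with `blockSizes` and `tStarts` as comma-terminated lists) and that each alignment line has already been chosen as the best hit for its cDNA. The function name `parse_psl_line`, the `min_intron` cutoff of 30 nt and the example line are hypothetical and are not taken from the original pipeline.

```python
def parse_psl_line(line, min_intron=30):
    """Extract exons and introns (genomic coordinates, 0-based,
    half-open) from one BLAT PSL alignment line.

    Returns (exons, introns, complete), where `complete` is True when
    the aligned blocks add up to the full cDNA length.
    """
    fields = line.rstrip("\n").split("\t")
    q_size = int(fields[10])  # cDNA (query) length
    block_sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    t_starts = [int(x) for x in fields[20].rstrip(",").split(",")]

    exons = []
    for size, start in zip(block_sizes, t_starts):
        end = start + size
        if exons and start - exons[-1][1] < min_intron:
            # Gap too short to be an intron (assumed cutoff): merge blocks.
            exons[-1] = (exons[-1][0], end)
        else:
            exons.append((start, end))

    # Introns are the genomic gaps between consecutive exons.
    introns = [(left_end, right_start)
               for (_, left_end), (right_start, _) in zip(exons, exons[1:])]
    complete = sum(block_sizes) == q_size
    return exons, introns, complete

# Hypothetical PSL line: a 300 nt cDNA aligned in three blocks.
psl_fields = ["300", "0", "0", "0", "0", "0", "2", "700", "+", "cdna1",
              "300", "0", "300", "chr1", "100000", "1000", "2000", "3",
              "100,120,80,", "0,100,220,", "1000,1500,1920,"]
exons, introns, complete = parse_psl_line("\t".join(psl_fields))
```

Alignments for which `complete` is False would be the suspect candidates to remove before computing averages.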
11 Flow Chart
12 Objectives
- To explore introns in the UTR regions.
- To find the rules of intron distribution within UTR regions.
- To compare the intron distribution between UTRs and CDS.
- To compare the intron distribution rules among different species.
13 Introns Do Exist in UTRs
- Introns do exist in UTRs.
- However, taking Arabidopsis as an example, 80% of 5' UTR sequences don't have introns, and 90% of 3' UTR sequences don't have introns.
14 Introns in CDS
- 80% of CDS sequences have introns.
15 Intron number: UTRs vs. CDS
- Most CDS sequences have introns, but most UTR sequences don't have introns.
[Figure: axes labeled number of sequences and number of introns]
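A comparison like the one on this slide can be tabulated from per-sequence intron counts. The sketch below is illustrative only: the function name `intron_count_summary` and the toy counts are hypothetical, not data from the study.

```python
from collections import Counter

def intron_count_summary(intron_counts):
    """Summarize a list of per-sequence intron counts for one region.

    Returns (histogram, fraction_with_introns), where histogram maps
    an intron count to the number of sequences with that count.
    """
    histogram = Counter(intron_counts)
    with_introns = sum(1 for n in intron_counts if n > 0)
    return histogram, with_introns / len(intron_counts)

# Hypothetical counts for five 5' UTR sequences and five CDS sequences.
utr5_counts = [0, 0, 0, 0, 1]
cds_counts = [0, 2, 3, 5, 8]
utr5_hist, utr5_fraction = intron_count_summary(utr5_counts)
cds_hist, cds_fraction = intron_count_summary(cds_counts)
```

The histogram corresponds to a plot of number of sequences against number of introns, and the fraction gives the share of sequences that contain at least one intron.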
16 Objectives
- To explore introns in the UTR regions.
- To find the rules of intron distribution within UTR regions.
- To compare the intron distribution between UTRs and CDS.
- To compare the intron distribution rules among different species.
17 Introns in UTR
- Introns of the 5' UTR and 3' UTR are spread throughout the region, but not evenly or uniformly distributed.
- If introns were evenly distributed, they would be expected at intervals of 1/(number of introns + 1) of the sequence length.
[Figure: axes labeled intron number and number of introns]
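The even-distribution expectation can be made concrete by placing n introns at relative positions k/(n + 1) and comparing them with the observed ones. This is a minimal sketch; the function names and the example positions are hypothetical.

```python
def expected_even_positions(n_introns):
    """Relative positions (0 to 1) of n evenly spaced introns."""
    return [k / (n_introns + 1) for k in range(1, n_introns + 1)]

def relative_positions(intron_sites, region_length):
    """Convert intron sites (nt from the 5' end) to relative positions."""
    return [site / region_length for site in sorted(intron_sites)]

def mean_abs_deviation(observed):
    """Mean absolute difference between observed relative positions
    and the evenly spaced expectation for the same number of introns."""
    expected = expected_even_positions(len(observed))
    return sum(abs(o - e) for o, e in zip(sorted(observed), expected)) / len(observed)

# Hypothetical 200 nt UTR with introns after nt 50 and nt 150.
observed = relative_positions([50, 150], 200)
deviation = mean_abs_deviation(observed)
```

For two introns the even expectation is 1/3 and 2/3, so the toy positions 0.25 and 0.75 each deviate by 1/12 of the region length.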
18 Introns in UTR
- The number of introns increases as the length of the sequence increases.
- For human 5' UTRs, on average one intron is present per 100 nt.
- Introns of the 3' UTR tend to concentrate toward the center of the 3' UTR.
[Figures: axes labeled location of introns, length of sequences and number of introns]
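The relationship between sequence length and intron number can be summarized by binning sequences by length and averaging the intron count in each bin. This is a sketch under assumed inputs: the function name `mean_introns_by_length`, the bin size of 100 nt and the records are hypothetical.

```python
from collections import defaultdict

def mean_introns_by_length(records, bin_size=100):
    """Average intron count per length bin.

    records: iterable of (sequence_length, intron_count) pairs.
    Returns a dict mapping the start of each length bin to the mean
    number of introns among sequences in that bin.
    """
    totals = defaultdict(lambda: [0, 0])  # bin start -> [intron sum, n sequences]
    for length, n_introns in records:
        bin_start = (length // bin_size) * bin_size
        totals[bin_start][0] += n_introns
        totals[bin_start][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}

# Hypothetical (length, intron count) pairs for a handful of UTRs.
records = [(50, 0), (80, 0), (150, 1), (180, 2), (260, 3)]
means = mean_introns_by_length(records)
```

A rising mean across bins would correspond to the trend described above.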
19 Objectives
- To explore introns in the UTR regions.
- To find the rules of intron distribution within UTR regions.
- To compare the intron distribution between UTRs and CDS.
- To compare the intron distribution rules among different species.
20 Introns in CDS
- Introns in CDS are spread throughout the region.
- For human, if there is more than one intron, the interval between two introns is about 140 nt. (In other words, the average exon in CDS is 140 nt.)
- Introns are shifted toward the 5' end.
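Both observations can be computed from intron sites expressed in CDS coordinates: the interval between neighbouring introns is the length of the exon between them, and a mean relative position below 0.5 indicates a shift toward the 5' end. The function names and the example sites are hypothetical.

```python
from statistics import mean

def intron_intervals(intron_sites):
    """Distances (nt) between consecutive intron sites, i.e. the
    lengths of the internal exons that separate them."""
    sites = sorted(intron_sites)
    return [b - a for a, b in zip(sites, sites[1:])]

def mean_relative_position(intron_sites, cds_length):
    """Mean intron position as a fraction of the CDS length."""
    return mean(site / cds_length for site in intron_sites)

# Hypothetical 900 nt CDS with introns after nt 120, 260 and 400.
sites = [120, 260, 400]
intervals = intron_intervals(sites)
average_interval = mean(intervals)
position = mean_relative_position(sites, 900)
```

Sequences with fewer than two introns give an empty interval list and would need to be skipped before averaging.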
21 Intron distribution: UTRs vs. CDS
- Human as an example
- The frequency of introns occurring in the 5' UTR is higher than that in the CDS.
- The frequency of introns occurring in the CDS is higher than that in the 3' UTR.
[Figures: axis labeled number of introns]
22 Intron distribution: UTRs vs. CDS
                             5' UTR            CDS                        3' UTR
Interval between 2 introns   100 nt            140 nt                     uncertain
Intron frequency             Higher than CDS   Higher than 3' UTR         Lowest
Distribution                 Evenly            Shifted toward 5' of CDS   Concentrated toward the center of 3' UTR
23 Objectives
- To explore introns in the UTR regions.
- To find the rules of intron distribution within UTR regions.
- To compare the intron distribution between UTRs and CDS.
- To compare the intron distribution rules among different species.
24 Different species: UTRs vs. CDS
- The number of introns increases with the length of the sequence in both UTRs and CDS.
- 5' UTR sequences shorter than 100 nt don't have introns in human, mouse, rat, Arabidopsis and fruit fly.
- CDS sequences shorter than 800 nt don't have introns in human, mouse, Arabidopsis and fruit fly. For rat this boundary is 500 nt.
- The fruit fly sequence length increases faster than that of the other species in both UTRs and CDS.
[Figures: axis labeled number of introns]
25 Different species: UTRs vs. CDS
- For all 5 species, most UTRs don't have introns.
- For all 5 species, most CDS have introns.
- The intron distribution rules hold for human, mouse, rat, Arabidopsis and fruit fly.
[Figures: axes labeled number of sequences and number of introns]
26 Summary
- Introns do exist in UTRs.
- The intron distributions in the 5' UTR, CDS and 3' UTR are different within the same organism.
- The intron distribution rules are common to human, mouse, rat, Arabidopsis and fruit fly.
- 5' UTR sequences shorter than 100 nt don't have introns in human, mouse, rat, Arabidopsis and fruit fly.
- CDS sequences shorter than 800 nt don't have introns in human, mouse, Arabidopsis and fruit fly; for rat the boundary is 500 nt.
- The fruit fly fl-cDNA sequence length increases faster than that of the other species in both UTRs and CDS.
                             5' UTR            CDS                        3' UTR
Sequences with introns (%)   20                80                         10
Interval between 2 introns   100 nt            140 nt                     uncertain
Intron frequency             Higher than CDS   Higher than 3' UTR         Lowest
Distribution                 Evenly            Shifted toward 5' of CDS   Concentrated toward the center of 3' UTR
27 Future work
- NMD widely exists among different species.
- The reason why most UTRs don't have introns.
- The reason why intron frequency decreases from 5' to 3' along the full-length cDNA.
28 Reference
- Lynch, Michael and Kewalramani, Avinash (2003) Messenger RNA surveillance and the evolutionary proliferation of introns. Mol. Biol. Evol. 20(4):563-571.
- Flavio Mignone, Carmela Gissi, Sabino Liuni and Graziano Pesole (2002) Untranslated regions of mRNAs. Genome Biology 3(3): reviews0004.1-0004.10.
- Pesole G, Grillo G, Larizza A, Liuni S. (2000) The untranslated regions of eukaryotic mRNAs: structure, function, evolution and bioinformatic tools for their analysis. Briefings in Bioinformatics 1(3):236-249.
- Kent WJ (2002) BLAT: The BLAST-Like Alignment Tool. Genome Res. 12(4):656-664.
- Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, Hayashizaki Y, Okazaki Y. (2003) CDS annotation in full-length cDNA sequence. Genome Res. 13(6B):1478-1487.
- Strausberg RL et al. (2002) Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci U S A. 99(26):16899-16903.
- http://www.ncbi.nlm.nih.gov
29 Acknowledgement
- Dr. Michael Lynch
- Dr. Sun Kim
- Dr. Douglas G. Scofield
30 THE END