Title: Alternative splicing: A playground of evolution
1 Alternative splicing: A playground of evolution
- Mikhail Gelfand
- Research and Training Center for Bioinformatics
- Institute for Information Transmission Problems
2 Alternative splicing of human (and mouse) genes
3
- Evolution of alternative exon-intron structure
  - human-mouse
  - Drosophila and Anopheles
- Evolution of alternative splicing sites: MAGE-A family of CT antigens
- Evolutionary rate in constitutive and alternative regions
  - human-mouse
  - human SNPs
- Alternative splicing and protein structure
4 Data and Methods (routine)
- known alternative splicing
  - HASDB (human, ESTs + mRNAs)
  - ASMamDB (mouse, mRNAs + genes)
- additional variants
  - UniGene (human and mouse EST clusters)
- complete genes and genomic DNA
  - GenBank (full-length mouse genes)
  - human genome
- TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)
- BLASTN (human mRNAs against genome)
- Pro-EST (spliced alignment, ESTs and mRNA against genomic DNA)
5
- Pro-Frame (spliced alignment of proteins against genomic DNA)
- confirmation of orthology
  - same exon-intron structure for at least one isoform
  - >70% identity over the entire protein length
- analysis of conservation of human alternative splicing in the mouse genome: align human protein to mouse genomic DNA; the isoform is conserved if
  - all exons or parts of exons are conserved
  - all sites are conserved
- same procedure for mouse proteins and human DNA
- We do not require that the isoform is actually observed as mRNA or ESTs
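The two criteria on this slide (orthology, then isoform conservation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pro-Frame implementation: the `AlignedExon` record, its fields, and both function names are hypothetical, and the flags are assumed to come from a spliced alignment of the protein against the other species' genomic DNA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedExon:
    """Hypothetical summary of how one exon (or exon part) aligned."""
    aligned: bool        # the exon or exon part is found in the other genome
    donor_ok: bool       # donor site conserved (set True for the last exon)
    acceptor_ok: bool    # acceptor site conserved (set True for the first exon)


def is_orthologous(isoforms_with_same_structure: int, identity: float) -> bool:
    """Orthology: same exon-intron structure for at least one isoform
    and >70% identity over the entire protein length."""
    return isoforms_with_same_structure >= 1 and identity > 0.70


def isoform_is_conserved(exons: List[AlignedExon]) -> bool:
    """Conserved only if every exon and every splice site is conserved."""
    return all(e.aligned and e.donor_ok and e.acceptor_ok for e in exons)
```

Note that nothing here asks whether the isoform is observed as mRNA or ESTs in the second species; only the genomic potential to encode it is tested.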
6 166 gene pairs
Known alternative splicing (Venn diagram): 126 human genes and 124 mouse genes; 84 in both species, 42 in human only, 40 in mouse only
7 Elementary alternatives
Cassette exon
Alternative donor site
Alternative acceptor site
Retained intron
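The four elementary alternatives can be told apart from exon coordinates alone. The sketch below is an illustration only, assuming plus-strand genes, half-open `(start, end)` exon coordinates, and a reference isoform to compare against; the function name `classify_events` and the fallback label "complex" are hypothetical and not taken from the slides.

```python
def classify_events(ref, alt):
    """Label exons of `alt` that are absent from `ref`.

    Both arguments are sorted lists of (start, end) exon coordinates,
    half-open, on the plus strand (assumptions of this sketch).
    """
    ref_exons = set(ref)
    # Introns of the reference isoform: gaps between consecutive exons.
    ref_introns = [(a[1], b[0]) for a, b in zip(ref, ref[1:])]
    events = []
    for start, end in alt:
        if (start, end) in ref_exons:
            continue  # exon shared by both isoforms
        if any(i_s < start and end < i_e for i_s, i_e in ref_introns):
            kind = "cassette exon"            # lies wholly inside a reference intron
        elif any(a[0] == start and b[1] == end for a, b in zip(ref, ref[1:])):
            kind = "retained intron"          # spans two adjacent exons and their intron
        elif any(r_s == start for r_s, _ in ref):
            kind = "alternative donor site"   # same exon start, different 3' end
        elif any(r_e == end for _, r_e in ref):
            kind = "alternative acceptor site"  # same exon end, different 5' start
        else:
            kind = "complex"
        events.append((kind, (start, end)))
    return events
```

For a minus-strand gene the donor and acceptor labels would swap, which this sketch does not handle.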
8 Human genes
                    mRNA                EST
                    cons.  non-cons.    cons.  non-cons.
Cassette exons       56     25           74     26
Alt. donors          18      7           16     10
Alt. acceptors       13      5           19     15
Retained introns      4      3            5      0
Total                96     30          114     51
Total genes          45     28           41     44
Conserved elementary alternatives: 69% (EST) to 76% (mRNA)
Genes with all isoforms conserved: 57% (45%)
9 Mouse genes
                    mRNA                EST
                    cons.  non-cons.    cons.  non-cons.
Cassette exons       70      5           39      9
Alt. donors          24      6           17      6
Alt. acceptors       15      6           16      9
Retained introns      8      7           10      4
Total               117     24           82     28
Total genes          68     22           30     26
Conserved elementary alternatives: 75% (EST) to 83% (mRNA)
Genes with all isoforms conserved: 79% (64%)
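The "conserved elementary alternatives" percentages follow directly from the Total rows of the human and mouse tables. A short worked check, assuming only those totals (the dictionary layout and function name are illustrative):

```python
# (conserved, non-conserved) totals of elementary alternatives,
# copied from the "Total" rows of the human and mouse tables.
TOTALS = {
    ("human", "mRNA"): (96, 30),
    ("human", "EST"): (114, 51),
    ("mouse", "mRNA"): (117, 24),
    ("mouse", "EST"): (82, 28),
}


def conserved_percent(conserved: int, non_conserved: int) -> float:
    """Share of elementary alternatives that are conserved, in percent."""
    return 100.0 * conserved / (conserved + non_conserved)


PERCENT = {key: round(conserved_percent(*counts)) for key, counts in TOTALS.items()}
# human mRNA: 96 / (96 + 30)   -> 76
# human EST:  114 / (114 + 51) -> 69
# mouse mRNA: 117 / (117 + 24) -> 83
# mouse EST:  82 / (82 + 28)   -> 75
```

The complements of these values (24-31% for human, 17-25% for mouse) are the non-conserved fractions.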
10 Real or aberrant non-conserved AS?
- 24-31% of human vs. 17-25% of mouse elementary alternatives are not conserved
- 55% of human vs. 36% of mouse genes have at least one non-conserved variant
- denser coverage of human genes by ESTs
  - pick up rare (tissue- and stage-specific) => younger variants
  - pick up aberrant (non-functional) variants
- 17-24% of mRNA-derived elementary alternatives are non-conserved (compared to 25-32% of EST-derived ones)
11 Comparison to other studies. Modrek and Lee, 2003: skipped exons
- inclusion level is a good predictor of conservation
  - 98% of constitutive exons are conserved
  - 98% of major form exons are conserved
  - 28% of minor form exons are conserved
- inclusion level of conserved exons in human and mouse is highly correlated
- Minor non-conserved form exons are errors? No
  - minor form exons are supported by multiple ESTs
  - 28% of minor form exons are upregulated in one specific tissue
  - 70% of tissue-specific exons are not conserved
  - splicing signals of conserved and non-conserved exons are similar
13 Fruit fly and mosquito
- Technically more difficult than human-mouse
  - incomplete genomes
  - difficulties in alignment, especially at gene termini
  - changes in exon-intron structure irrespective of alternative splicing (4.7 introns per gene in Drosophila vs. 3.5 introns per gene in Anopheles)
14 Methods
- Pro-Frame: align Dme (D. melanogaster) protein isoforms to Dps (D. pseudoobscura) and Aga (Anopheles gambiae) genes
- coding segments: regions in Dme genes between Dme intron shadows
- We follow the fate of Dme exons and coding segments in Dps and Aga genomes
- slices: regions between all exon-exon junctions (intron shadows) from all three genomes (Dme, Dps, Aga) mapped to Dme isoforms
- slice is conserved if it aligns with ≥35% identity
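The slice construction can be sketched as follows: collect every intron shadow from the three genomes in the coordinates of a Dme isoform, cut the isoform at those positions, and score each piece by identity. This is an illustration under assumptions, not the authors' pipeline: the function names are hypothetical, positions are 0-based residue offsets, the aligned fragments are equal-length strings with `-` for gaps, and the 35% cutoff is read as "at least 35%".

```python
def make_slices(isoform_length, shadows_by_genome):
    """Cut a Dme isoform at every intron shadow seen in any genome.

    shadows_by_genome maps a genome label (e.g. "Dme", "Dps", "Aga") to the
    junction positions mapped onto the Dme isoform.
    Returns half-open (start, end) slices covering the whole isoform.
    """
    cuts = sorted({p for positions in shadows_by_genome.values()
                   for p in positions if 0 < p < isoform_length})
    bounds = [0] + cuts + [isoform_length]
    return list(zip(bounds, bounds[1:]))


def percent_identity(aligned_a, aligned_b):
    """Fraction of alignment columns with identical, non-gap residues."""
    if not aligned_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return matches / len(aligned_a)


def slice_is_conserved(aligned_a, aligned_b, threshold=0.35):
    """A slice counts as conserved at or above the identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Because the cut points are pooled across genomes, an exon that is divided in one species and intact in another yields two slices that can be scored separately.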
15 Conservation of coding segments
                                         constitutive segments   alternative segments
D. melanogaster vs. D. pseudoobscura     97%                     75-80%
D. melanogaster vs. Anopheles gambiae    77%                     45%
16 Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes
- blue: exact
- green: divided exons
- yellow: joined exons
- orange: mixed
- red: non-conserved
- retained introns are the least conserved
- mutually exclusive exons are as conserved as constitutive exons
17 Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes
- blue: exact
- green: divided exons
- yellow: joined exons
- orange: mixed
- red: non-conserved
- 30 joined, 10 divided exons (fewer introns in Aga)
- mutually exclusive exons are conserved exactly
- cassette exons are the least conserved
18 CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles
19 CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles
20 CG1587: alternative acceptor site in Drosophila, candidate retained intron in intronless gene of Anopheles
22 Alternative splicing in a multigene family: the MAGEA family of cancer/testis specific antigens
- A locus on the X chromosome containing eleven recently duplicated genes: two subfamilies of four genes each and three single genes
- Retrogene: one protein-coding exon, multiple different 5'-UTR exons
- Mutations create new splicing sites or disrupt existing sites
23 Birth of donor sites (new GT in alternative initial exon 5)
24 Birth of an acceptor site (new AG and polyY tract in MAGEA8-specific cassette exon 3)
25 Birth of an alternative donor site (enhanced match to the consensus (AG) in cassette exon 2)
26 Birth of an alternative acceptor site (enhanced polyY tract in cassette exon 4)
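Candidate "births" of the core dinucleotides (GT for donors, AG for acceptors) can be spotted by comparing aligned paralogous sequences. The sketch below is an illustration only: the function name is hypothetical, the two sequences are assumed to be already aligned without gaps and to have equal length, and which paralog represents the older state is an assumption the caller must supply. It does not score the polyY tract or the wider consensus.

```python
def gained_dinucleotides(older, newer, motif="GT"):
    """Positions where `newer` carries the motif and `older` does not.

    `older` and `newer` are equal-length, gap-free aligned DNA strings;
    positions are 0-based offsets of the first base of the motif.
    """
    if len(older) != len(newer):
        raise ValueError("sequences must be aligned to the same length")
    older, newer = older.upper(), newer.upper()
    return [i for i in range(len(newer) - len(motif) + 1)
            if newer[i:i + len(motif)] == motif
            and older[i:i + len(motif)] != motif]
```

Calling it with `motif="AG"` reports candidate new acceptor dinucleotides in the same way; swapping the two arguments reports sites that were lost.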
28 Concatenates of constitutive and alternative regions in all genes: different evolutionary rates
- Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
- Less amino acid identity in alternative regions
- Columns (left-to-right): (1) constitutive regions; (2-4) alternative regions: N-end, internal, C-end
29 Individual genes: the ratio of non-synonymous to synonymous substitution rates (dN/dS) tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)
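To make the dN/dS comparison concrete, here is a simplified counting sketch in the spirit of the Nei-Gojobori method. It is not the procedure used for these slides, which is not specified. Assumptions: the two coding sequences are aligned in frame, the standard genetic code applies, codon pairs that differ at more than one position are skipped, changes to stop codons count as non-synonymous, and no multiple-hit correction is applied, so the function returns uncorrected proportions (pN, pS) rather than dN and dS.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (0 to 3)."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s


def pn_ps(seq1, seq2):
    """Return (pN, pS) for two in-frame aligned coding sequences."""
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for k in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[k:k + 3].upper(), seq2[k:k + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # gap or ambiguous base
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        syn_sites += s
        nonsyn_sites += 3.0 - s
        if sum(x != y for x, y in zip(c1, c2)) == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else None
    p_s = syn_diffs / syn_sites if syn_sites else None
    return p_n, p_s
```

Running this separately on the constitutive and the alternative concatenate of a gene gives the two values whose ratio pN/pS is compared per gene; a larger ratio in the alternative part corresponds to a point above the diagonal.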
30 [Figure: dN/dS (constitutive) and dN/dS (alternative) for complete genes, N-terminal regions, internal regions, and C-terminal regions]
32 Na/Ns (alternative) > Na/Ns (constitutive) for all evidence levels
34 Alternative splicing avoids disrupting domains (and non-domain units)
Control: fix the domain structure, randomly place alternative regions
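The control described here is a randomization test: keep the domain boundaries where they are, drop alternative regions of the observed lengths at random positions, and count how often a domain is disrupted. The sketch below is illustrative, with several assumptions that are not from the slides: all names are hypothetical, coordinates are half-open residue intervals, each region is no longer than the protein, and "disrupts" means the region overlaps a domain without covering it entirely.

```python
import random


def disrupts(region, domain):
    """True if the region overlaps the domain but does not contain all of it."""
    (r_start, r_end), (d_start, d_end) = region, domain
    overlaps = r_start < d_end and d_start < r_end
    contains = r_start <= d_start and d_end <= r_end
    return overlaps and not contains


def random_control(protein_length, domains, region_lengths, trials=1000, seed=0):
    """For each trial, place every region at random and count disruptions."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        n = 0
        for length in region_lengths:
            start = rng.randint(0, protein_length - length)
            if any(disrupts((start, start + length), d) for d in domains):
                n += 1
        counts.append(n)
    return counts


def empirical_p(observed, counts):
    """Share of random trials with as few disruptions as observed, or fewer."""
    return sum(c <= observed for c in counts) / len(counts)
```

A small `empirical_p` for the observed number of disrupting regions would indicate that real alternative regions avoid domain interiors more often than random placement does.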
35 ... and this is not simply a consequence of the (disputed) exon-domain correlation
36 Positive selection towards domain shuffling (not simply avoidance of disrupting domains)
37 Short (<50 aa) alternative splicing events within domains target protein functional sites
[Figure, panel c): expected vs. observed counts of FT positions affected and unaffected, and of Prosite patterns affected and unaffected]
38 An attempt at integration
- AS is often young (as opposed to degenerating)
- young AS isoforms are often minor and tissue-specific
  - but still functional
  - although unique isoforms may be the result of aberrant splicing
- AS often arises from duplication of exons
  - or point mutations creating splicing sites
  - or intron insertions
- AS regions show evidence for positive selection
  - excess non-synonymous and damaging SNPs
  - excess non-synonymous codon substitutions
- AS tends to shuffle exons and target functional sites in proteins
- Thus AS may serve as a testing ground for new functions without sacrificing old ones
39 Acknowledgements
- Discussions
  - Vsevolod Makeev (GosNIIGenetika)
  - Eugene Koonin (NCBI)
  - Igor Rogozin (NCBI)
  - Dmitry Petrov (Stanford)
  - Dmitry Frishman (GSF, TUM)
- Data
  - King Jordan (NCBI)
- Support
  - Ludwig Institute of Cancer Research
  - Howard Hughes Medical Institute
  - Russian Academy of Sciences (program "Molecular and Cellular Biology")
  - Russian Fund of Basic Research
40 Authors
- Andrei Mironov (Moscow State University): spliced alignment
- Ramil Nurtdinov (Moscow State University): human/mouse, data
- Irena Artamonova (GSF/MIPS): human/mouse, MAGE-A
- Dmitry Malko (GosNIIGenetika, Moscow): mosquito/drosophila
- Ekaterina Ermakova (Moscow State University): evolution of alternative/constitutive regions
- Vasily Ramensky (Institute of Molecular Biology, Moscow): SNPs
- Shamil Sunyaev (EMBL, now Harvard University Medical School): protein structure
- Eugenia Kriventseva (EBI, now EMBL): protein structure