Title: The gene family play and the chromosomal theater
1The gene family play and the chromosomal theater
- Todd Vision
- Department of Biology
- University of North Carolina at Chapel Hill
2Outline
- Large-scale duplication and loss of genes in the angiosperms
- Looking into the future of plant phylogenomics
- A case study in gene family demography
- Duplication and functional divergence
4Arabidopsis as a hub for plant comparative maps
Data from Arumuganathan and Earle (1991) Plant Mol. Biol. Rep. 9, 208-218
5Tomato-Arabidopsis synteny
Bancroft (2001) TIG 17, 89, after Ku et al. (2000) PNAS 97, 9121
6Duplicated genes in Arabidopsis
7Modes of gene duplication
- Tandem (T)
- unequal crossing-over
- mostly young
- Dispersed (D)
- transposition
- all ages
- Segmental (S)
- polyploidy
- all old
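The three modes above can be turned into a simple rule for labelling a duplicate pair. The following is a minimal sketch, not the procedure used in the talk: the `Gene` record, the `max_tandem_gap` cutoff, and the `block_pairs` set (pairs already known to sit in duplicated blocks) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str        # hypothetical identifier
    chromosome: int
    index: int       # rank order of the gene along its chromosome


def duplication_mode(a, b, block_pairs, max_tandem_gap=1):
    """Label a duplicate pair as 'T' (tandem), 'S' (segmental) or 'D' (dispersed).

    block_pairs is a set of frozensets of gene names that lie in
    duplicated blocks; max_tandem_gap is an assumed cutoff, in gene ranks.
    """
    # Tandem: neighbours on the same chromosome.
    if a.chromosome == b.chromosome and abs(a.index - b.index) <= max_tandem_gap:
        return "T"
    # Segmental: the pair is an anchor within a duplicated block.
    if frozenset((a.name, b.name)) in block_pairs:
        return "S"
    # Everything else is treated as dispersed.
    return "D"
```

For example, two adjacent genes on chromosome 1 come out as `"T"`, while a pair on different chromosomes is `"S"` only if it was supplied in `block_pairs`.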
8Paleotetraploidy?
The Arabidopsis Genome Initiative. 2000. Nature 408, 796
9Vision et al. (2000) Science 290, 2114-7.
10Microsynteny within blocks
11distribution of dA
in blocks
not in blocks
- Problems
- proteins diverge at different rates
- high dA is difficult to estimate
- Solution
- average dA within blocks
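The proposed solution, averaging dA within blocks, can be sketched as below. This is only an illustration under assumed inputs: each duplicated gene pair is given as a `(block_id, dA)` tuple, and the arithmetic mean is used, although the source does not say which average was taken.

```python
from collections import defaultdict
from statistics import mean


def mean_da_by_block(pairs):
    """Average dA over all duplicated gene pairs in each block.

    pairs: iterable of (block_id, dA) tuples (assumed input format).
    Returns a dict mapping block_id to the mean dA of its pairs.
    """
    by_block = defaultdict(list)
    for block_id, da in pairs:
        by_block[block_id].append(da)
    return {block_id: mean(values) for block_id, values in by_block.items()}
```

Averaging over many pairs in one block damps the rate variation among proteins, since all pairs in a block are assumed to share the same duplication date.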
12discrete duplication events
13the 2-4 complex (one ancestral segment broken up by 4 large inversions)
14coefficient of variation 0.67
coefficient of variation 0.53
15Rice-Arabidopsis microsynteny
Mayer et al. (2001) Genome Res. 11, 1167
16Blanc, Hokamp, Wolfe (2003) Genome Res. 13,
137-144.
18Block 37 after Asterid-Rosid split
Block 57 before monocot-dicot divergence
Raes, Vandepoele, Saeys, Simillion, Van de Peer
(2003) J. Struct. Func. Genomics 3, 117-129
19Divergence among duplicated genes in rice
Goff et al. (2002) Science 296, 92
20Hidden syntenies
Simillion, Vandepoele, Van Montagu, Zabeau, Van
de Peer (2002) PNAS 99, 13627
21Interspecies comparison can reveal hidden
syntenies
Vandepoele, Simillion, Van de Peer (2002) TIG 18,
606-608
22Comparative mapping in a phylogenetic context
23Major plant genome datasets
- Family          Species                         Datasets (genome, EST, map)
- Aizoaceae       Mesembryanthemum crystallinum   X
- Brassicaceae    Arabidopsis thaliana            X X X
-                 Brassica spp.                   X
- Fabaceae        Glycine max                     X X
-                 Medicago truncatula             X X
-                 Phaseolus spp.                  X
- Malvaceae       Gossypium arboreum              X X
- Solanaceae      Capsicum annuum                 X
-                 Lycopersicon esculentum         X X
-                 Solanum tuberosum               X X
- Poaceae         Hordeum vulgare                 X X
-                 Oryza sativa                    X X X
-                 Sorghum bicolor/propinquum      X X
-                 Triticum aestivum               X X
-                 Zea mays                        X X
- Other           Beta vulgaris                   X
-                 Chlamydomonas reinhardtii       X X
-                 Pinus taeda                     X X
24Plant unigene datasets
- species TIGR PlantGDB
- barley 49885 74621
- beet na 13565
- chlamydomonas 30296 na
- citrus na 4266
- coffee na 392
- cotton 24350 27854
- grape 49885 74621
- iceplant 8455 8945
- lettuce 21960 na
- lotus 11025 na
- maize 55063 71655
- marchantia na 1059
- medicago 36976 43384
- oat na 361
- onion 11726 na
- pine 26882 24668
- poplar na 20935
- potato 24275 24839
25Wikström et al (2001) Proc R Soc Lond B 268, 2211
26Plant phylogenomics: Phytome
- The goal is to integrate
- Organismal phylogeny
- Gene family
- sequence
- alignment
- phylogeny
- Genetic and physical maps
27Some uses for Phytome
- Starting with a chromosome segment
- Identify homologous segments
- Predict unobserved gene content (candidate QTL)
- Starting with a gene family
- Resolve orthology/paralogy relationships
- Identify coevolving families
- Starting with a species
- Explore lineage-specific diversification
- Guide comparative mapping wet-work
28Current pipeline
Homolog identification
Protein sequence prediction
Unigene collections
Protein family clustering
Annotations
Multiple sequence alignment
Phytome
Phylogenetic inference
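The protein family clustering step of the pipeline can be illustrated with single-linkage clustering over pairs of homologs. This is a sketch under assumptions: the source does not state which clustering method Phytome uses, and the function name and inputs here are hypothetical.

```python
def cluster_families(genes, homolog_pairs):
    """Group genes into families by single linkage (union-find).

    genes: iterable of gene identifiers.
    homolog_pairs: iterable of (gene_a, gene_b) tuples judged homologous;
    both members must also appear in genes.
    Returns a sorted list of families, each a sorted list of identifiers.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in homolog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for g in parent:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())
```

Single linkage is simple but can chain unrelated families together through one spurious hit, so a real pipeline would likely apply a stricter criterion.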
30Lineage specific diversification
Arabidopsis
1033
436
173
Cotton
334
836
696
Medicago
715
Tomato
919
Rice
152 genes are single copy in all four species
31A tale of two sisters: the ARF and the Aux/IAA gene families
- Modulate whole plant response to auxin
- Interact via dimerization
- ARFs are transcription factors
- Aux/IAAs bind and repress ARFs in the absence of
auxin
32The chromosomal context
33Diversification of ARFs
34Diversification of the Aux/IAAs
36Why the different patterns of diversification?
- 12 (ARF) vs 40 (Aux/IAA) segmental duplications
- Presumably reflects differential retention
- Possible explanations
- Dosage requirements
- Coevolution with other interacting genes
- Regional transcriptional regulation
37Divergence of duplicated genes
Divergence in expression profile
Age of duplication
38Duplicate pairs in yeast and human (Gu et al.
2002, Makova and Li 2003)
- Approx. 50% of pairs diverge very rapidly
- Proportion of divergent pairs increases with Ks and Ka
- Plateaus at Ka 0.3 in human
- In humans,
- Immune response genes over-represented among young, divergent pairs
- Distantly related pairs with conserved expression tend to be either ubiquitous or very tissue specific
39Retention of duplicated genes
- Nonfunctionalization, or loss of one copy
- The fate of most pairs
- Neofunctionalization (NF)
- Positive selection on a new mutation can maintain the pair
- Subfunctionalization (SF)
- Mutations that increase the specificity of duplicates can fix due to drift provided that, combined, the two copies provide the functionality of the ancestral gene. Once SF happens, both copies are indispensable and are retained.
- One prediction of the model is that SF is more likely for tandem than dispersed pairs (due to linkage)
40Digital expression profiling
- Massively Parallel Signature Sequencing (MPSS)
- Count occurrence of 17-20 bp mRNA signatures
- Cloning and sequencing are done on microbeads
- Similar to Serial Analysis of Gene Expression (SAGE)
- Bar-code counting reduces concerns of
- cross-hybridization
- probe affinity
- background hybridization
- Advantages
- Accurate counts of low expression genes
- Can distinguish expression profiles of duplicate
genes
41MPSS library construction
Brenner et al. (2000) PNAS 97, 1665-70.
GATC
42MPSS library construction
Brenner et al. (2000) PNAS 97, 1665-70.
Sort by FACS to remove empty beads
The result of the library construction is a set of microbeads. Each bead contains many DNA molecules, all derived from the 3' end of a single transcript. Beads are loaded in a monolayer on a microscope slide for the sequencing of 17-20 bp from the 5' end.
43MPSS Sequencing
Brenner et al. (2000) Nat. Biotech. 18, 630-4.
44MPSS Sequencing
Each bead provides a signature of 17-20 bp
- Tag #    Signature sequence   # of beads (frequency)
- 1        GATCAATCGGACTTGTC    2
- 2        GATCGTGCATCAGCAGT    53
- 3        GATCCGATACAGCTTTG    212
- 4        GATCTATGGGTATAGTC    349
- 5        GATCCATCGTTTGGTGC    417
- 6        GATCCCAGCAAGATAAC    561
- 7        GATCCTCCGTCTTCACA    672
- 8        GATCACTTCTCTCATTA    702
- 9        GATCTACCAGAACTCGG    814
- .        .                    .
- 30,285   GATCGGACCGATCGACT    2,935
Total # of tags: >1,000,000
Two sets of signatures are generated from each sample in different reading frames staggered by two bases
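The frequency table on this slide amounts to counting how many beads yield each signature. A minimal sketch follows, assuming signatures arrive as plain strings that begin with the GATC anchor shown on the slide; the function names and the parts-per-million normalization by library total are assumptions, not details taken from the MPSS protocol.

```python
from collections import Counter


def signature_counts(signatures, anchor="GATC"):
    """Count beads per signature, keeping only reads that start with the anchor."""
    return Counter(s for s in signatures if s.startswith(anchor))


def to_ppm(counts):
    """Convert raw bead counts to parts per million of the library total."""
    total = sum(counts.values())
    return {sig: 1_000_000 * n / total for sig, n in counts.items()}
```

Normalizing to ppm makes libraries of different sequencing depth comparable, which matters when the same signature is compared across tissues.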
45Classifying signatures
Typical signatures
46Core Arabidopsis MPSS libraries sequenced by Lynx for Blake Meyers, U. of Delaware
- Library   Signatures sequenced   Distinct signatures
- Root      3,645,414              48,102
- Shoot     2,885,229              53,396
- Flower    1,791,460              37,754
- Callus    1,963,474              40,903
- Silique   2,018,785              38,503
- TOTAL     12,304,362             133,377
47http://www.dbi.udel.edu/mpss
- Query by
- Sequence
- Arabidopsis gene identifier
- chromosomal position
- BAC clone ID
- MPSS signature
- Library comparison
- Site includes
- Library and tissue information
- FAQs and help pages
48Genome-wide MPSS profile in Arabidopsis
Of the 29,084 gene models, 17,849 match
unambiguous, expressed class 1 and/or 2 signatures
49 Dataset of duplicate pairs
- Gene families of size two in Arabidopsis classified as
- Dispersed (280)
- Segmental (149)
- Tandem (63)
- For each pair
- Measure similarity/distance in expression profile
- Estimate Ks and Ka
50Expression distance
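The transcript does not record how expression distance was defined. As one possible illustration, the sketch below uses one minus the Pearson correlation between the two copies' profiles across libraries; this choice of measure and the function name are assumptions, not the author's stated method.

```python
from math import sqrt


def expression_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles.

    x, y: equal-length sequences of expression levels, one value per library.
    Returns None when either profile is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return 1 - sxy / sqrt(sxx * syy)
```

Applied to the flower-specific pair AT5G07230 (0, 344, 0, 0, 0) and AT5G62080 (0, 288, 0, 0, 0) listed later in the talk, the distance is essentially zero. A correlation-based distance ignores differences in overall expression level, so a measure sensitive to magnitude would rank some pairs differently.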
51The number of genes with >5 ppm expression in a given number of libraries among the 984 genes in pairs analyzed and among all Arabidopsis genes with MPSS profiles.
- Libraries   Genes in pairs   All genes
- 0           153 (15.5%)      4160 (23.3%)
- 1           124 (12.6%)      2643 (14.8%)
- 2           73 (7.4%)        1727 (9.6%)
- 3           93 (9.5%)        1777 (10.0%)
- 4           109 (11.1%)      1930 (10.8%)
- 5           432 (43.9%)      5612 (31.4%)
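A table of this kind can be tallied from per-gene expression profiles as sketched below. The input format (a dict from gene identifier to a list of ppm values, one per library) and the function names are assumptions made for illustration; only the >5 ppm threshold and the five libraries come from the slides.

```python
from collections import Counter


def libraries_expressed(profile_ppm, threshold=5):
    """Number of libraries in which expression exceeds the threshold (in ppm)."""
    return sum(1 for value in profile_ppm if value > threshold)


def breadth_table(profiles, n_libraries=5, threshold=5):
    """Tabulate genes by the number of libraries in which they are expressed.

    profiles: dict mapping gene identifier to a list of ppm values.
    Returns {k: (gene count, percent of genes)} for k = 0..n_libraries.
    """
    counts = Counter(libraries_expressed(p, threshold) for p in profiles.values())
    total = len(profiles)
    return {
        k: (counts.get(k, 0), 100 * counts.get(k, 0) / total)
        for k in range(n_libraries + 1)
    }
```

Running `breadth_table` once on the genes in duplicate pairs and once on all genes with MPSS profiles would give the two columns being compared.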
52Asymmetry in levels of expression among libraries within pairs
- Symmetry of divergence
- Type of pair           A            B             C            D
- Young
- Dispersed (Ks ≤ 0.5)   14 (15.7%)   61 (68.5%)    8 (9.0%)     6 (6.7%)
- Tandem (Ks ≤ 0.5)      8 (14.3%)    29 (51.8%)    10 (17.9%)   9 (16.1%)
- Old
- Dispersed (Ks > 0.5)   35 (18.3%)   111 (58.1%)   24 (12.6%)   21 (11.0%)
- Segmental (all)        31 (20.8%)   104 (69.8%)   7 (4.7%)     7 (4.7%)
- A: Each copy has higher expression in at least one library
53dN 0.480.37? KA, p < 0.0001
55Pairs with small Ks but dissimilar expression profiles.
- Ks     Ka      dup   gene pair   callus   flower   leaf   root   silique
- 0.03   <0.01   D     AT1G80700   71       59       11     140    94
-                      AT1G80980   0        0        1      8      17
- 0.17   0.05    T     AT2G46280   246      210      160    308    80
-                      AT2G46290   28       29       1      29     16
- 0.20   0.06    T     AT2G15400   4        14       5      5      34
-                      AT2G15430   42       128      14     136    18
- 0.22   0.05    D     AT1G36280   1        3        9      13     10
-                      AT4G18440   40       87       69     69     51
- 0.26   0.05    T     AT1G71270   88       56       44     52     107
-                      AT1G71300   0        0        0      0      1
- 0.27   0.07    T     AT3G13290   20       22       1      1      6
-                      AT3G13300   246      245      72     192    77
56Pairs with large Ks but similar expression profiles.
- Ks     Ka     dup   gene pair   callus   flower   leaf   root   silique
- 0.87   0.28   T     AT3G16220   16       10       57     3      19
-                     AT3G16230   21       12       35     13     13
- 0.89   0.13   D     AT3G03660   14       0        0      0      0
-                     AT5G17810   71       0        0      0      0
- 0.95   0.29   D     AT2G41180   57       14       78     4      29
-                     AT3G56710   75       15       39     3      14
- 0.97   0.28   D     AT1G31814   2        39       4      3      0
-                     AT5G16320   0        55       10     19     8
- 0.98   0.23   D     AT5G07230   0        344      0      0      0
-                     AT5G62080   0        288      0      0      0
- 0.99   0.26   D     AT3G22160   86       6        10     4      4
-                     AT4G15120   34       2        0      0      0
57A closing thought
- 1965
- The Ecological Theater and the Evolutionary Play, G. E. Hutchinson
- 2004
- The Chromosomal Theater and the Gene Family Play
- Phylogenetics has a great deal to contribute to
understanding the evolutionary interplay of
genome structure and function
58Dan Brown, Brandon Gaut, Steven Tanksley, Liqing Zhang, Jason Phillips, Dihui Lu, David Remington, Jason Reed, Tom Guilfoyle, Blake Meyers, NSF