Title: The evolution of expression patterns in the Arabidopsis genome
1The evolution of expression patterns in the
Arabidopsis genome
- Todd Vision
- Department of Biology
- University of North Carolina at Chapel Hill
2Driving forces in genome evolution
- Proximate vs. ultimate explanations
- Deleterious mutations are frequent and selection
cannot effectively act on all of them
- Substitutions
- Insertions and deletions
- Duplications
- Transpositions
- Cellular processes will be affected by this rain
of mutations
- At the molecular level, we must entertain
ultimate explanations that do not invoke adaptation
3An example: Codon bias
- Genes differ in how often they use the
preferred codon for a given amino acid, thereby
affecting
- Translational efficiency
- Translational accuracy
- The strongest codon bias is typically seen in
short, highly expressed genes under strong
purifying selection
- Realized codon bias is a balance between
selection for preferred codons and a continual
rain of mutations toward unpreferred codons
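As a rough illustration of how codon bias might be quantified, the sketch below computes the fraction of codons in a coding sequence that belong to a set of preferred codons. The `PREFERRED` set and the function name are hypothetical placeholders, not a real Arabidopsis preferred-codon table, and the calculation is simplified: it counts all codons rather than only those for amino acids with synonymous alternatives.

```python
# Hypothetical preferred codons, for illustration only. A real analysis
# would derive this set from highly expressed genes of the species studied.
PREFERRED = {"CTC", "GCT", "AAG"}


def preferred_codon_fraction(cds, preferred=PREFERRED):
    """Return the fraction of complete codons in `cds` found in `preferred`."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(codon in preferred for codon in codons) / len(codons)


# Three of the four codons in this toy sequence are in the preferred set.
example_fraction = preferred_codon_fraction("CTCGCTAAGTTT")
```

Under the balance described above, a gene under weaker selection would be expected to show a lower fraction than a short, highly expressed gene.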
4What are the consequences of mutational rain on
the regulatory networks that modulate gene
expression?
5Outline
- Arabidopsis gene expression (MPSS)
- Two evolutionary issues in the evolution of
expression profiles
- Physical clustering of co-expressed genes
- Divergence of duplicated genes
6Digital expression profiling
- Bar-code counting raises fewer concerns about
cross-hybridization, probe selection, background
hybridization, etc.
- Serial Analysis of Gene Expression (SAGE)
- Count occurrence of 10-12 bp mRNA signatures
- Long SAGE: 21-22 bp signatures
- Uses conventional sequencing technology
- Massively Parallel Signature Sequencing (MPSS)
- Count occurrence of 17-20 bp mRNA signatures
- Cloning and sequencing is done on microbeads
- Commercialized by Lynx Therapeutics
7MPSS library construction
Brenner et al., PNAS 97:1665-70.
[Figure: library construction schematic, with the GATC site labeled]
8MPSS library construction
[Figure: library construction schematic showing poly-A-tailed (AAAAAAA) transcripts]
Brenner et al., PNAS 97:1665-70.
Sort by FACS to remove empty beads
The result of the library construction is a set
of microbeads. Each bead contains many DNA
molecules, all derived from the 3' end of a
single transcript. Beads are loaded in a
monolayer on a microscope slide for the
sequencing of 17-20 bp from the 5' end.
9MPSS Sequencing
Brenner et al., Nat. Biotech. 18:630-4.
10 MPSS Sequencing
Each bead provides a signature of 17-20 bp
Tag #    Signature sequence    # of beads (frequency)
1        GATCAATCGGACTTGTC     2
2        GATCGTGCATCAGCAGT     53
3        GATCCGATACAGCTTTG     212
4        GATCTATGGGTATAGTC     349
5        GATCCATCGTTTGGTGC     417
6        GATCCCAGCAAGATAAC     561
7        GATCCTCCGTCTTCACA     672
8        GATCACTTCTCTCATTA     702
9        GATCTACCAGAACTCGG     814
. .      . .                   . .
30,285   GATCGGACCGATCGACT     2,935
Total # of tags: > 1,000,000
Two sets of signatures are generated from each
sample in different reading frames staggered by
two bases
11A catalog of signatures in the Arabidopsis genome
Hits     At genome   % of total   Random
1        748204      87.407       845057
2        88392       10.326       6134
3        11019       1.287        21
4        3512        0.410        0
5        1452        0.170        0
6        874         0.102        0
7        470         0.055        0
8        326         0.038        0
9        237         0.028        0
10       192         0.022        0
11       158         0.018        0
12-20    707         0.083        0
21-30    247         0.029        0
31-50    124         0.014        0
> 50     86          0.010        0
Total    851,212                  851,212
All potential signatures (GATC + 13 bp) are identified on both strands of the genomic sequence.
There is one potential signature approx. every 293 bp on each strand of the genome.
A signature is classified according to its position relative to the 29,084 genes + pseudogenes in the TIGR annotation.
Signatures may not be unique. The number of hits in the genome is recorded.
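The cataloguing step described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names and the toy sequence are assumptions, and a real genome would be read from a sequence file rather than a string literal. Each GATC site on either strand starts a 17 bp window (GATC plus 13 bp), and the number of occurrences of each window is its hit count.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def potential_signatures(genome, site="GATC", length=17):
    """Count every `length`-bp window that starts at `site`, on both strands."""
    counts = Counter()
    genome = genome.upper()
    for strand in (genome, reverse_complement(genome)):
        start = strand.find(site)
        while start != -1:
            signature = strand[start:start + length]
            if len(signature) == length:  # skip windows cut off by the sequence end
                counts[signature] += 1
            start = strand.find(site, start + 1)
    return counts


# Toy sequence (hypothetical) containing one complete signature.
toy_genome = "AAGATCAATCGGACTTGTCAA"
hits = potential_signatures(toy_genome)

# Tally how many signatures occur once, twice, and so on,
# which corresponds to the "Hits" column of the catalog.
hit_classes = Counter(hits.values())
```

Because GATC is its own reverse complement, each site yields one candidate window per strand, extending in opposite directions.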
12Classifying signatures
Typical signatures
13Arabidopsis signatures
Based on TIGR annotation (release 3.0, July 2002)
Class   Description          # in genome   % of total
1       sense exonic         203,174       24.0
2       3' UTR, < 500 bp     44,202        5.2
3       anti-sense exonic    197,065       23.3
4       inter-genic          288,109       34.0
5       intronic             60,817        7.2
6       anti-sense intronic  57,845        6.8
TOTAL                        851,212       100.5
355 genes lack potential Class 1 or 2 signatures (undetectable).
On average, there are 8.5 class 1 and 2 signatures per gene.
8,422 genomic signatures have secondary classes due to overlap or near
overlap of two genes in the TIGR annotation.
14Core Arabidopsis MPSS libraries sequenced by Lynx
for Blake Meyers, U. of Delaware
Library   Signatures sequenced   Distinct signatures
Root      3,645,414              48,102
Shoot     2,885,229              53,396
Flower    1,791,460              37,754
Callus    1,963,474              40,903
Silique   2,018,785              38,503
TOTAL     12,304,362             133,377
15Catalog of expressed signatures
Class         Position              Count
1 or 2        Exon or 3' UTR        25,568
3 through 6   Elsewhere in genome   14,424
0             No match in genome!   10,871
Counting only signatures with an abundance threshold of 4 PPM
in at least one library. Totals are for 7
libraries (the core libraries plus a new root and a new flower library).
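The abundance filter can be sketched as below. This is an illustrative reading, assuming that PPM means signature count per million signatures sequenced in a library and that the cutoff is inclusive (>= 4 PPM); the function names, library names, and toy counts are all hypothetical.

```python
def to_ppm(counts):
    """Convert raw signature counts for one library to parts per million."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sig: 1_000_000 * n / total for sig, n in counts.items()}


def expressed_signatures(libraries, threshold_ppm=4.0):
    """Signatures reaching `threshold_ppm` (assumed inclusive) in any library."""
    expressed = set()
    for counts in libraries.values():
        for sig, ppm in to_ppm(counts).items():
            if ppm >= threshold_ppm:
                expressed.add(sig)
    return expressed


# Toy data: two libraries of one million signatures each.
toy_libraries = {
    "root": {"GATCAATCGGACTTGTC": 999_990, "GATCGTGCATCAGCAGT": 10},
    "shoot": {
        "GATCAATCGGACTTGTC": 500_000,
        "GATCGTGCATCAGCAGT": 499_999,
        "GATCCGATACAGCTTTG": 1,
    },
}
kept = expressed_signatures(toy_libraries)
```

Normalizing within each library before applying the cutoff keeps libraries of different sequencing depth comparable.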
16Genome-wide expression profiling Arabidopsis
Of the 29,084 gene models, 14,674 match unique,
expressed signatures
17http://www.dbi.udel.edu/mpss
- Query by
- Sequence
- Arabidopsis gene identifier
- chromosomal position
- BAC clone ID
- MPSS signature
- Library comparison
- Site includes
- Library and tissue information
- FAQs and help pages
18Outline
- Arabidopsis gene expression (MPSS)
- Two evolutionary issues in the evolution of
expression profiles
- Physical clustering of co-expressed genes
- Divergence of duplicated genes
19Physical clustering of co-expression
- Caenorhabditis elegans: Roy et al. (2002) Nature 418, 975; Lercher et al. (2003) Genome Research 13, 238
- Drosophila melanogaster: Boutanaev et al. (2002) Nature 420, 666; Spellman and Rubin (2002) J Biology 1, 5
- Homo sapiens: Caron et al. (2001) Science 291, 1289; Lercher et al. (2002) Nature Genetics 31, 180
- Saccharomyces cerevisiae: Cohen et al. (2000) Nature Genetics 26, 183; Hurst et al. (2002) Trends in Genetics 18, 604; Mannila et al. (2002) Bioinformatics 18, 482
- What are the proximate explanations?
- shared cis-regulatory elements
- chromatin packaging, etc.
- What are the ultimate explanations?
- Adaptive: greater transcriptional efficiency/accuracy?
- Maladaptive: mutational rain chipping away at
insulators and other mechanisms that override
regional controllers of gene expression?
20Measuring expression distance
21Clustering of tissue-specific expression
Chromosome 1
Flower (red), Silique (violet), Leaf (green), Root (blue), Callus (white)
22Statistical tests of coexpression clustering
- Measured median pairwise expression distance
(MPED) in non-overlapping windows of 20 genes
- Summed unique class 1 and 2 signatures for each
gene
- Only one gene within each tandemly arrayed family
was counted
- Out of 100 shuffles of gene order
- Zero shuffles had as many windows with small MPED
(less than 1.5) as the unshuffled data
- Zero shuffles had as large a variance in MPED
among windows as the unshuffled data
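The shuffle test above can be sketched as follows. The slides do not define the expression distance in text, so this sketch assumes a Euclidean distance between per-gene expression vectors (one value per library); the function names and default arguments are likewise assumptions, and the input is taken to be a list of profiles in chromosomal order with tandem families already reduced to one gene.

```python
import random
from itertools import combinations
from math import dist
from statistics import median, pvariance


def window_mpeds(profiles, window=20):
    """Median pairwise expression distance in each non-overlapping window."""
    mpeds = []
    for start in range(0, len(profiles) - window + 1, window):
        genes = profiles[start:start + window]
        # Euclidean distance is an assumed stand-in for the expression distance.
        mpeds.append(median(dist(a, b) for a, b in combinations(genes, 2)))
    return mpeds


def shuffle_test(profiles, window=20, cutoff=1.5, shuffles=100, seed=0):
    """Count shuffles matching or exceeding the observed clustering statistics."""
    rng = random.Random(seed)
    observed = window_mpeds(profiles, window)
    observed_small = sum(m < cutoff for m in observed)
    observed_var = pvariance(observed)

    as_many_small = 0
    as_large_var = 0
    shuffled = list(profiles)
    for _ in range(shuffles):
        rng.shuffle(shuffled)  # randomize gene order along the chromosome
        mpeds = window_mpeds(shuffled, window)
        if sum(m < cutoff for m in mpeds) >= observed_small:
            as_many_small += 1
        if pvariance(mpeds) >= observed_var:
            as_large_var += 1
    return as_many_small, as_large_var
```

A result of `(0, 0)` from `shuffle_test` would correspond to the outcome reported on the slide: no shuffled gene order reproduced either the number of low-MPED windows or the among-window variance of the real data.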
23Coexpression in Arabidopsis
24Coexpression in Arabidopsis
25Coexpression in Arabidopsis
26Selection and recombination
- In regions of low recombination
- deleterious mutations can hitch-hike to high
frequency along with favorable ones
- favorable mutations are kept at low frequency by
linkage to deleterious ones
- Therefore, the effectiveness of natural selection
is causally related to recombination rate
- Are clusters more concentrated in regions of
- high recombination (i.e. are they adaptive)?
- low recombination (i.e. are they maladaptive)?
27Measuring recombination rate
Chromosome 1
28Co-expression is greater in low recombination
regions
29Co-expression clusters
- MPSS data provide evidence for clusters of
co-expression among non-related genes in
Arabidopsis
- Co-expression is greater in regions of low
recombination
- Thus, co-expression clusters may be maladaptive,
at least on average
30Outline
- Arabidopsis gene expression (MPSS)
- Two evolutionary issues in the evolution of
expression profiles
- Physical clustering of co-expressed genes
- Divergence of duplicated genes
31Divergence of duplicated genes
Expression distance
Age of duplication
32Duplicated genes in Arabidopsis
33Modes of gene duplication
- Tandem (unequal crossing-over)
- Dispersed (transposition)
- Segmental (polyploidy)
34Divergence of duplicated genes
- All gene families of size 2 in Arabidopsis were
classified as dispersed, segmental or
tandem
- Expression distance was calculated for each pair
- The number of silent (i.e. synonymous)
substitutions per site was calculated for each pair
(as a proxy for age since duplication)
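The slides do not say which estimator of silent substitutions per site was used. As a minimal sketch of why a correction is needed at all, the example below applies the Jukes-Cantor formula to an observed proportion of differing synonymous sites; this is one simple, commonly used correction and is swapped in here purely for illustration. The function name is an assumption.

```python
from math import log


def jukes_cantor_distance(p):
    """Estimate substitutions per site from the observed proportion `p`
    of differing sites, correcting for multiple hits at the same site."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * log(1 - 4 * p / 3)


# An observed 10% difference corresponds to slightly more than
# 0.10 substitutions per site once hidden changes are accounted for.
example_distance = jukes_cantor_distance(0.10)
```

The corrected value grows faster than the raw proportion as pairs get older, which is why older duplicates are harder to date precisely from silent sites.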
35Divergence and mode of duplication
36Divergence of duplicated genes
- Almost all expression divergence occurs during
(or immediately following) duplication
- Initial expression divergence is more extreme for
tandem than dispersed duplicates
- Tandem and dispersed duplicates with the most
divergent expression profiles are quickly lost
- Segmental duplicates plateau at a lower level of
expression divergence than dispersed duplicates
- The average divergence in relative expression
level in each tissue is about 8-fold.
37Lessons learned
- Clusters of co-expression in Arabidopsis may be
largely the result of a rain of weakly
deleterious mutations that homogenize the
expression profiles of neighboring genes
- Divergence in expression profile between
duplicated genes is dependent on the nature of
the mutation that gave rise to the duplication
38Thanks!
- UNC Chapel Hill
- Jianhua Hu
- University of Delaware
- Blake Meyers
- NSF Plant Genome Research Program
- DBI-01103267 (TJV)
- DBI-0110528 (BCM)