Provided by: wenhsi

1
Expressional and Functional Divergences
between Duplicate Genes
Wen-Hsiung Li University of Chicago
2
Why study duplicate genes?
  • Gene duplication is the primary source of genetic
    novelties (Ohno 1970).
  • Genomic (including functional and proteomic) data
    provide excellent materials for studying the
    mode, tempo, and mechanisms of the evolution of
    duplicate genes.

3
Origin of trichromacy: Traditional View
4
Evolutionary Fates of Duplicate Genes (1)
In the vast majority of cases the extra copy
resulting from a gene duplication will become
nonfunctional (a pseudogene) because deleterious
mutations occur much more often than advantageous
mutations.
5
Evolutionary Fates of Duplicate Genes (2)
One way for both copies to be retained in the
genome is to diverge in function. The first
step for divergence in function is commonly
believed to be divergence in expression.
6
Divergence in Expression between Duplicate Genes
at the Genomic Level
Trends in Genetics, 2002
Zhenglong Gu, Dan Nicolae, Henry Lu and
Wen-Hsiung Li
7
Markert, Clement L. (1964): Isozymes
  • Enzymes from duplicate genes.
  • Differences in expression among tissues.
  • Protein electrophoresis.
8
S. Ohno (1970) proposed:
Expression divergence
  • A major mechanism for retaining duplicate genes in a genome.
  • The first step in functional divergence.
But how often and how
fast do duplicate genes diverge in expression?
9
Past studies: a limited number of gene
families, providing no general picture of the
tempo of expression divergence between duplicate
genes in a genome.
Microarray gene expression technology and complete
genome sequencing: a general picture is now possible.
The Yeast Genome
10
Similarity between expression patterns of two
genes
R: the correlation coefficient of the
expression levels of the two genes over different
time points of an experiment (a physiological
process)
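As a minimal sketch of how R could be computed for one gene pair in one experiment, assuming R is the ordinary Pearson correlation coefficient (the slide does not name the estimator), and using made-up expression values:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression levels of two duplicate genes at five time points
# (illustrative numbers only, not data from the study).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.1, 9.8]

r_similar = pearson_r(gene_a, gene_b)                   # close to +1
r_opposite = pearson_r(gene_a, list(reversed(gene_a)))  # exactly opposite trends
```

A value of R near 1 indicates similar expression patterns over the time course; a negative R indicates opposite trends.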
11
Wagner, 2000
12
Wagner (2000) PNAS: Protein sequence
divergence and expression divergence are decoupled.
This does not imply that expression divergence
and evolutionary time are decoupled, because
protein distance may not be a good proxy of
divergence time.
13
Although a protein may evolve at an
approximately constant rate over time, the rate
of amino acid substitution varies tremendously
among proteins, so that a single distance cannot
be applied to date the divergence times of
different protein (or gene) pairs.
14
In comparison, the rate of synonymous
substitution is more uniform among genes and so
synonymous distance (KS) would be a better proxy
of divergence time. We therefore rely more on KS
than on protein distance or KA (non-synonymous
distance).
15
Detection of Duplicate Genes
Gu et al., MBE 2001: Two proteins belong to the
same family (1) if their similarity (including
gaps) is > 30%, and (2) if the total length of
the alignable regions is > 80% of the longer
protein.
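The two criteria can be written as a single predicate. This is only a sketch: the function name and arguments are hypothetical, the thresholds are read as 30% similarity and 80% coverage, and the alignment itself is assumed to have been computed elsewhere.

```python
def same_family(similarity, alignable_len, len_a, len_b,
                min_similarity=0.30, min_coverage=0.80):
    """Apply the two family criteria to one aligned protein pair.

    similarity    -- fraction of similar positions, gaps included (0..1)
    alignable_len -- total length of the alignable regions, in residues
    len_a, len_b  -- lengths of the two proteins, in residues
    """
    longer = max(len_a, len_b)
    return similarity > min_similarity and alignable_len > min_coverage * longer

# Hypothetical pairs (illustrative numbers only).
ok_pair = same_family(0.45, 350, 400, 380)          # 350 > 0.8 * 400
short_alignment = same_family(0.45, 300, 400, 380)  # 300 < 0.8 * 400
low_similarity = same_family(0.25, 390, 400, 380)   # similarity too low
```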
16
Selection of Duplicate Genes (1)
To avoid using correlated data points, we select
independent pairs of duplicate genes in the yeast
genome. For each gene family our selection
proceeds with increasing KS, because gene pairs
with a small KS are fewer than those with a large
KS and can more accurately reflect the time
course of expression divergence.
17
Selection of Duplicate Genes (2)
We require that neither duplicate gene shows
strong codon usage bias, which can retard the
increase of KS and so make KS a poor proxy of
divergence time.
18
Linear regression analysis
Since R is bounded by -1 and 1, the
transformation $\ln\left(\frac{1+R}{1-R}\right)$ was used.
Normal linear regression was then carried out
between KS (KA) and the transformed R.
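A small sketch of this step, assuming ordinary least squares for the regression and using invented (KS, R) values; the helper names are hypothetical:

```python
import math

def transform_r(r):
    """Map R from (-1, 1) onto the real line: ln((1 + R) / (1 - R))."""
    return math.log((1.0 + r) / (1.0 - r))

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical (KS, R) values for four duplicate pairs; not data from the study.
ks = [0.1, 0.4, 0.8, 1.2]
r = [0.9, 0.6, 0.2, -0.1]

slope, intercept = linear_fit(ks, [transform_r(v) for v in r])
```

A negative slope means that the transformed expression similarity falls as KS grows. The transformation is undefined at R = 1 or R = -1, so such values would need separate handling.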
19
Data: cDNA microarray expression data, 208 points
20
R = -0.47, P = 3.19e-5
21
R = 0.02, P = 0.78
R = -0.52, P = 5.45e-12
22
Data: Affymetrix data, 79 points
23
R = -0.42, P = 0.005
24
R = -0.42, P = 3.17e-6
R = -0.07, P = 0.37
25
Conclusion
A significant negative correlation (-0.47,
P < 2 × 10^-5) between R and KS. So,
expression divergence increases with KS and
evolutionary time. Expression divergence and KA
are initially coupled to some extent.
26
In the above analysis all experiments were
considered together, that is, the correlation
coefficient R was calculated over all data
points. This pooling of data may obscure the
relationship between expression divergence and
sequence divergence because a pair of duplicate
genes may be involved in only some but not all of
the physiological processes tested.
27
Note that if a gene pair is not involved in a
process, it is unlikely to evolve expression
divergence in that process. We now consider R
separately for each of the 14 independent tests
that we can obtain from current data.
28
Definition of divergent expression
Two duplicate genes are said to have
diverged in expression if n or more negative Rs
are observed in the 14 processes used. We
considered n = 1 and n = 2.
29
A sliding window analysis was used when the 14
processes used were treated separately. For the
gene pairs within the KS (0.25) or KA (0.05)
window of each studied duplicate gene pair, the
proportion of gene pairs with divergent
expression is calculated.
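The definition of divergent expression and the sliding-window proportion can be sketched as follows. The slides do not say whether 0.25 is the total width or the half-width of the window; this sketch assumes a window of total width 0.25 centred on each pair, and the data are invented.

```python
def is_divergent(rs, n=1):
    """True if at least n of the per-process correlation coefficients are negative."""
    return sum(1 for r in rs if r < 0) >= n

def sliding_window_proportion(pairs, width=0.25, n=1):
    """For each pair, the share of divergent pairs whose distance lies in a
    window of the given total width centred on that pair's distance."""
    out = []
    for d0, _ in pairs:
        in_window = [rs for d, rs in pairs if abs(d - d0) <= width / 2]
        share = sum(is_divergent(rs, n) for rs in in_window) / len(in_window)
        out.append((d0, share))
    return out

# Hypothetical pairs: (KS, per-process R values); three processes for brevity
# instead of the 14 used in the study.
pairs = [
    (0.05, [0.9, 0.8, 0.7]),
    (0.10, [0.6, -0.2, 0.5]),
    (0.60, [-0.3, -0.1, 0.4]),
]
proportions = sliding_window_proportion(pairs)
```

For a KA-based analysis the same function would be called with `width=0.05` and KA values in place of KS.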
30
Figure 2 (panels a and b)
31
Figure 2a: Over 60% of the pairs studied show
divergent expression even when KS is 0.10.
The proportion with divergent expression increases
with KS and becomes almost 1 when KS increases to
1. Even if we define divergent expression as
having 2 or more negative Rs in the 14 tests, still
over 50% of the duplicate pairs meet this definition
when KS is 0.10.
32
Clearly, expression divergence has occurred
rather quickly in many of the gene pairs studied.
This is also seen in Fig. 2b, where the
proportion of pairs with diverged expression
increases rapidly with KA and reaches a plateau
when KA is 0.15.
33
Expression divergence: Two duplicate genes
have diverged in expression if the correlation
coefficient (R) of their expression levels over
time points is 0.5 or smaller.
34
Data: cDNA microarray expression data, 208 points
35
Test procedure: We consider 9 processes. For
each process we compute the correlation
coefficient (R) of the expression levels over
time points. Consider the two smallest Rs. We
require that the probability of observing the two
smallest Rs among the 9 processes is < 0.05.
36
For each of the 9 processes with 8 or more data
points available, the correlation coefficient of
gene expression between duplicate genes was
calculated.
37
Non-parametric bootstrapping: good for a single
process (experiment), but difficult for more than
one process. We therefore use parametric bootstrapping.
38
For each process, bootstrap a sample of n
pseudo-data points Z = {z_i, i = 1, ..., n} from a
bivariate normal distribution with means and
covariance matrix
Compute R, the correlation coefficient from the
bootstrap sample Z
39
Repeating the pseudosampling procedure B times,
we observe R_1, ..., R_B. The empirical
distribution of R_1, ..., R_B is used to
approximate the distribution of R. In
particular,
$$P(R \le c) \approx \frac{1}{B} \sum_{b=1}^{B} I(R_b \le c),$$
where I(·) is an indicator function whose value is 1 when
the event is true and 0 otherwise.
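The parametric bootstrap for a single process can be sketched as below. The transcript does not preserve the means and covariance matrix of the bivariate normal, so this sketch assumes they are the sample estimates from the observed data; the function names, the default B, and the example data are all hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_prob_r_below(x, y, c, B=2000, seed=0):
    """Estimate P(R <= c) as the fraction of B bootstrap samples with R_b <= c.

    Assumption: the bivariate normal uses the sample means, sample standard
    deviations, and sample correlation of the observed data.
    """
    rng = random.Random(seed)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    rho = pearson_r(x, y)
    tail = math.sqrt(max(0.0, 1.0 - rho ** 2))
    hits = 0
    for _ in range(B):
        bx, by = [], []
        for _ in range(n):
            # Two independent standard normals combined so that the pair
            # (bx, by) has correlation rho.
            u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            bx.append(mx + sx * u)
            by.append(my + sy * (rho * u + tail * v))
        if pearson_r(bx, by) <= c:  # indicator I(R_b <= c)
            hits += 1
    return hits / B

# Hypothetical expression levels over 8 time points (not data from the study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.2]
p_nonpositive = bootstrap_prob_r_below(x, y, c=0.0, B=500)
```

For a strongly co-expressed pair such as this one, the estimated probability of observing a non-positive R is very small.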
40
Suppose that m processes are studied and there
are n_j pairs of observations for process j, j =
1, ..., m. From the above approximation, we can
evaluate the probability of
41
Then, we can find the probability that the
two smallest Rs are smaller than c1 and c2,
respectively, with c2 < c1
42
(No Transcript)
43
(No Transcript)
44
Conclusions
1. Expression divergence between
duplicate genes is significantly correlated with
their synonymous divergence (KS).
2. Expression
divergence and KA are initially coupled.
45
3. A large proportion of duplicate genes have
diverged quickly in expression and the vast
majority of gene pairs eventually become
divergent in expression.
46
Divergence in the Spatial Pattern of Gene
Expression between Human Duplicate Genes
Genome Research, 2003
Kateryna Makova and Wen-Hsiung Li
47
Expression Data
  • The expression data for 25 human tissues were
    retrieved from Su et al. (2002, PNAS). Expression
    values were averaged among replicates.

48
Advantages of human data over yeast data
1. Affymetrix oligonucleotide array data instead
of cDNA array data: a lower chance of
cross-hybridization
2. Multiple tissues: spatial expression
divergence vs. temporal divergence
3. Better definition of divergence
4. A larger data set (1230 pairs of duplicate
genes vs. 400 pairs)
49
Definition of Expression of a gene in a tissue
  • Expressed in a tissue: if the average difference
    (AD) is > 200; this corresponds to 3 to 5 copies
    of mRNA per cell.
  • Not expressed: if AD < 100.
  • Marginally expressed: if 100 < AD < 200.

50
Definition of Expression Divergence in a tissue
  • Two duplicate genes are said to have
    diverged in gene expression in a tissue, if one
    is expressed in the tissue while the other is
    not.

51
Definition is Conservative
  • It neglects the case where both genes are
    expressed in the same tissue but at different
    levels and the case where one is expressed (or
    not expressed) while the other is marginally
    expressed.

52
Definition of Expression Divergence
  • Two duplicate genes are said to have
    diverged in gene expression if they show
    diverged expression
  • (1) in at least one tissue, or
  • (2) in at least two tissues
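The tissue-level definitions above can be sketched in a few lines. The handling of AD values exactly equal to 100 or 200 is not stated on the slides; this sketch treats them as marginal, which is an assumption, and the AD values are invented.

```python
def expression_state(ad):
    """Classify one gene in one tissue from its average difference (AD)."""
    if ad > 200:
        return "expressed"
    if ad < 100:
        return "not expressed"
    return "marginal"  # assumption: AD of exactly 100 or 200 counts as marginal

def diverged_tissues(ad_gene1, ad_gene2):
    """Count tissues where one copy is expressed and the other is not expressed.

    Marginal calls never count as divergence, which is what makes the
    definition conservative.
    """
    count = 0
    for a, b in zip(ad_gene1, ad_gene2):
        if {expression_state(a), expression_state(b)} == {"expressed", "not expressed"}:
            count += 1
    return count

# Hypothetical AD values for a duplicate pair in four tissues.
gene1 = [350, 50, 150, 400]
gene2 = [60, 300, 250, 500]

n_diverged = diverged_tissues(gene1, gene2)
diverged_def1 = n_diverged >= 1  # definition (1): at least one tissue
diverged_def2 = n_diverged >= 2  # definition (2): at least two tissues
```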

53
Proportion of gene pairs with diverged expression
vs. Synonymous divergence
54
Rapid divergence
  • 73% of the gene pairs with an average Ks of only
    0.06 have already diverged in at least one of the 25
    tissues studied, and 57% of these gene pairs have
    diverged in expression in at least two tissues.
  • These percentages increase to 90% and 73%,
    respectively, when Ks is 1.2.

55
Proportion of gene pairs with diverged expression
vs. Nonsynonymous divergence
56
Rapid divergence
  • For Ka = 0.04, 78% of the gene pairs have
    diverged in expression in at least one tissue and
    60% have diverged in at least two tissues.
  • For Ka = 0.21, 98% of the gene pairs have
    diverged in expression in at least one tissue and
    88% have diverged in at least two tissues.

57
KS and the correlation coefficient of gene
expression (both genes are expressed in at least
five tissues)
58
KA and the correlation coefficient of gene
expression (both genes are expressed in at least
five tissues)
59
Conclusions
  • Human duplicate genes diverge rapidly in
    expression among tissues.
  • The results support the conclusion in yeast. In
    fact, in terms of generation time human duplicate
    genes seem to diverge in expression faster than
    yeast duplicate genes.

60
Gene Pairs with Rapid Divergence
  • Ks < 0.3
  • Diverged in expression in at least 50% of the
    tissues studied.
  • Or R < 0.5.
  • The genes in the two groups largely overlap.

61
Functions of Gene Pairs with Rapid Divergence
  • Enzymes: Oxidoreductases, hydrolases,
    transferases, and an isomerase
  • Immune system: Lymphocyte antigens, cytokine
    gro-beta, MHC proteins, and immunoglobulins.
  • Transcription factors
  • Structural proteins: Amelogenin, keratin,
    skeletal muscle protein, etc.

62
Functions of Gene Pairs with Rapid Divergence
  • A significantly higher proportion of immune
    response genes among gene pairs with rapid
    expression divergence than among other
    gene pairs in our study: P < 0.009 for gene pairs
    with KS < 0.5 and diverged expression in at least
    50% of studied tissues; P < 0.001 for gene pairs
    with KS < 0.5 and R < 0.5.

63
Role of Duplicate Genes in Genetic Robustness
against Loss-of-Function Mutations
Nature, January 2003
Z. Gu and Wen-Hsiung Li, Ecology &
Evolution, University of Chicago
Lars Steinmetz and Ron Davis, Stanford University
64
How does an organism compensate for null
mutations?
1. Duplicate genes: Deletion of a gene is
compensated by another member of the same gene
family.
2. Stability of genetic
networks: Alternative metabolic pathways or
regulatory gene networks (unrelated
genes).
Which is more important?
65
Data we used
Gene deletion and parallel analysis of 6,000
genes in the yeast genome
1. Delete one gene.
2. Measure the growth rate (f_i) of the
mutant relative to a reference population (the
growth rate of the pooled mutants) in 5
different media conditions.
66
Data
Singletons: 1,275 genes
  • No hit to any other gene in a FASTA search
    with an E value of 0.1.
  • Selected genes that had been studied.
Duplicates: 1,147 genes
  • As defined in Gu et al. (2002).
  • Real genes only, to avoid pseudogenes.
67
Classification of fitness effects

  • Weak or no effect: fmin > 0.95
  • Moderate effect: 0.8 < fmin < 0.95
  • Strong effect: 0 < fmin < 0.8
  • Lethal: fmin = 0
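A short sketch of this classification. Two points are assumptions, since the slides do not state them: fmin is taken as the smallest relative growth rate of a deletion mutant across the media conditions tested, and values exactly at 0.8 or 0.95 are assigned to the milder class.

```python
def fitness_class(growth_rates):
    """Classify a deletion mutant from its relative growth rates.

    Assumptions: fmin is the minimum over the media conditions, and the
    boundary values 0.8 and 0.95 fall into the milder class.
    """
    f_min = min(growth_rates)
    if f_min == 0:
        return "lethal"
    if f_min < 0.8:
        return "strong effect"
    if f_min < 0.95:
        return "moderate effect"
    return "weak or no effect"

# Hypothetical relative growth rates in five media conditions.
examples = {
    "mutant_a": [1.00, 0.97, 0.99, 0.96, 1.02],
    "mutant_b": [0.90, 1.00, 0.98, 0.97, 0.99],
    "mutant_c": [0.50, 0.85, 0.90, 0.95, 1.00],
    "mutant_d": [0.00, 1.00, 1.00, 1.00, 1.00],
}
classes = {name: fitness_class(f) for name, f in examples.items()}
```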
68
Discrete distributions of fitness effect for
duplicate genes and singletons
69
Cumulative distributions of fitness effect for
duplicate genes and singletons under the YPD
growth condition
70
Conclusion 1 Singleton and duplicate genes
differ significantly in the distribution of
growth rate effects of gene deletion
71
Hypothesis
Duplicate genes have more similar fitness effects
than singletons
72
Dij: difference in fitness effect between genes
i and j. Compare the mean Dij for duplicate
genes with the distribution of Dij for 100,000
randomly selected sets of singleton pairs.
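One way to sketch this comparison is a simple resampling test. Several details here are assumptions not given on the slide: Dij is taken as the absolute difference in fitness effect, each random set contains as many singleton pairs as there are duplicate pairs, and the fitness values are invented.

```python
import random

def mean_abs_diff(pairs):
    """Mean of |f_i - f_j| over a list of (f_i, f_j) pairs."""
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def random_set_means(singleton_f, n_pairs, n_sets=1000, seed=0):
    """Mean Dij for n_sets random sets, each of n_pairs random singleton pairs."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sets):
        pairs = [tuple(rng.sample(singleton_f, 2)) for _ in range(n_pairs)]
        means.append(mean_abs_diff(pairs))
    return means

def empirical_p(observed, null_means):
    """Fraction of random sets whose mean Dij is as small as the observed one."""
    return sum(m <= observed for m in null_means) / len(null_means)

# Hypothetical fitness values (not data from the study).
duplicate_pairs = [(0.98, 0.97), (0.90, 0.92), (1.00, 0.99)]
singleton_f = [0.0, 0.3, 0.6, 0.85, 0.95, 1.0]

observed = mean_abs_diff(duplicate_pairs)
null_means = random_set_means(singleton_f, n_pairs=len(duplicate_pairs), n_sets=200)
p_value = empirical_p(observed, null_means)
```

The study used 100,000 random sets; `n_sets` is kept small here only so the example runs quickly.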
73
  • Hypothesis: Genes with closer homologs should be
    compensated more often.
  • 1. Divide duplicate genes into different groups
    using the KA value of each duplicate gene to its
    most similar homolog in the genome.
  • 2. Calculate the distribution of fitness effect
    in each KA interval.

74
Relationship between protein distance and fitness
effect of deletion
75
Does the deletion of a
duplicate with a higher expression level have a
more severe fitness effect than the deletion of
the other copy?
76
For duplicate gene pairs with different fitness
effects (2(F1 - F2)/(F1 + F2) > 0.05), the gene with
the higher level of expression has a stronger fitness
effect of gene deletion.
77
Relative contribution of duplicate genes to
genetic robustness
Lower bound (23%): The
extra proportion of duplicate genes with weak or
no effects compared to that for singletons is due
to genetic redundancy.
284 genes are compensated
due to gene duplication: 1,147 duplicates ×
(64.3% for duplicates - 39.5% for
singletons).
Altogether 1,241 genes are
compensated: 1,147 duplicates × 64.3% + 1,275
singletons × 39.5%.
78
Discrete distributions of fitness effect for
duplicate genes and singletons
79
Upper bound (59%): All the duplicate genes in
the class of weak or no effect are due to genetic
redundancy. 738 duplicate genes (1,147
duplicates × 64.3%) and 503 singleton genes
(1,275 singletons × 39.5%) show weak or no effect
after deletion: 738/(738 + 503) = 59%.
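The lower and upper bounds follow from the counts and percentages on the slides. The sketch below simply redoes that arithmetic; the variable names are hypothetical.

```python
# Figures taken from the slides.
n_duplicates = 1147
n_singletons = 1275
weak_share_dup = 0.643     # duplicates with weak or no effect
weak_share_single = 0.395  # singletons with weak or no effect

# Genes with weak or no effect after deletion, in each class and in total.
weak_dup = n_duplicates * weak_share_dup
weak_single = n_singletons * weak_share_single
weak_total = weak_dup + weak_single

# Lower bound: only the excess of duplicates over the singleton rate is
# attributed to compensation by a duplicate copy.
excess_dup = n_duplicates * (weak_share_dup - weak_share_single)
lower_bound = excess_dup / weak_total

# Upper bound: every weak-or-no-effect duplicate is attributed to redundancy.
upper_bound = weak_dup / weak_total
```

Rounded to whole percentages, `lower_bound` and `upper_bound` give the 23% and 59% quoted on the slides.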
80
Conclusions
  • 1. Duplicate genes contribute at least 25% to the
    genetic robustness against null mutations in the
    yeast genome.
  • 2. Duplicate genes have more similar fitness
    effects of gene deletion than singletons.

81
Conclusions
3. Duplicate genes with closer homologs have a
higher probability of being compensated.
4. The
duplicate copy with a higher expression level has
a stronger fitness effect of deletion.
82
Thanks!