Formation of novel protein-coding genes - PowerPoint PPT Presentation

Slides: 21
Provided by: jimpr8

Transcript and Presenter's Notes



1
Formation of novel protein-coding genes
  • Level 3 Molecular Evolution and Bioinformatics
  • Jim Provan

Patthy Chapter 6
2
De-novo formation of novel protein-coding genes
  • Creation of simple structural elements such as
    α-helices, β-sheets and reverse turns seems to be
    rather trivial
  • There are many alternative ways of forming these
    structures
  • Have been invented independently several times
  • Proteins with repetitive structure are most
    likely to arise de novo
  • Repetitive oligonucleotide sequences can expand,
    forming periodic protein structures
  • Collagen-like
  • Leucine-rich repeat (LRR)
  • Probably arose several times during evolution

3
Evolution of serum antifreeze glycoproteins
  • Fish that live in polar waters have serum
    antifreeze glycoproteins (AFGPs) which allow them
    to tolerate temperatures as low as -1.9°C
  • It has been shown that fish from the north and
    south poles have evolved very similar AFGPs
    independently
  • The AFGP of Antarctic fish, made up of a simple
    tripeptide repeat, evolved by recruitment of the
    5′ and 3′ ends of an ancestral trypsinogen gene
    (secretory signal and 3′ UTR) and de novo
    amplification of a 9-bp Thr-Ala-Ala motif
  • Arctic cod also have a Thr-Ala-Ala tripeptide
    repeat-based AFGP but this has no relationship
    with the trypsinogen gene
  • Threonines are O-linked to galactosyl-N-acetylgalactosamine,
    and the periodicity of the repeats matches the
    periodicity of water molecules
  • Convergent evolution of the tripeptide-based AFGP
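As a rough illustration of how amplifying a 9-bp motif produces a tripeptide repeat, the Python sketch below translates tandem copies of a hypothetical Thr-Ala-Ala unit. The codons ACA GCA GCA are an assumption chosen for illustration (the slide does not give the actual AFGP codons), and the codon table covers only the few codons the sketch needs.

```python
# Minimal codon table: only the codons used here (an assumption, not the full genetic code).
CODONS = {"ACA": "Thr", "ACG": "Thr", "GCA": "Ala", "GCG": "Ala"}


def amplify(motif: str, copies: int) -> str:
    """Simulate de novo amplification: tandem copies of a short motif."""
    return motif * copies


def translate(dna: str) -> list[str]:
    """Translate an in-frame DNA string codon by codon (3 bp at a time)."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


# Hypothetical 9-bp Thr-Ala-Ala unit (ACA GCA GCA).
MOTIF = "ACAGCAGCA"
peptide = translate(amplify(MOTIF, 4))
print("-".join(peptide))  # Thr-Ala-Ala-Thr-Ala-Ala-Thr-Ala-Ala-Thr-Ala-Ala
```

Because the motif length (9 bp) is a multiple of three, every extra copy extends the repeat without shifting the reading frame.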

4
De novo creation of complex proteins
  • The probability of de novo creation of more
    complex, globular proteins decreases as their
    complexity increases
  • Those that consist of a single supersecondary
    structure element (TIM barrel proteins) have a
    higher probability of independent creation
  • TIM barrel structure likely to have evolved
    several times
  • Easier to remodel replicas of old protein folds
    than to invent them from scratch
  • Creation of first folded proteins was probably
    the rate-limiting step in protein-based life
  • All extant proteins probably arose from a limited
    number of ancestral folds through divergence

5
Evolutionary convergence
  • Previous examples highlight convergence to
    similar primary, secondary or tertiary structure
    (structural convergence)
  • Unlike structural convergence, functional
    convergence and mechanistic convergence are
    relatively common
  • Several types of proteinases have similar
    function (i.e. they cleave proteins) but have
    different structures and catalytic mechanisms and
    evolved independently (functional convergence)
  • An example of mechanistic convergence is the
    serine proteases of the subtilisin and trypsin
    families
  • Similar active sites and catalytic mechanisms but
    no sequence or conformational homology
  • Catalytic triad residues (His, Asp, Ser) occur in
    different order in primary structures
  • Difficult to prove structural convergence

6
Gene duplications
  • The evolutionary significance of gene duplication
    is that it gives rise to a redundant copy of a
    gene
  • Duplicated gene may acquire divergent mutations
    and eventually emerge as a new gene
  • Gene duplication is the predominant and most
    important mechanism by which new genes arise
  • Genes derived by a duplication event are said to
    be paralogous and are found at different loci in
    the same genome
  • They differ from orthologous genes, which arise
    through speciation events and are found at
    corresponding loci in different species

7
Types of DNA duplications
  • An increase in the number of copies of a DNA
    segment can be brought about by several types of
    DNA duplication
  • Partial (intragenic or internal) gene duplication:
    only an internal segment of a protein-coding
    gene is duplicated
  • Complete gene duplication: the whole gene is
    duplicated, including flanking regions necessary
    for expression
  • Partial chromosome duplication: several adjacent
    genes are duplicated
  • Chromosomal duplication (aneuploidy)
  • Genome duplication (polyploidy)

8
Mechanisms of gene duplication
  • The major mechanism for short intragenic
    duplications is disengagement of the DNA
    polymerase from the strand that is being copied
    and reattachment at the wrong point (slipped-
    strand mispairing)
  • Major mechanism for larger duplications involves
    unequal crossing over
  • Involves mistaken pairing and recombination
    between homologous chromosomes
  • Most likely in already-duplicated regions
  • Allows rapid expansion of repeats within genes
    and expansion of gene families
  • May facilitate homogenisation of gene sequences
    and thus slow down divergence (concerted
    evolution)
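The outcome of unequal crossing-over can be sketched by treating each chromosome as a list of tandem gene copies. In this hypothetical Python model, breaking the two homologues at different copy positions (p ≠ q) stands in for misaligned pairing; the model does not capture how likely such misalignment is, which the slide notes is highest in already-duplicated regions.

```python
def unequal_crossover(chrom1, chrom2, p, q):
    """Recombine two homologous tandem arrays.

    chrom1 is broken after its first p copies and chrom2 after its first q copies.
    p == q is an ordinary (equal) crossover; p != q models misaligned pairing.
    """
    recombinant1 = chrom1[:p] + chrom2[q:]
    recombinant2 = chrom2[:q] + chrom1[p:]
    return recombinant1, recombinant2


# Hypothetical three-copy arrays on the paternal (p) and maternal (m) homologues.
paternal = ["pG1", "pG2", "pG3"]
maternal = ["mG1", "mG2", "mG3"]

longer, shorter = unequal_crossover(paternal, maternal, 2, 1)
print(longer)   # ['pG1', 'pG2', 'mG2', 'mG3']  (4 copies: duplication)
print(shorter)  # ['mG1', 'pG3']                (2 copies: deletion)
```

One recombinant gains a copy and the other loses one while the total number of copies is conserved; repeating the process on the expanded chromosome can expand the array further.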

9
Unequal crossing-over
10
Gene duplications in lysozyme
  • In ruminants, lysozyme gene has been duplicated
    10 times and is expressed less in
    extra-intestinal tissues
  • In mice, intestinal lysozyme is expressed from
    lysP gene, whereas in other tissues it is encoded
    by the lysM gene
  • The original gene duplication occurred through
    unequal crossing-over between Alu-like B2
    middle-repetitive elements

11
Retrosequences
  • Copies of protein-coding genes may be produced by
    duplicative transposition
  • DNA is transcribed into RNA, which is
    reverse-transcribed into a cDNA (retroposition)
  • During re-insertion, small segments of host DNA
    (4-12bp) are duplicated, forming direct repeats
  • Significant diagnostic features of
    retrosequences
  • Lack introns (where parent gene would have
    introns)
  • Lack upstream promoter elements of parent gene
  • Contain poly(A) stretches at 3′ end
  • Flanked by short, direct repeats
  • Different chromosomal location from original gene
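Two of the diagnostic features listed above can be checked mechanically once a candidate insert has been located. The Python sketch below assumes the insert boundaries (start, end) are already known, looks for an exact flanking direct repeat of 4-12 bp, and tests for a 3′ poly(A) run. The function names, the exact-match rule and the 10-A threshold are illustrative assumptions (real target-site duplications may contain mismatches), and detecting missing introns or promoter elements would additionally require comparison with the parent gene.

```python
def flanking_direct_repeat(genome, start, end, min_len=4, max_len=12):
    """Return the longest exact direct repeat flanking genome[start:end]
    (the same k bases immediately upstream and downstream), or None."""
    for k in range(max_len, min_len - 1, -1):
        if start - k < 0 or end + k > len(genome):
            continue  # a repeat of this length would run off the sequence
        upstream = genome[start - k:start]
        downstream = genome[end:end + k]
        if upstream == downstream:
            return upstream
    return None


def has_poly_a_tail(genome, start, end, min_run=10):
    """True if the candidate insert ends in at least min_run consecutive A's."""
    tail = genome[start:end][-min_run:]
    return len(tail) == min_run and set(tail) == {"A"}


# Hypothetical sequence: flank + 6-bp target site + insert + same 6 bp + flank.
site = "TGACTG"
insert = "ATGAAACGTTGA" + "A" * 12
genome = "CCCCCC" + site + insert + site + "GGGGGG"
start = 6 + len(site)
end = start + len(insert)
print(flanking_direct_repeat(genome, start, end))  # TGACTG
print(has_poly_a_tail(genome, start, end))         # True
```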

12
Functionality of retrosequences
  • Depending on whether the copied gene is
    functional or not, we can distinguish processed
    genes (retrogenes) and processed pseudogenes
    (retropseudogenes)
  • Several reasons why functional retrogenes are
    unlikely
  • Process of reverse-transcription is very
    inaccurate
  • Lacks necessary regulatory elements
  • Generally truncated at 5′ end (reverse
    transcriptase failure)
  • May be inserted in genomic region unsuitable for
    expression
  • More likely to form retropseudogenes
  • Some examples of processed functional genes have
    been found e.g. human phosphoglycerate kinase
  • X-linked gene has 11 exons and 10 introns
  • Autosomal PGK gene has no introns and a poly(A)
    tail

13
Alu elements
  • Processed pseudogenes of the gene specifying
    7SL RNA, the RNA component of the signal
    recognition particle, which recognises the signal
    sequences of secreted proteins
  • About 300 bp long
  • Around 500,000 copies in the human genome (5-6%)
  • Named after characteristic AluI restriction site
  • Derived from functional 7SL sequence by
    duplication, two deletions and many mutations
  • Play a key role in genome plasticity since they
    facilitate unequal crossing-over
  • Gene duplication
  • Exon shuffling
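The AluI recognition sequence is AGCT, which the enzyme cuts between G and C (AG^CT). The short Python sketch below finds those sites and the resulting fragment lengths in a made-up sequence. It only illustrates the restriction site that gave the family its name: an AGCT site on its own is far too common to identify an Alu element, which would require similarity to the Alu consensus.

```python
def alui_sites(seq):
    """Return the 0-based start positions of every AGCT (AluI) site."""
    seq = seq.upper()
    sites = []
    i = seq.find("AGCT")
    while i != -1:
        sites.append(i)
        i = seq.find("AGCT", i + 1)
    return sites


def alui_fragment_lengths(seq):
    """Cut between G and C of each site (AG^CT) and return fragment lengths."""
    cuts = [pos + 2 for pos in alui_sites(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


example = "TTAGCTGGAGCTCC"  # hypothetical sequence with two AluI sites
print(alui_sites(example))             # [2, 8]
print(alui_fragment_lengths(example))  # [4, 6, 4]
```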

14
Fate of duplicated genes
  • Determined by functional consequences of having
    extra copies of same gene and increased amounts
    of protein
  • Duplications can be advantageous, deleterious or
    neutral
  • If an organism is exposed to a toxic environment,
    there may be an advantage in overproduction of
    detoxifying enzymes
  • A disadvantage will result if overproduction of
    protein upsets regulatory balance
  • Most duplications are neutral; their fate is
    determined by selection and drift
  • Duplicated gene is unlikely to be fixed unless it
    acquires a novel and useful function
  • May specialise in different subfunctions of
    ancestral gene
  • May acquire drastically different functions
    (hepatocyte growth factor vs. plasminogen)
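Why a neutral duplicate is unlikely to be fixed can be made concrete with the standard Wright-Fisher model of genetic drift, which is swapped in here and is not part of the slides. Under that model, a new neutral variant present as one copy among 2N gene copies in a diploid population of size N is eventually fixed with probability 1/(2N). The Python simulation below (population size, trial count and seed are arbitrary assumptions) should give a fixation rate close to that value.

```python
import random


def neutral_duplicate_fixes(n_diploid, rng):
    """Follow one new neutral duplicate (1 copy among 2N) until it is lost or fixed."""
    size = 2 * n_diploid
    count = 1
    while 0 < count < size:
        freq = count / size
        # Wright-Fisher step: each of the 2N copies in the next generation
        # is drawn independently using the current frequency.
        count = sum(1 for _ in range(size) if rng.random() < freq)
    return count == size


def fixation_rate(n_diploid, trials, seed=1):
    """Fraction of independent trials in which the duplicate was fixed."""
    rng = random.Random(seed)
    fixed = sum(neutral_duplicate_fixes(n_diploid, rng) for _ in range(trials))
    return fixed / trials


n = 10
print(f"simulated {fixation_rate(n, 10_000):.3f} vs expected {1 / (2 * n):.3f}")
```

Most new copies are lost within a few generations; without a selective advantage, survival of a duplicate is largely a matter of chance.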

15
Formation of gene families
  • Recently duplicated gene families are generally
    found in close proximity on the same chromosome
  • Some multigene families contain invariant
    repeated genes
  • Common when large quantities of protein product
    are required
  • Histones have to be synthesised at a high rate
    during a well-defined, short period of the cell
    cycle (S phase)
  • Some members of multigene families serve the same
    function but differ in tissue specificity,
    developmental regulation or biochemical
    properties e.g. isozymes

16
Concerted evolution in multigene families
  • Paralogous members of multigene families are very
    similar to each other within one species although
    orthologous members of the same family may differ
    greatly between even closely related species
  • Suggests that mechanisms exist which cause gene
    families to evolve together as a unit (concerted
    evolution)
  • Process of concerted evolution of multigene
    families under the effects of random genetic
    drift is known as molecular drive
  • Gene correction mechanisms may homogenise genes,
    making it difficult to trace the true evolutionary
    history of many multigene families
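Homogenisation by gene correction can be illustrated with a toy model in which, at each step, one randomly chosen copy in a family is overwritten by another (unbiased gene conversion). The Python sketch below is an illustration under stated assumptions, not a model from the slides: it ignores new mutations, unequal crossing-over and selection, and the family size, number of events and seed are arbitrary.

```python
import random


def gene_conversion(copies, events, seed=0):
    """Apply random, unbiased gene-conversion events to a multigene family.

    Each event copies a randomly chosen donor sequence over a different,
    randomly chosen recipient. Returns the final family and the number of
    distinct variants remaining after each event.
    """
    rng = random.Random(seed)
    copies = list(copies)
    history = [len(set(copies))]
    for _ in range(events):
        donor, recipient = rng.sample(range(len(copies)), 2)
        copies[recipient] = copies[donor]
        history.append(len(set(copies)))
    return copies, history


family = [f"variant{i}" for i in range(8)]  # 8 initially distinct paralogues
final, history = gene_conversion(family, 1000)
print(history[0], "distinct variants ->", history[-1])
```

Conversion never creates a new variant, so the count of distinct variants can only stay the same or fall, and the family tends towards a single shared sequence. Run independently in two species (here, two different seeds), the family may end up fixed for different variants, matching the pattern of similar paralogues within a species but divergent orthologues between species.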

17
Dating gene duplications
  • Assuming duplicated genes diverge at a constant
    rate, we can estimate the date of a gene
    duplication, $T_D$, that gave rise to two paralogous
    genes (A and B) if we have sequences of these
    paralogues from two different species (1 and 2)
    and we know the time of speciation, $T_S$
  • If genes evolve at a constant rate, then:
  • The average number of substitutions per site,
    $(K_A + K_B)/2$, in the two orthologue comparisons
    (A1 vs. A2, B1 vs. B2) is proportional to $T_S$
  • The average number of substitutions per site,
    $K_{AB}$, in the four paralogous comparisons (A1
    vs. B1, A2 vs. B2, A1 vs. B2, A2 vs. B1) is
    proportional to the time since duplication, $T_D$
  • Thus, the following equation holds:
    $T_D / T_S = 2K_{AB} / (K_A + K_B)$
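The estimate can be written as a small Python function. Since both lineages accumulate substitutions after each split, $K_A$ and $K_B$ are each proportional to $2T_S$ and $K_{AB}$ to $2T_D$, so the factor of two cancels in the ratio. The sketch assumes the K values are already expressed as substitutions per site and that the constant-rate assumption holds; the function name and the numbers in the example are hypothetical and do not come from any real gene family.

```python
from statistics import mean


def duplication_time(k_a, k_b, k_ab_pairs, t_s):
    """Estimate T_D from T_D / T_S = 2 * K_AB / (K_A + K_B).

    k_a, k_b   : substitutions per site for A1 vs. A2 and B1 vs. B2 (orthologues)
    k_ab_pairs : the four paralogous distances (A1-B1, A2-B2, A1-B2, A2-B1)
    t_s        : speciation time (e.g. in millions of years)
    """
    if len(k_ab_pairs) != 4:
        raise ValueError("expected four paralogous comparisons")
    k_ab = mean(k_ab_pairs)  # K_AB is the average over the four comparisons
    return t_s * 2 * k_ab / (k_a + k_b)


# Hypothetical distances, with speciation assumed at 80 mya:
t_d = duplication_time(0.10, 0.12, [0.44, 0.46, 0.45, 0.45], t_s=80)
print(f"{t_d:.1f}")  # 327.3
```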

18
Dating gene duplications (continued)
  • All vertebrates have both myoglobin and
    haemoglobin
  • Myoglobin differs from both the α and β subunits
    of haemoglobin more than they differ from each
    other
  • Myoglobin diverged (TD ≈ 600-800 mya) before the
    α and β genes arose (TD ≈ 500 mya)
  • Mammals, reptiles, birds, amphibians and bony
    fish all have distinct α and β subunits, whereas
    the most primitive vertebrates, the Agnatha
    (jawless fish), contain only one type of
    haemoglobin subunit
  • Myoglobin and haemoglobin diverged prior to the
    separation of agnathans and jawed vertebrates
  • Duplication giving rise to α and β subunits
    occurred in the ancestor of all jawed vertebrates
    following its divergence from agnathans

19
Evolutionary history and linkage patterns in α- and β-globin clusters
  • In humans, the gene cluster of the α-globin family
    (chromosome 16) consists of four functional genes
    (ζ, α1, α2, θ1) and three unprocessed pseudogenes
    (ψζ, ψα1, ψα2)
  • Embryonic type ζ is the most divergent (estimated
    TD > 300 mya)
  • θ1 is less divergent (estimated TD ≈ 260 mya)
  • Genes α1 and α2 produce identical polypeptides and
    have near-identical nucleotide sequences,
    suggesting recent divergence
  • The β-globin family (chromosome 11) contains five
    functional genes and ψβ
  • Adult types (β and δ) diverged from non-adult
    types (Gγ, Aγ and ε) around 155-200 mya
  • The ancestor of both γ genes diverged from ε about
    100-140 mya
  • The duplication that formed Gγ and Aγ occurred
    after separation of the human lineage from New
    World monkeys (35 mya)
  • Divergence of the adult genes (β and δ) occurred
    about 80 mya

20
Intergene distance and time since duplication