Comparative Genomics and Evolution - PowerPoint PPT Presentation

Provided by: Lapt339
1
  • Comparative Genomics and Evolution

Pollard, K.S., et al., Forces Shaping the Fastest
Evolving Regions in the Human Genome. PLoS
Genetics 2(10), 2006.
McLean, C., and Bejerano, G., Dispensability of
Mammalian DNA. Genome Research 18, 1743-1751
(2008).
Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA. Genome Research
18, 1743-1751 (2008).
Image source: http://mbbnet.umn.edu
2
Forces shaping the fastest evolving regions in
the human genome by Katherine S. Pollard et al.
3
  • What's the difference?

Image sources: http://pro.corbis.com,
http://www.science.psu.edu
4
  • What's the difference?
  • Humans have higher brainpower
  • Examples: creativity, problem solving, language
  • What part of the genome is the cause?

Image source: http://www.spaceflight.esa.int
5
  • What's the difference?
  • Human and chimpanzee DNA is 98% similar
  • The 2% difference is 29 million bases (mostly in
    non-coding DNA)

Image source: http://en.wikipedia.org
6
  • Comparative Genomics
  • Human and rodent genomes are often compared to
    identify conserved (presumably functional)
    elements.
  • Humans and chimpanzees are compared to
    understand what is uniquely human about our
    genome.

Image source: http://genome.ucsc.edu
7
  • Comparative Genomics
  • Look at HARs in human genome
  • HAR - human accelerated region. High rate of
    nucleotide substitution in humans, low in other
    vertebrates.
  • The fastest is HAR1: a novel RNA gene expressed
    during development of the neocortex (language,
    conscious thought).

8
  • HARs
  • 100 bp, mostly non-coding
  • Function is likely to be gene regulation.
  • Seem to have been under strong negative
    selection up to common ancestor of chimp and
    human.
  • Rapid positive selection then started in humans
    only.

Image source: http://www.shutterstock.com
9
  • Finding HARs

Branch lengths given in substitutions per base,
or in millions of years
Evolution of vertebrates
  • Evolutionary tree based on the comparison of
    conserved regions in whole-genome alignments
    between species.

Image from Pollard, K.S., et al., Forces Shaping
the Fastest Evolving Regions of the Human Genome.
10
  • Finding HARs
  • Find HARs by using the LRT, the likelihood ratio
    test.
  • In statistical hypothesis testing, the
    likelihood ratio (Λ) is the ratio of the maximum
    probability of a result under a null hypothesis
    to that under an alternative hypothesis.
  • The LRT decides between the two hypotheses
    based on the value of the likelihood ratio.
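
Writing the likelihood ratio as Λ, one common textbook form of the definition above is sketched below. The symbols (L for the likelihood, θ for the model parameters, Θ0 and Θ1 for the parameter sets allowed under the null and alternative hypotheses, x for the data) are notation assumed here, not taken from the slides.

```latex
\Lambda(x) = \frac{\max_{\theta \in \Theta_0} L(\theta \mid x)}
                  {\max_{\theta \in \Theta_1} L(\theta \mid x)}
```

Some presentations put the alternative hypothesis in the numerator or work with the logarithm of the ratio. That choice only changes whether small or large values count as evidence against the null.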

11
  • Finding HARs
  • Two models were used for the genomic LRT.
  • Model 1: the human substitution rate is held
    proportional to the other substitution rates in
    the evolutionary tree.
  • Model 2: the human substitution rate can be
    accelerated relative to the rates in the rest of
    the tree.
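
A toy sketch of the two models, under a strong simplifying assumption: substitutions on each branch are treated as independent Poisson counts, which is a stand-in for the phylogenetic likelihood used in the paper, not the paper's method. The function names, the branch names, and the numbers are hypothetical.

```python
import math


def poisson_loglik(counts, means):
    """Log-likelihood of independent Poisson counts with the given means."""
    total = 0.0
    for k, mu in zip(counts, means):
        if mu == 0:
            # A zero mean can only produce a zero count.
            if k != 0:
                return float("-inf")
            continue
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total


def lrt_statistic(counts, expected, human="human"):
    """Return log(P2 / P1) for one region.

    counts:   observed substitutions per branch in the region (hypothetical)
    expected: substitutions per branch predicted from genome-wide rates
    """
    branches = list(counts)
    others = [b for b in branches if b != human]
    obs = [counts[b] for b in branches]

    # Model 1: one scale factor shared by every branch, human included.
    shared = sum(obs) / sum(expected[b] for b in branches)
    ll1 = poisson_loglik(obs, [shared * expected[b] for b in branches])

    # Model 2: the human branch may have its own, larger scale factor.
    human_scale = counts[human] / expected[human]
    if human_scale <= shared:
        # No acceleration on the human branch: model 2 collapses to model 1.
        return 0.0
    other_scale = sum(counts[b] for b in others) / sum(expected[b] for b in others)
    means2 = [(human_scale if b == human else other_scale) * expected[b]
              for b in branches]
    ll2 = poisson_loglik(obs, means2)
    return ll2 - ll1


expected = {"human": 1.0, "chimp": 1.0, "mouse": 20.0}
accelerated = lrt_statistic({"human": 12, "chimp": 1, "mouse": 20}, expected)
proportional = lrt_statistic({"human": 2, "chimp": 2, "mouse": 40}, expected)
print(accelerated, proportional)
```

A region whose counts are an exact multiple of the expected counts scores 0, while an excess on the human branch alone gives a positive score.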

12
  • Finding HARs

[Diagram: alignment rows for human and another vertebrate, repeated for all the conserved alignments]
13
  • Finding HARs

Model 1
[Diagram: alignment rows for human and another vertebrate, repeated for all the conserved alignments]
Determine 1st set of rates
Determine 2nd set of rates
Determine 3rd set of rates
Scale all by the same amount
14
  • Finding HARs

Model 2
[Diagram: alignment rows for human and another vertebrate, repeated for all the conserved alignments]
Scale all by the same amount
Scale the human rates separately
18
Identify regions conserved between human and
other vertebrates (34,498 of them)
For all regions, fit model 1 and determine the
proportional rates that maximize the likelihood
of the tree
Obtain P1
(max probability 1)
Loop over all conserved regions. For each region,
do
Fit model 2 to the region in human, find the
acceleration for that region that maximizes the
likelihood of the tree
Obtain P2
(max probability 2)
Calculate LRT for the region as log(P2 / P1)
19
  • Finding HARs
  • Big LRT value indicates an HAR. How big is big?
  • Do 1 million simulations of the 34,498 conserved
    alignments.
  • To create each simulation, use the model 1
    proportional rates.
  • Repeat the LRT calculation for each simulation.
  • Then for each region, find proportion of
    simulated LRTs that are bigger than its original
    LRT.
  • That proportion is a p-value that tells if the
    region is an HAR.
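
The p-value step can be sketched as follows. The function name and the short list of simulated values are hypothetical; the slides describe 1 million simulations per region.

```python
def empirical_p_value(observed_lrt, simulated_lrts):
    """Proportion of simulated LRT values that are bigger than the observed one."""
    bigger = sum(1 for value in simulated_lrts if value > observed_lrt)
    return bigger / len(simulated_lrts)


# Ten made-up LRT values standing in for simulations under model 1.
simulated = [0.1, 0.5, 1.2, 3.0, 0.0, 0.7, 2.2, 0.3, 0.9, 4.1]
print(empirical_p_value(2.5, simulated))  # 2 of 10 simulated values exceed 2.5 -> 0.2
```

A small p-value means the model 1 simulations rarely produce an LRT as large as the one observed, which is the evidence for acceleration.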

20
  • Finding HARs
  • Note on methods: vertebrates that were used in
    selecting the conserved regions (chimp, macaque,
    mouse, rat, rabbit) were omitted from any LRT
    analysis.
  • This ensured that the LRT test is independent of
    the method used to select the conserved regions.

21
  • Finding HARs
  • Result: 202 HARs were found in the human genome.

Image source: http://www.3dscience.com
22
  • Results for Conserved Elements
  • 80.4% of the 34,498 conserved regions are
    non-coding.
  • 45.4% of non-coding regions are intronic, 31%
    are intergenic.
  • Non-coding regions are enriched for
    transcription factors, DNA-binding proteins,
    regulators of nucleic acid metabolism.

23
  • Results for HARs
  • 202 HARs have p < 0.1; 49 of them have p < 0.05
  • HAR1 through HAR5 have p < 4.5e-4, very
    accelerated
  • Most HARs are non-coding
  • 66.3% are intergenic, 31.7% are intronic, only
    1.5% are coding
  • Results support the hypothesis (King and Wilson)
    that most chimp-human differences are regulatory.

24
  • Results: Confirming Accelerated Selection in HARs

Negative selection
Positive selection
  • Are the HARs just due to relaxation of negative
    selection?
  • No. Compare to the neutral rate for 4D (fourfold
    degenerate) sites to see.

Image source: http://cs273a.stanford.edu
Bejerano Aut 08/09
25
Figure labels: genome-wide neutral rate for 4D sites in human
and chimp in chromosome end bands; genome-wide neutral rate for
4D sites in human and chimp. H, human; C, chimp.
The chimp rates in all five elements fall well
below the human rates, which exceed the
background rates by as much as an order of
magnitude.
Image from K.S. Pollard et al., Forces Shaping
the Fastest Evolving Regions of the Human Genome.
26
  • Results: W → S (weak to strong) Bias in HARs

[Figure: AT → GC substitution bias in HARs. Groups shown: HAR1-HAR5,
HAR6-HAR49, HAR50-HAR202, and the rest of the 34,000 conserved
elements. Substitution classes: AT → GC and GC → AT.]
  • Dramatic AT → GC bias was observed in HARs.

Image from Pollard, K.S., et al., Forces Shaping
the Fastest Evolving Regions of the Human Genome.
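
A minimal sketch of how weak-to-strong (AT → GC) and strong-to-weak (GC → AT) substitutions could be tallied between two aligned sequences. Treating one sequence as the ancestral state is an assumption made for illustration, and the sequences and function name are made up.

```python
WEAK = {"A", "T"}
STRONG = {"G", "C"}


def count_ws_substitutions(ancestral, derived):
    """Count (weak-to-strong, strong-to-weak) substitutions between aligned sequences."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    w_to_s = 0
    s_to_w = 0
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a == d:
            continue
        if a in WEAK and d in STRONG:
            w_to_s += 1
        elif a in STRONG and d in WEAK:
            s_to_w += 1
        # Gaps and weak-to-weak or strong-to-strong changes are not counted.
    return w_to_s, s_to_w


ancestral = "ATTACGATTA"
derived = "GTCATGGCTA"
print(count_ws_substitutions(ancestral, derived))  # (4, 1)
```

A large excess of the first count over the second is the kind of bias the slide describes.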
27
  • Results: W → S Bias in HARs
  • Top 49 HARs are 2.7 times as likely to be
    located near final chromosomal bands as the other
    conserved elements
  • Interestingly, HAR1 and HAR5 are also in end
    regions in other mammals, but are not accelerated.

Image source: http://www.intelihealth.com
28
  • Results: W → S Bias in HARs
  • HARs tend to be located in regions of high
    recombination in humans.
  • All of this evidence points to biased gene
    conversion (BGC) as the driving force behind HARs.

29
  • Genetic Recombination
  • Paired chromosomes can exchange homologous
    pieces
  • Typically occurs during meiosis

34
Meiosis
diploid germ cell
paternal chromosome A
maternal chromosome A
DNA replication
centromere
sister chromatids
Recombination
Segregation
haploid gametes
35
Recombination hotspot
Recombination
36
  • Genetic Recombination

duplex 1
duplex 2
Formation of Holliday Junction intermediate
Horizontal resolution with gene conversion
Vertical resolution with crossover
Mismatch repair
or
Image source: http://www.sanger.ac.uk
37
  • Genetic Recombination
  • Chromosomal Crossover

Homologous chromosomes
Recombinant chromatids
  • Chromosomal crossover results in exchange of DNA
    pieces

Image source: http://www.emc.maricopa.edu
38
  • Genetic Recombination
  • Gene Conversion

Mismatch repair causes DNA to revert back to its
original form
Recombinant chromatids
  • Gene conversion results in nonreciprocal
    transfer of DNA

Image source: http://www.emc.maricopa.edu
39
  • Genetic Recombination
  • Gene Conversion

haploid gametes
  • The result is a nonstandard ratio of alleles,
    such as 3:1
  • This causes homogenization of a species' gene
    pool

Image source: http://www.emc.maricopa.edu
40
  • Biased Gene Conversion

A - T is a weak pairing
G - C is a strong pairing
  • DNA repair machinery likes to replace weak
    pairings with strong pairings during gene
    conversion.

Image source: http://commons.wikimedia.org
41
Biased Gene Conversion
Recombinant chromatids
A-T replaced by G-C during mismatch repair
  • Biased gene conversion results in GC
    enrichment of a species' gene pool (in addition
    to causing homogenization)
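
The enrichment effect can be illustrated with a simple deterministic recursion. The assumptions are all mine, not from the slides: random mating, a very large population with no drift, and heterozygotes that pass on the GC allele with probability (1 + bias) / 2. Under those assumptions the GC allele frequency p changes by bias * p * (1 - p) each generation.

```python
def gc_allele_frequency(p0, bias, generations):
    """Trajectory of the GC allele frequency under biased gene conversion.

    p0:   starting frequency of the GC allele
    bias: conversion bias in heterozygotes (0 means unbiased); hypothetical parameter
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Heterozygotes (frequency 2p(1-p)) transmit GC slightly more than half the time.
        p = p + bias * p * (1.0 - p)
        trajectory.append(p)
    return trajectory


unbiased = gc_allele_frequency(0.1, 0.0, 50)
biased = gc_allele_frequency(0.1, 0.05, 500)
print(unbiased[-1], biased[-1])
```

With no bias the frequency stays where it started. With even a small bias the GC allele climbs toward fixation, without conferring any fitness advantage.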

42
  • HARs and Recombination Hotspots
  • HARs tend to be located near recombination
    hotspots in humans

43
  • Recombination Hotspots
  • Mysterious
  • Extremely different between chimps and humans
    (change rapidly during evolution)
  • Not caused by the local DNA sequence (it is the
    same in human and chimp)

44
  • Some HARs

Recombination hotspots
?
45
  • Possible Conclusion
  • Recombination-caused BGC (often seen negatively)
    played a big role in the development of our
    species.

46
  • Alternative Explanation

HAR
HAR
Isochore
  • Isochore: DNA region (100 kb) with high gene
    concentration
  • Isochores are stabilized by many strong (GC)
    pairings

47
  • Alternative Explanation
  • Theory (Bernardi et al.) that weakly deleterious
    changes drive isochore to a critical point of
    destabilization
  • At the critical point, GC content cannot decrease
    further; otherwise the isochore becomes unstable
  • An AT → GC substitution in the isochore suddenly
    gains selective advantage and sweeps through the
    population

48
  • Alternative Explanation
  • Isochore selective sweep theory vs. the BGC
    theory.
  • Isochore sweep has a different DNA signature
    than BGC

[Diagram: in an isochore selective sweep, the GC changes are spread
over 100 kb; in biased gene conversion, the GC changes are confined
to 100 bases]
49
  • Alternative Explanation
  • Evidence so far favors the BGC explanation for
    HARs
  • However, the results are not yet conclusive

50
Dispensability of Mammalian DNA by Gill
Bejerano and Cory McLean
51
  • Are mammalian CNEs dispensable?
  • CNE: conserved non-exonic element
  • Examples: cis-regulatory DNA, ultraconserved DNA

Image source: http://apps.co.marion.or.us
52
  • Cis-regulatory DNA elements

promoter or inhibitor
Image source: http://cnx.org
53
  • Cis-regulatory DNA elements

Image source: http://cnx.org
54
  • Ultraconserved elements
  • 200 bp and up, many seem to be regulatory
  • 100% identity with no insertions or deletions
    between orthologous regions of the human, rat,
    and mouse genomes.
  • Nearly all of these segments are also conserved
    in the chicken and dog genomes, with an average
    of 95% and 99% identity, respectively. Many are
    also significantly conserved in fish.
  • (quotes from "Ultraconserved elements in the
    human genome" by Bejerano et al.)

55
  • Are mammalian CNEs dispensable?
  • About 20% of gene knockout experiments,
    including cis-regulatory and ultraconserved
    knockouts, produce no phenotype measurable in lab
    settings.

Image source: http://www.sciencedaily.com
56
  • Are mammalian CNEs dispensable?

Do CNEs have functional redundancy?
OR
Are CNEs indispensable, but in a way that cannot
be observed in the lab?
  • Approach: look at CNEs lost in rodents during
    evolution

57
  • Finding CNEs lost by rodents

Computational Pipeline
Identify conserved mammalian sequences
Pick out the ones absent in rodents
Remove artifacts due to assembly, alignment,
structural RNA migration
58
Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
59
Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
60
To avoid assembly artifacts
Use UCSC chains and nets
Ignore multi-level nets
Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
61
Identify lost DNA
Validate quality of results
Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
62
  • Identifying DNA lost by rodents

[Diagram: aligned primate, dog, and rodent sequences; a marked column
shows different bases between primates (A) and dog (G)]
Look at the aligned orthologous sequences in
primates (human, macaque), dog, and rodents
(mouse, rat).
63
  • Identifying DNA lost by rodents

[Diagram: a 100 bp window over the aligned primate, dog, and rodent
sequences]
Compute primate-dog %id (percentage of identical
alignment columns)
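
A minimal sketch of the windowed %id computation. The function names and toy sequences are hypothetical, and counting gap columns as non-identical is an assumption made here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of alignment columns in which both sequences carry the same base."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    # A column counts as identical only if the bases match and neither is a gap.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def windowed_identity(seq_a, seq_b, window=100, step=100):
    """Return (start, %id) for each full window along the alignment."""
    return [
        (start, percent_identity(seq_a[start:start + window],
                                 seq_b[start:start + window]))
        for start in range(0, len(seq_a) - window + 1, step)
    ]


# Toy 200-column alignment: the second window differs in every fourth column.
primate = "ACGT" * 50
dog = primate[:100] + "ACGA" * 25
print(windowed_identity(primate, dog))  # [(0, 100.0), (100, 75.0)]
```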
64
  • Identifying DNA lost by rodents

[Diagram: aligned primate, dog, and rodent sequences]
Compute primate-dog %id
65
  • Identifying DNA lost by rodents

[Diagram: aligned primate, dog, and rodent sequences, with the rodent row flagged "!"]
Compute primate-dog %id
Deletion in rodents
66
  • Identifying DNA lost by rodents

[Diagram: aligned primate, dog, and rodent sequences]
Ultraconserved-like element between primates and dog
67
  • Identifying DNA lost by rodents

[Diagram: aligned primate, dog, and rodent sequences, with the rodent row flagged "!"]
Ultraconserved-like element that was lost in
rodents
68
  • Results for non-exonic ultras
  • 1,691,090 bp of ultraconserved-like sequences
    were found
  • 1147 bp of these sequences were lost in rodents
  • Thus only 0.086% of ultras is lost in rodents
  • In comparison, ¼ of neutrally-evolving DNA
    (50%id to 65%id) is lost in rodents
  • Thus ultraconserved-like sequences are 300 times
    more indispensable than neutrally-evolving DNA
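
The "300 times" figure can be checked from the two loss fractions on this slide. Reading 0.086 as a percentage is an assumption; the variable names are hypothetical.

```python
# Loss fractions as stated on the slide (0.086 read as a percentage).
ultra_lost_fraction = 0.086 / 100
neutral_lost_fraction = 1 / 4

# How many times more often neutral DNA is lost than ultraconserved-like DNA.
fold = neutral_lost_fraction / ultra_lost_fraction
print(round(fold))  # 291, which the slide rounds to 300
```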

69
  • Results for neutral DNA
  • Expected uniform rate of lost neutrally-evolving
    DNA
  • Observed that less conserved sequences are more
    retained

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
70
  • Results for neutral DNA
  • Phenomenon due to poorly conserved sequences
    being adjacent to exons, and thus shielded from
    being lost
  • Larger deletions are biased away from gene
    structures

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
71
  • Separating DNA under selection from neutral DNA
  • Moving away from 100%id, there is a mixing of
    DNA under purifying selection and neutrally
    evolving DNA

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
72
  • Separating DNA under selection from neutral DNA
  • To distinguish neutral DNA from conserved DNA in
    the mix, use longer evolutionary tree branch
    lengths

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
73
  • Separating DNA under selection from neutral DNA
  • Example: the human-dog-horse alignment has a longer
    cumulative branch length than human-macaque-dog

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
75
  • Separating DNA under selection from neutral DNA
  • Thus the human-dog-horse alignment has lower %id for
    neutral DNA than human-macaque-dog
  • This shifts the neutral DNA curve to the
    right

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
76
  • Results for DNA under purifying selection

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
77
  • Results for DNA under purifying selection
  • 80%id to 100%id identified as DNA under
    purifying selection
  • As is visible from the figure, practically none
    of this DNA is lost in the primates (only 0.154%
    of bases are lost)

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
78
  • Results for DNA under purifying selection
  • The previous results were for CNEs
  • Those results compare to the numbers for lost
    coding DNA
  • Fraction of lost CNEs: 0 at 100%id, 0.00122 at
    80%id
  • Fraction of lost exons: 0 at 100%id, 0.0000861
    at 80%id

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
79
  • Results for DNA under purifying selection
  • Thus CNEs under purifying selection are
    indispensable, similarly to coding elements.

80
  • CNE dispensability ranking

Deepest in vertebrate tree, so corresponds to the
most indispensable CNEs
In primates
In rodents
Region of high conservation (CNEs)
  • Left plot explanation (right plot is similar):
    take the human-macaque-dog (h-m-d) alignments and
    find their conservation %id in each of the shown
    species. Then, for each of those species, plot the
    fraction of DNA lost in rodents vs. the %id.
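
The plotted quantity can be sketched as a binning step: group alignment windows by %id and compute the fraction of bases lost in rodents within each bin. The input layout, the 10-point bin width, and the numbers are all hypothetical.

```python
from collections import defaultdict


def fraction_lost_by_identity(windows, bin_width=10):
    """Map each %id bin (by its lower edge) to the fraction of bases lost.

    windows: iterable of (percent_id, bases_total, bases_lost) tuples,
             each with bases_total > 0
    """
    totals = defaultdict(int)
    lost = defaultdict(int)
    for percent_id, bases_total, bases_lost in windows:
        # Fold 100%id into the top bin instead of giving it a bin of its own.
        lower_edge = min(int(percent_id // bin_width) * bin_width, 100 - bin_width)
        totals[lower_edge] += bases_total
        lost[lower_edge] += bases_lost
    return {edge: lost[edge] / totals[edge] for edge in sorted(totals)}


# Made-up windows: (%id in one species, bases in window, bases lost in rodents).
windows = [(55.0, 100, 30), (58.0, 100, 20), (85.0, 100, 1),
           (100.0, 100, 0), (92.0, 100, 0)]
print(fraction_lost_by_identity(windows))  # {50: 0.25, 80: 0.01, 90: 0.0}
```

Each key in the result is one point on the x-axis of the plot, and each value is the height of the curve there.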

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
81
  • CNE dispensability ranking

Image from McLean, C., and Bejerano, G.,
Dispensability of Mammalian DNA.
82
  • Conclusion
  • Many mammalian CNE knockouts produce no
    observable phenotype in the lab, suggesting great
    functional redundancy.
  • However, evolutionary analysis shows that the
    CNEs, and particularly ultraconserved regions,
    are indispensable.
  • It seems that the phenotype in knockouts is subtle,
    but very important.

Image source: http://apps.co.marion.or.us