Title: Human Genome: Mapping, sequencing, Techniques Diseases
1Human Genome Mapping, sequencing, Techniques
Diseases
Lecture 5 BINF 7580
2Genome comparison finds chimps and humans very
similar at the DNA level. This is the conclusion
of a paper published in Nature in 2005.
This genomic comparison brings us closer to
answering a most fundamental question: What
makes us human?
The book was published in 2003.
Note that the value of 98% is revised
every year; see the paper in Nature (2005).
3A paper in the journal Nature (2005) describes a
comprehensive comparison of the genetic
blueprints of humans and our closest living
evolutionary relatives, chimpanzees, and shows
that the two genomes are 96% identical (so
our similarity with our relatives became a little
smaller). This similarity between the two genomes
is intriguing because when DNA sequence is highly
similar between two species, the biological
function of that DNA is predictably similar as
well. The DNA used to sequence the chimp genome
came from the blood of a male chimpanzee named
Clint. He died last year from heart failure at
the relatively young age of 24, but two cell
lines from the primate have been preserved at the
Coriell Institute for Medical Research on the
campus of our university, UMDNJ, in Camden, N.J.
An important step of the research is to find the
differences between the two genomes, because they
will help us to answer the question:
What makes us human?
4The DNA sequence that was directly compared
between the two genomes is almost 99 percent
identical. However, when DNA insertions and
deletions are taken into account, humans and
chimps still share 96 percent of their sequence.
At the protein level, 29 percent of genes code
for the same amino acid sequences in chimps and
humans. In fact, a typical human protein has
accumulated just one unique change since chimps
and humans diverged from a common ancestor about
6 million years ago.
5The number of genetic differences between
humans and chimps is approximately 60 times
smaller than that between human and mouse and
about 10 times smaller than that between mouse
and rat.
On the other hand, the number of genetic
differences between a human and a chimp is about
10 times greater than that between any two
humans.
Both the human and chimp genomes contain about 3
billion base pairs.
About 35 million DNA base pairs differ between
the shared portions of the human and chimp
genomes. In addition, there are another 5 million
sites that differ because of an insertion or
deletion in one of the lineages. About 3 million
of these differences may lie in crucial
protein-coding genes or other functional areas of
the genome. The others lie in what is believed to
be DNA of little or no function.
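As a rough check of the figures on this slide, the counts can be turned into percentages. The sketch below assumes the round 3-billion-base-pair genome size quoted above as the denominator; the shared (aligned) portion is not given on the slide, so the result is only approximate. All variable names are illustrative.

```python
# Rough arithmetic on the slide's figures. The 3-billion-bp denominator is an
# assumption taken from the slide's round number, not the exact aligned length.
GENOME_SIZE_BP = 3_000_000_000   # approximate size of each genome
SUBSTITUTIONS = 35_000_000       # single-base differences quoted on the slide


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


substitution_divergence = percent(SUBSTITUTIONS, GENOME_SIZE_BP)
substitution_identity = 100.0 - substitution_divergence

print(f"single-base divergence: {substitution_divergence:.2f}%")
print(f"identity of directly compared sequence: {substitution_identity:.2f}%")
```

This gives a divergence of about 1.2 percent, in line with the "almost 99 percent identical" figure. The 96 percent figure cannot be reproduced from the count of 5 million insertion/deletion sites alone, because it depends on the total length of the inserted and deleted segments, which the slide does not give.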
6Comparison of the human and chimp genomes
revealed that more than 50 genes present in the
human genome are missing or partially deleted
from the chimp genome. For example, three key
genes involved in inflammation appear to be
deleted in the chimp genome, possibly explaining
some of the known differences between chimps and
humans with respect to immune and inflammatory
response. On the other hand, humans appear to
have lost the function of the caspase-12 gene,
which produces an enzyme that may help protect
other animals against Alzheimer's disease.
7The researchers discovered that a few classes of
genes are changing unusually quickly in both
humans and chimps compared with other mammals.
These classes include genes involved in
perception of sound, transmission of nerve
signals, production of sperm and cellular
transport of ions.
"As the sequences of other mammals and primates
emerge in the next couple of years, we will be
able to determine what DNA sequence changes are
specific to the human lineage," said the lead
author of the paper in Nature.
H.A. Why does he think so? Explain his idea.
8However, despite the DNA resemblance, a
surprising finding was reported in the Science
(2005) paper "Comparison of Fine-Scale
Recombination Rates in Humans and Chimpanzees".
The locations of DNA swapping between
chromosomes, known as recombination hotspots,
are almost entirely different. To understand why
this finding is so surprising, let us first
discuss "What's So Hot about Recombination
Hotspots?", published in the journal PLoS Biology
in 2004. Are you familiar with the concept of a
recombination hotspot? If not, or not very well,
let us answer a very important question:
Is recombination something that happens to DNA
generally or in particular sequences?
The answer: it partly happens in particular
sequences.
9Bacteria have a specific sequence fragment of
eight base pairs that stimulates the action of
the proteins that bring about recombination.
Similarly, the immunoglobulin genes of mammals
have recombination signal sequences that are
involved in V-J joining: linking together a
variable gene segment and a joining segment to
form an immunoglobulin gene. Investigations
in mammals (mice and humans) asked whether normal
meiotic recombination also depends on the local
DNA sequence. Analysis of recombination, or
crossing over (the process in which two
homologous chromosomes exchange large portions of
their DNA), showed that chromosomes have local
recombination hotspots (RH).
H.A. Do you know why V-J joining is crucial for
the Ig gene?
RH are local regions (about 1.5 to 2.0 kb in
width) of chromosomes in which crossing over is
much more likely to occur than in other places on
the chromosome. The rate within hotspots can be
hundreds or thousands of times that of the
surrounding regions, the so-called coldspots.
10The important question: are there specific
sequences for RH? As you understand, a
positive answer would give us markers for RH. It
is known that directed mutagenesis of single
nucleotides can disrupt hotspot activity, and
different alleles of the same locus can show
differences in recombination. So RH show strong
sequence specificity. However, no
sequence motif has been identified as causing
recombination hotspots. Recent studies of these
hotspots (Genome Biology 2004, 5242) show that
they do not share common sequence motifs. No
single factor was consistently associated with
the presence of hotspots: neither GC content,
the frequency of CpG dinucleotides, the presence
of (AC)n repeats, nor any primary DNA sequence
motif that had previously been hypothesized to
influence the existence of hotspots.
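To make the candidate factors listed above concrete, here is a minimal sketch of how GC content, CpG dinucleotide frequency and (AC)n repeats could be computed for a DNA sequence. The function names and the toy sequence are hypothetical illustrations, not the methods used in the cited studies.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_frequency(seq: str) -> float:
    """Fraction of dinucleotide positions occupied by CpG (C followed by G)."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    return seq.count("CG") / (len(seq) - 1)


def longest_ac_repeat(seq: str) -> int:
    """Largest n such that (AC)n occurs as an uninterrupted run."""
    runs = re.findall(r"(?:AC)+", seq.upper())
    return max((len(run) // 2 for run in runs), default=0)


toy_sequence = "ACACACGCGTTA"  # made-up example, not real genomic data
print(gc_content(toy_sequence))
print(cpg_frequency(toy_sequence))
print(longest_ac_repeat(toy_sequence))
```

Features like these can be computed for hotspot and coldspot regions and then compared; according to the slide, none of them was consistently associated with hotspots.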
11However, signals for RH may be provided by
positions that are preferred for double-strand
breaks and by regions of a non-B form of DNA
(such as Z-DNA). What do we know about these
sequence patterns?
To my knowledge, not very much. Z-DNA is
most stable for alternating GCGC sequences and
less stable for alternating GTGTACAC, and there
is no information about sequence patterns
specific to double-strand breaks.
H.A. The data about sequence patterns of RH
refer to a paper from 2004. Please check
papers on RH from recent years to find new
results, if any, about their sequence patterns.
12The important question: how many RH are there in
a human genome? Statistical analyses of genetic
variation data and of patterns in linkage
disequilibrium were used to identify over 25,000
hotspots in the human genome (Science 2005 321).
Linkage disequilibrium is a very important
concept in genetics. First, what is linkage
equilibrium? It describes the situation in which
the haplotype frequencies in a population have
the same values that they would have if the genes
at each locus were combined at random. It is
easiest to illustrate by thinking of two genes,
each with two alleles. If there is nothing
particular going on in the population, what
association would we expect between the two
alleles? They will not be any more likely to
co-occur than you would expect on the basis of
their separate frequencies in the population;
that is known as linkage equilibrium. The
opposite is linkage disequilibrium, where there
is a non-random association between the alleles
at different loci.
13Linkage disequilibrium describes a situation in
which some combinations of alleles at any
positions in a genome, not necessarily on the
same chromosome, occur more or less frequently in
a population than would be expected from a random
formation of haplotypes from alleles based on
their frequencies. These non-random associations
are measured by the degree of linkage
disequilibrium (LD).
Why is LD useful?
See "Population genomics: Linkage disequilibrium
holds the key" (Current Biology, 2001). For
simple single-locus diseases it is possible to
determine the rough genomic position of the
causal mutation.
Unfortunately, the common diseases responsible
for the vast majority of mortality and morbidity
are anything but simple. Most cancers and
cardiovascular, neuro-psychiatric, respiratory
and infectious diseases are influenced by
variation at multiple loci and show complicated
dependence on environmental factors.
14To see how linkage disequilibrium could be used
to map disease genes, consider a locus M, with
two alleles M1 and M2, and an unknown causative
locus B, with one allele B1 that is a risk factor
for high blood pressure relative to the other
allele B2. Now imagine that the M and B loci
are in disequilibrium, specifically that the
M1 allele is found more often with the B1 allele
than with the B2 allele. In that case, not only
is B1 elevated in cases, but M1 is too, because
of its association with B1 (that is, because of
disequilibrium). Thus if we did not know about
the B locus but typed the M locus, it could lead
us to the B locus. But what exactly does such an
association indicate?
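The argument on this slide can be illustrated numerically. The sketch below is a toy model under stated assumptions: chromosomes carrying B1 are twice as likely as those carrying B2 to end up in the case group (a hypothetical haploid risk model), and all haplotype frequencies are invented for illustration.

```python
def allele_freq_in_cases(hap_freq, risk, marker_allele):
    """Frequency of marker_allele among case chromosomes.

    hap_freq maps (marker allele, risk-locus allele) to a population frequency.
    risk maps each risk-locus allele to a relative weight for being sampled
    into the case group (hypothetical haploid model).
    """
    weighted = {hap: freq * risk[hap[1]] for hap, freq in hap_freq.items()}
    total = sum(weighted.values())
    marker = sum(w for (m, _), w in weighted.items() if m == marker_allele)
    return marker / total


risk = {"B1": 2.0, "B2": 1.0}  # assumed relative risk of B1 versus B2

# In both populations M1 has frequency 0.5 and B1 has frequency 0.4.
in_disequilibrium = {("M1", "B1"): 0.3, ("M1", "B2"): 0.2,
                     ("M2", "B1"): 0.1, ("M2", "B2"): 0.4}
in_equilibrium = {("M1", "B1"): 0.2, ("M1", "B2"): 0.3,
                  ("M2", "B1"): 0.2, ("M2", "B2"): 0.3}

m1_cases_ld = allele_freq_in_cases(in_disequilibrium, risk, "M1")
m1_cases_le = allele_freq_in_cases(in_equilibrium, risk, "M1")
print(f"M1 in cases, loci in disequilibrium: {m1_cases_ld:.3f}")
print(f"M1 in cases, loci in equilibrium:    {m1_cases_le:.3f}")
```

With disequilibrium, the frequency of M1 rises from 0.50 in the population to about 0.57 in cases, even though M1 itself has no effect on risk. Under equilibrium it stays at 0.50, so typing M would reveal nothing about B.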
15LD arises as a consequence of three features of
life:
a) the physical structure of chromosomes;
b) the inherent mutations that occur at random
during DNA replication;
c) the rate of recombination between any two
given loci.
Example: Consider a locus with two alleles, A and
a, which is located in a non-coding or regulatory
region. During the replication of DNA prior to
meiosis a mutation occurs at a locus 200 kb away
and results in the conversion of a B allele to a
b allele on the chromosome that carries the a
allele. No recombination occurs between these two
loci during meiosis, so the pairs of alleles that
are passed on are AB on one chromosome and ab on
the other chromosome. At this point in time the
alleles are said to be in complete linkage
disequilibrium.
16The basis of measuring linkage disequilibrium is
the difference between the observed and expected
frequencies of pairs of alleles. For example, for
the two loci described above there are four
possible combinations of alleles:
Allele   A    a
B        AB   aB
b        Ab   ab
The frequency of the haplotype AB is denoted
$P_{AB}$, that of Ab is $P_{Ab}$, and so on.
How fast a pair of alleles at two given loci
approaches linkage equilibrium is simply a
function of the recombination rate between the
two loci, which is itself determined by the
physical and genetic distance between the loci
(these are often highly correlated).
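The difference between observed and expected haplotype frequencies described on this slide can be sketched as follows. The function name and the example frequencies are illustrative assumptions; the expected frequency of AB under equilibrium is taken to be the product of the two allele frequencies.

```python
def linkage_disequilibrium(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """Return D = observed P(AB) minus the P(AB) expected under equilibrium."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A
    p_B = p_AB + p_aB          # frequency of allele B
    expected_AB = p_A * p_B    # random combination of alleles
    return p_AB - expected_AB


# Only AB and ab haplotypes present: the two loci are completely associated.
d_complete = linkage_disequilibrium(0.5, 0.0, 0.0, 0.5)

# Haplotypes formed at random from P(A) = 0.6 and P(B) = 0.3: equilibrium.
d_equilibrium = linkage_disequilibrium(0.18, 0.42, 0.12, 0.28)

print(d_complete, d_equilibrium)
```

A value of D near zero indicates linkage equilibrium; the further D is from zero, the stronger the non-random association between the alleles.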
17But recombination rates vary throughout the
genome (some sites, the recombination hotspots,
have different rates of recombination), and where
a non-random association of alleles is observed
we have linkage DISequilibrium. It follows
that the frequency of the AB haplotype may not be
determined as a simple product of the frequencies
of the alleles. Then the frequency with which
any pair of alleles will be present together in
the next generation, $P'_{AB}$, is given in terms
of:
$P_{AB}$ - frequency of the AB haplotype in the
current generation,
$P_A$ - frequency of allele A in the current
generation,
$P_B$ - frequency of allele B in the current
generation,
$\theta$ - recombination fraction,
so $(1 - \theta)$ is the probability that
recombination does not occur between the loci:
$$P'_{AB} = \theta P_A P_B + (1 - \theta) P_{AB}$$
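The recursion on this slide can be iterated to see how quickly the association decays. The sketch below assumes that the allele frequencies of A and B stay constant from one generation to the next (no selection, drift or new mutation), which is a simplification not stated on the slide. Under that assumption, subtracting the product of the allele frequencies from both sides shows that the excess of AB over its expected frequency shrinks by a factor of (1 - theta) each generation. The function names are illustrative.

```python
def next_generation(p_AB: float, p_A: float, p_B: float, theta: float) -> float:
    """Frequency of the AB haplotype in the next generation."""
    return theta * p_A * p_B + (1.0 - theta) * p_AB


def ld_trajectory(p_AB: float, p_A: float, p_B: float,
                  theta: float, generations: int) -> list:
    """Return D = P(AB) - P(A)P(B) for generations 0..generations."""
    values = []
    for _ in range(generations + 1):
        values.append(p_AB - p_A * p_B)
        p_AB = next_generation(p_AB, p_A, p_B, theta)
    return values


# Start from complete disequilibrium with a recombination fraction of 0.1.
trajectory = ld_trajectory(p_AB=0.5, p_A=0.5, p_B=0.5, theta=0.1, generations=3)
for generation, d in enumerate(trajectory):
    print(generation, round(d, 4))
```

A larger recombination fraction, as expected between loci separated by a hotspot, makes the association decay faster; a recombination fraction near zero preserves it for many generations.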
18The important question: why is knowledge of RH
location of strong interest? The paper
"What's So Hot about Recombination Hotspots?"
gives two answers to this question. The first
answer: the existence of recombination hotspots
offers a way to learn what other processes are
associated with recombination.
For example, we know that homologous
crossovers are initiated by the cleavage of
single chromosomes, called double-strand breaks
(a break in both strands of a DNA molecule). It
turns out that because of this causal linkage,
the hotspots for double-strand breaks and the
hotspots for recombination are one and the same.
19The second answer: DNA sequence patterns of RH
could be used to map the position of alleles that
cause disease. When multiple copies of the
DNA sequence of a gene are aligned, they reveal
the location and distribution of variation at
individual nucleotide positions: single
nucleotide polymorphisms (SNPs).
It has long been known that SNPs that are
adjacent or near each other tend to be highly
correlated in their pattern and to exhibit strong
LD.
H.A. Try to think about and answer the question:
why is knowledge of RH location of strong
interest? Two answers are given here, but maybe
you will suggest more.
20It is this LD that enables scientists to map the
locations of mutations that cause heritable
genetic diseases. If alleles that cause a disease
have the same kind of LD with nearby SNPs as SNPs
generally have with each other, then one could
search for genes with disease alleles by looking
for a pattern of SNPs that is found only in
people who have the disease. This general method
for mapping disease alleles is called association
mapping.
H.A. Please explain this idea.
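One simple way to look for a SNP pattern that differs between people with and without the disease is to compare allele counts in cases and controls. The sketch below uses a Pearson chi-square statistic for a 2x2 table; this is one common choice rather than a method named on the slides, and the allele counts are invented for illustration.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = cases/controls, columns = allele 1/allele 2."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical allele counts at two SNPs (cases first, then controls).
associated_snp = chi_square_2x2(60, 40, 40, 60)
unassociated_snp = chi_square_2x2(50, 50, 50, 50)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
CRITICAL_VALUE_5_PERCENT = 3.841
print(associated_snp, associated_snp > CRITICAL_VALUE_5_PERCENT)
print(unassociated_snp, unassociated_snp > CRITICAL_VALUE_5_PERCENT)
```

A SNP whose statistic exceeds the threshold is a candidate marker: because of LD, the disease allele is then likely to lie nearby. In a real study many SNPs are tested at once, so the threshold has to be adjusted for multiple testing.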
22Sequences of nucleotides at known polymorphic
sites are very long, so to find LD we need
well-developed statistical methods.
See, for example, The Mathematical Genetics
Group in Oxford:
http://mathgen.stats.ox.ac.uk/mathgenindex.html
H.A. From this figure, find the SNP associated
with the affected cases.
23Now it is time to return to the earlier slide:
however, despite the DNA resemblance, a
surprising finding was reported in the Science
(2005) paper "Comparison of Fine-Scale
Recombination Rates in Humans and Chimpanzees".
The mapping of recombination hotspots in the
human and chimpanzee genomes reveals that,
despite 99% identity between human and chimpanzee
DNA sequences, the locations of recombination
hotspots are almost entirely different.
This difference is intriguing because in most
cases, when DNA sequence is highly similar
between two species, the biological function of
that DNA is predictably similar as well. We would
therefore expect the same for human and chimp,
but this new finding shows that it is not so even
for such close relatives. For example, hotspots
in the human beta-globin and human leukocyte
antigen gene regions were found to be absent in
chimpanzees.
24Why these hotspots occur, and what triggers the
swapping of DNA at those particular points, is a
mystery. One theory was that the DNA code on
either side of hotspots controlled the activity.
However, when the researchers compared chimps and
humans for the new study, they were startled to
find that despite being so genetically similar,
the species have totally different RH. This
difference is intriguing because if chimps and
humans do not share these RH, then something
other than the surrounding DNA code must be
controlling the process of recombination, since
the surrounding DNA code in chimps and humans is
pretty much identical.
This means that recombination is even more
mysterious than we already thought: what is
controlling it, and why does it occur so often at
these particular places?
25This difference is intriguing because the
recombination landscape must be evolving
extremely quickly. In humans and chimpanzees, the
genome as a whole is very similar but the
recombination hotspots are totally different, so
hotspots must be evolving much, much faster than
the rest of the genome. That adds extra mystery
to what drives these hotspots: why do they evolve
so quickly? It brings new interest to the
problem of recombination rates in humans and
chimpanzees.
26This difference is intriguing because it is
possible that hotspot discordance could result
from the substantial differences seen in the
demographic histories or population structures of
humans and chimpanzees. For example, the extent
of genetic differentiation between western and
central chimpanzees is much stronger than what is
seen between human populations. This suggests
that careful attention should be paid to
geographic sampling in studies of chimpanzee
genetic variation.
27This difference is intriguing because evidence
is lacking that population history or local DNA
sequence variation can account for hotspot
location. It was suggested in the paper that
epigenetic factors that influence chromatin
configuration (for example, acetylation and
methylation) may be the key.
Epigenetics is a term that describes chromatin
and DNA modifications that do not involve changes
in the DNA sequence of the organism. These
epigenetic changes may be caused by chromatin
remodeling, such as post-translational
modification of the amino acids that make up
histone proteins, or by DNA methylation: the
addition of methyl groups to the DNA at CpG
sites, converting cytosine to 5-methylcytosine.
Computational epigenetics, which uses
bioinformatic methods to complement experimental
research in epigenetics, is now being widely
developed.
28This difference is intriguing because it
suggests a number of potentially useful studies
to understand this phenomenon:
a) examining regional variations in chromatin
accessibility;
b) examining the action of recombination-related
proteins, i.e. proteins which are involved in the
initiation of double-strand breaks and
recombination, and comparing these genes in
humans and chimpanzees;
c) examining transposable elements, which
suppress recombination in nearby sequences.
This difference is also intriguing because it
demonstrates the value of comparative genomic
analysis for understanding basic biological
processes such as recombination, and for
potentially improving the design of genetic
association studies.
H.A. Try to find and briefly describe new
research in which different genomes are analyzed.