The Human Genomes - PowerPoint PPT Presentation

1
The Human Genomes
  • Gil McVean, Department of Statistics, Oxford

2
Genetic variation among humans
http://www.ncbi.nlm.nih.gov/genome/guide/human/
3
How do we differ? Let me count the ways
  • Single nucleotide polymorphisms: 1 every few hundred
    bp, mutation rate 10^-9
  • Short indels (insertion/deletion): 1 every few kb,
    mutation rate very variable
  • Microsatellite (STR) repeat number: 1 every few kb,
    mutation rate 10^-3
  • Minisatellites: 1 every few kb, mutation rate 10^-1
  • Repeated genes: rRNA, histones
  • Large inversions, deletions: rare, e.g. Y chromosome

Mutation rates are per generation.
Aligned examples:
SNP: TGCATTGCGTAGGC / TGCATTCCGTAGGC
Indel: TGCATT---TAGGC / TGCATTCCGTAGGC
Microsatellite: TGCTCATCATCATCAGC / TGCTCATCA------GC
Figure labels: 100bp, 1-5kb
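The three aligned pairs on this slide can be read mechanically. The following is a minimal sketch, assuming the two sequences are already aligned to equal length and that gaps are written as "-"; the function name `classify_differences` is hypothetical and not part of the presentation.

```python
def classify_differences(seq_a, seq_b):
    """Return (number of mismatches, list of gap-run lengths) for an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = 0
    gap_runs = []   # length of each run of consecutive gap columns
    run = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            run += 1            # extend the current indel
            continue
        if run:                 # a gap run has just ended
            gap_runs.append(run)
            run = 0
        if a != b:
            mismatches += 1     # single-base substitution
    if run:
        gap_runs.append(run)
    return mismatches, gap_runs


print(classify_differences("TGCATTGCGTAGGC", "TGCATTCCGTAGGC"))        # (1, [])
print(classify_differences("TGCATT---TAGGC", "TGCATTCCGTAGGC"))        # (0, [3])
print(classify_differences("TGCTCATCATCATCAGC", "TGCTCATCA------GC"))  # (0, [6])
```

The third pair differs by a 6 bp gap, i.e. two copies of the TCA repeat unit, which is what distinguishes a repeat-number change from a generic indel.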
4
Y chromosome variation
  • Non-pathological rearrangements of the AZFc
    region on the Y chromosome

Tyler-Smith and McVean (2003)
5
Serological techniques for detecting variation
Figure labels: Rabbit, Human; blood groups A, B, AB, O
6
Blood group systems in humans
Blood group system | Locus | Chromosomal location | Number of genes | Function of product | Number of alleles | Gene alterations
ABO | ABO | 9q34 | 1 | Enzyme (glycosyltransferase) | 102 | Mutations, insertions, deletions, gene rearrangements
Chido-Rodgers | C4A, C4B | 6p21.3 | 2 | Complement factors | 7 | Mutations, duplications, gene rearrangements
Colton | AQP1 | 7p14 | 1 | Channel | 7 | Mutations, insertions, deletions
Cromer | DAF | 1q32 | 1 | Complement binding protein | 10 | Mutations
Diego | SLC4A1 | 17q21-q22 | 1 (erythroid and non-erythroid) | Anion exchanger, adhesion | 78 | Mutations, insertions, deletions
Dombrock | DO | 12p12.3 | 1 | Not known | 9 | Mutations, one deletion
Duffy | FY | 1q22-q23 | 1 | Receptor | 9 | Mutations, one deletion
Gerbich (Ge) | GYPC | 2q14-q21 | 1 | Cytoskeleton? | 9 | Mutations, gene rearrangements
GIL | AQP3 | 9p13 | 1 | Channel | 2 | Mutation, splice site
H/h | FUT1, FUT2 (pseudogene) | 19q13.3 | 2 | Enzymes (glycosyltransferases) | 27 FUT1, 22 FUT2 | Mutations, insertions, deletions, one unequal homologous recombination
I | GCNT2 (IGnT) | 6p24 | 1 | Enzyme (glycosyltransferase) | 7 | Mutations, exon deletion
Indian (IN) | CD44 | 11p13 | 1 | Adhesion molecule | 2 | Mutations
Kell (with Kx blood group system) | KEL, XK | 7q33 (KEL), Xp21 (XK) | 2 (KEL, XK) | KEL enzyme, XK transporter? | 33 KEL, 30 XK | Mutations, deletions, insertion, gene deletions in XK
Kidd | SLC14A1 | 18q12-q21 | 1 | Transporter | 8 | Mutations
Knops | CR1 | 1q32 | 1 | Receptor | 24 (tentative, because of multiple mutations and gene rearrangements) | Mutations, deletions, duplications
Landsteiner-Wiener | ICAM4 (LW) | 19p13.3 | 1 | Adhesion molecule | 3 | Mutation, one deletion
Lewis | FUT3 (FUT6, FUT7 also included: same family but do not result in a blood group phenotype) | 19p13 | 1 (FUT3), 2 (FUT6, FUT7) | Enzyme (glycosyltransferase); enzymes (glycosyltransferases) | 14, 20 | Mutations; mutations, one insertion
Lutheran | LU | 19q13.2-13.3 | 1 | Adhesion molecule | 16 | Mutations
MNS | GYPA, GYPB, GYPE | 4q28-31 | 3 (GYPA, GYPB, GYPE) | Not known | 43 | Unequal homologous recombinations, gene conversions, mutations
OK | BSG | 19p13.3 | 1 | Factor adhesion | 2 | Mutations
P-related (includes P1 and globoside blood group systems) | A4GALT, B3GALT3 | 22q11.2-q13.2 (A4GALT), 3q25 (B3GALT3) | 2 | Enzymes (glycosyltransferases) | 14 A4GALT, 5 B3GALT | Mutations, insertions, deletions
RAPH-MER2 | CD151 | 11p15.5 | 1 | Adhesion molecule | 3 | Mutations
Rh | RHCE, RHD, RHAG, RHBG, RHCG | 1p34-36 (RHCE, RHD), 6p11-21.1 (RHAG) | 5 | Transporters | 116 RHCE, RHD; 13 RHAG | Gene conversions, mutations, deletions for RHCE, RHD; recombinations for RHD; mutations for RHAG
Scianna | ERMAP | 1p34.1 | 1 | Adhesion, receptor molecule? | 4 | Mutations
Xg | XG, CD99 (MIC2) | Xp22-33 | 2 | Unknown; adhesion molecule | So far none documented | Polymorphism based on level of expression?
YT | ACHE | 7q22.1 | 1 | Enzyme | 4 | Mutations, one deletion
  • 28 known systems
  • 39 genes, 643 alleles

System Genes Alleles
ABO ABO 102
Chido-Rodgers C4A, C4B 7
Colton AQP1 7
Cromer DAF 10
Diego SLC4A1 78
Dombrock DO 9
Duffy FY 9
Gerbich GYPC 9
GIL AQP3 2
H/h FUT1, FUT2 27/22
I GCNT2 7
Indian CD44 2
Kell KEL, XK 33/30
Kidd SLC14A1 8
Knops CR1 24
Landsteiner-Wiener ICAM4 3
Lewis FUT3, FUT6 14/20
Lutheran LU 16
MNS GYPA,GYPB,GYPE 43
OK BSG 2
P-related A4GALT, B3GALT3 14/5
RAPH-MER2 CD151 3
Rh RHCE, RHD, RHAG 129
Scianna ERMAP 4
Xg XG, CD99 -
YT ACHE 4
http://www.bioc.aecom.yu.edu/bgmut/summary.htm
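The totals quoted on this slide (39 genes, 643 alleles) can be checked against the table. The sketch below is only a consistency check; the gene counts are as read from the detailed table on this slide, and the variable name `BLOOD_GROUPS` is hypothetical.

```python
# (system, number of genes, allele counts per gene or gene group)
BLOOD_GROUPS = [
    ("ABO", 1, [102]), ("Chido-Rodgers", 2, [7]), ("Colton", 1, [7]),
    ("Cromer", 1, [10]), ("Diego", 1, [78]), ("Dombrock", 1, [9]),
    ("Duffy", 1, [9]), ("Gerbich", 1, [9]), ("GIL", 1, [2]),
    ("H/h", 2, [27, 22]), ("I", 1, [7]), ("Indian", 1, [2]),
    ("Kell", 2, [33, 30]), ("Kidd", 1, [8]), ("Knops", 1, [24]),
    ("Landsteiner-Wiener", 1, [3]), ("Lewis", 3, [14, 20]),
    ("Lutheran", 1, [16]), ("MNS", 3, [43]), ("OK", 1, [2]),
    ("P-related", 2, [14, 5]), ("RAPH-MER2", 1, [3]),
    ("Rh", 5, [116, 13]), ("Scianna", 1, [4]),
    ("Xg", 2, []),            # no alleles documented on the slide
    ("YT", 1, [4]),
]

n_rows = len(BLOOD_GROUPS)
n_genes = sum(genes for _, genes, _ in BLOOD_GROUPS)
n_alleles = sum(sum(alleles) for _, _, alleles in BLOOD_GROUPS)
print(n_rows, n_genes, n_alleles)  # 26 39 643
```

The table has 26 rows; two rows each bundle more than one system (Kell with Kx, and P-related covering P1 and globoside), which would account for the 28 systems quoted.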
7
HLA diversity at the MHC locus
Figure: map of the MHC locus at 6p21.3 (4 Mbp, c. 127 genes)
Class II: HLA-D (18 genes): DP, DQ, DR
Class III: C4, C2, TNFa,b
Class I: HLA-B, HLA-C, HLA-A
8
Protein electrophoresis
Starch or agar gel
Figure: bands on the gel, with the direction of travel marked
Lewontin and Hubby (1966), Harris (1966)
9
The rise of DNA sequence analysis
  • RFLPs (Cann et al 1987)
  • Sequencing of small regions (Vigilant et al 1991)
  • Whole genome sequencing (Ingman et al 2000)

10
Different, but not that different
  • Humans are one of the least diverse organisms
    (excepting cheetahs)

Species Diversity (percent)
Humans 0.08 - 0.1
Chimpanzees 0.12 - 0.17
Drosophila simulans 2
E. coli 5
HIV1 30
Photos from UN photo gallery www.un.org/av/photo
11
The biological significance of genetic variation
  • Genetic variation must underlie both pathological
    and non-pathological traits that show significant
    heritability
  • How do we locate these variants, and what use is
    finding them?
  • Genetic variation has been influenced by several
    million years of human existence.
  • How have human populations evolved over
    pre-historical times?
  • The distribution of variation is influenced by
    fundamental evolutionary processes
  • How have mutation, selection and recombination
    shaped the human genome?

12
Differences between autosomes, sex chromosomes,
mtDNA
Genome Average pairwise differences / kb Relative copy number (a)
Autosomes 0.5 - 0.85 1
X chromosome 0.47 3/4
Y chromosome 0.15 1/4
mtDNA 2.8 1/4
TISMWG (2001); Jobling, Hurles, Tyler-Smith (2004)
  • Under very simple models of populations, the average
    number of pairwise differences is predicted by the
    formula (formula not transcribed)
  • If μ = 1.5 x 10^-9 per site per generation, this
    implies that the human population is < 15,000
  • Population geneticists refer to this number as
    the effective population size
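The formula on this slide is missing from the transcript. A minimal sketch of the calculation follows, assuming the standard neutral-model expectation pi = 4 * N * mu * a, where a is the relative copy number from the table. That formula, and the function name `effective_size`, are assumptions and not taken from the slide.

```python
def effective_size(pi_per_site, mu, copy_number=1.0):
    """Solve pi = 4 * N * mu * a for N (assumed neutral-model expectation)."""
    return pi_per_site / (4.0 * mu * copy_number)


# Autosomal diversity range from the table, converted from per kb to per site.
for pi_per_kb in (0.5, 0.85):
    pi = pi_per_kb / 1000.0
    print(pi_per_kb,
          round(effective_size(pi, 1.5e-9)),   # rate as transcribed
          round(effective_size(pi, 1.5e-8)))   # rate ten times higher
```

Under this assumed formula, a rate of 1.5 x 10^-8 gives values below 15,000, while 1.5 x 10^-9 gives values roughly ten times larger, so the transcribed exponent or the lost formula may differ from this sketch.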

13
Demographic factors affecting diversity
  • Diversity is influenced by demographic factors
    such as
  • Variance in reproductive success
  • Differences in variance of success between males
    and females
  • Heritability of reproductive success
  • Changes in population size (growth, bottlenecks,
    natural fluctuations)
  • Which effects are most important?
  • Iceland: faster drift in matrilines due to
    shorter generation interval, but no differences
    between the sexes (Helgason et al 2003)
  • Quebec: heritability of reproductive success
    reduces diversity by more than an order of
    magnitude (Austerlitz and Heyer 1998)
  • The effective population size (Ne) is an
    approximation that allows simple mathematical
    models of populations to be applied to real data
  • Ne << N
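Two textbook approximations illustrate why Ne can be much smaller than the census size N. They are not taken from the slides; the function names are hypothetical, and both formulas assume an otherwise idealised (Wright-Fisher) population.

```python
def ne_fluctuating(sizes):
    """Long-term Ne as the harmonic mean of per-generation population sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def ne_unequal_sexes(n_males, n_females):
    """Ne when the numbers of breeding males and females differ."""
    return 4.0 * n_males * n_females / (n_males + n_females)


# Nine generations at 10,000 and a single bottleneck generation of 100.
print(ne_fluctuating([10000] * 9 + [100]))

# 1,000 breeders, but only 100 of them male.
print(ne_unequal_sexes(100, 900))  # 360.0
```

A single bottleneck generation dominates the harmonic mean, which is one way a large census population can carry the diversity of a much smaller one.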

14
Diversity is not randomly distributed across the
genome
Chromosome 6
TISMWG (2001)
15
Correlates and determinants of diversity
  • There is systematic variation in the mutation
    rate along chromosomes
  • Wolfe and Sharp (1987), Lercher et al (2001)
  • Levels of diversity correlate with recombination
    rates
  • Nachman et al (1998)
  • Diversity and the allele frequency spectrum of
    SNPs are influenced by the local GC content
    (above CpG frequency)
  • Eyre-Walker (1999), Smith and Eyre-Walker (2001),
    Lercher et al (2002)
  • Recombination rates are correlated (to some
    degree) with GC content
  • Eyre-Walker (1993), Fullerton et al (2001), Kong
    et al (2002)

Lercher and Hurst 2002
Lercher et al (2001)
16
What is the link between recombination and
diversity?
  • A positive correlation between recombination rate
    and diversity could mean
  • Recombination is mutagenic
  • Diversity promotes recombination
  • Recombination and mutation are linked by a third
    factor (chromatin accessibility, transcription,
    Hill-Robertson effects)

Figure labels: Mutation, Hitch-hiking
Hellmann et al 2003
17
Diversity is not evenly distributed across genes I
  • Adaptive evolution wipes out diversity nearby
    due to the hitch-hiking effects of a selective
    sweep
  • e.g. Duffy-null locus in sub-Saharan Africa,
    protects against P. vivax
  • Hamblin and Di Rienzo (2000)

Figure: haplotypes around the FYO mutation in African (Pop1, Pop2) and European samples
Legend: Ancestral allele, Derived allele, Missing data
18
Diversity is not evenly distributed across genes
II
  • Purifying selection eliminates deleterious
    mutations and reduces diversity in regions of
    strong functional constraint

Zhao et al (2003)
19
Diversity is not evenly distributed across genes
III
  • Some genes are under balancing or diversifying
    selection, where diversity is actively selected
    for
  • MHC complex: heterozygote advantage and
    frequency-dependent selection driven by
    recognition of pathogens

Horton et al (1998)
20
Diversity is not evenly distributed across
populations I
  • African populations are more diverse than
    non-African populations
  • More polymorphisms
  • Polymorphisms at less skewed frequencies
  • Why?
  • Out-of-Africa event associated with a bottleneck?
  • Selection on genome in adaptation to novel
    habitats?

Population Segregating sites per kb (n = 30) Diversity per kb Tajima D statistic
Hausa (African) 4.8 0.11 -0.33
Italian 3.2 0.10 1.18
Chinese 3.0 0.07 1.19
Frisse et al (2001)
21
The Tajima D statistic
  • Measures departure from neutral coalescent
    expectations in allele frequency distribution
  • +ve values indicate excess of intermediate
    frequency variants
  • -ve values indicate excess of low-frequency
    variants
  • E.g. human mtDNA

Figure: observed and expected number of sites against rare allele frequency
Data from Ingman et al (2000)
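The sign convention above can be illustrated with a small calculation. The following is a minimal sketch of Tajima's D using the standard 1989 constants, assuming equal-length haplotype strings with no missing data; the function name `tajimas_d` and the toy samples are illustrative only.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D from a list of equal-length haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    columns = list(zip(*haplotypes))
    s = sum(1 for col in columns if len(set(col)) > 1)   # segregating sites
    if s == 0:
        raise ValueError("no segregating sites")
    n_pairs = n * (n - 1) / 2
    # Mean number of pairwise differences across the whole region.
    pi = sum(sum(x != y for x, y in zip(h1, h2))
             for h1, h2 in combinations(haplotypes, 2)) / n_pairs
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


intermediate = ["AA", "AA", "TT", "TT"]          # both variants at frequency 2/4
singletons = ["TAAA", "ATAA", "AATA", "AAAA"]    # every variant seen once
print(tajimas_d(intermediate) > 0, tajimas_d(singletons) < 0)  # True True
```

The statistic compares two estimates of diversity: one from pairwise differences, which is weighted toward intermediate-frequency variants, and one from the count of segregating sites, which treats all variants equally.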
22
Diversity is not evenly distributed across
populations II
  • Small, isolated populations often have skewed
    allele frequencies (+ve Tajima D) due to founder
    effects and high degree of genetic drift
  • Marginal populations (Evenki, Saami)
  • Island populations (Iceland, Sardinia)

Figure: minor allele frequencies at 50 SNPs in Finns, Saami, Swedes and Evenki (Kaessmann et al 2002)
23
The second dimension of human diversity!
  • The distributions of alleles at different loci are
    not independent
  • Correlations between SNPs are particularly strong
    for those < 50 kb apart
  • These correlations indicate shared evolutionary
    history

Figure axes: Chromosomes, Sites
Panels:
Chromosome 22, 1 Mb, 57 Europeans
Lipoprotein Lipase, 10 kb, 48 African Americans
Xq13, 10 kb, 69 worldwide
24
Correlations between SNPs are measured by linkage
disequilibrium
Linkage equilibrium
Linkage disequilibrium
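The usual pairwise measures can be computed directly from haplotype counts. This is a minimal sketch, assuming phased haplotypes at two biallelic SNPs with alleles coded 0/1; the function name `ld_stats` is hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, |D'|, r^2) for a list of (allele_at_snp1, allele_at_snp2) pairs."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                           # zero under linkage equilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / denom


equilibrium = [(1, 1), (1, 0), (0, 1), (0, 0)]     # all four haplotypes, equal frequency
disequilibrium = [(1, 1), (1, 1), (0, 0), (0, 0)]  # only two haplotypes present
print(ld_stats(equilibrium))      # (0.0, 0.0, 0.0)
print(ld_stats(disequilibrium))   # (0.25, 1.0, 1.0)
```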
25
Why are SNPs correlated?
  • The mutation arises on a particular genetic
    background
  • If the mutation increases in frequency by drift
    (or selection), the associated haplotype will also
    increase in frequency
  • Over time, the association between the new
    mutation and linked mutations will decay by
    recombination
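The decay in the last step has a simple deterministic form in a very large, randomly mating population: D after t generations is D0 * (1 - r)^t, where r is the recombination fraction between the two sites. This is a standard textbook result rather than something stated on the slide, and the function names below are hypothetical.

```python
import math


def ld_after(d0, r, generations):
    """Expected D after a number of generations, ignoring drift and selection."""
    return d0 * (1.0 - r) ** generations


def half_life(r):
    """Generations needed for D to halve, for 0 < r < 1."""
    return math.log(0.5) / math.log(1.0 - r)


print(ld_after(0.25, 0.5, 1))   # unlinked loci: D halves every generation -> 0.125
print(half_life(1e-4))          # tightly linked sites keep their LD far longer
```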
26
What generates and destroys LD?
  • Genetic drift: stochastic sampling process in a
    finite population
  • Population structure and admixture: correlations
    between mutations arising through shared
    population history
  • Natural selection: combinations of
    favoured/unfavoured alleles (weak force)
  • Recombination is the ONLY force which breaks down
    LD
  • LD is a balance between recombination and other
    forces

27
Empirical patterns of LD
  • Large-scale surveys of LD in humans
  • e.g. Huttley et al. (1999), Abecasis et al.
    (2001), Reich et al. (2001)
  • LD extends over considerable distance (>> 10 kb) in
    most populations

Figure: D against distance (kb: 1, 5, 10, 20, 40, 80, 160, unlinked), compared with the Kruglyak prediction
Reich et al. (2001)
28
Differences between populations
Figure axis label: r^2
  • African populations show less LD than European
    populations (e.g. Frisse et al. 2001)
  • Small, isolated populations (e.g. Saami, Evenki)
    show increased LD (Kaessmann et al 2002)
  • Founder populations (e.g. Finland, Sardinia) do
    not always show increased LD (e.g. Eaves et al.
    2000)

29
Assessing the contribution of structure to LD
  • Rosenberg et al. (2002)
  • Population differences in allele frequency exist,
    but many markers/loci are required in order to
    estimate ethnic origin with accuracy
  • Admixture between populations has played an
    important historical role

Figure labels: Oceania, America, Asia, Middle East, Europe, Africa
30
Differences between genomic regions
Figure label: Average D
Dawson et al (2002), Reich et al (2001)
  • Evidence for heterogeneity in LD along/between
    chromosomes
  • Taillon-Miller et al (2000), Jeffreys et al
    (2001), Daly et al (2001), Patil et al (2001),
    Reich et al (2001), Reich et al (2002), Gabriel
    et al (2002), Dawson et al (2002), Phillips et al
    (2002)

31
Differences within genomic regions
Jeffreys et al (2001)
32
Recombination hot-spots in the MHC region
Jeffreys et al (2001)
  • Other genes with recombination hot-spots
  • β-globin
  • PAR/SHOX
  • MS32
  • (Chi sequences)

33
In an ideal block world...
Pääbo (2003)
  • Blocks extend many (> 100) kb.
  • All alleles within blocks are in strong
    association.
  • There are no associations between blocks.
  • In each block, only a few (4-5) haplotypes
    account for the majority (> 90%) of variation.
  • In each block, only a few SNPs are required to
    map the majority of haplotype variation.
  • Block boundaries correspond to recombination
    hot-spots.

Association studies suddenly look much less
difficult... Goldstein (2001)
34
The International HapMap Project
  • International partnership of scientists and
    funding agencies from Canada, China, Japan,
    Nigeria, the United Kingdom and the United States
    to develop a public resource that will help
    researchers find genes associated with human
    disease and response to pharmaceuticals
  • Gibbs et al (2003)
  • Aims to survey variation across the entire human
    genome at 1 SNP per 5kb or less, in three
    populations (CEPH Europeans, Chinese/Japanese,
    Yoruban Africans). More than 600,000 SNPs with
    MAF > 5%
  • http://www.hapmap.org/
  • All data is public access and available through
    the Data Coordination Center (DCC)

35
How are blocks defined?
  • Incompatibility through the four-gamete test
    (Wang et al. 2002)
  • Regions with consistently high pairwise LD
    measures (Gabriel et al. 2002)
  • Dynamic programming solutions based on measures
    of pairwise LD structure (Zhang et al. 2002)
  • Dynamic programming solutions based on minimum
    description length, i.e. information theoretic
    principles (Koivisto et al. 2003, Anderson and
    Novembre 2003)
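The four-gamete test in the first definition is easy to state in code. The sketch below is a simplified illustration, assuming phased biallelic haplotypes given as strings of "0" and "1"; the greedy left-to-right block scan is an assumption made for this example and is not the published procedure of Wang et al. (2002).

```python
def four_gametes(haplotypes, i, j):
    """True if all four allele combinations occur at sites i and j.

    Under an infinite-sites model this implies at least one recombination
    event between the two sites.
    """
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def greedy_blocks(haplotypes):
    """Split sites into runs in which no pair of sites shows four gametes."""
    n_sites = len(haplotypes[0])
    blocks = [[0]]
    for j in range(1, n_sites):
        if any(four_gametes(haplotypes, i, j) for i in blocks[-1]):
            blocks.append([j])       # incompatibility: start a new block
        else:
            blocks[-1].append(j)     # still compatible with the whole block
    return blocks


haps = ["000", "001", "110", "111"]
print(four_gametes(haps, 0, 1))  # False
print(greedy_blocks(haps))       # [[0, 1], [2]]
```

Sites 0 and 1 show only two haplotypes and stay together; site 2 shows all four combinations with both of them and so starts a new block.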

36
Empirical block pattern
Figure labels: Blocks, Length, match, frequencies
Daly et al (2001)
37
Problems with blocks
  • Block definitions depend on marker spacing,
    allele frequency and algorithm.
  • Blocks (as defined by some algorithms) may not
    reflect variation in the recombination rate

All reported mean block lengths consistent with
uniform recombination (± 1 SD)
Phillips et al (2003)
38
Do we need haplotype blocks?
  • The key determinant of LD is recombination
  • True haplotype blocks are formed by regions of
    low recombination separated by recombination
    hotspots
  • If we knew the fine-scale (<< Mb) structure of
    recombination-rate variation, blocks would not be
    necessary
  • Genetic maps estimated from pedigree studies show
    recombination rate variation
  • BUT do not have the resolution to define
    recombination hotspots

Chromosome 3, Kong et al (2002)
39
Learning about recombination from diversity
  • We can estimate the fine-scale structure of
    recombination rates from patterns of genetic
    variation

Figure labels: Rate estimates from sperm (Jeffreys et al 2001); Genes
n = 50 unrelated European genotypes
40
Comparison with pedigree-based maps
  • Summing fine-scale estimates over 2Mb intervals
    accurately recovers variation in recombination
    rate detected by pedigree studies

Figure panels: Chromosome 22, Chromosome 19
Axes: Sex-averaged recombination rate (cM/Mb) against Position (kb)
Legend: Pedigree, Population genetic; Markers for pedigree-based map
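Summing fine-scale estimates over fixed intervals amounts to integrating the rate over each window. The following is a minimal sketch, assuming the fine-scale map is given as sorted, non-overlapping segments of (start in Mb, end in Mb, rate in cM/Mb) starting at position 0; the function name `window_rates` and the toy map are hypothetical.

```python
import math


def window_rates(segments, window_mb=2.0):
    """Average recombination rate (cM/Mb) in consecutive windows."""
    end = max(e for _, e, _ in segments)
    n_windows = math.ceil(end / window_mb)
    cm = [0.0] * n_windows                  # genetic length per window, in cM
    for start, stop, rate in segments:
        w = int(start // window_mb)
        while start < stop:
            # Clip the segment at the right-hand edge of the current window.
            w_end = min(stop, (w + 1) * window_mb)
            cm[w] += rate * (w_end - start)
            start = w_end
            w += 1
    # A partial final window is still divided by the full window width.
    return [c / window_mb for c in cm]


# A 0.1 Mb hotspot at 20 cM/Mb inside an otherwise cold 4 Mb region.
fine_map = [(0.0, 1.0, 0.5), (1.0, 1.1, 20.0), (1.1, 4.0, 0.5)]
print(window_rates(fine_map))
```

In this toy map the hotspot contributes 2 cM of the first window's 2.95 cM, yet the window average hides where that recombination happens, which is the resolution problem the pedigree maps have.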
41
A chromosomal view of recombination rate variation
  • 10Mb of Chromosome 20, 96 CEPH genotypes, 4337
    SNPs

Figure axes: Sex-averaged recombination rate (cM/Mb) against Position
42
  • What is the probability that there exists a SNP
    in this region that is NOT in LD with currently
    observed SNPs?

44
The answer depends on recombination
Recombination rate
If recombination is high, the untyped SNP is
unlikely to be in association
45
If recombination is low, the untyped SNP is
likely to be in association
Recombination rate
We can use population genetic methods to estimate
the recombination rate and predict the
distribution of the untyped SNP
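How strongly the answer depends on recombination can be illustrated with a classical approximation for the expected squared correlation between two sites, E[r^2] ≈ 1 / (1 + 4 * Ne * c), usually attributed to Sved (1971). It is not the method used in the presentation; the function name `expected_r2` and the values Ne = 10,000 and 1 cM/Mb are assumptions chosen for illustration.

```python
def expected_r2(ne, c):
    """Approximate expected r^2 between two sites with recombination fraction c."""
    return 1.0 / (1.0 + 4.0 * ne * c)


NE = 10000               # assumed effective population size
RATE_PER_KB = 1e-5       # 1 cM/Mb expressed as recombination fraction per kb

for distance_kb in (1, 10, 100):
    c = RATE_PER_KB * distance_kb
    print(distance_kb, round(expected_r2(NE, c), 3))
```

Raising the local recombination rate has the same effect as increasing the distance: the expected association with an untyped SNP falls off quickly.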
46
Hapmap challenges
  • Prediction: do the SNPs currently genotyped provide
    an accurate representation of variation at linked
    SNPs in other samples from the same population?
  • Selection of tagging SNPs: what is the smallest
    number of SNPs I need to type in order to achieve
    a given level of power?
  • Demography: are the results from one population
    transferable to other populations?
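One common way to approach the tagging question is a greedy search over pairwise r^2. The sketch below is an illustrative heuristic, not a method from the presentation: it assumes phased 0/1 haplotypes, treats a SNP as "tagged" if its r^2 with a chosen SNP is at least a threshold (0.8 here, an arbitrary choice), and uses hypothetical function names.

```python
def r_squared(col_a, col_b):
    """Squared correlation between two SNP columns coded 0/1."""
    n = len(col_a)
    p_a = sum(col_a) / n
    p_b = sum(col_b) / n
    p_ab = sum(x * y for x, y in zip(col_a, col_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                      # a monomorphic SNP tags nothing else
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Pick SNPs one at a time, each covering the most still-untagged SNPs."""
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # Every SNP covers itself, so the loop below always makes progress.
    covers = [{k for k in range(n_snps)
               if k == j or r_squared(cols[j], cols[k]) >= threshold}
              for j in range(n_snps)]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        best = max(range(n_snps),
                   key=lambda j: (len(covers[j] & untagged), -j))
        tags.append(best)
        untagged -= covers[best]
    return tags


haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(greedy_tags(haps))  # [0, 2]
```

SNPs 0 and 1 are perfectly correlated, so typing one of them is enough; SNP 2 is independent of both and has to be typed itself.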

47
Suggested reading
  • Jobling MA, Hurles ME and Tyler-Smith C. 2004.
    Human Evolutionary Genetics: Origins, Peoples &
    Disease. Garland Science
  • Balding DJ, Bishop M and Cannings C. 2001.
    Handbook of Statistical Genetics. John Wiley and
    Sons Ltd.
  • Li W-H. 2001. Molecular evolution. Sinauer.

48
References
1. E. C. Anderson and J. Novembre, Am.J.Hum.Genet. 73, 336-354 (2003).
2. F. Austerlitz and E. Heyer, Proc.Natl.Acad.Sci.U.S.A 95, 15140-15144 (1998).
3. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31-36 (1987).
4. M. J. Daly, J. D. Rioux, S. F. Schaffner, T. J. Hudson, E. S. Lander, Nat.Genet. 29, 229-232 (2001).
5. E. Dawson et al., Nature 418, 544-548 (2002).
6. I. A. Eaves et al., Nat.Genet. 25, 320-323 (2000).
7. A. Eyre-Walker, Proc.R.Soc.Lond B Biol.Sci. 252, 237-243 (1993).
8. A. Eyre-Walker, Genetics 152, 675-683 (1999).
9. L. Frisse et al., Am.J.Hum.Genet. 69, 831-843 (2001).
10. S. M. Fullerton, C. A. Bernardo, A. G. Clark, Mol.Biol.Evol. 18, 1139-1142 (2001).
11. S. B. Gabriel et al., Science 296, 2225-2229 (2002).
12. R. A. Gibbs et al., Nature 426, 789-796 (2003).
13. D. B. Goldstein, Nat.Genet. 29, 109-111 (2001).
14. M. T. Hamblin and A. Di Rienzo, Am.J.Hum.Genet. 66, 1669-1679 (2000).
15. A. Helgason, B. Hrafnkelsson, J. R. Gulcher, R. Ward, K. Stefansson, Am.J.Hum.Genet. 72, 1370-1388 (2003).
16. I. Hellmann, I. Ebersberger, S. E. Ptak, S. Pääbo, M. Przeworski, Am.J.Hum.Genet. 72, 1527-1535 (2003).
17. R. Horton et al., J.Mol.Biol. 282, 71-97 (1998).
18. M. Ingman, H. Kaessmann, S. Pääbo, U. Gyllensten, Nature 408, 708-713 (2000).
19. A. J. Jeffreys, L. Kauppi, R. Neumann, Nat.Genet. 29, 217-222 (2001).
20. M. A. Jobling, M. E. Hurles, C. Tyler-Smith, Human Evolutionary Genetics: Origins, Peoples & Disease (Garland Science, New York, 2004).
21. H. Kaessmann et al., Am.J.Hum.Genet. 70, 673-685 (2002).
22. M. Koivisto et al., Pac.Symp.Biocomput. 502-513 (2003).
23. A. Kong et al., Nat.Genet. 31, 241-247 (2002).
24. M. J. Lercher and L. D. Hurst, Trends Genet. 18, 337-340 (2002).
25. M. J. Lercher, N. G. Smith, A. Eyre-Walker, L. D. Hurst, Genetics 162, 1805-1810 (2002).
26. M. J. Lercher, E. J. Williams, L. D. Hurst, Mol.Biol.Evol. 18, 2032-2039 (2001).
27. M. W. Nachman, V. L. Bauer, S. L. Crowell, C. F. Aquadro, Genetics 150, 1133-1141 (1998).
28. S. Pääbo, Nature 421, 409-412 (2003).
29. N. Patil et al., Science 294, 1719-1723 (2001).
30. M. S. Phillips et al., Nat.Genet. 33, 382-387 (2003).
31. D. E. Reich et al., Nature 411, 199-204 (2001).
32. D. E. Reich et al., Nat.Genet. 32, 135-142 (2002).
33. N. A. Rosenberg et al., Science 298, 2381-2385 (2002).
34. R. Sachidanandam et al., Nature 409, 928-933 (2001).
35. N. G. Smith and A. Eyre-Walker, Mol.Biol.Evol. 18, 982-986 (2001).
36. P. Taillon-Miller et al., Nat.Genet. 25, 324-328 (2000).
37. C. Tyler-Smith and G. McVean, Nat.Genet. 35, 201-202 (2003).
38. L. Vigilant, M. Stoneking, H. Harpending, K. Hawkes, A. C. Wilson, Science 253, 1503-1507 (1991).
39. N. Wang, J. M. Akey, K. Zhang, R. Chakraborty, L. Jin, Am.J.Hum.Genet. 71, 1227-1234 (2002).
40. K. Zhang, M. Deng, T. Chen, M. S. Waterman, F. Sun, Proc.Natl.Acad.Sci.U.S.A 99, 7335-7339 (2002).
41. Z. Zhao, Y. X. Fu, D. Hewett-Emmett, E. Boerwinkle, Gene 312, 207-213 (2003).