Title: Module 1
1Module 1
- Basic principles in population and quantitative
genetics
1
2Genes and Genomes
2
3Quick review: Genes and genomes
- In eukaryotes, DNA is found in the...
  - Nucleus
  - Mitochondria
  - Chloroplasts (plants)
- Organelle inheritance is often uniparental, making it powerful for certain types of applications
- For this workshop, we'll focus almost exclusively on nuclear genes
Plant Cell
3
4Chromosomes
- Linear strands of DNA and associated proteins in the nucleus of eukaryotic cells
- Chromosomes carry the genes and function in the transmission of hereditary information
- Diploid cells have two copies of each chromosome
  - One copy comes from each parent
  - Paternal and maternal chromosomes may have different alleles
4
5Alleles
- Alternative forms of a gene
- Alleles arise through mutation
- A diploid cell has two copies of each gene (i.e., two alleles) at each locus
- Alleles on homologous chromosomes may be the same or different (homozygous vs. heterozygous)
5
6Genes
- Units of information on heritable traits
- In eukaryotes, genes are distributed along chromosomes
- Each gene has a particular physical location: a locus
- Genes encompass regulatory switches and include both coding and non-coding regions
- Genes are separated by intergenic regions whose function is not well understood
6
7The genome
- An individual's complete genetic complement
  - For eukaryotes, a haploid set of chromosomes
  - For bacteria, often a single chromosome
  - For viruses, one or a few DNA or RNA molecules
- Genome size is typically reported as the number of base pairs in one genome complement (i.e., haploid for eukaryotes)
- Until recently, we studied genes and alleles one or a few at a time (genetics)
- Aided by high-throughput technologies, we can now study 100s to 1000s of genes simultaneously (genomics)
7
8Genome size
- Lambda phage: $4.8 \times 10^{4}$ bp
- E. coli: $4.6 \times 10^{6}$ bp
- Arabidopsis: $1.6 \times 10^{8}$ bp
- Cottonwood: $4.8 \times 10^{8}$ bp
- Chestnut: $9.6 \times 10^{8}$ bp
- Humans: $3.0 \times 10^{9}$ bp
- Pines: $3 \times 10^{10}$ bp
- Fritillaria: $1.3 \times 10^{11}$ bp
- Amoeba dubia: $6.7 \times 10^{11}$ bp
Douglas-fir, Pseudotsuga menziesii, has a chromosome number of 26. It's diploid, so that means that n = 13. Most conifers have n = 12.
8
9Genes and genomes: Using what we've learned
- In this workshop, we will convey many ways that our knowledge of genetics and genomics can be used by breeders and land managers
- Our emphasis will be on methods that can be used to characterize or dissect complex (quantitatively inherited) traits and associate phenotype with genotype, leading to marker-informed applications
- To do this, we will need to review several sub-disciplines of the science of genetics
9
10Genetics
- To understand marker-informed breeding, we will first set the stage by briefly reviewing:
- Mendelian genetics describes inheritance from parents to offspring
  - discrete qualitative traits (including genetic markers)
  - predicts frequencies of offspring given specific matings
- Population genetics describes allele and genotype frequencies over space and time, including
  - changes in allele frequencies between generations
  - environmental factors contributing to fitness
  - models are limited to a small number of genes
  - analyzes variation within and among populations
- Quantitative genetics describes variation in traits influenced by multiple genes (continuous rather than discrete attributes)
  - relies on statistical tools describing correlations among relatives
  - each of many genes may have little influence on a specific trait
10
11Mendelian Genetics (and the basis of genetic markers)
11
12Mendelian inheritance
- We begin with family resemblance: "Like begets like."
- How do we explain it?
www.madeyoulaugh.com
12
13Traits tend to run in families
13
14Genotype and phenotype
- Genotype refers to the particular gene or genes an individual carries
- Phenotype refers to an individual's observable traits
- Only rarely can we determine genotype by observing phenotype
- Genomics offers tools to better understand the relationship between genotype and phenotype
- Individual genetic markers behave as Mendelian traits, so understanding Mendelian traits is key to understanding markers
14
15Single-gene traits in trees are rare. Here's one in alder (f. pinnatisecta)
15
16Mendel's insights were amazing, and yet...
- Knowledge of biological processes was rudimentary, including
  - cell division (mitosis or meiosis)
  - chromosomes were not yet known
- With the discovery of chromosomes, we realized
  - That genes are packaged on chromosomes
  - That genes on the same chromosome are associated (genetic linkage). Very important! We will explore this a great deal in future modules
16
17Markers reflect genetic polymorphisms that are
inherited in a Mendelian fashion
- DNA markers 'mark' locations where DNA varies (sequence or size)
- Such polymorphisms can vary within and among individuals (e.g., heterozygotes vs. homozygotes) and populations
- Markers may be located in genes or elsewhere in the genome
- Historically, we've had too few markers to inform breeding
- Genomics tools provide an almost unlimited supply of markers
- Today's marker applications were only imagined a few years ago
17
18DNA markers reflect sequence variation
18
19Markers track inheritance
19
20Trait dissection using markers
Hypothetical genes (QTLs) affecting economic
traits, mapped using genetic markers a-m
20
21Single Nucleotide Polymorphisms (SNPs) embedded within a DNA sequence
- DNA sequences are aligned
- Polymorphic sites are identified
- Haplotypes (closely linked markers of a specific
configuration) are deduced by direct observation
or statistical inference
atggctacctgaactggtcaactcatcgaaagctaa  1
atggctacctgaactggtcaactcatcgaaagctaa  1
atgcctacctgaactggtcaactcatcgaaagctaa  2
atgcctacctgaactggtcaactcatcgaaggctaa  3
atgcctacctgaactggtcaacacatcgaaggctaa  4
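The three steps listed on this slide (align, find polymorphic sites, deduce haplotypes) can be sketched in a few lines of Python. The five sequences are the ones shown on the slide; the function names are illustrative only, and the sketch assumes the sequences are already aligned and of equal length.

```python
# Aligned sequences as shown on the slide; haplotype labels are in the comments.
sequences = [
    "atggctacctgaactggtcaactcatcgaaagctaa",  # haplotype 1
    "atggctacctgaactggtcaactcatcgaaagctaa",  # haplotype 1
    "atgcctacctgaactggtcaactcatcgaaagctaa",  # haplotype 2
    "atgcctacctgaactggtcaactcatcgaaggctaa",  # haplotype 3
    "atgcctacctgaactggtcaacacatcgaaggctaa",  # haplotype 4
]


def polymorphic_sites(seqs):
    """Return 0-based alignment columns where more than one base occurs."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def haplotypes(seqs, sites):
    """Reduce each sequence to its bases at the polymorphic sites."""
    return ["".join(s[i] for i in sites) for s in seqs]


sites = polymorphic_sites(sequences)
haps = haplotypes(sequences, sites)
print("polymorphic sites (1-based):", [i + 1 for i in sites])
print("haplotype per sequence:", haps)
print("number of distinct haplotypes:", len(set(haps)))
```

Here the haplotypes are read directly from the sequences (direct observation); statistical inference of phase from diploid genotypes is a separate problem and is not attempted.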
21
22Genetic linkage and recombination
- Genes on different chromosomes are inherited independently
- Genes located on the same chromosome tend to be inherited together because they are physically linked, except that widely separated genes behave as if they are unlinked
- Recombination during gametogenesis breaks up parental configurations into new (recombinant) classes
- The relative frequency of parental and recombinant gametes reflects the degree of genetic linkage
- Genetic mapping is the process of determining the order and relative distance between genes or markers (to be discussed in Module 3)
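As a rough illustration of how the relative frequency of parental and recombinant gametes reflects linkage, the recombination fraction can be computed as recombinants divided by all gametes scored. The gamete counts below are hypothetical and the function name is illustrative.

```python
def recombination_fraction(parental, recombinant):
    """Fraction of scored gametes that carry a recombinant configuration."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no gametes were scored")
    return recombinant / total


# Hypothetical counts: 170 parental-type and 30 recombinant-type gametes.
r_linked = recombination_fraction(parental=170, recombinant=30)

# Equal numbers of both classes is what independent assortment produces.
r_unlinked = recombination_fraction(parental=100, recombinant=100)

print(f"linked loci:   r = {r_linked:.2f}")
print(f"unlinked loci: r = {r_unlinked:.2f}")
```

Values near 0 indicate tight linkage; values near 0.5 are what is seen for genes on different chromosomes or far apart on the same chromosome.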
22
23Genetic linkage and recombination
23
24Population Genetics
24
25Population genetics
- Population genetics is the study of genetic differences within and among populations of individuals, and how these differences change across generations
- It extends Mendelian genetics to include population dynamics and chance events such as
  - survival
  - frequency of specific matings
  - random sampling from populations, and
  - mutation
25
26Population genetics
- Over time, changes among populations can lead to genetic isolation and speciation
- Population genetics describes the mechanics of how evolution takes place
- As we discuss genetic differences, keep in mind that
  - polymorphisms reflect differences among individuals within a species
  - divergence reflects differences between species
- We'll discuss more specifics on these processes later on...
- (D. L. Hartl 2000. A Primer of Population Genetics)
26
27What do population geneticists measure?
- Studies limited to simply inherited traits
  - historically, this involved morphological or biochemical markers
  - shifted to allozymes in the 1960s to 1980s
  - DNA markers became more common beginning in the later 1980s
  - many types of DNA markers have been developed
  - we'll revisit markers in Module 3
- Key points
  - population geneticists measure discrete (Mendelian) traits
  - quantitative geneticists measure continuous traits controlled by multiple genes (we'll talk about quantitative genetics later in this Module)
27
28Why study genes in populations?
- In natural populations
  - Adaptation, or the ability to survive and exploit an environmental niche, involves the response of populations, not individuals.
- In breeding populations
  - Genetic gain (improving the average performance of populations for desired breeding objectives) depends on selecting and breeding parents with the best genetic potential
28
29Population genetics: Key questions
- Population genetics provides empirical models to predict genetic behavior for these and other situations
  - What genotypes are present in a population and at what frequency?
  - Are all genotypes equally likely to survive and reproduce?
  - Are mating frequencies independent of genotype?
  - Is the population stratified in some way, e.g., by proximity, size, or the timing of natural events?
  - To what extent does mating occur with individuals outside the immediate area?
  - To what extent are environmental conditions stable across generations?
  - Etc.
29
31Population genetics
- How genetically diverse is a species or population?
  - contrast diversity in populations that differ in life-history traits, population size, breeding structure, etc.
- Are different populations closely related to one another?
  - monitor diversity for conservation purposes
- What is the potential for inbreeding depression?
  - what is the minimum viable population size from a genetic standpoint?
- How is genetic variation maintained?
- Identify genes/alleles responsible for phenotypic variation
- Phylogenetic and biogeographic questions
31
32Populations are groups of individuals whose
relatedness is usually unknown
32
33The Hardy-Weinberg Principle
- The frequencies of alleles and genotypes in a population will remain constant over time (given certain assumptions), describing a static, or non-evolving, population
- The frequencies of alleles and genotypes can be described mathematically, where p and q are the frequencies of the alleles A1 and A2
33
34HW proportions
- Predict frequencies of all genotypes based on allele frequencies
- Provide a quantitative measure of variation among populations differing in allele frequencies
- Provide a measure of within-population heterozygosity
- Expected heterozygosity (He) is the combined frequency of all heterozygotes calculated from allele frequencies
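A minimal sketch of how HW proportions and expected heterozygosity follow from allele frequencies. The function names and the example frequency p = 0.7 are illustrative; the multi-allele form He = 1 - sum(p_i^2) is the standard generalization and reduces to 2pq for two alleles.

```python
def hw_genotype_frequencies(p):
    """HW genotype frequencies for a two-allele locus with f(A1) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}


def expected_heterozygosity(allele_freqs):
    """He = 1 - sum of squared allele frequencies (any number of alleles)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in allele_freqs)


genotypes = hw_genotype_frequencies(0.7)
he = expected_heterozygosity([0.7, 0.3])

for name, freq in genotypes.items():
    print(f"{name}: {freq:.2f}")
print(f"He = {he:.2f}")
```

With two alleles, He equals the A1A2 frequency; with more alleles it sums all heterozygote classes at once.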
34
35Random mating restores HW proportions each
generation
White et al. 2007
35
36HW and random sampling
- Strictly speaking, Hardy-Weinberg proportions require certain assumptions, such as
  - an infinitely large population (translation: sampling with replacement)
  - mating is at random (translation: all possible pairings of mates are equally likely)
  - no selection (which biases genotype frequencies)
  - no migration (since all alleles must be sampled from the same pool)
  - no mutation (which introduces new variants)
- These conditions represent an ideal population that is rarely (if ever) fully realized
36
37HW and random sampling
- Minor violations of assumptions usually have little impact
- In practice
  - HW proportions apply for many natural populations
  - breeding populations are different
    - population sizes can be small
    - individuals chosen for breeding may represent a subset of relatives
    - matings are often non-random
37
38HW: Non-random mating
- When individual genotypes do not mate randomly, then HW proportions are not observed among the offspring
- We'll look at two kinds of non-random mating
  - population substructure/admixture
  - inbreeding (mating among related individuals)
38
39HW: Population admixture
- Consider mixing individuals from non-interbreeding subpopulations (e.g., offshore salmon from different runs)
- Even if each subpopulation is in HW, the admixed group is not ($p_1 \neq p_2$)
- The admixed group will appear to have too many homozygotes
- This situation is called Wahlund's effect
Hartl, 2000, Fig. 2.6
39
40Population structure: Wahlund's effect
- Larger populations may be subdivided into smaller groups, which may be difficult to delineate
  - subpopulations can have different allele frequencies
  - each subpopulation may show HW proportions
- A biologist may unknowingly sample individuals from different subpopulations and group them together. What would you observe?
  - HW proportions in the entire sample, or
  - more heterozygous individuals than predicted from HW expectations, or
  - more homozygous individuals than predicted from HW expectations?
40
41Population structure: Wahlund's effect
- Wahlund's effect: As long as allele frequencies vary among subpopulations, even if each subpopulation exhibits HW proportions, more homozygotes will be observed than would be expected based on the allele frequency of the metapopulation
- The relative increase in homozygosity is proportional to the variance in allele frequencies among subpopulations, as measured by F (where $0 \le F \le 1$)
- There are many versions of F, formulated in different ways. Each is a measure of increased genetic relatedness
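A small numerical sketch of Wahlund's effect, assuming two equal-sized subpopulations that are each in HW proportions at a two-allele locus. The allele frequencies 0.2 and 0.8 are hypothetical. Under these assumptions the heterozygote deficit in the pooled sample equals twice the variance in allele frequency, so F works out to var(p) / (p_bar * q_bar).

```python
def wahlund(sub_p):
    """Pool equal-sized HW subpopulations and measure the heterozygote deficit.

    sub_p: allele frequency of A1 in each subpopulation.
    Returns mean allele frequency, variance in allele frequency,
    heterozygosity in the pooled sample, HW expectation for the pool, and F.
    """
    n = len(sub_p)
    p_bar = sum(sub_p) / n
    var_p = sum((p - p_bar) ** 2 for p in sub_p) / n
    # Heterozygotes actually present when HW subpopulations are pooled.
    h_pooled = sum(2 * p * (1 - p) for p in sub_p) / n
    # Heterozygotes expected if the pool were one random-mating population.
    h_expected = 2 * p_bar * (1 - p_bar)
    if h_expected == 0:
        raise ValueError("locus is monomorphic in the pooled sample")
    f = (h_expected - h_pooled) / h_expected
    return p_bar, var_p, h_pooled, h_expected, f


p_bar, var_p, h_pooled, h_expected, f = wahlund([0.2, 0.8])
print(f"pooled p = {p_bar:.2f}, variance = {var_p:.2f}")
print(f"heterozygotes: pooled sample {h_pooled:.2f}, HW expectation {h_expected:.2f}")
print(f"F = {f:.2f}")
```

The pooled sample shows fewer heterozygotes (and therefore more homozygotes) than HW predicts, which answers the question posed on the previous slide.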
41
42Inbreeding
- Inbreeding (mating among relatives) increases homozygosity relative to HW
  - rate is proportional to degree of relationship
  - distant cousin < first cousin < half-sib < full-sib < self
- Recurrent inbreeding leads to a build-up of homozygosity, and a corresponding reduction in heterozygosity
- Inbreeding affects genotype frequencies, but not allele frequencies
- How does inbreeding affect deleterious recessive alleles?
42
43Inbreeding and homozygosity
- F reflects a proportional reduction in heterozygosity, and a build-up of genetic relatedness. HW implies F = 0. With recurrent selfing, F goes to 1
White et al. 2007, Fig. 5.6
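To make "F goes to 1" concrete, here is a minimal sketch of recurrent selfing. It relies on a standard result that is not stated on the slide: each generation of selfing halves heterozygosity, so F after t generations is 1 - (1/2)^t when the starting population is in HW proportions. The starting heterozygosity of 0.5 is hypothetical.

```python
def selfing_trajectory(h0, generations):
    """Heterozygosity and F across generations of complete selfing.

    h0: heterozygosity in the starting (HW) population, must be > 0.
    Returns a list of (generation, heterozygosity, F) tuples.
    """
    if h0 <= 0:
        raise ValueError("starting heterozygosity must be positive")
    trajectory = []
    h = h0
    for t in range(generations + 1):
        f = 1.0 - h / h0  # proportional reduction in heterozygosity
        trajectory.append((t, h, f))
        h *= 0.5  # selfing halves heterozygosity each generation
    return trajectory


trajectory = selfing_trajectory(h0=0.5, generations=5)
for t, h, f in trajectory:
    print(f"generation {t}: H = {h:.4f}, F = {f:.4f}")
```

Allele frequencies are untouched in this sketch; only the distribution of alleles among genotypes changes.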
43
44Inbreeding depression
- Inbreeding often leads to reduced vitality (growth, fitness)
- Deleterious recessive alleles are made homozygous
- Outcrossing species are more likely to suffer higher inbreeding depression
White et al. 2007, Fig. 5.7
44
45Evolutionary forces change allele frequencies
- Mutation: a random heritable change in the genetic material (DNA); the ultimate source of all new alleles
- Migration (gene flow): the introduction of new alleles into a population via seeds, pollen, or vegetative propagules
- Random genetic drift: the random process whereby some alleles are not included in the next generation by chance alone
- Natural selection: the differential, non-random reproductive success of individuals that differ in hereditary characteristics
45
46Mutation
- Heritable changes in DNA sequence alter allele frequencies as new alleles are formed
- Mutations at any one locus are rare, but with sufficient time, cumulative effects can be large
- Mutations are the ultimate source of genetic variation on which other evolutionary forces act (e.g., natural selection)
- Effects on populations: Mutations promote differentiation (but effects are gradual in the absence of other evolutionary forces)
46
47Gene flow: Migration of alleles
- Gene flow: the movement of alleles among populations
- Movement may occur by individuals (via seed) or gametes (via pollen)
- Effects on populations: gene flow hinders differentiation. It is a cohesive force that tends to bind populations together
47
48Genetic drift
- Drift reflects sampling in small populations
- Subgroups follow independent paths
- Allele frequencies vary among subgroups
- Frequencies in the metapopulation remain relatively stable
- How does F behave?
Hartl & Jones, 2004.
48
49Random genetic drift
- Genetic bottleneck: An extreme form of genetic drift that occurs when a population is severely reduced in size such that the surviving population is no longer genetically representative of the original population
- Effects on populations: Drift promotes differentiation
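Drift as random sampling can be sketched with a simple Wright-Fisher simulation. This is an idealized model (constant size, non-overlapping generations, no selection, mutation, or migration), not a description of any real population; the population size, number of generations, and random seed below are arbitrary choices.

```python
import random


def wright_fisher(p0, n_diploid, generations, rng):
    """Allele-frequency path under drift alone.

    Each generation, 2N gene copies are drawn with replacement
    from the current allele pool.
    """
    copies = 2 * n_diploid
    p = p0
    path = [p]
    for _generation in range(generations):
        count = sum(1 for _copy in range(copies) if rng.random() < p)
        p = count / copies
        path.append(p)
    return path


rng = random.Random(1)  # fixed seed so the run is repeatable
paths = [wright_fisher(p0=0.5, n_diploid=10, generations=20, rng=rng)
         for _subgroup in range(5)]

for i, path in enumerate(paths, start=1):
    print(f"subgroup {i}: final frequency = {path[-1]:.2f}")
mean_final = sum(path[-1] for path in paths) / len(paths)
print(f"mean across subgroups = {mean_final:.2f}")
```

Each subgroup follows its own path even though all start at the same frequency; once an allele is lost or fixed in a subgroup it stays that way, because drift alone cannot reintroduce it.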
49
50Natural selection
- Natural selection: First proposed by Charles Darwin in the mid-1800s. The differential reproductive success of individuals that differ in hereditary characteristics
  - not all offspring survive and reproduce
  - some individuals produce more offspring than others (mortality, disease, bad luck, etc.)
  - offspring differ in hereditary characteristics affecting their survival (genotype and reproduction are correlated)
  - individuals that reproduce pass along their hereditary characteristics to the next generation
  - favorable characteristics become more frequent in successive generations
- Effects on populations
  - Promotes differentiation between populations that inhabit dissimilar environments
  - Hinders differentiation between populations that inhabit similar environments
50
51Selection: Numerical example
White et al. 2007, Table 5.3
51
52Selection: Equations
White et al. 2007, Table 5.4
52
53Relative fitness: Key considerations
- Which genotype has the largest relative fitness?
  - determines the direction in which allele frequencies will change
- Are fitness differences large or small?
  - determines rate of change over generations (fast or slow)
- What is the fitness of the heterozygote compared to either homozygote?
  - reflects dominance
    - complete (heterozygote identical to one homozygote)
    - no dominance (additive, heterozygote is intermediate)
    - partial (heterozygote more closely resembles one homozygote)
  - dominance influences how selection sees heterozygotes
  - affects rate of change across generations
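The effect of dominance on the rate of change can be illustrated with the standard one-locus, two-allele selection recursion, p' = (p^2 w11 + p q w12) / w_bar. This is the textbook form and may differ in notation from the tables cited on the preceding slides. The relative fitness values and the starting frequency below are hypothetical.

```python
def next_p(p, w11, w12, w22):
    """Frequency of A1 after one generation of selection on a HW population."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22  # mean fitness
    return (p * p * w11 + p * q * w12) / w_bar


def generations_to_reach(p0, target, w11, w12, w22, max_generations=100000):
    """Count generations until the favored allele A1 reaches the target frequency."""
    p, t = p0, 0
    while p < target and t < max_generations:
        p = next_p(p, w11, w12, w22)
        t += 1
    return t


# A1 is favored in every scenario; only the heterozygote fitness differs.
scenarios = {
    "A1 dominant":  (1.0, 1.00, 0.9),
    "additive":     (1.0, 0.95, 0.9),
    "A1 recessive": (1.0, 0.90, 0.9),
}
results = {name: generations_to_reach(0.01, 0.5, *w) for name, w in scenarios.items()}
for name, t in results.items():
    print(f"{name}: {t} generations to go from p = 0.01 to p = 0.5")
```

A rare favorable recessive allele spreads slowly because it sits almost entirely in heterozygotes, where selection cannot "see" it.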
53
54Natural selection
- Fitness: the relative contribution an individual makes to the gene pool of the next generation
54
55Gene action: Additive vs. dominance
Jennifer Kling, OSU
55
56Dominance and rate of change
Hartl, 2000
56
57What if selection is weak or absent?
- We've already seen that mutation can supply new variation that selection may act upon
- Most mutations are deleterious and are lost, but rarely, advantageous mutations can occur
- What about mutations that cause no effect either way?
- Neutrality theory pertains to alleles that confer no difference in relative fitness, as if selection is oblivious to them
- We'll revisit the behavior of neutral alleles later on
57
58Measuring population structure
- Generically speaking, population structure measures the degree to which allele frequencies vary among subpopulations
- This can be thought of in several ways
  - variance among subpopulations
  - heterozygosity among pairs of alleles drawn at random
- Recall, expected heterozygosity measures
  - the frequency of heterozygous genotypes in a HW population
  - which equals the frequency of random pairs of haploid gametes with different alleles
- Whenever allele frequencies vary among subpopulations (regardless of the cause), the variance in allele frequencies can be measured by F
- We'll revisit this in Module 4
$F = (H_e - H_o)/H_e$
58
59Population genetics: A final concept
- Linkage disequilibrium (LD, also called gametic phase disequilibrium)
- Conceptually: LD is a correlation in allelic state among loci
- Numerically
  - expected haplotype (gamete) frequency is the product of the two allele frequencies, i.e., $f(AB) = f(A) \times f(B)$
  - if $f(AB) = f(A) \times f(B)$, then $LD = 0$
  - if $f(AB) \neq f(A) \times f(B)$, then $LD \neq 0$
- LD may arise from factors such as
  - recent mutations
  - historical selection (hitchhiking effect)
  - population admixture
- Recombination causes LD to decay over generations
- LD plays a major role in association genetics. We will revisit!
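The decay of LD under recombination can be sketched with the textbook recursion D_t = D_0 (1 - r)^t, which assumes random mating and a recombination fraction r between the two loci. This formula is not given on the slide, and the starting value D_0 and the values of r are hypothetical.

```python
def ld_after(d0, r, generations):
    """LD remaining after a number of generations of random mating."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    return d0 * (1.0 - r) ** generations


d0 = 0.18  # hypothetical starting disequilibrium
decay = {r: [ld_after(d0, r, t) for t in (0, 1, 5, 10)] for r in (0.5, 0.1, 0.01)}
for r, values in decay.items():
    formatted = ", ".join(f"{d:.4f}" for d in values)
    print(f"r = {r}: D at t = 0, 1, 5, 10 -> {formatted}")
```

Unlinked loci (r = 0.5) lose half of their LD each generation, while tightly linked loci retain it for many generations, which is why closely linked markers are useful in association studies.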
59
60A numeric example of LD
- determine allele frequencies
- ask whether $f(A) \times f(B) = f(AB)$
- repeat for f(Ab), f(aB), and f(ab)
- linkage disequilibrium (LD) reflects this difference

Gamete frequency, by gamete type (linked):
Gamete   No LD   Higher LD   Lower LD
AB       0.42    0.60        0.55
Ab       0.28    0.10        0.15
aB       0.18    --          0.05
ab       0.12    0.30        0.25

Allele frequencies: f(A) = 0.7, f(a) = 0.3, f(B) = 0.6, f(b) = 0.4
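The steps in this example can be sketched as follows. Two assumptions are made: the four table rows are read as gametes AB, Ab, aB, ab in that order (the only assignment that reproduces the stated allele frequencies), and the "--" entry is read as a frequency of 0. The coefficient D = f(AB) - f(A) x f(B) is the usual way to express the difference.

```python
def ld_coefficient(gametes):
    """D = f(AB) - f(A) * f(B), from the four gamete frequencies."""
    f_a = gametes["AB"] + gametes["Ab"]  # frequency of allele A
    f_b = gametes["AB"] + gametes["aB"]  # frequency of allele B
    return gametes["AB"] - f_a * f_b


examples = {
    "No LD":     {"AB": 0.42, "Ab": 0.28, "aB": 0.18, "ab": 0.12},
    # Assumption: the "--" cell in the table is treated as 0.
    "Higher LD": {"AB": 0.60, "Ab": 0.10, "aB": 0.00, "ab": 0.30},
    "Lower LD":  {"AB": 0.55, "Ab": 0.15, "aB": 0.05, "ab": 0.25},
}

d_values = {name: ld_coefficient(g) for name, g in examples.items()}
for name, d in d_values.items():
    print(f"{name}: D = {d:.2f}")
```

All three columns share the same allele frequencies; only the way alleles are packaged into gametes differs, and that packaging is what D measures.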
60
61Summary: Population Genetics
- Population genetics extends Mendelian genetics to describe how allele and genotype frequencies can be predicted given certain dynamic population processes
- For populations in Hardy-Weinberg (HW) proportions, genotype frequencies are easily calculated given allele frequencies
- HW proportions are used as a comparative baseline
- Population genetics questions include
  - How much genetic diversity (heterozygosity) is in populations?
  - How is genetic diversity distributed?
  - What mechanisms have shaped the diversity we observe?
- Our challenge: How can we measure, interpret, and utilize genetic diversity?
61
62Population Diversity
62
63Locus, allele, and allele frequencies
- Locus: A fixed position on a chromosome (e.g., position of gene or marker)
- Allele: Variant of a specific locus
- Allele frequency: Proportion of a certain allele within a population
63
64How to define the distribution of genotypes?
- For two alleles we have three diploid genotypes
- Let
  - p = the frequency of A1, and
  - q = the frequency of A2
  - (so that p + q = 1)
- What is the frequency of
64
65Hardy-Weinberg Equilibrium (HWE)
$p^2 + 2pq + q^2 = 1$, i.e., $(p + q)^2 = 1$
65
66Assumptions of HWE (or it works when)
- Random mating
- No mutation
- No migration
- No selection
- Infinite population size
- But does it really work?!
66
6767
68Insights provided by HWE
- Genotypes are transient, broken up every generation, and reconstituted each generation as zygotes
- HW equilibrium implies that allele and genotype frequencies are constant: they remain unchanged across generations
- Even if populations with different allele frequencies are brought together, these non-equilibrium populations reach equilibrium in a single generation of random mating (at least for individual loci!)
- Rare alleles mostly occur as heterozygotes
- HWE also means that genotypic frequencies can be calculated from allele frequencies, so allele frequencies alone are sufficient parameters for population genetic models
68
69Null hypothesis: My sample is in HWE
- How do we test this null hypothesis?
  - Estimate allele frequencies (p, q)
  - Generate expected HW genotypic frequencies
    - $p^2$, $2pq$, $q^2$
  - Compare observed vs. HWE genotypic frequencies
    - $\chi^2$ goodness-of-fit, G-test (likelihood method), exact tests (small samples)
- Power to reject hypothesis depends on
  - Actual difference
  - Sample size
  - Complexity of model (degrees of freedom)
- Programs: GENALEX, FSTAT, GenePop, Arlequin, others
69
70Testing for HWE: An example using $\chi^2$
- Australian aborigine sample
- Assume N = 2000

       MM    MN     NN     f(M)    f(N)
OBS    48    608    1344   0.176   0.824
EXP    62    580    1358

$\chi^2 = (48-62)^2/62 + (608-580)^2/580 + (1344-1358)^2/1358 = 3.161 + 1.352 + 0.144 = 4.66$ (significant at $P < 0.05$)
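The test above can be reproduced with a short sketch. It uses the observed counts from the slide and keeps expected counts unrounded, so the statistic comes out slightly lower than the slide's value, which is based on expected counts rounded to whole numbers. The critical value 3.841 is the standard 5% threshold for 1 degree of freedom (three genotype classes, minus one, minus one estimated allele frequency); the function name is illustrative.

```python
def hwe_chi_square(n_mm, n_mn, n_nn):
    """Chi-square goodness-of-fit statistic for HW proportions at a two-allele locus."""
    n = n_mm + n_mn + n_nn
    p = (2 * n_mm + n_mn) / (2 * n)  # frequency of allele M
    q = 1.0 - p
    observed = (n_mm, n_mn, n_nn)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 1 degree of freedom

p, expected, chi2 = hwe_chi_square(48, 608, 1344)
print(f"f(M) = {p:.3f}, f(N) = {1 - p:.3f}")
print("expected counts:", ", ".join(f"{e:.1f}" for e in expected))
print(f"chi-square = {chi2:.2f}")
print("reject HWE at the 5% level" if chi2 > CRITICAL_5PCT_1DF else "no evidence against HWE")
```

For small samples or rare genotype classes, an exact test is usually preferred over the chi-square approximation, as the previous slide notes.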
70
71How Diversity is Organized
71
72Population structure: A deviation from HWE
- F: the fixation index
- Another measure of departure from HW
- F is a measure of within-population inbreeding
$F = (H_e - H_o)/H_e$
72
73F statistics can be extended to hierarchical
populations
- The hierarchical F statistics (also called Wright's F statistics)
  - Provide a way to distinguish within-population inbreeding from among-population divergence
  - Provide a measure of the proportionate distribution of variation (or population structure)
73
74Partitioning variation in populations: Wright's F
- F-statistics can be interpreted in many ways, e.g.,
  - as a measure of inbreeding
  - identity by descent from a common ancestor
  - identity in allelic state (homozygosity) or not (heterozygosity)
- From a sampling perspective, ask: What are the chances of drawing two alleles having the same allelic state?
- Where the alleles are randomly drawn from either
  - Individuals
  - Subpopulations
  - Total population (metapopulation)
[Figure: alleles (X) drawn at random at three levels: Individuals, Subpopulations, Total]
74
75Defining Wright's F statistics
- We begin by discussing heterozygosity at different levels
  - $H_I$: Observed heterozygosity within subpopulations (similar to $H_o$ above)
  - $H_S$: Expected heterozygosity within subpopulations (similar to $H_e$ above)
  - $H_T$: Expected heterozygosity if the combined population (metapopulation) were random mating. This would be $H_T = 2\,\bar{p}\,\bar{q}$ (average allele frequencies over the metapopulation)
75
76F statistics are defined in terms of H
- $F_{IS} = (H_S - H_I)/H_S$ (measuring departures from HW within subpopulations, or local inbreeding)
- $F_{ST} = (H_T - H_S)/H_T$ (measuring departures from HW due to population differences, which is the same as admixture, or Wahlund's effect)
- $F_{IT} = (H_T - H_I)/H_T$ (includes both local inbreeding and population structure)
- Together, they are related as
  - $(1 - F_{IS})(1 - F_{ST}) = (1 - F_{IT})$
- Of these measures, $F_{IS}$ and $F_{ST}$ are the most meaningful since they partition local inbreeding vs. population subdivision and describe how variation is proportioned
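A minimal sketch of these definitions, computed from genotype counts at one two-allele locus. The two subpopulations and their counts are hypothetical, subpopulations are weighted equally for simplicity (real software typically weights by sample size and applies bias corrections), and the function name is illustrative.

```python
def f_statistics(subpops):
    """Wright's F statistics from per-subpopulation genotype counts.

    subpops: list of (n_A1A1, n_A1A2, n_A2A2) tuples, one per subpopulation.
    """
    observed_h, expected_h, freqs = [], [], []
    for n11, n12, n22 in subpops:
        n = n11 + n12 + n22
        p = (2 * n11 + n12) / (2 * n)
        freqs.append(p)
        observed_h.append(n12 / n)          # observed heterozygosity
        expected_h.append(2 * p * (1 - p))  # HW expectation within the subpopulation
    k = len(subpops)
    h_i = sum(observed_h) / k
    h_s = sum(expected_h) / k
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    return {
        "HI": h_i, "HS": h_s, "HT": h_t,
        "FIS": (h_s - h_i) / h_s,
        "FST": (h_t - h_s) / h_t,
        "FIT": (h_t - h_i) / h_t,
    }


# Hypothetical genotype counts (A1A1, A1A2, A2A2) for two subpopulations.
stats = f_statistics([(40, 40, 20), (10, 30, 60)])
for name, value in stats.items():
    print(f"{name} = {value:.4f}")

lhs = (1 - stats["FIS"]) * (1 - stats["FST"])
print(f"(1 - FIS)(1 - FST) = {lhs:.4f}, 1 - FIT = {1 - stats['FIT']:.4f}")
```

The last line checks the multiplicative relationship among the three statistics, which holds exactly because each factor is a ratio of heterozygosities at adjacent levels.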
76
77Individual to subpopulation: Wright's $F_{IS}$
- A measure of the departure from HW proportions within subpopulations
- Selfing, mating to relatives, and assortative mating create a "local" deficiency of heterozygosity (localized inbreeding)
- "Individual to subpopulation" F
- Scale: 0 (no inbreeding) to 1 (complete inbreeding)
77
78Subpopulation to total: Wright's $F_{ST}$
- A measure of the proportion of variation among populations
- Reduction of heterozygosity compared to random mating
- Measure of the probability that two gene copies chosen at random from different subpopulations are identical-by-descent.
- Scale: 0 (heterozygosity identical across populations) to 1 (populations maximally different)
78