Title: Mark E. Sorrells and Flavio Breseghello
1 Linkage Disequilibrium and Association Mapping: Issues and Opportunities for the Triticeae
- Mark E. Sorrells and Flavio Breseghello
- Department of Plant Breeding and Genetics
- Cornell University
2 Overview
- Part I: A Genetic Model for Association Mapping in Plant Breeding Populations
- Part II: Comparison of Different Plant Breeding Materials for Association Mapping
- Part III: Association Mapping of Kernel Size and Milling Quality in Soft Winter Wheat Cultivars
3 A Definition of Association Mapping
- Association analysis, also known as LD mapping or association mapping, is a population-based survey used to identify trait-marker relationships based on linkage disequilibrium (Flint-Garcia et al. 2003).
4 Association Mapping as a Plant Breeding Strategy: AM versus QTL Mapping
- Association mapping can be conducted directly on the breeding material; therefore:
  - direct inference from research to breeding is possible
  - phenotypic variation is observed for most traits of interest
  - marker polymorphism is higher than in biparental populations
  - routine evaluations provide phenotypic data
- Association mapping also provides useful information about:
  - the organization of genetic variation
  - polymorphism across the genome
5 Association Mapping as a Plant Breeding Strategy: AM versus QTL Mapping
- Type I error (false positives) can be higher because of:
  - unaccounted population structure
  - simultaneous selection of combinations of alleles at different loci
  - high sampling variance of rare alleles
- Type II error can be higher (low power) because of:
  - lower LD than in mapping populations
  - unbalanced design due to differences in allele frequencies
  - a serious multiple-testing problem
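6 A Genetic Model for AM in Plant Breeding Populations: Association as Conditional Probabilities
Population genetics theory (Hedrick 2005)
Diagram: a new parent carrying gene allele A and marker allele M, separated by recombination fraction c, enters the breeding pool; over t generations the pool undergoes recombination (c) and selection on A or M (relative fitness w).
Initial haplotype frequencies:
- Pr(A,M) = f
- Pr(a,M) = ?
- Pr(a,m) = 1 - f - ?
- Pr(A,m) = 0
Pr(A|M, c, t, f, ?, w): probability that a plant carrying marker allele M also carries gene allele A, t generations after the introduction of A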
7 Recombination x Initial Frequency of M in the Breeding Pool
Figure: Pr(A|M) plotted against t (generations); parameters: frequency of the new parent f = 0.05, relative fitness w = 1, frequency of M in the original population ? = 0; legend: recombination fraction c.
- A novel marker allele at 10 cM distance can be more predictive of the QTL allele than one at 1 cM distance that was present in the original population at a frequency of 0.05.
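The claim on this slide can be checked with a closed form, assuming an infinite, randomly mating population with no selection (w = 1); the authors' exact model may differ. Under these assumptions allele frequencies stay constant, the disequilibrium decays as D_t = D_0 (1 - c)^t, and Pr(A|M) = Pr(A) + D_t / Pr(M). The parameter name `q0` is hypothetical and stands for the frequency of M in the original population.

```python
def pr_A_given_M(t, c, f, q0):
    """Pr(A | M) after t generations without selection (w = 1).

    f  : frequency of the new parent, which carries haplotype A-M
    q0 : frequency of M already present (on a-bearing haplotypes) in the
         original population (hypothetical name)
    c  : recombination fraction between marker and gene
    """
    p_A = f
    p_M = f + q0
    d0 = f * (1 - f - q0)        # D = x_AM * x_am - x_Am * x_aM, with x_Am = 0
    d_t = d0 * (1 - c) ** t      # LD decays by a factor (1 - c) per generation
    return p_A + d_t / p_M       # x_AM / p_M, where x_AM = p_A * p_M + D

for t in (0, 5, 10):
    novel = pr_A_given_M(t, c=0.10, f=0.05, q0=0.0)   # novel allele, 10 cM
    old = pr_A_given_M(t, c=0.01, f=0.05, q0=0.05)    # pre-existing allele, 1 cM
    print(t, round(novel, 3), round(old, 3))
```

Under these assumptions the novel 10-cM marker starts at Pr(A|M) = 1 against 0.5 for the pre-existing 1-cM marker, and stays more predictive through generation 7 before the tighter linkage takes over.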
8 Recombination x Selection for M
Figure: Pr(A|M) and Pr(A) plotted against generations; parameters: frequency of the new parent f = 0.05; relative fitness w = 4 (red), 2 (green), 1.25 (blue); frequency of M in the original population ? = 0; recombination fraction c = 0.01, 0.05, 0.10.
- The generation at which the marker is depleted depends on the selection intensity applied.
- The final frequency of A depends on selection and on the tightness of linkage between marker and gene.
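The trends on this slide can be reproduced qualitatively with a standard deterministic two-locus recursion. This is a sketch under stated assumptions (infinite population, random union of gametes, haploid selection with fitness w on haplotypes carrying the selected allele), not the authors' implementation; the function name `simulate` and parameter `q0` (frequency of M in the original population) are hypothetical.

```python
def simulate(generations, c, f, q0, w, select_on="M"):
    """Deterministic two-locus recursion (a sketch, not the authors' model).

    Haplotypes carrying the selected allele (select_on = "A" or "M") have
    relative fitness w; selection is followed by recombination at rate c.
    Requires f > 0. Returns one (Pr(A|M), Pr(A), Pr(M)) tuple per
    generation, starting at t = 0.
    """
    # Haplotype frequencies at t = 0 (Pr(A,m) = 0: A enters only with M).
    x = {"AM": f, "Am": 0.0, "aM": q0, "am": 1.0 - f - q0}
    carriers = {"A": ("AM", "Am"), "M": ("AM", "aM")}[select_on]
    history = []
    for t in range(generations + 1):
        freq_M = x["AM"] + x["aM"]
        freq_A = x["AM"] + x["Am"]
        history.append((x["AM"] / freq_M, freq_A, freq_M))
        if t == generations:
            break
        # Selection: weight each haplotype by its fitness, then renormalize.
        fitness = {h: (w if h in carriers else 1.0) for h in x}
        mean_fitness = sum(x[h] * fitness[h] for h in x)
        x = {h: x[h] * fitness[h] / mean_fitness for h in x}
        # Recombination: a fraction c of the disequilibrium D is broken up.
        d = x["AM"] * x["am"] - x["Am"] * x["aM"]
        x = {"AM": x["AM"] - c * d, "Am": x["Am"] + c * d,
             "aM": x["aM"] + c * d, "am": x["am"] - c * d}
    return history

for w in (4.0, 2.0, 1.25):
    for c in (0.01, 0.05, 0.10):
        final = simulate(100, c=c, f=0.05, q0=0.0, w=w)[-1]
        print(f"w={w} c={c} final Pr(A)={final[1]:.3f}")
```

In this sketch, stronger selection drives M toward fixation in fewer generations; once M is fixed, selection on M no longer changes Pr(A), so A's final frequency is set by how much disequilibrium survived recombination while M was rising.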
9 Summary Part I
- In plant breeding populations, the locus most associated with the trait is not necessarily the closest locus.
- Loosely linked markers can still be useful for MAS if high selection intensity is applied.
11 Types of Populations
- Germplasm bank collection
  - A collection of genetic resources including landraces, exotic material and wild relatives.
- Synthetic populations
  - Outcrossing populations (either male-sterile or manually crossed) synthesized from inbred lines. May be used for recurrent selection.
- Elite lines
  - Inbred lines (and checks) manipulated with the objective of releasing new varieties in the short term.
12 Characteristics Related to Association Mapping: Practical Aspects
Aspect of AM | Germplasm bank | Synthetic populations | Elite germplasm
Sample | Core collection | Segregating progenies | Elite lines and checks
Sample turnover | Static | Ephemeral | Gradually substituted
Source of phenotypic data | Screenings | Progeny tests | Yield trials
Type of traits | High-heritability traits, domestication traits | Depends on the evaluation scheme | Low-heritability traits (yield, resistance to abiotic stresses)
Type of marker | SNP | SSR / SNP | SSR
13 Characteristics Related to Association Mapping: Genetic Expectations
Aspect of AM | Germplasm bank | Synthetic populations | Elite germplasm
Linkage disequilibrium | Low | Intermediate and fast-decaying | High
Population structure | Medium | Low | High
Allele diversity among samples | High | Intermediate | Low
Allele diversity within samples | Variable | 1 or 2 alleles (diploid species) | 1 allele (inbred lines)
14 Characteristics Related to Association Mapping: Potential Applications
Aspect | Germplasm bank | Synthetic populations | Elite germplasm
Power | Low | Intermediate and decreasing | High (could allow genome scan)
Resolution | High (could allow fine mapping) | Intermediate and increasing | Low
Use of significant markers | Transfer of new alleles by marker-assisted backcross | Incorporation in selection index | MAS in progenies (requires validation)
15 Summary Part II
- Germplasm bank core collections could be useful for allele mining of candidate genes and fine-mapped QTLs.
- Elite lines could be useful to detect genomic regions associated with traits of interest.
- Synthetic populations might represent a balance between power and precision, and have the major advantage of being unstructured.
17 Previous QTL Information
- Doubled-haploid population AC Reed x Grandin
  - QTL for kernel size (width) near Xwmc18-2D
- Recombinant inbred population Synthetic W7984 x Opata
  - QTL for kernel size (length) on 5A and 5B
Figure panels: Width (2D); Length (5B)
18 Plant Material
- 95 cultivars of soft winter wheat from the northeastern USA
- Mostly recent releases: 92 released after 1990, 39 after 2000
- Representing 35 seed companies / institutions
- Selected from 149 cultivars based on 18 unlinked SSR markers
19 Genotypic Data
- Marker distribution: 93 SSR loci
  - 33 on chromosome 2D
  - 20 on chromosome 5A
  - 9 on chromosome 5B
  - 31 on 16 other chromosomes
- Data trimming
  - Rare alleles (frequency < 5%) were pooled with missing data; the pooled class was:
    - considered as missing for LD and population structure analysis
    - considered as an allele for AM analysis
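The trimming rule can be sketched for a single locus as follows. Two details are assumptions not stated on the slide: allele frequencies are computed among non-missing calls, and missing data are coded as `None`. The function name, the allele sizes, and the label `"rare_or_missing"` are hypothetical.

```python
from collections import Counter

MISSING = None  # hypothetical missing-data code

def pool_rare_alleles(calls, threshold=0.05, pooled_label="rare_or_missing"):
    """Recode SSR allele calls at one locus.

    Alleles with frequency < threshold (among non-missing calls) are pooled
    with missing data. Returns two recoded lists:
    - for LD / population structure: the pooled class becomes missing
    - for association mapping: the pooled class is kept as its own allele
    """
    observed = [a for a in calls if a is not MISSING]
    if not observed:
        return [MISSING] * len(calls), [pooled_label] * len(calls)
    counts = Counter(observed)
    n = len(observed)
    rare = {a for a, k in counts.items() if k / n < threshold}
    pooled = [a is MISSING or a in rare for a in calls]
    for_ld = [MISSING if p else a for a, p in zip(calls, pooled)]
    for_am = [pooled_label if p else a for a, p in zip(calls, pooled)]
    return for_ld, for_am

# Hypothetical allele sizes (bp): allele 160 has frequency 1/21 < 0.05.
calls = [150] * 12 + [152] * 8 + [160] + [None]
for_ld, for_am = pool_rare_alleles(calls)
```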
20 Methods: Population Structure
- Data: 36 unlinked SSR markers
- Program: Structure (Pritchard et al., 2000, Genetics 155:945)
- Model: without admixture (cultivars discretely assigned to subpopulations)
- Validation of subpopulations: resampled subsets of 12, 18, 24 and 30 unlinked loci
- Visualization: factorial correspondence analysis (Benzécri, 1973, L'Analyse des correspondances, Dunod)
21 Methods: Linkage Disequilibrium
- Statistic: r², with p-values from 1000 permutations
- Program: TASSEL (maizegenetics.net)
- LD among linked loci
  - Scan of the entire chromosome 2D
  - Scan of the pericentromeric region of chromosome 5A
- LD among unlinked loci
  - Computed among 36 unlinked loci
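For intuition, r² and a permutation p-value can be computed as below. This sketch is restricted to biallelic loci coded 0/1 for brevity and assumes each inbred line contributes one haplotype; it does not reproduce TASSEL's handling of multi-allelic SSR loci. Function names are hypothetical, and the (hits + 1) / (n_perm + 1) p-value is one common convention, chosen here as an assumption.

```python
import random

def r_squared(locus1, locus2):
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB)) for two biallelic loci (0/1),
    one haplotype per inbred line; lines missing (None) at either locus
    are dropped. Returns 0.0 if either locus is monomorphic."""
    pairs = [(a, b) for a, b in zip(locus1, locus2) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

def permutation_p_value(locus1, locus2, n_perm=1000, seed=1):
    """Shuffle locus2 across lines n_perm times and count how often the
    permuted r^2 is at least the observed r^2."""
    rng = random.Random(seed)
    observed = r_squared(locus1, locus2)
    shuffled = list(locus2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(locus1, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```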
22 Methods: Association Mapping
- Statistical model: linear mixed-effects model
  - markers as fixed effects
  - subpopulations as random effects
- Program: R function lme, package nlme (Pinheiro & Bates, 2000, Mixed-Effects Models in S and S-PLUS, Springer)
- Multiple-testing correction: 1000 permutations, chromosome-wise
- Two-marker models tested by likelihood ratio test
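The idea behind a chromosome-wise permutation threshold can be sketched with a max-statistic procedure: shuffle phenotypes, record the largest test statistic over all markers on the chromosome, and take the 95th percentile of those maxima. As a simplifying assumption, a one-way ANOVA F statistic stands in for the mixed model, and phenotypes are shuffled across all lines, ignoring subpopulations. Function names are hypothetical.

```python
import random
from statistics import mean

def f_statistic(genotypes, phenotypes):
    """One-way ANOVA F for one marker (alleles as groups); None = missing."""
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        if g is not None:
            groups.setdefault(g, []).append(y)
    k = len(groups)
    n = sum(len(ys) for ys in groups.values())
    if k < 2 or n <= k:
        return 0.0
    group_means = {g: mean(ys) for g, ys in groups.items()}
    grand = mean([y for ys in groups.values() for y in ys])
    ss_between = sum(len(ys) * (group_means[g] - grand) ** 2 for g, ys in groups.items())
    ss_within = sum((y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def chromosome_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the maximum F over all markers on one
    chromosome, under random permutation of the phenotypes."""
    rng = random.Random(seed)
    y = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)
        maxima.append(max(f_statistic(m, y) for m in markers))
    maxima.sort()
    idx = min(n_perm - 1, max(0, int(round((1 - alpha) * n_perm)) - 1))
    return maxima[idx]
```

A marker on that chromosome would then be declared significant when its observed F exceeds the returned threshold, which controls the error rate for the chromosome as a whole rather than per marker.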
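23 Population Structure: Sample Subdivisions
- Number of varieties and Fst per subpopulation:
  - 19 varieties, Fst = 0.337
  - 32 varieties, Fst = 0.111
  - 13 varieties, Fst = 0.295
  - 31 varieties, Fst = 0.064
- Total: 95 varieties, Fst = 0.188
- Moderate population subdivision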
24 Population Structure: Factorial Correspondence Analysis
Figure: factorial correspondence analysis of the cultivars, with subpopulations S1, S2, S3 and S4 labeled.
25 Population Structure: Resampling
Figure: percentage of cultivars assigned to one of 4 subpopulations versus the number of unlinked markers used for inference of population structure.
26 Linkage Disequilibrium: Germplasm Sample Selection
Figure: r² and its probability (p < 0.01, p < 0.001, p < 0.0001) for pairs of unlinked SSR markers, in the 149-line and 95-line samples.
- 149 lines genotyped with 18 unlinked SSR markers
- Most similar lines were excluded
- "Normalizing" the sample drastically reduced LD among unlinked markers
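The slides do not specify how similarity was measured or how the most similar lines were chosen for exclusion. The sketch below assumes one plausible rule, offered only as an illustration: similarity is the proportion of shared alleles across markers, and the program repeatedly removes one member of the most similar remaining pair (the one with the higher average similarity to the rest) until a target sample size is reached. All names are hypothetical.

```python
from itertools import combinations

def allele_sharing(g1, g2):
    """Proportion of markers (non-missing in both lines) with the same allele."""
    shared = [a == b for a, b in zip(g1, g2) if a is not None and b is not None]
    return sum(shared) / len(shared) if shared else 0.0

def prune_similar_lines(genotypes, target_size):
    """Greedily drop lines until target_size remain (hypothetical rule).

    genotypes: dict mapping line name -> list of allele calls, one per marker.
    """
    if target_size < 2:
        raise ValueError("target_size must be at least 2")
    keep = set(genotypes)

    def avg_similarity(line):
        others = [o for o in keep if o != line]
        return sum(allele_sharing(genotypes[line], genotypes[o]) for o in others) / len(others)

    while len(keep) > target_size:
        a, b = max(combinations(sorted(keep), 2),
                   key=lambda pair: allele_sharing(genotypes[pair[0]], genotypes[pair[1]]))
        keep.remove(a if avg_similarity(a) >= avg_similarity(b) else b)
    return sorted(keep)
```

For the study's numbers, this would mean calling `prune_similar_lines(genotypes, 95)` on the 149 genotyped lines; whether this rule matches the authors' selection is not known.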
27 Definition of a Baseline LD Specific to Our Sample
- Defined as the 95th percentile of the distribution of r² among unlinked loci
- r² estimates above this value are probably due to genetic linkage
- Baseline LD for this sample: r² = 0.0654
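With the 36 unlinked loci there are 36 x 35 / 2 = 630 locus pairs whose r² values form the reference distribution. A minimal sketch of the baseline and the comparison against it follows; it uses Python's `statistics.quantiles` (default "exclusive" method), and other percentile definitions can give slightly different values, so it is not claimed to reproduce 0.0654 exactly. Function names and locus labels are hypothetical.

```python
from statistics import quantiles

def baseline_ld(unlinked_r2, pct=95):
    """pct-th percentile of r^2 among pairs of unlinked loci."""
    cuts = quantiles(unlinked_r2, n=100)   # 99 cut points: 1st..99th percentiles
    return cuts[pct - 1]

def likely_linkage(pairs_r2, baseline):
    """Keep the locus pairs whose r^2 exceeds the baseline."""
    return {pair: r2 for pair, r2 in pairs_r2.items() if r2 > baseline}

# Hypothetical linked-locus pairs compared with this sample's baseline.
flagged = likely_linkage({("l1", "l2"): 0.30, ("l3", "l4"): 0.02}, 0.0654)
```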
28 Linkage Disequilibrium: Chromosome 2D
Consistent LD extended less than 1 cM
29 Linkage Disequilibrium: Chromosome 5A
LD extended over 5 cM
30 Loci Associated with Kernel Size (p-values): Chromosome 2D
Agreed with QTL in Reed x Grandin
Kernel size
cM | Locus | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
7 | Xcfd56 | 0.069 | 0.160 | 0.012 | 0.119 | 0.076 | 0.031 | 0.000 | 0.252
11 | Xwmc111 | 0.005 | 0.020 | 0.005 | 0.108 | 0.003 | 0.107 | 0.000 | 0.000
23 | Xgwm261 | 0.145 | 0.016 | 0.019 | 0.009 | 0.027 | 0.009 | 0.058 | 0.001
28 | Xwmc112 | 0.012 | 0.057 | 0.047 | 0.120 | 0.480 | 0.367 | 0.001 | 0.024
64 | Xgwm30 | 0.081 | 0.862 | 0.053 | 0.848 | 0.312 | 0.820 | 0.000 | 0.212
91 | Xgwm539 | 0.042 | 0.038 | 0.030 | 0.039 | 0.001 | 0.005 | 0.290 | 0.334
Milling quality
None of the loci on 2D were significant after multiple-testing correction.
31 Loci Associated with Kernel Size (p-values): Chromosome 5A
Agreed with QTL in M6 x Opata
Kernel size
cM | Locus | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
55 | Xcfa2250 | 0.021 | 0.007 | 0.044 | 0.014 | 0.014 | 0.002 | 0.637 | 0.649
55 | Xwmc150b | 0.002 | 0.003 | 0.003 | 0.005 | 0.009 | 0.002 | 0.093 | 0.429
56 | Xbarc117 | 0.009 | 0.002 | 0.021 | 0.005 | 0.118 | 0.022 | 0.044 | 0.039
60 | Xbarc141 | 0.631 | 0.037 | 0.232 | 0.024 | 0.038 | 0.002 | 0.852 | 0.863
Milling quality
cM | Locus | Milling score | Flour yield | ESI | Friability | Break-flour yield
55 | Xcfa2250 | 0.010 | 0.029 | 0.047 | 0.002 | 0.081
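32 B.L.U.E. of Allele Effects: Kernel Length
Figure: B.L.U.E. of allele effects on kernel length; allele classes labeled with the number of cultivars carrying each (9, 5, 18, 37; 9, 9, 41, 45, 43, 49).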
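33 B.L.U.E. of Allele Effects: Kernel Width
Figure: B.L.U.E. of allele effects on kernel width; allele classes labeled with the number of cultivars carrying each (41, 14, 8, 15; 18, 24, 5, 10, 19).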
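34 B.L.U.E. of Allele Effects: Kernel Weight
Figure: B.L.U.E. of allele effects on kernel weight; allele classes labeled with the number of cultivars carrying each (41, 45, 43; 49).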
35 Summary Part III
- Linkage disequilibrium
  - LD on chromosome 2D was on the subcentimorgan scale
  - LD on chromosome 5A extended over 5 cM, forming an LD block
- Association mapping
  - Loci on chromosome 2D were associated with kernel width
  - Loci on chromosome 5A were associated with kernel length and friability
  - Favorable and unfavorable marker alleles were identified
  - In recurrent selection, markers could be used to carry information from a good year to a bad year
  - In pedigree breeding, markers could carry information about yield potential from the phase of replicated field trials to the phase of single-plant selection
36 Acknowledgements
- USDA Soft Wheat Quality Lab, Wooster, OH
- Embrapa
- Technical support
  - David Benscher
  - James Tanaka
  - Gretchen Salm
37 Cornell Small Grains Breeding and Genetics Project
James Tanaka
Mike Gifford
David Benscher
Jesse Munkvold
Rob Elshire
Abigail Losh
Gretchen Salm