Title: Statistical Aspect of Clonal Progeny Test
1 Statistical Aspect of Clonal Progeny Test (Major Course Seminar)
- ICAR-Indian Agricultural Statistical Research Institute
- Shashank Kshandakar, Ph.D. (Agricultural Statistics), Roll No. 10684, I.A.S.R.I., Library Avenue, New Delhi-110012
2 Contents
1. Introduction
2. Linear mixed model
3. Covariance structure
4. Estimation of parameters
5. Model selection
6. Application of LMM to Clonal Progeny Testing
7. Illustration
8. Conclusions
9. References
3 Introduction
- Forest tree breeding is the application of genetic, reproductive biology and economic principles to the genetic improvement and management of forest trees
- Tree breeders face several challenges when studying quantitative traits, owing to the ontogeny of these traits:
  - Basic genetic information about trees is often not known
  - Trees have long generation times
  - Indirect selection is not easy
  - Seasonal fluctuations
(Zobel and Talbert, 1984)
4 Introduction
- Traditionally, tree breeding programs have relied on testing full-sib and half-sib progenies generated by different mating schemes
- Advanced breeding programs are increasingly using clones of selected genotypes for testing and selection purposes
- The progeny of a single plant obtained by asexual reproduction is known as a clone
- A clonal progeny test, or clonal test, is a method of estimating the breeding value of a plant from the performance (phenotype) of its clone
5 Introduction
- Advantages
  - Conserves heterosis for a long period of time
  - Increased uniformity
  - Reduction in the time needed to get improved material into production
- Disadvantages
  - Higher costs
  - Reduced genetic diversity
- Data collected from clonal progeny tests can become more complex to analyze because measures can be doubly repeated. First, measures of the same ramet are taken at different times. Second, measures from two ramets of the same clone are in effect two repeated measures of the same genotype
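6 Linear Mixed Model
$$ y = X\beta + Zu + e $$
- $y$: $n \times 1$ vector of observations
- $\beta$: $p \times 1$ vector of fixed effects
- $u$: $q \times 1$ vector of random effects
- $e$: $n \times 1$ vector of random residual effects
- $n$: number of records or observations
- $q$: number of levels of the random effects
- $p$: number of levels or components of the fixed effects
- $X$: design matrix of order $n \times p$
- $Z$: design matrix of order $n \times q$
$$ E\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \mathrm{Var}\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}, \qquad V = \mathrm{Var}(y) = ZGZ' + R $$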
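As a minimal sketch of how the marginal variance V = ZGZ' + R is assembled, the following builds the matrices for a toy layout. The layout (3 clones with 2 ramets each), the variance values, and the use of NumPy are all illustrative assumptions, not part of the seminar.

```python
import numpy as np

# Hypothetical toy layout: 3 clones (levels of the random effect), 2 ramets each.
n_clones, n_ramets = 3, 2
n = n_clones * n_ramets

# Fixed effects: overall mean only, so p = 1 and X is a column of ones.
X = np.ones((n, 1))

# Z is the n x q incidence matrix linking each observation to its clone (q = 3).
Z = np.kron(np.eye(n_clones), np.ones((n_ramets, 1)))

# Assumed variance components (illustrative values only).
sigma2_u, sigma2_e = 2.0, 1.0
G = sigma2_u * np.eye(n_clones)   # Var(u)
R = sigma2_e * np.eye(n)          # Var(e)

# Marginal variance of y.
V = Z @ G @ Z.T + R
print(V)
```

Observations on the same clone share the covariance contributed by G, while observations on different clones are uncorrelated, which is the block pattern visible in the printed matrix.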
7 Covariance Structures
- Covariance structures explain the patterns of correlation observed among repeated-measures data
- Variance components are a way to assess the amount of variation in a dependent variable that is associated with one or more random-effects variables
- Overall model fit and the parameter estimates, along with their standard errors, are sensitive to the covariance structure (Fitzmaurice et al., 2004)
- Modeling the covariance structure reduces the number of parameters and can improve model convergence
- Covariance is also a component of the genetic variance estimator
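8 Compound Symmetry (CS)
- Simplest covariance structure, with 2 unknown parameters
- Within-clone (subject) errors are correlated, and the correlation is presumed to be the same for every pair of measurements
$$ \Sigma_e = \begin{bmatrix} \sigma_e^2 & \sigma_{ee'} & \cdots & \sigma_{ee'} \\ \sigma_{ee'} & \sigma_e^2 & \cdots & \sigma_{ee'} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{ee'} & \sigma_{ee'} & \cdots & \sigma_e^2 \end{bmatrix} $$
Unstructured (UN)
$$ \Sigma_e = \begin{bmatrix} \sigma_{e_1}^2 & \sigma_{e_1 e_2} & \cdots & \sigma_{e_1 e_r} \\ \sigma_{e_1 e_2} & \sigma_{e_2}^2 & \cdots & \sigma_{e_2 e_r} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{e_1 e_r} & \sigma_{e_2 e_r} & \cdots & \sigma_{e_r}^2 \end{bmatrix} $$
- The correlation pattern among residuals is comparatively complex: every variance and covariance is a separate parameter
- Appropriate when the data are balanced and the number of measurement occasions $r$ is relatively small
- The number of unknown parameters is $r(r+1)/2$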
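9 First-Order Autoregressive AR(1)
- Homogeneous variances
- Covariances decline exponentially with distance
- The number of unknown parameters is 2
$$ \Sigma_e = \sigma^2 \begin{bmatrix} 1 & \rho & \cdots & \rho^{r-1} \\ \rho & 1 & \cdots & \rho^{r-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{r-1} & \rho^{r-2} & \cdots & 1 \end{bmatrix} $$
Heterogeneous Compound Symmetry (CSH)
$$ \Sigma_e = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 & \cdots & \rho\sigma_1\sigma_r \\ \rho\sigma_1\sigma_2 & \sigma_2^2 & \cdots & \rho\sigma_2\sigma_r \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma_1\sigma_r & \rho\sigma_2\sigma_r & \cdots & \sigma_r^2 \end{bmatrix} $$
- The correlation $\rho$ is constant, while each occasion has its own variance
- The number of unknown parameters is $r + 1$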
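The structures above can be sketched as small matrix builders. This is an illustrative sketch only: the function names, argument names and NumPy usage are assumptions introduced here, and the parameter counts simply restate those given on the slides.

```python
import numpy as np

def compound_symmetry(r, sigma2_e, sigma_ee):
    """CS: one common variance on the diagonal, one common covariance elsewhere."""
    return sigma_ee * np.ones((r, r)) + (sigma2_e - sigma_ee) * np.eye(r)

def ar1(r, sigma2, rho):
    """AR(1): covariance sigma2 * rho**|k - k'| decays with the lag between occasions."""
    lags = np.abs(np.subtract.outer(np.arange(r), np.arange(r)))
    return sigma2 * rho ** lags

def heterogeneous_cs(sigmas, rho):
    """CSH: occasion-specific standard deviations, constant correlation rho."""
    sigmas = np.asarray(sigmas, dtype=float)
    r = len(sigmas)
    corr = rho * np.ones((r, r)) + (1.0 - rho) * np.eye(r)
    return np.outer(sigmas, sigmas) * corr

def n_parameters(structure, r):
    """Number of unknown covariance parameters for r measurement occasions."""
    return {"CS": 2, "AR1": 2, "CSH": r + 1, "UN": r * (r + 1) // 2}[structure]

print(ar1(4, sigma2=2.0, rho=0.5))
print({s: n_parameters(s, 4) for s in ("CS", "AR1", "CSH", "UN")})
```

With four occasions the unstructured matrix already needs 10 parameters against 2 for CS or AR(1), which is why UN is recommended only when the number of occasions is small.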
10 Estimation of Parameters
- Mixed Model Equations (Henderson, 1984)
- The BLUP methodology is used to predict breeding values
- For BLUP analysis, pedigree data on the selected parents as well as their non-selected contemporaries are included in the analysis
- Models can be extended to more complicated effects, such as (a) correlated traits, (b) interactions between environment and genotype, and (c) heterogeneous variances
$$ \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix} $$
$$ \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-1} \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix} $$
$$ \hat{\beta} = (X'V^{-1}X)^{-} X'V^{-1}y, \qquad \hat{u} = GZ'V^{-1}(y - X\hat{\beta}) $$
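The following sketch solves the mixed model equations numerically and checks that the solution agrees with the generalized least squares estimate of the fixed effects and the BLUP of the random effects. The toy layout, the variance values and the simulated response are assumptions made purely for illustration, and G and R are treated as known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 4 clones with 3 ramets each, overall mean as the only fixed effect.
n_clones, n_ramets = 4, 3
n = n_clones * n_ramets
X = np.ones((n, 1))
Z = np.kron(np.eye(n_clones), np.ones((n_ramets, 1)))
G = 2.0 * np.eye(n_clones)   # assumed known Var(u)
R = 1.0 * np.eye(n)          # assumed known Var(e)
y = rng.normal(size=n)       # simulated data, for illustration only

# Henderson's mixed model equations.
Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
lhs = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
solution = np.linalg.solve(lhs, rhs)
beta_mme, u_mme = solution[:1], solution[1:]

# The same quantities from the V-based formulas.
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_blup = G @ Z.T @ Vinv @ (y - X @ beta_gls)

print(np.allclose(beta_mme, beta_gls), np.allclose(u_mme, u_blup))
```

The mixed model equations only require inverses of R and G, which are usually much simpler than the inverse of the full n x n matrix V; that is the practical reason for preferring them.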
11 Estimation of Parameters
- Maximum Likelihood Method
- Maximum likelihood estimates of the variance components are obtained by maximizing the log-likelihood with respect to each parameter
- That is, find the values of the fixed effects and of the variance components of the random and residual effects that maximize the likelihood function over the parameter space
- $y = X\beta + Zu + e$
- $E(y) = X\beta$, $\mathrm{Var}(y) = V = ZGZ' + R$
- $y \sim N(X\beta,\ ZGZ' + R)$
- The likelihood function is then
$$ L = (2\pi)^{-n/2}\, |V|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y - X\beta)' V^{-1} (y - X\beta) \right\} $$
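A minimal sketch of the log of this likelihood, written so that it can be handed to a numerical optimizer. The function name and the use of NumPy are assumptions; the formula is the multivariate normal log-density of y with mean Xβ and variance V.

```python
import numpy as np

def lmm_log_likelihood(y, X, beta, V):
    """Log-likelihood of y ~ N(X beta, V) for a given beta and V."""
    n = len(y)
    resid = y - X @ beta
    _, logdet = np.linalg.slogdet(V)            # log |V|, numerically stable
    quad = resid @ np.linalg.solve(V, resid)    # (y - Xb)' V^{-1} (y - Xb)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Tiny illustrative call with made-up numbers.
y = np.array([1.0, 2.0])
X = np.ones((2, 1))
print(lmm_log_likelihood(y, X, beta=np.array([1.5]), V=np.eye(2)))
```

In practice V depends on the variance components through G and R, so the function would be evaluated repeatedly as those components are updated.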
12 Estimation of Parameters
- Restricted Maximum Likelihood Method
- $y = X\beta + Zu + e$
- $Ly = LX\beta + LZu + Le$, with $L$ chosen so that $LX = 0$
- Restrict the data to $N - p$ modified observations that are independent of $\beta$, then maximize the likelihood of these modified observations
- REML corrects the bias associated with maximum likelihood estimates by taking into account the degrees of freedom used for estimating the fixed effects
- Less numerically intensive than the Maximum Likelihood Method
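The bias correction is easiest to see in the simplest special case, a model with only an overall mean and independent residuals (p = 1). There the ML estimate of the residual variance divides the sum of squares by n, while the REML estimate divides by n - p. The sketch below uses made-up data and a hypothetical function name.

```python
def variance_estimates(y):
    """ML and REML estimates of the residual variance for the model y = mu + e."""
    n = len(y)
    p = 1                                   # one fixed effect: the overall mean
    mean = sum(y) / n
    ss = sum((v - mean) ** 2 for v in y)    # residual sum of squares
    return ss / n, ss / (n - p)             # (ML, REML)

ml, reml = variance_estimates([2.0, 4.0, 6.0])
print(ml, reml)
```

The REML divisor n - p is the number of modified observations left after removing the fixed effects, which is the N - p referred to above.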
13 Estimation of Parameters
- An Expectation-Maximization (EM) algorithm is an iterative method for finding maximum likelihood estimates of parameters in statistical models where the model depends on unobserved (latent) variables
- The EM iteration alternates between an expectation (E) step and a maximization (M) step
- Expectation (E) step: creates a function for the expectation of the log-likelihood, evaluated using the current estimates of the parameters
- Maximization (M) step: computes the parameters that maximize the expected log-likelihood found in the E step
- Iterate steps E and M until convergence
14 Estimation of Parameters
- Newton-Raphson procedure
- Find $x$ such that $f(x) = 0$
- $x_1 = x_0 - f(x_0) / f'(x_0)$
- Then use $x_1$ in place of $x_0$ to obtain a new update $x_2$
- $x_{n+1} = x_n - f(x_n) / f'(x_n)$
- The process is repeated until a convergence criterion is reached, i.e., $|x_{n+1} - x_n| < \delta$
- A Newton-Raphson optimization algorithm is usually preferred to EM for obtaining the ML or REML estimates of G and R
- A disadvantage of EM is that its rate of convergence can be extremely slow if a lot of data are missing (Lindstrom and Bates, 1988)
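The update rule can be sketched for a scalar function as follows. The function name, tolerance and the test function are illustrative assumptions; in likelihood estimation the same iteration would be applied to the derivative of the log-likelihood rather than to a toy polynomial.

```python
def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=100):
    """Iterate x_{n+1} = x_n - f(x_n) / f'(x_n) until |x_{n+1} - x_n| < tol."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / f_prime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Newton-Raphson did not converge")

# Illustrative use: the positive root of x^2 - 2 = 0.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```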
15 Model Selection
- Likelihood Ratio Test
- Model selection is the task of selecting a statistical model from a set of candidate models for a given data set
- Let $L_R$ (reduced model) and $L_F$ (full model) be the maximum values of the likelihood of the data with and without the additional restriction on the parameters
- $H_0$: the reduced model is true; $H_A$: the full model is true
- $\lambda = L_R / L_F$ with $0 \le \lambda \le 1$, and $\mathrm{LRTS} = -2 \ln \lambda \sim \chi^2_q$
- Compute LRTS and reject the restriction if LRTS is larger than the chosen chi-square percentile with $q$ degrees of freedom, where $q$ is the number of parameters restricted under $H_0$
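As a hedged sketch of the test, the function below takes the two -2 ln L values and returns the statistic with its p-value for the special case of one restricted parameter (q = 1), where the chi-square tail probability has a closed form through the complementary error function. The example numbers are the -2 ln L values reported for height under CS (ML) and UN (ML) in the information-criteria table of this talk; treating the difference as having 1 degree of freedom is an assumption based on the two-block example, where CS has 2 and UN has 3 covariance parameters.

```python
import math

def likelihood_ratio_test_1df(minus2lnL_reduced, minus2lnL_full):
    """LRTS = -2 ln(L_R / L_F) and its p-value against chi-square with 1 df."""
    lrts = minus2lnL_reduced - minus2lnL_full
    # For 1 df: P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lrts, 0.0) / 2.0))
    return lrts, p_value

# Height: CS (reduced) against UN (full), both fitted by ML.
lrts, p_value = likelihood_ratio_test_1df(157.5, 157.1)
print(lrts, p_value)
```

A p-value well above 0.05 would indicate that the simpler CS structure is not rejected in favour of UN.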
16 Model Selection
- How well does a specified model fit the data? Information criteria compare candidate models with one another; they cannot tell anything about the quality of a model in an absolute sense
- Akaike's Information Criterion (AIC)
  - $\mathrm{AIC} = -2\ln(L) + 2p$
- Bayesian Information Criterion (BIC)
  - $\mathrm{BIC} = -2\ln(L) + p\ln(N)$
- Hannan-Quinn Information Criterion (HQC)
  - $\mathrm{HQC} = -2\ln(L) + 2p\ln(\ln(N))$
- $L$ is the maximized value of the likelihood function, $p$ is the number of free parameters to be estimated, and $N$ is the number of observations
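A short sketch of the three criteria, using the standard Hannan-Quinn penalty 2p ln(ln N). The example call uses the -2 ln L value reported for height under CS (REML) later in this talk; the values p = 4 and N = 5 are assumptions obtained by back-calculating from the reported AIC and BIC, not figures stated by the author.

```python
import math

def information_criteria(minus2lnL, p, N):
    """AIC, BIC and HQC from -2 ln(L), p free parameters and N observations."""
    return {
        "AIC": minus2lnL + 2 * p,
        "BIC": minus2lnL + p * math.log(N),
        "HQC": minus2lnL + 2 * p * math.log(math.log(N)),
    }

# p = 4 and N = 5 are inferred assumptions (see the note above).
ic = information_criteria(minus2lnL=154.1, p=4, N=5)
print({name: round(value, 1) for name, value in ic.items()})
```

For all three criteria, smaller values indicate the preferred model among the candidates being compared.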
17 Layout of Clonal Progeny Trials
Family  Clone  Block 1  Block 2  Block 3  Block 4  ...  Block r
1       11     Y111     Y112     Y113     Y114     ...  Y11r
        12     Y121     Y122     Y123     Y124     ...  Y12r
        ...
        1n     Y1n1     Y1n2     Y1n3     Y1n4     ...  Y1nr
2       21     Y211     Y212     Y213     Y214     ...  Y21r
        22     Y221     Y222     Y223     Y224     ...  Y22r
        ...
        2n     Y2n1     Y2n2     Y2n3     Y2n4     ...  Y2nr
...
g       g1     Yg11     Yg12     Yg13     Yg14     ...  Yg1r
        g2     Yg21     Yg22     Yg23     Yg24     ...  Yg2r
        ...
        gn     Ygn1     Ygn2     Ygn3     Ygn4     ...  Ygnr
18 Application of LMM to Clonal Testing
- Planting more than one ramet of the same genotype in the same trial generates residual effects that are correlated across blocks
- Linear mixed model (LMM) methodology is suitable for the statistical and genetic analyses of spatially repeated measures collected from clonal progeny tests
- The estimation of variance components and the heterogeneity of the clones propagated within blocks are of primary interest
- The most commonly assumed covariance structures are compound symmetry (CS), first-order autoregressive AR(1) and unstructured (UN) (Negash et al., 2014)
20 Clonal Progeny Test
- The general linear mixed model that corresponds to this clonal progeny test, for family $i$, clone $j$ within family and block $k$, is
$$ y_{ijk} = \mu + f_i + c_{ij} + B_k + I_{ik} + e_{ijk} $$
$$ y = \mu 1_N + X_r\beta_r + Z_g v_g + Z_n v_n + Z_I v_I + e $$
- where $y$ is the $N \times 1$ vector of observations
- $X_r$ and $\beta_r$ are the known $N \times r$ coefficient matrix and the $r \times 1$ vector of fixed effects, respectively
- $Z_g$, $Z_n$ and $Z_I$ are the $N \times g$, $N \times gn$ and $N \times gr$ coefficient matrices for the random effects, respectively
- $v_g$, $v_n$ and $v_I$ are the vectors of random family, clone, and interaction effects, respectively
- $e$ is the $N \times 1$ vector of random errors
- $N = gnr$
(Zamudio et al., 2008)
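One way to picture the coefficient matrices is to build them with Kronecker products for a small balanced layout. The sketch below assumes observations are sorted by family, then clone within family, then block; the sizes g = 2, n = 3, r = 2 and the helper name are hypothetical.

```python
import numpy as np

g, n, r = 2, 3, 2        # hypothetical: families, clones per family, blocks
N = g * n * r            # total observations, sorted by family, clone, block

def ones_col(m):
    """Column vector of m ones."""
    return np.ones((m, 1))

Z_g = np.kron(np.eye(g), np.kron(ones_col(n), ones_col(r)))   # N x g, family incidence
Z_n = np.kron(np.eye(g * n), ones_col(r))                     # N x gn, clone incidence
Z_I = np.kron(np.eye(g), np.kron(ones_col(n), np.eye(r)))     # N x gr, family-by-block incidence
X_r = np.kron(ones_col(g * n), np.eye(r))                     # N x r, block incidence

for name, m in [("Z_g", Z_g), ("Z_n", Z_n), ("Z_I", Z_I), ("X_r", X_r)]:
    print(name, m.shape)
```

Every row of each matrix contains a single 1, marking the family, clone, family-by-block cell or block to which that observation belongs.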
21 Variance Formulation and Modeling
- The variance of the vector of random effects is represented as
$$ \mathrm{Var}(v) = \begin{bmatrix} \mathrm{Var}(v_g) & \mathrm{Cov}(v_g, v_n) & \mathrm{Cov}(v_g, v_I) \\ \mathrm{Cov}(v_n, v_g) & \mathrm{Var}(v_n) & \mathrm{Cov}(v_n, v_I) \\ \mathrm{Cov}(v_I, v_g) & \mathrm{Cov}(v_I, v_n) & \mathrm{Var}(v_I) \end{bmatrix} = \begin{bmatrix} \sigma_f^2 I_g & 0 & 0 \\ 0 & \sigma_c^2 (I_g \otimes I_n) & 0 \\ 0 & 0 & \sigma_I^2 (I_g \otimes I_r) \end{bmatrix} $$
- The following assumptions are involved in the variance formulation:
  - $\mathrm{Cov}(f_i, f_{i'}) = 0$
  - $\mathrm{Cov}(c_{ij}, c_{ij'}) = \mathrm{Cov}(c_{ij}, c_{i'j}) = 0$
  - $\mathrm{Cov}(I_{ik}, I_{i'k}) = \mathrm{Cov}(I_{ik}, I_{ik'}) = 0$
22 Variance Formulation and Modeling
$$ \mathrm{Var}(e) = \mathrm{Var}\begin{bmatrix} e_{11} \\ e_{12} \\ \vdots \\ e_{gn} \end{bmatrix} = \begin{bmatrix} \Sigma_e & 0 & \cdots & 0 \\ 0 & \Sigma_e & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma_e \end{bmatrix} = (I_g \otimes I_n \otimes \Sigma_e) $$
- where $e_{ij} = (e_{ij1}, e_{ij2}, \ldots, e_{ijr})'$ is the vector of residual effects measured for subject (clone) $ij$, and $\Sigma_e$ is the variance-covariance matrix of $e_{ij}$, i.e., residual effects are correlated within clones
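23 Variance Formulation and Modeling
- $\Sigma_e$ must be modeled to estimate the variance and covariance components, which can be used in any further genetic analysis
- $\mathrm{Cov}(e_{ij}, e_{ij'}) = \mathrm{Cov}(e_{ij}, e_{i'j}) = 0$
- $E(y_{ijk}) = \mu + B_k$
$$ \mathrm{Var}(y) = \begin{bmatrix} Z_g & Z_n & Z_I \end{bmatrix} \begin{bmatrix} \sigma_f^2 I_g & 0 & 0 \\ 0 & \sigma_c^2 (I_g \otimes I_n) & 0 \\ 0 & 0 & \sigma_I^2 (I_g \otimes I_r) \end{bmatrix} \begin{bmatrix} Z_g' \\ Z_n' \\ Z_I' \end{bmatrix} + (I_g \otimes I_n \otimes \Sigma_e) $$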
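24 Variance Formulation and Modeling
- The variance-covariance matrix for the vector of measures on clone $ij$ is
$$ \mathrm{Var}(y_{ij}) = \mathrm{Var}\begin{bmatrix} y_{ij1} \\ y_{ij2} \\ \vdots \\ y_{ijr} \end{bmatrix} = \begin{bmatrix} \mathrm{Var}(y_{ij1}) & \mathrm{Cov}(y_{ij1}, y_{ij2}) & \cdots & \mathrm{Cov}(y_{ij1}, y_{ijr}) \\ \mathrm{Cov}(y_{ij1}, y_{ij2}) & \mathrm{Var}(y_{ij2}) & \cdots & \mathrm{Cov}(y_{ij2}, y_{ijr}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(y_{ij1}, y_{ijr}) & \mathrm{Cov}(y_{ij2}, y_{ijr}) & \cdots & \mathrm{Var}(y_{ijr}) \end{bmatrix} $$
- The elements along the diagonal are the total variance of each observation on the clone:
$$ \mathrm{Var}(y_{ijk}) = \mathrm{Var}(\mu + f_i + c_{ij} + B_k + I_{ik} + e_{ijk}) = \sigma_f^2 + \sigma_c^2 + \sigma_I^2 + \sigma_e^2 $$
- where $\sigma_f^2$, $\sigma_c^2$, $\sigma_I^2$ and $\sigma_e^2$ are the variances of family, clone within family, family-by-block, and residual effects, respectively.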
25 Variance Formulation and Modeling
The off-diagonal elements of $\mathrm{Var}(y_{ij})$ are the covariances between measures of the same clone taken in two different blocks $k$ and $k'$:
$$ \mathrm{Cov}(y_{ijk}, y_{ijk'}) = \mathrm{Cov}(\mu + f_i + c_{ij} + B_k + I_{ik} + e_{ijk},\ \mu + f_i + c_{ij} + B_{k'} + I_{ik'} + e_{ijk'}) = \sigma_f^2 + \sigma_c^2 + \sigma_{ee'} $$
where $\mathrm{Cov}(e_{ijk}, e_{ijk'}) = \sigma_{ee'}$ is the covariance between residuals of the same clone in two blocks.
26 Genotypic Variance
The observational components of variation from the LMM of a clonal progeny test are better interpreted by calculating the covariances between two vegetative copies (ramets) of the $ij$-th clone.
$$ y_{ijk} = g_{ij} + Es_{ijk} $$
where $y_{ijk}$ is the phenotypic value, $g_{ij}$ is the genotypic value and $Es_{ijk}$ is the specific environmental effect associated with the $k$-th block.
The covariance between two ramets of the $ij$-th genotype planted in blocks $k$ and $k'$ is
$$ \mathrm{Cov}(y_{ijk}, y_{ijk'}) = E[(g_{ij} + Es_{ijk})(g_{ij} + Es_{ijk'})] = E(g_{ij}^2 + g_{ij}Es_{ijk'} + g_{ij}Es_{ijk} + Es_{ijk}Es_{ijk'}) = V_G $$
with the cross-product terms taken to have zero expectation, so that
$$ V_G = \sigma_f^2 + \sigma_c^2 + \sigma_{ee'} $$
27 Variance Formulation and Modeling
The covariance between two different clones from the same family in the same block:
$$ \mathrm{Cov}(y_{ijk}, y_{ij'k}) = \mathrm{Cov}(\mu + f_i + c_{ij} + B_k + I_{ik} + e_{ijk},\ \mu + f_i + c_{ij'} + B_k + I_{ik} + e_{ij'k}) = \sigma_f^2 + \sigma_I^2 $$
The covariance between two different clones from the same family but in different blocks:
$$ \mathrm{Cov}(y_{ijk}, y_{ij'k'}) = \mathrm{Cov}(\mu + f_i + c_{ij} + B_k + I_{ik} + e_{ijk},\ \mu + f_i + c_{ij'} + B_{k'} + I_{ik'} + e_{ij'k'}) = \sigma_f^2 $$
The covariance matrix between the vectors of two clones from different families is zero, i.e.,
$$ \mathrm{Cov}(y_{ijk}, y_{i'jk}) = \mathrm{Cov}(y_{ijk}, y_{i'jk'}) = 0 $$
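These element-wise covariances can be checked numerically by assembling Var(y) for a tiny balanced trial and reading off individual entries. The sketch below assumes a compound-symmetry residual structure, observations sorted by family, clone and block, and made-up variance values; none of these numbers come from the seminar.

```python
import numpy as np

g, n, r = 2, 2, 2                                        # hypothetical small trial
s2_f, s2_c, s2_I, s2_e, s_ee = 0.7, 0.2, 0.1, 13.7, 2.5  # assumed components

def J(m):
    """m x m matrix of ones."""
    return np.ones((m, m))

# Compound-symmetry residual matrix for the r measures on one clone.
Sigma_e = s_ee * J(r) + (s2_e - s_ee) * np.eye(r)

# Var(y) = family + clone + family-by-block + residual contributions.
V = (s2_f * np.kron(np.eye(g), J(n * r))
     + s2_c * np.kron(np.eye(g * n), J(r))
     + s2_I * np.kron(np.eye(g), np.kron(J(n), np.eye(r)))
     + np.kron(np.eye(g * n), Sigma_e))

def idx(i, j, k):
    """Row of observation (family i, clone j, block k), all zero-based."""
    return (i * n + j) * r + k

a = idx(0, 0, 0)
print("total variance:             ", V[a, a])
print("same clone, other block:    ", V[a, idx(0, 0, 1)])
print("other clone, same block:    ", V[a, idx(0, 1, 0)])
print("other clone, other block:   ", V[a, idx(0, 1, 1)])
print("other family:               ", V[a, idx(1, 0, 0)])
```

Each printed entry should match the corresponding sum of components, for example the family variance plus the family-by-block variance for two clones of one family in the same block.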
28 Genotypic Variance
$$ P = G + E + G \times E, \qquad G = A + D + I $$
- $P$ is the phenotypic value
- $G$ is the genotypic value
- $E$ is the environmental deviation
- $G \times E$ is the genotype-by-environment interaction
- $A$ is the sum of the breeding values of all loci that contribute to the character
- $D$ is the sum of dominance deviations within individual loci
- $I$ is the interaction or epistatic deviation between loci
$$ V_P = V_G + V_E + V_{G \times E} $$
$$ V_G = V_A + V_D + V_I $$
$$ V_P = V_A + V_D + V_I + V_E + V_{G \times E} $$
29 Estimation of Variance Components
The covariance between two different individuals (clones) from the same family planted in the same block, expressed in terms of genetic components, is
- For two full sibs: $\mathrm{Cov}(y_{ijk}, y_{ij'k}) = \tfrac{1}{2}V_A + \tfrac{1}{4}V_D + V_{Ec}$
- For two half sibs: $\mathrm{Cov}(y_{ijk}, y_{ij'k}) = \tfrac{1}{4}V_A + V_{Ec}$
(Zamudio et al., 2008)
where $V_A$ and $V_D$ are the additive and dominance variances, respectively, and $V_{Ec}$ is the variance of the common microenvironment (block) effects.
30 Estimation of Variance Components
The covariance between two different clones from the same family planted in different blocks, expressed in terms of genetic components, is
- For two full sibs: $\mathrm{Cov}(y_{ijk}, y_{ij'k'}) = \tfrac{1}{2}V_A + \tfrac{1}{4}V_D$
- For two half sibs: $\mathrm{Cov}(y_{ijk}, y_{ij'k'}) = \tfrac{1}{4}V_A$
(Zamudio et al., 2008)
where $V_A$ and $V_D$ are the additive and dominance variances, respectively.
31 Estimation of Variance Components
$$ \mathrm{Cov}(y_{ijk}, y_{ij'k}) = \sigma_f^2 + \sigma_I^2, \qquad \mathrm{Cov}(y_{ijk}, y_{ij'k'}) = \sigma_f^2 $$
For half-sib families, comparing these with the genetic expressions for the same covariances gives the direct estimators
$$ \sigma_f^2 = \tfrac{1}{4}V_A, \qquad \sigma_I^2 = V_{Ec} $$
For full-sib families, the same comparison gives
$$ \sigma_f^2 = \tfrac{1}{2}V_A + \tfrac{1}{4}V_D, \qquad \sigma_I^2 = V_{Ec} $$
Regardless of the type of pedigree, the observed variance of the family-by-block effects is an estimator of the variance of the common microenvironment effects.
32 Estimation of Variance Components
The environmental effect has two components: the specific microenvironment effect within a block ($Es_{ijk}$) and the common microenvironment effect between blocks ($Ec_k$).
$$ V_E = V_{Es} + V_{Ec} $$
$$ y_{ijk} = g_{ij} + Ec_k + Es_{ijk} + (g_{ij} \times Ec_k) + (g_{ij} \times Es_{ijk}) $$
Assuming negligible genotype-by-microenvironment variances, the phenotypic variance is
$$ V_P = V_G + V_{Es} + V_{Ec} = \sigma_f^2 + \sigma_c^2 + \sigma_I^2 + \sigma_e^2 $$
Replacing $V_G$ and $V_{Ec}$ by their estimators, $V_G = \sigma_f^2 + \sigma_c^2 + \sigma_{ee'}$ and $V_{Ec} = \sigma_I^2$, we have
$$ \sigma_f^2 + \sigma_c^2 + \sigma_{ee'} + V_{Es} + \sigma_I^2 = \sigma_f^2 + \sigma_c^2 + \sigma_I^2 + \sigma_e^2 $$
$$ \sigma_{ee'} + V_{Es} = \sigma_e^2 $$
$$ V_{Es} = \sigma_e^2 - \sigma_{ee'} $$
33 Heritability
- Genetic parameters are important for
  - understanding the inheritance pattern of traits
  - making predictions of the response to selection strategies
  - obtaining precise rankings of outstanding genotypes
- Heritability
  - Narrow sense: $h^2 = V_A / V_P$; broad sense: $H^2 = V_G / V_P$
  - Indicates the extent of genetic control over the expression of a particular trait
  - Indicates the reliability of the phenotype in predicting its breeding value
  - High heritability indicates less environmental influence on the observed variation
34 Example
Family  Clone  Height             Diameter
               Block 1  Block 2   Block 1  Block 2
1       1      49.63    51.35     0.68     0.46
        2      51.13    41.64     0.29     0.42
        3      45.00    42.95     0.44     0.29
2       1      47.48    50.41     0.33     0.21
        2      51.65    42.75     0.44     0.52
        3      43.61    51.52     0.61     0.32
3       1      47.23    45.35     0.50     0.32
        2      50.14    43.84     0.37     0.57
        3      42.01    41.30     0.42     0.44
4       1      48.14    50.24     0.61     0.50
        2      46.81    50.21     0.23     0.68
        3      49.52    49.41     0.67     0.28
5       1      51.45    46.50     0.33     0.36
        2      46.97    48.61     0.65     0.22
        3      42.12    49.43     0.65     0.61
35 Information Criteria and Likelihood Ratio Tests of Covariance Structures
Height     CS (REML)  CS (ML)  UN (REML)  UN (ML)
-2 ln L    154.1      157.5    154        157.1
AIC        162.1      169.5    164        169.1
AICC       163.9      173.1    166.7      172.8
HQIC       157.9      163.2    158.7      162.8
BIC        160.6      167.1    162        166.8
CAIC       164.6      173.1    167        172.8

Diameter   CS (REML)  CS (ML)  UN (REML)  UN (ML)
-2 ln L    -32.3      -22.8    -22.8      -32.3
AIC        -22.3      -16.8    -14.8      -20.3
AICC       -19.8      -15.8    -13.1      -16.7
HQIC       -27.5      -19.9    -19        -26.6
BIC        -24.2      -18      -16.4      -22.7
CAIC       -19.2      -15      -12.4      -16.7
36 Variance Components and Genetic Parameters
Parameter          Diameter   Height
$\sigma_f^2$       2.38E-20   0.7204
$\sigma_c^2$       0.000125   0.2344
$\sigma_I^2$       0          0
$\sigma_{ee'}$     0.00456    2.4974
$\sigma_e^2$       0.02488    13.7124
$V_G$              0.004685   3.4522
$V_P$              0.025005   14.6672
$H^2$              0.187363   0.235369
$V_{Es}$           0.02032    11.215
$V_{Ec}$           0          0
$\sigma_f^2/V_P$   9.52E-19   0.049116
$\sigma_c^2/V_P$   0.004999   0.015981
$V_{Es}/V_P$       0.812637   0.764631
$V_{Ec}/V_P$       0          0
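The derived rows of this table follow from the five estimated components through the relations given earlier in the talk: the genotypic variance is the family variance plus the clone variance plus the residual covariance, the specific-environment variance is the residual variance minus the residual covariance, and the common-environment variance is the family-by-block variance. The sketch below recomputes them from the reported height estimates; the function and argument names are hypothetical.

```python
def genetic_parameters(s2_f, s2_c, s2_I, s_ee, s2_e):
    """Derive VG, VEs, VEc, VP and broad-sense heritability from LMM components."""
    VG = s2_f + s2_c + s_ee      # genotypic variance
    VEc = s2_I                   # common microenvironment variance
    VEs = s2_e - s_ee            # specific microenvironment variance
    VP = VG + VEs + VEc          # phenotypic variance
    return {"VG": VG, "VEs": VEs, "VEc": VEc, "VP": VP, "H2": VG / VP}

# Reported estimates for height.
height = genetic_parameters(s2_f=0.7204, s2_c=0.2344, s2_I=0.0, s_ee=2.4974, s2_e=13.7124)
for name, value in height.items():
    print(name, round(value, 6))
```

Running the same function on the diameter estimates gives the values in the Diameter column.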
37 Conclusions
- Correct estimates of genetic parameters for traits of importance are needed prior to any genetic evaluation program
- Heritability indicates the extent of genetic control over the expression of a particular trait and the reliability of the phenotype in predicting its breeding value
- High heritability indicates less environmental influence on the observed variation
- Estimation of the genetic correlation (juvenile-mature plant) is used to evaluate the possibility of conducting early selection
38 Conclusions
- In a clonal progeny test (where most traits are correlated or residual variances are heterogeneous), covariance is also a component of the genetic variance estimator and plays a significant role in the accurate estimation of genetic parameters
- Linear mixed models fitted by ML or REML, with an explicitly modeled covariance structure, improve the ability to analyze repeated-measures data and are suitable for the statistical and genetic analyses
- Linear mixed model methodology not only permits heterogeneity of variance in the linear model but also addresses the covariance structure directly, thereby providing valid standard errors
39 References
Apiolaza, L.A. and Garrick, D.J. (2001). Analysis of longitudinal data from progeny tests: some multivariate approaches. Forest Science, 47(2): 129-140.
Callister, A.N. and Collins, S.L. (2008). Genetic parameter estimates in a clonally replicated progeny test of teak. Tree Genetics and Genomes, 4: 237-245.
Clifford, P., Dutilleul, P., Richardson, S. and Hemon, D. (1989). Assessing the significance of the correlation between two spatial processes. Biometrics, 45: 123-134.
Falconer, D.S. and Mackay, T.F.C. (1996). Introduction to quantitative genetics. Longman Science and Technology, Harlow, United Kingdom.
40 References
Fitzmaurice, G.M., Laird, N.M. and Ware, J.H. (2004). Applied longitudinal analysis. John Wiley & Sons, Hoboken, New Jersey.
Henderson, C.R. (1984). Applications of linear models in animal breeding. University of Guelph.
Holland, J. (2006). Estimating genotypic correlations and their standard errors using multivariate restricted maximum likelihood estimation with SAS Proc MIXED. Crop Science, 46: 642-654.
Ismaili, A., Karami, F., Akbarpour, O. and Nejad, A.R. (2016). Estimation of genotypic correlation and heritability of apricot traits, using restricted maximum likelihood in repeated measures data. Canadian Journal of Plant Science, 96(3): 439-447.
41 References
Lindstrom, M. and Bates, D. (1988). Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. Journal of the American Statistical Association, 83(404): 1014-1022.
Meredith, M.P. and Stehman, S.V. (1991). Repeated measures experiments in forestry: focus on analysis of response curves. Canadian Journal of Forest Research, 21(7): 957-965.
Narain, P. (1990). Statistical genetics. Wiley, New York.
Negash, A.W., Mwambi, H., Zewotir, T. and Aweke, G. (2014). Mixed model with spatial variance-covariance structure for accommodating of local stationary trend and its influence on multi-environmental crop variety trial assessment. Spanish Journal of Agricultural Research, 14(3): 195-205.
42 References
Searle, S.R., Casella, G. and McCulloch, C.E. (1992). Variance components. Wiley, New York.
Wolfinger, R.D. (1996). Heterogeneous variance-covariance structures for repeated measures. Journal of Agricultural, Biological and Environmental Statistics, 1(2): 127.
Zamudio, F., Rozenberg, P., Baettig, R., Vergara, A., Yanez, M. and Gantz, C. (2005). Genetic variation of wood density components in a radiata pine progeny test located in the south of Chile. Annals of Forest Science, 62(2): 105-114.
Zamudio, F., Wolfinger, R., Stanton, B. and Guerra, F. (2008). The use of linear mixed model theory for the genetic analysis of repeated measures from clonal tests of forest trees. I. A focus on spatially repeated data. Tree Genetics and Genomes, 4: 299-313.
Zobel, B. and Talbert, J. (1984). Applied forest tree improvement. Wiley, New York.