Title: QTL Mapping Using Mx
1QTL Mapping Using Mx
Virginia Institute for Psychiatric and Behavioral Genetics
Virginia Commonwealth University
2Overview
- Alternative approach
- Linkage as Mixture
- Univariate/Multivariate
- One/more loci
- Practical considerations
- Power
- Pihat vs covs
- Larger Sibships
3Schematic of Genome
[Figure: chromosome schematic showing a QTL and Markers 1 to 4, with distances d1 to d4]
4Genetic Heterogeneity
Sib pairs IBD at a locus, parents AB and CD
Number of alleles shared IBD, by genotype of sib 1 (rows) and sib 2 (columns)
      AC   AD   BC   BD
AC    2    1    1    0
AD    1    2    0    1
BC    1    0    2    1
BD    0    1    1    2
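The table above can be checked by brute-force enumeration. The sketch below assumes each of the four offspring genotypes is equally likely and independent across sibs; the function and variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Parents AB x CD: each child gets A or B from one parent and C or D from the other.
child_genotypes = [p + m for p, m in product("AB", "CD")]  # AC, AD, BC, BD

def ibd_count(g1, g2):
    """Number of alleles (0, 1 or 2) that two sib genotypes share IBD."""
    return int(g1[0] == g2[0]) + int(g1[1] == g2[1])

# Tabulate the 16 equally likely sib-pair combinations.
counts = Counter(ibd_count(s1, s2) for s1, s2 in product(child_genotypes, repeat=2))
ibd_probs = {k: Fraction(v, 16) for k, v in sorted(counts.items())}
print(ibd_probs)
```

Under these assumptions the 16 cells give probabilities 1/4, 1/2 and 1/4 for IBD 0, 1 and 2.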
5Pi hat approach
- 1 Pick a putative QTL location
- 2 Compute p(IBD0), p(IBD1), p(IBD2) given marker data (Mapmaker/Sibs)
- 3 Compute pihat (π̂) = p(IBD2) + .5 p(IBD1)
- 4 Fit model
- Repeat 1-4 as necessary for different locations
Elston Stewart
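Step 3 is simple arithmetic. A minimal sketch, assuming the three IBD probabilities have already been produced by a multipoint program; the function name and the rounding tolerance are illustrative choices, not part of any package.

```python
def pihat(p_ibd0, p_ibd1, p_ibd2, tolerance=0.02):
    """Estimated proportion of alleles shared IBD: p(IBD2) + .5 p(IBD1)."""
    total = p_ibd0 + p_ibd1 + p_ibd2
    # Probabilities printed to two decimals may not sum exactly to 1.
    if abs(total - 1.0) > tolerance:
        raise ValueError("IBD probabilities should sum to about 1")
    return p_ibd2 + 0.5 * p_ibd1

print(pihat(0.25, 0.50, 0.25))  # uninformative markers: the prior for sib pairs
print(pihat(0.81, 0.13, 0.06))  # a pair that is probably IBD 0 at this position
```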
6Major QTL effects
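DZ twins
[Path diagram: latent factors A, C, D, E and Q for each twin act on phenotypes P1 and P2; cross-twin correlations are .5 (A1-A2), 1 (C1-C2), .25 (D1-D2) and π̂ (Q1-Q2)]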
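A sketch of the expectations this path diagram implies, assuming unit-variance latent factors and writing a^2, c^2, d^2, e^2 and q^2 for the squared path coefficients (these symbols are introduced here for illustration and do not appear on the slide):

```latex
\begin{aligned}
\mathrm{Var}(P_1) = \mathrm{Var}(P_2) &= a^2 + c^2 + d^2 + e^2 + q^2 \\
\mathrm{Cov}(P_1, P_2) &= \tfrac{1}{2}\,a^2 + c^2 + \tfrac{1}{4}\,d^2 + \hat{\pi}\,q^2
\end{aligned}
```

Only the covariance depends on π̂, so pairs that share more alleles IBD at the locus are expected to be more alike when q^2 > 0.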
7Normal Theory Likelihood Function
For raw data in Mx
$$\ln L_i = f_i \ln\left[\sum_{j=1}^{m} w_j \, g(x_i, \mu_{ij}, \Sigma_{ij})\right]$$
x_i - vector of observed scores on n subjects
μ_ij - vector of predicted means
Σ_ij - matrix of predicted covariances - functions of parameters
8General Likelihood Function
Things that may differ over subjects
$$\ln L_i = f_i \ln\left[\sum_{j=1}^{m} w_{ij} \, g(x_i, \mu_{ij}, \Sigma_{ij})\right]$$
i = 1...n subjects (families)
- Model for Means can differ
- Model for Covariances can differ
- Weights can differ
- Frequencies can differ
9Normal distribution N(μij,Σij)
Likelihood is height of the curve
[Figure: normal density curve plotted over xi from -4 to 4; the likelihood of an observation is the height of the curve at xi]
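To make "height of the curve" concrete, here is a minimal univariate sketch. The function name and the example values are illustrative; Mx itself works with the multivariate density.

```python
import math

def normal_density(x, mean, variance):
    """Height of the normal curve N(mean, variance) at the point x."""
    z2 = (x - mean) ** 2 / variance
    return math.exp(-0.5 * z2) / math.sqrt(2.0 * math.pi * variance)

# The likelihood of one observation is the curve height at that observation.
height_at_mean = normal_density(0.0, mean=0.0, variance=1.0)
log_lik = math.log(normal_density(1.0, mean=0.0, variance=1.0))
print(height_at_mean, log_lik)
```

An observation far from the predicted mean sits under a low part of the curve and so contributes a small likelihood.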
10Weighted mixture of models
Finite mixture distribution
$$\ln L_i = f_i \ln\left[\sum_{j=1}^{m} w_{ij} \, g(x_i, \mu_{ij}, \Sigma_{ij})\right]$$
j = 1...m models
w_ij - weight for subject i, model j
e.g., segregation analysis
11Mixture of Normal Distributions
Two normals, proportions w1 and w2, different means
[Figure: two overlapping normal density curves over xi with different means; the components are scaled as w1 x l1 and w2 x l2]
But Likelihood Ratio not Chi-Squared - what is it?
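A minimal sketch of the two-component case, w1 x l1 + w2 x l2, for a single observation. The weights, means and standard deviations below are made-up illustration values.

```python
import math

def normal_density(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_log_likelihood(x, weights, means, sds):
    """Log of the weighted sum of component likelihoods for one observation."""
    total = sum(w * normal_density(x, m, s) for w, m, s in zip(weights, means, sds))
    return math.log(total)

# Hypothetical values: 30% of subjects from a component centred at -1, 70% at +1.
ll = mixture_log_likelihood(0.5, weights=[0.3, 0.7], means=[-1.0, 1.0], sds=[1.0, 1.0])
print(ll)
```

The weights are applied to the likelihoods before taking the log, which is what distinguishes a mixture from a weighted sum of log-likelihoods.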
12Weighted Likelihood Method
- 1 Pick a putative QTL location
- 2 Compute p(IBD0), p(IBD1), p(IBD2) given marker data - these are "WEIGHTS"
- 3 Compute likelihood of phenotype data under each of 3 IBD conditions
- 4 Maximize weighted likelihood of 3
- Repeat 1-4 as necessary for different locations
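Steps 2 and 3 can be sketched for one sib pair and one phenotype as follows. This assumes a familial variance f, QTL variance q and unique environmental variance e, with sib covariance f, f + .5q and f + q under IBD 0, 1 and 2; the function names and numeric values are hypothetical. Step 4 would sum this quantity over families and hand it to an optimizer, which is not shown.

```python
import math

def pair_density(y1, y2, mean, var, cov):
    """Bivariate normal density for a sib pair with equal means and variances."""
    det = var * var - cov * cov
    d1, d2 = y1 - mean, y2 - mean
    quad = (var * d1 * d1 - 2.0 * cov * d1 * d2 + var * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def pair_log_likelihood(y1, y2, ibd_probs, mean, f, q, e):
    """Log of the IBD-weighted mixture of three bivariate normal likelihoods."""
    var = f + q + e
    covs = (f, f + 0.5 * q, f + q)  # sib covariance under IBD 0, 1, 2
    mix = sum(w * pair_density(y1, y2, mean, var, c) for w, c in zip(ibd_probs, covs))
    return math.log(mix)

# Hypothetical pair and parameter values, for illustration only.
ll = pair_log_likelihood(4.9, 5.1, (0.25, 0.50, 0.25), mean=5.0, f=0.5, q=0.3, e=0.4)
print(ll)
```

When q is zero the three components coincide, so the weights no longer matter; that is the null model against which the QTL effect is tested.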
13Mixture method
Add them up
[Figure: three copies of the sib-pair path diagram (A, C, D, E and Q acting on P1 and P2), weighted by p(IBD0), p(IBD1) and p(IBD2); the Q1-Q2 correlation is 0, .5 and 1 respectively, while A, C and D are correlated .5, 1 and .25 in every copy]
14Dataset structure
Rectangular format
                               Locus 1           Locus 2
Id    sex  age  P1     P2      IBD0 IBD1 IBD2    IBD0 IBD1 IBD2
1231  1    24   103.5  115.6   .81  .13  .06     .28  .51  .21
1781  0    29   127.4  145.6   .23  .65  .11     .08  .57  .35
1952  1    39   98.5   .       .81  .13  .06     .28  .51  .21
2056  1    19   93.5   100.3   .    .    .       .20  .40  .40
Missing data:
- Phenotypes: ML
- Markers: Listwise
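A small sketch of how the missing-data rules above play out on these four rows: a family is dropped for a locus if any of its IBD probabilities is missing (listwise), but is kept if at least one phenotype is present (ML). The parsing helpers and field names are illustrative and are not part of Mx.

```python
RAW_ROWS = """\
1231 1 24 103.5 115.6 .81 .13 .06 .28 .51 .21
1781 0 29 127.4 145.6 .23 .65 .11 .08 .57 .35
1952 1 39 98.5 . .81 .13 .06 .28 .51 .21
2056 1 19 93.5 100.3 . . . .20 .40 .40
"""

def parse_row(line):
    """Split one rectangular record; a lone '.' marks a missing value."""
    tokens = line.split()
    values = [None if t == "." else float(t) for t in tokens[1:]]
    return {
        "id": tokens[0],
        "sex": values[0],
        "age": values[1],
        "phenotypes": values[2:4],
        "ibd": {1: values[4:7], 2: values[7:10]},
    }

def usable(record, locus):
    """Listwise on markers, but any observed phenotype is enough."""
    has_markers = all(p is not None for p in record["ibd"][locus])
    has_phenotype = any(y is not None for y in record["phenotypes"])
    return has_markers and has_phenotype

records = [parse_row(line) for line in RAW_ROWS.splitlines()]
usable_ids = {locus: [r["id"] for r in records if usable(r, locus)] for locus in (1, 2)}
print(usable_ids)
```

Family 1952 stays in both analyses despite the missing P2, whereas family 2056 drops out at Locus 1 only.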
15Mx Script
Mixture method
!QTL analysis via Mixture Distribution method
!Using marker1
!Using DZ twins only
!Analysis of LDL
!Dutch Adults
#define nvar 1 !different for multivariate
#define nsib 2 !number of siblings
#NGroups 2
16Mx Script
Mixture part 2
G1: Parameter Estimates
Calculation
Begin Matrices;
 X Lower nvar nvar Free !familial background
 Z Lower nvar nvar Free !unique environment
 L Full 1 1 Free !QTL effect
 M Full 1 nvar Free !means
 H Full 1 1
End Matrices;
Matrix H .5
Begin Algebra;
 F = X*X' ; !familial variance
 E = Z*Z' ; !unique environmental variance
 Q = L*L' ; !variance due to QTL
 V = F+Q+E ; !total variance
 T = F|Q|E ; !parameters in one matrix for standardizing
 S = T@V~ ; !standardized variance component estimates
End Algebra;
Labels Row S standest
Labels Col S f2 q2 e2
Labels Row T unstandest
Labels Col T f2 q2 e2
End
17Mx Script
G2: Dizygotic twins
#include lipiddzmix.dat
Select ibd0m1 ibd1m1 ibd2m1 ldl1 ldl2 ;
Definition ibd0m1 ibd1m1 ibd2m1 ;
Begin Matrices = Group 1;
 K Full 3 1 !IBD probabilities (from Merlin)
 U Unit 3 2
End Matrices;
Specify K ibd0m1 ibd1m1 ibd2m1
Means U@M ;
Covariance
 F+Q+E | F _
 F | F+Q+E _ ! IBD 0 Covariance matrix
 F+Q+E | F+H@Q _
 F+H@Q | F+Q+E _ ! IBD 1 Covariance matrix
 F+Q+E | F+Q _
 F+Q | F+Q+E ; ! IBD 2 Covariance matrix
Weights K ; ! IBD probabilities
Start 1 All
Start 2.8 M 1 1 1
Option NDecimals=3
Option Multiple Issat
End
18Mx Script
Mixture part 4
! Test significance of QTL effect
Drop L 1 1 1
End
19Output Pihat Method
Summary of VL file data for group 1
Code       -3.000   -2.000   -1.000    1.000    2.000
Number    190.000  190.000  190.000  190.000  190.000
Mean        0.234    0.510    0.256    4.927    4.928
Variance    0.104    0.096    0.096    1.092    1.325
MATRIX F
This is a LOWER TRIANGULAR matrix of order 1 by 1
       1
1  0.898
MATRIX Q
This is a FULL matrix of order 1 by 1
       1
1  0.540
20Output
QTL Effect Present
Your model has 4 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1057.064
Degrees of freedom >>>>>>>>>>>>>>>> 946
QTL Effect Absent
Your model has 3 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1059.025
Degrees of freedom >>>>>>>>>>>>>>>> 947
Difference chi-squared 1.961 (1 df)
21Output Pihat Method
QTL Effect Present
Your model has 4 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1057.500
Degrees of freedom >>>>>>>>>>>>>>>> 946
QTL Effect Absent
Your model has 3 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1059.025
Degrees of freedom >>>>>>>>>>>>>>>> 947
Difference chi-squared 1.525 (1 df)
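The difference tests in the two outputs above can be reproduced from the -2 log-likelihoods. The sketch below also converts each difference to a LOD score (chi-squared divided by 2 ln 10) and a nominal 1 df p-value. Whether a plain 1 df chi-squared is the appropriate reference distribution for a variance component is a separate question that this sketch does not settle, and the helper names are illustrative.

```python
import math

def chi2_1df_pvalue(chi2):
    """Upper-tail probability of a chi-squared variate with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def likelihood_ratio_test(minus2ll_absent, minus2ll_present):
    """Compare models with and without the QTL via their -2 log-likelihoods."""
    chi2 = minus2ll_absent - minus2ll_present
    return {
        "chi2": chi2,
        "lod": chi2 / (2.0 * math.log(10.0)),
        "p_nominal": chi2_1df_pvalue(chi2),
    }

first_output = likelihood_ratio_test(1059.025, 1057.064)
pihat_output = likelihood_ratio_test(1059.025, 1057.500)
print(first_output)
print(pihat_output)
```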
22Summary
- SEM - QTL direct relationship
- Mx graphical/script approaches
- Mixture vs Pihat
- Multivariate treatment
- Multilocus
- Missing Data
- Ascertainment
23How much more power?
- Large sibships much more powerful
- Dolan et al 1999
- Pihat simple with large sibships
- Solar, Genehunter etc
- Pihat shows substantial bias with missing data
24Expected IBD Frequencies
Sibships of size 2
25Expected IBD Frequencies
Sibships of size 3
26More power in large sibships
Dolan, Neale & Boomsma (2000)
[Figure: power by sibship size; legend: Size 2, Size 3, Size 4]
27Number of IBD Combinations
As a function of number of sibs in family
Sibship Size   Number of combinations
2              3
3              10
4              36
5              136
6              528
7              2080
8              8256
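These counts can be reproduced by enumeration. The sketch assumes a "combination" is an assignment of one paternal and one maternal allele to each sib, counted up to relabelling of the two alleles within a parent and of which parent is which; that reading is an assumption, chosen because it reproduces the tabulated values for sizes 2 to 7. The closed form is derived from the same assumption and gives 8256 for size 8.

```python
from itertools import product

def canonical(config):
    """Smallest relabelled version of a tuple of (paternal, maternal) indicators."""
    variants = []
    for swap, flip_p, flip_m in product((False, True), repeat=3):
        relabelled = []
        for p, m in config:
            if swap:  # exchange the roles of the two parents
                p, m = m, p
            relabelled.append((p ^ flip_p, m ^ flip_m))  # rename alleles within a parent
        variants.append(tuple(relabelled))
    return min(variants)

def count_ibd_configurations(n_sibs):
    """Brute-force count of distinct IBD configurations for a sibship."""
    genotypes = list(product((0, 1), repeat=2))
    return len({canonical(c) for c in product(genotypes, repeat=n_sibs)})

def closed_form(n_sibs):
    """Same count without enumeration: 2^(2n-3) + 2^(n-2)."""
    return 2 ** (2 * n_sibs - 3) + 2 ** (n_sibs - 2)

for n in range(2, 7):
    print(n, count_ibd_configurations(n), closed_form(n))
```

The count roughly quadruples with each additional sib, which is why iterating over every configuration becomes expensive in large sibships.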
28Mixture Approach for Pedigrees
Some ideas
- Iterate configurations within families
- Only use non-zero IBD probabilities
- Set threshold?
- Improves with genotype data
- Allows moderated genotypes
29Strategy 2
- Families within combinations
- Limited number of IBD configurations
- Depends on max sibship size
- Usually Faster
- Can do missing data
- Cannot do moderator variables
30Multivariate QTL
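Vectors of variables, matrices of paths
Three component mixture
[Path diagram: latent factors A, C, D, E and Q for each sib act on P1 and P2; cross-sib correlations are .5 (A1-A2), 1 (C1-C2), .25 (D1-D2) and π̂ (Q1-Q2)]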
31Two locus model
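[Path diagram: two QTL factors, Q and R, together with A, C and E for each sib, act on P1 and P2; the cross-sib correlations of the two QTL factors are the locus-specific π̂ values]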
32Two locus model mixture
[Figure: 3 x 3 grid of sib-pair path diagrams (R, C, A, E and Q acting on P1 and P2), one for each combination of IBD status at the two loci. Rows are weighted by p(ibd0 Q), p(ibd1 Q) and p(ibd2 Q), with Q1-Q2 correlation 0, .5 and 1; columns are weighted by p(ibd0 R), p(ibd1 R) and p(ibd2 R), with R1-R2 correlation 0, .5 and 1]
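The nine components of the two-locus mixture can be listed programmatically. This is a sketch under two assumptions that the slides do not state: IBD status at the two loci is treated as independent (as for unlinked loci), so each weight is a product of marginal probabilities; and the variance components f, q and r for background, locus Q and locus R are hypothetical values.

```python
def two_locus_components(ibd_q, ibd_r, f, q, r):
    """Return (weight, sib covariance) for each of the 3 x 3 IBD combinations."""
    sharing = (0.0, 0.5, 1.0)  # proportion of alleles shared under IBD 0, 1, 2
    components = []
    for i, w_q in enumerate(ibd_q):
        for j, w_r in enumerate(ibd_r):
            covariance = f + sharing[i] * q + sharing[j] * r
            components.append((w_q * w_r, covariance))
    return components

# Hypothetical IBD probabilities at the two loci and variance components.
components = two_locus_components(
    ibd_q=(0.81, 0.13, 0.06), ibd_r=(0.28, 0.51, 0.21), f=0.5, q=0.3, r=0.2
)
for weight, covariance in components:
    print(round(weight, 4), round(covariance, 3))
```

For linked loci the joint IBD probabilities would have to come from the multipoint analysis rather than from a product of marginals.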
33Multivariate multilocus multipoint
- Eaves, Neale & Maes 1996
- 10 minutes for 5 phenotypes
- Restart at previous solution
- Only fit null model (q=0) once
34Not dead yet
- Latent variable QTLs
- Multiple rater
- Comorbidity
- Repeated measures