Summarizing Variation - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Summarizing Variation


1
Summarizing Variation
Michael C Neale PhD, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University
2
Overview
  • Mean
  • Variance
  • Covariance
  • Not always necessary/desirable

3
Computing Mean
  • Formula: (Σ xi) / N
  • Can compute with (see the sketch below):
  • Pencil
  • Calculator
  • SAS
  • SPSS
  • Mx
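A minimal sketch of the mean formula above in plain Python (in place of SAS, SPSS or Mx); the scores are illustrative:

```python
# Mean as the sum of scores divided by the number of scores
x = [1.0, 2.0, 2.5, 4.5]      # illustrative scores
mean = sum(x) / len(x)        # (sum of xi) / N
print(mean)                   # 2.5
```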

4
One Coin toss
2 outcomes
[Bar chart: probability of each outcome, Heads and Tails, 0.5 each]
5
Two Coin toss
3 outcomes
[Bar chart: probability of each outcome: HH 0.25, HT/TH 0.5, TT 0.25]
6
Four Coin toss
5 outcomes
[Bar chart: probability of each outcome: HHHH, HHHT, HHTT, HTTT, TTTT (1/16, 4/16, 6/16, 4/16, 1/16)]
7
Ten Coin toss
11 outcomes
[Bar chart: probability of each outcome (number of heads) in ten tosses]
8
Pascal's Triangle
Frequency (rows of Pascal's triangle):
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
Probability: divide each row by its total: 1/1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128
Pascal's friend Chevalier de Mere 1654; Huygens 1657; Cardan 1501-1576
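A minimal sketch connecting Pascal's triangle to the coin-toss charts above: row n of the triangle gives the outcome frequencies, and dividing by 2^n gives the probabilities (Python, illustrative only):

```python
from math import comb

def coin_toss_distribution(n):
    """Probability of k heads in n fair coin tosses, k = 0..n."""
    return [comb(n, k) / 2 ** n for k in range(n + 1)]

print(coin_toss_distribution(2))   # [0.25, 0.5, 0.25] -> TT, HT/TH, HH
print(coin_toss_distribution(4))   # 5 outcomes with frequencies 1 4 6 4 1 over 16
```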
9
Fort Knox Toss

Infinite outcomes
[Chart: probability against the difference Heads − Tails; the distribution approaches the normal curve (Gauss 1827)]
10
Variance
  • Measure of Spread
  • Easily calculated
  • Individual differences

11
Average squared deviation
Normal distribution

[Figure: normal curve with a deviation di of score xi from the mean marked on the x-axis]
Variance = Σ di² / N
12
Measuring Variation
Weighs & Means
  • Absolute differences?
  • Squared differences?
  • Absolute cubed?
  • Squared squared?

13
Measuring Variation
Ways & Means
  • Squared differences

Fisher (1922): the squared-differences measure has minimum variance under the normal distribution
14
Covariance
  • Measure of association between two variables
  • Closely related to variance
  • Useful to partition variance

15
Deviations in two dimensions
[Scatterplot: data points plotted on x and y, each deviating from the mean of x and the mean of y]
16
Deviations in two dimensions
[Scatterplot: a single point's deviations dx and dy from the means of x and y]
17
Measuring Covariation
Area of a rectangle
  • A square, perimeter 4 (sides 1 × 1)
  • Area = 1 × 1 = 1
18
Measuring Covariation
Area of a rectangle
  • A skinny rectangle, perimeter 4 (sides .25 × 1.75)
  • Area = .25 × 1.75 = .4375
19
Measuring Covariation
Area of a rectangle
  • Points can contribute negatively
  • Area = −.25 × 1.75 = −.4375
20
Measuring Covariation
Covariance Formula
σxy = Σ (xi − x̄)(yi − ȳ) / (N − 1)
21
Correlation
  • Standardized covariance
  • Lies between -1 and 1

rxy = σxy / √(σx² σy²)
22
Summary
Formulae
x̄ = (Σ xi) / N
σx² = Σ (xi − x̄)² / (N − 1)
σxy = Σ (xi − x̄)(yi − ȳ) / (N − 1)
rxy = σxy / √(σx² σy²)
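A minimal sketch of the summary formulae in plain Python; the x and y values are illustrative, not data from the presentation:

```python
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.5, 3.5, 3.0, 5.0]
N = len(x)

mean_x, mean_y = sum(x) / N, sum(y) / N
var_x = sum((xi - mean_x) ** 2 for xi in x) / (N - 1)          # variance of x
var_y = sum((yi - mean_y) ** 2 for yi in y) / (N - 1)          # variance of y
cov_xy = sum((xi - mean_x) * (yi - mean_y)
             for xi, yi in zip(x, y)) / (N - 1)                # covariance
r_xy = cov_xy / sqrt(var_x * var_y)                            # correlation
print(mean_x, var_x, cov_xy, r_xy)
```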
23
Variance covariance matrix
Several variables
Var(X)    Cov(X,Y)  Cov(X,Z)
Cov(X,Y)  Var(Y)    Cov(Y,Z)
Cov(X,Z)  Cov(Y,Z)  Var(Z)
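A minimal sketch of the variance-covariance matrix for several variables using numpy; the rows of data are illustrative observations on X, Y and Z:

```python
import numpy as np

data = np.array([[1.0, 2.0, 0.5],
                 [2.0, 1.5, 1.0],
                 [3.0, 3.5, 1.5],
                 [4.0, 3.0, 2.5]])       # one row per observation
cov = np.cov(data, rowvar=False)         # 3x3: variances on the diagonal, covariances off it
print(cov)
```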
24
Conclusion
  • Means and covariances
  • Conceptual underpinning
  • Easy to compute
  • Can use raw data instead

25
Biometrical Model of QTL
[Line diagram: genotypic values −a, d and +a measured from the midpoint m]
26
Biometrical model for QTL
Diallelic locus A/a with p as frequency of a
27
Classical Twin Studies
Information and analysis
  • Summary statistics: rMZ, rDZ
  • Basic model: A + C + E
  • rMZ = A + C
  • rDZ = 0.5 A + C
  • Var = A + C + E
  • Solve the equations (see the sketch below)
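A minimal sketch of solving the three equations above for A, C and E; the twin correlations are illustrative values, not estimates from any study:

```python
def solve_ace(r_mz, r_dz, var=1.0):
    """Solve r_mz = A + C, r_dz = 0.5*A + C, var = A + C + E."""
    A = 2 * (r_mz - r_dz)
    C = r_mz - A          # equals 2*r_dz - r_mz
    E = var - r_mz        # equals var - A - C
    return A, C, E

print(solve_ace(r_mz=0.8, r_dz=0.5))   # hypothetical correlations -> approximately (0.6, 0.2, 0.2)
```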

28
Contributions to Variance
Single genetic locus
  • Additive QTL variance
  • VA = 2p(1 − p) [a − d(2p − 1)]²
  • Dominance QTL variance
  • VD = 4p²(1 − p)² d²
  • Total genetic variance due to the locus
  • VQ = VA + VD (see the sketch below)
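A minimal sketch of the single-locus variance components above, with p the frequency of allele a; the parameter values are illustrative:

```python
def qtl_variance(p, a, d):
    """Additive, dominance and total QTL variance for a diallelic locus."""
    va = 2 * p * (1 - p) * (a - d * (2 * p - 1)) ** 2   # VA
    vd = 4 * p ** 2 * (1 - p) ** 2 * d ** 2             # VD
    return va, vd, va + vd                              # VQ = VA + VD

print(qtl_variance(p=0.3, a=1.0, d=0.5))
```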

29
Origin of Expectations
Regression model
  • P = aA + cC + eE
  • Standardize A, C and E
  • VP = a² + c² + e²
  • Assumes A, C and E are independent

30
Path analysis
Elements of a path diagram
  • Two sorts of variable
  • Observed, in boxes
  • Latent, in circles
  • Two sorts of path
  • Causal (regression), one-headed
  • Correlational, two-headed

31
Rules of path analysis
  • Trace path chains between variables
  • Chains are traced backwards, then forwards, with one change of direction at a double-headed arrow
  • Predicted covariance due to a chain is the
    product of its paths
  • Predicted total covariance is sum of covariance
    due to all possible chains

32
ACE model
MZ twins reared together
33
ACE model
DZ twins reared together
34
ACE model
DZ twins reared apart
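A minimal sketch of the expected twin covariance matrices implied by the ACE path diagrams, applying the tracing rules above with standardized A, C and E; the path values a, c and e are illustrative:

```python
import numpy as np

def ace_expected(a, c, e, r_a, r_c=1.0):
    """Expected 2x2 twin covariance matrix.
    r_a: genetic correlation (1 MZ, 0.5 DZ); r_c: 1 reared together, 0 apart."""
    var = a ** 2 + c ** 2 + e ** 2          # within-twin chains through A, C and E
    cov = r_a * a ** 2 + r_c * c ** 2       # cross-twin chains through A and C
    return np.array([[var, cov], [cov, var]])

a, c, e = 0.7, 0.5, 0.5                         # hypothetical path coefficients
print(ace_expected(a, c, e, r_a=1.0))           # MZ reared together
print(ace_expected(a, c, e, r_a=0.5))           # DZ reared together
print(ace_expected(a, c, e, r_a=0.5, r_c=0.0))  # DZ reared apart
```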
35
Model fitting
  • Takes care of replicate statistics
  • Maximum likelihood estimates
  • Confidence intervals on parameters
  • Overall fit of model
  • Comparison of nested models

36
Fitting models to covariance matrices
  • MZ covariances
  • 3 statistics: V1, CMZ, V2
  • DZ covariances
  • 3 statistics: V1, CDZ, V2
  • Parameters: a, c, e
  • df = nstat − npar = 6 − 3 = 3 (see the fitting sketch below)
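A minimal sketch (outside Mx) of fitting the ACE model to MZ and DZ covariance matrices by normal-theory maximum likelihood, using numpy and scipy; the observed matrices and sample sizes are illustrative, not real data:

```python
import numpy as np
from scipy.optimize import minimize

S_mz = np.array([[1.0, 0.8], [0.8, 1.0]])   # hypothetical observed MZ covariance matrix
S_dz = np.array([[1.0, 0.5], [0.5, 1.0]])   # hypothetical observed DZ covariance matrix
n_mz, n_dz, p = 500, 500, 2                 # hypothetical sample sizes; p variables per group

def expected_cov(a, c, e, r_a):
    """Expected twin covariance matrix under ACE; r_a = 1 (MZ) or 0.5 (DZ)."""
    v = a ** 2 + c ** 2 + e ** 2
    cov = r_a * a ** 2 + c ** 2
    return np.array([[v, cov], [cov, v]])

def ml_discrepancy(params):
    """Normal-theory ML fit function summed over the MZ and DZ groups."""
    a, c, e = params
    f = 0.0
    for S, n, r_a in [(S_mz, n_mz, 1.0), (S_dz, n_dz, 0.5)]:
        Sigma = expected_cov(a, c, e, r_a)
        sign, logdet = np.linalg.slogdet(Sigma)
        if sign <= 0:
            return np.inf                   # reject non-positive-definite Sigma
        _, logdet_S = np.linalg.slogdet(S)
        f += (n - 1) * (logdet - logdet_S + np.trace(S @ np.linalg.inv(Sigma)) - p)
    return f

res = minimize(ml_discrepancy, x0=[0.6, 0.4, 0.5], method="Nelder-Mead")
a, c, e = res.x
print("a^2, c^2, e^2:", a ** 2, c ** 2, e ** 2)
print("approximate chi-square vs saturated model:", res.fun, "on df = 6 - 3 = 3")
```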

37
Model fitting to covariance matrices
  • Inherently compares fit to saturated model
  • Difference in fit between the A+C+E model and the A+E model gives a likelihood ratio test, with df equal to the difference in the number of parameters

38
Confidence intervals
  • Two basic forms
  • covariance matrix of parameters
  • likelihood curve
  • Likelihood-based CIs have some nice properties: squaring the confidence limits on a gives the confidence limits on a² (Meeker & Escobar 1995; Neale & Miller, Behav Genet, 1997)

39
Multivariate analysis
  • Comorbidity
  • Partition into relevant components
  • Explicit models
  • One disorder or two or three
  • Longitudinal data analysis
  • Partition into new/old
  • Explicit models
  • Markov
  • Growth curves

40
Cholesky Decomposition
Not a model
  • Provides a way to model covariance matrices
  • Always fits perfectly (see the sketch below)
  • Doesn't predict much else
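A minimal sketch of why a Cholesky decomposition always fits perfectly: any positive definite covariance matrix is reproduced exactly as Σ = L·L'; the matrix below is illustrative:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8, 0.4],
                  [0.8, 1.5, 0.6],
                  [0.4, 0.6, 1.0]])       # any positive definite covariance matrix
L = np.linalg.cholesky(Sigma)             # lower-triangular factor
print(np.allclose(L @ L.T, Sigma))        # True: the matrix is reproduced exactly
```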

41
Perverse Universe
[Path diagram: latent factors A and E each load 0.7 on the phenotype P]
NOT!
42
Perverse Universe
[Path diagram: A and E each load 0.7 on X; on Y, one loads 0.7 and the other loads −0.7]
r(X,Y) = 0: a problem for almost any multivariate method (see the sketch below)
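A minimal sketch of the cancellation above, using the tracing rules: the A and E contributions to cov(X,Y) are equal and opposite, so they sum to zero:

```python
# Loadings from the diagram: 0.7 and 0.7 on X, 0.7 and -0.7 on Y
a_x, e_x = 0.7, 0.7
a_y, e_y = 0.7, -0.7
cov_xy = a_x * a_y + e_x * e_y   # 0.49 - 0.49
print(cov_xy)                    # 0.0, although X and Y share all of their causes
```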
43
Analysis of raw data
  • Awesome treatment of missing values
  • More flexible modeling
  • Moderator variables
  • Correction for ascertainment
  • Modeling of means
  • QTL analysis

44
Technicolor Likelihood Function
For raw data in Mx
ln Li = fi ln [ Σj=1..m wj g(xi, μij, Σij) ]
xi — vector of observed scores on n subjects
μij — vector of predicted means
Σij — matrix of predicted covariances
μij and Σij are functions of the model parameters
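A minimal sketch of the raw-data likelihood above, taking g to be the multivariate normal density (an assumption here); the data vector, weights, means and covariance matrices are all illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def case_loglik(x, weights, means, covs, f=1.0):
    """ln Li = f * ln( sum_j w_j * g(x, mu_j, Sigma_j) ) for one observed vector x."""
    mix = sum(w * multivariate_normal.pdf(x, mean=m, cov=S)
              for w, m, S in zip(weights, means, covs))
    return f * np.log(mix)

# Example in the spirit of the sib-pair mixture two slides below: weights are
# hypothetical IBD probabilities and the predicted covariance varies with rQ.
x = np.array([1.2, 0.4])                  # hypothetical sib-pair scores
weights = [0.2, 0.5, 0.3]                 # hypothetical p(IBD=2), p(IBD=1), p(IBD=0)
means = [np.zeros(2)] * 3
covs = [np.array([[1.0, 0.3 + 0.4 * rQ],
                  [0.3 + 0.4 * rQ, 1.0]]) for rQ in (1.0, 0.5, 0.0)]
print(case_loglik(x, weights, means, covs))
```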
45
Pihat Linkage Model for Siblings
Each sib pair i has a different COVARIANCE
46
Mixture distribution model
Each sib pair i has a different set of WEIGHTS
Three models: rQ = 1, rQ = .5, rQ = 0
weightj × likelihood under model j:
  p(IBD=2) × P(LDL1, LDL2 | rQ = 1)
  p(IBD=1) × P(LDL1, LDL2 | rQ = .5)
  p(IBD=0) × P(LDL1, LDL2 | rQ = 0)
Total likelihood: the weighted likelihoods are summed within a pair and multiplied across sib pairs
47
Conclusion
  • Model fitting has a number of advantages
  • Raw data can be analysed with greater flexibility
  • Not limited to continuous normally distributed
    variables

48
Conclusion II
  • Data analysis requires creative application of
    methods
  • Canned analyses are of limited use
  • Try to answer the question!