BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD MODELS

1
BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD
MODELS
  • Andrew C. Heath

2
PRE-HISTORY (STATISTICS 101)
  • Binomial distribution gives the probabilities
    of various possible numbers of successful
    outcomes in a fixed number of discrete trials,
    where all trials have the same probability of
    success.
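  • Probability that X is equal to a particular value
    x (x = 0, 1, 2, ..., n) is given by
    $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$,
    where n is the number of trials and p is the
    probability of success on each trial.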
  • Useful for genetics (e.g. transmission versus
    non-transmission of an allele)!
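
As a small illustration of the binomial probabilities described above, the following Python sketch evaluates P(X = x) for every possible x. The values n = 4 and p = 0.5 are hypothetical (for instance, transmission of an allele to 4 offspring) and do not come from the slides.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical illustration: 4 trials, success probability 0.5
n, p = 4, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, pr in enumerate(probs):
    print(f"P(X = {x}) = {pr:.4f}")
```

The probabilities over x = 0, ..., n sum to 1, which is a quick sanity check on any implementation.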

3
LIKELIHOOD
  • Focus on probability (likelihood) of the data,
    as a function of model parameters.
  • E.g. Sham (1998) uses Neel & Schull (1954) data
    on opalescent dentine: in a random sample of 112
    offspring of an affected parent, 52 were
    affected and 60 normal. Is this compatible with
    the hypothesis of a rare autosomal dominant? Does
    the observed proportion (52/112 = 0.464) differ
    from the expected proportion of 0.5?
  • Likelihood function for segregation ratio p, with
    r affected among n offspring, is
    $L(p) = \binom{n}{r} p^r (1-p)^{n-r}$

4
MAXIMUM LIKELIHOOD ESTIMATION
  • Find the maximum value of the likelihood
    function in the range $0 \le p \le 1$. In more
    difficult problems, it is usual to maximize the
    log-likelihood, which is computationally more
    convenient; because the logarithm is monotonic,
    this also maximizes the likelihood.
  • In this simple case, the maximum likelihood
    estimate (MLE) of p is $\hat{p} = r/n$.
  • For the opalescent dentine data, ignoring the
    constant term involving n, r which does not
    vary as a function of p, log-likelihood function
    is
  • $\ln L(p) = 52 \ln(p) + 60 \ln(1-p)$
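
The maximization described above can be checked numerically. The sketch below evaluates the log-likelihood (without the constant term) on a grid and compares the grid maximum with the closed-form estimate r/n. The grid resolution of 0.001 is an arbitrary choice made for this illustration.

```python
from math import log

r, n = 52, 112  # affected offspring and total offspring, from the slide

def log_lik(p: float) -> float:
    """Log-likelihood of p, omitting the constant binomial coefficient."""
    return r * log(p) + (n - r) * log(1 - p)

# Grid over the open interval (0, 1); the endpoints are excluded
# because ln(0) is undefined.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=log_lik)
p_closed = r / n

print(f"grid maximum: {p_grid:.3f}")
print(f"closed form r/n: {p_closed:.4f}")
```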

5
Figure 2.1 Log-likelihood function of the
segregation ratio for the opalescent dentine data
(from Sham, 1998). Horizontal axis: p.
6
LIKELIHOOD-RATIO STATISTIC
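  • Likelihood-ratio statistic: twice the difference
    between the log-likelihood of the data at the MLE
    (i.e. $\hat{p} = r/n$), $\ln L_1$, and the
    log-likelihood of the data at the hypothesized
    value of 0.5, $\ln L_0$:
    $2(\ln L_1 - \ln L_0)$.
  • For the segregation ratio example,
    $2(\ln L_1 - \ln L_0) = 2[r \ln \hat{p} + (n-r) \ln(1-\hat{p}) - n \ln 0.5]$
  • $= 2[52 \ln(52/112) + 60 \ln(60/112) - 112 \ln 0.5] = 0.57$
    for the opalescent dentine example.
  • Likelihood-ratio statistic in this case is
    distributed as chi-square on one degree of
    freedom under the null hypothesis; 0.57 is well
    below the 5% critical value of 3.84, hence
    non-significant.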
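
The arithmetic behind the likelihood-ratio statistic can be reproduced with a few lines of Python. This is a sketch using only the standard library; the p-value uses the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)), which holds for one degree of freedom only.

```python
from math import erfc, log, sqrt

r, n = 52, 112          # opalescent dentine data from the slide
p_hat = r / n           # MLE of the segregation ratio
p_null = 0.5            # hypothesized value

def log_lik(p: float) -> float:
    """Log-likelihood of p, omitting the constant term."""
    return r * log(p) + (n - r) * log(1 - p)

lr_stat = 2 * (log_lik(p_hat) - log_lik(p_null))

# Upper-tail probability of a chi-square variate with 1 degree of freedom
p_value = erfc(sqrt(lr_stat / 2))

print(f"LR statistic = {lr_stat:.2f}")
print(f"p-value = {p_value:.2f}")
```

The constant term drops out of the difference, which is why it can be ignored when computing the statistic.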

7
MATRIX ALGEBRA BASICS
  • $A^{-1}$: Inverse of A
  • $A^T$ or $A'$: Transpose of A
  • $|A|$: Determinant of A
  • $A \times B$: A postmultiplied by B
  • $(r \times c)(c \times r)$: conformable for
    multiplication since number of columns of A =
    number of rows of B. Resulting matrix is r x r.
  • $\mathrm{Tr}(A)$: Trace of matrix A
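
The operations listed above can be tried out on small matrices. The following sketch uses plain Python lists so that it needs no external library; the matrices A and B hold hypothetical values chosen only for illustration, and the determinant and inverse helpers handle the 2 x 2 case only.

```python
def transpose(a):
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """a postmultiplied by b; columns of a must equal rows of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]

def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical 2 x 2 matrix
B = [[1.0, 2.0, 3.0]]          # hypothetical 1 x 3 matrix (r = 1, c = 3)

print("det(A) =", det2(A))
print("Tr(A) =", trace(A))
print("A * inverse(A) =", matmul(A, inv2(A)))
print("(1 x 3)(3 x 1) =", matmul(B, transpose(B)))      # 1 x 1 result
print("(3 x 1)(1 x 3) has", len(matmul(transpose(B), B)), "rows")
```

Multiplying in the opposite order gives a 3 x 3 matrix, a reminder that matrix multiplication is not commutative.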

8
HISTORY (MX INTRODUCTORY WORKSHOP)
  • Maximum-likelihood estimation using linear
    covariance structure models, e.g. fitting models
    to twin data
  • Let p be the number of observed variables, the
    expected covariance matrix be E, and the expected
    vector of means be $\mu$, where E and $\mu$ are
    functions of q free parameters to be estimated
    from the data. Let $x_1, x_2, \ldots, x_n$ denote
    the vectors of observed variables for the n
    sampled units. Assuming that the observed
    variables follow a multivariate normal
    distribution, the log-likelihood of the observed
    data is given by
    $\ln L = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|E| - \frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^T E^{-1}(x_i-\mu)$
  • This is the formula used for maximum likelihood
    model-fitting to raw continuous data, assuming a
    multivariate normal distribution. (Often,
    $-2 \ln L$ is computed instead.) Requires that we
    provide
  • (a) a model for the expected covariance matrix,
    e.g. in terms of additive genetic, shared and
    non-shared environmental variance components,
    that will vary as a function of relationship
  • (b) a model for the expected means: in the
    simplest applications we might estimate a
    separate mean that might differ by gender, or
    possibly by twin pair zygosity.
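
To make the raw-data log-likelihood concrete, here is a minimal Python sketch for the bivariate case (p = 2), written with explicit 2 x 2 algebra so that it needs only the standard library. The data, mean vector, and covariance matrix are hypothetical values for illustration; they are not taken from the workshop.

```python
from math import log, pi

def mvn_loglik_2d(data, mu, cov):
    """Bivariate normal log-likelihood of independent observations.

    ln L = -(n p / 2) ln(2 pi) - (n / 2) ln|E|
           - (1/2) sum_i (x_i - mu)' E^-1 (x_i - mu),  with p = 2.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    n, p = len(data), 2
    quad = 0.0
    for x1, x2 in data:
        d1, d2 = x1 - mu[0], x2 - mu[1]
        quad += (d1 * (inv[0][0] * d1 + inv[0][1] * d2)
                 + d2 * (inv[1][0] * d1 + inv[1][1] * d2))
    return -0.5 * n * p * log(2 * pi) - 0.5 * n * log(det) - 0.5 * quad

# Hypothetical twin-pair scores (twin 1, twin 2)
pairs = [(0.3, 0.5), (-1.1, -0.6), (0.9, 0.2), (-0.2, 0.1)]
mu = (0.0, 0.0)                       # assumed expected means
cov = [[1.0, 0.6], [0.6, 1.0]]        # assumed expected covariance matrix

print(f"ln L = {mvn_loglik_2d(pairs, mu, cov):.4f}")
print(f"-2 ln L = {-2 * mvn_loglik_2d(pairs, mu, cov):.4f}")
```

In a model-fitting program the mean vector and covariance matrix would be rebuilt from the free parameters at every iteration and this function re-evaluated until ln L is maximized.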

9
EXAMPLE
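  • MZ Pairs
  • $E = \begin{pmatrix} V_A + V_C + V_E & V_A + V_C \\ V_A + V_C & V_A + V_C + V_E \end{pmatrix}$
  • $\mu = (m, m)^T$
  • DZ Pairs
  • $E = \begin{pmatrix} V_A + V_C + V_E & 0.5 V_A + V_C \\ 0.5 V_A + V_C & V_A + V_C + V_E \end{pmatrix}$
  • $\mu = (m, m)^T$
  • Where m is the estimate of the population mean,
    and VA, VC and VE are the additive genetic,
    shared environmental and non-shared
    environmental variances, all estimated jointly
    from the data.
  • Compare e.g. $\ln L_1$ for the VA + VC + VE model
    with
  • $\ln L_0$ for the VC + VE model
  • $2(\ln L_1 - \ln L_0)$ is distributed as
    chi-square on one degree of freedom as before.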
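
The expected covariance matrices for the two zygosity groups differ only in the coefficient applied to the additive genetic variance (1 for MZ pairs, 0.5 for DZ pairs). A short Python sketch, with hypothetical variance components chosen purely for illustration:

```python
def expected_cov(va: float, vc: float, ve: float, r_a: float):
    """Expected 2 x 2 covariance matrix for a twin pair.

    r_a is the coefficient on the additive genetic variance in the
    cross-twin covariance: 1.0 for MZ pairs, 0.5 for DZ pairs.
    """
    total = va + vc + ve
    cross = r_a * va + vc
    return [[total, cross], [cross, total]]

# Hypothetical variance components
va, vc, ve = 0.5, 0.2, 0.3

mz = expected_cov(va, vc, ve, r_a=1.0)
dz = expected_cov(va, vc, ve, r_a=0.5)
print("MZ expected covariance:", mz)
print("DZ expected covariance:", dz)

# The nested VC + VE model fixes VA at zero, so MZ and DZ pairs
# are then expected to have the same covariance matrix.
mz_ce = expected_cov(0.0, vc, ve, r_a=1.0)
dz_ce = expected_cov(0.0, vc, ve, r_a=0.5)
print("Same under VC + VE model:", mz_ce == dz_ce)
```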

10
HISTORY (MX INTRODUCTORY WORKSHOP) - II
  • In practice most applications in the MX
    Introductory Workshop fitted models to summary
    covariance matrices.
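  • We can simplify (e.g. Sham, p. 238)
  • $\ln L = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|E| - \frac{1}{2}\mathrm{tr}\left[E^{-1}\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^T\right]$   (1)
  • $\ln L = -\frac{n}{2}\left[p\ln(2\pi) + \ln|E| + \mathrm{tr}(E^{-1}S) + (m-\mu)^T E^{-1}(m-\mu)\right]$   (2)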
  • Where m is the vector of sample means of the p
    observed variables, S is the observed covariance
    matrix. For the simple applications in the
    Introductory Workshop, we had no predictions for
    the means structure, so we can saturate that
    component of the model (i.e. estimate a separate
    mean for every observed mean), equivalent to
    deleting the term $(m-\mu)^T E^{-1} (m-\mu)$ in
    (2). Thus the log-likelihood of the observed data
    becomes
  • $\ln L = -\frac{n}{2}\left[p\ln(2\pi) + \ln|E| + \mathrm{tr}(E^{-1}S)\right]$   (3)
  • Under a saturated model, which equates every
    element of E to the corresponding element of S
    (i.e. a perfect fit model) we have for the
    log-likelihood, since $\mathrm{tr}(S^{-1}S) = p$,
    $\ln L_S = -\frac{n}{2}\left[p\ln(2\pi) + \ln|S| + p\right]$

11
HISTORY (MX INTRODUCTORY WORKSHOP) - II
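  • Thus the likelihood-ratio test of the fitted
    model against the saturated model, with
    $\ln L_S$ the log-likelihood under the saturated
    model, becomes
  • $2(\ln L_S - \ln L) = n\left[\ln|E| - \ln|S| + \mathrm{tr}(E^{-1}S) - p\right]$   (4)
  • For multiple group problem, sum over groups.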
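
The fit function for one group can be sketched in Python as follows. The observed matrix S is the MZ covariance matrix quoted in the Mx script in this deck; the sample size n = 100 and the candidate expected matrix E are hypothetical values introduced only for this illustration.

```python
from math import log

def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]

def ml_fit(expected, observed, n):
    """n * (ln|E| - ln|S| + tr(E^-1 S) - p) for p = 2 variables."""
    p = 2
    e_inv = inv2(expected)
    tr = sum(e_inv[i][k] * observed[k][i]
             for i in range(p) for k in range(p))
    return n * (log(det2(expected)) - log(det2(observed)) + tr - p)

S = [[0.7247, 0.5891], [0.5891, 0.7915]]  # observed MZ matrix (from the deck)
E = [[0.75, 0.55], [0.55, 0.75]]          # hypothetical expected matrix
n = 100                                   # hypothetical sample size

print(f"fit of hypothetical model: {ml_fit(E, S, n):.4f}")
print(f"fit of saturated model:    {ml_fit(S, S, n):.4f}")
```

The statistic is zero when the expected matrix reproduces the observed matrix exactly and positive otherwise. With several groups, the same function would be evaluated per group and the results added.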

12
Analysis of Australian BMI data - young female MZ
twin pairs - MX DIY version
  • Analysis of Australian BMI data-young female MZ
    twins pairs-MX DIY version
  • !
  • DA NG=1 NI=2 NO=0
  • begin matrices ;
  • A LO 1 1 FR ! Additive genetic variance
  • C LO 1 1 FR ! Shared environmental variance
  • E LO 1 1 FR ! Non-shared environmental variance
  • M FU 2 2 ! This will be observed MZ covariance matrix
  • D FU 2 2 ! This will be observed DZ covariance matrix
  • g fu 1 1 ! coefficient of 0.5 for DZ pairs
  • n fu 1 1 ! sample size for MZ pairs (female in this illustration)
  • k fu 1 1 ! sample size for DZ pairs (female in this illustration)
  • p fu 1 1 ! order of matrices (i.e. number of variables = 2 in this case)
  • end matrices ;
  • mat g 0.5
  • mat p 2
  • mat m
  • 0.7247 0.5891
  • 0.5891 0.7915

13
Analysis of Australian BMI data - young female MZ
twin pairs - MX DIY version (ctd)
  • BEGIN ALGEBRA ;
  • t = n+k ; ! total sample size
  • U = (A+C+E | A+C _ A+C | A+C+E) ; ! Expected MZ covariance matrix
  • V = (A+C+E | g*A+C _ g*A+C | A+C+E) ; ! Expected DZ covariance matrix
  • H = n*(\ln (\det(U))-\ln (\det(M)) + \tr((U~*M))-p) ; ! fit-function for MZ group
  • J = k*(\ln (\det(V))-\ln (\det(D)) + \tr((V~*D))-p) ; ! fit-function for DZ group
  • F = h+j ;
  • END ALGEBRA ;
  • bo 0.01 1.0 e(1,1)
  • bo 0.0 1.0 c(1,1) a(1,1)
  • CO F
  • option user df=6
  • end

14
PRE-HISTORY (STATISTICS 101) - II
  • LINEAR REGRESSION requires weaker assumptions
    than linear covariance structure models. Does not
    assume multivariate normal distribution, only
    homoscedastic residuals. Flexible for handling
    selective sampling schemes where we oversample
    extreme values of predictor variable(s).
  • We can fit linear regression models by maximum
    likelihood, e.g. using MX.
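
As an illustration of fitting a regression by maximum likelihood, the sketch below assumes normally distributed, homoscedastic residuals (an assumption added here so that a likelihood can be written down). Under that assumption the ML estimates of intercept and slope coincide with ordinary least squares, and the ML estimate of the residual variance divides by n. The data values are hypothetical.

```python
from math import log, pi

def fit_linear_ml(x, y):
    """ML fit of y = b0 + b1 * x + e, with e ~ N(0, sigma2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / n  # ML estimate: divisor n, not n - 2
    # Maximized log-likelihood after substituting the estimates
    loglik = -0.5 * n * (log(2 * pi * sigma2) + 1)
    return b0, b1, sigma2, loglik

# Hypothetical predictor and outcome values
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, sigma2, loglik = fit_linear_ml(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"residual variance (ML) = {sigma2:.4f}, ln L = {loglik:.4f}")
```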

15
HISTORY (MX INTRODUCTORY WORKSHOP) - III
  • Definition variables: an option in MX when
    fitting to raw data, which allows us to model
    effects of some variables as fixed effects,
    modeling their contribution to expected means.
  • Simple example: controlling for linear or
    polynomial regression of a quantitative measure
    on age. We don't want to model covariance
    structure with age (which probably has a
    rectangular distribution!)
  • Definition variables: variables whose values may
    vary from individual to individual, that can be
    read into matrices
  • Important example: genotypes at a given locus or
    set of loci.
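
The idea of a definition variable can be sketched outside MX as well. In the Python illustration below, each twin pair carries its own age, and the expected mean for that pair is b0 + b1 * age, so the likelihood is evaluated pair by pair with a pair-specific mean. The function name, parameter names, and all data values are hypothetical and are not MX syntax.

```python
from math import log, pi

def pair_loglik(y1, y2, age, b0, b1, var, cov):
    """Bivariate normal log-likelihood for one twin pair.

    The expected mean b0 + b1 * age depends on this pair's own age,
    which plays the role of a definition variable. Both twins share
    the same variance `var`; `cov` is the cross-twin covariance.
    """
    mu = b0 + b1 * age
    d1, d2 = y1 - mu, y2 - mu
    det = var * var - cov * cov
    quad = (var * d1 * d1 - 2 * cov * d1 * d2 + var * d2 * d2) / det
    return -log(2 * pi) - 0.5 * log(det) - 0.5 * quad

# Hypothetical raw data: (twin 1 score, twin 2 score, age of the pair)
pairs = [(21.0, 22.0, 20), (23.5, 22.5, 30), (24.0, 25.5, 40)]

def total_loglik(b0, b1, var, cov):
    """Sum of the pair-specific log-likelihoods."""
    return sum(pair_loglik(y1, y2, age, b0, b1, var, cov)
               for y1, y2, age in pairs)

print(f"with age effect:    {total_loglik(19.0, 0.14, 1.0, 0.5):.4f}")
print(f"without age effect: {total_loglik(23.0, 0.0, 1.0, 0.5):.4f}")
```

Age enters only through the expected means, so nothing needs to be assumed about the distribution of age itself.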