John R' Stevens - PowerPoint PPT Presentation

About This Presentation
Title:

John R' Stevens

Description:

alternative hypothesis: true difference in means is not equal to 0 ... Positive Definiteness. Probability. Distributions: Multivariate Normal, Binomial, Multinomial ... – PowerPoint PPT presentation

Number of Views:42
Avg rating:3.0/5.0
Slides: 39
Provided by: stat50
Learn more at: https://www.usu.edu
Category:

less

Transcript and Presenter's Notes

Title: John R' Stevens


1
Advanced Statistical MethodsBeyond Linear
Regression
  • John R. Stevens
  • Utah State University
  • Notes 2. Statistical Methods I
  • Mathematics Educators Workshop
  • 28 March 2009

1
http//www.stat.usu.edu/jrstevens/pcmi
2
What would your students know to do with these
data?
Obs Flight Temp Damage 1 STS1 66 NO 2 STS9 70 NO 3
STS51B 75 NO 4 STS2 70 YES 5 STS41B 57 YES 6 STS5
1G 70 NO 7 STS3 69 NO 8 STS41C 63 YES 9 STS51F 81
NO 10 STS4 80 11 STS41D 70 YES 12 STS51I 76 NO 13
STS5 68 NO 14 STS41G 78 NO 15 STS51J 79 NO 16 STS
6 67 NO 17 STS51A 67 NO 18 STS61A 75 YES 19 STS7 7
2 NO 20 STS51C 53 YES 21 STS61B 76 NO 22 STS8 73 N
O 23 STS51D 67 NO 24 STS61C 58 YES
3
Two Sample t-test data Temp by Damage
t 3.1032, df 21, p-value
0.005383 alternative hypothesis true difference
in means is not equal to 0 95 percent confidence
interval 2.774344 14.047085 sample
estimates mean in group NO mean in group YES
72.12500 63.71429
4
Does the t-test make sense here?
  • Traditional
  • Treatment Group mean vs. Control Group mean
  • What is the response variable?
  • Temperature? Quantitative, Continuous
  • Damage? Qualitative

5
Traditional Statistical Model 1
  • Linear Regression predict continuous response
    from quantitative predictors
  • Yweight, Xheight
  • Yincome, Xeducation level
  • Yfirst-semester GPA, Xparents income
  • Ytemperature, Xdamage (0no, 1yes)
  • Can also control for other possibly
    categorical factors (covariates)
  • Sex
  • Major
  • State of Origin
  • Number of Siblings

6
Traditional Statistical Model 2
  • Logistic Regression predict binary response from
    quantitative predictors
  • Ygraduate within 5 years0 vs. Ynot1
  • Xfirst-semester GPA
  • Y0 (no damage) vs. Y1 (damage)
  • Xtemperature
  • Y0 (survive) vs. Y1 (death)
  • Xdosage (dose-response model)
  • Can also control for other factors, or
    covariates
  • Race, Sex
  • Genotype
  • p P(Y1 relevant factors) prob. that
    Y1, given state of relevant factors

7
Traditional Dose-Response Model
  • p Probability of death at dose d
  • Look at what affects the shape of the curve, LD50
    (lethal dose for 50 efficacy), etc.

8
Fitting the Dose-Response Model
  • Why logistic regression?
  • ß0 place-holder constant
  • ß1 effect of dosage d
  • To estimate parameters
  • Newton-Raphson iterative process to maximize
    the likelihood of the model
  • Compare Y0 (no damage) with Y1 (damage) groups

9
Likelihood Function (to be maximized)
likelihood for obs. i
multiply probabilities (independence)
10
Estimation by IRLS
  • Iteratively Reweighted Least Squares
  • equivalent Newton-Raphson algorithm for
    iteratively solving score equations

11
Coefficients Estimate Std. Error z
value Pr(gtz) (Intercept) 15.0429 7.3786
2.039 0.0415 Temp -0.2322 0.1082
-2.145 0.0320 --- Signif. codes 0
0.001 0.01 0.05 . 0.1 1
12
(No Transcript)
13
What if the data were even better?
  • Complete separation of points
  • What should happen toour slope estimate?

14
Coefficients Estimate Std. Error z
value Pr(gtz) (Intercept) 928.9 913821.4
0.001 1 Temp -14.4 14106.7
-0.001 1
15
Failure?
  • Shape of likelihood function
  • Large Standard Errors
  • Solution only in 2006
  • Rather than maximizing likelihood, consider a
    penalty

16
Model fitted by Penalized ML Confidence
intervals and p-values by Profile Likelihood
coef se(coef) Chisq
p (Intercept) 30.4129282 16.5145441 11.35235
0.0007535240 Temp -0.4832632 0.2528934
13.06178 0.0003013835
17
Beetle Data
18
Dose-response model
  • Recall simple model
  • pij Pr(Y1 dosage level j and genotype level
    i)
  • But when is genotype (covariate Gi) observed?

19
Coefficients Estimate Std. Error
z value Pr(gtz) (Intercept) -2.657e01
8.901e04 -2.98e-04 1 dose
-7.541e-26 1.596e07 -4.72e-33 1 G1
-3.386e-28 1.064e05 -3.18e-33 1 G2B
-1.344e-14 1.092e05 -1.23e-19
1 G2H -3.349e-28 1.095e05 -3.06e-33
1 doseG1 7.541e-26 1.596e07
4.72e-33 1 doseG2B 3.984e-12
3.075e07 1.30e-19 1 doseG2H
7.754e-26 2.760e07 2.81e-33 1 G1G2B
1.344e-14 1.465e05 9.17e-20
1 G1G2H 3.395e-28 1.327e05 2.56e-33
1 doseG1G2B -3.984e-12 3.098e07
-1.29e-19 1 doseG1G2H -7.756e-26
2.763e07 -2.81e-33 1
Before we fix this, first a little detour
20
A Multivariate Gaussian Mixture
Component j isMVN(µj,Sj) with proportion pj
21
The Maximum Likelihood Approach
22
A Possible Work-Around
  • Keys here
  • the true group memberships ? are unknown (latent)
  • statisticians specialize in unknown quantities

23
A reasonable approach
  • 1. Randomly assign group memberships ?, and
    estimate group means µj , covariance matrices Sj
    , and mixing proportions pj
  • 2. Given those values, calculate (for each obs.)
    ?j E?j? P(obs. in group j)
  • 3. Update estimates for µj , Sj , and pj ,
    weighting each observation by these ?
  • 4. Repeat steps 2 and 3 to convergence

24
Plotting character and color indicate most likely
component
25
The EM (Baum-Welch) Algorithm- maximization made
easier with Zm latent (unobserved) data T
(Z,Zm) complete data
  • Start with initial guesses for parameters
  • Expectation At the kth iteration, compute
  • Maximization Obtain estimateby maximizing
    over
  • Iterate steps 2 and 3 to convergence (?)

26
Beetle Data Notation
  • Observed values
  • Unobserved (latent) values
  • If Nij had been observed
  • How Nij can be latently considered

27
Likelihood Function
  • Parameters ?(p,P) and complete data T(n,N)
  • After simplification
  • Mechanism of missing data suggests EM algorithm

28
Missing at Random (MAR)
  • Necessary assumption for usual EM applications
  • Covariate x is MAR if probability of observing x
    does not depend on x or any other unobserved
    covariate, but may depend on response and other
    observed covariates (Ibrahim 1990)
  • Here genotype is observed only for survivors,
    and for all subjects at zero dosage

29
Initialization Step
  • Two classes of marginal information here
  • For all dosage levels j observe
  • At zero dosage level observe for
    genotype i
  • Allows estimate of Pi
  • Consider marginal distn. of missing categorical
    covariate (genotype)
  • Using zero dosage level
  • This is the key the marginal distribution of
    the missing categorical covariate

30
Expectation Step
  • Dropping constants and
  • Need to evaluate

()
31
Expectation Step
  • Bayes Formula
  • Multinomial ?

()
32
Expectation Step
  • For
  • Not needed for maximization only affects EM
    convergence rate
  • Direct calculation from multinomial distn. is
    possible but computationally prohibitive
  • Need to employ some approximation strategy
  • Second-order Taylor series about ,
    using Binets formula

()
33
Expectation Step
  • Consider Binets formula (like Stirlings)
  • Have
  • Use a second-order Taylor series approximation
    taken about
  • as a function of

()
34
Maximization Step
  • Portion of related to
  • Portion of related to

by Lagrange multipliers
by Newton-Raphson iterations, with some
parameterization
()
35
Convergence
36
Dose Response Curves (log scale)
37
EM Results
test statistic for H0 no dosage effect
separation of points
38
Topics Used Here
  • Calculus
  • Differentiation Integration (including vector
    differentiation)
  • Lagrange Multipliers
  • Taylor Series Expansions
  • Linear Algebra
  • Determinants Eigenvalues
  • Inverting computationally/nearly singular
    Matrices
  • Positive Definiteness
  • Probability
  • Distributions Multivariate Normal, Binomial,
    Multinomial
  • Bayes Formula
  • Statistics
  • Logistic Regression
  • Separation of Points
  • Penalized Likelihood Maximization
  • EM Algorithm
  • Biology a little time and communication
Write a Comment
User Comments (0)
About PowerShow.com