Title: John R' Stevens
1Advanced Statistical MethodsBeyond Linear
Regression
- John R. Stevens
- Utah State University
- Notes 2. Statistical Methods I
- Mathematics Educators Workshop
- 28 March 2009
1
http//www.stat.usu.edu/jrstevens/pcmi
2What would your students know to do with these
data?
Obs Flight Temp Damage 1 STS1 66 NO 2 STS9 70 NO 3
STS51B 75 NO 4 STS2 70 YES 5 STS41B 57 YES 6 STS5
1G 70 NO 7 STS3 69 NO 8 STS41C 63 YES 9 STS51F 81
NO 10 STS4 80 11 STS41D 70 YES 12 STS51I 76 NO 13
STS5 68 NO 14 STS41G 78 NO 15 STS51J 79 NO 16 STS
6 67 NO 17 STS51A 67 NO 18 STS61A 75 YES 19 STS7 7
2 NO 20 STS51C 53 YES 21 STS61B 76 NO 22 STS8 73 N
O 23 STS51D 67 NO 24 STS61C 58 YES
3 Two Sample t-test data Temp by Damage
t 3.1032, df 21, p-value
0.005383 alternative hypothesis true difference
in means is not equal to 0 95 percent confidence
interval 2.774344 14.047085 sample
estimates mean in group NO mean in group YES
72.12500 63.71429
4Does the t-test make sense here?
- Traditional
- Treatment Group mean vs. Control Group mean
- What is the response variable?
- Temperature? Quantitative, Continuous
- Damage? Qualitative
5Traditional Statistical Model 1
- Linear Regression predict continuous response
from quantitative predictors - Yweight, Xheight
- Yincome, Xeducation level
- Yfirst-semester GPA, Xparents income
- Ytemperature, Xdamage (0no, 1yes)
- Can also control for other possibly
categorical factors (covariates) - Sex
- Major
- State of Origin
- Number of Siblings
6Traditional Statistical Model 2
- Logistic Regression predict binary response from
quantitative predictors - Ygraduate within 5 years0 vs. Ynot1
- Xfirst-semester GPA
- Y0 (no damage) vs. Y1 (damage)
- Xtemperature
- Y0 (survive) vs. Y1 (death)
- Xdosage (dose-response model)
- Can also control for other factors, or
covariates - Race, Sex
- Genotype
- p P(Y1 relevant factors) prob. that
Y1, given state of relevant factors
7Traditional Dose-Response Model
- p Probability of death at dose d
- Look at what affects the shape of the curve, LD50
(lethal dose for 50 efficacy), etc.
8Fitting the Dose-Response Model
- Why logistic regression?
- ß0 place-holder constant
- ß1 effect of dosage d
- To estimate parameters
- Newton-Raphson iterative process to maximize
the likelihood of the model - Compare Y0 (no damage) with Y1 (damage) groups
9Likelihood Function (to be maximized)
likelihood for obs. i
multiply probabilities (independence)
10Estimation by IRLS
- Iteratively Reweighted Least Squares
- equivalent Newton-Raphson algorithm for
iteratively solving score equations
11 Coefficients Estimate Std. Error z
value Pr(gtz) (Intercept) 15.0429 7.3786
2.039 0.0415 Temp -0.2322 0.1082
-2.145 0.0320 --- Signif. codes 0
0.001 0.01 0.05 . 0.1 1
12(No Transcript)
13What if the data were even better?
- Complete separation of points
- What should happen toour slope estimate?
14 Coefficients Estimate Std. Error z
value Pr(gtz) (Intercept) 928.9 913821.4
0.001 1 Temp -14.4 14106.7
-0.001 1
15Failure?
- Shape of likelihood function
- Large Standard Errors
- Solution only in 2006
- Rather than maximizing likelihood, consider a
penalty
16 Model fitted by Penalized ML Confidence
intervals and p-values by Profile Likelihood
coef se(coef) Chisq
p (Intercept) 30.4129282 16.5145441 11.35235
0.0007535240 Temp -0.4832632 0.2528934
13.06178 0.0003013835
17Beetle Data
18Dose-response model
- Recall simple model
- pij Pr(Y1 dosage level j and genotype level
i) - But when is genotype (covariate Gi) observed?
19 Coefficients Estimate Std. Error
z value Pr(gtz) (Intercept) -2.657e01
8.901e04 -2.98e-04 1 dose
-7.541e-26 1.596e07 -4.72e-33 1 G1
-3.386e-28 1.064e05 -3.18e-33 1 G2B
-1.344e-14 1.092e05 -1.23e-19
1 G2H -3.349e-28 1.095e05 -3.06e-33
1 doseG1 7.541e-26 1.596e07
4.72e-33 1 doseG2B 3.984e-12
3.075e07 1.30e-19 1 doseG2H
7.754e-26 2.760e07 2.81e-33 1 G1G2B
1.344e-14 1.465e05 9.17e-20
1 G1G2H 3.395e-28 1.327e05 2.56e-33
1 doseG1G2B -3.984e-12 3.098e07
-1.29e-19 1 doseG1G2H -7.756e-26
2.763e07 -2.81e-33 1
Before we fix this, first a little detour
20A Multivariate Gaussian Mixture
Component j isMVN(µj,Sj) with proportion pj
21The Maximum Likelihood Approach
22A Possible Work-Around
- Keys here
- the true group memberships ? are unknown (latent)
- statisticians specialize in unknown quantities
23A reasonable approach
- 1. Randomly assign group memberships ?, and
estimate group means µj , covariance matrices Sj
, and mixing proportions pj - 2. Given those values, calculate (for each obs.)
?j E?j? P(obs. in group j) - 3. Update estimates for µj , Sj , and pj ,
weighting each observation by these ? - 4. Repeat steps 2 and 3 to convergence
24Plotting character and color indicate most likely
component
25The EM (Baum-Welch) Algorithm- maximization made
easier with Zm latent (unobserved) data T
(Z,Zm) complete data
- Start with initial guesses for parameters
- Expectation At the kth iteration, compute
- Maximization Obtain estimateby maximizing
over - Iterate steps 2 and 3 to convergence (?)
26Beetle Data Notation
- Observed values
-
- Unobserved (latent) values
-
- If Nij had been observed
-
- How Nij can be latently considered
27Likelihood Function
- Parameters ?(p,P) and complete data T(n,N)
- After simplification
- Mechanism of missing data suggests EM algorithm
28Missing at Random (MAR)
- Necessary assumption for usual EM applications
- Covariate x is MAR if probability of observing x
does not depend on x or any other unobserved
covariate, but may depend on response and other
observed covariates (Ibrahim 1990) - Here genotype is observed only for survivors,
and for all subjects at zero dosage
29Initialization Step
- Two classes of marginal information here
- For all dosage levels j observe
- At zero dosage level observe for
genotype i - Allows estimate of Pi
- Consider marginal distn. of missing categorical
covariate (genotype) - Using zero dosage level
- This is the key the marginal distribution of
the missing categorical covariate
30Expectation Step
- Dropping constants and
- Need to evaluate
()
31Expectation Step
- Bayes Formula
- Multinomial ?
()
32Expectation Step
- For
- Not needed for maximization only affects EM
convergence rate - Direct calculation from multinomial distn. is
possible but computationally prohibitive - Need to employ some approximation strategy
- Second-order Taylor series about ,
using Binets formula
()
33Expectation Step
- Consider Binets formula (like Stirlings)
- Have
- Use a second-order Taylor series approximation
taken about - as a function of
()
34Maximization Step
- Portion of related to
- Portion of related to
by Lagrange multipliers
by Newton-Raphson iterations, with some
parameterization
()
35Convergence
36Dose Response Curves (log scale)
37EM Results
test statistic for H0 no dosage effect
separation of points
38Topics Used Here
- Calculus
- Differentiation Integration (including vector
differentiation) - Lagrange Multipliers
- Taylor Series Expansions
- Linear Algebra
- Determinants Eigenvalues
- Inverting computationally/nearly singular
Matrices - Positive Definiteness
- Probability
- Distributions Multivariate Normal, Binomial,
Multinomial - Bayes Formula
- Statistics
- Logistic Regression
- Separation of Points
- Penalized Likelihood Maximization
- EM Algorithm
- Biology a little time and communication