Statistical Guidelines for Psychosomatic Medicine: A brief commentary

Author: Mike Babyak
Created: 2/28/2006
Slides: 18
Source: https://people.duke.edu

Transcript and Presenter's Notes
1
Statistical Guidelines for Psychosomatic Medicine: A brief commentary
2
Reporting Results
  • Lay out the analytic plan
  • Explicitly tie the analysis to the hypothesis
  • Include the exact model
  • Discuss assumptions
  • Discuss power
  • Correct for multiplicity; if not, why not?
  • Tables
    • Report exact p-values
    • Round, round, and round some more
    • Mention the scale in regression tables
    • Report model fit, if relevant
  • Graphics
    • Avoid ducks
    • No 3-D unless the data are 3-D
    • Box or dot plots preferred to bar charts
3
One-sided (Directional) Hypothesis Tests
  • Controversial
  • Two-sided tests are typically preferred because they cover an
    unexpected result
  • The argument is that a one-sided test can be used if an unexpected
    result, or no difference, would not lead to a different action or
    suggest risk
  • Deviations from two-sided testing need to be justified
  • What's wrong with a higher p-value for new ideas?
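The stakes are easy to see numerically: for the same data, a one-sided p-value is half the two-sided one, so the directional choice alone can move a result across the .05 line. A minimal sketch using only Python's standard library, with an invented z statistic of 1.80:

```python
from statistics import NormalDist

z = 1.80  # hypothetical observed z statistic, in the predicted direction

# Two-sided test: probability of |Z| >= 1.80 under the null
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# One-sided (directional) test: probability of Z >= 1.80 only
p_one_sided = 1 - NormalDist().cdf(z)

print(f"two-sided p = {p_two_sided:.3f}")  # ~0.072: not significant at .05
print(f"one-sided p = {p_one_sided:.3f}")  # ~0.036: "significant" at .05
```

The arithmetic is trivial, which is exactly why reviewers want the directional choice justified in advance rather than after seeing which side of .05 each version lands on.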

4
Artificial Categorization of Variables
  • A long literature outlines the problems with this approach
  • In the population, it by definition reduces power
  • In samples, you can get a lucky cut
  • It does NOT improve reliability
  • It doesn't make measurement sense
  • It hides non-linear relations
  • It can yield spurious results in multivariable applications

5
Type I error rates for the relation between x2 and y after
dichotomizing two continuous predictors. Maxwell and Delaney (21)
calculated the effect of dichotomizing two continuous predictors as a
function of the correlation between them. The true model is
y = 0.5*x1 + 0*x2, where all variables are continuous. If x1 and x2
are dichotomized, the error rate for the relation between x2 and y
increases as the correlation between x1 and x2 increases.

          Correlation between x1 and x2
  N        0      .3     .5     .7
  50      .05    .06    .08    .10
  100     .05    .08    .12    .18
  200     .05    .10    .19    .31
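The pattern in the table can be reproduced with a short Monte Carlo sketch. This is not Maxwell and Delaney's exact calculation: the simulation count, the normal critical value, and the OLS details here are shortcuts chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def type1_rate_after_split(n, rho, n_sims=1000):
    """Fraction of simulations in which the median-split x2 is declared
    'significant' at p < .05, even though its true coefficient is zero."""
    hits = 0
    for _ in range(n_sims):
        # True model: y = 0.5*x1 + 0*x2 + e, with corr(x1, x2) = rho
        x = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
        y = 0.5 * x[:, 0] + rng.standard_normal(n)
        # Dichotomize both predictors at their sample medians
        d1 = (x[:, 0] > np.median(x[:, 0])).astype(float)
        d2 = (x[:, 1] > np.median(x[:, 1])).astype(float)
        # OLS of y on both splits; t test on the d2 coefficient
        M = np.column_stack([np.ones(n), d1, d2])
        beta, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ beta
        s2 = resid @ resid / (n - 3)
        se = np.sqrt(s2 * np.linalg.inv(M.T @ M)[2, 2])
        hits += abs(beta[2] / se) > 1.96  # approximates p < .05 for moderate n
    return hits / n_sims
```

With n = 200, the rejection rate for the null-effect x2 stays near .05 when rho = 0 but climbs well above it when rho = .7, matching the direction of the table.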
6
Artificial Categorization of Variables
  • If a true category exists, use something like clustering, not
    median splits
  • If you expect nonlinearity, use polynomials or splines (splitting
    into quartiles, etc., is acceptable but increases standard errors
    considerably)
  • Clinical cutpoints should not figure into statistical modeling
    until the model has already been developed with ALL the data
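The "hides non-linear relations" point is easy to demonstrate. A toy example with an invented U-shaped effect: a median split sees nothing, while a simple quadratic term recovers the relation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: the true effect of x on y is purely U-shaped (x**2)
x = rng.uniform(-2.0, 2.0, 500)
y = x**2 + 0.5 * rng.standard_normal(500)

# Median split: the "high" and "low" halves have nearly identical means,
# so the dichotomized x looks unrelated to y
hi = y[x > np.median(x)].mean()
lo = y[x <= np.median(x)].mean()

# A quadratic polynomial recovers the relation directly
coefs = np.polyfit(x, y, deg=2)  # highest-degree coefficient first
```

Here `hi - lo` is close to zero while the fitted quadratic coefficient is close to the true value of 1; a spline fit would make the same point for shapes a polynomial cannot capture.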

7
Chatfield C. Model uncertainty, data mining and statistical inference
(with discussion). JRSSA 1995;158:419-66.

Notes: bias from selecting a model because it fits the data well; bias
in standard errors.

P. 420: "... need for a better balance in the literature and in
statistical teaching between techniques and problem solving
strategies."

P. 421: "It is 'well known' to be 'logically unsound and practically
misleading' (Zhang, 1992) to make inferences as if a model is known to
be true when it has, in fact, been selected from the same data to be
used for estimation purposes. However, although statisticians may admit
this privately (Breiman (1992) calls it a 'quiet scandal'), they (we)
continue to ignore the difficulties because it is not clear what else
could or should be done."

P. 421: "Estimation errors for regression coefficients are usually
smaller than errors from failing to take into account model
specification."

P. 422: "Statisticians must stop pretending that model uncertainty does
not exist and begin to find ways of coping with it."

P. 426: "It is indeed strange that we often admit model uncertainty by
searching for a best model but then ignore this uncertainty by making
inferences and predictions as if certain that the best fitting model is
actually true."
8
P. 427: "The analyst needs to assess the model selection process and
not just the best fitting model."

P. 432: "The use of subset selection methods is well known to introduce
alarming biases."

P. 433: "... the AIC can be highly biased in data-driven model
selection situations."

P. 434: "Prediction intervals will generally be too narrow."

In the discussion, Jamal R. M. Ameen states that a model should be
(a) satisfactory in performance relative to the stated objective,
(b) logically sound, (c) representative, (d) questionable and subject
to on-line interrogation, (e) able to accommodate external or expert
information and (f) able to convey information.
9
Automated Stepwise Selection Procedures
  • Can lead to wildly optimistic models
  • Don't deal well with correlated predictors
  • Replicate extremely poorly unless sample sizes are huge
  • Best-subset selection has similar problems

10
SOME of the problems with stepwise variable selection:
1. It yields R-squared values that are badly biased high.
2. The F and chi-squared tests quoted next to each variable on the
   printout do not have the claimed distribution.
3. The method yields confidence intervals for effects and predicted
   values that are falsely narrow.
4. It yields p-values that do not have the proper meaning, and the
   proper correction for them is a very difficult problem.
5. It gives biased regression coefficients that need shrinkage (the
   coefficients for remaining variables are too large).
6. It has severe problems in the presence of collinearity.
7. It is based on methods (e.g., F tests for nested models) that were
   intended to be used to test pre-specified hypotheses.
8. Increasing the sample size doesn't help very much.
9. It allows us to not think about the problem.
11
[Figure: simulation results, number of noise variables included vs.
sample size; 20 candidate predictors, 100 samples]
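The flavor of such simulations can be reproduced with a toy forward-selection loop. This is a simplified stand-in, not the procedure behind the original figure: the entry rule (largest |t| above a fixed threshold) and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def n_noise_selected(n, k, n_sims=100, t_enter=1.96):
    """Average number of pure-noise predictors admitted by a naive
    forward-selection loop. Every candidate is noise, so any admission
    is a false positive."""
    total = 0
    for _ in range(n_sims):
        X = rng.standard_normal((n, k))
        y = rng.standard_normal(n)  # y is unrelated to every column of X
        chosen = []
        while len(chosen) < k:
            best_t, best_j = 0.0, None
            for j in range(k):
                if j in chosen:
                    continue
                # Candidate model: intercept + already-chosen columns + column j
                M = np.column_stack([np.ones(n)] +
                                    [X[:, c] for c in chosen] + [X[:, j]])
                beta, *_ = np.linalg.lstsq(M, y, rcond=None)
                resid = y - M @ beta
                s2 = resid @ resid / (n - M.shape[1])
                t = abs(beta[-1]) / np.sqrt(s2 * np.linalg.inv(M.T @ M)[-1, -1])
                if t > best_t:
                    best_t, best_j = t, j
            if best_t < t_enter:
                break  # no remaining candidate clears the entry threshold
            chosen.append(best_j)
        total += len(chosen)
    return total / n_sims
```

Because the entry test is applied to the maximum of many t statistics, noise variables are routinely admitted even though every candidate has a true coefficient of zero.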
12
Automated Stepwise Selection Procedures
  • If confronted with too many predictors:
    • Use theory to delete some
    • Combine predictors using clustering or tree methods before
      modeling, without looking at Y
    • Use approaches that exploit correlated variables: MANOVA, SEM,
      PLS, principal components regression
  • If you MUST use stepwise:
    • Backward elimination is preferable
    • Set the p-to-remove high
    • You MUST cross-validate
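The "combine predictors without looking at Y" idea can be sketched with principal components regression. Everything here (the sizes and the latent-factor data-generating process) is invented for illustration; the point is that the data reduction step uses X alone.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented setup: 30 correlated predictors driven by 3 latent factors
n, p, k = 200, 30, 3
latent = rng.standard_normal((n, k))
X = latent @ rng.standard_normal((k, p)) + 0.5 * rng.standard_normal((n, p))
y = latent[:, 0] + rng.standard_normal(n)  # outcome depends on one factor

# Data reduction WITHOUT consulting y: principal components of X alone
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T  # first k component scores

# Regress y on 3 component scores instead of 30 raw predictors
M = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(M, y, rcond=None)
r2 = 1 - ((y - M @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Because the components are defined before y enters the analysis, the 3-degree-of-freedom model's fit statistics keep their nominal meaning, unlike a model chosen by screening 30 predictors against y.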

13
Variable Selection in Multivariable Models
  • Fit statistics and p-values for regressions are based on the
    assumption of a pre-specified model
  • Univariate prescreening requires a correction to adjust for the
    selection process
  • P-values should not be the sole guide; selection is not a
    hypothesis test!
  • Raise the model df to reflect all variables searched
  • Use cross-validation to show the level of optimism
  • Use pre-shrinkage
  • Pay attention to the effective sample size
  • Too many predictors leads to poor power and unstable estimates
14
[Figure: simulation results by events-per-predictor ratio]
From Peduzzi et al. J Clin Epidemiol 1996 Dec;49(12):1373-9.
15
Harrell FE Jr. Regression modeling strategies with applications to
linear models, logistic regression and survival analysis. New York:
Springer; 2001.
Green SB. How many subjects does it take to do a regression analysis?
Multivar Behav Res 1991;26:499-510.
Peduzzi PN, Concato J, Holford TR, Feinstein AR. The importance of
events per independent variable in multivariable analysis, II: accuracy
and precision of regression estimates. J Clin Epidemiol
1995;48:1503-10.
Peduzzi PN, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation
study of the number of events per variable in logistic regression
analysis. J Clin Epidemiol 1996;49:1373-9.
Thompson B. Stepwise regression and stepwise discriminant analysis need
not apply here: a guidelines editorial. Ed Psychol Meas
1995;55:525-34.
Cohen J. Things I have learned (so far). Am Psychol 1990;45:1304-12.
Roecker EB. Prediction error and its estimation for subset-selected
models. Technometrics 1991;33:459-68.
16
Tibshirani R. Regression shrinkage and selection via the lasso. J R
Stat Soc B 1996;58:267-88.
Grambsch PM, O'Brien PC. The effects of preliminary tests for
nonlinearity in regression. Stat Med 1991;10:697-709.
Faraway JJ. The cost of data analysis. J Comput Graph Stat
1992;1:213-29.
Altman DG, Andersen PK. Bootstrap investigation of the stability of a
Cox regression model. Stat Med 1989;8:771-83.
Derksen S, Keselman HJ. Backward, forward and stepwise automated subset
selection algorithms: frequency of obtaining authentic and noise
variables. Br J Math Stat Psychol 1992;45:265-82.
Steyerberg EW, Harrell FE, Habbema JD. Prognostic modeling with
logistic regression analysis: in search of a sensible strategy in small
data sets. Med Decis Making 2001;21:45-56.
Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y,
Habbema JD. Internal validation of predictive models: efficiency of
some procedures for logistic regression analysis. J Clin Epidemiol
2001;54:774-81.
17
Maxwell SE, Delaney HD. Bivariate median splits and spurious
statistical significance. Psychol Bull 1993;113:181-90.
MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of
dichotomization of quantitative variables. Psychol Methods
2002;7:19-40.
McClelland G. Negative consequences of dichotomizing continuous
predictor variables. Available at:
http://psych.colorado.edu/~mcclella/MedianSplit/.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors
in multiple regression: a bad idea. Stat Med 2006 Jan
15;25(1):127-41.
Freedman D. Statistical models and shoe leather (with discussion). Soc
Methodol 1991;21:291-313.