Title: Multivariable regression models with continuous covariates

1
Multivariable regression models with continuous
covariates
Willi Sauerbrei, Institute of Medical Biometry and
Informatics, University Medical Center Freiburg,
Germany
Patrick Royston, MRC Clinical Trials Unit,
London, UK

with a practical emphasis on fractional
polynomials and applications in clinical
epidemiology

2
The problem
"Quantifying epidemiologic risk factors using
non-parametric regression: model selection
remains the greatest challenge"
Rosenberg PS et al., Statistics in Medicine 2003;
22: 3369-3381
Trivial nowadays to fit almost any model
To choose a good model is much harder
3
Overview

Context and motivation
Introduction to fractional polynomials for the
univariate smoothing problem
Extension to multivariable models
Robustness and stability
Software sources
Conclusions

4
Motivation

Often have continuous risk factors in
epidemiology and clinical studies: how to model
them?
Linear model may describe a dose-response
relationship badly
Linear = straight line = $\beta_0 + \beta_1 X$
throughout talk
Using cut-points has several problems
Splines recommended by some but are not ideal
Lack a well-defined approach to model selection
Black box
Robustness issues

5
Problems of cut-points

Step-function is a poor approximation to true
relationship
Almost always fits data less well than a suitable
continuous function
Optimal cut-points have several difficulties
Biased effect estimates
P-values too small (inflated type I error)
Not reproducible in other studies
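
The claim that a step function fits less well than a suitable continuous function can be checked on a toy example. The sketch below uses made-up data (a logarithmic dose-response with a small alternating disturbance) and compares two models with two parameters each: a median split and a straight line in log X. The data, the cut-point and the function names are illustrative assumptions, not taken from the talk.

```python
import math

def rss_step(x, y, cut):
    """Residual sum of squares when y is summarised by one mean per side of `cut`."""
    rss = 0.0
    for group in ([yi for xi, yi in zip(x, y) if xi <= cut],
                  [yi for xi, yi in zip(x, y) if xi > cut]):
        mean = sum(group) / len(group)
        rss += sum((g - mean) ** 2 for g in group)
    return rss

def rss_line(z, y):
    """Residual sum of squares of the least-squares line y = b0 + b1 * z."""
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    b1 = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
          / sum((zi - zbar) ** 2 for zi in z))
    b0 = ybar - b1 * zbar
    return sum((yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, y))

# Made-up data: logarithmic relationship plus a small alternating disturbance.
x = list(range(1, 41))
y = [math.log(xi) + 0.05 * (-1) ** i for i, xi in enumerate(x)]

rss_cut = rss_step(x, y, cut=20)                    # dichotomised at the median
rss_log = rss_line([math.log(xi) for xi in x], y)   # continuous function of X
print(rss_cut, rss_log)
```

Both models spend the same number of parameters, so the difference in residual sum of squares reflects only the loss of information from categorising X.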

6
Example datasets 1. Epidemiology

Whitehall 1
17,370 male Civil Servants aged 40-64 years
Measurements include age, cigarette smoking, BP,
cholesterol, height, weight, job grade
Outcomes of interest: coronary heart disease,
all-cause mortality → logistic regression
Interested in risk as function of covariates
Several continuous covariates
Some may have no influence in multivariable
context

7
Example datasets 2. Clinical studies

German breast cancer study group (BMFT-2)
Prognostic factors in primary breast cancer
Age, menopausal status, tumour size, grade, no.
of positive lymph nodes, hormone receptor status
Recurrence-free survival time → Cox regression
686 patients, 299 events
Several continuous covariates
Interested in prognostic model and effect of
individual variables

8
Example: Systolic blood pressure vs. age
9
Example: Curve fitting (systolic BP and age, not
linear)
10
Empirical curve fitting: Aims

Smoothing
Visualise relationship of Y with X
Provide and/or suggest functional form

11
Some approaches

Non-parametric (local-influence) models
Locally weighted (kernel) fits (e.g. lowess)
Regression splines
Smoothing splines (used in generalized additive
models)
Parametric (non-local influence) models
Polynomials
Non-linear curves
Fractional polynomials
Intermediate between polynomials and non-linear
curves

12
Local regression models

Advantages
Flexible because local!
May reveal true curve shape (?)
Disadvantages
Unstable because local!
No concise form for models
Therefore, hard for others to use: publication,
comparing results with those from other
models
Curves not necessarily smooth
"Black box" approach
Many approaches: which one(s) to use?

13
Polynomial models

Do not have the disadvantages of local regression
models, but do have others
Lack of flexibility (low order)
Artefacts in fitted curves (high order)
Cannot have asymptotes

14
Fractional polynomial models

Describe for one covariate, X
(multiple regression later)
Fractional polynomial of degree m for X with
powers $p_1, \ldots, p_m$ is given by
$FP_m(X) = \beta_1 X^{p_1} + \cdots + \beta_m X^{p_m}$
Powers $p_1, \ldots, p_m$ are taken from a special set
$\{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$
Usually m = 1 or m = 2 is sufficient for a good
fit
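
As a minimal sketch of the definition above, the function below evaluates an FP function for given powers and coefficients. It assumes X > 0 and uses the convention, visible in the FP1 list on the next slide, that power 0 stands for log X. The names `fp_term` and `fp_value` are illustrative, and only distinct powers are handled here.

```python
import math

# The special set of powers from the slide; power 0 stands for log X.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(x, p):
    """Return x**p, with the FP convention that p == 0 means log(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if p == 0 else x ** p

def fp_value(x, powers, betas, intercept=0.0):
    """Evaluate intercept + sum_j beta_j * x^(p_j) for distinct powers."""
    if len(powers) != len(betas):
        raise ValueError("need one coefficient per power")
    return intercept + sum(b * fp_term(x, p) for p, b in zip(powers, betas))
```

For example, `fp_value(2, (-1, 2), (1.0, 0.5))` evaluates 1.0 * (1/2) + 0.5 * 2^2.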

15
FP1 and FP2 models

FP1 models are simple power transformations
$1/X^2$, $1/X$, $1/\sqrt{X}$, $\log X$, $\sqrt{X}$, $X$, $X^2$, $X^3$
8 models
FP2 models are combinations of these
For example $\beta_1 (1/X) + \beta_2 X^2$
28 models
Note "repeated powers" models
For example $\beta_1 (1/X) + \beta_2 (1/X) \log X$
8 models
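
The model counts can be reproduced by enumeration. The sketch below lists the FP1 powers, the FP2 pairs with distinct powers and the repeated-power pairs, and builds the two basis terms of an FP2 model. The helper name `fp2_basis` is an assumption made for illustration.

```python
import math
from itertools import combinations

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

fp1_models = [(p,) for p in FP_POWERS]            # single powers
fp2_distinct = list(combinations(FP_POWERS, 2))   # two different powers
fp2_repeated = [(p, p) for p in FP_POWERS]        # repeated powers
fp2_models = fp2_distinct + fp2_repeated

def fp2_basis(x, p1, p2):
    """Two basis terms of an FP2 model; (p, p) gives x^p and x^p * log(x)."""
    t1 = math.log(x) if p1 == 0 else x ** p1
    if p1 == p2:
        return t1, t1 * math.log(x)
    t2 = math.log(x) if p2 == 0 else x ** p2
    return t1, t2

print(len(fp1_models), len(fp2_distinct), len(fp2_repeated), len(fp2_models))
# prints: 8 28 8 36
```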

16
FP1 and FP2 models: some properties

Many useful curves
A variety of features are available
Monotonic
Can have asymptote
Non-monotonic (single maximum or minimum)
Single turning-point
Get better fit than with conventional
polynomials, even of higher degree

17
Examples of FP2 curves: varying powers
18
Examples of FP2 curves: single power, different
coefficients
19
A philosophy of function selection

Prefer simple (linear) model
Use more complex (non-linear) FP1 or FP2 model if
indicated by the data
Contrast to local regression modelling
Already starts with a complex model

20
Estimation and significance testing for FP models

Fit model with each combination of powers
FP1: 8 single powers
FP2: 36 combinations of powers
Choose model with lowest deviance (MLE)
Comparing FPm with FP(m - 1):
compare deviance difference with $\chi^2$ on 2 d.f.
one d.f. for power, 1 d.f. for regression
coefficient
supported by simulations; slightly conservative
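
The search over powers can be sketched for the simplest case: FP1 with a normal-errors model, where each candidate is an ordinary least-squares line in the transformed X. This is an illustration under those assumptions, not the Stata implementation. The deviance is taken as n log(RSS/n), which equals minus twice the maximised log-likelihood up to an additive constant, and all function names are invented for the example.

```python
import math

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def transform(x, p):
    """FP transformation of a positive value; power 0 means log."""
    return math.log(x) if p == 0 else x ** p

def ols_rss(z, y):
    """Residual sum of squares of the least-squares line y = b0 + b1 * z."""
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    b1 = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
          / sum((zi - zbar) ** 2 for zi in z))
    b0 = ybar - b1 * zbar
    return sum((yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, y))

def deviance(rss, n):
    """Normal-errors deviance up to a constant; guarded against rss == 0."""
    return n * math.log(max(rss, 1e-300) / n)

def best_fp1(x, y):
    """Fit all 8 FP1 models and return (best power, {power: deviance})."""
    n = len(y)
    fits = {p: deviance(ols_rss([transform(xi, p) for xi in x], y), n)
            for p in FP_POWERS}
    return min(fits, key=fits.get), fits
```

The deviance difference `fits[1] - fits[best]` is the quantity compared with a chi-square distribution when testing the best FP1 against the linear model.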

21
Selection of FP function

Has flavour of a closed test procedure
Use $\chi^2$ approximations to get P-values
Define nominal P-value for all tests (often 5%)
Fit linear and best FP1 and FP2 models
Test FP2 vs. null: test of any effect of X ($\chi^2$
on 4 df)
Test FP2 vs. linear: test of non-linearity ($\chi^2$ on
3 df)
Test FP2 vs. FP1: test of more complex function
against simpler one ($\chi^2$ on 2 df)
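
The three tests can be chained into a selection rule. The slide does not spell out the stopping rule, so the sketch below assumes the usual sequential reading: stop at the first non-significant test and keep the simpler model of that comparison. The chi-square tail probabilities for 2, 3 and 4 d.f. are written in closed form to avoid external libraries, and the function names are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df in {2, 3, 4}."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df == 2:
        return math.exp(-h)
    if df == 3:
        return math.erfc(math.sqrt(h)) + math.sqrt(2.0 * x / math.pi) * math.exp(-h)
    if df == 4:
        return math.exp(-h) * (1.0 + h)
    raise ValueError("only df = 2, 3 or 4 are needed here")

def select_fp_function(dev_null, dev_linear, dev_fp1, dev_fp2, alpha=0.05):
    """Return 'null', 'linear', 'FP1' or 'FP2' from the four deviances."""
    if chi2_sf(dev_null - dev_fp2, 4) >= alpha:
        return "null"      # no evidence of any effect of X
    if chi2_sf(dev_linear - dev_fp2, 3) >= alpha:
        return "linear"    # no evidence of non-linearity
    if chi2_sf(dev_fp1 - dev_fp2, 2) >= alpha:
        return "FP1"       # simpler function is adequate
    return "FP2"
```

Starting from the most complex candidate means that a linear function is chosen unless the data give evidence against it, which matches the stated preference for simple models.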

22
Example: Systolic BP and age
Reminder: FP1 had power 3: $\beta_1 X^3$. FP2 had
powers (1, 1): $\beta_1 X + \beta_2 X \log X$
23
Aside: FP versus spline

Why care about FPs when splines are more
flexible?
More flexible ⇒ more unstable
More chance of over-fitting
In epidemiology, dose-response relationships are
often simple
Illustrate by small simulation example

24
FP versus spline (continued)

Logarithmic relationships are common in practice
Simulate regression model $y = \beta_0 + \beta_1 \log(X) + \text{error}$
Error is normally distributed $N(0, \sigma^2)$
Take $\beta_0 = 0$, $\beta_1 = 1$; X has lognormal
distribution
Vary $\sigma$ = 1, 0.5, 0.25, 0.125
Fit FP1, FP2 and spline with 2, 4, 6 d.f.
Compute mean square error
Compare with mean square error for true model
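
The data-generating step and the benchmark (fitting the true model) can be sketched as follows. Several details are not given on the slide and are assumptions here: the sample size (200), the lognormal parameters (0, 1), and the definition of mean square error as the average squared distance between the fitted and the true curve at the observed X values. The FP and spline fits themselves are not reproduced.

```python
import math
import random

def simulate(n, sigma, rng):
    """Draw X from a lognormal distribution and y = log(X) + N(0, sigma^2)."""
    x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    y = [math.log(xi) + rng.gauss(0.0, sigma) for xi in x]
    return x, y

def fit_line(z, y):
    """Least-squares intercept and slope of y on z."""
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    b1 = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
          / sum((zi - zbar) ** 2 for zi in z))
    return ybar - b1 * zbar, b1

def mse_true_model(n, sigma, seed=1):
    """MSE of the fitted true model, b0 + b1*log(X), against the curve log(X)."""
    rng = random.Random(seed)
    x, y = simulate(n, sigma, rng)
    z = [math.log(xi) for xi in x]
    b0, b1 = fit_line(z, y)
    return sum((b0 + b1 * zi - zi) ** 2 for zi in z) / n

for sigma in (1, 0.5, 0.25, 0.125):
    print(sigma, mse_true_model(200, sigma))
```

A competing method would be scored the same way, replacing the fitted line by its own fitted curve.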

25
FP vs. spline (continued)
26
FP vs. spline (continued)
27
FP vs. spline (continued)
28
FP vs. spline (continued)
29
FP vs. spline (continued)

In this example, spline usually less accurate
than FP
FP2 less accurate than FP1 (over-fitting)
FP1 and FP2 more accurate than splines
Splines often had non-monotonic fitted curves
Could be medically implausible
Of course, this is a special example

30
Multivariable FP (MFP) models

Assume have k > 1 continuous covariates and
perhaps some categoric or binary covariates
Allow dropping of non-significant variables
Wish to find best multivariable FP model for all
Xs
Impractical to try all combinations of powers
Require iterative fitting procedure

31
Fitting multivariable FP models(MFP algorithm)

Combine backward elimination of weak variables
with search for best FP functions
Determine fitting order from linear model
Apply FP model selection procedure to each X in
turn
fixing functions (but not $\beta$s) for other Xs
Cycle until FP functions (i.e. powers) and
variables selected do not change
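
The cycling structure of the algorithm can be sketched independently of the regression model. In the skeleton below, `select_function` is a hypothetical callback that would run the FP selection procedure for one covariate while the functions of the others are held fixed (their coefficients being re-estimated). It returns the chosen function, or `None` if the variable is dropped. Starting every variable as linear and the cap on the number of cycles are assumptions of this sketch.

```python
def mfp_cycle(variables, select_function, max_cycles=10):
    """Cycle over `variables` (already in fitting order) until nothing changes.

    Returns the final {variable: function} mapping and the number of cycles run.
    """
    current = {v: "linear" for v in variables}   # start from the linear model
    for cycle in range(1, max_cycles + 1):
        previous = dict(current)
        for v in variables:
            # Choose the function for v given the current functions of the others.
            current[v] = select_function(v, current)
        if current == previous:
            return current, cycle                # powers and variables are stable
    raise RuntimeError("selected functions did not stabilise")
```

Because convergence is detected by a full cycle without changes, at least two cycles are needed whenever the first cycle alters any function.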

32
Example Prognostic factors in breast cancer

Aim to develop a prognostic index for risk of
tumour recurrence or death
Have 7 prognostic factors
4 continuous, 3 categorical
Select variables and functions using 5%
significance level

33
Univariate linear analysis
34
Univariate FP2 analysis
Gain: compares FP2 with linear on 3 d.f. All
factors except for X3 have a non-linear effect
35
Multivariable FP analysis
36
Comments on analysis

Conventional backwards elimination at 5% level
selects X4a, X5 and X6; X1 is excluded
FP analysis picks up same variables as backward
elimination, and additionally X1
Note considerable non-linearity of X1 and X5
X1 has no linear influence on risk of recurrence
FP model detects more structure in the data than
the linear model

37
Plots of fitted FP functions
38
Survival by risk groups
39
Robustness of FP functions

Breast cancer example showed non-robust functions
for nodes: not medically sensible
Situation can be improved by performing covariate
transformation before FP analysis
Can be done systematically (work in progress)
Sauerbrei & Royston (1999) used negative
exponential transformation of nodes:
$\exp(-0.12 \times \text{number of nodes})$
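
A short sketch of the negative exponential pre-transformation, with the rate 0.12 taken from the slide and the function name chosen for illustration. It shows why the transformed covariate is better behaved: the result is bounded between 0 and 1, so very large node counts can no longer dominate the fitted function.

```python
import math

def transform_nodes(nodes, rate=0.12):
    """Negative exponential transformation exp(-rate * number of nodes)."""
    return math.exp(-rate * nodes)

for n in (1, 3, 10, 30):
    print(n, round(transform_nodes(n), 3))
```

The transformed variable decreases monotonically as the number of nodes increases, so the sign of the estimated coefficient has to be read accordingly.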

40
Making the function for lymph nodes more robust
41
2nd example: Whitehall 1, MFP analysis
No variables were eliminated by the MFP
algorithm. Weight is eliminated by linear backward
elimination
42
Plots of FP functions
43
Stability

Models (variables, FP functions) selected by
statistical criteria: cut-off on P-value
Approach has several advantages
but is also known to have problems
Omission bias
Selection bias
Unstable: many models may fit equally well

44
Stability

Instability may be studied by bootstrap
resampling (sampling with replacement)
Take bootstrap sample B times
Select model by chosen procedure
Count how many times each variable is selected
Summarise inclusion frequencies and their
dependencies
Study fitted functions for each covariate
May lead to choosing several possible models, or
a model different from the original one
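
The resampling loop above can be sketched generically. Here `select_model` is a hypothetical callback standing in for the chosen selection procedure (for instance an MFP run). It takes a bootstrap sample and returns the names of the selected variables. The function name, the default B and the seed are assumptions of the sketch.

```python
import random
from collections import Counter

def bootstrap_inclusion(data, select_model, B=1000, seed=0):
    """Return per-variable inclusion frequencies and counts of distinct models."""
    rng = random.Random(seed)
    n = len(data)
    variable_counts = Counter()
    model_counts = Counter()
    for _ in range(B):
        # Sampling with replacement: n draws from the n observations.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        selected = frozenset(select_model(sample))
        model_counts[selected] += 1
        variable_counts.update(selected)
    frequencies = {v: c / B for v, c in variable_counts.items()}
    return frequencies, model_counts
```

The number of keys in `model_counts` is the number of different models selected across replications, and cross-tabulating pairs of variables within it would show the dependencies among inclusions.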

45
Bootstrap stability analysis of the breast cancer
dataset

5000 bootstrap samples taken (!)
MFP algorithm with Cox model applied to each
sample
Resulted in 1222 different models (!!)
Nevertheless, could identify stable subset
consisting of 60% of replications
Judged by similarity of functions selected

46
Bootstrap stability analysis of the breast cancer
dataset
47
Bootstrap analysis: summaries of fitted curves
from stable subset
48
Presentation of models for continuous covariates

The function + 95% CI gives the whole story
Functions for important covariates should always
be plotted
In epidemiology, sometimes useful to give a more
conventional table of results in categories
This can be done from the fitted function
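
One way to derive such a table from a fitted function is to evaluate the function at a representative value for each category and express it relative to a reference value. The sketch below assumes the function is on the log-odds (or log-hazard) scale. The fitted function and its coefficient are invented purely for illustration, and confidence intervals, which need the covariance matrix of the estimates, are not computed.

```python
import math

def category_table(f, reference, points):
    """Return (x, exp(f(x) - f(reference))) for each representative point x."""
    base = f(reference)
    return [(x, math.exp(f(x) - base)) for x in points]

# Hypothetical fitted FP1 function (power 0.5) on the log-odds scale;
# the coefficient 0.3 is made up and does not come from any dataset.
def fitted(x):
    return 0.3 * math.sqrt(x)

for x, ratio in category_table(fitted, reference=1.0, points=[1.0, 5.0, 15.0, 30.0]):
    print(f"x = {x:5.1f}  odds ratio vs reference = {ratio:.2f}")
```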

49
Example: Cigarette smoking and all-cause
mortality (Whitehall 1)
50
Other issues (1)

Handling continuous confounders
May use a larger P-value for selection, e.g. 0.2
Not so concerned about functional form here
Binary/continuous covariate interactions
Can be modelled using FPs (Royston & Sauerbrei
2004)
Adjust for other factors using MFP

51
Other issues (2)

Time-varying effects in survival analysis
Can be modelled using FP functions of time
(Berger; also Sauerbrei & Royston, in progress)
Checking adequacy of FP functions
May be done by using splines
Fit FP function and see if spline function adds
anything, adjusting for the fitted FP function

52
Software sources

Most comprehensive implementation is in Stata
Command mfp is part of Stata 8
Versions for SAS and R are now available
Contact W Sauerbrei (wfs_at_imbi.uni-freiburg.de) to
request a copy of the SAS macro
R version available on CRAN archive
mfp package

53
Concluding remarks (1)

FP method in general
No reason (other than convention) why regression
models should include only positive integer
powers of covariates
FP is a simple extension of an existing method
Simple to program and simple to explain
Parametric, so can easily get predicted values
FP usually gives better fit than standard
polynomials
Cannot do worse, since standard polynomials are
included

54
Concluding remarks (2)

Multivariable FP modelling
Many applications in general context of multiple
regression modelling
Well-defined procedure based on standard
principles for selecting variables and functions
Aspects of robustness and stability have been
investigated (and methods are available)
Much experience gained so far suggests that
method is very useful in clinical epidemiology

55
Some references

Royston P, Altman DG (1994) Regression using
fractional polynomials of continuous covariates:
parsimonious parametric modelling. Applied
Statistics 43: 429-467.
Royston P, Altman DG (1997) Approximating
statistical functions by using fractional
polynomial regression. The Statistician 46: 1-12.
Sauerbrei W, Royston P (1999) Building
multivariable prognostic and diagnostic models:
transformation of the predictors by using
fractional polynomials. JRSS(A) 162: 71-94.
Corrigendum: JRSS(A) 165: 399-400, 2002.
Royston P, Ambler G, Sauerbrei W (1999) The use
of fractional polynomials to model continuous
risk variables in epidemiology. International
Journal of Epidemiology 28: 964-974.
Royston P, Sauerbrei W (2004) A new approach to
modelling interactions between treatment and
continuous covariates in clinical trials by using
fractional polynomials. Statistics in Medicine
23: 2509-2525.
Royston P, Sauerbrei W (2003) Stability of
multivariable fractional polynomial models with
selection of variables and transformations: a
bootstrap investigation. Statistics in Medicine
22: 639-659.
Armitage P, Berry G, Matthews JNS (2002)
Statistical Methods in Medical Research. Oxford:
Blackwell.
