Title: When You Get Hit By a Car
- Constance M. Elson
- MGH Biostatistics Center
Slide 2
- Data were collected for the NIGMS-funded multi-site study "Inflammation and the Host Response to Injury" (U54GM62119).
- 160 trauma patients at 7 hospitals. The study design ensures that most of the injuries result from motor vehicle accidents or falls.
- This talk demonstrates the use of several SAS statistical procedures to explore the clinical data for these patients.
Slide 3: Clinical Variables For This Talk
- (Categorical/Ordinal Variables underlined)
- Demographic: age, agecat (under 30, under 45, old), male, smoker
- Injury and Initial Response: injury type (e.g., fall, MVC-occupant, motorcycle, MVC-pedestrian), delay, admit on vent, APACHE score, blood pressure in ER, pre-hospital heart rate, worst base deficit 0-12 hours, transfused blood 0-12 hours
- Outcome: hospital length of stay, complications, multiple organ failure (MOF), death
Slide 4: PROC CORR gives a quick overview of relationships among data fields

    proc corr data=base outp=corr1;   /* OUTP= saves the Pearson correlations in data set corr1 */
      var age male smoker agecat delay
          injtype adonvent apache bp heartrate
          wbd12 blood12 hosplos complics MMOF
          diedb;
    run;
Slide 5: PROC CORR Output

    Pearson Correlation Coefficients, N = 158
            Prob > |r| under H0: Rho=0

                  age        male      smoker    etc. ...
    age          1.00     0.00023    -0.05896    <- correlation coefficient
                           0.9977      0.4618    <- significance
    male      0.00023        1.00    -0.03424
               0.9977                  0.6693
    etc. ...
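The p-value printed under each coefficient tests $H_0: \rho = 0$ with a t statistic on $n - 2$ degrees of freedom. As a worked check using the standard textbook formulas (our arithmetic, not SAS output), the age-smoker entry reproduces the printed significance:

```latex
% Pearson correlation between variables x and y
\[
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2}\;\sqrt{\sum_i (y_i - \bar{y})^2}}
\]
% Test of H0: rho = 0
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad t \sim t_{n-2} \text{ under } H_0
\]
% age vs. smoker: r = -0.05896, n = 158
\[
t = \frac{-0.05896\,\sqrt{156}}{\sqrt{1 - 0.05896^2}} \approx -0.74,
\qquad \Pr\bigl(|t_{156}| > 0.74\bigr) \approx 0.46
\]
```

This agrees with the printed 0.4618: a correlation this small is indistinguishable from zero even with 158 patients.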
Slide 6: Visual explanation of correlation
Slide 7: Correlations for Demographic Variables
Slide 8: Correlations for Injury Variables
Slide 9: Correlations for Outcome Variables
Slide 10: Data Type Suggests Statistical Method
Slide 11: Regression Methods
- Hypothesis: There is a significant linear relationship between hospital length of stay and APACHE score.

    proc glm data=base;
      model hosplos = apache;                /* simple linear regression */
      ods output ModelANOVA=tmp2;            /* save the Type I/III SS table */
      ods output ParameterEstimates=tmp3;    /* save the coefficient table */
    run;
    quit;
Slide 12: PROC GLM Regression Output

    Dependent Variable: hosplos

                                    Sum of
    Source             DF          Squares    Mean Square    F Value    Pr > F
    Model               1       2294.91662     2294.91662       7.11    0.0085
    Error             156      50383.46945      322.97096
    Corrected Total   157      52678.38608

    Source             DF      Type III SS    Mean Square    F Value    Pr > F
    apache              1      2294.916622    2294.916622       7.11    0.0085

                                   Standard
    Parameter         Estimate        Error    t Value    Pr > |t|
    Intercept      7.493583706   6.65862948       1.13      0.2622
    apache         0.636092377   0.23862639       2.67      0.0085

- Conclusion: Length of hospital stay is linearly related to the APACHE score. Each 1-unit increase in the score is associated with about 0.6 extra days in hospital, on average.
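A few quantities can be read directly off this output (our arithmetic on the printed values): the share of variance explained, the identity $F = t^2$ that holds when there is a single predictor, and one fitted value. The APACHE score of 20 below is an illustrative value, not a patient from the data.

```latex
% Proportion of the variation in hosplos explained by apache
\[
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Corrected Total}}}
    = \frac{2294.92}{52678.39} \approx 0.044
\]
% With one predictor, the model F test equals the squared t test for the slope
\[
F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{2294.92}{322.97} \approx 7.11,
\qquad t^2 = \left(\frac{0.6361}{0.2386}\right)^2 \approx 2.666^2 \approx 7.11
\]
% Fitted length of stay for an illustrative APACHE score of 20
\[
\widehat{\text{hosplos}} = 7.494 + 0.636 \times 20 \approx 20.2 \text{ days}
\]
```

So the slope is clearly nonzero, but APACHE alone explains only about 4% of the variation in length of stay.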
Slide 13: Linear Regression for Hospital Length of Stay
Slide 14: Type I vs Type III Sums of Squares

    proc glm data=base;
      model hosplos = blood12 apache;   /* note: more than 1 covariate */
      /* etc. ... */
    run;
    quit;

- OUTPUT

                               Sum of
    Source        DF          Squares    Mean Square    F Value    Pr > F
    Model          2          2879.01     1439.50680       4.48    0.0128

    Source        DF        Type I SS    Mean Square    F Value    Pr > F
    blood12        1          1443.33       1443.336       4.49    0.0356
    apache         1         1435.677       1435.677       4.47    0.0361

    Source        DF      Type III SS    Mean Square    F Value    Pr > F
    blood12        1       584.096987     584.096987       1.82    0.1795
    apache         1      1435.677181    1435.677181       4.47    0.0361

                                Standard
    Parameter     Estimate         Error    t Value    Pr > |t|
    Intercept      7.94911        6.6498       1.20      0.2338
    blood12        0.00099        0.0007       1.35      0.1795
    apache          0.5299        0.2506       2.11      0.0361
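The two tables differ in what each term is adjusted for. Type I (sequential) sums of squares add terms in the order they appear in the MODEL statement; Type III (partial) sums of squares adjust each term for all the others. Assuming this model and the apache-only model on slide 12 were fit to the same patients, the printed numbers fit together like this:

```latex
% Type I: blood12 first, then apache given blood12; together they give the model SS
\[
SS(\text{blood12}) + SS(\text{apache} \mid \text{blood12})
  = 1443.33 + 1435.68 = 2879.01 = SS_{\text{Model}}
\]
% Type III for the last term in the MODEL statement equals its Type I value
\[
SS_{\text{III}}(\text{apache}) = SS(\text{apache} \mid \text{blood12}) = 1435.68
\]
% Type III for blood12: the extra SS it adds to the apache-only model (slide 12)
\[
SS_{\text{III}}(\text{blood12}) = SS(\text{blood12} \mid \text{apache})
  = 2879.01 - 2294.92 \approx 584.09
\]
```

The printed value is 584.10; the gap is rounding. This is why blood12 looks significant in the Type I table (p = 0.0356) but not in the Type III table (p = 0.1795): once APACHE is in the model, blood transfused adds little.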
Slide 15: Selection Methods for Linear Regression

    proc reg data=base;
      /* forward selection: enter candidates one at a time while p < 0.10 */
      model hosplos = age male smoker delay adonvent wbd12 blood12
            / selection=forward slentry=0.1;
    run;
    quit;

- OUTPUT

    Forward Selection: Step 2    Variable blood12 Entered

                 Parameter     Standard
    Variable      Estimate        Error    Type II SS    F Value    Pr > F
    Intercept        23.91         2.58         27651      85.51    <.0001
    smoker           -5.99         2.93       1349.87       4.17    0.0427
    blood12        0.00137     0.000702       1236.99       3.83    0.0523

    No other variable met the 0.1000 significance level for entry into the model.

    Summary of Forward Selection
           Variable     Partial      Model
    Step   Entered     R-Square   R-Square    C(p)    F Value    Pr > F
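Forward selection with SELECTION=FORWARD SLENTRY=0.1 follows the entry rule sketched below, and the $C(p)$ column of the summary table is Mallows' statistic. This is a sketch of the standard definitions, with $p$ counting the parameters including the intercept:

```latex
% At each step, among candidates not yet in the model, take the one with the
% smallest entry p-value (largest partial F); enter it only if p < SLENTRY
\[
j^{*} = \arg\min_{k \notin \text{model}} p_k, \qquad
\text{enter } x_{j^{*}} \text{ if } p_{j^{*}} < 0.10, \text{ otherwise stop}
\]
% Step 2 here: blood12 enters with F = 3.83, p = 0.0523 < 0.10
% (with SLENTRY = 0.05 it would not have entered)

% Mallows' C(p) for a submodel with p parameters and n observations,
% using the MSE of the model containing all candidate variables
\[
C(p) = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p)
\]
```

Submodels with $C(p)$ close to $p$ are usually taken to fit about as well as the full model.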
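Slide 16: ANOVA Methods

- Question: Does pre-hospital heart rate vary significantly by age category?

    proc glm data=base;
      class agecat;              /* the CLASS statement makes this an ANOVA model */
      model heartrate = agecat;
    run;
    quit;

    proc anova data=base;
      class agecat;
      model heartrate = agecat;
    run;
    quit;

- Both steps fit the same one-way ANOVA. PROC ANOVA is intended for balanced designs, but one-way models are an exception, so either procedure works here.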
Slide 17: ANOVA Output

    Class Level Information
    Class     Levels    Values
    agecat         3    1 2 3

    Dependent Variable: heartrate

    Source      DF          SS    Mean Square    F Value    Pr > F
    Model        2     5106.91        2553.45       5.10    0.0073
    Error      143    71649.33         501.04
    Total      145    76756.24

    Source      DF    Type III SS    Mean Square    F Value    Pr > F
    agecat       2        5106.91        2553.45       5.10    0.0073

- Conclusion: At least one age category has a mean pre-hospital heart rate significantly different from the others.
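The F statistic is the ratio of the two mean squares, and the share of total variation explained by age category follows from the same table (our arithmetic on the printed values):

```latex
\[
F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
  = \frac{5106.91 / 2}{71649.33 / 143}
  = \frac{2553.45}{501.04} \approx 5.10,
  \qquad F \sim F_{2,\,143} \text{ under } H_0
\]
\[
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = \frac{5106.91}{76756.24} \approx 0.067
\]
```

The overall F test does not say which categories differ. Adding a MEANS statement with a multiple-comparison option, for example `means agecat / tukey;`, to the PROC GLM step would address that.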
Slide 18: ANOVA PLOT
Slide 19: ANOVA vs LINEAR REGRESSION
- Omitting the CLASS statement in PROC GLM treats agecat as a numeric score (1, 2, 3) and gives a linear regression model with the same kind of output as the ANOVA model, plus estimates of the model coefficients. The model now has 1 degree of freedom (a single slope) instead of 2, so its F test is not the same as the ANOVA's.

                              Standard
    Parameter    Estimate       Error    t Value    Pr > |t|
    Intercept      133.53        4.64      28.77      <.0001
    agecat          -7.66       2.391      -3.20      0.0017

- However, if a regression model is your intention, using the continuous variable age instead of the ordinal variable agecat may give slightly better results:

                              Standard
    Parameter    Estimate       Error    t Value    Pr > |t|
    Intercept     138.018       5.871      23.51      <.0001
    age            -0.534       0.164      -3.25      0.0014
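Plugging the category codes and two illustrative ages (20 and 60, chosen only for illustration) into the two fitted lines shows what each model assumes. With agecat as a numeric score, the regression forces the same drop (7.66) between consecutive categories, whereas the ANOVA on slide 17 estimates each category mean separately:

```latex
% agecat treated as a numeric score 1, 2, 3
\[
\widehat{\text{heartrate}} = 133.53 - 7.66\,\text{agecat}
  \;\Rightarrow\; 125.87,\ 118.21,\ 110.55 \quad (\text{agecat} = 1, 2, 3)
\]
% continuous age: about 0.53 lower per additional year
\[
\widehat{\text{heartrate}} = 138.018 - 0.534\,\text{age}
  \;\Rightarrow\; 127.3 \ (\text{age } 20), \qquad 106.0 \ (\text{age } 60)
\]
```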
Slide 20: Linear Regression Ordinal vs Continuous
Slide 21: Contingency Table Methods
- Question: Are the types of injury the same for different age groups?

    proc freq data=base;
      tables agecat*injtype / nopercent norow;   /* two-way table; no overall or row percents */
      where injtype in (1,3,4,6);                /* keep 4 of the 7 injury types */
      exact fisher;                              /* request Fisher's exact test */
      output out=temp exact;                     /* save the exact test results in data set temp */
    run;

- Fisher exact statistics are useful for sparse contingency tables. They are computed by conditioning on the observed row and column totals and enumerating every table with those margins (an urn model: drawing balls of the appropriate types). This takes a LOOOOOONG time for 3 age categories and 7 injury types, so we shortened the table to 4 injury types. Alternatively, we could just replace exact fisher with exact chisq.
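For a table with row totals $r_i$, column totals $c_j$ and $n$ patients, the probability of a particular table given its margins is multivariate hypergeometric, and the exact p-value sums the probabilities of all tables with the same margins that are no more likely than the one observed. These correspond to the "Table Probability (P)" and "Pr <= P" values PROC FREQ prints:

```latex
% Probability of a table with cell counts n_{ij}, given its row and column totals
\[
P(\{n_{ij}\}) = \frac{\prod_i r_i!\;\prod_j c_j!}{n!\;\prod_{i,j} n_{ij}!}
\]
% Exact p-value: sum over every table T with the observed margins
\[
p = \sum_{T:\; P(T) \,\le\, P(\text{observed})} P(T)
\]
```

The number of tables with fixed margins grows explosively with the number of rows and columns, which is why the full 3 × 7 table was so slow.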
Slide 22: PROC FREQ Output
- Injtype: 1 = fall, 3 = MVC occupant, 4 = motorcycle, 6 = MVC pedestrian
Slide 23: PROC FREQ Output Statistics

    Statistics for Table of AGECAT by INJTYPE

    Statistic                        DF       Value      Prob
    Chi-Square                        6     10.5110    0.1047
    Likelihood Ratio Chi-Square       6     12.2874    0.0559
    Mantel-Haenszel Chi-Square        1      5.4840    0.0192
    Phi Coefficient                          0.2683

    WARNING: 33% of the cells have expected counts less
             than 5. Chi-Square may not be a valid test.

    Fisher's Exact Test
    Table Probability (P)    1.366E-06
    Pr <= P                     0.0950

- Conclusion: The evidence that the distribution of injury type differs across age groups is weak: neither the chi-square test (p = 0.10) nor Fisher's exact test (p = 0.095) is significant at the 0.05 level. The smaller Mantel-Haenszel p-value (0.019) should not be over-read, because that test treats the injury type codes as ordered scores.
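The warning and the phi coefficient both come from the expected counts under independence. With 3 age categories and 4 injury types the table has 12 cells, so the 33% in the warning means 4 cells have expected counts below 5:

```latex
% Expected count in row i (agecat), column j (injtype) under independence
\[
E_{ij} = \frac{r_i\, c_j}{n}
\]
% Pearson chi-square, its degrees of freedom, and the phi coefficient
\[
Q_P = \sum_{i,j} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (3 - 1)(4 - 1) = 6, \qquad
\phi = \sqrt{Q_P / n}
\]
```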
Slide 24: Logistic Regression Methods

- Question: Can we predict the probability of death based on 0-12 hour worst base deficit?
- Logistic regression models are more appropriate than linear models when the dependent (outcome) variable is binary or takes only a few values.

    proc logistic data=epi descending;   /* DESCENDING: model the probability that diedb = 1 */
      model diedb = wbd12;
    run;

- OUTPUT

    Response Profile
    Ordered                  Total
      Value     diedb    Frequency
          1         1          170
          2         0          693

    Probability modeled is diedb=1.

    Model Convergence Status
    Convergence criterion (GCONV=1E-8) satisfied.

    Model Fit Statistics . . .
Slide 25: PROC LOGISTIC Output, contd

    Testing Global Null Hypothesis: BETA=0
    Test                 Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio        141.591     1        <.0001
    Score                    148.60     1        <.0001
    Wald                    110.334     1        <.0001

    Analysis of Maximum Likelihood Estimates
                                    Standard          Wald
    Parameter    DF    Estimate        Error    Chi-Square    Pr > ChiSq
    Intercept     1     -3.6282       0.2463        217.08        <.0001
    wbd12         1      0.2281       0.0217        110.33        <.0001

    Odds Ratio Estimates
                  Point          95% Wald
    Effect     Estimate    Confidence Limits
    wbd12         1.256      1.204     1.311

- Conclusion: There is a significant relationship between the probability of death and the 0-12 hour worst base deficit: for every unit increase in the worst base deficit, the odds of death increase by 25.6%.
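The odds ratio row is the exponentiated wbd12 coefficient, and its Wald limits are the exponentiated coefficient ± 1.96 standard errors. Recomputing from the rounded printed values (so the last digit can differ slightly from SAS):

```latex
\[
\widehat{OR} = e^{\hat b} = e^{0.2281} \approx 1.256
\quad\Rightarrow\quad 25.6\% \text{ higher odds of death per unit of wbd12}
\]
\[
95\%\ \text{CI}: \; e^{\hat b \pm 1.96\,SE} = e^{0.2281 \pm 1.96 \times 0.0217}
  = \bigl(e^{0.1856},\, e^{0.2706}\bigr) \approx (1.204,\ 1.311)
\]
\[
\text{Wald } \chi^2 = \left(\frac{\hat b}{SE}\right)^2
  = \left(\frac{0.2281}{0.0217}\right)^2 \approx 110.5 \quad (\text{printed: } 110.33)
\]
```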
Slide 26: Mortality by WBD12, with Lin Reg Line
Slide 27: Probability of Death by WBD Group
Slide 28: Linear Regression for Probability of Death by WBD Group
Slide 29: Logistic Regression for Probability of Death by WBD Group
Slide 30: How Logistic Regression Works
- Assume covariate (worst base deficit) $x$.
- Let $p(x)$ = probability of the event (death) when the covariate equals $x$.
- Odds of the event are $p(x) / (1 - p(x))$.
- Log-odds of the event: $L(x) = \ln\bigl(p(x) / (1 - p(x))\bigr)$, also called $\operatorname{logit}(p(x))$.
- The logistic regression model says that the log-odds of the event are linear in $x$: $L(x) = a + bx$.
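One consequence of the linear log-odds model is that a one-unit increase in $x$ multiplies the odds by the same factor $e^{b}$ wherever $x$ starts, which is why PROC LOGISTIC can report a single odds ratio for wbd12:

```latex
\[
L(x+1) - L(x) = \bigl(a + b(x+1)\bigr) - (a + bx) = b
\]
% Since L(x) = ln(odds(x)), the odds are odds(x) = e^{L(x)}
\[
\frac{\text{odds}(x+1)}{\text{odds}(x)} = \frac{e^{L(x+1)}}{e^{L(x)}} = e^{L(x+1) - L(x)} = e^{b}
\]
```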
Slide 31: How Logistic Regression Works, contd
- PROC LOGISTIC estimates the coefficients $a$ and $b$ of the linear model for $L(x)$ by maximum likelihood.
- Example: event = death, $x$ = wbd12.
- PROC LOGISTIC gave the estimate $L(x) = -3.6 + 0.228x$.
- High school algebra (challenge!) shows that this is equivalent to $p(x) = \dfrac{e^{-3.6 + 0.228x}}{1 + e^{-3.6 + 0.228x}}$.
- Graph of $p(x)$
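The "challenge" algebra, followed by a few points on the fitted curve using the unrounded estimates $a = -3.6282$ and $b = 0.2281$ from slide 25. The wbd12 values 0, 10 and 20 are illustrative only:

```latex
% Solve L = ln(p / (1 - p)) for p
\[
\frac{p}{1-p} = e^{L}
\;\Rightarrow\; p = e^{L}(1 - p)
\;\Rightarrow\; p\,(1 + e^{L}) = e^{L}
\;\Rightarrow\; p = \frac{e^{L}}{1 + e^{L}} = \frac{1}{1 + e^{-L}}
\]
% Points on the fitted curve, with L(x) = -3.6282 + 0.2281 x
\[
p(0) = \frac{1}{1 + e^{3.6282}} \approx 0.026, \qquad
p(10) = \frac{1}{1 + e^{1.3472}} \approx 0.21, \qquad
p(20) = \frac{1}{1 + e^{-0.9338}} \approx 0.72
\]
% The predicted probability of death crosses 50% where L(x) = 0
\[
p(x) = 0.5 \iff -3.6282 + 0.2281\,x = 0 \iff x = \frac{3.6282}{0.2281} \approx 15.9
\]
```

The S-shaped curve stays between 0 and 1 for every $x$, unlike the straight line fitted on slide 28.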