Title: EP520 Introductory Biostatistics
1 EP520 Introductory Biostatistics
- Lecture 25
- Regression and Correlation Methods (Chp. 11)
- Multiple linear regression
- Estimation of parameters for multiple linear regression
- Inferences for multiple linear regression
- Multiple R-squared (multiple correlation coefficient, for assessing goodness of fit)
2 Multiple Regression
5 Partial Regression Coefficients
8 Systolic BP example for multiple linear regression
. list

     +--------------------------+
     | brthwgt   agedys   sysbp |
     |--------------------------|
  1. |     135        3      89 |
  2. |     120        4      90 |
  3. |     100        3      83 |
  4. |     105        2      77 |
  5. |     130        4      92 |
     |--------------------------|
  6. |     125        5      98 |
  7. |     125        2      82 |
  8. |     105        3      85 |
  9. |     120        5      96 |
 10. |      90        4      95 |
     |--------------------------|
 11. |     120        2      80 |
 12. |      95        3      79 |
 13. |     120        3      86 |
 14. |     150        4      97 |
 15. |     160        3      92 |
     |--------------------------|
 16. |     125        3      88 |
     +--------------------------+
9 . regress sysbp brthwgt agedys

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  2,    13) =   48.08
       Model |   591.03564     2   295.51782           Prob > F      =  0.0000
    Residual |  79.9018602    13  6.14629694           R-squared     =  0.8809
-------------+------------------------------           Adj R-squared =  0.8626
       Total |    670.9375    15  44.7291667           Root MSE      =  2.4792

------------------------------------------------------------------------------
       sysbp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     brthwgt |   .1255833   .0343362     3.66   0.003     .0514044    .1997621
      agedys |   5.887719   .6802051     8.66   0.000     4.418225    7.357213
       _cons |   53.45019   4.531889    11.79   0.000     43.65964    63.24074
------------------------------------------------------------------------------
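As a cross-check on the output above, the following is a minimal Python sketch (standard library only) that refits the two-predictor model from the 16 listed observations by solving the normal equations on centered sums. The helper names `mean` and `cross` are illustrative and are not part of the Stata session. Under these assumptions it should land close to the reported coefficients, R-squared, adjusted R-squared and F statistic.

```python
# Data copied from the 16-infant listing: birthweight, age in days, systolic BP.
brthwgt = [135, 120, 100, 105, 130, 125, 125, 105, 120, 90, 120, 95, 120, 150, 160, 125]
agedys = [3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3]
sysbp = [89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Corrected sum of cross products: sum((u - ubar) * (v - vbar))."""
    ubar, vbar = mean(u), mean(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


n, k = len(sysbp), 2  # observations, predictors

s11 = cross(brthwgt, brthwgt)
s22 = cross(agedys, agedys)
s12 = cross(brthwgt, agedys)
s1y = cross(brthwgt, sysbp)
s2y = cross(agedys, sysbp)
syy = cross(sysbp, sysbp)  # total sum of squares

# Solve the 2 x 2 normal equations for the partial regression coefficients.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(sysbp) - b1 * mean(brthwgt) - b2 * mean(agedys)

# ANOVA decomposition and goodness-of-fit summaries.
model_ss = b1 * s1y + b2 * s2y
resid_ss = syy - model_ss
r2 = model_ss / syy
adj_r2 = 1 - (resid_ss / (n - k - 1)) / (syy / (n - 1))
f_stat = (model_ss / k) / (resid_ss / (n - k - 1))

print(f"_cons={b0:.5f}  brthwgt={b1:.7f}  agedys={b2:.6f}")
print(f"R-squared={r2:.4f}  Adj R-squared={adj_r2:.4f}  F={f_stat:.2f}")
```

Each coefficient here is a partial regression coefficient: the birthweight slope is estimated with age held fixed, and vice versa.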
10 . graph twoway scatter sysbp agedys
11 . graph twoway scatter sysbp brthwgt
18 Systolic BP example including interaction term
. generate age_bwt = agedys * brthwgt
. regress sysbp brthwgt agedys age_bwt

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  3,    12) =   46.87
       Model |  618.183307     3  206.061102           Prob > F      =  0.0000
    Residual |  52.7541926    12  4.39618272           R-squared     =  0.9214
-------------+------------------------------           Adj R-squared =  0.9017
       Total |    670.9375    15  44.7291667           Root MSE      =  2.0967

------------------------------------------------------------------------------
       sysbp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     brthwgt |    .551233    .173731     3.17   0.008     .1727057    .9297603
      agedys |    21.2873   6.223628     3.42   0.005     7.727175    34.84742
     age_bwt |  -.1282085   .0515927    -2.49   0.029    -.2406194   -.0157976
       _cons |   2.551581   20.83776     0.12   0.905       -42.85    47.95316
------------------------------------------------------------------------------
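With the product term in the model, the effect of age is no longer a single number: it depends on birthweight. The Python sketch below uses only the coefficients reported in the interaction output above. The function names are illustrative assumptions, and the birthweights in the loop are arbitrary values within the observed range.

```python
# Coefficients copied from the interaction-model output above.
B_CONS = 2.551581
B_BRTHWGT = 0.551233
B_AGEDYS = 21.2873
B_AGE_BWT = -0.1282085


def predict_sysbp(brthwgt, agedys):
    """Fitted systolic BP from the model with the agedys x brthwgt product term."""
    return (B_CONS + B_BRTHWGT * brthwgt + B_AGEDYS * agedys
            + B_AGE_BWT * agedys * brthwgt)


def age_slope(brthwgt):
    """Change in fitted systolic BP per additional day of age at a fixed birthweight."""
    return B_AGEDYS + B_AGE_BWT * brthwgt


for w in (90, 120, 160):
    print(f"brthwgt={w}: slope for agedys = {age_slope(w):.3f}")
```

Because the interaction coefficient is negative, the fitted age slope shrinks as birthweight increases.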
20 Regression of FEV on Height (in.) for Boys in Tecumseh, Michigan (Unrestricted: Ages 3 - 19)
. regress fev hgt if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  1,   334) = 1174.58
       Model |  262.711233     1  262.711233           Prob > F      =  0.0000
    Residual |  74.7035646   334  .223663367           R-squared     =  0.7786
-------------+------------------------------           Adj R-squared =  0.7779
       Total |  337.414798   335  1.00720835           Root MSE      =  .47293

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hgt |   .1398832   .0040815    34.27   0.000     .1318544    .1479119
       _cons |  -5.863848   .2544698   -23.04   0.000    -6.364414   -5.363283
------------------------------------------------------------------------------

. graph twoway scatter fev hgt if sex==1 || lfit fev hgt
Least Squares Regression Equation: FEV = -5.8638 + 0.1399 x height (in)
21 . graph twoway scatter fev height if sex==1 || lfit fev height
Least Squares Regression Equation: FEV = -5.8638 + 0.1399 x height (in)
22 . twoway lfitci fev hgt
23 . summarize age if sex==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       336    10.01488    2.975986          3         19

. generate agelin = age - 10.01488 if sex==1
(318 missing values generated)
. generate agequad = agelin^2 if sex==1
(318 missing values generated)
. summarize if sex==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       336    36232.54    23700.52        201      90001
         age |       336    10.01488    2.975986          3         19
         fev |       336    2.812446    1.003598       .796      5.793
         hgt |       336     62.0253    6.330696         47         74
         sex |       336           1           0          1          1
-------------+--------------------------------------------------------
       smoke |       336     .077381    .2675934          0          1
      agelin |       336    9.47e-07    2.975986   -7.01488    8.98512
     agequad |       336    8.830136    12.62822   .0002214   80.73238

Notes: agelin centers age at 0 (age minus its mean); agequad is the quadratic term for age.
24 . generate ht_lin = hgt - 62.0253 if sex==1
(318 missing values generated)
. generate ht_quad = ht_lin^2 if sex==1
(318 missing values generated)
. summarize if sex==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       336    36232.54    23700.52        201      90001
         age |       336    10.01488    2.975986          3         19
         fev |       336    2.812446    1.003598       .796      5.793
         hgt |       336     62.0253    6.330696         47         74
         sex |       336           1           0          1          1
-------------+--------------------------------------------------------
       smoke |       336     .077381    .2675934          0          1
      agelin |       336    9.47e-07    2.975986   -7.01488    8.98512
     agequad |       336    8.830136    12.62822   .0002214   80.73238
      ht_lin |       336   -2.41e-06    6.330696   -15.0253    11.9747
     ht_quad |       336    39.95844    41.29339   .0006401   225.7596

Notes: ht_lin centers height at 0 (height minus its mean); ht_quad is the quadratic term for height.
25 . regress fev ht_lin if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  1,   334) = 1174.58
       Model |  262.711233     1  262.711233           Prob > F      =  0.0000
    Residual |  74.7035646   334  .223663367           R-squared     =  0.7786
-------------+------------------------------           Adj R-squared =  0.7779
       Total |  337.414798   335  1.00720835           Root MSE      =  .47293

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1398832   .0040815    34.27   0.000     .1318544    .1479119
       _cons |   2.812447   .0258005   109.01   0.000     2.761695    2.863199
------------------------------------------------------------------------------

. regress fev ht_lin ht_quad if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  2,   333) =  680.88
       Model |  271.116793     2  135.558397           Prob > F      =  0.0000
    Residual |  66.2980047   333  .199093107           R-squared     =  0.8035
-------------+------------------------------           Adj R-squared =  0.8023
       Total |  337.414798   335  1.00720835           Root MSE      =   .4462

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1450898   .0039333    36.89   0.000     .1373525     .152827
     ht_quad |   .0039182    .000603     6.50   0.000      .002732    .0051044
       _cons |   2.655882   .0342511    77.54   0.000     2.588506    2.723258
------------------------------------------------------------------------------
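Centering height leaves the slope of the straight-line fit unchanged and only moves the intercept, which becomes the fitted FEV at the mean height. The short Python sketch below checks this using the reported coefficients from the uncentered fit (fev on hgt) and the centered fit (fev on ht_lin). The function names are illustrative, and the height of 70 inches is an arbitrary value inside the observed range.

```python
MEAN_HGT = 62.0253  # mean height used to build ht_lin

# Reported coefficients: fev on hgt (uncentered) and fev on ht_lin (centered).
B0_RAW, B1_RAW = -5.863848, 0.1398832
B0_CENTERED, B1_CENTERED = 2.812447, 0.1398832

# Intercept implied for the centered model by the uncentered fit.
implied_intercept = B0_RAW + B1_RAW * MEAN_HGT


def predict_raw(hgt):
    return B0_RAW + B1_RAW * hgt


def predict_centered(hgt):
    return B0_CENTERED + B1_CENTERED * (hgt - MEAN_HGT)


print(f"implied centered intercept: {implied_intercept:.6f}")
print(f"fitted FEV at 70 in: raw={predict_raw(70):.4f}, centered={predict_centered(70):.4f}")
```

The two parameterizations give the same fitted values up to rounding of the printed coefficients. Centering mainly helps once the squared term is added, since ht_lin and ht_quad are far less correlated than hgt and its square would be.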
26 . regress fev ht_lin ht_quad agelin if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  3,   332) =  501.58
       Model |  276.425205     3  92.1417351           Prob > F      =  0.0000
    Residual |  60.9895927   332  .183703592           R-squared     =  0.8192
-------------+------------------------------           Adj R-squared =  0.8176
       Total |  337.414798   335  1.00720835           Root MSE      =  .42861

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1146489   .0068076    16.84   0.000     .1012575    .1280402
     ht_quad |   .0037141   .0005805     6.40   0.000     .0025722     .004856
      agelin |   .0769142   .0143081     5.38   0.000     .0487681    .1050602
       _cons |   2.664039   .0329357    80.89   0.000      2.59925    2.728828
------------------------------------------------------------------------------
27 . regress fev ht_lin ht_quad agelin agequad if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  4,   331) =  383.92
       Model |  277.584619     4  69.3961548           Prob > F      =  0.0000
    Residual |  59.8301789   331  .180755827           R-squared     =  0.8227
-------------+------------------------------           Adj R-squared =  0.8205
       Total |  337.414798   335  1.00720835           Root MSE      =  .42515

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1218146   .0073215    16.64   0.000     .1074121    .1362171
     ht_quad |   .0027855   .0006826     4.08   0.000     .0014427    .0041283
      agelin |   .0543468   .0167582     3.24   0.001     .0213808    .0873127
     agequad |   .0063499   .0025072     2.53   0.012     .0014178     .011282
       _cons |   2.645071   .0335178    78.92   0.000     2.579136    2.711006
------------------------------------------------------------------------------
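To read the four-term model, it can help to turn the coefficients into a prediction function. The Python sketch below does this with the reported estimates and the centering constants from the summarize output. `predict_fev` is an illustrative name, not something from the Stata session, and the model only describes boys in the observed age and height ranges.

```python
MEAN_HGT = 62.0253   # centering constant for ht_lin
MEAN_AGE = 10.01488  # centering constant for agelin

# Coefficients copied from the regression of fev on ht_lin, ht_quad, agelin, agequad.
COEF = {
    "_cons": 2.645071,
    "ht_lin": 0.1218146,
    "ht_quad": 0.0027855,
    "agelin": 0.0543468,
    "agequad": 0.0063499,
}


def predict_fev(hgt, age):
    """Fitted FEV for a boy of the given height (in.) and age (years)."""
    ht_lin = hgt - MEAN_HGT
    agelin = age - MEAN_AGE
    return (COEF["_cons"]
            + COEF["ht_lin"] * ht_lin + COEF["ht_quad"] * ht_lin ** 2
            + COEF["agelin"] * agelin + COEF["agequad"] * agelin ** 2)


# At the mean height and mean age every centered term is zero,
# so the fitted value is just the intercept.
print(f"{predict_fev(MEAN_HGT, MEAN_AGE):.6f}")
print(f"{predict_fev(70, 12):.3f}")
```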
28 Goodness of fit criteria: plot of studentized residuals versus predicted values of systolic blood pressure
. predict yhat
(option xb assumed; fitted values)
. label variable yhat "Predicted value of sbp"
. predict e, rstudent
. label variable e "Studentized residual"
. graph twoway scatter e yhat
29 Plot of studentized residual versus age
. graph twoway scatter e agedys
30 . graph twoway scatter e brthwgt
31 Standardized Regression Coefficient
32 Standardized Regression Coefficient
37 This shows the equivalence of the t-test and regression with one indicator variable (2 groups). Recall that the two-sample t-test with independent samples is a special case of one-way (fixed effects) ANOVA.
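The equivalence can be checked numerically. The Python sketch below borrows the fever-reduction values for drug 1 and drug 2 from the aspirin listing that appears later in these notes, computes the pooled-variance two-sample t statistic, and then regresses the outcome on a single 0/1 indicator. Variable names are illustrative assumptions. The slope should equal the difference in group means, and the two t statistics should agree.

```python
from math import sqrt

# Fever reduction for the first two drug groups in the aspirin listing.
drug1 = [2.0, 1.6, 2.1, 0.6, 1.3]
drug2 = [0.5, 1.2, 0.3, 0.2, -0.4]


def mean(v):
    return sum(v) / len(v)


def sum_sq(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v)


# Two-sample t test with pooled variance.
n1, n2 = len(drug1), len(drug2)
pooled_var = (sum_sq(drug1) + sum_sq(drug2)) / (n1 + n2 - 2)
t_test = (mean(drug1) - mean(drug2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Simple linear regression of y on an indicator x (1 = drug 1, 0 = drug 2).
y = drug1 + drug2
x = [1] * n1 + [0] * n2
xbar, ybar = mean(x), mean(y)
sxx = sum((a - xbar) ** 2 for a in x)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar
resid_ss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
se_slope = sqrt(resid_ss / (len(y) - 2) / sxx)
t_reg = slope / se_slope

print(f"mean difference = {mean(drug1) - mean(drug2):.4f}, slope = {slope:.4f}")
print(f"t (two-sample) = {t_test:.4f}, t (regression) = {t_reg:.4f}")
```

The intercept is the mean of the reference group (the group coded 0), and the residual mean square is the pooled variance.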
38 Dummy variables
Extending this relationship to multiple-group analysis. Goal: to show that one-way ANOVA is a special case of regression, using dummy variables to represent a categorical variable with k categories.
39 Use of dummy variables to represent a categorical variable with k categories. For a categorical variable C with k categories, define k - 1 dummy variables: x_j = 1 if C = j and 0 otherwise, for j = 1, ..., k - 1. Category k is the reference (or baseline) group.
40 Treatment category and dummy variables

Treatment category    x1    x2    x3    x4
        1              1     0     0     0
        2              0     1     0     0
        3              0     0     1     0
        4              0     0     0     1
        5              0     0     0     0
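The coding in the table can be generated mechanically. This is a minimal Python sketch, assuming categories are labeled 1 through k and the last category is the reference group, as in the table. `dummy_code` is a hypothetical helper name.

```python
def dummy_code(category, k):
    """Return [x1, ..., x_(k-1)] for a category in 1..k, with category k as reference."""
    if not 1 <= category <= k:
        raise ValueError("category must be between 1 and k")
    return [1 if category == j else 0 for j in range(1, k)]


# Reproduce the five-treatment table: four dummy variables, treatment 5 as baseline.
for treatment in range(1, 6):
    print(treatment, dummy_code(treatment, 5))
```

Only k - 1 dummies are needed. Adding a k-th dummy alongside the intercept would make the columns linearly dependent.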
42 ANOVA and regression are the same test
45 EP520 Introductory Biostatistics
- Lecture 26
- Regression and Correlation Methods (Chapter 11)
- Relationship between Regression and ANOVA
- Analysis of covariance
- Two-way ANOVA (brief)
- Testing for correlations
46 Relationship between Multiple Linear Regression and One-Way ANOVA
- E(y) = µ + α_j
- Want to compare underlying means among k groups, where observations in group j are N(µ_j = µ + α_j, σ^2).
- To test H0: α_j = 0 for all j = 1, ..., k vs. H1: at least one α_j is different from 0.
47 Two Equivalent Methods
- Overall F test for one-way ANOVA (fixed effects)
- Set up multiple regression model
- Between SS = Regression SS
- Within SS = Residual SS
- F statistics and p-values are the same
50 Comparing ANOVA and Regression - Aspirin Example
. list

     +--------------------------------------+
     | drug   fevreduc   ind1   ind2   ind3 |
     |--------------------------------------|
  1. |    1          2      1      0      0 |
  2. |    1        1.6      1      0      0 |
  3. |    1        2.1      1      0      0 |
  4. |    1         .6      1      0      0 |
  5. |    1        1.3      1      0      0 |
     |--------------------------------------|
  6. |    2         .5      0      1      0 |
  7. |    2        1.2      0      1      0 |
  8. |    2         .3      0      1      0 |
  9. |    2         .2      0      1      0 |
 10. |    2        -.4      0      1      0 |
     |--------------------------------------|
 11. |    3        1.1      0      0      1 |
51 . anova fevreduc drug, detail reg

Factor  Value      Value      Value      Value
------------------------------------------------------------------------
drug    1 1        2 2        3 3

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =    6.79
       Model |  5.82933309     2  2.91466655           Prob > F      =  0.0106
    Residual |  5.14800001    12  .429000001           R-squared     =  0.5310
-------------+------------------------------           Adj R-squared =  0.4529
       Total |  10.9773331    14  .784095222           Root MSE      =  .65498

------------------------------------------------------------------------------
    fevreduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |        .08   .2929164     0.27   0.789    -.5582099      .71821
      drug 1 |       1.44   .4142463     3.48   0.005     .5374348    2.342565
52 . regress fevreduc ind1 ind2

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =    6.79
       Model |  5.82933309     2  2.91466655           Prob > F      =  0.0106
    Residual |  5.14800001    12  .429000001           R-squared     =  0.5310
-------------+------------------------------           Adj R-squared =  0.4529
       Total |  10.9773331    14  .784095222           Root MSE      =  .65498

------------------------------------------------------------------------------
    fevreduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ind1 |       1.44   .4142463     3.48   0.005     .5374348    2.342565
        ind2 |        .28   .4142463     0.68   0.512    -.6225652    1.182565
       _cons |        .08   .2929164     0.27   0.789    -.5582099      .71821
------------------------------------------------------------------------------
53 . regress fevreduc ind1 ind2 ind3

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =    6.79
       Model |  5.82933309     2  2.91466655           Prob > F      =  0.0106
    Residual |  5.14800001    12  .429000001           R-squared     =  0.5310
-------------+------------------------------           Adj R-squared =  0.4529
       Total |  10.9773331    14  .784095222           Root MSE      =  .65498

------------------------------------------------------------------------------
    fevreduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ind1 |       1.16   .4142463     2.80   0.016     .2574348    2.062565
        ind2 |  (dropped)
        ind3 |       -.28   .4142463    -0.68   0.512    -1.182565    .6225652
       _cons |        .36   .2929164     1.23   0.243    -.2782099      .99821
------------------------------------------------------------------------------
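The two regressions differ only in which group serves as the reference, so the coefficients are differences from different baseline means while the F test is identical. The Python sketch below illustrates this with the drug 1 and drug 2 values from the aspirin listing. The listing shown in these notes stops before all drug 3 rows, so the drug 3 mean is taken from the reported `_cons` of the model with ind1 and ind2. That is an assumption about how to fill the gap, not a recomputation from raw data.

```python
drug1 = [2.0, 1.6, 2.1, 0.6, 1.3]
drug2 = [0.5, 1.2, 0.3, 0.2, -0.4]

mean1 = sum(drug1) / len(drug1)
mean2 = sum(drug2) / len(drug2)
mean3 = 0.08  # assumed: _cons reported by "regress fevreduc ind1 ind2" (drug 3 is the reference)

# Reference group = drug 3 (indicators ind1 and ind2 in the model).
coef_ref3 = {"ind1": mean1 - mean3, "ind2": mean2 - mean3, "_cons": mean3}

# Reference group = drug 2 (ind2 dropped; ind1 and ind3 remain).
coef_ref2 = {"ind1": mean1 - mean2, "ind3": mean3 - mean2, "_cons": mean2}

# The overall F statistic is Model MS / Residual MS under either coding.
f_stat = 2.91466655 / 0.429000001

print({name: round(value, 2) for name, value in coef_ref3.items()})
print({name: round(value, 2) for name, value in coef_ref2.items()})
print(f"F = {f_stat:.2f}")
```

Stata drops one indicator when all three are entered with the intercept, because the three indicators sum to the constant column.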
54 Analysis of Covariance
- Say we want to compare k groups using ANOVA (regression with dummies) but also need to adjust for differences in age among the groups.
- y = α + β_1 x_1 + β_2 x_2 + ... + β_(k-1) x_(k-1) + β_AGE age
- This is called an analysis of covariance model.
55 Two-Way ANOVA: study performed to look at level of systolic BP for 3 groups
68 Correlation
- Problem: we are interested in the relationship between two variables (height and weight, age and systolic BP, REM sleep and stress, etc.) and want to quantify this relationship.
- The correlation coefficient (Pearson correlation) is useful for this purpose.
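As a small illustration, the sample Pearson correlation is the corrected sum of cross products divided by the square root of the product of the two corrected sums of squares. The Python sketch below applies this to the 16-infant data from the systolic BP example earlier in these notes. `pearson_r` is an illustrative helper name, and these correlations are not reported in the slides themselves.

```python
from math import sqrt

# Data from the 16-infant systolic BP example.
brthwgt = [135, 120, 100, 105, 130, 125, 125, 105, 120, 90, 120, 95, 120, 150, 160, 125]
agedys = [3, 4, 3, 2, 4, 5, 2, 3, 5, 4, 2, 3, 3, 4, 3, 3]
sysbp = [89, 90, 83, 77, 92, 98, 82, 85, 96, 95, 80, 79, 86, 97, 92, 88]


def pearson_r(x, y):
    """Sample Pearson correlation: Lxy / sqrt(Lxx * Lyy)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    lxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    lxx = sum((a - xbar) ** 2 for a in x)
    lyy = sum((b - ybar) ** 2 for b in y)
    return lxy / sqrt(lxx * lyy)


r_age = pearson_r(agedys, sysbp)
r_bwt = pearson_r(brthwgt, sysbp)
print(f"r(agedys, sysbp)  = {r_age:.4f}")
print(f"r(brthwgt, sysbp) = {r_bwt:.4f}")
```

The correlation is unit-free and symmetric in the two variables, unlike a regression slope.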
72 Note: a plot of x vs. y that looks like a straight line is indicative of high correlation; the line does not need to be at 45 degrees.
87 One-Sample t Test for Correlation Coefficient
. display 2*(1-normprob(0.25*(100-2)^0.5/(1-(0.25)^2)^0.5))
.01058714
One-Sample z Test for Correlation Coefficient
. display 0.5*ln((1+0.38)/(1-0.38))
.40005965
. display 0.5*ln((1+0.5)/(1-0.5))
.54930614
. display 2*normprob(-1.4699075)
.14158681
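The display commands above can be retraced in Python with the standard library. This sketch makes several assumptions, flagged in the comments: r = 0.25 with n = 100 for the first test (read off the expression), a sample r of 0.38 tested against a null value of 0.5 for the second, and n = 100 for the second test as well, which is inferred because it reproduces the -1.4699075 passed to normprob. Like the Stata line, the first test refers the t statistic to the standard normal distribution.

```python
from math import erf, log, sqrt


def normal_cdf(z):
    """Standard normal CDF, the counterpart of Stata's normprob()."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return 0.5 * log((1 + r) / (1 - r))


# Test of H0: rho = 0 with r = 0.25, n = 100 (values read from the display expression).
r, n = 0.25, 100
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)
p_t = 2 * (1 - normal_cdf(t_stat))  # normal approximation, as in the display command

# Test of H0: rho = 0.5 with sample r = 0.38 (assumed roles) and n = 100 (inferred).
z_sample = fisher_z(0.38)
z_null = fisher_z(0.5)
lam = (z_sample - z_null) * sqrt(n - 3)
p_z = 2 * normal_cdf(-abs(lam))

print(f"t = {t_stat:.4f}, two-sided p = {p_t:.8f}")
print(f"z(0.38) = {z_sample:.8f}, z(0.5) = {z_null:.8f}")
print(f"lambda = {lam:.7f}, two-sided p = {p_z:.8f}")
```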
93 Two-Sample Tests for Correlations
- Example: two groups of children. One group lives with natural parents; the other group lives with adoptive parents.
- Is the correlation between the BP (SBP?) of mother and child different for these groups? A difference would suggest a genetic link.
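One standard way to compare two independent correlations is to apply Fisher's z transformation to each and refer the scaled difference to the standard normal distribution. The Python sketch below illustrates that approach. The correlations and sample sizes in the usage lines are hypothetical placeholders, not values from the study, and `two_sample_corr_test` is an illustrative name.

```python
from math import erf, log, sqrt


def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))


def fisher_z(r):
    return 0.5 * log((1 + r) / (1 - r))


def two_sample_corr_test(r1, n1, r2, n2):
    """Test H0: rho1 = rho2 for two independent samples (each n must exceed 3).

    Returns the test statistic and the two-sided p-value from the normal distribution.
    """
    z1, z2 = fisher_z(r1), fisher_z(r2)
    lam = (z1 - z2) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p_value = 2 * normal_cdf(-abs(lam))
    return lam, p_value


# Hypothetical inputs for illustration only: mother-child BP correlations
# in a natural-parent group and an adoptive-parent group.
lam, p_value = two_sample_corr_test(r1=0.35, n1=200, r2=0.10, n2=150)
print(f"lambda = {lam:.3f}, two-sided p = {p_value:.4f}")
```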