EP520 Introductory Biostatistics - PowerPoint PPT Presentation

Description:

Estimation of parameters for multiple linear regression ... Diet (NOR) 105.2 (37) 115.5 (26) Lacto Vegetarian (LV) 102.6 (88) 109.9 (138) ...

Slides: 97
Provided by: rham6
Transcript and Presenter's Notes

1
EP520 Introductory Biostatistics
  • Lecture 25
  • Regression and Correlation Methods (Chp. 11)
  • Multiple linear regression
  • Estimation of parameters for multiple linear regression
  • Inferences for multiple linear regression
  • Multiple R-squared (multiple correlation coefficient, for assessing goodness of fit)

2
Multiple Regression
5
Partial Regression Coefficients
8
Systolic BP example for multiple linear regression
. list

     +--------------------------+
     | brthwgt   agedys   sysbp |
     |--------------------------|
  1. |     135        3      89 |
  2. |     120        4      90 |
  3. |     100        3      83 |
  4. |     105        2      77 |
  5. |     130        4      92 |
     |--------------------------|
  6. |     125        5      98 |
  7. |     125        2      82 |
  8. |     105        3      85 |
  9. |     120        5      96 |
 10. |      90        4      95 |
     |--------------------------|
 11. |     120        2      80 |
 12. |      95        3      79 |
 13. |     120        3      86 |
 14. |     150        4      97 |
 15. |     160        3      92 |
     |--------------------------|
 16. |     125        3      88 |
     +--------------------------+
9
. regress sysbp brthwgt agedys

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  2,    13) =   48.08
       Model |   591.03564     2   295.51782           Prob > F      =  0.0000
    Residual |  79.9018602    13  6.14629694           R-squared     =  0.8809
-------------+------------------------------           Adj R-squared =  0.8626
       Total |    670.9375    15  44.7291667           Root MSE      =  2.4792

------------------------------------------------------------------------------
       sysbp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     brthwgt |   .1255833   .0343362     3.66   0.003     .0514044    .1997621
      agedys |   5.887719   .6802051     8.66   0.000     4.418225    7.357213
       _cons |   53.45019   4.531889    11.79   0.000     43.65964    63.24074
------------------------------------------------------------------------------
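The coefficients reported by `regress sysbp brthwgt agedys` can be reproduced by hand. The Python sketch below is not part of the original slides. It assumes the 16 observations are read as (brthwgt, agedys, sysbp) triples from the listing on slide 8, and it solves the two-predictor least-squares equations from centered sums of squares and cross-products. The function name `fit_two_predictors` is illustrative only.

```python
# Observations as read from the slide 8 listing: (brthwgt, agedys, sysbp).
DATA = [
    (135, 3, 89), (120, 4, 90), (100, 3, 83), (105, 2, 77),
    (130, 4, 92), (125, 5, 98), (125, 2, 82), (105, 3, 85),
    (120, 5, 96), (90, 4, 95), (120, 2, 80), (95, 3, 79),
    (120, 3, 86), (150, 4, 97), (160, 3, 92), (125, 3, 88),
]


def fit_two_predictors(rows):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using centered sums."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    # Corrected sums of squares and cross-products.
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    syy = sum((r[2] - my) ** 2 for r in rows)
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # R-squared = regression SS / total SS.
    r2 = (b1 * s1y + b2 * s2y) / syy
    return b0, b1, b2, r2


b0, b1, b2, r2 = fit_two_predictors(DATA)
print(f"_cons={b0:.5f} brthwgt={b1:.7f} agedys={b2:.6f} R2={r2:.4f}")
```

The printed values can be compared against the Coef. column and the R-squared line of the Stata output.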
10
. graph twoway scatter sysbp agedys
11
. graph twoway scatter sysbp brthwgt
18
Systolic BP example including interaction term
. generate age_bwt = agedys * brthwgt
. regress sysbp brthwgt agedys age_bwt

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  3,    12) =   46.87
       Model |  618.183307     3  206.061102           Prob > F      =  0.0000
    Residual |  52.7541926    12  4.39618272           R-squared     =  0.9214
-------------+------------------------------           Adj R-squared =  0.9017
       Total |    670.9375    15  44.7291667           Root MSE      =  2.0967

------------------------------------------------------------------------------
       sysbp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     brthwgt |    .551233    .173731     3.17   0.008     .1727057    .9297603
      agedys |    21.2873   6.223628     3.42   0.005     7.727175    34.84742
     age_bwt |  -.1282085   .0515927    -2.49   0.029    -.2406194   -.0157976
       _cons |   2.551581   20.83776     0.12   0.905       -42.85    47.95316
------------------------------------------------------------------------------
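With an interaction term in the model, the effect of age is no longer a single number: the slope of sysbp on agedys at a given birth weight is the agedys coefficient plus the age_bwt coefficient times brthwgt. The short Python sketch below is an illustration added here, not part of the slides. It uses the coefficients from the output on this slide, and the two birth weights passed to it are arbitrary example values.

```python
# Coefficients copied from the interaction-model output on this slide.
B_AGEDYS = 21.2873
B_AGE_BWT = -0.1282085


def age_slope(brthwgt):
    """Estimated change in sysbp per extra day of age at a given birth weight."""
    return B_AGEDYS + B_AGE_BWT * brthwgt


# Arbitrary illustrative birth weights within the observed range (90 to 160).
for w in (100, 150):
    print(w, round(age_slope(w), 4))
```

Because the interaction coefficient is negative, the estimated age effect shrinks as birth weight increases.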
20
Regression of FEV on Height (in.) for Boys in Tecumseh, Michigan
Unrestricted Ages 3 - 19

. regress fev hgt if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  1,   334) = 1174.58
       Model |  262.711233     1  262.711233           Prob > F      =  0.0000
    Residual |  74.7035646   334  .223663367           R-squared     =  0.7786
-------------+------------------------------           Adj R-squared =  0.7779
       Total |  337.414798   335  1.00720835           Root MSE      =  .47293

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hgt |   .1398832   .0040815    34.27   0.000     .1318544    .1479119
       _cons |  -5.863848   .2544698   -23.04   0.000    -6.364414   -5.363283
------------------------------------------------------------------------------

. graph twoway scatter fev hgt if sex==1 || lfit fev hgt

Least Squares Regression Equation: FEV = -5.8638 + 0.1399 height(in)
21
. graph twoway scatter fev height if sex==1 || lfit fev height

Least Squares Regression Equation: FEV = -5.8638 + 0.1399 height(in)
22
. twoway lfitci fev hgt
23
. summarize age if sex==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       336    10.01488    2.975986          3         19

. generate agelin = age - 10.01488 if sex==1
(318 missing values generated)

. generate agequad = agelin^2 if sex==1
(318 missing values generated)

. summarize if sex==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       336    36232.54    23700.52        201      90001
         age |       336    10.01488    2.975986          3         19
         fev |       336    2.812446    1.003598       .796      5.793
         hgt |       336     62.0253    6.330696         47         74
         sex |       336           1           0          1          1
-------------+--------------------------------------------------------
       smoke |       336     .077381    .2675934          0          1
      agelin |       336    9.47e-07    2.975986   -7.01488    8.98512
     agequad |       336    8.830136    12.62822   .0002214   80.73238

Center age at 0 (agelin)
Compute quadratic term for age (agequad)
24
. generate ht_lin = hgt - 62.0253 if sex==1
(318 missing values generated)

. generate ht_quad = ht_lin^2 if sex==1
(318 missing values generated)

. summarize if sex==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       336    36232.54    23700.52        201      90001
         age |       336    10.01488    2.975986          3         19
         fev |       336    2.812446    1.003598       .796      5.793
         hgt |       336     62.0253    6.330696         47         74
         sex |       336           1           0          1          1
-------------+--------------------------------------------------------
       smoke |       336     .077381    .2675934          0          1
      agelin |       336    9.47e-07    2.975986   -7.01488    8.98512
     agequad |       336    8.830136    12.62822   .0002214   80.73238
      ht_lin |       336   -2.41e-06    6.330696   -15.0253    11.9747
     ht_quad |       336    39.95844    41.29339   .0006401   225.7596

Center height at 0 (ht_lin)
Compute quadratic term for height (ht_quad)
25
. regress fev ht_lin if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  1,   334) = 1174.58
       Model |  262.711233     1  262.711233           Prob > F      =  0.0000
    Residual |  74.7035646   334  .223663367           R-squared     =  0.7786
-------------+------------------------------           Adj R-squared =  0.7779
       Total |  337.414798   335  1.00720835           Root MSE      =  .47293

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1398832   .0040815    34.27   0.000     .1318544    .1479119
       _cons |   2.812447   .0258005   109.01   0.000     2.761695    2.863199
------------------------------------------------------------------------------

. regress fev ht_lin ht_quad if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  2,   333) =  680.88
       Model |  271.116793     2  135.558397           Prob > F      =  0.0000
    Residual |  66.2980047   333  .199093107           R-squared     =  0.8035
-------------+------------------------------           Adj R-squared =  0.8023
       Total |  337.414798   335  1.00720835           Root MSE      =   .4462

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1450898   .0039333    36.89   0.000     .1373525     .152827
     ht_quad |   .0039182    .000603     6.50   0.000      .002732    .0051044
       _cons |   2.655882   .0342511    77.54   0.000     2.588506    2.723258
------------------------------------------------------------------------------
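Centering height changes only the intercept of the simple linear fit: the slope on ht_lin equals the slope on hgt, and the centered intercept is the fitted FEV at the mean height. The Python sketch below, added here as an illustration, checks this using the coefficients reported for the uncentered regression (fev on hgt) and the mean height of 62.0253 used to build ht_lin.

```python
# Values copied from the Stata output in this transcript.
RAW_INTERCEPT = -5.863848   # _cons from: regress fev hgt
SLOPE = 0.1398832           # coefficient on hgt (and on ht_lin)
MEAN_HGT = 62.0253          # mean height subtracted to form ht_lin

# fev = a + b*hgt = (a + b*mean) + b*(hgt - mean), so the centered
# intercept is the raw intercept shifted by slope * mean.
centered_intercept = RAW_INTERCEPT + SLOPE * MEAN_HGT
print(round(centered_intercept, 6))
```

The result agrees with the _cons value of 2.812447 in the ht_lin regression up to rounding, and that value is also the sample mean of fev.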
26
. regress fev ht_lin ht_quad agelin if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  3,   332) =  501.58
       Model |  276.425205     3  92.1417351           Prob > F      =  0.0000
    Residual |  60.9895927   332  .183703592           R-squared     =  0.8192
-------------+------------------------------           Adj R-squared =  0.8176
       Total |  337.414798   335  1.00720835           Root MSE      =  .42861

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1146489   .0068076    16.84   0.000     .1012575    .1280402
     ht_quad |   .0037141   .0005805     6.40   0.000     .0025722     .004856
      agelin |   .0769142   .0143081     5.38   0.000     .0487681    .1050602
       _cons |   2.664039   .0329357    80.89   0.000      2.59925    2.728828
------------------------------------------------------------------------------

27
. regress fev ht_lin ht_quad agelin agequad if sex==1

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(  4,   331) =  383.92
       Model |  277.584619     4  69.3961548           Prob > F      =  0.0000
    Residual |  59.8301789   331  .180755827           R-squared     =  0.8227
-------------+------------------------------           Adj R-squared =  0.8205
       Total |  337.414798   335  1.00720835           Root MSE      =  .42515

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ht_lin |   .1218146   .0073215    16.64   0.000     .1074121    .1362171
     ht_quad |   .0027855   .0006826     4.08   0.000     .0014427    .0041283
      agelin |   .0543468   .0167582     3.24   0.001     .0213808    .0873127
     agequad |   .0063499   .0025072     2.53   0.012     .0014178     .011282
       _cons |   2.645071   .0335178    78.92   0.000     2.579136    2.711006
------------------------------------------------------------------------------
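The summary statistics in the header of each regression output (F, R-squared, adjusted R-squared, Root MSE) all follow from the model and residual sums of squares. The Python sketch below is an added illustration. The function name `fit_summary` is hypothetical, and the inputs are the sums of squares from the four-predictor model on this slide.

```python
import math


def fit_summary(model_ss, resid_ss, n, p):
    """Header statistics of a regression with p predictors and n observations."""
    total_ss = model_ss + resid_ss
    ms_model = model_ss / p
    ms_resid = resid_ss / (n - p - 1)
    return {
        "F": ms_model / ms_resid,                      # overall F on (p, n-p-1) df
        "R2": model_ss / total_ss,                     # proportion of variance explained
        "adj_R2": 1 - ms_resid / (total_ss / (n - 1)),  # penalized for added predictors
        "root_MSE": math.sqrt(ms_resid),
    }


# Model and residual SS from: regress fev ht_lin ht_quad agelin agequad
stats = fit_summary(277.584619, 59.8301789, n=336, p=4)
for name, value in stats.items():
    print(name, round(value, 4))
```

Comparing adjusted R-squared across slides 25 to 27 shows whether each added term improves the fit beyond what an extra parameter would be expected to give.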

28
Goodness of fit criteria
Plot of studentized residuals versus predicted values of systolic blood pressure

. predict yhat
(option xb assumed; fitted values)
. label variable yhat "Predicted value of sbp"
. predict e, rstudent
. label variable e "Studentized residual"
. graph twoway scatter e yhat
29
Plot of studentized residual versus age
. graph twoway scatter e agedys
30
. graph twoway scatter e brthwgt
31
Standardized Regression Coefficient
37
This shows the equivalence of the t-test and regression with one
indicator variable (2 groups). Recall that the two-sample t-test with
independent samples is a special case of one-way (fixed effects) ANOVA.
38
Dummy variables
Extending this relationship: multiple-group analysis.
Goal: to show that one-way ANOVA is a special case of regression.
Use of dummy variables to represent a categorical variable with k categories.
39
Use of dummy variables to represent a categorical variable with k categories.
Categorical variable C with k categories.
Category k is the reference (or baseline) group.
40
Treatment Category    Dummy variable
                      x1   x2   x3   x4
        1              1    0    0    0
        2              0    1    0    0
        3              0    0    1    0
        4              0    0    0    1
        5              0    0    0    0
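The coding in the table above (k = 5 categories, k - 1 = 4 dummy variables, category k as the reference group) can be generated mechanically. The Python sketch below is an added illustration and the function name `dummy_code` is hypothetical.

```python
def dummy_code(category, k):
    """Return [x1, ..., x_(k-1)] for a category in 1..k, with category k as reference."""
    if not 1 <= category <= k:
        raise ValueError("category must be between 1 and k")
    # x_j is 1 only for category j; the reference category k gets all zeros.
    return [1 if category == j else 0 for j in range(1, k)]


# Reproduce the five rows of the table.
for c in range(1, 6):
    print(c, dummy_code(c, 5))
```

Using k - 1 rather than k dummies avoids perfect collinearity with the intercept, which is why Stata drops one indicator when all of them are supplied.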
42
Anova and Regression are the same test
45
EP520 Introductory Biostatistics
  • Lecture 26
  • Regression and Correlation Methods
    (chapter 11)
  • Relationship between Regression and ANOVA
  • Analysis of covariance
  • Two-way ANOVA (brief)
  • Testing for correlations

46
Relationship between Multiple Linear Regression
and One Way ANOVA
  • $E(y) = \mu + \alpha_j$
  • Want to compare underlying means among $k$ groups, where observations in group $j$ are $N(\mu_j = \mu + \alpha_j, \sigma^2)$.
  • To test $H_0$: $\alpha_j = 0$ for all $j = 1, \ldots, k$ vs. $H_1$: at least one $\alpha_j \neq 0$.

47
  • Two Equivalent Methods
  • Overall F Test for One-Way ANOVA (Fixed)
  • Set up Multiple Regression Model
  • Between SS = Regression SS
  • Within SS = Residual SS
  • F statistics and p-values are the same

50
Comparing ANOVA and Regression - Aspirin Example

. list

     +--------------------------------------+
     | drug   fevreduc   ind1   ind2   ind3 |
     |--------------------------------------|
  1. |    1          2      1      0      0 |
  2. |    1        1.6      1      0      0 |
  3. |    1        2.1      1      0      0 |
  4. |    1         .6      1      0      0 |
  5. |    1        1.3      1      0      0 |
     |--------------------------------------|
  6. |    2         .5      0      1      0 |
  7. |    2        1.2      0      1      0 |
  8. |    2         .3      0      1      0 |
  9. |    2         .2      0      1      0 |
 10. |    2        -.4      0      1      0 |
     |--------------------------------------|
 11. |    3        1.1      0      0      1 |

51
. anova fevreduc drug, detail reg

      Factor   Value     Value     Value     Value
      ---------------------------------------------
        drug   1 1       2 2       3 3

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =    6.79
       Model |  5.82933309     2  2.91466655           Prob > F      =  0.0106
    Residual |  5.14800001    12  .429000001           R-squared     =  0.5310
-------------+------------------------------           Adj R-squared =  0.4529
       Total |  10.9773331    14  .784095222           Root MSE      =  .65498

------------------------------------------------------------------------------
    fevreduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |        .08   .2929164     0.27   0.789    -.5582099      .71821
      drug 1 |       1.44   .4142463     3.48   0.005     .5374348    2.342565

52
. regress fevreduc ind1 ind2

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =    6.79
       Model |  5.82933309     2  2.91466655           Prob > F      =  0.0106
    Residual |  5.14800001    12  .429000001           R-squared     =  0.5310
-------------+------------------------------           Adj R-squared =  0.4529
       Total |  10.9773331    14  .784095222           Root MSE      =  .65498

------------------------------------------------------------------------------
    fevreduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ind1 |       1.44   .4142463     3.48   0.005     .5374348    2.342565
        ind2 |        .28   .4142463     0.68   0.512    -.6225652    1.182565
       _cons |        .08   .2929164     0.27   0.789    -.5582099      .71821
------------------------------------------------------------------------------

53
. regress fevreduc ind1 ind2 ind3

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =    6.79
       Model |  5.82933309     2  2.91466655           Prob > F      =  0.0106
    Residual |  5.14800001    12  .429000001           R-squared     =  0.5310
-------------+------------------------------           Adj R-squared =  0.4529
       Total |  10.9773331    14  .784095222           Root MSE      =  .65498

------------------------------------------------------------------------------
    fevreduc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ind1 |       1.16   .4142463     2.80   0.016     .2574348    2.062565
        ind2 |  (dropped)
        ind3 |       -.28   .4142463    -0.68   0.512    -1.182565    .6225652
       _cons |        .36   .2929164     1.23   0.243    -.2782099      .99821
------------------------------------------------------------------------------
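In a regression on group indicators, the intercept is the mean of the reference group and each indicator coefficient is the difference between that group's mean and the reference mean. The Python sketch below, added as an illustration, checks this for drugs 1 and 2 using the ten fevreduc values shown in the listing on slide 50. The listing is truncated for drug 3, so its mean is taken from the reported _cons rather than recomputed.

```python
# fevreduc values for drugs 1 and 2 as shown in the slide 50 listing.
FEVREDUC = {
    1: [2, 1.6, 2.1, 0.6, 1.3],
    2: [0.5, 1.2, 0.3, 0.2, -0.4],
}
means = {drug: sum(vals) / len(vals) for drug, vals in FEVREDUC.items()}

# Coefficients from "regress fevreduc ind1 ind2" (drug 3 is the reference).
ref3 = {"_cons": 0.08, "ind1": 1.44, "ind2": 0.28}
# Coefficients from "regress fevreduc ind1 ind2 ind3" (ind2 dropped,
# so drug 2 is the reference).
ref2 = {"_cons": 0.36, "ind1": 1.16, "ind3": -0.28}

# Group means implied by each parameterization.
implied_ref3 = {1: ref3["_cons"] + ref3["ind1"], 2: ref3["_cons"] + ref3["ind2"]}
implied_ref2 = {1: ref2["_cons"] + ref2["ind1"], 2: ref2["_cons"]}

for drug in (1, 2):
    print(drug, round(means[drug], 2),
          round(implied_ref3[drug], 2), round(implied_ref2[drug], 2))
```

Both parameterizations imply the same group means, which is why the ANOVA table, F statistic, and R-squared are identical across slides 51 to 53 and only the individual coefficients change.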

54
Analysis of Covariance
  • Say we want to compare k groups using ANOVA (regression with dummies) but also need to adjust for differences in age among the groups.
  • $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{k-1} x_{k-1} + \beta_{AGE}\,\text{age}$
  • This is called an Analysis of Covariance Model

55
Two-Way ANOVA: Study performed to look at level of
Systolic BP for 3 Groups
68
Correlation
  • Problem: Interested in the relationship between two variables, e.g. Height and Weight, Age and Systolic BP, REM Sleep and Stress, etc.
  • Want to quantify this relationship. The Correlation Coefficient (Pearson Correlation) is useful for this purpose.

72
  • Note: A plot of x vs. y that looks like a straight line indicates high correlation; it does not need to be a 45-degree line.

87
  • One Sample T test for Correlation Coefficient
  • . display 2*(1-normprob(0.25*(100-2)^0.5/(1-(0.25)^2)^0.5))
  • .01058714
  • One Sample Z test for Correlation Coefficient
  • . display 0.5*ln((1+0.38)/(1-0.38))
  • .40005965
  • . display 0.5*ln((1+0.5)/(1-0.5))
  • .54930614
  • . display 2*normprob(-1.4699075)
  • .14158681
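The display commands on this slide can be traced step by step. The Python sketch below is an added illustration, not part of the slides. It assumes n = 100 for both tests, which is stated in the first command (100-2) and is consistent with the z value passed to the last command. As on the slide, the first p-value is evaluated with the normal distribution function even though the test is labeled a t test.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def fisher_z(r):
    """Fisher z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


n = 100  # sample size, inferred from the (100-2) term on the slide

# Test of H0: rho = 0 with sample correlation r = 0.25.
r = 0.25
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p_rho_zero = 2 * (1 - norm_cdf(t_stat))

# Test of H0: rho = 0.5 with sample correlation r = 0.38 (Fisher z).
z_r = fisher_z(0.38)
z_0 = fisher_z(0.5)
lam = (z_r - z_0) * math.sqrt(n - 3)
p_rho_half = 2 * norm_cdf(-abs(lam))

print(round(p_rho_zero, 8), round(z_r, 8), round(z_0, 8),
      round(lam, 7), round(p_rho_half, 8))
```

The Fisher transformation is needed for the second test because the sampling distribution of r is skewed when the hypothesized correlation is not zero.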

93
  • Two Sample tests for Correlations
  • Example: Two groups of children. One group lives with natural parents, the other group lives with adoptive parents.
  • Is the correlation between BP (SBP?) of mother and child different for these groups? A difference would suggest a genetic link.
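The slides that work through this example were not transcribed, so the method they use is not shown here. A common approach for comparing two independent correlations is to compare their Fisher z transforms. The Python sketch below illustrates that approach under this assumption. The function name and the input values are hypothetical and are not data from the lecture.

```python
import math


def fisher_z(r):
    """Fisher z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


def two_sample_corr_test(r1, n1, r2, n2):
    """Test H0: rho1 = rho2 for two independent samples; returns (z, two-sided p)."""
    # The difference of Fisher z values has variance 1/(n1-3) + 1/(n2-3).
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # erfc(|z|/sqrt(2)) equals 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical inputs for illustration only (not from the lecture).
z, p = two_sample_corr_test(r1=0.35, n1=100, r2=0.06, n2=50)
print(round(z, 3), round(p, 4))
```

A small p-value would indicate that the mother-child correlation differs between the two groups.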
