Title: Biostatistics and Computer Applications
1. Biostatistics and Computer Applications
Correlation analysis, nonlinear regression, multiple regression, and SAS programming
1/8/2003
2. Recap (Regression analysis)
- Regression equation and standard deviation from regression.
3. Recap (F and t tests for regression)
- The ANOVA table for regression analysis.
4. Recap (Confidence intervals)
- 1-α confidence interval for the population mean of Y.
- 1-α prediction interval for an individual observation Y.
5. Recap (Confidence intervals)
- 1-α confidence interval for the intercept.
- 1-α confidence interval for the slope.
6. Correlation analysis
- Example: A malacologist interested in the morphology of West Indian chitons, Chiton olivaceous, measured the length (X, cm) and width (Y, cm) of the eight overlapping plates composing the shell of 10 of these animals. Her data are presented below.
7. Determination coefficient
- For data that cannot be divided into dependent and independent variables, we cannot use a regression equation.
- We use the determination coefficient (r^2) to measure the degree of association.
- r is the correlation coefficient.
8. Calculation of Determination coefficient
- Equation 1: r^2 = \frac{[\sum(x-\bar{x})(y-\bar{y})]^2}{\sum(x-\bar{x})^2 \, \sum(y-\bar{y})^2}
- Equation 2: r = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2 \, \sum(y-\bar{y})^2}}
9. Determination Coefficient
- Proportion of variation in Y explained by the relationship between X and Y.
- 0 ≤ r^2 ≤ 1
10. Correlation Coefficient Values
[Figure: number line of r from -1.0 (perfect negative correlation) through 0 (no correlation) to +1.0 (perfect positive correlation); the degree of negative or positive correlation increases as r moves from 0 toward -1.0 or +1.0.]
11. Coefficient of Correlation Examples
[Figure: four scatter plots of Y versus X illustrating r = 1, r = -1, r = .89, and r = 0.]
12. Example of Coefficient of Determination
13. Test of correlation coefficient
- Tests whether there is a linear relationship between 2 numerical variables.
- Hypotheses:
  - H0: ρ = 0 (no correlation)
  - Ha: ρ ≠ 0 (correlation)
14. Model of linear correlation
- Both X and Y are random variables and normally distributed.
- Population correlation coefficient ρ.
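The population correlation coefficient is defined from the covariance and the two standard deviations,

\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y},

and is estimated by the sample correlation coefficient r.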
15. Example of test of r
- Chiton data.
- Amount of insects and PPT/T.
- This t is exactly the same as the t for H0: slope = 0.
- Check the r table for the significance test.
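For testing H0: ρ = 0, the test statistic is

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}

with n - 2 degrees of freedom, which is algebraically identical to the t test of H0: slope = 0 in the corresponding simple linear regression.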
16. Confidence Interval for Correlation
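A standard construction uses Fisher's z transformation, which converts r to an approximately normal quantity:

z = \frac{1}{2}\ln\frac{1+r}{1-r}, \qquad \mathrm{SE}(z) = \frac{1}{\sqrt{n-3}}.

A 1-α interval is built for z and then back-transformed to the r scale through r = (e^{2z}-1)/(e^{2z}+1).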
17. Relationship of regression and correlation
- We can use r^2 and r in regression analysis, but we cannot use the regression equation in correlation analysis.
18. Cautions in regression and correlation analysis
- Widely used, and easy to misuse or misinterpret.
- \hat{Y} = a + bX and r are used to describe the linear relationship of Y on X. A non-significant r does not mean there is no relationship between Y and X; it only means there is no significant linear relationship.
- A significant r does not mean that the true relationship of Y and X is linear. A nonlinear model may fit better than the linear one.
- Even if the true relationship of Y and X is nonlinear, we may use linear regression to estimate or predict Y if r is significant. But be cautious when you extrapolate.
19. Cautions in regression and correlation analysis
- A significant linear regression does not guarantee that you can use the equation for practical prediction. You need a large r (r > 0.7, so that 49% of the variation can be explained by the predictors). If you are sure that there is a relationship between Y and X but you get a small r, this may be caused by (1) nonlinearity, or (2) other important factors not included in the model.
- Control other variables to be constant if possible, and use the equation under the same conditions.
- Sample size n should be larger than 5, and design a large range for variable X (this makes it easier to detect a nonlinear relationship and decreases estimation error).
20. SAS program
- PROC CORR and PROC REG.
- The CORR procedure is a statistical procedure for numeric random variables that computes Pearson correlation coefficients, three nonparametric measures of association, and the probabilities associated with these statistics.
21. SAS program

DATA corr;
  INPUT x y;
  DATALINES;
10.7 5.8
11 6
9.5 5
11.1 6
10.3 5.3
10.7 5.8
9.9 5.2
10.6 5.7
10 5.3
12 6.3
;
PROC REG;    /* regress y on x, then x on y */
  MODEL y=x;
  MODEL x=y;
RUN;
PROC CORR;   /* Pearson correlation between x and y */
RUN;
22. Types of regression analysis
[Diagram: regression models classified by the number of explanatory variables. 1 explanatory variable: simple regression (linear or nonlinear). 2 or more explanatory variables: multiple regression (linear or nonlinear).]
23. Nonlinear regression
- Under many conditions, the relationship of Y and X is not linear (curvilinear).
- A scatter plot helps to determine the equation.
- The choice of the function is based mainly on subject-matter knowledge; statistics help to estimate the function and test its significance.
- Nonlinear estimation depends on the initial values of the parameters in the equation. If the fit is not good, try different initial values.
24. Example of Nonlinear regression
- Growth of chickens may be described by a logistic equation. Weight of chicken (Y) was measured at different ages (X).
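The logistic growth model fitted by the PROC NLIN code on slide 26 is

Y = \frac{K}{1 + a e^{-bX}},

where K is the asymptotic (mature) weight, and a and b control the position and steepness of the growth curve.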
25. Example of Nonlinear regression
- PROC NLIN syntax:

PROC NLIN METHOD=DUD|GAUSS|etc.;
  MODEL dependent=expression;
  PARAMETERS parameter=values <, ..., parameter=values>;
26. Example of Nonlinear regression

DATA nonlinear;
  INPUT age weight;
  DATALINES;
2 0.3
4 0.86
6 1.73
8 2.2
10 2.47
12 2.67
14 2.8
;
PROC PRINT;
PROC NLIN METHOD=DUD;     /* derivative-free (DUD) estimation */
  MODEL weight=k/(1+a*exp(-b*age));
  PARMS k=3 a=20 b=0.4;   /* initial values for the parameters */
RUN;
27. Multiple Linear Regression
- Multiple linear regression extends simple linear regression to multiple predictor variables.
- Most of the time, more than one independent variable influences the response variable.
- For example, body mass index can be predicted by caloric intake and gender; CO2 flux between biosphere and atmosphere can be predicted by light, temperature, leaf area index, and vapor pressure deficit.
28. Multiple Regression Model
- Estimate the multiple linear regression equation.
- Test the overall significance of the model.
- Test the significance of each independent variable and select the best model.
- Test the relative importance of each independent variable.
- Use the model for prediction and estimation.
29. Multiple Linear Regression Model
- The relationship between 1 dependent and 2 or more independent variables is a linear function:

Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i

where Y is the dependent (response) variable, the X's are the independent (explanatory) variables, α is the population Y-intercept, the β's are the population slopes, and ε is the random error.
30. Population Multiple Regression Model
- Bivariate model: Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i
31. Sample Multiple Regression Model
- Bivariate model: \hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i}; the observed value is Y_i = a + b_1 X_{1i} + b_2 X_{2i} + e_i.
[Figure: the fitted response plane \hat{Y} = a + b_1 X_1 + b_2 X_2 in (X_1, X_2, Y) space; an observed Y_i sits above or below the plane at the point (X_{1i}, X_{2i}), and the vertical distance e_i is the residual.]
32. Least squares fit
- Regression parameters (a, b_1, ..., b_k) are determined using the method of least squares.
- This minimizes the squared differences between each observation and the fitted multivariate plane, i.e., minimizes the residuals.
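In symbols, least squares chooses a, b_1, ..., b_k to minimize

Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - a - b_1 X_{1i} - \cdots - b_k X_{ki} \right)^2.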
33. Interpretation of regression model
- a is the intercept (the Y value when all predictors are zero).
- b_k is one of the partial regression coefficients, or slopes, of the regression; it represents the change in Y for a unit change in X_k with the other predictors held constant.
- i.e., b_k is the average slope across all subgroups created by the X_k levels.
- e is the error term for each individual and is the residual for that individual.
- The residual is the difference between the predicted and observed values.
34. The Residual
- A residual is the difference between an observed value of Y and the estimated mean \hat{Y} based on the associated X values.
- There is one residual for every subject (X, Y pair).
- It measures the distance of each observation from the fitted regression surface.
- Useful for:
  - Diagnostics, that is, techniques for checking the assumptions of the regression model.
  - Understanding the variation in Y that is unexplained by the linear function of X.
35. Example of Parameter Estimation
- In order to develop a multiple regression equation to predict the yield of the wheat variety Fengchan, spikes per head (X1), heads per plant (X2), weight per 100 grains (X3), height of plant (X4), and weight per plant (Y, g) were measured as follows:
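A minimal PROC REG call for this model (assuming the data are in a dataset named wheat with variables y and x1-x4; the dataset name is illustrative) would be:

PROC REG DATA=wheat;
  MODEL y = x1 x2 x3 x4;   /* full four-predictor model */
RUN;

This produces the parameter estimates and the ANOVA table shown in the output slides that follow.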
36. Parameter Estimation: Computer Output

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   -51.90207            13.35182         -3.89     0.0030
X1           1     2.02618             0.27204          7.45     <.0001
X2           1     0.65400             0.30270          2.16     0.0561
X3           1     7.79694             2.33281          3.34     0.0075
X4           1     0.04970             0.08300          0.60     0.5626
37. Testing Overall Significance
- 1. Shows if there is a linear relationship between all X variables together and Y.
- 2. Uses the F test statistic.
- 3. Hypotheses:
  - H0: β1 = β2 = ... = βk = 0 (no linear relationship)
  - Ha: at least one coefficient is not 0 (at least one X variable affects Y)
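With k predictors and n observations, the test statistic is

F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}.

For the wheat example on the next slide: F = (221.47175/4) / (18.41758/10) = 55.36794/1.84176 = 30.06, with 4 and 10 degrees of freedom.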
38. Testing Overall Significance: Computer Output

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4   221.47175        55.36794      30.06     <.0001
Error             10    18.41758         1.84176
Corrected Total   14   239.88933

Root MSE         1.35711    R-Square   0.9232
Dependent Mean  14.47333    Adj R-Sq   0.8925
Coeff Var        9.37665
39. Coefficient of determination: Multiple
- 1. Proportion of variation in Y explained by all X variables taken together:
  R^2 = explained variation / total variation = SSR / SS_{yy}
- 2. Never decreases when a new X variable is added to the model:
  - Only the Y values determine SS_{yy}.
  - This is a disadvantage when comparing models.
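From the ANOVA output above: R^2 = 221.47175/239.88933 = 0.9232. The adjusted R^2 compensates for added predictors: Adj R^2 = 1 - (1 - R^2)(n-1)/(n-k-1) = 1 - (0.0768)(14/10) = 0.8925, matching the SAS output.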
40. Parameter Estimation: Computer Output

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   -51.90207            13.35182         -3.89     0.0030
X1           1     2.02618             0.27204          7.45     <.0001
X2           1     0.65400             0.30270          2.16     0.0561
X3           1     7.79694             2.33281          3.34     0.0075
X4           1     0.04970             0.08300          0.60     0.5626
41. Variable Selection
- There are many different ways to select variables:
  - FORWARD
  - BACKWARD
  - STEPWISE
  - RSQUARE
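In SAS these are requested through the SELECTION= option of the MODEL statement in PROC REG (the dataset name wheat is illustrative):

PROC REG DATA=wheat;
  MODEL y = x1 x2 x3 x4 / SELECTION=FORWARD;   /* or BACKWARD, STEPWISE, RSQUARE */
RUN;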
42. Variable Selection
- Forward:

Summary of Forward Selection
Step   Variable Entered   Number Vars In   Partial R-Square   Model R-Square   C(p)      F Value   Pr > F
1      X1                 1                0.8052             0.8052           14.3764   53.73     <.0001
2      X3                 2                0.0767             0.8818            6.3911    7.79     0.0163
3      X2                 3                0.0386             0.9205            3.3585    5.34     0.0412

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -46.96636            10.19262          36.82480    21.23     0.0008
X1            2.01314             0.26314         101.50782    58.53     <.0001
X2            0.67464             0.29183           9.26887     5.34     0.0412
X3            7.83023             2.26313          20.76193    11.97     0.0053
43. Variable Selection
- Backward:

Summary of Backward Elimination
Step   Variable Removed   Number Vars In   Partial R-Square   Model R-Square   C(p)     F Value   Pr > F
1      X4                 3                0.0028             0.9205           3.3585   0.36      0.5626

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -46.96636            10.19262          36.82480    21.23     0.0008
X1            2.01314             0.26314         101.50782    58.53     <.0001
X2            0.67464             0.29183           9.26887     5.34     0.0412
X3            7.83023             2.26313          20.76193    11.97     0.0053
44. Variable Selection
- Stepwise:

Summary of Stepwise Selection
Step   Variable Entered   Variable Removed   Number Vars In   Partial R-Square   Model R-Square   C(p)      F Value   Pr > F
1      X1                                    1                0.8052             0.8052           14.3764   53.73     <.0001
2      X3                                    2                0.0767             0.8818            6.3911    7.79     0.0163
3      X2                                    3                0.0386             0.9205            3.3585    5.34     0.0412
45. Variable Selection
- RSQUARE: R-Square Selection Method

Number in Model   R-Square   Variables in Model
1                 0.8052     X1
1                 0.4747     X3
1                 0.0021     X2
1                 0.0000     X4
2                 0.8818     X1 X3
2                 0.8339     X1 X2
2                 0.8113     X1 X4
2                 0.4973     X2 X3
2                 0.4750     X3 X4
2                 0.0023     X2 X4
3                 0.9205     X1 X2 X3
3                 0.8874     X1 X3 X4
3                 0.8375     X1 X2 X4
3                 0.4973     X2 X3 X4
46. Variable Selection
- Which one is better? It depends.
- FORWARD keeps more variables in the model; BACKWARD deletes more from the model. STEPWISE is good in general.
- RSQUARE allows you to select the model with the highest R^2 at each model size.
47. Relative importance of variables
- Use the STB option to print standardized regression coefficients:
  MODEL y=variables / STB;

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Standardized Estimate   95% Confidence Limits
Intercept    1   -46.96636            10.19262         -4.61     0.0008     0                       -69.40016   -24.53256
X1           1     2.01314             0.26314          7.65     <.0001     0.75342                   1.43396     2.59231
X2           1     0.67464             0.29183          2.31     0.0412     0.19929                   0.03233     1.31696
X3           1     7.83023             2.26313          3.46     0.0053     0.34139                   2.84911    12.81134
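The standardized estimate rescales each slope to standard-deviation units so that predictors can be compared directly: b_k' = b_k (s_{X_k} / s_Y), where s_{X_k} and s_Y are the sample standard deviations of X_k and Y. Here X1 has the largest standardized estimate (0.75342) and is therefore the most important predictor.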
48. Confidence interval estimation

Obs   Dep Var Y   Predicted Value   Std Error Mean Predict   95% CL Mean           95% CL Predict        Residual
1     15.7000     16.8707           0.4999                   15.7705   17.9708     13.7703   19.9710     -1.1707
2     14.5000     12.8336           0.6867                   11.3222   14.3449      9.5646   16.1025      1.6664
3     17.5000     16.9790           0.4666                   15.9520   18.0061     13.9039   20.0542      0.5210
4     22.5000     22.3438           0.9090                   20.3430   24.3446     18.8217   25.8659      0.1562
5     15.5000     16.1960           0.3732                   15.3746   17.0174     13.1833   19.2087     -0.6960
6     16.9000     16.0876           0.5112                   14.9624   17.2129     12.9783   19.1970      0.8124
7      8.6000     10.4953           0.6314                    9.1056   11.8850      7.2808   13.7098     -1.8953
8     17.0000     15.9792           0.7946                   14.2305   17.7280     12.5940   19.3645      1.0208
9     13.7000     13.2807           0.7933                   11.5346   15.0267      9.8968   16.6645      0.4193
10    13.4000     13.9553           0.6118                   12.6087   15.3019     10.7592   17.1514     -0.5553
11    20.3000     19.2197           0.9110                   17.2146   21.2248     15.6952   22.7442      1.0803
12    10.2000     10.7121           0.5657                    9.4670   11.9571      7.5574   13.8667     -0.5121
13     7.4000      5.6860           0.9191                    3.6631    7.7089      2.1513    9.2207      1.7140
14    11.6000     12.2781           0.7637                   10.5972   13.9590      8.9274   15.6288     -0.6781
15    12.3000     14.1829           0.3997                   13.3032   15.0626     11.1537   17.2120     -1.8829
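Output of this form comes from the CLM (confidence limits for the mean) and CLI (prediction limits for an individual observation) options on the MODEL statement; a sketch, assuming the selected three-variable model and the illustrative dataset name wheat:

PROC REG DATA=wheat;
  MODEL y = x1 x2 x3 / P CLM CLI;   /* predicted values, 95% CL Mean, 95% CL Predict */
RUN;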