Biostatistics and Computer Applications - PowerPoint PPT Presentation

About This Presentation
Title:

Biostatistics and Computer Applications

Description:

Regression analysis finds the relationship of two or more ... Example: We measured two deciduous trees leaf area and the product of leaf length and width. ...

Slides: 59
Provided by: dafen

Transcript and Presenter's Notes



1
Biostatistics and Computer Applications
Correlation and regression analysis: linear regression, significance tests,
confidence interval estimation, SAS programming
1/07/2003
2
Recap (Analysis of variance)
  • Detect the effect of (response to) treatments
    (levels or combinations of levels of experimental
    factors).
  • Main idea: partition the total variance into
    different components.
  • One-way ANOVA
  • Two-way ANOVA
  • Hierarchical data ANOVA
  • ANOVA for different experimental designs
  • Purpose: detect treatment effects
  • Designs that minimize experimental error.

3
Correlation relationship analysis
  • ANOVA: analysis of the dependent variable (usually
    a continuous variable) in relation to treatment effects
    (categorical variables).
  • Regression analysis finds the relationship between two
    or more continuous variables (for two variables,
    X and Y).

4
Correlation Relationship
  • Function $Y = f(X)$: for every value of variable
    X, there is a fixed value of Y corresponding to X.
    Deterministic models: prediction error is
    negligible. Example: force is exactly mass times
    acceleration, $F = ma$, or the area of a circle is
    $Y = \pi X^2$.
  • Correlation $Y = f(X) + \varepsilon$: under certain conditions,
    for every value of X we do not have a fixed Y,
    but a probability distribution of Y
    associated with X. Probabilistic models. Example: sales
    volume is 10 times advertising spending plus random
    error, $Y = 10X + \varepsilon$.
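The following Python sketch makes the contrast concrete. Python is our choice for the examples; the course itself uses SAS. The coefficient 10 comes from the sales example. The function names and the error standard deviation `sigma = 5.0` are hypothetical and were chosen only for illustration.

```python
import random

def force(mass: float, acceleration: float) -> float:
    """Deterministic model F = m * a: the same inputs always give the same output."""
    return mass * acceleration

def sales(advertising: float, rng: random.Random, sigma: float = 5.0) -> float:
    """Probabilistic model Y = 10X + e, with e drawn from N(0, sigma^2).

    sigma = 5.0 is an assumed value for illustration only.
    """
    return 10 * advertising + rng.gauss(0.0, sigma)

rng = random.Random(2003)  # fixed seed so the run is repeatable
print(force(2.0, 3.0), force(2.0, 3.0))              # identical values
print([round(sales(4.0, rng), 1) for _ in range(3)])  # values scattered around 40
```

Repeated calls with the same X always return the same Y from the deterministic model. The probabilistic model returns a different Y each time, drawn from a distribution centred on $10X$.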

5
Dependent and independent variables
  • In a correlation relationship, if one variable
    responds to variation in another variable, we
    call the response variable the dependent variable
    (Y) and the predictor variable the independent
    variable (X).
  • For example, wheat yield (Y) responds to
    plant density (X), and income (Y) responds to
    education level (X).
  • Regression analysis.
  • If there is no cause-response relationship
    between the two variables, but they co-vary with
    other variables, we do not separate them into
    dependent and independent variables (X, Y).
  • For example, the height and weight of a person.
  • Correlation analysis.

6
Tasks of correlation relationship analysis
  • Regression analysis
  • Develop the regression equation ($\hat{Y} = f(X)$) and
    estimate the standard error of regression $s_{Y/X}$.
  • Correlation analysis
  • Calculate the correlation coefficient r, which
    measures the degree of association between
    two variables.
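This minimal Python sketch computes r as $r = SP/\sqrt{SS_X \, SS_Y}$, where SP is the sum of products. It uses the frog temperature and heart-rate data listed in SAS program 2 of this deck. The function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ssx = sum((x - mx) ** 2 for x in xs)                    # sum of squares of X
    ssy = sum((y - my) ** 2 for y in ys)                    # sum of squares of Y
    return sp / math.sqrt(ssx * ssy)

temperature = [2, 4, 6, 8, 10, 12, 14, 16, 18]
heartrate = [5, 11, 11, 14, 22, 23, 32, 29, 32]
print(round(pearson_r(temperature, heartrate), 3))
```

A value of r close to +1 indicates a strong positive linear association. In correlation analysis neither variable is treated as dependent, so swapping X and Y gives the same r.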

7
Scatter diagram
8
(No Transcript)
9
Background information
  • Historically, the study of predicting one
    measurement from knowledge of another occurred
    before the development of correlation procedures.
  • Sir Francis Galton published a paper, Regression
    towards Mediocrity in Hereditary Stature, in
    1885. His work concentrated on the prediction of
    physical traits of offspring from knowledge of
    the parents' physical traits.
  • A general finding of his work was that children
    tend to "regress toward the mean". For example,
    taller-than-average parents tended to have
    children who were shorter than they were. Likewise,
    shorter-than-average parents tended to have
    children taller than themselves. Because of this
    regression phenomenon, prediction studies came to be
    called regression studies.

10
Background information
  • Today's use of the term regression has nothing
    to do with the biological phenomenon observed by
    Galton. Instead, regression refers to the
    prediction of one measurement based on knowledge
    of another.
  • The regression techniques used now were pioneered
    by Karl Pearson. The most commonly used
    correlation technique today is the
    Pearson product-moment correlation coefficient.
    It applies to numerical data, whether discrete or
    continuous.
  • Categorical data require other correlation
    techniques, such as the contingency coefficient.
    Fully ranked (ordinal) data may be analyzed using
    rank-order correlation methods.

11
Types of regression analysis
  • Regression models
  • 1 explanatory variable: simple regression, linear or
    non-linear
  • 2 or more explanatory variables: multiple regression,
    linear or non-linear
12
Regression Modeling Steps
  • 1. Determine the regression equation
  • 2. Calculate unknown model parameters
  • 3. Estimate standard deviation of regression
  • 4. Test the significance of the regression
    equation
  • 5. Use model for estimation and prediction.

13
Simple Linear Regression
  • Linear regression equation of
    Y on X: $\hat{Y} = a + bX$.
  • a: intercept. When $X = 0$, $\hat{Y} = a$.
  • b: regression coefficient or slope. When X
    increases by 1 unit, Y is expected to change by b units.

14
Simple Linear Regression
  • How would you draw a line through the points?
    How do you determine which line fits best?

15
Least squares method
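  • Best fit means the difference between actual Y
    values and predicted Y values is a minimum.
  • But positive differences offset negative ones, so
    the plain sum of differences is not a useful criterion.
  • Least squares minimizes the sum of the squared
    differences, $Q = SS_e = \sum (Y_i - \hat{Y}_i)^2$.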
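This small Python sketch shows why the plain sum of differences fails as a criterion. The points lie exactly on $Y = 1 + 2X$, and the numbers are made up for illustration. A flat line at $\bar{Y} = 6$ also has residuals summing to zero, yet its Q is much larger. The helper names are hypothetical.

```python
def residuals(a, b, xs, ys):
    """Differences Y - (a + bX) for a candidate line Y = a + bX."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def q(a, b, xs, ys):
    """Least squares criterion Q: sum of squared residuals."""
    return sum(e ** 2 for e in residuals(a, b, xs, ys))

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # illustrative points lying exactly on Y = 1 + 2X

print(sum(residuals(6, 0, xs, ys)), q(6, 0, xs, ys))  # 0 20
print(sum(residuals(1, 2, xs, ys)), q(1, 2, xs, ys))  # 0 0
```

Both lines have residuals summing to 0, but only the true line minimizes Q. This is why least squares works with squared differences.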

16
Least squares estimation
[Figure: least squares estimation, axes X and Y]
17
Fitting Regression Lines
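To obtain the minimum of a function, we find
the values that make its derivatives equal to
zero. So to find the (a, b) that minimizes
$Q = \sum (Y - a - bX)^2$, we set
$\partial Q/\partial a = -2\sum (Y - a - bX) = 0$ and
$\partial Q/\partial b = -2\sum X(Y - a - bX) = 0$,
which gives the normal equations
$na + b\sum X = \sum Y$
$a\sum X + b\sum X^2 = \sum XY$
Solving these equations yields the least squares estimates of a and b.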
18
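Coefficient Equations
Prediction equation: $\hat{Y} = a + bX$
Sample slope: $b = \frac{SP}{SS_X} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
Sample Y-intercept: $a = \bar{Y} - b\bar{X}$
SP: sum of products; $SS_X$: sum of squares of X.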
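As a minimal Python sketch, the coefficient equations can be applied to the pest data exactly as listed in SAS program 1, which has seven rows. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least squares estimates: b = SP / SS_X, a = mean(Y) - b * mean(X)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ssx = sum((x - mx) ** 2 for x in xs)                    # sum of squares of X
    b = sp / ssx
    a = my - b * mx
    return a, b

# PPT/T ratio (X) and pest counts (Y), as listed in SAS program 1
x = [1.58, 9.98, 9.42, 11.01, 1.85, 6.04, 5.92]
y = [180, 28, 25, 40, 160, 120, 80]
a, b = fit_line(x, y)
print(f"Y-hat = {a:.3f} + ({b:.3f})X")
```

The fitted line always passes through $(\bar{X}, \bar{Y})$, because $a = \bar{Y} - b\bar{X}$.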
19
Interpretation of Coefficients
  • 1. Slope (b)
  • Estimated Y changes by b for each 1-unit increase
    in X.
  • If $b = 4$, then Y is expected to increase by 4 for
    each 1-unit increase in X.
  • 2. Y-intercept (a)
  • Average value of Y when $X = 0$.
  • If $a = 2$, then average Y is expected to be 2 when
    X is 0.

20
Properties of a, b and the regression equation
21
Computation Table
22
Example of Parameter Estimation
  • Example: The number of pests changes with climatic
    conditions. The following data are the number of
    pests observed on 100 plants (Y) and the ratio of
    precipitation to temperature (PPT/T, X) over ten
    years at one site. Develop a regression equation.

23
Parameter Estimation Solution
(4.976,109)
Meaning of a,b
24
Standard deviation from regression
Standard error of estimation, or standard
deviation from regression: $s_{Y/X} = \sqrt{Q/(n-2)}$, where
$Q = \sum (Y - \hat{Y})^2$ is the sum of squares due
to deviation from regression.
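This minimal Python sketch computes $s_{Y/X} = \sqrt{Q/(n-2)}$. The divisor is $n - 2$ because two parameters, a and b, are estimated from the data. The function name is hypothetical, and the short data set in the usage line is made up.

```python
import math

def std_error_of_estimate(xs, ys):
    """s_{Y/X} = sqrt(Q / (n - 2)), with Q the residual sum of squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    q = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # deviation from regression
    return math.sqrt(q / (n - 2))

print(std_error_of_estimate([1, 2, 3], [1, 3, 2]))  # illustrative data; Q = 1.5 here
```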
25
Example of standard deviation
About 68.26% of Y observations lie within $\hat{Y} \pm 25.74$, and 95.45% lie within
$\hat{Y} \pm 2 \times 25.74$.
26
Linear Regression Model
  • Assumptions
  • 1. Predictor variables are fixed, i.e., they have the
    same meaning among individuals. The predictor variable
    is measured without error.
  • 2. For each value of the predictor variable,
    there is a normal distribution of outcomes
    (subpopulations), and the variances of these
    distributions are equal.

27
Linear Regression Model
  • Assumptions
  • 3. $\mu_{Y/X} = \alpha + \beta X$, where $\alpha$ and $\beta$
    are constants: the mean of Y changes with X linearly.

28
Linear Regression Model
  • Assumptions
  • 4.

29
Population and Sample Regression Models
[Figure: an unknown relationship in the population is estimated by a regression model fitted to a random sample]
30
Hypothesis test of regression equation
  • Given X and Y data, you can always develop a regression
    equation. Does it reflect a real relationship?
  • Test whether the sample is drawn from a population
    in which Y and X have no correlation relationship.
  • F test
  • Student t test for the regression coefficient.

31
F test
[Figure: population vs. sample]
32
Measures of Variation in Regression
  • Total sum of squares (SST)
  • Measures the variation of observed $Y_i$ around the
    mean $\bar{Y}$
  • Regression sum of squares (U): explained
    variation
  • Variation due to the relationship between X and Y
  • Residual sum of squares (Q): unexplained
    variation
  • Variation due to other factors
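This minimal Python sketch computes the three measures for the frog data in SAS program 2 and checks that $SST = U + Q$, which holds for a least squares fit with an intercept. The function name is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SST, U, Q) for the least squares line of Y on X."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    fitted = [a + b * x for x in xs]
    sst = sum((y - my) ** 2 for y in ys)                 # total: Y_i around mean Y
    u = sum((yh - my) ** 2 for yh in fitted)             # explained by regression
    q = sum((y - yh) ** 2 for y, yh in zip(ys, fitted))  # unexplained (residual)
    return sst, u, q

temperature = [2, 4, 6, 8, 10, 12, 14, 16, 18]
heartrate = [5, 11, 11, 14, 22, 23, 32, 29, 32]
sst, u, q = sums_of_squares(temperature, heartrate)
print(f"SST = {sst:.3f}, U = {u:.3f}, Q = {q:.3f}, U + Q = {u + q:.3f}")
```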

33
Variation Measures
[Figure: observed value $y_i$, its random error $e_i$, and the mean $\bar{y}$]
34
Variation Measures
Unexplained sum of squares: $\sum (Y_i - \hat{Y}_i)^2$
Total sum of squares: $\sum (Y_i - \bar{Y})^2$
Explained sum of squares: $\sum (\hat{Y}_i - \bar{Y})^2$
[Figure: the deviations at $X_i$, axes X and Y]
35
F test
  • The ANOVA table for regression analysis
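The table itself is missing from the transcript. The conventional layout for simple linear regression has three rows:

- Regression: df 1, SS U, MS $U/1$
- Residual: df $n-2$, SS Q, MS $Q/(n-2)$
- Total: df $n-1$, SS SST

The test statistic is $F = MS_{\text{regression}}/MS_{\text{residual}}$. The Python sketch below prints that layout for the frog data in SAS program 2. The function name is hypothetical. Critical F values must be taken from an F table, because the standard library has no F distribution.

```python
def regression_anova(xs, ys):
    """Sums of squares, degrees of freedom and F for simple linear regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    sst = sum((y - my) ** 2 for y in ys)  # total SS, df = n - 1
    u = sp ** 2 / ssx                     # regression SS (= b * SP), df = 1
    q = sst - u                           # residual SS, df = n - 2
    f = (u / 1) / (q / (n - 2))
    return {"U": u, "Q": q, "SST": sst, "df": (1, n - 2, n - 1), "F": f}

temperature = [2, 4, 6, 8, 10, 12, 14, 16, 18]
heartrate = [5, 11, 11, 14, 22, 23, 32, 29, 32]
table = regression_anova(temperature, heartrate)
df_reg, df_res, df_tot = table["df"]
print(f"{'Source':<12}{'df':>4}{'SS':>10}{'MS':>10}{'F':>8}")
print(f"{'Regression':<12}{df_reg:>4}{table['U']:>10.2f}{table['U'] / df_reg:>10.2f}{table['F']:>8.2f}")
print(f"{'Residual':<12}{df_res:>4}{table['Q']:>10.2f}{table['Q'] / df_res:>10.2f}")
print(f"{'Total':<12}{df_tot:>4}{table['SST']:>10.2f}")
```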

36
Example Measures of Variation in Regression
  • Test if the linear regression equation is
    significant.

37
Student t test of Slope
  • Test whether there is a linear relationship between X
    and Y.
  • Hypotheses
  • $H_0: \beta = 0$ (no linear relationship)
  • $H_A: \beta \neq 0$ (linear relationship)
  • Theoretical basis: the sampling distribution of the
    slope b
38
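Slope Test Statistic
$t = \frac{b}{s_b}$, with $s_b = \frac{s_{Y/X}}{\sqrt{SS_X}}$ and $df = n - 2$
Relationship between F test and t test: $F = t^2$ (F with 1 and $n - 2$ degrees of freedom)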
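This minimal Python sketch computes the slope t statistic for the frog data in SAS program 2. It also confirms numerically that $t^2$ equals the regression F. To decide the test, $|t|$ is compared with $t_{\alpha/2,\,n-2}$ from a t table, which the standard library does not provide. The function name is hypothetical.

```python
import math

def slope_t_test(xs, ys):
    """t statistic for H0: beta = 0, plus the regression F, for simple linear regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    q = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
    s_yx = math.sqrt(q / (n - 2))
    s_b = s_yx / math.sqrt(ssx)          # standard error of the slope
    t = b / s_b
    f = (sp ** 2 / ssx) / (q / (n - 2))  # F = (U / 1) / (Q / (n - 2))
    return t, f, n - 2

temperature = [2, 4, 6, 8, 10, 12, 14, 16, 18]
heartrate = [5, 11, 11, 14, 22, 23, 32, 29, 32]
t, f, df = slope_t_test(temperature, heartrate)
print(f"t = {t:.3f} on {df} df, t^2 = {t * t:.3f}, F = {f:.3f}")
```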
39
Example of Slope Test
[Figure: t distribution centred at 0 with two-tailed rejection regions (area .025 in each tail); values -3.355 and 3.355 marked]
40
Confidence Interval of regression
  • Population mean response for given X ($\mu_{Y/X}$)
  • Point on the population regression line
  • Population individual response (Y) for given X
    (prediction interval of Y)
  • Intercept $\alpha$
  • Slope $\beta$

41
Estimation of $\mu_{Y/X}$ and prediction of Y
[Figure: fitted line $\hat{Y}_i = a + bX_i$, showing the mean of Y and an individual predicted Y at $X_i$]
42
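Confidence interval of $\mu_{Y/X}$
$1-\alpha$ confidence interval for $\mu_{Y/X}$ at $X = X_0$:
$\hat{Y}_0 \pm t_{\alpha/2,\,n-2}\, s_{Y/X}\sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_X}}$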
43
Example of CI for $\mu_{Y/X}$
Calculate, for PPT/T $X = 7$, the 95% confidence
interval for $\mu_{Y/X}$.
Influencing factors: level of confidence $(1 - \alpha)$,
data dispersion (s), sample size, and the distance
of $X_i$ from the mean $\bar{X}$.
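This minimal Python sketch applies the confidence interval formula for $\mu_{Y/X}$ at $X = 7$. It uses the pest data exactly as listed in SAS program 1. Those are seven rows, not the ten years mentioned in the example, so $df = 5$ and the tabulated $t_{0.025,5} = 2.571$ is passed in. The results will therefore differ from a calculation on the full ten-year data. The function name is hypothetical.

```python
import math

def mean_response_ci(xs, ys, x0, t_crit):
    """1 - alpha CI for the population mean mu_{Y/X} at X = x0.

    t_crit is t_{alpha/2, n-2}, looked up in a t table by the caller.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    q = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(q / (n - 2))
    y0 = a + b * x0
    half = t_crit * s_yx * math.sqrt(1 / n + (x0 - mx) ** 2 / ssx)
    return y0, y0 - half, y0 + half

x = [1.58, 9.98, 9.42, 11.01, 1.85, 6.04, 5.92]
y = [180, 28, 25, 40, 160, 120, 80]
y0, lo, hi = mean_response_ci(x, y, 7.0, 2.571)  # 2.571 = t_{0.025, 5}
print(f"mu_Y/X at X = 7: {y0:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

The $(X_0 - \bar{X})^2$ term is why the interval is narrowest at $\bar{X}$ and widens as $X_0$ moves away from it.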
44
Why Distance from Mean?
[Figure: intervals widen as X moves away from $\bar{X}$, with greater dispersion than at $X_1$]
45
Prediction Interval of Individual Response Y
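Population individual observation Y: $1-\alpha$
prediction interval at $X = X_0$:
$\hat{Y}_0 \pm t_{\alpha/2,\,n-2}\, s_{Y/X}\sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_X}}$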
46
Why the Extra Term?
[Figure: at $X_i$, the fitted line $\hat{Y}_i = a + bX_i$ gives the expected (mean) value of Y; the individual Y we're trying to predict also includes the random error e]
47
Example of Prediction Interval of Y
Calculate, for $X = 7$, the 95% prediction interval
for the individual observation Y.
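This minimal Python sketch uses the same seven-row pest data from SAS program 1, with $t_{0.025,5} = 2.571$ from a t table. A flag switches between the mean-response interval and the individual prediction interval. The extra 1 under the square root accounts for the random error of a single observation. The function name and flag are hypothetical.

```python
import math

def interval(xs, ys, x0, t_crit, individual=False):
    """CI for mu_{Y/X} (individual=False) or prediction interval for one Y (individual=True)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    q = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(q / (n - 2))
    extra = 1.0 if individual else 0.0  # variance of a single new observation
    half = t_crit * s_yx * math.sqrt(extra + 1 / n + (x0 - mx) ** 2 / ssx)
    y0 = a + b * x0
    return y0 - half, y0 + half

x = [1.58, 9.98, 9.42, 11.01, 1.85, 6.04, 5.92]
y = [180, 28, 25, 40, 160, 120, 80]
print("95% CI for mu_Y/X:", interval(x, y, 7.0, 2.571))
print("95% PI for Y:     ", interval(x, y, 7.0, 2.571, individual=True))
```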
48
Hyperbolic Interval Bands
[Figure: confidence and prediction bands around $\hat{Y}_i = a + bX_i$ are hyperbolic, narrowest at $X = \bar{X}$]
49
Confidence Intervals of Intercept and Slope
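$1-\alpha$ confidence interval for slope: $b \pm t_{\alpha/2,\,n-2}\, s_b$, with $s_b = s_{Y/X}/\sqrt{SS_X}$
$1-\alpha$ confidence interval for intercept: $a \pm t_{\alpha/2,\,n-2}\, s_a$, with $s_a = s_{Y/X}\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{SS_X}}$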
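This minimal Python sketch computes both parameter intervals for the frog data in SAS program 2. There are $n = 9$ observations, so $df = 7$, and the tabulated $t_{0.025,7} = 2.365$ is passed in by hand. In SAS, the CLB option on the MODEL statement requests these limits. The function name is hypothetical.

```python
import math

def coefficient_cis(xs, ys, t_crit):
    """1 - alpha confidence intervals for the intercept a and slope b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    q = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(q / (n - 2))
    s_b = s_yx / math.sqrt(ssx)                    # standard error of b
    s_a = s_yx * math.sqrt(1 / n + mx ** 2 / ssx)  # standard error of a
    return (a, a - t_crit * s_a, a + t_crit * s_a), (b, b - t_crit * s_b, b + t_crit * s_b)

temperature = [2, 4, 6, 8, 10, 12, 14, 16, 18]
heartrate = [5, 11, 11, 14, 22, 23, 32, 29, 32]
(a, a_lo, a_hi), (b, b_lo, b_hi) = coefficient_cis(temperature, heartrate, 2.365)  # t_{0.025, 7}
print(f"intercept {a:.3f}, 95% CI ({a_lo:.3f}, {a_hi:.3f})")
print(f"slope     {b:.3f}, 95% CI ({b_lo:.3f}, {b_hi:.3f})")
```

An interval for the slope that excludes 0 agrees with rejecting $H_0: \beta = 0$ in the two-tailed t test at the same $\alpha$.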
50
Comparison of two regression equations
51
Example of comparison of two regression equations
Example: We measured the leaf area and the product of leaf
length and width for two deciduous trees.
Test whether the relationship between leaf area and the
product length × width differs between the two trees.
52
Example of comparison of two regression equations
53
Example of comparison of two regression equations
As there is no significant difference between the intercepts or
the slopes, we can pool the two data sets
to estimate one equation.
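The transcript does not show the test statistic used. One common approach, assuming the two residual variances are equal (often checked first with an F test), pools the residual sums of squares. It then tests the slope difference with $t = (b_1 - b_2)/\sqrt{s_p^2 (1/SS_{X_1} + 1/SS_{X_2})}$ on $n_1 + n_2 - 4$ df, where $s_p^2 = (Q_1 + Q_2)/(n_1 + n_2 - 4)$. The Python sketch below implements only that slope comparison; intercepts would need a separate test. The function names are hypothetical, and the usage data are made up, not the leaf measurements.

```python
import math

def fit_stats(xs, ys):
    """Slope b, SS_X, residual SS Q and n for one least squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    q = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return b, ssx, q, n

def compare_slopes(xs1, ys1, xs2, ys2):
    """t statistic and df for H0: beta1 = beta2, pooling the residual variances."""
    b1, ssx1, q1, n1 = fit_stats(xs1, ys1)
    b2, ssx2, q2, n2 = fit_stats(xs2, ys2)
    df = n1 + n2 - 4
    s2_pooled = (q1 + q2) / df
    se_diff = math.sqrt(s2_pooled * (1 / ssx1 + 1 / ssx2))
    return (b1 - b2) / se_diff, df

# made-up illustrative data, not the leaf measurements from the example
t, df = compare_slopes([1, 2, 3, 4], [2, 4, 5, 8], [1, 2, 3, 4], [1, 1, 2, 2])
print(f"t = {t:.3f} on {df} df")
```

Compare $|t|$ with $t_{\alpha/2,\,df}$ from a t table. If there is no significant difference, the data sets can be pooled as the slide concludes.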
54
Summary
  • 1. Described the correlation relationship
  • 2. Stated the regression modeling steps
  • 3. Computed regression coefficients
  • 4. Tested the significance of the regression model
  • 5. Estimated confidence intervals.

55
SAS Programming
  • Procedures: PROC REG, PROC GLM.
  • Special procedures such as PROC LOGISTIC, PROC
    RSREG, PROC LIFEREG, PROC ORTHOREG, PROC PHREG,
    PROC SURVEYREG, PROC TRANSREG
  • PROC REG: the common procedure for simple or multiple
    linear regression analysis.

56
PROC REG
  • PROC REG <options> ;
  • <label:> MODEL dependents=<regressors> </
    options> ;
  • VAR variables ;
  • RESTRICT equation, ..., equation ;
  • <label:> MTEST <equation, ..., equation> </
    options> ;
  • <label:> TEST equation, <, ..., equation> </
    option> ;
  • ADD variables ;
  • DELETE variables ;
  • REFIT ;
  • PAINT <condition | ALLOBS> </ options> | <STATUS |
    UNDO> ;
  • PLOT <yvariable*xvariable> <=symbol> <
    ...yvariable*xvariable> <=symbol> </ options> ;
  • PRINT <options> <ANOVA> <MODELDATA> ;
  • OUTPUT <OUT=SAS-data-set> keyword=names <
    ... keyword=names> ;
  • BY variables ;
  • FREQ variable ;
  • ID variables ;
  • WEIGHT variable ;
  • REWEIGHT <condition | ALLOBS> </ options> | <
    STATUS | UNDO> ;

57
SAS program 1
```sas
DATA pest;
  INPUT x y;
  DATALINES;
1.58 180
9.98 28
9.42 25
11.01 40
1.85 160
6.04 120
5.92 80
;
PROC REG DATA=pest SIMPLE CORR;
  MODEL y=x / CLM CLI CLB;
  PLOT y*x;
RUN;
QUIT;
/* CLB: confidence limits for the parameter estimates (slope and intercept)
   CLM: confidence limits for the population mean mu_Y/X
   CLI: prediction limits for an individual Y */
```

58
SAS program 2
```sas
DATA frog;
  INPUT temperature heartrate;
  DATALINES;
2 5
4 11
6 11
8 14
10 22
12 23
14 32
16 29
18 32
;
PROC REG DATA=frog NOPRINT;
  MODEL heartrate=temperature;
  PRINT ALL;
RUN;
QUIT;
```