Title: Regression Dr.L.Jeyaseelan Dept. of Biostatistics Christian Medical College Vellore, India
2 Linear Regression
"... a linear regression coefficient indicates the impact of each independent variable on the outcome in the context of (or adjusting for) all other variables."
- J. Concato, A. R. Feinstein, T. R. Holford
3 Overview
- We are often interested in describing the relationship between two variables, and thus predicting the value of one variable for an individual from the value of the other variable.
- Describing the relationship between the values of two variables is called regression.
4 Origin of the Regression Concept
- Sir Francis Galton (1822-1911) used the term regression to explain the relationship between the heights (inches) of fathers and their sons.
- Father-son pairs (n = 1,078)
- Son's height, $Y = 33.73 + 0.516X$, where $X$ is the father's height
- When $X = 74$, $Y \approx 72$ (the son is not as tall as his father)
- When $X = 65$, $Y \approx 67$ (the son is taller than his father)
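As a quick check of the arithmetic on this slide, the sketch below evaluates Galton's fitted line for the two fathers' heights quoted. The function name `predict_son_height` is illustrative, not from the source; the coefficients 33.73 and 0.516 are the ones given above. It also computes the father's height at which the predicted son is exactly as tall, $X = 33.73 / (1 - 0.516)$, which is simply where the line crosses $Y = X$.

```python
# Galton's father-son line as quoted on the slide (heights in inches):
#   son's height = 33.73 + 0.516 * father's height
INTERCEPT = 33.73
SLOPE = 0.516


def predict_son_height(father_height: float) -> float:
    """Predicted son's height (inches) for a given father's height (inches)."""
    return INTERCEPT + SLOPE * father_height


for father in (74, 65):
    son = predict_son_height(father)
    relation = "shorter" if son < father else "taller"
    print(f"father {father} in -> predicted son {son:.1f} in ({relation} than father)")

# Father's height at which the predicted son is exactly as tall (line meets Y = X).
break_even = INTERCEPT / (1 - SLOPE)
print(f"break-even father's height: {break_even:.1f} in")
```

Because the slope is less than 1, fathers taller than this break-even height (about 69.7 in) are predicted to have shorter sons and shorter fathers taller sons: the pull towards the middle that gave regression its name.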
5 Assumptions
- Outcome is normally distributed at each value of the independent variable
- Independent observations
- Relationship between variables is linear
6 Linear Regression Equation
Suppose we want to test whether there is any relation between the birth weight (BW) of a baby and blood pressure (BP). The dependent variable is BP and the independent variable is BW, so the equation will be $BP = a + b(BW)$. That is, given a value of birth weight (BW), the corresponding blood pressure (BP) can be predicted. In mathematics, Y is called a function of X, but in statistics the term regression is used to describe the relationship.
7 So the regression equation will be:
What do these coefficients tell us? The slope b means that for each unit increase in X (birth weight), Y (blood pressure) increases by 25.34 units.
9 Straight Line
The equation of the straight line is $Y = \beta_0 + \beta_1 X$, where $\beta_0$ is the Y-intercept of the line and $\beta_1$ is the slope. The following diagram depicts the relationship between the blood pressure and the drug concentration.
10 The highest line represents the relationship $Y = 20 + 15X$, which describes the effect of drug A on an animal. The quantity of drug is measured in micrograms and the blood pressure in millimetres of mercury. If 4 µg of the drug have been given, then the blood pressure would be $Y = 20 + 15(4) = 80$ mm Hg. If the independent variable equals zero, the dependent variable does not also equal zero but equals $\beta_0$. In the diagram, this corresponds to a blood pressure of 20 mm Hg, which is the normal BP of the animal in the absence of the drug. Obviously, when no drug is administered, the BP should be at the same Y-intercept for both drugs, since the same animal is studied.
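The drug A calculation can be sketched in a few lines. The function `blood_pressure` and its default arguments are illustrative; the intercept 20 mm Hg and slope 15 mm Hg per µg are the values of the highest line described above.

```python
def blood_pressure(dose_ug: float, b0: float = 20.0, b1: float = 15.0) -> float:
    """Blood pressure (mm Hg) on the straight line Y = b0 + b1 * X.

    Defaults are the drug A line: b0 = 20 mm Hg (no drug), b1 = 15 mm Hg per microgram.
    """
    return b0 + b1 * dose_ug


print(blood_pressure(4))  # 80.0 mm Hg after 4 micrograms
print(blood_pressure(0))  # 20.0 mm Hg: with no drug, Y equals the intercept b0
```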
11 In the above equation, $\beta_0$ is called the Y-intercept and $\beta_1$ is called the slope or regression coefficient. In the lowest line, $Y = 20 + 7.5X$, the Y-intercept remains the same but the slope has been halved. We visualize this as the effect of a different drug B on the animal.
(Kleinbaum and Kupper, 1978)
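To see what halving the slope means numerically, the sketch below tabulates both lines at a few doses. The dictionary `LINES` and the chosen doses are illustrative; the coefficients are those of the drug A and drug B lines described above.

```python
# (intercept b0 in mm Hg, slope b1 in mm Hg per microgram) for each line
LINES = {"A": (20.0, 15.0), "B": (20.0, 7.5)}


def predicted_bp(drug: str, dose_ug: float) -> float:
    """Blood pressure on the chosen drug's line Y = b0 + b1 * X."""
    b0, b1 = LINES[drug]
    return b0 + b1 * dose_ug


for dose in (0, 2, 4):
    bp_a = predicted_bp("A", dose)
    bp_b = predicted_bp("B", dose)
    # Same intercept for both; drug B's rise above 20 mm Hg is half of drug A's.
    print(f"dose {dose} ug: A = {bp_a} mm Hg, B = {bp_b} mm Hg")
```

At every dose the two lines start from the same baseline, so the only difference between the drugs is the regression coefficient $\beta_1$.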
12 Test for Slope and Intercept
- The null hypothesis is $H_0: \beta_1 = 0$.
- Wald statistic: $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$
The data showed a 5-unit change in cholesterol level for a one-year increase in age. Is this increase of 5 units confined to this dataset (a chance effect), or is it a real change due to the effect of age?
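The Wald t statistic for the slope is the estimated slope divided by its standard error, compared against a t distribution with $n - 2$ degrees of freedom. The sketch below computes it from first principles. The age and cholesterol values are made-up illustration data, not the data behind this slide, and the function name `slope_t_test` is hypothetical. For a two-sided test at the 5% level with 4 degrees of freedom, the tabulated critical value is about 2.776.

```python
import math


def slope_t_test(x, y):
    """Return (slope, SE of slope, t statistic, degrees of freedom) for H0: beta1 = 0."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                  # residual variance estimate
    se_b1 = math.sqrt(s2 / sxx)         # standard error of the slope
    return b1, se_b1, b1 / se_b1, n - 2


# Hypothetical illustration data: age (years) and cholesterol level
ages = [30, 35, 40, 45, 50, 55]
chol = [160, 190, 185, 215, 220, 245]

b1, se_b1, t, df = slope_t_test(ages, chol)
print(f"slope = {b1:.2f}, SE = {se_b1:.2f}, t = {t:.2f} on {df} df")
```

If $|t|$ exceeds the critical value, the slope is judged significantly different from zero; otherwise the observed change could plausibly be a chance effect of this particular dataset.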
13 Interpretation
For a one-year increase in age, there is a significant 5-unit increase in cholesterol level.
14 Prediction
- Age = 43 years
- Cholesterol = ?
- Cholesterol = 107.55 + (5.25 × Age) = 107.55 + (5.25 × 43) = 333.30
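The prediction step is just an evaluation of the fitted line. The sketch below uses the intercept 107.55 and slope 5.25 given on the slide; the function name is illustrative.

```python
def predict_cholesterol(age_years: float) -> float:
    """Cholesterol predicted by the fitted line: 107.55 + 5.25 * Age."""
    return 107.55 + 5.25 * age_years


print(f"age 43: {predict_cholesterol(43):.2f}")  # 107.55 + 225.75 = 333.30
# Each additional year of age adds the slope, 5.25 units, to the prediction.
print(f"increase from 43 to 44: {predict_cholesterol(44) - predict_cholesterol(43):.2f}")
```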
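15 Principle
Estimated value of Y at $X = X_i$: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the intercept and slope regression parameters to be determined.
The error in predicting an actual observation $Y = Y_i$ at $X = X_i$ is $e_i = Y_i - \hat{Y}_i$.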
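16 Total Sum of Squared Errors (SSE)
$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i$ is the estimated value of Y at $X = X_i$
[Figure: scatter plot of y against x with the fitted line]
Objective: fit the line so that SSE is minimised.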
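Minimising SSE gives the familiar closed-form least-squares estimates $\hat{\beta}_1 = S_{xy} / S_{xx}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, where $S_{xx} = \sum (X_i - \bar{X})^2$ and $S_{xy} = \sum (X_i - \bar{X})(Y_i - \bar{Y})$. The sketch below computes them and checks that nudging either coefficient away from the fitted value increases SSE. The data are made up for illustration and the function names are hypothetical.

```python
def least_squares_fit(x, y):
    """Closed-form least-squares intercept and slope for Y = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Total sum of squared errors between observed Y and the line's estimates."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustration data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(x, y)
best = sse(x, y, b0, b1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {best:.4f}")

# Any other line (here: intercept or slope shifted by 0.1) has a larger SSE.
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(x, y, b0 + db0, b1 + db1) > best
```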
17 Simple (Linear) Regression
- One independent variable
- Age and cholesterol
- Age and BP
- Age and Forced Vital Capacity
18 Multiple (Linear) Regression
- More than one independent variable
- Age, gender, BMI and cholesterol
- Age, height, weight and FVC
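With several independent variables, the least-squares coefficients solve the normal equations $(X^\top X)\,b = X^\top y$, where $X$ has a leading column of ones for the intercept. The sketch below solves them with plain Gauss-Jordan elimination so that it needs no libraries; in practice a statistics package, which typically uses a numerically more stable decomposition such as QR, would be preferred. The function name and the toy data, constructed to satisfy $Y = 1 + 2X_1 - 3X_2$ exactly, are assumptions for illustration. Each fitted coefficient is the effect of its variable adjusting for the others, as in the quotation on slide 2.

```python
def fit_multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] by solving (X'X) b = X'y.

    rows: one list of k predictor values per subject (no intercept column).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend the intercept column
    n, p = len(X), len(X[0])
    # Normal equations A b = c with A = X'X (p x p) and c = X'y (length p)
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a_rk - factor * a_ck for a_rk, a_ck in zip(A[r], A[col])]
                c[r] -= factor * c[col]
    # A is now diagonal, so each coefficient is c[j] / A[j][j]
    return [c[j] / A[j][j] for j in range(p)]


# Hypothetical data generated from Y = 1 + 2*X1 - 3*X2 with no noise
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
print(fit_multiple_regression(rows, y))  # approximately [1.0, 2.0, -3.0]
```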
19 Uses
- Measuring linear association
- Interpolation
- Prediction after controlling for confounders
- Identifying which combination of variables best predicts the response variable (outcome)
20 Misuses
- Extrapolation without assurance that the trend remains the same.
- Using a regression relationship whose slope has been shown not to be significantly different from zero.
- Concluding that a cause-and-effect relationship exists, while the relationship may be merely statistical.
- Applying the relationship established in one group of subjects to another group without assurance that it is applicable to all groups.
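One simple safeguard against the first misuse is to flag any prediction made outside the range of the independent variable observed in the data. The sketch below assumes the cholesterol line from slide 14 and a hypothetical set of observed ages spanning 30 to 60 years; the function name and the ages are illustrative only. Staying inside the range (interpolation) does not by itself guarantee a valid prediction; it only avoids assuming that the trend continues beyond the data.

```python
def predict_checked(b0, b1, x_new, x_observed):
    """Return (prediction, is_extrapolation) for the line Y = b0 + b1 * X.

    is_extrapolation is True when x_new lies outside the observed range of X.
    """
    lo, hi = min(x_observed), max(x_observed)
    return b0 + b1 * x_new, not (lo <= x_new <= hi)


observed_ages = [30, 38, 45, 52, 60]  # hypothetical ages in the fitted dataset

for age in (43, 90):
    y_hat, extrapolated = predict_checked(107.55, 5.25, age, observed_ages)
    note = "EXTRAPOLATION: trend assumed to continue" if extrapolated else "within observed range"
    print(f"age {age}: predicted cholesterol {y_hat:.2f} ({note})")
```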