Title: Polynomial regression models
1Polynomial regression models
- Possible models for when the response function is
curved
2Uses of polynomial models
- When the true response function really is a
polynomial function.
- (Very common!) When the true response function is
unknown or complex, but a polynomial function
approximates the true function well.
3Example
- What is the impact of exercise on the human immune
system?
- Is the amount of immunoglobulin in blood (y) related to
maximal oxygen uptake (x) (in a curved manner)?
4Scatter plot
5A quadratic polynomial regression function
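$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$
- where
- $Y_i$ = amount of immunoglobulin in blood (mg)
- $X_i$ = maximal oxygen uptake (ml/kg)
- the error terms $\varepsilon_i$ satisfy the typical assumptions: independent, normally distributed, equal variance (INE)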
6Estimated quadratic function
7Interpretation of the regression coefficients
- If 0 is a possible x value, then b0 is the
predicted response at x = 0. Otherwise, b0 has no
meaningful interpretation.
- b1 does not have a very helpful interpretation.
It is the slope of the tangent line at x = 0.
- b2, the estimated quadratic coefficient, indicates the up/down direction of the curve:
- b2 < 0 means the curve is concave down
- b2 > 0 means the curve is concave up
8The regression equation is
igg = -1464 + 88.3 oxygen - 0.536 oxygensq

Predictor     Coef  SE Coef      T      P   VIF
Constant   -1464.4    411.4  -3.56  0.001
oxygen       88.31    16.47   5.36  0.000  99.9
oxygensq   -0.5362   0.1582  -3.39  0.002  99.9

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029

Source    DF   Seq SS
oxygen     1  4472047
oxygensq   1   130164
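The summary statistics in this output follow directly from the analysis-of-variance table. The sketch below recomputes them from the sums of squares and degrees of freedom printed above; the variable names are illustrative, and small differences in the last digit can arise because the printed table is rounded.

```python
import math

# Values copied from the analysis-of-variance table above.
ss_regression = 4602211.0
ss_error = 305818.0
ss_total = 4908029.0
df_regression, df_error, df_total = 2, 27, 29

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

s = math.sqrt(ms_error)                          # residual standard deviation S
r_sq = ss_regression / ss_total                  # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # R-Sq(adj)
f_stat = ms_regression / ms_error                # overall F statistic

print(f"S = {s:.1f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_stat:.2f}")
```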
9A multicollinearity problem
Pearson correlation of oxygen and oxygensq = 0.995
10Center the predictors
Mean of oxygen = 50.637

oxygen   oxcent  oxcentsq
  34.6  -16.037   257.185
  45.0   -5.637    31.776
  62.3   11.663   136.026
  58.9    8.263    68.277
  42.5   -8.137    66.211
  44.3   -6.337    40.158
  67.9   17.263   298.011
  58.5    7.863    61.827
  35.6  -15.037   226.111
  49.6   -1.037     1.075
  33.0  -17.637   311.064
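Centering is a simple transformation: subtract the sample mean from each oxygen value, then square the result. The sketch below reproduces the oxcent and oxcentsq columns for the rows displayed above. The helper `pearson` is a hypothetical name written for this illustration. Because only some of the 30 observations are displayed, the correlations it computes describe this subset only and will not match the correlations reported for the full data set.

```python
import math

MEAN_OXYGEN = 50.637  # sample mean reported above (computed from all 30 observations)

# Only the rows displayed above, not the full data set.
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]

oxcent = [x - MEAN_OXYGEN for x in oxygen]   # centered predictor
oxcentsq = [c ** 2 for c in oxcent]          # square of the centered predictor


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


r_raw = pearson(oxygen, [x ** 2 for x in oxygen])
r_centered = pearson(oxcent, oxcentsq)

for x, c, c2 in zip(oxygen, oxcent, oxcentsq):
    print(f"{x:6.1f} {c:8.3f} {c2:9.3f}")
print(f"subset correlation, raw terms:      {r_raw:.3f}")
print(f"subset correlation, centered terms: {r_centered:.3f}")
```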
11Does it really work?
Pearson correlation of oxcent and oxcentsq = 0.219
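When a model has only two predictor terms, the variance inflation factor of each term is 1 / (1 - r^2), where r is their Pearson correlation. The sketch below applies that identity to the two correlations reported in these slides. The function name is illustrative, and the results agree with the VIF column of the regression output only up to rounding of r.

```python
def vif_two_predictors(r):
    """VIF of either term in a two-predictor model with correlation r."""
    return 1.0 / (1.0 - r ** 2)


vif_raw = vif_two_predictors(0.995)       # oxygen and oxygensq
vif_centered = vif_two_predictors(0.219)  # oxcent and oxcentsq

print(f"VIF before centering: {vif_raw:.1f}")
print(f"VIF after centering:  {vif_centered:.1f}")
```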
12A better quadratic polynomial regression function
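$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i$, where $x_i = X_i - \bar{X}$ is the centered predictor
- $\beta_0$ = mean response at the predictor mean
- $\beta_1$ = linear effect coefficient
- $\beta_{11}$ = quadratic effect coefficient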
13The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

Predictor     Coef  SE Coef      T      P  VIF
Constant   1632.20    29.35  55.61  0.000
oxcent      34.000    1.689  20.13  0.000  1.1
oxcentsq   -0.5362   0.1582  -3.39  0.002  1.1

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  4602211  2301105  203.16  0.000
Residual Error  27   305818    11327
Total           29  4908029

Source    DF   Seq SS
oxcent     1  4472047
oxcentsq   1   130164
14Interpretation of the regression coefficients
- b0 is the predicted response at the predictor mean.
- b1 is the estimated slope of the tangent line at
the predictor mean and, typically, is also close to the
estimated slope in the simple linear model.
- b2, the estimated quadratic coefficient, indicates the up/down direction of the curve:
- b2 < 0 means the curve is concave down
- b2 > 0 means the curve is concave up
15Estimated regression function
16Similar estimates
17The relationship between the two forms of the
model
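Original model: $Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$
Centered model: $Y_i = \beta_0^* + \beta_1^* x_i + \beta_{11}^* x_i^2 + \varepsilon_i$, with $x_i = X_i - \bar{X}$ (starred coefficients belong to the centered model)
Where $\beta_0 = \beta_0^* - \beta_1^* \bar{X} + \beta_{11}^* \bar{X}^2$, $\beta_1 = \beta_1^* - 2 \beta_{11}^* \bar{X}$, and $\beta_{11} = \beta_{11}^*$
18 Mean of oxygen = 50.637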
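Expanding the centered quadratic in powers of the original predictor links the two sets of coefficients, and the link can be checked numerically against the two fitted equations reported earlier. The sketch below is a check under the assumption that the printed, rounded coefficients are accurate enough for the comparison; the variable names are illustrative.

```python
x_bar = 50.637  # mean of oxygen

# Coefficients of the centered fit (igg = 1632 + 34.0 oxcent - 0.536 oxcentsq).
b0_c, b1_c, b11_c = 1632.20, 34.000, -0.5362

# Expand b0_c + b1_c*(X - x_bar) + b11_c*(X - x_bar)**2 and collect powers of X.
b0 = b0_c - b1_c * x_bar + b11_c * x_bar ** 2   # constant term
b1 = b1_c - 2 * b11_c * x_bar                   # coefficient of X
b11 = b11_c                                     # coefficient of X**2 is unchanged

print(f"implied original fit: igg = {b0:.1f} + {b1:.2f} oxygen + ({b11:.4f}) oxygensq")
```

The implied values can be compared with the coefficients printed for the uncentered fit (-1464.4, 88.31, -0.5362); any small gap reflects rounding in the printed output.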
21What is the predicted IgG if maximal oxygen uptake is
90?
Predicted Values for New Observations

New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1  2139.6   219.2  (1689.8, 2589.5)  (1639.6, 2639.7) XX

X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations

New Obs  oxcent  oxcentsq
      1    39.4      1549
Extrapolation is even more dangerous with a polynomial function
than with a straight line, because the fitted curve can change
direction beyond the range of the data.
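The prediction at an oxygen uptake of 90 can be reproduced from the centered fit, and the same coefficients show where the fitted parabola turns. The sketch below assumes the rounded coefficients printed in the regression output, so the fitted value agrees with the reported 2139.6 only approximately; `predict_igg` is a hypothetical helper name.

```python
x_bar = 50.637                               # mean of oxygen
b0, b1, b11 = 1632.20, 34.000, -0.5362       # centered-model estimates


def predict_igg(oxygen):
    """Fitted IgG from the centered quadratic (hypothetical helper)."""
    c = oxygen - x_bar
    return b0 + b1 * c + b11 * c ** 2


fit_90 = predict_igg(90)

# The tangent slope b1 + 2*b11*c is zero at c = -b1 / (2*b11).
turning_point = x_bar - b1 / (2 * b11)

print(f"fitted IgG at oxygen = 90: {fit_90:.1f}")
print(f"fitted curve turns downward at oxygen = {turning_point:.1f}")
```

Under these rounded coefficients the turning point falls below 90, so the prediction at 90 is taken on the descending side of the parabola. That is the change of direction the warning refers to.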
22It is possible to overfit the data with
polynomial models.
23It is even theoretically possible to fit the data
perfectly.
If you have n data points with distinct x values, then a polynomial of
order n-1 will fit the data perfectly; that is,
it will pass through each data point.
But good statistical software will keep an
unsuspecting user from fitting such a model:
Error: Not enough non-missing observations
to fit a polynomial of this order; execution
aborted
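The perfect-fit claim can be illustrated without regression software. The sketch below uses Lagrange interpolation, which constructs the unique polynomial of order n-1 through n points with distinct x values. This is a swapped-in technique chosen for illustration, not the least-squares routine the slides refer to, and the five data points are made up.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate at x the order-(n-1) polynomial passing through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical data: n = 5 points, so the polynomial has order 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.3, 1.1, 4.8, 3.9, 7.2]

residuals = [y - lagrange_predict(xs, ys, x) for x, y in zip(xs, ys)]
print(residuals)  # every residual is zero up to floating-point error
```

With zero residuals there are no degrees of freedom left to estimate the error variance, which is one reason software refuses such a fit.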
24The hierarchical approach to model fitting
A widely accepted approach is to fit a higher-order
model and then explore whether a lower-order
(simpler) model is adequate.
For example, is a first-order linear model (a line) adequate?
25The hierarchical approach to model fitting
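If a polynomial term of a given order
is retained, then all related lower-order terms
are also retained. That is, if the quadratic term
is significant, you would use this regression
function, keeping the linear term whether or not it is significant on its own:
$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i$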
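For the IgG example, one way to ask whether a first-order model is adequate is a partial F-test on the quadratic term. Because the squared term was entered last, its sequential sum of squares is the extra sum of squares it explains. The sketch below uses the values printed in the IgG regression output and confirms that the partial F statistic matches the square of the reported t statistic, up to rounding.

```python
# Values copied from the IgG regression output.
seq_ss_quadratic = 130164.0   # Seq SS for the squared term, entered last
mse_full = 11327.0            # mean squared error of the quadratic model
t_quadratic = -3.39           # t statistic reported for the squared term

# Partial F statistic with 1 numerator and 27 denominator degrees of freedom.
f_partial = seq_ss_quadratic / mse_full

print(f"partial F = {f_partial:.2f}, t squared = {t_quadratic ** 2:.2f}")
```

The reported P-value for this term is 0.002, so the quadratic term would be retained, and with it the linear term.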
26Example
- Quality of a product (y): a score between 0 and
100
- Temperature (x1): degrees Fahrenheit
- Pressure (x2): pounds per square inch
28A two-predictor, second-order polynomial
regression function
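$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$
- where
- $Y_i$ = quality
- $X_{i1}$ = temperature
- $X_{i2}$ = pressure
- $\beta_{12}$ = interaction effect coefficient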
29The regression equation is
quality = -5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

Predictor       Coef   SE Coef       T      P     VIF
Constant     -5127.9     110.3  -46.49  0.000
temp          31.096     1.344   23.13  0.000  1154.5
pressure     139.747     3.140   44.50  0.000  1574.5
tempsq     -0.133389  0.006853  -19.46  0.000   973.0
presssq     -1.14422   0.02741  -41.74  0.000  1453.0
tp         -0.145500  0.009692  -15.01  0.000   304.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
30Again, some correlation
          quality    temp  pressure  tempsq  presssq
temp       -0.423
pressure    0.182   0.000
tempsq     -0.434   0.999     0.000
presssq     0.162   0.000     1.000  -0.000
tp         -0.227   0.773     0.632   0.772    0.632

Cell Contents: Pearson correlation
31A better two-predictor, second-order polynomial
regression function
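$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i$
- where
- $Y_i$ = quality
- $x_{i1}$ = centered temperature
- $x_{i2}$ = centered pressure
- $\beta_{12}$ = interaction effect coefficient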
32Reduced correlation
          quality   tcent   pcent  tpcent  tcentsq
tcent      -0.423
pcent       0.182   0.000
tpcent     -0.274   0.000   0.000
tcentsq    -0.355  -0.000   0.000   0.000
pcentsq    -0.762   0.000   0.000   0.000   -0.000

Cell Contents: Pearson correlation
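Correlations of exactly zero among the centered terms are what a balanced design with symmetric factor levels would produce. The sketch below illustrates the effect on a made-up 3 x 3 grid of temperature and pressure settings. The levels are hypothetical and are not the data behind these slides, and `pearson` is an illustrative helper.

```python
import math
from itertools import product


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical balanced design: every temperature paired with every pressure.
runs = list(product([80.0, 90.0, 100.0], [50.0, 55.0, 60.0]))
temp = [t for t, _ in runs]
pressure = [p for _, p in runs]

t_bar = sum(temp) / len(temp)
p_bar = sum(pressure) / len(pressure)
tcent = [t - t_bar for t in temp]
pcent = [p - p_bar for p in pressure]

r_raw = pearson(temp, [t ** 2 for t in temp])             # temp vs tempsq
r_centered = pearson(tcent, [c ** 2 for c in tcent])      # tcent vs tcentsq
r_interaction = pearson(tcent, [a * b for a, b in zip(tcent, pcent)])  # tcent vs tpcent

print(f"raw: {r_raw:.3f}  centered: {r_centered:.3f}  interaction: {r_interaction:.3f}")
```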
33The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

Predictor       Coef   SE Coef       T      P  VIF
Constant     94.9259    0.7224  131.40  0.000
tcent       -0.91611   0.03957  -23.15  0.000  1.0
pcent        0.78778   0.07913    9.95  0.000  1.0
tpcent     -0.145500  0.009692  -15.01  0.000  1.0
tcentsq    -0.133389  0.006853  -19.46  0.000  1.0
pcentsq     -1.14422   0.02741  -41.74  0.000  1.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
36Predicted Values for New Observations

New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1  94.926   0.722  (93.424, 96.428)  (91.125, 98.726)

Values of Predictors for New Observations

New Obs   tcent   pcent  tpcent  tcentsq  pcentsq
      1  0.0000  0.0000  0.0000   0.0000   0.0000
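Because every centered predictor is zero for this new observation, the fit is simply the estimated intercept of the centered model. The prediction interval can be rebuilt from the confidence interval, since both use the same t multiplier and the prediction standard error is sqrt(S^2 + SE Fit^2). The sketch below backs the multiplier out of the printed confidence interval rather than assuming the error degrees of freedom, which this excerpt does not report.

```python
import math

# Values copied from the output above.
fit, se_fit = 94.926, 0.722
ci_lower, ci_upper = 93.424, 96.428
s = 1.679  # residual standard deviation S from the centered regression output

# The CI is fit +/- t * se_fit, so its half-width recovers the t multiplier.
t_mult = (ci_upper - ci_lower) / (2 * se_fit)

# A new response adds the error variance S**2 to the variance of the fit.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi_lower = fit - t_mult * se_pred
pi_upper = fit + t_mult * se_pred

print(f"t multiplier = {t_mult:.3f}")
print(f"95% PI = ({pi_lower:.3f}, {pi_upper:.3f})")
```

The rebuilt limits can be compared with the printed prediction interval (91.125, 98.726); small differences come from rounding in the printed values.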