Polynomial regression models - PowerPoint PPT Presentation

Slides: 37
Provided by: lsi4
Transcript and Presenter's Notes


1
Polynomial regression models
  • Possible models for when the response function is
    curved

2
Uses of polynomial models
  • When the true response function really is a
    polynomial function.
  • (Very common!) When the true response function is
    unknown or complex, but a polynomial function
    approximates the true function well.

3
Example
  • What is the impact of exercise on the human immune
    system?
  • Is the amount of immunoglobulin in blood (y) related to
    maximal oxygen uptake (x) in a curved manner?

4
Scatter plot
5
A quadratic polynomial regression function
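  • Yi = β0 + β1 Xi + β11 Xi^2 + εi
  • where
  • Yi = amount of immunoglobulin in blood (mg)
  • Xi = maximal oxygen uptake (ml/kg)
  • εi = error terms satisfying the typical assumptions:
    independent, normally distributed, equal variance (INE)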

6
Estimated quadratic function
7
Interpretation of the regression coefficients
  • If 0 is a possible x value, then b0 is the
    predicted response at x = 0. Otherwise, b0 has no
    meaningful interpretation.
  • b1 does not have a very helpful interpretation.
    It is the slope of the tangent line at x = 0.
  • b2, the estimated coefficient of the squared term,
    indicates the up/down direction of the curve
  • b2 < 0 means the curve is concave down
  • b2 > 0 means the curve is concave up

8
The regression equation is
igg = - 1464 + 88.3 oxygen - 0.536 oxygensq

Predictor      Coef   SE Coef      T      P   VIF
Constant    -1464.4     411.4  -3.56  0.001
oxygen        88.31     16.47   5.36  0.000  99.9
oxygensq    -0.5362    0.1582  -3.39  0.002  99.9

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source           DF       SS       MS       F      P
Regression        2  4602211  2301105  203.16  0.000
Residual Error   27   305818    11327
Total            29  4908029

Source     DF   Seq SS
oxygen      1  4472047
oxygensq    1   130164
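As a rough check on how the coefficients are read, the fitted quadratic can be evaluated directly. The sketch below uses the rounded coefficients printed in the Minitab output; the function and variable names are illustrative and are not part of the original slides.

```python
# Coefficients copied from the Minitab output (rounded as printed).
B0, B1, B2 = -1464.4, 88.31, -0.5362

def fitted_igg(oxygen):
    """Estimated mean IgG at a given maximal oxygen uptake."""
    return B0 + B1 * oxygen + B2 * oxygen ** 2

def tangent_slope(oxygen):
    """Slope of the fitted curve at a point: b1 + 2*b2*x."""
    return B1 + 2 * B2 * oxygen

# The sign of b2 gives the direction of curvature.
direction = "concave down" if B2 < 0 else "concave up"

# The curve changes direction where the tangent slope is zero.
turning_point = -B1 / (2 * B2)

print(direction)
print(round(turning_point, 1))
print(tangent_slope(0))  # equals b1, the slope of the tangent at x = 0
```

With these rounded values the fitted curve stops rising and starts to fall at an oxygen uptake of roughly 82, which is worth keeping in mind when predicting at large x values.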
9
A multicollinearity problem
Pearson correlation of oxygen and oxygensq = 0.995
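The link between this correlation and the VIF column can be sketched in a few lines. When a model has exactly two predictors, the VIF of each one is 1 / (1 - r^2), where r is their Pearson correlation. The helper name below is an illustrative choice, not something from the slides.

```python
def vif_two_predictors(r):
    """VIF of either predictor in a model with exactly two predictors
    whose Pearson correlation is r."""
    return 1.0 / (1.0 - r ** 2)

# Correlation between oxygen and oxygensq, as reported (rounded to 3 places).
vif = vif_two_predictors(0.995)
print(round(vif, 1))
```

The result is about 100, in line with the VIF of 99.9 that Minitab reports; the small gap is consistent with the correlation having been rounded to three decimals.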
10
Center the predictors
Mean of oxygen = 50.637

oxygen    oxcent  oxcentsq
  34.6   -16.037   257.185
  45.0    -5.637    31.776
  62.3    11.663   136.026
  58.9     8.263    68.277
  42.5    -8.137    66.211
  44.3    -6.337    40.158
  67.9    17.263   298.011
  58.5     7.863    61.827
  35.6   -15.037   226.111
  49.6    -1.037     1.075
  33.0   -17.637   311.064
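The centering step is simple enough to reproduce. The sketch below recomputes the two derived columns for the oxygen values listed on this slide, using the stated mean of 50.637. Note that the slide lists only some of the observations (the ANOVA table's total of 29 degrees of freedom implies 30), so the mean is taken as given rather than recomputed from these rows.

```python
MEAN_OXYGEN = 50.637  # mean of all observations, as stated on the slide

# Oxygen values listed on the slide (a subset of the full data set).
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]

# Centered predictor and its square, rounded to 3 decimals like the slide.
oxcent = [round(x - MEAN_OXYGEN, 3) for x in oxygen]
oxcentsq = [round(c ** 2, 3) for c in oxcent]

for x, c, c2 in zip(oxygen, oxcent, oxcentsq):
    print(f"{x:6.1f} {c:9.3f} {c2:9.3f}")
```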
11
Does it really work?
Pearson correlation of oxcent and oxcentsq = 0.219
12
A better quadratic polynomial regression function
Yi = β0 + β1 xi + β11 xi^2 + εi, where xi = Xi - X-bar is the centered predictor
β0 = mean response at the predictor mean
β1 = linear effect coefficient
β11 = quadratic effect coefficient
13
The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

Predictor      Coef   SE Coef      T      P   VIF
Constant    1632.20     29.35  55.61  0.000
oxcent       34.000     1.689  20.13  0.000   1.1
oxcentsq    -0.5362    0.1582  -3.39  0.002   1.1

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source           DF       SS       MS       F      P
Regression        2  4602211  2301105  203.16  0.000
Residual Error   27   305818    11327
Total            29  4908029

Source     DF   Seq SS
oxcent      1  4472047
oxcentsq    1   130164
14
Interpretation of the regression coefficients
  • b0 is the predicted response at the predictor mean.
  • b1 is the estimated slope of the tangent line at
    the predictor mean and is typically close to the
    estimated slope in the simple linear model.
  • b2, the estimated coefficient of the squared term,
    indicates the up/down direction of the curve
  • b2 < 0 means the curve is concave down
  • b2 > 0 means the curve is concave up

15
Estimated regression function
16
Similar estimates
17
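The relationship between the two forms of the model
Original model: Yi = β0 + β1 Xi + β11 Xi^2 + εi
Centered model: Yi = β0* + β1* xi + β11* xi^2 + εi, with xi = Xi - X-bar (the asterisks mark the coefficients of the centered model)
Where: β0 = β0* - β1* X-bar + β11* X-bar^2, β1 = β1* - 2 β11* X-bar, and β11 = β11*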
18
Mean of oxygen = 50.637
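Expanding the centered quadratic in powers of the original predictor links the two sets of estimates, and the link can be checked numerically against the two Minitab fits. The sketch below starts from the centered coefficients as printed and recovers the uncentered ones; the variable names are illustrative.

```python
X_BAR = 50.637                                  # mean of oxygen
B0_C, B1_C, B11_C = 1632.20, 34.000, -0.5362    # centered fit, as printed

# Expand b0* + b1*(X - xbar) + b11*(X - xbar)^2 in powers of X.
b0 = B0_C - B1_C * X_BAR + B11_C * X_BAR ** 2   # constant term
b1 = B1_C - 2 * B11_C * X_BAR                   # coefficient of X
b11 = B11_C                                     # coefficient of X^2 is unchanged

print(round(b0, 1), round(b1, 2), b11)
```

The recovered values agree with the uncentered output (Constant -1464.4, oxygen 88.31, oxygensq -0.5362) up to the rounding of the printed coefficients. Centering changes the parameterization, not the fitted curve.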
19
(No Transcript)
20
(No Transcript)
21
What is predicted IgG if maximal oxygen uptake is 90?

Predicted Values for New Observations
New Obs     Fit  SE Fit       95.0% CI           95.0% PI
1        2139.6   219.2  (1689.8, 2589.5)  (1639.6, 2639.7) XX

X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations
New Obs  oxcent  oxcentsq
1          39.4      1549

There is an even greater danger in extrapolation
when modeling data with a polynomial function,
because of changes in direction.
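The numbers in this prediction output can be reproduced approximately from quantities already reported. The sketch below assumes the centered fit's printed coefficients, the residual mean square of 11327, the reported SE Fit of 219.2, and a hard-coded 0.975 t quantile for 27 degrees of freedom taken from standard tables (an assumption, since the slides do not print it).

```python
from math import sqrt

X_BAR = 50.637
B0_C, B1_C, B11_C = 1632.20, 34.000, -0.5362  # centered fit, as printed
MSE = 11327.0    # residual mean square from the ANOVA table
SE_FIT = 219.2   # standard error of the fit reported by Minitab
T_27 = 2.0518    # assumed 0.975 t quantile for 27 df (standard tables)

def predict_igg(oxygen):
    """Fitted IgG from the centered quadratic model."""
    xc = oxygen - X_BAR
    return B0_C + B1_C * xc + B11_C * xc ** 2

fit = predict_igg(90)
ci_half = T_27 * SE_FIT                   # half-width, interval for the mean
pi_half = T_27 * sqrt(SE_FIT ** 2 + MSE)  # half-width, interval for a new response

print(round(fit, 1))
print(round(fit - ci_half, 1), round(fit + ci_half, 1))
print(round(fit - pi_half, 1), round(fit + pi_half, 1))
```

The results differ from the Minitab output only in the last digit or so, because the inputs are rounded. The very large SE Fit is the numerical face of the XX warning: 90 lies far from the center of the observed oxygen values.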
22
It is possible to overfit the data with
polynomial models.
23
It is even theoretically possible to fit the data
perfectly.
If you have n data points with distinct x values, then a polynomial of
order n-1 will fit the data perfectly; that is,
it will pass through each data point.
But good statistical software will keep an
unsuspecting user from fitting such a model:
Error: Not enough non-missing observations
to fit a polynomial of this order; execution
aborted
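The perfect-fit claim can be illustrated without any regression software. The sketch below builds the unique polynomial of order n-1 through n points using the Lagrange form. The five data points are made up purely for illustration and have nothing to do with the IgG data.

```python
def lagrange_fit(xs, ys):
    """Return the unique polynomial of order n-1 through n points
    with distinct x values, as a callable (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Made-up data: 5 points, so an order-4 polynomial fits them exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.5, 3.0, 5.5, 4.0]

p = lagrange_fit(xs, ys)
residuals = [y - p(x) for x, y in zip(xs, ys)]

print(max(abs(r) for r in residuals))  # every residual is zero
print(p(6.0))                          # one step beyond the data
```

All residuals vanish, so there are no degrees of freedom left to estimate error. One step outside the data, at x = 6, this polynomial drops to about -20.5 even though every observed y lies between 2 and 5.5, which shows how badly an overfitted polynomial can extrapolate.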
24
The hierarchical approach to model fitting
A widely accepted approach is to fit a higher-order
model and then explore whether a lower-order
(simpler) model is adequate.
Is a first-order linear model (line) adequate?
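For the IgG example, asking whether the line is adequate amounts to testing the quadratic term. A sketch of that check, using only numbers reported in the Minitab output for the quadratic fit: the partial F statistic for adding the squared term is its sequential sum of squares (1 degree of freedom) divided by the residual mean square of the quadratic model, and it should equal the square of the term's t statistic.

```python
SEQ_SS_QUADRATIC = 130164.0  # Seq SS for the squared term, given the linear term
MSE_FULL = 11327.0           # residual mean square of the quadratic model
T_QUADRATIC = -3.39          # t statistic reported for the squared term

# Partial F for one added term: extra regression SS over full-model MSE.
f_partial = SEQ_SS_QUADRATIC / MSE_FULL

print(round(f_partial, 2))
print(round(T_QUADRATIC ** 2, 2))
```

Both values come out near 11.5. Since the reported P-value for the squared term is 0.002, the first-order model would not be judged adequate for these data.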
25
The hierarchical approach to model fitting
But if a polynomial term of a given order
is retained, then all related lower-order terms
are also retained. That is, if the quadratic term
is significant, you would use the regression
function Yi = β0 + β1 Xi + β11 Xi^2 + εi, keeping the
linear term whether or not it is significant on its own.
26
Example
  • Quality of a product (y): a score between 0 and
    100
  • Temperature (x1): degrees Fahrenheit
  • Pressure (x2): pounds per square inch

27
(No Transcript)
28
A two-predictor, second-order polynomial
regression function
  • Yi = β0 + β1 Xi1 + β2 Xi2 + β11 Xi1^2 + β22 Xi2^2
    + β12 Xi1 Xi2 + εi
  • where
  • Yi = quality
  • Xi1 = temperature
  • Xi2 = pressure
  • β12 = interaction effect coefficient
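A second-order function of two predictors has six terms: a constant, two linear terms, two squared terms, and one cross-product (interaction) term. The sketch below shows how the regressor values for one observation might be assembled; the function names and the ordering of the terms are illustrative choices, not taken from the slides.

```python
def second_order_row(x1, x2):
    """Regressor values for one observation, in the order
    [1, x1, x2, x1^2, x2^2, x1*x2]."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

def mean_response(beta, x1, x2):
    """Value of the second-order function for coefficients given
    in the same order as second_order_row."""
    return sum(b * v for b, v in zip(beta, second_order_row(x1, x2)))

# Small made-up inputs, only to show the layout of the terms.
row = second_order_row(2.0, 3.0)
print(row)
```

Fitting the model is then an ordinary multiple regression on these columns, which is why software output lists the squared and cross-product terms as separate predictors.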

29
The regression equation is
quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq
          - 1.14 presssq - 0.145 tp

Predictor       Coef   SE Coef       T      P     VIF
Constant     -5127.9     110.3  -46.49  0.000
temp          31.096     1.344   23.13  0.000  1154.5
pressure     139.747     3.140   44.50  0.000  1574.5
tempsq     -0.133389  0.006853  -19.46  0.000   973.0
presssq     -1.14422   0.02741  -41.74  0.000  1453.0
tp         -0.145500  0.009692  -15.01  0.000   304.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
30
Again, some correlation
          quality    temp  pressure  tempsq  presssq
temp       -0.423
pressure    0.182   0.000
tempsq     -0.434   0.999     0.000
presssq     0.162   0.000     1.000  -0.000
tp         -0.227   0.773     0.632   0.772    0.632

Cell Contents: Pearson correlation
31
A better two-predictor, second-order polynomial
regression function
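  • Yi = β0 + β1 xi1 + β2 xi2 + β11 xi1^2 + β22 xi2^2
    + β12 xi1 xi2 + εi
  • where
  • Yi = quality
  • xi1 = centered temperature (temperature minus its mean)
  • xi2 = centered pressure (pressure minus its mean)
  • β12 = interaction effect coefficient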

32
Reduced correlation
          quality   tcent  pcent  tpcent  tcentsq
tcent      -0.423
pcent       0.182   0.000
tpcent     -0.274   0.000  0.000
tcentsq    -0.355  -0.000  0.000   0.000
pcentsq    -0.762   0.000  0.000   0.000   -0.000

Cell Contents: Pearson correlation
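Correlations of essentially zero are what one would expect when the design is balanced and the levels are symmetric about their mean. The sketch below illustrates this with a hypothetical 3 x 3 grid of settings; the slides do not list the actual temperature and pressure levels, so the values here are assumptions chosen only to show the effect.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# Hypothetical balanced design: each temperature paired with each pressure.
temp_levels = [80.0, 90.0, 100.0]
press_levels = [50.0, 55.0, 60.0]
temp = [t for t in temp_levels for _ in press_levels]
pressure = [p for _ in temp_levels for p in press_levels]

t_bar = sum(temp) / len(temp)
tcent = [t - t_bar for t in temp]

r_raw = pearson(temp, [t ** 2 for t in temp])          # temp vs. its square
r_centered = pearson(tcent, [c ** 2 for c in tcent])   # after centering
r_between = pearson(temp, pressure)                    # balanced design

print(r_raw > 0.99, abs(r_centered) < 1e-9, abs(r_between) < 1e-9)
```

On a raw scale far from zero, a predictor and its square move together almost perfectly. After centering, the squared term is symmetric about zero and no longer tracks the linear term.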
33
The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent
          - 0.133 tcentsq - 1.14 pcentsq

Predictor       Coef   SE Coef       T      P  VIF
Constant     94.9259    0.7224  131.40  0.000
tcent       -0.91611   0.03957  -23.15  0.000  1.0
pcent        0.78778   0.07913    9.95  0.000  1.0
tpcent     -0.145500  0.009692  -15.01  0.000  1.0
tcentsq    -0.133389  0.006853  -19.46  0.000  1.0
pcentsq     -1.14422   0.02741  -41.74  0.000  1.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
34
(No Transcript)
35
(No Transcript)
36
Predicted Values for New Observations
New Obs     Fit  SE Fit       95.0% CI           95.0% PI
1        94.926   0.722  (93.424, 96.428)  (91.125, 98.726)

Values of Predictors for New Observations
New Obs   tcent   pcent  tpcent  tcentsq  pcentsq
1        0.0000  0.0000  0.0000   0.0000   0.0000
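With every centered predictor set to zero, all terms except the constant drop out, so the fit is simply the intercept of the centered model. The two intervals can also be checked against each other. The sketch below backs out the t multiplier implied by the reported confidence interval and uses it, together with S = 1.679, to rebuild the prediction interval. It is a consistency check on the printed numbers, not a reconstruction of how Minitab computes them.

```python
from math import sqrt

B0 = 94.9259           # intercept of the centered fit
SE_FIT = 0.722         # standard error of the fit at the center
S = 1.679              # residual standard deviation
CI = (93.424, 96.428)  # reported 95% confidence interval
PI = (91.125, 98.726)  # reported 95% prediction interval

# t multiplier implied by the reported confidence interval.
t_mult = (CI[1] - CI[0]) / 2 / SE_FIT

# Standard error for predicting a single new response.
se_pred = sqrt(SE_FIT ** 2 + S ** 2)
pi_low = B0 - t_mult * se_pred
pi_high = B0 + t_mult * se_pred

print(round(t_mult, 2))
print(round(pi_low, 2), round(pi_high, 2))
```

The rebuilt limits match the reported prediction interval to within rounding.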