STATS 330: Lecture 16 - PowerPoint PPT Presentation

About This Presentation
Title:

STATS 330: Lecture 16

Description:

No real evidence of non-linearity, but will check further with gams ... Check for linearity with gams library(mgcv) plot(gam(evap~s(avst) s(maxst) s(maxat) ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 28
Provided by: statAuc
Category:
Tags: stats | gams | lecture

less

Transcript and Presenter's Notes

Title: STATS 330: Lecture 16


1
STATS 330 Lecture 16
Case Study
2
Case study
  • Aim of todays lecture
  • To illustrate the modelling process using the
    evaporation data.

3
The Evaporation data
  • Data in data frame evap.df
  • Aims of the analysis
  • Understand relationships between explanatory
    variables and the response
  • Be able to predict evaporation loss given the
    other variables

4
Case Study Evaporation data
  • Recall from Lecture 15 variables are
  • evap the amount of moisture evaporating from
    the soil in the 24 hour period (response)
  • maxst maximum soil temperature over the 24 hour
    period
  • minst minimum soil temperature over the 24 hour
    period
  • avst average soil temperature over the 24 hour
    period
  • maxat maximum air temperature over the 24 hour
    period
  • minat minimum air temperature over the 24 hour
    period
  • avat average air temperature over the 24 hour
    period
  • maxh maximum humidity over the 24 hour period
  • minh minimum humidity over the 24 hour period
  • avh average humidity over the 24 hour period
  • wind average wind speed over the 24 hour period.

5
Modelling cycle
6
Modelling cycle (2)
  • Our plan of attack
  • Graphical check
  • Suitability for regression
  • Gross outliers
  • Preliminary fit
  • Model selection (for prediction)
  • Transforming if required
  • Outlier check
  • Use model for prediction

7
Step 1 Plots
  • Preliminary plots
  • Want to get an initial idea of suitability of
    data for regression modelling
  • Check for linear relationships, outliers
  • Pairs plots, coplots
  • Data looks OK to proceed, but evap/maxh plot
    looks curved

8
(No Transcript)
9
Points to note
  • Avh has very few values
  • Strong relationships between response and some
    variables (particularly maxh, avst)
  • Not much relationship between response and minst,
    minat, wind
  • strong relationships between min, av and max
  • No obvious outliers

10
Step 2 preliminary fit
  • Coefficients
  • Estimate Std. Error t value
    Pr(gtt)
  • (Intercept) -54.074877 130.720826 -0.414
    0.68164
  • avst 2.231782 1.003882 2.223
    0.03276
  • minst 0.204854 1.104523 0.185
    0.85393
  • maxst -0.742580 0.349609 -2.124
    0.04081
  • avat 0.501055 0.568964 0.881
    0.38452
  • minat 0.304126 0.788877 0.386
    0.70219
  • maxat 0.092187 0.218054 0.423
    0.67505
  • avh 1.109858 1.133126 0.979
    0.33407
  • minh 0.751405 0.487749 1.541
    0.13242
  • maxh -0.556292 0.161602 -3.442
    0.00151
  • wind 0.008918 0.009167 0.973
    0.33733
  • Residual standard error 6.508 on 35 degrees of
    freedom
  • Multiple R-squared 0.8463, Adjusted
    R-squared 0.8023
  • F-statistic 19.27 on 10 and 35 DF, p-value
    2.073e-11

11
(No Transcript)
12
Findings
  • Plots OK, normality dubious
  • Gam plots indicated no transformations
  • Point 31 has quite high Cooks distance but
    removing it doesnt change regression much
  • Model is OK.
  • Could interpret coefficients, but variables
    highly correlated.

13
Step 3 Model selection
  • Use APR
  • Model selected was
  • evap maxat maxh wind
  • However, this model does not fit all that well
    (outliers, non-normality)
  • Try best AIC model
  • evap avst maxst maxat minhmaxh
  • Now proceed to step 4

14
Step 4 Diagnostic checks
  • For a quick check, plot the regression object
    produced by lm

model1.lmlt-lm(evap avst maxst maxat
minhmaxh, dataevap.df) plot(model1.lm)
15
Outliers ? Non-normal?
16
Conclusions?
  • No real evidence of non-linearity, but will check
    further with gams
  • Normal plot looks curved
  • Some largish outliers
  • Points 2, 41 have largish Cooks D

17
Checking linearity
  • Check for linearity with gams

gt library(mgcv) gtplot(gam(evap s(avst)
s(maxst) s(maxat) s(maxh) s(wind),
dataevap.df))
18
Transform avst, maxh ?
19
Remedy
  • Gam plots for avst and maxh are curved
  • Try cubics in these variables
  • Plots look better
  • Cubic terms are significant

20
(No Transcript)
21
gt model2.lmlt-lm(evap poly(avst,3) maxst
maxat minhpoly(maxh,3), dataevap.df) gt
summary(model2.lm) Coefficients
Estimate Std. Error t value Pr(gtt)
(Intercept) 74.6521 25.4308 2.935
0.00577 poly(avst, 3)1 83.0106 27.3221
3.038 0.00441 poly(avst, 3)2 21.4666
8.3097 2.583 0.01399 poly(avst, 3)3
14.1680 7.2199 1.962 0.05749 . maxst
-0.8167 0.1697 -4.814 2.65e-05
maxat 0.4175 0.1177 3.546
0.00111 minh 0.4580 0.3253
1.408 0.16766 poly(maxh, 3)1 -89.0809
20.0297 -4.447 8.02e-05 poly(maxh, 3)2
-10.7374 7.9265 -1.355 0.18398
poly(maxh, 3)3 15.1172 6.3209 2.392
0.02212 --- Residual standard error 5.276
on 36 degrees of freedom Multiple R-squared
0.8961, Adjusted R-squared 0.8701
F-statistic 34.49 on 9 and 36 DF, p-value
4.459e-15
22
New model
  • Lets now adopt model
  • lm(evappoly(avst,3)maxstmaxatpoly(maxh,3)
    wind
  • Outliers are not too bad but lets check

gt influenceplots(model2.lm)
23
(No Transcript)
24
(No Transcript)
25
Deletion of points
  • Points 2, 6, 7, 41 are affecting the fitted
    values, some coefficients. Removing these one at
    a time and refitting indicates that the cubics
    are not very robust, so we revert to the
    non-polynomial model
  • The coefficients of the non-polynomial model are
    fairly stable when we delete these points one at
    a time, so we decide to retain them.

26
Normality?
  • However, the normal plot for the non-polynomial
    model is not very straight WB test has p-value
    0.
  • Normality of polynomial model is better
  • Try predictions with both

27
predict.df data.frame(avst mean(evap.dfavst),
maxst mean(evap.dfmaxst), maxat
mean(evap.dfmaxat), maxh mean(evap.dfmaxh), mi
nh mean(evap.dfminh)) rbind(predict(model1.lm,
predict.df,interval"p" ), predict(model2.lm,
predict.df,interval"p" )) fit lwr
upr 1 34.67391 21.75680 47.59103 1 32.38471
21.39857 43.37084 CV fit fit lwr
upr 1 34.67391 21.02628 48.32154
Write a Comment
User Comments (0)
About PowerShow.com