Title: Regression (least squares)
1 Regression (least squares)
Idea: find a line $Y = a + bX$ that minimizes the sum of squared errors.
Run Intro demo.
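The slides use SAS. This is a minimal Python sketch of the same least squares idea. For a single predictor, the intercept and slope that minimize $\sum_i (Y_i - a - bX_i)^2$ have a closed form: $b = \sum (X_i - \bar X)(Y_i - \bar Y) / \sum (X_i - \bar X)^2$ and $a = \bar Y - b\bar X$. The function names `least_squares_line` and `sse` and the small data set are hypothetical, chosen only for illustration.

```python
def least_squares_line(x, y):
    """Return (a, b) minimizing sum((y_i - a - b*x_i)**2) for one predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


def sse(x, y, a, b):
    """Error sum of squares for the line Y = a + bX."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x, y)
print(a, b, sse(x, y, a, b))
```

Any other choice of (a, b) gives a larger `sse` for these data. That is the sense in which the line is "least squares".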
2 Why would we want to minimize the error sum of squares?
(1) It looks nice
(2) A teacher told me to
(3) It's what everyone else does
(4) None of the above
3 Under certain conditions, least squares minimizes the variances (among unbiased estimators) of the estimated slope and intercept around the true but unknown line relating Y to X.
The conditions: $Y = a + bX + e$, where $e \sim N(0, \sigma^2)$ and the $e$'s are independent.
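"Variance around the true line" means this: you can simulate many data sets that satisfy the stated conditions and watch how the estimated slopes scatter around the true slope. The sketch below does that with NumPy. The true values a = 2, b = 0.5, σ = 1, the X grid, the number of replications and the seed are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(12345)        # fixed seed so the run is reproducible
a_true, b_true, sigma = 2.0, 0.5, 1.0     # assumed "true but unknown" line and error SD
x = np.linspace(0.0, 10.0, 30)
X = np.column_stack([np.ones_like(x), x]) # design matrix: intercept column and X

n_reps = 2000
estimates = np.empty((n_reps, 2))
for k in range(n_reps):
    # Y = a + bX + e with e ~ N(0, sigma^2), drawn independently for each point
    y = a_true + b_true * x + rng.normal(0.0, sigma, size=x.size)
    estimates[k] = np.linalg.lstsq(X, y, rcond=None)[0]  # (intercept, slope)

mean_a, mean_b = estimates.mean(axis=0)
sd_b = estimates[:, 1].std(ddof=1)        # simulated spread of the slope estimates
# Textbook standard error of the slope: sigma / sqrt(sum((x - x_bar)^2))
se_b_theory = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))
print(mean_a, mean_b, sd_b, se_b_theory)
```

The estimates center on the true values. Their simulated spread matches the textbook standard error, and that match depends on the errors being independent.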
4 Distributional assumptions are critical for inference. Why we care: data from JAMA, with Y = age at death and X = length of the lifeline on the hand. Does the lifeline relate to actual lifetime?
Intro part II
6 Is the slope of the line just an estimate of the number 0? If the true slope is 0, what is the probability of seeing a slope as big as this one?
(Pr > |t|: justified ONLY under the regression assumptions!)
Parameter Estimates

Variable   Label      DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept  Intercept   1             79.23341         14.83229      5.34     <.0001
line       Lifeline    1             -1.36697          1.59782     -0.86     0.3965
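The t Value column is Estimate divided by Standard Error. The Python check below recomputes it from the numbers in the table. Turning t into Pr > |t| also needs the error degrees of freedom, which the table does not show. As the slide stresses, that step is valid only under the regression assumptions.

```python
# Parameter estimates and standard errors copied from the table
params = {
    "Intercept": (79.23341, 14.83229),
    "line": (-1.36697, 1.59782),
}

# t ratio = estimate / standard error
t_values = {name: est / se for name, (est, se) in params.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```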
7 Autocorrelation: observations (or residuals) close together in time are related (not independent). Is least squares still best? Are the tests valid? (NO!) What do we do now?
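The sketch below shows what "related" means numerically. It generates errors in which each deviation carries over part of the previous one, $e_t = \rho e_{t-1} + u_t$ (an AR(1) process, chosen here as an assumed example with ρ = 0.7). It then computes the lag-1 sample autocorrelation. For independent deviations this should be near 0; for the AR(1) series it should be near ρ.

```python
import numpy as np

def lag1_autocorrelation(r):
    """Sample lag-1 autocorrelation of a series (deviations about its mean)."""
    r = np.asarray(r, dtype=float)
    d = r - r.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)

rng = np.random.default_rng(7)
n, rho = 5000, 0.7
u = rng.normal(size=n)
e_ar = np.empty(n)
e_ar[0] = u[0] / np.sqrt(1 - rho ** 2)   # start in the stationary distribution
for t in range(1, n):
    e_ar[t] = rho * e_ar[t - 1] + u[t]    # part of today's deviation carries into tomorrow

e_iid = rng.normal(size=n)                # independent deviations for comparison
print(lag1_autocorrelation(e_iid), lag1_autocorrelation(e_ar))
```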
8 Case 1: Eastern wildfires. I look at the smoke content in the air. At 1:00 today the smoke content is higher than usual for this time of year. Is it likely to be higher at 2:00? What does this tell you about independent deviations?
9 Case 2: Floods in Iowa. I notice that the Iowa River is higher than my model predicted for today. Is it likely to be higher again tomorrow? What does this tell you about independent deviations?
10 Iowa River, long term, with the data we will analyze marked
11 Iowa River Data
12 PROC FORECAST
(1) Run a linear regression.
(2) Fit an autoregressive model to the residuals $r_t$. Default 13 lags: $r_t = a_1 r_{t-1} + a_2 r_{t-2} + \dots + a_{13} r_{t-13} + e_t$.
(3) Omit insignificant lags and use the model to forecast residuals from the fitted line.
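Here is a rough Python sketch of the PROC FORECAST steps. It fits a straight line by least squares and regresses the residuals on their own lags. It then forecasts the residuals recursively and adds them to the extrapolated line. This is an assumed simplification, not SAS's algorithm: it keeps a fixed number of lags `p` instead of omitting insignificant ones. The function name `trend_plus_ar_forecast` is hypothetical.

```python
import numpy as np

def trend_plus_ar_forecast(y, p, horizon):
    """Linear trend plus AR(p) on the residuals; forecast `horizon` steps ahead."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    # (1) Linear regression of y on time
    X = np.column_stack([np.ones(n), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    # (2) Regress r_t on r_{t-1}, ..., r_{t-p} (no intercept: residuals average 0)
    lags = np.column_stack([r[p - j : n - j] for j in range(1, p + 1)])
    a = np.linalg.lstsq(lags, r[p:], rcond=None)[0]
    # Forecast residuals recursively, feeding each forecast back in as a lag
    hist = list(r)
    for _ in range(horizon):
        hist.append(sum(a[j] * hist[-1 - j] for j in range(p)))
    r_future = np.array(hist[n:])
    # Extrapolate the fitted line and add the residual forecasts
    t_future = np.arange(n, n + horizon, dtype=float)
    return beta[0] + beta[1] * t_future + r_future, beta, a
```

The residual forecasts shrink toward 0 as the horizon grows, so long-range forecasts fall back onto the fitted line.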
13 PROC FORECAST on Iowa River Data
14 Case 3: Some data. Retail sales for North Carolina.
Features: seasonality, trend, autocorrelation.
An important series for NC's economy!
Demo1
15 Features?
16 Try Linear Regression. A good forecast?
17 Check Residuals!
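18 Idea: seasonal dummy variables
Jan.: $X_1 = 1, X_2 = 0, \dots, X_{12} = 0$
Feb.: $X_1 = 0, X_2 = 1, \dots, X_{12} = 0$
(etc.)
Dec.: $X_1 = 0, X_2 = 0, \dots, X_{12} = 1$
$Y = 10 - 3X_1 - 2X_2 + \dots + 8X_{12}$
Jan.: $Y = 10 - 3 = 7$. Feb.: $Y = 10 - 2 = 8$ (etc.). Dec.: $Y = 10 + 8 = 18$.
Can add trend, etc. When fitting with an intercept, leave one month out as the baseline (as MN1-MN11 does in the AUTOREG example). The 12 dummies always sum to 1, so together they would duplicate the intercept column.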
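Here is the dummy-variable arithmetic written out in Python. With monthly dummies, a month's prediction is the intercept plus that month's coefficient. Only the January, February and December coefficients come from the slide. The slide elides the other months, so `predict` refuses them rather than guessing.

```python
intercept = 10
# Dummy coefficients from the slide: X1 (Jan.), X2 (Feb.), X12 (Dec.).
# The other months are elided on the slide and are not filled in here.
coef = {1: -3, 2: -2, 12: 8}

def dummies(month):
    """Return [X1, ..., X12]: 1 in the position of `month` (1-12), 0 elsewhere."""
    return [1 if m == month else 0 for m in range(1, 13)]

def predict(month):
    """Y = intercept + sum of coefficient * dummy, for the months given on the slide."""
    if month not in coef:
        raise KeyError(f"no coefficient given for month {month}")
    x = dummies(month)
    return intercept + sum(coef[m] * x[m - 1] for m in coef)

for month, name in [(1, "Jan."), (2, "Feb."), (12, "Dec.")]:
    print(name, predict(month))
```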
19 Idea: regress on a trend and seasonal dummy variables. Can use PROC REG or PROC AUTOREG. AUTOREG gives the Durbin-Watson p-value (DWPROB option), and AUTOREG can correct for autocorrelation.

   PROC AUTOREG DATA=SALES;
      MODEL SALES = T MN1-MN11 / DWPROB;
      OUTPUT OUT=OUTR PREDICTED=P RESIDUAL=R LCL=L UCL=U;
   RUN;
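For readers without SAS, here is a NumPy sketch of the regression in `MODEL SALES = T MN1-MN11`. It uses an intercept, a time trend and 11 monthly dummies, with December as the baseline month. It is only an ordinary least squares fit: no Durbin-Watson p-value, no confidence limits, no autocorrelation correction. The helper names and the synthetic series in the usage lines are made up for illustration.

```python
import numpy as np

def trend_seasonal_design(n, first_month=1):
    """Columns: intercept, T = 1..n, MN1..MN11 (December is the baseline)."""
    t = np.arange(1, n + 1)
    month = (first_month - 1 + np.arange(n)) % 12 + 1   # 1 = Jan., ..., 12 = Dec.
    mn = np.column_stack([(month == m).astype(float) for m in range(1, 12)])
    return np.column_stack([np.ones(n), t, mn]), month

def fit_ols(X, y):
    """Least squares coefficients, predictions P and residuals R."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    p = X @ beta
    return beta, p, y - p

# Usage with a made-up monthly series (4 years: trend + seasonal pattern + noise)
X, month = trend_seasonal_design(48)
true_beta = np.r_[100.0, 2.0, np.arange(1, 12) * 5.0]
y = X @ true_beta + np.random.default_rng(1).normal(0.0, 3.0, size=48)
beta, P, R = fit_ols(X, y)
```

The residuals `R` are what the Durbin-Watson test examines next.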
20 Try seasonal dummies. Still autocorrelated? Test with Durbin-Watson (uses residuals r).

Ordinary Least Squares Estimates
SSE               14148607.4   DFE                     131
MSE                   108005   Root MSE          328.64059
SBC               2128.58701   AIC              2089.97944
Regress R-Square      0.8901   Total R-Square       0.8901
Durbin-Watson         0.9622   Pr < DW              <.0001
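The Durbin-Watson statistic is computed from the residuals as $d = \sum_{t=2}^{n} (r_t - r_{t-1})^2 / \sum_{t=1}^{n} r_t^2$. Values near 2 suggest no lag-1 autocorrelation. Values well below 2 suggest positive autocorrelation, because roughly $d \approx 2(1 - \hat\rho)$. By that rough approximation, the reported 0.9622 corresponds to a lag-1 residual autocorrelation of about 0.52. The Python sketch below computes only the statistic; it does not reproduce the exact p-value SAS reports.

```python
import numpy as np

def durbin_watson(r):
    """d = sum_{t=2..n} (r_t - r_{t-1})^2 / sum_{t=1..n} r_t^2."""
    r = np.asarray(r, dtype=float)
    return float(np.sum(np.diff(r) ** 2) / np.sum(r ** 2))

def implied_rho(d):
    """Rough lag-1 residual autocorrelation implied by d, from d ~ 2(1 - rho)."""
    return 1.0 - d / 2.0

print(round(implied_rho(0.9622), 2))
```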
21 Pr < DW is < 0.0001: (positive) autocorrelation is present. Now what? Two choices in SAS: (1) PROC FORECAST (2) PROC AUTOREG with the NLAG= option
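22 PROC FORECAST
(1) Run a linear regression.
(2) Fit an autoregressive model to the residuals $r_t$. Default 13 lags: $r_t = a_1 r_{t-1} + a_2 r_{t-2} + \dots + a_{13} r_{t-13} + e_t$. Why 13 lags?
Example: $r_t = 0.8 r_{t-1} + 0.5 r_{t-12} - 0.4 r_{t-13} + e_t$, that is, $r_t = 0.8 r_{t-1} + 0.5(r_{t-12} - 0.8 r_{t-13}) + e_t$.
With the backshift operator $B$ (where $B r_t = r_{t-1}$), this is $(1 - 0.8B - 0.5B^{12} + 0.4B^{13}) r_t = e_t$. It factors as $(1 - 0.8B)(1 - 0.5B^{12}) r_t = e_t$, so a lag-1 effect combined with a 12-month seasonal effect produces a lag-13 term.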
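The example's operator $1 - 0.8B - 0.5B^{12} + 0.4B^{13}$ equals the product $(1 - 0.8B)(1 - 0.5B^{12})$. You can check this by multiplying the two backshift polynomials. In the NumPy sketch below, each polynomial is an array of coefficients indexed by the power of B. `np.convolve` performs the polynomial multiplication. The helper name `backshift_poly` is hypothetical.

```python
import numpy as np

def backshift_poly(terms, degree):
    """Coefficient array c with c[k] = coefficient of B**k."""
    c = np.zeros(degree + 1)
    for k, v in terms.items():
        c[k] = v
    return c

ar1 = backshift_poly({0: 1.0, 1: -0.8}, 1)          # 1 - 0.8B
seasonal = backshift_poly({0: 1.0, 12: -0.5}, 12)   # 1 - 0.5B^12
product = np.convolve(ar1, seasonal)                # polynomial of degree 13

# Keep only the nonzero terms: powers 0, 1, 12 and 13
nonzero = {k: round(float(v), 10) for k, v in enumerate(product) if abs(v) > 1e-12}
print(nonzero)
```

Moving the lag terms to the right-hand side flips their signs, giving $r_t = 0.8 r_{t-1} + 0.5 r_{t-12} - 0.4 r_{t-13} + e_t$.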
23 PROC FORECAST Results
24 PROC AUTOREG
(1) Can handle any inputs (FORECAST has only linear or quadratic trends).
(2) Can handle seasonality through dummy variables, autocorrelation, or both (FORECAST only uses autocorrelation).
(3) Recomputes the estimates with estimated generalized least squares (EGLS) (FORECAST does not recompute the coefficients).
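Point (3) can be made concrete with a sketch of one classic two-step EGLS scheme for AR(1) errors, in the Cochrane-Orcutt style. It estimates ρ from the OLS residuals, forms quasi-differences $y_t - \hat\rho y_{t-1}$ and $x_t - \hat\rho x_{t-1}$, and reruns least squares. This is an assumed illustration of the idea, not the exact algorithm PROC AUTOREG uses. The function name `cochrane_orcutt` is hypothetical.

```python
import numpy as np

def cochrane_orcutt(X, y):
    """Two-step EGLS for y = X beta + e with AR(1) errors e_t = rho e_{t-1} + u_t.

    X should include an intercept column if one is wanted. Returns (beta, rho).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: ordinary least squares and its residuals
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta_ols
    # Step 2: estimate rho by regressing r_t on r_{t-1}
    rho = np.sum(r[1:] * r[:-1]) / np.sum(r[:-1] ** 2)
    # Step 3: quasi-difference y and every column of X, then refit.
    # The transformed errors u_t are (approximately) independent; the first
    # observation is dropped in this simple version.
    y_star = y[1:] - rho * y[:-1]
    X_star = X[1:] - rho * X[:-1]
    beta_gls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    return beta_gls, rho
```

The refit coefficients are on the original scale of X, because $y_t - \rho y_{t-1} = (x_t - \rho x_{t-1})\beta + u_t$.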
25 PROC AUTOREG: try just seasonal dummy variables and a linear trend
(Pr < DW still < 0.0001)
26 PROC AUTOREG: residuals from the model with seasonal dummies only
Challenge: find what we've missed
27 Challenge: find what we've missed
- Some sort of break near the vertical reference line
- Autocorrelation (?)
- A spike followed (2 periods later) by a similar negative spike
- We can add these to our model in AUTOREG!
- Run demo 4 and look at the data.
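One way to "add these to our model" is to build extra regressor columns. A step (level-shift) dummy is 0 before the break and 1 from the break on. A pulse dummy is 1 at a single time point. The NumPy sketch below uses a series length of 144, which matches the OLS output: DFE = 131 plus 13 estimated parameters (intercept, T, MN1-MN11). The break and spike positions are placeholders, not read from the data, and the helper names are hypothetical.

```python
import numpy as np

def step_dummy(n, start):
    """0 before index `start`, 1 from `start` on (a level shift)."""
    d = np.zeros(n)
    d[start:] = 1.0
    return d

def pulse_dummy(n, at):
    """1 only at index `at` (a one-time spike)."""
    d = np.zeros(n)
    d[at] = 1.0
    return d

n = 144                        # consistent with DFE 131 + 13 parameters
break_at, spike_at = 100, 60   # hypothetical positions, for illustration only
extra = np.column_stack([
    step_dummy(n, break_at),         # break near the reference line
    pulse_dummy(n, spike_at),        # the positive spike
    pulse_dummy(n, spike_at + 2),    # the negative spike 2 periods later
])
```

These columns would be appended to the trend and seasonal dummies. If the two spikes look equal and opposite, a single column with +1 and -1 would spend one parameter instead of two.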
29 PROC AUTOREG
- Run the regression on the predictors
- Diagnose residual correlations
- Refit with generalized least squares
- Predict (including prediction of future residuals!)
- Final model in demo5.
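Predicting future residuals matters because, with AR(1) errors, the residual forecast h steps past the last observation is $\hat\rho^{\,h} r_n$. The correction added to the regression prediction therefore fades as the horizon grows. The Python sketch below assumes an AR(1) error model; the final model in the demo may use other lags. The function name `forecast_with_ar1` is hypothetical.

```python
def forecast_with_ar1(regression_pred, last_residual, rho):
    """Add AR(1) residual forecasts rho**h * r_n to future regression predictions.

    regression_pred: predictions from the regression part for steps h = 1, 2, ...
    last_residual:   r_n, the residual at the last observed time point
    """
    return [p + rho ** h * last_residual
            for h, p in enumerate(regression_pred, start=1)]
```

For example, with a flat regression prediction of 100, a last residual of 4 and ρ = 0.5, the forecasts move back toward 100 as 102, 101, 100.5, ...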