Title: PROC LOESS
1PROC LOESS
October 10, 2001 Charlie Hallahan
2Overview
- fits nonparametric models
- supports use of multidimensional predictors
- supports multiple dependent variables
- supports both direct and interpolated fitting
  using kd trees
- computes confidence limits for predictions
- performs iterative reweighting for robustness
- supports scoring for multiple data sets
3Local Regression the LOESS Method
Assume $y_i = g(x_i) + \epsilon_i$, and that near $x_0$ the
regression function $g(x)$ can be locally
approximated by some parametric function, say
linear or quadratic. Smoothing parameter: the
fraction of neighboring points used in each local
fit.
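For the locally linear case, the fit at $x_0$ can be written as a weighted least-squares problem over the neighborhood. This is a sketch under stated assumptions: $N_q(x_0)$ (the $q$ nearest points, with $q$ roughly the smoothing parameter $s$ times $n$), the weights $w_i(x_0)$, the kernel $W$, and the distance $d_q(x_0)$ to the $q$-th nearest point are notation introduced here, not taken from the slides.

```latex
\[
(\hat\beta_0, \hat\beta_1)
  = \arg\min_{\beta_0,\,\beta_1}
    \sum_{i \in N_q(x_0)} w_i(x_0)\,
    \bigl( y_i - \beta_0 - \beta_1 (x_i - x_0) \bigr)^2,
\qquad
\hat g(x_0) = \hat\beta_0,
\]
\[
w_i(x_0) = W\!\left( \frac{|x_i - x_0|}{d_q(x_0)} \right),
\qquad q \approx s\,n .
\]
```

A local quadratic fit adds a term $\beta_2 (x_i - x_0)^2$ inside the squared residual; the prediction is still $\hat\beta_0$.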
4Local Regression the LOESS Method
- Weighted least squares: the weights come from a
  kernel function.
- Direct implementation: fitting is done at each
  data point. Computationally intensive.
- kd trees: a method to select representative
  points for fitting. Results are blended for the
  observed data points.
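The direct implementation can be sketched in a few lines of Python for one predictor. This is a minimal illustration, not PROC LOESS itself: the tricube kernel, the rounding rule for the neighborhood size, and the function names are assumptions, and kd-tree interpolation is not shown.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(x, y, x0, span):
    """Predict g(x0) from a weighted least-squares line through the nearest points."""
    n = len(x)
    # Assumed rule: neighborhood size is span * n rounded to the nearest integer.
    q = max(2, min(n, int(span * n + 0.5)))
    nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:q]
    h = max(abs(x[i] - x0) for i in nearest)  # distance to the q-th nearest point
    w = [tricube((x[i] - x0) / h) if h > 0 else 1.0 for i in nearest]
    if sum(w) == 0.0:
        w = [1.0] * q  # degenerate case: fall back to equal weights
    sw = sum(w)
    xbar = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
    ybar = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (x[i] - xbar) ** 2 for wi, i in zip(w, nearest))
    if sxx == 0.0:
        return ybar  # only one distinct weighted x value: use the weighted mean
    sxy = sum(wi * (x[i] - xbar) * (y[i] - ybar) for wi, i in zip(w, nearest))
    return ybar + (sxy / sxx) * (x0 - xbar)


def loess_direct(x, y, span):
    """Direct fitting: one local regression per observed data point."""
    return [local_linear_fit(x, y, xi, span) for xi in x]
```

Each call to `local_linear_fit` sorts all n points, so the direct fit costs on the order of n sorts, which is why it is described as computationally intensive.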
5Local Regression the LOESS Method
- Confidence intervals: computed assuming the
  error term is normally distributed.
- Robust estimation: uses iterative reweighting.
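One widely used reweighting scheme is Cleveland's bisquare rule, sketched below. The slides say only that iterative reweighting is used, so treat the bisquare form, the factor of 6, and the function name as assumptions rather than a description of PROC LOESS internals.

```python
from statistics import median


def bisquare_weights(residuals, scale_factor=6.0):
    """Robustness weights: near 1 for small residuals, 0 for gross outliers."""
    s = median([abs(r) for r in residuals])  # median absolute residual
    if s == 0.0:
        return [1.0] * len(residuals)  # perfect fit: nothing to downweight
    weights = []
    for r in residuals:
        u = abs(r) / (scale_factor * s)
        weights.append((1.0 - u * u) ** 2 if u < 1.0 else 0.0)
    return weights
```

In each iteration the kernel weights of the local fit are multiplied by these robustness weights, the model is refit, and new residuals are computed.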
6Exploratory Scatterplot Data Analysis
Use SAS sample data set ENSO, monthly averaged
atmospheric pressure differences between Easter
Island and Darwin, Australia for a period of 168
months (NIST, 1998)
7Exploratory Scatterplot Data Analysis
Compute a LOESS fit for a range of smoothing
parameters.
   ods output OutputStatistics=ENSOstats
              FitSummary=ENSOsummary;
   proc loess data=ENSO;
      model Pressure=Month / smooth=0.02 to 0.2 by 0.01
                             dfmethod=exact;
   run;
8Exploratory Scatterplot Data Analysis
Note: ODS (Output Delivery System) is used to send
output to SAS data sets.
- The Fit Summary table contains information about
  the fitting parameters specified in the MODEL
  statement, plus summary statistics. It is
  displayed in the printed output by default.
- The Output Statistics table contains the fitting
  points and predicted values. Additional
  statistics, such as confidence intervals, are
  included if requested on the MODEL statement.
  It is displayed in the printed output only if
  a DETAILS option is included on the MODEL
  statement.
9Exploratory Scatterplot Data Analysis
Smoothing Parameter: 0.02
Dependent Variable: Pressure

Fit Summary
   Fit Method                         kd Tree
   Blending                           Linear
   Number of Observations             168
   Number of Fitting Points           168
   kd Tree Bucket Size                1
   Degree of Local Polynomials        1
   Smoothing Parameter                0.02000
   Points in Local Neighborhood       3
   Residual Sum of Squares            1.57772E-29
   Trace[L]                           168.00000
   GCV                                .
   AICC                               .
   AICC1                              .
   Delta1                             0
   Delta2                             0
   Equivalent Number of Parameters    168.00000
   Lookup Degrees of Freedom          .
   Residual Standard Error            .
10Exploratory Scatterplot Data Analysis
Scatterplots for LOESS fits with the two extreme
smoothing parameters, 0.02 and 0.2.
0.02 overfits and 0.2 is too smooth; the optimal
value lies in between.
11Exploratory Scatterplot Data Analysis
Strategies for choosing the smoothing parameter:
1. Graph the fit residuals vs. the predictor
   variable and look for no structure left in the
   residuals.
2. Use an automatic method that combines
   goodness of fit with a penalty for model
   complexity.
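One criterion of the second kind is the bias-corrected AIC (AICC) of Hurvich, Simonoff, and Tsai, in the form usually given for local regression. The sketch below computes it from the residual sum of squares, the number of observations, and Trace[L], all of which appear in a Fit Summary table. The function name and the choice to return `None` when the criterion is undefined are assumptions made for illustration.

```python
import math


def aicc(rss, n, trace_l):
    """AICC = log(RSS/n) + 1 + 2*(Trace[L] + 1) / (n - Trace[L] - 2).

    Returns None when the criterion is undefined, for example when the fit
    interpolates the data (RSS of zero) or Trace[L] is too close to n.
    """
    denom = n - trace_l - 2.0
    if rss <= 0.0 or denom <= 0.0:
        return None
    return math.log(rss / n) + 1.0 + 2.0 * (trace_l + 1.0) / denom
```

The first term rewards goodness of fit and the last term penalizes complexity through Trace[L]. With Trace[L] equal to n the penalty is undefined, which would be consistent with the missing criterion values in a Fit Summary where the fit interpolates every point.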
12Exploratory Scatterplot Data Analysis
13Exploratory Scatterplot Data Analysis
14Exploratory Scatterplot Data Analysis
Use the SAS macro SmoothSelect, described in the
SAS documentation and the handout, which selects
the smoothing parameter based on the
bias-corrected AIC. Recall that the Fit Summary
tables are in the ODS-generated output data set
named ENSOsummary. SmoothSelect(ENSOsummary)
selects 0.05 as the optimal smoothing parameter.
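The selection step amounts to picking the smoothing parameter with the smallest criterion value while skipping fits where the criterion is missing. The following Python sketch is a hypothetical analogue, not the SmoothSelect macro; the dictionary keys and the numbers in the example rows are made-up placeholders, not PROC LOESS output.

```python
def select_smoothing(rows, criterion="AICC1"):
    """Return the smoothing parameter whose criterion value is smallest.

    rows: one dict per fit, holding "SmoothingParameter" and the criterion.
    Fits with a missing criterion (None) are ignored.
    """
    candidates = [r for r in rows if r.get(criterion) is not None]
    if not candidates:
        raise ValueError("no fit has a usable " + criterion + " value")
    best = min(candidates, key=lambda r: r[criterion])
    return best["SmoothingParameter"]


# Placeholder rows for illustration only (not real output).
example_rows = [
    {"SmoothingParameter": 0.1, "AICC1": None},
    {"SmoothingParameter": 0.2, "AICC1": 3.0},
    {"SmoothingParameter": 0.3, "AICC1": 1.0},
    {"SmoothingParameter": 0.4, "AICC1": 2.0},
]
best_smoothing = select_smoothing(example_rows)
```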
15Exploratory Scatterplot Data Analysis
16Exploratory Scatterplot Data Analysis
The smoothed fit appears to have a 12-month
cycle. Are there other cycles?
17Exploratory Scatterplot Data Analysis
Filter out the 12-month cycle and examine the residuals.
   data enso(drop=pi);
      set enso;
      pi = 4*atan(1);
      sin1 = sin(2*pi*Month/12);
      cos1 = cos(2*pi*Month/12);
   run;
   proc reg data=enso;
      model Pressure = sin1 cos1;
      output out=enso1 r=FilteredPressure;
   run;
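The same filtering idea can be sketched in Python: regress the series on an intercept plus a sine and cosine term with a 12-month period, and keep the residuals. This is an illustration under assumptions (function names are hypothetical, and ordinary least squares is solved through the normal equations); it is not a translation of PROC REG.

```python
import math


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def filter_cycle(months, values, period=12):
    """Residuals after regressing values on 1, sin(2*pi*m/period), cos(2*pi*m/period)."""
    rows = [[1.0,
             math.sin(2 * math.pi * m / period),
             math.cos(2 * math.pi * m / period)] for m in months]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    beta = solve(xtx, xty)
    return [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, values)]
```

Because 168 months covers whole numbers of both 12-month and 42-month periods, a pure 42-month component passes through this filter unchanged, which is why slower cycles remain visible in the residuals.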
18Exploratory Scatterplot Data Analysis
Fit a LOESS model to the filtered data (after
using the macro SmoothSelect again to find the
optimal smoothing parameter of 0.12).
   ods output OutputStatistics=enso1Stats
              FitSummary=enso1Summary;
   proc loess data=enso1;
      model FilteredPressure=Month / smooth=0.12
                                     dfmethod=exact;
   run;
19Exploratory Scatterplot Data Analysis
Graph the LOESS fit to the filtered data.
   title1 "Filtered ENSO Data";
   symbol1 color=black value=dot i=none h=3.5 pct;
   symbol2 color=blue interpol=join value=none width=2;
   proc gplot data=enso1Stats;
      format DepVar f2.0;
      format Month f3.0;
      plot (DepVar Pred)*Month / overlay
           hminor=0 vminor=0 vaxis=axis1
           href=45 87 129 frame;
      axis1 label=( r=0 a=90 )
            order=(-6 to 6 by 2);
   run;
20Exploratory Scatterplot Data Analysis
Vertical reference lines indicate possible
42-month cycle.
21Exploratory Scatterplot Data Analysis
The data is filtered to remove both the 12-month
and 42-month cycles.
The 42-month cycle is El Niño. There appears to be
a remaining cycle of about 25 months. The cycle
with a period in the mid-twenties is known to
climatologists as the Southern Oscillation.
22Exploratory Scatterplot Data Analysis
Checking normality of the LOESS residuals: add
the option R to the MODEL statement in PROC
LOESS to include the residuals in the ODS
output data set. Use PROC UNIVARIATE for a Q-Q
plot.
   title1 "Normal Probability Plot of the LOESS Fit Residuals";
   proc univariate data=ENSO2stats;
      qqplot residual / normal(mu=est sigma=est);
   run;
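The points behind a normal Q-Q plot can be computed directly: sort the residuals and pair each with a standard normal quantile. The sketch below uses the plotting positions (i - 0.375)/(n + 0.25), a common choice; whether these match PROC UNIVARIATE exactly is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))
```

If the residuals are roughly normal, the points fall near a straight line whose intercept and slope estimate the mean and standard deviation.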
23Exploratory Scatterplot Data Analysis
The residuals appear to be normal. The CLM
option on the PROC LOESS MODEL statement can be
used to get a 95% confidence interval around the
LOESS fit.
24Surface Fitting
The data set SO4 contains measurements in grams
per square meter of sulfate (SO4) deposits
during 1990 at 179 sites throughout the 48
states.
25Surface Fitting
Deposits are concentrated in the Northeast. A
nonparametric fit seems appropriate.
26Surface Fitting
The sulfate measurements are irregularly spaced.
The following statements create a SAS data set
containing a regular grid of points to be scored
for the purpose of plotting.
   data PredGrid;
      do Latitude = 25 to 49 by 1;
         do Longitude = 67 to 124 by 1;
            output;
         end;
      end;
   run;
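For a sense of the grid's size, the same nested loops can be written in Python. This is only an illustration of the grid construction, with a hypothetical function name; the ranges are the ones used in the data step.

```python
def regular_grid(lat_range=(25, 49), lon_range=(67, 124), step=1):
    """All (Latitude, Longitude) pairs on a regular grid, endpoints included."""
    lats = range(lat_range[0], lat_range[1] + 1, step)
    lons = range(lon_range[0], lon_range[1] + 1, step)
    return [(lat, lon) for lat in lats for lon in lons]


grid = regular_grid()
n_points = len(grid)  # 25 latitudes times 58 longitudes
```

The grid has 25 x 58 = 1450 points to score, compared with 179 observation sites.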
27Surface Fitting
A LOESS surface is fit to the SO4 data with the
following statements. The preliminary steps of
determining the best smoothing parameter of 0.12
are not shown.
   ods output OutputStatistics=SO4stats
              ScoreResults=SO4score;
   proc loess data=SO4;
      model SO4=Latitude Longitude / smooth=0.12;
      score data=PredGrid;
   run;
28Surface Fitting
The following statements use the regularly spaced
grid points in SO4Score to create a surface plot
of predicted values.
   title1 "Loess Fit of SO4 Data Scored on a Regular Grid";
   proc g3d data=SO4Score;
      format Latitude f4.0;
      format Longitude f4.0;
      format p_SO4 f4.1;
      plot Longitude*Latitude=p_SO4 / ctop='blue'
           cbottom='green' caxis='black'
           grid zmin=-2 zmax=4
           zticknum=4 tilt=75
           rotate=80;
   run;
29Surface Fitting
Use caution when viewing the graph near the
boundaries, since the scored points there may
not have real observations near them. See the
next slide.
30Surface Fitting
The points C and B are not near any data points,
yet their scored values will be used to get
interpolated values within the rectangle.
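To see why vertex values matter, consider linear blending over a rectangle: every interior value is a weighted average of the four corner values. The sketch below uses plain bilinear interpolation as a stand-in; that this matches the blending PROC LOESS applies within a kd-tree cell is an assumption, and the function and argument names are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, v00, v10, v01, v11):
    """Interpolate inside the rectangle [x0, x1] x [y0, y1].

    v00, v10, v01, v11 are the fitted values at the corners
    (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    bottom = v00 * (1.0 - tx) + v10 * tx  # blend along the lower edge
    top = v01 * (1.0 - tx) + v11 * tx     # blend along the upper edge
    return bottom * (1.0 - ty) + top * ty


center = bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
shifted = bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 8.0)
```

Changing a single corner value moves the interpolated value at the center, so a poorly supported fit at a vertex such as B or C affects predictions throughout the rectangle.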
31Surface Fitting
An alternative to scoring with evenly spaced grid
points is to use the observed data directly. To
do this, use the DIRECT option on the MODEL
statement. Also, local quadratic polynomials can
be fit in place of local lines by using the
DEGREE=2 option. Assume SmoothSelect has been
used to select 0.34 as the best smoothing
parameter.
   proc loess data=SO4;
      ods output OutputStatistics=SO4Stats1;
      model SO4=Latitude Longitude / smooth=0.34
            dfmethod=exact direct degree=2;
   run;
32Surface Fitting
   title1 "Direct Locally Quadratic Loess Fit of SO4 Data";
   proc g3d data=SO4Stats1;
      format Latitude f4.0;
      format Longitude f4.0;
      format Pred f4.1;
      scatter Longitude*Latitude=Pred / zmin=0
              zmax=5 zticknum=3
              grid shape='pyramid'
              caxis='black' ctext='black'
              color='blue' size=0.35
              rotate=80 tilt=75;
   run;
   quit;
33Surface Fitting
34References
To obtain the SAS program, loess.sas, used in
this demo, connect to ftp://ftp.sas.com and
go to the subdirectory /pub/SUGI24. This
example was presented by Robert Cohen of
SAS Institute at SUGI24.