Title: Forecasting using simple models
1Forecasting using simple models
2Outline
- Basic forecasting models
- The basic ideas behind each model
- When each model may be appropriate
- Illustrate with examples
- Forecast error measures
- Automatic model selection
- Adaptive smoothing methods
- (automatic alpha adaptation)
- Ideas in model-based forecasting techniques
- Regression
- Autocorrelation
- Prediction intervals
3Basic Forecasting Models
- Moving average and weighted moving average
- First order exponential smoothing
- Second order exponential smoothing
- First order exponential smoothing with trends and/or seasonal patterns
- Croston's method
4M-Period Moving Average
- i.e. the average of the last M data points
- Basically assumes a stable (trend free) series
- How should we choose M?
- Advantages of a large M?
- Advantages of a small M?
- Average age of the data is about M/2 (see the code sketch below)
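A minimal sketch of the M-period moving-average forecast described above, assuming the history sits in a plain Python list; the function name and example data are illustrative, not taken from the lecture.

```python
def moving_average_forecast(history, M):
    """Forecast the next period as the average of the last M observations."""
    if len(history) < M:
        raise ValueError("Need at least M observations")
    return sum(history[-M:]) / M

# Hypothetical example: a fairly stable (trend-free) series.
demand = [102, 98, 101, 99, 103, 100, 97, 104]
print(moving_average_forecast(demand, M=4))   # average of the last 4 points
```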
5Weighted Moving Averages
- The w_i are weights attached to each historical data point
- Essentially all known (univariate) forecasting schemes are weighted moving averages
- Thus, don't screw around with the general versions unless you are an expert
6Simple Exponential Smoothing
- P_{t+1}(t) = forecast for time t+1, made at time t
- V_t = actual outcome at time t
- 0 < α < 1 is the smoothing parameter
7Two Views of Same Equation
- P_{t+1}(t) = P_t(t-1) + α·[V_t - P_t(t-1)]
- Adjust the forecast based on the last forecast error
- OR
- P_{t+1}(t) = (1 - α)·P_t(t-1) + α·V_t
- A weighted average of the last forecast and the last actual (sketched in code below)
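Both views of the update collapse to the same few lines of code. A minimal sketch, assuming the actuals are in a Python list and using an illustrative α = 0.2; the variable names are mine, not the slides'.

```python
def simple_exponential_smoothing(actuals, alpha, initial_forecast=None):
    """Return the one-step-ahead forecasts P_{t+1}(t) for a series of actuals."""
    forecast = actuals[0] if initial_forecast is None else initial_forecast
    forecasts = []
    for v in actuals:
        forecasts.append(forecast)                     # forecast made before seeing v
        forecast = forecast + alpha * (v - forecast)   # error-correction view
        # equivalently: forecast = (1 - alpha) * forecast + alpha * v
    return forecasts

series = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0]
print(simple_exponential_smoothing(series, alpha=0.2))
```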
8Simple Exponential Smoothing
- Appropriate when the underlying time series behaves like a constant plus noise: X_t = μ + N_t
- Or when the mean μ is wandering around
- That is, for a quite stable process
- Not appropriate when trends or seasonality are present
9ES would work well here
10Simple Exponential Smoothing
- We can show by recursive substitution that ES can also be written as
  P_{t+1}(t) = α·V_t + α(1-α)·V_{t-1} + α(1-α)²·V_{t-2} + α(1-α)³·V_{t-3} + ...
- It is a weighted average of past observations
- Weights decay geometrically as we go backwards in time
12Simple Exponential Smoothing
- F_{t+1}(t) = α·A_t + α(1-α)·A_{t-1} + α(1-α)²·A_{t-2} + α(1-α)³·A_{t-3} + ...
- A large α adjusts more quickly to changes
- A smaller α provides more averaging and thus lower variance when things are stable
- Exponential smoothing is intuitively more appealing than moving averages
13Exponential Smoothing Examples
14Zero Mean White Noise
17Shifting Mean + Zero Mean White Noise
20Automatic selection of α
- Using historical data
- Apply a range of α values
- For each, calculate the error in one-step-ahead forecasts
- e.g. the root mean squared error (RMSE)
- Select the α that minimizes RMSE (a code sketch of this search follows)
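One way to automate the choice is a plain grid search over α, scoring each candidate by the RMSE of its one-step-ahead (ex-post) forecasts. A sketch under the assumption that the history is a Python list; the grid and example data are illustrative.

```python
import math

def ses_rmse(actuals, alpha):
    """RMSE of one-step-ahead forecasts from simple exponential smoothing."""
    forecast = actuals[0]
    squared_errors = []
    for v in actuals[1:]:
        squared_errors.append((v - forecast) ** 2)
        forecast += alpha * (v - forecast)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def best_alpha(actuals, grid=None):
    """Return the alpha in the grid with the smallest ex-post RMSE."""
    grid = grid or [a / 100 for a in range(5, 100, 5)]   # 0.05, 0.10, ..., 0.95
    return min(grid, key=lambda a: ses_rmse(actuals, a))

history = [12, 11, 13, 12, 14, 13, 12, 15, 14, 13]
print(best_alpha(history))
```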
21RMSE vs Alpha
(chart: RMSE plotted against α from 0 to 1)
22Recommended Alpha
- Typically α should be in the range 0.05 to 0.3
- If the RMSE analysis indicates a larger α, exponential smoothing may not be appropriate
25Might look good, but is it?
28Series and Forecast using α = 0.9
(chart: the series and its forecast plotted over periods 1-16)
29Forecast RMSE vs Alpha
(chart: forecast RMSE plotted against α from 0 to 1)
32Forecast RMSE vs Alpha for Lake Huron Data
(chart: forecast RMSE plotted against α from 0 to 1)
35Forecast RMSE vs Alpha for Monthly Furniture Demand Data
(chart: forecast RMSE plotted against α from 0 to 1)
36Exponential smoothing will lag behind a trend
- Suppose X_t = b_0 + b_1·t
- And S_t = (1 - α)·S_{t-1} + α·X_t
- Can show that S_t eventually lags X_t by b_1·(1 - α)/α
38Double Exponential Smoothing
- Modifies exponential smoothing for following a linear trend
- i.e. smooth the smoothed value
39S_t lags
S_t^(2) (the doubly smoothed value) lags even more
402·S_t - S_t^(2) doesn't lag (see the sketch below)
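A sketch of Brown's double smoothing along the lines described above: smooth the series, smooth the smoothed value, and forecast from 2·S_t - S_t^(2) plus an implied slope so the trend is not lagged. The initialization, α, and example series are illustrative.

```python
def double_exponential_smoothing(actuals, alpha):
    """Brown's double smoothing: returns one-step-ahead forecasts."""
    s1 = s2 = actuals[0]          # single- and double-smoothed values
    forecasts = []
    for v in actuals:
        level = 2 * s1 - s2                       # 2*S_t - S_t^(2): removes the lag
        slope = alpha / (1 - alpha) * (s1 - s2)   # implied trend per period
        forecasts.append(level + slope)           # forecast for the next period
        s1 += alpha * (v - s1)                    # smooth the series
        s2 += alpha * (s1 - s2)                   # smooth the smoothed value
    return forecasts

trend_series = [1.0 + 0.5 * t for t in range(12)]   # a clean linear trend
print(double_exponential_smoothing(trend_series, alpha=0.2))
```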
44Example
45α = 0.2
46Single smoothing lags a trend
47(chart, periods 1-101: the trend, the series data, single smoothing, and double smoothing; single smoothing lags the trend, while double smoothing over-shoots a change and must re-learn the slope)
48Holt-Winters Trend and Seasonal Methods
- Exponential smoothing for data with trend and/or seasonality
- Two models: multiplicative and additive
- Models contain estimates of trend and seasonal components
- Models smooth, i.e. place greater weight on more recent data
49Winters Multiplicative Model
- X_t = (b_1 + b_2·t)·c_t + ε_t
- Where the c_t are seasonal terms that sum to L (the season length) over one season
- Note that the amplitude depends on the level of the series
- Once we start smoothing, the seasonal components may not add to L
50Holt-Winters Trend Model
- X_t = (b_1 + b_2·t) + ε_t
- Same except no seasonal effect
- Works the same as the trend + season model, only simpler
51-54(example charts: a trend of the form (1 + 0.04·t) with levels around 150 and 50)
55- The seasonal terms average 100% (i.e. 1)
- Thus, summed over a season, the c_t must add to L
- Each period we go up or down some percentage of the current level value
- The amplitude increasing with the level seems to occur frequently in practice
56Recall Australian Red Wine Sales
57Smoothing
- In Winters' model, we smooth the permanent component, the trend component, and the seasonal component
- We may have a different smoothing parameter for each (α, β, γ)
- Think of the permanent component as the current level of the series (without trend)
59Current Observation
60Current Observation deseasonalized
61Estimate of the permanent component from last time: last level + slope·1
64observed slope
65the observed slope is blended with the previous slope
68Extend the trend out the desired number of periods ahead
69Use the proper seasonal adjustment
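Pulling slides 59-69 together, here is a sketch of one common form of the multiplicative Holt-Winters update: deseasonalize the current observation, smooth the level, smooth the slope, smooth the seasonal factor, then extend the trend and re-apply the proper seasonal factor. The crude initialization, the parameter values, and the example data are assumptions for illustration.

```python
def holt_winters_multiplicative(series, season_len, alpha, beta, gamma, horizon=1):
    """Sketch of the multiplicative Holt-Winters update; needs >= 2 seasons of data.

    Returns forecasts for `horizon` periods past the end of `series`.
    """
    L = season_len
    level = sum(series[:L]) / L                                  # initial level: first-season mean
    slope = (sum(series[L:2 * L]) - sum(series[:L])) / L ** 2    # crude initial slope
    seasonals = [series[i] / level for i in range(L)]            # initial seasonal factors

    for t in range(L, len(series)):
        x = series[t]
        last_level = level
        # deseasonalize the current observation, then smooth the level
        level = alpha * (x / seasonals[t % L]) + (1 - alpha) * (last_level + slope)
        # smooth the slope: observed slope blended with the previous slope
        slope = beta * (level - last_level) + (1 - beta) * slope
        # smooth the seasonal factor for this position in the season
        seasonals[t % L] = gamma * (x / level) + (1 - gamma) * seasonals[t % L]

    # extend the trend and re-apply the proper seasonal adjustment
    return [(level + h * slope) * seasonals[(len(series) + h - 1) % L]
            for h in range(1, horizon + 1)]

# Illustrative quarterly-style data with growing amplitude (season length 4).
data = [(10 + 0.5 * t) * f for t, f in zip(range(16), [1.2, 0.8, 1.0, 1.0] * 4)]
print(holt_winters_multiplicative(data, season_len=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```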
70Winters Additive Method
- X_t = b_1 + b_2·t + c_t + ε_t
- Where the c_t are seasonal terms that sum to zero over a season
- Similar to the previous model except we smooth estimates of b_1, b_2, and the c_t
71Croston's Method
- Can be useful for intermittent, erratic, or slow-moving demand
- e.g. when demand is zero most of the time (say 2/3 of the time)
- Might be caused by
- Short forecasting intervals (e.g. daily)
- A handful of customers that order periodically
- Aggregation of demand elsewhere (e.g. reorder points)
73Typical situation
- Central spare parts inventory (e.g. military)
- Orders from manufacturer
- in batches (e.g. EOQ)
- periodically when inventory nearly depleted
- long lead times may also affect batch size
74Example
Demand each period follows a distribution that
is usually zero
75Example
76Example
- Exponential smoothing applied (α = 0.2)
77Using Exponential Smoothing
- Forecast is highest right after a non-zero demand occurs
- Forecast is lowest right before a non-zero demand occurs
78Croston's Method
- Separately tracks
- Time between (non-zero) demands
- Demand size when not zero
- Smooths both the time between demands and the demand size
- Combines both for forecasting:
  forecast = (demand size forecast) / (time between demands)
79Define terms
- V(t) = actual demand outcome at time t
- P(t) = predicted demand at time t
- Z(t) = estimate of the demand size (when it is not zero)
- X(t) = estimate of the time between (non-zero) demands
- q = a counter for the number of periods since the last non-zero demand
80Forecast Update
- For a period with zero demand
- Z(t) = Z(t-1)
- X(t) = X(t-1)
- No new information about
- order size Z(t)
- time between orders X(t)
- q = q + 1
- Keep counting time since the last order
81Forecast Update
- For a period with non-zero demand
- Z(t) = Z(t-1) + α·(V(t) - Z(t-1))   (update the order size via smoothing toward the latest order size)
- X(t) = X(t-1) + α·(q - X(t-1))   (update the time between orders via smoothing toward the latest time between orders)
- q = 1   (reset the counter of time between orders)
85Forecast
- Forecast of demand per period: P(t) = Z(t) / X(t) (a code sketch of the full method follows)
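Putting the update rules together, a sketch of Croston's method with the per-period forecast taken as Z(t) / X(t); the initialization from the first non-zero demand, the α, and the demand stream are illustrative choices.

```python
def croston(demands, alpha):
    """Croston's method: returns per-period demand forecasts.

    Tracks demand size Z and time-between-demands X separately,
    smoothing each only when a non-zero demand occurs.
    """
    # Initialize from the first non-zero demand.
    first = next(i for i, d in enumerate(demands) if d > 0)
    z = float(demands[first])   # estimate of demand size when non-zero
    x = float(first + 1)        # estimate of periods between non-zero demands
    q = 1                       # periods since the last non-zero demand
    forecasts = []
    for d in demands[first + 1:]:
        forecasts.append(z / x)                 # forecast of average demand per period
        if d > 0:
            z += alpha * (d - z)                # smooth the order size
            x += alpha * (q - x)                # smooth the time between orders
            q = 1                               # reset the counter
        else:
            q += 1                              # keep counting; no new information
    forecasts.append(z / x)
    return forecasts

demand = [0, 0, 5, 0, 0, 0, 7, 0, 0, 4, 0, 0, 0, 0, 6]
print(croston(demand, alpha=0.2))
```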
86Recall example
- Exponential smoothing applied (α = 0.2)
87Recall example
- Croston's method applied (α = 0.2)
88What is it forecasting?
- Average demand per period
- True average demand per period = 0.176
89Behavior
- Forecast only changes after a demand
- Forecast constant between demands
- Forecast increases when we observe
- A large demand
- A short time between demands
- Forecast decreases when we observe
- A small demand
- A long time between demands
90Croston's Method
- Croston's method assumes demand is independent between periods
- That is, one period looks like the rest
- (or changes slowly)
91Counter Example
- One large customer
- Orders using a reorder point
- The longer we go without an order
- The greater the chances of receiving an order
- In this case we would want the forecast to increase between orders
- Croston's method may not work too well
92Better Examples
- Demand is a function of intermittent random events
- Military spare parts depleted as a result of military actions
- Umbrella stocks depleted as a function of rain
- Demand depending on the start of construction of a large structure
93Is Demand Independent?
- If enough data exists, we can check the distribution of the time between demands
- It should tail off geometrically
94Theoretical behavior
95In our example
96Comparison
97Counterexample
- Croston's method might not be appropriate if the time-between-demands distribution looks like this
98Counterexample
- In this case, as the time without demand approaches 20 periods, we know demand is coming soon
- Our forecast should increase in this case
99Error Measures
- Errors: the difference between the actual and the prediction made one period earlier
- e_t = V_t - P_t(t-1)
- e_t can be positive or negative
- Absolute error |e_t|
- Always positive
- Squared error e_t²
- Always positive
- The percentage error PE_t = 100·e_t / V_t
- Can be positive or negative
100Bias and error magnitude
- Forecasts can be
- Consistently too high or too low (bias)
- Right on average, but with large deviations both positive and negative (error magnitude)
- Should monitor both for changes
101Error Measures
- Look at errors over time
- Cumulative measures, summed or averaged over all the data
- Error Total (ET): measures bias
- Mean Percentage Error (MPE): measures bias
- Mean Absolute Percentage Error (MAPE): measures error magnitude
- Mean Squared Error (MSE): measures error magnitude
- Root Mean Squared Error (RMSE): measures error magnitude
- Smoothed measures reflect errors in the recent past
- Mean Absolute Deviation (MAD): measures error magnitude
104Error Total
- Sum of all errors
- Uses raw (positive or negative) errors
- ET can be positive or negative
- Measures bias in the forecast
- Should stay close to zero, as we saw in the last presentation
105MPE
- Average of percent errors
- Can be positive or negative
- Measures bias, should stay close to zero
106MSE
- Average of squared errors
- Always positive
- Measures magnitude of errors
- Units are demand units squared
107RMSE
- Square root of MSE
- Always positive
- Measures magnitude of errors
- Units are demand units
- Standard deviation of forecast errors
108MAPE
- Average of absolute percentage errors
- Always positive
- Measures magnitude of errors
- Units are percentage
109Mean Absolute Deviation
- Smoothed absolute errors
- Always positive
- Measures magnitude of errors
- Looks at the recent past
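The bias and magnitude measures above are easy to compute side by side. A sketch assuming parallel lists of actuals and one-step-ahead forecasts; MAD is computed here in its smoothed form with an illustrative smoothing constant.

```python
import math

def error_measures(actuals, forecasts, mad_alpha=0.1):
    """Compute bias measures (ET, MPE) and magnitude measures (MAPE, MSE, RMSE, MAD)."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    n = len(errors)
    et = sum(errors)                                                  # Error Total (bias)
    mpe = 100 * sum(e / a for e, a in zip(errors, actuals)) / n       # Mean Percentage Error (bias)
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actuals)) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mad = abs(errors[0])
    for e in errors[1:]:                                              # smoothed absolute errors
        mad += mad_alpha * (abs(e) - mad)
    return {"ET": et, "MPE": mpe, "MAPE": mape, "MSE": mse, "RMSE": rmse, "MAD": mad}

actuals = [100, 105, 98, 110, 102]
forecasts = [97, 104, 101, 106, 103]
print(error_measures(actuals, forecasts))
```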
110Percentage or Actual units
- Often errors naturally increase as the level of the series increases
- This is natural, thus no reason for alarm
- If true, percentage-based measures are preferred
- Actual units are more intuitive
111Squared or Absolute Errors
- Absolute errors are more intuitive
- Standard deviation units less so
- 66% within ± 1 S.D.
- 95% within ± 2 S.D.
- When using measures for automatic model
selection, there are statistical reasons for
preferring measures based on squared errors
112Ex-Post Forecast Errors
- Given
- A forecasting method
- Historical data
- Calculate (some) error measure using the historical data
- Some data is required to initialize the forecasting method
- The rest of the data (if enough) is used to calculate the ex-post forecast errors and the measure
113Automatic Model Selection
- For all possible forecasting methods
- (and possibly for all parameter values, e.g. smoothing constants, but not in SAP?)
- Compute the ex-post forecast error measure
- Select the method with the smallest error (a code sketch of this loop follows)
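A compact sketch of that selection loop, comparing a moving average against simple exponential smoothing by ex-post one-step-ahead RMSE; the candidate set, warm-up handling, and data are illustrative (a real package would also sweep the parameter values).

```python
import math

def one_step_forecasts(history, method, alpha=0.2, M=3):
    """Generate one-step-ahead forecasts over the history for a named method."""
    forecasts = []
    level = history[0]
    for t in range(1, len(history)):
        if method == "moving_average":
            window = history[max(0, t - M):t]
            forecasts.append(sum(window) / len(window))
        else:  # simple exponential smoothing
            forecasts.append(level)
            level += alpha * (history[t] - level)
    return forecasts

def rmse(actuals, forecasts):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(forecasts))

def select_model(history, warmup=3):
    """Score each candidate method on the post-warm-up data and keep the best."""
    scores = {}
    for method in ("moving_average", "exponential_smoothing"):
        f = one_step_forecasts(history, method)
        scores[method] = rmse(history[1 + warmup:], f[warmup:])
    return min(scores, key=scores.get), scores

data = [20, 22, 21, 23, 22, 24, 23, 25, 24, 26]
print(select_model(data))
```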
114Automatic α Adaptation
- Suppose an error measure indicates that behavior has changed
- e.g. the level has jumped up
- The slope of the trend has changed
- We would want to base forecasts on more recent data
- Thus we would want a larger α
115Tracking Signal (TS)
- TS = bias / magnitude: a standardized bias (sketched in code below)
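Assuming the usual definition of the signal as the running error total divided by a smoothed MAD (the slide's bias over magnitude), a minimal sketch:

```python
def tracking_signal(errors, mad_alpha=0.1):
    """Running tracking signal: cumulative error (bias) over smoothed MAD (magnitude)."""
    total = 0.0
    mad = abs(errors[0]) or 1e-9          # avoid division by zero on a zero first error
    signal = []
    for e in errors:
        total += e                        # running error total (bias)
        mad += mad_alpha * (abs(e) - mad) # smoothed absolute deviation (magnitude)
        signal.append(total / mad)
    return signal

errors = [1.0, -0.5, 2.0, 1.5, 3.0, 2.5]  # persistently positive: bias building up
print(tracking_signal(errors))
```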
116α Adaptation
- If TS increases, bias is increasing, thus increase α
- I don't like these methods due to instability
117Model Based Methods
- Find and exploit patterns in the data
- Trend and Seasonal Decomposition
- Time based regression
- Time Series Methods (e.g. ARIMA Models)
- Multiple Regression using leading indicators
- Assumes series behavior stays the same
- Requires analysis (no automatic model
generation)
118Univariate Time Series Models Based on
Decomposition
- V_t = the time series to forecast
- V_t = T_t + S_t + N_t
- Where
- T_t is a deterministic trend component
- S_t is a deterministic seasonal/periodic component
- N_t is a random noise component
119σ(V_t) = 0.257
121Simple Linear Regression Model
V_t = 2.877174 + 0.020726·t
122Use Model to Forecast into the Future
123Residuals: e_t = Actual - Predicted = V_t - (2.877174 + 0.020726·t)
σ(e_t) = 0.211 (a code sketch of the trend fit follows)
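The trend fit and the residual standard deviation can be reproduced with an ordinary least-squares line. A sketch using numpy's polyfit on an illustrative series; the coefficients printed will not match the lecture's, which came from its own data.

```python
import numpy as np

def fit_linear_trend(series):
    """Fit V_t = b0 + b1*t by least squares; return coefficients, fitted values, residuals."""
    t = np.arange(len(series))
    b1, b0 = np.polyfit(t, series, deg=1)      # slope, intercept
    fitted = b0 + b1 * t
    residuals = series - fitted
    return (b0, b1), fitted, residuals

# Illustrative series with a mild upward trend plus noise.
rng = np.random.default_rng(0)
v = 2.9 + 0.02 * np.arange(60) + rng.normal(0, 0.2, 60)

(b0, b1), fitted, resid = fit_linear_trend(v)
print(f"V_t = {b0:.4f} + {b1:.4f} t,  sigma(e_t) = {resid.std():.3f}")
# Forecast into the future by extending the line:
print("next 3 periods:", b0 + b1 * np.arange(60, 63))
```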
124Simple Seasonal Model
- Estimate a seasonal adjustment factor for each period within the season
- e.g. S_September
125Sorted by season
Season averages
126Trend + Seasonal Model
- V_t = 2.877174 + 0.020726·t + S_mod(t,3)
- Where
- S_1 = 0.250726055
- S_2 = -0.242500035
- S_3 = -0.008226125
128e′_t = V_t - (2.877174 + 0.020726·t + S_mod(t,3))
σ(e′_t) = 0.145 (a code sketch of the seasonal-factor step follows)
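The additive seasonal terms S_1..S_3 are just the detrended residuals averaged by position within the season. A sketch of that step on illustrative data with a 3-period season, continuing from a least-squares trend fit.

```python
import numpy as np

def seasonal_factors(series, trend_fitted, season_len):
    """Average the detrended residuals by position within the season (additive model)."""
    resid = np.asarray(series) - np.asarray(trend_fitted)
    factors = np.array([resid[k::season_len].mean() for k in range(season_len)])
    return factors - factors.mean()        # center so the factors sum to zero

# Illustrative: trend plus a repeating 3-period pattern plus noise.
rng = np.random.default_rng(1)
t = np.arange(60)
v = 2.9 + 0.02 * t + np.tile([0.25, -0.24, -0.01], 20) + rng.normal(0, 0.1, 60)

b1, b0 = np.polyfit(t, v, 1)
S = seasonal_factors(v, b0 + b1 * t, season_len=3)
print("seasonal terms:", S)                # roughly recovers the injected pattern
resid = v - (b0 + b1 * t + S[t % 3])
print("sigma(e'_t):", resid.std())
```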
129Can use other trend models
- V_t = β_0 + β_1·sin(2πt/k) (where k is the period)
- V_t = β_0 + β_1·t + β_2·t² (multiple regression)
- V_t = β_0 + β_1·e^(k·t)
- etc.
- Examine the plot, pick a reasonable model
- Test model fit, revise if necessary
132Model V_t = T_t + S_t + N_t
- After extracting the trend and seasonal components we are left with the noise
- N_t = V_t - (T_t + S_t)
- Can we extract any more predictable behavior from the noise?
- Use time series analysis
- Akin to signal processing in EE
133Zero mean and aperiodic: is our best forecast simply the mean (zero)?
134AR(1) Model
- This data was generated using the model
- N_t = 0.9·N_{t-1} + Z_t
- Where Z_t ~ N(0, σ²)
- Thus to forecast N_{t+1}, we could use 0.9·N_t (sketched in code below)
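A sketch of using the fitted AR(1) structure for forecasting: simulate the slide's model, estimate the lag-1 coefficient by least squares, and predict the next value as the estimated coefficient times the current one. The simulation settings are illustrative.

```python
import numpy as np

# Simulate the slide's model N_t = 0.9 N_{t-1} + Z_t, with Z_t ~ N(0, sigma^2).
rng = np.random.default_rng(2)
n = np.zeros(300)
for t in range(1, 300):
    n[t] = 0.9 * n[t - 1] + rng.normal(0, 1.0)

# Estimate phi by regressing N_t on N_{t-1} (least squares, no intercept).
phi_hat = np.dot(n[1:], n[:-1]) / np.dot(n[:-1], n[:-1])

# Forecast the next value: N_hat_{t+1} = phi_hat * N_t.
print(f"phi_hat = {phi_hat:.3f},  forecast for t+1 = {phi_hat * n[-1]:.3f}")
```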
137Time Series Models
- Examine the correlation of the time series with its past values
- This is called autocorrelation
- If N_t is correlated with N_{t-1}, N_{t-2}, ...
- Then we can forecast better than just using the mean
138Sample Autocorrelation Function
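A sketch of the sample autocorrelation function computed directly from its definition; lags whose values fall well outside roughly ±2/√n suggest structure worth modeling. The white-noise series is illustrative.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k = sum((x_t - xbar)(x_{t+k} - xbar)) / sum((x_t - xbar)^2)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([1.0] + [np.dot(xc[:-k], xc[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(3)
white_noise = rng.normal(0, 1, 200)
print(np.round(sample_acf(white_noise, max_lag=10), 2))
print("approx. significance band: +/-", round(2 / np.sqrt(len(white_noise)), 2))
```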
139Back to our Demand Data
140No Apparent Significant Autocorrelation
141Multiple Linear Regression
- V = β_0 + β_1·X_1 + β_2·X_2 + ... + β_p·X_p + ε
- Where
- V is the dependent variable you want to predict
- The X_i are the independent variables you want to use for prediction (known)
- The model is linear in the β_i
142Examples of MLR in Forecasting
- V_t = β_0 + β_1·t + β_2·t² + β_3·sin(2πt/k) + β_4·e^(k·t)
- i.e. a trend model, a function of t
- V_t = β_0 + β_1·X_1t + β_2·X_2t
- Where X_1t and X_2t are leading indicators
- V_t = β_0 + β_1·V_{t-1} + β_2·V_{t-2} + β_12·V_{t-12} + β_13·V_{t-13}
- An autoregressive model
143Example Sales and Leading Indicator
144Example Sales and Leading Indicator
Sales(t) = -3.93 + 0.83·Sales(t-3) - 0.78·Sales(t-2) + 1.22·Sales(t-1) - 5.0·Lead(t)
(a code sketch of fitting such a model follows)
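A model of that form can be fit by ordinary least squares on a design matrix of lagged sales plus the leading indicator. A sketch on simulated data; the fitted coefficients will not match the slide's, which came from the lecture's dataset.

```python
import numpy as np

def fit_lagged_regression(sales, lead, lags=(1, 2, 3)):
    """Fit Sales(t) on Sales(t-1..t-3) and Lead(t) by least squares."""
    p = max(lags)
    y = sales[p:]
    X = np.column_stack(
        [np.ones(len(y))] + [sales[p - k:-k] for k in lags] + [lead[p:]]
    )
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef   # [intercept, beta_lag1, beta_lag2, beta_lag3, beta_lead]

# Illustrative data: sales driven partly by their own past and a leading indicator.
rng = np.random.default_rng(4)
lead = rng.normal(10, 1, 200)
sales = np.zeros(200)
for t in range(3, 200):
    sales[t] = 5 + 0.5 * sales[t - 1] + 2.0 * lead[t] + rng.normal(0, 1)

print(np.round(fit_lagged_regression(sales, lead), 2))
```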