Title: The Economic Sentiment Indicator
1. The Economic Sentiment Indicator
2. ESI (EFN Report)
3. Comments
- Forecasting model
- Integrated process
- Forecasting intervals spread out rapidly
- Point forecasts do not converge to the mean
- The series is bounded: integration implies misspecification
- The 40% interval (and a fortiori higher confidence intervals) contains both trend directions
- Impossible to infer the occurrence of a turning point
- Forecasts are uninformative
4. An Artificial Example (Dynamics Close to Business Surveys)
- Model-Misspecification
- Multi-Step Ahead Forecasting
5. Artificial Time Series (Close to the KOF Economic Barometer)
6. Series Dynamics and Characteristics
- Bounded time series
- As are many important economic time series, rates for example (GDP growth rate, unemployment rate, interest rates, log-returns, ...)
- The best forecast is known
- Identify an ARIMA forecasting model
- TRAMO, X-12-ARIMA
7. Forecasting Model and Diagnostics
8. Problems
- In applications, TRAMO and/or X-12-ARIMA often identify airline models
- Interesting series (for example rates) are often bounded, see examples below
- Here the model is an I(2) process
- Misspecification cannot be detected
- One-step ahead forecasts are good
- σ̂ ≈ 1.16 (true innovations are N(0,1))
- What are the consequences?
- Multi-step ahead perspective
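As a minimal numerical sketch of this point (not the authors' actual setup): Holt's linear-trend smoothing is the reduced form of an I(2) ARIMA(0,2,2) model, so applying it to a bounded noisy cycle illustrates how one-step errors stay small and unremarkable while multi-step extrapolation fails. The series, smoothing constants, and horizon below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 360
t = np.arange(n)
# Illustrative bounded series: a slow cycle plus N(0,1) innovations
# (a stand-in for the artificial business-survey-like series in the slides)
y = 10 * np.sin(2 * np.pi * t / 120) + rng.normal(0, 1, n)

# Holt's linear-trend smoothing: the reduced form of an I(2) ARIMA(0,2,2) model
alpha, beta = 0.3, 0.1
level, slope = y[0], 0.0
h = 24
one_step, multi_step = [], []
for i in range(1, n):
    one_step.append(y[i] - (level + slope))        # one-step ahead error
    if i - 1 + h < n:                              # h-step linear extrapolation
        multi_step.append(y[i - 1 + h] - (level + h * slope))
    prev = level
    level = alpha * y[i] + (1 - alpha) * (level + slope)
    slope = beta * (level - prev) + (1 - beta) * slope

e1 = float(np.mean(np.abs(one_step[h:])))          # skip burn-in
e24 = float(np.mean(np.abs(multi_step[h:])))
print(f"mean |error|: 1-step {e1:.2f}, {h}-step {e24:.2f}")
# One-step errors stay near the innovation scale, so residual diagnostics
# do not reveal the misspecification; multi-step errors are far larger.
```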
9. Multi-Step Ahead Forecasts, 0 Months after TP1 of the Cycle
10. Multi-Step Ahead Forecasts, 6 Months after TP1 of the Cycle
11. Multi-Step Ahead Forecasts, 1 Year after TP1 of the Cycle
12. Multi-Step Ahead Forecasts, 20 Months after TP1 and 0 Months after TP2
13. Multi-Step Ahead Forecasts, 3 Months after TP2
14. Multi-Step Ahead Forecasts, 6 Months after TP2
15. Comments
- One-step ahead forecasts are good
- σ̂ ≈ 1.16
- Poor multi-step ahead performance
- The first turning point TP1 is detected only after 20 months
- A false positive trend slope is suggested when the second (downturn) TP2 occurs
- The down-slope after TP2 is detected with a 6-month delay
- The low-frequency part (cycle) is completely misspecified
- The model assumes the spectral mass lies at frequency zero
16. Multi-Step Ahead 95% Interval Forecasts, 6 Months after TP2
17. Multi-Step Ahead 50% Interval Forecasts, 6 Months after TP2
18. Comments
- Forecast intervals spread out much too rapidly
- The true intervals are of constant width
- The width of the misspecified intervals is O(h^{3/2}), where h is the forecasting horizon
- It is impossible to assert the occurrence of TPs
- Even 50% intervals are completely uninformative (they spread out too fast)
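The O(h^{3/2}) growth can be checked directly from the psi-weights of a pure I(2) process, a simplification of the airline model discussed in the slides: for (1-B)²y_t = e_t the weights are ψ_j = j+1, so the h-step forecast-error variance grows like h³/3 and the interval width like h^{3/2}.

```python
import numpy as np

# psi-weights of the pure I(2) process (1-B)^2 y_t = e_t are psi_j = j + 1,
# so Var(h-step error) = sigma^2 * sum_{j<h} (j+1)^2 ~ h^3 / 3 for large h
h = np.arange(1, 201)
var_h = np.array([np.sum((np.arange(k) + 1.0) ** 2) for k in h])
width = 2 * 1.96 * np.sqrt(var_h)   # 95% interval width, sigma = 1
ratio = width / h ** 1.5
print(ratio[0], ratio[-1])          # ratio stabilizes: width is O(h^{3/2})
```

For large h the ratio converges to 2·1.96/√3 ≈ 2.26, confirming the h^{3/2} rate; a correctly specified bounded (stationary) model would instead produce intervals of asymptotically constant width.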
19. Conclusions
- Misspecification cannot be detected
- Statistics based on one-step ahead performances are not well suited for most practically relevant forecasting applications
- One-step ahead performances are good
- Mean reversion of the time series cannot be captured by the misspecified model
- Turning points are detected much too late
- Performance at TPs is particularly poor
- A linear forecast cannot capture curvature
- Forecast intervals spread out much too rapidly
- Completely uninformative (even the 50% intervals)
20. NN3
21. Receive Updates
- www.neural-forecasting-competition.com
Objectives: Forecast a set of 111 economic time series as accurately as possible, using methods from computational intelligence and a consistent methodology. We hope to evaluate progress in modelling neural networks for forecasting and to disseminate knowledge on best practices. The competition is conducted for academic purposes and supported by a grant from SAS and the International Institute of Forecasters (IIF).
Methods: The prediction competition is open to all methods of computational intelligence, incl. feed-forward and recurrent neural networks, fuzzy predictors, evolutionary and genetic algorithms, decision and regression trees, support vector regression, hybrid approaches etc., as used in financial forecasting, statistical prediction, and time series analysis.
22. Competitors
- Theta-model (winner of M3)
- Forecast-Pro (best commercial package M3)
- Autobox (ARIMA-based high-performer)
- X-12-ARIMA
- Latest neural net designs
23. Data
24. Data/Criterion
- Length between 50 and 110 observations
- Real monthly economic data (no artificial simulation context)
- Finance
- Macroeconomic data
- With/without season
- MAPE on 1-18 step-ahead forecasts
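For reference, the two accuracy measures used here and in the result tables can be written as follows. The sMAPE variant with the averaged denominator is the one commonly used in the M3/NN3 competitions; treat the exact definition as an assumption.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error over the forecast horizon (in %)."""
    actual = np.asarray(actual, float)
    forecast = np.asarray(forecast, float)
    return 100 * np.mean(np.abs((actual - forecast) / actual))

def smape(actual, forecast):
    """Symmetric MAPE, the ranking criterion in the NN3 result tables (in %)."""
    actual = np.asarray(actual, float)
    forecast = np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return 100 * np.mean(np.abs(actual - forecast) / denom)
```

For example, a forecast of 90 against an actual of 100 gives a MAPE of 10%, while sMAPE penalizes over- and under-forecasts symmetrically relative to the average magnitude.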
25. NN3 Results

Results on the Complete Dataset of 111 Time Series. This represents the actual benchmark of the NN3 competition, as the reduced dataset of 11 series is included in the 111. Congratulations to all of you that were able to forecast this many time series automatically! Please find the results for the top 50 submissions released below by name and description. All other participants must contact the competition organisers via email to agree the disclosure of their name and method with their rank.
Rank (on SMAPE) | Participant | SMAPE | Notes
- | Stat. Contender - Wildi | 14.84 |
- | Stat. Benchmark - Theta Method (Nikolopoulos) | 14.89 | description missing
1 | Illies, Jäger, Kosuchinas, Rincon, Sakenas, Vaskevcius | 15.18 |
- | Stat. Benchmark - ForecastPro (Stellwagen) | 15.44 |
- | CI Benchmark - Theta AI (Nikolopoulos) | 15.66 | presentation missing; description missing
- | Stat. Benchmark - Autobox (Reilly) | 15.95 |
2 | Adeodato, Vasconcelos, Arnaud, Chunha, Monteiro | 16.17 |
3 | Flores, Anaya, Ramirez, Morales | 16.31 | presentation missing
4 | Chen, Yao | 16.55 | presentation missing
5 | D'yakonov | 16.57 |
6 | Kamel, Atiya, Gayar, El-Shishiny | 16.92 |
7 | Abou-Nasr | 17.54 |
8 | Theodosiou, Swamy | 17.55 |
- | CI Benchmark - Naive MLP (Crone) | 17.84 |
9 | de Vos | 18.24 |
10 | Yan | 18.58 |
- | CI Benchmark - Naive SVR (Crone, Pietsch) | 18.60 |
11 | C49 | 18.72 | not disclosed by author
12 | Perfilieva, Novak, Pavliska, Dvorak, Stepnicka | 18.81 |
13 | Kurogi, Koyama, Tanaka, Sanuki | 19.00 | presentation missing
14 | Stat. Contender - Beadle | 19.14 |
15 | Stat. Contender - Lewicke | 19.17 |
16 | Sorjamaa, Lendasse | 19.60 |
17 | Isa | 20.00 |
18 | C28 | 20.54 | not disclosed by author
19 | Duclos-Gosselin | 20.85 |
- | Stat. Benchmark - Naive | 22.69 |
20 | Papadaki, Amaxopolous | 22.70 |
21 | Stat. Benchmark - Hazarika | 23.72 |
22 | C17 | 24.09 | not disclosed by author
23 | Stat. Contender - Njimi, Mélard | 24.90 |
24 | Pucheta, Patino, Kuchen | 25.13 |
25 | Corzo, Hong | 27.53 |
26. NN3 Results (continued)
29 | C49 | 21.05 | not disclosed by author
- | Stat. Benchmark - X-12-ARIMA (McElroy) | 21.48 |
30 | C35 | 24.03 | not disclosed by author
27. Method for NN3
- Starting point: a standard approach
- Flexible and adaptive
28. Component and State-Space Models
29. Interpretation
- If season, then SARMA(1,0,0)(1,0,0)
- If no season, AR(2) (possible cycle)
- Noise terms in the state equation:
- Variability (adaptivity) of the trend
- Variability (adaptivity) of the trend growth
- Stability of the cycle or season
- The model allows for changing levels, slopes and seasonals
- Adaptivity is controlled by the variances of the noise terms in the state equation (hyperparameters)
30. Model and Hyperparameters
- Noise variances of the state equation
- Adaptivity/stability of the components: 3 hyperparameters
- AR(2) or SARMA(1,0,0)(1,0,0): 2 model parameters
- Initial states: 2 parameters for trend and trend growth
- Variances of the initial states: 2 hyperparameters for trend and trend growth
- Interpretation: shrinkage towards the initial solution
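A minimal sketch of the trend part of such a component model can clarify where these parameters live. The cycle/seasonal component (AR(2) or SARMA) and the shrinkage priors are omitted here; all variable names and values are illustrative, not the authors' implementation.

```python
import numpy as np

def llt_filter(y, var_level, var_slope, var_obs, init_state, init_var):
    """Kalman filter for a local linear trend model:
       level[t] = level[t-1] + slope[t-1] + eta_l,  slope[t] = slope[t-1] + eta_s,
       y[t] = level[t] + eps.
       var_level / var_slope are the state-noise hyperparameters controlling
       the adaptivity of level and slope; init_state / init_var encode the
       initial states and their variances (shrinkage towards the initial
       solution).  Returns one-step ahead innovations and the final state."""
    T = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
    Z = np.array([1.0, 0.0])                 # observation vector
    Q = np.diag([var_level, var_slope])      # state noise (adaptivity)
    a = np.array(init_state, float)
    P = np.array(init_var, float)
    innovations = []
    for obs in y:
        a, P = T @ a, T @ P @ T.T + Q        # predict
        f = Z @ P @ Z + var_obs              # innovation variance
        v = obs - Z @ a                      # one-step ahead error
        innovations.append(v)
        K = P @ Z / f                        # Kalman gain
        a = a + K * v                        # update state
        P = P - np.outer(K, Z @ P)           # update state covariance
    return np.array(innovations), a

# On a perfectly linear series the filtered slope converges to the true slope
innov, state = llt_filter(np.arange(20.0), 0.01, 0.01, 1.0,
                          [0.0, 0.0], np.diag([1e4, 1e4]))
print("estimated slope:", state[1])
```

Large initial variances make the filter adapt quickly to the data; small ones shrink the estimates towards the initial states, which is exactly the interpretation given on the slide.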
31. Modifications of the Traditional Approach: Customization
Experiences from past competitions, own experience
32. Experience → 6 Modifications
- Past competitions:
- Fit the model according to the relevant criterion (3)
- Performance depends on the forecasting horizon (4)
- A combination of forecasts often improves over individual forecasts (5)
- Own experience:
- Out-of-sample performance (1)
- Robustification of the MSE (2)
- Speed of the trend-slope estimate (6)
33. Estimation
- Traditionally, the Kalman filter leads to ML estimates under Gaussianity
- In-sample full-ML estimates
- Modifications 1, 2 and 3:
- Estimates are computed based on true out-of-sample performances
- The criterion is robustified
- The ML criterion is modified
- MAPE
- The last observations are more important than the first ones
- Accounts for pure multi-step ahead forecasting as well as for the model structure (one-step ahead ML criterion)
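The combined criterion described above might be sketched as follows. This is a hypothetical reconstruction in the spirit of the slides, not the authors' code: the function names, the Huber-type truncation, and the default constants are assumptions (the 2.5·median rule is the one mentioned on slide 41).

```python
import numpy as np

def robust_criterion(errors_multi, errors_one, weights_time, a=1.0, c=2.5):
    """Hypothetical two-term estimation criterion:
       term 1: absolute multi-step out-of-sample errors (MAPE-like),
       term 2: robustified one-step (likelihood-type) squared errors.
       weights_time down-weights the past; errors beyond c * median(|e|)
       are truncated (the psi-function vanishes for outliers)."""
    errors_multi = np.asarray(errors_multi, float)
    errors_one = np.asarray(errors_one, float)
    weights_time = np.asarray(weights_time, float)

    def huberize(e):
        s = c * np.median(np.abs(e))      # robust scale: 2.5 * median rule
        return np.clip(e, -s, s)          # outliers are down-weighted

    term1 = np.sum(weights_time * np.abs(errors_multi))
    term2 = np.sum(weights_time * huberize(errors_one) ** 2)
    return term1 + a * term2              # relative weight a is arbitrary
```

With an outlier of 100 among one-step errors of 0.1, the truncation caps its contribution at (2.5·0.1)² = 0.0625 instead of 10000, which is the "errors are bounded" property claimed on slide 37.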
34. Modifications: Out-of-Sample, Robustification, Criterion
35. Discussion: Robustification
- Does not make sense for the traditional in-sample criterion
- Outliers can be masked by parameter distortions
- In the out-of-sample perspective, outliers can be detected easily
- Parameters are not distorted by the outlier
- Using a robust scale estimate for the decision makes sense
- Outliers are down-weighted (the psi-function vanishes)
- Cost: the extent (speed) of adaptivity
36. Discussion: Criterion
- The criterion is ad hoc
- First term:
- Pure absolute multi-step ahead out-of-sample forecasting performance
- Absolute errors because of the MAPE
- Accounts for the forecasting horizon
- Down-weights the past
37. Discussion: Criterion
- Second term:
- Traditional likelihood (up to robustification)
- One-step ahead
- Stabilizes the model parameters and the updating equations
- Mean-square criterion → errors are bounded
- Local mean-square through robustification
- Avoids out-of-sample overfitting by hyperparameters
- Down-weights the past
38. Modifications 4 and 5: Forecast Horizon and Forecast Combination
- Optimize the parameters specifically for each forecasting horizon
- Robustified
- Out-of-sample
- 18 models
- Combine these 18 forecast functions
- Median
- Accounts for numerical problems
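The horizon-specific combination step above can be sketched in a few lines. The function name and interface are illustrative assumptions; only the scheme itself (18 horizon-optimized models combined by the median) comes from the slides.

```python
import numpy as np

def combined_forecast(forecast_fns, horizons=range(1, 19)):
    """Combine horizon-specific forecast functions by the median:
       each model in forecast_fns is optimized for one horizon, every model
       produces a full forecast path, and at each horizon the median over
       all paths is taken -- which also guards against numerically
       degenerate individual solutions."""
    paths = np.array([[fn(h) for h in horizons] for fn in forecast_fns])
    return np.median(paths, axis=0)   # one combined value per horizon

# Toy usage: five dummy "models" whose forecasts differ by a constant offset
fns = [lambda h, k=k: k + h for k in range(5)]
print(combined_forecast(fns, horizons=range(1, 4)))
```

The median (rather than the mean) keeps a single exploding forecast path from contaminating the combination, which matches the "accounts for numerical problems" remark.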
39. Modification 6: Speed and Reliability through the TP-Filter
- A fast and reliable TP-filter is computed
- DFA
- If the sign of the (state-space) trend slope differs from the sign of the real-time TP estimate, then the sign of the former is changed
- The TP-filter is faster and more reliable
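The sign-adjustment rule reduces to a one-line operation; the sketch below is an illustrative reconstruction (function and argument names are assumptions), with the TP-filter output represented only by its sign.

```python
import numpy as np

def adjust_slope_sign(ss_slope, tp_signal):
    """Modification 6 as a sketch: wherever the sign of the state-space
       trend slope disagrees with the sign of the real-time TP-filter
       estimate, flip the former (the TP-filter is taken to be faster and
       more reliable around turning points)."""
    ss_slope = np.asarray(ss_slope, float)
    tp_sign = np.sign(np.asarray(tp_signal, float))
    flip = (np.sign(ss_slope) != tp_sign) & (tp_sign != 0)
    return np.where(flip, -ss_slope, ss_slope)

# The TP-filter signals a downturn; a still-positive state-space slope is flipped
print(adjust_slope_sign([0.5, -0.3], [-1.0, -1.0]))
```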
40. Open Issues/Problems
41. Open Issues/Problems
- Numerical optimization
- Hyperparameters
- Non-linearity due to robustification
- The median of 18 forecasting functions alleviates the problems (but is not optimal)
- The choice of a (in the modified ML criterion) and of the robustification rule (2.5 · median) is arbitrary
- No experience before (and after) NN3
- Tuning parameters
42. Open Issues/Problems
- The optimization criterion is ad hoc
- Term 1 accounts for pure forecasting
- Term 2 accounts for the likelihood
- Stability, overfitting
- The relative weighting of both terms is arbitrary
43. Open Issues/Problems
- Changing the sign of the trend slope if it disagrees with the TP-filter is arbitrary
- The choice of model is to some extent arbitrary
- AR(2) and SARMA(1,0,0)(1,0,0)
- Should try ARMA for controlling the stability
- No formal identification routine
44. Open Issues/Problems
- No irregular observations:
- Outliers
- Level shifts
- Transitory changes
- No intervention variables
- Difficult to evaluate the partial and/or overall contribution(s) of the proposed modifications
- Multidimensional problem
- Analysis on the NN3 data when released
45. New Evidence/Principles
46. Simplicity vs. Complexity
- Goodrich (2003): "Perhaps the success of the Theta method depends upon its use of the global trend rather than the local [trend]. It strengthens the conviction that, ceteris paribus, simple methods outperform more complex ones."
- Trend slope of the local trend
- The constraints of the TP-filter imply an immediate local trend
- Vanishing time delay in the pass-band
- The method is complex:
- 9 parameters for the state space, 16 parameters for the TP-filter
- Numerically difficult, computationally intensive
47. Unusual Observations
- Outliers: treatment of unusual observations
- May be useful ex post (to improve parameter estimates)
- Difficult to use ex ante at the current boundary (in forecasting)
- Is an unusual current observation an outlier (transitory) or a shift (permanent)?
- Adaptive robust models based on out-of-sample performances are less sensitive
48. Comparison with the Traditional BSM
- Basic Structural Model
- First 10 Series of NN3
49. Traditional BSM
- Estimation:
- In-sample mean-square full ML
- Past performance is as important as present performance in the likelihood
- No robustification
- One-step ahead criterion
- No forecast combination
- No TP-filter
- Simpler models for cycle/seasonal
50. Series 1
51. Series 2
52. Series 3
53. Series 4
54. Series 5
55. Series 6
56. Series 7
57. Series 8
58. Series 9
59. Series 10
60. Analysis
- Some series lead to very similar forecasts
- Series 1, 5, 9
- Main qualitative/quantitative differences:
- Seasonal (extrapolation) weaker: series 2, 4, 6, 7, 8
- Trends (extrapolation) weaker: series 3, 6, 7, 8, 10
- Shifts at the current boundary less extreme (more stable figures): series 3, 4, 10
- Unusual or extreme observations at the current boundary are down-weighted (robustification)
61. Weaknesses of Forecasting Competitions
- Not real-time exercises
- Important TPs are not an issue
- May favor particular approaches
- Arbitrary categorization
62. Weaknesses of Forecasting Competitions (NN3, Ms)
- Practical forecasting problems are real-time exercises
- As new information flows in, forecasts are adjusted
- The amount of adjustment is crucial for the performance and reveals the inner forecasting mechanisms (learning dimension)
- NN3 and past competitions (Ms) are not real-time exercises
- One cannot appreciate how new information is processed by a forecasting method (real-time outlier treatment!)
- An important learning component is missing
- Some approaches may be favored
63. Weaknesses of Forecasting Competitions (NN3, Ms)
- Particular approaches can be favored
- Some of the series are cointegrated (synchronized)
- Best NN participant: "This approach was based on the observation that the 111 competition series come in six clearly discernible groups, where each group contains series which are approximately or perfectly co-temporal."
64. Weaknesses of Forecasting Competitions (NN3, Ms)
- Depending on whether the chosen time point lies in the vicinity of a common TP or not, this synchronization favors particular approaches
- ARIMA models and outlier treatment are favored if no TP occurs
- The DFA TP-filter is a real-time instrument whose utility is not given in the absence of TPs
- In practice, TP behavior is crucial in forecasting, and this feature cannot be observed here
65. Weaknesses of NN3
- It is frustrating that the winner loses and that the loser wins
- Exclusive ranking of artificial intelligence:
- Discredits the sponsors (SAS as well as the IIF)
- Discredits the participants
- Discredits the competition
- Categorization:
- The best AI approach (the official winner) relies on time series decomposition by X-12-ARIMA
66. Weaknesses of NN3
- Categorization:
- The best AI approach (the official winner) relies on time series decomposition by X-12-ARIMA
- According to the authors, "this approach was based on the observation that the 111 competition series come in six clearly discernible groups, where each group contains series which are approximately or perfectly co-temporal."
- "We defined the blocks by visual inspection of figure 1."
- This is not a consistent methodology (a self-contained algorithm relying on machine learning only!)
- It heavily relies on a statistical approach (X-12-ARIMA)
- It heavily relies on visual inspection
67. Conclusions
68. Summary
- A well-designed (customized) optimization criterion performs best
- Prototypical package:
- The NN3 series were the first series passing through the code
- No experience, limited time
- 2 weeks for code implementation and processing
- We expect substantial fine-tuning potential