1. Time Series Forecasting With Feed-Forward Neural Networks: Guidelines and Limitations
- Eric Plummer
- Computer Science Department
- University of Wyoming
- June 6, 2018
2. Topics
- Thesis Goals
- Time Series Forecasting
- Neural Networks
- K-Nearest-Neighbor
- Test-Bed Application
- Empirical Evaluation
- Data Preprocessing
- Contributions
- Future Work
- Conclusion
- Demonstration
3. Thesis Goals
- Compare neural networks and k-nearest-neighbor for time series forecasting
- Analyze the response of various configurations to data series with specific characteristics
- Identify when neural networks and k-nearest-neighbor are inadequate
- Evaluate the effectiveness of data preprocessing
4. Time Series Forecasting: Description
- What is it?
  - Given an existing data series, observe or model the data series to make accurate forecasts
- Example data series
  - Financial (e.g., stocks, rates)
  - Physically observed (e.g., weather, sunspots)
  - Mathematical (e.g., Fibonacci sequence)
5. Time Series Forecasting: Difficulties
- Why is it difficult?
- Limited quantity of data
  - Observed data series are sometimes too short to partition
- Noise
  - Erroneous data points
  - Obscuring component
  - Remedy: moving-average preprocessing
- Nonstationarity
  - Fundamentals change over time
  - Nonstationary mean (e.g., an ascending data series)
  - Remedy: first-difference preprocessing
- Forecasting method selection
  - Statistics
  - Artificial intelligence
6. Time Series Forecasting: Importance
- Why is it important?
- Preventing undesirable events by forecasting the event, identifying the circumstances preceding the event, and taking corrective action so the event can be avoided (e.g., an inflationary economic period)
- Forecasting undesirable, yet unavoidable, events to preemptively lessen their impact (e.g., solar maximum with sunspots)
- Profiting from forecasting (e.g., financial markets)
7. Neural Networks: Background
- Loosely based on the human brain's neuron structure
- Timeline
  - 1940s: McCulloch and Pitts proposed neuron models in the form of binary threshold devices and stochastic algorithms
  - 1950s and 1960s: Rosenblatt's class of learning machines called perceptrons
  - Late 1960s: Minsky and Papert's discouraging analysis of perceptrons (limited to linearly separable classes)
  - 1980s: Rumelhart, Hinton, and Williams' generalized delta rule for learning by back-propagation, for training multilayer perceptrons
  - Present: many new training algorithms and architectures, but nothing revolutionary
8. Neural Networks: Architecture
- A feed-forward neural network can have any number of
  - Layers
  - Units per layer
  - Network inputs
  - Network outputs
- Hidden layers (A, B)
- Output layer (C)
9. Neural Networks: Units
- A unit has
  - Connections
  - Weights
  - Bias
  - Activation function
- Weights and bias are randomly initialized before training
- A unit's input consists of
  - The sum of the products of each connection value and its associated weight
  - Plus the bias
- The input is then fed into the unit's activation function
- The unit's output is the output of the activation function (sketched below)
  - Hidden layers: sigmoid
  - Output layer: linear
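A minimal C++ sketch of the computation just described; the function names and use of std::vector are illustrative assumptions, not FORECASTER's actual code:

```cpp
#include <cmath>
#include <vector>

// Sigmoid activation used by hidden-layer units.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// A unit's output: the sum of each input times its associated weight,
// plus the bias, fed through the activation function.
double hiddenUnitOutput(const std::vector<double>& inputs,
                        const std::vector<double>& weights,
                        double bias)
{
    double net = bias;
    for (size_t i = 0; i < inputs.size(); ++i)
        net += inputs[i] * weights[i];
    return sigmoid(net);  // an output-layer unit would return net unchanged (linear)
}
```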
10. Neural Networks: Training
- Partition the data series into
  - Training set
  - Validation set (optional)
  - Test set (optional)
- Typically, the training procedure is
  - Perform backpropagation training with the training set
  - After every n epochs, compute the total squared error on the training set and the validation set
  - If the validation error consistently rises while the training error falls, stop training (see the sketch after this list)
- Overfitting: the training set is learned too well
- Generalization: given inputs not in the training and validation sets, the network can still forecast accurately
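A minimal early-stopping sketch of that procedure; the callables and the threshold of three consecutive rises are assumptions for illustration, not FORECASTER's actual logic:

```cpp
#include <functional>

// Train one epoch at a time, checking the validation error every n epochs;
// stop once it has risen on several consecutive checks (overfitting).
void trainWithEarlyStopping(const std::function<void()>& trainOneEpoch,
                            const std::function<double()>& validationError,
                            int maxEpochs, int n)
{
    double best = 1e300;  // best validation error seen so far
    int rises = 0;        // consecutive checks on which the error rose
    for (int epoch = 1; epoch <= maxEpochs; ++epoch) {
        trainOneEpoch();
        if (epoch % n != 0) continue;
        double err = validationError();
        if (err < best) { best = err; rises = 0; }
        else if (++rises >= 3) break;  // consistently rising: stop training
    }
}
```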
11. Neural Networks: Training
- Backpropagation training
  - First, examples in the form of <input, output> pairs are extracted from the data series (a sliding-window sketch follows this list)
  - Then, the network is trained with backpropagation on the examples:
    - Present an example's input vector to the network inputs and run the network sequentially forward
    - Propagate the error sequentially backward from the output layer
    - For every connection, change the weight modifying that connection in proportion to the error
  - When all three steps have been performed for all examples, one epoch has occurred
- The goal is to converge to a near-optimal solution based on the total squared error
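A sketch of the example extraction, assuming a window width w and step-ahead size s; the helper name and signature are hypothetical:

```cpp
#include <utility>
#include <vector>

// Slide a window of w consecutive values along the series; each example's
// input is the window and its output is the value s steps beyond it.
std::vector<std::pair<std::vector<double>, double>>
extractExamples(const std::vector<double>& series, size_t w, size_t s)
{
    std::vector<std::pair<std::vector<double>, double>> examples;
    for (size_t i = 0; i + w + s <= series.size(); ++i) {
        std::vector<double> input(series.begin() + i, series.begin() + i + w);
        examples.emplace_back(std::move(input), series[i + w + s - 1]);
    }
    return examples;
}
```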
12. Neural Networks: Training
(Figure: the backpropagation training cycle)
13. Neural Networks: Forecasting
- The forecasting method depends on the examples
- The examples depend on the step-ahead size
  - If the step-ahead size is one: iterative forecasting
  - If the step-ahead size is greater than one: direct forecasting
14. Neural Networks: Forecasting
(Figure: iterative forecasting; each one-step forecast is fed back as an input, so forecasting can continue indefinitely)
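A minimal sketch of the iterative scheme, with 'network' standing in for a trained one-output network; this is a hypothetical helper, not FORECASTER's code:

```cpp
#include <functional>
#include <vector>

// Forecast 'steps' values one step at a time: run the network on the last
// w values, append the forecast to the series, and repeat on the new tail.
std::vector<double> forecastIteratively(
    std::vector<double> series,  // copied so forecasts can be appended
    const std::function<double(const std::vector<double>&)>& network,
    size_t w, size_t steps)
{
    std::vector<double> forecasts;
    for (size_t i = 0; i < steps; ++i) {
        std::vector<double> window(series.end() - w, series.end());
        double next = network(window);  // one-step-ahead forecast
        forecasts.push_back(next);
        series.push_back(next);         // feed the forecast back as input
    }
    return forecasts;
}
```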
15. Neural Networks: Forecasting
(Figure: directly forecasting n steps ahead; the single n-step-ahead output is the only forecast made)
16. K-Nearest-Neighbor Forecasting
- No model to train
- Simple linear search
- Compare the reference to the candidates
- Select the k candidates with the lowest error
- The forecast is the average of the k next values (see the sketch below)
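A minimal sketch following the description above: the reference is the most recent window, candidates are scored by summed squared error, and the forecast averages the values that followed the k best matches. The helper name is hypothetical:

```cpp
#include <algorithm>
#include <vector>

// Score every earlier length-w window against the reference (the last w
// values) and average the values that immediately followed the k best.
double knnForecast(const std::vector<double>& series, size_t w, size_t k)
{
    const size_t n = series.size();
    std::vector<std::pair<double, double>> scored;  // <error, next value>
    for (size_t i = 0; i + w < n; ++i) {            // candidate needs a next value
        double err = 0.0;
        for (size_t j = 0; j < w; ++j) {
            double d = series[i + j] - series[n - w + j];
            err += d * d;
        }
        scored.emplace_back(err, series[i + w]);
    }
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end());
    double sum = 0.0;
    for (size_t i = 0; i < k; ++i)                  // assumes k <= scored.size()
        sum += scored[i].second;
    return sum / k;                                 // average of the k next values
}
```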
17. Test-Bed Application: FORECASTER
- Written in Visual C++ with MFC
- Object-oriented
- Multithreaded
- Wizard-based
- Easily modified
- Implements feed-forward neural networks and k-nearest-neighbor
- Used for time series forecasting
- Eventually will be upgraded for classification problems
18. Empirical Evaluation: Data Series
(Figures: the data series used: the original sawtooth, its less noisy and more noisy variants, the ascending series, and the sunspots series)
19. Empirical Evaluation: Neural Network Architectures
- The number of network inputs is based on the data series
- Need to make unambiguous examples
- For the sawtooths
  - 24 inputs are necessary
  - Test networks with 25 and 35 inputs
  - Test networks with 1 hidden layer of 2, 10, and 20 hidden layer units
  - One output layer unit
- For sunspots
  - 30 inputs
  - 1 hidden layer with 30 units
- For real-world data series, selection may be trial-and-error!
20. Empirical Evaluation: Neural Network Training
- Heuristic method (sketched below)
  - Start with an aggressive learning rate
  - Gradually lower the learning rate as the validation error increases
  - Stop training when the learning rate cannot be lowered anymore
- Simple method
  - Use a conservative learning rate
  - Training stops when
    - the number of training epochs equals the epochs limit, or
    - the training error is less than or equal to the error limit
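A sketch of the heuristic method; the starting rate, the halving factor, and the floor are illustrative assumptions, not the values used in the thesis:

```cpp
#include <functional>

// Start with an aggressive learning rate and lower it whenever the
// validation error rises; stop once it cannot be lowered any further.
void heuristicTraining(const std::function<void(double)>& trainOneEpoch,
                       const std::function<double()>& validationError,
                       double rate = 0.5,      // aggressive starting rate (assumed)
                       double minRate = 1e-4)  // floor at which training stops (assumed)
{
    double prevErr = validationError();
    while (rate >= minRate) {
        trainOneEpoch(rate);
        double err = validationError();
        if (err > prevErr) rate *= 0.5;  // validation error rose: lower the rate
        prevErr = err;
    }
}
```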
21. Empirical Evaluation: Neural Network Forecasting
- Metric to compare forecasts: the coefficient of determination (formula below)
  - Its value lies in (-∞, 1]
  - Want a value between 0 and 1, where 0 corresponds to forecasting the mean of the data series and 1 to forecasting the actual values
  - Must have actual values to compare with the forecasted values
- For networks trained on the original, less noisy, and more noisy data series, the forecast will be compared to the original series
- For networks trained on the ascending data series, the forecast will be compared to the continuation of the ascending series
- For networks trained on the sunspots data series, the forecast will be compared to the test set
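The standard definition, consistent with the behavior described above (forecasting the mean gives 0, a perfect forecast gives 1, and worse-than-mean forecasts go negative):

\[
R^2 = 1 - \frac{\sum_t \left(y_t - \hat{y}_t\right)^2}{\sum_t \left(y_t - \bar{y}\right)^2}
\]

where \(y_t\) are the actual values, \(\hat{y}_t\) the forecasted values, and \(\bar{y}\) the mean of the actual values.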
22. Empirical Evaluation: K-Nearest-Neighbor
- Choosing the window size is analogous to choosing the number of neural network inputs
- For the sawtooth data series
  - k = 2
  - Test window sizes of 20, 24, and 30
- For the sunspots data series
  - k = 3
  - Window size of 10
- Compare forecasts via the coefficient of determination
23. Empirical Evaluation: Candidate Selection
- Neural networks
  - For each training method, data series, and architecture, 3 candidates were trained
  - Also, the average of the 3 candidates' forecasts was taken: forecasting by committee
  - The best forecast was selected based on the coefficient of determination
- K-nearest-neighbor
  - For each data series, k, and window size, only one search was performed (only one is needed)
24. Empirical Evaluation: Original Data Series
(Figures: forecasts by the simple NN, heuristic NN, smaller NN, and k-NN)
25. Empirical Evaluation: Less Noisy Data Series
(Figures: forecasts by the simple NN, heuristic NN, and k-NN)
26. Empirical Evaluation: More Noisy Data Series
(Figures: forecasts by the simple NN, heuristic NN, and k-NN)
27. Empirical Evaluation: Ascending Data Series
(Figures: forecasts by the simple NN and heuristic NN)
28. Empirical Evaluation: Longer Forecast
(Figure: a longer forecast by the heuristic NN)
29. Empirical Evaluation: Sunspots Data Series
(Figures: forecasts by the simple NN and k-NN)
30. Empirical Evaluation: Discussion
- Heuristic training method observations
  - Networks train longer (more epochs) on smoother data series like the original and ascending data series
  - The total squared error and unscaled error are higher for noisy data series
  - Neither the number of epochs nor the errors appear to correlate well with the coefficient of determination
  - In most cases, the committee forecast is worse than the best candidate's forecast
  - When actual values are unavailable, choosing the best candidate is difficult!
31. Empirical Evaluation: Discussion
- Simple training method observations
  - The total squared error and unscaled error are higher for noisy data series, with the exception of the 35-10-1 network trained on the more noisy data series
  - The errors do not appear to correlate well with the coefficient of determination
  - In most cases, the committee forecast is worse than the best candidate's forecast
  - Four networks have a negative coefficient of determination, compared with two for the heuristic training method
32. Empirical Evaluation: Discussion
- General observations
  - Neither training method appeared to be clearly better
  - Increasingly noisy data series increasingly degraded forecasting performance
  - Nonstationarity in the mean degraded performance
  - Networks with too few hidden units (e.g., 35-2-1) forecasted well on simpler data series, but failed on more complex ones
  - Excessive numbers of hidden units (e.g., 35-20-1) did not hurt performance
  - Twenty-five network inputs were not sufficient
  - K-nearest-neighbor was consistently better than the neural networks
  - Feed-forward neural networks are extremely sensitive to architecture and parameter choices, and making such choices is currently more art than science, more trial-and-error than absolute, more practice than theory!
33. Data Preprocessing
- First-difference (sketched below)
  - For the ascending data series, a neural network trained on the first-difference can forecast nearly perfectly
  - In that case, it is better to train and forecast on the first-difference
  - FORECASTER reconstitutes the forecast from its first-difference
- Moving average
  - For noisy data series, a moving average would eliminate much of the noise
  - But it would also smooth out peaks and valleys
  - The series may then be easier to learn and forecast
  - But in some series, the noise may be important data (e.g., utility load forecasting)
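A minimal sketch of first-differencing and reconstitution; the helper names are hypothetical, and FORECASTER's actual implementation may differ:

```cpp
#include <vector>

// First-difference: d[i] = x[i+1] - x[i]. Removes a nonstationary mean
// such as the steady rise of the ascending data series.
std::vector<double> firstDifference(const std::vector<double>& x)
{
    std::vector<double> d;
    for (size_t i = 1; i < x.size(); ++i)
        d.push_back(x[i] - x[i - 1]);
    return d;
}

// Reconstitute a series from a starting value and its first-differences
// by cumulative summation: x[i+1] = x[i] + d[i].
std::vector<double> reconstitute(double start, const std::vector<double>& d)
{
    std::vector<double> x(1, start);
    for (double di : d)
        x.push_back(x.back() + di);
    return x;
}
```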
34. Contributions
- Filled a void in the literature on feed-forward neural network time series forecasting: knowing how networks respond to various data series characteristics in a controlled environment
- Showed that k-nearest-neighbor is a better forecasting method for the data series used in this research
- Reaffirmed that neural networks are very sensitive to architecture, parameter, and learning method changes
- Presented some insight into neural network architecture selection: choosing the number of network inputs based on the data series
- Presented a neural network training heuristic that produced good results
35. Future Work
- Upgrade FORECASTER to work with classification problems
- Add more complex network types, including wavelet networks, for time series forecasting
- Investigate k-nearest-neighbor further
- Add other forecasting methods (e.g., decision trees for classification)
36. Conclusion
- Presented
  - Time series forecasting
  - Neural networks
  - K-nearest-neighbor
  - Empirical evaluation
- Learned a lot about the implementation details of the forecasting techniques
- Learned a lot about MFC programming

Thank You
37. Demonstration
Various files can be found at http://w3.uwyo.edu/eplummer
38. Unit Output, Error, and Weight Change Formulas
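The slide's formula images did not survive extraction. As a reconstruction rather than the slide's exact content, the standard generalized delta rule formulas for the architecture described on slides 9-11 (sigmoid hidden units, linear output units) are:

\[
\mathrm{net}_j = b_j + \sum_i w_{ij}\, o_i, \qquad
o_j = \begin{cases} \dfrac{1}{1 + e^{-\mathrm{net}_j}} & \text{hidden unit} \\[1ex] \mathrm{net}_j & \text{output unit} \end{cases}
\]
\[
\delta_j = t_j - o_j \;\; \text{(linear output unit)}, \qquad
\delta_j = o_j (1 - o_j) \sum_k \delta_k\, w_{jk} \;\; \text{(hidden unit)}
\]
\[
\Delta w_{ij} = \eta\, \delta_j\, o_i
\]

where \(t_j\) is the target value, \(o_i\) the output of unit \(i\), and \(\eta\) the learning rate.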
39. Forecast Error Formulas
40. Related Work
- Drossu and Obradovic (1996): a hybrid stochastic and neural network approach to time series forecasting
- Zhang and Thearling (1994): parallel implementations of neural networks and memory-based reasoning
- Geva (1998): a multiscale fast wavelet transform and an array of feed-forward neural networks
- Lawrence, Tsoi, and Giles (1996): encodes the series with a self-organizing map and uses recurrent neural networks
- Kingdon (1997): an automated intelligent system for financial forecasting using neural networks and genetic algorithms