Title: STOCK TREND PREDICTION WITH NEURAL NETWORK TECHNIQUES
1. STOCK TREND PREDICTION WITH NEURAL NETWORK TECHNIQUES
- Seminar Presentation
- Mohd Haris Lye Abdullah
- haris.lye_at_mmu.edu.my
- Supervisor: Professor Dr Y. P. Singh
- y.p.singh_at_mmu.edu.my
2. Outline
- Introduction/Research Objective
- Stock Trend Prediction
- Neural network
- Support vector machine
- Feature selection
- Experiments and Results
- Conclusion
3. Objectives
- a) Evaluate the performance of neural network techniques on the task of stock trend prediction. The Multilayer Perceptron (MLP), Radial Basis Function (RBF) network and Support Vector Machine (SVM) are evaluated.
- b) Stock prediction is formulated and evaluated both as a two-class classification problem and as a regression problem.
- c) Study a pattern rejection technique to improve prediction performance.
4. Stock Prediction
- Stock prediction is a difficult task because stock data is very noisy and time varying.
- The efficient market hypothesis claims that the future price of a stock is not predictable from publicly available information.
- However, this theory has been challenged by many studies, and several researchers have successfully applied machine learning approaches such as neural networks to stock prediction.
5. Is the Market Predictable?
- Efficient Market Hypothesis (EMH) (Fama, 1965): the stock market is efficient in that current market prices reflect all information available to traders, so future changes cannot be predicted from past prices or publicly available information.
- Fama et al. (1988) showed that 25% to 40% of the variance in stock returns over periods of three to five years is predictable from past returns.
- Pesaran and Timmermann (1999) conclude that the UK stock market was predictable over the past 25 years.
- Saad (1998) successfully employed different neural network models to predict the trend of various stocks over a short-term horizon.
6. Implementation
- In this paper we investigate SVM, MLP and RBF networks for the task of predicting the future trend of three major stock indices:
- a) Kuala Lumpur Composite Index (KLCI)
- b) Hong Kong Hang Seng index
- c) Nikkei 225 stock index
- using inputs based on technical indicators.
- This paper approaches the problem as a two-class pattern classification task formulated specifically to assist investors in making trading decisions.
- The classifier is asked to recognise investment opportunities that can give a return of r% or more within the next h days (r = 3%, h = 10 days).
7. System Block Diagram
- The classifier is to predict whether an increment of more than 3% in the stock index can be achieved within the next 10-day period.
[Figure: block diagram — daily historical data is converted into technical analysis indicators and fed to the classifier, which outputs Yes/No for "increment achievable?".]
8. Classification vs Forecasting
- Forecasting: predict the actual future value.
- Classification: assign a pattern to one of several class categories. The predicted class gives the future trend direction.
9. Data Used
- Kuala Lumpur Composite Index (KLCI) for the period 1992-1997.
10. Data Used
- Hang Seng index (20/4/1992 - 1/9/1997)
11. Data Used
- Nikkei 225 stock index (20/4/1982 - 1/9/1987)
12. Input to Classifier
TABLE 1: DESCRIPTION OF INPUT TO CLASSIFIER — x_i, i = 1, 2, 3, ..., 12; input dimension n = 15.

DLN(t) = sign(q(t) - q(t-N)) · ln(q(t)/q(t-N) + 1)   (1)

q(t) is the index level at day t and DLN(t) is the actual input to the classifier.
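- A minimal sketch of the DLN transform as reconstructed in Eq. (1), applied over several lag lengths N to build the classifier input. The synthetic series, the lag set and all names here are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np
import pandas as pd

def dln(q: pd.Series, N: int) -> pd.Series:
    # Eq. (1) as reconstructed above:
    # DLN(t) = sign(q(t) - q(t-N)) * ln(q(t)/q(t-N) + 1)
    lagged = q.shift(N)
    return np.sign(q - lagged) * np.log(q / lagged + 1)

# Synthetic daily index levels (stand-in for KLCI/Hang Seng/Nikkei data)
rng = np.random.default_rng(0)
q = pd.Series(1000 * np.exp(np.cumsum(rng.normal(0, 0.01, 1500))))

# Hypothetical lag set; the paper's 15 inputs are not listed on this slide
lags = [1, 2, 3, 5, 10, 15, 20, 30, 40, 50, 60, 90]
X = pd.concat({f"DLN_{N}": dln(q, N) for N in lags}, axis=1).dropna()
```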
13. Prediction Formulation
Consider y_max(t) as the maximum upward movement of the stock index value within the period t and t + τ, where y(t) represents the stock index level at day t.
14. Prediction Formulation
- Classification
- The prediction of stock trend is formulated as a two-class classification problem:
- y_r(t) > r  ⇒  Class 2
- y_r(t) ≤ r  ⇒  Class 1
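- A minimal sketch of this labelling rule, assuming y_r(t) is the maximum excess return within the next h days (per Slide 13), with the r = 3%, h = 10 setting from Slide 6:

```python
import numpy as np

def trend_labels(q: np.ndarray, r: float = 0.03, h: int = 10) -> np.ndarray:
    # +1 (Class 2) if the maximum upward move within the next h days
    # exceeds r, otherwise -1 (Class 1).
    y = np.full(len(q), -1, dtype=int)
    for t in range(len(q) - h):
        y_max = q[t + 1 : t + h + 1].max()      # highest level in (t, t + h]
        if (y_max - q[t]) / q[t] > r:           # excess return above threshold
            y[t] = 1
    return y
```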
15. Prediction Formulation
- Classification
- Let (x_i, y_i), 1 ≤ i ≤ N, be a set of N training examples, where each input example x_i ∈ R^n (n = 15 being the dimension of the input space) belongs to a class labelled by y_i ∈ {+1, -1}.
[Figure: example patterns from the two classes, labelled y_i = -1 and y_i = +1.]
16. Prediction Formulation
- Regression
- In the regression approach, the target output is a scalar value y_r representing the predicted maximum excess return within the period τ days ahead.
17. Neural Network
- According to Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan, p. 2: "A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use."
- Knowledge is acquired by the network through a learning process, either supervised or unsupervised. This paper uses supervised learning, where each training pattern and its target pattern are presented to the neural network during the learning process.
18. Neural Network
- Advantages of Neural Networks
- The advantages of neural networks are due to their adaptive and generalization abilities:
- a) Neural networks are adaptive methods that can learn without any prior assumption about the underlying data.
- b) Neural networks, namely the feed-forward multilayer perceptron and the radial basis function network, have been proven to be universal function approximators.
- c) Neural networks are non-linear models with good generalization ability.
19. Neural Network
- Taxonomy of Neural Network Architectures
- The architecture of a neural network refers to the arrangement of the connections between neurons, the processing elements, the number of layers, and the flow of signals in the network. There are two main categories of neural network architecture: feed-forward and feedback (recurrent) networks.
20. Neural Network
- Feed-forward network: Multilayer Perceptron
21. Neural Network
22. Multilayer Perceptron (MLP)
[Figure: MLP structure — an input vector (x1 ... xn) feeds the input layer; hidden-layer neurons (h1, h2, ...) are connected through weights (w1 ... wn); the output layer produces y through the activation function F(y).]
23. Multilayer Perceptron (MLP)
- Training the MLP Network
- The multilayer perceptron (MLP) network uses the back-propagation learning algorithm to obtain the weights of the network.
- Simple back propagation uses the steepest gradient descent method to change the weights.
- The objective of training is to minimize the mean square error E_mse over all training patterns.
- To speed up training, the faster Levenberg-Marquardt back-propagation algorithm is used.
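- For illustration, a sketch of an equivalent setup in scikit-learn. Note that scikit-learn does not ship Levenberg-Marquardt, so L-BFGS stands in here, and the hidden layer size and activation are assumptions rather than the paper's tuned values.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in trainer: L-BFGS instead of Levenberg-Marquardt (not available
# in scikit-learn); minimizing the training loss over all patterns plays
# the role of E_mse here.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), activation="tanh",
                  solver="lbfgs", max_iter=2000, random_state=0),
)
# Usage: mlp.fit(X_train, y_train); y_pred = mlp.predict(X_test)
```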
24. Multilayer Perceptron (MLP)
- MLP Network Setup
- Number of hidden layers
- Number of hidden neurons
- Number of input neurons
- Activation function
25. RBF Network
- The RBF network is a 3-layer feed-forward structure consisting of an input layer, a single hidden layer with locally tuned hidden units, and an output layer acting as a linear combiner.
26. RBF Network
- RBF Network Training
- The orthogonal least squares (OLS) method proposed by Chen, S. et al. (1991) provides a systematic selection of the centre nodes in order to reduce the size of the RBF network. The learning task involves finding the appropriate centres and then the corresponding weights. This method is adopted here.
- RBF centres are selected from the set of training data.
- The OLS method is employed as a forward regression procedure to select the centres of the RBF nodes from the candidate set. At each step the centre that maximizes the error reduction is selected.
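- A simplified sketch of this idea: greedy forward selection of Gaussian centres from the training data, refitting the linear output weights by least squares at each step. This is a plain residual-error version, not the orthogonalized procedure of Chen et al. (1991), and the width and centre count are assumptions.

```python
import numpy as np

def gaussian_design(X, centers, width):
    # Phi[i, j] = exp(-||x_i - c_j||^2 / (2 * width^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * width ** 2))

def select_rbf_centers(X, y, width=1.0, n_centers=20):
    chosen, remaining = [], list(range(len(X)))
    for _ in range(n_centers):
        best_j, best_err = None, np.inf
        for j in remaining:                  # try each candidate centre
            Phi = gaussian_design(X, X[chosen + [j]], width)
            w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            err = ((Phi @ w - y) ** 2).sum()
            if err < best_err:               # keep the centre that most
                best_j, best_err = j, err    # reduces the residual error
        chosen.append(best_j)
        remaining.remove(best_j)
    Phi = gaussian_design(X, X[chosen], width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return X[chosen], w
```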
27. Support Vector Machine
- The Support Vector Machine is a special neural network technique based on the structural risk minimisation (SRM) principle. In SRM, the capacity of the learning machine is minimized together with the training error.
- In empirical risk minimization (ERM), used in conventional neural networks such as the MLP and RBF network, only the training error is minimized.
- The SVM in its current form was introduced by Cortes and Vapnik in 1995, building on earlier work by Vapnik and Chervonenkis.
28. Support Vector Machine
- SVM demonstrates good generalization performance.
- It has a sparse representation of the solution: the solution depends only on a subset of the training data points called support vectors.
- Training an SVM is equivalent to solving a linearly constrained quadratic programming problem. The solution is always unique, globally optimal, and free from the local minima problem.
29. Support Vector Machine
- Many decision boundaries can separate these two classes.
- Which one should we choose?
[Figure: scatter of Class 1 and Class 2 points with several candidate decision boundaries.]
30. Support Vector Machine
[Figure: Class 1 and Class 2 separated by a hyperplane with margin m.]
In SVM the optimal separating hyperplane is
chosen to maximize the separation margin m and
minimize error.
31. Optimization Problem in SVM
- Let {x_1, ..., x_n} be our data set and let y_i ∈ {+1, -1} be the class label of x_i.
- The decision boundary should classify all points correctly: y_i (w · x_i + b) ≥ 1 for all i.
- This yields a constrained optimization problem: minimize (1/2)||w||^2 subject to the constraints above.
32. Support Vector Machine
- For a non-linear boundary, SVM maps the training data into a higher-dimensional feature space using a kernel function K(x, x_i).
- In this feature space SVM constructs a separating hyperplane that maximises the margin (the distance from the closest data points to the hyperplane) while minimizing the misclassification error at the same time.
- The Gaussian radial basis kernel is used, defined as follows:
- K(x, x_i) = exp(-γ ||x - x_i||^2)
- The optimum separating hyperplane (OSH) is represented by F(x) = sign(Σ_i α_i y_i K(x, x_i) + b)
- The sign gives the class label.
33. Tolerance to Noise
- To allow misclassification error: y_i (w · x_i + b) ≥ 1 - ξ_i, with ξ_i ≥ 0.
- The following objective is minimized in order to obtain the optimum hyperplane: (1/2)||w||^2 + C Σ_i ξ_i
- ξ_i is the slack variable introduced to allow a certain level of misclassified points. C is the regularisation parameter that trades off misclassification error against margin maximisation.
34. For Uneven Class Distribution
- Minimize (1/2)||w||^2 + C+ Σ_{i: y_i = +1} ξ_i + C- Σ_{i: y_i = -1} ξ_i
- Different misclassification costs can be applied to data with different class labels.
- A receiver operating characteristic (ROC) curve can be obtained by varying C+ and C-.
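- One way to realize this in scikit-learn is through class_weight, which scales C per class; a sketch, with hypothetical cost values, that collects one ROC operating point per C+/C- ratio.

```python
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

def roc_point(X_tr, y_tr, X_te, y_te, cost_pos, cost_neg=1.0):
    # class_weight scales C per class, playing the role of C+ and C-
    svm = SVC(kernel="rbf", C=10.0,
              class_weight={1: cost_pos, -1: cost_neg}).fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, svm.predict(X_te),
                                      labels=[-1, 1]).ravel()
    return fp / (fp + tn), tp / (tp + fn)   # (false pos. rate, true pos. rate)

# points = [roc_point(X_tr, y_tr, X_te, y_te, c) for c in (0.5, 1, 2, 5, 10)]
```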
35. Support Vector Regression
- In the regression problem the desired output to be predicted is real valued, whereas in the classification problem the desired output is a discrete value representing the class/category.
- The output to be predicted is the strength of the trend.
- SVM approximates the regression function in the standard form f(x) = Σ_i (α_i - α_i*) K(x, x_i) + b.
36. Parameters for SVM
- a) Classifier
- Regularisation constant C
- Kernel parameter
- b) Regressor
- Parameter ε for the ε-insensitive loss function
- Regularisation constant C
- Kernel parameter
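- A sketch of the corresponding scikit-learn regressor wiring these three knobs together; the numeric values are placeholders, not the paper's choices.

```python
from sklearn.svm import SVR

# epsilon-insensitive loss with Gaussian kernel; all values are placeholders
svr = SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.01)
# svr.fit(X_train, y_max_train); y_hat = svr.predict(X_test)
```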
37. Feature Selection
- Feature selection is a process whereby a subset of the potential predictor variables is selected based on a relevance criterion in order to reduce the input dimension.
- Typical feature selection involves the following steps:
- Step 1. Search algorithm generates a candidate subset
- Step 2. Evaluation of the generated subset
- Step 3. Test of the stopping criterion
- Steps 1, 2 and 3 are repeated until the stopping criteria are met, such as when the minimum number of features is included or the minimum accepted prediction accuracy is achieved.
38. Feature Selection
- General Approaches to Feature Selection
- a) Wrapper approach
- The wrapper approach makes use of the induction algorithm itself to evaluate the relevance of the features. The relevance measure is based on solving the related problem, usually the prediction accuracy of the induction algorithm when the features are used.
- b) Filter approach
- The filter method selects the feature subset independent of the induction algorithm. Feature correlation is usually used.
39. Feature Selection
- Feature Subset Selection
- Feature subset selection algorithms can be categorized into three categories of search algorithm:
- a) exponential
- b) randomised
- c) sequential
- Sequential methods include Forward Sequential Selection (FSS) and Backward Sequential Selection (BSS).
40. Feature Selection
- Sequential selection techniques:
- a) Forward Sequential Selection (FSS)
- b) Backward Sequential Selection (BSS)
- Both FSS and BSS are used. Features are selected from the subset that gives the best predictor performance under FSS and BSS; a sketch follows below.
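- A sketch of both directions using scikit-learn's SequentialFeatureSelector as the wrapper; the estimator and the target subset size are assumptions, since the slides do not state them.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

estimator = SVC(kernel="rbf", C=10.0)        # induction algorithm (assumed)
fss = SequentialFeatureSelector(estimator, direction="forward",
                                n_features_to_select=8)
bss = SequentialFeatureSelector(estimator, direction="backward",
                                n_features_to_select=8)
# fss.fit(X_train, y_train); X_reduced = fss.transform(X_train)
```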
41. Feature Subset Selection
- Sequential selection result
42. Performance Measure
- True Positive (TP): the number of positive-class patterns predicted correctly as positive.
- False Positive (FP): the number of negative-class patterns predicted wrongly as positive.
- False Negative (FN): the number of positive-class patterns predicted wrongly as negative.
- True Negative (TN): the number of negative-class patterns predicted correctly as negative.
43. Performance Measure
- Accuracy = (TP + TN) / (TP + FP + TN + FN)
- Precision = TP / (TP + FP)
- Recall rate (sensitivity) = TP / (TP + FN)
- F1 = 2 × Precision × Recall / (Precision + Recall)
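- These four measures translate directly into code; a small self-contained helper:

```python
def classification_measures(tp: int, fp: int, fn: int, tn: int):
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)                     # sensitivity
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```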
44. Testing Method
- A rolling window method is used to capture training and test data.
[Figure: rolling windows of consecutive Train and Test segments moving forward through the series.]
- Train: 600 data points; Test: 400 data points.
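- A minimal sketch of this rolling-window split; the slides give the window sizes (600 train, 400 test) but not the step, so advancing by the test size is an assumption.

```python
def rolling_windows(n_samples, n_train=600, n_test=400, step=400):
    # Yield (train, test) index slices that roll forward through the data
    start = 0
    while start + n_train + n_test <= n_samples:
        yield (slice(start, start + n_train),
               slice(start + n_train, start + n_train + n_test))
        start += step

# for tr, te in rolling_windows(len(X)):
#     model.fit(X[tr], y[tr]); evaluate(model, X[te], y[te])
```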
45. Experiment and Result
- Experiments are conducted to predict the stock trend of three major stock indices: KLCI, Hang Seng and Nikkei.
- SVM, MLP and RBF networks are used to make trend predictions based on the classification and regression approaches.
- A hypothetical trading system is simulated to find the annualized profit generated from the given predictions.
46. Experiment and Result
47. Trading Performance
- A hypothetical trading system is used.
- When a positive prediction is made, one unit of money is invested in a portfolio reflecting the stock index. If the stock index increases by more than r (r = 3%) within the next h days (h = 10), say at day t', then the investment is sold at the index price of day t'. If not, the investment is sold on day t + h + 1 regardless of the price. A transaction fee of 1% is charged for every transaction made.
- The annualised rate of return is used; a sketch of the rule follows below.
48. Trading Performance
- Classifier evaluation using the hypothetical trading system
49. Trading Performance
50. Experiment and Result
51. Experiment and Result
- The results show better performance of the neural network techniques when compared to the K-nearest-neighbour classifier. SVM shows better overall performance on average than the MLP and RBF networks in most of the performance metrics used.
52. Experiment and Result
- Comparison of Receiver Operating Characteristic (ROC) curves
53. Experiment and Result
54. Experiment and Result
55. Experiment and Result
- The Accuracy-Reject (AR) curve can be plotted to see the accuracy improvement of the classifier at various rejection rates. The AR curve is a plot of the classifier's operating points showing the possible trade-off between the accuracy of the classifier and the rejection rate; a sketch follows below.
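- A sketch of how such a curve can be computed from a classifier with probabilistic output (e.g. SVC(probability=True) in scikit-learn); the threshold grid is an assumption.

```python
import numpy as np

def accuracy_reject_curve(p_pos, y_true, thresholds=np.linspace(0.5, 0.95, 10)):
    # p_pos: predicted probability of the positive class per test pattern
    pred = np.where(p_pos >= 0.5, 1, -1)
    confidence = np.maximum(p_pos, 1 - p_pos)   # confidence in the chosen class
    curve = []
    for thr in thresholds:
        keep = confidence >= thr                # reject low-confidence patterns
        reject_rate = 1.0 - keep.mean()
        accuracy = (pred[keep] == y_true[keep]).mean() if keep.any() else np.nan
        curve.append((reject_rate, accuracy))
    return curve
```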
56. Accuracy-Reject (AR) Curve
57. Accuracy-Reject (AR) Curve
58. Compare Regression Performance
- The SVM, RBF and MLP networks are used as the predictors.
59. Compare Regression Performance
60. Conclusion
- We have investigated the SVM, MLP and RBF network as classifiers and regressors to assess their potential in the stock trend prediction task.
- The support vector machine (SVM) has shown better performance when compared to the MLP and RBF networks.
- The SVM classifier with probabilistic output outperforms the MLP and RBF networks in terms of the error-reject tradeoff.
- Both the classification and regression models can be used for a profitable trend prediction system. The classification model has the advantage that a pattern rejection scheme can be incorporated.