Title: Genetic Programming for Financial Trading
1. Genetic Programming for Financial Trading
- Nicolas NAVET
- INRIA, France
- AIECON NCCU, Taiwan
- http://www.loria.fr/nnavet
Tutorial at CIEF 2006, Kaohsiung, Taiwan, 08/10/2006
2. Outline of the talk (1/2)
- PART 1: Genetic programming (GP)
  - GP among machine learning techniques
  - GP on the symbolic regression problem
  - Pitfalls of GP
- PART 2: GP for financial trading
  - Various schemes
  - How to implement it?
  - Experiments: GP at work
3. Outline of the talk (2/2)
- PART 3: Analyzing GP results
  - Why are GP results usually inconclusive?
  - Benchmarking with:
    - zero-intelligence trading strategies
    - lottery trading
  - Answering the questions:
    - is there anything to learn in the data at hand?
    - is GP effective at this task?
- PART 4: Perspectives
4. GP is a Machine Learning technique
- The ultimate goal of machine learning is automatic programming, that is, computers programming themselves ...
- A more achievable goal: build computer-based systems that can adapt and learn from their experience
- ML algorithms originate from many fields: mathematics (logic, statistics), bio-inspired techniques (neural networks), evolutionary computing (Genetic Algorithms, Genetic Programming), swarm intelligence (ants, bees)
5. Evolutionary Computing
- Algorithms that make use of mechanisms inspired by natural evolution, such as:
  - survival of the fittest among an evolving population of solutions
  - reproduction and mutation
- Prominent representatives:
  - Genetic Algorithms (GA)
  - Genetic Programming (GP): GP is a branch of GA where the genetic code of a solution is of variable length
- Over the last 50 years, evolutionary algorithms have proved to be very efficient at finding approximate solutions to algorithmically complex problems
6. Two main problems in Machine Learning
- Classification: the model output is a prediction of whether the input belongs to some particular class
  - Examples: recognizing human beings in image analysis, spam detection, credit scoring, market timing decisions
- Regression: prediction of the system's output for a specific input
  - Example: predict tomorrow's opening price for a stock given the closing price, the market trend, other stock exchanges, ...
7. Functioning scheme of ML
Learning on a training interval
8. GP basics
9. Genetic programming
- GP is the process of evolving a population of computer programs, which are the candidate solutions, according to evolutionary principles (e.g. survival of the fittest)
- First step: generate a population of random programs
10. In GP, programs are represented by trees (1/3)
- Trees are a very general representation form
11. In GP, programs are represented by trees (2/3)
12. In GP, programs are represented by trees (3/3)
- Trading rule formula: BUY IF (VOL > 10) AND (MovingAverage(25) > MovingAverage(45))
(Picture from [BhPiZu02])
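Such a rule tree can be sketched in a few lines of code. The following is a minimal illustration, not the tutorial's implementation: the nested-tuple encoding and the helper names (`evaluate`, `moving_average`, the `market` dictionary) are my own.

```python
def moving_average(prices, n):
    """Arithmetic mean of the last n prices."""
    return sum(prices[-n:]) / n

def evaluate(node, market):
    """Recursively evaluate a rule encoded as a nested tuple."""
    op, *args = node
    if op == "and":
        return evaluate(args[0], market) and evaluate(args[1], market)
    if op == ">":
        return evaluate(args[0], market) > evaluate(args[1], market)
    if op == "vol":
        return market["volume"]
    if op == "ma":
        return moving_average(market["prices"], args[0])
    if op == "const":
        return args[0]
    raise ValueError("unknown node: %r" % op)

# BUY IF (VOL > 10) AND (MovingAverage(25) > MovingAverage(45))
rule = ("and",
        (">", ("vol",), ("const", 10)),
        (">", ("ma", 25), ("ma", 45)))

market = {"volume": 42.0,
          "prices": [100.0 + 0.5 * t for t in range(60)]}  # an uptrend
print(evaluate(rule, market))  # True: the rule signals a buy
```

Because any function/terminal combination can appear at any node, the same interpreter evaluates whatever trees the evolution produces.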
13. Preliminary steps of GP
- The user has to define:
  - the set of terminals
  - the set of functions
  - how to evaluate the quality of an individual: the fitness measure
  - the parameters of the run, e.g. the number of individuals in the population
  - the termination criterion
14. Symbolic regression: a problem GP is good at
- Symbolic regression: find a function that fits a set of experimental data points well
- "Symbolic" means that one looks for both:
  - the functional form
  - the values of the parameters
- Differs from other regressions, where one solely looks for the best coefficient values for a pre-fixed model. Usually the choice of the model is the most difficult issue!
15. Symbolic regression
- Find the function f such that f(x_i) is as close as possible to the observed y_i over all data points (x_i, y_i)
- Possible fitness function: the sum of squared errors, sum_i (f(x_i) - y_i)^2
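A fitness function for symbolic regression can be sketched directly. This assumes a sum-of-squared-errors fitness (a common choice; the slide's exact formula was lost in extraction), with the candidate program represented as an ordinary callable.

```python
import math

def fitness(candidate, points):
    """Sum of squared errors between the candidate program's output
    and the target data points (lower is better)."""
    return sum((candidate(x) - y) ** 2 for x, y in points)

# Data points sampled from the target curve y = sin(x) on [0, 6.2]
points = [(x / 10.0, math.sin(x / 10.0)) for x in range(63)]

exact = math.sin           # a perfect candidate program
linear = lambda x: x       # degree-1 Taylor approximation, good only near 0

print(fitness(exact, points), fitness(linear, points))  # 0.0 vs a large error
```

In a GP run, `candidate` would be the interpreted tree of each individual, and this score would drive selection.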
16. GP operators are biologically inspired
- Recombination (aka crossover): two individuals share genetic material and create one or several offspring
- Mutation: introduces genetic diversity through random changes in the genetic code
- Reproduction: an individual survives as-is into the next generation
17. Selection operators for crossover/reproduction
- General principle in GP: the fittest individuals should have a greater chance to survive and transmit their genetic code
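Tournament selection is one common way to implement this principle (the tutorial does not say which selection scheme it uses, so take this as a generic sketch): fitter individuals win more tournaments, but every individual keeps some chance of being picked.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Draw k individuals uniformly at random and return the fittest one."""
    return max(rng.sample(population, k), key=fitness)

random.seed(42)
population = list(range(100))  # toy individuals; here fitness = the value itself
wins = [tournament_select(population, lambda ind: ind, k=5)
        for _ in range(1000)]
print(sum(wins) / len(wins))   # far above the population average of 49.5
```

Increasing `k` raises the selection pressure: with a larger tournament, weak individuals almost never win.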
18. Standard recombination (aka crossover)
- Standard recombination: exchange two randomly chosen sub-trees between the parents
19. Mutation operator 1: standard mutation
- Standard mutation: replacement of a sub-tree with a randomly generated one
20. Mutation operator 2: swap sub-tree mutation
- Swap sub-tree mutation: swap two sub-trees of an individual
21. Mutation operator 3: shrink mutation
- Shrink mutation: replace a branch (a node with one or more arguments) with one of its child nodes
22. Other mutation operators
- Swap mutation (not to be confused with swap sub-tree mutation): exchange the function associated with a node for one having the same number of arguments
- Headless chicken crossover: mutation implemented as a crossover between a program and a newly generated random program
- ...
23. Reproduction / elitism operators
- Reproduction: an individual is reproduced in the next generation without any modification
- Elitism: the best n individuals are kept in the next generation
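The tree operators of the preceding slides can be sketched on a nested-tuple tree encoding. This is an illustrative toy, not the tutorial's code: the encoding, the helper names and the terminal set are my own assumptions.

```python
import random

def subtrees(tree, path=()):
    """Enumerate (path, subtree) pairs of a nested-tuple expression tree."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    """Standard recombination: graft a random subtree of parent_b
    onto a random point of parent_a."""
    point, _ = rng.choice(list(subtrees(parent_a)))
    _, donor = rng.choice(list(subtrees(parent_b)))
    return replace(parent_a, point, donor)

def mutate(tree, terminals=("x", 0, 1), rng=random):
    """Standard mutation reduced to its simplest form: replace a random
    subtree with a random terminal."""
    point, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, point, rng.choice(terminals))

random.seed(0)
a = ("add", ("mul", "x", 2), 3)      # x*2 + 3
b = ("sub", "x", ("mul", "x", "x"))  # x - x*x
print(crossover(a, b))
print(mutate(a))
```

Note that crossover on variable-length trees is exactly what distinguishes GP from a fixed-length GA chromosome: the offspring's size differs from both parents'.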
24. GP is no silver bullet
25. GP issue 1: how to choose the function set?
- The problem cannot be solved if the set of functions is not sufficient
- But non-relevant functions uselessly increase the search space
- Problem: there is no automatic way to decide a priori which functions are relevant and to build a sufficient function set
26. The problem cannot be solved if the set of functions is not sufficient: illustration
27. Results with sin(x) in the function set?
Typical outcome
28. Results without sin(x) in the function set?
Typical outcome
29. Yes, sin(x) can be approximated by its Taylor series ...
sin(x) and its Taylor approximations of degree 1, 3, 5, 7, 9, 11, 13 (image: Wikipedia)
- Problem 1: there is little hope of GP discovering that ...
- Problem 2: what happens outside the training interval?
30. Composition of the function set is crucial: illustration
- The subset shown is extraneous in this context
- Same experimental setup as before
31. Function set containing redundant functions? (1/2)
Typical outcome
32. Function set containing redundant functions? (2/2)
- On average, with the extraneous functions, the best solution is 10% farther from the curve in the training interval (much more outside!)
- With the extraneous functions, the average solution is better ... because the tree is more likely to contain a trigonometric function
33. GP issue 2: code bloat
- Solutions increase in size over the generations
Same experimental setup as before
34. GP issue 2: code bloat
- Much of the genetic code has no influence on the fitness ... but may constitute a useful reserve of genetic material
Non-effective code (aka introns)!
35. Code bloat: why is it a problem?
- Solutions are hard to understand
  - Learning something from huge solutions is almost impossible ...
  - One has no confidence using programs one does not understand!
- Much of the computing power is spent manipulating non-contributing code, which may slow down the search
36. Countermeasures ... (1/2)
- Static limit on the tree depth
- Dynamic maximum tree depth [SiAl03]: the limit is increased each time an outstanding individual deeper than the current limit is found
- Limit the probability of longer-than-average individuals being chosen by reducing their fitness
- Apply operators that ensure limited code growth
- Discard newly created individuals whose behavior is too close to that of their parents (e.g. the behavior for a regression problem could be the position of the points [Str03])
- ...
37. Countermeasures ... (2/2)
- Possible symbolic simplification of the tree: an expression can be rewritten into an equivalent, shorter form
- This needs to be further investigated! Preliminary experiments [TeHe04] show that simplification does not necessarily help (introns may constitute a useful reserve of genetic material)
38. GP issue 3: GP can be disappointing outside the training set
... and such behavior can hardly be predicted
39. GP issue 3: explanation (1/2)
- Usually GP functions are implemented to have the closure property: each function must be able to handle every possible input value
- What to do with:
  - division by 0?
  - sqrt(x) with x < 0?
  - ...
- Solution: protected operators, e.g. the division:
  if (abs(denominator) < value-near-0) return 1
40. GP issue 3: explanation (2/2)
- In our case: a fragment of the best GP tree
- Why did it not occur on the training interval?
- Because no training point was chosen such that the problematic case arises
41. GP issue 4: standard GP is not good at finding numerical constants (1/3)
- Where do numerical values come from?
  - Ephemeral random constants: random values inserted at the leaves of the GP trees during the creation of the initial population
  - Use of arithmetic operators on existing numerical constants
  - Generation by combination of variables/functions
- Lately, many studies have shown that standard GP is not good at finding constants
42. GP issue 4: standard GP is not good at finding numerical constants (2/3)
- Experiment: find a constant function equal to the numeric constant 3.141592
43. GP issue 4: standard GP is not good at finding numerical constants (3/3)
- There are several more efficient schemes for constant generation in GP [Dem05]:
  - local optimization [ZuPiMa01]
  - numeric mutation [EvFe98]
  - ...
- One of them should be implemented, otherwise 1) computation time is lost searching for constants and 2) solutions may tend to be bigger
44. Some (personal) conclusions on GP (1/3)
- GP is undoubtedly a powerful technique
- Efficient for predicting / classifying ... but not more so than other techniques
- The symbolic representation of the created solutions may help give good insight into the system under study ... not only the best solutions are interesting, but also how the population has evolved over time
- GP is a tool for discovering knowledge
45. Some (personal) conclusions on GP (2/3)
- A powerful tool, but ...
  - good knowledge of the application field is required for choosing the right function set
  - prior experience with GP is mandatory to avoid common mistakes: there is no theory to tell us what to do!
  - it tends to create solutions too big to be analyzable -> countermeasures should be implemented
  - fine-tuning the GP parameters is very time-consuming
46. Some (personal) conclusions on GP (3/3)
- How to analyze the results of GP?
  - Efficiency can hardly be predicted; it varies:
    - from problem to problem
    - and from GP run to GP run
  - If the results are not very positive:
    - is it because there is no good solution?
    - or is GP not effective, so that further work is needed?
- There are solutions: part 3 of the talk
47. Part 2: GP for financial trading
48. Why is GP an appealing technique for financial trading?
- Easy to implement / robust evolutionary technique
- Trading rules (TR) should adapt to a changing environment: GP may simulate this evolution
- Solutions are produced in a symbolic form that can be understood and analyzed
- GP may serve as a knowledge discovery tool (e.g. on the evolution of the market)
49. GP for financial trading
- GP for composing portfolios (not discussed here, see [Wag03])
- GP for evolving the structure of neural networks used for prediction (not discussed here, see [GoFe99])
- GP for predicting price evolution (briefly discussed here, see [Kab02])
- Most common: GP for inducing technical trading rules
50. Predicting price evolution: general comments ...
- Long-term forecasts of stock prices remain a fantasy [Kab02]
- Swing trading or intraday trading: see CIEF Tutorial 1 by Prof. Fyfe, today at 1:30 pm!
- Two excellent starting points:
  - [Kab02]: a single-day trading strategy based on the forecasted spread
  - [SaTe01]: winner of the CEC2000 Dow Jones prediction competition; predictions at t+1, t+2, t+3, ..., t+h, where a solution has one tree per forecast horizon
51. Predicting price evolution: fitness function
- The definition of the fitness function has been shown to be crucial (e.g. [SaTe01]); there are many possibilities:
  - (normalized) mean square error
  - mean absolute percentage error (MAPE)
  - the statistic 1 - MAPE / MAPE-of-a-random-walk
  - directional symmetry index (DS)
  - DS weighted by the direction and amplitude of the error
  - ...
- Issue: a meaningful fitness function is not always GP-friendly
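Three of the candidate fitness measures above can be sketched directly; the formulas are the standard textbook definitions, and the sample series are made up for illustration.

```python
def mse(actual, predicted):
    """Mean square error of the forecasts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error (in percent)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

def directional_symmetry(actual, predicted):
    """Percentage of steps where the predicted direction of change
    matches the actual direction."""
    hits = sum(
        (a1 - a0) * (p1 - p0) > 0
        for a0, a1, p0, p1 in zip(actual, actual[1:], predicted, predicted[1:])
    )
    return 100.0 * hits / (len(actual) - 1)

actual    = [100.0, 102.0, 101.0, 105.0, 104.0]
predicted = [100.0, 103.0, 100.0, 104.0, 106.0]
print(mse(actual, predicted),
      mape(actual, predicted),
      directional_symmetry(actual, predicted))
```

The example illustrates why the choice matters: the forecasts above get 3 of 4 directions right even though their point errors are non-trivial, so an MSE-driven and a DS-driven evolution would reward different individuals.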
52. Inducing technical trading rules
53. Steps of the algorithm (1/3)
1. Extract the training time series from the database
2. Preprocessing: cleaning, sampling, averaging, normalizing, ...
54. Steps of the algorithm (2/3)
3. GP on the training set:
  3.1 Creation of the individuals
  3.2 Evaluation (trading rules interpreter + trading sequence simulator)
  3.3 Selection of the individuals
4. Analysis of the evolution: statistics, HTML files
55. Steps of the algorithm (3/3)
5. Evaluate the selected individuals on the validation set
6. Evaluate the best individual out-of-sample
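The "trading sequence simulator" of step 3.2 can be sketched as follows. This is a deliberately minimal long/flat backtest of my own devising (the tutorial's simulator is not shown): it compounds the price relatives while a position is held and charges a proportional cost on every position change.

```python
def simulate(prices, positions, cost=0.001):
    """Accumulated return of a long(1)/flat(0) position sequence; a
    proportional cost is charged each time the position changes."""
    wealth = 1.0
    for t in range(1, len(prices)):
        if positions[t - 1] == 1:              # long over [t-1, t]
            wealth *= prices[t] / prices[t - 1]
        if positions[t] != positions[t - 1]:   # entering or exiting a position
            wealth *= 1.0 - cost
    return wealth - 1.0

prices = [100.0, 101.0, 99.0, 103.0, 104.0]
positions = [1, 1, 0, 1, 1]  # long two days, step aside, re-enter
print(simulate(prices, positions))
```

In the full algorithm, `positions` would be produced by applying the evolved rule (via the trading rules interpreter) to each day of the series, and the accumulated return would feed the fitness.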
56. GP at work: demo on the Taiwan Capitalization Weighted Stock Index
57. Part 3: Analyzing GP results
58. One may cast doubts on GP efficiency ...
- Highly heuristic, no theory! There are problems on which GP has been shown not to be significantly better than random search
- Few clear-cut successes reported in the financial literature
- GP embeds little domain-specific knowledge yet ...
- Doubts about how efficiently GP uses the available computing time:
  - code bloat
  - bad at finding numerical constants
  - the best solutions are sometimes found very early in the run ...
- Variability of the results! e.g. returns: -0.160993, 0.0526153, 0.0526153, 0.0526153, 0.0526153, -0.0794787, 0.0526153, -0.0794787, 0.132354, 0.364311, -0.0990995, -0.0794787, -0.0855786, -0.094433, 0.0464288, -0.140719, 0.0526153, 0.0526153, -0.0746189, 0.418075, ...
59. Possible pretests: measures of predictability of the financial time series
- The actual question: how predictable, for a given horizon, with a given cost function?
- Serial correlation
- Kolmogorov complexity
- Lyapunov exponent
- Unit root analysis
- Comparison with results on surrogate data / shuffled series (e.g. Kaboudan statistics)
- ...
60. In practice, some predictability does not imply profitability ...
- The prediction horizon must be large enough!
- Volatility may not be sufficient to cover round-trip transaction costs!
- One may not have the right trading instrument at hand ... typically, short selling is not available
61. Pretest methodology
- Compare GP with several variants of:
  - random search algorithms: Zero-Intelligence Strategies (ZIS)
  - random trading behaviors: Lottery Trading (LT)
- Issue: how best to constrain randomness?
- Statistical hypothesis testing:
  - Null: GP does not outperform ZIS
  - Null: GP does not outperform LT
62. Pretest 1: GP versus zero-intelligence strategies (equivalent-search-intensity random search (ERS) with a validation stage)
- Null hypothesis H1,0: GP does not outperform equivalent random search
- Alternative hypothesis: H1,1
63. Pretest 1: GP vs zero-intelligence strategies (ERS)
- If H1,0 cannot be rejected, the interpretation is: either there is nothing to learn, or GP is not very effective
64. Pretest 4: GP vs lottery trading
- Lottery trading (LT): random trading behavior according to the outcome of a random variable (e.g. a Bernoulli law)
- Issue 1: if LT tends to hold positions (short, long) for less time than GP, transaction costs may advantage GP ...
- Issue 2: it might be an advantage or a disadvantage for LT to trade much less or much more than GP
  - e.g. a downward-oriented market with no short selling
65. Frequency and intensity of a trading strategy
- Frequency: average number of transactions per unit of time
- Intensity: proportion of time a position is held
- For pretest 4:
  - We impose that the average frequency and intensity of LT equal those of GP
  - Implementation: generate random trading sequences having the right characteristics, e.g.
0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,1,1,0,1,0,0,0,0,0,0,1,1,1,1,1,1,...
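One simple way to generate such constrained random sequences is rejection sampling: draw Bernoulli position sequences at the target intensity and keep only those with the required number of position changes. This is my own sketch of the idea; a production implementation could construct a valid sequence directly instead of resampling.

```python
import random

def lottery_sequence(length, intensity, n_changes, rng=random,
                     max_tries=100_000):
    """Draw Bernoulli(intensity) 0/1 position sequences until one
    exhibits exactly n_changes position changes (rejection sampling)."""
    for _ in range(max_tries):
        seq = [1 if rng.random() < intensity else 0 for _ in range(length)]
        changes = sum(a != b for a, b in zip(seq, seq[1:]))
        if changes == n_changes:
            return seq
    raise RuntimeError("no matching sequence found")

random.seed(7)
seq = lottery_sequence(length=40, intensity=0.5, n_changes=18)
print(seq)
```

Matching both constraints is what makes the LT benchmark fair: the random trader holds positions about as often, and switches about as often, as the GP rules it is compared against.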
66. Pretest 4: implementation
67. Answering question 1: is there anything to learn from the training data at hand?
68. Question 1: pretests involved
- Starting point: if a set of search algorithms does not outperform LT, this gives evidence that there is nothing to learn ...
- Pretest 4: GP vs lottery trading
  - Null hypothesis H4,0: GP does not outperform LT
- Pretest 5: equivalent random search (ZIS) vs lottery trading
  - Null hypothesis H5,0: ERS does not outperform LT
69. Question 1: some answers ...
- ¬R means that the null hypothesis Hi,0 cannot be rejected; R means we should favor Hi,1

  Case    H4,0  H5,0  Interpretation
  Case 1  ¬R    ¬R    there is nothing to learn
  Case 2  R     R     there is something to learn
  Case 3  R     ¬R    there may be something to learn; ERS might not be powerful enough
  Case 4  ¬R    R     there may be something to learn; the GP evolution process is detrimental
70. Answering question 2: is GP effective?
71. Question 2: some answers ...
- Question 2 cannot be answered if there is nothing to learn (case 1)
- Case 4 provides us with a negative answer ...
- In cases 2 and 3, run pretest 1: GP vs equivalent random search
  - Null hypothesis H1,0: GP does not outperform ERS
  - If one cannot reject H1,0, GP shows no evidence of effectiveness
72. Pretests at work: methodology. Draw conclusions from the pretests using our own programs, and compare with the results in the literature [ChKuHo06] on the same time series
73. Setup: GP control parameters (same as in [ChKuHo06])
74. Setup: statistics, data, trading scheme
- Hypothesis testing with Student's t-test at a 95% confidence level
- Pretests with samples made of 50 GP runs, 50 ERS runs and 100 LT runs
- Data: the indexes of 3 stock exchanges: Canada, Taiwan and Japan
- Daily trading with short selling
- Training on 3 years, validation on 2 years
- Out-of-sample periods: 1999-2000, 2001-2002, 2003-2004
- Data normalized with a 250-day moving average
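The two-sample test behind each pretest can be sketched as follows. The slide specifies a Student's t-test; I use the Welch form (which allows the unequal sample sizes and variances of the 50 GP vs 100 LT runs), and the Gaussian return samples are synthetic, for illustration only.

```python
import math
import random

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference of two sample means
    (unequal variances and sample sizes allowed)."""
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = sum(sample_a) / na, sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)

random.seed(0)
gp_returns = [random.gauss(0.05, 0.10) for _ in range(50)]   # 50 GP runs
lt_returns = [random.gauss(0.00, 0.10) for _ in range(100)]  # 100 LT runs
t = welch_t(gp_returns, lt_returns)
print(t)  # reject "GP does not outperform LT" when t exceeds the
          # one-sided 5% critical value (about 1.66 at ~100 degrees of freedom)
```

Each null hypothesis Hi,0 of the pretests is a one-sided test of this form on the corresponding pair of return samples.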
75. Results on actual data (1/2)
- Evidence that there is something to learn: 4 markets out of 9 (C3, J2, T1, T3)
- Experiments in [ChKuHo06], with another GP implementation, show that GP performs very well on these 4 markets
- Evidence that there is nothing to learn: 3 markets (C1, J3, T2)
- In [ChKuHo06], there is only one of these (C1) where GP has a positive return (but less than buy-and-hold)
76. Results on actual data (2/2)
- GP effective: 3 markets out of 6
- In these 3 markets, GP outperforms buy-and-hold: the same outcome as in [ChKuHo06]
- Preliminary conclusion: one can rely on the pretests ...
  - When there is nothing to learn, no GP implementation did well (except in one case)
  - When there is something to learn, at least one implementation did well (always)
  - When our GP is effective, the GP in [ChKuHo06] is effective too (always)
77. Further conclusions
- Our GP implementation:
  - is more efficient than random search: there is no case where ERS outperforms LT and GP does not
  - but only slightly more efficient: one would expect many more cases where GP does better than LT while ERS does not
- Our GP is actually able to take advantage of regularities in the data, but only of simple ones
78. Part 4: Perspectives in the field of GP for financial trading
79. Rethinking fitness functions
(From [LaPo02])
- Fitness functions: accumulated return, risk-adjusted return, ...
- Issue: on some problems [LaPo02], GP is only marginally better than random search because the fitness function induces a "difficult" landscape
- Come up with GP-friendly fitness functions
80. Preprocessing of the data: still an open issue
- Studies in forecasting show the importance of preprocessing for GP; often, normalization with MA(250) is used, with benefits [ChKuHo06]
- Should the length of the MA change according to market volatility, regime changes, etc.?
- Why not consider MACD, exponential MA, differencing, rate of change, log values, FFT, wavelets, ...?
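The MA(250) normalization mentioned above can be sketched in a few lines; the exact windowing convention (here: the average of the 250 preceding days, first 250 observations used only as warm-up) is my assumption.

```python
def normalize_with_ma(prices, window=250):
    """Divide each price by the moving average of the `window` preceding
    days; the first `window` observations are consumed as warm-up."""
    normalized = []
    for t in range(window, len(prices)):
        ma = sum(prices[t - window:t]) / window
        normalized.append(prices[t] / ma)
    return normalized

prices = [100.0 + 0.1 * t for t in range(300)]  # a steadily rising series
norm = normalize_with_ma(prices)
print(len(norm))  # 50: 300 observations minus the 250-day warm-up
```

The point of the normalization is that the evolved rules then compare the price to its own recent level (values around 1.0) rather than to absolute price levels, which makes rules transferable across periods and markets.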
81. Data division scheme
- There is evidence that GP performs poorly when the characteristics of the training interval are very different from those of the out-of-sample interval
- Characterization of the current market condition: mean reverting, trend following, ...
- Relearning on a smaller interval if needed?
82. More extensive tests are needed ... automating the tests
- A comprehensive test on daily indexes was done in [ChKuHo06]; none exists for individual stocks and intraday data
- Automated testing on several hundred stocks is fully feasible but requires a software infrastructure and much computing power
83. Ensemble methods: combining trading rules
- In ML, ensemble methods have proven to be very effective
- A majority rule was tested in [ChKuHo06] with some success
- Efficiency requirements: accuracy (better than random) and diversity (uncorrelated errors): what do these mean for trading rules?
- More fine-grained selection / weighting schemes may lead to better results
84. Embed more domain-specific knowledge
- Black-box algorithms are usually outperformed by domain-specific algorithms
- The domain-specific language is limited as yet:
  - enrich the primitive set with volume, indexes, bid/ask spread, ...
  - enrich the function set with cross-correlation, predictability measures, ...
85. References (1/2)
- [ChKuHo06] S.-H. Chen, T.-W. Kuo and K.-M. Hoi, "Genetic Programming and Financial Trading: How Much about 'What We Know'", 4th NTU International Conference on Economics, Finance and Accounting, April 2006.
- [ChNa06] S.-H. Chen and N. Navet, "Pretests for genetic-programming evolved trading programs: zero-intelligence strategies and lottery trading", Proc. ICONIP 2006.
- [SiAl03] S. Silva and J. Almeida, "Dynamic Maximum Tree Depth - A Simple Technique for Avoiding Bloat in Tree-Based GP", GECCO 2003, LNCS 2724, pp. 1776-1787, 2003.
- [Str03] M.J. Streeter, "The Root Causes of Code Growth in Genetic Programming", EuroGP 2003, pp. 443-454, 2003.
- [TeHe04] M.D. Terrio and M.I. Heywood, "On Naïve Crossover Biases with Reproduction for Simple Solutions to Classification Problems", GECCO 2004, 2004.
- [ZuPiMa01] G. Zumbach, O.V. Pictet and O. Masutti, "Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting", Olsen & Associates Research Report, 2001.
- [EvFe98] M. Evett and T. Fernandez, "Numeric Mutation Improves the Discovery of Numeric Constants in Genetic Programming", Genetic Programming 1998: Proceedings of the Third Annual Conference, 1998.
86. References (2/2)
- [Kab02] M. Kaboudan, "GP Forecasts of Stock Prices for Profitable Trading", Evolutionary Computation in Economics and Finance, 2002.
- [SaTe01] M. Santini and A. Tettamanzi, "Genetic Programming for Financial Series Prediction", Proceedings of EuroGP'2001, 2001.
- [BhPiZu02] S. Bhattacharyya, O.V. Pictet and G. Zumbach, "Knowledge-Intensive Genetic Discovery in Foreign Exchange Markets", IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, April 2002.
- [LaPo02] W.B. Langdon and R. Poli, Foundations of Genetic Programming, Springer Verlag, 2002.
- [Kab00] M. Kaboudan, "Genetic Programming Prediction of Stock Prices", Computational Economics, vol. 16, 2000.
- [Wag03] L. Wagman, "Stock Portfolio Evaluation: An Application of Genetic-Programming-Based Technical Analysis", Genetic Algorithms and Genetic Programming at Stanford 2003, 2003.
- [GoFe99] W. Golubski and T. Feuring, "Evolving Neural Network Structures by Means of Genetic Programming", Proceedings of EuroGP'99, 1999.
- [Dem05] I. Dempsey, "Constant Generation for the Financial Domain using Grammatical Evolution", Proceedings of the 2005 Workshops on Genetic and Evolutionary Computation, pp. 350-353, Washington, June 25-26, 2005.
87. ?