Title: Breeding Decision Trees Using Evolutionary Techniques
1 Breeding Decision Trees Using Evolutionary Techniques
- Papagelis Athanasios
- Kalles Dimitrios
- Computer Technology Institute, AHEAD RM
2 Introduction
- We use GAs to evolve simple and accurate binary decision trees
- Simple genetic operators over tree structures
- Experiments with UCI datasets
  - Very good size
  - Competitive accuracy results
- Experiments with synthetic datasets
  - Superior accuracy results
3 Current tree induction algorithms
- ... use greedy heuristics
  - To guide the search during tree building
  - To prune the resulting trees
- Fast implementations
- Accurate results on widely used benchmark datasets (like the UCI datasets)
- Optimal results?
  - No
- Good for real-world problems?
  - There are not many real-world datasets available for research
4 More on greedy heuristics
- They can quickly guide us to desired solutions
- On the other hand, they can substantially deviate from the optimum
- WHY?
  - They are very strict
  - Which means they are VERY GOOD for only a limited problem space
5 Why GAs should work?
- GAs are not
  - Hill climbers
    - Blind on complex search spaces
  - Exhaustive searchers
    - Extremely expensive
- They are
  - Beam searchers
    - They balance the time needed against the space searched
- Application to a bigger problem space
  - Good results for many more problems
  - No need to tune or derive new algorithms
6 Another way to see it...
- Biases
  - Preference bias
    - Characteristics of the output
    - We should choose this
    - e.g. small trees
  - Procedural bias
    - How will we search?
    - We should not have to choose this
    - Unfortunately we have to
- Greedy heuristics make strong hypotheses about the search space
- GAs make weak hypotheses about the search space
7 The real-world question
- Are there datasets where hill-climbing techniques are really inadequate?
  - e.g. unnecessarily big, misguiding output
- Yes, there are
  - Conditionally dependent attributes
    - e.g. XOR
  - Irrelevant attributes
- Many solutions use GAs as a preprocessor to select adequate attributes
- Direct genetic search can prove more efficient for those datasets
8 The proposed solution
- Select the desired decision tree characteristics (e.g. small size)
- Adopt a decision tree representation with appropriate genetic operators
- Create an appropriate fitness function
- Produce a representative initial population
- Evolve for as long as you wish! (a sketch of the loop follows)
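A minimal sketch of this recipe as a generic GA driver in Python. The function parameters (init_population, fitness, crossover, mutate), the selection scheme, and the elitism step are illustrative assumptions, not the authors' implementation; the rate defaults echo the values reported on the second-layer GA slide.

    import random

    def evolve(init_population, fitness, crossover, mutate,
               generations=500, p_cross=0.93, p_mut=0.005):
        """Generic GA loop: the caller supplies tree initialization,
        payoff, crossover and mutation functions."""
        population = init_population()
        for _ in range(generations):
            scored = sorted(population, key=fitness, reverse=True)
            next_gen = scored[:2]                              # keep the two best (elitism)
            while len(next_gen) < len(population):
                a, b = random.sample(scored[:len(scored) // 2], 2)   # simple truncation-style selection
                child = crossover(a, b) if random.random() < p_cross else a  # in practice, a copy of `a`
                if random.random() < p_mut:
                    child = mutate(child)
                next_gen.append(child)
            population = next_gen
        return max(population, key=fitness)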
9 Initialization procedure
- Population of minimal decision trees
- Simple and fast
  - Choose a random value as the test value
  - Choose two random classes as leaves
[Figure: a minimal tree whose root tests A2, with the leaves Class1 and Class2]
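A sketch of such a minimal tree and of its random construction; the Node structure and helper names are assumptions for illustration, not the authors' code.

    import random
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        test_attr: Optional[str] = None    # attribute tested at an internal node
        test_value: Optional[str] = None   # value the test compares against
        label: Optional[str] = None        # class label (leaves only)
        left: Optional["Node"] = None      # branch taken when the test succeeds
        right: Optional["Node"] = None     # branch taken when the test fails

    def minimal_tree(attributes, classes):
        """One individual of the initial population: a single random test
        node with two randomly labelled class leaves."""
        attr = random.choice(list(attributes))
        return Node(
            test_attr=attr,
            test_value=random.choice(attributes[attr]),
            left=Node(label=random.choice(classes)),
            right=Node(label=random.choice(classes)),
        )

    # Example matching the slide's figure: a root testing A2, leaves Class1/Class2.
    tree = minimal_tree({"A1": ["a", "b"], "A2": ["a", "b"]}, ["Class1", "Class2"])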
10 Genetic operators
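The slide's illustrations are not reproduced here. A typical pair of operators for tree-structured chromosomes, consistent with the description elsewhere in the presentation, is subtree-swap crossover and node mutation; the sketch below reuses the Node class from the initialization sketch above and is an assumption, not the authors' exact operators.

    import copy
    import random

    def all_nodes(node):
        """Collect every node of a tree (internal nodes and leaves)."""
        nodes = [node]
        if node.left:
            nodes += all_nodes(node.left)
        if node.right:
            nodes += all_nodes(node.right)
        return nodes

    def crossover(parent_a, parent_b):
        """Graft a randomly chosen subtree of one parent into a copy of the other."""
        child = copy.deepcopy(parent_a)
        target = random.choice(all_nodes(child))
        donor = copy.deepcopy(random.choice(all_nodes(parent_b)))
        target.__dict__.update(donor.__dict__)      # replace the subtree in place
        return child

    def mutate(tree, attributes, classes):
        """Change the test value of a random internal node, or relabel a random leaf."""
        node = random.choice(all_nodes(tree))
        if node.label is not None:                   # leaf: pick a new class
            node.label = random.choice(classes)
        else:                                        # internal node: pick a new test value
            node.test_value = random.choice(attributes[node.test_attr])
        return tree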
11 Payoff function
- Balance between accuracy and size
- Set x depending on the desired output characteristics
  - Small trees: x near 1
  - Emphasis on accuracy: x grows big
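The formula itself appears only as a figure on the slide. A payoff of the shape the bullets describe, with accuracy discounted by a size factor that x controls, can be sketched as below, where CC_i is the number of correctly classified instances and s_i the size of tree i; the exact expression used by GATree may differ.

    payoff(T_i) = CC_i \cdot \frac{x}{s_i^{2} + x}

With x near 1 the size factor strongly punishes large trees, while a large x drives the factor toward 1 so that accuracy dominates, matching the two bullets above.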
12 Advanced system characteristics
- Scaled payoff function (Goldberg, 1989) (a sketch follows this list)
- Alternative crossovers
  - Evolution towards fit subtrees
  - Accurate subtrees had less chance to be used for crossover or mutation
- Limited Error Fitness (LEF) (Gathercole & Ross, 1997)
  - Significant CPU time savings with insignificant accuracy losses
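The scaled payoff referred to is, in Goldberg's (1989) formulation, a linear rescaling of the raw payoff applied each generation; a standard statement of it (my paraphrase, not the slide's own formula) is:

    f'_i = a\,f_i + b, \qquad a, b \text{ chosen so that } \overline{f'} = \overline{f} \text{ and } f'_{\max} = C_{mult}\,\overline{f}

This keeps the average payoff unchanged while capping the best individual at a small multiple C_mult of the average, which prevents early super-individuals from taking over the population.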
13 Second-layer GA
- Tests the effectiveness of all those components
  - Coded information about the mutation/crossover rates, the different heuristics, and a number of other optimizing parameters
- Most recurring results
  - Mutation rate 0.005
  - Crossover rate 0.93
  - Use a crowding-avoidance technique
- Alternative crossover/mutation techniques did not produce better results than the basic crossover/mutation
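An illustrative encoding of one second-layer individual; the field names and value ranges below are assumptions for the sketch, not the authors' actual chromosome.

    import random
    from dataclasses import dataclass

    @dataclass
    class MetaChromosome:
        """One second-layer individual: a parameter set for the tree-evolving GA."""
        mutation_rate: float      # the recurring best value was about 0.005
        crossover_rate: float     # the recurring best value was about 0.93
        use_crowding: bool        # crowding-avoidance on/off
        crossover_variant: int    # index into the alternative crossover operators

    def random_meta():
        return MetaChromosome(
            mutation_rate=random.uniform(0.001, 0.1),
            crossover_rate=random.uniform(0.5, 1.0),
            use_crowding=random.random() < 0.5,
            crossover_variant=random.randrange(3),
        )

    # The fitness of a MetaChromosome would be obtained by running the
    # first-layer GA with these settings and measuring the resulting trees.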
14 Search space / induction costs
- 10 leaves, 6 values, 2 classes
  - Search space > 50,173,704,142,848 (HUGE!) (see the breakdown below)
- Greedy feature selection
  - O(ak), a = attributes, k = instances (Quinlan, 1986)
  - O(a^2 k^2) with one level of lookahead (Murthy and Salzberg, 1995)
  - O(a^d k^d) for d-1 levels of lookahead
- Proposed heuristic
  - O(gen * k^2 * a)
- Extended heuristic
  - O(gen * k * a)
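The quoted figure is consistent with the following counting argument (my reconstruction; the slide does not show the derivation): a binary tree with 10 leaves has 9 internal nodes and C_9 = 4862 possible shapes (the 9th Catalan number), each internal node can test any of the 6 values, and each leaf can carry either of the 2 classes:

    |S| = C_9 \cdot 6^{9} \cdot 2^{10} = 4862 \cdot 10{,}077{,}696 \cdot 1{,}024 = 50{,}173{,}704{,}142{,}848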
15 How it works? An example (a)
- An artificial dataset with eight rules (26 possible values, three classes)
- The first two activation rules:
  - (15.0) c1 <- A = (a or b or t) and B = (a or h or q or x)
  - (14.0) c1 <- B = (f or l or s or w) and C = (c or e or f or k)
- Huge search space!
16 How it works? An example (b)
17 Illustration of the greedy heuristics problem
- An example dataset (XOR over A1, A2)

  A1  A2  A3  Class
  T   F   T   T
  T   F   F   T
  F   T   F   T
  F   T   T   T
  F   F   F   F
  F   F   F   F
  T   T   T   F
  T   T   F   T
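To see why a one-test-at-a-time greedy criterion struggles here, one can compute the information gain of each attribute on this table; a small sketch follows, with the dataset transcribed from the slide as shown above.

    from math import log2

    # Dataset as printed on the slide (A1, A2, A3, class); the class is
    # essentially the XOR of A1 and A2.
    DATA = [
        ("T", "F", "T", "T"), ("T", "F", "F", "T"),
        ("F", "T", "F", "T"), ("F", "T", "T", "T"),
        ("F", "F", "F", "F"), ("F", "F", "F", "F"),
        ("T", "T", "T", "F"), ("T", "T", "F", "T"),
    ]

    def entropy(rows):
        n = len(rows)
        counts = {}
        for *_, cls in rows:
            counts[cls] = counts.get(cls, 0) + 1
        return -sum(c / n * log2(c / n) for c in counts.values())

    def info_gain(rows, attr_index):
        n = len(rows)
        gain = entropy(rows)
        for value in {r[attr_index] for r in rows}:
            subset = [r for r in rows if r[attr_index] == value]
            gain -= len(subset) / n * entropy(subset)
        return gain

    # No single attribute separates an XOR concept on its own, so every gain
    # is small and the greedy root choice is close to arbitrary.
    for i, name in enumerate(["A1", "A2", "A3"]):
        print(f"gain({name}) = {info_gain(DATA, i):.3f}")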
18 C4.5 result tree
[Figure: the tree produced by C4.5 tests A3 at the root, then A1 and A2 below it, with t/f leaves]
- Totally unacceptable!
19 More experiments towards this direction

  Name  Attrib.  Class function                              Noise            Instanc.  Random attributes
  Xor1  10       (A1 xor A2) or (A3 xor A4)                  No               100       6
  Xor2  10       (A1 xor A2) xor (A3 xor A4)                 No               100       6
  Xor3  10       (A1 xor A2) or (A3 and A4) or (A5 and A6)   10% class error  100       4
  Par1  10       Three-attribute parity problem              No               100       7
  Par2  10       Four-attribute parity problem               No               100       6
20 Results for artificial datasets (mean ± standard deviation)

        C4.5         GATree
  Xor1  67 ± 12.04   100 ± 0
  Xor2  53 ± 18.57   90 ± 17.32
  Xor3  79 ± 6.52    78 ± 8.37
  Par1  70 ± 24.49   100 ± 0
  Par2  63 ± 6.71    85 ± 7.91
21 Results for UCI datasets
22 C4.5 / OneR deficiencies
- Similar preference biases
  - Accurate, small decision trees
  - This is acceptable
- Procedural biases are not optimized
  - Emphasis on accuracy (C4.5)
    - Tree size is not optimized
  - Emphasis on size (OneR)
    - Trivial search policy
- Pruning, as a greedy heuristic, has similar disadvantages
23 Future work
- Minimize evolution time
  - Crossover/mutation operators change the tree from a node downwards
  - So we can re-classify only the instances that belong to the changed node's subtree (a sketch follows below)
  - But we need to maintain more per-node statistics
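A sketch of the bookkeeping this would need, assuming each node caches which training instances reach it; it reuses the Node layout from the initialization sketch, and `classify` is any function that routes one instance down a (sub)tree. The names are illustrative, not the authors' design.

    def route(node, data, indices=None):
        """Store at each node the indices of the training instances that reach it
        (these are the extra per-node statistics the slide mentions)."""
        if indices is None:
            indices = list(range(len(data)))
        node.reached = indices
        if node.label is not None:                 # leaf: nothing to split further
            return
        left = [i for i in indices if data[i][node.test_attr] == node.test_value]
        right = [i for i in indices if data[i][node.test_attr] != node.test_value]
        route(node.left, data, left)
        route(node.right, data, right)

    def updated_correct(changed, data, labels, classify, old_subtree_correct, total_correct):
        """Recompute the tree's correct-classification count after only the
        subtree rooted at `changed` was replaced by an operator."""
        new_subtree_correct = sum(
            classify(changed, data[i]) == labels[i] for i in changed.reached)
        return total_correct - old_subtree_correct + new_subtree_correct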
24 Future work (2)
- Choose the output class using a majority vote over the produced tree forest (experts voting), as sketched below
- Pruning is a greedy heuristic
  - A GA-based pruning?
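A sketch of the proposed experts-voting scheme over the final population; the function names are illustrative.

    from collections import Counter

    def forest_predict(trees, instance, classify):
        """Majority vote over the evolved tree forest ("experts voting").
        `classify(tree, instance)` returns the class predicted by one tree."""
        votes = Counter(classify(tree, instance) for tree in trees)
        return votes.most_common(1)[0][0]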