Title: Breeding Decision Trees Using Evolutionary Techniques
1 Breeding Decision Trees Using Evolutionary Techniques
- Papagelis Athanasios
- Kalles Dimitrios
- Computer Technology Institute, AHEAD RM
2 Introduction
- We use GAs to evolve simple and accurate binary decision trees
- Simple genetic operators over tree structures
- Experiments with UCI datasets
  - Very good size
  - Competitive accuracy results
- Experiments with synthetic datasets
  - Superior accuracy results
3 Current tree induction algorithms
- ... use greedy heuristics
  - To guide the search during tree building
  - To prune the resulting trees
- Fast implementations
- Accurate results on widely used benchmark datasets (like the UCI datasets)
- Optimal results?
  - No
- Good for real-world problems?
  - There are not many real-world datasets available for research
4 More on greedy heuristics
- They can quickly guide us to desired solutions
- On the other hand, they can substantially deviate from the optimum
- WHY?
  - They are very strict
  - Which means they are VERY GOOD for only a limited problem space
5 Why GAs should work?
- GAs are not
  - Hill climbers
    - Blind on complex search spaces
  - Exhaustive searchers
    - Extremely expensive
- They are
  - Beam searchers
    - They balance the time needed against the space searched
- Application to a bigger problem space
  - Good results for many more problems
  - No need to tune or derive new algorithms
6 Another way to see it...
- Biases
  - Preference bias
    - Characteristics of the output
    - We should choose this
    - e.g. small trees
  - Procedural bias
    - How will we search?
    - We should not have to choose this
    - Unfortunately we have to
- Greedy heuristics make strong hypotheses about the search space
- GAs make weak hypotheses about the search space
7 The real-world question
- Are there datasets where hill-climbing techniques are really inadequate?
  - e.g. unnecessarily big, misguiding output
- Yes, there are
  - Conditionally dependent attributes
    - e.g. XOR
  - Irrelevant attributes
- Many solutions use GAs as a preprocessor to select adequate attributes
- Direct genetic search can prove more efficient for those datasets
8 The proposed solution
- Select the desired decision tree characteristics (e.g. small size)
- Adopt a decision tree representation with appropriate genetic operators
- Create an appropriate fitness function
- Produce a representative initial population
- Evolve for as long as you wish! (a sketch of the loop follows)
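A minimal sketch of this recipe as a generic GA driver in Python. The function parameters (init_population, fitness, crossover, mutate), the selection scheme, and the elitism step are illustrative assumptions, not the authors' implementation; the rate defaults echo the values reported on the second-layer GA slide.

    import random

    def evolve(init_population, fitness, crossover, mutate,
               generations=500, p_cross=0.93, p_mut=0.005):
        """Generic GA loop: the caller supplies tree initialization,
        payoff, crossover and mutation functions."""
        population = init_population()
        for _ in range(generations):
            scored = sorted(population, key=fitness, reverse=True)
            next_gen = scored[:2]                              # keep the two best (elitism)
            while len(next_gen) < len(population):
                a, b = random.sample(scored[:len(scored) // 2], 2)   # simple truncation-style selection
                child = crossover(a, b) if random.random() < p_cross else a  # in practice, a copy of `a`
                if random.random() < p_mut:
                    child = mutate(child)
                next_gen.append(child)
            population = next_gen
        return max(population, key=fitness)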
9 Initialization procedure
- Population of minimal decision trees
- Simple and fast
  - Choose a random value as the test value
  - Choose two random classes as leaves
[Figure: a minimal tree whose root tests A2, with the leaves Class1 and Class2]
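A sketch of such a minimal tree and of its random construction; the Node structure and helper names are assumptions for illustration, not the authors' code.

    import random
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        test_attr: Optional[str] = None    # attribute tested at an internal node
        test_value: Optional[str] = None   # value the test compares against
        label: Optional[str] = None        # class label (leaves only)
        left: Optional["Node"] = None      # branch taken when the test succeeds
        right: Optional["Node"] = None     # branch taken when the test fails

    def minimal_tree(attributes, classes):
        """One individual of the initial population: a single random test
        node with two randomly labelled class leaves."""
        attr = random.choice(list(attributes))
        return Node(
            test_attr=attr,
            test_value=random.choice(attributes[attr]),
            left=Node(label=random.choice(classes)),
            right=Node(label=random.choice(classes)),
        )

    # Example matching the slide's figure: a root testing A2, leaves Class1/Class2.
    tree = minimal_tree({"A1": ["a", "b"], "A2": ["a", "b"]}, ["Class1", "Class2"])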
10 Genetic operators
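The slide's illustrations are not reproduced here. A typical pair of operators for tree-structured chromosomes, consistent with the description elsewhere in the presentation, is subtree-swap crossover and node mutation; the sketch below reuses the Node class from the initialization sketch above and is an assumption, not the authors' exact operators.

    import copy
    import random

    def all_nodes(node):
        """Collect every node of a tree (internal nodes and leaves)."""
        nodes = [node]
        if node.left:
            nodes += all_nodes(node.left)
        if node.right:
            nodes += all_nodes(node.right)
        return nodes

    def crossover(parent_a, parent_b):
        """Graft a randomly chosen subtree of one parent into a copy of the other."""
        child = copy.deepcopy(parent_a)
        target = random.choice(all_nodes(child))
        donor = copy.deepcopy(random.choice(all_nodes(parent_b)))
        target.__dict__.update(donor.__dict__)      # replace the subtree in place
        return child

    def mutate(tree, attributes, classes):
        """Change the test value of a random internal node, or relabel a random leaf."""
        node = random.choice(all_nodes(tree))
        if node.label is not None:                   # leaf: pick a new class
            node.label = random.choice(classes)
        else:                                        # internal node: pick a new test value
            node.test_value = random.choice(attributes[node.test_attr])
        return tree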
11 Payoff function
- Balance between accuracy and size
- Set x depending on the desired output characteristics
  - Small trees: x near 1
  - Emphasis on accuracy: x grows big
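The formula itself appears only as a figure on the slide. A payoff of the shape the bullets describe, with accuracy discounted by a size factor that x controls, can be sketched as below, where CC_i is the number of correctly classified instances and s_i the size of tree i; the exact expression used by GATree may differ.

    payoff(T_i) = CC_i \cdot \frac{x}{s_i^{2} + x}

With x near 1 the size factor strongly punishes large trees, while a large x drives the factor toward 1 so that accuracy dominates, matching the two bullets above.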
12 Advanced system characteristics
- Scaled payoff function (Goldberg, 1989) (a sketch follows this list)
- Alternative crossovers
  - Evolution towards fit subtrees
  - Accurate subtrees had less chance to be used for crossover or mutation
- Limited Error Fitness (LEF) (Gathercole & Ross, 1997)
  - Significant CPU time savings with insignificant accuracy losses
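The scaled payoff referred to is, in Goldberg's (1989) formulation, a linear rescaling of the raw payoff applied each generation; a standard statement of it (my paraphrase, not the slide's own formula) is:

    f'_i = a\,f_i + b, \qquad a, b \text{ chosen so that } \overline{f'} = \overline{f} \text{ and } f'_{\max} = C_{mult}\,\overline{f}

This keeps the average payoff unchanged while capping the best individual at a small multiple C_mult of the average, which prevents early super-individuals from taking over the population.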
13 Second-layer GA
- Tests the effectiveness of all those components
  - Coded information about the mutation/crossover rates, the different heuristics, and a number of other optimizing parameters
- Most recurring results
  - Mutation rate 0.005
  - Crossover rate 0.93
  - Use a crowding-avoidance technique
- Alternative crossover/mutation techniques did not produce better results than the basic crossover/mutation
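An illustrative encoding of one second-layer individual; the field names and value ranges below are assumptions for the sketch, not the authors' actual chromosome.

    import random
    from dataclasses import dataclass

    @dataclass
    class MetaChromosome:
        """One second-layer individual: a parameter set for the tree-evolving GA."""
        mutation_rate: float      # the recurring best value was about 0.005
        crossover_rate: float     # the recurring best value was about 0.93
        use_crowding: bool        # crowding-avoidance on/off
        crossover_variant: int    # index into the alternative crossover operators

    def random_meta():
        return MetaChromosome(
            mutation_rate=random.uniform(0.001, 0.1),
            crossover_rate=random.uniform(0.5, 1.0),
            use_crowding=random.random() < 0.5,
            crossover_variant=random.randrange(3),
        )

    # The fitness of a MetaChromosome would be obtained by running the
    # first-layer GA with these settings and measuring the resulting trees.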
14 Search space / induction costs
- 10 leaves, 6 values, 2 classes
  - Search space > 50,173,704,142,848 (HUGE!) (see the breakdown below)
- Greedy feature selection
  - O(ak), a = attributes, k = instances (Quinlan, 1986)
  - O(a^2 k^2) with one level of lookahead (Murthy and Salzberg, 1995)
  - O(a^d k^d) for d-1 levels of lookahead
- Proposed heuristic
  - O(gen * k^2 * a)
- Extended heuristic
  - O(gen * k * a)
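The quoted figure is consistent with the following counting argument (my reconstruction; the slide does not show the derivation): a binary tree with 10 leaves has 9 internal nodes and C_9 = 4862 possible shapes (the 9th Catalan number), each internal node can test any of the 6 values, and each leaf can carry either of the 2 classes:

    |S| = C_9 \cdot 6^{9} \cdot 2^{10} = 4862 \cdot 10{,}077{,}696 \cdot 1{,}024 = 50{,}173{,}704{,}142{,}848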
15 How it works? An example (a)
- An artificial dataset with eight rules (26 possible values, three classes)
- The first two activation rules:
  - (15.0) c1 <- A = (a or b or t) and B = (a or h or q or x)
  - (14.0) c1 <- B = (f or l or s or w) and C = (c or e or f or k)
- Huge search space!
16 How it works? An example (b)
17 Illustration of the greedy heuristics problem
- An example dataset (XOR over A1, A2)

  A1  A2  A3  Class
  T   F   T   T
  T   F   F   T
  F   T   F   T
  F   T   T   T
  F   F   F   F
  F   F   F   F
  T   T   T   F
  T   T   F   T
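To see why a one-test-at-a-time greedy criterion struggles here, one can compute the information gain of each attribute on this table; a small sketch follows, with the dataset transcribed from the slide as shown above.

    from math import log2

    # Dataset as printed on the slide (A1, A2, A3, class); the class is
    # essentially the XOR of A1 and A2.
    DATA = [
        ("T", "F", "T", "T"), ("T", "F", "F", "T"),
        ("F", "T", "F", "T"), ("F", "T", "T", "T"),
        ("F", "F", "F", "F"), ("F", "F", "F", "F"),
        ("T", "T", "T", "F"), ("T", "T", "F", "T"),
    ]

    def entropy(rows):
        n = len(rows)
        counts = {}
        for *_, cls in rows:
            counts[cls] = counts.get(cls, 0) + 1
        return -sum(c / n * log2(c / n) for c in counts.values())

    def info_gain(rows, attr_index):
        n = len(rows)
        gain = entropy(rows)
        for value in {r[attr_index] for r in rows}:
            subset = [r for r in rows if r[attr_index] == value]
            gain -= len(subset) / n * entropy(subset)
        return gain

    # No single attribute separates an XOR concept on its own, so every gain
    # is small and the greedy root choice is close to arbitrary.
    for i, name in enumerate(["A1", "A2", "A3"]):
        print(f"gain({name}) = {info_gain(DATA, i):.3f}")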
18 C4.5 result tree
[Figure: the tree produced by C4.5 tests A3 at the root, then A1 and A2 below it, with t/f leaves]
- Totally unacceptable!
19 More experiments towards this direction

  Name  Attrib.  Class function                              Noise            Instanc.  Random attributes
  Xor1  10       (A1 xor A2) or (A3 xor A4)                  No               100       6
  Xor2  10       (A1 xor A2) xor (A3 xor A4)                 No               100       6
  Xor3  10       (A1 xor A2) or (A3 and A4) or (A5 and A6)   10% class error  100       4
  Par1  10       Three-attribute parity problem              No               100       7
  Par2  10       Four-attribute parity problem               No               100       6
20 Results for artificial datasets (mean ± standard deviation)

        C4.5         GATree
  Xor1  67 ± 12.04   100 ± 0
  Xor2  53 ± 18.57   90 ± 17.32
  Xor3  79 ± 6.52    78 ± 8.37
  Par1  70 ± 24.49   100 ± 0
  Par2  63 ± 6.71    85 ± 7.91
21 Results for UCI datasets
22 C4.5 / OneR deficiencies
- Similar preference biases
  - Accurate, small decision trees
  - This is acceptable
- Procedural biases are not optimized
  - Emphasis on accuracy (C4.5)
    - Tree size is not optimized
  - Emphasis on size (OneR)
    - Trivial search policy
- Pruning, as a greedy heuristic, has similar disadvantages
23 Future work
- Minimize evolution time
  - Crossover/mutation operators change the tree from a node downwards
  - So we can re-classify only the instances that belong to the changed node's subtree (a sketch follows below)
  - But we need to maintain more per-node statistics
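A sketch of the bookkeeping this would need, assuming each node caches which training instances reach it; it reuses the Node layout from the initialization sketch, and `classify` is any function that routes one instance down a (sub)tree. The names are illustrative, not the authors' design.

    def route(node, data, indices=None):
        """Store at each node the indices of the training instances that reach it
        (these are the extra per-node statistics the slide mentions)."""
        if indices is None:
            indices = list(range(len(data)))
        node.reached = indices
        if node.label is not None:                 # leaf: nothing to split further
            return
        left = [i for i in indices if data[i][node.test_attr] == node.test_value]
        right = [i for i in indices if data[i][node.test_attr] != node.test_value]
        route(node.left, data, left)
        route(node.right, data, right)

    def updated_correct(changed, data, labels, classify, old_subtree_correct, total_correct):
        """Recompute the tree's correct-classification count after only the
        subtree rooted at `changed` was replaced by an operator."""
        new_subtree_correct = sum(
            classify(changed, data[i]) == labels[i] for i in changed.reached)
        return total_correct - old_subtree_correct + new_subtree_correct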
24 Future work (2)
- Choose the output class using a majority vote over the produced tree forest (experts voting), as sketched below
- Pruning is a greedy heuristic
  - A GA-based pruning?
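A sketch of the proposed experts-voting scheme over the final population; the function names are illustrative.

    from collections import Counter

    def forest_predict(trees, instance, classify):
        """Majority vote over the evolved tree forest ("experts voting").
        `classify(tree, instance)` returns the class predicted by one tree."""
        votes = Counter(classify(tree, instance) for tree in trees)
        return votes.most_common(1)[0][0]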