Breeding Decision Trees Using Evolutionary Techniques - PowerPoint PPT Presentation

Provided by: papagelisa
Transcript and Presenter's Notes



1
Breeding Decision Trees Using Evolutionary
Techniques
  • Papagelis Athanasios, Kalles Dimitrios
    Computer Technology Institute, AHEAD RM

2
Introduction
  • We use GAs to evolve simple and accurate binary
    decision trees
  • Simple genetic operators over tree structures
  • Experiments with UCI datasets
    • Very good size results
    • Competitive accuracy results
  • Experiments with synthetic datasets
    • Superior accuracy results

3
Current tree induction algorithms
  • ... use greedy heuristics
    • To guide search during tree building
    • To prune the resulting trees
  • Fast implementations
  • Accurate results on widely used benchmark
    datasets (like the UCI datasets)
  • Optimal results? No
  • Good for real-world problems? Hard to say: not
    many real-world datasets are available for
    research

4
More on greedy heuristics
  • They can quickly guide us to desired solutions
  • On the other hand, they can substantially deviate
    from the optimum
  • Why? They are very strict, which means they are
    very good only for a limited problem space

5
Why should GAs work?
  • GAs are not
    • Hill climbers, blind on complex search spaces
    • Exhaustive searchers, which are extremely
      expensive
  • They are
    • Beam searchers: they balance the time needed
      against the space searched
  • Applied to a bigger problem space, this gives
    • Good results for many more problems
    • No need to tune or derive new algorithms

6
Another way to see it
  • Biases
    • Preference bias: the characteristics of the
      output; something we should choose (e.g. small
      trees)
    • Procedural bias: how we will search; something
      we should not have to choose, but unfortunately
      we do
  • Greedy heuristics make strong hypotheses about
    the search space
  • GAs make weak hypotheses about the search space

7
The real world question
  • Are there datasets where hill-climbing techniques
    are really inadequate?
    • e.g. unnecessarily big, misleading output
  • Yes, there are
    • Conditionally dependent attributes (e.g. XOR)
    • Irrelevant attributes
  • Many solutions use GAs as a preprocessor to
    select adequate attributes
  • Direct genetic search can prove more efficient
    for those datasets

8
The proposed solution
  • Select the desired decision tree characteristics
    (e.g. small size)
  • Adopt a decision tree representation with
    appropriate genetic operators
  • Create an appropriate fitness function
  • Produce a representative initial population
  • Evolve for as long as you wish!
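
The steps above can be sketched as a minimal GA loop. This is an illustrative skeleton, not the authors' implementation: `fitness`, `crossover`, and `mutate` are placeholders for the tree-specific operators, and simple truncation selection stands in for whatever selection scheme the actual system uses.

```python
import random

def evolve(population, fitness, crossover, mutate,
           generations=100, crossover_rate=0.93, mutation_rate=0.005):
    """Generic GA loop: select by fitness, recombine, mutate.

    `population` is a list of candidate decision trees; `fitness`,
    `crossover`, and `mutate` are problem-specific callables.
    """
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        # Keep the better half as parents (simple truncation selection).
        parents = scored[:max(2, len(scored) // 2)]
        offspring = []
        while len(offspring) < len(population):
            a, b = random.sample(parents, 2)
            child = crossover(a, b) if random.random() < crossover_rate else a
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)
```

The default rates (0.93 crossover, 0.005 mutation) are the values the second-layer GA experiments reported later in the talk.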

9
Initialization procedure
  • Population of minimal decision trees
  • Simple and fast
  • Choose a random value as test value
  • Choose two random classes as leaves

[Figure: a minimal tree, a single test on attribute A2
with two leaves, Class1 and Class2]
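
A minimal-tree initializer matching these bullets might look like the sketch below; the dict-based tree representation is an assumption for illustration, not the paper's data structure.

```python
import random

# A tree is a dict: internal nodes test one attribute value,
# leaves carry a class label (hypothetical representation).
def minimal_tree(attributes, classes):
    """Build a minimal decision tree: one random attribute test
    and two randomly chosen, distinct class leaves.

    `attributes` maps attribute name -> list of possible values;
    `classes` is a list of at least two class labels.
    """
    attr = random.choice(list(attributes))
    value = random.choice(attributes[attr])
    left, right = random.sample(classes, 2)  # two distinct classes
    return {"test": (attr, value),
            "yes": {"leaf": left},
            "no": {"leaf": right}}
```

Calling this once per individual yields the simple, fast initial population the slide describes.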
10
Genetic operators
11
Payoff function
  • Balance between accuracy and size
  • Set x depending on the desired output
    characteristics
    • Small trees → x near 1
    • Emphasis on accuracy → x grows big
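
One payoff with exactly this behaviour multiplies classification performance by a size factor that tends to 1 when x is large (accuracy dominates) and penalizes big trees when x is near 1. The precise formula below is an illustration consistent with the slide, not necessarily the authors' exact function.

```python
def payoff(correct, size, x=1000.0):
    """Accuracy/size trade-off controlled by x.

    The size factor x / (size**2 + x) approaches 1 for large x,
    so accuracy dominates; when x is near 1 it shrinks quickly
    with tree size, favoring small trees.
    (Illustrative form, assumed rather than taken from the paper.)
    """
    return (correct ** 2) * (x / (size ** 2 + x))
```

For example, with x near 1 a smaller, slightly less accurate tree can outscore a bigger, more accurate one, while with a huge x the ranking follows accuracy alone.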

12
Advanced System Characteristics
  • Scaled payoff function (Goldberg, 1989)
  • Alternative crossovers
  • Evolution towards fit subtrees
    • Accurate subtrees get a smaller chance of being
      used for crossover or mutation
  • Limited Error Fitness (LEF) (Gathercole and Ross,
    1997)
    • Significant CPU time savings with insignificant
      accuracy losses

13
Second Layer GA
  • Tests the effectiveness of all those components
  • Codes information about the mutation/crossover
    rates and the different heuristics, as well as a
    number of other optimizing parameters
  • Most recurring results
    • Mutation rate 0.005
    • Crossover rate 0.93
    • Use of a crowding-avoidance technique
  • Alternative crossover/mutation techniques did not
    produce better results than basic
    crossover/mutation

14
Search space / Induction costs
  • 10 leaves, 6 values, 2 classes
    • Search space > 50,173,704,142,848 (HUGE!)
  • Greedy feature selection
    • O(a·k), a = attributes, k = instances
      (Quinlan, 1986)
    • O(a^2·k^2) with one level of lookahead
      (Murthy and Salzberg, 1995)
    • O(a^d·k^d) for d - 1 levels of lookahead
  • Proposed heuristic: O(gen·k^2·a)
  • Extended heuristic: O(gen·k·a)

15
How does it work? An example (a)
  • An artificial dataset with eight rules (26
    possible values, three classes)
  • The first two activation rules are shown below:
    • (15.0) c1 ← A ∈ {a, b, t} and B ∈ {a, h, q, x}
    • (14.0) c1 ← B ∈ {f, l, s, w} and C ∈ {c, e, f, k}
  • Huge search space!

16
How does it work? An example (b)
17
Illustration of greedy heuristics problem
  • An example dataset (XOR over A1, A2)

A1  A2  A3  Class
T   F   T   T
T   F   F   T
F   T   F   T
F   T   T   T
F   F   F   F
F   F   F   F
T   T   T   F
T   T   F   T
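
This is exactly where greedy split selection breaks down: on a pure XOR target, each attribute taken alone has zero information gain, so a greedy inducer has nothing to guide it. A quick check, run on a clean two-attribute XOR truth table rather than the sample above:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    """Entropy reduction from splitting on a single attribute."""
    total = entropy(labels)
    n = len(rows)
    for value in set(r[attr_index] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr_index] == value]
        total -= len(subset) / n * entropy(subset)
    return total

# Clean XOR over A1, A2: neither attribute helps on its own.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]
print(info_gain(rows, labels, 0), info_gain(rows, labels, 1))  # prints 0.0 0.0
```

Both gains are exactly zero, so a one-step-lookahead heuristic like C4.5's cannot tell the relevant attributes from an irrelevant one such as A3.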
18
C4.5 result tree

[Figure: the C4.5 output tree, testing A3 at the root
and A1 and A2 below it, with leaves t, f, t, f]
Totally unacceptable!!!
19
More experiments towards this direction
Name  Attrib.  Class Function                             Noise            Instanc.  Random Attributes
Xor1  10       (A1 xor A2) or (A3 xor A4)                 No               100       6
Xor2  10       (A1 xor A2) xor (A3 xor A4)                No               100       6
Xor3  10       (A1 xor A2) or (A3 and A4) or (A5 and A6)  10% class error  100       4
Par1  10       Three-attribute parity problem             No               100       7
Par2  10       Four-attribute parity problem              No               100       6
20
Results for artificial datasets
        C4.5         GATree
Xor1    67 ± 12.04   100 ± 0
Xor2    53 ± 18.57   90 ± 17.32
Xor3    79 ± 6.52    78 ± 8.37
Par1    70 ± 24.49   100 ± 0
Par2    63 ± 6.71    85 ± 7.91
21
Results for UCI datasets
22
C4.5 / OneR deficiencies
  • Similar preference biases
    • Accurate, small decision trees; this is
      acceptable
  • Procedural biases not optimized
    • Emphasis on accuracy (C4.5): tree size not
      optimized
    • Emphasis on size (OneR): trivial search policy
  • Pruning, as a greedy heuristic, has similar
    disadvantages

23
Future work
  • Minimize evolution time
    • Crossover/mutation operators change the tree
      from a node downwards
    • We can therefore re-classify only the instances
      that belong to the changed node's subtree
    • But we need to maintain more node statistics

24
Future work (2)
  • Choose the output class using a majority vote
    over the produced tree forest (experts voting)
  • Pruning is a greedy heuristic
    • A GA-based pruning?
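
The experts-voting idea can be sketched as a plain majority vote over the evolved forest; `classify` here is a hypothetical per-tree classification routine, not part of the described system.

```python
from collections import Counter

def vote(trees, classify, instance):
    """Majority vote over a forest of evolved trees.

    `classify(tree, instance)` returns one class label per tree;
    the most frequent label across the forest wins.
    """
    votes = Counter(classify(t, instance) for t in trees)
    return votes.most_common(1)[0][0]
```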