1
Instruction Matrix based Genetic Programming for
Classification
  • Li Gang
  • May 3, 2007

2
Outline
  • IMGP
  • Classification
  • Gradient descent
  • Program tree complexity
  • Experiment

3
IMGP - representation
  • Program Tree
  • (x - y) × (y + c)
  • (y + c) × (x + y)
  • Instruction Matrix (one possible encoding is
    sketched below)
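  A rough sketch of how such a matrix could be laid out, assuming a full
    binary tree in heap order with several candidate instructions per node
    position (the layout and names below are illustrative assumptions, not
    the exact IMGP data structure):

    # Illustrative sketch only: one row per tree position (heap order, so the
    # children of position i sit at 2i+1 and 2i+2), and each cell stores one
    # candidate instruction together with its own fitness record.
    import random

    DEPTH = 3
    N_POSITIONS = 2 ** DEPTH - 1
    INSTRUCTIONS = ['+', '-', '*', 'x', 'y', 'c']   # functions and terminals mixed for brevity

    matrix = [[{'op': op, 'best_fitness': float('inf')}
               for op in INSTRUCTIONS]
              for _ in range(N_POSITIONS)]

    def extract_individual(matrix):
        # Pick one instruction per tree position to form a program tree;
        # a real system would restrict leaf positions to terminals.
        return [random.choice(row) for row in matrix]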

4
IMGP-Algorithm
5
Classification
  • Given a set of training data (xi, ti), we need
    to learn a classifier y = f(x; T) such that the
    following error function is minimized
  • Most learning methods have a fixed model; they
    learn only by changing the parameters T
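  A standard choice here, assuming the usual squared-error form, would be
    E(T) = Σi ( f(xi; T) - ti )²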

6
Gradient Descent
  • A numerical method that iteratively adjusts the
    parameters to minimize the objective function
  • An example in neural networks (a generic update
    sketch follows below)
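  A minimal sketch of the basic update rule, theta ← theta - eta · ∇E(theta)
    (the function and parameter names are illustrative):

    def gradient_descent(theta, objective_grad, eta=0.01, steps=100):
        # theta: list of parameters; objective_grad(theta) returns dE/dtheta
        for _ in range(steps):
            grad = objective_grad(theta)
            theta = [t - eta * g for t, g in zip(theta, grad)]
        return theta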

7
GP for Classification
  • The power (or weakness?) of GP is that it
    searches for the optimal structure and parameters
    simultaneously
  • The program tree is a mathematical form with the
    constants as the parameters
  • Can we optimize the constants given a fixed tree
    structure?

8
Gradient Descent for GP
  • Usually gradient descent is used when the
    mathematical form is fixed in advance, so the
    update rule can be derived offline
  • Here, however, we can calculate the gradient by
    traversing the program tree recursively (see the
    sketch below)
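  A minimal sketch of such a traversal, assuming nodes are plain tuples and
    we differentiate with respect to a single constant node (the node layout
    and names are my assumptions, not the presenter's code):

    # Forward-mode sketch: each call returns (value, d value / d c) for one
    # chosen constant node c_node. Assumed node layout: ('const', v),
    # ('var', name), or (op, left, right) with op in {'+', '-', '*'}.
    def eval_with_grad(node, x, c_node):
        kind = node[0]
        if kind == 'const':
            return node[1], (1.0 if node is c_node else 0.0)
        if kind == 'var':
            return x[node[1]], 0.0
        op, left, right = node
        lv, lg = eval_with_grad(left, x, c_node)
        rv, rg = eval_with_grad(right, x, c_node)
        if op == '+':
            return lv + rv, lg + rg
        if op == '-':
            return lv - rv, lg - rg
        if op == '*':
            return lv * rv, lg * rv + lv * rg   # product rule
        raise ValueError('unknown operator: %r' % op)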

9
Computation Cost
  • The computation cost is relatively large
  • We need to calculate the gradients of the internal
    nodes in addition to their values
  • The constants need to be updated a few times
    before they stabilize
  • Gradient descent is therefore applied only to the
    current best individual, and only for a few steps

10
Program Tree Complexity
  • How can we enhance generalization?
  • Occam's razor, or Minimum Description Length
  • A penalty on the program tree complexity is added
    to the fitness (smaller is better)
  • fitness = fitness + w × complexity

11
Tree Size?
  • A naïve way is to use the tree size as the
    complexity of the program tree
  • A linear function is the simplest model of a class
    boundary
  • So we count only the multiplication and division
    nodes of the tree
  • This is the approach I am using at present (see
    the sketch below)
  • fitness = fitness + w × #{×, /}
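  A minimal sketch of this penalty, reusing the tuple node layout assumed
    earlier (the weight w is arbitrary):

    def count_mul_div(node):
        # Terminals contribute nothing; internal nodes contribute 1 if they
        # are multiplication or division, plus whatever their children hold.
        if node[0] in ('const', 'var'):
            return 0
        op, left, right = node
        here = 1 if op in ('*', '/') else 0
        return here + count_mul_div(left) + count_mul_div(right)

    def penalized_fitness(raw_fitness, tree, w=0.01):
        # Smaller is better: add w times the number of '*' and '/' nodes.
        return raw_fitness + w * count_mul_div(tree)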

12
Experiment
  • 5-fold cross-validation
  • 20 independent runs on each fold
  • G3-P is a grammar-guided GP
  • Decision tree
  • Fuzzy rule base
  • Neural network
  • Petri net

13
(No Transcript)
14
(No Transcript)
15
Second Derivative
  • Intuitively, the ruggedness of the function curve
    reflects the function complexity (shown on the board)
  • Analytically, the second derivative of the
    function measures the function complexity
  • (ax + b)'' = 0
  • (ax² + bx + c)'' = 2a (constant)
  • (ax³ + bx² + cx + d)'' = 6ax + 2b (variable)
  • fitness = fitness + w × Σi f''(xi) (a numerical
    sketch follows below)
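  A minimal numerical sketch, approximating f'' at each training point with a
    central finite difference (the step size h, the single-input assumption,
    and the weight w are mine):

    def second_derivative_penalty(f, xs, w=0.01, h=1e-3):
        # Sum the finite-difference estimate of f''(x) over the training
        # inputs, as in the slide's formula; in practice the magnitude
        # |f''(x)| may be what is intended.
        total = 0.0
        for x in xs:
            total += (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
        return w * total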

16
A simpler way?
  • fitness = fitness + w × Σi f''(xi)
  • Is it possible to avoid calculating the sum?
  • Can we infer the complexity from the operators
    directly?
  • (xy)'' = x''y + xy'' + 2x'y'
  • If x'' = y'' = 1, which implies x' = y' = 2?
  • So (xy)'' = 2 + 2 + 8 = 12? Or max(2, 2, 8) = 8?

17
  • Thank You