CSE 634 Data Mining: Concepts and Techniques
Prof. Anita Wasilewska
Genetic Algorithms (GAs)
By Group 1: Abhishek Sharma, Mikhail Rubnich, George Iordache, Marcela Boboila
General Description of the Method
By Abhishek Sharma
References
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2003.
- Data Mining Techniques class lecture notes and PP slides.
- http://cs.felk.cvut.cz/xobitko/ga/
- Massachusetts Institute of Technology, Prof. de Weck and Prof. Willcox, Multidisciplinary System Design Optimization course, lecture notes on heuristic techniques: A Basic Introduction to Genetic Algorithms. http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
History of Genetic Algorithms
- Evolutionary computing was introduced in the 1960s by I. Rechenberg.
- Professor John Holland of the University of Michigan, in his book "Adaptation in Natural and Artificial Systems" (1975), explored the concept of using mathematically based artificial evolution as a method for conducting a structured search for solutions to complex problems.
- Dr. David E. Goldberg, in his 1989 landmark text "Genetic Algorithms in Search, Optimization and Machine Learning", suggested applications for genetic algorithms in a wide range of engineering fields.
What Are Genetic Algorithms (GAs)?
- Genetic algorithms are search and optimization techniques based on Darwin's principle of natural selection.
- Problems are solved by an evolutionary process resulting in a best (fittest) solution (the survivor).
- In other words, the solution is evolved through:
  1. Inheritance: offspring acquire characteristics of their parents.
  2. Mutation: change, to avoid similarity.
  3. Natural selection: variations improve survival.
  4. Recombination: crossover.
Genetics
- Chromosome
  - All living organisms consist of cells. Each cell contains the same set of chromosomes.
  - Chromosomes are strings of DNA and consist of genes, blocks of DNA.
  - Each gene encodes a trait, for example eye color.
- Reproduction
  - During reproduction, recombination (or crossover) occurs first: genes from the parents combine to form a whole new chromosome. The newly created offspring can then be mutated; these changes are mainly caused by errors in copying genes from the parents.
- The fitness of an organism is measured by the success of the organism in its life (survival).
Citation: http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
Principle of Natural Selection
- "Select the best, discard the rest."
- Two important elements are required for any problem before a genetic algorithm can be used to solve it:
  - A method for representing a solution (encoding), e.g. a string of bits, numbers, or characters.
  - A method for measuring the quality of any proposed solution, using a fitness function, e.g. determining total weight.
GA Elements
Citation: http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
Search Space
- If we are solving a problem, we are usually looking for the solution that is the best among all others. The space of all feasible solutions (the set of objects among which the desired solution lies) is called the search space (also state space). Each point in the search space represents one feasible solution, and each feasible solution can be "marked" by its value, or fitness, for the problem.
- Initialization
  - Initially, many individual solutions are randomly generated to form an initial population, covering the entire range of possible solutions (the search space).
- Selection
  - A proportion of the existing population is selected to breed a new generation.
- Reproduction
  - A second-generation population of solutions is generated from those selected, through the genetic operators crossover and mutation.
- Termination
  - A solution is found that satisfies minimum criteria.
  - A fixed number of generations is reached.
  - The allocated budget (computation time/money) is reached.
  - The highest-ranking solution's fitness is reaching, or has reached, a plateau, so that successive iterations no longer produce better results.
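Methodology Associated with GAs
Flowchart:
1. Begin: initialize the population.
2. Evaluate the solutions; T = 0 (first step).
3. Optimum solution? If yes (Y), stop.
4. If no (N), apply selection, crossover, and mutation; T = T + 1 (go to next step); evaluate the new solutions and return to step 3.
Citation: http://cs.felk.cvut.cz/xobitko/ga/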
Creating a GA on a Computer

    Simple_Genetic_Algorithm()
    {
        Initialize the Population;
        Calculate Fitness Function;
        While (Fitness Value != Optimal Value)
        {
            Selection;                  // Natural Selection, Survival of the Fittest
            Crossover;                  // Reproduction, propagate favorable characteristics
            Mutation;                   // Mutation
            Calculate Fitness Function;
        }
    }
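The pseudocode above can be turned into a small runnable C program. This is a minimal sketch, not the course's implementation: it assumes the toy OneMax problem (fitness = number of 1 bits, optimal value = chromosome length), binary tournament selection, single-point crossover, bit-flip mutation, and a generation budget so that the loop always terminates. POP, LEN, p_mut and the function names are illustrative choices.

```c
#include <stdlib.h>

#define POP 20   /* population size (assumed) */
#define LEN 16   /* bits per chromosome (assumed) */

/* OneMax fitness (hypothetical toy problem): the number of 1 bits. */
static int fitness(const int *chrom) {
    int f = 0;
    for (int i = 0; i < LEN; i++) f += chrom[i];
    return f;
}

/* Selection: binary tournament, i.e. the fitter of two random individuals. */
static int select_parent(int pop[POP][LEN]) {
    int a = rand() % POP, b = rand() % POP;
    return fitness(pop[a]) >= fitness(pop[b]) ? a : b;
}

/* Runs the loop of Simple_Genetic_Algorithm(); returns the number of
   generations used and stores the best fitness found in *best_out. */
int simple_genetic_algorithm(unsigned seed, int max_gens, double p_mut, int *best_out) {
    int pop[POP][LEN], next[POP][LEN];
    srand(seed);
    for (int i = 0; i < POP; i++)                     /* Initialize the population */
        for (int j = 0; j < LEN; j++) pop[i][j] = rand() % 2;

    int gen = 0, best = 0;
    for (;;) {
        best = 0;                                     /* Calculate fitness */
        for (int i = 0; i < POP; i++)
            if (fitness(pop[i]) > best) best = fitness(pop[i]);
        if (best == LEN || gen == max_gens) break;    /* optimal value or budget reached */

        for (int i = 0; i < POP; i++) {
            const int *p1 = pop[select_parent(pop)];  /* Selection */
            const int *p2 = pop[select_parent(pop)];
            int cut = 1 + rand() % (LEN - 1);         /* Crossover: single point */
            for (int j = 0; j < LEN; j++) {
                next[i][j] = j < cut ? p1[j] : p2[j];
                if ((double)rand() / ((double)RAND_MAX + 1.0) < p_mut)
                    next[i][j] ^= 1;                  /* Mutation: bit flip */
            }
        }
        for (int i = 0; i < POP; i++)                 /* new generation replaces the old */
            for (int j = 0; j < LEN; j++) pop[i][j] = next[i][j];
        gen++;
    }
    *best_out = best;
    return gen;
}
```

For example, `simple_genetic_algorithm(42u, 200, 0.01, &best)` returns the number of generations used; `best` equals LEN whenever the loop stopped before the budget ran out.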
Nature vs. Computer: Mapping
Encoding
- The process of representing a solution in the form of a string that conveys the necessary information.
- Just as each gene in a chromosome controls a particular characteristic of the individual, each element in the string represents a characteristic of the solution.
Encoding Methods
- Binary encoding: the most common method of encoding. Chromosomes are strings of 1s and 0s, and each position in the chromosome represents a particular characteristic of the problem.
- Permutation encoding: useful in ordering problems such as the Traveling Salesman Problem (TSP). For example, in TSP every chromosome is a string of numbers, each of which represents a city to be visited.
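A small C sketch of the two encodings above, assuming genes are stored as int arrays: a binary chromosome is read as an unsigned integer (most significant bit first), and a TSP permutation chromosome is checked for validity (every city visited exactly once). The function names and the 64-city limit are assumptions for illustration.

```c
#include <stdbool.h>

/* Binary encoding: decode 0/1 genes into an unsigned integer, MSB first. */
unsigned decode_binary(const int *genes, int len) {
    unsigned value = 0;
    for (int i = 0; i < len; i++)
        value = (value << 1) | (unsigned)(genes[i] & 1);
    return value;
}

/* Permutation encoding (TSP): valid only if cities 0..n-1 each appear once. */
bool is_valid_tour(const int *tour, int n) {
    bool seen[64] = { false };          /* sketch assumes at most 64 cities */
    if (n < 0 || n > 64) return false;
    for (int i = 0; i < n; i++) {
        if (tour[i] < 0 || tour[i] >= n || seen[tour[i]]) return false;
        seen[tour[i]] = true;
    }
    return true;
}
```

For instance, the chromosome 1011 decodes to 11, and the tour 2 0 3 1 is valid while 2 0 2 1 (city 2 twice) is not.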
Encoding Methods (contd.)
- Value encoding: used in problems where complicated values, such as real numbers, are involved and where binary encoding would not suffice.
- Good for some problems, but it is often necessary to develop specific crossover and mutation techniques for these chromosomes.
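As an illustration of the specialized operators value encoding needs, here is a C sketch of arithmetic (blend) crossover and a small random-perturbation mutation for real-valued genes. Both are common choices that we assume for the example; the slides do not prescribe particular operators, and `alpha` and `step` are hypothetical parameters.

```c
#include <stdlib.h>

/* Arithmetic crossover: each child gene is a weighted average of the parents' genes. */
void arithmetic_crossover(const double *p1, const double *p2, double *child,
                          int len, double alpha) {
    for (int i = 0; i < len; i++)
        child[i] = alpha * p1[i] + (1.0 - alpha) * p2[i];
}

/* Perturbation mutation: add a uniform random offset in [-step, step) to every gene. */
void perturb_mutation(double *genes, int len, double step) {
    for (int i = 0; i < len; i++) {
        double u = (double)rand() / ((double)RAND_MAX + 1.0);   /* u in [0, 1) */
        genes[i] += (2.0 * u - 1.0) * step;
    }
}
```

With alpha = 0.5 the child is the midpoint of the parents, e.g. parents (1, 2) and (3, 6) give (2, 4).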
Encoding Methods (contd.)
- Tree encoding: used mainly for evolving programs or expressions, i.e. for genetic programming.
- In tree encoding, every chromosome is a tree of objects, such as values and arithmetic operators, or commands in a programming language. Examples:
  ( + x ( / 5 y ) )
  ( do_until step wall )
Citation: http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
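A minimal C sketch of tree encoding, assuming a node is an operator (+ or /), a constant, or one of the variables x and y. Evaluating the tree for ( + x ( / 5 y ) ) at x = 1, y = 5 gives 1 + 5/5 = 2. The type and function names are hypothetical.

```c
#include <stddef.h>

typedef enum { N_CONST, N_VAR_X, N_VAR_Y, N_ADD, N_DIV } NodeKind;

typedef struct Node {
    NodeKind kind;
    double value;                       /* used only by N_CONST */
    const struct Node *left, *right;    /* children of operator nodes, NULL for leaves */
} Node;

/* Recursively evaluate the expression tree for the given x and y. */
double eval(const Node *n, double x, double y) {
    switch (n->kind) {
    case N_CONST: return n->value;
    case N_VAR_X: return x;
    case N_VAR_Y: return y;
    case N_ADD:   return eval(n->left, x, y) + eval(n->right, x, y);
    case N_DIV:   return eval(n->left, x, y) / eval(n->right, x, y);
    }
    return 0.0;                         /* not reached for valid kinds */
}
```

The tree ( + x ( / 5 y ) ) is an N_ADD node whose children are an N_VAR_X leaf and an N_DIV node over the constant 5 and an N_VAR_Y leaf; crossover and mutation on such chromosomes swap or replace whole subtrees.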
GA Operators
By Mikhail Rubnich
References
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2003.
- http://www.ai-junkie.com/ga/intro/gat2.html
- http://www.faqs.org/faqs/ai-faq/genetic/part2/
- http://en.wikipedia.org/wiki/Genetic_algorithms
Citation: http://www.ewh.ieee.org/soc/es/May2001/14/GA.GIF
Basic GA Operators
- Recombination
- Crossover: looking for solutions near existing solutions.
- Mutation: looking at completely new areas of the search space.
Fitness Function
- Quantifies the optimality of a solution (that is, a chromosome) so that the particular chromosome may be ranked against all the other chromosomes.
- A fitness value is assigned to each solution depending on how close it actually is to solving the problem.
- An ideal fitness function correlates closely with the goal and is quickly computable.
- For instance, in the knapsack problem:
  - Fitness function: the total value of the things in the knapsack.
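A C sketch of the knapsack fitness function under binary encoding (gene i = 1 means item i is packed). The slide defines fitness as the total value packed; giving overweight solutions a fitness of 0 is our assumption, one simple way among several to handle the capacity constraint.

```c
/* Total value of the packed items, or 0 if the packing exceeds the capacity. */
double knapsack_fitness(const int *genes, const double *value,
                        const double *weight, int n, double capacity) {
    double total_value = 0.0, total_weight = 0.0;
    for (int i = 0; i < n; i++) {
        if (genes[i]) {                  /* item i is in the knapsack */
            total_value += value[i];
            total_weight += weight[i];
        }
    }
    return total_weight <= capacity ? total_value : 0.0;   /* assumed penalty */
}
```

With values 10, 40, 30, weights 5, 4, 6 and capacity 10, the chromosome 110 scores 50, while 111 (weight 15) scores 0.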
Recombination
- Main idea: "Select the best, discard the rest."
- The process that chooses which solutions are preserved and allowed to reproduce, and which ones must die out.
- The main goal of the recombination operator is to emphasize the good solutions and eliminate the bad solutions in a population (while keeping the population size constant).
So, How Do We Select the Best?
- Roulette Selection
- Rank Selection
- Steady State Selection
- Tournament Selection
Roulette Wheel Selection
- Main idea: the fitter the solution, the more chances it has to be chosen.
- How does it work?
Example of Roulette Wheel Selection
Citation: www.cs.vu.nl/gusz/
Roulette Wheel Selection
All you have to do is spin the ball and grab the chromosome at the point where it stops.
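Spinning the wheel amounts to drawing a point in [0, total fitness) and finding the slot it falls in. A minimal C sketch, assuming non-negative fitness values: `roulette_pick` takes the spin explicitly so it can be checked by hand, and `roulette_select` draws it with `rand()`. Both names are hypothetical.

```c
#include <stdlib.h>

/* Return the slot the spin lands in: slot i spans fitness[i] units of the wheel. */
int roulette_pick(const double *fitness, int n, double spin) {
    double cumulative = 0.0;
    for (int i = 0; i < n; i++) {
        cumulative += fitness[i];
        if (spin < cumulative) return i;
    }
    return n - 1;   /* guard against round-off when spin is close to the total */
}

/* Spin once with rand(): draw a point uniformly in [0, total fitness). */
int roulette_select(const double *fitness, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) total += fitness[i];
    double spin = total * ((double)rand() / ((double)RAND_MAX + 1.0));
    return roulette_pick(fitness, n, spin);
}
```

With fitness values 1, 3 and 6, the chromosomes own 10%, 30% and 60% of the wheel, so a spin of 2.0 lands in the second slot.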
Crossover
- Main idea: combine the genetic material (bits) of 2 parent chromosomes (solutions) and produce a new child possessing characteristics of both parents.
- How does it work?
- There are several methods.
Crossover Methods
- Single-point crossover: a random point is chosen on the individual chromosomes (strings) and the genetic material is exchanged at this point.
Citation: http://www.ewh.ieee.org/soc/es/May2001/14/CROSS0.GIF
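A C sketch of single-point crossover on int-array chromosomes. The crossover point is passed in so the result is easy to check; in a GA it would be drawn at random, for example as `1 + rand() % (len - 1)`. The function name is hypothetical.

```c
/* Genes before `point` come from one parent, the rest from the other;
   the two children are complementary. */
void single_point_crossover(const int *p1, const int *p2, int *c1, int *c2,
                            int len, int point) {
    for (int i = 0; i < len; i++) {
        c1[i] = i < point ? p1[i] : p2[i];
        c2[i] = i < point ? p2[i] : p1[i];
    }
}
```

Crossing 11111 with 00000 at point 2 gives the children 11000 and 00111.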
Crossover Methods
- Two-point crossover: two random points are chosen on the individual chromosomes (strings) and the genetic material is exchanged at these points.
NOTE: These chromosomes are different from the last example.
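Two-point crossover in the same style, as a sketch: the genes in the half-open segment [lo, hi) are exchanged between the parents. The cut points would normally be drawn at random with lo < hi; here they are parameters, and the function name is hypothetical.

```c
/* Swap the segment [lo, hi) between the parents; genes outside it stay put. */
void two_point_crossover(const int *p1, const int *p2, int *c1, int *c2,
                         int len, int lo, int hi) {
    for (int i = 0; i < len; i++) {
        int inside = (i >= lo && i < hi);
        c1[i] = inside ? p2[i] : p1[i];
        c2[i] = inside ? p1[i] : p2[i];
    }
}
```

Crossing 111111 with 000000 at points 2 and 4 gives 110011 and 001100.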
Crossover Methods
- Uniform crossover: each gene (bit) is selected randomly from one of the corresponding genes of the parent chromosomes.
NOTE: In this example, uniform crossover yields ONLY 1 offspring.
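A C sketch of uniform crossover producing a single child, as on the slide. We assume each gene is taken from the first parent with probability `p_first` (0.5 gives the usual unbiased version); the parameter and function names are ours.

```c
#include <stdlib.h>

/* Each child gene is copied from p1 with probability p_first, otherwise from p2. */
void uniform_crossover(const int *p1, const int *p2, int *child, int len,
                       double p_first) {
    for (int i = 0; i < len; i++) {
        double u = (double)rand() / ((double)RAND_MAX + 1.0);   /* u in [0, 1) */
        child[i] = u < p_first ? p1[i] : p2[i];
    }
}
```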
Crossover (contd.)
- Crossover between 2 good solutions MAY NOT ALWAYS yield a better, or even an equally good, solution.
- Since the parents are good, the probability of the child being good is high.
- If the offspring is not good (a poor solution), it will be removed in the next iteration during selection.
Elitism
- Main idea: copy the best chromosomes (solutions) to the new population before applying crossover and mutation.
- When a new population is created by crossover or mutation, the best chromosome might be lost.
- Elitism forces the GA to retain some number of the best individuals at each generation.
- Elitism has been found to improve performance significantly.
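A minimal C sketch of elitism, assuming fixed-length int chromosomes and a precomputed fitness array: the `elite` fittest individuals are copied unchanged into the next population, and the remaining slots would then be filled by selection, crossover and mutation. LEN, the 64-individual limit and the function name are assumptions.

```c
#include <string.h>

#define LEN 8   /* chromosome length (assumed) */

/* Copy the `elite` fittest chromosomes of pop[0..n-1] into next[0..elite-1];
   returns how many were copied. */
int copy_elite(int pop[][LEN], const double *fitness, int n,
               int next[][LEN], int elite) {
    int used[64] = {0};               /* sketch assumes n <= 64 */
    if (n > 64) return 0;
    if (elite > n) elite = n;
    for (int k = 0; k < elite; k++) {
        int best = -1;
        for (int i = 0; i < n; i++)   /* fittest individual not copied yet */
            if (!used[i] && (best < 0 || fitness[i] > fitness[best])) best = i;
        used[best] = 1;
        memcpy(next[k], pop[best], sizeof next[k]);
    }
    return elite;
}
```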
Mutation
- Main idea: random inversion of bits in a solution to maintain diversity in the population.
- Example: giraffes; mutations can be beneficial.
Citation: http://www.ewh.ieee.org/soc/es/May2001/14/MUTATE0.GIF
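Bit-flip mutation as a C sketch: each bit is inverted independently with a small probability `p_mut` (an assumed parameter). The function name is hypothetical, and it returns the number of flipped bits only to make the behavior easy to check.

```c
#include <stdlib.h>

/* Invert each gene independently with probability p_mut; return the number flipped. */
int bit_flip_mutation(int *genes, int len, double p_mut) {
    int flipped = 0;
    for (int i = 0; i < len; i++) {
        if ((double)rand() / ((double)RAND_MAX + 1.0) < p_mut) {
            genes[i] ^= 1;
            flipped++;
        }
    }
    return flipped;
}
```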
Advantages and Disadvantages
- Advantages
  - There is always an answer, and the answer gets better with time.
  - Good for noisy environments.
  - Inherently parallel; easily distributed.
- Issues
  - Performance.
  - The solution is only as good as the evaluation function (often the hardest part).
  - Termination criteria.
Applications: Genetic Programming and Data Mining
By George Iordache
References
- A. A. Freitas. A survey of evolutionary algorithms for data mining and knowledge discovery. Pontificia Universidade Catolica do Parana, Brazil. In A. Ghosh and S. Tsutsui, editors, Advances in Evolutionary Computation, pages 819-845. Springer-Verlag, 2002. http://citeseer.ist.psu.edu/cache/papers/cs/23050/httpzSzzSzwww.ppgia.pucpr.brzSzalexzSzpub_papers.dirzSzAdvEC-bk.pdf/freitas01survey.pdf
- Anita Wasilewska, course lecture notes (2007 and previous years) on classification (Data Mining book, Chapters 5 and 7). http://www.cs.sunysb.edu/cse634/lecture_notes/07classification.pdf
- J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.
- R. Mendes, F. Voznika, A. Freitas, and J. Nievola. Discovering fuzzy classification rules with genetic programming and co-evolution. Pontificia Universidade Catolica do Parana, Brazil. In L. de Raedt and A. Siebes, editors, 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01), volume 2168 of LNAI, pages 314-325. Springer-Verlag, 2001. http://citeseer.ist.psu.edu/cache/papers/cs/23050/httpzSzzSzwww.ppgia.pucpr.brzSzalexzSzpub_papers.dirzSzPKDD-2001.pdf/mendes01discovering.pdf
- John R. Koza, Medical Informatics, Department of Medicine, Department of Electrical Engineering, Stanford University. Genetic algorithms and genetic programming, lecture notes, 2003. www.genetic-programming.com/c2003lecture1modified.ppt
Genetic Programming
- A program in C:

    int foo(int time)
    {
        int temp1, temp2;

        if (time > 10)            /* (IF (> TIME 10) 3 4) */
            temp1 = 3;
        else
            temp1 = 4;
        temp2 = temp1 + 1 + 2;    /* (+ 1 2 ...) */
        return temp2;
    }

- Equivalent expression (similar to a classification rule in data mining):
  (+ 1 2 (IF (> TIME 10) 3 4))
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
Program Tree
(+ 1 2 (IF (> TIME 10) 3 4))
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
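To make the correspondence between the C function and the program tree concrete, here is a C sketch that stores (+ 1 2 (IF (> TIME 10) 3 4)) as nodes and evaluates it recursively; for every TIME it returns the same value as foo (6 when TIME > 10, otherwise 7). The node kinds and names are hypothetical, and + is treated as taking any number of arguments, as in the tree.

```c
typedef enum { T_CONST, T_TIME, T_ADD, T_GT, T_IF } TKind;

typedef struct TNode {
    TKind kind;
    int value;                      /* used only by T_CONST */
    int nargs;                      /* number of children in args */
    const struct TNode *args[3];
} TNode;

/* Evaluate a program tree for a given TIME. */
int eval_tree(const TNode *n, int time) {
    switch (n->kind) {
    case T_CONST: return n->value;
    case T_TIME:  return time;
    case T_ADD: {                                   /* + takes any number of arguments */
        int sum = 0;
        for (int i = 0; i < n->nargs; i++) sum += eval_tree(n->args[i], time);
        return sum;
    }
    case T_GT:    return eval_tree(n->args[0], time) > eval_tree(n->args[1], time);
    case T_IF:    return eval_tree(n->args[0], time) ? eval_tree(n->args[1], time)
                                                     : eval_tree(n->args[2], time);
    }
    return 0;                                       /* not reached for valid kinds */
}
```

Because programs are trees, genetic programming can apply crossover by exchanging subtrees and mutation by replacing a subtree, which is what the following slides illustrate.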
Given Data
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
Problem Description
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
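Generation 0
Population of 4 randomly created individuals (program trees, shown in a figure): x, x + 1, x² + 1, 2.
Citation: examples taken from www.genetic-programming.com/c2003lecture1modified.ppt
Fitness of the four individuals (figure): 4.40, 6.00, 9.48, 15.40. Fitness here is an error measure, so lower is better; the best in Gen 0 has fitness 4.40.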
Mutation
Mutation: picking "2" as the mutation point.
Citation: part of the pictures used as examples are taken from www.genetic-programming.com/c2003lecture1modified.ppt
Crossover
Crossover: picking a subtree and the leftmost x as crossover points.
Citation: example taken from www.genetic-programming.com/c2003lecture1modified.ppt
Generation 1
Citation: part of the examples is taken from www.genetic-programming.com/c2003lecture1modified.ppt
Fitness of the four individuals (figure): 4.40, 6.00, 15.40, 0.00. Found! (an individual with fitness 0.00, i.e. no error).
GA Classification
Classify customers based on number of children (NOC) and salary (S).
Citation: data table is taken from Prof. Anita Wasilewska's previous years' course slides
GA Classification Rules
- A classification rule is of the form (the rule is in predicate form; see course lectures):
  - IF formula THEN class = c_i
  - (antecedent)   (consequent)
Formula Representation
- Possible rule:
  - If (NOC = 2) AND (S > 80000) then GOOD (customer)
Formula tree (figure): an AND node over the comparisons (NOC = 2) and (S > 80000); the formula is the antecedent and GOOD is the class.
Citation: the example is taken from Prof. Anita Wasilewska's previous years' course slides
Initial Data Table
Initial Data (Written as Rules Inferred from the Initial Table)
- Rule 1: If (NOC = 2) AND (S > 80000) then C = GOOD
- Rule 2: If (NOC = 1) AND (S > 30000) then C = GOOD
- Rule 3: If (NOC = 0) AND (S = 50000) then C = GOOD
- Rule 4: If (NOC > 2) AND (S < 10000) then C = BAD
- Rule 5: If (NOC = 10) AND (S = 30000) then C = BAD
- Rule 6: If (NOC = 5) AND (S < 30000) then C = BAD
Generation 0
- Population of 3 randomly created individuals:
  - If (NOC > 3) AND (S > 10000) then C = GOOD
  - If (NOC > 1) AND (S > 30000) then C = GOOD
  - If (NOC > 0) AND (S < 40000) then C = GOOD
- We want to find a more general (if possible, the most general) characteristic description for class GOOD, so we assign the predicted class GOOD to all individuals.
Generation 0
Individuals as formula trees (figure), each an AND node over two comparisons:
- Individual 1: (NOC > 3) AND (S > 10000)
- Individual 2: (NOC > 1) AND (S > 30000)
- Individual 3: (NOC > 0) AND (S < 40000)
Fitness Function
- For one rule (IF A THEN C):
  - CF (confidence factor): $CF = \frac{|A \cap C|}{|A|}$
  - |A| = number of records that satisfy A
  - |A ∩ C| = number of records that satisfy A and are in the predicted class C
Citation: the confidence formula is taken from class slides http://www.cs.sunysb.edu/cse634/lecture_notes/07association.pdf
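A C sketch of the confidence-factor fitness for rules of the form (NOC op c1) AND (S op c2) => GOOD, with op being > or <. Because the slides' data table is not reproduced here, the test uses six hypothetical records, one chosen to satisfy each of Rules 1 to 6; with them, the three Generation 0 individuals score 0, 1 and 0.25. The struct and function names are ours.

```c
/* One record: number of children, salary, and whether the class is GOOD. */
typedef struct { int noc; double salary; int good; } Record;

/* Antecedent (NOC noc_op noc_val) AND (S s_op s_val); ops are '>' or '<'. */
typedef struct { char noc_op; int noc_val; char s_op; double s_val; } Rule;

static int holds(char op, double lhs, double rhs) {
    return op == '>' ? lhs > rhs : lhs < rhs;
}

/* CF = |A and C| / |A|: share of records satisfying A that are in class GOOD. */
double confidence(const Rule *r, const Record *data, int n) {
    int a = 0, auc = 0;
    for (int i = 0; i < n; i++) {
        if (holds(r->noc_op, data[i].noc, r->noc_val) &&
            holds(r->s_op, data[i].salary, r->s_val)) {
            a++;                          /* record satisfies the antecedent A */
            if (data[i].good) auc++;      /* ... and is in the predicted class */
        }
    }
    return a ? (double)auc / a : 0.0;
}
```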
Fitness Function: Generation 0
- Rule 1: If (NOC = 2) AND (S > 80000) then GOOD
- Rule 2: If (NOC = 1) AND (S > 30000) then GOOD
- Rule 3: If (NOC = 0) AND (S = 50000) then GOOD
- Rule 4: If (NOC > 2) AND (S < 10000) then BAD
- Rule 5: If (NOC = 10) AND (S = 30000) then BAD
- Rule 6: If (NOC = 5) AND (S < 30000) then BAD
- Fitness of Individual 1, If (NOC > 3) AND (S > 10000) then GOOD: |A| = 2 (Rules 5, 6), |A ∩ C| = 0, CF = 0/2 = 0
- Fitness of Individual 2, If (NOC > 1) AND (S > 30000) then GOOD: |A| = 1 (Rule 1), |A ∩ C| = 1, CF = 1/1 = 1
- Fitness of Individual 3, If (NOC > 0) AND (S < 40000) then GOOD: |A| = 4 (Rules 2, 4, 5, 6), |A ∩ C| = 1, CF = 1/4 = 0.25
Best in Gen 0: Individual 2 (CF = 1)
Mutation
Mutation of Individual 3 (figure): the salary constant is changed, so (NOC > 0) AND (S < 40000) becomes (NOC > 0) AND (S < 90000).
Crossover
Crossover of Individuals 2 and 3 (figure): the salary subtrees are exchanged.
- (NOC > 1) AND (S > 30000) becomes (NOC > 1) AND (S < 40000)
- (NOC > 0) AND (S < 40000) becomes (NOC > 0) AND (S > 30000)
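The mutation and crossover steps above can be sketched in C by storing a rule's antecedent as two condition subtrees joined by AND: crossover exchanges the salary subtrees of two parents, and mutation replaces a constant. This is an illustrative sketch; in a real run the crossover point and the new constant would be drawn at random, and the type and function names are hypothetical.

```c
/* One comparison node, e.g. S < 40000 is {'<', 40000}. */
typedef struct { char op; double val; } Cond;

/* Rule antecedent: an AND node with an NOC subtree and a salary (S) subtree. */
typedef struct { Cond noc, s; } RuleTree;

/* Crossover: the children exchange their salary subtrees. */
void swap_salary_subtrees(const RuleTree *p1, const RuleTree *p2,
                          RuleTree *c1, RuleTree *c2) {
    *c1 = *p1;
    c1->s = p2->s;
    *c2 = *p2;
    c2->s = p1->s;
}

/* Mutation: replace the constant of one comparison (e.g. 40000 -> 90000). */
void mutate_constant(Cond *c, double new_val) {
    c->val = new_val;
}
```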
Generation 1
Individuals as formula trees (figure):
- Individual 1: (NOC > 1) AND (S < 40000)
- Individual 2: (NOC > 0) AND (S > 30000)
- Individual 3: (NOC > 0) AND (S < 90000)
Fitness Function: Generation 1
- Rule 1: If (NOC = 2) AND (S > 80000) then GOOD
- Rule 2: If (NOC = 1) AND (S > 30000) then GOOD
- Rule 3: If (NOC = 0) AND (S = 50000) then GOOD
- Rule 4: If (NOC > 2) AND (S < 10000) then BAD
- Rule 5: If (NOC = 10) AND (S = 30000) then BAD
- Rule 6: If (NOC = 5) AND (S < 30000) then BAD
- Individual 1, If (NOC > 1) AND (S < 40000) then GOOD: |A| = 3 (Rules 4, 5, 6), |A ∩ C| = 0, CF = 0/3 = 0
- Individual 2, If (NOC > 0) AND (S > 30000) then GOOD: |A| = 2 (Rules 1, 2), |A ∩ C| = 2, CF = 2/2 = 1
- Individual 3, If (NOC > 0) AND (S < 90000) then GOOD: |A| = 5 (Rules 1, 2, 4, 5, 6), |A ∩ C| = 2, CF = 2/5 = 0.4
Best in Gen 1: Individual 2 (CF = 1)
GA Operators on Rules: Flockhart's Paper Approach
By Marcela Boboila
References
- I. W. Flockhart and N. J. Radcliffe. GA-MINER: parallel data mining with hierarchical genetic algorithms, final report. EPCC-AIKMS-GAMINER-Report 1.0. University of Edinburgh, UK, 1995. http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/3487/httpzSzzSzwww.quadstone.co.ukzSzianzSzaikmszSzreport.pdf/flockhart95gaminer.pdf
- I. W. Flockhart and N. J. Radcliffe, "A genetic algorithm-based approach to data mining," in The Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, pp. 299-302, AAAI Press, Aug. 2-4, 1996. http://citeseer.ist.psu.edu/cache/papers/cs/3487/httpzSzzSzwww.quadstone.co.ukzSzianzSzaikmszSzkdd96a.pdf/flockhart96genetic.pdf
From Rules to Subset Descriptions
- Step 1: we have the following rules, which describe part of the data table:
  - Rule 1: A1 → C
  - Rule 2: A2 → C
  - ...
  - Rule n: An → C
- Step 2: (A1 ∪ A2 ∪ ... ∪ An) → C
- Step 3: we look only at the antecedent to get the subset description:
  - (A1 ∪ A2 ∪ ... ∪ An)
Part of the Data Table: An Example
Rule 1: If Age in 20 .. 30 AND Hobby = dancing then GOOD
Rule 2: If Age in 25 .. 55 AND Hobby = reading then GOOD
(Figure: the records satisfying A1 and those satisfying A2 both fall in class C, so A1 ∪ A2 → C.)
From Rules to Subset Descriptions: An Example
- Step 1: we have the rules
  - Rule 1: If Age in 20 .. 30 AND Hobby = dancing then GOOD
  - Rule 2: If Age in 25 .. 55 AND Hobby = reading then GOOD
- Step 2: we combine the antecedent parts to form a single rule describing the subset of individuals in the same class:
  - If ((Age in 20 .. 30 AND Hobby = dancing) OR (Age in 25 .. 55 AND Hobby = reading)) then GOOD
- Step 3: the subset description is the antecedent part:
  - (Age in 20 .. 30 AND Hobby = dancing) OR (Age in 25 .. 55 AND Hobby = reading)
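Subset Description
(Figure: a tree with OR at the root and two AND nodes below it. Each AND node is a clause, and its leaves are terms: Age in 20 .. 30 and Hobby = dancing under the first clause; Age in 25 .. 55 and Hobby = reading under the second.)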
Subset Description
- Chromosomes are represented as subset descriptions.
- Subsets consist of disjunctions and conjunctions of attribute-value or attribute-range constraints:
  - Subset Description: Clause or Clause or ...
  - Clause: Term and Term and ...
  - Term: Attribute in Value Set, or Attribute in Range
- E.g. (Age in 20 .. 30 and Hobby = dancing) or (Age in 25 .. 55 and Hobby = reading)
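A minimal C sketch of this representation, assuming small fixed capacities: a term is either a range on a numeric attribute or a set of allowed codes (a bitmask) on a categorical one; a clause is an AND of terms; a subset description is an OR of clauses. `subset_matches` tests whether a record falls in the described subset. Attribute names, codes and sizes are hypothetical.

```c
#include <stdbool.h>

enum { AGE, HOBBY, NUM_ATTRS };      /* hypothetical attributes */
enum { DANCING = 1, READING = 2 };   /* hypothetical one-bit hobby codes */

/* Term: attribute in range [lo, hi], or attribute in value set (bitmask). */
typedef struct { int attr; bool is_range; unsigned values; double lo, hi; } Term;
typedef struct { int n; Term terms[4]; } Clause;           /* Term and Term ... */
typedef struct { int n; Clause clauses[4]; } SubsetDesc;   /* Clause or Clause ... */

/* A record keeps a number for range attributes and a code for value attributes. */
typedef struct { double num[NUM_ATTRS]; unsigned code[NUM_ATTRS]; } Record;

static bool term_matches(const Term *t, const Record *r) {
    if (t->is_range)
        return r->num[t->attr] >= t->lo && r->num[t->attr] <= t->hi;
    return (t->values & r->code[t->attr]) != 0;
}

bool subset_matches(const SubsetDesc *d, const Record *r) {
    for (int c = 0; c < d->n; c++) {                   /* OR over clauses */
        const Clause *cl = &d->clauses[c];
        bool all = true;
        for (int k = 0; k < cl->n && all; k++)         /* AND over terms */
            all = term_matches(&cl->terms[k], r);
        if (all) return true;
    }
    return false;
}
```

For the example description, a 22-year-old who dances is in the subset, a 40-year-old who dances is not, and a 40-year-old who reads is.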
Crossover
- Crossover is applied at all levels, successively:
  - Subset description crossover
  - Clause crossover (uniform or single-point)
  - Term crossover
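Subset Description Crossover
(Figure: parent A = Clause A1 OR Clause A2 OR Clause A3 and parent B = Clause B1 OR Clause B2 OR Clause B4. A1 and B1, and A2 and B2, undergo clause crossover; the unpaired A3 is kept with probability rBias and B4 with probability (1 - rBias). The child is Clause C1 OR Clause C2 OR Clause C3 OR Clause C4.)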
Subset Description Crossover
- Consider the following 2 descriptors (chromosomes):
  - A: Clause A1 or Clause A2 or Clause A3
  - B: Clause B1 or Clause B2 or Clause B4
- Apply clause crossover (uniform or single-point) to cross clause A1 with B1, and A2 with B2.
- For clauses with no partner:
  - Include A3 with probability rBias (first parent).
  - Include B4 with probability 1 - rBias (second parent).
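A C sketch of subset-description crossover. We assume clauses are aligned in numbered slots (so A3 and B4 have no partner, as above) and represent each clause by an integer id, 0 meaning an empty slot. `clause_crossover` here is a hypothetical placeholder that just keeps one parent's clause; in the approach described here it would be the uniform or single-point clause crossover.

```c
#include <stdlib.h>

#define SLOTS 4   /* hypothetical number of aligned clause positions */

static double unit_rand(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }

/* Placeholder for clause crossover: keep a's clause with probability r_bias. */
static int clause_crossover(int a, int b, double r_bias) {
    return unit_rand() < r_bias ? a : b;
}

/* Cross two descriptions slot by slot; returns the child's clause count. */
int subset_crossover(const int a[SLOTS], const int b[SLOTS], int child[SLOTS],
                     double r_bias) {
    int n = 0;
    for (int i = 0; i < SLOTS; i++) {
        child[i] = 0;
        if (a[i] && b[i])
            child[i] = clause_crossover(a[i], b[i], r_bias);  /* paired clauses */
        else if (a[i] && unit_rand() < r_bias)
            child[i] = a[i];                                  /* unpaired, first parent */
        else if (b[i] && unit_rand() < 1.0 - r_bias)
            child[i] = b[i];                                  /* unpaired, second parent */
        if (child[i]) n++;
    }
    return n;
}
```

With rBias = 1 the child reproduces parent A; with rBias = 0 it reproduces parent B; intermediate values mix the two.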
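Uniform Clause Crossover
(Figure: parent clauses Age in 20 .. 30 AND Height in 1.5 .. 2.0 (weight rBias), and Age in 0 .. 25 AND Hobby = dancing (weight 1 - rBias). The Age terms undergo term crossover; the child shown is Hobby = dancing AND Height in 1.5 .. 2.0 AND Age in (result of the term crossover).)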
Uniform Clause Crossover
- Consider the clauses:
  - A: Age in 20 .. 30 and Height in 1.5 .. 2.0
  - B: Hobby = dancing and Age in 0 .. 25
- Align the clauses with respect to terms:
  - A: (no Hobby term) | Age in 20 .. 30 | Height in 1.5 .. 2.0
  - B: Hobby = dancing | Age in 0 .. 25 | (no Height term)
- Apply term crossover between the Age terms.
- Include:
  - the Height term (with no partner) in the child with probability rBias;
  - the Hobby term (with no partner) in the child with probability (1 - rBias).
Single-Point Clause Crossover
(Figure: parent clauses Age in 20 .. 30 AND Height in 1.5 .. 2.0, and Hobby = dancing AND Age in 0 .. 25, with a crossover point; the child, Age in 0 .. 25, combines the part from the first parent with the part from the second.)
Single-Point Clause Crossover
- Consider the clauses:
  - A: Age in 20 .. 30 and Height in 1.5 .. 2.0
  - B: Hobby = dancing and Age in 0 .. 25
- Align the clauses with respect to terms:
  - A: (no Hobby term) | Age in 20 .. 30 | Height in 1.5 .. 2.0
  - B: Hobby = dancing | Age in 0 .. 25 | (no Height term)
- E.g. consider a crossover point between Hobby and Age:
  - the child takes the terms to the left of the crossover point from clause A, and the terms to the right of the crossover point from clause B;
  - Child C: Age in 0 .. 25
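Both clause crossovers (uniform and single-point) can be sketched in C once terms are aligned by attribute (order Hobby, Age, Height, as in the slides). For brevity every term is stored as a range, with Hobby = dancing coded as the degenerate range [1, 1] (a hypothetical coding), and paired range terms are crossed limit by limit. Dropping a term whose crossed range is invalid is our simplification of the pruning step.

```c
#include <stdbool.h>
#include <stdlib.h>

enum { HOBBY, AGE, HEIGHT, NATTR };   /* aligned term positions */

typedef struct { bool present; double lo, hi; } Term;
typedef struct { Term t[NATTR]; } Clause;

static double unit_rand(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }

/* Range-term crossover: each limit comes from the first parent with probability r_bias. */
static Term term_crossover(Term a, Term b, double r_bias) {
    Term c;
    c.present = true;
    c.lo = unit_rand() < r_bias ? a.lo : b.lo;
    c.hi = unit_rand() < r_bias ? a.hi : b.hi;
    if (c.lo > c.hi) c.present = false;   /* simplification: drop an invalid range */
    return c;
}

/* Uniform clause crossover: paired terms are crossed; an unpaired term is kept
   with probability r_bias (from a) or 1 - r_bias (from b). */
Clause uniform_clause_crossover(const Clause *a, const Clause *b, double r_bias) {
    Clause c;
    for (int i = 0; i < NATTR; i++) {
        const Term *x = &a->t[i], *y = &b->t[i];
        c.t[i].present = false;
        c.t[i].lo = c.t[i].hi = 0.0;
        if (x->present && y->present)
            c.t[i] = term_crossover(*x, *y, r_bias);
        else if (x->present && unit_rand() < r_bias)
            c.t[i] = *x;
        else if (y->present && unit_rand() < 1.0 - r_bias)
            c.t[i] = *y;
    }
    return c;
}

/* Single-point clause crossover: positions before `point` from a, the rest from b. */
Clause single_point_clause_crossover(const Clause *a, const Clause *b, int point) {
    Clause c;
    for (int i = 0; i < NATTR; i++)
        c.t[i] = i < point ? a->t[i] : b->t[i];
    return c;
}
```

With the crossover point between Hobby and Age (point = 1), the single-point version reproduces the slide's child, Age in 0 .. 25.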
Term Crossover: Value Terms
(Figure: Hobby in {dancing, singing} (weight rBias) crossed with Hobby in {dancing, hiking} (weight 1 - rBias); one possible child is Hobby in {dancing, singing, hiking}.)
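Term Crossover: Range Terms
(Figure: Age in 20 .. 30 (weight rBias) crossed with Age in 0 .. 25 (weight 1 - rBias); each limit of the child, Age in low limit .. high limit, is taken from one parent with these probabilities.)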
Term Crossover
- Used to combine two terms concerning the same attribute.
- Consider the clauses:
  - A: Hobby in {dancing, singing} and Age in 20 .. 30
  - B: Hobby in {hiking, dancing} and Age in 0 .. 25
- How to form the child:
  - Value terms:
    - Include values common to both parents, e.g. dancing.
    - Include values unique to one parent with a probability, e.g. rBias for singing and 1 - rBias for hiking.
  - Range terms:
    - Select the low and high limits with a probability:
      - Low limit for Age: rBias for value 20 and 1 - rBias for value 0.
      - High limit for Age: rBias for value 30 and 1 - rBias for value 25.
    - Later prune (get rid of) invalid ranges.
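A C sketch of term crossover, assuming a value term is stored as a bitmask of allowed values (the codes for dancing, singing and hiking are hypothetical) and a range term as two limits. The range version reports whether the child range is valid, so invalid ones can be pruned afterwards.

```c
#include <stdbool.h>
#include <stdlib.h>

enum { DANCING = 1, SINGING = 2, HIKING = 4 };   /* hypothetical value codes */

static double unit_rand(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }

/* Value-term crossover on bitmasks of allowed values. */
unsigned value_term_crossover(unsigned a, unsigned b, double r_bias) {
    unsigned child = a & b;                          /* values common to both parents */
    for (unsigned bit = 1; bit != 0; bit <<= 1) {
        if (a & ~b & bit) {                          /* only in the first parent */
            if (unit_rand() < r_bias) child |= bit;
        } else if (b & ~a & bit) {                   /* only in the second parent */
            if (unit_rand() < 1.0 - r_bias) child |= bit;
        }
    }
    return child;
}

/* Range-term crossover: each limit comes from the first parent with probability
   r_bias. Returns false when the child range is invalid (lo > hi). */
bool range_term_crossover(double a_lo, double a_hi, double b_lo, double b_hi,
                          double r_bias, double *lo, double *hi) {
    *lo = unit_rand() < r_bias ? a_lo : b_lo;
    *hi = unit_rand() < r_bias ? a_hi : b_hi;
    return *lo <= *hi;
}
```

Whatever rBias is, the child of {dancing, singing} and {hiking, dancing} always contains dancing and never a value absent from both parents.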
Mutation
- Mutation is applied at all levels, successively:
  - Subset description mutation
  - Clause mutation
  - Term mutation
Subset Description Mutation
(Figure: clause mutation is applied to each of Clause A1, A2 and A3; then, with probability pCls, an add/delete operation is performed, choosing add or delete with equal (50%) probability. Adding gives Clause A1 OR Clause A2 OR Clause A3 OR Clause A4; deleting removes one clause.)
Subset Description Mutation
- Consider the following descriptor (chromosome):
  - A: Clause A1 or Clause A2 or Clause A3 or Clause A4
- Steps:
  1. Apply clause mutation to each clause: A1, A2, A3 and A4.
  2. Decide with probability pCls whether or not to do an add/delete clause operation.
  3. If add/delete has been decided, either add a new clause or delete an existing clause, with equal probability (50%):
     - deletion: pick a clause at random and delete it;
     - adding: generate a new clause at random (from random possible attributes with random values/ranges assigned).
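A C sketch of the three steps, assuming clauses are represented by integer ids in a fixed-capacity array. `mutate_clause` and `random_clause` are hypothetical placeholders for clause mutation and random clause generation, and never deleting the last clause is our assumption. The same pattern applies one level down, to the terms of a clause, with pTerm in place of pCls.

```c
#include <stdlib.h>

#define MAX_CLAUSES 8   /* capacity of a description (assumed) */

static double unit_rand(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }

/* Hypothetical placeholders: clauses are plain int ids in this sketch. */
static int mutate_clause(int clause) { return clause; }      /* stand-in for clause mutation */
static int random_clause(void) { return 1 + rand() % 100; }  /* stand-in for a new random clause */

/* Subset-description mutation; returns the new number of clauses. */
int subset_mutation(int clauses[MAX_CLAUSES], int n, double p_cls) {
    for (int i = 0; i < n; i++)                      /* step 1: mutate every clause */
        clauses[i] = mutate_clause(clauses[i]);
    if (unit_rand() < p_cls) {                       /* step 2: add/delete at all? */
        if (unit_rand() < 0.5) {                     /* step 3: add (50%) */
            if (n < MAX_CLAUSES) clauses[n++] = random_clause();
        } else if (n > 1) {                          /* step 3: delete (50%) */
            int k = rand() % n;                      /* random clause to delete */
            clauses[k] = clauses[--n];               /* overwrite it with the last clause */
        }
    }
    return n;
}
```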
Clause Mutation
(Figure: term mutation is applied to each term of Hobby = dancing AND Age in 20 .. 30; then, with probability pTerm, an add/delete term operation is performed, choosing add or delete with equal (50%) probability. Adding gives Term Hobby AND Term Age AND Term X; deleting removes one term.)
Clause Mutation
- Consider the following clause:
  - Hobby = dancing and Age in 20 .. 30
- Steps:
  1. Apply term mutation to each term.
  2. Decide with probability pTerm whether or not to do an add/delete term operation.
  3. If add/delete has been decided, either add a new term or delete an existing term, with equal probability (50%):
     - deletion: pick a term at random and delete it;
     - adding: generate a new term at random.
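Term Mutation: Value
(Figure: for the term Hobby = dancing, decide with probability rMutTerm whether to mutate; then choose attribute mutation with probability rAvr, giving e.g. Occupation = student, or value mutation with probability (1 - rAvr), giving e.g. Hobby = swimming.)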
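Term Mutation: Range
(Figure: for the term Age in 10 .. 50, decide with probability rMutTerm whether to mutate; then choose attribute mutation with probability rAvr, giving e.g. Occupation = student, or range mutation with probability (1 - rAvr), giving e.g. Age in 3 .. 25.)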
Term Mutation
- First decide with probability rMutTerm whether or not to mutate this term.
- If term mutation has been decided, do either attribute mutation (with probability rAvr) or value/range mutation (with probability 1 - rAvr).
- Consider the term Hobby = dancing:
  - Attribute mutation: randomly choose another available attribute, e.g. occupation, and a random value for it, e.g. student. New term: Occupation = student.
  - Value mutation: randomly choose another value for the current attribute, e.g. swimming. New term: Hobby = swimming.
- Consider the term Age in 10 .. 50:
  - Range mutation: randomly choose another range for the current attribute, e.g. 3 .. 25. New term: Age in 3 .. 25.
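A C sketch of term mutation with the parameters rMutTerm and rAvr, assuming three hypothetical attributes (Age is a range attribute; Hobby and Occupation hold value codes from 0 to 4). Attribute mutation always picks a different attribute and gives it a fresh random value or range; value mutation always picks a different value; range limits are drawn from 0 to 99 purely for illustration.

```c
#include <stdlib.h>

enum { AGE, HOBBY, OCCUPATION, NATTR };   /* hypothetical attributes */
#define NVALUES 5                         /* value codes 0..4 for Hobby/Occupation */

/* A term is either "attr = value" or, for range attributes, "attr in lo .. hi". */
typedef struct { int attr; int value; double lo, hi; } Term;

static double unit_rand(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }
static int is_range_attr(int attr) { return attr == AGE; }

void term_mutation(Term *t, double r_mut_term, double r_avr) {
    if (unit_rand() >= r_mut_term) return;               /* term left unchanged */
    int attr_changed = 0;
    if (unit_rand() < r_avr) {                           /* attribute mutation */
        int other = rand() % (NATTR - 1);
        t->attr = other >= t->attr ? other + 1 : other;  /* any attribute except the current one */
        attr_changed = 1;
    }
    if (is_range_attr(t->attr)) {                        /* new random range, e.g. Age in 3 .. 25 */
        double a = rand() % 100, b = rand() % 100;
        t->lo = a < b ? a : b;
        t->hi = a < b ? b : a;
    } else if (!attr_changed) {                          /* value mutation: a different value */
        int v = rand() % (NVALUES - 1);
        t->value = v >= t->value ? v + 1 : v;
    } else {
        t->value = rand() % NVALUES;                     /* random value for the new attribute */
    }
}
```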