Title: An Introduction to Genetic Algorithms
1 An Introduction to Genetic Algorithms
Lecture 6, COMP4044/5318
2 Outline of the lecture
- History of Evolutionary Algorithms
- Genetic Algorithm (GA)
  - Overview
  - The basic GA algorithm
  - Two simple examples
  - Why does the GA work? The Schema Theorem
  - Advantages and disadvantages
- GA for rule discovery
- Case study of applying GA to ML and DM
  - GABIL
3 Biological Evolution
- Lamarck and others
  - Species transmute over time
  - Experience of an organism directly affects the genetic makeup of its offspring
- Darwin and Wallace
  - Consistent, heritable variation among individuals in a population
  - Natural selection of the fittest
- Mendel and genetics
  - A mechanism for inheriting traits
  - Mapping of genotypes to phenotypes
4 The Big Picture (primitive organisms)
[Diagram: a population at time t evolves through selection, reproduction, and mutation]
5 The Big Picture (higher-level organisms)
[Diagram: a population at time t evolves through selection and reproduction; chromosomes recombine: "sex is good"]
6 The Metaphor
- Evolution
- Individual
- Fitness
- Environment
7 Taxonomy
Computational Intelligence (Soft Computing)
- Neural Networks
- Evolutionary Computation
  - Evolutionary Programming
  - Evolution Strategies
  - Genetic Algorithms
  - Genetic Programming
- Fuzzy Systems
8 Family of Evolutionary Computation
- Computational procedures patterned after biological evolution
- Search procedures that probabilistically apply search operators to a set of points in the search space
9 GA Overview
- Developed by John Holland
- Search algorithms based on the mechanics of natural evolution: survival of the fittest
- Create an initial population of feasible solutions, then recombine them so as to direct the search towards the most promising areas of the search space
- Each individual solution is encoded as a chromosome (also called a genotype), typically with a binary representation; a fitness function measures the fitness of the phenotype
- The fitness of a phenotype determines its chances of survival
10 SGA: The Basic Algorithm

    t := 0
    Generate initial population P_t
    Evaluate all individuals in P_t using a fitness function
    While not end of evolution:
        t := t + 1
        Reproduce P_t from P_(t-1)
        Perform crossover in P_t
        Perform mutation in P_t
        Evaluate all individuals in P_t using a fitness function
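The loop above can be sketched in Python. The bit-list chromosomes, the parameter defaults, and the tiny floor added to the roulette weights are implementation choices for the sketch, not part of the lecture:

```python
import random

def simple_ga(fitness, n_bits, pop_size=8, p_cross=0.75, p_mut=0.001,
              generations=40):
    """Sketch of the SGA loop above. Assumes a non-negative fitness and an
    even pop_size; chromosomes are lists of 0/1 bits."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        # Reproduce: fitness-proportionate (roulette-wheel) selection.
        # The tiny floor keeps the weights valid if every score is zero.
        parents = random.choices(pop, weights=[s + 1e-9 for s in scores],
                                 k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            a, b = a[:], b[:]
            if random.random() < p_cross:          # single-point crossover
                cut = random.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt.extend([a, b])
        for ind in nxt:                            # bit-flip mutation
            for i in range(n_bits):
                if random.random() < p_mut:
                    ind[i] = 1 - ind[i]
        pop = nxt
        best = max(pop + [best], key=fitness)      # remember the best so far
    return best
```

Running it on a toy problem such as one-max (fitness = number of 1 bits, i.e. `fitness=sum`) shows the population converging towards the all-ones string.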
11 Simple GA (continued)
- Uses probabilistic rules to evolve a population from one generation to the next
  - Biased (not random) reproduction
  - Crossover
  - Mutation
- A few parameters to tune
  - Population size
  - Crossover rate
  - Mutation rate
12 GA Preliminary Considerations
- Representation
  - Bit string
  - Can also be real numbers, integers, characters, lists of rules, matrices, etc.
  - All chromosomes in a population are of the same type
  - Choice of alphabet
  - Length of chromosome
  - Chromosomes are all of the same length
  - Most GAs use a haploid representation, in contrast to humans' diploid representation
- Fitness function
13 GA Preliminary Considerations
- Population size
  - Remains constant throughout all generations
  - What is the problem with a small or a large population?
    - Too small: efficient computation, but premature convergence, i.e. trapped in local optima
    - Too large: greater chance of finding the global optimum, but higher computational cost
14 GA Selection for Reproduction
- Fitness-proportionate methods
  - Roulette wheel
    - The classical selection operator for the generational GA
    - Each member of the pool is assigned space on a roulette wheel proportional to its fitness
    - The members with the greatest fitness have the highest probability of selection
- Tournament
  - Randomly choose two individuals; the fitter one goes into the next generation
- Rank
  - Sort the individuals according to their fitness
  - Selection is based on their ranks
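The three selection schemes can be sketched as follows. The function names are illustrative, and the linear rank weights in `rank_select` are one of several common choices:

```python
import random

def roulette_select(pop, fitnesses):
    """Spin the wheel once: each individual owns a slice of [0, total)
    proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    running = 0.0
    for ind, f in zip(pop, fitnesses):
        running += f
        if running >= pick:
            return ind
    return pop[-1]  # guard against floating-point round-off

def tournament_select(pop, fitnesses):
    """Pick two individuals at random; the fitter one survives."""
    i, j = random.randrange(len(pop)), random.randrange(len(pop))
    return pop[i] if fitnesses[i] >= fitnesses[j] else pop[j]

def rank_select(pop, fitnesses):
    """Selection probability based on rank, not raw fitness values."""
    order = sorted(range(len(pop)), key=lambda i: fitnesses[i])  # worst..best
    ranks = [order.index(i) + 1 for i in range(len(pop))]        # 1 = worst
    return random.choices(pop, weights=ranks, k=1)[0]
```

Rank selection is less sensitive than roulette-wheel selection to one individual having a hugely dominant fitness, since only the ordering matters.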
15 GA Crossover
- Mate chromosomes at random
- In each mating:
  - randomly select the crossover positions
  - genetic material between the two parents is swapped
- Various crossover techniques
  - One-point
  - Two-point
  - Uniform
16 GA Crossover Example

Single-point crossover:
    Parent1: 100 | 1001010        Child1: 100 | 0110111
    Parent2: 001 | 0110111        Child2: 001 | 1001010

Two-point crossover:
    Parent1: 100 | 1001 | 010     Child1: 100 | 0110 | 010
    Parent2: 001 | 0110 | 111     Child2: 001 | 1001 | 111

Uniform crossover (Child1 takes Parent1's bit where the template is 1 and Parent2's otherwise; Child2 takes the opposite):
    Parent1:  1001001010          Child1: 1011000110
    Parent2:  0010110111          Child2: 0000111011
    Template: 1001110001
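The three operators can be written compactly over bit strings. The cut points are passed explicitly here so the slide's example can be reproduced; in a real GA run they would be drawn at random:

```python
import random

def one_point(p1, p2, cut=None):
    """Swap everything after a single cut point."""
    if cut is None:
        cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, cuts=None):
    """Swap the middle segment between two cut points."""
    a, b = cuts if cuts is not None else sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform_crossover(p1, p2, template):
    """Child1 takes p1's bit where the template is '1', p2's bit otherwise;
    child2 makes the opposite choice at every position."""
    c1 = ''.join(x if t == '1' else y for x, y, t in zip(p1, p2, template))
    c2 = ''.join(y if t == '1' else x for x, y, t in zip(p1, p2, template))
    return c1, c2
```

For instance, `one_point("1001001010", "0010110111", cut=3)` reproduces the single-point example above.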
17 GA Mutation
- Randomly change the allele at a random locus
- Able to search the entire state space, given enough time
- Restores lost information, or adds new information, to the population
- Performed on a child after crossover
- Performed very infrequently: p_m usually <= 0.01

    Child:          1001001010
    After mutation: 1001101010
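Bit-flip mutation over a string chromosome can be sketched in a few lines; the default rate follows the slide's p_m <= 0.01:

```python
import random

def mutate(chromosome, p_mut=0.01):
    """Flip each bit independently with small probability p_mut."""
    return ''.join(
        ('1' if bit == '0' else '0') if random.random() < p_mut else bit
        for bit in chromosome
    )
```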
18 Generational versus Steady State
- Generational GA
  - Replace the whole population with new individuals in the next generation
- Steady-state GA
  - Keep the old population, but replace the k weakest individuals with new offspring
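A minimal sketch of the steady-state replacement step (the function and parameter names are illustrative): the k weakest individuals make way for k new offspring, and the rest of the population survives unchanged.

```python
def steady_state_step(pop, fitness, offspring):
    """Replace the len(offspring) weakest individuals with the new offspring."""
    k = len(offspring)
    # Keep the strongest len(pop) - k individuals from the old population.
    survivors = sorted(pop, key=fitness, reverse=True)[:len(pop) - k]
    return survivors + list(offspring)
```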
19 GA versus Traditional Search Algorithm
[Diagram: a traditional search moves from a single state s_t to s_t+1]
20 GA versus Traditional Search Algorithm
- The GA works from a population of strings instead of a single point
- Applying the GA operators carries information from the previous generation over to the next
- The GA uses probabilistic transition rules, not deterministic rules
21 The Search Mechanism
- A search is composed of exploration and exploitation
- In the GA:
  - Exploration by
    - Recombination
    - Mutation
  - Exploitation by
    - Selection
22An Example
- f(x) 4cos(x) x 2.5
- 0 ? x ? 31
- Representation a 5-bit binary string
- Parameter setting
- Population size 8
- Crossover rate 0.75
- Mutation rate 0.001
- Max. generation 40
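With a 5-bit chromosome, decoding is just binary-to-integer conversion. A sketch follows; note that the operators in the slide's objective did not survive extraction, so the reading f(x) = 4cos(x) + x + 2.5 is an assumption:

```python
import math

def decode(bits):
    """A 5-bit chromosome encodes an integer x in [0, 31]."""
    return int(bits, 2)

def f(x):
    # Assumed reconstruction of the slide's objective function.
    return 4 * math.cos(x) + x + 2.5

print(decode("11111"))  # 31, the largest 5-bit value
```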
23 f(x) when 0 <= x <= 31
[Plot of f(x); best = 28.33144]
24 Another Example
- Same f(x)
- 0 <= x <= 2^32 - 1
- Representation: a 32-bit binary string
- Parameter settings
  - Population size: 20
  - Crossover rate: 0.75
  - Mutation rate: 0.001
  - Max. generations: 50
25 f(x) when 0 <= x <= 2^32 - 1
[Plot of f(x)]
26 f(x) when 0 <= x <= 2^32 - 1
[Plot of f(x); best = 2.0708049813083885E9]
27 Example: f(x) = 4cos(x) + x + 2.5
Selection using the roulette wheel
28 Why does it work?
- An abstract way to view the complexities of crossover
- Schema: a string over {0, 1, *}, where * means "don't care"
- Consider a 6-bit representation
  - 0***** represents a subset of 32 strings
  - 100*** represents a subset of 8 strings
- Let H represent a schema such as 1**1**
- Order o(H): the number of fixed positions in the schema H
  - o(1*****) = 1
  - o(100***) = 3
- Defining length δ(H): the distance between the outermost fixed positions in H
  - δ(1**1**) = 4 - 1 = 3
  - δ(1*****) = 0
29 Why does it work? Consider just selection (1)
- m(s, t): the number of instances of schema s in the population at time t

30 Why does it work? Consider just selection (2)
- Probability of selecting individual h in one fitness-proportionate selection step:
  Pr(h) = f(h) / Σ_i f(h_i)
31 Why does it work? The Schema Theorem
- m(s, t): instances of schema s in the population at time t
- f̄(t): average fitness of the population at time t
- û(s, t): average fitness of the instances of s at time t
- p_c: probability of single-point crossover
- p_m: probability of mutation
- l: length of the bit strings
- o(s): number of defined (non-*) bits in s
- δ(s): distance between the leftmost and rightmost defined bits in s

  E[m(s, t+1)] >= (û(s, t) / f̄(t)) · m(s, t) · (1 - p_c · δ(s)/(l - 1)) · (1 - p_m)^o(s)
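The bound can be evaluated directly; a small helper whose parameter names mirror the symbols above:

```python
def schema_bound(m, u_hat, f_bar, p_c, p_m, delta, o, l):
    """Schema Theorem lower bound on the expected number of instances of
    schema s in the next generation:
    E[m(s,t+1)] >= (u_hat/f_bar) * m * (1 - p_c*delta/(l-1)) * (1-p_m)**o
    """
    return (u_hat / f_bar) * m * (1 - p_c * delta / (l - 1)) * (1 - p_m) ** o
```

The shape of the bound explains the GA's bias: short (small δ), low-order (small o), above-average (û > f̄) schemata receive exponentially increasing numbers of trials over generations.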
32 GA (and other EAs): Advantages
- A robust search technique
- Requires little or no knowledge of (assumptions about) the problem space
- Fairly simple to develop: low development costs
- Easy to combine with other methods
- Solutions are interpretable

33 GA Advantages (continued)
- Can be run interactively, i.e. can accommodate user preferences
- Provides many alternative solutions
- Acceptable performance at acceptable cost on a wide range of problems
- Intrinsic parallelism (robustness, fault tolerance)
34 GA Disadvantages
- No guarantee of finding an optimal solution within finite time
- Weak theoretical basis
- Interdependency of genes
- Parameter tuning is an issue
- Often computationally expensive, i.e. slow
35 GP: An Example
[Figure]
36 GP: Crossover
[Figure]
37 GA for Rule Discovery
- Representation
  - How are rules encoded?
  - Rule antecedent
  - Rule consequent
- Genetic operators
  - Selection
  - Generalisation/specialisation
- Fitness function
38 GA for Rule Discovery
- Representation: how are rules encoded?
- Michigan versus Pittsburgh approach
  - Michigan: each individual encodes a single rule
  - Pittsburgh: each individual encodes a set of rules
39 GA Application
[Figure]
40 GA for Rule Discovery: Representing the Rule Consequent
- Three ways:
  - Encode the predicted class in the genome
  - Associate all individuals of the population with the same predicted class
  - Choose the predicted class most suitable for a rule
    - Can be the class with the most representatives among the covered examples
    - Or the class that maximises the individual's fitness
41 GABIL Representation

    h1:  a1 a2 c | a1 a2 c        h2:  a1 a2 c | a1 a2 c
         10 01 1 | 11 10 0             01 11 0 | 10 01 0

- Binary string
- Conjunctive forms with internal disjunction
- The LHS of each rule consists of a conjunction of one or more tests involving feature values
- A concept is represented as a disjunctive set of (possibly overlapping) classification rules, i.e. in Disjunctive Normal Form
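A sketch of how such a bit string decodes into a readable rule. The helper and the attribute layout are illustrative: each attribute gets one bit per possible value (1 = value allowed, giving internal disjunction), an all-1 substring means the attribute is unconstrained, and the final bit is the predicted class:

```python
def decode_rule(rule_bits, attributes):
    """Decode one fixed-length GABIL rule into an IF-THEN string.
    attributes is a list of (name, values) pairs, one bit per value."""
    conds = []
    pos = 0
    for name, values in attributes:
        mask = rule_bits[pos:pos + len(values)]
        pos += len(values)
        if '0' in mask:  # not all ones -> the attribute is constrained
            allowed = [v for v, b in zip(values, mask) if b == '1']
            conds.append(f"{name} in {allowed}")
    cls = rule_bits[pos]  # final bit: the predicted class
    lhs = " AND ".join(conds) if conds else "TRUE"
    return f"IF {lhs} THEN c = {cls}"

attrs = [("a1", ["T", "F"]), ("a2", ["T", "F"])]
print(decode_rule("10011", attrs))  # IF a1 in ['T'] AND a2 in ['F'] THEN c = 1
```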
42 GA for Rule Discovery: Genetic Operators
- Selection
  - In the Michigan approach:
    - Avoid convergence to the same single rule
    - Form niches, encouraging the evolution of several different rules (each covering a different part of the data space)
- Generalisation/specialisation
  - Can be implemented via bitwise logical functions
  - Generalisation:
    - Subtract a small quantity from a value
    - Delete a condition from the antecedent
  - Specialisation:
    - Add a small quantity to a value
    - Add another condition to the antecedent
43 GA for Rule Discovery: Fitness Function (1)
- The discovered rules should
  - have high predictive accuracy
  - be comprehensible
  - be interesting
- For a rule A -> C, accuracy is measured using the confidence factor, |A and C| / |A|
- But such a simple measure may lead to overfitting the data, e.g. a rule covering just one example in the training set
44 GA for Rule Discovery: Fitness Function (2)

                          Actual class
                          C        not C
    Predicted   C         TP       FP
    class       not C     FN       TN

- Confidence = TP / (TP + FP)
- Completeness = TP / (TP + FN)
- Simplicity: inversely proportional to the rule length, e.g. 1 / num_conditions_in_antecedent
- Fitness = w1 × confidence × completeness + w2 × simplicity
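The fitness combination above, computed from confusion-matrix counts; the default weights w1 and w2 are illustrative, not from the slide:

```python
def rule_fitness(tp, fp, fn, num_conditions, w1=0.8, w2=0.2):
    """Fitness = w1 * (confidence * completeness) + w2 * simplicity,
    using the slide's definitions of the three terms."""
    confidence = tp / (tp + fp) if tp + fp else 0.0
    completeness = tp / (tp + fn) if tp + fn else 0.0
    simplicity = 1.0 / num_conditions
    return w1 * confidence * completeness + w2 * simplicity
```

Multiplying confidence by completeness penalises both overfitted rules (high confidence, low completeness) and overly general ones (the reverse), while the simplicity term favours short antecedents.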
45 An Example: GABIL (DeJong et al., 1993)
- Learns a disjunctive set of propositional rules; competitive with C4.5
- Fitness: Fitness(h) = (correct(h))^2
- Representation:
  - IF a1 = T AND a2 = F THEN c = T;  IF a2 = T THEN c = F
  - can be represented by

        a1 a2 c      a1 a2 c
        10 01 1      11 10 0
46 GABIL: Crossover with Variable-Length Bit Strings

    h1:  a1 a2 c | a1 a2 c        h2:  a1 a2 c | a1 a2 c
         10 01 1 | 11 10 0             01 11 0 | 10 01 0

- Choose crossover points for h1, e.g. after bits 1 and 8
- Restrict the points in h2 to those that produce bit strings with well-defined semantics, e.g. <1,3>, <1,8>, <6,8>
- If we choose <1,3>, the result is

    h3:  a1 a2 c                  h4:  a1 a2 c | a1 a2 c | a1 a2 c
         11 10 0                       00 01 1 | 11 11 0 | 10 01 0
47 GABIL Extensions
- New genetic operators (applied probabilistically)
  - AddAlternative: generalise a constraint on attribute ai by changing one of its bits from 0 to 1
  - DropCondition: generalise by changing every bit of the constraint on ai to 1, i.e. drop the condition entirely
48 GABIL Results
- Performance of GABIL is comparable to symbolic rule/tree learning methods such as C4.5, ID5R, and AQ14
- Average performance on a set of 12 synthetic problems:
  - GABIL without AddAlternative and DropCondition: 92.1% accuracy
  - GABIL with AddAlternative and DropCondition: 95.2% accuracy
  - Symbolic learning methods ranged from 91.2% to 96.6%
49 Books
- T. Mitchell, Machine Learning (Ch. 9), McGraw-Hill, 1997.
- M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1996.
- J. Koza, Genetic Programming, MIT Press, 1992.
- D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
- T. Bäck, Evolutionary Algorithms in Theory and Practice, Oxford University Press, 1996.
- D.B. Fogel, Evolutionary Computation, IEEE Press, 1995.
- Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed., Springer, 1996.
50 Papers
- Freitas, A. (2002). A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery. In A. Ghosh and S. Tsutsui, editors, Advances in Evolutionary Computation, pages 819-845. Springer-Verlag, August 2002.
- DeJong, K., Spears, W. and Gordon, D. (1993). Using Genetic Algorithms for Concept Learning. Machine Learning, 13, pp. 161-188.
- Weiss, G. (1999). Timeweaver: A Genetic Algorithm for Identifying Predictive Patterns in Sequences of Events. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99), Morgan Kaufmann, San Francisco, CA, pp. 718-725.
- Kim, Y., Street, W. and Menczer, F. (2003). Feature Selection in Data Mining. In J. Wang, editor, Data Mining: Opportunities and Challenges, Hershey, PA/London, pp. 80-105.
- Mitra, S. and Pal, S. (2000). Data Mining in Soft Computing Framework: A Survey. IEEE Trans. on Neural Networks, 13(1).