Title: ICT619 Intelligent Systems Topic 5: Genetic Algorithms
1ICT619 Intelligent Systems Topic 5: Genetic Algorithms
2Genetic Algorithms
- Introduction
- How GAs work
- The TSP as an example
- Business Applications of GA
- Advantages of GA systems
- Some issues related to GA based systems
- Case Study
3What is a genetic algorithm?
- GA part of the broader soft computing (aka
"computational intelligence") paradigm known as
evolutionary computation
- First introduced by Holland (1975)
- Inspired by possibility of problem solving
through a process of evolution
4What is a GA? (contd)
- GA mimics biological evolution to generate
better solutions from existing solutions through
- survival of the fittest
- crossbreeding and
- mutation
5What is a GA? (contd)
- A GA is capable of finding solutions for many
problems for which no usable algorithmic
solutions exist
- GA methodology particularly suited for
optimization
- Optimization searches a solution space consisting
of a large number of possible solutions
- GA reduces the search space through evolution of
solutions, conceived as individuals in a
population
6Intelligence and Evolution
- One way of understanding intelligence is as the
capability of a creature to adapt itself to an
ever-changing environment
- We normally think of adaptation as changes in the
characteristics (including behaviours) of a
single animal in response to experiences over its
history
- But adaptation is also change in the
characteristics of a species, over the
generations, in response to environmental change
- An individual creature is in competition with
other individuals of the same species for
resources, mates etc.
- There is also rivalry from other species which
may be a direct (predator) or indirect (food,
water, land, etc.) threat
- In nature, evolution operates on populations of
organisms, ensuring by natural selection that
characteristics that serve the members well tend
to be passed on to the next generation, while
those that don't die out
7Evolution as Optimisation
- Evolution can be seen as a process leading to the
optimisation of a population's ability to survive
and thus reproduce in a specific environment.
- Evolutionary fitness, the measure of the ability
to respond adequately to the environment, is the
quantity that is actually optimised in natural
life
- Consider a normal population of rabbits. Some
rabbits are naturally faster than others. Any
characteristic has a range of variation that is
due to i) sexual reproduction and ii) mutation
- We may say that the faster rabbits possess
superior fitness, since they have a greater
chance of avoiding foxes, surviving and then
breeding
- If two parents have superior fitness, there is a
good chance that a combination of their genes
will produce an offspring with even higher
fitness. We say that there is crossover between
the parents' genes
- Over successive generations, the entire
population of rabbits tends to become faster to
meet their environmental challenges in the face of
foxes
8How GAs work
- A population of candidate solutions
(mathematical representations) is repeatedly
altered until an optimal solution is found
- The GA evolutionary cycle
- 1. Starts with a randomly generated initial
population of solutions (1st generation)
- 2. Selects a population of better solutions (next
generation) by using a measure of 'goodness' (a
fitness evaluation function)
- 3. Alters new generation population through
crossbreeding and mutation
- Processes of selection (step 2) and alteration
(step 3) lead to a population with a higher
proportion of better solutions
9How GAs work (contd)
- The GA evolutionary cycle continues until
- an acceptable solution is found in the current
population, or
- some control parameter such as the maximum number
of generations is exceeded
10How solutions are represented
- A series of genes, known as a chromosome,
represents one possible solution
- Each gene in the chromosome represents one
component of the solution pattern
- Each gene can have one of a number of possible
values known as alleles
- The process of converting a solution from its
original form into a chromosome is known as
coding
11How solutions are represented (contd)
- The most common form of representing a solution
as a chromosome is a string of binary digits (aka
a binary vector), eg, 1010110101001
- Each bit in this string is a gene with two
alleles, 0 and 1
- Other forms of representation are also used, eg,
integer vectors
- Solution bit strings are decoded to enable them
to be evaluated using a fitness measure
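A minimal sketch of coding and decoding, assuming the solution is simply a non-negative integer. The function names `encode` and `decode` are illustrative choices for this example only; real problems usually need a problem-specific mapping.

```python
def encode(value: int, n_bits: int) -> str:
    """Code a non-negative integer as a fixed-length bit string (chromosome)."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    return format(value, f"0{n_bits}b")


def decode(chromosome: str) -> int:
    """Decode a bit-string chromosome back into the integer it represents."""
    return int(chromosome, 2)


chromosome = "1010110101001"          # the example bit string from the slide
value = decode(chromosome)            # integer represented by its 13 genes
round_trip = encode(value, len(chromosome))
```

Decoding followed by encoding returns the original string, which is the property a fitness function relies on when it evaluates decoded solutions.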
12GA Selection
- Selection in GA based on a process analogous to
that of biological evolution
- Only the fittest survive and contribute to the
gene pool of the next generation
- Fitness proportional selection
- Each chromosome's likelihood of being selected is
proportional to its fitness value.
- Solutions failing selection are discarded
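Fitness proportional selection is often pictured as a roulette wheel whose slices are sized by fitness. The sketch below is one possible way to write it; the function name `roulette_select` and its parameters are assumptions for illustration, and it assumes non-negative fitness values.

```python
import random


def roulette_select(population, fitnesses, k, rng=random):
    """Pick k chromosomes; each pick is proportional to fitness value."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    chosen = []
    for _ in range(k):
        spin = rng.random() * total        # where the wheel stops
        running = 0.0
        for chromosome, fit in zip(population, fitnesses):
            running += fit
            if spin < running:
                chosen.append(chromosome)
                break
        else:
            chosen.append(population[-1])  # guard against float rounding
    return chosen
```

The standard library call `random.choices(population, weights=fitnesses, k=k)` gives the same kind of weighted sampling in one line; the explicit loop is shown only to make the wheel visible.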
13Alteration: Crossover and Mutation
- Alteration refines good solutions from current
generation to produce next generation of
solutions
- Carried out by performing crossover and mutation
- Crossover by splicing two chromosomes at a
crossover point and swapping the spliced parts
- A better chromosome may be created by combining
genes with good characteristics from one
chromosome with some good genes in the other
chromosome
- Crossover carried out with a probability,
typically 0.7
- Chromosomes not crossed over are cloned
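As a rough illustration of single-point crossover on bit strings, assuming both parents have the same length. The name `crossover` and the default `pc=0.7` follow the slide's typical value; everything else is an illustrative choice.

```python
import random


def crossover(parent_a, parent_b, pc=0.7, rng=random):
    """Single-point crossover on two equal-length bit strings."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2 or rng.random() >= pc:
        return parent_a, parent_b              # no crossover: clone parents
    point = rng.randint(1, len(parent_a) - 1)  # crossover point
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

With `pc=0.0` the parents are always cloned; with `pc=1.0` the two children always swap tails at a randomly chosen point.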
14Crossover and Mutation
- Mutation
- A random adjustment in the genetic composition
- Can be useful for introducing new characteristics
in a population
- May be counterproductive
- Probability kept low, typically 0.001 to 0.01
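For a binary chromosome, mutation is commonly implemented as an independent bit flip per gene. A minimal sketch, with the function name `mutate` and the default `pm=0.01` chosen for illustration:

```python
import random


def mutate(chromosome, pm=0.01, rng=random):
    """Flip each bit independently with probability pm."""
    flipped = {"0": "1", "1": "0"}
    return "".join(
        flipped[gene] if rng.random() < pm else gene
        for gene in chromosome
    )
```

At `pm=0.0` the chromosome is returned unchanged, and at `pm=1.0` every bit is inverted; useful values sit near the low end of that range.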
15 An albino is a common mutation
16The typical Genetic Algorithm
- 1. Represent the solution as a chromosome of fixed
length, choose size of population N, crossover
probability pc and mutation probability pm.
- 2. Define a fitness function f for measuring fitness
of chromosomes.
- 3. Create an initial solution population randomly of
size N: x1, x2, ..., xN
- 4. Use the fitness function f to evaluate the
fitness value of each solution in the current
generation: f(x1), f(x2), ..., f(xN)
17The typical Genetic Algorithm (contd)
- 5. Select good solutions based on fitness value.
Discard rest of the solutions.
- 6. If acceptable solution(s) found in the current
generation or maximum number of generations is
exceeded then stop.
- 7. Alter the solution population using crossover and
mutation to create a new generation of solutions
with population size N.
- 8. Go to step 4.
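The steps of the typical GA can be sketched end to end on a toy problem. The example below assumes a fitness function that simply counts 1-bits (not a problem from the slides), an even population size, and default parameter values picked for illustration. Remembering the best solution seen so far is an addition that the slides do not mention.

```python
import random


def run_ga(n_bits=20, pop_size=30, pc=0.7, pm=0.01, max_gen=200, seed=0):
    """Toy GA maximising the number of 1-bits (an assumed fitness function)."""
    rng = random.Random(seed)

    def fitness(x):                       # step 2: fitness function f
        return sum(x)

    # step 3: random initial population x1..xN (pop_size assumed even)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gen):
        scores = [fitness(x) for x in pop]             # step 4: evaluate
        best = max(pop + [best], key=fitness)          # remember best so far
        # step 5: fitness-proportional selection (+1 avoids all-zero weights)
        weights = [s + 1 for s in scores]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        if fitness(best) == n_bits:                    # step 6: acceptable?
            break
        # step 7: crossover, then mutation, gives the next generation
        next_pop = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < pc:
                point = rng.randint(1, n_bits - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            next_pop += [a, b]
        pop = [[1 - g if rng.random() < pm else g for g in x] for x in next_pop]
        # step 8: the loop returns to step 4
    best = max(pop + [best], key=fitness)
    return best, fitness(best)


best, score = run_ga()
```

Using a seeded `random.Random` makes runs repeatable, which helps when comparing settings of `pc` and `pm`.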
18The typical Genetic Algorithm (contd)
19The Travelling Salesperson Problem
- Given a set of n cities (A, B, C, ...) find a
closed tour of all cities with the shortest total
distance d
- Tour 'cost' may be something other than distance
d
- This is an optimization problem with following
constraints
- 1. Each city to be visited once and only once
- 2. Total distance travelled must be shortest
possible
- The time required to find a solution by
exhaustive search increases factorially (faster
than exponentially) with n; the problem is NP-hard
- Possible number of tours for n cities: n!/(2n) =
(n-1)!/2
- Well over 1 million centuries for 50 cities at the
rate of 1 billion tours per sec (about 3 x 10^62
possible tours)
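The tour count n!/(2n), which equals (n-1)!/2, can be checked with a few lines of arithmetic. The function names below are illustrative, and the rate of 1 billion tours per second is the figure quoted on the slide.

```python
import math


def tour_count(n: int) -> int:
    """Distinct closed tours of n cities: n!/(2n) = (n-1)!/2, for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 cities")
    return math.factorial(n - 1) // 2


def years_to_enumerate(n: int, tours_per_sec: float = 1e9) -> float:
    """Years needed to check every tour at the given rate."""
    seconds = tour_count(n) / tours_per_sec
    return seconds / (365.25 * 24 * 3600)


small = [tour_count(n) for n in (3, 4, 5)]   # 1, 3 and 12 tours
years_50 = years_to_enumerate(50)            # vastly more than 1e8 years
```

Dividing by 2n removes the n possible starting cities and the two directions of travel, which all describe the same closed tour.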
20The Travelling Salesperson Problem
In 1987, Martin Groetschel and Olaf Holland found
an optimal tour of 666 interesting places in the
world. Source: http://www.tsp.gatech.edu//index.html
21The TSP example (contd)
- Representation and coding of TSP solutions
- The representation might be an ordered list of
numbers, each representing a city nominally (known
as order-based GA)
- 1) London 3) Dunedin 5) Beijing 7) Tokyo
- 2) Venice 4) Singapore 6) Phoenix 8) Victoria
- CityList1 (3 5 7 2 1 6 4 8)
- CityList2 (2 5 7 6 8 1 3 4)
- Alternatively, the representation of the solution
may be encoded in binary on a matrix...
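With the order-based representation, a chromosome is valid only if it is a permutation of the city numbers, and its cost is the sum of the legs of the closed tour. In the sketch below, `placeholder_cost` is a made-up stand-in for real inter-city distances, which the slides do not give; the function names are likewise illustrative.

```python
def is_valid_tour(tour, n):
    """A valid order-based chromosome visits each of n cities exactly once."""
    return sorted(tour) == list(range(1, n + 1))


def tour_cost(tour, cost):
    """Total cost of the closed tour, including the leg back to the start."""
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(cost(a, b) for a, b in legs)


def placeholder_cost(a, b):
    # Hypothetical stand-in for a real distance table (not real data).
    return abs(a - b)


city_list_1 = [3, 5, 7, 2, 1, 6, 4, 8]   # Dunedin, Beijing, Tokyo, ...
city_list_2 = [2, 5, 7, 6, 8, 1, 3, 4]   # Venice, Beijing, Tokyo, ...
cost_1 = tour_cost(city_list_1, placeholder_cost)
cost_2 = tour_cost(city_list_2, placeholder_cost)
```

A GA for the TSP would use the tour cost (lower is better) as the basis of its fitness measure, and would need crossover and mutation operators that keep every chromosome a valid permutation.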
22The TSP example (contd)
- Representation and coding of TSP solutions
- A solution to the TSP problem is an ordered list
of the n cities
- Each city is assigned 1 out of n possible
positions
- Representation of the solution may be visualised
as a table
- Each row represents a city
- Each column associated with a tour position for
cities
23The TSP example (contd)
- The tour represented above is CAEBDC
- One possible bit string code for this solution
- 01000 00010 10000 00001 00100
- (rows written end to end)
- Binary bit strings can produce "faulty"
chromosomes needing repair
- An integer vector scheme produced a 100 city tour
9.4% above optimal cost
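A small sketch of the row-per-city, column-per-position coding for the tour C, A, E, B, D (returning to C), together with a check for "faulty" chromosomes. The function names are illustrative, and the validity rule assumed here is exactly one 1 in every row and every column.

```python
def encode_tour(tour, cities):
    """Row per city, column per tour position; rows written end to end."""
    n = len(cities)
    rows = []
    for city in cities:
        position = tour.index(city)
        rows.append("".join("1" if p == position else "0" for p in range(n)))
    return " ".join(rows)


def is_faulty(bits, n):
    """True if the n x n bit matrix is not a valid tour (needs repair)."""
    rows = bits.split()
    if len(rows) != n or any(len(r) != n for r in rows):
        return True
    one_per_row = all(r.count("1") == 1 for r in rows)
    one_per_col = all(sum(r[c] == "1" for r in rows) == 1 for c in range(n))
    return not (one_per_row and one_per_col)


code = encode_tour("CAEBD", "ABCDE")
```

Crossover or mutation on such a string can easily put two cities in the same position or leave a position empty, which is why repair is needed with this coding.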
24An Optimal 100-City Tour
25Business Applications of GA
- Increasing number of industrial and business
applications of GA since late 1980s
- In business, applications include (Kingdon 1997)
- Portfolio optimisation
- Bankruptcy prediction
- Financial forecasting
- Fraud detection
- Scheduling
- Design of complex machines, eg, jet engines
26Business Applications of GA (contd)
- First Quadrant, an investment firm in California
- Started using GA technique in 1993
- Uses GA to manage US$5 billion worth of
investments in 17 different countries
- Their evolved model earns, on average, $255 for
every $100 invested over six years, as opposed to
$205 for other types of modeling systems (Begley &
Beals, 1995)
27Advantages of GA systems
- Useful when no algorithms or heuristics are
available for solving a problem
- No formulation of the solution is required, only
"recognition" of a good solution
- A GA system can be built as long as a solution
representation and an evaluation scheme can be
worked out
- So minimal domain expert access is required
28Advantages of GA systems
- GA can act as an alternative to
- Expert Systems if
- number of rules is too large or
- the nature of the knowledge-base too dynamic
- Traditional optimization techniques if
- constraints and objective functions are
non-linear and/or discontinuous
29Advantages of GA systems (cont'd)
- GA does not guarantee optimal solutions, but
produces near optimal solutions which are likely
to be very good
- Solution time with GA is highly predictable.
Determined by
- Size of the population
- Time taken to decode and evaluate a solution and
- Number of generations of population
- GA uses simple operations to solve problems that
are computationally prohibitive otherwise
- Example: the TSP problem
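The three factors that determine solution time can be combined into a back-of-the-envelope estimate. This sketch assumes that decoding and evaluating solutions dominates the run time and ignores the cost of selection and alteration; the function name and the sample figures are hypothetical.

```python
def estimated_run_seconds(pop_size, generations, eval_seconds):
    """Rough GA run time, assuming decode-and-evaluate dominates the cost."""
    return pop_size * generations * eval_seconds


# Hypothetical figures: 100 chromosomes, 500 generations, 2 ms per evaluation.
estimate = estimated_run_seconds(100, 500, 0.002)   # about 100 seconds
```

Because all three factors are fixed in advance (given a cap on generations), the worst-case run time is known before the GA starts.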
30Advantages of GA systems (cont'd)
- Because of simplicity, GA software is
- Reasonably sized and self-contained
- Easier to embed as a module in another
system
- GA can also aid in developing intelligent
business systems that use other methodologies,
eg,
- Building the rule base of an expert system
- Finding optimal neural networks
31Some issues related to GA based systems
- Level of explainability
- Capability to explain why a particular solution
was arrived at is practically nil
- The system does not know what a fitness value
really means
- Scalability
- Moderately scalable
- Accommodates increased number of variables by
increasing the length of the chromosome
- But
- A longer chromosome means a larger population
space (more potential combinations of genes)
- More time required for decoding and fitness
evaluation
32Some issues related to GA based systems (contd)
- Data requirements
- In general, GAs do not require extensive access to
data but some applications may need it to
evaluate solutions
- This makes the quality and quantity of data
important
- Local maxima
- Local maxima are regions that hold good solutions
relative to regions around them, but which do not
necessarily contain the best overall solutions
- The region(s) that contain the best solutions are
called global maxima
- GAs are less prone to being trapped in local
maxima because of the use of mutation and
crossover
33Some issues related to GA based systems (contd)
- Premature convergence
- A GA is said to have converged prematurely if it
explores a local maximum extensively
- It may be then dominated by very similar
solutions within the region
- Most significant factor leading to such
convergence is a mutation rate which is too low
- Mutation interference is an effect opposite to
that of premature convergence
34Some issues related to GA based systems (contd)
- Mutation interference
- Finding a mutation rate which allows the GA to
converge but which also allows adequate
exploration of the solution space is essential
for satisfactory performance
- Mutation interference occurs when mutation rates
in a GA are too high, and as a result
- Solutions are frequently or drastically mutated
- The algorithm never manages to explore any region
of the space thoroughly
- Any good solutions found tend to be destroyed
rapidly
35Case Study - Help Desk Task Scheduling (Dhar &
Stein 1997, pp.219-227)
- GA based system developed at Moody's for
scheduling service tasks to its customer service
representatives (CSRs)
- Major constraints
- The system
- Must minimise computer downtime and customer
dissatisfaction
- Must integrate with existing database system
which kept track of help desk requests.
36Case study constraints (contd)
- Must be flexible to
- Accommodate new types of task definitions and
changes in employees, training etc.
- Allow administrator to modify solutions
- Must generate and reevaluate schedules quickly
(under 15 minutes) and consistently
- Must not take administrator or CSRs away from
their jobs for any extended period of time
- Must be developed quickly
37Case study constraints (contd)
- Should be scalable in case of future growth in
number of requests for help and the number of
CSRs
- Must not be too complicated for its users: the
administrator and CSRs
- The main difficulties in meeting the constraints
- the large number of tasks
- the large number of CSRs
- the varying capabilities of CSRs, and
- the wide variety of tasks
38Case Study - Variables and issues needing to be
considered
- The priority of a task, which is determined by
the severity of the problem
- The length of time required to perform the task
and how it would affect the servicing of other
users
- The ability of various CSRs to perform different
levels of tasks (expertise must match the
complexity)
- Low priority tasks must not be kept waiting
indefinitely
- The measure of goodness of a schedule to be based
on amount of downtime each schedule costs the
organisation.
39Possible solution methodologies considered
- Traditional linear programming (a numerical
optimisation technique)
- A rule based expert system
- A GA based system
- ES ruled out because
- Expertise to solve this problem not expressible
as a set of rules
- Help desk administrator not available for
knowledge extraction
- Linear programming ruled out because
- It fails if no optimal solution can be found
- Unlike GA, it does not produce any sub-optimal
solutions
40Case Study - the solution
- SOGA (Schedule Optimising for GA)
- A hybrid system consisting of GA and fuzzy system
components
- The GA component deals with the scheduling task.
- Each task in the queue is represented by a gene
- The entire task list forms the chromosome
- Each chromosome is decoded by a scheduling module
that assigns tasks to available CSRs who can
perform them
41Case Study - the solution (contd)
- Fitness of each chromosome is determined by
calculating the amount of downtime that would
result based on the schedule represented by the
chromosome.
- Schedules generated by the GA component are
modified by the FS component
- SOGA runs in the background behind the help
request tracking system
- Updates schedules based upon a predefined time
interval (eg, every 10 or 15 minutes)
- CSRs access their current job queue through their
interface to accept jobs.
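To make the idea of decoding a task-list chromosome into a schedule more concrete, here is a purely hypothetical sketch. It is not the SOGA implementation: the `Task` fields, the greedy "earliest-free capable CSR" rule, and the severity-weighted downtime measure are all assumptions chosen for illustration, and the sample tasks are invented. It also assumes task names are unique and the chromosome lists every task once.

```python
from dataclasses import dataclass


@dataclass
class Task:                      # hypothetical task record
    name: str
    severity: int                # higher = more urgent
    minutes: int                 # time needed to perform the task
    level: int                   # expertise level required


def decode(chromosome, tasks, csr_levels):
    """Assign tasks, in chromosome order, to the earliest-free capable CSR."""
    free_at = [0] * len(csr_levels)          # minute each CSR becomes free
    schedule = []                            # (task name, CSR index, finish)
    for gene in chromosome:
        task = tasks[gene]
        capable = [i for i, lvl in enumerate(csr_levels) if lvl >= task.level]
        if not capable:
            raise ValueError(f"no CSR can perform {task.name}")
        csr = min(capable, key=lambda i: free_at[i])
        free_at[csr] += task.minutes
        schedule.append((task.name, csr, free_at[csr]))
    return schedule


def downtime(chromosome, tasks, csr_levels):
    """Assumed cost: severity-weighted minutes until each task is finished."""
    finish = {name: t for name, _, t in decode(chromosome, tasks, csr_levels)}
    return sum(task.severity * finish[task.name] for task in tasks)


# Invented sample data: three tasks, two CSRs with expertise levels 1 and 2.
tasks = [Task("password reset", 1, 5, 1),
         Task("server down", 5, 30, 2),
         Task("printer jam", 2, 10, 1)]
csr_levels = [1, 2]
cost_a = downtime([0, 1, 2], tasks, csr_levels)   # one task ordering
cost_b = downtime([2, 0, 1], tasks, csr_levels)   # a different ordering
```

Under these assumptions the two orderings give different downtime values, so a GA searching over orderings has a fitness signal to work with; lower downtime means a fitter chromosome.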
42Case Study - Results
- The system is timely, generating schedules in
about 5 minutes.
- The solutions are found to be good by the help
desk administrator
- The system is flexible enough to allow for task
definitions
- The system scales up well to larger domains
(higher number of tasks)
- The SOGA system was developed in two months using
one programmer overseen by its designers
43REFERENCES
- Begley, S. and Beals, G., "Software au naturel",
Newsweek, May 8, 1995, p.70.
- Dhar, V. and Stein, R., Seven Methods for
Transforming Corporate Data into Business
Intelligence, Prentice Hall 1997, pp. 126-148,
203-210.
- Goldberg, D. E., Genetic and Evolutionary
Algorithms Come of Age, Communications of the
ACM, Vol.37, No.3, March 1994, pp.113-119.
- Holland, J. H., Adaptation in Natural and
Artificial Systems, Univ. of Michigan Press,
1975.
- Kingdon, J., Intelligent Systems and Financial
Forecasting, Springer Verlag, London 1997.
- Medsker, L., Hybrid Intelligent Systems, Kluwer
Academic Press, Boston 1995.
- Michalewicz, Z., Genetic Algorithms + Data
Structures = Evolution Programs, Springer-Verlag,
Berlin 1996.
- Negnevitsky, M., Artificial Intelligence: A Guide
to Intelligent Systems, Addison-Wesley 2005.