Title: Genetic Algorithms
1. Genetic Algorithms
2. Quick and Dirty Definition of Genetic Algorithms
(courtesy of Wikipedia)
- Choose initial population
- Evaluate fitness of each individual
- Repeat:
  - Select individuals to reproduce
  - Breed new generation
  - Evaluate fitness of offspring
  - Introduce offspring into population
- Report best solution
3. How do we breed a new generation?
- Mutation: each bit b in a solution is changed from its original state with probability p
- Crossover: two solutions are combined to produce a third solution
[Figure: an Original solution beside its Mutation; Parent 1 and Parent 2 combining into a Child.]
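The two operators can be sketched on bit strings as follows. This is only an illustration, not code from the talk: the function names `mutate` and `crossover`, the list-of-ints representation, and the choice of single-point crossover are all assumptions.

```python
import random

def mutate(bits, p, rng=random):
    """Flip each bit independently with probability p."""
    return [1 - b if rng.random() < p else b for b in bits]

def crossover(parent1, parent2, rng=random):
    """Single-point crossover (an assumed variant): the child takes a
    prefix of parent1 and the matching suffix of parent2."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    point = rng.randint(1, len(parent1) - 1)  # cut strictly inside the string
    return parent1[:point] + parent2[point:]

if __name__ == "__main__":
    rng = random.Random(0)
    original = [0, 1, 1, 0, 1, 0, 0, 1]
    print("mutated:", mutate(original, p=0.1, rng=rng))
    print("child:  ", crossover([0] * 8, [1] * 8, rng=rng))
```

With p = 0 the solution is returned unchanged, and with p = 1 every bit is flipped; useful values lie in between.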
4. NP-Hard Problems
- Definition: An NP-hard problem is at least as hard as every problem in NP. No polynomial-time algorithm is known for any NP-hard problem, and none exists unless P = NP.
- Non-example:
  - Sorting a list of numbers
- Examples:
  - Traveling Salesman Problem
  - Subset-Sum Problem
  - Multiple Alignment
  - Building a Phylogenetic Tree
5. Example: The Traveling Salesman Problem
- Definition: Given a set of n cities and the distances between them, find the shortest route that visits each city once and returns to the starting city.
- Note: In practice this problem is not solved with a genetic algorithm. The current method of choice is the Lin-Kernighan heuristic.
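The quantity being minimized is the length of the closed route. A minimal sketch, assuming cities are given as planar coordinates and distance is straight-line (the slides use road mileage, which would need a distance table instead); `tour_length` and the coordinates are made up for illustration.

```python
import math

def tour_length(tour, coords):
    """Total length of the closed tour: visit the cities in order,
    then return from the last city to the first."""
    total = 0.0
    for i, city in enumerate(tour):
        nxt = tour[(i + 1) % len(tour)]  # wraps around to the start
        total += math.dist(coords[city], coords[nxt])
    return total

# Made-up coordinates forming a 3-4-5 right triangle.
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4)}
print(tour_length(["A", "B", "C"], coords))  # 3 + 4 + 5 = 12.0
```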
6-9. TSP (local version)
[Figure, four slides: map with Baltimore, Rockville, Annapolis, and Silver Spring; the final slide adds Laurel.]
10. Scary Numbers
- Solving the Traveling Salesman Problem for n cities by exhaustive search requires looking at (n-1)! solutions
  - 5 city version has 24 possible solutions
  - 11 city version has over 3 million solutions
  - 28 city version has over 10^28 solutions
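These counts can be checked directly. The helper name `brute_force_tours` is made up for this sketch; it simply evaluates (n-1)!.

```python
import math

def brute_force_tours(n):
    """Number of tours an exhaustive search examines for n cities: (n-1)!"""
    return math.factorial(n - 1)

for n in (5, 11, 28):
    print(f"{n} cities: {brute_force_tours(n):.3e} tours")
```

Fixing the starting city is what reduces n! orderings to (n-1)!.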
11. Solving TSP with a genetic algorithm
[Figure: map of the 28 example cities: Seattle, Boston, Minneapolis, Buffalo, Detroit, New York, Chicago, Cleveland, Salt Lake City, Philadelphia, Omaha, Pittsburgh, San Francisco, Denver, Washington D.C., Indianapolis, St. Louis, Kansas City, Louisville, Los Angeles, Memphis, Phoenix, Birmingham, Dallas, El Paso, New Orleans, Houston, Miami.]
12. Step 1: Pick an initial population
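The slides do not say how the initial routes were chosen. One common choice, assumed here, is to start from random orderings of the cities; `initial_population` is a hypothetical helper.

```python
import random

def initial_population(cities, size, rng=random):
    """Return `size` random tours, each a shuffled copy of `cities`."""
    population = []
    for _ in range(size):
        tour = list(cities)   # copy so the input list is left untouched
        rng.shuffle(tour)
        population.append(tour)
    return population

if __name__ == "__main__":
    cities = ["Baltimore", "Rockville", "Annapolis", "Silver Spring", "Laurel"]
    for tour in initial_population(cities, size=3, rng=random.Random(0)):
        print(tour)
```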
13-14. Step 2a: Mutation
[Figure, two slides: a 28-city route listed beside a mutated copy of itself, in which pairs of cities such as Denver/Detroit and St. Louis/Louisville appear in swapped positions.]
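Flipping individual bits does not work for a route, because every city must still appear exactly once after the change. A minimal sketch of one mutation that preserves this property, exchanging two cities; the swap operator and the name `swap_mutation` are assumptions, not necessarily the operator used in the talk.

```python
import random

def swap_mutation(tour, rng=random):
    """Return a copy of `tour` with two randomly chosen cities exchanged."""
    mutated = list(tour)
    i, j = rng.sample(range(len(mutated)), 2)  # two distinct positions
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

if __name__ == "__main__":
    route = ["Phoenix", "Memphis", "Denver", "Detroit", "Kansas City"]
    print(swap_mutation(route, rng=random.Random(0)))
```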
15-16. Step 2b: Crossover
[Figure, two slides: two parent routes through the 28 cities listed side by side, to be combined by crossover.]
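Simply splicing two routes together would repeat some cities and drop others, so crossover on routes needs a permutation-aware operator. The sketch below uses a simple variant of order crossover (OX) as an assumed stand-in; the slides do not name the operator that was actually used.

```python
import random

def order_crossover(parent1, parent2, rng=random):
    """Copy a random slice of parent1 into the child, then fill the
    remaining positions, left to right, with the missing cities in the
    order they appear in parent2."""
    n = len(parent1)
    start, end = sorted(rng.sample(range(n + 1), 2))  # non-empty slice
    child = [None] * n
    child[start:end] = parent1[start:end]
    kept = set(parent1[start:end])
    fill = (city for city in parent2 if city not in kept)
    for i in range(n):
        if child[i] is None:
            child[i] = next(fill)
    return child

if __name__ == "__main__":
    p1 = ["Phoenix", "Memphis", "Detroit", "Denver", "Kansas City"]
    p2 = ["Denver", "Kansas City", "Phoenix", "Detroit", "Memphis"]
    print(order_crossover(p1, p2, rng=random.Random(0)))
```

The child always contains each city exactly once, which is the property a route needs.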
17. Step 3: Combine New Generation with Population
- Find the mileage of each route in the new generation
- Pick a set of routes with comparatively good mileage
- Replace routes in the population that have comparatively bad mileage with the set of routes from the new generation
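The replacement step above can be sketched as follows. As a simplifying assumption, "comparatively good" and "comparatively bad" are read here as strictly the shortest and longest routes; `merge_generation`, `length`, and `keep` are hypothetical names.

```python
def merge_generation(population, offspring, length, keep):
    """Replace the `keep` longest routes in `population` with the `keep`
    shortest routes from `offspring`. `length` maps a route to its mileage.
    The population size is unchanged."""
    keep = min(keep, len(offspring), len(population))
    best_new = sorted(offspring, key=length)[:keep]
    survivors = sorted(population, key=length)[:len(population) - keep]
    return survivors + best_new

if __name__ == "__main__":
    # Toy "routes": lists of leg lengths, so mileage is just the sum.
    population = [[1, 2], [4, 4], [9, 9]]
    offspring = [[1, 1], [7, 7]]
    print(merge_generation(population, offspring, length=sum, keep=1))
    # [[1, 2], [4, 4], [1, 1]]
```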
18. Step 4: Repeat
- Continue generating new generations until one of the following holds:
  - We have generated a certain number of generations
  - We have run the algorithm for a certain amount of time
  - Our best solution is no longer improving
  - Our best solution is good enough
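The four stopping conditions can be combined in one driver loop. This is a sketch under assumptions: the caller supplies `step` (advance one generation) and `best_length` (current best mileage), and the default limits are placeholders rather than values from the talk.

```python
import time

def run_until_done(step, best_length, max_generations=100,
                   max_seconds=60.0, patience=20, good_enough=None):
    """Call `step()` once per generation until a stopping condition holds.
    Returns (reason, generations_run)."""
    start = time.monotonic()
    best = best_length()
    stale = 0  # generations since the best solution last improved
    for generation in range(1, max_generations + 1):
        step()
        current = best_length()
        if current < best:
            best, stale = current, 0
        else:
            stale += 1
        if good_enough is not None and best <= good_enough:
            return "good enough", generation
        if stale >= patience:
            return "no longer improving", generation
        if time.monotonic() - start >= max_seconds:
            return "time limit", generation
    return "generation limit", max_generations
```

Returning the reason along with the generation count makes it easy to see which condition actually ended a run.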
19. Some Parameters
- Best results occurred when:
  - Initial population was 1024
  - Probability of solution being mutated: 0.35
  - Probability of bit being mutated: 0.1
  - Probability of crossover: 1
  - Number of generations: 100
  - Time to run: 33 seconds
20. Best Solution (maybe)
[Figure: map of the 28 cities showing the best route found.]
Route length: 10,966 miles
21. Paul Lewis Algorithm for creating phylogenetic trees
- Improvements over the generic algorithm:
  - Don't always put the best solutions from the next generation into the population
  - Always save the best solution
  - Use a gamma distribution to mutate branch lengths
  - Crossover is accomplished by pruning a random subtree from a parent and regrafting it onto another tree
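The slide only says that a gamma distribution is used to mutate branch lengths. One way this could look, assumed here for illustration, is to multiply the branch length by a gamma-distributed factor with mean 1; the function name and the shape value are placeholders.

```python
import random

def mutate_branch_length(length, shape=100.0, rng=random):
    """Multiply a branch length by a gamma-distributed factor with mean 1.

    With shape k and scale 1/k the factor has mean 1 and variance 1/k,
    so a larger `shape` gives smaller changes. 100.0 is a placeholder."""
    factor = rng.gammavariate(shape, 1.0 / shape)
    return length * factor

if __name__ == "__main__":
    rng = random.Random(0)
    print([round(mutate_branch_length(0.05, rng=rng), 4) for _ in range(5)])
```

A multiplicative change keeps the branch length positive, which an additive perturbation would not guarantee.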
22. Next Level: GARLI
- Improvements over Paul Lewis's algorithm:
  - Implemented other topological mutations besides subtree pruning/regrafting
  - Optimized branch lengths
  - GARLI examines parameters after every 100 generations and refines them
  - Branch lengths are adjusted before the algorithm even begins
  - There are three different stopping conditions
23. Multiple Alignment
TTCAGATAAA TCTTCATTCC ATTCGTAACG ACTTCCGTTC GACTTGCATG
ACTAATCA.. .....ATTCT TTAAGCGTAA ATTTTCGTTC GACTTGCATG
.......... .....ATCTT CCG...AAGA AGATTCGTTC GACTTGCATG
ATGAAATG.. .....TTTCC .......... .....CGTTC GACTTGCATG
CCGTCATA.. .....ACTTC A......... ....TCGTTC GACTTGCATG
GTGGCATA.. .....ACCCT TCG...GGGA GTGAGCCGTC GA.......A
GTGAGCTA.. .....ACTTT T......AGA GGCAGCAGTC GA.......A
GTGACCCA.. .....ACCTT T.....TGGA GGGAGCTGTC GA.......A
GTGACCTA.. .....ACTGT A.....AAGA AGGAGCTGCC GA.......A
GTGACCCA.. .....ACCGT A.....AGGA GGGAGCTGCC GA.......A
GTGACCCA.. .....ACCGT A.....AGGA GGGAGCTGCC GA.......A
GTGAGGTA.. .....ACCGC A.....AGGA GCCAGCCGTC GA.......A
GTGAGGTA.. .....ACCGC A.....AGGA GCCAGCTGCC GA.......A
24. Strategy
- Start with an alignment generated by an alignment program
- Move gaps around randomly (mutation)
- Combine alignments by randomly choosing sequences from each (crossover)
- Score alignments and combine with population (with higher probability of choosing an alignment with a better score)
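The strategy above can be sketched with an alignment stored as a list of equal-length strings that use "." for gaps. All names here are hypothetical, and the selection helper assumes scores are positive numbers where higher is better.

```python
import random

GAP = "."

def move_gap(alignment, rng=random):
    """Mutation: move one random gap in one random row to a new column.
    Row length and the order of the residues are unchanged."""
    rows = list(alignment)
    candidates = [i for i, row in enumerate(rows) if GAP in row]
    if not candidates:
        return rows
    r = rng.choice(candidates)
    chars = list(rows[r])
    gap_positions = [i for i, c in enumerate(chars) if c == GAP]
    del chars[rng.choice(gap_positions)]
    chars.insert(rng.randint(0, len(chars)), GAP)
    rows[r] = "".join(chars)
    return rows

def mix_rows(parent1, parent2, rng=random):
    """Crossover: take each sequence from one parent or the other."""
    return [rng.choice(pair) for pair in zip(parent1, parent2)]

def pick_parent(population, scores, rng=random):
    """Selection: better-scoring alignments are more likely to be chosen."""
    return rng.choices(population, weights=scores, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    start = ["AC..GT", "A.CG.T"]
    mutated = move_gap(start, rng)
    print(mutated, mix_rows(start, mutated, rng))
```

Because `move_gap` never changes a row's length, alignments descended from the same starting alignment keep the same width, so rows from two parents can be mixed freely.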
25. Results
- Population size: 64
- Generations: 200
- Probability of mutation: 1
- Probability of crossover: 0.15
- 7.5 improvement in score
- 50 seconds to run
26. Improved Alignment?
TTCAGATAAA TCTTCATTCC ATTCGTAACG ACTTCCGTTC GACTTGCATG
ACTAATCA.A ......TTCT TTAAGCGTAA ATTTTCGTTC GACTTGCATG
.........A ......TCTT CCG..A.AGA AGATTCGTTC GACTTGCATG
ATGAAATG.. .....TTTCC .......... .....CGTTC GACTTGCATG
CCGTCATA.A ....C..TTC A......... ....TCGTTC GACTTGCATG
GTGGCATA.A ....C..CCT TCG...GGGA GTGAGCCGTC GA.....A..
GTGAGCTA.A ....C..TTT .T.....AGA GGCAGCAGTC GA.....A..
GTGACCCA.A ....C..CTT .T...TG.GA GGGAGCTGTC GA.....A..
GTGACCTA.A ....C..TGT A....A.AGA AGGAGCTGCC GA.....A..
GTGACCCA.A ....C..CGT A.....AGGA GGGAGCTGCC GA.....A..
GTGACCCA.A ....C..CGT A.....AGGA GGGAGCTGCC GA.....A..
GTGAGGTA.A ....C..CGC A.....AGGA GCCAGCCGTC GA.....A..
GTGAGGTA.A ....C..CGC A.....AGGA GCCAGCTGCC GA.....A..
27. Possible Improvements
- Replace current scoring system with one that has an affine gap penalty
- Try to optimize the probability of crossover
- Try to find the optimal way of moving a gap (i.e. don't just move it a random number of spaces)
- Try to determine good terminating criteria
- Adjust parameters based on alignment
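An affine gap penalty charges more for opening a gap than for extending one, so one long gap costs less than many short ones. A minimal sketch for a single aligned row, assuming "." is the one-character gap symbol; the open and extend costs are placeholder values, not ones from the talk.

```python
import re

def affine_gap_penalty(row, gap_open=10.0, gap_extend=0.5, gap="."):
    """Penalty for one aligned row: each run of k consecutive gap
    characters costs gap_open + (k - 1) * gap_extend."""
    penalty = 0.0
    for run in re.finditer(re.escape(gap) + "+", row):
        k = run.end() - run.start()
        penalty += gap_open + (k - 1) * gap_extend
    return penalty

print(affine_gap_penalty("AC..GT"))  # one run of 2 gaps: 10 + 0.5 = 10.5
print(affine_gap_penalty("A.C.GT"))  # two runs of 1 gap: 10 + 10 = 20.0
```

The two example rows contain the same number of gap characters, but the row with scattered gaps is penalized almost twice as much.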