Title: Applying a Genetic Algorithm to Reconfigurable Hardware
1Applying a Genetic Algorithm to Reconfigurable Hardware: A Case Study
B. Earl Wells, Clint Patrick, Luis Trevino, John Weir and Jim Steincamp
NASA Marshall Space Flight Center, Huntsville, Alabama
University of Alabama in Huntsville, Huntsville, Alabama
2Project Motivation Objectives
- To evaluate the technology of reconfigurable computing -- determine its level of maturity and suitability for use in future NASA applications
- To implement a nontrivial test bed type application on a Star Bridge Hypercomputer Model 36
- Chosen Application: a simple Genetic Algorithm
3Targeted Hardware Platform
- Starbridge HC-36 Hypercomputer System
- Employs Xilinx Virtex II 6000 Series FPGAs
4Development Environment
- Development Environment
- VIVA Graphical User Interface
- Structural Design Philosophy with Behavioral Attributes
- Polymorphism
- Object Overload
- Recursion
- Data flow and data driven type synchronization between objects (Go, Done, Busy, Wait protocol)
- Large library of high end objects
- Environment falls somewhere between hardware
description languages and schematic capture
packages
5Polymorphism, Overloading, Recursion, and
Synchronization
Example Object to Determine Number of 1s in a
Binary Number
Terminal Case
Recursive Case
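The VIVA diagram for this slide is not in the transcript. As a rough software analogue of the terminal case and recursive case, here is a minimal Python sketch. It assumes the object splits its input in half and adds the two partial counts; the slide does not confirm that structure, and the names `count_ones` and `to_bits` are hypothetical.

```python
def to_bits(value, width):
    """Return the low `width` bits of value as a list of 0/1 (LSB first)."""
    return [(value >> i) & 1 for i in range(width)]


def count_ones(bits):
    """Count the 1s in a list of bits by recursive halving."""
    # Terminal case: zero or one bit left, so the count is the bit itself.
    if len(bits) <= 1:
        return bits[0] if bits else 0
    # Recursive case: count each half separately and add the results.
    mid = len(bits) // 2
    return count_ones(bits[:mid]) + count_ones(bits[mid:])


ones = count_ones(to_bits(0b10110110, 8))
```

In hardware the two halves would be counted concurrently, which is what makes the recursive description attractive there; the Python version only mirrors the structure.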
6Genetic Algorithms
- Biologically Inspired Search Techniques
- Employs Selection, Replication (crossover), Mutation, and Replacement
- Iterative method -- very time intensive
- Regularly Structured
- Large Amounts of Concurrency Present that can be
Exploited
7Genetic Algorithm Implementation
Top Level View
Run Time Environment
8GA Characteristics
- 2 Way Tournament Selection
- No Elitism
- Single Point Crossover with bit-wise mutation
- Weight Encoded Chromosome (weight translated into rank ordering of cities)
- Adjustable Parameters
- Population Size 2 to 512 (powers of 2), Number of Generations, Probability of Mutation, Probability of Crossover
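A 2-way tournament can be sketched in a few lines of Python. This is an illustration under assumptions, not the hardware design: it assumes lower cost is better (as for TSP tour length) and that the two contestants are drawn with replacement. The function name and the `rng` parameter are hypothetical.

```python
import random


def tournament_select(population, costs, rng=random):
    """Pick two individuals at random and return the cheaper one."""
    a = rng.randrange(len(population))
    b = rng.randrange(len(population))
    # Assumption: lower cost wins; ties go to the first contestant.
    return population[a] if costs[a] <= costs[b] else population[b]


rng = random.Random(0)
population = list(range(8))       # stand-ins for chromosomes
costs = list(range(8))            # individual i has cost i
winners = [tournament_select(population, costs, rng) for _ in range(1000)]
```

With no elitism, the best individual is favored by selection but is not guaranteed to survive into the next generation.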
9Block Diagram Level View of Genetic Algorithm Implementation
10Replacement Chromosome Storage
11Selection
12Standard Single Point Crossover Operation
(Weighted Chromosomes)
Crossover Point 4
Chromosome 1
25,17,10,20,33,14,7,29
Chromosome 2
44,12,17,38,20,5,70,13
Offspring Chromosome
25,17,10,20, 20,5,70,13
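The example above can be reproduced with a short Python sketch. It assumes the crossover point counts how many leading elements are taken from the first parent, which is consistent with the numbers on the slide; the function name is hypothetical.

```python
def single_point_crossover(parent1, parent2, point):
    """Take the first `point` weights from parent1 and the rest from parent2."""
    return list(parent1[:point]) + list(parent2[point:])


chromosome1 = [25, 17, 10, 20, 33, 14, 7, 29]
chromosome2 = [44, 12, 17, 38, 20, 5, 70, 13]
offspring = single_point_crossover(chromosome1, chromosome2, 4)
```

Because the elements are weights rather than city numbers, repeated values in the offspring (here two 20s) are still a valid chromosome.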
13Standard Single Point Crossover Operation
(Weighted Chromosomes)
14Single Point Mutation (Weighted Chromosomes)
Original Chromosome
25,17,10,20,20,5,70,13
Mutated Element 5
Mutated Chromosome
25,17,10,20,55,5,70,13
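A minimal Python sketch of this mutation follows. It assumes the mutated element is numbered from 1 and that the whole weight is replaced; the slides elsewhere describe the mutation as bit-wise, so replacing the full value is a simplification. The upper bound `max_weight` and both function names are hypothetical.

```python
import random


def mutate_element(chromosome, position, new_weight):
    """Return a copy with the weight at 1-based `position` replaced."""
    mutated = list(chromosome)
    mutated[position - 1] = new_weight
    return mutated


def random_mutation(chromosome, max_weight=127, rng=random):
    """Replace one randomly chosen weight (max_weight is an assumed bound)."""
    position = rng.randint(1, len(chromosome))
    return mutate_element(chromosome, position, rng.randint(0, max_weight))


original = [25, 17, 10, 20, 20, 5, 70, 13]
mutated = mutate_element(original, 5, 55)
```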
15Traveling Salesman Problem (TSP)
- Given a specified number of cities along with the cost of travel between each pair of them, find the cheapest way of visiting all the cities and returning to the first city visited
- Asymmetric Case: direction traveled between any two cities matters (i.e. cost is different)
- Possible solutions: (n-1)!, where n is the number of cities
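To give a feel for how fast (n-1)! grows, the count can be evaluated directly. The sketch below uses Python's exact integer arithmetic; the function name is hypothetical. For the 65-city problem used later in this deck, the result is 64!, roughly 1.27 x 10^89 tours.

```python
import math


def asymmetric_tour_count(n):
    """Number of distinct tours of n cities when direction matters."""
    # Fix the starting city; the remaining n-1 cities can be ordered freely.
    return math.factorial(n - 1)


small = asymmetric_tour_count(4)
tours_65 = asymmetric_tour_count(65)
```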
16Traveling Salesman Problem (TSP)
- Well-understood NP-hard optimization problem (its decision version is NP-complete)
- Academic literature contains many test problems
- Chose for test purposes an Asymmetric TSP with 65 cities (TSP 65)
- Used a modified weight encoded chromosome representation
University of Heidelberg, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95
17Equivalent TSP Chromosome Representations
Weighted Chromosome
City No.:        0   1   2   3   4   5   6   7
Weights:        25  17  10  20  55   5  70  13
Rank Ordering:   5   3   1   4   6   0   7   2
Visit Order Permutation Chromosome
City Visit Order: 1st 2nd 3rd 4th 5th 6th 7th 8th
City Numbers:      5   2   7   1   3   0   4   6
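The two representations are related by a sort: the city with the smallest weight is visited first. A short Python sketch of the conversion follows. Breaking ties by lower city number is an assumption (the slides do not say how equal weights are ordered), and the function names are hypothetical.

```python
def visit_order(weights):
    """Cities listed in the order visited: smallest weight first."""
    # sorted() is stable, so equal weights keep city-number order (assumed).
    return sorted(range(len(weights)), key=lambda city: weights[city])


def rank_ordering(weights):
    """For each city, its position (0-based) in the visit order."""
    ranks = [0] * len(weights)
    for position, city in enumerate(visit_order(weights)):
        ranks[city] = position
    return ranks


weights = [25, 17, 10, 20, 55, 5, 70, 13]
order = visit_order(weights)
ranks = rank_ordering(weights)
```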
18TSP Objective Function
- Systolic sort of chromosome weights
- Summation of segments
- Replacement of weights with rank orderings
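In software, the objective function amounts to sorting the weights to get a visit order and then summing the cost of each segment, including the return to the first city. The sketch below is a plain Python stand-in: it uses the built-in `sorted` in place of the systolic sorter, and the 3-city cost matrix is invented purely for illustration.

```python
def tour_cost(weights, cost):
    """Total cost of the closed tour encoded by a weighted chromosome."""
    # Sort city numbers by weight to obtain the visit order.
    order = sorted(range(len(weights)), key=lambda city: weights[city])
    total = 0
    for i, city in enumerate(order):
        # The modulo wraps the last segment back to the first city.
        next_city = order[(i + 1) % len(order)]
        total += cost[city][next_city]
    return total


# Hypothetical asymmetric costs: cost[i][j] is travel from city i to city j.
cost = [[0, 1, 5],
        [2, 0, 3],
        [4, 6, 0]]
forward = tour_cost([10, 20, 30], cost)   # visits 0, 1, 2, then back to 0
backward = tour_cost([10, 30, 20], cost)  # visits 0, 2, 1, then back to 0
```

The two tours visit the same cities in opposite directions and cost 8 and 13 respectively, which is the asymmetric property in miniature.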
19Single Point Permutation Preserving Crossover
Operation
Crossover Point 4
Chromosome 1
1,7,3,2,5,6,0,4
Chromosome 2
0,2,4,1,6,5,7,3
Offspring Chromosome
1,7,3,2,0,4,6,5
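The slide does not state the rule, but the example is consistent with the following: keep the first four cities of Chromosome 1, then append the cities not yet used in the order they appear in Chromosome 2. A Python sketch under that inferred rule (function name hypothetical):

```python
def permutation_crossover(parent1, parent2, point):
    """Head from parent1, then the unused cities in parent2's order."""
    head = list(parent1[:point])
    used = set(head)
    # Skipping cities already in the head keeps the result a permutation.
    tail = [city for city in parent2 if city not in used]
    return head + tail


chromosome1 = [1, 7, 3, 2, 5, 6, 0, 4]
chromosome2 = [0, 2, 4, 1, 6, 5, 7, 3]
offspring = permutation_crossover(chromosome1, chromosome2, 4)
```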
20Modified Crossover Operator
21Permutation Altering Mutation
Original Chromosome
1,7,3,2,0,4,6,5
Mutation Removal Point 6
Insertion Point 3
Mutated Chromosome
1,7,4,3,2,0,6,5
Note: No change in Mutation Operator Needed
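The remove-and-reinsert step can be sketched in Python as follows. Both points are treated as 1-based positions, which is the reading that reproduces the example on this slide; the function name is hypothetical.

```python
def remove_insert_mutation(chromosome, removal_point, insertion_point):
    """Remove the city at removal_point and reinsert it at insertion_point."""
    mutated = list(chromosome)
    # Positions are 1-based, so subtract 1 for Python indexing.
    city = mutated.pop(removal_point - 1)
    mutated.insert(insertion_point - 1, city)
    return mutated


original = [1, 7, 3, 2, 0, 4, 6, 5]
mutated = remove_insert_mutation(original, 6, 3)
```

Since a city is only moved, never duplicated or dropped, the result is always a valid permutation.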
23Comparison with Instruction Set Processor, ISP,
Implementations
- Implemented TSP using a high-end 3.2 GHz Intel Xeon Processor with 3-level Cache
- Encoded Problem in C using pointers for maximum efficiency
- OS: Red Hat Enterprise Linux v3 (Kernel 2.4.21 SMP) -- single user
- Basic Methodology Required 1.6 ms per Generation (population size 512)
- Optimized Version Required 0.8 ms per Generation (population size 512)
24Parallelization Strategies
- Initial Basic Reconfigurable Implementation on the Starbridge System required 1.1 ms per Generation!
- Slower than the optimized ISP implementation
- (population size 512, Clock speed 66 MHz)
- MORE PARALLELIZATION WAS NEEDED!
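As a back-of-the-envelope check, the figures on this slide can be turned into clock cycles. The sketch below assumes the full 1.1 ms is spent clocked at 66 MHz, which the slides do not state explicitly. Under that assumption a generation takes about 72,600 cycles, or roughly 142 cycles per chromosome for a population of 512.

```python
def cycles_per_generation(time_ms, clock_mhz):
    """Clock cycles elapsed in time_ms milliseconds at clock_mhz MHz."""
    # 1 ms at 1 MHz corresponds to 1000 clock cycles.
    return time_ms * clock_mhz * 1000


cycles = cycles_per_generation(1.1, 66)
cycles_per_chromosome = cycles / 512
# How many times longer the initial hardware version takes per generation
# than the optimized ISP version (0.8 ms).
slowdown_vs_optimized_isp = 1.1 / 0.8
```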
25Parallelization Strategies
- Exploiting Concurrency in a Common Population
- Temporal Parallelism via pipelining
- Spatial Parallelism via replicating functional units
- Processing Isolated Subpopulations
- With chromosome migration (very promising for Starbridge system but not yet completed)
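The slides note that the subpopulation approach was not yet completed, so the following is only an illustration of what chromosome migration could look like, not the authors' design. It assumes a ring of subpopulations in which each one sends copies of its cheapest chromosomes to overwrite the most expensive ones in its neighbor. The ring topology, the replacement policy, and all names are hypothetical.

```python
def migrate_ring(islands, costs, n_migrants=1):
    """Copy each island's best chromosomes over the next island's worst."""
    # Snapshot the emigrants first so earlier moves do not affect later ones.
    emigrants = []
    for population, cost in zip(islands, costs):
        best = sorted(range(len(population)), key=lambda i: cost[i])
        emigrants.append([(list(population[i]), cost[i])
                          for i in best[:n_migrants]])
    for source in range(len(islands)):
        target = (source + 1) % len(islands)
        worst = sorted(range(len(islands[target])),
                       key=lambda i: costs[target][i], reverse=True)
        for slot, (chromosome, cost) in zip(worst, emigrants[source]):
            islands[target][slot] = chromosome
            costs[target][slot] = cost


# Two tiny islands with one-element stand-in chromosomes.
islands = [[[1], [2]], [[3], [4]]]
costs = [[10, 50], [20, 40]]
migrate_ring(islands, costs)
```

Between migrations each subpopulation would evolve independently, which is why the approach maps naturally onto replicated hardware units.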
26Applying Temporal Parallelism
27Applying Spatial Parallelism
29Resource Requirements
- Non-pipelined 1 TSP Implementation
- Number of SLICES: 10910 out of 33792 (32%)
- Number of Block RAMs: 40 out of 144 (27%)
- Total equivalent gate count: 2,767,231
- Pipelined 1 TSP Implementation
- Number of SLICES: 10957 out of 33792 (32%)
- Number of Block RAMs: 40 out of 144 (27%)
- Total equivalent gate count: 2,770,741
30Resource Requirements
- Pipelined 2 TSP Implementation
- Number of SLICES: 13738 out of 33792 (40%)
- Number of Block RAMs: 45 out of 144 (31%)
- Total equivalent gate count: 3,149,966
- Pipelined 4 TSP Implementation
- Number of SLICES: 19685 out of 33792 (58%)
- Number of Block RAMs: 55 out of 144 (38%)
- Total equivalent gate count: 3,908,362
- Pipelined 6 TSP Implementation
- Number of SLICES: 25728 out of 33792 (76%)
- Number of Block RAMs: 65 out of 144 (45%)
- Total equivalent gate count: 4,664,262
31Problems Encountered
- Synthesis Time Issues
- (within Viva and within Xilinx)
- Maturity/Robustness of CAD Tools
- Learning Curve
- Timing Issues
- I/O Pin Limitations
32Summary Conclusion
- A simple genetic algorithm was implemented on reconfigurable hardware using the Viva paradigm
- Significant but not spectacular speedups have been obtained for the TSP using a combination of temporal and spatial parallel processing methods
- Many other opportunities exist to improve processing throughput
- The concept of isolated subpopulations is a very promising method to further improve performance