Applying a Genetic Algorithm to Reconfigurable Hardware

Learn more at: http://klabs.org

1
Applying a Genetic Algorithm to Reconfigurable Hardware: a Case Study
B. Earl Wells*, Clint Patrick, Luis Trevino, John Weir and Jim Steincamp
NASA Marshall Space Flight Center, Huntsville, Alabama
*University of Alabama in Huntsville, Huntsville, Alabama
2
Project Motivation and Objectives
  • To evaluate the technology of reconfigurable
    computing -- determine its level of maturity and
    suitability for use in future NASA applications
  • To implement a nontrivial test bed type
    application on a Star Bridge Hypercomputer Model
    36
  • Chosen Application: a simple Genetic Algorithm

3
Targeted Hardware Platform
  • Starbridge HC-36 Hypercomputer System
  • Employs Xilinx Virtex II 6000 Series FPGAs

4
Development Environment
  • VIVA Graphical User Interface
  • Structural Design Philosophy with Behavioral
    Attributes
  • Polymorphism
  • Object Overload
  • Recursion
  • Data flow and data driven type synchronization
    between objects (Go, Done, Busy, Wait
    protocol)
  • Large library of high end objects
  • Environment falls somewhere between hardware
    description languages and schematic capture
    packages

5
Polymorphism, Overloading, Recursion, and Synchronization
Example: Object to Determine Number of 1s in a Binary Number
(Diagram panels: Terminal Case, Recursive Case)
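The terminal-case and recursive-case structure named on this slide can be mimicked in software. What follows is a minimal Python sketch, not the Viva object itself: the function name `count_ones` and the split-in-half scheme are assumptions made for illustration. The terminal case handles a 1-bit input, and the recursive case splits the word into two halves and adds their counts.

```python
def count_ones(value: int, width: int) -> int:
    """Count the 1 bits in the low `width` bits of `value`, recursively."""
    if width <= 0:
        raise ValueError("width must be positive")
    value &= (1 << width) - 1          # ignore bits above `width`
    if width == 1:
        return value                   # terminal case: a single bit is its own count
    low_width = width // 2             # recursive case: split the word in two
    high_width = width - low_width
    low = value & ((1 << low_width) - 1)
    high = value >> low_width
    return count_ones(high, high_width) + count_ones(low, low_width)


if __name__ == "__main__":
    print(count_ones(0b10110010, 8))
```

In hardware each recursive call would presumably unfold into its own sub-object at synthesis time, so the recursion describes structure rather than run-time calls.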
6
Genetic Algorithms
  • Biologically Inspired Search Techniques
  • Employs Selection, Replication (crossover),
    Mutation, and Replacement
  • Iterative method -- very time intensive
  • Regularly Structured
  • Large Amounts of Concurrency Present that can be
    Exploited

7
Genetic Algorithm Implementation
Top Level View
Run Time Environment
8
GA Characteristics
  • 2 Way Tournament Selection
  • No Elitism
  • Single-Point Crossover with bit-wise mutation
  • Weight Encoded Chromosome (weight translated into
    rank ordering of cities)
  • Adjustable Parameters:
    Population Size (2 to 512, powers of 2),
    Number of Generations, Probability of Mutation,
    Probability of Crossover
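The characteristics listed above can be sketched as one generation step in software. This is a hedged illustration, not the hardware design: the function names, the lower-cost-wins convention, and the choice to apply the mutation probability once per chromosome are all assumptions. "No elitism" shows up as the new population being built entirely from offspring, with no parent copied over automatically.

```python
import random


def tournament_select(population, cost, rng):
    """2-way tournament: draw two individuals at random, keep the cheaper one."""
    a = rng.choice(population)
    b = rng.choice(population)
    return a if cost(a) <= cost(b) else b


def next_generation(population, cost, crossover, mutate, p_cross, p_mut, rng):
    """Build a full replacement population (no elitism)."""
    new_population = []
    while len(new_population) < len(population):
        parent1 = tournament_select(population, cost, rng)
        parent2 = tournament_select(population, cost, rng)
        # Crossover is applied with probability p_cross; otherwise copy parent1.
        if rng.random() < p_cross:
            child = crossover(parent1, parent2, rng)
        else:
            child = list(parent1)
        # Assumption: p_mut is applied once per chromosome, not per bit.
        if rng.random() < p_mut:
            child = mutate(child, rng)
        new_population.append(child)
    return new_population
```

The `crossover` and `mutate` arguments are placeholders for whichever operators are in use, so the same loop serves both the weighted and the permutation chromosome variants.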

9
Block Diagram Level View of Genetic Algorithm
Implementation
10
Replacement Chromosome Storage
11
Selection
12
Standard Single Point Crossover Operation
(Weighted Chromosomes)
Crossover Point: 4
Chromosome 1:         25,17,10,20,33,14,7,29
Chromosome 2:         44,12,17,38,20,5,70,13
Offspring Chromosome: 25,17,10,20,20,5,70,13
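The example on this slide can be reproduced with a few lines of Python. This is a minimal sketch; the function name `single_point_crossover` is an assumption, and the crossover point is taken to mean the number of leading elements copied from Chromosome 1.

```python
def single_point_crossover(parent1, parent2, point):
    """Take the first `point` weights from parent1 and the rest from parent2."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if not 0 <= point <= len(parent1):
        raise ValueError("crossover point out of range")
    return list(parent1[:point]) + list(parent2[point:])


if __name__ == "__main__":
    chromosome1 = [25, 17, 10, 20, 33, 14, 7, 29]
    chromosome2 = [44, 12, 17, 38, 20, 5, 70, 13]
    print(single_point_crossover(chromosome1, chromosome2, 4))
    # prints [25, 17, 10, 20, 20, 5, 70, 13]
```

Because every element is an independent weight, any mix of the two parents is still a valid chromosome, which is why the plain operator works for this encoding.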
13
Standard Single Point Crossover Operation
(Weighted Chromosomes)
14
Single Point Mutation (Weighted Chromosomes)
Original Chromosome: 25,17,10,20,20,5,70,13
Mutated Element: 5
Mutated Chromosome:  25,17,10,20,55,5,70,13
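A software analogue of this mutation is sketched below. The element position is read as 1-based, which matches the slide (the fifth weight changes from 20 to 55). The helper names and the `max_weight=255` bound on random weights are assumptions; the slides do not state the weight range.

```python
import random


def mutate_weight(chromosome, position, new_weight):
    """Return a copy with the weight at 1-based `position` replaced."""
    if not 1 <= position <= len(chromosome):
        raise ValueError("position out of range")
    mutated = list(chromosome)
    mutated[position - 1] = new_weight
    return mutated


def random_mutation(chromosome, rng, max_weight=255):
    """Pick one element at random and give it a random weight.

    max_weight is a hypothetical bound chosen only for this sketch.
    """
    position = rng.randrange(1, len(chromosome) + 1)
    return mutate_weight(chromosome, position, rng.randrange(max_weight + 1))


if __name__ == "__main__":
    print(mutate_weight([25, 17, 10, 20, 20, 5, 70, 13], 5, 55))
    # prints [25, 17, 10, 20, 55, 5, 70, 13]
```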
15
Traveling Salesman Problem (TSP)
  • Given a specified number of cities along with
    the cost of travel between each pair of them,
    find the cheapest way of visiting all the cities
    and returning to the first city visited
  • Asymmetric Case: direction traveled between any
    two cities matters (i.e. cost is different)
  • Possible solutions: (n-1)!, where n is the number
    of cities
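To make the problem statement concrete, the sketch below evaluates tours on a small asymmetric cost matrix and enumerates all (n-1)! tours by fixing city 0 as the start. The 4-city matrix is invented for illustration and is not taken from the presentation; exhaustive search like this is only feasible for tiny n, which is why a genetic algorithm is used for the 65-city case.

```python
from itertools import permutations
from math import factorial


def tour_cost(tour, cost):
    """Sum the cost of each leg, including the return to the first city."""
    n = len(tour)
    return sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))


def brute_force_tsp(cost):
    """Try every tour that starts at city 0; return (best_tour, best_cost, tours_examined)."""
    n = len(cost)
    best_tour, best_cost, examined = None, float("inf"), 0
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        examined += 1
        c = tour_cost(tour, cost)
        if c < best_cost:
            best_tour, best_cost = tour, c
    return best_tour, best_cost, examined


# Hypothetical asymmetric costs: COST[i][j] != COST[j][i] in general.
COST = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]

if __name__ == "__main__":
    tour, best, examined = brute_force_tsp(COST)
    print(tour, best, examined, factorial(len(COST) - 1))
```

On this matrix the best tour 0, 2, 3, 1 costs 21, while the same cycle traveled in the opposite direction (0, 1, 3, 2) costs 33, which is the asymmetry the slide describes.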

16
Traveling Salesman Problem (TSP)
  • Well understood NP-hard optimization problem
    (its decision version is NP-complete)
  • Academic literature contains many test problems
  • Chose for test purposes an Asymmetric TSP with 65
    cities (TSP 65)
  • Used a modified weight encoded chromosome
    representation

University of Heidelberg,
http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95
17
Equivalent TSP Chromosome Representations
Weighted Chromosome
City No.           0   1   2   3   4   5   6   7
Weights           25, 17, 10, 20, 55,  5, 70, 13
Rank Ordering      5,  3,  1,  4,  6,  0,  7,  2
Visit Order Permutation Chromosome
City Visit Order  1st 2nd 3rd 4th 5th 6th 7th 8th
City Numbers       5,  2,  7,  1,  3,  0,  4,  6
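The mapping between the two representations is an argsort: cities are visited in order of increasing weight, and a city's rank is its position in that visit order. The sketch below reproduces the slide's numbers. The function names are assumptions, and ties are broken by lower city number here (Python's sort is stable), which the slides do not specify.

```python
def visit_order(weights):
    """City numbers listed in the order they are visited (ascending weight)."""
    return sorted(range(len(weights)), key=lambda city: weights[city])


def rank_ordering(weights):
    """For each city, its position (0-based) in the visit order."""
    ranks = [0] * len(weights)
    for position, city in enumerate(visit_order(weights)):
        ranks[city] = position
    return ranks


if __name__ == "__main__":
    weights = [25, 17, 10, 20, 55, 5, 70, 13]
    print(rank_ordering(weights))  # prints [5, 3, 1, 4, 6, 0, 7, 2]
    print(visit_order(weights))    # prints [5, 2, 7, 1, 3, 0, 4, 6]
```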
18
TSP Objective Function
  • Systolic sort of chromosome weights
  • Summation of segments
  • Replacement of weights with rank orderings
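A software reading of this objective function: sort the chromosome's weights to obtain the visit order, then sum the cost of each segment of the resulting tour, including the return leg. The sketch below is an assumption-laden analogue rather than the hardware design. It uses Python's built-in sort in place of the systolic sorter, and the cost matrix is a made-up 4-city example.

```python
def tsp_objective(weights, cost):
    """Tour cost implied by a weight-encoded chromosome (lower is better)."""
    # Sorting city indices by weight yields the visit order.
    order = sorted(range(len(weights)), key=lambda city: weights[city])
    total = 0
    for i, city in enumerate(order):
        next_city = order[(i + 1) % len(order)]  # wraps around to the first city
        total += cost[city][next_city]           # one tour segment
    return total


if __name__ == "__main__":
    # Hypothetical asymmetric cost matrix, for illustration only.
    cost = [
        [0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0],
    ]
    # Weights 30, 40, 10, 20 give the visit order 2, 3, 0, 1.
    print(tsp_objective([30, 40, 10, 20], cost))  # prints 22
```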

19
Single Point Permutation Preserving Crossover
Operation
Crossover Point: 4
Chromosome 1:         1,7,3,2,5,6,0,4
Chromosome 2:         0,2,4,1,6,5,7,3
Offspring Chromosome: 1,7,3,2,0,4,6,5
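The offspring on this slide is consistent with the following rule: copy the first four cities from Chromosome 1, then append the cities not yet used in the order they appear in Chromosome 2. The sketch below implements that reading; the rule is inferred from the example and the function name is an assumption.

```python
def permutation_crossover(parent1, parent2, point):
    """Single-point crossover that always yields a valid permutation."""
    head = list(parent1[:point])
    used = set(head)
    # Fill the remainder with parent2's cities, skipping any already in the head.
    tail = [city for city in parent2 if city not in used]
    return head + tail


if __name__ == "__main__":
    chromosome1 = [1, 7, 3, 2, 5, 6, 0, 4]
    chromosome2 = [0, 2, 4, 1, 6, 5, 7, 3]
    print(permutation_crossover(chromosome1, chromosome2, 4))
    # prints [1, 7, 3, 2, 0, 4, 6, 5]
```

A plain single-point splice of these two parents would give 1,7,3,2,6,5,7,3, which visits cities 3 and 7 twice and omits 0 and 4; the filtering step is what preserves the permutation.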
20
Modified Crossover Operator
21
Permutation Altering Mutation
Original Chromosome: 1,7,3,2,0,4,6,5
Mutation Removal Point: 6
Insertion Point: 3
Mutated Chromosome:  1,7,4,3,2,0,6,5
Note: No change in Mutation Operator Needed
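The example above matches a remove-and-reinsert operation with 1-based positions: the city at position 6 (city 4) is taken out and placed at position 3. A minimal sketch under that reading, with an assumed function name:

```python
def remove_and_insert(chromosome, removal_point, insertion_point):
    """Move the city at 1-based `removal_point` to 1-based `insertion_point`."""
    n = len(chromosome)
    if not (1 <= removal_point <= n and 1 <= insertion_point <= n):
        raise ValueError("points must lie within the chromosome")
    mutated = list(chromosome)
    city = mutated.pop(removal_point - 1)
    mutated.insert(insertion_point - 1, city)
    return mutated


if __name__ == "__main__":
    print(remove_and_insert([1, 7, 3, 2, 0, 4, 6, 5], 6, 3))
    # prints [1, 7, 4, 3, 2, 0, 6, 5]
```

Since one city is only moved, never duplicated or dropped, the result is always a valid permutation.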
23
Comparison with Instruction Set Processor, ISP,
Implementations
  • Implemented TSP using a high-end 3.2 GHz Intel
    Xeon Processor with 3-level Cache
  • Encoded Problem in C using pointers for maximum
    efficiency
  • OS: Red Hat Enterprise Linux v3 (Kernel 2.4.21
    SMP) -- single user
  • Basic Methodology: required 1.6 ms per
    generation (population size 512)
  • Optimized Version: required 0.8 ms per generation
    (population size 512)

24
Parallelization Strategies
  • Initial Basic Reconfigurable Implementation on
    the Starbridge System required 1.1 ms per
    generation!
  • Slower than the optimized ISP
    implementation
  • (population size 512, clock speed 66 MHz)
  • MORE PARALLELIZATION WAS NEEDED!
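The timing figures quoted for the ISP and the initial reconfigurable implementation can be put side by side with a little arithmetic. The sketch below uses only numbers from the slides (1.6 ms, 0.8 ms, 1.1 ms, 66 MHz, population 512); the derived quantities, such as cycles per chromosome, are back-of-the-envelope values computed here and are not results reported by the authors.

```python
CLOCK_HZ = 66e6          # reconfigurable implementation clock
POPULATION = 512         # chromosomes per generation

ISP_BASIC_MS = 1.6       # C implementation, basic
ISP_OPTIMIZED_MS = 0.8   # C implementation, optimized
FPGA_BASIC_MS = 1.1      # initial reconfigurable implementation

# Clock cycles the hardware spends on one generation and on one chromosome.
cycles_per_generation = FPGA_BASIC_MS * 1e-3 * CLOCK_HZ
cycles_per_chromosome = cycles_per_generation / POPULATION

# Relative speed of the initial hardware version against both ISP versions.
speedup_vs_basic_isp = ISP_BASIC_MS / FPGA_BASIC_MS
slowdown_vs_optimized_isp = FPGA_BASIC_MS / ISP_OPTIMIZED_MS

if __name__ == "__main__":
    print(f"cycles per generation:     {cycles_per_generation:,.0f}")
    print(f"cycles per chromosome:     {cycles_per_chromosome:.1f}")
    print(f"speedup vs basic ISP:      {speedup_vs_basic_isp:.2f}x")
    print(f"slowdown vs optimized ISP: {slowdown_vs_optimized_isp:.3f}x")
```

At 66 MHz the hardware has roughly 72,600 cycles per generation, or about 142 per chromosome, against a 3.2 GHz processor; closing that clock-rate gap is what the added parallelism has to accomplish.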

25
Parallelization Strategies
  • Exploiting Concurrency in a Common Population
  • Temporal Parallelism via pipelining
  • Spatial Parallelism via replicating functional
    units
  • Processing Isolated Subpopulations
  • With chromosome migration (very promising for
    Starbridge system but not yet completed)

26
Applying Temporal Parallelism
27
Applying Spatial Parallelism
29
Resource Requirements
  • Non-pipelined 1 TSP Implementation
  • Number of SLICES: 10910 out of 33792 (32%)
  • Number of Block RAMs: 40 out of 144 (27%)
  • Total equivalent gate count: 2,767,231
  • Pipelined 1 TSP Implementation
  • Number of SLICES: 10957 out of 33792 (32%)
  • Number of Block RAMs: 40 out of 144 (27%)
  • Total equivalent gate count: 2,770,741

30
Resource Requirements
  • Pipelined 2 TSP Implementation
  • Number of SLICES: 13738 out of 33792 (40%)
  • Number of Block RAMs: 45 out of 144 (31%)
  • Total equivalent gate count: 3,149,966
  • Pipelined 4 TSP Implementation
  • Number of SLICES: 19685 out of 33792 (58%)
  • Number of Block RAMs: 55 out of 144 (38%)
  • Total equivalent gate count: 3,908,362
  • Pipelined 6 TSP Implementation
  • Number of SLICES: 25728 out of 33792 (76%)
  • Number of Block RAMs: 65 out of 144 (45%)
  • Total equivalent gate count: 4,664,262
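The resource figures can be tabulated to see how cost grows with replication. The sketch below recomputes the utilization percentages from the raw counts and estimates the average cost of each added unit. It assumes that "1 TSP", "2 TSP", and so on denote the number of replicated TSP evaluation units; the per-unit increments are arithmetic on the slide data, not figures the authors report.

```python
TOTAL_SLICES = 33792
TOTAL_BRAMS = 144

# (TSP units, slices used, block RAMs used) for the pipelined designs.
PIPELINED = [
    (1, 10957, 40),
    (2, 13738, 45),
    (4, 19685, 55),
    (6, 25728, 65),
]


def percent(used, total):
    """Utilization as a whole percent, truncated as on the slides."""
    return 100 * used // total


def per_unit_increment(designs, column):
    """Average extra resource per added unit between the first and last design."""
    first, last = designs[0], designs[-1]
    return (last[column] - first[column]) / (last[0] - first[0])


if __name__ == "__main__":
    for units, slices, brams in PIPELINED:
        print(f"{units} TSP: slices {percent(slices, TOTAL_SLICES)}%, "
              f"block RAMs {percent(brams, TOTAL_BRAMS)}%")
    print("slices per added unit:    ", per_unit_increment(PIPELINED, 1))
    print("block RAMs per added unit:", per_unit_increment(PIPELINED, 2))
```

By this arithmetic each added unit costs roughly 2,950 slices and 5 block RAMs, so slices are consumed faster than block RAMs as units are replicated.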

31
Problems Encountered
  • Synthesis Time Issues
  • (within Viva and within Xilinx)
  • Maturity/Robustness of CAD Tools
  • Learning Curve
  • Timing Issues
  • I/O Pin Limitations

32
Summary and Conclusion
  • A simple genetic algorithm was implemented on
    reconfigurable hardware using the Viva paradigm
  • Significant but not spectacular speedups have
    been obtained for the TSP using a combination of
    temporal and spatial parallel processing methods
  • Many other opportunities exist to improve
    processing throughput
  • The concept of isolated subpopulations is a very
    promising method to further improve performance