Title: Evolvable Hardware Techniques for Autonomous Repair of FPGAs
1 Evolvable Hardware Techniques for Autonomous Repair of FPGAs
5 October 2003
Ronald F. DeMara, Department of Electrical and Computer Engineering, University of Central Florida
Jason D. Lohn, Gregory A. Larchev, Computational Sciences Division, NASA Ames Research Center
2 What is Evolvable Hardware?
Combining two fields to enable complex, dynamic electronics applications:
- Intelligent Search: Bayesian, Simulated Annealing, Genetic Algorithms, Nearest Neighbor
- Hardware Design: Amplifiers, Filters, FPGAs, Antennas
Evolvable Hardware Applications
- Automated Construction: develop electronic circuits by intelligent search
- Applications support the Design, Optimization, or Failure Recovery phases
- Research Focus: configuration of Field Programmable Gate Arrays (FPGAs) using Genetic Algorithms (GAs), with applications to Autonomous Repair of permanent faults
3 Evolvable Hardware (EHW)
Conceptual inspiration: biological models of genetic representations and evolutionary principles
[Figure labels: Science, CACM]
- Powerful technique for multi-objective optimization problems
  - power consumption, weight, size, cost, speed, or reliability
- Faster design cycle: can be used to optimize or repair human-generated designs
- Excellent for difficult-to-design systems
  - adaptive systems and dynamic devices in unpredictable environments
4 EHW in the Big Picture
[Figure: taxonomy diagram with the following labels]
- Intelligent Search
- Machine Intelligence Techniques; other sub-disciplines
- Adaptive/Soft Computing
- Evolutionary Computation; Fuzzy Systems; Neural Networks
- Genetic Algorithms; Cellular Automata; Simulated Annealing
- Application Domains: Numerical Optimization, Mechanical Design, Evolvable FPGAs
5 Operational Flow of EHW Techniques
- 1. Objective for the EHW procedure is specified
  - e.g., realize an 8-bit adder circuit, or program a digital chip to perform a function such as tone discrimination
  - A relative ranking, called the Fitness Function, is defined
- 2. Population of alternative designs is created
  - completely at random, or seeded with hand-designed circuits
- 3. Genetic Algorithm is invoked to evolve each alternative
  - Fitness is evaluated for the alternatives using the FPGA
  - The FPGA contains programmable logic and interconnect resources to realize an arbitrary number of circuits
  - Genetic Operators are used to increase fitness
- 4. Fitness Exit Criteria are checked
  - If max(fitness) < threshold, then repeat Step 3
- 5. Best design represents the desired hardware configuration
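[Figure: FPGA Configuration. CIRCUIT INPUT passes through logic (AND, OR, XOR, NOR) and interconnect (buses, muxes, pass transistors) to CIRCUIT OUTPUT.]
- The final FPGA configuration implements the circuit
Example: a GA running on a PC platform configures a reprogrammable Static RAM based FPGA. The PC sends the configuration ("config") to the FPGA and reads back the "results".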
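The five steps of the operational flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' system: fitness is computed by a software function rather than on an FPGA, the objective is a hypothetical 16-entry truth table (a 4-input parity function), and the names `TARGET`, `lut_fitness`, and `run_ga`, the binary tournament selection, and all parameter values are invented for the example.

```python
import random

# Hypothetical objective (Step 1): the truth table of a 4-input parity function.
TARGET = [bin(i).count("1") % 2 for i in range(16)]


def lut_fitness(genome):
    """Fitness function: fraction of truth-table entries matching TARGET."""
    return sum(g == t for g, t in zip(genome, TARGET)) / len(TARGET)


def run_ga(fitness, genome_len, pop_size=20, threshold=1.0,
           max_generations=2000, mutation_rate=0.02, seed=0):
    """Return (best_genome, generations_used)."""
    rng = random.Random(seed)
    # Step 2: population created completely at random.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for generation in range(max_generations):
        # Step 3: evaluate fitness (in software here, on the FPGA in the slides).
        best = max(population, key=fitness)
        # Step 4: exit once max(fitness) reaches the threshold.
        if fitness(best) >= threshold:
            return best, generation
        # Step 3 (continued): genetic operators build the next generation.
        next_population = [best[:]]  # carry the best individual forward
        while len(next_population) < pop_size:
            # Assumed selection scheme: binary tournament for each parent.
            mom = max(rng.sample(population, 2), key=fitness)
            dad = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, genome_len)      # one-point crossover
            child = mom[:cut] + dad[cut:]
            child = [b ^ 1 if rng.random() < mutation_rate else b
                     for b in child]                # per-bit mutation
            next_population.append(child)
        population = next_population
    # Step 5: best design found within the generation budget.
    return max(population, key=fitness), max_generations


if __name__ == "__main__":
    best, generations = run_ga(lut_fitness, genome_len=16)
    print("fitness", lut_fitness(best), "after", generations, "generations")
```

In an intrinsic setup, `lut_fitness` would be replaced by a routine that downloads the candidate configuration to the device and scores its measured outputs.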
6 Genetic Algorithms (GAs)
- Mechanism coarsely modeled after neo-Darwinism (natural selection + genetics)
[Figure: GA cycle. start -> population of candidate solutions -> evaluate fitness of individuals (fitness function) -> goal reached? -> selection of parents -> parents -> crossover and mutation -> offspring -> replacement -> population]
7 Genetic Mechanisms
- Guided trial-and-error search techniques using principles of Darwinian evolution
  - iterative selection, survival of the fittest
  - genetic operators: mutation, crossover
  - implementor must define the fitness function
- GAs frequently use strings of 1s and 0s to represent candidate solutions
  - if 100101 is better than 010001, it will have more chance to breed and influence the future population
- GAs cast a net over the entire solution space to find regions of high fitness
- Can invoke an Elitism Operator (E1, E2)
  - guarantees that the fitness of the best individual never decreases from one generation to the next
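Two of the mechanisms above are easy to show numerically. The sketch below assumes fitness-proportionate (roulette-wheel) selection, which the slides do not name, and invents the scores 3.0 and 1.0 for the two example strings. The functions `selection_probabilities` and `with_elitism` are hypothetical helpers for illustration.

```python
def selection_probabilities(scores):
    """Assumed roulette-wheel rule: P(select i) = fitness_i / sum of fitness."""
    total = sum(scores.values())
    return {individual: s / total for individual, s in scores.items()}


def with_elitism(old_population, offspring, fitness, n_elite=2):
    """Copy the n_elite best old individuals over the worst offspring."""
    elites = sorted(old_population, key=fitness, reverse=True)[:n_elite]
    keep = len(offspring) - n_elite
    survivors = sorted(offspring, key=fitness, reverse=True)[:keep]
    return elites + survivors


if __name__ == "__main__":
    # Invented scores: 100101 is "better", so it breeds more often.
    print(selection_probabilities({"100101": 3.0, "010001": 1.0}))

    ones = lambda s: s.count("1")  # toy fitness: number of 1 bits
    old = ["1110", "0001", "1000"]
    new = with_elitism(old, ["0000", "0100", "0010"], ones, n_elite=1)
    # The best old individual survives, so the best fitness cannot drop.
    print(new, max(map(ones, new)) >= max(map(ones, old)))
```

Because the elites are copied unchanged, the best fitness in the new population is at least the best fitness in the old one, provided the fitness evaluation itself is deterministic.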
8 GA Success Stories
- Commercial Applications
  - Nextel: frequency allocation for cellular phone networks; $15M predicted savings in NY market
  - Pratt & Whitney: turbine engine design; engineer: 8 weeks, GA: 2 days with 3x improvement
  - International Truck: production scheduling improved by 90% in 5 plants
- NASA: superior Jupiter trajectory optimization, antennas, FPGAs
- Koza: 25 instances showing human-competitive performance, such as analog circuit design, amplifiers, filters
9 Representing Candidate Solutions
- An individual can be represented using discrete values (binary, integer, or any other system with a discrete set of values)
- Example of binary DNA encoding
[Figure: a binary Individual (Chromosome) with one GENE labeled]
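To make the chromosome and gene terminology concrete, here is a small sketch that treats an individual as a bitstring and cuts it into fixed-width genes. The 4-bit gene width, the example string, and the helper names `split_genes` and `decode` are assumptions for illustration. The slides do not specify a gene width here.

```python
def split_genes(chromosome, gene_len):
    """Cut a bitstring chromosome into consecutive fixed-width genes."""
    if len(chromosome) % gene_len != 0:
        raise ValueError("chromosome length must be a multiple of gene_len")
    return [chromosome[i:i + gene_len]
            for i in range(0, len(chromosome), gene_len)]


def decode(chromosome, gene_len):
    """Interpret each gene as an unsigned integer (one discrete value)."""
    return [int(gene, 2) for gene in split_genes(chromosome, gene_len)]


if __name__ == "__main__":
    individual = "100101110001"        # hypothetical 12-bit chromosome
    print(split_genes(individual, 4))  # ['1001', '0111', '0001']
    print(decode(individual, 4))       # [9, 7, 1]
```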
10 Genetic Operators
[Figure: the population at generation t becomes the population at generation t + 1 through selection and reproduction]
11 Crossover Operator
[Figure: crossover applied to individuals from the Population to produce offspring]
12 Mutation Operator
[Figure: mutation shown in both the Biology and the Boolean Representation views; the bitstring 1 1 1 1 1 1 1 before mutation, and the mutated gene afterwards]
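The crossover and mutation operators can be written directly on bitstrings. The sketch below assumes one-point crossover and independent per-bit flips, which are common choices but not confirmed by the slides. `one_point_crossover` and `mutate` are illustrative names.

```python
import random


def one_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length bitstrings at position `cut`."""
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < rate else b for b in bits)


if __name__ == "__main__":
    print(one_point_crossover("1111111", "0000000", cut=3))
    # ('1110000', '0001111')
    rng = random.Random(0)
    print(mutate("1111111", rate=0.2, rng=rng))  # some bits may flip
```

A rate of 0 leaves the string unchanged and a rate of 1 flips every bit, which makes the two extremes easy to check.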
13 Visualizing GA Operation
Roadmap to animation on the next slide
14 Visualizing GA Operation
[Animation: individuals move from the current population to the new population]
- 2 parent individuals potentially undergo crossover
- Individual is potentially mutated
15 EHW Environments
- Evolvable Hardware (EHW) Environments enable experimental methods to research soft computing intelligent search techniques
- EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process
Two modes of Evolvable Hardware:
- Extrinsic Evolution: Genetic Algorithm with simulation in the loop (software model); device design-time refinement. Done? Build it.
- Intrinsic Evolution: Genetic Algorithm with hardware in the loop; device run-time refinement; a new approach to Autonomous Repair of failed devices
Application: Stardust Satellite
- >100 FPGAs onboard
- hostile environment: radiation, thermal stress
- How to achieve reliability to avoid mission failure?
16 Our Goal: Autonomous FPGA Repair
An alternative to redundancy for increased reliability without carrying spare hardware
Everyday example: Redundancy is an automobile spare tire; Repair is a can of fix-a-flat.
- Overhead from Unutilized Spares (weight, size, power)
  - Redundancy: increases with amount of spare capacity
  - Repair: independent of number of viable spares
- Granularity of Fault Coverage (resolution where fault is handled)
  - Redundancy: restricted at design-time
  - Repair: variable at recovery-time
- Fault-Resolution Latency (availability, or downtime required to handle fault)
  - Redundancy: based on time required to select spare resource
  - Repair: based on time required to find suitable repair
- Quality of Repair (likelihood and completeness)
  - Redundancy: determined by adequacy of spares available (?)
  - Repair: affected by multiple characteristics (+ or -)
- Autonomous Operation (fix without outside intervention)
  - Redundancy: yes
  - Repair: yes
17 Autonomous Repair
A new approach to Autonomous Repair of failed reprogrammable devices
- UCF has developed an evolutionary fault-recovery system for FPGAs
- Employs a genetic representation that can accommodate both logic and interconnect failures
- Experiments were run using a Xilinx Virtex FPGA
- Demonstrate that a complete repair of some combinational and sequential circuits is realizable
- Contribution of new evolutionary procedures for repair and novel insights into fault occlusion, resource recycling, and parameter optimization
18 Related Work
- Evolutionary Design Techniques for FPGA Fault-Tolerance: evolve redundancy into the design before the anticipated failure occurs
  - Messy Gate Approach (Miller 2001)
    - logic functions contain redundant terms as functional boundaries change and overlap
  - Fault-tolerant Oscillator Design (Canham and Tyrrell 2002)
    - designs evolved under a range of faults during fitness assessment
    - population-based approach with fitness function corresponding to operation without faults
    - additional pass evaluates tolerance to a range of faults
  - Design with Potentially Faulty Components (Thompson 1997)
    - evolution of designs with redundant capabilities
    - range of fault cases introduced
    - individuals able to exploit whatever component behaviors exist, even faulty ones
- Evolutionary Fault Recovery for FPGA Fault Handling: evolve recovery from a specific failure after (and if) it actually occurs
  - Evolutionary Repair of 4x4 Multiplier (Vigander 2001)
    - attempts to restore functionality after random faults are injected into FPGA CLBs
    - completely correct repair not achieved, although excellent partial repairs were
    - voting mechanism proposed using alternative partially repaired circuits
19 Fault-Handling Techniques for SRAM-based FPGAs
[Table: fault-handling techniques, listed by heading]
- Device Failure Characteristics
  - Duration: Transient (SEU); Permanent (SEL, Oxide Breakdown, Electron Migration)
  - Target: Device Configuration; Processing Datapath
- Approach: BIST, Evolutionary, Repetitive Readback, TMR, STARS, CED, Vigander, UCF
- Methods (Detection, Isolation, Diagnosis, Recovery): Supplementary Testbench; Duplex Output Comparison; Cartesian Intersection; Bitwise Comparison; Majority Vote; Fast Run-time Location; Worst-case Clock Period Dilation; Population-based GA using Extrinsic Fitness Evaluation; Evolutionary Algorithm using Intrinsic Fitness Evaluation; Replicate in Spare Resource; Select Spare Resource; Invert Bit Value; Ignore Discrepancy. Some stages are marked "unnecessary" or "(not addressed)".
20 Quadrature Decoder
- Used in applications requiring determination of angular translation (or speed)
- Example: with a DC-motor drive system for a mobile robot, we may wish to move forward (or reverse) by a fixed distance
- Decoder determines rotation direction
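A behavioral model shows what a quadrature decoder computes. It assumes the usual two-channel encoder whose (A, B) outputs step through a Gray-code cycle. Which cycle direction counts as "forward" is an arbitrary convention chosen for this sketch, and `step_direction` and `count_position` are illustrative names, not part of the evolved circuit.

```python
# Assumed convention: stepping through this cycle left to right is "forward".
FORWARD_CYCLE = [(0, 0), (0, 1), (1, 1), (1, 0)]


def step_direction(prev, curr):
    """Return +1 (forward), -1 (reverse), or 0 (no movement)."""
    delta = (FORWARD_CYCLE.index(curr) - FORWARD_CYCLE.index(prev)) % 4
    if delta == 0:
        return 0
    if delta == 1:
        return 1
    if delta == 3:
        return -1
    # Both channels changed at once: a step was missed or the input is noisy.
    raise ValueError(f"invalid transition {prev} -> {curr}")


def count_position(samples):
    """Accumulate signed steps over a stream of (A, B) bit pairs."""
    position = 0
    for prev, curr in zip(samples, samples[1:]):
        position += step_direction(prev, curr)
    return position


if __name__ == "__main__":
    forward = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
    print(count_position(forward))        # 4
    print(count_position(forward[::-1]))  # -4
```

Summing the signed steps gives the net angular translation, which is how a robot drive system can move by a fixed distance in either direction.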
21 Quadrature Decoder
22 Genetic Representation
- Representation: how we represent FPGA configurations in the GA
- Goals
  - Allow all possible LUT configurations
  - Allow all possible CLB interconnections, given the constraints of routing support
  - Disallow illegal FPGA configurations
  - Make it easy for crossover to combine good configurations
  - Minimize non-coding introns (junk DNA)
- Bitstring representation is a natural choice, though it may not scale well (investigating generative representations)
- Representation is specific to the Xilinx Virtex FPGA
23 Genetic Representation
- Logic: bits in the LUTs
- Routing: bits specify how to connect LUT outputs to LUT inputs
[Figure: CLB 0, CLB 1, ..., CLB n, each containing LUT 0, LUT 1, LUT 2, and LUT 3]
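The split between logic bits and routing bits can be illustrated with a toy model of a single 4-input LUT. This is a deliberately simplified sketch and not the actual Virtex or JBits encoding: the genome layout (`"logic"` as a 16-entry truth table, `"routing"` as one source index per LUT input), the function names, and the way the stuck-at-zero fault is modeled are all assumptions.

```python
def lut_output(truth_table, inputs, stuck_at_zero=()):
    """Look up a 16-entry truth table; listed input pins are forced to 0."""
    pins = [0 if i in stuck_at_zero else bit for i, bit in enumerate(inputs)]
    index = sum(bit << i for i, bit in enumerate(pins))
    return truth_table[index]


def evaluate(genome, signals, stuck_at_zero=()):
    """Routing genes pick which signal drives each LUT input pin."""
    inputs = [signals[source] for source in genome["routing"]]
    return lut_output(genome["logic"], inputs, stuck_at_zero)


# Original design: AND of signals 0 and 1 on input pins 0 and 1.
original = {
    "logic": [1 if (i & 1) and (i & 2) else 0 for i in range(16)],
    "routing": [0, 1, 0, 0],
}
# Candidate repair: same function, moved to input pins 2 and 3.
repaired = {
    "logic": [1 if (i & 4) and (i & 8) else 0 for i in range(16)],
    "routing": [0, 0, 0, 1],
}

if __name__ == "__main__":
    fault = (0,)  # input pin 0 stuck at zero
    for a in (0, 1):
        for b in (0, 1):
            print(a, b,
                  evaluate(original, [a, b], fault),
                  evaluate(repaired, [a, b], fault))
```

With pin 0 stuck at zero the original genome always outputs 0, while the candidate repair still computes the AND. Changing both the routing genes and the logic genes is what lets a search procedure route around a faulty resource.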
24 Experimental Setup
- Software and Hardware Testbeds
  - ECJ
  - Xilinx JBits
  - Xilinx Virtex DS simulator
  - JBuilder Java SDK
- Evaluation
  - Input stream of 100 bit pairs
  - Output stream of 110 bits sampled across 4 CLBs
  - Stuck-at-zero fault on CLB2 F1 slice 0
  - Fitness: percentage of correct output bits, taking the max
    - across 100-bit sliding windows
    - across CLBs
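The fitness measure described in the Evaluation bullets can be sketched as follows. The code assumes the expected response is a 100-bit sequence that is slid across each CLB's 110-bit output stream (11 possible offsets), presumably so that a correct but delayed output still scores well. It returns a fraction in [0, 1] rather than a percentage. `window_fitness` and `circuit_fitness` are illustrative names, not the authors' code.

```python
def window_fitness(expected, observed):
    """Best fraction of matching bits over all alignments of `expected`."""
    n = len(expected)
    best = 0.0
    for offset in range(len(observed) - n + 1):
        window = observed[offset:offset + n]
        correct = sum(e == o for e, o in zip(expected, window))
        best = max(best, correct / n)
    return best


def circuit_fitness(expected, outputs_by_clb):
    """Take the max over the output streams sampled at each CLB."""
    return max(window_fitness(expected, out) for out in outputs_by_clb)


if __name__ == "__main__":
    # Small stand-in for the 100-bit expected stream and 110-bit outputs.
    expected = [1, 0, 1, 1]
    clb_outputs = [
        [0, 0, 0, 0, 0, 0],  # never better than 1 of 4 bits correct
        [0, 0, 1, 0, 1, 1],  # matches exactly at offset 2
    ]
    print(circuit_fitness(expected, clb_outputs))  # 1.0
```

With the sizes given on the slide, `expected` would hold 100 bits and each entry of `outputs_by_clb` would hold 110 bits, one entry per sampled CLB.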
25 FPGA with Fault Injected
26 GA Parameters
- Generational GA
- Popsize: 40
- Crossover: 80%
- Mutation: up to 0.2 per bit
- Elitism: 2 individuals
- Gen 0 Seeding: 20 individuals seeded with the hand-designed Quad Decoder
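These settings can be collected into a small configuration object, together with a sketch of Generation 0 seeding. Several points are assumptions: the field names are invented, the crossover value is read as a rate of 0.80, the mutation figure is copied as stated without resolving its unit, and the 20 unseeded individuals are filled in at random because the slide does not say how they are created.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GAParams:
    pop_size: int = 40
    crossover_rate: float = 0.80    # assumed reading of "Crossover 80"
    max_mutation_rate: float = 0.2  # as stated on the slide; unit unresolved
    n_elite: int = 2
    n_seeded: int = 20


def initial_population(seed_individual, params, rng):
    """Generation 0: copies of the hand design plus random individuals."""
    seeded = [list(seed_individual) for _ in range(params.n_seeded)]
    # Assumption: the remaining individuals are uniformly random bitstrings.
    randoms = [[rng.randint(0, 1) for _ in seed_individual]
               for _ in range(params.pop_size - params.n_seeded)]
    return seeded + randoms


if __name__ == "__main__":
    hand_design = [1, 0, 1, 1, 0, 0, 1, 0]  # placeholder genome
    population = initial_population(hand_design, GAParams(), random.Random(0))
    print(len(population), population[0] == hand_design)
```

Seeding half the population with a working design gives the search a starting point close to a solution, while the other half keeps some diversity.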
27 Temperature map of FPGA logic cells during evolution
HW: Xilinx Virtex XCV1000 FPGA; Ckt: Quadrature Decoder; Exp: 3
28 Evolving a Complete Repair
[Figure: fitness vs. generation, with elitist and average curves]
29 Results
- Genetic algorithm is able to consistently find quad decoders operating at 100% accuracy with a single injected stuck-at fault
- Out-of-sample test yields 97% accuracy (expected to rise as fitness test case length increases)
- The stuck-at fault is used in the solutions found (GA is exploiting the fault)
- Most runs converge after 1500-2000 circuit evaluations
- Average population fitness increases until convergence (useful search)
30 Recent Publications
- Evolvable Hardware Technical Papers
  - J. D. Lohn, G. Larchev, and R. F. DeMara, "Evolutionary Fault Recovery in a Virtex FPGA Using a Representation That Incorporates Routing," in Proceedings of the 10th Reconfigurable Architectures Workshop (RAW 2003), Nice, France, April 22, 2003.
  - J. D. Lohn, G. Larchev, and R. F. DeMara, "A Genetic Representation for Evolutionary Fault Recovery in Virtex FPGAs," in Proceedings of the 5th International Conference on Evolvable Systems (ICES), Trondheim, Norway, March 17 - 20, 2003.
  - J. D. Lohn and R. F. DeMara, "A Co-evolutionary Genetic Algorithm for Autonomous Fault-Handling in FPGAs," accepted to International Conference on Military and Aerospace Programmable Logic Devices, Laurel, MD, September 10 - 12, 2002.
- Machine Learning (EHW subcomponent) Curriculum and Educational
  - M. Georgiopoulos, J. Castro, A. Wu, R. DeMara, E. Gelenbe, A. Gonzalez, M. Kysilka, M. Mollaghasemi, "CRCD in Machine Learning at the University of Central Florida: Preliminary Experiences," in Proceedings of 8th Annual Conference on Innovation and Technology in Computer Science Education, University of Macedonia, Thessaloniki, Greece, June 30 - July 2, 2003.
  - M. Georgiopoulos, I. Russell, J. Castro, A. Wu, M. Kysilka, R. DeMara, A. Gonzalez, E. Gelenbe, M. Mollaghasemi, "A CRCD Experience: Integrating Machine Learning Concepts into Introductory Engineering and Science Programming Courses," in Proceedings of 2003 American Society for Engineering Education (ASEE) Annual Conference and Exposition, Nashville, Tennessee, June 22 - 25, 2003.
31 GA Advantages
- Widely applicable
- Low development costs (engineering ready)
- Creativity: surprising solutions
- Can be run interactively, accommodate user-proposed solutions
- Provide many alternative solutions: design-time fault tolerance
- Abundant intrinsic parallelism
- Scales with Moore's Law (10x in 5)
32 Conclusion
- One of the first studies to look at evolving interconnect for fault-recovery in FPGAs
- Output results encouraging
- Current work
  - Reducing execution time for autonomous recovery
  - Scaling to complex problems
  - Robustness of evolved solutions
  - On-line experiments that can safeguard the FPGA
- Integrating Machine Learning EHW into UCF curriculum
  - EHW in EEL4851, EEL4972, EEL6763
  - Subpart of multi-year NSF CRCD Award (Georgiopoulos, DeMara, Gelenbe, Gonzalez, Kysilka, Wu)