Single Event Upset (SEU) Mitigating Techniques in a Space Radiation Environment for the FPGA based Iterative Repair Processor

1
Single Event Upset (SEU) Mitigating Techniques in
a Space Radiation Environment for the FPGA based
Iterative Repair Processor
  • Group Presentation (11/30/2007)
  • Jeffrey M. Carver

2
Outline
  • Introduction
  • Background
  • Fault Tolerant Techniques
  • Configuration Frames
  • DMRH and Fan-out design
  • Iterative Repair Processor Fault Protected
  • SEU Simulator
  • Current Results
  • Conclusions and Program of Study
  • Publications

4
Space Applications
  • FPGAs are being used in space applications
    because of their:
    • Low cost compared with ASICs
    • Reconfigurability
    • Ability to be optimized for a specific
      application
  • Problems that occur in space
    • Single Event Upsets (SEUs) occur when radiation
      in the environment causes a memory cell to
      change value.
    • Radiation also affects combinational logic by
      causing temporary glitches, measured to last
      from 0.3 ns to 1.3 ns.
  • For FPGAs, this means fault-tolerant techniques
    must be applied to protect the storage memory,
    configuration memory, and combinational logic.

5
Research Goal
  • To find and apply fault-tolerant techniques for
    a system designed for space applications (the
    Iterative Repair Processor).
  • Once the techniques to apply have been
    identified, an SEU Simulator for testing their
    robustness will be developed and used. The
    techniques will then be applied and tested.

7
Triple Modular Redundancy (TMR)
  • Triplicates the module and adds a voting
    circuit that votes on the correct output of the
    device.
  • Variants of this concept are used:
    • An analog component for the voting circuit
    • 2-3 voting circuits with a tri-state buffer
    • TMR in time
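
As an illustration, the word-level majority vote at the heart of TMR fits in a few lines. This is a minimal Python sketch (a behavioral model, not HDL) that assumes each module copy produces an integer output word; `tmr_vote` and `tmr_disagreement` are hypothetical names.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant module outputs."""
    # An output bit is 1 when at least two of the three copies have a 1 there.
    return (a & b) | (a & c) | (b & c)


def tmr_disagreement(a: int, b: int, c: int) -> int:
    """Bit mask of positions where the three copies do not all agree."""
    return (a ^ b) | (a ^ c)


# Usage: copy b suffers an SEU in bit 2; the voter still outputs the correct word.
correct = 0b1011_0010
upset_b = correct ^ (1 << 2)
print(bin(tmr_vote(correct, upset_b, correct)))          # 0b10110010
print(bin(tmr_disagreement(correct, upset_b, correct)))  # 0b100
```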

8
Hamming Codes
  • A Hamming code inserts check bits throughout
    the word.
  • An improved Hamming code may require an extra
    check bit, but it appends the check bits to the
    end of the word.
  • Both can correct a single error in a word.
  • Hamming relationship for the number of check
    bits r required for m data bits: 2^r ≥ m + r + 1
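
The Hamming relationship and single-error correction can be sketched as follows. This minimal Python model assumes the classic layout with check bits at the power-of-two positions (the variant that inserts check bits throughout the word, not the improved variant that appends them), even parity, and 1-indexed positions; all function names are hypothetical.

```python
def check_bits_needed(m: int) -> int:
    """Smallest r satisfying the Hamming relationship 2**r >= m + r + 1."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


def hamming_encode(data: list[int]) -> list[int]:
    """Check bits sit at positions 1, 2, 4, ... and data fills the rest."""
    r = check_bits_needed(len(data))
    n = len(data) + r
    code = [0] * (n + 1)          # index 0 unused, so positions are 1-indexed
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):       # not a power of two -> data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i                # check bit p covers every position with bit p set
        code[p] = sum(code[k] for k in range(1, n + 1) if k & p and k != p) % 2
    return code[1:]


def hamming_correct(code: list[int]) -> tuple[list[int], int]:
    """Return (corrected codeword, syndrome); the syndrome is the 1-indexed
    position of a single flipped bit, or 0 if no error is detected."""
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(code)
    if 0 < syndrome <= len(fixed):
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


def extract_data(code: list[int]) -> list[int]:
    """Drop the check bits (power-of-two positions) to recover the data."""
    return [b for pos, b in enumerate(code, start=1) if pos & (pos - 1)]


word = hamming_encode([1, 0, 1, 1])   # 4 data bits -> 3 check bits, 7-bit codeword
upset = list(word)
upset[4] ^= 1                          # simulated SEU at position 5
fixed, pos = hamming_correct(upset)
print(pos, extract_data(fixed))        # 5 [1, 0, 1, 1]
```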

9
TMR vs. Hamming
  • TMR
    • Requires at least a 200 percent increase in
      area.
    • Good for small memories and state machines.
  • Hamming Codes
    • Good for large memories.
    • Requires check bits, a Hamming encoder, and a
      Hamming decoder.
    • Observed to add more timing delay than TMR.
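
To put the area comparison in numbers, the sketch below counts only storage bits (it ignores the voter, encoder, and decoder logic) and reuses the Hamming relationship for the number of check bits; the function names are hypothetical.

```python
def check_bits_needed(m: int) -> int:
    """Smallest r with 2**r >= m + r + 1."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


def tmr_storage_overhead(m: int) -> float:
    """TMR keeps two extra copies of every bit: 2 * m extra bits per m data bits."""
    return 2 * m / m


def hamming_storage_overhead(m: int) -> float:
    """Extra check bits relative to the m data bits."""
    return check_bits_needed(m) / m


for m in (8, 16, 32, 64):
    print(f"{m:>2} data bits: TMR +{tmr_storage_overhead(m):.0%}, "
          f"Hamming +{hamming_storage_overhead(m):.1%}")
```

For a 32-bit word this gives 6 check bits, an 18.75% storage overhead versus 200% for TMR, which is consistent with Hamming codes being the better fit for large memories.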

10
DWC-CED
  • Double Redundancy with Comparison combined with
    Concurrent Error Detection (DWC-CED)
  • Two modules perform the same operation and their
    outputs are compared (saving area over TMR).
  • If the outputs do not match, one more clock
    cycle is spent running the concurrent error
    detection method, which finds the correct module.
  • The problem is finding a test that detects all
    possible errors that can occur in a module.
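
The DWC-CED flow can be sketched behaviorally. The slide does not say which concurrent error detection method is used, so this Python model assumes a hypothetical 16-bit adder as the protected module and an inverse-operation check (`ced_check_add`) as the CED test; the fault model in `faulty_adder` is also an assumption.

```python
from typing import Callable, Optional, Tuple

MASK = 0xFFFF  # hypothetical 16-bit datapath


def good_adder(a: int, b: int) -> int:
    return (a + b) & MASK


def faulty_adder(a: int, b: int) -> int:
    # Models a configuration upset that forces output bit 3 to 1.
    return ((a + b) & MASK) | (1 << 3)


def ced_check_add(a: int, b: int, out: int) -> bool:
    """Hypothetical CED test for an adder: subtracting b must give back a."""
    return (out - b) & MASK == a


def dwc_ced(mod_a: Callable[[int, int], int], mod_b: Callable[[int, int], int],
            a: int, b: int) -> Tuple[int, int, Optional[str]]:
    """Return (output, clock cycles used, module flagged as faulty)."""
    out_a, out_b = mod_a(a, b), mod_b(a, b)
    if out_a == out_b:
        return out_a, 1, None            # copies agree: done in one cycle
    # Mismatch: one extra cycle to run the CED test and pick the good copy.
    if ced_check_add(a, b, out_a):
        return out_a, 2, "B"
    if ced_check_add(a, b, out_b):
        return out_b, 2, "A"
    raise RuntimeError("neither copy passed the CED test")


print(dwc_ced(good_adder, faulty_adder, 0x10, 0x20))   # (48, 2, 'B')
```

The test only works because subtraction inverts addition; each module type would need its own test, which is the difficulty noted above. An upset that does not change the output (for example, when bit 3 of the sum is already 1) is never detected.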

11
Other Techniques
  • Other techniques protect memory against SEUs and
    even Multiple Event Upsets (MEUs):
    • Cross Parity
    • Reed-Muller
    • Reed-Solomon
    • Reed-Solomon with Hamming Codes
  • The problem is the resources required to
    implement these techniques.

13
Configuration Frames
  • 1 bit wide
  • Span an HCLK row
  • 16 CLBs in height
  • Size is 41 32-bit words
  • Block types
    • CLBs/CLKs/DSPs/IOBs
    • BRAM Interconnect
    • BRAM Contents
  • Multiple minor frames per major column

14
Major Frames Numbering
  • Numbering starts from 0 on the left and increases
    toward the right.
  • SX35 example
    • CLBs/CLKs/DSPs/IOBs
      • CLBs 1-6, 8-15, 17-22, 25-30, 32-39, 41-46
      • CLKs 24
      • DSPs 7, 16, 31, 40
      • IOBs 0, 23, 47
    • BRAM Interconnect 0-7
    • BRAM Content 0-7
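
A small lookup table makes the SX35 column map easy to query and check. This Python sketch encodes block type 0 (CLBs/CLKs/DSPs/IOBs) from the example above, assuming 48 major columns numbered 0 to 47, with column 23 (IOB) and column 24 (CLK) excluded from the CLB ranges; the dictionary and function names are hypothetical.

```python
SX35_MAJOR_COLUMNS = {
    "IOB": [0, 23, 47],
    "CLK": [24],
    "DSP": [7, 16, 31, 40],
    "CLB": [*range(1, 7), *range(8, 16), *range(17, 23),
            *range(25, 31), *range(32, 40), *range(41, 47)],
}


def column_type(major: int) -> str:
    """Return the block type of an SX35 major column (block type 0 only)."""
    for kind, columns in SX35_MAJOR_COLUMNS.items():
        if major in columns:
            return kind
    raise ValueError(f"major column {major} is outside the listed SX35 layout")


# Sanity check: every column 0..47 appears exactly once.
all_columns = sorted(c for cols in SX35_MAJOR_COLUMNS.values() for c in cols)
assert all_columns == list(range(48))
print({kind: len(cols) for kind, cols in SX35_MAJOR_COLUMNS.items()})
```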

15
Minor Frames per Major Frame
  • There are multiple minor frames per major frame.
    The number of minor frames depends on the type of
    major column being written.
  • The total minor frames per column type come from
    the file xhwicap_i.h.
    • CLBs 22 total minor frames
    • DSPs 21 total minor frames
    • IOBs 30 total minor frames
    • CLKs 3 total minor frames
    • BRAM Interconnect 20 total minor frames
    • BRAM Content 64 total minor frames
  • Numbering is from 0 to totalMinorFrames-1
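
Enumerating frame addresses follows directly from these counts. In this Python sketch the dictionary keys are illustrative names rather than the identifiers defined in xhwicap_i.h, and an address is simplified to a (major, minor) pair, ignoring the other fields of a real frame address.

```python
from typing import Iterable, Iterator, Tuple

# Minor frame counts per major column type, as listed on the slide.
MINOR_FRAMES = {
    "CLB": 22,
    "DSP": 21,
    "IOB": 30,
    "CLK": 3,
    "BRAM_INTERCONNECT": 20,
    "BRAM_CONTENT": 64,
}


def minor_frames(column_type: str) -> range:
    """Valid minor frame numbers: 0 .. totalMinorFrames-1."""
    return range(MINOR_FRAMES[column_type])


def frame_addresses(columns: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, int]]:
    """Yield (major, minor) for every frame in the given (major, type) columns."""
    for major, kind in columns:
        for minor in minor_frames(kind):
            yield major, minor


frames = list(frame_addresses([(1, "CLB"), (7, "DSP")]))
print(len(frames), frames[0], frames[-1])   # 43 (1, 0) (7, 20)
```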

16
Frame Layout
  • Size is 41 32-bit words (1312 bits total)
  • Frames in the bottom half of the device are mirror
    images of those in the top half, with the exception
    of the vertical HCLK rows that contain the global
    and regional clocks (Xilinx ug071.pdf).
  • Bit ordering
    • Top half: 1311 to 0 (word 40 to word 0)
    • Bottom half: 0 to 1311 (word 0 to word 40)
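
Since a frame is 41 32-bit words, a bit offset splits into a word index and a bit position, which is all an SEU injector needs to flip a single configuration bit in a frame buffer. This Python sketch assumes offsets counted in the word 0 to word 40 order and bit 0 as the least significant bit of each word; it does not model the mirrored top half or the HCLK exception, and the function names are hypothetical.

```python
FRAME_WORDS = 41
WORD_BITS = 32
FRAME_BITS = FRAME_WORDS * WORD_BITS   # 1312


def locate(offset: int) -> tuple[int, int]:
    """Map a bit offset (word 0 to word 40 ordering) to (word, bit in word)."""
    if not 0 <= offset < FRAME_BITS:
        raise ValueError("offset is outside the 1312-bit frame")
    return divmod(offset, WORD_BITS)


def flip_bit(frame: list[int], offset: int) -> list[int]:
    """Return a copy of a 41-word frame with one bit inverted (a simulated SEU)."""
    if len(frame) != FRAME_WORDS:
        raise ValueError("expected 41 32-bit words")
    word, bit = locate(offset)
    upset = list(frame)
    upset[word] ^= 1 << bit        # assumes bit 0 is the word's least significant bit
    return upset


frame = [0] * FRAME_WORDS
print(locate(1311), flip_bit(frame, 33)[1])   # (40, 31) 2
```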

17
Fault Correction Techniques
  • Techniques for repairing faults in the
    configuration frames of the FPGA:
    • Scrubbing: reload the configuration data from a
      device such as an SEU-immune EEPROM.
    • Error Checking and Correcting (ECC) frames
      • Embed Hamming codes inside the configuration
        frame
      • Available in the Virtex-4 devices
  • For these techniques to be used, the design must
    not use resources that store data in the
    configuration frames (e.g., shift registers).
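
The difference between blind scrubbing and readback-compare scrubbing can be sketched as below. `read_frame` and `write_frame` are hypothetical callbacks standing in for frame reads and writes through HWICAP; they are not the Xilinx driver API.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[int, ...]


def blind_scrub(write_frame: Callable[[int, Frame], None],
                golden: Sequence[Frame]) -> None:
    """Rewrite every frame from the golden copy without reading it back."""
    for addr, expected in enumerate(golden):
        write_frame(addr, expected)


def readback_scrub(read_frame: Callable[[int], Frame],
                   write_frame: Callable[[int, Frame], None],
                   golden: Sequence[Frame]) -> List[int]:
    """Read each frame, compare with the golden copy, and rewrite only the
    frames that differ. Returns the repaired frame indices."""
    repaired = []
    for addr, expected in enumerate(golden):
        if read_frame(addr) != expected:
            write_frame(addr, expected)
            repaired.append(addr)
    return repaired


# Toy device: frame index -> contents; frame 1 has suffered an upset.
golden = [(0x1, 0x2), (0x3, 0x4)]
device = {0: (0x1, 0x2), 1: (0x3, 0x5)}
print(readback_scrub(device.__getitem__, device.__setitem__, golden))  # [1]
```

A frame that holds shift-register (or other run-time) data legitimately differs from the golden copy, so readback-compare would flag it and either variant would overwrite live state, which is why these techniques need designs that do not store data in configuration frames.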

19
DMRH
  • Double Modular Redundancy with Hold
  • On a disagreement, a signal is sent to the ICAP
    controller, which scans and fixes up errors in
    the modules' areas.
  • The disagreement signal is also sent to the
    controller so that it pauses at the current
    iteration.
  • A transient error will disappear within 1 clock
    cycle.
  • Best for combinational logic and parallel designs
  • The problem is the time delay needed to fix up
    the frame(s).
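
DMRH's hold-then-repair behavior can be modeled one clock cycle at a time. This Python sketch assumes that a mismatch lasting more than one cycle indicates a configuration upset (rather than a transient glitch) and that the repair is requested once per mismatch run; `dmrh_step` and `run_dmrh` are hypothetical names, and the time the ICAP controller spends fixing frames is not modeled.

```python
from typing import Iterable, List, Tuple


def dmrh_step(out_a: int, out_b: int, mismatch_run: int) -> Tuple[bool, int, bool]:
    """One clock cycle of a DMRH checker.

    Returns (advance iteration?, updated mismatch run length, request repair?).
    """
    if out_a == out_b:
        return True, 0, False            # copies agree: keep going
    run = mismatch_run + 1               # disagreement: hold the current iteration
    # A transient glitch should clear within one cycle; a second consecutive
    # mismatch is treated as a configuration upset that needs frame repair.
    return False, run, run == 2


def run_dmrh(cycles: Iterable[Tuple[int, int]]) -> Tuple[int, List[int]]:
    """Feed per-cycle outputs of the two copies; return (iterations completed,
    iterations at which an ICAP repair was requested)."""
    iteration, run, repairs = 0, 0, []
    for out_a, out_b in cycles:
        advance, run, repair = dmrh_step(out_a, out_b, run)
        if repair:
            repairs.append(iteration)
        if advance:
            iteration += 1
    return iteration, repairs


print(run_dmrh([(1, 1), (2, 3), (2, 2), (3, 3)]))           # transient: (3, [])
print(run_dmrh([(1, 1), (5, 4), (5, 4), (5, 4), (5, 5)]))   # persistent: (2, [1])
```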

20
Fan-out design
  • Used in some of the multiplexers in the design.
  • Can tolerate an SEU in the LUTs or in one of the
    lines after it is fanned out to the slices.
  • The words being selected are Hamming-code
    protected.
  • Reduces the need for redundancy.
  • The problem is an upset that occurs before the
    line is fanned out to the different slices.
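
Both the benefit and the weak spot of the fan-out design can be shown with a toy 2:1 multiplexer. This Python sketch assumes a Hamming(7,4) code for the selected words (the slides do not give the word width) and gives each output bit its own copy of the select line; all names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Hamming(7,4): codeword positions 1..7 are p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos
    c = list(c)
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def fanout_mux(word0: list[int], word1: list[int], select_copies: list[int]) -> list[int]:
    """2:1 mux where every output bit uses its own copy of the select line."""
    return [w1 if sel else w0 for w0, w1, sel in zip(word0, word1, select_copies)]


w0 = hamming74_encode([1, 0, 1, 1])
w1 = hamming74_encode([0, 1, 1, 0])

# Upset after fan-out: only the copy feeding bit position 5 flips.
copies = [1] * 7
copies[4] = 0
print(hamming74_decode(fanout_mux(w0, w1, copies)))   # [0, 1, 1, 0] (corrected)

# Upset before fan-out: every copy flips, so the wrong (but valid) word wins.
print(hamming74_decode(fanout_mux(w0, w1, [0] * 7)))  # [1, 0, 1, 1]
```

An upset after fan-out corrupts at most one bit, which the Hamming code corrects; an upset before fan-out selects a different but valid codeword, which the code cannot detect.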

22
Iterative Repair (IR) Processor Design
[Figure: Testing Circuitry (Measures Change of Behavior)]
A timer provides a timeout in case the circuit can no
longer complete because of an SEU. HWICAP is used to
read and write configuration frames.
23
Copy Processor
[Figures: BEFORE / AFTER]
24
Alter Processor
[Figures: BEFORE / AFTER]
25
Evaluate Process
  • Comprises three sub-processors:
    • Dependency Graph Violation
    • Total Schedule Length
    • Resource Over-utilization
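
The slide names the three evaluation measures but not how they are computed, so the following is only an illustrative Python cost model, not the IR Processor's hardware. It assumes a schedule maps each task to an integer start step, each task has a duration and one resource type, and dependencies are (before, after) pairs; all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def evaluate(schedule: Dict[str, int], durations: Dict[str, int],
             resources: Dict[str, str], deps: List[Tuple[str, str]],
             capacity: Dict[str, int]) -> Tuple[int, int, int]:
    """Return (dependency violations, total schedule length, over-utilization)."""
    # Dependency graph violation: a task starts before its predecessor finishes.
    violations = sum(1 for before, after in deps
                     if schedule[after] < schedule[before] + durations[before])
    # Total schedule length: the latest finishing time step.
    length = max(start + durations[t] for t, start in schedule.items())
    # Resource over-utilization: units demanded beyond capacity, summed over steps.
    usage = Counter()
    for t, start in schedule.items():
        for step in range(start, start + durations[t]):
            usage[(step, resources[t])] += 1
    over = sum(max(0, n - capacity[r]) for (_, r), n in usage.items())
    return violations, length, over


result = evaluate({"a": 0, "b": 0, "c": 2}, {"a": 1, "b": 2, "c": 1},
                  {"a": "ALU", "b": "ALU", "c": "ALU"},
                  [("a", "b"), ("b", "c")], {"ALU": 1})
print(result)   # (1, 3, 1)
```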

26
Dependency Graph Violation Sub-Processor
[Figures: BEFORE / AFTER]
27
Total Schedule Length Sub-Processor
[Figures: BEFORE / AFTER]
28
Resource Over-utilization Sub-Processor
  • This is the longest stage, so only TMR is used
    here, to avoid the repair delay seen with DMRH
    (this might change in a future design).
  • Writing a frame was measured to take 18 µs.
  • Reading and writing a frame was measured to take
    30 µs.
  • The maximum latency of an IR Processor iteration
    is 235 clock cycles, or 2.35 µs with a 10 ns
    clock period.
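
The measured numbers show why DMRH is avoided in this stage: a single frame repair costs many iterations. This Python arithmetic uses only the figures above (including the 10 ns clock period) and ignores any other controller overhead.

```python
CLOCK_PERIOD_NS = 10
ITERATION_CYCLES = 235
WRITE_FRAME_US = 18
READ_WRITE_FRAME_US = 30

iteration_us = ITERATION_CYCLES * CLOCK_PERIOD_NS / 1000        # 2.35 us
iterations_per_write = WRITE_FRAME_US / iteration_us
iterations_per_read_write = READ_WRITE_FRAME_US / iteration_us

print(f"iteration latency: {iteration_us:.2f} us")
print(f"one frame write ~ {iterations_per_write:.1f} iterations")
print(f"one frame read+write ~ {iterations_per_read_write:.1f} iterations")
```

One frame write costs roughly 7.7 iterations and a read-plus-write roughly 12.8 iterations of IR Processor time.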

29
Accept Processor
[Figures: BEFORE / AFTER]
30
Adjust Temperature Processor
[Figures: BEFORE / AFTER]
32
BYU SEU Simulator
  • Requires 3 Virtex 1000 FPGAs
  • Does not directly corrupt flip-flops
  • Corrupts bits in bitstream

[Figures: Sensitive Bits / FPGA Editor]
33
Xilinx SEU Simulator (xapp714)
  • Requires 1 Virtex-4 FPGA
  • Does not directly corrupt flip-flops
  • Cannot show which frame address and configuration
    bit is being corrupted (it is stated to start from
    the first bit in configuration memory).
  • Awkward interface for simulating SEUs
  • Uses embedded ECC frames
  • Corrupts every configuration frame on the device;
    it is unknown how, or whether, it actually
    corrupts BRAM Interconnect and Content frames.

34
USU SEU Simulator (Tool Flow)
  • Requires the Xilinx tools ISE, EDK, PlanAhead, and
    TMRTool.
  • TMRTool removes the shift registers in the
    design.
  • PlanAhead is used to map the design under test to
    configuration frames separate from the simulator
    circuit.

35
USU SEU Simulator
  • Uses 1 FPGA (the tester circuit and the design
    under test are on the same device)
  • Corrupts all bits in the configuration frames in
    the design-under-test area.
  • Tests corrupting FFs with 3 techniques:
    • GCAPTURE/GRESTORE
    • Intermediate Corruption
    • Stuck-At Tests
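
The core loop of such a simulator (flip a bit, run, compare, restore) can be sketched as below. `flip` and `run_design` are hypothetical callbacks standing in for an HWICAP frame read-modify-write and for running the IR Processor through its testing circuitry; the timer on the testing circuitry is modeled by `run_design` raising `TimeoutError`.

```python
from typing import Callable, Hashable, Iterable, List


def sensitivity_scan(bits: Iterable[Hashable],
                     flip: Callable[[Hashable], None],
                     run_design: Callable[[], object],
                     golden_output: object) -> List[Hashable]:
    """Flip each configuration bit in the design-under-test area, run the
    design, record bits that change its behavior, then flip the bit back."""
    sensitive = []
    for addr in bits:
        flip(addr)                  # inject the upset
        try:
            changed = run_design() != golden_output
        except TimeoutError:        # design no longer completes: counts as a change
            changed = True
        finally:
            flip(addr)              # repair before the next injection
        if changed:
            sensitive.append(addr)
    return sensitive


# Toy stand-in for the device: the set holds currently flipped bits, and the
# "design" misbehaves only when bit (1, 0, 5) is flipped.
flipped = set()
result = sensitivity_scan(
    bits=[(1, 0, b) for b in range(10)],
    flip=lambda addr: flipped.symmetric_difference_update({addr}),
    run_design=lambda: "bad" if (1, 0, 5) in flipped else "ok",
    golden_output="ok",
)
print(result)   # [(1, 0, 5)]
```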

36
Flip-Flop Architecture
  • The FFs in a slice share all lines except the D
    (data) inputs and the XQ/YQ outputs.
  • The SRINV mux controls the reset line given to
    the FFs.
  • The SRMODE configuration bit determines what value
    an FF is set to on reset.
  • The INIT bit is the value of an FF when the
    bitstream is first loaded onto the FPGA.

37
GCAPTURE/GRESTORE Method
  • GCAPTURE loads the INIT bits of all FFs and
    Input/Output Buffer (IOB) registers with the
    current values of the registers.
  • GRESTORE sets all registers to their INIT bit
    values.
  • Put the device into a paused state (FFs are not
    changing, the SR input to the FFs is low, and the
    clock signal is still active).
  • Then issue GCAPTURE, change the INIT bit of the
    desired FF, and follow with GRESTORE.
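
The effect of the GCAPTURE/GRESTORE sequence can be checked with a small software model. In this Python sketch, `gcapture` and `grestore` are hypothetical stand-ins for the configuration commands (issued through ICAP on the real device), IOB registers are not modeled, and the device is assumed to be paused.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlipFlop:
    value: int   # current register value
    init: int    # INIT bit held in configuration memory


def gcapture(ffs: List[FlipFlop]) -> None:
    """Copy every register's current value into its INIT bit."""
    for ff in ffs:
        ff.init = ff.value


def grestore(ffs: List[FlipFlop]) -> None:
    """Set every register to its INIT bit value."""
    for ff in ffs:
        ff.value = ff.init


def inject_ff_upset(ffs: List[FlipFlop], target: int) -> None:
    """Flip one FF while the device is paused, leaving all others unchanged."""
    gcapture(ffs)              # snapshot all current values into INIT
    ffs[target].init ^= 1      # corrupt only the target FF's INIT bit
    grestore(ffs)              # reload every FF from INIT


ffs = [FlipFlop(1, 0), FlipFlop(0, 0), FlipFlop(1, 1)]
inject_ff_upset(ffs, 1)
print([ff.value for ff in ffs])   # [1, 1, 1]
```

The GCAPTURE step matters: without it, GRESTORE would load the INIT values from when the bitstream was loaded and reset every FF, not just the target.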

38
Intermediate Corruption Method
  • Put the device into a paused state.
  • Issue a GCAPTURE command.
  • Based on the INIT bits, set the SRMODE of the 2
    FFs in the slice:
    • Set the FF to be corrupted to take the opposite
      of its current value on reset.
    • Set the other FF to reset to its current value.
  • Change the SRINV multiplexer to select the other
    value (this resets the FFs).
  • Fix up the SRINV multiplexer and SRMODE bits.
  • The device can then be resumed.
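
The intermediate corruption steps can be modeled with two FFs sharing one SR line. This Python sketch reads the current values directly from the model in place of the GCAPTURE-and-readback step, assumes the SR input stays low while the device is paused, and treats SR as taking effect immediately; `Slice` and `intermediate_corrupt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slice:
    """Two FFs (X and Y) that share one SR line through the SRINV mux."""
    q: List[int]        # current FF values [X, Y]
    srmode: List[int]   # value each FF takes when SR is asserted
    srinv: int = 0      # 1 inverts the shared SR input
    sr_in: int = 0      # SR input from the fabric (low while paused)

    def settle(self) -> None:
        # When the effective SR is asserted, each FF takes its SRMODE value.
        if self.sr_in ^ self.srinv:
            self.q = list(self.srmode)


def intermediate_corrupt(s: Slice, target: int) -> None:
    """Flip FF `target` (0 = X, 1 = Y) while keeping the other FF's value."""
    saved_mode, saved_inv = list(s.srmode), s.srinv
    other = 1 - target
    s.srmode[target] = 1 - s.q[target]   # target resets to its opposite value
    s.srmode[other] = s.q[other]         # other FF resets to what it already holds
    s.srinv ^= 1                         # select the other SRINV input: SR asserts
    s.settle()
    s.srmode, s.srinv = saved_mode, saved_inv   # fix up SRMODE and SRINV
    s.settle()                           # SR deasserted again: values are kept


s = Slice(q=[0, 1], srmode=[0, 0])
intermediate_corrupt(s, target=0)
print(s.q, s.srmode, s.srinv)   # [1, 1] [0, 0] 0
```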

39
Stuck-At Method
  • The device can be in a paused state.
  • In this method, FFs are configured to be stuck at
    a desired value during operation of the device.
  • Configure the SRMODE bits to the desired stuck-at
    values (possible combinations: 00, 01, 10, 11).
  • Change the SRINV mux to select the opposite line.
  • After the device run, fix up the changes.
  • Best if the device never resets the FFs during
    operation.
  • Helps reveal the SEU sensitivity of specific FFs
    on any clock cycle.
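
The stuck-at method can be modeled the same way, with the inverted SRINV holding SR asserted for the whole run. This Python sketch assumes the design keeps its SR input low and that an asserted SR overrides the D inputs on each clock edge; `Slice` and `stuck_at_run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Slice:
    q: List[int]        # current FF values [X, Y]
    srmode: List[int]   # value each FF takes when SR is asserted
    srinv: int = 0      # 1 inverts the shared SR input
    sr_in: int = 0      # SR input driven by the design

    def clock(self, d: Tuple[int, int]) -> None:
        """One clock edge: an asserted SR overrides the D inputs."""
        if self.sr_in ^ self.srinv:
            self.q = list(self.srmode)
        else:
            self.q = list(d)


def stuck_at_run(s: Slice, stuck: Tuple[int, int],
                 d_sequence: Sequence[Tuple[int, int]]) -> List[Tuple[int, ...]]:
    """Run with both FFs stuck at `stuck` (00, 01, 10, or 11), then fix up."""
    saved_mode, saved_inv = list(s.srmode), s.srinv
    s.srmode = list(stuck)
    s.srinv ^= 1                     # select the opposite line: SR held asserted
    trace = []
    for d in d_sequence:
        s.clock(d)
        trace.append(tuple(s.q))
    s.srmode, s.srinv = saved_mode, saved_inv   # fix up after the run
    return trace


s = Slice(q=[0, 0], srmode=[0, 0])
trace = stuck_at_run(s, (1, 0), [(0, 1), (1, 1), (0, 0)])
print(trace)    # [(1, 0), (1, 0), (1, 0)]
s.clock((1, 1))
print(s.q)      # [1, 1]
```

If the design drove its SR input high during the run, the inverted SR would deassert and release the FFs, which is one reason the method works best when the device never resets its FFs.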

41
Design Mapped from PlanAhead
[Figures: Placement from PlanAhead / Resources Mapped]
42
Bit Markup of Sensitive Resources
[Figures: Placement from PlanAhead / Bit Markup]
  • The bit markup does not specify which resources
    are SEU-sensitive; it only gives a general idea.
43
Map of Sensitive Resources
[Figures: Placement from PlanAhead / Map of Sensitive
Resources in Slice / Key of Resources]
44
CLBs Tested
  • From testing every configuration bit in the
    frames that make up the CLBs, we found that
    108395 of 2193664 bits (4.9%) caused a change
    of behavior in the IR Processor.
  • For satellites orbiting the Earth, some have
    observed around 1000 SEUs a day.
    • This means around 42 SEUs an hour,
    • of which about 2 on average would cause
      problems.
  • So if the design can tolerate a timing delay on
    average every 30 minutes, DMRH can be beneficial
    to reduce power and area requirements.
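
The arithmetic behind these estimates can be reproduced in a few lines of Python. It assumes upsets land uniformly across the tested CLB configuration bits and uses the 1000 SEUs/day figure above.

```python
SENSITIVE_BITS = 108_395
TOTAL_CLB_BITS = 2_193_664
SEUS_PER_DAY = 1000

fraction = SENSITIVE_BITS / TOTAL_CLB_BITS       # share of bits that change behavior
seus_per_hour = SEUS_PER_DAY / 24
harmful_per_hour = seus_per_hour * fraction      # upsets that hit a sensitive bit
minutes_between = 60 / harmful_per_hour          # mean time between harmful upsets

print(f"{fraction:.1%} sensitive, {seus_per_hour:.0f} SEUs/h, "
      f"{harmful_per_hour:.1f} harmful/h, one every {minutes_between:.0f} min")
```

This gives about 4.9% sensitive bits, about 42 SEUs per hour, about 2.1 harmful upsets per hour, and roughly one every 29 minutes, in line with the 30-minute figure.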

45
DSPs, BRAMs Tested
  • Show images displaying the Bit Markup
  • CLBs Green
  • DSPs Purple
  • BRAM Interconnect Blue
  • BRAM Content Red (Intermediate Corruption
    Testing)
  • 127668 bits out of 4067200 (3.1) caused a change
    of behavior in the IR Processor
  • When trying to change BRAM content, the changes
    will not be accepted if writing a 1 to these
    bits offsets (ordering is word 0 to 40)
  • Top 136, 456, 808, 1128
  • Bottom 184, 504, 856, 1176
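
Splitting the listed offsets into word and bit positions shows their structure, and an injector can simply skip them. This Python sketch assumes offsets in the word 0 to word 40 order with bit 0 as the least significant bit of each word; `word_and_bit` and `injection_offsets` are hypothetical helpers.

```python
WORD_BITS = 32
FRAME_BITS = 41 * WORD_BITS   # 1312

# Offsets where writing a 1 to BRAM content was not accepted (from the slide).
UNWRITABLE = {
    "top": [136, 456, 808, 1128],
    "bottom": [184, 504, 856, 1176],
}


def word_and_bit(offset: int) -> tuple[int, int]:
    """Split a frame bit offset into (word index, bit within word)."""
    return divmod(offset, WORD_BITS)


def injection_offsets(half: str) -> list[int]:
    """Bit offsets to corrupt in a BRAM content frame, skipping unwritable ones."""
    skip = set(UNWRITABLE[half])
    return [o for o in range(FRAME_BITS) if o not in skip]


for half, offsets in UNWRITABLE.items():
    print(half, [word_and_bit(o) for o in offsets])
```

All four top offsets fall on bit 8 of their words and all four bottom offsets on bit 24, and each top offset plus a bottom offset equals 1312 (136 + 1176, 456 + 856, 808 + 504, 1128 + 184). The slides do not say why these bits reject writes.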

47
Conclusions
  • Simulator tool status
    • Simulates SEUs in CLBs, FFs, DSPs, BRAM
      interconnect, and BRAM content.
    • Needs a method to reload the entire device when
      a permanent change in pattern is detected.
  • Need to test the full TMR design
  • Need to test the proposed fault-tolerant design
  • Have the fault techniques applied automatically
    when the IR Processor is generated
  • Thesis defense in August?

49
Publications
  • Journal articles under review
    • IET Transactions on Computers and Digital
      Techniques
    • Phillips, J., Sudarsanam, A., Kallam, R., Carver,
      J., and Dasu, A., "Methodology to Derive
      Polymorphic Soft-IP Cores for FPGAs"

50
Publications
  • Conference papers under review
    • DAC 2008
    • Carver, J., Phillips, J., and Dasu, A., "Improved
      SEU Simulator for Virtex 4 FPGAs"

51
Publications
  • Planned journal papers
    • IEEE Design & Test of Computers or IEEE
      Transactions on Reliability
    • Carver, J., Phillips, J., and Dasu, A., "SEU
      Mitigating Techniques for a FPGA based Iterative
      Repair Processor"