A cost/benefit framework for evaluating re-configurable FPGA SEU mitigation techniques

Provided by: ScottVan4
Learn more at: http://klabs.org
Transcript and Presenter's Notes

1
A cost/benefit framework for evaluating
re-configurable FPGA SEU mitigation
techniques
  • Carl Carmichael
  • Brendan Bridgford
  • Xilinx, Inc.

2
Introduction
  • Many designers assume that re-programmable FPGAs
    for space applications require
  • TMR
  • SEU correction (configuration scrubbing)
  • in all cases.
  • Mitigation can be costly; you should ask:
  • What is the reliability requirement?
  • What is the expected MTBF with no mitigation?
  • What is the expected MTBF with only scrubbing?
  • What is the expected MTBF with only XTMR?
  • What is the expected MTBF with the combination of
    both techniques?
  • Answers will vary with different mission
    characteristics.

3
Typical SEE Mitigation
  • Typical SEE Mitigation: XTMR + Scrubbing
  • XTMR and scrubbing are often used together
  • XTMR (Xilinx TMR) protects the design from any
    one upset
  • Configuration scrubbing prevents SEUs from
    accumulating
  • When XTMR and scrubbing are used together, the
    overall functional failure rate is dominated by
    the SEFI rate
  • V2 (Virtex-II) SEFI, GEO: 60 years MTBF [1],
    continuous operation
  • SEFI rate is independent of device size

4
The Unstated Assumptions
  • Five basic assumptions lie behind the typical
    prescription of XTMR + scrubbing:
  • (1) No functional errors can be tolerated
  • (2) All functional errors are persistent
  • (3) The FPGA must operate continuously for an
    extended period of time
  • (4) A high upset rate can be expected
  • (5) Design goal is to be as reliable as
    possible

5
Assumption 1: No Functional Errors can be
Tolerated
  • Not always true: many systems can tolerate some
    errors.
  • Error correction may be built into the data.
  • The consequence of an error may be minor (a pixel
    is inverted, for example).

6
Assumption 2: All Functional Errors are
Persistent
  • Not always true: some structures do not
    experience persistent errors after a single-event
    upset [2].
  • Persistent errors cannot be cleared by scrubbing
    alone.
  • Example: LFSRs, counters, other state logic
  • An SEU can cause these structures to go "out to
    lunch"; they must be scrubbed, then reset. XTMR
    prevents these errors.
  • Non-persistent errors can be cleared by scrubbing
    alone.
  • Example: multipliers, any feed-through logic
  • An SEU can cause a few incorrect calculations,
    but scrubbing will restore operation of the
    circuit.
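
The difference between persistent and non-persistent errors can be illustrated with a toy simulation. This is a minimal sketch and not a model of any real FPGA: the `simulate` function, the stuck-at-1 fault on output bit 4, and the fault window are all hypothetical. The upset is present for cycles 3 to 5 and is then assumed to be repaired by a scrub.

```python
FAULT_MASK = 0x10  # hypothetical upset: output bit 4 stuck at 1


def simulate(cycles, fault_window, feedback):
    """Return, per cycle, whether the output matches a fault-free run.

    feedback=True  models a counter (next state depends on current state).
    feedback=False models feed-through logic (a multiply by 3 of the input).
    """
    state = 0         # register of the circuit that suffers the upset
    golden_state = 0  # register of an identical, fault-free circuit
    matches = []
    for t in range(cycles):
        if feedback:
            golden = (golden_state + 1) & 0xFF
            value = (state + 1) & 0xFF
        else:
            golden = (t * 3) & 0xFF
            value = golden
        if t in fault_window:
            value |= FAULT_MASK  # the upset corrupts the logic output
        if feedback:
            state, golden_state = value, golden  # corrupted value is stored
        matches.append(value == golden)
    return matches


window = range(3, 6)  # upset present during cycles 3..5, then scrubbed
counter_ok = simulate(10, window, feedback=True)
multiplier_ok = simulate(10, window, feedback=False)
print("counter:   ", counter_ok)
print("multiplier:", multiplier_ok)
```

In this sketch the counter stays wrong after the scrub because the corrupted value was fed back into its state, so it would also need a reset. The multiplier is wrong only while the fault is present.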

7
Assumption 3: Continuous, extended operation
required
  • Not always true: many systems only operate for
    minutes or hours at a time.
  • Polar and other orbits may only require brief
    periods of operation.
  • An unmitigated design that operates for only a
    few minutes at a time can have a very high MTBF.
  • A design with XTMR that operates for a few hours
    can have a very high MTBF, even without
    scrubbing.
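
The effect of a short operating window can be sketched with a simple reliability model. The assumptions here are not from the presentation: design-affecting upsets are taken to arrive as a Poisson process with a constant rate, the rate value is hypothetical, and XTMR without scrubbing is approximated by the textbook 2-of-3 TMR formula with no repair, R(t) = 3r^2 - 2r^3 with r = exp(-rate * t). That formula treats one upset as disabling a whole redundant domain, which is deliberately coarse.

```python
import math


def survival_unmitigated(rate_per_hour, hours):
    """P(no design-affecting upset during the window), Poisson arrivals."""
    return math.exp(-rate_per_hour * hours)


def survival_tmr_no_scrub(rate_per_hour, hours):
    """Textbook 2-of-3 TMR reliability with no repair (no scrubbing).

    Assumes each of the three domains is a copy of the original design,
    so each domain sees the same upset rate as the unmitigated design.
    """
    r = math.exp(-rate_per_hour * hours)  # one domain still upset-free
    return 3 * r**2 - 2 * r**3


RATE = 0.01  # hypothetical design-affecting upsets per hour (assumption)
for label, hours in [("5 minutes", 5 / 60), ("2 hours", 2.0), ("1 week", 168.0)]:
    print(f"{label:>9}: unmitigated {survival_unmitigated(RATE, hours):.4f}, "
          f"TMR without scrubbing {survival_tmr_no_scrub(RATE, hours):.4f}")
```

Under these assumptions, an unmitigated design almost always survives a window of a few minutes, and TMR without scrubbing does better than no mitigation over a few hours. For long windows the same formula drops below the unmitigated curve, because unrepaired upsets accumulate across domains.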

8
Assumption 4: A High Upset Rate can be Expected
  • Not always true: upset rates vary widely by orbit.
  • 2V6000 SEU rate, GEO: 6 SEUs/hour [1]
  • 36,000 km GEO, worst-case solar flare: 300 SEUs/hr
    [3]
  • Of these, fewer than 1 in 10 will affect the
    design [4].
  • Note: configuration scrubbing can keep up with
    even worst-case SEU rates [3].
  • At lower upset rates, high MTBF can be achieved
    with less mitigation.
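
The figures on this slide support a quick back-of-envelope bound. This sketch assumes that "fewer than 1 in 10" can be read as an upper bound of 0.1 on the fraction of SEUs that affect the design; the function name is hypothetical.

```python
def min_minutes_between_design_upsets(seus_per_hour, max_fraction=0.1):
    """Lower bound on the mean time between design-affecting upsets.

    max_fraction is the assumed upper bound on the share of SEUs that
    affect the design (the slide says fewer than 1 in 10).
    """
    design_upsets_per_hour = seus_per_hour * max_fraction
    return 60.0 / design_upsets_per_hour


geo_nominal = min_minutes_between_design_upsets(6.0)    # 2V6000 at GEO
geo_flare = min_minutes_between_design_upsets(300.0)    # worst-case flare
print(f"GEO nominal:      more than {geo_nominal:.0f} minutes")
print(f"Worst-case flare: more than {geo_flare:.0f} minutes")
```

With these inputs the mean time between design-affecting upsets comes out above roughly 100 minutes at the nominal GEO rate and above roughly 2 minutes in the worst-case flare.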

9
Assumption 5: Design goal is "as reliable as
possible"
  • "As reliable as possible" is not a reliability
    target!
  • This implies that no cost is too great for a
    marginal improvement in reliability.
  • A quantifiable reliability target is needed
  • A reliability target must be set for SEU-induced
    functional failures, else there is no way to
    evaluate different technologies and mitigation
    techniques.
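
Once a numeric target exists, comparing options becomes a filter followed by a minimum. The sketch below is only an illustration: the `Option` type, the function name, and every MTBF and cost number are hypothetical placeholders, not measured or published values.

```python
from typing import NamedTuple, Optional, Sequence


class Option(NamedTuple):
    name: str
    mtbf_hours: float     # placeholder estimate for one specific mission
    relative_cost: float  # placeholder unitless cost score


def cheapest_meeting_target(options: Sequence[Option],
                            target_mtbf_hours: float) -> Optional[Option]:
    """Return the lowest-cost option whose MTBF meets the target, if any."""
    adequate = [o for o in options if o.mtbf_hours >= target_mtbf_hours]
    if not adequate:
        return None
    return min(adequate, key=lambda o: o.relative_cost)


# All numbers below are invented for illustration only.
OPTIONS = [
    Option("no mitigation", 2.0, 1.0),
    Option("scrubbing only", 50.0, 1.5),
    Option("XTMR only", 500.0, 3.5),
    Option("XTMR + scrubbing", 500_000.0, 4.0),
]

choice = cheapest_meeting_target(OPTIONS, target_mtbf_hours=100.0)
print(choice.name if choice else "no option meets the target")
```

Without `target_mtbf_hours` there is nothing to filter on, which is the point of the slide: "as reliable as possible" always selects the most expensive row.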

10
XTMR + Scrubbing: costs and benefits
  • XTMR
  • Benefit: Prevents persistent and non-persistent
    functional errors due to any SEU/SET.
  • Costs: 3.5x increase in logic and pin utilization.
    Reduced timing performance.
  • Scrubbing
  • Benefit: Prevents SEUs from accumulating. Clears
    non-persistent errors.
  • Costs: Increased system complexity. Cannot use
    SRL16s or DistRAM (increases logic utilization).

11
Mitigation Alternatives: costs and benefits
  • Alternatives to XTMR: EDAC, periodic reset
  • Benefit: Prevents persistent functional errors
    from any SEU/SET.
  • Cost: Must be able to tolerate non-persistent
    errors.
  • Alternative to Scrubbing: Periodic full
    reconfiguration
  • Benefits: Prevents SEUs from accumulating.
    Simpler than scrubbing. Does not preclude the
    use of SRL16s or DistRAM.
  • Costs: Design is interrupted during
    reconfiguration.

12
Mission Characteristics
  • Can your design tolerate some functional errors?
  • If yes, how much time is available to recover
    operation? You may not need XTMR.
  • Does your design contain feedback structures?
  • If no, SEUs/SETs will not cause persistent
    errors. XTMR may not be required.
  • Does your design need to operate continuously?
  • If no, you may not need scrubbing or XTMR.
  • What is the expected SEU rate?
  • On the order of seconds? Minutes? Hours?
  • Lower upset rates mean less need for scrubbing
    and XTMR to achieve the same reliability.
  • What is the MTBF requirement for functional
    errors?
  • Factors: operating duration, SEU rate, error
    persistence, EDAC.

13
Mitigation Matrix
  • [Figure: mitigation matrix. Axes: Data Criticality
    (Low to High), Error Persistence (Yes / No),
    Operating Window (Minutes, Days, Months,
    Continuous). Regions: No Mitigation, Scrubbing,
    XTMR, Scrubbing + XTMR, Redundant devices.]

14
Pick the Right Mitigation for the Job
  • Examine your assumptions critically: a less
    costly implementation may be possible.
  • Lower upset rates -> less mitigation required
  • Error tolerance -> may not need XTMR
  • Error persistence -> partial or no XTMR
  • Short operating window -> may not need scrubbing
  • Your reliability target might be achievable
    without the combination of XTMR + scrubbing.

15
Conclusion
  • Scrubbing + XTMR is not the only solution!
  • There is a range of mitigation options to
    consider, including no mitigation at all.
  • The only way to know what you need is to have a
    quantifiable reliability target.
  • Use the simplest and least expensive option that
    will meet your needs.

16
References
  • [1] Swift, Gary. Tradeoffs in Flight-Design
    Upset Mitigation in State-of-the-Art FPGAs
    Hardened By Design vs. Design-Level Hardening.
    MAPLD 2004, submission C144.
  • [2] Johnson et al. Persistent Errors in
    SRAM-based FPGAs. MAPLD 2004, submission C135.
  • [3] Carmichael et al. A Triple Module
    Redundancy Scheme for Static Latch-Based FPGAs.
    MAPLD 2004, submission P189.
  • [4] Fabula et al. The NSEU Sensitivity of
    Static Latch Based FPGAs. MAPLD 2004, submission
    P139.
  • [5] Pate-Cornell. Uncertainties in risk
    analysis. Reliability Engineering and System
    Safety vol. 54 (1996) pp. 95-111.