Title: Single Event Upset


1
Single Event Upset: An Embedded Tutorial
Fan Wang, Vishwani D. Agrawal
Department of Electrical and Computer Engineering, Auburn University, AL 36849 USA
21st International Conference on VLSI Design, Hyderabad, India, January 4-8, 2008
2
Motivation for This Work
  • With the continuous downscaling of CMOS
    technologies, device reliability has become a
    major bottleneck.
  • The sensitivity of electronic systems to
    radiation can potentially become a major cause of
    soft (non-permanent) failures.
  • Both circuit designers and test engineers need a
    basic knowledge of the radiation mechanisms that
    cause soft errors and of soft error mitigation
    techniques.

3
Outline
  • Introduction to Soft Errors
  • What is a soft error?
  • Historical notes
  • Basic radiation mechanisms in silicon
  • Soft error resilience techniques
  • A case study
  • Conclusion

4
Introduction to SEU
  • SEUs are erroneous behaviors of state-of-the-art
    electronic circuits caused by random factors.
  • A single event upset (SEU) is a non-permanent,
    non-functional error.
  • Definition from the NASA Thesaurus:
  • Single Event Upset (SEU): radiation-induced
    errors in microelectronic circuits caused when
    charged particles (usually from the radiation
    belts or from cosmic rays) lose energy by
    ionizing the medium through which they pass,
    leaving behind a wake of electron-hole pairs.

5
What is a Soft Error?
  • A fault is the cause of errors.
  • A non-permanent fault is a non-destructive fault
    and falls into two categories:
  • Transient faults, caused by environmental
    conditions like temperature, humidity, pressure,
    voltage, power supply, vibrations, fluctuations,
    electromagnetic interference, ground loops,
    cosmic rays and alpha particles.
  • Intermittent faults, caused by non-environmental
    conditions like loose connections, aging
    components, critical timing, resistive or
    capacitive variations and noise in the system.
  • With advances in manufacturing, soft errors
    caused by cosmic rays and alpha particles have
    become the dominant causes of failures in
    electronic systems.

6
Historical Notes
  • From 1954 through 1957, failures in digital
    electronics were reported during above-ground
    nuclear bomb tests.
  • In 1962, Wallmark and Marcus predicted that
    cosmic rays would start upsetting microcircuits
    through heavy ionized particle strikes once
    feature sizes became small enough.
  • In the 1970s and early 1980s, radiation effects,
    like fault-tolerant computing theory, received
    attention, and more researchers examined the
    physics of these phenomena.
  • In 1978, May and Woods of Intel Corporation
    determined that these errors were caused by
    alpha particles emitted in the radioactive decay
    of uranium and thorium present at levels of just
    a few parts per million in package materials.
  • In 1979, Guenzer and Wolicki reported that
    error-causing particles came not only from
    uranium and thorium but also from high-energy
    neutrons and protons generated by nuclear
    reactions. The term SEU has been in use since
    this paper.
  • In 1979, Ziegler and Lanford from IBM predicted
    that cosmic rays could cause the same upset
    phenomenon in electronics (not only memories)
    even at sea level.

7
Soft Error Rate of Specific Applications
  • Figures of merit:
    1. Failures In Time (FIT): the number of failures
       per 10^9 device-hours.
    2. MTTF (Mean Time To Failure).
    An MTTF of 1 year corresponds to
    10^9/(24 × 365) FIT ≈ 114,155 FIT.
  • The SER of contemporary commercial chips is
    controlled to within 100-1000 FIT.
  • Most hard failure mechanisms produce error rates
    on the order of 1-100 FIT.
  • The SER of programmable logic is almost 100 times
    that of combinational logic.
  • Soft error rate for SRAM-based FPGAs:
  • Smaller design rules and lower supply voltages
    increase the SER.
  • A radiation chamber was used to estimate the SEU
    frequency at an altitude of 10 km at 60° N (Sweden).

FPGA          XC4010E           XC4010XL
Process       0.60 µm           0.35 µm
Vcc           5 V               3.3 V
1 SEU every   1 × 10^6 hours    2.8 × 10^5 hours
Projecting this trend over 3 design-rule shrinks and 2
voltage reductions gives 1 SEU every 28.2 hours.
M. Ohlsson, P. Dyreklev, K. Johansson and P. Alfke,
"Neutron Single Event Upsets in SRAM-Based FPGAs,"
Proc. 1998 IEEE Nuclear and Space Radiation Effects
Conference.
C. Stroud, "FPGA Architectures and Operation for
Tolerating SEUs," Electrical Engineering VLSI Design
and Test Seminar, Spring 2007, Auburn University.
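The FIT and MTTF arithmetic above can be checked with a short Python sketch. It assumes, as the slide's 1-year example does, a constant failure rate so that FIT = 10^9 / MTTF (in hours), and it also converts the measured "1 SEU every N hours" figures from the XC4010E/XC4010XL table into FIT per device. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365   # 8760 h, ignoring leap years (as on the slide)
FIT_HOURS = 1e9             # FIT counts failures per 10^9 device-hours


def mttf_hours_to_fit(mttf_hours: float) -> float:
    """Failure rate in FIT for a given MTTF, assuming a constant failure rate."""
    return FIT_HOURS / mttf_hours


def fit_to_mttf_hours(fit: float) -> float:
    """Inverse conversion: MTTF in hours for a failure rate given in FIT."""
    return FIT_HOURS / fit


if __name__ == "__main__":
    print(f"1-year MTTF = {mttf_hours_to_fit(HOURS_PER_YEAR):,.0f} FIT")
    # Measured SEU intervals from the table (one device each).
    for part, hours in [("XC4010E", 1e6), ("XC4010XL", 2.8e5)]:
        fit = mttf_hours_to_fit(hours)
        print(f"{part}: 1 SEU every {hours:.1e} h = {fit:,.0f} FIT")
```

Under this assumption, the XC4010E and XC4010XL measurements correspond to about 1,000 FIT and 3,571 FIT per device, so the newer, smaller part already exceeds the 100-1000 FIT range quoted for commercial chips.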
8
Example SRAM-Based FPGA System
Table (continued)
Notes: 1. Example (1) was tested at Denver using
SpaceRad 4.5, a radiation effects prediction
program. Source: Actel. 2. None of the systems
uses any protection.
9
Radiation Mechanisms for Silicon (1)
  • Alpha particles are emitted when the nucleus of
    an unstable isotope decays to a lower energy
    state.
  • (the dominant cause of soft errors in DRAMs in
    the 1970s)
  • Uranium and thorium have the highest activity
    among naturally occurring radioactive materials.
  • In the terrestrial environment, major sources of
    radioactive impurities are lead-based isotopes in
    solder bumps of the flip-chip technology, gold
    used for the bond wires and lid plating, aluminum
    in ceramic packages, lead-frame alloys and
    interconnect metallization.

With carefully selected materials, the effect of
this mechanism can be greatly reduced.
10
Radiation Mechanisms for Silicon (2)
  • High-energy (> 1 MeV) neutrons from cosmic
    radiation induce soft errors in semiconductor
    devices via secondary ions produced by
    neutron reactions with silicon nuclei.
  • Cosmic rays, which are of galactic origin, react
    with the Earth's atmosphere to produce complex
    cascades of secondary particles.
  • Neutrons are the most likely cosmic radiation
    source of SEUs in deep-submicron
    semiconductors at terrestrial altitudes. The
    neutron flux depends on altitude: its density
    increases with height above sea level.

MeV: million electron volts
Today, neutrons are the major cause among all
failure mechanisms.
11
Radiation Mechanisms for Silicon (3)
  • Secondary radiation induced by the
    interaction of cosmic ray neutrons with boron is
    the third significant source of ionizing
    particles in electronic systems.
  • It results from low-energy cosmic neutron
    interactions with the isotope boron-10 (10B).
    10B is commonly used as a p-type dopant for
    junction formation and in IC packages.

Baumann et al., IEEE Trans. Device and Materials
Reliability, vol. 1, no. 1, pp. 17-22, 2001.
This mechanism can be greatly reduced or
eliminated by removing the sources of 10B.
12
Single Event Transient (SET)
  • An SET is caused by the charge generated when
    a high-energy particle passes through a
    sensitive node.
  • Each SET has unique characteristics (polarity,
    waveform, amplitude, duration, etc.) that
    depend on the particle impact location, particle
    energy, device technology, device supply voltage
    and output load.
  • Off transistors struck in the junction area by a
    heavy ion with sufficiently high LET are the most
    sensitive to SEU.
  • Specifically, the sensitive regions are the
    channel region of the off NMOS transistor and the
    drain region of the off PMOS transistor.

Linear Energy Transfer (LET) is a measure of the
energy transferred to the device per unit length
as an ionizing particle travels through a
material.
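To connect LET with the charge a strike generates, the Python sketch below converts LET expressed in MeV·cm²/mg into charge per micrometre of track in silicon. It relies on standard textbook constants that are not given on the slide (about 3.6 eV per electron-hole pair and a silicon density of 2.33 g/cm³); with these assumed values, 1 pC/µm corresponds to roughly 96 MeV·cm²/mg. The function names are hypothetical.

```python
SI_DENSITY_MG_PER_CM3 = 2330.0   # silicon density, 2.33 g/cm^3 (assumed standard value)
EV_PER_PAIR = 3.6                # mean energy per electron-hole pair in Si (assumed)
ELECTRON_CHARGE_C = 1.602176634e-19
CM_PER_UM = 1e-4


def let_to_pc_per_um(let_mev_cm2_per_mg: float) -> float:
    """Charge generated per micrometre of track (pC/um) for a given LET in silicon."""
    # LET * density gives MeV/cm; scale to MeV/um, then to eV/um.
    ev_per_um = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3 * CM_PER_UM * 1e6
    pairs_per_um = ev_per_um / EV_PER_PAIR
    return pairs_per_um * ELECTRON_CHARGE_C * 1e12   # coulombs -> picocoulombs


def deposited_charge_pc(let_mev_cm2_per_mg: float, path_um: float) -> float:
    """Total charge (pC) generated along a straight path of length path_um."""
    return let_to_pc_per_um(let_mev_cm2_per_mg) * path_um


if __name__ == "__main__":
    print(f"LET for 1 pC/um: {1.0 / let_to_pc_per_um(1.0):.1f} MeV*cm^2/mg")
    print(f"LET 10, 2 um path: {deposited_charge_pc(10.0, 2.0):.3f} pC")
```

The result is the charge generated along the track; how much of it is actually collected at a node depends on the drift and diffusion processes described on the next slide.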
13
More Details of SET Generation
  • (a) Along its path, the particle
    produces a dense radial distribution of
    electron-hole pairs.
  • (b) Outside the depletion region, the
    non-equilibrium charge distribution induces a
    temporary funnel-shaped potential distortion
    along the trajectory of the event (drift
    component).
  • (c) After the funnel collapses, the diffusion
    component dominates the collection process until
    all excess carriers have been collected,
    recombined, or diffused away from the junction
    area.
  • (d) Current versus time, illustrating charge
    collection and SET generation.

14
Analytical Model of SET
  • The time constants depend strongly on the type of
    ion, its initial energy and the properties of the
    specific technology.
  • An approximate analytical model for ion-track
    charge collection has a double-exponential form.
    It gives an induced current with a rapid rise time
    but a more gradual fall time.

Typical values are approximately 1.64 × 10^-10 s
for the fall-time constant and 5.10 × 10^-11 s for
the rise-time constant.
Experimental results from NASA JPL
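The double-exponential model can be sketched numerically in Python. The slide quotes only the two time constants, so the expression used here, I(t) = Q / (tau_f - tau_r) * (exp(-t/tau_f) - exp(-t/tau_r)), normalized so that the pulse integrates to the collected charge Q, is an assumption; tau_f and tau_r are hypothetical names for the fall and rise constants, and the function names are also hypothetical.

```python
import math

TAU_FALL_S = 1.64e-10   # slow (fall) time constant quoted on the slide
TAU_RISE_S = 5.10e-11   # fast (rise) time constant quoted on the slide


def set_current(t_s, q_c, tau_fall=TAU_FALL_S, tau_rise=TAU_RISE_S):
    """SET current (A) at time t_s (s) for total collected charge q_c (C).

    Assumed double-exponential form, normalized so that its integral over
    t >= 0 equals q_c. Returns 0 before the strike (t_s < 0).
    """
    if t_s < 0:
        return 0.0
    scale = q_c / (tau_fall - tau_rise)
    return scale * (math.exp(-t_s / tau_fall) - math.exp(-t_s / tau_rise))


def peak_time(tau_fall=TAU_FALL_S, tau_rise=TAU_RISE_S):
    """Time (s) at which the current peaks, found by setting dI/dt = 0."""
    return (tau_fall * tau_rise / (tau_fall - tau_rise)) * math.log(tau_fall / tau_rise)


def integrate_charge(q_c, t_end_s, steps=100_000):
    """Trapezoidal integral of set_current over [0, t_end_s] as a sanity check."""
    dt = t_end_s / steps
    total = 0.5 * (set_current(0.0, q_c) + set_current(t_end_s, q_c))
    for k in range(1, steps):
        total += set_current(k * dt, q_c)
    return total * dt


if __name__ == "__main__":
    q = 0.65e-12   # 0.65 pC, the charge used in the inverter example
    tp = peak_time()
    print(f"peak at {tp * 1e12:.1f} ps, I_peak = {set_current(tp, q) * 1e3:.2f} mA")
    print(f"integrated charge = {integrate_charge(q, 30 * TAU_FALL_S) * 1e12:.4f} pC")
```

With Q = 0.65 pC, this assumed model peaks at about 2.3 mA roughly 86 ps after the strike, and the numerical integral recovers the 0.65 pC it was normalized to.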
15
SET in CMOS Inverter
For example, in ami12 technology, when the
output load capacitance is 100 fF and the
cumulative collected charge is 0.65 pC, the
first-order amplitude of the voltage pulse is
Q/C = 0.65 pC / 100 fF = 0.65 × 10^-12 C / (100 × 10^-15 F) = 6.5 V.
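The pulse-amplitude estimate is simply ΔV = Q/C. The Python sketch below evaluates it for the slide's numbers and also computes the corresponding critical charge Qcrit = C · Vsw for a hypothetical switching threshold Vsw; the threshold value and the function names are assumptions, not from the slide. Like the slide's estimate, it is a first-order model that ignores the restoring current of the conducting transistor.

```python
def pulse_amplitude_v(q_coll_c: float, c_load_f: float) -> float:
    """First-order SET amplitude (V): all collected charge dumped on the load capacitance."""
    return q_coll_c / c_load_f


def critical_charge_c(c_load_f: float, v_switch_v: float) -> float:
    """Charge (C) needed to move the node by v_switch_v under the same Q/C model."""
    return c_load_f * v_switch_v


if __name__ == "__main__":
    q = 0.65e-12   # 0.65 pC collected charge (from the slide)
    c = 100e-15    # 100 fF load capacitance (from the slide)
    print(f"dV = {pulse_amplitude_v(q, c):.2f} V")
    v_sw = 2.5     # hypothetical switching threshold, e.g. half of a 5 V supply
    print(f"Qcrit = {critical_charge_c(c, v_sw) * 1e12:.3f} pC")
```

Raising C or the switching threshold raises Qcrit, which is the same lever the radiation-hardened process techniques use when they "increase the critical charge".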
16
Soft Error Mitigation Techniques
  • Soft error tolerant techniques can be
    classified into two types: recovery and
    prevention.
  • Recovery: recovering from an error after it
    occurs.
  • These include on-line recovery mechanisms,
    fault-tolerant computing, ECC/parity checking,
    redundancy, etc.
  • Prevention: methods that protect microchips
    from soft errors before they occur.
  • The need for a recovery mechanism stems from the
    fact that prevention techniques may not be enough
    for contemporary microchips.
  • Soft errors are not the only reason why computer
    systems need to resort to a recovery procedure.
    Random errors due to noise, unreliable
    components, and coupling effects may also require
    the recovery mechanism.

17
Some Mitigation Techniques
  • Prevention Techniques
  • Purify the Fabrication Material
  • Uranium and thorium impurities have been reduced
    below one hundred parts per trillion for high
    reliability.
  • To eliminate 10B, alternative insulators that
    don't contain boron are used.
  • Radiation Hardened Process Technologies
  • SER performance can be greatly improved by
    adapting the process technology either to reduce
    the collected charge or to increase the critical
    charge.
  • Specific methods include additional well
    isolation and replacing bulk silicon with SOI.

A 10x reduction in SER over conventional bulk
devices has been achieved with fully depleted SOI
substrates. However, SOI is more expensive, and
parasitic bipolar action limits further reduction
of SER.
18
Selected Mitigation Techniques
  • Recovery Techniques
  • Redundancy
  • Achieves higher system reliability at the cost
    of extra time, space, or both.
  • Classic design: Triple Modular Redundancy (TMR)
    with a majority voter.
  • New design: time redundancy based on a
    C-element gate that compares two samples of the
    combinational primary outputs, taken at t0 and
    t0 + d.
  • Error Detection and Correction Code (EDAC)
  • Simple solution for memory: add a parity bit to
    each memory word.
  • In most situations, it must be combined with a
    system-level approach for error recovery.

S. Mitra, M. Zhang, S. Waqas, N. Seifert, B.
Gill, and K. S. Kim, "Combinational Logic Soft
Error Correction," in Proc. International Test
Conference, 2006, pp. 1-9.
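At the bit level, the redundancy and EDAC ideas above reduce to small Boolean functions. The Python sketch below is an illustrative behavioral model, not the circuit from the cited paper: a bitwise TMR majority voter, a Muller C-element that changes its output only when both of its inputs agree (the property the time-redundancy scheme relies on when comparing the samples at t0 and t0 + d), and an even-parity bit for a memory word. All names are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies of a word."""
    return (a & b) | (b & c) | (a & c)


class CElement:
    """Behavioral Muller C-element: output follows the inputs only when they agree."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, x: int, y: int) -> int:
        if x == y:          # both samples agree: pass the value through
            self.out = x
        return self.out     # samples disagree (e.g. a transient): hold previous value


def even_parity(word: int) -> int:
    """Parity bit that makes the total number of 1s (word plus parity) even."""
    return bin(word).count("1") % 2


if __name__ == "__main__":
    good = 0b1011_0110
    upset = good ^ 0b0000_1000                    # single-bit upset in one copy
    print(tmr_vote(good, upset, good) == good)    # True: voter masks the upset

    c_elem = CElement(initial=1)
    print(c_elem.update(1, 0))   # samples disagree: holds 1
    print(c_elem.update(0, 0))   # samples agree: output becomes 0

    print(even_parity(upset) != even_parity(good))   # True: parity detects the flip
```

Parity only detects a single flipped bit; it cannot say which bit flipped, which is why, as the slide notes, it must be combined with a system-level recovery approach.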
19
A Case Study: IBM eServer z990 System
  • z990 configuration
  • The z990 contains 4 pluggable nodes connected
    through a planar board.
  • Each node contains up to 64 GB of physical memory
    and 32 MB of L2 cache, for a system capacity of
    256 GB of memory and 128 MB of L2 cache.
  • Error tolerance techniques used:
  • Extensive use of ECC and parity with retry on
    data and controls
  • Full SRAM ECC and parity protection
  • Microprocessor mirroring
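ECC goes beyond parity by locating and correcting the flipped bit. As an illustration only (the slide does not say which codes the z990 uses), the Python sketch below implements a textbook Hamming(7,4) single-error-correcting code and shows it repairing one upset bit. The function names are hypothetical.

```python
def hamming74_encode(data: int) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (list index 0 = position 1)."""
    d = [(data >> i) & 1 for i in range(4)]        # d[0] is the least significant bit
    code = [0] * 8                                 # positions 1..7 used; index 0 unused
    code[3], code[5], code[6], code[7] = d[0], d[1], d[2], d[3]
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def hamming74_decode(word: list[int]) -> tuple[int, int]:
    """Correct up to one flipped bit; return (data, error_position or 0)."""
    code = [0] + list(word)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    if syndrome:
        code[syndrome] ^= 1                        # the syndrome is the faulty position
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, syndrome


if __name__ == "__main__":
    cw = hamming74_encode(0b1011)
    cw[4] ^= 1                                     # simulate an SEU at position 5
    print(hamming74_decode(cw))                    # (11, 5)
```

The three parity checks form a syndrome that directly names the upset position, so a single-bit SEU in a stored word can be corrected on read without a retry.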

20
Conclusion
  • SER in logic and memory chips will continue to
    increase as devices become more sensitive to soft
    errors at sea level.
  • Open soft error issues:
  • How should EDA tools handle soft error hardening?
  • Analysis of radiation mechanisms (too complex to
    be comprehensive)
  • Soft error rate analysis for logic
  • Error mitigation methods

21
Useful References and Further Readings
  • Single Event Phenomena (Messenger and Ash,
    1993)
  • Ionizing Radiation Effects in MOS Devices and
    Circuits (Ma and Dressendorfer, 1989)
  • Handbook of Radiation Effects (A.
    Holmes-Siedle and L. Adams, 1993)
  • Fault-Tolerance Techniques for SRAM-Based
    FPGAs (F. L. Kastensmidt, L. Carro, and
    R. Reis, 2006)
  • Test methods and standards: JESD89, JESD89A,
    JESD89-2
  • Journals: IEEE Trans. on Nuclear Science, IEEE
    Trans. on Reliability
  • NASA Goddard's test group
  • http://radhome.gsfc.nasa.gov/radhome/papers/seeca5.htm
  • NASA Space Environment and Effects Program
  • http://see.msfc.nasa.gov/

22
Thank You . . .