Dealing with Multiple Simultaneous Faults in Future Technologies

Carlos A. L. Lisbôa, Semana Acadêmica PPGC/UFRGS, 17/10/2006

1
Dealing with Multiple Simultaneous Faults in
Future Technologies
  • PhD candidate: Carlos Arthur Lang Lisbôa
  • Advisor: Luigi Carro

2
Why Multiple Simultaneous Faults ?
  • Future technologies (2010 and beyond)
    • very small transistors, with fewer electrons
      forming the channel (→ SETs)
    • transient pulses caused by radiation strikes will
      last longer than the propagation delays of gates
      and the cycle times
    • devices will be more sensitive to the effects of
      electromagnetic noise, neutrons and alpha
      particles

3
Single Event Upset Origin
1 0 1 0 0 0 0 1
0 1 0 1 1 1 1 0
1 1 0 1 1 1 1 0
4
Why Should One Study Multiple Faults ?
  • Changes in paradigm
    • Gates will behave statistically, producing
      correct outputs only a fraction of the time
    • Faster devices → cycle times shorter than the
      duration of transient pulses

6
How to Deal with Multiple Faults ?
  • New paradigm: multiple simultaneous faults
    • new fault tolerance techniques will be required
      (TMR will no longer provide enough protection)
  • How to deal with this problem ?
    • new materials and manufacturing technologies must
      be developed
    • OR
    • new design approaches must be taken

7
How to Deal with Multiple Faults ?
  • New paradigm: multiple simultaneous faults
    • new fault tolerance techniques will be required
      (TMR will no longer provide enough protection)
  • How to deal with this problem ?
    • new design approaches must be taken (our bet !)

8
Research Evolution - Overview
  • 2004: Stochastic Operators (IOLTS 04)
  • 2004: Bit Stream Operators (DFT 04, WDES 04)
  • 2005: TMR and Analog Voter (ETS 05, SBCCI 05)
  • 2005: Research Report (SRC 2005 TechCon)
  • 2006: Research Report (DATE 06 PhD Forum)
  • 2006: MemProc (LATW 06, ETS 06)
  • 2006: Majority Logic (DFT 06)
  • 2006: Online Hardening (DFT 06)
  • 2007: Low cost redundancy and Statistical
    Computation (VTS 07, submitted)
9
Published Papers
  • Lisbôa, C. and Carro, L., Arithmetic Operators
    Robust to Multiple Simultaneous Upsets, 10th
    IEEE International Online Test Symposium - IOLTS
    2004, IEEE Computer Society, Funchal, Madeira
    Island, Portugal, July 2004.
  • Lisbôa, C. and Carro, L., Highly Reliable
    Arithmetic Multipliers for Future Technologies,
    in Proceedings of the International Workshop on
    Dependable Embedded Systems - WDES 2004 - in
    conjunction with the 23rd International Symposium
    on Reliable Distributed Systems - SRDS 2004, pp.
    13-18. Edited by Becker, L. B. and Kaiser, J.,
    Florianópolis, October 17, 2004.
  • Lisbôa, C. and Carro, L., Arithmetic Operators
    Robust to Multiple Simultaneous Upsets, in
    Proceedings of the 19th IEEE International
    Symposium on Defect and Fault Tolerance in VLSI
    Systems - DFT 2004, pp. 289-297,
    ISBN 0-7695-2241-6. IEEE Computer Society, New
    York, October 2004.

10
Published Papers
  • Lisbôa, C. A. L., Carro, L. and Cota, E., RobOps
    - Arithmetic Operators for Future Technologies,
    10th European Test Symposium - ETS 2005, Tallinn,
    Estonia, May 2005.
  • Lisbôa, C. A. L., Schüler, E. and Carro, L.,
    Going Beyond TMR for Protection Against Multiple
    Faults, in Proceedings of the 18th Symposium on
    Integrated Circuits and Systems Design - SBCCI
    2005, September 2005.
  • Rhod, E., Lisbôa, C. A. L. and Carro, L., Using
    Memory to Cope with Simultaneous Transient
    Faults, in Proceedings of the 7th Latin-American
    Test Workshop - LATW 2006, pp. 151-156, IEEE
    Computer Society, New York, March 2006.

11
Published Papers
  • Rhod, E., Lisbôa, C. A. L., Michels, Á. and
    Carro, L., Fault Tolerance Against Multiple SEUs
    using Memory-Based Circuits to Improve the
    Architectural Vulnerability Factor, in Informal
    Digest of Papers of the 11th IEEE European Test
    Symposium - ETS 2006, pp. 229-234, IEEE Computer
    Society, New York, May 2006.
  • Michels, Á., Petroli, L., Lisbôa, C. A. L.,
    Kastensmidt, F. and Carro, L., SET Fault Tolerant
    Combinational Circuits Based on Majority Logic,
    in Proceedings of the 21st IEEE International
    Symposium on Defect and Fault Tolerance in VLSI
    Systems - DFT 2006, pp. 345-352, IEEE Computer
    Society, Los Alamitos, CA, October 2006.
  • Lisbôa, C. A. L., Carro, L., Sonza Reorda, M.,
    and Violante, M., Online Hardening of Programs
    against SEUs and SETs, in Proceedings of the
    21st IEEE International Symposium on Defect and
    Fault Tolerance in VLSI Systems - DFT 2006, pp.
    280-288, IEEE Computer Society, Los Alamitos, CA,
    October 2006.

12
Research Approaches - 2004 / 2005
  • Use of stochastic operators
  • Use of bit stream operators
  • Ensuring voter reliability to use n-MR while
    dealing with multiple simultaneous faults

13-19
Research Evolution - 2004 / 2005
  • Stochastic Operators (IOLTS 2004): OK for some DSP
    applications
  • Looking for more speed → Bit Stream Operators
    (DFT 2004, WDES 2004): small footprint and fast
  • Looking for tolerant converter → TMR and Analog
    Voter (ETS 2005, SBCCI 2005): tolerant to multiple
    faults in n-MR solutions
  • Research Report (SRC 2005 TechCon)

21
Research approach - 2006 / 2007
  • cooperation with peers
  • use of memory for computation
  • analog voter majority logic
  • use of an I-IP to harden instructions
  • low cost redundancy using statistical parallel
    computation

22-27
Research Evolution - 2006 / 2007
  • Research Report (DATE 06 PhD Forum)
  • MemProc (LATW 06, ETS 06)
  • Majority Logic (DFT 06)
  • Online Hardening (DFT 06)
  • Low cost redundancy and Statistical Computation
    (VTS 07, submitted)

29
Current research - motivation
  • future technologies
    → faster devices
    → transient pulse duration does not scale in
      proportion to speed
    → transient pulses will last longer than one
      cycle
  • techniques relying on time redundancy will fail
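
The failure of time redundancy can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not the authors' experiment: it treats a transient as an interval during which a one-bit result is inverted, and the hypothetical helper `time_redundancy_detects` recomputes the result one cycle later and compares the two values.

```python
def sampled_value(correct, t, pulse_start, pulse_len):
    """Value seen at time t: the transient inverts the bit while it is active."""
    hit = pulse_start <= t < pulse_start + pulse_len
    return correct ^ 1 if hit else correct


def time_redundancy_detects(pulse_start, pulse_len, cycle=1.0, correct=1):
    """Compute twice, one cycle apart, and flag an error when the results differ."""
    first = sampled_value(correct, 0.5 * cycle, pulse_start, pulse_len)
    second = sampled_value(correct, 1.5 * cycle, pulse_start, pulse_len)
    return first != second


# Pulse shorter than one cycle: only the first computation is corrupted.
short_pulse = time_redundancy_detects(pulse_start=0.4, pulse_len=0.3)
# Pulse longer than one cycle: both computations see the same wrong value.
long_pulse = time_redundancy_detects(pulse_start=0.4, pulse_len=1.5)
print(short_pulse, long_pulse)
```

In this model the long pulse corrupts both computations identically, so the comparison passes and the error goes unnoticed.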

30
Current research - motivation
  • alternative approach
    → space redundancy
    → area overhead of current solutions ≥ 100%
    → small granularity does not provide low overhead
  • (what can one do with 50% of a MOSFET ?)

31
Current research - motivation
  • proposed solution
    → fingerprinting
    → parallel processing on a subset of the possible
      inputs
    → small transient fault probability (desired: 0)
  • alternative approach
    → space redundancy
    → area overhead of current solutions ≥ 100%
    → small granularity does not provide low overhead
  • (what can one do with 50% of a MOSFET ?)

32
Current research - focus
  • use of low cost redundancy and statistical
    computation to cope with transient faults

33
Sample application
  • Freivalds' matrix multiplication correctness check
  • given matrices A and B, n × n
  • given an algorithm that calculates C = A × B
  • goal: check whether the algorithm performs correctly
    by executing thousands of multiplications and
    comparing the results
  • naive solution: calculate again and compare →
    O(n³)

34
Sample application
  • Freivalds' technique
  • 1. generate a random vector r, with values from
    {0,1}
  • 2. compute vector Cr = C × r → O(n²)
  • 3. compute vector ABr = A × (B × r) → O(n²)
  • 4. if C ≠ A × B, then Pr[ABr = Cr] ≤ 1/2
  • After k independent repetitions of steps 1, 2 and
    3:
  • Pr[ABr = Cr] ≤ 1/2^k
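
The steps above can be sketched in a few lines of Python. This is a minimal software illustration, not the hardware checker discussed in the talk; the names `mat_vec` and `freivalds_check`, the 2 × 2 matrices and the default k = 20 are all assumptions chosen for the example.

```python
import random


def mat_vec(m, v):
    """Multiply an n x n matrix (list of rows) by a vector in O(n^2)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def freivalds_check(a, b, c, k=20, rng=None):
    """Return False as soon as one round proves that C != A x B.

    Returns True when all k rounds agree. A wrong C passes a single
    round with probability at most 1/2, hence at most 1/2**k overall.
    """
    rng = rng or random.Random()
    n = len(a)
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]   # step 1
        cr = mat_vec(c, r)                          # step 2
        abr = mat_vec(a, mat_vec(b, r))             # step 3
        if abr != cr:
            return False
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
good_c = [[19, 22], [43, 50]]   # the true product A x B
bad_c = [[19, 22], [43, 51]]    # one corrupted element

print(freivalds_check(a, b, good_c))
print(freivalds_check(a, b, bad_c))
```

Each round costs three matrix-vector products, so k rounds stay at O(k·n²) instead of the O(n³) of a full recomputation.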

35
Sample application
  • Our extension of Freivalds' technique
  • 1. generate a random vector r, with values from
    {0,1}
  • 2. generate a vector rc with rc_i = not(r_i) for
    i = 1..n
  • 3. compute Cr = C × r and Crc = C × rc
  • 4. compute ABr = A × (B × r) and ABrc = A × (B ×
    rc)
  • 5. if ABr ≠ Cr OR ABrc ≠ Crc, then
  • Pr[ABr ≠ Cr] = 1
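
A possible software rendering of the extended check is sketched below; the function names and test matrices are assumptions, and this is not the circuit implementation from the talk. One property can be verified directly: because r + rc is the all-ones vector, the pair (r, rc) is guaranteed to expose any error for which some row of A × B − C has a nonzero sum, which includes every single corrupted element of C.

```python
import random


def mat_vec(m, v):
    """Multiply an n x n matrix (list of rows) by a vector in O(n^2)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def complement_check(a, b, c, rng=None):
    """One round of the check using r and its bitwise complement rc."""
    rng = rng or random.Random()
    n = len(a)
    r = [rng.randint(0, 1) for _ in range(n)]       # step 1
    rc = [1 - x for x in r]                         # step 2
    ok_r = mat_vec(a, mat_vec(b, r)) == mat_vec(c, r)
    ok_rc = mat_vec(a, mat_vec(b, rc)) == mat_vec(c, rc)
    return ok_r and ok_rc                           # False means "error detected"


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
good_c = [[19, 22], [43, 50]]

# Corrupt each element of C in turn: a single round always reports the error.
for i in range(2):
    for j in range(2):
        bad_c = [row[:] for row in good_c]
        bad_c[i][j] += 1
        print((i, j), complement_check(a, b, bad_c))
```

Errors whose rows all sum to zero (for example +1 and −1 in the same row) can still escape a single round, so this sketch does not claim detection with probability 1 for arbitrary multiple errors.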

36
Sample Implementation
  • matrix multiplier with checker
  • application of Freivalds technique

37
Sample Implementation
Area overhead ( of gates)
38
Sample implementation
Time overhead ( of instructions)
39
Sample implementation
Fault injection results
40
PhD program requirements
  • 36 credits
  • qualifying examination
  • proficiency exams in 2 foreign languages
  • academic week seminar
  • Thesis proposal: February 2007
  • Thesis presentation: December 2007

41
Questions ?
43
Using Stochastic Operators
  • SEU induced transient errors are of random nature
  • Stochastic operators rely on randomness to
    produce approximate results
  • The injection of random faults in the input
    signals processed by stochastic operators did not
    impact the precision of the results
  • Several application areas (DSP) can deal with
    approximate values and still produce acceptable
    results (outputs)
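
To make the idea concrete, here is a sketch of the textbook stochastic-computing multiplier, in which a value in [0, 1] is encoded as the probability of a 1 in a random bit stream and an AND gate multiplies two independent streams. The slides do not describe the internal design of the authors' operators, so the encoding, the stream length and the 1% fault rate below are assumptions.

```python
import random


def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a random bit stream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def flip_bits(stream, rate, rng):
    """Inject random faults: invert each bit independently with probability rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in stream]


def stochastic_multiply(xs, ys):
    """The AND of two independent streams encodes the product of their values."""
    return [x & y for x, y in zip(xs, ys)]


def value(stream):
    """Decode a stream: the fraction of ones."""
    return sum(stream) / len(stream)


rng = random.Random(42)
n = 20000
xs = to_stream(0.5, n, rng)
ys = to_stream(0.4, n, rng)

clean = value(stochastic_multiply(xs, ys))
faulty = value(stochastic_multiply(flip_bits(xs, 0.01, rng),
                                   flip_bits(ys, 0.01, rng)))
print(clean, faulty)  # both estimates should lie near 0.5 * 0.4 = 0.2
```

Since the result is already an estimate, a small rate of random bit flips only shifts it slightly, which is the behaviour the bullets above describe.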

44
Using Stochastic Operators
  • Benefit: reduced area of the operators

46
Using Bit Stream Operators
  • Computation principles similar to those of the
    stochastic adder and multiplier
  • Operators can produce bit streams which represent
    the exact results of the operation
  • Redundancy is added to the bit streams so that
    they withstand multiple bit flips

Adding robustness to the bit stream through
redundancy
48
Using Bit Stream Operators
  • Computation principles similar to those of the
    stochastic adder and multiplier
  • Operators can produce bit streams which represent
    the exact results of the operation
  • Redundancy is added to the bit streams so that
    they withstand multiple bit flips
  • Conversion of bit streams to binary coded values
    is delayed as much as possible, and conversion
    circuits must use TMR or n-MR for protection
    against faults
  • Issues to be further investigated: size of the bit
    streams and area of the conversion circuits
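
The slides do not give the actual stream format, so the following is only a hypothetical illustration of how an exact bit stream with redundancy could behave. It assumes a unary encoding (the value is the number of ones), addition by concatenation, and a repetition factor `repeat`; none of these choices is taken from the source.

```python
def encode(value, width, repeat):
    """Unary stream: `value` ones out of `width` slots, each slot repeated `repeat` times."""
    return [1] * (value * repeat) + [0] * ((width - value) * repeat)


def add(stream_a, stream_b):
    """Exact addition: concatenating two streams adds their counts of ones."""
    return stream_a + stream_b


def decode(stream, repeat):
    """Count the ones and round to the nearest multiple of `repeat`."""
    return (sum(stream) + repeat // 2) // repeat


REPEAT = 5  # assumed redundancy factor; tolerates up to 2 flipped bits in total
total = add(encode(3, 8, REPEAT), encode(4, 8, REPEAT))

# Inject two simultaneous bit flips into the result stream.
damaged = total[:]
damaged[0] ^= 1
damaged[1] ^= 1

print(decode(total, REPEAT), decode(damaged, REPEAT))
```

The conversion back to a binary value happens only in `decode`, which mirrors the point above that conversion is delayed as long as possible. The price is the stream length, which grows with both the operand range and the redundancy factor.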

49
What is Wrong with TMR ?
  • TMR protects only against single faults in one of
    the modules

[Diagram: Modules 1, 2 and 3 each feed a correct output to the voter, which delivers a correct output]
50
What is Wrong with TMR ?
  • TMR protects only against single faults in one of
    the modules

[Diagram: one module is faulty; Modules 1 and 3 feed correct outputs to the voter, which still delivers a correct output]
51
What is Wrong with TMR ?
  • TMR does not protect against double faults in
    different modules

[Diagram: faults in two different modules produce wrong outputs, so the voter delivers a wrong output]
52
What is Wrong with TMR ?
  • When a single fault occurs in the voter circuit,
    the voter output may be wrong

[Diagram: Modules 1, 2 and 3 each feed a correct output to the voter, which delivers a correct output]
53
What is Wrong with TMR ?
  • When a single fault occurs in the voter circuit,
    the voter output may be wrong

[Diagram: Modules 1, 2 and 3 each feed a correct output to the voter, but with a fault in the voter its output is no longer guaranteed to be correct]
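
The limits of TMR shown on the last few slides can be checked by brute force. The sketch below assumes an ideal, fault-free majority voter and one-bit module outputs, and enumerates every combination of faulty modules. The function names are illustrative, and a fault in the voter itself is outside this model.

```python
from itertools import product


def majority(bits):
    """Ideal majority voter over an odd number of one-bit module outputs."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def masked_fault_counts(n_modules, correct=1):
    """Map each number of faulty modules to whether the voter is always right."""
    always_right = {}
    for faults in product([0, 1], repeat=n_modules):
        outputs = [correct ^ f for f in faults]   # a fault inverts the module output
        ok = majority(outputs) == correct
        k = sum(faults)
        always_right[k] = always_right.get(k, True) and ok
    return always_right


tmr = masked_fault_counts(3)
five_mr = masked_fault_counts(5)
print(tmr)      # {0: True, 1: True, 2: False, 3: False}
print(five_mr)
```

With three modules a single fault is masked but a double fault is not; five modules mask up to two simultaneous faults, at the cost of more area and a voter that remains a single point of failure.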
57
Making TMR (n-MR) more reliable
  • Known solutions imply:
    • area, performance and / or power penalties
    • deadlock: how to protect the output generator ?
  • Proposed solution:
    • use TMR to cope with single faults in the modules
    • replace the digital voter by an analog voter that
      uses a comparator to generate the output and
      can tolerate some noise while still producing
      the correct result
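
A behavioural sketch of such a voter follows. The slides state only that a comparator generates the output; averaging the module output voltages and comparing the mean with half the supply voltage is an assumption made here for illustration, as are the function name `analog_vote` and the 3.3 V supply.

```python
def analog_vote(voltages, vdd=3.3):
    """Average the module output voltages and compare the mean with VDD / 2."""
    mean = sum(voltages) / len(voltages)
    return 1 if mean > vdd / 2 else 0


# All modules drive logic 1, with some noise on two of the lines.
noisy = analog_vote([3.3, 2.9, 3.1])
# TMR with one faulty module stuck at 0 V.
single_fault = analog_vote([3.3, 0.0, 3.3])
# 5-MR with two simultaneous faulty modules.
double_fault_5mr = analog_vote([3.3, 3.3, 3.3, 0.0, 0.0])
# TMR with two faulty modules: the majority is lost.
double_fault_tmr = analog_vote([3.3, 0.0, 0.0])

print(noisy, single_fault, double_fault_5mr, double_fault_tmr)
```

Because the decision is taken on an averaged analog level, moderate noise on individual inputs shifts the mean without crossing the threshold. This model says nothing about faults inside the comparator itself.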

58
The Analog Voter
59
Minimum Area Comparator
Injection of faults in the comparator (*)
(*) using CMOS 0.35 µm
60
Electrical Simulation: Multiple Faults (SPICE and
CMOS 0.35 µm)
62
Dealing with Multiple Simultaneous Faults: n-MR
The Analog Voter with 5 Inputs (for 5-MR)
Simulations with injection of 2 simultaneous
faults also succeeded
63
The Analog Voter ... Oops !