Computational Limits of Reliability Evaluation

Learn more at: http://web.eecs.umich.edu

1
Computational Limits of Reliability Evaluation

Smita Krishnaswamy, George F. Viamontes,
Igor L. Markov, and John P. Hayes
Univ. of Michigan, Advanced Computer Architecture Lab
Los Alamos National Laboratory

2
Motivation

Problem addressed:
Given a probabilistic characterization of gate behavior, propagate single-event upset (SEU) information to whole circuits
E.g., compute the overall error rate, averaged over all inputs (perhaps drawn from an input distribution)
E.g., find worst-case/best-case inputs
How difficult are such computations (exact/approximate)?
What information can realistically be computed?
How much accuracy can be achieved?

3
Loss of Accuracy Seems Inevitable

Computing/tracking everything is too hard
Accurate models/data are hard to obtain
Hard computations required
Optimization is at least as hard as evaluation
Applications matter: rough estimation may not need as much accuracy as optimization
E.g., given a limited area budget, which gates to harden?
Sensitivity: fidelity versus accuracy
Approximate modeling vs. approximate computation
Skip complicated models or skip hard computations?

4
Discussion

Minimal modeling of gates
Probability of being hit by a particle with Qcrit
Dependencies on input values
Simplified computation
Consider one path at a time
Consider one input at a time (sampling)
In this work: computational aspects
Is it possible to handle all paths and all inputs?
Can we get enough fidelity to improve reliability?

5
Prior Work

Factors for latching transient errors
Electrical, logical, and latching-window masking [Shivakumar 2002]
Calculation of transient error probabilities for gates
Parameters include gate area, neutron flux, switching voltage, altitude [Mohanram & Touba 2002]
In SERA, error rates of circuits are approximated with user-supplied inputs [Zhang & Shanbhag, ICCAD 2004]
Fault-tolerant architectures
NAND-multiplexing [von Neumann 1956]
Reliability improvement by selectively adding redundancy, TMR [Mohanram & Touba, ITC03]
However, applying these methods still requires careful, accurate analysis

6
Probabilistic Transfer Matrix

Ideal transfer matrix (ITM): the function of a correct gate expressed as a matrix
We perturb the 0s and 1s and interpret them as probabilities
Probabilistic Transfer Matrix (PTM)
A matrix whose (j,k)th entry represents $P(\text{output} = k \mid \text{input} = j)$
[Levin, Engin. Cybernetics 1964]
[Patel, Markov & Hayes, IWLS03]
Valid PTMs are stochastic matrices
[Figure: transfer matrix of a two-input gate, with rows indexed by input values 00, 01, 10, 11 and columns by output values 0, 1; one entry is labeled P(output = 1 | input = 10)]
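As a minimal sketch of the definitions above, the following NumPy snippet builds the ITM of a two-input NAND gate and perturbs it into a PTM. The choice of NAND, the error probability `eps`, the symmetric flip model, and the helper names `itm_from_function` and `perturb` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def itm_from_function(fn, n_inputs):
    """ITM of a single-output gate: row = input combination, column = output value."""
    itm = np.zeros((2 ** n_inputs, 2))
    for row in range(2 ** n_inputs):
        # Bit k of the row index (leftmost first) is the value of input k.
        bits = [(row >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        itm[row, fn(*bits)] = 1.0
    return itm

def perturb(itm, eps):
    """Assumed error model: the output flips with probability eps for every input."""
    return (1.0 - eps) * itm + eps * (1.0 - itm)

nand_itm = itm_from_function(lambda a, b: 1 - (a & b), 2)
nand_ptm = perturb(nand_itm, eps=0.1)

# Entry (j, k) is P(output = k | input = j); row 2 is input 10.
p_out1_given_10 = nand_ptm[2, 1]
# A valid PTM is stochastic: every row sums to 1.
rows_sum_to_one = bool(np.allclose(nand_ptm.sum(axis=1), 1.0))
```

Input-dependent error behavior would simply use a different probability in each row.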
7
Error Representation

PTMs can describe different error behavior for
each input combination
Indeed, the incidence of errors in practice may
depend on input values
For some technologies, zero-to-one errors are more common than one-to-zero errors
Deterministic (permanent) errors can also be
represented by PTMs
Stuck-at faults
Wrong gates

8
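Examples of PTMs

[Figure: four example matrices for a two-input gate, with rows indexed by inputs 00, 01, 10, 11 and columns by outputs 0, 1: the ITM; PTM 1, stuck-at-1 (S-a-1); PTM 2, stochastic S-a-1 (one-way); PTM 3, wrong gate (NAND → AND). Matrix entries are not in the transcript.]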
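The matrix entries of these examples did not survive in the transcript, so the following is only an assumed illustration of the three error types named on the slide. The base gate (NAND), the probability `p`, and the reading of "wrong gate" as an AND placed where a NAND was intended are all assumptions.

```python
import numpy as np

# Rows: inputs 00, 01, 10, 11. Columns: output 0, output 1.
nand_itm = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)
and_itm = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)

# Stuck-at-1: the output is 1 regardless of the input.
stuck_at_1 = np.tile([0.0, 1.0], (4, 1))

# One-way stochastic stuck-at-1 (assumed p = 0.1): a correct 0 becomes 1 with
# probability p, while a correct 1 is never corrupted.
p = 0.1
one_way = nand_itm.copy()
zero_rows = nand_itm[:, 0] == 1.0
one_way[zero_rows] = [1.0 - p, p]

# Wrong gate: a deterministic error, still expressible as a PTM.
wrong_gate = and_itm

all_stochastic = all(
    np.allclose(m.sum(axis=1), 1.0) for m in (stuck_at_1, one_way, wrong_gate)
)
```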
9
Circuit PTMs

Circuit PTMs are created from gate PTMs with matrix algebra
Serial composition: matrix product
Parallel composition: tensor product
Given two matrices M (m by n) and N (o by p), the tensor product $M \otimes N$ is an mo by np matrix whose entries are given by $p(k \mid j) = p(k_1 \mid j_1)\, p(k_2 \mid j_2)$
Gives joint probabilities of all possible combinations of independent signal probabilities
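The two composition rules can be sketched with NumPy, where `np.kron` is the tensor (Kronecker) product and `@` is the matrix product. The gates, the error probability 0.05, and the helper `noisy` are assumptions chosen for illustration.

```python
import numpy as np

def noisy(itm, eps):
    """Assumed error model: the output flips with probability eps."""
    return (1.0 - eps) * itm + eps * (1.0 - itm)

nand = noisy(np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float), 0.05)
inv = noisy(np.array([[0, 1], [1, 0]], dtype=float), 0.05)

# Parallel composition: two NAND gates side by side (4 inputs, 2 outputs).
parallel = np.kron(nand, nand)          # shape (4*4, 2*2) = (16, 4)

# Serial composition: a NAND followed by an inverter (a noisy AND).
serial = nand @ inv                     # shape (4, 2)

# Entry check for the tensor product: p(k | j) = p(k1 | j1) * p(k2 | j2).
j1, j2, k1, k2 = 3, 0, 0, 1
joint = parallel[4 * j1 + j2, 2 * k1 + k2]
product_of_marginals = nand[j1, k1] * nand[j2, k2]
```

Both results remain stochastic matrices, so they are again valid PTMs.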

10
Fanouts and Wire Permutations

Fanout PTM (0→00, 1→11)
Wire swap (01→10), (10→01)

[Figure: the fanout matrix, with rows 0, 1 and columns 00, 01, 10, 11, and the wire-swap matrix, with rows and columns 00, 01, 10, 11]
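Both matrices follow directly from the mappings listed above. The sketch below writes them out and uses them in a tiny, hypothetical circuit (one input fanned out to an ideal inverter and a plain wire); the circuit itself is an assumption for illustration.

```python
import numpy as np

# Fanout PTM: one input wire copied onto two output wires (0 -> 00, 1 -> 11).
fanout = np.array([[1, 0, 0, 0],
                   [0, 0, 0, 1]], dtype=float)

# Wire swap: exchanges two adjacent wires (01 -> 10, 10 -> 01).
swap = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

inv = np.array([[0, 1], [1, 0]], dtype=float)   # ideal inverter
wire = np.eye(2)                                 # ideal wire

# One input x fans out to an inverter (top) and a plain wire (bottom),
# giving outputs (NOT x, x).
branch = fanout @ np.kron(inv, wire)
# Swapping the two output wires gives (x, NOT x).
swapped = branch @ swap
```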
11
Example: Computing Circuit PTM
12
Complexity of PTM Calculations

Exponential worst-case space complexity
An n-input, m-output PTM takes space $O(2^{m+n})$
Store PTMs as algebraic decision diagrams (ADDs) using the QuIDDPro library for lossless compression [Viamontes et al., Quant. Inf. Proc. 2003]
ADDs are variants of BDDs, used in synthesis [Bahar 1997]
Operations are done on compressed forms; results come out compressed
Scalability (purely combinational circuits)
Current implementation scales up to circuit width 50
Sufficient to handle regular fabrics (FPGAs, structured ASICs, etc.)
Sufficient to handle many deeply pipelined circuits
Greater scaling: ongoing work
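To make the exponential growth concrete, here is a small arithmetic sketch of the size of an uncompressed PTM, which has 2^n rows and 2^m columns. The 8 bytes per entry (double precision) and the example sizes are assumptions; the function names are hypothetical.

```python
def dense_ptm_entries(n_inputs, m_outputs):
    """Number of entries of an explicit PTM: 2^n rows times 2^m columns."""
    return (2 ** n_inputs) * (2 ** m_outputs)

def dense_ptm_bytes(n_inputs, m_outputs, bytes_per_entry=8):
    """Memory for a dense array, assuming 8-byte floating-point entries."""
    return dense_ptm_entries(n_inputs, m_outputs) * bytes_per_entry

small = dense_ptm_bytes(2, 1)        # a two-input gate: 8 entries
large = dense_ptm_bytes(20, 10)      # 2**30 entries
large_gib = large / 2 ** 30
```

A 20-input, 10-output block already needs 8 GiB as a dense array, which is why a compressed representation matters.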

13
ADD Representation of PTM

ri: row variables, ci: column variables
Interleaved ordering is beneficial for tensor products

[Figure: a matrix with rows indexed by r0, r1 (00, 01, 10, 11) and columns indexed by c1 (0, 1)]
14
Problem: Handling Non-Square Matrices

ADDs usually represent square matrices
PTMs are generally not square (gates have fewer outputs than inputs)
The obvious extension, skipping DD variables, does not work
It causes ambiguity (the two matrices below have the same ADD)
Multiplication algorithms choose the second interpretation, but the resulting matrix is not a valid PTM (not stochastic)

15
Padding Non-square Matrices with 0s

To facilitate matrix multiplication, use zero
padding
The product of two zero-padded matrices is also a zero-padded matrix
However, tensor products do not preserve
zero-padding
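Both statements can be checked on dense matrices. In this sketch, `pad_square` is a hypothetical helper that embeds a matrix in the top-left corner of a square zero matrix, and the gates are ideal for simplicity.

```python
import numpy as np

def pad_square(m, size):
    """Embed m in the top-left corner of a size x size zero matrix."""
    out = np.zeros((size, size))
    out[:m.shape[0], :m.shape[1]] = m
    return out

nand = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)
inv = np.array([[0, 1], [1, 0]], dtype=float)

# Products preserve zero-padding: pad(A) @ pad(B) equals pad(A @ B).
product_ok = np.array_equal(pad_square(nand, 4) @ pad_square(inv, 4),
                            pad_square(nand @ inv, 4))

# Tensor products do not: the zero columns end up interleaved with data.
kron_of_padded = np.kron(pad_square(nand, 4), pad_square(nand, 4))
padded_kron = pad_square(np.kron(nand, nand), 16)
nonzero_cols_actual = np.flatnonzero(kron_of_padded.any(axis=0)).tolist()
nonzero_cols_wanted = np.flatnonzero(padded_kron.any(axis=0)).tolist()
```

Here the tensor product of the padded matrices has data in columns 0, 1, 4, 5, whereas a properly zero-padded result would use columns 0 to 3.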

16
Solution 1: Permutation Method

The columns of the incorrect matrix can be permuted to obtain the correct tensor product
The permutation matrix itself is too large
The permutation matrix can be decomposed as a tensor product of I (identity) matrices
Larger identity matrices are tensor products of smaller ones
Rperms: I with row variables permuted (same as a wire permutation matrix)
Number of row variables is log(number of rows)
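A dense-matrix sketch of the idea, assuming two zero-padded ideal NAND gates: the tensor product of the padded matrices has its data in the wrong columns, and a permutation built as a tensor product of identities and a wire swap moves them into place. The helper `pad_square` and the specific variable being swapped are assumptions of this example; the slides operate on ADDs rather than explicit arrays.

```python
import numpy as np

def pad_square(m):
    """Embed m in the top-left corner of a square zero matrix."""
    size = max(m.shape)
    out = np.zeros((size, size))
    out[:m.shape[0], :m.shape[1]] = m
    return out

nand = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)
padded = pad_square(nand)                       # 4 x 4, last two columns zero

incorrect = np.kron(padded, padded)             # data in columns 0, 1, 4, 5
correct = pad_square(np.kron(nand, nand))       # data in columns 0, 1, 2, 3

# Wire swap on two bits, and the full permutation I (x) swap (x) I, which
# exchanges the two middle column variables.
swap = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)
perm = np.kron(np.eye(2), np.kron(swap, np.eye(2)))

fixed = incorrect @ perm
```

Because `perm` is itself a tensor product of small matrices, it never has to be written out as one large explicit permutation in a compressed representation.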

17
Permutation Method

[Figure: the permutation built from I and Rperm]
For gates with a higher input-to-output ratio, use a series of these permutations
Recursively cut the number of non-contiguous zero-columns by half

18
Solution 2: Dummy Output Method

Add dummy outputs to make the number of row and column variables equal
Tensor with an identity and apply remove_redundant on an input variable
This adds an output but not an input
Use fan-in matrices to eliminate dummy variables and perform zero-padding
Fan-in matrix (abstracted identity matrix)
Abstraction: summing over a variable
[Figure: fan-in matrix with rows 00, 01, 10, 11. The sum of cols 0, 1 is the new col 1; cols 0, 1 only differ in c1]
19
Other Operations for PTM Manipulation

Abstraction: rows/columns corresponding to the variable being zero and the variable being one are added together
Remove_redundant: fanouts to the same level need not be represented twice
If two input signals are identical, then delete rows where the two variables have different values (these rows are meaningless)
Can be implemented as a variation on abstraction
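On explicit matrices the two operations can be sketched as follows. The function signatures and the convention that variable 0 is the leftmost bit of an index are assumptions of this example; the ADD-based versions in the slides work on the compressed form.

```python
import numpy as np

def abstract_column_var(m, var, n_col_vars):
    """Sum out column variable `var` (0 = leftmost bit of the column index)."""
    rows = m.shape[0]
    tensor = m.reshape((rows,) + (2,) * n_col_vars)
    return tensor.sum(axis=1 + var).reshape(rows, -1)

def remove_redundant(m, a, b, n_row_vars):
    """Row variables a and b carry the same signal: keep rows where they agree."""
    def bit(row, var):
        return (row >> (n_row_vars - 1 - var)) & 1
    keep = [r for r in range(m.shape[0]) if bit(r, a) == bit(r, b)]
    return m[keep, :]

inv = np.array([[0, 1], [1, 0]], dtype=float)
fanout = np.array([[1, 0, 0, 0], [0, 0, 0, 1]], dtype=float)

two_inverters = np.kron(inv, inv)                  # inputs (a, b), outputs (NOT a, NOT b)
tied = remove_redundant(two_inverters, 0, 1, 2)    # both inputs driven by one signal
one_output = abstract_column_var(tied, 1, 2)       # sum out the second output
```

In this example, deleting the meaningless rows gives the same matrix as multiplying by an explicit fanout PTM, and abstracting the second output leaves a single inverter.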

20
Example: Dummy Output Method

Adding a dummy output: add the first input variable also as an output
Tensoring with I adds an input AND an output
The added input is redundant
Remove rows with different values for the 1st and 3rd index
[Figure: matrix with rows 000 through 111, and the resultant matrix after row removal]
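The steps of this example can be traced on a dense matrix. The gate (an ideal NAND) is an assumption, and the final step, which sums out the dummy output and zero-pads the result, is only a guess at what the fan-in matrix achieves.

```python
import numpy as np

nand = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)

# Tensoring with I adds an input c and an output c: rows are (a, b, c),
# columns are (NAND(a, b), c).
with_identity = np.kron(nand, np.eye(2))             # shape (8, 4)

# The added input is redundant with the first input, so keep only the rows
# whose 1st and 3rd index agree (000, 010, 101, 111).
keep = [r for r in range(8) if ((r >> 2) & 1) == (r & 1)]
square = with_identity[keep, :]                      # shape (4, 4)

# Summing out the dummy output (assumed effect of the fan-in step) recovers
# the gate, which is then stored in a zero-padded square matrix.
recovered = square.reshape(4, 2, 2).sum(axis=2)
zero_padded = np.zeros((4, 4))
zero_padded[:, :2] = recovered
```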
21
Example (continued)

[Figure: the 3-2 fan-in matrix and the zero-padded result]
22
Evaluation Algorithm

CurrSigs = primary outputs
While (CurrSigs != Primary Inputs)
    For (i = 0; i < CurrSigs.size(); i++)
        Gate = gate_lookup(CurrSigs[i])   // only returns gate if all sinks to a sig are done
        CurrLevel = tensor(CurrLevel, Gate)
    zero_track(CurrLevel)                 // either permutation or dummy output method
    remove_redundant(CurrLevel)
    CircuitPTM = CircuitPTM * CurrLevel
    CurrSigs = CircuitPTM.inputs()
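A much simplified dense-matrix sketch of level-by-level evaluation follows. Unlike the algorithm above, it assumes the circuit is already split into levels, walks from inputs to outputs, and ignores fanout, zero tracking, and ADD compression. The circuit, the error model, and all function names are hypothetical.

```python
from functools import reduce
import numpy as np

def noisy(itm, eps):
    """Assumed error model: the output flips with probability eps."""
    return (1.0 - eps) * itm + eps * (1.0 - itm)

def level_ptm(gates):
    """PTM of one level: tensor product of the gates (and wires) in it."""
    return reduce(np.kron, gates)

def circuit_ptm(levels):
    """Multiply the level PTMs in order from inputs to outputs."""
    return reduce(np.matmul, (level_ptm(gates) for gates in levels))

nand_itm = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)
wire = np.eye(2)

def build(eps):
    nand = noisy(nand_itm, eps)
    # Hypothetical circuit: out = NAND(NAND(a, b), c).
    return circuit_ptm([[nand, wire], [nand]])

ideal = build(0.0)       # the circuit ITM, shape (8, 2)
faulty = build(0.1)      # the circuit PTM with 10% error per gate
```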
23
Circuit Reliability

For a circuit with ITM J, PTM M and input distribution p(i):
$\text{reliability} = \sum_{i,j} p(i)\, M(i,j)\, J(i,j)$
This measure can be used to evaluate the
reliability of a circuit made of components of
varying robustness
Can efficiently implement this operation using
ADDs
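On dense matrices the reliability measure is a weighted elementwise product. This sketch assumes a single noisy NAND gate, a uniform input distribution, and an error probability of 0.1; the function name `reliability` is hypothetical.

```python
import numpy as np

def reliability(ptm, itm, input_dist):
    """Sum over i, j of p(i) * M(i, j) * J(i, j)."""
    return float(np.sum(input_dist[:, None] * ptm * itm))

nand_itm = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)
eps = 0.1
nand_ptm = (1.0 - eps) * nand_itm + eps * (1.0 - nand_itm)
uniform = np.full(4, 0.25)

r_ideal = reliability(nand_itm, nand_itm, uniform)
r_noisy = reliability(nand_ptm, nand_itm, uniform)
```

Multiplying by J keeps only the probability mass on the correct output for each input, so an ideal circuit scores 1 and this noisy gate scores 1 - eps.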

24
Experiment 1: Reliability Evaluation

Calculate the ITM of standard benchmark circuits
in BLIF
Alter the individual gate PTMs by adding a
probability of error to each input
Recalculate the circuit PTM
Calculate the reliability of the circuit by
comparing the PTM to the ITM

25
(No Transcript)
26
Experiment 2: Gate Susceptibility

Find the most critical gates in a circuit by calculating the susceptibility of each gate
Calculating susceptibility:
Add an error to the gate being evaluated
Leave all other gates ideal
Calculate the probability of error (1 - reliability) of the entire circuit with only this gate having error
Find the top most critical gates and reduce their error probability (from 0.5 to .005), then calculate the improvement in reliability
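The susceptibility procedure can be sketched on a toy two-gate circuit. The circuit, the gate names `g1` and `g2`, the error probability 0.1, and the uniform input distribution are all assumptions for illustration, not the benchmark setup of the experiment.

```python
import numpy as np

NAND_ITM = np.array([[0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)
WIRE = np.eye(2)

def noisy(itm, eps):
    """Assumed error model: the output flips with probability eps."""
    return (1.0 - eps) * itm + eps * (1.0 - itm)

def circuit_ptm(eps_g1, eps_g2):
    """Hypothetical circuit: out = NAND(g1, c) with g1 = NAND(a, b)."""
    level1 = np.kron(noisy(NAND_ITM, eps_g1), WIRE)
    level2 = noisy(NAND_ITM, eps_g2)
    return level1 @ level2

def reliability(ptm, itm, input_dist):
    return float(np.sum(input_dist[:, None] * ptm * itm))

itm = circuit_ptm(0.0, 0.0)
uniform = np.full(8, 1.0 / 8.0)
eps = 0.1

# Susceptibility: circuit error probability when only that gate is faulty.
susceptibility = {
    "g1": 1.0 - reliability(circuit_ptm(eps, 0.0), itm, uniform),
    "g2": 1.0 - reliability(circuit_ptm(0.0, eps), itm, uniform),
}
most_critical = max(susceptibility, key=susceptibility.get)
```

In this toy circuit an error in `g1` is logically masked whenever c = 0, so the output gate `g2` comes out as the more critical one.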

27
Gate Susceptibility Data

Circuit  Orig  Top3  % imp  Top5  % imp
C17      .864  .959  11     .98   13.4
Mux      .907  .974  7.39   .985  8.6
Parity   .603  .637  5.64   .666  10.4
xor5     .047  .068  46.2   .070  50.5
pm1      .375  .429  14.4   .469  25.1
28
Experiment 3: Analyzing von Neumann's NAND-MUX Architecture

Each signal is repeated n times
The NAND levels act as simple majority gates between levels of random permutations
Can relax assumptions used in analytical analysis and evaluate with PTMs

29
Numerical Evaluation of Fault Tolerance

PTM evaluation can be used to determine thresholds: the error value and the number of levels required for NAND-MUX to be functional

Error   Number of levels
        2       4       6       8       10
.05     .8075   .778    .747    .719    .574
.02     .916    .9144   .9074   .9005   .8175
.005    .9741   .9795   .9789   .9784   .9544
30
Conclusions

It is possible to handle all inputs and all paths in reliability evaluation
So far, small circuits only (may be sufficient for memories, FPGAs, structured ASICs, etc.)
So far, for simple gate models only
Within reach: time-dependent reliability
Applications: quantifiable approximation
Deliberate simplifications to improve scalability
Bootstrapping faster methods
Applications: analysis and optimization
Finding the most critical components in small circuits and regular fabrics
Hardening a small number of gates
Applications: probabilistic test

31
Selected References

R. I. Bahar et al., "Algebraic Decision Diagrams and their Applications," J. of Formal Methods in Sys. Design 10, no. 2/3, April-May 1997, pp. 171-206.
V. L. Levin, "Probability Analysis of Combination Systems and their Reliability," Engin. Cybernetics, no. 6, Nov.-Dec. 1964, pp. 78-84.
K. Mohanram and N. A. Touba, "Cost-Effective Approach for Reducing Soft Error Failure Rate in Logic Circuits," ITC, 2003, pp. 893-901.
K. N. Patel, J. P. Hayes, and I. L. Markov, "Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models," IWLS, May 2003, pp. 59-64.
P. Shivakumar, M. Kistler, et al., "Modeling the Effect of Technology Trends on Soft Error Rate of Combinational Logic," Intl. Conf. on Dependable Systems and Networks, 2002, pp. 389-398.
G. F. Viamontes, I. L. Markov, and J. P. Hayes, "Improving Gate-Level Simulation of Quantum Circuits," Quantum Information Processing, vol. 2(5), October 2003, pp. 347-380.