Title: Computational Limits of Reliability Evaluation
1 Computational Limits of Reliability Evaluation
- Smita Krishnaswamy, George F. Viamontes, Igor L. Markov, and John P. Hayes
- Univ. of Michigan, Advanced Computer Architecture Lab
- Los Alamos National Laboratory
2 Motivation
- Problem addressed
  - Given a probabilistic characterization of gate behavior, propagate SEU information to whole circuits
  - E.g., compute the overall error rate, averaged over all inputs (perhaps from an input distribution)
  - E.g., find worst-case/best-case inputs
- How difficult are such computations (exact/approximate)?
- What information can realistically be computed?
- How much accuracy can be achieved?
3 Loss of Accuracy Seems Inevitable
- Computing/tracking everything is too hard
  - Accurate models/data are hard to obtain
  - Hard computations required
  - Optimization is at least as hard as evaluation
- Applications matter: rough estimation may not need as much accuracy as optimization
  - E.g., given a limited area budget, which gates to harden?
  - Sensitivity: fidelity versus accuracy
- Approximate modeling vs. approximate computation
  - Skip complicated models or skip hard computations?
4 Discussion
- Minimal modeling of gates
  - Probability of being hit by a particle with charge of at least Qcrit
  - Dependencies on input values
- Simplified computation
  - Consider one path at a time
  - Consider one input at a time (sampling)
- In this work: computational aspects
  - Is it possible to handle all paths and all inputs?
  - Can we get enough fidelity to improve reliability?
5 Prior Work
- Factors for latching transient errors
  - Electrical, logical, latching-window masking [Shivakumar 2002]
- Calculation of transient error probabilities for gates
  - Parameters include gate area, neutron flux, switching voltage, altitude [Mohanram & Touba 2002]
- In SERA, error rates of circuits are approximated with user-supplied inputs [Zhang & Shanbhag, ICCAD 2004]
- Fault-tolerant architectures
  - NAND multiplexing [von Neumann 1956]
  - Reliability improvement by selectively adding redundancy, TMR [Mohanram & Touba, ITC 2003]
- However, applying these methods still requires careful, accurate analysis
6 Probabilistic Transfer Matrix
- Ideal transfer matrix (ITM): the function of a correct gate expressed as a matrix
- We perturb the 0s and 1s and interpret them as probabilities
- Probabilistic transfer matrix (PTM)
  - A matrix whose (j,k)th entry represents $P(\text{output} = k \mid \text{input} = j)$
  - [Levin, Engin. Cybernetics 1964]
  - [Patel, Markov & Hayes, IWLS 2003]
- Valid PTMs are stochastic matrices
[Figure: ITM and PTM of a two-input gate. Rows are indexed by input values (00, 01, 10, 11) and columns by output values (0, 1); the highlighted PTM entry is $P(\text{output} = 1 \mid \text{input} = 10)$.]
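The ITM-to-PTM step above can be sketched in a few lines of plain Python. This is only an illustration under assumptions not stated on the slide: the gate is a two-input NAND, every output value is flipped with the same probability `p_err`, and the helper names (`nand_itm`, `perturb`, `is_stochastic`) are hypothetical.

```python
# Illustrative sketch: ITM and PTM of a two-input NAND gate.
# Assumption (not from the slides): a uniform output-flip probability p_err.

def nand_itm():
    """Rows: inputs 00, 01, 10, 11. Columns: output 0, output 1."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            out = 1 - (a & b)
            rows.append([1.0 if out == k else 0.0 for k in (0, 1)])
    return rows

def perturb(itm, p_err):
    """Replace each 1 by 1 - p_err and each 0 by p_err (single-output gates only)."""
    return [[(1.0 - p_err) if x == 1.0 else p_err for x in row] for row in itm]

def is_stochastic(m, tol=1e-12):
    """A valid PTM has entries in [0, 1] and every row sums to 1."""
    return all(
        all(-tol <= x <= 1.0 + tol for x in row) and abs(sum(row) - 1.0) <= tol
        for row in m
    )

ITM = nand_itm()
PTM = perturb(ITM, 0.1)

# Row index 2 is input 10, column index 1 is output 1,
# so this entry is P(output = 1 | input = 10).
p_out1_given_10 = PTM[2][1]
```

Each row of the PTM is the output distribution for one input combination, which is why row sums of 1 are the validity check.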
7 Error Representation
- PTMs can describe different error behavior for each input combination
  - Indeed, the incidence of errors in practice may depend on input values
  - For some technologies, zero-to-one errors are more common than one-to-zero
- Deterministic (permanent) errors can also be represented by PTMs
  - Stuck-at faults
  - Wrong gates
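8 Examples of PTMs
[Figure: four example matrices for a two-input gate, with rows indexed by input values (00, 01, 10, 11) and columns by output values (0, 1): the ITM; PTM 1, stuck-at-1 (S-a-1); PTM 2, stochastic S-a-1 (one-way); PTM 3, wrong gate (NAND → AND).]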
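Since the matrix entries of these examples are not available here, the following sketch shows what such fault PTMs might look like. It rests on assumptions: the intended gate is taken to be a NAND, "wrong gate" is read as an AND built in its place, and the one-way error probability `q` is an arbitrary illustrative value. All function names are hypothetical.

```python
# Hypothetical fault PTMs for a two-input gate (rows 00, 01, 10, 11; columns 0, 1).
# Assumptions: intended gate is NAND; q is an arbitrary illustrative probability.

NAND_ITM = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
AND_ITM = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

def stuck_at_1(n_rows):
    """Output is 1 for every input: all probability mass sits in column 1."""
    return [[0.0, 1.0] for _ in range(n_rows)]

def stochastic_stuck_at_1(itm, q):
    """One-way error: a correct 0 becomes 1 with probability q; a correct 1 never flips."""
    return [[row[0] * (1.0 - q), row[1] + row[0] * q] for row in itm]

def wrong_gate(actual_itm):
    """A wrong gate is deterministic: its PTM is the ITM of the gate actually built."""
    return [row[:] for row in actual_itm]

SA1 = stuck_at_1(4)
ONE_WAY = stochastic_stuck_at_1(NAND_ITM, 0.05)
WRONG = wrong_gate(AND_ITM)
```

The stuck-at and wrong-gate matrices contain only 0s and 1s, which is how deterministic (permanent) errors fit the same representation as probabilistic ones.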
9 Circuit PTMs
- Circuit PTMs are created from gate PTMs with matrix algebra
  - Serial composition: matrix product
  - Parallel composition: tensor product
- Given two matrices M (m by n) and N (o by p),
  - the tensor product M ⊗ N is an mo by np matrix whose entries are given by $p(k \mid j) = p(k_1 \mid j_1)\,p(k_2 \mid j_2)$
  - It gives joint probabilities of all possible combinations of independent signal probabilities
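The two composition rules can be sketched with plain nested lists. This is a minimal illustration, not the ADD-based implementation; the inverter PTM and its 0.1 error probability are made-up values, and `matmul` and `kron` are hypothetical helper names.

```python
# Minimal sketch of serial and parallel composition on dense matrices.
# Assumption: NOT_PTM is an illustrative inverter with error probability 0.1.

def matmul(a, b):
    """Serial composition: ordinary matrix product."""
    assert len(a[0]) == len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(m, n):
    """Parallel composition: tensor product of m (m by n) and n (o by p).

    The entry at row (j1, j2), column (k1, k2) is m[j1][k1] * n[j2][k2].
    """
    return [[m[j1][k1] * n[j2][k2]
             for k1 in range(len(m[0])) for k2 in range(len(n[0]))]
            for j1 in range(len(m)) for j2 in range(len(n))]

NOT_PTM = [[0.1, 0.9], [0.9, 0.1]]

serial = matmul(NOT_PTM, NOT_PTM)    # two inverters in a chain (2 by 2)
parallel = kron(NOT_PTM, NOT_PTM)    # two inverters side by side (4 by 4)

# Input 00, output 11: both inverters must produce 1, i.e. 0.9 * 0.9.
p_11_given_00 = parallel[0][3]
```

The tensor-product entry is a product of the two gates' probabilities, which is exactly the independence assumption stated in the bullet above.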
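10 Fanouts and Wire Permutations
- Fanout PTM (0 → 00, 1 → 11)
- Wire swap (01 → 10), (10 → 01)
[Figure: the 2 by 4 fanout matrix, with rows 0, 1 and columns 00, 01, 10, 11, and the 4 by 4 wire-swap matrix, with rows and columns 00, 01, 10, 11.]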
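Both matrices follow directly from the mappings listed above, so they can be written out and checked. The sketch below does that and, as an illustration, feeds a fanout into an ideal NAND gate; the NAND gate and the `matmul` helper are assumptions added for the example.

```python
# Fanout and wire-swap matrices written out from the mappings on the slide.

def matmul(a, b):
    """Serial composition: ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

FANOUT = [
    # 00   01   10   11
    [1.0, 0.0, 0.0, 0.0],  # input 0 -> 00
    [0.0, 0.0, 0.0, 1.0],  # input 1 -> 11
]

SWAP = [
    # 00   01   10   11
    [1.0, 0.0, 0.0, 0.0],  # 00 -> 00
    [0.0, 0.0, 1.0, 0.0],  # 01 -> 10
    [0.0, 1.0, 0.0, 0.0],  # 10 -> 01
    [0.0, 0.0, 0.0, 1.0],  # 11 -> 11
]

# Illustrative use (assumption): one signal fanned out to both inputs of an
# ideal NAND gate behaves like an inverter.
NAND_ITM = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
fanout_then_nand = matmul(FANOUT, NAND_ITM)

# Swapping two wires twice restores the original order.
swap_twice = matmul(SWAP, SWAP)
```

Note that the fanout matrix is not a tensor product of two smaller matrices: the two copies of the signal are perfectly correlated rather than independent.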
11 Example: Computing Circuit PTM
12 Complexity of PTM Calculations
- Exponential worst-case space complexity
  - An n-input, m-output PTM takes space $O(2^{m+n})$
- Store PTMs as algebraic decision diagrams (ADDs) using the QuIDDPro library for lossless compression
  - [Viamontes et al., Quant. Inf. Proc. 2003]
  - ADDs are variants of BDDs, used in synthesis [Bahar 1997]
  - Operations are done on compressed forms; results come out compressed
- Scalability (purely combinational circuits)
  - Current implementation scales up to circuit width 50
  - Sufficient to handle regular fabrics (FPGAs, structured ASICs, etc.)
  - Sufficient to handle many deeply pipelined circuits
  - Greater scaling: ongoing work
13 ADD Representation of PTM
- r_i: row variables
- c_i: column variables
- Interleaved ordering is beneficial for tensor products
[Figure: example PTM with rows indexed by r0, r1 (00, 01, 10, 11) and columns by c1 (0, 1), and its ADD.]
14 Problem: Handling Non-Square Matrices
- ADDs usually represent square matrices
- PTMs are generally not square (gates have fewer outputs than inputs)
- The obvious extension, skipping DD variables, does not work
  - Causes ambiguity (two matrices below have the same ADD)
  - Multiplication algorithms choose the second interpretation, but the resulting matrix is not a valid PTM (not stochastic)
15 Padding Non-Square Matrices with 0s
- To facilitate matrix multiplication, use zero padding
- The product of two zero-padded matrices is also a zero-padded matrix
- However, tensor products do not preserve zero padding
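Both claims can be checked numerically on small dense matrices. The sketch below assumes that zero padding means appending zero columns on the right (and zero rows at the bottom where needed) until the matrix is square; the gate PTMs `G` and `H` are made-up values and the helper names are hypothetical.

```python
# Sketch: products preserve zero padding, tensor products do not.
# Assumption: padding appends zero columns/rows to reach a square size.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(m, n):
    return [[m[j1][k1] * n[j2][k2]
             for k1 in range(len(m[0])) for k2 in range(len(n[0]))]
            for j1 in range(len(m)) for j2 in range(len(n))]

def pad(m, size):
    """Embed m in the top-left corner of a size-by-size zero matrix."""
    out = [[0.0] * size for _ in range(size)]
    for i, row in enumerate(m):
        for j, x in enumerate(row):
            out[i][j] = x
    return out

def nonzero_cols(m):
    return [j for j in range(len(m[0])) if any(row[j] != 0.0 for row in m)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

G = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]  # noisy 2-input gate (4 by 2)
H = [[0.1, 0.9], [0.9, 0.1]]                          # noisy inverter (2 by 2)

# Product: padding first or multiplying first gives the same padded matrix.
product_preserved = close(matmul(pad(G, 4), pad(H, 4)), pad(matmul(G, H), 4))

# Tensor product: the nonzero columns of the two results differ.
wanted_cols = nonzero_cols(pad(kron(G, G), 16))         # contiguous block
actual_cols = nonzero_cols(kron(pad(G, 4), pad(G, 4)))  # zero columns interleaved
```

In this example the tensor product of the padded matrices has its nonzero columns at positions 0, 1, 4, 5 rather than 0 to 3, so the zero columns are no longer contiguous.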
16 Solution 1: Permutation Method
- The columns of the incorrect matrix can be permuted to obtain the correct tensor product
- The permutation matrix itself is too large
- The permutation matrix can be decomposed as a tensor product of
  - I (identity) matrices
    - Larger identity matrices are tensor products of smaller ones
  - Rperm: I with row variables permuted (same as a wire permutation matrix)
    - Number of row variables is log(number of rows)
17 Permutation Method
[Figure: the permutation matrix built from I and Rperm.]
- For gates with a higher input-to-output ratio, use a series of these permutations
- Recursively cut the non-contiguous zero columns by half
18 Solution 2: Dummy Output Method
- Add dummy outputs to make the number of row and column variables equal
  - Tensor with an identity and apply remove_redundant on an input variable
  - This adds an output but not an input
- Use fan-in matrices to eliminate dummy variables and perform zero padding
  - Fan-in matrix (abstracted identity matrix)
  - Abstraction: summing over a variable
[Figure: matrix with rows 00, 01, 10, 11. The sum of cols 0 and 1 is the new col 1; cols 0 and 1 differ only in c1.]
19 Other Operations for PTM Manipulation
- Abstraction: rows/columns corresponding to the variable being zero and the variable being one are added together
- Remove_redundant: fanouts to the same level need not be represented twice
  - If two input signals are identical, then delete rows where the two variables have different values (these rows are meaningless)
  - Can be implemented as a variation on abstraction
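On dense matrices, both operations reduce to index bookkeeping over the bits of a row or column index. The sketch below is an illustration under assumptions: variable 0 is taken to be the most significant bit of the index, the example PTMs `A` and `B` are made-up values, and the function signatures are hypothetical rather than those of the ADD implementation.

```python
# Sketch of abstraction and remove_redundant on dense matrices.
# Assumption: variable 0 is the most significant bit of a row/column index.

def kron(m, n):
    return [[m[j1][k1] * n[j2][k2]
             for k1 in range(len(m[0])) for k2 in range(len(n[0]))]
            for j1 in range(len(m)) for j2 in range(len(n))]

def bit(index, var, n_vars):
    """Value of variable `var` in a row/column index over n_vars variables."""
    return (index >> (n_vars - 1 - var)) & 1

def drop_bit(index, var, n_vars):
    """Index over the remaining variables after deleting `var`."""
    low_bits = n_vars - 1 - var
    high = index >> (low_bits + 1)
    low = index & ((1 << low_bits) - 1)
    return (high << low_bits) | low

def abstract_column_var(m, var, n_vars):
    """Add together the columns where `var` is 0 and where `var` is 1."""
    out = [[0.0] * (len(m[0]) // 2) for _ in m]
    for i, row in enumerate(m):
        for j, x in enumerate(row):
            out[i][drop_bit(j, var, n_vars)] += x
    return out

def remove_redundant(m, var_a, var_b, n_vars):
    """Delete rows in which row variables var_a and var_b take different values."""
    return [row for i, row in enumerate(m)
            if bit(i, var_a, n_vars) == bit(i, var_b, n_vars)]

A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.4, 0.6]]
M = kron(A, B)  # two independent one-input gates, 4 by 4

# Summing out the second output leaves the first gate's distribution.
first_output_only = abstract_column_var(M, 1, 2)

# If both inputs carry the same signal, only rows 00 and 11 are meaningful.
same_input = remove_redundant(M, 0, 1, 2)
```

Abstraction keeps every row stochastic because it only merges columns, whereas remove_redundant shrinks the row space without touching the remaining rows.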
20 Example: Dummy Output Method
- Adding a dummy output: add the first input variable also as an output
- Remove rows with different values for the 1st and 3rd index
- Tensoring with I adds an input AND an output
- Added input is redundant
[Figure: matrix with rows labeled 000 through 111, and the resultant matrix.]
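The steps of this example can be traced on a dense matrix. The sketch below assumes the gate is a two-input NAND with a made-up error probability of 0.1 and that the identity is tensored on the right, so that the added input is the third row index; these choices and the helper names are assumptions, not details taken from the slide.

```python
# Sketch of the dummy output method on a 4 by 2 gate PTM.
# Assumptions: noisy NAND gate, identity tensored on the right (G (x) I).

def kron(m, n):
    return [[m[j1][k1] * n[j2][k2]
             for k1 in range(len(m[0])) for k2 in range(len(n[0]))]
            for j1 in range(len(m)) for j2 in range(len(n))]

def bit(index, var, n_vars):
    """Value of variable `var` (0 = most significant bit) in an index."""
    return (index >> (n_vars - 1 - var)) & 1

G = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]  # inputs (a, b), one output
I2 = [[1.0, 0.0], [0.0, 1.0]]

# Tensoring with I adds an input c and an output: rows (a, b, c), columns (out, c).
with_dummy = kron(G, I2)  # 8 by 4

# The added input is redundant: keep only rows where the 1st and 3rd index agree.
squared = [row for i, row in enumerate(with_dummy) if bit(i, 0, 3) == bit(i, 2, 3)]
```

The result is a square 4 by 4 stochastic matrix whose rows are indexed by (a, b) and whose columns are indexed by (gate output, copy of a).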
21 Example (continued)
[Figure: applying the 3-2 fan-in matrix gives the zero-padded result.]
22 Evaluation Algorithm
CurrSigs = primary outputs
While (CurrSigs != Primary Inputs)
    For (i = 0; i < CurrSigs.size(); i++)
        Gate = gate_lookup(CurrSigs[i])   // only returns gate if all sinks to a sig are done
        CurrLevel = tensor(CurrLevel, Gate)
    zero_track(CurrLevel)                 // either permutation or dummy output method
    remove_redundant(CurrLevel)
    CircuitPTM = CircuitPTM * CurrLevel
    CurrSigs = CircuitPTM.inputs()
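A much-simplified dense-matrix version of this traversal is sketched below. It assumes the circuit is already split into levels, has no fanout, and uses explicit non-square matrices, so the `zero_track` and `remove_redundant` steps are omitted; the `evaluate` function and the three-NAND example circuit are hypothetical. In this sketch the level nearer the inputs is the left factor of the product, which follows from indexing rows by inputs; the operand order in the slide's pseudocode may reflect a different convention.

```python
# Simplified sketch of level-by-level circuit PTM evaluation (dense matrices).
# Assumptions: pre-levelized circuit, no fanout, no zero padding needed.
from functools import reduce

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(m, n):
    return [[m[j1][k1] * n[j2][k2]
             for k1 in range(len(m[0])) for k2 in range(len(n[0]))]
            for j1 in range(len(m)) for j2 in range(len(n))]

def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]

def evaluate(levels, n_outputs):
    """levels[0] is nearest the primary inputs; each level is a list of gate PTMs."""
    circuit = identity(2 ** n_outputs)
    for gates in reversed(levels):             # walk from outputs back to inputs
        curr_level = reduce(kron, gates)       # tensor all gates in this level
        circuit = matmul(curr_level, circuit)  # input-side level is the left factor
    return circuit

NAND = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

# Example circuit: NAND(NAND(a, b), NAND(c, d)), which equals (a AND b) OR (c AND d).
circuit_itm = evaluate([[NAND, NAND], [NAND]], n_outputs=1)
```

With ideal gates the result is the circuit's ITM, a 16 by 2 matrix; substituting noisy gate PTMs into `levels` gives the circuit PTM.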
23 Circuit Reliability
- For a circuit with ITM J, PTM M, and input distribution p(i):
  - reliability = $\sum_{i,j} p(i)\, M(i,j)\, J(i,j)$
- This measure can be used to evaluate the reliability of a circuit made of components of varying robustness
- Can efficiently implement this operation using ADDs
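On dense matrices the reliability measure is a direct double sum. The sketch below evaluates it for a single NAND gate with a made-up error probability of 0.1 and a uniform input distribution; both values and the function name are illustrative assumptions.

```python
# Sketch: reliability = sum over i, j of p(i) * M(i, j) * J(i, j).
# Assumptions: single noisy NAND gate, uniform input distribution.

def reliability(ptm, itm, p_in):
    """Probability that the output is correct, weighted by the input distribution."""
    return sum(p_in[i] * ptm[i][j] * itm[i][j]
               for i in range(len(ptm))
               for j in range(len(ptm[0])))

J = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # ITM
M = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]  # PTM
uniform = [0.25, 0.25, 0.25, 0.25]

r_noisy = reliability(M, J, uniform)
r_ideal = reliability(J, J, uniform)
```

Because J contains a single 1 per row, the product M(i,j)J(i,j) picks out the probability of the correct output for each input, and p(i) averages these over the inputs.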
24 Experiment 1: Reliability Evaluation
- Calculate the ITM of standard benchmark circuits in BLIF
- Alter the individual gate PTMs by adding a probability of error to each input
- Recalculate the circuit PTM
- Calculate the reliability of the circuit by comparing the PTM to the ITM
26 Experiment 2: Gate Susceptibility
- Find the most critical gates in a circuit by calculating the susceptibility of each gate
- Calculating susceptibility
  - Add an error to the gate being evaluated
  - Leave all other gates ideal
  - Calculate the probability of error (1 - reliability) of the entire circuit with only this gate having error
- Find the most critical gates and reduce their error probability (from 0.5 to 0.005), then calculate the improvement in reliability
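The susceptibility procedure can be sketched on a toy circuit. The example below assumes a three-gate circuit NAND(NAND(a, b), NAND(c, d)) with gates named g1, g2, g3, a uniform input distribution, and a made-up error probability of 0.1; none of these come from the benchmark experiments.

```python
# Sketch: gate susceptibility on a hypothetical three-NAND circuit.
# Assumptions: uniform inputs, error probability 0.1, gate names g1, g2, g3.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(m, n):
    return [[m[j1][k1] * n[j2][k2]
             for k1 in range(len(m[0])) for k2 in range(len(n[0]))]
            for j1 in range(len(m)) for j2 in range(len(n))]

def reliability(ptm, itm, p_in):
    return sum(p_in[i] * ptm[i][j] * itm[i][j]
               for i in range(len(ptm)) for j in range(len(ptm[0])))

NAND = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

def noisy(itm, p):
    """Flip the output of an otherwise ideal gate with probability p."""
    return [[(1.0 - p) if x == 1.0 else p for x in row] for row in itm]

def circuit_ptm(g1, g2, g3):
    """g1 and g2 in parallel feed g3."""
    return matmul(kron(g1, g2), g3)

def susceptibilities(p_err):
    names = ("g1", "g2", "g3")
    ideal = circuit_ptm(NAND, NAND, NAND)
    uniform = [1.0 / len(ideal)] * len(ideal)
    result = {}
    for name in names:
        # Only the gate under evaluation has errors; all others stay ideal.
        gates = [noisy(NAND, p_err) if g == name else NAND for g in names]
        result[name] = 1.0 - reliability(circuit_ptm(*gates), ideal, uniform)
    return result

susc = susceptibilities(0.1)
most_critical = max(susc, key=susc.get)
```

In this toy circuit the output gate is the most critical, since an error in g1 or g2 is logically masked whenever the other branch already forces the output.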
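27 Gate Susceptibility Data
(Orig: original reliability; Top 3 / Top 5: reliability after reducing the error of the 3 / 5 most critical gates; Imp.: relative improvement in percent)
Circuit   Orig    Top 3   Imp. (%)   Top 5   Imp. (%)
C17       .864    .959    11         .98     13.4
Mux       .907    .974    7.39       .985    8.6
Parity    .603    .637    5.64       .666    10.4
xor5      .047    .068    46.2       .070    50.5
pm1       .375    .429    14.4       .469    25.1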
28 Experiment 3: Analyzing von Neumann's NAND-MUX Architecture
- Each signal is repeated n times
- The NAND levels act as simple majority gates between levels of random permutations
- Can relax assumptions used in analytical analysis and evaluate with PTMs
29 Numerical Evaluation of Fault Tolerance
- PTM evaluation can be used to determine
  - threshold error values
  - required levels for NAND-MUX to be functional

          Number of Levels
Error     2       4       6       8       10
.05       .8075   .778    .747    .719    .574
.02       .916    .9144   .9074   .9005   .8175
.005      .9741   .9795   .9789   .9784   .9544
30 Conclusions
- It is possible to handle all inputs and all paths in reliability evaluation
  - So far, small circuits only (may be sufficient for memories, FPGAs, structured ASICs, etc.)
  - So far, for simple gate models only
  - Within reach: time-dependent reliability
- Applications: quantifiable approximation
  - Deliberate simplifications to improve scalability
  - Bootstrapping faster methods
- Applications: analysis and optimization
  - Finding most critical components in small circuits and regular fabrics
  - Hardening a small number of gates
- Applications: probabilistic test
31 Selected References
- R. I. Bahar et al., "Algebraic Decision Diagrams and their Applications," J. of Formal Methods in Sys. Design, vol. 10, no. 2/3, April-May 1997, pp. 171-206.
- V. L. Levin, "Probability Analysis of Combination Systems and their Reliability," Engin. Cybernetics, no. 6, Nov-Dec. 1964, pp. 78-84.
- K. Mohanram and N. A. Touba, "Cost-Effective Approach for Reducing Soft Error Failure Rate in Logic Circuits," ITC, 2003, pp. 893-901.
- K. N. Patel, J. P. Hayes, and I. L. Markov, "Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models," IWLS, May 2003, pp. 59-64.
- P. Shivakumar, M. Kistler, et al., "Modeling the Effect of Technology Trends on Soft Error Rate of Combinational Logic," Intl. Conf. on Dependable Systems and Networks, 2002, pp. 389-398.
- G. F. Viamontes, I. L. Markov, and J. P. Hayes, "Improving Gate-Level Simulation of Quantum Circuits," Quantum Information Processing, vol. 2(5), October 2003, pp. 347-380.