Title: Algorithm-Based Fault Tolerance Theory of Check Placement
1Algorithm-Based Fault Tolerance: Theory of Check Placement
2So Far
- Learned how certain computations could be checked using algorithm-specific checks.
- In any algorithm we can develop checks to verify any set of data items.
- How effective are these checks?
- How many faults can a given set of checks detect?
3Abstract Checks
- Suppose we are given (g,h)-checks
- Check defined on g data elements
- If all elements correct, returns 0
- If >0 and ≤h elements erroneous, returns 1
- If >h elements erroneous, result is undefined
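The abstract check above can be written down directly. This is a minimal sketch, not code from the papers: the function name `gh_check` and the use of `None` to stand for "undefined" are illustrative assumptions.

```python
from typing import Optional, Sequence


def gh_check(errors: Sequence[bool], h: int) -> Optional[int]:
    """Abstract (g,h)-check over g = len(errors) data elements.

    errors[i] is True when element i is erroneous.
    Returns 0 if every element is correct, 1 if between 1 and h
    elements are erroneous, and None (assumed stand-in for
    "undefined") when more than h elements are erroneous.
    """
    bad = sum(errors)  # True counts as 1
    if bad == 0:
        return 0
    if bad <= h:
        return 1
    return None  # the check is overwhelmed


# A (2,1) check: two elements, guaranteed to notice one error.
clean = gh_check([False, False], h=1)
single = gh_check([True, False], h=1)
double = gh_check([True, True], h=1)
```

A caller has to treat `None` as "may report anything", which is why later slides talk about checks being overwhelmed.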
4Checking Example
[Figure: two check arrangements over data elements d1, d2, ..., dn and their sum]
- Assume (2, 1) checks
- 2 elements, 1-failure detect
- Both sets of checks can detect single errors
- Neither can locate individual errors
5But with one more check
[Figure: d1, d2, ..., dn and sum, with n checks (∀i: di and sum) and 1 more check on sum]
- If also check sum
- can detect any pair of errors
- can locate single errors
- Need general theory of effective and efficient
check placement
6Goals
- Need models for correlating processor faults to data errors
- Given fault model and set of checks, need to derive fault detectability and locatability
7Papers covered
- V.S.S. Nair, J.A. Abraham, P. Banerjee. "Efficient techniques for the analysis of algorithm-based fault tolerance (ABFT) schemes", 1996.
- Choon-Sik Park and Mineo Kaneko, "An Efficient Technique for Design of ABFT Systems Based on Modified PD Graph".
- Choon-Sik Park, "Algorithm-Based Fault Tolerant Systems Based on Graph-Theoretic Error Occurrence Propagation Models", 2000. (PhD Thesis)
- V.S.S. Nair, J.A. Abraham. "Hierarchical design and analysis of fault-tolerant multiprocessor systems using concurrent error detection", 1990.
8Outline
- Matrix-based formalism of Nair et al
- Dependence graph-based formalism of Park et al
- Includes fault propagation models
- Framework for hierarchical fault tolerant systems by Nair et al
- Building fault tolerant systems out of fault tolerant components
9Basic Framework
[Figure: processors P1-P4, data elements d1-d7, and checks C1-C3]
- Each processor and check associated with set of
elements
10Basic Framework
- Data(Pi): set of data elements affected by processor i
- If Pi fails, any subset of Data(Pi) may be erroneous
- No notion of errors propagating based on data dependences
- Data() defines the Processor-Data (PD) Matrix
11Associated PD Matrix
[Figure: PD matrix relating processors P1-P4 to data elements d1-d7]
12Basic Framework
- Check(di): set of checks that check data element di.
- Must be non-empty if we expect to detect errors
- Check defines the Data-Check (DC) Matrix
- Paper focuses on (g,1) checks
- g data elements
- can detect up to 1 fault
13Associated DC Matrix
[Figure: DC matrix relating data elements d1-d7 to checks C1-C3]
- C1 and C2 are (3,1) checks
- C3 is a (2,1) check
14The PC Matrix
- Finally, associate processors and checks
- Processor-check (PC) matrix: PC = PD × DC
[Figure: PD (processors × data elements) times DC (data elements × checks) gives PC (processors × checks); each PC entry is the # of the processor's elements verified by the check]
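The product PC = PD × DC can be sketched in a few lines. The matrices below are hypothetical (they are not the values from the slide's figure), and the helper name `pc_matrix` is an assumption made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def pc_matrix(pd: Matrix, dc: Matrix) -> Matrix:
    """PC[p][c] = sum over data elements d of PD[p][d] * DC[d][c].

    With 0/1 matrices, each entry counts how many of processor p's
    data elements are verified by check c.
    """
    n_data = len(dc)
    n_checks = len(dc[0])
    return [
        [sum(row[d] * dc[d][c] for d in range(n_data)) for c in range(n_checks)]
        for row in pd
    ]


# Hypothetical system: 2 processors, 4 data elements, 2 checks.
PD = [
    [1, 1, 0, 0],  # processor 0 affects d0, d1
    [0, 0, 1, 1],  # processor 1 affects d2, d3
]
DC = [
    [1, 0],  # d0 is verified by check 0
    [1, 0],  # d1 is verified by check 0
    [1, 1],  # d2 is verified by checks 0 and 1
    [0, 1],  # d3 is verified by check 1
]
PC = pc_matrix(PD, DC)
```

Here `PC` is `[[2, 0], [1, 2]]`: check 0 sees both of processor 0's elements, while check 1 sees both of processor 1's.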
15Using the PC Matrix
- PC matrix shows if we can detect single-processor errors
- Assume all checks are (g,h) checks
- If a processor's row of PC has all entries ≤h, failure of that processor will be detected
- Regardless of which entries actually become erroneous
[Figure: PC matrix, rows = processors, entries = # elements verified by check]
16Using the PC Matrix
- If a processor's row of PC has all entries ≤h, failure of that processor will be detected
[Figure: processors P1-P4, data elements d1-d7, checks C1-C3, and the resulting PC matrix]
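The row test is a one-liner. This sketch assumes, as the DC slide requires, that every data element of the processor is covered by at least one check; the function name and the sample rows are hypothetical.

```python
from typing import Sequence


def row_passes_simple_test(pc_row: Sequence[int], h: int) -> bool:
    """Conservative test on one PC row.

    True when no check sees more than h of the processor's data
    elements, so no check can be overwhelmed by this processor.
    Assumes every data element is verified by at least one check.
    """
    return all(entry <= h for entry in pc_row)


# Hypothetical rows:
ok_row = row_passes_simple_test([1, 1, 0], h=1)  # every entry within h
bad_row = row_passes_simple_test([3, 1], h=2)    # first check may be overwhelmed
```

A `False` result only means the simple test is inconclusive, not that the failure is undetectable.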
17Relaxing Detectability
- Condition is too conservative
- Suppose we have (3, 2) checks
[Figure: P1's PD row, the DC matrix for the 2 checks, and the PC matrix, for P1 with data elements d1-d5 and checks C1, C2]
18Relaxing Detectability
- C1 may be overwhelmed by errors
- Will not notice error <d1, d2, d5>
- By above criterion system can't detect failure in P1
[Figure: P1, data elements d1-d5, checks C1 and C2]
19Reaching New Detectability Definition
[Figure: P1, data elements d1-d5, checks C1 and C2]
- But how could C1 be overwhelmed?
- When all 3 of its elements have errors
- Recall, these are (3,2) checks
20Reaching New Detectability Definition
[Figure: P1, data elements d1-d5, checks C1 and C2]
- But C1 and C2 overlap on d5
- Thus if C1 overwhelmed, C2 detects error
- It is not overwhelmed
- Thus, for any error pattern can see if any check
will notice
21Trivial Algorithm 2
- Try every possible error pattern
- Exponentially many of them
- For each pattern see if some check will detect it
- Before, ensured that no check overwhelmed
- Pro: Correct and not conservative
- Con: Expensive
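The brute-force approach can be sketched as follows. The layout is an assumption reconstructed from the narration of the (3,2) example (P1 owning d1, d2, d5; C1 over d1, d2, d5; C2 over d3, d4, d5), not taken from the figure, and the function names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

Check = FrozenSet[str]


def detects(check: Check, pattern: FrozenSet[str], h: int) -> bool:
    """A (g,h) check is guaranteed to fire when it sees 1..h errors."""
    return 1 <= len(check & pattern) <= h


def brute_force_detectable(data: Iterable[str], checks: List[Check], h: int) -> bool:
    """Try every non-empty subset of the processor's data (exponential)."""
    elems = sorted(data)
    for size in range(1, len(elems) + 1):
        for combo in combinations(elems, size):
            pattern = frozenset(combo)
            if not any(detects(c, pattern, h) for c in checks):
                return False  # some error pattern slips past every check
    return True


# Assumed layout for the (3,2) example.
C1 = frozenset({"d1", "d2", "d5"})
C2 = frozenset({"d3", "d4", "d5"})
p1_data = {"d1", "d2", "d5"}

with_overlap = brute_force_detectable(p1_data, [C1, C2], h=2)
without_c2 = brute_force_detectable(p1_data, [C1], h=2)
```

Under these assumptions the pattern <d1, d2, d5> overwhelms C1 but is still caught by C2 through d5, which is the overlap argument made on the preceding slides.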
22New Definition of Detectability
- Work with error patterns
- Ex: <d1, d2, d5>, <d1, d3, d4>, <d3>, etc.
- If one check detects given error pattern, no problem if other checks overwhelmed
- Repeat until all error patterns detected
- If some check not overwhelmed, eliminate all error patterns it detects from consideration
23Example of Detectability Algorithm
[Figure: processors P1 and P2, data elements d1-d5, (2,1) checks C1-C4]
- Is failure of P1 detectable?
- P1 fails → d1, d2 and/or d3 may have errors
- C1, C2 overwhelmed
- C3 not overwhelmed
24Example of Detectability Algorithm
[Figure: processors P1 and P2, data elements d1-d5, (2,1) checks C1-C4]
- Look at errors C3 can detect: d3
- Remove them from consideration
- Since any error pattern involving d3 will be
detected
25Example of Detectability Algorithm
[Figure: same system with d3 removed from consideration]
- Look at remaining error patterns: combinations of d1 and/or d2
- Now C2 not overwhelmed
- Remove any error patterns involving d2
26Example of Detectability Algorithm
[Figure: same system with d2 and d3 removed from consideration]
- Look at remaining error patterns: d1
- C1 not overwhelmed
- Remove any of its error patterns
27Example of Detectability Algorithm
[Figure: same system with d1, d2 and d3 removed from consideration]
- All of P1's error patterns detected
- We are done!
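The walk-through above can be sketched as an elimination loop. The check memberships are an assumption chosen to match the narration (C1 over d1, d2; C2 over d2, d3; C3 over d3, d4; C4 over d4, d5); the figure's actual layout is not recoverable, and the function name is illustrative.

```python
from typing import FrozenSet, Iterable, List

Check = FrozenSet[str]


def elimination_detectable(data: Iterable[str], checks: List[Check], h: int) -> bool:
    """Repeatedly find a check that is not overwhelmed by the remaining
    elements and drop the elements it covers: any error pattern that
    includes one of them is guaranteed to be detected by that check."""
    remaining = set(data)
    progress = True
    while remaining and progress:
        progress = False
        for check in checks:
            hit = check & remaining
            if 1 <= len(hit) <= h:  # check sees some, but not too many
                remaining -= hit
                progress = True
    return not remaining  # empty: every error pattern is covered


# Assumed (2,1) layout matching the narration.
C1 = frozenset({"d1", "d2"})
C2 = frozenset({"d2", "d3"})
C3 = frozenset({"d3", "d4"})
C4 = frozenset({"d4", "d5"})

p1_ok = elimination_detectable({"d1", "d2", "d3"}, [C1, C2, C3, C4], h=1)
p1_without_c3 = elimination_detectable({"d1", "d2", "d3"}, [C1, C2], h=1)
```

With all four checks, C3 removes d3, then C2 removes d2, then C1 removes d1, in the same order as the slides. Without C3 no check is ever free of overwhelm, so the loop stops with elements left over.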
28Failing Check Processors
- What if processor performing check fails?
- Add pseudo data elements to represent processors
- Each check will also check its processor's pseudo-data element
- New element has ∞ weight, so error in it will overwhelm any check
29Final System
[Figure: processors P1 and P2, data elements d1-d7, (2,1) checks C1-C4]
- Check C3 is in P1
- Checks C1, C2 and C4 on P2
30The Infinities
[Figure: PD, DC and PC matrices for the system with processors P1, P2, data elements d1-d7, and (2,1) checks C1-C4]
31The Infinities
[Figure: same system with its PC matrix]
- If P1 fails, C1 and C2 overwhelmed
- C3 also overwhelmed by ∞+1
- Because C3 runs on failed P1
- Only C4 not overwhelmed
32The Infinities
[Figure: same system with its PC matrix]
- Remove all error patterns detected by C4
- Any that include d2
33The Infinities
[Figure: same system with d2 removed, and its PC matrix. C4's entry must become 0; others may go lower]
- C1 and C2 no longer overwhelmed
- Remove error patterns detected by C1 and C2
- Any that include d1 and d3
34The Infinities
[Figure: same system with d1, d2 and d3 removed, and its PC matrix. C1's and C2's entries must become 0; others may go lower]
- Now P1's row is all 0s and ∞s
- All real data elements successfully checked
- Only pseudo-elements remain
- Don't care
35The Infinities
[Figure: same system and its PC matrix]
- Note failure of P2 not detectable
- d5 only checked by C4, which runs on P2
- Thus, entry will never drop to ∞
36Multi-Process Errors
- Want to know if system can detect failures of ≤r processors
- For every subset of r processors
- Take union of all data elements they touched
- Pretend each r-set is single processor
- Use above algorithm to check if all resulting error patterns detectable
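The reduction to the single-processor case can be sketched like this. The single-processor test used here is a simple elimination loop (find a check that sees 1..h remaining elements, drop those elements, repeat); the two-processor layout and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Check = FrozenSet[str]


def detectable(data: Iterable[str], checks: List[Check], h: int) -> bool:
    """Elimination test for one (possibly merged) processor."""
    remaining = set(data)
    progress = True
    while remaining and progress:
        progress = False
        for check in checks:
            hit = check & remaining
            if 1 <= len(hit) <= h:
                remaining -= hit
                progress = True
    return not remaining


def r_fault_detectable(data_of: Dict[str, Set[str]], checks: List[Check],
                       h: int, r: int) -> bool:
    """Treat every r-subset of processors as one big processor."""
    for group in combinations(sorted(data_of), r):
        merged = set().union(*(data_of[p] for p in group))
        if not detectable(merged, checks, h):
            return False
    return True


# Hypothetical system with (2,1) checks.
data_of = {"P1": {"d1", "d2", "d3"}, "P2": {"d4", "d5"}}
checks = [frozenset({"d1", "d2"}), frozenset({"d2", "d3"}),
          frozenset({"d3", "d4"}), frozenset({"d4", "d5"})]

single = r_fault_detectable(data_of, checks, h=1, r=1)
double = r_fault_detectable(data_of, checks, h=1, r=2)
```

In this layout every single-processor failure passes, but merging P1 and P2 leaves every check with two suspect elements, so the two-processor case fails the test.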
37Fault Locatability
- We only see errors, not faults
- For each error pattern, want to know which fault caused it
- Given two fault patterns, are they distinguishable?
- Only if they have different patterns of failed checks
- Will give intuition for analysis
380-1 Disagreement
- Take rows Ri and Rj of rPC (faults Fi and Fj)
- For every possible error pattern in Ri and Rj look at what each check says on this pattern
- If check responses different on each pattern, Fi and Fj can be differentiated
391-0 Disagreement
- Want to differentiate faults Fi and Fi∪Fj ∀j
- Compare each error pattern of Fi and Fj: Eik and Ejl
- If some check meets Eik on ≥1 and ≤h spots and meets Eil on 0 spots then Ejk and Ejk∪Ejl distinguishable
- If this is true for all error patterns then Fi and Fi∪Fj distinguishable
401-0 Disagreement Example
411-0 Disagreement Example
- Clearly, Eik and Ejl look different
- Eik∪Ejl corresponds to fault pattern
- Checks would say
- Different from Eik or Ejl: distinguishable!
42Fault Locatability
- If can show 1-0 disagreement between every single-process fault and every r-process fault, system is r-fault locatable
- Algorithm for locatability is obscure
- Read the paper
43Summary
- Presented matrix-based framework for evaluating error detectability & locatability
- Framework deals with arbitrary errors
- More work by V.S.S. Nair with other coauthors
44Outline
- Matrix-based formalism of Nair et al
- Dependence graph-based formalism of Park et al
- Includes fault propagation models
- Framework for hierarchical fault tolerant systems by Nair et al
- Building fault tolerant systems out of fault tolerant components
45Graph-Based Framework
- Developed by Choon-Sik Park
- Does in graphs what Nair et al work does in matrices
- Assumes (g,1) checks
- Differences
- Different definition of fault locatability
- Unknown if equivalent
- Presents more limited fault→error models
- As opposed to anything and everything
- Will first present general view, then specific error models
46Basic Picture
[Figure: faults Fi, Fj mapped to errors eiu, ejv, which map to data elements and checks c. Processor→Data and Data→Data dependence info is maintained]
47k-Faults
- Faults may cause number of possible errors
- For given fault, many errors possible
- If given error happens, all associated data elements definitely corrupted
- k-Faults: faults generating errors that corrupt ≤k data elements
[Figure: fault Fi mapped to error eiu and its data elements]
48Fault Detectability
- System is k-fault detectable if for every error pattern eiu ∃ check c s.t. |c∩eiu| = 1
- ∩ means intersection of affected data elements
- Proof
- If there exists such check then every error pattern induced by fault will be detected
- If k-fault detectable then must ∃ some check that reliably yells for any possible error pattern
- Can allow the check that yells to be the check in definition
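The detectability condition for (g,1) checks reduces to a set-intersection test. This is a minimal sketch under the assumption that error patterns and checks are both given as sets of data-element names; the example sets are hypothetical.

```python
from typing import FrozenSet, Iterable, List

Elems = FrozenSet[str]


def k_fault_detectable(error_patterns: Iterable[Elems], checks: List[Elems]) -> bool:
    """True when every error pattern meets some (g,1) check in exactly
    one data element, so that check is guaranteed to fire."""
    return all(
        any(len(check & pattern) == 1 for check in checks)
        for pattern in error_patterns
    )


# Hypothetical error patterns and checks.
patterns = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "c"})]
good_checks = [frozenset({"a", "c"}), frozenset({"b"})]
weak_checks = [frozenset({"a", "b"})]

covered = k_fault_detectable(patterns, good_checks)
uncovered = k_fault_detectable(patterns, weak_checks)
```

With `weak_checks`, the pattern {a, b} hits the only check in two elements, so nothing is guaranteed to fire.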
49Fault Management
- k-fault detectability: If a fault affects ≤k data elements then checks will detect it
- k-fault locatability: For all faults that affect ≤k data elements, can tell any pair of faults apart
- Will examine all fault patterns Fi that come from ≤k data elements failing
50Fault Locatability 1
- To locate faults, must ensure that different faults cause different errors
- Theorem 1: System k-fault locatable only if for error patterns eiu, ejv (from faults Fi and Fj) eiu⊕ejv ≠ ∅
- ⊕ ≡ symmetric difference
- Proof clear: If two faults can show up as same error, can't tell them apart
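Theorem 1 is a necessary condition that is easy to test mechanically. The sketch below assumes each fault is represented by the list of error patterns (sets of data elements) it can produce; the function name and example faults are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List

Elems = FrozenSet[str]


def theorem1_holds(patterns_i: List[Elems], patterns_j: List[Elems]) -> bool:
    """Necessary condition only: every pair of error patterns from the
    two faults must differ, i.e. have a non-empty symmetric difference."""
    return all(e_i ^ e_j for e_i, e_j in product(patterns_i, patterns_j))


# Hypothetical faults.
F_i = [frozenset({"a"}), frozenset({"a", "b"})]
F_j = [frozenset({"b"}), frozenset({"c"})]
F_clash = [frozenset({"a", "b"})]  # can look exactly like one of F_i's errors

separable = theorem1_holds(F_i, F_j)
clash = theorem1_holds(F_i, F_clash)
```

Passing this test does not establish locatability; it only rules out pairs of faults that can produce identical errors.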
51Fault Locatability 2
- Theorem 2: System k-fault locatable only if for error patterns eiu, ejv ∃ checks c and c' s.t.
- |c∩(eiu⊕ejv)| = 1 (recall all checks are (g,1))
- |c∩(eiu∩ejv)| = 0
- If |c∩(eiu-ejv)| = 1 then |c'∩ejv| = 1
- If |c∩(ejv-eiu)| = 1 then |c'∩eiu| = 1
- Intuition: Trying to make tuple <c,c'> be different and ≠<0,0> on errors eiu and ejv
52Fault Locatability Illustration
[Figure: Venn diagram of eiu and ejv with regions (eiu-ejv), (eiu∩ejv), (ejv-eiu), and the symmetric difference (eiu⊕ejv)]
53Fault Locatability Illustration
[Figure: Venn diagram of eiu and ejv; check c overlaps the symmetric difference (eiu⊕ejv)]
- |c∩(eiu⊕ejv)| = 1
- i.e. c overlaps one element ∈(eiu⊕ejv)
- (because of (g,1) checks)
54Fault Locatability Illustration
[Figure: Venn diagram of eiu and ejv; check c avoids the intersection (eiu∩ejv)]
- |c∩(eiu∩ejv)| = 0
- i.e. c only touches on the part that is unique to ejv
55Fault Locatability Illustration
[Figure: Venn diagram of eiu and ejv; check c on (ejv-eiu), check c' on (eiu-ejv) OR (eiu∩ejv)]
- If |c∩(ejv-eiu)| = 1 then |c'∩eiu| = 1
- If c notices ejv make sure that c' notices eiu
56Fault Locatability Illustration
[Figure: Venn diagram of eiu and ejv with checks c and c']
- Error eiu: <c,c'> = <0,1>
- Error ejv: <c,c'> = <1,?>
- Patterns distinguishable
- Either error detected
57Fault Locatability 2
- Theorem 2: System k-fault locatable only if for error patterns eiu, ejv ∃ checks c and c' s.t.
- |c∩(eiu⊕ejv)| = 1 (recall all checks are (g,1))
- |c∩(eiu∩ejv)| = 0
- If |c∩(eiu-ejv)| = 1 then |c'∩ejv| = 1
- If |c∩(ejv-eiu)| = 1 then |c'∩eiu| = 1
- Thus, if above is true for every pair of error patterns, system k-fault detectable
58Extra Fault Detectability
- Theorem: if system is k-fault locatable then it is 2k-fault detectable
- Must show for any fault Fl in ≤2k processors, ∀ resulting errors elw, ∃ check c. |c∩elw| = 1
- Note: Failures of ≤2k processors result in ≤2× errors as failures of ≤k data elements
- Thus, can break up elw = (eiu∪ejv), coming from k-fault patterns Fi and Fj
59Extra Fault Detectability
- Theorem: if system is k-fault locatable then it is 2k-fault detectable
- Must show ∀eiu,ejv ∃ check c. |c∩(eiu∪ejv)| = 1
- If (eiu∪ejv) happens, both c and c' will notice
[Figure: Venn diagram of eiu and ejv with regions (eiu-ejv), (eiu∩ejv), (ejv-eiu) and checks c, c']
60Fault→Error Models
- So far trying to deal with arbitrary errors
- Actual model of how faults turn into errors not defined
- i.e. arbitrary
- This is unnecessarily general
- Should focus on realistic models of error generation and propagation
- Makes it easier to design reliable systems
61Single-Input-Driven Model
- Output of computation erroneous if any input(s) are
- Even if processor is faulty
- If processor is faulty, its computations may or may not be erroneous
- (this is where we use data dependence information)
- Will focus on how model treats single-processor failures
62SID Model Picture
[Figure: processor Pi and its data elements]
- Diw: data elements on Pi
- Synonymous with sets of data elements on Pi
- Focus on single-processor failures
63Fault Model in Practice
- If Pi fails, any subset of Diw's may have error
- If Diw has error, any data depending on it has error
- Bijection between Diw and errors Eiw
[Figure: processor Pi, its data elements, and the data that depends on them]
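Under the single-input-driven rule, the error Eiw induced by a corrupted element Diw is the set of everything reachable from it in the data-dependence graph. The sketch below assumes the graph is given as a map from each element to the elements computed from it; the graph and the function name are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def propagate(seed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return the seed plus every element that transitively depends on it.

    Models the rule that an output is erroneous whenever any of its
    inputs is erroneous.
    """
    corrupted = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in corrupted:
                corrupted.add(nxt)
                queue.append(nxt)
    return corrupted


# Hypothetical dependence graph: c is computed from a and b, d from c.
dependents = {"a": ["c"], "b": ["c"], "c": ["d"]}

error_from_a = propagate("a", dependents)
error_from_c = propagate("c", dependents)
```

Each seed maps to exactly one propagated set, which is the bijection between Diw and Eiw mentioned above.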
64Single-Fault Detectability in SID
[Figure: processor Pi, its data, and a check c]
- Brute-Force algorithm
- ∀ sets of Eiw's
- If ∃ check c s.t. |c∩(∪Eiw's)| = 1 then this error pattern detectable
- If all patterns detectable, system is single-fault detectable
65Too Conservative
- Like before, algorithm too conservative
- Examines exponentially many error patterns
- Suppose set of errors detected via check c
- i.e. |c∩E| = 1
- Look at
[Figure: check c and the errors it meets]
66Too Conservative
- Clearly, all overlap with c on one element
- Thus, each one detectable
- Similarly, all unions containing detectable
- Therefore, if a set of errors detectable, all unions containing suberrors also detectable
- And thus, no need to check them
- Can ignore E1, E2, E1∪E2, E1∪E3, E1∪E2, E1∪E2∪E3
- Can't ignore E3
[Figure: check c and errors E1, E2, E3]
67New Definition of Detectability
- (start with all possible errors)
- For each check cs
- Check that detectable
- Now ignore detectable subsets of
- Remove detectable subsets
- Repeat to ensure rest of also detectable
68Detectability Example
- Check ( )
- c1 meets E1 and E2
c1
69Detectability Example
- Check ( )
- c1 meets E1 and E2
- Remove them to get
c1
70Detectability Example
- Check
- C2 meets E3 and E4
- Also meets E2 but on error E2, c1 will ring
c1
c2
71Detectability Example
- Check
- C2 meets E3 and E4
- Also meets E2 but on error E2, c1 will ring
- Remove them to get
c1
c2
72Detectability Example
c1
c2
c3
73Detectability Example
- Check
- C3 meets E5
- Remove it to get
c1
c2
c3
74Detectability Example
- Check
- C3 meets E6
- Recall circles on left are data on processor I
c1
c2
c3
c4
75Detectability Example
- Check
- C3 meets E6
- Recall circles on left are data on processor I
- Remove it to get
c1
c2
c3
c4
76Detectability Example
c1
c2
c3
c4
77Single-Fault Locatability in SID
- Basic definition: Must exist enough checks s.t. all error patterns produced by failure of Pi differentiable from error patterns of Pj
- Involves a lot of error patterns
- Start with brute-force definition
78Brute-Force Definition
- ? error patterns EqEi1, Ei5, Eiw, from Pi
? checks and s.t. -
- Detects error E
-
- Ignores any error from Pj
- detect Ej and all subsets via above
algorithm - And vice versa (since s may ring on Pis
errors) - Result
- Any error pattern in Ei, none in Ej will ring
some cq - Every pattern in Ej detectable
79Responses of Checks
- On error pattern Eq (due to failure of Pi)
- On any error Ej due to failure of Pj
- Can brute-force evaluate test on every possible Eq
At least one must be 1 (else Ej not detectable)
80Brute Force Too Exhaustive
- Recall that if
then same true for all sets containing E1, Er
- Thus, can eliminate many of the steps above
81New Definition of Locatability
- (start with all possible Pi errors)
- For each check cs
- Check cs detects
- But not Ej
- Ensure that Ej is detectable via above algorithm
82New Definition of Locatability
- Syndrome of Ei and detectable subsets
- Syndrome of Ej all subsets
- Can now ignore detectable subsets of
- Remove detectable subsets
- Repeat until all covered
- Do same for
- In paper, steps for and interleaved
At least one must be 1 (else Ej not detectable)
83Summary
- Presented graph-based framework for evaluating error detectability & locatability
- Framework deals with arbitrary errors
- Can be specialized to a simpler fault model: Single-Input Driven
- Choon-Sik Park's thesis presents the Multiple-Input Driven model
- More realistic but complex
84Outline
- Matrix-based formalism of Nair et al
- Dependence graph-based formalism of Park et al
- Includes fault propagation models
- Framework for hierarchical fault tolerant systems by Nair et al
- Building fault tolerant systems out of fault tolerant components
85Building Larger Systems
- Now know how to analyze systems for detectability & locatability
- For large systems this can be very hard/expensive
- Large systems typically made up of smaller components
- Simplifies fault tolerance design
86Basic Idea
- Have component with known detectability (t) & locatability (l)
- Construct system S out of k components
- What is resulting fault tolerance?
87Basic Idea
- System fault tolerance no better than for individual component
- If >t data elements fail in same component, error not detected
- If >l elements fail in component, will not locate
- Detectability & locatability ratio tends to 0 as system size increases!
88Hierarchical Design
- To build fault tolerant systems must introduce checks with new components
- Will present hierarchical design scheme with specific detectability & locatability guarantees
- Assumptions
- All (g,h) checks have same h
- No restriction on g
- Every processor produces only one data element
- Same true for blocks of processors
- Checks are fault tolerant
- Claims that this doesn't change problem
89Basic Component
- Start off with basic system
- System has internal checks
- Fault detectability t
- Fault locatability l
B
90Basic Component
- Then replicate it k-fold
- Assumptions
- copies are independent
- (i.e. do not affect each others data)
- Each system produces one data element
[Figure: independent copies B1, B2, ..., Bk]
91Basic Component
- Then replicate it k-fold
- And add additional checks across all copies
- Process repeated d-1 times to get d-level
hierarchical system
[Figure: copies B1, B2, ..., Bk with added checks c1, c2, ..., cr]
92Detectability 1≤k≤h
- Theorem 1
- If 1≤k≤h then hierarchical system can detect ≤B·k^(d-1) errors
- Proof
- Base case d=2
- Suppose every element has error
- Each check must deal with k≤h errors
- But they are (g,h) checks and will detect such errors
- Thus, system can detect ≤B·k errors
[Figure: copies B1, B2, ..., Bk with checks c1, c2, ..., cr]
93Detectability 1≤k≤h
- Theorem 1
- If 1≤k≤h then hierarchical system can detect ≤B·k^(d-1) errors
- Proof
- Inductive case d+1
- Components Bi each have ≤B·k^(d-2) elements
- By argument above, system detects ≤(B·k^(d-2))·k = B·k^(d-1) errors
- Argument works because sub-systems at each level produce one data element
[Figure: copies B1, B2, ..., Bk with checks c1, c2, ..., cr]
94Detectability k>h
- Theorem 2
- If k>h then hierarchical system can detect ≤(t+1)(h+1)^(d-1)-1 errors
- Proof
- Base case d=2
- Suppose (t+1)(h+1) errors with h+1 copies of B having t+1 errors each
- Detectability of B = t, so internal checks will not notice errors
- 2nd level checks will get h+1 errors each: will not notice
- Thus, ∃ error pattern of size (t+1)(h+1) that will not be detected
[Figure: copies B1, B2, ..., Bk with checks c1, c2, ..., cr]
95Detectability k>h
- Theorem 2
- If k>h then hierarchical system can detect ≤(t+1)(h+1)^(d-1)-1 errors
- Proof
- Base case d=2
- Suppose (t+1)(h+1)-1 errors
- By pigeonhole principle, some unit has ≤t errors or some 2nd level check has ≤h errors
- Thus, some check at 1st or 2nd level will ring
- Thus, system detectability = (t+1)(h+1)-1
[Figure: copies B1, B2, ..., Bk with checks c1, c2, ..., cr]
96Detectability k>h
- Theorem 2
- If k>h then hierarchical system can detect ≤(t+1)(h+1)^(d-1)-1 errors
- Proof
- Inductive case d+1
- Components Bi detect ≤Td errors
- By induction, Td = (t+1)(h+1)^(d-1)-1
- By argument above, system detects ≤(Td+1)(h+1)-1 errors
- Thus, system detectability = (t+1)(h+1)^d-1
[Figure: copies B1, B2, ..., Bk with checks c1, c2, ..., cr]
97Locatability
- Theorem 3
- If k>1 then hierarchical system can locate ≤2^(d-1)(l+1)-1 errors
- Proof
- Base case d=2
- Suppose fault pattern of 2(l+1) errors, l+1 errors in two Bi's
- Bi & Bj can't locate the errors
- 2nd level checks may locate erroneous rows, not columns
- Thus, ∃ unlocatable fault pattern of size 2(l+1)
[Figure: copies B1, B2, ..., Bk with checks c1, c2, ..., cr]
98Locatability
- Theorem 3
- If k>1 then hierarchical system can locate ≤2^(d-1)(l+1)-1 errors
- Proof
- Base case d=2
- Suppose fault pattern of 2(l+1)-1 errors
- At most one Bi may have ≥l+1 errors
- If none do, we're done
- Remaining ≤l errors distributed among other Bj's
[Figure: copies B1, B2, ..., Bk with checks c1, c2, ..., cr]
99Locatability
[Figure: copies Bi, Bj, ..., Bk with checks c1, c2, ..., cr]
- Let Bi have l+r errors (r≥1)
100Locatability
[Figure: copies Bi, Bj, ..., Bk with checks c1, c2, ..., cr]
- Let Bi have l+r errors (r≥1)
- Remaining Bj's share remaining l-r+1 errors
- ≥(l+r)-(l-r+1) = 2r-1 rows only have errors in Bi
- 2r-1 rows when all l-r+1 errors are in same Bj
101Finding Overwhelmed Unit
- First, find the Bi that have >l errors
- All but one sub-system detects and locates errors correctly
- Overwhelmed subsystem
- Detects correctly
- Locatability l ⇒ Detectability > 2l
- Citation of 1973 paper by Russell & Kime
- Error location mistakes
102Finding Overwhelmed Unit
- In ≥2r-1 rows only Bi has error
- Thus, no other row will claim an error there
- 2nd-level checks will catch these errors
- Bi's checks can't lie about it
- Will definitely know these are errors
[Figure: Bi, Bj, ..., Bk with cells marked as known errors, unknown errors, or no error; ≥2r-1 known errors]
103Finding Overwhelmed Unit
- Number of errors in Bi = l+r
- Number of known errors ≥ 2r-1
- Number of unknown errors in Bi ≤ (l+r)-(2r-1) = l-r+1
- Since r≥1, l-r+1≤l
- Bi's checks can identify ≤l errors
- Error patterns ≤l produce unique check alert patterns
- This data enough to identify remaining unknown errors
104Locatability
- Theorem 3
- If k>1 then hierarchical system can locate ≤2^(d-1)(l+1)-1 errors
- Proof
- Base case d=2
- Can locate errors of size ≤2(l+1)-1
- Inductive case d+1
- Components Bi can locate ≤2^(d-1)(l+1)-1 errors
- By argument above, system locates ≤2((2^(d-1)(l+1)-1)+1)-1 = 2^d(l+1)-1 errors
105Summary
- Presented systematic way to build hierarchical systems with good fault-detection properties
- For d-level system composed of identical independent components
- Component detectability = t, locatability = l
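The bounds from Theorems 1-3 are simple closed forms, so they are easy to tabulate. The function names below are hypothetical; the formulas are the ones stated in the theorems, with B read as the size of the basic component.

```python
def detect_bound_small_k(B: int, k: int, d: int) -> int:
    """Theorem 1 (1 <= k <= h): up to B * k^(d-1) errors detected."""
    return B * k ** (d - 1)


def detect_bound_large_k(t: int, h: int, d: int) -> int:
    """Theorem 2 (k > h): up to (t+1)(h+1)^(d-1) - 1 errors detected."""
    return (t + 1) * (h + 1) ** (d - 1) - 1


def locate_bound(l: int, d: int) -> int:
    """Theorem 3 (k > 1): up to 2^(d-1) * (l+1) - 1 errors located."""
    return 2 ** (d - 1) * (l + 1) - 1


# Example: component with t = 2, l = 1, checks with h = 1, three levels.
detect_3 = detect_bound_large_k(t=2, h=1, d=3)  # (2+1) * 2^2 - 1 = 11
locate_3 = locate_bound(l=1, d=3)               # 2^2 * (1+1) - 1 = 7
```

Setting d = 1 recovers the component's own t and l, and each extra level follows the inductive step used in the proofs, for example `detect_bound_large_k(t, h, d + 1) == (detect_bound_large_k(t, h, d) + 1) * (h + 1) - 1`.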
106Conclusion
- Formalisms for analyzing fault detectability & locatability
- Matrix-based formalism of Nair et al
- Dependence graph-based formalism of Park et al
- Includes fault propagation models
- Framework for hierarchical fault tolerant systems by Nair et al
- Building fault tolerant systems out of fault tolerant components
107Conclusion
- These schemes have complex rules for acceptable check placements
- Requires detailed analysis of system to place them manually
- More detailed analysis if checks are hand-designed
- Likely since few known automatic techniques
- Overall, approach can support automatic solutions but currently very manual