Failure Mode Assumptions and Assumption Coverage

Slides: 37
Provided by: Q63
Learn more at: https://www.cs.wustl.edu
Transcript and Presenter's Notes

1
Failure Mode Assumptions and Assumption Coverage
  • David Powell

2
Fault-Tolerance
  • Key questions
  • How may components fail?
  • → Prevention strategies
  • At what rate may they fail?
  • → The amount of redundancy needed
  • What are the important types of faults?
  • Types of redundancy needed
  • What is the relation between dependability, redundancy,
    and faults?
  • General FT design guidelines

3
An F-T Paradox/Dilemma
  • More faults
  • → More redundancy
  • → More possibility of faults
  • ???

4
Solution- Some Key Steps
  • Classify, quantify and verify the assumptions

5
Type of Failures
6
Overview
  • Single-user service
  • Service Model
  • Potential Errors
  • Multiple-user service
  • Service Model
  • Potential Errors

7
Single-user Service Model
  • Service items: $s_i$, $i = 1, 2, \ldots$
  • Value of $s_i$: $vs_i$
  • Observation time of $s_i$: $ts_i$
  • Service model
  • $s_i = \langle vs_i, ts_i \rangle$
  • An omniscient observer

8
Correctness Model
  • Service item $s_i$ is correct iff
  • $(vs_i \in SV_i) \land (ts_i \in ST_i)$
  • $SV_i$ and $ST_i$ are, respectively, the specified sets
    of values and times for service item $s_i$

9
Potential Errors
  • Arbitrary value error: $s_i : vs_i \notin SV_i$
  • Noncode error: $s_i : vs_i \notin CV$ ($CV$ defines a code)
  • Arbitrary timing error: $s_i : ts_i \notin ST_i$
  • Early timing error: $s_i : ts_i < \min(ST_i)$
  • Late timing error: $s_i : ts_i > \max(ST_i)$
  • Omission error: $s_i : ts_i = \infty$
  • Impromptu error si (vsi ?) ? (tsi ?)
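The error classes listed on this slide can be turned into a small classifier. The sketch below is only illustrative: the names `ServiceItem` and `classify` are hypothetical, the specified time set is assumed to be a closed interval, and an omission is assumed to be modelled as an infinite observation time. The impromptu case is left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceItem:
    value: object  # observed value of the service item
    time: float    # observation time (math.inf if the item is never delivered)


def classify(item, sv, st, code):
    """Return the set of error labels that apply to one service item.

    sv   -- the set of specified values for this item
    st   -- the specified times, assumed here to be an interval (t_min, t_max)
    code -- the set of valid code words (CV)
    """
    t_min, t_max = st
    if math.isinf(item.time):
        # Never delivered: an omission, which is also later than any bound.
        return {"omission", "late timing"}
    errors = set()
    if item.value not in sv:
        errors.add("arbitrary value")
    if item.value not in code:
        errors.add("noncode value")
    if item.time < t_min:
        errors.add("early timing")
    elif item.time > t_max:
        errors.add("late timing")
    return errors
```

An item that falls inside both specified sets yields an empty set of labels, which matches the correctness condition of the previous slide.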

10
Multi-user Service Model
  • Service item: $s_i = \{s_i(1), s_i(2), \ldots, s_i(n)\}$
  • Service model: $\langle vs_i(u), ts_i(u) \rangle$ for all $i, u$
  • New issue: consistency

11
Correctness Model
  • $vs_i(u)$: the value of service item $i$ on process $u$
  • $vs_i$: the value of service item $i$
  • $SV_i$: the set of specified values of service item $i$
  • $ts_i(u)$: the observation time of service item $i$ on
    process $u$
  • $ST_i(u)$: the range of specified observation times
    of service item $i$ on process $u$
  • ?uv -- the time bound of related occurrences

12
Examples of Potential Errors
  • Consistent value error
  • Consistent timing error
  • Semi-consistent value error

13
Failure Mode Assumptions
  • Attempt to formalize the concept of an assumed
    failure mode
  • By assertions on the sequences of service items
    delivered by a component

14
Examples of Value Error Assertions
  • No value errors occur ($V_{none}$)
  • $\forall i,\ vs_i \in SV_i$
  • The only value errors that occur are noncode
    value errors ($V_n$)
  • $\forall i,\ (vs_i \in SV_i) \lor (vs_i \notin CV)$
  • Arbitrary value errors can occur ($V_{arb}$)
  • $\forall i,\ (vs_i \in SV_i) \lor (vs_i \notin SV_i)$
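The three value-error assertions can be checked against a recorded sequence of service items. This is a minimal sketch under assumptions: the function names are hypothetical, the observed values and their specified sets are passed as parallel lists, and the code CV is a plain Python set.

```python
def v_none(values, specified):
    """No value errors: every observed value lies in its specified set."""
    return all(v in sv for v, sv in zip(values, specified))


def v_n(values, specified, code):
    """Only noncode value errors: a value outside its specified set
    must also lie outside the code CV."""
    return all(v in sv or v not in code for v, sv in zip(values, specified))


def v_arb(values, specified):
    """Arbitrary value errors: no restriction, so every sequence satisfies it."""
    return True
```

Any sequence accepted by `v_none` is also accepted by `v_n`, and every sequence is accepted by `v_arb`, so the assertions are ordered from most to least restrictive.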

15
Examples of Timing Error Assertions
  • No timing errors occur ($T_{none}$)
  • The only timing errors are omission errors ($T_O$)
  • The only timing errors are late timing errors
    ($T_L$)
  • The only timing errors are early timing errors
    ($T_E$)
  • Arbitrary timing errors can occur ($T_{arb}$)
  • Permanent omission/crash ($T_p$)
  • Bounded omission degree ($T_{Bk}$)

16
Timing Error Implications
17
Failure Mode Assertions (FMA)
  • A complete FMA entails an assertion on errors
    occurring in both the value and time domains
  • By taking the Cartesian product of the two
    domains, we get a family of FMAs
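The Cartesian product can be enumerated directly. The sketch below uses only the assertion names given as examples on the two preceding slides; the variable names are hypothetical, and the list is not claimed to be the complete family.

```python
from itertools import product

# Assertion names as they appear on the example slides.
VALUE_ASSERTIONS = ["Vnone", "Vn", "Varb"]
TIMING_ASSERTIONS = ["Tnone", "TO", "TL", "TE", "Tarb", "Tp", "TBk"]

# Each complete FMA pairs one value assertion with one timing assertion.
fma_family = list(product(VALUE_ASSERTIONS, TIMING_ASSERTIONS))

for value_assertion, timing_assertion in fma_family:
    print(f"{value_assertion} and {timing_assertion}")
```

With three value assertions and seven timing assertions, the enumeration yields 21 combined assertions.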

18
FMA Implication Graph
19
So what?
  • The FMA classification and implication graph can
    serve as a guideline for designing families of FT
    algorithms that can process errors of increasing
    severity

20
Assumption Coverage
  • Establishing a link between assumed component
    failure mode and system dependability
  • (The design of an FT system relies on the assumptions
    it makes)
  • (The dependability of an FT system is related to
    the failure mode it assumes)

21
Motivation
  • Components may fail
  • They may fail in a bad way → this leads to a violation
    of the system's assumptions
  • The system, in turn, can fail
  • Question: to what degree does a component's FMA
    hold true in the real system?

22
The Coverage of the Assumption
  • Definition
  • $P(X) = \Pr\{X \text{ true} \mid \text{component failed}\}$
  • $P(V_{arb} \land T_{arb}) = 1$
  • $P(V_{none} \land T_{none}) = 0$
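Reading the coverage as the probability that an assertion holds given that the component has failed, it can be estimated as a relative frequency over observed failures. This is a sketch only: the function name, the failure labels, and the sample data are hypothetical and not taken from the slides.

```python
def assumption_coverage(failures, assertion):
    """Estimate Pr{assertion true | component failed} from observed failures.

    failures  -- one record per observed component failure
    assertion -- predicate returning True if the failure fits the assumed mode
    """
    if not failures:
        raise ValueError("at least one observed failure is required")
    return sum(1 for f in failures if assertion(f)) / len(failures)


# Hypothetical failure log: each entry names the kind of error observed.
observed = ["omission", "omission", "late", "value"]

only_omissions = assumption_coverage(observed, lambda f: f == "omission")
anything_goes = assumption_coverage(observed, lambda f: True)   # arbitrary errors
nothing_fails = assumption_coverage(observed, lambda f: False)  # no errors at all
```

The two extreme cases reproduce the bounds on the slide: an assertion that permits arbitrary errors is always satisfied, and an assertion that permits no errors is never satisfied once a failure has occurred.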

23
Coverage of an FT system
  • $P_S(X) = \Pr\{\text{correct error processing} \mid X \text{ true}\}$
  • $\quad \times \Pr\{X \text{ true} \mid \text{component failed}\}$

24
Influence of Assumption Coverage on System
Dependability
  • A Case Study

25
The System
  • A system of n processors
  • Connected via unidirectional message-passing bus
  • Each processor carries out the same computation
    steps
  • The result of each processing step is
    communicated to all other processors
  • Each processor has a decision function (DF)
  • The DF is applied to the results received from
    other processors
  • Each processor and its associated bus is viewed
    as a single component
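The slides do not say which decision function is used. As one possible illustration, the sketch below assumes a strict-majority vote over the results received for a processing step; the function name and the choice of majority voting are assumptions, not the author's stated design.

```python
from collections import Counter


def majority_decision(results):
    """Return the value reported by a strict majority of processors,
    or None when no value reaches a strict majority."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None
```

For example, `majority_decision([7, 7, 3])` selects 7, while three mutually different results give `None`. A weaker failure-mode assumption generally calls for a stronger decision function and more replicas.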

26
Fail-Silent Processor-bus
  • A fail-silent processor
  • Only has semi-consistent value errors
  • Always produces messages on time
  • Or ceases to produce messages forever
  • If a message is delivered to one processor, it is
    delivered to all processors with a consistent
    fixed delay

27
Fail-Consistent Processor Bus
  • Only semi-consistent value errors may occur
  • Faulty processors may send erroneous values
  • Consistent timing error may occur

28
Fail-uncontrolled Processor Bus
  • Arbitrary timing error
  • Arbitrary value error

29
Implications of Assumption Coverage
  • Failure mode relations
  • Coverage relations

30
Dependability Expressions From Markov Models
  • $r = e^{-\lambda t}$
  • $\lambda$: failure rate

31
A Life-critical Application
  • System reliability objective: $R > 1 - 10^{-9}$ over 10
    hours
  • Single processor reliability
  • $r = e^{-\lambda t}$
  • $1/\lambda = 5$ years
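The gap between a single processor and the objective can be worked out from the figures on this slide. The sketch below assumes the exponential reliability model with a mean time to failure of 5 years, and takes a year as 365 days; the variable names are hypothetical.

```python
import math

HOURS_PER_YEAR = 365 * 24           # assumption: 365-day years
mttf_hours = 5 * HOURS_PER_YEAR     # mean time to failure of 5 years
failure_rate = 1 / mttf_hours       # failures per hour
mission_hours = 10

# Reliability of one processor over the mission.
r = math.exp(-failure_rate * mission_hours)

# 1 - r, computed with expm1 to avoid cancellation for small exponents.
unreliability = -math.expm1(-failure_rate * mission_hours)

target_unreliability = 1e-9         # objective: 1 - R below 10^-9
shortfall = unreliability / target_unreliability

print(f"r = {r:.8f}, 1 - r = {unreliability:.3e}, shortfall factor = {shortfall:.3g}")
```

Under these assumptions a single processor has an unreliability of roughly 2.3e-4 over 10 hours, more than five orders of magnitude above the objective, which is why redundancy is needed.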

32
(No Transcript)
33
A Money-Critical Application
  • It is about availability of the system rather
    than reliability of the system
  • Please look at the paper for more details

34
Unavailability vs. Coverage
35
Conclusion
  • A formalism for describing component failure
    modes
  • Multiplicity of value and timing errors
  • The notion of assumption coverage
  • The relation between dependability, availability
    and assumption coverage

36
Thank you