A Theory of Fault-Tolerance

Slides: 20
Provided by: ebne
Learn more at: http://www.cse.msu.edu

1
  • A Theory of Fault-Tolerance

2
Unifying Fault-Tolerance Approaches
  • Several disciplines with focus on different
    faults and specific architectures
  • Crash recovery
  • Atomic transactions
  • Fault-tolerance of digital systems
  • Fault-tolerance in message-passing systems
  • Verification of fault-tolerance
  • Application-specific
  • Verify recovery and safely terminate (mask the
    faults)
  • Less attention given to non-maskable faults

[Arora 1992] A Foundation of Fault-Tolerant Computing, PhD thesis,
University of Texas at Austin, 1992.
3
A Foundation of Fault-Tolerant Computing
  • Provide a uniform definition of fault-tolerance
  • Provide verification methods independent of
    technology, architecture, or application

[Arora 1992] A Foundation of Fault-Tolerant Computing, PhD thesis,
University of Texas at Austin, 1992.
4
Program and Fault
  • Program model: synchronization skeletons of
    finite-state programs
  • Finite number of variables with finite domains
  • Finite number of processes
  • State: a valuation of the program variables
  • Finite state space Sp
  • Program p and fault f are sets of transitions:
    p, f ⊆ Sp × Sp
  • Dijkstra's guarded commands (actions) serve as a
    shorthand for program and fault transitions
  • Guard → Statement
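
To make the model concrete, here is a minimal Python sketch of how guarded commands expand into sets of transitions over a finite state space. The toy program (one variable x with domain {0, 1, 2}), the fault action, and all function names are assumptions made for illustration; they do not come from the slides.

```python
from itertools import product

# Hypothetical toy program: one variable x with domain {0, 1, 2}.
# A state is a valuation of the variables, here a 1-tuple (x,).
DOMAIN = (0, 1, 2)
STATES = set(product(DOMAIN, repeat=1))  # the finite state space Sp


def transitions(actions, states):
    """Expand guarded commands into a set of transitions (a subset of Sp x Sp)."""
    result = set()
    for guard, statement in actions:
        for s in states:
            if guard(s):
                result.add((s, statement(s)))
    return result


# Program action (assumed):  x < 2  ->  x := x + 1
program_actions = [(lambda s: s[0] < 2, lambda s: (s[0] + 1,))]
# Fault action (assumed):    true   ->  x := 0
fault_actions = [(lambda s: True, lambda s: (0,))]

p = transitions(program_actions, STATES)
f = transitions(fault_actions, STATES)
```

Both p and f end up as plain sets of state pairs, so the same machinery can treat program and fault transitions uniformly.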

[Figure: state space Sp with program and fault transitions]
5
Examples of Intermittent Faults
  • Intermittent faults
  • Sudden acceleration in cruise control systems
  • E.g., Cruise control that only works in wet
    weather
  • Malfunction in a component of an electronic
    circuit when the voltage goes beyond a threshold
  • x and y are two points of contact in a circuit
    that have independent voltages. However, when the
    voltage level of x goes beyond 3.5 V, y gets the
    same voltage as x. We model this class of faults
    by the following guarded command
  • x > 3.5 → y := x
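
The voltage fault can be sketched in Python as a guard and a statement over a small state dictionary. The dictionary representation and the function names are assumptions made for illustration. A guarded command only says the fault may fire when its guard holds; the helper below simply shows the resulting state when it does.

```python
def fault_guard(state):
    """Guard of the fault action: x > 3.5."""
    return state["x"] > 3.5


def fault_statement(state):
    """Statement of the fault action: y := x."""
    return {**state, "y": state["x"]}


def apply_fault(state):
    """State after the fault fires, or the unchanged state if the guard is false."""
    return fault_statement(state) if fault_guard(state) else state


above_threshold = apply_fault({"x": 4.0, "y": 1.2})
below_threshold = apply_fault({"x": 3.0, "y": 1.2})
```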

6
Examples of Transient Faults
  • Transient faults
  • A hardware interrupt routine gets called without
    any interrupt being raised by hardware devices
  • Solar radiation corrupts the communication and
    the navigation systems
  • The variables of the controlling software of
    space shuttles may be corrupted by transient
    solar radiation
  • true → x := ?
  • The above guarded command means that in any state
    of the system, the variable x may be corrupted
    due to transient faults
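
One way to read this fault action is that x may take any value of its domain at any time. The sketch below enumerates the possible successor states under that reading. The domain of x and the extra variable y are assumptions made for illustration.

```python
DOMAIN_X = (0, 1, 2, 3)  # assumed finite domain of x


def transient_fault_successors(state):
    """All states reachable by one firing of: true -> x := (any value in DOMAIN_X)."""
    return [{**state, "x": v} for v in DOMAIN_X]


successors_of_example = transient_fault_successors({"x": 2, "y": 7})
```

The guard is `true`, so the function applies to every state. Only x changes; the other variables keep their values.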

7
Transient vs. Intermittent Faults
  • Transient faults are difficult (if not
    impossible) to reproduce
  • Can we reproduce solar radiation?
  • Intermittent faults may be reproduced under
    certain conditions
  • E.g., pressing the Ctrl key causes the system
    to reset

8
State Predicate
  • State predicate X: X ⊆ Sp
  • Closure: X is closed in p
  • Projection: p|X
  • p|X = {(s0, s1) : (s0, s1) ∈ p ∧ s0 ∈ X ∧ s1 ∈ X}
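
As a small illustration, a state predicate can be represented as a Python set of states, and the projection as a filter over the transitions. The four-state program below is hypothetical.

```python
def project(p, X):
    """Projection of p on X: the transitions of p that start and end in X."""
    return {(s0, s1) for (s0, s1) in p if s0 in X and s1 in X}


Sp = {0, 1, 2, 3}                    # hypothetical state space
X = {s for s in Sp if s <= 2}        # a state predicate, as a subset of Sp
p = {(0, 1), (1, 2), (2, 0), (2, 3), (3, 3)}

p_on_X = project(p, X)
```

Here the transition (2, 3) leaves X and (3, 3) lies outside X, so neither appears in the projection.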

9
Program Computations
  • Program computations
  • Infinite sequences of program transitions

10
Specification, Invariant, and Fault-Span
  • Safety specification: something bad never
    happens
  • Formal representation: a subset of Sp × Sp
    (the set of bad transitions)
  • E.g., transitions that change the value of a
    counter from non-zero values to zero
  • Liveness specification: something good will
    eventually happen
  • In the absence of faults, the fault-tolerant
    program p' satisfies the liveness specification
    of the fault-intolerant program p
  • Invariant S, fault-span T ⊆ Sp
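
The counter example can be sketched as follows: the safety specification is a set of bad transitions, and a computation prefix violates safety if it contains one of them. The counter range and the function name are assumptions made for illustration.

```python
COUNTER_VALUES = range(4)  # assumed domain of the counter

# Bad transitions: the counter goes from a non-zero value to zero.
bad = {(a, 0) for a in COUNTER_VALUES if a != 0}


def violates_safety(prefix, bad_transitions):
    """True if some consecutive pair of states in the prefix is a bad transition."""
    return any((s0, s1) in bad_transitions for s0, s1 in zip(prefix, prefix[1:]))


counts_up = violates_safety([0, 1, 2, 3], bad)
resets = violates_safety([0, 1, 2, 0], bad)
```

Because safety is stated over single transitions, it can be checked on finite prefixes. Liveness cannot be checked this way, since it concerns what eventually happens.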

11
Token Ring Example
  • Processes P0, P1, P2, P3
  • Variables x0, x1, x2, x3 (domain {0, 1, ⊥}, where
    ⊥ denotes a corrupted value)
  • Dijkstra's guarded commands (actions)
  • Guard → Statement
  • Fault-intolerant program
  • Process P0
  • TR0: (x0 = 1) ∧ (x3 = 1) → x0 := 0
  • TR0: (x0 = 0) ∧ (x3 = 0) → x0 := 1

12
Token Ring Example Continued
  • Processes P1, P2, P3
  • TRi: (xi = 0) ∧ (x(i-1) = 1) → xi := 1
  • TRi: (xi = 1) ∧ (x(i-1) = 0) → xi := 0
  • Fault transitions: process restart
  • true → xj := ⊥

13
Token Ring Example Continued
  • Invariant
  • (a state is represented as a tuple <x0, x1, x2,
    x3>)
  • <0, 0, 0, 0>, <1, 0, 0, 0>, <1, 1, 0, 0>,
    <1, 1, 1, 0>, <1, 1, 1, 1>, <0, 1, 1, 1>,
    <0, 0, 1, 1>, <0, 0, 0, 1>
  • Safety Specification
  • Corrupted value does not affect a non-corrupted
    process
  • There is only one token in the ring
  • Liveness of the fault-intolerant program
  • Token should be circulated infinitely often
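
The token ring can be simulated in a few lines of Python to check the invariant against the program actions. This is a sketch under stated assumptions: the corrupted value is modeled as `None`, P0's guard is read as x0 = x3 and Pi's guard as xi ≠ x(i-1) (both over uncorrupted values), and a process is taken to hold the token exactly when its guard holds. The function names are illustrative.

```python
N = 4
BOT = None  # assumed encoding of the corrupted value


def enabled(state, i):
    """True if the guard of process i holds (process i holds the token)."""
    xi, prev = state[i], state[(i - 1) % N]
    if xi is BOT or prev is BOT:
        return False
    return xi == prev if i == 0 else xi != prev


def step(state, i):
    """Execute the action of process i: P0 flips x0, Pi copies x(i-1)."""
    new = list(state)
    new[i] = 1 - state[i] if i == 0 else state[i - 1]
    return tuple(new)


def successors(state):
    """States reachable by one program step from the given state."""
    return [step(state, i) for i in range(N) if enabled(state, i)]


def corrupt(state, j):
    """Fault action: true -> xj := corrupted."""
    new = list(state)
    new[j] = BOT
    return tuple(new)


def run(state, steps):
    """Follow the program for up to `steps` steps, taking the first enabled action."""
    trace = [state]
    for _ in range(steps):
        nxt = successors(trace[-1])
        if not nxt:
            break
        trace.append(nxt[0])
    return trace


INVARIANT = {
    (0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
    (1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1),
}

trace = run((0, 0, 0, 0), 8)
```

Under these assumptions, every invariant state has exactly one enabled process, and eight steps from <0, 0, 0, 0> visit all eight invariant states and return to the start. Corrupting x0 in <0, 0, 0, 0> leaves no guard enabled, which is one way to see that this program does not tolerate the restart fault.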

14
Defining Fault-Tolerance: Closure
  • Let S be a state predicate of a program p
  • S is closed in p iff for every action G → st
    of p,
  • executing st in a state of (S ∧ G) results in
    a state in S
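
The closure condition translates almost directly into code. In the sketch below, predicates, guards, and statements are Python functions over a single integer variable. The two-action program is hypothetical and chosen only to show one closed and one non-closed predicate.

```python
def is_closed(S, actions, states):
    """S is closed iff every action G -> st, run from a state in S and G, lands in S."""
    for guard, statement in actions:
        for s in states:
            if S(s) and guard(s) and not S(statement(s)):
                return False
    return True


STATES = range(6)  # hypothetical program with one variable x in 0..5
actions = [
    (lambda x: x < 5, lambda x: x + 1),  # x < 5  ->  x := x + 1
    (lambda x: x == 5, lambda x: 3),     # x = 5  ->  x := 3
]

upper_closed = is_closed(lambda x: x >= 3, actions, STATES)
lower_closed = is_closed(lambda x: x <= 2, actions, STATES)
```

The predicate x ≥ 3 is closed because the program cycles through 3, 4, 5. The predicate x ≤ 2 is not, since the first action leads from 2 to 3.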

15
Defining Fault-Tolerance: Convergence
  • Let S and T be state predicates of program p
  • T converges to S in p iff
  • S is closed in p
  • T is closed in p
  • Starting in T, each computation of p reaches a
    state in S
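
For a finite state space, the three conditions can be checked mechanically. The sketch below is one possible check, not a method taken from the slides: it grows the set of states from which every computation is forced into S. It assumes that programs are given as sets of transitions, that a deadlock outside S counts as a failure to converge, and that no fairness is assumed. The example program is hypothetical.

```python
def is_closed(X, p):
    """X is closed in p iff every transition that starts in X ends in X."""
    return all(t in X for (s, t) in p if s in X)


def converges(T, S, p):
    """Check: S closed in p, T closed in p, and every computation from T reaches S."""
    if not (is_closed(S, p) and is_closed(T, p)):
        return False
    reach = set(S)  # states from which S is reached on every path
    changed = True
    while changed:
        changed = False
        for s in T - reach:
            succ = {t for (u, t) in p if u == s}
            if succ and succ <= reach:
                reach.add(s)
                changed = True
    return T <= reach


S = {0, 1}
T = {0, 1, 2, 3}
p = {(0, 1), (1, 0), (2, 1), (3, 2), (3, 0)}

ok = converges(T, S, p)
cycle_outside_S = converges(T, S, p | {(2, 3)})
```

Adding the transition (2, 3) creates the cycle 2 → 3 → 2 outside S, so some computation starting in T never reaches S and the check fails.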

16
Levels of Fault-Tolerance
  • Failsafe (program p' is failsafe
    f-tolerant for spec from S)
  • Guarantee safety in the presence of faults
  • Nonmasking (program p' is nonmasking f-tolerant
    for spec from S)
  • Guarantee recovery in the presence of faults
  • Masking (program p' is masking f-tolerant
    for spec from S)
  • Guarantee safety and recovery in the presence of
    faults
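
The three levels can be told apart on small finite examples. The sketch below is a simplified reading, not the formal definitions. It assumes the fault-span is the set of states reachable from S under program and fault transitions, that safety means no bad transition is executable from the fault-span, and that recovery means the program alone always returns to S once faults stop. All names and the example programs are hypothetical.

```python
def reachable(start, transitions):
    """States reachable from `start` by following the given transitions."""
    seen, frontier = set(start), list(start)
    while frontier:
        s = frontier.pop()
        for (u, t) in transitions:
            if u == s and t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def recovers(T, S, p):
    """Every computation of p alone from T reaches S (deadlock outside S fails)."""
    reach = set(S)
    changed = True
    while changed:
        changed = False
        for s in T - reach:
            succ = {t for (u, t) in p if u == s}
            if succ and succ <= reach:
                reach.add(s)
                changed = True
    return T <= reach


def tolerance_level(p, f, S, bad):
    """Classify p under faults f, from invariant S, for the bad-transition set."""
    T = reachable(S, p | f)  # assumed fault-span
    safe = not any(tr in bad for tr in (p | f) if tr[0] in T)
    recover = recovers(T, S, p)
    if safe and recover:
        return "masking"
    if safe:
        return "failsafe"
    if recover:
        return "nonmasking"
    return "intolerant"


S = {0, 1}
f = {(1, 2)}                            # the fault perturbs state 1 to state 2
with_recovery = {(0, 1), (1, 0), (2, 0)}
without_recovery = {(0, 1), (1, 0)}

level_a = tolerance_level(with_recovery, f, S, set())
level_b = tolerance_level(with_recovery, f, S, {(2, 0)})
level_c = tolerance_level(without_recovery, f, S, {(2, 0)})
```

In this toy setting, the recovery transition (2, 0) gives masking tolerance when it is safe and nonmasking tolerance when it is itself a bad transition. Removing it keeps the program safe but stuck in state 2, which corresponds to failsafe tolerance.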

[Figure: state space Sp with safety-violating transitions]
17
Component-Based Design of Fault-Tolerance
  • A fault-tolerant program consists of a
    fault-intolerant program and fault-tolerance
    components
  • Two types of fault-tolerance components are
    necessary and sufficient for the design of
    fault-tolerance
  • Detectors and correctors

[Kulkarni 1999] Component-Based Design of Fault-Tolerance, PhD thesis,
The Ohio State University, 1999.
18
Synthesis of Fault-Tolerance
  • It is difficult to anticipate all classes of
    faults at design time
  • New classes of faults require the addition of
    the corresponding level of fault-tolerance
  • Can we do it automatically?

[Figure: the synthesis algorithm takes a fault-intolerant program p and
a class of faults f, and produces a fault-tolerant program p']
[Ebnenasir 2004] Automatic Synthesis of Fault-Tolerance, PhD thesis,
Michigan State University, 2004.
19
Conclusion
  • Fault-tolerance is an important factor in the
    survivability of software systems
  • A well-defined need for
  • the design of correct fault-tolerant programs
  • the design of programs that tolerate multiple
    classes of faults (multitolerance)
  • development methodologies that provide
    correctness guarantees
  • Automatic addition of fault-tolerance generates a
    program that is correct by construction
  • Future work
  • Developing tools for automation