SENG 521 Software Reliability - PowerPoint PPT Presentation

1
SENG 521 Software Reliability & Testing
  • Fault Tolerant Software Systems: Techniques (Part
    4a)

Department of Electrical & Computer Engineering,
University of Calgary
B.H. Far (far_at_enel.ucalgary.ca)
http://www.enel.ucalgary.ca/far/Lectures/SENG521/04a/
2
What Is Fault Tolerance?
  • A fault-tolerant computing system must be capable
    of providing specified services in the presence
    of a bounded number of failures.
  • These failures could occur because of faults
    present either in the components of the system or
    in the system's design.
  • Building large computing systems is a complex
    task; fault-tolerance requirements could make the
    task even more difficult unless appropriate
    system structuring concepts are utilized.

3
Problems
  • The traditional approaches to fault tolerance in
    hardware systems have been based on coping with
    the effects of well-understood failure modes of
    physical components.
  • Conventional hardware fault tolerance methods are
    rarely powerful enough to cope with deficiencies
    of design.
  • Consequently, most hardware fault tolerance
    techniques cannot be applied in software, where
    almost all faults are design faults.

4
History
  • Defensive programming: relatively ad hoc methods
    are used to minimize the damage that could arise
    from the presence of residual bugs.
  • Dual software technique: two distinct versions of
    the same software are implemented and executed.
    Any discrepancy in the outputs of the two
    versions may trigger an alarm.

5
Fault Tolerance Phases /1
  • Phase 1: Error detection
  • For a fault to be tolerated, it must first be
    detected. Thus the starting point for
    fault-tolerance techniques is the detection of
    errors.
  • Phase 2: Damage assessment
  • It is necessary to assess the extent to which the
    system state has been damaged or corrupted.
  • If the delay between the manifestation of a fault
    (as an error in the system state) and the
    detection of that error is large, the damage to
    the system state is likely to be more severe than
    if the latency interval were shorter.

6
Fault Tolerance Phases /2
  • Phase 3: Error recovery
  • Error recovery techniques must be utilized in
    order to obtain a normal, error-free system
    state.
  • There are two different kinds of recovery
    technique.
  • Backward recovery consists of discarding the
    current (corrupted) state in favor of an earlier
    state. Therefore, mechanisms are needed to record
    and store system states.
  • Forward recovery technique involves making use of
    the current (corrupted) state to construct an
    error-free state.
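
The two kinds of recovery can be contrasted in a small Python sketch. The `Account` class, its invariant (a non-negative balance), and the repair rule used for forward recovery are hypothetical and chosen only for illustration.

```python
import copy

class Account:
    """Toy system state; the invariant balance >= 0 is an assumption."""
    def __init__(self, balance):
        self.balance = balance

def state_is_valid(acct):
    return acct.balance >= 0

def withdraw_backward(acct, amount):
    """Backward recovery: record the state first, restore it on error."""
    checkpoint = copy.deepcopy(acct.__dict__)   # record the earlier state
    acct.balance -= amount
    if not state_is_valid(acct):
        acct.__dict__.update(checkpoint)        # discard the corrupted state
        return False
    return True

def withdraw_forward(acct, amount):
    """Forward recovery: build an error-free state from the corrupted one."""
    acct.balance -= amount
    if not state_is_valid(acct):
        acct.balance = 0                        # application-specific repair (assumed)
        return False
    return True
```

Backward recovery needs no knowledge of what went wrong, only a stored state; forward recovery needs an application-specific rule for repairing the damage.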

7
Fault Tolerance Phases /3
  • Phase 4: Fault treatment and continued service
  • Once recovery has been undertaken, it is
    essential to ensure that the normal operation of
    the system will continue without the fault
    immediately manifesting itself once more.
  • The first aspect of fault treatment is to attempt
    to locate the fault.
  • Following this, steps can be taken either to
    repair the fault or to reconfigure the rest of
    the system to avoid the fault.

8
Recovery Block Mechanism
  • Syntax of a recovery block construct:
  • ensure <acceptance test> by P0 else-by P1 else
    fail
  • It depicts a software system with three
    components: the two procedures P0 (the primary)
    and P1 (the alternative), and the acceptance
    test.
  • The design of the system is the control structure
    implied by the syntax.
  • If the acceptance test is assumed to be perfect
    (i.e., it detects all violations of the
    specification), then the recovery block, by
    falling back on P1, will tolerate all the faults
    of procedure P0 that could lead to its failure.

9
Example
  • Fault tolerance phases:
  • Error detection: an acceptance test (a Boolean
    expression) is used.
  • Damage assessment: only the program in execution
    is assumed to be affected.
  • Error recovery (backward in this case): consists
    of restoring the state of the executing program
    to that at the beginning of the recovery block.
  • Fault treatment: the program in execution
    (primary or alternative) is assumed to be faulty,
    so its faults are avoided by executing the next
    alternative (if any).
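
A recovery block and its four phases can be sketched in Python as follows. This is a minimal illustration, not the original construct: the function `recovery_block`, the exception `RecoveryBlockFailure`, and the sorting example (`p0`, `p1`, `is_ordered`) are hypothetical names, and the state is assumed to be a deep-copyable dictionary.

```python
import copy

class RecoveryBlockFailure(Exception):
    """Signaled when every alternate is rejected (the 'else fail' branch)."""

def recovery_block(state, acceptance_test, alternates):
    checkpoint = copy.deepcopy(state)            # recovery point taken on entry
    for alternate in alternates:                 # P0 first, then P1, ...
        candidate = copy.deepcopy(checkpoint)    # backward recovery: start from the saved state
        try:
            alternate(candidate)
        except Exception:
            continue                             # a crash counts as a detected error
        if acceptance_test(candidate):           # error detection
            return candidate
        # fault treatment: this alternate is presumed faulty, so try the next one
    raise RecoveryBlockFailure("all alternates rejected")

def p0(state):
    state["data"] = state["data"][::-1]          # faulty primary: reverses instead of sorting

def p1(state):
    state["data"].sort()                         # simpler alternative

def is_ordered(state):
    data = state["data"]
    return all(a <= b for a, b in zip(data, data[1:]))

result = recovery_block({"data": [3, 1, 2]}, is_ordered, [p0, p1])
```

Damage assessment is implicit here: only the working copy handed to the running alternate is assumed to be affected, so discarding that copy is sufficient.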

10
Design Technique /1
  • Robust Software Systems (Anderson and Lee 1981,
    etc.)
  • Construction of a robust module requires:
  • Exception handlers for coping with exceptions
    propagated from lower levels, and
  • Boolean expressions for detecting exceptions
    arising in the module itself, together with their
    exception handlers.
  • It is often possible (and desirable for the sake
    of simplicity) to map several exceptions onto a
    single handler.
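
As a rough Python illustration of these ingredients, the sketch below uses one Boolean check for an exception arising in the module itself and a single handler onto which several lower-level exceptions are mapped. The function `read_ratio` and the exception `ServiceFailed` are hypothetical.

```python
class ServiceFailed(Exception):
    """The one exception this module signals to its caller (hypothetical)."""

def read_ratio(table, key):
    # Boolean expression detecting an exception arising in the module itself.
    if not isinstance(key, str):
        raise ServiceFailed("key must be a string")
    try:
        num, den = table[key]      # may raise KeyError, TypeError or ValueError
        return num / den           # may raise ZeroDivisionError
    except (KeyError, TypeError, ValueError, ZeroDivisionError) as exc:
        # Several lower-level exceptions are mapped onto a single handler.
        raise ServiceFailed(f"cannot compute ratio for {key!r}") from exc
```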

11
Design Technique /2
  • The use of data abstractions (abstract data
    types) in program development is assumed.
  • The software system is structured into a
    hierarchy of modules represented by an acyclic
    graph.
  • Modules are represented by nodes, and an arrow
    from node A to node B means that the successful
    completion of one or more operations in A depends
    on the successful completion of some operation
    provided by B; in other words, B provides certain
    services to A.
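
Whether a "uses" relation between modules really forms an acyclic hierarchy can be checked mechanically. The sketch below relies on Python's standard `graphlib` module (Python 3.9 or later); the helper `build_order` and the module names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def build_order(uses):
    """Return the modules lowest level first.

    `uses` maps each module to the set of modules whose services it uses.
    A ValueError is raised if the relation is not acyclic.
    """
    try:
        return list(TopologicalSorter(uses).static_order())
    except CycleError as exc:
        raise ValueError(f"module hierarchy is not acyclic: {exc.args[1]}") from exc

# A uses B, and B uses the lower-level module F.
order = build_order({"A": {"B"}, "B": {"F"}, "F": set()})
```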

12
Design Technique /3
  • A normal chain of events consists of some
    procedure of A making a call on B, B calling a
    lower-level module (say F), this call returning
    normally, and subsequently A's call returning
    normally.

13
Design Technique /4
  • Exception cases:
  • A call from B to a lower-level module returns an
    exception, and this is passed on to A.
  • A call from B to a lower-level module returns an
    exception, but B has an exception handler that
    can handle it, so B provides a normal service to
    A.
  • A Boolean expression in B, inserted specifically
    for detecting an error (exception), evaluates to
    false. This is handled in one of two ways:
  • The exception is masked, in which case B returns
    normally to A, or
  • An exceptional return is obtained by A.
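
These cases can be mimicked in Python as follows. All names are hypothetical: `f` plays the lower-level module F, the `b_*` functions are variants of module B, and the valid range 0 to 100 is an assumed specification.

```python
class LowerLevelError(Exception):
    """Exception returned by the lower-level module F."""

class BFailed(Exception):
    """Exceptional return that B gives to its caller A."""

def f(text):
    try:
        return float(text)
    except ValueError as exc:
        raise LowerLevelError(text) from exc

def b_propagate(text):
    # The exception from F is passed on to A (as B's own exception).
    try:
        return f(text)
    except LowerLevelError as exc:
        raise BFailed("B could not provide its service") from exc

def b_mask(text, default=0.0):
    # B handles the exception itself and still returns normally to A.
    try:
        return f(text)
    except LowerLevelError:
        return default

def b_checked(text, mask):
    # A Boolean expression inserted in B detects an error.
    value = f(text)
    if not 0.0 <= value <= 100.0:             # the check evaluates to false
        if mask:
            return min(max(value, 0.0), 100.0)  # masked: normal return to A
        raise BFailed("value out of range")     # exceptional return to A
    return value
```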

14
Notation
  • A procedure P, besides the normal return, also
    provides an exceptional return E:
  • procedure P(--) signals E
  • The invoker of P can define the exceptional
    continuation to be some operation H, which is
    called the handler of E:
  • P(--) [E → H]
  • In P the following constructs can be inserted:
  • [T → .. signal E] (1)
  • O [L → .. signal E] (2)
  • (1) represents the case where an exception is
    detected by a run-time test T.
  • (2) represents the case where the invocation of
    an operation 'O' results in an exceptional return
    L, which in turn could lead to the signaling of
    exception E.
  • When an exception is signaled using construct (1)
    or (2), control passes to the handler of that
    exception (H).

15
Example: Expected Events
  • Design of a procedure P which adds three positive
    integers.
  • The procedure uses the operation '+', which can
    signal an overflow exception 'OV'.
  • procedure P (var i, j, k: integer) signals OW
  • begin
  • i := i + j [OV → signal OW]
  • i := i + k [OV → i := i - j; signal OW]
  • end
  • An important aspect of exception handling is the
    clean-up operation: in the second handler,
    i := i - j undoes the first addition before OW is
    signaled.
  • If all the procedures of a module follow this
    strategy, we get a module with the following
    highly desirable property:
  • Either the module produces results that reflect
    the desired normal service to the caller, or no
    results are produced and an exceptional return is
    obtained by the caller.
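
The procedure P can be approximated in Python. Because Python integers never overflow, the sketch assumes a 32-bit signed limit (`MAX_INT`) and a helper `add` that signals `OV`; the dictionary `v` stands in for the `var` parameters. All of these are illustrative assumptions.

```python
MAX_INT = 2**31 - 1      # assumed word size; Python integers are unbounded

class OV(Exception):
    """Overflow exception signaled by the '+' operation."""

class OW(Exception):
    """Exception signaled by P to its caller."""

def add(a, b):
    if a + b > MAX_INT:
        raise OV
    return a + b

def p(v):
    """Add v['j'] and v['k'] to v['i'], or leave v unchanged and signal OW."""
    try:
        v["i"] = add(v["i"], v["j"])
    except OV:
        raise OW from None       # nothing has been modified yet
    try:
        v["i"] = add(v["i"], v["k"])
    except OV:
        v["i"] -= v["j"]         # clean-up: undo the first addition
        raise OW from None
```

Either `v["i"]` holds the full sum after the call, or `OW` is raised and `v` is exactly as it was before the call.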

16
Unexpected Events /1
  • (1) The execution of P does not terminate.
  • (2) A lower-level exception is detected for which
    there is no exception handler in P.
  • (3) The execution of P terminates normally (the
    invoker obtains a normal return) but the results
    produced by P are not in accordance with the
    specification.
  • Situations (1) and (2) will eventually cause a
    failure of the module; situation (3) represents
    the case where the module has failed but this
    event has not yet been detected by the system.

17
Unexpected Events /2
  • To cope with such cases, we can employ a default
    exception handler:
  • procedure P (--) signals E
  • begin
  • end [→ "default handler"]
  • Control passes to this handler during the
    execution of P whenever an exception is detected
    for which there is no handler.

18
Unexpected Events /3
  • Case (1): It is possible to start a timer
    concurrently with the invocation of P; the
    time-out exception will then be handled by the
    default handler.
  • Case (2): All the lower-level exceptions with no
    programmed handlers will similarly be handled by
    the default handler.
  • Case (3): Make use of run-time checks to detect
    possible violations of specifications, to
    minimize the danger of undetected failures.
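
The timer for case (1) and the catch-all for case (2) can be sketched with Python's standard `concurrent.futures` module. The wrapper `call_with_timeout`, the exception `Fail`, and the procedure `slow` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

class Fail(Exception):
    """Hypothetical 'fail' exception raised on behalf of the default handler."""

def call_with_timeout(proc, timeout_s, *args):
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(proc, *args)      # P runs while the timer counts down
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:                      # case (1): time-out exception
        raise Fail("P did not terminate in time") from None
    except Exception as exc:                   # case (2): no programmed handler
        raise Fail(f"unhandled exception in P: {exc!r}") from exc
    finally:
        executor.shutdown(wait=False)          # do not block on a stuck worker

def slow():
    time.sleep(0.3)                            # stands in for a procedure that hangs
```

A Python thread cannot be stopped forcibly, so the abandoned worker keeps running in the background; a procedure that truly never terminates would have to run in a separate process that can be killed.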

19
Unexpected Events /4
  • What strategy should be adopted by the default
    handler?
  • The simplest thing to do is to undo any
    side-effects produced by the procedure and to
    signal a fail exception.
  • When the invoker receives a fail exception, it
    means that the called module has failed to
    provide the specified service.
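
One possible shape for such a default handler in Python is a decorator that saves the state on entry, restores it if an unhandled exception escapes, and then signals a fail exception. The names `with_default_handler`, `Fail`, and `transfer` are hypothetical, and the state is assumed to be a dictionary.

```python
import copy
import functools

class Fail(Exception):
    """Signals that the called module failed to provide its service."""

def with_default_handler(proc):
    @functools.wraps(proc)
    def wrapper(state, *args, **kwargs):
        saved = copy.deepcopy(state)       # kept so that side-effects can be undone
        try:
            return proc(state, *args, **kwargs)
        except Exception as exc:           # any exception without a programmed handler
            state.clear()
            state.update(saved)            # undo the side-effects
            raise Fail(proc.__name__) from exc
    return wrapper

@with_default_handler
def transfer(state, src, dst, amount):
    state[src] -= amount
    state[dst] += amount                   # raises KeyError if dst does not exist
```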

20
Design Guidelines
  • For a given module, carefully analyze the cases
    that could prevent the module from providing the
    desired normal services.
  • Make use of exception handlers either to mask the
    effects of such undesired, but expected,
    exceptions or to signal an appropriate exception
    to the caller of the module.
  • Make use of default exception handlers or
    recovery blocks to obtain a measure of tolerance
    against design faults.

21
Discussion
  • The capability of tolerating design faults rests
    largely on the coverage of run-time checks
    (i.e. acceptance tests) for detecting errors.
  • Often, it is not possible to check completely
    within a procedure that the results produced
    conform to the specification (e.g., for a routine
    that sorts its input, checking that the output is
    a correctly ordered permutation of the input can
    be comparable in complexity to the routine
    itself).
  • Hence run-time checks are often limited to
    checking certain critical aspects of the
    specification.
  • This means that the possibility of undetected
    failures cannot be ruled out entirely.
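
The sorting example can be made concrete in Python. In this sketch (all names hypothetical), a cheap partial check accepts the output of a faulty sort routine that a complete check would reject.

```python
from collections import Counter

def is_ordered(out):
    """Partial acceptance test: checks only the ordering of the output."""
    return all(a <= b for a, b in zip(out, out[1:]))

def is_permutation(inp, out):
    """Remaining part of the specification: no element is lost or invented."""
    return Counter(inp) == Counter(out)

def faulty_sort(xs):
    return sorted(set(xs))       # design fault: duplicates are silently dropped

inp = [3, 1, 3, 2]
out = faulty_sort(inp)
partial_ok = is_ordered(out)                              # the partial check passes
complete_ok = partial_ok and is_permutation(inp, out)     # the complete check fails
```

A run-time check limited to `is_ordered` would let this failure go undetected.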