Fault Tolerance in Distributed Systems

1
Fault Tolerance in Distributed Systems
  • Gökay Burak AKKUS
  • Cmpe516 Fault Tolerant Computing

2
Distributed Systems
  • Main focus on service-based systems
  • Web Services
  • Grid Computing...

3
Service Orientation
  • Services written in diverse programming languages
  • Running on diverse platforms
  • Spanning organisational boundaries
  • Service Oriented Architectures (SOA)
  • Web Services
  • Grid Computing
  • SOA is an architectural model that emphasises
    properties of interoperability and location
    transparency
  • Collection of services
  • each service can be considered as a resource that
    is either provided or consumed

4
Dependability
  • Dependability is a collective term that
    encompasses
  • Reliability
  • Performance
  • Maintainability
  • Security
  • Reliability is the part of dependability
    concerned with the probability that a given
    system will behave according to its requirements

5
SOAs
  • SOAs support the development and integration of complex
    systems by representing software functionality as
    discoverable services on a network.
  • A traditional way to increase the dependability
    of distributed systems is through the use of
    fault tolerance techniques

6
  • The approach of design diversity
  • Multi-Version design (MVD)
  • availability of multiple functionally-equivalent
    services

7
Comparison
  • Single-version system
  • Traditional MVD system
  • Provenance-aware MVD system

8
CMF
  • Common mode failure
  • If one of the shared services fails,
    the failure may propagate back to the
    calling services.
  • Occurs when independent or non-independent faults
    lead to similar errors between versions of an MVD
    system.

9
  • Such failures are a worst-case scenario in a
    fault-tolerant system, as they may
    pass through the system undetected
  • It is often safer to return no result, and alert an
    operator and/or place the system in a safe state,
    than it is to allow an undetected error to occur.

10
(No Transcript)
11
CMF by failure of a shared service
  • reduces the confidence that can be placed in the
    results of design diversity-based fault tolerance
    schemes
  • Provenance is introduced as a solution to this
    problem

12
Provenance
  • The provenance of a piece of data is the
    documentation of the process that led to that data.
  • Provenance can be used for
  • verifying a process,
  • reproduction of a process
  • and providing context to a piece of result data

13
Provenance in the context of SOAs
  • interaction provenance
  • for some data, interaction provenance is the
    documentation of interactions between actors that
    led to the data
  • actor provenance
  • For some data, actor provenance is documentation
    that can only be provided by a particular actor
    pertaining to the process that led to the data
  • In a workflow-based SOA, interaction provenance
    provides a record of the invocations of all the
    services that are used in a given workflow,
    including the input and output data of the
    various invoked services.

14
Usage of provenance
  • Through an analysis of interaction provenance,
    patterns in workflow execution can be detected
  • The data of whether a common service was invoked
    by various other services in a workflow can be
    used in a fault tolerance algorithm to see if any
    faults in a workflow stem from the misbehaviour
    of one service.
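
As a rough illustration of the analysis described above, the sketch below counts how many MVD channels invoke each service, given a simplified list of interaction records. The record layout (channel name, invoked service) and the names `service_degrees` and `records` are assumptions made for illustration; they are not the format of any real provenance store.

```python
from collections import defaultdict


def service_degrees(records):
    """Map each service to the number of distinct channels that invoke it.

    `records` is an iterable of (channel, service) pairs, a hypothetical
    simplification of interaction provenance.
    """
    channels_by_service = defaultdict(set)
    for channel, service in records:
        channels_by_service[service].add(channel)
    return {service: len(channels)
            for service, channels in channels_by_service.items()}


# Hypothetical records: import duty channels ID2 and ID3 share service ER3.
records = [
    ("ID1", "ER1"),
    ("ID2", "ER3"),
    ("ID3", "ER3"),
    ("ID3", "ER3"),  # a repeated invocation by the same channel counts once
]
degrees = service_degrees(records)

# Services invoked by more than one channel are candidates for
# common-mode failure.
shared = sorted(s for s, d in degrees.items() if d > 1)
```

A service that appears in `shared` is one whose misbehaviour could affect several channels at once.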

15
  • Provenance provides a picture of a system's
    current and past operational state, which can be
    used to isolate and detect faults
  • A scheme that performs voting on the results of
    functionally-equivalent services in order to mask
    faults of the fault model (next slide) is proposed

16
(No Transcript)
17
(No Transcript)
18
PReServ
  • Provenance Recording for Services
  • a Java-based Web Services implementation of the
    Provenance Recording Protocol
  • Enables a provenance-aware SOA using 3 components
  • A provenance store that stores, and allows for
    queries of provenance
  • A client side library for communicating with the
    provenance store
  • A handler for the Apache Axis Web Service
    container that automatically records interaction
    provenance for Axis based services and clients by
    recording incoming and outgoing SOAP messages in
    a specified provenance store.

19
MVD system
  • A service i invokes k services in its workflow
  • A counter Ck stores the number of times a service
    k is invoked by MVD channel workflows in the
    system.
  • If i produces a result that agrees with the
    consensus result, then every Sk in that service's
    workflow is increased by one; otherwise Sk is set to
    0.
  • The weighting of each service k is then calculated
    (formula not transcribed)
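
The counter bookkeeping on this slide can be sketched as follows. This is a minimal sketch under assumptions: the class name `ServiceHistory` and the service names are hypothetical, and the sketch stops at the counters Ck and Sk because the weighting formula is not part of the transcript.

```python
from collections import defaultdict


class ServiceHistory:
    """Tracks the counters Ck and Sk for every service k (hypothetical helper)."""

    def __init__(self):
        self.invocations = defaultdict(int)  # Ck: times service k was invoked
        self.successes = defaultdict(int)    # Sk: reset to 0 on disagreement

    def record(self, workflow_services, agreed_with_consensus):
        """Update the counters for one channel's workflow after a vote."""
        for k in workflow_services:
            self.invocations[k] += 1
            if agreed_with_consensus:
                self.successes[k] += 1
            else:
                self.successes[k] = 0


history = ServiceHistory()
history.record(["ER1", "TAX1"], agreed_with_consensus=True)
history.record(["ER3", "TAX1"], agreed_with_consensus=True)
history.record(["ER3", "TAX2"], agreed_with_consensus=False)
```

Note that a single disagreement resets Sk for every service in the disagreeing channel's workflow, including services that may themselves have behaved correctly.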

20
Voting
  • The FT-Grid system is used for voting
  • Results are eliminated based on their
    weightings
  • User-defined values are also fed into the voting
    process

21
  • If a service k1 has a degree of 1, then only one
    MVD channel invokes that service
  • If k1 has a degree of 2, then two MVD channels
    invoke it
  • then bias the weightings of Sk based on
    user-defined settings
  • Example
  • a user specifies a bias of 0.95 for a service with
    a degree of 2
  • then the final weighting of a service, where Si
    has a degree of 2, is
  • Wi = Si × 0.95
  • if any service within a given channel falls below
    a user-defined minimum weighting, then that
    channel is discarded from the voting process.
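
The biasing and discarding steps can be sketched as below. It assumes the bias is applied by multiplication and that a degree with no user-defined bias is left unchanged (a factor of 1.0); the function names, service names, and numeric weightings are all hypothetical.

```python
def biased_weighting(weighting, degree, bias_by_degree):
    """Multiply a service's weighting by the user-defined bias for its degree.

    Degrees without a configured bias are left unchanged (an assumption).
    """
    return weighting * bias_by_degree.get(degree, 1.0)


def channels_to_vote(channels, weightings, degrees, bias_by_degree, minimum):
    """Keep only channels in which every service meets the minimum weighting."""
    kept = []
    for channel, services in channels.items():
        final = [
            biased_weighting(weightings[s], degrees[s], bias_by_degree)
            for s in services
        ]
        if all(w >= minimum for w in final):
            kept.append(channel)
    return kept


# Hypothetical data: ER3 is shared by two channels, so it has a degree of 2.
channels = {"ID1": ["ER1"], "ID2": ["ER3"], "ID3": ["ER3"]}
weightings = {"ER1": 0.9, "ER3": 0.8}
degrees = {"ER1": 1, "ER3": 2}
bias_by_degree = {2: 0.95}

kept = channels_to_vote(channels, weightings, degrees, bias_by_degree, minimum=0.8)
```

Here ER3's weighting of 0.8 is biased down to 0.76, which falls below the minimum of 0.8, so both channels that depend on it are excluded from the vote.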

22
Experiments
  • a total of 12 web services developed and spread
    across 5 machines
  • using Apache Tomcat/Axis as a hosting environment
  • each with provenance functionality, and each
    registered with a UDDI server.
  • 5 Import Duty services developed
  • 4 Exchange Rate services developed
  • 3 Tax Lookup services developed

23
(No Transcript)
24
  • Simulate a design defect and/or malicious attack
    by perturbing code in two of the exchange rate
    services, ER3 and ER4
  • These are given a probability of failure (in this
    case, returning an incorrect value) of 0.33 and
    0.5 respectively.
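
One way to picture this kind of fault injection is a wrapper that returns a wrong value with a fixed probability. The sketch below is an assumption about how such perturbation could look, not the code used in the experiments; `perturbed`, `exchange_rate`, and the conversion factor are hypothetical.

```python
import random


def perturbed(service, failure_probability, rng):
    """Wrap `service` so it returns an incorrect value with the given probability."""
    def wrapper(*args):
        correct = service(*args)
        if rng.random() < failure_probability:
            return correct + 1.0  # deliberately incorrect value
        return correct
    return wrapper


def exchange_rate(amount):
    """Hypothetical fault-free exchange rate service."""
    return amount * 1.5


rng = random.Random(42)  # seeded so that runs are repeatable
er3 = perturbed(exchange_rate, 0.33, rng)
er4 = perturbed(exchange_rate, 0.5, rng)

trials = 10_000
er3_failures = sum(er3(100.0) != exchange_rate(100.0) for _ in range(trials))
er4_failures = sum(er4(100.0) != exchange_rate(100.0) for _ in range(trials))
```

Over many trials the observed failure fractions should settle near 0.33 and 0.5.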

25
Applied Experiments
  • Experiment 1
  • Execute a single-version client-side application
    that invokes a random import duty service,
    passing it a randomly generated set of
    parameters.
  • The application then compares the result it receives
    against the fault-free local import duty service, and logs
    whether or not a correct answer has been returned.

26
  • Experiment 2
  • Execute a client-side MVD application with no
    provenance capability
  • The application invokes all 5 import duty services,
    and waits for the first three results to be
    returned.
  • It discards the results of any import
    duty service whose weighting falls below a
    user-defined value, and performs consensus voting
    on the remaining results.
  • If no consensus is reached, or fewer than three
    channels are available to vote on, the
    client waits for an additional MVD channel to
    return results,
    checks that channel's weighting to see whether it
    should be discarded, and then votes accordingly.
  • This continues until either consensus is reached,
    or all 5 channels have been invoked
  • The results are then compared against the local
    fault-free import duty service
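
The incremental voting loop described above can be sketched as follows. This is an illustration under stated assumptions: "consensus" is taken to mean that a strict majority of the accepted results are exactly equal, and `incremental_vote`, `is_acceptable`, and the channel names are hypothetical. Returning `None` stands for "no result".

```python
from collections import Counter


def consensus(results):
    """Return the value a strict majority agrees on, or None if there is none."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


def incremental_vote(arrivals, is_acceptable, minimum_channels=3):
    """Vote as results arrive, adding one channel at a time.

    `arrivals` yields (channel, result) pairs in the order channels return.
    Channels rejected by `is_acceptable` are discarded before voting.
    """
    accepted = []
    for channel, result in arrivals:
        if not is_acceptable(channel):
            continue
        accepted.append(result)
        if len(accepted) >= minimum_channels:
            winner = consensus(accepted)
            if winner is not None:
                return winner
    return None  # all channels used without reaching consensus


arrivals = [("ID1", 10.0), ("ID2", 12.0), ("ID3", 11.0),
            ("ID4", 10.0), ("ID5", 10.0)]

# No channel discarded: consensus only appears once all five have returned.
result_all = incremental_vote(arrivals, lambda channel: True)

# Discarding ID2 and ID3 leaves three agreeing channels.
result_filtered = incremental_vote(
    arrivals, lambda channel: channel not in {"ID2", "ID3"})
```

A real voter would likely compare numeric results within a tolerance rather than by exact equality.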

27
  • Experiment 3
  • Execute an MVD client-side application with
    provenance capability.
  • The client invokes all 5 import duty services, and
    waits for the first three results to be returned.
  • It analyzes the provenance records of these channels,
    and discards the results of any channel that
    includes a service that falls below a minimum,
    user-defined weighting.
  • If no consensus is reached, or fewer than three
    channels are available to vote on, the
    MVD application waits for an additional channel
    to return results, checks to see if this channel
    should be discarded, and then votes accordingly.
  • This continues until either consensus is reached,
    or all 5 channels have been invoked
  • Results from the voter are then compared against
    the local fault-free import duty service.

28
Experimental Results
  • Each experiment iterates 1000 times
  • Each experiment is repeated three times.
  • test system
  • Apache Tomcat 5.0.28
  • Web Services implemented using Apache Axis 1.1,
  • 5 dual 3 GHz Xeon processor machines
  • Fedora Core Linux 2

29
Generation of Weightings
  • history-based weighting scheme used
  • a client application similar to the provenance-aware
    MVD scheme is run
  • history weightings based on the consensus results
    of 1000 invocations of all five import duty
    services
  • No logging or verification of results

30
(No Transcript)
31
(No Transcript)
32
  • the weightings of ER3 and ER4 show significant
    deviations
  • This is due to the faults that are injected into
    ER3 and ER4
  • Based on the results
  • minimum acceptable weightings are set

33
(No Transcript)
34
Experiment 1: Single-version system with no provenance capability
  • 1000 tests on a random import duty service
  • 164 incorrect results
  • 16.4% undetected incorrect results
  • Time for UDDI query of import duty service:
    279.72 ms
  • Total time until a result: 3895 ms.

35
(No Transcript)
36
  • Common-mode failures are frequent
  • Each channel has approximately the same
    weighting value, as there is no provenance data
  • So unreliable channels are not discarded from
    voting
  • Total time for a result: 4842 ms
  • About 1 s longer than the single-version system

37
(No Transcript)
38
MVD system with provenance capability
  • Not a single common-mode failure occurs
  • Timing is approximately the same as in
    Experiment 2

39
(No Transcript)
40
Conclusion
  • Solutions for the provision of dependability in
    service-oriented architectures are needed
  • Approach: to extend the concept of
    design-diversity-based fault tolerance schemes
    (such as multi-version design) to the
    service-oriented paradigm
  • Leverage the benefits of SOAs in order to produce
    cheaper MVD systems than has traditionally been
    the case
  • Problem: without knowledge of the workflow of
    the services that form channels within the MVD
    system, the potential arises for multiple
    channels to depend on the same service
  • This leads to an increased incidence of common-mode failure

41
Conclusion
  • The technique of using provenance to analyze a
    service's workflow is proposed
  • An initial scheme that uses provenance to
    calculate weightings of channels within an MVD
    system based on their workflow is detailed
  • A system is implemented to demonstrate the
    effectiveness of the scheme
  • Three different client applications are used to
    test the approach
  • Single-version system: fails on 16.4% of test
    iterations
  • Traditional MVD fault tolerance: fails on 7.6% of
    test iterations
  • Provenance-aware MVD scheme: failure rate of 0.6%
  • More dependable: no common-mode failures
    occurring, with negligible performance overhead

42
Finally
  • This paper
  • Details the potential for provenance data to be
    used during the voting process of an MVD scheme
  • Implements an initial proof-of-concept for the
    approach
  • Future work will include
  • investigation into obtaining QoS indicators from
    the metadata of each service in an MVD channel's
    workflow (facilitated through actor provenance)
    and applying these to the weighting algorithm
  • investigating the relationship between shared
    components and common-mode failure in more detail
    (to more finely tune the voting scheme)

43
References
  • A Provenance-Aware Weighted Fault Tolerance
    Scheme for Service Based Applications, 2005
  • FT-Grid: A Fault-Tolerance System for e-Science,
    2005

44
Questions?