1
Adaptive and Distributed System-Level Diagnosis
Computer Science Department, University of Pisa
  • Seminars for the PhD in Computer Science
  • Luiz Carlos Pessoa Albini

2
Adaptive Diagnosis
  • Introduced by Nakajima in 1981
  • Centralized
  • The central diagnoser chooses the order of the
    tests to be executed by the units during the
    diagnosis process
  • Complete testing graph
  • Any unit is capable of testing any other unit
  • Objective: identify one fault-free unit, which
    will be used as a tester for the rest of the
    system.
  • Main Idea
  • Units decide the next tests based on previous
    test results.

3
Distributed Diagnosis
  • There is no central diagnoser
  • The units that perform the tests also carry out
    the diagnosis
  • Algorithm is executed on many or all units
    simultaneously
  • The complexity of a distributed diagnosis
    algorithm depends on the testing and
    communication interconnection
  • If the system test graph satisfies only the
    diagnosability requirements, the distributed
    diagnosis algorithm will probably be complex
  • If the interconnection network is regular, the
    distributed diagnosis algorithm may be
    simplified by making use of the structure of the
    system.

4
SELF
  • Kuhl and Reddy introduced distributed
    system-level diagnosis in 1981
  • SELF algorithm
  • Tests are fixed (non-adaptive)
  • Each unit tests its pre-designated neighbors
  • Each unit broadcasts its test results
  • Each unit performs the diagnosis based on its own
    tests and on the received testing results
  • Diagnosis can be incorrect because broadcasts can
    be routed by faulty units.

5
New_SELF
  • Proposed by Hosseini, Kuhl and Reddy in 1984
  • New_SELF
  • First algorithm for distributed diagnosis of
    distributed systems
  • Works correctly if the number of faults is no
    more than t
  • Fixed test assignment (non-adaptive)
  • Each unit independently determines the state of
    the system based on the results of its own tests
    and on the results received from other units
  • Failures and repairs are accepted, but a unit
    cannot fail and then recover without its failure
    being detected
  • Ensures the accuracy of the test results by
    restricting the forwarding of test results to
    fault-free units
  • Requires that every fault-free unit receives all
    test results from other fault-free units.
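
The forwarding restriction above can be illustrated with a minimal Python sketch. It assumes that a unit accepts forwarded results only from senders it has itself tested as fault-free; the function name, the dictionary layout, and the state strings are hypothetical and do not come from the New_SELF paper.

```python
def accept_forwarded(own_tests, sender, results, known):
    """Merge `results` forwarded by `sender` into `known`, but only if this
    unit has itself tested `sender` as fault-free (assumed acceptance rule).

    own_tests: {unit_id: "fault-free" | "faulty"} from this unit's own tests
    results:   {(tester, tested): state} forwarded by `sender`
    known:     this unit's accumulated test results, updated in place
    Returns True if the results were accepted.
    """
    if own_tests.get(sender) != "fault-free":
        return False  # results relayed by a faulty or untested unit are dropped
    known.update(results)
    return True
```

In this sketch, results relayed by a unit that the receiver found faulty never enter its view of the system, which is the property the slide attributes to New_SELF.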

6
Somani and Agarwal
  • Distributed local diagnosis approach for regular
    structures proposed by Somani and Agarwal in
    1989
  • For most regular structures, all fault sets of
    size up to t are uniquely diagnosed by the
    algorithm in linear time complexity
  • For large regular structures, most fault sets of
    size up to O(log N) and O( ) are uniquely
    diagnosed

7
Algorithm Evaluation
  • Latency
  • Maximum time interval for all fault-free units
    to diagnose an event
  • Normally expressed in number of testing rounds.
  • Maximum Number of Tests
  • Number of tests performed by all fault-free
    units
  • Normally expressed in number of tests per testing
    round.
  • Diagnosability
  • Limit on the number of faulty units in the
    system.
  • Testing round
  • Maximum time interval needed by the fault-free
    units to finish their tests
  • Can be different for different algorithms.

8
Adaptive-DSD
  • Adaptive-DSD - Adaptive Distributed System-Level
    Diagnosis Algorithm.
  • Characteristics
  • Fully-connected systems
  • A fault-free unit performs tests until it finds
    another fault-free unit, or has tested all other
    units as faulty
  • Diagnostic graph is a ring (cycle) when all
    units are fault-free
  • A fault-free unit is tested at most once per
    testing round
  • Testing round: the maximum time interval needed
    by the fault-free units to test another
    fault-free unit, or to test all other units as
    faulty.
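
The testing rule above can be sketched as a small Python simulation. It assumes that unit i tests units i+1, i+2, ... (mod N) in sequence, which is consistent with the ring formed when all units are fault-free; that ordering and the function name are assumptions, not details taken from the slides.

```python
def adaptive_dsd_round(states):
    """Simulate one testing round.

    states[i] is True when unit i is fault-free.
    Returns {tester: [(tested_unit, test_passed), ...]} for fault-free testers.
    """
    n = len(states)
    tests = {}
    for i in range(n):
        if not states[i]:
            continue  # tests run by faulty units are unreliable, so not simulated
        performed = []
        for step in range(1, n):
            j = (i + step) % n          # assumed sequential testing order
            performed.append((j, states[j]))
            if states[j]:
                break                   # stop at the first fault-free unit
        tests[i] = performed
    return tests
```

With four fault-free units, every unit performs exactly one test and the tests form a ring. With units 1 and 2 faulty, unit 0 tests 1, 2, and 3 before stopping.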

9
Adaptive-DSD
  • Latency
  • O(N) testing rounds.
  • Maximum Number of Tests
  • O(N) tests in the worst case.
  • Diagnosability
  • (N-1)-diagnosable.

10
Hi-ADSD
  • Characteristics
  • Fully-connected system
  • Units are grouped in clusters
  • A fault-free unit performs tests until it finds
    another fault-free unit, or has tested all other
    units as faulty
  • Diagnostic graph is a hypercube when all units
    are fault-free
  • When a fault-free unit tests another fault-free
    unit, it gets diagnostic information from the
    tested unit about the entire cluster
  • Testing round: the maximum time interval needed
    by the fault-free units to test another
    fault-free unit, or to test all other units as
    faulty.
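
The slides do not spell out how clusters are formed. The following Python sketch shows one possible layout, assuming unit ids 0..N-1 with N a power of two, where cluster s of unit i holds the 2^(s-1) units whose highest differing id bit is bit s-1. The ordering inside each cluster is also an assumption, chosen so that the first member is a hypercube neighbor of i.

```python
def cluster(i, s, n):
    """Return cluster s (1-based) of unit i in a system of n units.

    Assumed layout: the cluster holds the ids that differ from i in bit s-1
    and agree with i on all higher bits. The first element is i XOR 2^(s-1).
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    assert 0 <= i < n and 1 <= s <= n.bit_length() - 1
    size = 1 << (s - 1)
    return [i ^ size ^ k for k in range(size)]
```

Under these assumptions, the clusters of a unit for s = 1 .. log2 N are disjoint and together cover every other unit. If every unit tests only the first member of each cluster, the tests follow the edges of a hypercube.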

11
Hi-ADSD
  • Latency
  • log² N testing rounds.

[Figure: diagram with units labeled 1, 5, 4, 7, 2, 3]

12
Hi-ADSD
  • Maximum Number of Tests
  • O(N²) in the worst case.
  • Diagnosability
  • (N-1)-diagnosable.

[Figure: diagram with units labeled 1, 5, 4, 7, 2, 3]

13
Hi-ADSD with Detours
  • Characteristics
  • Fully-connected systems
  • Units are grouped in clusters
  • When a faulty unit is tested, the tester uses
    detours to get the information about the blocked
    cluster, instead of performing additional tests
  • Detours are alternative paths with the same
    diagnostic distance, used when the original path
    is faulty
  • Diagnostic distance: length of the shortest path
    between two units in the diagnostic graph
  • If the original path and the detours are faulty,
    the algorithm behaves like Hi-ADSD and performs
    more tests
  • Testing round: the maximum time interval needed
    for the fault-free units to execute the normal
    test plus the extra tests needed.
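
The diagnostic distance defined above is a shortest-path length, so it can be computed with a breadth-first search. The Python sketch below assumes the diagnostic graph is given as an adjacency dictionary; the `hypercube` helper is a hypothetical builder for the all-fault-free case, not part of the algorithm.

```python
from collections import deque

def diagnostic_distance(graph, src, dst):
    """Number of edges on the shortest path from src to dst, or None."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is not reachable from src

def hypercube(dim):
    """Adjacency dictionary of a dim-dimensional hypercube (illustrative)."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}
```

In a 3-dimensional hypercube, for example, units 0 and 7 are at diagnostic distance 3. A detour in the sense of the slide would be another path of the same length that avoids the faulty unit.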

14
Hi-ADSD with Detours
  • Latency
  • log² N testing rounds.
  • Maximum Number of Tests
  • N log N tests per log N testing rounds in the
    worst case
  • Worst-case example: no faulty units.
  • Diagnosability
  • (N-1)-diagnosable.

15
Hi-ADSD with Timestamps
  • Characteristics
  • Fully-connected systems
  • Units are grouped in clusters
  • All clusters with N/2 units
  • Units can be in more than one cluster
  • Event diagnosis
  • Diagnosis of dynamic fault and repair events
  • Timestamps
  • State-change counters, one for each unit of the
    system
  • Used to date the information
  • Testing round: the maximum time interval needed
    for the fault-free units to execute the normal
    test plus the extra tests needed.
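
How state-change counters can date information is sketched below in Python. The merge rule (keep the larger counter) follows from the counters only growing. The even/odd decoding assumes every unit starts fault-free with counter 0, which the slides do not state; both function names are hypothetical.

```python
def merge_timestamps(local, received):
    """Keep, for every unit, the most recent (largest) state-change counter.

    local, received: {unit_id: counter}. `local` is updated in place.
    Returns the set of units whose entry was refreshed.
    """
    updated = set()
    for unit, ts in received.items():
        if ts > local.get(unit, -1):
            local[unit] = ts
            updated.add(unit)
    return updated

def state_of(ts):
    # Assumption: counter 0 means the initial fault-free state, so an even
    # counter means fault-free and an odd counter means faulty.
    return "fault-free" if ts % 2 == 0 else "faulty"
```

Because only larger counters overwrite local entries, stale information obtained from another unit cannot undo a more recent event that the tester already knows about.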

16
Hi-ADSD with Timestamps
  • Latency
  • Lower than the other algorithms on average
  • log² N testing rounds in the worst case.
  • Maximum Number of Tests
  • N log N tests per log N testing rounds in the
    worst case.
  • Diagnosability
  • (N-1)-diagnosable.

17
Broadcast Comparison
  • Will be presented later

18
Generalized Model for Distributed Comparison
  • Tests are made by comparing task results
  • A unit sends a task to two units
  • After executing the task the two units send their
    outputs to the tester
  • The tester compares the outputs
  • If the comparison produces a match
  • The two tested units are considered fault-free
  • If the comparison produces a mismatch
  • The tester knows that at least one of the tested
    units is faulty, but does not know which one.
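
A comparison test as described above can be sketched in Python as follows. Units are modeled as plain functions that execute a task; `good_unit` and `bad_unit` are hypothetical stand-ins, with the faulty unit assumed to return a corrupted result.

```python
def comparison_test(task, unit_a, unit_b):
    """Send the same task to two units and compare their outputs."""
    out_a = unit_a(task)
    out_b = unit_b(task)
    return "match" if out_a == out_b else "mismatch"

def good_unit(task):
    return sum(task)        # correct result of the (illustrative) task

def bad_unit(task):
    return sum(task) + 1    # assumed faulty behavior: corrupted output
```

A mismatch tells the tester only that at least one of the two units is faulty. The same outcome is produced whether the first, the second, or both units are faulty.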

19
Generalized Model for Distributed Comparison
  • Assumptions
  • A fault-free unit comparing outputs produced by
    two fault-free units always produces a match
  • A fault-free unit comparing outputs produced by a
    faulty unit and any other unit, faulty or
    fault-free, always produces a mismatch
  • The time for a fault-free unit to produce an
    output for a task is bounded
  • After a unit is tested in a certain testing
    round, its state does not change until the end of
    the testing round.
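
Under these assumptions, a fault-free tester can resolve a mismatch once it has found one matching pair. The Python sketch below is an illustrative procedure, not the Hi-Comp algorithm: it looks for a matching pair, uses one unit of that pair as a fault-free reference, and compares every other unit against it. The `compare` callback is hypothetical and returns True on a match.

```python
from itertools import combinations

def classify(units, compare):
    """Classify units using pairwise comparisons run by a fault-free tester.

    Returns {unit: "fault-free" | "faulty"}, or None when no pair matches
    (in which case no fault-free reference is available).
    """
    reference = None
    for a, b in combinations(units, 2):
        if compare(a, b):       # a match means both units are fault-free
            reference = a
            break
    if reference is None:
        return None
    return {
        u: "fault-free" if u == reference or compare(reference, u) else "faulty"
        for u in units
    }
```

The second step works because, by the assumptions, comparing a fault-free reference with any faulty unit always produces a mismatch.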

20
Hi-Comp
  • Characteristics
  • Fully-connected systems
  • Units are grouped in clusters
  • All clusters with N/2 units
  • Units can be in more than one cluster
  • Event diagnosis
  • Diagnosis of dynamic fault and repair events
  • Together with the output of a test, the tested
    unit sends its diagnostic information to the
    tester
  • The tester keeps this information until it
    determines the state of the tested unit
  • If fault-free: the tester updates its own
    information
  • If faulty: the tester discards the information
  • Testing round: the time interval needed for all
    fault-free units to get diagnostic information
    about all units in the system
  • By the end of a testing round all units are
    classified as faulty or fault-free.
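
The keep-then-commit handling of diagnostic information can be sketched in Python as below. The `Tester` class, its method names, and the dictionary layout are hypothetical. The merge is a plain overwrite, a simplification that ignores how the real algorithm orders conflicting information.

```python
class Tester:
    """Buffers piggybacked diagnostic information until the sender is classified."""

    def __init__(self):
        self.info = {}      # this tester's view: {unit_id: state}
        self.pending = {}   # information received but not yet trusted

    def receive(self, unit, diagnostic_info):
        # Keep the information until the state of `unit` is determined.
        self.pending[unit] = dict(diagnostic_info)

    def resolve(self, unit, fault_free):
        buffered = self.pending.pop(unit, {})
        if fault_free:
            self.info.update(buffered)   # trusted: merge into own information
        # If the unit is faulty, the buffered information is simply dropped.
        self.info[unit] = "fault-free" if fault_free else "faulty"
```

For example, information buffered from a unit that later turns out to be faulty never reaches `info`, while information from a fault-free unit is merged as soon as its state is known.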

21
Hi-Comp
[Figure: diagram with units labeled 0, 12, 10, 9, 6, 5, 3, 7, 14, 13, 11, 15]

22
Hi-Comp
  • Latency
  • Log N testing rounds in the worst case.
  • Maximum Number of Tests
  • N³ tests in the worst case.
  • Diagnosability
  • (N-1)-diagnosable.

23
Conclusions
  • Without a central observer
  • Some algorithms need a fully-connected system
  • Local area networks
  • In fully-connected systems
  • At the end of a testing round the diagnosis is
    complete
  • It may be incorrect, depending on the number of
    testing rounds elapsed since the event.
  • Algorithms are not comparable based on the
    testing round, as they use different definitions
    of testing round.