Static and Dynamic Fault Diagnosis

Provided by: richard171

Learn more at: https://cis.temple.edu

1
Static and Dynamic Fault Diagnosis

Richard Beigel
Univ. Illinois at Chicago
and DIMACS

2
Nonstandard computing architectures

Perceptrons and small-depth circuits
Optically interconnected multiprocessors
DNA computing

Self-diagnosing Systems

3
Brief history of system-level fault diagnosis

Preparata et al. 67: static, nonadaptive
Nakajima 81: static, adaptive, serial
Hakimi, Nakajima 84: static, adaptive, parallel

4
Recent advances in system-level diagnosis

Distributed diagnosis
Diagnosing intermittent faults
Diagnosis with errors
Fast parallel diagnosis of static faults
Ongoing diagnosis and repair of dynamic faults

5
Fault diagnosis problem

Given n processors,
a primitive by which each processor can test any other, and
a reliable external controller that observes test results
Determine which are good and which are faulty
Assume perfect communication in a complete network

6
What's so hard about that?
"Say Ah."
"Ha Ha!"
"OK, you pass."
Faulty processors may give incorrect test results
7
Possible test results

8
A majority of processors must be good for diagnosis to be possible
"We're all good. They're all faulty."
"We're all good. They're all faulty."
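
The two-camps picture can be checked mechanically. The sketch below assumes the usual model in which good testers are truthful and faulty testers are unconstrained; `consistent` and the four-processor example are illustrative names and data, not taken from the slides. With exactly half the processors in each camp, the same set of test results fits both "camp A is good" and "camp B is good".

```python
from itertools import permutations

def consistent(syndrome, good):
    """True if every test made by a processor in `good` matches that hypothesis."""
    return all(
        result == ("good" if testee in good else "faulty")
        for (tester, testee), result in syndrome.items()
        if tester in good  # faulty testers are unconstrained
    )

group_a, group_b = {0, 1}, {2, 3}
# Each camp praises its own members and condemns the other camp.
syndrome = {
    (i, j): "good" if ({i, j} <= group_a or {i, j} <= group_b) else "faulty"
    for i, j in permutations(range(4), 2)
}
print(consistent(syndrome, group_a), consistent(syndrome, group_b))  # True True
```

Requiring a strict majority of good processors rules out one of the two hypotheses, because only one camp can then be large enough.
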
9
Serial diagnosis of static faults

n processors, at most t faults, t < n/2
Nonadaptive diagnosis
n(t+1) tests are necessary and sufficient (Preparata et al. 67)
Adaptive diagnosis
n+t-1 tests are necessary and sufficient (Nakajima 81)

10
Distributed diagnosis of static faults

In the distributed diagnosis model there is no central controller, and all good processors must learn the status of the other processors.
Distributed diagnosis is reducible to the 'cooperative collect' problem, and can be solved with [...] tests (Aspnes-Hurwood 96)

11
INTERMITTENT FAULTS AND ERRORS

Work in progress by Beigel and Fu

12
Intermittent faults

An intermittent fault may appear faulty in some tests and good in others
We cannot hope to diagnose intermittent faults as such, because they might exhibit consistent behavior in all tests
Goal: correctly diagnose all other processors

13
Errors

An error is a misdiagnosis by a good processor.
Note the similarity to an intermittent fault

14
Results

In [...] rounds, we can perform static diagnosis, assuming that a majority of the processors are good and at most t of them are intermittently faulty.
In [...] rounds, we can perform static diagnosis in the presence of errors. Assuming at most t errors per round, the results will be within [...] of a correct diagnosis.

15
PARALLEL DIAGNOSIS OF STATIC FAULTS

Perform many tests simultaneously

16
Parallel diagnosis of static faults

Year  Authors                                 Rounds
84    Hakimi, Schmeichel                      O(n / log n)
90    Schmeichel, Hakimi, Otsuka, Sullivan    O(log n)
89    Beigel, Kosaraju, Sullivan              O(1)
93    Beigel, Margulis, Spielman              32
94    Beigel, Hurwood, Kahale                 10
      best lower bound                        5

17
Digraphs

tester → testee
testing round = directed matching

18
SHOS 90 generates a large mutual admiration society (MAS)

MAS = strongly connected component with all good edges
Either
all nodes good, or
all nodes faulty
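
A sketch of how MASs could be extracted from recorded test results, taking them to be the strongly connected components of the digraph that keeps only "good" verdicts (a simplifying reading of the definition above). The all-or-nothing property follows because a good tester's "good" verdict is truthful: if any member is good, everything it reaches along good edges is good too. The function name and input layout are assumptions; the component search is Kosaraju's two-pass algorithm.

```python
from collections import defaultdict

def mutual_admiration_societies(results):
    """Strongly connected components (size >= 2) of the 'good'-verdict digraph."""
    fwd, rev, nodes = defaultdict(list), defaultdict(list), set()
    for (tester, testee), verdict in results.items():
        nodes.update((tester, testee))
        if verdict == "good":              # keep only edges where the tester approves
            fwd[tester].append(testee)
            rev[testee].append(tester)

    def dfs(start, graph, seen, out):
        # Iterative depth-first search that appends nodes in post-order.
        stack = [(start, iter(graph[start]))]
        seen.add(start)
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    order, seen = [], set()
    for v in sorted(nodes):                # pass 1: finishing order on the forward graph
        if v not in seen:
            dfs(v, fwd, seen, order)
    comps, seen = [], set()
    for v in reversed(order):              # pass 2: components on the reversed graph
        if v not in seen:
            comp = []
            dfs(v, rev, seen, comp)
            if len(comp) >= 2:
                comps.append(sorted(comp))
    return sorted(comps)

example = {(0, 1): "good", (1, 2): "good", (2, 0): "good", (2, 3): "good",
           (3, 2): "faulty", (3, 4): "good", (4, 3): "good"}
print(mutual_admiration_societies(example))  # [[0, 1, 2], [3, 4]]
```
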

19
SHOS 90: O(log n) pairing algorithm

Pair up processors

Pair up pairs

Pair up fours
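
The doubling idea can be sketched as follows. This is a toy illustration rather than the SHOS 90 algorithm itself: it assumes good testers are truthful, models every faulty tester as always answering "good", and lets one representative of each group test a representative of the other. The name `pairing_levels` and the `rejected` list are hypothetical. Each level uses a mutual pair of tests, so it does not correspond one-to-one to a testing round.

```python
def pairing_levels(is_good, levels):
    """Repeatedly pair up groups; merge two groups when their mutual tests both say 'good'."""
    def says_good(tester, testee):
        # Assumed model: good testers are truthful; this toy faulty tester always says 'good'.
        return is_good[testee] if is_good[tester] else True

    groups = [[p] for p in range(len(is_good))]
    rejected = []                           # groups whose pairing produced a 'faulty' verdict
    for _ in range(levels):
        merged = []
        for a, b in zip(groups[0::2], groups[1::2]):
            # One representative from each group tests the other's representative.
            if says_good(a[0], b[0]) and says_good(b[0], a[0]):
                merged.append(a + b)        # still a mutual admiration society
            else:
                rejected.extend([a, b])     # the groups "don't like each other"
        if len(groups) % 2:                 # an unpaired group waits for the next level
            merged.append(groups[-1])
        groups = merged
    return groups, rejected

status = [True, True, False, True, True, True, False, False]
groups, rejected = pairing_levels(status, 2)
print(groups, rejected)
```

Every surviving group is either entirely good or entirely faulty, because a merge needs a "good" verdict in both directions and a good tester never approves a faulty testee.
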

20
What about processors that don't like each other?

Build one chain for each good processor we found (4 rounds)
Most chains must have a good processor in each level (count!)
Total 4 1 rounds

21
Beigel-Margulis-Spielman 94

Nonconstructive (32 rounds)
Find several MASs of size [...], including at least one good MAS
Large MASs test each other and all remaining processors in 4 rounds

Constructive (84 rounds)
Find several MASs of size [...], including at least one good MAS
Large MASs test each other and all remaining processors in 6 rounds

22
Expander graphs guarantee a good big MAS

In the Cayley graphs of Margulis and LPS with p = 37, every n/2-node induced subgraph contains a strong component of size [...]
(cf. Alon Chung 88, who find long paths)
Degree of undirected graph = 38
78 directed matchings cover the graph
78 + 6 = 84 rounds

23
Random graphs guarantee a good big MAS

If G consists of 14 directed Hamiltonian paths on n vertices then, w.h.p., every n/2-node induced subgraph contains a strong component of size [...]
28 directed matchings cover the graph
28 + 4 = 32 rounds
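
The count of 28 matchings follows from splitting each directed Hamiltonian path into its even-numbered and odd-numbered edges, giving 2 matchings per path and 14 × 2 = 28 in total. The sketch below builds such a graph and checks the decomposition. Drawing each path as an independent uniformly random permutation is an assumption about the random model, and all function names are illustrative.

```python
import random

def path_matchings(path):
    """Split a directed Hamiltonian path into two directed matchings (alternating edges)."""
    edges = list(zip(path, path[1:]))        # (tester, testee) pairs along the path
    return edges[0::2], edges[1::2]

def is_matching(edges):
    """True if no processor takes part in more than one test."""
    used = [v for edge in edges for v in edge]
    return len(used) == len(set(used))

def random_test_graph(n, num_paths=14, seed=0):
    """Return the directed matchings covering `num_paths` random Hamiltonian paths."""
    rng = random.Random(seed)
    matchings = []
    for _ in range(num_paths):
        path = list(range(n))
        rng.shuffle(path)                    # assumed model: a uniformly random path
        matchings.extend(path_matchings(path))
    return matchings

matchings = random_test_graph(20)
print(len(matchings), all(is_matching(m) for m in matchings))  # 28 True
```

Each matching can be executed as one testing round, since every processor is involved in at most one test.
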

24
Beigel-Hurwood-Kahale 95 speeds up BMS 94

In k+1 rounds build MASs of size [...]
Also build one chain of don't-likes
Each MAS can be in [...] simultaneous tests
Perform G's directed matchings in 1 round
Process chain in 2 or 3 more rounds
Constructive: 13 rounds. Nonconstructive: 10 rounds.

25
Lower bound / upper bound for smaller t

n processors, at most t faults
If [...], 5 rounds are necessary
If [...], 4 rounds suffice
Algorithm uses lower-degree expanders

26
DIAGNOSIS AND REPAIR OF DYNAMIC FAULTS

Processors fail each round, but the algorithm may order repairs

27
Ongoing diagnosis and repair of dynamic faults

Processors may fail each round, but the algorithm may order repairs
In each round:
1. perform tests
2. direct that up to t processors are repaired
3. at most t processors fail
Goal: bound the number of faults at all times
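
The round structure can be sketched as a small bookkeeping loop. This only illustrates the order of the three steps: the testing step is abstracted away by handing the repair policy the true fault set, which a real algorithm would have to infer from test results, and failures are drawn at random rather than adversarially. All names (`run_rounds`, `oracle`, `never_repair`) are hypothetical.

```python
import random

def run_rounds(n, t, num_rounds, choose_repairs, seed=0):
    """Return the number of faulty processors after each round."""
    rng = random.Random(seed)
    faulty = set()
    history = []
    for _ in range(num_rounds):
        # 1. Perform tests. Abstracted away here: the policy is handed the true
        #    fault set, which a real algorithm would have to infer from tests.
        # 2. Direct that up to t processors are repaired.
        repairs = list(choose_repairs(faulty, t))[:t]
        faulty -= set(repairs)
        # 3. At most t processors fail (chosen at random in this toy model).
        healthy = [p for p in range(n) if p not in faulty]
        faulty |= set(rng.sample(healthy, min(t, len(healthy))))
        history.append(len(faulty))
    return history

def oracle(faulty, t):
    return sorted(faulty)[:t]        # repairs known faults; not available in practice

def never_repair(faulty, t):
    return []

print(run_rounds(20, 2, 5, oracle))        # stays at t faults
print(run_rounds(20, 2, 5, never_repair))  # grows by t each round
```

The gap between the two policies is the reason diagnosis matters: repairs only help if the algorithm can locate faults about as fast as they appear.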

28
Results for n processors, at most t failures per round

When t > 70 and n > 376 t log t + 50t, we can maintain n - 64 t log t - 10t good processors at all times
This works even if the number of faults exceeds n/2
When n = 640 and t = 1, we can maintain 520 good processors at all times.

29
Why's this hard?

We can't determine the status of a chosen processor, because its testers might fail right before we choose them

Mutual admiration societies don't work either

30
SIFT and WINNOW

SIFT finds a large set G consisting of processors that were good when SIFT started running, and a small set F containing some faulty processors
WINNOW uses G to diagnose most of the faulty processors in F
Algorithm: SIFT, WINNOW, repair, repeat

31
SIFT algorithm

Let r = 2 log t
In 2r rounds form undirected hypercubes of size [...]
Put MASs into G, others into F
MASs must have been entirely good at start of SIFT, and are still mostly good

32
WINNOW algorithm

Choose a processor P in F
For 2 log t rounds, test P and every processor that has tested P so far, using testers in G
If the tests always call P faulty but don't call any of the others faulty, then we can be sure that P really is faulty
Most old faults are diagnosed, but 4t log t new ones could accumulate.
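
The acceptance rule for a single suspect P can be written as a small predicate. This sketch covers only the decision rule stated above, not the scheduling of testers from G. The function name and the input layout (one pair per round holding the verdict on P and the verdicts on P's earlier testers) are assumptions made for illustration.

```python
def winnow_verdict(rounds):
    """Decide whether suspect P is certainly faulty.

    `rounds` holds one pair per round: (verdict_on_p, verdicts_on_earlier_testers),
    where each verdict is the string 'good' or 'faulty'.
    """
    p_always_faulty = all(on_p == "faulty" for on_p, _ in rounds)
    testers_never_faulty = all(
        verdict == "good" for _, on_testers in rounds for verdict in on_testers
    )
    # Require at least one round so that an empty log never condemns P.
    return bool(rounds) and p_always_faulty and testers_never_faulty

log = [("faulty", []), ("faulty", ["good"]), ("faulty", ["good", "good"])]
print(winnow_verdict(log))  # True
```

If any round calls P good, or calls one of P's earlier testers faulty, the rule makes no claim about P.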

33
Summary