Dependability Considerations in Distributed Control Systems - PowerPoint PPT Presentation

About This Presentation

Title:

Dependability Considerations in Distributed Control Systems

Slides: 17

Provided by: miri78

Transcript and Presenter's Notes

1
Dependability Considerations in Distributed
Control Systems

2
Dependability

3
Motivation

4
Research Objectives

Dependable Distributed Systems (DeDiSys) research
project with the European Union.
What are the most frequent causes of faults in
distributed control systems?
What mitigation mechanisms are available?
How to improve availability by trading it against
constraint consistency?
What is constraint consistency in control systems?

5
Reliability

Reliability, $R(t)$, is the probability that a
system will perform as specified for a given
period of time $t$.
Typically exponential: $R(t) = e^{-\lambda t}$, where $\lambda$ is the failure rate.
An alternative measure is the mean time to failure
(MTTF/MTBF).
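
As a short worked step, assuming the exponential model with a constant failure rate $\lambda$ (the symbol is an assumption; the slide's own formula did not survive extraction), the mean time to failure follows from integrating the reliability function:

```latex
\mathrm{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}
```

For illustration only: under this model a component with an MTTF of 10,000 h has $\lambda = 10^{-4}$ per hour, so its reliability over 1,000 h is $e^{-0.1} \approx 0.905$.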

6
Reliability of Composed Systems

Weakest link: the reliability of a coupled composed
system is less than the reliability of its least
reliable constituent.
Redundancy: the reliability of a redundant subsystem
is greater than the reliability of its most
reliable constituent.
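
The two rules above can be checked numerically. The sketch below is a minimal illustration, assuming that constituents fail independently (an assumption the slide does not state) and using made-up reliability values; the function names are hypothetical.

```python
import math

def series(reliabilities):
    """Coupled (series) system: every constituent must work.

    Assumes independent failures, so the probabilities multiply.
    """
    return math.prod(reliabilities)

def parallel(reliabilities):
    """Redundant (parallel) subsystem: it fails only if all constituents fail.

    Assumes independent failures.
    """
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

# Illustrative values only, not taken from the presentation.
parts = [0.99, 0.95, 0.90]

r_series = series(parts)      # lower than the least reliable part (0.90)
r_parallel = parallel(parts)  # higher than the most reliable part (0.99)

print(f"series:   {r_series:.5f}")
print(f"parallel: {r_parallel:.5f}")
```

With these values the series system comes out below 0.90 and the redundant one above 0.99, which matches the weakest-link and redundancy statements.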

7
Maintainability and Availability

Maintainability: how long it takes to repair a
system after a failure.
The measure is mean time to repair (MTTR).
Availability: the percentage of time the system is
actually available during periods when it should
be available.
Directly experienced by users!
Expressed in percent. In marketing, also as a
number of nines (e.g., 99.999% availability →
unavailable about 5 min/year).
Example: a gas station (working hours 6AM to 10PM =
16 hours)
Ran out of gas at 10AM (2h)
Pump malfunction at 2PM (2h)
Availability = 12h/16h = 75%
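
The arithmetic behind these figures can be sketched as follows. The helper names are hypothetical, and the steady-state formula MTTF / (MTTF + MTTR) is the standard textbook relation rather than something stated on the slide.

```python
def availability(uptime_hours, scheduled_hours):
    """Fraction of the scheduled time during which the system was usable."""
    return uptime_hours / scheduled_hours

def steady_state_availability(mttf, mttr):
    """Standard long-run relation between MTTF, MTTR and availability."""
    return mttf / (mttf + mttr)

def downtime_minutes_per_year(availability_fraction):
    """Expected unavailable minutes in a 365-day year of continuous service."""
    minutes_per_year = 365 * 24 * 60
    return (1.0 - availability_fraction) * minutes_per_year

# Gas station example: open 6AM to 10PM, two outages of 2 hours each.
scheduled = 22 - 6
outages = [2, 2]
gas_station = availability(scheduled - sum(outages), scheduled)

# "Five nines" expressed as yearly downtime.
five_nines = downtime_minutes_per_year(0.99999)

print(f"gas station availability: {gas_station:.0%}")
print(f"99.999% availability: {five_nines:.2f} min/year of downtime")
```

The gas station comes out at 75%, and five nines corresponds to a little over five minutes of downtime per year.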

8
Research Methodology

9
Faults in Distributed Systems

10
Improving Hardware MTTF

11
Improving Software MTTF

Ensure that overflows of variables that
constantly increase (handle IDs, timers,
counters, ...) are properly handled.
Ensure all resources are properly released when
no longer needed (memory leaks, ...)
Use a managed platform (Java, .NET)
Use auto-pointers (C++)
Avoid using heap storage on a per-transaction
basis (may result in memory fragmentation) e.g.,
use free-lists
Restart a process in a controllable fashion
(rejuvenation)
Isolate processes through inter-process
communication
Recovery
Recover state after a crash
Effective for host and process crashes
Automated repair
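
To illustrate the first point on this slide (overflow of constantly increasing variables), here is a minimal sketch of wrap-around-safe handling for a 32-bit counter. The 32-bit width and all names are assumptions for illustration; Python integers do not overflow, so the mask simulates a fixed-width counter.

```python
MASK32 = 0xFFFFFFFF  # simulate an unsigned 32-bit counter

def next_id(current):
    """Increment a handle ID, wrapping to 0 after the maximum value."""
    return (current + 1) & MASK32

def elapsed_ticks(start, now):
    """Ticks elapsed between two timer readings, correct across one wrap."""
    return (now - start) & MASK32

def is_expired(start, now, timeout):
    """Compare elapsed time instead of absolute values, so a wrap is harmless."""
    return elapsed_ticks(start, now) >= timeout

# The timer started shortly before the wrap and is read shortly after it.
start, now = 0xFFFFFFF0, 0x00000010
print(elapsed_ticks(start, now))   # 32 ticks, despite now < start
print(is_expired(start, now, 32))  # True
```

A naive comparison such as `now >= start + timeout` would misbehave here because `now` is numerically smaller than `start` after the wrap; working with the masked difference avoids that as long as the measured interval is shorter than the counter period.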

12
Decreasing MTTR

Foresee failures during design
"The major difference between a thing that might
go wrong and a thing that cannot possibly go
wrong is that when a thing that cannot possibly
go wrong goes wrong it usually turns out to be
impossible to get at or repair."
(Douglas Adams, Mostly Harmless)
Provide good diagnostics
Alarms
Detailed description of where and when an error
occurred
Logs
State-dump at failures
ADC buffers after a beam dump
Status of synchronization primitives
Memory dump
Automated fail-over
In combination with redundancy
Passive replica must have up-to-date state of the
primary copy
Fault detection (network ping, analog signal, ...)
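
The automated fail-over points above can be sketched as a heartbeat monitor with a passive replica. This is a simplified, single-process illustration, not the presenters' implementation: the class names, the timeout value and the synchronous state copy are all assumptions, and timestamps are passed in explicitly to keep the example deterministic.

```python
class Replica:
    """A node holding a copy of the application state."""
    def __init__(self, name):
        self.name = name
        self.state = {}

class FailoverMonitor:
    """Promotes the passive replica when the primary's heartbeat goes silent."""
    def __init__(self, primary, backup, timeout):
        self.primary = primary
        self.backup = backup
        self.timeout = timeout       # seconds of silence tolerated
        self.last_heartbeat = 0.0

    def write(self, key, value):
        # Update both copies so the passive replica always has current state.
        self.primary.state[key] = value
        self.backup.state[key] = value

    def heartbeat(self, now):
        # Called whenever the primary answers a ping.
        self.last_heartbeat = now

    def check(self, now):
        """Return True if a fail-over was performed."""
        if now - self.last_heartbeat > self.timeout:
            self.primary, self.backup = self.backup, self.primary
            self.last_heartbeat = now
            return True
        return False

monitor = FailoverMonitor(Replica("A"), Replica("B"), timeout=2.0)
monitor.write("setpoint", 42)
monitor.heartbeat(now=0.0)
print(monitor.check(now=1.0))   # False: primary still considered alive
print(monitor.check(now=3.5))   # True: heartbeat missed, B takes over
print(monitor.primary.name, monitor.primary.state)
```

Because every write is mirrored before it is acknowledged, the promoted replica already holds the primary's last state, which is what keeps the repair time short.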

13
Consistency/Availability Trade-Off
Consistency
Availability

Control systems, air-traffic control, fly-by-wire, drive-by-wire

14
Constraint Consistency in Control Systems

Constraints: rules that one or more objects must
satisfy, for example:
If and only if serverChannel.monitors.contains(client) then client.isSubscribedTo(serverChannel)
serverChannel.value = clientChannel.value
server.getFromDatabase(x) = database.get(x)
If client.referencesComponent(component) then
component.isReferencedBy(client)
Can some constraints be temporarily relaxed in
the presence of faults?
If so, how to reconcile the system to a
consistent state when faults are removed?
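
One way to picture such constraints is as named predicates that are evaluated over the objects involved, with some of them allowed to be relaxed while a fault is present. The sketch below is purely illustrative: the Python classes mirror the identifiers on the slide, but their fields, the `violations` helper and the notion of a `relaxed` set are assumptions, not the DeDiSys design.

```python
from dataclasses import dataclass, field

@dataclass
class ServerChannel:
    name: str
    value: float = 0.0
    monitors: set = field(default_factory=set)   # names of monitoring clients

@dataclass
class Client:
    name: str
    subscriptions: set = field(default_factory=set)  # channel names
    cached_value: float = 0.0                        # client-side copy of the value

    def is_subscribed_to(self, channel):
        return channel.name in self.subscriptions

def violations(constraints, relaxed=frozenset()):
    """Names of constraints that fail, skipping those temporarily relaxed."""
    return [name for name, check in constraints.items()
            if name not in relaxed and not check()]

channel = ServerChannel("temperature", value=2.0, monitors={"c1"})
client = Client("c1", subscriptions={"temperature"}, cached_value=1.0)

constraints = {
    # monitored-by holds if and only if subscribed-to holds
    "subscription": lambda: (client.name in channel.monitors)
                            == client.is_subscribed_to(channel),
    # the client's copy of the value matches the server's value
    "value": lambda: channel.value == client.cached_value,
}

print(violations(constraints))                     # ['value']
print(violations(constraints, relaxed={"value"}))  # []

# Reconciliation after the fault is removed: refresh the stale copy.
client.cached_value = channel.value
print(violations(constraints))                     # []
```

Here the stale client value is tolerated while the "value" constraint is relaxed, and a reconciliation step restores full consistency afterwards.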

15
Future Work