Title: System Models, Dependability and Structuring (Brian Randell)

1. System Models, Dependability and Structuring
Brian Randell
2. Models and Structure
- One of the major uses of a model of a real system is to facilitate understanding and analysis of how and whether the system will work.
- This presumes that the model is significantly less complex than the system, but nevertheless sufficiently representative of it, w.r.t. the particular issues of concern.
- Even so, there may well be problems coping with the complexity of the model, let alone of the modelled system.
- Structuring (dividing and conquering) is the main means of coping with the complexity of models (and systems).
- My aim is to summarise a few (mainly very old) ideas concerning system dependability, and to reveal some (new, untried) ideas concerning system models and their structuring.
- "Basic Concepts and Taxonomy of Dependable and Secure Computing", A. Avizienis, J.-C. Laprie, B. Randell and C. Landwehr, IEEE Trans. on Dependable and Secure Computing, vol. 1, no. 1, pp. 11-33, 2004.
3. Dependability
- Dependability relates to the notion of a failure.
- A dependable system is one whose failures* are not unacceptably frequent or severe.
- Particular types of failures (e.g. producing wrong results, ceasing to operate, revealing secret information, causing loss of life, etc.) relate to what can be regarded as different special cases of dependability: reliability, availability, confidentiality, safety, etc.
- Complex real systems (e.g. of hardware, software and people) do actually fail from time to time (!), and reducing the frequency and severity of their failures is a major challenge.
- There are severe terminological confusions in this field, e.g. regarding the relationship of dependability and security, and the very concept of failure.
- (* from some given viewpoint)
4. Three Basic Concepts
- A system failure occurs when the delivered service is regarded as deviating from fulfilling the system function.
- An error is that part of the system state which is liable to lead to subsequent failure; an error affecting the service is an indication that a failure occurs or has occurred. The adjudged or hypothesised cause of an error is a fault.
- (Note: errors do not necessarily lead to failures; this may be avoided by chance or design. Component failures do not necessarily constitute faults to the surrounding system; this depends on how the surrounding system is relying on the component.)
- These three concepts (an event, a state, and a cause) must be distinguished, whatever names you choose to use for them.
- Identifying failures and errors, as well as faults, involves judgement.
5. The Failure/Fault/Error Chain
- A failure occurs when an error "passes through" the system-user interface and affects the service delivered by the system (a system of course being composed of components which are themselves systems). This failure may be significant, and thus constitute a fault, to the enclosing system. Thus the manifestation of failures, faults and errors follows a fundamental chain:
- . . . → failure → fault → error → failure → fault → . . .
- i.e.
- . . . → event → cause → state → event → cause → . . .
- This chain can flow from one system to:
  - another system that it is interacting with,
  - the system which it is part of,
  - a system which it creates or sustains.
- Typically, a failure will be found to be due to multiple co-incident faults, e.g. the activity of a hacker exploiting a bug left by a programmer.
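The chain above can be sketched as a small program. This is a hypothetical illustration (the class names, the recursion, and the example systems are my own, not from the talk), in which a component's failure is treated as a fault by its enclosing system, producing an error there and, if untolerated, a further failure:

```python
# Hypothetical sketch of the failure -> fault -> error chain across
# nested systems. Class names and example systems are invented; the
# sketch assumes no fault tolerance, so every fault escalates.
from dataclasses import dataclass

@dataclass
class System:
    name: str
    parent: "System | None" = None   # the enclosing system, if any
    erroneous: bool = False          # part of the state liable to lead to failure

def component_fails(component: System, log: list[str]) -> None:
    """One pass of the chain: a component failure is judged a fault by
    the enclosing system, puts it into an erroneous state, and (being
    untolerated here) leads to the enclosing system's own failure."""
    log.append(f"failure in {component.name}")
    enclosing = component.parent
    if enclosing is None:
        return
    log.append(f"fault to {enclosing.name}")     # failure -> fault
    enclosing.erroneous = True                   # fault -> error
    log.append(f"error in {enclosing.name}")
    component_fails(enclosing, log)              # error -> failure ...

plant = System("plant")
controller = System("controller", parent=plant)
sensor = System("sensor", parent=controller)

trace: list[str] = []
component_fails(sensor, trace)
print(trace)
```

Running this prints the alternating failure/fault/error sequence as it climbs from sensor to controller to plant, mirroring the chain notation.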
6. System Failures
- Identifying failures (and hence errors and faults), even understanding the concepts, is difficult when:
  - there can be uncertainties about system boundaries,
  - the very complexity of the systems (and of any specifications) is often a major difficulty,
  - the determination of possible causes or consequences of failure can be a very subtle process,
  - any provisions for preventing faults from causing failures may themselves be fallible.
- Attempting to enumerate a system's possible failures beforehand is normally impracticable.
- Instead, one can appeal to the notion of a judgemental system.
7. Systems Come in Threes!
- The environment of a system is the wider system that it affects (by its correct functioning, and by its failures), and that it is affected by.
- What constitutes correct (failure-free) functioning might be implied by a system specification, assuming that this exists, and is complete, accurate and agreed. (Often the specification is part of the problem!)
- However, in principle a third system, a judgemental system, is involved in determining whether any particular activity (or inactivity) of a system in a given environment constitutes or would constitute a failure.
- The judgemental system and the environmental system might be one and the same, and the judgement might be instant or delayed.
- The judgemental system might itself fail, as judged by some yet higher system, and different judges, or the same judge at different times, might come to different judgements.
8. Judgemental Systems
- This term is deliberately broad: it covers everything from built-in failure detector circuits to the retrospective activities of a court of enquiry (just as the term "system" is meant to range from simple hardware devices to complex computer-based systems, composed of h/w, s/w & people).
- Thus the judging activity may be clear-cut and automatic, or essentially subjective, though even in the latter case a degree of predictability is essential, otherwise the system designer's task would be impossible.
- The judgement is an action by a system, and so can in principle fail, either positively or negatively.
- This possibility is allowed for in the legal system, hence the concept of crown courts, appeal courts, etc.
- As appropriate, judgemental systems should use evidence concerning the alleged failure, any prior contractual agreements and system specifications, certification records, government guidelines, advice from regulators, prior practice, common sense, etc., etc.
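The clear-cut, automatic end of this range can be illustrated in a few lines. This is a hypothetical sketch (the latency specification and its bound are invented): a judge built from an agreed specification that decides whether observed service constitutes a failure, and under which different judges can reach different verdicts on the same evidence:

```python
# Hypothetical sketch of a clear-cut, automatic judgemental system: a
# judge built from an agreed specification (here an invented maximum
# response latency) that decides whether the delivered service failed.
from typing import Callable

Judgement = Callable[[list[float]], bool]

def make_judge(spec_max_latency: float) -> Judgement:
    """Return a judge: the service fails the spec if any observed
    response latency exceeds the agreed bound."""
    def judge(observed_latencies: list[float]) -> bool:
        return any(t > spec_max_latency for t in observed_latencies)
    return judge

strict_judge = make_judge(spec_max_latency=0.5)
lenient_judge = make_judge(spec_max_latency=1.0)

# Different judges may come to different judgements on the same evidence.
print(strict_judge([0.1, 0.9]))   # a failure, as this judge sees it
print(lenient_judge([0.1, 0.9]))  # no failure, as that judge sees it
```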
9. A Role for (Formal) Modelling?
- If one could express even some of these ideas in a formal notation, one might facilitate:
  - the analysis of system failures,
  - the analysis and design of (fault-tolerant) systems themselves.
- The notation I've been experimenting with is that of Occurrence Graphs (aka Causal Nets).
- OGs represent what (allegedly) happened, or might happen, and why, in a system: they model system behaviour, not actual systems.
- Simple graphs can be shown pictorially.
- They can be expressed algebraically, and have a formal semantics.
- Tools exist for their analysis and manipulation, and even for synthesizing systems from them (in simple cases).
- My thought experiments have concerned what might be called Structured Occurrence Graphs.
- Their structure results from notions like the fundamental F-E-F chain.
- This structure provides complexity reduction, and so could facilitate (automated) failure analyses, or possibly system synthesis.
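To make the kind of object involved concrete, here is one minimal, hypothetical encoding of an occurrence graph (the node names and the particular encoding are my assumptions, not the notation's official form): a bipartite acyclic graph of conditions (states) and events, joined by causal arcs, with one relation to another level in the spirit of a Structured OG:

```python
# One minimal, hypothetical encoding of an occurrence graph (causal
# net): a bipartite acyclic graph whose nodes are conditions (states)
# and events, joined by causal arcs. Node names are invented.
conditions = {"c0", "c1", "c2"}            # states that held at some point
events = {"e1", "e2"}                      # things that (allegedly) happened
arcs = [("c0", "e1"), ("e1", "c1"),        # c0 enabled e1, which produced c1
        ("c1", "e2"), ("e2", "c2")]        # c1 enabled e2, which produced c2

# A relation to another OG, in the spirit of a Structured OG: a
# "colouring" mapping each condition to the system whose state it is.
belongs_to = {"c0": "system1", "c1": "system1", "c2": "system1"}

def predecessors(node: str) -> list[str]:
    """Immediate causal predecessors of a node in the graph."""
    return [src for src, dst in arcs if dst == node]

print(predecessors("e2"))   # the condition that enabled event e2
```

The `belongs_to` relation is the simplest case of the inter-OG relations discussed on the following slides; richer relations (between before/after versions of systems, or between creator and created) would be further such mappings.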
10. (Structured) Occurrence Graph Notation
The (simple and perhaps new) idea is to introduce
various types of (formal) relations between OGs,
and treat a set of such related OGs as a
Structured OG.
11. Concept Minimization
- The dependability and security definitions aim to identify and define a minimum set of basic concepts (or "nouns"), such as fault, error and failure, and then elaborate on these using adjectives, such as transient, internal, catastrophic, etc.
- One major problem in such work is to avoid circular definitions, and to minimise the number of pre-existing definitions used.
- (We used to use the word "reliance" in the definition of dependability, a regrettable near-circularity.)
- We thus chose to take as the basic starting point conventional dictionary definitions for just "system", "judgement" and "state".
- The work on occurrence graphs was sparked by the belated realisation that the concepts of "system" and "state" were not separate, but just a question of abstraction.
- In fact (separate) OGs can use the same symbol (a place) to represent both systems and states.
12. Systems & their Behaviour
The markings in the conditions in the lower OG are in effect colourings which identify the system concerned: here they thus relate states to systems.
13. System Updating
The relations between the upper and lower graphs
show which state sequences are associated with
the systems before they were modified, and which
with the modified systems.
14. Dynamic System Evolution
This shows the history of an online modification
of some systems, in which the modified systems
continue on from the states that had been reached
by the original systems.
15. System Creation & Evolution
This shows some of the earlier history of the two
systems, i.e. that system 1 created system 2, and
that both then went through some independent
further evolution.
16. (Retrospective) Judgement
Analysis of a Structured OG typically involves
following causal arrows (possibly in both
directions) within OGs, and relations between OGs.
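Such backward analysis can be mimicked on the encoding sketched earlier. A hypothetical sketch (the graph and its fault/error labels are invented for illustration): starting from an event already judged to be a failure, follow causal arcs backwards to collect the candidate erroneous states and faults:

```python
# Hypothetical sketch of retrospective judgement over an occurrence
# graph: from an event already judged to be a failure, follow causal
# arcs backwards to collect candidate erroneous states and faults.
# The graph and its fault/error labels are invented for illustration.
arcs = [("fault:bug", "error:state1"),
        ("fault:hacker", "error:state1"),   # multiple co-incident faults
        ("error:state1", "failure:crash")]

def causes(node: str, arcs, acc=None) -> set:
    """All transitive causal predecessors of a node."""
    if acc is None:
        acc = set()
    for src, dst in arcs:
        if dst == node and src not in acc:
            acc.add(src)
            causes(src, arcs, acc)
    return acc

print(sorted(causes("failure:crash", arcs)))
```

Note that the traversal surfaces both co-incident faults (the hacker's activity and the programmer's bug), matching the earlier observation that a failure is typically due to several faults together.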
17. Concluding Remarks (Hopes!)
- Possibilities for taking advantage of the complexity-reduction provided by an SOG's structuring include:
- A judgemental system, having identified some system event as a failure, could analyze records forming a Structured OG in an attempt to identify (i) the fault(s) that should be blamed for the failure, and/or (ii) the erroneous states that could and should be corrected or compensated for.
- (This could be just a way of describing (semi-)formally what is often currently done by expert investigators in the aftermath of a major system failure. However, one can imagine attempting to automate the recording and analysis of actual occurrence graphs, using extensions of existing tools.)
- Structured OGs might be usable for modelling complex system behaviour prior to system deployment, so as to facilitate the use of some form of automated model-checking in order to verify at least some aspects of the design of the system(s).
- In principle (again largely using existing tools) one could even synthesize a system from such a checked Structured OG.